SENTIMENT ANALYSIS OF SOCIAL MEDIA CONTENT IN ROMAN URDU LANGUAGE USING DATA MINING TECHNIQUES
DOI: https://doi.org/10.62019/r578bb85
Abstract
Sentiment analysis (SA) is a form of text mining that takes context into account. Businesses typically use SA to extract knowledge from a source of information, so that they can gauge public sentiment about a particular product or service of their brand while monitoring online discussions. It indicates the polarity (positive or negative) of an opinion or viewpoint. This study focuses on the SA of Roman Urdu comments on social media sites using a machine learning (ML) approach. This approach depends heavily on the algorithms used and on the characteristics of the training data. We use recent ML and deep learning (DL) algorithms together with feature engineering techniques: TF-IDF, Bag-of-Words, N-gram, Word2vec, and GloVe. We used an online social media data set that two distinct native, well-versed Urdu speakers tagged as a) positive, b) negative, or c) neutral, with a Cohen's Kappa score of 0.95. We then run three sets of experiments: subjectivity analysis, ternary categorization and recognition, and binary classification. These experiments evaluate the efficacy of the approach. Accuracy, F1-score, precision, and recall are used to assess performance. In these experiments, the SVM achieves higher accuracy than the other ML and DL algorithms.
Keywords: Roman Urdu, Sentiment Analysis, Data Mining, Machine Learning.
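
The abstract reports inter-annotator agreement as a Cohen's Kappa score. As an illustration of how such a score is computed for two annotators, the following is a minimal sketch in plain Python. The function name `cohen_kappa` and the toy label lists are hypothetical and are not taken from the study's data set or code; the formula is the standard one, kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e is the agreement expected by chance.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty set of items")
    n = len(labels_a)

    # Observed agreement: fraction of items given the same label by both.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: product of each annotator's marginal label
    # frequencies, summed over labels.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)

    if p_e == 1.0:
        # Both annotators used one identical label for every item.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Hypothetical toy annotations (illustrative only, not the study's data).
annotator_1 = ["positive", "negative", "neutral", "positive", "negative", "neutral"]
annotator_2 = ["positive", "negative", "neutral", "positive", "neutral", "neutral"]

kappa = cohen_kappa(annotator_1, annotator_2)
print(f"Cohen's kappa: {kappa:.2f}")
```

In this toy example the annotators agree on five of six items, which gives a kappa of 0.75 after correcting for chance. A score of 0.95, as reported in the abstract, indicates much closer agreement.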