SENTIMENT ANALYSIS OF SOCIAL MEDIA CONTENT IN ROMAN URDU LANGUAGE USING DATA MINING TECHNIQUES

Authors

  • Kamran Khadim*, Department of Computer Science, COMSATS University Islamabad, Wah Campus, Pakistan
  • Zeeshan Riasat, Department of Sociology, Government College University, Faisalabad, Pakistan
  • Farhan Amjad, Department of Sociology, Government College University, Faisalabad, Pakistan
  • Muhammad Asif, Department of Sociology, Government College University, Faisalabad, Pakistan
  • Maha Arush

DOI:

https://doi.org/10.62019/r578bb85

Abstract

Sentiment analysis (SA) is a form of text mining that takes context into account. Businesses commonly use SA to extract insight from source text, gauging public sentiment toward a product or service of their brand while monitoring online discussions and chats. It identifies the polarity (positive or negative) of an opinion or viewpoint. This study focuses on the SA of Roman Urdu comments on social media sites using a machine learning (ML) approach, whose performance depends heavily on the algorithms used and the characteristics of the training data. We use recent ML and deep learning (DL) algorithms together with feature engineering techniques: TF-IDF, Bag-of-Words, N-gram, Word2vec, and GloVe. We used an online social media data set that two different native, well-versed Urdu speakers tagged as a) positive, b) negative, or c) neutral, reaching a Cohen's Kappa score of 0.95. We then ran three sets of experiments covering subjectivity analysis, ternary (three-class) categorization and recognition, and binary classification. These experiments evaluate the efficacy of the approach, with accuracy, F1-score, precision, and recall as the performance measures. In these tests, the SVM achieved higher accuracy than the other ML and DL algorithms.
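
The inter-annotator agreement figure quoted in the abstract is a Cohen's Kappa score, which corrects the raw agreement between two annotators for the agreement expected by chance. The following is a minimal sketch of that calculation, not the authors' code. The function name `cohen_kappa` and the toy label lists are hypothetical and are not taken from the study's data set.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(labels_a)
    # Observed agreement: fraction of items both annotators labelled the same.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: derived from each annotator's own label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (p_o - p_e) / (1 - p_e)


# Hypothetical toy annotations (illustration only, not the study's data).
annotator_1 = ["positive", "negative", "neutral", "positive", "negative", "neutral"]
annotator_2 = ["positive", "negative", "neutral", "positive", "neutral", "neutral"]
kappa = cohen_kappa(annotator_1, annotator_2)
print(f"Cohen's kappa: {kappa:.2f}")
```

On this toy input the annotators agree on five of six items (observed agreement 0.83) against a chance agreement of 0.33, giving a kappa of 0.75. A score of 0.95, as reported for the study's annotation, indicates that the two annotators disagreed on very few comments.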

Keywords: Roman Urdu, Sentiment Analysis, Data mining, Machine Learning.

Published

2024-12-31

Section

Articles