DEVELOPING AI MODELS FOR URDU AND REGIONAL LANGUAGE PROCESSING: SENTIMENT ANALYSIS AND FAKE NEWS DETECTION IN PAKISTAN’S SOCIAL MEDIA
Keywords:
Artificial Intelligence; Natural Language Processing; Urdu Language Processing; Sentiment Analysis; Fake News Detection; Transformer Models; XLM-RoBERTa; Low-Resource Languages; Social Media Analytics; Multilingual NLP

Abstract
The rapid growth of social media platforms in Pakistan has intensified the spread of user-generated content in Urdu and regional languages, leading to significant challenges in sentiment interpretation and misinformation detection. This study aimed to develop and evaluate advanced Artificial Intelligence (AI) models for Urdu and regional language processing, focusing on sentiment analysis and fake news detection in a multilingual social media environment. A comparative experimental design was employed using traditional machine learning models, deep learning architectures, and transformer-based models, including BERT and XLM-RoBERTa. The dataset was collected from social media platforms such as Facebook, X (Twitter), and WhatsApp-forwarded content, followed by extensive preprocessing including normalization, tokenization, and code-mixing handling. Experimental results revealed that transformer-based models significantly outperformed traditional and deep learning approaches, with XLM-RoBERTa achieving the highest accuracy in both sentiment analysis (93.4%) and fake news detection (92.6%). Furthermore, a unified multi-task learning framework demonstrated improved efficiency by simultaneously performing both classification tasks with high accuracy. The study concludes that transformer-based AI models provide a robust solution for low-resource language processing in Pakistan’s digital ecosystem. The findings have important implications for digital governance, cybersecurity, and AI-driven misinformation detection systems in multilingual environments.
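The preprocessing stage mentioned above (normalization, tokenization, and code-mixing handling) can be illustrated with a minimal sketch of Urdu text normalization. The character mappings and the `normalize_urdu` helper below are illustrative assumptions, not the study's actual pipeline: they fold common Arabic-script variants (Arabic Yeh U+064A, Arabic Kaf U+0643) into their standard Urdu forms (Farsi Yeh U+06CC, Keheh U+06A9) and strip optional diacritics before the text reaches a tokenizer.

```python
import re

# Hypothetical normalization table: map Arabic-script variants that
# frequently appear in social media text to standard Urdu codepoints.
ARABIC_TO_URDU = {
    "\u064A": "\u06CC",  # Arabic Yeh -> Farsi/Urdu Yeh
    "\u0643": "\u06A9",  # Arabic Kaf -> Urdu Keheh
}

# Optional diacritics (fathatan through sukun, plus superscript alef)
# carry little signal for classification and inflate the vocabulary.
DIACRITICS = re.compile(r"[\u064B-\u0652\u0670]")

def normalize_urdu(text: str) -> str:
    """Normalize character variants, strip diacritics, collapse whitespace."""
    for src, dst in ARABIC_TO_URDU.items():
        text = text.replace(src, dst)
    text = DIACRITICS.sub("", text)
    return re.sub(r"\s+", " ", text).strip()
```

A normalizer like this would typically run before subword tokenization so that visually identical words written with different codepoints map to the same tokens.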