CROSS-LINGUAL CYBER-BULLYING: A SYSTEMATIC REVIEW OF DETECTION METHODS
Abstract
The growing use of social media has led to a significant rise in cyber-bullying and hate speech, creating serious social and mental health challenges worldwide. Consequently, numerous automated detection methods have been developed, but their performance varies widely across languages, datasets, and modeling strategies. This paper reviews the literature on state-of-the-art approaches to cyber-bullying and hate speech detection, with particular emphasis on multilingual and low-resource language settings, such as Roman Urdu and English. The reviewed studies are analyzed along several dimensions, including dataset characteristics, preprocessing methods, and feature engineering techniques, followed by an evaluation of machine learning, deep learning, and transformer-based models. The findings indicate that traditional machine learning models provide a strong baseline but struggle with contextual and intent-aware detection. Deep learning approaches improve on this baseline, yet remain limited by data scarcity and reliance on binary classification. Transformer-based models demonstrate state-of-the-art performance, but they still struggle with emoji-aware processing, slang interpretation, and distinguishing playful teasing from harmful cyber-bullying. By identifying these research gaps, this review underscores the importance of multilingual, emoji-aware, and intent-sensitive cyber-bullying detection frameworks, and it aims to support both further research and practical moderation systems.













