CONTEXT-AWARE AND EXPLAINABLE HYBRID CLASSIFICATION OF CROSS-SITE SCRIPTING ATTACKS USING MACHINE LEARNING

Ghulam Qasim; Tooba Shaikh; Farhan; Sarang Ahmed; Muhammad Tahir

Keywords:

Cross-Site Scripting (XSS), Web Application Security, Context-Based Features, Hybrid Machine Learning, Explainable AI, Unsupervised Clustering

Abstract

Cross-site scripting (XSS) remains one of the most common and destructive vulnerabilities in modern web applications because it allows attackers to inject and run malicious client-side scripts in trusted execution environments. Although machine-learning-based approaches have significantly improved the accuracy of XSS detection, most existing solutions rely on superficial lexical representations, explicitly defined multi-class formulations, and black-box decision functions. These limitations reduce their usefulness in practical scenarios, where attack payloads are often heavily obfuscated, contextual behavior is more important, and high-quality multi-class labels are unavailable. To address these limitations, this paper proposes a context-aware and explainable hybrid machine-learning model that detects XSS attacks and classifies their behavior. The implemented solution combines supervised binary classification using Random Forest with an unsupervised K-Means clustering layer that discovers latent XSS attack patterns without using multi-class labels. Context-sensitive features derived from URLs, HTML structures, and JavaScript behavioral traits are used to enhance robustness and interpretability. Experimental evaluation on the Fawaz2015 XSS corpus indicates a high detection rate while also providing meaningful behavior-based discrimination between reflected, stored, and DOM-based XSS attacks. The results indicate that the hybrid framework helps close the gap between accurate detection and practical security context, making it applicable to routine web-security use.
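
The two-stage design described in the abstract can be illustrated with a minimal sketch using scikit-learn. Everything specific here is an assumption made for illustration and is not taken from the paper: the six features in `extract_features`, the helper names `fit_hybrid` and `classify`, the toy training payloads, and the choice of three clusters. It only shows the general shape of the approach: a Random Forest decides attack versus benign, and K-Means, fitted on the malicious samples alone, assigns each detected attack to a latent behavior group.

```python
import re
from urllib.parse import unquote

import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestClassifier

# Hypothetical context features; the paper's actual feature set is not listed here.
JS_SINKS = ("eval(", "document.write", "innerhtml", "settimeout(", "document.cookie")
EVENT_HANDLER = re.compile(r"\bon\w+\s*=", re.IGNORECASE)


def extract_features(payload: str) -> list:
    """Map one raw request string to a small numeric feature vector."""
    decoded = unquote(payload)  # undo URL percent-encoding
    lowered = decoded.lower()
    return [
        float(len(decoded)),                             # URL: decoded length
        float(payload.count("%")),                       # URL: percent signs (encoding proxy)
        float(lowered.count("<script")),                 # HTML: opening script tags
        float(len(EVENT_HANDLER.findall(decoded))),      # HTML: inline event handlers
        float("javascript:" in lowered),                 # HTML: javascript: URI present
        float(sum(lowered.count(s) for s in JS_SINKS)),  # JS: risky sink references
    ]


def fit_hybrid(payloads, labels, n_clusters=3, seed=0):
    """Stage 1: supervised binary detector. Stage 2: K-Means on malicious samples only."""
    X = np.array([extract_features(p) for p in payloads])
    y = np.array(labels)
    detector = RandomForestClassifier(n_estimators=100, random_state=seed).fit(X, y)
    clusterer = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit(X[y == 1])
    return detector, clusterer


def classify(payload, detector, clusterer):
    """Return (is_attack, cluster_id); cluster_id is None for benign input."""
    x = np.array([extract_features(payload)])
    if detector.predict(x)[0] == 0:
        return False, None
    return True, int(clusterer.predict(x)[0])


# Toy data for illustration only (1 = XSS, 0 = benign); not the Fawaz2015 corpus.
TRAIN_PAYLOADS = [
    "<script>alert(1)</script>",
    "<img src=x onerror=alert(1)>",
    "javascript:eval(document.cookie)",
    "%3Cscript%3Edocument.write(1)%3C/script%3E",
    "<svg onload=alert(document.cookie)>",
    "<a href=javascript:alert(1)>x</a>",
    "q=hello+world",
    "id=42&page=2",
    "name=alice",
    "search=cheap%20flights",
    "lang=en",
    "sort=price&order=asc",
]
TRAIN_LABELS = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]

detector, clusterer = fit_hybrid(TRAIN_PAYLOADS, TRAIN_LABELS)
print(classify("<script>alert(1)</script>", detector, clusterer))
print(classify("lang=en", detector, clusterer))
```

The cluster identifiers returned by this sketch are arbitrary integers. Relating them to reflected, stored, or DOM-based behavior would require inspecting the samples in each cluster, and a realistic setup would likely scale the features before clustering so that payload length does not dominate the distance computation.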