UNCOVERING ADVERSARIAL ATTACKS ON AI: HOW HACKERS FOOL AND MANIPULATE MACHINE LEARNING MODELS

Waqas Ali; Sohail Ahmad; Nisar Ahmed Memon

Keywords:

Adversarial Attacks, Machine Learning Robustness, Deep Neural Networks, Adversarial Perturbations, Model Vulnerability, Adversarial Defense Mechanisms, AI Security

Abstract

Artificial intelligence and machine learning systems have become integral to critical domains including healthcare, finance, cybersecurity, and autonomous systems. However, these models are fundamentally vulnerable to adversarial attacks: carefully engineered perturbations that deceive models into producing incorrect outputs with high confidence. This study investigated the mechanisms, typologies, and consequences of adversarial attacks on machine learning models, with particular attention to how malicious actors exploit model vulnerabilities. The researchers employed a mixed-methods research design, integrating a systematic literature review with controlled experiments on benchmark datasets including MNIST, CIFAR-10, and ImageNet. Adversarial techniques such as the Fast Gradient Sign Method (FGSM), Carlini & Wagner (C&W) attacks, and Projected Gradient Descent (PGD) were applied under both white-box and black-box attack scenarios. The findings revealed significant degradation in model accuracy following adversarial perturbation, with some models experiencing accuracy drops exceeding 60%. The study identified key attack patterns, defense mechanisms, and gaps in current robustness frameworks. The results underscored the urgent need for robust, adversarially hardened AI systems and for informed policy interventions to safeguard machine learning applications in high-stakes environments. This research contributed practical insights and a structured evaluation framework for improving AI security.
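To make the idea of an engineered perturbation concrete, the following is a minimal sketch of a one-step FGSM attack. It is not the experimental setup of the study: it assumes a toy logistic-regression classifier in place of a deep network, and the weights, input, label, and epsilon value are hypothetical values chosen only for illustration. FGSM moves each input feature by epsilon in the direction of the sign of the loss gradient with respect to the input.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fgsm_perturb(x, y, w, b, epsilon):
    """One-step FGSM for logistic regression: x_adv = x + epsilon * sign(dL/dx)."""
    p = sigmoid(np.dot(w, x) + b)   # predicted probability of class 1
    grad_x = (p - y) * w            # gradient of binary cross-entropy w.r.t. the input
    return x + epsilon * np.sign(grad_x)

# Hypothetical toy model and input (assumptions, not taken from the study)
w = np.array([2.0, -1.0, 0.5])
b = 0.0
x = np.array([0.3, 0.1, 0.2])
y = 1.0  # true label

x_adv = fgsm_perturb(x, y, w, b, epsilon=0.3)

p_clean = sigmoid(np.dot(w, x) + b)    # above 0.5: clean input classified as class 1
p_adv = sigmoid(np.dot(w, x_adv) + b)  # below 0.5: perturbed input classified as class 0
print(p_clean, p_adv)
```

In this toy case every feature changes by exactly epsilon, yet the predicted class flips. PGD can be viewed as repeating this step several times with a smaller step size while projecting back into the epsilon-ball around the original input.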