A FEDERATED LEARNING APPROACH FOR BIG DATA ANALYTICS IN ENHANCED INTRUSION DETECTION

Mobashirah Nasir; Muhammad Tahir Mehmood; Waheed Raza; Hifza Rani; Uzma Iqbal; Muhammad Yousif

Abstract

As attacks on interconnected networks grow more complex and more numerous, traditional intrusion detection systems (IDS) struggle to adapt because they rely on centralized architectures, which raise serious privacy, scalability and adaptability problems. The project Multiple Data Analysis Using Federated Learning with ML Algorithms in Intrusion Detection addresses these challenges with a novel approach to building an intelligent, privacy-preserving and scalable IDS. The goal of the research is to develop and evaluate a decentralized, federated learning (FL)-based IDS that reaches at least 95% detection accuracy across heterogeneous datasets while preserving data privacy and maintaining high performance in real-world distributed environments. To this end, the work builds an FL-based prototype using convolutional neural network (CNN) models, compares its performance with that of traditional centralized ML models, and evaluates its scalability on the TII-SSRC-23, NSL-KDD and UNSW-NB15 datasets by running multiple federated clients. The research is motivated by the need for intelligent cybersecurity solutions that preserve both privacy and data utility in an era when legislation such as the GDPR restricts data sharing. Unlike a traditional centralized system, in which raw data must first be aggregated in one place before the model is trained, the proposed FL system trains local models on the distributed nodes themselves. This reduces the risk of data leakage and makes the system considerably more flexible under varying network conditions. In summary, the study indicates that federated learning is a powerful approach to big data analytics for privacy-preserving intrusion detection.
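The train-locally-then-aggregate pattern described above can be illustrated with a minimal sketch of federated averaging (FedAvg). The abstract does not name its aggregation rule, so FedAvg is an assumption here, and a small logistic-regression model stands in for the CNN to keep the example dependency-free. The helper names (`local_train`, `fed_avg`, `make_client`) and the synthetic two-feature "traffic" data are hypothetical; the point is that only model weights, never raw records, leave each client.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict(weights, x):
    # weights[-1] is the bias term; returns P(attack | x).
    z = sum(w * xi for w, xi in zip(weights, x)) + weights[-1]
    return sigmoid(z)

def local_train(weights, data, lr=0.1, epochs=5):
    """Run a few epochs of SGD on one client's private data; return new weights."""
    w = list(weights)
    for _ in range(epochs):
        for x, y in data:
            err = predict(w, x) - y  # gradient of the log-loss w.r.t. z
            for j, xj in enumerate(x):
                w[j] -= lr * err * xj
            w[-1] -= lr * err
    return w

def fed_avg(client_weights, client_sizes):
    """FedAvg: average client models weighted by local sample count."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[j] * n for w, n in zip(client_weights, client_sizes)) / total
        for j in range(dim)
    ]

def federated_training(clients, dim, rounds=10):
    """Each round: broadcast global model, train locally, aggregate weights only."""
    global_w = [0.0] * (dim + 1)
    for _ in range(rounds):
        updates = [local_train(global_w, data) for data in clients]
        global_w = fed_avg(updates, [len(d) for d in clients])
    return global_w

def make_client(n, seed):
    """Hypothetical synthetic stand-in for one organisation's traffic records."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        y = rng.randint(0, 1)
        # Attack records (y=1) are shifted toward larger feature values.
        x = [rng.gauss(1.5 * y, 1.0) for _ in range(2)]
        data.append((x, y))
    return data

if __name__ == "__main__":
    clients = [make_client(200, seed) for seed in range(3)]
    model = federated_training(clients, dim=2)
    held_out = make_client(500, seed=99)
    acc = sum((predict(model, x) >= 0.5) == bool(y) for x, y in held_out) / len(held_out)
    print(f"accuracy on held-out synthetic data: {acc:.3f}")
```

Weighting each client's model by its number of local samples lets larger sites influence the global model proportionally; replacing the logistic regression with a CNN would change `local_train` but not the aggregation loop.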
Centralized and federated CNN models will then be constructed and trained, and privacy-preserving techniques such as Differential Privacy and Secure Aggregation will be added so that no client's data is revealed during the communication rounds. The models will be evaluated using accuracy, precision, recall, F1-score and ROC-AUC, and scalability will be tested by adding more federated clients in a realistic distributed scenario. The research will also visualize the performance results with confusion matrices and ROC curves to make them easy to understand and transparent. The proposed system is expected to outperform traditional IDS in detection accuracy, generalization across datasets and privacy protection, without sacrificing efficiency. The expected outcomes are a fully functional prototype of an FL-based IDS, a report comparing the performance of the federated and centralized approaches, and documented data and code repositories for future research and reproduction. Ultimately, this research aims to create a scalable, privacy-preserving framework that can detect and mitigate network intrusions across multiple organizations and data sources, to the benefit of the cybersecurity community. The results will help companies, researchers and policymakers find new ways to strengthen network defense mechanisms in an increasingly data-driven digital world.
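The two privacy techniques named above can be combined in a simplified sketch. Each client clips its model update to a maximum L2 norm and adds a share of Gaussian noise, in the style of differentially private federated averaging. It then hides the result behind pairwise masks that cancel only when the server sums every client's vector. This is an illustrative assumption, not the authors' implementation. The clip norm, noise multiplier, fixed-point scale and function names are hypothetical. In a real protocol the pairwise seeds would come from a key-agreement step. Client dropout is not handled, and no (ε, δ) privacy accounting is performed.

```python
import math
import random

SCALE = 2 ** 16  # hypothetical fixed-point precision
MOD = 2 ** 32    # masks and sums live in integers mod 2^32 so they cancel exactly

def clip_l2(update, clip_norm):
    """Scale the update down so its L2 norm is at most clip_norm."""
    norm = math.sqrt(sum(v * v for v in update))
    factor = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    return [v * factor for v in update]

def add_noise_share(update, clip_norm, noise_multiplier, n_clients, rng):
    """Each client adds N(0, (z*C)^2 / n); the n independent shares sum to N(0, (z*C)^2)."""
    std = noise_multiplier * clip_norm / math.sqrt(n_clients)
    return [v + rng.gauss(0.0, std) for v in update]

def encode(update):
    # Float -> fixed-point integer mod 2^32 (negative values wrap around).
    return [round(v * SCALE) % MOD for v in update]

def decode(values):
    # Map the upper half of the ring back to negative numbers, then rescale.
    return [(v - MOD if v >= MOD // 2 else v) / SCALE for v in values]

def pair_mask(seed, dim):
    """Pseudo-random mask that both members of a pair regenerate from a shared seed."""
    rng = random.Random(seed)
    return [rng.randrange(MOD) for _ in range(dim)]

def mask_update(client_id, encoded, pair_seeds):
    """For each pair (i, j) with i < j: client i adds the mask, client j subtracts it."""
    masked = list(encoded)
    for (i, j), seed in pair_seeds.items():
        if client_id not in (i, j):
            continue
        sign = 1 if client_id == i else -1
        for k, m in enumerate(pair_mask(seed, len(encoded))):
            masked[k] = (masked[k] + sign * m) % MOD
    return masked

def secure_sum(masked_updates):
    """Server view: only masked vectors; the pairwise masks cancel in the sum."""
    dim = len(masked_updates[0])
    return [sum(u[k] for u in masked_updates) % MOD for k in range(dim)]

if __name__ == "__main__":
    updates = [[0.5, -1.2, 3.0], [0.1, 0.4, -0.3], [-0.7, 0.2, 0.9]]
    n = len(updates)
    # Hypothetical: real secure aggregation derives these seeds via key agreement.
    seed_rng = random.Random(0)
    pair_seeds = {(i, j): seed_rng.randrange(2 ** 63)
                  for i in range(n) for j in range(i + 1, n)}
    noise_rng = random.Random(1)
    masked = []
    for cid, u in enumerate(updates):
        private = add_noise_share(clip_l2(u, 1.0), 1.0, 0.5, n, noise_rng)
        masked.append(mask_update(cid, encode(private), pair_seeds))
    print("noisy aggregate seen by server:", decode(secure_sum(masked)))
```

The masks are added in integers modulo 2^32 rather than in floating point so that they cancel exactly. The server therefore learns only the (noisy) sum of the updates, never an individual client's contribution.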