MACHINE LEARNING-BASED NETWORK TRAFFIC ANALYSIS FOR IDENTIFYING CYBER ATTACKS USING FLOW-LEVEL FEATURES
Abstract
Background: With the growing volume and heterogeneity of traffic generated by modern digital services, cyber attacks increasingly affect enterprise, academic, cloud, and government networks. Traditional signature-based intrusion detection systems can detect known attacks, but they are less effective when attacks change or new malicious behaviours emerge. Purpose: This research article presents a machine learning-based network traffic analysis framework for identifying cyber attacks using flow-level features. The study focuses on binary classification, in which each network flow is classified as benign or malicious. Procedure: The proposed framework uses the CICIDS2017 intrusion detection dataset, which provides labelled benign and attack traffic as both packet captures and flow-based CSV files. The methodology comprises data cleaning, label encoding, feature selection, train-test splitting, supervised model training, and performance evaluation. Logistic Regression and Decision Tree are selected as baseline classifiers, and Random Forest and XGBoost as ensemble-based classifiers. Evaluation: The models are evaluated using accuracy, precision, recall, F1-score, false positive rate, and the confusion matrix. These measures are chosen because accuracy alone can be misleading on imbalanced intrusion detection datasets. Contribution: The article contributes a structured research design, a mathematical formulation, and an experimental procedure that can be implemented directly in Python to identify cyber attacks. It also discusses practical concerns, including class imbalance, false alarms, dataset bias, and the gap between benchmark performance and real-world network deployment.
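As a minimal sketch of the binary label encoding and evaluation steps summarised above, the Python below maps string flow labels to 0 (benign) and 1 (malicious) and derives accuracy, precision, recall, F1-score, and false positive rate from confusion-matrix counts, treating the attack class as positive. The function names `encode_binary` and `binary_metrics`, the `"BENIGN"` label string, and the toy data are illustrative assumptions rather than the article's implementation; a library such as scikit-learn would typically be used in a full experiment.

```python
from typing import Dict, List, Sequence


def encode_binary(labels: Sequence[str], benign: str = "BENIGN") -> List[int]:
    """Map each flow label to 0 (benign) or 1 (any attack class).

    Assumption: the benign class is identified by a single label string;
    surrounding whitespace and letter case are ignored.
    """
    target = benign.strip().upper()
    return [0 if label.strip().upper() == target else 1 for label in labels]


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Confusion-matrix counts and derived metrics, with 1 (attack) as the positive class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # attacks correctly flagged
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # benign flows correctly passed
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # missed attacks

    def safe_div(num: float, den: float) -> float:
        # Return 0.0 instead of raising when a denominator is empty.
        return num / den if den else 0.0

    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)
    return {
        "tp": tp,
        "tn": tn,
        "fp": fp,
        "fn": fn,
        "accuracy": safe_div(tp + tn, len(pairs)),
        "precision": precision,
        "recall": recall,
        "f1": safe_div(2 * precision * recall, precision + recall),
        "fpr": safe_div(fp, fp + tn),
    }


if __name__ == "__main__":
    # Hypothetical, imbalanced toy labels (not results from CICIDS2017):
    # 95 benign flows and 5 attack flows.
    raw_labels = ["BENIGN"] * 95 + ["DDoS"] * 5
    y_true = encode_binary(raw_labels)
    # A trivial "classifier" that predicts every flow as benign.
    y_pred = [0] * len(y_true)
    print(binary_metrics(y_true, y_pred))
```

On this hypothetical 95:5 toy data, the all-benign predictor reaches 0.95 accuracy yet 0.0 recall, which illustrates why accuracy alone can mislead on imbalanced intrusion detection data and why recall and false positive rate are reported alongside it.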
