A HYBRID EFFICIENTNET-B4 AND SWIN TRANSFORMER V2 FRAMEWORK FOR LARGE-SCALE MALWARE FAMILY RECOGNITION

Authors

  • Asif Khan
  • *Saddam Hussain Khan
  • Muhammad Saad Salman
  • Abdur Rahman
  • Mian Saeed Akbar

Abstract

Large-scale malware classification is one of the most significant problems in modern cybersecurity research. Converting malware executables into grayscale PNG images for byte-level visualization recasts the task as a computer vision problem that deep learning networks can solve. In this study, we introduce the EfficientNetB4-SwinV2 Ensemble, which combines two parallel neural network backbones, EfficientNet-B4 (which favours local texture patterns) and Swin Transformer V2-Base (which favours global relation modeling), through soft voting over their predicted class probabilities. We trained both backbones on a rigorously deduplicated and harmonized dataset of 32,601 malware images spanning 59 malware families, assembled from the Malimg, Microsoft BIG 2015, and MaleVis datasets and resized to 256 × 256 pixels. Severe class imbalance is addressed with inverse-frequency loss weighting and a WeightedRandomSampler, whose effectiveness is validated both separately and in combination. Across three independent training runs, EfficientNet-B4 achieves 98.40 ± 0.12% accuracy and a macro AUC of 0.9986, Swin Transformer V2-Base achieves 98.55 ± 0.09% accuracy and a macro AUC of 0.9993, and the soft-voting ensemble performs best, with 98.67 ± 0.07% accuracy, a macro AUC of 0.9996, and the lowest Expected Calibration Error (ECE = 0.0094). The ablation study shows that, in this experimental setting, soft voting significantly outperforms both hard voting and learned stacking. Confusion pattern analysis reveals three confusion clusters, each grouping visually similar malware families. Limitations, deployment considerations, and future work are discussed in Section 6.

Keywords: Malware classification; Malware visualization; Deep learning; EfficientNet-B4; Swin Transformer V2; Ensemble learning; Soft voting; Calibration; Cybersecurity
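The abstract names three mechanisms without spelling them out: inverse-frequency class weights for imbalance, soft voting over class probabilities, and Expected Calibration Error. The dependency-free sketch below illustrates one common form of each. It is not the authors' implementation; the function names are hypothetical, and the weight formula w_c = N / (K · n_c), the equal 0.5/0.5 ensemble weighting, and the 15 equal-width confidence bins are assumptions. The per-sample weights are the kind of input a class-balancing sampler such as PyTorch's `torch.utils.data.WeightedRandomSampler` consumes.

```python
"""Illustrative sketch (assumed details, not the paper's code): inverse-frequency
class weights, two-model soft voting, and Expected Calibration Error (ECE)."""
from collections import Counter
from typing import List, Sequence

Probs = List[List[float]]  # one class-probability vector per sample


def inverse_frequency_weights(labels: Sequence[int], num_classes: int) -> List[float]:
    """Assumed formula w_c = N / (K * n_c): rarer families get larger loss weights.
    Classes absent from `labels` get weight 0.0."""
    counts = Counter(labels)
    n = len(labels)
    return [n / (num_classes * counts[c]) if counts[c] else 0.0
            for c in range(num_classes)]


def sample_weights(labels: Sequence[int], class_weights: Sequence[float]) -> List[float]:
    """Per-sample weights (each sample inherits its class weight), so every
    class carries the same total sampling mass."""
    return [class_weights[y] for y in labels]


def soft_vote(prob_a: Probs, prob_b: Probs, w_a: float = 0.5) -> Probs:
    """Weighted average of two models' class-probability vectors, per sample.
    w_a = 0.5 (equal weighting) is an assumption."""
    return [[w_a * pa + (1.0 - w_a) * pb for pa, pb in zip(ra, rb)]
            for ra, rb in zip(prob_a, prob_b)]


def argmax(row: Sequence[float]) -> int:
    """Index of the highest-probability class."""
    return max(range(len(row)), key=row.__getitem__)


def expected_calibration_error(probs: Probs, labels: Sequence[int], n_bins: int = 15) -> float:
    """ECE = sum over bins of (|B| / N) * |accuracy(B) - mean confidence(B)|,
    using equal-width bins on the top-class confidence."""
    n = len(labels)
    bins: List[list] = [[] for _ in range(n_bins)]
    for row, y in zip(probs, labels):
        conf = max(row)
        idx = min(int(conf * n_bins), n_bins - 1)  # confidence 1.0 goes to the last bin
        bins[idx].append((conf, argmax(row) == y))
    ece = 0.0
    for b in bins:
        if b:
            mean_conf = sum(c for c, _ in b) / len(b)
            accuracy = sum(ok for _, ok in b) / len(b)
            ece += (len(b) / n) * abs(accuracy - mean_conf)
    return ece


if __name__ == "__main__":
    # Toy two-class example: model A misclassifies sample 0, model B does not.
    prob_a = [[0.6, 0.4], [0.9, 0.1]]
    prob_b = [[0.2, 0.8], [0.7, 0.3]]
    labels = [1, 0]
    fused = soft_vote(prob_a, prob_b)
    print("fused predictions:", [argmax(r) for r in fused])
    print("ECE of fused probabilities:", round(expected_calibration_error(fused, labels), 4))
    print("class weights:", inverse_frequency_weights([0, 0, 0, 1], 2))
```

Averaging probabilities (rather than counting each model's argmax, as hard voting does) lets a confident model outvote an uncertain one, which is one plausible reason soft voting could beat hard voting with only two members, where hard voting frequently ties.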

Published

2026-05-15

How to Cite

Asif Khan, Saddam Hussain Khan, Muhammad Saad Salman, Abdur Rahman, & Mian Saeed Akbar. (2026). A HYBRID EFFICIENTNET-B4 AND SWIN TRANSFORMER V2 FRAMEWORK FOR LARGE-SCALE MALWARE FAMILY RECOGNITION. Spectrum of Engineering Sciences, 4(5), 1282–1304. Retrieved from https://thesesjournal.com/index.php/1/article/view/2826