BAYESIAN NETWORK AND TREE-BASED CLASSIFIERS FOR CREDIT CARD FRAUD DETECTION: A COMPARATIVE STUDY ON RAW AND TRANSFORMED TRANSACTION DATA

Waleed Khan; Muqqadus Bibi; Sarang Ahmed; Muhammad Tahir

Keywords:

Credit Card Fraud Detection; Bayesian Network; Naive Bayes; TAN; Logistic Regression; J48; WEKA; PCA; Feature Engineering; Data Preprocessing

Abstract

Credit card fraud detection remains a critical concern for financial institutions, which face billions of dollars in annual losses from fraudulent transactions. Traditional rule-based detection methods struggle to keep pace with the constantly evolving tactics used by fraudsters, motivating the adoption of Machine Learning (ML) approaches that can automatically learn discriminative patterns from transaction data. This study evaluates five classification algorithms for detecting fraudulent credit card transactions: a K2 Bayesian network, Naive Bayes, Tree-Augmented Naive Bayes (TAN), Logistic Regression, and the J48 decision tree. All five were run in the WEKA data mining tool with 10-fold cross-validation. Two experiments were conducted: the first applied the classifiers directly to a raw dummy transaction dataset, while the second applied the same classifiers after data transformation and Principal Component Analysis (PCA)-based dimensionality reduction. Results show a substantial performance gain after preprocessing: classifier accuracy rose from a range of 41.8%–84.0% on the raw dataset to 95.8%–100% on the transformed dataset, while false positive rates fell sharply across all models. Logistic Regression and J48 achieved the strongest overall performance on the transformed dataset, each reaching 100% accuracy, precision, recall, and F-measure. These findings confirm that rigorous data preprocessing and dimensionality reduction are decisive factors in building reliable, low-false-alarm credit card fraud detection systems.
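For readers less familiar with the reported measures, the following is a minimal Python sketch of how accuracy, precision, recall, F-measure, and false positive rate can be derived from the confusion-matrix counts of a binary fraud classifier. It is illustrative only and is not the WEKA implementation used in the study. The names `ConfusionCounts` and `binary_metrics` are hypothetical, the metrics are computed for the fraud class alone (WEKA also reports class-weighted averages), and the counts in the usage example are invented for illustration rather than taken from the experiments.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts for a binary classifier, with fraud as the positive class."""
    tp: int  # fraudulent transactions correctly flagged
    fp: int  # legitimate transactions wrongly flagged (false alarms)
    tn: int  # legitimate transactions correctly passed
    fn: int  # fraudulent transactions missed


def _safe_div(numerator: float, denominator: float) -> float:
    """Return 0.0 instead of raising when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def binary_metrics(c: ConfusionCounts) -> dict:
    """Compute the standard evaluation measures for the fraud class."""
    total = c.tp + c.fp + c.tn + c.fn
    precision = _safe_div(c.tp, c.tp + c.fp)
    recall = _safe_div(c.tp, c.tp + c.fn)
    return {
        "accuracy": _safe_div(c.tp + c.tn, total),
        "precision": precision,
        "recall": recall,
        # F-measure: harmonic mean of precision and recall
        "f_measure": _safe_div(2 * precision * recall, precision + recall),
        # False positive rate: share of legitimate transactions flagged as fraud
        "fp_rate": _safe_div(c.fp, c.fp + c.tn),
    }


if __name__ == "__main__":
    # Hypothetical counts, for illustration only (not results from the study).
    example = ConfusionCounts(tp=8, fp=2, tn=88, fn=2)
    for name, value in binary_metrics(example).items():
        print(f"{name}: {value:.3f}")
```

Under 10-fold cross-validation, counts of this kind are typically accumulated over the ten held-out folds before the measures are computed, so each transaction contributes to the confusion matrix exactly once.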