INTERPRETABLE CORONARY HEART DISEASE PREDICTION USING RANDOM FOREST, XGBOOST, AND SHAP-BASED EXPLAINABILITY

Authors

  • Shafiq Hussain
  • Muhammad Usman Ahmad
  • Muhammad Arman
  • Farhan Majeed
  • Tauqir Ahmad
  • Tahreem Fatima
  • Aleena Jamil
  • Adeen Amjad
  • Waqar Ahmad
  • Arslan Ali Mansab
  • Muhammad Hamza
  • Muhammad Waqas

Keywords

Coronary heart disease, machine learning, Random Forest, XGBoost, ensemble feature selection, explainable AI, SHAP, clinical prediction

Abstract

Heart disease remains one of the leading causes of death worldwide, which creates a growing need for prediction models that are both accurate and interpretable and that can help identify at-risk patients for early diagnosis. This paper compares two ensemble-based machine learning models, Random Forest and XGBoost, trained on all 13 clinical features of the UCI Heart Disease dataset. To assess feature importance, each feature was scored with a Chi-square test, an ANOVA F-test, and Mutual Information; however, no feature reduction was performed, so that every clinically significant feature was retained. Both models were trained on a stratified 75/25 train-test split, with XGBoost using standardized inputs. The Random Forest classifier achieved an accuracy of 78.95%, a recall of 83.33%, and an area under the curve (AUC) of 0.8679, whereas XGBoost achieved an accuracy of 80.26%, a recall of 88.10%, and an AUC of 0.8771. SHAP-based explainability identified chest pain type, maximum heart rate, ST depression (oldpeak), exercise-induced angina, and thalassemia-related (thal) features as the strongest influences on predictions. These results suggest that ensemble tree-based models combined with explainability techniques can provide reliable and clinically interpretable tools for assessing a patient’s risk of heart disease.
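The evaluation protocol named in the abstract (a stratified 75/25 split scored by accuracy, recall, and AUC) can be illustrated with a minimal standard-library sketch. This is not the authors' code: the function names `stratified_split`, `accuracy_and_recall`, and `roc_auc` are hypothetical, the labels are synthetic, and no model is trained. In practice the same steps would more likely be done with scikit-learn's `train_test_split(..., stratify=y)` and `roc_auc_score`.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.25, seed=0):
    """Return (train_idx, test_idx) with class proportions preserved in each part."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    train_idx, test_idx = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Take the same fraction of every class for the test set.
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


def accuracy_and_recall(y_true, y_pred, positive=1):
    """Accuracy = correct / total; recall = TP / (TP + FN) for the positive class."""
    pairs = list(zip(y_true, y_pred))
    correct = sum(1 for t, p in pairs if t == p)
    true_pos = sum(1 for t, p in pairs if t == positive and p == positive)
    false_neg = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = correct / len(pairs)
    recall = true_pos / (true_pos + false_neg) if (true_pos + false_neg) else 0.0
    return accuracy, recall


def roc_auc(y_true, scores, positive=1):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Synthetic labels (assumption, not the UCI data): 120 positives, 80 negatives.
labels = [1] * 120 + [0] * 80
train_idx, test_idx = stratified_split(labels)
test_labels = [labels[i] for i in test_idx]

# Toy predictions and scores, chosen only to exercise the metric functions.
acc, rec = accuracy_and_recall([1, 1, 1, 0, 0], [1, 0, 1, 0, 1])
auc = roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
```

Stratifying keeps the positive-class share the same in the training and test sets, which matters for recall on a small clinical dataset: an unstratified split can leave the test set with too few positive cases for the recall estimate to be stable.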

Published

2025-01-28

How to Cite

Shafiq Hussain, Muhammad Usman Ahmad, Muhammad Arman, Farhan Majeed, Tauqir Ahmad, Tahreem Fatima, Aleena Jamil, Adeen Amjad, Waqar Ahmad, Arslan Ali Mansab, Muhammad Hamza, & Muhammad Waqas. (2025). INTERPRETABLE CORONARY HEART DISEASE PREDICTION USING RANDOM FOREST, XGBOOST, AND SHAP-BASED EXPLAINABILITY. Spectrum of Engineering Sciences, 3(1), 573–584. Retrieved from https://thesesjournal.com/index.php/1/article/view/1717