MACHINE LEARNING FOR DIABETES PREDICTION USING RANDOM FOREST: A COMPREHENSIVE ANALYSIS WITH CLASS BALANCING TECHNIQUES
Abstract
Diabetes is a prevalent global health problem that affects millions of people around the world and places a significant burden on health systems. Early detection and prediction of diabetes risk can significantly improve patient outcomes through timely intervention and management strategies. This article presents a comprehensive analysis of diabetes prediction using machine learning techniques, with a special focus on the Random Forest algorithm and class balancing methods. We implement a robust preprocessing pipeline that handles missing values, encodes categorical variables, and removes irrelevant identifiers from the dataset. To address class imbalance, we apply two synthetic resampling techniques: the Synthetic Minority Oversampling Technique (SMOTE) and SMOTE combined with Edited Nearest Neighbours (SMOTE-ENN). Our Random Forest classifier achieves notable performance, with the original model reaching 99.00% accuracy, precision, recall, and F1-score. Comparative analysis shows that although SMOTE (98.50% accuracy) and SMOTE-ENN (95.00% accuracy) reduce the overall performance metrics, they provide a more balanced representation of the classes. Feature importance analysis shows that HbA1c, BMI, and age are the most important predictors of diabetes in all models. The learning curves and confusion matrices demonstrate the strong classification capabilities of the models and clarify the trade-off between class balance and overall performance. This research contributes to the growing body of knowledge on machine learning applications in healthcare by providing a detailed methodology for diabetes prediction that can also be applied in clinical decision support systems to improve early diagnosis and patient management strategies.
Index Terms: class imbalance, classification, diabetes prediction, feature importance, machine learning, medical diagnosis, Random Forest, SMOTE, SMOTE-ENN
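
The SMOTE step named in the abstract can be illustrated with a minimal Python sketch that uses only the standard library. It shows the core interpolation idea (a synthetic minority sample is placed on the line segment between a minority sample and one of its nearest minority neighbours) and is not the authors' implementation. The function name `smote_oversample`, its parameters, and the toy (HbA1c, BMI) values are assumptions introduced here for illustration only.

```python
import math
import random


def smote_oversample(minority, n_new, k=5, seed=0):
    """Create n_new synthetic samples by interpolating between minority neighbours.

    Simplified sketch: the base sample is drawn at random for every synthetic
    point, and distances are Euclidean on the raw (unscaled) features.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other samples
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority neighbours of the base sample, excluding itself
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy data: hypothetical (HbA1c, BMI) pairs, NOT taken from the study's dataset.
majority = [[5.0, 22.0], [5.2, 24.5], [5.4, 23.1], [5.1, 26.0], [5.6, 25.2], [5.3, 21.8]]
minority = [[7.1, 31.0], [8.0, 33.5], [7.6, 29.8]]

# Oversample the minority class until both classes have the same size.
new_points = smote_oversample(minority, n_new=len(majority) - len(minority), k=2)
balanced_minority = minority + new_points
print(len(majority), len(balanced_minority))
```

SMOTE-ENN would follow this oversampling with a cleaning pass that removes samples whose class disagrees with the majority of their nearest neighbours; that step is omitted here for brevity. In practice, a maintained library implementation would normally be preferred over a hand-written version such as this one.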













