A HYBRID STATISTICAL–MACHINE LEARNING MODEL FOR EARLY PREDICTION OF ALZHEIMER’S DISEASE USING CLINICAL BIOMARKERS
Keywords:
Alzheimer’s disease; Hybrid machine learning; Cerebrospinal fluid biomarkers; Structural MRI; Multimodal prediction; Neurodegeneration.
Abstract
Alzheimer’s disease (AD) is a biologically driven neurodegenerative disorder characterized by amyloid-β accumulation, tau-mediated neuronal injury, and progressive structural brain atrophy, all of which precede clinical dementia by years. Developing accurate and interpretable predictive models that integrate these pathological domains remains a central challenge in translational neuroscience. This study proposes a hybrid statistical–machine learning framework for early AD prediction using multimodal baseline data from 1,200 participants classified as cognitively normal, as having mild cognitive impairment, or as having AD. Predictors included demographic variables, APOE4 status, cognitive scales (MMSE, CDR-SB), cerebrospinal fluid biomarkers (Aβ42, total tau), plasma neurofilament light chain, and MRI-derived hippocampal and ventricular volumes. An L2-regularized logistic regression model quantified adjusted effects, while a stacked ensemble combining logistic regression, random forest, and support vector machine learners captured nonlinear interactions. Model performance was evaluated using stratified hold-out testing and cross-validation, with macro-averaged metrics and ROC analysis. Biomarkers showed coherent stage-wise gradients consistent with established AD pathophysiology, and the hybrid model achieved strong internal discrimination. Feature importance analysis identified cognitive measures as the dominant predictors, with fluid and imaging markers providing incremental value. These findings demonstrate the analytical feasibility of integrating interpretable statistics with ensemble learning, and they underline the need for external validation to establish clinical generalizability and early-stage predictive utility.
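To illustrate what macro-averaging means for a three-class problem of this kind, the following is a minimal sketch in plain Python. It is not the study's evaluation code: the function name `macro_metrics`, the label strings, and the toy label vectors are all assumptions made for illustration. The sketch computes precision, recall, and F1 per class and then takes their unweighted mean, so that each diagnostic group counts equally regardless of its size.

```python
from collections import Counter

# Hypothetical class labels: cognitively normal, mild cognitive impairment, AD.
LABELS = ("CN", "MCI", "AD")


def macro_metrics(y_true, y_pred, labels=LABELS):
    """Return macro-averaged precision, recall, and F1 over the given labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    # Per-class true positives, false positives, and false negatives.
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative

    precisions, recalls, f1s = [], [], []
    for label in labels:
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)

    # Macro average: unweighted mean across classes.
    n = len(labels)
    return {
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Toy, made-up labels (not study data), deliberately imbalanced.
y_true = ["CN", "CN", "CN", "CN", "MCI", "MCI", "AD", "AD"]
y_pred = ["CN", "CN", "CN", "MCI", "MCI", "AD", "AD", "AD"]

scores = macro_metrics(y_true, y_pred)
print({name: round(value, 3) for name, value in scores.items()})
```

Because every class contributes equally to the mean, weak performance on a smaller group such as MCI lowers the macro score even when the largest group is classified well, which is one reason macro-averaged metrics suit imbalanced diagnostic cohorts.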













