MINT-NET: TOWARD INTELLIGIBLE NEURO-AI FOR MULTIMODAL BRAIN DISORDER CLASSIFICATION VIA CROSS-MODAL ATTENTION AND NEURAL REPRESENTATION LEARNING
Keywords:
Alzheimer's disease, multimodal deep learning, cross-modal attention, interpretable AI, neuroimaging, representation learning, ADNI
Abstract
Accurately distinguishing Alzheimer’s Disease (AD) from normal cognitive aging remains a major challenge in neuroimaging research. Although deep learning models have achieved remarkable classification performance, their black-box nature continues to limit clinical trust and scientific interpretability. In high-stakes medical settings, predictive systems must not only provide accurate diagnoses but also offer meaningful insight into the reasoning behind their decisions. To address this challenge, we propose MINT-Net (Multimodal Intelligible Neural Network), a framework designed to jointly optimize predictive accuracy and neurobiological interpretability. MINT-Net integrates structural MRI, FDG-PET, and clinical information through a cross-modal multi-head attention mechanism that models complex interactions across modalities rather than relying on simple feature concatenation. In addition, the framework learns a shared latent representation space using supervised contrastive learning and center loss, encouraging representations that are both discriminative and clinically meaningful [10], [11]. To improve interpretability, we introduce a hierarchical intelligibility module that combines gradient-based attention visualization with SHAP-based clinical feature attribution [12], [13]. This allows interpretation at multiple levels, ranging from individual predictions to population-level biomarker analysis. We evaluated MINT-Net on 515 subjects from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) dataset using 5-fold cross-validation [31]. The proposed framework achieved 96.12% classification accuracy, 96.92% sensitivity, and an AUC-ROC of 0.984, outperforming six state-of-the-art baseline methods. Ablation analysis further demonstrated that cross-modal attention improved performance by 2.67% compared to direct feature concatenation, while supervised contrastive learning improved latent space separability by approximately 35%.
Attention visualizations consistently highlighted the hippocampus, posterior cingulate cortex, and temporoparietal junction, regions strongly associated with AD pathology [1], [2]. Overall, MINT-Net provides not only accurate diagnostic predictions but also interpretable neurobiological insight into disease-related representations. The framework represents a step toward bridging predictive Neuro-AI with a scientifically grounded understanding of brain disorders.
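To make the representation-learning objective named in the abstract more concrete, the following is a minimal sketch of a supervised contrastive loss and a center loss on a toy batch of two-dimensional embeddings. It is an illustration under stated assumptions, not the MINT-Net implementation: the function names, the temperature of 0.1, the use of batch class means as centers (center loss is usually formulated with learnable centers), and the toy embeddings are all hypothetical choices made here.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def supervised_contrastive_loss(embeddings, labels, temperature=0.1):
    """Supervised contrastive loss, averaged over anchors with >= 1 positive.

    For each anchor, same-label samples are positives; the softmax
    denominator runs over every other sample in the batch.
    The temperature value is an assumption, not taken from the paper.
    """
    z = [l2_normalize(e) for e in embeddings]
    n = len(z)
    total, counted = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue
        logits = {j: dot(z[i], z[j]) / temperature for j in range(n) if j != i}
        # Log-sum-exp with max subtraction for numerical stability.
        m = max(logits.values())
        log_denom = m + math.log(sum(math.exp(v - m) for v in logits.values()))
        total += -sum(logits[p] - log_denom for p in positives) / len(positives)
        counted += 1
    return total / counted if counted else 0.0


def center_loss(embeddings, labels):
    """Half the mean squared distance of each embedding to its class center.

    Simplification (assumption): centers are the batch class means rather
    than learnable parameters updated during training.
    """
    dims = len(embeddings[0])
    sums, counts = {}, {}
    for e, y in zip(embeddings, labels):
        acc = sums.setdefault(y, [0.0] * dims)
        for d in range(dims):
            acc[d] += e[d]
        counts[y] = counts.get(y, 0) + 1
    centers = {y: [s / counts[y] for s in sums[y]] for y in sums}
    sq = sum(
        sum((a - b) ** 2 for a, b in zip(e, centers[y]))
        for e, y in zip(embeddings, labels)
    )
    return 0.5 * sq / len(embeddings)


# Toy batch: label 0 = cognitively normal, label 1 = AD (hypothetical data).
labels = [0, 0, 1, 1]
separated = [[1.0, 0.1], [0.9, 0.0], [0.0, 1.0], [0.1, 0.9]]  # classes cluster
mixed = [[1.0, 0.1], [0.0, 1.0], [0.9, 0.0], [0.1, 0.9]]      # classes interleave

for name, batch in (("separated", separated), ("mixed", mixed)):
    print(name, supervised_contrastive_loss(batch, labels), center_loss(batch, labels))
```

Both losses come out lower for the batch whose classes form tight, well-separated clusters, which is the property the abstract attributes to the shared latent space. How the two terms are weighted against the classification loss is not stated in the abstract and is left out of this sketch.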












