A MULTIMODAL EXPLAINABLE AI FRAMEWORK FOR REAL-TIME DEPRESSION AND ANXIETY DETECTION
Keywords:
Depression detection, Anxiety analysis, Multimodal AI, Explainable AI, Digital phenotyping, Eye-tracking, Facial expression analysis, Voice biomarkers, Mental health monitoring
Abstract
Depression and anxiety are among the leading causes of disability worldwide, yet current diagnostic methods rely mainly on self-reports and clinical interviews, which are subjective and have limited ecological validity. Advances in artificial intelligence (AI) and digital phenotyping offer new opportunities to improve detection by drawing on multimodal behavioral and physiological signals. This paper presents the design of an explainable multimodal AI system that integrates three primary modalities: eye-tracking, facial expressions, and voice biomarkers. GPS mobility, typing dynamics, and gesture analysis are planned as future extensions. The proposed system activates its three models at regular 2-hour intervals, with each model recording data for 5 minutes, so that monitoring remains unobtrusive, energy-efficient, and ecologically valid. The framework is designed to be lightweight, interpretable, and optimized for mobile deployment. Expected outcomes include improved accuracy in the early detection of depression and anxiety, interpretable outputs that support clinician trust, and a scalable framework that can accommodate additional modalities.
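To make the sampling scheme concrete, the following is a minimal sketch of the recording schedule implied by the 2-hour interval and 5-minute window stated above. It is illustrative only and not the system's implementation: the names `daily_schedule`, `duty_cycle`, and `MODALITIES` are hypothetical, and the sketch assumes that sessions start on fixed 2-hour boundaries and that the three modalities record concurrently within each window.

```python
from datetime import datetime, timedelta

# Values taken from the abstract.
INTERVAL = timedelta(hours=2)    # models are activated every 2 hours
WINDOW = timedelta(minutes=5)    # each activation records for 5 minutes

# Hypothetical identifiers for the three primary modalities.
MODALITIES = ("eye_tracking", "facial_expression", "voice")


def daily_schedule(day_start):
    """Return the (start, end) recording windows for one 24-hour day.

    Assumption: the first session begins at day_start and later
    sessions follow at fixed INTERVAL steps.
    """
    sessions = []
    t = day_start
    day_end = day_start + timedelta(days=1)
    while t < day_end:
        sessions.append((t, t + WINDOW))
        t += INTERVAL
    return sessions


def duty_cycle():
    """Fraction of time each sensor is active under this schedule."""
    return WINDOW / INTERVAL


if __name__ == "__main__":
    sessions = daily_schedule(datetime(2024, 1, 1))
    total = sum((end - start for start, end in sessions), timedelta())
    print(f"sessions per day: {len(sessions)}")
    print(f"recording time per modality per day: {total}")
    print(f"duty cycle: {duty_cycle():.2%}")
```

Under these assumptions the schedule yields 12 sessions per day, 60 minutes of recording per modality per day, and a duty cycle of about 4.2%, which is the basis for the energy-efficiency argument.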












