SCAFFOLD-AWARE MACHINE LEARNING–DOCKING PIPELINE FOR TYK2 INHIBITOR DISCOVERY WITH CALIBRATED PRIORITIZATION OF 32 ACTIVES INCLUDING DEUCRAVACITINIB
Keywords:
TYK2 inhibitors, Deucravacitinib, Scaffold-aware machine learning, Molecular docking, Bioactivity prediction, Cheminformatics
Abstract
Tyrosine kinase 2 (TYK2) is a well-established drug target in immunology, and Deucravacitinib is an FDA-approved TYK2 inhibitor. However, noisy bioactivity data, scaffold bias, and high experimental cost remain major obstacles to finding novel TYK2 modulators. Herein, we propose a scaffold-aware machine learning framework that integrates robust data curation, fingerprint-based feature engineering, and calibrated classification models with downstream molecular docking validation. Standardized TYK2 bioactivity data (pIC50) were encoded using ECFP4 fingerprints, MACCS keys, and physicochemical descriptors, followed by variance- and correlation-based feature pruning. Three classifiers, namely Support Vector Machine (SVM), Random Forest (RF), and XGBoost, were benchmarked under scaffold-split cross-validation to obtain a realistic estimate of generalization. The XGBoost classifier outperformed the RF and SVM baselines, with accuracy (ACC) = 0.875, F1 = 0.913, and AUC = 0.951. When applied to >10,000 compounds, the model prioritized 32 candidates as highly probable actives. Docking indicated stable binding for several novel scaffolds. Importantly, Deucravacitinib was correctly predicted as active and ranked consistently, which supports the external robustness of the model. This work provides a reproducible, high-performing AI-driven pipeline for kinase inhibitor repurposing. By coupling state-of-the-art classification with physics-based docking, it offers a validated computational funnel that can accelerate TYK2 drug discovery.
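To make the scaffold-split cross-validation mentioned above concrete, the following is a minimal sketch, not the implementation used in this work. It assumes that a scaffold identifier (for example, a Bemis-Murcko scaffold SMILES) has already been computed for every compound upstream; the function name `scaffold_kfold` and the toy scaffold labels are hypothetical. The point it illustrates is that all compounds sharing a scaffold are kept in the same fold, so the test fold never contains a scaffold seen during training.

```python
from collections import defaultdict


def scaffold_kfold(scaffolds, n_splits=5):
    """Yield (train_idx, test_idx) pairs in which no scaffold spans two folds.

    `scaffolds` holds one scaffold identifier (a string) per compound.
    These identifiers are assumed to be precomputed upstream.
    """
    # Group compound indices by their scaffold identifier.
    groups = defaultdict(list)
    for idx, scaf in enumerate(scaffolds):
        groups[scaf].append(idx)

    # Greedy balancing: place the largest scaffold groups first, each into
    # the fold that is currently smallest. Ties are broken by scaffold name
    # so that the assignment is deterministic.
    folds = [[] for _ in range(n_splits)]
    for scaf in sorted(groups, key=lambda s: (-len(groups[s]), s)):
        smallest = min(range(n_splits), key=lambda k: len(folds[k]))
        folds[smallest].extend(groups[scaf])

    # Each fold serves as the test set once; the rest form the training set.
    for k in range(n_splits):
        test_idx = sorted(folds[k])
        train_idx = sorted(
            i for j in range(n_splits) if j != k for i in folds[j]
        )
        yield train_idx, test_idx


if __name__ == "__main__":
    # Toy scaffold labels standing in for real scaffold SMILES.
    toy_scaffolds = ["A", "A", "A", "B", "B", "C", "D", "D", "E"]
    for train_idx, test_idx in scaffold_kfold(toy_scaffolds, n_splits=3):
        held_out = sorted({toy_scaffolds[i] for i in test_idx})
        print(f"test scaffolds: {held_out}, n_train={len(train_idx)}")
```

Compared with a random split, this grouping typically gives lower but more realistic performance estimates, because close analogues of a training compound cannot leak into the test fold.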













