PREDICTIVE ANALYTICS FOR CI/CD PIPELINE FAILURES USING MACHINE LEARNING MODELS: A HYBRID DATA-DRIVEN STUDY

Authors

  • Akshay Kumar
  • Ritik Kumar
  • Annas Ahmed
  • Mohammed Sohail Ahmed

Keywords:

CI/CD, DevOps, Machine Learning, Pipeline Failure Prediction, XGBoost, Software Reliability

Abstract

Background:
Continuous Integration and Continuous Deployment (CI/CD) pipelines are critical components of modern DevOps ecosystems; however, pipeline failures remain a persistent challenge, leading to deployment delays, increased costs, and reduced software reliability. Recent advances in machine learning (ML) offer promising approaches for predictive failure detection in CI/CD workflows.

Objective:
This study aims to develop and evaluate machine learning models for predicting CI/CD pipeline failures using a hybrid dataset combining simulated and real-world-inspired DevOps metrics.

Methods:
A hybrid dataset comprising 1,200 CI/CD pipeline executions was generated based on realistic DevOps parameters, including build duration, code churn, test coverage, commit frequency, dependency issues, and environment instability. Five machine learning models (Logistic Regression, Random Forest, Support Vector Machine, XGBoost, and Multilayer Perceptron) were trained and evaluated. Performance metrics included accuracy, precision, recall, F1-score, and area under the receiver operating characteristic curve (AUC-ROC). Statistical analyses were performed to identify key predictors of pipeline failure.
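
For readers less familiar with the evaluation metrics named above, the following is a minimal sketch of how they can be computed from binary failure labels and model scores. It is not the study's evaluation code: the function names, the 0.5 decision threshold, and the toy labels and scores are all illustrative assumptions, and only the Python standard library is used. AUC-ROC is computed here through its rank interpretation, the probability that a randomly chosen failed run receives a higher score than a randomly chosen successful run.

```python
from typing import Dict, Sequence


def classification_metrics(
    y_true: Sequence[int], scores: Sequence[float], threshold: float = 0.5
) -> Dict[str, float]:
    """Accuracy, precision, recall and F1 for binary labels (1 = pipeline failed).

    The 0.5 default threshold is an assumption made for illustration.
    """
    y_pred = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    accuracy = (tp + tn) / len(y_true)
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def auc_roc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC-ROC as the fraction of (failed, passed) pairs ranked correctly.

    Tied scores count as half a correct ranking. This pairwise form is
    quadratic in the number of runs, which is acceptable for a sketch.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC-ROC needs at least one failed and one passed run")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy data, invented for illustration only (not taken from the study's dataset).
toy_labels = [1, 0, 1, 1, 0, 0, 1, 0]
toy_scores = [0.9, 0.2, 0.65, 0.4, 0.55, 0.1, 0.8, 0.3]

if __name__ == "__main__":
    print(classification_metrics(toy_labels, toy_scores))
    print("auc_roc:", auc_roc(toy_labels, toy_scores))
```

Unlike the other four metrics, AUC-ROC does not depend on the decision threshold, which makes it a convenient single number for comparing classifiers whose score scales differ.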

Results:
Tree-based ensemble models demonstrated superior predictive performance, with XGBoost achieving the highest AUC (0.91), followed by Random Forest (0.88). Significant predictors of pipeline failure included high code churn, low test coverage, dependency conflicts, and prior failure history (p < 0.05). Logistic regression confirmed these variables as independent predictors.

Conclusion:
Machine learning models, particularly ensemble techniques, can effectively predict CI/CD pipeline failures, enabling proactive mitigation strategies. Integration of predictive analytics into DevOps workflows may significantly enhance software delivery reliability.

Published

2026-04-28

How to Cite

Akshay Kumar, Ritik Kumar, Annas Ahmed, & Mohammed Sohail Ahmed. (2026). PREDICTIVE ANALYTICS FOR CI/CD PIPELINE FAILURES USING MACHINE LEARNING MODELS: A HYBRID DATA-DRIVEN STUDY. Spectrum of Engineering Sciences, 4(4), 2000–2009. Retrieved from https://thesesjournal.com/index.php/1/article/view/2746