OPTIMIZING FAULT TOLERANCE IN CLOUD COMPUTING WITH MACHINE LEARNING: A COMPARATIVE STUDY AND AN INNOVATIVE HYBRID APPROACH
Keywords:
Cloud computing, Delta checkpointing, Fault tolerance, Fault classification and prediction, Machine learning algorithms, Weibull distribution
Abstract
Cloud computing is one of the most rapidly growing areas of the computer industry, offering a wide variety of advantages and opportunities to users. As a result of its rapid growth over the past decade, many businesses have moved to cloud services to gain better access to their data, scale quickly to meet customer requirements, and improve visibility into their operations. Alongside these benefits, cloud computing poses many challenges, including resource allocation, security, availability, privacy, quality of service, data management, performance compatibility, and fault tolerance. Fault tolerance (FT) is the property of a system that enables it to continue serving its intended purpose even in the presence of one or more faults. FT itself raises many challenges, including heterogeneity, the absence of well-defined standards, automation problems, reliability and downtime of cloud services, recovery points, recovery time objectives, and management of cloud workloads. Machine learning (ML) algorithms such as AdaBoostM1, Bagging, Decision Tree (also called J48), Deep Learning (also called Dl4jMLP), and Naive Bayes Tree (NB Tree) are incorporated into the proposed study to improve the accuracy of predicting faults in cloud computing environments, reduce the number of fault-prediction errors, and improve the reliability of cloud computing. In addition, the study employs a fault-tolerance strategy called delta checkpointing, in which only the state that has changed since the previous checkpoint is saved. Data generated from Virtual Machines (VMs) was used to determine the best-performing ML classifier. To obtain reliable accuracy estimates and reduce the prediction errors associated with the repair or failure of these VMs, the original dataset was evaluated using an 80/20 split, a 70/30 split, and 10-fold cross-validation.
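The delta-checkpointing strategy mentioned above can be sketched as follows. This is a minimal illustrative model, not the study's implementation: it assumes application state can be represented as a dictionary, and the class and method names (`DeltaCheckpointer`, `checkpoint`, `restore`) are hypothetical. The core idea is that each checkpoint records only the entries that changed since the last one, rather than a full copy of the state.

```python
# Minimal sketch of delta checkpointing: store a full base checkpoint once,
# then record only the entries that changed at each subsequent checkpoint.
# Dict-based state and all names here are illustrative assumptions.
import copy

class DeltaCheckpointer:
    def __init__(self, state):
        self.base = copy.deepcopy(state)   # one full checkpoint
        self.deltas = []                   # incremental changes only

    def checkpoint(self, state):
        """Record only the entries that changed since the last checkpoint."""
        last = self.restore()
        delta = {k: v for k, v in state.items() if last.get(k) != v}
        self.deltas.append(copy.deepcopy(delta))

    def restore(self):
        """Rebuild the latest state by replaying deltas over the base."""
        state = copy.deepcopy(self.base)
        for delta in self.deltas:
            state.update(delta)
        return state

# Usage: after a fault, restore() recovers the last checkpointed state.
cp = DeltaCheckpointer({"jobs_done": 0, "vm": "vm-1"})
cp.checkpoint({"jobs_done": 5, "vm": "vm-1"})  # only jobs_done is stored
print(cp.restore())  # → {'jobs_done': 5, 'vm': 'vm-1'}
```

Because unchanged entries are never re-saved, each incremental checkpoint is much smaller than a full snapshot, which is the main overhead reduction that motivates delta checkpointing in fault-tolerant cloud systems.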
Among the machine learning classifiers evaluated, the Naive Bayes Tree (NB Tree) delivered the highest accuracy with the fewest prediction errors on the original dataset used in this study: 97.05% for the 80/20 split, 96.09% for the 70/30 split, and 96.78% for 10-fold cross-validation. Its execution time, however, was 1.01 seconds, significantly longer than that of the second-best classifier, Decision Tree (J48). J48 achieved 96.78% for the 80/20 split, 95.95% for the 70/30 split, and 96.78% for 10-fold cross-validation, placing it second overall, while requiring only 0.11 seconds of execution time versus 1.01 seconds for NB Tree. Although the two classifiers differ in accuracy by less than one percentage point, there is a 0.90-second gap in execution time between them. Consequently, we decided to modify the J48 Decision Tree algorithm based on the findings of this analysis. We recommend the resulting hybrid approach because it produced the best accuracy and the fewest prediction errors on the datasets analyzed: 97.05% for the 80/20 split, 96.42% for the 70/30 split, and 97.07% after 10-fold cross-validation.
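The evaluation protocol behind the comparison above (hold-out splits plus 10-fold cross-validation over several classifiers) can be sketched as follows. This is an assumption-laden illustration, not the study's code: the study used Weka-style classifiers, so scikit-learn analogues stand in here (J48 ≈ `DecisionTreeClassifier`; NB Tree has no direct scikit-learn equivalent, so plain Naive Bayes is used), and the synthetic dataset merely stands in for the VM repair/failure records.

```python
# Hypothetical sketch of the evaluation protocol: compare classifiers using
# 80/20 and 70/30 hold-out splits plus 10-fold cross-validation.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

# Stand-in for the VM fault dataset (features -> fault / no-fault label).
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

classifiers = {
    "AdaBoostM1": AdaBoostClassifier(random_state=0),
    "Bagging": BaggingClassifier(random_state=0),
    "Decision Tree (J48)": DecisionTreeClassifier(random_state=0),
    "Naive Bayes": GaussianNB(),  # NB Tree has no direct sklearn analogue
}

for name, clf in classifiers.items():
    # Hold-out evaluation: 80/20 and 70/30 splits, as in the study.
    for test_size in (0.2, 0.3):
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, test_size=test_size, random_state=0)
        acc = accuracy_score(y_te, clf.fit(X_tr, y_tr).predict(X_te))
        print(f"{name}: {1 - test_size:.0%}/{test_size:.0%} accuracy = {acc:.4f}")
    # 10-fold cross-validation, averaged across the folds.
    cv_acc = cross_val_score(clf, X, y, cv=10).mean()
    print(f"{name}: 10-fold CV accuracy = {cv_acc:.4f}")
```

Reporting all three estimates per classifier, as the study does, guards against a single lucky or unlucky split dominating the comparison; the cross-validated mean is typically the most stable of the three.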
