FROM DATA TO HARVEST: A PCA-BASED FEATURE SELECTION APPROACH WITH K-NEAREST NEIGHBOR AND SUPPORT VECTOR REGRESSION FOR CROP YIELD PREDICTION

Bashir Ahmad; Asad Ullah; Malak Roman; Awrang Zaib; Toufeeq Ur Rehman

Keywords:

Machine Learning, Crop Yield Prediction, K-Nearest Neighbor, Support Vector Regression, Principal Component Analysis, Agriculture

Abstract

Crop yield prediction is a key component of modern agricultural management: reliable estimates support farmers and agricultural planners in decisions about resource allocation, crop scheduling, and food production planning. Machine Learning (ML) and modern computational techniques are increasingly used to address major challenges in agriculture, including low productivity, climate variability, disease outbreaks, and inefficient resource use. These methods allow agricultural systems to process large amounts of data, from soil records to satellite images, and convert them into practical recommendations for farmers and planners. The expansion of agricultural datasets provides an opportunity to use machine learning for more reliable forecasting. This research compares two Machine Learning approaches, K-Nearest Neighbors (KNN) and Support Vector Regression (SVR), for predicting crop yield after applying Principal Component Analysis (PCA) for dimensionality reduction. A comprehensive preprocessing pipeline was implemented to clean, encode, standardize, and transform the data, using PCA to remove redundancy while retaining 95% of the variance. Models were assessed with multiple statistical metrics: the coefficient of determination (R²), Mean Absolute Error (MAE), Mean Squared Error (MSE), Root Mean Squared Error (RMSE), and Explained Variance. Results indicate that SVR is more effective than KNN in capturing yield variation when used with PCA preprocessing. The study concludes that dimensionality reduction and proper preprocessing significantly enhance model robustness and prediction accuracy, providing a practical framework for applying machine learning in agricultural yield forecasting.
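The workflow summarized in the abstract can be illustrated with a minimal sketch in Python using scikit-learn. This is not the authors' code: the synthetic data, the helper names (`make_synthetic_data`, `build_models`, `evaluate`), the 80/20 split, and the hyperparameters (`n_neighbors=5`, `kernel="rbf"`, `C=10.0`) are all assumptions chosen for illustration. Because the synthetic features are already numeric and complete, the cleaning and encoding steps are omitted; only standardization, PCA retaining 95% of the variance, and the five reported metrics are shown.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.metrics import (
    explained_variance_score,
    mean_absolute_error,
    mean_squared_error,
    r2_score,
)
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR


def make_synthetic_data(n_samples=300, seed=0):
    """Hypothetical stand-in for a crop dataset (not the paper's data)."""
    rng = np.random.default_rng(seed)
    base = rng.normal(size=(n_samples, 4))
    # Noisy copies of the base columns give PCA some redundancy to remove.
    noisy_copies = base + 0.05 * rng.normal(size=base.shape)
    X = np.hstack([base, noisy_copies])
    y = 3.0 * base[:, 0] - 2.0 * base[:, 1] + 0.5 * rng.normal(size=n_samples)
    return X, y


def build_models():
    """Wrap each regressor in the same standardize -> PCA(95%) pipeline."""
    def with_pca(regressor):
        return make_pipeline(
            StandardScaler(),
            # A float n_components keeps enough components to explain
            # that fraction of the variance.
            PCA(n_components=0.95, svd_solver="full"),
            regressor,
        )

    return {
        "KNN": with_pca(KNeighborsRegressor(n_neighbors=5)),  # assumed k
        "SVR": with_pca(SVR(kernel="rbf", C=10.0)),           # assumed settings
    }


def evaluate(model, X_train, X_test, y_train, y_test):
    """Fit on the training split and compute the five metrics on the test split."""
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    mse = mean_squared_error(y_test, pred)
    return {
        "R2": r2_score(y_test, pred),
        "MAE": mean_absolute_error(y_test, pred),
        "MSE": mse,
        "RMSE": float(np.sqrt(mse)),
        "ExplainedVariance": explained_variance_score(y_test, pred),
    }


X, y = make_synthetic_data()
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

models = build_models()
results = {
    name: evaluate(model, X_train, X_test, y_train, y_test)
    for name, model in models.items()
}

for name, metrics in results.items():
    print(name, {key: round(value, 3) for key, value in metrics.items()})
```

Placing the scaler and PCA inside the pipeline means both are fitted on the training split only, so no information from the test split leaks into the transformation. The numbers printed here come from synthetic data and say nothing about the relative performance of KNN and SVR on real crop records.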