A Comparative Machine Learning Framework for Location-Specific Maize Yield Prediction in the U.S. Corn Belt Using SVM, Random Forest, and Decision Tree Models: Integrating Agronomic, Genetic, and Spatial Factors
Prince Michael Akwabeng *
Department of Mathematics and Statistics, Austin Peay State University, Clarksville, TN, USA.
*Author to whom correspondence should be addressed.
Abstract
This paper addresses key gaps in agricultural machine learning by comparing the performance of Support Vector Machine (SVM), Random Forest (RF), and Decision Tree (DT) algorithms in predicting maize yield across locations in the U.S. Corn Belt. Unlike earlier approaches that treat location as a fixed effect, this study treats location as an explicit, dynamic predictor, allowing environment-specific interactions to be modelled. The three models were evaluated on a dataset of 1,640 plot-level observations from four geographically distinct sites, with spatial autocorrelation accounted for so that the performance measures are not artificially inflated. RMSE and R² were used as accuracy metrics to determine the best model. RF was the best-performing model, with an R² of 0.84 and an RMSE of 24, followed by DT (R² = 0.79; RMSE = 27); SVM performed worst (R² = 0.03; RMSE = 58). The RF variable importance showed that the most important predictors were location_id, the within-field spatial variables (row, range), and nitrogen. The results indicate that site-specific environmental variables explain more of the yield variability than genetic and management variables. This study offers a methodology for location-based predictive modeling and provides evidence-based recommendations on applying precision agriculture approaches to improve the accuracy of yield predictions and of resource-saving recommendations.
Keywords: Machine learning, random forest, SVM, decision tree, maize, location, prediction
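For readers unfamiliar with the two accuracy metrics named in the abstract, the following is a minimal sketch of how RMSE and R² are conventionally computed. It is not the study's own code: the function names and the yield values are hypothetical and chosen only for illustration.

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error, in the same units as the yield values."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Hypothetical observed and predicted yields (illustration only, not study data)
observed = [150.0, 180.0, 200.0, 170.0]
predicted = [160.0, 170.0, 190.0, 180.0]

print(rmse(observed, predicted))
print(r_squared(observed, predicted))
```

Under this definition, an R² close to zero means the model does little better than predicting the mean yield for every plot, while a lower RMSE means smaller typical prediction errors.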