Assessment of the Different Machine Learning Models for Prediction of Cluster Bean (Cyamopsis tetragonoloba L. Taub.) Yield

Pangarkar, Darshan Jagannath and Sharma, Rajesh and Sharma, Amita and Sharma, Madhu (2020) Assessment of the Different Machine Learning Models for Prediction of Cluster Bean (Cyamopsis tetragonoloba L. Taub.) Yield. Advances in Research, 21 (9). pp. 98-105. ISSN 2348-0394

Abstract

Crop yield prediction helps traders, agri-businesses and government agencies plan their activities, and helps government agencies manage situations such as over- or under-production. Statistical and crop simulation methods have traditionally been used for this purpose, and machine learning models can also be of great help. The aim of the present study is to assess the predictive ability of various machine learning models for cluster bean (Cyamopsis tetragonoloba L. Taub.) yield prediction. The models were applied and tested on panel data covering 19 years, from 1999-2000 to 2017-18, for the Bikaner district of Rajasthan. Various data mining steps were performed before model building. K-Nearest Neighbors (K-NN), Support Vector Regression (SVR) with various kernels, and random forest regression were applied. Cross validation was also performed to assess extra-sample validity. The best-fitted model was chosen based on cross validation scores and R2 values. Besides the coefficient of determination (R2), root mean squared error (RMSE), mean absolute error (MAE), and root relative squared error (RRSE) were calculated for the testing set. Support vector regression with a linear kernel had the lowest RMSE (23.19), RRSE (0.14) and MAE (19.27), followed by random forest regression and second-degree polynomial support vector regression with gamma = auto. The ranking differed slightly for R2, which placed support vector regression with a linear kernel first (98.31%), followed by second-degree polynomial support vector regression with gamma = auto (89.83%) and second-degree polynomial support vector regression with gamma = scale (88.83%). On two-fold cross validation, support vector regression with a linear kernel had the highest cross validation score, at 71% (+/-0.03), followed by second-degree polynomial support vector regression with gamma = auto and random forest regression. K-NN and support vector regression with a radial basis function kernel had negative cross validation scores. Support vector regression with a linear kernel was found to be the best-fitted model for predicting yield, as it had the higher sample validity (98.31%) and global validity (71%).
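The comparison described in the abstract can be illustrated with a minimal sketch in Python using scikit-learn. Everything here is an assumption made for illustration and is not taken from the paper: the synthetic data set (the Bikaner panel data are not reproduced), the feature count, the hyperparameters (C, n_neighbors, n_estimators), the feature scaling step, the 75/25 train/test split, the use of R2 as the cross validation score, and the helper names make_synthetic_data, candidate_models, evaluate and rrse. The gamma = auto and gamma = scale settings named in the abstract match the option names of scikit-learn's SVR, which is why that library is used here.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR


def rrse(y_true, y_pred):
    """Root relative squared error: squared error relative to predicting the mean."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residual = np.sum((y_true - y_pred) ** 2)
    baseline = np.sum((y_true - y_true.mean()) ** 2)
    return float(np.sqrt(residual / baseline))


def make_synthetic_data(n_samples=120, seed=0):
    """Hypothetical stand-in data: three predictors and a roughly linear yield."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n_samples, 3))
    noise = rng.normal(scale=15.0, size=n_samples)
    y = 400.0 + 60.0 * X[:, 0] - 35.0 * X[:, 1] + 20.0 * X[:, 2] + noise
    return X, y


def candidate_models():
    """Model families named in the abstract; hyperparameters are assumptions."""
    return {
        "knn": KNeighborsRegressor(n_neighbors=5),
        "svr_linear": SVR(kernel="linear", C=100.0),
        "svr_poly2_auto": SVR(kernel="poly", degree=2, gamma="auto", C=100.0),
        "svr_poly2_scale": SVR(kernel="poly", degree=2, gamma="scale", C=100.0),
        "svr_rbf": SVR(kernel="rbf", C=100.0),
        "random_forest": RandomForestRegressor(n_estimators=100, random_state=0),
    }


def evaluate(X, y, seed=0):
    """Return test-set metrics and two-fold CV scores for each candidate model."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.25, random_state=seed
    )
    results = {}
    for name, model in candidate_models().items():
        # Scaling inside the pipeline keeps each CV fold free of leakage.
        pipe = make_pipeline(StandardScaler(), model)
        cv_scores = cross_val_score(pipe, X_tr, y_tr, cv=2, scoring="r2")
        pipe.fit(X_tr, y_tr)
        pred = pipe.predict(X_te)
        results[name] = {
            "r2": float(r2_score(y_te, pred)),
            "rmse": float(np.sqrt(mean_squared_error(y_te, pred))),
            "mae": float(mean_absolute_error(y_te, pred)),
            "rrse": rrse(y_te, pred),
            "cv_mean": float(cv_scores.mean()),
            "cv_std": float(cv_scores.std()),
        }
    return results


if __name__ == "__main__":
    X, y = make_synthetic_data()
    for name, m in evaluate(X, y).items():
        print(f"{name:16s} R2={m['r2']:.3f} RMSE={m['rmse']:.2f} "
              f"MAE={m['mae']:.2f} RRSE={m['rrse']:.3f} "
              f"CV={m['cv_mean']:.2f} (+/-{m['cv_std']:.2f})")
```

Because the synthetic data are generated from a linear relationship, the numbers this sketch prints say nothing about cluster bean yield and should not be compared with the values reported in the abstract. The sketch only shows how the test-set metrics and the two-fold cross validation score can be computed side by side for each model. A negative cross validation score, as reported for K-NN and the radial basis function kernel, means the model did worse on the held-out fold than a constant prediction of the mean, provided the score is R2 as assumed here.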

Item Type: Article
Subjects: Archive Science > Multidisciplinary
Depositing User: Managing Editor
Date Deposited: 17 Mar 2023 08:48
Last Modified: 25 May 2024 09:33
URI: http://editor.pacificarchive.com/id/eprint/249
