Modeling and Estimation of Reference Evapotranspiration using Machine Learning Algorithms: A Comparative Performance Analysis

Department of Computer Science and Applications, Barkatullah University, Bhopal, M.P., India.

Fresh, clean water is necessary for human health. Currently, the agriculture sector uses the majority of freshwater for irrigation without using planning or optimization techniques. Evapotranspiration, which may have a major impact in planning water supply management and crop yield improvement, is an element of the hydrological cycle. Accurate anticipation of reference evapotranspiration (ET_O) is an intricate job due to its nonlinear behavior. Machine learning approach based model may be an intelligent tool to predict the accurate ET_O. This study investigates and compares the predictive skills of three regression based supervised learning algorithms: decision tree (dtr), and random forest (rfr), and k-nearest-neighbors (knnr) along with tuning their hyper-parameters like how many neighbors there are in knnr, minimum samples in dtr at a leaf node and quantity of trees in the rfr scenario to forecast ET_O. Every model's performance is quantified on four different groups of meteorological parameters. Groups are created based on close correlation of meteorological parameters with ETo. In this investigation, analysis is carried out on daily meteorological information of New Delhi, India for the periods from 2000 to 2021. The predicted results of the knnr, dtr and rfr models on four groups of meteorological inputs (twelve different models) are compared with ET_O obtained from the FAO-PM56 equations. The study's conclusions show that the k-nearest-neighbors and random forest regression-based models outperform the decision tree regression models concerning performance. The finest performance noted by knnr and rfr models with r2 (coefficient of determination) of 0.99 and rmse of 0.21 and 0.22 mm/day respectively whereas dtr model noted r2 of 0.98 and rmse of 0.40 mm/day. Therefor these models may provide scientists, engineers, and farmers with more potent choices for managing water resources and scheduling irrigation.

Decision tree regression; FAO-PM56; Hyper-parameter tuning; K-nearest-neighbors regression; Random forest regression

Introduction

According to NITI Aayog’s 2019 Composite Water Management Index report, the majority of groundwater (more than 60%) in India is utilized for irrigation. Because there aren’t enough suitable water management policies and technologies, conventional irrigation methods are applied in the various parts of the country without any quantification of crop-water requirements. As per the report, wheat and rice are the two main crops grown in India. Approximately seventy-four percentage of the area cultivated with wheat and sixty-five percentage of the area cultivated with rice faces severe water scarcity issues. In agriculture sector, efficient water saving techniques are required and quantification of crop-water requirements using the evapotranspiration method can be extremely important in this context. It combines the transpiration of plants and evaporation from groundwater supplies. Crop evapotranspiration (ETc) is calculated using ETo, a climatic parameter that solely depends on other climatic variables like wind speed, humidity, temperature, solar radiation, and sunshine hours. Allen RG et al. (1998)¹ elaborated The Food and Agriculture Organization of the United Nation’s FAO-PM56, a well-known empirical approach that requires many weather-related variables and constants. Numerous empirical models have been proposed by writers in the literature to estimate ETo. Hargreaves et al. introduced the idea of temperature-based ETo estimation, which yields outcomes more in line with FAO-PM56. Many weather observation stations use sensors and powerful computers to generate and record large amounts of meteorological data every day. This has motivated us to investigate a variety of machine learning algorithms in an effort to forecast ETo with accuracy. A new field called artificial intelligence promises to revolutionize agriculture by giving software intelligence. An artificial intelligence-based strategy helps measure irrigation water consumption and increase crop yields. One type of artificial intelligence tool is machine learning algorithms, which can process large amounts of data and reliably extract meaningful patterns. It can be an alternative solution instead of using empirical methods that require complex computation work. Numerous authors have used machine learning and soft computing techniques since the turn of the century and discovered that they may be effective means of estimating ETo. In this section, a few techniques have been examined and discussed.

Khosravi K et al. (2019)² explained the capacity of a number of machine learning models and soft computing methods, including M5P, RF, RT, REPT, and KStar, as well as four adaptive neuro-fuzzy inference systems are applied and assessed to estimate ETo, Kisi O (2007)³ used Levenberg–Marquardt based feed forward artificial neural network, Gocić M et al. (2015)⁴ applied support vector machine–wavelet, artificial neural network, genetic programming, and support vector machine–firefly algorithm, Feng Y et al. (2016)⁵ explained wavelet neural network models, back-propagation neural networks optimized by genetic algorithms, and extreme learning machine, Sanikhani H et al. (2019)⁶ used artificial intelligence techniques such as GRNN, MLP, RBNN, GEP, ANFIS-GP and ANFIS-SC, Feng Y et al. (2017)⁷ tested random forest and generalized regression neural network models, Fan J et al. (2018)⁸ applied tree-based ensemble algorithms, namely random forest, M5Tree, gradient boosting decision tree, and extreme gradient boosting models, Yamaç SS et al. (2019)⁹ evaluated k-nearest neighbor, artificial neural network, and adaptive boosting, Tabari H et el. (2013)¹⁰ demonstrated adaptive neuro-fuzzy inference system and support vector machines models, Valipour M. et al. (2017)¹¹ applied genetic algorithm and gene expression programming models, Granata F. (2019)¹² suggested M5P regression tree, bagging, random forest, and support vector regression, Abyaneh HZ et al. (2011)¹³ used Artificial intelligence techniques included artificial neural network and adaptive neuro-fuzzy inference system, Aghajanloo MB (2013)¹⁴ tested artificial neural network, neural network–genetic algorithm, and multivariate nonlinear regression methods, Feng Y et al. (2017) ¹⁵ shown extreme learning machine and generalized regression neural network models, Wen X et al. (2015)¹⁶ applied the support vector machine, Nema MK et al. (2017)¹⁷ artificial neural network model with Levenberg–Marquardt training algorithm with a single hidden layer having nine neurons to quantified ETo of various regions across the world, Saggi MK et al. (2019¹⁸ proposed H2O model framework to estimate ETo Hoshiarpur and Patiala district, Mehta R et al. (2015)¹⁹ estimated the ETc of wheat and maize for various places of Gujarat.

This study investigates and compares the skills of three regression based supervised learning algorithms namely decision tree (dtr), and random forest (rfr), and k-nearest-neighbors (knnr) models for estimation of ETo. Many developing and underdeveloped countries lack the resources necessary to obtain the high-accuracy and reliable meteorological data. This encourages us to look into how well the model performs in different combinations of meteorological parameters, limited to what is necessary. Based on groups are created due to the significant relationship between meteorological parameters and ETo. Therefore, twelve different models are evaluated and contrasted here and aims to identify the better models to forecast ETo.

Materials and Methods

Datasets

This study’s daily meteorological data, which spans the years 2000–2021, was obtained from IMD, Pune. There are 8036 samples in it. New Delhi experiences a wide range of climates, from humid-subtropical to semi-arid, with significant variations in summer and winter temperatures (from -2.2⁰C to 49.2⁰C). New Delhi, which lies in the northern part of the nation between the latitudes of 28°-24′-17″ and 28°-53′-00″ North and the longitudes of 76°-50′-24″ and 77°-20′-37″ East, with elevation of 217 meters. The data set consist with the daily temp_min (minimum temperature) and temp_max (maximum temperature) in ⁰C, R_h (humidity) in percentage, u (wind speed) in m/s, and R_s (solar radiation) in MJ/m²/day. An explanation of New Delhi’s meteorological data statistically is appears in Table 1. Table 2 displays the correlation coefficients between the meteorological data and the observed ETo by FAO-PM565 and visualize with the help of heat map in Fig. 1. Fig. 2 displays the weekly variation in the ETo of New Delhi.

Figure 1: Heat map of correlation matrix

Parameters	Dataset	Maximum	Minimum	Mean	Standard Deviation
Temp_max (^oC)	Training	48.79	12.55	33.0	7.06
Temp_max (^oC)	Test	48.05	14.33	32.79	7.09
Temp_min (^oC)	Training	34.6	-1.36	19.18	8.10
Temp_min (^oC)	Test	34.30	-0.27	18.98	8.14
R_h(%)	Training	95.12	4.19	45.54	21.09
R_h(%)	Test	93.62	5.75	45.62	21.37
U (m/s)	Training	6.42	0.47	2.20	0.83
U (m/s)	Test	6.14	0.59	2.22	0.85
R_s(MJ/m²/day)	Training	30.02	1.47	17.59	5.41
R_s(MJ/m²/day)	Test	28.54	1.98	17.32	5.54
ET_o(mm/day)	Training	18.74	0.694	7.49	3.21
ET_o(mm/day)	Test	19.16	1.00	7.39	3.28

	New Delhi
	temp_max	temp_min	R_h	U	R_s	ET_o
temp_max	1
temp_min	0.86	1
R_h	-0.33	0.12	1
U	0.21	0.07	-0.24	1
R_s	0.77	0.53	-0.40	0.28	1
ET_o	0.84	0.53	-0.64	0.48	0.88	1

	knnr1	knnr2	knnr3	knnr4	dtr1	dtr2	dtr3	dtr4	rfr1	rfr2	rfr3	rff4
Mean	1.43e-15	4.97e-16	9.94e-16	5.48e-17	-1.50e-16	-4.82e-16	-1.01e-15	-1.21e-15	3.47e-16	-7.11e-16	1.28e-15	1.15e-15
Std.	1.16	0.81	0.46	0.16	1.28	0.93	0.61	0.38	1.22	0.83	0.45	0.18
Min	-4.33	-4.08	-3.20	-0.93	-4.09	-4.41	-2.80	-1.92	-4.54	-4.26	-2.80	-1.24
25%	-0.69	-0.37	-0.24	-0.09	-0.78	-0.46	-0.32	-0.21	-0.73	-0.39	-0.25	-0.09
50%	-0.06	-0.02	0.0007	-0.004	-0.04	-0.04	-0.0002	0.007	-0.06	-0.01	0.01	0.01
75%	0.06	0.03	0.02	0.08	0.07	0.04	0.03	0.02	0.06	0.03	0.02	0.01
max	5.33	4.06	2.11	1.20	6.03	4.72	2.29	1.44	5.46	4.16	2.02	0.08

Statistical Indicators	knnr1	knnr2	knnr3	knnr4	dtr1	dtr2	dtr3	dtr4	rfr1	rfr2	rfr3	rff4
SI₁	0.9474	0.599	0.378	0.1567	1.0366	0.6755	0.4678	0.3023	0.9834	0.611	0.375	0.1724
SI₂	1.566	0.7145	0.2498	0.0429	1.8465	0.914	0.3937	0.1612	1.6949	0.74	0.2364	0.0489
SI₃	1.2515	0.8453	0.4998	0.2071	1.3589	0.956	0.6275	0.4015	1.3019	0.8602	0.4863	0.2211
SI₄	0.9247	0.9667	0.9891	0.9986	0.9117	0.9575	0.9817	0.9928	0.9186	0.9656	0.9895	0.9983
SI₅	0.855	0.9344	0.9783	0.9972	0.8312	0.9169	0.9637	0.9856	0.8438	0.9324	0.9791	0.9965

MenuMenu

Modeling and Estimation of Reference Evapotranspiration using Machine Learning Algorithms: A Comparative Performance Analysis

You may also like...

Comparative Studies on Effects of Simulated Microgravity on Growth and Photosynthetic Parameters in Rice and Mungbean

Isolation and Molecular Profiling of Halotolerant Plant Growth Promoting Rhizosphere Fungi from Salt affected Agroforestry Plantation

Comparative Study on the Bacterial Community of Cultivated and Uncultivated Rice Field Soils