Forecasting of meteorological drought using ensemble and machine learning models
Environmental Sciences Europe volume 36, Article number: 160 (2024)
Abstract
This study addresses drought forecasting in a semi-arid area of India, where drought plays a vital role in irrigation, drinking water supply, and the ecological and economic balance of the country. Forecasting with machine learning (ML) models is therefore important for future drought planning. The Standardized Precipitation Index (SPI) at 3- and 6-month timescales was selected for forecasting future drought scenarios in the area. Combinations of ten inputs, SPI-1 to SPI-10, were used to model the SPI-3 and SPI-6 timescales, with the models developed from historical SPI datasets for 1989 to 2019. The maximum and minimum values were 2.03 and -5.522 for SPI-3 and 1.94 and -6.93 for SPI-6. The SPI is a popular drought index that is used worldwide. The best combination of input variables was selected using subset regression models and sensitivity analysis, and the selected inputs were then used to forecast SPI-3 and SPI-6 values in order to understand drought in the semi-arid area. The best input combination was used to build five ML models: robust linear regression, bagged trees, boosted trees, support vector regression (SVM-Linear), and Matern Gaussian process regression (Matern GPR). These models are applied here for the first time to forecast future drought conditions. All models were refined through hyperparameter tuning, bagging, and boosting, and their accuracy was compared using several statistical metrics. Among the five ML models, the Matern GPR model achieved better accuracy than the others, with R2 = 0.95 and 0.93 for predicting SPI-3 and SPI-6, respectively, alongside the corresponding RMSE, MSE, MAE, MARE, and NSE values. The Matern GPR model was therefore identified as the best ML algorithm for predicting SPI-3 and SPI-6 compared with the other algorithms. This research demonstrates the efficacy of the Matern GPR model in predicting multiscale SPI-3 and SPI-6 under climate variation. It can support soil and water resource conservation planning and management, and the understanding of droughts across the river basins of India.
Introduction
The Earth has been warming, as evidenced by the 0.6 °C (0.4–0.8 °C) rise in global temperature between 1901 and 2001 [94]. Consequently, extreme heat, extreme rainfall, and persistent wet or dry conditions have harmed habitat health [89]. Likewise, global droughts brought on by severe temperatures and dry conditions have become more frequent [95, 98, 110]. Global warming is the leading cause of these drought episodes and of their frequency, with 30% of the land surface anticipated to experience droughts up to twice as severe by the end of this century [87, 99, 100], affecting most of the world's population [95, 98]. Because food security is at stake in affected regions, controlling and managing droughts is a top priority from an agricultural point of view [27]. India is regarded as one of the most ecologically fragile nations because of the significant negative impact of climate variation. The impact of drought is directly linked to geographic position and socioeconomic circumstances [7, 14, 62]. India has a poor adaptation capacity due to its developing economy, geographic location, and dense population. As a result, India is less equipped to withstand the harmful effects of climate change [64]. Since most agricultural operations depend on rainfall, the adverse effects of environmental variation are felt most in the agricultural sector, and advanced technology such as remote sensing datasets helps in understanding environmental patterns and analyzing drought [8, 109, 123]. Agriculture is a major contributor to India's Gross Domestic Product (GDP) and employs 40% of the country's workforce. Regional droughts now affect around 2.5 million and 1.2 million hectares of agricultural land annually in the rainy and dry periods, respectively, because of little or no rainfall (Division, B.M. of F.F. Bangladesh 2018). Predicting drought can therefore be a strategy for putting measures in place to reduce the regional consequences of extended dry spells [57, 111]. Understanding the water cycle through water levels has likewise been helpful for drought analysis in China [118].
Droughts of many kinds have been detected and described. They have been classified as meteorological, agricultural, hydrological, or socioeconomic [101, 113, 119]. The degree of dryness (a measure of precipitation departure) and the duration of dry spells have been used to describe meteorological droughts [17, 37, 49, 104, 105]. Agricultural drought is soil moisture that is inadequate to meet the demands of a particular crop at the proper time because of prolonged periods of insufficient precipitation. Hydrological drought is identified from observations of river flows, lakes, and groundwater tables [48, 121, 122], and corresponds to shortages in surface and subsurface water supplies. Socioeconomic drought, by contrast, refers to conditions in which the amount of water supplied is less than what is needed in a particular area [30, 54]. Drought indices are used to describe drought occurrences and their severity and to determine the spatiotemporal distribution of droughts. The most often used meteorological drought index is the SPI [53, 56, 120], calculated from monthly precipitation [19]. Another valuable technique for identifying drought characteristics is the Effective Drought Index (EDI) [20, 117]. Although the SPI is useful for recognizing short- and long-term droughts, it has shown certain limits in describing them at various time scales. In addition, whereas the EDI gives a single value each month, several SPI values at different timescales are obtained for the same month, which can lead to incorrect interpretation of droughts [19, 41, 79]. According to other researchers [39, 41, 116], the EDI can identify various drought occurrences. A further drought index, the Standardized Precipitation Evapotranspiration Index (SPEI), is based on both rainfall and temperature. By incorporating the impact of temperature variability on drought assessment, the SPEI offers an advantage over precipitation-only indices [103].
Approaches other than drought indices have also been used to describe a drought event or period [18]. A dry period is described as a string of dry times that occur back-to-back [45, 71]. Some meteorologists and climatologists have treated precipitation below 2 or 5 mm as drought [19]. In some definitions a drought is a prolonged dry spell of 25 consecutive days, while in others drought periods are defined by 15 consecutive dry days [24, 38, 65, 83, 97]. Wet and dry periods have also been used to illustrate climatic scenarios, and they have been reported to be valuable weather indicators [11, 21, 28, 47, 90, 92]. In Switzerland, the regional and temporal tendencies of wet and dry periods were found to be helpful in understanding and assessing climate [38]. In tropical regions, dry days have been found to produce heat waves and to be directly associated with them [77]. Heatwave susceptibility has been used to pinpoint hot spots in a particular area through meteorological, sociological, physical, and environmental characteristics [44, 68, 69, 70]. The impact of the North Atlantic Oscillation was also examined [66]. An extensive meteorological analysis of urban and semi-urban air temperatures by day and night found that the urban heat island (UHI) effect was strongest under dry conditions [36, 107, 108]. The likelihood of heat waves is therefore logically related to drought. Few studies have used the total number of dry days in a month to forecast future dry days. Other studies, for instance, concentrated on Monthly Consecutive Dry Days (MCDD) across Japan to characterize the regional climate and demonstrate the applicability of consecutive dry days [63]. Research on monthly dry days (MDD) asserted that, although MDD cannot be used directly to define a specific kind of drought, it helps identify trends in how dry spells have changed over time in different months. That research aimed to develop novel methods for identifying connections between MDD and monthly wet days (MWD) [84].
Physical or data-driven models can be used to forecast dry periods or droughts. A data-driven approach to flood forecasting [4] produced results quickly and needed only a small amount of information over a short period. Various researchers have also used statistical data-driven models to predict precipitation and droughts. For instance, long-term drought prediction using SPI has widely employed linear regression [59], SVM-Linear [15], Gaussian process regression (GPR), robust regression, ensemble models, and artificial neural networks (ANN) [46]. These data-driven models took factors related to the preceding months' rainfall or drought as inputs and produced precipitation or drought indicators as outputs. Compared with other algorithms, ANN-based models were better able to predict droughts. ANN also outperformed multiple linear regression at Wilsons Promontory, Australia, when forecasting SPEI. Rainfall prediction has also used several ML algorithms, and the outcomes were regularly improved by applying autocorrelation functions [60, 80]. Furthermore, SPEI prediction for Pakistan showed that SVM outperformed ANN and other models [43, 76]. A different study, which used SVM to forecast SPI in Iran, confirmed that SVM is more accurate than ANN [85]. These tests were conducted on the assumption that ML models can be accurate using hydro-meteorological information alone, without modeling the underlying physical processes [29]. Drought forecasting with long lead times and greater precision is valuable in agricultural applications. The difficulty of lead-time forecasting has been acknowledged across several drought studies. Several studies have employed ANN-based models, which have proved useful in predicting droughts within a 12-month lead time [13, 15, 29, 35, 55]. Moradkhani and Meier [58] worked on two ML techniques for ensemble forecasting.
Agricultural drought forecasts have used large-scale climatic variables, and the accuracy of various models has been checked using statistical approaches [106, 114, 124]. The least squares support vector machine (LSSVM), M5 Model Tree, and multivariate adaptive regression splines (MARS) were used to forecast the streamflow pattern in Turkey's Mediterranean area. The results show that the LSSVM model performed well for streamflow modeling compared with other ML models driven by weather data [22, 23]. In another study, rainfall data from four stations in Iran were collected for January 1967 to December 2009, and the eight most critical climatic indices were used to predict drought. The authors noticed that the Atlantic Meridional Mode (AMM) and Atlantic Surface Temperature (AST) had the strongest inverse association with SPI. The forecasting ability of the Neuro-Fuzzy (N.F.) model was found to be superior to that of the Stepwise Regression (S.R.) model at a 2-year lag. AghaKouchak [5] provided an ensemble drought forecast for Africa using the Multivariate Standardized Drought Index (MSDI). The ESP approach was first used with monthly precipitation and soil moisture statistics to forecast seasonal changes, and it provides probabilistic forecasts of the cumulative seasonal changes in soil moisture and rainfall.
Oliver Meseguer-Ruiz et al. [72] used the SPI and SPEI for drought analysis and forecasting, comparing the two indices under various rainfall and temperature regimes in Chile over the previous four periods. Khalid En-Nagre et al. [32] stated that monitoring drought is of prime importance, particularly in semi-arid areas, because of environmental variation [8, 93, 104, 105]. They forecast meteorological drought based on the SPI and SPEI using ML models and environmental information. In that study, popular ML models such as Random Forest, Voting, AdaBoost, and K-Nearest Neighbors regressors were evaluated for forecasting SPEI values at 3- and 12-month timescales. In contrast, we adopt novel ML models and a different approach for forecasting drought based on SPI values in the study area. Comparison with recent and past work shows that the ML models, results, and methodology adopted here for understanding drought in the semi-arid area differ from those of earlier studies. Md. Abdullah Al Mamun et al. [9] identified influential climate factors and forecast periodic drought in Bangladesh using 24 ML regression models. Mokhtar, A. et al. [61] and Abdel-Fattah et al. [1] used ML drought forecasts to support sustainable water management and development. Recent applications of ML algorithms point to the benefit of highly flexible and robust methods for drought forecasting [34, 52, 60].
ML algorithms are better at capturing complex relationships between input and output factors and at handling nonlinearity and temporal dependencies, and the best-performing algorithms have been used for drought forecasting based on the SPI and SPEI at different scales [10, 52]. Agricultural drought has also been forecast with ML models using feature selection methods [67, 82].
The results of ensemble models have been useful for drought prediction in semi-arid areas. Such models estimate the frequency of severe droughts and provide data on their probability. In these areas, drought conditions arise many times every year because of the demand for drinking and irrigation water, and drought intensifies every summer season owing to climate change. An important objective of this study was to examine monthly deficits and identify their impact on agricultural production. As an outcome, the investigation presents an innovative tool for regional drought analysis based on the SPI-3 and SPI-6 scales and ML models. The main aim of this study is therefore to develop methods and models for forecasting drought events and their spatial extent using the SPI. As a case study, the research concentrates on Nashik district, Maharashtra state, India. Most agriculture in the study area is subject to summer drought because of insufficient rainfall, which creates even more adverse conditions for farming. The best subset regression models and sensitivity analysis are used to identify the optimal variables for each dataset as inputs to the L.R. models. Novel ML algorithms, including SVM, Matern GPR, robust regression, and ensemble models (bagged and boosted trees), are developed into a robust climate prediction model based on weather station data. The adopted methodology rests on the choice of input variables and a comparative analysis of the ML algorithms, in which prediction uncertainty is also studied. Finally, conclusions are drawn, and suggestions for future work and policy are made.
Study area and data collection
The Godavari basin, covering 312,812 km2 in peninsular India, is the country's second largest river basin (Fig. 1). The River Godavari, originating at 1067 m above mean sea level in the Nashik district near Trimbakeshwar, traverses the Deccan Plateau. The river flows for approximately 1465 km, ultimately emptying into the Bay of Bengal. The river bisects the Maharashtra plateau and flows eastward into level floodplains that are frequently inundated. The basin is rugged and fragmented, particularly in the northeastern region. The average annual minimum and maximum temperatures in the basin are 20.53 °C and 32.85 °C, respectively. The basin receives an average annual rainfall of 1,096.92 mm, 85% of which falls during the southwest monsoon season between June and September. Monthly rainfall datasets for 1989 to 2019 were collected from an open-source website (https://power.larc.nasa.gov/data-access-viewer/). The key parameters, SPI-3 and SPI-6, were calculated from the rainfall datasets in the R software.
Methodology
SPI
The SPI is a comparatively new drought index based on rainfall. It is a probabilistic index that can be applied at any timescale. Usually, a month or two is the critical time frame. Some operations, such as dryland agriculture, are swiftly affected by changes in the atmosphere. The SPI [51] is frequently used to quantify rainfall deficit in order to assess drought conditions. To apply the SPI, long-term historical rainfall records are fitted to a chosen probability distribution, usually the gamma distribution, which is then transformed into a normal distribution [33]. The rainfall data were collected from the NASA POWER website (https://power.larc.nasa.gov/), and precipitation records from 1989 to 2019 are considered in this investigation. We selected two SPI timescales, 3 and 6 months, for understanding drought in the semi-arid area. SPI-3 and SPI-6 were estimated from the monthly rainfall datasets and used to forecast SPI values at both scales, to support the understanding of drought conditions and agricultural planning. The initial 24 years of data were used to train the models, while the latter six years were used to test them. Software for drought indices was used to calculate the SPI. The resulting values were classified into meteorological drought classes following Masroor et al. [51]: an SPI value between 0 and −0.99 indicates a ‘mild drought’, between −1 and −1.49 a ‘moderate drought’, between −1.5 and −1.99 a ‘severe drought’, and an SPI value of −2 or below an ‘intense drought’. The SPI is an index of the probability of rainfall that can be used for any period of time. Palmer's indices are water-balance indices that take into account water supply (rainfall), demand (evapotranspiration), and loss, whereas the SPI is a probability index that takes only precipitation into account [115]. There are only a few dry areas [88].
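The drought classes quoted above can be expressed as a small lookup. The following is a minimal sketch, not the authors' code: the function name `classify_spi` is hypothetical, the most severe class is assumed to apply to SPI values of −2 or below (the conventional reading of the SPI scale), and non-negative values are assumed to mean "no drought", which the text does not state explicitly.

```python
def classify_spi(spi: float) -> str:
    """Map an SPI value to the drought class used in the text.

    Assumptions (not stated explicitly in the source): values >= 0 are
    labelled "no drought", and the most severe class starts at -2.
    """
    if spi >= 0.0:
        return "no drought"
    if spi > -1.0:          # 0 to -0.99
        return "mild drought"
    if spi > -1.5:          # -1.0 to -1.49
        return "moderate drought"
    if spi > -2.0:          # -1.5 to -1.99
        return "severe drought"
    return "intense drought"  # -2.0 and below


if __name__ == "__main__":
    for value in (0.4, -0.5, -1.2, -1.7, -5.522):
        print(value, classify_spi(value))
```

Applied to the minimum SPI-3 value reported in the abstract (−5.522), this rule returns the most severe class.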
The SPEI is another commonly used indicator in drought studies and is strikingly comparable to the SPI. Although the SPI uses only precipitation to assess meteorological drought, the SPEI places more emphasis on the potential evapotranspiration (PET) parameter. Consequently, the calculation of the SPEI is more intricate. There is also research demonstrating the applicability of the SPEI in studies of meteorological drought. The SPI is widely used for several reasons, including its ease of calculation, its suitability for the normal distribution, its ability to define weather-related drought under different climate conditions, and its ability to give early drought warnings at varying time scales (Sırdas, 2002). It also depends only on precipitation data. To determine the SPI, the mean is subtracted from the precipitation for the chosen period, and the result is divided by the standard deviation. Using this approach, periods ranging from one month to 48 months can be examined to see the impact of a lack of precipitation on various sectors. Positive SPI values indicate more precipitation than the median, while negative values indicate less. Because the SPI is standardized to a normal distribution, drier and wetter climates can be represented in a comparable way [75]. The adopted methodology is presented in Fig. 2.
The gamma distribution, commonly used for modeling precipitation data, was used to determine the PDF in Eq. 1, where β represents the scale variable, α represents the shape variable, x represents the rainfall quantity, and Γ(α) represents the gamma function:
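The equation itself is not reproduced in this text. The standard two-parameter gamma density that matches the symbols defined above is given here as an assumed reconstruction, together with the gamma function it relies on:

```latex
g(x) = \frac{1}{\beta^{\alpha}\,\Gamma(\alpha)}\, x^{\alpha-1} e^{-x/\beta}, \qquad x > 0,
\qquad\text{where}\qquad
\Gamma(\alpha) = \int_{0}^{\infty} y^{\alpha-1} e^{-y}\, dy .
```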
Equations 2, 3, and 4 provide the estimates of the shape and scale parameters α and β. These estimates are then used to calculate the cumulative probability of rainfall amounts greater than zero using Eq. 5:
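The steps described above (fit a gamma distribution, compute the cumulative probability, transform to a standard normal deviate) can be sketched in Python using only the standard library. This is an illustrative sketch under stated assumptions, not the authors' R implementation: the function names are hypothetical, the parameters are estimated with Thom's approximation, zero-rainfall totals are handled with the usual mixed form H(x) = q + (1 − q)G(x), and a single distribution is fitted to the whole series rather than one per calendar month.

```python
import math
from statistics import NormalDist


def rolling_sum(monthly, window):
    """Accumulate monthly rainfall over a moving window (3 for SPI-3, 6 for SPI-6)."""
    return [sum(monthly[i - window + 1:i + 1]) for i in range(window - 1, len(monthly))]


def fit_gamma_thom(values):
    """Thom's approximate maximum-likelihood estimates of shape (alpha) and scale (beta).

    Assumes at least two distinct positive values; zeros are excluded from the fit.
    """
    positive = [v for v in values if v > 0]
    mean = sum(positive) / len(positive)
    a = math.log(mean) - sum(math.log(v) for v in positive) / len(positive)
    alpha = (1.0 + math.sqrt(1.0 + 4.0 * a / 3.0)) / (4.0 * a)
    beta = mean / alpha
    return alpha, beta


def gamma_cdf(x, alpha, beta, tol=1e-12, max_terms=100000):
    """Gamma CDF computed from the power series of the lower incomplete gamma function."""
    if x <= 0:
        return 0.0
    z = x / beta
    term = 1.0 / alpha
    total = term
    n = 0
    while abs(term) > tol * abs(total) and n < max_terms:
        n += 1
        term *= z / (alpha + n)
        total += term
    return total * math.exp(-z + alpha * math.log(z) - math.lgamma(alpha))


def spi_series(precip_totals):
    """SPI value for each accumulated precipitation total in the series."""
    n = len(precip_totals)
    q = sum(1 for v in precip_totals if v <= 0) / n  # probability of zero rainfall
    alpha, beta = fit_gamma_thom(precip_totals)
    std_normal = NormalDist()
    result = []
    for v in precip_totals:
        h = q + (1.0 - q) * gamma_cdf(v, alpha, beta)
        h = min(max(h, 1e-6), 1.0 - 1e-6)  # keep the inverse normal finite
        result.append(std_normal.inv_cdf(h))
    return result
```

For example, `spi_series(rolling_sum(monthly_rain, 3))` would give an SPI-3 style series for a hypothetical list `monthly_rain`. An operational calculation would normally fit the distribution separately for each calendar month.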
Machine learning models
Support vector machine (SVM)
SVM is an ML technique that excels at classification and pattern recognition. Its strength lies in finding the decision boundary with the greatest discriminating power. To separate classes that are not linearly separable, SVM maps the input data into a higher-dimensional feature space using kernels such as Gaussian, polynomial, and sigmoid functions. The underlying principle of the SVM algorithm is to learn the relationship between input and output data from training datasets (xk, yk), where k = 1,…,N. The nature of the problem addressed by SVM is determined by whether y is continuous (regression) or represents a class label (classification). Samsudin and colleagues (2010) reported that SVM classification is based on datasets. Many investigations have developed it for this area [33]. Binary classification models are helpful for classification research. A binary classifier assumes that the task has two classes, which the decision surface can accurately separate, and a series of binary classifiers can handle multiclass tasks. For instance, two kinds of flags have been used: a non-event (background) is categorized as 0 and an event as 1. In this study, we used the SVM-linear function for predicting drought in the study area.
Boosted tree
Boosted trees use the same regression-tree base learner as bagged trees, but fit the trees sequentially so that each new tree corrects the errors of the current ensemble. The variance of a decision tree can be reduced by a technique called bagging, or bootstrap aggregation. This method is frequently used to generate re-samples of the training datasets in order to assess drought properly. First, samples are bootstrapped from the unprocessed data to create several training datasets. One key benefit of bagging is that it combines all the trees into a more robust combined model output rather than relying on a single tree. By randomly drawing new training datasets at each step rather than reusing the original training dataset, it also avoids the instability inherent in regression tree development. Averaging the forecasts from several trees to arrive at the final prediction is more reliable than depending on a single decision tree. A regression tree is built by binary recursive partitioning, an iterative procedure that splits the data into partitions. The method divides a training set of pre-classified records into progressively smaller groups. Each binary split on a field divides the data into two branches, beginning with the first two branches at the root. The goal is to minimize the sum of squared deviations from the mean in the two partitions, so the model chooses the split that achieves this. The same splitting rule is applied to every new branch. The procedure continues until each node reaches the minimum size specified by the user and becomes a terminal node [93].
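The splitting rule described above, choosing the binary split that minimizes the total squared deviation from the mean in the two partitions, can be illustrated with a short sketch for a single input variable. The function names `sse` and `best_split` and the `min_leaf` parameter are hypothetical and used only for illustration.

```python
def sse(values):
    """Sum of squared deviations of the values from their mean."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(x, y, min_leaf=1):
    """Return (threshold, cost) of the split on x that minimizes the total SSE of y.

    min_leaf is the smallest number of records allowed in a partition.
    Returns None when no valid split exists.
    """
    pairs = sorted(zip(x, y))
    best = None
    for i in range(min_leaf, len(pairs) - min_leaf + 1):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between identical x values
        left = [p[1] for p in pairs[:i]]
        right = [p[1] for p in pairs[i:]]
        cost = sse(left) + sse(right)
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2.0
        if best is None or cost < best[1]:
            best = (threshold, cost)
    return best
```

A full regression tree would apply `best_split` recursively to each resulting partition until the minimum node size is reached.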
Matern Gaussian process regression (GPR)
The name-value pair argument "KernelFunction", "matern52" selects the Matern 5/2 kernel function. The Matern covariance function is characterized by two hyperparameters: the length scale (λ) and the smoothness parameter (ν). The length scale controls the range over which data points influence each other, while the smoothness parameter governs the smoothness of the resulting regression curve. The choice of these hyperparameters significantly affects the model's performance, and they must be carefully tuned for optimal results. The Matern GPR model assumes that the underlying data can be modeled as a continuous stochastic process whose degree of differentiability is set by ν; with ν = 5/2 the process is twice differentiable. This adaptability makes the model well suited to various applications, including environmental modeling, geostatistics, and time-series analysis. One important benefit of the Matern GPR model is its capability to handle various levels of smoothness, which makes it more versatile than other covariance functions used in GP regression. The model can transition from highly smooth curves (ν → ∞) to less smooth ones (ν = 0.5), capturing different degrees of complexity in the data. Furthermore, the Bayesian nature of the Matern GPR model provides a principled framework for uncertainty quantification. It estimates the mean regression curve and provides confidence intervals, allowing reliable uncertainty estimation in predictions. This is particularly valuable where uncertainty is critical, such as in decision-making and risk assessment. However, the computational complexity of the Matern GPR model can be a drawback for large datasets, as it involves calculating and inverting covariance matrices. Techniques such as sparse approximation methods and stochastic variational inference have been proposed to address this limitation and improve the model's scalability.
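To make the role of the length scale concrete, the Matern 5/2 covariance can be written out directly. The sketch below uses the standard closed form k(r) = σf²(1 + √5 r/λ + 5r²/(3λ²)) exp(−√5 r/λ); the function names and the signal standard deviation `sigma_f` are illustrative assumptions, and the code is independent of any particular toolbox.

```python
import math


def matern52(x1, x2, length_scale=1.0, sigma_f=1.0):
    """Matern 5/2 covariance between two input vectors.

    length_scale corresponds to the length scale in the text; sigma_f is an
    assumed signal standard deviation (the covariance at zero distance is
    sigma_f squared).
    """
    r = math.dist(x1, x2)
    s = math.sqrt(5.0) * r / length_scale
    return sigma_f ** 2 * (1.0 + s + s * s / 3.0) * math.exp(-s)


def covariance_matrix(points, length_scale=1.0, sigma_f=1.0):
    """Covariance matrix for a list of input vectors."""
    return [[matern52(p, q, length_scale, sigma_f) for q in points] for p in points]


if __name__ == "__main__":
    pts = [[0.0], [0.5], [2.0]]
    for row in covariance_matrix(pts, length_scale=1.0):
        print([round(v, 4) for v in row])
```

A larger `length_scale` makes distant inputs more strongly correlated, which produces smoother fitted curves.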
GPR models are nonparametric, kernel-based probabilistic models [78]. The fitrgp function can be used to train a GPR model. Consider the training set {(xi, yi); i = 1, 2, …, n}, drawn from an unknown distribution, where xi ∈ Rd and yi ∈ R. A GPR model addresses the problem of predicting the value of a response variable ynew given a new input vector xnew and the training data. A linear regression model has the form (Eq. 6):

$$ y = x^{T}\beta + \varepsilon $$

where ε ∼ N(0, σ2). The error variance σ2 and the coefficients β are estimated from the data. A GPR model describes the response by introducing latent variables, f(xi), from a Gaussian process (GP), together with explicit basis functions, h. Unlike a linear regression model, a GPR model is therefore a probabilistic model of the entire distribution of functions that can fit the training data.
Robust linear regression
Robust linear regression is a statistical methodology for estimating the parameters of a linear regression model when outliers or other departures from the assumptions of ordinary least squares (OLS) are present. Because OLS is sensitive to such anomalies, robust regression techniques, which are designed to be less affected by outliers, are more dependable when the data are noisy or non-normal. The aim of robust linear regression is to find a line that fits the data as closely as feasible while reducing the impact of outliers or influential observations. There are several robust regression techniques, each with advantages and disadvantages. The Huber M-estimator, the least trimmed squares (LTS) estimator, and the median absolute deviation (MAD) estimator are among the most commonly used. The Huber M-estimator down-weights the influence of outliers by using a weighting function that is less sensitive to large deviations. This method is particularly useful when the data contain a small number of extreme outliers, but it is less effective when the outliers are more numerous or when the underlying distribution is highly skewed. The LTS estimator selects the subset of the data that is most consistent with the linear model and then fits the model using only this subset. This approach is advantageous when the data contain a large number of outliers or when the outliers are highly influential. The MAD estimator uses the median absolute deviation of the residuals as a measure of the dispersion of the data. This method is beneficial when the data contain a moderate number of outliers and the distribution is not highly skewed.
In summary, robust linear regression is a valuable statistical method for estimating the parameters of a linear regression model in the presence of outliers or other deviations from the OLS assumptions. By limiting the influence of outliers, robust regression methods are more reliable and effective when the data are noisy or non-normal.
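The down-weighting idea behind the Huber M-estimator can be sketched with iteratively reweighted least squares for a single predictor. This is a simplified illustration and not the configuration used in the study: the function names are hypothetical, the tuning constant k = 1.345 is a conventional default, and the residual scale is estimated from the median absolute deviation.

```python
import statistics


def weighted_line(x, y, w):
    """Weighted least-squares fit of y = slope * x + intercept."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def huber_fit(x, y, k=1.345, iterations=50):
    """Huber M-estimate of a straight line via iteratively reweighted least squares."""
    weights = [1.0] * len(x)
    slope, intercept = weighted_line(x, y, weights)  # start from the OLS fit
    for _ in range(iterations):
        resid = [yi - (slope * xi + intercept) for xi, yi in zip(x, y)]
        med = statistics.median(resid)
        scale = statistics.median([abs(r - med) for r in resid]) / 0.6745
        if scale == 0:
            break  # residuals are degenerate; keep the current fit
        # Full weight for small residuals, reduced weight for large ones.
        weights = [1.0 if abs(r) <= k * scale else k * scale / abs(r) for r in resid]
        slope, intercept = weighted_line(x, y, weights)
    return slope, intercept
```

With a single gross outlier at the end of an otherwise linear series, the Huber slope stays much closer to the slope of the clean data than the OLS slope does.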
Bagged trees
The bagged trees model is a powerful ensemble learning technique used to decrease the variance of decision trees and improve overall predictive performance. The method, known as bootstrap aggregation or bagging, involves generating multiple re-sampled datasets from the original training data. Each dataset is created by randomly selecting data points with replacement, resulting in multiple variations of the training data. Several decision trees are then built from these re-sampled datasets. Each tree is trained independently on a different subset of the data, resulting in diverse trees with slightly different structures and predictions. This diversity is essential in reducing overfitting, as it helps capture different patterns and relationships in the data. One of the key benefits of bagging is that it combines the predictions of all the individual trees into a more robust and accurate combined output. Rather than relying on the prediction of a single decision tree, bagging aggregates the predictions from multiple trees, resulting in a more reliable and stable final prediction. Moreover, bagging helps to address the instability often associated with decision tree development. By randomly selecting new training datasets for each tree instead of using the original training data, bagging reduces the impact of outliers and noise in the data, leading to more reliable and consistent predictions. Bagged trees are particularly effective in handling complex and high-dimensional datasets, where a single decision tree may struggle to capture all the underlying patterns. The ensemble of trees produced by bagging works together to create a more comprehensive and accurate representation of the data. In addition to reducing variance and improving predictive performance, bagging provides a measure of uncertainty in the predictions.
The spread of the forecasts from the individual trees around their average gives an estimate of the uncertainty associated with the final forecast, which is crucial for decision-making and risk assessment. Bagged trees have been applied effectively in fields including business, healthcare, and ecological science, owing to their robustness and ability to handle challenging datasets. This ensemble learning technique has proven valuable in machine learning, particularly when dealing with complex and noisy data, where it consistently provides accurate and reliable predictions.
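The bagging procedure described above (bootstrap re-sampling with replacement, one tree per re-sample, averaged predictions) can be sketched as follows. For brevity the base learner is a one-split regression tree (a stump) on a single input; the function names, the number of trees, and the use of the spread across trees as an uncertainty indicator are illustrative assumptions, not the settings used in the study.

```python
import random
import statistics


def fit_stump(x, y):
    """Fit a one-split regression tree; returns (threshold, left_mean, right_mean)."""
    pairs = sorted(zip(x, y))
    ys = [p[1] for p in pairs]
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no split possible between identical x values
        left, right = ys[:i], ys[i:]
        ml, mr = statistics.fmean(left), statistics.fmean(right)
        cost = sum((v - ml) ** 2 for v in left) + sum((v - mr) ** 2 for v in right)
        if best is None or cost < best[0]:
            best = (cost, (pairs[i - 1][0] + pairs[i][0]) / 2.0, ml, mr)
    if best is None:  # all x identical: predict the overall mean everywhere
        m = statistics.fmean(ys)
        return (float("inf"), m, m)
    return best[1:]


def predict_stump(stump, xq):
    threshold, left_mean, right_mean = stump
    return left_mean if xq <= threshold else right_mean


def bagged_stumps(x, y, n_trees=50, seed=0):
    """Train one stump per bootstrap re-sample (sampling with replacement)."""
    rng = random.Random(seed)
    n = len(x)
    stumps = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        stumps.append(fit_stump([x[i] for i in idx], [y[i] for i in idx]))
    return stumps


def predict_bagged(stumps, xq):
    """Average the tree predictions; the spread across trees indicates uncertainty."""
    preds = [predict_stump(s, xq) for s in stumps]
    return statistics.fmean(preds), statistics.pstdev(preds)
```

In practice deeper trees would be used as base learners; the stump keeps the example short while preserving the re-sample, fit, and average structure.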
Ensemble modeling
Ensemble modeling is a popular and important research topic in data mining and machine learning. The generalization ability of a learning system can be significantly increased by training multiple base learners, combining them, and applying different versions of the learning system to the same problem. Ensemble learning has a wide range of potential applications. Numerous ensemble learning methods are available today, including bagging, boosting, and subspace methods, with bagging being one of the best known. Python's most important component is its matrix-based machine-learning language, which enables the most natural expression of computational mathematics.
Performance evaluation
Six statistical indices were used to quantify the accuracy of the ML models: the mean square error (MSE), root mean square error (RMSE), mean absolute relative error (MARE), mean absolute error (MAE), Nash-Sutcliffe efficiency (NSE), and coefficient of determination (r2). The MSE measures how close a fitted line is to the data points [117]. The RMSE represents the root mean square deviation of the predicted time series from the observed values [78]. The NSE is frequently used to assess how well hydrological models perform. It is often preferred to metrics such as the coefficient of determination [117], although it is sensitive to outliers because it sums the squared differences between the predicted and observed values. The MARE gauges the absolute error in the predicted values relative to the observed values [117]. The MAE is the mean absolute deviation of the forecast values from the observed values over the period. Additionally, r2 measures the strength of the linear relationship between the dependent and independent variables [117]. For simulating the SPI, models with greater r2 and NSE values (closer to 1) and smaller MSE, RMSE, MAE, and MARE values are deemed superior. In Eqs. 7, 8, 9 and 10, N is the number of observations, Oi and Pi are the observed and forecast (simulated) values for the ith data point, and OAvg and PAvg are the means of the observed and predicted values:
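The equations are not reproduced in this text. As a hedged illustration, the six indices can be computed from observed (O) and predicted (P) values using their standard definitions. Two readings are assumptions here: NSE is taken to be the Nash-Sutcliffe efficiency and MARE the mean absolute relative error. The function name `evaluate` is hypothetical, and observations equal to zero are skipped when computing MARE to avoid division by zero.

```python
import math


def evaluate(observed, predicted):
    """Return MSE, RMSE, MAE, MARE, NSE and r2 for paired observed/predicted values.

    Assumes the observed values are not all identical and the predicted
    values are not all identical (otherwise NSE or r2 is undefined).
    """
    n = len(observed)
    errors = [o - p for o, p in zip(observed, predicted)]
    o_avg = sum(observed) / n
    p_avg = sum(predicted) / n

    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)
    mae = sum(abs(e) for e in errors) / n

    # Relative error is undefined where the observation is zero; skip those points.
    rel = [abs(e) / abs(o) for e, o in zip(errors, observed) if o != 0]
    mare = sum(rel) / len(rel) if rel else float("nan")

    ss_res = sum(e * e for e in errors)
    ss_tot = sum((o - o_avg) ** 2 for o in observed)
    nse = 1.0 - ss_res / ss_tot

    cov = sum((o - o_avg) * (p - p_avg) for o, p in zip(observed, predicted))
    var_p = sum((p - p_avg) ** 2 for p in predicted)
    r2 = cov * cov / (ss_tot * var_p)  # squared Pearson correlation

    return {"MSE": mse, "RMSE": rmse, "MAE": mae, "MARE": mare, "NSE": nse, "r2": r2}
```

Under these definitions a perfect forecast gives NSE = 1 and r2 = 1, while all four error measures are zero.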
Results and discussion
This section presents the analyses carried out to develop ML models for forecasting SPI-3 and SPI-6 values in the study area. Feature selection was applied to the input variables before they were used in the ML models. The section covers the selection of input variables, the sensitivity analysis, the performance evaluation, and a comparative study of the ML models for drought prediction at the basin level. The models rely on input variables such as SPI-1 to SPI-6, which correspond to the SPI over one to six months and show the different scenarios of drought and future drought intensity at the basin level. The results can support policy development and water resources planning during drought conditions at the basin level. Meteorological drought affects agricultural production and the economy, among other sectors. Hence, five ML models (robust linear regression, bagged trees, boosted trees, support vector machines, and Matern Gaussian process regression) were selected to forecast future monthly SPI values at the 3- and 6-month timescales in the area. These forecasts can be important for understanding future drought at the basin level in semi-arid regions.
Input selection using the best subset model for SPI-3 and SPI-6
To develop the ML models for forecasting SPI-3 and SPI-6, regression studies were executed on various input combinations. Six different input combinations were tested to determine the best one for model development. The best input combinations were chosen using several statistical indicators: the selection criterion favors the highest R2 and adjusted R2 values and the lowest values of MSE, Mallows' Cp, Akaike's AIC, and Amemiya's PC. Results of the regression analysis for SPI-3 and SPI-6 prediction are presented in Tables 1 and 2, respectively. From Table 1, it is evident that the best input combination for the SPI-3 forecast is combination 5, which includes SPI-1/SPI-3/SPI-4/SPI-5/SPI-6, with the highest R2 and adjusted R2 values of 0.746 and 0.741, respectively. Similarly, Table 2 shows that combination 2, which includes SPI-1/SPI-2, is the best input combination for the SPI-6 forecast, with the highest R2 and adjusted R2 values of 0.842 and 0.840, respectively.
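To make the selection criteria concrete, the sketch below implements three of them (adjusted R2, a least-squares form of AIC, and Mallows' Cp) using common textbook formulas. The exact variants used to produce Tables 1 and 2 are not stated in the text, so these forms, the function names, and the sample size `n = 300` are assumptions.

```python
import math


def adjusted_r2(r2, n, p):
    """Adjusted R2 for n observations and p predictors (intercept excluded)."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


def aic_least_squares(sse, n, p):
    """AIC, up to an additive constant, for a least-squares fit
    with p predictors plus an intercept."""
    return n * math.log(sse / n) + 2 * (p + 1)


def mallows_cp(sse_subset, mse_full, n, p):
    """Mallows' Cp for a subset with p predictors plus an intercept;
    mse_full is the residual variance estimate of the full model."""
    return sse_subset / mse_full - n + 2 * (p + 1)


# Hypothetical sample size; R2 and the number of predictors follow combination 5
n, p, r2 = 300, 5, 0.746
adj = adjusted_r2(r2, n, p)
```

Adjusted R2 is always at most R2 and penalizes every extra predictor, which is why a smaller combination can win even when its raw R2 is slightly lower. For the full model itself, Cp reduces to the number of fitted parameters, which gives a quick sanity check on an implementation.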
Sensitivity analysis
The sensitivity analysis was conducted on the input factors to determine which were the most significant for the dataset of this area. The results of this analysis, used to forecast SPI-3 and SPI-6, are presented in Tables 3 and 4, respectively. Table 3 shows that SPI(t-1), SPI(t-3), SPI(t-4), SPI(t-5), and SPI(t-6) were identified as active factors for predicting SPI-3, with absolute standardized coefficient (β) values of 0.047, 0.077, 0.089, 0.089, and 0.065, respectively. For the SPI-6 forecast, Table 4 shows that SPI(t-1) (β = 0.065) and SPI(t-2) (β = 0.065) were identified as active factors. Figures 3 and 4 give a graphical representation of the active input factors. The correlations between the variables for SPI-3 and SPI-6 are given in Tables 5 and 6, respectively. For SPI-3 and SPI-6, the strongest correlations are found between adjacent inputs (SPI-2 and SPI-1, SPI-3 and SPI-2, SPI-4 and SPI-3, SPI-5 and SPI-4, and SPI-6 and SPI-5), with coefficients of around 0.857 and 0.916, respectively.
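The sketch below shows one way to build lagged inputs, correlate them with the target, and convert a raw regression coefficient into a standardized coefficient (β). It assumes that the inputs are lagged values of the SPI series, as the SPI(t-1) to SPI(t-6) notation suggests; the function names and the short series are hypothetical.

```python
import math
from statistics import mean, stdev


def lagged_inputs(series, max_lag):
    """Build rows [x(t-1), ..., x(t-max_lag)] with target x(t)."""
    rows, targets = [], []
    for t in range(max_lag, len(series)):
        rows.append([series[t - k] for k in range(1, max_lag + 1)])
        targets.append(series[t])
    return rows, targets


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length series."""
    a_bar, b_bar = mean(a), mean(b)
    cov = sum((x - a_bar) * (y - b_bar) for x, y in zip(a, b))
    var_a = sum((x - a_bar) ** 2 for x in a)
    var_b = sum((y - b_bar) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def standardized_coefficient(b, x, y):
    """Rescale a raw regression coefficient b by sd(x) / sd(y)."""
    return b * stdev(x) / stdev(y)


# Hypothetical monthly SPI values
spi = [0.2, -0.5, -1.1, -0.3, 0.4, 0.9, 0.1, -0.7]
rows, targets = lagged_inputs(spi, max_lag=3)
lag1 = [row[0] for row in rows]  # SPI(t-1) column
r_lag1 = pearson(lag1, targets)
```

Standardizing removes the effect of units, so the β values of different inputs can be compared directly when ranking their influence.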
Assessment of ML models using best combinations of subset models
In this Indian river basin study, meteorological drought was predicted at the short-term SPI-3 and SPI-6 timescales using five ML models. The accuracy of the ML models was measured with seven performance metrics (r, R2, MSE, RMSE, MAE, MARE, and NSE), and the results were compared to identify the best ML model for estimating SPI-3 and SPI-6 at the basin level. The SPI datasets were split into 80% for training and 20% for testing. The detailed model outcomes for SPI-3 and SPI-6 during the training and testing phases are presented in Tables 7 and 8. Accurate predictions of SPI-3 and SPI-6 are helpful for mitigation planning and for developing sustainable activities that reduce the impact of meteorological drought on crops and other sectors. The performance of the ML models was tested to determine which model performed best for multiscale SPI forecasting across the river basin, in support of crop and water resources planning. The selection criteria for the best ML model are smaller error values (MSE, RMSE, MAE, and MARE) close to zero and larger NSE, r, and R2 values close to one. For SPI-3 (Table 7), the MSE, RMSE, MAE, MARE, NSE, r, and R2 values of the Matern Gaussian process regression model during the training phase are 0.18, 0.42, 0.33, 1.34, 0.69, 0.92, and 0.84, respectively. During the testing phase, the corresponding values are 0.48, 0.69, −0.85, 0.54, −0.50, 0.89, and 0.95, respectively. For SPI-6, the Matern Gaussian process regression model also performed better than the other developed models during the training and testing periods, as shown in Table 8. Its values (training, testing) are MSE (0.05, 0.88), RMSE (0.22, 0.94), MAE (0.16, −0.59), MARE (0.50, 0.55), NSE (0.91, −0.36), r (0.99, 0.86), and R2 (0.99, 0.93).
The study also included a graphical analysis of the five ML models (robust linear regression, bagged trees, boosted trees, SVM, and Matern Gaussian process regression) during the testing phase, using line plots and scatter plots. The results for the SPI-3 and SPI-6 forecasts are shown in Figs. 5 and 6, respectively. The graphs show that the predictions of the developed ML models lie close to the 1:1 line, indicating high prediction accuracy. The Matern Gaussian process regression model was found to be the most accurate ML model for predicting SPI-3 and SPI-6 values in the basin area. The main objective of the study is to forecast meteorological drought in the semi-arid region under future climate conditions.
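The 80%/20% partition can be sketched as below. The text does not say whether the split was chronological or random; this example assumes a chronological split, which keeps future months out of the training set. The function name and the series length of 360 months are hypothetical.

```python
def chronological_split(rows, targets, train_fraction=0.8):
    """Split paired inputs and targets in time order, without shuffling."""
    cut = round(len(rows) * train_fraction)
    return (rows[:cut], targets[:cut]), (rows[cut:], targets[cut:])


# Hypothetical series of 360 monthly samples with one lagged input each
rows = [[float(i)] for i in range(360)]
targets = [float(i + 1) for i in range(360)]
(train_x, train_y), (test_x, test_y) = chronological_split(rows, targets)
```

With a random split, neighboring months could land on both sides of the partition, and the strong month-to-month correlation of the SPI would then make test scores look better than they are.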
Discussion
Agricultural and meteorological droughts are severe natural disasters that directly affect natural resources and the climate. They badly affect the livelihoods of people who depend almost entirely on agriculture. Several droughts have occurred in the country in the last two decades. Therefore, numerous scientists have explored and documented drought prevention and program development studies for India [12, 40, 42, 81, 86, 91, 96, 102]. In the present investigation, the usefulness of hybrid ML models, i.e., robust linear regression, bagged trees, boosted trees, SVM, and Matern GPR, was evaluated for forecasting SPI-3 and SPI-6. Among these models, Matern Gaussian process regression showed the best performance in the training and testing phases for both SPI timescales. In a study conducted by Osmani et al. [74], various methods and algorithms were used to estimate the number of monthly dry days in Northern Bangladesh. Different ML approaches, including ensemble, bagging, and stacking methods, have been used to improve forecast accuracy and to identify the best models for problems related to natural resources, climate, and the atmosphere [2, 3]. The methods and models recommended in that study can be considered reliable for predicting monthly dry days in the region. Our results were also compared with other recent studies conducted in various areas, such as Bangladesh, China, Ethiopia, and India. In both the training and testing phases, the long-term SPI forecasting model produces more precise forecasts than the short-term SPI models [112]. The current results agree with the research in [6, 50]. This reflects the fact that long-term rainfall patterns vary less than short- and medium-term rainfall patterns [16, 31].
Further, a previous study assessed the capability of different meta-heuristic methods using random forest, boosted trees, and Matern GPR for predicting the crop water efficiency of wheat and maize under limited climatic conditions, and found the Matern Gaussian process and bagged trees to be the best models for wheat and maize, respectively. In addition, another study developed a drought index using SVM, Matern GPR, and bagged trees for the Ganga River Basin in India. Its results show that the Matern Gaussian process and the support vector machine produced the best outcomes compared with the others. The outcomes of the present study demonstrate the capability of different hybrid ML algorithms to forecast monthly SPI-3 and SPI-6. The Matern Gaussian process regression model performed better than the robust linear regression, bagged trees, boosted trees, and SVM models for both SPI-3 and SPI-6 forecasts in the training and testing periods. The study found that the long-term SPI model was more accurate than the short-term model when forecasting SPI-3 and SPI-6 values with ML models during the training and testing periods. The results showed a significant improvement in accuracy with the long-term model. The present results corroborate the findings of Aghelpour and Varshavian [6], Yaseen et al. [112], and Osmani et al. [74]. Citakoglu and Coşkun [25] investigated meteorological drought in Turkey using hybrid ML and comparison methods for forecasting short-term deficits. In [26, 73], SPI forecasting was based on LSTM and other advanced models. Compared with these previous results and models, the present methodology applies novel approaches to forecasting SPI-3 and SPI-6 drought in the study area. Many current problems are related to deficient rainfall and to dry periods that have lasted longer than the rainy seasons over the last five years.
Climate change directly affects rainfall and increases the demand for irrigation water, and many villages face drought due to insufficient rain across the basin. Hence, these research results are essential for understanding the rainfall deficit and how it affects the area's natural resources. The results of this study are critical for understanding the drought trend in the basin area, and such results can contribute to achieving the sustainable development goals. The Taylor diagrams are presented in Fig. 7A–D. They show the performance of the SPI-3 and SPI-6 drought forecasting models during the training and testing phases, respectively.
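A Taylor diagram summarizes three statistics for each model: the standard deviations of the observed and predicted series, their correlation, and the centered RMSE. The sketch below computes them with population (divide by N) statistics, which is the convention under which the law-of-cosines relation behind the diagram holds exactly. The function name and the sample values are illustrative assumptions, not data from the study.

```python
import math


def taylor_statistics(observed, predicted):
    """Return (sd_obs, sd_pred, r, centered_rmse) using population statistics."""
    n = len(observed)
    o_avg = sum(observed) / n
    p_avg = sum(predicted) / n
    o_dev = [o - o_avg for o in observed]
    p_dev = [p - p_avg for p in predicted]
    sd_o = math.sqrt(sum(d * d for d in o_dev) / n)
    sd_p = math.sqrt(sum(d * d for d in p_dev) / n)
    r = sum(a * b for a, b in zip(o_dev, p_dev)) / (n * sd_o * sd_p)
    # Centered RMSE: RMSE after removing the mean of each series
    crmse = math.sqrt(sum((b - a) ** 2 for a, b in zip(o_dev, p_dev)) / n)
    return sd_o, sd_p, r, crmse


# Hypothetical observed and predicted values
obs = [1.0, 2.0, 3.0, 4.0]
pred = [1.5, 2.0, 2.5, 4.0]
sd_o, sd_p, r, crmse = taylor_statistics(obs, pred)
```

The three quantities are tied together by crmse² = sd_o² + sd_p² − 2·sd_o·sd_p·r, which is what allows them to be drawn on a single two-dimensional plot.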
Advantage of proposed models
1. The Matern GPR model achieved greater accuracy in predicting SPI-3 and SPI-6 values, with R2 values of 0.95 and 0.93, respectively.
2. The models use a combination of ten input factors (SPI-1 to SPI-10) to improve the robustness and accuracy of the forecasts, ensuring a thorough study of drought situations.
3. The ML models were improved with bagging and boosting methods, which enhanced their performance and reliability.
4. This study represents a major application of these innovative ML models for predicting future drought situations in the study area.
5. The models can be adapted to other semi-arid areas and possibly to other climatic conditions, making them a useful tool for drought forecasting.
Research limitations
In general, our research gives meaningful results for predicting short- and long-term drought, i.e., SPI-3 and SPI-6, by applying various ML models to a 30-year climatic time series (1989–2019).
The SPI used in the present study (SPI-3 and SPI-6) is a drought index recommended by the World Meteorological Organization (WMO) and widely used to monitor short- and long-term soil moisture deficits, but it is derived from precipitation data alone, which restricts its ability to characterize droughts on an extensive scale in the semi-arid area. Forecasting SPI-3 and SPI-6 from additional atmospheric factors has been shown to be a better approach, but owing to limited data availability only a single climatic indicator, precipitation, is currently used to predict SPI-3 and SPI-6 values as indicators of drought conditions. The forecast accuracy could be improved further by including other climatic variables such as wind speed, air pressure, relative humidity, water vapor pressure, and evapotranspiration [43, 74, 76]. In terms of ML models, we studied mostly tree-based models, and we aim to extend the work to other approaches, including deep learning models and advanced methods [112]. We divided the 30-year dataset into two subsets, training and testing. This helped to avoid overfitting and to confirm reliable ML model performance. Furthermore, in future work, assessing the algorithms with a relative importance analysis could give more insight into the variables associated with drought occurrence.
Conclusion
The current research investigated whether machine learning models can help predict the SPI-3 and SPI-6 drought indices in the basin area. The study relied on monthly precipitation data collected at a single meteorological station from 1989 to 2019. A statistical autocorrelation approach was used to develop the forecasting models, which showed promising outcomes. For forecasting SPI-3, the best input combination comprised SPI-1, SPI-3, SPI-4, SPI-5, and SPI-6, with the highest R2 of 0.746 and the lowest MSE of 0.48. For forecasting SPI-6, the best input combination was SPI-1 and SPI-2, which yielded the highest R2 of 0.842 and the lowest MSE of 0.426. The forecasted outcomes were consistent with those obtained using the Matern GPR models. The Matern GPR model gave the best results for SPI-3 and SPI-6, with RMSE values of 0.42 and 0.22 during training and 0.69 and 0.94 during testing, respectively. Based on the quantitative and qualitative investigation, the Matern GPR model was the most accurate model for forecasting SPI-3 and SPI-6 in the entire basin area. Using ML models to predict drought has become increasingly popular in recent years. Among the various models available, those that rely on the SPI have shown the best performance. Such models can provide accurate forecasts for both short-term and mid-term drought situations. The results for the study area can significantly inform drought and water resources management plans. For instance, precise drought prediction can help decision-makers plan water allocation and irrigation, which is necessary for understanding the effect on crop production and food security. Overall, the use of machine learning models for drought prediction has the potential to change the way we manage water resources and plan for future drought events.
Future scope and direction
Based on the results of the present investigation on predicting SPI-3 and SPI-6 using ML models, the following directions and applications can be considered for future studies. Other climatic and environmental variables (e.g., temperature, soil moisture, vegetation indices) can be included to enhance the accuracy of drought predictions. Advanced ML models such as Long Short-Term Memory (LSTM) networks, Convolutional Neural Networks (CNN), and ensemble learning methods can be explored for possibly better performance. The models can be applied at various spatial scales (e.g., sub-basin or regional level) to capture localized drought dynamics. The socioeconomic impacts of the predicted drought scenarios on agriculture, water resources, and local communities can be assessed. The ML models can be adapted to account for changing climatic conditions and to improve their resilience to future scenarios. By following these directions, the potential of ML models in drought forecasting can be fully realized, leading to more effective and proactive water resource management and agricultural plans and policies.
Availability of data and materials
No datasets were generated or analyzed during the current study.
References
Abdel-Fattah MK, Mokhtar A, Abdo AI (2021) Application of neural network and time series modeling to study the suitability of drain water quality for irrigation: a case study from Egypt. Environ Sci Pollut Res 28:898–914
Achite M, Katipoğlu OM, Jehanzaib M, Elshaboury N, Kartal V, Ali S (2023) Hydrological drought prediction based on hybrid extreme learning machine: Wadi mina basin case study, Algeria. Atmosphere 14(9):1447. https://doi.org/10.3390/atmos14091447
Achite M, Katipoğlu OM, Şenocak S et al (2023) Modeling of meteorological, agricultural, and hydrological droughts in semi-arid environments with various machine learning and discrete wavelet transform. Theor Appl Climatol 154:413–451. https://doi.org/10.1007/s00704-023-04564-4
Adamowski JF (2008) Development of a short-term river flood forecasting method for snowmelt driven floods based on wavelet and cross-wavelet analysis. J Hydrol (Amst) 353:247–266
AghaKouchak AA (2015) Multivariate approach for persistence-based drought prediction: application to the 2010–2011 East Africa drought. J Hydrol 526:127–135. https://doi.org/10.1016/j.jhydrol.2014.09.063
Aghelpour P, Varshavian V (2021) Forecasting different types of droughts simultaneously using multivariate standardized precipitation index (MSPI), MLP neural network, and imperialistic competitive algorithm (ICA). Complexity 2021:1–16
Akter KS, Rahman MM (2012) Spatio-temporal quantification and characterization of drought patterns in Bangladesh. J Water Environ Technol 10:277–288
Alamgir M et al (2015) Analysis of meteorological drought pattern during different climatic and cropping seasons in Bangladesh. J Am Water Resour Assoc 51:794–806
Al Mamun MA, Sarker MR, Sarkar MAR et al (2024) Identification of influential weather parameters and seasonal drought prediction in Bangladesh using machine learning algorithm. Sci Rep 14:566. https://doi.org/10.1038/s41598-023-51111-2
Alshahrani MA, Laiq M, Noor-ul-Amin M et al (2024) A support vector machine based drought index for regional drought analysis. Sci Rep 14:9849. https://doi.org/10.1038/s41598-024-60616-3
Bai Y, Chen Z, Xie J, Li C (2016) Daily reservoir inflow forecasting using multiscale deep feature learning with hybrid models. J Hydrol 532:193–206
Bandyopadhyay N, Bhuiyan C, Saha AK (2020) Drought mitigation: critical analysis and proposal for a new drought policy with special reference to Gujarat (India). Prog Dis Sci 5:100049
Barua S, Ng AWM, Perera BJC (2012) Artificial neural network–based drought forecasting using a nonlinear aggregated drought index. J Hydrol Eng 17:1408–1413
Begum K et al (2019) Modelling greenhouse gas emissions and mitigation potentials in fertilized paddy rice fields in Bangladesh. Geoderma 341:206–215
Belayneh A, Adamowski J, Khalil B, Ozga-Zielinski B (2014) Long-term SPI drought forecasting in the Awash River Basin in Ethiopia using wavelet neural networks and wavelet support vector regression models. J Hydrol 508:418–429
Belayneh A, Adamowski J (2013) Drought forecasting using new machine learning methods. J Water Land Dev 18:3–12. https://doi.org/10.2478/jwld-2013
Blumenstock G (1942) Drought in the United States analyzed by means of the theory of probability.
Broccoli AJ, Manabe S (1992) The effects of orography on midlatitude Northern Hemisphere dry climates. J Clim 5:1181–1201
Byun HR, Kim DW (2010) Comparing the effective drought index and the standardized precipitation index. Options Méditerr Sér A Mediterr Semin 89:85–89
Byun HR, Wilhite DA (1999) Objective quantification of drought severity and duration. J Clim 12:2747–2756
Caloiero T, Coscarelli R, Ferrari E, Sirangelo B (2015) Analysis of dry spells in southern Italy (Calabria). Water (Basel) 7:3009–3023
Choubin B et al (2014) Drought forecasting in a semi-arid watershed using climate signals: a neuro-fuzzy modeling approach. J Mt Sci 11(6):1593–1605. https://doi.org/10.1007/s11629-014-3020-6
Choubin B et al (2019) Extreme Hydrology and Climate Variability. Elsevier, Amsterdam
Cindrić K, Pasarić Z, Gajić-Čapka M (2010) Spatial and temporal analysis of dry spells in Croatia. Theor Appl Climatol 102:171–184. https://doi.org/10.1007/s00704-010-0250-6
Citakoglu H, Coşkun Ö (2022) Comparison of hybrid machine learning methods for the prediction of short-term meteorological droughts of Sakarya Meteorological Station in Turkey. Environ Sci Pollut Res 29(50):75487–75511
Coşkun Ö, Citakoglu H (2023) Prediction of the standardized precipitation index based on the long short-term memory and empirical mode decomposition-extreme learning machine models: the case of Sakarya, Türkiye. Phys Chem Earth Parts A/B/C 131:103418
Dai A, Zhao T (2017) Uncertainties in historical changes and future projections of drought. Part I: estimates of historical drought changes. Clim Change 144:519–533
Dai A, Trenberth KE, Karl TR (1998) Global variations in droughts and wet spells: 1900–1995. Geophys Res Lett 25:3367–3370
Deo RC, Şahin M (2015) Application of the extreme learning machine algorithm for the prediction of monthly effective drought index in eastern Australia. Atmos Res 153:512–525
Du W, Wang G (2013) Intra‐event spatial correlations for cumulative absolute velocity, arias intensity, and spectral accelerations based on regional site conditions. B Seismol Soc Am 103(2A):1117–1129. https://doi.org/10.1785/0120120185
Elbeltagi A, Kumar M, Kushwaha NL et al (2023) Drought indicator analysis and forecasting using data driven models: case study in Jaisalmer, India. Stoch Environ Res Risk Assess 37:113–131. https://doi.org/10.1007/s00477-022-02277-0
En-Nagre K et al (2024) Assessment and prediction of meteorological drought using machine learning algorithms and climate data. Clim Risk Manag 45:100630. https://doi.org/10.1016/j.crm.2024.100630
Fung et al (2019) Drought forecasting: a review of modelling approaches 2007–2017. J Water Clim Change 11:771–799
Granata F (2019) Evapotranspiration evaluation models based on machine learning algorithms-a comparative study. Agric Water Manag 217:303–315
Hao Z, Singh VP, Xia Y (2018) Seasonal drought prediction: advances, challenges, and future prospects. Rev Geophys 56:108–141
Hardin AW, Liu Y, Cao G, Vanos JK (2018) Urban heat island intensity and spatial variability by synoptic weather type in the northeast US. Urban Clim 24:747–762
Hudson HE, Hazen R (1964) Droughts and low streamflow. Handb Appl Hydrol 18:1–26
Huschke RE (1959) Glossary of meteorology. American Meteorological Society
Jain VK, Pandey RP, Jain MK (2015) Spatio-temporal assessment of vulnerability to drought. Nat Hazards 76:443–469
Javadinejad S, Dara R, Jafary F (2021) Analysis and prioritization the effective factors on increasing farmers resilience under climate change and drought. Agric Res 10:497–513
Kamruzzaman M, Hwang S, Cho J, Jang MW, Jeong H (2019) Evaluating the spatiotemporal characteristics of agricultural drought in Bangladesh using effective drought index. Water (Switzerland) 11:2437
Katipoğlu OM (2023) Prediction of streamflow drought index for short-term hydrological drought in the semi-arid Yesilirmak basin using wavelet transform and artificial intelligence techniques. Sustainability 15(2):1109. https://doi.org/10.3390/su15021109
Khan N et al (2020) Prediction of droughts over Pakistan using machine learning algorithms. Adv Water Resour 139:103562
Kumari S, Kumar D, Kumar M (2023) Modeling of standardized groundwater index of Bihar using machine learning techniques. Phys Chem Earth Parts A/B/C 130:103395
Kim KS (1968) Water budgets of the 10 big river valleys of South Korea. J Korean Meteorol Soc 4:1–13
Le JA, El-Askary HM, Allali M, Struppa DC (2017) Application of recurrent neural networks for drought projections in California. Atmos Res 188:100–106
Li X, Meshgi A, Babovic V (2016) Spatio-temporal variation of wet and dry spell characteristics of tropical precipitation in Singapore and its association with ENSO. Int J Climatol 36:4831–4846
Li R, Zhu G, Lu S et al (2023) Effects of urbanization on the water cycle in the Shiyang River basin: based on a stable isotope method. Hydrol Earth Syst Sci 27(24):4437–4452. https://doi.org/10.5194/hess-27-4437-2023
Linsley RK, Kohler MA, Paulhus JLH (1958) Hydrology for engineers. McGraw-Hill Book Co
Malik A, Kumar A, Rai P, Kuriqi A (2021) Prediction of multi-scalar standardized precipitation index by using artificial intelligence and regression models. Climate 9:28. https://doi.org/10.3390/cli9020028
Masroor M, Rehman S, Avtar R, Sahana M, Ahmed R, Sajjad H (2020) Exploring climate variability and its impact on drought occurrence: evidence from Godavari middle sub-basin, India. Weather Clim Extrem 30:100277
Masinde M (2014) Artificial neural networks models for predicting effective drought index: factoring effects of rainfall variability. Mitig Adapt Strategy Glob Chang 19:1139–1162
McKee TB, Doesken NJ, Kleist J (1993) The relationship of drought frequency and duration to time scales. In: Proceedings of the 8th Conference on Applied Climatology, 17, 179–183. Boston
Mehran A, Mazdiyasni O, AghaKouchak AA (2015) Hybrid framework for assessing socioeconomic drought: linking climate variability, local resilience, and demand. J Geophys Res Atmos 120:7520–7533
Mishra AK, Desai VR (2006) Drought forecasting using feed-forward recursive neural network. Ecol Model 198:127–138
Mishra AK, Singh VP (2010) A review of drought concepts. J Hydrol (Amst) 391:202–216
Mondal MH (2010) Crop agriculture of Bangladesh: challenges and opportunities. Bangladesh J Agric Res 35:235–245
Moradkhani H, Meier M (2010) Long-lead water supply forecast using large-scale climate predictors and independent component analysis. J Hydrol Eng. https://doi.org/10.1061/(ASCE)HE.1943-5584.0000246
Moreira EE, Coelho CA, Paulo AA, Pereira LS, Mexia JT (2008) SPI-based drought category prediction using loglinear models. J Hydrol (Amst) 354:116–130
Mouatadid S, Raj N, Deo RC, Adamowski JF (2018) Input selection and data-driven model performance optimization to predict the Standardized precipitation and Evaporation index in a drought-prone region. Atmos Res 212:130–149
Mokhtar A et al (2021) Estimation of SPEI meteorological drought using machine learning algorithms. IEEE Access 9:65503–65523
Mullick MRA, Nur MRM, Alam MJ, Islam KMA (2019) Observed trends in temperature and rainfall in Bangladesh using pre-whitening approach. Glob Planet Change 172:104–113
Nakano M, Kanada S, Kato T, Kurihara K (2011) Monthly maximum number of consecutive dry days in Japan and its reproducibility by a 5-km-mesh cloud-system resolving regional climate model. Hydrol Res Lett 5:11–15
Naser MM (2015) Climate change and migration: law and policy perspectives in Bangladesh. Asian J Law Soc 2:35–53
Nastos PT, Zerefos CS (2009) Spatial and temporal variability of consecutive dry and wet days in Greece. Atmos Res 94:616–628
Nimac I, Herceg-Bulić I, Žuvela-Aloise M, Žgela M (2022) Impact of North Atlantic Oscillation and drought conditions on summer urban heat load-a case study for Zagreb. Int J Climatol 42:4850–4867
Nikdad P, Mohammadi Ghaleni M, Moghaddasi M et al (2024) Enhancing a machine learning model for predicting agricultural drought through feature selection techniques. Appl Water Sci 14:125. https://doi.org/10.1007/s13201-024-02193-4
Pande CB, Moharir KN, Varade AM et al (2023) Intertwined impacts of urbanization and land cover change on urban climate and agriculture in Aurangabad city (MS), India using google earth engine platform. J Clean Prod 422:138541. https://doi.org/10.1016/j.jclepro.2023.138541
Pande CB (2020) Sustainable watershed development planning. In: Sustainable watershed development. SpringerBriefs in water science and technology. Springer, Cham. https://doi.org/10.1007/978-3-030-47244-3_4
Pande CB, Moharir KB, Singh SK et al (2021) Estimation of crop and forest biomass resources in a semi-arid region using satellite data and GIS. J Saudi Soc Agric Sci 20(5):302–311. https://doi.org/10.1016/j.jssas.2021.03.002
Oliver JE (2008) Encyclopedia of world climatology. Springer, Berlin
Meseguer-Ruiz O et al (2024) Comparing SPI and SPEI to detect different precipitation and temperature regimes in Chile throughout the last four decades. Atmos Res 297:107085. https://doi.org/10.1016/j.atmosres.2023.107085
Coşkun Ö, Citakoglu H (2023) Prediction of the standardized precipitation index based on the long short-term memory and empirical mode decomposition-extreme learning machine models: the case of Sakarya, Türkiye. Phys Chem Earth Parts A/B/C. https://doi.org/10.1016/j.pce.2023.103418
Osmani SA, Kim JS, Jun C, Sumon MW, Baik J, Lee J (2022) Prediction of monthly dry days with machine learning algorithms: a case study in Northern Bangladesh. Sci Rep 12(1):19717
Poornima and Pushpalatha (2019) Drought prediction based on SPI and SPEI with varying timescales using LSTM recurrent neural network. Soft Comput 23:8399–8412. https://doi.org/10.1007/s00500-019-04120-1
Khan N, Sachindra DA, Shahid S, Ahmed K, Shiru MS, Nawaz N (2020) Prediction of droughts over Pakistan using machine learning algorithms. Adv Water Resour 139:103562. https://doi.org/10.1016/j.advwatres.2020.103562
Raja DR, Hredoy MSN, Islam MdK, Islam KMA, Adnan MSG (2021) Spatial distribution of heatwave vulnerability in a coastal city of Bangladesh. Environ Chall 4:100122
Rasmussen CE (2003) Gaussian processes in machine learning. In: Summer school on machine learning. Springer, Berlin
Sadrtdinova R, Augusto Corzo Perez G, Solomatine DP (2024) Improved drought forecasting in Kazakhstan using machine and deep learning: a non-contiguous drought analysis approach. Hydrol Res 55(2):237–261
Ridwan WM et al (2021) Rainfall forecasting model using machine learning methods: case study Terengganu, Malaysia. Ain Shams Eng J 12:1651–1663
Saini M, Dutta V, Joshi PK (2021) Reassessment of drought management policies for India: learning from Israel, Australia, and China. Environ Sustain 4(4):671–689
Mohammed S et al (2024) Utilizing machine learning and CMIP6 projections for short-term agricultural drought monitoring in central Europe (1900–2100). J Hydrol 633:130968
Schmidli J, Frei C (2005) Trends of heavy precipitation and wet and dry spells in Switzerland during the 20th century. Int J Climatol 25:753–771
Shah SMA, Hasan GMJ (2014) Statistical analysis and trends of dry days in Sylhet region of Bangladesh. J Urban Environ Eng 8:48–58
Shahbazi ARN, Zahraie B, Sedghi H, Manshouri M, Nasseri M (2011) Seasonal meteorological drought prediction using support vector machine. World Appl Sci J 13:1387–1397
Sharma A (2023) Drought risk management in Madhya Pradesh, India: a policy perspective. Int J Emerg Manag 18(1):23–46
Shi J et al (2018) Trends in the consecutive days of temperature and precipitation extremes in China during 1961–2015. Environ Res 161:381–391
Shaowei Z, Hongchao Z, Pengcheng R, Guangjie X, Bangdong L, Wencheng D, Liying W (2013) Application of standardized precipitation evapotranspiration index in China. Clim Environ Res 18:617–625
Shivam G, Goyal MK, Sarma AK (2019) Index-based study of future precipitation changes over Subansiri river catchment under changing climate. J Environ Inf 34:1–14
Singh N, Ranade A (2010) The wet and dry spells across India during 1951–2007. J Hydrometeorol 11:26–45
Singh NP, Bantilan C, Byjesh K (2014) Vulnerability and policy relevance to drought in the semi-arid tropics of Asia–a retrospective analysis. Weather Clim Extremes 3:54–61
Sirangelo B, Caloiero T, Coscarelli R, Ferrari E (2017) Stochastic analysis of long dry spells in Calabria (Southern Italy). Theor Appl Climatol 127:711–724
Solomatine D, Dulal KN (2003) Model trees as an alternative to neural networks in rainfall–runoff modelling. Hydrol Sci J 48:399–411
Solomon S, Manning M, Marquis M, Qin D (2007) Climate change 2007-the physical science basis: working group I contribution to the fourth assessment report of the IPCC. Cambridge University Press, Cambridge
Stott PA et al (2016) Attribution of extreme weather and climate-related events. Wiley Interdiscip Rev Clim Change 7:23–41
Subbiah AR (1993) Indian drought management: from vulnerability to resilience. Drought Assess Manag Plan Theory Case Stud. https://doi.org/10.1007/978-1-4615-3224-8_9
Tolika K, Maheras P (2005) Spatial and temporal characteristics of wet spells in Greece. Theor Appl Climatol 81:71–85
Touma D, Ashfaq M, Nayak MA, Kao SC, Diffenbaugh NS (2015) A multi-model and multi-index evaluation of drought characteristics in the 21st century. J Hydrol (Amst) 526:196–207
Tian H, Huang N, Niu Z, Qin Y, Pei J, Wang J (2019) Mapping winter crops in China with Multi-source satellite imagery and phenology-based algorithm. Remote Sens 11(7):820. https://doi.org/10.3390/rs11070820
Tian H, Pei J, Huang J, Li X, Wang J, Zhou B, Wang L (2020) Garlic and winter wheat identification based on active and passive satellite imagery and the Google Earth Engine in Northern China. Remote Sens 12(21):3539. https://doi.org/10.3390/rs12213539
Luo J, Wang G, Li G, Pesce G (2022) Transport infrastructure connectivity and conflict resolution: a machine learning analysis. Neural Comput Appl 34(9):6585–6601. https://doi.org/10.1007/s00521-021-06015-5
Udmale P, Ichikawa Y, Manandhar S, Ishidaira H, Kiem AS (2014) Farmers' perception of drought impacts, local adaptation and administrative mitigation measures in Maharashtra State, India. Int J Dis Risk Reduct 10:250–269
Vicente-Serrano SM, Beguería S, López-Moreno JI (2010) A multiscalar drought index sensitive to global warming: The standardized precipitation evapotranspiration index. J Clim 23:1696–1718
Wilhite DA, Glantz MH (1985) Understanding the drought phenomenon: the role of definitions. Water Int 10:111–120
Wilhite DA (2000) Drought as a natural hazard: concepts and definitions.
Xie D, Huang H, Feng L et al (2023) Aboveground biomass prediction of arid shrub-dominated community based on airborne LiDAR through parametric and nonparametric methods. Remote Sens 15(13):3344. https://doi.org/10.3390/rs15133344
Xiang X, Zhou J, Deng Y et al (2024) Identifying the generator matrix of a stationary Markov chain using partially observable data. Chaos: J Nonlinear Sci 34(2):023132. https://doi.org/10.1063/5.0156458
Xie X, Xie B, Cheng J et al (2021) A simple Monte Carlo method for estimating the chance of a cyclone impact. Nat Hazards 107(3):2573–2582. https://doi.org/10.1007/s11069-021-04505-2
Xu J, Zhou G, Su S et al (2022) The development of a rigorous model for bathymetric mapping from multispectral satellite-images. Remote Sens 14(10):2495. https://doi.org/10.3390/rs14102495
Xu L, Wang A, Wang D, Wang H (2019) Hot spots of climate extremes in the future. J Geophys Res Atmos 124:3035–3049
Wu X, Feng X, Wang Z, Chen Y, Deng Z (2023) Multi-source precipitation products assessment on drought monitoring across global major river basins. Atmos Res 295:106982. https://doi.org/10.1016/j.atmosres.2023.106982
Yaseen ZM, Ali M, Sharafati A, Al-Ansari N, Shahid S (2021) Forecasting standardized precipitation index using data intelligence models: regional investigation of Bangladesh. Sci Rep 11(1):3435
Yi J, Li H, Zhao Y et al (2022) Assessing soil water balance to optimize irrigation schedules of flood-irrigated maize fields with different cultivation histories in the arid region. Agricul Water Manag 265:107543. https://doi.org/10.1016/j.agwat.2022.107543
Yin L, Wang L, Keim BD et al (2023) Spatial and wavelet analysis of precipitation and river discharge during operation of the Three Gorges Dam, China. Ecol Indic 154:110837. https://doi.org/10.1016/j.ecolind.2023.110837
Zhai F (2009) Spatial and temporal pattern of precipitation and drought in Gansu Province, Northwest China. Nat Hazards 49:1–24. https://doi.org/10.1007/s11069-008-9274-y
Zarei A et al (2017) Comparison of meteorological indices for spatio-temporal analysis of drought in chahrmahal-bakhtiyari province in Iran. Hrvat Meteorol Cas 52:13–26
Zargar A, Sadiq R, Naser B, Khan FI (2011) A review of drought indices. Environ Rev 19:333–349
Zhang K, Li Y, Yu Z (2022) Xin'anjiang nested experimental watershed (XAJ-NEW) for understanding multiscale water cycle: scientific objectives and experimental design. Engineering 18(11):207–217. https://doi.org/10.1016/j.eng.2021.08.026
Zhao Y, Wang H, Song B et al (2023) Characterizing uncertainty in process-based hydraulic modeling, exemplified in a semiarid Inner Mongolia steppe. Geoderma 440:116713. https://doi.org/10.1016/j.geoderma.2023.116713
Zhao Y, Li J, Wang Y et al (2024) Warming climate-induced changes in cloud vertical distribution possibly exacerbate intra-atmospheric heating over the Tibetan Plateau. Geophys Res Lett 51(3):e2023GL107713. https://doi.org/10.1029/2023GL107713
Zhao Y, Li J, Zhang L et al (2023) Diurnal cycles of cloud cover and its vertical distribution over the Tibetan Plateau revealed by satellite observations, reanalysis datasets, and CMIP6 outputs. Atmos Chem Phys 23(1):743–769. https://doi.org/10.5194/acp-23-743-2023
Zhao Y, Lu M, Chen D et al (2024) Understanding the weakening patterns of inner Tibetan Plateau vortices. Environ Res Lett 19(6):064076. https://doi.org/10.1088/1748-9326/ad5193
Zhou G, Zhang H, Xu C et al (2023) A real-time data acquisition system for single-band bathymetric LiDAR. IEEE Trans Geosci Remote Sens 61. https://doi.org/10.1109/TGRS.2023.3282624
Zhu C (2023) An adaptive agent decision model based on deep reinforcement learning and autonomous learning. Int J Logist 10(3):107–118. https://doi.org/10.33168/JLISS.2023.0309
Acknowledgements
The authors are thankful to the Deanship of Graduate Studies and Scientific Research at Najran University for funding this work under the Growth Funding Program grant code (NU/GP/SERC/13/196-1). This work was supported by Tenaga Nasional Berhad (TNB) and UNITEN through the BOLD Refresh Postdoctoral Fellowships under the project code J510050002-IC-6665 BOLDREFRESH2025-Centre of Excellence.
The authors would like to express sincere gratitude to AlMaarefa University, Riyadh, Saudi Arabia, for supporting this research. The authors would also like to express their gratitude to the Higher Institution Centre of Excellence (HICoE), Ministry of Higher Education (MOHE), Malaysia, under the project code 2024001HICOE as referenced in JPT(BPKI)1000/016/018/34(5). The authors also thank the NASA Prediction of Worldwide Energy Resources (POWER) Data Access Viewer Enhanced (DAVe) for providing the rainfall datasets used in the SPI analysis.
Funding
None.
Author information
Contributions
Chaitanya B. Pande: conceptualization, idea of topic, investigation, writing (original draft, review and editing), supervision, formal analysis, development of ML algorithms, software, results and analysis, field data collection, visualization of figures and methodology chart. Neyara Radwan: writing (original draft, review and editing), formal analysis. Lariyah Mohd Sidek: writing (original draft), supervision, formal analysis. Abhay M. Varade: writing (original draft, review and editing), formal analysis. Ismail Elkhrachy: writing (original draft, review and editing), formal analysis. Ahmed Elbeltagi: writing (review and editing), development of ML algorithms, formal analysis. Abebe Debele Tolche: writing (original draft, review and editing), formal analysis.
Ethics declarations
Ethics approval and consent to participate
Not applicable.
Competing interests
The authors declare no competing interests.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Pande, C.B., Sidek, L.M., Varade, A.M. et al. Forecasting of meteorological drought using ensemble and machine learning models. Environ Sci Eur 36, 160 (2024). https://doi.org/10.1186/s12302-024-00975-w
DOI: https://doi.org/10.1186/s12302-024-00975-w