
Water level prediction using long short-term memory neural network model for a lowland river: a case study on the Tisza River, Central Europe



Precisely predicting the water levels of rivers is critical for planning and supporting flood hazard and risk assessments and maintaining navigation, irrigation, and water withdrawal for urban areas and industry. In Hungary, the water level of rivers has been recorded since the early nineteenth century, and various water level prediction methods have been developed. The Discrete Linear Cascade Model (DLCM) has been used since the 1980s. However, its performance is not always reliable under the current climate-driven hydrological changes. Therefore, we aimed to test machine learning algorithms to make 7-day ahead forecasts, choose the best-performing model, and compare it with the DLCM currently in use.


According to the results, the Long Short-Term Memory (LSTM) model provided the best results in all time horizons, giving more precise predictions than the Baseline model, the Linear or Multilayer Perceptron Model. Despite underestimating water levels, the validation of the LSTM model revealed that 68.5‒76.1% of predictions fall within the required precision intervals. Predictions were relatively accurate for low (≤ 239 cm) and flood stages (≥ 650 cm), but became less reliable for medium stages (240–649 cm).


The LSTM model provided better results in all hydrological situations than the DLCM. Although LSTM is not a novel concept, its encoder–decoder architecture is the best option for solving multi-horizon forecasting problems (or “Many-to-Many” problems), and it can be trained effectively on vast volumes of data. Thus, we recommend testing the LSTM model in similar hydrological conditions (e.g., lowland, medium-sized river with low slope and mobile channel) to get reliable water level forecasts under the rapidly changing climate and various human impacts.

Graphical Abstract


  • The DLCM, as currently used, predicts water stages less precisely than the LSTM model.

  • The LSTM underestimates water stages on each predicted horizon (1–7 days).

  • The LSTM model reacts too late or incorrectly in certain hydrological situations (rising limb of floods, medium stages).

  • The most precise forecast was given for low stages, followed by floods.

  • The lowest accuracy was provided for medium stages.


The precise prediction of river stages is fundamental to supporting flood protection, navigation, and water withdrawal. The first stage predictions were based on regression curves between water levels at neighbouring gauging stations [15, 40]; their accuracy was subsequently improved using stage data from tributaries [35]. Their reliability is influenced by impoundment, slope changes during a flood [22, 23], changes in channel morphology and roughness [11, 14], or the changing nature of hydrological events [3, 5].

The Discrete Linear Cascade Model (DLCM) is widely used to predict water stages [7, 30, 34, 36]. It is based on mass and energy conservation principles, with interconnected compartments representing an element of the hydrological system and incorporating the concept of time lags [16, 29, 31, 37]. Statistical methods (i.e., regression or maximum likelihood estimation [34, 38]) are used to estimate its parameters. The DLCM is simple and has a low computational demand [30, 34]. It does, however, assume linear correlations between the hydrological components [7, 31], though hydrological systems are often nonlinear and exhibit time-varying behaviour, especially under extreme events [13, 16]. Besides, the DLCM cannot handle downstream disturbances, such as hydraulic structures, tributaries, or impoundments [34].

Therefore, numerically based water level forecast models were developed [6, 39], applying predefined upper and lower boundary conditions. Their accuracy is highly dependent on the available elevation and roughness data. Most machine learning (ML) algorithms perform a transformation on the input data, extract the underlying information, and then predict the water stages, even with a simple regression model. Moradkhani et al. [26] achieved excellent one-step-ahead prediction by applying a neural network approach with a nonlinear relationship between the input and output data and a proper cross-validation framework.

Forecasts based on data-driven algorithms open up new perspectives in hydrology. Though at-a-site measurements produce a massive amount of data, only a small portion is processed for forecasting. To predict water levels, data-driven methods use orders of magnitude more data. As the relationships between the data change (due to climate or human-driven morphological or hydrological alteration), these methods refine the network of relationships; thus, the forecast is based on the most accurate relationships. Mosavi et al. [27] highlighted that ML models could outperform physics-based or statistical models. They are cost-effective and can help find the optimal parametrization of a data-driven model to support forecasts [9, 33].

The artificial neural network (ANN) could predict water stages using one hidden layer. The trained model of Chen et al. [10] was compared with 2D and 3D hydrodynamic models, proving that the data-driven models provided reliable results. Wei et al. [42] used the ANN with other simple (“lazy”) and complex (“eager”) models to predict hourly hydrological data. It was determined that the more complex models produced more precise results, and the ANN approach provided accurate results at the expense of lower computational efficiency during the parameter optimization.

An ensemble of multilayer perceptron (MLP) models was used for a one-step-ahead forecast [25]. First, a singular spectrum analysis of the multi-dimensional input (multiple time series) was performed. Then, trained models were selected based on accuracy and diversity, and average predictions were calculated. However, the proposed methodology needs to be tested for multi-step ahead forecasts.

The combination of the autoregressive integrated moving average (ARIMA) model and ML models, such as the random forest (RF), support vector regression (SVR), recurrent neural network (RNN), and long short-term memory model (LSTM), led to reliable hourly water stage prediction [32]. The main idea behind the hybrid model was that the ARIMA model could handle the linear components of the data, whereas the ML models used nonlinear relations.

Several authors compared the accuracy of water level forecasts produced by various models. Kim et al. [20] compared ML and classical approaches, concluding that the LSTM-RNN model gave the best flood forecast, outperforming the support vector machine (SVM) and gradient boosting (GB) models. Adikari et al. [1] used convolutional neural network (CNN), LSTM, and wavelet decomposition (WD) functions combined with the adaptive neuro-fuzzy inference system (ANFIS). Their findings indicate that the CNN model, which deals with complex data dependencies, could be used for short-term hydrological predictions. Ahmed et al. [2] developed 7-day, 14-day, and 28-day forecasts. The input data were fed to a CNN encoder and a bi-directional LSTM decoder. They concluded that the recursive encoder–decoder model provided accurate short-term prediction, but failed at longer time horizons due to the cumulative error in the recursive process [12].

Hungarian hydrologists have recently been confronted by end-users with the issue of the inaccuracy of the Discrete Linear Cascade Model (DLCM), which has been used to predict stages of Hungarian rivers since the 1980s [4]. The predictions on the Tisza River are especially inaccurate due to deteriorating slope conditions, periodical channel changes, hydrological extremities, and frequent impoundments [14, 23, 24, 41]. The combination of rising flood levels and long-duration floods exposes the population to greater flood risk [24]; hence, predicting water levels, particularly peak flood levels, is crucial for flood hazard management and warning systems.

Therefore, the primary aim of the research was to assess the applicability of the LSTM model under various hydrological situations and prediction horizons to estimate water levels. To understand the limitations of the LSTM model, its performance was compared to a naive (Baseline) model and simpler ML models (Linear and MLP). Our specific goals are as follows: (1) test various ML algorithms to make a 7-day-ahead water level forecast; (2) analyse and compare their results; (3) assess the performance of the best algorithm in predicting recent floods; and (4) compare the results of the best algorithm with the predictions of the DLCM used by authorities.

Although LSTM is not a novel concept, its encoder–decoder architecture is the most effective at solving multi-horizon forecasting (“Many-to-Many”) challenges. Still, the LSTM model is cutting-edge in time-series forecasting: newer transformer models do not outperform cell-based RNNs in performance, because they can be trained on large amounts of data and lack memory (which is required to incorporate patterns from the past). The novelty of the research is that in specific fluvial environments (i.e., very low slope conditions, frequent impoundments, hydrological extremities), the performance of the model has not been tested yet, and the Tisza River, with its new hydrological challenges, provides a great opportunity for that. The proposed LSTM-LSTM model is not an existing model. The approach of building the LSTM model is well-known since encoder–decoder architectures are the current best practices in all fields using deep learning, and the model's final structure and hyperparameters allow for an understanding of the patterns in the time series data for such a lowland river, like the Tisza River.

Study area

The Tisza River is a Danube tributary (length: 962 km) that drains an area of approximately 157,200 km2 in Central Europe (Fig. 1). The Tisza River has a lowland character, as its mean water slope is just 2 cm/km; thus, the flow velocity is 0.6‒1.3 m/s. The slope has a declining trend, as the mean slope was 2.1 cm/km in 1900–1910 and 1.4 cm/km in 2000–2010 [23]. The discharge of the Tisza varies between 58 and 4346 m3/s at Szeged (mean 825 m3/s). Based on daily water level measurements since 1 January 1900, the absolute water level change at the Szeged gauging station is 12.59 m (varying between − 250 and + 1009 cm). The bankfull level at Szeged is 550 cm; however, the flood warning levels are 650, 750, and 850 cm. Floods have been recorded in 80 years since 1900, with a mean duration of 45 days, but the longest flood lasted for 173 days [23].

Fig. 1

The study area is the Tisza River in Central Europe. The input data of the model created for Szeged originate from 11 upstream and downstream gauging stations

Flood levels have increased by over 4 m since 1900, due to catchment-scale run-off changes [18] and decreasing flood conductivity [21, 24]. Thus, in 120 years, six new record flood levels were set at Szeged in 1907, 1913, 1919, 1932, 1970, and 2006 [23]. However, climate change also has an impact on hydrology. High and long-lasting floods occurred at the beginning of the twenty-first century (1998‒2006), but since then, only two small overbank floods have been recorded (2010 and 2013), and only below-bankfull flood waves have developed.

Impoundment also has an impact on flood levels. The Danube and the largest tributary of the Tisza, the Maros, can block the floods on the Tisza. This impoundment increases the flood levels by 30‒40 cm, and it lasts until the water level of the Danube or Maros drops [41]. The impoundment influences the Tisza along its 300‒350 km long section (from its confluence with the Danube to Szolnok). The floods in 2006 and 2013 were impounded ones [24]; therefore, though the 2006 flood was the highest on record (H: 1009 cm), it had a lower peak discharge (Q: 3780 m3/s) than the second largest flood in 1970 (H: 959 cm; Q: 3820 m3/s).

Another essential feature of the Tisza that influences the stages is its intense cross-sectional area change during floods. The channel is sand-bedded and deeply incised. Therefore, during the rising limb of the floods the channel incises up to 1.5‒3 m, giving it a maximum depth of 18.5‒19.1 m. However, when the flow velocity drops during the flood’s peak (due to impoundment), a large amount of sand is deposited on the bottom, reducing the maximum depth to 17.6 m and the cross-sectional area by approximately 10%. As the impoundment terminates, the flow velocity increases, and the channel slightly incises again during the falling limb. These cross-sectional changes influence the discharge-stage curves: during the rising stages, the incision results in a lower stage for a given discharge than during the flood peak or falling stages, when the in-channel aggradation raises the water level for the same discharge [23].



The models were created for the Szeged gauging station (173 river km). The present study employed daily water levels measured at 12 gauging stations along the Tisza and its two tributaries (Körös and Maros) near the Szeged station (Fig. 1); the gauging stations are listed in Additional file 1: S1. The water level is measured using fluviometers, with the “0” point set arbitrarily. The daily measured water stage data between 1 January 1951 and 31 December 2020 were used in the modelling. It is a common approach to divide the data into two parts to track the performance of a model on unseen data [8]. The general approach for data without a temporal component is to keep a randomly selected 80% of the data for finding the best parametrization of the model (training dataset) and the remaining 20% for testing the predictive ability (validation dataset). However, randomly slicing time-dependent data would cause issues. First, it breaks up essential temporal properties such as trend and seasonality. Second, it may cause look-ahead bias, which is associated with using data from the future. Thus, the correct method for splitting time-series data is to choose a date: the values before it are used in the parameter search (training), and the values after it serve as validation. We selected 21 April 2004 as the splitting point because we wanted to include the 2006 flood in the validation (Fig. 2). Thus, 76% of the data (1 January 1951–21 April 2004) was used for parametrization and 23% for validation (1 January 2005–31 December 2020), and a gap was introduced between the training and validation sets, as suggested by Cerqueira et al. [8]. In this way, floods appeared both in the parametrization period (in 1970, 2000 and 2001) and the validation period (in 2006 and 2013). Finally, we normalized each gauging station’s dataset separately: the normalization parameters were calculated from the training set and then applied to both the training and validation sets. Since the validation set is assumed to be unknown to the model, normalization parameters cannot be derived from it.
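The split and normalization described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, the series is synthetic, and z-score scaling is an assumption, since the text does not state which normalization was used.

```python
from datetime import date, timedelta
from statistics import mean, pstdev

def chronological_split(dates, values, train_end, valid_start):
    """Split a dated series without shuffling. Days strictly between
    train_end and valid_start form the gap and are discarded."""
    train = [v for d, v in zip(dates, values) if d <= train_end]
    valid = [v for d, v in zip(dates, values) if d >= valid_start]
    return train, valid

def fit_scaler(train):
    """Normalization parameters are computed from the training set only."""
    mu = mean(train)
    sigma = pstdev(train) or 1.0  # guard against a constant series
    return mu, sigma

def transform(values, mu, sigma):
    """Apply z-score scaling (assumed form of normalization)."""
    return [(v - mu) / sigma for v in values]

# Synthetic stand-in for one station's daily stages in cm (not real data).
start = date(2004, 1, 1)
dates = [start + timedelta(days=i) for i in range(731)]  # 2004-01-01 .. 2005-12-31
stages = [100.0 + (i % 50) for i in range(731)]

train, valid = chronological_split(dates, stages, date(2004, 4, 21), date(2005, 1, 1))
mu, sigma = fit_scaler(train)
train_n = transform(train, mu, sigma)
valid_n = transform(valid, mu, sigma)  # same parameters, no leakage from validation
```

Each gauging station would get its own `mu` and `sigma`, and the same pair is reused for the validation set so that no information from the validation period enters the scaling.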

Fig. 2

Water stages measured at Szeged between 1951 and 2020. The data were split to training and validation datasets with a gap in between


The main modelling challenge was to map the input multivariate sequence (time-series of multiple features) to the target univariate sequence (time-series of a single feature). Thus, a suitable model had to consider the temporal nature of the data, handle multivariate time series, and forecast multiple horizons ahead. Classical ML models, such as feedforward neural networks, cannot preserve the sequence’s temporal structure. Statistical models, such as Autoregressive Integrated Moving Average (ARIMA), typically provide a robust solution for univariate problems but cannot handle multivariate time series. Recurrent Neural Networks (RNNs), meanwhile, are well-suited for our problem because they have a built-in memory mechanism that allows them to maintain context and retain information about previous elements in the sequence. Moreover, processing multivariate (variable-length) input data is straightforward, and arbitrarily long predictions are produced iteratively in RNNs. Long Short-Term Memory (LSTM) is a cell-based RNN model [17], described in more detail in Additional file 1: S1. We built a multilayer LSTM model by stacking multiple LSTM cells. Thus, each layer is an LSTM cell that receives the output of the previous layer as its input and generates its own output. This allows the model to learn more complicated data representations. More specifically, we implemented an LSTM encoder–decoder architecture [28], where a stacked LSTM model processed the input data (encoder) and another stacked LSTM model generated the predictions (decoder), as shown in Fig. 3. We will refer to the LSTM-based encoder–decoder model later as the LSTM model.

Fig. 3

Architecture of the LSTM encoder–decoder model. \(T\) is the number of past data, \(P\) is the length of the prediction (forecast horizon). The \({y}_{t-1}\) is the real (known) target data at the t‒1 time point, and \({\widehat{y}}_{t,i}\) is the prediction given at the time t‒1 for the date t + i

The encoder receives the historical data as input vectors (\({x}_{t-T},{x}_{t-T+1},\dots {x}_{t-1}\)), where \(T\) is the number of data points used from the past for one prediction. The encoder is responsible for learning complex patterns in the input and providing the decoder with valuable, condensed information through the hidden states of the LSTM cells. The decoder is expected to decode the encoded information and provide predictions (\({\widehat{y}}_{t,0}, {\widehat{y}}_{t,1},\dots { \widehat{y}}_{t,P-1}\)), where \(P\) is the length of the prediction (\(P\)-step ahead forecast). The decoder’s input in the first time step is the known target value from the previous time step (\({y}_{t-1}\)). The decoder’s subsequent inputs are the predictions from the previous time step (\({\widehat{y}}_{t,i}\)).
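The decoder's input scheme described above (first the known \({y}_{t-1}\), then its own previous prediction) can be sketched independently of any deep learning library. In the sketch below, `decode`, `toy_step` and the state layout are hypothetical stand-ins; a trained stacked LSTM decoder would take the place of `toy_step`.

```python
from typing import Callable, List, Tuple

State = Tuple[float, ...]
StepFn = Callable[[float, State], Tuple[float, State]]

def decode(step: StepFn, encoder_state: State, y_prev: float, horizon: int) -> List[float]:
    """Produce `horizon` predictions autoregressively.

    The first decoder input is the last known target value y_{t-1};
    every later input is the prediction from the previous step.
    """
    preds: List[float] = []
    state, inp = encoder_state, y_prev
    for _ in range(horizon):
        y_hat, state = step(inp, state)
        preds.append(y_hat)
        inp = y_hat  # feed the prediction back as the next input
    return preds

def toy_step(y_in: float, state: State) -> Tuple[float, State]:
    """Toy stand-in for a trained decoder cell: adds a trend kept in the state."""
    (trend,) = state
    return y_in + trend, state

# P = 7 step forecast starting from a last observed stage of 300 cm.
forecast = decode(toy_step, (2.0,), 300.0, 7)
```

Because each step consumes the previous prediction, an error made early in the horizon propagates to later days, which is one reason accuracy tends to decline with the forecast horizon.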

In addition to the LSTM model, we developed simpler models to compare their performance. The Baseline model was developed to assess the performance of more advanced ML models by providing a constant extrapolated forecast based on the most recent water level observation at the Szeged gauging station. The Linear model is based on the long-term data of 12 gauging stations, taking all feature values from the past time window (15 days), flattening this data, and then applying a transformation (matrix multiplication) to obtain the 7-day ahead forecast. The Multilayer Perceptron (MLP) model was also developed as a third model, which transforms data using a neural network with two hidden layers (256 and 128 units) and ReLU activation functions. More details on the model and training are described in Additional file 1: S2.
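A minimal sketch of the two simplest models follows, under stated assumptions: the function names are hypothetical, the data are synthetic, and the weight matrix is hand-set for illustration rather than trained.

```python
def baseline_forecast(target_history, horizon=7):
    """Persistence forecast: repeat the most recent observation."""
    return [target_history[-1]] * horizon

def flatten_window(features, window=15):
    """features: list of daily rows, one value per station.
    Returns the last `window` rows as a single flat vector."""
    if len(features) < window:
        raise ValueError("not enough history for the requested window")
    return [v for row in features[-window:] for v in row]

def linear_forecast(flat_input, weights):
    """weights: one row per forecast day; each row has len(flat_input) entries."""
    return [sum(w * x for w, x in zip(row, flat_input)) for row in weights]

# Synthetic data: 20 days x 12 stations; station 0 plays the role of the target.
features = [[float(d * 12 + s) for s in range(12)] for d in range(20)]
flat = flatten_window(features, 15)  # 15 days x 12 stations = 180 inputs

# Hand-set weights that select the target station's most recent value,
# which makes the linear model reproduce the persistence forecast.
row = [0.0] * len(flat)
row[14 * 12] = 1.0  # last day in the window, station 0
weights = [row[:] for _ in range(7)]

persistence = baseline_forecast([r[0] for r in features])
linear = linear_forecast(flat, weights)
```

The example shows that persistence is a special case of the linear mapping, so a trained Linear model can only match or improve on the Baseline on the training data.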

Statistical analysis

The global performance of different models was evaluated based on four evaluation metrics: the mean absolute error (MAE in cm), root mean square error (RMSE in cm), R2 correlation, and Willmott’s Index (WI). Smaller MAE and RMSE values, as well as R2 and WI values closer to 1.0, indicate a better fit. To compare each prediction with the measured data at the Szeged gauging station, quantile–quantile (QQ) plots were applied. The suitability of the model was evaluated for low (≤ 239 cm), medium (240‒649 cm) and high (≥ 650 cm) water levels (650 cm is the lowest threshold of the warning system). The required precision of the 7-day ahead forecast is ± 5 cm on its first day, ± 15 cm on its third day, and ± 25 and ± 35 cm on the fifth and seventh days, respectively. These precision intervals were specified by hydrologists and field experts.
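The four metrics can be written out as below. This is a sketch using common textbook definitions, which are assumptions here: R2 is taken as the coefficient of determination, and WI as Willmott's index of agreement, 1 minus the ratio of the squared error to the potential error.

```python
from math import sqrt

def mae(obs, pred):
    """Mean absolute error."""
    return sum(abs(p - o) for o, p in zip(obs, pred)) / len(obs)

def rmse(obs, pred):
    """Root mean square error."""
    return sqrt(sum((p - o) ** 2 for o, p in zip(obs, pred)) / len(obs))

def r2(obs, pred):
    """Coefficient of determination (assumed definition of 'R2')."""
    o_bar = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - o_bar) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot

def willmott_index(obs, pred):
    """Index of agreement: 1 - squared error / potential error."""
    o_bar = sum(obs) / len(obs)
    num = sum((p - o) ** 2 for o, p in zip(obs, pred))
    den = sum((abs(p - o_bar) + abs(o - o_bar)) ** 2 for o, p in zip(obs, pred))
    return 1.0 - num / den

# Illustrative stages in cm (not measured data).
obs = [100.0, 200.0, 300.0, 400.0]
pred = [110.0, 190.0, 310.0, 390.0]
scores = {
    "MAE": mae(obs, pred),
    "RMSE": rmse(obs, pred),
    "R2": r2(obs, pred),
    "WI": willmott_index(obs, pred),
}
```

MAE and RMSE are in the unit of the data (cm), while R2 and WI are dimensionless and approach 1.0 for a good fit.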

The project’s ultimate goal was to develop models for precise flood forecasting. Thus, measured and forecasted water levels for the 2006 and 2013 floods were compared. The forecast on the first, third, fifth, and seventh days of the 7-day ahead forecast was compared to the measured water level. The confidence intervals of the π-day before predictions for each day \(t\) [denoted by \({\sigma }_{\pi }\left(t\right)\)] were calculated according to the following formula:

$$\sigma _{\pi } \left( t \right) = \frac{1}{{15}}\sum\limits_{{i = 1}}^{{15}} {\left| {\hat{y}_{\pi } \left( {t - i} \right) - y\left( {t - i} \right)} \right|}$$

The absolute differences between the \(\pi\)-day before forecast value for day \(t\) [denoted by \({\widehat{y}}_{\pi }\left(t\right)\)] and the measured data for the day \(t\) [i.e., \(y\left(t\right)\)] were averaged over the past 15 days. Thus, the confidence interval is the average performance of the model for the last 15 days.
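The 15-day average of absolute differences can be computed as in the following sketch. The function name and the list-based indexing (index `k` holds the value for day `k`) are assumptions made for illustration.

```python
def confidence_band(pred, obs, t, window=15):
    """sigma_pi(t): mean absolute error of the pi-day-before forecasts
    over the `window` days preceding day t.

    pred[k] is the forecast issued pi days earlier for day k,
    obs[k] is the stage measured on day k.
    """
    if t < window:
        raise ValueError("not enough past days before t")
    return sum(abs(pred[t - i] - obs[t - i]) for i in range(1, window + 1)) / window

# Synthetic series: forecasts alternate 3 cm above and below the observation.
obs = [200.0 + 5 * k for k in range(20)]
pred = [o + (3.0 if k % 2 else -3.0) for k, o in enumerate(obs)]
sigma = confidence_band(pred, obs, t=18)
```

Only days before `t` enter the sum, so the band can be drawn around a forecast before the corresponding measurement is available.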

Finally, the performance of the best model was compared to the results of the official prediction (DLCM) made by the Hungarian Directorate of Water Management (OVF). The DLCM predictions were available for 2014‒2019. The DLCM provides a 6-day forecast with a 6-h frequency [4]. As our models provide one value per day, the predictions for 6:00 am were selected from the DLCM and compared with the results of our best-performing model.
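Selecting the 6:00 am values from a 6-hourly series to obtain one value per day can be sketched as follows; the function name and the sample series are hypothetical.

```python
from datetime import datetime, timedelta

def daily_at_hour(timestamps, values, hour=6):
    """Keep one value per day: the one stamped exactly at `hour`:00."""
    return {
        ts.date(): v
        for ts, v in zip(timestamps, values)
        if ts.hour == hour and ts.minute == 0
    }

# Synthetic 6-hourly series covering three days (00:00, 06:00, 12:00, 18:00).
t0 = datetime(2014, 1, 1, 0, 0)
timestamps = [t0 + timedelta(hours=6 * k) for k in range(12)]
values = list(range(12))

daily = daily_at_hour(timestamps, values)
```

The resulting mapping from date to value can then be aligned day by day with the daily forecasts of the other model.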


Comparison of different models

The performance of the applied models was evaluated for the 2006‒2020 period (Fig. 4 and Table 1). As the forecast moves further ahead, the Baseline model’s performance deteriorates. For example, the MAE is 9.7 cm on the first day, while it is 51.1 cm on the seventh day. The Linear model is based on long-term data from several gauging stations. As a result, compared to the Baseline model, there is a significant improvement based on all four metrics, particularly for the longer-term forecasts. However, there was only a slight improvement on the first day of the 7-day ahead forecast (MAE: 7.7 cm).

Fig. 4

Performance of the 7-day ahead forecasts using the LSTM, MLP, Linear and Baseline models on the test set over different forecast horizons (1‒7 days) using different evaluation metrics: A MAE, B RMSE, C R2 correlation and D WI

Table 1 Performance of the 7-day ahead forecasts using the LSTM, MLP, Linear and Baseline models on the test set over different forecasting horizons (1‒7 days) using different evaluation metrics: MAE, RMSE, R2 correlation, and WI

The MLP model resulted in further refinement on the third‒seventh days of the 7-day ahead forecast (e.g., the MAE ranged from 14.9 to 38.4 cm). However, it resulted in a less precise forecast on the first and second days (e.g., the MAE was higher than in the case of the Linear model, as it was 9.9 and 12.3 cm, respectively).

Compared to the other models, the LSTM model performed excellently on the first and second days of the 7-day ahead forecasts (e.g., MAE: 4.2 and 7.6 cm, respectively). Besides, its MAE was the lowest not only on the first day but throughout the forecasted horizons, and was still only 34.7 cm on the seventh day.

Aside from the easily interpretable MAE, which describes the average magnitude of the errors, the other values showed similar trends. RMSE, which gives a higher weight to larger errors and thus emphasizes outliers, showed higher values but similar trends as MAE. The R2, which measures how much of the total variance of the data is explained by the model, and the Willmott’s Index (WI), which represents the ratio of the mean square error and the potential error, were almost identical until the fifth day of the 7-day forecasts and only slightly differed on the sixth and seventh days. Compared to the Baseline model, the LSTM model improved the R2 and WI values similarly: from 0.992 to 0.999 on the first day and from 0.79 to 0.89 on the seventh day of the forecast.

Global performance of the LSTM model

The validation dataset contained 5471 days (between 01 January 2005 and 24 December 2019). Considering all days, 68.4‒76.2% of the predicted data fall within the required precision intervals on given days of the 7-day ahead forecast (Table 2). The best performance was found on the third day of the forecast. The model generally tends to underestimate the water stages on each predicted horizon: 14.1‒20.1% of the data were underestimated, whereas only 9.7‒13.0% were overestimated (by as much as 334 cm).
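The shares of predictions within, below and above the required precision interval can be computed as in this sketch. The tolerances are those given in the text for the first, third, fifth and seventh days; treating the interval bounds as inclusive is an assumption, and the sample values are illustrative only.

```python
# Required precision (cm) for the forecast days specified in the text.
TOLERANCE_CM = {1: 5, 3: 15, 5: 25, 7: 35}

def classify(pred, obs, tol):
    """Label one prediction relative to the +/- tol interval around obs."""
    err = pred - obs
    if abs(err) <= tol:  # inclusive bounds (assumption)
        return "within"
    return "under" if err < 0 else "over"

def shares(preds, obs, tol):
    """Percentage of predictions in each class."""
    counts = {"within": 0, "under": 0, "over": 0}
    for p, o in zip(preds, obs):
        counts[classify(p, o, tol)] += 1
    return {k: 100.0 * c / len(obs) for k, c in counts.items()}

# Illustrative first-day predictions against a constant observed stage.
obs = [200.0, 200.0, 200.0, 200.0]
preds = [203.0, 190.0, 212.0, 205.0]
day1 = shares(preds, obs, TOLERANCE_CM[1])
```

Running the same function with the tolerance of each forecast day yields one row of a precision table per horizon.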

Table 2 Precision of the prediction made by the LSTM model for the validation period (01 January 2005–24 December 2019)

The global performance of the LSTM model was further investigated based on general quantile–quantile (QQ) plots (Fig. 5) and violin plots (Fig. 6A). There was no significant deviation in the slopes of linear fits (Fig. 5: red line) on the first and third days of the 7-day ahead forecasts, and the R2 values were close to 1.0. On the contrary, the predicted data were underestimated in some cases at the fifth and seventh day ahead horizons, where the slopes of fitted lines were 0.97 and 0.92, respectively (Fig. 5).

Fig. 5

General quantile–quantile (QQ) plots of the 7-day ahead forecasts obtained using the LSTM model on the test set over different forecasting horizons (1‒7 days). The gray band indicates the required precision of a given day’s forecast, and the red linear fit indicates the average deviation of the predicted data from the observed data

Fig. 6

Probability density of the results of the 7-day ahead forecasts obtained using the LSTM model over different forecasting horizons (1‒7 days) for the entire test set (A), for low stages (B), medium stages (C), and floods (D). Green stripes indicate the required precision of a given day’s forecast

The performance of the LSTM model was also evaluated in terms of various hydrological situations. The Tisza at Szeged is dominated by low stages (≤ 239 cm), accounting for 69% of all data. The results show that 76.7–83.2% of the predicted low-level data fall within the required precision intervals (Table 3, Fig. 6B). The best results were achieved on the third and fifth days of the 7-day ahead forecasts. In the case of low stages, the underestimation of the water levels is almost three times more common (12.6‒17.4%) than overestimation (4.0‒7.9%). Although the median absolute difference between predicted and measured stages is only 2.5‒12.7 cm, the maximum difference could be as high as 43‒304 cm.
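Grouping the data by hydrological situation only needs the two thresholds given in the text (240 and 650 cm). The sketch below uses hypothetical function names and illustrative stage values.

```python
from collections import Counter

def stage_class(level_cm):
    """Assign a water level to low (< 240 cm), medium (240-649 cm) or high (>= 650 cm)."""
    if level_cm < 240:
        return "low"
    if level_cm < 650:
        return "medium"
    return "high"

def class_shares(levels):
    """Percentage of days falling into each stage class."""
    counts = Counter(stage_class(v) for v in levels)
    return {k: 100.0 * counts[k] / len(levels) for k in ("low", "medium", "high")}

# Illustrative stages in cm (not measured data), including the class boundaries.
levels = [100, 239, 240, 649, 650, 900, -50, 120]
dist = class_shares(levels)
```

Evaluating the precision shares separately for each class then gives the per-stage breakdown used in the tables.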

Table 3 Precision of the prediction of low stages (< 240 cm) obtained using LSTM model for the validation period (1 January 2005–24 December 2019)

Medium stages (240‒649 cm) were less common than low stages, accounting for only 27% of all data during the validation period. The prediction of these stages was the least precise, as only 44.7‒57.6% of the data fell within the acceptable intervals (Table 4, Fig. 6C). The prediction of medium stages was the best on the first and third days of the forecast. Furthermore, underestimation (17.8‒26.6%) was slightly less common than overestimation (24.1‒28.6%). The median errors of the estimated stages (4.4‒40.6 cm) and their maxima (51‒335 cm) were the highest of the entire dataset.

Table 4 Precision of the prediction of medium stages (240‒649 cm) obtained using the LSTM model for the validation period (01 January 2005–24 December 2019)

Only 4% of all validation data exceeds the warning level (≥ 650 cm), and 60.1‒73.7% of the predicted flood stages fall within the required precision interval (Table 5, Fig. 6D). The forecast was most precise on the third day and became less precise on the subsequent horizons. The proportion of overestimated and underestimated data was similar (12.3‒16.2%) in the first half of the 7-day ahead prediction; however, underestimation became dominant in its second half. The median absolute differences between the predicted and actual high stages (3.2‒26.2 cm) were significantly higher than for the medium stages, particularly in the later days of the forecast. Furthermore, the maximum error has decreased, indicating that the model performs well.

Table 5 Precision of the prediction of high stages (≥ 650 cm) obtained using the LSTM model for the validation period (01 January 2005–24 December 2019)

Forecast of flood levels of selected flood events

The highest water level on record at Szeged was measured in 2006, and since this period was not included in the training data, it gave us a challenging test case to evaluate the model. In addition, the flood in 2013 was thoroughly investigated, though it only reached the II level of the flood warning at Szeged. In the case of the 2006 flood, the model systematically underestimated the water levels, particularly during the rising limb of the flood (Fig. 7). Thus, on every day of the 7-day ahead forecast for the rising limb of the flood wave (< 980 cm, until 18 April 2006), the modelled water levels were below the required precision interval (Table 6). However, for the peak of the flood (≥ 980 cm, between 18 and 29 April 2006), the prediction was within the required precision range on the first and third days of the 7-day ahead forecast, and slightly below it in the second half (fifth‒seventh days) of the forecast. Despite the underestimation, much better precision was obtained during the flood’s peak than during its rising limb, as evidenced by the overlap of the confidence interval and the required precision (Fig. 7). In contrast to the rising limb or peak phase of the flood, the descending water levels (falling limb, after 29 April 2006) were predicted with greater accuracy on each day of the 7-day ahead forecast (Table 6), and the prediction overlaps with the required precision interval of the measured data on most days.

Fig. 7

Hydrograph and the time-series forecast obtained using the LSTM model for the 2006 flood

Table 6 Mean absolute error of the prediction for the different hydrological phases of the record-breaking 2006 flood and the flood in 2013

While the 2006 flood reached the highest stage in history (1009 cm), the 2013 flood was much smaller (762 cm), and it was also much shorter, as 55 days were above 600 cm in 2006, and only 34 days in 2013. During the training period, several floods similar to the 2013 one occurred (in 1958, 1962‒1967, 1970, 1974, 1977, 1979–1982, and 1999‒2000), and accordingly, a fairly precise forecast was given for the 2013 flood (Table 6). During the rising limb of the 2013 flood (< 700 cm, until 4 April 2013), the LSTM model slightly underestimated the stages (Fig. 8), comparable to the prediction of the 2006 flood. However, during the peak phase (≥ 700 cm, from 4 April until 3 May 2013) and in the falling limb, the prediction remained within the required precision interval on each day of the 7-day ahead forecast, with minor overshoot. Furthermore, the precision of the peak and falling stage prediction in 2013 was much better than in 2006 across all forecast horizons.

Fig. 8

Hydrograph of the 2013 flood and its forecast obtained using the LSTM model. Forecasting horizons: A first day; B third day; C fifth day; D seventh day

Comparison of the LSTM model to the DLCM used by the Hungarian authorities

For 2014‒2019, the predictions generated by our LSTM model and the official DLCM were compared. It must be noted that within these 6 years, no high stages (≥ 650 cm) appeared (max. stage was 616 cm); thus, the assessment of the models is valid just for low (≤ 239 cm) and medium stages (240‒649 cm). The statistical metrics (MAE, RMSE, R2 and WI) reflect that the LSTM model outperformed the DLCM on the first‒fourth days of the prediction (Fig. 9). On the fifth day, our LSTM model still had a lower MAE than the DLCM; however, the other three metrics showed better performance of the DLCM. The DLCM was more accurate on all four metrics on the sixth day of the forecast.

Fig. 9

Comparison of the performance of the LSTM model (7-day ahead forecast) and the DLCM (6-day ahead forecast) for 2014‒2019. The evaluation metrics were calculated for different forecast horizons (first-sixth days). A MAE, B RMSE, C R2 correlation and D WI

Based on general quantile–quantile (QQ) plots, the LSTM model’s global performance was compared to that of the DLCM (Fig. 10). On each day of the 6-day ahead forecast, our LSTM model outperformed the DLCM, as 74.9‒79.9% of all data fell within the required precision interval for the LSTM model, whereas the share was only 64.1‒73.7% for the DLCM.

Fig. 10 General quantile–quantile (QQ) plots for water stages measured at Szeged (2014‒2019) and predicted using the LSTM model and DLCM. The comparisons were made for the first, third and fifth days of the 7- and 6-day ahead forecasts

The LSTM model generated excellent results for lower stages (< 240 cm), with 78.9‒84.2% of the data falling within the required precision interval, compared to just 70.4‒80.2% for the DLCM (Table 7). As with the general data, underestimation was more common than overshooting in both models, and it was more frequent for the DLCM, particularly on the first days of the forecast. Meanwhile, both models struggled with the prediction of medium stages (240‒649 cm): only 53.8‒62.6% of the data predicted by the LSTM model fell within the required precision ranges, and the share was even lower for the DLCM (39.1‒48.2%). It is also worth noting that both models overestimated the actual situation in the case of medium stages, and the median absolute differences of the models were also similar (LSTM model: 4.2‒28.1 cm; DLCM: 7.0‒29.0 cm), gradually increasing on the later days of the forecast.

Table 7 Proportion (%) of the modelled data relative to the required precision intervals for the prediction of low and medium stages obtained using the LSTM model and DLCM for the validation period (2014–2019)
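The kind of breakdown reported in Table 7 can be sketched as follows. The stage thresholds (≤ 239 cm, 240‒649 cm, ≥ 650 cm) are those quoted in the text, but the width of the required precision interval is not given in this section, so it is passed in as a hypothetical `tolerance_cm` parameter; the 10 cm used in the example is purely illustrative, as are the function names and sample values.

```python
def stage_class(stage_cm):
    """Classify an observed stage using the thresholds quoted in the text."""
    if stage_cm <= 239:
        return "low"
    if stage_cm <= 649:
        return "medium"
    return "flood"

def precision_shares(observed, predicted, tolerance_cm):
    """Percent of forecasts within, under and over the tolerance band, per stage class."""
    counts = {}
    for obs, pred in zip(observed, predicted):
        bucket = counts.setdefault(
            stage_class(obs), {"within": 0, "under": 0, "over": 0})
        diff = pred - obs
        if abs(diff) <= tolerance_cm:
            bucket["within"] += 1
        elif diff < 0:
            bucket["under"] += 1   # forecast below the band: underestimation
        else:
            bucket["over"] += 1    # forecast above the band: overshoot
    shares = {}
    for name, bucket in counts.items():
        total = sum(bucket.values())
        shares[name] = {key: 100.0 * value / total for key, value in bucket.items()}
    return shares

# Illustrative values only; tolerance_cm=10 is an assumed band, not the study's.
observed = [150, 200, 239, 240, 400, 600]
predicted = [148, 185, 245, 262, 395, 570]
print(precision_shares(observed, predicted, tolerance_cm=10))
```

Classifying by the observed rather than the predicted stage is a design choice of this sketch; the section does not state which one the authors used.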

Though no overbank floods occurred during the comparison period (2014‒2019), several successive flood waves at or below bankfull level appeared in 2019 (Fig. 11). The hydrograph shows a typical flood-wave sequence: small flood peaks with gradually increasing heights appeared (348 cm on 7 May and 477 cm on 14 May), with relatively rapid rising and falling limbs, and the last and largest flood wave had a peak phase (from 7 to 11 June 2019) during which the flood level remained almost the same (603‒615 cm).

Fig. 11 Comparison of time-series forecasts of the LSTM model and the DLCM for the 2019 floods up to the sixth day of the 7- and 6-day ahead forecast horizons

Both models made similar prediction errors, but to varying degrees. Their forecasts were hampered by delays, so the rising, peak and falling stages were all predicted too late. For the days of the observed peaks of the first two flood waves, the performance of both models was moderately good: some predictions were better for the DLCM, while others were more precise for the LSTM model (a similar trend can be seen in Fig. 9). Both models performed well in the case of the largest, bankfull flood, though our model had much smaller errors.


Performance of the tested models

According to the models' evaluation metrics, the Linear model slightly outperformed the very simple Baseline model on the first and last days of the 7-day ahead forecast, and it provided more precise forecasts on the other days as well. The results of the Linear model indicate that data from previous observations at various points along the river contain relevant information for the forecast. Unlike the Linear model, the MLP model can detect nonlinear connections in the data owing to its nonlinear ReLU activation functions, which is very useful in the case of lowland rivers, where previously unpredictable impoundments can occur. The MLP model produced good results for the third‒seventh days of the forecast, but its results for the first and second days of the 7-day ahead forecast were quite inaccurate. This could be because the training loss function included all seven forecasts with equal weights: although the MLP model generally performed better than the Linear model, its focus was slightly shifted towards longer time horizons.
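The effect of equal horizon weights can be illustrated with a small sketch of a multi-horizon squared-error loss. This is not the authors' training code: the function name `horizon_weighted_mse`, the alternative weight vector and the sample values are hypothetical, and mean squared error is assumed as the base loss because the section does not name the loss used.

```python
def horizon_weighted_mse(targets, forecasts, weights=None):
    """Mean squared error over samples, with one weight per forecast horizon.

    targets, forecasts: lists of samples, each a list with one value per horizon.
    """
    horizons = len(targets[0])
    if weights is None:
        # Equal weights for all horizons, as described in the text.
        weights = [1.0] * horizons
    total_weight = sum(weights)
    loss = 0.0
    for y, y_hat in zip(targets, forecasts):
        loss += sum(w * (a - b) ** 2
                    for w, a, b in zip(weights, y, y_hat)) / total_weight
    return loss / len(targets)

# One illustrative 7-day sample: a 10 cm miss on day 1 and a 4 cm miss on day 7.
targets = [[300, 310, 320, 330, 340, 350, 360]]
forecasts = [[310, 310, 320, 330, 340, 350, 364]]
equal = horizon_weighted_mse(targets, forecasts)
# Hypothetical alternative: weight the first day four times as much.
first_day_heavy = horizon_weighted_mse(targets, forecasts, [4, 1, 1, 1, 1, 1, 1])
print(equal, first_day_heavy)
```

With equal weights, a first-day error is only one of seven terms, so an optimizer can trade it against gains on later horizons; raising the first-day weight makes the same error more costly.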

The LSTM model provided the best results on all time horizons. Setting the last observation of the water level at Szeged as the first input of the decoder contributed significantly to this achievement, as it forced the model to use this essential information. As a result, the MAE was the lowest not only on the first day but also on all other forecast horizons. Similarly good performance of the LSTM model was described by Adikari et al. [1], Cui et al. [12] and Kim et al. [20], especially for extreme hydrological conditions.
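The idea of seeding the decoder with the last observation can be shown with a toy decoding loop. This is only a sketch under stated assumptions: `toy_step` is a hypothetical stand-in for a trained LSTM decoder cell, the scalar `context` stands in for the encoder's state, and feeding each prediction back as the next decoder input is an assumption that this section does not confirm.

```python
def decode_forecast(context, last_observed_cm, step, horizons=7):
    """Roll a decoder forward, seeding it with the last observed stage."""
    forecasts = []
    decoder_input = last_observed_cm   # first decoder input = last observation
    state = context
    for _ in range(horizons):
        prediction, state = step(decoder_input, state)
        forecasts.append(prediction)
        decoder_input = prediction     # assumed: predictions are fed back in
    return forecasts

def toy_step(decoder_input, state):
    """Hypothetical stand-in for a decoder cell: adds a trend that halves each day."""
    trend = state
    return decoder_input + trend, trend * 0.5

print(decode_forecast(8.0, 300.0, toy_step, horizons=3))  # [308.0, 312.0, 314.0]
```

Because the first prediction is computed directly from the last observed stage, the first-day forecast is anchored to the current water level rather than reconstructed from the encoder state alone.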

The LSTM model tends to underestimate the water stages on each predicted horizon. The median differences on each day were within the required precision. However, the largest absolute differences increase with the forecast horizon: on the first day of the forecast, the maximum difference was half a meter, but on the seventh day it could exceed three meters. Forecasts were more consistent on the first and third days of the 7-day ahead forecast, while predictions failed more often on the fifth and seventh days. Since the LSTM model is especially well-suited to predicting low stages, even at later (third and fifth day) forecast horizons, it is a useful tool for predicting water stages during droughts. As droughts become more frequent and more severe in Europe, it will become increasingly vital to predict low water stages precisely to facilitate water withdrawal, and our LSTM model could serve this purpose.

Forecast of floods by the LSTM model

Hydrologists are particularly interested in medium-term flood prediction, as they need 3‒5 days to prepare adequate flood protection. Thus, they need an exact forecast on the fifth‒seventh days of the 7-day ahead forecast, as they would like to know the peak level of a flood or its duration. Therefore, the performance of the LSTM model was studied for selected flood events (2006 and 2013). The record-high 2006 flood (1009 cm) exceeded every previous flood (the previous largest was 961 cm in 1970); hence, no event of this magnitude was included in the training dataset. Nevertheless, during the peak of the flood, our predictions were within the required precision interval on the first‒fourth days of the 7-day ahead forecasts, but not on the fifth‒seventh days. Meanwhile, the prediction of the 2013 flood was much more precise, as the model had been trained on several similar earlier flood events. Therefore, if the model is used for flood-level prediction in the future, better performance for flood waves can be expected if the training period includes similar flood events.

In the case of floods, the LSTM model consistently underestimated the stages during the rising limb; however, the peak was precisely predicted, and the prediction of the falling limb was the most accurate. This implies that the model cannot handle periods of water level increase as rapid as those during the rising limbs (18‒27 cm/day), whereas the drop during the falling limbs was only 10‒13 cm/day. The model performs much better under steady conditions: for example, the flood peak on the Tisza lasted 12 days in 2006 with only minor stage fluctuations (< 7 cm), because the Danube impounded the Tisza.
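Periods of rapid rise of the kind discussed above can be flagged with a short sketch that computes day-to-day stage changes. The default threshold of 18 cm/day is the lower end of the range quoted in the text; the function names and the sample hydrograph are hypothetical.

```python
def daily_changes(stages_cm):
    """Day-to-day stage change (cm/day) for a series of daily stages."""
    return [later - earlier for earlier, later in zip(stages_cm, stages_cm[1:])]

def rapid_rise_days(stages_cm, threshold_cm_per_day=18):
    """Indices of days whose rise from the previous day meets the threshold."""
    return [index + 1
            for index, change in enumerate(daily_changes(stages_cm))
            if change >= threshold_cm_per_day]

# Illustrative daily stages (cm), not data from the study.
stages = [500, 520, 547, 560, 565, 555]
print(daily_changes(stages))    # [20, 27, 13, 5, -10]
print(rapid_rise_days(stages))  # [1, 2]
```

Flagging such days in the validation data would make it possible to evaluate the forecast errors separately for rapidly rising and steady periods.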

Performance of the LSTM model and the DLCM used in practice

The DLCM has been used to predict the stages of the Hungarian rivers since the 1980s, but due to its unreliability, we compared its performance with that of our LSTM model. The LSTM model outperformed the DLCM only on the first four days of the prediction; on the fifth day, the two models provided similar results, and later on the DLCM turned out to be more accurate. Both models tended to slightly underestimate the water stages, though this was more pronounced for the DLCM (LSTM: 11.0‒14.3%; DLCM: 15.4‒18.6%). The forecast of low stages by the LSTM model was more precise than that by the DLCM: 78.9‒84.2% of the data fell into the required precision interval in the case of the LSTM model, but only 70.4‒80.2% for the DLCM. However, both models had problems predicting medium stages, which they usually overshot. The LSTM model and the DLCM provided acceptable general performance, as they captured the main trends during the investigated period, especially during higher water levels. The results indicate that the LSTM model (7-day ahead forecast) is more precise than the DLCM (6-day ahead forecast) up to the fifth day of the forecast, and the DLCM is more accurate only on the sixth day of the forecast.


The advantage of the developed LSTM-based encoder–decoder model for predicting the water stages of a lowland river (Tisza) is that it outperforms the other models (i.e., Baseline, Linear, MLP and DLCM). Its disadvantage is that it tends to underestimate water levels, though most of the predictions are within the required precision interval. Another benefit of the model is that satisfactory results can be achieved on the first three days of a 7-day ahead forecast. Therefore, hydrologists are advised to use this ML algorithm in hydrological predictions. The model likely works well in hydrological conditions similar to those of the Tisza River; thus, it is suitable for predicting the stages of lowland rivers with low slopes, large water level fluctuations, long-lasting floods, and dry periods.

The proposed LSTM model can achieve satisfactory performance on low-stage and flood data but has difficulties forecasting medium-stage data. Thus, simplification steps must be performed during hydrological predictions to avoid very complicated and intractable models, as there is a maximum complexity that a given model can handle. Therefore, we suggest building separate models for long-lasting hydrological situations (e.g., drought, flood) and for periods with medium stages, when rapid water level changes occur. One way to reduce the loss value for medium-stage data is to introduce a more elaborate loss function in the training procedure, which gives more weight to the medium-stage water levels.
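One possible form of such a loss function is sketched below. It is an illustration rather than a tested proposal: the function name `stage_weighted_mse` and the default weight of 2.0 are hypothetical, mean squared error is assumed as the base loss, and only the medium-stage range (240‒649 cm) is taken from the text.

```python
def stage_weighted_mse(observed, predicted, medium_weight=2.0):
    """Weighted MSE in which medium stages (240-649 cm) count medium_weight times."""
    weighted_sum = 0.0
    weight_total = 0.0
    for obs, pred in zip(observed, predicted):
        # Medium-stage range from the text; the weight value is an assumption.
        weight = medium_weight if 240 <= obs <= 649 else 1.0
        weighted_sum += weight * (pred - obs) ** 2
        weight_total += weight
    return weighted_sum / weight_total

# Illustrative values: a 10 cm miss at a low stage, a 20 cm miss at a medium stage.
observed = [200, 400]
predicted = [210, 420]
print(stage_weighted_mse(observed, predicted, medium_weight=1.0))  # 250.0
print(stage_weighted_mse(observed, predicted))                     # 300.0
```

Raising `medium_weight` increases the contribution of medium-stage errors to the loss, so training would be pushed to reduce them, possibly at some cost to the low-stage and flood forecasts.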

Availability of data and materials

Hydrological data are available at the Lower Tisza Hydrological Institution (ATIVIZIG) on request. The code needed for generating plots of the Results and Discussion chapters (along with the data used for the validation) can be found at:


  1. Adikari KE, Shrestha S, Ratnayake DT, Budhathoki A, Mohanasundaram S, Dailey MN (2021) Evaluation of artificial intelligence models for flood and drought forecasting in arid and tropical regions. Environ Model Softw 144:105136

  2. Ahmed AAM, Deo RC, Ghahramani A, Feng Q, Raj N, Yin Z, Yang L (2022) New double decomposition deep learning methods for river water level forecasting. Sci Total Environ 831:154722

  3. Bartha P, Bálint G, Gauzer B (1998) Expected evolution of the Tisza flood wave. VITUKI Hungary Ltd, Budapest, p 22

  4. Bartha P, Szöllősi-Nagy A, Harkányi K (1983) Hydrological data collection and forecasting system. Danube Vízügyi Közlemények 45(3):373–388

  5. Bálint G, Bartha P (1982) Large-scale assessment of snow resources for forecasting spring flow. Hydrological Aspects of Alpine and High-Mountain Areas. Int Assoc Hydrol Sci 138:203–208

  6. Bezak N, Petan S, Kobold M, Brilly M, Bálint Z, Balabanova S, Cazac V, Csík A, Godina R, Janál P, Klemar Z, Kopáciková L, Liedl P, Matreata M, Korniienko V, Vladikovic D, Šraj M (2021) A catalogue of the flood forecasting practices in the Danube River Basin. River Res Appl 37(7):909–918

  7. Camacho LA, Lees MJ (1999) Multilinear discrete lag-cascade model for channel routing. J Hydrol 226(1–2):30–47

  8. Cerqueira V, Torgo L, Mozetič I (2020) Evaluating time series forecasting models: an empirical study on performance estimation methods. Mach Learn 109(11):1997–2028

  9. Chau KW (2006) Particle swarm optimization training algorithm for ANNs in stage prediction of Shing Mun River. J Hydrol 329(3–4):363–367

  10. Chen WB, Liu WC, Hsu MH (2012) Comparison of ANN approach with 2D and 3D hydrodynamic models for simulating estuary water stage. Adv Eng Softw 45(1):69–79

  11. Clark JJ, Wilcock PR (2000) Effects of land-use change on channel morphology in northeastern Puerto Rico. Geol Soc Am Bull 112(12):1763–1777

  12. Cui Z, Zhou Y, Guo S, Wang J, Xu CY (2022) Effective improvement of multi-step-ahead flood forecasting accuracy through encoder-decoder with an exogenous input structure. J Hydrol 609:127764

  13. Dutta R, Maity R (2021) Time-varying network-based approach for capturing hydrological extremes under climate change with application on drought. J Hydrol 603(B):126958

  14. Fehérváry I, Kiss T (2020) Identification of riparian vegetation types with machine learning based on LiDAR point-cloud made along the lower Tisza’s floodplain. J Environ Geogr 13:53–61

  15. Fu JC, Huang HY, Jang JH (2019) River stage forecasting using multiple additive regression trees. Water Resour Manage 33:4491–4507

  16. Herman JD, Reed PM, Wagener T (2013) Time-varying sensitivity analysis clarifies the effects of watershed model formulation on model behaviour. Water Resour Res 49:1400–1414

  17. Hochreiter S, Schmidhuber J (1997) Long short-term memory. Neural Comput 9(8):1735–1780

  18. Illés L, Konecsny K (2000) Hydrological effect of forests on flood development in the upper Tisza catchment. Vízügyi Közlemények 82(2):167–199

  19. Imambi S, Prakash KB, Kanagachidambaresan GR (2021) PyTorch. Progr TensorFlow Solut Edge Comput Appl.

  20. Kim D, Lee J, Kim J, Lee M, Wang W, Kim HS (2022) Comparative analysis of long short-term memory and storage function model for flood water level forecasting of Bokha stream in NamHan River, Korea. J Hydrol 606:127415

  21. Kiss T, Fiala K, Gy S (2008) Altered meander parameters due to river regulation works, Lower Tisza, Hungary. Geomorphology 98(1–2):96–110

  22. Kiss T, Vágás I (2015) Flood hysteresis curves. Hidrológiai Közlöny 95(4):75–80

  23. Kiss T, Fiala K, Gy S, Szatmári G (2019) Long-term hydrological changes after various river regulation measures: are we responsible for flow extremes? Hydrol Res 50(2):417–430

  24. Kiss T, Nagy J, Fehérváry I, Amissah G, Fiala K, Gy S (2021) Increased flood height driven by local factors on a regulated river with a confined floodplain, Lower Tisza, Hungary. Geomorphology 389:107858

  25. Li Y, Shi H, Liu H (2020) A hybrid model for river water level forecasting: cases of Xiangjiang River and Yuanjiang River, China. J Hydrol 587:124934

  26. Moradkhani H, Hsu KL, Gupta HV, Sorooshian S (2004) Improved streamflow forecasting using self-organizing radial basis function artificial neural networks. J Hydrol 295(1–4):246–262

  27. Mosavi A, Ozturk P, Chau KW (2018) Flood prediction using machine learning models: literature review. Water 10(11):1536

  28. Ñeco RP, Forcada ML (1997) Asynchronous translations with recurrent neural nets. Proc Int Conf Neural Netw 4:2535–2540

  29. O’Connor MK (1976) A discrete linear cascade model for hydrology. J Hydrol 29(3–4):203–241

  30. Perumal M (1994) Multilinear discrete cascade model for channel routing. J Hydrol 158(1–2):135–150

  31. Perumal M, Moramarco T, Melone A (2007) A caution about the multilinear discrete lag-cascade model for flood routing. J Hydrol 338(3–4):308–314

  32. Phan TTH, Nguyen XH (2020) Combining statistical machine learning models with ARIMA for water level forecasting: the case of the Red River. Adv in Water Resources 142:103656

  33. Rumelhart DE, Hinton GE, Williams RJ (1986) Learning representations by back-propagating errors. Nature 323(6088):533–536

  34. Sahoo B, Perumal M, Moramarco T, Barbetta S, Sahoo S (2020) A multilinear discrete Nash-cascade model for stage-hydrograph routing in compound river channels. Hydrol Sci J 65(3):335–347

  35. Saleh F, Ducharne A, Flipo N, Oudin L, Ledoux E (2013) Impact of river bed morphology on discharge and water levels simulated by a 1D Saint-Venant hydraulic model at regional scale. J Hydrol 476(D24):169–177

  36. Serinaldi F (2010) Multifractality, imperfect scaling and hydrological properties of rainfall time series simulated by continuous universal multifractal and discrete random cascade models. Nonlinear Process Geophys 17(6):697–714

  37. Szöllősi-Nagy A (1982) The discretization of the continuous linear cascade by means of state space analysis. J Hydrol 58(3–4):223–236

  38. Szöllősi-Nagy A (1987) Input detection by the discrete linear cascade model. J Hydrol 89(3–4):353–370

  39. Troin M, Arsenault R, Wood AW, Brissette F, Martel J (2020) Generating Ensemble Streamflow Forecasts: A Review of Methods and Approaches Over the Past 40 Years. Water Resources Research.

  40. Vapnik NV (1995) Setting of the learning problem. In: Vapnik NV (ed) The nature of statistical learning theory. Springer, New York, pp 17–33

  41. Vágás I (1982) Floods of the Tisza River. Vízügyi Dokumentációs és Továbbképző Intézet, Budapest, p 283

  42. Wei CC (2015) Comparing lazy and eager learning models for water level forecasting in river-reservoir basins of inundation regions. Environ Model Softw 63:137–155



Open access funding provided by University of Szeged. Not applicable.

Author information


VZ: conceptualization, formal analysis, methodology, supervision, validation, writing the original draft; BB: formal analysis, methodology, software, validation, writing, review and editing; RL: software, validation, visualization; SS: software, validation; FI: data curation, writing the original draft; KP: resources, supervision. KT: conceptualization, writing, review and editing.

Corresponding author

Correspondence to Zsolt Vizi.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Additional file 1: S1.

Research data. S2. LSTM cell details. Figure S2. The Long Short-Term Memory (LSTM) cell and its structure. The red rectangles indicate a neural network with σ (sigmoid) or hyperbolic tangent activation function. The light circles indicate element-wise operations (element-wise multiplication and addition). S3. Model and training. S4. Baseline models.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit

About this article

Cite this article

Vizi, Z., Batki, B., Rátki, L. et al. Water level prediction using long short-term memory neural network model for a lowland river: a case study on the Tisza River, Central Europe. Environ Sci Eur 35, 92 (2023).
