An advanced hybrid deep learning model for predicting total dissolved solids and electrical conductivity (EC) in coastal aquifers

Coastal aquifers supply water to more than one billion people living in coastal regions, so monitoring their water quality is an important issue for policymakers. Many studies have reported that most conventional models do not accurately predict total dissolved solids (TDS) and electrical conductivity (EC) in coastal aquifers. It is therefore crucial to develop an accurate model for forecasting TDS and EC, two main water quality parameters. In this study, a new hybrid deep learning model is presented based on Convolutional Neural Networks (CNNE), Long Short-Term Memory Neural Networks (LOST), and Gaussian Process Regression (GPRE). The study contributes to Sustainable Development Goal (SDG) 6 of the United Nations, which aims to guarantee universal access to clean water and proper sanitation. The new model produces point and interval predictions simultaneously and extracts the features of data points automatically. In the first step, the CNNE model automatically extracts features. The outputs of the CNNE are then flattened, and the LOST uses the flattened arrays for the point prediction. Finally, the GPRE model receives the outputs of the LOST model to obtain the interval prediction. The model parameters were adjusted using the rat swarm optimization algorithm (RSOA). This study used pH, Ca²⁺, Mg²⁺, Na⁺, K⁺, HCO₃⁻, SO₄²⁻, and Cl⁻ to predict EC and TDS in a coastal aquifer. For predicting EC, the CNNE-LOST-GPRE, LOST-GPRE, CNNE-GPRE, CNNE-LOST, LOST, and CNNE models achieved NSE values of 0.96, 0.95, 0.92, 0.91, 0.90, and 0.87, respectively. Sodium adsorption ratio, EC, magnesium hazard ratio, sodium percentage, and total hardness indices were used to evaluate the quality of the groundwater. These indices indicated poor groundwater quality in the aquifer. This study shows that the CNNE-LOST-GPRE is a reliable model for predicting complex phenomena. The developed hybrid model could therefore be used by private and public water sectors to predict TDS and EC and to support water quality management in coastal aquifers.


Introduction
Coastal freshwater aquifers offer water for a variety of vital uses, including municipal and domestic water supplies, crop and pasture irrigation, and industrial activities. The coastal aquifer (CA) is an important natural resource for socioeconomic development [15]. The water quality of coastal aquifers depends on several factors, including climate change, population growth, geological formations, and recharge rates. The water quality directly affects public health and the environment [3]. Monitoring and evaluating the water quality of coastal aquifers is essential because they are used for irrigation and drinking [35]. Predicting the water quality of coastal aquifers helps decision-makers to reduce pollution. Conventional methods of assessing water quality are usually expensive and time-consuming for decision-makers, especially in developing countries [10]. Water quality can be predicted and managed using various physical or mathematical models. However, these models are complex, time-consuming, and data-intensive [29]. It is difficult to use these models in developing countries because of insufficient data or scarce background information.
Various soft computing models have been used to predict water quality over the past few years [28,22,21,43]. For predicting water quality parameters, machine learning models can be a better choice than sensors for the following reasons:
1. Accuracy: Machine learning models can provide more accurate predictions than sensors [5]. They can analyze complex data patterns and make predictions based on them.
2. Scalability: Machine learning models can be trained on large volumes of data, so they can predict water quality parameters across different regions and time periods. Sensors have a limited range of applications and may not be able to collect data from multiple locations [8].
3. Flexibility: Machine learning models can adapt to different water quality parameters, making them more versatile than sensors designed for particular parameters. In other words, machine learning models can be customized to meet a variety of needs related to water quality monitoring.
4. Cost-effectiveness: Machine learning models are more cost-effective than sensors, which are expensive to deploy and maintain.
5. Reliability: Machine learning models are more reliable than sensors, which may malfunction or be affected by environmental factors [5]. When sensors fail or are unavailable, machine learning models can still provide predictions.
Various studies have been conducted to determine and forecast groundwater level [26,27]. For instance, for predicting the electrical conductivity (EC) of groundwater, Khashei-Siuki et al. [18] used the kriging method, artificial neural networks (ANNs), and adaptive neuro-fuzzy inference systems (ANFISs). A high correlation was found between the Cl⁻ and EC parameters, and the ANN showed the best accuracy. Ravansalar and Rajaee [31] developed an ANN and a wavelet-ANN model to predict the monthly EC. Their results indicated that the wavelet-ANN was superior to the ANN. Mohammadpour et al. [25] used radial basis function neural networks (RBFNNs) and support vector machine (SVM) models to predict the water quality index. Based on their study, SVMs and RBFNNs could successfully predict water quality indexes. Using wavelet-ANFIS and wavelet-ANN, Barzegar et al. [7] predicted electrical conductivity-based salinity levels. Ca²⁺, Mg²⁺, Na⁺, SO₄²⁻, and Cl⁻ were the inputs. The wavelet-ANFIS outperformed the wavelet-ANN model. Salami et al. [33] used ANN models to predict dissolved oxygen (DO) and total dissolved solids (TDS). The ANN models were reliable for predicting water quality indicators. Amanollahi et al. [2] evaluated the ability of remote sensing data to predict TDS and pH using an ANN model. The ANN model and remote sensing data successfully predicted water quality indicators. Charulatha et al. [9] used principal component regression (PCR)-ANN to estimate nitrite concentration. For predicting nitrite concentrations, the PCR-ANN showed high potential. For predicting DO, Zhang et al. [40] used an SVM model. The authors proposed a particle swarm optimization algorithm (PSOA) for finding SVM parameters. They concluded that SVM-PSO was a robust tool for short-term prediction. Khadr and Elshemy [17] used the ANFIS model to predict total phosphorus and nitrogen. The ANFIS model required inputs such as TDS, EC, and pH. As a predictive tool, they found the ANFIS model to be reliable. Ahmed and Shah [1] used the ANFIS model to estimate DO. The ANFIS model was reliable for predicting water quality indicators. For EC prediction, Barzegar et al. [8] used extreme learning machine (ELM) models and wavelet-ELMs. The least squares boosting (LSBoost) algorithm was used to create an ensemble model based on the outputs of the ELM and wavelet-ELM models. The ensemble model outperformed the wavelet-ELM and ELM models. Zhu and Heddam [42] used ANN and ELM models to predict DO. Overall, the ELM and ANN models successfully predicted DO. For predicting the water quality index, Kouadri et al. [19] suggested ANN, multilinear regression (MLR), and support vector machines (SVM). These models had high abilities for predicting the water quality index in the study area. Azrour et al. [4] used ANN and multiple regression algorithms to predict the water quality index. They stated that the ANN and MLR successfully predicted the water quality index.
SVM, ELM, MLP, RBFNN, and ANFIS have been used successfully for predicting water quality. However, these models have some shortcomings: they may miss information in the modeling process, and they cannot automatically extract the features of input data.
Deep learning (DL) models are widely used to address the shortcomings of soft computing models. Deep learning models can extract deep features from data points. A convolutional neural network (CNN) is one of the robust deep learning models. CNNs have been widely used in different fields, such as medical imaging [34], prediction of plant leaf diseases [12], stock trend prediction [11], streamflow prediction [14], and weather radar echo prediction [14]. A CNN model can extract data features, but it may not be able to learn sequence associations. Due to their excellent information memory and sequential modeling capabilities, long short-term memory (LOST) networks are used for simulating complex problems [30,38]. Hence, CNNE-LOST models are suggested for extracting complex features and predicting outputs. A CNNE-LOST combines the advantages of CNNEs and LOSTs: the LOST has excellent processing ability for time series data, while the CNNE extracts features of grid data. Kumari and Toshniwal [20] used LOST-CNNE models to predict global horizontal irradiance. They reported that the LOST-CNNE model was a robust tool for short-term predictions. Yan et al. [39] used CNNE-LOST models to predict air quality. They reported that the LOST-CNNE outperformed the LOST and CNNE models.
However, CNNE-LOST only provides a single prediction value. During the modeling process, it is essential to obtain the interval prediction and uncertainty values. Systematic reviews have shown that Gaussian process regression (GPRE) is a useful method for interval prediction [36,37]. GPRE is a type of nonlinear Bayesian regression for quantifying uncertainty.
Using LOST and CNNE, features can be extracted from the input data. Then, the GPRE is used to provide reliable interval predictions. The developed hybrid model has several advantages. First, the CNNE-LOST-GPRE model produces interval and point predictions simultaneously. Second, unlike MLP, RBFNN, and SVM models, the CNNE-LOST-GPRE extracts features automatically. Finally, the CNNE-LOST-GPRE makes it possible to quantify the uncertainty of the modeling process.
Hence, this study introduces a new hybrid model, namely CNNE-LOST-GPRE, for predicting TDS and EC in a coastal aquifer. EC and TDS are predicted because they are among the most important water quality indicators. Predicting the electrical conductivity of water provides valuable information about its purity or contamination. The electrical conductivity of water is directly related to the dissolved ions or salts in the water. Higher electrical conductivity indicates more dissolved solids, which can negatively impact aquatic life, human health, and industrial processes. Lower electrical conductivity indicates lower levels of dissolved contaminants and higher purity of water, making it more suitable for consumption. Therefore, predicting the electrical conductivity of water is important for monitoring and regulating water quality and ensuring ecosystem health.

Structure of convolutional neural network models (CNN)
Because CNNE models share feature parameters and reduce dimensionality, they are widely used for predicting outputs [36]. By sharing parameters, the CNNE reduces the number of parameters and computations. A CNNE consists of convolutional, pooling, and fully connected layers [6]. The convolutional layer consists of many convolution kernels, which generate feature maps from the input matrices. Spatial and temporal dependencies are captured by the convolution kernels. A pooling layer decreases the spatial dimensions of the matrices by down-sampling them, which reduces the number of parameters while maintaining the essential characteristics. Through fully connected layers, latent patterns are learned from time series input, feature maps, and targets. CNNEs commonly use Rectified Linear Units (ReLUs) as activation functions. In this study, the weight connections of the CNNE are updated using a robust optimization algorithm.
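The layer sequence described above (convolution, ReLU activation, pooling, and flattening) can be illustrated with a minimal NumPy sketch. It is not the configuration used in this study: the kernel count, kernel width, pooling size, and the eight-value input sample are all assumptions chosen for illustration, and the helper names `conv1d`, `relu`, and `max_pool1d` are hypothetical.

```python
import numpy as np

def conv1d(x, kernels, bias):
    """Valid 1-D convolution (cross-correlation, as used in CNNs).

    x: (length,) input vector; kernels: (n_kernels, width); bias: (n_kernels,).
    Returns feature maps of shape (n_kernels, length - width + 1).
    """
    n_kernels, width = kernels.shape
    out_len = x.shape[0] - width + 1
    # Sliding windows over the input, shape (out_len, width)
    windows = np.stack([x[i:i + width] for i in range(out_len)])
    return kernels @ windows.T + bias[:, None]

def relu(z):
    """Rectified linear unit."""
    return np.maximum(z, 0.0)

def max_pool1d(feature_maps, size=2):
    """Non-overlapping max pooling along the last axis."""
    n_kernels, length = feature_maps.shape
    trimmed = feature_maps[:, : (length // size) * size]
    return trimmed.reshape(n_kernels, -1, size).max(axis=2)

# Hypothetical sample with eight input values (illustrative numbers only)
x = np.array([7.2, 3.1, 2.4, 5.0, 0.2, 4.4, 1.9, 6.3])
rng = np.random.default_rng(0)
kernels = rng.normal(size=(4, 3))   # assumed: 4 kernels of width 3
bias = np.zeros(4)

feature_maps = relu(conv1d(x, kernels, bias))  # shape (4, 6)
pooled = max_pool1d(feature_maps, size=2)      # shape (4, 3)
flattened = pooled.ravel()                     # shape (12,), passed on to the next model
```

The flattened vector plays the role of the array that is handed to the sequence model in the hybrid structure.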

Structure of LOST
LOST is a robust method for sequence learning. A LOST has a memory cell that can retain information for a long period. Each layer has three multiplicative units: an input gate, a forget gate, and an output gate. LOST uses state cells. The forget gate determines what information should be removed from or kept in the cell state [41]:

$$f_t = \mu\left(\omega_f \, [h_{t-1}, x_t] + \beta_f\right)$$

where $f_t$: the activation values of the forget gate, $\omega_f$: the weight matrix of the forget gate, $\beta_f$: the bias matrix of the forget gate, and $\mu$: the activation function. Input gates determine what information is added to a cell state. The process consists of two steps. The first step is calculating candidate values for the cell states [23]. The next step is to calculate the activation values of the input gates:

$$\tilde{\rho}_t = \tanh\left(\omega_\rho \, [h_{t-1}, x_t] + \beta_\rho\right)$$

$$i_t = \mu\left(\omega_i \, [h_{t-1}, x_t] + \beta_i\right)$$

where $\omega_\rho$ and $\omega_i$: the weight matrices of the cell state and input gate, $\beta_\rho$ and $\beta_i$: bias matrices, $\tilde{\rho}_t$: candidate values for the cell states, $x_t$: input, $h_{t-1}$: hidden state, and $i_t$: activation values of the input gates. Based on the previous steps, new cell states are computed:

$$\rho_t = f_t \odot \rho_{t-1} + i_t \odot \tilde{\rho}_t$$

where $\rho_t$: cell state at time $t$, and $\rho_{t-1}$: cell state at time $t-1$. Finally, the output gate provides the outputs:

$$o_t = \mu\left(\omega_o \, [h_{t-1}, x_t] + \beta_o\right)$$

$$h_t = o_t \odot \tanh\left(\rho_t\right)$$

where $o_t$: activation values of the output gate, $\omega_o$ and $\beta_o$: weight and bias matrices of the output gate, and $h_t$: output.
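The gate computations described in this section can be sketched as a single time step in NumPy. This is a minimal illustration of the standard long short-term memory cell, not the trained network of this study: the sigmoid gate activation, the tanh candidate activation, the layer sizes, and the function name `lstm_step` are assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, params):
    """One time step of a standard LSTM cell.

    params maps the gate names "f", "i", "c", "o" to (weight, bias) pairs;
    each weight has shape (hidden, hidden + n_inputs).
    """
    z = np.concatenate([h_prev, x_t])      # [h_{t-1}, x_t]
    w_f, b_f = params["f"]
    w_i, b_i = params["i"]
    w_c, b_c = params["c"]
    w_o, b_o = params["o"]
    f_t = sigmoid(w_f @ z + b_f)           # forget gate
    i_t = sigmoid(w_i @ z + b_i)           # input gate
    c_tilde = np.tanh(w_c @ z + b_c)       # candidate cell state
    c_t = f_t * c_prev + i_t * c_tilde     # new cell state
    o_t = sigmoid(w_o @ z + b_o)           # output gate
    h_t = o_t * np.tanh(c_t)               # hidden state (output)
    return h_t, c_t

# Assumed sizes and random weights, for illustration only
hidden, n_inputs = 3, 4
rng = np.random.default_rng(1)
params = {
    gate: (rng.normal(scale=0.5, size=(hidden, hidden + n_inputs)), np.zeros(hidden))
    for gate in "fico"
}

h, c = np.zeros(hidden), np.zeros(hidden)
sequence = rng.normal(size=(5, n_inputs))  # five time steps of synthetic input
for x_t in sequence:
    h, c = lstm_step(x_t, h, c, params)
```

After the loop, `h` holds the output of the last time step, which is the quantity a point-prediction layer would consume.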

Structure of Gaussian process regression (GPRE)
GPRE is a nonparametric probabilistic model for quantifying uncertainty [16] and a good choice for approximating nonlinear functions. For noisy data, the regression model is considered as follows:

$$Z = f(\mathrm{in}) + v$$

where $Z$: output, $f$: basic function, $\mathrm{in}$: input, and $v$: noise. Then, the prior distribution of the observed data can be computed:

$$Z \sim \mathcal{N}\left(0,\; K(\mathrm{in}, \mathrm{in}) + \sigma_n^2 I_n\right)$$

where $\sigma_n^2$: variance, $I_n$: unit matrix, $\mathrm{in}_i$: ith input, $\mathrm{in}_j$: jth input, and $K(\mathrm{in}_i, \mathrm{in}_j)$: the N-dimensional covariance matrix. The covariance matrix is computed as follows [37]:

$$K(\mathrm{in}_i, \mathrm{in}_j) = \sigma_f^2 \exp\left(-\frac{\lVert \mathrm{in}_i - \mathrm{in}_j \rVert^2}{2 l^2}\right)$$

where $\sigma_f$ and $l$: hyperparameters. Lastly, the posterior distribution of the predicted value is calculated:

$$\bar{z} = K_*^{\top}\left(K + \sigma_n^2 I_n\right)^{-1} Z$$

$$\sigma_z^2 = K_{**} - K_*^{\top}\left(K + \sigma_n^2 I_n\right)^{-1} K_*$$

where $K_{**}$: the self-covariance of test points, $K_*$: the $n \times 1$ covariance matrix of test points, $\bar{z}$: the point prediction results of GPRE, and $\sigma_z^2$: variance of the predicted value. Since the CNNE-LOST model gives the point predictions, only $\sigma_z^2$ is required to obtain the corresponding interval prediction (CIP), $(\bar{z} - 1.96\,\sigma_z,\ \bar{z} + 1.96\,\sigma_z)$. The following equation computes the probability density function of the predicted value:

$$p(z) = \frac{1}{\sqrt{2\pi}\,\sigma_z} \exp\left(-\frac{(z - \bar{z})^2}{2\sigma_z^2}\right)$$

The structure of RSOA
There are many optimization algorithms, but RSOA is a simple and robust algorithm for solving complex problems. Dhiman et al. [13] introduced RSOA based on the life of rats. Rats are aggressive animals that can kill their enemies through their aggressive behavior. For solving complex problems, the RSOA mathematically simulates the chasing and fighting behaviors of rats. Chasing behavior assumes that the best search agent knows the location of prey before beginning its search. Based on the location of the best search agent, the other rats update their locations. The chasing behavior is simulated using the following equations [13]:

$$RA = A \cdot R_i(x) + C \cdot \left(R_r(x) - R_i(x)\right)$$

$$A = \alpha - IT \cdot \frac{\alpha}{IT_{max}}, \qquad C = 2 \cdot rand$$

where $R_i(x)$: the current location of rats, $R_r(x)$: the best location of rats, $A$ and $C$: random parameters, $rand$: random number, $IT$: number of iterations, $IT_{max}$: maximum number of iterations, $\alpha$: constant value, and $RA$: the updated location of rats. At the next level, the following equation is used to simulate the fighting behavior of RSOA:

$$R_i(x+1) = \left| R_r(x) - RA \right|$$

where $R_i(x+1)$: the new position of the rat.
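The chasing and fighting updates can be sketched as a small minimization routine. This is a simplified illustration under stated assumptions, not the implementation used in this study: the function name `rsoa_minimize`, the search bounds, the clipping to those bounds, the range [1, 5] for the constant α (taken from the original rat swarm optimizer description), and the sphere test function are all choices made for this example.

```python
import numpy as np

def rsoa_minimize(objective, dim, pop_size=20, max_iter=100,
                  lower=-5.0, upper=5.0, seed=0):
    """Minimize `objective` with rat-swarm-style chasing and fighting updates."""
    rng = np.random.default_rng(seed)
    rats = rng.uniform(lower, upper, size=(pop_size, dim))
    fitness = np.array([objective(r) for r in rats])
    best = rats[fitness.argmin()].copy()
    best_fit = float(fitness.min())
    alpha = rng.uniform(1.0, 5.0)                  # assumed range for the constant
    for it in range(max_iter):
        a = alpha - it * (alpha / max_iter)        # A shrinks linearly with iterations
        for k in range(pop_size):
            c = 2.0 * rng.random()                 # C drawn from [0, 2)
            chase = a * rats[k] + c * (best - rats[k])             # chasing behavior
            rats[k] = np.clip(np.abs(best - chase), lower, upper)  # fighting behavior
        fitness = np.array([objective(r) for r in rats])
        if fitness.min() < best_fit:               # keep the best location found so far
            best_fit = float(fitness.min())
            best = rats[fitness.argmin()].copy()
    return best, best_fit

def sphere(p):
    """Toy objective with its minimum at the origin."""
    return float(np.sum(p ** 2))

best, best_fit = rsoa_minimize(sphere, dim=2, pop_size=20, max_iter=50, seed=0)
```

Because the best location is only replaced when a better one is found, `best_fit` never increases across iterations; no claim is made here about how quickly it converges.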

Structure of hybrid LOST-RSO, CNNE-RSO, and CNNE-LOST-GPRE
Weights and biases are the key parameters of the LOST and CNNE models. In this study, the RSOA was used to adjust the LOST and CNNE parameters:
1) For LOSTs and CNNEs, weights and biases are initialized.
2) A CNNE and a LOST are run using training data.
3) Check the stop criterion (CC). Models are run at the testing level if CC is met; otherwise, they go to step 4.
4) The LOST and CNNE parameters are regarded as the initial population of the algorithm.
5) Each rat's location represents the weight and bias parameter values.
6) The models are run using the initial population of the algorithm.
7) The objective function (root mean square error) assesses the quality of the solution.
8) Equations 16 and 17 are used to update rat locations using the operators of the rat algorithm.
9) The models go to step 3 if the convergence criterion is met; otherwise, they go to step 6.
For predicting both TDS and EC, the daily inputs were pH, Ca²⁺, Mg²⁺, Na⁺, K⁺, HCO₃⁻, SO₄²⁻, and Cl⁻.
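Steps 4 to 7 above rely on two ideas: a rat's location is a flat vector holding all weights and biases, and its quality is the RMSE of the model built from that vector. The sketch below illustrates this encoding with a linear stand-in model instead of a LOST or CNNE, purely to keep the example short; the helper names `unpack` and `rmse_objective` and the synthetic data are assumptions.

```python
import numpy as np

def unpack(position, n_inputs):
    """Split a flat position vector into the model weights and a bias."""
    weights = position[:n_inputs]
    bias = position[n_inputs]
    return weights, bias

def rmse_objective(position, X, y):
    """Root mean square error of the stand-in model encoded by `position`."""
    weights, bias = unpack(position, X.shape[1])
    predictions = X @ weights + bias
    return float(np.sqrt(np.mean((y - predictions) ** 2)))

# Synthetic training data (illustrative only)
rng = np.random.default_rng(42)
X = rng.normal(size=(50, 3))
true_position = np.array([1.5, -2.0, 0.5, 0.3])   # three weights followed by one bias
y = X @ true_position[:3] + true_position[3]

# Steps 4-5: each row is one rat, i.e. one candidate set of weights and biases
population = rng.uniform(-3.0, 3.0, size=(10, 4))
# Steps 6-7: run the model for every rat and score it with the objective function
scores = np.array([rmse_objective(p, X, y) for p in population])
best_rat = population[scores.argmin()]
```

In the full procedure, the optimizer would then move the rats (step 8) and re-evaluate `scores` until the convergence criterion is met.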

Case study
This paper studies the Ghaemshahr coastal aquifer, which is located in the north of Iran. A dense forest surrounds the southern region of the basin, while the Caspian Sea borders the northern part. The region has sub-humid and humid climates. In the study area, 85% of groundwater is used for agricultural purposes. Additionally, groundwater meets about 75% of drinking water demands. Therefore, the plain plays a key role in the water supply. River deposits have formed several types of alluvial plains within the study area. The shallow unconfined aquifer was formed by a calcareous unit containing sand and gravel. Silty and clayey sediments separate the semi-confined aquifer from the unconfined aquifer. The percolated rainfall dissolves minerals in the recharge zone due to the presence of calcareous and dolomite rocks. The data were collected from observation wells in three zones.
In zone A (the recharge zone near the foothills of the Alborz mountains), the groundwater table level ranges from 55 m (at sampling point 15) to 94 m (at sampling point 2) above the Caspian Sea level. Water well depth within zone A ranges from 21 to 187 m below the ground surface. In this zone, the underlying semi-confined and the top unconfined aquifers are connected hydraulically and operate as a unified aquifer system. The water table level in zone B (the central zone), which is composed of stratified sediments (the top unconfined aquifer, the aquitard layer, the semi-confined aquifer, and the marine sediments), ranges between 6.6 m (sampling point 29) and 61.7 m (sampling point 33) above the Caspian Sea level. Zone C is located near the coastline, and the water table level ranges from 0.4 m (sampling point 53) to 12.4 m (sampling point 68) above the mean level of the Caspian Sea. Water wells in this zone are shallow, with depths ranging from 12 to 24 m below the ground level.
The study period is from 2015 to 2021. For predicting both TDS and EC, the daily inputs were pH, Ca²⁺, Mg²⁺, Na⁺, K⁺, HCO₃⁻, SO₄²⁻, and Cl⁻. Table 1 shows the statistical details of input and output data. Figure 2 shows the study area on Google Map, while Fig. 3 shows the data points of EC and TDS.
At some points in Fig. 3, the EC is very high due to various factors. For instance, EC is temperature dependent: in aqueous solutions, it generally increases as the temperature rises because ion mobility increases. Moreover, the type and concentration of ions also affect the changes in EC.
In this study, point prediction evaluation metrics are applied to evaluate the performance of the models, where MAE: mean absolute error, RMSE: root mean square error, N: number of data, $V_i$: observed data, $\bar{V}$: average observed data, and $v_i$: estimated data.
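The point prediction metrics can be written compactly in NumPy. The definitions below are the commonly used forms of MAE, RMSE, NSE, and PBIAS and are given as a sketch; in particular, the sign convention for PBIAS (observed minus estimated) is an assumption, and the sample values are made up for illustration.

```python
import numpy as np

def mae(obs, est):
    """Mean absolute error."""
    return float(np.mean(np.abs(obs - est)))

def rmse(obs, est):
    """Root mean square error."""
    return float(np.sqrt(np.mean((obs - est) ** 2)))

def nse(obs, est):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit."""
    return float(1.0 - np.sum((obs - est) ** 2) / np.sum((obs - obs.mean()) ** 2))

def pbias(obs, est):
    """Percent bias, assuming the (observed - estimated) sign convention."""
    return float(100.0 * np.sum(obs - est) / np.sum(obs))

observed = np.array([1.0, 2.0, 3.0, 4.0])    # illustrative values only
estimated = np.array([1.5, 2.0, 2.5, 4.0])

scores = {
    "MAE": mae(observed, estimated),
    "RMSE": rmse(observed, estimated),
    "NSE": nse(observed, estimated),
    "PBIAS": pbias(observed, estimated),
}
```

For this toy sample, the two errors of ±0.5 cancel in PBIAS but not in MAE or RMSE, which is why the metrics are usually reported together.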

Determination of random parameters
The performance of RSOA depends on the values of its random parameters, so these values must be determined. The maximum number of iterations (MANU) and the population size (POPS) are the two most important parameters of RSOA. In this study, MANU and POPS are determined using sensitivity analysis: the parameter values are varied, and the values that give the lowest value of the objective function are selected.

Selected features by the hybrid model
This study uses the hybrid CNNE-LOST-GPRE to identify features automatically. The best input combinations are shown in Table 3. For predicting TDS, the best input combination was HCO₃⁻, Na⁺, Ca²⁺, and Mg²⁺. For predicting EC, the best input combination included Na⁺ and HCO₃⁻ (Table 3). Previous research showed the effect of HCO₃⁻ on EC [32]. Figure 6 shows the correlation heat maps between outputs and inputs. HCO₃⁻, Na⁺, Ca²⁺, and Mg²⁺ had the highest correlation with TDS, while Na⁺, HCO₃⁻, SO₄²⁻, and Ca²⁺ had the highest correlation with EC. Thus, the hybrid model correctly chooses the best features. The LOST, GPRE, CNNE, LOST-CNNE, LOST-GPRE, and CNNE-GPRE models also used the best input combinations for predicting TDS and EC. For instance, in Fig. 6 the correlation values for pH are 0.3 and 0.59 for the input and output of TDS, respectively, and 0.54 and 0.73 for the input and output of EC, respectively.
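A correlation check of the kind summarized in the heat maps can be sketched as follows. The data here are synthetic and the input names are placeholders, so the ranking says nothing about the aquifer itself; the function name `rank_inputs_by_correlation` is hypothetical.

```python
import numpy as np

def rank_inputs_by_correlation(X, y, names):
    """Rank input columns by the absolute Pearson correlation with the target."""
    corr = {
        name: float(np.corrcoef(X[:, j], y)[0, 1])
        for j, name in enumerate(names)
    }
    return sorted(corr.items(), key=lambda item: abs(item[1]), reverse=True)

# Synthetic example: the target depends strongly on the first input,
# weakly on the second, and not at all on the third
rng = np.random.default_rng(7)
X = rng.normal(size=(200, 3))
y = 3.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=200)

names = ["input_1", "input_2", "input_3"]   # placeholder names
ranking = rank_inputs_by_correlation(X, y, names)
```

A ranking like this only captures linear, pairwise association, which is one reason to compare it with the features selected by the hybrid model rather than to rely on it alone.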

Evaluation of the accuracy of models for point predictions
This section evaluates the accuracy of models for predicting points.

Evaluation of the accuracy of models for interval prediction
Figure 10 shows the 95% prediction interval for TDS. A prediction interval is an estimate of the range within which future observations will fall with a given probability, and it is commonly used in regression analysis. Based on Fig. 10, it can be clearly seen that extreme events cannot be easily estimated. This is due to the lack of correlation between the previous and next values. The best performance is achieved when all observed data are within the bounds, so models with the highest PICP values are preferred. The CNNE-LOST-GPRE, LOST-GPRE, CNNE-GPRE, and GPRE models were used for interval prediction.
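The interval metrics can be illustrated with a short sketch. PICP is computed as the fraction of observations inside the bounds, and the normalized average width as the mean interval width divided by the range of the observed data; these are the commonly used definitions and are assumed here, as are the sample values and the function names.

```python
import numpy as np

def interval_95(point, sigma):
    """95% interval around a point prediction, assuming a normal distribution."""
    return point - 1.96 * sigma, point + 1.96 * sigma

def picp(obs, lower, upper):
    """Prediction interval coverage probability."""
    return float(np.mean((obs >= lower) & (obs <= upper)))

def pinaw(obs, lower, upper):
    """Prediction interval normalized average width."""
    data_range = obs.max() - obs.min()
    return float(np.mean(upper - lower) / data_range)

observed = np.array([10.0, 12.0, 14.0, 20.0])   # illustrative values only
point = np.array([10.5, 12.0, 15.0, 16.0])      # hypothetical point predictions
sigma = np.ones(4)                              # hypothetical standard deviations

lower, upper = interval_95(point, sigma)
coverage = picp(observed, lower, upper)
width = pinaw(observed, lower, upper)
```

In this toy case, the last observation (an extreme value) falls outside its interval, which lowers the coverage even though the intervals are fairly narrow.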

Evaluation of the accuracy of models
In this study, the CNNE-LOST-GPRE was used to predict EC and TDS. The models were useful for interval and point predictions. The main differences between the current research and other papers were as follows: The CNNE-LOST-GPRE is a robust tool for monitoring water quality in complex and dynamic systems, whereas the standalone LOST and CNNE were less accurate in predicting water quality indicators. Also, the high accuracy of the CNNE-LOST-GPRE indicated that the RSOA performed well. The CNNE-LOST-GPRE can also be used for providing spatial and temporal maps of water quality indicators in a large basin.

Evaluation of the hydrochemical and water quality characteristics of the aquifer
For irrigation purposes, it is necessary to evaluate the hydrochemical quality of groundwater. This section uses different indices to assess the water quality characteristics of the aquifer. Na⁺ is one of the most important parameters for evaluating water quality. When sodium levels exceed the safe level, water permeability is reduced, and crops are damaged. The classification of water samples is shown in Table 5.
Fig. 11 The 95% prediction interval of TDS predictions
• SAR
Based on the sodium adsorption ratio (SAR), 45, 33, and 22% of the water samples are good, doubtful, and unsuitable, respectively. If the SAR of water is high, it may cause the dispersion of soil colloids.

• MHR
Too much magnesium inhibits calcium absorption and reduces plant growth. Based on the magnesium hazard ratio (MHR), 78 and 22% of samples are suitable and unsuitable, respectively. Thus, part of the water can adversely affect crop growth.

• EC
Higher EC inhibits nutrient uptake by increasing the osmotic pressure of the nutrient solutions. The health and yield of plants may be severely affected by lower EC. Based on EC values, 10, 67, and 23% of water samples are good, doubtful, and unsuitable, respectively.

• Sodium%
Crop yield is reduced when the sodium concentration exceeds the permissible limit. Based on the sodium percentage, 50, 20, 10, and 10% of water samples were good, permissible, doubtful, and unsuitable, respectively.

• TH
Based on TH values, 70 and 30% of water samples were hard and unsuitable, respectively. Thus, the TH values indicate the low quality of the water samples.
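The irrigation indices discussed above can be computed from ion concentrations with the widely used textbook formulas sketched below. All concentrations are assumed to be in meq/L, total hardness is expressed as mg/L of CaCO₃, and the sample values are invented for illustration; the class boundaries of Table 5 are not reproduced here.

```python
import math

def sodium_adsorption_ratio(na, ca, mg):
    """SAR = Na / sqrt((Ca + Mg) / 2), concentrations in meq/L."""
    return na / math.sqrt((ca + mg) / 2.0)

def magnesium_hazard_ratio(ca, mg):
    """MHR = 100 * Mg / (Ca + Mg), concentrations in meq/L."""
    return 100.0 * mg / (ca + mg)

def sodium_percentage(na, k, ca, mg):
    """Na% = 100 * (Na + K) / (Ca + Mg + Na + K), concentrations in meq/L."""
    return 100.0 * (na + k) / (ca + mg + na + k)

def total_hardness(ca, mg):
    """TH in mg/L as CaCO3, from Ca and Mg in meq/L (50 mg CaCO3 per meq)."""
    return 50.0 * (ca + mg)

# Hypothetical water sample, concentrations in meq/L
sample = {"na": 6.0, "k": 1.0, "ca": 4.0, "mg": 4.0}

indices = {
    "SAR": sodium_adsorption_ratio(sample["na"], sample["ca"], sample["mg"]),
    "MHR": magnesium_hazard_ratio(sample["ca"], sample["mg"]),
    "Na%": sodium_percentage(sample["na"], sample["k"], sample["ca"], sample["mg"]),
    "TH": total_hardness(sample["ca"], sample["mg"]),
}
```

Each index would then be compared against the class limits adopted for the study area to label a sample as good, doubtful, or unsuitable.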
A comparison of the hybrid machine learning models shows that the CNNE-LOST-GPRE outperformed the other models (LOST-GPRE, CNNE-GPRE, and GPRE) in predicting TDS and EC. This study demonstrates that the CNNE-LOST-GPRE model is a reliable predictor of complex phenomena. As a result, the developed hybrid model could be used by the private and public water sectors to estimate TDS and EC in coastal aquifers in order to improve water quality. Population and irrigation demand may increase in the future, while water quantity and quality are already poor. Hence, decision-makers must develop new policies and strategies for managing the basin's water quality. In most cases, recharge basins reduce the decline of water table levels and subsidence and improve water quality. Brackish groundwater desalination is another widely used method in different regions of the world. Moreover, based on the PICP of the 95% prediction interval for TDS, the CNNE-LOST-GPRE outperformed LOST-GPRE, CNNE-GPRE, and GPRE, with PICP values of 0.95, 0.94, 0.91, and 0.91, respectively. Furthermore, based on the PICP of the 95% prediction interval for EC, the CNNE-LOST-GPRE outperformed LOST-GPRE, CNNE-GPRE, and GPRE, with PICP values of 0.97, 0.95, 0.93, and 0.90, respectively.
The CNNE-LOST-GPRE hybrid model has several advantages. For instance, the CNNE is able to capture both short-term and long-term dependencies. The LOST is able to capture intricate temporal dependency patterns. The GPRE yields reasonable intervals for projected states, which is valuable for estimating uncertainty. Together, these three algorithms can form an accurate, well-performing model. The CNNE-LOST-GPRE hybrid model also has some limitations. For instance, the CNNE tends to be slow, and training takes a long time. Furthermore, when the training data are limited or noisy, the LOST tends to overfit and lose generalization ability. Finally, the GPRE assumes a normal distribution, which is inappropriate for variables with only positive values.
The CNNE-LOST-GPRE is a hybrid model for predicting complex phenomena. Each model has a task in the modeling process. Training data are inserted into the CNNE model in the first step. The convolutional layer (COL) extracts features using convolution kernels and provides feature maps. A pooling layer decreases the width and length of the feature maps. Finally, the CNNE provides outputs. In the next level, these outputs are flattened, and the flattened arrays are inserted into the LOST model. Figure 1 demonstrates the structure of the LOST-CNNE model. The LOST model provides point predictions at the training and testing levels. Then, the outputs of the LOST model are inserted into the GPRE model for interval predictions. The GPRE predicts all data points and obtains interval predictions. This study compares CNNE-LOST-GPRE with LOST-CNNE, LOST, CNNE, LOST-GPRE, and CNNE-GPRE models. The structure of the hybrid models is explained based on the following levels.
• Hybrid CNNE-LOST: The CNNE extracts the features at the training and testing levels. The flattened outputs of the CNNE are inserted into the LOST model for predicting data points.
• Hybrid CNNE-GPRE: The training and test data are inserted into the CNNE model at the training and testing levels. The outputs of the CNNE model are flattened. The flattened outputs are inserted into the GPRE model, which provides interval predictions.
• Hybrid LOST-GPRE: The training and testing data are used to run the LOST model at the training and testing levels. The outputs of the LOST model are inserted into the GPRE model for interval predictions.
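The last stage of the pipeline, in which point predictions are passed to the Gaussian process model to obtain intervals, can be sketched with a small NumPy implementation of Gaussian process regression. This is an illustration under stated assumptions rather than the fitted model of this study: a squared-exponential kernel is assumed, the hyperparameter values (σ_f = 2, l = 1, and a noise standard deviation of 0.1) are placeholders, the inputs are one-dimensional, and the "point predictions" are invented numbers.

```python
import numpy as np

def rbf_kernel(a, b, sigma_f=2.0, length=1.0):
    """Squared-exponential covariance between two 1-D input arrays."""
    sq_dist = (a[:, None] - b[None, :]) ** 2
    return sigma_f ** 2 * np.exp(-sq_dist / (2.0 * length ** 2))

def gpr_posterior(x_train, z_train, x_test, sigma_n=0.1, sigma_f=2.0, length=1.0):
    """Posterior mean and variance of a zero-mean Gaussian process at x_test."""
    K = rbf_kernel(x_train, x_train, sigma_f, length) + sigma_n ** 2 * np.eye(len(x_train))
    K_star = rbf_kernel(x_train, x_test, sigma_f, length)
    K_ss = rbf_kernel(x_test, x_test, sigma_f, length)
    mean = K_star.T @ np.linalg.solve(K, z_train)
    cov = K_ss - K_star.T @ np.linalg.solve(K, K_star)
    var = np.clip(np.diag(cov), 0.0, None)   # guard against tiny negative values
    return mean, var

# Hypothetical point predictions from the sequence model (inputs to the GP)
# and the corresponding observed values (targets)
point_predictions = np.array([0.0, 1.0, 2.0, 3.0])
observed = np.array([0.1, 0.9, 2.2, 2.9])

x_new = np.array([1.0, 50.0])   # one input near the training data, one far from it
mean, var = gpr_posterior(point_predictions, observed, x_new)
lower = mean - 1.96 * np.sqrt(var)
upper = mean + 1.96 * np.sqrt(var)
```

The interval is narrow for the input close to the training data and widens to roughly ±1.96 σ_f for the distant input, which is how the model expresses its uncertainty.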

Fig. 1 Structure of the LOST-CNNE model
In the point prediction metrics, $\bar{v}$: average estimated data, PBIAS: percent bias, and NSE: Nash-Sutcliffe efficiency. Low values of RMSE, MAE, and PBIAS indicate the best efficiency. The following indices are used to evaluate the predicted intervals, where PICP: Prediction Interval Coverage Probability, N: number of data, R: range of data, PINAW: Prediction Interval Normalized Average Width, $up_i$: upper values of variables, $low_i$: lower values of variables, and NC: index of uncertainty. Low PINAW values and high PICP values indicate more accurate predictions. Table 2a, b show the optimal values of the model parameters.
Each model uses different sizes for the training and testing sets. Based on different data sizes, Fig. 4 shows the RMSE values of CNNE-LOST-GPRE.

Fig. 2 Study area on Google Map

Fig. 3 Data points of EC and TDS

Fig. 5 Sensitivity analysis of random parameters of RSOA

Table 1
The details of input and output data (number of input data:391, number of output data:391)

Table 2
Optimal values of model parameters, a: for predicting EC, and b: for predicting TDS (f: 2 and l: 1)

Table 3
The best input combinations for predicting TDS and EC

Table 4
Summary of PICP, PINAW, and NC results of 95% prediction interval for TDS and EC

Table 5
The classification of water samples