Insight into temporal–spatial variations of DOM fractions and tracing potential factors in a brackish-water lake using second derivative synchronous fluorescence spectroscopy and canonical correlation analysis

Temporal–spatial variations of dissolved organic matter (DOM) fractions were investigated to trace potential factors and to further understand the aquatic environment in Lake Shahu, a brackish-water lake in northwest China, using synchronous fluorescence spectroscopy (SFS) combined with principal component analysis (PCA), the second derivative, and canonical correlation analysis (CCA). Five fluorescence peaks were extracted from SFS by PCA, including tyrosine-like fluorescence (TYLF), tryptophan-like fluorescence (TRLF), microbial humic-like fluorescence (MHLF), fulvic-like fluorescence (FLF), and humic-like fluorescence (HLF), whose relative contents were obtained by second derivative synchronous fluorescence spectroscopy. The total content of fluorescence components increased in the order of July (11,789.38 ± 12,752.61) < April (12,667.58 ± 15,246.91) < November (19,748.87 ± 17,192.13), which was attributed to the marked increase in TYLF content from April (1615.56 ± 258.56) to November (5631.96 ± 634.82). The PLF (the sum of TYLF and TRLF) dominated the fluorescence components, with proportions of 40.55, 37.09, and 46.91% in April, July, and November, respectively. DOM fractions in November were distinguished from those in April and July, which could be attributed to the fact that water from the Yellow River was continuously loaded into the lake as replenishment from April to September. From the replenishment period to the non-replenishment period, the contents of the five components gradually changed from low in the middle and high around the lake to high throughout the entire lake. Based on the CCA results, the potential factors in April included TYLF, TRLF, MHLF, SD, and BOD5, which were related to organic matter pollution. The potential factors in July were TYLF, TRLF, FLF, Chl-a, TP, CODCr, and DO, indicating that the enrichment of TP led to the growth of algae and plants. The potential factors in November consisted of TYLF, TRLF, CODCr, SD, TN, and FLF, indicating that the residues of the algae and plants had been deeply degraded. The replenishment of water led to enrichment of TP, resulting in growth of algae and plants, and was the key factor in water quality fluctuations. This work provides a workflow from the perspective of DOM to reveal the causes of water quality fluctuations in a brackish-water lake and may be applied to other types of waterbodies.


Background
Dissolved organic matter (DOM) is a complex mixture consisting of proteins, polysaccharides, and humic substances [1,2]. DOM has been defined as organic matter in solution that can pass through a 0.45-μm membrane filter [3]. It exists ubiquitously in natural and engineered aquatic systems and is an important carrier for pollutants, being associated with the retention and release of nutrients, biological availability, and the migration and transformation of contaminants [4]. Therefore, studying the variation of DOM composition and distribution is significant for understanding aquatic environments and evaluating water quality.
Various characterization techniques, such as HPLC, FTIR, UV-vis, and fluorescence spectroscopy, have been applied to determine the structure, composition, and functionalities of DOM [5][6][7][8]. Fluorescence spectroscopy techniques, including excitation-emission matrix spectroscopy (EEMs) and synchronous fluorescence spectroscopy (SFS), are non-destructive techniques characterized by rapid analysis, high sensitivity, and simple operation [8,9]. Thus, they have been widely employed for investigating DOM. EEMs cover a whole range of incremented excitation wavelengths and the corresponding emission data, yielding large and complicated datasets that include various peaks and their specific locations. Recently, EEMs combined with parallel factor analysis (PARAFAC) have been widely applied to decompose EEMs into fluorescence components [10,11]. However, PARAFAC requires a large number of samples to ensure that the extracted fluorescence fractions are correct. In contrast, SFS, a method that scans the excitation and emission monochromators simultaneously with a selected constant wavelength offset, provides simpler spectra, which are easier to interpret without losing important information [12,13]. SFS also provides better-resolved peaks, which can be easily analyzed to differentiate the fluorescence spectra of samples of various origins, and it is suitable for a small number of samples. Statistical methods such as principal component analysis (PCA) can be applied to SFS to acquire more information for further analysis. SFS combined with PCA can decompose complex synchronous fluorescence spectra and reveal the similarity and dissimilarity between the samples [14,15]. Moreover, derivatives are applied in SFS to reduce extensive spectroscopic overlap and eliminate matrix interference [16].
Lake Shahu (106°18′E, 38°45′N), a terminal lake, is a typical brackish-water lake located in an arid region that frequently experiences declining water quality [17]. Its water quality is strongly affected by water replenishment, which carries large amounts of contaminants into the lake and induces eutrophication [18,19]. Lake Shahu was eutrophic during the replenishment period, especially in July, which was attributed to the high TP and TN concentrations in the replenishment water flowing in from the Yellow River [18]. Organic pollution also plagued Lake Shahu, as reflected in the high concentration of CODCr. DOM can alter nutrient availability, which can strongly affect algae abundance and community structure across lakes [20]. Thus, it is urgent to investigate the variation of DOM composition and distribution in Lake Shahu in order to understand the dramatic fluctuations of water quality.
The objectives of this study were (a) to extract fluorescence components of DOM from Lake Shahu and characterize their temporal-spatial variations by SFS combined with PCA and second derivative; (b) to seek potential factors among water quality parameters and fluorescence components and identify pollution sources using canonical correlation analysis (CCA).

Study area and sample collection
Lake Shahu covers an area of 8.2 km2 and has an average depth of 2.2 m [21]. It has an arid and semi-arid continental climate with an annual average temperature of 9.74 °C, annual average precipitation of 172.5 mm, and annual average evaporation of 1755.1 mm [22]. The south side of Lake Shahu is a sandy beach, the east side is adjacent to a wetland, and large areas of farmland are located on the west and north sides. As a terminal lake in an arid region with no natural surface runoff or outflow, Lake Shahu is particularly sensitive to an extremely high evaporation proportion [23]. To maintain the ecological water storage, replenishment water from the Yellow River has been loaded into the lake from April to September each year since 2013 [24] (Fig. 1). In 2020, the volumes of replenishment water from the Donggan and Bayi channels were 23.40 and 16.13 million m3, respectively. Because of the intense evaporation and the water recharge, the lake has undergone dramatic changes in water quality and accumulated contamination from various sources.
Eleven sampling sites were selected based on recharge water access, potential sources of pollution, and geographical proximity (Fig. 1). Sampling sites #1-3 were located in the southwest of Lake Shahu. Sampling sites #4-6 were in the central region of the lake, with #6 close to the water inlet. Sampling sites #7 and #8 adjoined a resort, and sampling sites #9-11 were at Niaodao island. Water samples from Lake Shahu were collected in April, July, and November 2020, which were the early water replenishment period, mid-replenishment period, and non-replenishment period, respectively. At each selected sampling site, water sample collection was carried out with a 5 L Van Dorn water sampler. Three water subsamples of equal volume were collected from different depths (10, 20, and 30 cm) and thoroughly mixed. The samples were shipped to the lab in a cooled container for analysis.
Keywords: Dissolved organic matter, Synchronous fluorescence spectroscopy, Principal component analysis, Second derivative synchronous fluorescence spectra, Canonical correlation analysis

Measurements of physico-chemical parameters
Temperature (TEMP), electrical conductivity (EC), and dissolved oxygen (DO) were measured in situ using a YSI portable multiparameter water quality tester. The Secchi depth (SD) was measured using a standard Secchi disk with black and white quarters. The samples were transported to the laboratory in pre-cleaned polyethylene bottles and were used to measure chemical oxygen demand (CODCr), total phosphorus (TP), total nitrogen (TN), ammonia nitrogen (NH3-N), chlorophyll a (Chl-a), and biochemical oxygen demand (BOD5). The standard analytical methods for those parameters are presented in Table 1.

Measurements of synchronous fluorescence spectroscopy
Water samples were filtered using glass fiber filters (Millipore, 0.45 μm fiber Ø) before the fluorescence determination. The SFS was measured using a Hitachi Fluorescence Spectrophotometer (F-7000) with a 1 cm quartz cuvette, which was equipped with the fluorescence solution 1.00.000 (FL-solution software) for data processing. The PMT voltage was set at 700 V, and the scan speed was fixed at 240 nm min−1. The SFS was obtained with a constant wavelength difference (Δλ = λem − λex = 55 nm) with the excitation wavelength range from 260 to   [9]. Before further analysis, the spectrum of the blank was subtracted from all spectra.

Statistical analysis
PCA was performed on the SFS of DOM at the 11 sampling sites with SPSS 25.0 software to identify the variations of DOM fractions and to trace the dominant fluorescence components in the different replenishment periods. When PCA was performed, the sampling sites were set as variables and the fluorescence intensities at the spectral wavelengths were set as cases. Based on the score plots for the spectral wavelengths, the spectral waveform and dominant fluorescence of each principal component (PC) were characterized, and the variations of DOM fractions were investigated with the loading plots for the 11 sites. The potential factors among water quality parameters and DOM fractions were traced using CCA, a multivariate direct gradient analysis, which was carried out in Canoco 4.5 [25].
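The layout described above (sites as variables, wavelengths as cases) can be illustrated with a minimal sketch. It is not the SPSS procedure used in this study: the three site names, the Gaussian peak shapes, and the mixing weights are all hypothetical, and the eigenvectors are obtained by simple power iteration with deflation rather than by a statistics package.

```python
import math

def standardize(column):
    """Return a zero-mean, unit-variance copy of one site's spectrum."""
    n = len(column)
    mean = sum(column) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in column) / (n - 1))
    return [(v - mean) / sd for v in column]

def correlation_matrix(z_columns):
    """Correlation matrix between sites (the PCA variables)."""
    n = len(z_columns[0])
    return [[sum(a * b for a, b in zip(ci, cj)) / (n - 1) for cj in z_columns]
            for ci in z_columns]

def leading_eigenpair(matrix, iterations=1000):
    """Largest eigenvalue and unit eigenvector by power iteration."""
    p = len(matrix)
    vec = [1.0 + 0.1 * i for i in range(p)]
    for _ in range(iterations):
        new = [sum(matrix[i][j] * vec[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(v * v for v in new))
        vec = [v / norm for v in new]
    mv = [sum(matrix[i][j] * vec[j] for j in range(p)) for i in range(p)]
    return sum(a * b for a, b in zip(vec, mv)), vec

def first_two_pcs(matrix):
    """Extract two PCs, deflating the matrix after each one."""
    work = [row[:] for row in matrix]
    pcs = []
    for _ in range(2):
        value, vec = leading_eigenpair(work)
        pcs.append((value, vec))
        work = [[work[i][j] - value * vec[i] * vec[j] for j in range(len(vec))]
                for i in range(len(vec))]
    return pcs

def gaussian(w, center, width):
    return math.exp(-0.5 * ((w - center) / width) ** 2)

# Hypothetical data: three sites mixing two synthetic peaks (280 and 340 nm).
wavelengths = [260 + 2 * i for i in range(71)]
mix = {"site_A": (1.0, 0.2), "site_B": (0.9, 0.3), "site_C": (0.2, 1.0)}
spectra = {name: [a * gaussian(w, 280, 12) + b * gaussian(w, 340, 20)
                  for w in wavelengths] for name, (a, b) in mix.items()}

z_columns = [standardize(spectra[name]) for name in spectra]
corr = correlation_matrix(z_columns)
pcs = first_two_pcs(corr)
explained = [value / len(corr) for value, _ in pcs]          # variance share per PC
loadings = {name: [math.sqrt(value) * vec[k] for value, vec in pcs]
            for k, name in enumerate(spectra)}               # one row per site
scores = [[sum(vec[k] * z_columns[k][t] for k in range(len(z_columns)))
           for t in range(len(wavelengths))] for _, vec in pcs]  # one curve per PC
```

In this arrangement the loadings describe the sites and the scores trace a spectrum-like curve over wavelength, which is why score plots can be read for peak positions.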

Second derivative method
The second derivative method was used to reduce extensive spectroscopic overlap and to identify the accurate wavelength range of each fluorescence peak of SFS, using Origin 2021 software. After the second derivative, the fluorescence peaks were transformed into valleys, while the valleys were transformed into peaks. Therefore, the second derivative synchronous fluorescence spectra (SDSFS) were normalized by multiplying the intensities by negative one [26]. To remove excess noise, the Savitzky-Golay method with a 10-point window was applied to smooth the SFS after the second derivative. The interval in the process was 2 nm.
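The inversion step can be illustrated with a short sketch. It is a simplification, not the Origin workflow: it uses the classical 5-point quadratic Savitzky-Golay second-derivative coefficients (2, −1, −2, −1, 2)/7, which differentiate and smooth in one pass, instead of the 10-point window applied after differentiation in this study. The synthetic two-peak spectrum is hypothetical.

```python
import math

def sg_second_derivative(values, step):
    """5-point quadratic Savitzky-Golay second derivative (interior points only)."""
    coeffs = (2.0, -1.0, -2.0, -1.0, 2.0)  # to be divided by 7 * step**2
    out = []
    for i in range(2, len(values) - 2):
        window = values[i - 2:i + 3]
        out.append(sum(c * v for c, v in zip(coeffs, window)) / (7.0 * step ** 2))
    return out

def gaussian(w, center, width):
    return math.exp(-0.5 * ((w - center) / width) ** 2)

step = 2.0                                              # 2 nm interval, as in the text
wavelengths = [260.0 + step * i for i in range(121)]    # 260-500 nm
# Hypothetical spectrum: a strong peak at 280 nm and a weaker one at 340 nm.
spectrum = [100.0 * gaussian(w, 280.0, 12.0) + 30.0 * gaussian(w, 340.0, 20.0)
            for w in wavelengths]

second = sg_second_derivative(spectrum, step)
inverted = [-d for d in second]          # multiply by -1 so peaks point upward again
inner_wavelengths = wavelengths[2:-2]    # the filter drops two points at each end
peak_wavelength = inner_wavelengths[inverted.index(max(inverted))]
```

After inversion, the largest value of the derivative spectrum falls at the position of the original fluorescence peak, while the narrower derivative bands reduce the overlap between neighbouring peaks.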

Parameters of water quality
The spatio-temporal variation of water quality is displayed through the matrices of water quality parameters (Fig. 2). The highest average TEMP occurred in July (24.2 ± 1.03 °C). Sampling site #7 in July showed the highest TEMP (26.1 °C), and the lowest TEMP (8.0 °C) was observed in November at sampling sites #1 and #6 (Fig. 2a). The EC mean values decreased in the order of April (206.09 ± 18.11 μS cm−1) > July (192.10 ± 16.61 μS cm−1) > November (161.7 ± 32.74 μS cm−1) (Fig. 2b). In April, the EC values at sampling sites #3 (172 μS cm−1) and #4 (173 μS cm−1) were the lowest. In July and November, site #6 showed the lowest EC values at 146 μS cm−1 and 82 μS cm−1, respectively. The DO mean value in July was higher than those in April and November (Fig. 2c), especially at sampling sites #4 (13.5 mg L−1) and #6 (13.4 mg L−1), which showed the highest DO values in July. The SD mean value increased in the order of July (52.4 ± 8.09 cm) < November (55.9 ± 19.99 cm) < April (64.09 ± 9.93 cm) (Fig. 2d). All sampling sites, except site #6, presented considerably higher concentrations of CODCr in November than in July and April (Fig. 2e). The highest CODCr value was observed at site #1 (30 mg L−1) in November, and site #4 (12 mg L−1) showed the lowest concentration of CODCr in April. The highest mean TP value was observed in November (0.038 mg L−1), followed by July (0.033 mg L−1) and April (0.026 mg L−1) (Fig. 2f). An evident increase in TP concentration occurred from April to July, especially at sampling sites adjacent to the water inlets (Figs. 1, 2f). This could be associated with recharge water carrying higher amounts of TP into the lake in July, the period of intensive agricultural activities, which would promote the growth of algae and aquatic plants. Moreover, sampling sites #9-11 formed a zone with a higher TP concentration, and the highest TP value (0.06 mg L−1) was observed at sampling site #9 in July (Fig. 2f).
The TN mean values increased in the order of July (0.69 ± 0.037 mg L−1) < November (0.93 ± 0.28 mg L−1) < April (1.01 ± 0.20 mg L−1) (Fig. 2g). The TN value at sampling site #6 (1.51 mg L−1) in November was the highest, and the lowest value (0.64 mg L−1) was obtained at site #5 in July. Sampling site #6 presented the highest TN value in each month. In contrast, the NH3-N at site #6 showed the lowest value in each month. The NH3-N mean values increased in the order of November (0.09 ± 0.03 mg L−1) < July (0.12 ± 0.03 mg L−1) < April (0.13 ± 0.05 mg L−1) (Fig. 2h). The highest NH3-N value was obtained in April at site #5. The highest Chl-a mean value was observed in November (11.45 ± 5.01 μg L−1), followed by July (6.36 ± 3.70 μg L−1) and April (3.73 ± 3.00 μg L−1) (Fig. 2i). Sampling sites #9-11 showed higher concentrations of Chl-a (Fig. 2i), a trend similar to that of TP, but the highest Chl-a value (19 μg L−1) was observed at sampling site #5 in November. The BOD5 mean values increased in the order of April (1.66 ± 0.38 mg L−1) < November (2.28 ± 0.24 mg L−1) < July (2.37 ± 0.18 mg L−1) (Fig. 2j). Sampling site #9 in July showed the lowest BOD5 (1.2 mg L−1), and the highest BOD5 (2.7 mg L−1) was observed at sampling sites #8 and #3 in November and July, respectively.
Notably, the concentrations of CODCr, NH3-N, Chl-a, and BOD5 at site #6 were lower than those at the other sites, representing a higher level of water quality in the center of the lake (Fig. 1). In addition, the values of CODCr, TP, Chl-a, and BOD5 in November and July were higher than those in April. This indirectly indicated that the growth of algae and aquatic plants, intense anthropogenic activities, and the release of algal by-products and plant decomposition products occurred in July and November. According to the spatio-temporal variation of water quality, Lake Shahu was mainly polluted by organic matter in July and November.

Synchronous fluorescence spectroscopy
SFS of DOM from Lake Shahu exhibited a prominent peak, a relatively weak peak, and two broad shoulders (Fig. 3). The prominent peak at the wavelengths of 260-310 or 260-310 nm was denoted as the protein-like fluorescence (PLF) component, containing the tyrosine-like (TYLF) and tryptophan-like fluorescence (TRLF) [27]. The PLF in surface waters is present at lower contents than humic substances [28]. Thus, a higher level of PLF could be associated with exogenous pollution. The TRLF in natural water is mainly impacted by anthropogenic activities [29], while the TYLF is plant-derived [30] and biodegradation-derived DOM [31]. In addition, the TRLF is one of the significant nutrients for aquatic plants and microorganisms [31,32]. The first shoulder appeared in the wavelength range from 300 to 345 or 310 to 355 nm and was associated with the microbial humic-like fluorescence (MHLF) component, which is related to microbial activities [9]. The weak peak at 345-420 or 355-420 nm was assigned to the fulvic-like fluorescence (FLF) component from lignin and other terrestrial plant-derived precursor material [33]. The second shoulder was observed in the wavelength range of 420-500 nm, which was related to the humic-like fluorescence (HLF) component [34]. Evidently, the peaks with minor fluorescence intensity were masked by the predominant one.

Principal component analysis
PCA was applied to the SFS of Lake Shahu in order to resolve the overlapping spectra and qualitatively investigate the spectral composition of the datasets from the different sampling periods. The Kaiser-Meyer-Olkin (KMO) and Bartlett sphericity tests were first performed to test the adequacy and applicability of factor analysis/principal component analysis [35]. The KMO values of the four datasets were 0.94, 0.905, 0.932, and 0.843, respectively, and the significance levels of the Bartlett sphericity test were less than 0.001, indicating that the SFS was suitable for PCA. According to the loadings and scores of the principal components (PCs), the characteristics of the sampling sites and the dominant fluorescence components of DOM could be identified (Fig. 4).
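For reference, the two adequacy statistics mentioned above are commonly written as follows. The symbols are introduced here for illustration and are not defined in the original text: $R$ is the correlation matrix of the $p$ variables (here, the site spectra), $n$ is the number of cases (spectral wavelengths), $r_{ij}$ are the correlations, and $a_{ij}$ are the partial correlations between variables $i$ and $j$.

```latex
\mathrm{KMO} = \frac{\sum_{i \neq j} r_{ij}^{2}}
                    {\sum_{i \neq j} r_{ij}^{2} + \sum_{i \neq j} a_{ij}^{2}},
\qquad
\chi^{2} = -\left( n - 1 - \frac{2p + 5}{6} \right) \ln \lvert R \rvert,
\qquad
\mathrm{df} = \frac{p(p-1)}{2}
```

A KMO value close to 1 means the partial correlations are small relative to the raw correlations, and a significant Bartlett $\chi^{2}$ rejects the hypothesis that $R$ is an identity matrix. Both conditions are consistent with the values reported above.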
PCA on each of the SFS datasets yielded two PCs, which accounted for 99.37, 99.856, 99.883, and 99.820% of the total variance, respectively. Five fluorescence peaks were extracted by PCA (Fig. 4b). As shown in the PC loading plots (Fig. 4a), the sampling sites of the entire study period were divided into two groups: group A, with the sites during November (water non-replenishment period), and group B, with the sites during April and July (water replenishment period). This grouping could be attributed to the fact that water from the Yellow River was continuously loaded into the lake as replenishment from April to September. Group A had higher second (> 0.8) and lower first (< 0.6) PC loadings, indicating that the TYLF was dominant in the water non-replenishment period (Fig. 4b). Therefore, the metabolism of native organisms and the decomposition of aquatic plants were the sources of pollution in November. Furthermore, in the water replenishment period (April and July), the TRLF dominated group B, because it presented higher first (> 0.7) and lower second (< 0.7) PC loadings (Fig. 4b). Thus, the most prominent pollution sources in April and July were anthropogenic activities, including agricultural cultivation, irrigation, domestic wastewater, and tourism.
The two PCs explained 99.856% of the total variance in the early water replenishment period (April), including 57.021% for PC1 and 42.834% for PC2. The score plots for the SFS wavelengths distinguished the fluorophores of DOM (Fig. 4d). PC1 showed three peaks with similar factor scores (Fig. 4d), which could be related to the components of TYLF, MHLF, and FLF. A prominent peak and two shoulders were obtained in PC2 (Fig. 4d). The prominent peak was associated with the TRLF. The first shoulder was assigned to the MHLF, presenting a blue-shift of 30 nm compared with PC1, and the second shoulder was assigned to the HLF. With the exception of sites #1 and #9, all sites had a higher loading (> 0.7) on PC1 (Fig. 4c), illustrating that the TYLF, MHLF, and FLF were the predominant components. This demonstrated that the pollution in April was mainly derived from native organisms and the accumulation of contaminants during the freeze-up period. Sites #1, #9, and #10, especially site #1, with higher second PC loadings, represented the characteristics of the replenishment period, which were dominated by the TRLF. This indirectly indicated that these sampling sites were close to the shore, where they are more easily affected by anthropogenic activities. The loading plot for the early water replenishment period showed a high degree of dispersion between the sampling sites (Fig. 4c), which could be associated with the differences in DOM components due to insufficient water replenishment. The predominant component was the TYLF rather than the TRLF because of the insufficient supply of recharge water.
Two PCs were extracted by PCA in July, which explained 57.217 and 42.666% of the total variance, respectively. Interestingly, the trend of the score plots in July (Fig. 4f) was similar to that of the full period (Fig. 4b), i.e., PC1 contained the TRLF and FLF and PC2 involved the TYLF and MHLF. Most of the sites, with more than 0.7 of PC1 loading and less than 0.70 of PC2 loading (Fig. 4e), presented the characteristics of the water replenishment period, i.e., the TRLF was the representative component, which was similar to the result for the water replenishment period as a whole (Fig. 4b). This indirectly indicated that July was the primary period of water recharge. In addition, the results demonstrated that anthropogenic activities were the dominant source of pollution during the tourist season and the intense agricultural irrigation season. Site #6, with more than 0.75 of PC2 loading and less than 0.65 of PC1 loading, was dominated by the TYLF (Fig. 4e), which was consistent with its results in April. This indirectly indicated that there was consistently a better level of water quality at site #6 (Fig. 2). In addition, compared to April, the degree of dispersion decreased as the similarities of DOM components between the sampling sites rose, which could have been due to the sufficient water fluidity caused by water replenishment.
In November, PC1 (51.407% of the total variance) exhibited higher positive loadings (> 0.7) at sites #1, #10, #2, #5, #4, #7, #3, and #9. This indicated that the main component at these sites was the TYLF (Fig. 6h), which was associated with the biodegradation and decomposition of aquatic plants in the lake. PC2 (48.413% of the total variance) showed higher positive loadings (> 0.7) at sites #3, #9, #8, #11, and #6, indirectly verifying that the FLF was dominant, followed by the TRLF and MHLF (Fig. 6h). This could be associated with the degradation of aquatic plants and algae. Based on the loading map, the degree of dispersion in November was lower than that in April and higher than that in July, which could be attributed to the water replenishment.
Notably, the fluorophore characteristics of site #6 in July were similar to those during April, and the characteristics of site #6 in November were consistent with those of all the other sampling sites in July. This suggested that site #6 exhibited hysteresis in the variation of fluorophore constitution. In other words, there was consistently a better level of water quality at site #6, as indicated by the water quality parameters in the "Parameters of water quality" section.

Derivative fluorescence spectroscopy
The peak at around 280 nm was so strong that extensive overlaps in the bands of PLF and MHLF were found (Fig. 3), and the overlaps hindered the discernment of the fluorescence peaks. Although five fluorophores were extracted by PCA, PCA could not identify the characteristics of each sample. Thus, the derivative method was employed to reduce the extensive spectroscopic overlaps, to find the accurate wavelength band of each peak, and to investigate the composition of DOM fractions at each site. The process of the derivative method is shown in Fig. 5. There were five peaks corresponding to five regions, which were defined as TYLF (I), TRLF (II), MHLF (III), FLF (IV), and HLF (V) (Fig. 6). The integrated area in each region was calculated to indicate the relative content of the corresponding fluorescence component [9,36].
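The region-wise integration can be sketched as follows. This is only an illustration under stated assumptions: the band edges are hypothetical placeholders (in particular, the 284 nm boundary between TYLF and TRLF is not given in the text), the input curve is synthetic, and a 1 nm grid is used so that every band edge falls on a grid point.

```python
import math

# Hypothetical band edges in nm; the real ones come from the SDSFS valleys (Fig. 6).
REGIONS = {
    "TYLF": (260.0, 284.0),
    "TRLF": (284.0, 310.0),
    "MHLF": (310.0, 355.0),
    "FLF": (355.0, 420.0),
    "HLF": (420.0, 500.0),
}

def trapezoid_area(wavelengths, intensities, lower, upper):
    """Trapezoidal area over the grid segments lying fully inside [lower, upper]."""
    area = 0.0
    for i in range(len(wavelengths) - 1):
        w0, w1 = wavelengths[i], wavelengths[i + 1]
        if w0 >= lower and w1 <= upper:
            area += 0.5 * (intensities[i] + intensities[i + 1]) * (w1 - w0)
    return area

def region_contents(wavelengths, intensities, regions=REGIONS):
    """Integrated area per region and its percentage of the total."""
    areas = {name: trapezoid_area(wavelengths, intensities, lo, hi)
             for name, (lo, hi) in regions.items()}
    total = sum(areas.values())
    shares = {name: 100.0 * area / total for name, area in areas.items()}
    return areas, shares

# Synthetic, non-negative stand-in for an inverted second derivative spectrum.
wavelengths = [260.0 + i for i in range(241)]          # 260-500 nm, 1 nm grid
curve = [math.exp(-0.5 * ((w - 280.0) / 10.0) ** 2)
         + 0.4 * math.exp(-0.5 * ((w - 335.0) / 15.0) ** 2) for w in wavelengths]

areas, shares = region_contents(wavelengths, curve)
plf_share = shares["TYLF"] + shares["TRLF"]            # PLF = TYLF + TRLF
```

Summing the TYLF and TRLF shares gives the PLF proportion in the same way as the percentages reported for each month.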
The mean values of the total content of fluorescence components (TFC) increased in the order of July (11,789.38 ± 12,752.61) < April (12,667.58 ± 15,246.91) < November (19,748.87 ± 17,192.13). A sharp rise of DOM content occurred in November, which indirectly indicated that the degree of organic pollution in the lake was the highest in November (Fig. 2). Among the five fluorescence components, the content of TYLF showed the most significant increase (Fig. 7c, Additional file 1: Fig. S1), being 17,771.18 in April, 16,640.09 in July, and 61,951.50 in November, which indicated that large amounts of algae-derived and plant biodegradation-derived DOM were released into the lake during the non-replenishment period. The PLF (the sum of TYLF and TRLF) dominated the fluorescence components at almost all sampling sites, with average percentages among the DOM fractions of 40.55% in April, 37.09% in July, and 46.91% in November (Fig. 7b, d, f). The TRLF had the larger share of PLF in April (68.84%) and July (65.78%), while the TYLF had the larger share of PLF in November (61%). This indicated intense anthropogenic activities and the release of by-products of lake organisms in the water replenishment period. The proportion of the MHLF decreased in the order of April (36.82%) > July (35.92%) > November (27%). The FLF and HLF showed relatively consistent and lower proportions in all periods, which varied from 15.64 (April) to 19.91% (November) and from 6.09 (November) to 7.45% (July), respectively. From the replenishment period to the non-replenishment period, the contents of the five components gradually changed from low in the middle and high around the lake to high throughout the lake (Additional file 1: Fig. S1).
In summary, DOM fractions were mainly derived from by-products of the lake organisms and anthropogenic activities during the replenishment period. In the non-replenishment period, the TYLF and FLF, derived from aquatic plant residue and the biodegradation of algae, dominated the DOM fractions, which resulted in an increase in DOM contents and in the degree of organic pollution (Fig. 2, 4h, Additional file 1: Fig. S1).

Canonical correlation analysis
CCA can be applied to visualize the comprehensive correlations between water quality, environmental factors, and sampling sites, and to identify the potential factors of the sites [25,37]. The results of CCA were visualized by CANOCO 5. In the biplots, a longer arrow indicates a greater influence of the factor, a smaller angle between the arrow and the coordinate axis implies a higher correlation, and a smaller distance between a sampling site and the arrow represents a stronger effect of the factor on that site. In the CCA ordination biplot of April, the TYLF, TRLF, MHLF, SD, and BOD5 were related to organic matter pollution (Fig. 8a). The arrows of the TYLF, TRLF, and MHLF pointed toward the positive direction of AX1 with small angles (< 45°) and were much longer, which illustrated that the positive half of AX1 could be related to the organic matter from water recharge. Sites #1, #3, #9, and #10, with positive loadings on AX1 (Fig. 8a), were affected by organic matter pollution. This indirectly indicated that these sites had much greater TYLF, TRLF, and MHLF contents (Fig. 7a, b). The BOD5 was the potential factor of the sites located in the negative region of AX1. Moreover, in the positive region of AX1, the loadings decreased in the order of #1 > #9 > #3 > #10 > #8, which was practically consistent with the PCA result (Fig. 4c).
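The reading rules for the biplot (arrow length and angle to an axis) can be made concrete with a small sketch. The arrow coordinates below are hypothetical values chosen for illustration; they are not taken from Fig. 8.

```python
import math

def arrow_summary(x, y):
    """Arrow length and its angle (0-90 degrees) to the AX1 axis line."""
    length = math.hypot(x, y)
    angle_to_ax1 = math.degrees(math.atan2(abs(y), abs(x)))
    return length, angle_to_ax1

# Hypothetical biplot coordinates (AX1, AX2) for three factors.
arrows = {"TYLF": (0.80, 0.20), "SD": (-0.30, 0.10), "BOD5": (-0.40, -0.45)}

summary = {name: arrow_summary(x, y) for name, (x, y) in arrows.items()}

# Factors pointing to the positive half of AX1 with an angle below 45 degrees,
# ranked by arrow length (longer arrow = greater influence).
positive_ax1 = sorted(
    (name for name, (x, y) in arrows.items() if x > 0 and summary[name][1] < 45.0),
    key=lambda name: summary[name][0],
    reverse=True,
)
```

The 45° threshold mirrors the one quoted for the April biplot; other cut-offs could be used equally well.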
As shown in Fig. 8b, the potential factors in July included TRLF, TYLF, FLF, Chl-a, TP, CODCr, and DO. This indicated that recharge water with a high TP, CODCr, and TRLF load entered the lake and resulted in blooms of algae and aquatic plants. Site #6 had the highest positive loading on AX2, followed by #5, #11, #7, and #10, which was associated with the potential factors of TYLF, TRLF, and FLF. This indirectly indicated that the highest contents of TYLF, TRLF, and FLF were present at site #6 (Fig. 7c, d). The potential factors of site #9 were Chl-a, TP, and CODCr, which reflected the higher values of those parameters at site #9 in July (Fig. 2e, f, i). MHLF, HLF, Chl-a, and TP, whose arrows formed smaller angles with AX1, were the potential factors of the positive half of AX1. Moreover, the arrows of DO, SD, and TEMP pointed toward the negative direction of AX1 and represented the potential factors of sites #3, #4, #5, #7, and #8. The negative half of AX1 may be related to natural pollution.
The potential factors in November were CODCr, TYLF, TRLF, SD, TN, and FLF (Fig. 8c), indicating that the residues of the algae and aquatic plants had been deeply degraded into TRLF, TYLF, and FLF during the non-replenishment period. As expected, sampling sites #1, #2, and #10, with the highest loadings on PC1 during November (Fig. 4g), were also identified by CCA (Fig. 8c), indicating that the leading potential factor of the positive half of AX1 was TYLF. FLF was the main potential factor of the negative half of AX1, and sites #5, #7, #8, and #9 were deeply affected by it (Fig. 8c). On the positive half of AX2, CODCr was the only potential factor, associated with sites #4, #1, #11, and #7 (Fig. 8c). A smaller angle was obtained between the negative half of AX2 and the TN arrow, indicating that TN was the potential factor there. Therefore, sites #10, #3, and #6, with higher negative loadings, presented higher values of TN in November (Fig. 2g).

Conclusion
The water quality level in the lake was the highest in April, followed by July and November. The water quality of the central region of the lake was better than that of the other regions in these months. DOM from the lake contained five fluorescence components: TYLF, TRLF, MHLF, FLF, and HLF, which in November were distinguished from those in April and July. This is attributed to water from the Yellow River being continuously loaded into the lake as replenishment from April to September. The TRLF dominated the DOM fractions during the period of water replenishment, while the TYLF was the predominant component during the period of water non-replenishment. The potential factors in April included TYLF, TRLF, MHLF, SD, and BOD5, which were associated with organic matter pollution. The potential factors in July were TYLF, TRLF, FLF, Chl-a, TP, CODCr, and DO, suggesting that the enrichment of TP led to the growth of algae and plants. The potential factors in November consisted of TYLF, TRLF, CODCr, SD, TN, and FLF, indicating that the residues of the algae and plants had been deeply degraded. The replenishment water was the main factor controlling the water level as well as a potential factor affecting the variations of DOM fractions, resulting in fluctuations of the water level. According to the results of this study, monitoring and controlling the nutrient input from replenishment water, as well as removing overgrown aquatic plants from the lake during the non-replenishment period, can be one of the methods to control pollution in Lake Shahu and in brackish-water lakes or other types of waterbodies with a similar pollution status.