Non-target screening for detecting the occurrence of plant metabolites in river waters

In surface waters analysed by liquid chromatography coupled to high-resolution mass spectrometry (LC-HRMS), large numbers of chemical signals, often of high peak intensity, typically remain unidentified. These signals may represent natural compounds released from plants, animals and microorganisms, which may contribute to the cumulative toxic risk. We therefore attempted to identify natural compounds occurring at significant concentrations in surface waters. To do so, we used a non-target screening (NTS) workflow to find LC-HRMS peaks that overlap between river waters and extracts of plants abundant in the catchment. The results revealed several thousand overlapping peaks between water samples and plants from local vegetation. Taking this overlap as a basis, 12 secondary plant metabolites (SPMs) from different compound classes were identified in river waters, with flavonoids as the dominant group. Concentrations of the identified compounds ranged from 0.02 to 5 µg/L, with apiin, hyperoside and guanosine showing the highest concentrations. Most of the identified compounds exceeded the threshold of toxicological concern (TTC) for non-genotoxic and non-endocrine disrupting chemicals in drinking water (0.1 µg/L), often by more than one order of magnitude. Our results reveal that chemicals eluted from the vegetation in the catchment contribute to the chemical load in surface waters. They also help to reduce the number of unknowns among high-intensity NTS peaks detected in rivers. SPMs are often produced for defence against other organisms, and their concentrations are clearly above the TTC. A contribution to toxic risks for aquatic organisms and impacts on drinking water safety therefore cannot be excluded. This calls for including these compounds in the monitoring and assessment of water quality.


Background
Surface waters may contain a large number of chemicals detectable as signals in LC-HRMS, including a large fraction of unknown chemicals, often with high peak intensity [57]. In addition to synthetic chemicals and their transformation products, these signals may also represent natural compounds released from plants, animals and microorganisms. Such compounds are a confounding factor in chemical and effect-based screening of water contaminants, and they may also contribute to the cumulative toxic risk of water contamination [18]. Reducing the number of unknowns in water samples by also identifying natural compounds present at significant concentrations will therefore help to improve the monitoring and assessment of water quality potentially impacted by complex mixtures of natural and synthetic compounds, as shown recently for carbolines and aromatic amines [46].
Nanusha et al. Environ Sci Eur (2020) 32:130
Plants are known to produce a large number of SPMs depending on species, season and environmental conditions [5,28,45]. SPMs play a significant role in controlling essential functions of growth and reproduction [41]. They enable the synthesizing plants to overcome temporary or continuous threats and to establish biological and ecological relationships with other organisms [5,41]. SPMs are typically advantageous to the producing plants but may cause adverse effects in other organisms exposed to them [41]. A wide variety of metabolites are released from plants through microbial decomposition and enzymatic degradation of plant parts, and these metabolites leach into the receiving surface water through rain sewers or surface runoff [3,45]. SPMs are also released from living plants by root exudation and volatilization [45,49]. SPMs are diverse in their structures and in their effects on human health and wildlife [1], and their effects on human health may be ambiguous.
For instance, flavonoids of plant origin are often considered safe and are widely accepted as health-promoting phytochemicals. However, experimental in vivo and in vitro studies have produced conflicting results. Some flavonoids (e.g., quercetin, rutin) interact with DNA and/or exhibit carcinogenic activity in rodents, as shown in vivo in male rats at a dose of about 60 mg/kg [24]. Others have mutagenic (e.g., quercetin) and/or pro-oxidant effects and may interfere with essential biochemical pathways [16,36]. Isoflavones such as genistein exhibit estrogenic activity [22]. The flavonoids kaempferol and apigenin act through estrogen-receptor-mediated mechanisms and exhibit antiestrogenic effects in vitro at concentrations of 34 and 32 µg/L, respectively [56]. Several flavonoids, including kaempferol and quercetin, inhibit cholinesterases (AChE, BChE), with quercetin being the most active at an IC50 of 62 mg/L [7,23].
Recent studies revealed a large diversity of phytochemicals from different compound classes (e.g., formononetin, gramine and senecionine) in environmental samples such as water and soil [18,20,47]. In water bodies from natural and agricultural areas, their concentrations exceeded the thresholds of toxicological concern for drinking water (TTC) [42]. Since only a minor fraction of SPMs in surface waters is presumably known, considering these compounds in water quality monitoring and assessment cannot rely on target screening alone [9,11,18,20,47]. Here, non-target screening with liquid chromatography coupled to high-resolution mass spectrometry (LC-HRMS) helps to capture unknown and unnoticed contaminants such as SPMs by detecting as many compounds as possible in parallel. Significant chemical information (e.g., elemental composition, chemical formula, isotopic pattern) can be extracted in a single experiment [33,34] and used as input for structure elucidation. In the present study, we tested non-target screening with LC-HRMS as a tool to identify SPMs from the surrounding vegetation in water samples. The impact of vegetation on the chemical mixture in river waters was investigated by comparing NTS data of water samples with NTS data of eluates of vegetation abundant along the examined rivers. Since toxicity data for SPMs are largely lacking, preliminary toxic risk estimates for individual compounds and mixtures were based on TTCs [42] for environmental contaminants in drinking water. We are aware that this approach is not directly applicable to surface waters that are not used for human consumption. Mixture risks were estimated using the concentration addition model for mixture effects.

Description of water and plant sampling locations
The study areas are located in the north-western part of the federal state of Saxony (the Leipzig floodplain forest along the rivers Elster, Pleiße and Luppe, referred to here as the ELP catchment) and in Saxony-Anhalt (Bode catchment), Germany. The floodplain forest is characterized by the trees Quercus robur, Fraxinus excelsior and Acer pseudoplatanus. In spring, its ground vegetation is dominated by plants of the amaryllis family (Amaryllidaceae) such as Allium ursinum and Galanthus nivalis [32]. The Bode catchment is characterized by a large diversity of natural and agricultural vegetation along a number of small streams. In both areas, streams whose banks were covered by a few highly abundant plants of interest were selected for this study. In the ELP catchment, the study focused on three small streams and two seasonal plants, Allium ursinum and Galanthus nivalis. In the Bode catchment, three rivers, namely Getel, Drängetalwasser and Barrenbach, were selected together with their corresponding plant species Fraxinus excelsior, Digitalis purpurea and Conium maculatum L., respectively.
Plant and river water samples were collected during the 2019 summer growing season following rain events. This was based on the hypothesis that plants are particularly prone to leave their SPM fingerprints in the aquatic environment under these conditions [6,8].
To this end, we collected a total of 8 water samples (see Additional file 1: Table S1): 5 samples from 3 streams in the ELP catchment and 3 samples from 3 streams in the Bode catchment (Fig. 1). Water samples were taken with a glass beaker (500 mL), and solids were allowed to settle for about 2 min before the water was transferred to a pre-cleaned glass bottle. Aliquots of 1 mL were transferred to 2-mL autosampler vials for chemical analysis, and backup samples were frozen in 125-mL Nalgene bottles. The sampling bottles were pre-cleaned and oven-dried before use and rinsed with river water prior to sample collection. As a control, five river water samples were collected during dry weather: 3 from the ELP and 2 from the Bode catchment. Plant samples were collected in the immediate vicinity of the sampled rivers. Composite samples of randomly collected plant material were cut with pre-cleaned scissors and kept in plastic bags. Plant and water samples were chilled with ice packs during transport to the UFZ laboratory and then stored at −24 °C until analysis.

Chemicals and materials
For sample preparation and analysis, LC-MS grade methanol, formic acid and ammonium formate from Honeywell and LC-MS grade water from Thermo Fisher were used. Plant materials were extracted with glass-bottled drinking water (Lauretana, characterized by a low mineral content). For structural confirmation and quantification, analytical standards of at least 90% purity were obtained from various suppliers (see Additional file 1: Table S2).

Sample extraction and preparation
The collected plant samples were cut into pieces, and 0.5-g portions were soaked in 50 mL of water in an extraction vessel for 2 h 30 min, a period selected to represent the duration of a typical rain event. The aqueous extract was separated from the solid residue by vacuum filtration through glass microfiber filters (Whatman GF/A, 47 mm diameter). The filtrates were stored frozen until LC-HRMS analysis.
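Plant-extract results are later reported per gram of plant material (µg/g), while the eluates themselves are measured as concentrations (µg/L). The text does not spell out this conversion. The sketch below shows the arithmetic implied by the stated ratio of 0.5 g plant material to 50 mL water, assuming complete recovery of the extraction volume and ignoring the small dilution from the additions made before injection. The function name is hypothetical.

```python
# Assumed conversion of an eluate concentration (ug/L) into a plant-based
# content (ug/g), using the extraction ratio stated in the text.
PLANT_MASS_G = 0.5       # plant material per extraction (from the text)
WATER_VOLUME_L = 0.050   # extraction volume, 50 mL (from the text)


def eluate_to_plant_content(conc_ug_per_l: float,
                            mass_g: float = PLANT_MASS_G,
                            volume_l: float = WATER_VOLUME_L) -> float:
    """Return ug of compound per g of plant material (hypothetical helper).

    Assumes the whole extraction volume is recovered and that the compound
    is fully released into the water phase.
    """
    if mass_g <= 0:
        raise ValueError("plant mass must be positive")
    return conc_ug_per_l * volume_l / mass_g
```

With these defaults, every 1 µg/L in the eluate corresponds to 0.1 µg/g of plant material.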
Water samples and plant eluates were prepared for direct injection by adding 25 µL of internal standard mix containing isotope-labelled compounds (40 ng/L; see Additional file 1: Table S3), 25 µL of methanol (LC-MS grade) and 10 µL of ammonium formate buffer (2 M, pH 3.5) to each 1-mL sample aliquot. Field, trip and method blanks were treated and analysed in exactly the same way as water samples and plant eluates.

Chemical analysis using LC-HRMS
LC separation was performed on a Kinetex C18 EVO column (50 × 2.1 mm, 2.6 µm particle size) using gradient elution with 0.1% formic acid (eluent A) and methanol containing 0.1% formic acid (eluent B) at a flow rate of 300 µL/min. After 1 min at 5% B, the fraction of B was increased linearly to 100% within 12 min, and 100% B was held for 11 min. The column was then rinsed for 2 min with a mixture of isopropanol + acetone (50:50), eluent B and eluent A (85%/10%/5%) to remove hydrophobic matrix constituents. Finally, the column was re-equilibrated to initial conditions for 5.7 min. To protect the main column from matrix, a 0.2-µm stainless steel inline filter (Phenomenex) and a Kinetex XB-C18 3 × 5 mm pre-column were used. Aliquots of 100 µL were injected into a Thermo Ultimate 3000 LC system (ternary pump, autosampler and column oven operated at 40 °C). The LC system was coupled via electrospray ionisation (ESI) to a quadrupole-Orbitrap instrument (Thermo Q Exactive Plus). The spray voltage was 3.8 kV (positive mode), the sheath gas flow rate 45 a.u., the auxiliary gas flow rate 1 a.u. and the heater temperature 300 °C. Full-scan experiments (m/z 100-1500) were conducted in positive ion mode at a nominal resolving power of 140,000 (referenced to m/z 200). For structural determination and confirmation, data-dependent MS/MS experiments were carried out at a nominal resolving power of 35,000. Many SPMs contain nitrogen functionalities, esters or keto groups, which ionize preferentially in positive ion mode. We therefore used only positive-mode data for the detection and identification of SPMs.
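As an illustration only, the gradient program described above can be written as a piecewise function of run time. This sketch assumes that time zero coincides with injection and that the instrument switches instantly between segments. During the rinse, the returned value is the 10% share of eluent B in the isopropanol/acetone rinse mixture. The function name `percent_b` is hypothetical.

```python
# Piecewise description of the LC gradient given in the text (% eluent B).
# Segment lengths: 1 min hold, 12 min ramp, 11 min hold, 2 min rinse,
# 5.7 min re-equilibration -> total run time 31.7 min.

def percent_b(t_min: float) -> float:
    """Return the fraction of eluent B (%) at t_min minutes after injection."""
    if t_min < 0:
        raise ValueError("time must be non-negative")
    if t_min <= 1.0:        # isocratic start at 5% B
        return 5.0
    if t_min <= 13.0:       # linear ramp from 5% to 100% B over 12 min
        return 5.0 + (100.0 - 5.0) * (t_min - 1.0) / 12.0
    if t_min <= 24.0:       # hold at 100% B for 11 min
        return 100.0
    if t_min <= 26.0:       # rinse: 85% isopropanol/acetone, 10% B, 5% A
        return 10.0
    if t_min <= 31.7:       # re-equilibration at initial conditions
        return 5.0
    raise ValueError("time beyond the end of the run (31.7 min)")
```

For example, halfway through the ramp (7 min after injection) the function gives 52.5% B.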

Data handling for qualitative analysis
For data processing, the acquired Thermo raw files were converted to centroided mzML format with ProteoWizard (version 2.1.0) [25] and imported into MZmine 2.38 [53]. MZmine parameters for mass detection, chromatogram building, smoothing, peak alignment and gap filling were adjusted to obtain optimal peak detection (for more information, see Additional file 2: Table S4) [29,30,43]. The resulting peak list was exported as a csv file for further processing in MS Excel 2013. To remove noise and background and to reduce false positives, we applied a lower intensity cut-off (10^4) and blank correction. Positive detects were discarded if their peak intensities in the extracted ion chromatogram were below the threshold intensity (Eq. 1). They were also discarded if a peak of similar retention time and similar or higher intensity was found in the blank samples. The remaining positive detects were extracted from the peak list and used for further metabolite identification. To evaluate the performance of the workflow, 40 isotope-labelled internal standards were spiked into the samples and blanks, and all of them were detected by the peak-picking procedure in MZmine.
(I_T, threshold intensity; I_Bav, average intensity of the blanks; SD_Iblank, standard deviation of the intensities of the blanks).
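The intensity filter and blank correction described above can be sketched as follows. Eq. 1 itself is not reproduced in this section. The sketch therefore assumes a common form of such thresholds, I_T = I_Bav + k · SD_Iblank, with a hypothetical multiplier k = 3, combined with the stated lower cut-off of 10^4. Both the form and the value of k are assumptions, and the helper names are hypothetical.

```python
from statistics import mean, stdev

LOWER_CUTOFF = 1e4   # lower intensity cut-off stated in the text
K_SD = 3.0           # ASSUMPTION: multiplier on the blank SD (Eq. 1 not shown here)


def threshold_intensity(blank_intensities, k=K_SD):
    """Hypothetical form of Eq. 1: I_T = I_Bav + k * SD_Iblank, floored at the cut-off."""
    if not blank_intensities:            # no matching peak in any blank
        return LOWER_CUTOFF
    i_bav = mean(blank_intensities)
    sd = stdev(blank_intensities) if len(blank_intensities) > 1 else 0.0
    return max(LOWER_CUTOFF, i_bav + k * sd)


def is_positive_detect(sample_intensity, blank_intensities, k=K_SD):
    """Keep a sample peak only if it reaches the blank-derived threshold."""
    return sample_intensity >= threshold_intensity(blank_intensities, k)
```

Taking the maximum of the cut-off and the blank-based threshold is a design choice of this sketch. It means a peak must pass both criteria before it is kept as a positive detect.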

Detection and structural elucidation of unknowns
For the identification of unknowns, we used a non-target workflow (Additional file 2: Figure S1) consisting of three main steps. First, an empirical approach focused on selecting peaks from vegetation in river waters. To this end, peaks overlapping between plant extracts and river water from adjacent locations were extracted from the dataset. In a few cases, the selection of overlapping peaks included isobaric rather than identical compounds. If such a peak in water could be identified as an SPM, it was retained in the list even though it was not detected in the plant extract. Second, among the overlapping peaks, those with high intensity in plant extracts were subjected to further analysis. Based on inspection of extracted ion chromatograms (XICs), broad peaks and peaks without a well-resolved apex were excluded from the candidate list. Molecular formulas were then evaluated using the Qual Browser of Thermo Xcalibur and queried in freely available compound databases (PubChem, ChemSpider, Phytotoxin (TPPT) and KEGG). The number of compounds registered for a given molecular formula was taken as an indicator of the probability of detecting the compound in river water and of the commercial availability of a reference standard. The similarity between the isotopic pattern computed for a formula and the recorded mass spectrum was used to confirm the elemental composition. The plausibility of the generated formulas was checked using the Seven Golden Rules for heuristic filtering of molecular formulas [31]. Finally, for structural elucidation, the MS/MS spectra were compared against the spectral libraries mzCloud (https://www.mzcloud.org) and MassBank (https://www.massbank.eu) for the most plausible chemical structures. These comparisons were supported by high-ranking structures from the in silico fragmentation tools MetFrag (https://msbi.ipb-halle.de/MetFrag/), CSI:FingerID integrated into SIRIUS 4 [14] and CFM-ID (https://cfmid.wishartlab.com/).
For the settings used in the in silico fragmenters, see Additional file 2: Tables S5, S6 and S7. Peaks without plausible hits from spectral databases and in silico fragmentation were discarded. Where commercially available, reference standards were purchased for the most likely structures. Structures were confirmed by agreement between sample and reference standard in MS/MS fragmentation (mass accuracy within 5 ppm) and in retention time (within a window of 0.1 min). The level of identification for each metabolite structure was reported according to the confidence levels proposed in [57].
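The confirmation criteria stated above can be expressed as a simple check: fragment masses agree within 5 ppm and retention times agree within 0.1 min. This is a minimal sketch under stated assumptions. The dictionary layout and the minimum number of matched fragments (`min_fragments`) are hypothetical choices, not criteria given in the text. The function names are also hypothetical.

```python
def ppm_error(measured_mz: float, reference_mz: float) -> float:
    """Relative mass deviation in parts per million."""
    return (measured_mz - reference_mz) / reference_mz * 1e6


def matches_reference(sample: dict, standard: dict,
                      ppm_tol: float = 5.0, rt_tol_min: float = 0.1,
                      min_fragments: int = 2) -> bool:
    """Compare a sample spectrum with a reference standard.

    sample / standard: {"rt": retention time in min, "fragments": [m/z, ...]}
    min_fragments is a HYPOTHETICAL acceptance criterion for this sketch.
    """
    # Retention time must agree within the stated window.
    if abs(sample["rt"] - standard["rt"]) > rt_tol_min:
        return False
    # Count reference fragments that have a partner in the sample within ppm_tol.
    matched = sum(
        1 for ref_mz in standard["fragments"]
        if any(abs(ppm_error(mz, ref_mz)) <= ppm_tol for mz in sample["fragments"])
    )
    return matched >= min(min_fragments, len(standard["fragments"]))
```

In practice, spectral similarity scores from the library and in silico tools would complement such a binary check. The sketch only encodes the two tolerances named in the text.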

Quantification of the identified SPMs
TraceFinder (Thermo Fisher Scientific, version 3.2) was used to quantify the identified SPMs. Calibration standards ranging from 1 to 5000 ng/L were prepared and treated in exactly the same way as river waters and plant extracts. Samples exceeding the highest calibration level were diluted and re-analysed. Each metabolite was quantified against the internal standard with the nearest retention time.
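The internal-standard quantification described above can be illustrated with a short sketch. It is not TraceFinder's implementation. It assumes an unweighted linear calibration of the analyte/internal-standard peak-area ratio against concentration; real calibrations are often weighted (e.g., 1/x), and the text does not state the model used. Function names and example values are hypothetical.

```python
def fit_calibration(concs_ng_per_l, response_ratios):
    """Unweighted least-squares line: ratio = slope * conc + intercept (assumption)."""
    n = len(concs_ng_per_l)
    if n < 2 or n != len(response_ratios):
        raise ValueError("need at least two paired calibration points")
    mx = sum(concs_ng_per_l) / n
    my = sum(response_ratios) / n
    sxx = sum((x - mx) ** 2 for x in concs_ng_per_l)
    sxy = sum((x - mx) * (y - my) for x, y in zip(concs_ng_per_l, response_ratios))
    slope = sxy / sxx
    return slope, my - slope * mx


def quantify(area_analyte, area_internal_std, slope, intercept, dilution_factor=1.0):
    """Back-calculate a concentration (ng/L) from the analyte/IS area ratio.

    dilution_factor accounts for samples that were diluted and re-run because
    they exceeded the highest calibration level.
    """
    ratio = area_analyte / area_internal_std
    return (ratio - intercept) / slope * dilution_factor
```

A usage pattern would be to fit the line once per compound across the calibration levels (1 to 5000 ng/L) and then call `quantify` for each sample, passing the dilution factor for re-run samples.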

Risk estimates
Since no toxicological data are available for the SPMs detected in this study, tentative risk estimates were based on the TTC of 0.1 µg/L for non-genotoxic and non-endocrine disrupting compounds in drinking water. We defined the ratio between the measured concentration c_i of compound i and the TTC as its risk quotient (RQ_i). The mixture RQ was calculated as the sum of the individual RQs (Eq. 2), assuming a mixture RQ below one to be safe for exposed humans and aquatic organisms.

$$\mathrm{RQ}_{\mathrm{mix}} = \sum_{i=1}^{n} \mathrm{RQ}_i = \sum_{i=1}^{n} \frac{c_i}{\mathrm{TTC}} \qquad (2)$$

Peaks detected in waters and plant extracts
For both water samples and aqueous plant eluates, processing of the LC-HRMS output yielded a massive dataset (peak list). After noise, background-contaminant and blank correction, 13,000 to 29,000 peaks in river waters and 50,000 to 70,000 peaks in plant extracts (defined by m/z, retention time and intensity) were considered positive detects. These positive detects represent organic molecules from all possible sources in the environment, both anthropogenic and natural. In a first step, we identified peaks common to vegetation and the adjacent river water, which ranged from 4900 to 18,500 peaks for the individual pairs (Fig. 2).
For illustration, an aqueous extract of Galanthus nivalis and river water from an adjacent location are discussed here. As shown in Fig. 3, more common peaks (red spots) were found between Galanthus nivalis extracts and rain-event water samples (Fig. 3, right) than between the extracts and water samples taken under dry weather conditions (Fig. 3, left). A similar trend was observed for all analysed plant-river water pairs. The majority of peaks in plant extracts (green spots) exhibit longer retention times, and thus higher hydrophobicity, than those in water (blue spots). However, agreement in m/z and retention time does not exclude different isobaric compounds eluting at the same retention time. Further steps are therefore required to narrow the overlap down to common structures.
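The overlap between a plant extract and an adjacent water sample can be found by pairing peaks that agree in both m/z and retention time. The sketch below does this with a sorted m/z list and binary search, which keeps the comparison fast for tens of thousands of peaks. The tolerances used for the overlap step are not given in the text. The defaults of 5 ppm and 0.1 min are borrowed from the confirmation criteria and are assumptions here, as is the tuple layout `(mz, rt, intensity)`.

```python
from bisect import bisect_left, bisect_right


def overlapping_peaks(water, plant, ppm_tol=5.0, rt_tol=0.1):
    """Pair water peaks with plant peaks agreeing in m/z and retention time.

    water, plant: lists of (mz, rt_min, intensity) tuples.
    Returns a list of (water_peak, plant_peak) pairs; for each water peak
    the plant peak closest in retention time is kept.
    """
    plant_sorted = sorted(plant, key=lambda p: p[0])
    plant_mz = [p[0] for p in plant_sorted]
    pairs = []
    for w in water:
        tol = w[0] * ppm_tol * 1e-6                # absolute m/z window
        lo = bisect_left(plant_mz, w[0] - tol)
        hi = bisect_right(plant_mz, w[0] + tol)
        candidates = [p for p in plant_sorted[lo:hi] if abs(p[1] - w[1]) <= rt_tol]
        if candidates:
            best = min(candidates, key=lambda p: abs(p[1] - w[1]))
            pairs.append((w, best))
    return pairs
```

As the text notes, a pair found this way only indicates a shared m/z and retention time. It may still represent two isobaric compounds.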

Peak prioritization and structural identification of metabolites
Prioritization of overlapping peaks
Peaks were prioritized for identification using a stepwise filtering approach, demonstrated here for Galanthus nivalis and a rain-event river water sample from an adjacent location (Fig. 4). After limiting the positive detects in both samples to common peaks only (8574), the overlapping peaks were ranked by their intensity in plant extracts and in the corresponding water samples. This ranking rested on two general assumptions. (1) Peaks with low intensity in plant extracts (selected threshold 10^6) are unlikely to enter river water in quantities sufficient for detection. (2) Peaks with higher intensity in river water than in plant extracts are unlikely to originate from the plants. Both criteria were used to exclude peaks of low priority. In our example, this prioritization step reduced the number of peaks to 1406, which is 8% of the initial peak list (16,594 peaks). Broad peaks with low intensity and without a well-defined apex were eliminated manually by inspecting the peak shape. Next, the elemental composition of each peak was evaluated from its accurate mass within a 5 ppm error window. The elements considered were C, H, N, O, P and S, which commonly occur in natural products [9,54,64]. Finally, the isotopic fit analysis resulted in 261 (1.5% of the initial peaks) tentatively identified candidate peaks.
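The two prioritization criteria can be written as a filter over overlapping peak pairs, followed by ranking on plant-extract intensity: a minimum plant intensity of 10^6, and no higher intensity in water than in the plant extract. This sketch covers only those automated criteria. The manual peak-shape inspection and the formula and isotope checks are not modelled. The pair layout `((mz, rt, intensity), (mz, rt, intensity))` and the function name are assumptions.

```python
def prioritize(pairs, min_plant_intensity=1e6):
    """Filter and rank (water_peak, plant_peak) pairs.

    Each peak is a (mz, rt_min, intensity) tuple. Criterion (1): drop pairs
    whose plant intensity is below min_plant_intensity. Criterion (2): drop
    pairs whose water intensity exceeds the plant intensity. The remaining
    pairs are ranked by plant intensity, highest first.
    """
    kept = []
    for water_peak, plant_peak in pairs:
        w_int, p_int = water_peak[2], plant_peak[2]
        if p_int < min_plant_intensity:   # criterion (1)
            continue
        if w_int > p_int:                 # criterion (2)
            continue
        kept.append((water_peak, plant_peak))
    kept.sort(key=lambda pair: pair[1][2], reverse=True)
    return kept
```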

Identification of unknown SPMs
All 216 peaks selected as candidates were subjected to further identification. The approach combined a set of software tools for retrieving possible chemical structures with selection criteria based on database and software searches and on MS/MS fragments, as exemplified for two structures below. For a river water sample with Galanthus nivalis abundant in the catchment, we obtained plausible chemical structures for 54 of the 216 candidate peaks. These structures came from spectral database searches (MassBank and mzCloud) and in silico fragmenters (MetFrag, CSI:FingerID, CFM-ID). By analysing MS/MS fragments, we identified nine of the metabolites (Fig. 4) at confidence level 1 [57]. This level requires agreement with a reference standard in two orthogonal parameters, the MS/MS spectrum and the retention time. Three further metabolites were identified at level 1 in the remaining water samples, giving a total of twelve identified SPMs and other metabolites. The stepwise identification of unknown SPMs is demonstrated below for two examples. For the first candidate, the accurate m/z of the unknown protonated molecule at a retention time of 0.8 min was 136.0619. A PubChem search for the elemental composition yielded five molecular formulas within 5 ppm mass accuracy. The isotopic pattern analysis confirmed the presence of N in the unknown molecule, so formulas without N were excluded. This left C5H5N5, with 284 registered chemical structures, as the only candidate formula. The data-dependent MS/MS fragment ion masses of the unknown molecule were then matched with the fragmentation patterns of the suggested molecules in the library. Adenine, the compound with the highest spectral match, was selected as the candidate and confirmed with a reference standard based on retention time and MS/MS fragments (see Additional file 2: Figures S2 and S3).
The second example is an accurate m/z of 287.0549 eluting at a retention time of 10.4 min. Evaluation of the elemental composition using the Qual Browser of Xcalibur yielded 22 formulas within a mass error window of 5 ppm. Formulas containing N and S were discarded, since the isotopic pattern in the full-scan (MS1) spectra provided no evidence of N or S in the candidate molecule. The only remaining molecular formula, C15H10O6 (Δ = −0.085 ppm), was therefore taken as the candidate, for which PubChem listed 302 candidate structures. To determine the chemical structure, the data-dependent MS/MS fragment ion spectrum was submitted to MetFrag, CFM-ID and CSI:FingerID. These tools compared it with in silico predicted spectra for candidate structures retrieved from databases such as PubChem, KNApSAcK, ChemSpider and KEGG. Among the suggested structures, kaempferol had the highest score and the highest spectral similarity and was selected as the plausible candidate structure. It was then confirmed with a commercial reference standard based on retention time and MS/MS fragment match (see Additional file 2: Figures S4 and S5). The unknown molecule was thus confirmed to be kaempferol.
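The mass-accuracy reasoning used in both examples can be reproduced with simple arithmetic: compute the theoretical m/z of the protonated molecule [M+H]+ for a candidate formula and compare it with the measured value in ppm. This sketch assumes standard monoisotopic element masses and a proton mass of 1.007276 u. It does not reproduce the formula generation or isotope scoring of the vendor software, and the helper names are hypothetical.

```python
import re

# Monoisotopic masses (u) of the elements considered in the text.
MONOISOTOPIC = {
    "C": 12.0,
    "H": 1.00782503207,
    "N": 14.0030740048,
    "O": 15.99491461956,
    "P": 30.97376163,
    "S": 31.97207100,
}
PROTON = 1.007276466812


def monoisotopic_mass(formula: str) -> float:
    """Neutral monoisotopic mass for a simple formula such as 'C15H10O6'."""
    mass = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        mass += MONOISOTOPIC[element] * (int(count) if count else 1)
    return mass


def protonated_mz(formula: str) -> float:
    """Theoretical m/z of [M+H]+."""
    return monoisotopic_mass(formula) + PROTON


def ppm_error(measured_mz: float, theoretical_mz: float) -> float:
    return (measured_mz - theoretical_mz) / theoretical_mz * 1e6
```

Applied to the two examples, both measured values fall well within the 5-ppm window: m/z 136.0619 against C5H5N5 (adenine) and m/z 287.0549 against C15H10O6 (kaempferol).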
Following a similar approach, the presence of nicotiflorin, hyperoside, cynaroside (luteolin 7-O-beta-d-glucoside), trifolin (kaempferol-3-O-galactoside), alpinetin, isofraxidin, apiin, guanosine, quercetin and kaempferitrin was confirmed in river waters. The chromatograms and MS/MS spectra of the identified compounds are given in Additional file 2: Figures S6-S25. All detected metabolites were also found in plant extracts except alpinetin and kaempferitrin. For these two, common peaks were detected in water and plant samples, but identity was confirmed only in water; the plant extracts contained isobaric but not identical compounds. Among the detected plant metabolites, 10 are SPMs. The nucleobase adenine and the nucleoside guanosine are building blocks of nucleic acids and thus not SPMs in a strict sense, but they are subsumed under the same abbreviation. The chemical structures of the identified metabolites are displayed in Fig. 5. See Additional file 2: Table S8 for full information on the identified metabolites in river water and plant extracts.

Distribution of the identified metabolites in river waters
SPMs of different classes, namely flavonoids (and their glycosides), coumarins and purines, were identified and quantified (Fig. 6). In total, the presence of twelve SPMs in river waters from both catchments was confirmed, with flavonoids being the predominant class. Most of the identified metabolites contain one or more phenolic groups, and phenolics are among the most abundant compound classes in vegetation [55]. The identified SPMs were detected in individual water samples at concentrations of up to about 5 µg/L (Fig. 6 and Additional file 2: Table S8). The highest number and concentrations of identified SPMs were found in two samples (ELP2 and ELP21) from the ELP catchment collected during heavy rain. In contrast, none of the identified metabolites were detected in the control (dry weather) samples (data not shown). This finding supports the hypothesis that rain events drive the leaching of SPMs into surface water.
Most SPMs were detected in water samples from both catchments. The exceptions were alpinetin, hyperoside, kaempferitrin and quercetin, which were detected only in the ELP catchment. Among the detected SPMs, adenine and isofraxidin were detected most frequently in both water samples and plant extracts. They were followed by cynaroside in water samples and trifolin in plant extracts (Table 1 and Additional file 2: Figure S26). In river waters, SPMs were detected over an overall concentration range of 0.02 to 5.1 µg/L (Fig. 6, Table 1).
The purine base adenine and the purine nucleoside guanosine were detected at concentrations of 0.4-4.0 µg/L in water samples and 35-189.5 µg/g in plant extracts (Table 1). Adenine is an aromatic base found in both the DNA and RNA of living organisms. Both compounds have previously been isolated from a variety of plants (e.g., maize, tea and coffee plants) [4,59]. Guanosine has been reported to have neurotrophic and neuroprotective effects, as evidenced by in vivo studies in rodents at 7.5 mg/kg and by studies in cell models [10,37,52].
Fig. 5 The chemical structures of the identified SPMs in river waters
Flavonoids are a class of natural compounds widely distributed in plants. Among them, kaempferol and quercetin were detected in several water samples and plant extracts from the ELP catchment and in one from the Bode catchment. Quercetin was found at an average concentration of 2 µg/L. Besides their potential positive effects, such as antiproliferative, chemopreventive and anti-inflammatory activities [35], kaempferol and quercetin inhibit acetylcholinesterase (AChE) activity in vitro, with IC50 values of approximately 32 and 4.7 mg/L, respectively [44,48,51,65]. In vivo, quercetin showed toxic and carcinogenic effects in the kidney of male rats at doses above 40 mg/kg [13,15]. The flavanone alpinetin and the glycosyloxyflavone kaempferitrin (a 3,7-dirhamnoside of kaempferol) were found in river waters from the ELP catchment but not in the investigated plant extracts, despite peaks from isobaric compounds overlapping. Both metabolites, however, have previously been reported from other plants: alpinetin from the genus Alpinia (flowering plants) and kaempferitrin from Lathyrus (a genus in the legume family Fabaceae) [2,12,27,38,63]. In the present study, no evidence was found for the presence of such plants along the investigated rivers. The measured concentration of kaempferitrin was 0.9 µg/L, while alpinetin was present at 23 and 50 ng/L. Besides its antibacterial and anti-inflammatory activities, alpinetin showed vasorelaxant effects in an in vitro rat study, with an IC50 of about 7.4 mg/L [63]. It also showed potential to downregulate the immune system in mice [17]. A study by Zhang [66]. The glycosyloxyflavone apiin was measured at a high concentration (5 µg/L) in a water sample from the Bode catchment. It was also found in two water samples from the ELP catchment at an average concentration of 2.9 µg/L.
Another flavonoid glycoside, nicotiflorin (kaempferol 3-O-rutinoside), was found in rivers from both catchments (two from ELP and one from the Bode catchment) at an average concentration of 2 µg/L. However, each of these two metabolites was detected in only one plant extract: apiin in Digitalis purpurea and nicotiflorin in Fraxinus excelsior, both from the Bode catchment. This is despite Fraxinus excelsior also being a characteristic plant of the ELP floodplain forest. The detection of apiin in ELP water samples indicates leaching from other frequently occurring plant species not considered in this work, including Apiaceae [2] and stinging nettle (Urtica dioica) [50]. In vitro, apiin displayed anti-inflammatory activity with an IC50 of 49 mg/L [40]. Nicotiflorin has many interesting pharmacological activities, such as decreasing arterial blood pressure and heart rate, and it showed hepatoprotective effects in mice in vivo [21]. It was also found to protect against memory dysfunction and oxidative stress in multi-infarct dementia model rats at 30 mg/kg in vivo [21,26].
Hyperoside (a quercetin-3-O-D-galactoside) was found in only two water samples from the ELP catchment, at an average concentration of 3.9 µg/L. It was also detected at substantial concentrations in extracts of plants from the close vicinity (Fraxinus excelsior and Galanthus nivalis), from which it could have been released (Additional file 2: Table S8). Hyperoside may have potential as a therapeutic agent for the treatment of liver fibrosis [61]. In vitro, it improves cardiac function and prevents the development of cardiac hypertrophy via AKT signalling at a concentration of about 4.6 mg/L [62]. At a dose of 10 mg/kg in vivo, hyperoside showed a depressant effect on the central nervous system as well as an antidepressant-like effect in rodents, mediated at least in part by the dopaminergic system [19]. Water-extractable hyperoside from Hypericum species inhibited acetylcholinesterase with an IC50 of 66 mg/L [23].
The glycosyloxyflavones cynaroside and trifolin occurred in water samples at concentrations of 0.2-2.1 and 0.3-2.9 µg/L, respectively (Table 1 and Fig. 6). Cynaroside was identified in five water samples (four from the ELP and one from the Bode catchment), trifolin in three samples (two from the ELP and one from the Bode catchment). Both metabolites were also detected in plant extracts from both catchments. Cynaroside has been shown to exert a prominent antioxidant effect by inhibiting lipid and protein oxidation. In vitro, it also inhibited human liver cytochrome P450 (CYP) isoforms with an IC50 of 7 mg/L [60]. Trifolin (kaempferol-3-O-galactoside), a galactose-conjugated flavonol, exhibits antifungal and anticancer effects in vitro with an IC50 of about 50 mg/L [39].
The coumarin isofraxidin was found at an average concentration of 0.03 µg/L in two water samples from each location. In the other water samples, except one from the Bode catchment, it was found at an average concentration of 0.2 µg/L. Isofraxidin was quantified in all plant extracts, with the highest concentration in Fraxinus excelsior, a characteristic tree along the rivers in both catchments. Apart from its numerous pharmacological activities, such as antioxidant and anti-inflammatory effects, isofraxidin inhibited human liver cytochrome P450 (CYP) isoforms in vitro with an IC50 of about 3 mg/L [58].

Toxic risk estimation
The SPMs were detected in water samples not as individual compounds but as mixtures, with at least three SPMs co-occurring at all sites and as many as nine metabolites in two samples (Fig. 7a). Accordingly, the preliminary mixture RQ based on a TTC of 0.1 µg/L exceeded 5 (and thus also 1) at all sites. It exceeded 10 at 7 sites and even 50 at 3 sites (Fig. 7b). Individual concentrations of the detected SPMs, except isofraxidin (in three water samples) and alpinetin, were also above the TTC. Toxic risks from individual SPMs and their mixtures, and a contribution to the overall toxicity of surface waters, therefore cannot be excluded and call for additional efforts in hazard characterization.
Fig. 7 (a) Co-occurrence of detected metabolites between sites and (b) the number of samples exceeding mixture risk quotient (RQ) levels of metabolites
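The mixture risk estimate (Eq. 2) and the per-site exceedance counts summarized in Fig. 7b can be sketched as below. Because the same TTC of 0.1 µg/L is applied to every compound, summing the individual RQs equals dividing the summed concentration by the TTC. This is concentration addition under the simplifying assumption of equal potency. The site and compound names in the test are hypothetical placeholders, not measured data, and the function names are also hypothetical.

```python
TTC_UG_PER_L = 0.1  # TTC for non-genotoxic, non-endocrine disrupting chemicals


def risk_quotients(concs_ug_per_l: dict, ttc: float = TTC_UG_PER_L) -> dict:
    """Individual RQ_i = c_i / TTC for each compound at one site."""
    return {name: c / ttc for name, c in concs_ug_per_l.items()}


def mixture_rq(concs_ug_per_l: dict, ttc: float = TTC_UG_PER_L) -> float:
    """Mixture RQ as the sum of individual RQs (Eq. 2)."""
    return sum(risk_quotients(concs_ug_per_l, ttc).values())


def count_sites_exceeding(sites: dict, levels=(1, 5, 10, 50),
                          ttc: float = TTC_UG_PER_L) -> dict:
    """Number of sites whose mixture RQ exceeds each level."""
    rqs = [mixture_rq(concs, ttc) for concs in sites.values()]
    return {level: sum(rq > level for rq in rqs) for level in levels}
```

Feeding this the measured concentrations per sample would yield counts of the kind shown in Fig. 7b. The sketch itself carries no study data.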

Conclusion
In this study, a novel approach was applied for the first time to associate unknown high-intensity peaks in LC-HRMS NTS with SPMs from the surrounding vegetation by focusing on peaks overlapping between river water and aqueous plant extracts. A high number of peaks was found in this overlap, suggesting a significant impact of vegetation on the chemical mixtures detectable in surface waters. In total, 12 SPMs and other metabolites could be identified, including flavonoids, flavonoid glycosides, coumarins and purine bases, with flavonoids as the predominant compounds. SPMs are produced by many plants, and their individual concentrations in surface water may reach up to 5 µg/L, exceeding the TTC (0.1 µg/L) for non-genotoxic and non-endocrine disrupting chemicals in drinking water. Although this finding does not necessarily indicate a toxic risk to aquatic organisms, it illustrates the relatively high concentrations at which a contribution to mixture toxicity cannot be excluded. These compounds might contribute to the effects sometimes detected with effect-based monitoring tools even in natural and apparently pristine areas. This possibility should therefore be considered when explaining discrepancies between the effects expected from the anthropogenic chemicals found in a water sample and those detected with effect-based methods. Impacts of SPMs on the quality of drinking water abstracted from natural water resources cannot be excluded. However, owing to the lack of aquatic toxicity data for SPMs and the extreme scarcity of exposure data, no reliable risk assessment or prioritization of SPMs for monitoring and assessment can be performed. Thus, SPMs should increasingly be included in the chemical monitoring of surface waters to collect exposure data on a larger scale, complemented by toxicity testing of compounds that occur frequently or at high concentrations. 
The substantial toxicity of individual compounds to mammals reported above may also warrant hazard assessment of SPMs found in surface waters. The present study clearly indicates that the identified compounds represent only the tip of the iceberg of potentially toxic SPMs in water resources. Thus, NTS-based approaches should be increasingly applied to understand complex mixtures of synthetic contaminants and SPMs.
Additional file 1: Table S1. Information on sampling sites for river water and plant species. Table S2. Analytical standards used. Table S3. Internal standards used for the chemical analysis (ESIpos).
Additional file 2: Table S4. Settings for MZmine data processing. Table S5. Settings used in CSI:FingerID for in silico fragment pattern prediction. Table S6. Settings used in MetFrag for in silico fragment pattern prediction. Table S7. Settings used in CFM-ID for in silico fragment pattern prediction. Table S8. Concentrations of detected metabolites in both water samples and aqueous plant elutriates. Figure S1. Workflow for the non-target detection of SPMs in river waters. Figure S2. Extracted ion chromatograms of adenine in reference standard, water sample and plant (Galanthus nivalis) elutriates. NL: signal intensity at 100%. Figure S3. MS/MS spectra (HCD fragmentation at 55 a.u.) of adenine in a reference standard, water sample and plant (Galanthus nivalis) elutriates. Figure S4. Extracted ion chromatograms of kaempferol in reference standard, water sample and plant (Galanthus nivalis) elutriates. NL: signal intensity at 100%. Figure S5. MS/MS spectra (HCD fragmentation at 55 a.u.) of kaempferol in a reference standard, water sample and plant (Galanthus nivalis) elutriates. Figure S6. Extracted ion chromatograms of apiin in reference standard, water sample and plant (Digitalis purpurea) elutriates. NL: signal intensity at 100%. Figure S7. MS/MS spectra (HCD fragmentation at 45 a.u.) of apiin in a reference standard, water sample and plant (Digitalis purpurea) elutriates. Figure S8. Extracted ion chromatograms of hyperoside in reference standard, water sample and plant (Galanthus nivalis) elutriates. NL: signal intensity at 100%. Figure S9. MS/MS spectra (HCD fragmentation at 55 a.u.) of hyperoside in a reference standard, water sample and plant (Galanthus nivalis) elutriates. Figure S10. Extracted ion chromatograms of nicotiflorin in reference standard, water sample and plant (Fraxinus excelsior) elutriates. NL: signal intensity at 100%. Figure S11. MS/MS spectra (HCD fragmentation at 45 a.u.) of nicotiflorin in a reference standard, water sample and plant (Fraxinus excelsior) elutriates. Figure S12. Extracted ion chromatograms of cynaroside in reference standard, water sample and plant (Galanthus nivalis) elutriates. NL: signal intensity at 100%. Figure S13. MS/MS spectra (HCD fragmentation at 45 a.u.) of cynaroside in a reference standard, water sample and plant (Galanthus nivalis) elutriates. Figure S14. Extracted ion chromatograms of isofraxidin in reference standard, water sample and plant (Fraxinus excelsior) elutriates. NL: signal intensity at 100%. Figure S15. MS/MS spectra (HCD fragmentation at 55 a.u.) of isofraxidin in a reference standard, water sample and plant (Fraxinus excelsior) elutriates. Figure S16. Extracted ion chromatograms of kaempferitrin in reference standard and water sample. NL: signal intensity at 100%. Figure S17. MS/MS spectra (HCD fragmentation at 45 a.u.) of kaempferitrin in a reference standard and water sample. Figure S18. Extracted ion chromatograms of alpinetin in reference standard and water sample. NL: signal intensity at 100%. Figure S19. MS/MS spectra (HCD fragmentation at 45 a.u.) of alpinetin in a reference standard and water sample. Figure S20. Extracted ion chromatograms of quercetin in reference standard, water sample and plant (Fraxinus excelsior) elutriates. NL: signal intensity at 100%. Figure S21. MS/MS spectra (HCD fragmentation at 55 a.u.) of quercetin in a reference standard, water sample and plant (Fraxinus excelsior) elutriates. Figure S22. Extracted ion chromatograms of guanosine in reference standard, water sample and plant (Digitalis purpurea) elutriates. NL: signal intensity at 100%. Figure S23. MS/MS spectra (HCD fragmentation at 45 a.u.) of guanosine in a reference standard, water sample and plant (Digitalis purpurea) elutriates. Figure S24. Extracted ion chromatograms of trifolin in reference standard, water sample and plant (Galanthus nivalis) elutriates. NL: signal intensity at 100%. Figure S25. MS/MS spectra (HCD fragmentation at 45 a.u.) of trifolin in a reference standard, water sample and plant (Galanthus nivalis) elutriates. Figure S26. Distribution of detected metabolites in the aqueous plant elutriates.

Abbreviations
LC-HRMS: Liquid chromatography coupled to high resolution mass spectrometry; SPMs: Secondary plant metabolites; ND: Not detected; NQ: Not quantified;