Retrospective non-target analysis to support regulatory water monitoring: from masses of interest to recommendations via in silico workflows

Applying non-target analysis (NTA) in regulatory environmental monitoring remains challenging—instead of having exploratory questions, regulators usually already have specific questions related to environmental protection aims. Additionally, data analysis can seem overwhelming because of the large data volumes and many steps required. This work aimed to establish an open in silico workflow to identify environmental chemical unknowns via retrospective NTA within the scope of a pre-existing Swiss environmental monitoring campaign focusing on industrial chemicals. The research question addressed immediate regulatory priorities: identify pollutants with industrial point sources occurring at the highest intensities over two time points. Samples from 22 wastewater treatment plants obtained in 2018 and measured using liquid chromatography–high resolution mass spectrometry were retrospectively analysed by (i) performing peak-picking to identify masses of interest; (ii) prescreening and quality-controlling spectra, and (iii) tentatively identifying priority “known unknown” pollutants by leveraging environmentally relevant chemical information provided by Swiss, Swedish, EU-wide, and American regulators. This regulator-supplied information was incorporated into MetFrag, an in silico identification tool replete with “post-relaunch” features used here. This study’s unique regulatory context posed challenges in data quality and volume that were directly addressed with the prescreening, quality control, and identification workflow developed. One confirmed and 21 tentative identifications were achieved, suggesting the presence of compounds as diverse as manufacturing reagents, adhesives, pesticides, and pharmaceuticals in the samples. More importantly, an in-depth interpretation of the results in the context of environmental regulation and actionable next steps are discussed. 
The prescreening and quality control workflow is openly accessible within the R package Shinyscreen, and adaptable to any (retrospective) analysis requiring automated quality control of mass spectra and non-target identification, with potential applications in environmental and metabolomics analyses. NTA in regulatory monitoring is critical for environmental protection, but bottlenecks in data analysis and results interpretation remain. The prescreening and quality control workflow, and interpretation work performed here are crucial steps towards scaling up NTA for environmental monitoring.


Background
Organic pollutants are well-documented in aquatic environments [59]. Traditionally, target strategies that look for chemicals known in advance have been used to identify these compounds [27]. In contrast, nontarget analysis (NTA) helps discover previously undetected, unexpected and/or unknown substances. NTA has been under intense development in recent years, aided by advances in instrumentation and computational approaches [17,27]. Considering the vast chemical space of possible environmental pollutants [65], the need for NTA is becoming more pressing in order to tackle the growing challenge of identifying chemical unknowns in samples. Yet, data analysis in NTA remains a formidable challenge. To ease the "identification burden" in NTA, simplifying approaches like Suspect Screening, where chemicals on discrete lists suspected to be present in the sample are screened, are being taken in the interim [17].
Various successful examples of NTA [1,4,5,19,28,50,53,60] have inevitably encouraged interest in its potential role to monitor and manage chemical pollutants in the environment [17]. As the field matures, there is some consensus that NTA is "Ready to Go", with calls for it to be applied more widely within the regulatory frameworks of local, regional, and national authorities [17,18]. Data-mining routines like enviMass have contributed to such initiatives [34]; enviMass facilitates NTA by peak-picking and prioritising unknown features of interest worthy of further identification efforts. It does so by connecting mass spectral features based on criteria such as having signals of sufficient intensity, grouping together isotopologues and adducts of the same component, and detecting temporal trends, ultimately giving as output a list of m/z-retention time pairs, plus accompanying information for further identification efforts.
However, challenges for regulators to perform NTA persist, particularly with respect to high-throughput data analysis and identification following the mass prioritisation and peak-picking steps described above. For example, regulators may lack specific NTA expertise and/or resources to apply the potentially many and complicated computational workflows [15,33] available for analysing the copious amounts of data. In addition to the time-consuming and complex nature of data interpretation, issues related to standardisation and reproducibility exist, as there is currently no 'one size fits all' approach to identifying compounds using NTA [16]. As a result, NTA is currently often considered by regulators as "too much effort for too little sound evidence".
Another, more systemic obstacle to applying NTA in a regulatory context relates to the divergent interests of scientists in academia, who are (currently) responsible for driving most NTA developments, and scientists in regulatory practice, who would implement these developments towards regulatory compliance and environmental protection. While the former often aim to develop and publish novel work, the primary mandate of the latter is regulatory compliance towards environmental protection. One possible consequence of this reality is that academic research outcomes resulting from NTA may not be directly relevant to regulators, or may not be in a form that is readily usable by them. In other words, researchers' questions may not be regulators' questions: what is scientifically interesting may not be a priority for, or directly useful to, regulators.
Despite these aforementioned challenges, it is possible (and important) to navigate both research and regulatory needs in NTA. The present work is an example of academic research driven primarily by regulatory priorities. In this "top-down" approach, pre-existing data were used to generate results of direct environmental relevance and with immediate implications for environmental management.
Three practical challenges characteristic of applying NTA in a regulatory environmental monitoring context arose in this study: (i) the study was framed by superlative questions that required a large volume of data to be analysed, i.e. identify unknown compounds occurring at the highest intensities and highest temporal frequency with point sources across all the samples of the sampling campaign; (ii) there was a strict and limited timeframe allowed for the study following project management procedures of the regulatory body, and (iii) the data originally collected had been repurposed for this NTA study as there was neither capacity nor further resources available within the scope of the project to do additional measurements. The latter point was all the more critical as preliminary manual inspection of the available data revealed that not all measurements were fully suitable for the intended non-target identification. These challenges called for a high-throughput approach capable of processing large volumes of data of variable quality in a fast and reproducible way that would be compatible with identification approaches downstream. Additionally, unlike the seemingly increasing complexity of existing workflows [33], an uncomplicated and 'minimal, barebones' but fully functional approach that is transparent and easily explainable is critical given the regulatory context.
Keywords: Non-target analysis, Suspect screening, Retrospective, Wastewater, Micropollutants, Cheminformatics, Identification, Monitoring, Regulation
MetFrag, used in this work to support identification efforts, is an example of an open in silico identification approach which satisfies the aforementioned criteria. Released in 2010 [68], it first retrieves potential candidates with matching mass from compound databases such as PubChem [23] (111 million chemical structures, August 2020), ChemSpider [7,48] (103 million chemical structures, February 2021), or smaller biological databases like the Human Metabolome Database [20,67] (114,304 metabolites, February 2021). These candidates are then scored according to how well the experimental spectrum matches the in silico fragments generated per candidate using a bond dissociation approach [68], and subsequently ranked according to this FragmenterScore (sometimes referred to as the Fragmentation Score or FragScore, or simply the MetFrag Score when it is the only component thereof). For the identification of environmental "known unknowns", using fragmentation information alone in this way can give mediocre results (e.g., ~22% and 6% of 473 environmentally relevant standards ranked first with ChemSpider and PubChem, respectively [51]). This outcome may have various causes: (i) the search databases used are too large and/or do not contain only environmentally relevant compounds, therefore resulting in too many candidates that are not meaningful, and/or (ii) there is simply not enough information to distinguish candidates when considering their fragmentation alone.
To address these limitations, MetFrag was 'relaunched' in 2016 to incorporate further identification strategies beyond fragmentation, such as retention time information, substructure in/exclusion, availability of literature and patent information, presence/absence in suspect lists, and user-defined scoring terms [51]. Over time, spectral similarity comparison with spectra from the MassBank of North America (MoNA) [12], with and without a MetFusion approach [14], was also integrated into MetFrag. Since then, two further open-science/environmental chemistry developments have contributed significantly to MetFrag's extended capabilities for identifying environmental unknowns. Firstly, the release and integration of the United States Environmental Protection Agency's CompTox Chemicals Dashboard [66] (hereafter, "CompTox") into MetFrag provides a search database of > 850,000 compounds of environmental and toxicological relevance [54], while allowing users to leverage the "MS-Ready" concept [37] and various forms of chemical metadata availability in CompTox as user-defined scoring terms. Secondly, critical information from international regulatory bodies can now be exploited through MetFrag towards identifying environmental chemicals. Beyond (i) the US EPA's Chemicals and Products database (CPDat) [10,62] and other CompTox-related metadata terms that are already integrated via CompTox, MetFrag's user-defined scoring terms can also be configured to incorporate information such as (ii) hazard and exposure from the Swedish Chemicals Agency KEMI [13], (iii) European chemicals registration, i.e. REACH [2], and (iv) the NORMAN Network's merged suspect list of chemicals of emerging concern known as SusDat [43], representing knowledge gathered from NORMAN members, which include > 70 regulatory and academic reference laboratories throughout the world, as well as external contributions.
Used in this way, MetFrag connects disparate resources from various regulatory agencies and academic researchers towards identifying environmental unknowns, practically 'helping researchers and regulators help each other' by providing an interconnected information platform with identification functionality.
The present work aimed to exploit "post-relaunch" MetFrag and Open Science developments towards retrospectively identifying non-target environmental pollutants in a regulatory context, as summarised in Fig. 1.
Here, the main subjects of this study were pollutants originating from industrial activities, found in Swiss wastewater treatment plant (WWTP) effluents, and determined by regulators to be of regulatory concern. The study focused on developing the open in silico workflow to identify them. A prescreening and quality control workflow for high-throughput automated data processing was developed to analyse a provided list of unknown m/z prioritised by enviMass. The use of MetFrag in this work leverages the state-of-the-art open resources mentioned above, chief among them regulatory information from multiple international sources, in addition to exploiting many of MetFrag's post-relaunch capabilities. The identifications provided by MetFrag were analysed with respect to the specific environmental regulatory context of this study and communicated using an established system of confidence levels, discussed in detail in the next section.

Methods
Daily water samples were collected from 25 sites based at 22 WWTPs distributed across Switzerland within sampling campaigns focusing on point sources of industrial chemicals. Of these 25 sampling sites, 19 correspond to WWTP effluents (i.e., 1 site per WWTP), while 6 constitute paired influent and effluent sampling sites of 3 WWTPs (i.e., 2 sites per WWTP) which employ ozonation. The effluent from these 3 WWTPs employing ozonation came from secondary clarifiers. Five sites were sampled twice each (in June and October 2018, respectively), while 20 were sampled only once (June 2018), giving a total of 30 samples.
During each sampling campaign, 2 L of the 24-h flow-proportional composite samples were collected daily at each sampling site over seven consecutive days. Each daily sample was filled into two 1-L glass bottles and kept closed at 4 °C until the last day of the respective sampling campaign. That day, all samples were transported cooled to an analytical laboratory, where they were filtered, flow-proportionally mixed, and sent cooled for MS analysis. The final samples used for measurement were flow-proportional 7-day composites.

Sample measurement
Prior to analysis, samples were filtered through a glass fibre filter and isotopically labelled internal standards were added (26 for positive and 7 for negative ionisation mode, respectively). Samples were analysed without enrichment by direct injection of 100 μl into the chromatographic system. Chromatographic separation of the analytes was performed using a Waters Atlantis T3 column (150 × 3 mm, 3 μm particle size) connected to a Thermo Scientific Accela liquid chromatography system equipped with a 1250 pump, open autosampler, and Thermo Scientific Column Oven 300. The mobile phase eluent A consisted of ultrapure water (ELGA LabWater Purelab Ultra from Labtec Services AG, 5 mM ammonium formate), while eluent B consisted of LC-MS grade methanol (Scharlau Chemie S.A, 5 mM ammonium formate). The gradient programme started with 10% B, which was kept for 1 min before a linear ramp to 95% B for 12 min. This condition was kept for 5 min before returning to starting mobile phase conditions at 18.5 min. The column was re-equilibrated for 4.5 min giving a total run time of 23 min with a flow rate of 300 μl/min.
A full-scan single MS measurement was performed using a Thermo Scientific QExactive Orbitrap LC/MS system with resolving power of 70,000 (at m/z = 200) within 7 days of sample collection and preparation. A scan range of m/z 100 to 1000 was used in both positive and negative electrospray ionisation modes.
Following the prioritisation of non-target masses (described in Part 1 of the prescreening workflow of the next section), the resulting list of non-target masses formed the inclusion list for MS2 measurements of the same samples in data-dependent acquisition mode in February 2019. A normalised collision energy of 35 was used. The same measurement protocol as described above was applied with a resolving power of 17,500 (at m/z = 200).

Part 1-enviMass prioritisation of masses of interest
enviMass (v.3.5, [34]) was used to prioritise non-target masses of interest based on the following criteria: high-intensity MS1 peaks (used as a proxy for high concentration), presumed point source (occurring at one or only a few sampling sites), multiple temporal occurrences across the sampling campaign, i.e. high-frequency occurrences, and existing isotopologue and adduct linkages. Initially, a list of 300 non-target masses of interest was identified and used as an inclusion list for MS2 acquisition in the second round of measurements in February 2019 using the same samples that had been stored at 4 °C as described above. Of these 300 masses, 125 masses with associated [M+H]+ and [M−H]− information from enviMass (117 and 8, respectively) were considered for further processing in the next step and constituted "List A". A further 60 masses with associated [M+H]+ and [M−H]− information (28 and 32, respectively) were also considered for the next step ("List B"), but had not been measured as part of the inclusion list. The enviMass parameters used to derive Lists A and B are detailed in the SI. These lists were the starting point for the workflows described here.

Part 2-prescreening and quality control workflow
Data files in .RAW format were first converted to .mzML format using MSConvert from ProteoWizard (v.3.0.19182-51f676fbe, [6]), with full settings available in the SI (Additional file 1: Figure S1). The data were preliminarily inspected manually using XCalibur Qual Browser (v.4.2.28.14, Thermo Fisher Scientific, Waltham MA, USA). Then, a workflow to extract, prescreen, and quality control the spectra of the precursor masses in Lists A and B was developed and performed prior to further identification efforts.
The prescreening workflow first extracts all MS1 and MS2 ion chromatograms of each m/z from each mzML file supplied to it as input. No post-processing of mass spectral features such as peak removal, filtering, or scaling is performed whatsoever during the extraction of spectra. Extracted MS1 precursors whose retention times are within 2 min of the mean retention time given by enviMass were deemed as matching the original list entries, considering possible drifts caused by wastewater matrix effects and normal variations in the LC analytical set-up, unless specified otherwise.
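The retention time criterion above can be sketched as follows. This is a minimal illustration and not the Shinyscreen implementation; the function name is hypothetical, and treating the 2 min window as inclusive is an assumption.

```python
RT_TOLERANCE_MIN = 2.0  # retention time window stated in the text (minutes)


def matches_list_entry(peak_rt_min: float,
                       envimass_mean_rt_min: float,
                       tolerance_min: float = RT_TOLERANCE_MIN) -> bool:
    """Return True if an extracted MS1 precursor lies within the allowed
    retention time window around the mean retention time from enviMass.

    The inclusive comparison (<=) is an assumption made for this sketch.
    """
    return abs(peak_rt_min - envimass_mean_rt_min) <= tolerance_min


# Example with made-up retention times (minutes)
print(matches_list_entry(10.5, 9.0))   # 1.5 min apart
print(matches_list_entry(12.5, 9.0))   # 3.5 min apart
```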
A 'case' was defined as a measurement whose chromatograms and corresponding spectra have the same m/z, retention time, and file source (essentially, a single unique measurement). As part of the prescreening, each case was subject to quality control: the MS1 and MS2 ion chromatograms were checked automatically by an algorithm within the workflow in a stepwise fashion as per checks and thresholds 1-5 listed in Table 1. Failure to meet any of the criteria in the checks caused the case to be rejected from further identification efforts.
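The stepwise logic described above, in which a case is rejected at the first failed check, might be sketched like this. The two checks shown are placeholders for illustration only; the actual criteria and thresholds are those of Table 1, and all names and the case structure here are hypothetical.

```python
from typing import Callable, Optional, Sequence, Tuple

# A check is a (name, predicate) pair; the predicate returns True on pass.
Check = Tuple[str, Callable[[dict], bool]]


def first_failed_check(case: dict, checks: Sequence[Check]) -> Optional[str]:
    """Run the checks in order and return the name of the first one that
    fails, or None if the case passes all of them."""
    for name, passes in checks:
        if not passes(case):
            return name
    return None


# Placeholder checks (hypothetical; not the criteria of Table 1)
EXAMPLE_CHECKS = [
    ("ms1_present", lambda c: c["ms1_intensity"] > 0),
    ("ms2_present", lambda c: c["has_ms2"]),
]

case = {"mz": 187.0938, "ms1_intensity": 1.0e6, "has_ms2": False}
print(first_failed_check(case, EXAMPLE_CHECKS))
```

Returning the name of the failed check, rather than a bare pass/fail flag, keeps the reason for each rejection traceable.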
Cases that passed quality control checks 1-6 were manually inspected for peak shape and width (check 7, Table 1; minimum peak width 0.1 min). Only cases that passed all quality control checks 1-7 were used as input for MetFrag identification in the next part of the workflow. This prescreening workflow developed and used as part of this work has been embedded into the openly available R package Shinyscreen (v.0.1.1-paper, [24]).
For identification with MetFrag, the neutral monoisotopic mass corresponding to the [M+H]+ or [M−H]− adducts indicated by enviMass in positive and negative mode, respectively, was first calculated. Then, candidates of matching mass with a relative deviation of 5 ppm (selected to reflect the analytical mass error, also known as "Search ppm") were retrieved from CompTox. Subsequently, candidates were fragmented in silico using the following fragmentation settings: Absolute Fragment Peak Match Deviation 0.001 Da ("Mzabs"), Relative Fragment Peak Match Deviation 5 ppm ("Mzppm"), and Maximum Tree Depth 2. Then, candidates were ranked according to the MetFrag Score, calculated as the sum of ten weighted scoring terms summarised in Table 2 and explained in detail below. These terms are either already built-in, or can easily be configured within MetFrag since its relaunch [51]. Candidates with identical first block InChIKeys (i.e., stereoisomers, with the same structural skeleton) were grouped together.
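The first two steps above (neutral mass calculation and the 5 ppm search window) can be illustrated with a short sketch. The function names are hypothetical and the proton mass constant is a standard value supplied here as an assumption; it is not quoted from the study.

```python
PROTON_MASS_DA = 1.007276  # mass of a proton in Da (standard value, assumed here)


def neutral_mass(mz: float, adduct: str) -> float:
    """Neutral monoisotopic mass for a singly charged [M+H]+ or [M-H]- ion."""
    if adduct == "[M+H]+":
        return mz - PROTON_MASS_DA
    if adduct == "[M-H]-":
        return mz + PROTON_MASS_DA
    raise ValueError(f"unsupported adduct: {adduct}")


def ppm_window(mass: float, ppm: float = 5.0) -> tuple:
    """Lower and upper mass bounds for a relative deviation given in ppm."""
    delta = mass * ppm / 1e6
    return mass - delta, mass + delta


# m/z 187.0938 was assigned as [M+H]+ by enviMass in this study
m = neutral_mass(187.0938, "[M+H]+")
low, high = ppm_window(m)
print(f"{m:.4f} Da, search window {low:.4f} to {high:.4f} Da")
```

At a mass near 186 Da, a 5 ppm window corresponds to roughly ±0.0009 Da.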
Three scoring terms within the MetFrag Score reflect the contribution of the fragmentation spectra to the proposed identification: the FragmenterScore (in silico fragments explaining measured peaks, a function of peak count and bond dissociation energy), OfflineMetFusion (spectral similarity to entries in MassBank of North America (MoNA) using a MetFusion approach [14]), and OfflineIndivMoNA (maximum spectral similarity with MoNA entries having exact InChIKey match). Four scoring terms relate to the availability of the chemical's metadata: CPDAT_COUNT [66] (number of entries within US EPA's Chemicals and Products database), DATA_SOURCES [66] (number of data sources underlying CompTox, which performs similarly to the reference count), KEMIMARKET_HAZ (v.S17.0.1.3, [13]) (scaled and normalised hazard score calculated by the Swedish Chemicals Agency KEMI), and KEMIMARKET_EXPO (v.S17.0.1.3, [13]) (scaled and normalised exposure score calculated by the Swedish Chemicals Agency KEMI). The remaining three terms account for the candidate's presence or absence in suspect lists, another form of metadata availability: INDACT (Industrial Activity chemicals known to be used near the sampling sites, supplied by the regulator), REACH2017 (v.S32.0.1.3, [2]) (chemicals registered under the European legislation framework REACH), and NORMANSUSDAT (v.S0.0.2.0, [43]) (chemicals in the merged NORMAN Suspect List Exchange). All metadata scoring terms were weighted 1 except for REACH2017 and NORMANSUSDAT, which were both weighted 0.5 due to the high redundancy across the two databases.
To calculate the MetFrag Score, all the scoring terms except NORMANSUSDAT, REACH2017, INDACT, and OfflineIndivMoNA are first normalised to their respective largest values among the candidate set and scaled between 0 and 1. These normalised and scaled values are then summed together with the presence/absence scores of NORMANSUSDAT, REACH2017, and INDACT (0.5, 0.5, and 1.0 if present, respectively; 0 if absent), and the similarity score from OfflineIndivMoNA (which is not scaled as it is already defined between 0 and 1). The maximum possible MetFrag Score is therefore 9: up to 6 from the six normalised terms, up to 2 from the three suspect list terms, and up to 1 from OfflineIndivMoNA.
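The scoring scheme described above can be sketched as follows. This is a simplified illustration and not MetFrag's own code: the scoring term names and weights come from the text, but the dictionary layout, the function name, and the assumption that raw term values are non-negative (so that dividing by the largest value scales them to 0-1) are choices made for this sketch.

```python
# Terms normalised to the largest value within the candidate set
SCALED_TERMS = [
    "FragmenterScore", "OfflineMetFusion", "CPDAT_COUNT",
    "DATA_SOURCES", "KEMIMARKET_HAZ", "KEMIMARKET_EXPO",
]
# Presence/absence terms and their weights as given in the text
PRESENCE_WEIGHTS = {"INDACT": 1.0, "REACH2017": 0.5, "NORMANSUSDAT": 0.5}


def metfrag_scores(candidates: list) -> list:
    """Return one summed score per candidate (list of dicts of raw values)."""
    maxima = {t: max(c.get(t, 0.0) for c in candidates) for t in SCALED_TERMS}
    scores = []
    for c in candidates:
        total = 0.0
        for term in SCALED_TERMS:
            if maxima[term] > 0:  # avoid dividing by zero when no candidate scores
                total += c.get(term, 0.0) / maxima[term]
        for term, weight in PRESENCE_WEIGHTS.items():
            if c.get(term, False):
                total += weight
        # Already defined between 0 and 1, so it is added without scaling
        total += c.get("OfflineIndivMoNA", 0.0)
        scores.append(total)
    return scores
```

In this sketch, a candidate that holds the largest value for every normalised term, appears in all three suspect lists, and has a MoNA similarity of 1 reaches 6 + 2 + 1 = 9.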
Tentative identifications by MetFrag were communicated using an established system of levels [57], reiterated here with study-specific context for clarity. As MetFrag is an in silico method, it generally gives identifications of Level 3 confidence based on evidence for possible chemical structure using MS1, MS2 and experimental data/context. These identifications are tentative and require further validation before achieving higher confidence levels, as do Level 2a identifications of probable structure based on a library spectrum match, corresponding to a high MoNA individual similarity score (> 0.9) in the present work. Level 1 identifications require confirmation of the structure using a reference standard and include target compounds.
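As a rough sketch, the level assignment used in this work could be expressed as below. The function is hypothetical and deliberately reduces the confidence level scheme to the three cases mentioned in the text; only the 0.9 similarity threshold is taken from the study.

```python
MONA_LEVEL_2A_THRESHOLD = 0.9  # MoNA individual similarity threshold from the text


def confidence_level(mona_individual_similarity: float,
                     confirmed_with_reference_standard: bool = False) -> str:
    """Simplified mapping from available evidence to a confidence level."""
    if confirmed_with_reference_standard:
        return "Level 1"
    if mona_individual_similarity > MONA_LEVEL_2A_THRESHOLD:
        return "Level 2a"
    return "Level 3"


print(confidence_level(0.998))
print(confidence_level(0.998, confirmed_with_reference_standard=True))
print(confidence_level(0.0))
```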

Prescreening and quality control
Preliminary manual inspection of the data using XCalibur Qual Browser (v.4.2.28.14, Thermo Fisher Scientific, Waltham MA, USA) indicated that not all measurements of each individual m/z were suitable for non-target identification because, e.g., MS1 precursors were often at low intensity, some MS2 spectra were absent, and spikes and/or noise were observed in the MS1 extracted ion chromatogram instead of actual peaks. Therefore, the prescreening workflow consisting of 7 quality control checks (Table 1) was implemented to isolate measurements that were suitable for non-target identification. Figure 2 provides examples of measurements visualised using Shinyscreen which passed all quality control checks (Panel A) and failed one or more checks (Panels B-E), respectively. The latter were automatically eliminated from further consideration by the workflow because they were deemed unsuitable for use in non-target identification. For identification, a total of 185 non-target m/z from both List A and List B were prescreened in each of the 30 mzML files, resulting in 5,550 possible cases for identification. For the 117 m/z in List A measured in positive mode, the prescreening workflow runtime was approximately 8 h over all 30 mzML files on a laptop with 8 GB RAM and 2 physical cores. Runtime was estimated based on timestamps from results file generation.
Of the 5,550 cases, 899 cases satisfied checks 1-5 listed in Table 1. Duplicate cases by m/z (e.g., if it was detected at more than one site) were eliminated by prioritising those with the highest MS1 intensity (check 6), leaving 157 cases (approximately 3% of total cases) to be manually inspected for peak width and shape (check 7, Fig. 2e). Of these 157 cases, only 22 passed manual inspection and qualified for further identification efforts using MetFrag (listed in full in Additional file 1: Table S2). Figure 3 summarises this data reduction outcome as a result of quality control within the prescreening workflow.
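The duplicate elimination (check 6) and the scale of the data reduction can be illustrated with a short sketch. The case structure and function names are hypothetical; only the counts (185 m/z, 30 files, 899, 157 and 22 cases) come from the text.

```python
def keep_most_intense_per_mz(cases: list) -> list:
    """For each list m/z, keep only the case with the highest MS1 intensity.

    Cases are keyed by the m/z value from the input list (not a measured
    m/z), so exact equality is appropriate in this sketch.
    """
    best = {}
    for case in cases:
        mz = case["mz"]
        if mz not in best or case["ms1_intensity"] > best[mz]["ms1_intensity"]:
            best[mz] = case
    return list(best.values())


TOTAL_CASES = 185 * 30  # m/z values times mzML files


def percent_of_total(n_cases: int) -> float:
    """Share of all possible cases, in percent."""
    return 100.0 * n_cases / TOTAL_CASES


for label, n in [("checks 1-5", 899), ("check 6", 157), ("check 7", 22)]:
    print(f"after {label}: {n} cases ({percent_of_total(n):.1f}% of {TOTAL_CASES})")
```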

Tentative identification using MetFrag
Tentative identifications for the 22 m/z that passed quality control checks were obtained using MetFrag. Candidates for each m/z were proposed as ranked lists according to their respective MetFrag Scores comprising the ten scoring terms described in Table 2 (full MetFrag results with lists of ranked candidates available in MassIVE). Figure 4 shows the distribution of MetFrag Scores classified into tertiles for the top-ranked candidate for each of the 22 m/z.

Interpretation of MetFrag results
Given the background and context of this work (i.e. NTA in environmental monitoring to identify high-priority unknowns), the MetFrag results described above do not represent a satisfactory end-point/end-product of this study. In other words, it does not suffice to present MetFrag's outputs (lists of ranked candidates, one list per m/z) alone, as these results alone do not provide sufficient direction for the next regulatory steps. Rather, it is crucial that these scientific outcomes are translated into transparent and actionable information for regulatory scientists to aid their future decision-making with respect to the following questions:

1. What does the distribution of MetFrag Scores mean and what are the implications?
2. How can this information guide evidence-based decision-making regarding further identification efforts? (e.g., by adding candidates to suspect lists for future Suspect Screenings, purchasing reference standards for confirmation, etc.)
The following section addresses these two questions through in-depth interpretation of MetFrag's results at two levels: at a global level across all 22 m/z studied, and at a candidate level per m/z, respectively. The aim of these interpretations is to deliver information based on scientific premises that is actionable from a regulatory point of view and in doing so, present 'complex' MetFrag results in an interpretable way using Scenario Analysis.
Fig. 2 Examples of cases which pass and fail quality control within the prescreening workflow. Quality control helped isolate measurements which were suitable for non-target identification and discarded those which were not. Panel A shows Shinyscreen's graphical user interface and an example of a case whose MS1-MS2 measurement is suitable for non-target identification: its extracted ion chromatogram shows an MS1 peak of sufficiently high intensity, a corresponding MS2 event that is temporally well-aligned, and its MS2 spectrum. The remaining panels show examples of cases that were eliminated from further identification efforts by the workflow as they were deemed unsuitable, for example due to an excessively noisy MS1 spectrum (B; check 3 in Table 1).
Regarding the MetFrag Scores of the top candidates for each m/z (Fig. 4), this distribution arises as a result of four possible combinations of Spectral and Metadata Score components contributing toward the final MetFrag Score (Table 3). The distribution is split into tertiles based on the range of MetFrag Scores possible (0-9), and each tertile is assigned an associated scenario, as explained below.
Scenario 1 features both strong spectral and metadata evidence supporting a given candidate, resulting in a High MetFrag Score. Moderate MetFrag Scores result when one of these two scoring components, Spectral or Metadata, is low and the other is high, leading to Scenarios 2 and 3. Finally, Scenario 4 describes situations where both Spectral and Metadata scores are low, resulting in Low MetFrag Scores. Table 4 shows the breakdown of the MetFrag Score into its component Spectral and Metadata terms for four illustrative examples, one for each scenario. These representative examples were selected from the distribution (Fig. 4) and are the respective top-ranked candidates for 4 m/z.
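The tertile and scenario logic described above can be summarised in a small sketch. The tertile boundaries at 3 and 6 follow from splitting the 0-9 score range into thirds; which tertile a score exactly on a boundary falls into, the function names, and the numbering of Scenario 3 as "high Spectral, low Metadata" (inferred by contrast with Scenario 2) are assumptions of this sketch.

```python
MAX_METFRAG_SCORE = 9.0  # maximum possible score with the ten terms used here


def score_tertile(score: float) -> str:
    """Classify a MetFrag Score into Low (0-3), Moderate (3-6) or High (6-9)."""
    if not 0.0 <= score <= MAX_METFRAG_SCORE:
        raise ValueError("score outside the possible range 0-9")
    if score < 3.0:
        return "Low"
    if score < 6.0:
        return "Moderate"
    return "High"


def scenario(spectral_high: bool, metadata_high: bool) -> int:
    """Map the strength of spectral and metadata evidence to Scenarios 1-4."""
    if spectral_high and metadata_high:
        return 1
    if metadata_high:   # low Spectral, high Metadata
        return 2
    if spectral_high:   # high Spectral, low Metadata (assumed numbering)
        return 3
    return 4


print(score_tertile(7.2), scenario(True, True))
print(score_tertile(4.1), scenario(False, True))
```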
The implications of this distribution (Fig. 4) can guide future actions depending on whether depth or breadth of the NTA study is more important. For example, if the ultimate goal is to fully identify one or two high-priority non-target unknowns to Level 1 confidence, pursuing candidates with High MetFrag Scores (3rd tertile, dark red region in Fig. 4, Scenario 1 in Table 3) is recommended. Alternatively, if gaining a wide survey of the possibly relevant but as yet unknown environmental pollutants throughout the sampling campaign is preferred (akin to a 'first-approximation' of the situation), then candidates with moderate and/or low scores can also be considered further depending on the relevance of the scoring terms to the context. Additionally, further decisions on future actions can be made based on possible limitations of the study which may be known from the outset (see Discussion).
Close inspection of the MetFrag Score, namely its component spectral and metadata scoring terms, enables results interpretation on the individual candidate level for each m/z. Irrespective of whether a breadth or depth strategy is chosen, the lists of ranked candidates should always be scrutinised for plausibility because although each identification has a top candidate ranked first by MetFrag, the top candidate may not be the only candidate worth considering (if at all) given the context of the study. Below, an in-depth analysis and interpretation of the top 4 candidates for selected m/z are presented in tables as examples of each of the scenarios (Table 3). Distributed Structure-Searchable Toxicity Substance Identifiers from CompTox, known as DTXSIDs, are given as identifiers. The choice to use DTXSIDs as candidate identifiers and not their compound names is addressed in the Discussion.
For the Scenario 1 example (high Spectral and high Metadata scores), thirty-three compounds with matching mass were retrieved from CompTox and scored by MetFrag using the ten scoring terms (Table 2).
The top-ranked candidate, DTXSID4058156, has the highest total MetFrag Score out of all the candidates proposed (Table 5). In terms of spectral information, it has the highest FragmenterScore and OfflineMetFusion score of all the candidates, as well as a MoNA library match of 0.998, while all other candidates had a MoNA library match of 0.
In terms of metadata and presence in suspect lists, DTXSID4058156 has abundant metadata, is present on many suspect lists compiled by the NORMAN Network (REACH2017, SusDat and KEMIMARKET), and has 47 underlying data sources in CompTox. Based on this evidence, this identification has confidence level 2a.
Overall, both the spectral and metadata evidence strongly support Candidate 1 over the others, as seen in the large difference between the candidates' MetFrag Scores.
Candidate recommendation: Candidate 1 should be strongly considered for further identification efforts.
A reference standard of DTXSID4058156 (metazachlor) provided a retention time match within 0.03 min, thereby confirming the identification of this unknown as metazachlor with Level 1 confidence.

Scenario 2: low Spectral but high Metadata scores (moderate MetFrag Score; 3-6)
For m/z 187.0938, identified as an [M + H]+ adduct by enviMass, the top candidate scored poorly in the Spectral terms compared to subsequent candidates. However, its strong scoring in the metadata terms ultimately drove its high MetFrag Score (Table 6).
The distribution of MetFrag Scores in Table 6 indicates that the top 3 (or even 4) candidates have relatively similar scores. Although the spectral data rather support Candidates 2 or 3 as better matching the experimental data, the high KEMIMARKET_EXPO score for Candidate 1 indicates that it may be of greater concern in a regulatory context due to the potentially large exposure volumes, and could be considered for further confirmation efforts to eliminate this from consideration in future campaigns. Candidate recommendation: All top four candidates should be considered for further identification efforts due to high exposure and hazard scores.

m/z 249.0728
Additional example for Scenario 2: low Spectral but high Metadata scores (moderate MetFrag Score; 3-6)
The information provided by high Metadata scores can serve as the discriminating factor between candidates when their Spectral scores yield little or poor information and would therefore give little indication of how to rank the candidates if only spectral evidence were considered. In this sense, Metadata scoring terms contribute an extra layer of information beyond spectral evidence towards identifying potentially relevant unknowns.
For example, the top four candidates of m/z 249.0728 (Table 7) have comparably poor Spectral scores, meaning there is little spectral evidence overall supporting these identifications. However, Candidate 1 distinguishes itself significantly from the other candidates because of its relatively high Metadata scores, in particular its KEMIMARKET_EXPO and KEMIMARKET_HAZ scores and its presence in REACH2017. Therefore, it has higher environmental relevance than subsequent candidates, which explains its top ranking.

Scenario 3: high Spectral scores but low Metadata scores (moderate MetFrag Score; 3-6)
For the top candidates of m/z 152.0198, practically no metadata exists except for DATA_SOURCES, for which each candidate has a value of 1, indicating that these are not particularly well-known chemicals (or are potentially newly discovered and not yet well documented in public databases). However, the FragmenterScores of the candidates differed sufficiently to discriminate between them and indicate that Candidate 1 may be the best match in this case (Table 9). Candidate recommendation: Candidate 1 may be considered for further identification efforts, but candidates for other masses are more promising in the regulatory context (Table 10).

Scenario 4: low Spectral scores, low Metadata scores (low MetFrag Score; < 3)
Candidates proposed for m/z 199.1050 had neither particularly strong spectral nor metadata information, resulting in low overall MetFrag Scores. In this case, there is no strong evidence that any of the candidates available in CompTox are of particular interest in the context of the investigation. Candidate recommendation: Candidate 1 may be considered for further identification efforts, but candidates for other masses are more promising. Table 11 summarises the candidate recommendations presented above, where 7-9 candidates are recommended for further identification efforts for the 6 m/z presented here.

Information for regulatory decision-making on further identification efforts/next steps
The top four candidates for each of the remaining 16 m/z were analysed in the same way as discussed above, and candidates were evaluated based on the same criteria as described: prioritisation according to tertile, scenario, and Spectral and Metadata scores, including potential exposure and hazards (Additional file 1: Tables S3-S18). For these 16 m/z, a total of 25-49 candidates (out of possible 16 times 4 = 64) are recommended for further identification efforts (Table S19). Thus, for all the 22 m/z which underwent MetFrag identification in this study, an overall total of 32-58 candidates (out of possible 22 times 4 = 88) are recommended for further identification efforts. These candidate numbers are provided as ranges to allow for flexibility in project management and future steps, which may depend on available resources (see Discussion).
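The candidate totals quoted above can be checked with a few lines of arithmetic. The following is only a bookkeeping sketch: the function name `combine_ranges` is hypothetical, and the only inputs are the ranges and counts stated in the text (7-9 candidates for the 6 example m/z, 25-49 for the remaining 16 m/z, and four top candidates examined per m/z).

```python
def combine_ranges(*ranges):
    """Add (low, high) candidate ranges element-wise."""
    low = sum(r[0] for r in ranges)
    high = sum(r[1] for r in ranges)
    return low, high


example_mz = (7, 9)       # recommended candidates for the 6 m/z discussed in detail
remaining_mz = (25, 49)   # recommended candidates for the remaining 16 m/z

overall = combine_ranges(example_mz, remaining_mz)
max_possible = 22 * 4     # top four candidates for each of the 22 m/z

print(overall, "out of", max_possible)
```

The lower and upper bounds add independently, which is why the overall figure is reported as the range 32-58 rather than as a single number.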

Discussion
In this study, non-target analysis was performed retrospectively on samples from Swiss WWTP effluents that had been collected as part of an existing regulatory environmental monitoring campaign. Instead of an exploratory approach that is still common amongst NTA studies, the research questions that directed this study were derived from regulatory priorities, thereby ensuring outcomes of direct and immediate relevance for environmental monitoring and protection. Unknowns of regulatory interest were defined as those with the highest intensities and highest temporal frequency with point sources across all the samples of the sampling campaign. These criteria had been predefined by the regulatory coauthors of this study, and resulted in a list of m/z of interest that were manually selected after filtering and sorting the masses using enviMass. In the current work, the mass spectra of the m/z of interest from the given list were subjected to pre-screening and quality control (Fig. 2) to ensure their suitability for use in non-target identification. Quality control isolated measurements worthy of further identification efforts and eliminated those of poor standard, effectively resulting in data reduction (Fig. 3). The prescreening workflow was written in R and is now openly available within the package Shinyscreen [24].
Then, MetFrag [51,68] was employed to provide tentative identifications for these unknowns, leveraging its extensive metadata capabilities "post-relaunch", as well as several open resources/information sources, including chemical information from regulators around the world. MetFrag analysis was performed via the command line using scripts based on ReSOLUTION [55] and RChemMass [56].
Tentative identifications for 22 m/z were obtained using MetFrag (21 at Level 3, 1 at Level 2a, whose identity was eventually confirmed to Level 1). These identifications were evaluated in terms of (i) a score distribution for the top candidates (Fig. 4) and (ii) Scenario Analysis (Table 3) according to the regulatory context and research questions underlying this work. Final candidate recommendations were given based on MetFrag Score breakdowns, thereby providing in-depth and transparent analyses of the spectral and metadata evidence for proposed candidates. For the 22 m/z analysed, 32-58 candidates were recommended for further identification efforts.
Regarding the analytical method, direct injection without enrichment was used here, as non-target compounds of high intensity were of primary interest and enrichment was not considered necessary. Additionally, Mechelke et al. recently found that direct injection is comparatively better suited to capturing a broader range of compounds, including highly polar compounds that would otherwise experience poor recovery during enrichment [38]. The spectral data were recorded using data-dependent acquisition mode with an inclusion list in this study. While future NTA work could explore the use of data-independent acquisition (DIA), which removes the need for an inclusion list, this adds other complexities: lower-intensity precursors may not yield fragments of sufficient intensity, and data interpretation inevitably becomes more complicated, especially if complex matrices like wastewater with many co-eluting compounds are being studied. Quality control was a critical element in the prescreening workflow, as preliminary manual inspection of the data using XCalibur revealed variable data quality. In fact, most data (> 80% of cases) were not fully suitable for the intended non-target identification. R scripts (now embedded within the Shinyscreen package) were written to automate most of the quality control checks (Table 1, checks 1-5). Automated quality control allowed for quick and reproducible processing of the large quantity of data needed to answer the superlative research questions guiding this work. The variable quality of the data had several likely causes: (i) List B masses were not in the inclusion list; (ii) MS2 were not measured immediately after MS1, so sample degradation over the long storage time between MS1 and MS2 measurements could have occurred; and (iii) the enviMass prioritisation criteria were possibly over-restrictive. Thus, the small number of cases (~0.03% of total) passing all quality control checks and qualifying for MetFrag identification was not unexpected.
MetFrag was configured to comprise both Spectral and Metadata scoring terms, including chemical suspect lists and scoring terms from international regulators within the latter such as KEMIMARKET_EXPO, KEMIMARKET_HAZ, REACH2017, NORMANSUSDAT, and CPDAT_COUNT. Paired with CompTox as its candidate database, MetFrag was thus specifically customised to perform non-target identification of environmental unknowns in WWTP samples within a regulatory context in this work. Beyond using fragmentation information alone, using metadata to inform MetFrag's identifications proved to be especially important in certain situations, e.g., when Spectral scores based on fragmentation were not informative enough to distinguish candidates from each other (Tables 7 and 8). Crucially, the information provided by metadata can serve as guidance for future regulatory actions in the context of the environmental protection aims of this study. For example, although certain candidate(s) may not be top-ranked or have strong spectral evidence (Table 6), potentially concerning hazard and exposure scores may qualify a certain candidate for serious consideration in future work in the spirit of applying the Precautionary Principle.
Regarding the components of the MetFrag Score, a total of ten scoring terms, three Spectral and seven Metadata, were used to score candidates. Compared to most previous studies which used MetFrag as mentioned in the Introduction, this number may seem large. However, adding extra scoring terms does not appear to compromise MetFrag's identification capabilities. In fact, the additional scoring terms were beneficial because further bases for differentiating between candidates became available. In other words, using more scoring terms can provide more granularity when distinguishing candidates, which is important for candidate evaluation and recommendation. Further scoring terms based on physical-chemical properties could be integrated in the future, such as correlation of the partitioning coefficient log Kow (or log P) with retention time, as already available in MetFrag [51]. While such scoring criteria would help filter out any unrealistic candidates based on objective criteria like ionisability and polarity, insufficient information was available to perform retention time correlation via MetFrag in this study.
With respect to the individual terms, CPDAT_COUNT, INDACT, and OfflineIndividualMoNA proved to be relatively uninformative in this particular study, evidenced by their frequent zero-value scores. As a database containing consumer chemical products ranging from those used in home maintenance (paints, sealants, lubricants, cleaners, etc.) to personal care products (hair gel, nail polish, face cream, makeup, etc.), CPDAT's limited applicability in wastewater studies such as the present one is unsurprising, and it instead may be more suitable for exposomics studies involving, e.g., household dust. INDACT, the list of industrial activity chemicals known to be used in the vicinity of the WWTPs as disclosed to the regulator, had the strongest potential to improve the identification results. However, not a single candidate across all the MetFrag results was present on this suspect list, which could suggest that the chemical disclosures made by the industries were incomplete or unsuitable for identification purposes (e.g., parent compounds were disclosed but possibly only transformation products are present in the environment/are detectable, UVCBs with unspecific chemical identities, etc.), and/or that the disclosed compounds inherently do not end up in wastewater because they are used in closed circuits, are recycled, or partition into sludge if they are very non-polar. Lastly, because mass spectral libraries are inherently incomplete [44], a low OfflineIndividualMoNA score does not necessarily indicate a poor spectral library match. Rather, low OfflineIndividualMoNA scores could also signify that the candidate is not present within MoNA to begin with, or result from noisy experimental spectra even if the match would otherwise be good. Therefore, evaluating candidates on this scoring term alone must be done with these factors in mind, and improvements to its design to avoid possible faulty interpretations could constitute future work.
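How the ten terms can combine into one score, split into Spectral and Metadata contributions, is illustrated by the minimal sketch below. It is not MetFrag's code: it assumes that every term has a weight of 1 and is scaled by the maximum value found among the candidates for the same m/z, and the function name, candidate labels and raw values are hypothetical. The term names are those mentioned in the text.

```python
SPECTRAL = ["FragmenterScore", "OfflineMetFusion", "OfflineIndividualMoNA"]
METADATA = ["KEMIMARKET_EXPO", "KEMIMARKET_HAZ", "REACH2017", "NORMANSUSDAT",
            "CPDAT_COUNT", "INDACT", "DATA_SOURCES"]


def score_candidates(raw):
    """raw maps candidate id -> {term: raw value}; missing terms count as 0.

    Each term is divided by its maximum over all candidates (assumed scaling),
    then the scaled terms are summed with equal weights (assumed weights).
    """
    terms = SPECTRAL + METADATA
    maxima = {t: max(c.get(t, 0.0) for c in raw.values()) for t in terms}
    result = {}
    for cid, values in raw.items():
        scaled = {t: values.get(t, 0.0) / maxima[t] if maxima[t] > 0 else 0.0
                  for t in terms}
        spectral = sum(scaled[t] for t in SPECTRAL)
        metadata = sum(scaled[t] for t in METADATA)
        result[cid] = {"spectral": spectral, "metadata": metadata,
                       "total": spectral + metadata}
    return result


# Hypothetical raw values for two candidates of one m/z (not data from this study).
raw = {
    "cand_A": {"FragmenterScore": 120.0, "OfflineMetFusion": 2.0,
               "OfflineIndividualMoNA": 0.9, "REACH2017": 1.0, "DATA_SOURCES": 40.0},
    "cand_B": {"FragmenterScore": 60.0, "OfflineMetFusion": 1.0, "DATA_SOURCES": 10.0},
}
scores = score_candidates(raw)
```

Keeping the Spectral and Metadata subtotals alongside the total mirrors the score breakdowns used for candidate evaluation: two candidates with similar totals can be told apart by which kind of evidence supports them.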
Other future work on MetFrag itself could involve the addition of new Spectral scoring terms which do not require scaling via normalisation of the maximum value, as this maximum value is highly dependent on the candidate database chosen. For instance, a simple spectral similarity metric such as cosine similarity would evaluate how well the in silico and experimental fragmentation spectra align, independent of those of other candidates.
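A cosine similarity of this kind could be sketched as follows. This is an illustration only, not an existing MetFrag scoring term: the fixed-width m/z binning, the bin width of 0.01 and the function names are assumptions, and peaks lying close to a bin boundary may fail to match in such a simple scheme.

```python
import math


def bin_spectrum(peaks, bin_width=0.01):
    """peaks: list of (mz, intensity) pairs. Sum intensities per m/z bin."""
    binned = {}
    for mz, intensity in peaks:
        key = round(mz / bin_width)
        binned[key] = binned.get(key, 0.0) + intensity
    return binned


def cosine_similarity(peaks_a, peaks_b, bin_width=0.01):
    """Cosine of the angle between two binned spectra (0 = no overlap, 1 = same shape)."""
    a = bin_spectrum(peaks_a, bin_width)
    b = bin_spectrum(peaks_b, bin_width)
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical peak lists: an experimental spectrum and an in silico spectrum
# that explains only one of its two fragments.
experimental = [(100.0, 3.0), (200.0, 4.0)]
in_silico = [(100.0, 3.0)]
similarity = cosine_similarity(experimental, in_silico)
```

Because the value depends only on the two spectra being compared, it would not change if a different candidate database returned a different set of competing candidates.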
CompTox, the candidate database chosen here, remains one of the most environmentally-focused open databases of chemical compounds as it exclusively contains chemicals of environmental and toxicological relevance.
Compared to other open databases like PubChem (111 million chemical structures, August 2020), CompTox is also smaller in size (883,000 chemicals, February 2021). Therefore, MetFrag paired with CompTox is likely to suggest smaller lists of candidates which are de facto environmentally-meaningful, making workflow runtimes shorter and candidate evaluation relatively easier. However, using CompTox has drawbacks, principally stemming from its lack of comprehensiveness when compared to PubChem. In some cases, there may be a lack of candidates matching the identification criteria when using CompTox with MetFrag simply because they may not exist within CompTox itself to begin with due to its limited size and scope. PubChemLite [55,56,58] represents one complementary alternative to these issues, as it is by design essentially a subset of environmentally relevant compounds based on compound classifications. Overall, the ability to subset databases based on usage and classification information of chemicals can be beneficial, as different regulatory bodies may have different mandates, and studies can be designed to align with those mandates accordingly, e.g., focus only on chemicals with (i) known usage in industrial manufacturing, or (ii) agricultural chemicals, or (iii) pharmaceuticals, etc.
Using scenarios as a framework to interpret MetFrag's results was critical considering the specific regulatory aims of this work: tentatively identify pollutants of high priority (with minimum Level 3 confidence) to guide further monitoring and identification efforts.
Scenario Analysis revealed in detail whether Spectral, Metadata, or both contributed to a given MetFrag Score and in turn provided the rationale behind proposed candidates. As our evaluation has shown, multiple candidates are worth considering especially if they have very similar scores (e.g., Table 6), or have more compelling evidence represented by individual scoring terms as described above. In this way, Scenario Analysis as used here is highly suitable for transparently communicating scientific results in a regulatory context. On a larger scale, such analyses address a key weakness common to NTA studies: the current lack of ability to perform detailed data interpretation, especially in a high-throughput, automatable and reproducible manner.
Furthermore, Scenario Analysis as used here can inform decision-making regarding the next steps. Besides addressing study priorities based on "depth vs. breadth" as discussed in the Results, the scenarios can be used to devise a prioritisation scheme for future work. For example, if authentic standards can only be purchased/analysed for 10 compounds due to resource limitations, those compounds should be the recommended candidates with MetFrag Scores from Scenario 1 > Scenarios 2/3 >>> Scenario 4. Alternatively, if it is known from the outset that spectral data may be of poor quality, Scenario 2 candidates may take precedence over Scenario 3 candidates, as the former rely on high Metadata scores and not high Spectral scores for their high MetFrag Scores. Additionally, applying the precautionary principle may motivate prioritising identity confirmations of candidates with concerning metadata like high toxicity and/or exposure (corresponding to KEMIMARKET_HAZ and KEMIMARKET_EXPO scores), even if those candidates are not necessarily ranked highly by MetFrag.
Practically speaking, next steps in environmental monitoring based on the results here (besides identity confirmation using authentic standards) could include expanding suspect lists using the recommended candidates to improve future suspect screening activities. These new suspects could in turn be added to the inclusion lists of future measurements, thereby already gaining an analytical 'upper hand' for future NTA studies. Expanding suspect and inclusion lists in this way, possibly in combination with using a rarity score [26] that prioritises high intensity, infrequently occurring peaks, represents an evidence-based approach towards more meaningful environmental monitoring in the long run, as these candidate compounds were tentatively 'observed' and are therefore site-specific. Otherwise, suspect lists are typically expanded based on information from national or international chemical registration lists, whose applicability may be limited depending on the actual usage/exposure in the region of concern. Therefore, an additional outcome of this study is a means to bridge target and non-target analysis by supplying meaningful candidates for suspect screening.
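The prioritisation scheme described above can be sketched as a simple sort. In this illustration the scenario of each recommended candidate is taken as already assigned, the candidate labels and scores are hypothetical, and breaking ties within a scenario tier by descending MetFrag Score is an assumption rather than a rule stated in this study.

```python
def prioritise(candidates, n_standards=10, poor_spectra=False):
    """Pick candidates for identity confirmation with authentic standards.

    candidates: list of dicts with keys 'id', 'scenario' (1-4) and 'score'.
    Default tiers: Scenario 1, then Scenarios 2 and 3 together, then Scenario 4.
    With poor_spectra=True, Scenario 2 is ranked ahead of Scenario 3.
    """
    if poor_spectra:
        tier = {1: 0, 2: 1, 3: 2, 4: 3}
    else:
        tier = {1: 0, 2: 1, 3: 1, 4: 2}
    ordered = sorted(candidates, key=lambda c: (tier[c["scenario"]], -c["score"]))
    return [c["id"] for c in ordered[:n_standards]]


# Hypothetical recommended candidates (not results from this study).
recommended = [
    {"id": "A", "scenario": 4, "score": 2.5},
    {"id": "B", "scenario": 3, "score": 5.5},
    {"id": "C", "scenario": 2, "score": 4.0},
    {"id": "D", "scenario": 1, "score": 7.2},
]
default_order = prioritise(recommended, n_standards=3)
metadata_first = prioritise(recommended, n_standards=3, poor_spectra=True)
```

A precautionary variant could add a further key to the sort, for example promoting any candidate whose hazard or exposure score exceeds a threshold chosen by the regulator.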
This work is one contribution to a much larger discussion surrounding (i) how NTA can support regulatory environmental monitoring, and (ii) the practical feasibility of applying NTA in routine environmental monitoring. (For an example of current discourse, see Germany's guidelines for non-target screening in water analysis [52].) Regarding the former, this work demonstrates that NTA can be used to address the concerns of regulators by translating research questions arising from regulatory priorities into peak-picking/mass prioritisation criteria: in this case, high concentration unknown pollutants with point sources that occurred persistently were taken to be high-intensity precursors found at one or few sampling sites at both sampling time points. Without the ability to perform quantification, the assumption that high ion intensity represents high concentration could be validated by using different chromatographic solvent systems as a test of ionisation efficiency in future work, or implementing ionisation efficiency models [32,46].
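The translation of the regulatory question into mass prioritisation criteria can be made concrete with a small filter. This is a hedged sketch and not the enviMass procedure used in this study: the intensity threshold, the limit of two sites, the time point labels and all detection records are hypothetical, and features are assumed to have been aligned to a common m/z beforehand.

```python
from collections import defaultdict


def masses_of_interest(detections, min_intensity=1e6, max_sites=2,
                       time_points=("t1", "t2")):
    """detections: iterable of (mz, site, time_point, intensity) records.

    Keeps an m/z if it is intense at no more than max_sites sites (point source)
    and at least one of those sites shows it at every time point (persistence).
    """
    seen = defaultdict(lambda: defaultdict(set))  # mz -> site -> time points
    for mz, site, time_point, intensity in detections:
        if intensity >= min_intensity:
            seen[mz][site].add(time_point)
    selected = []
    for mz, sites in seen.items():
        persistent = [s for s, tps in sites.items() if set(time_points) <= tps]
        if persistent and len(sites) <= max_sites:
            selected.append(mz)
    return sorted(selected)


# Hypothetical detections (made-up m/z values, sites and intensities).
detections = [
    (150.0, "site_A", "t1", 5e6), (150.0, "site_A", "t2", 4e6),  # kept
    (250.0, "site_A", "t1", 5e6), (250.0, "site_A", "t2", 5e6),
    (250.0, "site_B", "t1", 5e6), (250.0, "site_C", "t1", 5e6),  # too widespread
    (350.0, "site_B", "t1", 6e6),                                # one time point only
    (450.0, "site_C", "t1", 1e4), (450.0, "site_C", "t2", 1e4),  # too weak
]
selected = masses_of_interest(detections)
```

Since intensity stands in for concentration here, the thresholds inherit the caveat discussed above about differing ionisation efficiencies.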
On the feasibility of performing NTA as part of routine regulatory environmental monitoring, the overall method described here offers a highly automated approach via (i) feature prioritisation with enviMass, (ii) prescreening and quality control (plus a manual step), and (iii) in silico identification, of which (ii) and (iii) were developed in this work. The results interpretation and candidate recommendation processes performed manually in this work form the basis of future efforts towards automated reporting based on Scenario Analysis, MetFrag Score distributions, and evaluation of critical parameters like thresholds for potential toxicities and exposure levels. Such automated reporting would not only allow scalability of future regulatory NTA studies, but could also eliminate potential biases in unknown identification: analysts would not be able to 'cherry-pick' candidates based on their familiarity with certain compounds because undescriptive identifiers, e.g., DTXSIDs, would be used until the final results are delivered at the end of the entire method. Furthermore, while the prescreening, quality control, and identification workflow was applied retrospectively, the improvements to workflow automation detailed here could allow for quicker data analysis turnaround in the future, which would help guide future sampling and measurements planned in the short to medium term and prevent the long delays between remeasurements still commonly observed in NTA investigations, effectively moving towards 'real-time' instead of retrospective NTA approaches. Two concrete follow-up initiatives are foreseen: (i) build an interface connecting Shinyscreen and MetFrag, including automated reporting features as previously described, and (ii) develop a set of 'default' scoring terms and settings tailored for NTA of wastewater samples.
Further collaborations involving non-target wastewater studies and database hosts will help augment expert knowledge on more use cases, which would be leveraged to develop this approach further.
On a community level, standardisation would play a role in increasing the feasibility of NTA as part of routine regulatory environmental monitoring. As previously mentioned, there exist considerable, albeit nascent, efforts towards standardising analytical protocols for non-target screening on a national level in, e.g., Germany in the form of guidelines [52]. Such activities suggest that standardisation is certainly a priority for the community and may be achievable over time. However, NTA may not be widely adopted by regulators in the short to medium term until analytical protocols are successfully standardised. In turn, it continues to be challenging from a data analysis perspective to implement standardised workflows if the analytical parameters used for measuring data are not themselves standardised. Thus, the status quo demands that current data processing methods remain flexible to accommodate the variety of analytical parameters used, as is the case with the method presented here.

Conclusions
A prescreening and identification workflow for analysing non-target compounds was developed in this study to retrospectively identify unknowns detected in WWTP sites in the context of directly supporting regulatory decision-making for environmental monitoring. Using Open data and Open tools including the US EPA CompTox Chemicals Dashboard, NORMAN Network resources such as SusDat and the Suspect List Exchange, and MetFrag, tentative identifications for 21 unknown compounds were provided at Level 3 confidence, and 1 compound's identity was confirmed using a reference standard giving a Level 1 identification. These results were achieved despite limited data quality.
This study heavily emphasised results interpretation on two levels: on a global level across the chemical unknowns investigated, and on an individual candidate level. Through these analyses, specific candidates were recommended for further identification efforts, and transparent justifications were provided based on the MetFrag score breakdown (i.e., spectral vs. metadata evidence). These recommendations, and not just MetFrag's outputs, represent the final results in the regulatory and environmental monitoring context of this study, and may serve as a template to drive future developments in NTA.
The prescreening and quality control workflow developed here is embedded in the open R package Shinyscreen [24], which is freely available online, as is code from ReSOLUTION [55] and RChemMass [56] used for performing command-line MetFrag identification. The CompTox database version with the metadata terms used here is likewise publicly available [54].
Figure S1. Screenshot of the MSConvert (v.3.0.19182-51f676fbe) Graphical User Interface showing settings used to convert the .RAW mass spectrometry data to .mzML format.
Table S2. List of 22 m/z which had been prioritised by enviMass and passed Quality Control to qualify for MetFrag identification.