A European proposal for quality control and quality assurance of tandem mass spectral libraries

High resolution mass spectrometry (HRMS) is being used increasingly in the context of suspect and non-targeted screening for the identification of bioorganic molecules. There is correspondingly increasing awareness that higher confidence identification will require a systematic, group effort to increase the fraction of compounds with tandem mass spectra available in central, publicly available resources. While typical suspect screening efforts will only result in tentative annotations with a moderate level of confidence, library spectral matches will yield higher confidence or even full confirmation of the identity if the reference standards are available. This article first explores representative percent coverage of measured tandem mass spectra in selected major environmental suspect databases of interest in the context of human biomonitoring, demonstrating the current extensive gap between the number of potential substances of interest (up to hundreds of thousands) and measured spectra (0.57–3.6% of the total chemicals have spectral information available). Furthermore, certain datasets are benchmarked, based on previous efforts, to show the extent to which acquired experimental data were comparable between laboratories, even with HRMS instruments based on different technologies (i.e., quadrupole–quadrupole-time of flight versus ion trap/quadrupole-Orbitrap). Instruments and settings that are less comparable are also revealed, primarily linear ion trap instruments, which show distinctly lower comparability. Based on these efforts, harmonization guidelines for the acquisition and processing of tandem mass spectrometry data are proposed to enable European (and ideally worldwide) laboratories to contribute to common resources, without requiring extensive changes to their current in house methods.

Background

Detection, annotation and identification
The goal of suspect and non-targeted analysis is to provide extensive qualitative information on the chemical composition of a sample. These analytical approaches are possible because of recent technologies and instrumentation, which are capable of generating large amounts of chemical information from low sample amounts. In particular, high-resolution mass spectrometry (HRMS) is one of these innovative technologies for large-scale and high-throughput profiling of complex samples [1]. However, assigning chemical identities to a set of mass spectrometric signals is not trivial and requires strongly consolidated data processing as well as appropriate quality assurance (QA) and quality control (QC) procedures. Particularly the confidence of the assigned chemical identities (annotations) is a crucial issue [2]. As the aim is typically to confirm the presence of as many compounds as possible, adequate strategies have to be used in reporting the results, both for research and in the context of regulatory use. While QA/QC aspects are well established in the field of conventional targeted methods, these are currently less developed for non-targeted analyses, although these are discussed actively in both the metabolomics and environmental communities (e.g., [2,3]).
This article explores possible strategies for QA/QC of tandem mass spectral libraries and databases, especially suitable for the annotation of chemicals of emerging concern in the context of human biomonitoring (specifically within the Human Biomonitoring for Europe project, HBM4EU, http://www.hbm4eu.eu) and environmental monitoring (initiatives originating from the NORMAN Network, http://www.norman-network.net). It reflects on existing initiatives and approaches that can be used to assign well-defined confidence levels to annotated biomarkers (either of exposure or effect), to check the quality of existing sets of tandem mass spectra data, and for acquiring new experimental data, as well as any gaps that may need to be addressed.
For this article, a few definitions are clarified here, with further definitions given in the Glossary below. Detection refers to the collection of compound-specific data by instrumental analysis. For chromatography coupled to mass spectrometry, this collected data may include retention times, mass-to-charge ratios (m/z) of molecular ions, adducts and possibly fragment ions as well as the presence and relative abundances of fragment ions and isotopologues. Annotation is the act of linking a detected mass spectrometric feature with a chemical identity, taking into account the detected chromatographic and spectrometric characteristics. Identification is the process of proving or verifying that the annotated compound is indeed the proposed chemical (i.e., the annotation can be confirmed).
Annotation is generally performed using analytical evidence of the measured dataset alone [1][2][3], along with additional supporting evidence (e.g., experimental context [2] and metadata [4][5][6]). In contrast, identification is generally accomplished by comparing measured data sets (e.g., using reference standards), where one set of features is obtained from the analysis of an unknown compound; the other from a reference standard of known identity. In this context, defining objective metrics for confirmation of identity is a challenging task. Ideal metrics should minimize the number of false positive and false negative identifications (see below).
Sufficient analytical data must be available to enable definitive identification. Like a fingerprint, this evidence should be a unique set of information capable of excluding all other chemical entities from consideration (unequivocal identification). In the case of a mass spectrometry (MS)-based characterization, such a chemical fingerprint can partly be created in silico (e.g., m/z-values of molecular ions, relative abundances of isotopologues given the molecular formula and, to some extent, fragment ions) and/or by analyzing reference standards (e.g., m/z-values of fragment ions, retention times in chromatographic systems). These "fingerprints" (often specific for distinct instrumental settings) are generally stored in databases. While predictive (in silico) methods exist for both fragment (e.g., [7,8]) and retention time information (e.g., [9][10][11][12]), these are not yet sufficiently accurate for unequivocal identification, although these are constantly improving in accuracy. Thus, complete "standards-free" identification is not yet possible for HRMS, although it is becoming increasingly possible to get reasonable annotations using "standards-free" approaches.
From an analytical point of view and depending on the available information for annotation, chemicals can be divided into three categories, which we define as follows for this article (see Fig. 1).

Fig. 1 The chemical space difference between targets, suspects and non-targets/unknowns
Oberacher et al. Environ Sci Eur (2020) 32:43

Targets are compounds ("knowns") that are preselected for analysis in a sample and for which full mass spectrometric reference data, including MS/MS fragmentation and retention time, is available for annotation. The reference data are usually acquired with certified reference standards in house; the reference mass spectra and, in some cases retention times (depending on the database), are stored in mass spectral databases.
Suspects are known compounds ("known unknowns" [13]) that are expected ("suspected") to be present in a sample, but for which either no reference standard (in house) or incomplete mass spectrometric reference data are available, such that unequivocal annotation is not always directly possible. In suspect screening, the reference information could be missing, measured with alternative methods, or predicted with computational tools, and as such is in many cases not of sufficient accuracy to allow reliable annotation, although it may provide supporting evidence. For instance, while MS information can be computed reliably from the structure, the in silico prediction of MS/MS fragmentation and retention time information still needs to be improved.
Both targets and suspects represent subsets of the entire chemical space in a sample (Fig. 1). Suspects can be "converted" into targets by collecting comprehensive mass spectrometric reference data that enables unequivocal identification of the suspect compound (usually reliant on the availability of reference standard compounds). The remaining signals in the sample are generally termed non-targets or unknowns, for which no target or suspect identity can be assigned readily. These require full elucidation and are beyond the scope of this article.
Inspired by European regulatory documents and a classification system originally proposed by Sumner et al. [14] for metabolomics, a classification system tailored for HRMS primarily for the environmental context was proposed by Schymanski et al. [2] in 2014. Level 1 (confirmed structure) describes an identification that has been verified via the appropriate measurement of a reference standard with MS, MS/MS and retention time matching, matching the definition of targets above. A "probable structure" (Level 2) is obtained by unambiguously matching literature or library data (Level 2a) or via diagnostic evidence (Level 2b), where the diagnostic evidence must clearly rule out all other candidate structures. "Tentative candidates" (Level 3) describe the case where the available data provide evidence for possible or likely structure(s), but insufficient information exists for one exact structure only (e.g., positional isomers). Level 4 or 5 identifications are typically "unknowns", where only the molecular formula (Level 4) or exact mass (Level 5) is known. Since initiation of the level system in 2014, several practical cases have evolved within each level and these are, for instance, now encoded into the mass spectral processing software RMassBank [15].
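As a rough illustration of how the level system described above can be encoded, the following Python sketch maps a set of evidence flags to a confidence level. The `Evidence` fields and the order of the checks are assumptions made for illustration only; this is not the RMassBank implementation.

```python
from dataclasses import dataclass


@dataclass
class Evidence:
    """Evidence available for one detected feature (hypothetical field names)."""
    exact_mass: bool = False                # only the exact mass is known
    molecular_formula: bool = False         # an unequivocal formula was assigned
    candidate_structures: int = 0           # number of plausible structures left
    library_match: bool = False             # unambiguous literature/library match
    diagnostic_evidence: bool = False       # evidence ruling out all other candidates
    reference_standard_match: bool = False  # MS, MS/MS and retention time matched


def confidence_level(e: Evidence) -> str:
    """Return the identification confidence level ('1', '2a', '2b', '3', '4', '5')."""
    if e.reference_standard_match:
        return "1"   # confirmed structure
    if e.library_match:
        return "2a"  # probable structure by library spectrum match
    if e.diagnostic_evidence:
        return "2b"  # probable structure by diagnostic evidence
    if e.candidate_structures >= 1:
        return "3"   # tentative candidate(s)
    if e.molecular_formula:
        return "4"   # unequivocal molecular formula
    if e.exact_mass:
        return "5"   # exact mass of interest
    raise ValueError("no usable evidence for this feature")


# A suspect with two remaining positional isomers and no library spectrum
print(confidence_level(Evidence(exact_mass=True, molecular_formula=True,
                                candidate_structures=2)))
```

In practice the decision for Levels 2a and 2b requires expert judgment, so a rule set like this can only pre-sort features for manual review.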

Tandem mass spectral databases: current status
Tandem mass spectral databases are indispensable tools for compound annotation in non-targeted HRMS workflows based on soft-ionization mass spectrometry (typically liquid chromatography (LC)-HRMS) and good matches can yield Level 2a annotations in many cases. Several reviews are available describing the development and application of tandem mass spectral databases [3,16–22]. Typically, a tandem mass spectral database represents an organized collection of tandem mass spectral data within a management system. The database management system enables the user, or other applications, to interact with data within the database itself. The spectra in tandem mass spectral databases are acquired by the analysis of reference standards. Since a fragmentation spectrum can look different depending on the excitation process (e.g., resonant vs. non-resonant) as well as the collision energy applied to the parent ion, state-of-the-art databases include sets of compound-specific spectra that were acquired by applying different collision energy settings, as well as different instruments [23,24]. Fragmentation is typically accomplished by collision-induced dissociation (CID) or higher-energy collisional dissociation (HCD). Usually, the spectral information is processed prior to storage in a library. Curation efforts may include manual inspection of mass spectra by experienced mass spectrometrists, noise and artifact removal, recalibration of spectra and peak annotations, as well as inter-library comparisons [15,23,25–27]. In some databases, such as the Human Metabolome Database (HMDB [28]) and MassBank of North America (MoNA, [29]), experimental data are now complemented with in silico-generated spectra.
In 2016, the overlap of compounds with tandem mass spectra from authentic reference standards in most public and commercial databases was evaluated by Vinaixa et al. [16]. A total of 27,622 unique compounds were present across all databases. Among the 7127 compounds in the four open databases HMDB 3.0 [30], MassBank [24], the Global Natural Product Social Molecular Networking library (GNPS) [31], and the RIKEN MSn spectral database for phytochemicals (ReSpect) [32], only 18 compounds (< 1%) had at that time at least one form of spectral data in all databases. When comparing all combined open databases versus four commercial ones, only 225 compounds out of 27,622 (< 1%) had at least one form of spectral data in all databases. The ratio of compounds in each database with any type of spectral data in two or more databases was generally > 50%, with the exception of METLIN and GNPS, which only overlapped approximately 35% with other databases in terms of compounds. As there is a relatively low overlap of compounds among existing spectral databases, most scientists currently use multiple databases. Since the 2016 review, many of the databases have expanded their compound coverage immensely and many of the open libraries have cross-imported their spectral records. However, the issue of overlap and coverage of relevant substances in chemical space remains [17].
Conceptually, the premise of spectral library searching is very simple: the fragmentation pattern of a molecule is a reproducible fingerprint of that molecule under a given set of fixed conditions, such that unknown spectra acquired under similar conditions can be identified via spectral matching [33]. Automated spectral library searching involves software with tailor-made search algorithms for tandem mass spectral databases [34][35][36][37]. The search score obtained following the database search represents the likelihood that the searched spectrum corresponds to a given reference spectrum in the mass spectral database. A low score indicates that the experimental fragmentation pattern has low similarity to any stored reference spectrum. A high score indicates significant spectral overlap and, consequently, that the analyte is likely either structurally similar or even identical to the reference compound. Library search should be both sensitive and specific, producing as few false negative and false positive results as possible. Ideally, the scores obtained should be able to distinguish true and false positive matches [18]. To compare with historical targeted methods applied in a regulatory context (e.g., forensic toxicology and food safety), the primary objective of a screening method is to limit the risk of false negatives (e.g., 1%) and to keep to an acceptable risk of false positives (e.g., 5%). The latter should be further reduced by confirmatory analyses using reference standard compounds. Non-targeted methods with consolidated QA/QC thus have to consider and document these issues during the development and evaluation of method performances, as well as in reporting of results. False discovery rates, applied successfully in proteomics for many years [38], are now being developed for small molecule MS/MS [39], but are not yet widely integrated into tandem library software.
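The matching principle can be illustrated with a simple cosine similarity between two centroided spectra. This is a generic similarity measure chosen here for illustration only, not the algorithm of any of the library search tools cited above; the function name, the greedy one-to-one peak pairing and the 0.01 m/z tolerance are assumptions.

```python
import math


def cosine_score(query, reference, tol=0.01):
    """Cosine similarity (0..1) between two centroided spectra.

    Each spectrum is a list of (mz, intensity) tuples. Peaks are paired
    one-to-one if their m/z values differ by at most `tol`.
    """
    used = set()      # indices of reference peaks that are already paired
    dot = 0.0
    # Pair the most intense query peaks first (greedy assignment)
    for mz_q, int_q in sorted(query, key=lambda p: -p[1]):
        best = None
        for j, (mz_r, int_r) in enumerate(reference):
            if j in used or abs(mz_q - mz_r) > tol:
                continue
            if best is None or int_r > reference[best][1]:
                best = j
        if best is not None:
            used.add(best)
            dot += int_q * reference[best][1]
    norm_q = math.sqrt(sum(i * i for _, i in query))
    norm_r = math.sqrt(sum(i * i for _, i in reference))
    if norm_q == 0.0 or norm_r == 0.0:
        return 0.0
    return dot / (norm_q * norm_r)


# Illustrative peak lists (not measured data)
unknown = [(91.054, 100.0), (119.049, 40.0), (65.039, 15.0)]
library = [(91.055, 95.0), (119.050, 45.0)]
print(round(cosine_score(unknown, library), 3))
```

A score close to 1 indicates high spectral overlap, a score close to 0 indicates no shared fragments. Dedicated search algorithms add further elements, such as intensity weighting or comparison across several collision energies.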
There is also extensive discussion about the robustness and transferability of tandem mass spectral libraries. For a long time, the predominant opinion was that libraries would only be useful on the instrument used to acquire reference spectra, due to the limited reproducibility of tandem mass spectra. This situation has changed thanks to both progress in instrument technologies and informatics tools. Databases combining advanced library designs with tailor-made search algorithms have been shown to enable reliable compound identification with spectra acquired in different laboratories with various instruments and different instrument settings [18,27]. While pre-acquisition harmonization of analytical procedures was researched, the participating laboratories encountered a number of difficulties [40]. Thus, the current trend is rather to build post-acquisition flexibility into the MS/MS reference library and the associated matching algorithms, so that they can deal (as much as possible) with the diversity of imported experimental data without sacrificing the intended confidence level in terms of correct annotation.
Over the last 10 years, there has been substantial progress in the quality of tandem mass spectral databases. Today, spectral acquisition of reference spectra is accomplished regularly on high-resolution instrumentation (i.e., quadrupole-quadrupole-time of flight (QqTOF), Orbitrap) employing multiple collision energies for fragmentation to comprehensively cover the breakdown curves of reference compounds. Besides the protonated and deprotonated molecular ions, adduct ions, in-source fragments as well as isotopologues are commonly selected as precursor ions [18,25,26]. Furthermore, to improve spectral quality, generally only curated spectra are stored in databases, which come bundled with improved search algorithms. As knowledge and understanding of mass spectra increase, automated curation procedures are being implemented and constantly improved to reduce the manual curation load associated with mass spectral database creation [15,25]. Overall, the ambition for a "universal tandem mass spectral database" is closer to a reality. However, this ambition requires definition and implementation of some common procedures to ensure the reliability and robustness of the generated data, both stored in the desired tandem mass spectral reference library and generated from each experimental sample.
A number of reference tandem mass spectral databases exist that have to be considered in the frame of identification of chemicals of emerging concern, in terms of structure and/or content. These existing resources should serve as a basis to avoid unnecessary time spent in re-implementing existing and reliable elements, and to ensure that the intended outputs are coherent with potential established standards. However, an obvious lack of high-level QA/QC consolidation appears within many of these existing databases (e.g., percentage of erroneous information, insufficiently or inadequately curated spectra), together with some necessary adjustments for specific applications (e.g., human metabolites of contaminants are not well represented as compared to parent compounds).

Generic strategy for converting suspects into targets
The long-term goal in terms of developing screening capabilities would be to progressively include a large part, ideally all, of the compounds listed in "suspect lists of interest" (e.g., [41]) into corresponding entries in tandem mass spectral libraries. This would allow the conversion of suspects (generally Level 3) into targets (Level 1, if the retention time information has been measured in house or on an identical chromatographic regime elsewhere) or higher confidence tentative matches (Level 2a, spectral library match). The strategy for accomplishing this aim involves (1) QC of already acquired tandem mass spectral data to determine how many suitably comparable mass spectral records exist and (2) QA-guided acquisition of new reference spectra, shown in Fig. 2.
Fig. 2 The proposed workflow to convert suspects into targets
To determine a baseline for the status of environmentally and toxicologically relevant compounds and their presence in various resources, a mapping exercise was performed. Compound numbers (number of entries) were obtained for the CompTox Chemicals Dashboard [42], NORMAN SusDat [43], HMDB [28], DrugBank [44], the Toxic Exposome Database (T3DB) [45] and Exposome Explorer [46] from their respective websites or download files on March 15, 2019. CompTox numbers mapping to mzCloud [47], MassBank [48] and WRTMD [49] were obtained via downloading the respective list files (list codes on https://comptox.epa.gov/dashboard/chemical_lists are: MZCLOUD, HDXNOEX, MASSBANKEUSP, MASSBANKREF, MYCOTOXINS, WRTMD), also on March 15, 2019, and counting/merging by InChIKey first block (thus ignoring stereochemistry). HMDB MS/MS numbers were obtained from download files (March 15, 2019) and cross-checked with InChIKey mappings still on record from a previous study [16]; also counting by InChIKey first block. SusDat mappings to MassBank were obtained by extracting list S1 results from the download file. To provide a global, up-to-date overview, all compounds with MS/MS annotation that were listed in the PubChem [50] [51], converted to InChIKeys using Open Babel [52], and counted by unique InChIKey first block. While both DrugBank and T3DB contain MS/MS records, this information is not available in their export files, and these contain high overlap with HMDB where the information is mapped extensively in the download files.
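The counting and merging by InChIKey first block can be sketched as follows. This is a minimal illustration, not the script used for the mapping exercise; the function names are hypothetical and the example keys are illustrative InChIKey-shaped strings.

```python
def first_block(inchikey: str) -> str:
    """Return the first (connectivity) block of an InChIKey.

    The first block encodes the molecular skeleton, so keys that differ
    only in the second block (e.g., stereoisomers) collapse to one entry.
    """
    return inchikey.strip().upper().split("-")[0]


def count_unique(*key_lists) -> int:
    """Merge any number of InChIKey lists and count unique first blocks."""
    blocks = set()
    for keys in key_lists:
        blocks.update(first_block(k) for k in keys)
    return len(blocks)


def coverage_percent(n_with_spectra: int, n_total: int) -> float:
    """Percentage of entries in a resource that map to spectral data."""
    return 100.0 * n_with_spectra / n_total


# Two hypothetical library exports that share one compound and list
# two stereoisomers of another
library_a = ["RYYVLZVUVIJVGH-UHFFFAOYSA-N", "QNAYBMKLOCPYGJ-REOHZOOSA-N"]
library_b = ["RYYVLZVUVIJVGH-UHFFFAOYSA-N", "QNAYBMKLOCPYGJ-UWTATZPHSA-N"]
print(count_unique(library_a, library_b))  # number of unique skeletons

# Example with the CompTox figures reported in this article
print(round(coverage_percent(5019, 875_000), 2))
```

Counting by first block deliberately ignores stereochemistry, which suits tandem mass spectra because stereoisomers usually cannot be distinguished by fragmentation alone.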
The authors note that many more resources are available. The resources selected here are open and provide high-quality, pre-mapped information on relevant MS/MS records, and thus form a sufficient information basis for the purposes of this article.

Quality control and benchmarking of tandem mass spectral libraries
The library of the Helmholtz Centre for Environmental Research (UFZ), which is part of MassBank, was used as a test set to demonstrate the usefulness of the proposed strategy. The UFZ library (at that stage) contained 636 MS/MS spectra corresponding to 167 compounds. Reference spectra were recorded on an LTQ-Orbitrap XL (Thermo Fisher Scientific, Waltham, MA, USA). HCD product-ion spectra were acquired at three different collision energy levels (HCD 35, 55, 80) at a nominal resolving power of 30,000. The R package RMassBank was used to perform recalibration and clean-up of acquired spectra [15]. The curated spectra are available at https://github.com/MassBank/MassBank-data/tree/master/UFZ.
The spectra of the UFZ library were searched against "The Wiley Registry of Tandem Mass Spectral Data" (WRTMD) [53]. Library search was accomplished using 'MSforID Search' [34,35] as described in the Additional file 1. The spectra of compounds covered in both UFZ and WRTMD served as positive controls. The number of positive identifications obtained with the positive controls was counted and used to calculate the statistical parameter sensitivity (= true positive rate).

Interlaboratory study to validate acquired reference spectra
An interlaboratory study was organized to verify that participating laboratories were generating new reference data with experimental settings and workflows that were compatible with existing reference spectra collections. Each participating lab was asked to use fifteen reference compounds for producing tandem mass spectral libraries. Table 1 includes the set of compounds used in this study. They have already been applied to test the transferability of the WRTMD in different laboratories with various available instruments and procedures (e.g., [18,34,36]). Seven laboratories involved in the NORMAN Network and/or HBM4EU participated in the interlaboratory study. An overview of the applied instrumentation as well as the applied fragmentation technique is provided in Table 2 as well as in the Additional file 1.
The seven collections of centroided, averaged and curated tandem mass spectra were benchmarked against the WRTMD. Benchmarking included two sets of experiments: (1) matching the test spectra to the WRTMD, and (2) matching the spectra of the 15 test compounds included in the WRTMD to modified libraries derived from the WRTMD by substituting the original reference spectra with the newly generated libraries. For statistical evaluation of library search performance, all test sets were grouped according to the collision energy settings used to acquire the individual spectra.

Results and discussion
Starting point: overview on existing tandem mass spectral data for chemicals of potential concern
Various initiatives and/or sources of information have documented proposed lists of chemicals of emerging concern in various contexts, mainly in the field of environment and toxicology. Examples are the many lists on the NORMAN Suspect List Exchange [41] and the corresponding merged database SusDat [43], the CompTox Chemicals Dashboard [42] and a series of topical databases from the Wishart laboratory and collaborators (e.g., HMDB [28,30], DrugBank [44], the Toxin and Toxin Target Database or Toxic Exposome Database (T3DB) [45] and the Exposome Explorer database [46]). While these sources overlap to some extent, they also provide a lot of complementary information and functions. As several reviews have covered the number of overlapping substances in tandem mass spectral libraries in various contexts (especially metabolomics) recently [16,17,54], the focus here will be on substances of interest for the environmental and biomonitoring contexts, using the resources mentioned. Looking into these collections, tandem mass spectral fragmentation information is already available for a considerable number of suspects in MassBank, HMDB, WRTMD or mzCloud, but this represents only a fraction of the substances actually present in the respective resources.
An overview of the number of compounds in the respective resources, as well as the number of entries that map to mass spectral data within that resource is given in Fig. 3. The CompTox Dashboard (875,000 compounds) includes 3997 compounds in mzCloud, 2377 in MassBank and 1429 in WRTMD, corresponding with 5019 unique compounds (ignoring stereochemistry differences), thus 0.57% of the resource. HMDB (114,098 compounds) contains MS/MS data corresponding to 750 unique compounds (ignoring stereochemistry), or 0.66% of the resource. NORMAN SusDat contains 40,180 entries, of which 1387 are in MassBank (3.6% of SusDat). This overview shows that tandem mass spectral data is available only for a rather low number of compounds. A further complicating factor is that these tandem mass spectral data are spread among several spectral collections. For the vast majority of interesting suspects, no public mass spectral data exists and measured mass spectral data will have to be newly generated, if possible. While METLIN now claims MS/MS spectra of over 500,000 chemicals (https://metlin.scripps.edu/, accessed 8 Dec. 2019), information on the coverage is not available, nor are the spectra openly available. However, as the PubChem database [50] aggregates information from a number of sources, the 74,678 compounds with MS/MS annotations (ignoring stereochemistry; 89,726 with stereochemistry), of 102,404,298 compounds (~ 0.073% of PubChem) give a reasonable indication of the total number of compounds with MS/MS information available to some extent, although some of these are in silico and many of these are not directly relevant for human biomonitoring or environmental studies. The generation of new reference spectra is considered to represent an important element for the successful and long-term establishment of non-targeted LC-MS.
As a single laboratory will not have the necessary resources available to handle the huge number of suspects ahead, this challenge must be addressed as a group. For successful realization of multi-partner generation of reference spectra, harmonization of acquisition and processing strategies is essential. Related QA actions imply application of generally agreed best practice procedures and participation in interlaboratory studies (see below). However, even by joining forces with respect to manpower and instrumentation, there will be further challenges ahead, and these are related to prioritization of suspect lists and availability of the corresponding reference standards.
Existing techniques for prioritizing chemicals are generally based on risk assessment [55,56], which involves assessment of exposure and hazard. Other useful criteria might represent detectability by analytical techniques (e.g., LC-MS with ESI in positive or negative ion mode), legal status, importance for a defined research project [1], or simply the availability of reference information [4,5,57].
As things are now, over the next years a steady increase of the number of chemicals of emerging concern included in tandem mass spectral libraries is expected. Already available spectral collections are considered to represent nuclei for even larger collections. Therefore, much effort should also be put into the QC of already acquired tandem mass spectral data to determine how many suitably comparable mass spectral records already exist (see below).

Quality control of MassBank collections
MassBank is an important collection of reference tandem mass spectra [24]. Currently, 45 collections are available on MassBank (https://massbank.eu/MassBank/RecordIndex) with more than 55,075 tandem mass spectra (of 76,037 spectra total) representing 14,297 compounds (15,988 stereoisomers) total (over all spectral types). In terms of compound coverage, there is significant overlap between MassBank and the WRTMD that can be used to create sets of positive controls for testing the libraries.
Positive controls are particularly suitable for testing the quality and comparability of databases. Matching positive controls is used to determine the sensitivity (= true positive rate) of a database. Ideally, the obtained sensitivity values should be close to 100%. Negative controls are used to test the specificity (= true negative rate) of a database. Initial benchmarking efforts between the Swiss Federal Institute of Aquatic Science and Technology (Eawag) MassBank collection and the WRTMD were published recently [27]. Spectra from the 233 overlapping substances between the two collections were used as positive controls. Of particular interest was the fact that the Eawag spectra were acquired with an Orbitrap instrument (HCD and CID), whereas the WRTMD spectra were acquired on a QqTOF. Spectra in the range of collision energy 20–50 eV on the QqTOF and 30–60% NCE on the Orbitrap provided optimal library matching results with sensitivity values of 95.1–98.4% [27]. Therefore, it was concluded that both collections enable reliable compound identifications, and that they are ready for use in suspect screening applications.
Another important spectral collection within MassBank is the UFZ library. The library contains tandem mass spectra of 167 compounds. The spectra were acquired on an Orbitrap with HCD. For each compound, reference spectra were acquired with three different collision energy settings. All spectra were curated and recalibrated before storing in MassBank. 87 reference compounds included in the UFZ library were also covered by the WRTMD. For each of these compounds two to eight spectra acquired at different collision energy settings (35, 55, 80%) were available. The corresponding 352 spectra represented positive controls suitable for QC of the UFZ library. The spectra were matched to the WRTMD and the number of positive matches was statistically evaluated (Fig. 4). The overall sensitivity was 89.7%. For 70 compounds, all test spectra performed well (amp > 5.0) and led to a positive match. There were, however, 16 compounds for which at least one test spectrum retrieved an amp-value below the specified threshold of 5.0, indicating insufficient similarity between test and reference spectra. Communicating the benchmarking results to the authors of the library initiated a fruitful discussion that also included reviewing of the raw data. This process identified reasons for the observed differences between test and reference spectra. In this way, the issues could be resolved and the corresponding entries in MassBank were updated. Thorough quality control of already existing spectral collections can identify libraries (or subsets thereof) for immediate application to compound identification in suspect screening. Likewise, benchmarking of tandem mass spectral libraries is a suitable approach to identify specific errors like low signal-to-noise ratios, improper mass calibration or wrong compound labeling. In this way, low-quality spectra can be identified, corrected or even deprecated.
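The evaluation of positive controls described above can be sketched in a few lines of Python. The input format, the function name and the example scores are hypothetical; only the amp threshold of 5.0 is taken from the text, and the sketch assumes that each score refers to the match against the correct reference compound.

```python
from collections import defaultdict

AMP_THRESHOLD = 5.0  # threshold for a positive match, as stated in the text


def summarize_positive_controls(results, threshold=AMP_THRESHOLD):
    """Summarize library search results for positive-control spectra.

    `results` is an iterable of (compound, amp) pairs, one per test
    spectrum, where `amp` is the score obtained for the correct compound.
    """
    per_compound = defaultdict(list)
    for compound, amp in results:
        per_compound[compound].append(amp > threshold)

    n_spectra = sum(len(hits) for hits in per_compound.values())
    if n_spectra == 0:
        raise ValueError("no positive-control spectra supplied")
    n_true_pos = sum(sum(hits) for hits in per_compound.values())

    # Compounds with at least one spectrum below the threshold need review
    flagged = sorted(c for c, hits in per_compound.items() if not all(hits))
    return {
        "sensitivity_percent": 100.0 * n_true_pos / n_spectra,
        "compounds_all_matched": len(per_compound) - len(flagged),
        "compounds_flagged": flagged,
    }


# Hypothetical scores for two compounds with two spectra each
example = [("compound A", 12.0), ("compound A", 7.5),
           ("compound B", 3.0), ("compound B", 9.0)]
print(summarize_positive_controls(example))
```

Listing the flagged compounds, rather than reporting the sensitivity alone, points directly to the entries whose raw data should be reviewed with the library authors.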

Recommendations for QC of existing tandem mass spectral libraries
On the basis of the above results, as well as the previously published and discussed benchmarking studies, the following two-step QC procedure could be drafted and adopted: Firstly, tandem mass spectral data should meet the following quality criteria:
QC1 Acquisition: High-resolution instrumentation (e.g., QqTOF, Orbitrap, Fourier Transform ion cyclotron resonance (FTICR)), typically with a minimum resolution of 10,000 in MS/MS mode, and m/z error lower than 10 ppm, to ensure contributions of broadly applicable spectra can be made from many laboratories.
QC2 Ionization: Positive and/or negative mode with a specified ionization technique (ESI, atmospheric pressure chemical ionization or atmospheric pressure photo ionization), as these are the most common methods.
QC3 Precursor ion isolation: The isolation window should be as narrow as possible to avoid fragmentation of multiple precursors, including isotopic peaks.
QC4 Fragmentation: Either Orbitrap-HCD or QqTOF-CID should be used for generating MS/MS spectra, as such spectra can be searched with higher sensitivity (see discussion below and Fig. 5).
QC5 Mass range: Ideally the mass range should start at m/z ≤ 50 (instrumental limitations may preclude this, e.g., for instruments relying on ion trapping) wherever possible to include also small mass fragments. The acquired mass range should be given to avoid poor spectral matches due to the presence/absence of low m/z fragments in fragmentation spectra acquired with different scan ranges (see discussion below and Fig. 6).
QC6 Collision energies: Multiple collision energies (minimum 3) should have been recorded wherever possible over a meaningful range (e.g., 5 to 60 eV CID; NCE 10-60% HCD [27]) to form compound-specific breakdown curves.
QC7 Curation: Centroiding, filtering, noise removal, and recalibration should have been performed where possible to provide the best quality reference spectra.
QC8 Expert review: This is necessary to identify issues, such as artifacts, improper noise removal, or truncated spectra, which cannot always be captured automatically.
Secondly, spectral collections satisfying these conditions will proceed to the benchmarking step, using procedures described in the section "Quality Control of Mass Bank Collections".
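Some of the numerical criteria above (QC1, QC5 and QC6) lend themselves to an automated pre-check before expert review. The following is a minimal sketch under stated assumptions: the `SpectralSeries` record and its field names are hypothetical and do not correspond to any existing library format, and the check only flags potential issues rather than rejecting data, since several criteria are phrased as "ideally" or "wherever possible".

```python
from dataclasses import dataclass
from typing import List


def ppm_error(measured_mz: float, theoretical_mz: float) -> float:
    """Relative m/z error in parts per million."""
    return (measured_mz - theoretical_mz) / theoretical_mz * 1e6


@dataclass
class SpectralSeries:
    """Hypothetical metadata record for one compound's spectral series."""
    resolution: float                 # resolving power in MS/MS mode
    max_abs_ppm_error: float          # largest absolute m/z error of annotated peaks
    scan_start_mz: float              # lower limit of the acquired mass range
    collision_energies: List[float]   # one entry per spectrum in the series


def qc_flags(series: SpectralSeries) -> List[str]:
    """Return human-readable flags for criteria QC1, QC5 and QC6."""
    flags = []
    if series.resolution < 10_000:
        flags.append("QC1: resolution below 10,000")
    if series.max_abs_ppm_error >= 10.0:
        flags.append("QC1: m/z error not lower than 10 ppm")
    if series.scan_start_mz > 50.0:
        flags.append("QC5: mass range starts above m/z 50")
    if len(set(series.collision_energies)) < 3:
        flags.append("QC6: fewer than 3 distinct collision energies")
    return flags


# Hypothetical example values, for illustration only.
series = SpectralSeries(resolution=35_000, max_abs_ppm_error=3.2,
                        scan_start_mz=70.0, collision_energies=[20.0, 40.0])
for flag in qc_flags(series):
    print(flag)
```

The remaining criteria (ionization, isolation width, fragmentation type, curation and expert review) are descriptive and are not covered by this sketch.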

Multi-partner acquisition of new tandem mass spectral data
Interlaboratory harmonization studies are useful to verify that a laboratory is generating new reference data with experimental settings and workflows that are compatible with existing reference spectra collections. One way to characterize this interlaboratory comparability is to introduce a number of predefined known compounds and accompanying QA/QC criteria, such that these compounds must be detected and successfully identified with a given instrumentation and related procedure to validate the method appropriateness and reliability.
Seven laboratories involved in the NORMAN Network and/or HBM4EU project participated in the first harmonization study. The study aimed to demonstrate compatibility and transferability of newly acquired tandem mass spectral data among participating laboratories as well as with already available reference spectra collections. The study involved the measurement of 15 reference standards (Table 1) on three different Orbitrap configurations (at four locations) and two QqTOFs (Table 2). The WRTMD served as the reference library.
The eight collections of centroided, averaged and curated tandem mass spectra were benchmarked against the WRTMD (Table 3). In a first set of experiments, acquired tandem mass spectra were matched to the WRTMD. The number of positive identifications obtained with individual test sets ranged from 73.2 to 100%. To prove that even the sets that showed a significant number of negative matches (e.g., laboratories 3, 4, and 5) contained suitable collision energy windows, the eight test sets were further grouped according to the collision energy settings used to acquire the individual spectra. As expected, a considerable number of subgroups were identified that led to 100% correct positive identifications (Fig. 5). The collision energy windows that appeared to be suitable for acquiring test spectra spanned at least 15 units (eV or % NCE). In the second set of benchmarking experiments, the spectra of the 15 test compounds included in the WRTMD were matched to eight libraries derived from the WRTMD by substituting the original reference spectra with the newly generated sets of reference spectra. The number of positive identifications obtained with individual test sets ranged from 78.5 to 99.5%. Also in this case, the test sets were further grouped according to the collision energy settings used to acquire the individual spectra. As in the first experiment, a considerable number of subgroups were identified that led to 100% correct positive identifications (Fig. 5). The collision energy windows that appeared to be suitable for acquiring library spectra spanned at least 10 units (eV or % NCE).
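The grouping by collision energy described above can be sketched as follows. The example assumes that a sensitivity value has already been computed for each collision energy setting of one test set, and it searches for the widest run of consecutive settings that all reach 100% sensitivity. The function name and the example values are hypothetical; the procedure actually used in the study may differ.

```python
from typing import Dict, Optional, Tuple


def widest_full_sensitivity_window(
        sensitivity_by_ce: Dict[float, float]) -> Optional[Tuple[float, float]]:
    """Return (lowest CE, highest CE) of the widest run of consecutive collision
    energy settings with 100% sensitivity, or None if no setting reaches 100%."""
    best: Optional[Tuple[float, float]] = None
    run_start: Optional[float] = None
    for ce in sorted(sensitivity_by_ce):
        if sensitivity_by_ce[ce] >= 1.0:
            if run_start is None:
                run_start = ce  # a new run of fully successful settings begins
            if best is None or (ce - run_start) > (best[1] - best[0]):
                best = (run_start, ce)
        else:
            run_start = None    # the run is interrupted by a setting below 100%
    return best


# Hypothetical per-setting sensitivities (eV or % NCE), for illustration only.
example = {10: 0.80, 20: 1.0, 30: 1.0, 40: 1.0, 50: 0.93}
window = widest_full_sensitivity_window(example)
print(window)                  # bounds of the widest window
print(window[1] - window[0])   # width of that window in CE units
```

A window width obtained in this way can then be compared with the spans of at least 15 units (test spectra) and 10 units (library spectra) reported above.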
The interlaboratory study clearly demonstrated that the participating laboratories are able to acquire high-quality reference spectra for building libraries, while also providing further evidence that Orbitrap-HCD and QqTOF-CID introduce quite similar fragmentation reactions (Fig. 5). Thus, libraries produced on these types of instruments will offer complementary identification possibilities. Of utmost importance was the observation that there is a significant overlap of the compound-specific collision energy ranges between instruments (Fig. 5). Thus, databases that contain series of multiple spectra acquired on one instrument will enable reliable compound identifications when querying spectra from other instruments. Clearly, databases produced in different laboratories will offer complementary identification possibilities.
Fig. 5 The reliability of library matching with the WRTMD and libraries derived from the WRTMD by substituting the original reference spectra with instrument-specific library spectra is shown by sensitivity versus collision energy for (a) a QqTOF instrument, (b) a Q-Orbitrap instrument with HCD, (c) a LIT-Orbitrap instrument with HCD, and (d) a LIT-Orbitrap instrument with CID, respectively. The collision energies are given in eV for the QqTOFs and NCE for the Orbitraps. (a-c) Reliable matches in a wide CE range, while (d) shows that the optimal CE window is smaller for the LIT-Orbitrap instrument with CID
Oberacher et al. Environ Sci Eur (2020) 32:43
Fig. 6 Even for instruments with identical configurations a considerable inter-instrument variability in the optimal collision energy range necessary for obtaining library matches with high amp-values was observed
Another important result of the interlaboratory study was the observation that, even for instruments with identical configurations, there is considerable inter-instrument variability in the optimal collision energy range necessary for obtaining library matches with high match probability (Fig. 6). Taking into consideration that all Orbitrap technology-based instruments were provided by the same manufacturer, a higher degree of similarity between those instruments regarding compound-specific breakdown curves was expected. The observation suggests that even after years of instrument development and optimization, harmonization of collision energy values has hardly been accomplished yet. The good news is, however, that state-of-the-art tandem mass spectral databases can cope with spectral variability, leading to high reliability of a library match. The analyst applying these libraries just needs to find the optimal collision energy corridor for acquiring test spectra. The interlaboratory study presented herein could represent an appropriate strategy for this purpose.
The interlaboratory study also highlighted some limitations. These are mainly connected with the use of tandem-in-time fragmentation in the ion trap of the linear ion trap (LIT)-Orbitrap instrument (Fig. 5d). In contrast to quadrupole collision or HCD cells that use non-resonant CID, ion traps use resonant CID (i.e., several low-energy collisions over a longer time than in non-resonant CID), which makes it possible to produce fragmentation trees beyond MS2. Generally, these fragmentation trees cover the full range of possible fragmentation pathways, and are therefore specific identifiers for the corresponding molecules, which can be stored in databases (e.g., mzCloud [47]). With ion trap MS2, only parts of the entire range of possible fragmentation reactions are covered, even when applying higher collision energies [58,59]. Such spectra match the lower energy part of spectral series acquired on tandem-in-space instruments (Orbitrap-HCD and QqTOF-CID) well. There is, however, limited overlap with spectra acquired at higher collision energies. Another problem of ion trap fragmentation is related to the "low mass cut-off", or the so-called "1/3 rule" [59]. This means that fragment ions with an m/z value below 1/3 of the m/z value of the precursor ion are not trapped under normal operation conditions and are lost. Thus, a considerable part of the fragment ions that are observed at higher collision energies on tandem-in-space instruments is not detectable with ion trap (IT) analysers. Consequently, in comparison to Orbitrap-HCD and QqTOF-CID, Orbitrap-CID spectra are truncated. This truncation can hamper compound identification if abundant fragment ions are missing. One such example is desipramine (Fig. 7). At low collision energies, this compound has one abundant fragment ion at m/z 72.0808. This ion was not observed in the LIT-Orbitrap spectra since its m/z value is lower than 1/3 of that of the precursor ion (m/z 267 for [M + H]+). Accordingly, the spectral match gave a low score.
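The effect of the "1/3 rule" on the desipramine example can be reproduced with a short calculation. The sketch below assumes that the cut-off is exactly one third of the precursor m/z, as described in the text; the precursor (m/z 267) and the fragment at m/z 72.0808 are taken from the text, whereas the other two fragment values are hypothetical placeholders added for illustration.

```python
from typing import Iterable, List


def low_mass_cutoff(precursor_mz: float) -> float:
    """Lowest m/z still trapped under the "1/3 rule" (assumed to be exactly 1/3)."""
    return precursor_mz / 3.0


def trapped_fragments(precursor_mz: float,
                      fragment_mzs: Iterable[float]) -> List[float]:
    """Fragments at or above the cut-off, i.e., those expected to be observable."""
    cutoff = low_mass_cutoff(precursor_mz)
    return [mz for mz in fragment_mzs if mz >= cutoff]


precursor = 267.0                    # [M + H]+ of desipramine, as given in the text
fragments = [72.0808, 193.1, 208.1]  # first value from the text, others hypothetical

print(low_mass_cutoff(precursor))               # 89.0
print(trapped_fragments(precursor, fragments))  # the m/z 72.0808 fragment is lost
```

With a cut-off of m/z 89, the abundant fragment at m/z 72.0808 falls below the trapping limit, which is consistent with its absence from the LIT-Orbitrap spectrum.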
Another limitation of tandem mass spectral databases is highlighted in Fig. 8. It is well recognized that stereoisomers can hardly be distinguished from each other by tandem mass spectral fragmentation [13,60]. But even the differentiation of constitutional isomers can be challenging. Fragmentation of such compounds may lead to identical products. In the worst-case scenario, tandem mass spectra will be identical. One such example of a pair of constitutional isomers comprises phentermine and methamphetamine. At higher collision energy levels, the fragmentation mass spectra of these two compounds show two identically intense fragment ions (Fig. 8), leading to ambiguous library search results.
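That the precursor mass alone cannot separate such a pair follows directly from the shared molecular formula. The sketch below computes the monoisotopic m/z of the protonated molecule; the formula C10H15N for phentermine and methamphetamine and the atomic masses are standard reference values rather than values given in the text, and the helper name is hypothetical.

```python
from typing import Dict

# Monoisotopic masses in u (standard reference values, not taken from the text).
MONOISOTOPIC_MASS = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048}
PROTON_MASS = 1.00727646688


def protonated_mz(formula: Dict[str, int]) -> float:
    """Monoisotopic m/z of the singly charged [M + H]+ ion for an element-count dict."""
    neutral_mass = sum(MONOISOTOPIC_MASS[element] * count
                       for element, count in formula.items())
    return neutral_mass + PROTON_MASS


# Both constitutional isomers share the formula C10H15N (assumption stated above).
phentermine = {"C": 10, "H": 15, "N": 1}
methamphetamine = {"C": 10, "H": 15, "N": 1}

print(round(protonated_mz(phentermine), 4))
print(protonated_mz(phentermine) == protonated_mz(methamphetamine))  # True
```

Because the precursor m/z is identical and the fragment spectra can coincide at higher collision energies, discrimination has to rely on additional information such as lower collision energy spectra or retention time.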

QA recommendations for acquisition and processing of tandem mass spectral reference data
The results of the interlaboratory study, as well as the available experience and knowhow in building tandem mass spectral libraries, formed the basis for drafting and adopting the following recommendations:
QA1 Acquisition: High-resolution instrumentation (e.g., QqTOF, Orbitrap, FTICR) should be used for the acquisition of reference tandem mass spectra.
QA2 Instrument performance: Instruments should be properly tuned and calibrated (ideally daily or before commencing a batch analysis). High mass accuracy should be maintained using a lock mass or similar. The instrument should be capable of a minimum resolution of 10,000 in MS/MS mode, and the m/z error should be lower than 10 ppm.
QA3 Standards: Certified reference standards should be used to ensure that spectra will represent the linked structure.
QA4 Sample introduction: Samples may be introduced by direct infusion, flow injection or chromatography. Special attention should be paid to the minimal number of acquisition points (related to dwell time values and scan speed capabilities) to ensure a sufficient number of spectra for averaging. The possible occurrence of background interferences should be checked by introducing blank samples.
QA5 Separation of mixtures: If reference compounds are introduced in mixtures, proper separation of the individual precursor ions (either during sample introduction or the mass spectrometric analysis) must be ensured to avoid the acquisition of chimeric spectra.
QA6 Dealing with isobars: If reference spectra are acquired in batches, isobaric compounds must not be processed consecutively, to avoid interferences due to carryover effects resulting in chimeric spectra.
QA7 Alternative precursors: While the primary precursors of interest may be protonated or deprotonated molecules, for some molecules other abundant signals corresponding to in-source fragments, isotopic peaks or other related species might be considered as additional precursor ions.
QA8 Isolation width: The precursor isolation window should be as narrow as possible to avoid fragmentation of multiple precursors, including isotopic peaks.
QA9 Fragmentation: Fragmentation should be accomplished by tandem-in-space techniques (e.g., HCD for Orbitrap, CID for QqTOF). As shown in Fig. 7, CID fragmentation with tandem-in-time instruments may produce spectra with limited overlap to spectra acquired with tandem-in-space techniques at higher collision energies. Another problem of ion trap fragmentation is related to the "low mass cut-off".
QA10 Scan range: The lower limit of the applied scan range should ideally be ≤ m/z 50 wherever possible, so that small mass fragments are also included. It must not exceed m/z 100, to avoid the production of truncated spectra (instrumental limitations may preclude this, e.g., for instruments relying on ion trapping). The acquired mass range should be given to avoid poor spectral matches due to the presence/absence of low m/z fragments in fragmentation spectra acquired with different scan ranges (see Fig. 7).
QA11 Collision energies: Compound-specific breakdown curves should be covered by spectra acquired at multiple collision energies. A spectral series should contain at least 3-5 fragment ion spectra acquired at sufficiently different collision energies within the defined range. With this strategy, libraries are produced that are robust against inter-instrumental collision energy variability (see Figs. 5 and 6). If ramped collision energies are used, these should be clearly labelled as such.
QA12 Signal-to-noise: Sample concentration should be sufficiently high to produce fragment ion mass spectra with signal-to-noise ratios >100, to enable reliable acquisition of low-abundant fragment ions.
QA13 Saturation: Detector saturation must be avoided for fragment ions, and is only acceptable for precursor ions if the resulting artifacts are removed during curation.
QA14 Centroid spectra: Fragment ion mass spectra should be acquired in centroid mode or centroided during export and curation.
QA15 Curation: Spectra should be curated, which includes multiple steps of filtering, noise removal, and recalibration, to provide the best quality reference spectra.
QA16 Expert review: Spectral series should be reviewed by an expert to identify issues like artifacts, improper noise removal, or truncated spectra, which cannot always be captured automatically.
Fig. 7 The influence of the "1/3 rule" for ion trap spectra, exemplified with desipramine using a spectrum acquired on a LIT-Orbitrap instrument with CID and a QqTOF spectrum taken from the WRTMD. The black dot indicates the precursor mass isolated for MS/MS fragmentation (hollow dot). Due to the "low mass cut-off" observed on LIT-Orbitrap instruments, an abundant fragment ion is missing in the corresponding fragment ion mass spectrum
Fig. 8 An example of near identical tandem mass spectra: phentermine and methamphetamine spectra acquired on a QqTOF instrument. The spectra were taken from the WRTMD. Black dots indicate precursor masses that triggered the MS/MS spectra (hollow dot). At higher collision energy levels, the fragmentation mass spectra of these constitutional isomers show two identically intense fragment ions, leading to ambiguous library search results. Cases such as these demonstrate that library matching has to be complemented by orthogonal information such as retention time for higher identification confidence
The curation of acquired tandem mass spectral data is of utmost importance to obtain a high-quality library. Curation efforts may include noise and artifact removal, recalibration of spectra and peak annotations, manual inspection of mass spectra by experienced mass spectrometrists, as well as inter-library comparisons (e.g., [15,25,27,61]). Removal of noise during data processing may lead to losses of spectral information of compounds. Accordingly, processed spectra should be reviewed by experienced mass spectrometrists to check the integrity of data with a special focus on the occurrence of artifacts and processing errors.

Conclusions
Acquisition of tandem mass spectral data requires the physical availability of reference substances and sufficient experimental capacities for acquiring fragment ion mass spectra. Even by joining forces, for instance within the HBM4EU and NORMAN initiatives, the acquisition of reference data for tens of thousands of compounds is a multi-year project requiring significant resources. With this document, an outline for harmonized acquisition of suitable spectra for the expansion of public resources is proposed, which balances the consideration of individual instruments and methods at individual laboratories and the comparability of the resulting data. As a result, it is hoped that the next few years will see an increase in the number of environmentally relevant spectra in (open) mass spectral libraries. The question of how to prioritize the compounds for acquisition is being addressed in other activities currently in progress in the HBM4EU project and NORMAN Network and will be communicated separately.

Annotation/identification
Capability to assign a signal detected by non-targeted or suspect screening (i.e., a spectrometric descriptor) to a chemical with a given confidence level, by means of a reference library and/or structural elucidation work. Annotation is the act of linking a detected mass spectrometric feature with a chemical. Identification is the act of proving that the detected feature and the chemical are the same.

Non-targeted LC-MS
Analytical process for gathering comprehensive information on the composition of a sample. Workflows involve different steps of sample collection, sample preparation, data acquisition and data mining. The fraction of compounds accessible by a certain workflow depends on the characteristics of the individual steps applied. Data-dependent or data-independent acquisition techniques are employed for data acquisition. Detected features are characterized by retention time, MS, and, where possible, MS/MS information to enable annotation.

Targeted LC-MS
Analytical process for gathering specific information on the composition of a sample. Workflows involve different steps of sample collection, sample preparation, data acquisition and data mining. The steps are optimized for a preselected number of molecules. Often selected reaction monitoring techniques are employed for data acquisition. Furthermore, target screening usually involves a reference standard measured in-house under the same analytical conditions such that retention time, MS, and, where possible, MS/MS information is available for identification and confirmation.

Tandem mass spectrometry
Tandem mass spectrometry, also known as MS/MS or MS2, involves multiple steps of mass spectrometry selection, with some form of fragmentation occurring in between the stages. Multiple stages of mass analysis separation can be accomplished with individual mass spectrometer elements separated in space or using a single mass spectrometer with the MS steps separated in time.

Target
A compound that is expected to be included in a sample and for which full mass spectrometric reference data, including MS/MS fragmentation, is available to enable annotation. The reference data is usually acquired with certified reference standards, and is stored in tandem mass spectral databases. The mass spectrometric data is often accompanied by metadata.

Suspect
A compound that is expected to be included in a sample. Typically, the available mass spectrometric data is incomplete and does not allow unequivocal annotation. Often, information on MS/MS fragmentation and retention time is missing or has only been predicted with computational tools.

Known
A detected signal that was annotated to a suspect or target at a certain confidence level.

Identification level
An approach for communicating identification confidence. A commonly used classification system in environmental research [2] includes five levels: exact mass-unequivocal molecular formula-tentative candidate-probable structure-confirmed structure.

Tandem mass spectral database
An organized collection of tandem mass spectral data which comes bundled with a management system. The database management system is a software application that interacts with the user, other applications, and the database itself to capture and analyze data. Tandem mass spectra are typically acquired from certified reference compounds. Spectral information is processed prior to storage in a library.

Tandem mass spectral library
A curated and annotated collection of mass spectra acquired from certified reference compounds. Curation efforts may include manual inspection of mass spectra by experienced mass spectrometrists, noise and artifact removal, recalibration of spectra and peak annotations, as well as inter-library comparisons. The mass spectrometric data is often accompanied by metadata.