NORMAN Database System (NDS): data gathering and data management
Perspectives and recommendations
Continue to develop the NORMAN Database System (NDS) as a reference database that brings together, in a single platform, widely differing chemical monitoring data acquired using various techniques and in different matrices, thereby ensuring a harmonised approach for data collection, storage, quality control, curation and exchange among NORMAN members and more widely. Future platform development will be guided by the FAIR principles (Findability, Accessibility, Interoperability, and Reuse of data).
The NDS is complementary to the EC Information Platform for Chemical Monitoring (IPCHEM) [8, 9] in harvesting chemical target monitoring data, while at the same time paving the way for the development of a new European infrastructure for handling data coming from innovative methods, such as non-target screening (NTS) and effect-based methods (EBM). It should continue in that role.
Rationale
The crucial task of gathering and managing environmental CEC exposure data to support chemical risk assessment has been the core activity of the NORMAN Association from its start in 2005.
The current NDS [10] is an open-access platform of interconnected databases able to assist effective and rapid screening and risk assessment of contaminants in the environment.
The unique feature of the NDS is that it provides a comprehensive set of data on CECs together with a range of innovative applications for their hazard and risk assessment. The content ranges from physico-chemical properties, use characteristics, mass spectral information, and exposure data from target and non-target screening in all environmental compartments, to ecotoxicity data and in situ bioassay signals reflecting mixture toxicity. The NDS currently consists of 12 modules (Fig. 1), of which 11 (Substance Database (SusDat); Suspect List Exchange (SLE); Chemical Occurrence Data (EMPODAT); Ecotoxicology; Bioassays Monitoring Data; MassBank Europe; Digital Sample Freezing Platform (DSFP); Indoor Environment; Passive Sampling; Substance Factsheets; Prioritisation) are accessible, interlinked and populated with data. The 12th is an antibiotic-resistant bacteria and genes module (ARB&ARG) that is still under development, while a new module hosting data on microplastics is currently being designed.
A selection of the NDS modules most relevant to PARC is presented below.
NORMAN Substance Database: a common list of substances for harmonised chemical risk assessment
Perspectives and recommendations
Further develop the Substance Database (SusDat) as the cornerstone of a common European platform where information on highly relevant and newly discovered environmental pollutants can be shared in a harmonised format [11].
Rationale
A common, harmonised list of chemical compounds shared among all parties in research and regulation is one critical requirement for enhanced cooperation among existing regulatory frameworks and shifting towards a “one chemical, one assessment” paradigm. However, current chemicals lists are fragmented collections, with researchers and regulators all using their own lists.
We believe that the combination of the NORMAN Suspect List Exchange (SLE) [12] and the merged NORMAN Substance Database [13] of the NDS could be a globally leading model for collaboratively working towards such a list. Numerous organisations, national and international regulatory agencies and research groups from Europe and North America already contribute to this initiative. NORMAN SLE is a platform to share lists of substances potentially responsible for emerging risks to ecosystems and human health. The submitted lists are shared with the US EPA CompTox Chemicals Dashboard [14, 15] and PubChem [16, 17] and are published on Zenodo [18]. By acting as a data collector, the NORMAN SLE has become an important source of specialised research information for major chemical databases such as PubChem and CompTox, beyond the realms and means of individual researchers. In return, the integration of the NORMAN SLE into major chemical databases adds enormous value to the original contributions, offering new functionality for all parties.
The merged list (without duplicates) is known as NORMAN SusDat [13]—a curated compound database (65,697 compounds as of April 2020), where substances are merged by the Standard InChIKey, which acts as the unique identifier. This is accompanied by other structural information such as CAS numbers and SMILES, as well as physico-chemical properties. SusDat also contains mappings to the equivalent “MS Ready” forms [19], as well as other mass spectrometric information for the identification of compounds with NTS techniques, estimated (in silico) Predicted No-Effect Concentrations (PNECs), and other information required for the prioritisation and risk assessment of substances. Since 2016, SusDat has been used to interlink all NORMAN databases among themselves, as well as the NDS with major external databases.
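The merging step described above can be illustrated with a minimal Python sketch. It assumes each suspect list is a sequence of records carrying a name and a Standard InChIKey; the function name, the record layout and the list names are hypothetical and do not reflect the actual SusDat curation workflow, which involves expert curation beyond simple key matching.

```python
def merge_by_inchikey(suspect_lists):
    """Merge suspect lists into one collection keyed by Standard InChIKey.

    suspect_lists maps a (hypothetical) list name to an iterable of records,
    each a dict with "name" and "inchikey" entries. Records sharing an
    InChIKey collapse into a single entry that remembers every name and
    every contributing list.
    """
    merged = {}
    for list_name, records in suspect_lists.items():
        for rec in records:
            # Normalise the key so that case or stray whitespace
            # does not create artificial duplicates.
            key = rec["inchikey"].strip().upper()
            entry = merged.setdefault(
                key, {"inchikey": key, "names": set(), "sources": set()}
            )
            entry["names"].add(rec["name"])
            entry["sources"].add(list_name)
    return merged


# Illustrative input: two hypothetical contributor lists that overlap on caffeine.
example_lists = {
    "list_a": [
        {"name": "Caffeine", "inchikey": "RYYVLZVUVIJVGH-UHFFFAOYSA-N"},
        {"name": "Carbamazepine", "inchikey": "FFGPTBGBLSHEPO-UHFFFAOYSA-N"},
    ],
    "list_b": [
        {"name": "caffeine", "inchikey": "ryyvlzvuvijvgh-uhfffaoysa-n "},
    ],
}

merged_example = merge_by_inchikey(example_lists)
```

Keying on the InChIKey rather than on names or CAS numbers means that differently spelled entries for the same structure fall into one record, while the list of contributing sources is preserved for traceability.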
NORMAN Ecotoxicology Database: a common platform for ecotoxicity assessment
Perspectives and recommendations
Establish a core team of ecotoxicology experts, from EU Member States and globally, using the Ecotoxicology Database as a basis to evaluate the reliability and relevance of ecotoxicity studies and reach consensus on Quality Standards (i.e. PNEC values) for a more harmonised risk assessment of chemicals.
Rationale
We propose to share the NORMAN Ecotoxicology Database [20] for harmonised ecotoxicity assessment within the PARC partnership. The database provides a transparent tool to guide experts in: (i) the identification of reliable ecotoxicity studies, based on the CRED (Criteria for Reporting and Evaluating ecotoxicity Data) classification system [21]; (ii) the online derivation of a set of quality standards for each matrix and regulatory framework based on selected ‘reliable’ ecotoxicity studies, using a built-in software tool implementing the requirements of the EC guidelines [22]; and (iii) the final selection of a single, common PNEC value, agreed upon as a result of Europe-wide expert consultations.
At present the database comprises, for almost all SusDat substances (i.e. > 65,000), at least one in silico PNEC [23] based on predicted acute effects for each of the three basic trophic levels of the fresh water compartment (fish, daphnia, and algae), which are used when experimental toxicity data are insufficient or not available. In 2019, a semi-automated tool for retrieving experimental (eco)toxicity data from the US EPA ECOTOX Knowledgebase allowed the import of > 125,000 experimental data on standard (eco)toxicity endpoints for about 5000 SusDat substances in a format compatible with the metadata requirements of the NORMAN Ecotoxicology Database. Additional experimental (eco)toxicity data and threshold values will be retrieved from other databases such as the REACH portal, the ETOX database of the German Federal Environment Agency, as well as existing PNECs and Quality Standards (EQS) from various regulatory sources. The (eco)toxicity threshold values used for chemicals prioritisation are agreed by experts and referred to as ‘Lowest PNECs’. These values are generally calculated for the fresh water matrix and then converted to an equivalent PNEC value for marine water, sediment and biota matrices (for example, bioconcentration factors (BCF) are used for conversion to equivalent PNECs for biota).
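As an illustration of the matrix conversion mentioned above, the following Python sketch converts a freshwater PNEC into an equivalent PNEC for biota by multiplying it by a bioconcentration factor. This is a simplified equilibrium assumption; the function name, units and example values are hypothetical, and the conversion rules actually applied in the NORMAN Ecotoxicology Database may include further terms not described here.

```python
def pnec_biota_from_freshwater(pnec_fw_ug_per_l, bcf_l_per_kg):
    """Convert a freshwater PNEC to an equivalent PNEC for biota.

    Assumes simple equilibrium partitioning between water and organism:
        PNEC_biota [ug/kg] = PNEC_freshwater [ug/L] * BCF [L/kg]
    Both inputs must be strictly positive.
    """
    if pnec_fw_ug_per_l <= 0 or bcf_l_per_kg <= 0:
        raise ValueError("PNEC and BCF must both be positive")
    return pnec_fw_ug_per_l * bcf_l_per_kg


# Hypothetical substance: freshwater PNEC of 0.05 ug/L and BCF of 200 L/kg.
example_pnec_biota = pnec_biota_from_freshwater(0.05, 200.0)
```

With these illustrative inputs the equivalent biota value is about 10 µg/kg; a more bioaccumulative substance (higher BCF) yields a proportionally higher equivalent threshold in biota for the same freshwater PNEC.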
EMPODAT: a database of target monitoring data
Perspectives and recommendations
Provide a Europe-wide standard for essential quality information (metadata) accompanying chemical analysis results and commonly agreed minimum requirements to allow interoperability of archived monitoring data.
Rationale
A game changer for next generation chemical risk assessment is a system able to provide comprehensive information on the exposure of humans and the environment to large numbers of chemicals during the entire life cycle of products, including waste and recycled products.
With the EMPODAT database module [24] of the NDS, the NORMAN Association has already established a collaboration with IPCHEM, the official European repository of monitoring data produced by national monitoring programmes and EU-funded research projects in all matrices and compartments. EMPODAT today hosts approximately 10.3 million geo-referenced target monitoring data for more than 3100 substances in water (surface, ground, and waste water), sediment, biota, soil, sewage sludge and air matrices. The data are publicly accessible and provide an overview of benchmark values on the occurrence of contaminants of emerging concern across Europe. From the start, NORMAN has made a great effort to ensure that the data are gathered in a standard format in order to facilitate data comparability and exploitation across Europe and beyond. To this end, spreadsheet-based Data Collection Templates (DCTs) were developed for each of the matrices; they contain information allowing for automated assessment of data quality.
Non-target screening (NTS) tools and Digital Sample Freezing Platform (DSFP) for retrospective suspect screening of environmental contaminants
Perspectives and recommendations
Establish a federated European infrastructure storing raw non-target screening data converted into a common (open) format, designed for retrospective screening.
Establish a central platform/database storing regularly updated information on available data sets Europe-wide and, eventually, at a global scale.
Apply commonly agreed workflow(s) for retrospective analysis to identify and prioritise pollutants frequently detected in environmental samples.
Rationale
Thanks to NTS techniques it is possible to obtain an overview of human and environmental exposure to thousands of chemicals simultaneously, with a high level of sensitivity and selectivity, including chemicals that have not been identified previously [25]. The NTS workflows (comprising wide-scope target, suspect and non-target screening) based on full scan, high-resolution mass spectrometry (HRMS), developed by NORMAN members, represent the state-of-the-art methods to deal with real-world contaminant mixtures in a more holistic way.
Active since 2013, the NORMAN NTS Working Group has built a strong collaborative infrastructure and developed innovative tools to facilitate exploitation and interpretation of complex data produced by full scan, HRMS methods. NORMAN members have also developed protocols to implement NTS in routine, regulatory applications. Suspect screening of pre-defined lists of tens to tens of thousands of known substances in each sample (supported by NORMAN SLE and NORMAN SusDat) is presently the recommended way forward.
In this context the Digital Sample Freezing Platform [26] is a key tool developed by NORMAN to support suspect and non-target screening. This novel technology allows the storage of thousands of high-resolution mass spectra (fingerprints) of all chemicals, metabolites and transformation products detected in each of the analysed samples. Thanks to this platform, it is possible for users to search retrospectively for a large number of compounds (e.g. those in SusDat; see above) in all the “digitally frozen” samples stored in the database and obtain reliable qualitative and semi-quantitative data on their occurrence in the investigated samples.
Further key tools, supported by NORMAN and embedded in the NDS, to assist non-target screening, are:
- MassBank Europe, an open-source, open-access database of mass spectra to support higher confidence identification of suspects and non-targets [27, 28]. Based on MassBank Japan, MassBank Europe was founded in 2011, arising from a NORMAN initiative. Today MassBank contains over 80,000 unique mass spectra for > 14,300 compounds (database release 2020.05 [29]), including mass spectra of tentatively identified compounds. MassBank Europe is a core service for NORMAN as well as for other initiatives such as HBM4EU (Human Biomonitoring for Europe) [30], ELIXIR [31], the German Network for Bioinformatics Infrastructure (de.NBI) [32] and the German National Research Data Infrastructure Initiative for Chemistry (NFDI4Chem) [33];
- A Retention Time Index (RTI) prediction model [34, 35], which allows tentative identification of each compound in SusDat through a combination of its exact mass, MS/MS fragments and predicted RTI value, thereby reducing the number of false positives in suspect screening.
Thanks to all the above-mentioned interconnected tools, DSFP can provide reliable qualitative and semi-quantitative data on the occurrence of already identified as well as novel CECs. It thereby provides exhaustive insight into the spatial and temporal distribution of contaminant mixtures in the environment, making NORMAN DSFP a virtual environmental observatory on chemical contamination. Extensions of DSFP for additional chemicals captured in SusDat (e.g. highly polar molecules and gas chromatography-only amenable substances) are under way.
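The principle of retrospective suspect screening with an RTI filter can be sketched in Python as follows. The sketch matches detected features against suspects by exact mass (within a ppm tolerance) and by predicted RTI (within a window); it omits the MS/MS fragment comparison for brevity. All class names, tolerances and RTI values are hypothetical and are not taken from the DSFP implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Suspect:
    name: str
    mz: float   # expected m/z of the ion searched for
    rti: float  # predicted retention time index (hypothetical scale)


@dataclass(frozen=True)
class Feature:
    mz: float   # measured m/z of a detected feature
    rti: float  # experimental retention time index


def ppm_error(observed_mz, expected_mz):
    """Relative mass error in parts per million."""
    return (observed_mz - expected_mz) / expected_mz * 1e6


def screen(features, suspects, ppm_tol=5.0, rti_tol=50.0):
    """Return (suspect name, feature) pairs passing both filters.

    A pair is kept only if the mass error is within ppm_tol and the
    RTI difference is within rti_tol; the RTI filter removes candidates
    whose mass matches but whose retention behaviour does not.
    """
    hits = []
    for feature in features:
        for suspect in suspects:
            mass_ok = abs(ppm_error(feature.mz, suspect.mz)) <= ppm_tol
            rti_ok = abs(feature.rti - suspect.rti) <= rti_tol
            if mass_ok and rti_ok:
                hits.append((suspect.name, feature))
    return hits


# Illustrative data: [M+H]+ ions; the RTI values are made up for the example.
example_suspects = [
    Suspect("caffeine", 195.0877, 310.0),
    Suspect("carbamazepine", 237.1022, 520.0),
]
example_features = [
    Feature(195.0880, 300.0),  # mass and RTI both consistent with caffeine
    Feature(195.0880, 600.0),  # mass matches caffeine, RTI does not
    Feature(237.1100, 500.0),  # RTI near carbamazepine, mass error too large
]

example_hits = screen(example_features, example_suspects)
```

In this example only the first feature survives both filters. The second feature would be a false positive under a mass-only search, which is the situation the RTI prediction is intended to reduce.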
Collaborative European framework to improve data quality and comparability: development and harmonisation of methods
Perspectives and recommendations
Build the capacity of laboratories in Europe and globally by systematic organisation of international Collaborative Trials addressing analysis of CECs in various matrices by novel analytical technologies.
Pursue progressive testing and implementation of novel sampling and analytical methodologies to help design smart(er) monitoring strategies that can be applied in regulatory monitoring activities.
Rationale
NORMAN brings together the leading European institutions in the development and harmonisation of measurement methods for the detection of emerging chemicals in the environment. The studies organised by the network represent a crucial step for the scientific community and for environmental agencies for validation and harmonisation of innovative sampling and monitoring tools before their possible future implementation in regulations.
NORMAN is the author of the first common framework for validation of chemical and biological monitoring methods—a protocol which is now adopted as a Technical Specification (TS) of the European Committee for Standardization (CEN) (CEN TS 16800:2015) [36, 37].
More than 15 collaborative trials have been organised by NORMAN since 2006 on a wide range of methods, including non-target screening in water [38], sediment [39], indoor dust [40] and biota [41], in vitro and in vivo bioassays [42] and passive sampling [43, 44]. They have tackled aspects relevant to monitoring and early warning of CECs in the environment and approaches to hazard assessment, including integration of effect-based methods with chemical analysis to improve interpretation of cause–effect links. These trials included not only the assessment of sample preparation and instrumental performance, but also the evaluation of the impact that computational and data processing tools have on interpretation of results.