GC×GC-HRMS nontarget fingerprinting of organic micropollutants in urban freshwater sediments

Background : Sediments are sinks for organic micropollutants, which are traditionally analysed by gas chromatography-mass spectrometry (GC-MS). Although GC-MS and GC-MS/MS (tandem MS) are preferred for target screening, they provide only limited chromatographic resolution for nontarget screening. In this study, a comprehensive two-dimensional GC-high-resolution MS method (GC×GC-HRMS) was developed for nontarget screening and source identification of organic micropollutants in sediments from an urban channel and adjacent lake in Copenhagen, Denmark. The GC×GC-HRMS data were processed by pixel-based chemometric analysis using baseline subtraction, alignment, normalisation, and scaling before principal component analysis (PCA) of the pre-processed GC×GC-HRMS base peak ion chromatograms (BPCs). The analysis was performed to identify organic micropollutants of high abundance and relevance in the urban sediments and to identify pollution sources. Tentative identifications were based on match factors and retention indices and tagged according to the level of identification confidence. Results : The channel contained both significantly higher concentrations of micropollutants and a higher diversity of compounds compared to the lake. The PCA models were able to isolate distinct sources of chemicals such as a natural input (viz. a high relative abundance of mono-, di- and sesquiterpenes) and a weathered oil fingerprint (viz. alkanes, naphthenes and alkylated polycyclic aromatic hydrocarbons). A dilution effect of the weathered oil fingerprint was observed in lake samples that were close to the channel. Several benzothiazole-like structures were identified in lake samples close to a high-traffic road which could indicate a significant input from asphalt or tire wear particles. In total, 104 compounds and compound groups were identified. 
Conclusions : Several chemical fingerprints of different sources were described in urban freshwater sediments in Copenhagen using a pixel-based chemometric approach of

Tailored pre-processing and careful interpretation of the identification results are unavoidable, and further research is still required before the workflow can be automated.

Background
Anthropogenic pollution of freshwater ecosystems via agricultural, industrial and urban activities is ubiquitous. The World Water Assessment Programme (WWAP) estimated in 2017 that approximately 80% of all wastewater, globally, is discharged into the environment without treatment 1. However, environmental awareness is increasing, and with it the desire to better understand which types of pollutants end up in our environment. Anthropogenic micropollutants comprise numerous chemicals and their degradation products such as pharmaceuticals, detergents, pesticides and chemicals from consumer care products 2,3. These can be introduced to freshwater ecosystems via, e.g., household or industrial waste effluents, littering, road runoff and car exhaust. Compounds with low water solubility and a high octanol/water partition coefficient (log K_OW) mostly deposit in sediments and are only released slowly to the water, where they could cause adverse effects to the local fauna. The European Parliament and Council identified 45 priority substances in 2013 that every member state ought to monitor in surface waters at least once a year 4. The list includes several heavy metals, pesticides, halogenated compounds, polycyclic aromatic hydrocarbons (PAHs) and phenols.
Gas chromatography (GC) coupled with mass spectrometry (MS) is the conventional solution for target analysis of volatile and semi-volatile persistent organic pollutants (POPs) in the environment 5 . The use of extracted ion chromatograms (EICs) facilitates the identification of known compounds; however, monitoring unknown chemicals or chemicals of emerging concern (CECs) that have previously not been included on any regulatory list, is a more challenging task. For a more comprehensive chemical impact assessment of our environment, suspect screening and nontarget screening (NTS) are used in combination with target analyses 2,6 . In suspect screening and NTS, the identification of unknown compounds is usually made by comparing experimental mass spectra with MS-libraries such as NIST for GC-MS with electron ionisation 7 . An adequate chromatographic resolution that provides mass spectra of the unidentified peaks free of chemical interferences is key to increase the reliability of the spectral matching. Therefore, GC-MS may not provide sufficient resolution for NTS of complex environmental matrices such as sediments.
Comprehensive two-dimensional GC with either low or high-resolution mass spectrometry detection (GC×GC-(HR)MS) provides higher peak capacities than those obtained by one-dimensional GC, both for target analysis of, e.g., PAHs, PAH derivatives and organochlorine pesticides 8, and for NTS, where more unambiguous identification of POPs has been obtained based on, e.g., high-resolution neutral loss of halogens, isotopic patterns and mass defect calculations 9,10. Additional applications of GC×GC-(HR)MS for environmental purposes have also been reported 3,11-14. Data mining in NTS is still labour-intensive, and prioritisation and identification of thousands of peaks can be challenging, especially with large datasets 6,15. Methods for peak prioritisation in NTS have been described in the literature, e.g., by intensity, specific isotopic patterns or as part of a homologous series 6,16, but few methods have been suggested for sample prioritisation. Even though chemometric tools, such as principal component analysis (PCA), have already been incorporated in NTS workflows to prioritise specific chemical profiles 6, applications in environmental monitoring are still limited, particularly for GC×GC-(HR)MS. However, some benefits of chemometrics in NTS are apparent, e.g., the most relevant chemical patterns or different sources of pollution can be extracted and visualised from the entire chemical fingerprint of the samples. In the so-called pixel-based approach, the data analysis is performed directly on the chromatographic pixels 17. The main variance in the data is therefore displayed directly in the chromatographic space without prior peak-picking. The pixel-based approach facilitates the interpretation of structured chromatograms commonly found in GC×GC-HRMS data, such as homologous series. For example, a study by Alexandrino et al. successfully implemented the pixel-based analysis for forensic investigations of diesel spills in the environment based on GC×GC-HRMS data 18.
This study aimed to characterise freshwater lake and channel sediments in an urban area (Copenhagen, Denmark) based on an NTS analytical workflow using GC×GC-HRMS. First of all, the overall chemical variation between and within two sampling sites was investigated with pixel-based PCA. Second, a PCA for each sampling site intended to give a refined insight into the sources of pollution and chemical fingerprints. Subsequently, distinct samples were prioritised based on the pixel-based PCA for tentative compound identification. Figure 1 shows the two sampling sites from the sampling campaign in September to November 2017: the lake Utterslev Mose (UTM) and the adjacent fortress channel (FSK as in Faestningskanalen) in Copenhagen, Denmark. The lake is part of a protected nature park in the western part of Copenhagen and is, among others, fed by the fortress channel which surrounds the Danish capital. Sampling was performed with a Kajak sediment core sampler (KC Denmark A/S) with acrylic sample tubes (length: 47.5 cm, i.d.: 0.46 cm, wall thickness: 0.4 cm). Top-layer sediment samples (0-30 cm) were obtained from the lake.

Sampling
Five increments were pooled to form one composite sample in a sampling grid of ca. 50×50 m. Three types of sample were collected at the fortress channel: (i) composite samples with five increments (indicated by C0) for six out of eight sampling grids (ca. 80×10 m) from which (ii) two were not pooled in the vertical direction (indicated by C1: 0-10; C2: 10-20; C3: 20-30 cm); and (iii) five single samples (pooled vertically, indicated by S01 to S05) were collected for two out of eight sampling grids. After draining most of the surface water and proper mixing of the increments or single samples, representative mass reduction (between 5:1 and 2:1) was performed with a home-made mass reduction tool (stainless steel, Figure S1) to approximately 200 g. Mass-reduced samples were transferred to Rilsan® bags and stored at 4 °C. In total, 19 and 28 samples were retrieved from FSK and UTM, respectively. The lake was further divided into three regions: Region I -Close to FSK, Region II -Close to a road, and Region III -Centre of the lake (Figure 1).

Sample preparation
Samples were extracted within one month after sampling. Pressurised liquid extraction

Chemical analysis
Only the nonpolar fraction was analysed with GC×GC-HRMS. Extracts from FSK were diluted five times with the extraction solvent to adjust concentration differences between UTM and FSK samples as the total concentrations were higher in FSK. Facilitator samples from UTM (AnQC UTM ) and FSK (AnQC FSK ) were prepared by pooling equal volumes of eight and seven extracts from UTM or FSK, respectively (Table S3). The facilitator samples were used during signal processing. Five batches containing ten randomly selected samples (including extraction replicates) and at least one of each AnQC UTM and AnQC FSK were analysed in each batch. A reference crude oil sample (1.25 mg mL -1 ), an n-alkane series mix (Florida mix), the deuterated standard mixtures (50 ppb) and standard mixtures (Table S1)  were injected in splitless mode with an inlet temperature of 300 °C. The primary oven temperature was ramped according to the following gradient: 60 °C held for 3.5 min, 7.5°C min -1 to 310 °C, and held for 15 min. The secondary oven temperature was operated at a constant temperature offset of +10 °C from the primary oven. Helium (≥99.9999%, AGA, Pullach, Germany) was used as carrier gas at a constant flow rate of 1.5 mL min − 1 . The modulation was performed using an independent cooling system that provided cold jets of N 2 (g) at approximately −70 °C, while the hot jets (pressure = 20 psi) were produced with N 2 (g) heated at a constant temperature offset of +60 °C from the primary oven.
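As a quick arithmetic check on the oven programme described above, the following Python sketch computes the total run time and the number of complete modulations. It assumes that acquisition spans the whole programme (no solvent delay), and it uses the 6 s modulation period reported in the Results; the function name is illustrative only.

```python
def oven_run_time_min(start_c, hold_start_min, ramp_c_per_min, end_c, hold_end_min):
    """Total run time (min) of a single-ramp oven programme."""
    ramp_min = (end_c - start_c) / ramp_c_per_min
    return hold_start_min + ramp_min + hold_end_min


# 60 °C held 3.5 min, 7.5 °C/min to 310 °C, held 15 min (values from the text).
run_min = oven_run_time_min(60.0, 3.5, 7.5, 310.0, 15.0)

# Modulation period of 6 s, as stated in the Results section.
n_modulations = int(run_min * 60.0 // 6.0)

print(f"run time: {run_min:.2f} min, complete modulations: {n_modulations}")
```

Under these assumptions the first-dimension axis of each 2D chromatogram has a little over 500 points, which gives a feel for the first-dimension sampling density.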

Pre-processing
The data files were converted to netCDF files using the AIA File Translator programme from Agilent, which bins the m/z values according to an accuracy of ±0.025 Da. An in-house script was written to import netCDF files into Matlab R2017b (MathWorks, Natick, MA, USA). The script applied a filtering step that excluded the most dominant (high-resolution) m/z values derived from column bleeding (Table S4). The two-dimensional base peak ion chromatograms (2D-BPCs) (¹t_R × ²t_R) were extracted, where ¹t_R and ²t_R are the retention times in the first (¹D) and second (²D) dimension, respectively. Each 2D-BPC was phase-corrected to account for rigid retention time shifts due to the modulation.
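The extraction and folding of a base peak ion chromatogram can be sketched as follows. This is a minimal NumPy illustration, not the in-house Matlab script: the array layout, the function names, the bleed m/z value in the usage note and the way the phase correction is applied (a rigid circular shift of the one-dimensional signal before folding) are all assumptions made for the example.

```python
import numpy as np


def base_peak_chromatogram(intensities, mz, bleed_mz=(), tol=0.025):
    """Per-scan maximum intensity after dropping column-bleed ions.

    intensities: (n_scans, n_mz) array; mz: (n_mz,) array of binned m/z values.
    bleed_mz: m/z values to exclude (hypothetical stand-in for Table S4).
    """
    keep = np.ones(mz.shape, dtype=bool)
    for bleed in bleed_mz:
        keep &= np.abs(mz - bleed) > tol
    return intensities[:, keep].max(axis=1)


def fold_2d(bpc, scans_per_modulation, phase_shift=0):
    """Fold a 1D BPC into (n_modulations, scans_per_modulation).

    phase_shift (in scans) rigidly shifts the signal before folding;
    incomplete trailing modulations are discarded.
    """
    n_mod = bpc.size // scans_per_modulation
    trimmed = bpc[: n_mod * scans_per_modulation]
    return np.roll(trimmed, phase_shift).reshape(n_mod, scans_per_modulation)
```

For instance, with four scans, three m/z channels and one excluded channel, `base_peak_chromatogram` returns one value per scan, and `fold_2d(bpc, 2)` arranges them as two modulations of two scans each.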
Further, due to differences in total concentration and sample complexity, the dataset was divided according to the two sampling sites. For alignment, a 2D correlation optimised warping algorithm (2D-COW) was used 19, where each 2D-BPC was aligned to the target BPC from the corresponding AnQC (FSK or UTM) obtained from the third batch (middle of the sequence). The optimal warping parameters, segment length and slack (viz., how much each segment is allowed to change in the alignment), were obtained with an in-house Matlab script that performs a grid search in the parameter space, considering both chromatographic dimensions. The two aligned datasets were individually unfolded into row vectors, resulting in a data matrix Di (K × L), i = UTM or FSK, where K is the number of objects (i.e., samples, replicates and AnQC samples) in sampling site i and L is the number of variables or pixels that represent the aligned BPCs. Blank subtraction was performed by removing the peaks that were present in the aligned BPC of the blanks, which indicate column bleeding, instrumental contamination and contamination from the sample preparation. Finally, Di was normalised to unit Euclidean norm to decrease the concentration effects and focus the pixel-based analysis on relative differences in the chemical fingerprints. The normalisation also aims to remove the minor variation of injection volume between samples and to reduce the effects of variations in instrument sensitivity over the course of the batch analyses 20. The BPCs of QC ex, reference crude oil, Florida mix, blanks and standards were not included in Di; they were used for quality assurance and identification and not for the modelling part (see below).
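The unfolding and row-wise normalisation steps described above can be sketched in a few lines of NumPy. This is an illustrative sketch rather than the authors' Matlab code; the function names are hypothetical, and the handling of all-zero rows is an added safeguard not mentioned in the text.

```python
import numpy as np


def unfold(chromatograms):
    """Unfold K aligned 2D-BPCs of shape (K, n1, n2) into a (K, L) matrix, L = n1 * n2."""
    stack = np.asarray(chromatograms, dtype=float)
    return stack.reshape(stack.shape[0], -1)


def normalise_rows(D):
    """Scale each row (one sample) to unit Euclidean norm."""
    norms = np.linalg.norm(D, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # assumption: leave empty chromatograms unchanged
    return D / norms


# Two toy "samples" with the same fingerprint at different total concentration.
stack = np.array([[[3.0, 0.0], [0.0, 4.0]],
                  [[6.0, 0.0], [0.0, 8.0]]])
D = normalise_rows(unfold(stack))
print(D)
```

Both toy rows become identical after normalisation, which is the intended effect: differences in total concentration are removed and only relative fingerprint differences remain.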

Pixel-based analysis and weighted-principal component analysis (WPCA)
The analyses were performed with WPCA 18. Variation that is unrelated to the chemical composition (e.g., instrumental noise, residual retention time shifts, column bleeding and peak saturation) negatively affects the quality of PCA models. Therefore, each BPC in Di was weighted by dividing each row of Di element-wise by a vector w (defined as the pixel-by-pixel relative analytical standard deviations (RSD) of the BPCs exclusively from the AnQCs). The weighting down-scales pixels in noise regions (e.g., electronic and chemical noise) and regions of the BPC where the alignment is poor (e.g., fronting and tailing sections), and up-scales regions that contain chemical information (peak regions) 21. A global model was first fitted by combining the two datasets of UTM and FSK, including the facilitator samples AnQC UTM and AnQC FSK (6×35,000 and 5×35,000, respectively). Subsequently, the more detailed chemical variability occurring within each of the two sampling sites (UTM and FSK) was assessed through local models fitted for D FSK (23×35,000) and D UTM (39×35,000), respectively. All models were fitted on the mean-centred data. Only principal components (PCs) that contain chemical information rather than noise were further evaluated.
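The weighting and decomposition can be illustrated with the following NumPy sketch. It follows the description above (divide each row by the pixel-wise RSD of the AnQC chromatograms, mean-centre, then decompose) but is not the authors' implementation: the use of an SVD, the lower bound `min_rsd` and the treatment of pixels with zero mean in the AnQCs are assumptions introduced to keep the example numerically safe.

```python
import numpy as np


def rsd_weights(D_qc, min_rsd=0.05):
    """Pixel-wise relative standard deviation across the QC rows.

    Pixels whose QC mean is zero get an infinite weight, so that they are
    zeroed by the division in `wpca` (assumption). `min_rsd` is a hypothetical
    floor that stops very stable pixels from being inflated without bound.
    """
    mean = D_qc.mean(axis=0)
    sd = D_qc.std(axis=0, ddof=1)
    rsd = np.full(mean.shape, np.inf)
    nonzero = mean != 0
    rsd[nonzero] = sd[nonzero] / np.abs(mean[nonzero])
    return np.maximum(rsd, min_rsd)


def wpca(D, w, n_components):
    """Weighted PCA: divide rows by w, mean-centre, decompose by SVD."""
    X = D / w
    X = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]
    loadings = Vt[:n_components]
    explained = s ** 2 / np.sum(s ** 2)
    return scores, loadings, explained[:n_components]
```

Each row of `loadings` can be re-folded into the 2D chromatographic space for interpretation, and `explained` gives the fraction of total variance captured by each PC.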

Compound identification
The interpretation of the models in an environmental context requires the (tentative) identification of peaks expressed in the PC loadings. Additionally, the score plot for each PC can be utilised to select samples in which the corresponding chromatographic pattern obtained from the PC loadings is more evident, i.e., the samples that are projected in the corners of the score plot for a particular PC. Selected raw data files were converted to *.mzXML files using ProteoWizard (v 3.0.19140) and subsequently imported to Matlab. The data was re-folded into the original 2D-structure. Next, mass spectra at the maximum height of each identified peak in the corresponding 2D-BPC were extracted and organised in individual text files which were submitted to NIST14® MS library (Gaithersburg, MD, USA). The cut-off for a mass spectra matching was a match factor (MF) of ≥ 800 -all hits that were below that value were not considered for identification. The hit with the highest MF was selected for identification, and the results were organised in a list with all tentatively identified compounds. All the steps in Matlab were performed using in-house scripts.
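The hit-selection rule described above (discard hits with MF below 800, then keep the hit with the highest MF) can be written compactly. The sketch below is in Python rather than Matlab, and the `LibraryHit` type and the example hits are invented for illustration; it does not query the NIST library itself.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class LibraryHit:
    """One library search result for a peak (hypothetical container)."""
    name: str
    match_factor: int


def best_hit(hits: Sequence[LibraryHit], cutoff: int = 800) -> Optional[LibraryHit]:
    """Return the hit with the highest match factor at or above the cutoff, else None."""
    accepted = [h for h in hits if h.match_factor >= cutoff]
    if not accepted:
        return None
    return max(accepted, key=lambda h: h.match_factor)


# Made-up example hits for a single peak.
hits = [LibraryHit("azulene", 850), LibraryHit("naphthalene", 912), LibraryHit("indene", 640)]
print(best_hit(hits))
```

Returning `None` for peaks without an acceptable hit keeps unidentified peaks explicit in the results list instead of silently assigning a poor match.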
MassHunter's software Unknowns Analysis (B.07.00) was used as an additional tool for compound identification to verify the results from the Matlab workflow. Parameters are described in Table S5. Additionally, compounds that were part of the spiking mixtures were targeted. Identified compounds that were found with both Unknowns Analysis and the in-house Matlab workflow were collected in a table (Table 1, Excel sheet in SI2), including structural identifiers, experimental and literature retention indices (RI, based on n-alkanes C10-C26 in the reference oil and Florida mix, Table S6), and identification confidence levels (with Level 1 - confirmed structure with reference standard; Level 4 - unequivocal molecular formula) 22. If the experimental and literature RI differed by more than ±50 units, the confidence level was set down to Level 4.

Results and discussion

Screening of sample extracts
Substantial concentration differences were observed between UTM and FSK samples. In Figure 2, the 2D-BPCs of facilitator samples for FSK (AnQC FSK ) and UTM (AnQC UTM ) are compared. The 2D-BPC of AnQC FSK contains > 1000 peaks while 2D-BPC of AnQC UTM is much less populated, which demonstrates that the overall concentration of compounds in the two combined extracts is significantly higher in the fortress channel (FSK) compared to the lake (UTM). Some peaks in AnQC FSK expressed wraparound after 30 min, viz. peaks that should have retention in 2 D higher than the modulation period of 6 s. Fortunately, these wraparounds did not co-elute in 2 D with other peaks. Moreover, a detector overload can be seen in the 2D-BPC of AnQC FSK (#58, large yellow blob after 35 min in Figure 2). This peak or cluster of peaks (which is difficult to assess due to the detector overload) was identified as phthalic acid esters (Table 1)  Target analysis was performed for the spiked compounds listed in Table S1 in the GC×GC-HRMS chromatograms of standard mixtures and the two facilitator samples from the third batch. In the standard mixture, 67 out of 109 compounds were detected; 25 out of the 67 were also found in the facilitator samples (Table S1). Some of these compounds represent a group of isomers with the same monoisotopic mass and molecular formula, and thereby enabled the identification of groups of these isomers in other samples. Aromatic and fatty acids and steroids were not detected because of, e.g., degradation at high temperatures, high polarity or low volatility of the compounds; while (alkylated) four-ring PAHs (and higher) were not detected because of the lower maximum oven temperature in the GC×GC-HRMS method compared to GC-MS methods.

Pixel-based analysis
The enhanced peak capacity and sensitivity of GC×GC-HRMS allows the separation of thousands of compounds with different physicochemical properties in such complex environmental samples. Compound identification, however, can be cumbersome still, because of the often overwhelming number of peaks and large datasets in GC×GC-HRMS.
Prioritisation is often unavoidable and is based, for example, on signal intensity, peaks with a specific isotopic pattern or mass defect, to name a few 6,16 . Often, some samples do not add meaningful information, especially when many samples are collected for spatial and temporal investigations. The pixel-based PCA provides information on the highest variation in a dataset, and thus, helps to focus the identification only on the samples with the highest variation and unique chemical fingerprints. Furthermore, the positive and negative signals in a loading plot can be used to identify peaks of high relevance.

Global model
The global model includes all measured samples from both sampling sites and describes the overall variation within and between the two sampling sites. The score plot in Figure 3 is a projection of the samples in the WPCA model ( Figure S2 for loading plots). Thus, the chemical similarities between pair or group of samples can be assessed while comparing the distances of their coordinates in this variable-reduced space spanned by the PCs. The first PC explains 73.10 % of the total variance, whereas PC2 describes 10.81 % (Figure 3).
Samples from FSK showed a more considerable variation along the PCs subspace compared to the UTM samples, which also demonstrates that the FSK samples contain a more substantial chemical heterogeneity across the sampling site. In general, the samples were separated in the WPCA model according to sampling location. However, there is an overlap along PC1 between UTM samples collected close to the outlet of the channel (Region I) and close to the road (Region II) (Figure 3). A reasonable hypothesis is that the samples collected in these specific locations of the lake are affected by chemical inputs from FSK and the urban areas delimited by Region II. In contrast, the UTM samples from the centre of the lake (Region III) are less affected by the chemical inputs from both FSK and Region II, due to dilution in the lake.
In summary, the global WPCA model was able to show that i) FSK and UTM sampling sites contain distinct chemical fingerprints and ii) chemical inputs to UTM may come from FSK (Region I) and the urban areas delimited by Region II. To assess the chemical composition of the samples from each sampling site individually, and to elucidate the contamination sources, local WPCA models for FSK and UTM sample sets were calculated. The local WPCA models were used to prioritise a subset of samples within each site in order to reduce the identification workload.
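One simple way to turn a score matrix into a shortlist of samples for identification is to pick, for each PC, the samples at the two extremes of the score axis. The sketch below illustrates this idea only; the selection in the study was made by inspecting the score plots, and the function name, the `n_per_end` parameter and the example values are hypothetical.

```python
import numpy as np


def extreme_samples(scores, names, n_per_end=1):
    """For each PC, return the names of the lowest- and highest-scoring samples."""
    scores = np.asarray(scores, dtype=float)
    picked = {}
    for pc in range(scores.shape[1]):
        order = np.argsort(scores[:, pc])
        low = [names[i] for i in order[:n_per_end]]
        high = [names[i] for i in order[-n_per_end:]]
        picked[f"PC{pc + 1}"] = (low, high)
    return picked


# Made-up scores for three samples on two PCs.
example = extreme_samples(
    [[-0.05, 0.01], [0.03, -0.06], [0.00, 0.04]],
    ["A", "B", "C"],
)
print(example)
```

Such a rule favours samples with distinctive fingerprints, so compounds present at similar levels in all samples would still need to be found by examining the prioritised chromatograms as a whole.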

Local models -Channel (FSK) site
The local model of the FSK sampling site explained 91.27 % of the total variation and was built using six PCs. Chemical interpretation of the loading coefficients of PC1 to PC6 was performed to assess chemically relevant patterns and the presence of modelling artefacts (Figure 4). For example, a negative coefficient in the PC1 loading, such as the dark blue peak #7 in Figure 4, combined with a negative score in the score plot (Figure 5), indicates that the particular sample (2C1) has a high relative concentration (due to the nature of the normalisation) of that compound compared to the samples with high positive PC1 scores.
PC3 and PC4 loadings majorly describe residual retention time shifts that could not be fully removed during pre-processing ( Figure S3). Therefore they do not explain relevant chemical information from this site and were not discussed further.  Table 1, Table S1). They indicate a continuous input of mineral oil products in this particular region of the fortress channel (negatively scoring samples 1C to 2C) in the past. The top-layer (2C1, 0-10 cm) positively scores in PC2, thus, there is relatively less of these oil components within the entire chromatographic fingerprint of these samples. Therefore, it could be presumed that the spill did not occur recently. The retained oil compounds were weathered and potentially biodegraded in the lower sediment levels (2C2 to 2C3, 10-30 cm) seen by the many alkylated compounds such as the C4 -C6benzenes (#10.4 -10.6), naphthalenes (#48.1 -48.4), PAHs and dibenzothiophenes (#30. 1 -30.2). Alkylphenols (#7) and butylated hydroxytoluene (#19) also occurred with negative PC2 coefficients ( Figure 5). They have a wide variety of applications in consumer products such as detergents or cleaning products; butylated hydroxytoluene is known as an antioxidant and is widely used in fuels to prevent oxidation. The most common alkylated phenol is nonylphenol which is primarily used for the production of nonylphenol ethoxylate surfactants or a degradation product of them. They are defined as endocrine disrupters, and are persistent and bioaccumulative, and have been found ubiquitously in the aquatic environment 28 . The positive PC2 loading coefficients are mostly described by a few peaks (#39 diterpenes and #64.d tetralin derivatives) which, as in the positive PC1 loading, is indicative for a dominating natural chemical fingerprint (Figure 4).  an acceptable pre-processing as part of the WPCA modelling 20 . 
Single sampling spots highlight that there were variations within the same sampling grid, e.g., samples 6S01, 02, 04 and 05 range from -0.055 to 0.038 in PC1 ( Figure 5). The loading of PC3 (positive direction) describes mainly sample 6S05 ( Figure S3). Composite samples, on the other hand, describe an average of five sampling spots per sampling grid, such as for sampling grids 3C0 or 7C0. The considerable variation within one sampling grid is significant with respect to the sampling strategy. Therefore, the chromatograms highlight the importance of the adequate sampling strategy, particularly for sediments and soil samples containing high chemical heterogeneity and immobility of non-polar contaminants 29 .
Six out of 18 samples were prioritised for compound identification according to their scoring in the particular loadings, namely 2C1, 2C2, 2C3, 6S05, 8S02 and 8S03. These samples form the corners of the dataset in the particular loadings. Alkanes (#5), dichloro-diphenyl-dichloroethane (DDD, #24), polychlorinated biphenyls (PCBs, #51) and PAHs were observed in higher relative concentrations at depth (Figure 4). Dichloro-diphenyl-trichloroethane (DDT) and PCBs are POPs and were phased out in Europe in the 1970s. Therefore, it comes as no surprise to find these pollutants at relatively higher concentrations at depth. However, this was difficult to recognise from the loading plots alone (Figure 4). Other compounds in Table 1 are no less important but were found in several samples, e.g., decalin and its derivatives (#26 in Table 1), which are industrial solvents used in fuel additives.
Sample 2C1 (0-10 cm) had a high relative concentration of alkylphenols (#7), alkylated benzenes (#10.4), non-alkylated styrene (#62) and tetralin (#64) (Table 1). Sample 2C2 was relatively different from the rest of the samples, as implied by its relatively large negative score (between -0.052 and -0.055) in PC2 compared to the rest of the samples (Figure 5). The only diterpenes that could be identified at Level 3 were 10,18-bisnorabieta-8,11,13-triene (C18H26, #39.1) and methyl-10,18-bisnorabieta-8,11,13-triene (C19H28, #39.2). The NIST library search suggested polycyclic musks (included in the term 'tetralin derivatives', #64.d) such as tonalide or versalide (used in personal care products), of which the former is associated with long-term adverse effects on aquatic life. It was not possible, however, to confirm the tentative identification without target analysis with standards.

Lake (UTM) site
The model for the lake site was built with four PCs and explained 65.0% of the total variance. The PC1 and PC3 scores and loading plots are shown in Figure 6 and  Figure 6. Samples close to a road (Region II) demonstrate a high variation in the score plot (Figure 6), such as 1S and 4T (closest to the road on the east of the lake) and 12A and 9B-RI (from the south of the lake).
Despite the 1:5 dilution of the channel samples, compound concentrations in the lake were still considerably lower than in the fortress channel samples. The prominent peaks (e.g., #1, #11 and #31) in the PC3 loading (Figure 7) indicate potentially insufficient remobilisation of these compounds in ²D, or that these compounds were tailing. Nevertheless, compound identification was possible despite the presence of only a few peaks in the 2D-BPC, which in combination with the insufficient remobilisation in ²D makes it difficult to identify compounds.
As it was the case for the channel samples, alkylphenols (#7) were present at a high relative concentration in samples close to the outlet of the fortress channel (Region I) as it can be seen by the negative PC1 and PC3 loadings (Figure 7). Interestingly, this was not the case for samples from the east, which is indicative of dilution from the channel towards the lake ( Benzothiazoles and 2-mercaptobenzothiazole are used in various industrial processes, such as for rubber vulcanisation or as a corrosion inhibitor. These compounds are biologically active and potential aquatic toxins 31 . Tire wear particles were identified as a potentially significant source in the environment 31 . Dibenzylamine (#31) is also an additive and by-product from the production of rubber. Benzothiazoles, its derivatives and dibenzylamine were detected in sediment samples and associated with a rubber production factory in China 32 . The higher relative abundance of these compounds in samples 1S and 4T is most likely linked to the proximity to the highway, which is one of the four major routes to Copenhagen and among the ten busiest highways in Denmark 33 .
The contamination source and, to a large extent, the chemical fingerprint in the PC3 loading could therefore be defined as traffic-related, perhaps even more specifically as related to tire wear particles. The impact of tire wear particles on the aquatic environment was recently reviewed by Wagner et al. 34.

*) If several compounds were identified for a compound group, the average of the mass error was calculated.
†) Confidence ID levels from Schymanski et al. 22. Retention time (¹D t_R) and index (RI) were extracted from the facilitator samples AnQC FSK and AnQC UTM and calculated based on n-alkane elution in the reference oil and Florida mix (Table S6). If the calculated RI deviated ≥ 100 from the NIST library, the confidence ID level was set down to 4.

Conclusions
Nontarget screening of urban freshwater sediment was performed by GC×GC-HRMS and pixel-based chemometric analysis. The study shows that pixel-based PCA on the 2D-BPCs without prior selection of specific ions can be a powerful tool for the NTS of sediments.
The tiered NTS workflow included (i) a pre-screening of the sample extract raw chromatograms and taking a decision on the modelling strategy; (ii) a global pixel-based PCA model to obtain a map of the overall variation of the sampling area; (iii) local models of each sampling site for a more thorough investigation and identification; (iv) source identification and identification of prioritised peaks. A proper pre-processing of the data is crucial before building the models. It was possible to describe spatial and in-depth variation, and specific chemical fingerprints such as natural, weathered oil or high-traffic/ tire wear. The prioritisation for tentative identification was not based only on peak intensity, but also on the highest variation between the samples. The pixel-based PCA prioritisation is primarily favouring samples (and thus, compounds) with a varying concentration and omits potential contaminants that are present in all the samples at a very similar concentration level. For the identification workflow, however, the prioritised sample was analysed as a whole which allowed the reporting of more compounds than visible in the plots from the PCA. The authors also would like to emphasise the importance of an appropriate sampling strategy regarding solid environmental samples due to the considerable variation between different sampling spots as it was shown herein.