Key Points
- Researchers from the University of Amsterdam and the University of Queensland developed a prioritization method linking LC–HRMS fragmentation and chromatographic data directly to aquatic toxicity categories—bypassing the need to identify individual compounds. This significantly enhances early-stage screening of emerging contaminants in complex environmental samples.
- Two models were used to predict toxicity: Random Forest Classification (RFC), which uses MS1 data, retention time, and cumulative neutral losses (CNLs) from fragmentation spectra, and Kernel Density Estimation (KDE), which uses only MS1 data and retention time. RFC excels when fragmentation data are present; KDE is valuable when they are missing, offering a complementary dual approach.
- The approach limits bias from standardized analytes by using data-independent acquisition (DIA) and open-source tools. A key innovation is the ability to predict toxicity directly from spectral data, reducing uncertainty associated with structure-based models and improving risk assessment even when most compounds remain unidentified.
Complex environmental samples contain a diverse array of known and unknown constituents. While liquid chromatography coupled with high-resolution mass spectrometry (LC–HRMS) nontargeted analysis (NTA) has emerged as an essential tool for the comprehensive study of such samples, the identification of individual constituents remains a significant challenge, primarily due to the vast number of detected features in each sample. To address this, prioritization strategies are frequently employed to narrow the focus to the most relevant features for further analysis. A recent study conducted by the University of Amsterdam (Amsterdam, Netherlands) and the University of Queensland (Queensland, Australia) developed a novel prioritization strategy that directly links fragmentation and chromatographic data to aquatic toxicity categories, bypassing the need for identification of individual compounds. LCGC International spoke to Viktoriia Turkina of the University of Amsterdam, lead author of the paper that resulted from this study (1), about their work.
Can you explain the benefits of using LC–HRMS nontargeted analysis (NTA) for investigating contaminants of emerging concern (CECs) compared to targeted approaches?
Both targeted and nontargeted approaches are indispensable for a comprehensive analysis of CECs. Targeted analysis is essential for detecting and quantifying specific, well-characterized compounds with high accuracy and sensitivity. However, it is inherently biased and therefore limited to compounds we already know and expect to find.
On the other hand, LC–HRMS NTA aims to provide an unbiased and comprehensive chemical fingerprint of a sample. A major advantage of LC–HRMS NTA over targeted approaches when investigating CECs is that it does not require prior knowledge of the chemical composition of a sample. This is crucial given the vast and ever-growing number of chemicals in the environment, many of which are not routinely monitored. It enables the detection of a much broader spectrum of chemical features, including novel compounds or transformation products that would be missed by targeted methods.
What are some of the key challenges in using LC–HRMS NTA for environmental samples, and how would you approach preprocessing the data?
One of the main challenges of using LC–HRMS NTA in environmental analysis is the complexity and convolution of the data. Often thousands of features are detected, which are signals defined by mass-to-charge ratio (m/z), retention time, and intensity. Coelution, matrix effects, and instrumental noise often introduce false-positive features or obscure relevant signals. Additionally, fragmentation spectra are often incomplete, making it difficult to reconstruct structures or even assign meaningful annotations. Typically, fewer than 5% of the detected compounds can be confidently annotated and identified.
Data preprocessing is essential to manage this complexity, and the specific approach depends on the goal of the study. In a typical NTA workflow, it begins with feature detection to filter out irrelevant instrument signals and retain true analytical features. Then, features detected in procedural blanks are removed, and alignment is applied across replicates to ensure consistency. After that, features undergo what’s called componentization, which means grouping related signals, such as adducts, in-source fragments, isotopologues, and MS2 data, together so that they represent a single compound. Once components are formed, they can proceed to identification using reference data, spectral libraries, acquisition of standards, or orthogonal analytical methods.
Because most features remain unidentified, prioritization becomes a key step. Depending on the study’s scope, prioritization strategies can be applied at different stages: based on intensity, statistical relevance, or, as we demonstrated in our study, predicted aquatic toxicity.
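To make these preprocessing and prioritization steps concrete, here is a minimal Python sketch of blank subtraction followed by a simple intensity-based shortlist on a generic feature table. The column names, tolerances, and the threefold blank-ratio rule are illustrative assumptions rather than the exact parameters of any particular workflow.

```python
import pandas as pd

def blank_subtract(features: pd.DataFrame, blanks: pd.DataFrame,
                   mz_tol: float = 0.005, rt_tol: float = 0.2,
                   fold: float = 3.0) -> pd.DataFrame:
    """Drop features also present in procedural blanks, unless the sample
    intensity is at least `fold` times the blank intensity (illustrative rule)."""
    keep = []
    for _, f in features.iterrows():
        match = blanks[(blanks["mz"].sub(f["mz"]).abs() <= mz_tol) &
                       (blanks["rt"].sub(f["rt"]).abs() <= rt_tol)]
        if match.empty or f["intensity"] >= fold * match["intensity"].max():
            keep.append(f)
    return pd.DataFrame(keep)

def prioritize_by_intensity(features: pd.DataFrame, top_n: int = 100) -> pd.DataFrame:
    """Simplest offline prioritization: keep the top_n most intense features."""
    return features.sort_values("intensity", ascending=False).head(top_n)

# Synthetic example: one feature survives blank subtraction, one is removed
samples = pd.DataFrame({"mz": [301.1410, 212.0950], "rt": [5.2, 7.8],
                        "intensity": [1.2e6, 4.0e4]})
blanks = pd.DataFrame({"mz": [212.0952], "rt": [7.79], "intensity": [3.5e4]})
shortlist = prioritize_by_intensity(blank_subtract(samples, blanks), top_n=50)
print(shortlist)
```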
Why is it important to limit bias toward standardized analytes in NTA workflows, and what strategies can be used to achieve that?
Limiting bias toward standardized analytes in NTA workflows is crucial because one of the core strengths of nontargeted analysis is its ability to detect both known and unknown compounds. Focusing on standardized analytes inherently introduces bias by excluding chemicals that have not been previously studied. In the context of environmental analysis, where we are dealing with a vast and evolving number of chemicals, including transformation products and newly introduced substances, this can lead to a significant underestimation of potential exposure and risk.
To minimize these biases, several strategies can be used. One is to apply data acquisition methods that do not rely on predefined inclusion lists. For instance, using data-independent acquisition (DIA) rather than data-dependent acquisition (DDA), which can miss low-abundance but potentially important features. Another strategy is to use open-source or customizable preprocessing tools that allow the user to retain low-intensity or atypical features, rather than automatically discarding them based on general filters.
You mention that only about 5% of detected features in NTA studies are typically identified. What factors contribute to this low identification rate?
The size and diversity of the chemical space in environmental samples are enormous. We are dealing with hundreds of thousands of compounds, including unknown transformation products, contaminants that were never registered, and naturally occurring substances that may not exist in any current database.
Another current major limitation for the identification is the lack of comprehensive spectral libraries. While databases like MassBank or MoNA are valuable, they cover only a small fraction of the total chemical space. If a compound isn’t in the library, even a good-quality MS2 spectrum won’t lead to tentative identification. This is further complicated by the variability in fragmentation behavior across instruments and settings, which makes spectral matching less reliable.
Incomplete or poor-quality MS2 data also contributes to the problem. In environmental samples, many features are present at low concentrations, which can result in weak or missing fragmentation spectra. If a compound is detected in MS1 but not fragmented or fragmented incompletely, it becomes very difficult to identify.
Finally, the identification process itself often depends on accurate prediction of molecular formulas and fingerprints. Tools like SIRIUS or CSI:FingerID are powerful, but their accuracy can be affected by incorrect fragment annotation, background noise, or coeluting compounds. Any errors in this step can lead to false identifications.
Describe the differences between online and offline prioritization strategies in LC–HRMS studies. What are the trade-offs between them?
Online and offline prioritization strategies in LC–HRMS studies differ mainly in when and how prioritization is applied during the data acquisition and analysis process.
Online prioritization happens during data acquisition. For example, in data-dependent acquisition (DDA), the instrument is set up to trigger MS2 scans for certain features, typically those above a specific intensity threshold or on a predefined inclusion list. This approach can improve data quality for selected compounds, as it allows for cleaner, high-resolution fragmentation spectra with better sensitivity. However, the trade-off is that it introduces bias: potentially important but low-abundance features might be missed entirely if they don’t meet the triggering criteria. That’s especially problematic in environmental samples, where even trace-level contaminants can have ecological relevance.
Offline prioritization, on the other hand, is applied after the data have already been collected. All features are recorded in MS1, and potentially some in MS2 through DDA or DIA, and then post-processing is used to decide which features to focus on. This approach is more comprehensive because it preserves the full chemical information of the sample and allows for retrospective analysis. The trade-off with offline prioritization is that it requires more complex data processing and some features may lack sufficient MS2 information for confident follow-up. But overall, it offers greater flexibility and minimizes the risk of missing unknown but relevant contaminants.
In short: online prioritization is more targeted and sensitive but risks missing unexpected compounds, while offline prioritization is broader and more exploratory but depends on sophisticated data handling.
In the context of environmental studies, how can prioritization strategies influence the focus of downstream analysis?
Given the complexity of environmental samples and the limited resources available for full identification or toxicity testing, prioritization acts as a filter to highlight the most relevant or potentially harmful compounds. This allows researchers to focus downstream efforts, such as confirmation, identification, or regulatory assessment, on features that are more likely to pose, for example, ecological threats, even if we don’t yet know exactly what they are. Ideally, prioritization connects broad, unbiased detection in NTA to actionable insights in environmental monitoring and risk assessment. It transforms a vast and complex dataset into a manageable, targeted list of features that guide the rest of the study.
How does data-dependent acquisition (DDA) differ from data-independent acquisition (DIA) in mass spectrometry, and how might each impact prioritization workflows?
Data-dependent acquisition (DDA) and data-independent acquisition (DIA) are two different strategies for collecting MS2 data in LC–HRMS.
In DDA, the instrument selects precursor ions for fragmentation in real time, typically based on intensity. This means only the most abundant ions at any given moment are selected for MS2 scans. While this results in high-quality fragmentation spectra for those ions, it also means that lower-abundance or coeluting compounds may be missed entirely. This selective nature can bias the data toward known or high-intensity features, which in turn limits the comprehensiveness of downstream prioritization, especially for studies that aim to capture unknown or low-level contaminants.
In contrast, DIA fragments all ions across a predefined mass range in a systematic and unbiased way, regardless of intensity. This results in highly multiplexed MS2 data that cover all detectable features, increasing the likelihood of capturing information on low-abundance or unexpected compounds. The trade-off is that DIA spectra are more complex and require more advanced deconvolution during data processing to assign fragments correctly.
From a prioritization standpoint, DDA can limit your ability to assess many features because MS2 data may be missing for a large portion of them. This restricts the use of fragmentation-based prioritization models, like the CNL-based model we used in our study. DIA, on the other hand, provides broader MS2 coverage, which supports more comprehensive prioritization, but places greater demands on preprocessing and computational tools to resolve overlapping spectra.
Ultimately, the choice between DDA and DIA affects both the quality and completeness of fragmentation data and therefore determines how many features can be prioritized based on their fragmentation behavior. Depending on the study’s scope (target-focused or exploratory), one approach may be more suitable than the other.
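As a toy illustration of how the acquisition mode affects MS2 coverage, the sketch below contrasts a top-N DDA selection with sequential DIA isolation windows on a simulated MS1 scan. The peak list, the top-N value, and the window settings are made-up parameters for demonstration only.

```python
# Toy comparison of MS2 coverage under DDA (top-N) and DIA (isolation windows).
peaks = [(151.04, 2.0e6), (212.09, 4.0e4), (301.14, 1.2e6),
         (355.07, 8.0e3), (410.20, 6.5e5)]  # (m/z, intensity), invented values

def dda_top_n(peaks, n=2):
    """DDA: fragment only the n most intense precursors in this cycle."""
    return {mz for mz, _ in sorted(peaks, key=lambda p: p[1], reverse=True)[:n]}

def dia_coverage(peaks, lo=100.0, hi=500.0, width=50.0):
    """DIA: every precursor inside the sequential isolation windows is
    co-fragmented, regardless of its intensity."""
    covered, win_lo = set(), lo
    while win_lo < hi:
        covered |= {mz for mz, _ in peaks if win_lo <= mz < win_lo + width}
        win_lo += width
    return covered

print("DDA fragments:", sorted(dda_top_n(peaks)))    # misses low-abundance ions
print("DIA fragments:", sorted(dia_coverage(peaks))) # covers all detectable ions
```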
What are some key features or descriptors extracted from chromatographic and mass spectral data that can be used to predict chemical toxicity?
In LC–HRMS-based nontargeted analysis, even without knowing a compound’s exact structure, we can extract meaningful features from chromatographic and mass spectral data to help predict chemical toxicity, for example, monoisotopic mass, retention time (or retention index), and information from fragmentation spectra.
Monoisotopic mass gives a general idea of the size and elemental composition of a molecule. Retention time reflects its polarity and hydrophobicity, which are closely related to properties such as bioaccumulation and membrane permeability that are considered important factors in toxicity. Fragmentation spectra reflect substructures and functional groups that may be linked to toxic mechanisms.
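To illustrate the fragmentation-derived descriptors, the short sketch below computes neutral losses as the difference between a precursor m/z and its fragment m/z values; the masses here are invented for the example.

```python
# Minimal sketch: derive neutral losses from one MS2 spectrum.
# Precursor and fragment m/z values are invented for illustration.
precursor_mz = 301.1410
fragment_mzs = [283.1304, 255.1355, 159.0441]

neutral_losses = [round(precursor_mz - f, 4) for f in fragment_mzs]
print(neutral_losses)  # the 18.0106 loss corresponds to water, for instance
```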
The document mentions using Random Forest Classification (RFC) and Kernel Density Estimation (KDE) models. How do these models differ in their approach to classifying features into toxicity categories?
The RFC model learns patterns from monoisotopic mass, retention index, and cumulative neutral losses (CNLs) derived from fragmentation spectra. KDE, by contrast, estimates the probability density of features within the chromatographic and MS1 space (mass and retention time), based on the known distributions of toxicity categories. When a new feature is introduced, the model assesses which region of this space it falls into and assigns it to the toxicity category with the highest probability density.
The key difference is that RFC learns from examples and relies on supervised training, whereas KDE maps the density landscape of the feature space and makes assignments based on proximity to known toxicological patterns. RFC tends to perform better when MS2 data are available and consistent, while KDE is more flexible and useful when fragmentation data are missing or unreliable.
In practice, these models complement each other: KDE broadens the applicability of toxicity predictions when MS2 data are lacking, and RFC adds higher precision when rich fragmentation information is available. This dual approach allows for a more robust prioritization strategy for NTA.
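Below is a minimal sketch of how such a complementary pair of classifiers could be set up with scikit-learn, trained here on synthetic data. The descriptor layout, the number of toxicity categories, and the KDE bandwidth are illustrative assumptions, not the actual models from the study.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KernelDensity

rng = np.random.default_rng(0)
# Columns: monoisotopic mass, retention index, then binned CNL descriptors
X_rfc = rng.random((200, 2 + 50))
y = rng.integers(0, 3, size=200)          # three synthetic toxicity categories

rfc = RandomForestClassifier(n_estimators=300, random_state=0).fit(X_rfc, y)

# KDE: one density model per category, using only (mass, retention) space
X_kde = X_rfc[:, :2]
kdes = {c: KernelDensity(bandwidth=0.1).fit(X_kde[y == c]) for c in np.unique(y)}

def kde_predict(feature_2d):
    """Assign the category whose density is highest at this point."""
    scores = {c: kde.score_samples(feature_2d.reshape(1, -1))[0]
              for c, kde in kdes.items()}
    return max(scores, key=scores.get)

new_feature = rng.random(2 + 50)
print("RFC category:", rfc.predict(new_feature.reshape(1, -1))[0])
print("KDE category:", kde_predict(new_feature[:2]))
```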
Discuss the potential limitations of predicting toxicity based solely on mass spectrometry data without relying on molecular structure or fingerprints.
Mass spectrometry provides indirect information about a compound, but it doesn’t always reveal the full structural context. Two very different molecules might have similar MS1 and retention characteristics or share common fragment ions, which could lead to incorrect or overly broad toxicity predictions. Without structural information, we can miss subtle but important differences in functional groups or stereochemistry that significantly affect toxicity.
Another limitation is uncertainty in the predictions. While our models, like the Random Forest and KDE approaches, demonstrated good overall performance, they rely on patterns learned from known compounds. When applying them to truly novel or underrepresented chemistries, there’s a risk that predictions become less reliable, especially in the absence of structural confirmation.
Also, toxicity is often mechanism-specific, meaning it depends not just on general physicochemical properties, but on how a molecule interacts with biological systems. These interactions are difficult to fully capture with mass data alone.
That said, our approach helps bridge the gap when the structure is unknown or MS2 data are incomplete. It allows for early-stage screening and prioritization, which is extremely valuable in complex environmental samples. But it should be viewed as a complementary tool and not as a replacement for structure-based toxicity assessment.
Why might directly predicting chemical activity from chromatographic and MS data reduce uncertainty compared to models based on molecular fingerprints?
Molecular fingerprints are derived from predicted molecular structures, which themselves are inferred from MS2 data. But this step is prone to error, especially when the fragmentation spectra are incomplete, noisy, or missing key diagnostic fragments. If the structure prediction is inaccurate, the resulting fingerprint will be wrong, and that will compromise any toxicity prediction built on it.
By contrast, our approach bypasses the need for structural reconstruction entirely. Instead of predicting a molecular fingerprint, we use information that is directly observed: the monoisotopic mass, retention time, and cumulative neutral losses (CNLs) from the fragmentation spectrum. As a result, there’s less room for error, and predictions are more robust, particularly for unknown or novel compounds that fall outside existing chemical databases.
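One common way to feed observed CNLs to a classifier is to bin them into a fixed-length presence/absence vector. The sketch below assumes a 0.01 Da bin width over a 0–200 Da range; this binning scheme is an illustrative choice, not necessarily the one used in the paper.

```python
import numpy as np

# Illustrative binning of cumulative neutral losses (CNLs) into a fixed-length
# vector usable by a classifier; bin width and mass range are assumptions.
def cnl_vector(neutral_losses, max_loss=200.0, bin_width=0.01):
    bins = np.arange(0.0, max_loss + bin_width, bin_width)
    counts, _ = np.histogram(neutral_losses, bins=bins)
    return (counts > 0).astype(int)   # presence/absence of a loss in each bin

vec = cnl_vector([18.0106, 46.0055, 142.0969])
print(vec.shape, vec.sum())           # fixed-length vector, three occupied bins
```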
This is especially important in environmental samples, where a large portion of detected features remain unidentified. By using directly measured chromatographic and spectral features, we can still assess their toxicity.
Given the need to monitor a diverse set of chemicals in environmental samples, what do you think are the most promising areas of research to improve exposure assessment and risk evaluation?
I think one of the most important areas is expanding and harmonizing toxicological databases, especially for sub-lethal, chronic, or mixture effects. Most current risk assessments rely on acute toxicity data for a limited number of model organisms. But real-world exposure involves mixtures, low doses, and diverse species. Developing better in vitro and in silico models, including new endpoints and multi-species frameworks, will be key for filling those gaps.
I also see huge potential in retrospective data mining and temporal trend analysis. With NTA data, we’re often sitting on a large archive of information. If we can improve tools for reanalyzing existing datasets as new chemicals or risks emerge, we can gain insights into long-term trends and previously overlooked exposures, without needing to repeat sampling or analysis.
Finally, enhancing interoperability and automation in data processing, through open-source tools, standardized formats, and data-driven annotation, will be essential to make NTA-based exposure assessment more scalable and routine in regulatory or monitoring contexts.
Altogether, combining high-resolution data with predictive models, broader toxicity endpoints, and better data infrastructure is what will allow us to move from just detecting chemicals to understanding and managing their risks in a more proactive and comprehensive way.
If tasked with improving the accuracy of toxicity predictions from LC–HRMS data, what steps would you prioritize in model development and validation?
First, I would focus on expanding and diversifying the training data. One major limitation in current models is the relatively small set of compounds with both high-quality LC–HRMS spectra and experimentally measured toxicity values. Increasing the chemical and structural diversity of the training set, especially for underrepresented toxicity classes, would help reduce bias and improve generalization. It would also be important to include compounds with a wider range of modes of action and environmental relevance.
Second, I would work on improving feature engineering. Right now, models like ours use descriptors such as retention index, monoisotopic mass, and cumulative neutral losses. These are already informative, but combining them with new types of data, like ion mobility, peak shape descriptors, or metadata from chemical use or environmental occurrence, could add more predictive power. Improving the quality and consistency of fragmentation data through better acquisition settings or preprocessing would also make a big difference.
Next, I would invest in applicability domain (AD) analysis and uncertainty quantification. Knowing where the model performs reliably and where it doesn’t is just as important as the prediction itself. Tools like leverage analysis or probabilistic thresholds (which we used in our KDE model) can help guide users on when to trust a prediction.
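As an illustration of a density-based applicability-domain check, the sketch below flags query points whose KDE log-density falls below a percentile of the training-set densities; the 5th-percentile cutoff and the synthetic two-dimensional data are arbitrary assumptions.

```python
import numpy as np
from sklearn.neighbors import KernelDensity

# Sketch: flag predictions outside the model's applicability domain by
# thresholding the density of the training data (illustrative cutoff).
rng = np.random.default_rng(1)
X_train = rng.normal(size=(500, 2))   # e.g. scaled (mass, retention index)

kde = KernelDensity(bandwidth=0.3).fit(X_train)
threshold = np.percentile(kde.score_samples(X_train), 5)

def in_domain(x):
    """True if the query point lies in a region well covered by training data."""
    return kde.score_samples(np.asarray(x).reshape(1, -1))[0] >= threshold

print(in_domain([0.1, -0.2]))   # near the training cloud -> likely True
print(in_domain([8.0, 8.0]))    # far outside -> likely False
```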
Finally, external validation on truly independent datasets is critical. It's easy to overestimate performance when testing on data that resemble the training set. Validating models on real-world, complex samples is a more realistic and rigorous way to assess performance.
In short, to improve toxicity prediction, I would focus on growing and refining the training data, enriching the input features, strengthening interpretability through AD analysis, and validating on diverse, real-world samples. I believe that these steps together would support the development of more robust, scalable models for environmental risk assessment.
Describe how the integration of machine learning tools could transform environmental risk assessment workflows. What are the biggest challenges to their adoption?
Machine learning can help us extract meaningful patterns from high-dimensional data, like LC–HRMS results, and make predictions about toxicity, persistence, or bioaccumulation even when full structural information is unavailable.
This shift enables faster predictive risk assessment. For example, machine learning models like the ones we developed can prioritize unknown features based on predicted toxicity directly from mass spectrometry data. That means regulators and researchers can focus attention on the most concerning compounds earlier in the process, even if they haven’t been identified yet. It also opens the door to retrospective analysis of existing datasets and better monitoring of emerging contaminants.

However, there are still significant challenges to adopting these tools in routine practice. One major hurdle is trust and interpretability. Regulatory agencies need transparency in how models make decisions, especially when those decisions influence public health policy. Black-box models are often difficult to validate or explain, so there’s a need for interpretable and well-documented machine learning approaches.
Another challenge is data quality and availability. Machine learning is only as good as the data it’s trained on, and in environmental science, high-quality, labeled datasets, especially for toxicity endpoints, are limited. Standardization of data formats, better data sharing, and expanding toxicity databases will be essential.
Finally, integration into existing regulatory workflows takes time. It requires collaboration between computational scientists, chemists, toxicologists, and policymakers. Training, validation, and acceptance of new tools must be supported by clear guidelines and frameworks.
Overall, machine learning currently cannot replace traditional risk assessment, but it can greatly enhance it by helping us move from reactive to proactive strategies, identifying risks earlier, and making better use of the data we have already collected.
References
- Turkina, V.; Gringhuis, J. T.; Boot, S.; et al. Prioritization of Unknown LC–HRMS Features Based on Predicted Toxicity Categories. Environ. Sci. Technol. 2025, 59 (16), 8004–8015. DOI: 10.1021/acs.est.4c13026