Non-Targeted Food Analysis: How HRMS and Advanced Data Processing Tools Address the Current Challenges

LCGC Europe, October 2021, Volume 34, Issue 10
Pages: 442–445

LCGC Europe spoke to Christine Fisher about the challenges and solutions associated with developing non-targeted food analysis methods, why data quality is so important, and how data processing software and algorithms are helping to tackle the current challenges in food analysis.

Q. Why are non-targeted methods necessary for food analysis?

A: Liquid chromatography–high‑resolution mass spectrometry (LC–HRMS) non-targeted methods can detect and identify new, unknown, and/or unexpected compounds and can be instrumental in pinpointing unsafe contaminants in food. They are often used in screening applications, such as olive oil authentication (1), or in response to adverse events, such as when milk and milk products were contaminated with melamine in China in 2008 (2), with Chinese authorities later reporting that approximately 53,000 infants suffered illnesses, with 13,000 hospitalizations and four deaths. They can also be used for retrospective analyses where previous data can be mined for newly discovered compounds of concern to determine if these chemicals have been detected previously, for example, retrospective analysis for transformation products of detected pesticides and veterinary drugs (3). Non-targeted methods are also useful in nutrient analyses and foodomics because they offer a more complete view of sample composition.

Q. What novel challenges have emerged that non-targeted methods are suited to solving?

A: The challenges we face today are not novel or different from those we faced a decade ago, but they have become more common, partly because of the globalization of the food supply, which enables the year-round availability of a highly diverse array of foods regardless of local climate and weather conditions. Currently, the FDA uses targeted methods for food sample analyses to verify that foods, domestic and imported, meet all applicable food safety standards. However, when targeted methods are not available, non-targeted analyses allow us to detect and identify new, unknown, and/or unexpected compounds.

Q. The techniques used in non-targeted analysis (NTA) generate a lot of data and this is especially true when analyzing a complex food matrix. What issues arise when this much data is generated?

A: Non-targeted datasets are extremely information-rich, which is incredibly powerful, but requires the use of software and processing tools to efficiently mine the data. Therefore, to ensure that the data and results generated are trustworthy, it is important to develop and implement quality control procedures that can be used to provide confidence in the sample preparation procedures, instrument operation, software, processing tools, and settings. In addition, NTA detects hundreds to thousands of compounds in a single food sample, most of which are inherent to the food, so we need tools and methods to quickly flag compounds of interest for further investigation or identification.

Q. Your recent paper focused on specific data processing tools that are currently available for NTA/suspect screening analysis (SSA) (1). What are some of the challenges that data processing tools can overcome?

A: Data processing tools are generally used to prioritize data and molecular features for further investigation, reduce false positives, and/or putatively identify detected compounds. Given that we typically detect thousands of molecular features in a single food sample, prioritizing features of interest helps us focus our identification efforts more efficiently. For example, for food safety applications we have used chemometric approaches, such as principal component analysis (PCA) or differential analysis, among many others, to highlight compounds and/or samples that need further scrutiny, such as compounds present in a suspect sample but absent from a control sample.
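The differential comparison described above can be sketched in a few lines. This is a minimal illustration, not the FDA's actual workflow: the feature IDs (m/z_retention-time strings), peak areas, fold-change threshold, and detection floor are all illustrative assumptions.

```python
# Sketch of a differential comparison between a suspect and a control
# sample: flag molecular features whose intensity in the suspect sample
# greatly exceeds that in the control. All values are illustrative.

def flag_suspect_features(suspect, control, fold_threshold=10.0, floor=1e3):
    """Return features enriched in the suspect sample.

    suspect/control map feature IDs (e.g. "m/z_RT" strings) to peak areas.
    `floor` stands in for a detection limit when a feature is absent
    from the control sample.
    """
    flagged = []
    for feature, area in suspect.items():
        baseline = max(control.get(feature, 0.0), floor)
        if area / baseline >= fold_threshold:
            flagged.append(feature)
    return flagged

suspect = {"285.0768_6.2": 5.0e5, "303.0874_4.1": 2.1e4, "609.1461_8.3": 8.8e6}
control = {"285.0768_6.2": 4.8e5, "303.0874_4.1": 1.5e3}

print(flag_suspect_features(suspect, control))
```

In practice this kind of filter is only a first pass; flagged features would then go on to the identification steps discussed below.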

Suspect screening analysis is one of the quickest ways to putatively identify sample components: detected masses, molecular formulae, and/or fragmentation spectra are screened against compound databases. However, there can be tens to thousands of molecular structures sharing the same molecular formula, resulting in a high propensity for false positives. Retention time prediction, tandem mass spectrometry (MS/MS), and ion mobility spectrometry (IMS) can provide additional structural information to help eliminate some of these false positives and/or determine which functional groups are likely to be present in the molecular structure. In addition, suspect screening approaches are limited by incomplete databases; however, tandem mass spectrometry tools, such as in silico MS/MS, similarity searches, and molecular networking, can help bridge this gap.
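The accurate-mass step of suspect screening reduces to a ppm-tolerance comparison. The sketch below matches observed m/z values against a tiny suspect list of [M+H]+ adducts; the compound list and the 5 ppm window are illustrative assumptions, though the ppm formula itself is standard.

```python
# Sketch of suspect screening by accurate mass: match an observed m/z
# against a suspect list within a ppm tolerance. The suspect list and
# tolerance are illustrative.

PROTON = 1.007276  # proton mass (Da), for computing [M+H]+ adducts

suspect_list = {
    "melamine": 126.0654,  # monoisotopic neutral masses (Da)
    "caffeine": 194.0804,
}

def screen(observed_mz, tolerance_ppm=5.0):
    """Return (name, ppm error) pairs whose [M+H]+ m/z matches observed_mz."""
    hits = []
    for name, neutral_mass in suspect_list.items():
        theoretical = neutral_mass + PROTON
        ppm = (observed_mz - theoretical) / theoretical * 1e6
        if abs(ppm) <= tolerance_ppm:
            hits.append((name, round(ppm, 2)))
    return hits

print(screen(127.0726))  # close to protonated melamine
```

A real screen would also consider other adducts, isotope patterns, and retention time, exactly because formula-level matches alone produce so many false positives.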

Q. What is data quality and why is it so important?

A: Data quality refers to the accuracy, precision, reproducibility, and reliability of the collected data. In non-targeted analysis, data quality encompasses the observed mass accuracy and measured isotopic abundance of detected compounds; the chromatographic separation of compounds, for example, resolution and peak shape; the number of mass spectra collected across chromatographic peaks; the reproducibility of detected compounds, retention times, and intensities across replicates; and so on. As with targeted methods, collecting quality data is necessary to obtain reliable and meaningful results. The amount of data generated and our reliance on processing software make data quality particularly important for non-targeted analyses. Using pooled samples and/or large standard mixtures are two good examples of quality assurance/quality control (QA/QC) measures that can be employed to determine and demonstrate data quality (4,5). Ensuring good data quality is also critical for attaining high-throughput analysis.

Q. What other measures would you recommend for obtaining high-quality data for non-targeted analysis?

A: Regardless of the methods used, it is important to implement quality control procedures to properly assess data quality. For example, we have developed a non‑targeted standard quality control (NTS/QC) mixture containing 89 compounds covering a wide range of physicochemical properties: molecular weights of 126–1100 Da, estimated log Kow values from -8 to 8.5, amenability to electrospray ionization in positive and/or negative mode, and diverse chemical classes (4). We successfully implemented the NTS/QC to measure critical data quality parameters, including observed mass accuracy (within 3 ppm), isotopic ratio accuracy (most compounds matched with a score greater than 0.6 out of 1), and peak height reproducibility (greater than 94% of compounds had less than 20% relative standard deviation). This procedure also highlighted areas for improvement in our method: for example, the separation of early eluting polar analytes could be improved, and background and/or matrix interferences were responsible for poor isotopic ratio matches for some compounds. It is also good practice to use a quality control mixture to monitor instrument performance over time, which can indicate when a new column is necessary or the mass spectrometer requires cleaning. Finally, it is important to randomize samples and to analyze blanks and pooled samples multiple times throughout the sample queue to check for carryover and drifts in instrument performance.
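Two of the QC metrics mentioned above, mass accuracy in ppm and peak-height reproducibility as percent relative standard deviation, are simple to compute. The sketch below checks one hypothetical QC compound against the 3 ppm and 20% RSD criteria described; the masses and peak heights are illustrative assumptions.

```python
# Sketch of two QC checks for a standard mixture: observed mass accuracy
# (ppm) and peak-height reproducibility (%RSD) across replicate
# injections. Values and thresholds are illustrative.
import statistics

def ppm_error(observed, theoretical):
    return (observed - theoretical) / theoretical * 1e6

def percent_rsd(heights):
    return statistics.stdev(heights) / statistics.mean(heights) * 100

# replicate peak heights for one QC compound (illustrative)
heights = [1.02e6, 0.98e6, 1.05e6, 0.99e6]

mass_ok = abs(ppm_error(283.2637, 283.2632)) <= 3.0   # within 3 ppm?
height_ok = percent_rsd(heights) <= 20.0              # within 20% RSD?

print(mass_ok, height_ok)
```

Applying checks like these across all compounds in a QC mixture, and tracking them over time, is what turns a standard mix into a performance monitor for the whole method.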

Q. There is a wide array of data processing software and algorithms available, the selection of which can impact the results of a study. Are there any efforts being made to standardize these into a workflow for specific fields of study? And if so, what are the challenges to achieving this?

A: There are a number of ongoing efforts to standardize and harmonize aspects of non-targeted analysis workflows, including those by several working groups, such as the Metabolomics Quality Assurance and Quality Control Consortium (mQACC) (6), the NORMAN Network (7), and Benchmarking and Publications for Non‑Targeted Analysis (BP4NTA) (8). One of the biggest challenges for standardization is the broad applicability of non-targeted analysis, even within a given field. For example, within food analysis, workflows used to investigate food safety, quality, and authenticity are often distinct from those used to investigate foodomics and nutrition. In addition, no single method is capable of detecting and identifying everything, as demonstrated by the EPA's Non-Targeted Analysis Collaborative Trial (ENTACT) (9,10), so standardizing a single method will inherently exclude some of the chemical space that may be of interest. It is also important to consider that standardizing non-targeted workflows may stifle creativity and the advancement of the field.

I think there are certainly applications and/or aspects of workflows that could benefit from standardization, such as thorough reporting of NTA methods and results. For example, using the NTA Study Reporting Tool (SRT) developed by BP4NTA (8). Other applications that could benefit from standardization would be QA/QC best practices, with efforts by mQACC (6) being notable, and quality control mixtures for performance assessment (4). However, standardization of all non‑targeted workflows is likely not realistic and could be a detriment to the field.

Specifically considering processing workflows, a recent study found only ~10% overlap in reported compounds when different software packages were used to process the same dataset (11). While this result likely reflects a combination of the different algorithms and the settings chosen for each software package, it is certainly striking and emphasizes the need for standardization efforts. Hindering these efforts, however, is software accessibility. Vendor-specific software is often very powerful and user‑friendly, but it typically works only with vendor-specific data and may be cost-prohibitive for some users. Open-source software avoids these drawbacks and can be more flexible, but it often requires more programming knowledge and/or the linking of multiple algorithms to generate a full workflow. This increased flexibility makes the tools more powerful but also more difficult to standardize, although there have been efforts to do this in metabolomics (12).

Q. What exactly are molecular networks and what uses do they have?

A: Molecular networks rely on MS/MS data, where fragmentation spectra are generated by the dissociation of precursor ions in the mass spectrometer via collisions with gas molecules. These spectra are often searched against spectral libraries of known compounds to aid in compound annotation and identification. However, spectral libraries are not comprehensive. Molecular networks address this gap by grouping detected compounds into molecular families based on the degree of similarity between their fragmentation spectra, because similar chemical structures often generate similar fragment ions. This is useful for annotating and identifying unknowns and can also be used to highlight compounds of interest. For example, when investigating specific compound classes, molecular networks can help indicate which detected molecular features belong to that class. Similarly, in food safety applications, molecular networks could be used to indicate which compounds do not require further identification efforts because they are grouped with sugars, flavonoids, and other compounds likely inherent to the food. Molecular networks can also help reduce false positive candidates for identification purposes.
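The core of molecular networking is a spectral-similarity score between pairs of fragmentation spectra. The sketch below bins fragment m/z values and computes a cosine score, linking spectra above a threshold into the same family. The spectra, bin width, and 0.7 threshold are illustrative assumptions; production tools use more sophisticated scoring (for example, modified cosine with precursor-mass shifts).

```python
# Sketch of the spectral-similarity step behind molecular networking:
# cosine score between two fragmentation spectra after binning fragment
# m/z values. Spectra, bin width, and threshold are illustrative.
import math

def bin_spectrum(peaks, width=0.01):
    """Map fragment m/z values to bins -> summed intensities."""
    binned = {}
    for mz, intensity in peaks:
        key = round(mz / width)
        binned[key] = binned.get(key, 0.0) + intensity
    return binned

def cosine_score(spec_a, spec_b):
    a, b = bin_spectrum(spec_a), bin_spectrum(spec_b)
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# two hypothetical flavonoid-like spectra sharing major fragments
flavonoid_1 = [(153.02, 100.0), (121.03, 40.0), (93.03, 25.0)]
flavonoid_2 = [(153.02, 90.0), (121.03, 55.0), (107.05, 10.0)]

score = cosine_score(flavonoid_1, flavonoid_2)
print(score > 0.7)  # similar spectra -> candidates for one molecular family
```

Repeating this comparison over all spectrum pairs and keeping edges above the threshold yields the network, with connected components as the molecular families described above.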

Q. Machine learning approaches are increasingly common. What considerations must be taken before utilizing such tools? And what advice would you give to those who are looking to start using them?

A: A considerable risk with machine learning approaches is over‑interpretation of the results they provide. It is very important to understand the domain of applicability of the model in use and to be cautious when interpreting results where the model has been applied outside this domain. For example, a model trained to classify Gala apples based on the presence of pesticides may not provide reliable results when applied to Granny Smith apples, and retention time prediction or in silico tandem mass spectrometry models trained on pesticides may not provide accurate predictions for pharmaceutical compounds. In some cases, precisely defining the domain of applicability may be difficult, and limiting aspects may be overlooked.

Machine learning models can also be sensitive to small changes in the data. While this is often a strength in medical imaging/diagnostic applications, this can potentially limit applications with mass spectrometry data. For example, background ion signals change with different mobile phases, columns, tubing, instrument setups, and so on. Here, it can be difficult to reliably apply the same machine learning model to classify samples over time and between laboratories because the model may differentiate samples based on changes in background ion signals as opposed to the chemical composition of the samples.
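One simple guard against the domain-of-applicability problem described above is a range check: flag query samples whose descriptors fall outside the region covered by the training set. The descriptor names and values below are illustrative assumptions, and real applicability-domain methods (leverage, distance-to-model, and so on) are more nuanced than a per-feature range.

```python
# Sketch of a simple applicability-domain check: flag queries whose
# descriptors fall outside the per-feature range of the training data.
# Descriptor names and values are illustrative.

def training_bounds(training_descriptors):
    """Per-feature (min, max) over the training set."""
    features = training_descriptors[0].keys()
    return {f: (min(d[f] for d in training_descriptors),
                max(d[f] for d in training_descriptors))
            for f in features}

def in_domain(query, bounds):
    return all(lo <= query[f] <= hi for f, (lo, hi) in bounds.items())

train = [{"mass": 250.0, "log_kow": 2.1},
         {"mass": 480.0, "log_kow": 4.8},
         {"mass": 320.0, "log_kow": 3.0}]
bounds = training_bounds(train)

print(in_domain({"mass": 300.0, "log_kow": 2.5}, bounds))  # inside ranges
print(in_domain({"mass": 900.0, "log_kow": 7.2}, bounds))  # outside ranges
```

Predictions for out-of-domain queries would then be reported with a warning, or not at all, rather than being trusted at face value.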

A good approach is to learn what types of applications machine learning works well for and why, and consider the limitations when deciding how best to implement it into desired workflows. Quality control procedures, such as known spiked compounds, should be used to test these models and ensure they are fit‑for‑purpose. Machine learning models are extremely powerful, and as more training data becomes available the performance and reliability should improve. I am excited to see how machine learning impacts non‑targeted analysis in the future!

Q. What are you currently working on?

A: Most of my current projects aim to improve different aspects of non‑targeted workflows. In one example, I'm developing a method for introducing calibrant ions into the source of our instrument during heated electrospray ionization (HESI) to enable automatic lock-mass calibrations, improving the measured mass accuracy of observed compounds, especially during queues lasting several days to weeks. This should also lead to better putative compound identifications. I am also working to further develop a standard mixture that can be used to evaluate non‑targeted method performance, which we hope will be commercialized to increase accessibility for other non-targeted analysis researchers. In addition, I am investigating methods to improve identification of fluorine-containing compounds and to reduce false positive candidate matches using a combination of annotated data, such as retention time prediction, tandem mass spectrometry data, and so forth. Finally, I am a member and co-chair of the BP4NTA working group, where I am collaborating with researchers in other fields to develop recommendations for the non-targeted community, including performance assessments of non-targeted data and results.
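The idea behind a lock-mass calibration can be sketched as a single-point correction: the drift of a known calibrant ion is used to rescale every observed m/z in the scan. This is a simplified illustration of the concept, not the method under development; the calibrant mass and observed values are illustrative assumptions, and real instruments often use multi-point corrections.

```python
# Sketch of a single-point lock-mass correction: use the drift of a
# known calibrant ion to rescale observed m/z values. All masses here
# are illustrative.

CALIBRANT_TRUE = 622.0290  # known calibrant m/z (illustrative)

def recalibrate(mz_values, calibrant_observed):
    """Scale observed m/z values by the calibrant's true/observed ratio."""
    factor = CALIBRANT_TRUE / calibrant_observed
    return [mz * factor for mz in mz_values]

# instrument reads the calibrant slightly high -> all masses scaled down
observed = [127.0731, 303.0879]
corrected = recalibrate(observed, 622.0321)
print([round(mz, 4) for mz in corrected])
```

Because the correction tracks the calibrant over time, it compensates for the slow drift that accumulates during multi-day queues.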

Q. Regarding the Benchmarking and Publications for Non-Targeted Analysis (BP4NTA) group, have you published any information so far and where can scientists go to find out more?

A: The BP4NTA group currently has two manuscripts in the review process. One of these serves as an introduction to the group and the types of challenges we are working on. The other introduces a tool called the “Study Reporting Tool” (SRT), which lists information that is useful to report in NTA studies so that these complicated methods can be fully understood and thus made reproducible. The SRT also includes a scoring system to help evaluate studies during manuscript and proposal review. I am also currently working with other BP4NTA members on a manuscript discussing performance assessments for non-targeted methods. If you would like more information or are interested in joining BP4NTA, please visit the website: https://nontargetedanalysis.org/. Here, you can find a breadth of reference material regarding non‑targeted analysis workflows, including defined terms, publications in the field, commonly used software tools, libraries/databases, and the SRT. We also include a list of sub‑committees and their contact information in case you would like to learn more about our specific activities. We have a blog where we post BP4NTA‑relevant news and NTA job postings, and a “contact us” section where you can express interest and give feedback. We are a very welcoming group and love hearing new ideas on how we can develop tools, improve workflows, and tackle challenges that can help the broader NTA community.

References

  1. N.P. Kalogiouri, N.A. Alygizakis, R. Aalizadeh, et al., Anal. Bioanal. Chem. 408(28), 7955–7970 (2016).
  2. https://wayback.archive-it.org/7993/20170111170320/http://www.fda.gov/Food/FoodborneIllnessContaminants/ChemicalContaminants/ucm164514.htm
  3. M.L. Gomez-Perez, R. Romero-Gonzalez, J.L. Vidal, et al., J. Chromatogr. A 1389, 133–138 (2015).
  4. A.M. Knolhoff, J.H. Premo, and C.M. Fisher, Anal. Chem. 93(3), 1596–1603 (2021).
  5. D. Broadhurst, R. Goodacre, S.N. Reinke, et al., Metabolomics 14(6), 72 (2018).
  6. https://epi.grants.cancer.gov/Consortia/mQACC/
  7. https://www.norman-network.net/
  8. https://nontargetedanalysis.org/
  9. J.R. Sobus, J.N. Grossman, A. Chao, et al., Anal. Bioanal. Chem. 411(4), 835–851 (2019).
  10. E.M. Ulrich, J.R. Sobus, C.M. Grulke, et al., Anal. Bioanal. Chem. 411(4), 853–866 (2019).
  11. L.L. Hohrenk, F. Itzel, N. Baetz, et al., Anal. Chem. 92(2), 1898–1907 (2020).
  12. R.J.M. Weber, T.N. Lawson, R.M. Salek, et al., Metabolomics 13(2), 12 (2016).

Christine Fisher is a chemist in the Center for Food Safety and Applied Nutrition (CFSAN) at the U.S. Food and Drug Administration (FDA) in College Park, Maryland, USA. She obtained her Ph.D. from Purdue University under the direction of Scott A. McLuckey and is a mass spectrometrist by training. She has been working to develop and improve non-targeted analysis methods using LC–HRMS in foods and cosmetics since she joined the FDA in 2017. She is currently a co-chair for the BP4NTA international working group.