
- June 2026
- Volume 22
- Issue 2
- Pages: 6–7
Artificial Intelligence or Artificial Stupidity?
Key Takeaways
- Large-scale training data, more than compute, has enabled modern AI, but scientific datasets frequently lack ground-truth labels needed for reliable model generalization.
- Untargeted HRMS remains vulnerable to misannotation from in-source fragmentation, isotopic-distribution mishandling, and incorrect adduct identification, undermining AI-driven compound identification.
AI promises to solve cost, labor, and regulatory challenges, but a shortage of deployment expertise means it can create more problems than it solves.
The concept of artificial intelligence (AI) can be traced back to pioneering researchers such as Charles Babbage and Ada Lovelace, whose work on early mechanical computers laid the groundwork for what was to follow. Later, Alan Turing, followed by Isaac Asimov and Gene Roddenberry, brought the concept of AI into the public consciousness. This combination of academic research and the cultural influence of Star Trek has given us two generations of researchers who realized that AI need not be confined to a sci-fi show or an academic paper but could be developed in reality.
Attempts to develop AI date back to the 1940s and the advent of the first electromechanical computers, with developments gathering pace as the first microprocessor-based computers were introduced. Only very recently, with many different strands finally converging, has truly meaningful AI arrived. What has proved transformational in recent years is not processing power alone, but access to training material on an unprecedented scale. Rapid advances in internet connectivity have unlocked vast repositories of human knowledge (books, magazines, academic journals, and archives spanning population, geographical, and financial data), giving AI systems the raw material needed to develop genuine capability.
Ground Truth
For most AI engines, such as those used to generate images, this training material has proved enormously effective. With access to vast databases of images, artwork, and films, these engines are highly capable, and truth is built into the material they are trained on. A Picasso is a Picasso; a photograph of Donald Trump is what it purports to be. There is little inherent uncertainty in the training data, and the outputs reflect that reliability, whether the task is generating artwork in a particular style or producing entirely novel imagery.
That is not the case when it comes to science. We are training AI engines, including those used to analyze our chromatography and mass spectrometry data, on datasets that are not “ground truth” or, in many cases, even remotely accurate. This is precisely where the concept of artificial stupidity takes hold.
Mass spectrometry-related AI presents an area of grave concern. There remains, to this day, a healthy discourse about analyte identification in high-resolution, untargeted datasets, with a substantial proportion of the community arguing that many discrete compounds identified in datasets are the result of in-source fragmentation.1,2 There also remain concerns around the appropriate handling of isotopic distributions, particularly in data from instruments that lack suitable isotopic fidelity, alongside similar concerns around correct adduct identification.3 This leads to a situation where the compound identifications, fragmentation patterns, isotopic distributions, and adduct species used to train AI-based mass spectrometry data processing tools are so fundamentally inaccurate that the resulting AI package generates yet more inaccurate compound identifications. These will in turn generate more data that is used to train, retrain, and refine models, amplifying the underlying inaccuracy and embedding errors ever more deeply into the system, rather than producing increasingly accurate data.
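To make the adduct problem concrete, the short Python sketch below shows how a single misassigned adduct corrupts a neutral-mass estimate. It is an illustration only: the function names are hypothetical, the adduct table covers just four common positive-ion species, and glucose is used purely as a familiar example compound.

```python
# Monoisotopic mass shifts (Da) for common singly charged positive-ion
# adducts: mass of the added species minus the mass of one electron.
ADDUCT_SHIFT = {
    "[M+H]+": 1.007276,
    "[M+NH4]+": 18.033826,
    "[M+Na]+": 22.989221,
    "[M+K]+": 38.963158,
}


def mz_from_neutral(neutral_mass, adduct):
    """m/z expected for a neutral monoisotopic mass under a given adduct."""
    return neutral_mass + ADDUCT_SHIFT[adduct]


def neutral_from_mz(mz, adduct):
    """Neutral mass implied by an observed m/z if the adduct is assumed."""
    return mz - ADDUCT_SHIFT[adduct]


GLUCOSE = 180.063388  # monoisotopic mass of C6H12O6 (illustrative analyte)

# The ion actually formed is the sodium adduct ...
observed = mz_from_neutral(GLUCOSE, "[M+Na]+")
# ... but the software assumes a protonated molecule.
wrong_neutral = neutral_from_mz(observed, "[M+H]+")
error_da = wrong_neutral - GLUCOSE

print(f"observed m/z:                      {observed:.4f}")
print(f"neutral mass if read as [M+H]+:    {wrong_neutral:.4f}")
print(f"error in neutral mass:             {error_da:.4f} Da")
```

An error of roughly 22 Da points a database search at an entirely different set of candidate formulae, and any spectrum stored under the resulting name becomes mislabeled training material.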
This can be partially resolved by training AI models with data from targeted modeling, though even in this area, it may not be possible to provide the level of reliable, ground-truth data that is really needed to enable accurate AI-based compound identification. In recent years, we have seen egregious errors in retention time and compound-peak annotation even in targeted data, once again introducing inaccuracy and errors into the datasets used to iterate successive generations of AI.4,5
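The feedback loop described above can be sketched with a deliberately simple toy model. The assumptions are mine and are not drawn from any measured dataset: wrong labels are never corrected, each retraining cycle corrupts a fixed fraction of the labels that were still correct, and the starting error rate and per-generation error rate are hypothetical values chosen only to show the direction of travel.

```python
def label_error_over_generations(initial_error, model_error, generations):
    """Toy model of error accumulation across retraining cycles.

    Assumes each generation reproduces every existing wrong label and
    additionally mislabels a fraction `model_error` of the labels that
    were still correct. Returns the error rate after each generation,
    starting with the initial dataset.
    """
    rates = [initial_error]
    for _ in range(generations):
        current = rates[-1]
        rates.append(current + (1.0 - current) * model_error)
    return rates


# Hypothetical inputs: 10% of labels wrong at the start, 5% newly
# corrupted per generation.
rates = label_error_over_generations(
    initial_error=0.10, model_error=0.05, generations=5
)
for generation, rate in enumerate(rates):
    print(f"generation {generation}: {rate:.3f} of labels wrong")
```

Under these assumptions the error rate can only rise. A real pipeline with expert curation or orthogonal confirmation would behave differently, which is the argument for building that verification in.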
The Complication of Export Controls
As I write this, the US has placed export controls on some of the very latest AI technologies. So not only are we contending with inaccurate training materials giving rise to AI models that will produce inaccurate answers when challenged with mass spectrometry data, but we now also have export control legislation limiting who can access which models. The obvious concern is that researchers in different locations will end up working with models that interpret the same datasets in different ways, further undermining whatever accuracy AI might otherwise bring to the analysis of complex mass spectrometry data.6
Looking Forward
The good news is that there is nothing here that can’t be fixed, but there will need to be community consensus on how best to proceed. This might best be progressed by the American Society for Mass Spectrometry (ASMS) and/or the Metabolomics Society. These two organizations run annual conferences that would be the perfect place for the community to come together, agree on some principles, and perhaps establish working groups to agree on the necessary convergence of different AI models. Those groups could also define a suitable framework to ensure that, regardless of the AI model used, the same data going in will result in the same data coming out the other end.
We’ve seen the community come together with some nice perspective publications in the past, and I feel this topic is well suited to similar treatment.6
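One way a working group might test the "same data in, same data out" principle is a simple concordance check between two models run on an identical feature list. The Python sketch below is a minimal illustration under stated assumptions: the function name, feature IDs, and annotations are all hypothetical, and a real comparison would also need to handle confidence levels and synonymous compound names.

```python
def annotation_agreement(annotations_a, annotations_b):
    """Compare two models' annotations of the same features.

    Each argument maps a feature ID to a compound name. Returns the
    fraction of shared features given identical annotations, together
    with the sorted list of features on which the models disagree.
    """
    shared = sorted(set(annotations_a) & set(annotations_b))
    if not shared:
        return 0.0, []
    disagreements = [
        feature for feature in shared
        if annotations_a[feature] != annotations_b[feature]
    ]
    return 1.0 - len(disagreements) / len(shared), disagreements


# Hypothetical output from two AI models given the same dataset.
model_a = {"F001": "glucose", "F002": "citrate",
           "F003": "leucine", "F004": "caffeine"}
model_b = {"F001": "glucose", "F002": "isocitrate",
           "F003": "isoleucine", "F004": "caffeine"}

agreement, disagreements = annotation_agreement(model_a, model_b)
print(f"agreement: {agreement:.0%}; disagreements: {disagreements}")
```

The features flagged here are isomer pairs, which is exactly the kind of case where two models trained on different material could diverge while both appear plausible.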
References
1. Giera, M.; Aisporna, A.; Uritboonthai, W.; Siuzdak, G. The Hidden Impact of In-Source Fragmentation in Metabolic and Chemical Mass Spectrometry Data Interpretation. Nat Metab 2024, 6, 1647–1648. DOI: 10.1038/s42255-024-01076-x
2. El Abiead, Y.; Rutz, A.; Zuffa, S.; et al. Discovery of Metabolites Prevails amid In-Source Fragmentation. Nat Metab 2025, 7 (3), 435–437. DOI: 10.1038/s42255-025-01239-4
3. Stricker, T.; Bonner, R.; Lisacek, F.; Hopfgartner, G. Adduct Annotation in Liquid Chromatography/High-Resolution Mass Spectrometry To Enhance Compound Identification. Anal Bioanal Chem 2021, 413 (2), 503–517. DOI: 10.1007/s00216-020-03019-3
4. Domingo-Almenara, X.; Guijas, C.; Billings, E.; et al. The METLIN Small Molecule Dataset for Machine Learning-Based Retention Time Prediction. Nat Commun 2019, 10, 5811. DOI: 10.1038/s41467-019-13680-7
5. Theodoridis, G.; Gika, H.; Raftery, D.; et al. Ensuring Fact-Based Metabolite Identification in Liquid Chromatography–Mass Spectrometry-Based Metabolomics. Anal Chem 2023, 95 (8), 3909–3916. DOI: 10.1021/acs.analchem.2c05192
6. Alseekh, S.; Aharoni, A.; Brotman, Y.; et al. Mass Spectrometry-Based Metabolomics: A Guide for Annotation, Quantification and Best Reporting Practices. Nat Methods 2021, 18 (7), 747–756. DOI: 10.1038/s41592-021-01197-1