University of Maryland and Czech Academy of Sciences researchers tested spectral prediction algorithms and their capability for predicting spectra in mass spectrometry (MS)-based databases. Their findings were published in Analytical Chemistry (1).
Mass spectrometry (MS) is widely known as one of the most effective analytical methods for identifying unknown compounds. MS is used across many research fields, such as toxicology and education. This ubiquity can be attributed to the high sensitivity and specificity of electron ionization MS (EI–MS), especially when used alongside gas chromatography (GC). Unknown compounds can be identified with MS by measuring the mass-to-charge ratio (m/z) of ions in a sample.
While MS is useful, it comes with limitations, especially for unknown compound identification. Notably, MS spectral databases rely on previously established reference data. If a compound’s spectrum does not exist within known databases, compound identification can become difficult. Multiple MS spectra prediction algorithms have been created to address this limitation, though refinement work remains to be done.
In this study, the scientists evaluated the accuracy of the neural electron-ionization mass spectrometry (NEIMS) spectral prediction algorithm. This algorithm was trained on EI–MS spectra from roughly 300,000 molecules. After training and validation, the algorithm was reported to quickly predict spectra with varying degrees of accuracy. The analyses focused on monosubstituted α-amino acids, given their importance as targets for astrobiology, synthetic biology, and diverse biomedical applications. The scientists hoped to inform those using generated spectra to detect unknown biomolecules.
While NEIMS performed well for the molecules and measures it was trained for, accuracy decreased for molecules (amino acids) outside of the training set and measured in other ways. The pattern was consistent across all four accuracy metrics and all three libraries tested, though to varying degrees. Given the small proportion of amino acid spectra in the National Institute of Standards and Technology (NIST) MS database (~0.01%), this finding may be intuitive; that said, the scientists aimed to quantify the degree to which the problem manifested.
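To illustrate what comparing a predicted spectrum against a reference spectrum can look like, here is a minimal Python sketch of cosine similarity, a commonly used spectral match score. The article does not name the four accuracy metrics used in the study, so this is an assumption chosen for illustration, not necessarily one of them. The function name and the peak lists are hypothetical, and the intensities are made-up placeholder values rather than real spectra.

```python
import math


def cosine_similarity(spec_a, spec_b):
    """Cosine similarity between two spectra given as {integer m/z: intensity}.

    Returns a value in [0, 1] for non-negative intensities; 0.0 if either
    spectrum is empty or has zero total intensity.
    """
    # Only peaks present in both spectra contribute to the dot product.
    shared = set(spec_a) & set(spec_b)
    dot = sum(spec_a[mz] * spec_b[mz] for mz in shared)
    norm_a = math.sqrt(sum(v * v for v in spec_a.values()))
    norm_b = math.sqrt(sum(v * v for v in spec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Placeholder peak lists (NOT real data): a "reference" spectrum and a
# "predicted" spectrum that shares some peaks and misses others.
reference = {30: 999.0, 28: 120.0, 42: 60.0, 75: 40.0}
predicted = {30: 800.0, 28: 200.0, 44: 150.0}

score = cosine_similarity(reference, predicted)
print(f"cosine similarity: {score:.3f}")
```

A score near 1 means the two spectra have nearly the same relative peak pattern, and a score near 0 means they share almost no intensity at common m/z values. Real workflows often weight peaks by m/z or intensity before scoring, which this sketch omits.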
The data also showed that neither derivatization nor physicochemical properties (molecular weight and hydrophobicity) correlated with accuracy. The lower accuracy may stem from the small fraction of amino acids within the NEIMS training set, though no clear insights could be gained into why NEIMS struggles to reliably predict MS spectra for these amino acids. In terms of derivatization, the NEIMS algorithm proved just as accurate for “free” amino acids as for their derivatized counterparts.
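One simple way to check whether a property such as molecular weight tracks prediction accuracy is a correlation coefficient. The sketch below computes Pearson's r in plain Python. The article does not say which statistical test the authors used, so this choice is an assumption, and the function name and all numbers are hypothetical placeholders rather than values from the study.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Placeholder values (NOT from the study): one molecular weight and one
# match score per hypothetical compound.
molecular_weights = [89.1, 117.1, 131.2, 165.2, 204.2]
match_scores = [0.42, 0.61, 0.38, 0.55, 0.47]

r = pearson_r(molecular_weights, match_scores)
print(f"Pearson r: {r:.2f}")
```

A value of r close to 0 would be consistent with the property not explaining the variation in accuracy, although a rank-based statistic may be more appropriate when the relationship is not linear.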
The scientists found that predicted spectra were inaccurate for amino acids beyond the algorithm’s training data. However, these inaccuracies could not be explained by physicochemical differences or by the derivatization state of the amino acids measured. As such, the scientists highlighted the need both to improve current machine learning-based approaches and to further optimize ab initio spectral prediction algorithms, so that databases can be expanded to structures beyond what is currently experimentally possible, including theoretical molecules.
Once MS spectral prediction algorithms are validated, there is a critical need for comprehensive libraries of predicted spectra for unknown and theoretical molecules (amino acids). Once reliable theoretical databases of predicted mass spectra exist, they would expand the potential search space for an unknown molecule in a sample. These tools would have broad uses; the scientists gave the examples of informing NASA mission data and expanding public health surveillance (2,3). Beyond amino acids, extending predictions to other classes of biomolecules would further advance many disciplines.
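The idea of widening the search space with predicted spectra can be sketched as a small library search in Python. Everything here is hypothetical: the compound names, the peak lists, the 0.7 score threshold, and the use of cosine similarity as the match score are assumptions for illustration, not details from the study or from any real library.

```python
import math


def cosine(a, b):
    """Cosine similarity between spectra given as {integer m/z: intensity}."""
    dot = sum(a[mz] * b[mz] for mz in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(query, library, min_score=0.7):
    """Return (score, name, source) hits at or above min_score, best first."""
    hits = [
        (cosine(query, spectrum), name, source)
        for name, (source, spectrum) in library.items()
    ]
    return sorted((h for h in hits if h[0] >= min_score), reverse=True)


# Placeholder libraries (NOT real data). Each entry records whether the
# spectrum was measured or generated by a prediction algorithm.
experimental = {"compound_A": ("measured", {30: 100.0, 44: 20.0})}
predicted = {"compound_B": ("predicted", {57: 100.0, 41: 60.0, 29: 30.0})}

query = {57: 95.0, 41: 55.0, 29: 35.0, 30: 5.0}

hits_measured_only = search(query, experimental)
hits_combined = search(query, {**experimental, **predicted})

print("measured only:", hits_measured_only)
print("with predicted:", hits_combined)
```

In this toy case the query has no acceptable match among measured spectra but does match a predicted entry. Keeping the source tag on each hit lets an analyst treat matches to predicted spectra as candidates that still need confirmation, which matters given the accuracy limits reported above.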
(1) Brown, S. M.; Allgair, E.; Kryštůfek, R. Mapping the Edges of Mass Spectral Prediction: Evaluation of Machine Learning EIMS Prediction for Xeno Amino Acids. Anal. Chem. 2025. DOI: 10.1021/acs.analchem.5c00286
(2) Sarli, B.; Bowman, E.; Cataldo, G.; et al. NASA’s Capture, Containment, and Return System: Bringing Mars Samples to Earth. Acta Astronaut. 2024, 223, 270–303. DOI: 10.1016/j.actaastro.2024.05.048
(3) Lasch, P.; Stämmler, M.; Schneider, A. A MALDI-TOF Mass Spectrometry Database for Identification and Classification of Highly Pathogenic Microorganisms from the Robert Koch-Institute (RKI). DOI: 10.5281/zenodo.163517