September 15, 2025

MCheM and the Future of Metabolite Annotation: Expanding Beyond Traditional LC-MS/MS Workflows

Author: John Chasse
Fact checked by: Caroline Hroncich

Key Takeaways

  • MCheM enhances metabolite annotation by integrating orthogonal post-column derivatization reactions, improving confidence in structural predictions.
  • The method leverages functional group-specific derivatization to generate orthogonal chemical data, addressing challenges in non-targeted LC-MS/MS analysis.

A recent article in Nature Communications (1) introduced multiplexed chemical metabolomics (MCheM), which employs orthogonal post-column derivatization reactions integrated into a unified mass spectrometry data framework. By integrating MCheM with annotation platforms like SIRIUS and GNPS2, metabolomics researchers can constrain chemical space, re-rank structural predictions, and boost annotation confidence. LCGC International sat down with Daniel Petras, corresponding author of the paper, to discuss persistent challenges in non-targeted liquid chromatography-tandem mass spectrometry (LC-MS/MS) metabolite annotation, where only a small fraction of acquired spectra match existing libraries, and how the MCheM workflow introduces functional group-specific derivatization to generate orthogonal chemical data.

What are the primary challenges in metabolite annotation during non-targeted LC-MS/MS analysis, and why does this remain an unresolved issue despite advances in instrumentation and computation?

In most LC-MS/MS-based metabolomics experiments we perform in our group, we can match between 2 and 10% of the MS/MS spectra against the spectral libraries we have available (including GNPS, MassBank, and NIST). When we compare the sheer number of MS/MS spectra we can acquire with modern instruments (>1,000,000 in a mid-sized metabolomics study) with the number of library spectra available, this is not too surprising. The GNPS library currently contains 573,579 spectra, but the number of individual compounds in the library is substantially smaller (64,133 unique structures). Here, in silico spectral matching tools that compute MS/MS spectra or fragmentation trees from structural libraries have a much larger structural coverage of chemical space, and potentially higher annotation rates. Yet, confidence in a predicted spectrum is currently much lower than in a real, measured one from a library. Nevertheless, the available tools are becoming much better and, with new approaches emerging (such as DreaMS), I assume we will see more improvements in this area.

However, in the end, we always depend on the richness of MS/MS fragments per spectrum, and the further we expand our search space, the more overlap we will have, especially for spectra with few fragment peaks. Here, constraining the search, either by restricting the library to the biological system of interest or by adding orthogonal analytical data such as retention time, UV spectra, or, in our case, chemical reactivity, could be a good, scalable way to increase annotation confidence.

Can you explain the role of ion adducts and in-source fragments in contributing to the high number of unannotated features in MS/MS data?

When using electrospray ionization, we typically observe protonated (ESI+) or deprotonated (ESI-) molecular ion species. As most MS users are probably aware, in addition to these species, we frequently observe other ion adducts such as Na+, K+, NH4+, and acetonitrile in positive mode, or Cl- in negative mode. We also frequently observe in-source fragments, such as those arising from the loss of H2O and other neutral losses. Both the adducts and the in-source fragments largely depend on the chemical structure of the metabolites as well as on sample clean-up, for example, whether we performed some desalting in the sample prep step or whether we injected the sample directly into the system without any sample prep (which is particularly bad for seawater). I was a little surprised that this well-known characteristic of ESI made the news recently, as it has been known for decades, and numerous tools (such as RAMClust) have been developed to group different ion species together. In our current LC-MS/MS data analysis pipeline, we use, for example, ion identity networking, which we developed together with Robin Schmid from the mzmine team (2). Ion identity networking allows us to group different ion species and in-source fragments within our molecular networks, reducing data redundancy for our stats and allowing us to leverage multiple adducts for more confident metabolite annotation.
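
To make the idea of grouping ion species concrete, the following is a minimal sketch of how co-eluting features can be tied to one neutral mass through known adduct mass shifts. It is not the ion identity networking algorithm itself; real tools also use retention time and peak-shape correlation. The function name, the tolerance, and the example m/z values (calculated for a neutral mass of 194.080376 Da) are assumptions chosen for illustration.

```python
# Mass shifts (Da) added to a neutral monoisotopic mass M to give the m/z of
# common singly charged positive-mode ions.
ADDUCT_SHIFTS = {
    "[M+H]+": 1.007276,
    "[M+NH4]+": 18.033823,
    "[M+Na]+": 22.989218,
    "[M+K]+": 38.963158,
    "[M+ACN+H]+": 42.033823,
    "[M+H-H2O]+": -17.003289,  # in-source water loss
}


def group_ion_species(mz_values, tol_da=0.005):
    """Group co-eluting m/z values whose adduct hypotheses imply one neutral mass.

    Hypothetical helper: assumes all input features already share a retention time.
    """
    # Each feature could be any ion type, so enumerate every hypothesis.
    hypotheses = sorted(
        (mz - shift, mz, name)
        for mz in mz_values
        for name, shift in ADDUCT_SHIFTS.items()
    )

    # Cluster hypotheses whose implied neutral masses lie within the tolerance.
    clusters, current = [], []
    for hyp in hypotheses:
        if current and hyp[0] - current[-1][0] > tol_da:
            clusters.append(current)
            current = []
        current.append(hyp)
    if current:
        clusters.append(current)

    # Keep only clusters supported by at least two different features.
    results = []
    for members in clusters:
        if len({mz for _, mz, _ in members}) >= 2:
            neutral = sum(m for m, _, _ in members) / len(members)
            results.append((neutral, [(mz, name) for _, mz, name in members]))
    return results


features = [195.087652, 217.069594, 233.043534, 301.141000]
for neutral, members in group_ion_species(features):
    print(round(neutral, 4), members)
```

In this toy input, the first three features are explained by one neutral mass as protonated, sodiated, and potassiated ions, while the fourth stays ungrouped.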

With regard to the question of whether different ion species explain the large number of unknown compounds in our experiments, there has been an excellent meta-analysis by Yasin El Abiead and coworkers that showed that the large number of unknown metabolites prevails amid in-source fragmentation (3). Besides evidence from our own metabolomics data, this observation is also very much in line with data from biosynthetic gene clusters from (meta)genomics studies.

Describe the core components and operational principle of the MCheM workflow. How does it differ from traditional MS/MS-based annotation approaches?

The principle and setup of MCheM are simple. It is essentially a post-column derivatization approach in which we infuse a set of reagents post-column into the mobile phase before the electrospray source. So, the minimum components needed are an additional PEEK capillary, a t-splitter, and a syringe pump, which most instruments already have available for calibration purposes. Depending on the reagent, one also needs a make-up pump to adjust the pH (for example, AQC for amide bond formation of amines). Now, depending on the operation mode, one can run the sample iteratively with different reagents, which is what we described in the paper. A more sophisticated approach, on which we are currently working, is to use a parallel flow reactor and a set of syringe pumps to infuse different reagents in parallel. This will require some custom hardware, and I would recommend starting with the iterative operation mode.

The multiplexing of the different reactions will happen on the data level anyway, where we basically assign the reactivities to a given MS/MS spectrum, for example, spectrum X is from a molecule that contains a primary or secondary amine (as it reacted with the AQC reagent). The key component for analyzing MCheM data and writing the new .mgf spectrum files that contain functional group information is a new module that Robin implemented in mzmine, which started as a community project and is now professionally developed by the company MZIO. An important point for us is that the software is open source and free to use for academic researchers. The advanced MCheM spectrum files can then be annotated with our regular MS/MS annotation tools (SIRIUS and GNPS2) and, in addition to the spectrum matching, the results can be filtered and re-ranked depending on whether the functional group we determined through MCheM is present in the resulting structure.

What are the advantages of integrating MCheM with existing annotation tools like SIRIUS and GNPS2 in metabolomics research?

I think the key conceptual advance from MCheM is that we generate orthogonal information to the MS/MS spectra from our regular non-targeted metabolomics experiments. This information includes the presence of certain reactivity/functional groups, but also new MS/MS spectra of the derivatized metabolites. This information can now be used to constrain the library/structural search space, which enhances confidence in spectra annotation (from existing library spectra and structures), which is what we initially leveraged with SIRIUS and GNPS2. Nevertheless, looking forward, MCheM also offers completely new ways to generate richer MS/MS data that can be leveraged for the prediction of new structures, such as via open modification and site localization (such as Modifinder) or future de novo tools (such as MS Novelist).

How does the MCheM method leverage orthogonal chemical data to improve annotation accuracy in both in silico and open modification searches?

The concept we are using so far is simple: as MCheM provides functional group information, we filter and re-rank SIRIUS and GNPS2 outputs so that candidate annotations containing a functional group detected by MCheM are favored. On the computational side, this is simply done by writing possible SMARTS strings (text patterns that encode chemical substructures) into the MGF header of each spectrum and filtering the resulting output structures to those that contain one (or more) of the strings.
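
The filter-and-re-rank step described above can be sketched as follows. This is an illustration under stated assumptions, not the implementation in mzmine, SIRIUS, or GNPS2: the function name, the candidate format, and the example amine pattern are hypothetical, and the substructure test is passed in as a callable so that a real cheminformatics SMARTS matcher could be plugged in. The toy matcher below only looks up precomputed answers.

```python
from typing import Callable, List, Sequence, Tuple

Candidate = Tuple[str, float]  # (structure identifier, annotation score)


def rerank_by_reactivity(
    candidates: Sequence[Candidate],
    smarts_patterns: Sequence[str],
    has_substructure: Callable[[str, str], bool],
    drop_non_matching: bool = False,
) -> List[Candidate]:
    """Move candidates that contain at least one required pattern to the top.

    Within the matching and non-matching groups, the original score order is kept.
    """
    def matches(candidate: Candidate) -> bool:
        return any(has_substructure(candidate[0], p) for p in smarts_patterns)

    ranked = sorted(candidates, key=lambda c: (not matches(c), -c[1]))
    if drop_non_matching:
        ranked = [c for c in ranked if matches(c)]
    return ranked


# Toy stand-in for a real SMARTS matcher (assumption: answers are precomputed).
AMINE = "[NX3;H2,H1;!$(NC=O)]"  # illustrative primary/secondary amine pattern
TOY_MATCHES = {
    "candidate_A": set(),
    "candidate_B": {AMINE},
    "candidate_C": {AMINE},
}


def toy_matcher(structure_id: str, pattern: str) -> bool:
    return pattern in TOY_MATCHES.get(structure_id, set())


hits = [("candidate_A", 0.91), ("candidate_B", 0.84), ("candidate_C", 0.62)]
print(rerank_by_reactivity(hits, [AMINE], toy_matcher))
```

Re-ranking rather than hard filtering keeps the non-matching candidates visible at the bottom of the list, which may be useful when the reactivity assignment itself is uncertain.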

What makes the hardware setup for MCheM practical and scalable for existing LC-MS/MS platforms? What equipment is required for implementation?

As mentioned above, technically all one needs is an additional PEEK capillary, a t-splitter, and the reagents, assuming a syringe pump is already available for calibrating the instrument. The software is freely available for academic researchers and can be downloaded and run on most Windows systems.

We implemented the workflow on both our Q-Orbitrap and the Q-TOF platform in the lab of Chambers Hughes, with whom we developed MCheM, but this will also work with any other MS/MS system that supports data-dependent acquisition mode and conversion to .mzML or .mzXML formats.

How do derivatization reactions within MCheM contribute to improved structural annotation, and what are the prospects for adding more functional group-specific reactions?

In our study, we used the derivatization reactions as a test of whether a metabolite contains a given set of functional groups, such as an aldehyde if it reacts with hydroxylamine or a primary or secondary amine if it reacts with AQC. Once we gain such knowledge about a molecule, it drastically reduces the number of possible annotation results and thus helps to filter out false positive matches and improve annotation confidence.
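
The interview does not spell out how a reaction is recognized in the data. One plausible approach, sketched here purely as an assumption, is to look for a feature in the derivatized run that sits at the same retention time as an underivatized feature and is heavier by the mass the reagent adds. The mass shifts below are calculated from the elemental compositions of the assumed simple addition products, and the function name, tolerances, and feature values are hypothetical.

```python
# Assumed net mass gain (Da) of a metabolite after reacting with each reagent.
REAGENT_SHIFTS = {
    "AQC": 170.048013,           # + C10H6N2O on a primary or secondary amine
    "hydroxylamine": 15.010899,  # + NH (oxime formation, net of H2O loss)
    "cysteine": 121.019749,      # + C3H7NO2S (addition of the intact thiol)
}


def reactive_features(plain, derivatized, reagent, mz_tol=0.005, rt_tol=0.1):
    """Return underivatized (m/z, rt) features that have a shifted partner.

    plain and derivatized are lists of (m/z, retention time in minutes) from a
    run without and with post-column reagent infusion, respectively.
    """
    shift = REAGENT_SHIFTS[reagent]
    hits = []
    for mz, rt in plain:
        for d_mz, d_rt in derivatized:
            same_time = abs(d_rt - rt) <= rt_tol
            shifted = abs(d_mz - (mz + shift)) <= mz_tol
            if same_time and shifted:
                hits.append((mz, rt))
                break  # one partner is enough to flag the feature
    return hits


plain_run = [(200.1000, 5.00), (310.2000, 7.50)]
aqc_run = [(370.1481, 5.02), (310.2000, 7.50)]
print(reactive_features(plain_run, aqc_run, "AQC"))
```

Because the reagent is infused after the column, the derivatized product should elute together with its unreacted precursor, which is why a tight retention time window is used in this sketch.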

Discuss a specific example, such as the discovery of a glycosylated oxazolomycin, that highlights the effectiveness of MCheM in natural product discovery.

Oxazolomycin is a great example of how MCheM can help annotate unknown compounds, as it was not present in the GNPS library and the MS/MS spectra remained unannotated using our regular workflow. However, as the oxazolomycins contain a β-lactone moiety, one of the reagents we use for MCheM (cysteine) reacts with it, which told us that the unknown must contain either a Michael system or a β-lactone. When we reran SIRIUS and CSI:FingerID (the in silico structure annotation tool within SIRIUS), MCheM re-ranked oxazolomycin as the number one hit. As we had the genome from the strain available and we knew that there was a matching biosynthetic gene cluster, it was straightforward to confidently identify it. Interestingly, in addition to oxazolomycin, we detected several similar MS/MS spectra that were connected in our spectral network and that also reacted with the cysteine reagent. Inspecting the mass differences between the spectra and the fragmentation patterns, and knowing that there was a gene that encodes a glycosyltransferase, we quickly came up with the hypothesis that one of the analogs must be a new glycosylated version of oxazolomycin. Nevertheless, Chambers Hughes, with whom I co-led the study, and Shu-Ning Xia, a Ph.D. student in his group, went on to purify the compound and confirm the proposed structure via NMR, which left us with a very nice case study for the effectiveness of MCheM.

What potential does MCheM hold for enabling complete de novo structure elucidation in metabolomics? What challenges remain?

The potential ways I see for MCheM to improve de novo structure elucidation are twofold. First, as we already use it, we can constrain chemical space, meaning that we can filter possible output structures based on the presence/absence of given functional groups.

The second way, and I think that this has much larger potential, is that we start using mass specs as reactors rather than pure analytical readouts. Like the sequencing-by-synthesis that Illumina’s DNA sequencing technology builds on, we start to chemically modulate the metabolites of interest to gain orthogonal, or as we call it in our paper, multiplexed chemical information. Theoretically, this is already what we do with MS/MS with CID, and the community is slowly expanding it, such as through new electron-based fragmentation techniques. So, we start layering the analytical information, and step by step, we get more of a picture, until at the end only one possible structure is left. MCheM, and particularly the software we developed, solves a major bottleneck, and this is the fusion (or multiplexing) of the multi-layer data. Looking forward, I hope that in addition to multiple chemical reactions and potentially new reagents, we can use other reactions/modulation techniques, such as new gas-phase fragmentation or orthogonal analytical methods such as ultraviolet (UV) and infrared (IR) spectroscopy, to further expand the physicochemical characteristics of the unknown metabolome.

Looking ahead, how might MCheM and similar methods evolve with the integration of ML or AI-based annotation tools? What would a fully automated metabolite annotation pipeline look like?

In addition to the expansion of chemical reactions, for example to effectively label carboxylic acids, for which we do not have a good reagent, I anticipate that we will further expand the multiplexed nature of the data acquisition and integrate other analytical techniques that can be coupled with chromatography, or ideally be performed inside of the mass spec. Computational tool integration will be absolutely critical. I assume hardware and software, in particular machine learning, will have to go hand in hand. This is why I am so thankful to our amazing collaborators from the Wang and Boecker labs as well as the mzio team, as they can tailor their tools to new experiments and data set streams we generate in the lab. While I am not a computational scientist myself, I am convinced that the field will benefit enormously from recent developments in deep learning and generative AI, following in the footsteps of protein structure prediction and de novo design. Provided that we can supply sufficient analytical information about a molecule (through multi-layered MS/MS, MCheM, and other analytical approaches), I am sure it is just a matter of time until machine learning tools will be able to confidently predict structures de novo, and metabolite annotation will be solved.

References

  1. Vitale, G. A.; Xia, S. N.; Dührkop, K. et al. Enhancing Tandem Mass Spectrometry-Based Metabolite Annotation with Online Chemical Labeling. Nat. Commun. 2025, 16 (1), 6911. DOI: 10.1038/s41467-025-61240-z
  2. Schmid, R.; Petras, D.; Nothias, L. F. et al. Ion Identity Molecular Networking for Mass Spectrometry-Based Metabolomics in the GNPS Environment. Nat. Commun. 2021, 12 (1), 3832. DOI: 10.1038/s41467-021-23953-9
  3. El Abiead, Y.; Rutz, A.; Zuffa, S. et al. Discovery of Metabolites Prevails Amid In-Source Fragmentation. Nat. Metab. 2025, 7 (3), 435-437. DOI: 10.1038/s42255-025-01239-4
