Metabolomics Meets Machine Learning: The Future of Nature-Inspired Drug Discovery

Column, April 2024
Volume 20
Issue 4
Pages: 11–15

Researchers at Enveda Biosciences are transforming nature into new medicines with their artificial intelligence (AI)-enabled industrial-scale drug discovery platform incorporating tandem mass spectrometry (MS/MS).

Natural product drug discovery harnesses the rich chemical diversity found in nature to uncover potential therapeutics. By combining the ancient wisdom of evolution with modern scientific techniques, it is possible to tap into the vast array of natural compounds possessing strong biological activities that can be tailored to target specific diseases. This diversity is a valuable source of drug candidates, particularly for unmet medical needs, potentially leading to more effective and safer drugs.

Metabolomics is a vital tool in nature-based drug discovery and has the power to expedite the discovery process. It allows researchers to comprehensively analyze and profile the entire set of small molecules (or metabolites) from a natural source, such as a medicinal plant, and provides valuable insights into chemical composition, metabolic pathways, and biochemical processes. This method helps identify bioactive compounds responsible for therapeutic effects, which can be used as a starting point to develop novel pharmaceuticals.

Nature-Inspired Therapeutics

Remarkably, 95% of the natural world remains a chemical mystery; the vast majority of molecules made by and for the natural world are totally unknown to science. Despite this paucity of information, the 5% of natural products decoded by science are the source of one-third of all approved small molecule medicines, a clear demonstration of their utility. Enveda Biosciences recognizes the untapped potential of drug-like molecules in nature and is at the forefront of pioneering a new approach to natural product drug discovery that can access these previously unknown molecules.

The majority of Enveda’s research is focused on phytochemicals—molecules from plants that have served as a major source of small molecule drugs—due to their incredible diversity of chemical properties and biological activity. However, identifying new, unknown, highly diverse molecules from plants is challenging. Traditionally, potentially therapeutic molecules were isolated and studied individually, making the discovery process painstakingly slow, costly, and prone to failure.

Enveda is redesigning this process, allowing scientists to capture and review critical information about phytochemicals that are still within complex mixtures, such as a plant extract. By reserving the tedious and slow isolation step only for the molecules with the most advantageous combination of structure and bioactivity profiles, the company is significantly accelerating the characterization process. Its approach is helping uncover previously unknown bioactive compounds with potential therapeutic applications that are inspiring the development of innovative small-molecule drugs.

Powered by metabolomics, mass spectrometry, and machine learning, Enveda scientists can rapidly assess the biological activity, chemical structure, and drug-like properties for thousands of molecules in a single sample in high throughput. Through this combination of advanced techniques, Enveda streamlines the identification of promising molecules to accelerate natural product drug discovery.

The Power of Tandem Mass Spectrometry

Tandem mass spectrometry (MS/MS) is one of the core technologies powering Enveda’s platform. It acquires data in two consecutive stages, which together are used to determine a molecule’s structure and drug-like properties.

In MS1 (the first stage of mass spectrometry), molecules are ionized and separated based on their mass-to-charge ratio (m/z). In MS2 (the second stage), selected ions are fragmented into smaller pieces, and the m/z and abundance of each fragment are measured. The resulting fragmentation pattern provides a fingerprint for each molecule, akin to a chemical signature, which can be used to deduce the compound’s structure and identity.
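The fingerprint idea can be made concrete with a small sketch. The Python below compares two MS2 fragment lists by binned cosine similarity, a widely used way to score how alike two fragmentation patterns are. It is an illustration only and not a description of Enveda's software; the function names, the 0.01 bin width, and all peak values are assumptions made for the example.

```python
import math
from collections import defaultdict


def bin_spectrum(peaks, bin_width=0.01):
    """Sum fragment intensities into fixed-width m/z bins."""
    binned = defaultdict(float)
    for mz, intensity in peaks:
        binned[round(mz / bin_width)] += intensity
    return dict(binned)


def cosine_similarity(peaks_a, peaks_b, bin_width=0.01):
    """Return a score in [0, 1]; 1 means identical binned patterns."""
    a = bin_spectrum(peaks_a, bin_width)
    b = bin_spectrum(peaks_b, bin_width)
    # Only bins present in both spectra contribute to the dot product.
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical fragment lists as (m/z, intensity) pairs, for illustration only.
reference = [(91.054, 100.0), (119.049, 45.0), (147.044, 20.0)]
query = [(91.054, 80.0), (119.049, 50.0), (163.039, 10.0)]
unrelated = [(55.018, 100.0), (83.049, 30.0)]
```

Calling cosine_similarity(reference, query) gives a high score because the two strongest fragments are shared, while cosine_similarity(reference, unrelated) gives zero because no bins overlap. Fixed-width binning is the simplest choice; peaks that fall near a bin edge can be split across bins, which tolerance-based matching avoids.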

Structural analysis is essential for prioritizing drug candidates because structure determines synthesizability, amenability to medicinal chemistry, and many pharmacokinetic and pharmacodynamic (PK/PD) properties. However, unraveling the intricate chemistry of plant extracts is challenging. A single sample can contain tens of thousands of distinct molecules, and bioactive compounds are often present at very low concentrations, which makes traditional spectroscopy-based structural analysis difficult.

Natural product extracts also contain many compounds that have a similar m/z and structure. These include isobars and structural isomers, distinct molecules whose m/z values can be identical or too close to resolve. Isobars and isomers that co-elute in liquid chromatography (LC) cannot be separated prior to MS/MS. The result is a mixture of overlapping fragmentation patterns that is difficult to interpret and does not give a clean chemical fingerprint for each unique molecule.
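A short calculation shows why mass alone cannot always tell such molecules apart. The sketch below computes monoisotopic masses and singly protonated m/z values from elemental formulas. Leucine and isoleucine are isomers with the same formula, so their m/z values are identical, while caffeine and ferulic acid share a nominal mass of 194 and differ only in the second decimal place. The compounds were chosen for illustration and do not come from the article; the helper names are assumptions.

```python
# Monoisotopic masses of the most abundant isotopes, in daltons.
MONOISOTOPIC_MASS = {
    "C": 12.0,
    "H": 1.007825032,
    "N": 14.003074004,
    "O": 15.994914620,
}
PROTON_MASS = 1.007276467


def monoisotopic_mass(formula):
    """Mass of a neutral molecule given as {element: count}."""
    return sum(MONOISOTOPIC_MASS[el] * n for el, n in formula.items())


def protonated_mz(formula, charge=1):
    """m/z of the [M + nH]n+ ion for a positive charge n."""
    return (monoisotopic_mass(formula) + charge * PROTON_MASS) / charge


def ppm_difference(mz_a, mz_b):
    """Relative difference between two m/z values in parts per million."""
    return abs(mz_a - mz_b) / ((mz_a + mz_b) / 2.0) * 1e6


# Structural isomers: same formula, so the same m/z.
leucine = {"C": 6, "H": 13, "N": 1, "O": 2}
isoleucine = {"C": 6, "H": 13, "N": 1, "O": 2}

# Same nominal mass (194), different elemental composition.
caffeine = {"C": 8, "H": 10, "N": 4, "O": 2}
ferulic_acid = {"C": 10, "H": 10, "O": 4}
```

Here protonated_mz(leucine) equals protonated_mz(isoleucine), so no mass analyzer can separate the pair by m/z, whereas the caffeine and ferulic acid ions differ by a little over 100 ppm and can be distinguished only if the instrument resolves that gap.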

Enveda works to overcome this challenge by combining trapped ion mobility spectrometry (TIMS) with MS/MS-based metabolomics. TIMS is a gas-phase technique that separates ions based on their size, shape, and charge. Ions are carried forward by a gas flow while an opposing electric field holds them back, so that ions of different sizes and shapes are sorted within a trapping region. Stepwise drops in electric potential then eject the trapped ions sequentially. This adds a layer of separation before MS/MS analysis, which enhances MS2 accuracy, and it provides further structural information by measuring a collision cross section (CCS) value for each ion. CCS is a characteristic property of an ion and can be used as additional information for identifying a molecular structure.
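For readers who want to see how a mobility measurement becomes a CCS value, the sketch below applies the Mason-Schamp equation, the standard low-field relationship between an ion's reduced mobility and its collision cross section. It is a minimal illustration, not the calibration used by any particular instrument or by Enveda. The drift gas (nitrogen), the default temperature of 305 K, and the function name are assumptions.

```python
import math

ELEMENTARY_CHARGE = 1.602176634e-19  # C
BOLTZMANN = 1.380649e-23             # J/K
DALTON = 1.66053906660e-27           # kg
LOSCHMIDT = 2.686780111e25           # gas number density at 273.15 K, 101.325 kPa (1/m^3)
N2_MASS_DA = 28.006148               # assumed drift gas: nitrogen


def ccs_from_reduced_mobility(k0_cm2_per_vs, ion_mass_da, charge=1,
                              temperature_k=305.0, gas_mass_da=N2_MASS_DA):
    """Mason-Schamp estimate of CCS in square angstroms.

    k0_cm2_per_vs is the reduced mobility K0; temperature_k is an assumed
    effective gas temperature and should be set to match the experiment.
    """
    k0 = k0_cm2_per_vs * 1e-4  # convert cm^2/(V s) to m^2/(V s)
    reduced_mass = (ion_mass_da * gas_mass_da) / (ion_mass_da + gas_mass_da) * DALTON
    omega_m2 = (
        3.0 * charge * ELEMENTARY_CHARGE / (16.0 * LOSCHMIDT)
        * math.sqrt(2.0 * math.pi / (reduced_mass * BOLTZMANN * temperature_k))
        / k0
    )
    return omega_m2 * 1e20  # convert m^2 to square angstroms
```

For a singly charged ion of about 322 Da with a reduced mobility near 1.38 cm^2/(V s), the function returns a value of roughly 150 square angstroms. Because CCS scales inversely with mobility, a larger or more extended ion of the same mass and charge moves more slowly and yields a larger CCS.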

This combination of TIMS and MS/MS offers a high level of specificity and sensitivity, minimizing the chances of misidentifying bioactive compounds and enabling the detection of extremely low quantities of metabolites. MS/MS can also simultaneously screen for thousands of metabolites in a single analysis and enables high-throughput profiling of the plant metabolome. All of these features speed up the discovery process and lower costs.

Machine Learning Advances in Interpreting MS Data

Given the utility of MS/MS for this application, the main problem the industry now faces is how to quickly analyze and manage vast amounts of data. Interpreting MS2 spectra, even from TIMS-enhanced MS/MS metabolomic analyses, can still be challenging because of the multitude of fragment peaks generated. To make sense of MS2 spectra, Enveda uses advanced machine learning models built on a neural network architecture called the transformer. The key insight was that the peaks in an MS2 spectrum (the fragmentation pattern) have a grammar: the order of the peaks matters, just like the order of words in a sentence. Transformer models, like the ones that power ChatGPT, are ideal for identifying these kinds of context-dependent patterns. Thus, the models developed by Enveda “learn” the language of MS, enabling them to accurately predict the structure and properties of compounds within complex mixtures, effectively translating the jargon of MS/MS data into meaningful chemical information.
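To give a feel for what treating a spectrum as a sentence can mean in practice, the sketch below turns an MS2 peak list into an ordered sequence of discrete tokens, the kind of input a transformer consumes. The article does not describe how Enveda encodes spectra, so this is one hypothetical scheme: the bin width, the number of intensity levels, the token format, and the peak values are all assumptions.

```python
def spectrum_to_tokens(peaks, precursor_mz, mz_bin=0.1,
                       n_intensity_levels=10, top_k=50):
    """Encode (m/z, intensity) peaks as an m/z-ordered list of string tokens."""
    tokens = [f"<precursor:{round(precursor_mz / mz_bin)}>"]
    # Keep only the most intense peaks to bound the sequence length.
    kept = sorted(peaks, key=lambda p: p[1], reverse=True)[:top_k]
    if not kept:
        return tokens
    base_peak = max(intensity for _, intensity in kept)
    if base_peak <= 0:
        return tokens
    # Sorting by m/z gives the sequence a consistent "word order".
    for mz, intensity in sorted(kept):
        mz_token = round(mz / mz_bin)
        # Quantize intensity relative to the base peak into discrete levels.
        level = min(n_intensity_levels - 1,
                    int(n_intensity_levels * intensity / base_peak))
        tokens.append(f"mz{mz_token}|i{level}")
    return tokens


# Hypothetical spectrum, for illustration only.
example_peaks = [(119.049, 45.0), (91.054, 100.0), (147.044, 20.0)]
example_tokens = spectrum_to_tokens(example_peaks, precursor_mz=165.055)
```

With these inputs, example_tokens is the precursor token followed by one token per fragment in ascending m/z order. In a real model each token would be mapped to an integer ID and an embedding vector; discretizing m/z also discards precision, which is a trade-off other encodings handle differently.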

The Enveda platform combines these structure and property prediction models with data from bioactivity assays, and sophisticated deconvolution algorithms pinpoint which molecule is likely responsible for a given biological effect. This gives Enveda’s scientists a detailed view of the chemical composition of a plant, including molecules previously unknown to science, at unprecedented speed, opening troves of new and potentially therapeutic molecules.
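The idea behind linking molecules to bioactivity can be illustrated with a deliberately simple stand-in. The sketch below ranks detected features by how strongly their abundance across a set of fractions tracks the measured activity of those fractions, using the Pearson correlation. This is not Enveda's deconvolution algorithm, which the article does not describe; the feature names, abundances, and activity values are invented for the example.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant profile carries no information
    return sxy / math.sqrt(sxx * syy)


def rank_features_by_activity(abundance_by_feature, activity):
    """Sort features from most to least correlated with activity."""
    scores = {name: pearson(profile, activity)
              for name, profile in abundance_by_feature.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical data: five fractions of one extract, for illustration only.
activity = [0.05, 0.10, 0.80, 0.95, 0.15]
abundance_by_feature = {
    "feature_A": [10, 12, 90, 100, 15],   # rises and falls with activity
    "feature_B": [50, 48, 52, 49, 51],    # roughly constant
    "feature_C": [100, 80, 10, 5, 90],    # moves opposite to activity
}
ranked = rank_features_by_activity(abundance_by_feature, activity)
```

In this toy data set feature_A ranks first, making it the best candidate for follow-up. Correlation alone cannot separate molecules that co-occur or act together, which is one reason real deconvolution methods are more sophisticated.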

The Future of Machine Learning in Drug Discovery

The potential for machine learning to increase the speed and efficiency of drug discovery is enormous, and the technology is likely to play a significant role in nearly every step of the drug development process for nature-based drugs, synthetic drugs, and biologics. Enveda hypothesizes that drugs that are more likely to work in the clinic exist in plants with a long history of medicinal use, namely those used in traditional medicines. As clinical trials are the most expensive part of drug development, the AI-powered tools developed by Enveda to identify new, active molecules from plants with a strong precedent for efficacy in humans could significantly reduce the time and cost of developing new drugs. Technological advances such as these are particularly crucial in a field where the conventional approach to drug discovery is slow.

With the assistance of machine learning, researchers can rapidly analyze complex mixtures of molecules and generate high-quality data in seconds. They can also make better-informed decisions and can more effectively pinpoint compounds that are more likely to receive approval for drug development. These profoundly enhanced capabilities will help novel therapeutics reach the market more quickly, ultimately improving the health of more people across the globe.

By creating the largest library of natural compounds, annotated with structure and function and specifically designed for machine learning, Enveda plans to shift drug discovery from the established paradigm of screening to that of an informed database search. Enveda scientists and partners will be able to search this massive library based on desired features, such as bioactivity and PK/PD properties, identifying natural molecules in seconds that can serve as ideal starting points for drug development efforts. Through these efforts, Enveda is accelerating the pace of discovery, which could result in more efficient and effective nature-inspired treatments for a wide range of diseases.



August Allen is Chief Technical Officer at Enveda Biosciences and has a decade of experience building high-throughput automation, scaling teams, and solving problems at the intersection of technology and life sciences. He was an early employee of Recursion Pharmaceuticals, where he scaled their high-content imaging platform by multiple orders of magnitude before joining Enveda in 2021. At Enveda, he is responsible for overseeing the company’s platform to annotate the structure and function of the natural world to discover new medicines.

Pelle Simpson is Senior Scientist at Enveda Biosciences and has a background in chromatography and mass spectrometry, initially focusing on natural products drug discovery and cell signaling proteomics at the University of Amsterdam. He expanded his research in proteomics at the University of Colorado, Boulder, where he also developed interests in 3D printing, machining, and programming for lab automation. Since joining Enveda Biosciences in 2021, he has leveraged these skills to develop custom hardware and software to enable the company’s mass spectrometry and high-throughput screening efforts.

Erica Forsberg is Vice President Metabolomics at Bruker Daltonics and manages the metabolomics business for Bruker Life Sciences Mass Spectrometry. She came to Bruker in 2021 from a faculty position in the Department of Chemistry at San Diego State University following a post-doctoral fellowship at The Scripps Research Institute in Gary Siuzdak’s group. Her major focus has been untargeted metabolomics method development and bioinformatics solutions for structure elucidation. Her current role involves managing the development and implementation of Bruker’s future metabolomics and lipidomics solutions.
