PLS-DA With Univariate Filtration Refines Discovery of Untargeted Metabolomics Data Using LC–MS

Column, June 2024
Volume 20
Issue 6
Pages: 7

Unlike targeted metabolomics, which looks only at a predetermined set of metabolites or their pathways, an untargeted approach was preferred by this research team because it is unbiased yet still comprehensive.

Researchers from numerous institutions across China, namely the Affiliated Wuxi People’s Hospital of Nanjing Medical University, Guizhou Medical University, Yunnan University, and multiple schools and laboratories within Chengdu University of Traditional Chinese Medicine, have compared in a recent journal article the effect of performing univariate filtration (UVA) before versus after partial least squares–discriminant analysis (PLS-DA), a multivariate analysis (MVA) method, on the differential features selected from untargeted metabolomics data (1).

A 3D representation of indoxyl sulfate, a metabolite of tryptophan. Generative AI | Image Credit: © Charlotte -

The study, published in the journal Analytica Chimica Acta, used human serum and extracts of Caenorhabditis elegans, a microscopic nematode, as its samples; the data were acquired by liquid chromatography coupled to mass spectrometry (LC–MS). According to the authors, numerous insignificant input features can distort a PLS-DA model, and previous similar studies have therefore relied on UVA to refine the differential features selected by PLS-DA, an approach that has frequently yielded unstable results (1).
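To make univariate filtration concrete, the sketch below scores each feature independently and keeps only those that pass a false-discovery-rate cutoff. The article does not say which univariate test or threshold the authors used, so every choice here is an assumption: Welch's t statistic, permutation p-values, Benjamini–Hochberg adjustment, and alpha = 0.05. The function names (`welch_t`, `permutation_pvalues`, `benjamini_hochberg`, `univariate_filter`) are hypothetical, and only NumPy is required.

```python
import numpy as np


def welch_t(X, y):
    """Welch's t statistic for each feature (column of X), class 1 minus class 0."""
    a, b = X[y == 0], X[y == 1]
    se = np.sqrt(a.var(axis=0, ddof=1) / len(a) + b.var(axis=0, ddof=1) / len(b))
    return (b.mean(axis=0) - a.mean(axis=0)) / se


def permutation_pvalues(X, y, n_perm=1000, seed=0):
    """Two-sided permutation p-values for each feature's Welch t statistic."""
    rng = np.random.default_rng(seed)
    t_obs = np.abs(welch_t(X, y))
    exceed = np.zeros(X.shape[1])
    for _ in range(n_perm):
        # shuffle class labels to simulate the null of no group difference
        exceed += np.abs(welch_t(X, rng.permutation(y))) >= t_obs
    # add-one correction keeps every p-value strictly above zero
    return (exceed + 1) / (n_perm + 1)


def benjamini_hochberg(p):
    """Benjamini-Hochberg adjusted p-values (q-values)."""
    p = np.asarray(p, dtype=float)
    m = len(p)
    order = np.argsort(p)
    ranked = p[order] * m / np.arange(1, m + 1)
    # running minimum from the largest p-value down enforces monotonic q-values
    q_sorted = np.minimum.accumulate(ranked[::-1])[::-1]
    q = np.empty(m)
    q[order] = np.minimum(q_sorted, 1.0)
    return q


def univariate_filter(X, y, alpha=0.05, n_perm=1000, seed=0):
    """Indices of features whose BH-adjusted p-value falls below alpha."""
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    q = benjamini_hochberg(permutation_pvalues(X, y, n_perm, seed))
    return np.flatnonzero(q < alpha)
```

Used as a prefilter, the returned indices would reduce the feature matrix to `X[:, keep]` before PLS-DA is fitted; used afterward, the same function would be run only on the features PLS-DA had already selected. Permutation p-values avoid depending on a t-distribution routine, at the cost of a floor of 1/(n_perm + 1) on each p-value.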

PLS-DA is a supervised multivariate statistical method used to identify and quantify differences between sample groups from their mass spectral data. The technique models the relationship between the LC–MS data (predictors) and the sample classes (responses) by finding latent variables that maximize the covariance between them, thereby enhancing class separation. In LC–MS metabolomics, PLS-DA highlights the features that contribute most to class differentiation, which aids interpretation of the underlying biochemical variation.
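For the two-class case, the covariance-maximizing step can be sketched as PLS1 on a 0/1 dummy-coded response, fitted with the NIPALS algorithm. This is a minimal illustration under that assumption, not the authors' implementation; the function names `plsda_fit` and `plsda_predict`, mean-centring without scaling, and the 0.5 decision threshold are all illustrative choices. Each weight vector is computed as the product of the (deflated) feature matrix and response, so it points along the direction of maximal covariance with the class labels.

```python
import numpy as np


def plsda_fit(X, y, n_components=2):
    """Two-class PLS-DA as PLS1 (NIPALS) on a 0/1 dummy-coded response.

    X: (n_samples, n_features) intensities; y: (n_samples,) labels in {0, 1}.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    E, f = X - x_mean, y - y_mean  # mean-centred predictors and response
    n, p = E.shape
    W = np.zeros((p, n_components))  # X weights
    P = np.zeros((p, n_components))  # X loadings
    T = np.zeros((n, n_components))  # latent-variable scores
    c = np.zeros(n_components)       # y loadings
    for a in range(n_components):
        w = E.T @ f                  # proportional to cov(features, response)
        w /= np.linalg.norm(w)
        t = E @ w
        tt = t @ t
        P[:, a] = E.T @ t / tt
        c[a] = f @ t / tt
        E = E - np.outer(t, P[:, a])  # remove what this component explains
        f = f - c[a] * t
        W[:, a], T[:, a] = w, t
    # coefficients mapping centred X straight to the predicted response
    B = W @ np.linalg.solve(P.T @ W, c)
    return {"x_mean": x_mean, "y_mean": y_mean, "W": W, "P": P,
            "T": T, "c": c, "B": B}


def plsda_predict(model, X):
    """Assign class 1 when the continuous PLS prediction exceeds 0.5."""
    X = np.asarray(X, dtype=float)
    y_hat = (X - model["x_mean"]) @ model["B"] + model["y_mean"]
    return (y_hat > 0.5).astype(int)
```

In practice, metabolomics features are often scaled (for example, to unit variance or by Pareto scaling) before fitting, and the number of components is chosen by cross-validation; both steps are omitted here for brevity.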

As reported by Moritz and Lewis, metabolomics is a relatively new form of “omics” that has emerged only in the last two decades as a way of characterizing metabolites and their pathways within biological systems (2). Companies such as Enveda Biosciences are now developing models, similar to the transformer models behind the artificial intelligence (AI) chatbot ChatGPT, that can “learn” the language of mass spectrometry (MS) so that large amounts of metabolomics data can be analyzed quickly (3).

The research team followed an untargeted metabolomics strategy, which they defined as an unbiased approach that comprehensively analyzes the metabolites in samples without prior knowledge of their properties or structures; targeted metabolomics, as its name suggests, quantifies and analyzes only a predetermined set of metabolites or pathways (1). To obtain the metabolomics data, the authors used ultrahigh-pressure liquid chromatography (UHPLC) coupled to high-resolution mass spectrometry (HRMS), which they described as an enhancement of the usual LC–MS process.

In discussing their results, the authors confirmed that univariate prefiltration before multivariate PLS-DA yielded fewer differential features, but these features were more stable and had a lower false-positive (FP) rate, making them presumably more reliable and meaningful (1). The researchers paid particular attention to setting an appropriate variable influence on projection (VIP) threshold, conceding that a large number of insignificant variables or other orthogonal noise could artificially inflate VIP values; as a result, a VIP threshold of greater than 1.0 had a higher chance of returning false-positive differential features. Based on these findings, the study concluded that univariate prefiltration should be viewed as “indispensable” in data preprocessing for untargeted metabolomics analysis.
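For reference, one widely used definition of the VIP score for variable j, in a PLS model with A components, p variables, weight vectors w_a, scores t_a, and y-loadings c_a, is given below; whether the authors used exactly this form is an assumption. Summing the squared scores over all variables shows why 1.0 is the customary cutoff: the average squared VIP is always 1, so the threshold flags above-average contributors to the model rather than statistically significant ones.

```latex
\begin{aligned}
\mathrm{VIP}_j &= \sqrt{\frac{p \sum_{a=1}^{A} \mathrm{SS}_a \left( w_{ja} / \lVert \mathbf{w}_a \rVert \right)^{2}}{\sum_{a=1}^{A} \mathrm{SS}_a}},
\qquad \mathrm{SS}_a = c_a^{2}\, \mathbf{t}_a^{\top} \mathbf{t}_a \\[6pt]
\sum_{j=1}^{p} \mathrm{VIP}_j^{2}
  &= \frac{p \sum_{a=1}^{A} \mathrm{SS}_a \sum_{j=1}^{p} \left( w_{ja} / \lVert \mathbf{w}_a \rVert \right)^{2}}{\sum_{a=1}^{A} \mathrm{SS}_a}
  = \frac{p \sum_{a=1}^{A} \mathrm{SS}_a \cdot 1}{\sum_{a=1}^{A} \mathrm{SS}_a}
  = p
\end{aligned}
```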


(1) Xu, S.; Bai, C.; Chen, Y.; Yu, L.; Wu, W.; Hu, K. Comparing Univariate Filtration Preceding and Succeeding PLS-DA Analysis on the Differential Variables/Metabolites Identified from Untargeted LC-MS Metabolomics Data. Anal. Chim. Acta 2024, 1287, 342103. DOI: 10.1016/j.aca.2023.342103

(2) Moritz, T.; Lewis, M. R. Comprehensive Small Molecule Analysis Using Trapped Ion Mobility Spectrometry for Obtaining Accurate Metabolic Profiles. LCGC International – Curr. Trends Mass Spectrom. 2024, 22 (1), 14–17.

(3) Allen, A.; Simpson, P. Metabolomics Meets Machine Learning: The Future of Nature-Inspired Drug Discovery. Column 2024, 20 (4), 11–15.
