
Standardizing Statistical Tools for “Omics”: Best Practices Using R and Python
A global team led by Michal Holčapek, professor of analytical chemistry at the Faculty of Chemical Technology, UPCE, Pardubice (Czech Republic), and Jakub Idkowiak, a research associate from KU Leuven (Belgium), has unveiled a powerful R and Python-based toolkit for tackling the complexities of lipidomics and metabolomics data, offering researchers a new approach for “omics” analysis.
A comprehensive overview of statistical tools and visualization methods for processing lipidomics and metabolomics data using R and Python has been published in Nature Communications (1). The study, involving 20 co-authors from 19 institutions across five countries, was led by Michal Holčapek at the Faculty of Chemical Technology, UPCE, Pardubice (Czech Republic), and Jakub Idkowiak, a research associate from KU Leuven (Belgium). The article addresses challenges that researchers face when processing and interpreting the large-scale mass spectrometry datasets commonly generated in lipidomics and related “omics” fields.
The original aim was to consolidate data visualization scripts developed by PhD student Jakub Idkowiak for internal use, according to Holčapek (2). However, because of growing interest in the topic and input from researchers across multiple disciplines, the project developed into a broader review of available tools and best practices.
One key outcome is an openly accessible GitBook resource—Omics Data Visualization in R and Python—which compiles example scripts, workflows, and user guidance for statistical processing and visualization (3). This library is intended to support researchers across varying levels of computational expertise, facilitating reproducibility and transparency in data analysis.
For separation scientists, the review offers practical insights into how multivariate statistical methods and visualization strategies can be integrated into workflows following chromatographic and mass spectrometric separation. Lipidomics and metabolomics studies increasingly rely on advanced separation techniques to resolve complex mixtures before quantification and identification. However, the resulting datasets can be challenging to interpret without appropriate computational tools. This publication serves as a practical reference for selecting suitable statistical approaches for quality control, feature selection, and classification tasks, particularly in studies involving biomarker discovery, disease diagnostics, or systems biology.
The growing volume and complexity of omics data have created a need for standardized and user-friendly analysis approaches. Workshops and sessions on R-based data processing now draw large audiences at international conferences, underscoring the demand for accessible, well-documented resources in this area.
Michal Holčapek told LCGC International: “For scientists working hands-on with LC–MS or chromatography workflows, our tools help to detect and correct for batch effects, systematic drift, or outlier runs early on via diagnostic visualization; for example, principal component analysis (PCA) and quality control (QC) trends. By integrating evaluation plots and QC statistics into data preprocessing, the analyst can quickly flag problematic injections or batches before deeper analysis downstream. In short, better ‘cleaning’ of your measurement matrix leads to more reliable downstream insights. We also show a range of visualization tools for LC–MS obtained -omics data.”
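To make the idea of flagging problematic injections from QC statistics concrete, here is a minimal sketch in plain Python. It is not taken from the published toolkit or the GitBook; the function names, the peak areas, and the 3.5 cutoff on the robust z-score are all illustrative assumptions. It flags pooled-QC injections whose signal for one feature deviates strongly from the median, then compares the relative standard deviation (RSD) before and after exclusion.

```python
from statistics import mean, median, stdev


def rsd_percent(values):
    """Relative standard deviation (coefficient of variation) in percent."""
    return 100.0 * stdev(values) / mean(values)


def flag_outlier_injections(qc_signal, threshold=3.5):
    """Return indices of QC injections with a large robust z-score.

    Robust z-score = 0.6745 * (x - median) / MAD, where MAD is the median
    absolute deviation. The 3.5 threshold is an assumed, commonly used cutoff.
    """
    med = median(qc_signal)
    mad = median([abs(x - med) for x in qc_signal])
    if mad == 0:
        # All values (nearly) identical: nothing can be flagged.
        return []
    return [
        i
        for i, x in enumerate(qc_signal)
        if abs(0.6745 * (x - med) / mad) > threshold
    ]


# Hypothetical peak areas of one lipid in repeated pooled-QC injections,
# listed in run order. Injection 4 simulates a failed run.
qc_areas = [1000.0, 1012.0, 995.0, 1008.0, 640.0, 1003.0, 990.0, 1015.0]

flagged = flag_outlier_injections(qc_areas)
clean = [x for i, x in enumerate(qc_areas) if i not in flagged]

print("flagged injections:", flagged)  # -> flagged injections: [4]
print(f"RSD before: {rsd_percent(qc_areas):.1f}%")
print(f"RSD after:  {rsd_percent(clean):.1f}%")
```

The median and MAD are used instead of the mean and standard deviation because a single failed injection would otherwise inflate the spread and mask itself. A real workflow would repeat this per feature and combine it with multivariate views such as the PCA plots mentioned in the quote.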
Jakub Idkowiak added: “Unlike many existing omics toolboxes, our approach prioritizes flexibility and transparency by offering modular, interoperable components rather than rigid ‘black box’ pipelines. We focus on the key analytical techniques required for lipidomics and metabolomics and integrate them with robust, state-of-the-art visualization methods. Moreover, we provide a GitBook (3) containing R/Python code and clear decision logic, explaining the rationale behind method selection, which allows users to both execute and truly understand the analysis.”
This publication is the fifth contribution from Holčapek’s team to Nature Communications. Previous studies from the group have addressed challenges in lipidomics data reporting and introduced a method for early detection of pancreatic cancer from blood lipid profiles (4), which has since been patented and is currently undergoing clinical validation through the university spin-off company Lipidica (5).
References
1. Idkowiak, J.; Dehairs, J.; Schwarzerová, J.; Olešová, D.; Truong, J. X. M.; Kvasnička, A.; Eftychiou, M.; Cools, R.; Spotbeen, X.; Jirásko, R.; Veseli, V.; Giampà, M.; de Laat, V.; Butler, L. M.; Weckwerth, W.; Friedecký, D.; Demeulemeester, J.; Hron, K.; Swinnen, J. V.; Holčapek, M. Best Practices and Tools in R and Python for Statistical Processing and Visualization of Lipidomics and Metabolomics Data. Nat. Commun. 2025, 16, 8714. DOI:
2.
3. Laboratory of Lipid Metabolism. Omics Data Visualization in R and Python. GitBook.
4. Wolrab, D.; Jirásko, R.; Cífková, E.; et al. Lipidomic Profiling of Human Serum Enables Detection of Pancreatic Cancer. Nat. Commun. 2022, 13, 124. DOI:
5.