
Streamlining Lipidomics: Novel Statistical Workflows and Visualization in R and Python
Key Takeaways
- The research fills a gap in standardized guidance for lipidomics/metabolomics data analysis, focusing on transparency and reproducibility using R and Python.
- The approach offers modular, interoperable components, integrating robust visualization methods and providing a GitBook with code and decision logic.
Recent developments in lipidomics and metabolomics have highlighted persistent challenges in data processing, statistical analysis, and visualization that often hinder reproducibility, transparency, and accessibility within the field. A newly published guideline addresses these issues by presenting a comprehensive, code-based framework for the statistical treatment and graphical representation of omics data using R and Python. LCGC International spoke to Michal Holčapek and Jakub Idkowiak about the rationale behind this research and the benefits it offers separation scientists.
You recently published a paper titled “Best Practices and Tools in R and Python for Statistical Processing and Visualization of Lipidomics and Metabolomics Data” (1). What was the rationale behind this research? What specific limitations or challenges in current lipidomics or metabolomics workflows were you aiming to overcome?
We identified a clear gap in accessible and reproducible guidance for statistical processing and visualization in lipidomics and metabolomics, particularly for students and early-career researchers beginning their journey in omics data analysis. Existing workflows often rely on ad hoc choices for normalization, imputation, scaling, and visualization, which are not well standardized and may obscure data interpretation or compromise reproducibility. Our objective was to develop a best-practices framework and curated selection of tools in R and Python to help users perform these critical steps more transparently, consistently, and robustly.
There are many existing resources for statistical analysis in “omics.” What is novel about your approach compared to existing methods?
Unlike many existing omics toolboxes, our approach prioritizes flexibility and transparency by offering modular, interoperable components rather than rigid “black box” pipelines. We focus on the key analytical techniques required for lipidomics and metabolomics and integrate them with robust, state-of-the-art visualization methods. Moreover, we provide a GitBook containing R/Python code and clear decision logic, explaining the rationale behind method selection, which allows users to both execute and truly understand the analysis (2).
To our knowledge, this is the first time such a practical, code-based solution has been made available for lipidomics/metabolomics researchers rather than merely summarized in a review.
For researchers working hands-on with liquid chromatography–mass spectrometry (LC–MS) or related separation workflows, how does your method benefit separation scientists in practice?
For scientists working hands-on with LC–MS or chromatography workflows, our tools help to detect and correct for batch effects, systematic drift, and outlier runs early on through diagnostic visualizations, such as principal component analysis (PCA) plots and quality control (QC) trend plots, and through normalization modules. By integrating evaluation plots and QC statistics into data preprocessing, the analyst can quickly flag problematic injections or batches before deeper downstream analysis. In short, better “cleaning” of your measurement matrix leads to more reliable downstream insights. We also show a range of unique visualization tools for omics data obtained by LC–MS.
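The kind of QC statistics described here can be illustrated with a minimal, dependency-free Python sketch. It is not taken from the authors' GitBook: the function names, the example peak areas, and the 3.5 cutoff for the robust z-score are assumptions chosen for illustration. The sketch computes the coefficient of variation (CV) of one lipid across pooled-QC injections and flags injections whose summed signal deviates strongly from the median.

```python
from statistics import mean, median, stdev


def qc_cv_percent(qc_values):
    """Coefficient of variation (%) of one feature across QC injections."""
    return 100.0 * stdev(qc_values) / mean(qc_values)


def flag_outlier_runs(total_signal, threshold=3.5):
    """Flag injections whose total signal lies far from the median.

    Uses a robust z-score built on the median absolute deviation (MAD).
    The 3.5 cutoff is a common rule of thumb, not a value from the paper.
    """
    med = median(total_signal)
    mad = median(abs(x - med) for x in total_signal)
    if mad == 0:
        return [False] * len(total_signal)
    return [abs(0.6745 * (x - med) / mad) > threshold for x in total_signal]


# Hypothetical peak areas of one lipid in six pooled-QC injections
qc_areas = [1000.0, 1040.0, 980.0, 1010.0, 995.0, 1025.0]
cv = qc_cv_percent(qc_areas)

# Hypothetical summed signal per injection; the fifth run lost sensitivity
tic = [5.1e6, 5.0e6, 5.2e6, 4.9e6, 1.2e6, 5.05e6, 5.15e6]
flags = flag_outlier_runs(tic)

print(f"QC CV = {cv:.1f}%")
print("Flagged injections:", [i + 1 for i, f in enumerate(flags) if f])
```

In practice the same checks would usually run on a full feature table (for example with pandas) and sit alongside PCA score plots, but the logic is the same: summarize QC behavior first, then decide which runs deserve a closer look.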
You present both R and Python tools—how did you select which packages to recommend, and how do you handle interoperability between them? Is there a preferred environment depending on the user’s goals, for example, visualization vs. statistical modeling?
Our package recommendations were filtered by three criteria: maturity, community adoption, and ability to generate publication-quality output with manageable complexity, as the review is mainly intended for beginners. We attempted to highlight analogous tools in both languages. Depending on the user’s goal, one might prefer R for highly polished static graphics and simple statistical modeling, or Python for integration with more complex machine learning workflows, but our workflow supports both. We are also open to further suggestions and development, particularly if users find that some of their preferred data analysis or visualization tools are missing. All one needs to do is reach out to us at lipidomicsrpython@kuleuven.be.
Preprocessing steps like batch correction, normalization, and missing value imputation are often poorly standardized. How does your proposed workflow in R/Python help streamline these steps, and how adaptable is it to different sample types and analytical platforms?
We strongly recommend careful sequence planning in accordance with current guidelines from the Lipidomics Standards Initiative (3) and the Metabolomics Society (4), including the use of QC samples, blank injections, and system suitability standards to assess instrument performance.
Missing data points remain a major challenge in lipidomics and metabolomics. Rather than applying imputation methods blindly, we emphasize investigating the underlying causes of missingness and addressing them appropriately depending on whether the data are missing completely at random, at random, or not at random. The manuscript presents a range of well-tested approaches to guide this process.
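As a hedged illustration of inspecting missingness before imputing, the sketch below reports the missing fraction per feature, drops features above an illustrative cutoff, and only then applies half-minimum imputation. Half-minimum imputation is a simple heuristic that is reasonable only when values are missing because the signal fell below the detection limit (one form of missing not at random). The feature names, values, and the 50% cutoff are hypothetical and do not come from the manuscript.

```python
import math


def missing_fraction(values):
    """Share of samples in which a feature was not detected (NaN)."""
    return sum(math.isnan(v) for v in values) / len(values)


def half_min_impute(values):
    """Replace NaN with half of the smallest observed value.

    Intended for left-censored data (signal below the detection limit);
    it is not appropriate for values that are missing at random.
    """
    observed = [v for v in values if not math.isnan(v)]
    fill = min(observed) / 2.0
    return [fill if math.isnan(v) else v for v in values]


nan = math.nan
# Hypothetical feature table: one list of peak areas per lipid
features = {
    "PC 34:1": [120.0, 118.0, 131.0, 125.0, 122.0],
    "LPC 18:0": [8.0, nan, 6.0, 7.5, nan],   # 40% missing
    "Cer 42:1": [nan, nan, nan, 3.0, nan],   # 80% missing
}

max_missing = 0.5  # illustrative cutoff, to be justified per study
kept = {}
for name, vals in features.items():
    frac = missing_fraction(vals)
    print(f"{name}: {frac:.0%} missing")
    if frac > max_missing:
        continue  # too sparse to impute credibly
    kept[name] = half_min_impute(vals) if frac > 0 else list(vals)
```

Printing the missing fraction before any imputation keeps the decision visible: a feature that is absent mainly in one study group or only in low-abundance samples calls for a different treatment than one that drops out sporadically.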
Normalization remains one of the most critical and debated steps in the field. We advocate for standards-based normalization, which accounts for analytical response factors and sample preparation variability, for example, extraction efficiency. In practice, researchers often prefer pre-acquisition normalization, but this should be carefully optimized for each sample type and thoughtfully planned before implementation. Importantly, several post-acquisition normalization techniques exist that can help correct for suboptimal or missing pre-acquisition normalization.
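The arithmetic behind standards-based normalization can be sketched as a single-point internal-standard calculation. This is a simplified illustration, not the authors' implementation: the function name, the parameter names, and the example numbers are assumptions, and the response factor is defined here as the analyte response per mole divided by the internal-standard response per mole.

```python
def concentration_from_internal_standard(analyte_area, is_area, is_amount_nmol,
                                         sample_volume_ml, response_factor=1.0):
    """Concentration (nmol/mL) from a single-point internal-standard calibration.

    amount_analyte = (area_analyte / area_IS) * amount_IS / response_factor
    """
    if is_area <= 0 or sample_volume_ml <= 0 or response_factor <= 0:
        raise ValueError("areas, volume, and response factor must be positive")
    amount_nmol = (analyte_area / is_area) * is_amount_nmol / response_factor
    return amount_nmol / sample_volume_ml


# Hypothetical example: 0.5 nmol of internal standard spiked into 25 µL of plasma
conc = concentration_from_internal_standard(
    analyte_area=2.4e6,
    is_area=1.2e6,
    is_amount_nmol=0.5,
    sample_volume_ml=0.025,
)
print(f"{conc:.1f} nmol/mL")  # 40.0 nmol/mL
```

Because the internal standard is added before extraction, losses during sample preparation affect analyte and standard alike and cancel in the area ratio, which is why this approach accounts for extraction efficiency as well as instrument response.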
Data transformation and scaling should be applied in line with the chosen statistical analysis methods and never performed automatically, as excessive transformation may complicate interpretation.
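To make the difference between common choices concrete, the following minimal sketch applies a log10 transformation and then compares autoscaling (unit variance) with Pareto scaling (division by the square root of the standard deviation). The function names and example values are illustrative assumptions. Which option is appropriate depends on the downstream statistical method, as noted above.

```python
import math
from statistics import mean, stdev


def log10_transform(values):
    """Log10 transform; requires strictly positive values."""
    return [math.log10(v) for v in values]


def autoscale(values):
    """Center to zero mean and scale to unit standard deviation."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def pareto_scale(values):
    """Center to zero mean and divide by the square root of the standard deviation."""
    m, s = mean(values), stdev(values)
    return [(v - m) / math.sqrt(s) for v in values]


# Hypothetical peak areas of one feature spanning three orders of magnitude
areas = [10.0, 100.0, 1000.0, 10000.0]
logged = log10_transform(areas)
auto = autoscale(logged)
pareto = pareto_scale(logged)

print("autoscaled SD:", round(stdev(auto), 3))
print("Pareto-scaled SD:", round(stdev(pareto), 3))
```

Autoscaling gives every feature the same weight, which can amplify noise in low-abundance features, whereas Pareto scaling retains part of the original variance structure.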
Ultimately, successful analysis begins with proper planning: standardized and well-organized acquisition sequences are essential, and collecting appropriate QC data enables the use of advanced correction algorithms such as LOESS (locally estimated scatterplot smoothing) and SERRF (systematic error removal using random forest) (5), which are well known and have been evaluated in both lipidomics and metabolomics. For standard sample types like plasma or serum, we also implemented NIST-based batch effect removal, following the approach proposed by Chocholoušková et al. (6).
These strategies and methods have been extensively tested across multiple analytical platforms, providing a reliable means to minimize inter-instrument and inter-laboratory variability. However, the key to success lies in carefully planning, optimizing, and validating the workflow and in critically analyzing your data, which in turn leads to better methods.
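The idea behind QC-based LOESS drift correction can be sketched in plain Python as follows. This is a simplified illustration under stated assumptions, not the implementation from the paper or from any package: it fits a local linear regression with tricube weights to the QC peak areas as a function of injection order, omits the robustness iterations found in full LOESS/LOWESS routines, and assumes at least three QC injections at distinct positions in the sequence. All function names and data are hypothetical.

```python
from statistics import median


def tricube(u):
    """Tricube kernel weight for a scaled distance u."""
    u = min(abs(u), 1.0)
    return (1.0 - u ** 3) ** 3


def local_linear_fit(x0, xs, ys, frac=0.75):
    """Weighted local linear fit at x0 using the nearest fraction of points."""
    if len(xs) < 3:
        raise ValueError("need at least three QC injections")
    k = min(len(xs), max(3, round(frac * len(xs))))
    # Bandwidth: distance from x0 to its k-th nearest QC injection
    bandwidth = sorted(abs(x - x0) for x in xs)[k - 1]
    w = [tricube((x - x0) / bandwidth) for x in xs]
    sw = sum(w)
    mx = sum(wi * x for wi, x in zip(w, xs)) / sw
    my = sum(wi * y for wi, y in zip(w, ys)) / sw
    sxx = sum(wi * (x - mx) ** 2 for wi, x in zip(w, xs))
    sxy = sum(wi * (x - mx) * (y - my) for wi, x, y in zip(w, xs, ys))
    slope = sxy / sxx if sxx > 0 else 0.0
    return my + slope * (x0 - mx)


def correct_drift(order, area, qc_order, qc_area, frac=0.75):
    """Divide each area by the fitted QC trend and rescale to the QC median."""
    target = median(qc_area)
    return [a * target / local_linear_fit(o, qc_order, qc_area, frac)
            for o, a in zip(order, area)]


# Hypothetical sequence: signal drops by 1% per injection
qc_order = [0, 5, 10, 15, 20]
qc_area = [1000.0, 950.0, 900.0, 850.0, 800.0]
sample_order = [2, 8, 13, 18]
sample_area = [500.0 * (1 - 0.01 * o) for o in sample_order]

corrected = correct_drift(sample_order, sample_area, qc_order, qc_area)
corrected_qc = correct_drift(qc_order, qc_area, qc_order, qc_area)
print([round(v, 1) for v in corrected])
```

The correction is applied per feature, and it only works if QC injections are spread evenly through the sequence, which is one reason sequence planning matters as much as the algorithm.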
Effective visualization is key in metabolomics and lipidomics. Can you highlight specific visualization techniques or tools in your workflow that are especially useful for revealing patterns, trends, or potential outliers in large-scale omics data sets?
We advocate using box plots with jitter or violin plots instead of traditional bar charts, as they more accurately depict data distributions. For skewed distributions, we highlight the use of adjusted box plots that employ medcouple-based whisker definitions, providing a more robust representation of the asymmetric distributions frequently encountered in lipidomics/metabolomics. The accompanying GitBook showcases a variety of volcano plots, along with more advanced visualization tools developed by Friedecký’s team, such as lipid maps and fatty acyl-chain plots, which reveal trends within lipid classes and/or fatty acyl-chain length or unsaturation. Dendrogram-heatmap combinations are indispensable for interpreting quantitative bulk data, offering powerful visual insights into sample clustering and trend patterns. Similarly, PCA and Uniform Manifold Approximation and Projection (UMAP) embeddings support unsupervised data exploration, often uncovering meaningful relationships when interpreted carefully. This applies both to the biological insights one would like to focus on and to any technical factors that may be revealed. In terms of implementation, we rely on ggpubr, tidyplots, ggplot2, ComplexHeatmap, ggtree, and mixOmics in R, while in Python, we employ seaborn and matplotlib for flexible and publication-ready visualizations.
How does this approach align with FAIR (Findable, Accessible, Interoperable, Reusable) data principles? Do you propose or include any tools for ensuring reproducibility and transparent documentation of analytical steps?
We strongly align with FAIR principles (7) by using open-source tools and standard file formats. This is reflected in our publication of a complementary GitBook with all code, versioning, and decision logic visible to the reader. The modular, documented workflow encourages reuse and adaptation, and users can re-run or re-route parts of the pipeline. In addition, by making choices explicit (rather than hidden), we promote transparent reporting of each step in the analysis. Many journals encourage the sharing of R/Python scripts used for data analysis and visualization, and we strongly support this practice.
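One simple way to make analytical choices explicit is to store a small provenance record next to the results. The sketch below is an assumption about how this could be done with the Python standard library, not a tool provided by the authors; the parameter names and the package list are hypothetical examples.

```python
import json
import platform
import sys
from importlib import metadata


def package_version(name):
    """Installed version of a package, or a note if it is absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return "not installed"


def analysis_record(parameters, packages):
    """Collect interpreter, package versions, and analysis choices in one dict."""
    return {
        "python": platform.python_version(),
        "platform": sys.platform,
        "packages": {name: package_version(name) for name in packages},
        "parameters": parameters,
    }


record = analysis_record(
    parameters={
        "normalization": "internal standard",
        "imputation": "half-minimum",
        "scaling": "autoscale",
    },
    packages=["numpy", "pandas", "seaborn", "matplotlib"],
)
print(json.dumps(record, indent=2, sort_keys=True))
```

Saving such a record as a JSON file alongside figures and tables lets a reader see which versions and settings produced them without opening the scripts.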
What do you see as the next steps for broader adoption of these tools in the metabolomics and lipidomics communities? Are there plans to turn this into a package, a teaching resource, or perhaps to build more automation and user-friendliness into the workflows?
We are planning to develop training materials, workshops, and case-study tutorials to promote broader adoption within the community. Several such events have already been held with great success, reflecting the growing interest in using R and Python for data analysis and visualization. We are considering integrating selected tools into existing software solutions or platforms, such as the LipidMaps statistical toolkit (8), to enhance accessibility and streamline their practical use. Given that the GitBook is a live document with potential for collaboration with interested researchers, we may consider a follow-up article if significant changes and improvements are made.
What future developments do you foresee in this area, particularly regarding automation, AI-driven annotation, or the miniaturization of separation platforms? Are you exploring any of these directions in the future?
We anticipate increased automation in annotation, such as AI-driven feature assignment and lipid identification, which is already emerging in mass spectrometry. At the same time, we expect closer integration with separation methods, for example, real-time QC feedback or analysis of retention patterns. As separation platforms continue to miniaturize and throughput rises, scalable, parallel, and adaptive preprocessing will become essential. There is also strong potential for leveraging machine learning models for anomaly detection, automatic drift correction, and semi-supervised identification of novel lipids or metabolites, helping to capture features that might otherwise be overlooked by analytical chemists. However, careful manual supervision of all reported information is crucial to ensure the highest possible confidence and to avoid overreporting.
References
- Idkowiak, J.; Dehairs, J.; Schwarzerová, J.; et al. Best Practices and Tools in R and Python for Statistical Processing and Visualization of Lipidomics and Metabolomics Data. Nat. Commun. 2025, 16, 8714. DOI: 10.1038/s41467-025-63751-1
- Laboratory of Lipid Metabolism. Omics Data Visualization in R and Python. GitBook. https://laboratory-of-lipid-metabolism-a.gitbook.io/omics-data-visualization-in-r-and-python (accessed 2025-10-14).
- Lipidomics Standards Initiative. Lipidomics Standards Initiative Home Page. https://lipidomicstandards.org/ (accessed 2025-10-23).
- Metabolomics Society. Metabolomics Society Home Page. https://metabolomicssociety.org/ (accessed 2025-10-23).
- Fan, S.; Kind, T.; Cajka, T.; et al. Systematic Error Removal Using Random Forest for Normalizing Large-Scale Untargeted Lipidomics Data. Anal. Chem. 2019, 91, 3590–3596. DOI: 10.1021/acs.analchem.8b05592
- Chocholoušková, M.; Wolrab, D.; Jirásko, R.; et al. Intra-laboratory Cross-comparison of Four Lipidomic Quantitation Platforms Using Hydrophilic Interaction Liquid Chromatography or Supercritical Fluid Chromatography Coupled to Two Quadrupole–Time-of-Flight Mass Spectrometers. Talanta 2021, 231, 122367. pubmed.ncbi.nlm.nih.gov/33965032/
- Wilkinson, M. D.; Dumontier, M.; Aalbersberg, I. J.; et al. The FAIR Guiding Principles for Scientific Data Management and Stewardship. Sci. Data 2016, 3, 160018. https://www.nature.com/articles/sdata201618
- Lipid MAPS. Statistical Analysis Tools for User-Uploaded Data. https://www.lipidmaps.org/resources/tools/stats (accessed 2025-10-22).
Biography
Michal Holčapek obtained his Ph.D. in analytical chemistry from the University of Pardubice, Pardubice, Czech Republic, where he currently serves as a professor of analytical chemistry. His research focuses on mass spectrometry and its coupling with liquid chromatography or supercritical fluid chromatography, applied mainly to lipidomic analysis and cancer biomarker research. He has received numerous prestigious awards, including the Neuron Award for Connecting Science and Business (2023, Neuron Foundation), the Rudolf Lukeš Prize (2023, Experientia Foundation), and the Herbert J. Dutton Award (2022, American Oil Chemists' Society). He is one of the founding members of the Lipidomics Standards Initiative and the International Lipidomics Society.
Jakub Idkowiak earned his Ph.D. in analytical chemistry from the University of Pardubice, Czech Republic, under the supervision of Professor Michal Holčapek, where he specialized in mass spectrometry of lipids and their quantitation in biological matrices. He is currently a research associate in the Laboratory of Lipid Metabolism and Cancer, led by Professor Johannes V. Swinnen, at the Leuven Cancer Institute and the Leuven Institute for Single Cell Omics, Department of Oncology, KU Leuven, Belgium. His research centers on analytical methods for lipid detection and quantification, with a focus on direct infusion mass spectrometry, mass spectrometry imaging, and liquid chromatography–mass spectrometry combinations. He is also interested in the statistical analysis and visualization of -omics data.