
Extech 2026 Preview: The Role of Artificial Intelligence in Chromatographic Data Analysis: An Update
Paweł Wiczling from the Medical University of Gdańsk, Poland, describes some of the practical benefits that artificial intelligence/machine learning (AI/ML) can offer separation scientists. Wiczling will present on this topic at Extech 2026, which will take place at the University of Liege, Gembloux, Belgium, from July 6–9, 2026.
How can artificial intelligence improve the analysis of chromatographic data?
Artificial intelligence (AI) is a powerful and increasingly effective approach in the analysis of many aspects of chromatographic data.1 At its core, it involves building systems that learn from large volumes of data to address specific analytical problems, delivering rapid and automated responses.
In this context, it is useful to consider statistical methods (SM) alongside machine learning (ML) approaches.1,2 Both fall within a broader model-based data analysis framework, yet they differ in important ways. As such, careful, deliberate decisions should be made about when and how to apply each approach.
SM focuses on explicitly representing the data-generating process using probabilistic frameworks. These models include interpretable parameters that often correspond to physically meaningful aspects of chromatographic behavior, such as peak shape, retention time variability, or noise structure. This interpretability makes SM especially valuable when scientific insight, transparency, and uncertainty quantification are important. Modern statistical models can also accommodate nonlinearity, higher-order interactions, and large numbers of predictors, making them flexible enough for complex chromatographic data.
In contrast, ML typically refers to more algorithmic approaches that do not impose a predefined structure on the relationship between inputs and outputs. These methods, such as neural networks or tree-based models, have become dominant in the recent literature due to their strong performance in tasks like automated peak detection and signal deconvolution, especially when large labeled datasets are available.
However, the choice between SM and ML should depend on the specific analytical objective. While ML methods may excel in prediction accuracy, statistical models offer advantages in interpretability and principled uncertainty quantification. This is particularly critical in chromatography, where decisions based on poorly calibrated confidence estimates can lead to overconfidence and potentially flawed conclusions. Ultimately, the key requirement for any model-based approach is that it is well-calibrated and sufficiently accurate for the task at hand. As a scientist, I’m more inclined toward SM, but I’m also a firm believer in ML.3
In what ways can a model assist in optimizing chromatographic conditions—such as mobile phase, temperature, and flow rate—more efficiently than traditional trial-and-error approaches?
Both SM and ML models can improve the optimization of chromatographic conditions by formalizing and accelerating what is traditionally a trial-and-error process. In the conventional approach, the analyst begins by defining the analytical problem, reviewing the literature, and considering the physicochemical properties of the analytes and stationary phase. Based on this prior knowledge, an experiment is designed and performed. The results are then interpreted, and the analyst updates their understanding of the system before deciding whether further experimentation is needed. This iterative cycle continues until satisfactory separation is achieved. While effective, this process is time-consuming, resource-intensive, and relies heavily on expert intuition. It is also prone to various fallacies and cognitive biases on the part of the analyst.
Model-driven approaches can augment or replace this workflow by learning from both prior knowledge and experimental data to guide decision-making more efficiently. In particular, model-based methods can integrate diverse sources of information and predict the effects of experimental conditions on chromatographic outcomes in a coherent way. A key requirement for such models is the ability to generalize across the relevant design variables (e.g., mobile phase, temperature, flow rate, stationary phases), enabling efficient exploration of the experimental space.
In my research,4–6 I am particularly interested in hierarchical modeling approaches, similar to those used in pharmacokinetics and pharmacodynamics. These models enable partial pooling of information across experiments, analytes, and stationary phases, improving predictive performance, especially in data-sparse settings (even when the overall dataset is large, certain aspects may remain sparse). At the same time, they preserve interpretability, allowing us to understand how specific factors influence retention.
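The partial pooling mentioned above can be illustrated with a minimal sketch. It is not the multilevel model used in the cited work; it assumes a simple normal-normal hierarchy with known within-group and between-group standard deviations (`sigma` and `tau`, both hypothetical values), and the analyte labels and log retention factors are invented for illustration.

```python
from statistics import mean

def partial_pool(groups, sigma, tau):
    """Shrink each group's mean toward the population mean.

    groups: dict mapping a group label (e.g. an analyte) to a list of
            observations (e.g. log retention factors).
    sigma:  assumed within-group standard deviation (measurement noise).
    tau:    assumed between-group standard deviation.
    """
    # Population-level mean, estimated here simply as the mean of group means.
    mu = mean(mean(obs) for obs in groups.values())
    pooled = {}
    for label, obs in groups.items():
        n = len(obs)
        precision_data = n / sigma**2   # information from this group's data
        precision_prior = 1 / tau**2    # information from the population
        weight = precision_data / (precision_data + precision_prior)
        pooled[label] = weight * mean(obs) + (1 - weight) * mu
    return pooled

# Hypothetical log k values: analyte C has only one measurement.
data = {
    "A": [1.10, 1.05, 1.15, 1.08],
    "B": [0.40, 0.45, 0.38, 0.42],
    "C": [2.00],
}
estimates = partial_pool(data, sigma=0.2, tau=0.5)
for label, value in estimates.items():
    print(label, round(value, 3))
```

In this sketch the sparsely measured analyte C is pulled toward the population mean more strongly than the well-measured analytes A and B, which is the behavior that makes partial pooling useful in data-sparse settings.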
Furthermore, by combining prediction with uncertainty quantification, such models can actively guide experimental design (e.g., suggesting the most informative next experiment), thereby reducing the number of required runs and improving efficiency compared to traditional trial-and-error approaches.7,8
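One simple way to picture uncertainty-guided design is to run the next experiment where the model is least certain. The sketch below is an illustrative assumption rather than the procedure from the cited papers: it uses a Bayesian straight-line model, log k = a + b·φ (φ being the organic modifier fraction), with known noise and independent Gaussian priors, and picks the candidate φ with the largest predictive variance. All function names and numeric values are hypothetical.

```python
def posterior_cov(phis, sigma, prior_sd):
    """Posterior covariance of (intercept, slope) for log k = a + b*phi.

    Assumes Gaussian noise with known sd `sigma` and independent zero-mean
    Gaussian priors with sd `prior_sd` on both coefficients.
    """
    # Precision matrix = prior precision + X^T X / sigma^2 (2x2, symmetric).
    p00 = 1 / prior_sd**2 + len(phis) / sigma**2
    p01 = sum(phis) / sigma**2
    p11 = 1 / prior_sd**2 + sum(p * p for p in phis) / sigma**2
    det = p00 * p11 - p01 * p01
    # Inverse of a symmetric 2x2 matrix, returned as (c00, c01, c11).
    return (p11 / det, -p01 / det, p00 / det)

def predictive_var(phi, cov):
    """Variance of the predicted mean log k at modifier fraction `phi`."""
    c00, c01, c11 = cov
    return c00 + 2 * phi * c01 + phi * phi * c11

def next_experiment(done, candidates, sigma=0.05, prior_sd=10.0):
    """Return the candidate condition with the largest predictive variance."""
    cov = posterior_cov(done, sigma, prior_sd)
    return max(candidates, key=lambda phi: predictive_var(phi, cov))

done = [0.30, 0.35]                    # hypothetical fractions already run
candidates = [0.20, 0.40, 0.50, 0.70]  # hypothetical fractions still possible
choice = next_experiment(done, candidates)
print(choice)
```

For a linear Gaussian model the posterior covariance depends only on where experiments were run, not on the measured values, so this criterion favors conditions far from those already explored. A practical design criterion would usually also weigh the separation objective itself, not uncertainty alone.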
How can models be used to predict retention times in chromatography, and what factors influence the accuracy of these predictions?
Retention time prediction is a key component of any chromatographic method development procedure. While many modeling approaches exist, I believe the focus should shift toward understanding variability and explicitly accounting for uncertainty. Proper quantification of different sources of variation and a transparent representation of uncertainty can be far more valuable for decision-making than relying on a single “best” prediction.
As John Tukey famously said, “Far better an approximate answer to the right question, which is often vague, than an exact answer to the wrong question, which can always be made precise.” This applies equally to many problems encountered in chromatography.
In my view, many models, including some AI-driven approaches, tend to produce answers that are neat and plausible, but not necessarily correct, especially when faced with limited or noisy data. Rather than acknowledging uncertainty, they may overstate confidence. A Bayesian SM framework offers a strong advantage here, as it naturally incorporates prior knowledge, quantifies uncertainty, and provides probabilistic predictions that are more informative for decision-making.
How might AI help resolve overlapping peaks or identify compounds in highly complex chromatographic mixtures?
Resolving overlapping peaks and identifying compounds in highly complex chromatographic mixtures remains a challenging problem.9,10 A particularly promising direction is the use of probabilistic, model-based approaches rather than traditional deterministic or threshold-based methods. Instead of assigning a binary decision (e.g., peak present or absent), these approaches estimate the probability that a given signal corresponds to a specific analyte. This probabilistic framework allows multiple sources of information, such as the MS spectrum and retention time, to be combined in a principled way.
Nevertheless, the contribution of retention time information (given the MS spectra) to improving compound identification is relatively limited. Current structure-based retention models remain too imprecise for this task in most real-world applications, particularly for the most challenging cases involving closely eluting analytes. However, they still provide valuable complementary information that is worth incorporating into the identification process.
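The probabilistic combination of MS and retention-time evidence described above can be sketched with Bayes' rule. This is a toy illustration, not the algorithm of the cited references: it assumes the spectral and retention-time evidence are conditionally independent given the compound identity, that retention-time prediction errors are normally distributed, and that the candidate names, priors, likelihoods, and times are all hypothetical.

```python
import math

def normal_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2 * math.pi))

def posterior_probs(candidates, observed_rt):
    """Posterior probability of each candidate compound.

    Each candidate carries a prior, a spectral likelihood (how well the
    observed MS spectrum matches it), and a predicted retention time with
    its prediction sd. MS and retention-time evidence are assumed to be
    conditionally independent given the compound identity.
    """
    unnormalized = {}
    for name, c in candidates.items():
        rt_like = normal_pdf(observed_rt, c["rt_pred"], c["rt_sd"])
        unnormalized[name] = c["prior"] * c["ms_like"] * rt_like
    total = sum(unnormalized.values())
    return {name: value / total for name, value in unnormalized.items()}

# Hypothetical candidates: two isomers with near-identical spectra.
candidates = {
    "isomer_1": {"prior": 0.5, "ms_like": 0.60, "rt_pred": 5.2, "rt_sd": 0.5},
    "isomer_2": {"prior": 0.5, "ms_like": 0.55, "rt_pred": 6.4, "rt_sd": 0.5},
}
probs = posterior_probs(candidates, observed_rt=5.4)
for name, p in probs.items():
    print(name, round(p, 3))
```

In this sketch, increasing `rt_sd` flattens the retention-time term so that the posterior is driven almost entirely by the spectral likelihood, which mirrors the point that imprecise retention models add only limited information.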
How could model-driven approaches leverage unsuccessful or suboptimal chromatographic runs to improve future method development?
This is a very relevant question that fits the topic of model-based chromatographic data analysis. Even unsuccessful or suboptimal chromatographic runs contain valuable information. For example, if we know that under certain conditions an analyte has not eluted before a given time, this constitutes censored data. It tells us that the retention time exceeds a specific threshold. This type of information can and should be incorporated into the analysis and decision-making.
More broadly, I view experimental data as one component of a larger information set, alongside prior knowledge and the assumed model structure. The objective is not simply to analyze individual runs in isolation, but to understand the underlying data-generating process. Once such a process is properly modeled, even suboptimal or failed runs contribute meaningful information that improves inference and guides decision-making in future method development steps.
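The censored-data idea mentioned above can be made concrete with a small likelihood sketch. It assumes, purely for illustration, normally distributed retention times with a known standard deviation; the run times, the grid, and the function names are hypothetical. A run in which the analyte eluted contributes a density term, while a run in which it had not eluted by the end contributes the probability that the retention time exceeds the run length.

```python
import math

def normal_logpdf(x, mean, sd):
    """Log density of a normal distribution at x."""
    z = (x - mean) / sd
    return -0.5 * z * z - math.log(sd * math.sqrt(2 * math.pi))

def normal_logsf(x, mean, sd):
    """Log of P(X > x) for a normal distribution."""
    z = (x - mean) / sd
    return math.log(0.5 * math.erfc(z / math.sqrt(2)))

def log_likelihood(mean, sd, observed, censored_at):
    """Log-likelihood with right-censored retention times.

    observed:    retention times of runs where the analyte eluted.
    censored_at: run lengths of runs where it had not eluted by the end.
    """
    ll = sum(normal_logpdf(t, mean, sd) for t in observed)
    # A censored run contributes the probability that t exceeds the run time.
    ll += sum(normal_logsf(c, mean, sd) for c in censored_at)
    return ll

def best_mean(observed, censored_at, sd=3.0):
    """Grid-search maximum-likelihood estimate of the mean retention time."""
    grid = [20 + 0.1 * i for i in range(201)]  # 20.0 to 40.0 min
    return max(grid, key=lambda m: log_likelihood(m, sd, observed, censored_at))

observed = [28.0]        # hypothetical run where the analyte eluted
censored = [30.0, 30.0]  # hypothetical runs stopped before elution
ignoring = best_mean(observed, [])
using = best_mean(observed, censored)
print(ignoring, using)
```

Discarding the "failed" runs leaves the estimate at the single observed value, whereas including them as censored observations shifts it toward longer retention times, so the unsuccessful runs do change the inference.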
What are the main barriers to integrating model-based approaches into routine chromatographic workflows, and how might laboratories overcome them?
The main barriers are not only technical but also cultural and methodological.
A key challenge is the perceived complexity and abstractness of advanced SM and ML approaches. Many practitioners view these methods as “black boxes,” which limits trust and slows adoption, especially in regulated environments where interpretability and validation are essential. In addition, there are practical issues such as limited high-quality training data, lack of standardized workflows, and difficulties in integrating these tools with existing laboratory software and instrumentation. These factors favor established models and paradigms.
Another important barrier is insufficient validation and reproducibility. Many published models demonstrate promising results but are developed under narrow conditions, making it unclear whether they generalize to real-world laboratory settings. This contributes to skepticism and hesitancy among analysts.
Overcoming these challenges requires several parallel efforts. First, there is a need for well-validated, user-friendly tools that integrate seamlessly into existing workflows. Second, greater emphasis should be placed on interpretability and uncertainty quantification to build trust and support decision-making. Third, standardized datasets could help establish reproducibility and enable fair comparison of methods. I believe many SM and AI/ML approaches will emerge in the future and provide significant benefits to the chromatographic community.
References
1. Levy, D. G. Navigating Statistical Modeling and Machine Learning. Statistical Thinking.
2. Harrell, F. Road Map for Choosing Between Statistical Modeling and Machine Learning. Statistical Thinking.
3. Liu, C.; Mo, F. Automation and AI-Powered Prediction in Chromatographic Separation. Acc. Chem. Res. 2026, 59 (1), 138–150.
4. Wiczling, P.; Kamedulska, A. Comparison of Chromatographic Stationary Phases Using a Bayesian-Based Multilevel Model. Anal. Chem. 2024, 96 (3), 1310–1319.
5. Kamedulska, A.; Kubik, Ł.; Jacyna, J.; Struck-Lewicka, W.; Markuszewski, M. J.; Wiczling, P. Toward the General Mechanistic Model of Liquid Chromatographic Retention. Anal. Chem. 2022, 94 (31), 11070–11080.
6. Wiczling, P. Bayesian Multilevel Modeling of Retention Data Informed by Structural Similarity of Analytes. Journal of Chromatography A 2026, 1765, 466503.
7. Wiczling, P. Evaluation of Sequential Bayesian-Based Method Development Procedures for Chromatographic Problems Involving One, Two, and Three Analytes. Separation Science Plus 2018, 1, 63–75.
8. Niezen, L. E.; Cabooter, D.; Desmet, G. Uncertainty-Driven Model-Based Search Methods for Method Development in Liquid Chromatography. Journal of Chromatography A 2026, 1781, 467071.
9. Woldegebriel, M.; Gonsalves, J.; van Asten, A.; Vivó-Truyols, G. Robust Bayesian Algorithm for Targeted Compound Screening in Forensic Toxicology. Anal. Chem. 2016, 88 (4), 2421–2430.
10. Woldegebriel, M.; Vivó-Truyols, G. Probabilistic Model for Untargeted Peak Detection in LC–MS Using Bayesian Statistics. Anal. Chem. 2015, 87 (14), 7345–7355.




