
AI/ML in Practice: Emery Bosten Explores Assisted Active Learning (AAL)
In the first instalment of LCGC International's interview series exploring how artificial intelligence (AI)/machine learning (ML) is being used in separation science, we interviewed Emery Bosten from KU Leuven, Belgium, on the role of assisted active learning (AAL) for method development in liquid chromatography (LC).
What motivated your group to explore assisted active learning (AAL) for model-based method development in liquid chromatography (1)? What is AAL and what makes this approach innovative compared to traditional method development strategies?
Developing a liquid chromatography (LC) method is traditionally a time-consuming and resource-intensive process. Analysts must adjust multiple experimental parameters, such as gradient profile, pH, flow rate, temperature, and column characteristics, to achieve an optimal separation. Because these parameters strongly interact, identifying the right combination often requires many trial-and-error experiments and substantial expert input. We were motivated to explore whether this process could be made faster, more systematic, and less dependent on manual intervention.
Active learning (AL) offers a promising solution. AL is a statistical framework in which experiments are selected sequentially based on what has already been learned. Instead of testing conditions randomly or exhaustively, the algorithm identifies the most informative next experiment to perform, thereby maximizing learning efficiency. In chromatography, AL can automatically propose new experimental settings and iteratively improve the method until satisfactory separation is achieved.
Assisted active learning (AAL) builds upon this concept by incorporating additional sources of knowledge to guide the learning process. Rather than starting from scratch, the algorithm is “assisted” by prior information, such as results from previous method development studies, physicochemical properties of analytes, or known physical constraints of the chromatographic system. This guidance helps the model make more informed decisions, accelerating convergence toward optimal conditions.
The innovation of our approach lies in this integration of prior knowledge into an optimization framework. While traditional active learning strategies treat each method development problem largely independently, our AAL framework leverages existing data and chemical insight to reduce the number of required experiments. By valorizing historical method development data and compound-specific information, we improved the speed and efficiency of automated LC method development.
In short, AAL combines machine learning and chemical knowledge to transform method development from a largely empirical, trial-and-error process into a data-driven and knowledge-guided optimization strategy.
How does the study use Bayesian statistics to incorporate historical data and analyte information into initial retention models, and why is this advantageous for LC method development?
Our AAL approach relies on mathematical models, so-called retention models, that describe how compounds behave under different chromatographic conditions, for example, how changes in solvent composition, temperature, or pH affect their retention. To build these retention models, we normally need experimental data. In traditional approaches, this means performing several exploratory runs before the model becomes reliable.
The challenge is that, at the beginning of method development, very little data is available. As a result, early predictions can be uncertain, and the decision algorithm may suggest suboptimal experiments. This can slow down the overall process. To overcome this, we used Bayesian statistics, a framework that allows us to combine existing knowledge with new experimental data in a systematic way. Instead of starting from scratch, the initial retention models begin with an informed estimate based on historical method development data and known physicochemical properties of the compounds under study. In other words, the algorithm starts with an “educated guess” rather than a blank slate.
As new experimental results become available, the Bayesian framework updates the retention models step by step. It continuously balances prior knowledge with newly collected data, gradually refining the predictions. This updating process makes the model more accurate over time.
The key advantage of this approach is speed and efficiency. Because the model is already guided by relevant prior information, fewer exploratory experiments are needed to reach reliable predictions. This reduces uncertainty in the early stages and allows the system to identify optimal separation conditions more quickly than traditional approaches that rely only on newly generated data.
In short, Bayesian statistics enables our method to learn faster by building on what is already known.
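As a rough illustration of this kind of updating (a minimal sketch, not the implementation used in the study), the example below assumes a simple linear-solvent-strength retention model, ln k = ln k_w - S·phi, with a Gaussian prior on (ln k_w, S) and a known measurement noise variance. The model form, the prior values, the observations, and the function name `bayes_update` are all illustrative assumptions.

```python
import numpy as np

def bayes_update(prior_mean, prior_cov, phi, ln_k, noise_var):
    """Conjugate Gaussian update for the assumed model ln k = ln_kw - S * phi."""
    phi = np.asarray(phi, dtype=float)
    ln_k = np.asarray(ln_k, dtype=float)
    # Design matrix for the parameter vector (ln_kw, S).
    X = np.column_stack([np.ones_like(phi), -phi])
    prior_prec = np.linalg.inv(prior_cov)
    # Posterior precision = prior precision + data precision.
    post_cov = np.linalg.inv(prior_prec + X.T @ X / noise_var)
    post_mean = post_cov @ (prior_prec @ prior_mean + X.T @ ln_k / noise_var)
    return post_mean, post_cov

# Hypothetical prior, e.g. derived from historical data on similar compounds.
prior_mean = np.array([3.0, 8.0])
prior_cov = np.diag([1.0, 4.0])

# Two hypothetical runs at 30% and 50% organic modifier; the observations
# were generated from ln_kw = 4 and S = 10 for the purpose of this sketch.
phi_runs = [0.3, 0.5]
ln_k_obs = [1.0, -1.0]

post_mean, post_cov = bayes_update(prior_mean, prior_cov, phi_runs, ln_k_obs, noise_var=0.01)
print("posterior mean (ln_kw, S):", post_mean)
print("posterior std  (ln_kw, S):", np.sqrt(np.diag(post_cov)))
```

After only two runs, the posterior mean has moved from the prior guess toward the values that generated the data, and the posterior variances are smaller than the prior ones, which is the "educated guess refined by data" behaviour described above.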
What is the role of the active data selection method in AAL, and how does it determine the “most informative” experiment to run next?
The active data selection step is essentially the decision-making engine of the AAL framework. Its role is to automatically decide which experiment should be performed next in order to make the method development process as efficient as possible.
At any given stage, the system has retention models that describe how compounds are expected to behave under different chromatographic conditions. However, these models are not perfect, especially in the early stages, and they contain areas of uncertainty. The active data selection method analyzes two key aspects:
- where the model predictions are most uncertain, and
- which experimental conditions are predicted to yield the best separation.
The “most informative” experiment is therefore one that serves a dual purpose. First, it reduces uncertainty by collecting data in regions where the model is less confident. Second, it moves the separation closer to optimal conditions by focusing on promising areas of the experimental space.
In other words, the algorithm does not simply search for better separations, nor does it only try to improve the model. It strategically balances both goals: learning more about the system (exploration) while simultaneously improving performance (exploitation). This balance is what allows AAL to reach high-quality separation conditions in fewer experiments compared to traditional trial-and-error approaches.
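One common way to encode such a trade-off is an upper-confidence-bound score, which adds a multiple of the predictive uncertainty to the predicted quality. The interview does not state which selection criterion the study uses, so the sketch below is only an assumed stand-in; the candidate gradient times, predicted values, and the weight `kappa` are hypothetical.

```python
import numpy as np

def select_next(candidates, pred_mean, pred_std, kappa=1.0):
    """Pick the candidate with the highest score = predicted quality + kappa * uncertainty."""
    score = np.asarray(pred_mean, dtype=float) + kappa * np.asarray(pred_std, dtype=float)
    idx = int(np.argmax(score))
    return candidates[idx], score

# Hypothetical candidate gradient times (min) with model predictions.
gradient_times = [5, 10, 15, 20]
pred_quality = [0.60, 0.80, 0.70, 0.50]   # predicted separation quality
pred_uncert = [0.05, 0.05, 0.30, 0.10]    # predictive standard deviation

greedy_choice, _ = select_next(gradient_times, pred_quality, pred_uncert, kappa=0.0)
balanced_choice, _ = select_next(gradient_times, pred_quality, pred_uncert, kappa=1.0)
print("pure exploitation picks:", greedy_choice)
print("balanced selection picks:", balanced_choice)
```

With `kappa=0` the selection is purely exploitative and picks the condition with the best predicted separation; with `kappa=1` the poorly characterized 15 min gradient wins because its uncertainty makes it more informative.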
How does the iterative process of AAL balance between exploiting current model knowledge and exploring new experimental conditions to achieve optimal separations?
The key to balancing exploitation and exploration lies in how the next experiment is selected. If the model is already confident about certain regions of the parameter space, the algorithm may exploit that knowledge by testing conditions expected to yield improved separations. At the same time, it actively identifies areas where predictions are uncertain or under-informed and explores those regions to refine the model. In other words, experiments are not run on a fixed grid but are chosen because they are maximally informative.
In this way AAL can prioritize experiments that either (i) move directly toward optimal conditions or (ii) reduce uncertainty that might otherwise lead to suboptimal method choices. Through repeated updating and targeted experimentation, the process converges efficiently: early iterations emphasize learning the system (exploration), while later iterations increasingly fine-tune conditions (exploitation).
How does AAL reduce the number of experimental runs needed for method optimization, and what metrics demonstrate its efficiency compared to fixed design methods?
AAL reduces the number of experimental runs by replacing the traditional design of experiments approach with a model-guided selection of only the most informative experiments. In fixed design approaches, experiments are typically distributed over a predefined grid of conditions, regardless of whether those experiments add meaningful new information. This often leads to redundant measurements in well-understood regions of the parameter space and unnecessary use of time and resources. In contrast, AAL continuously analyses what has already been learned and uses that knowledge to decide which experiment will provide the greatest improvement.
The reduction in experiments is achieved through two key mechanisms. First, AAL initializes the retention model with prior information derived from historical data and analyte properties, which provides a scientifically informed starting point instead of a blank model. Because the system already has an approximate description of the chromatographic behaviour, fewer experiments are required to reach a reliable prediction space. Second, the AAL algorithm evaluates both predicted separation performance and the uncertainty of those predictions, selecting new experiments specifically where they will either improve the model most or directly enhance the separation. This targeted experimentation avoids the redundancy inherent in fixed screening designs.
The efficiency of this strategy is demonstrated using metrics that combine chromatographic performance with experimental economy. Our study employs a chromatographic response function (CRF) to quantify separation quality while also considering analysis time, allowing each experiment to be evaluated in terms of its contribution to achieving a practical, optimized method. By tracking improvements in CRF values across iterations, we show that AAL converges toward optimal conditions in a small number of steps.
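The exact CRF used in the study is not given here. As a hedged illustration of the general idea, the sketch below scores a chromatogram by summing the resolution of adjacent peak pairs (capped at a target value so that excess resolution is not rewarded) and subtracting a penalty proportional to the retention time of the last peak. The cap `rs_target`, the weight `time_weight`, and the example peak data are hypothetical.

```python
def resolution(t1, t2, w1, w2):
    """Resolution of two peaks from retention times and baseline peak widths."""
    return 2.0 * (t2 - t1) / (w1 + w2)

def crf(ret_times, widths, rs_target=1.5, time_weight=0.1):
    """Illustrative CRF: capped resolution of adjacent pairs minus a time penalty."""
    # Sort peaks by retention time so that adjacent pairs are compared.
    order = sorted(range(len(ret_times)), key=lambda i: ret_times[i])
    quality = 0.0
    for a, b in zip(order, order[1:]):
        rs = resolution(ret_times[a], ret_times[b], widths[a], widths[b])
        quality += min(rs, rs_target)
    return quality - time_weight * max(ret_times)

# Two hypothetical runs (times and widths in minutes).
crf_a = crf([2.0, 2.2, 5.0], [0.2, 0.2, 0.2])  # critical pair under-resolved, long run
crf_b = crf([2.0, 2.4, 4.0], [0.2, 0.2, 0.2])  # all pairs resolved, shorter run
print("CRF run A:", round(crf_a, 3))
print("CRF run B:", round(crf_b, 3))
```

Run B scores higher because every adjacent pair reaches the target resolution and the last peak elutes earlier. Tracking such a score from one iteration to the next is one way to visualize convergence.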
What were the key outcomes of the two practical examples demonstrated in the study, particularly regarding gradient optimization and separation performance?
In the presented case studies, optimized gradient conditions were reached with fewer experimental runs (in as few as three experiments) than would be expected from a conventional fixed design, while still achieving separations comparable to those obtained through more extensive experimentation. The results demonstrate that the combination of informed model initialization, uncertainty-driven experiment selection, and performance-based evaluation enables AAL to focus only on experiments that meaningfully advance method optimization.
For a practicing chromatographer, what are the potential challenges, limitations, or required expertise when applying AAL and ML approaches to LC method development in a routine laboratory setting?
For a practicing chromatographer, the main challenges of applying AAL and related ML approaches will probably arise from the data quality. These methods rely on well-structured (historical) data to build meaningful models. Therefore, laboratories must have consistent, high-quality datasets, including accurate metadata on gradients, column conditions, and analyte identity. In many routine environments, such data may be incomplete or not stored in a format that is readily usable for modelling, which limits the immediate benefit of AAL or other ML methods.
Another consideration is the need for interdisciplinary expertise. While chromatographers by no means need to become data scientists, successful implementation requires some basic understanding of model assumptions, uncertainty, and how algorithmic recommendations relate to chemical reality. Especially when predictions appear to be incorrect, a basic understanding of the algorithmic methods is required to determine the potential root cause.
There are also practical integration challenges. Sequential decision-making workflows must be connected to instrument control and data processing tools, such as peak detection and tracking, in a reliable, automated loop. Without robust automation, the efficiency gains can be offset by manual intervention. Close collaboration between analytical scientists and data scientists or software specialists might offer the best chance to achieve a successful autonomous method development system.
Finally, and most importantly, users must develop confidence in model-guided decisions. Because AAL and alternative methods propose experiments selectively rather than exhaustively, they represent a shift from traditional trial-and-error toward probabilistic decision-making, with proposals that can sometimes seem counterintuitive to the human user. With proper validation and transparency, however, these methods can become a powerful tool that complements chromatographic expertise.
Are you aiming to explore AI/ML for chromatography further?
I will certainly continue to explore diverse AI/ML methods to resolve some persistent bottlenecks in chromatography. Method development in LC as discussed in this research is only one of the many research areas in which AI/ML can offer potentially interesting solutions.
Peak detection and tracking in complex datasets is another area where AI can help. Modern chromatography, especially when coupled with MS, produces large, multidimensional datasets that are difficult to interpret manually. ML can assist with automated peak detection, deconvolution of overlapping signals, and consistent tracking of compounds across experiments, which is essential for closed-loop optimization and high-throughput workflows.
Future efforts will hence concentrate on three main research areas:
(i) the exploration of alternative optimization methods for automated method development, such as reinforcement learning, which has shown very promising results in recent investigations.
(ii) the development of key signal processing tools that allow complete automation including baseline correction, noise removal, peak detection, tracking, and deconvolution.
(iii) implementation of the different algorithms in a closed-loop system to ultimately develop a fully autonomously operating LC system.
Reference
- Bosten, E.; Pardon, M.; Chen, K.; Koppen, V.; Van Herck, G.; Hellings, M.; Cabooter, D. Assisted Active Learning for Model-Based Method Development in Liquid Chromatography. Anal. Chem. 2024, 96 (33), 13699–13709. DOI: 10.1021/acs.analchem.4c02700




