April 13, 2026

MC-Retention: Accelerating Liquid Chromatography Screening with Neural Network

Key Takeaways

  • Method screening rapidly surveys broad variables (phase chemistry, mobile phase, pH/ionic strength, elution mode, temperature) to find viable selectivity, whereas optimization refines a narrowed method for resolution, time, and robustness.
  • Separation performance emerges from coupled effects among stationary phase, organic modifier, buffer/pH, gradient, flow, temperature, and pressure, with pH-driven ionization shifts strongly modulating retention and selectivity constraints.

The multiple condition (MC)-retention model is an uncertainty-aware graph-based neural network that predicts liquid chromatography (LC) retention times across multiple column chemistries and buffer pH conditions simultaneously. By accurately modeling multiple chromatographic methods, it significantly reduces the time and resources required for LC method development while providing calibrated uncertainty estimates.

A recent paper published in Analytical Chemistry (1) introduced what the authors described as “an uncertainty-aware graph-based neural network that predicts retention times across multiple column chemistries and buffer pH conditions. Additionally,” the authors continued, “we demonstrate that MC-retention effectively identifies optimal conditions for separating amide bond-forming reaction components, while also supplying calibrated uncertainty estimates for its retention time predictions.” LCGC International spoke to some of the authors of this paper, including corresponding author Pankaj Aggarwal, about their work.

What are the key differences between method screening and method optimization in liquid chromatography (LC), and why is screening typically performed first?
Within LC method development, method screening (often termed method scouting) constitutes an exploratory stage in which a limited number of experimental runs is used to interrogate a broad conditions space, typically including stationary-phase chemistry, organic modifier selection, aqueous buffer composition (including pH and ionic strength), elution mode (isocratic versus gradient), and, where relevant, column temperature. This is done to identify conditions that yield suitable retention, selectivity, and peak shape for the analytes of interest. In contrast, method optimization involves systematic refinement of a reduced set of parameters about a selected starting condition (for example, gradient profile and slope, mobile-phase pH within an appropriate buffering range, flow rate, and column temperature) to satisfy prespecified performance criteria, including resolution, analysis time, and robustness. Screening is therefore performed first because chromatographic selectivity is frequently governed primarily by the combined choice of stationary phase and mobile-phase system; early identification of an appropriate separation mechanism mitigates the risk of optimizing parameters within an intrinsically unsuitable method space.

Which factors must be investigated and optimized during LC method development, and how do these parameters interact with each other to affect separation performance?
Critical parameters in LC method development include stationary-phase chemistry, mobile-phase composition (nature and proportion of the organic modifier, buffer identity, pH, and ionic strength), flow rate, column temperature, gradient profile, and system pressure. These variables are strongly interdependent and collectively determine chromatographic retention, selectivity, and efficiency. For example, changes in mobile-phase composition or pH directly influence analyte ionization state and solute–stationary phase interactions, thereby altering retention and selectivity, which in turn constrains feasible flow rates, gradient slopes, and temperature settings. Consequently, effective method development requires consideration of both individual parameter effects and their coupled interactions to achieve the desired balance of resolution, analysis time, and robustness.

Compare traditional empirical modeling approaches (like the van't Hoff equation and the linear solvent strength model) with modern machine learning approaches for retention time prediction. What are the limitations of each?
Traditional empirical modeling approaches, such as the linear solvent strength model, provide robust frameworks for theory-driven predictions but require initial measurements to fit the model parameters. These methods lack the flexibility required to predict retention times across diverse conditions and analytes in a de novo manner, but once fitted they can excel at extrapolation if the theoretical assumptions hold. The linear solvent strength model, for example, assumes a linear relationship between the logarithm of the retention factor and the organic modifier fraction in the mobile phase, with an analyte-specific solvent strength parameter derived empirically from experimentation.

Unlike traditional empirical modeling methods, contemporary machine learning methodologies can make predictions for previously unseen analytes. However, while these methods can provide great utility, they typically require large, well-curated datasets for effective training and may struggle to generalize across varying analyte types and chromatographic conditions. Additionally, these models are less interpretable, and deep learning models are often uninterpretable, whereas empirical models make the physicochemical relationships between analytes and retention explicit.
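As a rough illustration of the linear solvent strength relationship described above, the sketch below fits log10(k) = log_kw - S * phi from two isocratic measurements and then predicts a retention time at a third modifier fraction. The function names and the numbers are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
import math

def fit_lss(phi_1, k_1, phi_2, k_2):
    """Fit log10(k) = log_kw - s * phi from two isocratic measurements."""
    log_k1, log_k2 = math.log10(k_1), math.log10(k_2)
    s = (log_k1 - log_k2) / (phi_2 - phi_1)  # analyte-specific solvent strength parameter
    log_kw = log_k1 + s * phi_1              # extrapolated log10(k) at phi = 0
    return log_kw, s

def predict_retention_time(log_kw, s, phi, t0):
    """Predict isocratic retention time t_R = t0 * (1 + k) at modifier fraction phi."""
    k = 10 ** (log_kw - s * phi)
    return t0 * (1.0 + k)

# Illustrative numbers only (assumed): k = 10 at 30% organic and k = 1 at 50% organic.
log_kw, s = fit_lss(0.30, 10.0, 0.50, 1.0)
t_r = predict_retention_time(log_kw, s, phi=0.40, t0=1.0)
print(f"log_kw = {log_kw:.2f}, S = {s:.2f}, predicted t_R at 40% organic = {t_r:.2f} min")
```

The fit needs measurements for each analyte before it can predict anything, which is the limitation the answer describes. Once fitted, though, the prediction extends to other modifier fractions as long as the linearity assumption holds.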

How do graph-based neural networks like graph isomorphism networks (GIN) improve retention time modeling compared to traditional machine learning approaches?
Graph-based approaches to deep learning allow the network to learn from the overall connectivity and structure of the analyte. In our case, the GIN also considers atom properties, adding a further degree of information. By using the structure of the analyte, the network can learn how groupings of atoms interact with each other at various hierarchical levels of abstraction and relate both local and molecule-wide chemistry to retention time.

By leveraging the molecular structure, these graph-based models can generalize better than traditional machine learning approaches because they use “learned features” rather than pre-defined calculable features such as molecular weight or the number of H-bond acceptors/donors. These networks are capable of learning complex non-linear relationships and are well suited to learning how structures relate to retention time across multiple chromatography conditions.
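To make the idea of learning from connectivity concrete, here is a minimal sketch of one generic GIN-style message-passing step, in which each atom's features are updated from its own features plus the sum of its neighbors' features. It is not the MC-retention architecture. The one-hot atom features, the identity weight matrix, and the single linear layer with ReLU standing in for the usual multilayer perceptron are all simplifying assumptions.

```python
def gin_layer(features, adjacency, weights, eps=0.0):
    """One GIN-style update: h_v' = ReLU(W @ ((1 + eps) * h_v + sum of neighbor h_u))."""
    updated = []
    for v, h_v in enumerate(features):
        # Start from the (1 + eps)-scaled self feature, then sum-aggregate the neighbors.
        agg = [(1.0 + eps) * x for x in h_v]
        for u in adjacency[v]:
            agg = [a + x for a, x in zip(agg, features[u])]
        # A single linear layer plus ReLU stands in for the MLP (assumption).
        updated.append([max(0.0, sum(w * a for w, a in zip(row, agg))) for row in weights])
    return updated

def readout(features):
    """Sum-pool atom embeddings into one molecule-level vector."""
    return [sum(column) for column in zip(*features)]

# Ethanol heavy atoms C-C-O with hypothetical one-hot features [is_carbon, is_oxygen].
atom_features = [[1, 0], [1, 0], [0, 1]]
bonds = {0: [1], 1: [0, 2], 2: [1]}
identity = [[1, 0], [0, 1]]  # untrained placeholder weights

hidden = gin_layer(atom_features, bonds, identity)
molecule_vector = readout(hidden)
print(hidden, molecule_vector)
```

After one step the two carbons, which started with identical features, already differ because only one of them is bonded to oxygen. Stacking several such layers lets each atom's representation reflect progressively larger neighborhoods.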

What makes it challenging to predict retention times across different column chemistries, and how does the MC-retention model address this limitation?
Accurate prediction of retention times across different column chemistries is inherently challenging due to the diversity of physicochemical interactions governing analyte–stationary phase behavior. Variations in bonded phase functionality, surface chemistry, and secondary interaction mechanisms can lead to substantial changes in retention and selectivity that are difficult to capture with models trained on a single or limited set of column types. Conventional retention models and many data-driven approaches therefore exhibit limited transferability across stationary phases.

The MC‑retention model addresses this limitation by explicitly learning joint representations of analyte molecular structure and column chemistry, enabling retention time prediction across multiple stationary-phase conditions. In addition, the incorporation of uncertainty estimation provides a quantitative indication of prediction reliability, allowing practitioners to identify conditions where model extrapolation may be less reliable and experimental confirmation is warranted.

Why is uncertainty estimation important in retention time modeling, and how does it increase the robustness and usability of chromatography models?
In practice, uncertainty estimation becomes critical when deciding between multiple conditions that would appear to give similar separations if only single predicted retention times were considered. Consider a case where MC-retention predicts two conditions with similar mean retention times, but the predicted distributions show that, under one condition, the retention time distributions of neighboring peaks overlap substantially. This is actionable: the condition without overlapping distributions becomes the clear candidate for follow-up in the lab.
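The comparison described above can be sketched numerically. The example below assumes that each predicted retention time is an independent Gaussian with a known mean and standard deviation, which is a simplification and not necessarily how MC-retention represents its predictions. It then computes the probability that two peaks fall within a minimum gap of each other. All names and numbers are hypothetical.

```python
import math

def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def coelution_risk(mean_a, std_a, mean_b, std_b, min_gap):
    """P(|t_b - t_a| < min_gap) for two independent Gaussian predictions."""
    delta = mean_b - mean_a
    spread = math.sqrt(std_a ** 2 + std_b ** 2)  # std of the difference t_b - t_a
    return normal_cdf((min_gap - delta) / spread) - normal_cdf((-min_gap - delta) / spread)

# Two conditions with identical predicted means (5.0 and 6.0 min) but different uncertainty.
risk_tight = coelution_risk(5.0, 0.1, 6.0, 0.1, min_gap=0.2)
risk_wide = coelution_risk(5.0, 0.6, 6.0, 0.6, min_gap=0.2)
print(f"tight condition: {risk_tight:.2e}, wide condition: {risk_wide:.3f}")
```

Both conditions look identical if only the mean retention times are compared. The risk estimate separates them, pointing to the condition with the narrower distributions as the one to try first.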

Additionally, an uncertainty estimation technique that decomposes the total predictive uncertainty into quantified epistemic and aleatoric components, such as the one used in MC-retention, can help to clarify the sources of uncertainty. It can also support model improvement strategies, such as targeted data collection guided by a subject matter expert or an active learning method. High predictive epistemic uncertainty suggests the model lacks sufficient training data or knowledge in the region of chemical space corresponding to the prediction. This could motivate additional experimentation and expansion of the training set to address the potential insufficiency. In contrast, aleatoric uncertainty is due to the inherent randomness of the data and is thus irreducible.
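One common way to obtain such a decomposition is with an ensemble in which every member predicts both a mean and a variance. This is offered as an assumption for illustration; the interview does not specify the mechanism MC-retention uses. By the law of total variance, the average of the predicted variances estimates the aleatoric part and the variance of the predicted means estimates the epistemic part.

```python
from statistics import fmean, pvariance

def decompose_uncertainty(member_means, member_variances):
    """Split ensemble predictive variance into aleatoric and epistemic parts."""
    aleatoric = fmean(member_variances)  # average noise the members expect in the data
    epistemic = pvariance(member_means)  # disagreement between the members
    return {
        "mean": fmean(member_means),
        "aleatoric": aleatoric,
        "epistemic": epistemic,
        "total": aleatoric + epistemic,
    }

# Hypothetical predictions (minutes, minutes squared) from a three-member ensemble.
familiar = decompose_uncertainty([5.00, 5.02, 4.98], [0.04, 0.04, 0.04])
unfamiliar = decompose_uncertainty([5.0, 6.0, 4.0], [0.04, 0.04, 0.04])
print(familiar)
print(unfamiliar)
```

In this toy case both analytes have the same aleatoric term, but the second has a much larger epistemic term. That is the signal that more training data in that region of chemical space could help.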

How can transfer learning be applied in chromatography modeling to work with limited experimental data sets?
Transfer learning allows neural networks to leverage larger datasets to pre-learn representations on many molecules before fine-tuning the network on task-specific data. This technique has been associated with the rising popularity of foundation models, where a network is trained on many measured or calculated properties and then re-trained on a much smaller dataset for the problem at hand. In our case with MC-Retention, we found that pre-training on a larger public dataset and then fine-tuning on our Multiple Condition Adjusted Retention Time (MCaRT) dataset boosted its performance. As more network architectures become capable of modeling multiple chromatography conditions at once, transfer learning will likely be used to boost performance on specific analyte chemotypes and methods with highly sparse data.

What strategies can be employed to reduce the experimental burden of method screening when dealing with limited sample quantities?
When experimental sample availability is limited, reducing the burden of LC method screening requires strategies that maximize information content per experiment while minimizing material consumption. In silico retention time modeling approaches can be employed to prioritize a subset of chromatographic conditions with a high likelihood of providing acceptable separation performance, thereby reducing the number of physical experiments required. In parallel, statistically designed experiments, such as design of experiments (DOE) frameworks, enable efficient exploration of multidimensional parameter spaces by systematically varying key factors and capturing interaction effects with a minimal number of runs. Together, these approaches support more resource‑efficient method screening while maintaining rigorous coverage of relevant chromatographic variables.
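As a small illustration of how a designed experiment trims the run count, the sketch below enumerates a two-level full factorial over three screening factors and then a half-fraction that keeps only the runs whose coded levels multiply to +1. The factor names and levels are hypothetical placeholders, not conditions from the paper.

```python
from itertools import product

def full_factorial(factors):
    """Every combination of the listed factor levels."""
    names = list(factors)
    return [dict(zip(names, combo)) for combo in product(*(factors[n] for n in names))]

def half_fraction(factors):
    """Half-fraction of a two-level design (defining relation: product of coded levels = +1)."""
    names = list(factors)
    runs = []
    for coded in product((-1, 1), repeat=len(names)):
        sign = 1
        for level in coded:
            sign *= level
        if sign == 1:
            # Map coded level -1 to the first listed level and +1 to the second.
            runs.append({n: factors[n][(c + 1) // 2] for n, c in zip(names, coded)})
    return runs

# Hypothetical two-level screening factors.
screening_factors = {
    "column": ["C18", "phenyl-hexyl"],
    "buffer_pH": [3.0, 9.0],
    "modifier": ["acetonitrile", "methanol"],
}
all_runs = full_factorial(screening_factors)
reduced_runs = half_fraction(screening_factors)
print(len(all_runs), len(reduced_runs))
for run in reduced_runs:
    print(run)
```

The half-fraction stays balanced, with each level of each factor appearing in the same number of runs. With only three factors, however, main effects are aliased with two-factor interactions, so a design meant to resolve interactions needs more runs or more careful construction.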

How can advanced retention time modeling accelerate pharmaceutical development processes and reduce time-to-market for life-saving products?
Advanced retention time modeling has the potential to substantially accelerate pharmaceutical development by enabling a more efficient and rational approach to chromatographic method development. By providing reliable, in silico predictions of retention behavior across a range of chromatographic conditions, such models can reduce the experimental burden associated with empirical method screening and iterative optimization. Importantly, model‑informed exploration of chromatographic design space also supports the identification of operating regions that are inherently more robust to small variations in method parameters, thereby promoting the development of stable and reproducible methods early in the lifecycle. This capability allows promising separation conditions to be identified earlier in development, shortening analytical timelines, improving resource utilization, and reducing development risk. In regulated pharmaceutical workflows, these combined benefits translate directly into faster method readiness and shorter timelines for advancing drug substances and drug products to clinical and commercial manufacturing.

What validation approaches are necessary to ensure a multi-condition retention time model performs reliably under real-world stationary and mobile phase scouting scenarios?
Validation of multi‑condition retention time models intended for practical method scouting must be designed to reflect the complexity and variability encountered in real‑world chromatographic workflows. Robust validation strategies should therefore include rigorous cross‑validation across diverse datasets encompassing multiple stationary phases, mobile‑phase compositions, and analyte chemotypes to assess model generalizability beyond narrowly defined training conditions. External validation using prospective or held‑out experimental data is particularly important for evaluating model performance under realistic method development scenarios. In addition, incorporation of uncertainty‑aware modeling frameworks enables quantitative assessment of prediction reliability, allowing identification of regions of chemical or chromatographic space where model extrapolation may be less reliable. Finally, continuous model updating through integration of newly generated experimental data supports sustained performance over time, ensuring that retention time models remain accurate, robust, and fit‑for‑purpose as chromatographic conditions and analytical demands evolve.
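One way to make cross-validation reflect generalization to unseen analytes is to hold out whole analytes, so that no compound appears in both the training and test sets under different conditions. The interview does not say the authors split their data this way, so this is an assumption. The sketch uses hypothetical record fields.

```python
def grouped_folds(records, group_key, n_folds):
    """Build train/test folds in which each group (e.g. analyte) is held out as a whole."""
    groups = sorted({r[group_key] for r in records})
    folds = []
    for i in range(n_folds):
        held_out = set(groups[i::n_folds])  # every n_folds-th group goes to this fold
        train = [r for r in records if r[group_key] not in held_out]
        test = [r for r in records if r[group_key] in held_out]
        folds.append((train, test))
    return folds

# Hypothetical measurements: four analytes, each run on two columns.
records = [
    {"analyte": name, "column": column}
    for name in ["A", "B", "C", "D"]
    for column in ["C18", "phenyl-hexyl"]
]
folds = grouped_folds(records, "analyte", n_folds=2)
for train, test in folds:
    print(sorted({r["analyte"] for r in train}), sorted({r["analyte"] for r in test}))
```

The same function can group by column or by mobile-phase condition instead, to test how well a model transfers to a chromatographic setup it has not seen.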

References

  1. Beck, A. G.; Jones, G.; Singh, A.; et al. Uncertainty-Aware Learning of Multiple Conditions as a Framework for Streamlined Retention Time Prediction to Accelerate Method Development. Anal. Chem. 2026, 98 (6), 4777–4790. DOI: 10.1021/acs.analchem.5c06729