Key Points
- Bioactive compounds extracted from plants can have many beneficial properties, although identifying them in complex food matrices can be challenging.
- New quantitative structure-retention relationship (QSRR) models were developed to predict the retention times of plant food bioactive compounds.
- The QSRR models developed in this research, which were tested across three different chromatographic systems, provided valuable insights into the separation mechanisms of plant food bioactive compounds.
University of Kurdistan and University of Milano-Bicocca researchers developed original quantitative structure-retention relationship (QSRR) models to predict the retention times of plant food bioactive compounds. Their findings were published in the Journal of Chromatography A (1).
Bioactive compounds extracted from plants can enhance human health, reduce the risk of disease, and serve as valuable sources for the discovery of new drugs. Plant foods contain a large variety of compounds known as secondary metabolites, which can exhibit diverse bioactive properties, such as antioxidant, anti-inflammatory, and anticancer activities. These substances generally have molecular weights below 900 Da and belong to various chemical classes, such as terpenes and polyphenols.
Because food matrices are complex, the qualitative and quantitative analysis of metabolites in plant foods requires a prior separation step. Liquid chromatography–mass spectrometry (LC–MS) is the preferred analytical technique for untargeted and targeted metabolomics, largely because it can separate and determine a wide range of bioactive compounds within complex food matrices. However, proper compound identification in LC–MS-based metabolomics remains challenging, as the mass spectra of small molecules typically lack the information needed to determine their chemical structures.
To develop automated tools for rapid compound identification, prior knowledge of retention times (RTs) has been recognized as a valuable complement to MS data, helping to reduce the number of potential candidate structures. However, experimentally determining retention times is not always feasible for a large number of compounds. As such, alternative methods, such as quantitative structure-retention relationships (QSRRs), are being used to fill data gaps by predicting the chromatographic retention behavior of metabolites. QSRR is a data-driven computational approach that establishes a mathematical relationship between a molecule's retention time and its molecular structure descriptors or physicochemical properties. These models help minimize time and cost, allowing compounds' retention times to be estimated without performing experiments (2). Developing predictive and reliable QSRR models can not only assist in identifying new metabolites by predicting their retention times but can also provide important information about molecular separation mechanisms.
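As a minimal illustration of the idea behind QSRR, the sketch below fits a multiple linear regression (MLR) model of the form RT = b0 + b1*x1 + ... + bp*xp by ordinary least squares. The two descriptors (logP and topological polar surface area) and the retention times are hypothetical, synthetic values generated from a known linear relation purely to show the mechanics; they are not data from the study, and the function names are illustrative.

```python
import numpy as np

def fit_qsrr_mlr(descriptors: np.ndarray, rt: np.ndarray) -> np.ndarray:
    """Hypothetical helper: ordinary least-squares MLR, rt ~ b0 + descriptors @ b."""
    # Prepend a column of ones so the first coefficient is the intercept b0.
    X = np.column_stack([np.ones(len(descriptors)), descriptors])
    coef, *_ = np.linalg.lstsq(X, rt, rcond=None)
    return coef

def predict_rt(coef: np.ndarray, descriptors: np.ndarray) -> np.ndarray:
    """Predict retention times for new molecules from their descriptor values."""
    return coef[0] + descriptors @ coef[1:]

# Synthetic training set: columns are hypothetical logP and TPSA (A^2) values.
X_train = np.array([[0.5, 90.0], [1.2, 70.0], [2.0, 60.0], [2.8, 40.0], [3.5, 30.0]])
# Synthetic "retention times" generated from rt = 2.0 + 1.5*logP - 0.02*TPSA (no noise).
rt_train = 2.0 + 1.5 * X_train[:, 0] - 0.02 * X_train[:, 1]

coef = fit_qsrr_mlr(X_train, rt_train)
print(np.round(coef, 3))
print(predict_rt(coef, np.array([[1.0, 50.0]])))
```

Once the coefficients are estimated, predicting the RT of a new compound requires only its descriptor values, which is what allows QSRR models to be applied to compounds that have never been measured.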
In this study, genetic algorithms (GAs) coupled with multiple linear regression (MLR) were applied to select the most relevant molecular descriptors and establish QSRRs for predicting the retention times of plant food bioactive compounds on three different LC systems. The statistical parameters indicated that the models were robust and had satisfactory predictive ability. Particular attention was paid to measuring the uncertainty of predictions and assessing their reliability based on the model's applicability domain. Interpretation of the selected molecular descriptors provided valuable insights into separation mechanisms. The developed models were then applied to predict the unknown retention times of a large library of plant food bioactive compounds on the three LC systems studied, and these predictions were made freely available to further assist research in the field of natural products.
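The article does not describe the genetic algorithm's settings. The following toy sketch shows the general idea of GA-based descriptor selection coupled with MLR: each candidate solution is a subset of descriptors, and its fitness is assumed here to be the leave-one-out cross-validated Q² of the corresponding MLR model. The fixed subset size, truncation selection, union-based crossover, swap mutation, and all function names and parameter values are hypothetical choices for illustration, not the authors' implementation.

```python
import numpy as np

def q2_loo(X, y):
    """Leave-one-out cross-validated Q^2 of an OLS model with intercept."""
    n = len(y)
    A = np.column_stack([np.ones(n), X])
    press = 0.0  # predicted residual sum of squares
    for i in range(n):
        keep = np.arange(n) != i
        coef, *_ = np.linalg.lstsq(A[keep], y[keep], rcond=None)
        press += (y[i] - A[i] @ coef) ** 2
    return 1.0 - press / np.sum((y - y.mean()) ** 2)

def ga_select(X, y, n_select=3, pop_size=30, generations=50, p_mut=0.2, seed=0):
    """Hypothetical toy GA: search for the n_select-descriptor subset maximizing Q^2_LOO."""
    rng = np.random.default_rng(seed)
    n_desc = X.shape[1]
    cache = {}  # avoid refitting the same descriptor subset

    def fitness(ind):
        if ind not in cache:
            cache[ind] = q2_loo(X[:, list(ind)], y)
        return cache[ind]

    def random_individual():
        return tuple(sorted(int(i) for i in rng.choice(n_desc, n_select, replace=False)))

    pop = [random_individual() for _ in range(pop_size)]
    for _ in range(generations):
        # Truncation selection: the better half survives unchanged (elitism).
        parents = sorted(pop, key=fitness, reverse=True)[: pop_size // 2]
        children = []
        while len(children) < pop_size - len(parents):
            a, b = rng.choice(len(parents), 2, replace=False)
            # Crossover: draw the child's descriptors from the union of both parents.
            pool = sorted(set(parents[a]) | set(parents[b]))
            child = {int(i) for i in rng.choice(pool, n_select, replace=False)}
            # Mutation: swap one selected descriptor for one not currently selected.
            if rng.random() < p_mut:
                unused = [d for d in range(n_desc) if d not in child]
                child.remove(int(rng.choice(sorted(child))))
                child.add(int(rng.choice(unused)))
            children.append(tuple(sorted(child)))
        pop = parents + children
    best = max(pop, key=fitness)
    return best, fitness(best)

# Synthetic demo: 25 "molecules", 8 hypothetical descriptors; RT depends on 1, 4, and 6.
rng = np.random.default_rng(42)
X_demo = rng.normal(size=(25, 8))
y_demo = 3.0 * X_demo[:, 1] - 2.0 * X_demo[:, 4] + 1.5 * X_demo[:, 6] + 0.05 * rng.normal(size=25)
best, best_q2 = ga_select(X_demo, y_demo)
print(best, round(best_q2, 4))
```

With these synthetic data, in which the response depends only on descriptors 1, 4, and 6, the search should recover that subset. Using a cross-validated statistic rather than the fitted R² as fitness penalizes descriptor subsets that fit the calibration data well but predict poorly, which matters when few molecules are available for calibration.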
The QSRR models developed in this research produced valuable insights into the separation mechanisms of plant food bioactive compounds in the different chromatographic systems. The models proved robust, with satisfactory predictive ability, despite the very limited number of molecules available for model calibration and validation. Unlike conventional approaches, which rely only on experimentally measured retention times as input for predicting new compounds, the proposed QSRR models were found to be applicable to any molecular structure, enabling RT predictions for large compound libraries.
Combining two different diagnostic metrics to define the boundary of the QSRR applicability domain proved effective in identifying unreliable predictions. This strengthens the use of QSRR models in supporting the untargeted analysis of large datasets of plant food bioactive compounds.
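The article does not say which two diagnostic metrics the authors combined. The sketch below assumes two common choices in the QSAR/QSRR literature: the leverage of a molecule relative to the training descriptors (with the frequently used warning threshold h* = 3p'/n, where p' is the number of model parameters including the intercept and n the number of training molecules) and the mean distance to the k nearest training molecules (with a hypothetical 95th-percentile cutoff). A prediction is accepted only if both metrics fall within their limits. The function names, k, the cutoffs, and the data are illustrative assumptions.

```python
import numpy as np

def leverage(X_train, X_query):
    """Leverage h = b (A^T A)^-1 b^T of each query row b, using an intercept column."""
    A = np.column_stack([np.ones(len(X_train)), X_train])
    B = np.column_stack([np.ones(len(X_query)), X_query])
    G = np.linalg.pinv(A.T @ A)
    return np.einsum("ij,jk,ik->i", B, G, B)

def knn_mean_distance(X_train, X_query, k=3, exclude_self=False):
    """Mean Euclidean distance to the k nearest training molecules (autoscaled descriptors)."""
    mu, sd = X_train.mean(axis=0), X_train.std(axis=0)
    sd = np.where(sd > 0, sd, 1.0)
    T, Q = (X_train - mu) / sd, (X_query - mu) / sd
    d = np.sort(np.linalg.norm(Q[:, None, :] - T[None, :, :], axis=2), axis=1)
    start = 1 if exclude_self else 0  # skip the zero self-distance for training rows
    return d[:, start:start + k].mean(axis=1)

def in_applicability_domain(X_train, X_query, k=3, percentile=95):
    """Hypothetical rule: a prediction is reliable only if BOTH metrics are within limits."""
    h_star = 3 * (X_train.shape[1] + 1) / len(X_train)  # assumed warning leverage 3p'/n
    d_train = knn_mean_distance(X_train, X_train, k, exclude_self=True)
    d_star = np.percentile(d_train, percentile)  # assumed distance cutoff
    h = leverage(X_train, X_query)
    d = knn_mean_distance(X_train, X_query, k)
    return (h <= h_star) & (d <= d_star)

# Synthetic demo with 2 hypothetical descriptors: one central query, one far outlier.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(20, 2))
X_query = np.vstack([X_train.mean(axis=0), [10.0, 10.0]])
ad = in_applicability_domain(X_train, X_query)
print(ad)
```

Requiring both checks to pass is a conservative design: leverage flags molecules that are extreme relative to the descriptor space as a whole, while the nearest-neighbour distance flags molecules that sit in sparsely populated regions even when their leverage is moderate.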
References
(1) Sepehri, B.; Consonni, V.; Ballabio, D.; et al. Application of QSRR Models for Predicting the Retention Times of Plant Food Bioactive Compounds. J. Chromatogr. A 2025, 1758, 466194. DOI: 10.1016/j.chroma.2025.466194
(2) Kumari, P.; Van Laethem, T.; Hubert, P.; Fillet, M.; et al. Quantitative Structure Retention-Relationship Modeling: Towards an Innovative General-Purpose Strategy. Molecules 2023, 28 (4), 1696. DOI: 10.3390/molecules28041696