It has been more than five years since the last update in this column on the evolution of the Hydrophobic Subtraction Model (HSM) of reversed-phase selectivity and characteristics of new stationary phases recently characterized using the model. In this installment, Dwight Stoll discusses the continuing evolution of the model.
It has been more than five years since the last update in this column on the evolution of the Hydrophobic Subtraction Model (HSM) of reversed-phase selectivity and characteristics of new stationary phases recently characterized using the model. In this time, nearly 50 new columns have been added to the public HSM database, and new perspectives on the limitations and use of the model have been published. In this installment, I discuss the continuing evolution of the model, and characteristics of the recently added columns, with an eye toward use of this information for troubleshooting and method development.
In the April 2020 installment of "LC Troubleshooting," I summarized the findings reported by five speakers in a symposium at that year's Pittsburgh Conference in Chicago, which highlighted the success and evolution of the hydrophobic subtraction model (HSM) of reversed-phase selectivity (1). The HSM has been wildly successful by any measure, and the public database associated with the model is still the single largest freely available compilation of reversed-phase column characteristics, with data for more than 775 columns. The development of the model was initiated in the late 1990s by Lloyd Snyder, John Dolan (the long-time author of this column until 2017), Peter Carr, and many other collaborating research scientists from both academia and industry. This early work was supported by the Product Quality Research Institute, the National Institutes of Health, and the United States Pharmacopoeia. It resulted in tens of research articles over the following two decades, and in the public database of column characteristics as we know it today, which is maintained on two websites referenced later in this article. Over the past decade, growth in the database has been quite steady, with additions at a rate of roughly one column per month on average. Most of this growth has been supported by column manufacturers, who provide the columns used to make the retention measurements needed to determine the column characteristics, as described in the following section.
In my interactions with LC practitioners, I find that there is a wide range of ways that different scientists interact with the HSM and the columns database. On one hand, there are people very familiar with the model and database who describe numerous examples of ways that they have leveraged the model in their method development work, including using the database to address difficult situations. On the other hand, there is still a large population of scientists who have either never heard of the database, or do not have enough familiarity with it to leverage it effectively in their work. In this installment, I hope to provide something useful for both beginners and advanced users.
The basic principle of the HSM was first described in a journal article by Snyder and coworkers in 2002 (3). Since then, many articles have been published on the topic, but two resources are particularly noteworthy for readers interested in learning more about the model. First, in 2012, Snyder and coworkers published a book chapter in Advances in Chromatography that is still the most comprehensive published discussion of the model and its application (2). Second, a more recent article in LCGC provides an overview of the model and its application that may be an easier starting point for those who are completely new to the idea (4). The model, which was originally developed using retention data from alkyl phases (for example, C4, C8, and C18) bonded to high-purity type B silicas, assumes that reversed-phase (RP) selectivity (defined here as the ratio of the retention factor of a compound of interest to that of a reference compound, ethylbenzene) can be described using the sum of five terms, each pairing a column parameter with a solute parameter, that are related to different physicochemical interactions between solutes and the RP stationary phase. A view of the nature of each of these interactions is shown in Figure 1.
The mathematical expression of the model is shown in Equation 1, where the capital Roman letters H, S*, A, B, and C are column parameters, and the lowercase Greek letters η, σ, β, α, and κ are solute parameters.
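For reference, the model is commonly written in the following form. This is a reconstruction from the symbol definitions above and the published HSM literature, where the solute parameters usually carry a prime (η', σ', and so on); the primes are omitted here to match the notation of this article, and k_EB denotes the retention factor of ethylbenzene.

```latex
\log\left(\frac{k}{k_{\mathrm{EB}}}\right)
  = \eta H \;-\; \sigma S^{*} \;+\; \beta A \;+\; \alpha B \;+\; \kappa C
\tag{1}
```

Note that the σS* term is subtracted, whereas the other four terms are added.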
The column parameters are determined experimentally by measuring the retention times of 16 carefully chosen probe solutes in a mobile phase composed of acetonitrile and potassium phosphate buffer at pH 2.8, calculating the selectivity value for each compound relative to ethylbenzene (k/kEB), and regressing those selectivities against the known solute parameters for the probe compounds (1). To date, parameters for more than 775 commercially available columns have been determined and are freely available through two websites: 1) a site maintained by the United States Pharmacopoeia (https://apps.usp.org/app/USPNF/columnsDB.html); and 2) a site maintained by my research group (www.hplccolumns.org). The two primary uses of this database are finding columns that have similar selectivities (for example, identifying a backup column during method development), and finding columns that have very different selectivities (for example, identifying a set of columns to screen during method development). In recent years, the identification of highly complementary selectivities that can be used in the second dimension of 2D-LC platform methods to evaluate peak purity has become very important in the pharmaceutical industry (5), and the HSM can be used to guide this process.
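The regression step described above can be sketched as an ordinary least-squares fit. This is a minimal illustration and not the procedure used to build the public database: the solute parameters and the "true" column parameters below are random or made-up placeholders, the function names are hypothetical, and the sketch assumes the logarithmic form of the model in which the σS* term is subtracted. The helper for computing selectivity from retention times additionally assumes a known column dead time, t0.

```python
import numpy as np

def log_selectivity(t_r, t_r_eb, t0):
    """log10(k / k_EB) from retention times; the dead time t0 cancels in the ratio's denominator."""
    # k = (t_r - t0) / t0, so k / k_EB = (t_r - t0) / (t_r_eb - t0)
    return np.log10((t_r - t0) / (t_r_eb - t0))

def design_matrix(solute_params):
    """Apply the model's sign convention: the sigma * S* term enters with a minus sign."""
    signs = np.array([1.0, -1.0, 1.0, 1.0, 1.0])
    return solute_params * signs

def fit_column_parameters(solute_params, log_alpha):
    """Least-squares estimate of (H, S*, A, B, C) from log(k/k_EB) values."""
    X = design_matrix(solute_params)
    coeffs, *_ = np.linalg.lstsq(X, log_alpha, rcond=None)
    return coeffs

# Placeholder solute parameters for 16 probes: columns are (eta, sigma, beta, alpha, kappa).
# Real values come from the published HSM literature; these are random stand-ins.
rng = np.random.default_rng(0)
solute_params = rng.normal(size=(16, 5))

# Made-up column parameters (H, S*, A, B, C), used only to simulate selectivity data.
true_column = np.array([1.00, 0.02, -0.10, 0.01, 0.15])

# Simulate noise-free log selectivities, then recover the column parameters.
log_alpha = design_matrix(solute_params) @ true_column
fitted = fit_column_parameters(solute_params, log_alpha)
print(np.round(fitted, 3))
```

With real measurements, the fitted values would carry experimental noise, and the residuals of the fit indicate how well the five-term model describes a given column.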
Over the last five years, we have made 46 new entries for RP columns in the publicly available HSM database. A summary of the numbers of columns in different phase categories is shown in Figure 2. Some of these columns are truly new, both to the database, and to the commercial market. Others have been on the market for some time, but have been characterized only recently using the HSM. Here, we see a few interesting trends. First, it is remarkable how dominant the C18 group is in the most recent set of additions. I suspect there are at least a couple of major drivers for this. Many of the new columns in the 2019 to 2024 group are coming from relatively new manufacturers, and it makes sense that many of their first entries in the market would be in the C18 space, given that it is the dominant chemistry used for RP separations. The other factor that is definitely playing a role here is that we rarely see “plain vanilla” C18s these days. Although we often don’t know what the details of the stationary phase chemistry are, it is often evident from the column name that, although the phase is referred to as a “C18,” it really is probably some kind of hybrid phase with functionalities other than C18 that are influencing the selectivity of the phase in important ways. Other interesting trends we see here are that there hasn’t been a single new cyano column (CN) introduced to the database in the last five years, and just one pentafluorophenyl propyl phase (PFP), but we may be seeing an increase in the popularity of biphenyl phases, with three new columns coming from three different manufacturers in the last five years.
An interesting question to consider when thinking about the recent additions to the database is, “Which of these new database entries reflects selectivity that was not represented in the database previously, and which of the new entries is redundant in the sense that the new entry is effectively equivalent to another column that was already in the database?” Both types of entry are valuable, of course. Columns that truly reflect a new selectivity that was not represented in the database previously deepen the selectivity pool, giving analysts more options to choose from when looking for a column with different selectivity compared to what they already have. On the other hand, new entries to the database that are effectively equivalent to an existing entry add resilience to the RP column ecosystem. In the event of a supply chain disruption, for example, it is valuable to be able to quickly identify an alternate or “backup” column that can be used in place of the column normally specified for a method, such that a similar separation is achieved, and the results can be used with little to no method development effort. Over the years, I have had several highly experienced LC users tell me about situations like this where the HSM columns database has “saved the day” by quickly identifying an alternative column when they have experienced problems with the normally used column.
To identify “equivalent” columns, Snyder, Dolan, and coworkers have advocated for the use of a “similarity factor,” FS, which is a weighted distance between two columns in five-dimensional selectivity space (4). When FS is calculated this way, columns for which FS < 3 are considered “equivalent,” meaning that they are effectively interchangeable and will produce very similar chromatograms for most applications. Of course, this must be verified experimentally for any particular application, but it is at least a good guideline. Columns for which FS > 100 are considered very different, and looking for such columns may be useful when intentionally assembling a set of columns with very different selectivities. Interpretation of FS values between 3 and 100 depends on the properties of solutes in a particular mixture of interest, and the degree of change in selectivity that is tolerable when trying to identify columns with similar selectivity (or “equivalent” columns). Figure 3 (Figures 3–5 are available online by accessing the QR code at the end of the article) shows a histogram of FS factors calculated for each new entry in the database since 2019. We see that 27 of the new entries are effectively equivalent to an already existing entry, and eight more are very similar to existing entries (3 < FS < 5). We see that the largest FS factor for any of the new entries is just 12, meaning that none of the new entries are very different from the entries that existed prior to 2019.
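The similarity calculation described above can be sketched in a few lines. The weights used here (12.5, 100, 30, 143, and 83 for H, S*, A, B, and C, respectively) are the values commonly quoted in the HSM literature, but they should be treated as an assumption and checked against the cited references before use. The column parameter sets and function names are hypothetical, chosen only for illustration.

```python
import math

# Commonly quoted weights for (H, S*, A, B, C); an assumption to verify against the references.
WEIGHTS = (12.5, 100.0, 30.0, 143.0, 83.0)

def similarity_factor(col1, col2, weights=WEIGHTS):
    """Weighted Euclidean distance between two columns in five-dimensional selectivity space."""
    return math.sqrt(sum((w * (b - a)) ** 2 for w, a, b in zip(weights, col1, col2)))

def classify(fs):
    """Apply the guideline thresholds from the text: FS < 3 and FS > 100."""
    if fs < 3:
        return "equivalent"
    if fs > 100:
        return "very different"
    return "intermediate"

# Hypothetical column parameter sets in the order (H, S*, A, B, C).
column_a = (1.00, 0.00, 0.00, 0.00, 0.00)
column_b = (1.08, 0.00, 0.00, 0.00, 0.00)   # differs from column_a only slightly, in H
column_c = (0.60, 0.10, -0.50, 0.30, 1.20)  # differs from column_a in every parameter

for name, other in (("b", column_b), ("c", column_c)):
    fs = similarity_factor(column_a, other)
    print(f"a vs {name}: FS = {fs:.1f} ({classify(fs)})")
```

Because the weights differ by more than an order of magnitude, a small difference in B or S* moves FS far more than the same difference in H. As noted above, any "equivalent" result is a guideline that still has to be verified experimentally for a particular application.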
The original HSM, as developed by Snyder and colleagues in the early 2000s, is still the only model of its kind for which a public database exists. However, my group has been working with Dr. Sarah Rutan over the past few years to think about how we might refine or update the model in light of both the proliferation of non-C18 phases and the much larger data sets that have become available since the development of the original HSM. We have taken two major steps in this direction. First, in 2021, we published a paper describing what I think of as a refinement of the original HSM, which we now refer to as HSM2 (6). The original HSM was built using retention data from alkyl silica stationary phases, and the fits of that model to data from non-C18 type phases have not been as good as the fits for C18-type phases. In the development of HSM2, we considered retention data for 551 phases, including all phase types. The improvement in the fit of the model for the full spectrum of stationary phase chemistries can be visualized by looking at histograms of the model residuals, which are shown in Figure 4 for both the original HSM and HSM2. What is particularly striking to me in these plots is the dramatic reduction in the number of very poor fits, as measured by residuals (that is, the difference between the experimentally determined selectivity and the model value) greater than 10%. For the original HSM, there are 230 such values, but for HSM2, this number is reduced by approximately 90% to just 25. Although we do not maintain a public database for HSM2, the column and solute parameters were published with the paper, so anyone interested in using those values locally for any purpose can do so.
Having realized how much the model could be improved by training it on a larger data set, more recently we began exploring the capabilities of an HSM-like model trained using data from a much larger solute set. We refer to this next iteration of the model as HSM3 (7). Whereas the original HSM was based on data from 16 solutes and several hundred columns, HSM3 is based on data from 78 solutes and 13 columns. In this work, we were particularly interested in evaluating the ability of such a model to capture the solute- and stationary phase-related factors that contribute to the selectivity needed for isomer separations. Figure 5 shows a dramatic example of this, where we find that differences between the hydrogen bond acidities (a parameter in Figure 5) of two isomers of dinitrophenol, complemented by the hydrogen bond basicities (B parameter in Figure 5) of different stationary phases, lead to dramatic differences in selectivity for the separation of these two isomers. For 12 of the 13 stationary phases studied in this work, the 2,4-dinitrophenol isomer is eluted before the 2,5-isomer, but for one of the phases, the elution order is reversed, and the model predicts this reversal accurately.
We are enthusiastic about what this work shows about the potential for future HSM-like models trained on large retention data sets. These models evidently not only have the potential to predict selectivities needed for challenging separations, but also provide insight into what drives these separations at the molecular level.
In this installment, I have reviewed the basic principles of the hydrophobic subtraction model (HSM) of reversed-phase (RP) selectivity. The HSM and its accompanying free database of column parameters, which currently stands at more than 775 entries, together constitute the single largest publicly available resource for comparing the selectivities of RP columns. In the past five years, there have been 46 new entries, about half of which are effectively equivalent to already existing entries as measured by the HSM. In parallel with the steady growth of the HSM database, investigators are also considering refinement and evolution of the model itself. In the past five years, two new variants of the model have been discussed in the literature, which we refer to as HSM2 and HSM3. This work provides hints about what the future of these hydrophobic subtraction models might hold, including greater predictive accuracy and insights about the molecular-level drivers for different separations, such as the separation of isomers.
I want to thank Sarah Rutan for providing Figure 5, which was originally presented at the HPLC 2024 meeting in Denver in July of 2024.