Is that peak “pure”? How do I know if there might be something hiding under there?
Daniel W. Cook1, Sarah C. Rutan1, C.J. Venkatramani2, and Dwight R. Stoll3, 1Virginia Commonwealth University (VCU), Richmond, Virginia, USA, 2Genentech USA, San Francisco, California, USA, 3LC Troubleshooting Editor
In part 1 of this series we discussed how the peak purity tools commonly provided in chromatographic data system software could aid in the detection of impurities in liquid chromatographic analysis (1). Here, we go one step further, and explore how a class of chemometric techniques known as curve resolution methods can be used to differentiate between a target compound and impurities, and subsequently quantify them, even when their peaks are overlapped.
As in the previous instalment (1), we focus on diode-array detection in liquid chromatography (LC–DAD). While mass spectrometric detection undoubtedly gives more selective information in the vast majority of cases, it is clearly a more complex detection mode and is prone to effects that can hamper quantitation such as ionization suppression because of matrix effects. The potential for highly precise quantitation of low-level impurities using DAD data is actually quite good, provided the spectra of the impurities have significantly different spectroscopic signatures as compared to the main peak. The latter point is of course an important caveat.
Multivariate Curve Resolution–Alternating Least Squares
In part 1 of this series we discussed the power of utilizing all of the absorbance information provided by a diode-array detector at multiple wavelengths to assess peak purity (1). Chemometric curve resolution techniques take this one step further. These techniques analyze the matrix of absorbance measurements at all wavelengths (that is, spectra) at all time points across a given time region of the chromatogram. By using a regression-based approach to determine how the spectra change over time, impurities can not only be discovered but also mathematically resolved from the target peak.
Here we illustrate one of the most popular curve resolution techniques, known as multivariate curve resolution–alternating least squares (MCR-ALS) (2–6). The basis for this technique is a multicomponent formulation of Beer’s law given as:
$$A_{\lambda} = \varepsilon_{\lambda,X}\, b\, c_{X} + \varepsilon_{\lambda,Y}\, b\, c_{Y} \qquad (1)$$
where Aλ represents the measured absorbance of a mixture solution at wavelength λ, b is the detection pathlength, ελ,X and ελ,Y represent the molar absorptivities at this wavelength for two chemical species X and Y, and cX and cY represent the concentrations of these species in the solution. For a two-component mixture, if absorbance measurements are obtained at two different wavelengths, and the molar absorptivities are known, it is possible to solve for the concentrations of the two species, X and Y, in the mixture solution via simple algebra. If measurements at more than two wavelengths are available, least squares regression is needed to obtain the concentrations. It is important to note that the assumption that the two (or more) signals are linearly additive is only valid in cases where the total signal is within the linear range of the detector (for example, at signals less than about 1500 mAU with DAD).
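The least squares step described above can be sketched in a few lines of NumPy. This is only an illustration: the molar absorptivities, pathlength, and concentrations are made-up values, not data from the article.

```python
import numpy as np

# Hypothetical molar absorptivities (L mol^-1 cm^-1) of species X and Y
# at four wavelengths; rows are wavelengths, columns are species.
eps = np.array([
    [12000.0,  3000.0],
    [ 9000.0,  7000.0],
    [ 4000.0, 11000.0],
    [ 1000.0,  6000.0],
])
b = 1.0  # pathlength in cm (assumed)

# Simulate the mixture spectrum from known concentrations (mol/L)
# using the linearly additive form of Beer's law.
c_true = np.array([2.0e-5, 5.0e-6])
A = b * eps @ c_true

# Four wavelengths, two unknowns: solve A = (b * eps) c by least squares.
c_est, _, _, _ = np.linalg.lstsq(b * eps, A, rcond=None)
print(c_est)
```

With only two wavelengths the same system could be solved exactly; the extra wavelengths are what make the estimate robust once noise is present.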
At this point, we generalize the discussion to a measurement x, and consider this as a signal in an LC–DAD chromatogram, such that the variable xi,j refers to the absorbance at the ith time point and jth wavelength of the chromatogram. Additionally, we consider the possibility that more than two chemical species may be present in the sample within the chromatographic peak, which gives the following expression:
$$x_{i,j} = \sum_{n=1}^{N} c_{i,n}\, s_{n,j} \qquad (2)$$
Here, ci,n refers to the concentration of species n at the ith time point in the chromatogram, and sn,j refers to the molar absorptivity–pathlength product for species n at the jth wavelength. The full spectrochromatogram can be easily understood in terms of a matrix product. In matrix notation, equation 2 is commonly written as
$$\mathbf{X} = \mathbf{C}\mathbf{S}^{T} \qquad (3)$$
where each row of matrix X holds the absorbance at every wavelength for one time point (a spectrum), each column holds the absorbance at every time point for one wavelength (a chromatogram), and the superscript T refers to the matrix transpose. This concept is illustrated schematically in Figure 1. If the molar absorptivities are known at all measured wavelengths for all species present in the peak, then it is straightforward to solve for the resolved chromatograms, C, as follows:
$$\mathbf{C} = \mathbf{X}(\mathbf{S}^{T})^{\dagger} \qquad (4)$$
where the superscript † indicates the pseudo inverse operation. Equation 4 is simply a linear regression equation in matrix format. The columns of C are the individual component chromatograms (that is, each compound plus any background contributions), and the rows of ST are the individual component spectra.
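As a minimal sketch of this regression, taking equation 4 to be the pseudoinverse form C = X(S^T)^†, the snippet below builds a synthetic two-component spectrochromatogram and recovers the chromatographic profiles from known spectra. The Gaussian peak shapes, retention times, and spectra are hypothetical and serve only to illustrate the matrix dimensions.

```python
import numpy as np

def gaussian(x, center, width):
    """Unit-height Gaussian used for both peaks and absorbance bands."""
    return np.exp(-0.5 * ((x - center) / width) ** 2)

t = np.linspace(0.0, 2.0, 201)       # time axis (min), hypothetical
wl = np.linspace(220.0, 320.0, 51)   # wavelength axis (nm), hypothetical

# Columns of C_true: a main peak and a small, overlapped impurity peak.
C_true = np.column_stack([gaussian(t, 0.95, 0.08),
                          0.2 * gaussian(t, 1.05, 0.08)])
# Columns of S_true: the pure-component spectra.
S_true = np.column_stack([gaussian(wl, 250.0, 15.0),
                          gaussian(wl, 270.0, 20.0)])

# Bilinear model: X (time x wavelength) = C S^T.
X = C_true @ S_true.T

# With S known, regress X onto the spectra to get the chromatograms.
C_est = X @ np.linalg.pinv(S_true.T)
print(C_est.shape)
```

Each column of `C_est` is one resolved chromatogram; with noise-free data and exact spectra the regression reproduces `C_true` to within rounding error.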
While in theory this approach could be a means of resolving overlapped chromatographic peaks, if there are unknown impurities present or uncharacterized mobile phase background components or species, then we do not have enough information to specify the S matrix. This is where the MCR-ALS technique becomes useful. Rather than exactly specifying S, an initial estimate for S is provided to the regression. This initial estimate can be obtained in a number of different ways. Pure variable methods are frequently used for this purpose. These methods seek to find the N most different spectra from the chromatographic data matrix, X, where N is the number of components needed to describe the measured data. The principle is that the most different spectra in the matrix are likely to be similar to the underlying pure component spectra. The caveat is that the number of components must be set by the user. Methods such as scree plots have been proposed for selecting the correct number of components; however, the only reliable method is evaluation of the results for multiple values of N. For a simple impurity screen, running MCR-ALS with two and three components to start should suffice, as one component would represent background, one would represent the target analyte, and if a third component is necessary, it is most likely because of an impurity peak.
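The idea of picking the "most different" spectra can be illustrated with a simple greedy selection. Note that this is not the SIMPLISMA algorithm or the toolbox's pure method; it is a simplified stand-in (the function name `pick_initial_spectra` and the synthetic data are hypothetical) that repeatedly takes the spectrum least explained by those already chosen.

```python
import numpy as np

def pick_initial_spectra(X, n_components):
    """Greedily pick rows (spectra) of X that differ most from those already chosen."""
    X = np.asarray(X, dtype=float)
    residual = X.copy()
    chosen = []
    for _ in range(n_components):
        norms = np.linalg.norm(residual, axis=1)
        idx = int(np.argmax(norms))
        if norms[idx] == 0.0:
            break  # nothing left to explain; fewer components than requested
        chosen.append(idx)
        # Project the chosen direction out of every spectrum.
        direction = residual[idx] / norms[idx]
        residual = residual - np.outer(residual @ direction, direction)
    return chosen, X[chosen]

def gaussian(x, center, width):
    return np.exp(-0.5 * ((x - center) / width) ** 2)

# Synthetic two-component data (time x wavelength), hypothetical values.
t = np.linspace(0.0, 2.0, 201)
wl = np.linspace(220.0, 320.0, 51)
C_true = np.column_stack([gaussian(t, 0.95, 0.08), 0.2 * gaussian(t, 1.05, 0.08)])
S_true = np.column_stack([gaussian(wl, 250.0, 15.0), gaussian(wl, 270.0, 20.0)])
X = C_true @ S_true.T

chosen, S0_rows = pick_initial_spectra(X, 2)
print("time points used for the initial estimates:", t[chosen])
```

The selected rows are measured spectra, so they may still be mixtures; they serve only as a starting point that the alternating regression then refines.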
Once this estimate for S is obtained, equation 4 is used to solve for the chromatographic profile matrix, C. Because the matrix S is only an approximation, C will only be an approximation as well. MCR-ALS can be considered an optimization method in which these C and S matrices are continuously improved with the goal of accurately representing the true underlying chromatographic and spectral profiles of each component. The power of MCR-ALS lies in the judicious implementation of constraints on the C matrix (and in subsequent steps, the S matrix as well) during this optimization. One frequently applied constraint is non-negativity, which allows the user to force the chromatographic profiles contained in C to have only non-negative values (6,7). Another constraint is unimodality, which forces each individual species chromatogram to exhibit a single peak (7). Many other constraints have been developed for MCR-ALS, but they are too numerous to describe here. Once C is constrained appropriately, the spectral matrix is updated via linear regression using equation 5:
$$\mathbf{S}^{T} = \mathbf{C}^{\dagger}\mathbf{X} \qquad (5)$$
Now, constraints can be applied to this S matrix as well; non-negativity is frequently used in this case too. By updating the S and C matrices in an alternating fashion (that is, equations 4 and 5), interspersed with the application of constraints, the final solutions for C and S will contain the pure component profiles of the individual chemical species within the chromatographic peak.
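The alternating scheme can be sketched as a short loop. This is a bare-bones illustration under stated assumptions, not the implementation used in any of the toolboxes: non-negativity is imposed by simply clipping negative values to zero (dedicated toolboxes typically use proper non-negative least squares), no unimodality constraint is applied, and the function name `mcr_als` and the synthetic data are hypothetical.

```python
import numpy as np

def mcr_als(X, S0, n_iter=200, tol=1e-12):
    """Alternate the C and S regressions with non-negativity imposed by clipping.

    X  : (time x wavelength) data matrix
    S0 : (wavelength x N) initial spectral estimates
    """
    S = np.asarray(S0, dtype=float).copy()
    prev_err = np.inf
    for _ in range(n_iter):
        C = np.clip(X @ np.linalg.pinv(S.T), 0.0, None)     # solve for C, force C >= 0
        S = np.clip((np.linalg.pinv(C) @ X).T, 0.0, None)   # solve for S, force S >= 0
        scale = np.linalg.norm(S, axis=0)
        scale[scale == 0.0] = 1.0
        S, C = S / scale, C * scale  # unit-length spectra; scale moved into C
        err = np.linalg.norm(X - C @ S.T)
        if abs(prev_err - err) < tol:
            break
        prev_err = err
    return C, S, err

def gaussian(x, center, width):
    return np.exp(-0.5 * ((x - center) / width) ** 2)

# Synthetic overlapped pair (hypothetical values).
t = np.linspace(0.0, 2.0, 201)
wl = np.linspace(220.0, 320.0, 51)
C_true = np.column_stack([gaussian(t, 0.95, 0.08), 0.2 * gaussian(t, 1.05, 0.08)])
S_true = np.column_stack([gaussian(wl, 250.0, 15.0), gaussian(wl, 270.0, 20.0)])
X = C_true @ S_true.T

# Initial estimates: measured spectra on the leading and trailing edges of the peak.
S0 = X[[80, 120]].T
C_hat, S_hat, err = mcr_als(X, S0)
rel_err = err / np.linalg.norm(X)
print("relative lack of fit:", rel_err)
```

A small lack of fit shows only that the model reproduces the data; because several pairs of C and S can fit equally well, the resolved profiles should still be checked for chemical plausibility.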
Application of MCR-ALS
We illustrate this approach using the chromatographic peak that was analyzed in part 1 of this series (1). Figure 2(a) shows the chromatographic peak, and Figure 2(b) shows the contour plot of the matrix X. We first applied a pure variable method (in this case the pure method in the Barcelona MCR-ALS toolbox, based on the SIMPLISMA algorithm [8–10]), and selected the three most different spectra within the spectrochromatogram. The corresponding time points are shown as circles in Figure 2(a), and the three spectra at these points are shown in Figure 2(c). The spectrum shown in green likely represents a background spectrum, because it was recorded in the baseline (green circle at 9.77 min in Figure 2[a]). Submitting these initial estimate spectra to MCR-ALS should allow the algorithm to estimate the background contribution to the data, as well as the chromatographic peaks for each chemical species present within the profile.
The results for MCR-ALS analysis of this peak using these spectra for initial estimates are shown in Figure 3. Two peak-shaped responses within the chromatogram are resolved, as shown in Figure 3(a). These are two of the three components contained in the matrix C and correspond to two chemical species (peaks shown in blue and red); the third component is a background contribution from the mobile-phase gradient, shown in green. The normalized spectra contained in matrix S, which correspond to these species or contributions, are shown in Figure 3(b). Note that the non-negativity constraint has been applied to the components corresponding to the real chemical species (shown in red and blue), while the background component (green) was not constrained. This flexible application of constraints leads to a powerful algorithm for curve resolution.
Quantitation with MCR-ALS: A natural limitation of the MCR-ALS algorithm in this case is that there generally are multiple mathematical solutions that satisfy equation 3. Constraints are used to limit the possible solutions, but this generally does not provide a unique, chemically valid solution, especially when using MCR-ALS to analyze a single chromatogram, as described above. An extension of the MCR-ALS technique to analyze multiple chromatograms simultaneously is quite powerful in this regard, especially for quantitative analysis. In this approach, the analyst runs a series of calibration sample mixtures with varying concentrations of the target analytes, and obtains chromatograms for test samples with unknown concentrations of the target analytes. Because MCR-ALS resolves signals resulting from individual chemical species, these calibration solutes are not required to be individual standards and can, in fact, be mixtures of the compounds of interest, minimizing the number of calibration samples that need to be analyzed. These measured spectrochromatograms are appended together along the time axis to form an augmented matrix X as follows:
$$\mathbf{X}_{\mathrm{aug}} = \begin{bmatrix} \mathbf{X}_{c,1} \\ \vdots \\ \mathbf{X}_{c,L} \\ \mathbf{X}_{u,1} \\ \vdots \\ \mathbf{X}_{u,M} \end{bmatrix} \qquad (6)$$
where the Xc are the L calibration chromatograms and the Xu are the M unknown chromatograms. MCR-ALS is carried out similarly to the approach described above. The resulting S matrix still consists of the N spectra of the pure component species, but the resulting C matrix now consists of L + M resolved chromatograms for each of the N species, appended together similarly as shown in equation 6. The resolved chromatograms and spectra for a dataset of five calibration standards, C1–C5, and one unknown, U1, are shown in Figure 4 (that is, L = 5; M = 1). The table above the figure shows the known concentrations of the standard mixtures, and it can be seen that the scaled peak intensities in the chromatograms (Figure 4[a]) are proportional to these concentrations. By integrating these resolved chromatographic peaks, calibration curves can be constructed, as shown in Figure 5.
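The augmentation and calibration workflow can be sketched as follows. To keep the example short, the resolution step uses a single regression against known spectra as a stand-in for the full alternating procedure, and every number here (concentrations, peak shapes, spectra) is hypothetical rather than taken from Figures 4 and 5.

```python
import numpy as np

def gaussian(x, center, width):
    return np.exp(-0.5 * ((x - center) / width) ** 2)

t = np.linspace(0.0, 2.0, 201)
wl = np.linspace(220.0, 320.0, 51)
dt = t[1] - t[0]
S = np.column_stack([gaussian(wl, 250.0, 15.0), gaussian(wl, 270.0, 20.0)])
profiles = np.column_stack([gaussian(t, 0.95, 0.08), gaussian(t, 1.05, 0.08)])

def run(conc):
    """Simulated spectrochromatogram (time x wavelength) for the given concentrations."""
    return (profiles * np.asarray(conc)) @ S.T

# Five calibration mixtures (L = 5) and one unknown (M = 1); arbitrary units.
cal_conc = np.array([[1.0, 0.5], [2.0, 1.0], [3.0, 0.25], [4.0, 2.0], [5.0, 1.5]])
unk_conc = np.array([2.5, 0.8])  # treated as unknown; used only to simulate data

# Append the runs along the time axis to form the augmented matrix.
X_aug = np.vstack([run(c) for c in cal_conc] + [run(unk_conc)])

# Stand-in for MCR-ALS: resolve all runs at once against the spectra.
C_aug = X_aug @ np.linalg.pinv(S.T)

# Split back into per-run chromatograms and integrate each resolved peak.
blocks = np.split(C_aug, len(cal_conc) + 1, axis=0)
areas = np.array([blk.sum(axis=0) * dt for blk in blocks])

# One linear calibration curve per species, then predict the unknown.
estimates = []
for n in range(S.shape[1]):
    slope, intercept = np.polyfit(cal_conc[:, n], areas[:-1, n], 1)
    estimates.append((areas[-1, n] - intercept) / slope)
estimates = np.array(estimates)
print(estimates)
```

Because every run shares the same spectral matrix, the resolved peak areas sit on a common scale, which is what allows mixtures rather than individual standards to serve as calibrants.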
A clear advantage to handling multiple chromatograms simultaneously is that calibration information and estimates of unknown concentrations can be obtained very efficiently. Another advantage is the potential to add additional constraints to the analysis, which further limits the possible solutions for C and S. For example, if a blank chromatogram is included in the data set, the contributions of the chemical species for this chromatogram can be set to zero forcing the blank to be modelled using only the background components. Additionally, calibration constraints can be added to the analysis, which constrain the peak areas for the calibrated samples to follow an expected relationship between detector signal and concentration (11–13).
Of particular note here is the fact that two compounds present in the unknown sample have been reliably quantified, despite the resolution between the two peaks being significantly less than 1, and a high degree of similarity between their spectra. Here the chromatographic resolution of the two peaks is approximately 0.6.
Peak Capacity Enhancements via MCR-ALS
The performance of the MCR-ALS algorithm is highly dependent on the similarity of the spectra of the species contributing to the overlapped peak, as well as the signal-to-noise ratio (S/N) of the peaks. Here the similarity of the spectra for the two analytes psoralen and angelicin can be expressed by the correlation coefficient, which is 0.98 (see part 1 for further discussion).
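As a small illustration of this similarity measure, the snippet below computes a correlation coefficient between two spectra. It assumes the ordinary Pearson coefficient is meant, and the two absorbance bands are invented; they are not the psoralen and angelicin spectra.

```python
import numpy as np

def gaussian(x, center, width):
    return np.exp(-0.5 * ((x - center) / width) ** 2)

wl = np.linspace(220.0, 320.0, 101)  # wavelengths in nm (hypothetical)

# Two made-up, strongly overlapping absorbance bands.
spec_a = gaussian(wl, 250.0, 20.0)
spec_b = gaussian(wl, 255.0, 20.0)

# Pearson correlation coefficient between the two spectra.
r = np.corrcoef(spec_a, spec_b)[0, 1]
print(round(r, 3))
```

Values close to 1 indicate nearly identical spectral shapes, which is the regime where curve resolution needs a higher S/N to succeed.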
The improvement of effective chromatographic performance can be quantified in terms of the peak capacity of the separation. The peak capacity of a gradient separation, nc, can be estimated as follows:
where tgrad is the time of the gradient, and wb is the average width of the peaks at the base. The Rs* term is the resolution required for effective quantitative analysis (14). Typically, chromatographers use an Rs* value of 1 when calculating peak capacity. Clearly, if peaks can be quantified at a resolution of less than 1 using curve resolution as discussed above, then the effective peak capacity has been increased. In recent work, we have developed a quantitative relationship between peak capacity and the signal-to-noise ratio of neighbouring peaks and spectral similarity as measured by correlation coefficient. As an example, if the correlation coefficient between the overlapped spectra is 0.89 and S/N is 50, the chromatographic resolution required for quantitation is Rs* = 0.3. This results in a roughly threefold improvement in peak capacity relative to conventional use of DAD where the only means of separation is that provided by the column itself. Clearly, MCR-ALS can provide a significant enhancement in chromatographic method performance.
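The threefold figure can be checked with a short calculation. This sketch assumes the common approximation in which peak capacity scales as the gradient time divided by the product of the required resolution and the average base width; the article's own expression may contain additional terms, but the ratio below depends only on the required resolution.

```latex
n_c \approx \frac{t_{grad}}{R_s^{*}\, w_b}
\quad\Longrightarrow\quad
\frac{n_c\,(R_s^{*} = 0.3)}{n_c\,(R_s^{*} = 1)} = \frac{1}{0.3} \approx 3.3
```

Under this assumption, the gradient time and peak width cancel, so the gain comes entirely from lowering the resolution needed for quantitation.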
Availability of MCR-ALS in Software Packages
One hurdle to widespread usage of MCR-ALS is the lack of implementation of curve resolution options in commercial chromatographic data systems. Although commercial data systems for spectroscopy instruments (for example, infrared) frequently provide MCR-ALS or related curve resolution tools within their software, this situation is not as common for chromatographic data systems. To the best of our knowledge, only Shimadzu has recently added this capability to its data system software (15). The other option for chromatographers wishing to apply these methods to their data is to use one of the many MCR-ALS toolboxes available for the Matlab programming environment. Eigenvector Research, Inc. sells its PLS Toolbox package, which includes MCR-ALS (16). Matlab toolboxes are freely available from the Barcelona MCR-ALS group (10,17) and the Olivieri group (18), with the latter toolbox specifically focused on calibration applications. The Olivieri and Barcelona MCR-ALS toolboxes are also available for users without access to Matlab through a stand-alone graphical user interface (17,18). There is also an ALS package available for the open-source R statistical software environment (19).
Because of the lack of integration with instrumental software, an extra step is required to export the raw spectrochromatogram and read it into the third-party software packages listed above. Unfortunately, this approach is not always straightforward, depending on the instrument software. Although a few extra minutes may be required to move the data and to analyze with the third-party software, it will often require less time than it would take to analyze samples using different chromatographic columns or to vary other method parameters to resolve impurity peaks and increase confidence that none are present.
To those of us who have utilized MCR-ALS for chromatographic analyses, it is clear that this technique adds a powerful tool to the chromatographer’s arsenal. While the peak purity approaches described in part 1 of this series can identify whether impurities are present, MCR-ALS can resolve the pure chromatographic profile, allowing quantitation of the target analyte and the impurity if standards are available for the compound. As mentioned earlier, MCR-ALS does require that compound spectra be at least slightly different; however, MCR-ALS is able to distinguish compounds with even small differences in spectra given a large enough S/N as shown in Figure 3.
Here we have limited our discussion to impurity analysis in LC–DAD; however, it is worth noting that MCR-ALS finds use in many other analyses such as metabolomics and environmental analyses, as well as in other instrumental techniques ranging from hyperspectral imaging to LC with mass spectrometric detection to two-dimensional liquid chromatography (3,4,20,21). The latter will be the focus of the next instalment in this series, where we will look at how the additional separation dimension can help in the quest to determine peak purity, particularly when spectrally indistinguishable impurities are present.
Dwight R. Stoll is the editor of “LC Troubleshooting”. Stoll is an associate professor and co-chair of chemistry at Gustavus Adolphus College in St. Peter, Minnesota, USA. His primary research focus is on the development of 2D-LC for both targeted and untargeted analyses. He has authored or coauthored more than 50 peer-reviewed publications and three book chapters in separation science and more than 100 conference presentations. He is also a member of LCGC’s editorial advisory board. Direct correspondence to: LCGCedit@ubm.com
Sarah C. Rutan is a professor of chemistry at Virginia Commonwealth University (VCU), in Richmond, Virginia, USA, where she has been on the faculty for 33 years. Her research spans a broad range of areas in analytical chemistry and chemometrics, and is currently focused on the development of chemometric methods for improving chromatographic analyses, especially comprehensive 2D chromatography. She has more than 100 publications and numerous presentations on these topics.
C.J. Venkatramani is a senior scientist at Genentech USA and has more than 15 years of experience in the pharmaceutical industry. He was the key member of the Genentech technical team instrumental in the successful launch of gRed’s first small molecule, Erivedge, leading from development to commercial. His areas of interest include multidimensional chromatography and ultratrace analysis of genotoxic impurities.
Daniel W. Cook is a postdoctoral fellow in the pharmaceutical engineering laboratory within the Chemical and Life Science Engineering Department at Virginia Commonwealth University in Richmond, Virginia, USA. In this role, he serves as the primary analytical chemist for the research efforts of the Medicines for All Institute. He received his BS from Randolph-Macon College in 2011 and his Ph.D. in 2016 from Virginia Commonwealth University for his work focusing on the development and application of chemometric techniques to chromatography, particularly comprehensive 2D-LC.