Automated Peak Tracking for Comprehensive Impurity Profiling with Chemometric Mass Spectrometric Data Processing

April 1, 2011
Lin Zhang

Gang Xue

Special Issues, Volume 29, Issue 4
Page Number: 40–44

Screening with multiple orthogonal HPLC methods provides a comprehensive alternative to single-method drug impurity profiling because of the methods' complementary selectivity. One key challenge of the approach is to track the peaks across the orthogonal chromatograms and identify all unique impurities. In this study, we automated the peak tracking by first applying the chemometric component detection algorithm (CODA) to the electrospray ionization mass spectra to filter out most of the background ion signal, then passing the cleaned-up mass spectra through four sequential decision-making mass ion tests to determine the molecular weight of every peak. Up to 100-fold mass spectrometry (MS) sensitivity gain was achieved with the CODA data preprocessing, which significantly enhanced the peak tracking accuracy. While the orthogonal screening reduced the time required to obtain a comprehensive impurity profile from weeks to hours, the automated MS peak tracking takes only minutes to interpret all MS spectral data of interest.

Assessment of the impurity profiles of active pharmaceutical ingredients (APIs) and drug products is one of the most important activities in pharmaceutical analysis (1). The International Conference on Harmonisation of Technical Requirements for Registration of Pharmaceuticals for Human Use (ICH) impurity guidelines recommend reporting all impurities over 0.05% and identifying the structures of impurities over 0.1% in the API at the time of new drug registration (2,3). Degradant reporting and identification thresholds in drug products have been set at 0.1% and 0.2% of API, respectively, for a maximum daily dose of less than 1 g.

Meanwhile, impurities can arise from many sources during the manufacturing of the drug, and any change made during process optimization may alter the impurity profile. Thus, repeated adjustments to the analytical technique often are required to monitor and control the impurity profile accurately throughout drug development.

Comprehensive Orthogonal Method Evaluation Technology (COMET)

High performance liquid chromatography (HPLC) is used routinely for impurity profiling in both APIs and formulated drug products. Generally, a single type of column is chosen at random, and the analysts have to go through a tedious method development process, adapting chromatographic parameters such as mobile phase, pH, buffer, and column temperature before achieving a suitable method for the drug lot under analysis. This single-method trial-and-error approach usually is very time-consuming and labor-intensive, requiring significant manual data evaluation, including peak tracking. The continued method reassurance and optimization needed to keep up with the process evolution, along with the need to track and compare impurity profiles from different batches, further complicate the task.

A more efficient approach would be science-based generic screening, which is based on either a matrix screening across different pH values and stationary phases or an orthogonal screening with a set of carefully picked generic methods of very different selectivity. The matrix screening covers every combination of the selected wide range of pH values and stationary phases. Thus, this strategy may reveal compound retention behavior trends and provide guidance in final method optimization (4). However, because of the number of runs required, it can be very lengthy. The orthogonal screening, on the other hand, offers the similar advantage of covering a wide range of pH values and stationary phases with a very limited number of experiments.

With this idea, we developed the comprehensive orthogonal method evaluation technology (COMET) approach, as we reported earlier (5). By means of extensive column screening and orthogonality evaluation by geometric factor analysis, seven mass spectrometry (MS)-compatible HPLC methods were chosen carefully for the screening system, each having both high resolving power and unique selectivity. The method orthogonality provided the basis for a comprehensive characterization of the impurity profile; that is, the dramatic selectivity differences ensured the method set was capable of separating a wide range of compounds with different physicochemical properties, and each method offered complementary chromatographic information about the sample. With all the information carefully compiled and pooled together, the method set gives a comprehensive impurity profile of almost any drug with high accuracy.

As an example, Figure 1 shows the COMET screening results of a spiked Pfizer proprietary drug sample, PF#2. The number of chromatographic peaks detected in the seven HPLC methods ranged from three to six because of coelution or over-retention in some of the methods. However, because of the orthogonal nature of the method set, the retention pattern was very different from method to method. Component A eluted first in M1, M2, and M5, but it was the last peak in M8. Components B and C coeluted in method M1 but were well separated in all other methods. The same was true for components A and E, which coeluted in M3 and M6 but were pulled well apart in M1 and M4. Components B and D were not eluted in M5 but were well detected in M3, M4, M6, and M8. In the end, the COMET screening clearly revealed six components in the spiked drug sample, each detected as a clean single peak in at least one of the methods.

Figure 1: COMET separations of spiked drug sample PF#2 (UV detection at 220 nm). Coeluted analytes are indicated by parentheses. The determined molecular weights for each analyte are: (A) 261 Da, (B) 447 Da, (C) 465 Da, (D) 447 Da, (E) 463 Da, (F) 463 Da. (Reprinted with permission from G. Xue and colleagues, J. Chromatogr. A 1050, 159 [2004] © 2004 Elsevier Science Ltd.)

Dozens of drug samples from various therapeutic areas have been screened by COMET since its deployment. It successfully revealed the impurity profile of every sample with no need to modify any chromatographic parameters, despite the variety of the compounds. New lots of drug from an evolved synthesis process were rescreened for the emergence of any new impurities or the disappearance of previously identified impurities, with no need for any method redevelopment. Again, the method orthogonality assured detection of such changes in the impurity profile.

Automated Peak Tracking

The automation of the multicolumn screening was very straightforward with the established column-switching technologies. However, data processing became a bottleneck because of the pseudomultidimensional separation and the use of an MS detector. Accurate tracking of individual components across multiple runs was the key to compiling the complementary chromatographic and spectroscopic information into a comprehensive impurity profile.

We developed in-house integrated system control and data processing software whose core performs the automated peak tracking and reporting. Instead of relying on unreliable UV spectra matching, we chose the compound molecular weights detected by the mass spectrometer to identify each peak in the chromatograms. The software automatically interpreted the mass spectra by passing every spectrum of interest through a series of mass ion rejection tests to identify the quasimolecular ions, which were indicative of the molecular weight of each compound. (Please refer to reference 5 for a detailed explanation of the algorithms.) Coeluted compounds generally could be differentiated by the slight retention time difference in the extracted ion chromatograms (EICs) of their mass ions. Peak tracking, therefore, simply became assigning an annotation to each molecular weight, for example, giving each compound an alphabetic designation starting with "A" as shown in Figure 1. Isomer peaks, though, need to be differentiated further by their relative peak area.
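The labeling step described above can be illustrated with a minimal Python sketch. It assumes every detected peak already carries a molecular weight from the mass ion tests. The `Peak` record, the `track_peaks` function, the ordering of labels by ascending molecular weight, and the rule of matching isomers by their area rank within a method are hypothetical simplifications for illustration, not the COMET software's actual logic.

```python
from collections import defaultdict
from dataclasses import dataclass
from string import ascii_uppercase


@dataclass(frozen=True)
class Peak:
    method: str      # screening method name, e.g. "M1"
    rt_min: float    # retention time in minutes
    mw: int          # molecular weight assigned from the mass spectrum (Da)
    rel_area: float  # relative peak area (%)


def track_peaks(peaks):
    """Map each peak to a letter shared by the same compound in every method.

    Assumption: peaks with the same molecular weight inside one method are
    isomers, and their rank by relative area is the same in every method.
    Supports at most 26 distinct compounds (one letter each).
    """
    by_method_mw = defaultdict(list)
    for p in peaks:
        by_method_mw[(p.method, p.mw)].append(p)

    # Compound key = (molecular weight, area rank among isomers in that method).
    keys = {}
    for group in by_method_mw.values():
        ranked = sorted(group, key=lambda q: q.rel_area, reverse=True)
        for rank, p in enumerate(ranked):
            keys[p] = (p.mw, rank)

    ordered = sorted(set(keys.values()))
    letters = {key: ascii_uppercase[i] for i, key in enumerate(ordered)}
    return {p: letters[key] for p, key in keys.items()}


# Illustrative input: retention times and areas are made up.
peaks = [
    Peak("M1", 5.2, 261, 0.8),
    Peak("M1", 9.1, 447, 0.5),
    Peak("M1", 9.9, 447, 0.2),
    Peak("M4", 3.4, 261, 0.8),
    Peak("M4", 7.7, 447, 0.5),
    Peak("M4", 12.3, 447, 0.2),
]
labels = track_peaks(peaks)
for p in peaks:
    print(p.method, p.rt_min, p.mw, labels[p])
```

In this toy input the two 447 Da isomers swap neither area rank nor label between M1 and M4, so each keeps one letter even though their retention times differ. Coeluting isomers would break the rank assumption and would need the EIC-based differentiation mentioned above.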

Chemometric Data Preprocessing with the Component Detection Algorithm (CODA)

During our screening of about 70 compounds from various therapeutic areas, which contained over 700 chemical entities, the MS-based automated peak tracking proved its accuracy and efficiency in quickly interpreting the mass spectra and identifying every peak the MS system was capable of detecting. It reduced the processing time from multiple days of manual work to minutes. With no manual intervention from the start of the run until the peak tracking reports were printed, the fully automated screening system freed the analysts for more creative research work.

As we screened more and more compounds, one deficiency emerged in the MS-based peak tracking software. About 18% of the low-level impurities that were detected clearly in the UV chromatogram could not be tracked because of insufficient MS sensitivity, and these misses accounted for more than 90% of the peak tracking errors. Some of the impurities were difficult to ionize by electrospray because of their nonpolar nature. But in general, pharmaceutical compounds are polar, and MS is supposed to be more sensitive than UV absorbance.

The reason behind the apparent contradiction is the MS detection mode. MS has superb sensitivity in its single ion monitoring (SIM) mode because it very selectively detects mass ions of only one mass-to-charge ratio (m/z). Thus, the background noise remains very low. Unfortunately, with no prior information about the sample in such a screening system, the mass spectrometer has to be operated in the scanning mode, which monitors a range of mass ions. The resulting total ion chromatogram (TIC) contains a high level of chemical noise, mostly from the LC mobile phase and buffers. As an example, Figure 2 shows the screening result for a mixture of four commercial drugs, each at 2.5 µg/mL. The UV detection at 254 nm clearly shows three peaks, for clenbuterol, omeprazole, and (S)-6-methoxy-α-2-naphthalene acetic acid. The fourth compound, ibuprofen, gives only a small peak at 21.87 min, but one still sufficient for quantitation. In comparison, on the TIC plot, clenbuterol and omeprazole are barely recognizable, close to their detection limit, while the other two drugs are buried completely in the high background noise.

Figure 2: LC–MS separation of commercial drug mixture. (a) UV detection at 254 nm. (b) TIC. A: clenbuterol, B: omeprazole, C: (S)-6-methoxy-α-2-naphthalene acetic acid, D: ibuprofen.

The iso-plot shown in Figure 3 gives more insight into the MS data. Each horizontal section in the 2D plot is actually an EIC for a particular m/z value, while each perpendicular slice is the mass spectrum at a certain time point. Although the gray-scale contrast cannot accurately show the intensity differences, a number of high-intensity streaks and some discrete dots can be observed in the plot. The streaks come from the high-level chemical noise, such as the mass ions at m/z 102, 291, and 352. They share the common appearance of constantly high intensity with gradual variation. The discrete dots are notable for a close-to-zero baseline over most of the time span and a good Gaussian-shaped peak when the chemical entity elutes. These discrete dots are the actual components of interest, such as the designated spots A, B, C, and D. Despite the noisy appearance of the TIC, the individual EICs of those discrete dots are generally of high quality. The challenge, then, is how to extract the dots from the high-intensity streaks without knowing their m/z values.
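The relationship between the iso-plot, the EICs, the mass spectra, and the TIC can be made concrete with a small NumPy sketch. The data are synthetic: the row-per-m/z layout, the intensities, the peak position, and the noise level are all assumptions chosen for illustration, with only the m/z values 102 (background) and 207 (ibuprofen) borrowed from the text.

```python
import numpy as np

n_scans = 600
mz_axis = np.arange(100, 501)   # nominal m/z channels 100..500 (assumed range)
scan = np.arange(n_scans)


def row(mz):
    """Row index of a nominal m/z value in the data matrix."""
    return int(np.searchsorted(mz_axis, mz))


# Rows are m/z channels, columns are scans, as in the iso-plot.
rng = np.random.default_rng(0)
data = rng.normal(0.0, 0.2, size=(mz_axis.size, n_scans)).clip(min=0)

# Background "streak": constantly high intensity with gradual variation.
data[row(102)] += 5000 + 200 * np.sin(scan / 150)
# Analyte "dot": near-zero baseline plus a Gaussian peak around scan 400.
data[row(207)] += 300 * np.exp(-0.5 * ((scan - 400) / 8) ** 2)

eic_207 = data[row(207), :]   # horizontal slice: EIC of one m/z
spectrum_400 = data[:, 400]   # perpendicular slice: mass spectrum at one scan
tic = data.sum(axis=0)        # TIC: sum over all m/z at every scan

base_peak_mz = int(mz_axis[spectrum_400.argmax()])
share = eic_207[400] / tic[400]
print(f"base peak at scan 400: m/z {base_peak_mz}")
print(f"analyte share of the TIC at its own apex: {share:.1%}")
```

Even at the apex of its own peak, the synthetic analyte ion contributes only a few percent of the TIC, which mirrors why a clean EIC can be invisible in the summed trace.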

Figure 3: MS iso-plot of drug mixture separation in Figure 2. A: Mass ion of clenbuterol; B: mass ion of omeprazole; C1,C2: mass ions of (S)-6-methoxy-α-2-naphthalene acetic acid; D1,D2,D3: mass ions of ibuprofen.

In 1996, Windig (6) developed a chemometric algorithm named the component detection algorithm (CODA), which applies two simple operations to each EIC: smoothing and mean-centering. The smoothed and mean-centered EIC is then compared with the original EIC, and a similarity score is calculated. The high-intensity streaks deviate strongly from the raw EIC after the mean-centering because of their consistently high background, and thus receive a low score. The smoothing identifies the spike artifacts, which are generally one data point wide and have a significantly reduced peak height after smoothing. The EICs of interest, which generally have zero baselines and contain nicely shaped peaks more than 20 data points wide, change little after the two operations and receive a high similarity score. By setting a cut-off similarity score, CODA can reject the high-intensity background ions and spikes quickly and significantly reduce the number of mass chromatograms without losing information.
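The scoring idea described above can be sketched in a few lines of Python. This is a simplified variant, not necessarily the exact index defined in reference 6: it takes the similarity as the cosine between the raw EIC and its smoothed, mean-centered version. The function names, the moving-average smoother, the window of 5 points, and the synthetic test traces are assumptions; the 0.9 cut-off is the value used later in this article.

```python
import numpy as np


def coda_score(eic, window=5):
    """Cosine similarity between a raw EIC and its smoothed, mean-centered copy."""
    if window % 2 == 0:
        raise ValueError("window must be odd so the smoothed trace stays aligned")
    x = np.asarray(eic, dtype=float)
    half = window // 2
    # Moving average; "valid" mode avoids zero-padding artifacts at the edges.
    smoothed = np.convolve(x, np.ones(window) / window, mode="valid")
    raw = x[half:x.size - half]          # raw points aligned with the smoothed ones
    centered = smoothed - smoothed.mean()
    denom = np.linalg.norm(raw) * np.linalg.norm(centered)
    return float(raw @ centered / denom) if denom > 0 else 0.0


def reconstructed_tic(data, cutoff=0.9, window=5):
    """Sum only the EICs (rows of data) whose score exceeds the cut-off."""
    scores = np.array([coda_score(r, window) for r in data])
    keep = scores > cutoff
    return data[keep].sum(axis=0), keep


# Three synthetic EICs: a clean peak, a background streak, and a one-point spike.
scan = np.arange(600)
peak = 300 * np.exp(-0.5 * ((scan - 400) / 8) ** 2)
streak = 5000 + 200 * np.sin(scan / 150)
spike = np.zeros(600)
spike[300] = 300

for name, trace in [("peak", peak), ("streak", streak), ("spike", spike)]:
    print(name, round(coda_score(trace), 3))

tic, keep = reconstructed_tic(np.vstack([peak, streak, spike]))
```

Mean-centering barely changes a trace with a zero baseline but removes almost everything from a constantly high one, and smoothing spreads a one-point spike over the whole window, so only the clean peak stays similar to its raw form and survives the cut-off.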

Figure 4: Reconstructed TIC after CODA data processing and COMET assigned molecular weights. (* = blank related system peaks.)

We applied CODA as a preprocessing step for the COMET data. The EICs passing CODA with similarity scores greater than 0.9 are summed to give the reconstructed TIC (Figure 4). Compared with the original TIC, sensitivity is enhanced dramatically. All four drug peaks show very good S/N with very little background. The mass spectra of ibuprofen before and after the processing are shown in Figures 5a and 5b. The original spectrum is dominated by the m/z 102 and 352 ions, which are background ions from the mobile phase. Even for an experienced mass spectrometrist, it is almost impossible to identify the quasimolecular ion (m/z = 207), which has an intensity of 0.6% relative to the most abundant ion. However, after the CODA processing, all interfering mass ions are filtered out, leaving only a very limited number of ions, including m/z 161, 207, 248, and their isotopes. The COMET software further passes the preprocessed mass spectrum through the mass ion tests and easily assigns a molecular weight of 206, identifying m/z 207 as the quasimolecular ion, with m/z 248 being the acetonitrile adduct and m/z 161 a fragment. After repeating the same processing for all seven chromatograms, the software successfully tracks the four drug compounds in the orthogonal screening and generates the impurity profile report.
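The ibuprofen assignment above can be reproduced with a small consistency check on nominal masses. This sketch is only an illustration of one plausible ion relationship test, assuming positive-ion electrospray in which the quasimolecular ion is [M+H]+ and the acetonitrile adduct sits 41 mass units higher. It is not the set of four mass ion tests used by the COMET software, which are described in reference 5, and the function name and role labels are hypothetical.

```python
# Nominal mass shifts assumed for positive-ion electrospray (illustrative only).
PROTON = 1          # [M+H]+ appears at M + 1
ACETONITRILE = 41   # a CH3CN adduct adds 41 to the protonated ion


def assign_molecular_weight(ions):
    """Call an ion [M+H]+ if its acetonitrile adduct is also present.

    Returns (molecular_weight, roles), or (None, {}) when no ion pair fits.
    """
    present = set(ions)
    for mz in sorted(present):
        if mz + ACETONITRILE in present:
            roles = {}
            for other in sorted(present):
                if other == mz:
                    roles[other] = "[M+H]+"
                elif other == mz + ACETONITRILE:
                    roles[other] = "[M+H+CH3CN]+"
                elif other < mz:
                    roles[other] = "possible fragment"
                else:
                    roles[other] = "unassigned"
            return mz - PROTON, roles
    return None, {}


# Ions left in the ibuprofen spectrum after CODA processing (isotopes omitted).
mw, roles = assign_molecular_weight([161, 207, 248])
print(mw, roles)
```

For the three ions quoted in the text, the pair 207 and 248 differs by 41, so the sketch returns a molecular weight of 206 and flags m/z 161 as a possible fragment. Running the same check on the unprocessed spectrum would be far less reliable, because background ions could form such pairs by chance.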

Figure 5: Mass spectrum of ibuprofen eluting at 21.79 min. (a) Original spectrum acquired in scanning mode. (b) Mass spectrum after CODA processing.

Table I compares the limits of detection (LODs) of UV, TIC, and CODA-processed TIC. Consistent with the chromatograms shown in Figure 2, the original TIC shows a much worse LOD than UV for all four compounds. Ibuprofen is not detected at all, even at the highest spiked concentration of 100 µg/mL. The CODA processing, though, improves the MS LOD by over 100-fold. The resulting LOD is slightly better than that of UV. With the nominal API concentration usually set at 0.2–0.5 mg/mL, the system is then well capable of detecting and tracking impurities at the 0.05% level, as required by the ICH guideline.
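The concentration that the 0.05% level corresponds to follows from simple arithmetic, shown here as a short Python check. The function name is hypothetical; the inputs are the nominal API concentrations and the reporting level quoted above.

```python
def impurity_conc_ug_per_ml(api_mg_per_ml, level_percent):
    """Concentration of an impurity present at level_percent of the API."""
    return api_mg_per_ml * 1000.0 * level_percent / 100.0   # mg/mL -> µg/mL


low = impurity_conc_ug_per_ml(0.2, 0.05)
high = impurity_conc_ug_per_ml(0.5, 0.05)
print(f"0.05% of 0.2 mg/mL API is about {low:.2f} µg/mL")
print(f"0.05% of 0.5 mg/mL API is about {high:.2f} µg/mL")
```

An impurity at the 0.05% level is therefore present at roughly 0.1 to 0.25 µg/mL, so the MS detection limit has to be at or below that range for the peak to be tracked.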

Table I: Summary of UV, TIC, and CODA-detection limit for the commercial drug mixture (UV detection at 254 nm; ND: not detected)


COMET, the orthogonal screening system, has proved to be a systematic and efficient approach for comprehensive impurity profiling. The automated MS-based peak tracking greatly reduces the data processing time required to compile the complementary chromatographic and spectral information. The lack of MS sensitivity in the scanning mode is addressed successfully by the CODA data preprocessing, which enhances the signal-to-noise ratio by over 100-fold. The resulting fully automated impurity profiling system is capable of reliably detecting and tracking impurities at the 0.05% level, as required by the ICH guideline.

Gang Xue and Lin Zhang are with Pfizer Global Research & Development, Groton, Connecticut.


(1) S. Gorog et al., Talanta 44(9), 1517–1526 (1997).

(2) ICH Quality Guideline Q3A(R), in Impurities in Drug Substances (Revised) (2002).

(3) ICH Quality Guideline Q3B(R), in Impurities in Drug Products (Revised) (2003).

(4) R. Snyder et al., HPLC Screening and Method Development Opportunities, Forum on Method Development Strategies for the 21st Century: Boston (2004).

(5) G. Xue et al., Journal of Chromatography A 1050(2), 159–171 (2004).

(6) W. Windig, J.M. Phalp, and A.W. Payne, Anal. Chem. 68(20), 3602–3606 (1996).
