Filtering Out False Positives

March 29, 2012

E-Separation Solutions

E-Separation Solutions-04-05-2012, Volume 0, Issue 0

A retention index tool called iMatch, which aims to reduce the number of false-positive results in gas chromatography mass spectrometry (GC–MS) and comprehensive two-dimensional gas chromatography mass spectrometry (GC×GC–MS), has been developed by a team of researchers from the University of Louisville. Alasdair Matheson, editor of LCGC Europe, spoke to Xiang Zhang to find out more.

What is the rationale behind developing the iMatch retention index tool?

The research in my group focuses on bioanalytical chemistry and bioinformatics. One of our current research projects is to develop comprehensive two-dimensional gas chromatography mass spectrometry (GC×GC–MS) techniques for metabolomics applications. A significant challenge we face is the reliability of mass spectral matching-based metabolite identification in GC×GC–MS. For example, depending on the threshold of spectral similarity, more than 200 compounds can be identified from a MegaMix sample (Restek Corp., Bellefonte, PA) that contains only 76 compounds. For this reason, we started to explore molecular identification methods for one-dimensional gas chromatography mass spectrometry (GC–MS) and GC×GC–MS. This work includes using a retention index to aid metabolite identification [1], investigating methods of measuring spectral similarity [2], developing statistical methods for assessing identification confidence [3], and developing methods to calculate the second-dimension retention index for GC×GC–MS [4]. We started developing the iMatch project in February 2010.

What were the main disadvantages with existing approaches to overcome this problem?

The idea of using a retention index to aid spectral matching-based identification has been reported in the literature for decades. Even though the purpose of a retention index is to reduce the dependency of retention time on the experimental conditions, the retention index is still affected by some experimental conditions, such as the column stationary phase. Unfortunately, this was not considered, or at least not systematically, in the existing approaches to using retention indices for molecular identification. The existing approaches use either a fixed variation window or a variable window derived from the standard deviation of all retention index values recorded in the database for a molecule.
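The two existing window styles described above can be sketched in a few lines of Python. This is only an illustration: the function names, the half-width of 20 index units, the multiplier `k`, and the retention index values are all hypothetical, chosen to show what happens when values recorded on different stationary phases are pooled.

```python
from statistics import mean, stdev

def fixed_window(ri_values, half_width=20.0):
    """Fixed variation window: mean RI plus/minus an assumed constant."""
    center = mean(ri_values)
    return center - half_width, center + half_width

def sd_window(ri_values, k=3.0):
    """Variable window: mean RI plus/minus k standard deviations,
    computed over ALL recorded values regardless of conditions."""
    center = mean(ri_values)
    spread = stdev(ri_values)
    return center - k * spread, center + k * spread

# Hypothetical records for one molecule: two values from one column
# class and two from another, pooled without regard to conditions.
recorded = [1000.0, 1010.0, 1290.0, 1300.0]

print(fixed_window(recorded))
print(sd_window(recorded))
```

With pooled values like these, the fixed window sits between the two groups and covers neither, while the standard-deviation window becomes so wide that it rejects almost nothing.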

What is novel about your approach?

iMatch was designed to reduce the rate of false-positive identifications. The novelty of the iMatch approach is that it aids spectral matching-based molecular identification in three ways: it recognizes the experimental conditions that affect the magnitude of the retention index; it categorizes the retention index values of the same molecule according to those experimental conditions; and it applies the statistical information of other molecules measured under the same experimental conditions to molecules that do not have enough statistical information in the retention index database.
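A minimal sketch of that idea follows, assuming a simple mean plus/minus `k` standard deviations window. It is not the published iMatch algorithm: the function name, the `MIN_RECORDS` cutoff, the multiplier `k`, the compound names, and the retention index values are all hypothetical.

```python
from collections import defaultdict
from statistics import mean, stdev

MIN_RECORDS = 3  # assumed cutoff for "enough statistical information"

def build_windows(records, k=3.0):
    """records: iterable of (molecule, condition, retention_index)."""
    # Categorize each molecule's RI values by experimental condition.
    by_key = defaultdict(list)
    for molecule, condition, ri in records:
        by_key[(molecule, condition)].append(ri)

    # Collect spreads of well-characterized molecules per condition.
    spreads = defaultdict(list)
    for (molecule, condition), values in by_key.items():
        if len(values) >= MIN_RECORDS:
            spreads[condition].append(stdev(values))

    windows = {}
    for (molecule, condition), values in by_key.items():
        if len(values) >= MIN_RECORDS:
            spread = stdev(values)              # molecule's own statistics
        elif spreads[condition]:
            spread = mean(spreads[condition])   # borrowed from other molecules
        else:
            continue                            # nothing to borrow from
        center = mean(values)
        windows[(molecule, condition)] = (center - k * spread,
                                          center + k * spread)
    return windows

# Hypothetical data: compound_a has three records, compound_b only one.
records = [
    ("compound_a", "non-polar", 1000.0),
    ("compound_a", "non-polar", 1002.0),
    ("compound_a", "non-polar", 1004.0),
    ("compound_b", "non-polar", 1500.0),
]
windows = build_windows(records)
print(windows)
```

Here `compound_b` has a single record, so its window width is borrowed from `compound_a`, which was measured under the same condition.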

What benefits does iMatch offer chromatographers in practice and can you illustrate this with a practical example?

Two factors are usually considered during molecular identification: the rate of false positives and the rate of false negatives. In spectral matching-based molecular identification, a user-defined threshold of spectral similarity is usually used to decide whether a spectral match should be considered a valid identification. A high spectral similarity threshold produces a high rate of false negatives, while a low threshold produces a high rate of false positives. iMatch can help chromatographers reduce the rate of false positives by means of retention index matching. For example, suppose a metabolite extract from a biological sample is analyzed on a GC×GC–MS system. The threshold of spectral similarity can be set to 0.6, where spectral similarity ranges from 0 to 1.0. The user can specify the experimental conditions, including the column type (capillary column, packed column), column class (semi-non-polar column, non-polar column, polar column) and GC mode (isothermal condition, temperature gradient, complex condition). Based on the user-provided experimental conditions, the iMatch software automatically extracts the statistical information, i.e., the retention index variation window, for each individual molecule from its database, which was derived from a retention index database such as the NIST 11 GC Method/Retention Index Database (http://chemdata.nist.gov/mass-spc/ri/). iMatch generates two lists: a preserved list and a filtered list. The preserved list contains all identified molecules whose retention index values match the database information, as well as molecules that have no retention index information in the database. The filtered list contains all molecules whose experimental retention index values do not match the database information; these molecules are considered false-positive identifications.
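The filtering step described above can be sketched as follows. This is an illustrative reading of the workflow, not the iMatch source code: the function name, the tuple layout, the compound names, and all numeric values are hypothetical, and the windows are assumed to have been looked up already for the user's experimental conditions.

```python
def split_identifications(hits, windows, similarity_threshold=0.6):
    """hits: list of (name, spectral_similarity, experimental_ri).
    windows: dict mapping name -> (low, high) RI variation window."""
    preserved, filtered = [], []
    for name, similarity, ri in hits:
        if similarity < similarity_threshold:
            continue  # below threshold: not treated as an identification
        window = windows.get(name)
        if window is None:
            preserved.append(name)   # no database RI information
        elif window[0] <= ri <= window[1]:
            preserved.append(name)   # RI matches the database window
        else:
            filtered.append(name)    # RI mismatch: likely false positive
    return preserved, filtered

# Hypothetical spectral matching results and RI windows.
hits = [
    ("compound_a", 0.82, 1105.0),
    ("compound_b", 0.75, 1400.0),
    ("compound_c", 0.66, 900.0),
    ("compound_d", 0.41, 1120.0),
]
windows = {
    "compound_a": (1090.0, 1120.0),
    "compound_b": (1200.0, 1230.0),
    "compound_d": (1100.0, 1140.0),
}
preserved, filtered = split_identifications(hits, windows)
print("preserved:", preserved)
print("filtered:", filtered)
```

In this example `compound_c` is preserved because it has no window to test against, and `compound_d` appears in neither list because its spectral similarity falls below the 0.6 threshold.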

Are there any applications that iMatch is particularly suitable for?

The purpose of iMatch is to filter out potential false-positive identifications using the statistical retention index information of each individual molecule. iMatch should therefore have wide applications. It can be used in any project that uses a GC–MS or GC×GC–MS platform for molecular identification.

Are you planning to develop this work further?

Even though the GC–MS platform has been widely used in the petroleum industry, food sciences and biomedical sciences, high-accuracy molecular identification remains a challenge in these fields. Currently, mass spectral matching is the only information used for molecular identification in most studies. Based on our analysis, mass spectral matching can reach only about 79.6% identification accuracy if the top candidates are considered [2]. For this reason, we will continue to explore new algorithms for high-accuracy molecular identification. These studies will include developing new spectral matching methods, incorporating chromatographic information for identification, and optimizing the weight factors.

Xiang Zhang is a Professor of Chemistry at the University of Louisville, Louisville, KY, US. He is interested in developing bioanalytical and bioinformatics platforms for ‘omics applications.

1. Zhang, J.; Fang, A.; Wang, B.; Bogdanov, B.; Kim, S. H.; Zhou, Z.; McClain, C.; Zhang, X. iMatch, A retention index tool for analysis of gas chromatography mass spectrometry data. J. Chromatogr. A 2011, 1218, 6522-6530.

2. Koo, I.; Zhang, X.; Kim, S. Wavelet- and Fourier-Transforms-based spectrum similarity approaches to compound identification in gas chromatography/mass spectrometry. Anal. Chem. 2011, 83, 5631-5638.

3. Jeong, J.; Shi, X.; Zhang, X.; Kim, S.; Shen, C. An empirical Bayes model using a competition score for metabolite identification in gas chromatography mass spectrometry. BMC Bioinformatics 2011, 12:392.

4. Zhao, Y.; Zhang, J.; Wang, B.; Kim, S.; Fang, A.; Bogdanov, B.; Zhou, Z.; McClain, C.; Zhang, X. A method of calculating the second dimension retention index in comprehensive two-dimensional gas chromatography time-of-flight mass spectrometry. J. Chromatogr. A 2011, 1218, 2577-2583.