FT-IR Search Algorithm – Assessing the Quality of a Match - Chromatography Online


Special Issues
Volume 27, Issue 8, pp. s26-s33

The advent of Fourier transform infrared (FT-IR) spectroscopy made digital spectra available and opened the possibility of using computers to compare a single spectrum against a reference database containing thousands of spectra, greatly increasing the efficiency of comparing unknown spectra to reference materials. Various algorithms can be used to create a hit quality index (HQI), which is a measure of how well the query spectrum compares against each reference spectrum. However, the HQI does not tell the whole story; in particular, it says little about the quality of the match between query and reference spectra. In a ranked list of database hits, the difference, or gap, in HQI between two successive hits appears to be a good indicator of the quality of a match. The presence or absence of a significant gap between the first two or more hits has implications for match quality. Although intuitively we might consider the highest ranked hit to be the "best" match, several similar HQI scores followed by a significant gap can indicate a cluster of hits that are similar but not exact matches. This article looks at the possibility of using an assessment of the gap between hits to determine the quality of a match, what represents a significant gap, and when this assessment can fail.

Spectral searching is a tool commonly used to identify unknown materials and is occasionally used to help classify or interpret them. Algorithms compare the unknown spectrum with each spectrum in the reference database. Each comparison returns a number called the hit quality index (HQI), and the results are ranked by their HQI. Different software packages use different numbering systems for their HQI, even when the same algorithm is used. For example, the Euclidean distance search generates an HQI of 0.0 for the best possible match and the square root of 2 for the worst possible match. Many companies rescale these numbers so that 100 represents the best possible match and 0 the worst. For this article, we will use the 100–0 scale, with 100 being the best possible match. How the algorithms work is beyond the scope of this article, but the nature of the algorithms means that each reference spectrum matches the unknown spectrum to some degree. There is no chemical intelligence built into the algorithms; they simply generate an HQI for each comparison and rank the results by HQI. This means there is always a best match, regardless of whether the unknown material is represented in the database.
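To make the two numbering systems concrete, the following is a minimal sketch of a Euclidean distance search rescaled to the 100–0 scale. It assumes that each spectrum is a list of non-negative intensities sampled on the same wavenumber grid, that spectra are normalized to unit length before comparison (which is what gives the 0 to square root of 2 range), and that the rescaling is linear. The function names are hypothetical, and commercial packages may preprocess and rescale differently.

```python
import math


def unit_normalize(spectrum):
    """Scale a spectrum so that its vector length is 1."""
    norm = math.sqrt(sum(y * y for y in spectrum))
    if norm == 0:
        raise ValueError("spectrum has zero intensity everywhere")
    return [y / norm for y in spectrum]


def euclidean_distance(query, reference):
    """Distance between two unit-normalized spectra.

    For non-negative spectra this lies between 0.0 (identical shape)
    and sqrt(2) (no overlap at all).
    """
    if len(query) != len(reference):
        raise ValueError("spectra must share the same wavenumber grid")
    q = unit_normalize(query)
    r = unit_normalize(reference)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(q, r)))


def rescale_to_hqi(distance):
    """Assumed linear rescaling: 0.0 -> 100 (best), sqrt(2) -> 0 (worst)."""
    return 100.0 * (1.0 - distance / math.sqrt(2.0))


def rank_hits(query, library):
    """Return (name, HQI) pairs for every library entry, best hit first.

    `library` is a hypothetical dict mapping compound names to spectra.
    """
    hits = [
        (name, rescale_to_hqi(euclidean_distance(query, reference)))
        for name, reference in library.items()
    ]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)
```

Note that `rank_hits` scores every entry in the library and always returns a first hit, which illustrates why the existence of a top-ranked match says nothing by itself about whether the unknown is actually in the database.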

Because there is always a number one match, its mere presence is of no value in determining the quality of a match. To determine the quality of a match, we have relied on either the actual value of the HQI or a visual comparison of the sample and reference spectra. There are potential problems with both methods. The actual value of the HQI can be misleading because a number of factors (1) can significantly reduce it. Common factors include baseline problems, purge problems, and the presence of other components in the sample. In addition, a high HQI value does not necessarily indicate an exact match, because the database may contain several compounds that are similar to the unknown without being an exact match. Those who compare the sample and reference spectra visually can also be misled, particularly if they see that the first hit matches their unknown well and fail to look at any additional hits. The first hit may indeed be a good match, but if several similar compounds in the database all match well, then the result more likely means that the unknown has been classified than that an exact match has been found.

To evaluate any search result, we should always compare several of the top-ranked reference spectra to our unknown spectrum.

This article will look at another measure to evaluate the search results, specifically the gap or difference between successive HQIs.
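The gap measure itself is simple arithmetic on the ranked hit list. The sketch below computes the gap between each pair of successive HQIs and counts how many hits sit above the first gap that reaches a chosen size. The function names and the `min_gap` parameter are hypothetical; what counts as a significant gap is a question for the article itself, so no default threshold is assumed here.

```python
def successive_gaps(hqis):
    """Differences between consecutive HQIs after ranking best-first.

    Assumes the 100-0 scale, where a larger HQI means a better match.
    """
    ranked = sorted(hqis, reverse=True)
    return [ranked[i] - ranked[i + 1] for i in range(len(ranked) - 1)]


def hits_before_first_gap(hqis, min_gap):
    """Number of top hits that precede the first gap of at least `min_gap`.

    Returns None when no gap in the list reaches `min_gap`.
    """
    for index, gap in enumerate(successive_gaps(hqis)):
        if gap >= min_gap:
            return index + 1
    return None
```

With a caller-chosen threshold, a return value of 1 corresponds to a single hit standing apart from the rest of the list, while a larger value corresponds to a cluster of similarly scored hits followed by a gap.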

Experimental

Databases used include ST Japan's Aldrich/ICHEM complete ATR FT-IR library (36,639 compounds) and EPA-NIST Vapor Phase library (5228 compounds).

Sample spectra were acquired from various sources. The search software used was ACD/Labs Spectrus Optical Workbook and UVIR Manager.

Algorithms used were the Euclidean distance and first derivative Euclidean distance.
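As a rough illustration of how the two algorithms differ, the sketch below applies the same distance calculation to first derivatives instead of raw intensities. It assumes evenly spaced data points and approximates the derivative by the difference between adjacent points; the search software may use a different derivative method, and the function names are hypothetical.

```python
import math


def first_difference(spectrum):
    """Approximate the first derivative on an evenly spaced axis."""
    return [b - a for a, b in zip(spectrum, spectrum[1:])]


def unit_normalize(values):
    """Scale a list of values so that its vector length is 1."""
    norm = math.sqrt(sum(v * v for v in values))
    if norm == 0:
        raise ValueError("cannot normalize an all-zero (flat) derivative")
    return [v / norm for v in values]


def derivative_euclidean_distance(query, reference):
    """Euclidean distance between unit-normalized first derivatives."""
    if len(query) != len(reference):
        raise ValueError("spectra must share the same wavenumber grid")
    q = unit_normalize(first_difference(query))
    r = unit_normalize(first_difference(reference))
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(q, r)))
```

Because a constant baseline offset disappears on differentiation, two spectra that differ only by such an offset give a distance of essentially zero here. Derivative values can be negative, so the distance between unit-normalized derivatives is not limited to the square root of 2 and can approach 2.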

Each IR spectrum was searched by Euclidean distance and by first derivative Euclidean distance. A total of 18 spectra were searched, giving 36 search results. Of the 18 spectra, eight were pure compounds run in a laboratory, six were vapor-phase spectra, and four were mixtures. The mixtures were created by digitally adding two spectra together.
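Digitally adding two spectra amounts to a point-by-point sum. The sketch below assumes both spectra are sampled on the same wavenumber grid; the optional weights are a hypothetical addition for illustration, with the default being a plain one-to-one sum.

```python
def add_spectra(spectrum_a, spectrum_b, weight_a=1.0, weight_b=1.0):
    """Point-by-point weighted sum of two spectra on a shared grid."""
    if len(spectrum_a) != len(spectrum_b):
        raise ValueError("spectra must share the same wavenumber grid")
    return [
        weight_a * a + weight_b * b
        for a, b in zip(spectrum_a, spectrum_b)
    ]
```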
