General Linear Modeling Applied as Proof-of-Concept for Expert Algorithm for Substance Identification

As researchers demonstrated in a second recent report, their method was effective for identifying cocaine, although there is no consensus on a single “best” algorithm for every scenario.

Scientists at West Virginia University have further investigated the potential usefulness of an expert algorithm for substance identification (EASI) in the mass spectrometry (MS) analysis of cocaine (1). This report is the first in a two-part series; both studies were published separately in the Journal of the American Society for Mass Spectrometry, and LCGC recently covered the second.

For their experiment, the researchers used a training set of 128 replicate cocaine spectra from one crime laboratory to build general linear models (GLMs) of the 20 most abundant fragments of cocaine (1). They then tested the method on 175 additional cocaine spectra from various crime laboratories and on 716 known negative spectra, including 10 spectra of three diastereomers of cocaine.
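The article does not give the model equations. As a rough illustration only, the sketch below assumes that each fragment's abundance is modeled as a linear function of the abundances of other fragments in the same spectrum, fitted by ordinary least squares across the replicate training spectra. The function names, the choice of predictor fragments, and the synthetic data are hypothetical.

```python
import numpy as np


def fit_fragment_model(spectra: np.ndarray, target: int, predictors: list[int]) -> np.ndarray:
    """Fit one fragment's abundance as a linear function of other fragments.

    spectra: (n_replicates, n_fragments) array of relative abundances.
    Returns [intercept, coef_1, ..., coef_k] from ordinary least squares.
    """
    design = np.column_stack([np.ones(spectra.shape[0]), spectra[:, predictors]])
    coef, *_ = np.linalg.lstsq(design, spectra[:, target], rcond=None)
    return coef


def predict_fragment(spectrum: np.ndarray, coef: np.ndarray, predictors: list[int]) -> float:
    """Predict the target fragment's abundance in one query spectrum."""
    return float(coef[0] + spectrum[predictors] @ coef[1:])


if __name__ == "__main__":
    # Hypothetical replicates: fragment 0 depends linearly on fragments 1 and 2.
    rng = np.random.default_rng(0)
    others = rng.uniform(10, 100, size=(128, 2))
    frag0 = 5 + 0.5 * others[:, 0] - 0.2 * others[:, 1] + rng.normal(0, 0.01, 128)
    spectra = np.column_stack([frag0, others])
    coef = fit_fragment_model(spectra, target=0, predictors=[1, 2])
    print(coef)  # close to [5.0, 0.5, -0.2]
```

Under this assumption, one such model would be fitted per fragment, and the predicted abundances for a query spectrum would then be compared with the measured ones.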

To evaluate the accuracy and reliability of EASI, the team used conventional measures, among them the mean absolute residual and the National Institute of Standards and Technology (NIST) spectral similarity score, to assess the similarity between measured and predicted abundances (1). The GLM predictions were then compared with the traditional exemplar approach, which uses the average of the cocaine training set as the consensus spectrum for comparisons.
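The two comparison ideas in this paragraph can be sketched roughly as follows. The mean absolute residual is straightforward; the similarity function, however, is a plain unweighted cosine scaled to 0 to 999 and is only a simplified stand-in, not NIST's published match-factor formula, which adds weighting terms not reproduced here. The consensus spectrum is simply the mean of the training replicates, as described for the exemplar approach. All function names are hypothetical.

```python
import numpy as np


def mean_absolute_residual(measured, predicted) -> float:
    """Average absolute difference between measured and predicted abundances."""
    measured = np.asarray(measured, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.mean(np.abs(measured - predicted)))


def cosine_score(a, b, scale: float = 999.0) -> float:
    """Unweighted cosine similarity scaled to 0..scale.

    A simplified stand-in, NOT NIST's published match-factor formula.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return 0.0 if denom == 0 else scale * float(a @ b) / denom


def consensus_spectrum(training) -> np.ndarray:
    """Exemplar approach: the mean of the training replicate spectra."""
    return np.asarray(training, dtype=float).mean(axis=0)
```

In this sketch, a query spectrum would be scored either against `consensus_spectrum(training)` (the exemplar approach) or against the abundances predicted for that same query (the EASI-style comparison).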

The unsupervised models using EASI achieved a true positive rate of over 95% for cocaine identification, with a false positive rate of 0% (1). Moreover, a supervised binary logistic regression model that used the EASI-predicted abundances of just four peaks, at m/z 152, 198, 272, and 303, classified every spectrum correctly, achieving 100% accuracy.
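As a hedged illustration of a supervised binary classifier with four inputs, the sketch below fits a plain logistic regression by batch gradient descent and reports the true and false positive rates. The four feature columns stand in for EASI-predicted abundances at m/z 152, 198, 272, and 303, but the data, learning rate, iteration count, and 0.5 threshold are assumptions; this is not the authors' fitted model.

```python
import numpy as np

PEAKS = (152, 198, 272, 303)  # m/z values named in the study; used here only as labels


def _sigmoid(z: np.ndarray) -> np.ndarray:
    # Clip to avoid overflow in exp when weights grow large on separable data.
    return 1.0 / (1.0 + np.exp(-np.clip(z, -500, 500)))


def fit_logistic(X: np.ndarray, y: np.ndarray, lr: float = 0.1, n_iter: int = 5000) -> np.ndarray:
    """Binary logistic regression by batch gradient descent; returns [bias, w1..wk]."""
    design = np.column_stack([np.ones(len(X)), X])
    w = np.zeros(design.shape[1])
    for _ in range(n_iter):
        p = _sigmoid(design @ w)
        w -= lr * design.T @ (p - y) / len(y)  # gradient of mean log-loss
    return w


def predict(X: np.ndarray, w: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Return 1 (positive) where the predicted probability reaches the threshold."""
    design = np.column_stack([np.ones(len(X)), X])
    return (_sigmoid(design @ w) >= threshold).astype(int)


def tpr_fpr(y_true, y_pred) -> tuple[float, float]:
    """True positive rate = TP/(TP+FN); false positive rate = FP/(FP+TN)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    return float(tp / (tp + fn)), float(fp / (fp + tn))
```

Here `X` would be an (n_spectra, 4) array with one column per peak in `PEAKS`, and `y` would be 1 for known cocaine spectra and 0 for known negatives.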

Regardless of which spectral similarity measure was used, the researchers found that EASI-based identifications consistently produced fewer errors than the traditional exemplar/consensus approach, in some cases none at all, suggesting that EASI could analyze mass spectra more reliably than conventional methods such as Mahalanobis distances (1).
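For comparison, the Mahalanobis distance mentioned in this paragraph measures how far a query spectrum x lies from the mean mu of the training replicates, scaled by their covariance S: d = sqrt((x - mu)^T S^-1 (x - mu)). The sketch below is a generic textbook implementation, not the authors' code; using a pseudo-inverse of the covariance is an assumption, chosen because fragment abundances are often strongly correlated.

```python
import numpy as np


def mahalanobis_distance(query, training) -> float:
    """Distance of `query` from the mean of `training` (rows = replicate spectra).

    Uses the pseudo-inverse of the sample covariance so that nearly collinear
    fragment abundances do not make the inversion fail.
    """
    training = np.asarray(training, dtype=float)
    diff = np.asarray(query, dtype=float) - training.mean(axis=0)
    cov_inv = np.linalg.pinv(np.cov(training, rowvar=False))
    return float(np.sqrt(diff @ cov_inv @ diff))
```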

The researchers also discussed further implications of the algorithm for forensic science: the approach showed promise not only for improving cocaine identification in crime laboratories but also for streamlining the overall process, allowing investigators to analyze complex samples confidently and efficiently (1). Because it provides highly accurate results with minimal error rates, EASI could significantly impact forensic investigations that require substance identification.

The team was surprised by how robust the modeling turned out to be.

“Oftentimes, when a dataset has many strong covariates, modeling can become unstable because it’s easy to swap variables in and out for one another,” said Glen Jackson, professor of forensic and investigative science at West Virginia University. “Our unpublished work currently uses bootstrapping to show that the models are surprisingly robust, even when training sets derive from completely different instruments.”
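Because the bootstrapping work Jackson describes is unpublished, its details are unknown. The sketch below shows only the generic idea of a case-resampling bootstrap: the replicate spectra are resampled with replacement, the linear model is refitted each time, and a small spread in the refitted coefficients indicates a stable model. The function name, the number of resamples, and the test data are hypothetical.

```python
import numpy as np


def bootstrap_coefficients(X: np.ndarray, y: np.ndarray, n_boot: int = 500, seed: int = 0) -> np.ndarray:
    """Refit an OLS model (with intercept) on `n_boot` resamples of the rows.

    Returns an (n_boot, n_coefficients) array; the spread of each column
    indicates how sensitive that coefficient is to the choice of replicates.
    """
    rng = np.random.default_rng(seed)
    n = len(y)
    design = np.column_stack([np.ones(n), X])
    coefs = np.empty((n_boot, design.shape[1]))
    for i in range(n_boot):
        idx = rng.integers(0, n, size=n)  # sample rows with replacement
        coefs[i] = np.linalg.lstsq(design[idx], y[idx], rcond=None)[0]
    return coefs
```

In this sketch, coefficients whose bootstrap standard deviation is small relative to their mean would suggest that the model does not depend strongly on which replicates supplied the training data.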

EASI’s demonstrated precision and reliability point to possible applications alongside mass spectrometry in other scientific and industrial fields, such as quality control, pharmaceutical research, and environmental monitoring, according to the authors (1). The success of EASI in this case also showcased the power of combining GLM with binary classifiers, which together surpassed the accuracy of traditional methods and could open doors for deployment in other areas of research and data analysis.

There is still room to develop EASI further.

“Our approach of using stepwise general linear modeling arrives at the simplest possible linear equations that best explain the variance within a training set of replicates. It’s possible that non-linear regression could provide slightly better models,” Jackson said.
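Stepwise general linear modeling adds (and, in some variants, removes) predictor terms one at a time. The sketch below is a simplified forward-only version that assumes a stopping rule based on the relative reduction in the residual sum of squares (RSS); conventional stepwise procedures typically use F-test or p-value thresholds instead, and the default threshold here is arbitrary. It is not the authors' procedure.

```python
import numpy as np


def rss(X: np.ndarray, y: np.ndarray) -> float:
    """Residual sum of squares of an OLS fit with intercept (X may have 0 columns)."""
    design = np.column_stack([np.ones(len(y)), X])
    coef = np.linalg.lstsq(design, y, rcond=None)[0]
    resid = y - design @ coef
    return float(resid @ resid)


def forward_stepwise(X: np.ndarray, y: np.ndarray, min_gain: float = 0.05) -> list[int]:
    """Greedily add the column that most reduces RSS; stop when the relative
    reduction falls below `min_gain` (an assumed, arbitrary threshold)."""
    selected: list[int] = []
    remaining = list(range(X.shape[1]))
    current = rss(X[:, selected], y)  # intercept-only model to start
    while remaining and current > 0:
        trial = {j: rss(X[:, selected + [j]], y) for j in remaining}
        best = min(trial, key=trial.get)
        if (current - trial[best]) / current < min_gain:
            break
        selected.append(best)
        remaining.remove(best)
        current = trial[best]
    return selected
```

Because the search is greedy, two strongly correlated fragments can trade places depending on the data, which is the instability Jackson describes and the reason a robustness check such as bootstrapping is useful.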

To move the algorithm forward, Jackson said, adding more visuals could help practitioners see how easily EASI can be implemented. A commercial partner that could build the algorithm into easy-to-use software would also help, he added.

“That way, analysts could just submit their query spectrum to the algorithm and see the confidence of a specific identity. The confidence would derive from a large database of known positives and negatives and the fitness of the EASI model to that specific query spectrum,” Jackson said.


(1) Jackson, G. P.; Mehnert, S. A.; Davidson, J. T.; Lowe, B. D.; Ruiz, E. A.; King, J. R. Expert Algorithm for Substance Identification Using Mass Spectrometry: Statistical Foundations in Unimolecular Reaction Rate Theory. J. Am. Soc. Mass Spectrom. 2023, 34 (7), 1248–1262. DOI: 10.1021/jasms.3c00089