The Column spoke to Philipp Weller from the Institute of Instrumental Analytics and Bioanalytics at the Mannheim University of Applied Sciences, Germany, about the benefits of nontargeted screening approaches in food safety and environmental analysis using gas chromatography–ion mobility spectrometry (GC–IMS), and the best way to approach data processing and model building with the generated data.
Q. Nontargeted screening approaches are gaining in popularity, particularly for the identification of contaminants in the environment or regarding food safety. What advantages does gas chromatography–ion mobility spectrometry (GC–IMS) offer over alternative analytical techniques?
A: First of all, GC–IMS provides equally high, or even superior, sensitivity compared to GC–mass spectrometry (MS) for polar or medium polar compounds, which are commonly found in volatile organic compound (VOC) fingerprints of fermentations, foodstuffs, or environmental samples. In our experiments using GC–IMS, we can often skip sample enrichment procedures, such as solid-phase microextraction (SPME), which is typically not possible for GC–MS, at least not in electron ionization (EI) mode. Since we have GC–IMS–MS prototypes running for most of our experiments, we can directly compare sensitivities between the two. What we found is that for a number of smaller, highly polar molecules, such as carboxylic acids, esters, aldehydes, and ketones, there is around a 10‑fold improvement in sensitivity. This is mainly due to the low-energy ionization, which is a secondary effect of the primary tritium (3H) ionization of the buffer gas (typically N2) and proceeds via protonated water clusters. This is, however, also one limitation of the principle because molecules with low proton affinities, such as alkanes, will feature much lower sensitivities than, for example, aldehydes or ketones. As such, which compounds are relevant is the decisive question when considering GC–IMS for contaminant analysis.
The second aspect is the simplicity of the hardware and the low demand for infrastructure. The IMS cells are extraordinarily robust and simply built, with no complex moving parts, such as turbo pumps, which makes the technology a perfect match for industrial applications or point-of-care use.
The third aspect is size and weight. IMS cells may be built in a very compact manner, including all required electronics, without having to sacrifice robustness or sensitivity.
Q. One challenge of IMS analysis is interference from widespread ionization, which results in low selectivity. Your paper mentions that the addition of a suitable dopant can overcome these limitations (1). How easy is it to mitigate this issue and do you have any advice when choosing a dopant?
A: In our experience, most of the interference that occurs during GC–IMS analysis is a result of suboptimal chromatography rather than of IMS ionization; if the user optimizes the GC pre-separation as far as possible, most interferences can at least be minimized. We had the best results in IMS detection when we made sure that the IMS cell did not have to “digest” too much substance signal. However, it is correct that the ionization process is not selective when substances of similar polarity enter the ion source. This is an issue that may happen in GC–MS as well, yet the problem in IMS is the formation of so-called heterodimers, which are basically adducts formed from two different molecules. This not only complicates signal identification, because these heterodimers will feature different drift times from the pure substance ions, but also often leads to nonlinear effects in detection.
The use of dopants is still challenging in routine handling, as constant dosing is a key aspect. We used a specifically produced calibration gas with 100 ppm nitric oxide (NO) in nitrogen (N2) to safely dose the NO into the drift cell, which can be done using a separate electronic pneumatic controller (EPC) valve. This may demonstrate that our prototype setup is still a couple of steps away from routine use.
While we used the dopant to demonstrate the separation of charged NO adducts of two terpenes (and this is a remarkable way of using specific ion chemistry as a separation principle), my assumption is that a dopant might be even more useful when applied to nontargeted screening (NTS) workflows. The dopant adds a large number of additional, substance-group-specific signals to the already complex fingerprint, which then can be analyzed by machine learning-based workflows. Since different substances will react differently to the dopant, this may add more sample-specific variance to the data, which is what we are aiming for.
So, in short, dopant addition may not be the best choice to compensate for inferior chromatography, but rather a good choice for complex matrices, where it adds selectivity to the NTS workflow through specific signals.
We have not tested any dopants other than NO yet, but I would assume that ammonia would also work, albeit in a different manner. The choice is limited to gases with a sufficiently high proton affinity because, unlike the buffer gas nitrogen, the dopant becomes part of the ion itself. An important consideration obviously would be toxicology, as the IMS cells use relatively high flow rates.
Q. Food fraud is a major concern in modern food markets, with the geographical origin of products coming under intense scrutiny. Your paper (1) contains a very useful table with studies utilizing GC–IMS to combat food fraud and follows that up with an NTS workflow. What advice can you give researchers looking to utilize GC–IMS for nontargeted screening for the first time?
A: The good thing about GC–IMS is that, in terms of handling, it is very similar to GC–MS. So, if you are already using GC–MS for NTS workflows, you can stick closely to what you have done so far, with one exception. Usually, there is no need for enrichment steps, such as SPME or comparable techniques, simply because GC–IMS is usually sensitive enough for polar to medium polar compounds to be measured directly from the headspace. An important key to the successful use of GC–IMS is to not oversaturate your cell with high analyte concentrations. This is the Achilles’ heel of drift tube ion mobility spectrometry (DTIMS) systems: upon doing so, you'll leave the range of linear detector response, and that makes things more complicated because you'll have to deal with nonlinear data analysis.
Q. Your paper also addresses the chemometric approaches that can be paired with GC–IMS. However, the sheer number of techniques can be quite intimidating. How did you initially approach data processing and model building when carrying out this research? And what advice can you offer to those seeking to utilize the same techniques?
A: To be honest, we made every error that a data scientist could make on spectral data, which was obviously frustrating in the beginning. Yet we learned a lot about our data, its potential, and also about its limits. There is a plethora of methods, this is true, but there are a couple of rules of thumb that I would give a beginner. First of all, keep your approach as simple as possible and use principal component analysis (PCA) to get an overview. PCA is often underestimated, but it helps so much to get a quick glance at your data and an initial idea of what is in there, or what is not. A second rule is a very general one in data science that says “garbage in = garbage out”: if your data is not good because, for instance, your analysis did not work as it should have, don't use it; it is usually a waste of time. A massively important rule is validation and a critical view of your data analysis: if you are doing supervised learning, such as partial least squares regression (PLS-R) or partial least squares-discriminant analysis (PLS-DA), which would also be my recommendation for a start, do not choose the model that calibrates the best, but the one that predicts the best. So know your figures of merit.
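The rule of picking the model that predicts best, not the one that calibrates best, can be illustrated with a small sketch. It is not the workflow used in the research discussed here: the data are synthetic, the PLS-DA is a hand-written single-response NIPALS PLS with a 0.5 decision threshold, and the function names (`pls1_fit`, `pls1_predict`, `accuracy_by_components`) as well as the five-fold cross-validation are assumptions made for illustration.

```python
import numpy as np


def pls1_fit(X, y, n_comp):
    """Fit a single-response PLS model with the NIPALS algorithm."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_comp):
        w = Xk.T @ yk                 # direction of maximum covariance with y
        w /= np.linalg.norm(w)
        t = Xk @ w                    # sample scores on this component
        tt = t @ t
        p = Xk.T @ t / tt             # X loadings
        c = yk @ t / tt               # y loading
        Xk = Xk - np.outer(t, p)      # deflate X and y before the next component
        yk = yk - c * t
        W.append(w)
        P.append(p)
        q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    coef = W @ np.linalg.solve(P.T @ W, q)   # regression vector in feature space
    return x_mean, y_mean, coef


def pls1_predict(model, X):
    """Continuous PLS prediction; PLS-DA thresholds this value."""
    x_mean, y_mean, coef = model
    return (X - x_mean) @ coef + y_mean


def accuracy_by_components(X, y, max_comp, n_folds=5, seed=0):
    """Calibration and cross-validated accuracy for 1..max_comp components."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    cal, cv = [], []
    for k in range(1, max_comp + 1):
        fitted = pls1_predict(pls1_fit(X, y, k), X) > 0.5
        cal.append(np.mean(fitted == y))
        hits = 0
        for test_idx in folds:
            train_idx = np.setdiff1d(np.arange(len(y)), test_idx)
            model = pls1_fit(X[train_idx], y[train_idx], k)
            hits += np.sum((pls1_predict(model, X[test_idx]) > 0.5) == y[test_idx])
        cv.append(hits / len(y))
    return np.array(cal), np.array(cv)


# Synthetic stand-in for spectral data: 60 samples, 200 features, 2 classes.
rng = np.random.default_rng(42)
y = np.repeat([0, 1], 30)
X = rng.normal(size=(60, 200))
X[:, :5] += y[:, None]                # only five features carry class information

cal, cv = accuracy_by_components(X, y, max_comp=8)
best_k = int(np.argmax(cv)) + 1       # choose by prediction, not by calibration
for k, (a, b) in enumerate(zip(cal, cv), start=1):
    print(f"{k} components: calibration {a:.2f}, cross-validation {b:.2f}")
print("components chosen by cross-validation:", best_k)
```

With many more features than samples, calibration accuracy tends to keep rising as components are added, so it says little about how the model will behave on new samples. The cross-validated figure is the one to base the choice on, ideally confirmed with a fully independent test set.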
Q. Supervised and unsupervised data analysis and machine learning techniques—are there benefits and downsides to each or can they be used in tandem to ensure accurate conclusions from GC–IMS generated data?
A: The concepts of both approaches are very different, so in short, yes, I would always use both unsupervised exploratory and supervised techniques together. As already said, exploratory analyses tell you so much about your data and also about your analyses, for example by revealing systematic errors or retention time and drift time shifts, or by giving first ideas about potentially relevant features. Furthermore, PCA can help you judge whether the model will be able to cover the data space in a sufficient manner and also whether the data contains potential outliers by using the T²/Q plot. In my opinion, the latter is one of the most important figures you should generate.
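As a rough sketch of what sits behind such a T²/Q plot, the two statistics can be computed from a PCA obtained by singular value decomposition. Everything here is illustrative: the data are synthetic, the function name `pca_t2_q` is hypothetical, and the empirical 95th percentile is used as a simple stand-in for the formal confidence limits that chemometrics software usually reports.

```python
import numpy as np


def pca_t2_q(X, n_comp):
    """Return PCA scores, Hotelling's T2, and Q residuals for each sample."""
    Xc = X - X.mean(axis=0)                        # mean-center each feature
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_comp] * s[:n_comp]
    loadings = Vt[:n_comp].T
    score_var = s[:n_comp] ** 2 / (X.shape[0] - 1)
    # T2: distance from the model center inside the PCA model plane.
    t2 = np.sum(scores ** 2 / score_var, axis=1)
    # Q: squared residual, i.e. the part of a sample the model cannot describe.
    residual = Xc - scores @ loadings.T
    q = np.sum(residual ** 2, axis=1)
    return scores, t2, q


# Synthetic fingerprints: 40 samples driven by 2 latent factors plus noise.
rng = np.random.default_rng(0)
n_samples, n_features = 40, 100
latent = rng.normal(size=(n_samples, 2))
pattern = rng.normal(size=(2, n_features))
X = latent @ pattern + 0.1 * rng.normal(size=(n_samples, n_features))
X[0] += 2.0 * rng.normal(size=n_features)          # sample 0: corrupted measurement

scores, t2, q = pca_t2_q(X, n_comp=2)
t2_limit, q_limit = np.percentile(t2, 95), np.percentile(q, 95)
suspects = np.where((t2 > t2_limit) | (q > q_limit))[0]
print("samples to inspect:", suspects)
```

Plotting `t2` against `q` with the two limits drawn as lines gives the figure itself. A high Q points to a sample with structure the model does not capture, while a high T² points to an extreme but otherwise well-described sample.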
Supervised learning does a brilliant job when you already have an idea about your sample data and your focus is to either classify or to quantitate latent information. However, it does require much more care when building and validating a model. Furthermore, one has to keep in mind that class-based models, as often used in food authentication, might not be the best choice. Classes should be made up of samples of the same distribution in order for the classification algorithm to properly do its job. But what happens if the suspicious sample that you are analyzing is not part of the distribution you calibrated your model with? Usually the results don't make sense or the validation fails. In these cases, class‑free algorithms, such as SIMCA, may be more reasonable.
Q. In another recent paper (2), GC–IMS was used for the volatilomic profiling of citrus juices in an authentication study. What challenges arose in the application of the technique for such a study? And what were the conclusions of the paper?
A: We encountered more issues in working with the sample data than we anticipated beforehand (and also those anticipated in the very good discussions we had with our reviewers). One of the most profound ones was that it is absolutely vital to get hold of truly authentic material, which is really a challenge in food authentication in general.
A second issue was a conceptual one. When we began the study, the initial idea was to pair GC–IMS and GC–MS in one machine run to obtain complementary data that could be used for data fusion, which is the term for “merging” the data on a data level to obtain more information than from the individual techniques. IMS and EI-MS seemed like a good pairing, but it turned out that much of the data was not at all complementary, but rather relatively similar in its output. We therefore did not end up with that perfect moment that proved data fusion to be the ideal tool for every question; rather, it showed that data fusion did a good job for some of the analytical questions, and for some it did not. One of the conclusions of the paper was that data fusion of analytical data is a brilliant strategy, but it is no “fire and forget” technology and still requires optimization for routine use.
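For readers unfamiliar with merging on the data level, one common form is low-level fusion: each block is scaled and the blocks are then concatenated feature-wise. The sketch below is a generic illustration under stated assumptions and not the procedure used in the citrus juice study. The function names `autoscale` and `fuse_low_level` are hypothetical, the two blocks are random numbers, and the choice of autoscaling followed by division by the square root of the block width is only one of several possible block-weighting schemes.

```python
import numpy as np


def autoscale(block):
    """Mean-center each feature and scale it to unit variance."""
    centered = block - block.mean(axis=0)
    std = block.std(axis=0, ddof=1)
    std[std == 0] = 1.0                  # leave constant features at zero
    return centered / std


def fuse_low_level(blocks):
    """Concatenate blocks after giving each the same total variance."""
    # Dividing by sqrt(number of features) stops the wider block from
    # dominating the fused matrix purely because it has more variables.
    scaled = [autoscale(b) / np.sqrt(b.shape[1]) for b in blocks]
    return np.hstack(scaled)


# Two synthetic blocks measured on the same 20 samples, with very
# different widths and intensity scales (stand-ins for IMS and MS data).
rng = np.random.default_rng(3)
ims_block = 50.0 * rng.normal(size=(20, 300))
ms_block = 0.01 * rng.normal(size=(20, 80))

fused = fuse_low_level([ims_block, ms_block])
print(fused.shape)
```

The fused matrix can then go into the same PCA or PLS-DA workflow as a single block. Whether it adds information over the individual blocks depends on how complementary they are, which is the point made above.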
Q. Can GC–IMS also be applied for targeted screening?
A: Yes, absolutely. In a target-analysis type of strategy, GC–IMS works very well for polar and medium polar compounds, and with notably less sensitivity for apolar compounds. An example could be short organic acids in fermentations. A user could even go both ways: a targeted approach for specific marker compounds such as linalool oxides in grapefruit juices, while at the same time using an NTS approach for finding unusual patterns. Sensitivity does not change in IMS whether you focus on individual compounds or on the whole pattern. Yet, in some cases, it might make sense to stay with GC–MS instead. An example of this would be if you're in need of mass‑to‑charge ratio information because of regulations.
Q. What are you currently working on?
A: We have several projects running: one large one deals with deep learning in authenticity and quality analysis of citrus oils, juices, and coffee; a second one focuses on data augmentation of GC–IMS data by deep learning, to compensate for too low sample numbers; and the third one is machine/deep learning-based bioprocess analysis, out of which we are going to release a Python‑based open-source toolbox for GC–IMS data. This is hopefully going to help other GC–IMS users, who might be less experienced with programming than we are.
Philipp Weller is a full professor at the Institute of Instrumental Analytics and Bioanalytics at the Mannheim University of Applied Sciences. He also heads the competency center for chemometrics and material analysis CHARISMA (www.charisma.hs-mannheim.de). He holds a Ph.D. in food chemistry (2004) from the University of Hohenheim. His research is mainly focused on data‑driven omics approaches based on the multimodal analysis of complex food matrices in the field of quality and authenticity analysis. His research group develops its own specialized machine learning/deep learning toolboxes for high‑dimensional data and data fusion.