LCGC North America
Part V of this series takes a closer look at discriminant analysis (DA). Discriminant analysis is a supervised method, meaning that it requires some prior knowledge of your samples.
Unlike principal component analysis (PCA) and the clustering methods discussed in the previous parts of this series, discriminant analysis (DA) is a supervised method, meaning that it requires some prior knowledge of your samples. Your samples (observations) must first be assigned to classes (with no ranking implied between the classes) and must be described by the same variables. Whereas cluster analysis groups your observations independently, using only the input data you supply, discriminant analysis uses the classes you specify, based on prior knowledge of, or assumptions about, your observations. For instance, suppose you want to discriminate between cocoa beans (observations) of different geographical origins (classes) based on their molecular composition (variables). Furthermore, cluster analysis provides no explanation of why samples fall into the same or different groups. The purpose of discriminant analysis, on the other hand, is precisely to identify the features that the observations in one class have in common. In the case of cocoa beans of different origins, for example, you may observe that all samples from South America have a higher concentration of one type of molecule than cocoa beans from Asia, while the concentrations of other molecules differ very little between samples.
Let us start with a comparison of two classes of food: fruits and vegetables. For the purpose of demonstration, I chose five fruits and five vegetables and asked a “testing panel” for their perception of the strength of taste, sweetness, acidity, inner color, round shape, and the general pleasantness of eating them. For each criterion, I asked the panel to rate the fruits and vegetables on a scale of 0 to 10, which yielded Table I. Applying DA to Table I yields Figure 1. Because there are only two classes in this example, a single axis is sufficient to represent the variables and observations (F1 accounts for 100% of the variance, while F2 accounts for none). DA shows you the features that the samples in each class have in common. In the discrimination of fruits and vegetables, you can see that the fruit group carries the most interesting features: sweetness, acidity, and pleasantness. The strength of taste lies close to zero and therefore probably does not discriminate between the two groups.
Figure 1: Discriminant analysis of fruits and vegetables, based on Table I.
Let us now apply DA to a chromatography problem, taken from a study I participated in some years ago (1). Brazilian cherries were extracted by supercritical fluid extraction under varied operating conditions (pressure, temperature). The extracts were evaluated for flavor intensity by a trained panel and analyzed by gas chromatography–mass spectrometry (GC–MS). Three levels of flavor intensity emerged, and these were used as classes for a DA based on the peak areas of the compounds identified in the GC–MS chromatograms. With three classes, the two axes F1 and F2 together account for 100% of the variance, so Figure 2 represents all the available information; in other words, no information is lost in the data processing. Note that with more than three classes, the proportion of information shown in a single figure decreases, meaning that part of the information in the sample set is not represented in a two-dimensional plot (2). Identifying the analytes that point in the same direction as the class with the strongest flavor therefore helps identify the analytes responsible for that characteristic flavor. Analytes pointing to the opposite side of the figure may contribute to higher extraction yields but do not participate in the strong flavor (waxes, for instance).
Figure 2: Discriminant analysis of Brazilian cherry extracts, based on peak areas measured by GC–MS (5).
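Two points in the cherry example can be checked with a little arithmetic: for k classes and p variables there are at most min(k - 1, p) discriminant axes, and the share of discrimination shown on each axis is commonly reported as that axis's eigenvalue divided by the sum of all eigenvalues. The sketch below assumes both conventions and uses invented eigenvalues. It also formalizes "pointing in the same direction" as the cosine between a variable's (F1, F2) coordinates and a class centroid; that reading rule, the compound names, and the coordinates are hypothetical illustrations, not values taken from Figure 2.

```python
import math

def n_discriminant_axes(n_classes, n_variables):
    """Upper bound on the number of discriminant axes: min(k - 1, p)."""
    return min(n_classes - 1, n_variables)

def axis_shares(eigenvalues):
    """Percentage of the total discrimination carried by each axis (eigenvalue / sum)."""
    total = sum(eigenvalues)
    return [100.0 * lam / total for lam in eigenvalues]

def rank_by_direction(variable_coords, centroid):
    """Sort variables by the cosine between their (F1, F2) vector and a class centroid.

    A cosine near +1 means the variable points toward that class on the plot;
    a cosine near -1 means it points to the opposite side.
    """
    cx, cy = centroid
    c_norm = math.hypot(cx, cy)
    scores = {
        name: (x * cx + y * cy) / (math.hypot(x, y) * c_norm)
        for name, (x, y) in variable_coords.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Three classes (flavor intensity levels) and many GC-MS peaks: only two axes.
print(n_discriminant_axes(3, 40))   # 2
# Hypothetical eigenvalues: with three classes, F1 + F2 carry everything.
print(axis_shares([3.0, 1.0]))      # [75.0, 25.0]
# With a fourth class, a third axis appears and the F1-F2 plane shows less.
four = axis_shares([3.0, 1.0, 1.0])
print(four[0] + four[1])            # 80.0

# Hypothetical analyte coordinates on (F1, F2) and centroid of the strongest-flavor class.
analytes = {
    "compound_A": (0.9, 0.1),
    "compound_B": (-0.8, 0.2),
    "compound_C": (0.1, 0.9),
}
strong_flavor_centroid = (2.0, 0.0)
for name, cosine in rank_by_direction(analytes, strong_flavor_centroid):
    print(f"{name}: {cosine:+.2f}")
```

Here compound_A would be the candidate flavor contributor and compound_B would sit on the opposite side of the plot, the position the text associates with compounds such as waxes.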
DA is thus an interesting method for identifying the features common to the samples in each class. I rarely see it used by chromatographers, but I hope this discussion has shown you that it certainly deserves some attention.
In the next lesson, we will learn about desirability functions.
Caroline West is an assistant professor at the University of Orléans, in Orléans, France. Direct correspondence to: caroline.west@univ-orleans.fr.