The Value of Chemometrics and Experimental Design to Analytical Chemists

June 14, 2016



At the 40th International Symposium on Capillary Chromatography (ISCC) (held concurrently with the 13th GCxGC Symposium) last month in Riva del Garda, Italy, Professor Robert Synovec from the University of Washington was presented with the Marcel Golay Award for his work combining multidimensional gas chromatography (MDGC) with chemometrics. The award was created to honor the pioneering work on GC capillary columns by Marcel Golay, including studies that led to the theory of dispersion in open tubular columns. Prof. Synovec’s efforts in MDGC research, development, and application made him a natural fit for this award, and I would like to add my own congratulations to him on this well-deserved recognition.

Even so, I believe that most analytical chemists do not appreciate what the world of chemometrics and experimental design statistics can add to their work. Analytical chemists often work in highly complex systems where the interplay of variables (for example, for optimization of instrument performance) or the differences among two or more classes of extremely complex samples (for example, healthy vs. diseased, wild-type vs. mutant, and so forth) may not be immediately apparent from a cursory evaluation. In these cases, there are a variety of mathematical and statistical tools that can be used to tease out important information, but instruction in these techniques is not commonplace.

Our group has learned first-hand the power of some of these techniques, and their use can provide a significant boost to the analytical workflow, whether it be in the development of a method or the final interpretation of data. I will be the first to admit that we are not operating at the same capacity as Prof. Synovec and his group; in fact, much of our work in these areas has involved collaborative efforts with other groups.

Chemometrics is a broad field on its own. Yet, there are a few techniques that jump to mind as among the most useful to analytical chemists.

Factorial design is a way to evaluate the impact and interplay of variables on a particular outcome. For example, we have performed a study to understand the relative impact of different settings in an electrospray ionization (ESI) source on the efficient generation of ion current (1). The output is a rank order of the most important variables or combinations of variables. More recently, in designing a new ambient ionization source, termed continuous flow–extractive desorption electrospray ionization, for analysis of samples in non-ESI-friendly solvents, we used partial (fractional) factorial design to determine which variables were most important to optimize in the hardware setup (2). There are many affordable software packages available to perform factorial design. All one needs to define is which variables are important, the range of values over which each variable should be tested, and which output should be maximized or minimized. Of course, there are different levels of complexity to consider, from partial to full factorial design experiments, and these carry some basic assumptions (for example, a partial design assumes that certain higher-order interactions are negligible, because their effects are confounded with main effects or lower-order interactions); however, factorial design is a very efficient way to test a variable space using a minimal, but effective, number of experiments.
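A minimal sketch of the idea, assuming Python with NumPy: it builds a two-level full factorial design (2^3 = 8 runs) and a half fraction (2^(3-1) = 4 runs, generator C = AB), then estimates each effect as the mean response at the high level minus the mean response at the low level and ranks the effects by magnitude. The factor names (`spray_voltage`, `gas_flow`, `capillary_temp`) and the noise-free `simulated_response` are hypothetical stand-ins, not the settings or data from references (1) or (2); a real study would also include replicates or center points to estimate experimental error.

```python
"""Sketch: two-level factorial designs and effect estimation.

Factor names and the simulated response are hypothetical placeholders,
not settings or data from the studies cited in the text.
"""
from itertools import combinations, product

import numpy as np

# Hypothetical ESI-source factors, each coded at low (-1) and high (+1) levels.
FACTORS = ["spray_voltage", "gas_flow", "capillary_temp"]


def full_factorial(k):
    """Return the 2^k design in coded units (-1/+1), one run per row."""
    return np.array(list(product([-1.0, 1.0], repeat=k)))


def half_fraction():
    """Return a 2^(3-1) fractional design (4 runs) with generator C = AB.

    In this design the main effect of C is aliased (confounded) with the
    A x B interaction, so C's estimate is only trustworthy if A x B is negligible.
    """
    ab = full_factorial(2)
    return np.column_stack([ab, ab[:, 0] * ab[:, 1]])


def contrast_effect(column, response):
    """Effect = mean response at the +1 level minus mean response at the -1 level."""
    return response[column > 0].mean() - response[column < 0].mean()


def estimate_effects(design, response):
    """Estimate main effects and two-factor interactions for a 3-factor design."""
    effects = {}
    for i, name in enumerate(FACTORS):
        effects[name] = contrast_effect(design[:, i], response)
    for i, j in combinations(range(len(FACTORS)), 2):
        label = f"{FACTORS[i]} x {FACTORS[j]}"
        effects[label] = contrast_effect(design[:, i] * design[:, j], response)
    return effects


def simulated_response(design):
    """Noise-free toy response: strong A, weaker B, an A x B interaction, no C."""
    a, b, _c = design.T
    return 100.0 + 10.0 * a + 4.0 * b + 3.0 * a * b


if __name__ == "__main__":
    for title, design in [("Full 2^3 (8 runs)", full_factorial(3)),
                          ("Half fraction 2^(3-1) (4 runs)", half_fraction())]:
        effects = estimate_effects(design, simulated_response(design))
        print(title)
        # Rank effects by magnitude, as in a screening study.
        for name, value in sorted(effects.items(), key=lambda kv: -abs(kv[1])):
            print(f"  {name:32s} {value:+6.2f}")
```

In the half fraction, `capillary_temp` picks up an apparent effect of +6 even though the toy response contains no C term: that value is the A x B interaction, aliased onto C by the generator. This is the kind of assumption a partial design carries in exchange for halving the number of runs.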

Principal component analysis (PCA) is another powerful tool that can be used to visualize the ability of a method to differentiate between complex systems. In PCA, a set of observations of possibly correlated variables is transformed into a set of values of uncorrelated, orthogonal variables (that is, principal components). Projecting the data into this principal component space can help one visualize whether highly complex data sets are similar or different. For example, we have used PCA to demonstrate that matrix-assisted laser desorption–ionization (MALDI)–mass spectrometry (MS) methods could differentiate between different crude oils (3) or even between mosquitoes of different ages and sexes (based on their cuticular lipids) (4). PCA can be performed in either an unsupervised or a supervised fashion. In the former, the entire data set is used to perform the analysis. In the latter, specific features are first selected from the data, and these are then used to generate principal components that discriminate between the classes present. Many commercial chromatography–MS software platforms now provide quite powerful means to perform PCA on collected data sets, and this step can be quite useful, especially in nontargeted metabolomics-type analysis.
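The following sketch, assuming Python with NumPy, computes PCA by singular value decomposition of the mean-centered data matrix: the scores are the projections of each sample onto the orthogonal principal components, and the loadings show which variables drive each component. The two-class data set is synthetic (random noise plus an offset on a few arbitrarily chosen features) and is not MALDI-MS data from references (3) or (4).

```python
"""Sketch: PCA by singular value decomposition on synthetic two-class data.

The "spectra" are random numbers with a class-specific offset on a few
hypothetical features; they stand in for, not reproduce, real MS data.
"""
import numpy as np


def pca(X, n_components=2, autoscale=False):
    """Return (scores, loadings, explained_variance_ratio) for X.

    X has shape (n_samples, n_features). Columns are mean-centered and,
    if autoscale is True, divided by their standard deviation.
    """
    Xc = X - X.mean(axis=0)
    if autoscale:
        Xc = Xc / Xc.std(axis=0, ddof=1)
    # Rows of Vt are the orthogonal principal-component directions.
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]
    loadings = Vt[:n_components].T
    explained = s**2 / np.sum(s**2)
    return scores, loadings, explained[:n_components]


rng = np.random.default_rng(0)
n_per_class, n_features = 10, 50
class_a = rng.normal(0.0, 1.0, size=(n_per_class, n_features))
class_b = rng.normal(0.0, 1.0, size=(n_per_class, n_features))
class_b[:, 10:15] += 8.0  # hypothetical class-specific features

X = np.vstack([class_a, class_b])
labels = np.array(["A"] * n_per_class + ["B"] * n_per_class)

scores, loadings, explained = pca(X)
print("explained variance ratio (PC1, PC2):", np.round(explained, 3))
for label in ("A", "B"):
    pc1 = scores[labels == label, 0]
    print(f"class {label}: PC1 scores from {pc1.min():.2f} to {pc1.max():.2f}")
```

For the supervised variant described above, one would pass only the preselected columns, for example `pca(X[:, selected])`, where `selected` is a hypothetical index array chosen beforehand. Autoscaling (`autoscale=True`) is often used when variables span very different intensity ranges.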

In his award lecture, Prof. Synovec described the use of chemometrics in a study on multivariate selectivity, which showed that the minimum resolution needed to resolve the components of complex mixtures decreases as the dimensionality of the analytical system increases (5). Such analysis shows, for example, the power of coupling an information-rich detection system such as time-of-flight MS to MDGC, rather than flame ionization detection: with MS detection, less resolution is needed in the chromatographic separations to achieve the same performance. The Synovec group has also used combinations of chemometric techniques, such as PCA and parallel factor analysis (PARAFAC), to reveal chemical differences in the metabolomes of different yeast cells (6).
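To make the multivariate-selectivity argument concrete, here is a simplified sketch assuming Python with NumPy. It swaps in classical least squares (CLS) with known pure-component spectra, a much simpler technique than the chemometric methods used by the Synovec group. Two synthetic peaks elute with a resolution of only about 0.17, so a single-channel (FID-like) detector records one merged peak, whereas a 20-channel (MS-like) detector with distinct spectra lets least squares recover each elution profile. Peak positions, widths, and spectra are all invented for illustration.

```python
"""Sketch: an information-rich detector can resolve co-eluting peaks.

Two analytes elute with a chromatographic resolution far below 1.
A single-channel detector sees only their sum, but a multichannel
detector records distinct spectra, so classical least squares (CLS)
can recover each elution profile. All parameters are synthetic assumptions.
"""
import numpy as np


def gaussian_peak(t, center, sigma):
    """Unit-height Gaussian elution profile."""
    return np.exp(-0.5 * ((t - center) / sigma) ** 2)


def resolution(t1, t2, sigma1, sigma2):
    """Chromatographic resolution Rs = 2(t2 - t1) / (w1 + w2), with w = 4*sigma."""
    return 2.0 * (t2 - t1) / (4.0 * sigma1 + 4.0 * sigma2)


def cls_resolve(data, spectra):
    """Solve data approx. profiles @ spectra.T for the profiles by least squares.

    data:    (n_times, n_channels) detector response
    spectra: (n_channels, n_components) known pure-component spectra
    returns: (n_times, n_components) estimated elution profiles
    """
    profiles_t, *_ = np.linalg.lstsq(spectra, data.T, rcond=None)
    return profiles_t.T


t = np.linspace(0.0, 10.0, 201)
sigma = 0.3
profiles = np.column_stack([gaussian_peak(t, 5.0, sigma),
                            gaussian_peak(t, 5.2, sigma)])

rng = np.random.default_rng(1)
spectra = rng.uniform(0.0, 1.0, size=(20, 2))  # hypothetical 20-channel spectra

data = profiles @ spectra.T  # noise-free multichannel (MS-like) data
# Single-channel detector, assuming equal response factors: only the sum is seen,
# so two profiles cannot be recovered from one channel.
fid_like = profiles.sum(axis=1)

estimated = cls_resolve(data, spectra)
print(f"Rs = {resolution(5.0, 5.2, sigma, sigma):.2f}")
print("max profile error:", np.abs(estimated - profiles).max())
```

With real data the spectra are usually unknown and the signal is noisy, which is where methods such as PARAFAC, which estimate elution profiles and spectra simultaneously from three-way data, come in.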

Once you see the power of different chemometric tools in discerning minor differences between highly complex sample sets, it will be hard not to find an application in your own work. Since our group’s initial experience with factorial design, I have pushed my students in class and in the lab to incorporate these time-saving strategies. I think one of the limiting factors in the use of PCA and other more-complex chemometric tools in routine analytical chemistry is that many software platforms are fairly black-box in nature. If you have not been trained to do so, it is not always clear when to use a certain tool or how to go about using it. I have seen many a statistician turn up their nose at applications of PCA to data that did not warrant such treatment. You have to know when to use certain tools and when they are not appropriate. That said, with such guidance, these tools can provide a whole new means of optimizing workflows and exploring data sets. As for me, I will be in contact with Prof. Synovec soon to get his help and guidance on some of our recent work.



(1) M.A. Raji and K.A. Schug, Int. J. Mass Spectrom. 279, 100–106 (2009).

(2) L. Li, S.H. Yang, V. Havlicek, K. Lemr, and K.A. Schug, Anal. Chim. Acta 769, 84–90 (2013).

(3) H.P. Nguyen, I.P. Ortiz, C. Temiyasathit, S.B. Kim, and K.A. Schug, Rapid Commun. Mass Spectrom. 22, 2220–2226 (2008).

(4) E. Suarez, H.P. Nguyen, I.P. Ortiz, K.J. Lee, S.B. Kim, J. Krzywinski, and K.A. Schug, Anal. Chim. Acta 706, 157–163 (2011).

(5) A.E. Sinha, J.L. Hope, B.J. Prazen, C.G. Fraga, E.J. Nilsson, and R.E. Synovec, J. Chromatogr. A 1056, 145–154 (2004).

(6) R.E. Mohler, K.M. Dombeck, J.C. Hoggard, E.T. Young, and R.E. Synovec, Anal. Chem. 78, 2700–2709 (2006).


Kevin A. Schug is a Full Professor and Shimadzu Distinguished Professor of Analytical Chemistry in the Department of Chemistry & Biochemistry at The University of Texas (UT) at Arlington. He joined the faculty at UT Arlington in 2005 after completing a Ph.D. in Chemistry at Virginia Tech under the direction of Prof. Harold M. McNair and a post-doctoral fellowship at the University of Vienna under Prof. Wolfgang Lindner. Research in the Schug group spans fundamental and applied areas of separation science and mass spectrometry. Schug was named the LCGC Emerging Leader in Chromatography in 2009 and the 2012 American Chemical Society Division of Analytical Chemistry Young Investigator in Separation Science. He is a fellow of both the UT Arlington and UT System-Wide Academies of Distinguished Teachers.