Working with Data Scientists to Improve Online Chemical Extraction and Analysis

LCGC North America, September 2022, Volume 40, Issue 9
Pages: 462

In the past, I have openly lamented the seeming lack of National Science Foundation (NSF) support for research into fundamental separation science. This past fall, I was heartened to receive an NSF grant (CHE-2108767) to support explorations into how data science techniques could be used to advance complex on-line extraction and analysis systems. I am willing to eat my past words to some degree. Although I think more federal research support could always be provided, this grant allows us to investigate fundamental relationships between the structures of molecules and their interactions with different materials, in the context of on-line supercritical fluid extraction–supercritical fluid chromatography (SFE–SFC).

Although I do not intend to drill down into the nuts and bolts of the full project, our intent is to combine machine learning and surrogate optimization techniques to efficiently reach optimal SFE–SFC conditions for a wide range of applications. A big challenge is generating a methodology that could work across a wide range of molecule types and materials. SFE and SFC are known to be applicable across much of the space that gas chromatography and liquid chromatography, between them, can analyze. The materials under consideration include those from which the molecules need to be extracted (such as sample materials), as well as the SFC stationary phases, which are used to both trap and separate the molecules.

Deriving parameters associated with a particular molecule that are predictive of its physicochemical properties, as well as its interactions with a wide variety of materials, is also challenging. On the one hand, much can be done by determining linear solvation energy relationships for solutes that have detailed property descriptors, but the set of molecules for which these descriptors are known is limited, and the descriptors are not easy to determine for other molecules of interest. On the other hand, properties that are easily calculable, such as pKa or log P, provide limited predictive capacity when the molecules are present in complex systems.
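For readers unfamiliar with linear solvation energy relationships, one widely used form is the Abraham solvation parameter model, log SP = c + eE + sS + aA + bB + vV, in which the capital letters are solute descriptors and the lowercase letters are coefficients characterizing the system (for example, a stationary phase and mobile phase pair). The short Python sketch below only evaluates that equation. The class and function names are hypothetical, and every number in it is a placeholder chosen for easy arithmetic, not a measured descriptor or a fitted coefficient from this project.

```python
from dataclasses import dataclass


@dataclass
class SoluteDescriptors:
    """Abraham-type solute descriptors (placeholder values only)."""
    E: float  # excess molar refraction
    S: float  # dipolarity/polarizability
    A: float  # hydrogen-bond acidity
    B: float  # hydrogen-bond basicity
    V: float  # McGowan characteristic volume


@dataclass
class SystemCoefficients:
    """System constants, normally obtained by regression on known solutes."""
    c: float
    e: float
    s: float
    a: float
    b: float
    v: float


def log_sp(solute: SoluteDescriptors, system: SystemCoefficients) -> float:
    """Evaluate log SP = c + eE + sS + aA + bB + vV."""
    return (
        system.c
        + system.e * solute.E
        + system.s * solute.S
        + system.a * solute.A
        + system.b * solute.B
        + system.v * solute.V
    )


# Illustrative numbers only, not real descriptor or coefficient values.
example_solute = SoluteDescriptors(E=1.0, S=1.0, A=0.0, B=0.5, V=1.0)
example_system = SystemCoefficients(c=0.0, e=0.5, s=-1.0, a=-0.5, b=-2.0, v=3.0)
predicted = log_sp(example_solute, example_system)
```

The limitation described above shows up directly in this form: the prediction is only as good as the five solute descriptors, and those must be known or measured for each new molecule.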

With that in mind, we have decided to pursue machine learning methods that can encode the actual chemical structure of a compound and correlate it with measured properties. This type of work is being pursued in drug discovery and synthesis, but so far only to a very limited degree in analytical chemistry.

To determine the best encoding strategy for two- and three-dimensional chemical structures, we have been exploring the potential to predict vacuum ultraviolet (VUV) absorption spectra. Techniques based on, for example, time-dependent density functional theory do an okay job, but they rarely reproduce the fine spectral structure we can observe in experimental gas-phase VUV spectra. Using a variety of deep learning methods, we have now had good success in predicting VUV spectra from chemical structures, and even vice versa. This should create a powerful tool to aid VUV detection for gas chromatography, and it should also provide a framework for advancing our work to optimize SFE–SFC.

We will also use a surrogate optimization technique to study and optimize the on-line extraction and trapping variables. Surrogate optimization is an enhanced response surface methodology that incorporates a wider range of functions to handle more complex response surfaces. Our team has been working on the code, focusing first on modeling the electrospray response of different analytes, which is a bit simpler and less instrument-intensive than jumping straight into the SFE–SFC optimization.
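To make the idea concrete, here is a minimal Python sketch of a surrogate optimization loop for a single variable. It is not the team's code. It assumes a Gaussian radial basis function surrogate, a purely greedy choice of the next experiment, and a made-up `toy_response` function standing in for an expensive measurement; all names and numbers are hypothetical.

```python
import math


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_rbf(xs, ys, width=0.2):
    """Fit a Gaussian RBF surrogate through the measured points."""
    def kernel(r):
        return math.exp(-(r / width) ** 2)
    gram = [[kernel(abs(xi - xj)) for xj in xs] for xi in xs]
    for i in range(len(xs)):
        gram[i][i] += 1e-6  # small ridge term for numerical stability
    weights = solve_linear(gram, ys)
    return lambda x: sum(w * kernel(abs(x - xi)) for w, xi in zip(weights, xs))


def toy_response(x):
    """Made-up stand-in for an expensive experiment (higher is better)."""
    return (math.exp(-((x - 0.63) / 0.15) ** 2)
            + 0.3 * math.exp(-((x - 0.20) / 0.10) ** 2))


def surrogate_optimize(objective, n_rounds=8, min_gap=0.02):
    """Alternate between fitting the surrogate and running one new experiment."""
    xs = [0.0, 0.25, 0.5, 0.75, 1.0]           # initial design on a scaled variable
    ys = [objective(x) for x in xs]
    grid = [i / 200 for i in range(201)]        # candidate settings
    for _ in range(n_rounds):
        model = fit_rbf(xs, ys)
        # Skip candidates too close to points that were already measured.
        candidates = [g for g in grid if all(abs(g - x) >= min_gap for x in xs)]
        x_next = max(candidates, key=model)     # cheap search on the surrogate
        xs.append(x_next)
        ys.append(objective(x_next))            # the only "expensive" call
    best = max(range(len(xs)), key=lambda i: ys[i])
    return xs[best], ys[best], len(xs)


best_x, best_y, n_experiments = surrogate_optimize(toy_response)
```

The point of the design is that the costly step, the experiment itself, runs only once per round, while the search for the next setting happens on the cheap fitted model. Practical implementations usually also balance exploration against exploitation rather than always picking the surrogate's current best guess.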

Although I am far from able to code or decode any Python, it has been extremely enlightening to gain a better appreciation for cutting-edge data science. The biggest challenge with such a collaboration is communication. As we are each experts in our domains, trying to bridge the gap requires discussions that often drop down to very basic fundamentals in each of our fields. But, as we begin to be able to better speak each other’s language, the potential for advancement in both of our fields has become clear.

Now, as some students graduate from the team, it will be interesting to see what kind of employment they can find. I have no doubt that analytical chemists with some data science background will be snatched up. I wonder more about the data scientists with some analytical chemistry background. Are analytical chemistry companies considering how they might utilize a hard-core data scientist? I can attest to the exceptional level of data science expertise these industrial engineers possess, and I would like to use this blog to promote them to the analytical chemistry community. Let me know if you are hiring in this area, or if you know someone or some company who is.

Kevin A. Schug is a Full Professor and the Shimadzu Distinguished Professor of Analytical Chemistry in the Department of Chemistry & Biochemistry at The University of Texas at Arlington. Direct correspondence to: kschug@uta.edu