Computer-Assisted Down-Skilling

The Column, The Column-08-04-2021, Volume 17, Issue 08
Pages: 22-24

Column: Incognito

It’s true that the power of computers has changed the working lives of everyone within the developed world, and has allowed the analytical chemistry community to advance beyond anything that might have been imagined, even in the latter years of the last century. However, I do have an unease about the insidious ways in which our use of, and some might say reliance upon, in silico tools may also be affecting the skills of the analytical chemistry community.

Let me start by saying that I do not consider myself a luddite or indeed a naysayer when it comes to computerized systems or tools. In the arc of my own career, I’ve seen the introduction of computers into the laboratory and the huge developments realized through their use. I’m not going to argue here that we should revert to some pre-silico dark age in which our scientific capabilities and productivity are regressed due to any implied paranoia regarding computer technology. However, I do feel that it’s worthwhile to highlight some observations regarding our, perhaps unwitting, reliance on in silico tools and their potential effects on the basic understanding and skills that I believe are necessary for any good analytical chemist.

Let’s start with molecular drawing programs. Whilst these are game-changing in terms of speed and efficiency of data presentation and report production, I wonder how many times structures are loaded using these programs without much attention to the “chemistry” of the molecules. When I can use a freeware program that allows me to search for, say, aspirin as a text string and the structure magically appears, this is not like having to draw the structure by hand, which, in a psychological sense, creates a bond (no pun intended) between the person drawing it and the structure itself. The act of drawing the structure, either literally or digitally, gives me a sense of the form and composition of the molecule. This may seem very small beer, but from my own experience I know that I understand less about structures I import than about those I’m forced to draw, either digitally or literally.

Some of these drawing programs also allow the prediction of analyte physicochemical properties such as log P, log D, pKa, and the like. Again, for a chromatographer these predictions can save a lot of searching time and can be highly informative when developing chromatographic separations. However, there are often a number of variables that need to be defined to make the prediction more accurate. Indeed, the accuracy of the predictions is driven by the training or reference datasets available to develop the algorithms on which the predictions are made. All too often I see predicted values used without proper setting of the controlling variables that guide the prediction process, or values taken verbatim and without knowledge of the inherent error, often to the detriment of the chromatographic method development process. I get that the alternatives to prediction are not appealing, and that having to search endlessly for literature reference values or carry out the determinations empirically can be time-consuming and sometimes futile. But we seem to sleepwalk into believing that what the computer tells us must be correct, somehow without the ability to question or assess the possible inaccuracies of computed values. It would be great to see analysts “calibrating” the accuracy of predicted values by comparing them to literature values from trusted sources and investigating the source of any variation between results. That level of scientific curiosity and the quest for rigour seem to be diminishing, and the accessibility of such computed data doesn’t help to explain the derivation of the data or its meaning in terms of separation science. Perhaps we simply don’t do a good enough job of educating younger scientists on the possibility that computer-generated data of this type may simply be approximate and that the magnitude of any such errors needs to be taken into account.
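As a rough illustration of what such a “calibration” exercise might look like, the sketch below compares predicted values against literature values and reports the bias and the mean absolute error. The compound names and every number are placeholders invented purely for illustration, and the function names are hypothetical; in practice the inputs would come from the prediction software and from trusted literature sources.

```python
from statistics import mean


def prediction_errors(predicted, literature):
    """Return predicted-minus-literature differences for compounds present in both."""
    shared = sorted(set(predicted) & set(literature))
    return {name: predicted[name] - literature[name] for name in shared}


def summarize(errors):
    """Return (mean signed error, mean absolute error) of the differences."""
    values = list(errors.values())
    return mean(values), mean(abs(v) for v in values)


# Placeholder pKa values for illustration only; these are NOT real data.
predicted_pka = {"compound_a": 4.0, "compound_b": 9.5, "compound_c": 7.0}
literature_pka = {"compound_a": 3.5, "compound_b": 9.0, "compound_c": 7.5}

errors = prediction_errors(predicted_pka, literature_pka)
bias, mae = summarize(errors)

for name, diff in errors.items():
    print(f"{name}: predicted - literature = {diff:+.2f}")
print(f"bias = {bias:+.2f}, mean absolute error = {mae:.2f}")
```

A small bias alongside a large mean absolute error would suggest that the predictions scatter in both directions rather than being consistently high or low, which is exactly the kind of observation worth investigating before a predicted value is used to set a mobile phase pH.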

Chromatography data systems (CDSs) are another area of computing technology that I believe to be rich in the propensity for down-skilling. Again, the merits of the CDS are undoubted and it would be impossible to dispense with them, but there is so much that goes unexplained within the “black box” that ultimately dispenses the “answer” to our analyses. I come from a world of “standardized” chart paper, where peaks were cut out by hand (yes, literally using scissors) and weighed in order to derive the relative “response” of analyte and internal standard or calibrant. There will be few readers who remember this and it may seem unbelievable to most, but that’s how we used to do quantitative analysis prior to the invention of the computing integrator. How many of us understand the principles of the integration algorithms of a modern CDS, or even change the integration parameters to better model the “baselines” used when integrating a chromatographic peak to determine its area? When was the last time you optimized these parameters to improve quantitative accuracy by improving the modelling of the computed baseline? Or to cope with highly convoluted chromatograms where a perpendicular must be dropped, valley-to-valley events are required, or the baseline skimmed? I for one would not be able to mathematically check that the integral function had returned an accurate estimate of the peak area. One could argue that it is not important to be able to do this, but I do think it’s important to know at least something of the process used to ultimately generate the peak area, the user-defined variables that can affect the result, and the error associated with that measurement.
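To make the point about baseline placement concrete, here is a minimal sketch of one simple way a peak area could be computed: trapezoidal summation above a straight baseline drawn between a chosen start and end point. It is not the algorithm of any particular CDS, and the chromatogram is synthetic (a single Gaussian peak on a sloping baseline), so the true area is known and the effect of the integration limits can be seen directly.

```python
import math


def integrate_peak(times, signal, start, end):
    """Trapezoidal area above a straight baseline drawn from index start to index end."""
    t0, t1 = times[start], times[end]
    y0, y1 = signal[start], signal[end]

    def baseline(t):
        # Linear interpolation between the two baseline anchor points.
        return y0 + (y1 - y0) * (t - t0) / (t1 - t0)

    area = 0.0
    for i in range(start, end):
        h_left = signal[i] - baseline(times[i])
        h_right = signal[i + 1] - baseline(times[i + 1])
        area += 0.5 * (h_left + h_right) * (times[i + 1] - times[i])
    return area


# Synthetic chromatogram (assumed values): Gaussian peak on a sloping baseline.
times = [i * 0.01 for i in range(1001)]  # 0 to 10 min in 0.01 min steps
height, centre, sigma = 100.0, 5.0, 0.2
signal = [
    2.0 + 0.5 * t + height * math.exp(-((t - centre) ** 2) / (2 * sigma ** 2))
    for t in times
]

true_area = height * sigma * math.sqrt(2 * math.pi)  # analytical Gaussian area
wide = integrate_peak(times, signal, 350, 650)    # limits well outside the peak
narrow = integrate_peak(times, signal, 470, 530)  # limits set partway up the peak

print(f"true area   = {true_area:.2f}")
print(f"wide limits = {wide:.2f}")
print(f"narrow      = {narrow:.2f}")
```

With the limits set well outside the peak, the computed area agrees closely with the analytical value. With the limits set partway up the peak flanks, the baseline is drawn through the peak itself and only roughly half of the true area is recovered, which is the sort of silent error that badly chosen integration events can introduce.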

I’ve written about this several times previously within the Incognito column, but how many of us really, truly understand how the final “answer” is computed within the CDS? Could you reproduce the value by calculating from first principles? How is the calibration line derived? Is it equally weighted? What is the treatment of the intercept? Are standard purities taken into account? How are the standard and sample dilutions taken into account? All are very pertinent questions that need to be considered to compute the value using graph paper and the correct equation to define the calculation. Again, is all of this important? Well, yes it is, especially if the “answer” derived via the CDS is incorrect or accepted verbatim without any consideration of the associated error. In my early career, all equations were derived from first principles, calibration was done by hand using graph paper, and the results were calculated manually with the help of a calculator. I can say that I can spot errors in the way that a CDS has been configured to calculate a final “result” more easily than my younger counterparts, and that’s not because I’m smarter than them; it is simply that the experience gained in my early career has taught me to work methodically and forensically through the calculation and the input variables to find the errors. That’s very rarely taught within industry these days, as far as I can tell.
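For readers who want to see what “from first principles” might involve, the sketch below fits a calibration line by least squares, with or without weighting, and back-calculates a sample result with a dilution factor applied. Everything here is an assumption made for illustration: the standard concentrations, purity, peak areas, the choice of 1/x weighting, and the function names are hypothetical, and a real CDS may handle each step differently.

```python
def fit_line(x, y, weights=None):
    """Weighted least-squares fit of y = slope * x + intercept (equal weights by default)."""
    if weights is None:
        weights = [1.0] * len(x)
    total = sum(weights)
    x_bar = sum(w * xi for w, xi in zip(weights, x)) / total
    y_bar = sum(w * yi for w, yi in zip(weights, y)) / total
    sxy = sum(w * (xi - x_bar) * (yi - y_bar) for w, xi, yi in zip(weights, x, y))
    sxx = sum(w * (xi - x_bar) ** 2 for w, xi in zip(weights, x))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def sample_result(response, slope, intercept, dilution_factor):
    """Back-calculate the concentration in the original, undiluted sample."""
    return (response - intercept) / slope * dilution_factor


# Placeholder calibration data for illustration only; these are NOT real results.
nominal = [1.0, 2.0, 5.0, 10.0, 20.0]      # nominal standard concentrations
purity = 0.98                               # assumed purity of the reference standard
conc = [c * purity for c in nominal]        # purity-corrected concentrations
areas = [5.1, 7.8, 16.9, 31.2, 61.0]        # hypothetical peak areas

slope_u, intercept_u = fit_line(conc, areas)                        # equal weighting
slope_w, intercept_w = fit_line(conc, areas, [1 / c for c in conc])  # 1/x weighting

for label, s, b in [("unweighted", slope_u, intercept_u), ("1/x weighted", slope_w, intercept_w)]:
    result = sample_result(40.0, s, b, dilution_factor=10.0)
    print(f"{label}: slope = {s:.4f}, intercept = {b:.4f}, sample = {result:.2f}")
```

Running both fits on the same data shows how the weighting scheme and the fitted intercept feed through to the reported result, and each of the questions above (purity, dilution, intercept, weighting) corresponds to one visible line of the calculation.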

We have been recruiting for structural elucidation analytical staff lately, seeking candidates who can use mass spectrometric (MS) data to find the structures of unknown analytes not contained in commercially available or in-house MS libraries. In almost every interview, we have been asked what high-resolution MS instruments we use and what software we use for the task. It has been almost impossible to find candidates who can take a gas chromatography (GC)–MS or liquid chromatography (LC)–MS spectrum and chromatographic data, alongside some “hints” about the types of analyte we believe might be related to the unknown, and work from first principles and a solid knowledge of organic chemistry and MS ionization processes to figure out the structure of the unknown. Again, given that high-resolution MS data in combination with very powerful software packages can suggest empirical formulae, mine possible structural features of the analyte, and in some cases suggest analyte structures from spectral fragmentation patterns, why on earth should I be concerned about this? Well, you may consider my rage against these particular machines to be futile, but I’m the one who sees the suggested structures of unknown analytes and, believe me, context and a wider perspective are everything. If the candidates had some knowledge of basic organic chemistry, chromatographic science, and the ionization processes in play in LC–MS and GC–MS analysis, and if they had used the supplementary information about the types of materials being analyzed, they would not have come up with some of the suggested structures that we have been presented with. I must pause here and say that by far the best candidates we see for these roles are those who have a good understanding of synthetic organic chemistry. So, the questions I might pose are, “Do our high-resolution MS capabilities and highly capable but very expensive software down-skill the analytical chemist of today? Do we provide the training and learning necessary to give the underpinning knowledge and skills to understand the software structure suggestions and to look with a critical eye on the validity, or otherwise, of the predicted structure? Are basic skills in structural elucidation being eroded to the point at which we cannot properly question the computer predictions and suggest alternatives?”
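Two of the simplest sanity checks on a software-suggested formula can be done by hand, and the sketch below shows them in code: the mass error in parts per million between a measured and a theoretical monoisotopic mass, and the number of rings plus double bonds implied by the formula. It assumes neutral molecules containing only C, H, N, and O, ignores the ion type (adduct and electron mass), and uses a hypothetical measured value; the function names are illustrative and do not belong to any vendor package.

```python
import re

# Monoisotopic masses (u) of the most abundant isotopes.
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "N": 14.0030740, "O": 15.9949146}


def parse_formula(formula):
    """Parse a simple formula such as 'C9H8O4' into element counts (C, H, N, O only)."""
    counts = {}
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        if element not in MONOISOTOPIC:
            raise ValueError(f"unsupported element: {element}")
        counts[element] = counts.get(element, 0) + (int(number) if number else 1)
    return counts


def monoisotopic_mass(formula):
    """Neutral monoisotopic mass of the formula."""
    return sum(MONOISOTOPIC[el] * n for el, n in parse_formula(formula).items())


def ppm_error(measured, theoretical):
    """Signed mass error in parts per million."""
    return (measured - theoretical) / theoretical * 1e6


def double_bond_equivalents(formula):
    """Rings plus double bonds for a C/H/N/O formula: C - H/2 + N/2 + 1."""
    c = parse_formula(formula)
    return c.get("C", 0) - c.get("H", 0) / 2 + c.get("N", 0) / 2 + 1


candidate = "C9H8O4"    # aspirin, used here only as a familiar example
measured = 180.0425     # hypothetical measured neutral mass, NOT real data

theoretical = monoisotopic_mass(candidate)
print(f"theoretical mass = {theoretical:.5f}")
print(f"mass error       = {ppm_error(measured, theoretical):+.2f} ppm")
print(f"rings + double bonds = {double_bond_equivalents(candidate)}")
```

For aspirin the formula implies six rings plus double bonds, which matches the aromatic ring and the two carbonyl groups. A suggested structure that cannot account for the implied unsaturation, or that makes no sense given the sample’s known chemistry, deserves a critical eye however small the mass error.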

Readers must decide for themselves if I’m just an old-timer yearning for the good old days and whether the examples cited above resonate with their own experiences of the trends within our industry. From my perspective, I do value the capabilities and advances that in silico tools have brought to the analytical armoury and would never wish to regress from the very fortunate position in which we find ourselves. My beef is that we have stopped teaching the fundamentals, stopped helping analytical chemists to properly understand how and why the computations are being made, and, most importantly, we have robbed them of the ability to critically evaluate results and accurately assess and communicate error. The answer to this is to address these latter points in undergraduate, postgraduate, and industrial education and training but, like any good self-help situation, we first must admit that we have a problem!