Trends and Developments in Data Handling 2022

Column, October 2022, Volume 18, Issue 10
Pages: 2–7

A snapshot of key trends and developments in data handling according to selected panellists from the chromatography sector.

Q. What is currently the biggest problem in data management for chromatographers?

Christoph Nickel: Currently, one of the major challenges is the increasing number of samples to run, analyses to conduct, and data to review, while keeping data quality high and detecting any potential errors. A major driver for this is the increasingly complex separation and detection techniques required to analyze biotherapeutics; as a result, chromatographers increasingly need to use mass spectrometry (MS). Furthermore, consolidating all this information into an easily viewable and shareable format at a central location is a massive challenge. This is particularly important for the information required to make an informed final review and approval. A typical example is the weighing results for calibration standards generated by a balance, which should be linked to the calibration data in the chromatography data system (CDS) to confirm proper calibration and, ultimately, accurate quantitation of the unknown compounds.

Ofrit Pinco: One of the biggest challenges for chromatographers is that data from different vendors cannot be combined and analyzed collectively because there is no unified data format. Chromatographers can only review data from one system at a time to answer specific questions. This makes it harder to access data and conduct secondary analysis across multiple data systems. To address this challenge, several pharmaceutical companies have sponsored the Allotrope Foundation (1), whose initiative is to unify data formats. In addition, some start-ups are building tools to translate data into a common format. However, both initiatives will take time and collaboration to overcome this challenge.

Anne Marie Smith: Chromatographers use a variety of different instruments from various vendors, each with their own proprietary data formats. One big problem area is bringing together and managing the data from the different electronic systems. The ability to normalize all that disparate data while retaining the ability to interrogate it, as in native data processing software, is very beneficial to chromatographers. Since chromatography data are so ubiquitous, effective management in a central, accessible place is essential.

Björn-Thoralf Erxleben: Handling large quantities of data requires a lot of time for data processing and interpretation. Additionally, depending on the local situation, secure data storage and archiving can be time consuming, and administration of these processes gets more and more complex.

Although there are network-based, multi-instrument-capable CDSs, all vendors support and maintain their own proprietary data formats first; the data file formats for photodiode array (PDA) detectors and for MS instruments are closed. Even when vendors provide drivers for other CDSs, several user requests and wishes remain unmet. On the hardware side, hybrid configurations may involve different operating workflows, and parameters cannot easily be transferred between vendors. As a result, directly comparing data from different instruments can be difficult.
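To make the normalization idea raised by the panellists concrete, the following is a minimal sketch, assuming two hypothetical vendor exports that name fields differently and report retention time in different units. The field names, the `PeakRecord` type, and the converter functions are all invented for illustration; they are not the Allotrope data model or any vendor's actual format. The point is only that once each source is mapped into one common record, a single query can span instruments.

```python
from dataclasses import dataclass


# Hypothetical common record. Field names are illustrative only and are
# not taken from the Allotrope or any vendor specification.
@dataclass(frozen=True)
class PeakRecord:
    instrument: str
    compound: str
    retention_time_min: float
    area: float


def from_vendor_a(raw: dict) -> PeakRecord:
    # Hypothetical vendor A export: retention time already in minutes.
    return PeakRecord(
        instrument=raw["system"],
        compound=raw["name"],
        retention_time_min=float(raw["rt_min"]),
        area=float(raw["area"]),
    )


def from_vendor_b(raw: dict) -> PeakRecord:
    # Hypothetical vendor B export: retention time in seconds.
    return PeakRecord(
        instrument=raw["InstrumentID"],
        compound=raw["Analyte"],
        retention_time_min=float(raw["RetTimeSec"]) / 60.0,
        area=float(raw["PeakArea"]),
    )


# One converter per source format; adding a vendor means adding one entry.
CONVERTERS = {"vendor_a": from_vendor_a, "vendor_b": from_vendor_b}


def normalize(exports: list[tuple[str, dict]]) -> list[PeakRecord]:
    """Convert (vendor, raw_record) pairs into one list of common records."""
    return [CONVERTERS[vendor](raw) for vendor, raw in exports]


if __name__ == "__main__":
    exports = [
        ("vendor_a", {"system": "LC-01", "name": "caffeine",
                      "rt_min": "2.41", "area": "15230"}),
        ("vendor_b", {"InstrumentID": "LC-07", "Analyte": "caffeine",
                      "RetTimeSec": "147.0", "PeakArea": "14980.5"}),
    ]
    # Once normalized, one query spans both instruments.
    for r in normalize(exports):
        if r.compound == "caffeine":
            print(r.instrument, round(r.retention_time_min, 2), r.area)
```

Keeping the unit conversion inside each converter means downstream analysis never has to know which vendor produced a record, which is the property the panellists describe as retaining the ability to interrogate disparate data.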

Q. What is the future of data handling solutions for chromatographers?

Christoph Nickel: I see three main trends. First, a radically streamlined and simplified user experience with more “fit-for-purpose” applications. Second, the agglomeration of data from different sources in a single central repository in a consolidated format, often referred to as a data lake. This will reduce the time needed for data review and analysis because it eliminates manual data transfers and the manual consolidation of spreadsheets or PDF files. Third, increasing automation of routine tasks, using machine learning (ML) for routine reviews and algorithm-assisted data mining to identify patterns, trends, outliers, or deviations.

In addition, data will continue to become available anywhere, anytime, so there will be no further need to be in the laboratory or at the instrument to analyze your data, and no need to install and maintain software applications on your device. Everything will be available online.
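As a simple illustration of algorithm-assisted outlier detection of the kind Nickel mentions, the sketch below flags runs whose value deviates strongly from the rest, using the modified z-score (median and median absolute deviation) with the commonly cited cut-off of 3.5. This is a plain statistical rule, not machine learning, and the retention-time values in the usage example are invented for illustration.

```python
from statistics import median


def flag_outliers(values: list[float], threshold: float = 3.5) -> list[int]:
    """Return indices whose modified z-score exceeds the threshold.

    Uses the median and the median absolute deviation (MAD), so a single
    aberrant run does not inflate the spread estimate the way it would
    with a mean/standard-deviation z-score.
    """
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # More than half the values are identical; in this sketch any
        # value that differs from the median is flagged.
        return [i for i, v in enumerate(values) if v != med]
    return [
        i for i, v in enumerate(values)
        if abs(0.6745 * (v - med) / mad) > threshold
    ]


if __name__ == "__main__":
    # Invented retention times (min) of one reference peak across runs.
    rts = [5.02, 5.01, 5.03, 5.00, 5.02, 5.41, 5.01]
    print(flag_outliers(rts))
```

In a routine review, the flagged indices would point a reviewer to the runs worth a closer look, rather than requiring every run to be inspected manually.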

Ofrit Pinco: The future of data handling goes well beyond acquiring and analyzing data generated by a single chromatography system. As new tools and solutions are developed, and as researchers are expected to extract more information from their samples, chromatographers will need to access and analyze data from multiple instruments and data systems simultaneously. Right now, chromatographers have multiple tools to help them focus on multiple areas, but future tools will allow them to review information from the whole workflow in one space. This has the potential to enable researchers to answer more questions. It will also be valuable as compliance requirements and regulations become stricter. New tools will also give research teams insight into historical instrument performance data, leading to increased operational efficiency and even predictive maintenance. Data handling will only continue to become more streamlined and more advanced through the use of these types of tools combined with artificial intelligence (AI) and ML. These are the next steps needed to reach the lab of the future and Industry 4.0.

Anne Marie Smith: The cloud is the future of data handling. All systems will connect to the cloud. It is secure, simplifies the infrastructure (thereby reducing costs), provides better performance, and is scalable. Depending on the system you choose, it can also be “future proof”. It is important, however, that systems architects take into account the scientists’ data access requirements. Whether the data need to be accessed immediately upon generation or at a later date should inform how data management solutions are architected, to ensure a seamless transition to cloud-based data access.

Björn-Thoralf Erxleben: We already see cloud-based data storage options at several CDS installations, and this trend will continue because it makes data access and sharing far easier. At the same time, it will require a new level of data security and data protection. A positive aspect is that data storage and archiving are outsourced and will not tie up IT resources on-site.

AI will be implemented in “standard” software for peak picking, processing, and identification using database packages. Self-learning algorithms will support method optimization and estimate retention times based on the structural information of the analyte.

Developing and maintaining special programs and databases for research use is a time- and resource-intensive task. If such a program becomes an accepted standard used across the industry, instrument vendors will have to provide data compatible with it. There may also be agreements on new standard data formats, which will be supported either natively or via conversion.

Last, but not least, it would be nice to see workflows and parameter definitions harmonized across vendors, and to see data processing, at least for two-dimensional (2D) data, become a common piece of software accessible via the web, available to all chromatographers after logging on to a dedicated cloud.

Q. What one recent development in “Big Data” is most important for chromatographers from a practical perspective?

Christoph Nickel: While it might sound boring, the biggest impact on the analytical laboratory comes from the ability to bring together data from all instruments and devices working on the same function, the same research, or the same discipline. Having the data available in one location is a mandatory prerequisite for every analysis, insight, or application of algorithms. So, any effort chromatographers make to bring their data together takes them a major step closer to fast, efficient, and error-free analysis, moving from reactive review or error handling to proactive problem prevention. This is made possible by the virtually unlimited computing power available in the cloud, which is becoming more mainstream for deploying globally connected systems.

Ofrit Pinco: AI and ML have grown rapidly in the last few years, and people are realizing more of their advantages every day. Take search engines, for example: Google has drastically changed the way we search for answers, plan our travels, and consume news. As AI and ML technologies mature, more scientists with these skill sets will enter the chromatography field and apply these technologies in the laboratory.

In the current state, chromatographers analyze data based on specific questions, with the aim of confirming predefined hypotheses. Through AI and ML, chromatographers may be able to uncover new trends and patterns from a large set of unstructured data, giving chromatographers insights they didn’t know existed. This will greatly facilitate the progress of scientific research in the long run.

Anne Marie Smith: AI and ML can help find relationships and insights in an otherwise overwhelming amount of data, providing potential predicted outcomes. While AI and ML can drastically improve processes, they are only as good as the data they are given. For instance, chromatographers work with a multitude of possible instrument combinations, so if data collection is of poor quality or incomplete, the results may be biased.

Björn-Thoralf Erxleben: Analytical intelligence features, such as auto-recovery, start-up, self-check, and feedback. Apart from providing additional automation, these enable quick and easy hardware diagnostics and help to decrease system downtime. By applying more process analytical technology (PAT) features and sending more feedback from the system to the central server, chromatographers can focus on their work and spend less time caring for the hardware.

Q. What obstacles do you think stand in the way of chromatographers adopting new data solutions?

Christoph Nickel: One of the greatest challenges is the need to comply with good manufacturing practice (GMP) and data integrity guidelines. The validation guidelines were drafted for on-premise deployment of software, and laboratories now need to adapt their validation principles to a more decentralized, globally connected world, often with abstracted storage. In simple terms, proving that data integrity is maintained now requires you to include at least one additional player: the host of your data. This increases the complexity of validation tasks and requires a change in how validation is thought about and conducted.

Another significant obstacle is the potential delay in data access caused by the need to transfer the data from the laboratory to the central location or entity and access it there. While internet and cloud performance are fast enough to provide a positive user experience, the in-house infrastructure is often the rate-limiting step. For example, a single low-performance element in your local area network, such as an old 10 Mbit/s switch in an otherwise 100 Mbit/s network, can slow down all data traffic passing through it by a factor of 10. A suitable infrastructure is therefore a critical prerequisite for transferring data into central repositories, and this increases your dependency on your IT infrastructure.
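The factor-of-10 slowdown follows from simple arithmetic: a transfer can go no faster than the slowest link on its path. The sketch below computes an idealized transfer time under that assumption, ignoring protocol overhead and competing traffic; the 500 MB file size and the link speeds in the example are illustrative assumptions, not figures from the panellists.

```python
def transfer_time_s(size_megabytes: float, link_speeds_mbit_s: list[float]) -> float:
    """Idealized time (s) to move a file across a path of network links.

    Throughput is limited by the slowest link (the bottleneck). Protocol
    overhead, latency, and contention are ignored in this sketch.
    """
    bottleneck = min(link_speeds_mbit_s)
    size_megabits = size_megabytes * 8  # 1 byte = 8 bits
    return size_megabits / bottleneck


if __name__ == "__main__":
    # Illustrative 500 MB data file on an all-100 Mbit/s path...
    print(transfer_time_s(500, [100, 100, 100]))
    # ...and the same path with one old 10 Mbit/s switch in the middle.
    print(transfer_time_s(500, [100, 10, 100]))
```

Under these assumptions the first transfer takes 40 s and the second 400 s: upgrading every other component does not help while the single slow switch remains on the path.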

Ofrit Pinco: A few factors contribute to this slow adoption. First is the complex laboratory ecosystem. Because systems and solutions are interconnected, any change must be evaluated for its impact on all components within the ecosystem. Also, downtime needs to be minimized, as many laboratories are in production and operate on a 24/7 schedule. After implementation, regulated labs must validate the change. Additional training is also required for technicians to adopt new standard operating procedures (SOPs) and avoid errors. As a result, adopting new solutions is difficult and time-consuming.

Anne Marie Smith: Adopting new data solutions is a daunting task. It involves time to set up the system in a useful way, time for validation and implementation—ensuring the system meets the data integrity requirements and ensuring the data are secure—and time to learn the new system. These factors often lead to reluctance to change, which can stand in the way of adoption of useful solutions.

Björn-Thoralf Erxleben: Changing an established workflow is a critical matter for analytical laboratories, and operators do not always have a strong analytical background and experience. New user interfaces, new operating workflows, and, in the worst cases, new definitions for familiar parameters in the software mean a lot of training for users before a new solution is finally adopted. The risk of longer downtime is high. Right now, we are confronted with objections to installing the service packs or patches needed for compatibility with modern operating systems and virus protection.

New features and functionality need to prove their advantage before new software is rolled out and established. Another aspect is data comparison and transfer: what happens to the “old” data? Legislation requires that old data and results be kept and provided for inspection if needed. Is maintaining a piece of the old software a good solution? This is especially questionable when it means that knowledge of how to operate the old software must also be retained.

Q. What was the biggest accomplishment or news in 2021/2022 for data handling?

Christoph Nickel: The adoption of the cloud, with virtually unlimited storage and computing power that enable data agglomeration and new levels of advanced, super-fast data analysis.

Ofrit Pinco: In the past two years, more data scientists have entered the analytical industry and brought changes to it. Data scientists are skilled at analyzing and extracting insights from structured and unstructured data using scientific methods, algorithms, and systems. They can be good complementary partners to application scientists, who have backgrounds in chemistry and understand the use cases and workflows in the laboratory. Together with application scientists, data scientists can use models and algorithms to analyze and visualize historical data, letting application scientists relate new findings to workflows and experiments.

In addition to scientific findings, data scientists may also improve laboratory operational efficiency by evaluating instrument performance and data management metrics. Data scientists may provide new perspectives on how laboratories can better store, organize, and manage data.

Anne Marie Smith: Streaming data live as they are acquired locally and storing them in a cloud instance of a CDS has improved IT systems. Combined with recent developments in Big Data, this simplifies data movement for downstream data analytics.

Reference

1. Allotrope Foundation, https://www.allotrope.org

Christoph Nickel is the Director of Software Marketing at Thermo Fisher Scientific.

Ofrit Pinco is Senior Product Manager at Agilent Technologies.

Anne Marie Smith is Product Manager, Mass Spectrometry & Chromatography at ACD/Labs.

Björn-Thoralf Erxleben is Senior Manager at Shimadzu Europa in charge of the pharmaceutical and biopharmaceutical market.