Trends and Developments in Data Handling

January 20, 2021

By Lewis Botcherby and Alasdair Matheson

Publication: Column, January 2021, Volume 17, Issue 01, Pages 22–26

A panel discussion on the latest advances and future developments in data handling.

LCGC: What is currently the biggest problem in data management for chromatographers?

Shawn Anderson: The biggest current challenge for chromatographers in managing data is integrating chromatography results into a larger data ecosystem. While there are a variety of different scenarios, a common example is where a lab manager would like the laboratory information management system (LIMS) connected to a chromatography data system (CDS). The sheer number of vendors, platforms, and IT environments in this solution space can be daunting. Fortunately, some options are now available that enable this CDS–LIMS connection with a few straightforward configuration steps.

Darren Barrington-Light: Data security and integrity, and ensuring this during the data’s entire life cycle, are key concerns for many chromatographers. This extends to nearly all laboratories where data is relied on for product release or any form of quality control. In compliant laboratories, the most recent focus is on data integrity and demonstrating control of processes by ensuring appropriate reviews of data are taking place. This is not about simply viewing audit trails and trying to identify anomalies but really understanding what events occurred in the system during the data life cycle that could affect results, such as manual changes on an instrument during an injection or manual integration of a chromatogram. Chromatography data systems that can highlight these incidents make it easy for reviewers to discover whether they did or, importantly, did not occur and investigate accordingly.
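As a rough illustration of the kind of review described above, the following minimal Python sketch groups audit-trail events by injection and surfaces only those that could affect results. The event names, the `AuditEvent` record, and the sample data are all hypothetical; real CDS audit trails use vendor-specific formats, and this is not any particular product's API.

```python
from dataclasses import dataclass

# Hypothetical event types a reviewer would want surfaced; real CDS
# audit trails use their own vendor-specific names.
REVIEW_EVENT_TYPES = {"manual_integration", "manual_instrument_change"}


@dataclass(frozen=True)
class AuditEvent:
    injection_id: str
    event_type: str
    user: str


def events_needing_review(events):
    """Group review-relevant events by injection.

    Every injection seen in the input gets an entry, so an empty list
    positively shows that no flagged event occurred for that injection.
    """
    flagged = {event.injection_id: [] for event in events}
    for event in events:
        if event.event_type in REVIEW_EVENT_TYPES:
            flagged[event.injection_id].append(event)
    return flagged


# Illustrative data only.
events = [
    AuditEvent("inj-001", "acquisition_started", "analyst_a"),
    AuditEvent("inj-001", "manual_integration", "analyst_a"),
    AuditEvent("inj-002", "acquisition_started", "analyst_b"),
]

report = events_needing_review(events)
for injection_id, hits in report.items():
    status = ", ".join(hit.event_type for hit in hits) or "no flagged events"
    print(f"{injection_id}: {status}")
```

Listing clean injections explicitly, rather than omitting them, reflects the point that a reviewer needs to see both that an event did occur and that it did not.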

Another challenge we consistently hear from our customers and industry leaders is lack of integration between and across systems because of proprietary data formats or software incompatibility. For example, mass spectrometry (MS) usage in routine laboratories is becoming more prevalent and many software packages are not optimized for compliance or networking of these instruments, leading to siloed data with inherent backup and archiving risks. This “siloing” ultimately limits the value of the data collected. A laboratory-wide CDS that is designed to support both chromatography and MS in a network environment can connect these workflows and deliver compliance with a common data format stored in one central location.

Proprietary data formats also lead to data silos, but through more widespread implementation of open standards for data we can begin to realize the full value of chromatography data.

Graham A. McGibbon: Productivity is a prime goal of chromatographers, so a major daily challenge is to rapidly and easily collect, compare, and interpret data to deliver results — from a collection of different instruments, CDSs, and methods — for multiple samples and injections. Even with “enterprise-scale” systems there will surely be heterogeneity in hardware and software versions, detectors, methods, various metadata, and digital data formats that can complicate rapid results assembly for decision-making. Some may face difficulties innovating to enhance quality while satisfying compliance requirements.

Stephen McDonald: Our focus has been in the laboratory performing routine analyses where chromatography data systems are the most actively used tools to generate data. The most important requirement that we gathered was that any solution used in these types of laboratory-based businesses must be able to operate in a state that complies with local regulatory requirements. Beyond the requirements of compliance, one must then resolve the issues facing chromatographers. The first is consistent and accurate integration of complex chromatograms, particularly for analyses such as impurity and stability evaluation. The second is the ability to pool and access large data sets, which can then be used for comparative trend analysis for the purpose of understanding how to achieve better quality; this is currently a challenge because most businesses generate data in multiple data formats. This is particularly important in companies that see the value of method life cycle management as part of the adoption of new ICH guidelines such as Q12 and Q14 for method development. Traditionally the documentation supporting method development was considered outside the realm of “regulated data”, but these new paradigm shifts, which bring knowledge gathered during method development into a regulatory submission, move that line of “compliance”. These emerging expectations require tools that can store large amounts of data, extract and compare across multiple data formats, and finally store information over long periods of time whilst making it readily available for review and reporting.

LCGC: What is the future of data handling solutions for chromatographers?

Shawn Anderson: We feel the future of data handling goes beyond CDS and LIMS products. Our customers are telling us that they want (and need) to interrogate their data in new ways rather than the traditional “how pure is my sample” question. For example, can they extract causal relationships between product purity and yield, and any number of external factors such as raw material source, synthesis step variation, and even shift productivity? To do this, they need access to a much larger data lake that contains all of this information, including the chromatography results.
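A first step toward this kind of question might look like the following Python sketch, which joins chromatography purity results with one external factor by batch and computes a Pearson correlation. The batch identifiers, field names, and values are invented for illustration. Note also that a correlation is only a screening signal; establishing a causal relationship would need designed experiments or further analysis.

```python
from math import sqrt

# Hypothetical records; in practice these would come from separate systems
# (for example a CDS and a materials database) pooled in a data lake.
purity_by_batch = {"B1": 98.0, "B2": 99.0, "B3": 97.0, "B4": 99.2}
raw_material_assay_by_batch = {
    "B1": 95.0, "B2": 97.0, "B3": 93.0, "B4": 98.0, "B5": 96.0,
}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant data")
    return cov / sqrt(var_x * var_y)


# Join on batch ID: only batches present in both sources can be compared.
shared = sorted(purity_by_batch.keys() & raw_material_assay_by_batch.keys())
r = pearson(
    [raw_material_assay_by_batch[b] for b in shared],
    [purity_by_batch[b] for b in shared],
)
print(f"batches compared: {shared}, Pearson r = {r:.3f}")
```

The join step is where siloed or inconsistently keyed data tends to cause trouble: batch "B5" has no chromatography result here and simply drops out of the comparison.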

Darren Barrington-Light: I see three main trends: First, the adoption of cloud and deployment of the CDS as software as a service (SaaS). This provides on-demand data handling across the laboratory and delivers the highest levels of data security while driving down operational costs for in-house administration and capital expenditure for server hardware.

Second, I envisage an ecosystem of connected applications that would allow laboratories to build their own solutions based on their workflows. Furthermore, the adoption of cross-supplier standards for data, such as those supported by the Pistoia Alliance, has the potential to transform the laboratory by enabling cross-technique and cross-supplier review of data, enabling a truly connected digital ecosystem that offers collaboration and helps to accelerate scientific discovery, drive productivity, and minimize risk.

Third, many companies are establishing company-wide pools for all laboratory or even all analytical data, also referred to as “data lakes”. This data can then be associated with data managed in relational databases, like chromatography data. These organizations hope to use data lakes to improve data access and overcome silos. Data lakes would also enable artificial intelligence (AI), machine learning (ML), and deep learning (DL) to extend the value of data.

Graham A. McGibbon: A shift and transformation of technologies to support a wide range of cyber-resilient, enterprise-scale, virtualized, distributed, and services-based data handling solutions is already taking place for all data analysts, including chromatographers. Innovative and sophisticated chromatographic instruments and CDS software will still require some proprietary technologies for optimal control, data acquisition, and pre-processing. Future systems will be able to transfer captured data and results to other software systems simply, or automatically and seamlessly.

Stephen McDonald: We hear that scientists need tools that can improve peak integration and automated creation of results for better accuracy and consistency, store large amounts of data in an accessible and open format, and extract and compare information over long periods of time and across multiple or unified data formats.

These tools will need to be able to share this information over large geographies and operate at performance levels that do not inhibit the use of those applications.

The future tools must be “easy to use” and provide a guided workflow just like those we use today for banking and shopping online.

LCGC: What one recent development in “Big Data” is most important for chromatographers from a practical perspective?

Shawn Anderson: Plainly put, it’s the cloud. The value of having instantly scalable storage without having to negotiate with an overburdened IT department cannot be overstated. There are many additional benefits such as simplified back-up/restore and connectivity across globally dispersed laboratories. Many software solutions now support cloud integration for these very reasons. Some vendors offer a stepwise approach based on comfort level and organizational flexibility. For example, a less flexible laboratory may only want to store data in the cloud, keeping other essential components on-premises, while an organization more open to change might want a fully cloud-based solution.

Darren Barrington-Light: Leveraging the power of the cloud has recently been made possible for chromatographers through support for infrastructure as a service (IaaS) in CDS and LIMS software. Moving central network resources, such as servers, databases, and file storage, to the cloud can deliver both cost savings and performance gains. Using the elasticity of cloud computing, laboratories only need to pay for what they use and can quickly ramp resources up or down to meet local, regional, or global demands. This eliminates the need to spec servers for peak usage and can deliver extra compute power should it be required at peak times, potentially delivering “workstation-like” performance in the network, even for large data sets like MS.

Graham A. McGibbon: We are so familiar with the term Big Data that it may be used when it wouldn’t apply — most organizations’ own chromatography data is not considered “Big Data”. It may come from a variety of sources and each may contain many data points, but unlike genomic data, for example, the scale of data is not truly large, and data processing to a much smaller useful set of peaks is typically straightforward using traditional methods. Nevertheless, some big data technologies may be used to store and transfer chromatographic data more effectively or to assess the performance and usage of chromatographic equipment and data at an organizational asset level to help make educated decisions about future capital investments in hardware and software.

Stephen McDonald: The biggest development in “Big Data” is the speed at which we have seen its application multiply; today there are so many new applications being explored by so many. The number of players has dramatically increased and the sheer number of people and companies driving or collaborating on “Big Data” solutions has increased exponentially. Proposals to apply “machine learning” to solve some of the challenges of complex integrations, and to improve and expedite design of experiments (DOE), are some of the exciting things we are seeing.

A stepping stone on the path of “Big Data” is “automated system integration”, allowing data to flow safely and traceably between “best-in-class” applications. This opens the door to innovation and leveraging the data a scientist may already have to create smarter and more timely insights, whether this is in QC or in discovery of new molecules.

LCGC: What obstacles do you think stand in the way of chromatographers adopting new data solutions?

Shawn Anderson: The bad news is there are many obstacles. The good news is that they can all be overcome with an open attitude and a bit of persistence. One common barrier that we encounter is a reluctance to change what has always worked well in the past. Another is an often unfounded fear of increased complexity and cost. Vendors in the industry share the responsibility with customers to make the future accessible, by streamlining adoption steps, proving equivalent functionality, and reducing validation costs.

Darren Barrington-Light: For many chromatographers the biggest obstacles to adopting new solutions are compliance, (re)validation, and previous investments into legacy systems with proprietary data formats. The big challenge is adopting new technology capabilities while preserving existing information and capabilities of current systems. Aligning the various stakeholders in large organizations with complex structures and competing goals is another. IT may want to place all new applications in the cloud, but the laboratory may be hesitant to store their valuable data, and quite possibly intellectual property, in the cloud due to perceived security risks and uncertainty around validation requirements. Fortunately, these concerns can be mitigated through appropriate contracts with internet service providers (ISPs) and cloud providers. Still, for many users just the idea of redeploying and revalidating their systems is unpalatable and can outweigh any potential benefits they may see. Companies can provide validation services and documentation that alleviate these pains, streamlining the process and enabling chromatographers to take advantage of the latest data handling solutions available.

Graham A. McGibbon: Chromatographers will continue to value getting reliable results faster and more simply and being able to design effective experiments to characterize and understand materials. So, to rapidly and consistently exchange methods, data, and results, they need to have simple but extensive application programming interfaces (APIs) and user interfaces (UIs) that allow easy connections between the CDS and other software systems and informatics tools. The consequent challenge will be to work collaboratively using those other systems so their key results (namely assigned and integrated peaks having properties such as relative area and widths that are the deciphered outputs of the chromatographic data) can be examined and understood in larger contexts, such as making quality materials.
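To make the notion of "relative area" concrete, here is a minimal Python sketch that converts integrated peak areas into area percentages, one of the peak properties a CDS might pass on to other systems. The peak names and area values are hypothetical, and the calculation assumes simple area normalization with no response-factor correction.

```python
def relative_area_percent(peak_areas):
    """Return each peak's area as a percentage of the total integrated area.

    Assumes plain area normalization: no response factors are applied.
    """
    if any(area < 0 for area in peak_areas.values()):
        raise ValueError("peak areas must be non-negative")
    total = sum(peak_areas.values())
    if total <= 0:
        raise ValueError("total peak area must be positive")
    return {name: 100.0 * area / total for name, area in peak_areas.items()}


# Illustrative integrated areas (arbitrary units) for one injection.
peaks = {"main": 9800.0, "impurity_a": 150.0, "impurity_b": 50.0}
percents = relative_area_percent(peaks)

for name, value in percents.items():
    print(f"{name}: {value:.2f} %")
```

A plain mapping of peak name to value like this is easy to serialize and hand to another system, which is the kind of exchange the APIs mentioned above would need to support.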

Stephen McDonald: For most of our customers the need to be compliant and to maintain a strong data integrity position is the first hurdle any new technology must clear before being accepted. Currently the barrier to adopting an improved software update, instrument, or analytical method is the hurdle of qualifying, verifying, or validating that change.

There is a big drive to work with regulatory agencies and expert groups to change this stagnating paradigm in the regulatory space, and then provide tools, data, and services that customers can leverage to properly risk-manage changes to make their laboratory processes faster and more automated. This, at the same time, will help make their products safer. These concepts are being promoted by the FDA’s Center for Devices and Radiological Health (CDRH), for example through its Case for Quality program and computer software assurance (CSA) guidance, and are being promulgated to reduce the barriers to innovation and to improve quality and safety without introducing new risks.