News|Articles|May 29, 2026

LCGC E-Books

  • LCGC E-Books-6-2-2026
  • Volume Sponsored eBook

R Workflow for Semi-Automated LC–HRMS Data Processing for Unveiling Copolymer Distributions

Key Takeaways

  • Polyether polyol properties depend on initiator functionality, oxide type (EO/PO), and molecular-weight distributions, with formulated systems amplifying analytical complexity and competitive-intelligence needs.
  • UHPLC–HRMS is well-suited for polyether polyol compositional screening, but differential ionization, adduct heterogeneity, and multidistribution mixtures make annotation and deconvolution computationally demanding.

New R tool auto-decodes polyol structures from mass spec data, great for screening complex formulas.

The market size of polyols was estimated at USD 26.2 billion in 2019 and is forecast to reach USD 34.4 billion by 2024.1 In 2018, the polyol market was roughly represented by 25% polyester polyols and 75% polyether polyols. Polyether polyols have been used to produce both flexible and rigid foams. As a result, polyether polyols have been intensively applied in the construction, building, automotive, and other important industries. For most applications, polyether polyols have been used as building blocks for polyurethanes, which make up about 7.9% of global plastics consumption (in 2019).2

These polyether polyols are manufactured by a catalyzed reaction between an initiator and an organic oxide.3 The choice of initiator, for example, water [f=2], glycerin [f=3], or pentaerythritol [f=4], determines the functionality (f) of the produced polyol. Typical organic oxides are cyclic ethers, such as ethylene oxide (EO) and propylene oxide (PO), which ultimately form the repeating units of the produced polyols. As such, the physicochemical properties and the application of the polyether polyols are mainly determined by their initiators, organic oxides, and molecular weight distribution(s). These polyether polyols are often used in formulated systems, which greatly increases the complexity of such samples. Complex formulations may comprise multiple initiators with varying functionalities, several organic oxides, and even additives that alter the properties of the polyether polyols. Consequently, characterizing these polyether polyols is of utmost importance to advance new applications, improve product performance, and screen competitor markets. A better understanding of the exact composition of polyether polyols depends heavily on the proper selection of analytical tools.

Today, a plethora of analytical techniques are available for characterizing these polyols, including nuclear magnetic resonance, titration, matrix-assisted laser desorption/ionization mass spectrometry, size-exclusion chromatography, and two-dimensional chromatography. Nevertheless, these techniques have shown some constraints in terms of accuracy, efficiency, and specificity.4 Literature has shown that liquid chromatography (LC) coupled with high-resolution mass spectrometry (HRMS) has been the most frequently used technique to screen and acquire compositional information on polyether polyols.5 LC separates the polyether polyols according to functional type and chemical composition, while HRMS enables their accurate identification. Processing the collected ultrahigh-performance liquid chromatography (UHPLC)–HRMS data can be challenging, especially when differently initiated polyols are present and/or the species within the distributions ionize differently. Calculations and annotations can be done manually and/or by using commercially available software; however, both have limitations. Manual processing, including the calculations and annotations, can be very time-consuming and requires in-depth knowledge of mass spectrometry. In addition, commercial software limits the degree of user adaptation. Consequently, the use of open-source software (such as R and Python) can help overcome these limitations. As such, combining UHPLC–HRMS with open-source data processing software will make it the platform of choice for unraveling the identities of the polymeric distributions present. Moreover, UHPLC–HRMS provides excellent polymer separation, accuracy, precision, and high-resolution data collection,6 while an open-source platform allows fast and consistent data processing.

Therefore, this study developed a new processing workflow integrated in the open-source software R for the semi-automated deconvolution of polyether polyols observed by UHPLC–HRMS. This novel approach provides a complete deconvolution of the polyether polyol composition by automatically annotating the polymeric distributions and calculating the number of repeating units/monomers and the potential initiators. It may be used as a first qualitative screening step to better understand formulated systems.

Experimental

Chemicals and Reagents

To study the developed R code, the following commercially available polyether polyols were investigated: a blended polyether polyol, a glycerol ethoxylate (Gly-EO), a glycerol propoxylate (Gly-PO), and a glycerol ethoxylate-random-propoxylate copolymer (Gly-EO/PO). Additionally, a rigid polyurethane foam sample and a real polyol research sample were evaluated. Research samples were kindly supplied by Dow Benelux B.V.

Aqueous solutions were prepared using Milli-Q grade water (Millipore). The solvents used were acetonitrile (ACN, LC–MS grade) and methanol (MeOH, ULC/MS grade), both obtained from Biosolve. Ammonium formate and formic acid (reagent grade, ≥ 95%) were purchased from Sigma-Aldrich.

UHPLC–HRMS

The polyether polyols were chromatographically separated using a 1290 Infinity II UHPLC system (Agilent), consisting of a pumping system coupled to an autosampler and column oven compartment. Separation was achieved by reversed-phase chromatography with gradient elution on a Zorbax RRHD Eclipse Plus C18 column (50 mm × 2.1 mm i.d., 1.8 µm particle size) (Agilent) at a temperature of 40 °C. The mobile phase consisted of a mixture of water (Eluent A) and methanol (Eluent B), both containing 0.1% ammonium formate, pumped at a flow rate of 0.2 mL/min. The linear gradient program was as follows: 0 min, 20% B; 0–0.5 min, 20% B; 0.5–10 min, 20–100% B; 10–20 min, 100% B. The injection volume was 0.5 µL, and samples were kept at 4 °C in the autosampler. After every injection, the needle was subjected to a standard wash with isopropanol using the flush port for 3 s.
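
As a quick way to check which eluent composition a given retention time corresponds to, the gradient program above can be written as a small lookup. The following is a minimal sketch in Python (not part of the authors' R workflow); the function name `percent_b` is hypothetical, and the breakpoints are simply those stated in the text.

```python
# Gradient breakpoints as (time in min, % eluent B), taken from the program above.
GRADIENT = [(0.0, 20.0), (0.5, 20.0), (10.0, 100.0), (20.0, 100.0)]


def percent_b(t_min, program=GRADIENT):
    """Return the % B delivered at t_min by linear interpolation between breakpoints."""
    if t_min < program[0][0] or t_min > program[-1][0]:
        raise ValueError("time lies outside the gradient program")
    for (t0, b0), (t1, b1) in zip(program, program[1:]):
        if t0 <= t_min <= t1:
            return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)


for t in (0.25, 5.25, 15.0):
    print(f"{t:5.2f} min -> {percent_b(t):.1f} % B")
```

The sketch ignores the system dwell volume, so the composition actually reaching the column lags slightly behind the programmed value.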

The detection of polyether polyols was carried out using a quadrupole time-of-flight (Q-TOF) HRMS system (Agilent) fitted with a heated electrospray ionization source (Agilent Dual AJS ESI Jet Stream). Optimal positive ionization source parameters were a gas flow of 8 L/min, gas temperature of 320 °C, nebulizer pressure of 35 psig, sheath gas temperature of 350 °C, sheath gas flow of 11 L/min, capillary voltage of +3500 V, nozzle voltage of 1000 V, and fragmentor voltage of 150 V. The Q-TOF acquired data over a range of 100–3000 Da at 1 Hz (one scan per second). Data were collected in positive-ion mode and stored in centroid mode. Initial instrument calibration was carried out by infusing calibration mixtures for the positive and negative ion modes. Instrument control and data export were carried out using MassHunter Workstation Data Acquisition (version B.06.01) (Agilent) and MassHunter Qualitative Analysis (version B.07.00) (Agilent), respectively.

Data Processing

The collected UHPLC–HRMS data were exported from MassHunter Qualitative Analysis (Agilent) and imported into R. The code for unveiling the polyether polyols was written in R (version 3.4) and made use of two libraries, XLConnect and ggplot2.6,7 Both libraries are open source and available online. XLConnect allows one to import data from Excel into R, and ggplot2 was used to create the distribution graph, Kendrick mass defect plot, and heatmap graphics.
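
The text does not spell out how the mass defect values behind that plot are computed, so the following is a minimal sketch in Python (not the authors' R code) of one common convention: the measured m/z is rescaled so that the repeating unit has an integer mass, and the defect is the distance to the nearest integer. The function name, the choice of EO as the base unit, and the sign convention are assumptions; the EO mass is the standard monoisotopic value for C2H4O rounded to six decimals.

```python
EO_EXACT = 44.026215   # monoisotopic mass of C2H4O (rounded)
EO_NOMINAL = 44        # nominal mass of the same unit


def kendrick_mass_defect(mz, base_exact=EO_EXACT, base_nominal=EO_NOMINAL):
    """Rescale mz to the Kendrick scale of the base unit and return the mass defect."""
    kendrick_mass = mz * base_nominal / base_exact
    return round(kendrick_mass) - kendrick_mass


# Two members of one ethoxylate series ([M+H]+ of glycerol with 5 and 6 EO units).
glycerol, proton = 92.047344, 1.007276
series = [glycerol + n * EO_EXACT + proton for n in (5, 6)]
for mz in series:
    print(f"m/z {mz:.4f} -> KMD {kendrick_mass_defect(mz):+.4f}")
```

Species that differ only in the number of base units share the same defect, so a homologous series lines up horizontally in the plot, which makes outliers easy to spot and filter.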

The R code built and used in this work is described in full detail in the Results and Discussion section, where the criteria and choices made are also explained.

Results and Discussion

Two strategies for unveiling polymeric distributions are discussed. Depending on the knowledge of the mass spectrometric specialist, a specific strategy will be more time-efficient in terms of data processing. The first approach is designed for use when the identity of the repeating unit is suspected and is intended to unveil a single- or multiblock copolymeric distribution. The second approach is designed for unknown multiple copolymeric distributions.

Unveiling Single-Block Copolymeric Distributions With a Suspected Repeating Unit

The first goal of the processing workflow was to uncover single-block copolymeric distributions. Therefore, UHPLC–HRMS data for a glycerol ethoxylate and a glycerol propoxylate polyether polyol were collected and are shown in Figure 1. The total ion chromatogram (TIC) depicts the signal intensity obtained for all species detected. The observed pattern in the TIC is typical for polymeric distributions. The summarized mass spectrometric data of these distributions were exported from MassHunter to a .csv file and subsequently imported into the developed R processing workflow using the XLConnect package, which provides comprehensive functionality to read, write, and format Excel data.6 Within the written workflow, a universal format (for example, a .csv file) was selected to import the mass spectrometric data; this can easily be changed to suit the user's format. After decoding, the MS data were read by the developed algorithm and could be viewed by the user. A schematic overview of the developed workflow, depicting the required input parameters and the output of the processing workflow, can be found in Figure 2.

Input parameters of the developed workflow include the experimental mass spectrometric data, the mass error, the accurate mass of the suspected repeating unit, and a database of potential initiators. The mass error, defined in ppm, was used as a peak assignment criterion. An identification is considered positive when the mass error between the experimental and theoretical accurate masses is lower than the defined value. Narrowing the mass error tolerance can result in a higher confidence level of the peak assignment, but stricter values can also result in missed identifications during processing. Generally, in high-resolution mass spectrometry, a mass error of 3 ppm is considered an appropriate value for retaining positive compounds during identification.8 Within the developed processing workflow, the mass error tolerance is left for the user to define, as the appropriate value also depends on the type of mass spectrometer and the complexity of the sample. For this study, a mass error of 3 ppm was used to unveil components in the HRMS data.
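
To make the peak assignment criterion concrete, the following minimal Python sketch (an illustration, not the authors' R implementation) computes the relative mass error in ppm and applies a user-defined tolerance. The function names are hypothetical, and the example masses are illustrative values rather than data from this study.

```python
def ppm_error(observed_mz, theoretical_mz):
    """Signed mass error of an observed m/z relative to the theoretical m/z, in ppm."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def is_positive_hit(observed_mz, theoretical_mz, tol_ppm=3.0):
    """True when the absolute mass error is lower than the user-defined tolerance."""
    return abs(ppm_error(observed_mz, theoretical_mz)) < tol_ppm


theoretical = 313.185695  # illustrative: [M+H]+ of glycerol with 5 EO units
for observed in (313.1860, 313.1870):
    print(observed, round(ppm_error(observed, theoretical), 2),
          is_positive_hit(observed, theoretical))
```

At m/z 313, a 3 ppm tolerance corresponds to a window of roughly ±0.0009, which illustrates why the appropriate value depends on the mass accuracy of the instrument.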

The theoretical masses in the in-house database are generated from the input of potential initiators. Based on the defined potential initiators and the suspected repeating unit, the processing workflow constructs a theoretical database containing all combinations of the number of repeating units (based on the experimental mass spectrometric data) with the potential initiators. Subsequently, the experimental data are compared to the theoretical in-house database by using an indexing approach.9 This approach effectively narrows down the theoretical database entries that could be labelled a positive result (that is, a match with the entry) for the sample. A positive hit is recorded whenever the difference between the observed mass (sample) and the theoretical mass in the in-house database is below the defined mass error (3 ppm in this study). It should be noted that different mass spectrometric adducts (such as H+, Na+, K+, and NH4+) were also considered to support positive identifications. As a result, a list of positive identifications is generated for each initiator, containing the number of repeating units and the observed adducts. All the steps (calculations, annotations, and data processing) were semi-automated (see Appendix for the full R code). The workflow was not fully automated, so that the user retains the freedom to filter out false positive results (see KM_filter in the R pipeline) if required. It is important to highlight that such a workflow can also be implemented in other programming languages; for example, Molenaar and associates used MATLAB to create a similar pipeline and graphics.10 Nevertheless, the programming languages differ in nature (for example, a high-performance language [MATLAB] vs. an interpreted language [R]), and R provides a larger online community for documentation and support.11
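
The database construction and lookup described above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the authors' R code: the function names are hypothetical, the indexing step is approximated by a binary search over a sorted list of theoretical m/z values, the masses are standard monoisotopic values rounded to six decimals, and the peak list is made up for the example.

```python
from bisect import bisect_left, bisect_right

REPEAT_UNIT = 58.041865  # PO, C3H6O
INITIATORS = {"water": 18.010565, "glycerol": 92.047344, "sorbitol": 182.079038}
ADDUCTS = {"[M+H]+": 1.007276, "[M+NH4]+": 18.033826,
           "[M+Na]+": 22.989218, "[M+K]+": 38.963158}


def build_database(initiators, repeat_mass, adducts, max_units):
    """All initiator / repeat-count / adduct combinations, sorted by theoretical m/z."""
    entries = []
    for name, initiator_mass in initiators.items():
        for n in range(1, max_units + 1):
            neutral = initiator_mass + n * repeat_mass
            for adduct, shift in adducts.items():
                entries.append((neutral + shift, name, n, adduct))
    entries.sort()
    return entries


def annotate(observed_mzs, database, tol_ppm=3.0):
    """Return (mz, initiator, n, adduct, ppm error) for every entry within tolerance."""
    keys = [entry[0] for entry in database]
    tol = tol_ppm * 1e-6
    hits = []
    for mz in observed_mzs:
        # Only entries inside the tolerance window are inspected.
        lo = bisect_left(keys, mz / (1 + tol))
        hi = bisect_right(keys, mz / (1 - tol))
        for theo, name, n, adduct in database[lo:hi]:
            hits.append((mz, name, n, adduct, (mz - theo) / theo * 1e6))
    return hits


peaks = [383.2641, 123.4567]  # made-up peak list for illustration
max_units = int(max(peaks) // REPEAT_UNIT)  # upper bound derived from the data
database = build_database(INITIATORS, REPEAT_UNIT, ADDUCTS, max_units)
for hit in annotate(peaks, database):
    print(hit)
```

Sorting the theoretical masses once means each experimental peak is compared with only a handful of candidates instead of the whole database, which is the main reason such a lookup stays fast as the number of initiators and adducts grows.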

When the glycerol ethoxylate and glycerol propoxylate polyether polyol sample data (Figure 1) were subjected to this workflow, the insights and visual capabilities of the R pipeline could be demonstrated. Visualizations were realized using the ggplot2 library in R.7 The advantage of this package lies in the step-by-step plot generation from multiple data sources. In addition, it simplifies the implementation of sophisticated multidimensional conditioning systems and provides a consistent interface to map data to aesthetic attributes. The results of the included automatic data visualizations for the glycerol ethoxylated polyether polyol can be found in Figure 3. Similar results were also achieved for the glycerol propoxylated polyether polyol. The developed workflow provides scientific and user-friendly plots. In the MS-summed spectra, the detected polymeric distribution is automatically highlighted in a specific color, including all MS adducts (e.g., H+ and NH4+). Data can also be depicted in a more user-friendly way by using MS density and/or violin plots, which represent the same information. These graphs provide more information on the distribution based on the intensity, masses, and number of species detected in the polymeric distribution. The violin plot has an additional advantage over the MS density plots when several single-block copolymeric distributions are detected in the same sample.

Finally, several criteria can be added to the workflow to identify single-block copolymeric distributions with a suspected repeating unit. For example, an additional criterion could be checking the isotopic pattern of the identified component to evaluate the probability of a true positive identification. However, considering the isotopes in the processing workflow can significantly reduce processing capacity. Another criterion that could be implemented is to define how many repeating units coupled to a specific initiator constitute a polymeric distribution. For the latter, the definition of polymer used by the Registration, Evaluation, Authorisation and Restriction of Chemicals (REACH) regulation can be applied.12 Nevertheless, under the REACH definition, the weight percentage of polymer should be higher than 50 wt.%, which cannot be determined reliably from the areas of annotated HRMS peaks. For now, it was decided not to implement this feature, leaving the user free to evaluate the positive identifications, given that concentrations cannot be determined from the mass spectrometry data alone. To preserve processing capacity, isotopic deconvolution could be restricted to compounds above a certain intensity in arbitrary units (10^4 for HRMS). Moreover, filtering out low-abundance peaks can be very effective, as HRMS data frequently contain thousands of unique mass peaks and/or background noise. Furthermore, another criterion that can effortlessly be added to the R pipeline is the multiple charging of components. This can easily be implemented by dividing the masses of the database by the charge (information on the charge can be found in the exported .csv output file, and as such can be read and taken into consideration by the R pipeline). Nevertheless, this was not yet implemented for two reasons: the current UHPLC–HRMS conditions mainly favor singly charged pseudo-molecular species, and the reliability of HRMS annotations decreases significantly with increasing charge state of the species.
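
Two of the optional criteria above, intensity filtering and multiple charging, are simple enough to sketch. The following Python snippet is an illustration only (the authors did not implement multiple charging): the function names are hypothetical, the 1e4 default is an assumption about the intensity cut-off mentioned in the text, and the charge-state formula assumes that every charge is carried by the same adduct ion.

```python
def filter_by_intensity(peaks, threshold=1e4):
    """Keep only (mz, intensity) pairs whose intensity exceeds the threshold."""
    return [(mz, intensity) for mz, intensity in peaks if intensity > threshold]


def mz_for_charge(neutral_mass, adduct_ion_mass, z):
    """m/z of a species carrying z identical adduct ions, each contributing one charge."""
    if z < 1:
        raise ValueError("charge state must be a positive integer")
    return (neutral_mass + z * adduct_ion_mass) / z


# Illustrative neutral mass: glycerol with 20 PO units.
neutral = 92.047344 + 20 * 58.041865
proton = 1.007276
for z in (1, 2, 3):
    print(z, round(mz_for_charge(neutral, proton, z), 4))

peaks = [(100.0, 5.0e3), (200.0, 2.0e4)]  # made-up (m/z, intensity) pairs
print(filter_by_intensity(peaks))
```

Because the spacing between neighboring oligomers shrinks to the repeating-unit mass divided by z, higher charge states crowd more candidates into the same m/z window, which is consistent with the reduced annotation reliability noted above.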

Unveiling Multiple Same-Block Copolymeric Distributions With a Suspected Repeating Unit

To demonstrate that the pipeline can handle multiple same-block copolymeric distributions, the workflow was applied to a real research sample, a rigid polyurethane (PU) foam. Because a solid foam cannot be analyzed by UHPLC–HRMS, the rigid PU foam was first deconstructed using a hydrolysis process. The resulting liquid was analyzed, and the data were processed using the R workflow, which unveiled the presence of three propoxylated distributions initiated by water, glycerol, and sorbitol. The graphical results can be found in Figure 4. The water- and glycerol-initiated propoxylated polyol distributions were represented by singly charged protonated and ammonium adducts. For the sorbitol-initiated propoxylated polyol distribution, only the protonated adduct was observed. The observed MS species suggest that roughly 4–11, 2–13, and 6–10 PO units reacted with the water, glycerol, and sorbitol initiators, respectively. It should be noted that the observed results were complementary to NMR data collected on the hydrolyzed foam (data not shown).

Unveiling Multiple Block Copolymeric Distributions With Unknown Repeating Units

For unveiling multiple block copolymeric distributions (such as EO/PO polymeric distributions), another approach was written and included in the R workflow. The general principles of the first strategy remained the same; however, some small differences exist. First, the generated databases included more species, covering many combinations of EO and PO. To limit the computational load, and therefore reduce the processing time, the maximum number of repeating units can now be defined by the user. Second, this program was written using more efficient for-if loops, which now also include the database generation (previously, this was performed outside the for-if loops). This was required to reduce the processing time for the automatic annotation. Annotation was done in a similar way as in the first strategy, by comparing the experimental data to the theoretical in-house database using an indexing approach.9 Furthermore, the visualizations were supported by the ggplot2 library in R.
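
The effect of the user-defined maximum becomes clear when the EO/PO database is generated explicitly. The following is a minimal Python sketch (not the authors' R loops); the function name is hypothetical, glycerol is used only as an example initiator, and the masses are standard monoisotopic values rounded to six decimals.

```python
EO, PO = 44.026215, 58.041865  # C2H4O and C3H6O


def build_copolymer_database(initiators, adducts, max_eo, max_po):
    """Theoretical m/z for every EO/PO combination up to the user-defined maxima."""
    entries = []
    for name, initiator_mass in initiators.items():
        for n_eo in range(max_eo + 1):
            for n_po in range(max_po + 1):
                if n_eo + n_po == 0:
                    continue  # skip the bare initiator
                neutral = initiator_mass + n_eo * EO + n_po * PO
                for adduct, shift in adducts.items():
                    entries.append((neutral + shift, name, n_eo, n_po, adduct))
    entries.sort()
    return entries


initiators = {"glycerol": 92.047344}
adducts = {"[M+H]+": 1.007276, "[M+NH4]+": 18.033826}
for limit in (10, 20, 40):
    size = len(build_copolymer_database(initiators, adducts, limit, limit))
    print(f"max {limit} EO and {limit} PO -> {size} entries")
```

The number of entries grows with the product of the two maxima, so doubling both limits roughly quadruples the database. Capping the limits at what the observed mass range can accommodate keeps the subsequent matching against the experimental peaks manageable.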

To evaluate the developed strategy for multiple block copolymeric distributions, it was applied to a real polyol research sample. Coupling the UHPLC–HRMS analysis with the written R workflow allowed the generation of a client-friendly heatmap, which can be found in Figure 5. The heatmap clearly shows the number of EO and PO units coupled to a toluene-diamine (TDA) initiator. These calculations were also confirmed by NMR (data not shown).
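
The data structure behind such a heatmap is a grid of summed intensities indexed by the number of EO and PO units. The following Python sketch shows one way to build it from annotated hits; it is an assumption about the aggregation step rather than the authors' R code, the function name is hypothetical, and the hit values are invented for illustration and are not results from this study.

```python
def composition_grid(hits, max_eo, max_po):
    """Sum intensities per (n_EO, n_PO) cell; adducts of one species share a cell."""
    grid = [[0.0] * (max_po + 1) for _ in range(max_eo + 1)]
    for n_eo, n_po, intensity in hits:
        grid[n_eo][n_po] += intensity
    return grid


# Invented annotated hits: (n_EO, n_PO, intensity).
hits = [(2, 3, 1.0e5), (2, 3, 5.0e4), (4, 1, 2.0e5)]
grid = composition_grid(hits, max_eo=4, max_po=4)
for n_eo, row in enumerate(grid):
    print(n_eo, row)
```

Each row corresponds to an EO count and each column to a PO count, so the grid can be handed directly to a tile-based plotting routine to produce the heatmap.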

Conclusions

Manual annotations of polymeric distributions have been shown to be very time-consuming and labor-intensive; however, in many cases, the initial screening can be performed automatically. Furthermore, not all commercial software packages can handle the high-resolution, accurate masses generated by UHPLC–HRMS. Therefore, a new R workflow was developed that encompasses data processing and data visualization in a semi-automated approach. A semi-automated approach was selected because the user still needs to check the final proposed candidates before all graphics are automatically generated. As a result, the workflow enabled fast (several seconds for suspected analysis, and several hours for untargeted analysis) and consistent data processing. Moreover, the workflow automatically calculated the number of repeating units and the initiator to which each was coupled. Consequently, the researcher has more time to focus on the components that were not identified by the written code. Finally, the developed R workflow effectively demonstrated the ability to unveil single- and multiblock copolymeric distributions in real polyols and hydrolyzed foams. Future work could extend the R workflow with additional post-analyses (such as multivariate analysis and online library screening).

References
  1. Grand View Research. Polyols Market Size, Share & Trends Analysis Report by Product (Polyether, Polyester), by Application (Flexible Foam, Rigid Foam, Coatings, Adhesives & Sealants), and Segment Forecasts, 2018–2025; Grand View Research: San Francisco, CA, 2018.
  2. Plastics Europe. Plastics—The Facts 2020; Plastics Europe: Brussels, Belgium, 2020. https://plasticseurope.org/
  3. Szycher, M. Szycher’s Handbook of Polyurethanes, 2nd ed.; CRC Press, 2013.
  4. Groeneveld, G.; Dunkle, M. N.; Rinken, M.; et al. Characterization of Complex Polyether Polyols Using Comprehensive Two-Dimensional Liquid Chromatography Coupled to High-Resolution Mass Spectrometry. J. Chromatogr. A. 2018, 1569, 128–138. DOI: 10.1016/j.chroma.2018.07.054
  5. Stutzman, J. R.; Crowe, M. C.; Alexander, J. N.; et al. Coupling Charge Reduction Mass Spectrometry to Liquid Chromatography for Complex Mixture Analysis. Anal. Chem. 2016, 88, 4130–4139. DOI: 10.1021/acs.analchem.6b00485
  6. Studer, M. Excel Connector for R; 2026. https://github.com/
  7. Wickham, H. ggplot2: Elegant Graphics for Data Analysis; Springer: New York, NY, 2009.
  8. Thurman, E. M.; Ferrer, I.; Fernández-Alba, A. R. Matching Unknown Empirical Formulas to Chemical Structure Using LC/MS TOF Accurate Mass and Database Searching: Example of Unknown Pesticides on Tomato Skins. J. Chromatogr. A. 2005, 1067, 127–134. DOI: 10.1016/j.chroma.2004.11.007
  9. Kong, A. T.; Leprevost, F. V.; Avtonomov, D. M.; et al. MSFragger: Ultrafast and Comprehensive Peptide Identification in Mass Spectrometry–Based Proteomics. Nat. Methods. 2017, 14, 513–520. DOI: 10.1038/nmeth.4256
  10. Molenaar, S. R. A.; van de Put, B.; Desport, J. S.; et al. Automated Feature Mining for Two-Dimensional Liquid Chromatography Applied to Polymers Enabled by Mass Remainder Analysis. Anal. Chem. 2022, 94, 5599–5607. DOI: 10.1021/acs.analchem.1c05336
  11. Ozgur, C.; Colliau, T.; Rogers, G.; et al. MATLAB vs. Python vs. R. J. Data Sci. 2021, 15, 355–372. DOI: 10.6339/JDS.201707_15(3).0001
  12. ECHA. Guidance for Monomers and Polymers; European Chemicals Agency: Helsinki, Finland, 2026. https://echa.europa.eu/