The LCGC Blog: A Simplified Guide for Weighted Fitting and its Significance in Separation Science

Key Points

  • Weighted least squares (WLS) improves calibration accuracy, especially at low analyte concentrations where ordinary least squares (OLS) can fail.
  • Chromatographic data often exhibits heteroscedastic noise, making WLS a better choice than OLS for regression.
  • A step-by-step WLS guide using high-performance liquid chromatography–ultraviolet (HPLC–UV) data for carbamazepine demonstrates reduced quantitation errors.
  • WLS can be implemented easily in Excel, offering analysts a cost-effective tool for robust quantitation.

This blog explains the fundamentals of weighted fitting, along with its significance and applications in analytical chemistry.

Calibration curves are commonly used to quantify an analyte in a sample by comparing the analyte’s response to the responses of standards at known concentrations. They are routinely used to quantitate target analytes in food chemistry, environmental analysis, and the pharmaceutical industry. All separation scientists are familiar with building a calibration curve through linear regression. The most common calibration function is the affine linear equation:

$$f(x) = mx + b \qquad (1)$$

where f(x) is the modeled instrument response, x is the analyte concentration or mass (independent variable), m is the slope, and b is the intercept. Typically, the m and b parameters in Equation 1 are solved for by minimizing the residual sum of squares between f(x) and the measured instrument response (y, the dependent variable). This approach is known as the “ordinary least squares” (OLS) solution.
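
The OLS solution has a closed form. The sketch below is a minimal, standard-library Python illustration of fitting Equation 1 by OLS. The helper names `ols_fit` and `residual_sum_of_squares` are hypothetical and do not belong to any package. The toy data are illustrative only and are not the carbamazepine data.

```python
def ols_fit(x, y):
    """Return (m, b) minimizing sum((y_i - (m*x_i + b))**2), the OLS solution of Equation 1."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need at least two paired (x, y) points")
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    # Centered sum of squares of x and cross-products of x and y
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    m = sxy / sxx
    b = y_mean - m * x_mean  # the OLS line always passes through (x_mean, y_mean)
    return m, b


def residual_sum_of_squares(m, b, x, y):
    """Sum of squared differences between measured y and modeled f(x) = m*x + b."""
    return sum((yi - (m * xi + b)) ** 2 for xi, yi in zip(x, y))


# Hypothetical toy data (concentration, response), for illustration only
x = [0.0, 1.0, 2.0]
y = [1.0, 3.0, 5.0]
m, b = ols_fit(x, y)
print(m, b)  # 2.0 1.0
```

Every point carries equal influence on the fit, which is the property that later sections revisit.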

However, OLS does not always work well for quantitating samples in the lower range of the calibration curve. This is especially challenging in the pharmaceutical industry, where impurities present at ≤1% of the target analyte must be quantitated accurately. In these cases, weighted least squares (WLS) regression can significantly improve quantitation at the low end of the calibration range. WLS can be performed at the click of a button in several specialized software packages, such as MATLAB and R (1). Still, separation scientists need a fundamental understanding of WLS and its rationale to use it properly in any software.

In this blog, we provide a simplified step-by-step guide to performing WLS using calibration data for carbamazepine (a small-molecule drug commonly used to treat epilepsy) analyzed by high-performance liquid chromatography–ultraviolet (HPLC–UV) detection. A fundamental discussion of noise characteristics explains why WLS can improve quantitation. An implementation of WLS in Excel is provided (see the DOI link below), so analysts do not need to buy specialized software. Quantitation errors from an ordinary least squares fit and a weighted least squares fit are then compared. Because weighted fitting can greatly improve quantitation, evaluating its utility as described in this blog is a critical step in developing accurate and robust analytical methods.

Click here for the replication data.

Step 1

Define the linear range for your method and prepare a suitable number of standards within that range. In this case, HPLC–UV provided a linear response from 0.005 mM to 1 mM carbamazepine. Standards were prepared in MeOH, and 1 µL of each standard was injected into the HPLC in triplicate. Peak area was used as the measure of instrument response.

Table 1: Concentration and respective peak areas of carbamazepine standards used for the calibration curve.

The data in Table 1 can be used to plot a calibration curve (in Excel or similar software), with the average peak area of the three injections on the y-axis and analyte concentration on the x-axis. Some analysts may prefer to plot all three replicates rather than the average value. Excel can calculate the linear trend line, fitted equation, and R² value automatically (using the OLS solution), as shown in Figure 1.
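
The averaging and R² steps can be sketched in Python, under two assumptions. First, the error bars use the sample standard deviation (n − 1 denominator, like Excel's STDEV.S). Second, R² is computed as 1 − SS_res/SS_tot for a straight-line OLS fit with an intercept, which is what a linear trend line reports. The replicate values and the helper names `summarize_replicates` and `r_squared` are hypothetical and are not the data in Table 1.

```python
from statistics import mean, stdev


def summarize_replicates(replicates):
    """Map concentration -> (mean peak area, sample standard deviation) over replicate injections."""
    return {conc: (mean(areas), stdev(areas)) for conc, areas in replicates.items()}


def r_squared(x, y):
    """Coefficient of determination of a straight-line OLS fit with intercept."""
    n = len(x)
    x_mean, y_mean = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    m = sxy / sxx
    b = y_mean - m * x_mean
    ss_res = sum((yi - (m * xi + b)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - y_mean) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


# Hypothetical triplicate peak areas keyed by concentration (mM)
replicates = {
    0.01: [170.0, 172.0, 168.0],
    0.1: [1700.0, 1720.0, 1680.0],
    1.0: [17000.0, 17200.0, 16800.0],
}
summary = summarize_replicates(replicates)
conc = sorted(summary)
avg_area = [summary[c][0] for c in conc]
print(summary)
print(r_squared(conc, avg_area))
```

The per-level standard deviations computed here are the error bars in a plot like Figure 1. They are also the quantity examined later when checking for heteroscedastic noise.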

Figure 1: Calibration curve of carbamazepine from 0.005 mM to 1.000 mM showing average peak areas of three injections and error bars representing plus or minus one standard deviation.

Step 2

Next, the calibration curve should be evaluated using quality control solutions or, in this case and for simplicity, through back calculation (2). In back calculation, the peak areas of the calibration standards and the fitted calibration function are used to calculate the concentrations of the standards. These back-calculated concentrations are then compared with their true values to evaluate the quality of the calibration curve.
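
Back calculation inverts Equation 1: x = (y − b)/m. A minimal Python sketch follows. It assumes the error is reported as an absolute relative percentage, |x_calc − x_true| / x_true × 100; a signed version is also common. The helper names, fit parameters, and standards are hypothetical and are not the values behind Table 2.

```python
def back_calculate(y, m, b):
    """Invert f(x) = m*x + b to obtain the concentration for a measured response y."""
    return (y - b) / m


def percent_error(x_calc, x_true):
    """Absolute relative error (%) of a back-calculated concentration."""
    return abs(x_calc - x_true) / x_true * 100.0


def mean_percent_error(x_true, y_meas, m, b):
    """Average % error over all standards, plus the per-level errors."""
    errors = [percent_error(back_calculate(y, m, b), xt) for xt, y in zip(x_true, y_meas)]
    return sum(errors) / len(errors), errors


# Hypothetical fitted parameters and standards (mM, peak area), for illustration only
m, b = 17000.0, 25.0
x_true = [0.005, 0.05, 0.5]
y_meas = [95.0, 870.0, 8540.0]
avg, per_level = mean_percent_error(x_true, y_meas, m, b)
print(avg, per_level)
```

With these made-up numbers, the lowest standard has by far the largest relative error, even though every point lies close to the line in absolute terms.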

Generally, an R² value of 0.9991 is considered good, and visual inspection of the calibration curve and data points also seems to indicate good linearity. The fitted OLS calibration equation is displayed with the trend line in Figure 1.

Is this fit truly good enough to quantitate analytes within the whole linear range? Let’s see!

Table 2: Calculation of % error of experimental versus predicted concentration (OLS fit).

As Table 2 shows, the errors for Cal 1 and Cal 2 are exceedingly high, which can result in inaccurate quantitation of samples at the low end of the concentration range. This region of the calibration curve is often used in impurity analysis, where target analyte concentrations are low.
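
One way to see why the error concentrates at the low end: an error δ in the fitted intercept shifts every back-calculated concentration by the same absolute amount, δ/m. Relative to the true concentration, that shift grows as 1/x. The Python sketch below illustrates this. It borrows the WLS slope reported in Step 4 only as a realistic magnitude. The 20-peak-area-unit intercept error is a hypothetical value, not the actual OLS intercept error.

```python
def relative_error_from_intercept_offset(delta_b, m, x_true):
    """% error at concentration x_true caused only by an intercept error delta_b.

    Back-calculating with (b + delta_b) instead of b shifts every result by
    -delta_b / m, a constant absolute error in concentration units.
    """
    return abs(delta_b / m) / x_true * 100.0


m = 17024.1      # slope magnitude (peak area per mM), taken from the WLS fit in Step 4
delta_b = 20.0   # hypothetical intercept error in peak-area units
for x in (0.005, 0.05, 0.5, 1.0):
    print(x, round(relative_error_from_intercept_offset(delta_b, m, x), 2))
```

The same intercept error that is negligible at 1 mM becomes a double-digit percentage at 0.005 mM.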

What can be done to resolve this issue? Let’s take a closer look at the analytical data and the assumptions of our calibration.

Step 3

For chromatographic quantitation with OLS, it is assumed that the variance due to instrument noise is homogeneous, that is, constant across the calibration range. However, this is not always true, because heteroscedastic noise is often superimposed on analytical data. Noise in measurement science, regardless of its probability density or power spectral density, can be either (i) independent of sample concentration or measurement intensity, making it “homoscedastic,” or (ii) dependent on sample concentration, making it “heteroscedastic.” Figure 2 shows the difference between a simulated calibration curve with homoscedastic noise (ε) and one with heteroscedastic noise (ε(x)). Homoscedastic noise (Figure 2a) is advantageous because the probability density of the measurement does not change throughout the concentration range. This allows for the robust use of OLS and its associated calculations, such as the popular three-times-standard-deviation method for the limit of detection (LOD = 3s). Heteroscedastic noise (Figure 2b) complicates data analysis because the probability density changes throughout the concentration range. As a result, OLS is no longer robust, and common calculations such as LOD = 3s are not statistically valid. In our experience, most chromatography-based quantitation methods, such as LC–UV, LC–MS, and GC–MS, tend to show heteroscedastic noise when calibration spans a wide enough concentration range (more than about two orders of magnitude).
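
The two noise models can be simulated in a few lines of Python, in the spirit of Figure 2. This is a sketch under stated assumptions. Homoscedastic noise is modeled as ε ~ N(0, σ) with a constant σ. Heteroscedastic noise is modeled as ε(x) ~ N(0, RSD·m·x), meaning the standard deviation is proportional to the signal. The slope, intercept, σ, and 2% RSD are hypothetical values, not the parameters used to draw Figure 2.

```python
import random
from statistics import stdev

SLOPE, INTERCEPT = 17000.0, 10.0  # hypothetical calibration line (peak area per mM, peak area)
SIGMA = 50.0                      # hypothetical constant noise SD for the homoscedastic case
RSD = 0.02                        # hypothetical 2% relative SD for the heteroscedastic case


def homoscedastic_noise(x, rng):
    """epsilon ~ N(0, SIGMA): the same spread at every concentration."""
    return rng.gauss(0.0, SIGMA)


def heteroscedastic_noise(x, rng):
    """epsilon(x) ~ N(0, RSD * SLOPE * x): the spread grows in proportion to the signal."""
    return rng.gauss(0.0, RSD * SLOPE * x)


def spread_at(levels, noise, n, rng):
    """Sample SD of n simulated responses at each concentration level."""
    return [stdev([SLOPE * x + INTERCEPT + noise(x, rng) for _ in range(n)]) for x in levels]


rng = random.Random(0)  # fixed seed so the illustration is reproducible
levels = [0.01, 1.0]    # low and high concentration (mM)
homo_sds = spread_at(levels, homoscedastic_noise, 2000, rng)
hetero_sds = spread_at(levels, heteroscedastic_noise, 2000, rng)
print("homoscedastic SDs:", [round(s, 1) for s in homo_sds])
print("heteroscedastic SDs:", [round(s, 1) for s in hetero_sds])
```

Under the homoscedastic model, the spread is the same at both levels. Under the heteroscedastic model, it differs by roughly the ratio of the concentrations. In that case, a single "s" for LOD = 3s no longer describes the whole range.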

Figure 2: The effect of (a) homoscedastic noise and (b) heteroscedastic noise on a calibration curve. In the homoscedastic case, the noise (ε) is normally distributed (µ = 0) with constant variance, whereas in the heteroscedastic case the variance increases with analyte concentration. All measurements in 2a have the same precision, allowing for the simple use of OLS. In 2b, measurements at low concentrations have higher precision (less variance) than those at higher concentrations, making WLS advantageous.

Now, let’s see how the variation in our data looks. Is the noise homoscedastic or heteroscedastic?
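
Besides plotting the standard deviation against concentration, one common numerical check in bioanalytical calibration compares the replicate variances at the highest and lowest levels with an F-test. This may not be the check the authors used. The Python sketch assumes triplicate injections, giving (2, 2) degrees of freedom, and a one-sided α of 0.01. For F(2, 2), the cumulative distribution is f/(1 + f), so the critical value is exactly 99. The function name and replicate values are hypothetical.

```python
from statistics import variance

# Upper 1% point of F(2, 2) (triplicates at both levels): solve f / (1 + f) = 0.99 -> f = 99.
# For other replicate counts, look up F(n_high - 1, n_low - 1) in a table,
# or use scipy.stats.f.ppf(0.99, n_high - 1, n_low - 1) if SciPy is available.
F_CRIT_TRIPLICATES_ALPHA_01 = 99.0


def variance_ratio_test(low_replicates, high_replicates, f_crit=F_CRIT_TRIPLICATES_ALPHA_01):
    """Return (F_exp, heteroscedastic?) where F_exp = s_high^2 / s_low^2."""
    f_exp = variance(high_replicates) / variance(low_replicates)
    return f_exp, f_exp > f_crit


# Hypothetical triplicate peak areas at the lowest and highest calibration levels
low = [85.0, 86.0, 84.0]
high = [17000.0, 17100.0, 16900.0]
f_exp, heteroscedastic = variance_ratio_test(low, high)
print(f_exp, heteroscedastic)
```

An F_exp far above the critical value suggests that the variances differ significantly across the range, so weighting is worth evaluating.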

Figure 3: Plot of standard deviation versus concentration of carbamazepine calibration curve, showing the increase in variance with increase in concentration.

Figure 3 shows that the standard deviation (the square root of the variance) increases with concentration and response, which indicates that the data are heteroscedastic. One way to improve the calibration of data with heteroscedastic noise, as in this case, is weighted least squares regression. WLS has the same objective as OLS, except that it minimizes a weighted residual sum of squares. Common weighting schemes include 1/x_i, 1/x_i², and 1/s_i², where s_i is the standard deviation at calibration level i. Weighting by inverse variance (1/s_i²) is the most common method. It places more weight on observations measured with less absolute uncertainty, which often leads to a more robust calculation of the m and b parameters in Equation 1.
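
The weighting schemes and the weighted objective can be written out directly. In this minimal Python sketch, the scheme labels and helper names are hypothetical conventions chosen for illustration. The 1/s² option uses the sample variance of the replicate responses at each level.

```python
from statistics import variance


def calibration_weights(scheme, x, replicates=None):
    """Weights w_i for common WLS schemes: '1/x', '1/x^2', or '1/s^2' (inverse replicate variance)."""
    if scheme == "1/x":
        return [1.0 / xi for xi in x]
    if scheme == "1/x^2":
        return [1.0 / xi ** 2 for xi in x]
    if scheme == "1/s^2":
        if replicates is None:
            raise ValueError("1/s^2 weighting needs the replicate responses at each level")
        return [1.0 / variance(r) for r in replicates]
    raise ValueError(f"unknown weighting scheme: {scheme}")


def weighted_rss(m, b, x, y, w):
    """Objective minimized by WLS: sum of w_i * (y_i - (m*x_i + b))**2."""
    return sum(wi * (yi - (m * xi + b)) ** 2 for xi, yi, wi in zip(x, y, w))


# Hypothetical levels (mM) and replicate responses, for illustration only
x = [0.5, 2.0]
replicates = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0]]
print(calibration_weights("1/x", x))
print(calibration_weights("1/x^2", x))
print(calibration_weights("1/s^2", x, replicates))
```

With all weights equal to 1, `weighted_rss` reduces to the ordinary residual sum of squares, so OLS is the special case of WLS with uniform weights. Multiplying every weight by the same constant (for example, normalizing them to sum to n) does not change the fitted m and b.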

Step 4

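When using weighted least squares, the slope and intercept of Equation 1 can be solved using the following equations (3):

$$m = \frac{\sum w_i \sum w_i x_i y_i - \sum w_i x_i \sum w_i y_i}{\sum w_i \sum w_i x_i^2 - \left(\sum w_i x_i\right)^2} \qquad (3)$$

$$b = \frac{\sum w_i x_i^2 \sum w_i y_i - \sum w_i x_i \sum w_i x_i y_i}{\sum w_i \sum w_i x_i^2 - \left(\sum w_i x_i\right)^2} \qquad (4)$$

We used the weighting w_i = 1/s_i², the inverse variance of the replicate peak areas at calibration level i. Equations 3 and 4 gave an m value of 17024.1 and a b value of 8.79. These updated parameters were used to repeat the quantitation done in Table 2 using:

$$x = \frac{y - b}{m} = \frac{y - 8.79}{17024.1}$$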
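
Equations 3 and 4 translate directly into code. The Python sketch below computes the weighted sums and the two closed-form expressions, then back-calculates with the slope and intercept reported above. `wls_fit` is a hypothetical helper name. The example peak area of 1711.2 was chosen only because it maps to 0.1 mM on this line; it is not a measured value.

```python
def wls_fit(x, y, w):
    """Weighted least-squares slope and intercept (Equations 3 and 4)."""
    sw = sum(w)
    swx = sum(wi * xi for wi, xi in zip(w, x))
    swy = sum(wi * yi for wi, yi in zip(w, y))
    swxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    swxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    denom = sw * swxx - swx ** 2          # common denominator of Equations 3 and 4
    m = (sw * swxy - swx * swy) / denom   # Equation 3
    b = (swxx * swy - swx * swxy) / denom  # Equation 4
    return m, b


def back_calculate(y, m, b):
    """Concentration from peak area: x = (y - b) / m."""
    return (y - b) / m


# Slope and intercept reported above for the carbamazepine WLS calibration
M_WLS, B_WLS = 17024.1, 8.79
print(back_calculate(1711.2, M_WLS, B_WLS))  # 1711.2 = 17024.1 * 0.1 + 8.79
```

With uniform weights (all w_i equal), `wls_fit` returns the same slope and intercept as OLS.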

Table 3: Calculation of % error of experimental versus predicted concentration (WLS fit).

Table 3 shows that, with the calibration parameters obtained from the weighted fit, the back-calculated errors are no longer biased toward the low end of the calibration curve. The average back-calculated error using WLS was 4.02%, more than a sixfold reduction from the 27.7% average error obtained with OLS.
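
The OLS-versus-WLS comparison can be reproduced in miniature on a hypothetical heteroscedastic calibration. This is not the carbamazepine data. The true line is y = 1000x. The three level means deviate from it by +5%, −5%, and +5%, and the replicate standard deviations are 5% of the true response. OLS is obtained from the same `wls_fit` helper (Equations 3 and 4) with unit weights, and WLS with 1/s² weights. The last line checks the fold reduction quoted above.

```python
def wls_fit(x, y, w):
    """Weighted least-squares slope and intercept (Equations 3 and 4)."""
    sw = sum(w)
    swx = sum(wi * xi for wi, xi in zip(w, x))
    swy = sum(wi * yi for wi, yi in zip(w, y))
    swxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    swxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    denom = sw * swxx - swx ** 2
    return (sw * swxy - swx * swy) / denom, (swxx * swy - swx * swxy) / denom


def back_calc_errors(x_true, y, m, b):
    """Mean and per-level absolute % error of back-calculated concentrations."""
    errs = [abs((yi - b) / m - xt) / xt * 100.0 for xt, yi in zip(x_true, y)]
    return sum(errs) / len(errs), errs


# Hypothetical data: concentration (mM), mean peak area, replicate SD
x = [0.01, 0.1, 1.0]
y = [10.5, 95.0, 1050.0]
s = [0.5, 5.0, 50.0]

ols = wls_fit(x, y, [1.0] * len(x))              # unit weights: ordinary least squares
wls = wls_fit(x, y, [1.0 / si ** 2 for si in s])  # inverse-variance weights
ols_mean, ols_errs = back_calc_errors(x, y, *ols)
wls_mean, wls_errs = back_calc_errors(x, y, *wls)
print("OLS", ols, [round(e, 2) for e in ols_errs], round(ols_mean, 2))
print("WLS", wls, [round(e, 2) for e in wls_errs], round(wls_mean, 2))

fold_reduction = 27.7 / 4.02  # average % errors reported for OLS and WLS
print(round(fold_reduction, 2))
```

In this made-up example, WLS moves most of the error away from the lowest standard and accepts a few percent at the highest level, where the data are least precise. This trade-off is the intended effect of inverse-variance weighting.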

Concluding Thoughts

This blog shows the advantage of weighted least squares fitting for real-world methods. Emerging separation scientists need to understand the reasoning behind weighted fitting, how to perform it, and how it is used across industries. Using the four simple steps provided in this blog, analysts can determine whether WLS is useful for their method and easily implement it using only Excel, as shown in the accompanying Excel sheet.

References

(1) Least-Squares Solution in Presence of Known Covariance (lscov). MATLAB Help Center, 2025. https://www.mathworks.com/help/matlab/ref/lscov.html (accessed 2025-05-14).

(2) Jurado, J. M.; Alcázar, A.; Muñiz-Valencia, R.; Ceballos-Magaña, S. G.; Raposo, F. Some Practical Considerations for Linearity Assessment of Calibration Curves as Function of Concentration Levels According to the Fitness-for-Purpose Approach. Talanta 2017, 172, 221–229. DOI: 10.1016/j.talanta.2017.05.049

(3) de Levie, R. When, Why, and How to Use Weighted Least Squares. J. Chem. Educ. 1986, 63 (1), 10. DOI: 10.1021/ed063p10

About the Authors

Adrian Carranza is an Associate Scientist (EW) working within the Synthetic Separations team at Amgen. His current work focuses on the quantification and analysis of small molecules and on a high-throughput LC-based solubility assay for early-stage synthetic compounds. Adrian received his degree in Microbiology from California State University Northridge in 2019. He has previously worked on liquid formulation screenings of adeno-associated virus (AAV)-based drug products and on other chromatographic techniques, such as size exclusion and ion exchange chromatography.

Troy T. Handlovic is a Scientist in the Synthetic Separations Group at Amgen, Inc. in Thousand Oaks, California. Troy earned a Ph.D. in analytical chemistry at the University of Texas at Arlington (UTA) under the guidance of Prof. Daniel W. Armstrong. His dissertation work focused on greening enantioseparations, improving the efficiency of columns, understanding separations with compressed fluids, and reducing the fluidic dispersion of LC instrumentation. Troy has also innovated within the digital signal processing space for analytical chemistry, working on algorithms for high-throughput ultrafast separations, some of which are now patent pending.

Muhammad Qamar Farooq is a Senior Scientist in the Synthetic Separations team at Amgen. His work focuses on the analysis and purification of small and hybrid molecules using chromatographic techniques, including mass-directed high-throughput purifications at microgram scale, as well as chiral method development and preparative-scale purifications using supercritical fluid chromatography (SFC). He completed his PhD in 2022 under the supervision of Dr. Jared Anderson at Iowa State University. His graduate research focused on using ionic liquids and deep eutectic solvents for analytical separations. He is also an executive committee member of ACS SCSC (Subdivision of Chromatography & Separations Chemistry).
