Open Access Article
Jiayu Zhangᵃ and Jason E. Hein*ᵃᵇᶜ
ᵃ Department of Chemistry, The University of British Columbia, Vancouver, BC V6T 1Z1, Canada. E-mail: jhein@chem.ubc.ca
ᵇ Department of Chemistry, University of Bergen, Norway
ᶜ Acceleration Consortium, University of Toronto, Toronto, ON, Canada
First published on 26th December 2025
Infrared (IR) spectroscopy is a powerful tool for real-time reaction monitoring in chemical synthesis. For applications that require tracking concentration profiles of reactive species, univariate linear calibration models are commonly used to relate IR signals to analyte concentrations. Despite their simplicity, the accuracy of these models can be limited by spectral overlap and other effects that distort the linear relationship between concentration and signal. To address this limitation, chemometric models are often employed, typically without further examination of opportunities to improve univariate calibration performance itself. Here, we present a novel workflow based on Bayesian statistics to enhance univariate calibration for IR reaction monitoring. The central feature of this workflow is the use of three diagnostic Bayesian probabilistic models, combined with data-preprocessing selection, to screen for IR signals that can potentially improve univariate calibration performance when non-linear effects are present. We applied the workflow to a test reaction system and identified an IR signal in the fingerprint region, along with an uncommon preprocessing strategy, that reduced prediction error by more than 50% compared with the univariate model using the original preprocessing steps. Overall, our workflow aims to improve the usability of univariate calibration approaches and expand the toolbox available to chemists for IR monitoring of complex chemical processes.
Infrared (IR) spectroscopy has become a mainstay among analytical techniques for real-time reaction monitoring. The in situ nature of IR spectroscopy allows direct analysis of the reaction medium, avoiding delays and artifacts associated with offline sample handling or transfer. In its simplest use, unique IR signals are first identified, and key process information, such as conversion, is obtained by tracking the evolution of these signals throughout the reaction.6 For more quantitative analysis, offline samples are collected in parallel and analyzed by complementary techniques to obtain analyte concentrations. These concentration values are then linked back to the corresponding IR signals to construct a univariate linear regression model based on the Beer–Lambert law.7,8
Despite the straightforward nature of the univariate linear regression approach, recent studies8,9 have cautioned against its use, even when a distinct spectroscopic feature can be assigned to the analyte of interest. Chemical reactions often exhibit higher-order effects: changes in the observed spectroscopic response are influenced not only by analyte concentration but also by additional factors such as pH, temperature, or the presence of other components in the reaction medium. When such effects arise, chemometric models10 are frequently employed to address the complex dynamics in the data. Few studies, however, have sought to improve the univariate model when higher-order effects are present. For example, spectral preprocessing optimization is often used to improve chemometric modeling accuracy,11,12 but such optimization is rarely applied in the development of univariate calibration models.
Bayesian models have gained increasing attention in recent years for diverse applications in chemistry, including reaction and process optimization,13,14 kinetic parameter estimation, model selection,15,16 and development of spectroscopic calibration models.17,18 Their ability to incorporate prior knowledge (e.g., assumptions about measurement noise) and to treat model parameters and predictions probabilistically makes them especially valuable for data analysis in low-data regimes.16
Here, we present a workflow based on Bayesian modeling to more deeply interrogate the use of the univariate linear regression approach for IR reaction monitoring. The central idea is to perform Bayesian model criticism. Rather than immediately discarding the univariate model following an initial lack of fit, we employ a Bayesian hierarchical linear regression model to assess the potential for performance improvement through optimization of preprocessing steps. When such opportunities are identified, a grid-search optimization campaign is used to locate the optimal preprocessing pipeline. Finally, the improvements in model performance are evaluated using Bayesian posterior predictive checks. The aim of this work is to provide a practical roadmap for improving univariate calibration approaches for IR reaction monitoring, thereby expanding the set of tools available to chemists when working with challenging and dynamic reaction mixtures.
Fig. 1 Top: reaction scheme for the BTM-catalyzed acylation. Bottom: reaction conditions used in the training and validation experiments.
A separate nuclear magnetic resonance (NMR) reaction monitoring experiment (Fig. S4) was conducted to confirm the mass balance between starting material 1 and the product (i.e., 1 was converted exclusively to products in this reaction). With this information, the method introduced in ref. 22 was used to convert the HPLC peak areas for 1 and the products into concentrations without requiring an external calibration curve (Fig. S4).
[Equation (1)]
The Gaussian likelihood is widely used in chemistry applications,17,18 based on the assumption that measurement noise is normally distributed and independent of the magnitude of the measurement. In this work, concentration measurements were obtained from an online HPLC reaction monitoring platform. Because the fluidics transfer and dilution steps may introduce noise that scales with signal magnitude, we also evaluated an alternative likelihood function: the lognormal distribution. To assess which likelihood better reflects the measurement process, a double exponential decay function was fitted to the observed concentration profiles of 1 to approximate the mean signal behavior. These mean estimates were then used to infer the noise parameter for each candidate likelihood. The resulting log-likelihood values were similar for both models (Fig. S6), indicating that the current dataset does not provide sufficient evidence to favor one likelihood function over the other. Consequently, we selected the Gaussian likelihood for the remainder of the analysis due to its computational simplicity.
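The likelihood comparison described above can be sketched as follows. This is a minimal illustration, not the authors' code: the mean curve is taken as already fitted, the noise parameter of each likelihood is set to its maximum-likelihood value, the lognormal model treats the mean curve as the median, and all function names and numerical values are hypothetical.

```python
import numpy as np

def double_exp_decay(t, a1, k1, a2, k2):
    """Double exponential decay used to approximate the mean concentration profile."""
    return a1 * np.exp(-k1 * t) + a2 * np.exp(-k2 * t)

def gaussian_loglik(y, mu):
    """Log-likelihood under additive Gaussian noise (sigma at its maximum-likelihood value)."""
    resid = y - mu
    sigma = np.sqrt(np.mean(resid ** 2))
    return float(np.sum(-0.5 * np.log(2 * np.pi * sigma ** 2) - resid ** 2 / (2 * sigma ** 2)))

def lognormal_loglik(y, mu):
    """Log-likelihood under multiplicative (lognormal) noise; mu is treated as the median."""
    log_resid = np.log(y) - np.log(mu)
    s = np.sqrt(np.mean(log_resid ** 2))
    return float(np.sum(-np.log(y) - 0.5 * np.log(2 * np.pi * s ** 2) - log_resid ** 2 / (2 * s ** 2)))

# Synthetic stand-in for a measured concentration profile (illustrative numbers only).
rng = np.random.default_rng(0)
t = np.linspace(0.0, 120.0, 40)
mu = double_exp_decay(t, 0.06, 0.05, 0.04, 0.005)
y = mu + rng.normal(0.0, 0.002, t.size)

print("Gaussian log-likelihood: ", gaussian_loglik(y, mu))
print("Lognormal log-likelihood:", lognormal_loglik(y, mu))
```

Similar log-likelihood values for the two candidates would indicate, as in the text, that the data do not clearly favor either noise model.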
Model performance was assessed using Bayesian leave-one-out (BayesLOO) scores. BayesLOO is calculated by training the model on all but one data point, followed by validation on the left-out point. This procedure is repeated until each data point in the training set has been left out once. Because BayesLOO scores are most meaningful in relative comparisons, the score of M3 was used as a reference (best-case scenario) for subsequent comparisons.
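The leave-one-out procedure can be illustrated with a small example. The sketch below is a simplification under stated assumptions rather than the models used in this work: it uses a conjugate Bayesian linear regression with a known noise level `sigma` and a zero-mean Gaussian prior of width `tau` on the coefficients, and it refits the model exactly for each left-out point instead of using an approximation. The function names and the synthetic data are hypothetical.

```python
import numpy as np

def posterior(X, y, sigma, tau):
    """Posterior mean and covariance of w for y ~ N(Xw, sigma^2), w ~ N(0, tau^2 I)."""
    precision = X.T @ X / sigma ** 2 + np.eye(X.shape[1]) / tau ** 2
    cov = np.linalg.inv(precision)
    mean = cov @ X.T @ y / sigma ** 2
    return mean, cov

def log_predictive(x_new, y_new, mean, cov, sigma):
    """Log posterior predictive density of one held-out observation."""
    mu = x_new @ mean
    var = sigma ** 2 + x_new @ cov @ x_new
    return -0.5 * np.log(2 * np.pi * var) - (y_new - mu) ** 2 / (2 * var)

def elpd_loo_exact(X, y, sigma=0.05, tau=10.0):
    """Sum of held-out log predictive densities, refitting once per data point."""
    n = len(y)
    total = 0.0
    for i in range(n):
        keep = np.arange(n) != i          # train on all points except i
        mean, cov = posterior(X[keep], y[keep], sigma, tau)
        total += log_predictive(X[i], y[i], mean, cov, sigma)
    return float(total)

# Synthetic data: one response that is linear in the signal and one that is not.
x = np.linspace(0.0, 1.0, 20)
X = np.column_stack([np.ones_like(x), x])   # intercept + slope
y_linear = 0.1 + 0.5 * x
y_curved = 0.1 + 0.5 * x + 1.0 * x ** 2

print("elpd_loo (linear data):", elpd_loo_exact(X, y_linear))
print("elpd_loo (curved data):", elpd_loo_exact(X, y_curved))
```

A higher score means better out-of-sample predictive performance, so the value is informative mainly when compared across models or datasets.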
In the first scenario, both M1 and M2 exhibit much lower BayesLOO scores than M3. This indicates that higher-order effects are present at the individual level (within each experiment) for the selected IR wavenumber region. In this case, chemometric models should be employed to address these complex effects, or an alternative IR signal can be selected to re-enter the model comparison.
In the second scenario, M1 shows a low BayesLOO score, but M2 performs reasonably well relative to M3. This suggests that the linear relationship between concentration and IR response is maintained within each individual experiment, but that higher-order effects, which likely originate from variation in experimental conditions, still influence the analyte IR responses. In this scenario, data preprocessing optimization has a considerably higher chance of mitigating the higher-order effects than in the first scenario. Once such a scenario is identified, principal component analysis (PCA) is applied to understand the nature of the variations. This step indicates which preprocessing categories should be considered in the subsequent optimization. A grid search is then performed over all possible combinations of preprocessing steps to identify an optimal pipeline that enhances the performance of M1. Finally, Bayesian posterior predictive checks are performed to assess the quality of the improvements.
In the third scenario, M1 performs comparably to M3. This suggests that higher-order effects are negligible in the current application, and the univariate model can be reliably used for reaction monitoring.
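The three scenarios amount to a simple decision rule on the BayesLOO scores. The sketch below is only one possible encoding: the text does not define a numerical cut-off for "much lower" or "comparable", so the `tolerance` argument is a hypothetical, user-chosen threshold, and the scores in the usage lines are made-up values for illustration.

```python
def classify_scenario(elpd_m1, elpd_m2, elpd_m3, tolerance):
    """Map BayesLOO scores of M1, M2 and the reference M3 to scenario 1, 2 or 3.

    `tolerance` is an assumed maximum gap to M3 that still counts as comparable.
    """
    if elpd_m3 - elpd_m1 <= tolerance:
        return 3  # M1 comparable to M3: univariate model can be used directly
    if elpd_m3 - elpd_m2 <= tolerance:
        return 2  # only M2 comparable to M3: try preprocessing optimization
    return 1      # both far below M3: use chemometrics or pick another IR signal

# Illustrative (made-up) scores, not values from this work.
print(classify_scenario(100.0, 120.0, 300.0, tolerance=20.0))
print(classify_scenario(100.0, 290.0, 300.0, tolerance=20.0))
print(classify_scenario(295.0, 298.0, 300.0, tolerance=20.0))
```

In practice the threshold would need to account for the standard errors of the scores rather than being a fixed number.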
The results showed that M1 and M2 both yielded substantially lower BayesLOO scores than M3 (Fig. 3b), indicating that this IR region is affected by higher-order effects at the individual experiment level. As a result, the univariate linear regression model is not suitable for calibration in this spectral region. This behavior may arise from the overlap between the product carbonyl stretch and the carbonyl signal of the byproduct isobutyric acid.
Next, two IR signals (1156 cm−1 and 1178 cm−1) that showed strong correlation with the product concentration were identified in the IR fingerprint region. These features likely originate from carbon–oxygen single-bond stretching or bending of the product. Recalculation of the BayesLOO scores using these newly selected IR signals showed substantial improvement in the performance of M2 (Fig. 3b), whereas M1 continued to perform poorly relative to M3 (Table 1, entry 3). These results suggest that the selected fingerprint-region signals, particularly the one at 1156 cm−1, exhibit a strong linear relationship with product concentration within each experiment. However, as the experimental conditions changed, a common higher-order effect emerged and altered the nature of this linear relationship.
| Index | IR signals | Baseline correction | IR peak range | elpd_loo |
|---|---|---|---|---|
| 1 | Peak height | First derivative | Zero | 250.7 ± 5.9 |
| 2 | Peak height | First derivative | Valley | 259.7 ± 3.4 |
| 3 | Peak height | Second derivative | Zero | 218.0 ± 4.2 |
| 4 | Peak height | Second derivative | Valley | 223.4 ± 4.1 |
| 5 | Peak area | First derivative | Zero | 84.1 ± 1.5 |
| 6 | Peak area | First derivative | Valley | 292.8 ± 3.6 |
| 7 | Peak area | Second derivative | Zero | 148.5 ± 3.8 |
| 8 | Peak area | Second derivative | Valley | 240.5 ± 3.0 |
PC-2 captured a smaller (∼9%) but still meaningful portion of variance. The PC-2 scores (Fig. 4c, gray hollow circles) for experiment 1 exhibited an upward trend, whereas those for experiments 4 and 7 exhibited downward trends. We therefore reasoned that the variation in the second PC was not directly associated with the change of the major reactive species shown in Fig. 1, because such changes would be expected to produce consistent trends across different experiments. We also compared PC-2 scores with the temperature profiles (measured by the in situ IR probe), but did not observe any meaningful correlation. We speculate that the varying amounts of triethylamine used in these experiments may influence the protonation states and hydrogen bonding interactions among reaction species, thereby giving rise to the variations observed in PC-2. In future studies, a pH probe could be incorporated to track pH changes across different experiments. Comparing the pH profiles with the PC scores would provide additional evidence to support or refute the current hypothesis.
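For readers less familiar with PCA, the scores, loadings, and explained-variance fractions discussed here can be obtained from a singular value decomposition of the mean-centered spectra. The sketch below assumes a matrix with one spectrum per row and one wavenumber per column; the function name and the synthetic spectra are hypothetical and do not reproduce the data of this work.

```python
import numpy as np

def pca_svd(spectra, n_components=2):
    """PCA of a (time points x wavenumbers) matrix via SVD of the centered data."""
    centered = spectra - spectra.mean(axis=0)
    U, s, Vt = np.linalg.svd(centered, full_matrices=False)
    explained = s ** 2 / np.sum(s ** 2)              # fraction of variance per PC
    scores = U[:, :n_components] * s[:n_components]  # one row per spectrum
    loadings = Vt[:n_components]                     # one row per PC
    return scores, loadings, explained[:n_components]

# Synthetic spectra: one growing band near 1156 cm-1 plus small random noise.
rng = np.random.default_rng(1)
wavenumbers = np.arange(1100.0, 1221.0, 4.0)
band = np.exp(-0.5 * ((wavenumbers - 1156.0) / 8.0) ** 2)
progress = np.linspace(0.0, 1.0, 25)
spectra = np.outer(progress, band) + rng.normal(0.0, 0.01, (25, wavenumbers.size))

scores, loadings, explained = pca_svd(spectra)
print("Explained variance fractions:", explained)
print("Strongest PC-1 contributor (cm-1):", wavenumbers[np.argmax(np.abs(loadings[0]))])
```

Plotting a column of `scores` against time gives a score trend of the kind compared across experiments, and the largest absolute entries in a row of `loadings` identify the wavenumbers that contribute most to that component.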
The PC-2 loading plot showed that the wavenumbers at 1140 and 1164 cm−1 were the two strongest contributors to PC-2 (Fig. 4b). Conventionally, after baseline correction, the maximum peak height of the identified IR peak is used as the IR signal. Based on the PC-2 analysis, we reasoned that incorporating additional information from the surrounding spectral region into the IR signal calculation might help mitigate the higher-order effects. To test this idea, peak area integration was introduced as an alternative method for calculating the IR signal, and the integration bounds were extended from a zero-point baseline to the neighboring valley points (Fig. 3a).
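The difference between the two signal definitions can be made concrete with a short example. The sketch below assumes a spectrum that has already been baseline corrected, locates the valley points by walking downhill from the peak maximum on each side, and integrates with the trapezoidal rule. The helper names and the synthetic two-band spectrum are hypothetical, and the valley search is a simple heuristic rather than the procedure used in this work.

```python
import numpy as np

def valley_bounds(y, peak_idx):
    """Indices of the nearest local minima to the left and right of the peak."""
    left = peak_idx
    while left > 0 and y[left - 1] < y[left]:
        left -= 1
    right = peak_idx
    while right < len(y) - 1 and y[right + 1] < y[right]:
        right += 1
    return left, right

def peak_height(y, peak_idx):
    """Conventional IR signal: intensity at the peak maximum."""
    return float(y[peak_idx])

def peak_area(x, y, left, right):
    """Alternative IR signal: trapezoidal area between the two bounds (inclusive)."""
    xs, ys = x[left:right + 1], y[left:right + 1]
    return float(np.sum(0.5 * (ys[1:] + ys[:-1]) * np.diff(xs)))

# Synthetic spectrum with two overlapping bands near 1156 and 1178 cm-1.
wavenumbers = np.arange(1100.0, 1221.0, 2.0)
spectrum = (np.exp(-0.5 * ((wavenumbers - 1156.0) / 6.0) ** 2)
            + 0.6 * np.exp(-0.5 * ((wavenumbers - 1178.0) / 6.0) ** 2))

peak_idx = int(np.argmax(spectrum))
left, right = valley_bounds(spectrum, peak_idx)
print("Peak position (cm-1):", wavenumbers[peak_idx])
print("Valley bounds (cm-1):", wavenumbers[left], wavenumbers[right])
print("Peak height:", peak_height(spectrum, peak_idx))
print("Peak area:  ", peak_area(wavenumbers, spectrum, left, right))
```

On noisy spectra, a downhill walk of this kind can stop at a spurious local minimum, so some smoothing or a minimum search window would likely be needed.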
Eight different preprocessing pipelines (Table 1) were therefore constructed by combining three design choices: first vs. second derivative (for baseline correction), peak area vs. maximum peak height (for IR signal calculation), and zero vs. valley (for IR peak bounds definition). Scattering corrections were not included in this optimization because the experimental setup minimized scattering effects through careful control of the IR probe position and the homogeneous nature of the reaction mixture. Normalization and scaling, commonly used to equalize the variation of important and less important variables for multivariate model development,11 were also not considered. BayesLOO scores were calculated for M1 with all eight preprocessing pipelines. The combination of “first derivative + peak area + valley” (Table 1, entry 6) showed substantial improvement relative to the original pipeline (entry 3). These results support our hypothesis that incorporating additional spectral information around the peak maximum in the IR signal calculation helps mitigate higher-order effects, thereby improving the performance of the univariate model (M1).
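The structure of the grid search can be sketched as follows. The eight pipelines are enumerated as the Cartesian product of the three design choices, in the same order as the entries of Table 1. For brevity, the elpd_loo means are copied from Table 1 instead of being recomputed; in an actual run, each score would come from refitting M1 on data processed with the corresponding pipeline. Variable names are hypothetical.

```python
from itertools import product

# Three binary design choices, ordered so that the product follows Table 1.
signals = ["Peak height", "Peak area"]
baselines = ["First derivative", "Second derivative"]
bounds = ["Zero", "Valley"]

pipelines = list(product(signals, baselines, bounds))

# Mean elpd_loo values from Table 1 (entries 1 to 8); uncertainties omitted here.
elpd_loo = [250.7, 259.7, 218.0, 223.4, 84.1, 292.8, 148.5, 240.5]

# Entry numbers are 1-based, as in Table 1.
best_entry = max(range(len(pipelines)), key=lambda i: elpd_loo[i]) + 1

for entry, (pipeline, score) in enumerate(zip(pipelines, elpd_loo), start=1):
    marker = "  <- highest elpd_loo" if entry == best_entry else ""
    print(f"{entry}: {' + '.join(pipeline)}: {score}{marker}")
```

Because the search space contains only eight combinations, an exhaustive grid search is inexpensive; a larger set of preprocessing options would grow multiplicatively with each added design choice.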
The results showed that, as expected, the simulated data distributions from M3 (Fig. 5a, green shaded area) closely resembled the observed data distribution (Fig. 5a, black dashed line). Due to pronounced higher-order effects, the simulated data distributions from M1 using the original preprocessing pipeline (Fig. 5a, blue shaded area) failed to reproduce the observed distribution. However, the simulated data from M1 using the optimized preprocessing pipeline (Fig. 5a, orange shaded area), though displaying slightly greater uncertainty, successfully captured all salient features of the observed distribution.
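A posterior predictive check of this kind can be sketched in a few lines. The example below assumes a simple linear model with posterior draws for the intercept, slope, and noise level; here the draws are hypothetical and generated directly rather than sampled from a fitted model. Instead of the visual density comparison shown in Fig. 5a, it reports a simpler numerical summary, namely the fraction of observations that fall inside the central 95% predictive interval.

```python
import numpy as np

def posterior_predictive(x, intercept_draws, slope_draws, sigma_draws, rng):
    """Simulate one replicated dataset per posterior draw for y ~ N(a + b*x, sigma)."""
    mu = intercept_draws[:, None] + slope_draws[:, None] * x[None, :]
    noise = rng.normal(size=mu.shape) * sigma_draws[:, None]
    return mu + noise

def interval_coverage(y_obs, y_rep, level=0.95):
    """Fraction of observations inside the central predictive interval."""
    tail = (1.0 - level) / 2.0
    lower, upper = np.quantile(y_rep, [tail, 1.0 - tail], axis=0)
    return float(np.mean((y_obs >= lower) & (y_obs <= upper)))

rng = np.random.default_rng(2)
x = np.linspace(0.0, 1.0, 30)
n_draws = 2000

# Hypothetical posterior draws (stand-ins for samples from a fitted model).
intercept_draws = rng.normal(0.10, 0.005, n_draws)
slope_draws = rng.normal(0.50, 0.010, n_draws)
sigma_draws = np.abs(rng.normal(0.02, 0.002, n_draws))

y_rep = posterior_predictive(x, intercept_draws, slope_draws, sigma_draws, rng)

# Observations consistent with the model vs. observations with a curved deviation.
y_consistent = 0.10 + 0.50 * x
y_curved = y_consistent + 0.30 * x ** 2

print("Coverage, consistent data:", interval_coverage(y_consistent, y_rep))
print("Coverage, curved data:    ", interval_coverage(y_curved, y_rep))
```

Low coverage signals that the model cannot reproduce the observed data, which is the same conclusion a mismatch between simulated and observed distributions would support.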
Based on these observations, we concluded that the univariate linear regression model with the optimized preprocessing pipeline could be used for reaction monitoring, and we applied it to four additional test experiments. The results showed that, except for experiment 3, the new model accurately predicted product concentrations, as indicated by the close agreement between observed and predicted values in Fig. 5b. It also produced an average 70% decrease in the root mean square error of prediction (RMSEP) relative to the original model, and yielded an RMSEP comparable to that of M3 (Fig. S7). We attribute the discrepancy observed in experiment 3 to inaccurate calculation of the IR signal by the optimized preprocessing pipeline, which arises from uncertainty in identifying the correct valley positions in its IR spectra. This limitation could be mitigated in future studies by increasing the spectral resolution of the IR measurements (e.g., from eight to four wavenumbers per data point), which would facilitate more reliable identification of valley points.
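For reference, the RMSEP and the relative decrease quoted above follow from standard definitions, sketched below. The function names are hypothetical, and the arrays are small made-up examples, not data from this work.

```python
import numpy as np

def rmsep(y_true, y_pred):
    """Root mean square error of prediction on held-out data."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def percent_decrease(rmsep_original, rmsep_new):
    """Relative RMSEP reduction of the new model, in percent."""
    return 100.0 * (rmsep_original - rmsep_new) / rmsep_original

# Made-up observed and predicted concentrations for one test experiment.
observed = [0.010, 0.020, 0.030, 0.040]
predicted_original = [0.014, 0.026, 0.022, 0.047]
predicted_optimized = [0.011, 0.021, 0.028, 0.041]

original = rmsep(observed, predicted_original)
optimized = rmsep(observed, predicted_optimized)
print("RMSEP (original pipeline): ", original)
print("RMSEP (optimized pipeline):", optimized)
print("Decrease (%):", percent_decrease(original, optimized))
```

An average decrease across several test experiments would be obtained by computing this percentage per experiment and taking the mean.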
The use of Bayesian models is central to the proposed workflow, as they allow measurement noise to be incorporated directly into the models and provide uncertainty estimates for all inferred parameters. These features are particularly valuable when analyzing reaction time-course data, where datasets are typically limited in size and affected by measurement noise. With advances in computing power, programming languages, open-source software, and high-quality online documentation, Bayesian approaches have become increasingly accessible to practitioners without formal training in statistics or computer science. As a result, we anticipate that the proposed workflow can be readily adopted and adapted to other applications with little additional effort.
One limitation of the current study is that only a single reaction system was investigated. For new reaction systems, the current preprocessing optimization strategy may not yield meaningful improvements, even when scenario two is identified. In future work, we aim to apply this workflow to a broader range of reaction systems and to incorporate multivariate curve resolution (MCR) analysis into the workflow. MCR can provide more chemically meaningful interpretations of IR spectra than PCA.10 This added insight will enable the design of more tailored and effective preprocessing strategies, and ultimately enhance the generalizability of the workflow for improving univariate calibration approaches across a wide range of chemical applications.
This journal is © The Royal Society of Chemistry 2026