Open Access Article
Mahdi Ahmed (a), Shady Hamed (a), Ricardo Cardoso (a), Charley Kenyon (a), Manoj Pohare (a), Mabrouka Maamra (a), Mark Dickman (a), Joan Cordiner (a) and Zoltán Kis* (a, b)
(a) School of Chemical, Materials and Biological Engineering, University of Sheffield, Sheffield S1 3JD, UK. E-mail: z.kis@sheffield.ac.uk; Web: https://sheffield.ac.uk/cmbe/people/cbe-academic-staff/zoltan-kis
(b) Department of Chemical Engineering, Imperial College London, London SW7 2AZ, UK
First published on 20th May 2026
Real-time monitoring of in vitro transcription (IVT) reactions is critical for enabling continuous manufacturing of high-quality mRNA vaccines and therapeutics for a wide spectrum of diseases. Compared to traditional batch manufacturing, continuous IVT production offers higher throughput, improved consistency, and reduced costs, but requires timely process monitoring to detect deviations and maintain product quality. Since pH is routinely measured in bioreactors, it can serve as a convenient, non-invasive input for real-time monitoring. We present the first IVT soft sensor based on H+ release during NTP incorporation, using in-line pH data to infer up to 40 otherwise predominantly unobservable species in real time, without requiring additional sensors. Validated against a separate set of offline measurements (not used for model fitting), it delivers updates every 25 milliseconds via two complementary models. The first couples a mechanistic IVT model with an Unscented Kalman Filter (UKF) to dynamically infer ≈40 key indicators, including mRNA yield (R2 = 0.95) and NTP depletion (R2 = 0.84). The second applies the semi-empirical Henderson–Hasselbalch correlation to reconstruct mRNA yield (R2 = 0.93) and NTP depletion (R2 = 0.76) from buffer capacity and pH change alone. This soft sensor enables continuous, real-time process monitoring by generating ≈1600 concentration estimates per second, supporting quality-by-digital-design and advanced control for continuous, disease-agnostic mRNA medicine manufacturing.
This rapid clinical expansion creates a pressing need for highly productive and scalable multi-product manufacturing platforms capable of delivering high-quality mRNA drug substance in a cost-effective manner.6–8 Achieving such manufacturing advancements requires: (1) multi-product manufacturing capabilities; and (2) rapid, or ideally real-time, monitoring of both process performance and product quality attributes, which is vital for continuous production.
Quality-by-Design (QbD) principles have the potential to enable multi-product manufacturing capabilities by mapping the impact of critical material attributes (CMAs) and critical process parameters (CPPs) onto product critical quality attributes (CQAs) and manufacturing key performance indicators (KPIs). This can establish a multi-product design space defined by CMA and CPP ranges, within which products can be manufactured with optimal KPIs (cost effectively and rapidly) and with the desired CQAs (translating to patient safety and product efficacy).6,9–11 Quality-by-Digital-Design (QbDD) extends this capability by using mechanistic, data-driven, or hybrid models to guide manufacturing process development and automate the operation of the developed manufacturing process. This involves defining and optimizing the QbDD design space in silico before a manufacturing run is executed and performing real-time optimization, feedback control and feedforward control (e.g., model-predictive control) during the operation of the manufacturing process.6,9,11 The implementation of this QbDD-aided multi-product mRNA manufacturing is currently constrained by the absence of real-time monitoring of manufacturing KPIs and product CQAs. Several spectroscopy-based Process Analytical Technology (PAT) tools have been explored for monitoring of the in vitro transcription (IVT) reaction. For example, in situ Raman spectroscopy, Fourier-transform infrared (FTIR) spectroscopy, and flow nuclear magnetic resonance (flow-NMR, also known as online NMR) spectroscopy resolve spectral signatures of NTPs, PPi and growing RNA chains, enabling chemometric reconstruction of reaction progress.12–14 These spectroscopic methods offer molecular specificity but lack the required sensitivity, and they require expensive optics and often labor-intensive calibration and data analysis.
A simpler, more accessible monitoring strategy capable of quantifying both KPIs and CQAs in real-time would offer significant value by ensuring that CQAs and KPIs are kept within specification. Real-time (or near real-time) monitoring is especially important for continuous IVT processes, where raw materials are continuously fed and the product is continuously generated. Without timely monitoring, process faults may only be detected hours or days later, potentially resulting in large quantities of off-specification products. Real-time monitoring improves process understanding and enables early detection of deviations, thereby enhancing efficiency, reducing costs, and improving consistency in product quality.15–18 Since pH is already routinely measured in bioreactors, it can serve as a convenient, non-invasive input for such a monitoring approach without requiring additional sensors or measurements.
The IVT reaction is at the core of the mRNA manufacturing process. The optimal outcome of the IVT reaction depends on carefully controlled reaction composition (e.g., NTP:Mg ratio, template DNA concentration, T7 RNA polymerase, etc.), pH, buffer strength, heat and mass transfer.6,19–21 During IVT, the mRNA is assembled from nucleotide building blocks by bacteriophage enzymes (e.g., T7 RNA polymerase) based on a template DNA.19,22 The T7 RNA polymerase is a formidable molecular machine capable of incorporating 200–250 NTPs per second into the nascent mRNA strand, each NTP incorporation consisting of 3 reversible and 1 irreversible sub-step.19,22–24 During each NTP addition cycle, the 3′-hydroxyl group of the primer terminus is deprotonated to generate a 3′-O− nucleophile. This nucleophile attacks the α-phosphorus atom of the incoming NTP, generating a transient pentacoordinated phosphorane intermediate. In the canonical two-metal-ion mechanism, MgA2+ (which acts as the general base to deprotonate the primer's 3′-OH) and MgB2+ (which stabilizes the triphosphate moiety of the incoming NTP) work together to facilitate catalysis.25–27 The breakdown of the resulting pentacoordinate phosphorane intermediate yields a new phosphodiester bond (Fig. S1). This reaction releases both pyrophosphate (PPi) and a proton into the solution.28–30 Since one proton is released for each NTP incorporated, the cumulative proton release is stoichiometrically linked to NTP consumption. In IVT reactions, which are moderately buffered, this proton release results in a measurable pH change that can be detected with pH meters.28–31 Because IVT models can account for proton balance, the pH trajectory can be linked to reaction progress.20,32 This creates an opportunity for a low-cost IVT monitoring approach that relies on routine pH measurements and requires minimal calibration of both the sensor hardware and the model used to estimate IVT species concentrations.
The in-line, on-line, at-line or off-line measurement of pH in small volumes (e.g., 1 mL or less) can be achieved using micro pH sensors, such as fiber optic pH sensors. These microsensors were originally developed in the late 1970s33 and have been used for rapid cell-culture monitoring.34–36 However, they remain under-used in cell-free RNA production. Several IVT kinetic models that account for pH and proton balance have been reported in the literature. Young et al.37 developed an early model of batch IVT that included proton release and buffer equilibria to predict pH changes during transcription. Van De Berg et al.32 extended this by building a QbD IVT model that accounts for NTP–Mg complexation and buffer speciation, enabling design space exploration. More recently, Ahmed et al.20 developed a comprehensive mechanistic IVT model that captures ≈40 reaction and buffer species, including detailed Mg2+ complexation, pyrophosphate hydrolysis, and multiple buffer equilibria, while also accounting for enzyme kinetics and template-specific nucleotide composition. The key advance of the present work is the integration of such a mechanistic model into a real-time state estimation framework (Unscented Kalman Filter), enabling reconstruction of a high-dimensional reaction state from in-line pH measurements.
Here, we embed an in-line pH microprobe directly in the IVT reaction mixture, streaming frequent pH data to a soft sensor that computes real-time estimates of (i) individual NTP depletion (consumption), (ii) mRNA titer (yield) (both validated against offline assays), (iii) reaction rate, and (iv) additional state variables predicted by the mechanistic model (e.g., magnesium–nucleotide complexes, pyrophosphate, orthophosphate and buffer species such as protonated and deprotonated forms of HEPES or TRIS). Two complementary computational models are evaluated using this soft sensor: (1) a mechanistic IVT model coupled to an Unscented Kalman Filter (UKF); (2) a semi-empirical Henderson–Hasselbalch (H–H) correlation that exploits buffer capacity and pH alone with no need for kinetic parameterization. These two approaches serve different but complementary purposes. The mechanistic IVT–UKF model can predict the future concentrations of a large number of reaction species and is regularly updated during the IVT reaction using pH measurements through the UKF. This enables estimation of otherwise unmeasurable reaction states and provides a framework suitable for advanced monitoring and future model-based control. However, it requires parameterization of kinetic models and higher computational effort. In contrast, the H–H approach relies only on pH measurements and buffer chemistry, making it simple, requiring minimal calibration, and easy to deploy with minimal computational requirements. However, it captures a smaller number of IVT species, provides more limited mechanistic insight, and does not enable forecasting of future concentration values. Using both models in parallel therefore provides a balance between practicality and predictive capability: the H–H model offers a minimal-data, rapid-deployment monitoring option, while the mechanistic IVT-UKF model provides deeper process insight and improved state estimation.
We demonstrate the application of this soft sensor in IVT reactions producing enhanced green fluorescent protein (eGFP) and SARS-CoV-2 spike protein (CSP) mRNA under two widely used buffer conditions (HEPES and TRIS). Predictions from both models are benchmarked against offline measurements of RNA yield and NTP depletion (hereafter used interchangeably with “RNA titer” and “NTP consumption,” respectively), validating the utility of pH-driven soft sensing for real-time IVT monitoring.
Crucially, this work goes beyond conventional soft sensing and PAT approaches by demonstrating that a single, routinely measured variable (pH) can encode sufficient information to reconstruct a high-dimensional biochemical reaction network in real time, establishing a new paradigm in process analytics in which information is extracted from the underlying reaction stoichiometry rather than from multiple or complex sensor inputs. While the framework infers ≈40 IVT and buffer species through mechanistic relationships, only a subset (RNA yield and individual NTP concentrations) is directly validated against independent experimental measurements.
:2.5 in RNase-free water before following the manufacturer's instructions. Eluted RNA was quantified on a NanoDrop™ OneC (Thermo Fisher Scientific) at 260 nm, with A260/A280 and A260/A230 ratios checked for purity.
:200 in the corresponding buffer. The instrument was calibrated using two standards, prepared by mixing 10 µL of the kit-provided Standard #1 (0 ng µL−1 in TE Buffer) and Standard #2 (1000 ng µL−1 in TE Buffer) with 190 µL of the working solution. For sample quantification, 2 µL of each RNA sample was added to 198 µL of the working solution. All tubes were briefly vortexed and incubated at room temperature for 2 minutes prior to fluorescence measurement.
| $\mathrm{NTP^{4-}} + \mathrm{RNA}_n \rightarrow \mathrm{RNA}_{n+1}^{-} + \mathrm{PP_i^{4-}} + \mathrm{H^+}$ | (1) |
Accordingly, the net consumption of NTP and production of protons can be captured by
![]() | (2) |
![]() | (3) |
The released protons are buffered by HEPES and other ionic complexes. For instance, the HEPES equilibrium is
![]() | (4) |
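| $H_{\mathrm{tot}} = [\mathrm{H^+}] + [\mathrm{HNTP^{3-}}] + [\mathrm{HPP_i^{3-}}] + 2[\mathrm{H_2PP_i^{2-}}] + [\mathrm{MgHNTP^-}] + [\mathrm{MgHPP_i^-}] + [\mathrm{HHEPES}] + [\mathrm{HP_i^{2-}}] + 2[\mathrm{H_2P_i^-}] + 3[\mathrm{H_3P_i}] + [\mathrm{MgHP_i}] + [\mathrm{HACET}]$ | (5) |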
All bound-species terms are expressed in free variables via dissociation constants; the complete set of equilibrium constants, kinetic parameters, and initial conditions is provided in ref. 20. For systems with alternative buffering (e.g., TRIS), the corresponding equilibria can be trivially derived and adapted to the model as needed.
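To illustrate how a total-proton balance of this kind determines the free proton concentration, the sketch below reduces eqn (5) to a single buffer (free H+ plus protonated HEPES) and solves it by bisection. It is a simplified illustration, not the model of ref. 20: the 40 mM buffer concentration, the 5 mM proton release and all function names are assumptions, while the pKa of 7.3 and the starting pH of 6.67 are values quoted in this article.

```python
import math


def bound_protons(h, buffer_total, pka):
    """Protonated buffer [HA] (mol/L) at free proton concentration h."""
    ka = 10.0 ** (-pka)
    return buffer_total * h / (ka + h)


def total_protons(h, buffer_total, pka):
    """Single-buffer version of the total proton balance: H_tot = [H+] + [HA]."""
    return h + bound_protons(h, buffer_total, pka)


def solve_free_h(h_total, buffer_total, pka, lo=1e-14, hi=1.0, iters=200):
    """Find [H+] such that total_protons([H+]) = h_total.

    total_protons increases monotonically with h, so bisection on a
    logarithmic scale is sufficient.
    """
    for _ in range(iters):
        mid = math.sqrt(lo * hi)
        if total_protons(mid, buffer_total, pka) < h_total:
            lo = mid
        else:
            hi = mid
    return math.sqrt(lo * hi)


# Assumed example: 40 mM HEPES, starting pH 6.67, 5 mM of protons released.
BUFFER_TOTAL = 0.040  # mol/L (assumption)
PKA_HEPES = 7.3       # at 37 °C, as quoted in the article
h0 = 10.0 ** (-6.67)
h_tot0 = total_protons(h0, BUFFER_TOTAL, PKA_HEPES)
h_after = solve_free_h(h_tot0 + 0.005, BUFFER_TOTAL, PKA_HEPES)
ph_after = -math.log10(h_after)
```

In the full model the same idea applies with every bound-species term of eqn (5) expressed through its dissociation constant, so the algebraic solve returns the pH consistent with the current species concentrations.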
We collect the system's differential states in x(t) and the algebraic states in z(t). The model thus forms a set of differential–algebraic equations (DAEs),
| $\dot{x} = f(x, z, \theta)$ | (6) |
| $0 = g(x, z, \theta)$ | (7) |
($\hat{x}_{k \mid k}$, $\hat{z}_{k \mid k}$) and the covariance $P_{k \mid k}$. Each sigma point is advanced to $t_{k+1}$ by integrating the system. The resulting predicted sigma points are then averaged to obtain $\hat{s}_{k+1 \mid k}$ and $P_{k+1 \mid k}$.
| $\mathrm{pH}_{\mathrm{meas},k} = -\log_{10}([\mathrm{H^+}]) + v_k$ | (8) |
| $\mathrm{pH}^{(i)}_{k+1 \mid k} = -\log_{10}\!\left([\mathrm{H^+}]\left(x^{(i)}_{k+1 \mid k}, z^{(i)}_{k+1 \mid k}\right)\right)$ | (9) |
From the ensemble $\{\mathrm{pH}^{(i)}_{k+1 \mid k}\}$, the UKF computes the updated mean and covariance in measurement space, calculates the cross-covariance with the state, and applies the Kalman gain to refine ($\hat{x}_{k+1 \mid k+1}$, $\hat{z}_{k+1 \mid k+1}$) and $P_{k+1 \mid k+1}$. Fig. S2 depicts the overall algorithmic flow.
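The measurement update described above can be sketched for a one-state toy problem. This is not the authors' FilterPy/DAE implementation: the single state (cumulative protons released, mol/L), the single-buffer Henderson–Hasselbalch measurement function standing in for the algebraic solve, the 40 mM buffer concentration, and the sigma-point parameters (α = 1, β = 2, κ = 2) are all assumptions chosen to keep the example short.

```python
import math

PKA = 7.3             # HEPES pKa at 37 °C (quoted in the article)
BUFFER_TOTAL = 0.040  # mol/L (assumption)
PH_START = 6.67       # initial reaction pH (quoted in the article)
HA_START = BUFFER_TOTAL / (1.0 + 10.0 ** (PH_START - PKA))


def ph_from_released_protons(x):
    """Toy measurement function: pH after x mol/L of protons are buffered.

    Valid only while HA_START + x stays below BUFFER_TOTAL.
    """
    ha = HA_START + x
    return PKA + math.log10((BUFFER_TOTAL - ha) / ha)


def ukf_measurement_update(x_pred, p_pred, ph_meas, r_meas,
                           alpha=1.0, beta=2.0, kappa=2.0):
    """Scalar unscented measurement update (state dimension n = 1)."""
    n = 1
    lam = alpha ** 2 * (n + kappa) - n
    spread = math.sqrt((n + lam) * p_pred)
    sigmas = [x_pred, x_pred + spread, x_pred - spread]
    w_mean = [lam / (n + lam), 0.5 / (n + lam), 0.5 / (n + lam)]
    w_cov = [w_mean[0] + (1.0 - alpha ** 2 + beta), w_mean[1], w_mean[2]]

    # Propagate each sigma point through the measurement function.
    ph_sigmas = [ph_from_released_protons(s) for s in sigmas]
    ph_mean = sum(w * z for w, z in zip(w_mean, ph_sigmas))

    # Innovation covariance and state/measurement cross-covariance.
    p_zz = sum(w * (z - ph_mean) ** 2 for w, z in zip(w_cov, ph_sigmas)) + r_meas
    p_xz = sum(w * (s - x_pred) * (z - ph_mean)
               for w, s, z in zip(w_cov, sigmas, ph_sigmas))

    gain = p_xz / p_zz
    x_new = x_pred + gain * (ph_meas - ph_mean)
    p_new = p_pred - gain * p_zz * gain
    return x_new, p_new


# Assumed scenario: the model predicts 2 mM released, the true value is 3 mM.
x_pred, p_pred = 0.002, 0.0005 ** 2
ph_meas = ph_from_released_protons(0.003)
x_new, p_new = ukf_measurement_update(x_pred, p_pred, ph_meas, r_meas=0.01 ** 2)
```

In this toy case the pH reading pulls the estimate from the predicted value towards the value that generated the measurement, and the posterior variance shrinks; the full filter does the same across all differential states simultaneously.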
In a Kalman filtering framework, “process noise” (Q) represents uncertainty in the underlying model. Although the IVT reactions are deterministic in principle, real-world operation always deviates from an idealized model, so the instantaneous differential states (e.g., NTP, RNA) will not follow the nominal dynamics exactly. In an ODE-based filter one can freely perturb all states; in a DAE system, however, arbitrary noise in the algebraic variables z(t) risks violating
| $g(x, z, \theta) = 0$ |
We inject process noise only into the differential states x(t) and selected kinetic parameters in θ. To ensure the noise scales with each state's order of magnitude using a single tuning parameter, we define
| $Q = \varepsilon^2\,\mathrm{diag}(x_{\mathrm{nom}})^2,$ |
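As a brief illustration of this scaling, the sketch below builds a diagonal Q whose entries are (ε·x_nom,i)², so that each state receives noise proportional to its own nominal magnitude. The nominal values and ε used here are hypothetical and are not taken from Table S4.

```python
import math


def scaled_process_noise(x_nom, eps):
    """Return Q = eps^2 * diag(x_nom)^2 as a nested list (diagonal matrix)."""
    n = len(x_nom)
    return [[(eps * x_nom[i]) ** 2 if i == j else 0.0 for j in range(n)]
            for i in range(n)]


# Hypothetical nominal states (e.g., an NTP at 10 mM and RNA at 0.5 mM)
# with a 1 % relative noise level.
x_nom = [10.0, 0.5]
Q = scaled_process_noise(x_nom, eps=0.01)
```

With this construction a state near 10 mM gets a process-noise standard deviation of 0.1 mM, while a state near 0.5 mM gets 0.005 mM, using one tuning parameter for both.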
The UKF is implemented in Python using FilterPy, with IDA/Assimulo solving the underlying DAE and a 30 second measurement interval for real-time tracking. Calibration is performed by iteratively adjusting four groups of filter settings to align the UKF's predicted distributions with offline pH analytics: process noise (Q) assigned to key differential states or kinetic constants; measurement noise (R) inferred from the fiber-optic probe's empirical precision; unscented transform parameters (α, β, κ) tuned to control sigma-point spread; and the initial covariance (P0) set from prior state uncertainty. The kinetic model uses fixed kinetic parameters, calibrated against experimental data prior to runtime (these parameters need re-fitting when a new RNA molecule is produced from a new template DNA); only state variables are updated by the UKF during operation. A summary of the final UKF parameter values appears in Table S4.
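At discrete times $t_k$, let
| $\mathrm{pH}_k = -\log_{10}(c_{\mathrm{H},k}), \quad c_{\mathrm{H},k} = 10^{-\mathrm{pH}_k}, \quad c_{\mathrm{H},0} = 10^{-\mathrm{pH}_0}$ |
The change in proton concentration relative to $t_0$ is
| $\Delta c_{\mathrm{H},k} = c_{\mathrm{H},k} - c_{\mathrm{H},0}$ | (10) |
We relate $\mathrm{pH}_k$ to the HEPES buffer equilibrium via the H–H equation:
| $\mathrm{pH}_k = \mathrm{p}K_a(T) + \log_{10}\!\left(\dfrac{[\mathrm{A^-}]_k}{[\mathrm{HA}]_k}\right),$ | (11) |
Knowing that each NTP incorporation liberates one proton, the cumulative NTP count at step k is
| $n_{\mathrm{NTP},k} = V_{\mathrm{rxn}}\,\Delta c_{\mathrm{H},k}$ | (12) |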
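A minimal sketch of this H–H calculation follows. It assumes that the released protons are the sum of the change in protonated buffer (from eqn (11)) and the change in free protons (eqn (10)), with a single buffer and no other proton-binding species; this interpretation, the 40 mM buffer concentration, the 1 mL reaction volume and the function names are assumptions for illustration only.

```python
def protonated_fraction(ph, pka):
    """Fraction of buffer in the acid form HA, from the H-H equation."""
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


def protons_released(ph_0, ph_k, buffer_total, pka):
    """Protons (mol/L) released between t0 and tk for a single buffer.

    Sum of protons taken up by the buffer and the change in free [H+].
    """
    d_bound = buffer_total * (protonated_fraction(ph_k, pka)
                              - protonated_fraction(ph_0, pka))
    d_free = 10.0 ** (-ph_k) - 10.0 ** (-ph_0)
    return d_bound + d_free


def ntp_incorporated_mol(ph_0, ph_k, buffer_total, pka, v_rxn_l):
    """Cumulative NTP incorporated (mol), one proton per NTP."""
    return v_rxn_l * protons_released(ph_0, ph_k, buffer_total, pka)


# Assumed example: 40 mM HEPES (pKa 7.3), pH falling from 6.67 to 6.41
# in a 1 mL reaction.
released = protons_released(6.67, 6.41333, 0.040, 7.3)
n_ntp = ntp_incorporated_mol(6.67, 6.41333, 0.040, 7.3, v_rxn_l=1e-3)
```

Under these assumptions almost all released protons end up on the buffer, which is why the buffer concentration and pKa must be well characterized for the estimate to be meaningful.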
The H–H correlation was also used to calculate the conjugate-base concentration [A−] as a function of pH for commonly used IVT buffers (HEPES, TRIS–acetate, TRIS–HCl, HEPES titrated with NaOH, and TRIS–EDTA) as well as for various HEPES and TRIS–acetate buffer concentrations (20–160 mM). For this, the pH range was discretized into fine intervals (0.001 pH units), and [A−] was computed within each interval. Then the sum of [A−] within each range was plotted as a function of pH.
:NTP complexes, pyrophosphate, orthophosphate, and buffer species. Model predictions are updated every 25 milliseconds, allowing high-frequency state estimation despite the slower pH measurement rate (one reading every 30 seconds). The second approach is based on the semi-empirical H–H correlation, which calculates RNA yield and NTP consumption from the buffer equilibrium and pH trajectory, without requiring kinetic parameterization. This can enable rapid estimation of ≈15 IVT species. The output layer of the soft sensor delivers continuous, real-time predictions of KPIs, including RNA yield, NTP consumption, reaction rate, as well as IVT and buffer component dynamics, accounting for a total of ≈15 (H–H model) and ≈40 (kinetic model) IVT and buffer species, respectively, which are inferred through the mechanistic framework constrained by pH measurements. Among these, RNA yield and NTP consumption are directly validated experimentally, using offline assays (AEX-HPLC, UV absorbance, and fluorometry). Together, these three modules form a model-based digital sensing framework that enables real-time tracking and prediction of IVT reaction progression, supporting QbDD, process development, and advanced control strategies.
In every case, the pH drops sharply upon initiation, mirroring NTP consumption and RNA production. This indicates that pH provides a direct, non-invasive proxy for transcriptional progress. The period of maximum catalytic activity is clear in the first 30 minutes, where all four curves fall steeply. As NTPs become limiting, the slope flattens. Notably, a larger net pH drop (especially in the eGFP–TRIS run) is associated with increased raw sensor noise. This is likely due to the measured pH values approaching the lower measurement limit (pH ≈ 5.5) specified by the manufacturer.
Buffer choice dictates the magnitude of the drop. TRIS (pKa ≈ 7.8 at 37 °C) lies farther from the initial reaction pH (≈6.67) than HEPES (pKa ≈ 7.3), so its buffering capacity is lower. RNA length modulates curve shape: the longer CSP mRNA transcript maintains an almost linear decline, indicative of either initiation, elongation or termination limitation when transcribing CSP mRNA, whereas eGFP mRNA shows a more pronounced initial fall. Based on H–H modeling results, a pH drop is expected in all commonly used IVT buffers (see Fig. S3) and even at increased buffer concentrations (see Fig. S5).
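The effect of the pKa offset on buffering can be checked with the standard single-buffer capacity expression, β = ln(10)·C·Ka·[H+]/(Ka + [H+])², which neglects the contribution of water. The sketch below compares HEPES and TRIS at the initial reaction pH; the equal 40 mM concentration for both buffers is an assumption made only so that the comparison isolates the pKa effect.

```python
import math


def buffer_capacity(ph, conc, pka):
    """Buffer capacity (mol/L per pH unit) of a single monoprotic buffer."""
    h = 10.0 ** (-ph)
    ka = 10.0 ** (-pka)
    return math.log(10.0) * conc * ka * h / (ka + h) ** 2


CONC = 0.040     # mol/L, assumed equal for both buffers
PH_INIT = 6.67   # initial reaction pH quoted in the article

beta_hepes = buffer_capacity(PH_INIT, CONC, pka=7.3)
beta_tris = buffer_capacity(PH_INIT, CONC, pka=7.8)
```

Because capacity peaks at pH = pKa and falls off on either side, the buffer whose pKa is farther from the operating pH absorbs fewer protons per unit pH change, consistent with the larger pH drops seen in TRIS.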
As illustrated in Fig. 3, the UKF-predicted RNA concentrations closely paralleled experimental measurements, with R2 > 0.90 in every case. Comparing the two models under the four conditions, the H–H model generally tracks the experimental data well and in some cases (e.g., Fig. 3A) appears to follow the experimental trend more closely than the UKF during mid-reaction; however, the UKF achieves a higher overall R2 because it can dynamically correct its trajectory as new pH measurements arrive. The kinetic-UKF model is particularly advantageous under conditions where the reaction deviates from simple proton-balance assumptions (e.g., Fig. 3D, CSP-TRIS), where the UKF's adaptive correction yields R2 = 0.983 compared to 0.966 for the H–H model.
This confirms that a single, frequent pH measurement can reliably track overall reaction progression, substantially reducing the need for frequent, labor-intensive offline sampling that gives time-lagged readings and is subject to cumulative manual errors from aliquot withdrawal, EDTA quenching, spin-column purification, and instrument calibration. Indeed, the RMSE values for UKF-predicted RNA yield (0.40–1.15 g L−1, Table 1) are comparable to or smaller than the standard deviations of the offline UV absorbance measurements themselves, suggesting that a significant fraction of the apparent prediction “error” may originate from variability in the offline reference assay rather than from the soft sensor. Because the soft sensor derives its estimates from a single, continuous pH signal that bypasses all sample-handling steps, it can provide not only higher-frequency monitoring but also potentially more consistent estimates of reaction progress.
| Analyte | HEPES + eGFP RMSE | HEPES + eGFP R2 | HEPES + CSP RMSE | HEPES + CSP R2 | TRIS + eGFP RMSE | TRIS + eGFP R2 | TRIS + CSP RMSE | TRIS + CSP R2 |
|---|---|---|---|---|---|---|---|---|
| RNA (H–H) | 1.054 | 0.890 | 1.120 | 0.914 | 0.732 | 0.964 | 0.556 | 0.966 |
| RNA (UKF) | 0.890 | 0.911 | 1.152 | 0.909 | 0.944 | 0.915 | 0.396 | 0.983 |
| NTP (H–H) | 0.778 | 0.909 | 1.949 | 0.696 | 1.694 | 0.641 | 1.231 | 0.850 |
| NTP (UKF) | 0.743 | 0.898 | 1.454 | 0.820 | 1.463 | 0.799 | 1.231 | 0.850 |
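For readers who wish to compute comparable agreement metrics, the sketch below shows one common way to obtain RMSE and R2 from paired offline measurements and model predictions. The article does not state the exact formula used for R2, so the 1 − SS_res/SS_tot definition here is an assumption, and the numbers in the example are hypothetical rather than experimental data.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-square error between measurements and predictions."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def r_squared(y_true, y_pred):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical offline RNA titers (g/L) and soft-sensor estimates
# taken at the same sampling times.
offline = [1.0, 2.0, 3.0]
predicted = [1.0, 2.0, 4.0]
example_rmse = rmse(offline, predicted)
example_r2 = r_squared(offline, predicted)
```

Soft-sensor estimates would need to be interpolated or sampled at the offline time points before applying these functions, since the model output is far denser than the assay data.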
Occasional deviations of the UKF mean from individual experimental points align with regions of higher sensor noise or offline assay variability rather than a systematic model error. In addition, some data points (e.g., at 100 and 120 minutes in Fig. 3A) may be affected by experimental or analytical bias. Despite low standard deviation, this does not exclude systematic error (e.g., dilution or sample handling error), which would not necessarily be reflected in the error bars. The limited number of replicates further restricts definitive interpretation. The ±2σ uncertainty bands around the UKF traces (largest in the eGFP–TRIS) are drawn directly from the filter's state-covariance update and mirror increases in pH measurement noise when the reaction drifts outside the sensor's optimal range. Notably, in the CSP–TRIS run the final RNA yield fell below the mechanistic model's initial forecast; the UKF responded by gradually adjusting its prediction downward from 60 to 120 minutes as the real-time pH began to diverge from the model's expected trajectory, illustrating the filter's ability to detect and correct systematic deviations. For additional details on the filter's adaptive weighting (process covariance trace and Kalman gain), see SI Fig. S6A&B. Both the eGFP and CSP mRNA produced in the two different IVT buffers were of high integrity (intactness), as evidenced by CGE measurements, see Fig. S7.
We also employed an H–H approach to estimate RNA yield solely based on proton balance and buffer equilibrium. As shown in Fig. 3, overlaying the H–H predictions on the experimental data yields R2 values of 0.89–0.97 across the four conditions (average ≈ 0.93, Table 1), despite the absence of any fitted parameters or adjustments to the data by a Kalman filter. Under the assumption of perfectly accurate pH measurements, the H–H curve can be regarded as an internal “ground truth,” such that any systematic deviation of the experimental RNA yields from this curve may reflect assay noise or bias. This interpretation is reinforced in Fig. 3C: during the 20–60 minute window, which coincides with a large pH drop, both the H–H and UKF models overestimate the measured RNA, whereas the raw data remain comparatively flat, suggesting sampling artifacts rather than model failure. Of course, pH itself is challenging to measure with absolute accuracy and precision, and neither model captures every possible side reaction or ionic interaction. Nonetheless, the high R2 values between the RNA yield predicted by the model and that measured by offline analytical assays indicate that the pH change reflects the proton balance associated with RNA formation in both models (kinetic-UKF and H–H). This shows that pH captures the dominant proton-release chemistry and supports its use as a robust, non-invasive input for the soft sensor.
We also applied the H–H approach to infer the consumption of individual NTPs under the four template–buffer conditions, using only proton balance and buffer equilibria. As shown in Fig. 4E–H, the H–H model predicted NTP consumption with an overall R2 ≈ 0.76 relative to offline AEX–HPLC measurements, without fitting parameters or Kalman filter updates. The lowest average R2 (0.64) was obtained for eGFP mRNA production in TRIS buffer, while the highest average R2 (0.91) was observed for eGFP mRNA production in HEPES buffer.
Overall, the kinetic model embedded in the UKF, and to a lesser extent the H–H model, capture the expected NTP consumption patterns and rates, and the modest drop in R2 highlights the limits of offline assays. A full summary of the UKF's and H–H model's fit to the NTP data, including RMSE values, is provided in Table 1.
Such rapid and abundant measurements are otherwise not possible for these IVT species. This enables unique real-time, model-aided insights into the chemical speciation and progression of the IVT reaction. As an example, Fig. 5 shows the time-course concentrations of pyrophosphate (PPi), orthophosphate (Pi), and magnesium–nucleotide complexes (Mg2+:ATP, Mg2+:CTP, Mg2+:GTP, Mg2+:UTP), as predicted by the kinetic model and dynamically updated using the UKF. The model predicts low and progressively decreasing concentrations of PPi, consistent with its enzymatic hydrolysis to Pi by pyrophosphatase, in accordance with the IVT experimental setup. As a result, Pi concentrations (in the 0–70 mM range) increase over time, as expected. The concentrations of Mg:NTP complexes remain low (in the µM range) and decline gradually as the four NTPs are consumed through incorporation into the growing RNA chain. The time-course concentration changes of the free and total H+ alongside buffering species (namely, the protonated (acid) and deprotonated (base) forms of TRIS, HEPES, and acetate), predicted by both the H–H and UKF-embedded kinetic models are shown in Fig. S8 and S9. As expected, the concentrations of the acid forms increase over time, while those of the base forms decrease, due to proton release during the IVT reaction. Among the conditions tested, the HEPES-buffered reactions (blue and green curves) exhibit the smallest variation in free H+, indicating that HEPES provides stronger pH buffering compared to TRIS under these IVT conditions. This aligns with the pKa of HEPES (7.3 at 37 °C) being closer to the IVT operating pH range (5.5–7.0) compared to that of TRIS (7.8 at 37 °C). The soft sensor, which operates by integrating the kinetic model within the UKF framework, supports real-time monitoring of numerous chemical species during the IVT reaction, providing mechanistic insight to aid the development, real-time optimization, and automation of the IVT process.
The observed pH trajectories closely mirrored the reaction's progression under all four tested conditions. In each case, a rapid initial pH drop (Fig. 2) corresponded to high transcriptional activity, RNA yield increase (Fig. 3), and NTP consumption (Fig. 4). The extent and shape of the pH curve were governed by both buffer capacity and RNA yield. For example, the eGFP-TRIS reaction showed the steepest pH drop, driven by a higher RNA yield (∼11 g L−1) and the use of TRIS buffer (pKa ≈ 7.8 at 37 °C), which is farther from the reaction pH (∼5.5–7) than HEPES (pKa ≈ 7.3 at 37 °C). In contrast, CSP–TRIS yielded only ∼8 g L−1 RNA, resulting in a more modest pH decline despite the same buffer system. Meanwhile, both HEPES-buffered reactions (CSP–HEPES and eGFP–HEPES) achieved ∼11 g L−1 yields and similar final pH values. The initial pH of the IVT reactions was lower than that of the buffer, and this difference can be attributed to the addition of NTPs to the reaction.37 Importantly, CGE confirmed that the mRNA products were of high integrity under all conditions (see Fig. S7). These results support the use of pH as a robust indicator of IVT productivity, provided that buffer conditions are well characterized.
We developed a soft sensor that integrates pH measurements into a kinetic IVT model embedded within a UKF. The kinetic model generates predictions at 25 millisecond intervals and compares them with experimental pH measurements acquired every 30 seconds; UKF state updates therefore occur once every 30 seconds. This comparison allows the UKF to update the model state periodically, correcting for noise, uncertainty, and process variability. As a result, the soft sensor provides more accurate and robust live estimates of reaction progress. The soft sensor generates approximately 1600 predictions of IVT and buffer species concentrations per second, equivalent to 40 time-resolved sets of ≈40 species concentrations per second. Importantly, the kinetic IVT model uses fixed kinetic parameters that were calibrated against experimental data prior to runtime; only state variables are updated by the UKF during operation. A separate set of experimental data (not used for model fitting) was used to validate both the IVT model (embedded into the UKF) and the semi-empirical H–H model, and the R2 and RMSE values reported in this work were calculated against this separate validation dataset. A sensitivity analysis for the underlying kinetic IVT model parameters is provided in ref. 20.
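The throughput figures follow from simple arithmetic, reproduced below as a check. The species count is the approximate value (≈40) given in the article, and the 2-hour duration is the reaction length used for the cumulative total; the variable names are illustrative.

```python
UPDATE_INTERVAL_MS = 25   # model prediction interval
N_SPECIES = 40            # approximate number of tracked species
REACTION_HOURS = 2        # reaction duration used for the cumulative total

updates_per_second = 1000 // UPDATE_INTERVAL_MS            # 40
estimates_per_second = N_SPECIES * updates_per_second      # 1600
total_estimates = estimates_per_second * REACTION_HOURS * 3600  # 11,520,000
```

Only one in every 1200 of these prediction steps (30 s / 25 ms) coincides with a new pH reading; the remaining steps are model propagation between measurements.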
The UKF-based soft sensor achieved strong agreement with offline RNA measurements (R2 > 0.90), demonstrating that pH input can be effectively used to infer RNA yield in real-time. The soft sensor provides a more continuous and potentially more accurate readout than intermittent offline assays, which are subject to manual sampling errors, dilution errors, and instrument calibration variability. By eliminating the need for repeated manual sampling, the soft sensor reduces the risk of such errors and provides high-frequency monitoring that captures transient process dynamics that would be missed by periodic offline measurements. Similarly, NTP depletion profiles predicted by the UKF showed a robust correlation with offline AEX-HPLC data (overall R2 ≈ 0.84), although deviations were more pronounced due to greater experimental noise in NTP assays (Fig. 4). The H–H model also reproduced the overall trends in NTP concentration profiles, albeit with lower accuracy (overall R2 ≈ 0.76) relative to offline AEX-HPLC measurements. Nevertheless, this performance can be considered satisfactory given the model's simplicity, the absence of parameter fitting to experimental data, and the lack of state updates via the UKF. Notably, model predictions reflected the known nucleotide composition of each template (Table S2): NTPs with higher fractional abundance in the RNA were consumed more rapidly, and the onset of substrate limitation was clearly mirrored in the flattening of the pH trajectory.
While the overall trends matched well, some discrepancies were observed. Initial NTP concentrations measured by AEX-HPLC varied between 8 and 11 mM despite nominally starting at 10 mM, suggesting pipetting, sampling or AEX-HPLC calibration errors. In CSP–TRIS, for example, offline data suggested near-complete CTP depletion, which was inconsistent with the final RNA yield. Variability in the experimental data (e.g., fluctuations in the AEX-HPLC measurements during the first 30–50 minutes in Fig. 4C and D) may arise from several sources, including pipetting errors during IVT assembly, sampling inaccuracies (e.g., non-representative aliquots due to local heterogeneities in the reaction mixture), dilution errors during sample preparation for AEX-HPLC analysis, and calibration or operational issues related to the AEX-HPLC instrument. Overall, these deviations were attributed to experimental noise rather than model inaccuracy, further supporting the reliability of the soft sensor in noisy environments. The soft sensor thus provides a more continuous and potentially more reliable readout than intermittent offline assays, especially in the context of noisy or resource-intensive analytical workflows.
Beyond RNA and NTP quantification, the underlying kinetic IVT model can track the time-dependent concentrations of ≈40 IVT species (see Table S3). This results in around 1600 concentration estimates per second and over 11.5 million model predictions across a 2-hour IVT reaction. This provides a rich dataset for IVT optimization and mechanistic insight into buffering dynamics, proton flux, and enzyme performance. For example, as concentrations and charge states evolve during the reaction, the IVT model–based soft sensor can identify which species contribute to buffering the released H+. For instance, Mg2+:NTP complexes (pKa ≈ 6.5) increasingly bind protons as the pH drops below 6.5.38 However, by the time such low pH values are reached, Mg2+:NTP concentrations have also declined (as NTPs are consumed to produce RNA), therefore there will be fewer Mg2+:NTP complexes present to bind protons. The concentration of a subset of IVT species was plotted over time (cf. Fig. 5, S8 and S9). These predictions offer real-time insight into the evolving chemical environment of the IVT reaction, useful for optimizing buffer formulations, magnesium usage, and enzyme loading. For example, the model correctly predicted near-complete degradation of PPi into Pi by pyrophosphatase, as well as micromolar-level depletion of Mg:NTP complexes in line with nucleotide consumption (see Fig. 5).
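The throughput figures quoted above follow directly from the 25 millisecond update interval and the ≈40 tracked species. The short calculation below reproduces that arithmetic; the variable names are illustrative only.

```python
# Throughput of the soft sensor, using the figures stated in the text.
UPDATE_INTERVAL_S = 0.025          # one model update every 25 ms
N_SPECIES = 40                     # approximate number of tracked IVT species
REACTION_DURATION_S = 2 * 60 * 60  # 2-hour IVT reaction

updates_per_second = round(1 / UPDATE_INTERVAL_S)
estimates_per_second = N_SPECIES * updates_per_second
total_estimates = estimates_per_second * REACTION_DURATION_S

print(f"{updates_per_second} updates/s")
print(f"{estimates_per_second} concentration estimates/s")
print(f"{total_estimates / 1e6:.2f} million estimates over the reaction")
```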
Importantly, buffer dynamics were also captured: HEPES-buffered reactions exhibited the smallest variation in free H+ concentration, consistent with its pKa ≈ 7.3 at 37 °C being closer to the IVT operating pH range than that of TRIS (pKa ≈ 7.8 at 37 °C). These mechanistic insights highlight the soft sensor's utility not just for monitoring, but also for IVT process development and design space exploration.
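The effect of pKa on buffering can be illustrated with the textbook expression for the capacity of a single monoprotic buffer, β = ln(10)·C·Ka·[H+]/(Ka + [H+])². The sketch below is not the model used in this work: it ignores water and every other IVT species, and the 40 mM buffer concentration and the pH grid are assumed values chosen only for illustration. The pKa values are those quoted above.

```python
import math


def buffer_capacity(ph, pka, total_molar):
    """Capacity (mol/L per pH unit) of one monoprotic buffer, water terms ignored."""
    h = 10.0 ** (-ph)
    ka = 10.0 ** (-pka)
    return math.log(10) * total_molar * ka * h / (ka + h) ** 2


PKA_AT_37C = {"HEPES": 7.3, "TRIS": 7.8}  # values quoted in the text
BUFFER_M = 0.040                          # assumed concentration (hypothetical)
PH_GRID = (6.8, 7.0, 7.2, 7.4)            # assumed pH values (hypothetical)

for ph in PH_GRID:
    row = ", ".join(
        f"{name}: {1e3 * buffer_capacity(ph, pka, BUFFER_M):.1f} mM/pH"
        for name, pka in PKA_AT_37C.items()
    )
    print(f"pH {ph:.1f} -> {row}")
```

Under these assumptions the buffer whose pKa lies closer to the working pH has the larger capacity, so the same proton release produces a smaller pH change.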
It is important, however, to distinguish between directly validated outputs and model-inferred quantities: RNA yield was validated using orthogonal analytical methods (UV-vis spectroscopy and fluorometry) and individual NTP concentrations using AEX-HPLC, whereas the remaining IVT and buffer species are inferred through the mechanistic framework, which relies on mass balances, equilibrium constants, and fixed kinetic parameters (Table S3). These inferred states are therefore contingent on the validity of the model structure and parameterization, and should not be interpreted as directly measured quantities. The associated uncertainty arises from model assumptions, parameter uncertainty, and potential unmodeled effects (e.g., side reactions or deviations from equilibrium); nonetheless, inference through literature-reported equilibrium constants represents the most accurate quantification currently achievable for these species, which cannot be directly measured in real-time during IVT reactions. Future work could validate additional species through orthogonal assays, complementary sensors or targeted experimental measurements, and further reduce uncertainty through parameter estimation.
The lower R2 values observed for NTP predictions (compared to RNA yield) are largely attributable to higher experimental variability in the offline AEX-HPLC assay, as evidenced by the standard deviations of these measurements, rather than systematic model inaccuracy. In addition, because pH is not directly coupled to every species in the IVT model, the observability of some states is limited. To minimize pH drift and maintain optimal enzyme activity, the IVT reaction could be initiated at a higher pH (e.g., pH 8–8.5), allowing the pH to decrease into the pKa range of HEPES or TRIS during the reaction and thereby making more effective use of the buffer's capacity.37,39 This would also provide a larger measurable pH window for the soft sensor. Additionally, the NTP counterion formulation may influence the sensor's performance: the sodium-salt NTPs used in this work do not contribute to buffering, whereas TRIS-salt NTPs would add buffering capacity, potentially reducing the initial pH dip and the magnitude of the subsequent pH signal.
Both the UKF and H–H models demonstrated robustness against measurement noise and offline assay variability. The UKF's ability to repeatedly reconcile model predictions with frequent pH measurements allows it to detect and correct for systematic deviations over time.
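The predict and update cycle by which a UKF reconciles a model with pH readings can be sketched for a single state. The example below is a toy, not the IVT-UKF model of this work: it tracks only the H+ released (mM), uses hypothetical first-order kinetics and a single-buffer Henderson–Hasselbalch measurement function, and every numerical value except the 25 ms step is an assumption made for illustration. It uses the basic unscented transform with one spread parameter (KAPPA).

```python
import math
import random

# All numbers below are illustrative assumptions, not parameters from the paper.
BUFFER_MM, PKA, PH_START = 40.0, 7.3, 7.6   # toy single-buffer system
RATE_PER_S, X_MAX_MM = 1e-3, 15.0           # toy first-order kinetics
DT_S = 0.025                                # 25 ms update interval
KAPPA = 2.0                                 # sigma-point spread parameter
WEIGHTS = [KAPPA / (1 + KAPPA), 0.5 / (1 + KAPPA), 0.5 / (1 + KAPPA)]


def step_model(x):
    """Advance the released H+ (mM) by one time step of the toy kinetics."""
    return x + DT_S * RATE_PER_S * (X_MAX_MM - x)


def ph_from_released(x):
    """Henderson-Hasselbalch pH after the buffer has absorbed x mM of H+."""
    base_start = BUFFER_MM / (1 + 10 ** (PKA - PH_START))
    base = min(max(base_start - x, 1e-9), BUFFER_MM - 1e-9)
    return PKA + math.log10(base / (BUFFER_MM - base))


def sigma_points(mean, var):
    """Three sigma points for a one-dimensional state."""
    spread = math.sqrt((1 + KAPPA) * var)
    return [mean, mean + spread, mean - spread]


def weighted_mean(values):
    return sum(w * v for w, v in zip(WEIGHTS, values))


def ukf_step(x, p, ph_meas, q, r):
    """One predict/update cycle; returns the new state estimate and variance."""
    # Predict: push sigma points through the process model.
    propagated = [step_model(s) for s in sigma_points(x, p)]
    x_pred = weighted_mean(propagated)
    p_pred = weighted_mean([(s - x_pred) ** 2 for s in propagated]) + q
    # Update: push fresh sigma points through the measurement model.
    points = sigma_points(x_pred, p_pred)
    ph_points = [ph_from_released(s) for s in points]
    ph_pred = weighted_mean(ph_points)
    s_zz = weighted_mean([(z - ph_pred) ** 2 for z in ph_points]) + r
    c_xz = weighted_mean(
        [(s - x_pred) * (z - ph_pred) for s, z in zip(points, ph_points)]
    )
    gain = c_xz / s_zz
    return x_pred + gain * (ph_meas - ph_pred), p_pred - gain * s_zz * gain


def run_demo(n_steps=4000, ph_noise_sd=0.005, seed=0):
    """Simulate noisy pH readings and filter them from a wrong initial guess."""
    rng = random.Random(seed)
    x_true, x_est, p_est = 0.0, 3.0, 4.0  # estimate deliberately starts 3 mM off
    for _ in range(n_steps):
        x_true = step_model(x_true)
        ph_meas = ph_from_released(x_true) + rng.gauss(0.0, ph_noise_sd)
        x_est, p_est = ukf_step(x_est, p_est, ph_meas, q=1e-8, r=ph_noise_sd ** 2)
    return x_true, x_est


x_true, x_est = run_demo()
print(f"true H+ released: {x_true:.3f} mM, UKF estimate: {x_est:.3f} mM")
```

In this toy setting the filter starts from a deliberately wrong estimate and is pulled towards the simulated true state by the stream of pH readings, which is the correction mechanism described above.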
The H–H approach, while simpler, also achieved RNA yield predictions with R2 > 0.90, and an overall NTP consumption prediction with R2 ≈ 0.76, despite using no kinetic parameters or experimental fitting. Accounting for ≈15 IVT species (Table S3), it provides a rapid, low-complexity benchmark or internal reference, particularly valuable when high-precision pH measurements are available and kinetic parameterization is not feasible. Consequently, the H–H quantification could serve to validate the kinetic-UKF model outputs within a model-based predictive control strategy, as it is independent of kinetic parameterization. Future improvements could focus on refining the process noise covariance to further improve UKF performance, particularly during early reaction phases. Additionally, while the current kinetic model reflects standard T7 polymerase behavior, adapting it to other polymerases or buffer systems might require reparameterization and revalidation.
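The idea behind the H–H quantification can be sketched as a proton balance on a single buffer. The example below is a strong simplification of the ≈15-species H–H model: it assumes that one H+ is released per incorporated NTP, that the buffer alone absorbs the released protons, and that each NTP is consumed in proportion to its fraction in the transcript. The function names, the 40 mM buffer, the pH values, the 1000 nt length and the composition are all hypothetical.

```python
def base_fraction(ph, pka):
    """Fraction of the buffer in its deprotonated form (Henderson-Hasselbalch)."""
    return 1.0 / (1.0 + 10.0 ** (pka - ph))


def protons_absorbed_mM(ph_start, ph_now, buffer_mM, pka):
    """H+ taken up by the buffer (mM) as the pH moves from ph_start to ph_now."""
    return buffer_mM * (base_fraction(ph_start, pka) - base_fraction(ph_now, pka))


def rna_yield_uM(ntp_incorporated_mM, rna_length_nt):
    """Molar RNA yield (uM) from the total nucleotides incorporated."""
    return 1000.0 * ntp_incorporated_mM / rna_length_nt


def ntp_remaining_mM(initial_mM, ntp_incorporated_mM, composition):
    """Remaining concentration of each NTP, consumed according to composition."""
    return {
        name: initial_mM - fraction * ntp_incorporated_mM
        for name, fraction in composition.items()
    }


# Hypothetical inputs: 40 mM HEPES-like buffer, pH falling from 7.6 to 7.0.
incorporated = protons_absorbed_mM(7.6, 7.0, buffer_mM=40.0, pka=7.3)
composition = {"ATP": 0.30, "CTP": 0.20, "GTP": 0.25, "UTP": 0.25}  # assumed

print(f"NTPs incorporated: {incorporated:.2f} mM")
print(f"RNA yield (1000 nt): {rna_yield_uM(incorporated, 1000):.2f} uM")
print(ntp_remaining_mM(10.0, incorporated, composition))
```

No kinetic parameters enter this calculation, which is why such a proton balance can serve as an independent check on a kinetic model.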
This pH-based soft sensor is expected to enable real-time monitoring of RNA yield, NTP consumption, and buffering dynamics across all commonly used IVT buffers. This is supported by H–H modeling results (Fig. S3), which show that IVT buffers such as HEPES, TRIS–acetate, TRIS–HCl, and HEPES titrated with NaOH exhibit limited buffering capacity. Notably, even at elevated buffer concentrations, most likely suboptimal for IVT performance, a measurable pH drop should persist (Fig. S5), indicating that the soft sensing approach remains applicable under a broad range of conditions. However, the sensor's sensitivity depends on the formulation producing a sufficient pH signal; formulations with substantially higher buffer concentrations or TRIS-salt NTPs could reduce this signal and should be validated experimentally. The H–H predictions in Fig. S5 are theoretical and may not capture all interactions present in an assembled IVT reaction.
This soft sensing framework has direct implications for real-time bioprocess control. Because RNA yield and NTP depletion can be inferred continuously with low (25 millisecond) latency, it is feasible to implement closed-loop control strategies in which pH signals trigger automated feeding of NTPs, enzymes, or buffer, preventing premature substrate depletion or suboptimal reaction conditions. If deviations are detected from current measurements, feedback control can be applied to adjust and optimize the process in real time. Moreover, the kinetic model can forecast future trajectories of KPIs (e.g., mRNA yield and NTP concentrations over the next 5–10 minutes). If the model predicts a potential decrease in RNA yield or other KPI deviation, corrective actions can be implemented through feed-forward control (e.g., model-predictive control) to prevent deviations before they occur. Such digital twin-based control approaches could be vital for maximizing yields and productivity while minimizing reagent costs and reaction times, especially in a continuous manufacturing setting.
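A forecast-triggered feed rule of the kind described above can be sketched as follows. This is a hypothetical illustration, not a control strategy implemented in this work: it replaces the kinetic model forecast with a linear extrapolation of the current consumption rate, and the function names, the 5 minute horizon and the 1 mM threshold are assumptions.

```python
def forecast_ntp_mM(current_mM, rate_mM_per_min, horizon_min):
    """Linear extrapolation of an NTP concentration (stand-in for a model forecast)."""
    return max(current_mM - rate_mM_per_min * horizon_min, 0.0)


def should_feed(current_mM, rate_mM_per_min, horizon_min=5.0, threshold_mM=1.0):
    """True if the forecast concentration falls below the assumed threshold."""
    return forecast_ntp_mM(current_mM, rate_mM_per_min, horizon_min) < threshold_mM


# Hypothetical soft-sensor outputs at two time points.
for ntp_mM, rate in [(8.0, 0.5), (3.0, 0.5)]:
    action = "feed NTPs" if should_feed(ntp_mM, rate) else "no action"
    print(f"NTP = {ntp_mM} mM, rate = {rate} mM/min -> {action}")
```

Acting on the forecast rather than on the current estimate lets the feed start before the substrate is actually depleted.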
The H–H model effectively represents a reduced-complexity alternative to the full mechanistic IVT-UKF model. Despite tracking only ≈15 species (compared to ≈40 for the kinetic model), the H–H model achieved comparable RNA yield predictions (R2 > 0.90) without kinetic parameterization. This demonstrates that a simplified proton-balance approach can serve as a practical monitoring tool when kinetic parameterization is not feasible. However, the mechanistic IVT-UKF model provides additional value by explicitly representing reaction species and equilibria, enabling estimation of internal states (e.g., Mg:NTP complexes, PPi, Pi) and forecasting future reaction behavior, capabilities that are essential for advanced monitoring and potential model-based control. Additionally, the minimal, non-intrusive sampling requirements of the pH-driven soft sensor make it attractive for scale-up, as repeated offline assays can be resource-intensive. Furthermore, the approach aligns with the principles of QbDD, enabling in silico scenario analysis, real-time decision-making, and advanced process automation.
Finally, because proton release from the 3′-OH of the ribose is a universal feature of IVT for mRNA, self-amplifying RNA (saRNA), trans-amplifying RNA (taRNA), and circular RNA (circRNA),4,40 and IVT buffers have limited buffering capacity (cf. Fig. S3), this pH-based soft sensing framework offers a broadly applicable tool for monitoring and optimizing the manufacturing of any mRNA, saRNA, and circRNA vaccines and therapeutics. It can also be extended to monitor DNA polymerization or other bioprocesses in which measurable pH changes correlate with substrate depletion, product formation, or reaction progress—such as enzymatic conversions or metabolic pathways in cell-free systems.
Since pH is routinely logged in bioreactors, this soft sensor has strong potential as a widely applicable PAT, offering a convenient, non-invasive, single-variable route to continuously monitor KPIs in real-time, while reducing the need for offline assays. The pH-driven soft sensor was implemented here in a batch IVT reaction; however, in principle, it can also be deployed in continuous flow IVT reactors, where two or more pH sensors would feed into the models to monitor reaction progression along the reactor.
Future work will evaluate the suitability of this soft sensor across additional IVT buffers (e.g., TRIS, TRIS base, HEPES-NaOH), varied buffer concentrations, NTP counterion formulations (e.g., TRIS-salt NTPs), varying starting pH values, and RNA formats including saRNA and circRNA. H–H simulations could be used to systematically map the sensor's operating envelope across this formulation space, with targeted experimental validation of representative conditions. The pH measurement hardware will also be expanded to include microelectrode probes and pH flow cells, which may offer greater robustness for longer-duration and larger-scale continuous production runs. Overall, pH-driven soft sensing converts an already-monitored variable into actionable insight, enabling real-time control to improve throughput, quality and reproducibility in mRNA medicine manufacturing.
| This journal is © The Royal Society of Chemistry 2026 |