Open Access Article
Muhammed Kashif*,
Mark E. Keating and
Hugh J. Byrne
Physical to Life Sciences Research Hub, FOCAS, Technological University Dublin, City Campus, Dublin D08 CKP1, Ireland. E-mail: Muhammed.Kashif@TUDublin.ie
First published on 23rd February 2026
Raman spectroscopy (RS) is a powerful technique for the identification of molecules based on the characteristic fingerprint spectra of their vibrational modes. Although challenging, real-time spectroscopic monitoring of reactions and processes has great potential value in multiple fields, including process analysis, bioreactors, cell therapies and in vitro metabolomics. Refined chemometric methodologies are required to datamine the kinetic evolution of multivariate spectral mixtures, to establish the constituent reactants and products as well as the characteristic rates of the reaction. To explore the capabilities and challenges, RS was used to study the chemical kinetics of propyl acetate hydrolysis in an aqueous environment at room temperature, in situ, as a model reaction. The continuous conversion of propyl acetate to 1-propanol and acetic acid was monitored periodically over 250 min using RS with a 532 nm laser source. Simulated admixture solutions, mimicking the conversion from pure reactants to pure products, were also recorded for comparison. Problem-based nonlinear least squares (NLS) fitting was applied to both the actual reaction and the simulated solution datasets, using the pure component spectra of the reactants (propyl acetate and water) and the products (1-propanol and acetic acid), in order to visualise and confirm the trends and kinetics of the reaction components. Multivariate curve resolution–alternating least squares (MCR-ALS) analysis with kinetic constraints was applied to further resolve the concentration and spectral profiles and to quantify the rates of the reaction. It is demonstrated that MCR-ALS could not accurately resolve the concentrations of the evolving reaction species, due to rank deficiency.
To enhance the analysis, a data augmentation approach was used, seeding the measured datasets with the spectra of the pure components to bias the initial singular value decomposition and spectral unmixing process. This improved the resolution of the systematic, concentration-dependent variation in the data, enabling the kinetic evolution of the reaction mixture to be monitored. The required seeding weights were optimized by examining the sum residual error (SUMR) in least squares fitting of the actual component spectra with the pure components identified by MCR-ALS. Minimum SUMR values for the admixtures were found at a seeding weight of 10 000X, while 100X was found to be optimum for the actual reaction. This proof of concept can pave the way for better analysis and understanding of cascade reactions and, ultimately, potentially of metabolomic pathways.
Raman spectroscopy has emerged as a highly versatile tool for the study of reaction kinetics in pharmaceutical processes, disease diagnostics, cellular functions, and real-time monitoring of chemical and biological species.6,7 In pharmaceutical applications, Raman spectroscopy enables precise tracking of drug crystallization, polymorphic transitions, and degradation pathways under process-relevant conditions, providing crucial insights for quality control and formulation stability.8 Raman spectroscopy excels in capturing in situ chemical changes, such as oxidation, hydrolysis, and catalytic processes, with high temporal resolution,9 making it invaluable for understanding complex reaction mechanisms in both chemical and biological contexts.10,11 In process analytical technology (PAT), Raman spectroscopy enables continuous analysis of critical quality attributes (CQAs) and critical process parameters (CPPs).12,13 In bioreactor monitoring, it is widely applied for tracking metabolites, nutrients, and biomass in real time.14–16 Its integration with chemometrics and multivariate analysis allows simultaneous quantification of multiple analytes, supporting efficient upstream and downstream process control.15,17 Within cellular environments, Raman spectroscopy offers non-invasive, label-free observation of intracellular dynamics, enabling real-time studies of metabolic pathways, drug uptake, and organelle-specific reactions without disrupting biological integrity.18
Despite the widely accepted promise of Raman spectroscopy, accurately datamining the spectroscopic signatures and quantifying the kinetic parameters of the processes of interest remains a challenge. In this context, multivariate curve resolution–alternating least squares (MCR-ALS) can be employed to extract independent spectral profiles from a complex mixture,19 and the analysis can be constrained using kinetic rate equations that describe the evolution of the component concentrations over time, in a hard–soft modelling approach.20 Such an approach has been employed to analyse the in vitro uptake of chemotherapeutic drugs in biological cells,21–23 as well as stem cell differentiation.24
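The bilinear model behind MCR-ALS, D ≈ C Sᵀ (spectra = concentrations × pure spectra), can be illustrated with a minimal alternating least squares loop. The sketch below is a generic, hypothetical implementation (the function name `mcr_als` and its parameters are illustrative, not a toolbox API). It approximates non-negativity by clipping after each unconstrained least squares step rather than using a true non-negative solver, and it omits the kinetic (hard) constraint and the other constraints available in full MCR-ALS toolboxes.

```python
import numpy as np


def mcr_als(D, S_init, n_iter=500, tol=1e-12):
    """Minimal MCR-ALS sketch (hypothetical helper, not a toolbox API).

    D      : (n x m) matrix of n spectra, each with m spectral points.
    S_init : (m x k) initial estimates of the k pure-component spectra.
    Returns C (n x k concentrations), S (m x k spectra) and the residual
    sum of squares of D - C @ S.T.
    """
    S = np.clip(np.asarray(S_init, dtype=float), 0.0, None)
    prev = np.inf
    for _ in range(n_iter):
        # Concentration step: solve S @ C.T ~= D.T, then clip to >= 0
        C = np.clip(np.linalg.lstsq(S, D.T, rcond=None)[0].T, 0.0, None)
        # Spectral step: solve C @ S.T ~= D, then clip to >= 0
        S = np.clip(np.linalg.lstsq(C, D, rcond=None)[0].T, 0.0, None)
        # Remove the scale ambiguity: unit-norm spectra, C rescaled to compensate
        norms = np.linalg.norm(S, axis=0)
        norms[norms == 0] = 1.0
        S, C = S / norms, C * norms
        resid = float(np.sum((D - C @ S.T) ** 2))
        if abs(prev - resid) < tol:
            break
        prev = resid
    return C, S, resid
```

With noise-free two-component data and a rough initial guess (for example, the true spectra plus a small offset), the loop converges to a factorization that reproduces the data; in practice the initial estimates usually come from SVD or purest-variable selection.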
MCR-ALS exhibits notable limitations, however, when applied to complex reaction systems, particularly those involving multiple reactants and products. The case of complete and incomplete kinetic processes, in which the starting or end points, respectively, are missing from the experimental analysis, was considered by de Juan et al.25 That study also assessed the influence of spectral baseline drift, or of an absorbing interference, on the kinetic process, as monitored by UV-visible absorption spectroscopy. For the analysis of complex cascade reactions, faithful resolution of the spectral components of parallel or competing processes has been recognised as a challenge in a number of studies; the issue has been addressed in part by the use of complementary techniques, such as chromatography,26 or by spectral analysis of similar compounds,27 analysed in a multiset structure.28
A promising solution to the challenge of resolving complex reaction systems is the recently demonstrated seeded-MCR-ALS approach.29 In this method, the data matrix is deliberately augmented with known spectral profiles, either full spectra of pure components or selected features, as “seeds” prior to curve resolution. This strategic seeding biases the multivariate optimization toward chemically meaningful solutions, guiding the algorithm to unmix overlapping spectra in complex mixtures. By integrating prior spectral knowledge directly into the MCR-ALS workflow, seeded-MCR-ALS offers a robust means of disentangling multiple reactants and products, even in multiproduct reactions, thus mitigating rotational ambiguity and improving the interpretability of concentration profiles in complex kinetic studies.
In this work, the acid hydrolysis of propyl acetate (Fig. 1) was studied to demonstrate the proof of concept and, to aid interpretation, eleven simulated admixtures of reactants and products were prepared to mimic the conversion of 100% reactants to 100% products. The contributions of the respective constituent component spectra to the evolution of the reaction data were initially illustrated using a problem-based nonlinear least squares (NLS) fitting approach. The limitations of MCR-ALS in resolving simultaneously evolving components are demonstrated, and the seeded approach is then used to improve the identification of the pure reaction components in the singular value decomposition (SVD) stage of the MCR-ALS algorithm. Least squares fitting was applied to optimize the seed weight by monitoring the sum residual error (SUMR) of the MCR-ALS results.
Fig. 1 shows the reaction scheme of propyl acetate hydrolysis, in which the OR group of the ester is replaced by an OH group from water, giving the corresponding carboxylic acid (acetic acid) and alcohol (propan-1-ol). The reaction is promoted by H2SO4 and, for methyl esters, typically takes place at 20–80 °C. In this study, the hydrolysis of propyl acetate was monitored at room temperature (∼20 °C) using Raman spectroscopy.
Fig. 2 illustrates the simulated Raman spectral evolution and the corresponding reaction kinetics for the acid-catalyzed hydrolysis of propyl acetate. In Fig. 2(a), the simulation depicts the progressive changes in Raman spectra of the pure components over the course of the reaction, modeled using 100 acquisition points from the initial state to reaction completion, thereby capturing the continuous spectral transformation associated with reactant consumption and product formation. Fig. 2(b) presents the reaction kinetics obtained from supervised MCR-ALS analysis, clearly demonstrating a systematic decrease in the reactant contributions accompanied by a corresponding increase in the product profiles, consistent with the expected reaction progression.
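A simulated dataset of the kind shown in Fig. 2(a) can be generated by linear mixing of pure-component spectra over 100 acquisition points. In this sketch the pure spectra are hypothetical Gaussian bands at arbitrary placeholder positions (not the measured spectra), the conversion is assumed to progress linearly from 0 to 1, and each spectrum is assumed to be a concentration-weighted sum of the pure spectra, an assumption that the measured water signal does not strictly satisfy.

```python
import numpy as np


def band(x, centre, width):
    """Unit-height Gaussian band (placeholder line shape)."""
    return np.exp(-0.5 * ((x - centre) / width) ** 2)


wavenumber = np.linspace(400.0, 3200.0, 1401)   # cm^-1, hypothetical sampling
# Hypothetical pure spectra for A (ester), B (water), C (alcohol), D (acid);
# band positions are illustrative placeholders, not band assignments.
S = np.column_stack([
    band(wavenumber, 640, 12) + band(wavenumber, 2940, 25),
    band(wavenumber, 1640, 40),
    band(wavenumber, 860, 10) + band(wavenumber, 2880, 25),
    band(wavenumber, 890, 10) + band(wavenumber, 1670, 15),
])

n_points = 100                                  # acquisition points, start to completion
progress = np.linspace(0.0, 1.0, n_points)      # assumed linear conversion (0 -> 1)
# Reactants A, B fall and products C, D rise with the same extent
C = np.column_stack([1 - progress, 1 - progress, progress, progress])
D = C @ S.T                                     # (100 x 1401) simulated spectra
```

Because A and B (and likewise C and D) share one concentration profile here, the simulated matrix has rank 2 even though four species contribute to it.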
Propyl acetate and water were mixed in a 1:10 ratio by volume (5 mL of propyl acetate and 50 mL of water), and 1 mL of concentrated H2SO4 was used to catalyze the reaction.30 The reaction was allowed to continue for 250 min at room temperature (around 20 °C) under gentle stirring at 500 rpm. The progress of the reaction was monitored by Raman spectroscopy at intervals of 5–10 min. For comparison, eleven simulated admixture solutions were prepared at room temperature according to the volume ratios shown in Table 1S. “Admixture 0” contains 100% reactants and 0% products, while “admixture 10” contains 0% reactants and 100% products; each successive admixture, from the 0th to the 10th, converts a further 10% by volume of the reactants to products.
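As a rough check on the composition of the reaction mixture, the volumes above can be converted to molar amounts. The densities and molar masses below are assumed literature values near room temperature (they are not given in the text), and the 1 mL of H2SO4 is ignored. Under these assumptions, water is in roughly 64-fold molar excess, so complete 1:1 hydrolysis would consume only about 1.6% of it. The admixture series is also expressed as a fractional conversion from 0 to 1 in steps of 0.1.

```python
# Assumed literature values (not from the study), approx. 20 °C
RHO_PROPYL_ACETATE = 0.888   # g/mL
M_PROPYL_ACETATE = 102.13    # g/mol
RHO_WATER = 0.998            # g/mL
M_WATER = 18.015             # g/mol


def moles(volume_ml, density_g_per_ml, molar_mass_g_per_mol):
    """Convert a liquid volume to an amount of substance (mol)."""
    return volume_ml * density_g_per_ml / molar_mass_g_per_mol


n_ester = moles(5.0, RHO_PROPYL_ACETATE, M_PROPYL_ACETATE)   # ~0.0435 mol
n_water = moles(50.0, RHO_WATER, M_WATER)                    # ~2.77 mol
molar_ratio = n_water / n_ester                               # ~64 water per ester
# 1:1 stoichiometry: full conversion uses n_ester mol of water
water_fraction_consumed = n_ester / n_water                   # ~0.016

# Admixtures 0..10 as fractional conversion of reactants to products
admixture_progress = [i / 10 for i in range(11)]
```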
Data were recorded by placing the beaker containing the reaction mixture under the 10× objective lens, with the focus kept inside the reaction mixture. The same conditions were used to record the simulated solution data. Once acquired, each spectrum was baseline corrected using the asymmetric least squares (ALS) method, noise subtracted, and lightly filtered using a third-order linear model to improve the signal-to-noise ratio.
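Asymmetric least squares baseline estimation can be sketched as below, following the widely used formulation of Eilers and Boelens: a smooth baseline z minimises a weighted fit to the spectrum plus a second-difference roughness penalty, with points lying above the baseline (peaks) strongly down-weighted. The function name and the parameter values (`lam`, `p`, `n_iter`) are illustrative assumptions, not the settings used in this study, and the dense matrix solve is only suitable for spectra of a few thousand points; sparse matrices are the usual choice in practice.

```python
import numpy as np


def als_baseline(y, lam=1e5, p=0.01, n_iter=10):
    """Asymmetric least squares baseline (dense sketch, hypothetical helper).

    lam : smoothness penalty on the second differences of the baseline.
    p   : weight given to points above the baseline (1 - p below it).
    """
    y = np.asarray(y, dtype=float)
    L = y.size
    diff2 = np.diff(np.eye(L), n=2, axis=0)   # (L-2, L) second-difference operator
    H = lam * diff2.T @ diff2
    w = np.ones(L)
    z = y
    for _ in range(n_iter):
        # Weighted penalised least squares: (W + lam D'D) z = W y
        z = np.linalg.solve(np.diag(w) + H, w * y)
        # Peaks (y > z) get the small weight p, baseline points get 1 - p
        w = np.where(y > z, p, 1.0 - p)
    return z
```

The corrected spectrum is then `y - als_baseline(y)`; a linear background passes through the penalty untouched, so only the curvature of the baseline is controlled by `lam`.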
| Components | Kinetic model |
|---|---|
| 4 | A → B + C + D |
| 4 | A + B → C + D, if A = B |
| 4 | A + B → C + D, if A = 1 and B = 10 |
| 3 | A → B + C |
| 2 | A → B |
Spectral profiles (sopt) and concentration profiles (copt) were analysed to resolve the kinetics of the reaction by MCR-ALS.
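The kinetic models in the table above can be written as closed-form concentration profiles, which is the form in which they act as hard constraints on the MCR-ALS concentration profiles. The sketch below assumes elementary rate laws (first order in A for A → B (+ C + D); first order in each of A and B for A + B → C + D) and interprets “A = B” and “A = 1, B = 10” as initial concentrations; the function names are hypothetical. In A → B + C + D every product shares a single profile, which is where the rank deficiency discussed later comes from.

```python
import numpy as np


def first_order(t, k, a0=1.0):
    """A -> B (+ C + D) with rate = k[A]; returns ([A], [product]).

    Every product formed 1:1 from A shares the same profile a0 - [A].
    """
    a = a0 * np.exp(-k * np.asarray(t, dtype=float))
    return a, a0 - a


def second_order_ab(t, k, a0, b0):
    """A + B -> C + D with rate = k[A][B]; returns ([A], [B], extent x).

    The products C and D both have concentration x (1:1 stoichiometry).
    """
    t = np.asarray(t, dtype=float)
    if np.isclose(a0, b0):
        # Equal initial concentrations: [A] = a0 / (1 + a0 k t)
        x = a0 - a0 / (1.0 + a0 * k * t)
    else:
        lo, hi = min(a0, b0), max(a0, b0)   # solution is symmetric in a0, b0
        r = np.exp(-(hi - lo) * k * t)      # decays to 0, so no overflow
        x = lo * hi * (r - 1.0) / (lo * r - hi)
    return a0 - x, b0 - x, x
```

For the 1:10 case, `second_order_ab(t, k, 1.0, 10.0)` drives [A] to zero while [B] levels off at 9, with each product rising to 1.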
MCR-ALS normally takes as input a reaction dataset (Rdata) in the form of an (n × m) matrix of n spectra, each of m spectral datapoints. In the data seeding process, Rdata was augmented by appending the spectra of each of the four reaction components, CA, CB, CC and CD, to the same matrix as (1 × m) rows, resulting in a matrix of dimensions ((n + 4) × m). To explore the effect of biasing the algorithm in favour of the component spectra, the seeding weight “X” was varied from 1 to 10 000, and the sum residual error (SUMR) of the model was monitored to identify the minimal SUMR values and the best fit of the components in the data. This seeding protocol was applied to both the experimental and simulated reaction spectral datasets. Fig. 4 illustrates the overall workflow of the chemometric analysis performed in this study.
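The augmentation step can be sketched as follows. The text does not specify how the seeding weight X enters the matrix, so this hypothetical helper assumes it multiplies each appended pure-component spectrum. The SUMR helper likewise rests on an assumed definition: the sum of squared residuals after least squares scaling of a resolved spectrum onto its unit-normalised reference spectrum. The returned residual vector is what would be plotted (for example, multiplied by 10) when comparing original and resolved spectra.

```python
import numpy as np


def seed_augment(R, seeds, weight):
    """Append weighted seed spectra to a data matrix (hypothetical helper).

    R      : (n x m) measured spectra (Rdata).
    seeds  : (k x m) pure-component spectra, e.g. k = 4 for CA..CD.
    weight : seeding weight X, assumed here to scale each seed spectrum.
    Returns the ((n + k) x m) augmented matrix.
    """
    R = np.asarray(R, dtype=float)
    seeds = np.atleast_2d(np.asarray(seeds, dtype=float))
    if seeds.shape[1] != R.shape[1]:
        raise ValueError("seeds must share the spectral axis (m points) of R")
    return np.vstack([R, weight * seeds])


def sumr(reference, resolved):
    """Assumed SUMR definition: squared residual of a least-squares fit of the
    unit-normalised reference spectrum by a scaled resolved spectrum."""
    ref = np.asarray(reference, dtype=float)
    res = np.asarray(resolved, dtype=float)
    ref = ref / np.linalg.norm(ref)
    res = res / np.linalg.norm(res)
    scale = (res @ ref) / (res @ res)        # least-squares scale factor
    residual = ref - scale * res
    return float(residual @ residual), residual
```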
Multiple changes in spectral features are observed as the reaction proceeds; the most robust are the growth of two distinct peaks in the 800–900 cm−1 region, associated with product formation, and a continuous decrease of features below 800 cm−1 and from 900 to 3100 cm−1. Similar behavior was observed in the Raman spectra of the 11 simulated admixtures of the reaction components. Fig. 6(a) shows the spectral data for the simulated admixtures, with gradually increasing product-to-reactant ratios.
:10 dilution ratio. The features were observed at a 1:20 dilution ratio and above, which indicates that certain spectroscopic features may be suppressed in the presence of other molecules at some mixture ratios. In terms of the spectroscopic evolution, the water spectrum emerges as though water were a product of the reaction rather than a reactant, and may be seen as an increasing component in the reaction kinetics plots.
These findings indicate that NLS fitting, when applied to elucidate kinetic parameters in systems with complex and overlapping spectral data, does not offer precise insights into the dynamic behaviour of multicomponent processes such as ester hydrolysis, in which more than one component reacts to give two or more products, e.g. A + B → C + D. Such algorithms perform well for simple processes in which the change is considered as a whole, e.g. A → B.31,32
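For comparison, the component-fitting step can be sketched as a classical least squares fit of each measured spectrum onto the pure-component spectra. This is a simplified, linear stand-in for the problem-based NLS fitting used in the study (whose bounds and solver settings are not given here), and the helper name is hypothetical. It assumes that spectra add linearly, which the suppressed water signal in the mixtures does not strictly satisfy.

```python
import numpy as np


def cls_fit(D, S):
    """Classical least squares: fit each row of D (n x m) as a linear
    combination of the pure spectra in the columns of S (m x k).

    Returns the (n x k) coefficient matrix and the (n x m) fit residuals.
    """
    coef, *_ = np.linalg.lstsq(S, D.T, rcond=None)
    C = coef.T
    return C, D - C @ S.T
```

Unlike MCR-ALS, this fit needs every pure spectrum in advance and returns only their weights; an unmodelled contribution cannot be discovered, and shows up instead as structure in the residuals.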
MCR-ALS can be constrained by kinetic models to quantify the rate of evolution. Although the reaction of Fig. 1 should be represented by a model of the form A + B → C + D, the reactants A and B, and the products C and D, might be expected to evolve simultaneously, so the reaction may equally be represented by a simpler model of the form A → B. However, the observed suppression of the Raman signal of water in the presence of propyl acetate, and its re-emergence as propyl acetate is consumed during the reaction, means that the Raman spectroscopic representation of the reaction may be better described by a model of the form A → B + C + D, or A → B + C. All four models were therefore used as kinetic constraints in the MCR-ALS analysis of both the simulated and experimental reaction datasets. All of the models resolve the time evolution of only two components in the data and do not reproduce the correct spectral variation of the 4 actual components.
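When a kinetic model is used as a hard constraint, the rate constant is typically fitted to the concentration profile resolved at each ALS iteration. A minimal, hypothetical version of that fitting step for the first-order model A → B is sketched below, using a log-spaced grid search over k with the amplitude solved analytically for each k. The grid range is an assumption, and a kinetic MCR-ALS implementation would more commonly use a nonlinear optimiser.

```python
import numpy as np


def fit_first_order_k(t, c, k_grid=None):
    """Grid-search least-squares fit of c(t) ~= c0 * exp(-k t) (hypothetical helper).

    For each trial k, the amplitude c0 has a closed-form least-squares value,
    so only k needs to be searched. Returns (k_best, residual_sum_of_squares).
    """
    t = np.asarray(t, dtype=float)
    c = np.asarray(c, dtype=float)
    if k_grid is None:
        k_grid = np.logspace(-4, 0, 2001)     # assumed search range, min^-1
    best_k, best_sse = None, np.inf
    for k in k_grid:
        basis = np.exp(-k * t)
        c0 = (basis @ c) / (basis @ basis)    # optimal amplitude for this k
        sse = float(np.sum((c - c0 * basis) ** 2))
        if sse < best_sse:
            best_k, best_sse = float(k), sse
    return best_k, best_sse
```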
Fig. 44S shows the MCR-ALS time evolution of the 4 components for the model A + B → C + D with A = 1 and B = 10. This model also fails to represent the correct time evolution profiles of the 4 components, due to inappropriate selection of the components. Fig. 45S–47S show the components selected by the algorithm for three different models considering 4 components in the data. Fig. 48S(a) shows the supervised MCR-ALS time evolution of the simulated admixture data for 4 components using the kinetic model A → B + C + D; Fig. 48S(b) shows the time evolution of the simulated admixture data for 4 components using the kinetic model A + B → C + D; Fig. 48S(c) shows the time evolution of the simulated admixture data for 3 components using the kinetic model A → B + C; and Fig. 48S(d) shows the time evolution of the simulated admixture data for 2 components using the kinetic model A → B. Fig. 49S shows the time evolution of the simulated admixture data for 4 components using the kinetic model A + B → C + D with A = 1 and B = 10, while Fig. 50S–52S illustrate the component selection for the different 4-component models for the admixture data. Indeed, all of the models show a time evolution that is either characteristic of only two components or incorrect, and none follows the correct spectral variations of the actual components, due to incorrect selection of components. In all cases, MCR-ALS effectively resolves only two components, as simultaneously evolving reactants and/or products are not distinguished. This suggests that, while MCR-ALS is a powerful technique for simple processes in which the spectral changes are considered as a whole, from one state to another,33–35 it has limitations when applied to multicomponent processes.
000X, and Fig. 9(b) shows the components identified by seeded MCR-ALS. The component spectra at this seeding weight are nearly identical to the original spectra of the pure components, shown in Fig. 5(a). As the composition of the simulated admixtures changes from pure reactants to pure products, MCR-ALS translates the spectral changes into the corresponding concentration profiles. Similarly, Fig. 9(c) shows the concentration profiles from seeded MCR-ALS analysis of the reaction data at a component seeding weight of 100X, and Fig. 9(d) shows the components identified by seeded MCR-ALS. The component spectra are again very similar to the original spectra of the reaction components, shown in Fig. 5(a). This suggests that augmenting the data with component seeds at specific weights enables accurate identification of the components in the data and yields the correct concentration changes over time.
000X does not further reduce the SUMR values. The minimal SUMR values for all four components were obtained at a seeding weight of 10 000X and are close to zero: 0.0043, 0.0021, 0.0030 and 0.0034 for components A–D, respectively. At this seeding weight, MCR-ALS is capable of accurately identifying the 4 participating components in the simulated admixtures.
| Seeding Weight (X) | SUMR component A (propyl acetate) | SUMR component B (water) | SUMR component C (propanol) | SUMR component D (acetic acid) |
|---|---|---|---|---|
| 1 | 0.8456 | 1.2582 | 0.1408 | 0.0502 |
| 2 | 0.5681 | 0.4008 | 0.0466 | 0.0257 |
| 3 | 0.4642 | 0.1895 | 0.0231 | 0.0119 |
| 4 | 0.3999 | 0.1095 | 0.0148 | 0.0101 |
| 10 | 0.1441 | 0.0191 | 0.0068 | 0.0082 |
| 20 | 0.0484 | 0.0053 | 0.0059 | 0.0078 |
| 50 | 0.0474 | 0.0023 | 0.0059 | 0.0078 |
| 100 | 0.0474 | 0.0021 | 0.0059 | 0.0078 |
| 1000 | 0.0474 | 0.0021 | 0.0059 | 0.0078 |
| 10 000 | 0.0043 | 0.0021 | 0.0030 | 0.0034 |
Fig. 11(a)–(d) shows the original spectra of the components fitted with the component spectra identified by MCR-ALS, together with the residual spectra multiplied by 10. The flat residual spectra confirm that the original spectra and the fitted spectra (identified by MCR-ALS in the data) are effectively identical, providing proof of concept that, with an appropriately weighted seed, the algorithm can correctly identify the components, resulting in accurate concentration profiles and kinetics of the process.
000X. SUMR values are minimal at a seeding weight of 100X, and further increasing the seeding weight does not reduce them. The SUMR values for the four components at a seeding weight of 100X are 0.0474, 0.0021, 0.0058 and 0.0078, respectively. At this seeding weight, MCR-ALS is capable of accurately identifying the 4 participating components in the reaction data.
| Seeding weight (X) | SUMR component A (propyl acetate) | SUMR component B (water) | SUMR component C (propanol) | SUMR component D (acetic acid) |
|---|---|---|---|---|
| 1 | 1.4247 | 1.9852 | 0.0620 | 0.1407 |
| 2 | 1.0224 | 0.6699 | 0.0521 | 0.0799 |
| 3 | 0.8674 | 0.3198 | 0.0470 | 0.0516 |
| 4 | 0.7752 | 0.1849 | 0.0437 | 0.0316 |
| 10 | 0.3386 | 0.0319 | 0.0205 | 0.0093 |
| 20 | 0.0511 | 0.0085 | 0.0059 | 0.0079 |
| 50 | 0.0475 | 0.0027 | 0.0058 | 0.0078 |
| 100 | 0.0474 | 0.0021 | 0.0058 | 0.0078 |
| 1000 | 0.0474 | 0.0021 | 0.0059 | 0.0078 |
| 10 000 | 0.0474 | 0.0021 | 0.0059 | 0.0078 |
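Ranking the seeding weights by the total SUMR over the four components reproduces, from the tabulated values, the optima reported in the text (10 000X for the admixtures, 100X for the reaction). Summing the four components is our assumption; the text compares them individually, but the same weights are also minimal for each component. The values are transcribed from the two tables above, and the helper name is hypothetical.

```python
# SUMR values (components A-D) transcribed from the two tables above
ADMIXTURE_SUMR = {
    1: (0.8456, 1.2582, 0.1408, 0.0502),
    2: (0.5681, 0.4008, 0.0466, 0.0257),
    3: (0.4642, 0.1895, 0.0231, 0.0119),
    4: (0.3999, 0.1095, 0.0148, 0.0101),
    10: (0.1441, 0.0191, 0.0068, 0.0082),
    20: (0.0484, 0.0053, 0.0059, 0.0078),
    50: (0.0474, 0.0023, 0.0059, 0.0078),
    100: (0.0474, 0.0021, 0.0059, 0.0078),
    1000: (0.0474, 0.0021, 0.0059, 0.0078),
    10_000: (0.0043, 0.0021, 0.0030, 0.0034),
}
REACTION_SUMR = {
    1: (1.4247, 1.9852, 0.0620, 0.1407),
    2: (1.0224, 0.6699, 0.0521, 0.0799),
    3: (0.8674, 0.3198, 0.0470, 0.0516),
    4: (0.7752, 0.1849, 0.0437, 0.0316),
    10: (0.3386, 0.0319, 0.0205, 0.0093),
    20: (0.0511, 0.0085, 0.0059, 0.0079),
    50: (0.0475, 0.0027, 0.0058, 0.0078),
    100: (0.0474, 0.0021, 0.0058, 0.0078),
    1000: (0.0474, 0.0021, 0.0059, 0.0078),
    10_000: (0.0474, 0.0021, 0.0059, 0.0078),
}


def optimal_weight(table):
    """Smallest seeding weight with the minimum total SUMR over A-D.

    Totals are rounded to 6 decimals so float noise cannot break ties.
    """
    totals = {w: round(sum(v), 6) for w, v in table.items()}
    best = min(totals.values())
    return min(w for w, tot in totals.items() if tot == best), best


print(optimal_weight(ADMIXTURE_SUMR))   # (10000, 0.0128)
print(optimal_weight(REACTION_SUMR))    # (100, 0.0631)
```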
Fig. 54S(a)–(d) shows the original spectra of the reaction components fitted with the component spectra identified by MCR-ALS, together with the residual spectra multiplied by 10. The flat residual spectra confirm that the original spectra and the fitted spectra (identified by MCR-ALS in the data) are very similar. This again demonstrates that seeding with an appropriate weight enables the algorithm to identify the components correctly and thus to generate accurate concentration profiles and kinetics of the process.
The initial analysis of the starting reactants indicated that the Raman spectrum of the mixture was not a weighted (1:10) linear combination of the spectra of propyl acetate and water on their own, and analysis of the reaction mixture as it progressed gave the surprising result that the signatures of water emerged as though water were a reaction product. To investigate this phenomenon further, admixtures of systematically varied proportions of reactants and products were prepared to simulate the evolution of the reaction mixtures. No heat was applied to promote the reaction, however, and so the composition of these admixtures is precisely known. The spectra of the simulated mixtures were seen to faithfully reproduce the evolution of those of the reaction mixture, and therefore no further analysis was deemed necessary. Note that the tetrahedral intermediate formed in ester hydrolysis is assumed to be very short-lived and low in concentration,38 and therefore not to contribute significantly to the spectroscopic signatures, which are measured over seconds. Time-resolved resonance and ultrafast Raman spectroscopies are used to detect such reaction intermediates.39
The observed behavior can be attributed to the change in polarizability due to changes in intermolecular water interactions in the presence of a solute.40 Studies have consistently demonstrated that the presence of solutes can alter the local hydrogen-bond network of water, resulting in shifts, intensity changes, and band-shape modifications in the O–H stretching and bending regions of the vibrational spectrum.41 While propyl acetate is nonpolar and sparingly soluble in water, both the propanol and acetic acid products are capable of hydrogen bonding and thus do not significantly perturb the local clustering of the bulk water, resulting in the reappearance of water signals.42
Although the NLS approach can resolve the evolution of the reaction products, MCR-ALS fails to do so, essentially because of rank deficiency, as previously highlighted by de Juan et al.20 For a closed system in which R reactions involve S species, the rank of the concentration matrix is at most the minimum of (R + 1, S), the additional rank arising from the initial concentrations. In the case of the reaction A → B + C, for example, the rank is two, which is insufficient for the SVD process to identify that there are three independent species; the analysis instead returns a representation of the evolution of their mixture. Such rank deficiency has been overcome by, for example, augmenting the data with independent pathways, essentially an additional parallel reaction that increases the rank of the matrix.43,44 Examples include the parallel use of chromatographic and spectroscopic data,26,45 and augmentation of the dataset of a parallel reaction pathway with that of a model compound reaction.27
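The rank argument can be checked numerically. The sketch below builds noise-free spectra for A → B + C with first-order kinetics and hypothetical random pure spectra (the rate constant, time grid and spectra are illustrative assumptions), then counts the singular values above a relative threshold. Three species yield only two significant singular values, because every concentration column lies in the span of a constant and [A](t).

```python
import numpy as np

rng = np.random.default_rng(1)
t = np.linspace(0.0, 250.0, 60)          # acquisition times (min), illustrative
a = np.exp(-0.02 * t)                    # [A] for first-order A -> B + C
x = 1.0 - a                              # extent; [B] = [C] = x
C = np.column_stack([a, x, x])           # (60 x 3) concentration matrix
S = rng.random((200, 3))                 # hypothetical, independent pure spectra
D = C @ S.T                              # noise-free mixture spectra

sv = np.linalg.svd(D, compute_uv=False)
numerical_rank = int(np.sum(sv > 1e-10 * sv[0]))
print(numerical_rank)                    # 2, although three species are present
```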
As an alternative approach, the multivariate analysis can be seeded with weighted spectra of the known constituent components. Seeding of multivariate analysis was first described in the context of principal component analysis (PCA),29,46 one of the methodologies commonly used in the initial SVD stages of MCR-ALS. In the case of rank deficiency, SVD will not distinguish two simultaneously evolving components. However, seeding biases the variance of the dataset towards one component, such that the components are more easily distinguished by PCA. Note that, in this process, the data are mean-centred, and thus the seed biases the mean and the mean-centred “score” of each datapoint in the comparative analysis. Fig. 10 shows the effect of systematically varying the seeding weight on the fitting error of each resolved component; note that the same weighting was employed for every component at all points in the analysis. Propyl acetate requires the highest seeding weight, whereas acetic acid requires the lowest, for a similar improvement in the accuracy of resolution of the component. Although further examination is required, including independently varied weightings, whether a species requires a higher or lower seeding weight may depend on the complexity of its spectrum and/or the similarity of the spectra of simultaneously evolving components.
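A minimal numerical illustration of why seeding helps, assuming the seeds are appended as rows scaled by the seeding weight: appending the pure spectra to rank-deficient A → B + C data restores the missing third singular value, and its magnitude grows with the weight, so the third component is no longer lost in numerical noise. The data, spectra and helper name are hypothetical, and the matrix is not mean-centred here, so this shows only the rank effect, not the PCA bias described above.

```python
import numpy as np

rng = np.random.default_rng(1)
t = np.linspace(0.0, 250.0, 60)
a = np.exp(-0.02 * t)
C = np.column_stack([a, 1.0 - a, 1.0 - a])   # A -> B + C: rank-deficient profiles
S = rng.random((200, 3))                      # hypothetical pure spectra (m x k)
D = C @ S.T                                   # (60 x 200) data, rank 2


def third_singular_value(D, seeds, weight):
    """Third singular value of D augmented with weight * seeds (k x m rows)."""
    augmented = np.vstack([D, weight * seeds])
    return np.linalg.svd(augmented, compute_uv=False)[2]


for w in (0, 1, 100):
    print(w, third_singular_value(D, S.T, w))
```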
Seeding the MCR-ALS analysis resulted in the successful resolution of the reaction components and their kinetic evolution, showing promise for a range of applications, potentially extending to systems biology and metabolic flux analysis (fluxomics),47–49 although application of the MCR-ALS technique to evolving spectra of the in vitro intracellular50 or extracellular51,52 environment may pose additional challenges.
Supplementary information is available. See DOI: https://doi.org/10.1039/d5an01055a.
This journal is © The Royal Society of Chemistry 2026