Connor J.
Taylor
a,
Hikaru
Seki
b,
Friederike M.
Dannheim
b,
Mark J.
Willis
c,
Graeme
Clemens
d,
Brian A.
Taylor
d,
Thomas W.
Chamberlain
*a and
Richard A.
Bourne
*a
aInstitute of Process Research and Development, School of Chemistry and School of Chemical and Process Engineering, University of Leeds, Leeds, LS2 9JT, UK. E-mail: t.w.chamberlain@leeds.ac.uk; R.A.Bourne@leeds.ac.uk
bDepartment of Chemistry, University of Cambridge, Cambridge, CB2 1EW, UK
cSchool of Engineering, University of Newcastle, Newcastle upon Tyne, NE1 7RU, UK
dChemical Development, Pharmaceutical Technology & Development, Operations, AstraZeneca, Macclesfield, UK
First published on 7th May 2021
We herein report experimental applications of a novel, automated computational approach to chemical reaction network (CRN) identification. This report shows the first chemical applications of an autonomous tool to identify the kinetic model and parameters of a process, when considering both catalytic species and various integer and non-integer orders in the model's rate laws. This kinetic analysis methodology requires only the input of the species within the chemical system (starting materials, intermediates, products, etc.) and corresponding time-series concentration data to determine the kinetic information of the chemistry of interest. This is performed with minimal human interaction and several case studies were performed to show the wide scope and applicability of this process development tool. The approach described herein can be employed using experimental data from any source and the code for this methodology is also provided open-source.
Such kinetic models can be complex and occur over many reaction steps, involving many measurable and immeasurable intermediates. This allows quantitative chemical synthesis information to be obtained, allowing classical reaction engineering principles to be applied to shorten process development times and lower costs.14 In these instances, obtaining stoichiometric and kinetic descriptions of the individual transformations is often more important than detailed mechanistic insights and rationales.15 For a system involving multiple reactions, each featuring multiple species with differing stoichiometric coefficients and respective kinetic parameters, this is referred to as a chemical reaction network (CRN). Upon determination of a CRN, mathematical relationships are thereby established between participating species, allowing simulations and process engineering to drive reaction understanding and ultimately optimise chemical systems through a reduction in laboratory experiments.16–18 However, in this report we will be explicitly focussing on the identification of the process CRN from apparent kinetics; for further discussions on the scale-up of chemical processes, refer to work by Levin19 and Khang and Levenspiel.20
There are many reports in the literature involving the use of computation to evaluate aspects of a CRN,21–27 many of which are based on work reported by Aris and Mah,28 then later Bonvin and Rippin.29 However, there are no CRN identification tools to date that identify catalytic reaction pathways or non-integer species orders, which greatly limits the applicability of these techniques for many chemical systems. Other techniques have also emerged, such as the use of model-based design of experiments (MBDoE) for various applications.30–35 Whilst there is an increased scope for this methodology, assigning a model may be difficult for noisy systems due to standalone measurements in otherwise empty chemical space.36 Furthermore, expert chemical intuition is necessary in compiling initial model candidate sets for evaluation and many models may be excluded based on their presumed inability to occur.
We recently illustrated the first experimental application of a comprehensive CRN determination methodology,37 initially reported by Tsu et al.14 This machine-learning-based tool identifies and evaluates all possible kinetic models, then uses statistical analysis to establish which model is most likely to be correct. The user simply inputs the mass of each participating species and experimental data, then the approach autonomously identifies the most likely CRN. This paper reports significant improvements to this technique, including how the scope of application has been expanded to zero- and non-integer-order reactions and catalytic processes. This work represents the first CRN determination tool in the literature with these capabilities. Two new experimental case studies are herein reported and one further case study was revisited due to the applicability of this tool to its unusual and complex kinetic model. The methodology can be used with data from any source and any analytical technique, although this work features exclusively reactions in batch using both NMR sampling and HPLC analysis. This upgraded open-source tool serves as a comprehensive, automated resource for the process development of almost any chemistry, where scalable process understanding can be achieved with minimal need for high-level chemical intuition.38
(1) |
The second stage of the approach relates to the suitability of each individual CRN, or model, to the supplied experimental data, referred to as the kinetic fitting stage. A cycle is conducted as a CRN is loaded, then a gradient-based local minimisation algorithm is used to optimise kinetic parameters and fit simulated kinetic curves to the data points using the CRN's respective rate laws, thereby measuring the convergence of the CRN to the chemical data. These simulated kinetic curves are referred to as ordinary differential equations (ODEs), as they are a result of solving a linked set of differential equations. The sum of squared error (SSE) measurement between the measured (experimental) and estimated (simulated) concentrations is the objective function that is minimised to ensure maximum convergence of the model to the data. This SSE value is calculated using eqn (2), where NData is the number of data points sampled, Ej is the jth experimental point and Sj is the jth simulated point:
(2) |
(3) |
This approach serves as an automated tool that removes high-level chemical expertise from both model and kinetic parameter determination during process development. Conducting this model discrimination, without chemical intuition, creates a system whereby a model is identified based purely on the most statistically accurate and quantitative physical-organic description of the chemistry. This allows experts to spend their human resource on more challenging aspects of process development that cannot be otherwise automated, including critically appraising the resultant list of model candidates. For example, after this initial model discrimination experts can alter the identified CRN based on perceptions of chemistries that are unlikely to be true, as model terms may be included due to anomalies in the experimental data provided. More advanced engineering principles may also be applied or further experiments may be prescribed by experts to discriminate between the list of ranked CRNs generated.
Experiments were undertaken at various temperatures between −25 °C and 50 °C using an NMR tube within a 500 MHz NMR machine with a constant acquisition rate, thereby obtaining complete reaction profiles. This experimental data could then be inputted into the computational approach, as well as the species involved (1–5) to identify all possible elementary reactions. Every combination of these reactions (in every form of their corresponding rate laws) were then constructed and comprehensively assessed, resulting in 3320 unique model evaluations. It was found via AICC analysis that the most likely model is the reaction model shown in Scheme 1 with a first-order dependence of both 1 and 2 in forming the major and minor products. The approach also determined the kinetic parameters of the transformation to the major product: k25°C = 0.499 ± 0.006 M−1 min−1, Ea = 44.19 ± 0.57 kJ mol−1 and to the minor product: k25°C = 0.384 ± 0.009 M−1 min−1, Ea = 36.57 ± 0.88 kJ mol−1. In this example the user has only inputted the experimental data, the molecular weights of each of the species and indicated that the reaction is not catalytic. All kinetic analysis was then performed autonomously, producing the list of ranked CRNs to be evaluated by a trained chemist shown in Table 1. After appraisal, this newly obtained information can subsequently be used to significantly shorten the overall process development time during pharmaceutical manufacture. The top-ranked model and the corresponding kinetic parameters allowed a fit to the experimental data with an average residual of less than 0.3% and is shown in Fig. 2. Full experimental details and raw data can be found in the ESI† (Section 3).
Rank | Reaction model | Kinetic parameters | SSE/103 M | AICC | |
---|---|---|---|---|---|
k 25°C/Mα min−1 | E a/kJ mol−1 | ||||
1 | 1 + 2 → 3 + 5 | 0.499 | 44.19 | 1.614 | −51.91 |
1 + 2 → 4 + 5 | 0.384 | 36.57 | |||
2 | 1 + 2 → 3 + 5 | 0.407 | 40.61 | 3.723 | −48.57 |
1 + 20.5 → 4 + 5 | 0.073 | 31.01 | |||
3 | 1 + 2 → 3 + 5 | 0.466 | 42.25 | 3.777 | −48.51 |
10 + 2 → 4 + 5 | 0.017 | 27.56 |
It is clear from the kinetic plots that the fit of the ODEs to the experimental data do not result in normally-distributed residuals. This could be for one of many reasons but is most likely due to either NMR integrations errors, evaporation of solvent (CD3OD) into the headspace at higher temperatures or changes in solvent viscosity at these differing temperatures. It is also possible that liberation of HCl as the reaction progresses forms rapid equilibria with the base, NEt3, and partial protonation of the nucleophile leading to reduced reactivity. Although this approach can automatically determine the most likely reaction model and kinetic parameters from experimental data, all of these factors must still be considered by a trained chemist upon completion of the computation to confirm the methodology output. In this case, however, the residuals are only a small consideration and it can be observed that the model still provides a good fit to the data. This kinetic information can then be used to optimise this process for the highest yields of the 4-substituted product, 3, when scaling up production of the material.
The next case study conducted was a chemical system featuring the protection of an amino acid for further functionalisation, where alanine methyl ester (Al-Me), 6, reacts with 9-bromo-9-phenylfluorene (PfBr), 7, to form the protected amino acid (Pf-Al-Me), 8, and hydrobromic acid, 9, as shown in Scheme 2 as a presupposed model. PfBr is the reagent used to introduce the 9-phenylfluorene (Pf) protecting group in synthesis. Pf is a pharmaceutically relevant and bulky protecting group, that can be introduced as a more acid stable alternative to the more commonly used trityl group.42,43 This case study is the protection step involved in the total synthesis of (S)-eleagnine from L-alanine. Therefore, understanding this transformation by performing kinetic analysis would lower costs and accelerate process development in the overall bioactive alkaloid manufacture,44 following promising reports of potent analgesic properties.45
Three experiments were run between 30 °C and 40 °C in a batch vessel, then the four identified species (6–9) and all experimental data were inputted into the computational approach. In this system a heterogeneous base, K3PO4, was added to remove the HBr as it formed. Two feasible mass-balance-allowed reactions were calculated, then each combination of these reactions (in every variant of their rate laws) were assessed by the approach automatically, resulting in 30 unique model evaluations. The most likely representation of the model was identified by the approach as the presupposed forward reaction shown in Scheme 2, with a first-order dependence on PfBr (7) and a zero-order dependence on Al-Me (6). This has not been reported elsewhere in the literature, but this discovery makes chemical sense as the bulky aromatic rings would stabilise the cation formed, should the reaction proceed via a traditional SN1 mechanism; this means that the rate-determining step is likely to be the loss of the bromide ion, followed by a fast nucleophilic addition by the alanine methyl ester. The approach also determined the kinetic parameters of this transformation to be k35°C = 1.06 × 10−2 ± 0.01 × 10−2 min−1, Ea = 62.91 ± 0.23 kJ mol−1. This example further proves how using this automated methodology in conjunction with chemical study helps to both elucidate reaction mechanisms and gain process understanding for real-life applications. The top-ranked model and corresponding parameters allowed a fit to the experimental data with an average residual of less than 0.4% and is shown in Fig. 3. Full experimental details, raw data and model rankings can be found in the ESI† (Section 4).
The final case study explored was the chemical system involving maleic acid, 10, reacting with methanol, 11, to form the monomethylated maleic acid ester (mono-product), 12, and the dimethylated maleic acid ester (di-product), 13, each liberating a molecule of water, 14, as shown in the presupposed model in Scheme 3. Our collaborators at AstraZeneca were interested in this reaction following the contamination of a batch of an API maleate salt with the corresponding monomethyl maleate salt, as this contamination could be mitigated if the impurity formation was well understood.
Kinetic experiments were run using this system and the findings were published, as it was found that the reaction forming the mono-product was self-catalytic.46 In this case, the reaction was found to be catalytic with an order of 0.5, meaning that the overall order with respect to the maleic acid was 1.5 in forming the mono-product. The consecutive reaction forming the di-product is also catalytic with respect to the maleic acid with a 0.5 order dependence. As methanol is the solvent in this reaction with an effective concentration of ∼24 M, pseudo-order conditions are assumed and rate laws with non-zero-order dependence on methanol were removed from consideration for simplicity. Although the kinetics of this process are now known and reported, it took several months for physical-organic chemists to decipher this model as the reactivity is not immediately intuitive. Therefore, this case study was run to validate the approach in elucidating non-intuitive reaction models, and how this methodology can serve as a viable, automated substitute for many hours spent on a project by experts that could be working on other aspects of process development.
Five experiments were conducted in batch at differing starting concentrations and temperatures in the range of 40 °C to 60 °C and analysed using 1H NMR sampling. This experimental data and the five species (10–14) were inputted into the approach, resulting in six mass-balance-allowed transformations identified. The catalyst was defined, then every combination of these catalytic reactions (and corresponding rate law variants) were compiled into different reaction models, resulting in 5086 automatic unique model evaluations. Each of these models were then ranked based on their AICC and it was found that the most likely representation of the system is the reaction model shown in Scheme 3, but each step is catalysed by maleic acid with an apparent species order dependence of 0.5. These are the same findings that our collaborators made during their kinetic analysis, as they also employed a pragmatic approximation of a 0.5 catalytic order in maleic acid due to the observation of specific acid catalysis. The approach also determined the kinetic parameters of this transformation to be: kmono 50°C = 3.85 × 10−3 ± 0.01 × 10−3 M−0.5 min−1, Ea = 72.61 ± 0.12 kJ mol−1 and kdi 50°C = 4.66 × 10−4 ± 0.01 × 10−4 M−0.5 min−1, Ea = 69.74 ± 0.10 kJ mol−1. These parameters are in agreement with those obtained by our collaborators. This model and the corresponding kinetic parameters allowed a fit to the experimental data with an average residual of less than 0.2% and two experimental fittings at 50 °C are shown in Fig. 4. Full experimental details, raw data and model rankings can be found in the ESI† (Section 5).
As the model was already identified by our collaborators, this methodology has corroborated their findings and shown that process understanding can be accelerated using this approach. This approach automatically identified the correct reaction model in ∼17 hours of computational time, whereas the more traditional science-led approach took appreciably longer with significant human resource required. Clearly the use of this comprehensive model-based approach would be an asset to kinetic specialists during similar case studies when there is significant time pressure to find robust solutions to manufacturing problems identified.
It has also been shown from these case studies that using this methodology can greatly accelerate process development and reduce the workload of kinetic analysis for physical-organic chemists. After experimentation, all kinetic analysis can be implemented autonomously whilst experts focus on other aspects of process development. After the computational evaluations have finished running, these experts can then work in conjunction with the approach to identify which models are the most likely to be statistically and scientifically correct as well as determine if further studies are necessary. The increased breadth of scope of this reported approach serves as an open-source enabling tool in discriminating kinetic models, thereby greatly reducing the time and cost barriers to complete process understanding.
Footnote |
† Electronic supplementary information (ESI) available. See DOI: 10.1039/d1re00098e |
This journal is © The Royal Society of Chemistry 2021 |