Open Access Article

This Open Access Article is licensed under a Creative Commons Attribution-Non Commercial 3.0 Unported Licence

Qiaofeng Li, Huaibo Chen, Benjamin C. Koenig and Sili Deng*

Department of Mechanical Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA. E-mail: silideng@mit.edu

Received 30th October 2022, Accepted 11th January 2023

First published on 12th January 2023

Chemical reaction neural network (CRNN), a recently developed tool for the autonomous discovery of reaction models, has been successfully demonstrated on a variety of chemical engineering and biochemical systems. It leverages the extraordinary data-fitting capacity of modern deep neural networks (DNNs) while preserving high interpretability and robustness by embedding widely applicable physical laws such as the law of mass action and the Arrhenius law. In this paper, we further developed Bayesian CRNN to not only reconstruct but also quantify the uncertainty of chemical kinetic models from data. Two methods, the Markov chain Monte Carlo algorithm and variational inference, were used to perform Bayesian inference for the CRNN, with the latter mainly adopted for its speed. We demonstrated the capability of Bayesian CRNN in the kinetic uncertainty quantification of different types of chemical systems and discussed the importance of embedding physical laws in data-driven modeling. Finally, we discussed the adaptation of Bayesian CRNN for incomplete measurements and model mixing for global uncertainty quantification.

Traditional methods for chemical model construction are based on either expert knowledge or first-principles quantum chemistry calculations.^{5} However, complex chemical systems can involve hundreds or even thousands of species, leading to even more potential reaction pathways.^{6} Consequently, prior expert knowledge is limited and first-principles simulation is computationally intractable, making the construction of chemical reaction models challenging. On the other hand, the development of diagnostics facilitates the generation of various measurements,^{7} which lays the foundation for using data-driven approaches as a new paradigm for kinetic model construction.^{8–10}

Deterministic^{11} and probabilistic^{12–14} data-driven tools have been developed to infer reaction rate constants from experimental data based on a reaction template, the generation of which still relies on expert knowledge. Previously, symbolic and sparse regression were utilized to determine reaction pathways,^{15,16} but these techniques are limited to simple systems owing to difficulties in handling high-dimensional problems. In recent years, neural networks have shown remarkable performance in high-dimensional tasks such as image classification^{17} and natural language processing.^{18} It is therefore natural to use a neural network as a black box to represent kinetic models.^{19} However, the lack of physical interpretability, and thus generalizability, limits the use of such models.

Chemical reaction neural network^{20} (CRNN) was introduced for the autonomous and simultaneous determination of reaction pathways and chemical kinetic parameters from the time histories of species concentrations. Physical laws, such as the law of mass action and the Arrhenius law, are hard-encoded in the architecture of the CRNN to improve its generalizability. In this way, the weights and biases of the NN can be interpreted as the stoichiometric coefficients and rate constants of the reactions, making the model fully interpretable and transparent to users. The applicability and robustness of the CRNN have been tested on a wide variety of cases.^{20,21}

Uncertainty quantification (UQ) of a constructed chemical reaction model is crucial for the further reliable application of the model.^{22,23} The importance of UQ is two-fold: the uncertainty of the identified parameters informs the direction of future experiments, and the uncertainty of the species profiles simulated by the constructed model provides confidence in predictions. Many methods have been developed for the inverse UQ of kinetic models^{11–14} under the framework of traditional mechanism construction methods,^{24–26} as well as for the forward UQ of predictions.^{27,28} However, these methods again rely on either expert knowledge or computationally expensive first-principles calculations.

In this paper, we will extend the capability of the proposed CRNN^{20,21} with uncertainty quantification by Bayesian inference, resulting in the Bayesian chemical reaction neural network (B-CRNN). The B-CRNN will allow not only autonomous reaction pathway discovery but also uncertainty quantification of the associated stoichiometric coefficients and chemical kinetic parameters. We will introduce an efficient implementation algorithm, Bayes by Backprop, and demonstrate both inverse and predictive UQ on a variety of reaction systems, including a temperature-dependent mechanism, a reversible system, and a catalytic system. We will also demonstrate how the B-CRNN differs from a pure data-fitting tool, how it can be useful in incomplete-information scenarios, and how to comprehensively incorporate the influence of different but viable kinetic models.

v_{A}A + v_{B}B → v_{C}C + v_{D}D. | (1) |

r = k[A]^{v_A}[B]^{v_B}, | (2) |

k = AT^{b} exp(−E_a/(RT)), | (3) |

r = exp(ln A + b ln T − E_a/(RT) + v_A ln[A] + v_B ln[B]). | (4) |

Fig. 1 Overview of the Bayesian chemical reaction neural network (B-CRNN). (a) A single-pathway CRNN corresponding to eqn (1). By explicitly embedding the law of mass action and the Arrhenius law, the weights and biases in the CRNN have physical meanings, i.e., they are the stoichiometric coefficients and the Arrhenius law parameters. (b) Illustration of the B-CRNN considering multiple pathways. To incorporate and quantify the uncertainty of the physical parameters and concentration trajectories, the physical parameters (NN weights) are given probability distributions instead of deterministic values. (c) Illustration of the B-CRNN performance. The B-CRNN generates not only estimates of the concentration trajectories of species (solid lines) but also credible intervals (shaded regions). Dashed lines are the ground truth. |
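The single-pathway computation in Fig. 1a can be made concrete with a short numerical sketch. The snippet below is our own illustration, not the authors' implementation: the rate is evaluated in log space, so the reaction orders act as input weights and the Arrhenius parameters as a bias, while the stoichiometric coefficients act as output weights. The reaction A + B → 2C, all parameter values, and the function names are hypothetical.

```python
import numpy as np

# Minimal sketch of a single CRNN pathway (illustrative, not the paper's code).
# Law of mass action: r = k(T) * [A]**vA * [B]**vB, with the modified
# Arrhenius law k(T) = A_pre * T**b * exp(-Ea / (R*T)) embedded in the bias.
R_GAS = 8.314  # J/(mol K)

def pathway_rate(Y, T, w_in, lnA, b, Ea):
    """Rate in log space: r = exp(w_in . ln(Y) + lnA + b*ln(T) - Ea/(R*T))."""
    ln_k = lnA + b * np.log(T) - Ea / (R_GAS * T)
    return np.exp(w_in @ np.log(np.maximum(Y, 1e-30)) + ln_k)

def pathway_rhs(Y, T, w_in, w_out, lnA, b, Ea):
    """Species production rates dY/dt = w_out * r for one pathway."""
    return w_out * pathway_rate(Y, T, w_in, lnA, b, Ea)

# Hypothetical reaction A + B -> 2C with k = 1 (lnA = b = Ea = 0) at 300 K
Y = np.array([1.0, 2.0, 0.1])        # concentrations [A, B, C]
w_in = np.array([1.0, 1.0, 0.0])     # reaction orders (input weights)
w_out = np.array([-1.0, -1.0, 2.0])  # stoichiometric coefficients (output weights)
dYdt = pathway_rhs(Y, 300.0, w_in, w_out, lnA=0.0, b=0.0, Ea=0.0)
# r = [A][B] = 2, so dY/dt = [-2, -2, 4]
```

The clamp inside the logarithm avoids `log(0)` for species with zero concentration and zero reaction order.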

For a system with multiple reactions (four as an example), the NN pathways can be built in parallel, as shown in Fig. 1b. The production rates of the species will simply be the summation of the production rates from individual reaction pathways. The computation procedure for multiple reactions can be summarized as

Ẏ = W_out exp(W_in ln Y + b), | (5) |

Ẏ(t) = CRNN(Y(t)). | (6) |

From the deep learning perspective, the CRNN hard-encodes physical laws, the law of mass action and the Arrhenius law, into the more general formulation of the Neural Ordinary Differential Equation (NODE).^{29} By doing this, we improve (1) the interpretability of the NN, since the weights and biases have clear physical meanings, and (2) the generalizability of the NN. When a trained CRNN is applied to unseen data, especially out-of-distribution data, the prediction is bound by widely applicable physical laws instead of being generated by an arbitrarily fitted function.
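To make the NODE view concrete, the sketch below (our construction, with a hypothetical single first-order pathway A → B and k = 1) treats a CRNN-style rate law as the right-hand side of an ODE and integrates it with a fixed-step fourth-order Runge–Kutta scheme; conservation of total mass and the analytic decay exp(−t) serve as sanity checks.

```python
import numpy as np

# Sketch: a CRNN-style rate law as the right-hand side of a neural ODE,
# integrated with fixed-step RK4. Hypothetical single pathway A -> B, k = 1.
W_in = np.array([[1.0, 0.0]])      # one reaction, first order in A
W_out = np.array([[-1.0], [1.0]])  # A consumed, B produced

def crnn_rhs(Y):
    r = np.exp(W_in @ np.log(np.maximum(Y, 1e-30)))  # r = [A]
    return W_out @ r

def rk4(Y0, dt, n_steps):
    Y = np.array(Y0, dtype=float)
    traj = [Y.copy()]
    for _ in range(n_steps):
        k1 = crnn_rhs(Y)
        k2 = crnn_rhs(Y + 0.5 * dt * k1)
        k3 = crnn_rhs(Y + 0.5 * dt * k2)
        k4 = crnn_rhs(Y + dt * k3)
        Y = Y + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)
        traj.append(Y.copy())
    return np.array(traj)

traj = rk4([1.0, 0.0], dt=0.01, n_steps=100)
# At t = 1: [A] close to exp(-1), and [A] + [B] stays at 1 (mass balance)
```

Because each column of W_out sums to zero here, every RK4 stage conserves total concentration exactly.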

We denote the optimizable parameters of the CRNN as θ and look for the posterior distribution of θ, given certain measurements and estimated noise on the measurements. Based on Bayes' theorem, the posterior distribution of θ can be expressed as

p(θ|D) = p(D|θ)p(θ)/p(D), | (7) |

(8) |

2.2.1 Hamiltonian Monte Carlo.
The Hamiltonian Monte Carlo (HMC) method, a specific variant of the Markov chain Monte Carlo (MCMC) family of algorithms, approximates the posterior distribution of θ by constructing a Markov chain. The samples from the Markov chain are used to perform statistical analysis of the target probability distribution. The HMC method was proposed to suppress the random-walk behavior seen in earlier variants of MCMC^{30} and thus sample more efficiently in a high-dimensional space. In the HMC method, the sampling of unknown variables is equivalent to calculating the position of a particle, with the negative log posterior of the unknown variable as the potential energy of the particle. The state of the particle is sampled according to

where m is the momentum of the particle, and θ serves as the position of the particle. m is randomly initialized from a standard Gaussian distribution. Leapfrog steps^{30} (L steps with stepsize ε) are taken to update the particle state.

where θ* and m* are the new particle position and momentum after the update step. These new values are accepted with a probability α.

By repeating the above process, we can generate a sequence of samples of θ for later statistical analysis. L and ε are two adjustable parameters of HMC. They are usually tuned automatically in open-source packages,^{31,32} such as TyXe^{32} used in this paper.

p(θ, m) ∝ exp(−U(θ) − ½mᵀm), U(θ) = −ln p(θ|D), | (9) |

m ← m − (ε/2)∇_θU(θ), θ ← θ + εm, m ← m − (ε/2)∇_θU(θ), | (10) |

α = min(1, exp[U(θ) + ½mᵀm − U(θ*) − ½m*ᵀm*]). | (11) |
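The leapfrog-plus-acceptance loop described above can be illustrated on a toy one-dimensional target. The sampler below is our construction, targeting a standard Gaussian, i.e., U(θ) = θ²/2; L and ε are fixed here, whereas packages such as TyXe tune them automatically.

```python
import numpy as np

# Toy HMC sampler (our construction) for potential U(theta) = theta^2 / 2,
# i.e., a standard Gaussian target. Each iteration: resample momentum,
# run L leapfrog steps of size eps, then Metropolis-accept the proposal.
def grad_U(theta):
    return theta  # dU/dtheta for U = theta^2 / 2

def hmc(n_samples, L=20, eps=0.1, seed=0):
    rng = np.random.default_rng(seed)
    theta, out = 0.0, []
    for _ in range(n_samples):
        m = rng.standard_normal()          # momentum ~ N(0, 1)
        th, mm = theta, m
        for _ in range(L):                 # leapfrog integration
            mm -= 0.5 * eps * grad_U(th)
            th += eps * mm
            mm -= 0.5 * eps * grad_U(th)
        H_old = 0.5 * theta**2 + 0.5 * m**2
        H_new = 0.5 * th**2 + 0.5 * mm**2
        if rng.random() < min(1.0, np.exp(H_old - H_new)):
            theta = th                     # accept the proposal
        out.append(theta)
    return np.array(out)

samples = hmc(5000)
# Sample mean close to 0 and variance close to 1 for this target
```

Because leapfrog integration nearly conserves H for this quadratic potential, the acceptance rate is close to one.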

2.2.2 Variational inference.
Rather than sampling a Markov chain, the variational inference (VI) approach approximates the true posterior distribution of θ, p(θ|D), with another parameterized probability distribution (usually Gaussian) q(θ|Θ) that is much easier to sample. The goal is to minimize the Kullback–Leibler (KL) divergence between q(θ|Θ) and p(θ|D), with Θ as the optimization parameters. The optimal Θ is

The first term of eqn (12e) is a regularization term, preventing the approximating distribution q(θ|Θ) from straying too far from the prior distribution p(θ). The second term of eqn (12e) is the data reconstruction term. The negation of the cost function of eqn (12e)

is often referred to as the Evidence Lower Bound (ELBO),^{33} since it serves as a lower bound for the log marginal data probability ln p(D). The advantage of the VI approach, as will be shown in later sections, is that the optimization process is extremely fast. For the B-CRNN, the posterior distribution of each stoichiometric coefficient/rate constant/kinetic parameter is assumed to be individually Gaussian. Θ thus contains the means and variances of the Gaussian distributions. The mean-field approximation^{34,35} is applied here, i.e., the Gaussian distributions of the parameters are independent of one another. Bayes by Backprop^{34} (BBB) is used to optimize Θ

Eqn (14a) is a one-sample approximation to eqn (12e). Thus, BBB can be seen as a stochastic gradient descent optimization of the VI problem in eqn (12a). After optimization, samples of θ can be easily generated from

since the distribution is Gaussian. Further statistical analysis can be performed based on these samples.

Θ* = argmin_Θ KL[q(θ|Θ) ∥ p(θ|D)] | (12a) |

= argmin_Θ ∫ q(θ|Θ) ln [q(θ|Θ)/p(θ|D)] dθ | (12b) |

= argmin_Θ ∫ q(θ|Θ) ln [q(θ|Θ)p(D)/(p(D|θ)p(θ))] dθ | (12c) |

= argmin_Θ ∫ q(θ|Θ) ln [q(θ|Θ)/p(θ)] dθ − ∫ q(θ|Θ) ln p(D|θ) dθ + ln p(D) | (12d) |

= argmin_Θ KL[q(θ|Θ) ∥ p(θ)] − E_{q(θ|Θ)}[ln p(D|θ)]. | (12e) |

ELBO(Θ) = E_{q(θ|Θ)}[ln p(D|θ)] − KL[q(θ|Θ) ∥ p(θ)] | (13) |

F(θ, Θ) = ln q(θ|Θ) − ln p(θ) − ln p(D|θ), | (14a) |

θ ∼ q(θ|Θ). | (14b) |

θ ∼ q(θ|Θ*), | (15) |
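The relationship between the one-sample objective of eqn (14a) and the negative ELBO can be checked numerically. The sketch below uses a toy setup of our choosing, not a system from the paper: prior N(0, 1), a single datum with unit-variance Gaussian likelihood, and q = N(μ, s²). Averaging the one-sample objective over many reparameterized draws reproduces the closed-form negative ELBO.

```python
import numpy as np

# Numerical check (toy setup of our choosing) that the one-sample objective
# F(theta) = ln q(theta|Theta) - ln p(theta) - ln p(D|theta)
# averages to the negative ELBO. Prior p(theta) = N(0, 1); one datum y with
# likelihood N(theta, 1); variational posterior q = N(mu, s^2).
rng = np.random.default_rng(0)
y, mu, s = 1.5, 0.8, 0.5

def log_normal(x, m, sd):
    return -0.5 * np.log(2 * np.pi * sd**2) - 0.5 * (x - m) ** 2 / sd**2

def F(theta):
    return (log_normal(theta, mu, s)       # ln q(theta|Theta)
            - log_normal(theta, 0.0, 1.0)  # - ln p(theta)
            - log_normal(y, theta, 1.0))   # - ln p(D|theta)

# Reparameterized draws theta = mu + s * eps, eps ~ N(0, 1)
thetas = mu + s * rng.standard_normal(200_000)
mc_neg_elbo = F(thetas).mean()

# Closed-form negative ELBO: KL(q || prior) - E_q[ln p(D|theta)]
kl = 0.5 * (s**2 + mu**2 - 1.0 - np.log(s**2))
exp_loglik = -0.5 * np.log(2 * np.pi) - 0.5 * ((y - mu) ** 2 + s**2)
neg_elbo = kl - exp_loglik
# mc_neg_elbo approximately equals neg_elbo
```

In BBB the same reparameterization makes the one-sample estimate differentiable with respect to μ and s, which is what allows stochastic gradient descent on Θ.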

(16) |

Fig. 4 shows the uncertainty quantification of the rate constants from the VI method. The identified means are close to the ground truth (except k_{3}), and uncertainty intervals for each of the constants are given.

TG + ROH → DG + ME, | (17a) |

DG + ROH → MG + ME, | (17b) |

MG + ROH → GL + ME. | (17c) |

First, we study the scenario where the information is complete, i.e., the concentration trajectories of all species are known. The VI uncertainty quantification results under different measurement noise levels are shown in Fig. 5. Fig. 5(a) contains the uncertainty intervals for the concentration trajectories of TG and DG with 1%, 5%, and 9% Gaussian noise. As the noise level increases, the uncertainty intervals widen, as expected. Fig. 5(b) and (c) show the calculated distributions of the stoichiometric parameter of TG and the logarithm of the activation energy of reaction eqn (17a). As the noise level increases, these distributions also become more spread out.

Fig. 5 (a) The uncertainty intervals for TG and DG under Gaussian noise level 1%, 5%, and 9% (from left to right) at the same initial conditions. Blue dots are noisy measurements. (b) The calculated distribution of the stoichiometric parameter of TG in eqn (17a) (W_{11} stands for the element in row 1 and column 1 in W). Red dashed line indicates the true value. (c) The calculated distribution of the activation energy of eqn (17a). Red dashed line indicates the true value. The initial conditions for the shown case are [1.98, 1.93, 0, 0, 0, 0]. |

The VI uncertainty quantification results in the scenario with a missing species are shown in Fig. 6. In this scenario, we assumed that the concentration measurements of the intermediate species DG were completely missing.^{20} Physically, the stoichiometric coefficients of an intermediate species should be neither all positive nor all negative. Therefore, such constraints were implemented during the training of the CRNN. The results in Fig. 6 show that the B-CRNN can still provide high-quality uncertainty estimates when the intermediate species is missing. However, since the constraint on the weight/solution space is weaker due to the missing information, we observe enlarged uncertainty intervals compared with when the information is complete. This is most pronounced for the missing species DG.
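One way to encode the sign constraint on intermediate species is a soft penalty added to the training loss. The sketch below is our construction, not necessarily the paper's implementation: the penalty vanishes once the species has at least one producing and one consuming reaction.

```python
import numpy as np

# Soft penalty (our construction) encoding the constraint that an
# intermediate species must be both produced (some positive stoichiometric
# coefficient) and consumed (some negative one) across the reactions.
def intermediate_sign_penalty(coeffs, margin=0.1):
    """Zero once at least one coefficient exceeds +margin and one falls
    below -margin; positive otherwise, so it can be added to the loss."""
    strongest_production = max(coeffs.max(), 0.0)
    strongest_consumption = max(-coeffs.min(), 0.0)
    return (max(0.0, margin - strongest_production)
            + max(0.0, margin - strongest_consumption))

mixed = intermediate_sign_penalty(np.array([1.0, -1.0, 0.0]))   # satisfied
all_pos = intermediate_sign_penalty(np.array([1.0, 0.5, 0.2]))  # violated
```

A hard alternative would be to reject candidate weight matrices outright, but a soft penalty keeps the loss differentiable for gradient-based training.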

(18) |

(19) |

A total of 30 cases with 5% Gaussian noise were generated. The initial conditions of E and S were randomly sampled from [1, 9] and [2, 10], respectively. The reaction rate constants are k_{f} = 2.5, k_{r} = 1.0, and k_{cat} = 1.5. We further assumed at most 2 reversible reactions and 4 species. The uncertainty quantification results are shown in Fig. 7(a). The identification results can be extracted as two reactions

(20) |

Fig. 7 (a) The predictions and uncertainty quantification of the reversible enzyme–substrate reactions. The shaded regions indicate 4 standard deviations. (b) The comparison between the CRNN with (solid lines) and without (dashed lines) considering eqn (19) when modeling a hypothetical reversible reaction. Dots indicate the synthetic measurements with 5% Gaussian noise. The initial conditions for the shown cases are [6.46, 7.57, 0, 0] (a) and [1.78, 4.68, 0] (b). |

Activation

De-activation

(21) |

In this case, several species act as catalysts in different reactions. The reactions of these catalysts do not strictly follow the form in eqn (5) (which would require the corresponding entries of W_{in} and W_{out} to be exact opposites), since the catalysts affect the reaction rates but are not consumed. Consequently, we introduced a channel selection weight matrix W_{c}, the elements of which are either 0 or 1.
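The effect of such a channel-selection mask can be sketched as follows (our construction, with a hypothetical reaction S → P catalyzed by Cat): a zero in the mask removes the catalyst's net production, while the catalyst concentration still enters the rate law.

```python
import numpy as np

# Sketch (our construction) of a binary channel-selection mask for a
# catalytic species: the catalyst enters the rate law through the input
# weights, but its net production is zeroed out, so it is never consumed.
# Hypothetical reaction S -> P catalyzed by Cat; species order [S, P, Cat].
w_in = np.array([1.0, 0.0, 1.0])        # rate r = [S] * [Cat]
w_out_raw = np.array([-1.0, 1.0, 1.0])  # unmasked output weights
w_c = np.array([1.0, 1.0, 0.0])         # 0 masks the catalyst channel
w_out = w_c * w_out_raw                 # -> [-1, 1, 0]

Y = np.array([2.0, 0.0, 0.5])           # [S], [P], [Cat]
r = np.exp(w_in @ np.log(np.maximum(Y, 1e-30)))  # r = 2 * 0.5 = 1
dYdt = w_out * r
# dYdt = [-1, 1, 0]: the catalyst speeds up the reaction but is unchanged
```

The mask is applied multiplicatively to the output weights, so the learned rate law is untouched while the catalyst's concentration stays constant.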

(22) |

Fig. 8 Selected species profiles for the catalyst reaction case. The shaded regions indicate 4 standard deviations. |

Fig. 9 Uncertainty quantification comparisons between the CRNN and an MLP (without physical laws embedded). Measurements contain 5% Gaussian noise. |

It is worth mentioning that there is literature^{40–42} on the more general topic of the Bayesian NODE (similar to the MLP case shown above). However, the B-CRNN is still valuable since it is dedicated to the modeling of chemical reaction systems, with domain-specific physical laws embedded, eventually leading to better interpretability and tighter uncertainty bounds.

Fig. 10 The B-CRNN results on the same case as shown in Fig. 3, but with missing measurements in the [2, 28] s time span. The shaded regions indicate 4 standard deviations. |

Suppose that we have a collection of possible models {M_i}. The predictive posterior distribution of the concentration trajectory ỹ, given an unseen initial condition ỹ₀, is

p(ỹ|D, ỹ₀) = Σ_i p(ỹ|M_i, D, ỹ₀)p(M_i|D), | (23) |

p(M_i|D) ∝ p(D|M_i)p(M_i). | (24) |
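Sampling from such a mixed predictive distribution can be sketched as follows (the two component models, their Gaussian predictive distributions, and their posterior weights below are all hypothetical): pick a model according to its posterior probability, then draw from that model's predictive distribution.

```python
import numpy as np

# Sketch of sampling from a mixture of model-specific predictive
# distributions (Bayesian model averaging). The two component models,
# their Gaussian predictives, and the posterior weights are hypothetical.
rng = np.random.default_rng(0)
weights = np.array([0.7, 0.3])   # posterior model probabilities, sum to 1
means = np.array([1.0, 2.0])     # each model's predictive mean
stds = np.array([0.1, 0.2])      # each model's predictive std

# Draw a model index by weight, then sample that model's prediction
idx = rng.choice(2, size=100_000, p=weights)
samples = rng.normal(means[idx], stds[idx])
# Law of total expectation: E[y] = 0.7 * 1.0 + 0.3 * 2.0 = 1.3
```

The resulting sample set reflects both within-model uncertainty (each component's spread) and between-model uncertainty (the separation of the component means).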

Fig. 12 Comparison between UQ on all parameters and UQ on kinetic parameters only for the biodiesel system. The initial conditions are the same as in Fig. 6. Blue dots are the synthetic measurements with 5% Gaussian noise. Red and green curves mark the uncertainty intervals (4 standard deviations) from the B-CRNN performing UQ on all parameters and on kinetic parameters only, respectively. The dashed lines are the corresponding mean predictions. When performing UQ on kinetic parameters only, since the number of uncertain variables decreases significantly (from 24 to 6 in this case), the uncertainty intervals become tighter. |

Generally speaking, there are no theoretical hurdles preventing the B-CRNN from being applied to larger-scale, more complex chemical systems across disciplines, such as combustion and biochemical systems. One potential issue is that training the CRNN could be difficult because of the stiffness commonly encountered in combustion systems. Another is that the amount of data needed grows as the considered system grows larger, which may reduce the training speed and raise the question of data availability (e.g., in biochemical systems). The former could be addressed by incorporating more advanced neural network training schemes.^{44,45} The latter may require integrating the B-CRNN with active learning for efficient experimental design. Nonetheless, the B-CRNN framework opens new opportunities for applying physics-informed machine learning techniques to chemical systems for model inference and uncertainty quantification.

Fig. 14 The training loss history and the moving average (over 100 adjacent data points) of the VI for the case in Section 3.1. |

- A. Y. Poludnenko, J. Chambers, K. Ahmed, V. N. Gamezo and B. D. Taylor, Science, 2019, 366, eaau7365.
- L. Yin, Y. Jia, X. Guo, D. Chen and Z. Jin, Int. J. Heat Mass Transfer, 2019, 133, 129–136.
- B. Sanchez, J.-L. Santiago, A. Martilli, M. Palacios and F. Kirchner, Atmos. Chem. Phys., 2016, 16, 12143–12157.
- Y. Shi, H.-W. Ge and R. D. Reitz, Computational Optimization of Internal Combustion Engines, Springer Science & Business Media, 2011.
- C. W. Gao, J. W. Allen, W. H. Green and R. H. West, Comput. Phys. Commun., 2016, 203, 212–225.
- T. Lu and C. K. Law, Prog. Energy Combust. Sci., 2009, 35, 192–215.
- M. Aldén, Proc. Combust. Inst., 2022, DOI: 10.1016/j.proci.2022.06.020.
- M. Schmidt and H. Lipson, Science, 2009, 324, 81–85.
- S. H. Rudy, S. L. Brunton, J. L. Proctor and J. N. Kutz, Sci. Adv., 2017, 3, e1602614.
- K. Champion, B. Lusch, J. N. Kutz and S. L. Brunton, Proc. Natl. Acad. Sci. U. S. A., 2019, 116, 22445–22451.
- M. Frenklach, A. Packard, P. Seiler and R. Feeley, Int. J. Chem. Kinet., 2004, 36, 57–66.
- D. A. Sheen and H. Wang, Combust. Flame, 2011, 158, 2358–2374.
- T. Turányi, T. Nagy, I. G. Zsély, M. Cserháti, T. Varga, B. T. Szabó, I. Sedyo, P. T. Kiss, A. Zempleni and H. J. Curran, Int. J. Chem. Kinet., 2012, 44, 284–302.
- K. Braman, T. A. Oliver and V. Raman, Combust. Theory Modell., 2013, 17, 858–887.
- N. M. Mangan, S. L. Brunton, J. L. Proctor and J. N. Kutz, IEEE Trans. Mol., Biol. Multi-Scale Commun., 2016, 2, 52–63.
- M. Hoffmann, C. Fröhner and F. Noé, J. Chem. Phys., 2019, 150, 025101.
- W. E, Commun. Comput. Phys., 2020, 28, 1639–1670.
- J. Devlin, M.-W. Chang, K. Lee and K. Toutanova, arXiv, 2018, arXiv:1810.04805.
- R. Ranade, S. Alqahtani, A. Farooq and T. Echekki, Fuel, 2019, 251, 276–284.
- W. Ji and S. Deng, J. Phys. Chem. A, 2021, 125, 1082–1092.
- W. Ji, F. Richter, M. J. Gollner and S. Deng, Combust. Flame, 2022, 240, 111992.
- H. Wang and D. A. Sheen, Prog. Energy Combust. Sci., 2015, 47, 1–31.
- J. Wang, Z. Zhou, K. Lin, C. K. Law and B. Yang, Combust. Flame, 2020, 213, 87–97.
- J. A. Miller, R. Sivaramakrishnan, Y. Tao, C. F. Goldsmith, M. P. Burke, A. W. Jasper, N. Hansen, N. J. Labbe, P. Glarborg and J. Zádor, Prog. Energy Combust. Sci., 2021, 83, 100886.
- J. Badra, A. Elwardany and A. Farooq, Proc. Combust. Inst., 2015, 35, 189–196.
- M. Frenklach, Proc. Combust. Inst., 2007, 31, 125–140.
- W. Ji, J. Wang, O. Zahm, Y. M. Marzouk, B. Yang, Z. Ren and C. K. Law, Combust. Flame, 2018, 190, 146–157.
- W. Ji, Z. Ren, Y. Marzouk and C. K. Law, Proc. Combust. Inst., 2019, 37, 2175–2182.
- R. T. Q. Chen, Y. Rubanova, J. Bettencourt and D. Duvenaud, Adv. Neural Inf. Process. Syst., 2018, 31.
- S. Brooks, A. Gelman, G. Jones and X.-L. Meng, Handbook of Markov Chain Monte Carlo, CRC Press, 2011.
- E. Bingham, J. P. Chen, M. Jankowiak, F. Obermeyer, N. Pradhan, T. Karaletsos, R. Singh, P. A. Szerlip, P. Horsfall and N. D. Goodman, J. Mach. Learn. Res., 2019, 20, 28:1–28:6.
- H. Ritter and T. Karaletsos, Proc. Mach. Learn. Syst., 2022, 398–413.
- D. P. Kingma and M. Welling, arXiv, 2013, arXiv:1312.6114.
- C. Blundell, J. Cornebise, K. Kavukcuoglu and D. Wierstra, International Conference on Machine Learning, 2015, pp. 1613–1622.
- L. Yang, X. Meng and G. E. Karniadakis, J. Comput. Phys., 2021, 425, 109913.
- D. P. Searson, M. J. Willis and A. Wright, Reverse Engineering Chemical Reaction Networks from Time Series Data, John Wiley & Sons, Ltd, 2012, ch. 12, pp. 327–348.
- K. P. Murphy, Probabilistic Machine Learning: Advanced Topics, MIT Press, 2023.
- D. Darnoko and M. Cheryan, J. Am. Oil Chem. Soc., 2000, 77, 1263–1267.
- J. Xing, D. D. Ginty and M. E. Greenberg, Science, 1996, 273, 959–963.
- E. De Brouwer, J. Simm, A. Arany and Y. Moreau, Proceedings of the 33rd International Conference on Neural Information Processing Systems, 2019, pp. 7379–7390.
- R. Dandekar, V. Dixit, M. Tarek, A. Garcia-Valadez and C. Rackauckas, arXiv, 2020, arXiv:2012.07244.
- M. A. Bhouri and P. Perdikaris, Philos. Trans. R. Soc., A, 2022, 380, 20210201.
- G. Yan, H. Sun and H. Waisman, Comput. Struct., 2015, 152, 27–44.
- Z. Yao, A. Gholami, S. Shen, K. Keutzer and M. W. Mahoney, AAAI, 2021.
- S. Kim, W. Ji, S. Deng, Y. Ma and C. Rackauckas, Chaos, 2021, 31, 093122.

This journal is © the Owner Societies 2023 |