Open Access Article
Felix A. Döppel a and Martin Votsmeier *ab
aTechnical University of Darmstadt, Peter-Grünberg-Straße 8, 64287 Darmstadt, Germany. E-mail: martin.votsmeier@tu-darmstadt.de
bUmicore AG & Co. KG, Rodenbacher Chaussee 4, 63457 Hanau, Germany
First published on 11th July 2023
We propose a new modeling strategy to build efficient neural network representations of chemical kinetics. Instead of fitting the logarithm of rates, we embed the hyperbolic sine function into neural networks and fit the actual rates. We demonstrate this approach on two detailed surface mechanisms: the preferential oxidation of CO in the presence of H2 and ammonia oxidation under industrially relevant conditions of the Ostwald process. Implementing the surrogate models into reactor simulations yields accurate results with a speed-up of 100 000. Overall, the approach proposed in this work will significantly facilitate the application of detailed mechanistic knowledge to the simulation-based design of realistic catalytic systems. We foresee that combining this approach with neural ordinary differential equations will allow building machine learning representations of chemical kinetics directly from experimental data.
In contrast to the logarithm, logarithm-like functions such as the inverse hyperbolic sine asinh(x) are not limited to positive inputs but can process negative and zero values in a meaningful way. As shown in Fig. 1, asinh(x) behaves logarithmically for values |x| ≫ 1, while it is linear in the interval −1 < x < 1. This function is commonly used to analyze financial data when zero or negative values occur.38–40 Like economic data, the net rates of chemical kinetics usually span many orders of magnitude and assume positive and negative values as well as zero.
Fig. 1 Plot of the asinh(x) function, which approximates the logarithm of 2|x| (up to sign) for large positive and negative inputs while being linear near the origin.
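The limiting behavior is easy to verify numerically. The following short NumPy snippet (ours, not part of the original article) compares asinh(x) with sgn(x)·ln(2|x|) for large inputs and shows the linear behavior near the origin:

```python
import numpy as np

x = np.array([-1e6, -1e2, -1.0, -1e-3, 0.0, 1e-3, 1.0, 1e2, 1e6])
asinh_x = np.arcsinh(x)
# For |x| >> 1, asinh(x) approaches sign(x)*ln(2|x|); near zero it approaches x itself.
log_like = np.sign(x) * np.log(2.0 * np.abs(x) + (x == 0))   # the (x == 0) term guards x = 0
for xi, a, l in zip(x, asinh_x, log_like):
    print(f"x = {xi:>10.3g}   asinh(x) = {a:>9.4f}   sgn(x)*ln(2|x|) = {l:>9.4f}")
```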
1. We transform the rates using the logarithm-like function asinh(x), which can be applied to negative values and zero. This is crucial for modeling systems of practical interest, e.g. when they include intermediate species.
2. We work with latent (hidden) representations of the transformed data. This means we embed the data transformation directly into the model instead of applying it as a conventional preprocessing step, see Fig. 2. This allows minimizing meaningful error metrics such as the relative error of reaction rates while preserving the advantages of data transformation. In other words, we avoid spending model capacity on regions that are not important for the application in reactor simulations.
With this setup, neural networks can accurately model wide-range, sign-changing data such as chemical kinetics. No prior knowledge about the reaction mechanism is required, paving the way for learning kinetics directly from experimental data or highly detailed first-principles simulations. The approach is validated by reactor simulations. The preferential oxidation of CO in the presence of H2 is simulated in a plug-flow reactor, showing a speed-up of 100 000 when using neural networks instead of solving the full mechanism. Further, we model ammonia oxidation under conditions of the Ostwald process.
Reaction rates rj (s−1) are calculated as
[eqn (1)]
[eqn (2)]
[eqn (3)]
For each reaction condition, given by a temperature and the partial pressures of CO, CO2, H2, H2O and O2, steady state surface coverages are calculated. This is done by integrating eqn (4) in time until dθl/dt = 0. Gas composition and temperature are assumed to be constant during this process. The obtained surface coverages are used in eqn (5) to calculate steady state source terms ṡi.29,42 Numerically, integration is performed using the DASPK solver43 with an integration time of 10^7 s, a relative tolerance of 10^−6 and an absolute tolerance of 10^−50.
[eqn (4)]
[eqn (5)]
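The specific rate expressions of eqn (1)–(5) are not reproduced above. As a rough sketch of the described workflow (integrate the coverage ODE at fixed gas composition and temperature until steady state, then evaluate the gas-phase source terms), the following Python code uses SciPy's solve_ivp on a generic mean-field model. The stoichiometric matrices `nu_surf`/`nu_gas` and the function `rates` are hypothetical placeholders for the mechanism-specific expressions; the integration time and tolerances mirror the settings quoted in the text.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Hypothetical placeholders; the actual mechanism defines these quantities.
n_surf, n_gas, n_rxn = 9, 5, 36           # dimensions of the PROX mechanism
nu_surf = np.zeros((n_surf, n_rxn))       # stoichiometric coefficients, surface species
nu_gas = np.zeros((n_gas, n_rxn))         # stoichiometric coefficients, gas species

def rates(theta, T, p):
    """Reaction rates r_j in 1/s for coverages theta, temperature T and partial
    pressures p. The real expressions correspond to eqn (1)-(3) and are not shown."""
    return np.zeros(n_rxn)                # placeholder

def coverage_ode(t, theta, T, p):
    # Schematic mean-field form of the coverage balance integrated as eqn (4)
    return nu_surf @ rates(theta, T, p)

def steady_state_source_terms(theta0, T, p):
    """Integrate the coverages to steady state at fixed T and p, then evaluate
    the gas-phase source terms (schematic stand-in for eqn (5))."""
    sol = solve_ivp(coverage_ode, (0.0, 1e7), theta0, args=(T, p),
                    method="BDF", rtol=1e-6, atol=1e-50)
    theta_ss = sol.y[:, -1]
    return nu_gas @ rates(theta_ss, T, p)
```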
Table 1 Input range of the reaction conditions for the preferential oxidation test case

| Quantity | Unit | Minimum | Maximum | Scaling |
|---|---|---|---|---|
| T | K | 280 | 600 | Reciprocal |
| p(H2) | atm | 8 × 10^−2 | 8 × 10^−1 | Logarithmic |
| p(O2) | atm | 1 × 10^−7 | 4 × 10^−2 | Logarithmic |
| p(H2O) | atm | 4 × 10^−2 | 4 × 10^−1 | Logarithmic |
| p(CO) | atm | 1 × 10^−7 | 4 × 10^−2 | Logarithmic |
| p(CO2) | atm | 4 × 10^−2 | 4 × 10^−1 | Logarithmic |
[eqn (6)]
A total pressure of 1 atm, a site concentration cPt of 26.3 mol m−3, a reactor length of 1 m divided into 200 cells and a gas velocity of 1 m s−1 are used. The resulting residence time τ is 1 s. The feed consists of 40% H2, 1% O2, 10% H2O, 1% CO and 10% CO2, with N2 as the balance species.
If conditions outside the input range defined in Table 1 occur, they are set to the corresponding minimum or maximum values to avoid extrapolation of the neural network models.
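To illustrate how the surrogate models are used inside the reactor model, the sketch below integrates an isothermal plug-flow balance over 200 cells with the settings quoted above and clips the inputs to the training range of Table 1 before every network evaluation. The balance equation and the simple explicit cell-by-cell update are our assumptions (eqn (6) is not reproduced above), and `nn_source_terms` is a hypothetical stand-in for the trained networks plus atom-balance closure.

```python
import numpy as np

# Reactor settings quoted in the text
L, n_cells, u = 1.0, 200, 1.0              # reactor length (m), cells, velocity (m/s)
c_pt = 26.3                                # site concentration (mol/m^3)
p_tot, R = 101325.0, 8.314                 # total pressure (Pa), gas constant

# Training range of Table 1 in atm, species order (H2, O2, H2O, CO, CO2)
p_min = np.array([8e-2, 1e-7, 4e-2, 1e-7, 4e-2])
p_max = np.array([8e-1, 4e-2, 4e-1, 4e-2, 4e-1])

def nn_source_terms(T, p_atm):
    """Placeholder for the trained neural networks plus atom-balance closure;
    returns per-site source terms (1/s) for (H2, O2, H2O, CO, CO2)."""
    return np.zeros(5)

def simulate_pfr(T, x_feed):
    """March the assumed isothermal plug-flow balance u*dc/dz = c_pt*s(T, p)
    cell by cell; x_feed holds feed mole fractions (H2, O2, H2O, CO, CO2, N2)."""
    dz = L / n_cells
    c = x_feed * p_tot / (R * T)                                  # mol/m^3
    profile = [c.copy()]
    for _ in range(n_cells):
        p_atm = np.clip(c[:5] * R * T / 101325.0, p_min, p_max)   # avoid extrapolation
        c[:5] += c_pt * nn_source_terms(T, p_atm) * dz / u        # update reactive species
        profile.append(c.copy())
    return np.array(profile)

# Feed from section 2.1.3: 40% H2, 1% O2, 10% H2O, 1% CO, 10% CO2, balance N2
profile = simulate_pfr(T=473.0, x_feed=np.array([0.40, 0.01, 0.10, 0.01, 0.10, 0.38]))
```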
Reaction rates rj (s−1) are calculated as
[eqn (7)]
For each reaction condition given by a temperature and the partial pressures of NH3, O2, H2O, NO, N2O and N2, steady state surface coverages are calculated. This is done by integrating eqn (8) in time until dθl/dt = 0. Gas composition and temperature are assumed to be constant during this process. The obtained surface coverages are used in eqn (9) to calculate steady state source terms ṡi (mol m−2 s−1) using the site density Γ which is assumed to be 2.3 × 10−5 mol m−2.
[eqn (8)]
[eqn (9)]
Numerically, integration is performed using MATLAB's ode15s solver45 with an integration time of 10^15 s, a relative tolerance of 10^−8 and an absolute tolerance of 10^−50.
The rate constants for surface reactions kj^surf are calculated as
[eqn (10)]
[eqn (11)]
[eqn (12)]
[eqn (13)]
[eqn (14)]
[eqn (15)]
[eqn (16)]
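The specific expressions of eqn (10)–(16) are not reproduced above. Purely as a generic illustration of the kind of temperature dependence involved, a plain Arrhenius rate constant can be written as follows; the actual mechanism may additionally include temperature exponents, coverage dependencies and sticking coefficients.

```python
import numpy as np

R = 8.314  # J mol^-1 K^-1

def k_arrhenius(A, Ea, T):
    """Generic Arrhenius rate constant with pre-exponential A and activation
    energy Ea (J/mol). Illustration only, not the article's eqn (10)-(16)."""
    return A * np.exp(-Ea / (R * T))
```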
We chose this mechanism because, in contrast to the simpler ammonia oxidation mechanisms considered in our earlier works,15,26,28,30 it is more detailed and does not neglect the consumption of several gas species. In consequence, all species except NH3, H2O and N2 show source terms that change sign in the range of reaction conditions considered. It is therefore not possible to model only strictly positive source terms and compute all other species source terms from the atom balance. Rather, at least one species with sign-changing source terms has to be modeled for use in a reactor simulation. We focus on predicting the source terms of NH3, N2 and N2O.
Table 2 Input range of the reaction conditions for the ammonia oxidation test case

| Quantity | Unit | Minimum | Maximum | Scaling |
|---|---|---|---|---|
| T | K | 1 × 10^3 | 1.3 × 10^3 | Reciprocal |
| p(NH3) | Pa | 0.5 | 6 × 10^4 | Logarithmic |
| p(O2) | Pa | 1.25 × 10^4 | 1 × 10^5 | Logarithmic |
| p(H2O) | Pa | 0.5 | 9 × 10^4 | Logarithmic |
| p(NO) | Pa | 0.5 | 6 × 10^4 | Logarithmic |
| p(N2O) | Pa | 0.5 | 3 × 10^3 | Logarithmic |
xT = (T/1 K)^−1    (17)
xp,i = log(pi/1 atm)    (18)
[eqn (19)]
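A minimal sketch of this preprocessing is given below. The reciprocal temperature and logarithmic pressure scaling follow eqn (17) and (18); the final min–max scaling to [−1, 1] is our assumption, since the exact form of eqn (19) is not reproduced above.

```python
import numpy as np

def preprocess(T, p, T_bounds, p_bounds):
    """Reciprocal temperature (eqn (17)) and logarithmic pressures (eqn (18)),
    followed by an assumed min-max scaling to [-1, 1] as a stand-in for eqn (19).
    T_bounds = (T_min, T_max); p_bounds has one (min, max) row per species."""
    x = np.concatenate(([1.0 / T], np.log(p)))
    lo = np.concatenate(([1.0 / T_bounds[1]], np.log(p_bounds[:, 0])))
    hi = np.concatenate(([1.0 / T_bounds[0]], np.log(p_bounds[:, 1])))
    return 2.0 * (x - lo) / (hi - lo) - 1.0
```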
The preprocessed thermo-chemical state is fed to the hidden layer(s), which are fully connected and use tanh activation. For each key species (CO and O2 for test case 1; NH3, N2O and N2 for test case 2) we train a separate neural network with a single output node that contains a latent representation of the transformed source terms y = asinh(ṡ/z), see eqn (20). This node uses the inverse of this function, z·sinh(y), as activation to restore outputs in the form of the original source term target values ṡ. The only parameters to be learned are the weights in and out of the hidden layer(s) and, optionally, the parameter z of the sinh activation. To mimic the behavior of the well-known logarithmic transformation, we choose the parameter z such that all modeled rates lie within the logarithmic part of the function, i.e. we assign it the smallest absolute source term occurring in the training data.
y = asinh(ṡ/z), i.e. ṡ = z·sinh(y)    (20)
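A minimal PyTorch sketch of such a latent-transformation network is shown below; the framework, layer sizes, loss form (a root-mean-square relative error is used as an example) and optimizer settings are our choices, not necessarily those of the article. The output node applies z·sinh(y), so training minimizes an error metric of the untransformed source terms directly; z is initialized with the smallest absolute source term of the training data and can optionally be left trainable.

```python
import torch
import torch.nn as nn

class LatentAsinhNet(nn.Module):
    """Feed-forward network with tanh hidden layer(s) and a z*sinh(y) output
    activation, so the last node holds a latent asinh-transformed source term."""

    def __init__(self, n_inputs, n_hidden=63, z_init=1e-10, train_z=True):
        super().__init__()
        self.hidden = nn.Sequential(nn.Linear(n_inputs, n_hidden), nn.Tanh())
        self.out = nn.Linear(n_hidden, 1)
        # z is stored as log(z) to keep it positive; initialize it e.g. with the
        # smallest absolute source term in the training data
        self.log_z = nn.Parameter(torch.log(torch.tensor(float(z_init))),
                                  requires_grad=train_z)

    def forward(self, x):
        y = self.out(self.hidden(x))                   # latent y ~ asinh(s_dot / z)
        return torch.exp(self.log_z) * torch.sinh(y)   # back-transform to s_dot

def relative_rmse(pred, target):
    # Example training objective on the untransformed source terms
    return torch.sqrt(torch.mean(((pred - target) / target) ** 2))

# Sketch of a single training step (data pipeline and schedules omitted)
model = LatentAsinhNet(n_inputs=6)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
x_batch, s_batch = torch.randn(32, 6), torch.randn(32, 1)   # dummy data
optimizer.zero_grad()
loss = relative_rmse(model(x_batch), s_batch)
loss.backward()
optimizer.step()
```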
When alternatives to the hyperbolic sine function are discussed, the output activation is replaced by either z·nl^−1(y), gpow^−1(y, n) or exp(y).
In contrast to the latent transformation approach, the conventional approach computes transformed target values in a preprocessing step (asinh(ṡ/z) in our case) and uses a standard fully connected neural network to learn them. Consequently, during training the differences between exact and estimated transformed values, e.g. measured by the root-mean-square error of transformed values, are minimized instead of an error metric of interest such as the relative error of the source terms.
Direct modeling means dropping both the input transformations in eqn (17) and (18) and the output activation; the original steady state source terms are used directly as targets.
For test case 1 we model CO and O2 source terms. The other species are calculated as follows:
ṡ(CO2) = −ṡ(CO)    (21)
ṡ(H2) = 2·ṡ(O2) − ṡ(CO)    (22)
ṡ(H2O) = ṡ(CO) − 2·ṡ(O2)    (23)
For test case 2 we model NH3, N2 and N2O source terms. The other species are calculated as follows:
ṡ(O2) = (5/4)·ṡ(NH3) + (1/2)·ṡ(N2O) + ṡ(N2)    (24)
ṡ(H2O) = −(3/2)·ṡ(NH3)    (25)
ṡ(NO) = −ṡ(NH3) − 2·ṡ(N2O) − 2·ṡ(N2)    (26)
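In code, these atom-balance closures amount to a few lines. The sketch below follows directly from the element balances of the two gas mixtures; the function and variable names are ours.

```python
def close_atom_balance_prox(s_co, s_o2):
    """Remaining source terms of test case 1 from the modeled CO and O2 source
    terms via the C, H and O atom balances (cf. eqn (21)-(23))."""
    s_co2 = -s_co
    s_h2 = 2.0 * s_o2 - s_co
    s_h2o = s_co - 2.0 * s_o2
    return s_co2, s_h2, s_h2o

def close_atom_balance_ammonia(s_nh3, s_n2, s_n2o):
    """Remaining source terms of test case 2 from the modeled NH3, N2 and N2O
    source terms via the N, H and O atom balances (cf. eqn (24)-(26))."""
    s_o2 = 1.25 * s_nh3 + 0.5 * s_n2o + s_n2
    s_h2o = -1.5 * s_nh3
    s_no = -s_nh3 - 2.0 * s_n2o - 2.0 * s_n2
    return s_o2, s_h2o, s_no
```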
We generate 35 000 input–output pairs of reaction conditions and resulting steady state source terms for both test cases. The training set contains 25 000, the validation set 5000 and the test set another 5000 input–output pairs. Every input–output pair is contained in only one of the three data sets. The data for the preferential oxidation test case are identical to the ones used in our previous work.29
We do not perform extensive hyper-parameter tuning, as the focus of this work lies on the general modeling strategy for steady state source terms.
[eqn (27)]
The relative loss ℒrel (eqn (28)) is minimized when source terms are used as target data.
[eqn (28)]
The root mean squared absolute loss ℒabs (eqn (29)) of transformed values is minimized when using transformed source terms as targets.
ℒabs = [(1/N) Σk (ŷk − yk)^2]^(1/2)    (29)
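As a compact illustration of the two training objectives, the snippet below implements both losses in NumPy. The exact functional form of eqn (28) is not reproduced above, so a root-mean-square relative error is used as a stand-in; the second function follows the RMSE of transformed values described for eqn (29).

```python
import numpy as np

def loss_rel(s_pred, s_true):
    # Relative loss on the untransformed source terms (latent approach);
    # a root-mean-square variant is shown here as an example of eqn (28).
    return np.sqrt(np.mean(((s_pred - s_true) / s_true) ** 2))

def loss_abs_transformed(y_pred, s_true, z):
    # RMSE of asinh-transformed values (conventional approach, eqn (29));
    # y_pred is the network output in transformed space.
    return np.sqrt(np.mean((y_pred - np.arcsinh(s_true / z)) ** 2))
```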
2CO + O2 → 2CO2 (CO oxidation)
2H2 + O2 → 2H2O (H2 oxidation)
CO + H2O ⇄ CO2 + H2 (water-gas shift)
As the mechanism contains five gas phase species and three elements, at least 5 − 3 = 2 species source terms must be modeled to fully describe the reaction progress in the system. Analogous to the procedure in our previous work,29 we train a separate neural network for each of the two key species O2 and CO. The net production rates of all other species are derived via the atom balance. Therefore, the mass balance is always exactly closed.
The equilibrium of the CO and H2 oxidation reactions lies fully on the product side, so that the source term of O2 is negative under all relevant reaction conditions. Consequently, a logarithmic transformation can be applied to model the O2 source terms.14,15,17,19,27,29 As is typical for systems of practical relevance, other species in the mechanism (including CO) change the sign of their source term depending on the reaction conditions. For those, a logarithmic transformation cannot be applied in a meaningful way. We propose the latent hyperbolic sine transformation to overcome this limitation. Fig. 4a shows a histogram of the distribution of CO source term values, while Fig. 4b shows the same data on a logarithmic scale.
Fig. 4 Histogram of the CO source term distribution: (a) on a linear scale and (b) on a logarithmic scale.
Fig. 5 Comparing the CO source term prediction accuracy of different neural network training strategies on 5000 unseen test data points randomly sampled from the input range of Table 1. Using a standard feed-forward neural network without data transformation (“direct” modeling) does not yield accurate results. Using the asinh transformation conventionally, i.e. in a preprocessing step, reduces the prediction errors considerably. When the asinh transformation is implemented in a latent fashion, the models are even more accurate and yield application-ready predictions with relative errors below 1%. All models contain about 5000 parameters distributed over one (625 nodes), three (48 nodes each) or five hidden layers (34 nodes each) and were trained with 25 000 data points.
Fig. 6 shows how the prediction error of the latent transformation models scales with the number of parameters. For example, application-ready models with relative prediction errors of 1% can be obtained with 15 nodes in each of five hidden layers (≈1000 parameters) and less than 15 minutes of training time (see Fig. S1 in the ESI†). As neural networks are usually deployed with orders of magnitude more parameters and layers, all models used in this work can be considered small. Wan et al., for example, used about 180 000 parameters for modeling chemical kinetics.51
Fig. 6 Relative prediction error of CO source terms as a function of the total number of learnable parameters in a neural network using the latent hyperbolic sine transformation.
In our previous work we proposed another method for dealing with chemical source terms that change sign, based on approximating the rates of the rate-determining step.29 As shown in Fig. S4 in the ESI,† it yields more accurate models than the latent hyperbolic sine transformation proposed in this work. That accuracy is achieved by exploiting detailed insights from a reaction path analysis. Such an analysis, however, is not feasible when dealing with experimental data or highly complex computational models. In contrast, the latent hyperbolic sine transformation is designed to work without any prior knowledge about the mechanism and therefore constitutes the first method to obtain accurate and lightweight surrogate models of detailed surface kinetics when dealing with experimental data or highly complex computational models.
In summary, the latent hyperbolic sine transformation works well because of two major points: 1. The inverse hyperbolic sine transformation brings target data into a similar order of magnitude and leads to a more linear input–output relation. 2. Using the transformation in a latent fashion gives full control over the training objective while maintaining the advantages of data transformation.
Fig. 7 shows that the concentration profiles obtained from the neural network models (dotted lines) cannot be visually separated from the exact solution (full lines). The lower part of Fig. 7 shows that the relative difference between both solutions is about 1% or lower. Note, however, that calculating the neural network estimate of the source terms is approximately 50 000 times faster than evaluating the exact steady state kinetics, see Table S5 in the ESI.† Using a consumer grade graphics card for inference increases the speed-up to 100 000. As the obtained accuracy is well above that of the kinetic parameters, these results suggest that our method can be applied to much larger and more complex reaction mechanisms.
Fig. 7 Plug-flow reactor model of the preferential oxidation of CO in H2-rich environments at three different temperatures. The upper part shows CO and O2 molar fractions along the reactor length. The neural network solution (dotted lines) cannot be visually separated from the exact solution (full lines). The lower part shows the relative difference between both solutions. None of the conditions shown were part of the 25 000 training data points, which were randomly sampled from the input range of Table 1. Feed composition and other details are described in section 2.1.3.
In summary, the neural network models obtained with the latent hyperbolic sine transformation are well suited for replacing the computationally expensive steady state source term calculations associated with heterogeneous catalysis. They yield accurate solutions and speed up the calculations by a factor of 100 000.
Neural networks are used to predict the steady state source terms as a function of temperature and gas composition. The training data set covers all industrially relevant reaction conditions of a medium pressure ammonia oxidation reactor and contains 25 000 samples. As the mechanism contains six gas phase species and three elements, at least 6 − 3 = 3 species source terms (e.g. NH3, N2 and N2O) must be modeled to fully describe the reaction progress in the system.
Since ammonia is burned at high temperatures, NH3 source terms are negative for all training conditions. N2 shows only positive source terms because it is the thermodynamically favored product. Consequently, NH3 and N2 source terms can be modeled using the well-known logarithmic transformation. For both species a separate lightweight neural network with 63 nodes in a single hidden layer (≈500 parameters) is trained, resulting in relative prediction errors around 0.1%.
N2O source terms, however, do change sign and the logarithmic transformation cannot be applied. Again, using a standard feed-forward neural network without data transformation (“direct” modeling) does not yield usable models as it leads to relative prediction errors near 100%. Using the asinh transformation in the conventional way increases accuracy to about 15%. The latent variant of the asinh transformation performs even better leading to errors near 1%, see Fig. 8a. See section 2.3.1 for a detailed comparison between the three modeling approaches.
Fig. 8 Comparing different neural network training strategies. All models contain 63 nodes in a single hidden layer (≈500 parameters) and were trained with 25 000 data points. (a) Comparison of the N2O source term prediction accuracy of different training strategies on 5000 unseen test data points randomly sampled from the input range of Table 2. Using a standard feed-forward neural network without data transformation (“direct” modeling) does not yield accurate results. Using the asinh transformation conventionally, i.e. in a preprocessing step, reduces the prediction errors to approximately 15%. When the asinh transformation is implemented in a latent fashion, the models yield application-ready predictions with relative errors near 1%. (b) Models are validated by comparing product selectivities at unseen Ostwald process conditions and zero ammonia conversion (10% NH3 in air at 5 bar). The model using the asinh data transformation in the conventional way captures the general trend of selectivities but fails at lower temperatures. In contrast, the model using the latent asinh transformation is in excellent agreement with the full model over the whole temperature range. Besides N2O, neural network models of NH3 and N2, which use the well-known logarithmic transformation, are used to fully describe the reaction progress in the system.
The neural network models are validated by computing the product selectivities at Ostwald process conditions and zero ammonia conversion. None of these conditions are part of the training data set. Analogous to the original publication of the mechanism,44 mass transfer was not considered. The model based on the newly proposed latent transformation is in excellent agreement with the results from the full kinetic model, see Fig. 8b. In contrast, the model based on the conventional data transformation shows significant deviations. Most notably, it overestimates the N2O selectivity by an order of magnitude at lower temperatures.
Overall, the latent asinh transformation enables lightweight and therefore computationally cheap neural network models that are ready for use in reactor simulations.
The Bi-Symmetric log transformation nl(x) was introduced by Webber to depict data that cover a wide range of scales and have both positive and negative components.52–54 It is defined as
[eqn (30)]
Power transformations pose another way to normalize the skewed distribution of wide-range data that assume both positive and negative values. To this end, a generalized n-th root of x can be defined as
gpow(x, n) = sgn(x)·|x|^(1/n)    (31)
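For reference, the forward/inverse pairs used as alternative output activations can be written as follows. The bi-symmetric log is given in its usual base-10 form as an assumption, since the exact definition in eqn (30) is not reproduced above; the generalized root follows eqn (31).

```python
import numpy as np

def nl(x):
    # Bi-symmetric log transformation (assumed base-10 form, cf. eqn (30))
    return np.sign(x) * np.log10(1.0 + np.abs(x))

def nl_inv(y):
    # Inverse of nl(x); used as output activation in the form z * nl_inv(y)
    return np.sign(y) * (10.0 ** np.abs(y) - 1.0)

def gpow(x, n):
    # Generalized n-th root, eqn (31)
    return np.sign(x) * np.abs(x) ** (1.0 / n)

def gpow_inv(y, n):
    # Inverse of gpow(x, n)
    return np.sign(y) * np.abs(y) ** n

def sinh_out(y, z):
    # Inverse of asinh(x/z), i.e. the z*sinh(y) output activation of eqn (20)
    return z * np.sinh(y)
```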
The three functions asinh(x), nl(x) and gpow(x, n) perform similarly for CO source term predictions when applied with the latent approach, see Fig. S5a in the ESI.† However, for O2 source term predictions gpow(x, n) performs significantly worse than asinh(x), nl(x) and log(x), see Fig. S5b in the ESI.† This might be attributed to the fact that the commonly used logarithmic rate transformation can be motivated by the Arrhenius equation and the power-law expressions underlying the rate calculations, and might therefore be ideal for transforming source term data without sign changes. Consequently, deviating from logarithm-like behavior can be expected to have a negative effect on accuracy. Since the adjustable-parameter variant did not lead to higher accuracy, we conclude that the smallest absolute target value occurring in the training data is a good initial guess for z. However, there seems to be no obvious initial guess for the parameter n of the generalized power function. The data shown use n = 12 for CO and n = 18 for O2 source terms, as these values provided the most accurate results in an initial testing phase.
In summary, all three functions studied in this work are suitable for latent transformation of steady state source terms that change sign and perform similarly. We suggest using the inverse hyperbolic sine function to get started, as nearly all numerical libraries provide an efficient implementation.
The development of the new approach is demonstrated using two test cases. The first test case is a detailed surface reaction mechanism describing the oxidation of CO in the presence of H2 as well as the water-gas shift reaction. It includes 5 gas species, 9 surface species, and 36 reactions. Models are validated by implementing them in plug-flow reactor simulations. While the neural network-assisted solution is visually not separable from the exact solution, it is computed 100 000 times faster. Neural network training uses 25 000 data points and takes less than an hour on a consumer grade PC.
The second test case is a detailed surface mechanism based on density functional theory calculations of the ammonia oxidation on platinum. The latent hyperbolic sine transformation increases model accuracy significantly and allows using very small and thus computationally efficient neural networks in detailed reactor simulations.
In our previous work, we reached similarly good results by performing a reaction path analysis to exploit the available detailed insights into the reaction mechanism.29 The present work, however, can produce accurate models of detailed surface kinetics without any prior knowledge about the underlying mechanism.
Currently, there is huge interest in determining kinetic models directly from experimental data. Neural ODEs56 in particular are promising for generating a representation of the reaction kinetic ODEs directly from experimental data.57–59
In accordance with our findings, it has been reported that the parameter z of the inverse hyperbolic sine transformation (eqn (20)) can substantially affect regression results.39 While several works developed strategies for finding the best value, others even argue against using this transformation at all, emphasizing that the optimal parameter value is not given by theory.40 In this work we embedded the transformation function into a neural network. This allows all transformation parameters to be optimized automatically during training and could potentially be used to identify the optimal parameter value for related problems such as economic analyses.
The concept of latent data transformation is not limited to neural networks and can be used in all machine learning methods that allow customizing the loss function. For this purpose we define a custom loss function ℒ* that applies the inverse of the desired data transformation, f^−1(x), to the model outputs h before comparing them to the target values y in the conventional loss function ℒ, see eqn (32). This approach yields the same results as embedding f^−1(x) in the output layer of a neural network but does not allow optimizing transformation parameters like z from eqn (20) during training.
ℒ*(h, y) = ℒ(f^−1(h), y)    (32)
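A generic sketch of such a wrapped loss for any ML framework that accepts custom objectives is given below; the function names and the example relative RMS loss are ours.

```python
import numpy as np

def make_latent_loss(base_loss, f_inv):
    """Wrap a conventional loss so the model output h is back-transformed with
    f_inv before comparison with the untransformed targets y (cf. eqn (32))."""
    def latent_loss(h, y):
        return base_loss(f_inv(h), y)
    return latent_loss

# Example: a relative RMS loss combined with a fixed-z sinh back-transformation
z = 1e-10  # e.g. the smallest absolute source term in the training data
rel_rms = lambda pred, target: np.sqrt(np.mean(((pred - target) / target) ** 2))
loss_star = make_latent_loss(rel_rms, lambda h: z * np.sinh(h))
```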
Overall, the approach proposed in this work will not only significantly facilitate the application of detailed mechanistic knowledge in the simulation-based design of realistic catalytic systems, but it also presents a first step towards learning detailed surface kinetics directly from experimental data.
Footnote
† Electronic supplementary information (ESI) available. See DOI: https://doi.org/10.1039/d3re00212h
This journal is © The Royal Society of Chemistry 2023