Efficient Neural Network Models of Chemical Kinetics Using a Latent Asinh Rate Transformation



Introduction
Detailed multi-scale modeling provides valuable insights into the complex phenomena of catalytic systems, which typically span a wide range of time and length scales. [1,2] While such highly complex models would allow for rational catalyst and reactor design [3,4], they will remain computationally infeasible for the foreseeable future [2,5]. The computationally most demanding part of those simulations is the solution of the chemical kinetics, which often takes 70 % to 90 % of the computation time for both gas-phase [5,6] and surface-reactive [1,7] systems. There is therefore huge interest in accelerating the kinetic calculations. [1,2,8,9] This can be done by tabulating the kinetics or even a time integration step. [9,10] The latter is often done for gas-phase reactive systems [11-13] because the integration of the stiff ODE system resulting from the gas-phase kinetics is very time-consuming. For heterogeneous catalysis, the timescales of the surface reactions and of the gas phase can usually be separated via the steady-state approximation. [2,14-17] For each simulation time step, the surface kinetics can then be solved for steady state separately to avoid unnecessarily small time steps in the computational fluid dynamics simulation. Even with that simplification, evaluating the surface kinetics still poses a severe computational bottleneck. [1,7,18-20] Several works map steady-state solutions of surface kinetics in a tabulation approach. Those maps can be built before a simulation using pre-computed solutions or during a simulation with so-called on-the-fly techniques. Some of the most widely used on-the-fly techniques exploit prior solutions to estimate new queries, such as in-situ adaptive tabulation (ISAT) [7,8,18,20-23] and the piecewise reusable implementation of solution mapping (PRISM) [24]. In contrast, agglomeration algorithms exploit similarities of open queries to reduce the number of calls to the kinetic solver. [8,20,25] Surrogate models like splines
have been used extensively to map pre-computed steady-state solutions of chemical kinetics for accelerating reactor simulations [14,15,19,26-29] or even subsystems of the reactor [30,31]. The (error-based modified) Shepard interpolation approach has been used to replace very demanding but detailed kinetic Monte Carlo calculations in reactor simulations. [17,32,33] Recently, machine learning techniques have gained growing attention for modeling kinetic data because they can overcome the so-called curse of dimensionality. [34] In this context, random forests [35,36] and neural networks [29] have been used for accurate predictions of steady-state surface kinetics.
Not only the model type but also the way data are presented to the model strongly determines its accuracy. Besides scaling, transforming data is also known to make models of wide-range data such as chemical kinetics more efficient. Logarithmic transformations have been used for gas-phase mass fractions [6,11,37], while preprocessing data as log(r), log(p_i) and 1/T is well known to facilitate the modeling of surface kinetics [14,15,17,19,27,29]. This can be attributed to the fact that it makes the target function more linear over a wide range of reaction conditions [17,29], as demonstrated in equations S1-S3 in the ESI†. However, because these transformations rely on the logarithm, a problem arises for source terms that are not strictly positive or strictly negative over the entire range of interest. This presents a substantial limitation, as most systems of practical interest contain species that are both consumed and produced, depending on the reaction conditions. In our previous work we showed that this limitation can be overcome by modeling the rates of the rate-determining steps. [29] Since elementary rates are always positive, they can be modeled accurately using the logarithm. Source terms can then be constructed as a linear combination of the modeled elementary rates. Choosing the rates of the rate-determining steps instead of, e.g., the adsorption/desorption reactions avoids subtracting two very similar numbers, which would lead to unfavorable error propagation. However, this approach requires insights into the mechanism that are not always available, for example when modeling experimental data. This leaves us with the challenge of accurately modeling source terms changing sign without prior insights into the reaction mechanism.
Figure 1: Plot of the asinh(x) function, which approximates the logarithm of 2x for large positive and negative inputs while being linear near the origin.
In contrast to the logarithm, logarithm-like functions such as the inverse hyperbolic sine asinh(x) are not limited to positive inputs but can process negative and zero values in a meaningful way. As shown in figure 1, asinh(x) behaves logarithmically for |x| ≫ 1 while it is linear in the interval −1 < x < 1. This function is commonly used to analyze financial data when zero or negative values occur. [38-40] Like economic data, the net rates of chemical kinetics usually span many orders of magnitude and assume positive and negative values as well as zero. Consequently, we use the inverse hyperbolic sine function to transform chemical source terms. We use a parameter z to control the transition between the logarithmic and the linear behavior of the asinh, as shown in equation 1:

asinh(x/z) = ln( x/z + sqrt((x/z)^2 + 1) )    (1)
To mimic the behavior of the well-known logarithmic transformation, we choose the parameter z such that all modeled rates lie within the logarithmic part of the function, i.e., we set it to the smallest absolute source term occurring in the training data.
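As a quick numerical illustration (a sketch; the function and variable names are ours, not from the paper), the scaled transform asinh(x/z) behaves like ln(2x/z) deep in the logarithmic regime and, unlike the logarithm, accepts zero and negative rates:

```python
import numpy as np

def asinh_transform(x, z):
    """Scaled inverse hyperbolic sine: ~ln(2x/z) for x >> z, ~x/z for |x| << z."""
    return np.arcsinh(x / z)

z = 1e-8          # e.g. the smallest absolute source term in the training data
x = 1e-2          # a rate deep in the logarithmic regime
# logarithmic regime: asinh(x/z) is nearly ln(2x/z)
print(asinh_transform(x, z) - np.log(2 * x / z))       # ~0
# unlike log, the transform accepts zero and negative rates
print(asinh_transform(0.0, z))                         # 0.0
print(asinh_transform(-x, z) + asinh_transform(x, z))  # 0.0 (odd function)
```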
While this work focuses on the inverse hyperbolic sine, there are other functions able to flatten unfavorable data distributions such as wide-range data changing sign.
The Bi-Symmetric log transformation nl(x) was introduced by WEBBER to depict data that cover a wide range of scales and have both positive and negative components. [41-43] It is defined as

nl(x) = sgn(x) · ln(1 + |x|/z)    (2)

with the scale parameter z and the standard mathematical sign function sgn. Like the asinh, this function approximates logarithmic behavior for |x| ≫ z. In parallel to this work, the Bi-Symmetric log transformation has been used by KLUMPERS et al. for the representation of catalytic reaction rates by neural networks. [44] Power transformations pose another way to normalize the skewed distribution of wide-range data that assume both positive and negative values. To this end, a generalized n-th root of x can be defined as

gpow(x, n) = sgn(x) · |x|^(1/n)    (3)

Here, we focus on the inverse hyperbolic sine because there are efficient implementations in almost every numerical library, including MATLAB and PyTorch. The prediction performance of the three functions will be compared in section 3.3.
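For concreteness, the three transformation functions can be sketched side by side (the nl and gpow expressions below are our assumed sign-preserving forms); all three are odd functions defined for zero and negative inputs:

```python
import numpy as np

def asinh_t(x, z):
    # inverse hyperbolic sine transform
    return np.arcsinh(x / z)

def nl(x, z):
    # Bi-Symmetric log transform: sign-preserving, logarithm-like for |x| >> z
    return np.sign(x) * np.log(1.0 + np.abs(x) / z)

def gpow(x, n):
    # generalized n-th root: sign-preserving |x|**(1/n)
    return np.sign(x) * np.abs(x) ** (1.0 / n)

# all three handle zero and are symmetric under sign change
for f, par in ((asinh_t, 1e-6), (nl, 1e-6), (gpow, 12)):
    assert f(0.0, par) == 0.0
    assert np.isclose(f(-3.0, par), -f(3.0, par))
```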
In this work, we propose a neural network architecture specialized to efficiently model steady-state solutions of detailed surface kinetics, thus removing the computational bottleneck from reactor simulations. It consists of two major points: 1. We transform the rates using the logarithm-like function asinh(x), which can be applied to negative values and zero; this is crucial for modeling systems of practical interest, e.g., when they include intermediate species. 2. We work with latent (hidden) representations of the transformed data. This means we embed the data transformation directly into the model instead of applying it in a conventional preprocessing step, see figure 2. This allows minimizing meaningful error metrics like the relative error of the reaction rates while preserving the advantage of data transformation. In other words, we avoid spending model capacity on regions that are not important for the application in reactor simulations.
With this setup, neural networks can accurately model wide-range data changing sign, such as chemical kinetics. No prior knowledge about the reaction mechanism is required, paving the way for learning kinetics directly from experimental data or highly detailed first-principles simulations. The approach is validated by reactor simulations. The preferential oxidation of CO in the presence of H2 is simulated in a plug-flow reactor, showing a speed-up of 100 000 when using neural networks instead of solving the full mechanism. Further, we model ammonia oxidation under the conditions of the Ostwald process.

Methodology
2.1 Preferential Oxidation of CO
2.1.1 The Reaction Mechanism.
We consider the same reaction mechanism as used in our previous work for surrogate modeling of detailed surface kinetics. [29] The mechanism was developed by MHADESHWAR and VLACHOS to describe CO oxidation, H2 oxidation and the water-gas shift reaction as well as the preferential oxidation of CO and the promoting role of H2O on CO oxidation on platinum. [45] We use the kinetic parameters provided by HAUPTMANN et al., which are listed in table S1 in the ESI† for all 36 elementary reactions. [46] Reaction rates r_j (s^-1) are calculated as

r_j = k_j · ∏_i c_i^ν_i,j · ∏_l θ_l^ν_l,j    (4)

with the rate constant k_j of reaction j (m^3 mol^-1 s^-1 for adsorption and s^-1 otherwise), the concentration c_i of gas species i (mol m^-3), the surface coverage θ_l of species l (unitless) and the reaction order ν_i,j (unitless). [46] The rate constants for adsorption reactions k_j^ads and for all other surface reactions k_j^surf are calculated using equations 5 and 6, respectively.
k_j^ads = (s_0,i / Γ) · sqrt(R·T / (2π·M_i)) · (T/T_0)^β    (5)

k_j^surf = A_j · (T/T_0)^β · exp(−E_A,j / (R·T))    (6)

with the universal gas constant R (J mol^-1 K^-1), the temperature T (K), the site density Γ (2.490 81 × 10^-5 mol m^-2), the molecular mass M_i (kg mol^-1), the reference temperature T_0 (300 K), the temperature exponent β (unitless), the sticking coefficient s_0,i (unitless), the preexponential factor A_j (s^-1) and the activation energy E_A,j (J mol^-1). [46] For each reaction condition, given by a temperature and the partial pressures of CO, CO2, H2, H2O and O2, steady-state surface coverages are calculated. This is done by integrating equation 7 in time until dθ_l/dt = 0:

dθ_l/dt = Σ_j ν_l,j · r_j    (7)

Gas composition and temperature are assumed to be constant during this process. The obtained surface coverages are used in equation 8 to calculate the steady-state source terms ṡ_i: [29,46]

ṡ_i = Σ_j ν_i,j · r_j    (8)

Numerically, the integration is performed using the DASPK solver [47] with an integration time of 10^7 s, a relative tolerance of 10^-6 and an absolute tolerance of 10^-50.
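The rate-constant expressions and the steady-state integration can be sketched as follows (a minimal Python/SciPy illustration rather than the DASPK setup used in the paper; function names, default parameters, tolerances and the toy coverage ODE in the smoke test are ours):

```python
import numpy as np
from scipy.integrate import solve_ivp

R = 8.314  # universal gas constant (J mol^-1 K^-1)

def k_ads(T, s0, M, gamma=2.49081e-5, beta=0.0, T0=300.0):
    """Adsorption rate constant (m^3 mol^-1 s^-1), collision-theory form."""
    return s0 / gamma * np.sqrt(R * T / (2.0 * np.pi * M)) * (T / T0) ** beta

def k_surf(T, A, Ea, beta=0.0, T0=300.0):
    """Surface rate constant (s^-1), modified Arrhenius form."""
    return A * (T / T0) ** beta * np.exp(-Ea / (R * T))

def steady_state_coverages(dtheta_dt, theta0, t_end=1e7):
    """Integrate the coverage ODE at fixed gas composition until steady state."""
    sol = solve_ivp(dtheta_dt, (0.0, t_end), theta0, method="BDF",
                    rtol=1e-6, atol=1e-12)
    return sol.y[:, -1]

# toy coverage ODE relaxing to theta = 0.5, as a smoke test
theta_ss = steady_state_coverages(lambda t, th: -(th - 0.5), [0.0], t_end=1e3)
```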
2.1.2 The Input Range of the Surrogate Model.
The input range was chosen to cover typical operating conditions met in a reactor for the removal of CO from H2 streams by preferential oxidation of CO with small amounts of added O2. Operating conditions in a low-temperature water-gas shift reactor are also covered. [29] The input ranges are shown in table 1.

2.1.3 Plug-Flow Reactor Model.
We use a simple isothermal and isobaric plug-flow reactor model as described in our previous work [29] to showcase the suitability of the surrogate models for reactor simulations. The model is discretized in 200 cells of equal size in axial direction:

c_i,n = c_i,n−1 + ṡ_i(c_n−1, T) · τ_n    (9)

with the concentration c_i,n (mol m^-3) of species i in cell number n, the temperature T (K), the source term ṡ_i (mol m^-3 s^-1) of species i and the residence time τ_n (s) in cell n, obtained by dividing the total residence time by the number of cells. [29] A total pressure of 1 atm, a site concentration c_Pt of 26.3 mol m^-3, a reactor length of 1 m and a gas velocity of 1 m s^-1 are used.

2.2 Ammonia Oxidation in the Ostwald Process
2.2.1 The Reaction Mechanism.
Ammonia oxidation on platinum is the key step in nitric acid production via the Ostwald process and also plays an important role in automotive catalysis, where it is used to remove excess ammonia from the exhaust of diesel vehicles. We consider the mechanism MA and SCHNEIDER developed based on density functional theory (DFT) calculations. [48] This mechanism aims to describe the reaction kinetics of both applications despite their widely differing operating conditions. It consists of 15 reversible reactions featuring six gas-phase species and ten surface species, as shown in table S2 in the ESI†.
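The cell-by-cell plug-flow update can be sketched as a simple marching loop (an illustrative sketch; the function name and the explicit update form are our assumptions):

```python
import numpy as np

def plug_flow(c_in, source_term, T, tau_total, n_cells=200):
    """March an isothermal, isobaric PFR through n_cells cells of equal size.

    c_in        : inlet concentrations (mol m^-3)
    source_term : callable (c, T) -> sdot (mol m^-3 s^-1), e.g. the steady-state
                  kinetics or a neural-network surrogate of it
    tau_total   : total residence time (s)
    """
    tau_n = tau_total / n_cells
    c = np.asarray(c_in, dtype=float)
    profile = [c.copy()]
    for _ in range(n_cells):
        # explicit cell update: next cell = previous cell + source * cell residence time
        c = c + source_term(c, T) * tau_n
        profile.append(c.copy())
    return np.array(profile)

# smoke test: first-order decay approaches the analytical exp(-k*tau) profile
prof = plug_flow([1.0], lambda c, T: -0.1 * c, 300.0, 10.0, n_cells=1000)
```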
Reaction rates r_j (s^-1) are calculated as

r_j = k_j · ∏_i c_i^ν_i,j · ∏_l θ_l^ν_l,j    (10)

with the rate constant k_j of reaction j (m^3 mol^-1 s^-1 for adsorption and s^-1 otherwise), the concentration c_i of gas species i (mol m^-3), the surface coverage θ_l of species l (unitless) and the reaction order ν_i,j (unitless).
For each reaction condition, given by a temperature and the partial pressures of NH3, O2, NO, N2O, N2 and H2O, steady-state surface coverages are calculated by integrating

dθ_l/dt = Σ_j ν_l,j · r_j    (11)

in time until dθ_l/dt = 0, and the resulting steady-state source terms follow as

ṡ_i = Σ_j ν_i,j · r_j    (12)

Numerically, the integration is performed using MATLAB's ode15s solver [49] with an integration time of 10^15 s, a relative tolerance of 10^-8 and an absolute tolerance of 10^-50.
The rate constants for surface reactions k_j^surf are calculated as

k_j^surf = A_j · exp(−E_A,j / (R·T))    (13)

with the universal gas constant R (J mol^-1 K^-1), the temperature T (K), the activation energy E_A,j (J mol^-1) and the preexponential factor A_j (s^-1). The latter is calculated using transition state theory with the partition functions q_TS for transition states and q_IS for initial states, as shown in equation 14:

A_j = (k_B·T / h) · q_TS / q_IS    (14)

with the Boltzmann constant k_B and the Planck constant h. The partition functions (unitless) are calculated using the harmonic oscillator model

q = ∏_{k=1}^{N_vib} [1 − exp(−h·ν_k / (k_B·T))]^(−1)    (15)

with the vibrational frequencies ν_k obtained by DFT calculations (s^-1, excluding the imaginary ones; values are given in table S3 in the ESI†) of the N_vib vibrational degrees of freedom. The rate constants for adsorption reactions k_j^ads are calculated as

k_j^ads = (s_0,i / Γ) · sqrt(R·T / (2π·M_i))    (16)

with the molecular mass M_i (kg mol^-1) and the sticking coefficient s_0,i (unitless). Desorption rate constants k_j^des are calculated using the equilibrium constant K_p as follows:

k_j^des = k_j^ads / K_p    (17)

K_p = exp(ΔS/R) · exp(−ΔE / (R·T))    (18)
with the energy differences ΔE (J mol^-1) obtained by DFT calculations and the reaction entropies ΔS (J mol^-1 K^-1). Gas-phase entropies are obtained from the NIST database [50] using data from [51], while surface species entropies are calculated using the harmonic oscillator model as shown in equation 19:

S = R · Σ_k [ (h·ν_k / (k_B·T)) / (exp(h·ν_k / (k_B·T)) − 1) − ln(1 − exp(−h·ν_k / (k_B·T))) ]    (19)
We chose this mechanism because, in contrast to the simpler ammonia oxidation mechanisms considered in our earlier works [15,26,28,30], it is more detailed and does not neglect the consumption of several gas species. In consequence, all species except NH3, H2O and N2 show source terms changing sign in the range of reaction conditions considered. Therefore, it is not possible to rely on modeling only strictly positive source terms and to compute all other species source terms from the atom balance. Rather, at least one species with a sign-changing source term has to be modeled for use in a reactor simulation. We focus on predicting the source terms of NH3, N2 and N2O.

2.2.2 The Input Range of the Surrogate Model.
The input range was chosen to cover typical operating conditions met in a reactor for the Ostwald process, i.e., at most 12 % ammonia in air at up to 5 bar. The input ranges are shown in table 2 and are sampled uniformly in the inverse temperature and the logarithmic partial pressures.

Neural Networks
Shallow neural networks are implemented using PyTorch. [52] All neural networks are fully connected, use tanh activation and have an equal number of nodes in all hidden layers. The number of nodes per hidden layer is chosen to meet a total number of adjustable parameters of up to 5000. Hidden layer weights are initialized using PyTorch's kaiming uniform function. The proposed neural network architecture is shown in figure 3. It takes the thermo-chemical state of the reactor simulation, consisting of temperature (K) and partial pressures p_i (bar), as input. Those values are transformed as

x_T = 1/T    (20)

x_i = log(p_i)    (21)

and further a linear transformation is applied which maps the training data to the interval (−1, 1):

x' = 2 · (x − x_min) / (x_max − x_min) − 1    (22)

Since these operations do not change during training, they can alternatively be done in a data preprocessing step.
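A sketch of this input preprocessing (the function name is ours and a base-10 logarithm is assumed); the bounds x_min/x_max would come from the training data:

```python
import numpy as np

def preprocess_inputs(T, p, x_min, x_max):
    """Map (T, p_i) -> [1/T, log10(p_i)] and then linearly onto (-1, 1)
    using the training-set bounds."""
    x = np.concatenate(([1.0 / T], np.log10(np.asarray(p, dtype=float))))
    return 2.0 * (x - x_min) / (x_max - x_min) - 1.0

# hypothetical training-set bounds: T in [300, 900] K, p in [0.01, 10] bar
x_min = np.array([1.0 / 900.0, -2.0])
x_max = np.array([1.0 / 300.0, 1.0])
scaled = preprocess_inputs(450.0, [0.1], x_min, x_max)
```

Note that 1/T at 450 K sits exactly in the middle of the 1/T range, so the first component maps to 0.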
The preprocessed thermo-chemical state is fed to the hidden layer(s) which are fully connected and use tanh activation.
There is a single output node per species, which contains a latent representation of the transformed source terms y = asinh(ṡ/z). This node uses the activation function z·sinh(y) to restore outputs in the form of the original source term target values ṡ. The only parameters to be learned are the weights in and out of the hidden layer(s) and, optionally, the parameter z of the sinh activation. When alternatives to the hyperbolic sine function are discussed, the output activation is replaced by either z·nl^-1(y), gpow^-1(y, n) or exp(y).
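A minimal PyTorch sketch of this architecture (class and argument names are ours): the final linear layer emits the latent value and the z·sinh(y) output activation restores the original scale, with z optionally trainable:

```python
import torch
import torch.nn as nn

class LatentAsinhNet(nn.Module):
    """Fully connected tanh network whose final linear output y is treated as
    the latent transformed rate asinh(sdot/z); the output activation
    z*sinh(y) restores source terms on their original scale."""

    def __init__(self, n_in, n_hidden, n_layers, n_out, z, learn_z=False):
        super().__init__()
        layers, width = [], n_in
        for _ in range(n_layers):
            layers += [nn.Linear(width, n_hidden), nn.Tanh()]
            width = n_hidden
        layers.append(nn.Linear(width, n_out))   # latent y
        self.body = nn.Sequential(*layers)
        z_t = torch.as_tensor(float(z))
        if learn_z:
            self.z = nn.Parameter(z_t)           # optimized along with the weights
        else:
            self.register_buffer("z", z_t)

    def forward(self, x):
        return self.z * torch.sinh(self.body(x))  # back to the original scale

net = LatentAsinhNet(n_in=6, n_hidden=16, n_layers=2, n_out=1, z=1e-6)
out = net(torch.zeros(4, 6))
```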
In contrast to the latent transformation approach, the conventional approach computes transformed target values in a preprocessing step (asinh(ṡ/z) in our case) and uses a standard fully connected neural network to learn them. Consequently, during training, the differences between exact and estimated transformed values, e.g., measured by the root-mean-square error of the transformed values, are minimized instead of a typical error metric of interest like the relative error of the source terms.
Direct modeling means dropping both the input transformation in equations 20 and 21 and the output activation; the original steady-state source terms are used directly as targets.

Data Sets.
This work uses 35 000 input-output pairs of reaction conditions and resulting steady state source terms for both test cases.
The training set contains 25 000 input-output pairs, the validation set 5000 and the test set another 5000. Every input-output pair is contained in only one of the three data sets. The data for the preferential oxidation test case are identical to those used in our previous work. [29]

Training.
Neural network training is performed using full batches. The LBFGS algorithm with strong Wolfe line search and an initial learning rate of 1 is used to update the weights during training until the chosen loss evaluated on the validation set has not decreased over the last 1000 epochs. We do not perform extensive hyper-parameter tuning, as the focus of this work lies on the general modeling strategy for steady-state source terms.
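This training loop can be sketched as follows (a simplified version: for brevity it monitors the training loss rather than a separate validation loss, and the names are ours):

```python
import torch

def train_full_batch(model, loss_fn, x, y, max_epochs=20000, patience=1000):
    """Full-batch LBFGS with strong-Wolfe line search and early stopping once
    the monitored loss has not improved for `patience` epochs."""
    opt = torch.optim.LBFGS(model.parameters(), lr=1.0,
                            line_search_fn="strong_wolfe")
    best, stale = float("inf"), 0
    for _ in range(max_epochs):
        def closure():
            opt.zero_grad()
            loss = loss_fn(model(x), y)
            loss.backward()
            return loss
        loss = opt.step(closure).item()
        if loss < best - 1e-12:
            best, stale = loss, 0
        else:
            stale += 1
        if stale >= patience:
            break
    return best
```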

Error Measure.
In physics (and chemistry), small quantities are typically as important as large ones. [53] Therefore, slow reactions also have to be modeled with high precision for successful reactor simulations. [54] Consequently, we use the mean absolute relative error (MARE, equation 23) on the test set to measure the performance of the regression models built:

MARE = (1/N) · Σ_k |(y_k − h_k) / y_k|    (23)

with the number of points N, the target y and the prediction h.
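In code, the MARE is a one-liner (a direct transcription; it assumes no exactly zero targets):

```python
import numpy as np

def mare(y, h):
    """Mean absolute relative error of predictions h against targets y."""
    y, h = np.asarray(y, dtype=float), np.asarray(h, dtype=float)
    return float(np.mean(np.abs((y - h) / y)))

# 10 % relative error on both points
err = mare([1.0, 2.0], [1.1, 1.8])
```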

Loss Functions.
Different loss functions are used depending on the modeling strategy. The root-mean-square relative loss L_rel (equation 24) is minimized when source terms are used as target data:

L_rel = sqrt( (1/N) · Σ_k ((y_k − h_k) / y_k)^2 )    (24)
The root-mean-square absolute loss L_abs (equation 25) of the transformed values is minimized when using transformed source terms as targets:

L_abs = sqrt( (1/N) · Σ_k (y_k − h_k)^2 )    (25)
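Both losses in PyTorch form (a sketch; the argument order h, y is our convention):

```python
import torch

def loss_rel(h, y):
    """Root-mean-square relative loss, used with raw source-term targets."""
    return torch.sqrt(torch.mean(((y - h) / y) ** 2))

def loss_abs(h, y):
    """Root-mean-square absolute loss, used with transformed targets."""
    return torch.sqrt(torch.mean((y - h) ** 2))
```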

Hard- and Software
Datasets for this work were produced using MATLAB version R2021a. Neural network training and inference were performed using PyTorch version 1.10. Prediction times were measured on a Ryzen 7 5800X CPU @ 3800 MHz and an NVIDIA GeForce RTX 3070 GPU running Linux Mint 20.3 as the operating system and were averaged over 1000 identical calculations.

Results and Discussion
We discuss the proposed latent hyperbolic sine transformation in detail using the preferential oxidation of CO as a showcase mechanism. The obtained models are validated in a plug-flow reactor simulation and compared to our previous work based on approximating the rates of the rate-determining steps. [29] A DFT-based mechanism for ammonia oxidation under conditions of the industrial Ostwald process serves as a second test case. Finally, we discuss alternatives to the hyperbolic sine function.

Test case 1: Preferential Oxidation of CO
The latent hyperbolic sine transformation will be presented in detail for the preferential oxidation of CO in the presence of H2 on a platinum catalyst. This system is important in H2 production for fuel cell applications [46] and was the first detailed surface mechanism modeled with neural networks in the literature [29]. It can be described by three global reactions:

CO + 1/2 O2 → CO2    (CO oxidation)
H2 + 1/2 O2 → H2O    (H2 oxidation)
CO + H2O ⇌ CO2 + H2    (water-gas shift)

As the mechanism contains five gas-phase species and three elements, at least 5 − 3 = 2 species source terms must be modeled to fully describe the reaction progress in the system. Analogous to the procedure in our previous work [29], we focus on the source terms of O2 and CO.
The equilibria of the CO and H2 oxidation reactions lie fully on the right side, so that the source term of O2 is negative under all relevant reaction conditions. Consequently, a logarithmic transformation can be applied to model the O2 source terms. [14,15,17,19,27,29] As is typical for systems of practical relevance, other species in the mechanism (including CO) change the sign of their source term depending on the reaction conditions. For those, a logarithmic transformation cannot be applied in a meaningful way. We propose the latent hyperbolic sine transformation to overcome this limitation. Figure 4a shows a histogram of the distribution of CO source term values, while figure 4b shows the same data on a logarithmic scale.

Modeling CO Source Terms.
As the well-known logarithmic transformation cannot be applied to CO source terms due to the occurrence of negative values, the fallback approach is to directly model the source terms without any transformation. However, in line with the results of our previous work [29], neural network models of reasonable size are not suited for capturing the strongly non-linear character of the data. Figure 5 shows that relative prediction errors are around 100 % or higher. In contrast, the latent hyperbolic sine transformation leads to significantly more precise predictions. For example, application-ready models with relative prediction errors of 1 % can be obtained with 1000 parameters in five hidden layers and less than 15 minutes of training time (see figure S1 in the ESI†). As neural networks are usually deployed with orders of magnitude more parameters and layers, all models used in this work can be considered small. WAN et al., for example, used about 180 000 parameters for modeling chemical kinetics. [55] In our previous work, we proposed another method for dealing with chemical source terms changing sign by approximating the rates of the rate-determining steps. [29] As shown in figure S4 in the ESI†, it yields more accurate models than the latent hyperbolic sine transformation proposed in this work. That accuracy is achieved by exploiting detailed insights from a reaction path analysis. Such an analysis, however, is not feasible when dealing with experimental data or highly complex computational models. In contrast, the latent hyperbolic sine transformation is designed to work without any previous knowledge about the mechanism and therefore provides the first method for obtaining accurate and lightweight surrogate models of detailed surface kinetics in these cases.
So far, we have applied the inverse hyperbolic sine transformation in a latent way. To investigate the effects of the latent approach, we also applied the inverse hyperbolic sine transformation in the conventional way. This means the data are transformed in a preprocessing step and the transformed values are used as targets to be learned by a conventional neural network. As shown in table 3, the conventional approach leads to relative prediction errors above 1000 %, while the latent approach achieves 15 %. This can be attributed to the fact that instead of the relative error, the conventional approach minimizes an error measure defined in terms of the transformed values asinh(ṡ), which we call MATE. This error measure, however, is not relevant for reactor simulations. See section 5 in the ESI† for a comparison between the relative error and MATE.
Table 3: Prediction errors of lightweight neural networks with 320 adjustable parameters and a single hidden layer modeling the CO source terms ṡ with two different approaches: the latent transformation minimizes the relative error during training, resulting in an average accuracy of 15 %. The conventional transformation minimizes the error of the transformed values (MATE) instead. Therefore, its predictions are two orders of magnitude less accurate as measured by the relative error. Because MATE is not a relevant measure for the application in reactor simulations, the slightly better MATE score of the conventional approach poses no considerable advantage over the latent approach. The equations show how the errors are computed using the neural network predictions h.

Reactor Simulation.
We validate the neural network models by simulating an isobaric and isothermal plug-flow reactor under conditions of the preferential oxidation of CO in H2-rich environments. We used the neural network representations of the CO kinetics (5000 parameters distributed over 5 hidden layers with a relative prediction error on the validation set of about 0.5 %) and of the O2 kinetics (1 hidden layer with 750 parameters and a relative prediction error on the validation set of about 0.05 %) to replace the steady-state source term calculations in the reactor simulation, as shown in equation 9. The source terms of all other species are calculated using the atom balance. Figure 6 shows that the concentration profiles obtained from the neural network models (dotted lines) cannot be visually separated from the exact solution (full lines). The lower part of figure 6 shows that the relative difference between both solutions is about 1 % or lower. Note, however, that calculating the neural network estimate of the source terms is approximately 50 000 times faster than evaluating the exact steady-state kinetics, see table S4 in the ESI†. Using a consumer-grade graphics card for inference increases the speed-up to 100 000.
In summary, the neural network models obtained with the latent hyperbolic sine transformation are well suited for replacing the computationally expensive steady-state source term calculations associated with heterogeneous catalysis. They yield accurate solutions and speed up the calculations by a factor of 100 000.

Test case 2: Ammonia Oxidation in the Ostwald process
To test the generality of the latent hyperbolic sine transformation, we apply it to a second detailed surface mechanism: the DFT-based ammonia oxidation mechanism of MA and SCHNEIDER [48] for the Ostwald process under industrially relevant conditions. The NH3 and N2 source terms do not change sign and can therefore be modeled using the well-known logarithmic transformation. Lightweight neural networks with a single hidden layer and 500 parameters achieve relative prediction errors around 0.1 %. Since the N2O source terms do change sign, the logarithmic transformation cannot be applied. Again, direct modeling does not yield usable models, as it leads to relative prediction errors near 100 %. Using the asinh transformation in the conventional way increases the accuracy to about 15 %. The latent variant of the asinh transformation performs even better, leading to errors near 1 %, see figure 7. Overall, the latent asinh transformation enables lightweight and therefore computationally cheap neural network models ready for use in reactor simulations.

Alternatives to the Hyperbolic Sine
We compare the prediction accuracy of three different transformation functions: the inverse hyperbolic sine asinh(x) (equation 1), the Bi-Symmetric log transformation nl(x) (equation 2) and the generalized root function gpow(x, n) (equation 3). The three functions perform similarly for CO source term predictions when applied with the latent approach, see figure S5a in the ESI†. However, for O2 source term predictions, gpow(x, n) performs significantly worse than asinh(x), nl(x) and log(x), see figure S5b in the ESI†. This might be attributed to the fact that the commonly used logarithmic rate transformation can be motivated by the Arrhenius equation and the power-law expressions the rate calculations are based on, and might therefore be ideal for transforming source term data without sign changes. Consequently, deviating from logarithm-like behavior can be expected to have a negative effect on accuracy. Since the variant with an adjustable parameter did not lead to higher accuracy, we conclude that the lowest absolute target value occurring in the training data is a good initial guess for z. However, there seems to be no obvious initial guess for the parameter n of the generalized power function. The data shown use n = 12 for CO and n = 18 for O2 source terms, as these values provided the most accurate results in an initial testing phase.
In summary, all three functions studied in this work are suitable for the latent transformation of steady-state source terms changing sign and perform similarly. We suggest using the inverse hyperbolic sine function to get started, as nearly all numerical libraries provide an efficient implementation.

Conclusions and Future Work
This work proposes the latent hyperbolic sine transformation for efficient neural network models of detailed surface kinetics. As the standard logarithmic transformation is not applicable to source terms changing sign, we introduced the asinh function, which behaves similarly to the logarithm but can deal with negative numbers and zero. Further, we work with latent (hidden) representations of the transformed data. This means we embed the data transformation directly into the model instead of applying it as a conventional preprocessing step. This decouples the error metric optimized during training from the data transformation and therefore increases the model accuracy significantly.
The development of the new approach is demonstrated using two test cases. The first is a detailed surface reaction mechanism describing the oxidation of CO in the presence of H2 as well as the water-gas shift reaction. It includes 5 gas species, 9 surface species and 36 reactions. Models are validated by implementing them in plug-flow reactor simulations. While the neural network-assisted solution is visually inseparable from the exact solution, it is computed 100 000 times faster. Neural network training uses 25 000 data points and takes less than an hour on a consumer-grade PC.
The second test case is a detailed surface mechanism based on density functional theory calculations of the ammonia oxidation on platinum.The latent hyperbolic sine transformation increases model accuracy significantly and allows using very small and thus computationally efficient neural networks in detailed reactor simulations.
In our previous work, we reached similarly good results by performing a reaction path analysis to exploit the detailed insights into the reaction mechanism available.[29] The present work, however, can produce accurate models of detailed surface kinetics without any previous knowledge about the underlying mechanism.
Currently, there is huge interest in determining kinetic models directly from experimental data.Especially neural ODEs [56] are promising for generating a representation of the reaction kinetic ODEs from experimental data directly [57][58][59].
In accordance with our findings, it has been reported that the parameter z of the inverse hyperbolic sine transformation (equation 1) can substantially affect regression results [39]. While several works have developed strategies for finding the best value, others argue against using this transformation at all, emphasizing that the optimal parameter value is not given by theory [40]. In this work, we embedded the transformation function into a neural network. This allows all transformation parameters to be optimized automatically during training and could potentially be used to identify the optimal parameter value for related problems such as economic analyses.
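A toy sketch of why this works (illustrative only, not the training code of this work; the transform is assumed to be f(x) = asinh(x/z) with inverse x = z·sinh(h)): because the inverse transform is differentiable in z, the scale parameter can be fitted by gradient descent like any other model parameter instead of being fixed by hand before training.

```python
import numpy as np

def inverse_transform(h, z):
    """Assumed inverse of f(x) = asinh(x / z)."""
    return z * np.sinh(h)

target = 5.0           # untransformed target value
h = np.arcsinh(5.0)    # latent value (held fixed in this toy example)
z = 2.0                # deliberately wrong initial scale
lr = 0.01

for _ in range(500):
    pred = inverse_transform(h, z)
    grad_z = 2.0 * (pred - target) * np.sinh(h)  # d/dz of squared error
    z -= lr * grad_z                             # gradient descent on z

# z has converged so that the untransformed prediction matches the target
assert abs(inverse_transform(h, z) - target) < 1e-6
```

In the actual latent model both the network weights and z would be updated jointly by the same optimizer.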
The concept of latent data transformation is not limited to neural networks and can be used with any machine learning method that allows customizing the loss function. For this purpose, we define the custom loss function L * that applies the inverse of the desired data transformation f −1 (x) to the model outputs h before comparing them to the target values y in the conventional loss function L, see equation 26. This approach yields the same results as embedding f −1 (x) in the output layer of a neural network, but it does not allow transformation parameters like z from equation 1 to be optimized during training.
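A minimal numpy sketch of this idea (assuming f = asinh with fixed z = 1 and mean squared error as the conventional loss L): the model outputs a latent value h, and the inverse transform is applied inside the loss so the optimizer minimizes the error on the original rate scale.

```python
import numpy as np

def f_inv(h):
    """Inverse of the assumed transform f = asinh (z fixed to 1)."""
    return np.sinh(h)

def conventional_loss(pred, y):
    """L: mean squared error on the original scale."""
    return np.mean((pred - y) ** 2)

def latent_loss(h, y):
    """L*(h, y) = L(f_inv(h), y), cf. equation 26."""
    return conventional_loss(f_inv(h), y)

# Target source terms spanning signs and magnitudes
y = np.array([-3.0, 0.0, 7.5])
h_perfect = np.arcsinh(y)  # latent values a perfect model would output

assert latent_loss(h_perfect, y) < 1e-12   # perfect latent outputs -> zero loss
assert latent_loss(h_perfect + 0.1, y) > 0.0
```

Any framework that accepts a user-defined loss can use this pattern without modifying the model architecture itself.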
Overall, the approach proposed in this work will not only significantly facilitate the application of detailed mechanistic knowledge in the simulation-based design of realistic catalytic systems, but it also presents a first step towards learning detailed surface kinetics directly from data.

Figure 2 :
Figure 2: Comparing the conventional and latent training strategies. They differ in how the data transformation is applied and which error metric is optimized during training. In both cases, a machine learning model predicts chemical rates using reaction conditions as inputs. The loss is computed to evaluate the prediction accuracy, and the model parameters are updated accordingly. Conventionally, the transformation is applied to the data in a preprocessing step. The transformed values asinh(r) are then learned by a standard neural network. The disadvantage of this approach is that during training a loss function with respect to the transformed values has to be used instead of the actual error measure of interest. We propose to work with latent (hidden) representations of the transformed data. This means that a model with standard fully connected layers learns a latent representation of the transformed rates. Afterwards, the inverse of the transformation function is applied as a custom output activation in the final layer so that the outputs represent the original rates. Hence, the error metric of interest can be optimized during training. If all parameters of the transformation function are fixed before the optimization, the inverse transformation can alternatively be implemented in a customized loss function.

Figure 3 :
Figure 3: Scheme of the recommended architecture. The neural network takes reaction conditions in the form of temperature and partial pressures as input. These values are transformed and linearly scaled before being fed into conventional hidden layers with tanh activation. The last layer holds a single node per gas-phase species and contains y, a latent representation of the transformed target values. A hyperbolic sine activation is applied to obtain outputs in the form of source terms.
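A minimal numpy sketch of this forward pass (weights random and layer sizes illustrative, not taken from this work): transformed and scaled inputs, one tanh hidden layer, and a sinh output activation that maps the latent values back to source terms on the original scale.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hidden, n_species = 6, 16, 5  # e.g. T plus 5 partial pressures

# Randomly initialized weights, stand-ins for trained parameters
W1 = rng.normal(size=(n_hidden, n_in)) * 0.1
b1 = np.zeros(n_hidden)
W2 = rng.normal(size=(n_species, n_hidden)) * 0.1
b2 = np.zeros(n_species)

def forward(x):
    x = np.arcsinh(x)                         # input transformation (assumed asinh)
    x = (x - x.mean()) / (x.std() + 1e-8)     # linear scaling (illustrative)
    y_latent = W2 @ np.tanh(W1 @ x + b1) + b2 # latent representation y
    return np.sinh(y_latent)                  # sinh activation -> source terms

x = np.abs(rng.normal(size=n_in))  # dummy reaction conditions
s_dot = forward(x)
assert s_dot.shape == (n_species,)
assert np.all(np.isfinite(s_dot))
```

The only non-standard component is the final sinh activation; everything else is a plain fully connected network.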

Figure 4 :
Figure 4: Histogram of the CO source term distribution . . .

Figure 5 :
Figure 5: Relative prediction error of CO source terms as a function of the total number of learnable parameters in a neural network, compared between direct modeling and the latent hyperbolic sine transformation.
In summary, the latent hyperbolic sine transformation works well for two main reasons: 1. The inverse hyperbolic sine transformation brings the target data into a similar order of magnitude and leads to a more linear input-output relation. 2. Using the transformation in a latent fashion gives full control over the training objective while maintaining the advantages of data transformation.

Figure 6 :
Figure 6: Plug-flow reactor model of the preferential oxidation of CO in H 2-rich environments at three different temperatures. The upper part shows CO and O 2 molar fractions along the reactor length. The neural network solution (dotted lines) cannot be visually separated from the exact solution (full lines). The lower part shows the relative difference between the two solutions. Feed composition and other details are described in section 2.1.3.

Table 1 :
Input range for reaction conditions (temperature and partial pressures) which are solved for steady state. The ranges are identical to those used in our previous work [29]. For each cell, steady-state kinetics are determined and the gas-phase concentrations are updated according to equation 9. The resulting residence time τ is 1 s for each cell. The feed consists of 40 % H 2, 1 % O 2, 10 % H 2 O, 1 % CO and 10 % CO 2 with N 2 as the balance species.
If conditions outside the input range defined in table 1 occur, they are set to the corresponding minimum or maximum values to avoid extrapolation of the neural network models.

2.2 Ammonia Oxidation in the Ostwald Process

2.2.1 The Reaction Mechanism
For given temperature and partial pressures of NH 3, O 2, H 2 O, NO, N 2 O and N 2, steady-state surface coverages are calculated. This is done by integrating equation 11 in time until dθ l /dt = 0. Gas composition and temperature are assumed to be constant during this process. The obtained surface coverages are used in equation 12 to calculate steady-state source terms ṡ i (mol m −2 s −1) using the site density Γ, which is assumed to be 2.3 × 10 −5 mol m −2.
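The procedure can be sketched with a toy single-site coverage ODE (an illustrative Langmuir-type rate law, not the mechanism used in this work): the coverage is integrated at fixed gas conditions until its time derivative vanishes, and the resulting steady-state coverage would then enter the source-term evaluation.

```python
import numpy as np

# Toy coverage ODE: d(theta)/dt = k_ads * p * (1 - theta) - k_des * theta
# (single adsorbate, fixed gas composition; values are placeholders).
k_ads, k_des, p = 2.0, 1.0, 0.5

theta, dt = 0.0, 1e-3
for _ in range(100_000):
    dtheta = k_ads * p * (1.0 - theta) - k_des * theta
    if abs(dtheta) < 1e-12:     # steady state reached: d(theta)/dt ~ 0
        break
    theta += dt * dtheta        # explicit Euler step

# Analytic Langmuir steady state for comparison
theta_analytic = k_ads * p / (k_ads * p + k_des)
assert abs(theta - theta_analytic) < 1e-9
```

For the stiff ODE systems of real detailed mechanisms, an implicit stiff integrator would replace the explicit Euler loop, but the stopping criterion is the same.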

Table 2 :
Input range for reaction conditions (temperature and partial pressures) which are solved for steady state within the ammonia oxidation mechanism by MA and SCHNEIDER [48].

Figure 7: Relative prediction errors of lightweight neural networks with 500 parameters in a single hidden layer modeling N 2 O source terms using 25 000 training data points. Direct modeling without data transformation does not yield accurate results. Using the asinh transformation conventionally, i.e. in a preprocessing step, reduces the prediction errors to approximately 15 %. When the asinh transformation is implemented in a latent fashion, the models yield application-ready predictions with relative errors near 1 %.