Open Access Article
Benjamin C. Koenig†, Suyong Kim† and Sili Deng*
Department of Mechanical Engineering, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, MA 02139, USA. E-mail: silideng@mit.edu; Tel: +1 617-452-3064
First published on 28th July 2025
Efficient chemical kinetic model inference and application in combustion are challenging due to large ODE systems and widely separated time scales. Machine learning techniques have been proposed to streamline these models, though strong nonlinearity and numerical stiffness combined with noisy data sources make their application challenging. Here, we introduce ChemKANs, a novel neural network framework with applications both in model inference and simulation acceleration for combustion chemistry. The ChemKAN structure augments the generic Kolmogorov–Arnold network ordinary differential equations (KAN-ODEs) with knowledge of the information flow through the relevant kinetic and thermodynamic laws. This chemistry-specific structure, combined with the expressivity and rapid neural scaling of the underlying KAN-ODE algorithm, instills in ChemKANs a strong inductive bias, streamlined training, and higher accuracy predictions compared to standard benchmarks, while facilitating parameter sparsity through shared information across all inputs and outputs. In a model inference investigation, we benchmark the robustness of ChemKANs to sparse data containing up to 15% added noise, and to superfluously large network parameterizations. We find that ChemKANs exhibit no overfitting or model degradation in any of these training cases, demonstrating significant resilience to common deep learning failure modes. Next, we find that a remarkably parameter-lean ChemKAN (344 parameters) can accurately represent hydrogen combustion chemistry, providing a 2× acceleration over the detailed chemistry in a solver that is generalizable to larger-scale turbulent flow simulations. These demonstrations indicate the potential for ChemKANs as robust, expressive, and efficient tools for model inference and simulation acceleration for combustion physics and chemical kinetics.
For kinetic model discovery, a variety of learning algorithms, model structures, and optimization approaches have emerged. The chemical reaction neural network approach,5 for example, is capable of inferring reaction networks and parameters from limited species trajectory or heat release data6–8 by directly enforcing the Arrhenius and mass action laws in a neural network structure. The sparse identification of nonlinear dynamics (SINDy) approach is similarly capable of extracting models from experimental data by assuming various functional relationship building blocks and learning the precise forms needed to fit the data.9 Further optimization and inverse modeling tools exist for other chemical kinetic inference problems,10–12 with a key piece of many physics-based model inference techniques being a certain (and often substantial) degree of prior knowledge of the governing equations, reaction pathways, and reactants. A critical need across all of these methods is robustness to noisy data and model uncertainty, conditions common in combustion kinetics.5,7,13
On the solver front, researchers have proposed methods for dimension reduction6,14 and computational acceleration15–18 to handle stiff, high-dimensional systems. For instance, Owoyele and Pal17 recently proposed ChemNODE, a creative and high-performing tool that uses the neural ODE concept of Chen et al.19 to replace a complete chemical kinetic model with a collection of neural networks, one for each tracked thermochemical quantity. By using these networks to directly link the current thermochemical state to the chemical source term with no other problem-specific treatment, computational acceleration was enabled in the as-studied homogeneous reactor while retaining the generalizability of the surrogate model to higher-dimensional reacting flows where such acceleration becomes significantly more meaningful. Other recent works leverage DeepONets20 to directly learn stiff integrators using neural operators, either with problem-specific network structures14,21 or by mapping from the current state to the source term, similar to ChemNODE,18,22 allowing for significant computational acceleration downstream.
These strengths all come with drawbacks, however. Owoyele and Pal,17 for example, found that while the neural ODE approach's clever exploitation of the dynamical structure of chemical kinetic models can provide high accuracy, the nonlinearity inherent to such models creates a challenging inference problem for the underlying MLP layers. This led the authors to omit a handful of species (including the key H radical) and break the training up into multiple unique and likely redundant networks, rather than a single cohesive architecture. Similarly, DeepONet techniques are cheap to evaluate and capture steady state behavior well, but their accuracy can suffer in stiff regions of the data that the integrator (which with DeepONets must be inferred directly by the network, as they do not explicitly leverage existing ODE solvers) can learn to skip over without significant penalties. We thus find that despite these recent novel and productive efforts, the training of efficient surrogate models for combustion chemistry remains a challenging task with open questions due to stiff behavior in the solution profiles and numerical instability in the nonlinear training processes.
Kolmogorov–Arnold networks were proposed recently23 as an alternative to multi-layer perceptrons (MLPs) for general neural network applications, where instead of learning weights and biases on fixed activation functions, the shapes and magnitudes of the activation functions themselves are learned via gridded basis function sums and products. This shift was proposed to increase neural convergence rates, accuracy, and generalization. Echoing the development of traditional MLPs, physics-informed KAN structures were proposed shortly after, where it was found that certain knowledge of physical laws embedded in the training process can help the KAN converge to a physically meaningful solution.24–26 Similar developments have been studied where direct encoding of physical insights or specific geometries into novel KAN structures has shown significant promise, such as including physical symmetries for quantum architecture search27 or irregular geometries for flow simulations.28 The inference benefits of KANs were additionally demonstrated to extend to dynamical system modeling in the Kolmogorov–Arnold network ordinary differential equations (KAN-ODEs) framework,29 where KANs replaced MLPs in the neural ODE algorithm,19 and have since been demonstrated in a variety of settings, including predator–prey dynamics, shock formation, complex equations, phase separation,29 personalized cancer treatment,30 and flashover prediction.31
In short, KAN-ODEs leverage KAN networks as gradient getters, while maintaining standard ODE solvers to integrate the solution profiles as in a traditional numerical approach. KAN-ODEs were shown to retain all major KAN benefits while also accessing the dynamical system inference capabilities of the neural ODE framework,29 which would appear to lend them to efficient chemical kinetic system modeling. However, a few key questions remain. KAN-ODEs have so far only been tested in relatively small systems (up to two-dimensional state variables), making their applicability and performance in larger, practical combustion systems unknown. Additionally, KANs in general have been shown to suffer substantially when trained with noisy datasets.32 While we theorize that their direct coupling to ODE integrators combined with their sparse parameterization and smooth activations should provide KAN-ODEs with strong robustness to noise regardless of previously reported issues in generic KANs, this has not yet been studied quantitatively.
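To make this gradient-getter coupling concrete, the following minimal NumPy sketch (illustrative only; the toy one-dimensional state, fixed-step RK4 loop, and random parameters are stand-ins, not the authors' implementation, which uses the adaptive Tsit5 solver and trained coefficients) integrates an ODE whose right-hand side is a single RBF-based KAN activation:

```python
import numpy as np

def kan_rhs(u, centers, weights, h=0.5):
    """One learnable RBF activation acting as the 'gradient getter' du/dt = KAN(u)."""
    return float(np.sum(weights * np.exp(-((u - centers) / h) ** 2)))

def rk4_step(u, dt, centers, weights):
    """Standard explicit RK4 step; the KAN only supplies the instantaneous gradient."""
    k1 = kan_rhs(u, centers, weights)
    k2 = kan_rhs(u + 0.5 * dt * k1, centers, weights)
    k3 = kan_rhs(u + 0.5 * dt * k2, centers, weights)
    k4 = kan_rhs(u + dt * k3, centers, weights)
    return u + dt / 6.0 * (k1 + 2.0 * k2 + 2.0 * k3 + k4)

rng = np.random.default_rng(0)
centers = np.linspace(0.0, 1.0, 5)        # fixed activation grid
weights = 0.1 * rng.standard_normal(5)    # learnable coefficients (random placeholders here)

u, dt, trajectory = 0.2, 1e-2, [0.2]
for _ in range(100):
    u = rk4_step(u, dt, centers, weights)
    trajectory.append(u)
```

Training would then propagate the data-fit loss through the integrated trajectory back to the activation coefficients, exactly as in the neural ODE framework.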
In this work, we aim to develop a chemistry KAN-ODE (ChemKAN) framework for chemical kinetic modeling by designing the Kolmogorov–Arnold network gradient getter to learn the spatially-invariant relationship between the current thermochemical state and the chemical source terms. In the operator splitting regime, such a surrogate can be directly coupled to existing CFD or machine learning-based flow solvers for multi-dimensional combustion simulations in arbitrary physical domains. Developed across two case studies of increasing complexity, the ChemKAN framework contains a physics-informed, two-stage training process that enforces the direct coupling between species production and heat release, and additionally contains a soft constraint for element conservation. We further stabilize the optimization problems by implementing forward sensitivity analysis. Stiff chemistry is fully resolved via an attached numerical ODE solver. We study two cases here to explicitly probe the key behaviors and gaps in the model inference and solver acceleration literature identified above, both of which are addressed with ChemKANs.
First, we demonstrate the capability of ChemKANs to extract realistic and multi-species models from synthesized experimental data in a comparison against DeepONets (DONs) in biodiesel production modeling. In this case, increasing levels of noise in the training data test the abilities of the two different approaches to extract the true underlying behavior, and evaluate the robustness of ChemKANs (and KAN-ODEs in general) to noisy data in light of the recent work suggesting the weakness of KANs in the presence of noise.32 Second, we demonstrate ChemKANs as efficient and time-saving surrogate models in an even larger system by learning zero-dimensional hydrogen combustion behavior using homogeneous reactor data that are subject to stiff dynamics, in a study designed to facilitate direct comparison against the MLP-based ChemNODE structure17 that this second ChemKAN application was inspired by. In contrast to ChemNODE, where a reduced subset of the thermochemical state (excluding H, HO2, and H2O2) was learned using separately trained, non-interacting networks, we learn all species and temperature profiles here with a single compact ChemKAN network while retaining similar computational acceleration and performance. Across these two distinct cases, we demonstrate the strong capability and robustness of ChemKANs as efficient and expressive tools for both modeling and inference in combustion chemistry.
| dY_k/dt = (W_k/ρ)·ω̇_k,  k = 1, …, m | (1) |
where Y_k is the mass fraction of species k, W_k its molecular weight, ρ the mixture density, and ω̇_k is the molar production or consumption rate.33 In the commonly used operator splitting approach, this homogeneous reactor is also applicable to higher-dimensional, turbulent simulations. While the specific functional form may differ across systems, ω̇_k is typically a strong function of temperature (for example, in the Arrhenius form k = A·exp(−E_a/RT)) as well as the current species concentrations. Additionally, energy conservation can be modeled by tracking the system temperature as in eqn (2).
In some cases, eqn (1) sufficiently describes a chemical process when heat release or consumption is negligible. In other cases, such as combustion and pyrolysis processes, chemical reactions entail exothermic and endothermic behaviors. In these cases, the temperature of a system can be tracked with energy conservation, as in
| dT/dt = −(1/(ρ·c_p))·Σ_k h_k·W_k·ω̇_k | (2) |
| … | (3) |
| u(u(0), t) = MLP_opt[MLP_br(u(0), θ_br) ⊙ MLP_tr(t, θ_tr), θ_opt]. | (4) |
| du_i/dt = MLP_i(u(t), θ_i),  i = 1, …, m + 1. | (5) |
We remark again that the split networks MLP_i in this framework appear inefficient. In fact, this multi-network approach requires an (m + 1)-step training process, where each network is trained with the remaining thermochemical scalars held frozen. While this separated training strategy facilitates model convergence (and in fact was found necessary17 to converge the MLP gradient getters), it increases training cost significantly with the number of species, especially when considering that no knowledge is shared between networks, even for common reactants. See Fig. 1 for visualizations of these two existing methods, as well as a simplified depiction of the method we will propose in the next sections.
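To illustrate the branch–trunk mapping of eqn (4), a minimal NumPy sketch is given below. The helper names (mlp, random_layers, deeponet), the random weights, and the layer widths (which only loosely echo the 308-parameter DeepONet used in the biodiesel noise study later) are assumptions for illustration, not the benchmark implementation:

```python
import numpy as np

def mlp(x, layers):
    """Tiny dense MLP with tanh hidden activations; `layers` is a list of (W, b) pairs."""
    for W, b in layers[:-1]:
        x = np.tanh(W @ x + b)
    W, b = layers[-1]
    return W @ x + b

def random_layers(sizes, rng):
    """Random (W, b) pairs for consecutive layer sizes (placeholders for trained weights)."""
    return [(0.3 * rng.standard_normal((n_out, n_in)), np.zeros(n_out))
            for n_in, n_out in zip(sizes[:-1], sizes[1:])]

rng = np.random.default_rng(0)
m, latent = 6, 8                                  # six-species state, shared latent width

branch = random_layers([m, 8, 8, latent], rng)    # MLP_br: encodes the initial state u(0)
trunk  = random_layers([1, 7, latent], rng)       # MLP_tr: encodes the query time t
head   = random_layers([latent, m], rng)          # MLP_opt: maps merged features to u(t)

def deeponet(u0, t):
    """Eqn (4): element-wise product of branch and trunk features, then a linear head."""
    merged = mlp(u0, branch) * mlp(np.array([t]), trunk)
    return mlp(merged, head)

u_pred = deeponet(np.array([1.0, 1.5, 0.0, 0.0, 0.0, 0.0]), t=10.0)  # state prediction at t = 10 s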
| f(x) = Σ_{q=1}^{2n+1} Φ_q(Σ_{p=1}^{n} ϕ_{q,p}(x_p)) | (6) |
| x_{l+1,j} = Σ_{i=1}^{n_l} ϕ_{l,j,i}(x_{l,i}),  j = 1, …, n_{l+1} | (7) |
In other words, each input is connected to each output with a unique learnable activation function ϕ_{l,j,i} (much like in an MLP, where each input is connected to each output with a unique learnable weight), leading to a total of n_l·n_{l+1} activation functions connecting the lth and (l + 1)th layers. We use RBF basis functions in the current work as was shown previously for KANs34 and KAN-ODEs,29,35 although the choice of ϕ is flexible, and many other options have been proposed in the literature including B-splines,23 ReLU functions,36 and various other combinations.37,38 The AddKAN structure is inherently limited to additive operations, restricting its concise expressivity for problems that rely heavily on the multiplication operator. Therefore, recent studies have proposed new layer structures to address this issue and improve parameter efficiency.35,39 Here we use LeanKAN,35 which has shown promise as the most effective and efficient structure for both additive and multiplicative operations. The LeanKAN layer output is obtained by summing two separate terms, y_l^add and y_l^mult, such that
| y_l = y_l^add + y_l^mult | (8) |
| y_l^add = … | (9) |
| y_l^mult = … | (10) |
where y_l^add collects the outputs of the standard additive activations and y_l^mult the multiplication-node outputs; see ref. 35 for the complete definitions.
Finally, the activation functions themselves can be expressed with gridded basis functions34 as per
| ϕ(x) = Σ_{i=1}^{N} c_i·ψ(x − z_i), | (11) |
| ψ(r) = exp(−r²/(2h²)), | (12) |
where z_i are the N fixed grid points, c_i the learnable coefficients, and h the grid spacing.
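The sketch below assembles these pieces into a single additive KAN layer in NumPy: every input–output pair receives its own RBF-expanded activation, and the layer output sums over the inputs. Sizes, helper names, and the basis normalization are illustrative assumptions, and the LeanKAN multiplication nodes are omitted for brevity:

```python
import numpy as np

def kan_add_layer(x, coeffs, centers, h=0.5):
    """Additive KAN layer: each input i connects to each output j through its own
    learnable activation phi_{j,i}(x_i), expressed as a gridded sum of Gaussian RBFs."""
    basis = np.exp(-((x[:, None] - centers[None, :]) ** 2) / (2.0 * h ** 2))  # (n_in, n_grid)
    return np.einsum('jig,ig->j', coeffs, basis)  # sum over grid points and inputs

# Illustrative sizes only: 4 inputs, 3 outputs, 5 grid points on [0, 1].
rng = np.random.default_rng(0)
n_in, n_out, n_grid = 4, 3, 5
coeffs = 0.1 * rng.standard_normal((n_out, n_in, n_grid))   # learnable coefficients c_i
centers = np.linspace(0.0, 1.0, n_grid)                     # fixed grid points z_i

y = kan_add_layer(rng.random(n_in), coeffs, centers)        # layer output, shape (3,)
```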
| dũ/dt = KAN_kin(u, θ_kin) | (13) |
| dT/dt = Linear(dũ/dt, θ_thermo) + ε | (14) |
| dT/dt = Linear(KAN_kin(u, θ_kin), θ_thermo) + KAN_cor(u, θ_cor). | (15) |
We define the terms in these equations line by line throughout this paragraph. In eqn (13), we encode prior knowledge of the true species production/consumption rate relationship of eqn (1) into the ChemKAN by computing the species-only production rate dũ/dt from the entire state input u, via a KAN network (KANkin) parameterized by θkin. Then, from the true energy equation (eqn (2)), we recognize that the temperature rate dT/dt is a simple linear combination of the species production rates, with scaling factors defined by the enthalpy h and specific heat values cp (or alternatively, in eqn (14) by the m scalars in the linear mapping parameterized by θthermo). Thus, the crux of the thermodynamic superstructure is a computationally trivial, simple linear sum of the already-evaluated outputs of the kinetic core, as shown in eqn (14) and (15). One level deeper, we recognize that a secondary effect in the true eqn (2) is the dependence of the thermophysical parameters, specifically cp, on the temperature and species mixture. The error stemming from the first-order Linear(·) approximation's failure to account for such thermophysical parameter variation is reflected in the ε term of eqn (14), which in the final formulation of eqn (15) is accounted for via a supplemental single-layer, single-output KAN correction carrying forward a functional dependence on the species and temperature inputs, parameterized by θcor. Overall, ChemKAN is composed of a kinetic core structure and a thermodynamic superstructure that strongly mimic the true governing equations, operate largely in series, and include full sharing of all reaction and species production information. This architecture allows for versatile and flexible modeling by turning “on” or “off” the energy equation for standalone, kinetic core-only modeling or combined kinetic and thermodynamic modeling.
| KAN_kin(u(t), θ_kin) = (Ψ_1^lean ∘ Ψ_0^add)(u(t)), | (16) |
| KAN_cor(u(t), θ_cor) = Ψ_0^add(u(t)). | (17) |
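A compact NumPy sketch of this stacked forward pass (eqns (13)–(15)) is shown below. For brevity a single additive layer stands in for the two-layer kinetic core of eqn (16), and all sizes, helper names, and random parameters are illustrative assumptions rather than the trained ChemKAN:

```python
import numpy as np

def kan_add_layer(x, coeffs, centers, h=0.5):
    """Additive KAN layer with gridded Gaussian-RBF activations (one per input-output pair)."""
    basis = np.exp(-((x[:, None] - centers[None, :]) ** 2) / (2.0 * h ** 2))
    return np.einsum('jig,ig->j', coeffs, basis)

def chemkan_rhs(u, theta_kin, theta_thermo, theta_cor, centers):
    """Sketch of eqns (13)-(15): map u = [Y_1..Y_m, T] to [dY_1/dt..dY_m/dt, dT/dt]."""
    dY_dt = kan_add_layer(u, theta_kin, centers)                             # eqn (13): kinetic core
    dT_dt = theta_thermo @ dY_dt + kan_add_layer(u, theta_cor, centers)[0]   # eqn (15): linear map + correction
    return np.concatenate([dY_dt, [dT_dt]])

rng = np.random.default_rng(1)
m, n_grid = 9, 3                                  # nine species plus temperature, three grid points
centers = np.linspace(0.0, 1.0, n_grid)
theta_kin = 0.1 * rng.standard_normal((m, m + 1, n_grid))      # kinetic core parameters
theta_thermo = 0.1 * rng.standard_normal(m)                    # linear thermodynamic map
theta_cor = 0.1 * rng.standard_normal((1, m + 1, n_grid))      # correction KAN parameters

du_dt = chemkan_rhs(rng.random(m + 1), theta_kin, theta_thermo, theta_cor, centers)
```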
| ℒ(θ) = (1/(n*·N_t))·Σ_{k=1}^{n*} Σ_{j=1}^{N_t} [û_k^pred(t_j, θ) − û_k^obs(t_j)]² + α_PINN·Σ_{i=1}^{N_e} Σ_{j=1}^{N_t} [Σ_{k=1}^{m} (N_ki·W_i/W_k)·(Y_k(t_j) − Y_k(t_0))]² | (18) |
Here û_k denotes the kth thermochemical state normalized to the [0, 1] window by subtracting the minimum then dividing by the range. û_k^pred(t_j, θ) is the network prediction for this state quantity at time t_j with the network parameters θ (including the kinetic, thermo, and correction parameters), while û_k^obs(t_j) is the corresponding training data. N_t is the number of datapoints in the temporal profiles. In the first MSE term, n* = m when training the kinetic core, as only species profiles are learned. For the thermodynamic superstructure, n* = m + 1 is used to train the added temperature output. We also provide an optional element conservation physics-informed loss term, or PINN term,43,44 to encourage the ChemKAN to find models that obey physical laws. There, N_e is the number of elements in the data (i.e., H, O, N). For a given element i, the element conservation term begins by computing the mass fraction difference across all m species between the current ChemKAN timestep and the initial condition. This is then converted to an elemental conservation difference via N_ki, the atom count of element i in species k; W_i, the atomic mass of element i; and W_k, the molar mass of species k. This conserved term is computed at all timesteps j across all elements i, and then weighted by α_PINN (here α_PINN = 10^−4). In the examples below, we use only the MSE loss for all biodiesel model inference results, and the MSE loss with the PINN term for H2 model acceleration.
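A minimal NumPy sketch of this composite loss is given below, assuming the MSE is averaged over states and timesteps and the element-conservation penalty is summed; the exact normalization and the helper name chemkan_loss are assumptions for illustration:

```python
import numpy as np

def chemkan_loss(u_pred, u_obs, Y_pred, Y0, N_atoms, W_elem, W_spec, alpha_pinn=1e-4):
    """Eqn (18) sketch: normalized-state MSE plus an element-conservation penalty.
    Shapes: u_pred, u_obs (n_states, N_t); Y_pred (m, N_t); Y0 (m,);
    N_atoms (N_elem, m); W_elem (N_elem,); W_spec (m,)."""
    mse = np.mean((u_pred - u_obs) ** 2)
    # Elemental drift of element i at time j:
    #   sum_k N_atoms[i, k] * W_elem[i] / W_spec[k] * (Y_k(t_j) - Y_k(t_0))
    drift = (N_atoms * W_elem[:, None] / W_spec[None, :]) @ (Y_pred - Y0[:, None])
    return mse + alpha_pinn * np.sum(drift ** 2)

# Toy check with 3 species (H2, O2, H2O) and 2 elements (H, O): an unchanged composition
# conserves every element, so the penalty term vanishes.
N_atoms = np.array([[2.0, 0.0, 2.0],    # H atoms per molecule
                    [0.0, 2.0, 1.0]])   # O atoms per molecule
W_elem = np.array([1.008, 15.999])
W_spec = np.array([2.016, 31.998, 18.015])
Y0 = np.array([0.028, 0.226, 0.0])
Y_pred = np.tile(Y0[:, None], (1, 4))   # 4 timesteps, composition unchanged
loss = chemkan_loss(Y_pred, Y_pred, Y_pred, Y0, N_atoms, W_elem, W_spec)   # = 0.0
```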
• Training stage 1—core kinetics θkin: all m + 1 inputs are used, and only the m species production rate outputs are learned (see eqn (13)). The thermodynamic superstructure of eqn (14) and (15) is not used in this stage, and the input temperatures are simply read in from the training data to provide the kinetic core of the network with a simpler training task. See the grey highlight in Fig. 2. For cases without heat release, this step in isolation is sufficient for a complete model. With heat release, we move to stage 2 once stage 1 is converged.
• Training stage 2—thermodynamic superstructure θthermo and θcor: once stage 1 is converged, the thermodynamic superstructure is added and the temperature rate is explicitly learned. The entire network (eqn (13) stacked with eqn (15)) is updated in order to infer the temperature together with all species. See the red highlight in Fig. 2, which is stabilized during training with the already-converged behavior of the grey kinetic core.
This two-stage training process contrasts with the m + 1 stage training process of ChemNODE,17 which requires m + 1 networks to each learn their own distinct representations of what we know to be shared kinetic and thermodynamic governing laws. The current ChemKAN approach, while trained in two distinct stages, has an overall structure resembling a simple feedforward KAN thanks to its stacked design, where all kinetic and thermodynamic information is shared across all inputs and outputs. For downstream evaluation, a single forward pass through the combined network of Fig. 2, or alternatively through eqn (13)–(15), predicts the complete chemical source term, which the attached ODE solver integrates to recover the thermochemical state vector u(t).
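The following Python sketch summarizes this two-stage schedule. The function grad_fn is a hypothetical placeholder for the forward-sensitivity gradient of the eqn (18) loss, the stage lengths are arbitrary, and plain gradient steps stand in for the Adam updates used in the actual training:

```python
import numpy as np

def train_chemkan(theta_kin, theta_thermo, theta_cor, data, grad_fn, lr=2e-3,
                  n_stage1=2500, n_stage2=2500):
    """Stage 1: fit only the kinetic core on the species outputs (temperature read from data).
    Stage 2: add the thermodynamic superstructure and update the full network."""
    for _ in range(n_stage1):
        g_kin, _, _ = grad_fn((theta_kin, theta_thermo, theta_cor), data, stage=1)
        theta_kin -= lr * g_kin
    for _ in range(n_stage2):
        g_kin, g_th, g_cor = grad_fn((theta_kin, theta_thermo, theta_cor), data, stage=2)
        theta_kin -= lr * g_kin
        theta_thermo -= lr * g_th
        theta_cor -= lr * g_cor
    return theta_kin, theta_thermo, theta_cor

# Dummy zero-gradient stand-in just to show the call signature.
dummy_grad = lambda params, data, stage: tuple(np.zeros_like(p) for p in params)
theta = train_chemkan(np.zeros(90), np.zeros(9), np.zeros(30), data=None,
                      grad_fn=dummy_grad, n_stage1=2, n_stage2=2)
```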
While adjoint sensitivity analysis was used in the original KAN-ODE paper29 to compute the gradients of the loss function, dℒ/dθ, in the current work we note the numerical instability present in many chemical kinetic modeling problems due to numerical stiffness,3,17,18,22,45 which can lead to failures of adjoint sensitivity analysis during training.3 To mitigate this potential stiffness issue, we instead implement forward sensitivity analysis.3 ChemKANs and their corresponding forward sensitivity equations are solved using the Tsit5 ODE integrator (Tsitouras 5/4 Runge–Kutta method46). The learnable parameters θ are trained by the Adam optimizer47 with a learning rate of 2 × 10^−3.
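Forward sensitivity analysis augments the state ODE with sensitivity states S = ∂u/∂θ, integrated alongside u via dS/dt = (∂f/∂u)S + ∂f/∂θ with S(0) = 0, so that loss gradients follow directly from the chain rule. The toy scalar example below (a minimal illustration with a forward Euler stepper and a hand-written one-parameter right-hand side, not the paper's Tsit5 setup) shows the mechanics:

```python
import numpy as np

def f(u, theta):          # toy scalar ODE, standing in for the ChemKAN right-hand side
    return -theta * u

def dfdu(u, theta):       # partial derivative of f with respect to the state
    return -theta

def dfdtheta(u, theta):   # partial derivative of f with respect to the parameter
    return -u

def integrate_with_sensitivity(u0, theta, dt, n_steps):
    """Integrate u and its sensitivity S = du/dtheta forward in time together."""
    u, S = u0, 0.0
    us, Ss = [u], [S]
    for _ in range(n_steps):
        u_new = u + dt * f(u, theta)
        S_new = S + dt * (dfdu(u, theta) * S + dfdtheta(u, theta))
        u, S = u_new, S_new
        us.append(u); Ss.append(S)
    return np.array(us), np.array(Ss)

# Gradient of an MSE-type loss follows from the chain rule:
#   dL/dtheta = (2/N) * sum_j (u(t_j) - u_obs(t_j)) * S(t_j)
us, Ss = integrate_with_sensitivity(u0=1.0, theta=2.0, dt=1e-2, n_steps=100)
u_obs = np.exp(-2.5 * np.arange(101) * 1e-2)     # synthetic observations
grad = 2.0 * np.mean((us - u_obs) * Ss)
```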
With the three byproducts di-glyceride (DG), mono-glyceride (MG), and glycerol (GL), the three-reaction system can be expressed as
| TG + ROH → DG + ester, | (19) |
| DG + ROH → MG + ester, | (20) |
| MG + ROH → GL + ester, | (21) |
Each reaction proceeds with an Arrhenius rate constant k_i = A_i·exp(−E_a,i/RT), with i = 1, 2, 3 for the three reactions. As studied previously,5 we generate data in this case using E_a = [14.54, 6.47, 14.42] kcal mol^−1 and ln(A) = [18.60, 7.93, 19.13], with isothermal experiments at temperatures randomly sampled in the range of 323 K to 343 K. We define the species scalar quantities Y here as concentrations rather than mass fractions, to match the convention of the governing equations. Initial TG and ROH concentrations are randomly sampled uniformly between 0.5 and 2, with all other intermediate and output species initialized at zero. 20 training data sets and 10 testing data sets are generated, each with a 30-second time window consisting of 30 sampled points. The temperature-dependent yet isothermal reaction rates present in this system motivate the use of the kinetic ChemKAN core structure only (see the hydrogen example below for use of the kinetic core together with the thermodynamic superstructure).
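As an illustration of this data-generation setup, the sketch below integrates the three-reaction system under assumed elementary mass-action rate laws with the Arrhenius parameters quoted above; the species ordering, the explicit-Euler integrator, and the helper names are assumptions for illustration, not the original data pipeline:

```python
import numpy as np

R = 1.987e-3                                    # gas constant, kcal mol^-1 K^-1
Ea = np.array([14.54, 6.47, 14.42])             # activation energies, kcal mol^-1
lnA = np.array([18.60, 7.93, 19.13])            # log pre-exponential factors

def rhs(Y, T):
    """Y = [TG, ROH, DG, MG, GL, ester] concentrations; isothermal at temperature T."""
    k = np.exp(lnA - Ea / (R * T))
    r = np.array([k[0] * Y[0] * Y[1],           # TG + ROH -> DG + ester
                  k[1] * Y[2] * Y[1],           # DG + ROH -> MG + ester
                  k[2] * Y[3] * Y[1]])          # MG + ROH -> GL + ester
    return np.array([-r[0], -r.sum(), r[0] - r[1], r[1] - r[2], r[2], r.sum()])

rng = np.random.default_rng(0)
T = rng.uniform(323.0, 343.0)                   # isothermal experiment temperature
Y = np.array([rng.uniform(0.5, 2.0), rng.uniform(0.5, 2.0), 0.0, 0.0, 0.0, 0.0])

dt, t_end = 1e-3, 30.0                          # fine explicit-Euler grid, 30 s window
save_times = np.linspace(0.0, t_end, 30)        # 30 sampled points per trajectory
profile, t = [], 0.0
for ts in save_times:
    while t < ts:
        Y = Y + dt * rhs(Y, T)
        t += dt
    profile.append(Y.copy())
profile = np.array(profile)                     # one synthetic trajectory, shape (30, 6)
```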
In this case we additionally probe the effectiveness of ChemKANs in the presence of significant experimental noise. To do so, we add increasing amounts of noise to the data and evaluate ChemKAN's performance against that of a standard DeepONet.20
In addition to significantly increased numerical stiffness and complexity, a key fundamental difference between the current combustion system and the previous biodiesel synthesis case is the presence of substantial two-way temperature coupling as per eqn (1) and (2). To account for this, we include the thermodynamic ChemKAN superstructure and the two-stage training process outlined in Sections 2.3.2–2.3.5.
A common characteristic of experimental data used for machine learning model inference is the presence of uncertainty or noise, which can cause even well-parameterized deep learning models to overfit as they struggle to distinguish genuine underlying trends from experimental artifacts. This is no different in traditional KANs, where small amounts of noise have been shown to severely limit inference capabilities.32 In the current work, we further probe whether coupling to inherently noise-robust ODE solvers helps ChemKANs to extract useful models from increasingly noisy data. To do so, we task ChemKANs with extracting models from the same dataset with varying amounts of noise added (up to 15%) as shown in Fig. 3. Surprisingly, the ChemKAN, even with significant amounts of noise, demonstrates strong robustness and a capability to infer smooth and accurate solution profiles that correspond well to the underlying true data. We provide detailed discussions on how ChemKAN performs compared to DeepONet in the following subsections.
To further probe the nuances of ChemKAN's convergence efficiency, we compare its results with those of DeepONet. The ChemKAN iterates roughly an order of magnitude slower but converges in fewer epochs (as also noted in prior work29). Thus, to facilitate fair comparison the ChemKAN was trained only for 5000 epochs, while the DeepONet was trained for 50 000 epochs.
Two key distinctions between ChemKAN and DeepONet are as follows. First, as discussed earlier, the extremely sparse ChemKANs (toward the left half of Fig. 4(A)) are able to reach remarkably low error even with just 78 parameters, while the DeepONets see significantly worse performance at sparse parameterizations. Secondly, we note that the DeepONet exhibits a significantly higher-order neural convergence rate in the training results, allowing it to surpass the training performance of the ChemKAN above 200 parameters, with remarkably strong training accuracy for the largest, 456-parameter DeepONet studied here. Linear fits with slopes are shown in Fig. 4(A) to illustrate this point, where the last ChemKAN and last two DeepONet points are excluded as they begin to plateau in training loss. From this observation, a large-enough DeepONet seems to outperform ChemKAN when looking only at the training losses.
To further contextualize the structural efficiency, we discuss results for the testing error of the same neural convergence runs in Fig. 4(B). Here, we notice a significant departure from the training convergence rates in Fig. 4(A). Unlike the neural convergence for training data, the ChemKAN is seen to outperform the DeepONet at all sizes, with the largest ChemKAN leveling out and retaining nearly the same testing performance as the second-largest ChemKAN. The DeepONet, while enjoying a faster neural convergence rate below 308 parameters, notably fails to plateau and instead appears to diverge at higher parameter counts. When compared against the training results in Fig. 4(A), we observe two distinct modes of training saturation. Saturation, or the point where the linear fit no longer holds, appears to occur for the two largest DeepONets. For the ChemKAN, we might either interpret the entire profile to be saturated, or highlight the single largest network as the saturation point. Regardless, what we observe in these high-parameter networks is high robustness in the ChemKAN to overfitting (with a flat testing loss plateau), compared to the significant overfitting and divergence seen in the DeepONet past 308 parameters (i.e., the last two testing points seeing increasing loss).
It is unsurprising that a standard deep learning technique begins to overfit a small dataset when given a large number of parameters. The DeepONet overfitting past saturation leads to further decreases in training loss accompanied by significant increases in testing loss. What we do find surprising, however, is ChemKAN's apparent resilience to overfitting, even with similar parameter counts. In this context, its original failure to reach the same training performance as the DeepONet appears to be a strength of the method rather than a drawback, as it reaches a minimum value for both training and testing and then remains robust to superfluously added parameters, while the DeepONet clearly requires additional care to avoid overfitting. This pilot neural convergence test suggests that ChemKANs are robust to overfitting, and motivates further study of their capability in a second, more realistic model inference scenario.
Both networks were trained for 10 000 epochs, to enable fair comparison and limit overfitting in the DeepONet cases. In all cases, a DeepONet with 308 parameters was used to compare against a ChemKAN with 156 parameters. The 308-parameter DeepONet was chosen based on the results of Fig. 4, where this was seen to be the largest DeepONet before the test loss began overfitting. The 156-parameter ChemKAN, meanwhile, was chosen to roughly match the DeepONet's training and testing performance at zero noise. In other words, we chose the best-performing DeepONet size possible based on the preliminary neural convergence study, and then sized the ChemKAN according to the zero-noise performance of both networks. In more detail, the DeepONet had a three-layer branch network with eight nodes per layer and a two-layer trunk network with seven nodes in the first layer and eight in the second layer, with a final output layer converting these two eight-dimensional layers to the six-dimensional solution vector. The ChemKAN, meanwhile, had a single hidden layer with four nodes, two of which included multiplication operators (n^mu = 2) as per the standard LeanKAN formulation,35 and three gridpoints per activation.
Fig. 5(A) shows average training results after 10^4 epochs across the 20 training and 10 testing cases at varied noise levels from 0% to 15%. As expected from Section 3.1.2, the training MSE with 0% noise shows that the 308-parameter DeepONet slightly beats the 156-parameter ChemKAN in training performance, and is slightly worse in terms of reconstructing the unseen testing data.
As increasing noise is added to the system, we see in the standard training and testing metrics of Fig. 5(A) that both networks unsurprisingly see increases in training and testing errors. Roughly, the increase in the training MSE scales with the square of the noise level, as follows from eqn (18). For example, the ChemKAN sees an increase in MSE between 0% noise and 1% noise of 3.78 × 10^−5, where the second-degree scaling of the MSE suggests that a 25× larger increase of 9.45 × 10^−4 might be expected between 0% noise and 5% noise. This is indeed observed, with an increase in this latter case of 9.64 × 10^−4 ≈ 9.45 × 10^−4. Thus, the increase in training error for both networks as noise is added can be attributed to the effect of the noise itself on the loss function (eqn (18)), and does not appear to indicate any problems with the two networks' capabilities to fit the increasingly noisy training data. For a direct comparison, the ChemKAN retains lower testing error throughout all tested noise values, and at 7% noise and above, it is actually able to reach a lower training error than the DeepONet. Looking at the big picture, however, results in these two metrics remain within a factor of two of each other at all noise levels, indicating largely similar performance.
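This quadratic scaling follows from a simple decomposition: for zero-mean added noise ε with standard deviation σ on the normalized state, and assuming the fitted model does not track the noise,

E[(û^obs − û^pred)²] = E[(û^true + ε − û^pred)²] ≈ E[(û^true − û^pred)²] + σ²,

so raising the noise level from 1% to 5% of the data range multiplies the added noise floor by (5/1)² = 25, consistent with the MSE increases quoted above.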
Upon further evaluation, we found that the training and testing MSEs evaluated on noisy datasets do not fully capture the effects of noise on useful model predictions, as overfitting can occur not only to the training conditions but also to the noise present in the data. To more effectively compare these frameworks, we introduce a noise-free MSE metric as was previously studied in the context of KANs,32
| ℒ_noise-free(θ) = (1/(n*·N_t))·Σ_{k=1}^{n*} Σ_{j=1}^{N_t} [û_k^pred(t_j, θ) − û_k^true(t_j)]² | (22) |
which differs from the MSE in eqn (18) in its use of the true, noise-free data rather than the noisy observations. This noise-free metric serves to quantify the capability of the two modeling approaches to accurately extract the true underlying model from noisy data, rather than overfit the noise or otherwise fail to deliver a useful model. With this metric, much more significant performance shifts can be seen in the noise-free testing values in Fig. 5(A). The impact of the added noise on the noise-free testing error for the ChemKAN is relatively small (a roughly 2× increase from 0% to 15% noise), suggesting that the ChemKAN is able to extract the true underlying behavior well from the noisy datasets. We reiterate the significance of this result in the context of prior work,32 where standard KANs were demonstrated to fail when faced with small added noise. In contrast, the DeepONet sees a 5× increase in noise-free testing MSE from 0% to 15% noise, with its final noise-free testing loss 4.4× larger than that of the ChemKAN.
The loss profiles in Fig. 5(B) provide further insight on the training dynamics that lead to this significant discrepancy in noise-free testing reconstructions. In the 0% noise training cycle of Fig. 5(B), we see fairly standard behavior across the training and noise-free testing traces for the DeepONet and ChemKAN, with all quantities steadily decreasing for the entire duration of training. This is the expected result, as we have sized the DeepONet based on Fig. 4 to avoid any overfitting in the noise-free case. As we increase the amount of noise to 2%, the training loss values are heavily penalized (due to the noisy data used in the computation of eqn (18)), while the noise-free testing values are slightly penalized but remain comparatively strong, indicating that both approaches are at least to a certain extent able to extract the true underlying model from the noisy data. While the training loss continues to drop quickly and then plateau in all four subplots, we see in the 7% noise case of Fig. 5(B) that the DeepONet noise-free testing loss dynamics begin to suffer, with a minimum value near 5000 epochs and a slight upward trend toward later epochs, likely due to overfitting. In the 15% noise case of Fig. 5(B), this issue is further exacerbated, with an early minimum near 1000 epochs followed by significant overfitting to the noisy data for the remainder of the training profile. The ChemKAN in both cases remarkably continues to drop its noise-free testing loss even while the DeepONet is overfitting, with late-epoch dynamics showing plateaued minimum values rather than the overfitting seen in the DeepONet. This echoes the behavior seen in Fig. 4, where the ChemKAN did not overfit and instead simply plateaued at its minimum training and testing errors.
Reconstructed training data profiles are shown for the 15% noise case in Fig. 6. While the DeepONet and ChemKAN are both roughly able to find the unseen, noise-masked ground truth profile, a close inspection reveals not only better ChemKAN fits but also notably jagged profiles from the DeepONet as it attempts to overfit the noise present in the training data. These results suggest that the dynamical system exploitation inherent to the ODE-based framework of ChemKANs (and KAN-ODEs in general) can mitigate or even entirely resolve previously observed issues32 regarding noisy data with KANs, and help to surpass the performance of standard DeepONets.
In summary, we have demonstrated ChemKANs as a promising tool for model discovery in temperature-dependent chemical kinetic systems, especially with realistically noisy datasets. They show promise not only on the scientific side of the problem, where they were demonstrated here to have significant capability compared to a standard tool in discovering models hidden under noisy data, but also on the machine learning side of the problem, where we have demonstrated that the neural ODE implementation of KAN-ODEs and ChemKANs helps them to overcome the noisy data limitation recently shown in vanilla KANs.32
We begin in Fig. 7 with a demonstration of the ChemKAN homogeneous reactor reconstructions for one training case (left column, Φ = 0.9 and T0 = 1050 K) and the unseen testing case (right column, Φ = 1.3 and T0 = 1150 K), after the two-stage training process outlined in Section 2.4.2. These reconstructions were generated entirely by the ChemKAN, given only the initial conditions. Overall the learned model successfully predicted the temperature and mass fractions in both cases, with no notable deterioration in the testing case (as expected from the strong testing results and robustness to overfitting observed in the previous biodiesel investigation). We further emphasize that the ChemKAN was largely able to capture the behavior of the low-concentration and highly reactive species YH2O2, YH, and YHO2 (Fig. 7(C) and (F)) that were neglected in ChemNODE.17
A broader comparison is shown in Fig. 8(A), where the MSE (in the normalized u units) is plotted across not only the initial set of 35 training and 1 testing initial conditions, but a wider set of 441 total initial conditions (406 of which were unseen during training) at a finer resolution in the same range. The low-temperature initial conditions see near-perfect reconstructions, as the ignition delays there are larger than the studied time window, leading to smooth gradients and easily trainable, near-isothermal behavior. In the remainder of the domain, strong performance is seen at all training conditions, and additionally at the single testing condition plotted in Fig. 7. In terms of generalization to intermediate temperatures and equivalence ratios, the ChemKAN performed very well throughout the vast majority of the domain, with many testing points even surpassing the accuracy of nearby training points at and above 1050 K. We notice that toward the slower-igniting cases, however, the ChemKAN struggled more with generalization. At 1000 K, for example, all six training points saw strong MSE values in the 10^−4 range, although a few of the intermediate equivalence ratios suffered. At 987.5 K, however (one tick below 1000 K), all testing points saw poorer performance in the 10^−3 range. We can conclude from these results that the ChemKAN retains its strong capability to generalize in the more challenging H2–air combustion case (as was originally reported in the biodiesel modeling case), but with practical limits in the colder, more temperature-sensitive ignition cases. While the ChemKAN's ability to accurately learn the six studied training points at 1000 K suggests that it is capable of tracking ignition behavior through the cooler regions, its relatively poor performance elsewhere in the initiation-sensitive regime suggests that a non-uniform training grid with denser sampling toward such cooler regions is needed to fully resolve this behavior and provide more accurate results when applied in combustion CFD simulations that require accurate ignition behavior.
Fig. 8 Evaluation of the proposed ChemKAN framework for various conditions. (A) ChemKAN reconstruction error at 35 training initial conditions (navy crosses), the single testing initial condition plotted in Fig. 7 (teal dot), and 405 additional testing locations between the initial 36. (B) Actual and ChemKAN-predicted ignition delay times (green pentagons and blue triangles, respectively), as a function of equivalence ratio, at different initial temperatures (1000 K, 1050 K, 1100 K, 1150 K, and 1200 K, from top to bottom).
We finally plot the actual and ChemKAN-predicted ignition delays in Fig. 8(B), for the 30 studied cases that ignited (the lowest temperature cases did not see ignition given the time span of 0.6 ms). Ignition here is defined as the point of maximum temperature rise rate.17 Accuracy is strong across the board, even in the testing case.
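For reference, this ignition-delay extraction can be sketched in a few lines; the function name and the synthetic temperature trace below are illustrative, with only the maximum-dT/dt definition taken from the text:

```python
import numpy as np

def ignition_delay(t, T):
    """Ignition delay defined as the time of maximum temperature rise rate dT/dt,
    estimated here by finite differences on the solver output."""
    dTdt = np.gradient(T, t)
    return t[np.argmax(dTdt)]

# Toy usage on a synthetic sigmoid-like temperature trace (illustrative only).
t = np.linspace(0.0, 0.6e-3, 200)                        # 0.6 ms window, as in the text
T = 1050.0 + 1200.0 / (1.0 + np.exp(-(t - 0.3e-3) / 2e-5))
tau_ign = ignition_delay(t, T)                           # ~0.3 ms for this toy trace
```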
This collection of results shows that the ChemKAN structure was able to accurately learn the dynamics of all nine species and temperature scalars across the same set of initial reactor conditions as was studied using traditional MLP-based Neural ODEs in ChemNODE.17 Compared to the six species plus temperature scalars learned there via seven unique MLP networks with 91 parameters each (according to standard MLP parameterizations, 637 total parameters), the current ChemKAN was able to learn the complete set of thermochemical scalars (nine species plus temperature) using a single, 344-parameter network. While training took place in two stages to decouple the kinetic and thermodynamic behavior and facilitate convergence, the final network remains a single cohesive structure with shared information across all nodes, eliminating the redundancies in repeated yet isolated 91-parameter MLPs.
Finally, regarding computational efficiency, we report that the average time to solve all 36 homogeneous reactor conditions in the Arrhenius.jl combustion solver package52 was 2× faster when switching from the detailed chemistry to the reduced ChemKAN framework. The total number of time steps in the integration process remains largely unchanged, as the ChemKAN is solving the same full-dimensional thermochemical state. The 2× speedup is predominantly achieved through faster gradient computation (i.e., each time step is faster to compute), facilitated by the ChemKAN's compression of 29 reactions into just three hidden nodes and a handful of sparse activations. By learning the relationship between the current thermochemical state and the chemical source terms, the ChemKAN is capable not only of predicting ignition delay times and homogeneous reactor solution profiles 2× faster than the detailed model, but also of generalizing to other simulation conditions when coupled to flow solvers, including simple laminar flames and complex 2-D and 3-D turbulent combustion conditions. Such downstream uses of similar surrogate machine learning models were discussed and tested previously,16,17 where a 2× speedup in the chemical solver (which is often the most computationally expensive component in a reacting flow simulation) implies the potential for substantial acceleration unlocked by ChemKANs while retaining the full-sized, detailed solution state vector. While slightly slower than the 2.3× speedup reported in ChemNODE,17 we reiterate that the ChemKAN solves for an additional three minor species (including the key H radical). A summarized comparison of ChemKAN and ChemNODE is provided in Table 1.
| | # of nets | # of params | Species modeled | Speed-up vs. true model |
|---|---|---|---|---|
| ChemNODE17 | 7 | 637 | H2, O2, H2O, N2, O, OH | 2.3× |
| ChemKAN (our work) | 1 | 344 | H2, O2, H2O, N2, O, OH, H, HO2, H2O2 | 2.0× |
We have additionally in this section compared a baseline ChemKAN implementation against a baseline ChemNODE implementation. Later works exist that appear to successfully combine ChemNODE with augmented loss functions, autoencoders, and latent space time stepping. The most recent, “Phy-ChemNODE”, includes all of these techniques.44 We do not draw comparisons between the current ChemKAN implementation and the larger-scale, combined-methodology results reported there, as our current aim is to compare the pure performance of ChemKAN against the MLPs that underlie both ChemNODEs and Phy-ChemNODEs. All further augmentations carried out in Phy-ChemNODE that go beyond this baseline can be replicated with ChemKANs, and an interesting target of future studies may be to quantify the performance gains of ChemKAN when applied in tandem with other advanced neural network structures.
In a second case, ChemKANs were demonstrated as efficient acceleration surrogates for learning chemical source terms in a hydrogen combustion case. A two-stage training process for the kinetic core and thermodynamic superstructure enabled a single, 344-parameter ChemKAN to accurately learn complete solution profiles across a range of hydrogen-air homogeneous reactor initial conditions, a significant reduction in parameter and network bulk compared to previous MLP-based neural ODE approaches that required 637 parameters to learn a truncated set of solution profiles. Timing comparisons against the detailed mechanism revealed a 2× speedup when using the ChemKAN surrogate model, which is significant for downstream applications of the hydrogen combustion surrogate learned here (for example, 3-D turbulent reacting flow). In summary, we find that ChemKANs are a promising tool for both dynamical system modeling and acceleration tasks in combustion chemistry. In doing so, we have also successfully advanced the underlying KAN-ODE framework to much larger, practical systems than had been studied previously. We hope that these promising preliminary case studies motivate future implementation of ChemKAN layers and modules in combustion and chemical kinetic machine learning applications.
Footnote
† B. C. K. and S. K. contributed equally to this work.
This journal is © the Owner Societies 2025