 Open Access Article
 Open Access Article
Sebastian Knoll *a, 
Klara Silberbc, 
Christopher A. Hone
*a, 
Klara Silberbc, 
Christopher A. Hone bc, 
C. Oliver Kappe
bc, 
C. Oliver Kappe bc, 
Martin Steinbergera and 
Martin Horna
bc, 
Martin Steinbergera and 
Martin Horna
aInstitute of Automation and Control, Graz University of Technology, Inffeldgasse 21b, 8010 Graz, Austria. E-mail: martin.horn@tugraz.at
bCenter for Continuous Synthesis and Processing (CCFLOW), Research Center Pharmaceutical Engineering GmbH (RCPE), Inffeldgasse 13, 8010 Graz, Austria. E-mail: christopher.hone@rcpe.at
cInstitute of Chemistry, University of Graz, NAWI Graz, Heinrichstrasse 28, 8010 Graz, Austria
First published on 5th September 2025
This paper introduces the neural tanks-in-series (NTiS) model, an extension of the traditional tanks-in-series (TiS) model using physics-guided neural networks (PGNNs). The NTiS model integrates physical principles with data-driven approaches to improve the accuracy and reliability of flow reactor modeling. The NTiS can optimize physical parameters and learn unmodeled dynamics while ensuring physically feasible predictions, even for out-of-domain predictions. The approach is validated using simulations and experimental data from a Paal–Knorr pyrrole reaction, demonstrating its capability to model flow reactor systems under varying conditions. The NTiS framework offers a new, robust, and flexible tool for advancing chemical flow reactor modeling.
Thereby, traditional modeling approaches, such as the tanks-in-series (TiS) model, the axial dispersion (AD) model but also computational fluid dynamics (CFD) have provided valuable insights into complex chemical processes.11 For instance, Sheng et al. explored the use of the TiS model to analyze and characterize non-ideal flow patterns. By applying the TiS model, the authors investigated various flow regimes and their impact on the residence time distribution (RTD) of a molten metal.12 Romero-Gomez et al. investigated the AD in a pressurized pipe under various flow conditions. The study focuses on understanding how different flow regimes affect the dispersion of solutes in the pipe. Using experimental data and computational models, the authors analyze the impact of flow velocity, pressure, and pipe geometry on the axial dispersion coefficient.13
However, these methods often require extensive computational resources or detailed knowledge of the underlying physical and chemical phenomena. In recent years, machine learning (ML) has emerged as a powerful complement to traditional modeling approaches in process modeling.14 Beyond neural networks, other surrogate modeling techniques such as Gaussian process regression,15,16 support vector machines17,18 and polynomial chaos expansions19 have been explored to approximate complex process behavior. By leveraging large datasets and advanced algorithms, these data-driven approaches can uncover patterns and relationships not easily captured by conventional models. Such methods hold great promise for accelerating innovation, reducing costs and improving the overall performance of chemical processes.20 This variety of surrogate models also introduces challenges in model training and generalization, which will be discussed in the following section.
Despite the advantages of ML for advanced chemical process modeling, many surrogate modeling techniques, including neural networks, can face challenges related to data availability. Neural networks in particular often require substantial amounts of data to achieve accurate results. To address this limitation, new methods such as physics-informed neural networks (PINNs) and physics-guided neural networks (PGNNs) have been developed.
In the case of PINNs, known physical relationships such as partial differential equations (PDEs) are incorporated into the neural network training by embedding them directly into the loss function as penalty terms. During training, the network not only minimizes the error between predictions and data but also strives to satisfy the PDE constraints. As a result, the model is guided towards more physically consistent and meaningful predictions. This approach enables the use of fewer data compared to traditional data-driven methods while maintaining high prediction accuracy.21 Moreover, it is possible to maintain parameters of the incorporated physical equations from the training of the PINN.
Thus, PINNs are extensively researched and attract significant interest. For instance, Ngo et al. leverage PINNs to incorporate physical laws directly into the training process of a neural network, enhancing the accuracy and reliability of a fixed-bed reactor model for catalytic CO2 methanation. The paper demonstrates that PINNs can effectively solve the governing equations of the reactor model and identify key parameters with high precision, even with limited data.22 Moreover, Zheng et al. developed a physics-informed recurrent neural network (PIRNN) that combines data and mechanistic models for nonlinear systems. They proved a generalization error bound and integrated it into a Lyapunov-based predictive control tested on a noisy chemical reactor. The hybrid model improved noise rejection and generalization over purely data-driven or physics-based controllers.23 Trávníková et al. present an approach using PINNs for modeling fluid dynamics in stirred tanks. By incorporating the physical laws into the training of the neural network, the models achieve higher accuracy and reliability in predicting flow patterns. The paper demonstrates the effectiveness of this approach through various simulations and experimental validations.24 Noteworthy is also the work of Alhajeri et al., who addressed overfitting in neural networks for chemical process modeling with noisy data. Using partially connected RNNs, they examined Gaussian and non-Gaussian noise and applied Monte Carlo dropout and co-teaching within a PINN framework. Tested on a large-scale Aspen Plus process, their method improved prediction accuracy and closed-loop control under a Lyapunov-based MPC.25 Wang and Wu proposed a foundation model for chemical reactor modeling that combines meta-learning with physics-informed fine-tuning. Pretrained on diverse reactor dynamics, it adapts quickly to new processes with minimal data while maintaining physical consistency. The model outperformed conventional approaches across multiple reactor types.26
Despite their advantages, PINNs and also traditional ML approaches often struggle with extrapolation beyond their training data. This limitation means that predictions for unseen data cannot always be guaranteed to be feasible or accurate and their out-of-domain behavior is in general not well understood.27,28
To overcome this limitation, PGNNs were developed. PGNNs integrate physical knowledge such as governing equations, constraints, or domain specific relationships directly into the neural network architecture and training process. This integration can take several forms, for example by adding extra input features derived from physical variables, by embedding physics based equations or constraints into the loss function, or by designing network structures that reflect known physical laws. In this way, PGNNs guide the learning toward physically consistent solutions, which allows training with less data than traditional neural networks. This approach also improves the ability of the model to generalize and extrapolate to unseen conditions. Due to these advantages, PGNNs are increasingly applied to enhance modeling and prediction in the chemical industry.
For instance, Gallup et al. focus on PGNNs to improve hybrid process modeling. Their developed framework simplifies PGNN techniques, speeding training and reducing data needs on reactor simulations. Physics guided loss functions and initialization enhance accuracy and consistency while transfer learning boosts convergence though misuse may hurt generalizability.29 Alongside, Muralidhar et al. presented PhyFlow, a novel physics-guided deep learning framework designed to generate interpretable 3D flow fields. By incorporating physical laws directly into the neural network architecture and emulating the projection method commonly used in CFD simulations, PhyFlow ensures that the generated flow fields comply with fundamental principles such as conservation of mass and momentum. The integration not only enhances the accuracy of the predictions but also provides a level of interpretability that is often lacking in purely data-driven models. The authors demonstrated the effectiveness of PhyFlow through various experiments, showing significant improvements in both accuracy and physical consistency compared to traditional deep learning approaches. The study highlights the potential of PGNNs in advancing the field of fluid dynamics modeling, particularly in scenarios where data is sparse or noisy.30 Additionally, Panjapornpon et al. introduced a PGNN model to enhance the prediction accuracy of acid–base treatments in dynamic tubular reactors. By integrating fundamental physical variables, such as residence time and hydroxide ion concentration, derived from reaction schematics and batch experimental data, the model effectively addressed challenges of high nonlinearity and limited data availability. The resulting PGNN demonstrated the potential for achieving high prediction accuracy.31 Moreover, Kircher et al. embedded physical knowledge into neural ODEs to learn reaction kinetics from integral reactor data. Building on their global reaction neural networks, they improved kinetic model discovery over standard neural ODEs. Applied to industrial reactors, including CO oxidation in hydrogen-rich streams, their method handles stiff systems and uses rich data for accurate kinetics.32
While PGNNs have demonstrated significant advancements in the chemical industry, their application to traditional models like the TiS or AD models remains limited. There is research conducted to improve those traditional models like Martin-Dominguez et al. who presents an improved version of the traditional TiS model by incorporating additional parameters to better capture the complexities of tracer tests in various flow systems. Besides the number of tanks, they also introduce a dead-space fraction and a bypassing fraction to account for stagnant zones and zones, where the flow circumvent the intended treatment. By refining the TiS model, the study aims to provide more accurate interpretations of tracer test data.33 Moreover, Shahin et al. extend the traditional TiS model to capture the dynamic behavior of solid oxide fuel cells (SOFC) with direct internal reforming. The study emphasizes the importance of modeling the interactions between reforming processes and electrochemical reactions within the TiS framework. Key findings demonstrate that the extended TiS model effectively predicts the SOFC system's transient responses and can be used to optimize its performance under varying conditions.34 Dutta et al. present a dynamic TiS model to simulate gas–liquid interactions in a trickle bed reactor designed for gas fermentation. Their extension incorporates dynamic behavior by introducing time-dependent variables and parameters to capture transient states. The model accounts for gas holdup, liquid holdup, and mass transfer between phases, providing a more accurate representation of the reactor's performance under varying operational conditions. The enhanced TiS model offers improved predictions for gas–liquid interactions, aiding in the optimization and scale-up of gas fermentation processes.35
Despite advancements in the TiS model, current approaches still struggle to accurately capture complex, nonlinear reactor dynamics, particularly under varying operating conditions or when extrapolating beyond training data. Additionally, purely data-driven models often lack physical consistency, limiting their reliability for process optimization and control. Combining PGNNs with the TiS model offers a powerful solution to these challenges. The TiS model is well-studied, well-established, and relatively simple, making it a robust foundation for such an extension. To the best of our knowledge, no PGNN-enhanced TiS models have been developed so far. Therefore, we propose a PGNN-enhanced TiS model that improves prediction accuracy by learning complex reactor behaviors while enforcing physical constraints to ensure physically feasible and robust predictions even in out-of-domain scenarios. This integration addresses key limitations of conventional TiS models and provides a more reliable, generalizable tool for advanced flow reactor modeling.
The paper is structured as follows: first, one can find the theoretical background on the traditional TiS model, the concepts of PGNNs and the training process of neural networks. Next, we introduce our neural tanks-in-series (NTiS) approach, detailing the necessary insights and derivations. This is followed by simulations to validate the proposed approach. Next, the NTiS approach is applied to experimental data in a case study. Herein, we demonstrate the applicability to real-world scenarios for enhancing the modeling and prediction of flow reactor systems. We show that the NTiS approach delivers refined predictions and can extrapolate to unseen data while ensuring that predictions remain within physically feasible bounds.
In Fig. 1 a schematic overview of the TiS model is shown. As one can see, a flow reactor is split in N perfectly mixed tanks whereby an inflowing species is propagating through each tank. The concentration of the i-th species within the j-th tank is represented by Cji(t) whereby t denotes the time.
The change of the i-th concentration within the j-th tank can be described by
|  | (1) | 
In the Laplace domain, the relationship between the concentration of the i-th species in the j-th tank and the (j − 1)-th tank is characterized by the transfer function
|  | (2) | 
In the equation, s denotes the Laplace variable, and τ represents the time constant of the low-pass filter for a constant flow rate q(t) = q, as defined by
|  | (3) | 
For the first tank, the inflowing concentration is the concentration flowing into the reactor system and thus C0i(t) = Cini(t).
The time constant τ has a significant influence on the behavior of how a species is flowing from one tank to the next. Thus, the time constant τ also affects, how the species is flowing through the entire reactor system. In Fig. 2 the step response of Hτ(s) for different time constants τ is depicted. The figure represents, how a species is flowing from one tank into the next when the concentration Cj−1i(t) is changed from 0 mol L−1 to 0.1 mol L−1 at time t = 0. This step is denoted with Cj−1i(t) = 0.1σ(t) mol L−1.
|  | ||
| Fig. 2 Step response of the system Hτ(s) describing the relation of the concentration Cj−1i(t) in the preceding tank j − 1 to the concentration Cji(t) in tank j for the i-th species. | ||
If the reaction rate rji is not neglected, the model accounts for additional concentration changes caused by chemical reactions within each tank. This introduces nonlinearities and a time-dependent behavior, as reaction rates are influenced by temperature and species concentrations. Incorporating these reaction terms transforms the system into a differential equation that captures both transport and reaction dynamics, as seen in eqn (1).
For a fully connected neural network ![[scr N, script letter N]](https://www.rsc.org/images/entities/char_e52d.gif) (z,θ) with the input z = md, NL layers, and the form
(z,θ) with the input z = md, NL layers, and the form
| ![[scr N, script letter N]](https://www.rsc.org/images/entities/i_char_e52d.gif) (z,θ) = hNL(WNL−1hNL−1(…W1h1(W0z + b0) + b1…) + bNL−1), | (4) | 
Before one can train the neural network ![[scr N, script letter N]](https://www.rsc.org/images/entities/char_e52d.gif) (z,θ), the structure of the network such as the number of layers NL, the activation functions hp and the number of neurons within each layer must be defined. A neuron is a node in the neural network that processes the input signals from the previous layer, applies the activation function, and produces an output which is forwarded to the next layer. After defining the structure, the parameters of the neural network
(z,θ), the structure of the network such as the number of layers NL, the activation functions hp and the number of neurons within each layer must be defined. A neuron is a node in the neural network that processes the input signals from the previous layer, applies the activation function, and produces an output which is forwarded to the next layer. After defining the structure, the parameters of the neural network ![[scr N, script letter N]](https://www.rsc.org/images/entities/char_e52d.gif) (z,θ) need to be initialized. It is common, to choose random values for the initial weights Wp and to set the initial bias values bp to zero for all p ∈ {0, 1,…, NL − 1}.
(z,θ) need to be initialized. It is common, to choose random values for the initial weights Wp and to set the initial bias values bp to zero for all p ∈ {0, 1,…, NL − 1}.
After having defined the structure of the neural network ![[scr N, script letter N]](https://www.rsc.org/images/entities/char_e52d.gif) and having initialized the parameters, one can train the neural network for the given set of input–output-samples. To do so, one can define the prediction loss
 and having initialized the parameters, one can train the neural network for the given set of input–output-samples. To do so, one can define the prediction loss
|  | (5) | 
![[scr N, script letter N]](https://www.rsc.org/images/entities/char_e52d.gif) (md,θ) and the real value nd for each sample d and the current network parameters θ. The regularization term mitigates overfitting by penalizing large weights Wp and bias bp values. The loss
(md,θ) and the real value nd for each sample d and the current network parameters θ. The regularization term mitigates overfitting by penalizing large weights Wp and bias bp values. The loss ![[script L]](https://www.rsc.org/images/entities/char_e144.gif) (θ) quantifies how well the network is performing on the current data.
(θ) quantifies how well the network is performing on the current data.
In a so called forward pass the loss ![[script L]](https://www.rsc.org/images/entities/char_e144.gif) (θ) for the current network parameters θ and all the given input–output-samples are calculated. For the optimization of the network parameters θ one usually makes use of a gradient based optimization such as the Adam optimization algorithm,36 stochastic gradient descent, or RMSprop.37 Thus, the gradient of the loss function
(θ) for the current network parameters θ and all the given input–output-samples are calculated. For the optimization of the network parameters θ one usually makes use of a gradient based optimization such as the Adam optimization algorithm,36 stochastic gradient descent, or RMSprop.37 Thus, the gradient of the loss function ![[script L]](https://www.rsc.org/images/entities/char_e144.gif) (θ) with respect to each weight Wp and bias value bp has to be determined. To determine those gradients, we perform a backward pass in which we propagate the gradients backward through the network to identify how each parameter contributed to the loss
(θ) with respect to each weight Wp and bias value bp has to be determined. To determine those gradients, we perform a backward pass in which we propagate the gradients backward through the network to identify how each parameter contributed to the loss ![[script L]](https://www.rsc.org/images/entities/char_e144.gif) (θ). After determining the gradients, one can update the network parameters using
(θ). After determining the gradients, one can update the network parameters using
| ![[small theta, Greek, macron]](https://www.rsc.org/images/entities/b_i_char_e0c9.gif) = θ − η∇ ![[script L]](https://www.rsc.org/images/entities/i_char_e144.gif) (θ). | (6) | 
Herein, η represents the learning rate, θ the current network parameters, ![[small theta, Greek, macron]](https://www.rsc.org/images/entities/b_i_char_e0c9.gif) the updated network parameters, and ∇
 the updated network parameters, and ∇![[script L]](https://www.rsc.org/images/entities/char_e144.gif) (θ) the gradient of the loss function with respect to the corresponding network parameters. The learning rate η is a crucial hyperparameter in the optimization process, determining the step size for updating the network parameters θ. A small learning rate η ensures stable convergence but may slow down the training, while a large learning rate η can speed up training but risks overshooting the optimal solution or introducing convergence issues.
(θ) the gradient of the loss function with respect to the corresponding network parameters. The learning rate η is a crucial hyperparameter in the optimization process, determining the step size for updating the network parameters θ. A small learning rate η ensures stable convergence but may slow down the training, while a large learning rate η can speed up training but risks overshooting the optimal solution or introducing convergence issues.
In addition to the network parameters θ, which are learned during training, there are hyperparameters, variables that are set prior to training and remain fixed throughout the learning process. Examples of hyperparameters include the number of layers NL, the activation functions hp, and the learning rate η. These hyperparameters have a significant impact on training dynamics and model performance. Therefore, while network parameters are optimized through training algorithms, hyperparameters are typically chosen empirically or found through separate tuning procedures or optimization methods.
A PGNN can be realized using different structures. In Fig. 3 three common structures of PGNNs are depicted. The structures are very similar as Daw et al. described in their paper using PGNN to model the temperature in a lake.38 In each structure, the input X is used to form a prediction Ỹ of the corresponding real output Y. To form this prediction, a neural network ![[scr N, script letter N]](https://www.rsc.org/images/entities/char_e52d.gif) and a physical model
 and a physical model ![[scr M, script letter M]](https://www.rsc.org/images/entities/char_e145.gif) is used in combination. In the figure, the first structure (structure a) shows a “hybrid input model”. For this structure, the input of the neural network
 is used in combination. In the figure, the first structure (structure a) shows a “hybrid input model”. For this structure, the input of the neural network ![[scr N, script letter N]](https://www.rsc.org/images/entities/char_e52d.gif) is the input X which is extended by the corresponding output of the physical model
 is the input X which is extended by the corresponding output of the physical model ![[scr M, script letter M]](https://www.rsc.org/images/entities/char_e145.gif) . The output of the neural network
. The output of the neural network ![[scr N, script letter N]](https://www.rsc.org/images/entities/char_e52d.gif) is the predicted overall output Ỹ. The second structure (structure b) shows a “residual model” for which the neural network
 is the predicted overall output Ỹ. The second structure (structure b) shows a “residual model” for which the neural network ![[scr N, script letter N]](https://www.rsc.org/images/entities/char_e52d.gif) predicts only residuals which cannot be covered by the physical model
 predicts only residuals which cannot be covered by the physical model ![[scr M, script letter M]](https://www.rsc.org/images/entities/char_e145.gif) . This might be due to the fact, that the physical model
. This might be due to the fact, that the physical model ![[scr M, script letter M]](https://www.rsc.org/images/entities/char_e145.gif) is not accurate enough and not all real-world effects are modeled. Thus the predicted output Ỹ is the sum of the predicted output of the physical model
 is not accurate enough and not all real-world effects are modeled. Thus the predicted output Ỹ is the sum of the predicted output of the physical model ![[scr M, script letter M]](https://www.rsc.org/images/entities/char_e145.gif) and the neural network
 and the neural network ![[scr N, script letter N]](https://www.rsc.org/images/entities/char_e52d.gif) . The third structure (structure c) shows a “hybrid input residual model” which is a combination of the two previous structures. The input of the neural network
. The third structure (structure c) shows a “hybrid input residual model” which is a combination of the two previous structures. The input of the neural network ![[scr N, script letter N]](https://www.rsc.org/images/entities/char_e52d.gif) is the input X in combination with the predicted output of the physical model
 is the input X in combination with the predicted output of the physical model ![[scr M, script letter M]](https://www.rsc.org/images/entities/char_e145.gif) . The neural network
. The neural network ![[scr N, script letter N]](https://www.rsc.org/images/entities/char_e52d.gif) predicts only residuals, which the physical model
 predicts only residuals, which the physical model ![[scr M, script letter M]](https://www.rsc.org/images/entities/char_e145.gif) cannot cover. Thus, the output of the neural network
 cannot cover. Thus, the output of the neural network ![[scr N, script letter N]](https://www.rsc.org/images/entities/char_e52d.gif) is added to the predicted output of a physical model
 is added to the predicted output of a physical model ![[scr M, script letter M]](https://www.rsc.org/images/entities/char_e145.gif) to form the final prediction Ỹ.
 to form the final prediction Ỹ.
In all the structures, the physical model ![[scr M, script letter M]](https://www.rsc.org/images/entities/char_e145.gif) typically relies on physical parameters, collectively denoted as Δ. Since these parameters may be unknown or only partially accurate, they can be estimated or refined by the PGNN. This refinement process is illustrated in Fig. 3 with a dashed arrow. During training, the PGNN simultaneously trains the neural network and the physical parameters Δ. By incorporating appropriate loss terms and constraints in the training process, it is ensured, that the refined parameters Δ remain within physically feasible bounds. Consequently, the refined parameters Δ can be reliably used for out-of-domain predictions, maintaining the physical validity of the outputs coming from the physical model
 typically relies on physical parameters, collectively denoted as Δ. Since these parameters may be unknown or only partially accurate, they can be estimated or refined by the PGNN. This refinement process is illustrated in Fig. 3 with a dashed arrow. During training, the PGNN simultaneously trains the neural network and the physical parameters Δ. By incorporating appropriate loss terms and constraints in the training process, it is ensured, that the refined parameters Δ remain within physically feasible bounds. Consequently, the refined parameters Δ can be reliably used for out-of-domain predictions, maintaining the physical validity of the outputs coming from the physical model ![[scr M, script letter M]](https://www.rsc.org/images/entities/char_e145.gif) .
.
![[scr M, script letter M]](https://www.rsc.org/images/entities/char_e145.gif) j and a neural network
j and a neural network ![[scr N, script letter N]](https://www.rsc.org/images/entities/char_e52d.gif) j. The physical model
j. The physical model ![[scr M, script letter M]](https://www.rsc.org/images/entities/char_e145.gif) j retains the functionality of the traditional model, accounting for delay effects and reactions within a tank. Meanwhile, the neural network
j retains the functionality of the traditional model, accounting for delay effects and reactions within a tank. Meanwhile, the neural network ![[scr N, script letter N]](https://www.rsc.org/images/entities/char_e52d.gif) j learns unmodeled dynamics and residuals Rjk, which are combined with the physical model's output before progressing to the next NT. Furthermore, the physical parameters Δj for the physical model
j learns unmodeled dynamics and residuals Rjk, which are combined with the physical model's output before progressing to the next NT. Furthermore, the physical parameters Δj for the physical model ![[scr M, script letter M]](https://www.rsc.org/images/entities/char_e145.gif) j are optimized within each NT, enhancing the overall modeling accuracy. To optimize the physical parameters Δj, the output layer of the neural network
j are optimized within each NT, enhancing the overall modeling accuracy. To optimize the physical parameters Δj, the output layer of the neural network ![[scr N, script letter N]](https://www.rsc.org/images/entities/char_e52d.gif) j is extended with additional outputs corresponding to these parameters. Each of these outputs is connected solely to additional bias values, with a linear activation function applied at the final layer. This setup ensures that each additional bias value directly represents a physical parameter. If it is desirable for the physical parameters Δj to vary based on the input, the output vector of the neural network can be extended with full connections to the preceding layers. This allows each NT j to have its own estimation of the physical parameters Δjk for each time sample k. In any case, the physical parameters are included in the set θ and are optimized during the training process of the neural network
j is extended with additional outputs corresponding to these parameters. Each of these outputs is connected solely to additional bias values, with a linear activation function applied at the final layer. This setup ensures that each additional bias value directly represents a physical parameter. If it is desirable for the physical parameters Δj to vary based on the input, the output vector of the neural network can be extended with full connections to the preceding layers. This allows each NT j to have its own estimation of the physical parameters Δjk for each time sample k. In any case, the physical parameters are included in the set θ and are optimized during the training process of the neural network ![[scr N, script letter N]](https://www.rsc.org/images/entities/char_e52d.gif) j. Consequently, training the PGNN is equivalent to training all the neural networks
j. Consequently, training the PGNN is equivalent to training all the neural networks ![[scr N, script letter N]](https://www.rsc.org/images/entities/char_e52d.gif) j. In the figure, dashed arrows illustrate this approach, connecting the neural networks
j. In the figure, dashed arrows illustrate this approach, connecting the neural networks ![[scr N, script letter N]](https://www.rsc.org/images/entities/char_e52d.gif) j to the physical model
j to the physical model ![[scr M, script letter M]](https://www.rsc.org/images/entities/char_e145.gif) j.
j.
As discussed in the theoretical background for PGNN (section 2.3), all weights, including the physical parameters, are optimized simultaneously during training. During training, the predictions for the physical parameters Δi are constrained by an additional loss term, ensuring that each parameter remains within physically feasible bounds.
In most reactor systems, the physical effects of the reactor remains consistent throughout its entirety. Therefore, we assume that each NT exhibits identical physical effects. Consequently, one can state that ![[scr M, script letter M]](https://www.rsc.org/images/entities/char_e145.gif) j =
j = ![[scr M, script letter M]](https://www.rsc.org/images/entities/char_e145.gif) for all j ∈ {1,…, N}, and the estimates for the physical parameters Δj should be uniform across all NTs j. However, this assumption can generally be relaxed, allowing our NTiS approach to accommodate varying physical parameters within each NT. In combination with the fact, that all neural networks
 for all j ∈ {1,…, N}, and the estimates for the physical parameters Δj should be uniform across all NTs j. However, this assumption can generally be relaxed, allowing our NTiS approach to accommodate varying physical parameters within each NT. In combination with the fact, that all neural networks ![[scr N, script letter N]](https://www.rsc.org/images/entities/char_e52d.gif) j are trained simultaneously, all neural networks
j are trained simultaneously, all neural networks ![[scr N, script letter N]](https://www.rsc.org/images/entities/char_e52d.gif) j can be combined into a single global neural network
j can be combined into a single global neural network ![[scr N, script letter N]](https://www.rsc.org/images/entities/char_e52d.gif) . This further simplifies the structure and reduces the overall training complexity of the NTiS. The global neural network
. This further simplifies the structure and reduces the overall training complexity of the NTiS. The global neural network ![[scr N, script letter N]](https://www.rsc.org/images/entities/char_e52d.gif) can be designed, to provide a single prediction for the physical parameters Δ and predictions for the individual residuals Rjk within each NT j. The predictions for the physical parameters Δ are shared across all NTs, while the individual residual estimates Rjk are associated with their respective NTs. These residual estimates are processed through a set of radial basis functions, ensuring that the residuals are forced to zero when predicting out-of-domain data. This approach guarantees that the predictions remain physically feasible, even for unseen data.
 can be designed, to provide a single prediction for the physical parameters Δ and predictions for the individual residuals Rjk within each NT j. The predictions for the physical parameters Δ are shared across all NTs, while the individual residual estimates Rjk are associated with their respective NTs. These residual estimates are processed through a set of radial basis functions, ensuring that the residuals are forced to zero when predicting out-of-domain data. This approach guarantees that the predictions remain physically feasible, even for unseen data.
In Fig. 5 the refined architecture of the NTiS along with a NT in detail is depicted. To improve the predictions of the neural network ![[scr N, script letter N]](https://www.rsc.org/images/entities/char_e52d.gif) , previous input samples
, previous input samples  , up to a configurable number Nd, are provided as input. To facilitate this, a storage block was introduced before the neural network
, up to a configurable number Nd, are provided as input. To facilitate this, a storage block was introduced before the neural network ![[scr N, script letter N]](https://www.rsc.org/images/entities/char_e52d.gif) . Within the details of a NT, one can see, that the physical model
. Within the details of a NT, one can see, that the physical model ![[scr M, script letter M]](https://www.rsc.org/images/entities/char_e145.gif) j utilize the predictions of the physical parameters Δ. Before the prediction is passed to the next NT, the corresponding residuals Rjk are added. The NT depicted in the overview differs slightly from the NT in the schematic shown in Fig. 4, as the neural network is not integrated within the NT. Nevertheless, we will retain this terminology to differentiate it from a traditional tank of the tanks-in-series model. The final predicted output Ỹk of the NTiS is the last output XNk or parts of it. Consequently, the relation Ỹk = g(XNk) holds where g represents a function which maps the last output XNk to the target structure of the output samples Yk, which may correspond solely to the outflowing concentrations CNi,k of the final NT, and therefore, of the entire system.
j utilize the predictions of the physical parameters Δ. Before the prediction is passed to the next NT, the corresponding residuals Rjk are added. The NT depicted in the overview differs slightly from the NT in the schematic shown in Fig. 4, as the neural network is not integrated within the NT. Nevertheless, we will retain this terminology to differentiate it from a traditional tank of the tanks-in-series model. The final predicted output Ỹk of the NTiS is the last output XNk or parts of it. Consequently, the relation Ỹk = g(XNk) holds where g represents a function which maps the last output XNk to the target structure of the output samples Yk, which may correspond solely to the outflowing concentrations CNi,k of the final NT, and therefore, of the entire system.
When we assume, that there are P species flowing into the system with a total flow rate qk of the solute and a temperature ϑk, then the input Xk to the NTiS, can be defines as a combination of the inflowing concentrations C0i,k = C(in)i,k for all i ∈ {1, 2,…, P}, the total flow rate qk and the temperature ϑk. In general, the input to the j-th NT can be written as
|  | (7) | 
![[scr M, script letter M]](https://www.rsc.org/images/entities/h3_char_e145.gif) j
j![[scr M, script letter M]](https://www.rsc.org/images/entities/char_e145.gif) j within each NT j captures the transport effects from one NT to the next and captures changes due to reactions for each concentration Ci(t). To account for all these effects, one can discretize eqn (1) using the forward Euler method and obtain
j within each NT j captures the transport effects from one NT to the next and captures changes due to reactions for each concentration Ci(t). To account for all these effects, one can discretize eqn (1) using the forward Euler method and obtain|  | (8) | 
 is representative for the time constant of the low-pass filter. In real-world scenarios, discrepancies may arise between the theoretical time constant τk and the effective time constant
 is representative for the time constant of the low-pass filter. In real-world scenarios, discrepancies may arise between the theoretical time constant τk and the effective time constant ![[small tau, Greek, macron]](https://www.rsc.org/images/entities/i_char_e0d4.gif) k due to unmodeled dynamics. To account for such discrepancies, an adjustment factor δτ, which will be part of the physical parameters Δ, is introduced such that
k due to unmodeled dynamics. To account for such discrepancies, an adjustment factor δτ, which will be part of the physical parameters Δ, is introduced such that ![[small tau, Greek, macron]](https://www.rsc.org/images/entities/i_char_e0d4.gif) k = δττk.
k = δττk.
For the reaction rate rji,k, in general, any law can be used. One might assume, that the reaction follows the concept of the Arrhenius equation, which relates the reaction rate to temperature and activation energy.39 The reaction rate forming species i in tank j, while consuming a set of species ![[scr R, script letter R]](https://www.rsc.org/images/entities/char_e531.gif) , is given by
, is given by
|  | (9) | 
In the equation, the parameters Ai and Ei are reaction parameters which are in general not precisely known. Thus, the NTiS can refine those parameters by incorporating offsets δAi and δEi which will also be part of the physical parameters Δ. The discretized reaction rate within the j-th NT can thus be rewritten to
|  | (10) | 
In the equation, Āi and Ēi denote the nominal reaction parameters, R the ideal gas constant and ϑk the temperature.
When assuming, that Xj−1k is the input and ![[X with combining circumflex]](https://www.rsc.org/images/entities/b_i_char_0058_0302.gif) jk the output to the physical model
jk the output to the physical model ![[scr M, script letter M]](https://www.rsc.org/images/entities/char_e145.gif) j, the model can be defined as
j, the model can be defined as
|  | (11) | 
Thereby, eqn (8) is applied to each concentration Cji,k for all i ∈ {1, 2,…, P} and the additional input parameters Ωj−1k are forwarded without any changes. In eqn (11), the individual reaction rates for each species are collected in the vector ξ and is thus of size P × 1.
When applying the physical model ![[scr M, script letter M]](https://www.rsc.org/images/entities/char_e145.gif) j to the input Xj−1k, we abbreviate it with the notation
j to the input Xj−1k, we abbreviate it with the notation
|  | (12) | 
![[scr T, script letter T]](https://www.rsc.org/images/entities/h3_char_e533.gif) and NTiS model
 and NTiS model ![[capital script C]](https://www.rsc.org/images/entities/h3_char_e522.gif)
![[scr T, script letter T]](https://www.rsc.org/images/entities/char_e533.gif) for a NT whereby we apply the physical model
 for a NT whereby we apply the physical model ![[scr M, script letter M]](https://www.rsc.org/images/entities/char_e145.gif) j to an input Xj−1k and, before passing the output to the next NT, the corresponding residuals Rjk predicted by the neural network
j to an input Xj−1k and, before passing the output to the next NT, the corresponding residuals Rjk predicted by the neural network ![[scr N, script letter N]](https://www.rsc.org/images/entities/char_e52d.gif) are added. Thus, the NT model can be defined as
 are added. Thus, the NT model can be defined as|  | (13) | 
When all residual terms Rjk are collected in the vector
|  | (14) | 
| Xjk = ![[scr T, script letter T]](https://www.rsc.org/images/entities/char_e533.gif) (Xj−1k, Δ, Γk). | (15) | 
Having defined an individual model for a NT, and assuming, that the measured output consists of the measured output concentrations like Yk = [C(out)1,k C(out)2,k … C(out)P,k]T, one can also define an overall model for the entire series of NTs as
|  | (16) | 
In the equation, the operator ∘N represents the N-fold repeated application of the function ![[scr T, script letter T]](https://www.rsc.org/images/entities/char_e533.gif) . The abbreviated notation for the repeated application of the NT model can be stated as
. The abbreviated notation for the repeated application of the NT model can be stated as
| Ỹk = ![[capital script C]](https://www.rsc.org/images/entities/i_char_e522.gif) (X0k, Δ, Γk). | (17) | 
![[scr N, script letter N]](https://www.rsc.org/images/entities/b_char_e52d.gif) . The neural network
. The neural network ![[scr N, script letter N]](https://www.rsc.org/images/entities/char_e52d.gif) of the NTiS can be realized using various architectures, such as convolutional neural networks (CNNs), recurrent neural networks (RNNs), or a fully connected neural network. The structure such as the number of hidden layers and neurons per layer can be customized by the user to suit the complexity of the underlying system.
 of the NTiS can be realized using various architectures, such as convolutional neural networks (CNNs), recurrent neural networks (RNNs), or a fully connected neural network. The structure such as the number of hidden layers and neurons per layer can be customized by the user to suit the complexity of the underlying system.As seen in Fig. 5, for the input zk to the neural network ![[scr N, script letter N]](https://www.rsc.org/images/entities/char_e52d.gif) , we combine the current input to the NTiS X0k with previous inputs
, we combine the current input to the NTiS X0k with previous inputs  up to a definable number Nd. This approach can increase the prediction accuracy as the neural network
 up to a definable number Nd. This approach can increase the prediction accuracy as the neural network ![[scr N, script letter N]](https://www.rsc.org/images/entities/char_e52d.gif) gains knowledge of previous inputs to the system. The input zk to the neural network
 gains knowledge of previous inputs to the system. The input zk to the neural network ![[scr N, script letter N]](https://www.rsc.org/images/entities/char_e52d.gif) can thus be defined as
 can thus be defined as
|  | (18) | 
For the output yk = ![[scr N, script letter N]](https://www.rsc.org/images/entities/char_e52d.gif) (zk,θ), we want to have a combination of the physical parameters Δ and the residual terms Γk. The output of the neural network
(zk,θ), we want to have a combination of the physical parameters Δ and the residual terms Γk. The output of the neural network ![[scr N, script letter N]](https://www.rsc.org/images/entities/char_e52d.gif) thus read as
 thus read as
|  | (19) | 
As described in section 3.1.1, the physical parameters Δ are obtained by extending the output layer of the neural network ![[scr N, script letter N]](https://www.rsc.org/images/entities/char_e52d.gif) with an appropriate number of additional bias terms. This approach enables training the neural network
 with an appropriate number of additional bias terms. This approach enables training the neural network ![[scr N, script letter N]](https://www.rsc.org/images/entities/char_e52d.gif) while simultaneously optimizing the bias terms for the physical parameters Δ.
 while simultaneously optimizing the bias terms for the physical parameters Δ.
For the training of the neural network ![[scr N, script letter N]](https://www.rsc.org/images/entities/char_e52d.gif) we make use of a gradient based optimization. Thereby, a defined loss
 we make use of a gradient based optimization. Thereby, a defined loss ![[script L]](https://www.rsc.org/images/entities/char_e144.gif) is minimized to optimize all the weights Wp and bias values bp which are collectively denoted in the network parameters θ.
 is minimized to optimize all the weights Wp and bias values bp which are collectively denoted in the network parameters θ.
When we assume, that the neural network ![[scr N, script letter N]](https://www.rsc.org/images/entities/char_e52d.gif) has NL layers and that each time sample can be used for the training, one can define a cost
 has NL layers and that each time sample can be used for the training, one can define a cost ![[script L]](https://www.rsc.org/images/entities/char_e144.gif) (θ), similar as stated in eqn (5), as
(θ), similar as stated in eqn (5), as
|  | (20) | 
In the cost ![[script L]](https://www.rsc.org/images/entities/char_e144.gif) (θ), an additional loss term
(θ), an additional loss term ![[script L]](https://www.rsc.org/images/entities/char_e144.gif) Δ(θ) is added to ensure, that the refined parameters Δ remain within physically feasible bounds. The cost term can be, for example, of a cost term similar to
Δ(θ) is added to ensure, that the refined parameters Δ remain within physically feasible bounds. The cost term can be, for example, of a cost term similar to
|  | (21) | 
For the training process, we have to calculate the gradient of the loss ![[script L]](https://www.rsc.org/images/entities/char_e144.gif) (θ) with respect to all the network parameters in θ. For the penalization term one can find, that
(θ) with respect to all the network parameters in θ. For the penalization term one can find, that
|  | (22) | 
|  | (23) | 
 and
 and  depend on the structure of the neural network
 depend on the structure of the neural network ![[scr N, script letter N]](https://www.rsc.org/images/entities/char_e52d.gif) . The according derivations are calculated within the training process of the neural network itself. All derivations utilized in the implementation are detailed in Appendix A.
. The according derivations are calculated within the training process of the neural network itself. All derivations utilized in the implementation are detailed in Appendix A.
In Fig. 6, the structure of the NTiS implementation in Matlab is presented as a single comprehensive neural network. The neural network ![[scr N, script letter N]](https://www.rsc.org/images/entities/char_e52d.gif) features a custom parameter layer that predicts the physical parameters Δ. In parallel, a user-defined neural network, such as a fully connected neural network, predicts the residuals Γk. The prediction of the residuals Γk is processed through a layer of radial basis functions before being forwarded. The input to the NTiS in Matlab includes the current input X0k and previous inputs
 features a custom parameter layer that predicts the physical parameters Δ. In parallel, a user-defined neural network, such as a fully connected neural network, predicts the residuals Γk. The prediction of the residuals Γk is processed through a layer of radial basis functions before being forwarded. The input to the NTiS in Matlab includes the current input X0k and previous inputs  , up to a configurable number Nd. A reduction layer selects only the most recent input X0k, which is then combined with the physical parameters Δ and residual predictions Γk in a concatenation layer. This forms a unified input vector for the subsequent custom NTiS layers, ensuring that each NTiS layer has access to all relevant information while maintaining a single input–output structure. A single input–output structure simplifies implementation in Matlab, as it aligns well with its computational framework. Within a custom NTiS layer, the output concentrations Πjk are updated based on the input concentrations Πj−1k, additional information Ωjk, physical parameters Δ, and residual predictions Γk. Finally, a custom output layer ensures that only the final output concentrations ΠNk are forwarded.
, up to a configurable number Nd. A reduction layer selects only the most recent input X0k, which is then combined with the physical parameters Δ and residual predictions Γk in a concatenation layer. This forms a unified input vector for the subsequent custom NTiS layers, ensuring that each NTiS layer has access to all relevant information while maintaining a single input–output structure. A single input–output structure simplifies implementation in Matlab, as it aligns well with its computational framework. Within a custom NTiS layer, the output concentrations Πjk are updated based on the input concentrations Πj−1k, additional information Ωjk, physical parameters Δ, and residual predictions Γk. Finally, a custom output layer ensures that only the final output concentrations ΠNk are forwarded.
Alongside, we implemented the NTiS model. In contrast to the traditional TiS setup, we intentionally altered the NTiS parameters so that the resulting time constant τk and the reaction parameters A3 and E3 differed from those of the TiS model used as ground-truth. As a result, the set of physical parameters Δ in the NTiS comprises the adjustment factor δτ and the offset parameters δA3 and δE3. The latter allow refinement of the pre-exponential factor and activation energy, respectively, while δτ accounts for variations in the reactor parameters that cause discrepancies between the theoretical time constant τk and the effective time constant ![[small tau, Greek, macron]](https://www.rsc.org/images/entities/i_char_e0d4.gif) k.
k.
For the neural network of the NTiS, we used a shallow, fully connected neural network with one hidden layer consisting of 20 neurons. The hyperparameters, such as the number of neurons per layer and the activation function, were determined empirically through manual tuning. In general, it is possible to apply hyperparameter optimization techniques to systematically explore and optimize these structural aspects. The neural network used only the most recent time sample as its input allowing the NTiS model to be set up without the custom reduction layer. For the training of the NTiS, we used the simulated output data from the traditional TiS together with the corresponding inputs. These simulated outputs serve as a stand-in for potential measurements from a real flow reactor, effectively mimicking the actual system response to the given inputs.
During the training process, we showed that the physical adjustment parameters δτ, δA3, and δE3 were updated by the neural network such that the resulting time constant τk and the reaction parameters A3 and E3 closely matched those of the ground truth TiS setup. The adjustment parameter δτ and the resulting reaction parameters A3 and E3 for the ground truth TiS setup, the initial NTiS model, and the optimized NTiS model can be found in Table 1. The residuals of the NTiS within each NT remained nearly zero, as the predictions were already enhanced by adjusting the physical parameters.
| Model | A3 | E3 | δτ | 
|---|---|---|---|
| Ground truth TiS setup | 10.00 | 15 ![[thin space (1/6-em)]](https://www.rsc.org/images/entities/char_2009.gif) 000 | 1.20 | 
| Optimized parameters of the NTiS model | 9.97 | 14 ![[thin space (1/6-em)]](https://www.rsc.org/images/entities/char_2009.gif) 982 | 1.19 | 
| Initial parameters of the NTiS model | 12.00 | 13 ![[thin space (1/6-em)]](https://www.rsc.org/images/entities/char_2009.gif) 000 | 1.00 | 
In Fig. 7, the traces for the temperature ϑ(t), the total flow rate q(t), and the input concentrations C1(t) and C2(t) are shown. The input concentration C3(t) was fixed at zero throughout the experiment. Additionally, the output concentration C3(t) of species 3, which is formed within the flow reactor, is depicted for three cases: the simulated output of the traditional TiS model and thus representing the ground truth, the initial prediction of the NTiS with parameters containing deviations, and the prediction of the NTiS after the training process. One can see, that the initial prediction of the NTiS shows a significant difference in the shape of the outflowing concentration C3(t) compared to the actual output. However, after training, the NTiS achieves a highly accurate estimate, demonstrating that its learning process effectively can predict the underlying physical parameters. This trend is also seen, when calculating the mean square error (MSE), denoted as ζ, across all three output predictions for both the initial and optimized predictions of the NTiS. The MSE ζ is defined as
|  | (24) | 
|  | ||
| Fig. 7 Reactor simulation showing input variables and output C3(t) for the traditional tanks-in-series model, the initial prediction of the NTiS, and the prediction of the trained NTiS. | ||
As seen, in the first simulation, the NTiS mainly focused on adjusting the physical parameters (δτ, δA3, and δE3) while keeping residuals exceptionally small. The minimal residuals confirm that the NTiS does not heavily depend on neural network corrections for parameter deviations. Instead, it achieves high accuracy by aligning its physical parameters with those of the real system which is represented by the traditional TiS model. This highlights the robustness of the NTiS framework, where the neural network component serves as a supplementary mechanism, activating only when necessary. The limited need for residual corrections shows that the NTiS captures the core system dynamics effectively through parameter adjustments, ensuring interpretability and precise predictions.
In Table 2, the adjustment parameter δτ and the resulting reaction parameters A3 and E3 are reported for the ground truth simulation with a flow-rate-dependent offset, as well as for the initial and optimized NTiS model. The results illustrate how the NTiS training successfully recovers the true physical parameters despite starting from altered initial values. Moreover, as shown in the corresponding figure, the offset dependent on the flow rate q(t) is accurately reconstructed by the NTiS, demonstrating its capability to capture systematic effects not explicitly modeled in the traditional TiS model. This improvement is also evident when comparing the mean squared error (MSE) across all three predictions: the initial NTiS prediction yields ζinitial = 1483 × 10−6, which is reduced to ζTrained = 17 × 10−6 after training, highlighting the effectiveness of the parameter adaptation and residual learning in the NTiS approach.
| Model | A3 | E3 | δτ | 
|---|---|---|---|
| Ground truth simulation (with offsets) | 10.00 | 15 ![[thin space (1/6-em)]](https://www.rsc.org/images/entities/char_2009.gif) 000 | 1.20 | 
| Optimized parameters of the NTiS model | 9.89 | 15 ![[thin space (1/6-em)]](https://www.rsc.org/images/entities/char_2009.gif) 072 | 1.18 | 
| Initial parameters of the NTiS model | 12.00 | 13 ![[thin space (1/6-em)]](https://www.rsc.org/images/entities/char_2009.gif) 000 | 1.00 | 
As seen in the second simulation, the NTiS model demonstrated its ability to learn unmodeled dynamics, such as the flow-rate-dependent offset, while still adjusting the physical parameters. The model focused on updating δτ, δA3, and δE3, while also accounting for the flow-rate-dependent offset. The NTiS successfully captured the offset caused by variations in the flow rate q(t). The model's predictions closely matched the simulated output, showing its ability to adjust both physical parameters and unmodeled dynamics. This ability to incorporate both physical parameter updates and the flow-rate-dependent offset highlights the NTiS's robustness, as it can handle complex effects that the traditional TiS model cannot model directly.
Overall, the simulation results clearly demonstrate the effectiveness of the NTiS model in accurately predicting flow reactor outputs by jointly training both physical parameters and neural network weights. This hybrid learning approach enables the NTiS to leverage domain knowledge embedded in the TiS structure while flexibly capturing unmodeled dynamics through residual learning. By balancing updates to the physical parameters with targeted residual estimations, the NTiS can not only recover true underlying process parameters but also adapt to systematic deviations.
In the experiment, we applied different flow rates and temperatures to the flow reactor. The selected temperature ϑ(t) and total flow rate q(t) profiles, as well as the resulting input concentrations of the reactants (species 2 and 3), are shown in Fig. 10.
During the experiment, the output concentrations of the product and the remaining 2,5-hexanedione were measured and used to train a NTiS model. For the NTiS model we set the number of NTs to 50. Similar as for the simulations, the physical parameters Δ comprise the adjustment factor δτ, as well as the offset parameters δA3 and δE3. The nominal reaction parameters Ā4, Ē4 were selected to lie within a physically meaningful range, and the reactor volume V was adopted from the actual reactor configuration.
For the neural network of the NTiS, we employed a shallow architecture with one hidden layer containing 40 neurons. The network was trained only on the first 4.5 hours of data, while the remaining data beyond this point was kept completely unseen for testing. The hyperparameters of the network and the training process were determined empirically through manual tuning, and no hyperparameter optimization was performed.
Fig. 11 compares the measured output concentrations with the initial NTiS predictions obtained using the initial parameters. During training, the NTiS refined the physical parameters and learned residuals, capturing both modeled and unmodeled effects. Consequently, the trained NTiS predictions shown in the figure demonstrate excellent agreement with the measured data.
|  | ||
| Fig. 11 Predictions for the Paal–Knorr pyrrole reaction. Data to the left of the border was used to train the NTiS and the data-driven neural network while the data to the right are used for test. | ||
Additionally, predictions for out-of-domain data, where the residual contributions of the neural network within the NTiS were set to zero, are included and labeled as “PBM only”. Those results are equivalent to a traditional TiS model with optimized physical parameters. The figure shows that while the out-of-domain predictions are an improvement over the initial predictions, the predictions incorporating residuals are slightly more accurate. This indicates that the primary improvements stem from adjustments to the physical parameters, with the residuals contributing only a minor additional boost in accuracy for this experiment.
To further illustrate the behavior in out-of-domain scenarios, we trained a purely data-driven neural network with two hidden layers of 60 neurons each. The neural network was trained exclusively on the first 4.5 hours of data and subsequently applied to estimate the full time series of output concentrations. As shown in the figure, the data-driven neural network achieves excellent agreement within the training domain but exhibits an increasing offset once applied to out-of-domain data. In contrast, both the trained NTiS and the PBM-only predictions maintain physically consistent behavior and do not show such deviations. These results emphasize that while a purely data-driven model can fit in-domain data well, its lack of physical constraints can limit its reliability for extrapolation.
This improvement is further confirmed by the MSE analysis for each individual output shown in Fig. 12. The initial estimates yield the highest MSE, followed by the data-driven neural network and then the PBM-only case. The latter represents out-of-domain predictions or a traditional TiS model with optimized parameters. The trained NTiS consistently achieves the lowest MSE values.
The results of the case study highlight the improved performance of the NTiS compared to the traditional TiS. The NTiS ensures reliable out-of-domain predictions, which purely data-driven methods may not achieve, as illustrated by a neural network that showed deviations on unseen data.
Through simulations, we demonstrated that NTiS can refine physical parameters and learn residuals, enabling accurate modeling of complex flow reactor systems. The case study on the Paal–Knorr pyrrole reaction further confirmed its capability to predict real-world reactor behavior under varying conditions. NTiS not only enhances prediction accuracy but also provides reliable estimates when extrapolating beyond the training data.
It should be noted, however, that NTiS relies on the TiS structure for modeling flow reactors. If the assumed topology does not match the actual system, parameter identification may be affected, as the neural components may compensate for structural mismatches. Nevertheless, the NTiS concept can be extended to other reactor types by adapting the underlying physical model, creating opportunities for future studies.
Overall, NTiS provides a robust and flexible framework for enhancing flow reactor modeling. By merging mechanistic understanding with data-driven flexibility, it delivers accurate and physically consistent predictions under varying conditions and out-of-domain scenarios. This makes it particularly suited for flow chemistry applications, digital twins, and PAT-integrated process automation, supporting improved process optimization, real-time decision-making, and accelerated design and scale-up of chemical processes. Its combination of reliability, interpretability, and adaptability underscores its potential for broad impact in both research and industrial settings.
The entire implementation for this study, including the MATLAB implementation, experimental data, source code, example scripts, and documentation, is available from Zenodo. The repository data includes: the full MATLAB implementation of the NTiS concept with examples and the case study, example scripts for reproducing the results in this paper, and comprehensive documentation and user guides for the implementation and examples. See DOI: https://doi.org/10.5281/zenodo.15730195. Additionally, the MATLAB implementation is accessible on GitHub: https://github.com/SKenb/NeuralTanksInSeries/.
To ensure generality, we consider the possibility of time-varying physical parameters Δjk and allow each NT j to have its own unique set of parameters. To capture this variability, we use the subscript k to indicate time dependency and the superscript j to denote the specific NT.
Fig. 13 illustrates the generalized structure of an NT, whereby the inputs include Xj−1k, Δjk, and Rjk, while the output is Xjk. As an NT does not contain internal weights and biases, and when following the derivatives stated in eqn (23), one can find, that the focus is solely on determining
|  | (25) | 
|  | (26) | 
Thereby,
|  | (27) | 
|  | (28) | 
As the output Πjk of a NT depends only the previous input (Xj−1k−1) and not on the current input Xj−1k, the derivatives  and
 and  will be zero. This is because the current input Xj−1k does not directly influence Πjk. Finally, one can summarize that
 will be zero. This is because the current input Xj−1k does not directly influence Πjk. Finally, one can summarize that
|  | (29) | 
![[scr M, script letter M]](https://www.rsc.org/images/entities/char_e145.gif) j with one reaction forming species i, the physical parameters Δjk can be defined as the combination of the delay adjustment δjτ,k, the offset for the pre-exponential factor
j with one reaction forming species i, the physical parameters Δjk can be defined as the combination of the delay adjustment δjτ,k, the offset for the pre-exponential factor  and the offset for the activation energy
 and the offset for the activation energy  like
 like|  | (30) | 
This is no general limitation, as the physical model ![[scr M, script letter M]](https://www.rsc.org/images/entities/char_e145.gif) j can include as many reactions as required. More reactions would require to extend the physical parameter vector Δ accordingly.
j can include as many reactions as required. More reactions would require to extend the physical parameter vector Δ accordingly.
For the derivative with respect to the physical parameters Δjk, one can find
|  | (31) | 
|  | (32) | 
For the three derivations, one can find
|  | (33) | 
When the input Δjk is identical across all time steps k and tanks j, the gradient calculation for each tank and time step,  , remains unchanged.
, remains unchanged.
Whenever the input Δjk = Δ is shared across all NTs, the overall gradient with respect to Δ is aggregated as the mean of all individual gradients:  , where N is the total number of tanks and time steps. This averaging ensures the collective contribution from all tanks and time steps is accounted for consistently, enabling balanced updates during optimization.
, where N is the total number of tanks and time steps. This averaging ensures the collective contribution from all tanks and time steps is accounted for consistently, enabling balanced updates during optimization.
|  | (34) | 
| This journal is © The Royal Society of Chemistry 2025 |