Krishna Gopal Sharma,a Niket S. Kaisare b and Himanshu Goyal *b
aDepartment of Computer Science and Engineering, Indian Institute of Technology Madras, Chennai, Tamil Nadu 600036, India
bDepartment of Chemical Engineering, Indian Institute of Technology Madras, Chennai, Tamil Nadu 600036, India. E-mail: goyal@iitm.ac.in
First published on 17th November 2021
Detailed chemical kinetic models involving hundreds of species and thousands of reactions have recently been developed for biomass thermochemical conversion. The high computational cost of these kinetic models makes them impractical even for simple reactor geometries. In this work, we develop a recurrent neural network (RNN) model for the secondary gas-phase reactions of biomass gasification in an inert environment in the temperature range of 800–1000 °C. A gated recurrent unit (GRU) based RNN architecture is used to ensure accurate predictions over the entire range of time in the reactor. A compact kinetic model reduced from a detailed kinetic scheme using an automated reduction algorithm is employed as the reference kinetic scheme for the gas-phase reactions. A comprehensive range of biomass compositions and reactor conditions are used to generate the training data ensuring a wide range of the model applicability. The developed GRU-based RNN model can predict the temporal evolution of important reactants and products during biomass gasification in the freeboard region of a fluidized bed reactor. The model reduces the computational cost associated with the reference kinetic scheme by four orders of magnitude.
Models for biomass thermochemical conversion range from detailed but computationally expensive computational fluid dynamics (CFD) simulations3 to computationally fast ideal reactor models.4 CFD simulations are necessary to understand the coupling between the reactor flow dynamics and chemical processes. However, CFD simulations become too expensive for large-scale applications, where ideal reactor models are preferred.4 Ideal reactor models assume limiting cases of the transport processes, for example, complete mixing in a continuously stirred tank reactor (CSTR) and no axial mixing in a plug flow reactor (PFR). In some cases, insights gained from CFD simulations are combined with ideal reactors.5 An adequate level of modeling is selected depending on the application and the details required.
Biomass gasification takes place at relatively high temperatures (∼800–1000 °C), where the solid biomass produces intermediate or primary products, which are then converted to gases (e.g., CO and H2) and tar species (e.g., phenol and naphthalene). For biomass gasification modeling, a common assumption is to model the bubbling bed and freeboard regions of the fluidized bed as a CSTR and a PFR, respectively.4–7 The accuracy of these models relies on adequately representing the conversion chemistry involving the volatile release from biomass (primary reactions) and the subsequent gas-phase evolution (secondary reactions). Recently developed detailed kinetic schemes,8–10 consisting of hundreds of species and thousands of reactions, allow a comprehensive investigation of reactor operating conditions in biomass conversion. However, their usage is limited by the high computational cost and the difficulty of solving the associated system of stiff ordinary differential equations (ODEs). For this reason, modeling studies in the literature typically utilize global kinetic schemes consisting of a few lumped species and reactions; however, these models cannot predict the product composition.11–13 In a few studies,8,14 automatic reduction algorithms, such as the directed relation graph (DRG)15 and the DRG with error propagation (DRGEP),16 have been used to reduce the detailed kinetic models. Reduced kinetic models require fewer computational resources than detailed models, but they are still computationally expensive and need stiff ODE solvers.8 Moreover, the high variability in biomass feedstock and the heterogeneous structure of biomass particles make the chemistry modeling a daunting task.9,17
The challenges of the high computational cost and complexity of detailed or reduced kinetic models can be partially resolved using machine learning algorithms, which have recently found applications in numerous areas, including reaction engineering. Examples include reaction screening,18 reaction condition optimization,19 modeling fuel chemistry,20 and kinetic Monte Carlo models.21 Machine learning models require most of the computational effort a priori, during the development stage, and are computationally fast during their application. This aspect makes machine learning attractive in the context of biomass thermochemical conversion. Several investigations have focused on building machine learning models for biomass thermochemical conversion using experimental22–29 and simulation data.30 These investigations primarily employed artificial neural networks (ANN), decision trees, random forests, and support vector regression. Two significant limitations of these works are the inability to predict the temporal evolution of the product yield and composition, and their applicability over only a narrow range of parameters, especially for models based on experimental data. Machine learning models that can predict temporal variations in the reactant and product compositions under realistic reaction conditions are missing from the literature.
This work tackles the challenges mentioned above by combining DRGEP,16 an automatic reduction algorithm, with a recurrent neural network (RNN) to build a model for predicting the secondary gas-phase reactions of biomass devolatilization (primary) products in the freeboard region of a fluidized bed reactor. Since the freeboard region is modeled as a PFR, an RNN is employed due to its ability to store information along a sequence. The developed RNN model can predict the composition of 27 target species, consisting of major gaseous and tar products, in a PFR. These target species, the reactor temperature (between 800 and 1000 °C), and the residence time or length of the PFR constitute the model input. Our work demonstrates the efficacy of the combined application of automatic reduction algorithms and machine learning tools to develop computationally fast models for complex reaction systems, such as biomass gasification.
Fig. 1 Schematic of the biomass gasification process and the RNN model. The RNN models the dynamics of the freeboard region of the fluidized bed gasifier. The input to the RNN model consists of the reactor temperature and the mass fractions of 27 species provided in Table 1.
The input to the RNN model consists of the reactor temperature and the mass fractions of the 27 chemical species, provided in Table 1, that are part of the reference kinetic scheme. The temporal evolution of these 27 species in a PFR is the output of the RNN model. For this work, the temperature range is 800–1000 °C, and the maximum residence time is five seconds, representative of biomass gasification conditions.
Parameter | Value |
---|---|
Reactor temperature | 800, 850, 900, 950, 1000 °C |
Heat transfer coefficient | 300, 1000, 3000 W m−2 K−1 |
Biomass particle diameters | 300 μm, 1 mm, 3 mm |
Moisture content | 0, 10, 20 kg water per kg dry biomass |
Solid heat capacity | 2300 J kg−1 K−1 |
Biomass bulk density | 650 kg m−3 |
Pressure | 1 atm |
Biomass composition | List31 provided in Table S1 in the ESI† |
In the second step, the primary products obtained in the first step undergo secondary gas-phase reactions to form light gases and tar species. These reactions are represented by the reference kinetic scheme8 and simulated in a PFR configuration at five temperatures in the range of 800–1000 °C. A time resolution of 50 ms is used to resolve the evolution of chemical species adequately. Thus, for a 5 s residence time in the PFR, we generate data at 100 time points for each of the 27 species. The resulting data consists of about 11 million species compositions (∼0.4 million data points for each of the 27 species). The mass fractions of the 27 target species are normalized to a standard normal distribution using the StandardScaler class of scikit-learn,35 ensuring that the data for all species have the same order of magnitude and zero mean. The data is divided into training, validation, and test sets in the ratio of 0.6:0.2:0.2. The normalization parameters (mean and standard deviation) are computed on the training data and then used to normalize the validation and test sets. With the input–output variables identified and the training data available, the next step is to select an appropriate RNN structure.
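A minimal sketch of this preprocessing step is given below, assuming the PFR trajectories are already collected in NumPy arrays; the array names, file names, and the use of train_test_split are illustrative assumptions rather than the authors' code.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Hypothetical arrays: X holds the model inputs (27 mass fractions + temperature),
# Y holds the reference PFR trajectories (100 time points x 27 species).
X = np.load("pfr_inputs.npy")        # shape (n_cases, 28)
Y = np.load("pfr_trajectories.npy")  # shape (n_cases, 100, 27)

# 0.6 : 0.2 : 0.2 split into training, validation, and test sets.
X_train, X_tmp, Y_train, Y_tmp = train_test_split(X, Y, test_size=0.4, random_state=0)
X_val, X_test, Y_val, Y_test = train_test_split(X_tmp, Y_tmp, test_size=0.5, random_state=0)

# Fit the scaler on the training trajectories only, then reuse its mean and
# standard deviation to normalize the validation and test sets.
scaler = StandardScaler().fit(Y_train.reshape(-1, Y_train.shape[-1]))

def scale(a):
    return scaler.transform(a.reshape(-1, a.shape[-1])).reshape(a.shape)

Y_train, Y_val, Y_test = scale(Y_train), scale(Y_val), scale(Y_test)
```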
We define an input sequence of length $T$ as $X = \{x_1, x_2, \ldots, x_t, x_{t+1}, \ldots, x_T\}$ for any general $t \in \{1, \ldots, T\}$. The corresponding output at time $t$ is represented as $y_t$. Mathematically, a simple RNN model can be represented as:
$$ s_t = \sigma\left(W_x x_t + W_s s_{t-1} + b_s\right), \qquad y_t = W_y s_t + b_y \qquad (1) $$

where $s_t$ is the hidden state at time $t$, $W_x$, $W_s$, and $W_y$ are weight matrices, $b_s$ and $b_y$ are bias vectors, and $\sigma$ is a non-linear activation function.
RNNs can be trained using the same backpropagation technique used in artificial neural networks and deep learning. The different steps in a sequence represent the residence timesteps in the PFR, along which the data for the 27 species are available. The technique, called backpropagation through time41 (BPTT), involves backpropagating through these timesteps in a sequence while computing the gradients of the parameters. It is similar to the usual backpropagation technique, except that the gradients at the current timestep are also propagated back through the previous timesteps.
This work aims to predict the species concentrations along the length (residence time) of the reactor, starting from a given initial concentration. In general, an RNN model is adequate for short sequences; training an RNN model for long sequences is challenging. While backpropagating through time in long sequences, the training suffers from exploding or vanishing gradients,42 and in both situations the RNN fails to learn the model accurately. Commonly employed gradient-descent training is thus prone to fail at capturing long-term dependencies in a long sequence. In this regard, long short-term memory43 (LSTM) and gated recurrent units44 (GRU) are robust RNN architectures that address some of the limitations of a feed-forward RNN architecture and provide a way to handle long-term dependencies. LSTM and GRU encode the sequence in internal states, so that the information needed for long-term dependencies is retained. GRU and LSTM based RNN models have comparable performance, but a GRU cell uses fewer parameters, requiring less memory and shorter training and execution times.45 We therefore use GRU based RNN models in this work.
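As an illustration of the parameter-count argument, and not a result from this work, the snippet below compares a GRU and an LSTM layer of the same (hypothetical) size; a GRU cell has three gates versus four for an LSTM and therefore uses noticeably fewer parameters.

```python
import tensorflow as tf

units, input_dim = 128, 28  # hypothetical sizes, chosen only for illustration

gru_model = tf.keras.Sequential(
    [tf.keras.Input(shape=(None, input_dim)), tf.keras.layers.GRU(units)])
lstm_model = tf.keras.Sequential(
    [tf.keras.Input(shape=(None, input_dim)), tf.keras.layers.LSTM(units)])

# The GRU layer needs roughly 25% fewer parameters than the LSTM layer.
print("GRU parameters: ", gru_model.count_params())
print("LSTM parameters:", lstm_model.count_params())
```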
Fig. 2 The three RNN architectures used in this work: (a) model 1, (b) model 2, and (c) model 3. Dense layers with ReLU activation (shaded and hatched boxes) are added to improve model prediction.
For tasks such as machine translation and speech recognition, RNNs use dedicated embedding layers to encode the input more efficiently. We add a fully connected dense layer after the input layer (grey-shaded box in Fig. 2(b)) to mimic such an embedding layer. The output space of the GRU layer is limited because its gates use tanh or sigmoid activations. Since the output layer with linear activation in model 1 might not decode the GRU output efficiently, we add another fully connected layer with ReLU activation before the output layer to introduce non-linearity. The improved RNN model architecture (model 2) is schematically depicted in Fig. 2(b).
Training and testing of the model architectures are discussed in the next section. As shown there, both of these models exhibit larger errors, especially at the initial time points. One possible reason is the lack of variability in the initial state of the GRU: model 1 and model 2 initialize the GRU state to a zero vector for all model inputs. To address this issue, we add a fully connected layer (orange-hatched dense layer in Fig. 2(c)) that takes the model input and whose output is used as the initial state of the GRU. Including this initial-state training in the model architecture considerably improves the results for the first few milliseconds. The model architecture with the initial-state training (model 3) is shown in Fig. 2(c).
We end this discussion on model architecture by noting an important distinction from traditional RNN architectures: a repeat vector layer passes the input layer (model 1) or the output of the dense layer that encodes it (models 2 and 3) to the GRU cell at each time step during both training and testing. In the traditional RNN structure, the output at time step $t-1$ is used to predict the output at time step $t$. For example, in machine translation tasks, the output of the RNN model is a probability distribution over the vocabulary of the target language, and the prediction is accurate so long as the output word with the maximum probability matches the reference word. In contrast, since any deviation of the output value from the reference value is an error, we employ the RNN architecture as a regression model that uses the initial concentration to predict future time steps. Another key distinction from tasks such as machine translation is that teacher forcing46 is used to train such RNN models so that the model learns the correlation $x_{t+1}^{\mathrm{real}} = f(x_t^{\mathrm{real}}, s_t)$, where $f$ is the function learned by the RNN. However, when predicting future reactor concentrations, the model would be trained for $x_{t+1}^{\mathrm{pred}} = f(x_t^{\mathrm{real}}, s_t)$, whereas only $x_0^{\mathrm{pred}} = x_0^{\mathrm{real}}$ is known exactly, and the error in the prediction of future outputs, i.e., $x_{t+1}^{\mathrm{pred}} = f(x_t^{\mathrm{pred}}, s_t)$, would be carried forward and compounded. Thus, in both the training and testing stages, we provide only the initial concentration $x_0$ as the RNN model input and let the GRU state handle the time progression.
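The arrangement described above can be sketched in Keras as follows; the layer widths, the learned-initial-state branch, and the function name build_model_3 are illustrative assumptions consistent with the architecture tables below, not the authors' code.

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_model_3(n_species=27, n_steps=100, n_units=128):
    """Sketch of the model 3 architecture: one dense layer encodes the input,
    a second dense layer (acting on the same input) provides the learned
    initial GRU state, a repeat-vector layer feeds the encoded input to the
    GRU at every time step, and dense layers decode the GRU output into the
    27 species at each of the 100 residence-time points."""
    inputs = layers.Input(shape=(n_species + 1,))                    # 27 mass fractions + temperature
    encoded = layers.Dense(n_units, activation="relu")(inputs)       # embedding-like encoder
    init_state = layers.Dense(n_units, activation="relu")(inputs)    # learned initial GRU state
    repeated = layers.RepeatVector(n_steps)(encoded)                 # same encoding at every step
    gru_out = layers.GRU(n_units, activation="tanh",
                         return_sequences=True)(repeated, initial_state=init_state)
    decoded = layers.Dense(n_units, activation="relu")(gru_out)
    outputs = layers.Dense(n_species, activation="linear")(decoded)  # species trajectories
    return tf.keras.Model(inputs, outputs)
```

Dropping the initial-state branch recovers model 2, and additionally removing the encoder and decoder dense layers recovers model 1.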
(2)
Hyperparameter | List of values |
---|---|
Dense layer neurons | {64, 128, 512, 1024} |
GRU units | {64, 128, 512, 1024} |
Initial learning rate | {0.01, 0.001, 0.0001} |
Learning rate decay rate | {0.9, 0.99} |
Learning rate decay steps | {25, 50, 100} |
The selected model architectures are trained on an NVIDIA GeForce GTX 1050Ti GPU with a batch size of 64 for 1500 epochs. The Python packages Keras48 and TensorFlow49 are used to build and train the neural networks. The training took around one second per epoch.
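A minimal sketch of this training configuration is shown below, reusing the hypothetical build_model_3 constructor and arrays from the earlier sketches; the Adam optimizer, mean-squared-error loss, and the particular learning-rate decay values are assumptions drawn from the hyperparameter list above rather than the tuned settings.

```python
import tensorflow as tf

# Exponential learning-rate decay; the initial rate, decay steps, and decay rate
# are example values from the hyperparameter grid, not the selected ones.
lr_schedule = tf.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=1e-3, decay_steps=50, decay_rate=0.9)

model = build_model_3()
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=lr_schedule),
              loss="mse",        # assumed regression loss
              metrics=["mae"])

history = model.fit(X_train, Y_train,
                    batch_size=64, epochs=1500,
                    validation_data=(X_val, Y_val))
```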
Layer | Activation | Output shape |
---|---|---|
Model 1 | ||
Input | — | (None, 28) |
Repeat vector | — | (None, 100, 28)
GRU | tanh | (None, 100, 128) |
Dense (output) | Linear | (None, 100, 27) |
Total parameters: | 64155 |
Model 2 | ||
---|---|---|
Input | — | (None, 28) |
Dense1 | ReLU | (None, 128) |
Repeat vector | — | (None, 100, 128) |
GRU | tanh | (None, 100, 128) |
Dense2 | ReLU | (None, 100, 128) |
Dense3 (output) | Linear | (None, 100, 27) |
Total parameters: | 122779 |
Model 3 | ||
---|---|---|
Input | — | (None, 28) |
Dense1 | ReLU | (None, 128) |
Dense2 | ReLU | (None, 128) |
Repeat vector | — | (None, 100, 128) |
GRU | tanh | (None, 100, 128) |
Dense3 | ReLU | (None, 100, 128) |
Dense4 (output) | Linear | (None, 100, 27) |
Total parameters: | 126491 |
The predictive ability of the trained RNN architectures is assessed by comparing their predictions with the test set. Note that the test set is never used during the development of the RNN models, including the hyperparameter tuning. We compare the three model architectures in terms of their accuracy in predicting the species concentrations and their training and prediction times. The latter is also compared with the time required to solve the system of stiff ODEs associated with the reference kinetic scheme.8
A total of 100 cases are randomly selected from the test set and given as input to the RNN models to predict the species evolution over five seconds in the PFR; the predictions are compared with the reference kinetic model. The reference kinetic model is solved using the stiff ODE solver DVODE, which required an average prediction time of around 252 s. Table 5 provides the training time, prediction time, coefficient of determination (R2), and MAE for the three RNN architectures described in section 2.3.1. The training time increases with the number of RNN model parameters, whereas the execution time is comparable for all the models. A reduction of four orders of magnitude in the execution time is achieved using the RNN model compared to the reference kinetic model. Note that we are using a reduced kinetic model developed from a detailed chemical mechanism; if the detailed mechanism were used as the reference kinetic scheme, the reduction in execution time would be even larger. For all the models, the R2 value is greater than 0.99, demonstrating the ability of even the simplest RNN model to capture the temporal species evolution. An increase in the model complexity also results in a smaller MAE.
Model architecture | Training time (min) | Average model prediction time (s) | R2 | MAE
---|---|---|---|---|
Model 1 | 40 | 2.65 × 10−2 | 0.9985 | 15.16 × 10−3 |
Model 2 | 55 | 2.87 × 10−2 | 0.9997 | 11.14 × 10−3 |
Model 3 | 58 | 2.88 × 10−2 | 0.9998 | 9.10 × 10−3 |
For further analysis of the performance of the three RNN models, the MAE for the training, validation, and test sets, for varying residence time, and for the individual chemical species is shown in Fig. 3. As is also evident from Table 5, the MAE is lowest for model 3. Fig. 3(a) shows that the MAEs for the training, validation, and test sets are comparable, confirming that the models are not overfitted. The MAE reported in this section corresponds to the normalized data in order to present a comparison across all species. The overall MAE for model 3 is below 1%. We also show the MAE at each time point in Fig. 3(b). Clearly, the MAE is higher for the initial time (<0.5 s) than for the rest of the residence time. This behavior is attributed to the low resolution (50 ms) of the training data during the initial phase of the secondary reactions (<0.5 s) at higher reactor temperatures. Model 3 provides the lowest MAE over the entire residence time range, though the performance improvement is especially significant in the first 0.5 s. Fig. 3(c) shows the MAEs for all the input–output species. As expected, model 3 provides the lowest MAE for all the species.
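Such error breakdowns can be computed from the normalized predictions with a few lines of array arithmetic; the sketch below uses the array and model names of the earlier sketches and is an illustration rather than the authors' evaluation script.

```python
import numpy as np

Y_pred = model.predict(X_test)               # shape (n_cases, 100, 27), normalized units
abs_err = np.abs(Y_pred - Y_test)

mae_overall = abs_err.mean()                 # single MAE over all cases, times, and species
mae_per_time = abs_err.mean(axis=(0, 2))     # MAE at each of the 100 residence-time points
mae_per_species = abs_err.mean(axis=(0, 1))  # MAE for each of the 27 species

# Coefficient of determination over the flattened test set.
ss_res = np.sum((Y_test - Y_pred) ** 2)
ss_tot = np.sum((Y_test - Y_test.mean()) ** 2)
r2 = 1.0 - ss_res / ss_tot
```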
Fig. 4 Parity plots of major light gases and tar species for training, validation, and test sets and predictions of model 3.
As an application of the developed RNN model, we predict the temporal evolution of the major reactant and product species in the PFR. Only the initial species concentrations, the reactor temperature, and the residence time are input to the RNN model. Fig. 5 compares the RNN predictions with the test set data for a few randomly selected examples from the test set. The performance at three different temperatures is shown, with the data converted back to mass fraction units. These results confirm the ability of the RNN model to predict the temporal evolution of species concentrations for the considered temperature (800–1000 °C) and residence time (5 s) ranges. Given the excellent match for the test data, we also tested the RNN model for predictions five seconds beyond the training horizon (Fig. S4 in the ESI†) to understand the function learned by the RNN model. Note that all the original data using the reference kinetic model are generated up to five seconds (hence, the lines end at 5 s), yet we make RNN predictions for ten seconds. Interestingly, the model behaves qualitatively as expected even for these future predictions, pointing to the suitability of the RNN algorithm for time-series data. We will explore this aspect of the RNN models in our future studies.
Fig. 5 Comparison of the mass fraction profiles of the major product species obtained from model 3 (symbols) and test set (lines) at different temperatures.
Finally, we analyze the effect of the amount of training data, since its availability plays a significant role in the predictive ability of the RNN model for unseen data. For this purpose, model 3 is trained separately for 500 epochs with different fractions of the entire training set. Fig. 6 shows the variation in the MAE when model 3 is trained with different fractions of the training set, ignoring the remaining training data. The validation and test sets remain identical for a fair comparison. The MAE decreases continuously with the amount of training data available, implying increasing model accuracy. Moreover, the validation and test set MAEs are comparable to the training set MAE over the entire range of training data considered. This observation illustrates that the RNN model is not overfitted even for a smaller fraction of the training data and is thus also suitable when less training data are available. Note that even at 20%, the number of training data points is more than twice the number of parameters of the model 3 architecture.
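A sketch of this training-data study, again using the hypothetical names from the earlier sketches (the chosen fractions are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
for frac in (0.2, 0.4, 0.6, 0.8, 1.0):
    # Subsample the training set; the validation and test sets stay fixed.
    idx = rng.choice(len(X_train), size=int(frac * len(X_train)), replace=False)
    model = build_model_3()
    model.compile(optimizer="adam", loss="mse", metrics=["mae"])
    model.fit(X_train[idx], Y_train[idx], batch_size=64, epochs=500,
              validation_data=(X_val, Y_val), verbose=0)
    test_loss, test_mae = model.evaluate(X_test, Y_test, verbose=0)
    print(f"training fraction {frac:.1f}: test MAE = {test_mae:.4f}")
```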
Fig. 6 MAE on training, validation, and test sets when model 3 is trained with different fractions of training data.
This work is a step towards reducing the current gap between data science and realistic reaction engineering problems. Future investigations can focus on integrating detailed chemistry with non-ideal reactor models.
Footnote
† Electronic supplementary information (ESI) available. See DOI: 10.1039/d1re00409c |