Predicting the performance of oxidation catalysts using descriptor models

Practical solutions in catalysis require catalysts that are active and stable. Mixed metal oxides are robust materials, and as such are often used as industrial catalysts. The problem is that predicting their performance a priori is difficult. Following our work on simple descriptors for supported metals based on Slater-type orbitals, we show here that a similar paradigm holds also for metal oxides. Using the oxidative dehydrogenation of butane to 1,3-butadiene as a model reaction, we synthesised and tested 15 bimetallic mixed oxides supported on alumina. We then built a descriptor model for these oxides, and projected the model's results on a set of 1711 mixed oxide catalysts in silico. Based on the model's predictions, six new bimetallic oxides were then synthesised and tested. Experimental validation showed impressive results, with Q² > 0.9, demonstrating the power of these low-cost predictive models. Importantly, no interaction terms were included in the model, showing that even though bimetallic oxide catalysts are regarded as highly complex materials, their performance can be predicted using simple models. The implications of these findings for catalyst optimisation practices in academia and industry are discussed.


Introduction
Catalysis is a key enabling technology that affects nearly all aspects of our industrialised society. Catalysts and catalytic processes are essential for the making of fuels and bulk chemicals, fine-chemicals and intermediates, as well as advanced materials, medicines and foodstuffs. 1 The applications range far and wide, and so does research into new catalysts. Scientific papers describe amazing and wondrous structures, intricate dendrimers, 2 molecular "cages", 3 and hybrid inorganic/organic compounds, 4,5 that are limited only by human imagination.
However, the bulk of the industrial applications in real life require robust and hardy materials, and the most common are metal oxides. 6 These are already "burned" and have a high chemical and mechanical resistance, which is a must for large-scale processing. But appearances can be deceiving: the molecular formula of a mixed oxide may look simple, but the actual structure is highly complex. What's more, unlike the uniformity of homogeneous complexes, 7,8 the catalytic activity of solids often stems from breaks and kinks on the surface, which in turn depend on minute changes in the synthesis and pre-treatment conditions. 9 Predicting the performance of such catalysts successfully is thus a mammoth task.
There are two approaches for making such predictions. The first uses high-power computing and intricate algorithms that combine quantum and classical mechanics. Great advances were made in this field in the past decade, 10 and catalyst performance can actually be predicted, but at a high cost. 11,12 The second approach is data-driven, based on modelling catalyst performance using a few simple descriptors. Such models may be less intuitive, but they are highly practical. 13-15 Ultimately, both approaches are needed for finding new catalysts and optimising existing ones.
Recently, we demonstrated the feasibility and effectiveness of using simplified radial distribution functions (RDFs) as descriptors for supported metal(0) catalysts. 16 These models can predict the performance of heterogeneous catalysts under a reducing environment (e.g. for catalytic hydrogenation). Here, we take these descriptor models an important step further, into the realm of oxidation reactions. The interactions of the active site with the support are different for an oxide and a metal. 17,18 Oxides bind differently and react differently, so the catalytic performance of a metallic element is usually very different from that of its oxo or peroxo species. Nevertheless, we show here that by tuning the RDF descriptors to the corresponding metal ions, one can predict well the performance of supported catalysts under oxidative conditions. The theoretical principles are first demonstrated using an experimental set of 15 catalysts in the oxidative dehydrogenation of butane to 1,3-butadiene. Subsequently, we generate a large set of 1711 bimetallic oxides in silico, and use descriptor models to project the experimental results onto this dataset. Six promising catalysts from the virtual set are then synthesised and tested, validating the model and demonstrating the power of data-driven predictive modelling in oxidation catalysis.

Materials and instrumentation
Unless stated otherwise, chemicals were purchased from commercial sources (>99% pure) and used as received. γ-Alumina (surface area 200 m² g⁻¹, total pore volume 0.6 cm³ g⁻¹) was provided by LANXESS Deutschland GmbH. Surface area measurements were performed using N₂ at 77 K on a Thermo Scientific Surfer instrument and calculated using the BET method. Catalytic oxidative dehydrogenation reactions were tested in a built-for-purpose computer-controlled sixflow reactor setup. This setup enables the testing of six different catalysts simultaneously, using six fixed-bed quartz tube reactors in parallel. The reactors are kept at one temperature but have separately controlled flow rates, allowing for tuning the gas hourly space velocity (GHSV). The gas composition is controlled via four mass flow controllers that dose hydrocarbon, oxygen, nitrogen and argon. The temperature is controlled using a Carbolite furnace and can be set between 50 and 1100°C. Reactor output is analysed on-line by both gas chromatography (Interscience compact-GC) and mass spectrometry (Granville Phillips, Brooks Automation). 19,20 Note that while strict calculations should allow for variations in concentration due to the expansion (or compression) of the gas for reactions occurring in gas flow, we assumed for simplicity that the volume remains constant. This enables the use of absolute concentrations and is approximately correct for a flow reaction at low conversions in a reactor with a constant cross-section.

Procedure for catalyst synthesis
All catalysts were prepared by wet impregnation of γ-alumina support (M:N/Al₂O₃; the composition details are given in Table 1). [...] 2 were each dissolved in 10 ml deionized water while stirring. The two solutions were then combined, stirred and then added to a suspension of 2.90 g γ-alumina in 50 ml deionized water. The mixture was then heated at 95°C overnight in an open round-bottomed flask under continuous stirring until all the water had evaporated. The remaining cake (2.86 g) was ground to a fine powder, which was further dried in an oven at 120°C for 24 h and then calcined in static air at 550°C for 4 h (heating rate 2°C min⁻¹). The resulting catalyst was pressed into pellets, and then ground and sieved, retaining the 250-350 μm fraction for testing.
Example 2. WOₓ:MnOₓ/Al₂O₃ (catalyst 12). Stock solutions of W and Mn precursors were prepared as follows: 0.0367 g (0.13 mmol) of (NH₄)₂WO₄ and 0.0756 g (0.42 mmol) of Mn(NO₃)₂ were dissolved separately in 10 ml deionized water with continuous stirring. The two solutions were combined and added to a suspension of 2.89 g γ-alumina in 50 ml deionized water. The mixture was then heated at 95°C overnight to evaporate excess water. The remaining solid was dried at 120°C for 24 h, calcined at 550°C for 4 h and finally pressed into pellets, ground and sieved, retaining the 250-350 μm fraction for testing.

Procedure for catalyst testing
Each catalyst was tested at two loadings, 100 mg and 20 mg. The catalysts were placed in the reactor tube over a plug of quartz wool, forming a cylindrical catalyst bed roughly 4 cm in height and 4 mm in diameter. In each run using the sixflow reactor, one reactor was kept empty as a blank (this blank was changed between runs to minimise systematic error). The catalysts were activated in situ before reaction in a flow of 45 ml min⁻¹ Ar and 5 ml min⁻¹ O₂ at 500°C. After activation, a total reaction feed of 50 ml min⁻¹ was passed through each reactor, with the volumetric ratio ranging O₂ : C₄H₁₀ : Ar = (0.25-1) : 1 : (8. . Reactions were run for 24 h on stream at both 550°C and 650°C, giving a total of eight different conditions for each catalyst (reaction conditions A-H, see Table 2).

Table 1 Composition of the catalysts prepared and tested in the first iteration. Columns: Catalyst; Composition (a). (a) In all cases, the loading of each metal is 1 wt%.

Reactant conversion and product selectivity were monitored on-line using gas chromatography and mass spectrometry. The conversion of butane was calculated as χ butane = (MF in − MF out)/MF in, where MF in and MF out are the molar flows of butane at the reactor inlet and outlet, respectively. Similarly, the selectivity of each product was calculated as S product = MF product/(MF in − MF out), where MF product is the molar flow of the product at the reactor outlet.
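The conversion and selectivity definitions given in this section can be written as two short functions. This is only a minimal sketch: the function names and the flow values in the usage lines are hypothetical, and the formulas are taken as stated in the text (molar flows, no further correction).

```python
def butane_conversion(mf_in, mf_out):
    """Conversion = (MF_in - MF_out) / MF_in, with molar flows of butane."""
    if mf_in <= 0:
        raise ValueError("inlet molar flow must be positive")
    return (mf_in - mf_out) / mf_in


def product_selectivity(mf_product, mf_in, mf_out):
    """Selectivity = MF_product / (MF_in - MF_out)."""
    converted = mf_in - mf_out
    if converted <= 0:
        raise ValueError("no butane was converted; selectivity is undefined")
    return mf_product / converted


# Hypothetical flows in arbitrary molar units (not data from this work).
x = butane_conversion(mf_in=1.0, mf_out=0.8)
s = product_selectivity(mf_product=0.05, mf_in=1.0, mf_out=0.8)
print(f"conversion = {x:.2f}, selectivity = {s:.2f}")
```

Raising an error when nothing is converted avoids a silent division by zero, which matters for blank reactors.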

Computational methods
Descriptor calculation, analysis and data mining were performed on a Sony Vaio laptop with Intel® Core™ i7-4500U processor. A variable importance in projection (VIP) analysis was done following the method of Hageman et al. 21 Principal component analysis (PCA) and partial least squares (PLS) regression models were run using the JMP Pro software. The principal components were calculated using the NIPALS algorithm, which extracts the components one at a time, in decreasing order of the variance that they explain. All models were validated using leave-one-out cross-validation. A discussion on the merits of validation methods is published elsewhere. 22
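To illustrate how NIPALS extracts the components one at a time, here is a minimal NumPy sketch. It is not the JMP Pro implementation: the function name, the convergence settings and the choice to centre but not scale the columns are all assumptions made for this example.

```python
import numpy as np


def nipals_pca(X, n_components=2, tol=1e-10, max_iter=500):
    """Sequential PCA by NIPALS: one component at a time, then deflate."""
    X = np.asarray(X, dtype=float)
    R = X - X.mean(axis=0)               # centre the columns (no scaling here)
    total_ss = np.sum(R ** 2)
    scores, loadings, explained = [], [], []
    for k in range(n_components):
        # Start from the column that currently has the highest variance.
        t = R[:, np.argmax(R.var(axis=0))].copy()
        for _ in range(max_iter):
            p = R.T @ t / (t @ t)        # project the data onto the scores
            p /= np.linalg.norm(p)       # loading vector of unit length
            t_new = R @ p                # updated scores
            converged = np.linalg.norm(t_new - t) < tol * np.linalg.norm(t_new)
            t = t_new
            if converged:
                break
        R = R - np.outer(t, p)           # deflation: remove this component
        scores.append(t)
        loadings.append(p)
        explained.append((t @ t) / total_ss)
    return np.column_stack(scores), np.column_stack(loadings), np.array(explained)
```

Because each component is removed from the data before the next one is sought, the explained variance decreases from one component to the next, which is the ordering described above.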

Results and discussion
Generating the initial dataset
Aiming at both high conversion of n-butane (herein: χ butane ) and a high selectivity to 1,3-butadiene (herein: S butadiene ) we synthesised and tested a varied set of 15 bimetallic oxide catalysts. We chose Fe, Cu, Ag, Sr, Cr, Zr, Pb, In, Nb, Ni, Mg, Ga, Mo, La, Bi, Li, W, Y, K, V, Te, Co, Mn, Pt, and Zn (see Table 1 above). The rationale for choosing these metals is threefold: first, they are commercially available and most of them are relatively cheap, so they could also be applied in an industrial environment; second, some are known to be good dehydrogenation catalysts while others are known as good catalyst promoters. Finally, we also added some metals at random, reducing the bias in the set (a discussion on selecting metals for oxidation catalysis is published elsewhere 23 ). The bimetallic mixed oxide catalysts 1-15 were prepared using wet impregnation (for details see the Experimental section). X-ray diffraction and BET surface area analysis of several samples confirmed that the crystal structure of the alumina remained unchanged. The BET surface area values of these catalysts were all in the range of 200-240 m² g⁻¹. This is what we would expect considering the low metal loadings and high surface area of the alumina support. The 15 bimetallic oxide catalysts were then tested in the oxidative dehydrogenation of n-butane (eqn (1)). This reaction has an interesting history: it was a popular subject of research following WW II, when synthetic rubber was in short supply. The interest subsided in the 1960s, when large-scale cracking of naphtha provided a steady stream of 1,3-butadiene. It then resumed around 2010, with the advent of shale gas and the political unrest in the Persian Gulf.
Following our work on ethane 24 and propane 25 oxidative dehydrogenation, we were approached by Lanxess Deutschland GmbH, one of the main users of 1,3-butadiene, to collaborate on using predictive modelling methods for finding new butane oxidative dehydrogenation catalysts.
[Eqn (1): oxidative dehydrogenation of n-butane to 1,3-butadiene]
Table 3 shows the conversion and butadiene selectivity results for the four reaction conditions A-D. Running the [...] Fig. 1 only). The reactions at lower catalyst loadings were run to confirm that the same mechanism is in effect in both regimes. This was confirmed by the similar product selectivity at lower conversions. Fig. 1 shows the conversion and total butenes selectivity results for all 15 catalysts at all eight condition sets. The remaining difference to 100% is due to oxidation to CO and CO₂. No deactivation was observed over 24 h on stream, and control experiments on three different catalysts running for 100 h also showed no deactivation.
Biplot caption (fragment, cf. Fig. 3): For example, the fact that the FWHH ion arrow is opposite to S butadiene (%) means that oxides with a higher FWHH ion value will give less butadiene. Similarly (and unsurprisingly) the conversion of butane is strongly correlated with the oxygen : butane ratio.

Choosing relevant catalyst descriptors
In general, there are three approaches for modelling catalyst performance. One option is based on an in-depth analysis of the reaction mechanism, combined with high-level quantum mechanics models. Although these models are computationally very expensive, they often provide accurate data that can then be used for making good predictions. Examples in heterogeneous catalysis include work from the groups of Nørskov and Bligaard, 10 Neurock, 26 van Santen 27 and Sautet, 28 as well as from our group. 29 Yet these in-depth models are typically too expensive to be applied to large data sets. The second approach is using purely data-driven models. These "black-box" models are based on statistical analysis, often combined with stochastic optimisation methods, such as neural networks or genetic algorithms. 23,30 Such models are fast, but connecting their results to 'chemical intuition' is difficult, and they cannot adapt well to new factors. Here, we opted for a third approach, using so-called 'grey models', that combine simple descriptors based on chemical principles with statistical modelling. As we will show, such models are effective in predicting catalyst performance, giving a good cost-to-benefit ratio.
Previously, we showed that descriptors based on radial distribution functions (RDFs) derived from Slater-type orbitals (STOs) are effective for modelling and predicting the performance of hydrogenation catalysts. 31,32 These RDF descriptors are robust. Their calculation is straightforward, and their implementation is easy. Here, we will show that the same approach works also in an oxidative environment, but instead of using the parameters for metals, we now apply the analogous parameters for their oxide salts. This is an important generalising step: the same paradigm that works well for monometallic and bimetallic catalysts applies also to monometallic oxides and mixed metal oxides.
Basically, we reduce the combined STOs of the frontier orbitals of each metal to four parameters: the distance from the nucleus where the probability of finding the electrons is highest, r apex , the value of the RDF at this distance, R apex , the peak width at half height, FWHH, and the skewness of the peak, Skew (the latter is calculated as the area on one side of the peak divided by the area on the other side, see Fig. 2). However, considering that the (mixed) oxide system is more complex than the pure metallic one, we introduced three additional parameters as descriptors: electronegativity, 33 atomic radius 34,35 and ionization potential. 36 To construct a statistical model that can predict the performance of these mixed oxide catalysts, we first used principal component analysis (PCA) and partial least squares (PLS) regression for distinguishing important parameters from marginal ones. This must be done to avoid over-fitting and ensure that the model will be based on the simplest and most robust parameters (a tutorial on using PCA and PLS in catalysis research is published elsewhere 37 ). Fig. 3 shows a biplot representation based on the PCA. The symbols on the graph show the distribution of the conversion and selectivity for catalysts 1-15 running under reaction conditions A-H. In this graph, the axes are the two first principal components (PCs, also called 'latent variables'). These two PCs explain 53% of the variance in the data. The arrows indicate the direction and magnitude of the descriptors, the reaction conditions, and the figures of merit. The direction of the arrows gives the relation between the parameters: if two arrows are close together, it means that the two parameters are highly correlated. Similarly, if two arrows lie along the same line yet point in opposite directions, it means that the two parameters are inversely correlated. Finally, if two arrows are orthogonal to each other, it means that the two parameters are uncorrelated.
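The four RDF parameters can be illustrated numerically for a single Slater-type orbital. This is a simplified sketch, not the procedure used for the descriptors in this work: it assumes one STO with radial part r^(n−1)·exp(−ζr) instead of the combined frontier orbitals, the values of n and ζ in the usage line are arbitrary, and Skew is taken as the outer area divided by the inner area (the text does not say which side is the numerator).

```python
import numpy as np


def _trapezoid(y, x):
    """Plain trapezoidal integration (avoids NumPy version differences)."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))


def rdf_descriptors(n, zeta, r_max=30.0, n_points=300001):
    """r_apex, R_apex, FWHH and Skew of the RDF of one Slater-type orbital."""
    r = np.linspace(0.0, r_max, n_points)
    # RDF = r^2 * R(r)^2 with R(r) = r^(n-1) * exp(-zeta * r)
    rdf = r ** (2 * n) * np.exp(-2.0 * zeta * r)
    rdf /= _trapezoid(rdf, r)                      # normalise to unit area
    i = int(np.argmax(rdf))
    r_apex, R_apex = float(r[i]), float(rdf[i])
    above_half = np.nonzero(rdf >= 0.5 * R_apex)[0]
    fwhh = float(r[above_half[-1]] - r[above_half[0]])
    inner = _trapezoid(rdf[: i + 1], r[: i + 1])   # area between nucleus and apex
    outer = _trapezoid(rdf[i:], r[i:])             # area beyond the apex
    return {"r_apex": r_apex, "R_apex": R_apex, "FWHH": fwhh, "Skew": outer / inner}


# Arbitrary illustrative orbital (n = 2, zeta = 1); not a fitted metal ion.
print(rdf_descriptors(n=2, zeta=1.0))
```

For this functional form the maximum lies at r = n/ζ, which gives a quick sanity check on the numerical result.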
Looking at the biplot in Fig. 3, we see that the conversion of n-butane (χ butane ) is very closely grouped with three reaction parameters: catalyst amount, O 2 : n-butane flow and reaction temperature. Indeed, this is what you would expect. Further, we see that the selectivity of total butene is inverse to χ butane (cf. Fig. 1). S butadiene is correlated with the RDF descriptors. It depends directly on the parameters R apex and Skew, and inversely on r apex and FWHH ion . Interestingly, the product selectivity does not depend directly on the reaction conditions. This does not mean that S butadiene and S butenes are independent of each other. Butenes produced by ODH could be used for making 1,3-butadiene. Fig. 4 shows the loading of each sample on the first two principal components (PC1 and PC2). PC1 is sensitive to the type of catalyst, yet insensitive to any changes in the reaction conditions. This is important, because PC1 explains the largest amount of variance in the data, and the largest change in the production of butadiene comes when you change the catalyst precursor. Conversely, PC2 is much more sensitive to changes in the reaction conditions.

Predicting the performance of new ODH catalysts
Now that we have pinpointed good descriptors for these catalysts, we can use these for building a model for predicting the performance of new catalysts. Therein lies the real value of descriptor models. We use these models for screening a large space of virtual catalysts, and then test in the lab those catalysts for which the model predicts the desired performance (so-called 'figure of merit', see flowchart in Fig. 5). 19 In this specific case, we are searching for bimetallic supported oxides that will give both high conversion of butane and a high selectivity for 1,3-butadiene. Thus, we want to maximise both χ butane and S butadiene . In addition, we need to synthesise and test some catalyst candidates with low predicted values. This may seem counter-productive, and it is always a sore point of discussion with the people who actually carry out the experiments. Yet testing "bad" candidates is essential for confirming the model's viability and robustness over a wide range of data. First, we created and modelled our training set of 15 bimetallic supported oxide catalysts (catalysts 1-15). We applied a partial least squares (PLS) regression model, using the descriptor values based on the metal ion STOs as input. These differ from the pure metal STOs that we used earlier for modelling hydrogenation catalysts. 16 The reason is that the pristine catalysts are metal oxides, and in an oxidative environment, metal(0) species are unlikely. The correlation coefficients (see Fig. 6) using the metal ion STOs were good: R² = 0.865 for χ butane and R² = 0.610 for S butadiene . These numbers may seem low, but they are actually impressive, especially considering the simplicity of the descriptors, and the fact that no interaction parameters were included for these bimetallic oxides.
Control experiments showed that the correlation with metal(0) STO descriptors was much lower, R² = 0.5748 for χ butane and 0.2321 for S butadiene , respectively, confirming the hypothesis that oxide models are more suitable for modelling metal oxides than pure metal models. All of the models were validated using leave-one-out cross-validation.
We then created a large set of virtual bimetallic oxides, comprising 1711 bimetallic combinations of 59 elements in total (see Fig. 7). Calculating the descriptor values for these 1711 virtual catalysts is very fast (especially as there are no interaction parameters). It takes only seconds using a simple laptop. We then used the descriptor models built on the set of the 15 real catalysts to predict the performance of the 1711 virtual catalysts, and selected six bimetallic supported oxide catalysts. These were then synthesised and tested in the lab. Fig. 8 shows the so-called parity plot of the predicted vs. the experimental results, both for the conversion of butane and the selectivity to 1,3-butadiene. The plot shows that there is a good fit between the model's predictions and the actual experimental data. Note that we selected not only catalysts with an expected high performance (high conversion and selectivity) but also ones for which we had low expectations. This is important, because it shows the wide operational range of the model. There is an understandable bias in published papers towards good results: publishing papers about badly performing catalysts is a tough sell, but if you want to predict the performance of catalysts, your model should cover a wide range. This means testing both good and bad candidates.
The good performance of the models in the case of mixed metal oxides raises the question of the importance (or in this case, lack of importance) of the interaction parameter. Basically, if no interaction parameter is included, it means that the model is limited to a linear combination of the effects of oxide A and oxide B. That is, for a catalyst containing two metals, M 1 and M 2 , the figure of merit would be FOM = f(M 1 O x ) + f(M 2 O x ), giving some weighted average of the effects of the two oxides. This does not necessarily mean that there is no interaction effect at all. Rather, it may reflect the fact that these catalysts contain relatively little active material, 1 wt% of M 1 and 1 wt% of M 2 . When these are impregnated on the alumina support and calcined, the actual sites where mixing occurs between the oxides are probably few and far between (see Fig. 9, left). In such a case, the weighted average would give (and indeed gives) a good description of the catalytic properties of the surface. Avoiding the interaction term in the model makes sense, because such a second-order term would increase the chances of over-fitting. In the case of a main metal and a promoter metal (see example in Fig. 9, right) there may be more justification for including interaction parameters (e.g. in the dehydrogenation of alkanes catalysed by Pt/Sn).
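Leave-one-out cross-validation of a PLS model can be sketched in a few lines of NumPy. The code below is an illustration under stated assumptions, not the JMP Pro workflow used here: it implements the standard single-response PLS1 recursion, the function names are hypothetical, the columns are centred but not scaled, and the data in the usage lines are synthetic.

```python
import numpy as np


def pls1_fit(X, y, n_components):
    """Fit single-response PLS (PLS1); returns centring terms and coefficients."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xk.T @ yk
        w /= np.linalg.norm(w)           # weight vector
        t = Xk @ w                       # scores
        tt = t @ t
        p = Xk.T @ t / tt                # X loadings
        c = (yk @ t) / tt                # y loading
        Xk = Xk - np.outer(t, p)         # deflate X
        yk = yk - c * t                  # deflate y
        W.append(w)
        P.append(p)
        q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    coef = W @ np.linalg.solve(P.T @ W, q)
    return x_mean, y_mean, coef


def pls1_predict(model, X):
    x_mean, y_mean, coef = model
    return y_mean + (X - x_mean) @ coef


def loo_q2(X, y, n_components):
    """Q2 = 1 - PRESS / SS_total, leaving out one sample at a time."""
    press = 0.0
    for i in range(len(y)):
        keep = np.arange(len(y)) != i
        model = pls1_fit(X[keep], y[keep], n_components)
        press += (y[i] - pls1_predict(model, X[i:i + 1])[0]) ** 2
    return 1.0 - press / np.sum((y - y.mean()) ** 2)


# Synthetic example: 15 samples and 3 descriptors (not data from this work).
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(15, 3))
y_demo = X_demo @ np.array([1.5, -2.0, 0.5]) + 0.05 * rng.normal(size=15)
print(f"Q2 = {loo_q2(X_demo, y_demo, n_components=3):.3f}")
```

Each sample is predicted by a model that never saw it, so Q² is a stricter figure than the fitted R².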
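The size of the virtual set follows directly from counting unordered pairs: 59 × 58 / 2 = 1711. The sketch below enumerates such a library and builds one descriptor row per pair. The element labels and descriptor values are placeholders, and averaging the two metals' descriptors is an assumption made here for illustration; the text does not specify how the per-metal descriptors were combined.

```python
from itertools import combinations

import numpy as np


def build_virtual_library(descriptors):
    """Enumerate all unordered metal pairs and one descriptor row per pair.

    descriptors maps an element label to a 1-D array of descriptor values.
    Assumption: the pair descriptor is the plain average of the two metals.
    """
    names = sorted(descriptors)
    pairs = list(combinations(names, 2))
    X = np.array([(descriptors[a] + descriptors[b]) / 2.0 for a, b in pairs])
    return pairs, X


# Placeholder labels and random values for 59 elements and 7 descriptors.
rng = np.random.default_rng(0)
table = {f"M{i + 1:02d}": rng.normal(size=7) for i in range(59)}
pairs, X_virtual = build_virtual_library(table)
print(len(pairs), X_virtual.shape)
```

Because no interaction terms are involved, each row needs only the two per-metal descriptor vectors, which is why the whole library can be generated in seconds.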
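The difference between the additive model and a model with an interaction parameter can be written out explicitly. The notation below is a sketch: the descriptor values d_j, the coefficients b_j and the interaction function g are symbols introduced here for illustration and do not appear in the fitted model.

```latex
% Additive model: each oxide contributes independently
\mathrm{FOM}_{\mathrm{add}} = f(\mathrm{M_1O}_x) + f(\mathrm{M_2O}_x),
\qquad f(\mathrm{MO}_x) = \sum_{j} b_j \, d_j(\mathrm{M})

% Hypothetical extension with a second-order interaction term
\mathrm{FOM}_{\mathrm{int}} = f(\mathrm{M_1O}_x) + f(\mathrm{M_2O}_x)
  + g(\mathrm{M_1O}_x, \mathrm{M_2O}_x),
\qquad g = \sum_{j,k} c_{jk} \, d_j(\mathrm{M_1}) \, d_k(\mathrm{M_2})
```

With seven descriptors per metal, the additive form has at most seven coefficients b_j, whereas the interaction term would add up to 49 coefficients c_jk. Against 15 training catalysts, this is where the over-fitting risk comes from.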

Conclusions
Complex catalytic reactions such as the oxidative dehydrogenation of butane to butenes and butadiene can be modelled efficiently using heuristic descriptor models. These data-driven models are 'quick & dirty': they cost practically zero in computer time, yet deliver surprisingly accurate results. The fact that such models work well also under oxidation conditions may not surprise mathematicians, who consider a model's performance as a function with a figure of merit and residuals. But for chemists, this means that the RDF descriptors based on Slater-type orbitals can now be applied across a wide range of catalytic processes. Since they perform well for metal(0) catalysts and metal oxides, they should in principle also do well in predicting the activity and selectivity of metal sulfides, nitrides and carbides. We hope that this work will encourage colleagues in academia and industry to apply these models as they search for new, active, selective and robust catalytic materials.