 Open Access Article
Pim R. Linnebank *a, David A. Poole a, Alexander M. Kluwer b and Joost N. H. Reek *ab
a Homogeneous, Supramolecular and Bio-Inspired Catalysis, Van’t Hoff Institute for Molecular Sciences, University of Amsterdam, Science Park 904, 1098 XH Amsterdam, The Netherlands. E-mail: j.n.h.reek@uva.nl; p.r.linnebank@uva.nl
b InCatT B.V., Science Park 904, 1098 XH Amsterdam, The Netherlands
    
First published on 9th February 2023
The use of data driven tools to predict the selectivity of homogeneous catalysts has received considerable attention in the past years. In these studies often the catalyst structure is varied, but the use of substrate descriptors to rationalize the catalytic outcome is relatively unexplored. To study whether this may be an effective tool, we investigated both an encapsulated and a non-encapsulated rhodium based catalyst in the hydroformylation reaction of 41 terminal alkenes. For the non-encapsulated catalyst, CAT2, the regioselectivity of the acquired substrate scope could be predicted with high accuracy using the Δ13C NMR shift of the alkene carbon atoms as a descriptor (R2 = 0.74) and when combined with a computed intensity of the C=C stretch vibration (IC=C stretch) the accuracy increased further (R2 = 0.86). In contrast, a substrate descriptor approach with an encapsulated catalyst, CAT1, appeared more challenging indicating a confined space effect. We investigated Sterimol parameters of the substrates as well as computer-aided drug design descriptors of the substrates, but these parameters did not result in a predictive formula. The most accurate substrate descriptor based prediction was made with the Δ13C NMR shift and IC=C stretch (R2 = 0.52), suggestive of the involvement of CH–π interactions. To further understand the confined space effect of CAT1, we focused on the subset of 21 allylbenzene derivatives to investigate predictive parameters unique for this subset. These results showed the inclusion of a charge parameter of the aryl ring improved the regioselectivity predictions, which is in agreement with our assessment that noncovalent interactions between the phenyl ring of the cage and the aryl ring of the substrate are relevant for the regioselectivity outcome. However, the correlation is still weak (R2 = 0.36) and as such we are investigating novel parameters that should improve the overall regioselectivity outcome.
Due to the complex shapes of encapsulated catalysts, a small variation in the substrate structure can lead to large variations in the overall catalytic outcome and the catalytic outcome of a single substrate cannot be easily extrapolated to other substrates. Despite this fact, the substrate scope is often not explored extensively in reports discussing encapsulated transition metal catalysts.17,28–31,38,39
DFT calculations combined with analytical techniques are often used to rationalize catalytic outcomes and to predict the selectivity for novel substrates.40–47 To explain the catalytic outcomes using DFT calculations, all pathways need to be considered for every substrate. However, this is not feasible for large systems, such as encapsulated transition metal catalysts, or for large numbers of substrates, because of the computational cost. Because of this, the catalytic results are often only rationalized afterwards, and methods to predict how additional substrates would react often remain elusive. Therefore, it is desirable to find methods that circumvent elaborate DFT calculations, while being able to predict the catalytic outcome of a large substrate scope with reasonable accuracy.
Recently multivariate data driven approaches have been applied to predict the catalytic outcomes of catalyzed reactions.3,48–60 These methods have received considerable attention as these require less computational power while providing valuable information about catalytic systems studied. To be successful, these methods typically require large data sets. Most often catalyst descriptors are used to devise a mathematical model that accurately describes the catalytic outcome for a range of catalysts for a reaction using the same substrates (Fig. 1). However, substrate descriptors should also be applicable to rationalize the catalytic outcome of a large substrate scope. This would then lead to mechanistic insights as such an approach should demonstrate what substrate moieties affect the selectivity outcome, while the amount of computational resources required is lower as only the substrates have to be modeled.
In the hydroformylation reaction, syngas (H2 : CO) is reacted with an alkene in the presence of a transition metal catalyst to form an aldehyde (Fig. 2). Since the formyl group can add to either carbon atom of the alkene, a mixture of regioisomers is typically formed, and as such regioselectivity control is a longstanding challenge for this reaction.61–63
|  | ||
| Fig. 2 General scheme of the hydroformylation reaction showing that two regio-isomers of the aldehyde product can be formed. | ||
In our group we have reported an encapsulated hydroformylation catalyst based on rhodium using a ligand-template meta-tris pyridylphosphine (P(mPy3)) and three zinc-tetraphenylporphyrin (ZnTPP) building blocks (Scheme 1).19,20
|  | ||
| Scheme 1 Substrate scope investigation of the encapsulated [Rh(H)(CO)3(P(mPy3(ZnTPP))3)] catalyst (CAT1) in the hydroformylation reaction of terminal alkenes. | ||
The cage formation relies on the selective coordination of the ZnTPP building blocks to the pyridine moieties of the ligand-template P(mPy3). [Rh(H)(CO)3(P(mPy3(ZnTPP))3)] was the active catalyst which formed under syngas (H2 : CO) conditions. This encapsulated catalyst offered unique regioselectivity in the hydroformylation reaction as this catalyst is able to convert terminal alkenes such as 1-octene to form an excess of the branched aldehyde (linear/branched ratio = 0.56) (Fig. 3). For internal alkenes, such as 2-octene, the innermost internal aldehyde (outermost/innermost ratio = 1/9) is dominantly formed.21
Recently, we evaluated the substrate scope of this caged catalyst ([Rh(H)(CO)3(P(mPy3(ZnTPP))3)] (CAT1)) using 41 terminal alkenes and compared the outcomes against an unencapsulated reference catalyst [Rh(H)(CO)2(P(mPy3))2] (CAT2) (Scheme 1).64 For all substrates investigated, CAT1 produces more branched product than CAT2. The degree of branched selectivity enhancement, however, varies significantly between substrates, and no clear rationale was discovered for why certain substrates gave a significantly higher regioselectivity enhancement to the branched product than others (Fig. 3). Explaining the catalytic outcomes using DFT calculations is not feasible because of the computational cost imposed by both the size of CAT1 and the size of the substrate scope.22,65,66 We hypothesized that this data set could be used to apply a substrate descriptor based approach to uncover regioselectivity trends, which ultimately should lead to more insight into the cage effects induced by hydroformylation catalyst CAT1.
As descriptors, steric53 and electronic parameters54–56 as well as IR frequencies57 have been used. These have been successfully applied to predict catalyst properties for several different reactions including organocatalyzed52,58,59 and transition metal catalyzed reactions.53–57,60 Based on a limited parameter set, a mathematical model is constructed. This model shows which parameters, such as steric or noncovalent interactions, largely affect the outcome of the reaction and this allows for the a priori identification of substrates that react with high selectivity with a chosen catalyst as well as entries towards improved catalyst design. An effective substrate descriptor approach for the investigated substrate scope would provide an important tool for identifying substrates that can be converted with high selectivity with a chosen encapsulated transition metal catalyst. Additionally, this could significantly reduce the number of experiments required to identify reactions that are practically applicable.
|  | (1) | 
As a physical parameter, the difference between the 13C shift of the two olefinic carbon atoms (Δ13C shift) was used and correlated to the selectivity of the hydroformylation reaction with CAT2 (Fig. 4). In previous olefin insertion reactions, this has been identified as a descriptor that strongly correlates with the regioisomeric outcome.55,56,69 Plotting the regioselectivity (ΔΔE) against the Δ13C shift of all substrates evaluated shows a relatively strong correlation with CAT2, demonstrating that for the unencapsulated CAT2 the regioselectivity in the hydroformylation reaction can be predicted with a reasonable accuracy using the Δ13C shift as a substrate descriptor, in line with a previous report.69
|  | ||
| Fig. 4 Plot of the experimental regioselectivity expressed in ΔΔE against the Δ13C shift for all substrates studied for the unencapsulated catalyst CAT2. | ||
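The single-descriptor correlation described above amounts to an ordinary least-squares line with a coefficient of determination. The following is a minimal sketch of that calculation, not the authors' workflow: the function names `fit_line` and `r_squared` are hypothetical, and the numerical values are synthetic placeholders rather than data from this study.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least-squares fit of y ~ slope * x + intercept."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


def r_squared(x, y, slope, intercept):
    """Coefficient of determination (R2) of the fitted line."""
    y_bar = mean(y)
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


# Synthetic placeholder values, NOT measurements from the study.
delta_13c = [20.0, 22.5, 25.0, 27.5, 30.0]   # descriptor (x axis)
dd_e = [0.10, 0.18, 0.33, 0.38, 0.52]        # selectivity (y axis)

slope, intercept = fit_line(delta_13c, dd_e)
r2 = r_squared(delta_13c, dd_e, slope, intercept)
print(f"slope={slope:.4f} intercept={intercept:.4f} R2={r2:.3f}")
```

With real data, the lists would hold one Δ13C shift and one experimental ΔΔE value per substrate.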
To improve the accuracy of our predictions, we calculated the C=C IR-stretch intensity (IC=C stretch) for all substrates using DFT calculations as this frequency was proven to be a useful descriptor that accurately predicts the selectivity for some other reactions.52,57,58,70 This was done by calculating the lowest energy structures and subsequently using the Amsterdam Density Functional (ADF) program.71–73 The B3LYP-D3(BJ)74–76 density functional was used together with a small core TZ2P basis set for both the geometry optimizations as well as the frequency calculations.
| $\Delta\Delta E_{\mathrm{CAT2}} = 0.038\,\Delta^{13}\mathrm{C}_{\mathrm{shift}} + 0.021\,I_{\mathrm{C{=}C\ stretch}} - 0.70$ | (2) |
Including the IC=C stretch in the descriptor formula led to a strong correlation (R2 = 0.86) between the calculated and the observed regioselectivity (Fig. 5). The fact that the regioselectivity in the hydroformylation reaction can be predicted with high accuracy shows that the regioisomeric outcome with CAT2 is mostly governed by alkene polarization, as the IR intensity is mainly governed by the charge redistribution within the bond under specific vibrational transitions.77
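A two-descriptor formula of the form of eqn (2) can be obtained by multivariate least squares. The sketch below solves the normal equations in plain Python. It is an illustration under stated assumptions: the helper names `solve_linear` and `fit_two_descriptors` are hypothetical, the fitting procedure actually used in the study is not specified here, and the example data are synthetic values generated from the coefficients of eqn (2) purely to check that the fit recovers them.

```python
def solve_linear(a, b):
    """Solve a.x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_two_descriptors(d13c, i_cc, dd_e):
    """Least-squares (a, b, c) for dd_e ~ a*d13c + b*i_cc + c."""
    rows = [(x1, x2, 1.0) for x1, x2 in zip(d13c, i_cc)]
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, dd_e)) for i in range(3)]
    return solve_linear(xtx, xty)


# Synthetic descriptor values, NOT data from the study.
d13c = [20.0, 25.0, 30.0, 22.0, 28.0]
i_cc = [5.0, 12.0, 8.0, 15.0, 3.0]
# Noise-free responses generated from the coefficients of eqn (2).
dd_e = [0.038 * a + 0.021 * b - 0.70 for a, b in zip(d13c, i_cc)]

a, b, c = fit_two_descriptors(d13c, i_cc, dd_e)
print(f"a={a:.3f} b={b:.3f} c={c:.2f}")
```

For larger descriptor sets, a dedicated linear-algebra or statistics library would be the more usual choice; the hand-written solver only keeps the example dependency-free.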
        |  | ||
| Fig. 5 Correlation plot of regioselectivity (ΔΔECAT2) predicted versus the experimentally obtained regioselectivity using the Δ13C shift and IC=C stretch substrate parameters as indicated in eqn (2). | ||
| Cage effect = ΔΔECAT2 − ΔΔECAT1 | (3) | 
For CAT1, the regioselectivity induced by the catalyst was also plotted against the Δ13C shift (Fig. 6). Consistent with our expectations, the correlation was significantly weaker than for CAT2 using the same parameter (R2 = 0.35 vs. 0.74).
|  | ||
| Fig. 6 Plot of the experimental regioselectivity expressed in ΔΔE against the Δ13C shift for all substrates studied for the encapsulated catalyst CAT1. | ||
This is in line with the anticipated effect of the confined space around CAT1 as this interacts in a different way when the shape and/or the functional groups on the substrates are altered. Therefore, we hypothesized that models that account for the shape of the substrates could improve the predictive ability of our models. To obtain models that account for the shape of the substrates, we extended the substrate descriptors investigated to Sterimol parameters (Fig. 7), which are parameters that systematically account for the steric influence of the shape of the substrates as reported by Verloop et al.78 Previous work reported by Sigman et al. has shown that for several reactions such descriptors are useful to account for steric effects of functional groups.52–54,58
|  | ||
| Fig. 7 Sterimol parameters acquired to find stronger correlations between substrate properties and selectivity in hydroformylation displayed by CAT1. | ||
We investigated whether such simple steric parameters can account for the interactions such substrates experience with the inner compartment of caged catalyst CAT1. Using the DFT calculated structures of the substrates, we obtained Sterimol parameters. The Sterimol values consist of two width parameters (B1 and B5) and a length parameter (L). The different width parameters were calculated according to the profile of the substituent when viewed down the axis of the C=C bond. B1 is defined as the minimum width perpendicular to the primary bond axis. This value generally describes the extent of branching at the first carbon center next to the C=C bond. The B5 parameter describes the maximum width orthogonal to the same axis, which is a measure of how wide the substrate is. L is the total length of the substituent along the C=C axis.
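The geometric definitions of L, B1 and B5 given above can be illustrated with a simplified calculation on atomic coordinates. The sketch below is an approximation under stated assumptions, not the procedure used in this study: the function name `sterimol_like` is hypothetical, atoms are treated as spheres with caller-supplied radii, B1 is found by scanning a finite set of viewing angles, and none of the corrections used in established Sterimol implementations are applied.

```python
import math


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _unit(a):
    norm = math.sqrt(_dot(a, a))
    return tuple(x / norm for x in a)


def sterimol_like(atoms, origin, axis, n_angles=360):
    """Approximate (L, B1, B5) for atoms given as (x, y, z, radius).

    origin: point on the primary axis; axis: direction of that axis.
    """
    u = _unit(axis)
    # Build two unit vectors spanning the plane perpendicular to the axis.
    helper = (1.0, 0.0, 0.0) if abs(u[0]) < 0.9 else (0.0, 1.0, 0.0)
    e1 = _unit(_cross(u, helper))
    e2 = _cross(u, e1)

    along, planar = [], []
    for x, y, z, r in atoms:
        rel = (x - origin[0], y - origin[1], z - origin[2])
        along.append(_dot(rel, u) + r)              # extent along the axis
        planar.append((_dot(rel, e1), _dot(rel, e2), r))

    length = max(along)
    # B5: largest distance from the axis to any atomic surface.
    b5 = max(math.hypot(a, b) + r for a, b, r in planar)
    # B1: smallest extent found over all scanned in-plane directions.
    b1 = min(
        max(a * math.cos(t) + b * math.sin(t) + r for a, b, r in planar)
        for t in (2.0 * math.pi * k / n_angles for k in range(n_angles))
    )
    return length, b1, b5


# Toy geometry with placeholder radii, NOT a substrate from the study.
toy = [(0.0, 0.0, 0.0, 1.0), (2.0, 0.0, 0.0, 1.0)]
print(sterimol_like(toy, origin=(0.0, 0.0, 0.0), axis=(0.0, 0.0, 1.0)))
```

In this picture, a substituent that branches close to the axis raises B1, whereas a single long arm pointing away from the axis raises B5 but leaves B1 small.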
In the first instance, the substrate descriptor values were plotted against the regioselectivity of CAT1 and CAT2. The cage effect (expressed in ΔΔE) was also used and the overall R2 values for every parameter are presented in Fig. 8.
|  | ||
| Fig. 8 Correlation between descriptors and experimental results (given as R2 values) and a visual representation of parameters investigated. | ||
These values show that better correlations are observed for CAT2 than CAT1 for most substrate descriptors. B1 also correlates with the regioselectivity displayed by CAT1; however, construction of a multiparameter formula with B1 does not result in improved fitting results. The inspection of the substrate scope shows that the B1 parameter mostly reflects di-substitution (B1 ≈ 2.4) or mono-substitution (B1 ≈ 1.8) on the aliphatic carbon atom next to the alkene. However, di-substitution on this carbon atom is also reflected strongly in the Δ13C shift, where disubstituted alkenes have a significantly higher Δ13C shift. This also results in a low orthogonality of this parameter to the Δ13C shift, which in turn is reflected by a low correlation between B1 and the cage effect.
Also, L, the substrate length parameter, and B5, the parameter that represents the maximum width of the substrate orthogonal to the C=C bond, display low correlation values for both catalysts and do not provide useful handles to predict the regioselectivity displayed by these catalysts. Indeed, in our substrate scope investigation, some bulky substrates reacted with exceptionally high branched selectivity (e.g. allylmesitylene, l/b = 0.12) whereas other bulky substrates reacted with low branched selectivity (e.g. 3-(3,5-dimethylphenyl)-1-propene, l/b = 0.71) (Fig. 9), which exemplifies the limitations of employing Sterimol parameters for predicting the catalytic outcome with CAT1. Despite the low correlations observed, we explored correlation equations that included the Sterimol parameters. However, these did not yield significantly better correlations to describe the regioselectivity observed by CAT1 and CAT2 than the equation based on solely the Δ13C shift. This shows that the interactions of the substrates with the cage cannot be simply accounted for with the Sterimol parameters. Most likely, the cage effect of CAT1 involves the precise position of the substrate which is governed by steric hindrance in combination with noncovalent interactions.
|  | ||
| Fig. 9 Bulky substrates both display high regioselectivity control as well as low regioselectivity control with CAT1 despite having similar Sterimol parameters. | ||
As the confined space around CAT1 resembles the active site of enzymes, we also explored substrate descriptors derived from computer-aided drug design (CADD) using RDKit (Fig. 10).79 The substrate descriptors investigated included Kappa shape indices (κ1, κ2, κ3),80,81 Chi shape indices (χ0n–3n, χ0v–3v),81 topological polar surface area (TPSA),82 eccentricity,83 plane of best fit (PBF),84 asphericity,85 sphericity, principal moments of inertia (PMI1, PMI2, PMI3, NPR1, NPR2), approximate surface area (LabuteASA), inertial shape factor (ISF),85 spherocity index,85 and the number of rotatable bonds. Unfortunately, none of the substrate descriptors that we examined resulted in strong correlations for the regioselectivity outcomes of CAT1, CAT2, or the cage effect.
For the unencapsulated catalyst CAT2 the correlation function significantly improved when the IC=C stretch was included. Also, this descriptor correlated well with the cage effect (Fig. 8) and therefore this descriptor was also explored for cage catalyst CAT1. With the IC=C stretch included as a substrate descriptor the overall fit for the selectivity displayed by CAT1 indeed improved (R2 = 0.52 vs. 0.35 with the Δ13C shift alone) (Fig. 11).
| $\Delta\Delta E_{\mathrm{CAT1}} = 0.057\,\Delta^{13}\mathrm{C}_{\mathrm{shift}} - 0.031\,I_{\mathrm{C{=}C\ stretch}} - 1.28$ | (4) |
|  | ||
| Fig. 11 Moderate correlation for catalytic outcome with CAT1 and the Δ13C shift and IC=C stretch substrate parameters. | ||
Surprisingly, the coefficient of the IC=C stretch has the opposite sign in the correlation equation for CAT1 compared to that for CAT2. The correlation equation for CAT1 shows that an increase in IC=C stretch intensity enhances the selectivity for the branched product, whereas the correlation equation for CAT2 predicts that a higher IC=C stretch intensity enhances the selectivity for the linear product. A plausible explanation is that the alkene interacts with the aromatic planes of the walls of the cage as the alkene is in close proximity with the ZnTPP walls, which results in an altered C=C stretch vibration (Fig. 12). Indeed, simple DFT calculations show that the intensity of the C=C stretch is affected in the proximity of an aromatic surface. The regioselectivity predictions are still significantly less accurate for CAT1 than CAT2 (R2 = 0.51 vs. 0.86), indicating that we have not yet captured the cage effect to its full extent. Therefore, it is still desirable to acquire substrate and/or substrate/catalyst descriptors for the substrate scope with CAT1 to construct a formula with higher accuracy.
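The opposite sign of the IC=C stretch term can be made concrete by evaluating eqn (2), (3) and (4) side by side. The sketch below is only an illustration: the function names are hypothetical, the descriptor inputs are placeholder numbers in whatever units the published fits used, and both formulas are regressions with limited accuracy (R2 = 0.86 and 0.52), so the outputs are estimates rather than predictions of record.

```python
def dd_e_cat2(delta_13c, i_cc):
    """Eqn (2): fitted regioselectivity for unencapsulated CAT2."""
    return 0.038 * delta_13c + 0.021 * i_cc - 0.70


def dd_e_cat1(delta_13c, i_cc):
    """Eqn (4): fitted regioselectivity for encapsulated CAT1."""
    return 0.057 * delta_13c - 0.031 * i_cc - 1.28


def cage_effect(delta_13c, i_cc):
    """Eqn (3): cage effect as the CAT2 value minus the CAT1 value."""
    return dd_e_cat2(delta_13c, i_cc) - dd_e_cat1(delta_13c, i_cc)


# Placeholder descriptor values, NOT taken from the substrate scope.
d13c, i_low, i_high = 25.0, 10.0, 11.0

# Raising the intensity by one unit shifts the two fits in opposite directions.
shift_cat2 = dd_e_cat2(d13c, i_high) - dd_e_cat2(d13c, i_low)
shift_cat1 = dd_e_cat1(d13c, i_high) - dd_e_cat1(d13c, i_low)
print(f"CAT2 change per unit intensity: {shift_cat2:+.3f}")
print(f"CAT1 change per unit intensity: {shift_cat1:+.3f}")
print(f"cage effect at placeholder point: {cage_effect(d13c, i_low):.3f}")
```

Because the two intensity coefficients differ in sign, the fitted cage effect grows with the intensity term: subtracting eqn (4) from eqn (2) leaves a combined coefficient of 0.021 + 0.031 = 0.052.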
|  | ||
| Fig. 12 Noncovalent interactions between the alkene moiety and aromatic plane of the cage affect the regioselectivity. | ||
|  | ||
| Fig. 13 Allylbenzene subset investigated to understand the noncovalent interactions between CAT1 and the substrates. | ||
We chose this subset as it is the largest subset investigated and there is a significant variation in the overall regioselectivity outcome with CAT1, while the alkene polarization is comparable for this substrate class. Furthermore, the difference in size between all the substrates does not explain the regioselectivity trends observed. Correlation equations have also been applied to a limited set of similar substrates in other reports.53,54,56,70 This is generally more facile as the catalyst–substrate interactions are generally more similar, making the construction of predictive formulae less complicated. Possibly, due to the diversity of the substrates investigated in this study, the construction of a general formula that accounts for all substrates evaluated is unsuccessful for CAT1. Therefore, better correlations might be obtained with models that only cover certain substrate classes. However, the added value of such models is lower as the formulae only apply to a single class of substrates.
When the measured ΔΔE of all allylbenzene derivatives is plotted against the alkene polarization (Δ13C shift), no correlation between the regioselectivity outcome and the alkene polarization is observed (R = 0.011) (Fig. 14).
|  | ||
| Fig. 14 Weak correlation between the alkene polarization (represented by Δ13C shift) and the regioselectivity outcome of allylbenzene derivatives for CAT1. | ||
Since there was no correlation between the polarization of the alkenes and the overall regioselectivity outcome we investigated additional parameters to obtain a better correlation.
DFT calculations in previous reports show that the computed substrates (i.e. 2-octene and allylbenzene) display CH–π interactions with the porphyrin walls of the cage.22,64 Since the aryl ring of CAT1 interacted with the aryl ring of allylbenzene in a DFT study, we hypothesized that the charge of the aryl ring of the allylbenzene derivatives could provide a predictive parameter of the regioselectivity outcome as these interactions differ between substrates and are most likely responsible for the large differences in selectivity control, e.g., the regioselectivity differences between the allylbenzene type substrates (vide supra). Additionally, these substrates show interesting regioselectivity effects, where ortho and para substituents lead to an increased branched selectivity, whereas meta substituents lead to decreased branched selectivity. When the average Voronoi deformation density (VDD) charge of the C3 and C5 positions of the aryl ring was included as an additional descriptor alongside the Δ13C shift and the aforementioned IC=C stretch, we obtained a significantly better correlation (Fig. 15, eqn (5)).86 This is in agreement with our previously reported assessment that noncovalent interactions between the aryl ring of the allylbenzene moiety and the aryl ring of the ZnTPP moiety of the catalyst affect the regioselectivity.22,64 It shows that a more negative charge on these positions lowers the ΔΔE, which leads to a higher branched selectivity.
| $\Delta\Delta E_{\mathrm{CAT1}} = 0.062\,\Delta^{13}\mathrm{C}_{\mathrm{shift}} - 0.036\,I_{\mathrm{C{=}C\ stretch}} + 2.92\,\mathrm{VDD}_{\mathrm{meta}} + 1.30$ | (5) |
|  | ||
| Fig. 15 Improved regioselectivity prediction of the allylbenzene derivative substrate scope with the inclusion of VDD charges on the meta-carbon atoms of the aryl rings. | ||
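As an illustration of how the charge descriptor enters eqn (5), the sketch below averages the VDD charges of the C3 and C5 positions and evaluates the formula. The function names and all input numbers are hypothetical placeholders, the coefficients are copied from eqn (5) as printed, and the fit itself is weak (R2 = 0.36), so the result should be read as a trend indicator only.

```python
def vdd_meta(q_c3, q_c5):
    """Average VDD charge of the two meta carbon atoms (C3 and C5)."""
    return 0.5 * (q_c3 + q_c5)


def dd_e_cat1_allylbenzene(delta_13c, i_cc, q_c3, q_c5):
    """Eqn (5): fitted regioselectivity of allylbenzene derivatives with CAT1."""
    return (0.062 * delta_13c
            - 0.036 * i_cc
            + 2.92 * vdd_meta(q_c3, q_c5)
            + 1.30)


# Placeholder inputs, NOT computed charges or shifts from the study.
d13c, i_cc = 25.0, 10.0
less_negative = dd_e_cat1_allylbenzene(d13c, i_cc, -0.01, -0.01)
more_negative = dd_e_cat1_allylbenzene(d13c, i_cc, -0.05, -0.05)

# A more negative meta charge lowers the fitted value, which the text
# associates with higher branched selectivity.
print(f"difference: {more_negative - less_negative:+.4f}")
```

Since the coefficient of the charge term is positive, the fitted value falls linearly as the averaged meta charge becomes more negative, with all other descriptors held fixed.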
Similar to the formula that covered the entire substrate scope evaluated, inclusion of the Sterimol or CADD derived descriptors did not result in a significant improvement of the predictivity of the constructed mathematical equations. Recently Sigman et al. reported a workflow to accurately predict the selectivity of a class of cavity shaped C–H activation catalysts.51 Several additional descriptors of the catalysts, coined SMART, were included to accurately predict the selectivity of these catalysts. The selectivity of the catalysts investigated in that study is also caused by confinement effects, similar to CAT1.
For the unencapsulated catalyst CAT2, the regioselectivity could be predicted with high accuracy (R2 = 0.86) using the Δ13C shift and the IC=C stretch as substrate descriptors. This is in agreement with our assertion that the outcome is mostly determined by the alkene polarization parameters and remote substituents do not significantly affect the catalytic outcome with this catalyst.
A similar approach for the encapsulated catalyst CAT1 showed that the selectivity of the reaction was significantly more difficult to predict, and it is clear that additional substrate parameters such as steric interactions between the substrates and the cage as well as noncovalent interactions play a role in determining the overall regioselectivity. Sterimol and computer-aided drug design (CADD) derived parameters were investigated in order to improve the model by accounting for the substrate size and shape. However, the use of such parameters does not lead to improved models for prediction of the selectivity of the reaction. The models used did not account for the noncovalent interactions between substrate moieties and the walls of the cage or for the relative flexibility of the substrate. As such, we investigated correlation equations using the allylbenzene derivative subset when reacted with CAT1 and found a significant improvement when we included the average VDD charge at the C3 and C5 positions of the allylbenzene derivatives. Despite this improvement, the accuracy of the formula predicting the regioselectivity was still low using these descriptors (R2 = 0.36). As such, we are currently investigating additional parameters to be able to accurately predict the regioselectivity of substrates when reacted with CAT1.
| Footnote | 
| † Electronic supplementary information (ESI) available. See DOI: https://doi.org/10.1039/d3fd00023k | 
| This journal is © The Royal Society of Chemistry 2023 |