Siowling Soh a, Yanhu Wei b, Bartlomiej Kowalczyk b, Chris M. Gothard b, Bilge Baytekin b, Nosheen Gothard b and Bartosz A. Grzybowski *ab
a Department of Chemical and Biological Engineering, Northwestern University, 2145 Sheridan Road, Evanston, IL 60208. E-mail: grzybor@northwestern.edu
b Department of Chemistry, Northwestern University, 2145 Sheridan Road, Evanston, IL 60208
First published on 23rd February 2012
Although modern chemical databases store a great wealth of structural and reactivity data, this vast “universe” of chemical information has not yet been systematically analyzed. Here, we use computers to derive, from the entire body of organic-chemical knowledge, indices that estimate the reactivity and cross-influence of functional groups. The major premise of our approach is that in sufficiently large and diverse collections of reactions (as the entire “history” of organic chemistry is), the frequencies with which transformations of certain groups occur reflect their reactivities. Illustrative examples spanning several classes of reactions demonstrate that our knowledge-based indices capture well-known reactivity trends. Free-access software is also provided, with which other trends can be analyzed for various combinations of functional groups.
Recently, we have shown that accurate reactivity indices for specific individual groups involved in various classes of reactions can be derived from the existing body of chemical knowledge (refs. 8, 9), namely, from the statistics of reactions reported in the literature from the times of Lavoisier to the present and now stored in chemical databases such as Beilstein/Reaxys, The Chemical Thesaurus, or Current Chemical Reactions. The premise of this knowledge-based approach is that in sufficiently large, diverse and unbiased collections of reactions (as the entire “history” of organic chemistry certainly is; refs. 8, 10–12), the numbers of transformations in which specific functional groups have reacted should reflect the reactivity of these groups. In other words, with adequate reaction statistics, synthetic “popularity” is proportional to chemical reactivity. In ref. 9, we validated this approach against experimental data (notably, Hammett constants) for several classes of reactions including formation of diazo compounds, Stille cross-coupling, Heck and/or Suzuki type couplings, and more. With hindsight, the fact that admittedly simple reaction counting yielded accurate estimates of reactivity is, perhaps, not that surprising. After all, every successful chemical reaction reported in the literature is an outcome of an experiment in chemical reactivity, kinetics, and thermodynamics. What our approach quantifies is, in essence, the fact that more kinetically and/or thermodynamically favorable reactions have higher chances of being carried out successfully than those that are less favored. While such a conclusion would be premature when considering only a few select reactions, large reaction sets “average over” specific conditions (solvents, temperature, etc.), and if the comparisons are made for specific groups, it is the inherent reactivities and cross-reactivities that our statistical method yields.
In the present paper, we extend this statistical approach to attack a problem that is significantly more challenging than determining the reactivities of individual groups. Specifically, our objective is to back-track, from all known synthetic knowledge (cf. Fig. 1a), the measures that quantify the mutual influence of organic functionalities present in the same molecule. Based on the counts of database-stored chemical reactions, we derive conditional probabilities of certain groups being present in a molecule and reacting; these probabilities then lead to indices, ηAB, that quantify cross-influence (activating vs. deactivating) for arbitrary pairs of chemical functionalities in different mutual arrangements (e.g., directly connected or separated within a molecule). The reactivity trends that the ηAB indices predict are congruent with common chemical intuition for several well-studied classes of reactions. In addition to the illustrative examples discussed in the paper, a β-version of the software, which we call ChemGPS (for Chemical Group Predictor Software) and which calculates the ηAB indices for user-specified group pairs, is made freely available through our group web page (http://www.dysa.northwestern.edu/ChemGPS). Our work aims to supplement “intuitive”/subjective cross-influence/reactivity measures with ones that are based on the entire, collective knowledge of several generations of chemists. It is the first time in the history of chemistry that this knowledge can be analyzed and quantified en masse with the help of modern computers.
Fig. 1 (a) A small (∼300 nodes) sub-network of organic chemistry centered around cortisone. Individual nodes represent molecules; arrows represent reactions. The entire “universe” of known organic reactions is more than 25,000 times larger than the sub-network shown here. Yet, the wealth of structural and reactivity information stored in this repository, created collectively by generations of chemists (refs. 8, 9, 11, 12), has not been systematically explored. Of course, to make predictions beyond simple network connectivity, the individual nodes of the network must be analyzed in structural detail. In the present work, all nodes/molecules are decomposed into functional groups as illustrated in (b) for Penicillin G (top) and Dauricine (bottom). The numbers on the right correspond to the occurrences of each functional group in the molecule. The algorithm for decomposition will be published separately; the 322 most important groups are listed in the Supplementary Information†, Section 1.
Fig. 2 Reactivity of individual functional groups. (a) Scheme illustrating the calculation of the reactivity index, RA, defined as the fraction of reactions in which group A reacts. In this case, RA = 5/10 = 0.5. (b) Three examples of calculated RA trends, which are consistent with common chemical knowledge. The sizes of the reaction sets, NAtot, and the 95% confidence intervals, CIA, are indicated for each RA.
Fig. 3 The cross-influence of functional groups A and B. (a) Scheme for calculating the reactivity index, RAB, when both A and B are present. In this case, RAB = 6/10 = 0.6. The cross-influence index is then ηAB = RAB/RA, where ηAB > 1 indicates that B increases the reactivity of A, while ηAB < 1 indicates that B decreases it. (b) Table of ηAB for all 322 × 322 pairs of functional groups, where A and B are directly connected. The values of ηAB are color-coded as illustrated in the scale on the right. A magnified view of a fragment of the table is shown in (c). In this specific example, ηAB generally increases “to the right,” with increasing number of nitrogen atoms in the aromatic ring (see main text for discussion).
The key point of our work is then as follows: if RAB > RA, the presence of B renders A more reactive than its inherent/“average” reactivity; in this case, the cross-influence index ηAB = RAB/RA > 1. Conversely, if RAB < RA (or ηAB < 1), B causes A to be less reactive than in the generic case (i.e., when the reactivity of A is determined irrespective of any other groups present on the same molecule). We make three further comments about the cross-influence indices ηAB thus defined: (1) These indices can take into account the specific arrangement of the A and B groups within the molecules. In the present work, we focus on examples where A and B are either directly connected or separated by one carbon atom, such that steric or electronic effects of proximal groups are most pronounced (though, of course, the same methodology can be applied to situations where A and B are more distant). (2) For a given arrangement of groups, ηAB indices can be generated rapidly (within seconds on a standard PC) for all possible pairs of functional groups. This procedure generates a cross-influence matrix such as that in Fig. 3b, where the colors correspond to the activating or deactivating effect of B on A. (3) The indices can be calculated within specific classes of reactions; in such a case, NAB values are reaction counts over a particular reaction class only (as discussed later in the text, see Fig. 5).
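The definitions above reduce to simple ratios of reaction counts. The following is a minimal Python sketch of that arithmetic, not the ChemGPS implementation: the `Counts` class and the function names are hypothetical, and the toy counts are the ones used in the schemes of Fig. 2a (5 of 10) and Fig. 3a (6 of 10).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Counts:
    """Reaction counts for group A alone, or for A in the presence of B."""
    reacted: int  # reactions in which group A is transformed
    total: int    # reactions in which A is present on a substrate


def reactivity(c: Counts) -> float:
    """R = fraction of reactions in which the group reacts."""
    if c.total <= 0:
        raise ValueError("total must be positive")
    return c.reacted / c.total


def cross_influence(a: Counts, ab: Counts) -> float:
    """eta_AB = R_AB / R_A; > 1 means B activates A, < 1 means it deactivates."""
    r_a = reactivity(a)
    if r_a == 0:
        raise ValueError("R_A is zero, so eta_AB is undefined")
    return reactivity(ab) / r_a


# Toy counts from the schemes in Fig. 2a and Fig. 3a.
eta = cross_influence(Counts(5, 10), Counts(6, 10))
print(f"eta_AB = {eta:.2f}")  # prints "eta_AB = 1.20"
```

With these toy numbers, RA = 0.5 and RAB = 0.6, so ηAB = 1.2 and B would be classed as activating. Repeating the same calculation over every ordered pair of groups is what fills a matrix like the one in Fig. 3b.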
The first case study is illustrated in Fig. 3c for substituted heterocyclic compounds, where the reactivity of the substituents (e.g., A = Cl, I, OH, COOH, COOR) increases with increasing number of nitrogen atoms present on the aromatic ring B (i.e., from left to right in the figure; values of ηAB are color-coded, see figure caption). These results are chemically reasonable, since the presence of more nitrogen atoms produces a stronger electron-withdrawing effect, causing groups A to be more active in various reactions. For instance, as the number of nitrogen atoms increases, hydroxyl and carboxylic acid groups are more easily deprotonated, the ester group is more prone to attack by nucleophilic reagents, and the decrease of electron density around the carbon atoms makes an aryl chloride or iodide more prone to nucleophilic attack or Pd-catalyzed coupling.
Another trend is shown in Fig. 4, where the cross-influence of several synthetically important groups (either directly connected or separated by one carbon atom) is considered. When A is directly connected to B (Fig. 4a), it becomes an easier target for nucleophilic reagents because of one of two factors: (i) a significant electron-withdrawing effect of groups B (here, ketone, carboxylic acid, ester, amide) on A (alkene, ketone, ester) or (ii) electron conjugation of A to B. In both cases, ηAB > 1. Similarly, when A and B are separated by one carbon atom (Fig. 4b), electron-withdrawing groups B (e.g., nitro and ketone) encourage nucleophilic attack, while B = benzene or alkyne can help stabilize the transition states of SN1/SN2 chloride substitution via the conjugation effect. Further examples, in which B “deactivates” A (i.e., ηAB < 1), are discussed in the SI†, Section 3.
Fig. 4 Examples of the cross-influence indices, ηAB, calculated for the case when (a) functional groups A and B are directly connected, and (b) when they are separated by one carbon atom. Owing to the large sample size (NABtot ∼ 10^3–10^4), the 95% confidence intervals, CIη, for ηAB are small, indicating reliable statistics. (c) The reactivity of inherently highly reactive functional groups (here, A = acyl chloride) does not depend perceptibly on the nature of other groups present in the molecule; in this case, ηAB ∼ 1. In contrast, when A is less inherently reactive (here, A = ketone), the nature of B can influence its reactivity to a much higher degree (ηAB values above or below 1). Error bars correspond to CIη.
An interesting situation arises when A is a highly reactive group itself—in this case, one could expect that less reactive groups B would have a marginal effect on the already high reactivity of A. This, indeed, is the case, as illustrated in Fig. 4c where various substituents connected to the highly reactive acyl chloride have minimal effect on its reactivity, so that RAB ∼ RA and ηAB ∼ 1. This is in contrast to a case where A is less reactive (e.g., ketone), and the nature of B influences the values of ηAB perceptibly (also see SI†, Section 4 for further examples).
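Whether ηAB differs “perceptibly” from 1 can be judged by checking whether its 95% confidence interval contains 1. The sketch below is one possible way to do this and is an assumption throughout: it uses a textbook log-scale normal approximation for a ratio of two proportions and treats RA and RAB as independent, which is a simplification. It is not the procedure the authors describe in the Supplementary Information, and the function names and counts are hypothetical.

```python
import math


def eta_interval(n_a, n_a_tot, n_ab, n_ab_tot, z=1.96):
    """Return (eta, low, high) for eta_AB = R_AB / R_A.

    ASSUMPTION: log-scale normal approximation for a ratio of two
    independent binomial proportions; z = 1.96 gives roughly 95% coverage.
    """
    if min(n_a, n_ab) <= 0 or n_a > n_a_tot or n_ab > n_ab_tot:
        raise ValueError("need 0 < reacted <= total for both A and AB")
    r_a = n_a / n_a_tot
    r_ab = n_ab / n_ab_tot
    eta = r_ab / r_a
    # Delta method: Var[ln(p_hat)] ~ (1 - p) / (number of successes).
    se_log = math.sqrt((1 - r_ab) / n_ab + (1 - r_a) / n_a)
    half_width = z * se_log
    return eta, eta * math.exp(-half_width), eta * math.exp(half_width)


def differs_from_one(n_a, n_a_tot, n_ab, n_ab_tot):
    """True if the approximate 95% interval for eta_AB excludes 1."""
    _, low, high = eta_interval(n_a, n_a_tot, n_ab, n_ab_tot)
    return low > 1.0 or high < 1.0


# Hypothetical counts, chosen only to exercise the functions.
print(differs_from_one(5000, 10000, 600, 1000))  # large A-B sample
print(differs_from_one(5000, 10000, 51, 100))    # small A-B sample
```

Under this approximation the interval narrows as the reaction counts grow, which matches the remark in the caption of Fig. 4 that large sample sizes give small confidence intervals.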
While the trends discussed above are taken across all possible reaction types, ηAB can also be calculated for specific types of transformations. In such cases, the statistics become sparser (reaction counts, N's, are smaller) but are still sufficient for statistical significance at the 95% confidence level (see ref. 14). As an illustration, first consider a general trend for the case of A = ester and B = halides (Fig. 5a). As seen, ηAB increases with increasing electronegativity of the halide, which renders the carbonyl carbon atom more susceptible to nucleophilic attack. This trend is also present, albeit with different values of the ηAB indices, in specific reaction classes such as the hydrolysis of esters (Fig. 5b) or the conversion of esters into amides via a nucleophilic attack (Fig. 5c). These trends are established by narrowing the counts of reactions (NA and NAB) to specific sub-classes of transformations. These narrowed counts are denoted NSA and NSAB, where S labels the sub-class, such that the reaction-type-specific index is ηSAB = (NSAB/NABtot)/(NSA/NAtot).
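The reaction-type-specific index is the same ratio as before, with the numerators restricted to one reaction class S and the denominators left as the total counts. A minimal sketch under stated assumptions: the function name, the class labels, and all counts are hypothetical and are not taken from the paper or from ChemGPS.

```python
def class_specific_eta(n_s_ab, n_ab_tot, n_s_a, n_a_tot):
    """eta^S_AB = (N^S_AB / N_AB^tot) / (N^S_A / N_A^tot)."""
    if n_ab_tot <= 0 or n_a_tot <= 0 or n_s_a <= 0:
        raise ValueError("N_AB^tot, N_A^tot and N^S_A must be positive")
    return (n_s_ab / n_ab_tot) / (n_s_a / n_a_tot)


# Hypothetical counts for A = ester with one substituent B.
# The totals are shared by every reaction class; only N^S changes.
N_A_TOT, N_AB_TOT = 10_000, 1_000
class_counts = {  # reaction class -> (N^S_A, N^S_AB)
    "hydrolysis": (200, 30),
    "amidation": (100, 25),
}

etas = {
    name: class_specific_eta(n_s_ab, N_AB_TOT, n_s_a, N_A_TOT)
    for name, (n_s_a, n_s_ab) in class_counts.items()
}
for name, value in etas.items():
    print(f"{name}: eta_S = {value:.2f}")
```

Because the denominators do not depend on the reaction class, a different class changes only the two narrowed counts, which is why the same pair of groups can show the same qualitative trend with different index values from class to class.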
Fig. 5 Examples of the cross-influence indices for specific reaction sub-types. (a) The general trend (i.e., for all sub-classes of reactions), where the cross-influence index, ηAB, of esters increases with increasing electronegativity of the halide substituents. When specific reactions are considered, the trends are qualitatively similar, but the reaction-specific values of ηSAB are different: (b) ester hydrolysis, (c) conversion of esters into amides via a nucleophilic attack. Note that NABtot is the same for all examples in (a–c). (d, e) Indices and trends for two other important and popular reaction classes: (d) the Sonogashira coupling and (e) the Heck coupling.
This methodology is applicable to arbitrary reaction types. For example, in Fig. 5d–e the influence of halides B on the Sonogashira and Heck couplings is quantified. The ηSAB index values for both of these reactions reflect the reactivity of aryl halides (R–I > R–Br > R–Cl > R–F). These results are well known and chemically reasonable because cleavage of the aryl halide bond is involved in both the Sonogashira and Heck reactions, and the energies of aryl halide bonds follow the ordering R–I < R–Br < R–Cl < R–F.
The final set of examples in Fig. 6a applies the ηSAB indices to the popular Diels–Alder reaction; importantly, it is chosen to provide some quantitative comparison with available experimental data. It is known that in this reaction, the reactivity of dienes can be enhanced either by n-electron-donating substituents (e.g., methoxy) or by bulky substituents such as t-butyl ester, trimethylsilyl (TMS) or t-butyl silyloxy (TBS), in whose presence the diene is more prone to assume the s-cis conformation. These effects have been studied systematically by Fowler (ref. 15) and Sauer (ref. 16), who quantified the reactivity of dienes in terms of the reaction rate constants. These experimental reactivity measures are listed in the top row of Fig. 6a and are normalized with respect to the rate constant of an unsubstituted diene. For comparison, the bottom row in the same figure gives the ηSAB indices we derived from the reported reaction counts (with A being the diene and B = chloro, bromo, ether, ester, silyl ether); these indices agree qualitatively with the experimental trends. We note that this particular analysis is relevant to modern synthesis. A classic example here is the wide use of Danishefsky's diene (Fig. 6b), whose reactivity derives from the synergistic effect of the OMe and TMS substituents. Another interesting example is Nicolaou's synthesis of the natural product colombiasin A (Fig. 6c; for two more examples, see SI†, Section 5).
Fig. 6 Reactivity of substituted dienes in the Diels–Alder reaction. (a) Reactivities measured experimentally (by determining the reaction rate constants, refs. 15 and 16, and normalizing to unity for a diene without any reactive functional groups attached) and those (ηSAB) estimated by our method. (b, c) Two contemporary synthetic examples in which the diene's reactivity is enhanced by substituents that ChemGPS finds activating: (b) the use of Danishefsky's diene (1) in the total synthesis of disodium prephenate (ref. 24); (c) K. C. Nicolaou's synthesis of colombiasin A, where the TBS protecting group is used as a steric tool to favor the s-cis conformation of diene (2) (ref. 25).
Footnote
† Electronic supplementary information (ESI) available: the list of 322 functional groups, calculation of the indices' confidence intervals and more chemical trends/examples. See DOI: 10.1039/c2sc00011c
This journal is © The Royal Society of Chemistry 2012