A thorough anion–π interaction study in biomolecules: on the importance of cooperativity effects†

†Electronic supplementary information (ESI) available: Fig. 1 and 2, Tables 1–24 and all interactions. See DOI: 10.1039/c5sc01386k

The importance of anion–π interactions in key biological processes is reported from a PDB analysis of anion–π interactions in biomolecules, also considering cooperativity effects by including other interactions.


Introduction
Noncovalent interactions are central to the science of intermolecular relationships. In particular, those involving aromatic rings play a vital role in chemistry and biology,1 which is especially prominent in drug–receptor interactions, crystal engineering, and protein folding.2 For example, we have recently reported the small molecule XD14, a BET bromodomain inhibitor, which presents a key T-shaped π–π interaction with a tryptophan in the recognition site of the target that is responsible for its high potency and selectivity.3 Around 60% of aromatic amino acid side chains (phenylalanine, tyrosine, tryptophan, and histidine) are estimated to participate in π-stacking interactions in proteins.4 Stacking interactions also play a fundamental role in nucleic acids, where the structure of DNA duplexes is stabilized by intra- and inter-strand stacking of the nucleobases.4,5 Moreover, the action of intercalating drugs, as well as the biochemical processes implicated in the control and regulation of gene expression, depend on protein–DNA stacking interactions.6 A related function takes place at the active site of a number of DNA repair enzymes, where alkylated purines are excised by means of a recognition mechanism based on π–π contacts with the side chains of aromatic amino acids.7 Similarly, these contacts play a crucial role in the repair process, where the insertion of aromatic amino acids into the DNA strand helps preserve stability when the damaged base is flipped out of the duplex and into the active site of the repair enzyme.7

In recent years, the interaction between an electron-deficient aromatic moiety and an anion conveniently located above the ring plane has been accepted as a noncovalent bonding contact. This contact, designated an "anion–π interaction",8 has been investigated in a myriad of computational studies, which show that it is energetically favorable,8–13 as well as in several experimental studies.14–17 Although the role of anion–π interactions in chemical processes is being progressively acknowledged,18–23 their involvement in biological processes has scarcely been reported. The search for anion–π interactions in biological macromolecules began in 2011, when our group reported clear evidence of anion–π interactions in the active site of urate oxidase, causing inhibition of the enzymatic activity and thereby demonstrating, for the first time, the crucial role of this noncovalent interaction in a biological system.24 Three additional studies appeared the same year indicating that such interactions may be important in protein structures. A pioneering systematic search through the Protein Data Bank (PDB) showed that anion–π close contacts exist in experimental protein structures between the standard aromatic residues (Trp, Phe, Tyr, and His) and anions such as chloride and phosphate.25 Also, by a systematic search of protein structures followed by ab initio calculations, our group showed that anion–π interactions are likely to occur in flavin-dependent enzymes.26 By examining high-resolution structures of proteins and nucleic acids, Chakravarty and coworkers pointed out that the "η6"-type anion–π interaction is observed unambiguously and suggested that it plays an important role in macromolecular folding and function.27 Howell and coworkers also performed a PDB search focusing on interactions between Phe and negatively charged residues such as Asp and Glu, concluding that anion–π interactions are weakly attractive or slightly repulsive.28 A subsequent refinement of their PDB study showed that these interactions are present in thousands of protein structures, with binding energies as large as −8.7 kcal mol⁻¹.29 Wetmore and coworkers thoroughly studied the interaction between cytosine and Asp or Glu, concluding that the large magnitude of the anion–π interaction, up to ca. 23 kcal mol⁻¹, suggests that it can play a large role in biology.30 Our group has also reported the critical role of the anion–π interaction in the mechanism of sulfide:quinone oxidoreductase31 and demonstrated its importance in the mechanism by which phenyldiketo acids inhibit malate synthase.32

To deepen the current understanding of the biological role of the anion–π interaction, and to greatly expand the number of possible interactions by increasing the number of interacting units, in this work we present a large-scale PDB analysis of the occurrence of anion–π interactions in proteins and nucleic acids. We consider the side chains of Phe, Tyr, Trp, and His and the purine and pyrimidine bases as the interacting aromatic rings, and F⁻, Cl⁻, Br⁻, I⁻, SO₄²⁻, PO₄³⁻, NO₃⁻, CO₃²⁻, Glu, and Asp as the interacting anions (because the pKa values of Asp and Glu are low, 3.5–4.5,33 we assume that Asp and Glu are always ionized). Moreover, to gain insight into the role of anion–π interactions in the stabilization of macromolecular complexes, we have also studied inter-chain recognition, primarily for proteins. We have gone a step further in the analysis by considering cooperativity effects through the inclusion of a second noncovalent interaction, i.e. π-stacking, T-shaped, or cation–π interactions. These cooperativity effects are expected to be particularly important for the weakly attractive anion–π interactions in which the aromatic ring is an electron-rich π-system. To the best of our knowledge, this is the first time that cooperativity effects have been addressed in a study of anion–π interactions in biological systems.

Results and discussion
Geometric parameters used during data collection are depicted in Fig. 1, along with exemplary binary and ternary interactions. The search yielded thousands of anion–π interactions contained in the PDB, as well as ternary complexes involving additional aromatic systems and cations. The identified interactions are summarized in Table 1. In ESI Tables 2 and 3† we include the interacting residues and summarize the results of the search for anion–π interactions with the adenine (DA), cytosine (DC), thymine (DT), and guanine (DG) rings of DNA. First, we observed 69 interactions in 56 unique PDB structures, 63 of which corresponded to protein–DNA complexes. We could not detect selectivity towards either of the two most abundant anions, i.e. Glu (32 interactions) and Asp (31 interactions), which together account for 91.3% of the interactions. For the remaining anions, chloride and sulfate, only 1 and 5 anion–π interactions were found, respectively. The most representative binary contact is Glu-DT, followed by Asp-DC and Asp-DT. The results in ESI Table 3† show that the interactions with DT and DC are by far the most numerous: 84.3% of Glu and 96.8% of Asp interact via anion–π contacts with the pyrimidine rings. The purine bases adenine (6 hits) and guanine (1 hit) barely establish interactions. From the electrostatic point of view, it is understandable that the most π-acidic ring, thymine, is the most abundant interacting residue. However, the electrostatic contribution alone cannot explain the disparity between the cytosine and guanine rings.
In Fig. 2 we show histograms of the equilibrium distance and angle as defined in the computational methods. Overall, the median equilibrium distance (d̄−) and angle (ᾱ−) are 4.53 Å and 56.2°. If we break these values down into the different anion contributions, the shortest d̄− and smallest ᾱ− are found for Glu, with 4.39 Å and 61.6°. Intriguingly, the interactions with Asp have a noticeably longer d̄− (4.67 Å) along with a wider ᾱ− (53.4°) than those with Glu, which can be attributed to the longer Glu side chain. Sulfate presents a d̄− of 4.50 Å and the largest ᾱ−, 55.1°. We have also analyzed the orientation of the carboxylate with respect to the aromatic system by considering the angle between the plane defined by the carboxylic carbon and oxygen atoms of Asp and Glu and the plane of the interacting aromatic system. The results are gathered in ESI Fig. 2.† A value close to 0° indicates a face-to-face interaction and a value close to 90° indicates an edge-to-face interaction. As inferred from inspection of the figure, a face-to-face approach predominates, with a median angle of 30.0°, which is consistent with a reinforcement of the anion–π interaction by a π–π effect.

Fig. 1 Considered interaction types and geometric parameters used during data collection. Distances d_a, d_c, and d_π are measured between the centroid of the aromatic ring and the anion, the cation, and the centroid of another aromatic ring, respectively. Angles α_a, α_c, and α_π are formed by the ring plane and the vector connecting the ring centroid with the anion, the cation, and another ring, respectively. Angle α_ππ is formed between the two ring planes. A comprehensive definition of the centers and centroids for each amino acid, nucleic base, and ion is given in ESI Table 1.†
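The geometric descriptors above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' search code: it assumes an idealized planar ring given by its atom coordinates, obtains the ring normal with Newell's method (a choice made here, not stated in the text), and uses hypothetical coordinates in the usage lines; the actual atoms defining each centroid are listed in ESI Table 1.†

```python
import math


def centroid(points):
    """Arithmetic mean of a list of (x, y, z) coordinates."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))


def plane_normal(points):
    """Unit normal of a (near-)planar polygon computed with Newell's method."""
    nx = ny = nz = 0.0
    for j, (x1, y1, z1) in enumerate(points):
        x2, y2, z2 = points[(j + 1) % len(points)]
        nx += (y1 - y2) * (z1 + z2)
        ny += (z1 - z2) * (x1 + x2)
        nz += (x1 - x2) * (y1 + y2)
    length = math.sqrt(nx * nx + ny * ny + nz * nz)
    return (nx / length, ny / length, nz / length)


def ion_ring_geometry(ring, ion):
    """Return (d, alpha): centroid-ion distance and the angle (degrees) between
    the centroid->ion vector and the ring plane (90 = directly above the centroid)."""
    c = centroid(ring)
    v = tuple(a - b for a, b in zip(ion, c))
    d = math.dist(ion, c)
    sin_alpha = abs(sum(a * b for a, b in zip(v, plane_normal(ring)))) / d
    return d, math.degrees(math.asin(min(1.0, sin_alpha)))


def interplanar_angle(plane_a, plane_b):
    """Angle (degrees) between two planes folded into [0, 90]:
    close to 0 = face-to-face, close to 90 = edge-to-face."""
    cos_t = abs(sum(a * b for a, b in zip(plane_normal(plane_a), plane_normal(plane_b))))
    return math.degrees(math.acos(min(1.0, cos_t)))


# Hypothetical example: idealized six-membered ring (radius 1.4 Å) in the xy-plane.
ring = [(1.4 * math.cos(math.radians(60 * k)), 1.4 * math.sin(math.radians(60 * k)), 0.0)
        for k in range(6)]
print(ion_ring_geometry(ring, (0.0, 0.0, 3.5)))  # anion on the ring normal
print(ion_ring_geometry(ring, (3.0, 0.0, 3.0)))  # laterally displaced anion
# Carboxylate plane given by C and both O atoms (hypothetical coordinates).
print(interplanar_angle(ring, [(0.0, 0.0, 3.0), (1.0, 0.0, 3.5), (-1.0, 0.0, 3.5)]))
```

With real structures, `ring` would hold the ring atoms of a side chain or base and `ion` the anion center; Newell's normal remains a reasonable average plane for the slightly non-planar rings found in experimental coordinates.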
An example of an anion–π interaction is illustrated in Fig. 3a, with the BamHI type II restriction endonuclease bound to DNA in the presence of Mn²⁺ and Ca²⁺.34 Type II restriction endonucleases are phosphodiesterases that recognize short palindromic DNA sequences and cleave both DNA strands to yield 5′-phosphate and 3′-hydroxyl groups. The figure shows two anion–π interactions between Asp154 and cytosines 4 and 8 from different strands in the post-reactive state of the enzyme. The pre-reactive state of the enzyme preserves the same two interactions,34 which also appear in a previous structure of BamHI in the absence of metals.35 An overlay of the enzyme in its apo form and in the pre-reactive state reveals that, upon DNA binding, Asp154 is displaced by 5.77 Å to engage in an anion–π interaction with cytosine (ESI Fig. 1†).

2.1.2. Interactions involving RNA. Glu selectively interacts via anion–π contacts over Asp (the second most abundant anion, 55 interactions, 27.9%), with both anions accounting for 90.3% of the interactions. These results are in striking contrast with the DNA results, where no selectivity for either Glu or Asp was observed (ESI Table 3†). Apart from Glu and Asp, the other anions that appear in the search are chloride and sulfate, with 13 and 6 anion–π interactions, respectively. As opposed to the DNA results, the most common contact pair is Glu-A, which represents 32.5% of all contact pairs. From the results gathered in ESI Table 4† it can be deduced that A (92.8%), G (65.2%), and U (71.1%) interact preferentially with Glu. If we only consider surface (inter-chain) interactions by removing Cl⁻ and SO₄²⁻, all these percentages increase moderately, except for guanine, which increases dramatically up to 96.8%. The statistical analysis unveils an unexpected preference of cytosine to interact with Asp: 90.9% of cytosine forms anion–π interactions with this amino acid. This preference is reciprocal, because 72.7% of Asp is found in anion–π contacts with cytosine.
These results are also supported by the expected numbers of Glu-A and Asp-C pairs, which are smaller than the observed ones. This enrichment induces a significant reduction in the number of pairs of Asp with the purine bases adenine and guanine, yet it does not affect the formation of complexes with uracil. Chloride shows the highest selectivity, with all 13 chloride anions interacting with the guanine ring, as can also be inferred from the comparison of the expected and actual amounts for the Cl-G pair. All these results are in stark contrast with the DNA results, since in RNA there is no predominance of pyrimidine over purine bases. These differing selectivities may reflect differences between the two nucleic acids. On the one hand, they might depend on the conformational effect derived from C2′-endo (DNA) or C3′-endo (RNA) sugar puckering, which leads to different distances and twist angles between two subsequent base pairs along the helical axis. On the other hand, it must be borne in mind that the unbalanced numbers of anion–π interactions identified in RNA and DNA within the PDB (Table 1) may lead to biased conclusions.
The histograms of the equilibrium distance and angle are shown in Fig. 2. First, we observe that the distributions of the interaction distances in RNA and DNA are remarkably distinct. Indeed, d̄− is much shorter in RNA (Δd̄− = −0.39 Å). If we consider the different anions separately, the shortest d̄− with a small ᾱ− is found for Asp, with 3.85 Å and 67.1°. In contrast to DNA, it is worth mentioning that the interactions with Glu have a much longer d̄− (4.21 Å) along with a much wider ᾱ− (55.5°) than those with Asp.
The orientation of the Asp and Glu carboxylates with respect to the aromatic system (ESI Fig. 2†) shows that a face-to-face approach is dominant, with a median angle of 20.3°. The angle is smaller than in DNA, suggesting a stronger reinforcement of the anion–π interaction when the carboxylate lies parallel to the RNA bases.
In Fig. 3b we show an example of an anion–π interaction in the binding of RNA by the human U1A protein, which is critical in the transcription process of genetic information.36 It is known experimentally that the C-terminal domain (which includes the Asp92 residue) is crucial for the stability of the RNA–U1A complex.37 It has been demonstrated that this binding mechanism is primarily based on an anion–π interaction between Asp92 and C12,38 which seems to be critical in controlling the locking/unlocking binding mechanism that governs the RNA-binding specificity of the human U1A protein.
2.1.3. Interactions involving proteins. The results obtained from the search for anion–π interactions with the side chains of histidine (His), phenylalanine (Phe), tyrosine (Tyr), and tryptophan (Trp) in proteins are summarized in Table 2 and ESI Tables 6 and 7.† We observed 82 456 interactions in 38 027 unique PDB structures, 80 346 of which corresponded to interactions exclusively involving amino acids. Notably, these results imply that 61.3% of all the processed structures in the PDB (62 033 structures, Methods) contain anion–π interactions as classified herein. The proportion of Glu in these interactions (46 132 interactions, 55.9%) is slightly greater than the overall percentage of Glu in our working PDB set (51.7%, ESI Table 8†), indicating a modest selectivity of this anion for anion–π interactions. These results are similar to those obtained for RNA, although the selectivity for Glu is higher in RNA than in proteins. In addition to Glu and Asp, the other identified anions include sulfate, chloride, and phosphate, with 1055 (1.3%), 627 (0.8%), and 261 (0.3%) interactions, respectively, and minute amounts of nitrate, carbonate, bromide, and fluoride. The relative amounts of sulfate, chloride, and phosphate anions interacting with π-systems are larger than the relative amounts of these anions in the PDB (0.9%, 0.4%, and 0.1%, respectively), indicating an enrichment of these anions in π-interactions with proteins.
The most abundant aromatic amino acid in the PDB is Phe (35.2%), followed by Tyr (31.0%), His (21.0%), and Trp (12.8%) (ESI Table 8†). For His and Phe this distribution changes when only the side chains involved in anion–π interactions are taken into account (ESI Table 6†): His becomes the most abundant residue, appearing in 29.9% of the cases, at the expense of Phe (29.1%). This is consistent with the existence of protonated imidazole moieties at physiological pH, which favor the electrostatic contribution of the anion–π interaction.

Table 2 The most common binary anion–π interactions in proteins. Pairs of interacting residues and their occurrences in number (amount), percentage (%), and residue representativities for each distinct anion (%A⁻) and π-system (%π). The expected amount of each interaction pair, according to its relative abundance, and the statistical significance are shown (Methods). Statistical significance is denoted with ** for p-value < 0.01 and *** for p-value < 0.001.
No contact pair stands out from the rest (Table 2), in contrast to what is observed in DNA and RNA (ESI Tables 3 and 4†): Glu-His, Glu-Tyr, and Glu-Phe are the most numerous pairs, each contributing approximately 16%. The corresponding pairs of Asp with His, Tyr, and Phe account for around 12% each. The results in Table 2 suggest that Glu and Asp have the same preference for the aromatic rings of Phe, His, Tyr, and Trp when establishing anion–π interactions. For example, 29.8% of both Glu and Asp establish anion–π interactions with His, and almost identical percentages are obtained for Phe and Tyr, regardless of whether the anion is Glu or Asp. However, a closer look at the absolute values reveals subtle differences: Phe preferably attracts Asp instead of Glu, i.e. the abundance of Glu-Phe interactions is significantly lower than expected, which is compensated by a higher occurrence of Asp-Phe pairs. Conversely, Trp tends to interact with Glu rather than Asp.
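The observed-versus-expected comparison behind Table 2 can be illustrated with a simple null model. The Methods are not part of this excerpt, so the sketch below rests on two labelled assumptions: the expected count of a pair is derived from the marginal abundances of its anion and its π-system within the data set (independence), and significance is assessed with an exact two-sided binomial test as a stand-in for whatever test the authors actually used. The counts in the example are hypothetical.

```python
import math


def expected_count(n_anion, n_ring, n_total):
    """Expected number of pairs if anion and ring identities were independent (assumption)."""
    return n_anion * n_ring / n_total


def binom_pmf(k, n, p):
    """Binomial probability P(X = k), computed in log space so large n is safe."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               + k * math.log(p) + (n - k) * math.log1p(-p))
    return math.exp(log_pmf)


def binom_two_sided_p(k, n, p):
    """Exact two-sided p-value: total probability of outcomes no more likely than k."""
    p_obs = binom_pmf(k, n, p)
    total = sum(q for q in (binom_pmf(i, n, p) for i in range(n + 1))
                if q <= p_obs * (1.0 + 1e-7))
    return min(1.0, total)


def pair_enrichment(counts):
    """counts maps (anion, ring) -> observed pairs; returns (observed, expected, p)."""
    n_total = sum(counts.values())
    anion_tot, ring_tot = {}, {}
    for (anion, ring), c in counts.items():
        anion_tot[anion] = anion_tot.get(anion, 0) + c
        ring_tot[ring] = ring_tot.get(ring, 0) + c
    result = {}
    for (anion, ring), obs in counts.items():
        exp = expected_count(anion_tot[anion], ring_tot[ring], n_total)
        result[(anion, ring)] = (obs, exp, binom_two_sided_p(obs, n_total, exp / n_total))
    return result


# Hypothetical toy counts (not data from this study).
toy = {("Glu", "Phe"): 30, ("Glu", "His"): 20, ("Asp", "Phe"): 10, ("Asp", "His"): 40}
for pair, (obs, exp, p) in pair_enrichment(toy).items():
    print(pair, obs, round(exp, 1), f"p = {p:.3g}")
```

With real data, a pair whose observed count falls well below its expected count with a small p-value corresponds to the kind of depletion described above for Glu-Phe; if the authors instead derived expectations from PDB-wide residue abundances (ESI Table 8†), only `expected_count` would change.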
The histograms of the equilibrium distance and angle are shown in Fig. 2. First, it is worth noting that d̄− is larger in proteins than in RNA (4.36 Å and 4.14 Å, respectively). To some extent this difference could be anticipated, because the nucleobases are more π-acidic than the phenyl, imidazole, and indole rings, as can be inferred from the electrostatic potential surface maps shown in Fig. 4. However, d̄− for DNA is slightly longer than for proteins (Δd̄− = 0.17 Å), which may be due to a bias resulting from the small amount of DNA data (69 interactions). If we partition the results in terms of anion contributions, Glu and Asp have very similar d̄− values (4.38 Å and 4.33 Å, respectively), whereas nitrate, carbonate, and phosphate exhibit the shortest d̄− (3.93 Å, 3.96 Å, and 4.04 Å, respectively). In addition, if we analyze the results in terms of amino acid contributions, the shortest d̄− is found for His (4.16 Å), as expected from electrostatic considerations (Fig. 4), whereas very similar yet longer distances are found for Phe, Trp, and Tyr (4.41 Å, 4.40 Å, and 4.44 Å, respectively). Similarly, the strong electrostatic interactions of His direct its engagement with sulfate and phosphate: SO₄/PO₄-His pairs are favored over SO₄/PO₄-Tyr and SO₄/PO₄-Phe pairs.
As for DNA and RNA, the orientation of the Asp and Glu carboxylates with respect to the interacting aromatic system (ESI Fig. 2†) reveals a clearly dominant face-to-face approach, with a median angle of 31.1°.
Fig. 3c shows a partial view of one of the two active sites of the complex of the pyridoxal-5′-phosphate (PLP)-dependent catalytic antibody 15A9 with a phosphopyridoxyl-L-alanine (PPL-L-Ala) substrate analogue.39 In addition to Schiff base formation, the antibody catalyzes transamination and α-, β-elimination reactions.40,41 As shown in the figure, Tyr94 interacts both with the substrate, via hydrogen bonding, and with Glu58, via an anion–π contact.
2.1.4. Interactions involving protein surfaces. The results obtained when only inter-chain anion–π interactions are considered, either within the same protein or in protein–protein complexes, are summarized in ESI Tables 9 and 10.† We retrieved 5395 surface interactions out of a total of 82 456 interactions. Therefore, a remarkable 6.5% of all the anion–π interactions in proteins are established between amino acids of different chains of one or more proteins, suggesting that anion–π contacts play an active role in protein interface recognition and that their contribution to protein–protein interactions has been underestimated. The percentage of Glu is similar to, though slightly greater than, that observed in the general protein search, with Glu being the major anion (60.3%). The abundance of the four aromatic amino acids, along with their relative ordering, differs from that obtained in the general search. In chain interfaces the most abundant amino acid is Tyr (His in the general search), appearing in 35.0% of the cases, followed by Phe (28.7%), His (28.5%), and Trp (7.8%). Consequently, the most common amino acid pair is Glu-Tyr, with an occurrence of 21.5% (ESI Table 9†).
The analysis of the geometrical parameters for the inter-chain anion–π interactions yields results similar to those obtained for the general protein search.
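Classifying a contact as inter-chain only requires comparing the chain identifiers of the two partners; applied to the protein data set, the resulting ratio is 5395/82 456 ≈ 6.5%. The sketch below is illustrative only: the record layout (`AnionPiContact`) and its fields are hypothetical, and residues on different chains of a homo-oligomer count as inter-chain, in line with the definition above ("either within the same protein or in protein–protein complexes").

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AnionPiContact:
    """Hypothetical minimal record of one anion-π contact."""
    pdb_id: str
    anion_chain: str
    ring_chain: str


def is_inter_chain(contact):
    """A contact is a surface (inter-chain) contact if the partners sit on different chains."""
    return contact.anion_chain != contact.ring_chain


def inter_chain_fraction(contacts):
    """Return (number of inter-chain contacts, their fraction of all contacts)."""
    contacts = list(contacts)
    n_inter = sum(1 for c in contacts if is_inter_chain(c))
    return n_inter, (n_inter / len(contacts) if contacts else 0.0)


# Hypothetical records; "XXXX" and "YYYY" are placeholder PDB IDs.
example = [
    AnionPiContact("XXXX", "A", "A"),
    AnionPiContact("XXXX", "A", "B"),
    AnionPiContact("YYYY", "C", "C"),
]
print(inter_chain_fraction(example))
```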

Ternary anion–π interactions in DNA
2.2.1. Anion–π–cation. The analysis of anion–π–cation interactions in DNA could not be performed because the search returned no hits (Table 1).
2.2.2. Anion–π–π. The search for anion–π–π interactions in DNA returned 59 hits, 38 of which result from binary anion–π interactions forming triads with an additional DNA base (ESI Table 11†). Therefore, 55.1% of the aromatic systems engaged in the parent binary anion–π interactions are further involved in π–π interactions with DNA. The remaining 21 hits are of the anion–π(protein)–π(DNA) type. It is worth mentioning that all 59 aromatic interactions are of the π-stacking type. In terms of anion and nucleic base representativities, the results are similar to those obtained for the parent anion–π search (ESI Tables 3 and 11†). However, partitioning the π-donor systems into central (π_c) and terminal (π_t) ones gives insight into the specific attraction of the aromatic groups for the central and terminal positions of the ternary anion–π–π complexes and their combined cooperativity effects: adenine (1 hit) and guanine (0 hits) barely establish anion–π interactions, yet they are attracted to anion–π systems to form ternary complexes (19 and 12 hits, respectively). Conversely, cytosine (1 hit) and His (0 hits) barely establish π–π interactions, and rather participate in ternary complexes with DNA by occupying the central position (18 hits each). Intriguingly, His only interacts with Asp to form triads with DNA, despite the larger number of Glu-His than Asp-His anion–π interactions in proteins (Table 2). However, the purine bases, together with thymine, are important contributors to the π–π interactions: adenine, guanine, and especially thymine represent 32.2%, 20.3%, and 44.1%, respectively, of all the aromatic rings engaged in π–π interactions (ESI Table 12†). The analysis of the geometrical parameters of the anion–π and π–π interactions revealed a d̄− (4.61 Å) similar to that of the corresponding binary interaction and a median π–π equilibrium distance (d̄π–π) of 3.60 Å.
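Stating that all contacts are of the π-stacking type presupposes a rule for separating stacked from T-shaped arrangements. A plausible sketch uses the centroid–centroid distance d_π and the interplanar angle α_ππ of Fig. 1; the cut-off values below (5.0 Å, 30°, 60°) are hypothetical placeholders, not the criteria of the Methods, and the rings are idealized hexagons built by a helper (`hexagon`) introduced only for illustration.

```python
import math


def _centroid(points):
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))


def _normal(points):
    """Unit ring normal via Newell's method."""
    nx = ny = nz = 0.0
    for j, (x1, y1, z1) in enumerate(points):
        x2, y2, z2 = points[(j + 1) % len(points)]
        nx += (y1 - y2) * (z1 + z2)
        ny += (z1 - z2) * (x1 + x2)
        nz += (x1 - x2) * (y1 + y2)
    length = math.sqrt(nx * nx + ny * ny + nz * nz)
    return (nx / length, ny / length, nz / length)


def classify_pi_pi(ring_a, ring_b, d_max=5.0, stack_max=30.0, tshape_min=60.0):
    """Label a ring pair 'stacking', 'T-shaped', 'intermediate', or None (no contact).
    d_max (Å), stack_max and tshape_min (degrees) are hypothetical cut-offs."""
    if math.dist(_centroid(ring_a), _centroid(ring_b)) > d_max:
        return None
    cos_t = abs(sum(a * b for a, b in zip(_normal(ring_a), _normal(ring_b))))
    alpha_pp = math.degrees(math.acos(min(1.0, cos_t)))
    if alpha_pp <= stack_max:
        return "stacking"
    if alpha_pp >= tshape_min:
        return "T-shaped"
    return "intermediate"


def hexagon(center, plane="xy", r=1.4):
    """Idealized six-membered ring around `center` in the xy- or xz-plane."""
    cx, cy, cz = center
    pts = []
    for k in range(6):
        u, v = r * math.cos(math.radians(60 * k)), r * math.sin(math.radians(60 * k))
        pts.append((cx + u, cy + v, cz) if plane == "xy" else (cx + u, cy, cz + v))
    return pts


central = hexagon((0.0, 0.0, 0.0))
print(classify_pi_pi(central, hexagon((0.0, 0.0, 3.4))))        # stacking
print(classify_pi_pi(central, hexagon((0.0, 0.0, 4.5), "xz")))  # T-shaped
print(classify_pi_pi(central, hexagon((0.0, 0.0, 8.0))))        # None (too far apart)
```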
A representative example of an anion–π–π interaction is illustrated in Fig. 3a, where the BamHI type II restriction endonuclease is shown bound to DNA.34 In the figure, in addition to the two anion–π interactions with cytosine described above (section 2.1.1), we observe that these π-systems simultaneously establish π–π interactions with thymine.

The results show a d̄− (3.57 Å) considerably shorter than that reported for the parent Glu–π binary interaction (4.21 Å). This result suggests that the anion–π interaction is substantially strengthened when the π-system additionally interacts with a cation on the opposite side of the ring, leading to a cooperative effect. Previous studies have shown similar cooperativity effects in systems where either benzene or hexafluorobenzene simultaneously interacts with an anion on one side of the ring plane and a cation on the opposite side.42,43 In the present study the median cation–π distance (d̄+) is 3.77 Å, which could be considered quite long. However, it has to be borne in mind that herein this geometrical parameter is not defined as the minimum distance between the cation and the ring plane (ESI Table 1†). Moreover, the median cation–π angle (ᾱ+ = 72.8°) is larger than the corresponding angle for the anion–π interaction (ᾱ− = 59.5°). This is in agreement with the different directionality of the two interactions: in anion–π complexes, displacing the anion parallel to the ring plane costs much less interaction energy (≈7%) than in cation–π complexes (≈23%).44

A representative example of an anion–π–cation interaction is illustrated in Fig. 5a.45 Pseudouridine (Ψ) synthases catalyze the isomerization of specific uridines in cellular RNAs to pseudouridines and may function as RNA chaperones. The TruB cocrystal structure reveals that this Ψ synthase gains access to its substrate by flipping out nucleotide 55 of tRNA.
In addition, TruB binding flips out two further nucleotides, C56 and G57, which may keep the ribose of U55 from flipping back prematurely before reattachment to the rotated nucleobase. Within this context, the anion–π interaction depicted in the figure is formed between one of the multiple sulfate anions present in the crystal and the flipped-out G57, which at the same time engages in a cation–π interaction with the guanidinium side chain of Arg141.
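The cooperativity argument above assumes that the anion and the cation occupy opposite faces of the central ring. One simple way to test this, sketched below under stated assumptions (idealized coordinates, a hypothetical 5.0 Å cut-off, and not necessarily the criterion used in the search), is to compare the signs of the projections of the centroid→anion and centroid→cation vectors onto the ring normal.

```python
import math


def _centroid(points):
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))


def _normal(points):
    """Unit ring normal via Newell's method (its sign is arbitrary but consistent)."""
    nx = ny = nz = 0.0
    for j, (x1, y1, z1) in enumerate(points):
        x2, y2, z2 = points[(j + 1) % len(points)]
        nx += (y1 - y2) * (z1 + z2)
        ny += (z1 - z2) * (x1 + x2)
        nz += (x1 - x2) * (y1 + y2)
    length = math.sqrt(nx * nx + ny * ny + nz * nz)
    return (nx / length, ny / length, nz / length)


def signed_height(ring, point):
    """Signed distance of `point` from the ring plane; the sign identifies the ring face."""
    c, n = _centroid(ring), _normal(ring)
    return sum((p - q) * k for p, q, k in zip(point, c, n))


def is_opposite_face_triad(ring, anion, cation, d_max=5.0):
    """True if both ions lie within d_max (hypothetical cut-off, Å) of the ring
    centroid and on opposite faces of the ring plane."""
    c = _centroid(ring)
    if math.dist(anion, c) > d_max or math.dist(cation, c) > d_max:
        return False
    return signed_height(ring, anion) * signed_height(ring, cation) < 0.0


ring = [(1.4 * math.cos(math.radians(60 * k)), 1.4 * math.sin(math.radians(60 * k)), 0.0)
        for k in range(6)]
print(is_opposite_face_triad(ring, (0.0, 0.0, 3.5), (0.0, 0.0, -3.8)))  # True
print(is_opposite_face_triad(ring, (0.0, 0.0, 3.5), (0.5, 0.0, 3.8)))   # False: same face
```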
2.3.2. Anion–π–π. The search for anion–π–π interactions in RNA returned 26 hits out of 197 anion–π interactions, i.e. 13.2% of the anion–π interacting aromatic systems are involved in π–π interactions (ESI Tables 15 and 16†). All identified interactions are of the π-stacking type. As in DNA, chloride is absent from the ternary results. The relative amounts of the anions change significantly with respect to the parent anion–π results (ESI Table 4†), where Glu was the major anion: there is now a high selectivity towards Asp (65% of the interactions). The relative weight of π_c is also quite different from that found for the parent binary interactions, with C now the major contributor (14 interactions), followed by U (4 interactions) and G (2 interactions). Intriguingly, A is rarely observed in the central position of ternary complexes (1 interaction) despite its high abundance in the binary systems. Conversely, adenine is the most common π_t, with 13 interactions, indicating its affinity to form ternary complexes with already established anion–π systems. Consequently, Asp-C-A is the most abundant ternary contact, representing almost 50% of all triads. In addition, and similar to the parent binary interaction, 76.5% of Asp is engaged in anion–π interactions with C. U establishes 6 π–π interactions as a terminal moiety, mainly with His in the Glu-His-U triad.
In Fig. 3b we show a snapshot of RNA recognition by the human U1A protein.36 As previously described, the binding mechanism is primarily based on an anion–π interaction between Asp92 and C12. Additionally, the cytosine engages in a π–π interaction with an adenine π-system (A11), suggesting electronic cooperativity effects in the locking/unlocking RNA-binding mechanism.

…Table 6†). However, the distribution of π_c is similar to that found in the binary anion–π interactions.
As in RNA, Arg is the most abundant cation (68.9%), followed by Lys (29.5%). This is an interesting result because the total amount of Lys in the PDB (1 816 877; ESI Table 8†) is slightly larger than that of Arg (1 631 104). Therefore, the central aromatic system shows a clear preference for guanidinium over ammonium moieties. Similarly, several cation–π studies by Gromiha and coworkers showed that Arg has a higher propensity than Lys to form cation–π interactions and that the roles of these cation–π interactions in the stability of protein structures differ from those of other noncovalent contacts.46–48 Na⁺ and K⁺ are scarce, with only 35 and 4 interactions, respectively. The π–Na⁺ interactions appear in the Asp-Phe-Na triad (ESI Table 18†) and were retrieved from X-ray diffraction studies of β-galactosidase from E. coli. This enzyme catalyzes hydrolytic and transgalactosidic reactions on β-D-galactopyranosides. Likewise, the four K⁺ contacts are found in N²-(2-carboxyethyl)arginine synthase (CEAS), an unusual thiamin diphosphate (ThDP)-dependent enzyme that catalyzes the committed step in the biosynthesis of the β-lactamase inhibitor clavulanic acid in Streptomyces clavuligerus.49 The reaction mechanisms proposed for CEAS50–52 imply a ThDP-mediated catalysis in which Glu57 is actively involved as a proton donor–acceptor. In the complex formed with the substrate analog dipotassium L-(+)-tartrate (Fig. 5b), Glu57 is engaged in an anion–π interaction with His56, which in turn is engaged in a cation–π interaction with K1501. This cooperatively strengthened anion–π interaction, which went unnoticed by the authors of the structure, might be relevant for orienting Glu57 towards ThDP. Remarkably, K⁺ is perfectly accommodated between two His residues of different chains, His56C/His56D and His56A/His56B, in a space that is occupied by water molecules in the native state of the enzyme.
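The preference of the central ring for Arg can be made quantitative with a back-of-the-envelope normalization that uses only the percentages and PDB totals quoted above. It is a rough check that assumes total residue counts are an adequate baseline (it ignores, for example, differences in solvent exposure between Arg and Lys):

```latex
\frac{68.9\,\% \,/\, 29.5\,\%}{1\,631\,104 \,/\, 1\,816\,877} \approx \frac{2.34}{0.898} \approx 2.6
```

That is, relative to their overall abundance in the PDB, Arg occurs as the cation in these triads roughly 2.6 times as often as Lys.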
The results for all ternary contacts (Table 3 and ESI Table 18†) reveal four predominant triads (with abundances ranging from 10.2% to 11.4%), all comprising arginine as the cation (Asp-His-Arg, Asp-Tyr-Arg, Glu-Tyr-Arg, and Glu-Phe-Arg). Each of the remaining contacts represents less than 8%. Compared with the binary interactions (Table 2), the relative weight of each anion–π contact has changed: the percentage of interactions of Glu with Trp is 12% larger in the ternary search, at the expense of the other amino acids. Conversely, the percentages of interactions of Asp with Tyr and His have increased at the expense of the interactions with Phe and Trp. Regarding the π–cation contact pairs, Tyr-Arg alone represents 21.6% of all pairs, followed by His-Arg (19.3%), Phe-Arg (15.6%), and Trp-Arg (10.7%).
The d̄− value is 4.38 Å, very similar to the value reported for the parent binary interaction. The d̄+ value (3.92 Å) is shorter than the related d̄−, consistent with the smaller radius of cations compared with anions. As expected, ᾱ+ (68.3°) is larger than the corresponding ᾱ− (59.5°), as previously observed in RNA, in agreement with the different directionality of the two interactions.44 The comparison and statistical analysis of the collected and expected amounts of each triad, based on the relative abundance of each interaction partner within the data set, provide interesting insights into otherwise hidden cooperativity effects in the ternary anion–π–cation interactions gathered in Table 3. First, the most common ternary complex, Asp-His-Arg, is significantly enriched at the expense of the related Glu-His-(Arg/Lys), again indicating a higher propensity of the Asp-His complex to form triads despite its lower abundance in the parent binary interaction (Table 2). Geometric data support a strong synergistic effect for Asp-His-(Arg/Lys) compared with the parent Glu ternary complexes: d̄+ is substantially increased to 3.83 Å (Δd̄+ = 0.33 Å) and 4.25 Å (Δd̄+ = 0.75 Å) in Glu-His-Arg and Glu-His-Lys, respectively, and ᾱ+ is reduced to 66.0° (Δᾱ+ = −11.5°) and 64.0° (Δᾱ+ = −13.5°), respectively.

Table 3 The most common ternary anion–π–cation interactions in proteins. Triads of interacting residues and their occurrences in number (amount), percentage (%), and residue representativities for each distinct anion (%A⁻), central π-system (%π_c), and cation (%C⁺). The expected amount of each interaction pair, according to its relative abundance, and the statistical significance are shown (Methods). Statistical significance is denoted with ** for p-value < 0.01 and *** for p-value < 0.001.
We studied this phenomenon in detail by partitioning the Asp/Glu-His contacts in these interactions into contiguous and noncontiguous contacts with respect to the amino acid sequence. Surprisingly, no synergistic effect appears upon formation of triads in contiguous contacts (Δd̄+ = 0.05 Å and Δᾱ+ = −2.3°), which reinforces the hypothesis of a strong cooperative energy beyond structural and geometric constraints. Second, the Asp-Phe anion–π interaction favors ternary complexes with Arg rather than Lys, suggesting that the cooperativity effects in the former triad are greater. This hypothesis is also supported by an increase in ᾱ− (Δᾱ− = 7.9°). Third, we detect a significant accumulation of Glu-Trp-(Arg/Lys) compared with the corresponding Asp triads, suggesting that in such complexes Trp prefers to interact with Glu rather than Asp (Table 3).
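Assuming, as the Δ notation suggests, that the differences quoted for Glu-His-Arg and Glu-His-Lys are taken relative to the corresponding Asp-His triads, and that the reduced angle is the cation–π angle ᾱ+ (the quantity the Δᾱ+ values refer to), the reported numbers imply the following reference values for Asp-His-(Arg/Lys). These are back-calculated here and are not quoted directly in the text:

```latex
\begin{aligned}
\bar d^{+}_{\text{Asp-His-Arg}} &= 3.83 - 0.33 = 3.50~\text{\AA}, &
\bar d^{+}_{\text{Asp-His-Lys}} &= 4.25 - 0.75 = 3.50~\text{\AA},\\
\bar\alpha^{+}_{\text{Asp-His-Arg}} &= 66.0^\circ + 11.5^\circ = 77.5^\circ, &
\bar\alpha^{+}_{\text{Asp-His-Lys}} &= 64.0^\circ + 13.5^\circ = 77.5^\circ.
\end{aligned}
```

Under this reading, both cations converge on the same Asp-His geometry (d̄+ ≈ 3.50 Å, ᾱ+ ≈ 77.5°), with a shorter distance and a larger angle than the 3.92 Å and 68.3° medians of the whole ternary data set.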
2.4.2. Anion–π–cation in protein surfaces. Protein–protein interactions (PPIs) are involved in a wide range of biological processes within the cell, including signal transduction and allosteric regulation of enzymes, through intricate networks of strong and weak transient interactions. 53 Understanding the physical relations between proteins is therefore essential to comprehend the molecular mechanisms of cell regulation at the atomic level. Remarkably, we identified hundreds of anion–π–cation contacts at inter-chain surfaces (Table 4 and ESI Table 20†). Compared with the general search, Glu is more common than Asp (ESI Table 19†). The abundance of Phe increases by ca. 25% to 43.2% at the expense of His and Tyr, while the relative amounts of Arg and Lys remain roughly constant. Glu-Phe-Arg and Asp-Phe-Arg are the most numerous triads. Together they account for 34.1% of all contacts (only 15.6% in the general ternary protein search, Table 3) and represent 78.9% of Phe and 47.9% of Arg. The anion-Phe-Arg recognition motif therefore seems to play a very important role in inter-chain interactions. The abundance of Glu-Trp-Lys is also worth mentioning. Moreover, when only inter-chain interfaces are considered, Glu and Asp show different preferences for anion–π interactions with aromatic amino acids. The percentages of Glu interacting with Phe and Trp are larger, at the expense of its interactions with Tyr. The amount of Asp in anion–π interactions with Tyr, Trp, and especially His decreases in favor of the interaction with Phe, which increases dramatically to become the most important partner. The geometrical parameters of interfacial triads are similar to those obtained from the general ternary search.
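The representativity columns of Tables 3 and 4 (%A⁻, %π_c, %C⁺) express how much of a residue's total occurrence in a given role is captured by one triad. The 78.9% of Phe covered by Glu-Phe-Arg and Asp-Phe-Arg above is such a value. The following is a minimal sketch, assuming representativity = triad count / total count of that residue in the same role across all triads. The counts are toy numbers, not the values of Table 4.

```python
from collections import Counter
from typing import Dict, List, Tuple

Triad = Tuple[str, str, str]  # (anion, central pi-system, cation)

def representativities(counts: Dict[Triad, int]) -> Dict[Triad, Tuple[float, ...]]:
    """For each triad, the percentage of each partner's total occurrences (in the
    same role: anion, central ring, cation) that this triad accounts for."""
    totals: List[Counter] = [Counter(), Counter(), Counter()]
    for triad, n in counts.items():
        for role, residue in enumerate(triad):
            totals[role][residue] += n
    return {
        triad: tuple(100.0 * n / totals[role][residue] for role, residue in enumerate(triad))
        for triad, n in counts.items()
    }

# Toy counts (hypothetical)
toy = {("Glu", "Phe", "Arg"): 30, ("Asp", "Phe", "Arg"): 10,
       ("Glu", "His", "Lys"): 20, ("Asp", "His", "Arg"): 40}
reps = representativities(toy)
# Share of all central Phe covered by the two anion-Phe-Arg triads together
phe_share = reps[("Glu", "Phe", "Arg")][1] + reps[("Asp", "Phe", "Arg")][1]
```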
Fig. 6a shows an example of an anion–π–cation interaction at the interface between protein arginine methyltransferase 5 (PRMT5) and methylosome protein 50 (MEP50). 54 PRMT5 symmetrically dimethylates the two terminal ω-guanidino nitrogens of arginine residues on substrate proteins, including histone tails, and is therefore involved in cell signaling and gene regulation. The function and specificity of PRMT5 are regulated by a multimeric complex, of which MEP50 is a core component. The figure illustrates that Glu276 from MEP50 is engaged in an intramolecular anion–π interaction with Phe299, which in turn is recognized by Arg62 at the surface of PRMT5. The resulting anion–π–cation triad at the interface of these two proteins may therefore play a role in their mutual recognition and the subsequent signal transduction process.
2.4.3. Anion–π–π. The search for anion–π–π interactions in proteins returned 2945 hits out of 82 456 anion–π interactions. In other words, 3.6% of the anion–π interacting aromatic systems are also involved in π–π interactions (Table 5 and ESI Tables 21 and 22†). The relative amounts of anions and central aromatic moieties are similar to those found for the parent binary interactions (ESI Table 6†). Even so, central His decreases and central Trp increases by ca. 7% each, with Tyr (32.1%) being the most frequent central amino acid. Considering the anion–π contacts separately for each central amino acid gives the following picture. When His is the central amino acid, the most abundant anion is Asp (53.4%). In contrast, Glu is the most abundant anion when interacting with Phe (58.8%), Trp (57.1%), and especially Tyr (63.2%). The side chain of Phe is the most common terminal aromatic system involved in π–π interactions, accounting for ca. 45% of these interactions, followed by Tyr (23.5%), His (16.9%), and Trp (15.0%).

Table 4 The most common ternary anion–π–cation surface interactions in proteins. Triads of interacting residues and their occurrences in number (amount), percentage (%), and residues' representativities for each distinct anion (%A⁻), central π-system (%π_c), and cation (%C⁺). The expected amount of each interaction, according to its relative abundance, and the statistical significance are shown (Methods). Statistical significance is denoted with * for p-value < 0.05.
Fig. 7a shows a histogram of the ring-to-ring angle of the π–π interactions in proteins. This geometrical parameter describes the orientation of the terminal aromatic ring relative to the central π-system. The histogram reveals two well-defined, unequally populated states that can readily be assigned to π-stacking (0–20°) and T-shaped (70–90°) interactions (Fig. 1). Remarkably, and in contrast to anion–π–π triads involving nucleic acids, T-shaped interactions account for 78.8% of contacts. Breaking down the incidence of T-shaped and π-stacking interactions by aromatic side chain also yields interesting results. The ring least involved in T-shaped interactions is the imidazole of His (54.2%), despite its higher polarity (Fig. 4), whereas the phenyl ring of Phe is the most involved (87.5%). The values for Trp (80.0%) and Tyr (79.4%) are very close to the mean of 78.8%.
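The ring-to-ring angle can be computed as the angle between the two ring normals, folded into 0–90°. The following sketch assumes that definition and applies the ranges quoted above (0–20° for π-stacking, 70–90° for T-shaped). Angles in between are labeled "intermediate". The function names, the Newell plane fit, and the idealized test rings are illustrative assumptions, not the authors' implementation.

```python
import math
from typing import List, Sequence, Tuple

Vec = Tuple[float, float, float]

def ring_normal(ring: Sequence[Vec]) -> Vec:
    """Unit normal of a (near-)planar ring, computed with Newell's method."""
    nx = ny = nz = 0.0
    for i, (x1, y1, z1) in enumerate(ring):
        x2, y2, z2 = ring[(i + 1) % len(ring)]
        nx += (y1 - y2) * (z1 + z2)
        ny += (z1 - z2) * (x1 + x2)
        nz += (x1 - x2) * (y1 + y2)
    length = math.sqrt(nx * nx + ny * ny + nz * nz)
    return (nx / length, ny / length, nz / length)

def ring_to_ring_angle(ring1: Sequence[Vec], ring2: Sequence[Vec]) -> float:
    """Angle in degrees between the two ring planes, folded into 0-90°."""
    n1, n2 = ring_normal(ring1), ring_normal(ring2)
    cos_theta = abs(sum(a * b for a, b in zip(n1, n2)))  # abs() ignores normal orientation
    return math.degrees(math.acos(min(1.0, cos_theta)))

def classify_pi_pi(angle: float, stack_max: float = 20.0, tshape_min: float = 70.0) -> str:
    """Apply the angular ranges used in the text; angles in between stay unclassified."""
    if angle <= stack_max:
        return "pi-stacking"
    if angle >= tshape_min:
        return "T-shaped"
    return "intermediate"

def hexagon(center: Vec, plane: str = "xy", r: float = 1.39) -> List[Vec]:
    """Idealized six-membered ring in the xy- or xz-plane (toy geometry)."""
    cx, cy, cz = center
    pts = []
    for k in range(6):
        u = r * math.cos(math.radians(60 * k))
        w = r * math.sin(math.radians(60 * k))
        pts.append((cx + u, cy + w, cz) if plane == "xy" else (cx + u, cy, cz + w))
    return pts

central = hexagon((0.0, 0.0, 0.0))
stacked = hexagon((0.0, 0.0, 3.5))
t_shaped = hexagon((0.0, 4.5, 0.0), plane="xz")
angles = [ring_to_ring_angle(central, stacked), ring_to_ring_angle(central, t_shaped)]
labels = [classify_pi_pi(a) for a in angles]
```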
Table 5 shows that Glu-Tyr-Phe is by far the most frequent recognition pattern, representing 14% of all triads. It involves almost half (43.3%) of the central Tyr, 31.1% of the terminal π–π interacting Phe, and 24.7% of Glu. We calculated the expected abundance and significance of each triad (Methods). Six triads are significantly enriched: Glu-Tyr-Phe, Glu-Phe-Tyr, Glu-Trp-His, Glu-Trp-Trp, Asp-His-Trp, and Asp-His-His. They fall into two groups. In the first, a Tyr/Phe pair interacts with Glu. In the second, combinations of His and Trp follow the anion preferences noted above, namely Glu with central Trp and Asp with central His. In contrast, other combinations are significantly underrepresented: Glu-His-Phe, Glu-Trp-Phe, Glu-Tyr-His, Glu-Tyr-Trp, Glu-Phe-Trp, and Asp-Tyr-His.

Table 5 The most common ternary anion–π–π interactions in proteins. Triads of interacting residues and their occurrences in number (amount), percentage (%), and residues' representativities for each distinct anion (%A⁻), central (%π_c), and terminal (%π_t) π-systems. The expected amount of each interaction, according to its relative abundance, and the statistical significance are shown (Methods). Statistical significance is denoted with *** for p-value < 0.001.
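Enrichment and underrepresentation are judged against expected amounts derived from the relative abundance of each partner (Methods). The exact formula is not spelled out in this section. The minimal sketch below therefore assumes an independence model, in which the expected count of a triad is N·p(anion)·p(central)·p(terminal). Each p is the marginal frequency of that residue in its role, and the result is rounded to the nearest integer. Every combination is enumerated, so triads with zero observed hits also receive an expectation. The counts are toy values.

```python
from collections import Counter
from itertools import product
from typing import Dict, List, Tuple

Triad = Tuple[str, str, str]  # (anion, central pi-system, terminal pi-system)

def expected_counts(observed: Dict[Triad, int]) -> Dict[Triad, int]:
    """Expected triad counts if the three partners combined independently,
    each with its observed marginal frequency (assumed model, rounded to integers)."""
    total = sum(observed.values())
    marginals: List[Counter] = [Counter(), Counter(), Counter()]
    for triad, n in observed.items():
        for role, residue in enumerate(triad):
            marginals[role][residue] += n
    expected: Dict[Triad, int] = {}
    # Enumerate every combination, including triads that were never observed
    for triad in product(*(sorted(m) for m in marginals)):
        p = 1.0
        for role, residue in enumerate(triad):
            p *= marginals[role][residue] / total
        expected[triad] = round(total * p)
    return expected

# Toy counts (hypothetical)
toy = {("Glu", "Tyr", "Phe"): 40, ("Glu", "His", "His"): 10,
       ("Asp", "His", "His"): 30, ("Asp", "Tyr", "Phe"): 20}
expected = expected_counts(toy)
enriched = {t: (toy.get(t, 0), e) for t, e in expected.items() if toy.get(t, 0) > e}
```

Whether an observed/expected difference is meaningful is then a separate question, addressed by the significance test described in Methods.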
Comparing the binary interactions (Table 2) with the ternary contacts shows that Glu and Asp do not have the same preferences for the aromatic rings with which they establish anion–π interactions. Moreover, the relative weight of each anion–π contact changes. The percentages of Glu interacting with Trp and Tyr are larger in the ternary search, at the expense of the interaction with His, which decreases dramatically from 29.8% to 18.4%. For Asp, the amount of Asp-Trp also increases, but in this case at the expense of the Asp-Phe contact. Phe and Tyr contribute largely to π–π contact pairs: a remarkable 60.5% of central Tyr (π_c) and 43.5% of terminal Phe (π_t) are found in the most numerous pair, Tyr-Phe. Additionally, the Phe-Phe pair gathers 41.7% of the central Phe, and the Phe-Tyr pair 39.6% of the terminal Tyr. Substantial amounts of central Phe (35.4%), Trp (36.1%), and His (32.7%) and of terminal His (36.4%) and Trp (32.2%) are also found in the Phe-Tyr, Trp-Phe, His-Phe, His-His, and His-Trp pairs, respectively.
The analysis of the geometrical interaction parameters yields a d⁻ value (4.33 Å) slightly shorter than that reported for the binary search in proteins (4.36 Å), suggesting that the π–π interaction slightly favors the anion–π interaction. Breaking down the results by amino acid reveals further details. First, His is the only amino acid whose d⁻ does not change (4.16 Å in both binary and ternary searches). For the remaining amino acids, d⁻ values are shorter in the ternary search than in the binary search: from 4.41 Å to 4.37 Å for Phe, from 4.40 Å to 4.30 Å for Trp, and from 4.44 Å to 4.37 Å for Tyr. The π–π interaction thus slightly favors the anion–π interaction for all amino acids except His. This could be due to dispersion effects, whose contribution is larger in π–π interactions involving bigger, more polarizable arenes. 55 Another aspect worth noting is that d⁻ differs depending on whether π_t is engaged in π-stacking or T-shaped interactions.

2.4.4. Anion–π–π in protein interfaces. We also collected the inter-chain contacts in proteins and compared them with those retrieved from the general ternary search (Table 6 and ESI Tables 23 and 24†). The percentages of Glu and Asp, and of the central aromatic amino acids, are very similar (ESI Table 21†). However, the abundance of the terminal amino acids varies: the content of Phe increases to 53.7% (ΔPhe = 9.1%) at the expense of Tyr (ΔTyr = −3.5%) and Trp (ΔTrp = −5.4%).
In general, Glu and Asp show different preferences for anion–π interactions with the aromatic amino acids. The percentages of Glu interacting with His and Phe are larger at protein interfaces, at the expense of the interaction with Trp. Conversely, for Asp the anion–π interactions with His, Phe, and Tyr decrease in favor of the interaction with Trp, which becomes the most important partner. Regarding π–π contact pairs, Tyr-Phe is the most numerous pair, just as in the general ternary search. The main differences appear in the π–π contact pairs formed by either Trp or His: Phe-Trp was not detected and Trp-His (3 hits) is rare. On the other hand, Trp-Phe and His-His are quite abundant, with important contributions of central Trp (72.3%) and terminal His (54.2%).
Comparing the geometrical parameters of anion–π–π contacts in proteins and at peptide surfaces revealed a longer d⁻ value for the latter. His exhibits the shortest d⁻ (4.19 Å), as in the general search, followed by Phe (4.40 Å), Trp (4.50 Å), and Tyr (4.70 Å). In contrast to the tendency observed in the general search, d⁻ differs very little depending on whether the terminal aromatic ring is engaged in π-stacking or T-shaped interactions. However, the histogram in Fig. 7b shows a second contrast with the general search: interfacial T-shaped interactions are unevenly distributed across the considered angles. The central bin (78–82°) contains 33 hits, whereas the 70–74° and 82–86° bins each contain about twice as many (66 and 67 hits, respectively).
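The uneven T-shaped distribution is described with 4°-wide bins between 70° and 90°. The following minimal binning sketch assumes half-open bins [70, 74), [74, 78) and so on. An angle of exactly 90° is assigned to the last bin, and angles outside the T-shaped window are ignored. The sample angles are toy values.

```python
from typing import Dict, Iterable, Tuple

def bin_angles(angles: Iterable[float], start: float = 70.0, stop: float = 90.0,
               width: float = 4.0) -> Dict[Tuple[float, float], int]:
    """Count ring-to-ring angles in fixed-width bins over [start, stop]."""
    n_bins = int(round((stop - start) / width))
    edges = [(start + i * width, start + (i + 1) * width) for i in range(n_bins)]
    counts = {edge: 0 for edge in edges}
    for a in angles:
        if not start <= a <= stop:
            continue  # outside the T-shaped window
        idx = min(int((a - start) // width), n_bins - 1)  # exactly 90° joins the last bin
        counts[edges[idx]] += 1
    return counts

toy_angles = [71.0, 73.5, 79.0, 83.0, 85.9, 90.0, 45.0]  # hypothetical angles (degrees)
histogram = bin_angles(toy_angles)
```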
The statistical analysis of the respective abundances sheds light on the characteristic preferences among amino acids interacting at protein interfaces (Table 6 and ESI Table 24†). First, Asp-Trp-Phe is remarkably enriched, with 41 hits out of 110 contacts in proteins (Table 5). In other words, 37.3% of these interactions occur between amino acids of different peptide chains. Since we identified 191 Asp-Trp binary interactions at peptide surfaces (ESI Table 9†), 21.5% of them are involved in this triad. This enrichment is compensated by a significant loss of the tandem triad Glu-Trp-Phe (6 hits), which is also underrepresented in the general search. Second, Glu-His-His is also enriched as an inter-chain contact (23 hits, 37.1% of all such contacts in proteins). Intriguingly, the corresponding Asp ternary complex is underrepresented at protein interfaces (9 hits) despite being overrepresented in the general search (117 hits, Table 5). Consistently, the inter-chain anion–π interaction between the His dimer and Glu is remarkably shorter than that with Asp (Δd⁻(π-stacking) = −0.54 Å), suggesting a strong anion-specific cooperativity effect.

Table 6 The most common ternary inter-chain anion–π–π interactions in proteins. Triads of interacting residues and their occurrences in number (amount), percentage (%), and residues' representativities for each distinct anion (%A⁻), central (%π_c), and terminal (%π_t) π-systems. The expected amount of each interaction, according to its relative abundance, and the statistical significance are shown (Methods). Statistical significance is denoted with ** for p-value < 0.01 and *** for p-value < 0.001.
Third, the binary interactions (Asp/Glu)-His weaken upon forming ternary complexes with Phe. On the one hand, these triads are significantly underrepresented (ESI Table 24†). On the other hand, d⁻ increases upon triad formation, particularly for Glu. Hence, not all combinations of anions and aromatic side chains that form triads in proteins present cooperativity effects, and such triads consistently appear rarely in crystal structures. Last, the results for the Glu-Trp-His triad are especially striking. It is enriched in the general ternary protein search (92 hits, Table 5), yet underrepresented at interfaces (1 hit): less than 0.5% of surface Glu-Trp pairs (231 hits, ESI Table 9†) are involved in ternary interactions with His.

Fig. 5c depicts a snapshot of the active site of Plasmodium falciparum glutathione S-transferase (PfGST). 56 GSTs catalyze the conjugation of glutathione with a wide variety of hydrophobic compounds, generally yielding nontoxic products that can be readily eliminated. PfGST is highly abundant in the parasite, and its activity is increased in chloroquine-resistant cells. It has also been shown to act as a ligandin for parasitotoxic hemin. The enzyme thus represents a promising target for antimalarial drug development. The figure shows an anion–π interaction between Glu120 from one monomer and the phenyl side chain of Phe10 from a second monomer. Furthermore, Phe35 forms a T-shaped intra-chain contact with Phe10, giving rise to an anion–π–π interaction. Formate 1, which presumably mimics the glycyl carboxylate of glutathione, interacts with Glu120. This suggests that Glu120, itself engaged in an anion–π interaction, might be important for the catalytic activity of the enzyme. Formate 2 presumably mimics the glutamyl carboxylate of glutathione.
In this example, the anion–π–π interaction might not only provide functional assistance but also favor successful crystallization of the protein by stabilizing its dimeric form.
Fig. 6b shows a second example of an anion–π–π interaction involved in protein–protein recognition. It occurs at the interface between interleukin-1β (IL-1β) and the highly specific IL-1β monoclonal antibody canakinumab. 57 IL-1β is a key orchestrator of inflammatory and immune responses, forming a heterotrimeric signaling-competent complex with IL-1-specific receptor proteins. The antibody neutralizes signal transduction by competitively reducing the affinity of IL-1β for the complex. The figure illustrates that Glu64 from IL-1β is engaged in an anion–π interaction with Tyr50 from the antibody. This binary interaction is cooperatively strengthened by forming a triad with His34 through a T-shaped π–π contact. The interfacial anion–π–π interaction thus contributes to the recognition process and to the stabilization of the IL-1β:canakinumab complex.

Conclusions
Thousands of anion–π contacts have been detected in a large-scale analysis of the Protein Data Bank, revealing selectivities among anions, cations, and π-systems not yet reported in a biological context. Owing to their abundance, Asp and Glu are found in the vast majority of anion–π interactions, with a preferred close-to-parallel orientation of their carboxylate relative to the interacting aromatic system. Nucleic acids give different results: DNA shows no selectivity towards either Glu or Asp, whereas Glu is more common in RNA. In addition, in DNA Asp tends to interact with cytosine and thymine and Glu with thymine, whereas in RNA Asp and Glu prefer cytosine and adenine, respectively. Anion–π distances also follow different trends: they are shorter for Glu than for Asp in DNA, whereas the opposite holds in RNA. For proteins, a remarkable 61.3% of all processed PDB structures present anion–π interactions, with Glu as the major anion and His as the most common aromatic amino acid. However, at inter-chain contacts and protein–protein interfaces Tyr is more abundant than His, at the expense of Trp. Importantly, the anion–π recognition pattern in proteins changes when only inter-chain interactions are considered.
Remarkably, hundreds of cation–π, π-stacking, and T-shaped interactions have been observed on the opposite face of aromatic rings involved in anion–π interactions, which might give rise to cooperativity effects. Among anion–π–cation interactions in RNA, the Glu-A-Arg triad represents 87% of all contacts. In proteins, Glu and Asp contribute very similarly and Arg is the most abundant cation (69%). When only inter-chain contacts are considered, the anion-Phe-Arg pattern predominates. In anion–π–π interactions, Asp is more abundant in nucleic acids, in contrast to the binary search, with Asp-His-T and Asp-C-A as the major contributors in DNA and RNA, respectively. In proteins, the anion–π recognition patterns differ between binary contacts and ternary contacts that include π–π interactions. Tyr and Phe are the most abundant π-systems involved in anion–π and π–π interactions, respectively, which makes Glu-Tyr-Phe the most abundant triad. At peptide interfaces we detected a significant enrichment of Asp-Trp-Phe. We also observed that T-shaped interactions are much more abundant than π-stacking interactions in proteins. In addition, the anion–π equilibrium distance in triads is slightly shorter than in binary contacts, suggesting that the π–π interaction favors the anion–π interaction.
These results lead to striking conclusions. Overall, more than half of the biomolecular complexes analyzed contained at least one anion–π contact. Moreover, there is one anion–π interaction for every 50 anionic residues in the PDB, and thousands of these interactions are engaged in triads. Anion–π interactions and the cooperativity arising from ternary contacts are therefore a common resource in molecular science. They may play an active role in protein folding and function, as well as in nucleic acid–protein and protein–protein recognition, by contributing significantly to the binding energy of protein complex formation and stabilization. Besides these biological roles, we present examples of anion–π interactions and related triads involved in enzymatic catalysis, epigenetic gene regulation, antigen–antibody recognition, and protein crystallography.

Data collection
A multi-processor Python routine using Biopython 58 was designed to process the whole PDB database, identify the interactions of interest, and record any hits in two steps. First, each available PDB structure solved by X-ray crystallography or neutron diffraction at a resolution better than 2.5 Å (62 033 out of 89 395 PDB structures) was queried for anion–π interactions. The query took into account the distance and the angle between the partners (see ESI Table 1† for a comprehensive definition of the centers and centroids used for each amino acid, nucleobase, and ion). Second, the resulting binary interactions were queried for cations or aromatic systems in close proximity in order to gather ternary complexes, i.e. anion–π–cation and anion–π–π. Aromatic stacking interactions were considered in face-to-face, T-shaped, and parallel-displaced configurations. Redundant interactions can arise in different chains of the same structure. To reduce their number, binary and ternary interactions repeating the same residue names and numbers within a PDB file were counted only once.
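The entry filter and the removal of chain-duplicated hits can be sketched as follows. In an actual routine, header fields and coordinates would come from Biopython's `Bio.PDB` parser (e.g. `PDBParser(QUIET=True).get_structure(...)` and the structure's `header` dictionary). Here, plain records stand in for them so that the sketch runs without Biopython. The record fields, the method labels, the strict "< 2.5 Å" cutoff, and the toy identifiers are all assumptions.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set, Tuple

# Assumed method labels, compared case-insensitively
ACCEPTED_METHODS = {"x-ray diffraction", "neutron diffraction"}

@dataclass(frozen=True)
class Entry:
    """Minimal stand-in for the header fields of one PDB entry."""
    pdb_id: str
    method: str
    resolution: Optional[float]  # in Å; None if not reported

@dataclass(frozen=True)
class Hit:
    """One binary or ternary interaction: (chain, residue name, residue number) per partner."""
    pdb_id: str
    partners: Tuple[Tuple[str, str, int], ...]

def select_entries(entries: Iterable[Entry], max_resolution: float = 2.5) -> List[Entry]:
    """Step 1 filter: diffraction structures with resolution better than the cutoff."""
    return [e for e in entries
            if e.method.lower() in ACCEPTED_METHODS
            and e.resolution is not None
            and e.resolution < max_resolution]  # strict inequality is an assumption

def deduplicate(hits: Iterable[Hit]) -> List[Hit]:
    """Keep one copy of interactions repeated across chains of the same entry."""
    seen: Set[Tuple] = set()
    unique: List[Hit] = []
    for hit in hits:
        # Chain IDs are dropped from the key, so copies in equivalent chains collapse
        key = (hit.pdb_id, tuple((name, number) for _chain, name, number in hit.partners))
        if key not in seen:
            seen.add(key)
            unique.append(hit)
    return unique

# Toy identifiers and values (hypothetical)
entries = [Entry("1ABC", "X-RAY DIFFRACTION", 1.8), Entry("2DEF", "SOLUTION NMR", None),
           Entry("3GHI", "X-RAY DIFFRACTION", 3.1), Entry("4JKL", "NEUTRON DIFFRACTION", 2.0)]
kept = [e.pdb_id for e in select_entries(entries)]
hits = [Hit("1ABC", (("A", "GLU", 120), ("A", "PHE", 10))),
        Hit("1ABC", (("B", "GLU", 120), ("B", "PHE", 10))),  # same contact in chain B
        Hit("1ABC", (("A", "ASP", 45), ("A", "TRP", 50)))]
unique_hits = deduplicate(hits)
```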

Statistical analysis
The expected amount of each interaction was computed from the observed abundance of each partner in the interaction, rounded to the nearest integer; for an example, see the expected amount of Glu-DT binary anion–π interactions in DNA (ESI Table 3). The statistical significance of the difference between the expected and observed amounts for each interaction was assessed by Fisher's exact test on the corresponding contingency tables, as implemented in the statistical package R v3.0. 59 Statistical significance is denoted in the manuscript with * for p-value < 0.05, ** for p-value < 0.01, and *** for p-value < 0.001.
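The significance test itself was run in R. For readers without R, the following pure-Python sketch computes the two-sided Fisher's exact test for a 2×2 table. It sums the hypergeometric probabilities of all tables with the same margins that are no more likely than the observed one, using a small relative tolerance in the spirit of R's `fisher.test`. This section does not detail how each contingency table was built. The layout in the usage line (observed vs expected hits against the remaining contacts) is therefore a hypothetical choice.

```python
from math import comb

def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    denom = comb(n, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of x in the top-left cell, margins fixed
        return comb(row1, x) * comb(n - row1, col1 - x) / denom

    p_obs = prob(a)
    rel_err = 1 + 1e-7  # tolerance for ties in floating-point probabilities
    return min(1.0, sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * rel_err))

def significance_stars(p: float) -> str:
    """Star notation used in the tables of this work."""
    if p < 0.001:
        return "***"
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return ""

# Hypothetical layout: 41 observed vs 20 expected hits, each out of 110 contacts
p_example = fisher_exact_two_sided(41, 110 - 41, 20, 110 - 20)
label = significance_stars(p_example)
```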