Masaki Tsujimura *a, Hiroshi Ishikita bc and Keisuke Saito *bc
a Department of Advanced Interdisciplinary Studies, The University of Tokyo, 4-6-1 Komaba, Meguro-ku, Tokyo 153-8904, Japan. E-mail: masaki.tsujimura@riken.jp; Fax: +81-3-5452-5083; Tel: +81-3-5452-5056
b Department of Applied Chemistry, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo 113-8654, Japan. E-mail: ksaito@appchem.t.u-tokyo.ac.jp
c Research Center for Advanced Science and Technology, The University of Tokyo, 4-6-1 Komaba, Meguro-ku, Tokyo 153-8904, Japan
First published on 25th April 2025
Hydrogen bonds (H-bonds) between oxygen atoms, with the O–H bond donated to the acceptor O atom (Odonor–H⋯Oacceptor), are essential for stabilizing protein structures and facilitating enzymatic reactions. The dielectric and electrostatic environment of proteins, as well as structural constraints imposed by protein folding, influence the nature of H-bonds. In this study, we investigated how these factors affect H-bond distances in proteins. Analysis of 906 high-resolution protein structures (≤1.2 Å) from the Protein Data Bank revealed that H-bond distances for H-bonds with the same donor and acceptor groups are distributed around a value primarily determined by the pKa difference between these groups (ΔpKa) in water, with lower ΔpKa values leading to shorter distances. This correlation arises from enhanced electron redistribution from the H-bond acceptor to the donor in lower ΔpKa H-bonds, which increases the covalent character of the H-bond and decreases the H⋯Oacceptor distance. In contrast, H-bond distances are largely unaffected by whether the H-bond is buried in the protein interior or exposed to bulk water, as the strength of the electrostatic interaction between the donor and acceptor groups plays a minor role in determining distances. Furthermore, analysis of H-bonds in microbial rhodopsins using a quantum mechanical/molecular mechanical approach demonstrates that the protein environment primarily influences H-bond distances electrostatically by altering the ΔpKa of the H-bond, while structural constraints impose a secondary influence by altering Odonor–H⋯Oacceptor angles or H⋯Oacceptor distances without changing ΔpKa.
An H-bond can be considered an interaction between a Brønsted acid (H-bond donor) and a Brønsted base (H-bond acceptor).10 pKa is an indicator of the Brønsted acidity. Therefore, the pKa difference between the donor and acceptor groups (ΔpKa)10 reflects the characteristics of the H-bond. A lower ΔpKa leads to a higher binding enthalpy,11 a lower Odonor–H stretching vibrational frequency,12,13 and a higher 1H NMR chemical shift.12–16
A positive correlation between O⋯O distances and ΔpKa values was reported for 68 H-bonds of small compounds in neutron diffraction structures, with a coefficient of determination R2 = 0.86.16,17 Similarly, positive correlations between N⋯O or N⋯N distances and ΔpKa values were reported (R2 = 0.74 and 0.69 for 86 and 29 H-bonds, respectively).17 Deviations from these correlations mainly arise from additional H-bonds of the donor and acceptor groups,17,18 as well as geometric constraints in crystals.17 O⋯O distances are almost independent of the solvent dielectric environment, as the 1H NMR chemical shift for the same H-bond remains largely unchanged across different solvents (chloroform, acetone, or water), and the calculated equilibrium O⋯O distance of the [HOH⋯−OH] H-bond remains nearly constant among solvents with dielectric constants ranging from 5 to 78.16,17
The present study aims to elucidate the determinants of H-bond distances in proteins, focusing primarily on O⋯O H-bonds, but also examining N⋯O and N⋯N H-bonds. We begin by investigating small-compound H-bonds relevant to those in proteins to clarify the basis for the correlation between H-bond distances and ΔpKa in the absence of the protein environment.
However, in contrast to H-bonds in solution under comparable conditions, systematic analysis of H-bond distances in protein environments remains limited. A major challenge lies in the heterogeneity of the PDB, which contains structures determined under inconsistent conditions and resolutions. For instance, many structures resolved at >2.5 Å lack sufficient clarity to assign side-chain orientations or identify interacting water molecules. This heterogeneity has hindered meaningful comparisons of H-bond properties among proteins.
To overcome these limitations, we next focus exclusively on 906 high-resolution protein crystal structures (≤1.2 Å), in which both side chains and interacting water molecules can be reliably modeled based on electron density. By limiting our dataset to these high-resolution structures, we are able to clarify which factors in the protein environment determine H-bond distances. This dataset further enables us to compare buried and solvent-exposed H-bonds and to elucidate how the dielectric properties of the protein environment influence H-bond distances. Finally, based on the insights obtained, we analyze H-bonds in microbial rhodopsins using a quantum mechanical/molecular mechanical (QM/MM) approach.
Natural bond orbital (NBO)28,29 energies were calculated for (i) water, methanol, phenol, and protonated acetic acid molecules donating an H-bond to a water molecule, and (ii) water, methanol, phenol, deprotonated acetic acid, protonated acetic acid, and N-methylacetamide molecules accepting an H-bond from a water molecule (Fig. S2, ESI†). NBO energies were obtained following geometry optimization using the density functional theory (DFT) method. To include long-range corrections in the DFT functional,30,31 the CAM-B3LYP functional32 was employed with the 6-31G**+ basis set. The CAM-B3LYP-related parameters α, β, and μ were set to the standard values of 0.19, 0.46, and 0.33, respectively.32 All calculations were performed using the NBO 5.0 program33 implemented in Jaguar.34
H-bonds with relative solvent accessibilities of <16% and ≥16% were classified as buried and exposed, respectively.37 Solvent accessibilities were calculated in the absence of crystal water molecules, using the DSSP program.38,39 Therefore, the absence of water molecules that are difficult to capture by crystallography does not affect this classification. Asymmetric units, the smallest portions of crystal structures, are deposited in the PDB. The entire crystal can be reconstructed by applying symmetry operations to the asymmetric unit. In the absence of neighboring asymmetric units, ∼10% of H-bonds that are buried in the entire crystal are calculated as exposed (Fig. S4 and Table S1, ESI†). Therefore, solvent accessibilities were calculated in the presence of neighboring asymmetric units. Residues of neighboring asymmetric units within 7 Å of the asymmetric unit of interest were included in the calculation of solvent accessibility. The 7 Å threshold was set to be longer than the sum of the largest atomic diameter (3.74 Å for the Cα atom) and the water probe diameter (2.80 Å) used in the DSSP program.38
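The classification rule and the rationale for the 7 Å cutoff can be written out as a minimal sketch. The function and constant names below are hypothetical, and the sketch assumes a single relative solvent accessibility value has already been obtained for each H-bond (for example from DSSP output); how the two partner groups are combined into one value is not specified here.

```python
# Numerical thresholds are those quoted in the text; all names are illustrative.
RSA_BURIED_CUTOFF = 16.0   # relative solvent accessibility (%); buried if below
CA_DIAMETER = 3.74         # Å, largest atomic diameter quoted (Cα atom)
PROBE_DIAMETER = 2.80      # Å, water probe diameter quoted for DSSP
NEIGHBOR_CUTOFF = 7.0      # Å, cutoff for including neighboring asymmetric units


def classify_hbond(rsa_percent: float) -> str:
    """Return 'buried' for RSA < 16% and 'exposed' for RSA >= 16%."""
    if rsa_percent < 0.0:
        raise ValueError("relative solvent accessibility cannot be negative")
    return "buried" if rsa_percent < RSA_BURIED_CUTOFF else "exposed"


def cutoff_margin() -> float:
    """Margin (Å) by which the neighbor cutoff exceeds atom + probe diameters."""
    return NEIGHBOR_CUTOFF - (CA_DIAMETER + PROBE_DIAMETER)


if __name__ == "__main__":
    for rsa in (3.0, 15.9, 16.0, 42.0):
        print(f"RSA = {rsa:5.1f}%  ->  {classify_hbond(rsa)}")
    print(f"cutoff margin = {cutoff_margin():.2f} Å")
```

A positive margin confirms that any neighboring atom able to block the water probe from touching the asymmetric unit of interest lies within the 7 Å cutoff.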
The protonation pattern was determined using the electrostatic continuum model by solving the linear Poisson–Boltzmann equation with the MEAD program.47 The experimentally measured pKa values employed as references were 12.0 for Arg, 4.0 for Asp, 9.5 for Cys, 4.4 for Glu, 10.4 for Lys, 9.6 for Tyr,48 and 7.0 and 6.6 for the Nε and Nδ atoms of His, respectively.49–51 Dielectric constants were set to 4 for the protein interior and 80 for water. All calculations were performed at 300 K, pH 7.0, and with an ionic strength of 100 mM. The linear Poisson–Boltzmann equation was solved using a three-step grid-focusing procedure at resolutions of 2.5, 1.0, and 0.3 Å. Protonation patterns were sampled using the Monte Carlo method with the Karlsberg program.52
Geometries were optimized using a QM/MM approach. The restricted DFT method was employed with the B3LYP functional and the LACVP**+ basis set, using the QSite program.53,54 The QM region was defined as follows: (i) ground-state BR: retinal, side-chains of Lys216, Tyr57, Arg82, Asp85, Trp86, Thr89, Tyr185, and Asp212, and H2O-402, 401, and 406. (ii) N′-state BR: side-chains of Tyr57, Asp85, and Asp212, and H2O-401, 406, and 407. (iii) pHR: retinal, side-chains of Lys256, Ser78, Ser81, Tyr82, Arg123, Thr126, Trp127, Ser130, Tyr225, Asp252, and Tyr257, H2O-502, 503, and 504, and Cl−-401. (iv) KR2: retinal, side-chains of Lys255, Ser70, Arg109, Asn112, Trp113, Asp116, Tyr218, Asp251, and Ser254, and H2O-434, 437, 501, and 512. (v) ErNaR: retinal, side-chains of Lys246, Ser25, Ser60, Glu64, Arg98, Trp102, Asp105, Tyr215, Thr239, and Asp242, and H2O-503, 504, 532, and 542. All atomic coordinates were relaxed in the QM region. In the MM region, hydrogen atom positions were optimized using the OPLS2005 force field,55 while heavy atom positions were fixed. The protonation pattern of titratable residues in the MM region was implemented in the atomic partial charges. Vibrational frequencies were calculated at the same level of theory as the geometry optimizations. The calculated frequencies were scaled using a standard factor of 0.9614 for the B3LYP functional.56
| | Small compounds | In proteins | pKa as a donor (acid/conjugated base) | pKa as an acceptor (base/conjugated acid) |
|---|---|---|---|---|
| (i) | Water | Water | 16 (H2O/OH−) | −2 (H2O/H3O+) |
| (ii) | Alcohol | Ser, Thr | 16 (C–OH/C–O−)a | −2 (C–OH/C–OH2+)c |
| (iii) | Phenol | Tyr | 10 (PhOH/PhO−)b | −6 (PhOH/PhOH2+)c |
| (iv) | Carboxylic acid | Asp, Glu | 4 (COOH/COO−)b | 4 (COO−/COOH)b, −6 (COOH/COOH2+)c |
| (v) | Amide C=O | Backbone, Asn, Gln | — | −1 (C=O/C=OH+) |

a pKa value of methanol.19 b Ref. 48. c Ref. 20.
Even when the groups forming an H-bond are the same, several H-bond types can exist depending on (i) which group serves as the donor and (ii) the protonation states of the donor and acceptor groups. For instance, the donor/acceptor of an H-bond between water and a carboxyl group can be COOH/H2O, H2O/COO−, or H2O/COOH. Different H-bond types for the same pair can be distinguished for crystal structures of small compounds obtained from the Cambridge Structural Database (CSD).59 This is not the case for H-bonds in proteins, where the hydrogen atom positions are mostly unidentified. To clarify why H-bond distances are primarily determined by ΔpKa, we first investigate small-compound H-bonds relevant to those in proteins.
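As a worked illustration of how the H-bond types differ, ΔpKa can be computed as the pKa of the donor group minus the pKa of the conjugate acid of the acceptor group, using the values in water listed in Table 1. The dictionary and function names below are hypothetical; only the numerical values come from the table.

```python
# pKa values in water, as listed in Table 1.
PKA_AS_DONOR = {"H2O": 16, "C-OH": 16, "PhOH": 10, "COOH": 4}
PKA_AS_ACCEPTOR = {
    "H2O": -2, "C-OH": -2, "PhOH": -6,
    "COO-": 4, "COOH": -6, "C=O": -1,
}


def delta_pka(donor: str, acceptor: str) -> int:
    """ΔpKa = pKa(donor as acid) - pKa(conjugate acid of the acceptor)."""
    return PKA_AS_DONOR[donor] - PKA_AS_ACCEPTOR[acceptor]


if __name__ == "__main__":
    # The three H-bond types possible between water and a carboxyl group.
    for donor, acceptor in [("COOH", "H2O"), ("H2O", "COO-"), ("H2O", "COOH")]:
        print(f"{donor:>4} -> {acceptor:<4}  ΔpKa = {delta_pka(donor, acceptor)}")
```

The same pair of groups therefore spans ΔpKa values from 6 to 22 depending on the donor role and protonation state, which is why the H-bond type must be known before distances are compared.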
| H-bond pair | Donor | Acceptor | ΔpKa (in water) | Average O⋯O distance a (Å) |
|---|---|---|---|---|
| Water⋯Water | H2O | H2O | 18 | 2.83 |
| Water⋯Alcohol | H2O | C–OH | 18 | 2.83 |
| | C–OH | H2O | 18 | 2.75 |
| Water⋯Phenol | PhOH | H2O | 12 | 2.68 |
| | H2O | PhOH | 22 | 2.89 |
| Water⋯Carboxylic acid | COOH | H2O | 6 | 2.59 |
| | H2O | COO− | 12 | 2.77 |
| | H2O | COOH | 22 | 2.82 |
| Alcohol⋯Alcohol | C–OH | C–OH | 18 | 2.78 |
| Alcohol⋯Phenol | PhOH | C–OH | 12 | 2.73 |
| | C–OH | PhOH | 22 | 2.82 |
| Alcohol⋯Carboxylic acid | COOH | C–OH | 6 | 2.65 |
| | C–OH | COO− | 12 | 2.74 |
| | C–OH | COOH | 22 | 2.81 |
| Alcohol⋯Amide C=O | C–OH | C=O | 17 | 2.77 |
| Phenol⋯Phenol | PhOH | PhOH | 16 | 2.80 |
| Phenol⋯Carboxylic acid | PhOH | COO− | 6 | 2.64 |
| | COOH | PhOH | 10 | 2.67 |
| | PhOH | COOH | 16 | 2.74 |
| Phenol⋯Amide C=O | PhOH | C=O | 11 | 2.70 |
| Carboxylic acid⋯Carboxylic acid | COOH | COO− | 0 | 2.54 |
| | COOH | COOH | 10 | 2.65 |
| Carboxylic acid⋯Amide C=O | COOH | C=O | 5 | 2.60 |

a Average O⋯O distances. Values for [water⋯water], [water⋯alcohol], and [water⋯phenol] pairs were taken from ref. 57. Values for the [water⋯carboxylic acid] pair were taken from ref. 58. Other values were taken from ref. 6.
Fig. 1 Correlation between average O⋯O distances6,57,58 and ΔpKa for each H-bond type in small-compound crystal structures from the CSD (R2 = 0.89). Orange crosses, blue squares, and black triangles indicate H-bonds involving COOH, PhOH, and H2O/C–OH groups as donors, respectively. H-bonds involving the COO− group as acceptors are surrounded by open circles.
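The correlation summarized in Fig. 1 can be re-derived from the tabulated small-compound values with an ordinary least-squares fit. The sketch below is an illustrative re-fit, not the authors' own procedure: it assumes that all 23 H-bond types in Table 2 enter the fit with equal weight, and the helper name `linear_fit` is hypothetical.

```python
# (ΔpKa in water, average O···O distance in Å) for the 23 H-bond types of Table 2.
TABLE2 = [
    (18, 2.83),                           # water/water
    (18, 2.83), (18, 2.75),               # water/alcohol
    (12, 2.68), (22, 2.89),               # water/phenol
    (6, 2.59), (12, 2.77), (22, 2.82),    # water/carboxylic acid
    (18, 2.78),                           # alcohol/alcohol
    (12, 2.73), (22, 2.82),               # alcohol/phenol
    (6, 2.65), (12, 2.74), (22, 2.81),    # alcohol/carboxylic acid
    (17, 2.77),                           # alcohol/amide C=O
    (16, 2.80),                           # phenol/phenol
    (6, 2.64), (10, 2.67), (16, 2.74),    # phenol/carboxylic acid
    (11, 2.70),                           # phenol/amide C=O
    (0, 2.54), (10, 2.65),                # carboxylic acid/carboxylic acid
    (5, 2.60),                            # carboxylic acid/amide C=O
]


def linear_fit(points):
    """Ordinary least squares; returns (slope, intercept, R^2)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    syy = sum((y - mean_y) ** 2 for _, y in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept, sxy ** 2 / (sxx * syy)


if __name__ == "__main__":
    slope, intercept, r2 = linear_fit(TABLE2)
    print(f"d(O···O) ≈ {slope:.4f} × ΔpKa + {intercept:.3f} Å   (R² = {r2:.2f})")
```

Under these assumptions the fit should give a coefficient of determination close to the value quoted in the caption of Fig. 1, with the O⋯O distance increasing by roughly 0.01 Å per unit of ΔpKa.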
To clarify the basis for the correlation between O⋯O distances and ΔpKa, we performed quantum chemical calculations of small-compound H-bonds. The O⋯O distance is correlated with ΔpKa for quantum-chemically optimized H-bonds in water solvent (Fig. S5a, R2 = 0.73, ESI†), which aligns with the correlation between the average O⋯O distance from the CSD and ΔpKa (Fig. 1).
A lower ΔpKa leads to a shorter O⋯O distance because as ΔpKa decreases, (i) the Odonor–H distance increases, and (ii) the H⋯Oacceptor distance decreases more significantly than the increase in the Odonor–H distance (Fig. 2a). Thus, the H⋯Oacceptor distance serves as the primary limiting factor for the O⋯O distance. For example, the O⋯O distances for the H-bond between water and protonated acetic acid (ΔpKa = 22, Fig. 2b), and between protonated acetic acid and deprotonated acetic acid (ΔpKa = 0, Fig. 2c), are 2.91 and 2.59 Å, respectively. Odonor–H distances are 0.98 and 1.04 Å, with the latter being 0.06 Å longer than the former. In contrast, H⋯Oacceptor distances are 1.97 and 1.55 Å, with the latter being 0.42 Å shorter than the former.
Fig. 2 Relationships among H-bond geometries and ΔpKa. (a) Odonor–H distance (rO–H), H⋯Oacceptor distance (rH⋯O), and ΔpKa for quantum-chemically optimized H-bonds in water solvent (circles). Relationships between rO–H and rH⋯O, and between rO–H and ΔpKa, are shown as crosses on the rO–H–rH⋯O plane and the rO–H–ΔpKa plane, respectively. Circles and crosses are color-scaled according to the electron redistribution amount from the H-bond acceptor to donor, calculated from the change in Mulliken charges due to H-bond formation. The curve on the rO–H–rH⋯O plane indicates the correlation derived from the bond order model.60–64 The parameters r0OH and b in the model were set to 0.95 Å and 0.38 Å, respectively, to best fit the rO–H versus rH⋯O relationship (see the discussion in ESI†). The line on the rO–H–ΔpKa plane indicates the linear fit line (R2 = 0.76). (b) and (c) Energy diagrams of molecular orbitals for the H-bond: (b) between water and protonated acetic acid (ΔpKa = 22), and (c) between protonated acetic acid and deprotonated acetic acid (ΔpKa = 0). Gray bars indicate energies of the donor O atom and H atom orbitals forming the Odonor–H bond, as well as of the Odonor–H bonding (σO–H) and Odonor–H antibonding (σ*O–H) orbitals.
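The bond order model mentioned in the caption of Fig. 2 can be illustrated numerically. The sketch below assumes the commonly used Pauling-type form, in which each O–H contact has a bond order n = exp[−(r − r0OH)/b] and the two bond orders of the hydrogen sum to one; the exact form adopted by the authors is described in their ESI and may differ. Only the parameters r0OH = 0.95 Å and b = 0.38 Å are taken from the caption, and the function names are hypothetical.

```python
import math

R0_OH = 0.95  # Å, parameter r0OH from the caption of Fig. 2
B = 0.38      # Å, parameter b from the caption of Fig. 2


def bond_order(r: float) -> float:
    """Pauling-type bond order of an O-H contact of length r (assumed form)."""
    return math.exp(-(r - R0_OH) / B)


def r_h_acceptor(r_oh: float) -> float:
    """H···O(acceptor) distance implied by bond-order conservation, n1 + n2 = 1."""
    n_donor = bond_order(r_oh)
    if not 0.0 < n_donor < 1.0:
        raise ValueError("r_oh must be longer than r0OH for an H-bonded proton")
    return R0_OH - B * math.log(1.0 - n_donor)


if __name__ == "__main__":
    # Odonor-H distances quoted in the text for ΔpKa = 22 and ΔpKa = 0.
    for r_oh, quoted in ((0.98, 1.97), (1.04, 1.55)):
        print(f"r(O-H) = {r_oh:.2f} Å -> model r(H···O) = {r_h_acceptor(r_oh):.2f} Å "
              f"(optimized value quoted in the text: {quoted:.2f} Å)")
```

In this form a small lengthening of the Odonor–H bond implies a much larger shortening of the H⋯Oacceptor contact, which is the asymmetry the text attributes to low-ΔpKa H-bonds.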
Since pKa is an indicator of proton affinity (Brønsted acidity), a lower ΔpKa indicates a closer proton affinity between the H-bond donor and acceptor. In low ΔpKa H-bonds, the proton is strongly attracted to the H-bond acceptor, resulting in an increased Odonor–H distance and a significantly decreased H⋯Oacceptor distance.
Relationships among H-bond geometries and ΔpKa can also be explained in terms of electron redistribution induced by H-bond formation. A lower ΔpKa leads to pronounced electron redistribution from the H-bond acceptor to the donor (Fig. 2a). For example, electron redistribution amounts for the H-bond between water and protonated acetic acid (ΔpKa = 22, Fig. 2b), and between protonated acetic acid and deprotonated acetic acid (ΔpKa = 0, Fig. 2c), are 0.02e and 0.10e, respectively. Analysis of NBO28 showed that this electron redistribution arises from the hybridization between the lone pair orbital of the H-bond acceptor (nO) and the Odonor–H antibonding orbital of the H-bond donor (σ*O–H).29
ΔpKa is related to the energy difference between (i) the orbital of the donor O atom forming the Odonor–H bond, and (ii) the orbital of the acceptor O atom accepting the Odonor–H bond (i.e., the nO orbital) (Fig. 2b and c). When ΔpKa is high, the orbital energy of the donor O atom is much higher than that of the acceptor O atom (Fig. 2b), resulting in the H atom forming a bond with the donor O atom. As ΔpKa decreases, the energy difference between these orbitals decreases. In the extreme case where ΔpKa ∼ 0, the energies of these orbitals are nearly equal (Fig. 2c). In this case, the H atom can bind to either the donor or acceptor O atom with almost no energy difference (a low-barrier H-bond, LBHB65,66). In LBHBs, the O⋯O distances are as short as ∼2.5 Å.67
Therefore, a lower ΔpKa leads to a decreased energy difference between the nO orbital of the H-bond acceptor and the σ*O–H orbital of the H-bond donor, which enhances their hybridization (Fig. 2b and c). Indeed, pKa values are correlated with the nO and σ*O–H orbital energies (Lewis acidity, Fig. S2, ESI†). This enhanced hybridization increases the nO → σ*O–H electron redistribution, leading to an increased σ*O–H orbital occupancy, a decreased Odonor–H bond order, and an increased Odonor–H distance. These correspond to the destabilization of the Odonor–H bonding orbital, resulting in a weaker Odonor–H bond (Fig. 2b and c). Furthermore, the enhanced hybridization between the nO and σ*O–H orbitals, corresponding to the increased covalent character of the H⋯Oacceptor “bond”,64 decreases the H⋯Oacceptor distance more significantly than the Odonor–H distance increases. Thus, a lower ΔpKa results in a shorter O⋯O distance.
Notably, the correlation between the O⋯O distance and ΔpKa is mostly unaffected by the total charge of the H-bond (0 or −1) (Fig. 1 and Fig. S5a, ESI†). Asp and Glu side-chains are frequently involved in short H-bonds (O⋯O distances <2.7 Å).4,8,9 This is due to their tendency to form low ΔpKa H-bonds not only when (i) the deprotonated COO− group serves as an acceptor, but also when (ii) the protonated COOH group serves as a donor (Fig. 1). The shorter O⋯O distances are not caused by the stronger Odonor–H⋯−Oacceptor electrostatic interactions resulting from the negative charge of the deprotonated COO− group. Indeed, the average O⋯O distance of the charge-neutral [COOH⋯OH2] H-bonds with ΔpKa of ∼6 (2.59 Å) is shorter than that of the negatively charged [HOH⋯−OOC] H-bonds with ΔpKa of ∼12 (2.77 Å) for small compound H-bonds from the CSD (Table 2).58 Although Asp/Glu side-chains are often assumed to be deprotonated,4,8 the possibility of Asp/Glu side-chains serving as H-bond donors in their protonated forms should be considered, especially for short H-bonds.
| H-bond pair | Donor | Acceptor | ΔpKa (in water) | Average O⋯O distance a (Å) | N b |
|---|---|---|---|---|---|
| Water⋯Water | H2O | H2O | 18 | 2.77 ± 0.12 | 149… |
| Water⋯Ser/Thr | H2O | C–OH | 18 | 2.76 ± 0.09 | 17… |
| | C–OH | H2O | 18 | | |
| Water⋯Tyr | PhOH | H2O | 12 | 2.72 ± 0.11 | 5924 |
| | H2O | PhOH | 22 | | |
| Water⋯Asp/Glu | COOH | H2O | 6 | 2.73 ± 0.10 | 35… |
| | H2O | COO− | 12 | | |
| | H2O | COOH | 22 | | |
| Water⋯Backbone | H2O | C=O | 17 | 2.79 ± 0.09 | 92… |
| Ser/Thr⋯Ser/Thr | C–OH | C–OH | 18 | 2.75 ± 0.08 | 635 |
| Ser/Thr⋯Tyr | PhOH | C–OH | 12 | 2.74 ± 0.07 | 329 |
| | C–OH | PhOH | 22 | | |
| Ser/Thr⋯Asp/Glu | COOH | C–OH | 6 | 2.68 ± 0.07 | 2675 |
| | C–OH | COO− | 12 | | |
| | C–OH | COOH | 22 | | |
| Ser/Thr⋯Backbone | C–OH | C=O | 17 | 2.75 ± 0.09 | 6220 |
| Tyr⋯Tyr | PhOH | PhOH | 16 | 2.73 ± 0.07 | 72 |
| Tyr⋯Asp/Glu | PhOH | COO− | 6 | 2.63 ± 0.06 | 1359 |
| | COOH | PhOH | 10 | | |
| | PhOH | COOH | 16 | | |
| Tyr⋯Backbone | PhOH | C=O | 11 | 2.69 ± 0.06 | 1313 |
| Asp/Glu⋯Asp/Glu | COOH | COO− | 0 | 2.52 ± 0.05c | 254 |
| | COOH | COOH | 10 | | |
| Asp/Glu⋯Backbone | COOH | C=O | 5 | 2.64 ± 0.05c | 173 |

a Average O⋯O distance and standard deviation, obtained by fitting the distribution histogram (Fig. S6, ESI) using a Gaussian function. b Total number of O atom pairs. c Obtained from distributions of O⋯O distances shorter than 2.75 Å, as many of the O atom pairs with distances longer than 2.75 Å are unlikely to form H-bonds (Fig. S6, ESI).
The distributions of O⋯O distances were compared with the ΔpKa values in water for each H-bond pair (Fig. 3). A lower ΔpKa leads to a shorter O⋯O distance distribution. The O⋯O distance distributions exhibit certain widths, with standard deviations ranging from 0.05 to 0.12 Å (Table 3).
Several H-bond types with different ΔpKa values are possible for the [Asp/Glu⋯Asp/Glu], [Tyr⋯Asp/Glu], [Ser/Thr⋯Asp/Glu], [water⋯Asp/Glu], [Ser/Thr⋯Tyr], and [water⋯Tyr] pairs (Fig. 3). Among these six pairs, the predominant H-bond types for the [Asp/Glu⋯Asp/Glu] and [water⋯Asp/Glu] pairs can be deduced from their O⋯O distance distributions. For the [Asp/Glu⋯Asp/Glu] pair, the [COOH⋯−OOC] type with ΔpKa ∼ 0 is likely predominant. For the [water⋯Asp/Glu] pair, the [HOH⋯−OOC] type with ΔpKa ∼ 12 is likely predominant (see the discussion in ESI†).
Average O⋯O distances in proteins were compared with ΔpKa values in water. We compared the values of H-bond pairs whose ΔpKa values of the predominant H-bond types were inferred (indicated by the dotted squares in Fig. 3). The average O⋯O distance is highly correlated with ΔpKa (Fig. 4a, R2 = 0.91), which is best described by the following equation:
![]() | (1) |
Fig. 4 Average O⋯O distances for each H-bond type in proteins. (a) Correlation between average O⋯O distances in proteins and ΔpKa in water (R2 = 0.91). Solid and open circles indicate H-bonds with total charges of 0 and −1, respectively. (b) Average O⋯O distances for buried and exposed H-bonds (RMSD = 0.03 Å). The black diagonal line indicates perfect correspondence (i.e., identity line). (a) and (b) Labels indicate donor/acceptor groups. (c) Representative structures of buried (PDB ID: 4KQP68) and exposed (PDB ID: 3CLM, unpublished) H-bonds (dotted lines).
This high correlation indicates that O⋯O distances with the same donor and acceptor groups are distributed around a value primarily determined by ΔpKa in water. This ΔpKa value does not account for the influence of the protein environment. Therefore, deviations in O⋯O distances from their average values, corresponding to the width of the O⋯O distance distribution (Fig. 3), reflect the influence of the protein environment on the characteristics of the H-bond.
The average O⋯O distances of buried and exposed H-bonds are nearly identical for H-bond pairs with inferred predominant types (Fig. 4b and c, root mean square deviation [RMSD] = 0.03 Å). This result indicates that O⋯O distances are unaffected by whether the H-bond is buried in the protein interior or exposed to bulk water. This suggests that differences in the dielectric properties of the protein environment do not influence O⋯O distances, consistent with the previous reports for small compounds.16,17
Water solvent weakens the electrostatic interaction between the Odonor–H and Oacceptor groups due to electrostatic shielding. This electrostatic interaction is the major stabilizing factor of an H-bond.71 On the other hand, the strength of the Odonor–H⋯Oacceptor electrostatic interaction plays a minor role in determining the O⋯O distance. When comparing H-bonds with similar O⋯O distances, the electrostatic interaction energy is higher for H-bonds with total charges of −1 than for charge-neutral H-bonds (Fig. S5b, ESI†). This indicates that a strong Odonor–H⋯Oacceptor electrostatic interaction does not necessarily result in a shorter H-bond. This is because the stabilization provided by the electrostatic interaction is largely compensated by the destabilization due to exchange repulsion at short O⋯O distances (Fig. S5c, ESI†).62,64 In contrast, a lower ΔpKa leads to an enhanced electron redistribution from the H-bond acceptor to the donor, which decreases the O⋯O distance (Fig. 2). Therefore, O⋯O distances are predominantly determined by the ΔpKa of the H-bond without being affected by whether the H-bond is buried in the protein interior or exposed to bulk water.
The average N⋯O distance in proteins is highly correlated with ΔpKa in water for each H-bond type (Fig. 5a, R2 = 0.71). Similarly, the average N⋯N distance in proteins is highly correlated with ΔpKa in water (Fig. 5b, R2 = 0.93). These are consistent with the high correlations between N⋯O or N⋯N distances and ΔpKa observed for small-compound H-bonds.17 The strength of the electrostatic interaction between the donor and acceptor groups differs among H-bonds formed between (i) a charge-neutral donor and a charge-neutral acceptor (e.g., backbone-N–H⋯O=C-backbone), (ii) a positively charged donor and a negatively charged acceptor (salt-bridges, e.g., Lys-NH3+⋯−OOC-Asp), (iii) a positively charged donor and a charge-neutral acceptor (e.g., Lys-NH3+⋯OH2), and (iv) a charge-neutral donor and a negatively charged acceptor (e.g., backbone-N–H⋯−OOC-Asp). The correlations between average N⋯O or N⋯N distances and ΔpKa are largely unaffected by these differences (Fig. 5a and b), indicating that the ΔpKa of the H-bond, rather than the electrostatic interaction strength between the donor and acceptor groups, is the primary determinant of N⋯O and N⋯N distances, as well as O⋯O distances. Indeed, average N⋯O and N⋯N distances for buried H-bonds (with low electrostatic shielding) and solvent-exposed H-bonds (with high electrostatic shielding) are nearly identical (Fig. 5c and d).
Fig. 5 Average N⋯O and N⋯N distances for each H-bond type in proteins. (a) Correlation between average N⋯O distances in proteins and ΔpKa in water (R2 = 0.71). (b) Correlation between average N⋯N distances in proteins and ΔpKa in water (R2 = 0.93). (a) and (b) Black closed circles, orange crosses, blue plus signs, and black open circles indicate H-bonds formed between (i) charge-neutral donor and acceptor, (ii) positively charged donor and negatively charged acceptor (salt-bridges), (iii) positively charged donor and charge-neutral acceptor, and (iv) charge-neutral donor and negatively charged acceptor, respectively. For H-bond pairs involving His side-chains, H-bond types with the lowest ΔpKa values are assumed (see the discussion in ESI†). (c) Average N⋯O distances for buried and exposed H-bonds (RMSD = 0.02 Å). (d) Average N⋯N distances for buried and exposed H-bonds (RMSD = 0.06 Å). (c) and (d) The black diagonal lines indicate perfect correspondence (i.e., identity line). Data points for Trp⋯Ser/Thr, Trp⋯Tyr, Arg⋯His, Lys⋯His, and Trp⋯His pairs were excluded due to an insufficient number of exposed H-bonds (<10).
Odonor–H stretching vibrational frequencies of H-bonds in proteins are predominantly determined by the ΔpKa of the H-bond.72 Here, ΔpKa values were estimated from the calculated Odonor–D stretching vibrational frequencies (νO–D) using the following equation:72
ΔpKa = 0.019 νO–D [cm−1] − 32 | (2) |
Note that this ΔpKa value, obtained from the Odonor–D stretching vibrational frequency, represents the ΔpKa value of the H-bond in the protein environment, not the ΔpKa value of ∼12 in water. O⋯O distances were obtained from the QM/MM-optimized structures. O⋯O distances and ΔpKa values were compared for the 12 [HOH⋯−OOC] H-bonds (Fig. 6a and Table S2, ESI†).
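Eqn (2) is a simple linear conversion, sketched below together with the frequency scaling factor quoted in the methods. The function names are hypothetical, and the sketch assumes that the frequency entering eqn (2) is the scaled one; the example frequency is an arbitrary illustrative input, not a value from this work.

```python
B3LYP_SCALE = 0.9614  # scaling factor for B3LYP frequencies quoted in the methods


def scale_frequency(raw_cm1: float) -> float:
    """Scale a raw harmonic frequency (cm^-1) by the B3LYP factor."""
    return raw_cm1 * B3LYP_SCALE


def delta_pka_from_nu_od(nu_od_cm1: float) -> float:
    """Eqn (2): ΔpKa = 0.019 * ν(O-D) [cm^-1] - 32."""
    return 0.019 * nu_od_cm1 - 32.0


def nu_od_from_delta_pka(delta_pka: float) -> float:
    """Inverse of eqn (2): ν(O-D) in cm^-1 corresponding to a given ΔpKa."""
    return (delta_pka + 32.0) / 0.019


if __name__ == "__main__":
    raw = 2600.0  # hypothetical unscaled O-D stretching frequency, cm^-1
    scaled = scale_frequency(raw)
    print(f"scaled ν(O-D) = {scaled:.1f} cm^-1 -> ΔpKa = {delta_pka_from_nu_od(scaled):.1f}")
    print(f"ΔpKa = 12 corresponds to ν(O-D) = {nu_od_from_delta_pka(12.0):.0f} cm^-1")
```

Because the slope is 0.019 per cm−1, a shift of about 50 cm−1 in the O–D stretching frequency corresponds to a change of roughly one unit in ΔpKa.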
Fig. 6 O⋯O distances for [HOH⋯−OOC] H-bonds in microbial rhodopsins. (a) O⋯O distances and ΔpKa for 12 [HOH⋯−OOC] H-bonds. The open square, the cross, and circles indicate O⋯O distances and ΔpKa values of the [H2O-406⋯Asp212] H-bond in the N′-state BR, [H2O-402⋯Asp212] H-bond in the ground-state BR, and the other 10 H-bonds, respectively. The solid line indicates the correlation between the average O⋯O distance and ΔpKa for each H-bond type in proteins (eqn (1)).
The O⋯O distance is correlated with ΔpKa for 12 H-bonds (Fig. 6a, R2 = 0.79). This relatively high correlation suggests that deviations in O⋯O distances from their average values primarily arise from ΔpKa shifts induced by the protein electrostatic environment.
Although Odonor–H stretching vibrational frequencies are mostly determined by the ΔpKa of the H-bond,72 O⋯O distances are influenced more significantly by factors other than the ΔpKa of the H-bond. The correlation between the average O⋯O distance and ΔpKa for each H-bond type in proteins (eqn (1), the solid line in Fig. 6a) likely represents the relationship between the O⋯O distance and ΔpKa. While O⋯O distances and ΔpKa values for many H-bonds align with this average O⋯O distance versus ΔpKa relationship, some H-bonds significantly deviate from this trend (Fig. 6a). These deviations arise from factors other than the ΔpKa of the H-bond. Among these, we focus on the [H2O-406⋯Asp212] H-bond in the N′-state BR with a short O⋯O distance of 2.57 Å (Fig. 6b), and the [H2O-402⋯Asp212] H-bond in the ground-state BR with a long O⋯O distance of 2.92 Å (Fig. 6c).
The O⋯O distance of the [H2O-406⋯Asp212] H-bond in the N′-state BR is 2.57 Å, which is 0.16 Å shorter than the average O⋯O distance for [HOH⋯−OOC] H-bonds in proteins (2.73 Å) (Fig. 6b). The calculated ΔpKa value of this H-bond is 13, nearly identical to the ΔpKa value of the [HOH⋯−OOC] H-bond in the absence of the protein environment (∼12) (Fig. 6a). Therefore, the short O⋯O distance of this H-bond is not caused by a decreased ΔpKa due to the protein electrostatic environment, but rather by structural constraints in the protein environment.
In the absence of the protein environment, a lower ΔpKa tends to result in a higher binding enthalpy,11 a shorter O⋯O distance, and a larger Odonor–H⋯Oacceptor angle approaching 180°.16,17 On the other hand, the Odonor–H⋯Oacceptor angle of this short H-bond is 149° (Fig. 6b). Short H-bonds with decreased Odonor–H⋯Oacceptor angles, despite high ΔpKa values, are frequently observed in intramolecular H-bonds, where structural constraints are significant.17 Such short H-bonds with small Odonor–H⋯Oacceptor angles result from rigid anchoring in the protein matrix, which causes deviations from the equilibrium O⋯O distance. In the case of the [H2O-406⋯Asp212] H-bond in the N′-state BR, the formation of the H-bond network involving protonated Asp85, H2O-401, H2O-406, and Asp212 decreases the O⋯O distance (Fig. 6b). Indeed, when the structure is QM/MM-optimized in the absence of H2O-401, the O⋯O distance of the [H2O-406⋯Asp212] H-bond increases to 2.79 Å (Fig. S7, ESI†).
The O⋯O distance of the [H2O-402⋯Asp212] H-bond in the ground-state BR is 2.92 Å, which is 0.19 Å longer than the average O⋯O distance for [HOH⋯−OOC] H-bonds in proteins (2.73 Å) (Fig. 6c). The calculated ΔpKa value of this H-bond is 19, indicating that ΔpKa is increased from ∼12 due to the protein electrostatic environment (Fig. 6a). However, a ΔpKa of 19 corresponds to an O⋯O distance of 2.78 Å based on eqn (1), which is shorter than 2.92 Å. Therefore, the long O⋯O distance of 2.92 Å cannot be solely attributed to the increased ΔpKa. Instead, it results from both (i) the increased ΔpKa due to the protein environment, and (ii) structural constraints in the protein environment (Fig. 6a). The increase in ΔpKa is due to the decrease in pKa of Asp212, caused by H-bond donations from Tyr57 and Tyr185 to Asp21273 (Fig. 6c). The structural constraints likely arise from the H-bond formations between H2O-402 and Asp85/Lys216, and between Asp212 and Tyr57/Tyr185, which may increase the H⋯Oacceptor distance (Fig. 6c).
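The separation of an observed O⋯O distance into a ΔpKa-dependent part and a remainder attributed to structural constraints can be sketched numerically. The code below does not use the authors' eqn (1); instead it re-fits a straight line to the average distances in Table 3, assuming that the fitted points are the pairs with a single possible H-bond type plus the two pairs whose predominant type was inferred ([COOH⋯−OOC] with ΔpKa ∼ 0 and [HOH⋯−OOC] with ΔpKa ∼ 12). That selection, and all names in the code, are assumptions made for illustration.

```python
# (ΔpKa in water, average O···O distance in Å) from Table 3; the choice of
# points is an assumption about which pairs enter the fit.
TABLE3_POINTS = [
    (18, 2.77),  # water/water
    (18, 2.76),  # water/Ser,Thr
    (12, 2.73),  # water -> Asp/Glu COO- (inferred predominant type)
    (17, 2.79),  # water -> backbone C=O
    (18, 2.75),  # Ser,Thr/Ser,Thr
    (17, 2.75),  # Ser,Thr -> backbone C=O
    (16, 2.73),  # Tyr/Tyr
    (11, 2.69),  # Tyr -> backbone C=O
    (0, 2.52),   # Asp/Glu COOH -> COO- (inferred predominant type)
    (5, 2.64),   # Asp/Glu COOH -> backbone C=O
]


def linear_fit(points):
    """Ordinary least squares; returns (slope, intercept)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


SLOPE, INTERCEPT = linear_fit(TABLE3_POINTS)


def expected_distance(delta_pka: float) -> float:
    """O···O distance (Å) expected from ΔpKa alone under the re-fitted line."""
    return SLOPE * delta_pka + INTERCEPT


def structural_residual(observed: float, delta_pka: float) -> float:
    """Part of the observed distance not explained by ΔpKa (Å)."""
    return observed - expected_distance(delta_pka)


if __name__ == "__main__":
    cases = {
        "H2O-406···Asp212 (N'-state BR)": (2.57, 13),
        "H2O-402···Asp212 (ground-state BR)": (2.92, 19),
    }
    for label, (observed, dpka) in cases.items():
        print(f"{label}: observed {observed:.2f} Å, expected "
              f"{expected_distance(dpka):.2f} Å, residual "
              f"{structural_residual(observed, dpka):+.2f} Å")
```

Under these assumptions the re-fitted line gives about 2.78 Å at ΔpKa = 19, in line with the value quoted from eqn (1), so both H-bonds retain a residual of more than 0.1 Å in magnitude that ΔpKa does not account for.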
To summarize, H-bond distances are influenced by the following two factors of the protein environment: (i) the protein electrostatic environment that shifts the ΔpKa of the H-bond. This ΔpKa shift alters the covalent character of the H⋯Oacceptor bond (Fig. 2), thereby altering the O⋯O distances. (ii) Structural constraints imposed by protein folding. These constraints alter the Odonor–H⋯Oacceptor angle (Fig. 6b) or the H⋯Oacceptor distance (Fig. 6c) without affecting ΔpKa (or the corresponding Odonor–H distance), thereby altering the O⋯O distance. Such constraints destabilize the H-bond, which is likely compensated by stabilization through other interactions and H-bonds (e.g., H-bonds between H2O-402 and Asp85/Lys216, and between Asp212 and Tyr57/Tyr185 in the ground-state BR, Fig. 6c).
Beyond this primary determinant of H-bond distance, structural constraints in the protein environment impose a secondary influence. Odonor–H⋯Oacceptor angles (Fig. 6b) and H⋯Oacceptor distances (Fig. 6c) are often restricted by rigid anchoring in the protein matrix, causing deviations in the O⋯O distance from the equilibrium value determined by ΔpKa. While the effect of structural constraints on H-bond distances is weaker than the covalent-bond-like electronic effect, it enables the formation of unique H-bond geometries that are inaccessible in bulk solvent (Fig. 6a). These protein-specific H-bond geometries may play a crucial role in shaping the functional properties of proteins.
Footnote
† Electronic supplementary information (ESI) available. See DOI: https://doi.org/10.1039/d5cp00511f
This journal is © the Owner Societies 2025