Sergio
Rampino
,
Mirco
Zerbetto
* and
Antonino
Polimeno
Department of Chemical Sciences, University of Padova, Via Marzolo 1, I-35131, Padova, Italy. E-mail: mirco.zerbetto@unipd.it; Fax: +39 049 8275050; Tel: +39 049 8275678
First published on 12th May 2023
Stochastic modeling approaches can be used to rationalize complex molecular dynamical behaviours in solution, helping to interpret the coupling mechanisms among internal and external degrees of freedom, providing insight into reaction mechanisms and extracting structural and dynamical data from spectroscopic observables. However, the definition of comprehensive models is usually limited by (i) the difficulty in defining – without resorting to phenomenological assumptions – a representative reduced ensemble of molecular coordinates able to capture essential dynamical properties and (ii) the complexity of numerical or approximate treatments of the resulting equations. In this paper, we address the first of these two issues. Building on a previously defined systematic approach to construct rigorous stochastic models of flexible molecules in solutions from basic principles, we define a manageable diffusive framework leading to a Smoluchowski equation determined by one main tensorial parameter, namely the scaled roto-conformational diffusion tensor, which accounts for the influence of both conservative and dissipative forces and describes the molecular mobility via a precise definition of internal–external and internal–internal couplings. We then show the usefulness of the roto-conformational scaled diffusion tensor as an efficient gauge of molecular flexibility through the analysis of a set of molecular systems of increasing complexity ranging from dimethylformamide to a protein domain.
First of all, direct coarse-grained approaches have been developed over the years with the aim of reducing the number of explicitly considered degrees of freedom by collecting in some appropriate way groups of atoms into collective particles (beads) and adopting an effective force field among them, also aided by state-of-the art machine-learning techniques.8,9 A less direct but intriguing approach is followed when distinguishing between relevant motions (or coordinates), primarily affecting the process or property under study, and non-relevant ones, which have – or are supposed to have – a negligible effect and can be treated in some suitable approximate way. This is the field of the so-called relevant dynamics: in other words, a few important details are supposed to be responsible for a phenomenon to happen, while all the others can be thought of as molecule-dependent fine tuners of the properties of the phenomenon. In these approaches, the problem is thus that of defining/finding the relevant coordinates of the system, focusing on the important, hopefully small, subset of degrees of freedom. The partition “relevant vs. irrelevant” can be based on purely phenomenological considerations, or, at least approximately, on algorithmic procedures. In the Gaussian Network Model, for instance, the molecule is seen as an ensemble of beads interacting within a cutoff radius with a harmonic potentials of the same strength.10,11 The diagonalization of the so-called Kirchoff matrix (worked out from the variance-covariance matrix describing the correlation between the displacements of bead pairs) leads to the definition of a set of generalized coordinates (linear combinations of Cartesian coordinates of the beads) whose relevance in terms of relaxation times is directly proportional to the amplitude of the motion given by the related eigenvalue. Similar alternative models are the anisotropic network model, based on the diagonalization of the Hessian of the potential instead of the Kirchoff matrix,12,13 and the normal mode analysis, expressing the dynamics of a protein as a superposition of collective motions called normal modes.14,15 Among other viable approaches, the filtering technique is based on the idea of removing high-frequency fast motions with a low pass filter applied to a Fourier transformed MD trajectory,16,17 to focus on the structural and energetic features of low-frequency collective motions. Singular Value Decomposition is thought to be very useful for characterization of the nonlinear dynamics of multivariate systems and to the identification of the dominant modes of motion in systems whose cooperative dynamics cannot be fully explored within reasonable computation time using conventional simulation techniques.18 Finally, the widely employed principal component analysis,19 also known as essential dynamics, involves as some of the previously mentioned techniques the construction and diagonalization of a matrix which leads to the definition of collective generalized coordinates.
Once defined, the subset of relevant coordinates can be described using stochastic approaches, which are particularly useful when discussing molecules in solutions. The definition of a comprehensive approach to define molecularly-based Fokker–Planck (FP) models can be conducted within a general framework. In our previous work20,21 we gave a contribution to the definition of a general procedure to derive a FP description by addressing the relaxation processes of a flexible (macro)molecule as a non-rigid body, based on a generic set of internal and external degrees of freedom, which define a Markovian process. This was pursued by first setting up the Liouville equation of motion22 for a generic flexible body defined as a set of material points (atoms or beads), in terms of roto-translational and natural internal coordinates. The general description of a macromolecule in solution was then carried out in terms of a collection of flexible bodies, to which a standard Nakajima-Zwanzig23,24 projection method was applied in order to eliminate the irrelevant, i.e. not directly observed, degrees of freedom. A generalized master equation was obtained. Alternative modeling options, based on different choices of the internal variables, of accuracy levels of the description of solvents effects, etc., were reviewed. The case of a partially rigid Brownian rotator (semi-flexible body model), i.e. with only internal modes described by a harmonic internal energy, was discussed.21
The resulting theoretical approach, while effective, is still considerably challenging from a computational point of view. To partially overcome this limitation, we propose here to simplify the general scheme20,21 by projecting out all the conjugate momenta. Such an approximation is valid in the diffusive limit, and leads to the Smoluchowski description of the stochastic motion of the relevant coordinates only. We present in particular in this paper first, for sake of completeness, the derivation of the diffusive description from the general model, and then we concentrate on the characteristics of the main tensorial parameter defining the resulting FP equation, i.e. the (scaled) roto-conformational diffusive tensor, D, which includes, at least for semi-flexible systems not undergoing large amplitude internal motions, information on both the molecular internal energy landscape and the dispersive medium. In particular, we intend to test the usefulness of D as a gauge of molecular flexibility. We argue that, by analysing in different ways the tensor elements, its eigenvalues spectrum and other derived quantities, it is possible to acquire additional insight into the molecular mobility. To do so, after sketching the theoretical framework in Section 2, we analyze in Section 3 the features of the roto-conformational diffusion tensor for molecular systems of increasing complexity, ranging from the simple dimethylformamide to a protein domain, and we discuss its possible role as a signature for measuring the dynamical relevance of subsets of coordinates. Conclusions are drawn in Section 4 and some perspectives are outlined.
This picture is somewhat simplified when a diffusive limit is considered, i.e. when it is assumed that momenta L, p relax much faster then positional coordinates Ω, q. This is a reasonable approximation, usually introduced when discussing molecular relaxation processes in normal fluids at room temperature. The original complete FP equation discussed in ref. 20 and 21, can be therefore simplified by the rigorous elimination (via projection) of the momenta, which are considered as fast relaxing modes. The resulting time evolution operator in the reduced configuration space Ω, q is Hermitian (or easily reduced to a Hermitian form), allowing for a considerably increased efficiency in the subsequent numerical treatment, without renouncing to the full inclusion of internal degrees of freedom, based on non-phenomenological assumptions. The procedure employed to project the momenta follows a standard Nakajima–Zwanzig23–25 formalism, and is sketched in the ESI† for the sake of completeness. The result is the diffusive (Smoluchowski) equation for the time dependent conditional probability ρ(x,t) with x = (Ω, q), obtained after averaging out all the momenta L, p
![]() | (1) |
![]() | (2) |
![]() | (3) |
![]() | (4) |
![]() | (5) |
![]() | (6) |
• α-L-Rhap-α-(1 → 2)-α-L-Rhap-OMe (R2R, 2 residues)
• β-D-Glcp-(1 → 6)-α-D-[6-13C]-Manp-OMe (BGL, 2 residues)
• β-D-Glcp-(1 → 3)[β-D-Glcp-(1 → 2)]-α-D-Manp-OMe (GGM, 3 residues)
• α-D-Manp-(1 → 2)-α-D-Manp-(1 → 6)-α-D-[6-13 C]-Manp-OMe (TRI, 3 residues)
• α-L-Fucp-(1 → 2)-β-D-Galp-(1 → 3)-β-D-GlcpNAc-(1 → 3)-β-D-Galp-(1 → 4)-D-Glcp (LNF, 5 residues)
• γ-cyclodextrin (GCY, 8 residues)
The last (and most complex) considered system is GB3 protein (PDB id: 1P7F), with 56 amino acid residues, 862 atoms and 2580 internal degrees of freedom. The orientation of the molecular frame is defined by three consecutively bonded atoms as follows: the origin coincides with the position of the second atom, the first atom lies on the negative x axis, and the third atom lies in the xy plane. For DMF and for the oligosaccharides the first two atoms were selected as the hydrogen and the carbon, respectively, of a plausible NMR probe (either CH or CH2); for GB3 the triplet of atoms defining the orientation frame was chosen as the first three atoms (NCC) of the polymer backbone. Fig. 1 shows the molecular structures of the considered systems with the above mentioned triplets of atoms highlighted as spheres.
In the hydrodynamic model used to evaluate the friction tensor, four quantities are required, namely: the temperature, the local viscosity, the hydrodynamic boundary conditions and the effective radius of the atoms, Reff. For the set of oligosaccharides, temperature and viscosity were chosen among those for which experimental NMR data are available (see ref. 26–30), and Reff was set to the optimal value providing the lowest sum of squared percentage deviations for T1, T2 and NOE over the entire ensemble of experimental data available for each system, as estimated in ref. 31. For DMF and GB3, temperature and viscosity were arbitrarily chosen, and a value of Reff = 2.0 Å was used. For all systems stick boundary conditions were applied. A summary of the parameters used for all the considered systems is provided in Table 1. Calculations were performed according to the computational protocol described in the ESI.†
DMF | R2R | BGL | GGM | TRI | LNF | GCY | GB3 | |
---|---|---|---|---|---|---|---|---|
T/K | 298.15 | 298.2 | 253.0 | 298.6 | 298.0 | 303.0 | 323.0 | 298.15 |
η/cP | 2.19 | 2.19 | 2.82 | 1.09 | 3.66 | 1.40 | 2.90 | 0.91 |
R eff/Å | 2.0 | 1.6 | 1.8 | 1.8 | 2.2 | 3.2 | 1.8 | 2.0 |
i = 1 | i = 2 | i = 3 | |
---|---|---|---|
DMF | |||
D (RR) i | 6.69 × 10−6 | 1.26 × 10−5 | 2.22 × 10−5 |
(egv[D])i | 1.82 × 10−6 | 2.04 × 10−6 | 2.14 × 10−6 |
Rigid mol. | 1.83 × 10−6 | 2.15 × 10−6 | 2.21 × 10−6 |
R2R | |||
D (RR) i | 1.05 × 10−5 | 1.98 × 10−5 | 3.24 × 10−5 |
(egv[D])i | 5.19 × 10−7 | 5.75 × 10−7 | 7.53 × 10−7 |
Rigid mol. | 5.24 × 10−7 | 5.78 × 10−7 | 7.66 × 10−7 |
BGL | |||
D (RR) i | 5.46 × 10−7 | 1.03 × 10−6 | 1.66 × 10−6 |
(egv[D])i | 2.56 × 10−8 | 2.81 × 10−8 | 4.47 × 10−8 |
Rigid mol. | 2.58 × 10−8 | 2.83 × 10−8 | 4.54 × 10−8 |
GGM | |||
D (RR) i | 1.60 × 10−5 | 3.14 × 10−5 | 5.00 × 10−5 |
(egv[D])i | 5.65 × 10−7 | 6.57 × 10−7 | 7.95 × 10−7 |
Rigid mol. | 5.70 × 10−7 | 6.65 × 10−7 | 8.08 × 10−7 |
TRI | |||
D (RR) i | 3.33 × 10−6 | 6.24 × 10−6 | 9.91 × 10−6 |
(egv[D])i | 1.35 × 10−7 | 1.42 × 10−7 | 2.39 × 10−7 |
Rigid mol. | 1.36 × 10−7 | 1.43 × 10−7 | 2.43 × 10−7 |
LNF | |||
D (RR) i | 4.04 × 10−6 | 7.87 × 10−6 | 1.22 × 10−5 |
(egv[D])i | 1.40 × 10−7 | 1.47 × 10−7 | 2.23 × 10−7 |
Rigid mol. | 1.42 × 10−7 | 1.49 × 10−7 | 2.27 × 10−7 |
GCY | |||
D (RR) i | 6.79 × 10−6 | 1.28 × 10−5 | 2.03 × 10−5 |
(egv[D])i | 9.54 × 10−8 | 1.12 × 10−7 | 1.22 × 10−7 |
Rigid mol. | 9.63 × 10−8 | 1.13 × 10−7 | 1.23 × 10−7 |
GB3 | |||
D (RR) i | 1.37 × 10−5 | 2.26 × 10−5 | 4.01 × 10−5 |
(egv[D])i | 5.35 × 10−8 | 5.79 × 10−8 | 7.62 × 10−8 |
Rigid mol. | 5.41 × 10−8 | 5.84 × 10−8 | 7.67 × 10−8 |
1. elements D(RR)i ≡ D(RR)ii of the diagonal rotational block D(RR) of D
2. first three eigenvalues of the set of eigenvalues obtained by full diagonalization of D and sorted in ascending order.
3. eigenvalues of the rotational diffusion tensor for the rigid molecule obtained with the DiTe2 software package,32 using the same hydrodynamic model and the same conditions described above.
A strikingly good agreement is found between the eigenvalues of the rotational diffusion tensor of the rigid molecule (set 3) and the lowest three eigenvalues of the (fully diagonalized) D matrix (set 2). Therefore, upon coordinate transformation the diagonalization of D seems to lead to a separation of motions, with the three separated eigenvalues being close to the rotational diffusion of the molecule as if it was rigid. The two sets of eigenvalues show an overall decreasing trend with increasing systems size and complexity. In contrast, the three elements of the diagonal rotational block of D(RR) (set 1) do not exhibit a clear trend with increasing system size, and except for the first (and smallest) system their values are one or two orders of magnitude smaller than the respective values from the other two sets. This is in line with the fact that the elements of D(RR)i relate to the rotation of a small subsystem made of the triplet of atoms used to define the molecular frame.
On the other hand, the analysis of the elements of the roto-conformational block D(SR) provides information on the coupling between the internal degrees of freedom and the external rotation.
The roto-conformational coupling (panel d) is strong (high values of |D(SR)ij|/|D(SS)i − D(RR)j|) for only a few normal modes with the lowest frequencies (thus mainly represented by torsion motions). In particular, the first two normal modes exhibit by far the highest coupling with external rotation, while the remaining modes turn out to be negligibly coupled to it. The analysis of the squared elements of the first two columns of the T matrix (panel b) reveals that the internal motions mostly coupled with the external rotation are the C2–N3–C–H torsions (see panel a for atom numbering), with the last two atoms being one of the three CH pairs of each methyl group.
A measure of the ‘coldness’/‘hotness’ of the atoms (i.e. the involvement of atoms in motions associated with lower or higher frequencies), can be computed as follows:
![]() | (7) |
![]() | ||
Fig. 3 Atomic average frequencies in DMF computed as per eqn (7) (see text for discussion). |
One may further push the analysis to the contribution of each atom in a given type of motion (stretchings, bendings, torsions) for a given frequency range. To this purpose the following expression can be used:
![]() | (8) |
Panel d shows that there is only one normal mode, precisely the lowest-frequency one, which is tightly coupled to the overall rotation of the molecule. The nature of this mode can be appreciated by inspection of Fig. 6, where the atom connections involved in the torsions (left panel), bendings (center panel) and stretchings (right panel) contributing to the mode with value Tij2 above a given threshold are highlighted in green. The threshold is here set to 1/N (with N, as already mentioned, being the number of internal degrees of freedom), i.e. exactly equal to the average value of all contributions to that mode by the original internal coordinates q. In other words, in the panels of Fig. 6, only the connections between atoms involved in motions contributing above average to the mode are highlighted. The figure thus reveals that the only normal mode significantly coupled to the external rotation results mainly from torsions involving the atoms of the polymer backbone, plus one coordinate relating to the bending motion of three of the sugar rings with respect to the remaining two. For LNF we also analyzed how both the parameters relating to physical conditions and the choice of the triplet of atoms defining the orientation frame affect the above discussed results. Results are fully reported and discussed in the ESI.†
![]() | ||
Fig. 6 Character of the lowest frequency (and highest coupling with external motion) normal mode in terms of the constituting torsions, bendings and stretchings (see text for discussion) for LNF. |
Focusing now on the atom ‘hotness’, Fig. 7 shows the atomic average frequencies of the atoms using the same color scale as in Fig. 3, ranging from white (coldest atom) to red (hottest atom). In this case, the hottest atom is the only terminal oxygen atom, which is bound to a single carbon atom and is thus mainly involved in the stretching of this bond. On the other hand, the coldest atoms are the carbon and oxygen atoms constituting the oligomer backbone linking the sugar rings, which are mainly involved in slow torsion motions.
![]() | ||
Fig. 7 Atomic average frequencies in LNF computed as per eqn (7) (see text for discussion). |
Also in this case, the matrix of the color plots (Fig. 8) illustrating the contribution of the atoms in different kind of motions for the three above envisaged frequency ranges (identified by index ranges[1,50], [51,240], and [241,333]) reflects the features of the color plot of the T matrix reported in panel b of Fig. 5.
![]() | ||
Fig. 10 Character of the fifth normal mode in terms of the constituting torsions, bendings and stretchings for GB3 (see text for discussion). |
![]() | ||
Fig. 11 Character of the sixth normal mode in terms of the constituting torsions, bendings and stretchings for GB3 (see text for discussion). |
The two figures show that the modes mainly results from torsions and bendings involving the atoms of the polymer backbone, plus a stretching of the bond linking the S-methyl thioether side chain of the first (methionine) residue to the protein backbone.
The atomic average frequencies for GB3 can be visualized in Fig. 12, where the same color scale is used as in the previously discussed Fig. 3 and 7, i.e. ranging from white (coldest atom) to red (hottest atom). In line with the results on the smaller polymer LNF, the hottest atoms are here the ‘external’ terminal oxygens, while the colder atoms are the more ‘internal’ carbon and nitrogen atoms constituting the polymer backbone. For this last system too, the matrix of the color plots (Fig. 13) illustrating the contribution of the atoms in different kinds of motions for the three above envisaged frequency ranges (identified by index ranges [1,280], [281,1800], and [1801,2580]) reflects the features of the color plot of the T matrix reported in panel b of Fig. 9.
![]() | ||
Fig. 12 Atomic average frequencies in GB3 computed as per eqn (7) (see text for discussion). |
Focusing on a set of molecular systems of increasing complexity ranging from dimethylformamide, to several oligosaccharides and a protein domain, we showed how the rich and informative structure of the scaled roto-conformational diffusion tensor can be analyzed to extract quantitative information on the molecular mobility. In particular, the analysis of the eigenvectors and eigenvalues of the matrix diagonalizing the conformational block of the diffusion tensor provides a quantitative picture of the time scales of the different molecular motions, putting on quantitative grounds, for instance, the time-scale separation between stretching, bending, and torsion motions. Moreover, a suitable combination of the elements of the eigenvector matrix leads to an informative definition of the ‘coldness’/‘hotness’ of the atoms which can be used to quantitatively assess the involvement or contribution of a given atom in specific internal modes. Thus for instance, the intuitive consideration that the coldest atoms of the considered five-units oligosaccharide and protein domain are those constituting the polymer backbone, mainly involved in slow torsion motions, while the hottest atoms are the terminal oxygen atoms mainly involved in stretching motions, could be put on a thorough quantitative ground. Finally, the analysis of the non diagonal block of the scaled roto-conformational diffusion tensor leads to the precise determination of the few internal modes strongly coupled with the overall external rotation. The scaled roto-conformational diffusion tensor can thus be conveniently used as a gauge of molecular flexibility, also with the aim of determining a cut-off in the degrees of freedom to be included in the full solution of the related FP equation. Work is ongoing in our group on the numerical resolution of this equation with a focus on the computation of observables that can be assessed through various spectroscopic techniques including NMR and neutron spectroscopy, which simultaneously accesses spatial and time correlations (as opposed to the angular correlations probed by NMR).
The analysis of molecular flexibility presented in this work can be used as an alternative approach to the selection of essential dynamics in macromolecules, including in the analysis the flexibility in terms of characteristic time scales, instead of just energy arguments. The idea is to be able to extract such an essential dynamics (relevant coordinates) by analyzing short (with respect to the time scales required to interpret slow dynamics) MD trajectories. Machine learning techniques33,34 are expected to be efficiently employed to pursue the best essential dynamics selection with respect to the physical observable that is being interpreted (e.g., NMR relaxation).
Footnote |
† Electronic supplementary information (ESI) available: Results on the roto-conformational diffusion tensor (analogues of Fig. 2, 5 and 9) for the remaining five oligosaccharides, brief review of the Fokker–Planck equation in internal coordinates, and derivation of the Smoluchowski equation by projetion of the momenta. See DOI: https://doi.org/10.1039/d3cp01382k |
This journal is © the Owner Societies 2023 |