Johannes Stöckelmaier a and Chris Oostenbrink *ab
a Institute of Molecular Modeling and Simulation (MMS), BOKU University, Vienna, Austria. E-mail: chris.oostenbrink@boku.ac.at
b Christian Doppler Laboratory Molecular Informatics in the Biosciences, BOKU University, Vienna, Austria
First published on 2nd July 2025
To elucidate the connection between the structure and function of intrinsically disordered proteins (IDPs), a description of their conformational ensembles is crucial. These are typically characterized by an extremely large number of similarly low-energy conformations, which can hardly be captured by either experimental or computational means alone. Rather, the combination of data from both simulation studies and experimental research offers a way towards a more complete understanding of these proteins. Over the last decade, a number of methods have been developed to integrate experimental data and simulations into one model to describe the conformational diversity. While many of these methods have been successfully applied, they often remain black boxes for the scientists applying them. In this work, we review maximum entropy methods to optimize conformational ensembles of proteins. From a didactical perspective, we aim to present the mathematical concepts and the optimization processes in a common framework, to increase the understanding of these methods.
The elucidation and characterization of structures and the associated dynamics of flexible proteins has turned out to be a substantial scientific challenge that requires a close cooperation between experimental studies, data science and molecular simulations.6–8 Flexible proteins are often characterized by complex, multifunneled potential energy landscapes with multiple, often shallow, minima.9,10 Flatter parts of the landscapes may span multiple conformations, allowing rapid switches between them at ambient temperatures,11 as visualized in Fig. 1. The observable molecular properties cannot be fully explained by a single structure, and it is therefore necessary to create an appropriate representation of the structural diversity. A frequently used model consists of a superposition of different geometric structures, each representing a single relevant conformation. The observable molecular properties then emerge as an average over the different structures. All of these structures together represent the conformational ensemble,12–14 which is a set of molecular geometries with affiliated probability coefficients or weights.15 The true number of conformations in an ensemble is unknown and depends on the definition of discrete conformations, but can grow very large even for mid-sized molecules.16,17
Many established computational methods like comparative modeling18 and AI-based structure predictors like Rosettafold19 or Alphafold20 are designed to calculate static structures of stable proteins. The extension of these methods to also describe conformational ensembles, which are typically obtained by sampling the relevant conformations, is currently a major topic of research.21–24 Alternatively, molecular dynamics (MD) simulation relies on the ergodic hypothesis, which states that a conformational ensemble is captured by following the motions of a molecule over a sufficiently long time. The computational challenge of appropriately sampling all conformations is closely related to the MD simulation of protein folding,25–27 which, while still being very challenging especially for larger proteins, has seen substantial improvements in parameters and methodology. MD simulation can be applied to investigate the dynamic nature of an IDP and to generate an ensemble. The ensemble obtained with such a method contains both conformations and associated probability coefficients. For a straightforward MD simulation, which follows the appropriate equations of motion based on an accurate energy function, and from which conformations are sampled at regular time intervals, the probability coefficients would be identical for all samples. The ensemble can subsequently be reduced in size to group very similar structures into single conformations and to assign their weights according to the occurrence of these conformations in the larger ensemble.
The complex potential energy surfaces of most IDPs and flexible proteins make these probability coefficients prone to errors due to force-field inaccuracies. To obtain not just valid geometrical structures but also the correct associated weights, it is necessary to model not only the well populated conformational minima but also to describe the (reversible) transitions from one conformation to the next and the associated energy barriers correctly.28 If the transitions between conformations are not observed sufficiently often, the weights assigned to specific conformations belonging to different minima may not be statistically robust. To address this challenge and to refine the weights of the geometrical ground states, it thus seems reasonable to optimize the weights a posteriori, after completing the simulation. A fundamental prerequisite for the successful reweighting of ensembles lies in the complete sampling of the conformational space, often necessitating enhanced sampling methods. Reweighting methods depend on a reasonably sampled conformational space, as they cannot create new conformations by themselves, but are designed to create an appropriate ensemble from an existing set of conformations to better reproduce experimental data. Thus, initial ensembles obtained from such enhanced sampling methods, featuring a wide set of relevant conformations with lower-confidence statistical weights, represent an ideal use case for a posteriori reweighting.
In the last decade, numerous methods have been developed to correct and improve computationally obtained ensembles by optimizing the associated weights using experimental data. Since then, these reweighting methods have become an established tool in the computational structure elucidation of flexible proteins.29–32 The aim of this study is to review some of the most prominent reweighting techniques and to give insights into what are often considered black-box methods.
Experimental observables may also give insight into the relevant conformations of a biomolecule. Particularly insightful for IDPs is nuclear magnetic resonance (NMR) spectroscopy, offering e.g. chemical shifts, 3J-coupling constants, residual dipolar couplings (RDCs) and paramagnetic relaxation enhancement (PRE). During an experimental measurement, a very large number of molecules is measured simultaneously and the averaging timescales are typically long with respect to the molecular motion. Consequently, the measured observables directly represent both a time and an ensemble average over the measured molecules.33–36 It is therefore invalid to compare observables calculated from a single conformation to the ensemble-averaged experimental results. Accordingly, it is necessary to compute the expectation value for each observable from a representative set of conformations (i.e. the computationally derived ensemble) to accurately compare results of experiment and simulation. In many cases, a weighted average over the simulation trajectory is calculated. Eqn (1) shows such an averaging, where the ensemble average 〈Ocalc〉, indicated by angular brackets, is calculated. The ensemble consists of N conformations in total and each conformer t has an individual calculated observable Ocalct and a statistical weight wt:
\langle O^{\mathrm{calc}} \rangle = \sum_{t=1}^{N} w_t \, O_t^{\mathrm{calc}} | (1)
This approach is valid for most experimental data, but not for residual dipolar couplings (RDCs) and nuclear Overhauser effects (NOEs), where different averaging schemes are required. Before the calculation of ensemble averages, the physical nature of each type of observable needs to be considered and the correct averaging scheme must be chosen. For example, NOEs arise from the dipolar coupling between the nuclear spins of two protons. The intensity of such signals is highly dependent on the distance in space between a given proton pair and weakens with the third or sixth power of the distance, depending on the timescale of the experiment and the tumbling time of the molecule.37 Pairs closer than 3 Å result in strong NOE signals, while the limit of detection is reached with pairs about 6 Å apart. In ensemble averaging, this means that a small number of conformations with short distances between a proton pair has a dominating influence on the intensity of the NOE signal. To reproduce this behavior, NOE-derived distances require r−3 or r−6 averaging,38,39 e.g.:
\langle r \rangle = \left( \sum_{t=1}^{N} w_t \, r_t^{-6} \right)^{-1/6} | (2)
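As a minimal numerical sketch of the two averaging schemes in eqn (1) and (2), the following Python snippet may be helpful; the weights and observable values are hypothetical placeholders, not data from this work.

```python
import numpy as np

weights = np.array([0.2, 0.5, 0.3])   # w_t, normalized to one (cf. eqn (3))
o_calc  = np.array([1.8, 2.4, 3.1])   # a generic observable per conformer
r_calc  = np.array([2.9, 4.5, 6.2])   # proton-proton distances in Angstrom

o_avg = np.sum(weights * o_calc)                        # linear average, eqn (1)
r_noe = np.sum(weights * r_calc**-6) ** (-1.0 / 6.0)    # r^-6 average, eqn (2)

print(f"<O_calc> = {o_avg:.3f}")
print(f"<r>      = {r_noe:.3f} A (dominated by the shortest distance)")
```

The r−6 average illustrates the dominance of short distances: even a modestly weighted conformer with a close proton pair pulls the averaged distance down markedly.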
Interpreting the weights, wt, in eqn (1) and (2) as the probabilities that conformation t occurs in the ensemble leads to the condition that the sum of all individual probabilities needs to be one:
\sum_{t=1}^{N} w_t = 1 | (3)
Due to the approximate nature of force fields, it is unavoidable to introduce some level of inaccuracy into the simulation. In some simulation settings, such as those involving intrinsically disordered proteins (IDPs), these small inaccuracies are of increased relevance because most force fields were originally optimized for stable proteins; the inaccuracies can potentially affect the prediction of the observables of the system. The simulated and ensemble-averaged observables, as obtained from the conformational ensemble, may be compared with those measured in experimental studies to confirm the validity of the simulations, identify differences and possibly correct the simulation to allow further investigation into the properties of the system.
To validate and optimize molecular ensembles, a set of techniques known as reweighting methods can be applied. The basic principle of all of these methods is similar: an initial probability density representing the weights of each conformation of the unbiased ensemble is transformed into a probability density which represents the refined ensemble, aiming to improve the agreement between computationally and experimentally derived ensemble averages of the biophysical observables.
In biophysical experiments the behavior of a measured molecule is determined by its potential-energy landscape (natural potential). This potential-energy surface, governed by nature's physics, is a computationally inaccessible potential which can only be approximated by the force field (or the quantum mechanical method). To illustrate this concept, imagine a hypothetical force field that accurately represents the natural potential except for one region. In this example, the force field potential includes an additional energy valley that does not exist in the natural potential (compare Fig. 2A).
Fig. 2B illustrates the distribution of simulated values of a hypothetical observable for an unbiased ensemble. An experimental ensemble average could be measured (green line). The computational estimate of the same observable can be predicted by averaging over the samples of the simulation, shown as a red line. In this example, the experimental value corresponds to the left population of the simulated observable. Due to force field errors (Fig. 2A), some samples of the simulation are likely overrepresented, shifting the computational ensemble average away from the experimental result. In this hypothetical example, the right population is an artifact of the force field, causing the simulated expectation value (red line) to be overestimated.
In general, there are two main approaches to address such miscalculated observables due to force field errors. Experimentally derived boundary conditions can be imposed during the simulation to correct the force field for a specific system. Because these conditions are set a priori, they are baked into the trajectory, making later adjustments complicated and expensive. Such an a priori approach, imposing experimental restraints during the simulation, may guide the ensemble towards otherwise unsampled conformations, but bears the risk of getting stuck in a small number of local minima due to overly strong restraints, potentially leading to unintentional overfitting to the experimental observable.
Alternatively, ensemble reweighting can be used a posteriori to increase the impact of conformations that agree with the experiment, while reducing the impact of conformations that are in disagreement with the experiment. Reweighting methods yield new weights for the ensemble such that inappropriate conformations become insignificant. In our example, the refined simulated value of the observable can be seen in Fig. 2C (blue line) after the reweighting. Now the simulated average is in much better agreement with the experimental observable. This example already demonstrates key requirements for the successful reweighting of conformational ensembles. The initial ensemble needs to be well sampled, covering the entire relevant conformational space. In a second step, after the initial ensemble has been generated, the reweighting algorithm picks a sub-ensemble that better represents the experimental data by adjusting the statistical weights of the ensemble. As ensemble reweighting cannot generate new conformations that were not in the initial ensemble, all relevant conformations must be sampled beforehand. An in-depth discussion on imposing boundary conditions a priori as compared to a posteriori reweighting can be found in Rangan et al.40
Regarding the nomenclature of methods, we understand the term maximum entropy methods as an umbrella term for a group of specific methods and implementations in which the initial ensembles are modified as little as possible given the conditions. This clearly separates maximum entropy methods from maximum parsimony methods, which maximally reduce the ensemble. The scope of this work focuses on the explanation and investigation of Bayesian ensemble refinement and the minimum relative entropy method, both commonly used methods within the maximum entropy umbrella term due to their closeness to the maximum entropy principle. A special case of the minimum relative entropy method, in which the initial weights are uniform, may also be referred to as entropy maximizing, as described in the appendix.
Bayes’ theorem allows the calculation of conditional probabilities of events:
P(\mathrm{model}\,|\,\mathrm{data}) = \frac{P(\mathrm{data}\,|\,\mathrm{model}) \, P_0(\mathrm{model})}{P(\mathrm{data})} | (4)
The prior probability P0(model) is the estimated probability of the model being correct before any data is observed. In the context of ensemble reweighting, the associated model parameters could be obtained from MD simulations. The conditional probability P(data|model) is a measure for the likelihood that the assumed model parameters can reproduce the observed data. The marginal probability can be interpreted as a normalization constant such that the posterior probability qualifies as a probability. It can be ignored in the case of an optimization problem, where we search for the model that maximizes the posterior probability.
The basic formulation of Bayesian ensemble refinement sees the weight vector w as the model to describe the ensemble. As such, the method can be summarized as:
P(w|data) ∝ P(data|w) × P0(w) | (5) |
To design an appropriate function that measures how well the model parameters explain the observed data, P(data|w) should have a maximum when simulated and observed data match each other. It may be interpreted as the likelihood that the data can be reproduced given the model weights w. In the context of ensemble reweighting such a function can be designed as shown in eqn (6) if a Gaussian error can be postulated:
P(\mathrm{data}\,|\,\mathbf{w}) \propto \exp\!\left( -\frac{1}{2} \sum_{i=1}^{M} \frac{\left( \langle O_i^{\mathrm{calc}} \rangle_{\mathbf{w}} - O_i^{\mathrm{exp}} \right)^2}{\sigma_i^2} \right) | (6)
For the prior probability of a model, we postulate that the model obtained from the unbiased simulation (w0) is the best representation of the true system. Thus, the probability to yield correct values for observables should be highest if w = w0.68 A qualifying (but not normalized) function comparing w with w0 can be found in the theta-scaled Kullback–Leibler divergence,69 which is equal to the relative entropy (eqn (7)) if the targeted distribution is normalized [ref. 70, p. 90] and theta is one:
S_{\mathrm{rel}}(\mathbf{w}, \mathbf{w}^0) = \sum_{t=1}^{N} w_t \ln \frac{w_t}{w_t^0} | (7)
P0(w,w0) ∝ exp(−θSrel(w,w0)) | (8) |
To find the ideal model wopt, the global maximum of the posterior probability (eqn (5)) needs to be found:
\mathbf{w}^{\mathrm{opt}} = \arg\max_{\mathbf{w}} P(\mathbf{w}\,|\,\mathrm{data}) = \arg\max_{\mathbf{w}} P(\mathrm{data}\,|\,\mathbf{w}) \, P_0(\mathbf{w}, \mathbf{w}^0) | (9)
Starting from eqn (9), the natural logarithm can be applied to both sides of the equation, as the logarithm is a monotone transformation that does not alter the position of the extremum. After reordering the equation, the negative log posterior can be renamed to a cost function, which leads to eqn (10):
\mathrm{cost}(\mathbf{w}) = \theta \, S_{\mathrm{rel}}(\mathbf{w}, \mathbf{w}^0) + \frac{1}{2} \chi^2(\mathbf{w}) | (10)
The minimum of the newly created cost function has to be found. The first term quantifies the divergence from the initial distribution, which should remain small, and the second term the deviation from the experiment, which should also be minimized:
optimize cost(w1, w2, …, wN) → min | (11)
The choice of θ is system specific and an expression of the quality of the initial distribution of weights. A large value of θ results in an optimization that stays very faithful to w0 and accepts more significant violations in the data. A value of θ close to zero leads to a better agreement with the experimental data but w0 is only of little relevance, which exposes the risk of overfitting.54
The second term of the cost function evaluates the error of the simulated observables Ocalc compared to the measured observables Oexp and closely resembles the χ2 (eqn (12)), except for a constant factor of one half. This constant is included in some implementations but not in others, which changes the scale of θ depending on the specific implementation. In both cases, the value of χ2 quantifies the error between the experiment and the simulation (weighted by wt).60
\chi^2(\mathbf{w}) = \sum_{i=1}^{M} \frac{\left( \langle O_i^{\mathrm{calc}} \rangle_{\mathbf{w}} - O_i^{\mathrm{exp}} \right)^2}{\sigma_i^2} | (12)
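To illustrate how eqn (10) and (12) can be combined in practice, the following sketch minimizes the cost function directly over the weights for synthetic data; the softmax parameterization used to keep the weights positive and normalized is merely one possible choice and does not correspond to any particular published implementation.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
N, M = 50, 3                                    # conformations, observables
o_calc = rng.normal(size=(N, M))                # hypothetical O_calc[t, i]
o_exp  = np.array([0.5, -0.2, 0.1])             # hypothetical experimental averages
sigma  = np.full(M, 0.1)                        # combined uncertainties sigma_i
w0     = np.full(N, 1.0 / N)                    # initial (uniform) weights
theta  = 1.0                                    # confidence in w0

def weights(g):
    """Softmax parameterization: keeps w positive and normalized (eqn (3))."""
    e = np.exp(g - g.max())
    return e / e.sum()

def cost(g):
    w = weights(g)
    s_rel = np.sum(w * np.log(w / w0))                     # eqn (7)
    chi2  = np.sum(((w @ o_calc - o_exp) / sigma) ** 2)    # eqn (12)
    return theta * s_rel + 0.5 * chi2                      # eqn (10)

res = minimize(cost, np.zeros(N), method="L-BFGS-B")
w_opt = weights(res.x)
print("refined ensemble averages:", w_opt @ o_calc)
```

Such a direct minimization over the N weights becomes expensive for large ensembles, which motivates the dual formulation discussed later.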
Eqn (12) may be adjusted if the measured observable is not a scalar with a specific value but a range of valid results. In the case of NOE analysis, the measured distance of a proton pair is described by a range of values enclosed by a lower and an upper bound.71 For reweighting, lower and upper bounds are treated independently as one-sided limits; implementations must therefore make sure that only violated bounds contribute to eqn (12).
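A hedged sketch of such a one-sided contribution is shown below; the distances, bounds and uncertainties are invented for illustration.

```python
import numpy as np

r_avg = np.array([3.2, 5.1, 6.4])   # r^-6 averaged distances (eqn (2)), in Angstrom
upper = np.array([3.5, 4.8, 6.0])   # NOE-derived upper bounds, in Angstrom
sigma = np.full(3, 0.3)             # assumed uncertainties

violation = np.maximum(r_avg - upper, 0.0)   # zero where the bound is satisfied
# for lower bounds, the analogous term would be np.maximum(lower - r_avg, 0.0)
chi2_noe = np.sum((violation / sigma) ** 2)
print(chi2_noe)                              # only violated bounds contribute
```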
In addition to the relative entropy (Srel, eqn (7), ref. 69) it is common to define two additional types of entropy that depend on one or two probability distributions (Q and P):72,73
S_{\mathrm{Shannon}}(Q) = -\sum_{x} Q(x) \ln Q(x) | (13)
S_{\mathrm{cross}}(Q, P) = -\sum_{x} Q(x) \ln P(x) | (14)
The maximum entropy method introduced by Jaynes74,75 allows one to find a probability distribution that is in agreement with external conditions while preserving maximal entropy given the conditions.76 The relative entropy can be interpreted as the information lost when using distribution Q(x) as an approximation of distribution P(x). If minimized, the distribution Q(x) can be assumed to be the distribution that meets all necessary conditions while requiring minimal additional information.77
The Shannon entropy (eqn (13)) reaches its maximum when the probability distribution is uniform.78 This property of the Shannon entropy explains why most methods in conformational ensemble reweighting that try to preserve the initial ensemble generated with MD are called maximum entropy methods. It can be shown (Appendix A.1) that the maximum entropy method can be a special case of the minimum relative entropy method if the weights w0 are uniform [ref. 79, pp. 291–292].
The relative entropy (also called Kullback–Leibler divergence, eqn (7)) is the difference between the cross-entropy (eqn (14)) and the Shannon entropy, and a measure to evaluate the similarity of two probability distributions. If both discrete probability distributions Q and P are equal, the relative entropy is zero. The relative entropy is positive and increases with diverging distributions Q and P [ref. 70, p. 90]. An important property of the KL-divergence is that it is not symmetric and fails to satisfy the triangle inequality, making it a divergence between the distributions and not a distance.80,81
From eqn (14) follows an alternative notation of the relative entropy:
S_{\mathrm{rel}}(Q, P) = D_{\mathrm{KL}}(Q \,\|\, P) = S_{\mathrm{cross}}(Q, P) - S_{\mathrm{Shannon}}(Q) | (15)
Due to the non-symmetry of the relative entropy a distinction between a forward case and a reversed case can be made (see ref. 80, pp. 71–74 and ref. 81–85). In the context of optimization methods, one of the two distributions is kept constant (P(x), reference distribution) while the other (Qv(x), approximated distribution) is being learned and therefore dependent on the optimization parameter.86
Forward KL-divergence (eqn (16))
D_{\mathrm{KL}}(P \,\|\, Q_v) = \sum_{x} P(x) \ln \frac{P(x)}{Q_v(x)} = S_{\mathrm{cross}}(P, Q_v) - S_{\mathrm{Shannon}}(P) | (16)
From eqn (16) it becomes apparent that the contribution of the Shannon entropy is independent of the variable distribution Qv(x) and does not influence the minimization of the relative entropy. Therefore, the minimization of the relative entropy in the forward formulation is equal to the minimization of the cross-entropy and is often referred to as the minimum cross entropy method in the literature.
It can be shown that the forward formulation of the KL-divergence is closely related to the maximum likelihood method if a uniform distribution P(x) is chosen (Appendix A.2).
Reversed KL-divergence (eqn (17))
D_{\mathrm{KL}}(Q_v \,\|\, P) = \sum_{x} Q_v(x) \ln \frac{Q_v(x)}{P(x)} = S_{\mathrm{cross}}(Q_v, P) - S_{\mathrm{Shannon}}(Q_v) | (17)
In contrast to the forward formulation, the contribution of the Shannon entropy to the relative entropy is variable and cannot be ignored when using the reversed KL-divergence as loss function.
In practice, differences become relevant when systems with a low number of independent parameters are optimized. Fig. 3 shows an example illustrating the influence of the chosen loss function on the fitted distribution. The bimodal reference distribution P(x) in blue is to be approximated by a single Gaussian optimised distribution, Qv(x). An optimization using the forward KL-divergence is referred to as mode-covering (inclusive) and leads to a single broad distribution. The reversed KL-divergence optimization is called a mode seeking (exclusive) approach and leads to the selection of a single signal in the reference distribution.83,84,86 The relation of the reverse formulation of the minimum relative entropy method to the maximum entropy method given a uniform target distribution is shown in Appendix A.1.
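The mode-covering and mode-seeking behaviour can be reproduced numerically in a few lines: a single Gaussian is fitted to a bimodal reference on a grid by minimizing either direction of the KL-divergence. The grid and distribution parameters below are arbitrary choices for illustration, not those used for Fig. 3.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

x = np.linspace(-10.0, 10.0, 2001)
dx = x[1] - x[0]
p = 0.5 * norm.pdf(x, -3.0, 1.0) + 0.5 * norm.pdf(x, 3.0, 1.0)
p /= np.sum(p * dx)                                  # bimodal reference P(x)

def q(params):                                       # single-Gaussian Q_v(x)
    mu, log_sigma = params
    qv = norm.pdf(x, mu, np.exp(log_sigma))
    return qv / np.sum(qv * dx)

def kl(a, b):                                        # discretized D_KL(a || b)
    a, b = np.maximum(a, 1e-300), np.maximum(b, 1e-300)
    return np.sum(a * np.log(a / b) * dx)

fwd = minimize(lambda v: kl(p, q(v)), x0=[0.0, 0.0])   # forward: mode-covering
rev = minimize(lambda v: kl(q(v), p), x0=[2.0, 0.0])   # reversed: mode-seeking
print("forward  KL fit: mu=%.2f sigma=%.2f" % (fwd.x[0], np.exp(fwd.x[1])))
print("reversed KL fit: mu=%.2f sigma=%.2f" % (rev.x[0], np.exp(rev.x[1])))
```

With these settings, the forward fit typically ends up as a broad Gaussian centred between the two modes, whereas the reversed fit locks onto a single mode.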
The directionality of KL-divergence based loss functions is an important theoretical consideration when designing algorithms in data science. While Fig. 3 shows an example specifically designed to present the directionality of the loss function, its effect during reweighting of ensembles is more subtle. Nevertheless, we see minor differences when optimizing the same data using the same strength of optimization θ. In our previous work (Stöckelmaier et al.87) we created a validation system for ensemble refinement using the small dialanine peptide. While a quantitative assessment of the algorithm presented here is beyond the scope of this work, we would like to refer to Appendix A.3, which shows the impact of the loss function directionality on Bayesian ensemble refinement. This and other comparisons in our previous work87 indicate that the effect of the directionality is not dramatic but noticeable when refining conformational ensembles.
An initial distribution of weights (w0) is typically available from MD. It may be uniform if the data comes straight from MD or non-uniform if the data is reduced by clustering the conformational ensemble or obtained from biased ensemble methods like replica exchange MD. Both cases can be treated with the minimum relative entropy method. If the initial ensemble has been reduced by clustering, each calculated observable representing the cluster should, by itself, be an ensemble-average of the cluster.88 To optimize the conformational ensemble, the set of weights wopt has to be found that minimizes the relative entropy S (eqn (18)) in reference to w0:
S_{\mathrm{rel}}(\mathbf{w}, \mathbf{w}^0) = \sum_{t=1}^{N} w_t \ln \frac{w_t}{w_t^0} | (18)
However, the minimization should be performed obeying two boundary conditions. The first represents the condition that the calculated and experimental ensemble averages of the observables should match:
\langle O_i^{\mathrm{calc}} \rangle = \sum_{t=1}^{N} w_t \, O_{i,t}^{\mathrm{calc}} = O_i^{\mathrm{exp}} \quad \text{for all } i = 1, \ldots, M | (19)
The second condition is a reformulation of eqn (3) and enforces that the updated probability distribution remains normalized:
\sum_{t=1}^{N} w_t = 1 | (20)
An optimization under the constraints given by eqn (19) and (20) can be performed using Lagrange multipliers, λi and μ. The sign in front of each condition term does not influence the solution.
\mathcal{L}(\mathbf{w}, \boldsymbol{\lambda}, \mu) = \sum_{t=1}^{N} w_t \ln \frac{w_t}{w_t^0} + \sum_{i=1}^{M} \lambda_i \left( \sum_{t=1}^{N} w_t \, O_{i,t}^{\mathrm{calc}} - O_i^{\mathrm{exp}} \right) + \mu \left( \sum_{t=1}^{N} w_t - 1 \right) | (21)
The partial derivative of eqn (21) with respect to each vector element wt is taken:
\frac{\partial \mathcal{L}}{\partial w_t} = \ln \frac{w_t}{w_t^0} + 1 + \sum_{i=1}^{M} \lambda_i \, O_{i,t}^{\mathrm{calc}} + \mu = 0 | (22)
This equation can be rearranged and we can define λ0 as:
λ0 := 1 + μ | (23) |
such that
\ln \frac{w_t}{w_t^0} = -\lambda_0 - \sum_{i=1}^{M} \lambda_i \, O_{i,t}^{\mathrm{calc}} | (24)
\frac{w_t}{w_t^0} = \exp\!\left( -\lambda_0 - \sum_{i=1}^{M} \lambda_i \, O_{i,t}^{\mathrm{calc}} \right) | (25)
w_t = w_t^0 \, e^{-\lambda_0} \exp\!\left( -\sum_{i=1}^{M} \lambda_i \, O_{i,t}^{\mathrm{calc}} \right) | (26)
The term e−λ0 should be interpreted as a normalization term. The value of e−λ0 can be obtained using condition (20), which leads to eqn (27):
e^{\lambda_0} = \sum_{t=1}^{N} w_t^0 \exp\!\left( -\sum_{i=1}^{M} \lambda_i \, O_{i,t}^{\mathrm{calc}} \right) | (27)
We define a partition function, Z:
Z(\boldsymbol{\lambda}) := \sum_{t=1}^{N} w_t^0 \exp\!\left( -\sum_{i=1}^{M} \lambda_i \, O_{i,t}^{\mathrm{calc}} \right) | (28)
e^{\lambda_0} = Z(\boldsymbol{\lambda}) \quad \Leftrightarrow \quad \lambda_0 = \ln Z(\boldsymbol{\lambda}) | (29)
Combining eqn (26) and (29) the reweighted probabilities can be calculated:
w_t^{\mathrm{opt}} = \frac{w_t^0}{Z(\boldsymbol{\lambda})} \exp\!\left( -\sum_{i=1}^{M} \lambda_i \, O_{i,t}^{\mathrm{calc}} \right) | (30)
Eqn (30) connects the optimal weights for the N conformations in the ensemble to the Lagrange multipliers, λi, for each of the M observables. This significantly reduces the dimensionality of the optimization problem, but solving an M-dimensional optimization problem still remains a difficult task. To calculate the vector λ it is possible to turn the problem into an easier optimization problem using the Lagrangian duality formalism. A solution is described in ref. 48, 89 and 90 and used in ref. 29, 91 and 92.
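As a small illustration of eqn (30), the following sketch maps a hypothetical vector of Lagrange multipliers onto normalized weights; the arrays are placeholders.

```python
import numpy as np

def weights_from_lambda(lam, w0, o_calc):
    """Evaluate eqn (30): w_t = w0_t * exp(-sum_i lam_i * O_calc[t, i]) / Z(lam)."""
    log_w = np.log(w0) - o_calc @ lam     # numerator in log-space
    log_w -= log_w.max()                  # for numerical stability
    w = np.exp(log_w)
    return w / w.sum()                    # division by the partition function Z, eqn (28)

rng = np.random.default_rng(0)
w0 = np.full(100, 1.0 / 100)              # uniform initial weights
o_calc = rng.normal(size=(100, 2))        # two hypothetical observables per conformer
lam = np.array([0.3, -0.1])               # hypothetical Lagrange multipliers
print(weights_from_lambda(lam, w0, o_calc).sum())   # normalized to one
```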
The concave Lagrangian dual Γ(λ,μ) is introduced as a function of the primal optimization problem, i.e. the Lagrangian ℒ(w,λ,μ) of eqn (21):
\Gamma(\boldsymbol{\lambda}, \mu) = \inf_{\mathbf{w}} \, \mathcal{L}(\mathbf{w}, \boldsymbol{\lambda}, \mu) | (31)
Remember that the vector wopt should fulfill conditions (19) and (20). Accordingly, for the optimal solution, the condition terms of eqn (21) become zero. To calculate the infimum defining the Lagrangian dual Γ(λ,μ), eqn (24) is substituted into the entropy term of eqn (21), which leads to eqn (32):
\Gamma(\boldsymbol{\lambda}) = \sum_{t=1}^{N} w_t \left( -\lambda_0 - \sum_{i=1}^{M} \lambda_i \, O_{i,t}^{\mathrm{calc}} \right) = -\lambda_0 - \sum_{i=1}^{M} \lambda_i \langle O_i^{\mathrm{calc}} \rangle | (32)
Replacing −λ0 with eqn (29) then leads to:
\Gamma(\boldsymbol{\lambda}) = -\ln Z(\boldsymbol{\lambda}) - \sum_{i=1}^{M} \lambda_i \langle O_i^{\mathrm{calc}} \rangle | (33)
From the initial condition (19) it is defined that 〈Ocalci〉 = Oexpi.
\Gamma(\boldsymbol{\lambda}) = -\ln Z(\boldsymbol{\lambda}) - \sum_{i=1}^{M} \lambda_i \, O_i^{\mathrm{exp}} | (34)
To determine the optimal Lagrangian multipliers, the maximum of the concave Lagrangian dual is determined (supλ Γ(λ)). The function Γ (eqn (34)) should be maximized without constraints.
Oexpi + 〈εi〉 = 〈Ocalci〉 | (35) |
Instead of the original condition (19), the error-corrected condition (35) can be used. Therefore, the modified Γ-function for optimization problems including errors is obtained:
\Gamma(\boldsymbol{\lambda}) = -\ln Z(\boldsymbol{\lambda}) - \sum_{i=1}^{M} \lambda_i \, O_i^{\mathrm{exp}} - \sum_{i=1}^{M} \lambda_i \langle \varepsilon_i \rangle | (36)
Cesari et al.29 further describe the methodology of treating a Gaussian shaped error with preassigned variance. The third term in eqn (36), describing the error, becomes:
-\sum_{i=1}^{M} \lambda_i \langle \varepsilon_i \rangle \;\rightarrow\; -\frac{1}{2} \sum_{i=1}^{M} \lambda_i^2 \, \sigma_i^2 | (37)
Finally, a proportionality constant θ is introduced, which defines the influence of the error εi on the optimization. A choice of a large θ indicates that larger errors are tolerated. If Gaussian shaped errors are assumed, eqn (38) should be maximized:
\Gamma(\boldsymbol{\lambda}) = -\ln Z(\boldsymbol{\lambda}) - \sum_{i=1}^{M} \lambda_i \, O_i^{\mathrm{exp}} - \frac{\theta}{2} \sum_{i=1}^{M} \lambda_i^2 \, \sigma_i^2 | (38)
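A compact sketch of this dual optimization is given below: −Γ(λ) from eqn (38) is minimized with a quasi-Newton optimizer for synthetic data. It only illustrates the structure of the equations and is not a reproduction of any published implementation such as BME.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(2)
N, M = 500, 4
o_calc = rng.normal(size=(N, M))                 # hypothetical O_calc[t, i]
o_exp  = rng.normal(scale=0.3, size=M)           # hypothetical experimental averages
sigma  = np.full(M, 0.2)                         # preassigned uncertainties sigma_i
w0     = np.full(N, 1.0 / N)                     # initial weights
theta  = 1.0

def neg_gamma(lam):
    """Negative of eqn (38); minimizing it maximizes Gamma(lambda)."""
    log_terms = np.log(w0) - o_calc @ lam
    log_z = np.logaddexp.reduce(log_terms)       # ln Z(lambda), eqn (28)
    return log_z + lam @ o_exp + 0.5 * theta * np.sum(lam**2 * sigma**2)

lam_opt = minimize(neg_gamma, np.zeros(M), method="L-BFGS-B").x

# back-substitution into eqn (30) gives the refined weights
log_w = np.log(w0) - o_calc @ lam_opt
w_opt = np.exp(log_w - np.logaddexp.reduce(log_w))
print("refined averages:", w_opt @ o_calc)
print("experimental    :", o_exp)
```

Note that only M Lagrange multipliers are optimized here, instead of the N weights of the direct formulation, which is the practical advantage of the dual approach.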
According to the definition of the forward KL-divergence, we define the relative entropy as
S_{\mathrm{rel}}^{\mathrm{fwd}}(\mathbf{w}, \mathbf{w}^0) = \sum_{t=1}^{N} w_t^0 \ln \frac{w_t^0}{w_t} | (39)
The Lagrangian function is set up similarly to eqn (21), but with the alternative entropy term. The partial derivative of the modified Lagrangian is taken and set to zero.
\frac{\partial \mathcal{L}}{\partial w_t} = -\frac{w_t^0}{w_t} + \sum_{i=1}^{M} \lambda_i \, O_{i,t}^{\mathrm{calc}} + \mu = 0 | (40)
Here, a significant difference to eqn (22) can be seen, as the fraction w0t/wt in the equation appears outside of a logarithm. The solution for the forward direction can still be formulated in terms of an optimization of the Lagrange multipliers (via eqn (41)), but solving the problem as described previously is difficult, as the 'normalization constant' μ cannot be calculated easily.
w_t = \frac{w_t^0}{\mu + \sum_{i=1}^{M} \lambda_i \, O_{i,t}^{\mathrm{calc}}} | (41)
In practice, the reverse formulation of the KL-divergence remains more accessible when using a Lagrangian solution strategy. It is the regular choice as loss function even though the mode-covering behavior of the forward case remains interesting for the optimization of molecular ensembles. Non-Lagrangian solution strategies to optimize ensembles using the forward case remain attractive and can be seen as a further area of research. As a basic solution, Bayesian ensemble refinement described in Section 3.1 can easily be modified to apply both the forward and the reversed KL-divergence.
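In the Bayesian formulation this switch amounts to exchanging the roles of w and w0 in the entropy term of the cost function, as the minimal sketch below illustrates (with arbitrary example weights):

```python
import numpy as np

def s_rel_reversed(w, w0):       # eqn (7)/(18): D_KL(w || w0), mode-seeking
    return np.sum(w * np.log(w / w0))

def s_rel_forward(w, w0):        # eqn (39): D_KL(w0 || w), mode-covering
    return np.sum(w0 * np.log(w0 / w))

w0 = np.full(4, 0.25)                        # uniform initial weights
w  = np.array([0.70, 0.10, 0.10, 0.10])      # hypothetical refined weights
print(s_rel_reversed(w, w0), s_rel_forward(w, w0))
```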
Bottaro et al.92 describe a five-fold cross-validation to estimate the optimal value of θ for their implementation of the reversed maximum entropy approach. The observables and the conformational ensemble are split into a training and a validation data set. The training set is used to calculate the optimized weights w, while the validation set uses these weights to calculate the relative χ2 improvement (χ2val/χ2val,0, where χ2val,0 is calculated using the initial weights w0) as a validation score. This process is repeated for a set of different θ values. If a set of weights improves the agreement between simulation and experiment not only with regard to the fitted observables, but also with regard to previously unknown ones from the validation set, a validation score below one is obtained. It may be interpreted as the ability to find a set of weights compatible with the prior information, the simulated and the experimental data that is likely an improvement over the initial set of weights. Conversely, a validation score above one may be interpreted as the inability to find a set of weights in agreement with prior information, simulated and experimental data that improves over the initial set of weights with regard to previously unknown observables; it may thus indicate overfitting of the data. A plot (Fig. 4) showing the relative χ2 improvement as a function of θ is used to tune the strength of the optimization. In the best case, a well-behaved curve with little uncertainty is obtained, indicating an ideal choice of θ at the minimum of the curve. In practice, the curve often shows substantial levels of noise and lacks an obvious minimum, but shows a steep increase of the relative χ2 improvement at low θ values. In this case, it may be reasonable to choose a value of θ just before the steep increase in slope manifests. To confirm the plausibility of the chosen θ, the resulting ensemble after reweighting should be checked manually to confirm that it remains plausible, both in size and conformations.
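A schematic version of such a θ-scan with cross-validation over the observables is sketched below; the synthetic data, the fold construction and the simplified reweighting routine are hypothetical stand-ins for the actual implementation of Bottaro et al.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(3)
N, M = 300, 10
o_calc = rng.normal(size=(N, M))        # hypothetical observables per conformer
o_exp  = rng.normal(scale=0.3, size=M)  # hypothetical experimental averages
sigma  = np.full(M, 0.2)
w0     = np.full(N, 1.0 / N)

def reweight(theta, idx):
    """Maximum-entropy reweighting against the observable subset idx (cf. eqn (38))."""
    oc, oe, s = o_calc[:, idx], o_exp[idx], sigma[idx]
    def neg_gamma(lam):
        logits = np.log(w0) - oc @ lam
        return np.logaddexp.reduce(logits) + lam @ oe + 0.5 * theta * np.sum(lam**2 * s**2)
    lam = minimize(neg_gamma, np.zeros(len(idx)), method="L-BFGS-B").x
    logits = np.log(w0) - oc @ lam
    return np.exp(logits - np.logaddexp.reduce(logits))

def chi2(w, idx):
    return np.sum(((w @ o_calc[:, idx] - o_exp[idx]) / sigma[idx]) ** 2)

folds = np.array_split(rng.permutation(M), 5)
for theta in [0.1, 1.0, 10.0, 100.0]:
    scores = []
    for val in folds:
        train = np.setdiff1d(np.arange(M), val)
        w = reweight(theta, train)
        scores.append(chi2(w, val) / chi2(w0, val))   # relative chi^2 improvement
    print(f"theta={theta:7.1f}  validation score={np.mean(scores):.3f}")
```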
Alternatively, it is also possible to tune the strength of the optimization such that the optimized ensemble evaluates to an error estimate of χ2 ≈ 1.60,93,94 A χ2 value of one quantifies that the average error of the ensemble is equal to the sum of the uncertainties from experiment and simulation. While this approach is straightforward at first glance, it assumes that the uncertainty from simulation and experimental measurement is additive and well characterized. In many practical applications, as in our recent work,87 both the uncertainty from simulation and experiment are guessed, making the absolute value of χ2 a reasonable indicator but difficult to use as a conclusive criterion.
In the last decade, numerous implementations of maximum entropy methods have been developed and applied. The theoretical foundation behind these methods is the well-established information theory of Claude Shannon. While the theory behind ensemble refinement is solid and well established, most methods work as black-box optimizers for many users. In this work, we focused on the foundation of the technique to promote a broader understanding of the methods, as we believe this is important to allow for a proper interpretation of the refined conformational ensembles. We want to emphasize that reweighting methods require well curated data, both simulated and experimental, as well as with regard to the prior weights. Poorly curated data used during the process of reweighting may lead to misleading results that are difficult to spot and may promote incorrect conclusions. In summary, however, it can be stated that reweighting works well if used carefully with well curated data. Maximum entropy methods have a solid theoretical foundation and promising properties to integrate simulated and experimental data, allowing new and exciting insights into molecular behavior.
D_{\mathrm{KL}}(Q_v \,\|\, P) = \sum_{x} Q_v(x) \ln \frac{Q_v(x)}{P(x)} | (42)
In the case of a uniform distribution P(x) this leads to:
D_{\mathrm{KL}}(Q_v \,\|\, P) = \sum_{x} Q_v(x) \ln\!\left( N \, Q_v(x) \right) | (43)
= \sum_{x} Q_v(x) \ln Q_v(x) + \ln N \sum_{x} Q_v(x) | (44)
= -S_{\mathrm{Shannon}}(Q_v) + \ln N | (45)
DKL(Qv‖P) = −(SShannon) + const | (46) |
As ln(N) is a constant, optimizing Qv with respect to minimizing the relative entropy is equivalent to maximizing the Shannon entropy if the target distribution is uniform. Thus, maximum entropy optimization is closely related to the minimum relative entropy optimization in its reversed formulation.
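The identity derived above can be verified numerically in a few lines, using an arbitrary test distribution:

```python
import numpy as np

q = np.array([0.5, 0.2, 0.2, 0.1])              # arbitrary normalized Q_v
N = len(q)
d_kl    = np.sum(q * np.log(q * N))             # D_KL(Q_v || uniform), eqn (43)
shannon = -np.sum(q * np.log(q))                # Shannon entropy, eqn (13)
print(np.isclose(d_kl, -shannon + np.log(N)))   # eqn (45): prints True
```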
D_{\mathrm{KL}}(P \,\|\, Q_v) = \sum_{x} P(x) \ln \frac{P(x)}{Q_v(x)} | (47)
In the case of a uniform distribution P(x) this leads to:
D_{\mathrm{KL}}(P \,\|\, Q_v) = \sum_{x} \frac{1}{N} \ln \frac{1/N}{Q_v(x)} | (48)
= \frac{1}{N} \sum_{x} \left( -\ln N - \ln Q_v(x) \right) | (49)
= -\ln N - \frac{1}{N} \sum_{x} \ln Q_v(x) | (50)
\arg\min_{Q_v} D_{\mathrm{KL}}(P \,\|\, Q_v) = \arg\min_{Q_v} \left( -\sum_{x} \ln Q_v(x) \right) | (51)
The first term of eqn (50) is constant. In consequence, the close relation between the relative entropy minimization in its forward formulation and the negative log-likelihood minimization is shown if a uniform distribution P(x) is chosen.
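Analogously, the relation between eqn (50) and the negative log-likelihood can be checked numerically with an arbitrary test distribution:

```python
import numpy as np

q = np.array([0.5, 0.2, 0.2, 0.1])              # arbitrary normalized Q_v
N = len(q)
p = np.full(N, 1.0 / N)                         # uniform reference P
d_kl = np.sum(p * np.log(p / q))                # forward D_KL(P || Q_v), eqn (47)
nll_per_point = -np.mean(np.log(q))             # negative log-likelihood per point
print(np.isclose(d_kl, -np.log(N) + nll_per_point))   # eqn (50): prints True
```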
Fig. 5 The loss function of Bayesian ensemble refinement allows for an easy implementation of both the forward and the reversed direction of the KL-divergence. Using the dialanine system with equipotential (uniform) initial weights as presented in ref. 87 to test ensemble refinement, the directional dependence of Bayesian ensemble refinement was tested. The top row shows the result of the reweighting using the forward (mode-covering) direction with four different values of θ tested. The second row shows the same system with the same θ-values tested using the reversed (mode-seeking) direction.
Column two (θ = 0.25) demonstrates the subtle differences between the directions. While in general the same regions get populated, the ensemble preservation of the forward (mode-covering) direction remains higher with (on average) lower weights in the preferred β-sheet region.