Relaxometry Models Compared with Bayesian Technique: Ganglioside Micelle Example

In this work a methodology to perform Bayesian model-comparison is developed and exemplified in the analysis of magnetic relaxation dispersion (NMRD) experiments of water in Ganglioside micelle system. The NMRD powerful probe of slow dynamics in complex liquids is obtained. There are many interesting systems to study with NMRD, such as ionic and Lyotropic liquids or electrolytes. However, to progress in the understanding of the physical chemistry of studied systems relatively detailed theoretical NMRD-models are required. To improve the models they need to be carefully compared, in addition to physico-chemical considerations of molecular and spin dynamics. The comparison is performed by computing the Bayesian evidence in terms of a thermodynamic integral solved with Markov chain Monte Carlo. The result leads to a conclusion of two micelle water sites, and rules out lower and higher complexity level, i.e., one and three sites. In contrast, and provided only with the quality of best fit, suggest a three site model. The two approximate selection tools, Akaike and Baysian information criterions, may lead to wrong conclusions compared to the the full integration. The methodology is expected to be one of several important tools in NMRD model development, however, is completely general and should find awakened use in many research areas. In this work a methodology to perform Bayesian model-comparison is developed and exempliﬁed in the analysis of magnetic relaxation dispersion (NMRD) experiments of water in Ganglioside micelle system. The NMRD powerful probe of slow dynamics in complex liquids is obtained. There are many interesting systems to study with NMRD, such as ionic and Lyotropic liquids or electrolytes. However, to progress in the understanding of the physical chemistry of studied systems relatively detailed theoretical NMRD-models are required. To improve the models they need to be carefully compared, in addition to physico-chemical considerations of molecular and spin dynamics. The comparison is performed by computing the Bayesian evidence in terms of a thermodynamic integral solved with Markov chain Monte Carlo. The result leads to a conclusion of two micelle water sites, and rules out lower and higher complexity level, i.e. , one and three sites. In contrast, and provided only with the quality of best ﬁt, suggest a three site model. The two approximate selection tools, Akaike and Baysian information criterions, may lead to wrong conclusions compared to the the full integration. The methodology is expected to be one of several important tools in NMRD model development, however, is completely general and should ﬁnd awakened use in many research areas.


Introduction
The measurement of relaxation rates of nuclear spins versus resonance frequency is typically denoted relaxometry and the recorded data forms a relaxation dispersion (NMRD) profile. The experiment typically cover kHz to MHz resonance frequency adds additional information to NMR spectroscopy (denotes NMR relaxation). In particular relaxometry can characterise the dynamics in a wide range from nano-to micro-second regime. 1 These properties opens for study of complex systems such as cement, petroleum fluids, biological tissue and collective motion of whole protein, 2 dynamics in liquid crystals. 3 To obtain more meaningful information from relaxation experiments, relaxation models are utilized to provide details of the mechanisms underlying spin relaxation. 4 Starting from the spinsystem under study relevant mechanism to include may involve dipole-dipole, chemical exchange, paramagnetic and quadrupole contributions. [5][6][7][8][9][10][11][12] The challenges in building models involves incorporation of consistent molecular dynamics processes to explain experimental observations and account for spin-dynamics at sufficient ac- curacy. [13][14][15] Hence, these challenges may come in the form of (i) multiple physical interpretations, (ii) more than one parametrization within a specific physical picture. By addressing (i) and (ii), the relaxometry-research community can strengthen the communication of results and have progress in model development for challenging systems.
The Bayesian model selection is expected to provide a powerful method to rank conclusions drawn from different models and thus navigate past the challenges. Hence, in light of the data, NMRD-profile and additional experiments such as NMRrelaxation, Diffusion and X-ray, we can determine the "best" model out of several competing proposals. The definition of "best" need to include more than minima of cost function like χ 2 . 16 Rather, it should focus on reproducibility and a balance quality of fit versus complexity of the model. 17 Model selection is expected to help navigate in the physics and chemistry underpinning the relaxation models. This goes beyond qualitative guide in model selection via the Ockham's razor, that may be formulated as "it is futile to do with more what can be done with fewer". 18 For the water study in this work, the proton dipole-dipole (DD)relaxation is the most important mechanism to model. Even for one mechanism, multiple relaxation pathways can prove important, such as intramolecular contribution 19 , intermolecular DD modulated by molecular diffusion, 20,21 and/or proton ex-change. 22 The vast majority of relaxometry studies merge relevant physics in an analytical model describing the dynamics of the system and lead to a relatively large set of adjustable parameters. 5 Ideally a signature NMRD feature can be used and strengthen the model interpretation with fewer parameters such as DD-relaxation enhancement. 11 Such pronounced features are explored for paramagnetic 10,15 super-paramagentic 23 and quadrupole spin-system. 24,25 In this work the methodology is provided for Bayesian model selection and tested on micelle NMRD data. 5 In particular the example (ii) above, i.e., the communication of result, why does the two water-site model give the best explanation of data and not a simpler or more complex model. An alternative hypothetical example, why did we model anisotropic rotational diffusion in spite this not being the dominating contribution? Hence, does the data support this more complex model?
The communication can thus be strengthened, especially towards researchers that are not experts in the field. Other research-fields such as astrophysics and cosmology has a long tradition of using and benefiting from these methods. 26,27 A deeper understanding of approximations in analytical models as well as a partial or complete reduction of adjustable parameters has been obtained with molecular dynamics simulations. In relaxometry studies this is achieved for paramagnetic relaxation enhancement, 28 rotation and translation of small molecules, 29 water at bilayer interface, 8 water in cement, 30 and ionic liquid, 31 lyotropic phases, 13,32 porous organic cages 7 and proteins. 33 The simulation studies opens for exploration of the validity of analytical models and for some systems serves as the method of choice. However, model comparison are not removed by using molecular dynamics simulations, instead questions such as what is the sufficient size and timespan as well as what forcefield or quantumchemistry method are sufficiently accurate. Hence, based on different types of simulations, what outcome is the best in describing NMR-relaxation/relaxometry. The estimation of model parameters and model comparison is centred around a cost function such as the χ 2 : where R T i (x) is the theoretical model, dependent on adjustable parameters x, R exp i are the experimental observations and σ i the standard deviation in experimental observation i. The adjustable parameters of the model (x) is of size k, i.e., correlation times, order parameters diffusion constant etc, dependent on the physics and chemistry of the system under study. In this format we may merge multiple relaxometry experiments at various temperatures, diffusion measurements and high-field relaxation data. 5,34 There are large set of tools proposed around cost functions to compare models, the reduced χ 2 , can be computed by dividing by N minus the model degrees of freedom and for instance overfitting can be proposed. However, in particular for nonlinear models we don't know how many degrees of freedom are present and the guess "N − k", where k are the number of parameters is not robust. 35 Hence, χ 2 -value does not allow for model comparison in general.
Two approximate routes: Akaike information criterion (AIC) and the Bayesian information criterion (BIC), both attempting to balance quality of fit with complexity of the model. These methods have as one assumption an asymptotic limit of N for a reduced error. 36 In current work these are tested and to have limited value in NMRD micelle study.
The cross-validation (analysing part of the data) or more generally the Bootstrap-sampling have the drawbacks with validity in the asymptotic limit 16,37 and for model comparison, only approximate form are provided. 36,38 Given the common situation of small data set (N) in NMR-relaxation and NMRD the bootstrap method will not be explored in current work.
In this work we follow instead Bayesian theory and compute the model evidence, that balance model-complexity, that will be discussed further, and the quality of fit. Thus, a quantitative tool corresponding to Ockham's razor is obtained. 17,36,39 The Bayesian evidence is a challenging high-dimensional integral to compute even with Monte Carlo methods. However, by introducing a thermodynamic integration the sough property is computed with multiple but vastly less demanding Markov chain Monte Carlo method (MCMC). 40 In this work we implement the thermodynamic integration and show that this is a computationally feasible route for relaxometry model comparison. The result is quantitative discrimination between models, finding suitable complexity of micelle water-sites, a property that is qualitatively discussed previously. 5 The structure of the paper present the relaxation model derived in previous work, 5 in section 2. In section 3, the relevant Bayesian theory is provided with discussion of model complexity. This is followed by aspects to consider in MCMC simulations and the code provided in section 4. Finally the NMRD example is analysed and presented in section 5, followed by a summarising conclusion in section 6.

RELAXATION MODEL
In this work we reexamine and follow closely the notation for the model presented in Ref 5 , and refer to this work for further model discussions. The experimental observation of longitudinal relaxation rate R exp 1 (ω 0 ) is the rate-of-change in the spin ensemble due to the influence of the environment at resonance frequency ω 0 . At relatively low fields the dominating relaxation pathway is the DD-mechanism where the relaxation matrix for pairwise DDcoupling is well documented in the literature. 4,22 The key to the relaxation study 5 is the formulation of a relevant physical picture, i.e. the two site water model depicted in figure 1. The two types of waters are denoted (β i, i=1,2) both located at the micelle. Reasons for more than one site may be due to the large Ganglioside head-group allowing for several water locations, leading to different exchange times (τ w,i ) and populations ( f β i ), but modulation by the same overall micelle rotation τ R . The non-bounded waters are denoted bulk water in figure 1. In addition there are 2-3% Ganglioside protons that are not addressed with the model. 5  Fig. 1 Scheme of two-site H 2 O relaxation model. Hence, two water sites at the micelle with populations f β 1 and f β 2 , in addition to the bulk water. Micelle overall rotation has a characteristic time τ R and waters has a residence times τ w,i (i=1,2).
The theoretical prediction (R T 1 (ω)) takes the form where α are the sum of field independent contributions. 5 The order parameters originates from expressing spectral densities within a two step model 41 with τ f denotes fast local dynamics not explicitly accounted for in final model, but enters the parameters α. The field dependent parts for the two waters are given by where W D denotes weight factor for inter-molecular proton interaction (second part of eq. 4), τ c,i represent an effective correlation time and β intra is the amplitude of intra-water proton interaction. The effective correlation time accounts for the combined processes of micelle reorientation and water exchange:

PROPOSED MODELS
The intra-molecular DD-amplitude may be determined from with proton distance in water r HH = 1.58 Å and vacuum permeability µ 0 , Planks constant per radians (h), gyromagnetic ratio of a proton (γ H ). However, the amplitude for inter-molecular proton interactions is not know and takes the effective form In this format we can not resolve if intra or inter are the dominating relaxation pathways and is left for future work.
Relaxometry experiments are less sensitive to some model aspects and it makes sense to reduce the number of determined parameters. Following ref 5 , the two combined parameters S 2 i f β i are estimated, together with τ c,i and α. Thus, two water site model contain five adjustable parameters per temperature 5 x = {α, S 2 i f β i , τ c,i }, (i=1, 2) and 15 in total. We can not tell in advance how many water sites there actually are, and more importantly, how many the NMRD experiments are sensitive to. This opens for a qualitative discussion a long the line of Occham's razor to motivate a suitable number of water sites, two in the original work. 5 The number of water sites, however, forms a transparent example where quantitative model comparison may be followed. Thus we consider the models M a , M b and M c with two, one and three number of water sites respectively, leading to number of adjustable parameters as: dim(x a ) = 15, dim(x b ) = 9 and dim(x c ) = 21. In addition the model M d , also a two-site model but exploring a reduced set of parameters (dim(x d ) = 10).
As pointed out, 5 modelling can be refined if we have complementary heavy water experiments. 42 Furthermore, classical molecular dynamics may propose individual values to the merged (S 2 i f β i ) parameter, 43 and quantum chemical molecular dynamics may suggest constraints on proton exchange times. High field T 1 measurement increases the spanned frequency range with one order of magnitude. Hence, the model comparison is evolving and needs to be open to new information.

Theory: Bayesian approach to model selection
To define the model selection, consider the probability of model M j in the light of available data R exp (in this work the NMRD profile) that is formulated with Bayesian theory. With the given (additional) information I (for instance known concentration of surfactants, temperature of experiments etc.) 40 the probability takes the form where left hand side reads "probability of model M j given experimental data (R exp )" and Z is a normalisation constant. Thus, two models can be compared (P(M a |R exp , I)/P(M b |R exp , I)) without knowing Z. The right hand side contain the prior probability P(M j |I), i.e., the probability given to the model before we have the data and the so called evidence p(R exp |M j , I). The evidence may be expanded as 40 and is thus an integral of likelihood function over the whole multidimensional parameter volume (V (x j )) where parameters are considered random variables with probability measure µ(x j ). Hence, to get the probability of model M j we need to integrate the likelihood over the whole weighted parameter space.

Model complexity
It is worthwhile to look at one of the asymptotic approximations 40 of eq. 8: , containing a single likelihood maxima (L), with the ratio of the two parameter volumes, "the (small) likelihood" and "parameter boundary" volumes of dimension k. Hence, the Bayesian evidence balance quality of fit L penalised by model complexity, here given by the number of parameters k and the volume-ratio they span. It is important to note that the boundaries defined by the researcher (priors) have influence on the evidence. If we introduce more parameter in model, the fit will improve but at some point the model complexity dominates. Hence, with additional information, that in this work could be obtained from surfactant relaxation study or a molecular dynamics simulation, a reduction of some correlation times interval [x max − x min ] is motivated and there is an increase in the evidence, or a lowering of the negative logarithm, provided the fit is still good.
In eq. 8, the complexity involves also dependence between parameters and transition between multiple maxima, and thus includes more than the volume ratio. For instance, rotational correlation times may be expected to have inverse temperature dependence and by incorporating this assumption in the model the complexity may go down. An aspect explored with the proposed models.

Thermodynamic integration
For many problems, the eq. 8 is a high-dimensional integral and is difficult to estimate with numerical methods. 17,36 A flavour of the numerical challenge is given in Figure 2 where the details of the structure of the likelihood function needs to be integrated. 17 By introducing a one-dimensional thermodynamic integral instead of attempting direct computation of eq. 8, a much less demanding problem is obtained. In fact, as exemplified, 40 this may reduce the required number of sampling configurations from 10 138 to 10 6 ! To compute eq. 8 as a thermodynamic integral (without introducing approximations) an auxiliary function is introduced function 17,40 : and make use of the derivate of ln This gives: where the specific likelihood function in this work (L(x j ) β ∝ exp(−β χ(x j ) 2 /2)) is used and the mean value · β is computed for the β -probability density The approximation involve the statistical uncertainty from finite number of MCMC configurations, described in section 4. Integrating eq. 10 over [0, 1] gives where ln[F(0)] is zero (by normalisation) and the natural logarithm of the desired probability is obtained.

Bayes factor
With two models proposed (M a , M b ) the eq. 11 enables calculation of Bayes factor 36 : hence, for models that can be assumed equally probable before data is analysed the B ab provides the posterior odds. Commonly Bayes-factor is discussed in terms of empirical Jeffreys scale for the strength of evidence against model b, given in Table 1. This is an empirically calibrated scale.

Approximate model selection criterions
There are a number of approximate model selection criteria and we will explore Akaike and Baysian information criterion, (AIC) and (BIC) respectivley: where L opt is the maximum likelihood. 17 For the case of gaussian likelihood used in this work (see eq. 10) further simplification is 2 ln(L opt ) = χ 2 opt + "constant independent of model". As for the full calculation provided in this work (see eq. 11) BIC and AIC penalise a model with poor fit (large χ 2 ) and excessive adjustable parameters, with expected O(N −1 ) error, of experimental data N. 36

Methods
We present the parameter estimation with MCMC followed by the procedure for thermodynamic integral (eq. 11), for details we refer to the Matlab code and "readme" file presented in the GitHub example. 44  and thus the standard deviation (σ i ) of experimental relaxation rates in eq. 1. The experimental relaxation rates in this work 5 where obtained by fitting a function to R exp data points, leading to a much too small standard deviation that were then calibrated with additional experiments.

MCMC parameter estimation
Before any work with model comparison the parametrization is needed. The ideas are implemented in Matlab/octave code 44 and the steps given in flow-chart of parameter estimation (see figure 3). The start (a) for one of the models M a , . . . , M d , the parameter prior (p(x j |M j , I)) is chosen, i.e., in this work is the uniform (uninformed) prior. For the user this involve setting the x min and x max boundaries (given under "range" in tables 2-4). The parameter prior is used in (b) to generating N test starting configurations (in the range 4-10 {x} N test ), where these parameter sets are random variables from the sought prior distribution.
Core of the MCMC is efficient sampling of configurations (c). For ease of introducing alternative prior distributions the current version, function MCMCstep.m, 44 follow Metropolis algorithm, 7,45,46 hence, the proposed new parameter configurations are sampled at random direction (not for instance guided by likelihood function). Each parameter is given a move individually (hence as a component of Gibbs sampling 47 ). The parameter moves are reflected at the boundaries (x min and x max ).
The acceptance ratio of steps should as a rule of thumb be ∼0.5 to efficiently probe lower probability regions but where higher value (shorter steps) leads to greater correlation between configurations. It is noted that more efficient sampling may be found, but these are require model specific implementation. 45 The step length can be optimised versus the measured step correlation to further improve sampling. The step correlations are computed from the saved configurations, i.e., normalised auto and cross-correlation functions 48 (see SEI and implementation in the code 44 ).
The calibration of step length followed by performing ∼ 10 5 MCMC iterations towards equilibrium (region of smallest χ 2 ) is denoted "burn in". The burn in is followed directly by attempt of productive MCMC-steps (∼ 10 6 ) and this procedure are typically not intervened by the user. At this point comes the main question of MCMC, have configurations reached equilibrium? This question is hard to address from a single configuration, but for a set of N test looking for consistency in χ 2 opt versus the spread (standard deviation of χ 2 ) are examined, see SEI. If some configurations remain in a local minima, proceed with annealing MCMC, thus corresponding to simulation corresponding to higher "temperature" (low β ) where longer steps are made and local minima more easily surpassed.
Obviously there are no global minimisation routine (MCMC or otherwise) capable of arbitrary complex model, thus if some very challenging problem occur, the user should also consider if actually relevant physics is sought.
With all the {x} N test configuration successfully completed (d) an additional set is typically computed up to and including the burn-in step, to verify that we have a reproducible global minima in χ 2 . This follows by x j -histogram, i.e. total density projected on variable x j that is integrated to give 95%-confidence intervals. The optimal parameters are in this work given by reported in combination with 95%-confidence interval numerically integrated from MCMC histogram data. For the considered models one calculation of MCMC take 1-3 hours on MacBook Pro 3.1 GHz intel processor. The implementation 44 in Matlab R2019b has one option to submit multiple runs to supercomputer (see next section). The same code also runs with freely available Octave. 49 What is found valuable in the provided implementation 44 is the ease by which calibration of MCMC step-length is made automatic with the univariate moves (Gibbs sampling 47 ). In addition, how random initial configurations converge by exploring a set of fixed annealing temperatures for the challenging cases. These cases may in particular appear for complex models that still are interesting to give a fair trial. The annealing enables completely random initial configurations to query if proposed model result is reproducible within the given parameter boundaries.
There are several MCMC programs and it is found that in Ref 50 (used in NMRD 51 ) a type of configuration move is used that is not transparent to implementation of annealing. Furthermore, initial configurations are proposed to be from a suitable staring point, 50 thus, the correlated set of configurations does unlikely explore the whole prior parameter volume as is done with annealing and random initial configurations.

Numerical thermodynamic integral
The numerical computation of thermodynamic integral in eq. 11 follow by taking the set of optimised configurations {x opt } N test following from MCMC-parameter optimisation in Figure 3 and collecting data at discrete β values. The discretization is found by starting from five β values with logarithmic discretisation on interval (0, 1), where it is noted we have the β = 1 value from the parametrization and the β = 0 case is most efficient to generate with independently sampled configurations from the prior parameter distribution.
Based on the integral computed with trapezoidal rule, 16 the interval with largest contribution is located and split in half and integral is recomputed. This procedure continue until difference between two subsequent integrals is less than the desired tolerance. The same discertization was then used for all the models with β -values provided in the github example. 44

RESULTS AND DISCUSSION
The main aim of this work is to provide numerical (MCMC) Bayesian model selection to the relaxometry community and exemplify how this can be used in the interpretation of NMRD profiles. The example considered is relaxometry study of water in Ganglioside Micelle, 5 where previous work consider a model with two water sites, this number (and not one, three or more pools) are deemed the feasible amount of information to extract from NMRD-profiles. The main question is if quantitative model selection can conclude how many pools is best with the given data.

Model parametrization
From the parametrization of he models (M a , M b , M c and M d ) following Figure 3 we can identify if any of the models provide exceptional challenges before continued work with model comparison. The first stage, the parametrization, with optimal fits (parameter inference) are provided in figure 5 and estimated parameters in tables 2-5, providing the parameter boundaries in bottom row (i.e. the prior assumptions).
Common for models M a to M d are that optimisation is performed globally over the 90 data points and the exploration of of weight of inter-DD contribution case D W =0.4. It was found 5 that model dependence of D W is weak and is not the focus of this work to explore further.
For model M a it is found that MCMC estimation provide within 95%-confidence interval, χ 2 opt =35 and largely the same parameters as the original work, 5 with data in table 2. The exception is for (S 2 f β 2 ), that is slightly larger from present MCMC-analysis.
Thus, the same physical interpretation 5 is summarised, that is, a slow correlation time τ c,1 dominated by overall rotational diffusion of Ganglioside micelle, that from Stoke-Einstein-Debye (SED) relation is consistent with the radius of 54 Å, concluded from measurements by other techniques. 5 The shorter correlation time, originating from τ w,2 (see figure1) is ascribed to the proton exchange-modulated reorientation where water protons exchange with labile surfactant protons. 5, 52 The difference in the size of confidence intervals are striking, for MCMC they are always larger with the difference for (S 2 2 f β 2 ) is a factor 50. 5 Considering steps to check the MCMC parameter estimation, there are 7 simulations starting from independent starting configurations within parameter boundaries (see range in table 2). By estimating the MCMC-parameter auto-correlation (τ steps ) provides a handle how well the parameter space is sampled and it is found that χ 2 , (S 2 2 f β 2 ) and (S 2 1 f β 1 ) are sampled with 4·10 4 , 2·10 2 and 1·10 3 correlation times respectively. This points to a sufficiently sampled parameter-space given that additional "burn-in" is separately performed (see SEI, figure S2).
From the computed cross-correlations (see SEI), a substantial amount is seen between (S 2 2 f β 2 )-(τ c,2 ) and (S 2 2 f β 2 )-α for 20 • C NMRD-profile and are illustrated with histograms in Figure 4. The parameter correlation explains the large error boundary found in MCMC estimation. Note that this in it self not a problem for the MCMC-validity, except it leads to a large parameter uncertainty. This is since the whole parameter space is still sampled provided sufficiently long MCMC simulation is performed. On the other hand, parameter correlations introduce an error in uncertainty estimation from χ 2 -values where parameter distribution is unknown and the parameters are assumed to be independent degrees of freedom of the model. 39 The parametrization of model M b is given in table 3 and figure 5 (χ 2 opt =673) assumes one water pool. The model provide a poor description of high field data (see figure 5), however, complexity of model is less with total of 9 adjustable parameters. The least sampled parameter is (S 2 1 f β 1 ) with 6700 correlation times within the MCMC run and auto-correlation (for each parameter in MCMC run) is in the same regime as cross-correlations. The correlation time τ c,1 is about 20% shorter than for model M a and corresponds to a micelle radius about 3 Å smaller with the correlation time interpreted in terms of SED.
For model M c , the parametrization is provided in table 4 and the best fit in figure 5 (χ 2 opt =12) where three water pools are assumed, is also the best of all models tested. The parameter boundaries (τ c,2 and τ c,3 ) are chosen relatively narrow (based on the M a result), in order for the study to be reproducible from random initial parameter configurations. The model M c has similar outcome as M a , where τ c,1 is longer (considering optimal fit values), however, considering the 95%-confidence intervals the two models provide essentially the same outcome. The confidence intervals are large for M c , exceptionally for the high-field contribution α and amplitude (S 2 3 f β 3 ) as well as correlation times τ c,2 and τ c,3 . Correlation between several parameter pairs are substantial, in particular τ c,i , α and (S 2 3 f β 3 ) and this is exemplified with histograms in SEI.
Based on what is learned from MCMC study of model M a we propose an alternative in M d , to explore a simpler model. First, from table 2 the confidence intervals for (S 2 2 f β 2 ) are overlapping and the parameter correlations, suggest that one temperature independent parameter can be more useful for (S 2 i f β i ). Secondly, discussion of τ c,1 is in terms of SED relation and can be incorporated explicitly and with a temperature independent micelle radius (R) and water residence time (τ w,1 ). In total 10 instead of 15 adjustable parameters in M d .
Clearly this alter to some extent the physics of the model and must be kept in mind, for instance, a model that explains additional experimental data well is desired. The model M d is first aimed to illustrate the computed Bayesian-evidence and it is left for future work to derive a less complex model with the relevant temperature dependence of (S 2 i f β i ). 5 The parametrization of model M d are given in table 5, with best fit in figure 5 (χ 2 opt =37.8). Estimated micelle radius from the SED relation explicitly included in the model is 48 Å and is similar to what is obtained from M a assuming τ c = τ R . The water residence time for site-1 (β 1) are larger than τ R (τ w,1 ∼ 10µs) with a large confidence interval. The amplitudes (S 2 i f β i ) are similar to model M a . However, in M d parameter cross-correlations (see histogram lower panels figure 4) are, together with parameter confidence intervals reduced compared to model M a .

Bayes evidence
Based on the fits in figure 5 and the χ 2 opt values it is difficult do decide how to proceed. Clearly model M b has a poor fit at higher frequency but still parameters are the fewest and make sense, hence, according to Ockham's razor 18 this could be the choice? Or it could be convincing with a good fit (model M c ), especial if the additional parameters are useful in a discussion of the physical chemistry of micelles?
At this stage the Bayes-evidence and corresponding factor (see eqs 11 and 12 respectively), provide valuable information. The ranking properties in relation to the original model M a with χ 2 opt , ln[B a j ] and − ln[p(R exp |M j , I)] are provided in table 6.
In figure 6 the Bayes-evidence for model M a and M b (two and one water pools respectively) are shown with the left panel displaying the integrand of eq. 11 and the right the cumulative integral where β = 1 is the sought value. From left panel it noted that for low-β the M b performs better, whereas at β close to 1 the model M a is performing better (see linear scale inserts in figure 6). It is noted that at low β the Likelihood function has almost no contribution in eq 10 and thus the integral of χ 2 (x) is expected to be small only in a few small volume elements obtain a smaller value due to the lower parameter volume of M b . Considering the whole β -interval the large χ 2 for M b (conventional poor fit) leads to a larger evidence integral for M b . Thus comparing the values (− ln[p(R exp |M j , I)] in table 6) the model M a is significantly better, evidence against M b (single water site) is very strong (with Jeffery's scale). The later can be stated provided the prior probabilities for the models are equal (see eq. 12).
With three water pools of model M c the lowest χ 2 opt (at 12) of the four models is reached. The computed evidence is provided in figure 7, where although a much better fit is achieved in terms of χ 2 opt , the Bayes-factor dictates very strong evidence against M c and the three water-pool model (following Jeffrey's scale, see table 1). This has it's origin in the increased complexity of model M c . Finally the evidence for alternative two-pool model M d is provided in figure 8, were although larger χ 2 opt compared with M a , the evidence supports model M d . Hence, an interesting observation where a model with reduced complexity is favoured, in spite of the worse quality fit. However, noting that the amplitudes are assumed temperature independent make the physics of the two models slightly different and care is needed not to over interpret the Bayes-factor, noting that a ranking assumes equal probability of the two models compared (see eq. 12).
The approximate model selection criterions listed in table 6 are the AIC and BIC. By comparing the magnitudes of − ln[p(R exp |M j , I)] with AIC and BIC it is noted that smallest AIC value is for M c thus wrong order in comparison with the complete integration. The BIC provide qualitatively the same order and relatively close for 2 ln[B ab ] (533±1 for Bayesian and 585 for BIC). Discrepancy is large for 2 ln[B ac ], (234±9 for Bayesian and 30 for BIC).
If Bayesian evidence are relatively close (not the case in this study) the property χ 2 (x) − χ 2 opt (see table 6) is proposed to be combined with Bayesian evidence for ranking. 27 The property eliminates quality of fit and summarise model complexity as the effective number of parameters that in this work is found larger than the actual number (from table 6). This reflect how much the data can constrain versus model prediction. 27 Hence, with additional parameter not constrained by the data, for example proposing multiple parameters instead of the single field independent α, this property should increase.
The components for the presented Bayesian model comparison are, (i) accurately finding the experimental variance in χ 2 (see eq. 1) here 5 obtained from calibrating NMRD output with multiple experiments, (ii) parametrization of each model with MCMC (see figure 3) where the choice of likelihood function is Gaussian and parameter-priors are set to uniform type (in practise setting a max and min value of each adjustable parameter in the model, correlation time etc), (iii) computing 10-30 MCMC to obtain the integrand of thermodynamic integral. The stage (i) is important to address carefully, without correctly weighted experimental data model comparison is not useful. The choices in (ii) of likelihood and priors are useful to enable comparison with conventional maximum likelihood methods. 5, 34 Clearly we do not know a priori how large MCMC simulations are required, making researchers sometimes stay away from this technique. However, provided are a set of tools to monitor convergence and reproducibly and from a moderate computational investment it is clear if it is worthwhile to complete the study. In particular with the option to follow annealing the experience so far is that MCMC can do well in parametrization of NMR-relaxation, relaxometry and line shape studies, 6,7,51,53 and should in the future play an important role in Bayesian evidence calculation.

Conclusions
In this work the methodology of Bayesian model comparison are provided and exemplified with nuclear magnetic relaxation dispersion (NMRD) data. The NMRD profile provides otherwise difficult to probe slow dynamics in complex materials and biological systems. However, to reach conclusive results requires a carful model development to capture the physics and chemistry of the studied system. Thus, Bayesian model comparison fills a gap with quantitative input on how many parameters and mechanisms are reasonable to include in a model based on the data we have. A practical question that should always occur when faced with data from a new system.
The computational approach follow a thermodynamic integration to compute the Bayesian model evidence from a set of MCMC simulations. The technique followed, 40 makes this challenging task feasible. It was found that this integral can be computed with only 10-30 relatively low cost MCMC simulations, in this work each completed at 1-2 hours, meaning that extensive model comparison can be performed with access to a workstation or clusternode.
The computed Bayesian evidence rules out the simplicity of one-and the complexity of three-water cites in favour of a twocite model (M a ) based on the current NMRD data. Such conclusion can not be quantitatively drawn based on a conventional least square fit (or MCMC parametrization). Furthermore, a twosite model (M d ) of less complexity than proposed is provided, interesting given the improved Bayesian evidence.
However, the represented physics in this model (M d ) with temperature independent amplitudes are different and illustrate a case where care must be held not to over-interpret the Bayesian evidence. The proposed model M d suggest a direction for future work on a suitable model that more accurately capture the relevant physics and chemistry. These models can can incorporate more experimental data at higher field and an other direction is to incorporate information from MD simulations and for-instance resolve the order parameter and population. 8 It is found that approximate tools to compare models can de-viate from the complete integral (AIC), or qualitatively follow (BIC), however the later at large discrepancy at logarithmic scale. Suggesting that MCMC simulation of Bayesian evidence will play important role in development of NMRD theory, however, completely general and can find extensive use in other areas of physical chemistry as well.

Conflicts of interest
"There are no conflicts to declare".  Table 4 M c , following the three-site assumption of ref. 5 Global MCMC parameter estimation with three temperature data (sample concentration 15mM). Total χ 2 is 12.0 and fit displayed in figure 5.

Variable histograms
The variable histograms are provided in Figure S3-S6 for models Ma , Mb , Mc and Md respectively.