Open Access Article
Daniil O.
Soloviev
and
Christopher A.
Hunter
*
Yusuf Hamied Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, UK. E-mail: herchelsmith.orgchem@ch.cam.ac.uk
First published on 29th August 2024
Musketeer is a powerful open-source software tool for the analysis of titration data, featuring a simple cross-platform graphical interface for importing data directly from UV-vis, fluorescence and NMR spectrometers, or from spreadsheets. The fast data analysis algorithm can be used to obtain equilibrium constants for simple binding isotherms, as well as for more complicated systems with multiple competing equilibria. Applications of Musketeer for the analysis of a range of different supramolecular and biomolecular systems are illustrated, including titrations with multiple spectroscopically active species, competitive binding assays, denaturation experiments, optimisation of concentrations as variables. The software also includes a number of tools that can be used to select the binding isotherm that represents the best model to describe a dataset.
Proprietary tools, such as HYPERQUAD and AFFINImeter, have seen widespread use, but there are limitations associated with cost, generality of applicability to instruments made by different manufacturers, and the range of different binding isotherms that can be analysed.3–5 Open-source software tools for analysing titration data, such as BindFit, SupraFit and SIVVU, have also been developed, both in the form of standalone software, or as plugins for commonly used proprietary programs, such as Mathematica and MATLAB, but these tools are often limited in the scope of experiments they support.6–10 A comprehensive review of the most popular tools circa 2012, and the strengths and limitations associated with each one, is provided by Thordarson.11
COGS and SPECFIT were among the first algorithms capable of handling arbitrarily complex speciation in the analysis of titration data for systems with multiple components and equilibria, but the original software is no longer supported.12–14 While some newer software tools implement the same speciation algorithms, these approaches rely on an iterative refinement of concentrations. As a result, the computational complexity scales poorly with the number of components in the system, and the functions are not guaranteed to be free from local minima. In addition, the implementation of complicated binding isotherms often requires the user to write their own code. Here we present a new open-source software tool for fitting titration data, Musketeer, which uses a new algorithm to rapidly calculate speciation for binding isotherms of arbitrary complexity. The software is controlled through an accessible cross-platform graphical interface that accepts data from any type of instrument or spectroscopic technique, provides a range of standard binding isotherms, and makes it straightforward for the user to implement more complicated models.
Next, the total concentration of each component present in solution for each addition is required. The concentrations can either be entered directly, or as stock solutions and addition volumes. Concentrations can also be entered as variables to be optimised using “?”. This function is important when working close to the tight-binding limit, where a very small error in concentration can have a very large effect on the quality of fit (see below). Optimisation of concentrations also allows for analysis of systems where not all of the concentrations are known. When dealing with multiple unknown variables, it can be beneficial to enter an initial guess for some variables, which can be done by entering a value prefixed with “∼”. The optimisation algorithm tends to converge quickly as long as initial guesses are within two orders of magnitude of the optimum value.
The last piece of data relating to the experiment is specification of whether the different species present in solution are in slow or fast exchange on the spectroscopic timescale.
Relationships between different equilibrium constants are entered by selecting the “Custom” option in the dropdown menu. For example, if a host has two independent binding sites with identical binding affinities for the same guest, the global association constants for the 1
:
1 and 1
:
2 complexes, K1 and K2, can be derived from a single variable, Kmicro, which corresponds to the microscopic association constant for a single binding event (eqn (1)).
![]() | (1) |
This case is sometimes referred to as the non-cooperative model,6,8 and the relationship is specified in Musketeer by using the “No cooperativity” option, which automatically sets up the non-cooperative model for any number of host binding sites. For more complicated relationships, the “Custom” option can be used to define any number of variables, such as microscopic binding constants, effective molarities, and statistical factors. Each global association constant is expressed as the corresponding statistical factor multiplied by the product of each variable raised to a specified exponent. There are additional options to constrain variables, such as equilibrium constants, to fixed values that have been measured in independent experiments.
The use of the tools contained in Musketeer is illustrated below using experimental data from UV/vis absorption, fluorescence, NMR titrations, displacement assays and denaturation experiments. These examples show how Musketeer can be used to establish the most appropriate binding isotherm that is justified based on the experimental data. The denaturation example demonstrates how Musketeer can be used to build complicated models to describe multiple competing equilibria, but without introducing large numbers of variables that would lead to overfitting.
:
1 complex formed by a host H and a guest G, the total absorbance A at any given wavelength is given by eqn (2).| A = εH[H] + εG[G] + εH·G[H·G] | (2) |
:
1 complex.
To simplify the fitting process, the free guest is often assumed to be spectroscopically silent (εG = 0).1,8,15 However, even in cases where the guest has no obvious chromophore, accounting for the absorbance due to free guest can be important, because high concentrations of guest may be required to reach saturation in the binding isotherm. An example is shown in Fig. 1.16 A UV/vis absorption titration was carried out using an amide as the host and tri-n-butylphosphine oxide as the guest. Fig. 1c shows the fit of a 1
:
1 binding isotherm to the titration data, assuming that the guest does not absorb. Although this isotherm appears to provide a reasonable description of the blue data, the experimental points and the calculated line clearly diverge at high guest concentrations for the orange data. This result means that there is at least one other species that contributes to the observed UV/vis absorption spectra.
![]() | ||
Fig. 1 A UV/vis absorption titration (0.047 mM host in n-octane at 298 K).16 (a) Chemical structures. (b) Overlay of UV/vis absorption spectra. The initial spectrum is shown in black, the final spectrum in red. (c) Best fit to a 1 : 1 model allowing for absorption of the host and 1 : 1 complex only (K1 = 1120 M−1, RMSE = 7.8 × 10−3 AU). (d) Best fit to a 1 : 2 model allowing for absorption of the host, 1 : 1 complex, and 1 : 2 complex (K1 = 3800 M−1, K2 = 3.50 × 10−8 M−2, RMSE = 9.6 × 10−4 AU). (e) Best fit to a 1 : 1 model allowing for absorption of the host, free guest, and 1 : 1 complex (K1 = 3800 M−1, RMSE = 9.6 × 10−4 AU). Three plots are shown for each model: comparison of experimental absorbances at selected wavelengths (points) and the calculated values (lines), the calculated populations of species that contain host, and the calculated contributions to the endpoint spectrum (grey). R = 2-ethylhexyl. | ||
It is possible that there is a weak second binding event that leads to small amounts of a 1
:
2 complex at high guest concentrations. However, at high guest concentrations, the absorbance increased at all wavelengths, which suggests that absorption due to free guest may be the cause. Any number of spectroscopically active species can be considered in Musketeer, and Fig. 1d and e show the fits obtained by including either free guest or a 1
:
2 complex as the additional species. Both models provide a significantly better description of the Experimental data, reducing the root mean square error (RMSE) between the experimental and calculated spectra by an order of magnitude. It is worth noting that even though the simple 1
:
1 isotherm appears to provide a reasonable description of the blue data points in Fig. 1c, the association constant obtained from this fit (1120 M−1) deviates significantly from the value obtained in the other two models (3800 M−1).
:
2 complex is less than 1 part per million, because the association constant for formation the 1
:
2 complex is extremely low (K2 = 3.50 × 10−8 M−2). However, the calculated spectra show a substantial absorption for the 1
:
2 complex, which would require a molar absorption coefficient that is many orders of magnitude higher than the value for the free host or the 1
:
1 complex. This observation suggests that although the model achieves a good fit of the experimental data with a lower RMSE than the simple 1
:
1 model (9.6 × 10−4 AU compared with 7.8 × 10−3 AU), this result is an artefact of a very large calculated absorbance for a complex that is not formed to any extent in the experiment.
In contrast, the calculated spectra for the model that allows for guest absorption in Fig. 1e show that the increase in absorption observed for the orange data points comes from the tail of a very weak absorption at low wavelengths due to free guest, which is present at very high concentrations at the end of the titration. We conclude that the model used in Fig. 1e provides the best representation of this system. This example highlights the danger of overfitting by using too many variables. The model used in Fig. 1e produces a good fit with one additional variable (εG). The model used in Fig. 1d has two additional variables (K2 and εH·G2), and the interplay of these variables produces a good fit, but only with values that are not physically reasonable.
:
2 complex in the model improved the fit shown in Fig. 1d and including guest absorption improved the fit shown in Fig. 1e, so one might be tempted to use a model that includes both guest absorption and the 1
:
2 complex. Fig. 2 shows the resulting relationship between the RMSE and the value of K1. The blue line shows the RMSE plot for K1 obtained using the model that includes the 1
:
1 complex and guest absorption only (cf.Fig. 1e). The orange line shows the RMSE plot for K1 obtained using the more complicated model that includes the 1
:
1 complex, the 1
:
2 complex and guest absorption. The minimum in RMSE is slightly lower for the more complicated model, as expected, but it is clear that the value of K1 cannot be reliably established using this model. The reason is that there are compensating changes in the other variables, such that any value of K1 greater than 3 × 103 M−1 gives a similar RMSE. In contrast, there is a well-defined minimum in the RMSE plot for the simpler model, which shows that an optimal value of K1 can be accurately identified using this model.
![]() | ||
Fig. 2 Relationship between the RMSE and the value of K1 for the UV/vis titration data in Fig. 1 fitted using a model allowing for absorption of the host, free guest, and 1 : 1 complex (blue) or a model allowing for absorption of the host, free guest, 1 : 1 complex and 1 : 2 complex (orange). | ||
:
1 binding isotherm. Visual inspection suggests that this model provides a good description of the data, but replotting the result in a normalised format based on the total change in chemical shift for each signal reveals something different. The green ArH2 data clearly follow a different pattern from the other two signals, which indicates that an additional species is present.
![]() | ||
Fig. 3 A 1H NMR titration (0.208 mM host in n-octane at 298 K).21 (a) Chemical structures, showing the labelling of the host protons. (b) Best fit to a 1 : 1 binding isotherm (K = 293 M−1, RMSE = 5.15 × 10−3 ppm). (c) Best fit to a 1 : 2 binding isotherm (K1 = 328 M−1, K2 = 34 800 M−2, RMSE = 1.57 × 10−3 ppm). Three plots are shown for each model: experimental changes in chemical shift, Δδ (points), and calculated values (lines), normalised changes in chemical shift, and the populations of species that contain host. R = 3-heptyl. | ||
Fig. 3c shows the corresponding plots for a fit to a 1
:
2 binding isotherm. This model clearly accounts for the difference in the behaviour of the three signals and represents a better description of the experimental data. The poor fit of the green data points is masked in the plot of the raw data in Fig. 3b, because the change in chemical shift of this signal is much smaller than the other two signals. However, the green signal is much more sensitive to the second binding event than the other two signals and therefore contains important information for evaluating the different models. Although the association constant obtained from the 1
:
1 model is similar to the value obtained from the 1
:
2 model, the error introduced by using a 1
:
1 binding isotherm leads to an underestimate of the value (293 M−1 compared with 328 M−1).
:
2 binding isotherm. At first sight, the quality of the fit appears reasonable, but closer inspection of the data near the turning point at one equivalent of guest reveals a significant discrepancy. For titrations carried out near the tight binding limit, most of the data points lie on two straight lines, and it is only the curvature of the data points near the turning point that contains information about the association constant. The quality of the fit in this region is therefore critical in these cases.
The fit in Fig. 4b was obtained using fixed concentrations of 1.11 mM for the host stock solution and 25.0 mM for the guest stock, which were determined from the weights of host and guest, and the volume of solvent used in the experiment. The fit in Fig. 4c was obtained by allowing the concentration of the host stock solution to be optimised as a variable. In this case, an excellent fit was obtained at the turning point by using an optimised value of 0.95 mM for the host concentration, which is close to but different from the experimental value. Fig. 4d shows that a similarly good fit can be obtained by allowing the concentration of the guest stock solution to be optimised as a variable instead. The optimised value of 29.1 mM for the guest stock solution is 16% higher than the experimental value, and this percentage increase is the same as the percentage decrease in the optimised value for the host stock solution in the fit in Fig. 4b. In other words, a titration carried out in the tight binding limit accurately identifies the relative values of the host and guest stock solutions, but an error of 16% in either the host solution or the guest solution, or an error of 8% in both, could equally well account for the data.
The association constant obtained from the fit in Fig. 4b (K1 = 7.75 × 104 M−1) is quite different from the values obtained using optimised concentrations (K1 = 2.76 × 104 M−1 or 2.45 × 104 M−1 by optimising the host concentration or guest concentration respectively). The variation in association constant obtained by optimising the host or guest concentration (2.6 ± 0.2 × 104 M−1) is comparable to the errors obtained from multiple repeats of an NMR titration, whereas the error introduced by not optimising concentration is much larger. The result illustrated in Fig. 4 is a general feature of titrations carried out close to the tight binding limit. An accurate value of the equilibrium constant can only be obtained if the concentration of either the host or guest stock solution is optimised as a variable and a sufficient number of data points is collected in the region of the turning point where there is curvature. Sometimes titrations are intentionally performed close to the tight binding limit in order to verify the stoichiometry of a complex. If the host has N guest binding sites, in the tight binding limit the binding isotherm will plateau when the total concentration of guest is N times the total concentration of host, allowing N to be determined directly from concentration of guest at the turning point. These conditions are widely used for isothermal titration calorimetry (ITC), where a host concentration in the range of 10/K to 500/K is recommended. In ITC, it is conventional to optimise N rather than the concentration of one of the stock solutions, so that discrepancies in the concentrations of the host and guest stock solutions are absorbed in a non-integer value of N.22 In Musketeer, the same result is achieved by allowing the concentration of the host stock solution to be optimised as a variable.
![]() | ||
| Fig. 5 A fluorescence displacement assay showing titration of guest B (E570) into a mixture of host (500 nM Aβ42 fibrils) and guest A (1 μM ThT) in pH 7.4 PBS buffer at 298 K (λex = 440 nm, λem = 483 nm).24 (a) Cartoon representation of the partial displacement model. (b) Best fit to a full displacement model (KB = 4.8 × 106 M−1, [sites] = 52.4 pM, RMSE = 4.72 AU). (c) Best fit to a partial displacement model (KB = 2.3 × 107 M−1, [sites] = 81.3 pM, B sites = 66%, RMSE = 1.36 AU). Two plots are shown for each model: experimental fluorescence intensity, I (points), and calculated values (lines), and the populations of host species. | ||
Fig. 5c shows the fit obtained using a partial displacement model, which assumes that there are two different binding sites on the fibrils, one of which binds both ThT and E570, and one that binds only ThT. Although this model requires the relative proportions of the two binding sites as an additional variable, visual inspection of the quality of fit confirms that this model gives a significantly better description of the data.
![]() | ||
| Fig. 6 A 31P NMR denaturation experiment showing addition of DMSO to a mixture of two oligomers, ADA and DAD, in chloroform at 298 K.25 (a) Structure of the H-bonded duplex formed by ADA and DAD. (b) Cartoon representation of the species present in a three-state denaturation model. (c) Best fit to the three state denaturation model in (b) (KADA·DAD = 1,130 M−1, RMSE = 4.10 × 10−2 ppm). (d) Best fit to the more complicated model illustrated in Fig. 7 and 8 (KADA·DAD = 2150 M−1, RMSE = 1.10 × 10−2 ppm). Two plots are shown for each model: experimental difference in chemical shift compared with a reference phosphine oxide at the same concentration of DMSO, Δδ (points), and calculated values (lines), and the populations of species that contain ADA. R = CO2iBu. | ||
Fig. 7 shows that there are ten different stoichiometric species that could be present in this system (degenerate binding modes are shown for each stoichiometry), and a different equilibrium constant and chemical shift is required to describe each one. Although some of these parameters can be independently determined, there are too many unknowns for any fit of the denaturation data to be reliable. This problem can be solved be making some assumptions about relationships between the parameters to dramatically reduce the number of variables without reducing the complexity of the model. Fig. 6 shows that the equilibrium constants required to describe the ten different stoichiometric species can be expressed in terms of six different equilibrium constants, five of which can be determined experimentally, leaving just one variable, the ADA·DAD association constant KADA·DAD. The association constants for duplexes that make one and two H-bonds (K1 and K2) were measured by 31P NMR titration experiments using shorter oligomers, and the association constants for ADA2 and DAD2 were measured by 31P NMR dilution experiments.
A similar approach was used to reduce the number of variables required to describe the chemical shifts of the ten different stoichiometric species. The chemical shifts of ADA2 and DAD2 were measured directly by 31P NMR dilution experiments, and the chemical shifts of all other species were treated as the population-weighted average of signals due to free and bound phosphine oxide groups. Fig. 8 shows how the chemical shifts of all species involving ADA are described using just two variables, δf and δb. Similar relationships can be written for the species involving DAD using two analogous variables for the chemical shifts. Fig. 6d shows the fit to the DMSO denaturation data that was obtained using this model. The quality of the fit in Fig. 6d is significantly better than the fit in Fig. 6c, even though the same number of variables were optimised: the free and bound phosphine oxide chemical shifts, and the association constant for duplex formation.
:
1 binding isotherm with a spectroscopically silent guest involves 601 variables: the equilibrium constant, and the free and bound molar absorption coefficients at each wavelength. If these variables are optimised simultaneously, fitting will take a long time, and there is a high risk that the result will be a local minimum rather than the optimal values for all variables. To increase the speed of fitting and avoid local minima, we first separate the linear and nonlinear variables. Unknown total concentrations of the components and equilibrium constants are nonlinear variables. However, given the values of those variables, the concentrations of all species present at each addition can be calculated (see speciation algorithm below), and from there the concentrations of all spectroscopically active states are obtained by a simple linear transformation. The observed signal is then given by| Y = AX | (3) |
Given Y and A, the exact solution for the linear variables X can quickly be found using linear regression. By separating the variables this way, the fitting can be reformulated as a bilevel optimisation problem. The objective function to be optimised depends only on the nonlinear variables. For each input value, the objective function calculates A, solves for X, and returns the RMSE of the solution. A nonlinear optimisation algorithm can then be used to find the values for the nonlinear variables that return the smallest RMSE. In Musketeer, the Nelder–Mead method26 is used for the nonlinear optimisation, as implemented in the SciPy package.27
:
1 complexes or polymers of a single component, closed-form solutions can easily be found. However, for more complicated models with multiple competing equilibria, an exact solution usually requires finding the roots of a high order polynomial, and deriving the precise form of this polynomial may not be computationally feasible. Instead, it is usually quicker to solve the speciation for a complicated isotherm numerically to the desired precision. The speciation algorithm used by Musketeer is described below, and the matrix notation is explained in Table 1 using formation of a 1
:
2 complex as an example.
The speciation algorithm must determine s and c, given t, β, and M. Mass balance means that the total concentration of each component is equal to the concentration of the free component plus the concentration of each complex multiplied by the stoichiometric coefficient of the component in that complex. This gives the following constraint:
| t = s + Mc | (4) |
The concentration of each complex cj is given by the corresponding global equilibrium constant multiplied by the product of the concentration of each component raised to the power of the stoichiometric coefficient:
![]() | (5) |
Substituting eqn (5) into (4) gives the following set of constraints f = 0:
![]() | (6) |
Solving for the value of s that satisfies all constraints in f = 0 will give the concentrations of all free components at equilibrium, and the concentrations of all complexes can then be calculated using eqn (5). Rather than trying to solve all constraints simultaneously, the process can be simplified by first noting that
![]() | (7) |
Eqn (7) shows that the set of constraints f can be expressed as the partial derivatives of a single multivariate function, F(s), which is defined in eqn (8) (the corresponding function for polymeric species is described in the ESI†).
![]() | (8) |
Therefore, satisfying all constraints f = 0 is equivalent to solving for ∇F(s) = 0, i.e. finding the minimum of F(s). Since there is only one set of concentrations at which a system will be at equilibrium, F(s) has no local minima, and so a numerical optimisation method can be used to find the minimum. By calculating appropriate boundary conditions to ensure numerical stability, this function can be guaranteed to converge to any desired precision for any system (see ESI†). In Musketeer, the fastest results were obtained by using the L-BFGS-B algorithm28 as implemented in the SciPy package.27
Footnote |
| † Electronic supplementary information (ESI) available: The software is available from the GitHub repository, https://github.com/daniilS/Musketeer/releases. To install Musketeer on macOS or Windows computers download and run the relevant installer from the GitHub repository. For computers that already have Python installed, the musketeer package can be installed from PyPI using “pip install”. CSV files containing the data used in the examples described in the text and the corresponding.fit files are available in the ESI material. A step by step guide of how to implement a complicated model in Musketeer is also provided for the denaturation analysis shown in Fig. 6–8, along with full details of the Musketeer algorithm. See DOI: https://doi.org/10.1039/d4sc03354j |
| This journal is © The Royal Society of Chemistry 2024 |