Nick H. J.
Geertjens‡
a,
Pim J.
de Vink‡
a,
Tim
Wezeman
a,
Albert J.
Markvoort
*b and
Luc
Brunsveld
*a
aLaboratory of Chemical Biology, Department of Biomedical Engineering and Institute for Complex Molecular Systems, Eindhoven University of Technology, Den Dolech 2, Eindhoven, 5612 AZ, The Netherlands. E-mail: l.brunsveld@tue.nl
bComputational Biology Group, Department of Biomedical Engineering and Institute for Complex Molecular Systems, Eindhoven University of Technology, P.O. Box 513, Eindhoven, 5600 MB, The Netherlands. E-mail: A.J.Markvoort@tue.nl
First published on 18th January 2023
Mathematical modelling of molecular systems can be extremely helpful in elucidating complex phenomena in (bio)chemistry. However, equilibrium conditions in systems consisting of more than two components, such as for molecular glues bound to two proteins, can typically not be analytically determined without assumptions and (semi-)numerical models are not trivial to derive by the non-expert. Here we present a framework for equilibrium models, geared towards molecular glues and other contemporary multicomponent chemical biology challenges. The framework utilizes a general derivation method capable of generating custom mass-balance models for equilibrium conditions of complex molecular systems, based on the simple, reversible biomolecular reactions describing these systems. Several chemical biology concepts are revisited via the framework to demonstrate the simplicity, generality and validity of the approach. The ease of use of the framework and the ability to both analyze systems and gain additional insights in the underlying parameters driving equilibria formation strongly aids the analysis and understanding of biomolecular systems. New directions for research and analysis are brought forward based on the model formation and system and parameter analysis. This conceptual framework severely reduces the time and expertise requirements which currently impede the broad integration of such valuable equilibrium models into molecular glue development and chemical biology research.
The solution to the equilibrium distribution of a two-component biomolecular system, with one-to-one binding, goes as far back as the Langmuir adsorption model, however more complicated systems consisting of three or more components can only be solved analytically in certain specific cases14,15 or after additional assumptions.16 The ternary body problem is mathematically unsolvable without approximation or assumption of the free concentration of one of the components.17 It has therefore gotten more common to develop custom-made mathematical models for specific types of biomolecular systems based on a combination of analytical and numerical solutions.17–20 Such equilibrium models have also been used for analysis of combinatorial libraries,21 macromolecular reactions within living cells,22 supramolecular copolymerization,23 and the design of synthetic signalling pathways.24 However, the creation of a new model for each unique system takes expertise and a non-trivial amount of time. The widespread integration of these models into chemical biology is thus impeded, even though a trove of information can be extracted from them. Increasing the accessibility of numerical equilibrium models will greatly aid the study of complex biomolecular systems.
Here we present a framework for multi-equilibrium models as a general approach to generating a custom model for any biomolecular system based on the simple, reversible reactions that constitute that system. The framework automatically determines the relations between the species concentrations at equilibrium and combines these with mass balance equations to analytically establish a system of coupled expressions for the equilibrium concentrations that are subsequently solved numerically, without the need to rely on assumptions, simplifications, and/or the need of computationally expensive calculations of reaction kinetics. In addition, the framework facilitates standardized methods for system and parameter analysis. This general approach severely reduces both the expertise and time required to construct new computational models. The framework was designed to especially cater to the experimental (laboratory) chemical biologist, who is possibly less accustomed to programming and modelling techniques. The manuscript starts with a description of the general derivation method. Next, this is used to revisit three chemical biology systems from recent literature25,26 to illustrate the simplicity and generality of the framework while providing models of the same quality as models created specifically for the cases in question, but also providing novel scientific insights in the underlying biomolecular interaction network. The complexity of the three systems is step-wise increased from a simple one-to-one protein–protein interaction via molecular glues for stabilization of protein–protein interactions26 to highly complex, multicomponent homogeneous immunoassays.25 The obtained parameters are compared to the original findings and the chemical biology systems at hand are analysed in more depth using methods from the framework, providing valuable insights and new directions for research. Installation instructions and complete protocols for the framework, as well as detailed tutorial descriptions to follow along with each of the described chemical biology cases are available in the ESI.†
For each single component within the system a mass balance equation is set up, stating that the overall concentration of that component should be equal to the equilibrium concentration of that component in uncomplexed form plus the sum of the equilibrium concentrations of all complexes multiplied by the number of times that component is present therein. Substituting the equilibrium expressions of the complexes determined earlier in these mass-balance equations results in a system of n equations with the n free component concentrations as the only unknowns. For given overall concentrations and equilibrium constants, these equilibrium equations can then be solved numerically by the model solver. All steps are performed autonomously based on the entered reversible reactions. A comprehensive protocol for each step in the framework (Fig. 1) is available in ESI† Appendix, Section 1.
Once a custom model has been generated, not only can it be solved for fixed concentrations and equilibrium constants, but also various analyses can be performed in the framework to gain insight into the biomolecular system at hand. These analyses include plotting the concentrations of all species in the system as a function of component concentrations for given equilibrium constants and parameter sensitivity analysis. Similarly, the framework allows predicting of equilibria upon tuning of equilibrium constants and concentrations, thus aiding in assay development.
The framework can also determine unknown parameters by fitting experimental data using the model solver. For this, a data function has been added that relates concentrations of one or more species to an experimental observable. The data function is specified in the model builder and is included in the custom model. The experimental data values must be a direct function of one or more species concentrations.13,32 Experiments where this is the case often include titrations and we will use the related terminology in the given examples. Experimental conditions in which the data was obtained can be defined with help of the config generator. Detailed instructions and the expected data format are described in the framework protocol (ESI† Appendix, Section 1). Using initial guesses for the unknown parameters, the model solver can then numerically calculate the equilibrium concentrations of all species based on the specified total concentrations for each data point. Subsequently, the values of the unknown parameters are iteratively adjusted to best fit the entire experimental dataset using a numerical least-squares optimization. The framework also offers various analyses for the critical assessment of the correctness of the chosen system description, i.e., the specified equilibrium reaction(s), and of the quality of the determined parameter estimates for that model, including model prediction plots, mean-squared-error landscapes and confidence intervals determined using the bootstrap method. A complete overview of all analyses is available in ESI† Appendix, Section 2 and the most important ones are illustrated in the following case studies.
The PPI example system consists of a single reversible reaction and the corresponding system description is depicted in Fig. 2A. To create a custom model for this PPI, the reaction was entered in the form ‘complexing species = complex; dissociation constant’. From this, the equilibrium equations in Fig. 2B are automatically generated and simplified, such that the final equations only consist of component concentrations and equilibrium constants. The ESI† contains an overview with the exact input values for each step to model the system discussed here, which can be used as a tutorial reference (ESI† Appendix, Section 3).
Fig. 2 Application of the framework on the PPI system. (A) System description, each 14-3-3 protein (R) can complex with one partner (P) protein (TASK3). The model builder input corresponding to this reversible reaction is displayed beneath the cartoon. (B) The equilibrium equations determined for this system. (C) Fluorescence anisotropy data collected for this system26 (dots are the averages of the experimental technical repeats) in the presence of 10 nM P (labeled component) together with the model fit (line). |
Parameter fitting and additional analysis is performed from the framework main script, which offers a simplified interface to the underlying functions. It is also possible to directly execute the functions in the framework for advanced customization if desired. For this simple PPI system, only the model plot analysis is executed. The framework determines a point estimate for the fit parameter KD before performing any chosen analyses. It is important to consider that the optimal estimate is determined within the constraints of the custom model and the framework thus does not judge the correctness of the system description, i.e., the specified equilibrium reaction(s). In order to validate the proposed system description, we inspect both the obtained parameter value and the model plot together with the root mean square error (RMSE) of the model fit through the data. The calculated estimate for the KD is 4.05 μM, which is in line with previous findings.37 The chosen analysis visualizes all the experimental data and the model prediction for each titrate (14-3-3) concentration using the determined point estimate (Fig. 2C). Inspecting the graph shows that the model prediction accurately describes the data with the determined parameter value, something that is corroborated by the RMSE, with a value of 1.56, being small compared to the fitted experimental data values.
Fig. 3 Protein–protein interaction stabilization case. (A) System description: the adapter protein 14-3-3 (R) binds to its partner TASK3 (P) with dissociation constant KD,I. In the presence of fusicoccin-A (S), the interaction is stabilized by a factor α, reducing the apparent affinity. Either the partner or stabilizer can bind first to form the ternary complex. (B) Fluorescent polarization data in the presence of varying concentrations of fusicoccin-A and a partner (labelled component) concentration of 10 nM (line), previously published26 for the model created specifically for this system. Note the decrease in EC50 value with the addition of the stabilizer (arrow). (C) Equilibrium equations automatically determined by the framework for this system. (D) Error-landscape plot centred on the determined estimates. The contours show that there is a valley of parameter combinations that result in a relatively low mean squared error (MSE). |
A custom model is created for the system description displayed in Fig. 3A from which the framework derives the equilibrium mass balance equations. For this three component system, this results in the set of three equations shown in Fig. 3C with the equilibrium concentrations of the three free components as the three unknowns. The experimental data for this case consists of 2D fluorescence anisotropy titrations (Fig. 3B and Dataset S2, ESI†).26 The step-by-step guide for this system can be found in ESI† Appendix, Section 4. Since the KD,I, (4.05 μM) was already determined in the previous section, this parameter is entered as such and kept constant and the model solver determines point estimates for the α and KD,II parameters, resulting in values of 1.34 × 103 and 0.389 mM respectively. These values are in good agreement with the previously determined values of 1 × 103 and 0.3 mM.26
To provide additional context and insights on the cooperativity value α and the affinity of the compound for 14-3-3 (KD,II), the mean squared error landscape around the determined point estimates was visualized (Fig. 3D). The landscape gives a sense of the interdependence of these two parameters and the influence of each by displaying contour lines. A valley of parameter combinations results in a relatively low mean squared error. The KD,II and α values display positive correlation, increasing both values (weaker initial binding, stronger cooperativity) still results in a relatively good prediction of the input data. Nevertheless, starting the solver from several combinations found at the valley of the landscape as initial guess values results in the same point estimates, indicating that there is a preference for the determined estimates over the other possible combinations. A sharp rise in error can be observed when one of the parameters is fixed and the other is varied. This indicates that the ratio of these parameters is important for accurate model prediction and can thus be determined with high confidence. Next to that, this analysis indicates that optimization regarding either KD,II or α are both valid approaches for improvement of compound properties.41,42
While a point estimate is the best single-value approximation of a parameter, a confidence interval can be effective to get a sense for the certainty (or spread) of the reported estimate. The framework can determine this interval using a nonparametric, bias corrected bootstrap approach.43–45 As an example, the confidence interval analysis is performed with 2000 repeats in order to get an appropriate sample size and a 95% confidence interval (ESI,† Fig. S6). Additional information on the bootstrap method is available in ESI† Appendix, Section 5. The confidence interval (lower; median; upper) for the KD,II (0.189 mM; 0.392 mM; 2.19 mM) and the α (0.687 × 103; 1.34 × 103; 7.35 × 103) show an order of magnitude difference between the limits of the confidence interval. This broad range corresponds with the stretched valley in Fig. 3D and is important to consider when drawing conclusions from parameter estimates.
Fig. 4 Ratiometric Plug-and-Play Immunodiagnostics (RAPPID) case. (A) System description.25 The antibodies A and B are both conjugated to a part of split NanoLuc luciferase, which emit light upon complex formation. The antibodies are specific for a certain analyte but target different epitopes. Upon antibody-analyte sandwich formation, the effective molarity for the split luciferases increases, resulting in increased complex formation and consequently increased signal. (B) Local parameter sensitivity analysis for the system using cardiac troponin I data (and corresponding concentrations) and a perturbation of +50%. The reference values are the point estimates determined by the framework. (C) Concentration plot generated with framework displaying how each concentration changes as the total titrate concentration increases. This analysis is performed without any experimental data with the fixed parameter values KD,N: 2.5 μM, KD,A: 10 nM KD,B: 15 nM, EM: 100 μM, [A]tot: 1 nM, [B]tot: 1 nM. The species BT and BTT have been omitted for clarity. (D) Simulation of shift in active ternary complex (ABTa) concentration for different combinations of antibody affinities and an EM value of 10 μM. Both antibodies have a total concentration of 1 nM. |
The system description for the multicomponent immunodiagnostic platform for intensiometric detection of cardiac troponin I is displayed in Fig. 4A and a step-by-step guide (ESI† Appendix, Section 6) is also available. A total of eight reversible reactions describe this system. The measured data (ESI† Appendix, Dataset S3 and Fig. S8) is proportional to the concentration of formed active ternary complex (ABTa) multiplied by an unknown scaling factor, representing the enzymatic activity of the luciferase. This data-concentration relationship is not available as a default data function in the model builder. However, the framework is not limited to these predefined functions, and it is possible to easily specify the relationship. In the model builder we define the custom function: ABTa × Scaling. Instructions on custom data functions are available in ESI† Appendix, Section 7. With the model point estimates for the parameters KD,A, KD,B and Scaling could be determined, with fixed values for KD,N and EM. The values for KD,A, KD,B and Scaling of 533 nM, 15.3 nM and 5.53 × 1017 RLU/M are identical to the values determined in the original paper.25 The framework thus also allows modelling of complicated multicomponent systems and with equal accuracy as specifically designed models.
In such complex systems, usually not all parameters contribute equally to the final measurement outcome. The influence of each parameter can be determined using local parameter sensitivity analysis.49 This quantifies the change in a given function (designated M) based on a percentage change in parameter value. The default M-function measures the change in the sum of squared errors between the model prediction and the experimental data. Parameters with high sensitivity greatly affect the final model prediction and their identification allows for focused properties optimization. Fig. 4B displays the sensitivity for the parameters in the cardiac troponin I model after a 50% increase in the parameter value. For this Immunodiagnostics system, the sensitivity of KD,A is larger than that of KD,B. The weaker binding affinity of the A-antibody, as compared to that of the B-antibody, is the limiting parameter in the formation of the active complex. In addition, both the KD,N and the EM parameters, which were fixed parameters in the original analysis, have greater sensitivity than the fitted parameters (KD,A, KD,B). A potential smaller error in the fixed values of sensitive parameters can thus strongly affect the determined estimates for the other parameters. The analysis thus not only identifies critical parameters, but also shows that obtaining accurate values for sensitive parameters is crucial and prompts experimental setups to be adjusted to suit these criteria. Another available analysis which measures the sensitivity for the maximum amount of active complex formation is given in ESI† Appendix, Section 8.
For the design of new immunodiagnostics, or molecular sensors in general, it is important to engineer the detection regime over a large range of possible analyte concentrations. Within the RAPPID system, the specific antibodies and the concentrations of the sensor components can be relatively easily adjusted in order to tune the sensor. While it is possible to gather new experimental data to explore for suitable sensor concentration regimes for specific analyte concentrations, it is more (cost-) efficient to use the framework to make predictions first. We use the concentrations analysis for this purpose. In this analysis, the concentrations of all species in the system over a range of titrate values of the analyte is visualized (Fig. 4C). The final assay output for a number of different antibody affinity combinations can be seen in Fig. 4D. The analysis reveals that the peak of the graph, where the greatest increase in luminescence signal is observed, is dictated mostly by the antibody affinity. Greater affinities also increase the total signal as can be observed from the figure. The maximum complex formation can be tuned by changing the total antibody concentration. The analysis allows the selection of antibodies that are most suited for the intended analyte concentrations at a fraction of the time or costs necessary to perform the experimental measurements.
While the system description of the immunodiagnostics platform in Fig. 4A is already highly complex, it actually still omits several other higher-order species that could potentially be formed. When building a custom model by hand for a more elaborate multicomponent system it is not uncommon to exclude complexes that are assumed to form in negligible amounts. This is in part because manually deriving the equilibrium equations for these extremely large systems becomes very labour intensive and error prone. Using our framework approach it is possible to easily extend any multicomponent system within minutes with additional equilibria and new species. This allows for proper and rapid, validation of the role and relevance of such higher-order complexes. An extended system description for the RAPPID system was therefore modelled in ESI† Appendix, Section 9. This extended custom model contains 8 additional equilibria, additional to the 8 equilibria already shown in Fig. 4A. Analysis of the modelling of the data with this extended custom model revealed that several additional higher-order four-component complexes (the two antibodies A and B bound to two analyte molecules T simultaneously) can be formed in significant amounts at specific, higher analyte concentrations. The analysis thus brings forward that, depending on analyte concentrations, such complexes should potentially not be ignored. Fitting parameters to the extended custom model results in the point estimates KD,A = 46.7 nM, KD,B = 9.71 nM and scaling = 3.75 × 1016 RLU/M. These values have up to an order of magnitude difference compared to the previous estimates with the more simple model. This shows that the framework is not only capable of modelling complex systems, but also facilitates simple and fast validation of existing models.
The framework features an easy-to-use design that does not require a programming or mathematics background. Advanced users will find that the general structure and use of the Python programming language allows for straightforward extension and customization towards specific usage scenarios. The framework thus facilitates the simple and fast creation of effective computational equilibrium models in order to unravel, understand and delineate a broad range of molecular systems in chemical biology, such as molecular glues and antibodies, and beyond.
Footnotes |
† Electronic supplementary information (ESI) available: Sections S1–S11 and Fig. S1–S12. See DOI: https://doi.org/10.1039/d2cb00211f |
‡ Authors contributed equally. |
This journal is © The Royal Society of Chemistry 2023 |