From the journal Digital Discovery Peer review history

Data-driven generation of perturbation networks for relative binding free energy calculations

Round 1

Manuscript submitted on 05 Aug 2022
 

26-Aug-2022

Dear Dr Michel:

Manuscript ID: DD-ART-08-2022-000083
TITLE: Data-driven Generation of Perturbation Networks for Relative Binding Free Energy Calculations

Thank you for your submission to Digital Discovery, published by the Royal Society of Chemistry. I sent your manuscript to reviewers and I have now received their reports which are copied below.

I have carefully evaluated your manuscript and the reviewers’ reports, and the reports indicate that major revisions are necessary.

Please submit a revised manuscript which addresses all of the reviewers’ comments. Further peer review of your revised manuscript may be needed. When you submit your revised manuscript please include a point by point response to the reviewers’ comments and highlight the changes you have made. Full details of the files you need to submit are listed at the end of this email.

Digital Discovery strongly encourages authors of research articles to include an ‘Author contributions’ section in their manuscript, for publication in the final article. This should appear immediately above the ‘Conflict of interest’ and ‘Acknowledgement’ sections. I strongly recommend you use CRediT (the Contributor Roles Taxonomy, https://credit.niso.org/) for standardised contribution descriptions. All authors should have agreed to their individual contributions ahead of submission and these should accurately reflect contributions to the work. Please refer to our general author guidelines https://www.rsc.org/journals-books-databases/author-and-reviewer-hub/authors-information/responsibilities/ for more information.

Please submit your revised manuscript as soon as possible using this link:

*** PLEASE NOTE: This is a two-step process. After clicking on the link, you will be directed to a webpage to confirm. ***

https://mc.manuscriptcentral.com/dd?link_removed

(This link goes straight to your account, without the need to log on to the system. For your account security you should not share this link with others.)

Alternatively, you can login to your account (https://mc.manuscriptcentral.com/dd) where you will need your case-sensitive USER ID and password.

You should submit your revised manuscript as soon as possible; please note you will receive a series of automatic reminders. If your revisions will take a significant length of time, please contact me. If I do not hear from you, I may withdraw your manuscript from consideration and you will have to resubmit. Any resubmission will receive a new submission date.

The Royal Society of Chemistry requires all submitting authors to provide their ORCID iD when they submit a revised manuscript. This is quick and easy to do as part of the revised manuscript submission process.   We will publish this information with the article, and you may choose to have your ORCID record updated automatically with details of the publication.

Please also encourage your co-authors to sign up for their own ORCID account and associate it with their account on our manuscript submission system. For further information see: https://www.rsc.org/journals-books-databases/journal-authors-reviewers/processes-policies/#attribution-id

I look forward to receiving your revised manuscript.

Yours sincerely,
Linda Hung
Associate Editor
Digital Discovery
Royal Society of Chemistry

************


 
Reviewer 1

The manuscript describes efforts to create perturbation maps for a series of compounds based on an entirely objective, automated criterion. The authors choose to guide the map building based on the (expected) statistical fluctuations of potential legs, which is predicted from a neural network. This neural network is trained on about 4000 reference datapoints extracted from real-life examples of perturbations. The approach is creative, open for further approvements and could form the beginning of a new standard in computer-aided drug design with free-energy calculations. In the current form, the free-energies that are predicted are comparable in quality to the currently used Lomap score, which is based on rules derived by experts in the field.

I would like to make the following comments:
1. To generalize the predictions of the standard-error of the mean (SEM), the authors graft the functional groups that change onto a simple benzene ring. In spite of the various simplifications, this is a very clever trick and leads to valuable data, that is made available to the community. From figure 2, and the discussion on page 14, I understand that this involves all modifications of a single perturbation, which are grafted onto a single benzene ring. Have the authors assessed if the data could be simplified even further? Can the SEM for the simplified perturbation on the right-hand side of figure 2 be constructed from the SEMs of a Cl-to-H and a CH2F-to-H modification? Is the SEM for the full modification the maximum of the two, the sum, or can it be obtained by error propagation? If the individual groups can be treated independently, a lot more synthetic data could be generated to train the neural network on.
2. The authors do discuss the simplications of grafting the data onto a single benzene ring. In particular two observations are made, that (I think) require a bit more attention:
a. On page 13, the authors are surprised by the observation that the SEM in the bound state is typically lower than for the solvated state. They attribute this to the fact that the simulations were probably too short to capture protein relaxation. I can follow this explanation, but would ask the authors to point out explicitly, that this is a problem for the predicted map. If some perturbations require more protein-relaxation than others, this will never be reflected in a predicted SEM. I don’t expect the authors to solve this in this manuscript, but they may want to emphasize that this is an open question
b. Similarly, the authors observe that in the grafted molecule, the functional groups may interact with each other, which they do not do in the real perturbations. I would add to that, that they do not interact with the rest of the molecule. This means that even in the solvated state the SEM is likely to be underestimated. Consider e.g. a CH3-to-OH modification in which the OH group can make an intramolecular hydrogen bond which influences the conformation of the entire molecule. This will be much more difficult to sample and converge in the real molecule (with the H-bond), than in benzene. Again, this is not something the authors should solve in this manuscript, but they could point it out.
Essentially, the authors will need to understand better how the correlation in figure 8A works, and then improve on the correlation in figure 8B. Or better yet, the correlation in figure S4C.
3. The similar performance as the Lomap score should be considered as encouraging, as the authors identify various approaches to further improve the building of the networks. They may also want to offer as a suggestion that the correlations that are obtained by the two approaches are ‘as good as it gets’, and that further improvements need to be sought elsewhere. To see if this is true, the authors may want to more carefully analyze if two maps lead to similar deviations and outliers for individual molecules. What is the correlation between the two sets of predicted DG_bind in figure 9?
4. The authors construct a Siamese relative binding free energy neural network, according to figure 3. To first learn the ‘atom indexing’ a transfer learning approach needs to be adopted using a larger set of data. I am not sure why the atom indexing is something that needs to be learned. For the actual relative free energy calculations, the authors have created a ‘merged ligand’, which already defines the atom indexing, I would say? Would it not even be more appropriate to impose this atom indexing to the neural network, such that it predicts the same perturbation as was actually performed? It could well be, that I misunderstood something completely here, in which case, I would ask the authors to explain the differences between ‘learning atom indexing’ and use of a ‘merged ligand’ better.
5. Playing the devil’s advocate one may ask how useful a NN is that has a mean-absolute-error of 0.3 kcal/mol on a property that has a peak occurrence of 0.15 kcal/mol. The authors may want to emphasize that this accuracy in the predictions seems to be sufficient to identify the legs that are to be avoided.
6. The first paragraph of section 3.3 seems to contradict itself? Leaving these datasets out would remove a large number of perturbations, but as there is so much overlap, these would be compensated by the other series? So then, why not leave them out?
7. I am not sure that I understand how 3794 perturbations could lead to 3964 data points, if duplicates are being removed and every perturbation is grafted onto a single benzene ring? (page 8).
8. On page 12, the authors refer to the single-atom substitutions with 46 entries in figure 4B. It seems that these are the ones labeled with 0 in the figure? What kind of perturbations have 0 heavy atom changes (tautomers?) and why would you refer to them a single-atom substitutions?
9. Please correct the references to the sections in footnotes a-c in table 1.
10. In the discussion of thrombin, the authors write n=11 and n=8 within five lines of text. Please correct.
11. The subscript of DDG in the first line of table 2 seems garbled.

Reviewer 2

The manuscript by Scheen et al. addresses a problem that is becoming increasingly pertinent in computation-first drug discovery: what is the best set of relative binding free energy calculations (RBFE, which estimates the difference in the binding free energies between two molecules against the same protein target) to perform in predicting the binding free energies of multiple molecules. Specifically, this work develops a model to predict the statistical fluctuations (SF) in each RBFE calculation based on the molecular structures of the pair of involved ligands, and then use the predicted SF—scaled to an SEM score—and the LOMAP algorithm to plan a network of RBFE calculations. As large scale RBFE calculations of many molecules are becoming routine in drug discovery projects, the model and method reported in this manuscript will be interesting to the practitioners of computation in drug discovery. The manuscript reports some interesting ideas, such as pretraining a Siamese neural network on related physicochemical properties and then use transfer learning to learn SF based on a small set of RBFE perturbations, performed on simplified proxy molecular pairs, using the statistical error in solvation free energy perturbation to approximate the error in RBFE. The RBFE-Space dataset deposited with the manuscript will be useful for the community. The observed correlation between statistical error and accuracy (i.e. agreement with experiments) are interesting. Given these contributions, I believe that the manuscript should ultimately be published in Digital Discovery.

Overall, the manuscript is well organized. The exposition of the model, the method, and the results, however, should be refined (see below); in particular, I find the proliferation of notations for various definitions a bit confusing and sometimes inconsistent. There are also places in this work where the technical rigor can be improved, as detailed below. The authors should address the following issues in their revision.

Major issues:

1. According to the manuscript, the authors set out to address the question of accuracy (agreement with experiments), rather than that of precision (statistical uncertainty in the predictions) in binding free energy calculations. For example, in the Introduction on Page 6 (1st page of the main text), the authors wrote “Page 6: “Consequently the choice of a network that maximizes accuracy.” But the statistical error of the mean (SEM) prediction has more to do with precision than with accuracy. The true problem the authors are solving is that of improving precision, which is an important goal in its own right. The observation that in the presented examples there seem to be correlation between precision and accuracy is interesting, which suggests that designing the network of RBFE calculations for higher precision may lead to higher accuracy. Improving the accuracy is a happy side product of the authors’ work of improving precision. The authors should revise the manuscript accordingly.

2. Relatedly, on Page 15: “different network topologies should result in a difference in the estimation of binding free energies (∆G)”. This statement is not true in the ergodic limit and should be framed in the context of finite simulations.

3. The key concept “Statistical fluctuation (SF)” should be rigorously (i.e. mathematically) defined. Is it the same as in Reference 25? There should be either an equation or a citation.

4. The idea of transfer learning from ∆ESOL is interesting. But the data points in that set are highly correlated: S1 – S3 is the sum of S1 – S2 and S2 – S3 (I use S to denote the calculated solubility). The authors should comment on how many data points in the 200,000 validation samples are independent of the 800,000 training samples and investigate how the Siamese neural network perform on an independent test set. The current approach of generating pairwise differences from individual quantities smells like free lunch. How does such data augmentation affect the learning?

5. It seems to me that a more appropriate surrogate for pre-training may be to generate ∆S_ij=(1+ε_ij)(S_i-S_j), where ε_ij is a random number reflecting the uncertainty in ∆S_ij. This will be analogous to the RBFE calculations and remove some of the correlation in the ∆S_ij. If there is sufficient computational resource, the authors should investigate this alternative of pretraining the Siamese neural network and report how the predicted SEMs change for the resulting model.

6. It is unclear to me how the set of ∆∆G values from the network of RBFE calculations are converted into ∆G values for each individual molecule. The authors mention “weighted least squares regression method” and refer to their GitHub repo code. The authors should describe the method in detail and comment on whether and how it differs from Eqs 2-4 of Ref 25 or the corresponding solution for RBFE-only networks in Ref 26.

7. The observed correlations between SEM and the change in molecular weight (Fig 4C) and between SEM and the change in perturbed heavy atoms (Fig 4D) are interesting. Since these are computed for perturbations going in both the direction of increasing molecular weight/perturbed heavy atoms and the opposite direction, it will be beneficial to the community to report separately the SEMs associated with each directional perturbation. This will help address the question of which type of perturbations (insertion vs deletion) is associated with high statistical errors.

8. Instead of normalizing SEM into LOMAP-score, why not use DiffNet (Ref 25) or Yang et al. (Ref 21) to construct a network that minimizes the total expected variance?

9. If SEM is to be turned into normalized LOMAP-score, what is the best way of normalization? The choice in Eq. 4 and 5 appears arbitrary, why the inversion?

10. The distribution of the variances in the estimated ∆G for different network topologies should be reported, in addition to the accuracy reported in Figure 9. These reflect the precision of the calculations.

Minor issues:

1. Page 7 (2nd page of the main text): “an initial estimation of SF is thus pivotal in this approach”. Even though an initial estimate of SF will influence the progression of NetBFE, it is not pivotal, as in later iterations the SF are updated according to real data.

2. Page 8: “We hypothesize that edges in a RBFE network with low statis- tical fluctuations are associated with low |∆∆Goffset| values.” There is no need for this hypothesis if we are considering precision instead of accuracy. “Associated” is imprecise language.

3. Page 8: The “grafting” procedure should be illustrated with examples (add reference to Fig. 2 in the text).

4. Page 8: Why are there more perturbations (3964) in the RBFE-Space than in the original FEP dataset (3794)?

5. Page 8-9: “the atom mapping used was stored to describe which R-groups were being perturbed into which between ligands” was a bit hard to parse.

6. Page 10: “6.5<sup>6</sup> points” should be “6.5×10<sup>6</sup> points”.

7. Page 12: The section title “RBFE-space derivatives correlate to their drug-like counterparts” is hard to decipher.

8. Page 13: “drug-like perturbation” seems a misnomer. Maybe just “original perturbation”?

9.Page 13: “1-ns/lambda” and “4-ns/lambda” are misleading and should probably be “1×ns/lambda” and “4×ns/lambda”.

10. Table 2: SEM_{ddG_{\mathrm{bind}}} is missing curly brackets in LaTex.

11. Is RBFE-Space SEM_((∆G)_solvated) in Table 2 the same as RBFE-Space SEM in Figure 5?

12. Figure S4: Explain A-D panels in the caption.

13. It is surprising that RBFE-Space SEM has a stronger correlation with the SEM in the bound phase of FEP than with the SEM in the solvated phase. A comment on this is probably due.

14.The authors may consider including a brief discussion that the RBFE-Space SEM model may be updated using iterative RBFE allocations, and combination of the RBFE-Space SEM model with NetBFE in an active learning cycle may further improve the precision of RBFE calculations.

15. The correlation between precision and accuracy in Fig. 8A is intriguing. Does this mean that a large part of the RBFE error is due to poor convergence?

16. “n” is used to denote both dataset size and, on Page 9, the number of repeats. Different symbols may be used to avoid confusion.











Reviewer 3

General comments: Thank you very much for the work! The main README file of the repo explains in detail on how to reproduce the work. Additionally, it would really help navigating the files if you can add a readme with more details about the sequence of steps under "./ANALYSIS/perturbation_networks", since this is a core part of training and testing the model.

For the data review rubric:
1b. Can you please reference the exact commits of the source data repositories OFF PLBenchmarks repo, and Merck's fep-benchmark repos.
5c. A justification has already been provided under section 3.3 of the manuscript for the overlap of training and test set perturbations and seems reasonable, no need for further explanation.


 

Dear Linda,

Many thanks for seeing our manuscript through the review process. Please find below our answers to the reviewer reports. We have also made a number of minor typographical corrections to the manuscript.

/************
/REVIEWER REPORT(S):
/Referee: 1
/
/Comments to the Author
/The manuscript describes efforts to create perturbation maps for a series of /compounds based on an entirely objective, automated criterion. The authors choose to /guide the map building based on the (expected) statistical fluctuations of potential legs, /which is predicted from a neural network. This neural network is trained on about 4000 /reference datapoints extracted from real-life examples of perturbations. The approach /is creative, open for further approvements and could form the beginning of a new /standard in computer-aided drug design with free-energy calculations. In the current /form, the free-energies that are predicted are comparable in quality to the currently /used Lomap score, which is based on rules derived by experts in the field.

We thank the reviewer for their supportive comments about our work.

/I would like to make the following comments:
/1. To generalize the predictions of the standard-error of the mean (SEM), the /authors graft the functional groups that change onto a simple benzene ring. In spite of /the various simplifications, this is a very clever trick and leads to valuable data, that is /made available to the community. From figure 2, and the discussion on page 14, I /understand that this involves all modifications of a single perturbation, which are /grafted onto a single benzene ring. Have the authors assessed if the data could be /simplified even further? Can the SEM for the simplified perturbation on the right-hand /side of figure 2 be constructed from the SEMs of a Cl-to-H and a CH2F-to-H /modification? Is the SEM for the full modification the maximum of the two, the sum, or /can it be obtained by error propagation? If the individual groups can be treated /independently, a lot more synthetic data could be generated to train the neural network /on.

The authors thank the reviewer for this thoughtful suggestion. The SEM of a complex perturbation is not exactly predictable from the SEMs of smaller sub-perturbations but it may be possible to generate useful estimators for the purpose of generating greater training spaces. Validating this hypothesis requires detailed work we deem beyond the scope of the present study but could form the basis of a follow up project. We have made our code and training set available to the community at https://github.com/michellab/RBFENN/ as we appreciate the work we present could be further developed in several other directions.

Added to section 4 paragraph 2

‘’A future direction for RBFESpace would be to explore whether the SEM of a large perturbation could be estimated from the SEMs of simpler sub-perturbations. This could allow increasing the size of the training set without increasing computing cost.’’

/2. The authors do discuss the simplications of grafting the data onto a single /benzene ring. In particular two observations are made, that (I think) require a bit more /attention:
/a. On page 13, the authors are surprised by the observation that the SEM in the /bound state is typically lower than for the solvated state. They attribute this to the fact /that the simulations were probably too short to capture protein relaxation. I can follow /this explanation, but would ask the authors to point out explicitly, that this is a problem /for the predicted map. If some perturbations require more protein-relaxation than /others, this will never be reflected in a predicted SEM. I don’t expect the authors to /solve this in this manuscript, but they may want to emphasize that this is an open /question

Added the following sentence to section 3.1.2 paragraph 4

‘’Achieving robust estimates of the precision of free energy changes in a bound system is challenging owing to the potential for protein relaxation modes to occur on a broad range of timescales. ‘’

/b. Similarly, the authors observe that in the grafted molecule, the functional groups /may interact with each other, which they do not do in the real perturbations. I would add /to that, that they do not interact with the rest of the molecule. This means that even in /the solvated state the SEM is likely to be underestimated. Consider e.g. a CH3-to-OH /modification in which the OH group can make an intramolecular hydrogen bond which /influences the conformation of the entire molecule. This will be much more difficult to /sample and converge in the real molecule (with the H-bond), than in benzene. Again, /this is not something the authors should solve in this manuscript, but they could point it /out.
/Essentially, the authors will need to understand better how the correlation in figure 8A /works, and then improve on the correlation in figure 8B. Or better yet, the correlation in /figure S4C.

Added the sentence below to section 3.1.2 paragraph 6

‘’Conversely scaffold specific intramolecular interactions with a R-group are lost in the grafting process.’’

See also our answer to reviewer 2 comment 15 for more insights in Fig 8A.


/3. The similar performance as the Lomap score should be considered as /encouraging, as the authors identify various approaches to further improve the building /of the networks. They may also want to offer as a suggestion that the correlations that /are obtained by the two approaches are ‘as good as it gets’, and that further /improvements need to be sought elsewhere. To see if this is true, the authors may /want to more carefully analyze if two maps lead to similar deviations and outliers for /individual molecules. What is the correlation between the two sets of predicted /DG_bind in figure 9?

We have added Figure S16 that shows the correlation of the predicted affinities for the RBFENN and LOMAP-Score networks. The two solutions are highly correlated suggesting the reviewer is correct in saying that further improvements may need to be sought elsewhere.

Added at the last paragraph of 3.3.4

‘’The LOMAP-Score and RBFENN network ∆Gbind predictions for individual compounds are highly correlated despite the networks sharing only ca. 50% edges (figure S16). This is encouraging as it suggests that affinity predictions do not need to critically depend on the details of the network topology, as long as the network is assembled from a collection of reasonably accurate edges.’’

/4. The authors construct a Siamese relative binding free energy neural network, /according to figure 3. To first learn the ‘atom indexing’ a transfer learning approach /needs to be adopted using a larger set of data. I am not sure why the atom indexing is /something that needs to be learned. For the actual relative free energy calculations, the /authors have created a ‘merged ligand’, which already defines the atom indexing, I /would say? Would it not even be more appropriate to impose this atom indexing to the /neural network, such that it predicts the same perturbation as was actually performed? /It could well be, that I misunderstood something completely here, in which case, I /would ask the authors to explain the differences between ‘learning atom indexing’ and /use of a ‘merged ligand’ better.

Added to section 2.2.1 paragraph 3 the following sentence to clarify our motivation

‘’To effectively learn atom-mappings between two input molecules, the model must be able to relate atom-mapping information to the input graphs; atom-mappings are presented to the model using atom indices.‘’

/5. Playing the devil’s advocate one may ask how useful a NN is that has a mean-/absolute-error of 0.3 kcal/mol on a property that has a peak occurrence of 0.15 /kcal/mol. The authors may want to emphasize that this accuracy in the predictions /seems to be sufficient to identify the legs that are to be avoided.

Updated section 3.2 last paragraph.

‘’Although the validation MAE of the RBFENN ensemble after fine-tuning is ∼0.3 kcal mol−1 and the majority of SEMs in RBFE-Space have values ∼0.15kcal mol−1 (figure 4A), this level of accuracy was considered sufficient to predict perturbations that would exhibit large SEMs (>1.0 kcal mol−1) values’’

/6. The first paragraph of section 3.3 seems to contradict itself? Leaving these /datasets out would remove a large number of perturbations, but as there is so much /overlap, these would be compensated by the other series? So then, why not leave /them out?

Clarified section 3.3 paragraph 1 by shortening the text and adding the following.

‘’In other words, excluding or including TYK2/TNKS2 in the training set results in essentially the same training set due to the high amount of overlap in R-group modifications present across multiple congeneric series. For workflow simplicity these sets were thus included in the training set.’’

/7. I am not sure that I understand how 3794 perturbations could lead to 3964 data /points, if duplicates are being removed and every perturbation is grafted onto a single /benzene ring? (page 8).

This was a typo, thank you for pointing this out.

/8. On page 12, the authors refer to the single-atom substitutions with 46 entries in /figure 4B. It seems that these are the ones labeled with 0 in the figure? What kind of /perturbations have 0 heavy atom changes (tautomers?) and why would you refer to /them a single-atom substitutions?

These describe isomers.. Updated section 3.1.1 paragraph 1.

“..except for isomeric perturbations (i.e. a swap in position of two heavy atoms) of which only 46..”

/9. Please correct the references to the sections in footnotes a-c in table 1.

Updated table 1 caption:

‘’a-c: these ligand series are further analysed in sections 3.3.5, 3.3.4 and 3.3.2, respectively’’

/10. In the discussion of thrombin, the authors write n=11 and n=8 within five lines of /text. Please correct.

Removed second mention of n=8.

/11. The subscript of DDG in the first line of table 2 seems garbled.

Corrected by including a backslash to mathrm command.

/Referee: 2
/
/Comments to the Author
/The manuscript by Scheen et al. addresses a problem that is becoming increasingly /pertinent in computation-first drug discovery: what is the best set of relative binding free /energy calculations (RBFE, which estimates the difference in the binding free energies /between two molecules against the same protein target) to perform in predicting the /binding free energies of multiple molecules. Specifically, this work develops a model to /predict the statistical fluctuations (SF) in each RBFE calculation based on the /molecular structures of the pair of involved ligands, and then use the predicted SF—/scaled to an SEM score—and the LOMAP algorithm to plan a network of RBFE /calculations. As large scale RBFE calculations of many molecules are becoming /routine in drug discovery projects, the model and method reported in this manuscript /will be interesting to the practitioners of computation in drug discovery. The manuscript /reports some interesting ideas, such as pretraining a Siamese neural network on /related physicochemical properties and then use transfer learning to learn SF based on /a small set of RBFE perturbations, performed on simplified proxy molecular pairs, using /the statistical error in solvation free energy perturbation to approximate the error in /RBFE. The RBFE-Space dataset deposited with the manuscript will be useful for the /community. The observed correlation between statistical error and accuracy (i.e. /agreement with experiments) are interesting. Given these contributions, I believe that /the manuscript should ultimately be published in Digital Discovery.

The authors are thankful for the kind and supportive comments of the reviewer.

/Overall, the manuscript is well organized. The exposition of the model, the method, /and the results, however, should be refined (see below); in particular, I find the /proliferation of notations for various definitions a bit confusing and sometimes /inconsistent. There are also places in this work where the technical rigor can be /improved, as detailed below. The authors should address the following issues in their /revision.
/
/Major issues:
/
/1. According to the manuscript, the authors set out to address the question of accuracy /(agreement with experiments), rather than that of precision (statistical uncertainty in the /predictions) in binding free energy calculations. For example, in the Introduction on /Page 6 (1st page of the main text), the authors wrote “Page 6: “Consequently the /choice of a network that maximizes accuracy.” But the statistical error of the mean /(SEM) prediction has more to do with precision than with accuracy. The true problem /the authors are solving is that of improving precision, which is an important goal in its /own right. The observation that in the presented examples there seem to be /correlation between precision and accuracy is interesting, which suggests that /designing the network of RBFE calculations for higher precision may lead to higher /accuracy. Improving the accuracy is a happy side product of the authors’ work of /improving precision. The authors should revise the manuscript accordingly.

The reviewer is correct. We have structured the manuscript to frame the overall problem as that of network accuracy optimisation since that is our primary research goal.
It so happens that in this body of work computing the SEM is a convenient proxi metric for accuracy because we can compute a training set without the need for experimental data. Future work may identify superior metrics that offer a stronger relationship with network edge accuracy.

We have revised the conclusion section of the manuscript where we discuss how different metrics could be optimised (Section 4).

‘’As all heuristics depicted in table 2 attempt to model RBFE in accuracy (in the form of e.g. |∆∆Goffset| values), this begs the question as to whether a predictor can be trained directly on this quantity instead of statistical fluctuations. For this, instead of
grafting perturbations onto benzene (as with RBFE-Space), the original ligands must be featurised as well as the protein system in which the RBFE perturbation takes place. This has been attempted before and offers additional information such as pose
differences between input ligands which are highly influential to the RBFE reliability. 27,28 However, the bottleneck in this scenario is that a large number of RBFE simulations must be run. Additionally robust experimental binding affinity must be available for each RBFE edge.’’

/2. Relatedly, on Page 15: “different network topologies should result in a difference in /the estimation of binding free energies (∆G)”. This statement is not true in the ergodic /limit and should be framed in the context of finite simulations.

Updated Section 3.3.3

‘’As the performance of RBFE calculations is determined by the errors made along each edge of the chosen network, different network topologies should result in a difference in the estimation of binding free energies (∆G) in the limit of finite sampling’’

/3. The key concept “Statistical fluctuation (SF)” should be rigorously (i.e. /mathematically) defined. Is it the same as in Reference 25? There should be either an /equation or a citation.

We agree the terminology could cause confusion and was used to refer to different quantities in the original manuscript. We have updated the manuscript to remove this term and depending on the context use the terms ‘precision’, ‘statistical uncertainty’ or ‘standard error of the mean’.

/4. The idea of transfer learning from ∆ESOL is interesting. But the data points in that /set are highly correlated: S1 – S3 is the sum of S1 – S2 and S2 – S3 (I use S to denote /the calculated solubility). The authors should comment on how many data points in the /200,000 validation samples are independent of the 800,000 training samples and /investigate how the Siamese neural network perform on an independent test set. The /current approach of generating pairwise differences from individual quantities smells /like free lunch. How does such data augmentation affect the learning?

See response to 5

/5. It seems to me that a more appropriate surrogate for pre-training may be to generate /∆S_ij=(1+ε_ij)(S_i-S_j), where ε_ij is a random number reflecting the uncertainty in /∆S_ij. This will be analogous to the RBFE calculations and remove some of the /correlation in the ∆S_ij. If there is sufficient computational resource, the authors should /investigate this alternative of pretraining the Siamese neural network and report how /the predicted SEMs change for the resulting model.

We agree that other routes to train RBFENN could have been explored and that our procedure introduces undesirable correlation between data points. We were not overly concerned about this for this proof-of-concept study since the model is never used to actually predict ∆ESOL values. It would be desirable to revisit multiple aspects of our dataset curation and model training protocols considering the helpful suggestions made by the reviewers. We feel however that this altogether warrants a separate follow up study given the already significant size of the present manuscript and currently limited available resources to us to currently pursue further this project. We have made all the code and dataset to reproduce our work available at https://github.com/michellab/RBFENN to encourage the community to build on our results.

Added to section 4 paragraph 3

‘’A future iteration of RBFENN could focus on optimisation of the data augmentation stage used in the pre-training stage since the current approach introduces correlations between data points that could influence the subsequent transfer learning step. ’’

/6. It is unclear to me how the set of ∆∆G values from the network of RBFE calculations /are converted into ∆G values for each individual molecule. The authors mention /“weighted least squares regression method” and refer to their GitHub repo code. The /authors should describe the method in detail and comment on whether and how it /differs from Eqs 2-4 of Ref 25 or the corresponding solution for RBFE-only networks in /Ref 26.

We have updated Section 2.3.3

‘’In this work a per-ligand free energy estimation is made using a weighted least squares method implemented in FreeEnergyWorkflows.49. This implementation is equivalent to Eqs 2-4 of Yang et al. 50 with weights set as the reciprocal of the propagated standard error of the mean values across the replicates of each
RBFE leg (solvated and bound) in kcal·mol−1. ‘’

/7. The observed correlations between SEM and the change in molecular weight (Fig /4C) and between SEM and the change in perturbed heavy atoms (Fig 4D) are /interesting. Since these are computed for perturbations going in both the direction of /increasing molecular weight/perturbed heavy atoms and the opposite direction, it will be /beneficial to the community to report separately the SEMs associated with each /directional perturbation. This will help address the question of which type of /perturbations (insertion vs deletion) is associated with high statistical errors.

We have carried out this analysis and report the results in Figure S2. There doesn’t seem to be a discernable difference which could reflect the nature of the RBFE-Space dataset. Added the following sentence at the end of 3.1.1.

‘’Separating perturbations in this analysis by whether they involve addition ('Grow') or removal ('Shrink') of heavy atoms does not suggest discernable distributions (Figure S2).’’

/8. Instead of normalizing SEM into LOMAP-score, why not use DiffNet (Ref 25) or Yang /et al. (Ref 21) to construct a network that minimizes the total expected variance?

We have added to Conclusion section 4 third paragraph the following sentence

‘’Our work has focused on normalising predicted SEM values into a LOMAP-Score metric for the LOMAP algorithm. Other algorithms based on optimal design principles could use RBFENN SEM estimates to construct networks that minimise the total expected variance.25,50’’

/9. If SEM is to be turned into normalized LOMAP-score, what is the best way of /normalization? The choice in Eq. 4 and 5 appears arbitrary, why the inversion?

Added the following sentence at the end of selection 2.3.2

‘’SEM values were inversed such that perturbations with large SEM have a low score. For the sake of simplicity a simple inversion was used in this work but other transformations could be explored in future work.’’

/10. The distribution of the variances in the estimated ∆G for different network /topologies should be reported, in addition to the accuracy reported in Figure 9. These /reflect the precision of the calculations.

We have added such data to Figure S8 of the SI and added a paragraph at the end of section 3.3.4

‘’For each network topology the picked edges can be analysed in terms of their accuracy (∆Goffse) and precision (SEM) (figures S7 and S8). The median edge-accuracy figures for the different networks follow the statistical metrics trends seen in figure 9G-H, the random selection performing worst and |∆∆Goffset| performing best. However the |∆∆Goffset| network has selected several edges with large SEM values compared to the LOMAP-Score and RBFENN networks because such edges happened to give (for the present dataset) a mean RBFE between pairs of compounds that showed a low deviation from the experimental difference.’’

/Minor issues:
/
/1. Page 7 (2nd page of the main text): “an initial estimation of SF is thus pivotal in this /approach”. Even though an initial estimate of SF will influence the progression of /NetBFE, it is not pivotal, as in later iterations the SF are updated according to real data.

Changed ”pivotal” to ”important” in the text.

/2. Page 8: “We hypothesize that edges in a RBFE network with low statis- tical /fluctuations are associated with low |∆∆Goffset| values.” There is no need for this /hypothesis if we are considering precision instead of accuracy. “Associated” is /imprecise language.

See our comment on major issue 1. Since we frame the problem of network optimisation as that of picking a collection edges with the highest accuracy it is not given that for a given simulation protocol edges that show the lowest deviation against experimental data will also have the lowest SEM values.

/3. Page 8: The “grafting” procedure should be illustrated with examples (add reference /to Fig. 2 in the text).

Added an explicit reference to Fig 2 in section 2.1.1

/4. Page 8: Why are there more perturbations (3964) in the RBFE-Space than in the /original FEP dataset (3794)?

This was a typo that has been corrected.

/5. Page 8-9: “the atom mapping used was stored to describe which R-groups were /being perturbed into which between ligands” was a bit hard to parse.

The sentence has been split in two and restructured to improve clarity.

/6. Page 10: “6.5<sup>6</sup> points” should be “6.5×10<sup>6</sup> points”.

Changed to 6.5×106 points (using cdot)

/7. Page 12: The section title “RBFE-space derivatives correlate to their drug-like /counterparts” is hard to decipher.

Changed the title of 3.1.2 to more explicitly refer to the main finding of this section. Replaced with ”original (ligand)” throughout the section.

‘’The precision of the free energy estimates of RBFE-Space derivatives correlates with the precision of the free energy estimates of their parent ligands’’

/8. Page 13: “drug-like perturbation” seems a misnomer. Maybe just “original /perturbation”?

See previous item

/9.Page 13: “1-ns/lambda” and “4-ns/lambda” are misleading and should probably be /“1×ns/lambda” and “4×ns/lambda”.

Changed throughout manuscript

/10. Table 2: SEM_{ddG_{\mathrm{bind}}} is missing curly brackets in LaTex.

Agreed, will change in text.

/11. Is RBFE-Space SEM_((∆G)_solvated) in Table 2 the same as RBFE-Space SEM in /Figure 5?

Yes, notation in table 2 has been corrected to fit the figure axis labels. This simpler notation of RBFE-Space SEM is chosen because it simplifies the text body

/12. Figure S4: Explain A-D panels in the caption.

Agreed, will change in text. Added description of panels

/13. It is surprising that RBFE-Space SEM has a stronger correlation with the SEM in /the bound phase of FEP than with the SEM in the solvated phase. A comment on this /is probably due.

We have updated paragraph 5 in 3.1.2 to more explicitly stress why we think this is the case.

‘’This is surprising as a bound system has higher complexity than a solvated box - it is expected however that the short sampling time for this analysis (1·ns/λ) was insufficient to relax the protein topology in the simulation, thus enforcing a relatively rigid environment for the ligand perturbation, meaning only a narrow range of conformations could be sampled. This may explain why the RBFE-Space SEMs correlate more strongly with bound SEMs than with solvated SEMs.’’

/14.The authors may consider including a brief discussion that the RBFE-Space SEM /model may be updated using iterative RBFE allocations, and combination of the RBFE-/Space SEM model with NetBFE in an active learning cycle may further improve the /precision of RBFE calculations.

Added at the end of the conclusion section the following sentence

‘’(..)This could be combined with an adaptive approach to tune simulation protocols (in effect allocating different sampling efforts) to individual network edges.‘’

/15. The correlation between precision and accuracy in Fig. 8A is intriguing. Does this /mean that a large part of the RBFE error is due to poor convergence?

Added to section 3.3.3 paragraph 3 a reference to earlier work from Merz suggesting that errors in computed energies linearly increase with the number of interactions present in a protein-ligand complex.

‘’A possible explanation is that perturbations with greater ∆∆Gbind SEM values involve changes of a greater number of interactions, which would be expected to increase systematic model errors.58 In support of this interpretation the data in Figure 8A shows a trend with the number of heavy atoms perturbed.’’

/16. “n” is used to denote both dataset size and, on Page 9, the number of repeats. /Different symbols may be used to avoid confusion.

Corrected throughout, minimised the use of n except for dataset sizes.

/Referee: 3
/
/Comments to the Author
/General comments: Thank you very much for the work! The main README file of the /repo explains in detail on how to reproduce the work.

The authors thank the reviewer for their comment on our code and repository.

/Additionally, it would really help navigating the files if you can add a readme with more /details about the sequence of steps under "./ANALYSIS/perturbation_networks", since /this is a core part of training and testing the model.

This has been updated. See commit https://github.com/michellab/RBFENN/commit/7b6e990bc6bd6a57dbf39a5639501b7ff922baef

/For the data review rubric:
/1b. Can you please reference the exact commits of the source data repositories OFF /PLBenchmarks repo, and Merck's fep-benchmark repos.

Updated https://github.com/michellab/RBFENN/blob/master/fep_ref_ligands/readme.txt to cite the release version used for OFF repo (0.1.1) and PLBenchmarks (1.0). (https://github.com/michellab/RBFENN/commit/fa8a4e7b0a3e8820aa87ddaefba2a9233fd51b24)




Round 2

Revised manuscript submitted on 20 Sep 2022
 

04-Oct-2022

Dear Dr Michel:

Manuscript ID: DD-ART-08-2022-000083.R1
TITLE: Data-driven Generation of Perturbation Networks for Relative Binding Free Energy Calculations

Thank you for submitting your revised manuscript to Digital Discovery. I am pleased to accept your manuscript for publication in its current form. I have copied any final comments from the reviewer(s) below.

You will shortly receive a separate email from us requesting you to submit a licence to publish for your article, so that we can proceed with the preparation and publication of your manuscript.

You can highlight your article and the work of your group on the back cover of Digital Discovery. If you are interested in this opportunity please contact the editorial office for more information.

Promote your research, accelerate its impact – find out more about our article promotion services here: https://rsc.li/promoteyourresearch.

If you would like us to promote your article on our Twitter account @digital_rsc please fill out this form: https://form.jotform.com/213544038469056.

By publishing your article in Digital Discovery, you are supporting the Royal Society of Chemistry to help the chemical science community make the world a better place.

With best wishes,

Linda Hung
Associate Editor
Digital Discovery
Royal Society of Chemistry


 
Reviewer 1

The authors have addressed my concerns satisfactorily. I have no further reservations towards publication of this manuscript and congratulate the authors on their work.

Reviewer 2

References 50 and 52 are incorrectly formatted and should be corrected.




Transparent peer review

To support increased transparency, we offer authors the option to publish the peer review history alongside their article. Reviewers are anonymous unless they choose to sign their report.

We are currently unable to show comments or responses that were provided as attachments. If the peer review history indicates that attachments are available, or if you find there is review content missing, you can request the full review record from our Publishing customer services team at RSC1@rsc.org.

Find out more about our transparent peer review policy.

Content on this page is licensed under a Creative Commons Attribution 4.0 International license.
Creative Commons BY license