From the journal Digital Discovery Peer review history

Parallel tempered genetic algorithm guided by deep neural networks for inverse molecular design

Round 1

Manuscript submitted on 19 Jan 2022
 

09-Feb-2022

Dear Professor Aspuru-Guzik:

Manuscript ID: DD-ART-01-2022-000003
TITLE: Parallel Tempered Genetic Algorithm Guided by Deep Neural Networks for Inverse Molecular Design

Thank you for your submission to Digital Discovery, published by the Royal Society of Chemistry. I sent your manuscript to reviewers and I have now received their reports which are copied below.

I have carefully evaluated your manuscript and the reviewers’ reports, and the reports indicate that major revisions are necessary.

Please submit a revised manuscript which addresses all of the reviewers’ comments. Further peer review of your revised manuscript may be needed. When you submit your revised manuscript please include a point by point response to the reviewers’ comments and highlight the changes you have made. Full details of the files you need to submit are listed at the end of this email.

Digital Discovery strongly encourages authors of research articles to include an ‘Author contributions’ section in their manuscript, for publication in the final article. This should appear immediately above the ‘Conflict of interest’ and ‘Acknowledgement’ sections. I strongly recommend you use CRediT (the Contributor Roles Taxonomy from CASRAI, https://casrai.org/credit/) for standardised contribution descriptions. All authors should have agreed to their individual contributions ahead of submission and these should accurately reflect contributions to the work. Please refer to our general author guidelines https://www.rsc.org/journals-books-databases/author-and-reviewer-hub/authors-information/responsibilities/ for more information.

Please submit your revised manuscript as soon as possible using this link:

*** PLEASE NOTE: This is a two-step process. After clicking on the link, you will be directed to a webpage to confirm. ***

https://mc.manuscriptcentral.com/dd?link_removed

(This link goes straight to your account, without the need to log on to the system. For your account security you should not share this link with others.)

Alternatively, you can login to your account (https://mc.manuscriptcentral.com/dd) where you will need your case-sensitive USER ID and password.

You should submit your revised manuscript as soon as possible; please note you will receive a series of automatic reminders. If your revisions will take a significant length of time, please contact me. If I do not hear from you, I may withdraw your manuscript from consideration and you will have to resubmit. Any resubmission will receive a new submission date.

The Royal Society of Chemistry requires all submitting authors to provide their ORCID iD when they submit a revised manuscript. This is quick and easy to do as part of the revised manuscript submission process.   We will publish this information with the article, and you may choose to have your ORCID record updated automatically with details of the publication.

Please also encourage your co-authors to sign up for their own ORCID account and associate it with their account on our manuscript submission system. For further information see: https://www.rsc.org/journals-books-databases/journal-authors-reviewers/processes-policies/#attribution-id

I look forward to receiving your revised manuscript.

Yours sincerely,
Dr Kedar Hippalgaonkar
Associate Editor, Digital Discovery
Royal Society of Chemistry

************


 
Reviewer 1

In my opinion, the following issues should be addressed before the paper is suitable for publication.

1. Comparing the performance of various methods is not really meaningful without knowing the number of property evaluations for each method. The authors do this for the penalised logP objective, but not for the other two objectives.

It’s not really clear to me whether “literature baselines” refer to numbers taken from the literature, or whether the authors ran these codes themselves. In the case of the former, this information is often missing, so the point should at least be acknowledged. At a minimum the number of property evaluations for the JANUS method should be discussed for all objectives.

2. Another factor that complicates comparisons of methods is any difference in “chemical constraints” of the methods. For example, it is much harder to get good docking scores if one imposes synthetic accessibility constraints in the objective (DOI: 10.7717/peerj-pchem.18). Other ML trained on known molecules may have similar constraints built in inherently which may result in lower docking scores. I don’t know if this applies to any of the methods that the authors compare to but this should at least be considered, because a method that gives very good docking scores but synthetically inaccessible molecules is a worse method than a method that gives lower docking scores but synthetically accessible molecules.

3. Similarly to point 2, it is easier to get high novelty and diversity in the absence of synthetic accessibility constraints. This should also be considered when comparing methods and when reporting these values, as they might be much lower in real applications where synthetic accessibility is included.

4. It is not clear what “ZINC” and “Train” mean in Table III

Jan Jensen (I choose to review this paper non-anonymously)

Reviewer 2

Dear Authors, I enjoyed very much reading this work. It is simply state-of-the-art and fills a big gap, and JANUS, a genetic algorithm inspired by parallel tempering, is a great idea and will be useful in various fields of science and engineering.

Reviewer 3

Nigam et al. presented a new genetic algorithm-based approach for molecular property optimization using the SELFIES representation. They used two pools of molecules to enhance search/optimization efficiency. In addition, they also used a DNN-based model to select molecules with favorable properties. Overall, the approach shows better or comparable performance to existing approaches and provides theoretical advances. I recommend the publication of the paper after revising some parts of the manuscript.

1. The authors wrote that the explorative population consists of high fitness solutions/molecules and the exploitative population consists of most similar molecules to their parents. This design principle does not seem to be straightforward. I believe that authors introduced the explorative population for more extensive search on chemical space. However, it is well-known that selecting the best solutions can easily make a search trapped into a local minimum/maximum. Thus, the authors should provide more detailed explanation on the design principles of the two populations.

2. As the authors suggested, the diversity of generated molecules is important for efficient search of chemical space. The change of the diversity of the two populations during search should be provided.

3. More details of DNN training should be provided. The authors wrote that the DNN model for selection pressure was trained at every generation using solutions with known fitness. Does this mean that the DNN model was trained using both the initial molecules and the generated molecules until previous generation? How much computational time was required to train the model at every generation? More detailed computational time profile would be helpful to readers.

4. More detailed explanation on docking calculation should be provided. Did authors perform actual SMINA calculation for every generated molecule? Or did authors used a pre-trained docking score predictor similar to the imitated inhibition case? If docking calculations were actually performed, how much computational time did it take? Please provide more details.

5. Comment: In terms of docking score optimization, several approaches that use docking score predictors instead of actually performing docking have been suggested: https://pubs.acs.org/doi/10.1021/acscentsci.0c00229, https://www.mdpi.com/1422-0067/22/21/11635, https://pubs.acs.org/doi/abs/10.1021/acs.jctc.1c00810. Please consider citing them.

6. The caption of supplementary figure 1-(b) is not sufficient. Please provide more details for readability.


Reviewer 4

The manuscript presents a new approach for structure enumeration guided by different models (QSAR or docking) using a surrogate DNN model and a tempered genetic algorithm. The authors compared the performance of the implemented workflow with their previous approach and other approaches from the literature. The study is well and concisely described with an almost sufficient level of detail. The manuscript fully fits the scope of the journal and can be published after a minor revision.

The major value of generative approaches is the enumeration of novel, chemically reasonable and feasible molecules. While the optimization workflow implemented in this study is reasonable, it was combined with a structure generator which does not take chemical sense into account and thus resulted in a very large number of artificial structures whose scores were reported as superior. However, comparison of scores obtained for artificial structures misleads the reader. While for the penalized logP task one may expect that completely artificial molecules will be on top, that was not expected for the other tasks: imitating inhibition and molecular docking. The authors did not provide the complete results for the imitated inhibition task and just showed some examples in the supplementary materials. So, I could review only results of the docking task. The enumerated molecules are completely artificial and not relevant. For example, the molecule from gen23 of 5HT1B/5T1B_ga_NN_class C=C=C=c1c2c([nH]c#ccc3c1=C3)C(C=C=C=C=CCC1=CC=C=C=C=CC=C=C=CC#C1)=C=C2 contains a 9-membered aromatic ring with a triple bond. This even breaks a formal Hückel rule. Top-scored molecules contain few or no polar atoms. This means that there is an issue with the chosen structure generator, which was fooled by the target model (in this case, docking). This makes the reported scores not very meaningful. I calculated SA scores for some molecules designed in the docking studies and the majority of them had scores greater than 6, which additionally supports my conclusion about the poor quality of the generated structures.

Authors mentioned the issue of synthetically infeasible molecules and suggested that SA scores should be included in the metrics. I agree with that and suggest to explicitly discuss this issue in the paper in a short separate section, report the SA scores for compounds which were taken into account to calculate the reported target scores in the imitating inhibition and docking tasks, and provide examples of molecules in the main text. This will clearly indicate the issue and inform readers about it. This will not reduce the value of the study as a whole, because the main objective was a new optimization workflow, which works as expected. The issue here is the chosen structure generator and missing control of structure feasibility.

Other remarks:
1. Please publish results for imitating inhibition to enable their analysis.
2. As I understood, on each iteration a new DNN model was constructed. How were thresholds for classifiers selected for each task? I did not find this in the manuscript and supplementary materials. It could happen that the number of highly scored molecules grew fast and the training set may become highly imbalanced if the thresholds were kept constant. Did you observe this? If so, how did you solve this issue?
3. Which 51 rdkit descriptors were used for DNN modeling? They are not explicitly mentioned.
4. Please provide a little more detail about DNN model building. In particular, how were training and test sets created on each iteration?
5. Please provide version numbers for software used in the study.

Kind regards,
Pavel Polishchuk


 

This text has been copied from the PDF response to reviewers and does not include any figures, images or special characters.

We would like to thank the editor & the reviewers for their comments. After revising our manuscript based on all suggestions, we are happy to submit an updated manuscript.

I have carefully evaluated your manuscript and the reviewers’ reports, and the reports
indicate that major revisions are necessary. When you submit your revised manuscript
please include a point by point response to the reviewers’ comments and highlight the
changes you have made.

Digital Discovery strongly encourages authors of research articles to include an ‘Author
contributions’ section in their manuscript. This should appear immediately above the ‘Conflict
of interest’ and ‘Acknowledgement’ sections. I strongly recommend you use CRediT (the
Contributor Roles Taxonomy from CASRAI, https://casrai.org/credit/) for standardised
contribution descriptions.

We moved the Author contributions sections to the requested position. Additionally, we
changed the contributions to follow CRediT. Furthermore, Extended Data Figures and Tables
were renamed to ordinary Figures and Tables and incorporated into the main text or
renamed to Supplementary Figures and Tables and moved to the Supplementary
Information.

Referee: 1
In my opinion, the following issues should be addressed before the paper is suitable for publication.
1. Comparing the performance of various methods is not really meaningful without knowing
the number of property evaluations for each method. The authors do this for the penalised
logP objective, but not for the other two objectives.

We thank the reviewer for this important comment. As requested by the reviewer, we added
the number of property evaluations when available and extended the corresponding
discussions.

It’s not really clear to me whether “literature baselines” refer to numbers taken from the
literature, or whether the authors ran these codes themselves. In the case of the former, this
information is often missing, so the point should at least be acknowledged. At a minimum the
number of property evaluations for the JANUS method should be discussed for all
objectives.

We thank the reviewer for pointing out that it was unclear what “literature baselines” meant.
We indicated in each table caption which baselines were not taken directly from other papers; however, the majority of literature baselines were taken directly from other papers. To follow the recommendation of the reviewer, we explicitly acknowledged that the number of property evaluations can only be compared for a subset of the methods mentioned.

Additionally, as requested by the reviewer, we added the corresponding property evaluation
numbers explicitly to each of the other benchmarks when available, compared them in the
main text and added a discussion of these results.

2. Another factor that complicates comparisons of methods is any difference in “chemical
constraints” of the methods. For example, it is much harder to get good docking scores if
one imposes synthetic accessibility constraints in the objective (DOI:
10.7717/peerj-pchem.18). Other ML trained on known molecules may have similar
constraints built in inherently which may result in lower docking scores. I don’t know if this
applies to any of the methods that the authors compare to but this should at least be
considered, because a method that gives very good docking scores but synthetically
inaccessible molecules is a worse method than a method that gives lower docking scores
but synthetically accessible molecules.

We thank the reviewer for this very important point. We agree fully with the reviewer and,
therefore, we explicitly acknowledged this as a general problem when comparing generative
models as part of the discussion of our results. We would also like to point out that, to
implement the suggestions of reviewer 4, we added comprehensive new results that
explicitly investigate the synthesizabilities of the structures generated in the imitated
inhibition benchmark tasks and the molecular docking benchmark tasks. These results show
that, in the molecular docking benchmark tasks, the synthesizabilities of the generated
molecules are low and get worse as the optimization progresses and higher molecular
docking scores are reached. Additionally, in the imitated inhibition task, we find that when synthesizability is explicitly incorporated in the benchmark objective, the generated structures tend to be significantly more synthetically feasible and generally more stable. Hence, we believe it is important to explicitly evaluate
synthesizabilities in the benchmark results which would also allow for a more sensible
comparison of alternative molecular generative models in the corresponding benchmark
tasks even when the methods employed have differing underlying structural constraints.

3. Similarly to point 2, it is easier to get high novelty and diversity in the absence of synthetic accessibility constraints. This should also be considered when comparing methods and when reporting these values, as they might be much lower in real applications where synthetic accessibility is included.

We thank the reviewer for this comment and we agree with this statement. In our work, we
reported novelty and diversity only for the imitated inhibition benchmarks. We would like to
emphasize that in these benchmarks, the last task, labeled task D in the corresponding table
in our work, explicitly includes QED and the SAscore in the optimization objective. The
objective is to not only design inhibitors for the two proteins but also for the molecules to
have QED ≥ 0.6 and an SAscore ≤ 4.0. Hence, we believe that comparing performance on
task D in that benchmark does not suffer from the drawback the reviewer is pointing out. We
indeed observe that novelty is significantly reduced when using JANUS on task D compared
to task C. However, we do not observe a reduced diversity due to the presence of this
additional optimization objective. To emphasize this point, we mentioned it explicitly when
discussing the corresponding results.

4. It is not clear what “ZINC” and “Train” mean in Table III

We thank the reviewer for pointing this out. We added an explanation to the corresponding
table caption to clarify what these table entries mean.

Jan Jensen (I choose to review this paper non-anonymously)

Referee: 2

Dear Authors, I enjoyed very much reading this work. It is simply state-of-the-art and fills a big gap, and JANUS, a genetic algorithm inspired by parallel tempering, is a great idea and will be useful in various fields of science and engineering.

We thank the reviewer for these kind comments.

Referee: 3
Nigam et al. presented a new genetic algorithm-based approach for molecular property optimization using the SELFIES representation. They used two pools of molecules to enhance search/optimization efficiency. In addition, they also used a DNN-based model to select molecules with favorable properties. Overall, the approach shows better or comparable performance to existing approaches and provides theoretical advances. I recommend the publication of the paper after revising some parts of the manuscript.

1. The authors wrote that the explorative population consists of high fitness
solutions/molecules and the exploitative population consists of most similar molecules to
their parents. This design principle does not seem to be straightforward. I believe that
authors introduced the explorative population for more extensive search on chemical space.
However, it is well-known that selecting the best solutions can easily make a search trapped
into a local minimum/maximum. Thus, the authors should provide more detailed explanation
on the design principles of the two populations.

We thank the reviewer for this important comment. There is another feature of the
explorative population that we had only mentioned in the Supplementary Information but not
in the main text before. Inspired by Fermi-Dirac statistics, we assign the explorative
population an effective “temperature” that determines the probability of lower fitness
molecules to be propagated to the genetic operators and subsequent generations. Hence,
the explorative population not only consists of high fitness solutions. The idea of this feature
is to increase the chance of escaping local optima. Hence, we added several sentences to
the main text explaining this feature. Additionally, we also explicitly refer to the
corresponding section in the Supplementary Information that explains this feature in more
detail including the equation used to determine the corresponding probabilities.
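The Fermi-Dirac-inspired propagation mechanism described above can be illustrated with a short sketch. This is a simplified reconstruction, not the exact JANUS implementation; the function and parameter names (`selection_probability`, `temperature`, the median-based threshold) are hypothetical choices made for the example.

```python
import math
import random

def selection_probability(fitness, fitness_threshold, temperature):
    """Fermi-Dirac-style probability that a molecule is propagated.

    Molecules well above the fitness threshold are propagated with probability
    close to 1; lower-fitness molecules retain a temperature-dependent, nonzero
    chance of survival, which helps the search escape local optima.
    """
    return 1.0 / (math.exp((fitness_threshold - fitness) / temperature) + 1.0)

def propagate(population, fitnesses, temperature):
    """Stochastically select molecules for the genetic operators."""
    # Hypothetical choice of threshold: the median fitness of the population.
    threshold = sorted(fitnesses)[len(fitnesses) // 2]
    return [
        mol for mol, f in zip(population, fitnesses)
        if random.random() < selection_probability(f, threshold, temperature)
    ]
```

At high temperature the probability flattens toward 0.5 for every molecule (maximal exploration); as the temperature approaches zero it becomes a hard cutoff at the threshold (pure exploitation of high-fitness solutions).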

2. As the authors suggested, the diversity of generated molecules is important for efficient
search of chemical space. The change of the diversity of the two populations during search
should be provided.

We thank the reviewer for this insightful comment. We added a new plot as Supplementary
Figure 4 to the Supporting Information comparing the diversities of the molecules generated
in the unconstrained penalized log P maximization experiments. We find that the diversities
in both the explorative and the exploitative populations are high as long as the selection
pressure is not too strict. Without additional selection pressure, the diversities in the
explorative and exploitative populations are hardly distinguishable. However, we also find
that the diversities are generally significantly lower in the explorative than in the exploitative
population when additional selection pressure is applied. In our view, this shows that JANUS
could be improved even further by increasing the diversities in the explorative population and
possibly even decreasing the diversities in the exploitative population. Accordingly, we also
added a discussion of these observations to the Supporting Information.

3. More details of DNN training should be provided. The authors wrote that the DNN model
for selection pressure was trained at every generation using solutions with known fitness.
Does this mean that the DNN model was trained using both the initial molecules and the
generated molecules until previous generation? How much computational time was required
to train the model at every generation? More detailed computational time profile would be
helpful to readers.

We would like to thank the reviewer for this comment. To follow the recommendation of the
reviewer, we clarified this point by stating explicitly that, in each generation, both the initial molecules and all molecules generated up to the previous generation are used for training.
Additionally, we also added timing information for training the neural network models to the
Methods section of the manuscript. Typical training times are a few minutes per generation.

4. More detailed explanation on docking calculation should be provided. Did authors perform
actual SMINA calculation for every generated molecule? Or did authors used a pre-trained
docking score predictor similar to the imitated inhibition case? If docking calculations were
actually performed, how much computational time did it take? Please provide more details.

We would like to thank the reviewer for this comment. We performed actual docking
calculations via SMINA for every molecule generated by JANUS and we mention this more
explicitly in the updated text to clarify this point. The timing of the docking evaluations varies considerably with the size of the molecule; we observed individual evaluations to take between 1 and 10 minutes. However, we did not collect and store timing
information for the docking simulations as every generative model tackling this benchmark
task has to perform the same kind of docking simulations so providing this information will
not be useful for their comparison. Accordingly, we think that comparing the number of
property evaluations needed is a better way for comparison between alternative molecular
generative models.

5. Comment: In terms of docking score optimization, several approaches that use docking
score predictors instead of actually performing docking have been suggested:
https://pubs.acs.org/doi/10.1021/acscentsci.0c00229,
https://www.mdpi.com/1422-0067/22/21/11635,
https://pubs.acs.org/doi/abs/10.1021/acs.jctc.1c00810. Please consider citing them.

We thank the reviewer for this comment. We clarified more explicitly that actual docking
calculations with SMINA were performed. In addition, we also mentioned that surrogate
docking score predictors have been developed and added citations to these three papers as
suggested by the reviewer.

6. The caption of supplementary figure 1-(b) is not sufficient. Please provide more details for
readability.

We thank the reviewer for this comment. However, Supplementary Figure 1-(b) does not
exist in our manuscript. Hence, we assumed that the reviewer is referring to what was
Extended Data Figure 1-(b) in the previous version of the manuscript. First, we moved this
figure to the Supplementary Information; it is now Supplementary Figure 2, because Digital
Discovery does not support Extended Data Figures. Second, we followed the
recommendations of the reviewer and provided more details for the caption of (b) to describe
the generation of crossover structures more comprehensively.

Referee: 4
The manuscript presents a new approach for structure enumeration guided by different
models (QSAR or docking) using a surrogate DNN model and a tempered genetic algorithm.
The authors compared performance of the implemented workflow with their previous
approach and other approaches from literature. The study is well and concisely described
with an almost sufficient level of detail. The manuscript fully fits the scope of the journal
and can be published after a minor revision.

The major value of generative approaches is enumeration of novel chemically reasonable
and feasible molecules. While the optimization workflow implemented in this study is
reasonable, it was combined with a structure generator which does not take into account
chemical sense and thus resulted in a very large number of artificial structures whose scores
were reported as superior. However, comparison of scores obtained for artificial structures
misleads a reader. While for the penalized logP task one may expect that completely artificial
molecules will be on top, that was not expected for other tasks: imitating inhibition and
molecular docking. The authors did not provide the complete results for the imitated
inhibition task and just showed some examples in the supplementary materials. So, I could
review only results of the docking task. The enumerated molecules are completely artificial
and not relevant. For example, the molecule from gen23 of 5HT1B/5T1B_ga_NN_class
C=C=C=c1c2c([nH]c#ccc3c1=C3)C(C=C=C=C=CCC1=CC=C=C=C=CC=C=C=CC#C1)=C=
C2 contains a 9-membered aromatic ring with a triple bond. This even breaks a formal Hückel rule. Top-scored molecules contain few or no polar atoms. This means that there is an issue with the chosen structure generator, which was fooled by the target model (in this case, docking). This makes the reported scores not very meaningful. I calculated SA scores for some molecules designed in the docking studies and the majority of them had scores greater than 6, which additionally supports my conclusion about the poor quality of the generated structures.

Authors mentioned the issue of synthetically infeasible molecules and suggested that SA scores should be included in the metrics. I agree with that and suggest to explicitly
discuss this issue in the paper in a short separate section and report their SA scores for
compounds which were taken into account to calculate target reported scores in imitating
inhibition and docking tasks and provide examples of molecules in the main text. This will
clearly indicate the issue and inform readers about it. This will not reduce the value of the
study as a whole, because the main objective was a new optimization workflow, which works
as expected. The issue here is a chosen structure generator and missing control on
structure feasibility.

We thank the reviewer for this important suggestion. To follow the recommendation of the
reviewer, we decided to investigate this important issue more systematically by evaluating
four distinct metrics quantifying synthesizability and structural complexity, namely the
SAscore, the SCScore, the SYBA score and the RAscore. In the imitated inhibition
benchmark tasks, in addition to referring more explicitly to the subset of structures provided
in the Supplementary Information, we also generated histograms of all four metrics across
the generated molecules that passed the respective benchmark constraints. These results
suggest that when the SAscore is not used to assess the molecules, the generated
structures can be highly infeasible. Our results also show that when the SAscore is used to
evaluate the molecules, the corresponding structures are regarded as more likely to be
synthesizable. Furthermore, in the molecular docking benchmark tasks, we generated plots
that show the trajectory of these four synthesizability metrics across all generated molecules
in each generation. These results suggest that the molecules become less likely to be
synthesizable as the docking scores increase. This supports our suggestion that it is
important to incorporate synthesizability evaluations explicitly in molecular design
benchmarks. Additionally, we believe that these additional paragraphs inform the reader
directly of these problems in the presented benchmark results and the corresponding
generated molecules.
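The per-generation trajectory of a synthesizability metric described above amounts to a simple aggregation over generations. The sketch below assumes a hypothetical `score_fn` callable mapping a molecule to a scalar metric (for the SAscore, one could plug in RDKit's contrib `sascorer.calculateScore`, where higher values indicate harder-to-synthesize structures); it is an illustration, not the authors' analysis code.

```python
def metric_trajectory(generations, score_fn):
    """Mean synthesizability metric per generation.

    `generations` is a list of generations, each a list of molecules;
    `score_fn` maps a molecule to a scalar metric (e.g. SAscore, SCScore,
    SYBA score or RAscore).
    """
    return [sum(score_fn(mol) for mol in gen) / len(gen) for gen in generations]
```

Plotting this list against the generation index gives the kind of trajectory discussed above, e.g. a rising mean SAscore as docking scores improve.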

Other remarks:
1. Please publish results for imitating inhibition to enable their analysis.

We thank the reviewer for this important point that we had missed before. We have followed
the recommendation of the reviewer and uploaded the corresponding results to our GitHub
repository.

2. As I understood, on each iteration a new DNN model was constructed. How were thresholds for classifiers selected for each task? I did not find this in the manuscript and
supplementary materials. It could happen that the number of highly scored molecules grew
fast and the training set may become highly imbalanced if the thresholds were kept constant.
Did you observe this? If so, how did you solve this issue?

We would like to thank the reviewer for this important comment. The thresholds are not kept
constant but change every generation depending on the fitness values of all the molecules in
the training set. The user only needs to choose the percentage corresponding to the
percentile that determines the threshold between “good” and “bad” molecules each
generation. Accordingly, to follow the recommendation of the reviewer, we described this
procedure in detail in the Methods section of the updated manuscript.

3. Which 51 rdkit descriptors were used for DNN modeling? They are not explicitly
mentioned.

We thank the reviewer for pointing out this oversight on our part. To follow the
recommendation of the reviewer, we added an explicit description of the 51 descriptors that
were used to the Supplementary Information of the manuscript as Section S5.

4. Please provide a little more detail about DNN model building. In particular, how were training and test sets created on each iteration?

We would like to thank the reviewer for this comment. To follow the recommendation of the
reviewer, we have added additional details and clarifications for the DNN model building to
the Methods section.

5. Please provide version numbers for software used in the study.

We would like to thank the reviewer for this comment. Explicit software version numbers are
now provided in the Methods section of the updated manuscript and we also adapted the
package requirements in our GitHub repository.

Kind regards,
Pavel Polishchuk

FILES TO PROVIDE WITH YOUR REVISED MANUSCRIPT:
Please delete any redundant files before completing the submission.
• A point-by-point response to the comments made by the reviewers
• Your revised manuscript with any changes clearly marked (PDF file)
• Your revised manuscript as a TEX file including figures, without highlighting, track
changes and a final PDF version including figures.
• High quality images as separate numbered Figures, Schemes or Charts in .tif, .eps or .pdf
format, with a resolution of 600 dpi or greater.
• A table of contents entry: graphic maximum size 8 cm x 4 cm and 1-2 sentence(s) of
editable text, with a maximum of 250 characters, highlighting the key findings of the work. It
is recommended authors make use of the full space available for the graphic.

As requested, we added both a table of contents graphic and a table of contents text to the
end of the main manuscript after the references.

• Your revised Electronic Supplementary Information




Round 2

Revised manuscript submitted on 28 Mar 2022
 

19-Apr-2022

Dear Professor Aspuru-Guzik:

Manuscript ID: DD-ART-01-2022-000003.R1
TITLE: Parallel Tempered Genetic Algorithm Guided by Deep Neural Networks for Inverse Molecular Design

Thank you for your submission to Digital Discovery, published by the Royal Society of Chemistry. I sent your manuscript to reviewers and I have now received their reports which are copied below.

After careful evaluation of your manuscript and the reviewers’ reports, I will be pleased to accept your manuscript for publication after revisions.

Please revise your manuscript to fully address the reviewers’ comments. When you submit your revised manuscript please include a point by point response to the reviewers’ comments and highlight the changes you have made. Full details of the files you need to submit are listed at the end of this email.

Digital Discovery strongly encourages authors of research articles to include an ‘Author contributions’ section in their manuscript, for publication in the final article. This should appear immediately above the ‘Conflict of interest’ and ‘Acknowledgement’ sections. I strongly recommend you use CRediT (the Contributor Roles Taxonomy from CASRAI, https://casrai.org/credit/) for standardised contribution descriptions. All authors should have agreed to their individual contributions ahead of submission and these should accurately reflect contributions to the work. Please refer to our general author guidelines https://www.rsc.org/journals-books-databases/author-and-reviewer-hub/authors-information/responsibilities/ for more information.

Please submit your revised manuscript as soon as possible using this link:

*** PLEASE NOTE: This is a two-step process. After clicking on the link, you will be directed to a webpage to confirm. ***

https://mc.manuscriptcentral.com/dd?link_removed

(This link goes straight to your account, without the need to log in to the system. For your account security you should not share this link with others.)

Alternatively, you can login to your account (https://mc.manuscriptcentral.com/dd) where you will need your case-sensitive USER ID and password.

You should submit your revised manuscript as soon as possible; please note you will receive a series of automatic reminders. If your revisions will take a significant length of time, please contact me. If I do not hear from you, I may withdraw your manuscript from consideration and you will have to resubmit. Any resubmission will receive a new submission date.

The Royal Society of Chemistry requires all submitting authors to provide their ORCID iD when they submit a revised manuscript. This is quick and easy to do as part of the revised manuscript submission process.   We will publish this information with the article, and you may choose to have your ORCID record updated automatically with details of the publication.

Please also encourage your co-authors to sign up for their own ORCID account and associate it with their account on our manuscript submission system. For further information see: https://www.rsc.org/journals-books-databases/journal-authors-reviewers/processes-policies/#attribution-id

I look forward to receiving your revised manuscript.

Yours sincerely,
Dr Kedar Hippalgaonkar
Associate Editor, Digital Discovery
Royal Society of Chemistry

************


 
Reviewer 1

The authors have addressed my concerns
Jan Jensen

Reviewer 4

The authors revised the manuscript considerably, but there are still some issues.

I cannot agree with the statement that the low rate of synthetically feasible molecules in the output is an issue of the benchmarks (imitated inhibition and docking). It is an issue of the structure generator, which ignores synthetic accessibility by default and thus requires additional restrictions. I suggest correcting the corresponding sentences.

I suggest rewriting the conclusions. It should be mentioned that, while the developed workflow can generate molecules with higher scores than the reference approaches, a large part of the generated molecules are synthetically infeasible, and consideration of synthetic feasibility should be included in future analyses of such results. The corresponding changes should be made in the abstract as well.

I also suggest adding representative structures with the highest docking scores, or structures corresponding to the mode of the score distribution, to the main text to better inform the reader about what kinds of structures are generated in each task. Even in the supplementary material, examples of structures generated in the docking tasks were not provided.

I suggest plotting, in a separate figure, the distributions of synthetic accessibility scores of the molecules whose docking scores were counted in Table 4. Currently, synthetic accessibility scores are given per iteration, which hides the real picture.

This issue was overlooked in my previous review. Smina, which was used for docking in this study, cannot work properly with macrocycles because it cannot sample conformations of ring systems. Since there are many macrocycles in the docking output, the results were affected by this issue, and this should be mentioned in the text.

It is better to replace Figure S15 with a table, because it is currently hard to understand the real distribution. It is possible to split the whole range 0–1 into bins and report the occupancy of each bin.

Pavel Polishchuk

Reviewer 3

The authors successfully addressed all my concerns.


 

Referee: 4
The authors revised the manuscript considerably, but there are still some issues. I cannot agree with the statement that the low rate of synthetically feasible molecules in the output is an issue of the benchmarks (imitated inhibition and docking). It is an issue of the structure generator, which ignores synthetic accessibility by default and thus requires additional restrictions. I suggest correcting the corresponding sentences.
We thank the reviewer for this important remark. We agree with the reviewer insofar as the problem can be considered to lie in the structure generator, which ignores synthetic accessibility. However, we disagree with the reviewer that the problem cannot equally be considered to lie in the benchmarks, which ignore synthetic accessibility. It is clear that a structure generator that explicitly accounts for synthetic accessibility will produce structures that are more synthetically feasible. However, our results in the imitated inhibition benchmark demonstrate that a structure generator that does not account for synthetic accessibility, in conjunction with a benchmark that does explicitly account for it, will also result in the generation of more synthetically feasible structures. Hence, we believe there are two equally valid approaches to overcome this issue; they are two sides of the same coin. It is a priori unclear which of the two will produce better results, and we believe that will likely depend on the problem at hand. To acknowledge the recommendation of the reviewer, we changed the wording throughout the manuscript to reflect this dichotomy more appropriately.

I suggest rewriting the conclusions. It should be mentioned that, while the developed workflow can generate molecules with higher scores than the reference approaches, a large part of the generated molecules are synthetically infeasible, and consideration of synthetic feasibility should be included in future analyses of such results. The corresponding changes should be made in the abstract as well.
We thank the reviewer for this important remark. To follow the recommendation of the reviewer, we implemented the proposed changes in both the abstract and the conclusions.

I also suggest adding representative structures with the highest docking scores, or structures corresponding to the mode of the score distribution, to the main text to better inform the reader about what kinds of structures are generated in each task. Even in the supplementary material, examples of structures generated in the docking tasks were not provided.
We thank the reviewer for this comment. To follow the recommendation of the reviewer, we decided to add a figure with examples of structures generated in the docking tasks to the SI and to refer to the corresponding figure in the main text, as most of the structures provided for the other benchmarks are also in the SI.


I suggest plotting, in a separate figure, the distributions of synthetic accessibility scores of the molecules whose docking scores were counted in Table 4. Currently, synthetic accessibility scores are given per iteration, which hides the real picture.
We thank the reviewer for this remark. To follow the recommendation of the reviewer, we plotted the corresponding histograms, added the one for the SYBA score to the main text, and referred to the others, which we added to the Supplementary Material.
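The pooled, per-molecule histogram described here (as opposed to per-iteration score traces) could be produced along these lines. The function name, bin count, and file path are illustrative assumptions, not the plotting code used for the manuscript:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # render without a display
import matplotlib.pyplot as plt

def plot_score_distribution(scores, xlabel="SYBA score", bins=30,
                            path="score_distribution.png"):
    """Plot one histogram over all selected molecules, pooling every
    iteration, so the overall score distribution is visible at a glance."""
    fig, ax = plt.subplots()
    ax.hist(np.asarray(scores, dtype=float), bins=bins)
    ax.set_xlabel(xlabel)
    ax.set_ylabel("Number of molecules")
    fig.tight_layout()
    fig.savefig(path, dpi=300)
    plt.close(fig)
    return path
```

Pooling all molecules into one axis avoids the per-iteration averaging that, as the reviewer notes, can hide the real distribution.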


This issue was overlooked in my previous review. Smina, which was used for docking in this study, cannot work properly with macrocycles because it cannot sample conformations of ring systems. Since there are many macrocycles in the docking output, the results were affected by this issue, and this should be mentioned in the text.
We thank the reviewer for pointing out this important issue. Accordingly, we added a remark to the text stating that the algorithm used cannot handle macrocycles properly and, hence, the corresponding docking scores are unreliable.


It is better to replace Figure S15 with a table, because it is currently hard to understand the real distribution. It is possible to split the whole range 0–1 into bins and report the occupancy of each bin.
We thank the reviewer for pointing out the deficiencies of Figure S15. To alleviate the corresponding problems, we made two changes. First, we modified the plots in Figure S15 and decreased the total number of bins, which leads to a larger bin width and, in our view, makes the corresponding distributions easier to grasp. Additionally, as recommended by the reviewer, we also provided the areas of the corresponding bins in Supplementary Table 4, which is placed directly after Figure S15.
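The binned table the reviewer requests can be generated mechanically; a small sketch, where the bin count and the 0–1 range follow the reviewer's suggestion and the helper name is our own:

```python
import numpy as np

def bin_occupancy(scores, n_bins=10, lo=0.0, hi=1.0):
    """Return (left_edge, right_edge, count) rows covering [lo, hi],
    i.e. the table form of a histogram over the 0-1 score range."""
    counts, edges = np.histogram(np.asarray(scores, dtype=float),
                                 bins=n_bins, range=(lo, hi))
    return [(float(edges[i]), float(edges[i + 1]), int(c))
            for i, c in enumerate(counts)]
```

Each row maps directly onto one line of the supplementary table, making the distribution readable without any plotting at all.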

Pavel Polishchuk




Round 3

Revised manuscript submitted on 22 Apr 2022
 

03-May-2022

Dear Professor Aspuru-Guzik:

Manuscript ID: DD-ART-01-2022-000003.R2
TITLE: Parallel Tempered Genetic Algorithm Guided by Deep Neural Networks for Inverse Molecular Design

Thank you for submitting your revised manuscript to Digital Discovery. I am pleased to accept your manuscript for publication in its current form. I have copied any final comments from the reviewer(s) below.

You will shortly receive a separate email from us requesting you to submit a licence to publish for your article, so that we can proceed with the preparation and publication of your manuscript.

You can highlight your article and the work of your group on the back cover of Digital Discovery. If you are interested in this opportunity please contact the editorial office for more information.

Promote your research, accelerate its impact – find out more about our article promotion services here: https://rsc.li/promoteyourresearch.

If you would like us to promote your article on our Twitter account @digital_rsc please fill out this form: https://form.jotform.com/213544038469056.

Thank you for publishing with Digital Discovery, a journal published by the Royal Society of Chemistry – connecting the world of science to advance chemical knowledge for a better future.

With best wishes,

Dr Kedar Hippalgaonkar
Associate Editor, Digital Discovery
Royal Society of Chemistry




Transparent peer review

To support increased transparency, we offer authors the option to publish the peer review history alongside their article. Reviewers are anonymous unless they choose to sign their report.

We are currently unable to show comments or responses that were provided as attachments. If the peer review history indicates that attachments are available, or if you find there is review content missing, you can request the full review record from our Publishing customer services team at RSC1@rsc.org.

Find out more about our transparent peer review policy.

Content on this page is licensed under a Creative Commons Attribution 4.0 International license.