From the journal Digital Discovery: peer review history

Enhancing diversity in language based models for single-step retrosynthesis

Round 1

Manuscript submitted on 14 Oct 2022
 

08-Dec-2022

Dear Dr Toniato:

Manuscript ID: DD-ART-10-2022-000110
TITLE: Enhancing diversity in language based models for single-step retrosynthesis

Thank you for your submission to Digital Discovery, published by the Royal Society of Chemistry. I sent your manuscript to reviewers and I have now received their reports which are copied below.

After careful evaluation of your manuscript and the reviewers’ reports, I will be pleased to accept your manuscript for publication after revisions.

Please revise your manuscript to fully address the reviewers’ comments. When you submit your revised manuscript please include a point by point response to the reviewers’ comments and highlight the changes you have made. Full details of the files you need to submit are listed at the end of this email.

Digital Discovery strongly encourages authors of research articles to include an ‘Author contributions’ section in their manuscript, for publication in the final article. This should appear immediately above the ‘Conflict of interest’ and ‘Acknowledgement’ sections. I strongly recommend you use CRediT (the Contributor Roles Taxonomy, https://credit.niso.org/) for standardised contribution descriptions. All authors should have agreed to their individual contributions ahead of submission and these should accurately reflect contributions to the work. Please refer to our general author guidelines https://www.rsc.org/journals-books-databases/author-and-reviewer-hub/authors-information/responsibilities/ for more information.

Please submit your revised manuscript as soon as possible using this link :

*** PLEASE NOTE: This is a two-step process. After clicking on the link, you will be directed to a webpage to confirm. ***

https://mc.manuscriptcentral.com/dd?link_removed

(This link goes straight to your account, without the need to log in to the system. For your account security you should not share this link with others.)

Alternatively, you can login to your account (https://mc.manuscriptcentral.com/dd) where you will need your case-sensitive USER ID and password.

You should submit your revised manuscript as soon as possible; please note you will receive a series of automatic reminders. If your revisions will take a significant length of time, please contact me. If I do not hear from you, I may withdraw your manuscript from consideration and you will have to resubmit. Any resubmission will receive a new submission date.

The Royal Society of Chemistry requires all submitting authors to provide their ORCID iD when they submit a revised manuscript. This is quick and easy to do as part of the revised manuscript submission process.   We will publish this information with the article, and you may choose to have your ORCID record updated automatically with details of the publication.

Please also encourage your co-authors to sign up for their own ORCID account and associate it with their account on our manuscript submission system. For further information see: https://www.rsc.org/journals-books-databases/journal-authors-reviewers/processes-policies/#attribution-id

I look forward to receiving your revised manuscript.

Yours sincerely,
Dr Kedar Hippalgaonkar
Associate Editor, Digital Discovery
Royal Society of Chemistry

************


 
Reviewer 1

The reviewed manuscript proposes a prompt-based model for single-step retrosynthesis. The main objective of the manuscript is to increase the diversity of predicted disconnection strategies in single-step models. The approach presented better performance on class diversity.
Following are my comments about the manuscript:
1. More comparisons with the latest works (covering Template-based, SMILES-based, and Graph-based methods) should be considered. Especially, the comparison with a recent prompt-based work "Unbiasing Retrosynthesis Language Models with Disconnection Prompts", which used a different disconnection prompt tactic, should be shown in the article.
2. I have some concerns about the designed prompt token. The strategy indeed produces various combinations of reactants, but not every disconnection strategy is suitable for all molecules, which may simultaneously produce more invalid results. I wonder whether such a phenomenon was observed during the experiments.
3. In Conclusions and Outlook, the article should discuss more thoroughly the advantages and disadvantages of the authors' prompt strategies.
4. Combining the t-SNE projections from the different models into a single figure may demonstrate model performance more clearly.
5. More description of the details of the 12 class tokens should be provided to make the paper easier to understand.

Reviewer 2

# codes:

1. To reproduce the code, I found one file 'preprocessed_rxn_cluster_token_prompt.vocab.pt' required for model training is missing.
Therefore I checked "No" at 6b (Are scripts to reproduce the findings in the paper provided? ).

# general/technical comments:

1. In the main text, the authors show decreased accuracy and round-trip accuracy when using cluster token prompt models (table 1) on the Pistachio dataset, which seems reasonable. I am curious about the result shown in Table 2 (USPTO-50K) not following the same trend. Instead, the accuracy and round-trip accuracy increase a lot compared to the baseline model. Can the authors comment on the reason that makes the difference?

2. In the last few sentences on page 3, the authors mentioned
"This happens for example if the model predicts an identical set of reactants (i.e. molecules into which the target is disconnected) and a different solvent "
Does that mean the retrosynthesis model was also trained to predict the reagent as well? If that is the case, one may imagine low diversity in predictions since the same reaction can be conducted with multiple different solvents/reagents/catalysts. Is it possible that the authors focus on predicting the main reactants to avoid the duplication of prediction?

3. Overall, the idea of adding reaction classes as prediction conditions is an interesting idea. Unfortunately, the practical improvement in accuracy seems less clear (higher diversity, but lower accuracy/round-trip accuracy). Any suggestion to increase the accuracy?

Reviewer 3

1. top1? pg 5 first paragraph. this might be a formatting error. I see it later on in the paper. But this caught me a little off guard. Does it stand for top one?
2. I am not convinced that the classification at the beginning is necessary. Can you provide more justification? It seems like the classification could be done after the prediction.
3. 2‘447‘596. Is this a typo or a formatting issue?
4. In figure 3 how do you define these metrics? For example, what is convergence based on?
5. Does the 12clusters only consider the first level classification or do all the models consider the classification?
6. What was your reason for using the Pistachio database? I see you also used US Patents. What about web scraping? I see an explanation of Pistachio in the Data section but no justification for its use. Maybe it was the best database available?
7. In section 3.2, I am trying to figure out what is different from the Schwaller et al. approach to retrosynthesis. I see the classification token you added at the beginning, but it was not obvious whether that was necessary. Is the forward prediction step your new contribution? Please elaborate.
8. This statement in your conclusion is false, "Our work is the first approach tackling and analysing diversity directly"
9. If your major contribution or finding in this research is the forward step you need to provide more details. This is interesting but you need to provide more analysis.
10. Ultimately, these retrosynthesis techniques will still be limited by the database. If the database is missing a species, which the reader cannot tell from how the manuscript is written, this will limit the model. For example, if there are no species that contain fluorine in the Pistachio dataset, you certainly will not get a fluorine-containing structure out of this model. This should be mentioned or addressed in the manuscript.
11. I can tell you spent a great deal of time producing figures, but I would ask the authors to spend time making sure you are getting the message across that you intend. This article seems like an advertisement for IBM. I am not sure anyone can reproduce these results. Did you use the IBM Cognos Transformer software? A few more details would increase the reproducibility of this research.


 

We thank the three reviewers for their comments and valuable input.

Review 1
The reviewed manuscript proposes a prompt-based model for single-step retrosynthesis. The main objective of the manuscript is to increase the diversity of predicted disconnection strategies in single-step models. The approach presented better performance on class diversity.

1.1 Reviewer’s points

• More comparisons with the latest works (covering Template-based, SMILES-based, and Graph-based methods) should be considered. Especially, the comparison with a recent prompt-based work "Unbiasing Retrosynthesis Language Models with Disconnection Prompts", which used a different disconnection prompt tactic, should be shown in the article.
ANSWER: Based on the reviewer’s suggestion we have extended the ‘Introduction’ section with more specific information about previous work. Our manuscript focuses on language-based models, and the intention was to demonstrate the increased diversity compared to a language-based baseline. A quantitative comparison to template-based and graph-based models is not directly applicable because, first, none of them trains a model on the Pistachio dataset and, second, on the USPTO-50k dataset they do not report performances averaged across different splits, nor values of class diversity, coverage and round-trip accuracy. Moreover, a recent publication (H. Tu, S. Shorewala, T. Ma and V. Thost, NeurIPS 2022 AI for Science: Progress and Promises, 2022) highlights the difficulty of setting up many of the previous works locally or on a GPU machine. As for the work “Unbiasing Retrosynthesis Language Models with Disconnection Prompts” (on ChemRxiv since 20 Sep 2022), it is a follow-up to the work we are presenting, exploring a different prompt strategy. Our manuscript has been de-anonymized on OpenReview since 25 Jul 2022.

• I have some concerns about the designed prompt token. The strategy indeed produces various combinations of reactants, but not every disconnection strategy is suitable for all molecules, which may simultaneously produce more invalid results. I wonder if there is such a phenomenon during experiments.
ANSWER: Indeed, such a phenomenon can be observed (a drop in the round-trip accuracy). In the manuscript we report the following: “The decay (of the round-trip accuracy) is more consistent for models utilizing a greater number of tokens (12clusters, 12clustersKmeans). Note that this is to be expected, since we are asking for disconnection conditions that may be impossible to satisfy for some input molecules. However, a high value of coverage guarantees at least one proposed valid disconnection per input molecule.” In addition, the following observation can be made: the prompting approach works as a ‘soft conditioning’, so prompting with a certain class does not necessarily output a prediction belonging to that class. This drastically reduces the generation of invalid predictions, as opposed to template-based strategies where the application of a reaction template is either successful or unsuccessful.

• In Conclusions and Outlook, the article may need intensely discuss the advantages and disadvantages of the author’s prompt strategies.
ANSWER: We extended the section ‘Conclusions and Outlook’ in order to be more exhaustive on the advantages of the method.

• Combining t-SNE projection from different models into a single Figure may clearly demonstrate model performance.
ANSWER: The t-SNE projections reported in the manuscript are generated from the clustering performed on the training-set reaction fingerprints prior to training. We added this specification to the manuscript. In this light, the authors believe that combining them would not add information. In addition, we also computed the t-SNE projection of the fingerprints of the top24 predictions for the baseline and 12tokens models, on a sample of 1000 targets from the test set. The plot is provided as a separate file (tsne_predictions.pdf). We observe that the plot is not informative, because it cannot distinguish between a model that predicts reactions of different classes overall but, on average, the same class within the predictions for one sample (like the baseline), and a model that, on the contrary, predicts reactions of different classes within the predictions for one sample (like the 12tokens model).

• More descriptions of details of the 12 class tokens should be provided to make the paper easier to understand.
ANSWER: We modified the manuscript to be more exhaustive in the explanation of the 12 class tokens, adding adequate references (Section 2.1).

Review 2
To reproduce the code, I found one file 'preprocessed_rxn_cluster_token_prompt.vocab.pt' required for model training is missing. Therefore I checked "No" at 6b (Are scripts to reproduce the findings in the paper provided?).
ANSWER: We apologize for the inconvenience. The error was caused by a non-existing preprocessing directory. We have now fixed it. We thank the reviewer for testing the code.

2.1 Reviewer’s points
• In the main text, the authors show decreased accuracy and round-trip accuracy when using cluster token prompt models (table 1) on the Pistachio dataset, which seems reasonable. I am curious about the result shown in Table 2 (USPTO-50K) not following the same trend. Instead, the accuracy and round-trip accuracy increase a lot compared to the baseline model. Can the authors comment on the reason that makes the difference?
ANSWER: We believe that the increased accuracy with respect to the baseline can be ascribed to the size and simplicity of the open-source dataset. Indeed, a model trained on USPTO-50k sees fewer examples from each of the classes, and this gives more specificity to the conditioning token, which provides an additional ‘hint’ to the model for the prediction with respect to the baseline (higher topN accuracy). Moreover, since the task is easier (reactants only), the round-trip accuracy also increases. For Pistachio this does not happen because the reaction space is much larger and more diverse and includes reagents. The conditioning in this case has access to more reactions, and therefore many predictions can include the original disconnection with different reagents (lower topN accuracy), which the proxy model might not be confident enough to validate (lower round-trip accuracy). We added this discussion to the manuscript.

• In the last few sentences on page 3, the authors mentioned "This happens for example if the model predicts an identical set of reactants (i.e. molecules into which the target is disconnected) and a different solvent". Does that mean the retrosynthesis model was also trained to predict the reagent as well? If that is the case, one may imagine low diversity in predictions since the same reaction can be conducted with multiple different solvents/reagents/catalysts. Is it possible that the authors focus on predicting the main reactants to avoid the duplication of prediction?
ANSWER: It is correct that the Pistachio model was trained to predict the reagents as well. Predicting all the precursors at once has been common practice in several groups for a long time, especially when language-based models are used. This allows models to support, without special attention, reactions where the reactant–reagent distinction is subtle; in addition, we believe that reagents are, from a chemistry perspective, useful for the general understanding of the mechanism. Independently of this preference, we have observed that removing the reagents does not improve diversity by itself: if a reagent-based model was predicting the same reactants with different reagents, the corresponding reagent-free model was simply predicting the same reactants multiple times. We added a sentence in section 3.2 to support the choice of not making this distinction.

• Overall, the idea of adding reaction classes as prediction conditions is an interesting idea. Unfortunately, the practical improvement in accuracy seems less clear (higher diversity, but lower accuracy/round-trip accuracy). Any suggestion to increase the accuracy?
ANSWER: Several publications have questioned the suitability of the top1/topN accuracy for single-step retrosynthesis models (see for instance: Schwaller et al., Chem. Sci., 2020, 11, 3316-3325; Lin et al., J. Cheminf., 14, 2022, 15). We believe that the round-trip accuracy is more appropriate, but it does not take into consideration that the predictions for a sample, even if correct, can all collapse into one. This happens for example if the model predicts an identical set of reactants multiple times (or, in the case with reagents, the same reactants multiple times with, for example, a different solvent). Class diversity, instead, takes this into account. A class diversity of 5 means that there are at least 5 valid predictions that are fairly different. The baseline has an average class diversity of 1.9 for 20 predictions, so even if all predictions are valid, on average only 2 are interesting because they are distinctly different from one another. If one were to aim at increasing the accuracy (and as a consequence probably the round-trip accuracy), one could consider weighting differently the examples in the training set which were not correctly predicted. We updated section 2.2 to clarify the discussion on the metrics.
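To make the metric concrete, here is a minimal sketch of how class diversity for a single target could be computed: count the distinct first-level reaction classes among the valid predictions. The function name, input types and class labels are illustrative, not the paper's actual implementation.

```python
from typing import Dict, List

def class_diversity(predictions: List[str],
                    is_valid: Dict[str, bool],
                    first_level_class: Dict[str, str]) -> int:
    """Number of distinct first-level classes among valid predictions
    for one target (hypothetical helper; real code may differ)."""
    classes = {first_level_class[p] for p in predictions if is_valid.get(p, False)}
    return len(classes)

# Toy example: five valid predictions collapse into only two classes,
# mirroring how many valid disconnections can still be near-duplicates.
preds = ["rxnA", "rxnB", "rxnC", "rxnD", "rxnE"]
valid = {p: True for p in preds}
cls = {"rxnA": "1", "rxnB": "1", "rxnC": "2", "rxnD": "2", "rxnE": "2"}
print(class_diversity(preds, valid, cls))  # 2
```

Averaging this count over the test set gives the reported average class diversity (e.g. 1.9 for the baseline over 20 predictions).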

Review 3

3.1 Reviewer’s points
• top1? pg 5 first paragraph. this might be a formatting error. I see it later on in the paper. But this caught me a little off guard. Does it stand for top one?
ANSWER: Yes, top1 stands for ‘top one’, i.e. the prediction that was ranked highest by the model. Collecting all the top1 predictions of, say, a 10-cluster-token model leads to a set of disconnections more diverse than the top10 (top ten) outputs of a regular Transformer model without prompts. We added this specification to the manuscript where top1 is first used.
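The collection scheme described above can be sketched as follows. Each cluster token conditions the same underlying model, and the top1 answers are pooled into one shortlist. The callable interface and the stand-in models are assumptions for illustration only.

```python
def diverse_shortlist(models, product_smiles):
    """Pool the top-1 prediction of each prompt-conditioned model.

    `models` maps a cluster token to a callable that returns ranked
    precursor sets for a product SMILES (hypothetical interface).
    """
    shortlist = []
    for token, model in models.items():
        top1 = model(product_smiles)[0]  # highest-ranked prediction
        if top1 not in shortlist:        # drop exact duplicates
            shortlist.append(top1)
    return shortlist

# Toy stand-ins for ten cluster-token models: two of them collapse
# onto the same disconnection, leaving nine distinct entries.
fake_models = {f"[{i}]": (lambda smi, i=i: [f"reactants_{min(i, 8)}"])
               for i in range(10)}
print(len(diverse_shortlist(fake_models, "CCO")))  # 9
```

The point of the scheme is that each prompt steers the decoder toward a different region of reaction space, so the pooled top1 set is more diverse than one model's top10 list.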

• I am not convinced that the classification at the beginning is necessary. Can you provide more justification? It seems like the classification could be done after the prediction.
ANSWER: The classification/clustering is needed at training time. Indeed, the input training samples (product SMILES) have the classification/cluster token prepended. This token is the class for the NameRXN class-based models (e.g. the ‘12clusters’ model) and the cluster id for the models where the classification was obtained by clustering reaction fingerprints. This prompt-based training is exactly what allows the greater diversity at prediction time.
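A minimal sketch of the preprocessing step described above, i.e. prepending a class/cluster token to the product SMILES on the source side of each training pair. The `[i]` token format is illustrative; the actual vocabulary symbols may differ.

```python
def add_prompt_token(product_smiles: str, cluster_id: int) -> str:
    """Prepend a cluster token to the product SMILES (source side of
    the sequence-to-sequence training pair). Token format is assumed."""
    return f"[{cluster_id}] {product_smiles}"

# Source line for one training example; the target side (the
# precursor SMILES) is left unchanged.
src = add_prompt_token("CC(=O)Oc1ccccc1C(=O)O", 3)
print(src)  # [3] CC(=O)Oc1ccccc1C(=O)O
```

At inference, the same product is submitted once per token, which is what turns one model into a family of conditioned predictors.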

• 2‘447‘596. Is this a typo or a formatting issue?
ANSWER: This is a formatting choice for the number 2447596, we can adjust the formatting to what is most suitable for the Journal. We changed it to 2 447 596.

• In figure 3 how do you define these metrics? For example, what is convergence based on?
ANSWER: The metrics in Figure 3 are thoroughly defined in section 3.4. We added a note to the Figure pointing to the relevant section. These metrics are used for the model evaluation, after training. During training, the models minimize the standard negative log-likelihood (maximum-likelihood) objective, as in standard sequence-to-sequence models (G. Klein, Y. Kim, Y. Deng, J. Senellart and A. Rush, Proceedings of ACL 2017, System Demonstrations, 2017, pp. 67–72). Model convergence is checked on the value of the loss function. Training was stopped at 260000 steps, as no further improvement in the loss was observed at later checkpoints. We made this clearer in the revised manuscript.

• Does the 12clusters only consider the first level classification or do all the models consider the classification?
ANSWER: The 12clusters model considers the first-level classification for training. The 3clustersRandom and 4clustersRandom models group together the tokens of the first-level classification for training. The other Kmeans models instead do not use the NameRXN classification during training, but reaction clusters based on reaction fingerprints (P. Schwaller, D. Probst, A. C. Vaucher, V. H. Nair, D. Kreutter, T. Laino and J.-L. Reymond, Nat. Mach. Intell., 2021, 3, 144–152). The first-level NameRXN classification is then used for all models at testing time to compute the class diversity: the first-level class is checked in order to determine the average number of different classes among the valid predictions.
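The fingerprint-based clustering mentioned above can be sketched with standard k-means. The random vectors below are only stand-ins for learned reaction fingerprints (the cited Schwaller et al. rxnfp vectors); dimensions and parameters are assumptions, not the paper's settings.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Stand-in for learned reaction fingerprints; in the actual workflow
# these would come from a reaction-fingerprint model, not random data.
fingerprints = rng.normal(size=(1000, 256))

# Partition the training reactions into 12 clusters; each reaction's
# cluster id then becomes its prompt token for training.
kmeans = KMeans(n_clusters=12, n_init=10, random_state=0)
cluster_ids = kmeans.fit_predict(fingerprints)
print(len(set(cluster_ids.tolist())))  # 12
```

Unlike the NameRXN-based tokens, these cluster ids carry no predefined chemical meaning; they simply partition the reaction space into regions the prompts can address.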

• What was your reason for using the Pistachio database? I see you also used US Patents. What about web scraping? I see an explanation of Pistachio in the Data section but no justification for its use. Maybe it was the best database available?
ANSWER: Pistachio (proprietary, by NextMove Software) is, similarly to the commonly used USPTO dataset(s), based on patents. The main difference is that NextMove Software keeps updating it with newly published patents and with improvements to the extraction software, which makes it a superior dataset. The USPTO-50k was used as an alternative dataset because, while limited in its diversity, it is the common way to test AI-based retrosynthesis models, it is freely available, and it will enable others to reproduce our results. We have not yet considered web scraping because, as of today, no method exists that is accurate enough to extract reactions in text (SMILES) format from the web. We added this justification to the manuscript.

• In section 3.2, I am trying to figure out what is different from the Schwaller et al. approach to retrosynthesis. I see the classification token you added at the beginning, but it was not obvious whether that was necessary. Is the forward prediction step your new contribution? Please elaborate.
ANSWER: Section 3.2 only presents the Methods, i.e. the architecture used for the model. This architecture is the same that Schwaller et al. used, but we introduced prompt tokens at training time to build a conditional model. This was done to overcome the bias introduced by reaction classes that are better represented in the dataset than others, and to increase the diversity of the predictions for the retrosynthesis task. As explained in the Introduction: “To increase the diversity of the predictions in single-step text-based retrosynthesis models and counteract the effect of imbalanced datasets, we propose a prompt-based scheme to enhance and guide more diversity in the language model predictions. We introduce a modified transformer-based model. Inspired by works in natural language processing for prompt-based learning, we show that concatenating a class information during training (as an additional token) leads to more diverse predictions at inference.” Increasing diversity is necessary because there is no unique way to disconnect a target molecule; there are many. In traditional approaches, however, models are trained to synthesize a molecule in a single way, which is at odds with the chemistry. This is why the prompting scheme was introduced with this manuscript.

• This statement in your conclusion is false: "Our work is the first approach tackling and analysing diversity directly"
ANSWER: To the best of our knowledge, the statement is true. As of the day our work was posted as a preprint (de-anonymized on OpenReview since 25 Jul 2022), this was the first work on diversity. We updated the conclusion to be more specific: “Our work is the first AI approach tackling and analyzing retrosynthetic diversity directly”. We are not aware of any prior work, but will be glad to include one should anyone make us aware of it.

• If your major contribution or finding in this research is the forward step you need to provide more details. This is interesting but you need to provide more analysis.
ANSWER: We clarified our contribution in one of the previous questions of the referee.

• Ultimately, these retrosynthesis techniques will still be limited by the database. If the database is missing a species, which the reader cannot tell from how the manuscript is written, this will limit the model. For example, if there are no species that contain fluorine in the Pistachio dataset, you certainly will not get a fluorine-containing structure out of this model. This should be mentioned or addressed in the manuscript.
ANSWER: This is a limitation of any AI model. The primary source of information is the data, and of course if a specific atom is absent from the data, it will not be created out of nowhere. However, the advantage of these approaches lies in their speed, scalability and transferability to other datasets. The ability of AI-based systems to reason over a great amount of data can yield insights that a rule-based expert system could overlook.

• I can tell you spent a great deal of time producing figures, but I would ask the authors to spend time making sure you are getting the message across that you intend. This article seems like an advertisement for IBM. I am not sure anyone can reproduce these results. Did you use the IBM Cognos Transformer software? A few more details would increase the reproducibility of this research.
ANSWER: We outlined our procedure in Sections 3 and 4 of the manuscript. In particular, in Section 4 we provide the link to our open-source GitHub repository https://github.com/rxn4chemistry/rxn_cluster_token_prompt which contains the code to train the model and fully reproduce the results. We also trained the model on the USPTO-50k dataset, in an explicit effort to make the results reproducible by others, and we provide a link to download the models trained on USPTO-50k. The models trained on Pistachio cannot be shared because of proprietary licensing. “IBM” is mentioned only once in the manuscript, in the “availability” section, as a reference to the website where users can try the trained model for themselves in a graphical user interface.





Round 2

Revised manuscript submitted on 28 Dec 2022
 

01-Feb-2023

Dear Dr Toniato:

Manuscript ID: DD-ART-10-2022-000110.R1
TITLE: Enhancing diversity in language based models for single-step retrosynthesis

Thank you for your submission to Digital Discovery, published by the Royal Society of Chemistry. I sent your manuscript to reviewers and I have now received their reports which are copied below.

After careful evaluation of your manuscript and the reviewers’ reports, I will be pleased to accept your manuscript for publication after minor revisions that directly address reviewer #2's comments.

When you submit your revised manuscript please include a point by point response to the reviewers’ comments and highlight the changes you have made. Full details of the files you need to submit are listed at the end of this email.

Digital Discovery strongly encourages authors of research articles to include an ‘Author contributions’ section in their manuscript, for publication in the final article. This should appear immediately above the ‘Conflict of interest’ and ‘Acknowledgement’ sections. I strongly recommend you use CRediT (the Contributor Roles Taxonomy, https://credit.niso.org/) for standardised contribution descriptions. All authors should have agreed to their individual contributions ahead of submission and these should accurately reflect contributions to the work. Please refer to our general author guidelines https://www.rsc.org/journals-books-databases/author-and-reviewer-hub/authors-information/responsibilities/ for more information.

Please submit your revised manuscript as soon as possible using this link :

*** PLEASE NOTE: This is a two-step process. After clicking on the link, you will be directed to a webpage to confirm. ***

https://mc.manuscriptcentral.com/dd?link_removed

(This link goes straight to your account, without the need to log in to the system. For your account security you should not share this link with others.)

Alternatively, you can login to your account (https://mc.manuscriptcentral.com/dd) where you will need your case-sensitive USER ID and password.

You should submit your revised manuscript as soon as possible; please note you will receive a series of automatic reminders. If your revisions will take a significant length of time, please contact me. If I do not hear from you, I may withdraw your manuscript from consideration and you will have to resubmit. Any resubmission will receive a new submission date.

The Royal Society of Chemistry requires all submitting authors to provide their ORCID iD when they submit a revised manuscript. This is quick and easy to do as part of the revised manuscript submission process.   We will publish this information with the article, and you may choose to have your ORCID record updated automatically with details of the publication.

Please also encourage your co-authors to sign up for their own ORCID account and associate it with their account on our manuscript submission system. For further information see: https://www.rsc.org/journals-books-databases/journal-authors-reviewers/processes-policies/#attribution-id

I look forward to receiving your revised manuscript.

Yours sincerely,
Dr Kedar Hippalgaonkar
Associate Editor, Digital Discovery
Royal Society of Chemistry

************


 
Reviewer 1

The authors have addressed all my comments.

Reviewer 2

In my opinion, getting higher diversity at the expense of a significantly dropped round-trip accuracy (13-18%) may not be a convincing demonstration for the approach. That being said, addressing diversity in retrosynthesis prediction is an interesting question, so the paper could be published. However, in the present Abstract and Conclusion, this aspect of significant round-trip accuracy drop is not mentioned (Abstract), or not clear enough (Conclusion), and so I suggest the authors clarify these points.

One more question regarding the accuracy drop: The authors mention in the response letter, "we have observed that removing the reagents does not improve diversity by itself: if a reagent-based model was predicting the same reactants with different reagents, the corresponding reagent-free model was just predicting multiple times the same reactants." I am not sure what the authors mean by "the reagent-free model". Does it mean the new model retrained with all the reagents information removed? If the authors remove the reagents information and only focus on the prediction of reactants, does the round-trip accuracy still decrease as much?

Reviewer 3

The revision makes it clearer what your contribution is.

I do not think this is a true statement: "Our work is the first AI approach tackling and analysing retrosynthetic diversity directly." I would recommend removing this sentence completely.


 

We thank the reviewers for their accurate analysis of the manuscript.

RESPONSE to REVIEWER 2:

1) In my opinion, getting higher diversity at the expense of a significantly dropped round-trip accuracy (13-18%) may not be a convincing demonstration for the approach. That being said, addressing diversity in retrosynthesis prediction is an interesting question, so the paper could be published. However, in the present Abstract and Conclusion, this aspect of significant round-trip accuracy drop is not mentioned (Abstract), or not clear enough (Conclusion), and so I suggest the authors clarify these points.

ANSWER: We have clarified the point in the Conclusions. We report here the paragraph: "The decreased validity of the disconnections is softened by the nature of the prompts which act as ‘soft-conditioning’ terms as opposed to the valid/invalid application of reaction templates. Higher diversity comes at the cost of a drop of around 15 percentage points in the top20 round-trip accuracy. However, round-trip accuracy is still a noisy metric that depends on the applicability domain of the forward reaction prediction model and does not take duplicates into account and as such cannot be fully trusted."

2) One more question regarding the accuracy drop: The authors mention in the response letter, "we have observed that removing the reagents does not improve diversity by itself: if a reagent-based model was predicting the same reactants with different reagents, the corresponding reagent-free model was just predicting multiple times the same reactants." I am not sure what the authors mean by "the reagent-free model". Does it mean the new model retrained with all the reagents information removed? If the authors remove the reagents information and only focus on the prediction of reactants, does the round-trip accuracy still decrease as much?

ANSWER: Yes, the "reagent-free model" is a model retrained with all the reagents information removed, focusing only on the prediction of reactants. In our experience, reagent-free models do not increase diversity by themselves, as they often predict identical sets of reactants (in a different order, or with non-canonical SMILES strings), or are incorrect. The relevance of the round-trip accuracy is, as in the answer above, limited. While the comparison between the two types of models (reactants only vs. reactants+reagents) in terms of diversity would be interesting, the current work focuses on the latter, as such models are commonly preferred (see https://doi.org/10.1039/C9SC05704H). Performing a comprehensive comparison of class prompt models with and without reagents lies outside the scope of the current work.
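The deduplication issue described above (the same reactant set reappearing in a different order or as a different SMILES spelling) can be sketched as follows. This is an illustrative example, not code from the manuscript: the `canonicalize` function below is a placeholder so the snippet runs stand-alone; in practice one would canonicalize each fragment with RDKit (`Chem.MolToSmiles(Chem.MolFromSmiles(s))`) before comparing.

```python
def canonicalize(smiles: str) -> str:
    """Placeholder canonicalizer. Real use: RDKit's canonical SMILES,
    e.g. Chem.MolToSmiles(Chem.MolFromSmiles(smiles))."""
    return smiles.strip()

def reactant_key(prediction: str) -> frozenset:
    """A '.'-separated reactant set, compared order-insensitively."""
    return frozenset(canonicalize(f) for f in prediction.split("."))

def unique_predictions(predictions):
    """Keep only the first occurrence of each distinct reactant set."""
    seen, unique = set(), []
    for p in predictions:
        key = reactant_key(p)
        if key not in seen:
            seen.add(key)
            unique.append(p)
    return unique

# The first two predictions list the same reactants in a different
# order, so they collapse to a single entry.
preds = ["CCO.CC(=O)O", "CC(=O)O.CCO", "CCN.CC(=O)O"]
print(unique_predictions(preds))  # ['CCO.CC(=O)O', 'CCN.CC(=O)O']
```

Counting unique reactant sets rather than raw predictions is what makes the apparent diversity of a reagent-free model comparable to that of a reagent-based one.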




Round 3

Revised manuscript submitted on 02 Feb 2023
 

15-Feb-2023

Dear Dr Toniato:

Manuscript ID: DD-ART-10-2022-000110.R2
TITLE: Enhancing diversity in language based models for single-step retrosynthesis

Thank you for submitting your revised manuscript to Digital Discovery. I am pleased to accept your manuscript for publication in its current form.

You will shortly receive a separate email from us requesting you to submit a licence to publish for your article, so that we can proceed with the preparation and publication of your manuscript.

You can highlight your article and the work of your group on the back cover of Digital Discovery. If you are interested in this opportunity please contact the editorial office for more information.

Promote your research, accelerate its impact – find out more about our article promotion services here: https://rsc.li/promoteyourresearch.

If you would like us to promote your article on our Twitter account @digital_rsc please fill out this form: https://form.jotform.com/213544038469056.

By publishing your article in Digital Discovery, you are supporting the Royal Society of Chemistry to help the chemical science community make the world a better place.

With best wishes,

Dr Kedar Hippalgaonkar
Associate Editor, Digital Discovery
Royal Society of Chemistry


******

Please contact the journal at digitaldiscovery@rsc.org





Transparent peer review

To support increased transparency, we offer authors the option to publish the peer review history alongside their article. Reviewers are anonymous unless they choose to sign their report.

We are currently unable to show comments or responses that were provided as attachments. If the peer review history indicates that attachments are available, or if you find there is review content missing, you can request the full review record from our Publishing customer services team at RSC1@rsc.org.

Find out more about our transparent peer review policy.

Content on this page is licensed under a Creative Commons Attribution 4.0 International license.