From the journal Digital Discovery
Peer review history

De novo generated combinatorial library design

Round 1

Manuscript submitted on 26 May 2023
 

17-Aug-2023

Dear Dr Johansson:

Manuscript ID: DD-ART-05-2023-000095
TITLE: de novo generated combinatorial library design

Thank you for your submission to Digital Discovery, published by the Royal Society of Chemistry. I sent your manuscript to reviewers and I have now received their reports which are copied below.

I have carefully evaluated your manuscript and the reviewers’ reports, and the reports indicate that major revisions are necessary.

Please submit a revised manuscript which addresses all of the reviewers’ comments. Further peer review of your revised manuscript may be needed. When you submit your revised manuscript please include a point by point response to the reviewers’ comments and highlight the changes you have made. Full details of the files you need to submit are listed at the end of this email.

Digital Discovery strongly encourages authors of research articles to include an ‘Author contributions’ section in their manuscript, for publication in the final article. This should appear immediately above the ‘Conflict of interest’ and ‘Acknowledgement’ sections. I strongly recommend you use CRediT (the Contributor Roles Taxonomy, https://credit.niso.org/) for standardised contribution descriptions. All authors should have agreed to their individual contributions ahead of submission and these should accurately reflect contributions to the work. Please refer to our general author guidelines https://www.rsc.org/journals-books-databases/author-and-reviewer-hub/authors-information/responsibilities/ for more information.

Please submit your revised manuscript as soon as possible using this link:

*** PLEASE NOTE: This is a two-step process. After clicking on the link, you will be directed to a webpage to confirm. ***

https://mc.manuscriptcentral.com/dd?link_removed

(This link goes straight to your account, without the need to log on to the system. For your account security you should not share this link with others.)

Alternatively, you can login to your account (https://mc.manuscriptcentral.com/dd) where you will need your case-sensitive USER ID and password.

You should submit your revised manuscript as soon as possible; please note you will receive a series of automatic reminders. If your revisions will take a significant length of time, please contact me. If I do not hear from you, I may withdraw your manuscript from consideration and you will have to resubmit. Any resubmission will receive a new submission date.

The Royal Society of Chemistry requires all submitting authors to provide their ORCID iD when they submit a revised manuscript. This is quick and easy to do as part of the revised manuscript submission process. We will publish this information with the article, and you may choose to have your ORCID record updated automatically with details of the publication.

Please also encourage your co-authors to sign up for their own ORCID account and associate it with their account on our manuscript submission system. For further information see: https://www.rsc.org/journals-books-databases/journal-authors-reviewers/processes-policies/#attribution-id

I look forward to receiving your revised manuscript.

Yours sincerely,
Dr Kedar Hippalgaonkar
Associate Editor, Digital Discovery
Royal Society of Chemistry

************


 
Reviewer 1

The authors present a multiple-objective optimization approach (the use of “artificial intelligence”, as stated in the introduction, is questionable) to design combinatorial libraries. The novelty of this work versus the large number of publications on combinatorial library design is not clear. In this reviewer's opinion, the methodology and data, as presented, are not entirely novel and are not reproducible. The manuscript can be improved, and the paper could be considered for publication in Digital Discovery after addressing the following major points:
Major
1. The authors must address the previous efforts made by the scientific community on combinatorial library design. There is plenty of literature from the 1990s that is not considered in this paper.
2. In the introduction, discuss the successful applications of combinatorial libraries in drug discovery as well as their shortcomings.
3. Application example: Authors should justify and explain more about the scaffold used as an application example (Figure 2).
4. In the Introduction, comment on the availability of combinatorial libraries on-demand and combinatorial virtual libraries in the public domain.
5. Page 2. In the statement “explore the differences in optimized libraries between using available building blocks and commercially available building blocks”, specify what is meant: building blocks available at AstraZeneca, or available where exactly?
6. Methods (page 2). It is stated: “blocks, followed by use of retrosynthesis prediction models to estimate if the building blocks are available in a defined stock data set”. This must be corrected. If the building blocks are available in a stock data set, they can easily be identified by a search; there is no need for a retrosynthesis prediction.
7. Methods (page 3), target activity model. The QSAR model is described very succinctly; it is not possible to reproduce the data from the description provided. Why use a random forest? What do the 50 estimators mean, and how should the model be interpreted? Moreover, what is the (biological) target? Exactly what data from ChEMBL was used, and why?
Others
8. Previous works regarding de novo design and multiple-objective parametrization should be acknowledged and discussed.
9. The authors cite only high-impact publications for AI applications (refs. 1, 2), but these are outdated. Focus on publications from the last 1-2 years.
10. Page 2. The statement “Structural diversity (ECFP6) [57]” is not correct. It is missing the similarity coefficient, and the meaning of the abbreviation “ECFP6” should be stated explicitly. Furthermore, add a justification for why this fingerprint was used.

Reviewer 2

The manuscript by Simon Viet Johansson et al. reports a new method to design libraries from de novo generated building blocks. They introduce the use of k-Determinantal Point Processes and Gibbs sampling as a way to select building blocks. Compared to previous methods they cite, they notably add a filter to check if the building blocks are available on the eMolecules building-block platform, or if they seem synthetically accessible using a template-based retrosynthesis prediction tool. I have a few major and minor questions/comments for the authors:

Major
1. While I understand that the QSAR model is not the main point of the study, this model is used (1) for the LibINVENT reinforcement learning run, (2) for the library selection, and (3) as a criterion to judge the quality of the authors' proposed framework. Thus, given the importance of the QSAR model in this study, I have two major concerns.
First, any bias resulting from the data split used to train the model will be reflected in the downstream analysis. However, the data split was done with only a train and a validation set, without a test set. That means the model overfit the validation set during parameter optimization while training.
Additionally, the method used to split the data is not mentioned (random, scaffold, time?). The authors should clearly state what was done and what the limitations of the chosen approach are (e.g., if a random split was used, list its limitations for the QSAR predictions). This is important, notably, when discussing the benefit (or not) of adding new building blocks given the QSAR scores, as the applicability domain of the QSAR model/training method/data split will influence the conclusions drawn there.
2. The authors only compared their method to random selection. In general, for new computational methods, I think either experimental validation (probably not needed here) or a comparison to another computational method (preferred here) should be done to validate the approach. For example, comparing to the methods the authors cite in their introduction (refs. 42, 43, and 45), and/or to a baseline like the RDKit MaxMin picker combined with a simple filter for a high QED score/high QSAR score.
3. How is the diversity computed? What is its domain of definition? These elements should be clarified and emphasized in the text. Additionally, I believe some examples relating sets of compounds of high and low diversity (e.g., a figure in the supplementary material giving the diversity score of five similar compounds [differing by a couple of atoms], and the same for five randomly sampled molecules) would be helpful for the audience.


Minor
1. Something that is a little confusing to me is that, from the abstract, the paper seems to be only about de novo building blocks (“We propose a framework for designing combinatorial libraries from de novo generated building blocks”). However, checking for the existence of the generated building blocks is part of the method, so the framework will also consider existing building blocks, i.e., ones that are not novel. This should be clarified in the abstract and the introduction, as all the results in the tables and in the discussion (if I'm not mistaken) also consider the use of the building blocks found on eMolecules.
2. The first part of the introduction is notably about the size limit of compound libraries that can be created, compared to virtual ones. It would probably be good to also mention DNA encoded libraries in that context.
3. What do the authors mean by “using a pessimistic bias” in the “Target activity model” section? This should be clarified.
4. The “QSAR values” seem to go up to 1.0, but in the “Target activity model” section it is mentioned that compounds are selected based on pXC50. The relation between the two should be clarified.
5. It’s unclear if the QSAR model was trained as a regression or as a classification model. This should be clearly stated in the “Target activity model” section.
6. What was the motivation for using a 0.8 threshold on the QSAR model? And what exactly does the threshold refer to? This question is linked to point 5.
7. The authors should list all the measurement types that are part of the pXC50 (e.g., IC50, etc.).
8. The authors say “The parameters chosen both for generative modelling and retrosynthesis let both models run for a longer time, 1000 epochs compared to 100 during generation and 5 minutes instead of 2 for retrosynthesis evaluation, than previous uses of the same architectures. This yields more output building blocks and solves more routes than previous use in demonstrated studies”. It would be nice to have a plot showing the number of output building blocks as a function of the number of epochs.
9. Linked to point 8: why did the authors choose to risk over-exploiting the QSAR model, versus, for example, using a higher temperature to sample more, and more diverse, building blocks?
10. Figure 1 is a good idea and important, but very hard to read (maybe because of the journal formatting?). It would be very helpful to improve it for the readers, to at least be readable.
11. In the Tables, there is the column LogDet, but it’s actually never mentioned in the main text. Either the authors should discuss that value in the text, or remove it from the table. If they keep it, a clearer definition of its possible values could be helpful for the readers (e.g., domain of definition, what range of value is good vs bad for the value).
12. Statistical tests should be used when comparing numbers in the text (e.g., when comparing their method scores to random selection)
13. Table 1 has a problem with the rendering in the first row (next to QSAR)
14. It’s puzzling that the QED score improves in one case as the number of building blocks increases, while decreasing in some cases in another. The authors mention that “it is likely that the number of added building blocks through reactions that are “too complex” are lower in this experiment”, but some more quantitative measurements to support this claim would be good to add.
15. The “acceptance ratio” is not clearly defined as such when first introduced, but is referred to that way in the conclusion. It would be good to use the same naming convention throughout the manuscript.
16. Additionally, the acceptance ratio is mentioned as one parameter influencing the introduced framework, but, if I’m not mistaken, all experiments are based on only one value. The authors should either add experiments across multiple values to clearly show the effect of this ratio on the results, or mention in the conclusion that it was not explored here and might be an additional parameter to explore in future work (the current phrasing in the MS is a bit confusing).
17. Why was the comparison with random selection not done in Table 2? If the authors have a good reason, no need to add it. Else, it would be good to add it.
18. It would be good to have the QED values and the QSAR scores of the entire sampled library, or to repeat the random selection at least 3 times to have a mean and a standard deviation.
19. It would be good to add a random selection with a simple filter on the QED score and QSAR predictions to compare diversity. E.g., (1) filter for molecules with same median QED and same median QSAR predictions as their introduced approach and (2) do a random selection within that set, to (3) finally compare the diversity values.

Reviewer 3

Johansson et al. present a de novo method for the design of combinatorial libraries. This is a relevant topic, and the authors do excellent work introducing the relevance of their work. Moreover, the GitHub repository seems in excellent shape and it is easy to run the code. It would be nice to have a minimal workflow example to run the code.

Minor comments:
Figure 1 is hard to read in the current pdf version.
It would be nice to have an additional README file to guide the user through the procedure of reproducing the example in the publication.


 

Note: This response to the reviewers was originally written as a Word file, with images illustrating computational results that could not be provided in the public peer review; the file was submitted together with the manuscript in the revision submission.

REVIEWER REPORT(S):
Referee: 1

Comments to the Author
The authors present a multiple-objective optimization approach (the use of “artificial intelligence”, as stated in the introduction, is questionable) to design combinatorial libraries. The novelty of this work versus the large number of publications on combinatorial library design is not clear. In this reviewer's opinion, the methodology and data, as presented, are not entirely novel and are not reproducible. The manuscript can be improved, and the paper could be considered for publication in Digital Discovery after addressing the following major points:
Major
1. The authors must address the previous efforts made by the scientific community on combinatorial library design. There is plenty of literature from the 1990s that is not considered in this paper.

2. In the introduction, discuss the successful applications of combinatorial libraries in drug discovery as well as their shortcomings.

We agree that there could be a more extensive recognition of the experimental efforts made by the community, and we have expanded the introduction, especially with regard to the early stages of combinatorial library design.

3. Application example: Authors should justify and explain more about the scaffold used as an application example (Figure 2).
This has been added as: “The scaffold displayed in Figure 2 is adapted from the original LibINVENT publication [1]. The scaffold was chosen for its suitability as a scaffold towards the Dopamine Receptor D2 (DRD2) target. Furthermore, it has two attachment points, which allows us to study the combinatorial design. Finally, the attachment points allow for two of the more common reactions in library chemistry.”

4. In the Introduction, comment on the availability of combinatorial libraries on-demand and combinatorial virtual libraries in the public domain.
We have added more information regarding on-demand synthesis, related to combinatorial libraries on-demand. We have also expanded the number of virtual libraries referenced in the introduction. To our knowledge, most of the available public libraries are enumerations, from which combinatorial subsets can be created, but we did not find any public virtual libraries that were explicitly fully combinatorial.

5. Page 2. In the statement “explore the differences in optimized libraries between using available building blocks and commercially available building blocks”, specify what is meant: building blocks available at AstraZeneca, or available where exactly?
The available building blocks are taken from a snapshot of the available stock in eMolecules; we have clarified this on page 2.

6. Methods (page 2). It is stated: “blocks, followed by use of retrosynthesis prediction models to estimate if the building blocks are available in a defined stock data set”. This must be corrected. If the building blocks are available in a stock data set, they can easily be identified by a search; there is no need for a retrosynthesis prediction.
This is indeed true: AiZynthFinder first returns building blocks found by a direct search of the stock, and only the remainder is run through the retrosynthesis prediction. This error in the manuscript has been corrected to:
“retrosynthesis prediction models to query if the building blocks are available in a defined stock data set, or to estimate if they could be produced from this stock through synthesis”

7. Methods (page 3), target activity model. The QSAR model is described very succinctly; it is not possible to reproduce the data from the description provided. Why use a random forest? What do the 50 estimators mean, and how should the model be interpreted? Moreover, what is the (biological) target? Exactly what data from ChEMBL was used, and why?
The section on the QSAR model has been expanded to clarify the training procedure and improve reproducibility:
The QSAR model is a random forest model [104] built using Scikit-learn 0.21.3 [105] with 50 trees; other settings were left at their defaults. The number of trees was lowered to favour computational speed, without an observed drop in classification accuracy on the test set. A new QSAR model was trained, rather than reusing the model from the original LibINVENT experiments, in order to experiment with different thresholds for bioactivity, which changed the labels of some training data points. The training data used is all Dopamine receptor D2 (DRD2) data available in ExcapeDB [106], a chemogenomics database comprising active data from assays in ChEMBL [107] and PubChem [108], and inactive compounds from PubChem screening data. The activity data for the active compounds are listed with their pXC50 values irrespective of the conducted measurement (IC50, XC50, EC50, AC50, Ki, Kd, Potency) [109], and we used an active/inactive pXC50 threshold of 6. Entries with SMILES [110] strings that could not be parsed by RDKit [111] were removed. With these definitions for activity, the data set had 6,304 active compounds and 344,905 inactive compounds. The compounds were represented by the extended connectivity fingerprint with 2,048 bits and radius 3 (ECFP6), computed using the RDKit Morgan fingerprint implementation. The model was trained using a random 80%/20% training/test data split, with 4,974 active compounds in the training set (out of 280,967; 1.77% actives) and 1,330 active compounds in the test set (out of 70,242; 1.89%). The model, as well as the script to generate it, is provided in the repository. The data is imbalanced, with most training points labelled as inactive compounds, resulting in an AUC-ROC score of 0.995 and a bias towards predicting most input as negative. The model is trained for binary classification to predict the product label, and the metric used to evaluate a compound is the classification probability of the “active” label. This model was used both as part of the LibINVENT reinforcement learning run and during library selection.
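For illustration, a minimal sketch of this training procedure might look as follows, assuming the ExcapeDB DRD2 data is already loaded as parallel lists `smiles` and `pxc50` (both names are illustrative, not part of the published code):

```python
# Minimal sketch of the QSAR model described above (assumed inputs:
# `smiles` and `pxc50` lists from the ExcapeDB DRD2 data).
import numpy as np
from rdkit import Chem
from rdkit.Chem import AllChem
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

def ecfp6(smi):
    """ECFP6: Morgan fingerprint with radius 3 and 2,048 bits."""
    mol = Chem.MolFromSmiles(smi)
    if mol is None:  # entries that RDKit cannot parse are removed
        return None
    return np.array(AllChem.GetMorganFingerprintAsBitVect(mol, 3, nBits=2048))

feats, labels = [], []
for smi, activity in zip(smiles, pxc50):
    fp = ecfp6(smi)
    if fp is not None:
        feats.append(fp)
        labels.append(int(activity >= 6))  # active/inactive pXC50 threshold of 6
X, y = np.array(feats), np.array(labels)

# Random 80%/20% training/test split; 50 trees, otherwise default settings.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
clf = RandomForestClassifier(n_estimators=50).fit(X_train, y_train)

# The score used downstream is the classification probability of "active".
p_active = clf.predict_proba(X_test)[:, 1]
print("AUC-ROC:", roc_auc_score(y_test, p_active))
```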



Others
8. Previous works regarding de novo design and multiple-objective parametrization should be acknowledged and discussed.
We have added to the introduction the most common multiple-objective optimization approaches in discovery chemistry, but we consider a broader coverage of de novo design methods not related to library design to be out of scope. We have referred the reader to recent reviews covering the topic.

9. The authors cite only high-impact publications for AI applications (refs. 1, 2), but these are outdated. Focus on publications from the last 1-2 years.
We have expanded the list of citations for each application area with recent publications; it is indeed a fast-developing area, and it is necessary to keep up to date.

10. Page 2. The statement “Structural diversity (ECFP6) [57]” is not correct. It is missing the similarity coefficient, and the meaning of the abbreviation “ECFP6” should be stated explicitly. Furthermore, add a justification for why this fingerprint was used.
We thank the reviewer for pointing out that this is not the correct definition. We have corrected it to “Structural diversity (measured by the similarity in the compounds’ Extended Connectivity Fingerprint (ECFP) representation)”.


Referee: 2

Comments to the Author
The manuscript by Simon Viet Johansson et al. reports a new method to design libraries from de novo generated building blocks. They introduce the use of k-Determinantal Point Processes and Gibbs sampling as a way to select building blocks. Compared to previous methods they cite, they notably add a filter to check if the building blocks are available on the eMolecules building-block platform, or if they seem synthetically accessible using a template-based retrosynthesis prediction tool. I have a few major and minor questions/comments for the authors:

Major
1. While I understand that the QSAR model is not the main point of the study, this model is used (1) for the LibINVENT reinforcement learning run, (2) for the library selection, and (3) as a criterion to judge the quality of the authors' proposed framework. Thus, given the importance of the QSAR model in this study, I have two major concerns.
First, any bias resulting from the data split used to train the model will be reflected in the downstream analysis. However, the data split was done with only a train and a validation set, without a test set. That means the model overfit the validation set during parameter optimization while training.
Additionally, the method used to split the data is not mentioned (random, scaffold, time?). The authors should clearly state what was done and what the limitations of the chosen approach are (e.g., if a random split was used, list its limitations for the QSAR predictions). This is important, notably, when discussing the benefit (or not) of adding new building blocks given the QSAR scores, as the applicability domain of the QSAR model/training method/data split will influence the conclusions drawn there.
We agree that it is a valid concern that the applicability domain of the QSAR model influences the overall evaluation of the compounds, and that objectives with guarantees on the applicability domain would be better during both the optimization and the selection stage, but we kept the metric as we knew that (i) it behaved antagonistically against the diversity objective, and (ii) it could be computed relatively quickly at runtime.
The random forest model is inherently trained using a bootstrap of the training data assigned to each tree, and this counteracts the overfitting problem during the training of the QSAR model. Looking at the out-of-bag (OOB) score, we find that it matches the test score rather well across 5 re-trainings of the model with different splits:
average OOB: 0.9942
average test: 0.9947
variance OOB: 1.422e-08
variance test: 3.562e-08
Still, given enough epochs, the reinforcement learning in LibINVENT will indeed find something in the QSAR function to exploit, as it would even if we had trained the QSAR model with a validation set. We can observe this by looking at the average QED value of the products produced during generation (before slicing and recombination of the building blocks).
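As a brief illustration of this check, a sketch under the same assumed `X`, `y` inputs as the training snippet above (not the authors' exact script) compares the OOB score with a held-out score across re-trainings:

```python
# Sketch: compare out-of-bag accuracy to test accuracy over 5 random splits.
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

for seed in range(5):
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2,
                                              random_state=seed)
    clf = RandomForestClassifier(n_estimators=50, oob_score=True,
                                 random_state=seed).fit(X_tr, y_tr)
    print(seed, clf.oob_score_, clf.score(X_te, y_te))  # OOB vs. test
```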

2. The authors only compared their method to random selection. In general, for new computational methods, I think either experimental validation (probably not needed here) or a comparison to another computational method (preferred here) should be done to validate the approach. For example, comparing to the methods the authors cite in their introduction (refs. 42, 43, and 45), and/or to a baseline like the RDKit MaxMin picker combined with a simple filter for a high QED score/high QSAR score.
The cited previous methods could not be reproduced for this study in an objective manner. For Meinl et al. [2], the full distance matrix (or, as we use it, the similarity matrix) over the whole data set is required at once, which becomes prohibitively slow for large selection spaces. The genetic algorithm in Gillet et al. [3] does not list the parameters used, and we do not know whether there are changes other than the fitness function relative to its predecessor [4]. Agrafiotis [5] lists a framework with an algorithm schema but without a specific fitness function or objectives, and it would therefore have had to be subjectively adapted by us. We thus opted to implement a version of the RDKit MaxMin picker as suggested. We filter QED and QSAR to the same ranges to address minor point 19. However, note that since the RDKit MaxMin picker operates on the products rather than on the building blocks, it is difficult to generate a combinatorial selection with it. As such, we have separated this into its own table for comparing the diversity, which also gives a frame of reference for minor point 11.
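A sketch of such a baseline, assuming a list of candidate product SMILES and the trained QSAR classifier from the snippet above (the function name, `n_pick`, and the cut-off values are illustrative assumptions, not the authors' exact implementation), could be:

```python
# Sketch: filter products on QED and QSAR probability, then apply the
# RDKit MaxMin picker to select a diverse subset of the survivors.
import numpy as np
from rdkit import Chem
from rdkit.Chem import AllChem, QED
from rdkit.SimDivFilters.rdSimDivPickers import MaxMinPicker

def maxmin_baseline(product_smiles, qsar_clf, n_pick=96,
                    qed_cut=0.6, qsar_cut=0.8):  # cut-offs are illustrative
    mols = [m for m in (Chem.MolFromSmiles(s) for s in product_smiles) if m]
    fps = [AllChem.GetMorganFingerprintAsBitVect(m, 3, nBits=2048) for m in mols]
    p_active = qsar_clf.predict_proba(np.array([list(fp) for fp in fps]))[:, 1]
    keep = [i for i, m in enumerate(mols)
            if QED.qed(m) >= qed_cut and p_active[i] >= qsar_cut]
    picker = MaxMinPicker()
    picked = picker.LazyBitVectorPick([fps[i] for i in keep],
                                      len(keep), n_pick, seed=42)
    return [keep[i] for i in picked]  # indices of selected products in `mols`
```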

3. How is the diversity computed? What is its domain of definition? These elements should be clarified and emphasized in the text. Additionally, I believe some examples relating sets of compounds of high and low diversity (e.g., a figure in the supplementary material giving the diversity score of five similar compounds [differing by a couple of atoms], and the same for five randomly sampled molecules) would be helpful for the audience.
The diversity is defined as the determinant of the similarity matrix. It ranges between 0 and 1, with 0 occurring when two (or more) molecules are identical, and 1 when the molecules share no bits in the fingerprint. The latter will never be the case here, as all products share a scaffold. As we discuss in our response to your minor point 11, what values count as “high” depends on the selection size, as the value is related to the probability that a DPP with non-fixed k samples that particular set. A reference to a similar kernel, with results shown, can be found in the supplementary material of Bhaskara et al. [6]. An example of using a greedy algorithm to sample once for similar compounds and once for diverse compounds among the building blocks is shown in Supplementary Figures S.2 and S.3, respectively.
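To make this concrete, a minimal sketch of the diversity computation as described, assuming ECFP6 bit vectors and a Tanimoto similarity kernel (the fingerprint settings match the QSAR section; the function name is illustrative), is:

```python
# Sketch: diversity of a selection as the determinant of its Tanimoto
# similarity matrix over ECFP6 fingerprints (LogDet = log of this value).
import numpy as np
from rdkit import Chem, DataStructs
from rdkit.Chem import AllChem

def selection_diversity(selection_smiles):
    mols = [Chem.MolFromSmiles(s) for s in selection_smiles]
    fps = [AllChem.GetMorganFingerprintAsBitVect(m, 3, nBits=2048) for m in mols]
    n = len(fps)
    S = np.eye(n)  # self-similarity gives 1 on the diagonal
    for i in range(n - 1):
        sims = DataStructs.BulkTanimotoSimilarity(fps[i], fps[i + 1:])
        S[i, i + 1:] = sims
        S[i + 1:, i] = sims
    # det(S) -> 0 when two molecules are (near-)identical,
    # det(S) -> 1 when the fingerprints share no bits.
    return np.linalg.det(S)
```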


Minor
1. Something that is a little confusing to me is that, from the abstract, the paper seems to be only about de novo building blocks (“We propose a framework for designing combinatorial libraries from de novo generated building blocks”). However, checking for the existence of the generated building blocks is part of the method, so the framework will also consider existing building blocks, i.e., ones that are not novel. This should be clarified in the abstract and the introduction, as all the results in the tables and in the discussion (if I'm not mistaken) also consider the use of the building blocks found on eMolecules.
We have clarified that the framework uses generated building blocks, which can be either novel or pre-existing. We do not consider previously existing building blocks to be a bad output from the model, as this typically means that the building blocks are easier to acquire.

2. The first part of the introduction is notably about the size limit of compound libraries that can be created, compared to virtual ones. It would probably be good to also mention DNA encoded libraries in that context.
This is a fair suggestion and has been included in the introduction.

3. What do the authors mean by “using a pessimistic bias” in the “Target activity model” section? This should be clarified.
This has been clarified to “bias towards predicting most input as negative”

4. The “QSAR values” seem to go up to 1.0, but in the “Target activity model” section it is mentioned that compounds are selected based on pXC50. The relation between the two should be clarified.
5. It’s unclear if the QSAR model was trained as a regression or as a classification model. This should be clearly stated in the “Target activity model” section.
We have expanded the section about the QSAR model to clarify these points, which are important for the model reproducibility.


6. What was the motivation for using a 0.8 threshold on the QSAR model? And what exactly does the threshold refer to? This question is linked to point 5.
We have expanded the QSAR section to clarify that the 0.8 threshold is applied to the classification probability of the QSAR classification model. The motivation behind it is simply to reduce the number of building blocks to optimize over to a more manageable size. We also want to avoid having to select too “poor” building blocks when optimizing for diversity, by removing the building blocks that have not previously been observed in a “good” product. In the matrix of combinatorial products, we can see from the distribution (shown in Supplementary Figure S.4, referred to when addressing your point 18) that most of our products during recombination have quite a high score.
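As a small illustration (a sketch assuming the classifier and product fingerprints from the QSAR snippet above; the function name is hypothetical), the filter amounts to:

```python
# Sketch: keep only products whose predicted probability of the
# "active" class reaches the 0.8 threshold.
import numpy as np

def above_threshold(qsar_clf, product_fps, threshold=0.8):
    p_active = qsar_clf.predict_proba(np.asarray(product_fps))[:, 1]
    return np.flatnonzero(p_active >= threshold)  # indices of retained products
```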

7. The authors should list all the measurement types that are part of the pXC50 (e.g., IC50, etc.).
This has been included, with a reference to how ChEMBL describes which measurements are included in their standardized value [7].

8. The authors say “The parameters chosen both for generative modelling and retrosynthesis let both models run for a longer time, 1000 epochs compared to 100 during generation and 5 minutes instead of 2 for retrosynthesis evaluation, than previous uses of the same architectures. This yields more output building blocks and solves more routes than previous use in demonstrated studies”. It would be nice to have a plot showing the number of output building blocks as a function of the number of epochs.
We agree that this is a good inclusion; we have included this plot as Supplementary Figure 1.


9. Linked to point 8: why did the authors choose to risk over-exploiting the QSAR model, versus, for example, using a higher temperature to sample more, and more diverse, building blocks?
This is a valid concern. We based this choice on previous experience with LibINVENT as a generative model in particular, as the model tended to have difficulties performing the reinforcement learning towards the correct space at higher temperatures. We demonstrate this here by plotting results from training two models for 100 epochs, one with the reference temperature (T=1) and one with the temperature changed to T=2. For T=2, we observe that LibINVENT has difficulties both in learning to optimize for the QSAR scores and in remembering what it had learnt about valid molecules. The building blocks that are valid also generally match the targeted reaction.




10. Figure 1 is a good idea and important, but very hard to read (maybe because of the journal formatting?). It would be very helpful to improve it for the readers, to at least be readable.
We are glad that you like the inclusion of Figure 1 and have made the font larger to improve readability.

11. In the Tables, there is the column LogDet, but it’s actually never mentioned in the main text. Either the authors should discuss that value in the text, or remove it from the table. If they keep it, a clearer definition of its possible values could be helpful for the readers (e.g., domain of definition, what range of value is good vs bad for the value).
We have included the definition in the section on determinantal point processes, noting that it is the measure of diversity used. The determinant will be in the range [0,1] (so the LogDet is at most 0), but which range is good vs. bad is hard to determine, as the values depend on the size of the selection. For a fixed size k, a larger determinant (or a LogDet closer to 0) always represents higher diversity, but, e.g., the examples provided in Supplementary Figures S.2 and S.3 cannot be compared with the example selection size of 96.

12. Statistical tests should be used when comparing numbers in the text (e.g., when comparing their method scores to random selection)
We have opted to use the Kullback-Leibler divergence [8] (KL) to measure how different the distributions are between the random selection and the simultaneous optimization, which are the most similar pair (having close values in two out of three metrics).
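For reference, a minimal sketch of such a comparison, assuming the two selections' scores (e.g., QED values or QSAR probabilities, which live in [0, 1]) are compared as histograms, is:

```python
# Sketch: KL divergence D_KL(P || Q) between two score distributions,
# estimated from histograms with a small epsilon to avoid log(0).
import numpy as np
from scipy.stats import entropy

def kl_divergence(scores_p, scores_q, bins=20, eps=1e-12):
    edges = np.linspace(0.0, 1.0, bins + 1)
    p, _ = np.histogram(scores_p, bins=edges)
    q, _ = np.histogram(scores_q, bins=edges)
    p = p / p.sum() + eps
    q = q / q.sum() + eps
    return entropy(p, q)  # scipy computes sum(p * log(p / q))
```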

13. Table 1 has a problem with the rendering in the first row (next to QSAR)
This has been fixed; we thank the reviewer for pointing it out.

14. It’s puzzling that the QED score improves in one case as the number of building blocks increases, while decreasing in some cases in another. The authors mention that “it is likely that the number of added building blocks through reactions that are “too complex” are lower in this experiment”, but some more quantitative measurements to support this claim would be good to add.
This is an interesting question. We posed the hypothesis based on the average number of heavy atoms per building block in both scenarios, and how it increased with the number of reaction steps. We have added a table in the Results section showing the average number of atoms for the different reaction steps. To avoid confusion, we have replaced “too complex” with “too large”.

15. The “acceptance ratio” is not clearly defined as such when first introduced, but is referred to that way in the conclusion. It would be good to use the same naming convention throughout the manuscript.
16. Additionally, the acceptance ratio is mentioned as one parameter influencing the introduced framework, but, if I’m not mistaken, all experiments are based on only one value. The authors should either add experiments across multiple values to clearly show the effect of this ratio on the results, or mention in the conclusion that it was not explored here and might be an additional parameter to explore in future work (the current phrasing in the MS is a bit confusing).
The reviewer is indeed correct that only one value was used in the experiments conducted in this manuscript. The number of experiments required to explore this parameter while still varying the number of available building blocks became infeasible, and running the experiment at a single baseline level of availability would not allow us to draw generalizable conclusions. We have therefore opted for the latter option and clarified that tuning \alpha to push the optimization slightly further is out of scope for the current work.

17. Why was the comparison with random selection not done in Table 2? If the authors have a good reason, no need to add it. Else, it would be good to add it.
This has now been added.

18. It would be good to have the QED values and the QSAR scores of the entire sampled library, or to repeat the random selection at least 3 times to have a mean and a standard deviation.
We agree that this is useful for getting an intuition of the quality of the optimization. We have provided both distributions in Supplementary Figure S.4, for products up to 4 reaction steps away from the eMolecules stock, as that is what we opted to optimize over.

19. It would be good to add a random selection with a simple filter on the QED score and QSAR predictions to compare diversity. E.g., (1) filter for molecules with same median QED and same median QSAR predictions as their introduced approach and (2) do a random selection within that set, to (3) finally compare the diversity values.

This has been implemented using both a filter requiring the same range as our preferred selection strategy (SO) in terms of QSAR and QED values, and a diversity calculation filtering for molecules with a minimum QSAR/QED equal to the SO average. The former experiment includes molecules that would have had to be chosen according to the combinatorial constraint we followed, compared to a cherry-picked solution. The second more closely explores the area around our optimization.
The random selection here (as well as the MaxMin picker in your comment 2) has to cherry-pick rather than select a full combinatorial solution in order to produce a valid solution under a QED/QSAR filter, since a combinatorial solution contains a mix of good and bad selections, and valid combinations are removed by introducing a filter.



Referee: 3

Comments to the Author
Johansson et al. present a de novo method for the design of combinatorial libraries. This is a relevant topic, and the authors do excellent work introducing the relevance of their work. Moreover, the GitHub repository seems in excellent shape and it is easy to run the code. It would be nice to have a minimal workflow example to run the code.

We thank the reviewer for the kind words. A minimal workflow example, running all code from end to end, has been provided as two shell scripts: one installer and one runfile.

Minor comments:
Figure 1 is hard to read in the current pdf version.
We have updated the figure to increase the readability.
It would be nice to have an additional README file to guide the user through the procedure of reproducing the example in the publication.
We have updated the repository with an additional README, as requested, in the subdirectory of the publication example.



1. V. Fialková, J. Zhao, K. Papadopoulos, O. Engkvist, E. J. Bjerrum, T. Kogej and A. Patronov, Journal of Chemical Information and Modeling, 2022, 62, 2046-2063.
2. T. Meinl, C. Ostermann and M. R. Berthold, Journal of Chemical Information and Modeling, 2011, 51, 237-247.
3. V. J. Gillet, W. Khatib, P. Willett, P. J. Fleming and D. V. S. Green, Journal of Chemical Information and Computer Sciences, 2002, 42, 375-385.
4. V. J. Gillet, P. Willett, J. Bradshaw and D. V. S. Green, Journal of Chemical Information and Computer Sciences, 1999, 39, 169-177.
5. D. K. Agrafiotis, Molecular Diversity, 2000, 5, 209-230.
6. A. Bhaskara, A. Karbasi, S. Lattanzi and M. Zadimoghaddam, 2020.
7. A. P. Bento, A. Gaulton, A. Hersey, L. J. Bellis, J. Chambers, M. Davies, F. A. Krüger, Y. Light, L. Mak, S. McGlinchey, M. Nowotka, G. Papadatos, R. Santos and J. P. Overington, Nucleic Acids Research, 2014, 42, D1083-D1090.
8. S. Kullback and R. A. Leibler, The Annals of Mathematical Statistics, 1951, 22, 79-86.




Round 2

Revised manuscript submitted on 01 Nov 2023
 

20-Nov-2023

Dear Dr Johansson:

Manuscript ID: DD-ART-05-2023-000095.R1
TITLE: de novo generated combinatorial library design

Thank you for submitting your revised manuscript to Digital Discovery. I am pleased to accept your manuscript for publication in its current form. I have copied any final comments from the reviewer(s) below.

You will shortly receive a separate email from us requesting you to submit a licence to publish for your article, so that we can proceed with the preparation and publication of your manuscript.

You can highlight your article and the work of your group on the back cover of Digital Discovery. If you are interested in this opportunity please contact the editorial office for more information.

Promote your research, accelerate its impact – find out more about our article promotion services here: https://rsc.li/promoteyourresearch.

If you would like us to promote your article on our Twitter account @digital_rsc please fill out this form: https://form.jotform.com/213544038469056.

We are offering all corresponding authors on publications in gold open access RSC journals who are not already members of the Royal Society of Chemistry one year’s Affiliate membership. If you would like to find out more please email membership@rsc.org, including the promo code OA100 in your message. Learn all about our member benefits at https://www.rsc.org/membership-and-community/join/#benefit

By publishing your article in Digital Discovery, you are supporting the Royal Society of Chemistry to help the chemical science community make the world a better place.

With best wishes,

Dr Kedar Hippalgaonkar
Associate Editor, Digital Discovery
Royal Society of Chemistry


 
Reviewer 3

I would like to thank the authors for addressing my comments.

Reviewer 1

The authors addressed the reviewer's comments and improved the quality of the manuscript. The revised manuscript can be published in its current form.




Transparent peer review

To support increased transparency, we offer authors the option to publish the peer review history alongside their article. Reviewers are anonymous unless they choose to sign their report.

We are currently unable to show comments or responses that were provided as attachments. If the peer review history indicates that attachments are available, or if you find there is review content missing, you can request the full review record from our Publishing customer services team at RSC1@rsc.org.
