From the journal Digital Discovery: Peer review history

Deep representation learning determines drug mechanism of action from cell painting images

Round 1

Manuscript submitted on 05 Apr 2023
 

26-May-2023

Dear Dr Wong:

Manuscript ID: DD-ART-04-2023-000060
TITLE: Deep Representation Learning Determines Drug Mechanism of Action from Cell Painting Images

Thank you for your submission to Digital Discovery, published by the Royal Society of Chemistry. I sent your manuscript to reviewers and I have now received their reports which are copied below.

I have carefully evaluated your manuscript and the reviewers’ reports, and the reports indicate that major revisions are necessary.

Please submit a revised manuscript which addresses all of the reviewers’ comments. Further peer review of your revised manuscript will be needed. When you submit your revised manuscript please include a point by point response to the reviewers’ comments and highlight the changes you have made. Full details of the files you need to submit are listed at the end of this email.

Digital Discovery strongly encourages authors of research articles to include an ‘Author contributions’ section in their manuscript, for publication in the final article. This should appear immediately above the ‘Conflict of interest’ and ‘Acknowledgement’ sections. I strongly recommend you use CRediT (the Contributor Roles Taxonomy, https://credit.niso.org/) for standardised contribution descriptions. All authors should have agreed to their individual contributions ahead of submission and these should accurately reflect contributions to the work. Please refer to our general author guidelines https://www.rsc.org/journals-books-databases/author-and-reviewer-hub/authors-information/responsibilities/ for more information.

Please submit your revised manuscript as soon as possible using this link:

*** PLEASE NOTE: This is a two-step process. After clicking on the link, you will be directed to a webpage to confirm. ***

https://mc.manuscriptcentral.com/dd?link_removed

(This link goes straight to your account, without the need to log on to the system. For your account security you should not share this link with others.)

Alternatively, you can login to your account (https://mc.manuscriptcentral.com/dd) where you will need your case-sensitive USER ID and password.

You should submit your revised manuscript as soon as possible; please note you will receive a series of automatic reminders. If your revisions will take a significant length of time, please contact me. If I do not hear from you, I may withdraw your manuscript from consideration and you will have to resubmit. Any resubmission will receive a new submission date.

The Royal Society of Chemistry requires all submitting authors to provide their ORCID iD when they submit a revised manuscript. This is quick and easy to do as part of the revised manuscript submission process.   We will publish this information with the article, and you may choose to have your ORCID record updated automatically with details of the publication.

Please also encourage your co-authors to sign up for their own ORCID account and associate it with their account on our manuscript submission system. For further information see: https://www.rsc.org/journals-books-databases/journal-authors-reviewers/processes-policies/#attribution-id

I look forward to receiving your revised manuscript.

Yours sincerely,
Linda Hung
Associate Editor
Digital Discovery
Royal Society of Chemistry

************


 
Reviewer 1

The authors present a deep-learning approach for generating numerical representations from cell painting images with the goal to predict the mechanism-of-action of compounds, which is a highly relevant task in drug discovery. For their study they used two datasets that were made publicly available in recent years. Overall, the results indicate that their approach outperforms established approaches and the methods used in the study are clearly documented.
However, I feel that the article focuses too much on the scenario where the models are trained and tested on the same compounds (held-out well setup). This scenario is not representative of the real-world MoA prediction problem. More emphasis should be placed on the held-out compound scenario and the corresponding evaluation of the proposed approach w.r.t. identifying the MoA of compounds not included in the model training, by restructuring the story flow of the manuscript and adapting statements made on the number of evaluated MOAs. The authors show in Figure 4 that their approach is on par with or outperforms CP and DP consistently across two datasets in the held-out compound setup. In my opinion this is the central and interesting result from this study (plus the practical advantages of MP over CP and DP as discussed by the authors), albeit with lower performance gains than in the held-out well setup. I also recommend making more use of the supplementary material for the held-out well results.
Additionally, I believe the manuscript would be more informative and convincing for interested readers by addressing the following points:
(>>> = cited text from the manuscript, *** = my comments)
1.
*** Please include a short rationale for choosing EfficientNet-B0 over other architectures.
2.
*** Please include a short statement on the plate layouts of the two datasets; specifically, were all replicates of a compound assayed in the same well position?
3.
*** Which MoAs can MP predict reliably that CP and DP cannot?
4.
*** Please include a table with all MoAs evaluated and how well they are predicted in the held-out compound setup.
5.
*** Is there a relationship between the predictability of an MOA and how many compounds are in the training set for that MOA?
6.
*** How do the corresponding t-SNE maps for CP and DP features look for the highlighted example MOAs in figures 3g and 5g?
7.
>>> To compare CP and MP embeddings independent of downstream embedding processing transformations, we used the raw provided CP embeddings and performed a simple standardization of features for both JUMP1 and LINCS (zero mean and unit variance to prevent features with wider ranges from disproportionately affecting similarity metrics)
*** A common and effective baseline method to normalize segmentation-based features (such as from CellProfiler) is by z-scoring each feature with respect to the DMSO controls on the same plate as done in many other publications. Please provide evidence that simple standardization (i.e., z-scoring across all sample and control wells in the entire dataset) does not lead to inferior results for CP features.
8.
*** Why was no z-score normalization against DMSO controls done for DP and MP features?
9.
>>> Since some compounds have the same MOA, we wanted to know whether the model was learning MOA-specific phenotypes consistent through different compounds with the same MOA, as opposed to simply learning each compound’s phenotypic effect. Hence, we calculated the average PCC of three groups of well pairs with: 1) the same compound (and hence same MOA) 2) different compounds with the same MOA and 3) different compounds with different MOAs. We found that of the embedding pairs with different compounds, the pairs with the same MOA class were more similar than those with different MOA classes (average PCC = 0.23 vs 0.02, Figure 3F). The three populations’ PCC averages were all significantly different (p<<0.0001 for all two-sided z-tests). This suggests that indeed the phenotypic embeddings that the model encoded were MOA-specific rather than compound-specific.
*** Please show such an analysis for held-out compounds. In the held-out well setup shown here, the model might associate very different compound-specific phenotypes of same-MoA compounds into similar embeddings and thus overfit w.r.t to how MoA-specific the learned embeddings are.
10.
>>> With the trained classification model, we measured how well the model created embeddings that were meaningful for MOA classification. This differed from simply using the model as a classifier, which would have limited use outside of the fixed MOA set on which it was trained. Hence, we extracted image embeddings from an intermediate layer in the network, aggregated them by well by taking the median, and assessed how valuable these well-level embeddings were for the task of MOA classification compared to CP and DP (Methods).
*** Please show that MOAs not included in the model training can be predicted based on such embeddings to support the statement made here.
11.
>>> Furthermore, as with most deep representation learning approaches, interpretability is lacking and falls behind traditional computer vision approaches like CP, which is highly interpretable.
*** I would recommend a less strong statement regarding interpretability of CP features. Many of the CP features typically used for morphological profiling are also difficult to interpret as to their biological meaning. Perhaps simply "more interpretable" instead of "highly interpretable". Or alternatively, a reference to a study in which the interpretability of CP features was critical for MoA insights.

Reviewer 2

There are major issues in this work that unfortunately undermine its conclusions to the degree that I cannot recommend its publication at this time. Getting equivalent or better performance on the task of MOA retrieval without the hassle of segmentation would be a fantastic advance, so I think this is important work, and I would be delighted to read an improved version of it, but serious methodological issues exist in the paper as written.
- To start from the biggest issue - other than the analyses presented in Figure 4, what was presumably intended to be presented in Sup Figure 3 (more below), and Sup Figure 5, this is not an MOA profiler. In both datasets the authors use, compound replicates are not scrambled on the plates - they are always in the same well position. Thus, by the authors' assignment scheme, in all but these 3 figures the same compound is not prevented from being present in >1 of (or indeed, all 3 of) the training, test, and validation sets - in fact, they say they ensure all MOAs are in all 3, and since most MOAs have only one compound, most compounds are in all 3. So this is really a replicate profiler, not an MOA profiler.
- Supplemental figure 3, other than the right panel of part A, does not just "correspond" to Figure 3, it IS figure 3. As far as I can tell, no other values have changed, all other curves have the same "bumps" in the same places, all of which seems deeply unlikely when dropping from 266 compounds to 59. This makes it impossible to judge, for the JUMP dataset, how successful if at all the actual "MOA" profiling is in this set. Since Figures 5/S5 do not seem to have the same issue (other than perhaps panel F, which seems to be fully identical), I assume it was not done maliciously, but still, it makes this point impossible to assess for one of the 2 data sets.
---Answers for "no" questions in the data reviewer checklist
- 1b - no accession dates are provided, and the LINCS image pointers used in the repository are not the correct current pointers. The CellProfiler pipeline link in the methods section for LINCS is wrong. The model weights from DeepProfiler are listed as "automatically downloaded", but without knowing what date, it will be impossible to trace back to a version. The authors say that no spherized profiles were available in JUMP1, but the repository contains profile versions that say "spherized profiles" (in fact, that is the main description visible on the profiles folder when clicking on the provided link) with an addition date of December 2021.
- 1c - Several possible biases (well position, timepoint of treatment, cell type) are disclosed, but only well position is examined for its effect on performance.
- 2a - This is a borderline no. While it looks like one could probably recreate the authors' subsetting of the data with the pickles, for the LINCS dataset some quick math says that if the authors are only using ~10K wells, they are using less than 20% of the LINCS dataset, yet they say to download the whole thing. In browsing through, it seems like the average file is about 0.5 MB, so at ~20K images/plate this is about 10 GB per plate, or over a terabyte from which the user must then throw out >80%. It would be strongly recommended to have a "cleaned data download" script, at least for this data set.
- 5b & c - I've discussed above my issues with the splits, which (per 5b) are indeed described but as described do not, in most figures, do what is claimed (match across MOAs), since except for in the leave-compound-out experiments there will be significant leakage. Since the authors are using others' data, not their own, it is certainly not their fault that the source data was set up to always have the same compound in the same position and that most MOAs have only one compound, but the work should be much more specific about when retrieval/classification is actually based on MOAs rather than just different replicates of the same exact compound.

As I said, I am excited about this work, and look very much forward to an improved and clarified version!

Reviewer 3

This is a well-structured paper on an important topic. However, there are a few minor and larger issues that I think should be addressed/motivated/fixed.

The images are too small, and the text in the figures is also too small.

Supplemental Figure 3(D-F) is identical to Figure 3.

It is not clear how the illustration of well splits into training, validation and test in figure 1C can produce sets of 60%, 10%, and 30%, respectively.

I lack a discussion of how this should be used. Are the results (~10%) good enough for any real use?

Motivations and explanations of what the selection of data subsets actually means are lacking. The cell lines are the same in the 2 datasets, but only one cell type is chosen from LINCS. Also, different concentrations are used in the 2 datasets, with a motivation given for the higher one. What does that imply for the lower concentration in the JUMP1 dataset: how many compounds actually display the MoA phenotypes? Similarly regarding the timepoints: 24 and 48 hours for JUMP1 (is it AND, i.e. both?), and 48 h for LINCS.

Although it is nice to train a system for so many MoAs, it is not convincing that the data you have is enough for this (MoA rather than compound). For most MoAs you cannot separate between compound and MoA, since there is only one compound per MoA. It is good that you run additional experiments where you split at the compound level, but even so there are rather few compounds per MoA. Also, with the splitting you do, I believe you train and test with data from the same plates. I suggest you use the DMSO wells (and add a plot) to verify you don't have any plate effects.

I find the comparison to DP (and CP?) somewhat skewed. DP is trained on compounds and MP on MoA, which is a plausible source of the difference in performance. I lack a more detailed error analysis/investigation of where the main sources of differences and errors stem from (confusion matrix perhaps?).


 

************

REVIEWER REPORT(S):

Referee: 1

Comments to the Author

The authors present a deep-learning approach for generating numerical representations from cell painting images with the goal to predict the mechanism-of-action of compounds, which is a highly relevant task in drug discovery. For their study they used two datasets that were made publicly available in recent years. Overall, the results indicate that their approach outperforms established approaches and the methods used in the study are clearly documented.

However, I feel that the article focuses too much on the scenario where the models are trained and tested on the same compounds (held-out well setup). This scenario is not representative of the real-world MoA prediction problem. More emphasis should be placed on the held-out compound scenario and the corresponding evaluation of the proposed approach w.r.t. identifying the MoA of compounds not included in the model training, by restructuring the story flow of the manuscript and adapting statements made on the number of evaluated MOAs. The authors show in Figure 4 that their approach is on par with or outperforms CP and DP consistently across two datasets in the held-out compound setup. In my opinion this is the central and interesting result from this study (plus the practical advantages of MP over CP and DP as discussed by the authors), albeit with lower performance gains than in the held-out well setup. I also recommend making more use of the supplementary material for the held-out well results.

We thank the reviewer for acknowledging the importance of the study. We agree that held-out compound is the more pressing and real-world application (we have adapted the Discussion and Figure 4 to emphasize this). However, the held-out well analyses were informative for answering questions of compound similarity with shared MOAs and the method’s propensity to detect these. For this study, we could not answer these same questions with solely a compound-holdout scheme due to the limited dataset size (as Reviewer 3 points out – we had relatively few compounds per MOA). Hence, we could not afford to hold out multiple compounds for each MOA (just one each). We have added this as a limitation in the Discussion so the reader is aware. We thought that presenting both well-holdout and compound-holdout results would present a fuller picture. We have also adapted statements made on the number of evaluated MOAs.

Additionally, I believe the manuscript would be more informative and convincing for interested readers by addressing the following points:

(>>> = cited text from the manuscript, *** = my comments)

1.

*** Please include a short rationale for choosing EfficientNet-B0 over other architectures.

We have included justification in the Results section: “We chose EfficientNet because of its high ratio of performance on the ImageNet dataset to its number of parameters.”

2.

*** Please include a short statement on the plate layouts of the two datasets; specifically, were all replicates of a compound assayed in the same well position?

We have added the following statement to the Methods: “All compounds had replicates plated at the same well locations.”

3.

*** Which MoAs can MP predict reliably that CP and DP cannot?

See the newly included Supplemental Tables 1 and 2.

4.

*** Please include a table with all MoAs evaluated and how well they are predicted in the held-out compound setup.

See newly included Supplemental Tables 1 and 2.

5.

*** Is there a relationship between the predictability of an MOA and how many compounds are in the training set for that MOA?

See newly included Supplemental Figure 5A.

6.

*** How do the corresponding t-SNE maps for CP and DP features look for the highlighted example MOAs in figures 3g and 5g?

See newly included Supplemental Figure 3.

7.

>>> To compare CP and MP embeddings independent of downstream embedding processing transformations, we used the raw provided CP embeddings and performed a simple standardization of features for both JUMP1 and LINCS (zero mean and unit variance to prevent features with wider ranges from disproportionately affecting similarity metrics)

*** A common and effective baseline method to normalize segmentation-based features (such as from CellProfiler) is by z-scoring each feature with respect to the DMSO controls on the same plate as done in many other publications. Please provide evidence that simple standardization (i.e., z-scoring across all sample and control wells in the entire dataset) does not lead to inferior results for CP features.

We have switched from a simple standardization to the recommended z-scoring with respect to the DMSO controls on the same plate for all CP, DP, and MP analyses.

8.

*** Why was no z-score normalization against DMSO controls done for DP and MP features?

We thank the reviewer for suggesting a more commonly used normalization. As per last comment, all analyses now utilize DMSO-standardization.
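To make the two normalization schemes discussed in points 7 and 8 concrete, here is a minimal sketch; the pandas layout and the "plate"/"compound" column names are assumptions for illustration, not the authors' actual pipeline.

```python
import pandas as pd

def global_standardize(df, feature_cols):
    """Zero mean and unit variance computed across all wells in the dataset."""
    out = df.copy()
    out[feature_cols] = (df[feature_cols] - df[feature_cols].mean()) / df[feature_cols].std(ddof=0)
    return out

def dmso_zscore_per_plate(df, feature_cols, plate_col="plate", compound_col="compound", control="DMSO"):
    """z-score each feature against the DMSO control wells on the same plate."""
    out = df.copy()
    for _, idx in df.groupby(plate_col).groups.items():
        plate_df = df.loc[idx]
        ctrl = plate_df[plate_df[compound_col] == control]
        mu = ctrl[feature_cols].mean()
        sigma = ctrl[feature_cols].std(ddof=0).replace(0, 1.0)  # guard against constant features
        out.loc[idx, feature_cols] = (plate_df[feature_cols] - mu) / sigma
    return out
```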

9.

>>> Since some compounds have the same MOA, we wanted to know whether the model was learning MOA-specific phenotypes consistent through different compounds with the same MOA, as opposed to simply learning each compound’s phenotypic effect. Hence, we calculated the average PCC of three groups of well pairs with: 1) the same compound (and hence same MOA) 2) different compounds with the same MOA and 3) different compounds with different MOAs. We found that of the embedding pairs with different compounds, the pairs with the same MOA class were more similar than those with different MOA classes (average PCC = 0.23 vs 0.02, Figure 3F). The three populations’ PCC averages were all significantly different (p<<0.0001 for all two-sided z-tests). This suggests that indeed the phenotypic embeddings that the model encoded were MOA-specific rather than compound-specific.

*** Please show such an analysis for held-out compounds. In the held-out well setup shown here, the model might associate very different compound-specific phenotypes of same-MoA compounds into similar embeddings and thus overfit w.r.t to how MoA-specific the learned embeddings are.

See newly included Supplemental Figure 5B. Since we held out one compound for each MOA, we do not have same-MOA different-compound pairs in the held-out compound setup (middle column in 3F and 5F).
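A minimal sketch of the pairwise-PCC grouping described in the quoted passage, assuming well-level embeddings are rows of a NumPy array with parallel compound and MOA label lists (illustrative only):

```python
import itertools
import numpy as np

def pairwise_pcc_groups(embeddings, compounds, moas):
    """Split all well-pair Pearson correlations into three groups:
    same compound, different compound but same MOA, and different MOA."""
    groups = {"same_compound": [], "same_moa_diff_compound": [], "diff_moa": []}
    for i, j in itertools.combinations(range(len(embeddings)), 2):
        pcc = np.corrcoef(embeddings[i], embeddings[j])[0, 1]
        if compounds[i] == compounds[j]:
            groups["same_compound"].append(pcc)
        elif moas[i] == moas[j]:
            groups["same_moa_diff_compound"].append(pcc)
        else:
            groups["diff_moa"].append(pcc)
    return {k: float(np.mean(v)) if v else float("nan") for k, v in groups.items()}
```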

10.

>>> With the trained classification model, we measured how well the model created embeddings that were meaningful for MOA classification. This differed from simply using the model as a classifier, which would have limited use outside of the fixed MOA set on which it was trained. Hence, we extracted image embeddings from an intermediate layer in the network, aggregated them by well by taking the median, and assessed how valuable these well-level embeddings were for the task of MOA classification compared to CP and DP (Methods).

*** Please show that MOAs not included in the model training can be predicted based on such embeddings to support the statement made here.

With the quoted statement, we were advocating for the preferability of representations instead of simply discrete classification. To avoid confusion and the possible implications to the reader that we can discover even new MOAs, we have modified this statement and included a citation going over the fundamentals of representation learning. We have also added thoughts on the use of classification vs learned representations in the Discussion. Unfortunately, we cannot predict MOAs not included in the model training, since we did not hold out MOAs. We would be very intrigued by a model capable of hypothesizing even new MOAs (MOA-holdout goes one more step beyond the compound-holdout we show). However, this is out of scope for our study.
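A minimal sketch of extracting intermediate-layer embeddings and aggregating them per well by the median, using a stock torchvision EfficientNet-B0 as a stand-in for the authors' network (the real model ingests multi-channel Cell Painting images; that adaptation is omitted here):

```python
import torch
import torch.nn as nn
from torchvision.models import efficientnet_b0

# Illustrative only: remove the classification head so the forward pass returns
# the pooled 1280-dim penultimate features rather than MOA logits.
model = efficientnet_b0(weights=None)
model.classifier = nn.Identity()
model.eval()

@torch.no_grad()
def well_embedding(field_images: torch.Tensor) -> torch.Tensor:
    """field_images: (n_fields, 3, H, W) tensor of all fields imaged in one well.
    Returns the element-wise median of the per-field image embeddings."""
    emb = model(field_images)           # (n_fields, 1280)
    return emb.median(dim=0).values     # (1280,)
```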

11.

>>> Furthermore, as with most deep representation learning approaches, interpretability is lacking and falls behind traditional computer vision approaches like CP, which is highly interpretable.

*** I would recommend a less strong statement regarding interpretability of CP features. Many of the CP features typically used for morphological profiling are also difficult to interpret as to their biological meaning. Perhaps simply "more interpretable" instead of "highly interpretable". Or alternatively, a reference to a study in which the interpretability of CP features was critical for MoA insights.

We have lessened the strength of the statement as recommended.

Referee: 2

Comments to the Author

There are major issues in this work that unfortunately undermine its conclusions to the degree that I cannot recommend its publication at this time. Getting equivalent or better performance on the task of MOA retrieval without the hassle of segmentation would be a fantastic advance, so I think this is important work, and I would be delighted to read an improved version of it, but serious methodological issues exist in the paper as written.

We thank the reviewer for their interest in our work.

- To start from the biggest issue - other than the analyses presented in Figure 4, what was presumably intended to be presented in Sup Figure 3 (more below), and Sup Figure 5, this is not an MOA profiler. In both datasets the authors use, compound replicates are not scrambled on the plates - they are always in the same well position. Thus, by the authors' assignment scheme, in all but these 3 figures the same compound is not prevented from being present in >1 of (or indeed, all 3 of) the training, test, and validation sets - in fact, they say they ensure all MOAs are in all 3, and since most MOAs have only one compound, most compounds are in all 3. So this is really a replicate profiler, not an MOA profiler.

The reviewer’s point is apt about the difficulty of assessing MOA profiling since most MOAs only have one compound. Indeed (previous) Figures 3 and 5 would be more indicative of a replicate profiler, and we thank the reviewer for pointing this out. We have eliminated these figures and replaced them with analyses that would speak more to MOA profiling (i.e. only looking at MOAs that have multiple different compounds). This way we could assess whether the model had simply learned to group compound replicates vs truly grouping different compounds with the same MOA.

- Supplemental figure 3, other than the right panel of part A, does not just "correspond" to Figure 3, it IS figure 3. As far as I can tell, no other values have changed, all other curves have the same "bumps" in the same places, all of which seems deeply unlikely when dropping from 266 compounds to 59. This makes it impossible to judge, for the JUMP dataset, how successful if at all the actual "MOA" profiling is in this set. Since Figures 5/S5 do not seem to have the same issue (other than perhaps panel F, which seems to be fully identical), I assume it was not done maliciously, but still, it makes this point impossible to assess for one of the 2 data sets.

We apologize for the clerical error and thank the reviewer for spotting it and for such fine attention to detail. We must have accidentally linked the wrong file in our Adobe Illustrator editor. We have corrected the figures.

---Answers for "no" questions in the data reviewer checklist

- 1b - no accession dates are provided, and the LINCS image pointers used in the repository are not the correct current pointers. The CellProfiler pipeline link in the methods section for LINCS is wrong. The model weights from DeepProfiler are listed as "automatically downloaded", but without knowing what date, it will be impossible to trace back to a version. The authors say that no spherized profiles were available in JUMP1, but the repository contains profile versions that say "spherized profiles" (in fact, that is the main description visible on the profiles folder when clicking on the provided link) with an addition date of December 2021.

We apologize for the outdated LINCS image pointers - it seems like the LINCS authors deleted the original repository we referenced in the README. We have updated the download instructions. We have also added accession dates to the Methods. For the CellProfiler pipeline, we linked the official README from the Broad Institute for profiling. Perhaps the reviewer is looking for the exact pipeline, which we have now added as a link to the Python script used for processing profiles. But if the reviewer knows of a better resource to link, we are happy to include this as well. We have also added an accession date for DeepProfiler weight downloads.

For the spherized profiles for JUMP1, we thank the reviewer for bringing its availability to our attention. We have downloaded the spherized versions and included a comparison in Supplemental Figure 4.

- 1c - Several possible biases (well position, timepoint of treatment, cell type) are disclosed, but only well position is examined for its effect on performance.

We have added additional analyses on potential biases of timepoint and cell type (Supplemental Figure 1B-C).

- 2a - This is a borderline no. While it looks like one could probably recreate the authors' subsetting of the data with the pickles, for the LINCS dataset some quick math says that if the authors are only using ~10K wells, they are using less than 20% of the LINCS dataset, yet they say to download the whole thing. In browsing through, it seems like the average file is about 0.5 MB, so at ~20K images/plate this is about 10 GB per plate, or over a terabyte from which the user must then throw out >80%. It would be strongly recommended to have a "cleaned data download" script, at least for this data set.

We apologize for the confusion – we only used a subset of the LINCS dataset as described in Methods: Dataset Preparation. We have clarified the wording so that the reader is aware that only a fraction of the dataset is used in practice. We have provided download instructions, the relevant CSVs for our subset, and a script to generate the cleaned data.
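One possible shape for such a "cleaned data download" script is sketched below; the file names ("lincs_index.csv", "subset_wells.csv", "cleaned_download_manifest.csv") are hypothetical and do not come from the authors' repository.

```python
import pandas as pd
from pathlib import Path

# Hypothetical inputs: "lincs_index.csv" is assumed to list one row per image with
# 'plate', 'well', and 'path' columns; "subset_wells.csv" lists the plate/well
# combinations actually used in the study.
index = pd.read_csv("lincs_index.csv")
subset = pd.read_csv("subset_wells.csv")

# Keep only the images belonging to the wells in the study subset.
manifest = index.merge(subset[["plate", "well"]], on=["plate", "well"], how="inner")
manifest.to_csv("cleaned_download_manifest.csv", index=False)

# Transfer only the files that are actually needed (<20% of the full dataset),
# e.g. by substituting each path into the appropriate copy/download command.
for path in manifest["path"]:
    print(Path(path))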

- 5b & c - I've discussed above my issues with the splits, which (per 5b) are indeed described but as described do not, in most figures, do what is claimed (match across MOAs), since except for in the leave-compound-out experiments there will be significant leakage. Since the authors are using others' data, not their own, it is certainly not their fault that the source data was set up to always have the same compound in the same position and that most MOAs have only one compound, but the work should be much more specific about when retrieval/classification is actually based on MOAs rather than just different replicates of the same exact compound.

Along with eliminating all analyses with MOAs that just have one compound present (see earlier comment about replacing previous Figures 3 and 5), we have also added clarifying statements in the Discussion to reduce the perceived impact of the MOA profiling when it comes to the well-holdout scheme. We also highlight Figure 4 as a test of MOA profiling that cannot be due to any replicate data leakage and must be due to matching MOAs.

As I said, I am excited about this work, and look very much forward to an improved and clarified version!

Referee: 3

Comments to the Author

This is a well-structured paper on an important topic. However, there are a few minor and larger issues that I think should be addressed/motivated/fixed.

The images are too small, and the text in the figures is also too small.

We have increased the font size and image zoom for the figures that were on the smaller side.

Supplemental Figure 3(D-F) is identical to Figure 3.

See prior comment in response to Reviewer 2. We thank both reviewers for helping us find the clerical error and apologize for the confusion it caused.

It is not clear how the illustration of well splits into training, validation and test in figure 1C can produce sets of 60%, 10%, and 30%, respectively.

Figure 1C is not the actual splitting scheme used and, as mentioned in the Figure Legends, is merely illustrative. We have clarified the exact splitting procedure in Methods.
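For readers who want the quoted proportions made concrete, a minimal well-level split sketch follows; it is illustrative only and not the authors' exact scheme, which is described in their Methods.

```python
import numpy as np

def split_wells(well_ids, fractions=(0.6, 0.1, 0.3), seed=0):
    """Randomly partition well identifiers into train/validation/test sets in the
    60/10/30 proportions quoted above (illustrative only)."""
    rng = np.random.default_rng(seed)
    wells = np.array(well_ids)
    rng.shuffle(wells)
    n_train = int(round(fractions[0] * len(wells)))
    n_val = int(round(fractions[1] * len(wells)))
    return wells[:n_train], wells[n_train:n_train + n_val], wells[n_train + n_val:]
```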

I lack a discussion of how this should be used. Are the results (~10%) good enough for any real use?

We have added to the Discussion our thoughts about practical usability, contextualizing the performance in light of 1) the difficulty of MOA determination and 2) comparison to the industry’s current gold standard.

Motivations and explanations of what the selection of data subsets actually means are lacking. The cell lines are the same in the 2 datasets, but only one cell type is chosen from LINCS. Also, different concentrations are used in the 2 datasets, with a motivation given for the higher one. What does that imply for the lower concentration in the JUMP1 dataset: how many compounds actually display the MoA phenotypes? Similarly regarding the timepoints: 24 and 48 hours for JUMP1 (is it AND, i.e. both?), and 48 h for LINCS.

We have added more motivation and explanation for the data selection process in the Methods section.

Although it is nice to train a system for so many MoAs, it is not convincing that the data you have is enough for this (MoA rather than compound). For most MoAs you cannot separate between compound and MoA, since there is only one compound per MoA. It is good that you run additional experiments where you split at the compound level, but even so there are rather few compounds per MoA. Also, with the splitting you do, I believe you train and test with data from the same plates. I suggest you use the DMSO wells (and add a plot) to verify you don't have any plate effects.

We have adjusted all analyses to exclude MOAs that only have one compound present to eliminate the problem of one-to-one correspondence that Referees 2 and 3 aptly pointed out. We showcase analyses with only multiple-compound MOAs as main text figures. We also devoted Supplemental Figure 7 to analyze potential plate effects. Although it would be great to have more data for training, we demonstrate that there was at least enough compound data to make correct predictions on some of the held-out compound data (Figure 4).
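A minimal sketch of the kind of DMSO plate-effect check suggested here, assuming DMSO-well embeddings in a NumPy array with a parallel list of plate labels (the authors' Supplemental Figure 7 may use a different projection or layout):

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

def plot_dmso_plate_effects(dmso_embeddings, plates):
    """Project DMSO-well embeddings to 2D and colour points by plate; tight
    per-plate clusters would suggest a plate (batch) effect."""
    coords = PCA(n_components=2).fit_transform(np.asarray(dmso_embeddings))
    plates = np.asarray(plates)
    for plate in np.unique(plates):
        mask = plates == plate
        plt.scatter(coords[mask, 0], coords[mask, 1], s=8, label=str(plate))
    plt.xlabel("PC1")
    plt.ylabel("PC2")
    plt.legend(fontsize=6, ncol=2)
    plt.title("DMSO wells coloured by plate")
    plt.show()
```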

I find the comparison to DP (and CP?) somewhat skewed. DP is trained on compounds and MP on MoA, which is a plausible source of the difference in performance. I lack a more detailed error analysis/investigation of where the main sources of differences and errors stem from (confusion matrix perhaps?).

Since DP is advertised as a weakly supervised learning scheme (i.e. directly train on compound, but evaluate on an auxiliary task like MOA classification, drug toxicity, etc.), we chose to keep the same scheme as the original authors. We have specified DP’s design choice in the Discussion and Methods to ensure the reader is aware of the weak-supervision.

************




Round 2

Revised manuscript submitted on 21 Jun 2023
 

25-Jul-2023

Dear Dr Wong:

Manuscript ID: DD-ART-04-2023-000060.R1
TITLE: Deep Representation Learning Determines Drug Mechanism of Action from Cell Painting Images

Thank you for your submission to Digital Discovery, published by the Royal Society of Chemistry. I sent your manuscript to reviewers and I have now received their reports which are copied below.

After careful evaluation of your manuscript and the reviewers’ reports, I will be pleased to accept your manuscript for publication after revisions.

Please revise your manuscript to fully address the reviewers’ comments. When you submit your revised manuscript please include a point by point response to the reviewers’ comments and highlight the changes you have made. Full details of the files you need to submit are listed at the end of this email.

Digital Discovery strongly encourages authors of research articles to include an ‘Author contributions’ section in their manuscript, for publication in the final article. This should appear immediately above the ‘Conflict of interest’ and ‘Acknowledgement’ sections. I strongly recommend you use CRediT (the Contributor Roles Taxonomy, https://credit.niso.org/) for standardised contribution descriptions. All authors should have agreed to their individual contributions ahead of submission and these should accurately reflect contributions to the work. Please refer to our general author guidelines https://www.rsc.org/journals-books-databases/author-and-reviewer-hub/authors-information/responsibilities/ for more information.

Please submit your revised manuscript as soon as possible using this link :

*** PLEASE NOTE: This is a two-step process. After clicking on the link, you will be directed to a webpage to confirm. ***

https://mc.manuscriptcentral.com/dd?link_removed

(This link goes straight to your account, without the need to log in to the system. For your account security you should not share this link with others.)

Alternatively, you can login to your account (https://mc.manuscriptcentral.com/dd) where you will need your case-sensitive USER ID and password.

You should submit your revised manuscript as soon as possible; please note you will receive a series of automatic reminders. If your revisions will take a significant length of time, please contact me. If I do not hear from you, I may withdraw your manuscript from consideration and you will have to resubmit. Any resubmission will receive a new submission date.

The Royal Society of Chemistry requires all submitting authors to provide their ORCID iD when they submit a revised manuscript. This is quick and easy to do as part of the revised manuscript submission process.   We will publish this information with the article, and you may choose to have your ORCID record updated automatically with details of the publication.

Please also encourage your co-authors to sign up for their own ORCID account and associate it with their account on our manuscript submission system. For further information see: https://www.rsc.org/journals-books-databases/journal-authors-reviewers/processes-policies/#attribution-id

I look forward to receiving your revised manuscript.

Yours sincerely,
Linda Hung
Associate Editor
Digital Discovery
Royal Society of Chemistry

************


 
Reviewer 2

I appreciate the authors' response to the criticisms and find the manuscript generally much improved. I have a couple of open questions I think the manuscript would be improved in addressing, since I think it's critical to understand how performant the "real world" case actually is, given that most of the quantification is done in the non-real-world case of having the compound in the training set. I think it would be much, much improved by making these changes but I would not say it is mandatory to do so.

- I still have a question about the test/val/train split in Figures 3 and 5 - it is definitely an improvement to only use multiple-compound-MOAs in the test set. But if I understand the data splits correctly, it is still possible that other wells of that exact compound are present in the training set, yes (this is one of the distinctions between 3 and 4, as I understand it)? If no, the description of the split needs to be refined; if yes, this needs to be made more explicit as a caveat in the paragraph starting "Since most MOAs were only represented by one compound" (and possibly a similar paragraph in the LINCS section, but if sufficiently clear in the first case I think the reader will understand).
- Is there a reason not to use the same metrics in Figure 4 as in 3e? The addition of the "individual well vote vs aggregated" comparison is nice, but it still seems like an apples-to-apples comparison would be easier to draw by also having a figure that is 3e/5e but extended to show "MOAProfiler- compound held out in training"

Reviewer 4

After reading through the previous referees' comments and the rebuttal, I think that the paper is mostly fine as it stands (and really in scientifically better shape than the previous version, realistic w.r.t. claims, etc.)

A few items:

- Intro should probably cite recent related work a bit more extensively, e.g.:
https://www.sciencedirect.com/science/article/pii/S2667318523000041
https://www.biorxiv.org/content/10.1101/580654v2
or the like; also the author's own papers, say
https://pubs.acs.org/doi/10.1021/acs.jcim.8b00670
Nassiri did conceptually related work as well, though from a different angle:
https://pubmed.ncbi.nlm.nih.gov/30011038/

- Intro: "One goal of phenotypic HCS is to determine the MOAs of compounds" - not sure I would agree with this; yes, you can do guilt-by-association (as is done here), but mostly you would determine MoA _afterwards_ (...which is of course precisely a strong point for the work presented here, you get the MoA into place much earlier!)

- 'Mode of Action' is quite multi-faceted; this could/probably should be pointed out in the intro/results/methods as appropriate, the authors could cite e.g. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8827085/ or similar reviews which discusses this in detail. Mode of action is different from 'hitting a target', in vitro activity doesn't always translate to in vivo effects, how do you deal with multiple targets, dose, etc etc., biology is just much more complex than a binary labelling problem (this is currently quite simplified in the methods section, referring to 'single-MoA-compounds' etc.)

- The dataset is quite limited in size (as the previous referees pointed out), so results and discussion could include this aspect a bit more strongly (results always need to be seen in light of the data)

- Results and discussion are basically entirely devoid of biology/pharmacology, which is the main underlying topic though - so which target classes work best, what is actually covered in the data in the first place, etc.? In practice that would really matter - we don't just want a 'tubulin inhibitor detector', so something that detects only the obvious (and often not really relevant in discovery projects), but something that provides finer levels of information to really support drug discovery projects and decision making therein

So overall I would suggest to cite previous related work more broadly, discuss the assumptions in 'labelling MoAs' and those stemming from the dataset size more explicitly, and ideally add more biology/pharmacology/dataset characterisation, which would make the paper more relevant for real-world applications. It's generally a sound direction of the study though, no concerns about that.


 

Referee 2
Comments to the Author
I appreciate the authors' response to the criticisms and find the manuscript generally much improved. I have a couple of open questions I think the manuscript would be improved in addressing, since I think it's critical to understand how performant the "real world" case actually is, given that most of the quantification is done in the non-real-world case of having the compound in the training set. I think it would be much, much improved by making these changes but I would not say it is mandatory to do so.

- I still have a question about the test/val/train split in Figures 3 and 5 - it is definitely an improvement to only use multiple-compound-MOAs in the test set. But if I understand the data splits correctly, it is still possible that other wells of that exact compound are present in the training set, yes (this is one of the distinctions between 3 and 4, as I understand it)? If no, the description of the split needs to be refined; if yes, this needs to be made more explicit as a caveat in the paragraph starting "Since most MOAs were only represented by one compound" (and possibly a similar paragraph in the LINCS section, but if sufficiently clear in the first case I think the reader will understand).
For Figures 3 and 5, it is indeed possible for wells of the same compound to be present in the training set. For Figure 4, this is not possible. We have included this clarification in the requested paragraph.

- Is there a reason not to use the same metrics in Figure 4 as in 3e? The addition of the "individual well vote vs aggregated" comparison is nice, but it still seems like an apples-to-apples comparison would be easier to draw by also having a figure that is 3e/5e but extended to show "MOAProfiler- compound held out in training"
The “aggregated vote” method in Figure 4 is identical to the method in Figure 3e (assigning a compound’s MOA by similarity of the CLE to the MLE). We have modified Figure 4 to include F1, precision, and recall.
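A minimal sketch of the "aggregated vote" assignment described here, assuming the compound-level embedding (CLE) and the MOA-level embeddings (MLEs) are NumPy vectors and that Pearson correlation is the similarity metric (the manuscript specifies the exact choice):

```python
import numpy as np

def assign_moa(cle, mle_by_moa):
    """Return the MOA whose MOA-level embedding (MLE) is most similar to the
    compound-level embedding (CLE), using Pearson correlation as the similarity
    metric (an assumption for illustration)."""
    best_moa, best_pcc = None, -np.inf
    for moa, mle in mle_by_moa.items():
        pcc = np.corrcoef(cle, mle)[0, 1]
        if pcc > best_pcc:
            best_moa, best_pcc = moa, pcc
    return best_moa
```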

Referee 4
Comments to the Author
After reading through the previous referees' comments and the rebuttal, I think that the paper is mostly fine as it stands (and really in scientifically better shape than the previous version, realistic w.r.t. claims, etc.)

A few items:

- Intro should probably cite recent related work a bit more extensively, e.g.:
https://www.sciencedirect.com/science/article/pii/S2667318523000041
https://www.biorxiv.org/content/10.1101/580654v2
or the like; also the author's own papers, say
https://pubs.acs.org/doi/10.1021/acs.jcim.8b00670
Nassiri did conceptually related work as well, though from a different angle:
https://pubmed.ncbi.nlm.nih.gov/30011038/
We have added the suggested citations and thank the reviewer for bringing them to our attention.

- Intro: "One goal of phenotypic HCS is to determine the MOAs of compounds" - not sure I would agree with this; yes, you can do guilt-by-association (as is done here), but mostly you would determine MoA _afterwards_ (...which is of course precisely a strong point for the work presented here, you get the MoA into place much earlier!)

- 'Mode of Action' is quite multi-faceted; this could/probably should be pointed out in the intro/results/methods as appropriate, the authors could cite e.g. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8827085/ or similar reviews which discusses this in detail. Mode of action is different from 'hitting a target', in vitro activity doesn't always translate to in vivo effects, how do you deal with multiple targets, dose, etc etc., biology is just much more complex than a binary labelling problem (this is currently quite simplified in the methods section, referring to 'single-MoA-compounds' etc.)
We stated the complexity and multifaceted nature of MOA, and added the accompanying citation in the Introduction so the reader is aware.

- The dataset is quite limited in size (as the previous referees pointed out), so results and discussion could include this aspect a bit more strongly (results always need to be seen in light of the data)
We’ve added the limited dataset size as a caveat to the Discussion.

- Results and discussion are basically entirely devoid of biology/pharmacology, which is the main underlying topic though - so which target classes work best, what is actually covered in the data in the first place, etc.? In practice that would really matter - we don't just want a 'tubulin inhibitor detector', so something that detects only the obvious (and often not really relevant in discovery projects), but something that provides finer levels of information to really support drug discovery projects and decision making therein
See Supplemental Figure 9 for a breakdown of the best- and worst-performing MOAs, and Supplemental Tables 1 and 2 for granular MOA performance. We have also added more biologically motivated discussion as future directions in the Discussion section.

So overall I would suggest to cite previous related work more broadly, discuss the assumptions in 'labelling MoAs' and those stemming from the dataset size more explicitly, and ideally add more biology/pharmacology/dataset characterisation, which would make the paper more relevant for real-world applications. It's generally a sound direction of the study though, no concerns about that.
We’ve included more citations to previous work. We’ve added caveats to the discussion about labeling MOAs and how they are indeed broadly and simplistically defined, and independent of concentration. We’ve also added a caveat about dataset size. We’ve included more biology-focused discussion as possible future directions and applications.




Round 3

Revised manuscript submitted on 26 Jul 2023
 

09-Aug-2023

Dear Dr Wong:

Manuscript ID: DD-ART-04-2023-000060.R2
TITLE: Deep Representation Learning Determines Drug Mechanism of Action from Cell Painting Images

Thank you for submitting your revised manuscript to Digital Discovery. I am pleased to accept your manuscript for publication in its current form. I have copied any final comments from the reviewer(s) below.

You will shortly receive a separate email from us requesting you to submit a licence to publish for your article, so that we can proceed with the preparation and publication of your manuscript.

You can highlight your article and the work of your group on the back cover of Digital Discovery. If you are interested in this opportunity please contact the editorial office for more information.

Promote your research, accelerate its impact – find out more about our article promotion services here: https://rsc.li/promoteyourresearch.

If you would like us to promote your article on our Twitter account @digital_rsc please fill out this form: https://form.jotform.com/213544038469056.

We are offering all corresponding authors on publications in gold open access RSC journals who are not already members of the Royal Society of Chemistry one year’s Affiliate membership. If you would like to find out more please email membership@rsc.org, including the promo code OA100 in your message. Learn all about our member benefits at https://www.rsc.org/membership-and-community/join/#benefit

By publishing your article in Digital Discovery, you are supporting the Royal Society of Chemistry to help the chemical science community make the world a better place.

With best wishes,

Linda Hung
Associate Editor
Digital Discovery
Royal Society of Chemistry


 
Reviewer 2

I appreciate the authors response to feedback; I am now happy with the manuscript as is. Congratulations!




Transparent peer review

To support increased transparency, we offer authors the option to publish the peer review history alongside their article. Reviewers are anonymous unless they choose to sign their report.

We are currently unable to show comments or responses that were provided as attachments. If the peer review history indicates that attachments are available, or if you find there is review content missing, you can request the full review record from our Publishing customer services team at RSC1@rsc.org.

Find out more about our transparent peer review policy.

Content on this page is licensed under a Creative Commons Attribution 4.0 International license.