From the journal Digital Discovery Peer review history

Automated LC-MS analysis and data extraction for high-throughput chemistry

Round 1

Manuscript submitted on 25 Aug 2023
 

26-Sep-2023

Dear Dr Mason:

Manuscript ID: DD-ART-08-2023-000167
TITLE: Automated LC-MS Analysis and Data Extraction for High-Throughput Chemistry

Thank you for your submission to Digital Discovery, published by the Royal Society of Chemistry. I sent your manuscript to reviewers and I have now received their reports which are copied below.

After careful evaluation of your manuscript and the reviewers’ reports, I will be pleased to accept your manuscript for publication after revisions.

Please revise your manuscript to fully address the reviewers’ comments. When you submit your revised manuscript please include a point by point response to the reviewers’ comments and highlight the changes you have made. Full details of the files you need to submit are listed at the end of this email.

Digital Discovery strongly encourages authors of research articles to include an ‘Author contributions’ section in their manuscript, for publication in the final article. This should appear immediately above the ‘Conflict of interest’ and ‘Acknowledgement’ sections. I strongly recommend you use CRediT (the Contributor Roles Taxonomy, https://credit.niso.org/) for standardised contribution descriptions. All authors should have agreed to their individual contributions ahead of submission and these should accurately reflect contributions to the work. Please refer to our general author guidelines https://www.rsc.org/journals-books-databases/author-and-reviewer-hub/authors-information/responsibilities/ for more information.

Please submit your revised manuscript as soon as possible using this link:

*** PLEASE NOTE: This is a two-step process. After clicking on the link, you will be directed to a webpage to confirm. ***

https://mc.manuscriptcentral.com/dd?link_removed

(This link goes straight to your account, without the need to log in to the system. For your account security you should not share this link with others.)

Alternatively, you can login to your account (https://mc.manuscriptcentral.com/dd) where you will need your case-sensitive USER ID and password.

You should submit your revised manuscript as soon as possible; please note you will receive a series of automatic reminders. If your revisions will take a significant length of time, please contact me. If I do not hear from you, I may withdraw your manuscript from consideration and you will have to resubmit. Any resubmission will receive a new submission date.

The Royal Society of Chemistry requires all submitting authors to provide their ORCID iD when they submit a revised manuscript. This is quick and easy to do as part of the revised manuscript submission process. We will publish this information with the article, and you may choose to have your ORCID record updated automatically with details of the publication.

Please also encourage your co-authors to sign up for their own ORCID account and associate it with their account on our manuscript submission system. For further information see: https://www.rsc.org/journals-books-databases/journal-authors-reviewers/processes-policies/#attribution-id

I look forward to receiving your revised manuscript.

Yours sincerely,
Professor Jason Hein
Associate Editor, Digital Discovery

************


 
Reviewer 1

A beautiful tool for making LCMS data available for analysis, thank you for making it available as open source!

Page 6: It’d be interesting to know if there are any things in common for the cases in which the assignment was wrong and what the major causes for failure are.

Page 7: What about other possible side products in the reaction? One could think of O-arylation, reductive debromination/dimerization, hydrolysis of the aryl bromide. Is there a way for the data to be written to a database to check for other masses without re-processing everything?

Figure 2: I’m not sure 2B does the savings you achieve enough justice. Wouldn’t it make sense to distinguish between man hours and time spent by the computer processing something? Also, while the logarithmic scale makes it easier to visualize, it minimizes the stark difference between the two approaches even more.

Reviewer 2

The manuscript “Automated LC-MS Analysis and Data Extraction for High-Throughput Chemistry” by Mason et al. introduces PyParse as a Python program designed for analyzing LC-MS data from high-throughput experiments. To underscore its utility, the authors present two real-world scenarios where PyParse has yielded substantial improvements in the speed and information content of plate data analysis, i.e., in the analysis of i) a previously published GSK data set from a Direct-to-Biology (D2B) synthesis and ii) an optimization campaign for the C-H activation of oxazoles.

A big asset of this manuscript is the authors making PyParse available under the open-source Apache 2.0 license including comprehensive documentation and example data.

In general, the manuscript highlights a very nice example of how chromatographic workflows (data generation and analysis) become rate-limiting in state-of-the-art high-throughput labs and how new data analysis tools can help overcome the bottleneck of labor-intensive human data analysis of ever-growing datasets. Overall, I recommend publishing the manuscript in Digital Discovery after addressing the following major and minor comments:

(major) The meta-analysis including the Thomas et al. paper is very effective in building trust in the tool. However, please clarify how results were assigned to the three categories. This did not become clear to the reviewer even after consulting SI page S12 (why peak area difference less than 1?). Additionally, could the authors give an estimate of quantitative “peak area percentage” error when comparing PyParse’s results to Thomas et al? Finally, could the authors provide reasoning behind “Incorrect” results?

(major) Please clarify the following section in “Method”:

““Overlap Detection” finds where a second peak has overlapped with the product peak in the most successful well.”

Why only in the most successful well? Couldn’t the second most successful well also have an overlap? Additionally, does PyParse also check internal standard signals for overlaps as they are crucial for identifying “best performing conditions” as described in Scheme 1?

(minor) Caption for Scheme 1d is missing.

(minor) For a broad readership, an additional statement introducing the principle of D2B could help in the introduction, e.g., that large libraries are assayed as unchromatographed mixtures.

(minor) It would be nice to include the times for setting up and running the experiments as well as analytical data generation (time for chromatographic runs) for both datasets. This gives context to the discussion around Figure 2c.

(minor) When reporting about time for analysis with PyParse, it would be helpful if the authors could include technical details on the computing resources used.

(minor) It would be helpful for the readership if the authors visualized some key chromatograms in the SI to give an impression of the complexity and quality of the chromatographic separations.

(minor) In the SI, please indicate which LC instruments and which mass detectors were used for data generation.

(minor) Using area percentage of LC peaks as “estimate of purity” of fragments should be backed up with information on PDA settings, especially the detection wavelength or wavelength range used for integration. This is especially relevant since the authors discuss a few sentences later that the “>1000 wells contained numerous complex reaction profiles and/or products with poor UV absorption characteristics”.

Reviewer 3

Dear editor,
I have completed my review of the manuscript titled “Automated LC-MS Analysis and Data Extraction for High-Throughput Chemistry” by authors Mason et al. My recommendation is to accept the manuscript for publication. Please note that my recommendation is based only on the computational section of the manuscript. For reviewing the experiments performed in the study, please request an experimental reviewer in the respective field for their review.
The authors have presented a Python library, PyParse, for an automated and accessible program for data extraction from high-throughput chemistry experiments. The library program is capable of reading and analyzing liquid chromatography mass spectrometry (LC-MS) data for high-throughput chemistry. The authors claim that PyParse has been shown to provide improvements in speed and accuracy in analyzing LC-MS plate data, potentially becoming an alternative to other prohibitive and high-cost commercial solutions. In the study, the authors have also demonstrated how PyParse works, by discussing its application in two cases: a Direct-to-Biology (D2B) synthesis and screening of a reactive fragment library, and an LC-MS plate-based optimization for the C-H activation of oxazoles.
Overall, the manuscript is well written and easy to follow. The authors have also provided the link to the PyParse GitHub repository, which has been published under an open-source Apache 2.0 license to allow its broader usability in academia and in industry.

Kindly,


 

Dear Professor Hein,

Many thanks for your email of 26th September 2023, communicating the decision on the above-named and -numbered manuscript. The manuscript was assessed by three referees, with all three referees of the view that the work should be accepted for publication in Digital Discovery, following minor revisions.

In accordance with the stated requests for revision and other matters that need to be addressed, below we list the pertinent comments from all referees, and present our responses in turn to each point. All amendments to the manuscript file have been highlighted in yellow in one version of the manuscript file (uploaded as “Other”); a final revised manuscript file that does not contain any highlighting has also been uploaded (as “Main Article”). The amendments to the Supporting Information file have not been highlighted; nonetheless, the changes made to the Supporting Information file are listed below.

Reviewer 1
1. Page 6: It’d be interesting to know if there are any things in common for the cases in which the assignment was wrong and what the major causes for failure are.
The primary cause of failure was generally found to relate to the data itself, where overlapping or “shoulder” peaks in the LC-MS UV trace led to compounds being assigned to the baseline impurities observed. A comment to this effect has been added to the Supplementary Information (page S14) in conjunction with a supporting Figure (Figure S2).

2. Page 7: What about other possible side products in the reaction? One could think of O-arylation, reductive debromination/dimerization, hydrolysis of the aryl bromide. Is there a way for the data to be written to a database to check for other masses without re-processing everything?
In the course of conducting the analysis for the reaction optimisation plate in PyParse, we did not specifically observe the formation of the protodehalogenation side-product, nor the corresponding phenol, nor the ether product that arises from O-arylation. The PyParse script will accept any number of byproducts provided as part of the platemap, and a comment to this effect has been added to the Supplementary Information, page S4, paragraph 3.
At the time of publication, the PyParse script does not write the analysis to a database.

3. Figure 2: I’m not sure 2B does the savings you achieve enough justice. Wouldn’t it make sense to distinguish between man hours and time spent by the computer processing something? Also, while the logarithmic scale makes it easier to visualize, it minimizes the stark difference between the two approaches even more.
Figure 2B already represents the time taken for the PyParse script to complete (i.e. time spent by the computer processing) and the estimated time taken by Thomas et al. to produce an equivalent dataset containing the purity of each compound in each well, based upon the UV peak percentage area. We have elected to continue to prioritize ease of understanding for the reader, by using a logarithmic scale for the visualisation in Figure 2B. No changes were made to the manuscript, nor to the supplementary information on the basis of this comment.

Reviewer 2
1. (major) The meta-analysis including the Thomas et al. paper is very effective in building trust in the tool. However, please clarify how results were assigned to the three categories. This did not become clear to the reviewer even after consulting SI page S12 (why peak area difference less than 1?). Additionally, could the authors give an estimate of quantitative “peak area percentage” error when comparing PyParse’s results to Thomas et al? Finally, could the authors provide reasoning behind “Incorrect” results?
The results were assigned to the three categories per the description contained within the Supplementary Information, page S13. A reference to this section of the Supplementary Information has been added to the Manuscript (page 2, right column, final paragraph).
A peak area difference of less than 1 was deemed acceptable, as it accounted for the rounding of peak percentage areas to the nearest integer as carried out by Thomas et al in their original analysis. A statement to this effect has been added to the Supplementary Information, page S13, first bullet point.

The analysis by PyParse itself is carried out using the same LC-MS data files that were used by Thomas et al; these LC-MS data files have already been processed such that the peak areas have already been determined by integration, prior to the PyParse script being run. As such, the peak area percentage values reported by both Thomas et al and by the PyParse script are the same, when rounded to the nearest integer. The wording used to describe this analysis has been amended in the Manuscript to make it clear to the reader that the analysis by PyParse used the original LC-MS files obtained by Thomas et al (page 2, right column, paragraph 2).
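
The rounding tolerance described above can be illustrated with a minimal Python sketch. The function name `areas_agree` and the example values are hypothetical and are not taken from the PyParse source; the sketch assumes only that one value is an unrounded peak area percentage and the other has been rounded to the nearest integer.

```python
def areas_agree(unrounded_area: float, rounded_area: int, tolerance: float = 1.0) -> bool:
    """Return True when two peak area percentages differ by less than `tolerance`.

    Hypothetical helper: rounding to the nearest integer shifts a value by at
    most 0.5, so a tolerance of 1 absorbs differences caused by rounding alone.
    """
    return abs(unrounded_area - rounded_area) < tolerance


# Illustrative values only (not from the published dataset).
print(areas_agree(87.4, 87))  # difference explained by rounding
print(areas_agree(87.4, 85))  # difference larger than rounding can explain
```

Under this assumption, any value compared against its own rounded form passes the check, while a difference of one percentage point or more is treated as a genuine disagreement.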

Results marked incorrect are those where the conclusion drawn by PyParse did not match the conclusion drawn by Thomas et al, determined on a well-by-well basis. As described in our response to reviewer 1’s query, these failures were generally noted to result from close-running peaks, or overlapping peaks, that coincidentally displayed the same desired mass ion for the product in question.

2. (major) Please clarify the following section in “Method”:
“Overlap Detection” finds where a second peak has overlapped with the product peak in the most successful well.”
Why only in the most successful well? Couldn’t the second most successful well also have an overlap? Additionally, does PyParse also check internal standard signals for overlaps as they are crucial for identifying “best performing conditions” as described in Scheme 1?
Currently, the PyParse script looks to determine whether there is peak overlap only for the best-performing well, as this will typically be the result that the chemist looks to carry out further work on (e.g., purification, scale up, or submission to a biological assay). This also applies to the internal standard, with the understanding that PyParse has been designed to work in conjunction with analysis of the data by the user, to ensure that the internal standard has been selected such that it does not overlap with other peaks. A short note on this point has been added to the Manuscript (page 2, left column, paragraph 3) to provide greater clarity to the reader, and a detailed explanation of this point has been added to the Supplementary Information, page S6, paragraph 5.
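
As an illustration of restricting the overlap check to the best-performing well, a minimal Python sketch follows. It is not PyParse's implementation: the `Peak` record, its field names, and the retention-time-window test are all assumptions made for this example.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Peak:
    # All fields are hypothetical and chosen for illustration only.
    well: str          # well identifier, e.g. "A1"
    start: float       # peak start time in minutes
    end: float         # peak end time in minutes
    area_pct: float    # UV peak area percentage
    is_product: bool   # True if the peak was assigned to the product


def best_well_has_overlap(peaks: list[Peak]) -> bool:
    """Check for an overlapping peak only in the well with the largest product peak."""
    product_peaks = [p for p in peaks if p.is_product]
    if not product_peaks:
        return False
    best = max(product_peaks, key=lambda p: p.area_pct)
    # Two time windows overlap when each one starts before the other ends.
    return any(
        p is not best
        and p.well == best.well
        and p.start < best.end
        and best.start < p.end
        for p in peaks
    )
```

In this sketch an overlap in the second-best well is deliberately not reported, which mirrors the behaviour described in the response; checking every well would amount to repeating the same window test per well.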

3. (minor) Caption for Scheme 1d is missing.
The caption for Scheme 1d has been added to the Manuscript (page 4).

4. (minor) For a broad readership, an additional statement introducing the principle of D2B could help in the introduction, e.g., that large libraries are assayed as unchromatographed mixtures.
A short explanation has been added to the Manuscript (page 1, left column, paragraph 1) which, in conjunction with the literature references already provided, will facilitate understanding amongst a broad readership.

5. (minor) It would be nice to include the times for setting up and running the experiments as well as analytical data generation (time for chromatographic runs) for both datasets. This gives context to the discussion around Figure 2c.
As described in our response to comment 1 from Reviewer 2, the LC-MS data were obtained only once, by Thomas et al. The wording used to describe this analysis has been amended in the manuscript to make it clear to the reader that the analysis by PyParse used the original LC-MS files obtained by Thomas et al (page 2, right column, paragraph 2).

6. (minor) When reporting about time for analysis with PyParse, it would be helpful if the authors could include technical details on the computing resources used.
These details have been provided in the Supplementary Information, page S13, paragraph 4.

7. (minor) It would be helpful for the readership if the authors visualized some key chromatograms in the SI to give an impression of the complexity and quality of the chromatographic separations.
Two illustrative chromatograms from the D2B dataset have been provided to better describe the range of complexities for the LC-MS profiles that were obtained (Supplementary Information, pages S14-15, Figures S3 and S4).

8. (minor) In the SI, please indicate which LC instruments and which mass detectors were used for data generation.
Further details relating to the LC instruments and mass detectors have been added to the Supplementary Information, General Experimental section (page S8, paragraph 4).

9. (minor) Using area percentage of LC peaks as “estimate of purity” of fragments should be backed up with information on PDA settings, especially the detection wavelength or wavelength range used for integration. This is especially relevant since the authors discuss a few sentences later that the “>1000 wells contained numerous complex reaction profiles and/or products with poor UV absorption characteristics”.
Additional details relating to the detection wavelength and PDA settings have been provided in the general experimental section of the Supplementary Information (page S8, paragraph 4). Furthermore, as discussed in our response to comment 7 from Reviewer 2, two illustrative chromatograms have also been provided (Supplementary Information, pages S14-15, Figures S3 and S4).

Reviewer 3
No specific changes were recommended by Reviewer 3.

Additionally, as requested by yourself, the editor, we have standardised the contribution descriptions for the author contributions section in the manuscript, such that it aligns with the recommended CRediT taxonomy. Minor changes to the formatting of the references have also been carried out, such that the manuscript aligns with the journal’s standard.

Once again, many thanks for your consideration thus far and I look forward to hearing from you in due course.

Yours sincerely,

Joseph Mason




Round 2

Revised manuscript submitted on 06 Oct 2023
 

17-Oct-2023

Dear Dr Mason:

Manuscript ID: DD-ART-08-2023-000167.R1
TITLE: Automated LC-MS Analysis and Data Extraction for High-Throughput Chemistry

Thank you for submitting your revised manuscript to Digital Discovery. I am pleased to accept your manuscript for publication in its current form. I have copied any final comments from the reviewer(s) below.

You will shortly receive a separate email from us requesting you to submit a licence to publish for your article, so that we can proceed with the preparation and publication of your manuscript.

You can highlight your article and the work of your group on the back cover of Digital Discovery. If you are interested in this opportunity please contact the editorial office for more information.

Promote your research, accelerate its impact – find out more about our article promotion services here: https://rsc.li/promoteyourresearch.

If you would like us to promote your article on our Twitter account @digital_rsc please fill out this form: https://form.jotform.com/213544038469056.

We are offering all corresponding authors on publications in gold open access RSC journals who are not already members of the Royal Society of Chemistry one year’s Affiliate membership. If you would like to find out more please email membership@rsc.org, including the promo code OA100 in your message. Learn all about our member benefits at https://www.rsc.org/membership-and-community/join/#benefit

By publishing your article in Digital Discovery, you are supporting the Royal Society of Chemistry to help the chemical science community make the world a better place.

With best wishes,

Professor Jason Hein
Associate Editor, Digital Discovery


 
Reviewer 2

The authors addressed all points satisfactorily and I recommend the paper for publication!




Transparent peer review

To support increased transparency, we offer authors the option to publish the peer review history alongside their article. Reviewers are anonymous unless they choose to sign their report.

We are currently unable to show comments or responses that were provided as attachments. If the peer review history indicates that attachments are available, or if you find there is review content missing, you can request the full review record from our Publishing customer services team at RSC1@rsc.org.


Content on this page is licensed under a Creative Commons Attribution 4.0 International license.