From the journal Digital Discovery
Peer review history

Event-driven data management with cloud computing for extensible materials acceleration platforms

Round 1

Manuscript submitted on 09 Nov 2023
 

06-Dec-2023

Dear Dr Gregoire:

Manuscript ID: DD-OPN-11-2023-000220
TITLE: Event-Driven Data Management with Cloud Computing for Extensible Materials Acceleration Platforms

Thank you for your submission to Digital Discovery, published by the Royal Society of Chemistry. I sent your manuscript to reviewers and I have now received their reports which are copied below.

After careful evaluation of your manuscript and the reviewers’ reports, I will be pleased to accept your manuscript for publication after revisions.

Please revise your manuscript to fully address the reviewers’ comments. When you submit your revised manuscript please include a point by point response to the reviewers’ comments and highlight the changes you have made. Full details of the files you need to submit are listed at the end of this email.

Digital Discovery strongly encourages authors of research articles to include an ‘Author contributions’ section in their manuscript, for publication in the final article. This should appear immediately above the ‘Conflict of interest’ and ‘Acknowledgement’ sections. I strongly recommend you use CRediT (the Contributor Roles Taxonomy, https://credit.niso.org/) for standardised contribution descriptions. All authors should have agreed to their individual contributions ahead of submission and these should accurately reflect contributions to the work. Please refer to our general author guidelines https://www.rsc.org/journals-books-databases/author-and-reviewer-hub/authors-information/responsibilities/ for more information.

Please submit your revised manuscript as soon as possible using this link:

*** PLEASE NOTE: This is a two-step process. After clicking on the link, you will be directed to a webpage to confirm. ***

https://mc.manuscriptcentral.com/dd?link_removed

(This link goes straight to your account, without the need to log in to the system. For your account security you should not share this link with others.)

Alternatively, you can log in to your account (https://mc.manuscriptcentral.com/dd) where you will need your case-sensitive USER ID and password.

You should submit your revised manuscript as soon as possible; please note you will receive a series of automatic reminders. If your revisions will take a significant length of time, please contact me. If I do not hear from you, I may withdraw your manuscript from consideration and you will have to resubmit. Any resubmission will receive a new submission date.

The Royal Society of Chemistry requires all submitting authors to provide their ORCID iD when they submit a revised manuscript. This is quick and easy to do as part of the revised manuscript submission process. We will publish this information with the article, and you may choose to have your ORCID record updated automatically with details of the publication.

Please also encourage your co-authors to sign up for their own ORCID account and associate it with their account on our manuscript submission system. For further information see: https://www.rsc.org/journals-books-databases/journal-authors-reviewers/processes-policies/#attribution-id

I look forward to receiving your revised manuscript.

Yours sincerely,
Dr Joshua Schrier
Associate Editor, Digital Discovery

************


 
Reviewer 1

This piece puts forward a well-resourced expert opinion based on direct experience in developing data management systems for MAPs and beyond. I thank the authors for this timely contribution that nicely accompanies their published work in Digital Discovery, and thus I am happy to recommend its publication as-is. However, I would also support any additional work by the authors to flesh out these ideas into a perspective as suggested in the cover letter. In particular, I'd personally love to hear their takes on how this event-driven approach interacts at a low-level with other LIMS-style approaches that may be implemented by "manual" labs that collaborate on a MAP, and how community-driven approaches to data recording, data standardization and data dissemination can combine to make not just the "lab of the future", but the scientific community of the future.

If I were forced to make one suggestion, it would be to increase slightly the consideration given to knowledge graphs in this piece, as although they are very well-motivated in the recent publication of MatKG, many readers of this piece (and Digital Discovery, in general) probably still find these quite an abstract concept! To motivate the case in which we move to a more diverse data dissemination model that de-emphasizes journal publications, some discussion could be added about the "user interface" between a knowledge graph and an external user, and how we can adapt to this way of working. To reiterate, I do not see this as blocking publication.

Reviewer 2

The authors have presented a view on what it means to build scalable materials discovery platforms integrating cloud computing and automated laboratories. The perspective mostly focuses, in my opinion, on a variety of cloud services that can support event-driven pipelines, which is very useful to the broader automated laboratory efforts. But it would help if the perspective also provided alternate views (apart from event based execution platforms) to cover the area. In addition, the perspective would benefit from a discussion of other platforms such as https://arxiv.org/abs/2308.09793 and others within the community to suggest a broader context for the studies. Given that this particular paper also covers some aspects of how to handle concurrent requests and handle diverse hardware platforms, it may be good to consider these as viable approaches as well. Overall, I think the perspective is valuable, relevant, and advocates good practices for the community to consider when scaling automated laboratory systems for materials acceleration platforms.


 

The reviewer comments were excellent. We expanded our discussion of other event-based platforms, knowledge graphs, and integration of event-based pipelines with manual and LIMS-based data. We received positive feedback on publication of this work as a Perspective at the MRS conference and appreciate the expedient review that will enable timely publication of the work.

This text has been copied from the PDF response to reviewers and does not include any figures, images or special characters:

REVIEWER REPORT(S):
Referee: 1

Comments to the Author
This piece puts forward a well-resourced expert opinion based on direct experience in developing data management systems for MAPs and beyond. I thank the authors for this timely contribution that nicely accompanies their published work in Digital Discovery, and thus I am happy to recommend its publication as-is. However, I would also support any additional work by the authors to flesh out these ideas into a perspective as suggested in the cover letter. In particular, I'd personally love to hear their takes on how this event-driven approach interacts at a low-level with other LIMS-style approaches that may be implemented by "manual" labs that collaborate on a MAP, and how community-driven approaches to data recording, data standardization and data dissemination can combine to make not just the "lab of the future", but the scientific community of the future.

Response: Thank you for this positive assessment. Additional discussion of integrating with manual and LIMS-style data has been added:

While the data flow in Fig. 1 is well suited for coupling to automated experiments, manually-performed experiments or analyses may also generate events, for example through a web form where manual data entry comprises an event producer whose published events enter the event bus alongside automatically-published events. Workflows that currently employ a laboratory information management system (LIMS) may seek to incorporate the LIMS into an event-based pipeline. Provided that the LIMS has an application programming interface (API) for accessing data, this API could be configured to send events to the event bus. Regular polling of the API may be necessary to detect new data, potentially causing delays in the system. Additionally, developing software to monitor for new data would create an obstacle in integrating data streams into a unified system. Therefore, it is advantageous to host user interfaces for data input on infrastructure that can directly interact with the event bus.
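The two producer styles described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `EventBus`, `FakeLims`, `LimsPoller`, and `submit_web_form` are hypothetical names, the bus is an in-memory stand-in for a cloud event bus, and the LIMS API is assumed to expose records with an increasing `id`.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class EventBus:
    """Hypothetical in-memory stand-in for a cloud event bus."""
    subscribers: list[Callable[[dict], None]] = field(default_factory=list)

    def publish(self, event: dict) -> None:
        # Deliver the event to every registered consumer.
        for handler in self.subscribers:
            handler(event)


class FakeLims:
    """Hypothetical LIMS whose API returns records with an increasing id."""

    def __init__(self) -> None:
        self.records: list[dict] = []

    def add(self, payload: dict) -> None:
        self.records.append({"id": len(self.records) + 1, **payload})

    def fetch_since(self, last_id: int) -> list[dict]:
        return [r for r in self.records if r["id"] > last_id]


class LimsPoller:
    """Polls the LIMS API and republishes unseen records as events.

    New data only reaches the bus when poll_once() runs, which is the
    source of the delay mentioned in the text.
    """

    def __init__(self, lims: FakeLims, bus: EventBus) -> None:
        self.lims = lims
        self.bus = bus
        self.last_id = 0

    def poll_once(self) -> int:
        new_records = self.lims.fetch_since(self.last_id)
        for record in new_records:
            self.bus.publish(
                {"source": "lims", "type": "record_created", "data": record}
            )
            self.last_id = record["id"]
        return len(new_records)


def submit_web_form(bus: EventBus, payload: dict) -> None:
    """Manual entry publishes directly to the bus, so no polling is needed."""
    bus.publish({"source": "web_form", "type": "manual_entry", "data": payload})
```

In this sketch a record added to the LIMS stays invisible to consumers until the next `poll_once()` call, whereas `submit_web_form` reaches consumers immediately, which mirrors the stated advantage of hosting input interfaces on infrastructure that interacts directly with the event bus.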

We have also revised the discussion of the replay functionality as follows:

The event replay capability also eliminates the need for writing a translator to upgrade data to a new or additional database. The same code that ingests new data can also ingest legacy data by replaying the event stream. Legacy data and new data can even be sent to separate consumers based upon version identifiers within the event, enabling specificity in the responsibility of each consumer by reducing the scope of data any given consumer needs to consider.
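The replay and version-based routing described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `EventStore` is a hypothetical in-memory append-only log, and each event is assumed to carry a `version` field that serves as the version identifier.

```python
from typing import Callable

Event = dict
Handler = Callable[[Event], None]


class EventStore:
    """Hypothetical append-only event log with version-based routing."""

    def __init__(self) -> None:
        self._log: list[Event] = []
        self._routes: dict[str, list[Handler]] = {}

    def subscribe(self, version: str, handler: Handler) -> None:
        # Each consumer only sees events carrying its version identifier.
        self._routes.setdefault(version, []).append(handler)

    def publish(self, event: Event) -> None:
        # New events are stored first, then delivered.
        self._log.append(event)
        self._dispatch(event)

    def replay(self) -> int:
        # Re-deliver every stored event through the same dispatch path
        # used for new events; returns the number of events replayed.
        for event in self._log:
            self._dispatch(event)
        return len(self._log)

    def _dispatch(self, event: Event) -> None:
        for handler in self._routes.get(event["version"], []):
            handler(event)


# Legacy events were published before the new consumers existed.
store = EventStore()
store.publish({"version": "v1", "value": 1})
store.publish({"version": "v2", "value": 2})

# Attach consumers for a new database, then replay the stream.
legacy_rows: list[Event] = []
current_rows: list[Event] = []
store.subscribe("v1", legacy_rows.append)
store.subscribe("v2", current_rows.append)
store.replay()
```

Because `replay` and `publish` share `_dispatch`, the consumer that ingests new data also ingests legacy data, so no separate translator is written for the new database in this sketch.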

Referee: 1

If I were forced to make one suggestion, it would be to increase slightly the consideration given to knowledge graphs in this piece, as although they are very well-motivated in the recent publication of MatKG, many readers of this piece (and Digital Discovery, in general) probably still find these quite an abstract concept! To motivate the case in which we move to a more diverse data dissemination model that de-emphasizes journal publications, some discussion could be added about the "user interface" between a knowledge graph and an external user, and how we can adapt to this way of working. To reiterate, I do not see this as blocking publication.

Response: We agree, and also believe that the MatKG manuscript provides an intuitive introduction to knowledge graphs for materials science. We have revised the introduction of knowledge graphs as follows:

For example, MatKG\cite{venugopal_matkg_2022} represents the knowledge from published abstracts and figure captions as relationships (edges) among materials properties, descriptors, applications, etc. (nodes) \cite{venugopal_matkg_2022}. Such approaches provide new opportunities for human and machine learning from diverse data. The machinery for real-time interaction among MAPs and databases is relatively underdeveloped in the materials chemistry community.


Referee: 2

Comments to the Author
The authors have presented a view on what it means to build scalable materials discovery platforms integrating cloud computing and automated laboratories. The perspective mostly focuses, in my opinion, on a variety of cloud services that can support event-driven pipelines, which is very useful to the broader automated laboratory efforts. But it would help if the perspective also provided alternate views (apart from event based execution platforms) to cover the area. In addition, the perspective would benefit from a discussion of other platforms such as https://arxiv.org/abs/2308.09793 and others within the community to suggest a broader context for the studies. Given that this particular paper also covers some aspects of how to handle concurrent requests and handle diverse hardware platforms, it may be good to consider these as viable approaches as well. Overall, I think the perspective is valuable, relevant, and advocates good practices for the community to consider when scaling automated laboratory systems for materials acceleration platforms.

Response: Thank you for the positive assessment. We agree that other event-based systems, for example in workflow orchestration, should be highlighted more explicitly. We have added the following and separately called out Globus as a key example of representing experiments as a sequence of events:

Cloud-based workflow orchestration has been demonstrated,\cite{li_autonomous_2020,chard_globus_2023} and extending the use of cloud services to include event-driven pipelines will enable the next generation of data management.

We have also better contextualized the utilization of cloud services with the following addition:

These services may be deployed to bring various aspects of lab automation into the cloud,\cite{li_autonomous_2020,chard_globus_2023,segal_operating_2019} and we believe that event-based data management comprises the most universally useful implementation of cloud computing for MAPs.




Round 2

Revised manuscript submitted on 15 Dec 2023
 

20-Dec-2023

Dear Dr Gregoire:

Manuscript ID: DD-OPN-11-2023-000220.R1
TITLE: Event-Driven Data Management with Cloud Computing for Extensible Materials Acceleration Platforms

Thank you for submitting your revised manuscript to Digital Discovery. I am pleased to accept your manuscript for publication in its current form. I have copied any final comments from the reviewer(s) below.

You will shortly receive a separate email from us requesting you to submit a licence to publish for your article, so that we can proceed with the preparation and publication of your manuscript.

You can highlight your article and the work of your group on the back cover of Digital Discovery. If you are interested in this opportunity please contact the editorial office for more information.

Promote your research, accelerate its impact – find out more about our article promotion services here: https://rsc.li/promoteyourresearch.

If you would like us to promote your article on our Twitter account @digital_rsc please fill out this form: https://form.jotform.com/213544038469056.

We are offering all corresponding authors on publications in gold open access RSC journals who are not already members of the Royal Society of Chemistry one year’s Affiliate membership. If you would like to find out more please email membership@rsc.org, including the promo code OA100 in your message. Learn all about our member benefits at https://www.rsc.org/membership-and-community/join/#benefit

By publishing your article in Digital Discovery, you are supporting the Royal Society of Chemistry to help the chemical science community make the world a better place.

With best wishes,

Dr Joshua Schrier
Associate Editor, Digital Discovery


 
Reviewer 2

Thank you for addressing the comments; the paper will be a good resource for the community.

Reviewer 1

Thank you for taking my comments on board; I am happy to recommend this for publication as a perspective.

I must also apologise that in my comment on KGs I actually meant to refer to your own MekG (which I believe is still uncited here?), but MatKG is of course also relevant. Please feel free to revise (or just add the reference) and I will be happy to accept once again, or leave it, as you prefer.




Transparent peer review

To support increased transparency, we offer authors the option to publish the peer review history alongside their article. Reviewers are anonymous unless they choose to sign their report.

We are currently unable to show comments or responses that were provided as attachments. If the peer review history indicates that attachments are available, or if you find there is review content missing, you can request the full review record from our Publishing customer services team at RSC1@rsc.org.


Content on this page is licensed under a Creative Commons Attribution 4.0 International license.