Peer review - Chemspyd: an open-source python interface for Chemspeed robotic chemistry and materials platforms

09-Apr-2024

Dear Professor Aspuru-Guzik:

Manuscript ID: DD-ART-02-2024-000046
TITLE: Chemspyd: An Open-Source Python Interface for Chemspeed Robotic Chemistry and Materials Platforms

Thank you for your submission to Digital Discovery, published by the Royal Society of Chemistry. I sent your manuscript to reviewers and I have now received their reports which are copied below.

After careful evaluation of your manuscript and the reviewers’ reports, I will be pleased to accept your manuscript for publication after minor revisions.

Please revise your manuscript to fully address the reviewers’ comments. When you submit your revised manuscript please include a point by point response to the reviewers’ comments and highlight the changes you have made. Full details of the files you need to submit are listed at the end of this email.

Digital Discovery strongly encourages authors of research articles to include an ‘Author contributions’ section in their manuscript, for publication in the final article. This should appear immediately above the ‘Conflict of interest’ and ‘Acknowledgement’ sections. I strongly recommend you use CRediT (the Contributor Roles Taxonomy, https://credit.niso.org/) for standardised contribution descriptions. All authors should have agreed to their individual contributions ahead of submission and these should accurately reflect contributions to the work. Please refer to our general author guidelines https://www.rsc.org/journals-books-databases/author-and-reviewer-hub/authors-information/responsibilities/ for more information.

Please submit your revised manuscript as soon as possible using this link:

*** PLEASE NOTE: This is a two-step process. After clicking on the link, you will be directed to a webpage to confirm. ***

https://mc.manuscriptcentral.com/dd?link_removed

(This link goes straight to your account, without the need to log in to the system. For your account security you should not share this link with others.)

Alternatively, you can login to your account (https://mc.manuscriptcentral.com/dd) where you will need your case-sensitive USER ID and password.

You should submit your revised manuscript as soon as possible; please note you will receive a series of automatic reminders. If your revisions will take a significant length of time, please contact me. If I do not hear from you, I may withdraw your manuscript from consideration and you will have to resubmit. Any resubmission will receive a new submission date.

The Royal Society of Chemistry requires all submitting authors to provide their ORCID iD when they submit a revised manuscript. This is quick and easy to do as part of the revised manuscript submission process. We will publish this information with the article, and you may choose to have your ORCID record updated automatically with details of the publication.

Please also encourage your co-authors to sign up for their own ORCID account and associate it with their account on our manuscript submission system. For further information see: https://www.rsc.org/journals-books-databases/journal-authors-reviewers/processes-policies/#attribution-id

I look forward to receiving your revised manuscript.

Yours sincerely,
Dr Joshua Schrier
Associate Editor, Digital Discovery

************

Reviewer comments

Reviewer 1

Laboratory automation holds significant appeal for the chemistry community, yet current hardware and software fall short of meeting their needs. While Chemspeed's robotic platform is popular, its software lacks user-friendliness. As a remedy, Aspuru-Guzik et al. have introduced Chemspyd, a Python software designed to program and automate Chemspeed's robotic chemistry. Chemspyd, an open-source Python package, facilitates seamless communication with Chemspeed's lab robotics, integrating with proprietary software for customizable workflows and supporting natural language interfaces with large models. The manuscript, backed by relevant Supporting Information files, recommends publication in Digital Discovery upon addressing specified major and minor comments.
Major Comments:
1. It is suggested to illustrate Figure 2, depicting the installation and use of Chemspyd, through a workflow or logic flow chart, with certain code details moved to the Supplementary Information (SI).
2. A comparison with Gomes and Schwaller's recent works in large language model (NLP) driven automated labs is encouraged. It seeks insights into the success rate of translating natural language instructions into Chemspyd code and the proper approach to prompt engineering in natural language instructions.
3. The manuscript should explore the author's perspective on integrating this automation software with other open-source tools/algorithms for (1) design-of-experiment, (2) lab automation (linked with Chemspyd), (3) online analysis, and (4) in-silico reaction optimization. This approach aims to achieve a comprehensive closed-loop optimization.
Minor Comments:
1. A few verbs in the manuscript lack third-person singular forms, expected to be rectified during proofreading.

Reviewer 2

The authors present an extremely well written paper that focuses on the development of an open source Python API for controlling Chemspeed instruments. Specifically, the approach is to automate the input of 'instructions' utilised by the Chemspeed platforms (based on a pre-determined/defined platform setup) via the established CSV input method. Whilst the use of CSV files to control the input of variables into Chemspeed's AutoSuite software is well established, this approach adds an increased degree of flexibility and control. The use of CSV files to implement feedback/closed loop optimisations on a Chemspeed platform is a challenge that many academic groups (and industry) have faced, each adopting their own approach to overcome this challenge. The work here is the first published example of a working solution but also forms the basis of building a collective open source approach that will benefit the wider community.

The following points are suggestions/ items I would greatly appreciate the authors to consider and include in their publication:

1) Chemspyd is defined as a lightweight Python package, but it is an extensive code base that, once it includes more complex multi-step processes on a Chemspeed, would arguably not be classed as 'lightweight'. Could the authors comment on what they specifically mean by 'lightweight'?

2) I could not locate the copy of the AutoSuite application file. Could the authors please ensure this is included in the repository/SI, or at least include figures that highlight not only the deck layout in the AutoSuite software but also the command line format, to illustrate how the workflow is built in AutoSuite that enables full interaction with Chemspyd?

3) The authors discuss 'safety checks' built within Chemspyd, specifically a simulation mode. Could the authors explain more about which simulation mode they are referring to, i.e. a newly designed simulation mode or the existing simulation mode feature within the AutoSuite software? Additionally, could the authors comment on how (if at all) feedback is obtained and utilised from these safety checks with Chemspyd, prior to running scripts?

4) When natural language is used to create Chemspyd code, there is a manual validation method by the user. Is it at all possible to automate the validation of the code by running simulations on the code generated from natural language as an automated 'pre-check'? This would potentially further streamline the creation of 'working code', especially for those with a limited coding background.

5) The authors summarise by stating 'We have introduced Chemspyd as a simple, lightweight and easy-to-use Python API for Chemspeed platforms. In contrast to existing software interfaces, Chemspyd allows for fine-grained, dynamic instrument control, thereby facilitating the usage of Chemspeed instruments in custom workflows and SDLs'. Could the authors comment on which other existing software interfaces have been considered as a comparison in this work?

Thank you

Author response

April 24, 2024

Associate Editor
Digital Discovery

Dear Joshua,

We are happy to submit a revised version of our manuscript entitled “Chemspyd: An Open-Source Python Interface for Chemspeed Robotic Chemistry and Materials Platforms”, for your consideration as a research article in Digital Discovery. We highly appreciate the positive, highly constructive feedback from both reviewers.

According to their suggestions, we have revised our manuscript, and provide further details and explanations on the architecture and functionality of our open-source software interface for the Chemspeed robotic systems. Moreover, we extended the discussion around our natural language interface, and placed it in the context of recent works on automated code generation through large language models. A detailed, point-by-point response to the questions from the reviewers, along with a description of our specific actions in response, can be found below.

We would be happy to provide further information or respond to additional questions if anything has remained unclear!

Sincerely,
Alán Aspuru-Guzik (on behalf of all authors)

Referee: 1
Laboratory automation holds significant appeal for the chemistry community, yet current hardware and software fall short of meeting their needs. While Chemspeed's robotic platform is popular, its software lacks user-friendliness. As a remedy, Aspuru-Guzik et al. have introduced Chemspyd, a Python software designed to program and automate Chemspeed's robotic chemistry. Chemspyd, an open-source Python package, facilitates seamless communication with Chemspeed's lab robotics, integrating with proprietary software for customizable workflows and supporting natural language interfaces with large models. The manuscript, backed by relevant Supporting Information files, recommends publication in Digital Discovery upon addressing specified major and minor comments.

Major Comments:

1. It is suggested to illustrate Figure 2, depicting the installation and use of Chemspyd, through a workflow or logic flow chart, with certain code details moved to the Supplementary Information (SI).

We thank the reviewer for their suggestion that we simplify Figure 2. We have exchanged the figure for a new version, and moved the original to the SI.

2. A comparison with Gomes and Schwaller's recent works in large language model (NLP) driven automated labs is encouraged. It seeks insights into the success rate of translating natural language instructions into Chemspyd code and the proper approach to prompt engineering in natural language instructions.

We appreciate the reviewer's suggestion to contextualize our work alongside recent well-known examples in the literature: ChemCrow and Coscientist. We have updated the manuscript as follows:

To further facilitate the adoption of Chemspyd and its rapid implementation into new laboratory routines, we provide a natural language interface for generating Chemspyd code based on iterative prompting of GPT-4. Analogous methods have recently proven to be powerful enabling technologies for automated or self-driving laboratories.38,39 For instance, ChemCrow (reported by Schwaller, White and co-workers) uses iterative large language model (LLM) prompting and tool integration to propose synthetic routes for organic molecules. Coscientist, recently described by Boiko et al., uses a related approach and adds the ability to generate instrument operation code for automated systems. Notably, Chemspyd could readily be integrated into these types of systems. However, our natural language interface targets a slightly different use case: namely, facilitating adoption of automation tools by users who are not familiar with programming in Python, which is characteristic of a significant portion of the chemistry and materials science communities at the moment.

Similar to our recent work,40 we provide a web interface that uses a large language model to convert the natural language inputs into structured Chemspyd output.41 In our implementation, all Chemspyd functions, along with their natural language documentation and all parameters, are organized in an associative array. We use a similar approach as Boiko et al., demonstrating that it generalizes well to a domain-specific language unknown to GPT-4. Incoming natural language instructions are segmented into structured commands, which are then matched to the classes and functions in the associative array based on cosine similarity.
A quantitative comparison (in terms of success rate and prompt engineering), however, is not feasible at the current stage. Unfortunately, neither ChemCrow (ref. 38) nor Coscientist (ref. 39) provides sufficient details to perform adequate comparisons.
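
The matching step described in the response above (documented functions held in an associative array, with incoming instructions matched to them by cosine similarity) can be illustrated with a minimal sketch. This is not the authors' implementation: the registry entries and their descriptions are hypothetical, and a simple bag-of-words count vector stands in for whatever text representation the real interface uses.

```python
import math
import re
from collections import Counter

# Hypothetical registry: names and descriptions are illustrative only,
# not the actual Chemspyd function list or documentation.
FUNCTION_DOCS = {
    "transfer_liquid": "transfer a liquid volume from a source well to a destination well",
    "transfer_solid": "dispense a solid mass of powder into a destination well",
    "set_temperature": "set the temperature of a heating or cooling zone",
    "set_stir": "start or stop stirring of a reactor zone at a given rate",
}


def tokenize(text: str) -> Counter:
    """Lower-case bag-of-words vector (an assumed stand-in for a learned embedding)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def match_command(instruction: str) -> tuple[str, float]:
    """Return the registry entry whose description is most similar to the instruction."""
    query = tokenize(instruction)
    scores = {
        name: cosine_similarity(query, tokenize(doc))
        for name, doc in FUNCTION_DOCS.items()
    }
    best = max(scores, key=scores.get)
    return best, scores[best]
```

For example, `match_command("Heat the zone to a temperature of 60 degrees")` selects the `set_temperature` entry, because its description shares the most tokens with the instruction.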

3. The manuscript should explore the author's perspective on integrating this automation software with other open-source tools/algorithms for (1) design-of-experiment, (2) lab automation (linked with Chemspyd), (3) online analysis, and (4) in-silico reaction optimization. This approach aims to achieve a comprehensive closed-loop optimization.

We thank the reviewer for emphasizing the interoperability of Chemspyd, which we believe is a major strength of our work. In designing Chemspyd as an open-source Python library, we aimed to make use of Python’s versatility and widespread adoption, which offers two significant benefits: a) As Python is a free, open-source and widely used programming language, it allows for the swift adoption of Chemspyd, as well as maximum flexibility in its use. b) With Python as the de facto preferred programming language for automated laboratories, Chemspyd makes it possible to leverage the entire Python ecosystem. This includes, for example, control software for further instrumentation (including online analysis), as well as numerous libraries for experiment planning (design of experiments, Bayesian optimization). Chemspyd therefore enables researchers to flexibly integrate Chemspeed robotic platforms into highly custom workflows.

Indeed, in the experimental demonstrations in this paper, as well as in previous works (e.g. references 17, 18, 27), we have made use of this advantage, integrating Chemspyd with further Python libraries for instrument control or experiment planning.

We have revised our manuscript to emphasize these aspects more clearly:
To address these gaps, we introduce Chemspyd, an open-source Python API specifically designed for Chemspeed platforms. Python has long been a foundational element of scientific computing and of automated laboratories, in particular. This API enables real-time, adaptive control of Chemspeed instruments by providing a simple interface through which Chemspeed instruments can be integrated with the rest of the scientific Python ecosystem. Users are able to integrate with other open source tools such as packages for optimization or design-of-experiment, other lab automation, online analysis, and in-silico predictions. Ultimately, this empowers researchers to seamlessly integrate Chemspeed robots into custom workflows and automated or self-driving laboratories (SDLs). We use three experimental case studies to demonstrate how Chemspyd can be used for experiments in the chemical and materials sciences. Most importantly, Chemspyd is designed as a modular and expandable open-source project,28 and can therefore serve as a blueprint for the development of similar interfaces that meet the evolving demands of modern, flexible, and customizable automated laboratories.
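
The closed-loop idea raised by the reviewer (experiment planning, automated execution, online analysis, and updating the plan) can be sketched generically as follows. This is only an illustration under stated assumptions: random search stands in for a design-of-experiment or Bayesian optimization library, and `mock_experiment` is a hypothetical placeholder for a robot run followed by a measurement. No Chemspyd calls are shown.

```python
import random
from typing import Callable, Optional


def run_closed_loop(
    execute: Callable[[float], float],
    bounds: tuple[float, float],
    n_iterations: int = 20,
    seed: int = 0,
) -> tuple[Optional[float], float]:
    """Random-search loop: propose a condition, run it, keep the best result."""
    rng = random.Random(seed)
    best_x: Optional[float] = None
    best_y = float("-inf")
    for _ in range(n_iterations):
        x = rng.uniform(*bounds)  # planning step (stand-in for DoE / Bayesian optimization)
        y = execute(x)            # execution and analysis step (robot run + measurement)
        if y > best_y:            # update step: remember the best condition so far
            best_x, best_y = x, y
    return best_x, best_y


def mock_experiment(temperature: float) -> float:
    """Hypothetical response surface with its maximum at 60; replaces real hardware."""
    return 100.0 - (temperature - 60.0) ** 2 / 10.0


best_temperature, best_yield = run_closed_loop(mock_experiment, (20.0, 100.0))
```

In a real workflow, the `execute` callable would wrap the instrument control and analysis code, and the proposal step would be delegated to an optimization package of the user's choice.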

Minor Comments:

1. A few verbs in the manuscript lack third-person singular forms, expected to be rectified during proofreading.

We thank the reviewer for their attention to detail and defer to the proofreaders on this matter.

Referee: 2
The authors present an extremely well written paper that focuses on the development of an open source Python API for controlling Chemspeed instruments. Specifically, the approach is to automate the input of 'instructions' utilised by the Chemspeed platforms (based on a pre-determined/defined platform setup) via the established CSV input method. Whilst the use of CSV files to control the input of variables into Chemspeed's AutoSuite software is well established, this approach adds an increased degree of flexibility and control. The use of CSV files to implement feedback/closed loop optimisations on a Chemspeed platform is a challenge that many academic groups (and industry) have faced, each adopting their own approach to overcome this challenge. The work here is the first published example of a working solution but also forms the basis of building a collective open source approach that will benefit the wider community.

The following points are suggestions/ items I would greatly appreciate the authors to consider and include in their publication:

1) Chemspyd is defined as a lightweight Python package, but it is an extensive code base that, once it includes more complex multi-step processes on a Chemspeed, would arguably not be classed as 'lightweight'. Could the authors comment on what they specifically mean by 'lightweight'?

We appreciate the reviewer’s comment and inquiry regarding our characterization of Chemspyd as a "lightweight" Python Package. By describing Chemspyd as lightweight, we aim to convey its ease of integration and use within Python environments, particularly emphasizing its simplicity in terms of installation and lack of heavy dependencies. In other words, when we refer to Chemspyd as lightweight, we emphasize its accessibility and user-friendly nature, ensuring that users can leverage its capabilities with minimal overhead and without the burden of managing large dependencies and dealing with complex code structures. To clarify our meaning, we have updated the manuscript as follows:

“Because of 2) and 3), Chemspyd comes as a lightweight Python package (i.e., easy to install and lacking heavy dependencies) that dynamically interacts with Chemspeed’s proprietary AutoSuite software.”

2) I could not locate the copy of the AutoSuite application file. Could the authors please ensure this is included in the repository/SI, or at least include figures that highlight not only the deck layout in the AutoSuite software but also the command line format, to illustrate how the workflow is built in AutoSuite that enables full interaction with Chemspyd?

We thank the reviewer for pointing out that access to the Manager app should be clarified. Indeed, the installation guide (https://aspuru-guzik-group.gitlab.io/self-driving-lab/instruments/chemspyd/intro/install.html#chemspeed-manager-app) was updated during the review process and includes guidance on finding the location of the various Manager.app files. We have also updated the manuscript as follows:

“To enable dynamic control on the Chemspeed side, we created a dedicated AutoSuite application file, referred to as the Manager (see Installation Guide in the package documentation for file location), that listens for command files, and executes actions based on the provided keywords and parameters.”

The AutoSuite application itself, however, is proprietary software provided by Chemspeed, which we therefore cannot distribute. However, any lab that has a Chemspeed already has access to AutoSuite, so we do not foresee any problems in this regard.
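
The command-file mechanism quoted above (a Manager that listens for command files and executes actions based on keywords and parameters) can be illustrated with a small, purely Python sketch. The file name, the one-row CSV layout, and the handler names are assumptions made for illustration; the actual exchange format between Chemspyd and the Manager is not described in this correspondence.

```python
import csv
import tempfile
from pathlib import Path

# Hypothetical file name and column layout; the real Manager format may differ.
COMMAND_FILE = "command.csv"


def write_command(folder: Path, keyword: str, *params: object) -> Path:
    """Write one command (keyword followed by its parameters) as a single CSV row."""
    path = folder / COMMAND_FILE
    with path.open("w", newline="") as handle:
        csv.writer(handle).writerow([keyword, *params])
    return path


def read_and_dispatch(folder: Path, handlers: dict) -> object:
    """Read the pending command, remove the file, and call the handler for its keyword."""
    path = folder / COMMAND_FILE
    with path.open(newline="") as handle:
        keyword, *params = next(csv.reader(handle))
    path.unlink()  # removing the file marks the command as consumed
    return handlers[keyword](*params)


# Usage: the "controller" side writes a command, the "listener" side executes it.
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    write_command(folder, "transfer_liquid", "SOURCE:1", "REACTOR:3", 2.5)
    handlers = {"transfer_liquid": lambda src, dst, vol: f"{vol} mL {src} -> {dst}"}
    result = read_and_dispatch(folder, handlers)
```

Note that parameters arrive on the listener side as strings, so a real listener would need to parse them into the types each action expects.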

3) The authors discuss 'safety checks' built within Chemspyd, specifically a simulation mode. Could the authors explain more about which simulation mode they are referring to, i.e. a newly designed simulation mode or the existing simulation mode feature within the AutoSuite software? Additionally, could the authors comment on how (if at all) feedback is obtained and utilised from these safety checks with Chemspyd, prior to running scripts?

We appreciate the reviewer’s request for clarification regarding “safety checks” in Chemspyd, which we believe to be one of the valuable aspects of our software. The simulation mode referred to in the text is implemented at the Python level and, thus, is different from the simulation mode provided by AutoSuite. We have revised the manuscript to address this point as follows:

“Beyond the fine-grained control over elementary actions, we developed Chemspyd to contain a series of optional tools to assist with operation safety, accurate resource management, and standardized data collection. These safety checks include a simulation mode within Chemspyd that can be used to verify that the code will execute without generating internal Chemspyd errors prior to testing the operations in AutoSuite’s own simulation mode, which validates other operational parameters. Together, these form a “digital twin” that enables users to simulate processes without the need to access a Chemspeed instrument. Chemspyd’s resource management features also allow users to validate operations of workflows prior to execution to ensure that liquids or solids can be added or removed from the specified wells, and that the wells will not be overfilled or depleted. The required attributes of each well (type, volume, etc.) are automatically extracted from the instrument configuration, avoiding manual input by the user (see section “Installation and Usage” for further details).”
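
The kind of resource check described in the revised text (confirming before execution that a well will be neither overfilled nor depleted) can be sketched as follows. The `Well` class and its attribute names are hypothetical and do not reflect Chemspyd's actual data model; the sketch only shows the bookkeeping logic.

```python
from dataclasses import dataclass


@dataclass
class Well:
    """Hypothetical well model; attribute names are illustrative only."""
    name: str
    capacity_ml: float
    volume_ml: float = 0.0


def check_transfer(source: Well, destination: Well, volume_ml: float) -> None:
    """Raise ValueError if the transfer would deplete the source or overfill the destination."""
    if volume_ml <= 0:
        raise ValueError("Transfer volume must be positive")
    if volume_ml > source.volume_ml:
        raise ValueError(f"{source.name} would be depleted")
    if destination.volume_ml + volume_ml > destination.capacity_ml:
        raise ValueError(f"{destination.name} would be overfilled")


def simulate_transfer(source: Well, destination: Well, volume_ml: float) -> None:
    """Validate the transfer, then update tracked volumes without touching any hardware."""
    check_transfer(source, destination, volume_ml)
    source.volume_ml -= volume_ml
    destination.volume_ml += volume_ml
```

Running a whole workflow through `simulate_transfer` before execution would surface an overfilled or depleted well as an exception at the offending step, which is the behaviour the quoted passage attributes to the Python-level checks.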

4) When natural language is used to create Chemspyd code, there is a manual validation method by the user. Is it at all possible to automate the validation of the code by running simulations on the code generated from natural language as an automated 'pre-check'? This would potentially further streamline the creation of 'working code', especially for those with a limited coding background.

We thank the reviewer for their question about automated “pre-checks” for the NLI. We have added the following text to the manuscript to clarify and add more context:

“Command-by-command, each section of the generated Chemspyd code is sent back to the user for feedback and validation. This match-translate cycle is repeated iteratively until satisfactory Chemspyd code is reached (Figure 4b). We maintain user feedback in the match-translate cycle because there are many cases where semantic errors in the generated code cannot be detected in simulation. As an example, if the natural language input specifies “Add ethanol”, and the generated code incorrectly adds methanol, the generated code does not reflect the intent of the user but the simulation may not yield errors. This kind of semantic error detection requires a high-level natural language understanding ability beyond simulation, and forms part of our future work.”
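
The match-translate cycle with user feedback, and the ethanol/methanol example of a semantic error, can be sketched as a simple loop. Everything here is a mock: `mock_translate` stands in for the language model, `mock_review` stands in for the human user, and the generated `transfer_liquid(...)` string uses a hypothetical signature rather than the real Chemspyd one.

```python
from typing import Callable, Optional


def match_translate_cycle(
    instruction: str,
    translate: Callable[[str, Optional[str]], str],
    review: Callable[[str], Optional[str]],
    max_rounds: int = 5,
) -> str:
    """Regenerate code until the reviewer returns no feedback (i.e. approves it)."""
    feedback: Optional[str] = None
    for _ in range(max_rounds):
        code = translate(instruction, feedback)
        feedback = review(code)
        if feedback is None:
            return code
    raise RuntimeError("No approved code within the allowed number of rounds")


def mock_translate(instruction: str, feedback: Optional[str]) -> str:
    """Mock translator: picks the wrong solvent until it receives feedback."""
    solvent = "ethanol" if feedback else "methanol"
    return f"transfer_liquid(source='{solvent}', destination='vial_1', volume=1.0)"


def mock_review(code: str) -> Optional[str]:
    """Mock user: rejects code that does not use ethanol, otherwise approves."""
    return None if "'ethanol'" in code else "Use ethanol, not methanol."


approved = match_translate_cycle("Add ethanol", mock_translate, mock_review)
```

The first generated command is syntactically valid and would pass a volume-based simulation, yet it adds the wrong solvent. Only the review step, which knows the user's intent, rejects it, which is the argument the authors give for keeping the user in the loop.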

5) The authors summarise by stating 'We have introduced Chemspyd as a simple, lightweight and easy-to-use Python API for Chemspeed platforms. In contrast to existing software interfaces, Chemspyd allows for fine-grained, dynamic instrument control, thereby facilitating the usage of Chemspeed instruments in custom workflows and SDLs'. Could the authors comment on which other existing software interfaces have been considered as a comparison in this work?

We appreciate the reviewer's inquiry regarding comparisons with existing software interfaces in the context of our work. We intended to compare Chemspyd to the use of Chemspeed’s proprietary AutoSuite software, which is the only other control software that is publicly available. Unlike Chemspyd, which offers dynamic control at the Python API level, AutoSuite functions through a graphical user interface (GUI) and lacks the flexibility to be integrated into other SDLs or custom workflows. We have amended the text as follows:

“We have introduced Chemspyd as a simple, lightweight and easy-to-use Python API for Chemspeed platforms. In contrast to the existing graphical user interface control software (AutoSuite), Chemspyd allows for fine-grained, dynamic instrument control through Python, thereby facilitating integration of Chemspeed instruments in custom workflows and SDLs.”

Editor’s decision letter

08-May-2024

Dear Professor Aspuru-Guzik:

Manuscript ID: DD-ART-02-2024-000046.R1
TITLE: Chemspyd: An Open-Source Python Interface for Chemspeed Robotic Chemistry and Materials Platforms

Thank you for submitting your revised manuscript to Digital Discovery. I am pleased to accept your manuscript for publication in its current form. I have copied any final comments from the reviewer(s) below.

You will shortly receive a separate email from us requesting you to submit a licence to publish for your article, so that we can proceed with the preparation and publication of your manuscript.

You can highlight your article and the work of your group on the back cover of Digital Discovery. If you are interested in this opportunity please contact the editorial office for more information.

Promote your research, accelerate its impact – find out more about our article promotion services here: https://rsc.li/promoteyourresearch.

If you would like us to promote your article on our LinkedIn account [https://rsc.li/Digital_showcase] please fill out this form: https://form.jotform.com/213544038469056.

We are offering all corresponding authors on publications in gold open access RSC journals who are not already members of the Royal Society of Chemistry one year’s Affiliate membership. If you would like to find out more please email membership@rsc.org, including the promo code OA100 in your message. Learn all about our member benefits at https://www.rsc.org/membership-and-community/join/#benefit

By publishing your article in Digital Discovery, you are supporting the Royal Society of Chemistry to help the chemical science community make the world a better place.

With best wishes,

Dr Joshua Schrier
Associate Editor, Digital Discovery

Reviewer comments

Reviewer 1

The authors addressed all my concerns and I believe this manuscript is ready to be published.

Reviewer 2

Congratulations to the team on an excellent paper.

From the journal Digital Discovery Peer review history
