From the journal Digital Discovery
Peer review history

ULSA: unified language of synthesis actions for the representation of inorganic synthesis protocols

Round 1

Manuscript submitted on 04 Nov 2021
 

22-Jan-2022

Dear Dr Ceder:

Manuscript ID: DD-ART-11-2021-000034
TITLE: ULSA: Unified Language of Synthesis Actions for Representation of Synthesis Protocols

Thank you for your submission to Digital Discovery, published by the Royal Society of Chemistry. I sent your manuscript to reviewers and I have now received their reports which are copied below. The data reviewer checklist is also attached.

I have carefully evaluated your manuscript and the reviewers’ reports, and the reports indicate that major revisions are necessary.

Please submit a revised manuscript which addresses all of the reviewers’ comments. Further peer review of your revised manuscript may be needed. When you submit your revised manuscript please include a point by point response to the reviewers’ comments and highlight the changes you have made. Full details of the files you need to submit are listed at the end of this email.

Digital Discovery strongly encourages authors of research articles to include an ‘Author contributions’ section in their manuscript, for publication in the final article. This should appear immediately above the ‘Conflict of interest’ and ‘Acknowledgement’ sections. I strongly recommend you use CRediT (the Contributor Roles Taxonomy from CASRAI, https://casrai.org/credit/) for standardised contribution descriptions. All authors should have agreed to their individual contributions ahead of submission and these should accurately reflect contributions to the work. Please refer to our general author guidelines https://www.rsc.org/journals-books-databases/author-and-reviewer-hub/authors-information/responsibilities/ for more information.

Please submit your revised manuscript as soon as possible using this link:

*** PLEASE NOTE: This is a two-step process. After clicking on the link, you will be directed to a webpage to confirm. ***

https://mc.manuscriptcentral.com/dd?link_removed

(This link goes straight to your account, without the need to log on to the system. For your account security you should not share this link with others.)

Alternatively, you can login to your account (https://mc.manuscriptcentral.com/dd) where you will need your case-sensitive USER ID and password.

You should submit your revised manuscript as soon as possible; please note you will receive a series of automatic reminders. If your revisions will take a significant length of time, please contact me. If I do not hear from you, I may withdraw your manuscript from consideration and you will have to resubmit. Any resubmission will receive a new submission date.

The Royal Society of Chemistry requires all submitting authors to provide their ORCID iD when they submit a revised manuscript. This is quick and easy to do as part of the revised manuscript submission process. We will publish this information with the article, and you may choose to have your ORCID record updated automatically with details of the publication.

Please also encourage your co-authors to sign up for their own ORCID account and associate it with their account on our manuscript submission system. For further information see: https://www.rsc.org/journals-books-databases/journal-authors-reviewers/processes-policies/#attribution-id

I look forward to receiving your revised manuscript.

Yours sincerely,
Linda Hung
Associate Editor
Digital Discovery
Royal Society of Chemistry

************


 
Reviewer 1

The manuscript "ULSA: Unified Language of Synthesis Actions for Representation of Synthesis Protocols" by Z. Wang et al. lays out a strategy for creating automated synthesis workflows by extracting information from scientific publications. It is a well-written and timely piece of work that addresses the important issue of how to attain meaningful information from published literature which is out there but often not in an easily readable or understandable manner. Using a somewhat limited dataset of ~3000 sentences on 4 different types of ceramic synthesis techniques, the authors demonstrate how a mapping model can be built to identify essential words and create synthesis flowcharts. The work is well motivated and the results are reasonably explained. I recommend the manuscript for publication in Digital Discovery but have a few comments below that need addressing:

1. There are no actual examples of synthesis of a particular material that has been included in this work. This seems pretty deliberate from the authors, but it would be nice to have a couple of flowcharts (e.g. for TiO2 or LiF) added as additional figures or text with a discussion of how meaningful/accurate the predicted synthesis pathways are.

2. The PCA discussion presented in the first three paragraphs of page 15 is useful, but not very intuitive. Can the authors add more information (perhaps as supplementary information) about which features constitute the principal components and how these observations follow?

Reviewer 2

This work describes the development of a standardized language for describing synthesis protocols, which can be used by machine-learning based tools to automate the extraction of synthesis procedures from the scientific literature. The language breaks down synthesis steps into 8 basic procedures (called “action terms”: Starting, Mixing, Purification, Heating, Cooling, Shaping, Reaction, and Miscellaneous). Overall, the paper is clearly written and appears to be within the scope of the journal, and the ULSA language is potentially useful for the future automation of materials synthesis processes. However, there are several issues that should be addressed before this work can be considered for publication:

1. The title and abstract somewhat exaggerate the generality of the work. The actual ULSA language as described in this work is quite narrow in scope, in that it only aims to describe a relatively limited subset of synthesis procedures for ceramic materials. It does not describe organic synthesis methods; and even for ceramic materials, relatively important methods such as vapor deposition or spark plasma sintering do not seem to be considered. It is possible that the language could be extended in the future to include such methods, but for this paper, the abstract and title should more clearly reflect the limited scope of the current work.

2. The literature review of the use of machine-learning for extracting materials synthesis procedures from the literature tends to downplay the amount of work which has already been published – some missing recent publications include npj Comput. Mater. 3, 53 and Chem. Mater. 29, 9436.

3. The choice of the 8 “action terms” seems a bit limited. In particular, the “shaping” term (which has the lowest Fleiss kappa score after “miscellaneous”) seems to incorporate a large number of quite different procedures: grinding into a powder, pressing into pellets, cutting into a specific shape, etc. Granulation and pelletization seem to be opposite processes in the same way that the separate “heating” and “cooling” action terms are. Additionally, there does not appear to be a keyword for the application of pressure to obtain certain structural phases, which is an important synthesis procedure for ceramics.

4. The existence of the “miscellaneous” action term is also somewhat concerning, since it implies that the language cannot uniquely specify synthesis protocols in such a way that they could be reproduced without expert human intervention, and does not align with the statement at the start of Section 2.1 that the action terms should “unambiguously identify a type of synthesis action”. This should be discussed further, including analysis of how frequently this action term is required, and the potential importance of “miscellaneous” action terms to successful synthesis procedures.

5. It is unclear from the paper how the language tags and records data other than action terms, such as the heating temperature or environmental conditions. In the example shown in Figure 1, the heating was performed in Ar, and this is included as an annotation in the flowchart, but it is unclear how this information is tagged or categorized. The JSON files linked in the Data Availability Statement seem to indicate that only the action terms are tagged, so it is unclear how the other relevant quantities are identified, retrieved, stored and categorized.

6. It is also unclear if the initial condition of the precursors is described in this language, e.g. their purity, the fineness of the powders, etc.

7. In Section 3.2, it is mentioned that 6 human experts performed the same annotation of the data set to compare with the machine-learning results. Are these human experts co-authors on this work? How much input did materials synthesis experts have in the design of the language itself? This might be important, since the main authors of this work all appear to be from computational backgrounds, and input from experimental synthesis experts is important to make sure that all of the potential subtleties are captured.

8. Finally, there are some minor issues: in the caption for Table 2, “ULSA” is misspelled as “USLA”; and Refs. [27] and [28] appear to be missing some volume, page number and publication year information.

Reviewer 3

The authors propose a unified language for synthesis actions (ULSA) to fill the gap in automated synthesis information extraction from scientific publications. Although in a different context, this task was extensively covered in two other works already cited in this manuscript: [29] 10.1126/science.abc2986 and [20] 10.1038/s41467-020-17266-6. Despite the different application field compared to previous works, the significant contributions of this work to the field of action conversion are unclear. The manuscript lacks a comparison with existing methods (even if they are specialised in a different application field) and more robustness in the methodology of the analysis. Although some statements are bold and not substantiated by the presented results, the paper reads well. I support publication after the necessary major revisions.

(1) Unified language of synthesis actions: given the limited application to the ceramic synthesis space, the word unified is quite bold. The paper presents a language scheme for ceramic material synthesis actions. I invite the authors to present the work (including the title) in a manner that better reflects the actual content of the manuscript. Nothing in this paper suggests a unified language for synthesis actions unless the authors decide to extend the paper to broader applications (organic, catalysis, biomasses, etc.) in their revision.

(2) The introduction should be much more specific in order to convey what the goal of previous approaches was and how their information extraction procedures worked. The introduction currently provides a cursory description of related work, but it is unclear from the text what previous studies have accomplished. The introduction would be greatly enhanced if more time was spent describing previous contributions and work in organic chemistry; specifically, describing the methods that these approaches used rather than just the application/context for them.

(3) Comparison with state of the art: the main goal of this paper is to present a new language scheme for extracting synthesis actions from published synthesis procedures for ceramic materials. Although those methods were trained and developed in different fields, the authors have to show a comparison between their methodology and those published in: [29] 10.1126/science.abc2986 and [20] 10.1038/s41467-020-17266-6. Both works make code available (or services for even easier use). Even without further customising refs [29] and [20] to the space of ceramic materials, how do these approaches perform when applied to the same ceramic material synthesis compared to the proposed method? What are the main advantages of this approach over the other two? A new section with the comparison of the existing methodology on the ceramic dataset is crucial to provide evidence of the benefits of this approach.

(4) It would be nice to see some examples of the procedures, what is converted by the 2 baselines, and what is converted by the bi-LSTM model to demonstrate the value (aside from the quantitative accuracy). I understand how difficult it will be to find concise examples, but most readers are unlikely to open and inspect the files made available on GitHub.

(5) It is critical to provide a measure of the annotated samples' variability/difference. This is significant because if the action sequences are randomly split, there will be a lot of overlap between the training, testing and validation action sequences. The evaluation should be carried out in such a way that validation samples are drawn from paragraphs linguistically different from the one included in the training and test set.

(6) With respect to (5), a more detailed description and analysis of the data would help the reader understand the conversion task. Beyond the action distribution already shown, some statistics on data distribution, sentence lengths, average number of actions in a sentence, and so on would be useful.

(7) There are numerous neural network / translation models, and an explanation of why the authors chose the bi-LSTM would be helpful. Why did the author choose this architecture over other options? A brief justification/comparison would make the choice appear more rational.

(8) The authors acknowledge that the actions in section 2 cover the majority of procedures for ceramic synthesis, but in their current form (which is easily extensible) they do not cover some relevant chemistry protocols such as electrochemistry. The reviewer believes that these will be useful in robotic synthesis of materials.

(9) One concern I have is with the evaluation. The performance per synthesis sentence is reported in Table 3. The authors present F1, precision, and recall. After using an LSTM baseline, and because this is a text-to-unified-language conversion, authors should provide more effective and convincing metrics about the quality of the conversion process. The Levenshtein metrics and the BLEU are two key metrics used in translation tasks, and they should be provided also for performance comparison with the state of the art.

More detailed notes:

(10) On p. 3, the authors claim that, to the best of their knowledge, there is only one corpus of annotated materials synthesis protocols. If I am not mistaken, ref. [20] also provides an annotated corpus of organic synthesis on request. The authors should double check and revise the statement if needed.

(11) On p. 4, the authors mention in the text: “[..] has shown that this ULSA vocabulary [..]”. However, the vocabulary was never mentioned in the preceding sentences. That statement is also quite bold: the authors have only demonstrated high-accuracy extraction of synthesis actions from a very narrow domain of synthesis procedures (solid-state, sol-gel, precipitation, and solvo-/hydrothermal).

(12) In section 2.1 authors enumerate a series of actions. How do these compare to those reported in [20, 29]? What actions are more general and unified, and which are more typical of the (ceramic) processes under consideration?

(13) In a few places, the authors choose to ignore some of the complexities underlying the definition of actions. This is the case for "Mixing," where they do not distinguish between the various types of mixing. They appear to oversimplify the task in this context and leave it up to the reader (or user) to address the problem using a rule-based approach. It is unclear why the authors decided to abandon the conversion of these qualification attributes. I believe the choice is related to the use of word embeddings and the similarity of the embeddings between, for example, "Powder Mixing" and "Ball Milling," but this could be incorrect. The authors should explain why they made this choice. Is there any similarity between this and what was done in [20] and [29]? Were these choices also made in previous works? The authors should comment and provide supporting evidence for their decisions.
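
As an illustration of the sequence-level metric requested in comment (9), the following minimal sketch computes the Levenshtein (edit) distance between a reference and a predicted sequence of action terms. The two example sequences are invented for illustration and are not taken from the dataset; the function name and the normalisation by reference length are assumptions, not part of the manuscript.

```python
def levenshtein(reference, predicted):
    """Edit distance between two sequences of action terms.

    Counts the minimum number of insertions, deletions and substitutions
    needed to turn `reference` into `predicted` (standard dynamic programme,
    keeping only the previous row of the table).
    """
    prev = list(range(len(predicted) + 1))
    for i, ref_term in enumerate(reference, start=1):
        curr = [i]
        for j, pred_term in enumerate(predicted, start=1):
            cost = 0 if ref_term == pred_term else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


# Hypothetical action sequences (illustrative only).
REFERENCE = ["Starting", "Mixing", "Heating", "Cooling"]
PREDICTED = ["Starting", "Mixing", "Shaping", "Heating"]

distance = levenshtein(REFERENCE, PREDICTED)
# Assumed normalisation: divide by the reference length.
normalised = distance / len(REFERENCE)
```

Unlike per-token precision and recall, this distance penalises actions that are extracted correctly but placed in the wrong order.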


 

We included formatting in our response (e.g. tables) that does not render well in this dialogue box. We have uploaded this response as a separate file (Response_Letter.pdf) and refer the readers to this.

This text has been copied from the PDF response to reviewers and does not include any figures, images or special characters.

Response letter for “ULSA: Unified Language of Synthesis Actions for the Representation of Inorganic
Synthesis Protocols”

Response Letter to Reviewers
In the response letter below, point-by-point responses to the reviewers’ comments are provided.
The reviewers’ comments are noted in italics. Major modifications in the main manuscript in
response to the reviewers’ comments are noted in blue.

Response to Reviewer #1:
Comments to the Author: The manuscript "ULSA: Unified Language of Synthesis Actions for
Representation of Synthesis Protocols" by Z. Wang et al. lays out a strategy for creating
automated synthesis workflows by extracting information from scientific publications. It is a
well-written and timely piece of work that addresses the important issue of how to attain
meaningful information from published literature which is out there but often not in an easily
readable or understandable manner. Using a somewhat limited dataset of ~ 3000 sentences on 4
different types of ceramic synthesis techniques, the authors demonstrate how a mapping model
can be built to identify essential words and create synthesis flowcharts. The work is well
motivated and the results are reasonably explained. I recommend the manuscript for publication
in Digital Discovery but have a few comments below that need addressing:

1. There are no actual examples of synthesis of a particular material that has been included in
this work. This seems pretty deliberate from the authors, but it would be nice to have a couple of
flowcharts (e.g. for TiO2 or LiF) added as additional figures or text with a discussion of how
meaningful/accurate the predicted synthesis pathways are.

Reply:
We greatly appreciate the reviewer’s positive comments on our work. Upon request, we have
added a table with two actual examples of synthesis action extraction from text, including one
solid-state synthesis and one solution-based (hydrothermal) synthesis, in Table 4.

2. The PCA discussion presented in the first three paragraphs of page 15 is useful, but not very
intuitive. Can the authors add more information (perhaps as supplementary information) about
which features constitute the principal components and how these observations follow?

Reply:
We appreciate the comment. The last three paragraphs of “Section 3.3.3: Analysis of graphs
clustering” already give some indication of what the components may mean, including a
discussion of which features constitute the principal components.

We explain that the increase of the value of the 1st component (in positive direction) matches
with the increase in the number of synthesis actions involving dissolving and mixing precursors
in solution and mixture purification in the corresponding synthesis paragraph. Similarly, in the
last paragraph we explain that increase of the value of the 2nd component (in positive direction)
correlates with an increase in the complexity of the synthesis procedure (e.g. more re-grinding
and re-shaping steps).

We modified the text in this Section to make it easier to read, so now it reads as:

Figure 4 displays the projection of the 1st and 2nd principal components. Each data point
here corresponds to one synthesis paragraph, i.e. one synthesis flowchart. Different colors
highlight different types of synthesis. A few observations can be made from the plot. First, the
clusters of synthesis procedures are well separated and aggregated according to the synthesis
types. Specifically, the data points corresponding to solid-state synthesis are narrowly clustered
along a line with negative slope while the other synthesis types are spread more widely and the
slope of their linear fit is positive. Second, the clusters of data points for precipitation and
hydrothermal synthesis almost completely overlap and partially overlap with sol-gel synthesis,
while the overlap with solid-state synthesis is negligible.

......

To get further insights, we manually sampled and compared synthesis procedures
corresponding to the data points along each of the fitted lines. The results show that the 1st
principal component correlates with the involvement of solution mixing for precursors in
synthesis procedures. In other words, the larger and more positive a data point’s coordinate
along the 1st principal component, the more steps of dissolving and mixing precursors in
solution, as well as purification, the corresponding procedure involves. This agrees well with the fact that
solid-state synthesis mostly operates with powders while hydrothermal and precipitation
procedures are solution-based procedures, and sol-gel syntheses exist in between.

The 2nd principal component corresponds to the level of complexity of the synthesis
procedure. The larger and more positive the data point along the 2nd principal component, the
more synthesis steps are involved in the synthesis process. Interestingly, all four
synthesis types exhibit simple synthesis procedures (fewer steps) and complex synthesis
procedures (many steps). Nonetheless, solid-state synthesis has the largest deviation along the
2nd principal component compared to hydrothermal and precipitation synthesis since solid-state
procedures can involve multiple heating and re-grinding steps for the sample to obtain the
desired material phase while in solution synthesis this can often be achieved in one or two steps.
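
To make the projection described above more concrete, the following is a minimal sketch of how synthesis paragraphs could be projected onto their first two principal components. It assumes each paragraph is represented by counts of a few action terms; the actual feature representation of the synthesis graphs is not given in this response, and the counts, the row labels and the function name `pca_scores` are invented for illustration. The components are obtained by power iteration with deflation on the covariance matrix, using only the standard library.

```python
import math

# Hypothetical per-paragraph counts of (Mixing, Purification, Heating, Shaping).
# Rows 0-1 mimic solid-state procedures, rows 2-3 solution-based procedures.
COUNTS = [
    [1, 0, 2, 2],
    [1, 0, 3, 3],
    [3, 2, 1, 0],
    [4, 3, 1, 0],
]


def pca_scores(rows, n_components=2, n_iter=200):
    """Return (scores, components) for the leading principal components."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix of the centered features.
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    components = []
    for _ in range(n_components):
        v = [1.0 / (k + 1) for k in range(d)]  # generic start vector
        for _ in range(n_iter):
            w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            v = [x / norm for x in w]
        eigenvalue = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d))
                         for i in range(d))
        components.append(v)
        # Deflate: remove the direction just found before the next pass.
        cov = [[cov[i][j] - eigenvalue * v[i] * v[j] for j in range(d)]
               for i in range(d)]
    scores = [[sum(c[j] * comp[j] for j in range(d)) for comp in components]
              for c in centered]
    return scores, components


scores, components = pca_scores(COUNTS)
```

The sign of each component is arbitrary, so only the relative positions of the data points along a component are meaningful. With these illustrative counts, the solid-state-like rows and the solution-like rows fall on opposite sides of the 1st component.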

Response to Reviewer #2:
Comments to the Author: This work describes the development of a standardized language for
describing synthesis protocols, which can be used by machine-learning based tools to automate
the extraction of synthesis procedures from the scientific literature. The language breaks down
synthesis steps into 8 basic procedures (called “action terms”: Starting, Mixing, Purification,
Heating, Cooling, Shaping, Reaction, and Miscellaneous). Overall, the paper is clearly written
and appears to be within the scope of the journal, and the ULSA language is potentially useful
for the future automation of materials synthesis processes. However, there are several issues that
should be addressed before this work can be considered for publication:

1. The title and abstract somewhat exaggerate the generality of the work. The actual ULSA
language as described in this work is quite narrow in scope, in that it only aims to describe a
relatively limited subset of synthesis procedures for ceramic materials. It does not describe
organic synthesis methods; and even for ceramic materials, relatively important methods such as
vapor deposition or spark plasma sintering do not seem to be considered. It is possible that the
language could be extended in the future to include such methods, but for this paper, the abstract
and title should more clearly reflect the limited scope of the current work.

Reply:
We greatly appreciate the reviewer's positive comments on our work.

We agree that our ULSA can only describe a subset of chemical synthesis and we have updated
our statement to reflect that. We would like to highlight that solid-state, sol-gel, precipitation,
and hydrothermal synthesis are certainly some of the most common methods in inorganic
synthesis¹, and so most of our current and previous work in inorganic synthesis text-mining
focuses on these methods. Methods such as spark plasma sintering are used less to synthesize
compounds (from precursors) and more for densifying and forming samples.

¹ Xu R. and Xu Y., Modern Inorganic Synthetic Chemistry, 2nd Edition (Elsevier, 2017)

We would like to emphasize that the major scope of this work is to provide researchers with the
seeds (i.e. annotated dataset) for the development of a more robust and clear unified language
and ontology for synthesis procedures. We have added corresponding statements in the
introduction and discussion. Specifically, we have changed the title to "ULSA: Unified Language
of Synthesis Actions for the Representation of Inorganic Synthesis Protocols” to emphasize that
the ULSA described in this work will focus on inorganic synthesis. We have also added a
statement in the abstract and the main text to clarify our scope for this work. These excerpts are
shown below.

…and (b) it can capture important features of synthesis protocols. The present work
focuses on the synthesis protocols for solid-state, sol-gel, and solution-based inorganic synthesis,
but the language could be extended in the future to include other synthesis methods. This work is
an important step towards creating a synthesis ontology and a solid foundation for autonomous
robotic synthesis.


In this work, we discuss a potential approach to the problem of inorganic synthesis
ontology based on creating a unified language of synthesis actions (ULSA). We demonstrate an
application of this approach in describing solid-state, sol-gel, precipitation, and
solvo-/hydrothermal synthesis procedures, which cover the majority of inorganic synthesis
procedures [38, 39]. Specifically, we built and created a dataset of 3,040 synthesis sentences
labeled according to the ULSA schema and trained a neural network-based model that identifies
a sequence of synthesis actions in a paragraph, maps them into the ULSA, and builds a graph of
the synthesis procedure (Figure 1). We applied this model to thousands of synthesis paragraphs
and analysed the resulting synthesis graphs. The obtained results show that our ULSA
vocabulary is comprehensive enough to obtain high-accuracy extraction of synthesis actions as
well as to identify the important features of each of the aforementioned synthesis types.
Additionally, the ULSA, as it is encoded in the labeled dataset, can be easily customized and
augmented to account for other inorganic synthesis methods. The dataset and the scripts for
building such a synthesis flowchart are publicly available.

2. The literature review of the use of machine-learning for extracting materials synthesis
procedures from the literature tends to downplay the amount of work which has already been
published – some missing recent publications include npj Comput. Mater. 3, 53 and Chem.
Mater. 29, 9436.

Reply:
We thank the reviewer for pointing us to these papers. We now cite them, along with several
other papers, in the Introduction section.

3. The choice of the 8 “action terms” seems a bit limited. In particular, the “shaping” term
(which has the lowest Fleiss kappa score after “miscellaneous”) seems to incorporate a large
number of quite different procedures: grinding into a powder, pressing into pellets, cutting into a
specific shape, etc. Granulation and pelletization seem to be opposite processes in the same way
that the separate “heating” and “cooling” action terms are. Additionally, there does not appear
to be a keyword for the application of pressure to obtain certain structural phases, which is an
important synthesis procedure for ceramics.

Reply:
We thank the reviewer for bringing up valid concerns with the limitations of the choices for these
action terms. It is indeed difficult to find proper definitions to eliminate all ambiguities and to
make the language more universal.

For this study, our choices of the action item categories were mainly based on discussions with
experimentalists, but within the aim to maintain maximum clarity and generalizability. We
wanted to provide researchers that will use this labeled dataset with the opportunity to easily
re-define the terms or split them into more fine-grained subgroups. Thus, some action terms may
include more “sub-actions” (e.g. “compacting”, “pressing” for SHAPING terms) than others.
The assignment of SHAPING terms is largely reserved for non-mixing actions that result in the
manipulation of the shape of the sample, perhaps for characterization. But we agree that in the
future this could be split. Moreover, we would like to point out that we deliberately assigned
“grinding” actions to the MIXING category since this step generally involves mixing of reagents,
and thus contributes to their reaction into the final product.

4. The existence of the “miscellaneous” action term is also somewhat concerning, since it
implies that the language cannot uniquely specify synthesis protocols in such a way that they
could be reproduced without expert human intervention, and does not align with the statement at
the start of Section 2.1 that the action terms should “unambiguously identify a type of synthesis
action”. This should be discussed further, including analysis of how frequently this action term is
required, and the potential importance of “miscellaneous” action terms to successful synthesis
procedures.

Reply:
Following the previous reply, the MISCELLANEOUS action term was included to allow users to
easily re-label the data according to the application or augment the dataset with additional
information. For example, terms like “transfer” and “sealing” can be important for the task of
studying the synthesis conditions and attributes or robotic synthesis applications.
Nonetheless, to reduce the ambiguity, acknowledge that the terminology for this action term is
not very descriptive, and still address actions which may be non-reactive and do not manipulate
the characteristics of the sample, we renamed the MISCELLANEOUS action term to
NON-ALTERING. These changes have been reflected in the manuscript and dataset.

5. It is unclear from the paper how the language tags and records data other than action terms,
such as the heating temperature or environmental conditions. In the example shown in Figure 1,
the heating was performed in Ar, and this is included as an annotation in the flowchart, but it is
unclear how this information is tagged or categorized. The JSON files linked in the Data
Availability Statement seem to indicate that only the action terms are tagged, so it is unclear how
the other relevant quantities are identified, retrieved, stored and categorized.

Reply:
We thank the reviewer for pointing this out. We have added an explanation for how to extract
synthesis actions' attributes such as temperature, time, and environment as a new “Section 2.5.2
Assigning synthesis actions attributes”. We would like to clarify that the focus of this work is on
the extraction of action terms, and that the condition extraction was implemented in this work as
an example of usage of the dataset.

2.5.2 Assigning synthesis actions attributes

Synthesis actions identified as Mixing, Heating and Cooling, as well as the actions
referring to drying processes (identified by the stems “dry” and “evaporate”), were assigned
attributes such as temperature, time, and environment. This was done by analysing dependency
sub-trees associated with each action token [40] and by applying rule-based regular expression
matching [25]. It is important to note that this approach fails when the action and its attributes
are not mentioned in the same context or the dependency tree is built incorrectly.
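
As a rough illustration of the rule-based regular expression matching mentioned above, the following minimal sketch pulls temperature, time and environment out of a single sentence. The patterns, the function name `extract_attributes` and the example sentence are assumptions made for illustration; they are not the rules used in refs [25] and [40], and the dependency sub-tree analysis that ties each attribute to a specific action token is omitted.

```python
import re

# Hypothetical patterns (illustrative only, not the rules of the manuscript).
TEMPERATURE = re.compile(r"(\d+(?:\.\d+)?)\s*(°C|K)\b")
TIME = re.compile(r"(\d+(?:\.\d+)?)\s*(h|hours?|min|minutes?|days?)\b")
ENVIRONMENT = re.compile(
    r"\b(?:in|under)\s+(?:an?\s+|flowing\s+)?"
    r"(air|vacuum|Ar|argon|N2|nitrogen|O2|oxygen|H2)\b"
)


def extract_attributes(sentence):
    """Collect temperature, time and environment mentions from one sentence.

    All matches in the sentence are returned together, so a sentence with
    several actions needs an extra step (for example the dependency
    sub-trees) to decide which attribute belongs to which action.
    """
    temperatures = [f"{value} {unit}" for value, unit in TEMPERATURE.findall(sentence)]
    times = [f"{value} {unit}" for value, unit in TIME.findall(sentence)]
    environment = ENVIRONMENT.search(sentence)
    return {
        "temperature": temperatures,
        "time": times,
        "environment": environment.group(1) if environment else None,
    }


# Invented example sentence.
EXAMPLE = "The pellets were heated at 900 °C for 12 h in Ar."
attributes = extract_attributes(EXAMPLE)
```

Because this sketch works at the sentence level, it shows the failure mode noted above: when an action and its attributes are not in the same context, nothing links them.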

6. It is also unclear if the initial condition of the precursors is described in this language, e.g.
their purity, the fineness of the powders, etc.

Reply:
We thank the reviewer for pointing this out. However, this paper only focuses on synthesis
actions, so the initial conditions of precursors such as purity and fineness of powders are out of
the scope of this paper. Other papers, including some from our group, have dealt with the details
of chemistry extraction (and its attributes) from synthesis text.

7. In Section 3.2, it is mentioned that 6 human experts performed the same annotation of the
data set to compare with the machine-learning results. Are these human experts co-authors on
this work? How much input did materials synthesis experts have in the design of the language
itself? This might be important, since the main authors of this work all appear to be from
computational backgrounds, and input from experimental synthesis experts is important to make
sure that all of the potential subtleties are captured.

Reply:
We thank the reviewer for mentioning the importance of input from experimental synthesis
experts. Indeed, all 6 of these human experts (Z.W., K.C., Y.F., Y.Z., B.D. and O.K.) are
co-authors on this work. Two of them (Y.F. and Y.Z.) are experimental synthesis experts, and
mainly contributed to the definition of the synthesis actions schema. The schema has undergone
intensive discussion and has been iterated on many times with these collaborators. Z.W., K.C.,
Y.F., H.H., T.H., and B.D. formed the inter-agreement for annotation based on the schema and
prepared the annotated dataset.

8. Finally, there are some minor issues: in the caption for Table 2, “ULSA” is misspelled as
“USLA”; and Refs. [27] and [28] appear to be missing some volume, page number and
publication year information.

Reply:
We thank the reviewer for pointing out these issues. We have corrected these issues.

Response to Reviewer #3:
Comments to the Author: Authors propose a unified language for synthesis actions (ULSA) to
fill the gap in automated synthesis information extraction from scientific publications. Although
in a different context, this task was extensively covered in two other works already cited in this
manuscript: [29] 10.1126/science.abc2986 and [20] 10.1038/s41467-020-17266-6. Despite the
different application field compared to previous works, the significant contributions of this work
to the field of action conversion are unclear. The manuscript lacks a comparison with existing
methods (even if they are specialised in a different application field) and more robustness in the
methodology of the analysis. Although some statements are bold and not substantiated by the
presented results, the paper reads well. I support publication after the necessary major revisions.
(1) Unified language of synthesis actions: given the limited application to the ceramic synthesis
space, the word unified is quite bold. The paper presents a language scheme for ceramic
material synthesis actions. I invite the authors to present the work (including the title) in a
manner that better reflects the actual content of the manuscript. Nothing in this paper suggests a
unified language for synthesis actions unless the authors decide to extend the paper to broader
applications (organic, catalysis, biomasses, etc.) in their revision.

Reply:
We thank the reviewer for this comment and agree that our ULSA can only describe the limited
subset of synthesis procedures for inorganic synthesis. To comply with the reviewer’s comments,
we updated the title to "ULSA: Unified Language of Synthesis Actions for the Representation of
Inorganic Synthesis Protocols". As we mentioned in our reply to the other reviewers, the major
scope of this work is to provide researchers with the seeds (i.e. annotated dataset) for
development of a more robust and clear unified language and ontology for synthesis procedures.
The analysis that we have performed (Section 3.3.3) also exemplifies how this annotation
language can be modified to a specific task. Moreover, we anticipate and encourage researchers
to expand and modify the ULSA to reach more applications as they see fit. To clarify this, we
wrote the corresponding statements in the introduction and discussion:

…and (b) it can capture important features of synthesis protocols. The present work
focuses on the synthesis protocols for solid-state, sol-gel, and solution-based inorganic synthesis,
but the language could be extended in the future to include other synthesis methods. This work is
an important step towards creating a synthesis ontology and a solid foundation for autonomous
robotic synthesis.


In this work, we discuss a potential approach to the problem of inorganic synthesis
ontology based on creating a unified language of synthesis actions (ULSA). We demonstrate an
application of this approach in describing solid-state, sol-gel, precipitation, and
solvo-/hydrothermal synthesis procedures, which cover the majority of inorganic synthesis
procedures [38, 39]. Specifically, we built and created a dataset of 3,040 synthesis sentences
labeled according to the ULSA schema and trained a neural network-based model that identifies
a sequence of synthesis actions in a paragraph, maps them into the ULSA, and builds a graph of
the synthesis procedure (Figure 1). We applied this model to thousands of synthesis paragraphs
and analysed the resulting synthesis graphs. The obtained results show that our ULSA
vocabulary is comprehensive enough to obtain high-accuracy extraction of synthesis actions as
well as to identify the important features of each of the aforementioned synthesis types.
Additionally, the ULSA, as it is encoded in the labeled dataset, can be easily customized and
augmented to account for other inorganic synthesis methods. The dataset and the scripts for
building such a synthesis flowchart are publicly available.

(8) The authors acknowledge that the actions in section 2 cover the majority of procedures for
ceramic synthesis, but in their current form (which is easily extensible) they do not cover some
relevant chemistry protocols such as electrochemistry. The reviewer believes that these will be
useful in robotic synthesis of materials.

Reply:
We agree that other chemistry protocols (such as electrochemistry, which is an important
procedure for tasks such as electrosynthesis of organic compounds and electrodeposition of
metals) are important for various synthesis tasks. However, it is not a common approach for the
space of inorganic synthesis that our study focuses on. Therefore, we did not spend much effort
on incorporating it into the schema. We have updated the relevant statements in the title, abstract,
and main text to clarify our scope for this work.

(11) pag. 4 authors mention in the text: “[..] has shown that this ULSA vocabulary [..]”.
However, the vocabulary was never mentioned in the preceding sentences. That statement is also
quite bold: the authors have only demonstrated high-accuracy extraction of synthesis actions
from a very narrow domain of synthesis procedures (solid-state, sol-gel, precipitation, and
solvo-/hydrothermal).

Reply:
We thank the reviewer for pointing this out. As noted above, we have revised this sentence.

(2) The introduction should be much more specific in order to convey what the goal of previous
approaches was and how their information extraction procedures worked. The introduction
currently provides a cursory description of related work, but it is unclear from the text what
previous studies have accomplished. The introduction would be greatly enhanced if more time
was spent describing previous contributions and work in organic chemistry; specifically,
describing the methods that these approaches used rather than just the application/context for
them.

Reply:
Upon the reviewer’s request, we have added in the introduction the following discussion of all
the similar work done in the field:

…There have only been a few attempts to extract information about chemical synthesis and
reactions and compile them into the flowchart of synthesis actions. Hawizy et al. [20] were early
developers for such extraction, using a combination of rule-based regular expressions (regex)
[29] and syntax tree parsing to identify and classify action phrases in their tool, ChemicalTagger.
This approach shows very good performance on organic synthesis procedures. Vaucher et al. [21]
used a combination of rule-based approaches and machine learning models trained on over 2
million procedural sentences to extract synthesis actions from organic chemistry patent texts
and map them into well-defined language schemas. We found this work to be one of the most
robust and accurate in describing organic synthesis procedures. Mehr et al. [22] developed a
semi-automated workflow that uses NLP-based approaches to translate human-written text into
an internal Chemical Description Language (so-called XDL) and then map it into robotic
operations. To the best of our knowledge, this is the only work that applied the developed
synthesis ontology to robotic synthesis for organic molecules. Mysore et al. [23] paved the way
for synthesis action graph extraction from the inorganic synthesis text. For this, they applied
several neural network-based models and used dependency tree parsing to combine the extracted
information into synthesis graphs. Similarly, Kuniyoshi et al. used bi-LSTM combined with
BERT word embeddings to construct synthesis graphs for solid-state batteries fabrication [24],
which showed excellent results on the extraction of operations using the science
literature-specific SciBERT pretrained language model.

(3) Comparison with state of the art: the main goal of this paper is to present a new language
scheme for extracting synthesis actions from published synthesis procedures for ceramic
materials. Despite having been trained and developed in different fields, the authors have to
show a comparison between their methodology and that published in: [29]
10.1126/science.abc2986 and [20] 10.1038/s41467-020-17266-6. Both works make code
available (or services for even easier use). Even without further customising refs [29] and [20]
to the space of ceramic materials, how do these approaches perform when applied to the same
ceramic material synthesis compared to the proposed method? What are the main advantages of
this approach over the other two? A new section with the comparison of the existing
methodology on the ceramic dataset is crucial to provide evidences of the benefits of this
approach.

Reply:
At the suggestion of the reviewer we attempted such a comparison with the work of Vaucher et
al. and of Mehr et al., but the results indicate that it may not be fully appropriate to compare
them given the very different application fields and the different breadth of synthesis actions in
inorganic versus organic synthesis. The work of Vaucher et al. and Mehr et al. deals with the
extraction of organic synthesis actions. As shown below, this makes the comparison non-trivial
and potentially not meaningful.

Vaucher et al. created a synthesis action extraction model, named Rxn4Chemistry, for
organic molecule synthesis. In order to apply this model to our annotated dataset for performance
comparison, we first normalized the 28 possible synthesis action tags from the Rxn4Chemistry
model to our 8 action tags. This mapping is shown in Table 1 below. Because Rxn4Chemistry’s
SETTEMPERATURE action could map to either our HEATING or COOLING actions, we
simply mapped our HEATING and COOLING actions to SETTEMPERATURE.
NON-ALTERING actions in our dataset are replaced by blanks, “”. Finally, to be consistent with
Rxn4Chemistry’s extraction format, any span of actions tagged in our dataset (e.g. “ball”, “-”,
“milled” would be tagged as “MIXING”, “MIXING”, “MIXING”) were collapsed into a single
action. We then ran inference over all 3,040 synthesis sentences in our dataset. 787 of these
sentences produced errors from the model’s API, so we only consider inference from 2,253
sentences. The results from this comparison are shown in Table 2. The low (<0.25) precision,
recall, and F1 scores indicate that the Rxn4Chemistry model does not transfer well to the
extraction of these inorganic synthesis actions, motivating our development of the new ULSA
schema. Through manual inspection of some examples, we found that HEATING actions were
often extracted as STIR actions by Rxn4Chemistry, and that specific heating terms like
“calcining” and “sintered” were not extracted by this model. Additionally, shaping-related
actions were missing from the Rxn4Chemistry schema, so these actions were also missed.
Shaping is an important and distinct aspect of inorganic synthesis procedures, so this also
motivates the need for a specific annotation schema for inorganic synthesis action extraction.

Table 1. Normalized schema for action tags to compare ULSA and Rxn4Chemistry

ULSA Tag          Rxn4Chemistry Tags
""                'NoAction', 'CollectLayer', 'Yield', 'FollowOtherProcedure',
                  'OtherLanguage', 'InvalidAction'
STARTING          -
MIXING            'Add', 'MakeSolution', 'PH', 'Reflux', 'Sonicate', 'Stir',
                  'Triturate'
PURIFICATION      'Concentrate', 'Degas', 'DrySolid', 'DrySolution', 'Extract',
                  'Filter', 'Partition', 'PhaseSeparation', 'Purify',
                  'Recrystallize', 'Wash'
SetTemperature    'Microwave', 'Quench', 'SetTemperature'
SHAPING           -
REACTION          'Wait'

Table 2. Performance of existing Rxn4Chemistry model on annotated ULSA dataset
Precision Recall F1
0.24 0.20 0.22
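
The tag normalization and span collapsing described above can be sketched as follows. This is an illustrative reconstruction under stated assumptions rather than the script used for the comparison: the dictionaries cover only the rows of Table 1 whose mapping is unambiguous, and the tag strings (e.g. "NON-ALTERING", "SETTEMPERATURE") and function names are chosen for this example.

```python
from itertools import groupby

# Rxn4Chemistry tags mapped to the normalized schema (subset of Table 1).
RXN_TO_NORMALIZED = {
    **dict.fromkeys(
        ["Add", "MakeSolution", "PH", "Reflux", "Sonicate", "Stir", "Triturate"],
        "MIXING",
    ),
    **dict.fromkeys(
        ["Concentrate", "Degas", "DrySolid", "DrySolution", "Extract", "Filter",
         "Partition", "PhaseSeparation", "Purify", "Recrystallize", "Wash"],
        "PURIFICATION",
    ),
    **dict.fromkeys(["Microwave", "Quench", "SetTemperature"], "SETTEMPERATURE"),
    "Wait": "REACTION",
}

# ULSA tags that change under normalization; all other tags are kept as they are.
ULSA_TO_NORMALIZED = {
    "HEATING": "SETTEMPERATURE",
    "COOLING": "SETTEMPERATURE",
    "NON-ALTERING": "",
}


def normalize_ulsa(tags):
    """Map per-token ULSA tags onto the normalized schema."""
    return [ULSA_TO_NORMALIZED.get(tag, tag) for tag in tags]


def collapse_spans(tags):
    """Merge each run of identical consecutive tags into one action; drop blanks."""
    return [tag for tag, _ in groupby(tags) if tag]


# "ball", "-", "milled" are tagged MIXING three times and collapse to one action.
token_tags = ["", "MIXING", "MIXING", "MIXING", "", "HEATING"]
actions = collapse_spans(normalize_ulsa(token_tags))
```

Two actions of the same type separated by an untagged token remain distinct, since only consecutive identical tags are merged.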

The code for the XDL action extraction published by Mehr et al. was not easily
accessible for high-throughput inference, but we did inspect its performance for a few examples
using the provided UI, ChemIDE. The results from these indicate similar confusion between
stirring and heating actions. The ChemIDE interface does allow for users to modify the extracted
output, however it is not intuitive how to add steps which were not originally extracted, and
some actions (e.g. SHAPING) could not be captured using their schema. Our proposed synthesis
action schema addresses both these problems by introducing inorganic synthesis-specific
annotations for HEATING actions as well as incorporating a SHAPING action. Also, we would
like to mention that the performance of synthesis action extraction through ChemIDE is largely
limited by the rule-based pattern matching (F1 ~ 0.75) before manual revision, whereas our
bi-LSTM model can achieve high mapping accuracy (F1 > 90%) without corrections.
The previous accomplishments from these two works in the field have been described in
more detail in the introduction. We have also included a brief statement regarding the
incompatibility of these models in this material space in the introduction, thus motivating our
work:

“...Although development of synthesis action extraction from the text in organic chemistry has
significantly accelerated and some groups have developed specific ontologies [21, 22] for such
vocabulary, we found that the existing models do not transfer well to the inorganic synthesis
space due to the disparate natures of these two approaches. For example, we found that
vocabulary unique to inorganic synthesis like sintering and calcining would be frequently
misclassified. Additionally, existing models with developed ontologies do not include tags for
important inorganic synthesis tasks like shaping of samples into pellets.”

(4) It would be nice to see some examples of the procedures, what is converted by the 2
baselines, and what is converted by the bi-LSTM model to demonstrate the value (aside from the
quantitative accuracy). I understand how difficult it will be to find concise examples, but most
readers are unlikely to open and inspect the files made available on GitHub.

Reply:
We added these examples in Table 4 upon the Reviewer’s request. The table includes one solid-state
synthesis paragraph and one hydrothermal (solution-based) synthesis paragraph.

(5) It is critical to provide a measure of the annotated samples' variability/difference. This is
significant because if the action sequences are randomly split, there will be a lot of overlap
between the training, testing and validation action sequences. The evaluation should be carried
out in such a way that validation samples are drawn from paragraphs linguistically different
from the one included in the training and test set.

Reply:
From our understanding, the reviewer is suggesting that data evaluated by the model should be
substantially different from the data that it is trained on, which is important for language
modeling and machine learning in general. We agree with the reviewer on this point. We believe
our approach to choosing the 535 paragraphs for annotation is in accordance with this statement.
First, we note that holding out a subset based on a quantitative linguistic variability metric in this
setting could be problematic because the language used in these sentences does not vary much
due to the standards imposed on scientific writing of experimental procedures. A similar
argument is made by Vaucher et al.² In this case, much of the variability could stem from the
actions used in a given sentence (e.g. heating samples, mixing precursors) or from the method
employed (e.g. solid powders ground together in solid-state synthesis, aqueous mixing of
precursors in hydrothermal synthesis, etc.). Thus, reserving a set of these sentences based on this
variability could lead to some actions or methods being disproportionately represented in the
training data and thus mislabeled or ignored in validation or testing.

Thus, we tried to maximize variability between sentences in each dataset through manual vetting
of data sources. We sampled multiple journals and publishers, including Elsevier, The
Electrochemical Society, The Royal Society of Chemistry, etc., to make sure that the sentences
are representative of different styles, which are usually defined by journal requirements. We also
made sure not to include paragraphs from the same authors or research groups to maximize the
variability in individual/group writing styles. Finally, during the annotation process, simple
paragraphs were manually rejected and those that were more complex and rich in language were
annotated. After these stipulations, we randomly split the 3,040 sentences from these paragraphs
into training, validation, and test sets such that there would not be any duplicates. We believe the
provided annotated dataset should be able to cover the most commonly used syntax patterns to
describe synthesis procedures in the literature and that the Bi-LSTM trained on this dataset can
learn the syntax patterns of synthesis sentences and predict the role of synthesis actions by their
context with good generalizability.
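
A minimal sketch of such a duplicate-free random split is given here for illustration. The 80/10/10 fractions, the seed, and the function name `split_sentences` are assumptions made for this example; the actual split proportions are not stated in this reply.

```python
import random


def split_sentences(sentences, fractions=(0.8, 0.1, 0.1), seed=0):
    """Randomly split sentences into train/validation/test sets without duplicates."""
    # Remove exact duplicates first so no sentence can land in two subsets.
    unique = list(dict.fromkeys(sentences))
    rng = random.Random(seed)
    rng.shuffle(unique)
    n_train = int(fractions[0] * len(unique))
    n_val = int(fractions[1] * len(unique))
    train = unique[:n_train]
    val = unique[n_train:n_train + n_val]
    test = unique[n_train + n_val:]
    return train, val, test


# Toy corpus with one repeated sentence.
corpus = [f"sentence {i}" for i in range(20)] + ["sentence 0"]
train, val, test = split_sentences(corpus)
```

Note that this splits at the sentence level, so sentences from one paragraph can appear in different subsets; a paragraph-level split would require grouping sentences by their source paragraph before shuffling.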

(6) In respect to (5), a more detailed description and analysis of the data would help the reader
understand the conversion task. Beyond the action distribution already shown, some statistics on
data distribution, sentence lengths, average number of actions in a sentence, and so on would be
useful.

Reply:
We refer the reviewer to Figure 2, which includes quantitative statistics for the annotated dataset,
including the distributions of (a) sentence lengths, (b) tokens per sentence, and (c) action terms
per sentence.
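
For readers who wish to recompute such statistics from the annotated dataset, a minimal sketch follows. It assumes each sentence is available as a list of (token, tag) pairs with a blank tag for non-action tokens, and that sentence length is measured in characters; both are assumptions for this example, and the function name is hypothetical.

```python
from statistics import mean


def sentence_statistics(tagged_sentences):
    """Return mean character length, tokens, and action terms per sentence."""
    char_lengths = [
        len(" ".join(token for token, _ in sentence)) for sentence in tagged_sentences
    ]
    token_counts = [len(sentence) for sentence in tagged_sentences]
    action_counts = [
        sum(1 for _, tag in sentence if tag) for sentence in tagged_sentences
    ]
    return {
        "mean_characters": mean(char_lengths),
        "mean_tokens": mean(token_counts),
        "mean_actions": mean(action_counts),
    }


# Two toy sentences, not taken from the dataset.
example = [
    [("The", ""), ("solution", ""), ("was", ""), ("heated", "HEATING"),
     ("and", ""), ("stirred", "MIXING")],
    [("Pellets", ""), ("were", ""), ("pressed", "SHAPING")],
]
stats = sentence_statistics(example)
```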

2 Vaucher, A. et al. Automated extraction of chemical synthesis actions from experimental procedures.
Nat. Commun. 11, 3601 (2020)

(7) There are numerous neural network / translation models, and an explanation of why the
authors chose the bi-LSTM would be helpful. Why did the author choose this architecture over
other options? A brief justification/comparison would make the choice appear more rational.

Reply:
We progressively increased complexity when considering models for mapping synthesis
paragraphs onto the ULSA vocabulary. The baseline models performed poorly;
hence, we tried word embeddings combined with bi-LSTM, and this appeared to be enough to
solve the problem.

Indeed, the variety of the NLP models for this task is enormous, and state-of-the-art approaches
will probably show the best performance. However, we thought that the application of BERT or
GPT models for this task would be overkill. Additionally, re-training and fine-tuning these
models is time and resource consuming. Hence, we went forward with the bi-LSTM model.
We added a brief explanation to the text; it now reads:

“These results moved us toward considering a recurrent neural network model for
mapping paragraphs into ULSA. It is generally accepted that recurrent neural networks (RNNs),
and specifically bi-LSTMs, can effectively process sequential data and keep track of past events
[44]. Indeed, bi-LSTM is simple enough and does not require exhaustive training and
fine-tuning, as is common for BERT [45] and GPT [46, 47, 48] models.”

(9) One concern I have is with the evaluation. The performance per synthesis sentence is
reported in Table 3. The authors present F1, precision, and recall. After using an LSTM baseline,
and because this is a text-to-unified-language conversion, authors should provide more effective
and convincing metrics about the quality of the conversion process. The Levenshtein metrics and
the BLEU are two key metrics used in translation tasks, and they should be provided also for
performance comparison with the state of the art.

Reply:
This is a fair point and we thank the Reviewer for pointing us to it.
The calculated BLEU scores for the baseline models are 0.924 and 0.933, and for the bi-LSTM
model it is 0.979. The normalized Levenshtein metrics for the baseline models are 0.163 and 0.147,
and for the bi-LSTM model it is 0.051. Even though both BLEU score and normalized
Levenshtein metric show the bi-LSTM model outperforms the baseline models, we would like to
point out that neither of those scores is really appropriate for our task. First, the BLEU score is
extremely skewed: normally most parts of the synthesis sentence are in the non-relevant category
(i.e. not synthesis actions). A system tuned to maximize BLEU can appear to perform well by
simply deeming the whole sentence non-relevant to all queries. For example, with “the solution
was heated and stirred …” as input, the output of our model will be ["", "", "", HEATING, "",
MIXING, ...]. Since most of the tokens are "" and are identical in the predicted results, the
BLEU score will always be very high and not provide much information. The Levenshtein
distance characterizes the “distance” needed to edit one sequence into another, often for character
edits in a single token. This metric is similarly misleading because of the imbalance between
synthesis actions and non-synthesis actions, potentially leading to overly confident model
performance.

For these reasons, we were primarily interested in quantifying the accuracy of the mapping per
sentence because having all synthesis actions identified correctly is more important than having
each entity recognized correctly. In other words, if at least one synthesis action in the sentence or
paragraph is identified incorrectly, the whole synthesis procedure will be wrong. Hence we chose
the binary accuracy metric since it is more strict than the BLEU score or Levenshtein distance.
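
The effect of the class imbalance can be illustrated with a small sketch that compares a token-level normalized Levenshtein distance with the per-sentence binary accuracy. Normalizing by the longer sequence length is an assumption made for this example, as are the function names; the sketch is not the evaluation code used for the manuscript.

```python
def levenshtein(a, b):
    """Edit distance between two tag sequences (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (x != y)))
        prev = curr
    return prev[-1]


def normalized_levenshtein(a, b):
    """Edit distance divided by the length of the longer sequence."""
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0


def sentence_accuracy(references, predictions):
    """Fraction of sentences whose whole tag sequence is predicted exactly."""
    exact = sum(ref == pred for ref, pred in zip(references, predictions))
    return exact / len(references)


# "the solution was heated and stirred" with its per-token tags.
reference = ["", "", "", "HEATING", "", "MIXING"]
all_blank = [""] * len(reference)  # a prediction that finds no actions at all

distance = normalized_levenshtein(reference, all_blank)
accuracy = sentence_accuracy([reference], [all_blank])
```

In this toy case the prediction misses every action, yet only two of six tokens differ, so the normalized distance stays small while the per-sentence accuracy is zero. Longer sentences with the same two actions would give an even smaller distance.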
More detailed notes:

(10) On pag. 3, authors claim that, to the best of their knowledge, there is only one corpus of
annotated materials synthesis protocols. If I am not mistaken also ref. [20] provides an
annotated corpus of organic synthesis on request. Authors should double check and revise the
statement if needed.

Reply:
We thank the reviewer for pointing this out. We have revised the statement in the introduction
and also included information on a newer annotated corpus of synthesis protocols.
“…Even with such data availability, to the best of our knowledge, there have been only a few
attempts to create a publicly available annotated corpus containing materials synthesis protocols.
The dataset created by Mysore et al. [13] contains 230 labeled synthesis paragraphs with labels
assigned to material entities, synthesis actions, and other synthesis attributes for inorganic
synthesis, and is freely available to users. The dataset used by Vaucher et al. [21] was obtained
by augmenting the existing Pistachio dataset [34] of organic synthesis procedures, and is
available upon request. Kuniyoshi et al. [25] annotated an in-house dataset of inorganic materials
synthesis entities that is publicly available.”

(12) In section 2.1 authors enumerate a series of actions. How do these compare to those
reported in [20, 29]? What actions are more general and unified, and which are more typical of
the (ceramic) processes under consideration?

Reply:
Vaucher et al. classify actions into 28 types for single-reaction-step organic
chemistry synthesis in a reactor. At the time of publication, Mehr et al. had
implemented 44 high-level synthesis steps for complex organic molecule synthesis. Each of these
sets provides specific commands for organic molecule synthesis. Those from Mehr et al.
represent specifications of otherwise abstracted language from the text, so that the action can be
performed fluidly on a particular robotic synthesis platform (for instance, specific “Switch”
actions for argon or vacuum schlenk lines are captured from text extraction), but could, in theory,
be transferable to other similar robotic setups. Those in Vaucher et al. are somewhat more
abstracted, but still capture specific actions that occur commonly in organic chemistry synthesis.
Action types corresponding to “mixing” or “stirring” appear in both of these datasets and ours.
“Heating” steps are treated similarly in our dataset and that of Vaucher et al., however those steps
in Mehr et al. are separated into different actions depending on the vessel whose temperature is
being changed. “Shaping” actions are particular to inorganic synthesis and do not appear to be
well-covered by the datasets in Vaucher et al. or Mehr et al.

(13) In a few places, the authors choose to ignore some of the complexities underlying the
definition of actions. It is the case of "Mixing," where they do not distinguish between the various
types of mixing. They appear to oversimplify the task in this context and leave it up to the reader
(or user) to address the problem using a rule-based approach. It's unclear why the authors
decided to abandon the conversion of these qualification attributes. I believe the choice is related
to the use of word embeddings and the similarity of the embeddings between "Powder Mixing"
and "Ball Milling," - for example - but this could be incorrect. Authors should explain why they
made this choice. Is there any similarity between this and what was done in [20] and [29]? Were
these choices also made in previous works? Authors should comment and provide supporting
evidence for their decisions.

Reply:
We thank the reviewer for pointing out the complexities underlying the definition of actions. This
is the essence of our motivation for developing the Unified Language for Synthesis Actions. We
would like to provide higher-level abstractions of the typical action steps traversed in inorganic
synthesis in this work. We do not see this work as abandoning these specificities. Instead, we
showed that a language model can explicitly capture the inherently hierarchical nature from the
context i.e., extract synthesis actions and classify them into the corresponding action term based
on the ULSA vocabulary with high accuracy. In Section 2.5.1, we take MIXING terms as one
example to show how users can reassign terms. The token itself can provide the flexibility to
reassign terms. In our case, we reassigned MIXING into DISPERSIONMIXING,
SOLUTIONMIXING, or BALLMILLING, which is similar to what has been done in Vaucher et
al., for example, for drying steps as in "DrySolid" and "DrySolution". Hawizy et al. distinguish
between various kinds of mixing, such as “add-phrase”, “dissolve-phrase”, and “stir-phrase,”
which can also be done by our ULSA schema with post-processing, similar to what was done in
Section 2.5.1. Mehr et al. similarly include “Add”, “Dissolve”, and “Stir” terms, as well as
additional “Add___” terms for specific components of their robotic synthesis setup. We do not
believe that there is a suitable universal classification for low-level synthesis actions; some users
might want to categorize synthesis actions by the environment similar to us, while others might
want to categorize synthesis sub-actions by the equipment, such as ball-milling, as users are
designing autonomous synthesis platforms. In this work, we proposed the ULSA schema, and the
choice of action terms was designed to provide maximum flexibility to future users and allow
them to adjust the schema according to their preferences and tasks.
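
To illustrate the kind of post-processing meant here, a minimal rule-based sketch for reassigning a MIXING action to a sub-type is given. The sub-type labels are those named above, but the keyword rules and the function name `refine_mixing` are hypothetical and are not the procedure of Section 2.5.1.

```python
def refine_mixing(token, sentence_tokens):
    """Reassign a MIXING token to a sub-type using simple keyword rules."""
    token = token.lower()
    context = " ".join(sentence_tokens).lower()
    # Rule 1 (assumed): the action token itself names the equipment.
    if "mill" in token:
        return "BALLMILLING"
    # Rule 2 (assumed): the sentence mentions a solution or dissolution.
    if any(cue in context for cue in ("dissolv", "solution", "aqueous")):
        return "SOLUTIONMIXING"
    # Rule 3 (assumed): the sentence mentions a dispersion or suspension.
    if any(cue in context for cue in ("dispers", "suspen", "slurry")):
        return "DISPERSIONMIXING"
    # No rule applies: keep the high-level ULSA tag.
    return "MIXING"


label = refine_mixing("milled", ["powders", "were", "ball", "-", "milled"])
```

A user who prefers to categorize by equipment rather than by environment would replace these rules with their own while keeping the high-level MIXING annotation unchanged.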




Round 2

Revised manuscript submitted on 25 Mar 2022
 

18-Apr-2022

Dear Dr Ceder:

Manuscript ID: DD-ART-11-2021-000034.R1
TITLE: ULSA: Unified Language of Synthesis Actions for the Representation of Inorganic Synthesis Protocols

Thank you for submitting your revised manuscript to Digital Discovery. I am pleased to accept your manuscript for publication in its current form. I have copied any final comments from the reviewer(s) below.

You will shortly receive a separate email from us requesting you to submit a licence to publish for your article, so that we can proceed with the preparation and publication of your manuscript.

You can highlight your article and the work of your group on the back cover of Digital Discovery. If you are interested in this opportunity please contact the editorial office for more information.

Promote your research, accelerate its impact – find out more about our article promotion services here: https://rsc.li/promoteyourresearch.

If you would like us to promote your article on our Twitter account @digital_rsc please fill out this form: https://form.jotform.com/213544038469056.

Thank you for publishing with Digital Discovery, a journal published by the Royal Society of Chemistry – connecting the world of science to advance chemical knowledge for a better future.

With best wishes,

Linda Hung
Associate Editor
Digital Discovery
Royal Society of Chemistry


 
Reviewer 1

Reviewer comments have been addressed satisfactorily.

Reviewer 2

In the revised version of this work, the authors have addressed the issues raised in the previous round of review. They have modified the title and abstract to more accurately reflect the scope of the work, they have expanded the discussion of the previous work performed in the field, and they have clarified several technical details about the language itself including refining the definitions of some of the action terms. Therefore, the manuscript can now be published in its present form, although there are some minor issues that the authors may wish to address prior to publication:

1. In the 3rd paragraph of the introduction, the authors state that “organic synthesis is more deterministic and hence more common in materials science and biochemical domains”. I’m not sure if being deterministic makes it more common – perhaps the authors mean that materials informatics analyses of organic synthesis procedures are more common because they are more deterministic?
2. Refs. [46] and [47] appear to be missing some publication information.




Transparent peer review

To support increased transparency, we offer authors the option to publish the peer review history alongside their article. Reviewers are anonymous unless they choose to sign their report.

We are currently unable to show comments or responses that were provided as attachments. If the peer review history indicates that attachments are available, or if you find there is review content missing, you can request the full review record from our Publishing customer services team at RSC1@rsc.org.


Content on this page is licensed under a Creative Commons Attribution 4.0 International license.