From the journal Digital Discovery Peer review history

Inorganic synthesis-structure maps in zeolites with machine learning and crystallographic distances

Round 1

Manuscript submitted on 21 Jul 2023
 

11-Sep-2023

Dear Dr Schwalbe-Koda:

Manuscript ID: DD-ART-07-2023-000134
TITLE: Inorganic synthesis-structure maps in zeolites with machine learning and crystallographic distances

Thank you for your submission to Digital Discovery, published by the Royal Society of Chemistry. I sent your manuscript to reviewers and I have now received their reports which are copied below.

I have carefully evaluated your manuscript and the reviewers’ reports, and the reports indicate that major revisions are necessary.

Please submit a revised manuscript which addresses all of the reviewers’ comments. Further peer review of your revised manuscript may be needed. When you submit your revised manuscript please include a point by point response to the reviewers’ comments and highlight the changes you have made. Full details of the files you need to submit are listed at the end of this email.

Digital Discovery strongly encourages authors of research articles to include an ‘Author contributions’ section in their manuscript, for publication in the final article. This should appear immediately above the ‘Conflict of interest’ and ‘Acknowledgement’ sections. I strongly recommend you use CRediT (the Contributor Roles Taxonomy, https://credit.niso.org/) for standardised contribution descriptions. All authors should have agreed to their individual contributions ahead of submission and these should accurately reflect contributions to the work. Please refer to our general author guidelines https://www.rsc.org/journals-books-databases/author-and-reviewer-hub/authors-information/responsibilities/ for more information.

Please submit your revised manuscript as soon as possible using this link:

*** PLEASE NOTE: This is a two-step process. After clicking on the link, you will be directed to a webpage to confirm. ***

https://mc.manuscriptcentral.com/dd?link_removed

(This link goes straight to your account, without the need to log on to the system. For your account security you should not share this link with others.)

Alternatively, you can login to your account (https://mc.manuscriptcentral.com/dd) where you will need your case-sensitive USER ID and password.

You should submit your revised manuscript as soon as possible; please note you will receive a series of automatic reminders. If your revisions will take a significant length of time, please contact me. If I do not hear from you, I may withdraw your manuscript from consideration and you will have to resubmit. Any resubmission will receive a new submission date.

The Royal Society of Chemistry requires all submitting authors to provide their ORCID iD when they submit a revised manuscript. This is quick and easy to do as part of the revised manuscript submission process.   We will publish this information with the article, and you may choose to have your ORCID record updated automatically with details of the publication.

Please also encourage your co-authors to sign up for their own ORCID account and associate it with their account on our manuscript submission system. For further information see: https://www.rsc.org/journals-books-databases/journal-authors-reviewers/processes-policies/#attribution-id

I look forward to receiving your revised manuscript.

Yours sincerely,
Linda Hung
Associate Editor
Digital Discovery
Royal Society of Chemistry

************


 
Reviewer 1

Authors created the new method so called synthesis structure maps in zeolite. This work is very much fascinating and I have never seen similar concepts and works. This is particularly helpful way to visualize the synthesis of zeolite. I strongly recommend this work for publication. There are only a few issues that should be addressed

Authors use too many acronym such SBS MWF etc. Authors should create table or well define what they are for non expert in zeolite.

Have you consider the overfitting in XGBoost? Such fine tune XGBoost can be overfitted.
Please check it.

In same fashion, confusion_matrix such as precision and F value should be calculated.

Reviewer 2

This study employed a distance metric and machine learning to map inorganic synthesis in zeolites. The manuscript is clearly written, and the results are presented concisely. However, there are a few aspects that require further attention.
Central to this research is deploying the novel distance metric, AMD, for zeolite structures, which the authors' group previously developed. Although AMD is faster than specific graph-based methods and eliminates the need for structural labels, its effectiveness in analyzing zeolite structures compared to other distance metrics remains to be validated. Conducting control experiments contrasting AMD with other established metrics would be beneficial.

While a significant portion of the data presented by the authors corroborates the experimental results, some counterexamples should be discussed. Addressing these discrepancies and comparing them using methodologies other than AMD is essential.

Major Comments
1. The authors describe AMD in various sections (e.g., lines 60-61 on page 3, lines 6-7 on page 4, and lines 352-353 on page 17) as "a mathematically strong representation" or a "strong distance metric." What constitutes a metric as (mathematically) robust? Clarification is needed.
2. In lines 85-87 on page 5, the authors claimed that "local structural fingerprints such as the Smooth Overlap of Atomic Positions (SOAP) may be unable to distinguish between certain atomic environments." This statement underscores their preference for AMD over SOAP. It would be beneficial to showcase instances where SOAP falls short in discerning vital structural variances in zeolites, which AMD can discern.
3. In lines 97-99 on page 5, the authors observe a weak correlation between AMDs, density differences, and SOAP distances but almost no correlation with previously known graph-based distances. Why? If such divergence exists between graph-based distances and AMD, could AMD overlook critical graph and structural features of zeolites? Which one is better? More mechanistic insights on AMD and other metrics should be addressed.
4. In lines 99-103 on page 5, the authors noted that "zeolites sharing the lowest distances according to our metric have often been synthesized together (see Table S1 in the Supporting Information). Recovering pairs of structurally similar frameworks such as ITH-ITR, SBS-SBT, or MWF-PAU at low distance already suggests that the similarity metric is qualitatively sound." The authors should show which pairs are experimentally supported and which are not with appropriate citations.
5. The authors do not discuss some synthetically and structurally similar pairs but distant in the map. For example, SOD and LTA are synthetically and structurally similar but far apart. Similar examples are OFF/LTL, AFI/GME, UTL/PCR, and UTL/OKO. It would be necessary to address such anomalies, compare them with positive results, and test them against methods like SOAP and graph-based distances.
6. Regarding ll125-127 on page 6, it would be hard to correlate synthesis of MEI with PST-32 and PST-2 as the use of sodium in synthesizing high silica aluminosilicate zeolites is not limited to PST-32 and PST-2 but is very common for high silica zeolites.
7. Concerning lines 153-157 on page 8, it is critical to discern whether the observed behavior is intrinsic to AMD or zeolite synthesis. Comparative analyses with other metrics would be insightful.
8. Does AMD prioritize short-range or long-range ordering compared to other methods? Comparing with other similarity metrics could elucidate this aspect.

Minor Comments
9. In l.113 on page 6 and Figure S3, the authors claimed that one cluster with six-membered rings contains GIU. MEP, DOH, and MTN should instead be characterized by five-membered rings, considering that the six-membered ring is one of the most popular rings in zeolites, and these structures have a lot of five-membered rings that are not observed in other cluster members. Why does AMD cluster GIU and MEP so strongly? Why are CAN and APC out of the GIU cluster?
10. Regarding l.122 on page 6, what building patterns could help inform the MEI, SBS, SBT, SAO, and SBE synthesis?
11. The manuscript utilizes the terms organic templates and OSDA interchangeably. A consistent term should be adopted if there's no discernment between them in the study.
12. In lines 154-156 on page 8, Co, Mg, and Zn show inferior predictive power than others. Could the dataset's distribution influence this? Providing the number of entries per inorganic condition would offer clarity.

Reviewer 3

I've been asked to review the data/code aspects of this manuscript, so I will refrain from providing a critique of the technical/scientific content of the work.

Based on my inspection of the provided data and code, I would rate the clarity, accessibility, and reproducibility of this work as exemplary. The gitfront/.../Zeolites-AMD repository includes all relevant data and code and is quite easy to follow.

My only note is that the provided "average-minimum-distance" github link leads to a 404 page.


 

Dear Reviewers,

Thank you very much for your helpful feedback and questions. Based on your recommendations, we have substantially revised the manuscript. Changes added to the discussion include:

1. At the suggestion of Reviewer 2, new results now compare the use of SOAP as representation and its distance space against the results obtained with the AMD. Despite being very different representations, SOAP vectors are also able to capture similarities in inorganic synthesis conditions, both for the unsupervised and the supervised learning methods. This strengthens the major point of the manuscript, which argues that inorganic synthesis similarity can be captured through data-driven structural similarities, thus beyond composite building units (CBUs) and other heuristics.
2. We discuss additional cases of zeolite pairs proposed by Reviewer 2. While rationalizing every pair of zeolite is not desired in this manuscript, we offer more examples and counterexamples to illustrate the synthesis-structure maps and the use of synthesis classifiers. We also further discuss how the minimum spanning tree offers a visual guide that quantifies valuable intuition, but is not an absolute guide. Because the graph is a discrete entity, care should be taken when pursuing counterexamples based only on the connectivity. As later sections of the manuscript quantify similarity beyond the qualitative nearest-neighbor analysis in the tree, this preliminary, qualitative discussion is intentional.
3. We added the data on classifier figures of merit, including the confusion matrices requested by Reviewer 1.
4. The data, previously hosted on GitFront for peer-review, is now available for the public at a GitHub repository, with persistent storage at Zenodo. We have fixed a typo pointed out by Reviewer 3.

All changes are shown in the tracked-changes version attached to this revised manuscript. Below is a point-by-point response to the questions raised by the Reviewers.

---

Reviewer 1: Authors created the new method so called synthesis structure maps in zeolite. This work is very much fascinating and I have never seen similar concepts and works. This is particularly helpful way to visualize the synthesis of zeolite. I strongly recommend this work for publication. There are only a few issues that should be addressed

Authors: Thank you for the positive evaluation of the work. In the revised manuscript, we address the issues raised by the Reviewer.

Reviewer 1: Authors use too many acronym such SBS MWF etc. Authors should create table or well define what they are for non expert in zeolite.

Authors: Most three-letter codes shown in bold typeface are references to zeolite frameworks, as explained in the Methods section. This notation, standardized by the International Zeolite Association (IZA), is used universally in the zeolite community and is essential to compare the structures according to their experimental connectivity. To clarify this point, we added a note to the first appearance of an IZA code to help the reader. The notation is explained in the Methods section.

Reviewer 1: Have you consider the overfitting in XGBoost? Such fine tune XGBoost can be overfitted. Please check it.

Authors: We carefully evaluated the possibility of overfitting in the XGBoost models, but found no evidence of poor generalization performance despite the hyperparameter optimization. All models verified in the manuscript are trained with held-out validation (20%) and test (20%) sets, and evaluated against the same splits. We selected the best set of hyperparameters by taking the ones that minimize the validation error, thus without considering the training error. The final results reported in the manuscript correspond to the test results from these models, as described in the Methods. While the training error is always smaller than the validation and test errors, as should be expected, the performance in held-out data is still compelling (see Fig. 4 and Figs. S8-S11). Therefore, we do not see poor generalization performance due to overfitting in any of our results.
In the revised manuscript, we provide tables of the validation and test figures of merit for the set of hyperparameters that were used in the manuscript (Tables S5-S7). An additional comment on the issue of overfitting was added to the Methods.
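The selection protocol described above can be sketched in a few lines. This is a minimal illustration only, assuming a 60/20/20 split and a dictionary of per-candidate errors; the function names, candidate labels, and error values are hypothetical and are not taken from the manuscript or its repository.

```python
import random

def split_indices(n_samples, val_frac=0.2, test_frac=0.2, seed=0):
    """Shuffle indices once and cut them into train/validation/test lists."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_val = int(round(n_samples * val_frac))
    n_test = int(round(n_samples * test_frac))
    val = indices[:n_val]
    test = indices[n_val:n_val + n_test]
    train = indices[n_val + n_test:]
    return train, val, test

def select_by_validation(candidates):
    """Return the label of the candidate with the lowest validation error.

    Only the 'val' entry is consulted; training and test errors play no
    role in the selection.
    """
    return min(candidates, key=lambda name: candidates[name]["val"])

# Hypothetical error values, for illustration only (not results from the manuscript).
candidates = {
    "shallow": {"train": 0.20, "val": 0.24, "test": 0.25},
    "medium": {"train": 0.10, "val": 0.18, "test": 0.19},
    "deep": {"train": 0.01, "val": 0.30, "test": 0.31},
}

best = select_by_validation(candidates)
# Gap between validation and training error for the selected candidate.
gap = candidates[best]["val"] - candidates[best]["train"]
train_idx, val_idx, test_idx = split_indices(100)
```

In this toy example the "deep" candidate has the lowest training error but the highest validation error, so it is not selected; the reported figure would be the test error of the chosen candidate.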

Reviewer 1: In same fashion, confusion_matrix such as precision and F value should be calculated.

Authors: We had not included the confusion matrix for all models because the precision, recall, F1 score, and other figures of merit are already calculated separately. These results are shown in Fig. 4 and Figs. S9-S15 of the current version of the manuscript and corresponding supplementary material, and discussed at length in the manuscript. As an example for the reader and at the request of the Reviewer, however, we have now included the confusion matrix for the XGBoost models trained with AMD distances between zeolites (Fig. S16).
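For readers less familiar with how these figures of merit relate to the confusion matrix, the following is a minimal sketch for a binary classifier. The labels below are made up for illustration and do not correspond to any model in the manuscript.

```python
def confusion_counts(y_true, y_pred):
    """Count true positives, false positives, false negatives, true negatives."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 0 and pred == 1:
            fp += 1
        elif truth == 1 and pred == 0:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn

def precision_recall_f1(tp, fp, fn):
    """Derive precision, recall and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)

# Made-up labels: 1 = "synthesized under this inorganic condition", 0 = not.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
precision, recall, f1 = precision_recall_f1(tp, fp, fn)
```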

---

Reviewer 2: This study employed a distance metric and machine learning to map inorganic synthesis in zeolites. The manuscript is clearly written, and the results are presented concisely. However, there are a few aspects that require further attention.

Authors: Thank you for the positive evaluation of the work and detailed comments that help strengthen the work. In the revised manuscript, we address the issues raised by the reviewer.

Reviewer 2: Central to this research is deploying the novel distance metric, AMD, for zeolite structures, which the authors' group previously developed. Although AMD is faster than specific graph-based methods and eliminates the need for structural labels, its effectiveness in analyzing zeolite structures compared to other distance metrics remains to be validated. Conducting control experiments contrasting AMD with other established metrics would be beneficial.

While a significant portion of the data presented by the authors corroborates the experimental results, some counterexamples should be discussed. Addressing these discrepancies and comparing them using methodologies other than AMD is essential.

Authors: We partially agree with the Reviewer on this point. The major point of this work is to predict inorganic synthesis conditions for zeolites using a structural similarity method. This prediction could have been performed using hand-crafted vectors, such as in the work of Muraoka et al. [1], or by directly inspecting representation vectors, such as in the work by Helfrecht et al. [2]. However, our major goal is not to demonstrate that AMD is superior to other invariants when comparing zeolite structures, which is why this claim was not made in the manuscript. Rather, we showed that structural similarity between zeolites correlates to similarity in inorganic synthesis conditions. This generalizes the intuition of building patterns towards a data-driven approach without manual annotation or expensive graph-based calculations.
We highlight that the effectiveness of invariants such as AMD (as well as PDD, described in the Methods) has been demonstrated rigorously in a series of papers from some of us [3, 4]. Among the results, the PDD invariant, from which the AMD can be derived, is generically complete, exhibits no false negatives, and is Lipschitz-continuous under small perturbations. Additionally, the computational performance of AMD enabled a complete exploration of the Cambridge Structural Database within extremely limited computational resources [3], and has been successful in other classes of porous materials [5]. Altogether, this justifies the use of AMD for producing synthesis-structure maps in zeolites. On the other hand, more analysis is needed to prove whether SOAP is always continuous under perturbations, which affects the space induced by this representation.
Nevertheless, in the answers below, we describe our additional experiments to support this point. At the request of the reviewer, we compared the performance of inorganic synthesis predictions using the average SOAP vector of the structure as an invariant. The results show that distances between structural SOAP vectors are also predictive of inorganic synthesis conditions, although computing the SOAP representation can be at least one order of magnitude slower than distances between AMD vectors. All the main results of the paper are thus strengthened by this finding. With it, we show that distances between structural invariants can be used to predict inorganic synthesis conditions in zeolites beyond a single representation, and despite the scarcity of literature data used to train the models. The updated manuscript and this response letter describe our findings in detail, along with additional data in the Supporting Information (Figs. S8, S10, S12, S15, and Tables S6 and S7).

[1] - https://doi.org/10.1038/s41467-019-12394-0
[2] - https://doi.org/10.1039/D2DD00056C
[3] - http://dx.doi.org/10.46793/match.87-3.529w
[4] - https://proceedings.neurips.cc/paper_files/paper/2022/file/9c256fa1965318b7fcb9ed104c265540-Paper-Conference.pdf
[5] - https://doi.org/10.1021/jacs.2c02653

Major Comments
Reviewer 2: 1. The authors describe AMD in various sections (e.g., lines 60-61 on page 3, lines 6-7 on page 4, and lines 352-353 on page 17) as "a mathematically strong representation" or a "strong distance metric." What constitutes a metric as (mathematically) robust? Clarification is needed.

Authors: In mathematics, the term ‘metric’ is given to a function d satisfying these axioms:
1. d(x, x) = 0 and d(x, y) > 0 if x and y are distinct,
2. d(x, y) = d(y, x),
3. d(x, z) <= d(x, y) + d(y, z).

These properties constitute an abstraction of the common notion of ‘distance’, and their consequences are well-understood. Non-metrics such as the ‘distance’ given by comparing X-ray diffraction patterns or radial distribution functions are often obtained heuristically. This leads to scenarios which conflict with fundamental ideas about how distances should behave - e.g., there are shorter paths between two points than the straight line between them - hence unlike AMD they do not induce a ‘crystal space’ where each crystal is a point and there is a sensible notion of distance between them. These definitions are explored in a related work by some of us [1]. In the manuscript, we removed this comment to avoid obscuring the structural similarity hypothesis, but expanded on these ideas in the Methods section for the reader.

[1] - http://dx.doi.org/10.46793/match.87-3.529w
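As a small illustration of the three axioms listed above, the sketch below checks them numerically on a handful of sample vectors for the L∞ (Chebyshev) distance, and for its square, which violates the triangle inequality. The sample points and function names are arbitrary choices for this example; passing the check on a finite sample is only a sanity test, not a proof that a function is a metric.

```python
from itertools import product

def chebyshev(u, v):
    """L-infinity distance between two vectors of equal length."""
    return max(abs(a - b) for a, b in zip(u, v))

def squared_chebyshev(u, v):
    """A 'distance' that is symmetric and positive but not a metric."""
    return chebyshev(u, v) ** 2

def satisfies_metric_axioms(points, dist, tol=1e-12):
    """Check the three metric axioms on every pair/triple of sample points."""
    for x in points:
        if abs(dist(x, x)) > tol:  # axiom 1: d(x, x) = 0
            return False
    for x, y in product(points, repeat=2):
        if x != y and dist(x, y) <= 0:  # axiom 1: d(x, y) > 0 if distinct
            return False
        if abs(dist(x, y) - dist(y, x)) > tol:  # axiom 2: symmetry
            return False
    for x, y, z in product(points, repeat=3):
        if dist(x, z) > dist(x, y) + dist(y, z) + tol:  # axiom 3: triangle
            return False
    return True

points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (0.5, 1.5)]
chebyshev_ok = satisfies_metric_axioms(points, chebyshev)
squared_ok = satisfies_metric_axioms(points, squared_chebyshev)
```

For the squared variant, the points (0, 0), (1, 0), and (2, 0) give d(x, z) = 4 but d(x, y) + d(y, z) = 2, so the "detour" through the middle point is shorter than the direct distance.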

Reviewer 2: 2. In lines 85-87 on page 5, the authors claimed that "local structural fingerprints such as the Smooth Overlap of Atomic Positions (SOAP) may be unable to distinguish between certain atomic environments." This statement underscores their preference for AMD over SOAP. It would be beneficial to showcase instances where SOAP falls short in discerning vital structural variances in zeolites, which AMD can discern.

Authors: Recent works by Pozdnyakov, Ceriotti et al. [1, 2] have demonstrated several counterexamples to the injectivity of SOAP descriptors. On the other hand, the PDD, from which the AMD can be derived, was demonstrated by some of us [3] to be a complete and continuous invariant for all periodic crystals in general position. The four-connected neighborhoods seen in SiO2 networks are similar to the CH4 examples demonstrated in [1, 2]. While this does not prove that the SOAP vectors necessarily fail to distinguish between atomic environments in zeolites (nor that degenerate environments exist within the dataset under analysis), it motivates the use of different representations for computing distances between crystal structures. Demonstrating the shortcomings of SOAP and other descriptors is beyond the scope of this manuscript, as finding such counterexamples and rationalizing them can lead to standalone investigations, as demonstrated by [1, 2].
To better frame this point, we have rewritten the sentence to focus on structure-synthesis relationships rather than the properties of the descriptor. We also perform the same analysis made in the paper using SOAP instead of the AMD, achieving comparable results. The discussion on the degeneracies of SOAP was moved to the Methods section, where we expand on the details of the SOAP analysis.

[1] - https://doi.org/10.1103/PhysRevLett.125.166001
[2] - https://arxiv.org/abs/2109.11440
[3] - https://proceedings.neurips.cc/paper_files/paper/2022/hash/9c256fa1965318b7fcb9ed104c265540-Abstract-Conference.html

Reviewer 2: 3. In lines 97-99 on page 5, the authors observe a weak correlation between AMDs, density differences, and SOAP distances but almost no correlation with previously known graph-based distances. Why? If such divergence exists between graph-based distances and AMD, could AMD overlook critical graph and structural features of zeolites? Which one is better? More mechanistic insights on AMD and other metrics should be addressed.

Authors: As graph distances do not consider the structural diversity of zeolites, but only their connectivity, distortions in the atomic structure without breaking Si-O covalent bonds can lead to equivalent graphs. One of us has discussed this issue in extensive detail in a previous publication [1]. The reviewer is correct that absence of correlation between representations can lead to complementary results. In fact, it was demonstrated that graph similarity between zeolites can recover diffusionless phase transformations [1], but not phenomena typically attributed to building units. Predicting intergrowth was only possible by combining structure and graph similarity [1], but cannot be observed using structural similarity alone.
In this perspective, our work is less about the mechanistic insight on AMD, and more about what physical phenomena can be derived from this representation. Several other works by some of us have explored the AMD and other distances with extensive mathematical rigor [2], including applications in computational chemistry [3, 4]. In many of those cases, it was demonstrated that the definition of rigorous structural invariants allows the deduplication of crystal structures without false positives, and those works provide numerous demonstrations of this.
Building on these results, in the current manuscript, we demonstrate how the use of AMD can predict similarities in inorganic synthesis conditions. Whereas this result could be achieved with other representations, the AMD offers unparalleled computational efficiency to do so, enabling us to compute distance matrices between all hypothetical and known zeolites (about 330,000 x 250 = 82.5M distances between vectors) without resorting to high-performance computing facilities.
The definition of a “better” representation, therefore, may not be adequate for investigations such as this one. We believe there should not be a single champion representation, but a diverse set of methods that can achieve complementary goals. In this context, our investigation demonstrates the use of structural similarity (specifically, distances computed with the AMD and SOAP methods) for the prediction of inorganic synthesis conditions in zeolites, a topic that has not been explored in the literature before.

[1] - https://doi.org/10.1038/s41563-019-0486-1
[2] - https://proceedings.neurips.cc/paper_files/paper/2022/hash/9c256fa1965318b7fcb9ed104c265540-Abstract-Conference.html
[3] - https://doi.org/10.1021/jacs.2c02653
[4] - https://doi.org/10.1039/D2DD00068G


Reviewer 2: 4. In lines 99-103 on page 5, the authors noted that "zeolites sharing the lowest distances according to our metric have often been synthesized together (see Table S1 in the Supporting Information). Recovering pairs of structurally similar frameworks such as ITH-ITR, SBS-SBT, or MWF-PAU at low distance already suggests that the similarity metric is qualitatively sound." The authors should show which pairs are experimentally supported and which are not with appropriate citations.

Authors: We added references to some pairs related by structural/synthesis similarities to the main text and to the supplementary table. Although some pairs have not been reported together experimentally, most of the zeolites are related by known structural families such as the ABC-6 zeolites or the RHO family.

Reviewer 2: 5. The authors do not discuss some synthetically and structurally similar pairs but distant in the map. For example, SOD and LTA are synthetically and structurally similar but far apart. Similar examples are OFF/LTL, AFI/GME, UTL/PCR, and UTL/OKO. It would be necessary to address such anomalies, compare them with positive results, and test them against methods like SOAP and graph-based distances.

Authors: First, we emphasize that the synthesis map in Fig. 2 is a minimum spanning tree based on the distance matrix. This means that one node in the graph is connected only to its nearest neighbor in the tree, which facilitates the visualization. However, the two-dimensional distance between points in the plot (or graph distance between nodes) does not preserve the distances between zeolites in the underlying space. As described in the manuscript, the minimum spanning tree in Fig. 2 is a useful map to capture the intuition of synthesis in a completely unsupervised way. By no means, however, is it meant to capture all the possible pairwise similarities between zeolites. The results discussed after the minimum spanning tree, namely the homogeneity and the supervised learning, have been added to the manuscript exactly to generalize the qualitative results from the tree map.
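The caveat about tree distances can be made concrete with a toy example. The sketch below builds a minimum spanning tree with Prim's algorithm from a made-up 4 x 4 distance matrix (not zeolite data) and shows two nodes whose direct distance is close to the nearest-neighbor distance, yet which end up three edges apart in the tree. The function names are chosen for this example only.

```python
def minimum_spanning_tree(dist):
    """Prim's algorithm on a dense symmetric distance matrix; returns edges (i, j)."""
    n = len(dist)
    in_tree = {0}
    edges = []
    while len(in_tree) < n:
        # Cheapest edge that connects the current tree to a new node.
        i, j = min(
            ((a, b) for a in in_tree for b in range(n) if b not in in_tree),
            key=lambda edge: dist[edge[0]][edge[1]],
        )
        edges.append((i, j))
        in_tree.add(j)
    return edges

def tree_hops(edges, source, target):
    """Number of tree edges separating two nodes (breadth-first search)."""
    adjacency = {}
    for i, j in edges:
        adjacency.setdefault(i, []).append(j)
        adjacency.setdefault(j, []).append(i)
    frontier, seen, hops = [source], {source}, 0
    while frontier:
        if target in frontier:
            return hops
        next_frontier = []
        for node in frontier:
            for neighbor in adjacency.get(node, []):
                if neighbor not in seen:
                    seen.add(neighbor)
                    next_frontier.append(neighbor)
        frontier = next_frontier
        hops += 1
    return None

# Made-up distances between four structures labelled 0..3.
dist = [
    [0.00, 1.00, 1.50, 1.05],
    [1.00, 0.00, 1.00, 1.50],
    [1.50, 1.00, 0.00, 1.00],
    [1.05, 1.50, 1.00, 0.00],
]

edges = minimum_spanning_tree(dist)
hops_0_3 = tree_hops(edges, 0, 3)
```

Here nodes 0 and 3 are separated by a distance of 1.05, barely more than any tree edge, but the edge between them would close a cycle and is therefore left out of the tree.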
Interestingly, the results of our data-driven analysis suggest that hand-crafting a distance function that forces certain distances to be low can be less efficient than learning from the data itself. To explore the examples suggested by the Reviewer:
• LTA zeolite has the sod building units that are the motif of the SOD zeolite. It is not unreasonable to expect that SOD should be close to LTA. However, when the top four nearest neighbors of SOD are listed, we see these zeolites (in ascending order of distance): FRA, PTT, DOH, and LOS. The first two explicitly contain the sod building unit. The latter two are 0-dimensional frameworks, analogous to SOD. All of them are zeolites with low accessible volume and higher framework density (~17 T/1000 Å³) compared to LTA (~14 T/1000 Å³), and all can be accessed through similar synthesis routes. Therefore, one could argue that the current nearest neighbors for SOD are better choices than forcing it to be LTA and understanding this distance as an “anomaly.” Moreover, note how LTA is closer to FAU and EMT instead, both of which also have the sod building unit, but larger cages and lower density compared to SOD. This result is thus a feature rather than a bug.
• OFF/LTL zeolites are related by d6r and can building units, as well as dsc chains, exhibiting reasonably similar densities (16.1 and 16.7 T/1000 Å³ for OFF and LTL, respectively), ring sizes (12, 8, 6, 4), and channel dimensionality (3D). Note, however, how OFF is a neighbor of SWY, a zeolite that shares exactly the same density (16.1 T/1000 Å³), building units (d6r, can, and gme), and channel dimensionality (3D), and of ERI, a zeolite with equally similar building units (d6r and can), the same density, and with which it is known to form intergrowths. Similarly, LTL is a nearest neighbor of MOZ, a zeolite that shares all building units with LTL, including the larger ltl CBU, and has similar density. Furthermore, ZSM-10 (of topology MOZ) can also be synthesized as an aluminosilicate in the presence of potassium.
• GME/AFI are indeed structurally related, with GME sometimes undergoing a reconstructive transformation towards AFI. This has been explored in detail in previous work by one of us [1], where we show that the connectivity of these two structures is the same. However, as highlighted before, graph distances do not necessarily correlate to structural distances, and may indicate different phenomena. This is the case of GME and AFI [1]. It is not surprising that GME’s closest neighbors are SFW, AFX, and AFV, as the GME structure (or intergrowths) sometimes co-occurs with ABC-6 zeolites. The closest neighbors to AFI are reasonably far according to the AMD distances (~0.18 Å). However, its nearest neighbor is TON, which is also a 1D zeolite. While this is not necessarily the determining factor, it is a correlation that can be used to rationalize this distance.
• The cases of UTL/PCR/OKO have also been discussed in previous work by one of us [1]. The disassembly of UTL using the ADOR method and/or inverse sigma transformation leads to structurally related materials due to similar layers. PCR and OKO are still closely related in terms of distances between AMDs: they are 9th nearest neighbors, and the 3rd-10th nearest neighbors of OKO lie at distances very similar to that of PCR. The ranking, therefore, does not convey the continuity of the distance, though it makes a convenient plot. The first nearest neighbor of OKO is actually the idealized *PCS, which, unsurprisingly, is the IPC-6 structure that also results from the ADOR method. The most similar structure to UTL is the EWS framework, possibly due to the large channel intersection and the layered structure of the latter.

In summary, throughout the paper we show several examples and correlate crystallographic distance to inorganic synthesis similarity. Our evidence supports the hypothesis that similarity in the structure space predicts similarity in the synthesis space. However, the opposite is not necessarily true, which is why we do not consider the examples proposed by the Reviewer as “anomalies”, but rather as expected outcomes of a data-driven analysis. While not every detail can be rationalized, we hope to have convinced the Reviewer that the counterexamples are not necessarily failures of our method, but simply a mismatch between the intuitive understanding of zeolite structures and a data-driven analysis. Several other examples could be discussed given the discrete nature of the minimum spanning tree. However, we decided not to explore every single result to keep the manuscript concise and added a longer discussion to the Supporting Information. Part of the discussion above was added to the manuscript.

[1] - https://doi.org/10.1038/s41563-019-0486-1

Reviewer 2: 6. Regarding ll125-127 on page 6, it would be hard to correlate synthesis of MEI with PST-32 and PST-2 as the use of sodium in synthesizing high silica aluminosilicate zeolites is not limited to PST-32 and PST-2 but is very common for high silica zeolites.

Authors: We agree. We rewrote the sentence to point out the correlations between the assembly of MEI, AFS, and SAO. Sodium is not the only player in that synthesis, as known from the template-driven crystallization of MEI, but the assembly mechanism can be explored in future works. Thus, we removed the observation on sodium for the use of aluminosilicates.

Reviewer 2: 7. Concerning lines 153-157 on page 8, it is critical to discern whether the observed behavior is intrinsic to AMD or zeolite synthesis. Comparative analyses with other metrics would be insightful.

Authors: In the revised manuscript, we show that the SOAP invariant also exhibits this behavior. The comment, therefore, was strengthened to state that local structural similarity, as derived from the SOAP and AMD vectors, is predictive of inorganic synthesis conditions when considering all reports from the literature. This thorough analysis has been added to the manuscript.

Reviewer 2: 8. Does AMD prioritize short-range or long-range ordering compared to other methods? Comparing with other similarity metrics could elucidate this aspect.

Authors: As described in the Methods section, the AMD vector represents the average distance towards the k-nearest neighbors of a structure, and distances between AMD vectors are computed using the L∞ norm. Because of this choice, the range of the interactions is defined by the variable k. In our case, we are using k = 100, which is sufficient to consider an entire unit cell of several zeolites, and is not unlike the typical cutoffs of 5-6 Å in local descriptors. As k → ∞, the distance between AMD vectors asymptotically approaches the density difference between structures.
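
As a rough illustration of the description above, the following Python sketch computes an AMD-like vector for a toy periodic structure and compares two such vectors with the L∞ norm. It is not the code used in the manuscript: the function names `amd_vector` and `linf_distance`, the brute-force tiling controlled by the assumed parameter `reps`, and the simple cubic test lattices are all hypothetical choices made only to show how k sets the range of the comparison.

```python
import itertools
import numpy as np


def amd_vector(cell, frac_coords, k, reps=2):
    """Illustrative AMD-like vector (hypothetical helper, not the authors' code).

    For each atom in the unit cell, sort the distances to its neighbors in a
    brute-force periodic tiling, keep the k smallest, and average over atoms.
    `reps` must be large enough that the k nearest neighbors lie inside the tiling.
    """
    cell = np.asarray(cell, dtype=float)
    motif = np.asarray(frac_coords, dtype=float) @ cell
    # Lattice translations for images in [-reps, reps] along each axis.
    index_shifts = np.array(
        list(itertools.product(range(-reps, reps + 1), repeat=3)), dtype=float
    )
    shifts = index_shifts @ cell
    images = (motif[None, :, :] + shifts[:, None, :]).reshape(-1, 3)
    # Distances from each motif atom to every image atom, sorted per atom.
    dists = np.linalg.norm(motif[:, None, :] - images[None, :, :], axis=-1)
    dists.sort(axis=1)
    # Column 0 is the zero distance of each atom to itself; skip it.
    return dists[:, 1:k + 1].mean(axis=0)


def linf_distance(u, v):
    """L-infinity (Chebyshev) distance between two vectors of equal length."""
    return float(np.max(np.abs(np.asarray(u) - np.asarray(v))))


# Toy example: two simple cubic lattices with cell lengths 1.0 and 1.1.
amd_a = amd_vector(np.eye(3), [[0.0, 0.0, 0.0]], k=6)
amd_b = amd_vector(1.1 * np.eye(3), [[0.0, 0.0, 0.0]], k=6)
distance_ab = linf_distance(amd_a, amd_b)
```

In this toy case the six nearest neighbors all sit at one cell length, so the L∞ distance is simply the difference in cell lengths. Increasing k brings farther neighbor shells into the comparison, which is the sense in which k defines the range of the descriptor.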

Minor Comments

Reviewer 2: 9. In l.113 on page 6 and Figure S3, the authors claimed that one cluster with six-membered rings contains GIU. MEP, DOH, and MTN should instead be characterized by five-membered rings, considering that the six-membered ring is one of the most popular rings in zeolites, and these structures have a lot of five-membered rings that are not observed in other cluster members. Why does AMD cluster GIU and MEP so strongly? Why are CAN and APC out of the GIU cluster?

Authors: We agree with the Reviewer about the 5-membered rings of MEP, DOH, and MTN. These three zeolites indeed share low distances and could be considered a cluster by themselves. With our highlighted cluster in Fig. S3 and discussion, we pointed to the fact that all the highlighted zeolites within that cluster have at most 6-membered rings. Again, following the caveats of the minimum spanning tree depiction, we can provide a discussion on GIU/MEP, APC, and CAN:
• GIU and MEP are both clathrasil zeolites with similar densities. GIU is built by sod and can building units, and forms large cages, while MEP is built with mtn cages. However, the AMD distance between GIU and MEP is 0.18 Å. In comparison, GIU has 13 other neighbors within that distance (FAR, AFG, CAN, LTN, LIO, VNI, …), and MEP has two (DOH and MTN). The connection happens because, following the procedure of the minimum spanning tree, the MEP-GIU edge in the graph is the minimum pathway to connect the MEP-DOH-MTN cluster to the rest of the tree. Therefore, we do not see enough evidence to say that GIU and MEP are strongly connected, but simply that GIU is the closest zeolite to the MEP-DOH-MTN cluster given all the other distances.
• APC’s nearest neighbors are LIO and UEI, with the distances APC-LIO and APC-UEI varying in the fourth decimal place (0.1467 and 0.1475 Å, respectively). The correlation with UEI is obvious, both APC and UEI being similar zeolites with dcc chains, 2D channels, similar densities, and so on. It is not clear why APC and LIO are so similar, except that they share nearly the same density (17.7 and 17.6 T/1000 Å³), and the “cage” of APC is not unlike the los cage. However, we believe that the discrete nature of the tree may be inducing a mismatch between expectation and reality, which is better understood with the continuous distances. As discussed extensively in Major point 5 from the Reviewer, we do not attempt to explain all pairwise correlations between zeolites induced by biases in our visualization of the structures. Rather, we adopt a data-first approach where we try to learn new insights from correlations that we did not see before given the sheer number of zeolite pairs that we do not consider beforehand.
• In Fig. S3, we did not highlight CAN because our major discussion point was that all zeolites colored in that region have zero accessible area. However, GIU is still the closest zeolite to CAN, and may be considered part of a “GIU cluster”.

Therefore, we believe the Reviewer’s confusion is due to our choice of labeling of a “cluster” in a discrete tree to concisely discuss some observations. While there could be much more to explore in each pair of zeolites (and different subclusters could be formed up to all pairwise comparisons), we decided to adopt other descriptors later in the paper that better quantify this intuition. The minimum spanning tree is not an absolute answer to the synthesis/structure similarity, but an aid to quantify these intuitions. While this point had been mentioned in the previous version of the manuscript, we further stressed it in this revised submission.
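
The minimum-spanning-tree argument made above for the GIU/MEP edge can be sketched with a toy example. The Python snippet below is only an illustration under stated assumptions: the labels P, Q, R, S, T and all distances are hypothetical (they are not AMD distances from the manuscript), and Kruskal's algorithm is used here as one standard way to build a minimum spanning tree, not necessarily the routine used by the authors. P, Q, and R play the role of a tight cluster, and S plays the role of the outside structure closest to it.

```python
def minimum_spanning_tree(labels, dist):
    """Kruskal's algorithm; `dist` maps frozenset({a, b}) to a distance."""
    parent = {label: label for label in labels}

    def find(x):
        # Follow parent pointers to the root, compressing the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    edges = []
    # Visit candidate edges from shortest to longest.
    for pair, d in sorted(dist.items(), key=lambda item: item[1]):
        a, b = sorted(pair)
        root_a, root_b = find(a), find(b)
        if root_a != root_b:  # the edge joins two separate components
            parent[root_a] = root_b
            edges.append((a, b, d))
    return edges


# Hypothetical pairwise distances (illustrative values only).
raw = {
    ("P", "Q"): 0.05, ("P", "R"): 0.06, ("Q", "R"): 0.07,  # tight cluster
    ("S", "T"): 0.10,                                      # S has a closer neighbor
    ("P", "S"): 0.18, ("P", "T"): 0.25,
    ("Q", "S"): 0.30, ("Q", "T"): 0.31,
    ("R", "S"): 0.29, ("R", "T"): 0.33,
}
dist = {frozenset(pair): d for pair, d in raw.items()}
tree = minimum_spanning_tree(["P", "Q", "R", "S", "T"], dist)

cluster = {"P", "Q", "R"}
# Edges of the tree that link the cluster to the rest of the structures.
bridges = [e for e in tree if (e[0] in cluster) != (e[1] in cluster)]
```

In this toy tree the P-S edge is the only connection between the cluster and the remaining nodes, even though S has a closer neighbor (T). The edge therefore reflects the shortest way to attach the cluster to the tree, not a particularly strong pairwise similarity, which is the point made above about reading individual edges of the minimum spanning tree.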

Reviewer 2: 10. Regarding l.122 on page 6, what building patterns could help inform the MEI, SBS, SBT, SAO, and SBE synthesis?

Authors: As mentioned in Major point 6 above, we now discuss in the manuscript how the secondary building units in MEI are similar to those seen in AFS, SAO, etc.

Reviewer 2: 11. The manuscript utilizes the terms organic templates and OSDA interchangeably. A consistent term should be adopted if there's no discernment between them in the study.

Authors: Both terms are often used interchangeably in the literature to refer to the organic molecules that direct the synthesis of certain zeolite polymorphs. Nevertheless, we have standardized the term to OSDA throughout the manuscript.

Reviewer 2: 12. In lines 154-156 on page 8, Co, Mg, and Zn show inferior predictive power than others. Could the dataset's distribution influence this? Providing the number of entries per inorganic condition would offer clarity.

Authors: We added the distribution of points to the Supporting Information. While the lower number of data points can explain this factor, it may not be the only explanation. Beryllium-containing zeolites, for example, are as scarce as Mg-containing zeolites, but have distinct structural features that make them predictive. We could not find a strong correlation between Co, Mg, and Zn and the structures, and there are few reports on what makes a zeotype synthesizable [1]. Possibly, the diversity of compositions in some zeotypes further complicates the analysis of specific building patterns driven by Co, Mg, or Zn, in contrast with other inorganic agents such as Ge, F, or Be. Interestingly, the models trained with AMD distances perform better in the prediction of Mg zeolites compared to models using SOAP. SOAP-based models, on the other hand, perform better when predicting syntheses of Ca- and Ga-based zeolites, although these results are not statistically significant given the magnitude of error bars.

We added this discussion to the manuscript.

[1] - https://doi.org/10.1021/acs.jpclett.9b00136

---

Reviewer 3: I've been asked to review the data/code aspects of this manuscript, so I will refrain from providing a critique of the technical/scientific content of the work.

Based on my inspection of the provided data and code, I would rate the clarity, accessibility, and reproducibility of this work as exemplary. The giftront/.../Zeolites-AMD repository includes all relevant data and code and is quite easy to follow.

My only note is that the provided "average-minimum-distance" github link leads to a 404 page.

Authors: We thank the reviewer for the analysis of the code and data, and the positive evaluation of them. We have now uploaded the dataset and the code for the public at: https://github.com/dskoda/Zeolites-AMD. The repository and data also have persistent links on Zenodo.

The link to the GitHub repository “average-minimum-distance” had a typo in the previous manuscript. The link was corrected. The repository can be accessed at: https://github.com/dwiddo/average-minimum-distance




Round 2

Revised manuscript submitted on 10 Oct 2023
 

26-Oct-2023

Dear Dr Schwalbe-Koda:

Manuscript ID: DD-ART-07-2023-000134.R1
TITLE: Inorganic synthesis-structure maps in zeolites with machine learning and crystallographic distances

Thank you for submitting your revised manuscript to Digital Discovery. I am pleased to accept your manuscript for publication in its current form. I have copied any final comments from the reviewer(s) below.

You will shortly receive a separate email from us requesting you to submit a licence to publish for your article, so that we can proceed with the preparation and publication of your manuscript.

You can highlight your article and the work of your group on the back cover of Digital Discovery. If you are interested in this opportunity please contact the editorial office for more information.

By publishing your article in Digital Discovery, you are supporting the Royal Society of Chemistry to help the chemical science community make the world a better place.

With best wishes,

Linda Hung
Associate Editor
Digital Discovery
Royal Society of Chemistry


 
Reviewer 2

The authors have diligently considered my feedback and addressed each point effectively. I am pleased to recommend acceptance of the revised version.




Transparent peer review

To support increased transparency, we offer authors the option to publish the peer review history alongside their article. Reviewers are anonymous unless they choose to sign their report.

We are currently unable to show comments or responses that were provided as attachments. If the peer review history indicates that attachments are available, or if you find there is review content missing, you can request the full review record from our Publishing customer services team at RSC1@rsc.org.


Content on this page is licensed under a Creative Commons Attribution 4.0 International license.