From the journal Digital Discovery
Peer review history

Machine learning interatomic potentials for amorphous zeolitic imidazolate frameworks

Round 1

Manuscript submitted on 05 Dec 2023
 

22-Dec-2023

Dear Dr Coudert:

Manuscript ID: DD-ART-12-2023-000236
TITLE: Machine Learning Interatomic Potentials for Amorphous Zeolitic Imidazolate Frameworks

Thank you for your submission to Digital Discovery, published by the Royal Society of Chemistry. I sent your manuscript to reviewers and I have now received their reports which are copied below.

After careful evaluation of your manuscript and the reviewers’ reports, I will be pleased to accept your manuscript for publication after minor revisions.

Please revise your manuscript to fully address the reviewers’ comments. When you submit your revised manuscript please include a point by point response to the reviewers’ comments and highlight the changes you have made. Full details of the files you need to submit are listed at the end of this email.

Digital Discovery strongly encourages authors of research articles to include an ‘Author contributions’ section in their manuscript, for publication in the final article. This should appear immediately above the ‘Conflict of interest’ and ‘Acknowledgement’ sections. I strongly recommend you use CRediT (the Contributor Roles Taxonomy, https://credit.niso.org/) for standardised contribution descriptions. All authors should have agreed to their individual contributions ahead of submission and these should accurately reflect contributions to the work. Please refer to our general author guidelines https://www.rsc.org/journals-books-databases/author-and-reviewer-hub/authors-information/responsibilities/ for more information.

Please submit your revised manuscript as soon as possible using this link:

*** PLEASE NOTE: This is a two-step process. After clicking on the link, you will be directed to a webpage to confirm. ***

https://mc.manuscriptcentral.com/dd?link_removed

(This link goes straight to your account, without the need to log in to the system. For your account security you should not share this link with others.)

Alternatively, you can login to your account (https://mc.manuscriptcentral.com/dd) where you will need your case-sensitive USER ID and password.

You should submit your revised manuscript as soon as possible; please note you will receive a series of automatic reminders. If your revisions will take a significant length of time, please contact me. If I do not hear from you, I may withdraw your manuscript from consideration and you will have to resubmit. Any resubmission will receive a new submission date.

The Royal Society of Chemistry requires all submitting authors to provide their ORCID iD when they submit a revised manuscript. This is quick and easy to do as part of the revised manuscript submission process.   We will publish this information with the article, and you may choose to have your ORCID record updated automatically with details of the publication.

Please also encourage your co-authors to sign up for their own ORCID account and associate it with their account on our manuscript submission system. For further information see: https://www.rsc.org/journals-books-databases/journal-authors-reviewers/processes-policies/#attribution-id

I look forward to receiving your revised manuscript.

Yours sincerely,
Dr Joshua Schrier
Associate Editor, Digital Discovery

************


 
Reviewer 1

In this work by Castel et al., the authors develop machine-learned interatomic potentials (MLPs) for amorphous ZIFs, a family of MOFs. The models have excellent agreement with the ab initio reference data and are capable of efficiently simulating the structural, mechanical, and dynamic properties of complex, amorphous ZIF systems.

Overall, the work is clearly written, methodologically sound, and will likely prove of interest to the audience of Digital Discovery, particularly those who are new to the area of MLPs and MOFs. I don’t have any hesitation about the work being published, pending a few minor comments to be addressed. I will note, however, that I agree with Reviewer 2 from the previous round of reviews that the results are unlikely to be considered highly impactful.

The authors state the following in the Conclusion section: “In this work, we provide strong evidence that the development of machine-learnt potentials can pave the way towards the generation of multiple amorphous MOF models and the study of their properties.” At this point, it is well-accepted that MLPs are capable of mapping out potential energy landscapes given sufficient ab initio training data. Even if specific studies have not been carried out on amorphous MOFs, there is ample evidence from other diverse material classes and phases to expect the same here. As acknowledged in the conclusion, most of the work is focused on demonstrating that the model performs well and that MLPs are promising. There are relatively few new insights about amorphous MOFs and no major methodological advances, as the published model architectures work quite well already. All that being said, this study should be thought of as a validation effort, and validation efforts are always useful, particularly when they are done via a thorough writeup, clear methodology, and openly accessible data like that in the submitted work. The study was expertly carried out, and I have no doubt about the accuracy of the results. The science is sound, and that's always a joy to see.

With that said, I have a few very minor suggestions that the authors may wish to consider, which I enumerate below:

1. It may be useful to provide some snapshots of the simulated structures from different phases. As the authors state, amorphous MOFs have recently gained traction in the scientific community and are not as widely studied as their crystalline counterparts. For this reason, the reader may be unfamiliar with what amorphous ZIFs may look like, and I think a few representative images could be useful for the sake of the uninitiated reader.

2. On Page 9, the authors state “Furthermore, we see that sampling the data set with the flat histogram method does not appear to provide more accuracy when tested against a large random sample. However, as observed in the training curves (Figure 2c), it does appear to learn faster.” I don’t know that I would necessarily make a point that it learns faster, assuming I understand this description correctly. The first reason is the obvious one: the initial MAE in the energy prediction for `flathist` is much worse than `random`, so it inherently has a lot more room to improve compared to the `random` sampling approach. In addition, because the data binning was done on the entire dataset (including the testing dataset), there is potentially some slight data leakage that could influence the learning curve in early epochs. Just like how it is not advisable to normalize or scale the entire dataset (i.e. it should only be done on the training set) [1], a similar argument could potentially be made for binning. That said, I don't think this will influence the results or conclusions in this work, but it's perhaps something to confirm within the context of Figure 2c. [1] https://doi.org/10.1021/acs.chemmater.0c01907

3. The caption to Table 2 is likely to be confusing to many readers, as it's unclear at a quick glance whether "... with the NequIP MLP" is referring to just "MLP glasses" or "ab initio and MLP glasses." I can imagine some confusion about whether "ab initio glass" is from AIMD simulations or the ab initio-derived glass that is modeled with the MLP. This distinction could be made clearer.

4. Please report the version of CP2K used in this work.

Reviewer 2

In this work, the authors trained a Neural Equivariant Interatomic Potential (NequIP) for molecular dynamics simulation of ZIF-4, a complex metal-organic framework system. The machine-learning interatomic potential (MLIP) is trained from snapshots of an ab initio MD (aiMD) trajectory, where both flat-histogram and random sampling were compared for training set selection. The structural features of crystalline and amorphous ZIF-4 from MLIP and aiMD simulations, such as RDF, are compared to validate the model.

The manuscript meets the standards of soundness, clarity, and reproducibility. All information, including the ab initio dataset, software, and models, is publicly available on various repositories.

I have a few minor comments as follows.

1. The train-test splitting is not clearly described, especially how data leakage is avoided. For example, are the train and test sets chosen from different trajectories to avoid closeness/correlation?
2. The difference between flathist and random sampling is confusing. The flathist error in force is clearly lower than that of random sampling. However, the claim that flathist “does appear to learn faster” is not solidly supported by the single training line in Figure 2c. Cross-validation may be needed to support this statement.

Reviewer 3

The manuscript by Castel et al. describes a comprehensive analysis of machine learning interatomic potentials using neural network potentials for amorphous ZIFs. The authors compare both NequIP and Allegro, which are popular “off-the-shelf” tools for training such models, but focus on the critical aspect of generating training data and evaluating the physicochemical phenomena that emerge from these models, which has great value to the broader community. The manuscript is well-organized, clearly written, and contains much insight for others in the field. I recommend this article be published in Digital Discovery subject to the minor comments I have below.

Considering the value of the models developed in the field of amorphous ZIFs, do the authors plan to make the training data or models available?

This is more of a curiosity, but I am interested in how local representations of bonded interactions can reproduce mechanical properties such as the bulk modulus which implicitly depend on long-range interactions. It’s reassuring to see the comparison with AIMD data, though this is certainly a space where experimental data are necessary (though they have their own complexities).

Figure 6, bottom panel: it would be helpful to include a legend for this plot.

Figure 8: the text is labeled “abinitio”.


 

Reviewer 1

In this work by Castel et al., the authors develop machine-learned interatomic potentials (MLPs) for amorphous ZIFs, a family of MOFs. The models have excellent agreement with the ab initio reference data and are capable of efficiently simulating the structural, mechanical, and dynamic properties of complex, amorphous ZIF systems.

Overall, the work is clearly written, methodologically sound, and will likely prove of interest to the audience of Digital Discovery, particularly those who are new to the area of MLPs and MOFs. I don’t have any hesitation about the work being published, pending a few minor comments to be addressed. I will note, however, that I agree with Reviewer 2 from the previous round of reviews that the results are unlikely to be considered highly impactful.

The authors state the following in the Conclusion section: “In this work, we provide strong evidence that the development of machine-learnt potentials can pave the way towards the generation of multiple amorphous MOF models and the study of their properties.” At this point, it is well-accepted that MLPs are capable of mapping out potential energy landscapes given sufficient ab initio training data. Even if specific studies have not been carried out on amorphous MOFs, there is ample evidence from other diverse material classes and phases to expect the same here. As acknowledged in the conclusion, most of the work is focused on demonstrating that the model performs well and that MLPs are promising. There are relatively few new insights about amorphous MOFs and no major methodological advances, as the published model architectures work quite well already. All that being said, this study should be thought of as a validation effort, and validation efforts are always useful, particularly when they are done via a thorough writeup, clear methodology, and openly accessible data like that in the submitted work. The study was expertly carried out, and I have no doubt about the accuracy of the results. The science is sound, and that's always a joy to see.

With that said, I have a few very minor suggestions that the authors may wish to consider, which I enumerate below:

1. It may be useful to provide some snapshots of the simulated structures from different phases. As the authors state, amorphous MOFs have recently gained traction in the scientific community and are not as widely studied as their crystalline counterparts. For this reason, the reader may be unfamiliar with what amorphous ZIFs may look like, and I think a few representative images could be useful for the sake of the uninitiated reader.

==> We have added an atomistic visualization of a ZIF-4 glass system as a new figure (Figure 2).

2. On Page 9, the authors state “Furthermore, we see that sampling the data set with the flat histogram method does not appear to provide more accuracy when tested against a large random sample. However, as observed in the training curves (Figure 2c), it does appear to learn faster.” I don’t know that I would necessarily make a point that it learns faster, assuming I understand this description correctly. The first reason is the obvious one: the initial MAE in the energy prediction for `flathist` is much worse than `random`, so it inherently has a lot more room to improve compared to the `random` sampling approach. In addition, because the data binning was done on the entire dataset (including the testing dataset), there is potentially some slight data leakage that could influence the learning curve in early epochs. Just like how it is not advisable to normalize or scale the entire dataset (i.e. it should only be done on the training set) [1], a similar argument could potentially be made for binning. That said, I don't think this will influence the results or conclusions in this work, but it's perhaps something to confirm within the context of Figure 2c. [1] https://doi.org/10.1021/acs.chemmater.0c01907

==> Reviewer #2 raised a similar concern and we have reworded this to make it clear we’re not claiming a generic advantage based on this single study. We agree that there may be slight data leakage in the procedure followed, but it is marginal for our purposes here: we applied the binning process to generate the train and test datasets separately, which avoids the obvious sources of data leakage. We intend in future work to study random sampling vs. flat-histogram methods more systematically, considering various approaches to test for data leakage.
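
To make the leakage-free variant described above concrete, here is a minimal sketch in Python of binning applied only after the train/test split; the file name energies.dat, the helper flat_histogram_indices, and the split sizes are illustrative assumptions, not taken from the authors' actual code.

```python
import numpy as np

def flat_histogram_indices(values, n_bins, n_per_bin, rng):
    """Pick indices into `values` so the selection is spread evenly across energy bins."""
    edges = np.histogram_bin_edges(values, bins=n_bins)
    chosen = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = np.where((values >= lo) & (values < hi))[0]
        if len(in_bin) > 0:
            take = min(n_per_bin, len(in_bin))
            chosen.extend(rng.choice(in_bin, size=take, replace=False))
    return np.array(chosen)

rng = np.random.default_rng(0)
energies = np.loadtxt("energies.dat")  # one energy per AIMD snapshot (illustrative file)

# 1. Split the snapshot indices into train and test pools first.
idx = rng.permutation(len(energies))
n_train = int(0.9 * len(energies))
train_pool, test_pool = idx[:n_train], idx[n_train:]

# 2. Only then derive the flat-histogram bins, using the training pool alone,
#    so that no statistic of the held-out data influences the selection.
train_sel = train_pool[flat_histogram_indices(energies[train_pool],
                                              n_bins=50, n_per_bin=20, rng=rng)]
```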

3. The caption to Table 2 is likely to be confusing to many readers, as it's unclear at a quick glance whether "... with the NequIP MLP" is referring to just "MLP glasses" or "ab initio and MLP glasses." I can imagine some confusion about whether "ab initio glass" is from AIMD simulations or the ab initio-derived glass that is modeled with the MLP. This distinction could be made clearer.

==> We have reworded to make things clearer:

"Bulk modulus K and density at zero pressure ρ0, calculated with the finite strain difference method and MD simulations relying on the NequIP MLP, for three different systems: the ZIF-4 crystal, ab initio and MLP glasses models."

4. Please report the version of CP2K used in this work.

==> We have added the version number in the Methods section, on page 6.


Reviewer 2

In this work, the authors trained a Neural Equivariant Interatomic Potential (NequIP) for molecular dynamics simulation of ZIF-4, a complex metal-organic framework system. The machine-learning interatomic potential (MLIP) is trained from snapshots of an ab initio MD (aiMD) trajectory, where both flat-histogram and random sampling were compared for training set selection. The structural features of crystalline and amorphous ZIF-4 from MLIP and aiMD simulations, such as RDF, are compared to validate the model.

The manuscript meets the standards of soundness, clarity, and reproducibility. All information, including the ab initio dataset, software, and models, is publicly available on various repositories.

I have a few minor comments as follows.

1. The train-test splitting is not clearly described, especially how data leakage is avoided. For example, are the train and test sets chosen from different trajectories to avoid closeness/correlation?

==> We have expanded the description of the train/test splitting on Page 8 to make it clearer. We also added a comment on the question of correlation: our AIMD simulations are much larger than “typical” and the configurations are well spaced in time, so correlation between configurations is not a concern here.
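
As a generic illustration of this kind of splitting (the total frame count, stride, and split ratio below are hypothetical values chosen for the example, not the settings used in this work), one could select well-spaced frames before performing a random split:

```python
import numpy as np

rng = np.random.default_rng(42)

n_frames = 100_000   # total snapshots written by the AIMD run (hypothetical)
stride = 100         # keep every 100th frame so retained configurations are well separated in time
kept = np.arange(0, n_frames, stride)

# Random split of the already well-spaced frames into train and test sets.
perm = rng.permutation(len(kept))
n_train = int(0.9 * len(kept))
train_frames = kept[perm[:n_train]]
test_frames = kept[perm[n_train:]]
```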

2. The difference between flathist and random sampling is confusing. The flathist error in force is clearly lower than that of random sampling. However, the claim that flathist “does appear to learn faster” is not solidly supported by the single training line in Figure 2c. Cross-validation may be needed to support this statement.

==> We agree with the reviewer that our statement was overly broad, and we do not want to claim a generic advantage based on this single study. We have made it clearer on Page 13 that the difference is marginal and may hold only for this specific data set. We do intend to pursue more systematically the difference in learning (and data efficiency) between random sampling and flat-histogram methods in future work.


Reviewer 3

The manuscript by Castel et al. describes a comprehensive analysis of machine learning interatomic potentials using neural network potentials for amorphous ZIFs. The authors compare both NequIP and Allegro, which are popular “off-the-shelf” tools for training such models, but focus on the critical aspect of generating training data and evaluating the physicochemical phenomena that emerge from these models, which has great value to the broader community. The manuscript is well-organized, clearly written, and contains much insight for others in the field. I recommend this article be published in Digital Discovery subject to the minor comments I have below.

Considering the value of the models developed in the field of amorphous ZIFs, do the authors plan to make the training data or models available?

==> We have already made both the training data and models available. The training data are published on Zenodo at https://doi.org/10.5281/zenodo.10015594, and the models on GitHub at https://github.com/fxcoudert/citable-data/tree/master/preprint-Castel. The links are given in the Data availability statement, as well as in the Methods section.

This is more of a curiosity, but I am interested in how local representations of bonded interactions can reproduce mechanical properties such as the bulk modulus which implicitly depend on long-range interactions. It’s reassuring to see the comparison with AIMD data, though this is certainly a space where experimental data are necessary (though they have their own complexities).

==> We agree with the reviewer that experimental data would be very welcome as a comparison point. However, as they note, there is very little available for now, and what is available comes with large complications — usually from nanoindentation, which is quite difficult to relate to bulk properties as typically measured in simulations. We have added a comment on that on Page 20.

Figure 6, bottom panel: it would be helpful to include a legend for this plot.

==> We have added a legend to the bottom panel.

Figure 8: the text is labeled “abinitio”.

==> We have fixed this typo.




Round 2

Revised manuscript submitted on 03 Jan 2024
 

06-Jan-2024

Dear Dr Coudert:

Manuscript ID: DD-ART-12-2023-000236.R1
TITLE: Machine Learning Interatomic Potentials for Amorphous Zeolitic Imidazolate Frameworks

Thank you for submitting your revised manuscript to Digital Discovery. I am pleased to accept your manuscript for publication in its current form. I have copied any final comments from the reviewer(s) below.

You will shortly receive a separate email from us requesting you to submit a licence to publish for your article, so that we can proceed with the preparation and publication of your manuscript.

You can highlight your article and the work of your group on the back cover of Digital Discovery. If you are interested in this opportunity please contact the editorial office for more information.

Promote your research, accelerate its impact – find out more about our article promotion services here: https://rsc.li/promoteyourresearch.

If you would like us to promote your article on our Twitter account @digital_rsc please fill out this form: https://form.jotform.com/213544038469056.

We are offering all corresponding authors on publications in gold open access RSC journals who are not already members of the Royal Society of Chemistry one year’s Affiliate membership. If you would like to find out more please email membership@rsc.org, including the promo code OA100 in your message. Learn all about our member benefits at https://www.rsc.org/membership-and-community/join/#benefit

By publishing your article in Digital Discovery, you are supporting the Royal Society of Chemistry to help the chemical science community make the world a better place.

With best wishes,

Dr Joshua Schrier
Associate Editor, Digital Discovery


 
Reviewer 3

The authors have addressed my remaining comments.

Reviewer 1

The authors have addressed my comments, and I am happy to recommend this manuscript for publication in Digital Discovery. Thank you to the authors for your time in addressing my feedback and for the opportunity to provide input on this very nice work.

Reviewer 2

The authors have addressed my concerns. I recommend its publication in Digital Discovery.




Transparent peer review

To support increased transparency, we offer authors the option to publish the peer review history alongside their article. Reviewers are anonymous unless they choose to sign their report.

We are currently unable to show comments or responses that were provided as attachments. If the peer review history indicates that attachments are available, or if you find there is review content missing, you can request the full review record from our Publishing customer services team at RSC1@rsc.org.

Find out more about our transparent peer review policy.

Content on this page is licensed under a Creative Commons Attribution 4.0 International license.