From the journal Digital Discovery

Peer review history

Neural network potentials for reactive chemistry: CASPT2 quality potential energy surfaces for bond breaking

Round 1

Manuscript submitted on 27 Mar 2023
 

11-May-2023

Dear Dr Goodpaster:

Manuscript ID: DD-ART-03-2023-000051
TITLE: Neural Network Potentials for Reactive Chemistry: CASPT2 Quality Potential Energy Surfaces for Bond Breaking

Thank you for your submission to Digital Discovery, published by the Royal Society of Chemistry. I sent your manuscript to reviewers and I have now received their reports which are copied below.

I have carefully evaluated your manuscript and the reviewers’ reports, and the reports indicate that major revisions may be necessary.

Please submit a revised manuscript which addresses all of the reviewers’ comments. Further peer review of your revised manuscript may be needed. When you submit your revised manuscript please include a point by point response to the reviewers’ comments and highlight the changes you have made. Full details of the files you need to submit are listed at the end of this email.

Digital Discovery strongly encourages authors of research articles to include an ‘Author contributions’ section in their manuscript, for publication in the final article. This should appear immediately above the ‘Conflict of interest’ and ‘Acknowledgement’ sections. I strongly recommend you use CRediT (the Contributor Roles Taxonomy, https://credit.niso.org/) for standardised contribution descriptions. All authors should have agreed to their individual contributions ahead of submission and these should accurately reflect contributions to the work. Please refer to our general author guidelines https://www.rsc.org/journals-books-databases/author-and-reviewer-hub/authors-information/responsibilities/ for more information.

Please submit your revised manuscript as soon as possible using this link:

*** PLEASE NOTE: This is a two-step process. After clicking on the link, you will be directed to a webpage to confirm. ***

https://mc.manuscriptcentral.com/dd?link_removed

(This link goes straight to your account, without the need to log on to the system. For your account security you should not share this link with others.)

Alternatively, you can login to your account (https://mc.manuscriptcentral.com/dd) where you will need your case-sensitive USER ID and password.

You should submit your revised manuscript as soon as possible; please note you will receive a series of automatic reminders. If your revisions will take a significant length of time, please contact me. If I do not hear from you, I may withdraw your manuscript from consideration and you will have to resubmit. Any resubmission will receive a new submission date.

The Royal Society of Chemistry requires all submitting authors to provide their ORCID iD when they submit a revised manuscript. This is quick and easy to do as part of the revised manuscript submission process. We will publish this information with the article, and you may choose to have your ORCID record updated automatically with details of the publication.

Please also encourage your co-authors to sign up for their own ORCID account and associate it with their account on our manuscript submission system. For further information see: https://www.rsc.org/journals-books-databases/journal-authors-reviewers/processes-policies/#attribution-id

I look forward to receiving your revised manuscript.

Yours sincerely,
Linda Hung
Associate Editor
Digital Discovery
Royal Society of Chemistry

************


 
Reviewer 1

see attached

Reviewer 2

The authors present an accurate and efficient neural network method for reproducing specific C-C bond-breaking potentials of alkanes with up to 8 carbons at CASPT2-level accuracy, using an existing large DFT training dataset but very little CASPT2 training data. Overall, this is an interesting contribution and will advance the development of neural network reactive potentials in this field. I have one major concern about their findings.

As shown in Fig. 4, the C-C bond-breaking potential curves for C4 to C6 and C8 are basically similar. I suggest the authors plot them together in one figure; I suspect the three curves would overlap substantially. If that is the case, this is not a strong demonstration of the method's power. Could the authors discuss this issue carefully?

Reviewer 3

Dear Authors,
I read your manuscript with great interest. I have been assigned to review the data and code for the paper.
The code used for training the neural network potentials refers to a previously published work. The Readme.md file looks well organized and contains clear instructions. The GitHub codebase has received considerable attention from the community and has active issues and PRs. Overall, from the perspective of the current submission, no further review of the code is required.

Regarding the data, I have some observations:
1. The data is hosted on the GitHub repo owned by the researchers. I would suggest sharing the data via a FAIR repository such as Zenodo with a DOI for the dataset.
2. The datasets should contain more information about the methods used to generate them. For example, the alkane-dataset has data from DFT calculations, but the repo does not record the method/software used to generate the data as metadata. I strongly suggest providing a sample input file for each type of calculation. For example, if you performed DFT-based geometry optimizations of, say, 1000 complexes, provide one input file that details all the keywords used for the geometry optimization, including the software/version used as comments in the input file. Alternatively, a well-structured metadata file can be added.


 

We thank the reviewers for their careful reading of our manuscript. We have made the requested changes and additions, and we now believe the manuscript is ready for publication.

Reviewer 1

Comment: The authors do a reasonably good job of aiming to release the code and dataset. However, the specific methodology used to perform automated CASPT2 simulations with Molpro, which must exist as code in some form, is not released. Since this is a significant draw of the paper, it should be released on GitHub along with the dataset.

Response: We thank the reviewer for the thoughtful comments. Sample Molpro code for the CASPT2 calculations has now been added to the GitHub repository.

Comment: The biggest issue with this paper is that forces were not retained in the dataset for either the DFT or CASPT2 data. TorchANI, and indeed all modern neural networks, support learning from both total energies and atomic forces. The fact that this capability is not used in these models is a significant downside of this work. Rebuilding the entire dataset with forces is likely not possible or necessarily worthwhile, but future efforts to make more general CASPT2-quality potentials should retain force data in their training datasets. The fact that force data was not captured in either the DFT or CASPT2 data needs to be clearly stated in this paper.

Response: We thank the reviewer for the thoughtful comments. It is true that we did not include forces in this work, even though ANI can be trained with forces. This work started soon after ANI was initially published, before forces were included in the published ANI dataset. However, we did generate force data for part of our dataset and trained some NNPs with forces. Unfortunately, we did not observe a significant improvement in bond dissociation predictions at the DFT level when forces were included in training, and generating force data at the CASPT2 level would be even more expensive; we therefore kept the work as it is, without force training.
Regardless, as requested by the reviewer, we have added the following text to our paper clearly stating the lack of force data in the training. We have also added the force data we generated to our GitHub repository.

The ANI-1CH database only contains DFT energies; force data were not available prior to this work. Therefore, all NNPs discussed in this work were trained only on energies.

Comment: References to state-of-the-art NNs are a bit out of date. I would also suggest citing SchNet, NequIP, e3nn, and TensorMol. Honestly, the list of MLIAPs is pretty long at this point, so trying to cover everything is nearly impossible.

Response: We thank the reviewer for the thoughtful comments. The suggested references are now included in our paper. The following text was added in response to this comment.

While there are several examples of highly accurate system-specific NNPs, and several well-developed interatomic NNP models such as TensorMol, SchNet, NequIP, TorchMD-NET and e3nn, to name a few, transferring models trained on small systems to large systems for bond-breaking processes remains a greater challenge.


Comment: Figure 3 would probably work better on a semi-log scale, since the plot past 4 on the ‘maximum number of carbons in training data’ axis is basically unreadable.

Response: We thank the reviewer for the thoughtful comments. We have modified the figure; it is now on a log scale.

Reviewer 2

Comment: As shown in Fig. 4, the C-C bond-breaking potential curves for C4 to C6 and C8 are basically similar. I suggest the authors plot them together in one figure; I suspect the three curves would overlap substantially. If that is the case, this is not a strong demonstration of the method's power. Could the authors discuss this issue carefully?

Response: We thank the reviewer for the thoughtful comments. The C4 to C6 and C8 bond dissociation potential curves are indeed very similar. The largest gap between the dissociation curves for any of the linear alkanes in our work at the CASPT2 reference level is less than 3 kcal/mol. However, our NNP prediction error for all of them is less than 1 kcal/mol. Even where the reference curves differ by less than 1 kcal/mol, the NNP predicts the relative energy gap between the curves very accurately. For example, between C8(34) and C8(45), the CASPT2 reference shows a gap of 0.19 kcal/mol with C8(34) slightly higher, and the NNP prediction shows a gap of 0.25 kcal/mol, also with C8(34) slightly higher.
In response to the comments, we have added two columns to Table 2 showing the CASPT2 reference and NNP-predicted dissociation energies, added a figure to the SI plotting the C4 and C6 curves together, and added the following text to the discussion.


As shown in Table 2, the NNP prediction E_D(NNP) has an error of less than 1 kcal/mol for every dissociation. The dissociation energy in these alkanes, E_D(CASPT2), is between 91 and 94 kcal/mol, with the largest gap between the dissociation curves being about 3 kcal/mol. The highest and lowest bond dissociation energies are those of C2 and C4(23), which suggests that one reason for the high accuracy obtained when training on C2-C4 and predicting up to C8 is that the NNP only has to interpolate to energies within the range of the CASPT2 data set.
The NNP predictions have even greater accuracy for relative energies. For example, between C8(34) and C8(45), the CASPT2 reference shows a gap of 0.19 kcal/mol with C8(34) slightly higher, and the NNP prediction shows a gap of 0.25 kcal/mol, also with C8(34) slightly higher. The relative energies also show that knowledge was transferred from the DFT-trained NNP to the CASPT2-trained NNP. The only carbon-carbon bond dissociation between two secondary carbons in the CASPT2 training set is C4(23). The reference CASPT2 data give a relative energy difference of 1.08 kcal/mol between the C4(23) and C6(34) dissociation energies, compared to 1.16 kcal/mol for the CASPT2-trained NNP. Thus, even though the C6(34) bond dissociation is included only in the DFT data, the CASPT2-trained network correctly predicts this energy owing to transfer learning. While we demonstrate only modest transferability here, a larger study of the transferability of this network to non-alkanes is the subject of our future work.
We conclude that once an NNP is sufficiently trained with DFT data for a specific system, only a very small amount of CASPT2 data is needed to achieve a high-level energy correction. These results show that, when targeting a specific system, only a few thousand geometries at the higher level of theory are needed to retrain a DFT-level NNP to the CASPT2 level of theory via transfer learning.

Reviewer 3

Comment: The data is hosted on the GitHub repo owned by the researchers. I would suggest sharing the data via a FAIR repository such as Zenodo with a DOI for the dataset.

Response: We thank the reviewer for the thoughtful comments. The repository is now available on Zenodo under DOI 10.5281/zenodo.7983019.

Comment: The datasets should contain more information about the methods used to generate them. For example, the alkane-dataset has data from DFT calculations, but the repo does not record the method/software used to generate the data as metadata. I strongly suggest providing a sample input file for each type of calculation. For example, if you performed DFT-based geometry optimizations of, say, 1000 complexes, provide one input file that details all the keywords used for the geometry optimization, including the software/version used as comments in the input file. Alternatively, a well-structured metadata file can be added.

Response: We thank the reviewer for the thoughtful comments. Sample Gaussian code for the DFT calculations and sample Molpro code for the CASPT2 calculations have now been added to the GitHub repository and to Zenodo.




Round 2

Revised manuscript submitted on 01 Jun 2023
 

22-Jun-2023

Dear Dr Goodpaster:

Manuscript ID: DD-ART-03-2023-000051.R1
TITLE: Neural Network Potentials for Reactive Chemistry: CASPT2 Quality Potential Energy Surfaces for Bond Breaking

Thank you for submitting your revised manuscript to Digital Discovery. I am pleased to accept your manuscript for publication in its current form. I have copied any final comments from the reviewer(s) below.

You will shortly receive a separate email from us requesting you to submit a licence to publish for your article, so that we can proceed with the preparation and publication of your manuscript.

You can highlight your article and the work of your group on the back cover of Digital Discovery. If you are interested in this opportunity please contact the editorial office for more information.

Promote your research, accelerate its impact – find out more about our article promotion services here: https://rsc.li/promoteyourresearch.

If you would like us to promote your article on our Twitter account @digital_rsc please fill out this form: https://form.jotform.com/213544038469056.

We are offering all corresponding authors on publications in gold open access RSC journals who are not already members of the Royal Society of Chemistry one year’s Affiliate membership. If you would like to find out more please email membership@rsc.org, including the promo code OA100 in your message. Learn all about our member benefits at https://www.rsc.org/membership-and-community/join/#benefit

By publishing your article in Digital Discovery, you are supporting the Royal Society of Chemistry to help the chemical science community make the world a better place.

With best wishes,

Linda Hung
Associate Editor
Digital Discovery
Royal Society of Chemistry


 
Reviewer 2

I am satisfied with the changes and replies made by the authors.

Reviewer 1

The authors have sufficiently addressed all of my comments. Further, in emphasizing that the NN potential is able to differentiate between alkane dissociation energies, which requires an accuracy of 1 part in 100, the authors have strengthened their manuscript. In fact, they show that these NN potentials are possibly able to reach accuracies of up to 1 part in 1000, which should be sufficient for accurate modeling of reactive dynamics. This is an important illustration in the field of NN potentials.
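The "1 part in 100" and "1 part in 1000" figures can be checked with simple arithmetic on the numbers quoted in the authors' Round 1 response: dissociation energies of 91-94 kcal/mol, an NNP error below 1 kcal/mol, and CASPT2 versus NNP gaps of 0.19 vs 0.25 kcal/mol (C8(34)/C8(45)) and 1.08 vs 1.16 kcal/mol (C4(23)/C6(34)). The sketch below is a back-of-envelope check only, not part of the authors' code; it assumes the 91 kcal/mol lower bound as the reference scale, and the names `parts_in` and `gap_errors` are illustrative.

```python
"""Back-of-envelope check of the '1 part in 100' and '1 part in 1000' figures.

All numbers are copied from the authors' Round 1 response (kcal/mol).
The reference scale is an assumption: 91 kcal/mol is the lower end of the
quoted 91-94 kcal/mol range, which gives the most conservative ratio.
"""

REFERENCE_SCALE = 91.0  # assumed scale: lowest quoted E_D(CASPT2)
MAX_ABS_ERROR = 1.0     # quoted upper bound on the NNP error in E_D

# (CASPT2 gap, NNP gap) for the two relative-energy comparisons quoted.
RELATIVE_GAPS = {
    "C8(34) vs C8(45)": (0.19, 0.25),
    "C4(23) vs C6(34)": (1.08, 1.16),
}


def parts_in(error: float, scale: float = REFERENCE_SCALE) -> float:
    """Return N such that error / scale equals 1/N ('1 part in N')."""
    return scale / error


def gap_errors() -> dict:
    """Absolute difference between the NNP and CASPT2 gaps, per comparison."""
    return {label: abs(nnp - ref) for label, (ref, nnp) in RELATIVE_GAPS.items()}


if __name__ == "__main__":
    print(f"Absolute E_D error: about 1 part in {parts_in(MAX_ABS_ERROR):.0f}")
    for label, err in gap_errors().items():
        print(f"{label}: gap error {err:.2f} kcal/mol, "
              f"about 1 part in {parts_in(err):.0f}")
```

With these inputs, the absolute dissociation-energy error corresponds to roughly 1 part in 100, while the relative-gap errors (0.06 and 0.08 kcal/mol) correspond to roughly 1 part in 1100 to 1500, consistent with the reviewer's characterization.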

Reviewer 3

Thank you for addressing the concerns! Best regards




Transparent peer review

To support increased transparency, we offer authors the option to publish the peer review history alongside their article. Reviewers are anonymous unless they choose to sign their report.

We are currently unable to show comments or responses that were provided as attachments. If the peer review history indicates that attachments are available, or if you find there is review content missing, you can request the full review record from our Publishing customer services team at RSC1@rsc.org.

Find out more about our transparent peer review policy.

Content on this page is licensed under a Creative Commons Attribution 4.0 International license.