From the journal Digital Discovery Peer review history

Long-range dispersion-inclusive machine learning potentials for structure search and optimization of hybrid organic–inorganic interfaces

Round 1

Manuscript submitted on 25 Feb 2022
 

26-Apr-2022

Dear Dr Maurer:

Manuscript ID: DD-ART-02-2022-000016
TITLE: Long-range dispersion-inclusive machine learning potentials for structure search and optimization of hybrid organic-inorganic interfaces

Thank you for your submission to Digital Discovery, published by the Royal Society of Chemistry. I sent your manuscript to reviewers and I have now received their reports which are copied below.

After careful evaluation of your manuscript and the reviewers’ reports, I will be pleased to accept your manuscript for publication after revisions.

Please revise your manuscript to fully address the reviewers’ comments. When you submit your revised manuscript please include a point by point response to the reviewers’ comments and highlight the changes you have made. Full details of the files you need to submit are listed at the end of this email.

Digital Discovery strongly encourages authors of research articles to include an ‘Author contributions’ section in their manuscript, for publication in the final article. This should appear immediately above the ‘Conflict of interest’ and ‘Acknowledgement’ sections. I strongly recommend you use CRediT (the Contributor Roles Taxonomy from CASRAI, https://casrai.org/credit/) for standardised contribution descriptions. All authors should have agreed to their individual contributions ahead of submission and these should accurately reflect contributions to the work. Please refer to our general author guidelines https://www.rsc.org/journals-books-databases/author-and-reviewer-hub/authors-information/responsibilities/ for more information.

Please submit your revised manuscript as soon as possible using this link :

*** PLEASE NOTE: This is a two-step process. After clicking on the link, you will be directed to a webpage to confirm. ***

https://mc.manuscriptcentral.com/dd?link_removed

(This link goes straight to your account, without the need to log in to the system. For your account security you should not share this link with others.)

Alternatively, you can login to your account (https://mc.manuscriptcentral.com/dd) where you will need your case-sensitive USER ID and password.

You should submit your revised manuscript as soon as possible; please note you will receive a series of automatic reminders. If your revisions will take a significant length of time, please contact me. If I do not hear from you, I may withdraw your manuscript from consideration and you will have to resubmit. Any resubmission will receive a new submission date.

The Royal Society of Chemistry requires all submitting authors to provide their ORCID iD when they submit a revised manuscript. This is quick and easy to do as part of the revised manuscript submission process.   We will publish this information with the article, and you may choose to have your ORCID record updated automatically with details of the publication.

Please also encourage your co-authors to sign up for their own ORCID account and associate it with their account on our manuscript submission system. For further information see: https://www.rsc.org/journals-books-databases/journal-authors-reviewers/processes-policies/#attribution-id

I look forward to receiving your revised manuscript.

Yours sincerely,
Linda Hung
Associate Editor
Digital Discovery
Royal Society of Chemistry

************


 
Reviewer 1

This paper proposes to combine a short-ranged interatomic ML potential and a long-range dispersion correction based on atoms-in-molecules partitioning. The reference data for the local models was obtained by subtracting a van der Waals correction term from the energy and forces obtained with DFT. The geometries themselves were taken from existing repositories, using a multi-step active sampling approach that relies on empirical variance estimates generated from multiple independently trained models. This scheme is applied in a global structure search task as well as a pre-relaxation task.

The novel aspect of this work is the separation of local and global energy/force contributions into independent models based on the observation that current message passing neural network models (such as the SchNet architecture used here) have trouble describing long-range interactions. While this general modeling idea is not new (e.g. see [1]), this work demonstrates different applications. All numerical experiments are documented in a detailed manner and the discussion of the results puts great emphasis on the practicality of this approach in terms of computational efficiency, which is a crucial consideration. The proposed approach reaches a break-even point in terms of computational efficiency fairly early, compared to a traditional DFT-based approach.

A minor concern that remains pertains to the robustness of the approach: there seems to be an issue with keeping the optimization contained within the training regime of the model. Several heuristics aim to robustify the procedure, but they rely on somewhat tedious-to-define hyper-parameters (adaptive threshold on the predicted energy variance, limit on the number of steps with rising variance, etc.). It appears that this could limit the applicability to some extent, at least for inexperienced practitioners.

I also think that this manuscript would benefit from a more extensive review of other published solutions to the problem of separating interaction scales within empirical ML models. This would put the specific modeling decisions made in this paper into a much needed broader context.

Overall, the paper is well-written and easy to follow. The limited novelty of the proposed approach is offset by the care that has been put into explaining the experimental design.

[1] Unke, Oliver T., et al. "SpookyNet: Learning force fields with electronic degrees of freedom and nonlocal effects." Nature communications 12.1 (2021): 1-14.

Reviewer 2

I was asked to contribute a "Data review" for the current manuscript. I should mention that my expertise lies somewhat far from ML in computational chemistry - I am more focused on benchmarking of gas phase structures and in reproducible heterogeneous catalysis.

With the limitations of my review out of the way, first let me comment on the article in general. I think it's an impressive work, and as someone using DFT of solids routinely, it's great to see development of methods that focus on geometries and not only on energies. A couple of minor points below:

I see the authors are using the reference implementation of MBD. How about TS - is that also part of the MBD library, or is it part of ASE, or did that have to be implemented independently? Could the authors comment on the steps that would be required to interface their code with other "dispersion corrections", such as XDM or with Grimme's dftd3 or dftd4 utilities?

The final paragraph at the end of the introduction should be rewritten to be a little clearer about what's new in the current work and what has been published previously.

I like the use of existing data in the Ag study. It shows that publishing raw data from comp. chem. studies is worthwhile.

The authors mention that the C atoms are frozen in all calculations. How about the Ag atoms? The top 3 layers of Ag are mentioned to be allowed to relax for the 16 B2O structures, but how about the rest?

The authors mention in the SI that the initial ML models struggle with describing Au@C for Au clusters that are not explicitly used in the training set. What is the performance of the model on these "non-trained" cluster sizes after adaptation? I believe this ought to be mentioned in the body of the manuscript as opposed to the SI.

As for the data review, I have combined the checklist and highlighted a few issues below, which, in my view, warrant a revision. I am happy to have a look at the manuscript again:

1. Data sources:
The input data (both Au@C and X2O@Ag) is made available via NOMAD. As DOIs to data archives are provided, reporting access date / version number is unnecessary. The data is well discussed in the manuscript.

2. Data cleaning:
The way I understand the archives provided, the two software packages "SchNet_EV-AuC" (via figshare) and "SchNet-vdW" (via github) can be used to run the already-generated ML models as part of ASE. However, while the data cleaning steps are described in text, even with the above software archives, it would take me a significant amount of time to get started with reproducing the ML training process. Perhaps a step-by-step how-to would be appropriate here.

Source of the used data is clearly documented, and it is my understanding that apart from a training/validation/testing split, no data from the underlying datasets has been removed.

3. Data representations
The authors provide a set of npz files (which I assume to be zipped np.ndarray pickles). I have tried opening these accordingly, but in some cases I couldn't get any data out ("Fig3a_AuC.npz"). In some cases only a part of the data shown in the figures is provided - e.g. "Fig4.npz" only contains the data from plot (d). The authors certainly could, and in my view should, be a little more thorough in teaching the reader/user on how to use their software and how to arrive at the figures in the paper.

4. Model choice
I am not knowledgeable enough in ML development to judge the applied methods. However, as a potential user, I am not sure the authors include all necessary code to reproduce the MLs generated in this work, or to create new MLs using different sets of input data. The way I read the documentation, this would in principle be the job of the SchNetPack library with which the authors code interfaces.

5. Model training and validation
I mentioned above that it is unclear how the input data was sorted between training, validation, and testing sets. The issues are discussed both in the manuscript and SI, but the documentation of the code where this happens could be clearer. For instance, out of the 5368 data points in Au@C, how were the 4500 (training) and 500 (validation) points selected? The manuscript mentions that the selection was random - but was it randomised across the whole dataset, or did they split the data by cluster size first?

7. Code and reproducibility
The "SchNet-vdW" code is available in a public repository (github); it is version-tracked, clearly licensed, and provides installation instructions. It also has a test suite, which is however not plugged into any publicly visible CI. On the other hand, not much documentation of the software is provided, except a couple of docstrings, and there is no tagged version. In my view, the version submitted to the journal ought to be tagged, and ideally available under its own DOI (on figshare or zenodo).

The "SchNet_EV-AuC" code is only available as a zipped folder within the archive of files for review of this manuscript. It contains a license, and a standard installation script, but not much in the way of documentation - I could only find a very short README which describes the version of this archive, and two jupyter notebooks in the scripts folder which may be a useful starting point if the authors choose to provide instructions on how to use their codes.

No plotting routines are provided. The data presented in the figures in the manuscript is, if provided, not linked to a workflow.

Summary:
In summary, two items would help this manuscript tremendously - a "howto" on fitting DFT data to obtain the energy/force as well as Hirshfeld charge models, and a "sample" use case on calculating energies. The latter is currently present in the tests folder, but the reader has to look for it and set up an environment themselves.

Best wishes,
Peter Kraus

Reviewer 3

In this paper by Westermayr et al., the authors work on establishing effective methods for modelling surface nanostructures. They explore a combined approach that uses state-of-the-art machine-learning interatomic potentials for short-range interactions while developing separate ML methods to predict key parameters of various methods in the Tkatchenko et al. family of vdW corrections to DFT.

The paper is highly detailed and would provide valuable guidance to researchers attempting to model similar structures using MLIP. In particular, the exploration across material space is interesting, though the performance is quite modest. In many cases, it is the energy difference between different sites that is the most important. I would suggest the authors complement Table 1 with such numbers, for instance for benzene on Ag(111). It would provide valuable input (or warnings) to potential users.

One thing I really find missing in the paper is a discussion on whether the approach here is useful or at all relevant for the DFT-D and vdW-DF methods. Could one bypass the need for long-range training with DFT-D? If so, the readers should know this, but maybe also discuss whether DFT-D has had any success in modeling this class of systems. vdW-DFs (not the standard vdW-DF1 and DF2 variants, though) have been quite successful in describing surface nanostructures similar to the ones modeled here. The authors must have missed this in their literature review.
But maybe the approach developed here is limited to the Tkatchenko-type methods? If so, this should be stated.

In the same vein, some discussion on long-range electrostatic effects in the conclusions would be welcome. I understand this method does not account for this? If so, how do the authors suggest we should proceed with accounting for this? Noting that this requires more work in the conclusions should be fine. Or are they not important for surface nanostructures?

One thing I find confusing about this paper is the low computational cost of ML+MBD. It implies that MBD is less costly than DFT. Is that really the case? If so, I do not understand why the DFT-TS or DFT-TS_surf method has not been fully replaced by DFT-MBD. There is probably something I am missing here. Could the authors elucidate?

Minor notes:
Calling the DFT+TS method "the vdW method" is poor nomenclature. There are many "vdW methods".
In 2.1, the authors write “f is a damping function to avoid double counting of short-range contributions”. This is misleading.
First, there are no formal mechanisms in place to avoid double counting.
Second, it implies that C_6/r⁶ correctly describes short-range interactions. It does not; rather it is inaccurate and eventually diverges. The damping function serves primarily to mimic short-ranged vdW-interactions in such a manner that it is a good match to the specific exchange-correlation chosen.
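The role of the damping function can be illustrated numerically. The sketch below uses a Fermi-type damping form of the kind common in TS-style pairwise corrections; the function names and all parameter values are purely illustrative assumptions, not taken from the manuscript:

```python
import numpy as np

def fermi_damping(r, r_vdw_sum, s_r=0.94, d=20.0):
    """Fermi-type damping f(r): ~0 at short range (switching the
    -C6/r^6 term off before it diverges) and ~1 at long range.
    s_r and d are illustrative values only."""
    return 1.0 / (1.0 + np.exp(-d * (r / (s_r * r_vdw_sum) - 1.0)))

def damped_dispersion(r, c6, r_vdw_sum):
    """Damped pairwise dispersion energy -f(r) * C6 / r^6."""
    return -fermi_damping(r, r_vdw_sum) * c6 / r**6

# At short separations the damping suppresses the diverging -C6/r^6 term,
# while at large separations the bare asymptote is recovered:
r = np.array([1.0, 3.0, 6.0])                     # angstrom, illustrative
e = damped_dispersion(r, c6=20.0, r_vdw_sum=3.4)  # eV*angstrom^6, illustrative
```

As the reviewer notes, this form does not formally prevent double counting; the damping region is simply tuned so the correction matches the chosen exchange-correlation functional at short range.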

These comments are minor and I certainly recommend this paper for publication.
However, my background is from the DFT side and I am unable to provide detailed comments on the MLIP approach beyond the fact that it seems sensible.


 

Dear Prof. Hung,

Please find our reply to the referees and the revised manuscript in this resubmission.

Kind regards,
Reinhard Maurer

This text has been copied from the PDF response to reviewers and does not include any figures, images or special characters:

Dear Professor Hung,
Thank you for your effort and for considering our work as potentially publishable after revision. We feel that all reviewer opinions have helped us to substantially improve the manuscript.
We have made comprehensive corrections to the manuscript based on the comments and supply a point-by-point response to the reviewers. We also attach a version of the updated manuscript and supporting information with highlighted text changes.
We hope that the revised version is acceptable for publication in Digital Discovery.
Kind regards,
Julia Westermayr and Reinhard Maurer on behalf of all authors
************
REVIEWER REPORT(S):
Referee: 1

Comment 1
Comments to the Author “This paper proposes to combine a short-ranged interatomic ML potential and a long-range dispersion correction based on atoms-in-molecules partitioning. The reference data for the local models was obtained by subtracting a van der Waals correction term from the energy and forces obtained with DFT. The geometries themselves were taken from existing repositories, using a multi-step active sampling approach that relies on empirical variance estimates generated from multiple independently trained models. This scheme is applied in a global structure search task as well as a pre-relaxation task. The novel aspect of this work is the separation of local and global energy/force contributions into independent models based on the observation that current message passing neural network models (such as the SchNet architecture used here) have trouble describing long-range interactions. While this general modeling idea is not new (e.g. see [1]), this work demonstrates different applications. All numerical experiments are documented in a detailed manner and the discussion of the results puts great emphasis on the practicality of this approach in terms of computational efficiency, which is a crucial consideration. The proposed approach reaches a break-even point in terms of computational efficiency fairly early, compared to a traditional DFT-based approach. A minor concern that remains pertains to the robustness of the approach: there seems to be an issue with keeping the optimization contained within the training regime of the model. Several heuristics aim to robustify the procedure, but they rely on somewhat tedious-to-define hyper-parameters (adaptive threshold on the predicted energy variance, limit on the number of steps with rising variance, etc.). It appears that this could limit the applicability to some extent, at least for inexperienced practitioners.”

Response:
We thank the reviewer for their detailed analysis of our work. During geometry optimizations and molecular dynamics simulations, there is always a risk of entering regimes for which little data was available in the original training dataset. This is particularly so for small training datasets, such as the initial dataset for the Au@C system.
We have implemented an adaptive force-threshold approach to provide a measure of when such a scenario arises. This allows us to truncate optimizations early, at which point we switch back to the reference DFT method to refine the structures further. We believe that the adaptive threshold used for structural optimizations with initially trained ML models is an important and novel aspect of our work, as it allows ML models trained with little data to be used to accelerate structural relaxations as well.
Note that the refined models for Au nanoclusters on diamond (110) obtained from adaptive sampling (e.g., MLadapt2) did not require early-stopping mechanisms during the optimizations, as the desired fmax values could always be achieved.
The parameter choice for the adaptive threshold as described in the original manuscript was not sufficiently clear. We have revised section 2.4 in the new manuscript to clarify this point. On page 5, we have added
“The fmaxinit value was found to be relatively robust and set to 0.15 eV/Å for the test studies shown in this work, but it can be set to a different value by the user to take into account the requirements of other ML models. We tested different thresholds between 0.1 and 0.2 eV/Å for initial models and found that the structures obtained were very similar and differed by less than 0.05 Å root-mean-squared deviation.”
In the next paragraph, we added:
“The results obtained when using slightly different parameters (Fig. S1) for structure optimizations of nanoclusters on surfaces and molecules on surfaces show that the parameters are robust and generally applicable to ML models trained on other types of systems.”
At the end of section 2, we added: “These additional parameters are only relevant for models that are trained on a small training set and ensure that the optimization is stopped before the training regime is left. At that point, the remaining optimizations can be carried out with the reference method.”
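The early-stopping logic described in this response can be sketched generically. Everything below — the function names, the toy two-member ensemble, and the variance-rise criterion used as a proxy for leaving the training regime — is an illustrative assumption, not the authors' SchNet-vdW implementation:

```python
import numpy as np

def optimize_with_early_stopping(step_fn, ensemble_forces, x0,
                                 fmax_init=0.15, max_rising=5, max_steps=200):
    """Toy optimization loop that stops early when ensemble disagreement
    (force variance) keeps rising, signalling the model may be leaving
    its training regime. All names/thresholds are illustrative."""
    x, rising, prev_var = x0, 0, None
    for _ in range(max_steps):
        forces = ensemble_forces(x)              # one array per ensemble member
        mean_f = np.mean(forces, axis=0)
        var = float(np.mean(np.var(forces, axis=0)))
        if np.max(np.abs(mean_f)) < fmax_init:
            return x, "converged"
        rising = rising + 1 if (prev_var is not None and var > prev_var) else 0
        if rising >= max_rising:
            return x, "stopped_early"            # hand over to the reference DFT method
        prev_var = var
        x = step_fn(x)
    return x, "max_steps"

# Toy check 1: quadratic well, perfectly agreeing two-member "ensemble"
ens = lambda x: [np.array([-2.0 * x]), np.array([-2.0 * x])]
xf, status = optimize_with_early_stopping(lambda x: 0.8 * x, ens, 1.0)

# Toy check 2: member disagreement (variance) grows each step
ens_bad = lambda x: [np.array([1.0 + x]), np.array([1.0 - x])]
_, status_bad = optimize_with_early_stopping(lambda x: 2.0 * x, ens_bad, 1.0)
```

The design point is that a well-sampled region gives low, stable ensemble variance, so the loop runs to convergence, while a steadily growing variance triggers the hand-over to the reference method.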

Comment 2
“I also think that this manuscript would benefit from a more extensive review of other published solutions to the problem of separating interaction scales within empirical ML models.
This would put the specific modeling decisions made in this paper into a much needed broader context.
Overall, the paper is well-written and easy to follow. The limited novelty of the proposed approach is offset by the care that has been put into explaining the experimental design. [1] Unke, Oliver T., et al. "SpookyNet: Learning force fields with electronic degrees of freedom and nonlocal effects." Nature communications 12.1 (2021): 1-14.”

Response:
To provide a better literature overview, we have added the following text to the introduction and included the mentioned reference in the revised manuscript:
“Earlier work by Behler and co-workers [51, 52] has also shown that the simulation of liquid water can be facilitated with neural networks trained on energies and atomic charges, where the latter were used to correct for electrostatic interactions. This scheme was later complemented with long-range dispersion interactions based on the Grimme D3 correction method [52]. Atomic charges were further used in TensorMol-0.1 [53] to augment the total energy with Coulomb and vdW corrections. A similar approach was applied by [38] in PhysNet, where the total energy was corrected with additive terms that include electrostatic corrections obtained from partial atomic charges and a DFT-D3 dispersion correction term. Recently, this description was extended in SpookyNet, where the total energy is corrected with empirical terms for the nuclear repulsion based on an analytical short-range term, a term for electrostatics, and a term for dispersion interactions [54].”

Referee: 2
Comment 1
“Comments to the Author I was asked to contribute a "Data review" for the current manuscript. I should mention that my expertise lies somewhat far from ML in computational chemistry - I am more focused on benchmarking of gas phase structures and in reproducible heterogeneous catalysis. With the limitations of my review out of the way, first let me comment on the article in general. I think it's an impressive work, and as someone using DFT of solids routinely, it's great to see development in methods which focuses on geometries and not only energies. A couple of minor points below: I see the authors are using the reference implementation of MBD. How about TS - is that also part of the MBD library, or is it part of ASE, or did that have to be implemented independently? Could the authors comment on the steps that would be required to interface their code with other "dispersion corrections", such as XDM or with Grimme's dftd3 or dftd4 utilities?”

Response
We thank the reviewer for the assessment of our work and for this interesting question, which raises a point that we indeed did not mention anywhere in the text of the original manuscript. vdW(TS), vdWsurf, and MBD are all functionalities within Libmbd, which we coupled with ASE; hence, they do not have to be implemented independently. Other methods, such as DFT-D3 and DFT-D4, can be used as well, as they are implemented in ASE via calculator interfaces. We have mentioned this in the revised version, and we also provide additional scripts (“test_ml_dftd3-4.sh” and “test_ml_d3.py”) that show a minimal example of how to use our approach in combination with D3 and D4.
The text is added to the Data Availability section on page 11:
“In addition, we include a script that shows how to replace the external vdW(TS) correction schemes [19,22] with the D3 method by Grimme et al. [63], which is also interfaced with ASE. Tutorials for training ML models, generating a data set, and making ML-based optimizations with external vdW corrections (vdW(TS), vdWsurf, MBD, DFT-D3, and DFT-D4) (Jupyter Notebooks), files to reproduce figures, test data, and additional code to run ML models are available from figshare (10.6084/m9.figshare.19134602) [91].”

Comment 2
“The final paragraph at the end of the introduction should be rewritten to be a little clearer about what's new in the current work and what has been published previously.”

Response:
As also suggested by reviewer 1, we provide a more extensive literature review in the introduction and extend the last paragraph to make clear what the current work establishes compared to previous works (see response above).

Comment 3
“I like the use of existing data in the Ag study. It shows that publishing raw data from comp. chem. studies is worthwhile. The authors mention that the C atoms are frozen in all calculations. How about the Ag atoms? The top 3 layers of Ag are mentioned to be allowed to relax for the 16 B2O structures, but how about the rest?”

Response
We thank the reviewer for pointing out this missing information. We have included the following text in the methods section: “The rest of the Ag layers are kept frozen, as in the reference.”

Comment 4
“The authors mention in the SI that the initial ML models struggle with describing Au@C for Au clusters that are not explicitly used in the training set. What is the performance of the model on these "non-trained" cluster sizes after adaptation? I believe this ought to be mentioned in the body of the manuscript as opposed to the SI.”

Response
The initial ML models struggle to describe Au@C for Au clusters that are not contained in the training set, which is why we add different cluster sizes during adaptive sampling. After the first adaptive sampling step, the error decreases by a factor of more than 100, with remaining maximum model errors of around 1 eV for known cluster sizes. The mean model variance of all optimizations during basin hopping is in the range of 0.1 eV, including errors from optimizations that left the training regime and were used for further adaptive sampling. Whenever models stayed within the training regime, the model variance is in the range of a few meV.
We have added this information to the supporting information and further mentioned it in the main text of the revised manuscript, section 2.2.1, page 4:
“Before adaptive sampling, ML models deviated by several tens of eV for cluster sizes that were not included in the training set, leading to unphysical structure relaxations. After adding additional data points, the average model variance decreased to around 0.1 eV, with maximum errors in the range of 1 eV when the training regime is left. To further increase the accuracy of the ML models, a second adaptive sampling run, MLadapt2, was executed with MLadapt1.”

Comment 5:
“As for the data review, I have combined the checklist and highlighted a few issues below, which, in my view, warrant a revision. I am happy to have a look at the manuscript again:
1. Data sources: The input data (both Au@C and X2O@Ag) is made available via NOMAD. As DOIs to data archives are provided, reporting access date / version number is unnecessary. The data is well discussed in the manuscript.
2. Data cleaning: The way I understand the archives provided, the two software packages "SchNet_EV-AuC" (via figshare) and "SchNet-vdW" (via github) can be used to use the already-generated ML models as part of ASE. However, while the data cleaning steps are described in text, even with the above software archives, it would take me a significant amount of time to get started with reproducing the ML training process. Perhaps a step-by-step how-to would be appropriate here. Source of the used data is clearly documented, and it is my understanding that apart from a training/validation/testing split, no data from the underlying datasets has been removed.”

Response:
It is correct that despite the random splitting automatically done by SchNetPack (exact splits and indices are provided in the file “split.npz” with the model), no data was removed.
As mentioned further below, the revised submission provides additional scripts to generate a data set for our method and to train ML potentials.

Comment 6
“3. Data representations The authors provide a set of npz files (which I assume to be zipped np.ndarray pickles). I have tried opening these accordingly, but in some cases I couldn't get any data out ("Fig3a_AuC.npz"). In some cases only a part of the data shown in the figures is provided - e.g. "Fig4.npz" only contains the data from plot (d). The authors certainly could, and in my view should, be a little more thorough in teaching the reader/user on how to use their software and how to arrive at the figures in the paper.”

Response
In the revised submission, we provide additional scripts on figshare. In total, we have added four new Jupyter notebooks. The first generates a minimal data set and extracts data from an FHI-aims output file; the Python script for data extraction is provided and called within the notebook. This data set can then be used to train ML models. A second notebook shows how to train ML models on energies, forces, and Hirshfeld volume ratios.
A third notebook then uses the trained models for ML optimization and guides the user to the SchNet-vdW GitHub repository, where we provide scripts that show how to use the ML model for basin hopping and optimizations and that also explain the different parameters that can be specified. In addition, we add two scripts that show how to use the ML models with DFT-D3 and DFT-D4, which we implemented. This is more difficult to show in Jupyter notebooks, and we thus prefer to keep it in the GitHub repository.
A final notebook shows how to extract data from npz files and explains how to arrive at the figures in the text. We did not provide the error files for the scatter plots, because we did not see any benefit for the user and the files are relatively large. However, we added an example (Jupyter notebook 2) of how to validate ML models and how to obtain the mean absolute errors. Any user interested in evaluating models for a given set of data can do so with the models and data provided.
When checking some npz files, we realized that they contained no data; we thank the reviewer for pointing this out. All missing files are now included in the revised submission, where we provide npz files with data and examples of how to plot them, including the code to generate the data from structures. We further realized that the names of some structure files were incorrect, i.e., structures relating to figure 4 were named as relating to figure 3; we have renamed the files accordingly. Regarding figures 5 and 6, we noticed that the data was not described clearly enough; we have therefore renamed the files and provide additional scripts and files. We are confident that with these additional steps our work is easily reproducible.
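For readers unfamiliar with the format: npz files are plain zip archives of .npy arrays and can be inspected directly with numpy, without unpickling. A minimal, self-contained sketch (the file name and array names below are invented, not the authors' actual figure data):

```python
import numpy as np

# Write a small archive shaped like the per-figure npz files discussed
# above; "energies" and "distances" are made-up array names.
np.savez("example_fig_data.npz",
         energies=np.array([-1.2, -1.5, -1.1]),
         distances=np.array([2.8, 3.0, 3.2]))

# np.load returns a lazy NpzFile; .files lists the stored array names.
with np.load("example_fig_data.npz") as data:
    names = sorted(data.files)
    energies = data["energies"]
```

An empty listing from `data.files` is the quickest way to spot archives like the ones the reviewer found with no data inside.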

Comment 7
“4. Model choice I am not knowledgeable enough in ML development to judge the applied methods. However, as a potential user, I am not sure the authors include all necessary code to reproduce the MLs generated in this work, or to create new MLs using different sets of input data. The way I read the documentation, this would in principle be the job of the SchNetPack library with which the authors code interfaces.”

Response:
The revised submission provides a Jupyter notebook that shows how to generate a data set and how to train an ML model with SchNetPack.

Comment 8
“5. Model training and validation I mentioned above that it is unclear how the input data was sorted between training, validation, and testing sets. The issues are discussed both in the manuscript and SI, but the documentation of the code where this happens could be clearer. For instance, out of the 5368 data points in Au@C, how were the 4500 (training) and 500 (validation) points selected? The manuscript mentions that the selection was random - but was it randomised across the whole dataset, or did they split the data by cluster size first?”

Response:
A new script for this has been provided.
The data is split randomly by SchNet; hence, it is not split according to cluster size, i.e., the split is randomised across the whole dataset. This is clarified in the provided Jupyter notebook that shows how to train ML models.
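The random split can be sketched in a few lines of plain Python (the helper below is illustrative and mirrors the Au@C numbers; it is not SchNet's actual splitting code):

```python
import random

def random_split(n_total, n_train, n_val, seed=42):
    """Randomly partition dataset indices into training, validation,
    and test sets, ignoring any grouping such as cluster size.
    Illustrative helper only, not SchNet's implementation."""
    indices = list(range(n_total))
    rng = random.Random(seed)   # fixed seed for reproducibility
    rng.shuffle(indices)        # randomise across the whole dataset
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]
    return train, val, test

# Au@C example: 5368 points -> 4500 training, 500 validation, rest test
train, val, test = random_split(5368, 4500, 500)
print(len(train), len(val), len(test))  # 4500 500 368
```

Because the shuffle runs over the full index list, structures of all cluster sizes can end up in any of the three sets.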

Comment 9:
“7. Code and reproducibility The "SchNet-vdW" is available in a public repository (github), it is version-tracked, clearly licensed, and provides installation instructions. It also has a test suite, which is however not plugged into any publicly visible CI. On the other hand, not much documentation of the software is provided, except a couple of docstrings, and there is no tagged version. In my view, the version submitted to the journal ought to be tagged, and ideally available under its own DOI (on figshare or zenodo).
The "SchNet_EV-AuC" code is only available as a zipped folder within the archive of files for review of this manuscript. It contains a license, and a standard installation script, but not much in the way of documentation - I could only find a very short README which describes the version of this archive, and two jupyter notebooks in the scripts folder which may be a useful starting point if the authors chose to provide instructions on how to use their codes.”

Response:
We updated the SchNet-vdW GitHub repository with the additional files created during the review process and tagged this version as v0.1. We added this information to the README and the Code Availability Statement.
We provide the SchNet_EV-AuC code as a zipped file since it was used to train the ML models provided. We believe that the use of two different versions is not ideal and thus provide a clean spk version that allows fitting of Hirshfeld volume ratios as well as energies and forces. The code is provided on GitHub: https://github.com/juliawestermayr/schnetpack
In the Jupyter notebooks that provide short instructions and tutorials on how to train ML models, we link to our forked GitHub version of SchNetPack, so that the user can work with one consistent version. The tutorial shows how to use the code to train on energies, forces, and Hirshfeld volume ratios.
The reason we still provide the SchNet_EV-AuC.zip file is that its modules are named differently; hence, the provided models can only be used with this code. We recommend that any future model be trained with the version used in the tutorial. We also zipped the current version of our forked SchNetPack and uploaded it to figshare.

Comment 10
“No plotting routines are provided. The data presented in the figures in the manuscript is, if provided, not linked to a workflow.
Summary: In summary, two items would help this manuscript tremendously - a "howto" on fitting DFT data to obtain the energies/forces as well as Hirschfeld charges models, and a "sample" use case on calculating energies. The latter of which is currently present in the tests folder, but the reader has to look for it and set up an environment themselves.”

Response:
As requested, we now include several Jupyter notebooks and additional files, and we thank the reviewer once more for their assessment and comments, which helped us make the code and method more accessible.

Referee: 3

Comment 1
“Comments to the Author In this paper by Westermayr et al., the authors work on establishing effective methods for modelling surface nanostructures. They explore a combined approach of using state-of-the-art machine-learning interatomic potentials for short-range interactions while developing separate ML methods to predict key parameters of various methods in the Tkatchenko et al. family of vdW-corrections to DFT. The paper is highly detailed and would provide valuable guidance to researchers attempting to model similar structures using MLIP. In particular, the exploration across material space is interesting, though the performance is quite modest. In many cases, it is the energy difference between different sites that is the most important. I would suggest the authors complement Table 1 with such numbers, for instance for benzene on Ag(111). It would provide valuable input (or warnings) to potential users.”

Response:
We thank the reviewer for their assessment of our work. We agree that the performance is modest, especially for energies. However, the main purpose of the work, as specified in the original manuscript text, is high-throughput screening and assisting DFT calculations, not necessarily replacing DFT completely. This was stated in the Conclusions section:
“The goal of this study was to assess the applicability of ML models based purely on reused data from open data repositories without generating a tailor-made training data set. This reflects the realistic application scenario in which a small set of initial geometry optimizations can be used to construct an ML+vdW model that can computationally expedite structural pre-relaxation.”
This means that the current ML models are helpful for screening thousands of structures, but likely not, e.g., for determining the energy differences between different adsorption sites of molecules at surfaces. This would likely require better, tailor-made training data. Especially for the X2O@Ag study, the existing literature data with which the model was trained (2,125 data points) is insufficient to achieve such a level of accuracy for molecules on surfaces. For comparison, in recent studies by us and co-workers, several thousand data points were needed for small organic molecules; here, we deal with organic molecules on surfaces and have far fewer data points. Therefore, the energies for molecules are only assessed at high-symmetry surface adsorption sites. As can be seen in Figure 5c, the model cannot reliably distinguish between different sites. The same is true for molecules not included in the training set.
We agree that this should be mentioned more clearly in the manuscript as a warning for potential users. If the energies are of interest, it is highly recommended to use the models for pre-relaxation only and to subsequently relax the structures with DFT, which can still significantly reduce the computational cost. We added the following text in Section 3.3 on page 9:
“While the model can distinguish adsorption energies between different molecules, it fails to distinguish the adsorption energies for energetically beneficial local minima of the same molecule at different symmetry sites. Achieving this would likely require more training data than what was provided in the original NOMAD data repository.”
and, two paragraphs below:
“According to the literature, [24,82,83,85] the most stable symmetry site was selected (indicated in the first column of Table 1) to compare our results to the DFT and experimental data available in the literature. We note that models trained on such sparse data will likely fail to reliably predict energy differences between different adsorption sites.”

Comment 2:
“One thing I really find missing in the paper is a discussion on whether the approach here is useful or at all relevant for the DFT-D and vdW-DF methods. Could one bypass the need for long-range training with DFT-D? If so, the readers should know this, but maybe also discuss whether DFT-D has had any success in modeling this class of systems. vdW-DFs (not the standard vdW-DF1 and DF2 variants tough) have been quite successful in describing surface nanostructures similar to the ones modeled here. The authors must have missed this in their literature review. But maybe the approach developed here is limited to the Tkatchenko-type methods? If so, this should be stated.”

Response:
Our model is only applicable to methods in which the vdW correction is an a posteriori correction to an existing density functional approximation, as is the case for the Tkatchenko-Scheffler methods and the Grimme dispersion corrections (which we have now included and for which we provide usage examples).
The approach in its current form is not directly applicable to vdW-DF methods, as the long-range correlation contributes to the variational solution of the Kohn-Sham equations, which cannot be accounted for in a simple additive scheme. This was also shown by other recent works, which we now review in more detail in the introduction and reference in the main text. We have further included the following information at the end of the Introduction:
“The method is applicable for the incorporation of additive a posteriori dispersion correction schemes, such as the vdW(TS), vdWsurf, and MBD methods that are implemented in Libmbd, or the Grimme dispersion methods D3 and D4. [62,63] However, the method in its current form cannot be used for self-consistent vdW corrections such as the vdW-DF family of methods. [64,65] If training on data obtained from vdW-DF methods is sought, adaptations are needed to accurately model long-range effects, e.g., as is done in PhysNet or SpookyNet.”

Comment 3
“In the same vein, some discussion on long-range electrostatic effects in the conclusions would be welcome. I understand this method does not account for this? If so, how do the authors suggest we should proceed with accounting for this? Noting that this requires more work in the conclusions should be fine. Or are they not important for surface nanostructures?”

Response:
The reviewer is correct that we do not account for electrostatic effects explicitly; they are included only insofar as they are captured by the ML interatomic potential trained on the reference method. This will likely not be sufficient for ionic condensed-phase systems. In the revised manuscript, we have added a discussion of electrostatics, which is indeed important too, to the conclusion. We also point out what is needed to achieve a more accurate description and refer to some recent works on this topic:
“While our method accounts for long-range dispersion interactions, it does not explicitly treat electrostatic interactions. To account for these, the SchNet+vdW approach could be extended in a similar vein by learning partial atomic charges and using these to predict electrostatic long-range interactions, similar to SpookyNet [54] or Behler's fourth-generation high-dimensional neural networks. [56,88]”
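To illustrate the kind of extension meant here, a bare pairwise Coulomb sum over (hypothetical) ML-predicted partial charges might look as follows. This is a sketch in atomic units, not part of the SchNet+vdW code; a production scheme would additionally need short-range screening and, for periodic systems, Ewald summation:

```python
import math

def coulomb_energy(charges, positions):
    """Pairwise electrostatic energy E = sum_{i<j} q_i q_j / r_ij
    in atomic units (charges in e, distances in bohr).
    Illustrative only: no damping at short range and no
    treatment of periodic boundary conditions."""
    energy = 0.0
    n = len(charges)
    for i in range(n):
        for j in range(i + 1, n):
            r = math.dist(positions[i], positions[j])
            energy += charges[i] * charges[j] / r
    return energy

# Two opposite unit charges 2 bohr apart -> E = -0.5 hartree
print(coulomb_energy([+1.0, -1.0], [(0, 0, 0), (2, 0, 0)]))  # -0.5
```

In the extension sketched in the quoted text, the `charges` would come from an ML model trained on reference partial charges, analogous to how the Hirshfeld volume ratios are predicted for the dispersion correction.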

Comment 4
“One thing I find confusing about this paper, is the low computational cost of ML+MBD? It implies that the MBD is less costly than DFT. Is that really the case? If so, I do understand why the DFT-TS or DFT-TS_surf method has not fully been replaced by DFT-MBD. There is probably something I am missing here. Could the authors elucidate?”

Response:
The cost of the MBD evaluation is negligible compared to that of the DFT calculation, so the computational cost of ML+MBD is much lower than that of DFT+MBD for the systems we describe here. While a DFT calculation takes at least a hundred CPU hours, the MBD correction alone takes about one minute. The vdW(TS) and vdWsurf corrections, however, take only 2-4 seconds on a single CPU, which is much less costly than the MBD method. Hence, ML+vdW(TS) and ML+vdWsurf are computationally much cheaper than ML+MBD, but ML+MBD still remains much cheaper than DFT+MBD. We added the computational cost description to the revised SI:
“ML in combination with external vdW corrections is much more computationally efficient than DFT with vdW corrections. vdW(TS) and vdWsurf energy and force evaluations are about 20 times faster than MBD evaluations for the studied systems. MBD further requires much more memory, which represents a computational bottleneck compared to the evaluation of the SchNet-based MLIP.”

Comment 5:
“Minor notes: Calling the DFT+TS method, the vdW method, is poor nomenclature. There are many “vdW methods”.”

Response:
In the revised manuscript, we have made sure that we consistently refer to the original method names, namely vdW(TS) and vdWsurf, respectively.

Comment 6:
“In 2.1, the authors write “f is a damping function to avoid double counting of short-range contributions”. This is misleading. First, there are no formal mechanisms in place to avoid double counting. Second, it implies that C_6/r⁶ correctly describes short-range interactions. It does not; rather it is inaccurate and eventually diverges. The damping function serves primarily to mimic short-ranged vdW-interactions in such a manner that it is a good match to the specific exchange-correlation chosen.”

Response:
We have clarified this passage and removed “to avoid double counting of short-range contributions” from the text. We added the following text on page 11:
“… f is a damping function that serves to mimic short-range vdW interactions to match the specific exchange-correlation chosen.”
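For illustration, a Fermi-type damping function of this kind can be sketched as follows. The parameter values d = 20 and s_R = 0.94 are plausible defaults of the TS scheme for one particular functional; the values actually matched to a given exchange-correlation functional should be taken from the method papers:

```python
import math

def fermi_damping(r, r_vdw, d=20.0, s_r=0.94):
    """Fermi-type damping f(r) = 1 / (1 + exp(-d (r / (s_R R_vdW) - 1))).
    Switches the -C6/r^6 term off at short interatomic distances,
    where it would otherwise diverge, and leaves it essentially
    untouched at long range."""
    return 1.0 / (1.0 + math.exp(-d * (r / (s_r * r_vdw) - 1.0)))

def damped_c6_term(r, c6, r_vdw):
    """Damped pairwise dispersion contribution -f(r) * C6 / r^6."""
    return -fermi_damping(r, r_vdw) * c6 / r ** 6

# At short range f -> 0 (no divergence); at long range f -> 1.
print(fermi_damping(1.0, r_vdw=3.5))   # ~0
print(fermi_damping(10.0, r_vdw=3.5))  # ~1
```

The sigmoidal form is what allows the damping parameters to be tuned so that the short-range behaviour matches the chosen exchange-correlation functional, as described in the quoted text.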

Comment 7:
“These comments are minor and I certainly recommend this paper for publication. However, my background is from the DFT side and I am unable to provide detailed comments on the MLIP approach beyond the fact that it seems sensible.”

Response:
We once again thank the reviewer for their assessment of our work and hope that we have clarified all the points raised.




Round 2

Revised manuscript submitted on 17 May 2022
 

03-Jun-2022

Dear Dr Maurer:

Manuscript ID: DD-ART-02-2022-000016.R1
TITLE: Long-range dispersion-inclusive machine learning potentials for structure search and optimization of hybrid organic-inorganic interfaces

Thank you for submitting your revised manuscript to Digital Discovery. I am pleased to accept your manuscript for publication in its current form. I have copied any final comments from the reviewer(s) below.

You will shortly receive a separate email from us requesting you to submit a licence to publish for your article, so that we can proceed with the preparation and publication of your manuscript.

You can highlight your article and the work of your group on the back cover of Digital Discovery. If you are interested in this opportunity please contact the editorial office for more information.

Promote your research, accelerate its impact – find out more about our article promotion services here: https://rsc.li/promoteyourresearch.

If you would like us to promote your article on our Twitter account @digital_rsc please fill out this form: https://form.jotform.com/213544038469056.

By publishing your article in Digital Discovery, you are supporting the Royal Society of Chemistry to help the chemical science community make the world a better place.

With best wishes,

Linda Hung
Associate Editor
Digital Discovery
Royal Society of Chemistry


 
Reviewer 2

Dear Editors and Authors,
The authors have addressed all of my comments, clarified all unclear points, and incorporated most of the suggestions from all reviewers. In my view, the manuscript can be accepted.

Additionally, I had a look at the provided jupyter files, and I believe they make the manuscript a lot better. Anyone interested in this software should now be able to use it, provided they have some python experience.

I have a few minor comments that the authors might want to address, but I was able to get around those issues without much trouble, so in my view a formal "revision" is not necessary:

- The pre-built libmbd, which is a dependency of the Authors' software, is only available on Linux or Mac. This is not the "fault" of the authors, but could be mentioned somewhere in the install instructions, as I had to switch PCs during review...
- The file "AuC_dummydb.db" generated by 1_DataSetGeneration does not seem to contain information about forces. This means the 2_TrainingML notebook crashes. The file that can be downloaded from the archive seems to work fine.
- The meaning of "epoch" in 2_TrainingML is not defined or mentioned in the paper or SI. It might be standard ML language, so forgive me if this is obvious.
- In the 3_Optimization notebook, the "os" package should be imported, and "--modelpath" is an unrecognized parameter to the test_ml.py file. I believe the modelpath should be an argument (i.e. not optional) here.

Best wishes,
Peter




Transparent peer review

To support increased transparency, we offer authors the option to publish the peer review history alongside their article. Reviewers are anonymous unless they choose to sign their report.

We are currently unable to show comments or responses that were provided as attachments. If the peer review history indicates that attachments are available, or if you find there is review content missing, you can request the full review record from our Publishing customer services team at RSC1@rsc.org.


Content on this page is licensed under a Creative Commons Attribution 4.0 International license.