From the journal Digital Discovery Peer review history

Towards the automated extraction of structural information from X-ray absorption spectra

Round 1

Manuscript submitted on 02 Jun 2023
 

07-Aug-2023

Dear Dr Penfold:

Manuscript ID: DD-ART-06-2023-000101
TITLE: Towards the Automated Extraction of Structural Information from X-ray Absorption Spectra

Thank you for your submission to Digital Discovery, published by the Royal Society of Chemistry. I sent your manuscript to reviewers and I have now received their reports which are copied below.

After careful evaluation of your manuscript and the reviewers’ reports, I will be pleased to accept your manuscript for publication after revisions.

Please revise your manuscript to fully address the reviewers’ comments. When you submit your revised manuscript please include a point by point response to the reviewers’ comments and highlight the changes you have made. Full details of the files you need to submit are listed at the end of this email.

Digital Discovery strongly encourages authors of research articles to include an ‘Author contributions’ section in their manuscript, for publication in the final article. This should appear immediately above the ‘Conflict of interest’ and ‘Acknowledgement’ sections. I strongly recommend you use CRediT (the Contributor Roles Taxonomy, https://credit.niso.org/) for standardised contribution descriptions. All authors should have agreed to their individual contributions ahead of submission and these should accurately reflect contributions to the work. Please refer to our general author guidelines https://www.rsc.org/journals-books-databases/author-and-reviewer-hub/authors-information/responsibilities/ for more information.

Please submit your revised manuscript as soon as possible using this link:

*** PLEASE NOTE: This is a two-step process. After clicking on the link, you will be directed to a webpage to confirm. ***

https://mc.manuscriptcentral.com/dd?link_removed

(This link goes straight to your account, without the need to log in to the system. For your account security you should not share this link with others.)

Alternatively, you can login to your account (https://mc.manuscriptcentral.com/dd) where you will need your case-sensitive USER ID and password.

You should submit your revised manuscript as soon as possible; please note you will receive a series of automatic reminders. If your revisions will take a significant length of time, please contact me. If I do not hear from you, I may withdraw your manuscript from consideration and you will have to resubmit. Any resubmission will receive a new submission date.

The Royal Society of Chemistry requires all submitting authors to provide their ORCID iD when they submit a revised manuscript. This is quick and easy to do as part of the revised manuscript submission process. We will publish this information with the article, and you may choose to have your ORCID record updated automatically with details of the publication.

Please also encourage your co-authors to sign up for their own ORCID account and associate it with their account on our manuscript submission system. For further information see: https://www.rsc.org/journals-books-databases/journal-authors-reviewers/processes-policies/#attribution-id

I look forward to receiving your revised manuscript.

Yours sincerely,
Linda Hung
Associate Editor
Digital Discovery
Royal Society of Chemistry

************


 
Reviewer 1

The authors have applied machine learning to XANES spectra to obtain local structural information, in particular, the pseudo-radial distribution function, which is the same information that is obtained by EXAFS. As the authors describe in the introduction, the application of machine learning to XANES has been reported previously. However, the previous works have largely focused on classification models, which translate spectra into structural properties such as coordination numbers. In this study, the authors set out to predict the pseudo-radial distribution function using a convolutional neural network (CNN).

However, I can easily find the following reports that have also achieved predicting the radial distribution function from near-edge information:

S. Kiyohara et al., "Radial Distribution Function from X-ray Absorption near Edge Structure with an Artificial Neural Network," J. Phys. Soc. Jpn., 2020, 89, 103001, doi:10.7566/JPSJ.89.103001
F. Iesari et al., "Extracting Local Symmetry of Mono-Atomic Systems from Extended X-ray Absorption Fine Structure Using Deep Neural Networks," Symmetry 2021, 13(6), 1070; https://doi.org/10.3390/sym13061070
M. Higashi et al., "Extraction of Local Structure Information from X-ray Absorption Near-Edge Structure: A Machine Learning Approach," Materials Transactions, (2023), doi.org/10.2320/matertrans.MT-MG2022028
These reports all successfully predict the radial distribution function from XANES, and they are not cited in the present manuscript. Therefore, it is difficult to see the novelty of the present manuscript. I believe that the manuscript must be revised to highlight the novelty of the present study.

Reviewer 2

This manuscript reports a machine learning work that translates a hard-to-interpret XANES spectrum to a pseudo radial distribution function. The technical detail is well described, and the performance is systematically characterized. The authors also applied the model to experimental spectra and reasonably explained its performance. This work is not only valuable in itself but could also inspire further development in the field. The paper should be published after a few minor text edits listed below, which could otherwise be printed as is.

1) On Page 3, Section 2.2, in the 2nd paragraph, there is a typo in "This descriptor has previously been used in the reserve problem". The word "reserve" should be "reverse".
2) The labels (a) and (b) for the two subfigures in Figure 2 are missing.
3) Figure 3 is training details and is not necessary for the main text.
4) In the discussion of Figure 8, the lack of response in the G2 wACSF to the pre-edge was attributed to the limitation of the training set. But it should be made clear that this failure can also be a consequence of unphysical changes in the spectrum shape test. A change in coordination number will unavoidably also change the white line shape in addition to the pre-edge intensity. A pre-edge-only change has the possibility to create a spectrum that doesn't exist in the real world.

Reviewer 3

This work by Tudur et al. is a fascinating original entry into the growing field of ML in XAS and is suitable for Digital Discovery. The following comments are for the improvement of the work.

My main question surrounds the wACSF descriptor and its relationship to the pair distribution function. The answers to the following questions will help interpret your results.

How are the wACSF and radial distribution function related? You have probably worked with both; what are the benefits of using wACSF?

Can you deconvolute the wACSF into partial contributions in multielement systems?

How does the wACSF take dynamics into account? Is it calculated on static structures? What does the broadening represent if not vibrational motion if calculated on static structures?

The pseudo coordination number is mentioned multiple times, but the exact definition is not provided. Is it the integrated area of the wACSF in some bounds? Does it correspond to the usual coordination number (integrated area of radial distribution function peaks)?

How are the expected wACSF descriptors calculated for the experimental data? What structures were used? If only static structures are used, e.g., a cif file, it could be the source of your disagreement. Presumably, the experimental data contains more structural variation than a static structure.

The experimental comparison is a little shaky to me (no offense, I understand it is the nature of this type of work for which the target descriptor is not available via experiment) considering that wACSF descriptors need to be calculated on coordinates, which need to come from somewhere (theory, XRD refinement, EXAFS inversion), so I could imagine that being a source of bias compared to the prediction. Can you clarify the details of your procedure for generating experimental wACSF? Comparing the bond distances extracted from the analysis is a very good start, but can you comment on the origin of the distances (are they obtained experimentally with EXAFS)? I would be concerned about bias if they are obtained via theoretical models (especially the ones you are calculating wACSF on). Another option might be to compare CNs if you can convert pseudo-CNs into real CNs. Another option might be to compare the wACSF to XRD-PDF (total scattering).

How is the experimental data processed for use in your method? For example, data normalization and background subtraction (what happens with amplitude reduction factor?) and how is alignment performed?

Reviewer 4

The authors have introduced a convolutional neural network (CNN) capable of translating XANES spectra into a pseudo radial distribution function, utilizing G2 terms within a wACSF descriptor. This translation of spectra into structural insights holds significant importance, as traditionally, decoding experimental data of this nature demands intricate calculations. The presented machine learning approach represents a significant step towards faster and more reliable methods, making valuable contributions to the field for both experimentalists and theoretical chemists. Therefore, I recommend publication after minor revisions, particularly concerning the source data.

* Q1: There is a minor discrepancy in the number of structure-spectrum pairs mentioned in the article and the actual content of the file 'theory-dataset.npz' in 'spectrum-structure.zip' on GitLab (ref 26). The text mentions 36,657 spectra-structure pairs, but the file contains only 36,342 pairs.

* Q2: On their GitLab page, the authors mention that the structures are in the form of the wACSF descriptor and can be read as described in the file. However, the provided README file does not offer the correct instructions. To address this, the authors could include a small Python script with essential lines to read the data correctly. For example:

```python
import numpy
data = numpy.load('theory-dataset.npz')
wACSF_0 = data['x'][0]
spectrum_0 = data['y'][0]
```

* Q3: It would be beneficial if the authors briefly explain in the readme the meanings of the shapes of the respective properties, particularly the 'x' with a shape of (51,) and the 'y' with a shape of (475,). Clarification on the energy-axis of the 475 spectra values would also be helpful.


 

This text has been copied from the PDF response to reviewers and does not include any figures, images or special characters:

To the Editor,
We would like to thank the Reviewers for their time and their valuable comments about our work. Please find this revised version of our manuscript attached. We address the comments made by each of the Reviewers below, and detail the changes that we have made to our manuscript in response.

Reviewer 1
• I can easily find the following reports that have also achieved predicting the radial distribution function from near-edge information: S. Kiyohara et al., "Radial Distribution Function from X-ray Absorption near Edge Structure with an Artificial Neural Network," J. Phys. Soc. Jpn., 2020, 89, 103001 doi:10.7566/JPSJ.89.103001 F. Iesari et al., "Extracting Local Symmetry of Mono-Atomic Systems from Extended X-ray Absorption Fine Structure Using Deep Neural Networks," Symmetry 2021, 13(6), 1070; M. Higashi et al., "Extraction of Local Structure Information from X-ray Absorption Near-Edge Structure: A Machine Learning Approach," Materials Transactions, (2023). These reports all successfully predict the radial distribution function from XANES, and they are not cited in the present manuscript. Therefore, it is difficult to see the novelty of the present manuscript. I believe that the manuscript must be revised to highlight the novelty of the present study.

We thank the reviewer for these references, which we had missed. The paper from F. Iesari et al. is focused upon ML for EXAFS analysis, and therefore has different objectives from the present work. The papers from Kiyohara et al. and Higashi et al. do indeed seek to convert XANES spectra into radial distribution functions at the oxygen K-edge. Both of these use multi-layer perceptron models and their analysis is based, as with other papers referenced in the manuscript, entirely on theoretical spectra without application or analysis of performance on experimental data. Neither seeks to analyse the uncertainty in the predictions made by the network. To correctly reflect their contribution to the field we have added these latter two references to the resubmitted manuscript but, given the above discussion, we feel that the novelty of the present manuscript is clearly demonstrated.

Reviewer 2
• On Page 3, Section 2.2, in the 2nd paragraph, there is a typo in "This descriptor has previously been used in the reserve problem". The word "reserve" should be "reverse".

This has been corrected.

• The labels (a) and (b) for the two subfigures in Figure 2 are missing

These have been added.

• Figure 3 is training details and is not necessary for the main text.

This has been moved to the supporting information.

• In the discussion of Figure 8, the lack of response in the G2 wACSF to the pre-edge was attributed to the limitation of the training set. But it should be made clear that this failure can also be a consequence of unphysical changes in the spectrum shape test. A change in coordination number will unavoidably also change the white line shape in addition to the pre-edge intensity. A pre-edge-only change has the possibility to create a spectrum that doesn’t exist in the real world.

The referee is correct and we have added a discussion on this in the resubmitted manuscript.

Reviewer 3
• How are the wACSF and radial distribution function related? You have probably worked with both; what are the benefits of using wACSF?

The G2 wACSF and radial distribution function will contain similar information and both represent a distance distribution of atoms with respect to the absorbing atom.
Given the definition of the G2 wACSF:
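$$G_i^2 = \sum_{j \neq i} Z_j \cdot f_c(r_{ij}) \cdot \exp\left[-\eta (r_{ij} - \mu)^2\right] \qquad (1)$$
and the radial distribution function (see J. Phys. Soc. Jpn. 89, 103001 (2020)):
$$g(R) = \frac{1}{4\pi R^2 \rho}\,\frac{dn(R)}{dR} \qquad (2)$$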
The main difference between the two will be the atomic weight contributions in the G2 wACSF. As different elements will exhibit different backscattering amplitudes, this distinction between atomic contributions is advantageous.
While we anticipate, given the similarity between the two descriptions, that both would yield similar performance in the present work, we have chosen the wACSF descriptor as this has shown high performance for the forward problem (i.e. structure to spectrum; J. Chem. Phys. 156, 164102 (2022)). Consequently, as a future goal is to ensure that the networks are internally consistent with respect to the forward and reverse transformations, we have retained the wACSF descriptor.
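As a minimal illustration of the G2 definition above, the following sketch evaluates the weighted sum for a handful of neighbour distances. The cosine form of the cutoff function, the cutoff radius, the value of η and the evenly spaced µ grid are all assumptions chosen for illustration, and the function names `cosine_cutoff` and `g2_wacsf` are hypothetical; none of these are taken from the manuscript.

```python
import numpy as np

def cosine_cutoff(r, r_cut):
    """Assumed cutoff f_c: decays smoothly from 1 at r = 0 to 0 at r = r_cut."""
    r = np.asarray(r, dtype=float)
    return np.where(r < r_cut, 0.5 * (np.cos(np.pi * r / r_cut) + 1.0), 0.0)

def g2_wacsf(distances, atomic_numbers, mu_grid, eta, r_cut):
    """Evaluate the G2 sum on every grid point mu for one absorbing atom."""
    r = np.asarray(distances, dtype=float)[:, None]        # (n_neighbours, 1)
    z = np.asarray(atomic_numbers, dtype=float)[:, None]   # (n_neighbours, 1)
    mu = np.asarray(mu_grid, dtype=float)[None, :]         # (1, n_grid)
    terms = z * cosine_cutoff(r, r_cut) * np.exp(-eta * (r - mu) ** 2)
    return terms.sum(axis=0)                               # sum over neighbours j

# Illustrative values only: four N neighbours at 2.0 Å and two C neighbours at 3.0 Å.
distances = [2.0, 2.0, 2.0, 2.0, 3.0, 3.0]
atomic_numbers = [7, 7, 7, 7, 6, 6]
mu_grid = np.linspace(0.5, 6.0, 50)   # assumed grid limits
g2 = g2_wacsf(distances, atomic_numbers, mu_grid, eta=10.0, r_cut=6.0)
print(g2.shape, mu_grid[np.argmax(g2)])
```

Because each term is multiplied by Z_j, the same set of distances gives a larger signal for heavier neighbours, which is the atomic-weight distinction from the plain radial distribution function described above.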

• Can you deconvolute the wACSF into partial contributions in multi-element systems?

This is possible. The G2 functions are represented on a grid of 50 points, while differences in atomic number are accounted for using the weighting function. Alternatively, in line with the original ACSF descriptor of Behler (see J. Chem. Phys. 134, 074106 (2011)), a different grid could be used for each atomic species (or similar atomic species given the Z±1 limit in sensitivity of XANES spectra). This would achieve an output in terms of partial contributions and could be the focus of future work. However, given the present performance with experimental data, we felt this was too much detail for the present network and the initial focus should be upon improving the overall performance in this regime before seeking to extract finer details from each spectrum.

• How does the wACSF take dynamics into account? Is it calculated on static structures? What does the broadening represent if not vibrational motion if calculated on static structures?

They are static structures. The broadening arises from the formulation of the wACSF terms described in J. Chem. Phys. 156, 164102 (2022). The term $f_c(r_{ij}) \cdot \exp[-\eta (r_{ij} - \mu)^2]$ represents a series of overlapping Gaussians whose overall weight is controlled by the decaying cutoff function, $f_c$. A single distance contributes to multiple Gaussians, each with a different weight, so the size of the broadening is controlled by the width of the Gaussians (η in the main paper) and by the grid spacing, i.e. the number of G2 functions.
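To make the origin of this broadening concrete, the sketch below places a single neighbour at one distance and counts how many grid functions respond above half maximum for several Gaussian widths. The grid limits, the distance and the η values are illustrative assumptions, and the cutoff function is left out because it only rescales the overall weight.

```python
import math

def gaussian_responses(r, mu_grid, eta):
    """Response exp(-eta * (r - mu)^2) of each grid function to one distance r."""
    return [math.exp(-eta * (r - mu) ** 2) for mu in mu_grid]

def half_width(eta):
    """Distance from a Gaussian centre at which its response drops to one half."""
    return math.sqrt(math.log(2.0) / eta)

# Assumed 50-point grid between 0.5 and 6.0 Å (spacing of about 0.11 Å).
mu_grid = [0.5 + k * (6.0 - 0.5) / 49 for k in range(50)]

for eta in (5.0, 20.0, 80.0):
    responses = gaussian_responses(2.0, mu_grid, eta)
    n_active = sum(1 for g in responses if g > 0.5)
    print(f"eta={eta:5.1f}  half-width={half_width(eta):.3f} Å  "
          f"grid functions above half maximum: {n_active}")
```

A smaller η (wider Gaussians) or a finer grid spreads one static distance over more G2 functions, so a peak in the descriptor has a finite width even without any vibrational motion.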

• The pseudo coordination number is mentioned multiple times, but the exact definition is not provided. Is it the integrated area of the wACSF in some bounds? Does it correspond to the usual coordination number (integrated area of radial distribution function peaks)?

This is described in the paper, at the bottom of page 3. The pseudo-coordination number is the number of atoms within 2.5 Å of the absorbing atom. The presence of compounds containing 1 or 2 cyclopentadienyl ligands makes defining a formal coordination number, such as tetrahedral or octahedral, difficult, hence the use of the pseudo-coordination number, as defined above.
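The definition amounts to a simple distance count, as the sketch below shows. The coordinates are hypothetical, the function name `pseudo_coordination_number` is illustrative, and treating an atom at exactly 2.5 Å as inside the sphere is an assumption.

```python
import math

def pseudo_coordination_number(absorber, neighbours, radius=2.5):
    """Count atoms within `radius` (Å) of the absorbing atom.

    Atoms exactly at `radius` are counted (an assumption of this sketch).
    """
    return sum(1 for xyz in neighbours if math.dist(absorber, xyz) <= radius)

# Hypothetical coordinates in Å, not taken from any structure in the dataset.
absorber = (0.0, 0.0, 0.0)
neighbours = [
    (2.0, 0.0, 0.0),
    (0.0, -2.1, 0.0),
    (0.0, 0.0, 2.4),
    (1.5, 1.5, 1.5),   # about 2.60 Å away, so outside the 2.5 Å sphere
    (3.0, 0.0, 0.0),
]
print(pseudo_coordination_number(absorber, neighbours))
```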

• How are the expected wACSF descriptors calculated for the experimental data? What structures were used? If only static structures are used, e.g., a cif file, it could be the source of your disagreement. Presumably, the experimental data contains more structural variation than a static structure.

The experimental data have been carefully chosen from well-characterised, single-component systems for which structures have been reported. These single static structures are obtained from either crystallography or fitting the XANES spectra reported in the paper. All of the relevant citations can be found in the supporting information and the structures used can be downloaded from the GitLab repository. While this could potentially be a source of error, it is expected to be small, given the spectra and systems chosen.

• The experimental comparison is a little shaky to me (no offense, I understand it is the nature of this type of work for which the target descriptor is not available via experiment) considering that wACSF descriptors need to be calculated on coordinates, which need to come from somewhere (theory, XRD refinement, EXAFS inversion), so I could imagine that being a source of bias compared to the prediction. Can you clarify the details of your procedure for generating experimental wACSF? Comparing the bond distances extracted from the analysis is a very good start, but can you comment on the origin of the distances (are they obtained experimentally with EXAFS)? I would be concerned about bias if they are obtained via theoretical models (especially the ones you are calculating wACSF on). Another option might be to compare CNs if you can convert pseudo-CNs into real CNs. Another option might be to compare the wACSF to XRD-PDF (total scattering).

This comment appears to cover many of the points addressed individually above. We agree with the reviewer that the requirement to compute the wACSF descriptors for the experimental data is a potential source of error. This is why, as stated in the response to the previous question, we have specifically only chosen experimental spectra for which a detailed analysis or refinement of the structure is provided. The structures used to generate the experimental wACSFs have been uploaded to the GitLab repository discussed in the main text. In terms of coordination number (real or pseudo), as shown in the predicted wACSF of the experimental data, there remains a larger uncertainty in the height of the peaks, even if their positions are accurate. This makes assessing the coordination numbers challenging. In addition, as these peaks are weighted by the atomic number of the atoms involved, an unambiguous determination of coordination number will not always be possible.

• How is the experimental data processed for use in your method? For example, data normalization and background subtraction (what happens with amplitude reduction factor?) and how is alignment performed?

The experimental data is digitised directly from publications, with the references provided in the supporting information. Each spectrum was then interpolated using a cubic spline between 7112.5-7060.0 eV with 475 points. These spectra are freely available on our GitLab page as described in the manuscript.
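The resampling step can be sketched as follows, assuming SciPy's `CubicSpline` as the spline implementation (the response does not name a library). The function name `resample_spectrum`, its `e_start` and `e_stop` arguments and the synthetic input spectrum are hypothetical; only the cubic-spline approach and the 475-point output come from the response.

```python
import numpy as np
from scipy.interpolate import CubicSpline

def resample_spectrum(energy, intensity, e_start, e_stop, n_points=475):
    """Cubic-spline interpolation of a digitised spectrum onto a uniform grid."""
    energy = np.asarray(energy, dtype=float)
    intensity = np.asarray(intensity, dtype=float)
    order = np.argsort(energy)   # CubicSpline needs strictly increasing x
    spline = CubicSpline(energy[order], intensity[order])
    grid = np.linspace(e_start, e_stop, n_points)
    return grid, spline(grid)

# Synthetic stand-in for a digitised spectrum: a smooth edge step sampled on
# an irregular energy axis (illustrative values, not experimental data).
energy = 50.0 * np.linspace(0.0, 1.0, 120) ** 1.5
intensity = 0.5 * (1.0 + np.tanh((energy - 15.0) / 3.0))
grid, resampled = resample_spectrum(energy, intensity, energy[0], energy[-1])
print(grid.shape, resampled.shape)
```

Digitised points that share the same energy value would need to be merged before fitting, since the spline requires distinct abscissae.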

Reviewer 4
• There is a minor discrepancy in the number of structure-spectrum pairs mentioned in the article and the actual content of the file ’theory-dataset.npz’ in ’spectrum-structure.zip’ on GitLab (ref 26). The text mentions 36,657 spectra-structure pairs, but the file contains only 36,342 pairs.

We apologise for this error and have corrected it in the resubmitted manuscript.

• On their GitLab page, the authors mention that the structures are in the form of the wACSF descriptor and can be read as described in the file. However, the provided README file does not offer the correct instructions. To address this, the authors could include a small Python script with essential lines to read the data correctly.

We apologise and have corrected this.

• It would be beneficial if the authors briefly explain in the readme the meanings of the shapes of the respective properties, particularly the ’x’ with a shape of (475,) and the ’y’ with a shape of (51,). Clarification on the energy-axis of the 475 spectra values would also be helpful.

This has also been added to the GitLab Readme.
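A quick way for a reader to check both the number of structure-spectrum pairs and the per-sample array shapes is sketched below. It runs on a small synthetic archive rather than the real 'theory-dataset.npz'; the array names and the lengths 51 and 475 are taken from the review discussion, while the assignment of lengths to names in the synthetic file is arbitrary and the helper name `summarise_npz` is hypothetical.

```python
import os
import tempfile
import numpy as np

def summarise_npz(path):
    """Return {array name: shape} for every array stored in an .npz archive."""
    with np.load(path) as data:
        return {name: data[name].shape for name in data.files}

# Small synthetic archive standing in for the real dataset; the sizes are
# illustrative and the file is removed when the block exits.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.npz")
    np.savez(path, x=np.zeros((10, 51)), y=np.zeros((10, 475)))
    shapes = summarise_npz(path)

for name, shape in shapes.items():
    print(f"{name}: {shape[0]} samples, per-sample shape {shape[1:]}")
```

The leading dimension of each array gives the number of pairs in the file, which is the quantity that differed from the manuscript text in the first comment.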




Round 2

Revised manuscript submitted on 10 Aug 2023
 

29-Aug-2023

Dear Dr Penfold:

Manuscript ID: DD-ART-06-2023-000101.R1
TITLE: Towards the Automated Extraction of Structural Information from X-ray Absorption Spectra

Thank you for submitting your revised manuscript to Digital Discovery. I am pleased to accept your manuscript for publication in its current form. I have copied any final comments from the reviewer(s) below.

You will shortly receive a separate email from us requesting you to submit a licence to publish for your article, so that we can proceed with the preparation and publication of your manuscript.

You can highlight your article and the work of your group on the back cover of Digital Discovery. If you are interested in this opportunity please contact the editorial office for more information.

Promote your research, accelerate its impact – find out more about our article promotion services here: https://rsc.li/promoteyourresearch.

If you would like us to promote your article on our Twitter account @digital_rsc please fill out this form: https://form.jotform.com/213544038469056.

We are offering all corresponding authors on publications in gold open access RSC journals who are not already members of the Royal Society of Chemistry one year’s Affiliate membership. If you would like to find out more please email membership@rsc.org, including the promo code OA100 in your message. Learn all about our member benefits at https://www.rsc.org/membership-and-community/join/#benefit

By publishing your article in Digital Discovery, you are supporting the Royal Society of Chemistry to help the chemical science community make the world a better place.

With best wishes,

Linda Hung
Associate Editor
Digital Discovery
Royal Society of Chemistry


 
Reviewer 1

This would be acceptable for publication.

Reviewer 4

I am satisfied with the revisions that have been made to the manuscript. The modifications, particularly those reflected in the README.md file, have significantly improved the comprehension of the data's structure and utilization. In light of these changes, I recommend the publication of the manuscript without any additional modifications.

Reviewer 3

I thank the authors for thoroughly answering my questions about their methods! The edits to the main text are appropriate. The authors' methodology is novel and their results will be very interesting to others investigating the link between XANES and structure. I also appreciate that they are transparent regarding sources of error, and the data and code are available for those interested. I recommend publication.




Transparent peer review

To support increased transparency, we offer authors the option to publish the peer review history alongside their article. Reviewers are anonymous unless they choose to sign their report.

We are currently unable to show comments or responses that were provided as attachments. If the peer review history indicates that attachments are available, or if you find there is review content missing, you can request the full review record from our Publishing customer services team at RSC1@rsc.org.


Content on this page is licensed under a Creative Commons Attribution 4.0 International license.