From the journal Digital Discovery Peer review history

Spinel nitride solid solutions: charting properties in the configurational space with explainable machine learning

Round 1

Manuscript submitted on 05 May 2022
 

12-Jul-2022

Dear Dr Butler:

Manuscript ID: DD-ART-05-2022-000038
TITLE: Spinel nitride solid solutions: charting properties in the configurational space with explainable machine learning

Thank you for your submission to Digital Discovery, published by the Royal Society of Chemistry. I sent your manuscript to reviewers and I have now received their reports which are copied below.

I have carefully evaluated your manuscript and the reviewers’ reports, and the reports indicate that major revisions are necessary.

Please submit a revised manuscript which addresses all of the reviewers’ comments. Further peer review of your revised manuscript may be needed. When you submit your revised manuscript please include a point by point response to the reviewers’ comments and highlight the changes you have made. Full details of the files you need to submit are listed at the end of this email.

Digital Discovery strongly encourages authors of research articles to include an ‘Author contributions’ section in their manuscript, for publication in the final article. This should appear immediately above the ‘Conflict of interest’ and ‘Acknowledgement’ sections. I strongly recommend you use CRediT (the Contributor Roles Taxonomy from CASRAI, https://casrai.org/credit/) for standardised contribution descriptions. All authors should have agreed to their individual contributions ahead of submission and these should accurately reflect contributions to the work. Please refer to our general author guidelines https://www.rsc.org/journals-books-databases/author-and-reviewer-hub/authors-information/responsibilities/ for more information.

Please submit your revised manuscript as soon as possible using this link:

*** PLEASE NOTE: This is a two-step process. After clicking on the link, you will be directed to a webpage to confirm. ***

https://mc.manuscriptcentral.com/dd?link_removed

(This link goes straight to your account, without the need to log on to the system. For your account security you should not share this link with others.)

Alternatively, you can login to your account (https://mc.manuscriptcentral.com/dd) where you will need your case-sensitive USER ID and password.

You should submit your revised manuscript as soon as possible; please note you will receive a series of automatic reminders. If your revisions will take a significant length of time, please contact me. If I do not hear from you, I may withdraw your manuscript from consideration and you will have to resubmit. Any resubmission will receive a new submission date.

The Royal Society of Chemistry requires all submitting authors to provide their ORCID iD when they submit a revised manuscript. This is quick and easy to do as part of the revised manuscript submission process.   We will publish this information with the article, and you may choose to have your ORCID record updated automatically with details of the publication.

Please also encourage your co-authors to sign up for their own ORCID account and associate it with their account on our manuscript submission system. For further information see: https://www.rsc.org/journals-books-databases/journal-authors-reviewers/processes-policies/#attribution-id

I look forward to receiving your revised manuscript.

Yours sincerely,
Dr Kedar Hippalgaonkar
Associate Editor, Digital Discovery
Royal Society of Chemistry

************


 
Reviewer 1

The authors present a DFT and ML-based approach to compute the energies and band gaps of GeSn2N4 spinel in a variety of cation configurations. Overall, I find the work to be technically sound but not sufficiently impactful or insightful to warrant consideration in Digital Discovery. While the methods used by the authors appear to be applied correctly, there is not a substantial advance presented over the state of the art for computing the configurational effects of inorganic crystals.

Essentially, the authors show that a rather typical cluster expansion is successful for predicting low-energy configurations of gamma-GeSn2N4 and the configurational dependence on bandgap. The authors have used ~1000 supercell calculations of Ge/Sn configurations in this crystal structure as inputs to the implemented models, so it is not surprising to me that they are able to develop effective models for the energy and band gap of these materials.

This work is interesting from a fundamental inorganic chemistry / materials science standpoint in terms of understanding cation disorder in nitride spinels (specifically this particular nitride spinel), but it does not present a significant advance in the space of data-driven computational techniques to characterize these materials. As such, my recommendation to the authors is to consider submission to an alternative journal (e.g., PCCP, JMCC, J Chem Phys, or similar).

Reviewer 2

The work studied the properties of a spinel nitride solid solution using density functional theory (DFT) and machine learning (ML). DFT calculations are used to evaluate the mixing energies and bandgaps (HSE). Three descriptors (CME, MBTR and CCFs) are examined with different ML models such as LR, GBDT and MLP. The main findings are that formation energies are dominated by variations in local crystal structure (a linear model is suggested) while bandgaps are dominated by more extended structural motifs (a non-linear model is suggested); that important configurations of GeSn2N4 with extreme properties are identified; and that the inversion degree influences the configuration energies of the spinel nitrides.

The paper is generally well-written. The DFT calculations and data preparation appear valid and accurate. The ML models trained on DFT results exhibit good performance, with MAEs of a few meV. The models show potential to predict the bandgap of this material with composition tuning. The paper is publishable if the authors can address the following concerns.

1. In the DFT part, is there any reason U=2 is chosen for the PBE+U calculations? As it seems it remains off from experimental band gap. Have the authors tried to fit the band gap of the materials by varying the U value?

2. Basic DFT parameters should be given in the method part, e.g., force and energy convergence criteria, size of supercell, etc.

3. More discussions should be given on how to generate the random configurations for DFT calculations and how to ensure the homogeneous distribution of different inversion degrees. How does the random approach differ from the SQS approach when generating the random configurations?

4. More descriptions of the ML models such as those parameters used should be given.

5. Could the model predict the bandgap if the input configuration of the material contains vacancy defects?

Reviewer 3

This is an interesting paper which pushes forward the methodology for studying solid solutions and disordered compounds such as these spinels. The number of arrangements of the cations is very large - usually too large even when the unit cell is moderately sized- for explicit evaluation of the properties of all the different arrangements. Thus one has in the past had to restrict the averaging to just some configurations - hopefully those low in energy. This paper shows how machine learning can offer a promising way forward to examine the whole configurational space, or at least does so for this particular system. The key result - the calculated inversion as a function of temperature - is in Figure 7b. The paper is generally well written and clear.

However it is not clear to me how degeneracies are dealt with in the equations on page 18?

Reviewer 4

## General Feedback

very interesting work!

## Strong points

- the interpretation of feature importance in sec 3.3 is well done and interesting
- especially the increased importance of higher body-order features for accurately predicting band gaps is a useful insight and definitely something the field should look into more
- the investigation of disorder from the full configurational energy spectrum
- especially the insight that rapid quenching vs slow annealing might be used to tune the spinel's band gap to the requirements of specific photovoltaic applications

## Suggestions for improvement

- the description of model hyperparameters is weak
- what non-linearity was used in the MLP?
- how many trees did the gradient-boosted regressor use?
- what were the loss functions and/or learning rates for each model?
- add a sentence to explain the selection process for the subset of 59 configurations for which HSE bandgap calculations were run
- did you randomly sample from the 1013 DFT relaxed configurations? or was there some selection process at play to maximize configurational diversity?

## Questions

- how was the max cluster order of 3 for the CCF descriptor chosen?

- is the following statement referring to fig. 6.a) really true (emphasis added)?
> The marked dependence of the mixing energy (**but not of the bandgap**) on the inversion degree is apparent from the plot.

from the plot it looks like there's a weaker but still non-zero correlation between band gap and inversion degree y. might be worth quantifying this statement by adding Pearson correlation coefficients for both mixing energy and band gap w.r.t. y to the text.

having read on, i think this is actually a crucial point for this work since one of the main conclusions states:

> Our combined DFT and ML model allows to predict that the bandgap of this solid solution can be potentially tuned via feasible modifications of the cation distribution in the system.

this statement requires that the band gap be a function of the inversion degree at least to some extent. else how would any effects of thermal history during synthesis on inversion degree be a potential tool for tuning the band gap?

- are there any indications, say prior work that could be cited to support the hypothesis in sec. 3.6 that using rapid quenching after synthesis would prevent the return to the energetically favored state of full inversion at room temperature? any prior knowledge of transition energy barriers towards full inversion?

- have the authors investigated the top-right blue point in fig. 6a? does looking at the structure give insights on why it is a large outlier i.t.o. mixing energy?

- if i understood correctly, fig. 7.b) uses Boltzmann averaging over all 4222-point configurations, 3/4 of which were predicted by the ML models? does the curve doing the same Boltzmann averaging but using only the 1013 DFT calculations used to train the ML models result in roughly the same curve?

## Nitpicks

- GitHub repo link under data availability is broken, links to `https://github.com/pablos-`
- repo does not have a `.gitignore` file to exclude cache files like `.ipynb_checkpoints`
- calling it 'gradient-boosted decision tree (GBDT)' might be confusing as it suggests you're using a single tree, not an ensemble model (esp. since you didn't mention the size of the ensemble)
- section numbering is off in places, 3.6 directly follows 3.3, there's no 3.4 or 3.5

## Style Suggestions

- turn table 2 into a heatmap of model errors (e.g. blue table cells for low error, red cells for high error)
- maybe show smaller super cells in fig. 6.b) for visual clarity

Reviewer 5

The manuscript titled "Spinel nitride solid solutions: charting properties in the configurational space with explainable machine learning" by Sánchez-Palencia, Hamad, Palacios, Grau-Crespo and Butler compares different ML descriptors and models. It is a very interesting topic, which is also quite relevant and important for the computational discovery of new materials. While I enjoy the concept and approaches, the manuscript is missing some information, which makes it difficult to follow exactly what the authors did. More specific points are given below.

page 5: It would be beneficial to add a figure illustrating the inversion degree for AB2X4.

page 6: As the work is on crystal structures with periodic boundary conditions, the descriptors mentioned (CME, MBTR and CCFs) should mention the cutoff diameters used. For example, CCFs can have a drastically different number of descriptors depending on the cut-off diameters for 2-, 3-, 4-body clusters. What were the cutoff size (N-body) and their diameters?

page 7: For LASSO, GBDT and MLP, please specify which software package (e.g., scikit-learn) was used.

page 8: It would be helpful if the authors provide some physical insights on why adding the inversion degree to the linear fit improves the cross-validation.

page 9: The description of how the mixing energy is calculated is a bit vague. Could the authors elaborate on how it is done in a more precise manner?

page 11: It was not immediately clear that Figure 3 was referring to the CCF-based model. Please be specific in the discussion and figure caption.

page 11: The sentence "it can be seen that errors in the prediction of bandgaps are mainly concentrated in extremal configurations..." is unclear to me. Please elaborate what is meant by the extremal configurations.

page 12 & 13: It is not explained which model (MLP, GBDT or LR) is used for the SHAP analysis, making it difficult to understand or reproduce the reported results. Please clearly specify which model is used for Fig. 4b and c.

page 14: "... the next clusters in terms of relevance are clusters from 5 to 9, all clusters of order 2" It was mentioned on page 13 that the site-pair clusters are from 3 to 9. Please double-check.

page 14: I am only guessing that the authors used LR for mixing energies and MLP for bandgap for Figure 4 based on the values given in Table 2 (no explanation given). Based on this assumption, I would argue that SHAP analysis on mixing energy is not really needed as one can look at the ECI values from Fig. 4a to deduce the same information (it is essentially what people do when constructing and analyzing cluster expansion models). If the authors used MLP for Fig. 4c, they should also include the coefficient values for LR fitted to bandgap (just like Fig. 4a but coefficients found for bandgaps). The discussion would be much more meaningful when comparing the coefficient values for LR (and their importance deduced from it) and the SHAP analysis for MLP.

page 14: "It is interesting to see too that the difference between the most important clusters and the following ones is not as pronounced.." This sentence is unclear. What do you refer to when saying difference? Is it the difference in SHAP values? Also, what do you mean by the "following ones"?

page 14 & 15: Is the covariance matrix in Fig. 5 based on the coefficient values shown in Fig. 4a (for mixing energy)? Please elaborate on how you got the matrix.

page 15: "We now illustrate the usefulness of being able to evaluate properties in the full configurational space..." I believe evaluating the full configurational space will be almost impossible, unless you are limiting the realizable configurational space to something relatively small, say 10x10x10 supercell. If you are really after the "full configurational space", you will face the combinatorial explosion rather quickly even for a relatively simple system you are investigating.

page 18 & 19: I am a bit puzzled to read this part of the manuscript. The authors essentially demonstrated that the cluster expansion model outperforms other methods (with an exception of CCF + MLP for bandgap, although the CCF + LR is quite similar). Once that comparison was done, the rest of the manuscript is essentially a discussion of the cluster expansion model. Then, why don't the authors use the constructed cluster expansion model to perform a Monte Carlo simulation on a much larger cell (e.g., 20x20x20 cell) and perform a rigorous sampling, either as a manual search or simulated annealing? I don't fully understand exactly what work went into Figure 7 because it was not discussed (which the authors are advised to elaborate). However, the caption says that it is based on the 4222-point configurational space. This may sound like a lot, but it is only a space covered by a unit cell. Many cluster expansion-based analyses (which the manuscript is doing) perform a much more rigorous sampling because the finite size effect can be quite substantial. As the authors claim, the true power of the ML (or cluster expansion) model is in its accelerated speed, so it does not make much sense to me to restrict the space to only 4222 configurations, where one can easily cover more than millions of configurations.

page 20: "...which means that clusters expansions of the bandgap, even online ones, require a very large cluster basis." The clusters used by the authors (up to three-body clusters with diameters up to 6.26 Å) are not considered large cluster basis at all. It is quite typical to have such settings, and it is not uncommon to have even higher thresholds (say four- or five-body clusters with diameters up to 7 Å or higher) for complex systems, even when bandgaps are not considered.

Reviewer 6

Pablo Sánchez-Palencia and colleagues present an intriguing study of the influence of configuration space on the properties of solid solutions by combining density functional theory and machine learning with a single GeSn2N4 composition. The idea of investigating solid solution property variation as a function of cation distribution in a single composition rather than engineering properties through composition changes appears very promising. We believe that the methodology proposed is easily transferable to other solid solutions and can tune target properties in a fraction of the computational time. The authors validate the approach's utility by predicting band gaps and mixing energies with remarkable accuracy using various cutting-edge ML models with descriptors such as CCF, MBTR, and CME.

The manuscript is undoubtedly well written and appealing, but we have some concerns about the data and codes' reproducibility. After implementing our suggested changes, I believe that the work will be suitable for acceptance. I also note here that I was not able to assess the Supplementary Information as it was not provided.
I now go through the checklist of the journal and add some more detailed comments below.

1) Data Sources
1a. Are all data sources listed and publicly available?
No. The raw data from VASP calculations are not provided publicly. Only final extracted data is provided in the form of CSV. It would be great if the authors could provide these data files and a script to extract data used for ML models. Providing a version number for VASP would be important as well. One can upload the most important data on zenodo.org.

1b. If using an external database, is an access date or version number provided?
Not applicable

1c. Are any potential biases in the source dataset reported and/or mitigated?
Not applicable

2) Data cleaning

2a. Are the data cleaning steps clearly and fully described, either in text or
as a code pipeline?
Not applicable


2b. Is an evaluation of the amount of removed source data presented?
No.
The authors present a very nice correlation between GGA+U and HSE band-gaps, however, it is not really clear on what criteria the set of 59 structures is selected for creating the LR model for predicting HSE band-gaps. Maybe some additional information about this selection would be helpful. Could you provide the code for the selection?

2c. Are instances of combining data from multiple sources clearly identified,
and potential issues mitigated?
Not applicable

3) Data representations

3a. Are methods for representing data as features or descriptors clearly
articulated, ideally with software implementations?

No.
Software implementation scripts for generating descriptors do not have correct text descriptions (scope) at the top. It would be great if the descriptive text were updated accordingly. It would also be great to add more comments describing what exactly is implemented. Also, no script is provided for obtaining cluster correlation functions (CCF) descriptors. No version numbers are provided which will hinder reproducibility. Absolute paths also hinder fast reproducibility.


3b. Are comparisons against standard feature sets provided?
Not applicable


4) Model choice
4a. Is a software implementation of the model provided such that it can be trained and tested with new data?
Yes.


4b. Are baseline comparisons to simple/trivial models (for example, 1-nearest neighbour, random forest, most frequent class) provided?
Yes. LR comparisons are provided

4c. Are baseline comparisons to current state-of-the-art provided? Yes

5) Model training and validation
5a. Does the model clearly split data into different sets for training (model selection), validation (hyperparameter optimization), and testing (final evaluation)?
No
Agreed that the authors split the data into train-validation-test sets in an 80-10-10 ratio, but I would have expected more information on the hyperparameter selection. The authors partially refer to earlier work, but I would have expected that the selection is shown based on the validation set.

5b. Is the method of data splitting (for example, random, cluster- or time- based splitting, forward cross-validation) clearly stated? Does it mimic anticipated real-world application?
Yes
5c. Does the data splitting procedure avoid data leakage (for example, is the same composition present in the training and test sets)?
Yes

6) Code and reproducibility
6a. Is the code or workflow available in a public repository? Yes
However, the code should be archived (e.g., on zenodo.org) and it should have a version number.
6b. Are scripts to reproduce the findings in the paper provided?
No.
E.g., Scripts to reproduce Figures 1b, 2a, 2b, 6a,7a, and 7b are not included.

In the Jupyter notebooks for LR and MLP models (GeSn2N4_ML/blob/main/ML_models /*.ipynb ), the purpose of the last code block is not clear to me. If that code block is known to not work without a loaded pickle file, would it not be better to load the necessary pickle file with the required data to generate the plot?

General comments:
1) It would be great to provide a requirements.txt file with version numbers that could be used to install the necessary python packages to run the scripts easily. Please add all version numbers.
2) The code provided contains absolute paths and is not OS independent. Thus, it cannot be used directly; one needs to modify the scripts to get it to work. It would be much better if one used the 'os' module of python to include relative paths.
3) It would be much better to have another script that reproduces the plots as presented in the paper rather than having it scattered around in different scripts.
4) Supplementary Information is currently missing but referenced in the manuscript.
5) Code needs to be saved in a long-term archive such as zenodo.org and it should have clear versioning.
6) Please provide the VASP raw data.
7) Hyperparameter selection is not clear and more comments could be added to the code to reproduce the results from the paper. Missing scripts should also be included.

Overall, I am suggesting a minor revision to provide the missing information and update code and data accordingly.


 

Dear Dr. Hippalgaonkar,
We would like to resubmit our manuscript after having considered and responded to the reviews below. This has been a very thorough review process and we would like to express gratitude to all of the reviewers who have given their time to the process. We feel that thanks to their efforts and expertise the paper is now in an even stronger position.

In the uploaded document we go through all reviews, addressing each point raised. Our responses are in blue, the reviews are in black. In the manuscript we have also highlighted changes in blue. We have also indicated in our responses where in the manuscript the appropriate changes can be found.

Yours sincerely

Keith Butler

This text has been copied from the PDF response to reviewers and does not include any figures, images or special characters:

Referee: 1

Comments to the Author
The authors present a DFT and ML-based approach to compute the energies and band gaps of GeSn2N4 spinel in a variety of cation configurations. Overall, I find the work to be technically sound but not sufficiently impactful or insightful to warrant consideration in Digital Discovery. While the methods used by the authors appear to be applied correctly, there is not a substantial advance presented over the state of the art for computing the configurational effects of inorganic crystals.
Essentially, the authors show that a rather typical cluster expansion is successful for predicting low-energy configurations of gamma-GeSn2N4 and the configurational dependence on bandgap. The authors have used ~1000 supercell calculations of Ge/Sn configurations in this crystal structure as inputs to the implemented models, so it is not surprising to me that they are able to develop effective models for the energy and band gap of these materials.
This work is interesting from a fundamental inorganic chemistry / materials science standpoint in terms of understanding cation disorder in nitride spinels (specifically this particular nitride spinel), but it does not present a significant advance in the space of data-driven computational techniques to characterize these materials. As such, my recommendation to the authors is to consider submission to an alternative journal (e.g., PCCP, JMCC, J Chem Phys, or similar).
We are glad that the reviewer agrees with the technical validity of our work. We do not agree about the impact of this work, however, and note that five other reviewers agree with us. While it is true that the method that ends up giving the best result in the case of the energy prediction is the standard cluster expansion, it is not clear a priori that this must be the case. In fact, it was not the case for the bandgap prediction (where the optimal model is non-linear), and in previous work it was not the case even for the total energies (we recently found that Coulomb matrices performed better than cluster correlation functions as descriptors to predict configuration energies in a more ionic solid solution, MgO-ZnO). Given the huge computational cost of predicting properties in the configurational space of alloys, and the potential of machine learning techniques to vastly accelerate that task, we believe that our study of which ML models and descriptors are optimal for each property is important. We have also not seen much work in materials science applying methods from explainable machine learning to extract physical insights from ML models – in that respect we think that this work is rather novel and shows the utility of a tool from the computer science community for work in materials science.
Referee: 2

Comments to the Author
The work studied the properties of a spinel nitride solid solution using density functional theory (DFT) and machine learning (ML). DFT calculations are used to evaluate the mixing energies and bandgaps (HSE). Three descriptors (CME, MBTR and CCFs) are examined with different ML models such as LR, GBDT and MLP. The main findings are that formation energies are dominated by variations in local crystal structure (a linear model is suggested) while bandgaps are dominated by more extended structural motifs (a non-linear model is suggested); that important configurations of GeSn2N4 with extreme properties are identified; and that the inversion degree influences the configuration energies of the spinel nitrides.
The paper is generally well-written. The DFT calculations and data preparation appear valid and accurate. The ML models trained on DFT results exhibit good performance, with MAEs of a few meV. The models show potential to predict the bandgap of this material with composition tuning. The paper is publishable if the authors can address the following concerns.
Thank you to the reviewer for taking the time to read and assess our work. We are glad that you find the work to be of interest and publishable. We are grateful to you for having picked up on some missing details and for offering interesting thoughts on how this kind of model may be applied in future.
1. In the DFT part, is there any reason U=2 is chosen for the PBE+U calculations? As it seems it remains off from experimental band gap. Have the authors tried to fit the band gap of the materials by varying the U value?
It is right that we should be clearer about this choice. An extensive number of preliminary calculations were performed to find the best methodology to describe this system. From those we saw that PBE+U followed by HSE calculations was the best option to describe the system in an affordable way in terms of computational resources. A search for the best U value was carried out, looking not only for optimal bandgaps but also for sensible geometries, and taking care to avoid distortions of the relative positions of the different levels in the electronic configuration. That analysis led us to choose a moderate U value. Note that the HSE calculation done at the PBE+U (with Ueff = 2 eV) geometry does provide accurate band gaps in comparison with experiment. That outcome is more important than achieving accurate band gaps directly from PBE+U, since we can in any case transform PBE+U gaps to HSE gaps very easily, as demonstrated in our work. A detailed table with lattice parameters and bandgap values for the different U values tested has been included in the supplementary material as Table S1.
2. Basic DFT parameters should be given in the method part, e.g., force and energy convergence criteria, size of supercell, etc.
Force and energy convergence criteria have been included in section 2.1 as suggested. A more detailed description of the cell used has been included at the beginning of section 2.2; it is not a standard supercell but the conventional cubic unit cell for this structure.
3. More discussions should be given on how to generate the random configurations for DFT calculations and how to ensure the homogeneous distribution of different inversion degrees. How does the random approach differ from the SQS approach when generating the random configurations?
Thank you for raising this point. We have clarified that the configurations for the construction of models for predicting mixing energy and bandgap from configuration were chosen at random, uniformly across the inversion degrees. The configurations for the GGA -> HSE band gap conversion were sampled uniformly spaced across the GGA band gap range. This procedure is explained accordingly at the end of section 2.2.
4. More descriptions of the ML models such as those parameters used should be given.
Some more details on the parametrization of the models have been added in section 2.3, where those are defined. More complete information about the models and their parameters can be found in Table S2, which has been added to the supplementary material.
5. Could the model predict the bandgap if the input configuration of the material contains vacancy defects?
This is an interesting thought for future work. It is unlikely that the current models, with no modifications, would be able to deal with a scenario that is so far out of the training distribution. For example, one may have defects that result in mid-gap states, but the model has seen neither the defect geometry nor mid-gap states and would therefore almost certainly break down in that scenario. However, extending the methods to look at, for example, vacancies is a very interesting direction and would certainly be worth exploring.
Referee: 3

Comments to the Author
This is an interesting paper which pushes forward the methodology for studying solid solutions and disordered compounds such as these spinels. The number of arrangements of the cations is very large - usually too large even when the unit cell is moderately sized- for explicit evaluation of the properties of all the different arrangements. Thus one has in the past had to restrict the averaging to just some configurations - hopefully those low in energy. This paper shows how machine learning can offer a promising way forward to examine the whole configurational space, or at least does so for this particular system. The key result - the calculated inversion as a function of temperature - is in Figure 7b. The paper is generally well written and clear.
Thank you for your positive assessment of our paper. We are really glad that you are able to see the impact of this work and that the key results are of interest to you. We are also grateful to you for picking up on the omission of a degeneracy factor from the equations highlighted.
However it is not clear to me how degeneracies are dealt with in the equations on page 18?
The probability expression should be multiplied by the degeneracy; this is corrected in the revised manuscript. Only the expression typed in the manuscript was wrong: the analysis had been done correctly, as the correct equation is implemented in SOD.
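For reference, the degeneracy-weighted probability of an independent configuration $m$ typically takes the form below (a sketch of the standard configurational-ensemble expression, not a transcription of the manuscript's equation), with $\Omega_m$ the degeneracy and $E_m$ the energy of configuration $m$:

$$
P_m = \frac{\Omega_m \, e^{-E_m / k_\mathrm{B} T}}{\sum_n \Omega_n \, e^{-E_n / k_\mathrm{B} T}}
$$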

Referee: 4

Comments to the Author
## General Feedback
very interesting work!
Many thanks to the reviewer for taking the time to read our work – we are glad that you have enjoyed it. We would also like to thank you for raising several interesting and thoughtful points regarding the results that we have presented. We have addressed the points below.
## Strong points
- the interpretation of feature importance in sec 3.3 is well done and interesting
- especially the increased importance of higher body-order features for accurately predicting band gaps is a useful insight and definitely something the field should look into more
- the investigation of disorder from the full configurational energy spectrum
- especially the insight that rapid quenching vs slow annealing might be used to tune the spinel's band gap to the requirements of specific photovoltaic applications
## Suggestions for improvement
- the description of model hyperparameters is weak
Some more details on the parametrization of the models have been added in section 2.3, where those are defined. More complete information about the models and their parameters can be found in Table S2, which has been added to the supplementary material.
- what non-linearity was used in the MLP?
The different activation functions used for each layer of the MLP have been specified in section 2.3, where that model is introduced. Details on the non-linearity of the MLP model can also be found in Table S2 in the supplementary material.
- how many trees did the gradient-boosted regressor use?
The number of trees or estimators used for the GBDT model has been specified in section 2.3, where that model is described for the first time. More details on the architecture of the GBDT model can be found in Table S2 in the supplementary material.
- what were the loss functions and/or learning rates for each model?
The loss function used for each model has been specified in section 2.3, where the models are defined.
Details on that and the learning rates used can also be found in Table S2 in the supplementary material.
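As an illustration of the kind of specification now given in Section 2.3 and Table S2, a scikit-learn setup might look like the sketch below; every hyperparameter value shown is a placeholder for illustration, not a value used in the paper:

```python
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.neural_network import MLPRegressor

# Placeholder hyperparameters for illustration only; the values actually used
# in the study are those reported in Section 2.3 and Table S2.
gbdt = GradientBoostingRegressor(
    n_estimators=500,       # number of trees in the ensemble (placeholder)
    learning_rate=0.05,     # shrinkage applied to each tree (placeholder)
    loss="squared_error",   # loss function minimised by the booster
)

mlp = MLPRegressor(
    hidden_layer_sizes=(64, 64),   # two hidden layers (placeholder sizes)
    activation="relu",             # non-linearity in the hidden layers
    learning_rate_init=1e-3,       # initial learning rate (placeholder)
)
```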
- add a sentence to explain the selection process for the subset of 59 configurations for which HSE bandgap calculations were run
Thank you for raising this – we have added a description of the sampling procedure to the end of section 2.2.
- did you randomly sample from the 1013 DFT relaxed configurations? or was there some selection process at play to maximize configurational diversity?
The sample of 1013 configurations to calculate with DFT was chosen randomly within each inversion degree. The number of configurations for each inversion degree was decided by keeping the ratios of inversion degrees to one another constant, which ensures representative configurational diversity.
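A minimal sketch of this kind of stratified random sampling is shown below; `configs_by_y` and the function name are hypothetical, introduced only to illustrate the procedure described above:

```python
import random

# configs_by_y: dict mapping each inversion degree y to the list of
# symmetry-distinct configurations available at that y (hypothetical layout).
def stratified_sample(configs_by_y, n_total, seed=0):
    """Draw a random sample whose ratios between inversion degrees mirror
    the full configurational space (illustrative sketch only)."""
    rng = random.Random(seed)
    total = sum(len(c) for c in configs_by_y.values())
    sample = []
    for y, configs in configs_by_y.items():
        n_y = max(1, round(n_total * len(configs) / total))
        sample.extend(rng.sample(configs, min(n_y, len(configs))))
    return sample
```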
## Questions
- how was the max cluster order of 3 for the CCF descriptor chosen?
We found that using order-3 CCF descriptors gave errors below the inherent errors of the DFT training data; higher orders would likely lead to overfitting to noise.
- is the following statement referring to fig. 6.a) really true (emphasis added)?
> The marked dependence of the mixing energy (but not of the bandgap) on the inversion degree is apparent from the plot.
from the plot it looks like there's a weaker but still non-zero correlation between band gap and inversion degree y. might be worth quantifying this statement by adding Pearson correlation coefficients for both mixing energy and band gap w.r.t. y to the text.
having read on, i think this is actually a crucial point for this work since one of the main conclusions states:
> Our combined DFT and ML model allows to predict that the bandgap of this solid solution can be potentially tuned via feasible modifications of the cation distribution in the system.
this statement requires that the band gap be a function of the inversion degree at least to some extent. else how would any effects of thermal history during synthesis on inversion degree be a potential tool for tuning the band gap?
This is a completely valid point. The statement about the bandgap not depending on the inversion degree was incorrect. We have corrected the text in section 3.4 to make this a more nuanced statement, reflecting the fact that there is indeed a weak inverse correlation between inversion degree and bandgap.
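Should it be useful, the correlation suggested by the reviewer could be quantified along these lines (a sketch only; the function and the array names `y`, `e_mix` and `band_gap` are hypothetical):

```python
import numpy as np
from scipy.stats import pearsonr

def correlations_with_inversion(y, e_mix, band_gap):
    """Pearson correlations of mixing energy and band gap with inversion
    degree y (1-D arrays over the sampled configurations; names are
    hypothetical, for illustration only)."""
    r_energy, _ = pearsonr(np.asarray(y), np.asarray(e_mix))
    r_gap, _ = pearsonr(np.asarray(y), np.asarray(band_gap))
    return r_energy, r_gap
```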
- are there any indications, say prior work that could be cited to support the hypothesis in sec. 3.6 that using rapid quenching after synthesis would prevent the return to the energetically favored state of full inversion at room temperature? any prior knowledge of transition energy barriers towards full inversion?
We have added some references to back this up, first sentence of the final paragraph of section 3.4.
- have the authors investigated the top-right blue point in fig. 6a? does looking at the structure give insights on why it is a large outlier i.t.o. mixing energy?
This is an interesting observation. The point in question is actually the direct spinel with no inversion; it is one of the least stable configurations and has one of the highest bandgaps – in contrast to the structure with y=0.5, which has the smallest bandgap but is also among the least stable configurations. We have added this observation to the text, at the end of paragraph 3 of section 3.4.
- if i understood correctly, fig. 7.b) uses Boltzmann averaging over all 4222-point configurations, 3/4 of which were predicted by the ML models? does the curve doing the same Boltzmann averaging but using only the 1013 DFT calculations used to train the ML models result in roughly the same curve?
This is another good point. Yes, using only the DFT points does give roughly similar results. The discrepancies in the calculated equilibrium inversion degree, with respect to using all 4222 configurations, are all less than 4%. This, of course, may not always be the case, and we also think that the value of the ML approach lies in the granularity that it allows for exploration of the configurational space, and in maps such as Fig. 8a that it can generate.
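For reference, the kind of Boltzmann average behind the inversion-degree curve can be sketched as follows (assuming degeneracy-weighted probabilities $P_m(T)$ of the standard form and an inversion degree $y_m$ for each independent configuration $m$; this illustrates the idea rather than reproducing the manuscript's exact equations):

$$
\langle y \rangle(T) = \sum_m y_m \, P_m(T)
= \frac{\sum_m y_m \, \Omega_m \, e^{-E_m / k_\mathrm{B} T}}{\sum_n \Omega_n \, e^{-E_n / k_\mathrm{B} T}}
$$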
## Nitpicks
- GitHub repo link under data availability is broken, links to `https://github.com/pablos-`
The link for the repository has been updated and now it works. We apologize for the inconvenience.
- repo does not have a `.gitignore` file to exclude cache files like `.ipynb_checkpoints`
A `.gitignore` file has been included in the repository.
- calling it 'gradient-boosted decision tree (GBDT)' might be confusing as it suggests you're using a single tree, not an ensemble model (esp. since you didn't mention the size of the ensemble)
The name of the GBDT model has been changed to gradient-boosted decision trees to reflect that several estimators have been used. The number of trees or estimators has also been included in section 2.3, where the model is described.
- section numbering is off in places, 3.6 directly follows 3.3, there's no 3.4 or 3.5
Thanks, and apologies for this.
## Style Suggestions
- turn table 2 into a heatmap of model errors (e.g. blue table cells for low error, red cells for high error)
Table 2 has been turned into a heat map with the errors of the different models and descriptors, as suggested. It can be found in the same place in the manuscript, now as Figure 3. Nice idea, thanks.
- maybe show smaller super cells in fig. 6.b) for visual clarity
The size of the cells used in Figure 6b has been halved as suggested.
Referee: 5

Comments to the Author
The manuscript titled "Spinel nitride solid solutions: charting properties in the configurational space with explainable machine learning" by Sánchez-Palencia, Hamad, Palacios, Grau-Crespo and Butler compares different ML descriptors and models. It is a very interesting topic, which is also quite relevant and important for the computational discovery of new materials. While I enjoy the concept and approaches, the manuscript is missing some information, which makes it difficult to follow exactly what the authors did. More specific points are given below.
Thank you very much to the reviewer for taking the time to give a thorough review of our work. We are glad that you enjoyed the paper and very grateful that you have raised the important details highlighted below. We go through the points in turn and are confident that we have addressed the questions raised; it is very valuable to have such detailed feedback about the work.
page 5: It would be beneficial to add a figure illustrating the inversion degree for AB2X4.
A new figure has been included in the manuscript to support the explanation of the inversion degree in section 2.2. The new Figure 1 includes a graphical example of a direct and an inverse spinel with the same colour scheme used for Figure 6b.
page 6: As the work is on crystal structures with periodic boundary conditions, the descriptors mentioned (CME, MBTR and CCFs) should mention the cutoff diameters used. For example, CCFs can have a drastically different number of descriptors depending on the cut-off diameters for 2-, 3-, 4-body clusters. What were the cutoff size (N-body) and their diameters?
Thank you for mentioning this. None of the descriptors that we apply here uses PBCs; they all depend on the cell size used for the calculation. We have clarified this in the text (second paragraph of Section 2.3). The supercell that we use is now described at the start of Section 2.2.
page 7: For LASSO, GBDT nad MLP, please specify which software package (e.g., scikit-learn) was used.
This has been added to Table S2, together with details on the hyperparameters.
page 8: It would be helpful if the authors provide some physical insights on why adding the inversion degree to the linear fit improves the cross-validation.
When doing the simple regression without considering the inversion degree, we found that we were underestimating the correction for lower inversion degrees and overestimating it for higher degrees. Therefore we decided to include the inversion degree in the regression. It is not surprising that including the inversion degree improves the correlation. The difference between HSE and GGA+U mostly affects the cation d-levels, and the d-levels also depend on the coordination environment; the ligand fields for the tetrahedral and octahedral sites are very different. The inversion degree captures information about the distribution of coordination environments and therefore improves the model for describing how HSE affects the band gap. We have added this discussion to the text immediately after the second equation in section 3.1.
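A plausible form of such a fit, with the inversion degree $y$ as an additional regressor, is sketched below; the coefficients $a$, $b$ and $c$ are fitted, and this is an illustration of the idea rather than the exact equation used in the manuscript:

$$
E_g^{\mathrm{HSE}} \;\approx\; a \, E_g^{\mathrm{GGA}+U} + b \, y + c
$$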
page 9: The description of how the mixing energy is calculated is a bit vague. Could the authors elaborate on how it is done in a more precise manner?
We have clarified this – Section 3.2 paragraph 1.
page 11: It was not immediately clear that Figure 3 was referring to the CCF-based model. Please be specific in the discussion and figure caption.
At the end of page 11 we have added a clause and clarified this point in the text.
page 11: The sentence "it can be seen that errors in the prediction of bandgaps are mainly concentrated in extremal configurations..." is unclear to me. Please elaborate what is meant by the extremal configurations.
The reference to extremal configurations in section 3.2 has been replaced by a clearer sentence specifying which configurations these are. The new paragraph states: “it can be seen that errors in the prediction of bandgaps are mainly concentrated at the lower ends of the inversion degree and band gap ranges, where the model is less accurate because of the lack of similar structures for training.”
page 12 & 13: It is not explained which model (MLP, GBDT or LR) is used for the SHAP analysis, making it difficult to understand or reproduce the reported results. Please clearly specify which model is used for Fig. 4b and c.
We have clarified this in the first paragraph of Section 3.3. The LR model is used for the formation energy and the MLP model is used for the band gap, in accordance with their performance metrics.
page 14: "... the next clusters in terms of relevance are clusters from 5 to 9, all clusters of order 2" It was mentioned on page 13 that the site-pair clusters are from 3 to 9. Please double-check.
There was ambiguity in the way this was phrased. We have corrected it in paragraph 2 of section 3.3: “are clusters from 5 to 9, all clusters of order two” -> “are clusters from 5 to 9, which are clusters of order two”.
page 14: I am only guessing that the authors used LR for mixing energies and MLP for bandgap for Figure 4 based on the values given in Table 2 (no explanation given). Based on this assumption, I would argue that SHAP analysis on mixing energy is not really needed as one can look at the ECI values from Fig. 4a to deduce the same information (it is essentially what people do when constructing and analyzing cluster expansion models). If the authors used MLP for Fig. 4c, they should also include the coefficient values for LR fitted to bandgap (just like Fig. 4a but coefficients found for bandgaps). The discussion would be much more meaningful when comparing the coefficient values for LR (and their importance deduced from it) and the SHAP analysis for MLP.
The reviewer is correct that 4b is based on an LR model (hopefully we have clarified this with a previous correction). Also, it is true that to some extent SHAP is over the top for this, but we wanted to present an analysis of both models on the same footing. We can see, in fact, that the SHAP values are of the same order as the LR model coefficients.
page 14: "It is interesting to see too that the difference between the most important clusters and the following ones is not as pronounced.." This sentence is unclear. What do you refer to when saying difference? Is it the difference in SHAP values? Also, what do you mean by the "following ones"?
This was not clear in the original manuscript, we have clarified it:
It is interesting to see too that the difference between the most important clusters and the following ones is not as pronounced in bandgaps as in mixing energies predictions. -> It is also interesting to note that the weight of importance is not skewed so strongly to just a few important features when considering band gap predictions (as opposed to mixing energy predictions), suggesting that the contributions from features are more evenly distributed over the full set for predicting band gaps.
Section 3.3 paragraph 3.
page 14 & 15: Is the covariance matrix in Fig. 5 based on the coefficient values shown in Fig. 4a (for mixing energy)? Please elaborate on how you got the matrix.
The covariance matrix is not based on the linear coefficients presented in Figure 6a (previously Figure 4a) but on the values of the CCFs themselves. We have made explicit how this covariance matrix is calculated by including the equation for its elements in paragraph 4 of section 3.3.
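For orientation, a covariance matrix built from the CCF values themselves would have elements of the usual form below, where $X_{m,i}$ is the value of correlation function $i$ in configuration $m$ and $\bar{X}_i$ its mean over the $N$ sampled configurations (a sketch of the standard definition; the exact expression used is the one given in the revised Section 3.3):

$$
C_{ij} = \frac{1}{N} \sum_{m=1}^{N} \bigl(X_{m,i} - \bar{X}_i\bigr)\bigl(X_{m,j} - \bar{X}_j\bigr)
$$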
page 15: "We now illustrate the usefulness of being able to evaluate properties in the full configurational space..." I believe evaluating the full configurational space will be almost impossible, unless you are limiting the realizable configurational space to something relatively small, say 10x10x10 supercell. If you are really after the "full configurational space", you will face the combinatorial explosion rather quickly even for a relatively simple system you are investigating.
This is true; we should have been more specific and stated that we mean the full configurational space for the particular cell size we used. We have clarified this in the first sentence of section 3.4.
page 18 & 19: I am a bit puzzled to read this part of the manuscript. The authors essentially demonstrated that the cluster expansion model outperforms other methods (with an exception of CCF + MLP for bandgap, although the CCF + LR is quite similar). Once that comparison was done, the rest of the manuscript is essentially a discussion of the cluster expansion model. Then, why don't the authors use the constructed cluster expansion model to perform a Monte Carlo simulation on a much larger cell (e.g., 20x20x20 cell) and perform a rigorous sampling, either as a manual search or simulated annealing? I don't fully understand exactly what work went into Figure 7 because it was not discussed (which the authors are advised to elaborate). However, the caption says that it is based on the 4222-point configurational space. This may sound like a lot, but it is only a space covered by a unit cell. Many cluster expansion-based analyses (which the manuscript is doing) perform a much more rigorous sampling because the finite size effect can be quite substantial. As the authors claim, the true power of the ML (or cluster expansion) model is in its accelerated speed, so it does not make much sense to me to restrict the space to only 4222 configurations, where one can easily cover more than millions of configurations.
While the energy model, as a linear cluster expansion, can in principle be transferred to larger supercells, this cannot be done with the bandgap model (even if it were linear, which is not the case here). Constraining our analysis to the configurational space of the cubic unit cell (with 4222 symmetrically distinct configurations) allows us to cross-analyse the behaviour of energies and bandgaps. Furthermore, because our focus was on the performance of the models at reproducing results in the unit-cell configurational ensemble, we did not develop the cluster expansion in a way that is rigorously transferable to larger supercells (this requires a cutoff radius that is small in comparison with the cell dimension). We now note in the paper that supercell size effects have not been accounted for, but could, in principle, be included; final sentence of section 3.4.
page 20: "...which means that clusters expansions of the bandgap, even online ones, require a very large cluster basis." The clusters used by the authors (up to three-body clusters with diameters up to 6.26 Å) are not considered large cluster basis at all. It is quite typical to have such settings, and it is not uncommon to have even higher thresholds (say four- or five-body clusters with diameters up to 7 Å or higher) for complex systems, even when bandgaps are not considered.
We have amended this text in line with the reviewer’s comments: “require a very large cluster basis” -> “require a cluster basis of at least order three” (paragraph 2 of the Conclusions).
Referee: 6


Comments to the Author
Pablo Sánchez-Palencia and colleagues present an intriguing study of the influence of configuration space on the properties of solid solutions by combining density functional theory and machine learning with a single GeSn2N4 composition. The idea of investigating solid solution property variation as a function of cation distribution in a single composition rather than engineering properties through composition changes appears very promising. We believe that the methodology proposed is easily transferable to other solid solutions and can tune target properties in a fraction of the computational time. The authors validate the approach's utility by predicting band gaps and mixing energies with remarkable accuracy using various cutting-edge ML models with descriptors such as CCF, MBTR, and CME.
The manuscript is undoubtedly well written and appealing, but we have some concerns about the data and codes' reproducibility. After implementing our suggested changes, I believe that the work will be suitable for acceptance. I also note here that I was not able to assess the Supplementary Information as it was not provided.
Thank you for the very high-quality code review. We are also glad that you enjoyed the main body of the paper. Code review can be a thankless task, so it is very valuable to us that you have taken the time to do this – we are confident that, thanks to the excellent review, the study is now easier to reproduce with less effort; this is the cornerstone of the Open Science movement!
I now go through the checklist of the journal and add some more detailed comments below.
1) Data Sources
1a. Are all data sources listed and publicly available?
No. The raw data from VASP calculations are not provided publicly. Only final extracted data is provided in the form of CSV. It would be great if the authors could provide these data files and a script to extract data used for ML models. Providing a version number for VASP would be important as well. One can upload the most important data on zenodo.org.
A zenodo repository has been created containing all the OUTCAR files from which the total energies and bandgap DFT values were obtained, as well as the bash scripts to extract those with the folder structure used for the calculations.
1b. If using an external database, is an access date or version number provided?
Not applicable
1c. Are any potential biases in the source dataset reported and/or mitigated?
Not applicable
2) Data cleaning
2a. Are the data cleaning steps clearly and fully described, either in text or as a code pipeline?
Not applicable
2b. Is an evaluation of the amount of removed source data presented?
No.
The authors present a very nice correlation between GGA+U and HSE band-gaps, however, it is not really clear on what criteria the set of 59 structures is selected for creating the LR model for predicting HSE bandgaps. Maybe some additional information about this selection would be helpful. Could you provide the code for the selection?
We have clarified that the configurations for the construction of models for predicting mixing energy and bandgap from configuration were chosen at random, uniformly across the inversion degrees. The configurations for the GGA -> HSE band gap conversion were sampled uniformly spaced across the GGA band gap range. This procedure is explained accordingly at the end of section 2.2.
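A minimal sketch of this kind of uniformly spaced selection across the GGA band gap range is shown below; the helper name and the `gga_gaps` array are hypothetical, for illustration only:

```python
import numpy as np

def select_uniform_in_gap(gga_gaps, n_select=59):
    """Pick configuration indices whose GGA+U band gaps lie closest to
    n_select evenly spaced targets across the observed gap range
    (illustrative sketch only)."""
    gga_gaps = np.asarray(gga_gaps)
    targets = np.linspace(gga_gaps.min(), gga_gaps.max(), n_select)
    chosen = []
    for target in targets:
        # take the nearest configuration not already selected
        order = np.argsort(np.abs(gga_gaps - target))
        chosen.append(next(i for i in order if i not in chosen))
    return chosen
```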
2c. Are instances of combining data from multiple sources clearly identified, and potential issues mitigated?
Not applicable
3) Data representations
3a. Are methods for representing data as features or descriptors clearly articulated, ideally with software implementations?
No.
Software implementation scripts for generating descriptors do not have correct text descriptions (scope) at the top. It would be great if the descriptive text were updated accordingly. It would also be great to add more comments describing what exactly is implemented. Also, no script is provided for obtaining cluster correlation functions (CCF) descriptors. No version numbers are provided which will hinder reproducibility. Absolute paths also hinder fast reproducibility.
Thanks for highlighting this. We have amended the code headers and have also included a new directory `cell_code` that contains the code used to generate the ccfs. We have changed to relative paths in some places, in the hope that this may help quicker reproducibility.
3b. Are comparisons against standard feature sets provided?
Not applicable
4) Model choice
4a. Is a software implementation of the model provided such that it can be trained and tested with new data?
Yes.
4b. Are baseline comparisons to simple/trivial models (for example, 1-nearest neighbour, random forest, most frequent class) provided?
Yes. LR comparisons are provided
4c. Are baseline comparisons to current state-of-the-art provided? Yes
5) Model training and validation
5a. Does the model clearly split data into different sets for training (model selection), validation (hyperparameter optimization), and testing (final evaluation)?
No
Agreed that the authors split the data into train-validation-test sets in an 80-10-10 ratio, but I would have expected more information on the hyperparameter selection. The authors partially refer to earlier work, but I would have expected that the selection is shown based on the validation set.
Some more details on the parametrization of the models have been added in section 2.3, where those are defined. More complete information about the models and their parameters can be found in Table S2, that has been added to the supplementary material.
Final selection of the hyperparameters is based on several tests performed on the validation set, though many of them were kept from the initial guesses taken from the earlier work referred.
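As a minimal sketch of this kind of procedure (placeholder data and a generic scikit-learn regressor, not the manuscript's actual models, features or hyperparameter grid):

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor

# Placeholder feature matrix X and target y.
X = np.random.rand(200, 10)
y = np.random.rand(200)

# 80-10-10 split: carve off 20%, then halve it into validation and test sets.
X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.2, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(X_rest, y_rest, test_size=0.5, random_state=0)

# Score a few candidate hyperparameters on the validation set only.
best = None
for hidden in [(32,), (64,), (64, 32)]:
    model = MLPRegressor(hidden_layer_sizes=hidden, max_iter=2000,
                         random_state=0).fit(X_train, y_train)
    score = model.score(X_val, y_val)
    if best is None or score > best[0]:
        best = (score, hidden, model)

# The test set is touched once, for the final evaluation.
print("selected:", best[1], "test R2:", best[2].score(X_test, y_test))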
5b. Is the method of data splitting (for example, random, cluster- or time- based splitting, forward crossvalidation) clearly stated? Does it mimic anticipated real-world application?
Yes
5c. Does the data splitting procedure avoid data leakage (for example, is the same composition present in the training and test sets)?
Yes
6) Code and reproducibility
6a. Is the code or workflow available in a public repository?
Yes. However, the code should be archived (e.g., on zenodo.org) and it should have a version number.
Totally agree. The code and data are now archived at: 10.5281/zenodo.6974760
6b. Are scripts to reproduce the findings in the paper provided?
No.
E.g., Scripts to reproduce Figures 1b, 2a, 2b, 6a,7a, and 7b are not included.
A separate folder within the repository has been created to reproduce all the figures presented in the manuscript, except for those that do not need data (Figures 1 and 6b) or that need some parameters from the models (Figures 2, 4a and 4b), which are located instead in the respective code in the ML_models/ folder.
In the Jupyter notebooks for the LR and MLP models (GeSn2N4_ML/blob/main/ML_models/*.ipynb), the purpose of the last code block is not clear to me. If that code block is known not to work without a loaded pickle file, would it not be better to load the necessary pickle file with the required data to generate the plot?
The pickle file needed to run the last code blocks mentioned is generated from the mixing energy and band gap predictions of those two notebooks. The pickle file now provided in the repository contains the necessary data to run those blocks.
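For reference, loading such a file in the final cell amounts to something like the sketch below; "predictions.pkl" is a placeholder name, not the actual file in the repository.

import pickle

# Placeholder file name; the stored object holds the mixing-energy and
# band-gap predictions generated earlier in the notebooks.
with open("predictions.pkl", "rb") as f:
    predictions = pickle.load(f)

# The loaded object can then be passed straight to the plotting block.
print(type(predictions))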
General comments:
1) It would be great to provide a requirements.txt file with version numbers that could be used to install the necessary Python packages to run the scripts easily. Please add all version numbers.
Thanks – you are right. A requirements.txt file with the version numbers of the different packages used has been generated and included in the repository. This was a little tricky, as we are working on different operating systems, and conflicts mean that Windows and Ubuntu have different working environments.
2) The code provided uses absolute paths and is not OS independent. Thus, it cannot be used directly; one needs to modify the scripts to get it to work. It would be much better to use the 'os' module of Python to build relative paths.
We have updated to relative paths, which will hopefully be quicker for the user to work with. We are not actually familiar with how to use the os package to achieve this, but hopefully a reasonably competent user can set the absolute paths where needed. We would like to look into using os for this in future, so thanks for the heads-up.
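For future reference, a minimal sketch of the OS-independent relative paths the reviewer suggests, using the standard-library os module inside a script ("data/results.csv" is a placeholder path):

import os

# Build a path relative to this script's own location, so it works regardless
# of the operating system or the directory the script is launched from.
BASE_DIR = os.path.dirname(os.path.abspath(__file__))
data_file = os.path.join(BASE_DIR, "data", "results.csv")

with open(data_file) as f:
    header = f.readline()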
3) It would be much better to have another script that reproduces the plots as presented in the paper rather than having it scattered around in different scripts.
A separate folder within the repository has been created to reproduce all the figures presented in the manuscript, except for those that do not need data (Figures 1 and 6b) or that need some parameters from the models (Figures 2, 4a and 4b), which are located instead in the respective code in the ML_models/ folder.
4) Supplementary Information is currently missing but referenced in the manuscript.
File supplied.
5) Code needs to be saved in a long-term archive such as zenodo.org and it should have clear versioning.
Done – see below.
6) Please provide the VASP raw data.
A Zenodo repository has been created containing all the OUTCAR files from which the total energies and DFT band gap values were obtained, as well as the bash scripts used to extract them and the folder structure used for the calculations.
7) Hyperparameter selection is not clear and more comments could be added to the code to reproduce the results from the paper. Missing scripts should also be included.
Some more details on the parametrization of the models have been added in section 2.3, where those are defined. More complete information about the models and their parameters can be found in Table S2, which has been added to the supplementary material.
The methodology used to select the configurations calculated with the two different DFT methods has been clarified in the manuscript at the end of section 2.2.
Missing scripts to reproduce figures have been added to the repository in the figures/ folder.

Overall, I am suggesting a minor revision to provide the missing information and to update the code and data accordingly.




Round 2

Revised manuscript submitted on 09 Aug 2022
 

13-Aug-2022

Dear Dr Butler:

Manuscript ID: DD-ART-05-2022-000038.R1
TITLE: Spinel nitride solid solutions: charting properties in the configurational space with explainable machine learning

Thank you for submitting your revised manuscript to Digital Discovery. I reviewed your response letter carefully and I am pleased to accept your manuscript for publication in its current form, based upon your responses to the reviewers' comments.

You will shortly receive a separate email from us requesting you to submit a licence to publish for your article, so that we can proceed with the preparation and publication of your manuscript.

You can highlight your article and the work of your group on the back cover of Digital Discovery. If you are interested in this opportunity please contact the editorial office for more information.

Promote your research, accelerate its impact – find out more about our article promotion services here: https://rsc.li/promoteyourresearch.

If you would like us to promote your article on our Twitter account @digital_rsc please fill out this form: https://form.jotform.com/213544038469056.

By publishing your article in Digital Discovery, you are supporting the Royal Society of Chemistry to help the chemical science community make the world a better place.

With best wishes,

Dr Kedar Hippalgaonkar
Associate Editor, Digital Discovery
Royal Society of Chemistry






Transparent peer review

To support increased transparency, we offer authors the option to publish the peer review history alongside their article. Reviewers are anonymous unless they choose to sign their report.

We are currently unable to show comments or responses that were provided as attachments. If the peer review history indicates that attachments are available, or if you find there is review content missing, you can request the full review record from our Publishing customer services team at RSC1@rsc.org.

Find out more about our transparent peer review policy.

Content on this page is licensed under a Creative Commons Attribution 4.0 International license.