From the journal Digital Discovery: Peer review history

Reproducibility in materials informatics: lessons from ‘A general-purpose machine learning framework for predicting properties of inorganic materials’

Round 1

Manuscript submitted on 04 Oct 2023
 

30-Oct-2023

Dear Mr Persaud:

Manuscript ID: DD-ART-10-2023-000199
TITLE: Reproducibility in Computational Materials Science: Lessons from 'A General-Purpose Machine Learning Framework for Predicting Properties of Inorganic Materials'

Thank you for your submission to Digital Discovery, published by the Royal Society of Chemistry. I sent your manuscript to reviewers and I have now received their reports which are copied below.

I have carefully evaluated your manuscript and the reviewers’ reports, and the reports indicate that major revisions are necessary.

Please submit a revised manuscript which addresses all of the reviewers’ comments. Further peer review of your revised manuscript may be needed. When you submit your revised manuscript please include a point by point response to the reviewers’ comments and highlight the changes you have made. Full details of the files you need to submit are listed at the end of this email.

Digital Discovery strongly encourages authors of research articles to include an ‘Author contributions’ section in their manuscript, for publication in the final article. This should appear immediately above the ‘Conflict of interest’ and ‘Acknowledgement’ sections. I strongly recommend you use CRediT (the Contributor Roles Taxonomy, https://credit.niso.org/) for standardised contribution descriptions. All authors should have agreed to their individual contributions ahead of submission and these should accurately reflect contributions to the work. Please refer to our general author guidelines https://www.rsc.org/journals-books-databases/author-and-reviewer-hub/authors-information/responsibilities/ for more information.

Please submit your revised manuscript as soon as possible using this link:

*** PLEASE NOTE: This is a two-step process. After clicking on the link, you will be directed to a webpage to confirm. ***

https://mc.manuscriptcentral.com/dd?link_removed

(This link goes straight to your account, without the need to log on to the system. For your account security you should not share this link with others.)

Alternatively, you can login to your account (https://mc.manuscriptcentral.com/dd) where you will need your case-sensitive USER ID and password.

You should submit your revised manuscript as soon as possible; please note you will receive a series of automatic reminders. If your revisions will take a significant length of time, please contact me. If I do not hear from you, I may withdraw your manuscript from consideration and you will have to resubmit. Any resubmission will receive a new submission date.

The Royal Society of Chemistry requires all submitting authors to provide their ORCID iD when they submit a revised manuscript. This is quick and easy to do as part of the revised manuscript submission process.   We will publish this information with the article, and you may choose to have your ORCID record updated automatically with details of the publication.

Please also encourage your co-authors to sign up for their own ORCID account and associate it with their account on our manuscript submission system. For further information see: https://www.rsc.org/journals-books-databases/journal-authors-reviewers/processes-policies/#attribution-id

I look forward to receiving your revised manuscript.

Yours sincerely,
Dr Joshua Schrier
Associate Editor, Digital Discovery

************


 
Reviewer 1

The code and data shared in the paper are readily available to the public and straightforward to use. I confirmed that the scripts can reproduce the results presented in the manuscript.

Reviewer 2

Questions of reproducibility are of utmost importance in the scientific endeavour, and I strongly support efforts such as those reported in this paper, to reproduce results reported in older studies, and critically report on these, even when (and especially when!) the results are negative. However, in this case, I do not think the authors have done sufficient work to warrant publication. Several of the conclusions are already well known (even though not always obeyed), and more importantly, some of the causes for discrepancies and lack of reproducibility have not been investigated in depth. The work is too superficial for publication in the current stage.

The main issue is the question of sensitivity of the results to the choice of random seed. The authors do not identify or discuss the root cause for this, only that it lies in the "random seed of the underlying machine learning library, Weka". While exact reproduction of results indeed requires a pseudo-random generator with a known initial state (or seed), this does not explain the extreme sensitivity in this case. In fact, I would consider that any ML model that depends so strongly on the PRNG seed is ill-conditioned, i.e., does not provide physically meaningful predictions. The authors show a range between 1 and 2 eV for the prediction of band gap, depending on this purely technical parameter: if this is true, this shows that the fundamental algorithm is simply not correct (maybe highly overfitting?). The authors should definitely investigate this and find the root cause; it is crucial in the discussion of the model.

In passing: please replace mention of randomness and random numbers by pseudorandom, which is more correct.

Secondly, the scope of the reproduction is very limited, probably too limited to be of use. Only predictions for 5 materials are investigated. The reason for this, as the authors state, is that Ward et al. did not publish their dataset… but Ward is a co-author of the present paper. This is baffling: is one author not allowing access to the data to the other co-authors? Or am I misunderstanding something?

The section on "Disseminate Dependencies" is very broadly worded and does not really offer any specific recommendation. The influence of hardware dependencies is not discussed, nor are specific issues with large software stacks (influence of compiler, python distribution, software forges, etc). To be frank, I fail to see the originality of most of Section 4 (Discussion) compared to existing discussions of best practices, as can be found in courses (including online sources). Important aspects of reproducible open science, such as FAIR data (findable, accessible, interoperable, reusable) and long-term archival of research data.

Finally, there is quite a bit of confusion in terms in the manuscript. The authors talk about "open-source data", which is a mix of "open data" and "open source" (which applies to software).

Reviewer 3

The paper tackles the reproducibility problem in machine-learning models of structure-property maps. The authors take the reader on the journey of trying to reproduce results from work published in 2016 (7 years ago). The overall conclusion of the authors is not surprising: "reproducibility demands deliberate effort, and without it, replication becomes very difficult," but it requires reiterating, and this work is such a reiteration. Many adepts of materials informatics (or science in general) embark on similar journeys, and having this work accessible may lead to increased confidence, because, indeed, reproducing anyone's work (even if, as originally published, it was intended to be general-purpose) is challenging. I have one general comment regarding the scope of the work. The authors use data from computational models (DFT calculations), but the pipeline they try to reproduce belongs to the world of materials informatics and machine learning models. I suggest rephrasing the title and the text so as not to mislead the reader.


 

Dear Referees & Editor,

We are resubmitting the manuscript “Reproducibility in Computational Materials Science: Lessons from ‘A General-Purpose Machine Learning Framework for Predicting Properties of Inorganic Materials’”, now renamed “Reproducibility in Materials Informatics: Lessons from ‘A General-Purpose Machine Learning Framework for Predicting Properties of Inorganic Materials’”, to the Reproducibility and Replicability in Data-Driven Research themed issue in Digital Discovery. We thank the editor and referees for their time and comments. In this letter we give point-by-point responses to all comments and provide a summary of the changes made in the manuscript.

We hope the revised manuscript is now suitable for publication in the Reproducibility and Replicability in Data-Driven Research themed issue in Digital Discovery.

Daniel Persaud,
Department of Materials Science and Engineering, University of Toronto

************************************
Reviewer #1 (Comments to the Author):
Comment 1 - The code and data shared in the paper are readily available to the public and straightforward to use. I confirmed that the scripts can reproduce the results presented in the manuscript.

Response 1 - We appreciate the reviewer's diligent examination of the shared code and data, confirming their accessibility and functionality in reproducing the presented results.




Reviewer #2 (Comments to the Author):
Comment 1 - Questions of reproducibility are of utmost importance in the scientific endeavour, and I strongly support efforts such as those reported in this paper, to reproduce results reported in older studies, and critically report on these, even when (and especially when!) the results are negative. However, in this case, I do not think the authors have done sufficient work to warrant publication. Several of the conclusions are already well known (even though not always obeyed), and more importantly, some of the causes for discrepancies and lack of reproducibility have not been investigated in depth. The work is too superficial for publication in the current stage.

Response 1 - We thank the reviewer for the careful review of our manuscript and acknowledge their concerns regarding the depth of our investigation. In response, we have appended an extra paragraph to the introduction, clarifying the paper's scope within the themed issue as an instructive illustration of irreproducibility in materials informatics. Furthermore, we have undertaken additional investigations into the causes of discrepancies, as detailed below.

(Lines 28-44 – addition & change) Specifically for the bandgap study, the SI only included a text interface script to build the proposed framework (referred to as the model building script in this study), while the scripts necessary for model training, prediction, or analysis (referred to as the model extensibility analysis in this study) were entirely missing. Since these scripts are unavailable, in this work we focus on replicating (obtaining consistent results using new data or methods~\cite{NASEM2019}) the associated analysis presented in Ward's original work, for which replication efforts have been unsuccessful in the past~\cite{Ward2017}. We provide an account of the methodology from the original work, describe the difficulties encountered during the replication, and explain how we addressed these challenges; the result is a set of recommendations based on the findings of the replication process. This case study stands as an illustrative instance within the broader context of reproducibility issues in MI, with the intent to enhance the usability of future open-source tools.

(Lines 207-210 – change) As a result, the following sections detail suggestions, based on our replication effort, for developers of MI tools to aid reproducibility and, implicitly, ease of use for other researchers.


Comment 2 - The main issue is the question of sensitivity of the results to the choice of random seed. The authors do not identify or discuss the root cause for this, only that it lies in the "random seed of the underlying machine learning library, Weka". While exact reproduction of results indeed requires a pseudo-random generator with a known initial state (or seed), this does not explain the extreme sensitivity in this case. In fact, I would consider that any ML model that depends so strongly on the PRNG seed is ill-conditioned, i.e., does not provide physically meaningful predictions. The authors show a range between 1 and 2 eV for the prediction of band gap, depending on this purely technical parameter: if this is true, this shows that the fundamental algorithm is simply not correct (maybe highly overfitting?). The authors should definitely investigate this and find the root cause; it is crucial in the discussion of the model.

Response 2 - We appreciate the reviewer's insightful comment and thank them for prompting further investigation. In response to this concern, we have conducted further investigations, and we recognize the necessity of understanding the root cause of the observed extreme sensitivity. Our presumption was that the model building script would allow recreation of the exact model architecture from the original paper. Therefore, the pseudo-random seed test aimed to explore whether the inability to reproduce the original predictions was linked to the seed or to the descriptors. As explained in the manuscript, the predictions exhibited a high dependency on the pseudo-random seed, emphasizing the importance of documenting parameters for reproducibility and advocating for MLOps tracking. The appended investigation in the paper's supplementary materials demonstrates comparable 10-fold CV RMSE between the original hierarchical model and more advanced modern models (random forest and XGBoost). However, the modern models exhibit reduced sensitivity to the seed yet still produce predictions that diverge from the original values, indicating potential overfitting or extrapolation issues in the original model. These issues have been addressed in the revised manuscript.

(Lines 167-188 – addition & change) Additionally, the extreme sensitivity of the replicated predictions to pseudo-random seed variations raises concerns and has prompted further investigation. Utilizing the same descriptors, a SciKit-Learn Random Forest Regression (RF)~\cite{Pedregosa2011} model and an XGBoost Regression (XGB)~\cite{Chen2016} model were trained. While all three models exhibit similar performance in 10-fold random cross-validation (ESI A), the RF model proves considerably less sensitive to variations in the pseudo-random seed, and the XGB model displays no sensitivity at all (ESI B).

The contrast in sensitivity to the pseudo-random seed between the original hierarchical model and the modern models (RF and XGB) highlights the potential susceptibility of the original hierarchical model to overfitting or other issues affecting its generalization capabilities. Notably, the pseudo-random seed sensitivity test employed here was not a standard practice at the time of the original publication, emphasizing the evolving standards in model validation. Further exploration and validation with alternative models contribute to a nuanced understanding of the original hierarchical model's predictive performance and limitations. While the spread in the pseudo-random seed sensitivity test is an issue, not being able to reproduce the original predictions is the major concern within the scope of this work.

(ESI Section A. Validating Models – addition) To validate that the modern models perform comparably to the hierarchical model, we performed 10-fold random cross-validation using an RF model with default parameters (setting the pseudo-random seed to 0), as well as an XGB model with default parameters (setting the pseudo-random seed to 0). All models, including the original hierarchical model, are trained with the ICSD entries in OQMD [bandgap.data], for which descriptors are generated in the same way [make-features.in]. The mean RMSE across the 10 random folds for all models is provided in Table~\ref{tbl:10foldCV}. The mean RMSEs agreeing to within 0.01 eV demonstrates that the models' in-distribution performance is comparable.

(ESI Table 1 – addition) Mean RMSE across the 10 folds of a 10-fold random cross-validation of the ICSD entries in OQMD.
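For reference, a minimal sketch of the kind of 10-fold cross-validation comparison described in ESI Section A is given below. It is illustrative only and not taken from the manuscript or its SI: the descriptor file name and the assumption that the last column holds the DFT band gap are placeholders.

```python
# Illustrative sketch (not from the manuscript): compare 10-fold CV RMSE of default
# RF and XGB regressors on a hypothetical descriptor table whose last column is the
# DFT band gap. File name and column layout are assumptions.
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold, cross_val_score
from xgboost import XGBRegressor

data = pd.read_csv("oqmd_icsd_descriptors.csv")  # hypothetical: descriptors + band gap
X, y = data.iloc[:, :-1], data.iloc[:, -1]

cv = KFold(n_splits=10, shuffle=True, random_state=0)
for name, model in [("RF", RandomForestRegressor(random_state=0)),
                    ("XGB", XGBRegressor(random_state=0))]:
    # scikit-learn reports negated RMSE, so negate it to get the mean RMSE in eV
    scores = cross_val_score(model, X, y, cv=cv,
                             scoring="neg_root_mean_squared_error")
    print(f"{name}: mean 10-fold RMSE = {-scores.mean():.3f} eV")
```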

(ESI Section B. Modern Model Pseudo-Random Seed Sensitivity – addition) Once it has been validated that RF and XGB models with default parameters perform similarly to the hierarchical model, all models are initialized with 10 different pseudo-random seeds. All models are trained with the entire training set (OQMD 1.0) and used to predict the test set, just as described in the original work. Supplementary Figure~\ref{fig:ModernModelPredictions} demonstrates that the RF model is only weakly sensitive to the pseudo-random seed, while the XGB model is not sensitive at all.

(ESI Supplementary Figure 1 – addition) The original predictions (red x’s) compared to the predictions from the Random Forest pseudo-random seed sensitivity test (purple violins) and from the XGBoost pseudo-random seed sensitivity test (cyan violins).
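A minimal sketch of the pseudo-random seed sensitivity test described in ESI Section B follows; again, it is illustrative only, the file names are placeholders, and only the seed is varied between runs.

```python
# Illustrative sketch (not from the manuscript): vary only the pseudo-random seed
# and record how the predictions for the held-out compounds change.
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from xgboost import XGBRegressor

train = pd.read_csv("oqmd_icsd_descriptors.csv")     # hypothetical training descriptors + band gap
test = pd.read_csv("top_candidate_descriptors.csv")  # hypothetical descriptors for the 5 compounds
X_train, y_train = train.iloc[:, :-1], train.iloc[:, -1]

predictions = {"RF": [], "XGB": []}
for seed in range(10):
    for name, Model in [("RF", RandomForestRegressor), ("XGB", XGBRegressor)]:
        model = Model(random_state=seed)  # the seed is the only parameter that changes
        model.fit(X_train, y_train)
        predictions[name].append(model.predict(test))

# The per-compound spread of predictions[name] across the 10 seeds (e.g. plotted as
# violins) indicates how sensitive each model is to the pseudo-random seed.
```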


Comment 3 - In passing: please replace mention of randomness and random numbers by pseudorandom, which is more correct.

Response 3 - We thank the reviewer for the correction. We have revised this error in all instances:

(all instances – change) pseudo-random.


Comment 4 - Secondly, the scope of the reproduction is very limited, probably too limited to be of use. Only predictions for 5 materials are investigated. The reason for this, as the authors state, is that Ward et al. did not publish their dataset… but Ward is a co-author of the present paper. This is baffling: is one author not allowing access to the data to the other co-authors? Or am I misunderstanding something?

Response 4 - We thank the reviewer for the observation, and we acknowledge the limitation in the scope of our reproduction effort, particularly the investigation of predictions for only five materials. The rationale for this is explained in the revised Section 3.2: despite Dr. Ward being a co-author of this paper, additional predictions are unavailable because the original predictions, generated during his graduate school period (>6 years ago), were not retained. This serves as an additional impetus for emphasizing the importance of maintaining and tracking versions in a working directory.

(Lines 130-138 – addition) Additionally, the original predictions are unavailable for reference in our study, because they were not archived by Ward and have since been lost. Therefore, our test set is limited to the predictions for the five most promising compounds published in the original work (Table~\ref{tbl:originalPredictions}). This unavailability of the original predictions underscores the importance of maintaining version control systems in working directories to track and document the evolution of a project. The absence of such a system can lead to the loss of critical information, hindering reproducibility.


Comment 5 - The section on "Disseminate Dependencies" is very broadly worded and does not really offer any specific recommendation. The influence of hardware dependencies is not discussed, nor are specific issues with large software stacks (influence of compiler, python distribution, software forges, etc). To be frank, I fail to see the originality of most of Section 4 (Discussion) compared to existing discussions of best practices, as can be found in courses (including online sources). Important aspects of reproducible open science, such as FAIR data (findable, accessible, interoperable, reusable) and long-term archival of research data, are also not discussed.

Response 5 - We thank the reviewer for the suggestions and have reworked the wording throughout the manuscript to clarify its intent. The revised manuscript now more precisely articulates the specific focus of this work within the themed issue, emphasizing its role as an illustrative example of a reproducibility challenge rather than an exhaustive guide encompassing all aspects of reproducibility, such as hardware dependencies and software stacks. As addressed above (Reviewer 2, Comment 1), the introduction, as well as other places throughout the revised paper, has been modified to clarify this manuscript’s scope and place in the themed issue, which will include an editorial based on the findings of the issue's other reproducibility efforts (including the completely valid hardware and software stack concerns). Your insightful comments about FAIR data principles and long-term archival have been incorporated into Section 4.2, enhancing the discussion of best practices for reproducibility.

(Abstract – Change) (1) reporting software dependencies

(Section 4.1 heading – Change) Disseminate Software Dependencies

(Lines 247-252 – addition) Publishing these logs to a digital library like Zenodo\cite{EuropeanOrganizationForNuclearResearch2013} provides a method of long-term data archival, in line with Findable, Accessible, Interoperable, and Reusable (FAIR) data management principles\cite{Wilkinson2016}, aiding the long-term reproducibility of materials informatics research.
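As an illustration of the kind of dependency log referred to above, the short sketch below records the exact versions of every installed Python package; it is not taken from the manuscript, and the output file name is a placeholder.

```python
# Illustrative sketch (not from the manuscript): write a requirements-style log of
# the installed Python packages so the environment can be archived (e.g. on Zenodo).
from importlib.metadata import distributions

with open("dependency_log.txt", "w") as fh:
    for dist in sorted(distributions(), key=lambda d: d.metadata["Name"].lower()):
        # Pinning exact versions lets future readers reconstruct the environment.
        fh.write(f"{dist.metadata['Name']}=={dist.version}\n")
```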


Comment 6 - Finally, there is quite a bit of confusion in terms in the manuscript. The authors talk about "open-source data", which is a mix of "open data" and "open source" (which applies to software).

Response 6 - We appreciate the reviewer's clarification regarding the terminology. The abstract has been revised to eliminate the confusion.

(Abstract, first sentence – change) trend towards open data and open-source tools




Reviewer #3 (Comments to the Author):

Comment 1 - The paper tackles the reproducibility problem in machine-learning models of structure-property maps. The authors take the reader on the journey of trying to reproduce results from work published in 2016 (7 years ago). The overall conclusion of the authors is not surprising: "reproducibility demands deliberate effort, and without it, replication becomes very difficult," but it requires reiterating, and this work is such a reiteration. Many adepts of materials informatics (or science in general) embark on similar journeys, and having this work accessible may lead to increased confidence, because, indeed, reproducing anyone's work (even if, as originally published, it was intended to be general-purpose) is challenging. I have one general comment regarding the scope of the work. The authors use data from computational models (DFT calculations), but the pipeline they try to reproduce belongs to the world of materials informatics and machine learning models. I suggest rephrasing the title and the text so as not to mislead the reader.

Response 1 - We appreciate the reviewer's insightful comments and have modified the wording in the title and relevant sections to better reflect the focus on materials informatics and machine learning models rather than computational models. This adjustment aims to provide clarity and avoid any potential misinterpretation for the reader.

(Title – change) Reproducibility in Materials Informatics: Lessons from ‘A General-Purpose Machine Learning Framework for Predicting Properties of Inorganic Materials’

(Abstract, last sentence – change) The result is a proposed set of tangible action items for those aiming to make materials informatics tools accessible to, and useful for, the community.

(Lines 105-109 – change) The training set is version 1.0 of the Open Quantum Materials Database (OQMD), which contains ~300,000 crystalline compounds and their properties (energy, bandgap, etc.) computed via density functional theory, and was provided in the original work.




Round 2

Revised manuscript submitted on 09 Dec 2023
 

19-Dec-2023

Dear Mr Persaud:

Manuscript ID: DD-ART-10-2023-000199.R1
TITLE: Reproducibility in Materials Informatics: Lessons from 'A General-Purpose Machine Learning Framework for Predicting Properties of Inorganic Materials'

Thank you for submitting your revised manuscript to Digital Discovery. I am pleased to accept your manuscript for publication in its current form. I have copied any final comments from the reviewer(s) below. If you care to address them, you may do so.

You will shortly receive a separate email from us requesting you to submit a licence to publish for your article, so that we can proceed with the preparation and publication of your manuscript.

You can highlight your article and the work of your group on the back cover of Digital Discovery. If you are interested in this opportunity please contact the editorial office for more information.

Promote your research, accelerate its impact – find out more about our article promotion services here: https://rsc.li/promoteyourresearch.

If you would like us to promote your article on our Twitter account @digital_rsc please fill out this form: https://form.jotform.com/213544038469056.

We are offering all corresponding authors on publications in gold open access RSC journals who are not already members of the Royal Society of Chemistry one year’s Affiliate membership. If you would like to find out more please email membership@rsc.org, including the promo code OA100 in your message. Learn all about our member benefits at https://www.rsc.org/membership-and-community/join/#benefit

By publishing your article in Digital Discovery, you are supporting the Royal Society of Chemistry to help the chemical science community make the world a better place.

With best wishes,

Dr Joshua Schrier
Associate Editor, Digital Discovery


 
Reviewer 2

The authors have amended their manuscript in response to comments. I still believe the investigation is a bit short for a paper, even a comment, for two reasons: 1. the root cause of the sensitivity to random numbers is not fully identified; 2. the common co-author of both papers did not retain the data from the first study, so real questions can never actually be solved. I do not believe that not archiving research data was acceptable practice, even 6 years ago (which is not that long).

The editor and other reviewers seem to have indicated interest anyway, so I suppose this will be published.

Reviewer 3

The revised version of the paper addressed the concerns I raised. I suggest the paper to be published.




Transparent peer review

To support increased transparency, we offer authors the option to publish the peer review history alongside their article. Reviewers are anonymous unless they choose to sign their report.

We are currently unable to show comments or responses that were provided as attachments. If the peer review history indicates that attachments are available, or if you find there is review content missing, you can request the full review record from our Publishing customer services team at RSC1@rsc.org.

Find out more about our transparent peer review policy.

Content on this page is licensed under a Creative Commons Attribution 4.0 International license.