From the journal Digital Discovery: Peer review history

A high-throughput computational dataset of halide perovskite alloys

Round 1

Manuscript submitted on 09 Feb 2023
 

01-Mar-2023

Dear Dr Mannodi Kanakkithodi:

Manuscript ID: DD-ART-02-2023-000015
TITLE: A High-Throughput Computational Dataset of Halide Perovskite Alloys

Thank you for your submission to Digital Discovery, published by the Royal Society of Chemistry. I sent your manuscript to reviewers and I have now received their reports which are copied below.

I have carefully evaluated your manuscript and the reviewers’ reports, and the reports indicate that major revisions are necessary.

Please submit a revised manuscript which addresses all of the reviewers’ comments. Further peer review of your revised manuscript may be needed. When you submit your revised manuscript please include a point by point response to the reviewers’ comments and highlight the changes you have made. Full details of the files you need to submit are listed at the end of this email.

Digital Discovery strongly encourages authors of research articles to include an ‘Author contributions’ section in their manuscript, for publication in the final article. This should appear immediately above the ‘Conflict of interest’ and ‘Acknowledgement’ sections. I strongly recommend you use CRediT (the Contributor Roles Taxonomy, https://credit.niso.org/) for standardised contribution descriptions. All authors should have agreed to their individual contributions ahead of submission and these should accurately reflect contributions to the work. Please refer to our general author guidelines https://www.rsc.org/journals-books-databases/author-and-reviewer-hub/authors-information/responsibilities/ for more information.

Please submit your revised manuscript as soon as possible using this link:

*** PLEASE NOTE: This is a two-step process. After clicking on the link, you will be directed to a webpage to confirm. ***

https://mc.manuscriptcentral.com/dd?link_removed

(This link goes straight to your account, without the need to log on to the system. For your account security you should not share this link with others.)

Alternatively, you can login to your account (https://mc.manuscriptcentral.com/dd) where you will need your case-sensitive USER ID and password.

You should submit your revised manuscript as soon as possible; please note you will receive a series of automatic reminders. If your revisions will take a significant length of time, please contact me. If I do not hear from you, I may withdraw your manuscript from consideration and you will have to resubmit. Any resubmission will receive a new submission date.

The Royal Society of Chemistry requires all submitting authors to provide their ORCID iD when they submit a revised manuscript. This is quick and easy to do as part of the revised manuscript submission process. We will publish this information with the article, and you may choose to have your ORCID record updated automatically with details of the publication.

Please also encourage your co-authors to sign up for their own ORCID account and associate it with their account on our manuscript submission system. For further information see: https://www.rsc.org/journals-books-databases/journal-authors-reviewers/processes-policies/#attribution-id

I look forward to receiving your revised manuscript.

Yours sincerely,
Dr Joshua Schrier
Associate Editor, Digital Discovery

************


 
Reviewer 1

Reviewer comments
Reducing the time required to explore perovskites for the discovery of composition–properties relationships and consequently discovering the materials with excellent optoelectronic properties is definitely of great interest to the materials science community. This paper used computation methods to create high-quality datasets that can be potentially used for AI-powered tools for accelerated perovskite discovery. I appreciate the authors' remarkable work and recommend considering the publication of the manuscript in Digital Discovery after the authors have addressed the data/code-specific issues listed below.
1- In the perovskite_dataset directory on Github, please provide a comprehensive README file that clearly explains:
• the contents of each folder,
• data formats and filetype formats,
• script and file dependencies, and the main entry points,
• requirements for the codes to be successfully installed and executed (i.e., Python packages and libraries), and
• the section in the paper in which each code is used.
• Providing example outputs is also recommended.
2- In SLME.py, line 31, the comment says, "For flat plate solar panels, we want the "Global Tilt" spectra, this file is assumed to be in the directory." Please include the file and cite the source.
3- The comment in the SLME.py file, line 140, says, "# This example file is sort of like CdTe's absorption data. (It's a line between the endpoints.) Good enough for an example, but don't use it for actual scientific data."
a) Please use professional, clearer language here, as it was not easy to understand this comment and its relevance to the work presented. For example, does "sort of like CdTe's absorption data" mean that fabricated mock data is used here? If yes, why and how? If not, please specify the source of the absorption data for CdTe.
b) Why is this data not considered "actual scientific data"? Alternatively, please elaborate on what is your definition of actual scientific data. Please give examples of (and preferably use) "actual scientific data" relevant to your present work.
c) Please explain why the absorption data of CdTe (not a perovskite) is used here.
4- In the SLME.py file, line 141, the file "absorption.dat" could not be found; therefore, the code could not be run successfully. When I changed the file name to "am1.5G.dat", the code worked. Please address this issue and explain why file am1.5G.dat is not included in line 141 of the SLME.py code.
5- In the SLME.py file, a big portion of the code is commented out. When run as is (of course, after replacing absorption.dat with am1.5G.dat in line 141), the output of the code was "standard SLME = 0.0". When I changed the comments back to code, the output was a graph. Please explain this difference. If the commented-out parts of the code are not necessary, please remove them. If they are necessary for the successful execution of the code, change them to code. If they are needed on some special occasions, please specify when they should be run.
6- Please use variable names that are clear to public users or add explanations on what they represent. Examples are SLME_perc, J_sc, p, P_m, and P_in.
7- In DecompositionEnergyCalculator.py, running the code was unsuccessful due to the error "decom_calcs.xlsx not found." (Line 235).
8- In DecompositionEnergyCalculator.py, please add descriptions to the dictionaries used: ref_sol, ref_d3, ref_sol_d3, etc.
9- Please explain why 300 K is used as default here in DecompositionEnergyCalculator.py (line 142), whereas 293.15 K was used in SLME.py.
10- I was unable to run DecompositionEnergy_Xmix.py due to syntax errors in lines 178 and 186. Even after fixing those issues, the code did not run because the file "Decomp_HSErel_SOC_calcs.xlsx" was not found. Please address these issues so the codes can be executed successfully.
11- In the directory Pearson_Correlation, please include a readme file that explains the files and their order, as it appears the output of some codes is the input of others.
12- In the directory Pearson_Correlation, HSE_PBE_SOC_heatmap.py, line 27, "Corr.xlsx" cannot be found. When I changed the code so that Corr.csv could be read here, it worked using "Corr.csv," which was generated as the output of HSE_PBE_SOC_corr.py. Please make sure the right files are called in the code and refer to comment #11 regarding including a readme file that explains the main entries.
13- In the directory Pearson_Correlation, HSE_SOC_heatmap.py, line 27, "Corr.xlsx" cannot be found. When I changed the code so that Corr.csv could be read here, it worked using "Corr.csv," which was generated as the output of HSE_SOC_corr.py. Please make sure the right files are called in the code and refer to comment #11 for adding a readme file that explains the main entries.
14- Please use variable names that are clear to public users or add explanations on what they represent. Examples are x, xx, f, r, m, and justify the values selected for m (=14), f (=16), and r (=75).
15- Please remove unnecessary commented-out code.




Comments on the data checklist
1. Data sources
1a. Are all data sources listed and publicly available? Yes
1b. If using an external database, is an access date or version number provided? Yes
1c. Are any potential biases in the source dataset reported and/or mitigated? No
Reviewer comment: The authors used a previously developed dataset of their own published elsewhere and provided a citation to the publication and dataset. No biases in the source dataset were reported.

2. Data cleaning
2a. Are the data cleaning steps clearly and fully described, either in text or as a code pipeline? N/A
2b. Is an evaluation of the amount of removed source data presented? N/A
2c. Are instances of combining data from multiple sources clearly identified, and potential issues mitigated? N/A
Reviewer comment: no data cleaning was performed in this work.

3. Data representations
3a. Are methods for representing data as features or descriptors clearly articulated, ideally with software implementations? Yes
3b. Are comparisons against standard feature sets provided? Yes
Reviewer comment: Data features and their comparisons to standard features were clearly articulated.

4. Model choice
4a. Is a software implementation of the model provided such that it can be trained and tested with new data? N/A
4b. Are baseline comparisons to simple/trivial models (for example, 1-nearest neighbour, random forest, most frequent class) provided? N/A
4c. Are baseline comparisons to current state-of-the-art provided? N/A
Reviewer comment: No ML models were used in this work.

5. Model training and validation
5a. Does the model clearly split data into different sets for training (model selection), validation (hyperparameter optimization), and testing (final evaluation)? N/A
5b. Is the method of data splitting (for example, random, cluster- or time-based splitting, forward cross-validation) clearly stated? Does it mimic anticipated real-world application? N/A
5c. Does the data splitting procedure avoid data leakage (for example, is the same composition present in the training and test sets)? N/A
Reviewer comment: No ML models were used in this work.

6. Code and reproducibility
6a. Is the code or workflow available in a public repository? Yes
6b. Are scripts to reproduce the findings in the paper provided? No
6c. Have the authors clearly specified which versions of the software libraries they depend upon were used in the course of the work? No
Reviewer comment: The scripts are provided; however, they cannot be executed due to the programming issues listed in my reviewer report. Versions of software libraries were not specified.

Reviewer 2

The authors used the special quasi-random structures approach to create structures with composition mixing at one of the A/B/X sites. All the structures in the dataset are cubic or pseudo-cubic. The authors used PBE and HSE, with and without spin-orbit coupling (SOC), to obtain an accurate description of the properties of these halide perovskites (HaPs). Comparisons are made between the levels of theory used for different property predictions, and correlations of the elements and their mixing with the properties are studied. Although the dataset is relatively small and confined to cubic and pseudo-cubic phases, it includes data from several computationally expensive supercell calculations at higher levels of theory, covers different properties, and would be helpful for future machine learning applications involving HaPs.
The discussion and analysis of the dataset will be useful for feature engineering of materials descriptors. However, the analysis could include more information, while clearly discussing the issues arising from the small dataset size. The following questions need to be addressed before the manuscript can be further considered for publication.
1. How do the data in this dataset compare to existing databases, such as the Materials Project, for previously explored materials? Can this dataset be combined with other existing datasets, given their convergence criteria and DFT settings?
2. Within each A/B/X-site mixing, the representation differs for each element/molecule, and the authors discuss the ratios of different element compositions in Fig. 2. However, the strategy behind choosing a certain element ratio for different species is not discussed. How were the compositions selected? Was the choice random, or constrained by structural limits such as lattice size, composition, and so on?
3. The band gap RMSE relative to experiment is too high. The authors explain that it could be due to ambiguity from non-cubic structural phases in the experimental data, which makes it hard to judge the performance of a DFT functional's prediction. Instead, the authors could use experimental data for known cubic phases, compare them with the predictions of each DFT functional, and include that information in Table 2. That would allow an appropriate comparison, even if with fewer data points.
4. For lattice constants, the authors provided only the RMSE. Percentage errors may also be appropriate, since lattice constants vary across materials and sizes.
5. Fig. 4: the shaded region should be described in the caption (this also applies to some figures in the SI).
6. The authors should explain why different numbers of calculations were performed with PBE and with the three types of HSE.
7. The authors report that larger structural relaxations in HSE-relaxed calculations, compared to PBE, cause a 1 eV band gap difference. How do these structural relaxations affect the relative stability between PBE and HSE-rel? Further, are the energy differences and the band gap differences between HSE-rel and PBE correlated?
8. According to Fig. 8a, deviation from cubicity might lower the decomposition energy or help stabilize the structure.
9. This raises an important question: did the authors try any symmetry breaking when relaxing the structures in the dataset? This may cause further deviation from cubicity, but could also stabilize many more compositions. Although precise control over such operations is difficult in a high-throughput format, a simple proof-of-concept test with a few (say, 10) structures with different types of mixing may be an interesting avenue.
On page 14, it is reported that some compounds with t > 1.1 and low decomposition energy show large deviations from cubicity, which may further indicate that deviation from cubicity is not necessarily bad, and may even be essential for stabilizing more of the HaPs evaluated.
10. The authors computed Pearson correlations on the screened materials shown in Fig. 10 and Fig. S12. Considering the non-uniform distribution of element compositions in the initial dataset and the even smaller sample of screened materials, the observed trends may not be generalizable or applicable as future design principles. As more compositions are evaluated, including with intentional symmetry breaking, the current trends may not hold. The authors should discuss this in the manuscript and, if possible, compare pure-cubic and pseudo-cubic compositions.
11. Regarding the use cases of this dataset, and based on the analysis of descriptors, the authors should discuss in the current manuscript what kind of inverse design is possible and which properties can be accurately predicted, considering the dataset size and specifications.
12. Fig. S4: The authors mention that HSE with SOC is a sufficient level of theory and should provide accurate electronic and optical properties. However, the maximum SLME in Figs. S4b, S5b, and S6b is 16%, compared to 25% with PBE. The authors should explain the reason for this drop in SLME at higher levels of theory.
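Comment 4 asks for percentage errors alongside the RMSE for lattice constants. As a minimal illustration with made-up numbers (not values from the manuscript), the two metrics can be computed as below; the function names are hypothetical and not taken from the authors' repository.

```python
import numpy as np


def rmse(predicted, reference):
    """Root-mean-square error, in the same units as the inputs (e.g. angstrom)."""
    predicted = np.asarray(predicted, dtype=float)
    reference = np.asarray(reference, dtype=float)
    return float(np.sqrt(np.mean((predicted - reference) ** 2)))


def mean_abs_percent_error(predicted, reference):
    """Mean absolute deviation relative to the reference values, in percent."""
    predicted = np.asarray(predicted, dtype=float)
    reference = np.asarray(reference, dtype=float)
    return float(100.0 * np.mean(np.abs(predicted - reference) / np.abs(reference)))


# Made-up lattice constants (angstrom), for illustration only.
reference = [5.0, 6.0]
predicted = [5.1, 6.1]
print(rmse(predicted, reference))                    # same 0.1 angstrom deviation for both
print(mean_abs_percent_error(predicted, reference))  # average of 2% and about 1.67%
```

Because the percentage error is normalized by each reference value, the same absolute deviation weighs less for a material with a larger lattice constant, which is the point of the reviewer's suggestion.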

Reviewer 3

Yang et al. reported a dataset of 495 pseudo-cubic ABX3 halide perovskite compounds using density functional theory calculations. Some critical material properties related to optoelectronic applications, including band gaps and theoretical photovoltaic efficiency are computed. The trends of materials properties dependent on the compositions are studied. This dataset and the related codes are useful for further computational and theoretical studies of halide perovskite materials. This manuscript could be published after addressing the following comments:

i) It is known that the halide compounds are not only limited to cubic (or pseudo-cubic) structures. In fact, some pseudo-cubic halide compounds are not stable, and their ground state structure could belong to a different lattice system, see a prior computational and theoretical study and the reported compounds inside, like (FA)3Sb2I9, Energy Environ. Sci., 12, 2233-2243, (2019). The authors should comment on this appropriately.

ii) From figure 1, it seems that the computed compounds are limited to common 2+ cations such as Pb, Sn, Ge, Ba, Sr, and Ca. This should be indicated in the title and/or abstract to clearly tell readers what kind of compounds are computed in this work, since "halide perovskite alloys" covers a broad range of compounds, many more than those computed in this work.

iii) For optoelectronic applications, the electron and hole transport properties are critical, which are ignored in this work. The authors should discuss it or explain why they are not needed/calculated.


 

RESPONSE TO REVIEWER COMMENTS

Dear Dr Schrier,

We thank you and the reviewers for consideration and careful review of our manuscript. In the following, we respond to individual reviewer comments in detail (in blue), and quote changes to the manuscript (in green). In the revised manuscript, all changes are indicated in red. We hope that the revised manuscript is suitable for publication in Digital Discovery. Thank you once again.
Best,
Jiaqi Yang, Panayotis Manganaris, and Arun Mannodi Kanakkithodi


Reviewers' Comments and our Responses


Reviewer 1

Reviewer comments:
Reducing the time required to explore perovskites for the discovery of composition–properties relationships and consequently discovering the materials with excellent optoelectronic properties is definitely of great interest to the materials science community. This paper used computation methods to create high-quality datasets that can be potentially used for AI-powered tools for accelerated perovskite discovery. I appreciate the authors' remarkable work and recommend considering the publication of the manuscript in Digital Discovery after the authors have addressed the data/code-specific issues listed below.

Response:
We thank the reviewer for their evaluation and detailed comments about our manuscript. Below, we present responses to specific comments and note changes made to the manuscript and associated data/code.


Reviewer comments:
1. In the perovskite_dataset directory on Github, please provide a comprehensive README file that clearly explains:
• the contents of each folder,
• data formats and filetype formats,
• script and file dependencies, and the main entry points,
• requirements for the codes to be successfully installed and executed (i.e., Python packages and libraries), and
• the section in the paper in which each code is used.
• Providing example outputs is also recommended.

Response:
We thank the reviewer for this excellent suggestion and agree that a file detailing the above information is essential in the Github directory. A proper README file has thus been uploaded, including all the necessary details. We invite the reviewer to view this file in the repo: https://github.com/yjq829/perovskite_dataset/blob/main/README.md. The contents of the entire README file have been appended as part of the responses below.


Reviewer comments:

2. In SLME.py, line 31, the comment says, "For flat plate solar panels, we want the "Global Tilt" spectra, this file is assumed to be in the directory." Please include the file and cite the source.
3. The comment in the SLME.py file, line 140, says, "# This example file is sort of like CdTe's absorption data. (It's a line between the endpoints.) Good enough for an example, but don't use it for actual scientific data."
(a) Please use professional, clearer language here, as it was not easy to understand this comment and its relevance to the work presented. For example, does "sort of like CdTe's absorption data" mean that fabricated mock data is used here? If yes, why and how? If not, please specify the source of the absorption data for CdTe.
(b) Why is this data not considered "actual scientific data"? Alternatively, please elaborate on what is your definition of actual scientific data. Please give examples of (and preferably use) "actual scientific data" relevant to your present work.
(c) Please explain why the absorption data of CdTe (not a perovskite) is used here.
4. In the SLME.py file, line 141, the file "absorption.dat" could not be found; therefore, the code could not be run successfully. When I changed the file name to "am1.5G.dat", the code worked. Please address this issue and explain why file am1.5G.dat is not included in line 141 of the SLME.py code.
5. In the SLME.py file, a big portion of the code is commented out. When run as is (of course, after replacing absorption.dat with am1.5G.dat in line 141), the output of the code was "standard SLME = 0.0". When I changed the comments back to code, the output was a graph. Please explain this difference. If the commented-out parts of the code are not necessary, please remove them. If they are necessary for the successful execution of the code, change them to code. If they are needed on some special occasions, please specify when they should be run.
6. Please use variable names that are clear to public users or add explanations on what they represent. Examples are SLME_perc, J_sc, p, P_m, and P_in.

Response:
We thank the reviewer for these detailed comments; however, we should clarify that the SLME code is not our own, but borrowed from the work of Liping Yu and Alex Zunger [1]. We apologize for any confusion and for not making this clearer in the manuscript. The SLME code is a previously published (and commonly used) open-source code for calculating the theoretical single-junction photovoltaic efficiency of a semiconductor as a function of sample thickness, using the DFT-computed band gap type (direct or indirect) and magnitude, the optical absorption spectrum, and the AM1.5 solar irradiance spectrum. All the necessary code and details can be accessed from the original authors’ repo: https://github.com/ldwillia/SL3ME.
The "Global Tilt" spectrum is already included in the folder as am1.5G.dat and a description has been added to the README file in our repository. The absorption data of CdTe is an example case that the developers included in their code. The DFT computed perovskite absorption spectra are included in the folder perovskite_dataset/SLME/strut_loptics_550/ on Github as .dat files for every compound, and they are used in conjunction with the original “SL3ME.py” code to calculate the SLME as a function of thickness. Furthermore, the “SLME_shift_fromdata.py” code is our own work, which takes care of shifting a PBE-computed absorption spectrum by the difference between the PBE band gap and the necessary HSE band gap.
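The shift performed by SLME_shift_fromdata.py amounts to a rigid (scissor-type) translation of the PBE absorption spectrum along the energy axis by the band gap difference. A minimal sketch follows, assuming the spectrum is available as an energy grid in eV with matching absorption values; the function `scissor_shift` and its interpolation and fill choices are illustrative assumptions, not the repository's code.

```python
import numpy as np


def scissor_shift(energy, absorption, gap_pbe, gap_target):
    """Rigidly shift an absorption spectrum by (gap_target - gap_pbe) eV.

    The shifted spectrum is resampled on the original energy grid by linear
    interpolation; grid points below the shifted energy range are set to zero,
    and points beyond it hold the last absorption value.
    """
    energy = np.asarray(energy, dtype=float)
    absorption = np.asarray(absorption, dtype=float)
    shift = gap_target - gap_pbe
    return np.interp(energy, energy + shift, absorption, left=0.0, right=absorption[-1])


# Toy spectrum: zero below a 1.0 eV "PBE gap", linear above it (not real data).
energy = np.linspace(0.0, 5.0, 501)
absorption = np.clip(energy - 1.0, 0.0, None)
shifted = scissor_shift(energy, absorption, gap_pbe=1.0, gap_target=2.0)
```

Resampling onto the original grid keeps every shifted spectrum on a common energy axis, which is convenient when the same solar spectrum is reused for all compounds.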
With the README file, the necessary data files, and the original unchanged SL3ME.py script, the code should be executable for the reviewer and the general audience. The following information has been added to the README file related to SLME:

**[B] SLME calculations:**
==========================

The SLME calculation is based on the work of L. Yu, A. Zunger, Phys. Rev. Lett. 108, 068701 (2012). https://doi.org/10.1103/PhysRevLett.108.068701

The source code can be found at: https://github.com/ldwillia/SL3ME

**(1) Scripts and dependency:**

[i] SLME_shift_fromdata.py -> main entry point

This script performs the entire task of calculating the SLME for any given compound. It is a straightforward task for the PBE computed absorption spectrum, but for the HSE functionals, we do not calculate the absorption spectrum but rather obtain it by shifting the PBE computed spectrum by the difference between the PBE and HSE band gaps.

[ii] SL3ME.py -> SLME calculation function dependency

This is the original code developed and released by the authors of this publication: L. Yu, A. Zunger, Phys. Rev. Lett. 108, 068701 (2012). https://doi.org/10.1103/PhysRevLett.108.068701
We import the functions from this script as a module.

[iii] am1.5G.dat

This is the "Global Tilt" spectrum used as the reference spectrum in SL3ME.py.

**(2) Input files:**

[i] data.xlsx

This is the input file for calculating the SLME of any compound in our dataset. The first column is the perovskite index, which identifies its unique chemical composition. The second column is the chemical formula, the third column is the PBE band gap, and the fourth column is the band gap from the target functional (PBE or any of the HSE functionals).
Here, we use results from a particular HSE functional as an example input.

[ii] strut_loptics_550 folder

This folder includes all the PBE computed absorption spectra of halide perovskite compounds in our dataset. The file label is the same as the corresponding perovskite index in the spreadsheet perovs_data_final.xlsx.
We have further added the following sentence to section 2.3.3 on page 5 in the main manuscript to address the reviewer’s comments:
In this work, the SLME is calculated for a 5 μm sample thickness for every perovskite using equations (3), (4), and (5), combining the original SL3ME.py code from Yu et al. [34] with our DFT-computed absorption spectra and band gaps.
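For readers unfamiliar with the SLME, a compact sketch of the quantities involved is given below. It follows the published SLME definition (absorptivity $a(E) = 1 - e^{-2\alpha(E)L}$, a short-circuit current from the solar photon flux, and a reverse saturation current from blackbody emission divided by the radiative fraction $f_r = e^{-\Delta/k_BT}$, where $\Delta$ is the difference between the direct allowed gap and the fundamental gap), but it is not the SL3ME.py code. Assumptions: the solar spectrum is replaced by a 5778 K blackbody scaled to 1000 W/m² instead of the am1.5G.dat data, the absorption coefficient is a toy square-root edge instead of a DFT spectrum, and all function and variable names are hypothetical.

```python
import numpy as np

Q = 1.602176634e-19   # elementary charge, C (also J per eV)
H = 6.62607015e-34    # Planck constant, J s
C = 2.99792458e8      # speed of light, m/s
KB = 1.380649e-23     # Boltzmann constant, J/K


def integrate(y, x):
    """Trapezoidal rule (written out to avoid NumPy version differences)."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))


def blackbody_photon_radiance(energy_ev, temperature):
    """Blackbody photon radiance per eV, photons / (m^2 s sr eV)."""
    e_joule = energy_ev * Q
    return 2.0 * e_joule**2 / (H**3 * C**2) / np.expm1(e_joule / (KB * temperature)) * Q


def slme(energy_ev, alpha, gap, gap_direct_allowed, thickness, solar_flux, temperature=293.15):
    """Spectroscopic limited maximum efficiency, returned as a fraction.

    alpha: absorption coefficient (1/m) on energy_ev; thickness in m;
    solar_flux: incident photon flux per eV, photons / (m^2 s eV).
    """
    absorptivity = 1.0 - np.exp(-2.0 * alpha * thickness)
    p_in = integrate(energy_ev * Q * solar_flux, energy_ev)      # incident power, W/m^2
    j_sc = Q * integrate(absorptivity * solar_flux, energy_ev)   # short-circuit current, A/m^2
    j0_rad = Q * np.pi * integrate(
        absorptivity * blackbody_photon_radiance(energy_ev, temperature), energy_ev)
    # Radiative fraction f_r = exp(-(Eg_da - Eg) / kT); J0 = J0_rad / f_r.
    f_r = np.exp(-(gap_direct_allowed - gap) * Q / (KB * temperature))
    j0 = j0_rad / f_r
    voltage = np.linspace(0.0, gap, 4001)                        # voltage sweep, V
    current = j_sc - j0 * np.expm1(Q * voltage / (KB * temperature))
    return float(np.max(current * voltage) / p_in)               # P_max / P_in


# Hypothetical inputs: a 5778 K blackbody scaled to 1000 W/m^2 stands in for
# AM1.5G, and a square-root absorption edge (1e5 cm^-1 eV^-1/2) for a DFT spectrum.
energy = np.linspace(0.31, 4.0, 5000)
sun = blackbody_photon_radiance(energy, 5778.0)
sun *= 1000.0 / integrate(energy * Q * sun, energy)
gap = 1.5
alpha = 1.0e7 * np.sqrt(np.clip(energy - gap, 0.0, None))
eta = slme(energy, alpha, gap, gap, 5.0e-6, sun)
print(f"SLME = {100 * eta:.1f} %")
```

A nonzero gap difference $\Delta$ raises the reverse saturation current and therefore lowers the open-circuit voltage and the efficiency, which is how the SLME penalizes indirect or dipole-forbidden gaps.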


Reviewer comments:

7. In DecompositionEnergyCalculator.py, running the code was unsuccessful due to the error "decom_calcs.xlsx not found." (Line 235).
8. In DecompositionEnergyCalculator.py, please add descriptions to the dictionaries used: ref_sol, ref_d3, ref_sol_d3, etc.
9. Please explain why 300 K is used as default here in DecompositionEnergyCalculator.py (line 142), whereas 293.15 K was used in SLME.py.
10. I was unable to run DecompositionEnergy_Xmix.py due to syntax errors in lines 178 and 186. Even after fixing those issues, the code did not run because the file "Decomp_HSErel_SOC_calcs.xlsx" was not found. Please address these issues so the codes can be executed successfully.

Response:
We thank the reviewer for their comments and apologize for not including all the data files necessary for running the code. The data files “decom_calcs.xlsx” and “Decomp_HSErel_SOC_calcs.xlsx” have now been added to the Github repo and can be read by the associated scripts. The labels “ref_sol”, “ref_d3”, “ref_sol_d3”, etc. refer to different PBE-based functionals (PBEsol is a version of PBE reparameterized for solids, and PBE-D3 includes van der Waals corrections); however, these are not part of our current manuscript but of separate, ongoing work. We have thus removed these labels from the code to avoid any confusion. Regarding the temperature, a default of 293.15 K is used in the SL3ME.py code, as chosen by its developers, whereas we use room temperature (300 K) to calculate the mixing entropy contribution $k_B T \sum_i x_i \ln x_i$ (where $x_i$ is the mixing fraction of the i-th species at the mixed A/B/X site). We find that this energy changes negligibly (by < 0.001 eV) if T is changed from 300 K to 293.15 K. Thus, our choice of temperature is inconsequential here, and would only matter if the temperature were increased or decreased significantly.
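The size of this temperature effect can be checked directly. The sketch below is a hypothetical helper, evaluated on illustrative mixing fractions rather than values from the dataset: it computes $k_B T \sum_i x_i \ln x_i$ at 300 K and at 293.15 K. The difference is $k_B (6.85\ \mathrm{K}) \sum_i x_i \ln x_i$, whose magnitude for a given number of species is largest at equal fractions (about 4.1 × 10⁻⁴ eV for a 50/50 binary mixture).

```python
import math

KB_EV = 8.617333262e-5  # Boltzmann constant, eV/K


def mixing_term(fractions, temperature):
    """k_B T * sum_i x_i ln(x_i) in eV; species with x_i = 0 contribute nothing."""
    return KB_EV * temperature * sum(x * math.log(x) for x in fractions if x > 0)


# Illustrative mixing fractions at a single site (not taken from the dataset).
for fractions in ([0.5, 0.5], [0.25, 0.75], [1 / 3, 1 / 3, 1 / 3]):
    change = abs(mixing_term(fractions, 300.0) - mixing_term(fractions, 293.15))
    print(fractions, f"{change:.5f} eV")
```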
We have further added the following information to the Github README file to address the reviewer’s comments and concerns about decomposition energy calculations:

**[A]- Decomposition Energy:**
==============================

**(1) Purpose:**

Scripts used to calculate the decomposition energy of all halide perovskites, applicable to both pure and alloyed compositions.

Requires Python 3.9, pandas, numpy, sympy, and matplotlib.

Reference energies for all possible decomposed phases (AX, BX2, and A/B/X species) are stored in ref_data.xlsx, which will be read every time the script is run.

[i] DecompositionEnergyCalculator.py
General decomposition energy calculator for all halide perovskite compositions, used to generate all values presented in the paper.

[ii] DecompositionEnergy_Xmix.py
Special version of the decomposition energy calculator for X-site mixed halide perovskites, which generally have multiple possible sets of decomposed phases. This script finds the most likely ABX3 → AX + BX2 decomposition reaction and calculates the decomposition energy based on the appropriate phases. The process is described in the SI under "Decomposition Energy correction for X site mixed perovskites".


**(2) Input files:**

The input file for the decomposition calculator is a spreadsheet.
Its format is a 14-dimensional composition vector (columns 1 to 14), with the total DFT energy per ABX3 formula unit as the 15th column.

Example input files include:

[i] decomp_calcs.xlsx -> used for DecompositionEnergyCalculator.py

[ii] Decomp_HSErel_SOC_calcs.xlsx -> used for DecompositionEnergy_Xmix.py


**(3) Output files:**

Both scripts will write out a spreadsheet as output, with the last column showing the calculated decomposition energy.
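As a rough illustration of what such a calculator does for A- or B-site mixing with a single halide, the sketch below combines a composition vector with reference energies of the AX and BX2 phases. Everything in it is an assumption for illustration: the species order of the 14 columns, the reference-energy keys, the numerical energies, the sign convention (positive meaning stable against decomposition), and the way the mixing term enters. It is not DecompositionEnergyCalculator.py, and X-site mixing, which DecompositionEnergy_Xmix.py handles, is deliberately excluded.

```python
import math

KB_EV = 8.617333262e-5  # Boltzmann constant, eV/K

# Hypothetical column order for the 14-dimensional composition vector.
A_SITE = ["K", "Rb", "Cs", "MA", "FA"]
B_SITE = ["Ca", "Sr", "Ba", "Ge", "Sn", "Pb"]
X_SITE = ["I", "Br", "Cl"]
SPECIES = A_SITE + B_SITE + X_SITE  # 5 + 6 + 3 = 14 entries


def decomposition_energy(composition, e_total, ref, temperature=300.0):
    """Energy of ABX3 -> AX + BX2 (eV per formula unit) for a single-halide compound.

    composition: 14 site fractions in SPECIES order; e_total: DFT energy per ABX3
    formula unit; ref: reference energies per formula unit keyed like "CsI", "PbI2".
    Assumed sign convention: a positive value means the perovskite is more stable.
    """
    frac = dict(zip(SPECIES, composition))
    halides = [x for x in X_SITE if frac[x] > 0]
    if len(halides) != 1:
        raise ValueError("X-site mixing needs separate treatment (not covered here)")
    x = halides[0]
    e_products = sum(frac[a] * ref[f"{a}{x}"] for a in A_SITE if frac[a] > 0)
    e_products += sum(frac[b] * ref[f"{b}{x}2"] for b in B_SITE if frac[b] > 0)
    # Ideal mixing term k_B T sum x ln x (<= 0) lowers the free energy of the alloy.
    mixed = [frac[s] for s in A_SITE + B_SITE if 0 < frac[s] < 1]
    g_mix = KB_EV * temperature * sum(f * math.log(f) for f in mixed)
    return e_products - (e_total + g_mix)


# Made-up reference and total energies (eV), for illustration only.
ref = {"CsI": -7.0, "RbI": -6.8, "PbI2": -10.0}
comp = [0, 0.5, 0.5, 0, 0,      # A site: K, Rb, Cs, MA, FA
        0, 0, 0, 0, 0, 1.0,     # B site: Ca, Sr, Ba, Ge, Sn, Pb
        1.0, 0, 0]              # X site: I, Br, Cl
print(decomposition_energy(comp, -17.0, ref))
```

Weighting the reference phases by the site fractions keeps the products stoichiometrically balanced with one ABX3 formula unit, so pure and alloyed compositions are handled by the same expression.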


Reviewer comments:
11. In the directory Pearson_Correlation, please include a readme file that explains the files and their order, as it appears the output of some codes is the input of others.
12. In the directory Pearson_Correlation, HSE_PBE_SOC_heatmap.py, line 27, "Corr.xlsx" cannot be found. When I changed the code so that Corr.csv could be read here, it worked using "Corr.csv," which was generated as the output of HSE_PBE_SOC_corr.py. Please make sure the right files are called in the code and refer to comment #11 regarding including a readme file that explains the main entries.
13. In the directory Pearson_Correlation, HSE_SOC_heatmap.py, line 27, "Corr.xlsx" cannot be found. When I changed the code so that Corr.csv could be read here, it worked using "Corr.csv," which was generated as the output of HSE_SOC_corr.py. Please make sure the right files are called in the code and refer to comment #11 for adding a readme file that explains the main entries.
14. Please use variable names that are clear to public users or add explanations on what they represent. Examples are x, xx, f, r, m, and justify the values selected for m (=14), f (=16), and r (=75).
15. Please remove unnecessary commented-out code.

Response:
We thank the reviewer for these important comments and apologize for not including all the data files with the scripts to calculate and plot the Pearson coefficients of linear correlation. We have now added all the necessary *_data.csv and Corr.xlsx files. We have also added several comments in our code to explain all variables being defined and used (xx, f, r, etc.), and all the unnecessary commented-out code is removed.
We have further added the following information to the Github README file to address the reviewer’s comments and concerns about the Pearson correlation calculations:

**[C] Pearson Correlation Calculations:**
==========================

**(1) Scripts and Dependency**

[i] Folders:

All folders are named after the functionals corresponding to the DFT data.

Each folder contains five files:

*_data.csv (e.g. PBE_data.csv) -> file containing the DFT-computed properties and descriptors used to calculate Pearson coefficients of linear correlation.

Func_corr.py (e.g. PBE_corr.py) -> first-step script; generates the Pearson correlation values.

test.csv -> output of Pearson correlation values generated by Func_corr.py.

Func_heatmap.py -> plots the correlation values as a heatmap, using test.csv as input.

Corr.xlsx -> label file showing names of all descriptors.

**(2) Input Formats**
The input format is *_data.csv. For every functional, there is an example file provided.

For PBE and HSErel functionals, the format of the .csv file is:

Formula, type of mixing, 4 columns of properties, 14 columns of species fraction descriptors, 36 columns of elemental property descriptors.

For HSErel-SOC and HSE-PBE-SOC functionals, the format is:

Formula, type of mixing, 3 columns of properties, 14 columns of species fraction descriptors, 36 columns of elemental property descriptors.

For more details about the descriptors, please refer to our paper.
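For readers who want to see what the first step amounts to, a minimal sketch is given below: it computes the Pearson coefficient between every pair of numeric columns of a *_data.csv-style table and serializes the result as a labelled CSV matrix. It is not the repository's Func_corr.py. It uses only the Python standard library, assumes the first two columns are the non-numeric Formula and type-of-mixing labels described above, and all function names are hypothetical.

```python
import csv
import io
import math

# Assumption: the first two columns (Formula, type of mixing) are labels;
# every remaining column (properties, then descriptors) is numeric.
N_LABEL_COLUMNS = 2


def pearson(x, y):
    """Pearson coefficient of linear correlation; NaN if either column is constant."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two columns of equal length with at least 2 rows")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")  # e.g. a species fraction that is constant in the subset
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(csv_text, n_label_columns=N_LABEL_COLUMNS):
    """Parse *_data.csv-style text; return (column names, full correlation matrix)."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, body = rows[0], rows[1:]
    names = header[n_label_columns:]
    columns = [
        [float(row[j]) for row in body]
        for j in range(n_label_columns, len(header))
    ]
    matrix = [[pearson(a, b) for b in columns] for a in columns]
    return names, matrix


def matrix_to_csv(names, matrix):
    """Serialize the matrix with row and column labels (the input a heatmap step would read)."""
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow([""] + names)
    for name, row in zip(names, matrix):
        writer.writerow([name] + [f"{value:.4f}" for value in row])
    return out.getvalue()
```

Returning NaN for a constant column, rather than raising, keeps the full matrix shape intact when a descriptor happens not to vary within a subset of the data.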



Reviewer 2

Reviewer comments:
The authors used the special quasi-random structures (SQS) approach to create structures with composition mixing at one of the A/B/X sites. All the structures in the dataset are cubic or pseudo-cubic. The authors used PBE and HSE with and without spin-orbit coupling (SOC) to obtain an accurate description of properties for these halide perovskites (HaPs). Comparisons are made between the levels of theory used for different property predictions, and correlations between elements and their mixing with the properties are studied. Although the dataset is relatively small and confined to cubic and pseudo-cubic phases, it includes data from several computationally expensive supercell calculations using richer levels of theory, includes different properties, and would be helpful for future machine learning applications involving HaPs. The discussion and analysis of the dataset will be useful for feature engineering of materials descriptors. However, more information could be included in the analysis, along with a clear discussion of the issues arising from the small dataset size. The following questions need to be addressed before the manuscript can be further considered for publication.

Response:
We thank the reviewer for a thorough evaluation of our manuscript, and appreciate all the detailed comments which will no doubt help us improve it. Below, we respond to each of the reviewer’s comments and show the necessary changes made to the manuscript.


Reviewer comments:
1. How does the data from this dataset compare to the existing databases like materials project and others for any previously explored materials? Can this dataset be combined with other existing datasets with respect to convergence criteria and DFT settings?

Response:
We thank the reviewer for this important comment. Indeed, our perovskite dataset must be compared with, and ultimately merged with, other computational databases such as the Materials Project (MP). From our examination of compounds in MP, we find that it contains halide perovskites such as MAPbI3, CsPbI3, and MAPbBrI2, with properties such as lattice parameters, formation energy, and band gap computed for the materials in multiple phases. However, MP contains very few mixed-composition perovskites and only a fraction of the pure ABX3 compounds in our dataset. Compounds in our dataset cover a very wide chemical space of A, B, and X species, as well as various types of mixing at each site. The dataset reported in the current manuscript is aimed at analysis of a uniform composition-property space, and would be complementary to databases such as MP. It should also be noted that pure (pseudo-)cubic compounds such as MAPbI3 and CsPbI3 collected from MP served as the starting point for our study, as we built all structures (including alloyed supercells) from these previously optimized geometries.
Furthermore, the existence of a variety of ABX3 compounds in MP and other databases such as OQMD and ICSD means that chemical insights and, better still, ML models trained on our DFT data can be readily applied to these compounds to enable prediction and screening. Though not part of the current manuscript, we have ongoing work on training a variety of ML models for on-demand prediction of halide perovskite decomposition energy, band gap, and SLME, using either the composition and phase as input, or more generic models that use the entire crystal structure as input. Once such models are rigorously optimized and tested, we will select thousands of possible out-of-sample ABX3 compounds (potentially even including chalcogenide or other types of perovskites) from online databases, predict their properties, validate them against existing computations, and select promising new materials for subsequent computation and analysis. To address the reviewer’s comment, the following sentence has been added to the final paragraph of Section 4 (Perspective and Future Work) in the main manuscript:

Once composition-based and/or structure-based ML predictive models are rigorously optimized and validated, they could be deployed over thousands of ABX3 compounds available in databases such as the Materials Project (ref. 53) or the Open Quantum Materials Database (ref. 54), as well as over millions of hypothetical materials, for prediction, screening, and discovery.


Reviewer comments:
2. Within each type of A/B/X-site mixing, the representation differs between elements and molecules, and the authors discuss in Fig. 2 the ratios of different elemental compositions. However, the strategy behind choosing a certain ratio for different species is not discussed. How are the selected compositions chosen? Is the choice random, or constrained by structural limits such as lattice size, composition, and so on?

Response:
We thank the reviewer for raising the issue of how the mixing fractions of different species in the ABX3 compounds are selected. Currently, the mixing ratios of any molecular or elemental species are constrained by the cubic 2x2x2 supercell used for all halide perovskites, which contains a total of 8 A sites, 8 B sites, and 24 X sites. We thus perform mixing only in fractions of 1/8, 2/8, … 8/8 at each site (even at the X-site, for simplicity and uniformity) to generate any alloyed composition. While any number of species may be mixed at a given site, only one type of site mixing is allowed at a time (i.e., there cannot be mixing at both the B-site and X-site simultaneously). The configuration of each mixed compound is generated using the special quasi-random structures (SQS) approach, and full geometry optimization is performed for every mixed compound. The mixing fractions are chosen randomly, subject to the requirement that each species and each type of mixing be well represented in the dataset. As such, the set of 495 pseudo-cubic compounds contains all 14 independent species appearing in fractions of 1/8, 2/8, … 8/8 across the dataset. Figure R1 below (also added as new Figure S3 in the SI) shows the overall distribution of each species in the dataset, including pure and mixed compositions, and it can be seen that all species are well represented. Figure 2 in the manuscript further shows that every species occurs in every type of mixing across all the compounds in the dataset.
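The fraction constraint described above can be made concrete with a short sketch: in a 2x2x2 supercell, a fraction k/8 corresponds to k of the 8 A or B sites, or to 3k of the 24 X sites. The helper names are hypothetical, the random pick is a plain uniform choice (the additional requirement that every species and mixing type be well represented across the dataset is not modelled), and the SQS generation step itself is not shown.

```python
import itertools
import random
from fractions import Fraction

# Site counts in a cubic 2x2x2 ABX3 supercell (8 formula units), as stated above.
SITES_PER_SUPERCELL = {"A": 8, "B": 8, "X": 24}
STEPS = 8  # every mixing fraction is a multiple of 1/8, at every site


def mixing_options(n_species):
    """All ways to split one site among n_species, each getting k/8 with k >= 1."""
    return [
        combo
        for combo in itertools.product(range(1, STEPS + 1), repeat=n_species)
        if sum(combo) == STEPS
    ]


def site_occupancy(site, fractions):
    """Convert {species: Fraction} on one site into integer atom counts in the supercell."""
    if sum(fractions.values()) != 1:
        raise ValueError("fractions on a site must sum to 1")
    n_sites = SITES_PER_SUPERCELL[site]
    counts = {}
    for species, frac in fractions.items():
        if (frac * STEPS).denominator != 1:
            raise ValueError(f"{species}: {frac} is not a multiple of 1/8")
        counts[species] = int(frac * n_sites)
    return counts


def random_composition(site, species, rng=None):
    """Uniformly pick one allowed split of `site` among `species` (only one site is mixed)."""
    rng = rng or random.Random()
    combo = rng.choice(mixing_options(len(species)))
    return site_occupancy(site, {s: Fraction(k, STEPS) for s, k in zip(species, combo)})
```

For example, a 3/8 I and 5/8 Br split on the X site maps to 9 I and 15 Br atoms, which is why using eighths at the X site keeps every composition realizable in the same supercell.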



Reviewer comments:
3. The band gap RMSE in comparison with experiments is too high. The authors explain that this could be due to ambiguity about non-cubic structural phases in the experimental data, which makes it hard to judge the performance of a DFT functional’s predictions. Instead, the authors could use experimental data for known cubic phases, compare them with each set of DFT functional predictions, and include that information in Table 2. That would allow for an appropriate comparison, even with fewer data points.

Response:
We understand the reviewer’s concerns and agree that the RMSE values of DFT-computed band gaps against experiments are too large. We reiterate that this arises from the difficulty of correctly assigning a perovskite phase to experimental data, of collecting cubic-only data points, and of generating sufficient computational data on non-cubic perovskite phases (which is part of our ongoing research; initial results are captured in Figure R6). Nevertheless, we collected experimental band gap values for halide perovskite compositions assumed in the literature to be cubic [2-5]; we are currently able to collect only 9 data points, for well-known halide perovskites including FA/MA/Cs-Pb/Sn-based compounds. Figure R2 shows the different PBE and HSE values plotted against the measured values, and Table R1 shows the RMSE values for each functional on this smaller dataset. HSE-PBE-SOC does indeed have the best prediction accuracy, with an RMSE of 0.4 eV, which is much smaller than the errors over the entire set of experimental compounds. This figure and table have also been added to the SI (Figure S4 and Table SI).



Functional               PBE     HSE-rel   HSE-rel+SOC   HSE-PBE+SOC
RMSE of band gap (eV)    0.50    1.81      1.41          0.40
Table R1. RMSE of DFT-computed band gaps against experimental measurements for cubic halide perovskites.
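The RMSE in Table R1 presumably follows the standard definition, RMSE = sqrt((1/n) Σ (E_DFT − E_exp)^2). The sketch below computes it per functional using only the compounds that have both a computed and a measured value, since the functionals cover different subsets of compounds; the dictionary layout and compound labels are hypothetical.

```python
import math


def rmse(predicted, measured):
    """Root-mean-square error between paired predicted and measured values."""
    if len(predicted) != len(measured) or not predicted:
        raise ValueError("need non-empty sequences of equal length")
    return math.sqrt(sum((p - m) ** 2 for p, m in zip(predicted, measured)) / len(predicted))


def rmse_by_functional(table, experiment):
    """table: {functional: {compound: gap}}; experiment: {compound: measured gap}.

    Each functional is scored only on compounds present in both mappings."""
    result = {}
    for functional, gaps in table.items():
        shared = sorted(set(gaps) & set(experiment))
        if shared:
            result[functional] = rmse(
                [gaps[c] for c in shared], [experiment[c] for c in shared]
            )
    return result
```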

Finally, the following sentence has been added to the discussion in Section 3.1 to address the reviewer’s comment:

We additionally performed a comparison for a much smaller dataset of 9 compounds known to be cubic from experiments; a plot showing these band gaps is presented in Figure S4, and the corresponding RMSE values in Table S1 show that Egap^HSE-PBE-SOC has a respectable RMSE of 0.4 eV against experiments.


Reviewer comments:
4. For lattice constant, the authors provided only RMSE. Percentage errors might be appropriate as well since the lattice constants vary for different materials and sizes.

Response:
We thank the reviewer for this valuable suggestion and agree that the percentage error would be an even better metric for lattice constant comparison. Table R2 shows the RMSE and percentage error values for lattice constants from PBE-relaxation and HSE-relaxation compared with known experimental values. We find that errors in pseudo-cubic lattice constant prediction from either functional lie between 2% and 4%. The table has been added as Table SII to the SI.

Functional     Lattice constant RMSE (Å)   Lattice constant percentage error
PBE-relaxed    0.27                        2.21%
HSE-relaxed    0.31                        3.91%
Table R2. RMSE and percentage error of PBE- and HSE-relaxed lattice constants against experimental values.
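The response does not spell out how the percentage error is aggregated over compounds; a common choice, assumed here, is the mean absolute percentage error, (100/n) Σ |a_DFT − a_exp| / a_exp. A minimal sketch (the function name is hypothetical):

```python
def mean_abs_percentage_error(predicted, measured):
    """Mean absolute percentage error (in %) of predicted vs. measured values."""
    if len(predicted) != len(measured) or not predicted:
        raise ValueError("need non-empty sequences of equal length")
    if any(m == 0 for m in measured):
        raise ValueError("measured values must be non-zero")
    total = sum(abs(p - m) / abs(m) for p, m in zip(predicted, measured))
    return 100.0 * total / len(predicted)
```

Unlike the RMSE in Å, this metric normalizes each error by the size of the measured lattice constant, which is the point raised by the reviewer for compounds whose lattice constants differ substantially.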

The following sentence has been added to the discussion in Section 3.1 to address the reviewer’s comment:

The percentage error of PBE-relaxed lattice constants compared to experiment is 2.21%, and the corresponding HSE-relaxed percentage error is 3.91%.


Reviewer comments:
5. Fig. 4: shaded region should be described in the caption. (some figures in SI as well.)

Response:
We thank the reviewer for pointing this out; all figure captions are now updated with a description of the shaded region, which is meant to capture the compounds with suitable properties (decomposition energy < 0.0 eV, SLME > 15%, band gap between 1 and 2.5 eV).

The following sentence has been added to the caption of Figure 4 (similar sentences are also added to Figures S4, S5, and S6):

The shaded regions attempt to capture compounds with negative decomposition energy, band gap between 1 eV and 2.5 eV, and SLME larger than 15%.
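Applying the three criteria from this caption to a list of computed compounds is a simple filter. In the sketch below the record fields and formulas are hypothetical, and treating the band gap bounds as inclusive is an assumption, since the caption only says "between 1 eV and 2.5 eV".

```python
from dataclasses import dataclass


@dataclass
class Compound:
    formula: str
    decomposition_energy_ev: float  # negative means stable against decomposition
    band_gap_ev: float
    slme_percent: float


def in_shaded_region(c, gap_range=(1.0, 2.5), min_slme=15.0):
    """True if a compound meets all three criteria named in the caption."""
    gap_low, gap_high = gap_range
    return (
        c.decomposition_energy_ev < 0.0
        and gap_low <= c.band_gap_ev <= gap_high
        and c.slme_percent > min_slme
    )


def screen(compounds):
    """Return the formulas of compounds inside the shaded region, in input order."""
    return [c.formula for c in compounds if in_shaded_region(c)]
```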

The following sentence has been added to the caption of Figure 8 (similar sentences are also added to Figures S10 and S11):

The vertical and horizontal dashed lines aim to distinguish between negative and positive decomposition energies and to highlight desirable ranges of other quantities, namely DCavg < 5% (a), t ∈ (0.813, 1.107) (b), tB < 4.18 (c), and o ∈ (0.442, 0.895) (d).
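The ranges quoted in this caption can be checked with a few lines of code. This sketch assumes that t is the Goldschmidt tolerance factor, that tB is the tolerance factor τ of Bartel et al. (whose published perovskite criterion is τ < 4.18), and that o is the octahedral factor rB/rX. It takes ionic radii as inputs and leaves the choice of radii, including effective radii for mixed sites, to the user. DCavg is omitted because its definition is not given here, and the function names are hypothetical.

```python
import math


def goldschmidt_t(r_a, r_b, r_x):
    """Goldschmidt tolerance factor t = (rA + rX) / (sqrt(2) * (rB + rX))."""
    return (r_a + r_x) / (math.sqrt(2.0) * (r_b + r_x))


def octahedral_factor(r_b, r_x):
    """Octahedral factor rB / rX."""
    return r_b / r_x


def bartel_tau(r_a, r_b, r_x, n_a=1):
    """Bartel et al. tolerance factor: tau = rX/rB - nA * (nA - (rA/rB) / ln(rA/rB)).

    n_a is the oxidation state of A (1 for monovalent A cations); the
    formula assumes rA > rB so that ln(rA/rB) > 0."""
    if r_a <= r_b:
        raise ValueError("tau is defined here only for rA > rB")
    ratio = r_a / r_b
    return r_x / r_b - n_a * (n_a - ratio / math.log(ratio))


def within_plotted_ranges(r_a, r_b, r_x):
    """Check the caption's desirable ranges (open intervals assumed)."""
    t = goldschmidt_t(r_a, r_b, r_x)
    mu = octahedral_factor(r_b, r_x)
    tau = bartel_tau(r_a, r_b, r_x)
    return {
        "t": 0.813 < t < 1.107,
        "tB": tau < 4.18,
        "o": 0.442 < mu < 0.895,
    }
```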


Reviewer comments:
6. The authors should explain why the number of calculations differs between PBE and the three types of HSE.

Response:
We thank the reviewer for raising this very valid question. The number of data points differs between PBE and each HSE functional simply because we were able to complete far fewer HSE computations within the constraints of available computing time and researcher time. In total, we optimized 495 compounds with GGA-PBE, constituting a dataset in which each of the 14 species and each type of mixing are adequately represented. 299 of these compounds were fully relaxed with HSE, selected such that every A/B/X species and every type of mixing is adequately represented. The same is true for the 282 compounds calculated with HSE-relaxed+SOC and the 244 compounds calculated with HSE-PBE+SOC, with each dataset again maintaining sufficient diversity. We posit that composition-property relationships can be comprehensively learned from the larger PBE dataset and that, going forward, the principles of multi-fidelity learning [6] can be applied to extend these insights to the much smaller datasets from expensive HSE (or other) functionals. To address the reviewer’s comment, the following sentence has been added to the end of Section 2.1:

The different number of data points from different functionals is a consequence of the number of computations that were completed within the constraints of computing resources and researcher time, but adequate chemical diversity is maintained in each dataset, and as explained later—insights from cheaper functionals can be extended to more expensive theories.


Reviewer comments:
7. The authors report that larger structural relaxations with HSE compared to PBE cause a 1 eV band gap difference. How do these structural relaxations affect the relative stability between PBE and HSE-rel? Further, are the energy differences and the band gap differences between HSE-rel and PBE correlated?

Response:
We thank the reviewer for raising this excellent point. Based on our earlier comparisons of PBE-relaxed and HSE-relaxed pseudo-cubic lattice constants with experimental measurements, we find that HSE relaxation is unnecessary and often causes additional distortions and rearrangements that take the structure farther away from the ground state. For an exhaustive look at the effect of HSE relaxation, we present three plots below that compare the pseudo-cubic lattice constants (Figure R3), decomposition energies (Figure R4), and band gaps (Figure R5, also Figure 6a in the manuscript) from PBE relaxation and HSE relaxation. We find a good correlation between PBE and HSE in all the plots, especially for lattice constants and decomposition energies. On average, HSE slightly overpredicts the lattice constant compared to PBE, while the decomposition energies from the two functionals differ little. There is a strong correlation between HSE-relaxed and PBE-relaxed band gaps as well, with the former being about 1 eV greater than the latter on average, and the difference being even larger for hybrid perovskites with organic species at the A-site. We further note that HSE relaxation may be useful and essential for some compositions, but not for all. Learning the relationship between PBE- and HSE-relaxed properties can help us further improve predictions at more expensive levels of theory. Furthermore, incorporating SOC with HSE relaxation brings the band gap down (Figure 6b in the manuscript) and improves its correlation with the PBE band gap. The figures below have been included as new SI figures (Figures S6, S7, and S8).
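The two quantities discussed in this response, the average offset between HSE-relaxed and PBE-relaxed values and how strongly they correlate, can both be obtained from paired values. The sketch below fits a least-squares line HSE = slope × PBE + intercept and reports the Pearson r alongside the mean offset. It is a generic illustration with a hypothetical function name, not the analysis code behind Figures R3 to R5.

```python
import math


def compare_levels_of_theory(pbe, hse):
    """Summarize paired PBE and HSE values of one property.

    Returns the mean offset (HSE - PBE), the least-squares slope and intercept
    of HSE against PBE, and the Pearson correlation coefficient r."""
    n = len(pbe)
    if n != len(hse) or n < 2:
        raise ValueError("need paired values for at least 2 compounds")
    mean_p, mean_h = sum(pbe) / n, sum(hse) / n
    spp = sum((p - mean_p) ** 2 for p in pbe)
    shh = sum((h - mean_h) ** 2 for h in hse)
    sph = sum((p - mean_p) * (h - mean_h) for p, h in zip(pbe, hse))
    if spp == 0 or shh == 0:
        raise ValueError("each level of theory needs some spread in values")
    slope = sph / spp
    return {
        "mean_offset": mean_h - mean_p,
        "slope": slope,
        "intercept": mean_h - slope * mean_p,
        "r": sph / math.sqrt(spp * shh),
    }
```

A mean offset near 1 eV together with r near 1 would correspond to the "strongly correlated but shifted" behaviour described for the band gaps, whereas a slope far from 1 would indicate that the offset grows or shrinks with the PBE value.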



Furthermore, to address the reviewer’s comment, the following sentence has been added to page 9 of the manuscript:

Furthermore, we observe from our dataset that HSE-relaxation might be superfluous, as HSE-relaxed lattice parameters, decomposition energies, and band gaps largely correlate with corresponding PBE-relaxed values, as shown in Figures S6, S7, and S8.


Reviewer comments:
8. According to Fig. 8a, deviation from cubicity might lower the decomposition energy or help stabilize the structure.

Response:
We thank the reviewer for this comment and completely agree with it: a larger deviation from cubicity does tend to decrease the decomposition energy further. This is expected, as greater relaxation and distortion during DFT optimization further lower the total energy. We especially find this effect in compounds containing organic molecules at the A-site mixed with either other organic or inorganic species; the large differences in ionic radii lead to substantial distortions and geometries very far from an ideal cubic perovskite structure. Our intention with this analysis is to ensure that any promising compounds we select from computational screening actually have a perovskite structure and have not relaxed to a non-perovskite-like configuration, for which we use the “deviation from cubicity” as a metric. In ongoing work, we are investigating the effects of lattice strain, octahedral distortion and tilting, and molecular rotation on the stability and optoelectronic properties of halide perovskites. To address the reviewer’s comment, we added the following sentence to the end of Section 3.5 in the manuscript:

It should be noted that larger DCavg values tend to correspond to negative ∆H^PBE, but despite their stability from DFT, such compounds adopt a non-perovskite-like phase and are thus excluded from current screening and saved for future analysis.


Reviewer comments:
9. This raises an important question: did the authors try any symmetry breaking when relaxing the structures in the dataset, which may cause further deviation from cubicity but also stabilize many more compositions? Although precise control over such operations in a high-throughput format is difficult, it may be an interesting avenue to do some simple proof-of-concept testing with a few (say 10) structures with different mixing.
In page 14, it is reported that some compounds with t > 1.1 and low decomposition energy have large deviation from cubicity, which may further indicate that deviation from cubicity may not be necessarily bad, but maybe essential to stabilize more HaPs evaluated.

Response:
We thank the reviewer for this excellent suggestion, and agree that symmetry breaking and intentional deviation from cubicity can help stabilize perovskites further. This is indeed suggested by the plots in Figure 8 in the manuscript, where some compounds are found to be stable despite deviation from cubicity and tolerance/octahedral factors outside the preferred range. In ongoing work, we are investigating the effects of lattice strain, octahedral distortion and tilting, and molecular rotation on the stability and optoelectronic properties of halide perovskites. We are planning a comprehensive manuscript on the computational investigation of polymorphism in halide perovskites, motivated by symmetry breaking and re-optimization of previously known low-energy phases. As an example, we show below a plot (Figure R6, added as new Figure S18 in the SI) from our ongoing work showing the DFT-computed decomposition energy for cubic 2x2x2 MAPbBr3 supercells with varying degrees of lattice and octahedral distortion. As expected, the decomposition energy changes with the average distortion, and many of the slightly distorted structures still show low enough energies to be metastable, revealing that some amount of symmetry breaking could indeed help find stable materials with different properties.



To address the reviewer’s comment, the following passage has been added to page 14 of the manuscript:

As an example, plots showing the computed decomposition energy for selected perovskites with varying degrees of octahedral distortion, as well as in different prototypical phases, are presented in Figs. S18 and S19; they show that some amount of distortion can keep the perovskite stable, that the cubic phase is not always the ground state, and that the range of decomposition energies for a given composition can sometimes be quite wide.


Reviewer comments:
10. The authors performed a Pearson correlation analysis on the screened materials shown in Fig. 10 and Fig. S12. Considering the non-uniformities in different elemental compositions in the initial dataset and the even smaller sample of screened materials, the trends observed may not be generalizable or applicable to future design principles. As more compositions are evaluated, including with intentional symmetry breaking, the current trends may not hold true. The authors should discuss this in the manuscript and, if possible, make comparisons between pure-cubic and pseudo-cubic compositions.

Response:
We thank the reviewer for this comment and would like to clarify that Figures 10 and S12 do not show Pearson correlation values, but simply the frequency of different mixing fractions adopted by various A/B/X species within the PBE and HSE screened lists. These plots are analogous to Figure 2, which shows the frequencies of mixing fractions for the entire PBE dataset and reveals that every species adopts every type of mixing somewhere in the dataset, but that any given species appears more frequently as an unmixed/pure component; for instance, there are many more Cs-based pure, B-mixed, and X-mixed compounds than compounds with Cs mixing in 1/8, 2/8, etc. fractions. Similarly, Figure 10 tells us that many of the screened compounds are pure MA or pure Pb iodides and bromides, and the other bars reveal in what fractions different species tend to mix at different sites to yield desirable properties. At the A-site, species prefer mixing in very small fractions or no mixing at all. Of the 32 compounds screened from PBE, 26 are pure-cubic and the remaining 6 are pseudo-cubic.
We agree with the reviewer that these observations may change if screening were performed over thousands of compounds instead of only ~500 as in this work. In our ongoing work, we are training ML predictive models using the DFT datasets and deploying the best models over hundreds of thousands of hypothetical compounds to obtain a much larger list of promising compositions. This extended study will help us obtain mixing-fraction distributions similar to Figures 10 and S12 and make the comparisons suggested by the reviewer. The following sentence has been added to the second-to-last paragraph of Section 3.7 to address the reviewer’s comment:

We note that our conclusions on the populations of different species and types of mixing in the screened set of compounds may change slightly in the future if a much larger dataset is available, such as via machine learning-based predictions.


Reviewer comments:
11. Regarding the use cases of this dataset, and based on the analysis of descriptors, the authors should include a discussion in the current manuscript of what kind of inverse design is possible and what properties can be accurately predicted, considering the dataset size and specifications.

Response:
We thank the reviewer for this valuable comment, and certainly plan to apply a variety of inverse design techniques for future discovery of halide perovskites. To address the reviewer’s comment, we modified the final paragraph of Section 4 in the manuscript as follows:

In general, ML has a massive role to play here, as has been demonstrated for HaPs in multiple prior works (refs. 16, 22). Concurrent manuscripts are planned to report rigorously optimized predictive models for multiple properties and fidelities, based on the datasets and descriptors discussed in this work. Such models can easily be extended to new choices of A/B/X ions, such as transition metals (ref. 16), as well as to other phases, by adding new dimensions to the descriptors. The inclusion of more general crystal structure representations as ML inputs, such as crystal graphs and graph neural networks (refs. 50-52), would be essential for treating the same compositions and structures with a variety of distortions or lattice strains. Once composition-based and/or structure-based ML predictive models are rigorously optimized and validated, they could be deployed over thousands of ABX3 compounds available in databases such as the Materials Project (ref. 53) or the Open Quantum Materials Database (ref. 54), as well as over millions of hypothetical materials, for prediction, screening, and discovery. Finally, inverse design techniques, such as a genetic algorithm (GA; ref. 55) or generative neural networks (ref. 56), could be applied to the DFT-ML surrogate models to drive the efficient discovery of new HaP compositions/structures with multiple desired properties. For instance, our ongoing work involves generating populations of novel HaP compositions using GA while optimizing a fitness function that includes metrics for chemical feasibility, negative ∆H, Egap between 1 and 2 eV, and SLME > 15%; this process can yield thousands of promising compounds beyond the scope of the current work and beyond brute-force enumeration. The dataset and analysis presented in this work serve as a springboard for efforts that are currently underway to ultimately accelerate the prediction and design of novel perovskites for optoelectronics, and to extend such approaches to other material classes and applications.


Reviewer comments:
12. Fig. S4: The authors mention that HSE with SOC is a sufficient level of theory and should provide accurate electronic and optical properties. However, the maximum SLME in Figures S4b, S5b, and S6b is 16%, compared to 25% with PBE. The authors should explain the reason for this drop in SLME at higher levels of theory.

Response:
We thank the reviewer for raising this excellent point. The PBE-computed SLME peaks at 25% while the corresponding HSE values peak around 16% because the latter are computed by shifting the PBE-based optical absorption spectrum by the difference between the PBE and HSE band gaps. For the compounds with the highest PBE SLME values, the PBE band gaps are around 1 eV, while the corresponding HSE-rel band gaps are higher and the HSE-rel+SOC and HSE-PBE-SOC band gaps are lower. These band gap differences bring the HSE SLME below the PBE peak. This effect holds for all compounds with PBE SLME > 15%, such that the SLME from HSE after the band gap shift always tends to be lower and peaks at 15 to 16%. We reiterate that a full HSE-based optical absorption calculation is not performed in this work due to its computational expense. The absorption spectrum is assumed to have the same shape as from PBE, only shifted along the energy axis by the difference between the PBE and HSE band gaps. To address the reviewer’s comment, the following statements have been added to the discussion on page 9 of the manuscript:
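The band gap correction described in this response amounts to a rigid "scissor" shift of the absorption spectrum: α_shifted(E) = α_PBE(E − Δ) with Δ = Egap^HSE − Egap^PBE. A minimal sketch on a tabulated, sorted energy grid follows. It uses linear interpolation, treats absorption below the tabulated range as zero and holds the last tabulated value beyond it (both assumptions), and the function names are hypothetical. Computing the SLME itself from the shifted spectrum is not reproduced here.

```python
import bisect


def interp_hold(x, xs, ys):
    """Linear interpolation on sorted xs: 0 below the grid, last value held above it."""
    if x < xs[0]:
        return 0.0
    if x >= xs[-1]:
        return float(ys[-1])
    i = bisect.bisect_right(xs, x)  # xs[i - 1] <= x < xs[i]
    x0, x1 = xs[i - 1], xs[i]
    y0, y1 = ys[i - 1], ys[i]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


def scissor_shift(energies, alpha_pbe, gap_pbe, gap_hse):
    """Rigidly shift a PBE absorption spectrum by delta = gap_hse - gap_pbe.

    Returns alpha_pbe(E - delta) evaluated on the same energy grid, so the
    absorption onset moves up in energy if delta > 0 and down if delta < 0."""
    delta = gap_hse - gap_pbe
    return [interp_hold(e - delta, energies, alpha_pbe) for e in energies]
```

Because only the onset moves while the spectral shape is kept, any change in the SLME under this scheme comes entirely from the change in band gap, consistent with the explanation above.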

While SLME^PBE peaked at around 25%, the corresponding HSE peaks appear around 16% as a consequence of shifting the optical absorption spectrum by the difference between the PBE and HSE band gaps. For compounds with the highest SLME^PBE values, Egap^PBE is around 1 eV, while the corresponding HSE-rel band gaps are higher and the HSE-rel+SOC and HSE-PBE-SOC band gaps are lower. These band gap differences take SLME^HSE below the PBE peak. This effect holds for all compounds with SLME^PBE > 15%, such that SLME^HSE following the band gap shift always tends to be lower and peaks at 15 to 16%.

Reviewer 3

Reviewer comments:
Yang et al. reported a dataset of 495 pseudo-cubic ABX3 halide perovskite compounds using density functional theory calculations. Some critical material properties related to optoelectronic applications, including band gaps and theoretical photovoltaic efficiency are computed. The trends of materials properties dependent on the compositions are studied. This dataset and the related codes are useful for further computational and theoretical studies of halide perovskite materials. This manuscript could be published after addressing the following comments:

Response:
We thank the reviewer for their encouraging comments about the manuscript and future studies it would lead to. Below, we respond to the reviewer’s individual comments.


Reviewer comments:
1. It is known that halide compounds are not limited to cubic (or pseudo-cubic) structures. In fact, some pseudo-cubic halide compounds are not stable, and their ground-state structure could belong to a different lattice system; see a prior computational and theoretical study and the compounds reported therein, such as (FA)3Sb2I9, Energy Environ. Sci., 12, 2233-2243 (2019). The authors should comment on this appropriately.

Response:
We thank the reviewer for raising this very important point, and certainly agree that a comprehensive understanding of halide perovskite properties is incomplete without consideration of non-cubic phases. The current work is restricted to the (pseudo-)cubic phase for every composition to keep the chemical space tractable and to understand how properties change with composition alone, independent of phase. We reported in past work [7] that going from cubic to non-cubic phases could change the DFT band gap of an ABX3 compound by as much as 0.5 eV. We are further exploring this effect as part of a major extension of the current work: we have performed additional calculations for dozens of compounds in several prototypical perovskite phases (cubic, tetragonal, orthorhombic, and hexagonal) and computed their stability and optoelectronic properties. As an example, we present below a plot (Figure R7, also added as Figure S19 in the SI) of the computed decomposition energy for 11 selected hybrid perovskites in 4 different phases: the cubic phase is clearly not always the ground state, and the range of decomposition energies for the same composition can sometimes be quite wide. Nevertheless, we believe our current choice of restricting all compounds to the well-known cubic phase is justified for learning composition-property relationships and by the fact that most cubic phases can be stabilized by pressure or temperature. We are currently preparing a new manuscript on a comprehensive computational investigation of polymorphism in halide perovskites, which includes both comparisons of cubic and non-cubic phases and a look at many distorted and metastable structures within the same perovskite composition/phase.



Finally, we edited the second paragraph of Section 4 (Perspective and Future Work) in the main manuscript to the following to address the reviewer’s comments:

In our work, the immediate next extension is towards non-cubic perovskite phases. For instance, CsPbBr3 may prefer the orthorhombic phase, while MAPbI3 and MA(Pb-Sn)I3 may assume the tetragonal phase, yet this work considers all such compounds only in a cubic or pseudo-cubic rendition. In previous work (ref. 16), it was shown that for the same composition, unalloyed or with mixing, changing the phase could modify the band gap by 0.5 eV or more in many cases. Cubic perovskite phases are often not the ground state and are in many cases very unstable, as indicated by the large positive decomposition energies of some compounds in our dataset. Non-cubic phases are either the most stable or metastable/competing phases for most of the compositions studied in this work. Currently, we have high-throughput computations ongoing for tetragonal, orthorhombic, and hexagonal phases of several mixed HaPs; the perovskite phase itself can be added as an input alongside the compositional and elemental descriptors to obtain new correlations. In addition, computations are being performed for further tailoring of properties by accessing polymorphs within each phase, e.g., via octahedral distortion and rotation (ref. 45), or via re-optimization of the same composition in larger supercells with slight distortions (ref. 46). As an example, plots showing the computed decomposition energy for selected perovskites with varying degrees of octahedral distortion, as well as in different prototypical phases, are presented in Figs. S18 and S19; they show that some amount of distortion can keep the perovskite stable, that the cubic phase is not always the ground state, and that the range of decomposition energies for a given composition can sometimes be quite wide.


Reviewer comments:
2. From Figure 1, it seems that the computed compounds are limited to the common 2+ cations such as Pb, Sn, Ge, Ba, Sr, and Ca. This should be indicated in the title and/or abstract to clearly tell readers what kind of compounds are computed in this work, since “halide perovskite alloys” covers a broad range of compounds, many more than those computed in this work.

Response:
We appreciate the reviewer’s comment and agree that it is important to specify that the B-site cations are restricted to those that adopt the 2+ oxidation state. We adopted Pb, Sn, and Ge because of their ubiquity in halide perovskites, while other 2+ cations such as Ca, Sr, and Ba have also been used in small amounts in these compounds. To address this comment, we have modified the third sentence of the abstract as follows:

In this work, we report a density functional theory (DFT) dataset of 495 ABX3 halide perovskite compounds, with monovalent organic or inorganic cations as A, divalent Group 2 or Group 14 elements as B, and I, Br, or Cl as X, and different amounts of mixing applied at each site using the special quasirandom structures (SQS) approach.


Reviewer comments:
3. For optoelectronic applications, electron and hole transport properties are critical, yet they are ignored in this work. The authors should discuss them or explain why they are not needed/calculated.

Response:
We thank the reviewer for raising this very important point. Indeed, properties such as the effective masses and mobilities of electrons and holes are crucial for semiconductors used in optoelectronic applications. While a comprehensive calculation and analysis of these properties is beyond the scope of the current work, we certainly plan to extend our computations to the electron and hole transport properties of halide perovskites, for instance by using Boltzmann transport equation (BTE) models that account for phonon scattering. Since we have already computed optimized structures and electronic properties with the PBE and HSE functionals, we have the starting data for these additional calculations. Open-source codes such as BoltzTraP from Madsen et al. [8] can be used to study carrier transport as part of our future work. It should be noted that accurate DFT electronic band structures are needed to calculate such properties: while our dataset contains band structures from GGA-PBE, extending the calculations to obtain full band structures with HSE functionals would be very expensive and is part of a larger endeavor. To address the reviewer’s comment, we added the following sentence to the end of the first paragraph of Section 4 (Perspective and Future Work) in the main manuscript:

It is important to note that while the bulk stability, band gap, and theoretical single-junction PV efficiency provide essential parameters for initial screening of PV-relevant halide perovskites, extensions need to be made to other crucial properties, including electron and hole transport properties, formation energies and electronic levels of point defects, and the behavior of relevant perovskite surfaces and interfaces.







REFERENCES

[1] Yu, Liping, and Alex Zunger. “Identification of Potential Photovoltaic Absorbers Based on First-Principles Spectroscopic Screening of Materials.” Physical Review Letters 108.6 (2012): 068701.
[2] Ferrara, Chiara et al. “Wide Band-Gap Tuning in Sn-Based Hybrid Perovskites through Cation Replacement: The FA1-xMAxSnBr3 Mixed System.” Journal of Materials Chemistry A 5.19 (2017): 9391–9395.
[3] Morana, Marta et al. “Cubic or Not Cubic? Combined Experimental and Computational Investigation of the Short-Range Order of Tin Halide Perovskites.” The Journal of Physical Chemistry Letters 14.8 (2023): 2178–2186.
[4] Mannino, Giovanni et al. “Temperature-Dependent Optical Band Gap in CsPbBr3, MAPbBr3, and FAPbBr3 Single Crystals.” The Journal of Physical Chemistry Letters 11.7 (2020): 2490–2496.
[5] Chen, Zhuo et al. “Direct Synthesis of Cubic Phase CsPbI3 Nanowires.” CrystEngComm 21.9 (2019): 1389–1396.
[6] Chen, Chi, Yunxing Zuo, Weike Ye, Xiangguo Li, and Shyue Ping Ong. Nature Computational Science 1 (2021): 46–53.
[7] Mannodi-Kanakkithodi, Arun, and Maria K. Y. Chan. Energy & Environmental Science 15 (2022): 1930–1949.
[8] Madsen, Georg K. H., and David J. Singh. “BoltzTraP. A Code for Calculating Band-Structure Dependent Quantities.” Computer Physics Communications 175.1 (2006): 67–71.




Round 2

Revised manuscript submitted on 18 Apr 2023
 

03-May-2023

Dear Dr Mannodi Kanakkithodi:

Manuscript ID: DD-ART-02-2023-000015.R1
TITLE: A High-Throughput Computational Dataset of Halide Perovskite Alloys

Thank you for your submission to Digital Discovery, published by the Royal Society of Chemistry. I sent your manuscript to reviewers and I have now received their reports which are copied below.

After careful evaluation of your manuscript and the reviewers’ reports, I will be pleased to accept your manuscript for publication after minor revisions regarding captioning, raised by Reviewer #2.

Please revise your manuscript to fully address the reviewers’ comments. When you submit your revised manuscript please include a point by point response to the reviewers’ comments and highlight the changes you have made. Full details of the files you need to submit are listed at the end of this email.

Digital Discovery strongly encourages authors of research articles to include an ‘Author contributions’ section in their manuscript, for publication in the final article. This should appear immediately above the ‘Conflict of interest’ and ‘Acknowledgement’ sections. I strongly recommend you use CRediT (the Contributor Roles Taxonomy, https://credit.niso.org/) for standardised contribution descriptions. All authors should have agreed to their individual contributions ahead of submission and these should accurately reflect contributions to the work. Please refer to our general author guidelines https://www.rsc.org/journals-books-databases/author-and-reviewer-hub/authors-information/responsibilities/ for more information.

Please submit your revised manuscript as soon as possible using this link:

*** PLEASE NOTE: This is a two-step process. After clicking on the link, you will be directed to a webpage to confirm. ***

https://mc.manuscriptcentral.com/dd?link_removed

(This link goes straight to your account, without the need to log in to the system. For your account security you should not share this link with others.)

Alternatively, you can login to your account (https://mc.manuscriptcentral.com/dd) where you will need your case-sensitive USER ID and password.

You should submit your revised manuscript as soon as possible; please note you will receive a series of automatic reminders. If your revisions will take a significant length of time, please contact me. If I do not hear from you, I may withdraw your manuscript from consideration and you will have to resubmit. Any resubmission will receive a new submission date.

The Royal Society of Chemistry requires all submitting authors to provide their ORCID iD when they submit a revised manuscript. This is quick and easy to do as part of the revised manuscript submission process. We will publish this information with the article, and you may choose to have your ORCID record updated automatically with details of the publication.

Please also encourage your co-authors to sign up for their own ORCID account and associate it with their account on our manuscript submission system. For further information see: https://www.rsc.org/journals-books-databases/journal-authors-reviewers/processes-policies/#attribution-id

I look forward to receiving your revised manuscript.

Yours sincerely,
Dr Joshua Schrier
Associate Editor, Digital Discovery

************


 
Reviewer 2

In Figure R6, it is unclear how many compounds are present and how each structure's decomposition energy varies with lattice distortions.
The concerns raised in the initial review have been adequately addressed by the authors in their responses and the manuscript. The manuscript may be accepted for publication in Digital Discovery.

Reviewer 1

RE: re-review of code/data (DD-ART-02-2023-000015.R1)


Title: A High-Throughput Computational Dataset of Halide Perovskite Alloys
Authors: Jiaqi Yang, Panayotis Manganaris, and Arun Mannodi-Kanakkithodi

The authors have included all the requested information in their GitHub repository and have clearly listed the main inputs and entry points. The clarification about the SLME file in the revised manuscript is helpful as the statement “Introduced by Yu and Zunger, the SLME is a convenient metric for evaluating a semiconductor’s suitability for single junction photovoltaic (PV) absorption” in the original manuscript was misleading. Finally, the codes were able to be executed properly after missing data files were added.

I would like to thank the authors for clearly addressing all my concerns. As I have no further concerns or questions regarding the code/data aspects of this work, I recommend publishing this paper in Digital Discovery.


 


Referee #1 comments:

The authors have included all the requested information in their GitHub repository and have clearly listed the main inputs and entry points. The clarification about the SLME file in the revised manuscript is helpful as the statement “Introduced by Yu and Zunger, the SLME is a convenient metric for evaluating a semiconductor’s suitability for single junction photovoltaic (PV) absorption” in the original manuscript was misleading. Finally, the codes were able to be executed properly after missing data files were added.

I would like to thank the authors for clearly addressing all my concerns. As I have no further concerns or questions regarding the code/data aspects of this work, I recommend publishing this paper in Digital Discovery.

Author response:

We thank the reviewer very much for the comments!


Referee #2 comments:

In Figure R6, it is unclear how many compounds are present and how each structure's decomposition energy varies with lattice distortions.
The concerns raised in the initial review have been adequately addressed by the authors in their responses and the manuscript. The manuscript may be accepted for publication in Digital Discovery.

Author response:
Thank you very much for the comments!
In Figure R6 of the response letter, we plotted only MAPbBr3 with multiple lattice strains and random octahedral distortions. For some data points with different types of lattice and octahedral distortions, the average distortion comes out to be the same, resulting in a range of decomposition energies for the same average distortion value. To reiterate, Figure R6 shows only one compound, and similar plots can be made for other compounds as well.




Round 3

Revised manuscript submitted on 03 May 2023
 

04-May-2023

Dear Dr Mannodi Kanakkithodi:

Manuscript ID: DD-ART-02-2023-000015.R2
TITLE: A High-Throughput Computational Dataset of Halide Perovskite Alloys

Thank you for submitting your revised manuscript to Digital Discovery. I am pleased to accept your manuscript for publication in its current form.

You will shortly receive a separate email from us requesting you to submit a licence to publish for your article, so that we can proceed with the preparation and publication of your manuscript.

You can highlight your article and the work of your group on the back cover of Digital Discovery. If you are interested in this opportunity please contact the editorial office for more information.

Promote your research, accelerate its impact – find out more about our article promotion services here: https://rsc.li/promoteyourresearch.

If you would like us to promote your article on our Twitter account @digital_rsc please fill out this form: https://form.jotform.com/213544038469056.

By publishing your article in Digital Discovery, you are supporting the Royal Society of Chemistry to help the chemical science community make the world a better place.

With best wishes,

Dr Joshua Schrier
Associate Editor, Digital Discovery


******
******

Please contact the journal at digitaldiscovery@rsc.org





Transparent peer review

To support increased transparency, we offer authors the option to publish the peer review history alongside their article. Reviewers are anonymous unless they choose to sign their report.

We are currently unable to show comments or responses that were provided as attachments. If the peer review history indicates that attachments are available, or if you find there is review content missing, you can request the full review record from our Publishing customer services team at RSC1@rsc.org.


Content on this page is licensed under a Creative Commons Attribution 4.0 International license.