Daniel W.
Davies
a,
Keith T.
Butler
*a,
Jonathan M.
Skelton
a,
Congwei
Xie
b,
Artem R.
Oganov
cde and
Aron
Walsh
*fg
aCentre for Sustainable Chemical Technologies, Department of Chemistry, University of Bath, Claverton Down, Bath BA2 7AY, UK. E-mail: k.t.butler@bath.ac.uk
bScience and Technology on Thermostructural Composite Materials Laboratory, International Center for Materials Discovery, School of Materials Science and Engineering, Northwestern Polytechnical University, Xian, Shaanxi 710072, Peoples Republic of China
cInternational Center for Materials Discovery, School of Materials Science and Engineering, Northwestern Polytechnical University, Xian, Shaanxi 710072, Peoples Republic of China
dSkolkovo Institute of Science and Technology, 3 Nobel Street, Moscow Region 143026, Russia
eMoscow Institute of Physics and Technology, Dolgoprudny, Moscow Region 141700, Russia
fDepartment of Materials Science and Engineering, Yonsei University, Seoul 03722, Korea. E-mail: a.walsh@imperial.ac.uk
gDepartment of Materials, Imperial College London, Exhibition Road, London SW7 2AZ, UK
First published on 4th December 2017
The standard paradigm in computational materials science is INPUT: STRUCTURE; OUTPUT: PROPERTIES, which has yielded many successes but is ill-suited for exploring large areas of chemical and configurational hyperspace. We report a high-throughput screening procedure that uses compositional descriptors to search for new photoactive semiconducting compounds. We show how feeding high-ranking element combinations to structure prediction algorithms can constitute a pragmatic computer-aided materials design approach. Techniques based on structural analogy (data mining of known lattice types) and global searches (direct optimisation using evolutionary algorithms) are combined for translating between chemical composition and crystal structure. The properties of four novel chalcohalides (Sn5S4Cl2, Sn4SF6, Cd5S4Cl2 and Cd4SF6) are predicted, of which two are calculated to have bandgaps in the visible range of the electromagnetic spectrum.
There are vast areas of unexplored chemical space for inorganic compounds.30 Such a space is intractable to high-throughput first-principles computation, even with tremendous advances in computing power and algorithms. As such, a different approach is required to efficiently explore the search space – one that is less computationally demanding overall, but sufficiently accurate.
One modern tool that is providing impressive leaps forward in this area is machine learning (ML), a subfield of artificial intelligence that involves statistical algorithms whose performance improves with experience. A growing infrastructure of ML tools has enabled its application to complex problems in many areas of chemistry and materials science.6,20,21 This includes the development of models that relate system descriptors to desirable properties in order to reveal structure–property relationships,31 the prediction of the likelihood of a composition to adopt a given crystal structure,32 and the use of quantum-mechanics results as training data to extrapolate and discover new materials at a fraction of the computational cost.29,33
Another approach is to apply a hierarchy of screening steps, based on pre-existing methods, whereby the fact that accuracy is low in initial steps is counteracted by the idea that as the size of the search space that can be screened is so large, the chance of finding a promising material at the end of the process remains high. Here we present one such workflow incorporating simple chemical descriptors, data mining from public databases, density functional theory (DFT) calculations and global structure searching algorithms (Fig. 1) to translate from a compositional search space to compounds predicted to have target properties by quantum-mechanical calculations.
We employ a multi-stage screening approach in a search for new photoactive semiconductors. While metal oxides combine many attractive properties for energy materials (e.g. chemical stability and low cost), they usually have bandgaps too large to absorb a significant fraction of sunlight. The formation of multi-anion compounds offers a route to modifying the electronic structure, so we consider all ternary metal chalcohalides, (i.e., AxByCz with B = [O, S, Se, Te] and C = [F, Cl, Br, I]). As a target application, we search for materials for solar fuel generation, specifically for photoelectrochemical water splitting, where a set of well-defined screening criteria enables us to quickly narrow down the search space. Our searching methodology is built on already established and freely available materials design tools (SMACT, PYMATGEN and USPEX) and can be adapted to search for different classes of materials, in a wide range of contexts of technological interest.
First, the SMACT code30 is used to narrow down the ternary compound search space of roughly 32 million compositions to the chalcohalide search space of 161000 compositions. The SSE scale is then used to screen for suitable bandgaps and band-edge positions. The A cations are restricted to those with a SSE higher than the water reduction potential (approximately 4.5 V in relation to the vacuum at pH = 0) and the bandgap window was set to 1.5–2.5 eV. The latter criterion is set to a value range higher than the free energy for water dissociation (1.2 eV), in order to compensate for the combination of loss mechanisms found in practical devices that mean a bandgap as large as 2.2 eV could be required.35,36 This results in in 7676 candidate AxByCz compositions with unique x, y, z stoichiometries.
Next, we sort the candidates by the sustainability of their constituent elements based on the Herfindahl–Hirschman Index for elemental reserves (HHIR).37 The HHIR includes factors such as geopolitical influence over materials supply and price, and for a given composition can be obtained as the weighted average over the constituent elements. At this stage, because stoichiometry is variable, we consider the mean value for each AxByCz chemical system. The six most sustainable chemical systems according to this scale are SnxSyXz, CdxSyXz and TixSyXz, where X = [Cl, F]. Of these, the Sn- and Cd-containing compositions are selected and Ti3+ compounds are excluded due to the d1 electronic configuration being linked to fast electron–hole recombination, and, more practically, the well-known challenges for electronic-structure modelling due to the high correlation.38
The HHIR scores of ZnxSyXz and CdxSeyXz are the next lowest in the ranking, making these the next most sustainable according to this scale. This is because Zn and Se have higher HHIR scores than Ti and S respectively. These systems could be of interest for future studies in the same spirit, particularly the Zn-containing compositions due to their low toxicity. This rapid screening process based on composition alone constitutes the first phase of our overall procedure (part 1 of Fig. 1).
We combine two machine learning approaches for generating candidate crystal structures from chemical composition, viz. (1) analogy with known crystal structures reported in crystallographic databases, and (2) direct global crystal structure searching. The first approach has a much lower computational cost, exploiting data on existing compounds, and we use this step to assess the metastability of a candidate composition. Those compounds that fall within an acceptable window of metastability are then passed to the second method, which is a more rigorous search of configurational space and allows for new structure types to be adopted.
For crystal structure prediction by analogy, we adopt the structure substitution algorithm developed by Hautier et al.,40 as implemented in the PYMATGEN framework.41 In this method, a combination of ions are substituted onto lattice sites in known structures from the Inorganic Crystal Structure Database (ICSD).42 Each ion substitution is associated with a certain probability, which comes from a statistical model trained on the compounds that already exist in the ICSD. If the overall probability for a given set of substitutions is above a certain threshold, it is added to a list of possible structures. This substitution process is performed on each known crystal structure in the database.
For each S-for-O of the four compositions, the candidate crystal structures are locally optimized using DFT calculations and the structure with the lowest energy per atom selected. Fig. 2 illustrates this process for one of the structures suggested by the algorithm for the CdxSyClz chemical system. In this case, the structure suggested is based on Hg5O4Cl2 due to the high probabilities associated with both Cd-for-Hg and S-for-O substitutions. Table 2 contains the chemical formulae of the four compounds deemed to be the most stable as a result of this process, along with the formulae of their parent structures in the ICSD. We next assess the thermodynamic stability of the candidate materials.
Fig. 2 Illustration of the process of crystal structure prediction by ion substitution into existing lattice types. The Hg5O4Cl2 structure (a) is identified as a candidate structure for the CdxSyClz chemical system. The Hg2+ (grey balls) and O2− ions (red balls) are replaced by Cd2+ (purple balls) and S2− ions (yellow balls), respectively, to produce the Cd5S4Cl2 structure (b). Forces on the ions are then minimised using DFT with the PBEsol functional43 to produce the relaxed structure (c). |
Fortunately, the existence of databases of DFT total energies have all but eliminated the need for carrying out calculations for all phases of a given chemical system. Instead, one can perform calculations on new compounds using identical parameters to those used for the data in a given database, thus allowing for direct comparison of energies. Similarly, one can use the energy values in a database to construct a phase diagram and identify where on the diagram the new phase would appear. In doing so, the set of polymorphs and decomposition products that require explicit calculation can be identified. We note that it is standard to calculate such convex hulls based on internal energies, which neglect finite temperature contributions to the free energy of a compound.
Here, we use the Materials Project database to construct phase diagrams using the PYMATGEN code,41 and hence identify decomposition products. As mentioned above, and as depicted in the phase diagrams in Fig. 3, it is not necessary to consider competing polymorphs as no compounds have yet been reported for these compositions. As can be seen from Table 2, all of the values of Ehull for the structures predicted by analogy lie between 18 and 97 meV per atom. Hence, all the compounds can formally be described as thermodynamically metastable at 0 K, but does this rule out their existence?
Metastable materials exist and are ubiquitous in both nature and technology. This includes obvious examples such as diamond vs. the lower energy allotrope of carbon, graphite, as well as classes of materials such as zeolites and metal–organic frameworks.44 It was recently estimated by Sun et al. that around half of all known inorganic materials are metastable.25 Whether or not the value of Ehull is enough to predict the likelihood of successful synthesis of a material is a question that has yet to be answered. In the same work by Sun et al., it was shown that the likelihood of existence drops off exponentially as Ehull increases. The exact rate of the drop depends on the chemistry of the system. We use 100 meV per atom as a guiding principle for the maximum Ehull, as this criteria covers approximately 90% of compounds in the Materials Project database that represent fully-characterised structures in the ICSD. The four structures found by analogy all fall within this metastability window, so they are all carried forward to the global structure searching stage.
For each of the four compositions, the global structure search algorithm45,46 yields a different crystal structure to that found by analogy with known structures (Fig. 4). For each of the structures generated by the global search, there is no way in which the data-mining algorithm could have arrived at the same result. This is an intrinsic limitation of the data-mining approach, as it relies on a database of known structures and it is therefore incapable of predicting new structure types. Three of the four compounds adopt structure types that have not yet been reported, disregarding those with fractional occupancy on some lattice sites. The remaining compound, Cd5S4Cl2, adopts the same structure type as Li5BiO5.47 However, this substitution is rejected by the structure prediction algorithm on the basis that the resulting formula is not charge neutral – the structure we find is partially inverted in terms of anion/cation occupancy.
Fig. 4 Crystal structures of the four candidate compositions as predicted by analogy through data mining of other structures and by a first-principles global structure search algorithm. |
The values of Ehull for the structures predicted by global structure search are also shown in Table 2, and are universally lower than those found by analogy. While the structural analogy procedure is limited by the diversity of known structure types, the global structure search approach is restricted only by the structural complexity (number of formula units) included in the search. A holistic assessment of performance in the context of high-throughput screening must however also take into account time and resources: the data-mining algorithm takes only a few minutes to run on a desktop computer, while the global structure searching requires a supercomputing resource where around 10000 CPU hours were needed for each material.
In addition to thermodynamic stability, another factor that cannot be ignored is dynamic stability, to ensure that the crystal structures are true local minima (and not saddle points) on the potential energy surface. Finite-displacement calculations were carried out to obtain the vibrational (phonon) frequencies of each of the compounds, and no negative-frequency (imaginary) phonon modes were found at the zone centre (Γ point) for any of the structures. Full details of this analysis can be found in the ESI.†
Compound | Space group | a (Å) | b (Å) | c (Å) | Formula units per cell |
---|---|---|---|---|---|
Sn5S4Cl2 | Pma2 | 17.529 | 5.771 | 5.817 | 2 |
Sn4SF6 | R3 | 8.615 | 8.615 | 9.528 | 3 |
Cd5S4Cl2 | Cm | 14.507 | 4.212 | 15.631 | 2 |
Cd4SF6 | Rm | 3.832 | 3.832 | 37.148 | 3 |
Having established promising compositions and their candidate structures, we next go on to perform quantitative analyses of the detailed electronic structure of these materials.
The first-principles values of Eg are presented in Table 2 alongside the bandgaps estimated using the SSE scale. Two of the compounds found by the screening procedure, Cd5S4Cl2 and Cd4SF6, have bandgaps in the visible range of 2.75 and 2.15 eV, respectively. Sn5S4Cl2 has a bandgap of 0.9 eV, which is better suited for solar cell or thermoelectric applications. This is encouraging, given the small set of compounds that have been brought through to this stage of the screening process and the qualitative nature of the SSE metric employed to screen the bandgaps.
Compound | Parent | E analogyhull (meV per atom) | E globalhull (meV per atom) | E SSEg (eV) | E g (eV) | EA (eV) | IP (eV) | ||
---|---|---|---|---|---|---|---|---|---|
a When only polar surfaces could be found, a dipole correction term was added to the calculation of the surface dipole, which yields upper and lower bounds to the EA and IP values (see Computational methods section). | |||||||||
Sn5S4Cl2 | Hg5 (O2Cl)2 | 96.5 | 61.8 | 2.0 | 0.91 | 3.30 | 4.21 | 0.50 | 0.40 |
Sn4SF6 | Hg4OF6 | 51.8 | 46.7 | 2.0 | 3.36 | 2.45–2.94a | 5.81–6.30a | 0.86 | 2.01 |
Cd5S4Cl2 | Hg5 (O2Cl)2 | 83.5 | 50.2 | 1.9 | 2.75 | 3.33 | 6.08 | 0.18 | 2.58 |
Cd4SF6 | Cd4F6O | 18.2 | 18.0 | 1.9 | 2.15 | 4.33 | 6.48 | 0.25 | 2.00 |
Beyond the bandgap, quantum-mechanical calculations can also provide access to optical absorption spectra via computation of the complex dielectric function. Fig. 5 shows the simulated spectra of the four compounds of interest. The Cd compounds display moderate absorption in the visible region, indicating their potential for use as solar fuel or photovoltaic materials. Of the two, Cd4SF6 absorbs photons with energy across more of the visible range but quite weakly, suggesting that thicker layers would be needed in a device. Meanwhile, Cd5S4Cl2 absorbs more strongly but at a higher energy, so would be suited to incorporation into a tandem solar cell.
The absolute band edge positions are also calculated using surface (non-polar slab) models of the four materials. The CBM position is the negative of the electron affinity (EA), and as indicated in Table 2, the EA values are all <4.5 eV. This indicates that as well as having promising bandgaps, the two Cd-based compounds have potential for use in photoelectrochemical water splitting applications, with VBM and CBM positions that bridge the water oxidation and reduction potentials, enabling the redox reaction. For Sn4SF6, no slab without an overall dipole could be found, so we instead report a likely range for the EA and IP values after applying a dipole correction in the slab calculation (see Computational methods section). This material also bridges these energies, but has too wide a band gap, while the other Sn-containing compound, Sn5S4Cl2, has an appropriate EA, but too small a bandgap, as has already been discussed. This is summarised in the energy band alignment diagram, Fig. 6.
Finally, carrier effective mass (m*) is a quantity that can also provide preliminary insight into the performance of a semiconducting material, with smaller m* values being more desirable as this quantity is inversely proportional to conductivity. The two Cd-containing compounds have lower values than the Sn-containing compounds (Table 2). This is a result of the metallic s-states forming the lower conduction band in the former case which give higher dispersion than the more directional metallic p-states in the latter (Fig. 7a and b). The values are in general much higher, with the sulphur and halide p-states dominating the upper valence band. One notable exception is Sn5S4Cl2 with a value of 0.40 me. This is a result of strong hybridisation between the Sn s and S p orbitals which form a two-dimensional Sn–S network along which carriers can transport without encountering a Cl atom (Fig. 4). This is possible due to the Sn2+ oxidation state, which results in the Sn s orbitals remaining occupied. In the case of Sn4SF6, no such Sn–S network exists and S p states dominate the VBM, while F p states also contribute (Fig. 7c).
The calculated band structure of Sn5S4Cl2 reveals the presence of multiple band extrema (“multi-valley”), a sought-after feature in the design of thermoelectric materials.48 Furthermore, the effective number of extrema is increased by the presence of multiple bands within a few kBT in energy of each other at the R, T, S and U points in the Brillouin zone (see ESI Fig. S4†).
xqA + yqB + zqC = 0 | (1) |
The SSE scale34 is used to limit the A cations to those with a SSE higher than the water reduction potential and the bandgap window was set to 1.5–2.5 eV. The SSE provides information on valence and conduction bands on the basis of the Frontier orbitals of the constituent ions. It reflects ionisation potential of an anion (filled electronic states) and electron affinity of a cation (empty electronic states). The energies of 40 elements were originally fitted from a test set of 69 closed-shell binary inorganic compounds, and now the SSE values for 94 elements are available.50 The bandgap (Eg) can then be estimated from the tabulated SSE values as
ESSEg = SSEcation − SSEanion | (2) |
For multicomponent systems, the limiting SSE values are used.
Absolute electron energies (IP and EA values) are calculated by generating 2D slab models of low Miller index, non-polar surfaces of the crystal structures. Hybrid DFT (HSE06 functional) is used to calculate the surface dipole, D, which is the difference between the average electrostatic potential in the slab and that in the vacuum level. The VBM and CBM positions from the bulk calculations can then be used to calculated the true VBM and CBM positions. These are simply the differences between D and VBMbulk, and D and CBMbulk, respectively. Convergence with respect to slab thickness and vacuum distance was achieved within two repeat layers and 15 Å respectively, in all cases. When no non-polar surfaces could be found for a material, a dipole–dipole correction, as implemented in the VASP code, was added to the potential. This leads to an upper and lower limit of the potential in the vacuum level, and hence an upper and lower limit to D.
Carrier effective masses are calculated using band structures generated from hybrid DFT (HSE06 functional) calculations. The SeeKpath code60 is used to generate a suitable path through the Brillouin zone, which is sampled at a resolution of 0.01 Å−1 between each k-point. In order to calculate effective masses, a parabola is fit to all points from the minimum (maximum) of the CBM (VBM) to the points kBT higher (lower).
Footnote |
† Electronic supplementary information (ESI) available. See DOI: 10.1039/c7sc03961a |
This journal is © The Royal Society of Chemistry 2018 |