Open Access Article
Laura-Bianca
Paşca
a,
Yuanbin
Liu
a,
Andy S.
Anker
ab,
Ludmilla
Steier
a and
Volker L.
Deringer
*a
aInorganic Chemistry Laboratory, Department of Chemistry, University of Oxford, Oxford OX1 3QR, UK. E-mail: volker.deringer@chem.ox.ac.uk
bDepartment of Energy Conversion and Storage, Technical University of Denmark, Kgs. Lyngby 2800, Denmark
First published on 2nd September 2025
The chalcogenide perovskite material BaZrS3 is of growing interest for emerging thin-film photovoltaics. Here we show how machine-learning-driven modelling can be used to describe the material's amorphous precursor as well as polycrystalline structures with complex grain boundaries. Using a bespoke machine-learned interatomic potential (MLIP) model for BaZrS3, we study the atomic-scale structure of the amorphous phase, quantify grain-boundary formation energies, and create realistic-scale polycrystalline structural models which can be compared to experimental data. Beyond BaZrS3, our work exemplifies the increasingly central role of MLIPs in materials chemistry and marks a step towards realistic device-scale simulations of materials that are gaining momentum in the fields of photovoltaics and photocatalysis.
Given the rapidly growing interest in BaZrS3, computational methods are increasingly used to complement experimental studies of this material. Density-functional theory (DFT) and phonon computations were employed to map out the thermodynamic conditions under which BaZrS3 films might form and which surface termination is expected to be the most stable.13,14 To reach beyond the system-size limits of DFT-based methods, machine-learned interatomic potentials (MLIPs) have now been applied to many functional materials,15–17 including halide perovskites.18–21 The chalcogenide alternatives, viz. BaZrS3 and homologous compounds, were recently studied in a comprehensive work using ML-accelerated molecular dynamics (MD).22 These studies have typically focused on the crystalline material20,22 and the formation of other phases, such as the binary crystals or 2D Ruddlesden–Popper structures.13 To validate ML-accelerated MD, Kayastha et al. compared simulated XRD patterns for MD-generated BaZrS3 structures with experimental XRD patterns.23 However, these simulations also were focused on ordered unit cells, corresponding to single-crystalline samples.
This limitation is more generally a current research challenge in modelling perovskite solar-cell materials: experimentally synthesised materials are usually polycrystalline, and fully realistic simulations would therefore need to involve structural models representing individual grains, with sizes typically on the order of tens or hundreds of nanometres.10,12 A single-crystalline structural model will therefore likely not suffice to fully understand the structure and properties of BaZrS3 films. We have recently reported very-large-scale atomistic models of functional materials, including phase-change materials for data storage30 and amorphous silicon which is relevant to solar cells.31,32 It would seem highly beneficial to achieve this type of realism for chemically complex, perovskite-type photoabsorber materials as well.
Here, we introduce a machine-learned interatomic potential (MLIP) model for ordered and disordered forms of BaZrS3, based on the atomic cluster expansion (ACE) framework.33,34 For training, we employ a combination of de novo35,36 and domain-specific iterative training (Fig. 1), aiming for the final dataset to capture the structural complexities of BaZrS3. We show how ML-driven simulations can describe three scenarios relevant to experimental studies: (i) the amorphous precursor; (ii) large-scale grain boundaries; and (iii) polycrystalline BaZrS3 structures. This way, ML-driven simulations can corroborate experimental observations regarding the atomistic structure of this material and provide insights that experiments on their own can not. Beyond their application to BaZrS3, we expect that ML-driven approaches for simulating polycrystalline structures—from precursors to individual grains—can more broadly accelerate computational studies of diverse polycrystalline solar-cell materials.
![]() | ||
| Fig. 1 A machine-learned interatomic potential for crystalline and amorphous BaZrS3. (a) Schematics of the different approaches used in training dataset construction, showing examples of the different configuration types sampled. Structural images were created using OVITO.24 (b) Evolution of the training dataset, visualised in a style similar to ref. 27. Each slice provides a two-dimensional representation (using the UMAP algorithm28) of the relevant configurational space, showing the reference training dataset of the ML potential, distributed based on the structures' average atomic-environment similarity. The latter is quantified using the SOAP kernel similarity metric29 with a cut-off radius of 5 Å and smoothness of 0.75 Å (see SI for more details): the distances between points in this two-dimensional space therefore reflect the structural (dis-) similarity between entries of the training dataset. | ||
We used different MLIP fitting approaches as part of the development of the final model. Initially, the Gaussian Approximation Potential (GAP)43 framework was used because of its data efficiency: it allowed us to generate a stable initial potential with relatively few training data points (90 initial RSS structures, with a further 899 structures obtained from de novo GAP-RSS exploration).35,36 Once a larger dataset had been built by iterative training, the computationally efficient ACE framework as implemented in
was used to fit a faster MLIP model to that dataset.34,44
The final ACE model was obtained by iterative training until it could reliably generate a structural model for the amorphous phase using an MD melt–quench protocol in the NPT ensemble (see SI for further details). Two model versions were fitted to the final dataset: the first by filtering out structures with high DFT energies (>1 eV per atom), indicative of very close contacts between atoms or high-energy RSS structures, and the second using the full reference dataset (SI). The first version was used to generate all the quantitative data presented, as it achieved good accuracy on the structures relevant in the present study, showing an energy root-mean-square error (RMSE) of 13.9 meV per atom relative to DFT results using the PBEsol functional.45 The second version, which includes higher-energy dimers and random structures, is less accurate (energy RMSE: 23.1 meV per atom), but it did not fail when handling structures with closer contacts between atoms, and therefore it was used to relax the polycrystalline structures with randomly-oriented grains. (We consider a simulation to have “failed” if, during the MD run, atoms come closer to each other than 1 Å, collapsing the structure, or if atoms are lost during the simulation.) Details of numerical errors are provided in Fig. S1 and S2 in the SI. As an additional test, we computed the phonon dispersion curves for crystalline BaZrS3 (Fig. S3).
The local ordering of S atoms around the A- and B-site cations can be observed in the radial distribution function (RDF) of the quenched amorphous structure (Fig. 2d, left), which shows well-defined peaks in the short-range region, at distances roughly below 4 Å. These two main peaks correspond to the Zr–S and Ba–S interatomic distances, as confirmed by the partial RDF plots (Fig. 2d, right). The RDF data obtained from atomistic simulations can be compared with experimental extended X-ray absorption fine structure spectroscopy (EXAFS) results,10 which probe the local structure around Zr atoms in the amorphous phase. Although the techniques are different, both the EXAFS data and our simulations qualitatively indicate under-coordinated Zr environments in amorphous BaZrS3 compared to its crystalline counterpart. Specifically, analysis of the EXAFS data yielded a Zr–S bond length of 2.593 Å and coordination number (CN) of 5.2 (see ref. 10 for details), while our simulations yield an average bond length of 2.575 Å and CN of 5.9 (determined using a 3.1 Å cut-off in OVITO).24 A more in-depth comparison between the experimental and simulated values of the bond length and CN of Zr is provided in Table S5. In addition to the differences in methodology, the experimental values may be influenced by factors such as defects in the amorphous precursor material, for example sulfur vacancies, which were not taken into account in our current model. While modelling defects is beyond the scope of the current work, datasets such as those provided by the ab initio study of the defect landscape of BaZrS3 by Desai et al.,25 as well as methodologies such as those developed by Mosquera-Lois et al.,26 could be used to expand the training of MLIP models for BaZrS3 further.
The presence of B-site cation halide fragments in our simulated structure, some maintaining a similar octahedral geometry to the crystalline phase, is in agreement with experimentally reported structural details of the amorphous phases derived from other materials adopting the perovskite structure.10,46,47 Preserved local bonding units of TiO6 connected in a random network via apex-, edge-, and face-sharing octahedra have also been observed in the amorphous phases of BaTiO3 (ref. 48 and 49) and SrTiO3.50
The expected bulk density of amorphous BaZrO3 phases was reported to be in the range of 82–84% of the crystalline density (ref. 51). In our ML-driven NPT simulation of amorphous BaZrS3, the observed density was 3.94 g cm−3, which is roughly 92.5% of the crystalline density, in qualitative agreement with the lower density observed in the related oxide compound. The average bond length of the 6-fold-coordinated Zr atoms in the amorphous phase, with a value of 2.58 Å, is similar to the expected bond length of 2.55–2.56 Å in the ZrS6 octahedra of the crystalline phase. In the case of the 5- and 7-coordinated Zr–S environments, the bonds are slightly compressed or elongated compared to the crystalline phase (2.50 Å and 2.66 Å, respectively). The Ba–S bonds vary in length within a similar range to that observed in the crystalline phase, around 3.0–3.4 Å; however, as also observed in the partial RDF peak, the coordination can vary more than in the case of the Zr environment.
The relationship between the geometry of the cation environments in the amorphous and crystalline forms of BaZrS3 can be observed in the angle distribution function (ADF) plots for the S–Zr–S and S–Ba–S bond angles (Fig. 2e and f, respectively). The main ADF peaks in the case of Zr are clearly distributed around the values expected for the octahedral crystalline environment, specifically 90° angles between equatorial and axial Zr–S bonds, 180° between the axial Zr–S bonds, and about 150° for the bonds connecting the octahedra in the orthorhombic crystal structure. The distribution is harder to assess for the S–Ba–S bond angles, both due to the wider range of bond angles observed for higher coordination geometries, and due to the greater variety of CNs present in the amorphous phase. Given the undercoordination of the cations compared to the crystalline phase, there is an observed increase in close S–S contacts in the amorphous structure (Fig. S4).
We note that structural properties of the amorphous phase are of interest not only to provide insight into the atomic environments found in the as-deposited precursor to the polycrystalline material, but additionally to reveal possible structure–property relationships in amorphous or surface-amorphised perovskites which have shown good performance as electro- or photoelectrocatalysts in the case of oxide perovskite materials.52 The presence of dangling bonds from undercoordinated atoms has been suggested as a possible reason for the efficiency of perovskites with amorphised surfaces during electrochemical processes.52 Further experimental work could determine whether electrocatalytic surface reconstruction occurs in BaZrS3 and therefore whether this could explain the performance of the material in electrocatalysed oxygen or hydrogen evolution reactions, where the reactant binds to undercoordinated Zr sites.53,54 In the long run, knowledge of the coordination environments in the amorphous precursor phase might aid in the development of higher-quality crystallised materials with fewer defects.
For most GB systems studied, the MLIP's prediction has a minimum accuracy of 0.10 J m−2 relative to DFT values. This is within 0.07 J m−2 of the errors obtained in a study by Ito et al. for an ML potential specialised on grain-boundary structures.57 The maximum error is obtained for the GB with a misorientation angle of 1.5°, which is 0.18 J m−2 off the DFT value. Overall, the potential is also able to capture the relative stabilities of the different GB systems, with the exception of the GB with an angle of 5.5°, where the energy prediction incorrectly identifies it as higher in energy than its relative ground-truth value. These errors could be addressed by adding the structures of interest to the dataset or changing the weightings of relevant configuration types such that the potential is more specialised on the region of interest (Fig. S5). As noted, it has been previously shown that ML potentials can be specifically trained on different GB structures.57,58 However, such targeted training is beyond the scope of the current work.
An example of the successful relaxation of the expected CSL Σ31[001] per (001) GB system is shown in Fig. 3b. The relaxation of an approximately 1000-atom GB structure, at the limit of what is achievable using DFT methods, is achieved within seconds using the MLIP, while systems of up to hundreds of thousands of atoms can be relaxed on the timescale of minutes on a 128-core CPU compute node.
000 atoms (Fig. 3c). The polycrystalline unit cell with 6 randomly-oriented grains was generated by Voronoi tessellation in Atomsk59 and relaxed with the potential trained on the full dataset, to avoid unphysical close contacts between atoms (see SI for details). The extrapolation grade based on the D-optimality algorithm is an established method for measuring the uncertainty of an ACE ML potential in a particular region of the configurational space being modelled.57,60 We found that the MLIP is able to relax the polycrystalline unit cell, resulting in a structure with extrapolation grades up to a maximum value of 2, which is considered accurate, as described in ref. 60 (see Fig. S6). This result suggests that the potential can predict the atomistic structure with relatively high certainty, including in the boundary region. Given the potential's performance for modelling the amorphous phase, its transferability to the polycrystalline system is unsurprising, as we found that in the polycrystalline relaxed structure, the coordination around the Zr atom is similar to that found in the amorphous phase, but with a shorter Zr–S bond length (CNZr = 5.87 and dZr–S = 2.54 Å; see also Fig. S7). Thus, the undercoordinated Zr environment suggests that the relaxed grain boundaries are S-deficient (Fig. S8).
While the potential can efficiently relax systems exceeding 600
000 atoms, calculating scattering patterns of discrete structures remains computationally demanding,61 and so we focused our study on polycrystalline unit cells with a side length of 20 nm (≈310
000 atoms). Fig. 4 illustrates the effect that polycrystallinity has on the simulated XRD pattern and pair distribution function (PDF) of BaZrS3, which were generated using the DebyeCalculator package (ref. 61). Going from a single-crystal model shown in Fig. 4a to a polycrystalline model with 8 grains and 50 grains in Fig. 4b and c, respectively, the peaks in the simulated XRD patterns broaden—an effect primarily attributable to size variation: the size of the grains decreases when their number increases in a simulation box of constant size, as expected from applying the Scherrer equation in XRD experiments. This broadening causes the smaller, secondary peaks observed in the crystalline pattern to gradually become unresolved for the systems with a larger number of grains; the effect is detailed in Fig. S9 and S10, which compare the simulated partial XRD patterns of single-crystal and polycrystalline BaZrS3 models, showing the effect of increasing the number of grains even further.
![]() | ||
Fig. 4 Effect of grain size on the X-ray diffraction (XRD) patterns and pair distribution functions (PDFs) of crystalline BaZrS3. (a) A single-crystal 10 240-atom model (top) and its simulated PDF [G(r); bottom]. (b) A polycrystalline model with a simulation box side length of 20 nm containing 8 grains (313 449 atoms). (c) A polycrystalline model with a simulation box side length of 20 nm containing 50 grains (312 370 atoms). (d and e) Comparison between the XRD patterns and PDFs, respectively, for the simulated models and the experimental data from ref. 55. For simulating the scattering data, we employed the DebyeCalculator default parameters, except for the intensity data shown in panel (d) where Qmin and Qmax were set to 1.61 Å−1 and 5.25 Å−1, respectively, assuming the use of K-α1 radiation for the conversion to 2θ. | ||
In the PDF, size effects manifest as a more rapid dampening of the signal at high r values in the model with a large number of grains relative to the polycrystalline model with fewer, larger grains and to that of the single-crystal model. Additionally, the local structure of each model varies slightly, presumably due to differences in the grain-boundary fraction. In Fig. 4d and e, the simulated data are compared to experimental data from ref. 55. The experimental XRD and PDF patterns are described best by the polycrystalline 50-grains model, consistent with the nanoparticulate nature of the material described in ref. 55, which presents a solution-phase synthesis of plate-like, aggregated BaZrS3 nanoparticles. Similar results were also reported in ref. 62: a low-temperature synthesis of BaZrS3 nanoparticles with grain sizes of 3–5 nm. A related trend in the experimental XRD patterns can be observed when samples are progressively annealed at higher temperatures, whereby the crystal size increases.3
The present study is an early step towards the realistic modelling of thin-film photovoltaic materials. In the future, as an extension to this work, the modelling of surface structures, or amorphised surfaces, could provide further insights into the material's functionality. More generally, our work shows how polycrystalline perovskite materials can now be modelled with atomistic machine learning, having in view the realism that has already been achieved for other technologically relevant systems.30,63 The present study thus lays the groundwork for such studies involving more complex photovoltaic materials, including mixed-cation and anion perovskites, with hybrid organic–inorganic components, and for starting to approach their modelling at the length scale of real devices.
Supplementary information: details about training data, model hyperparameters, and simulation protocols. See DOI: https://doi.org/10.1039/d5ta04536c.
| This journal is © The Royal Society of Chemistry 2025 |