Peichen Zhong*ab, Xinzhe Dai bc, Bowen Deng bc, Gerbrand Ceder bc and Kristin A. Persson*abc
a Bakar Institute of Digital Materials for the Planet, UC Berkeley, California 94720, USA. E-mail: zhongpc@berkeley.edu
b Materials Sciences Division, Lawrence Berkeley National Laboratory, California 94720, USA. E-mail: kapersson@lbl.gov
c Department of Materials Science and Engineering, UC Berkeley, California 94720, USA
First published on 4th August 2025
Unconditional crystal structure generation with diffusion models faces challenges in identifying symmetric crystals as the unit cell size increases. We present the crystal host-guided generation (CHGGen) framework to address this challenge through conditional generation using an inpainting method, which optimizes a fraction of atomic positions within a predefined and symmetrized host structure to improve the success rate for symmetric structure generation. By integrating inpainting structure generation with a foundation potential for structure optimization, we demonstrate the method on the ZnS–P2S5 and Li–Si chemical systems, where the inpainting method generates a higher fraction of symmetric structures than unconditional generation. The practical significance of CHGGen extends to enabling the structural modification of crystal structures, particularly for systems with partial occupancy or intercalation chemistry. The inpainting method also allows for seamless integration with other generative models, providing a versatile framework for accelerating materials discovery.
New concepts
Graph neural network-based diffusion models suffer from locality bias, generating reasonable local environments but failing to propagate long-range crystallographic order. We demonstrate that crystal-host guided inpainting generation (CHGGen) can mitigate this issue. Inpainting is a conditional generation technique originally developed in computer vision for context-based image generation. In crystal structure prediction problems, it optimizes atomic positions within symmetrized host structures rather than generating complete structures unconditionally. Our approach achieves higher symmetry than unconditional methods, particularly for polyanion systems. Beyond structure prediction, CHGGen enables structural modification of materials with partial occupancy or intercalation chemistry. By integrating with foundation potentials for structure optimization, CHGGen provides a modular, practical framework for accelerating materials discovery across diverse chemical spaces.
Recent advances in graph neural network (GNN)-based machine learning models have introduced promising alternatives to traditional CSP methods, with a key milestone being the development of foundation potentials, or universal machine learning interatomic potentials–offering accurate and transferable modeling across diverse material systems.11–14 These foundation potentials trained on millions of DFT calculations demonstrate remarkable generalizability in exploring vast chemical spaces for materials discovery.15–18 Another emerging direction is deep-generative models, particularly diffusion models, which learn the data manifold or probabilistic distribution and generate new configurations via stochastic or variational approaches.19–21 Xie et al.22 introduced CDVAE that uses a variational autoencoder to sample lattice parameters and compositions and a diffusion model to optimize atomic coordinates. Although promising, CDVAE-generated structures are predominantly thermodynamically unstable or lack symmetry.23,24 Kurz et al.25 introduced a wrapped normal distribution to effectively couple lattice diffusion with fractional coordinates, a strategy that was successfully implemented in DiffCSP.25–27 Zeni et al.28 further adapted the scheme with edge features for lattice scores in the MatterGen framework that enables diffusions on lattices, fractional coordinates, and chemical species. By learning from materials datasets such as Materials Project (MP)29 and Alexandria,30 MatterGen is capable of conducting scalable and universal exploration in high-dimensional design space and achieves excellent performance in structural stability, uniqueness, and novelty, albeit with an observed limitation to a smaller scale (e.g., Natom ≤ 20). 
Beyond GNN-based models, other approaches such as U-net-based diffusion models,31 optimization of subcell structures from amorphous configurations,32 and large language models33,34 have shown promise in crystal structure generation without the need to limit structure sizes.
In this work, we extend GNN-based diffusion models to enable fractional crystal structure design via inpainting generation: given a host or substrate structure, we optimize the placement of additional ‘guest’ atoms within the existing framework. This application is particularly valuable in several material domains, e.g., defective materials, intercalation electrodes,35 molecular adsorption on catalyst surfaces,36 and interfacial solid reactions where surfaces reconstruct while bulk structures remain unchanged.37 We first summarize the fundamentals of diffusion models and inpainting generation, and discuss the locality bias of GNN-based diffusion models, which becomes a key challenge when generation is performed at large scales. To address this challenge, we introduce crystal host-guided generation (CHGGen), which integrates inpainting generation based on symmetrized frameworks and a foundation potential for structure optimization. We demonstrate the effectiveness of host-guided generation through a case study on CSP within the ZnS–P2S5 chemical space, and showcase the broader applicability of CHGGen across the continuous chemical space of the Lix–Si alloy system. Finally, we discuss the limitations of CHGGen and the opportunities for combining it with state-of-the-art generative models for CSP problems.
can be achieved by modeling the gradient of the log-probability density, known as the score function $\nabla_x \log p(x)$ in diffusion models. Song et al.38 demonstrated that both the diffusion process and its reverse can be formulated as stochastic differential equations (SDE)
$$\mathrm{d}x = f(x,t)\,\mathrm{d}t + g(t)\,\mathrm{d}w, \tag{1}$$
$$\mathrm{d}x = [f(x,t) - g^2(t)\nabla_x \log p_t(x)]\,\mathrm{d}t + g(t)\,\mathrm{d}\bar{w}, \tag{2}$$
where $w$ and $\bar{w}$ represent the standard Brownian motion process and its time-reversed analogue, respectively. $f(x,t)$ is the drift coefficient and $g(t)$ is the diffusion coefficient of $x(t)$. $p_t(x)$ denotes the probability density of $x(t)$. Here $t \in [0,T]$ is the time variable describing the diffusion process $\{x(t)\}_{t=0}^{T}$.
Eqn (1) describes the forward process that corrupts the data distribution $x(0) \sim p_0(x)$ to obtain the prior distribution $x(T) \sim p_T(x)$, which follows a uniform distribution. Eqn (2) describes the reverse process to sample $x(0)$ by solving the reverse SDE with the score term $\nabla_x \log p_t(x)$. For crystal structure generation, we adopt the variance-exploding (VE) diffusion scheme for the atomic coordinates, where the process $\{x(t)\}_{t=0}^{T}$ is given by the SDE
$$\mathrm{d}x = \sqrt{\frac{\mathrm{d}[\sigma^2(t)]}{\mathrm{d}t}}\,\mathrm{d}w. \tag{3}$$
The samples can be generated using ancestral sampling, where successive states are sampled according to:
![]() | (4) |
, and
. In the continuous limit,
. The implementation is achieved using a predictor-corrector sampling strategy with the Langevin corrector. We refer the readers to ref. 38 for mathematical details of score-based SDE and sampling strategies.
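As a rough illustration of ancestral sampling under a VE noise schedule, the sketch below uses the discrete update given for VE processes in ref. 38, which may differ in notation or detail from eqn (4). It is a one-dimensional, non-periodic toy: the geometric schedule, the function name `ancestral_sample`, and the analytic score of a single data point at `mu` are all assumptions made for illustration, and the Langevin corrector is omitted.

```python
import math
import random


def ancestral_sample(score_fn, sigmas, x_init, seed=0):
    """VE ancestral sampling from sigma_T down to sigma_0 (1D toy, no periodicity)."""
    rng = random.Random(seed)
    x = x_init
    for t in range(len(sigmas) - 1, 0, -1):
        s_t, s_prev = sigmas[t], sigmas[t - 1]
        # Deterministic part: move along the score by the variance difference.
        drift = (s_t**2 - s_prev**2) * score_fn(x, s_t)
        # Stochastic part: noise matching the variance of the lower noise level.
        std = math.sqrt(s_prev**2 * (s_t**2 - s_prev**2) / s_t**2)
        x = x + drift + std * rng.gauss(0.0, 1.0)
    return x


def geometric_sigmas(sigma_min, sigma_max, n_steps):
    """Assumed schedule: sigma_0 = sigma_min, ..., sigma_T = sigma_max."""
    return [sigma_min * (sigma_max / sigma_min) ** (i / n_steps)
            for i in range(n_steps + 1)]


mu = 0.3  # hypothetical single data point


def exact_score(x, sigma):
    """Score of a Gaussian centred on mu with standard deviation sigma."""
    return -(x - mu) / sigma**2


sample = ancestral_sample(exact_score, geometric_sigmas(0.001, 1.0, 100), x_init=1.0)
```

With an exact score, the sample ends within a few multiples of the smallest noise level of `mu`; a learned score network would take the place of `exact_score`.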
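To estimate the score function $\nabla_x \log p_t(x)$, we use score matching (SM) to optimize the model parameters $\theta^*$ by minimizing
$$\theta^* = \arg\min_\theta\, \mathbb{E}_{p_t(x)}\left[\left\| s_\theta(x,t) - \nabla_x \log p_t(x) \right\|^2\right], \tag{5}$$
where $s_\theta(x,t)$ is the score predicted by the model and $\mathbb{E}_{p_t(x)}$ represents the expectation value with respect to the probability distribution $p_t(x)$, which can be approximated by a Gaussian transition probability $p(x(t)|x(0)) \propto e^{-[x(t)-x(0)]^2/2\sigma^2}$ such that eqn (5) is formulated as denoising score matching (DSM) with39,40
$$\theta^* = \arg\min_\theta\, \mathbb{E}_{x(0)}\,\mathbb{E}_{x(t)|x(0)}\left[\left\| s_\theta(x(t),t) - \nabla_{x(t)} \log p(x(t)|x(0)) \right\|^2\right], \tag{6}$$
where $\nabla_{x(t)} \log p(x(t)|x(0)) = -[x(t)-x(0)]/\sigma^2$. In the training process, σ is sampled uniformly from the interval [σmin,σmax] to perturb the configuration x(0) and obtain the noisy configuration x(t) to construct the DSM loss in eqn (6).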
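The DSM objective can be illustrated with a small Monte Carlo estimate. The sketch below is a one-dimensional toy that ignores periodic boundary conditions; the function name `dsm_loss` and the σ² weighting of each term are assumptions (a common choice that keeps contributions from different noise levels comparable), not details confirmed by the text.

```python
import random


def dsm_loss(score_fn, data, sigma_min, sigma_max, n_samples=1000, seed=0):
    """Monte Carlo estimate of a sigma^2-weighted denoising score matching loss."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        x0 = rng.choice(data)                       # clean configuration x(0)
        sigma = rng.uniform(sigma_min, sigma_max)   # noise level sampled uniformly
        xt = x0 + sigma * rng.gauss(0.0, 1.0)       # noisy configuration x(t)
        target = -(xt - x0) / sigma**2              # score of the Gaussian transition
        total += sigma**2 * (score_fn(xt, sigma) - target) ** 2
    return total / n_samples


# For data concentrated at x = 0, the exact score is -x / sigma^2.
loss_exact = dsm_loss(lambda x, s: -x / s**2, [0.0], 0.01, 1.0)
loss_zero = dsm_loss(lambda x, s: 0.0, [0.0], 0.01, 1.0)
```

In this toy the exact score drives the loss to zero, while a model that always predicts a zero score is penalized; in training, `score_fn` would be the GNN and the loss would be minimized over its parameters.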
Rather than training on a specific distribution of masks, Lugmayr et al.43 introduced the RePaint algorithm (inpainting + resampling) for high-quality 2D image inpainting using diffusion models. One can simply train the diffusion model with DSM to learn the joint distribution. During inference, the conditional distribution is approached using the resampling technique for inpainting generation. As shown in the Algorithm, in addition to the unconditional generation steps, the resampling repeatedly “jumps back” in the diffusion process and resamples the unknown regions r times at each timestep t, with a mask m separating the host and guest atoms. This resampling procedure helps harmonize the generated content with existing regions by allowing multiple attempts at generating coherent inpainted content. We refer readers to ref. 43 and 44 for the detailed implementation and theoretical foundations.
Fig. 1 illustrates the iterative sampling procedure for inpainting generation. During each reverse diffusion step, the process follows several distinct stages: first, the atoms in the host structure are perturbed using Gaussian noise determined by the noise scheduler of the subsequent step σt−1 (process A):
![]() | (7) |
![]() | (8) |
Through this iterative reverse diffusion process, the noise scale {σt} gradually decreases, resulting in a final crystal structure that is closely aligned with the original host structure with minimal deviation (e.g., σmin = 0.001 Å). The positions of the guest atoms are therefore determined by the distribution conditioned on the host structure.
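The inpainting loop with resampling can be sketched as follows. This is a minimal one-dimensional toy under stated assumptions: coordinates are non-periodic, the mask is 1 for guest atoms and 0 for host atoms, the guest update uses the VE ancestral step from ref. 38, and `toy_score` is a hypothetical score that pulls every atom toward the midpoint of the two host atoms. None of the function names come from the CHGGen code.

```python
import math
import random


def geometric_sigmas(sigma_min, sigma_max, n_steps):
    """Assumed schedule: sigma_0 = sigma_min, ..., sigma_T = sigma_max."""
    return [sigma_min * (sigma_max / sigma_min) ** (i / n_steps)
            for i in range(n_steps + 1)]


def inpaint(x_host, mask, score_fn, sigmas, r=3, seed=0):
    """RePaint-style inpainting: hosts are re-noised copies, guests are denoised."""
    rng = random.Random(seed)
    n = len(mask)
    # Guests start from a uniform prior; hosts start as noised host positions.
    x = [rng.random() if mask[i] else x_host[i] + sigmas[-1] * rng.gauss(0.0, 1.0)
         for i in range(n)]
    for t in range(len(sigmas) - 1, 0, -1):
        s_t, s_prev = sigmas[t], sigmas[t - 1]
        std = math.sqrt(s_prev**2 * (s_t**2 - s_prev**2) / s_t**2)
        for u in range(r):
            score = score_fn(x, s_t)
            new_x = []
            for i in range(n):
                if mask[i]:
                    # Guest atom: one reverse (denoising) step using the score.
                    xi = x[i] + (s_t**2 - s_prev**2) * score[i] + std * rng.gauss(0.0, 1.0)
                else:
                    # Host atom: known position perturbed at the next noise level.
                    xi = x_host[i] + s_prev * rng.gauss(0.0, 1.0)
                new_x.append(xi)
            x = new_x
            if u < r - 1:
                # Resampling: jump back from noise level t-1 to level t.
                jump = math.sqrt(s_t**2 - s_prev**2)
                x = [xi + jump * rng.gauss(0.0, 1.0) for xi in x]
    return x


def toy_score(x, sigma, host_idx=(0, 1)):
    """Hypothetical score: every atom is attracted to the midpoint of the hosts."""
    center = sum(x[i] for i in host_idx) / len(host_idx)
    return [-(xi - center) / sigma**2 for xi in x]


# Two host atoms at 0.2 and 0.8; the guest entry in x_host is a placeholder.
final = inpaint([0.2, 0.8, 0.0], [0, 0, 1], toy_score, geometric_sigmas(0.001, 1.0, 50))
```

In this toy the host atoms end within a few multiples of the smallest noise level of their original positions, and the guest settles near the host midpoint, which mirrors how the guest positions are determined by the distribution conditioned on the host.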
In the following sections, we first examine the locality bias encountered when generating structures with large unit cells through unconditional generation. These insights led to the development of the CHGGen framework. We demonstrate the effectiveness of CHGGen on two example chemical systems, Zn–P–S and Li–Si, using CHGNet as a foundation potential for iterative structural optimization and thermodynamic stability screening. We also present CSP studies of the 16 compositions used in DiffCSP27 and of a solid–solid interface in the SI.
We evaluated local coordination environments using the LocalGeometryFinder toolkit45 and classified the local environments based on coordination numbers. In Fig. 2(e), P atoms predominantly occupy tetrahedral sites (4-coordinated), consistent with known Li–P–S crystal structures.46 In contrast, Li exhibits a broad distribution of coordination numbers, with peaks at 5-fold (∼40%), 4-fold (∼25%), and 6-fold (∼30%) geometries. Notably, these coordination statistics qualitatively align with patterns observed in the MP training dataset (Fig. 2(f)), where P atoms maintain rigid tetrahedral coordination while Li atoms display more variable environments.
Based on the successful learning and reconstruction of local distribution from the generative model, we hypothesize that the failure to propagate long-range order in generated structures stems from two interrelated factors: (1) locality bias in GNNs: while the model effectively captures short- to medium-range atomic correlations, its finite receptive field constrains the learning of global crystallographic patterns. (2) Stochasticity in reverse diffusion: the stochastic differential equation for reverse diffusion processes inherently samples from a learned distribution of the entire dataset. Without coupling the atomic arrangements and supercell parameters, the diffusion process tends to sample from the entire distribution of the dataset in the generated structures, rather than from a narrowed distribution in specific crystal systems. Consequently, structures with large unit cells manifest as “mosaics” of local structure motifs rather than coherent crystalline structures. These limitations may be universal even with lattice diffusion, as the GNN architecture lacks explicit mechanisms to maintain long-range crystallographic order when featuring the atomic configurational space.47
Fig. 3 illustrates the CHGGen computational workflow. The process begins with sampling various Bravais lattices at a fixed volume through a random search over lattice constant ratios and angles (see Methods in SI). The unit cell volume is determined as N × V0, where N represents the number of atoms and V0 denotes the atomic volume. V0 can be initialized either from related crystalline phases or predicted by composition-based regression models.48 This atomic volume serves as prior information subject to optimization in subsequent steps. Following lattice determination, fractional coordinates for all atoms are initialized with random numbers drawn from the uniform distribution $\mathcal{U}(0,1)$. The diffusion process then proceeds by solving the reverse SDE using scores predicted by the SE(3)-GNN (unconditional generation). Given that the volumes and lattices of the generated structures are drawn from simple priors and random search, CHGNet is employed for structure relaxation to optimize both unit cells and atomic coordinates. This process represents a well-defined local energy-minimum search task and does not suffer from the locality bias encountered during the diffusion process.
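The fixed-volume lattice sampling can be sketched as below. This is a simplified illustration rather than the procedure in the SI: it samples a general triclinic cell instead of enumerating Bravais lattices, and the ratio range, angle range, and function names are assumptions.

```python
import math
import random


def cell_volume(a, b, c, alpha, beta, gamma):
    """Cell volume from lengths and angles (degrees); 0.0 if the angles are invalid."""
    ca, cb, cg = (math.cos(math.radians(x)) for x in (alpha, beta, gamma))
    radicand = 1.0 - ca**2 - cb**2 - cg**2 + 2.0 * ca * cb * cg
    return a * b * c * math.sqrt(radicand) if radicand > 0.0 else 0.0


def sample_lattice(n_atoms, v0, rng, ratio_range=(0.5, 2.0), angle_range=(60.0, 120.0)):
    """Sample ratios b/a, c/a and angles, then rescale so the volume equals N * V0."""
    target = n_atoms * v0
    while True:
        b_over_a = rng.uniform(*ratio_range)
        c_over_a = rng.uniform(*ratio_range)
        angles = [rng.uniform(*angle_range) for _ in range(3)]
        shape_volume = cell_volume(1.0, b_over_a, c_over_a, *angles)
        if shape_volume > 1e-3:  # reject degenerate or nearly flat cells
            break
    a = (target / shape_volume) ** (1.0 / 3.0)
    return (a, a * b_over_a, a * c_over_a, *angles)


rng = random.Random(0)
lattice = sample_lattice(n_atoms=20, v0=15.0, rng=rng)   # V0 in cubic angstroms per atom
frac_coords = [[rng.random() for _ in range(3)] for _ in range(20)]  # uniform in [0, 1)
```

Because the volume is only a prior, any mismatch with the true density is left to the subsequent relaxation step.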
The next phase is initiated by removing atoms that exhibit broad local environment distributions (e.g., Li). The remaining structure (framework) undergoes symmetry refinement using spglib with an incrementally increasing structure-matching tolerance to obtain a space group with higher symmetry (i.e., until the space group is not P1). Since the guest atoms exhibit diverse local environment distributions, refinement without them is more likely to yield a symmetric structure. The fractional coordinates of the removed guest atoms are then reinitialized from the uniform distribution $\mathcal{U}(0,1)$ within the symmetrized framework, and inpainting generation is performed using masks m and (1 − m) for the guest and framework atoms, respectively.
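The incremental-tolerance refinement loop can be sketched as follows. To keep the example self-contained, the symmetry finder is passed in as a callable; `find_space_group` and `stub_finder` are hypothetical stand-ins, and in practice the callable would wrap a symmetry library such as spglib. The tolerance values are illustrative only.

```python
def refine_with_increasing_tolerance(structure, find_space_group, tolerances):
    """Return the first space group other than P1 and the tolerance that produced it."""
    for tol in tolerances:
        symbol = find_space_group(structure, tol)
        if symbol != "P1":
            return symbol, tol
    return "P1", None  # no symmetry found within the allowed tolerances


def stub_finder(structure, tol):
    """Hypothetical stand-in: symmetry is only detected once the tolerance is loose enough."""
    return "C2/m" if tol >= 0.1 else "P1"


symbol, used_tol = refine_with_increasing_tolerance(
    structure=None, find_space_group=stub_finder, tolerances=[0.01, 0.05, 0.1, 0.2]
)
```

Stopping at the first tolerance that yields a space group other than P1 keeps the distortion applied to the framework as small as possible.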
The inpainting-generated structures are further relaxed using CHGNet and structure refinement is performed with a small tolerance to obtain the space group. The CHGNet-calculated energy for the relaxed structure is used to determine the decomposition energy Ed relative to the MP phase diagram at the GGA/GGA+U level of accuracy. Finally, structures with Ed within a specified threshold (e.g., Ed < 0.1 eV per atom) are submitted for DFT calculations to obtain more accurate thermodynamic stability assessments. In our studies, we used the r2SCAN functional to evaluate the DFT decomposition energy against the MP r2SCAN phase diagram.
We focused on the CSP of ZnP2S6 and Zn2P2S7 using CHGGen. To assess the local stability of the generated structures, we evaluated the structural and energetic differences between the initially generated structures and their CHGNet-relaxed counterparts. In Fig. 4(a) and (b), we present the energy and geometrical differences between the relaxed structures and generated structures. Most of the generated structures exhibit energy changes of ΔE < 0.1 eV per atom, with a median value of 0.08 eV per atom. The geometric differences, quantified by maximum pair-wise root-mean-squared distance (RMSD), show a median value of 0.10 Å between relaxed and generated structures. As illustrated in Fig. 4(a) and (b), outliers with large energy changes correspond to RMSD values exceeding 0.3 Å, indicating that most of the generated structures are close to the local minima and can be reasonably searched using foundation potential structure relaxation.
Fig. 4(c) presents the CHGNet-predicted decomposition energies (Ed) to quantify the thermodynamic stability. The distribution shows a median Ed of 0.07 eV per atom, indicating that the majority of generated structures are metastable in this chemical space, while 6.5% of the generated structures are identified as stable with Ed < 0 at the CHGNet level of accuracy.
After r2SCAN-DFT calculations and reevaluation of Ed with respect to the MP-r2SCAN phase diagram, the structure with negative Ed was confirmed to be metastable. This discrepancy can be attributed to CHGNet's approximate 30 meV per atom prediction error, which may alter the predicted phase stability when Ed values are small. As illustrated in Fig. 4(d) and (e), while CHGNet demonstrates good overall agreement with DFT, it systematically underestimates the decomposition energy for structures with high values (Ed > 0.1 eV per atom), consistent with the known softening effect of foundation potentials.50 This finding suggests that practitioners could consider using a lower threshold (e.g., Ed < 30 meV per atom) when screening generated structures for DFT validation.
To evaluate how diffusion models perform in generating symmetric crystal structures, we define a simple metric, the symmetry success rate, as the fraction of generated structures possessing space groups with higher symmetry than P1 or P$\bar{1}$ after relaxation and refinement (see Methods in SI). Fig. 4(f) compares the symmetry success rates from unconditional (blue) and inpainting (orange) generations. The inpainting approach uses the refined P–S frameworks to achieve significantly higher success rates compared to the unconditional ones (<5%), which highlights the effectiveness and practical advantages of conditional generation with structural priors. The structures with the lowest-Ed for ZnP2S6 and Zn2P2S7 are illustrated in Fig. 4(g). While the generated structures are predicted to be metastable by DFT calculations, this example demonstrates CHGGen's potential for exploring crystal structures in the chemical space that is currently absent from existing databases.
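The symmetry success rate reduces to a simple count once each relaxed and refined structure has been assigned a space group symbol. The sketch below assumes symbols are plain strings with the inversion-only group written as "P-1"; the function name and the example list are hypothetical.

```python
LOW_SYMMETRY = {"P1", "P-1"}  # triclinic groups that do not count as a success


def symmetry_success_rate(space_groups):
    """Fraction of structures whose space group has higher symmetry than P1 or P-1."""
    if not space_groups:
        return 0.0
    n_success = sum(1 for symbol in space_groups if symbol not in LOW_SYMMETRY)
    return n_success / len(space_groups)


rate = symmetry_success_rate(["P1", "P-1", "C2/m", "R-3m"])
```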
To demonstrate the effectiveness of identifying low-energy polymorphs, we performed structure generation for Li-rich phases (LixSiy, where x > y, a composition range known to contain many stable phases52) following the workflow outlined in Fig. 3 (see Methods in SI). After CHGGen generation and r2SCAN-DFT calculations, we constructed the phase diagram using DFT formation energies from both MP structures and the generated structures. In Fig. 5(a), green circles represent new stable polymorphs identified by CHGGen, while blue squares indicate the reported stable MP structures. We calculated decomposition energies using the MP phase diagram (without the on-the-hull structure of Li5Si2). Negative values of Ed therefore indicate compounds that break the existing MP convex hull. Notably, CHGGen successfully predicted Li5Si2 (R$\bar{3}$m, Ed = −7 meV per atom) as a thermodynamically stable polymorph beyond the known MP stable structures (Fig. 5(b)). Interestingly, this structure had also been identified through previous studies, including work by Tipton et al.53 using genetic algorithms and Morris et al.54 using random structure search coupled with DFT calculations.
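For a binary system, the decomposition energy against a reference hull can be sketched with a lower convex hull in composition and formation-energy space. The example below is a minimal illustration with made-up numbers, not MP data: x is an atomic fraction, energies are formation energies per atom, and all function names are hypothetical.

```python
def cross(o, a, b):
    """z-component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def lower_hull(points):
    """Lower convex hull of (x, E_form) points, keeping the lowest energy per x."""
    lowest = {}
    for x, e in points:
        lowest[x] = min(e, lowest.get(x, e))
    hull = []
    for p in sorted(lowest.items()):
        while len(hull) >= 2 and cross(hull[-2], hull[-1], p) <= 0:
            hull.pop()  # the previous vertex lies on or above the new edge
        hull.append(p)
    return hull


def hull_energy(x, hull):
    """Linear interpolation of the hull energy at composition x."""
    for (x1, e1), (x2, e2) in zip(hull, hull[1:]):
        if x1 <= x <= x2:
            return e1 + (e2 - e1) * (x - x1) / (x2 - x1)
    raise ValueError("composition outside the hull range")


def decomposition_energy(x, e_form, hull):
    """Negative values mean the candidate lies below (breaks) the reference hull."""
    return e_form - hull_energy(x, hull)


# Hypothetical reference phases, including the two elemental end members.
reference = [(0.0, 0.0), (0.5, -0.2), (0.75, -0.05), (1.0, 0.0)]
hull = lower_hull(reference)
e_d = decomposition_energy(0.25, -0.12, hull)  # candidate below the hull
```

Here the hypothetical candidate at x = 0.25 lies 0.02 eV per atom below the reference hull, so its Ed is negative, which is the same sign convention used for the generated Li–Si structures.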
Using generative models, one can also identify metastable polymorphs with low decomposition energies in related compositions. For example, structures for compositions corresponding to Li5Si2 (Ed = −0.006 eV per atom), Li2Si (Ed = 0.003 eV per atom), and Li4Si (Ed = 0.010 eV per atom) with C2/m space group are shown in Fig. 5(c). These low-energy polymorphs with different local structure motifs provide a more detailed mapping of the potential energy landscape, which benefits the training of MLIPs to understand the Si network aggregation and its effect on Li transport kinetics.55 The case study illustrates how practitioners can benefit from combining diffusion models and foundation potentials to explore continuous unknown chemical spaces, such as those encountered in alloy design.
Based on this concept, diffusion models with GNNs represent a logical framework for generating reasonable local structure motifs (e.g., atomic bonding patterns), which proves valuable when optimizing local atomic arrangements in CSP problems. Nonetheless, their inherent locality bias limits their ability to capture long-range periodic orders. Gong et al.47 revealed that state-of-the-art GNNs fall short of accurately capturing the periodicity of crystal structures, i.e., lattice parameters. This fundamental limitation explains the diminished performance of GNN-based diffusion models when generating structures with large unit cells, where long-period crystallinity is significant but hard to capture. This locality bias is exacerbated for species that can adapt to diverse local environments (e.g., Li), which have a broad distribution of stable coordination geometries. The unconditional generation defaults to producing a disordered “mosaic” of these competing local motifs rather than a coherent crystal structure.
The practical advantage of host-guided generation with inpainting is that it improves the generation of symmetric structures. To evaluate this, we compared the success rates of obtaining symmetric crystals across different approaches in Zn–P–S and Li–Si chemical spaces: MatterGen (green), CHGGen with unconditional generation (blue), and CHGGen with inpainting generation (orange). MatterGen demonstrates superior performance compared to our baseline model for Li–Si alloys (Fig. 6(b)), highlighting the importance of explicitly modeling lattice diffusion in conjunction with atomic arrangements. However, for more complex systems such as Zn–P–S that contain polyanions, MatterGen performs less effectively when the number of atoms exceeds 10, with success rates lower than CHGGen with inpainting generation. This highlights the importance of inpainting generation when dealing with chemical systems involving polyanions. Notably, MatterGen exhibits declining success rates as system size increases in both cases, suggesting that all GNN-based diffusion models likely face scalability challenges. This scalability constraint potentially limits their ability to predict complex structural prototypes (e.g., NASICON-type frameworks) across various solid-state materials.
Although CHGGen has not yet achieved the industry-level performance of MatterGen in stability or general symmetry success rate, it shows clear improvements in generating symmetric crystal structures compared to unconditional generation methods. Assessing crystallographic symmetry in generated structures is critical for CSP, as discovering new structural prototypes is a key step toward materials discovery.23 These prototypes can subsequently guide elemental substitution strategies15 or evolutionary searches62 for compositional optimization. The inpainting-based generation approach samples from the conditional distribution of unknown structural components within a given framework, yielding more well-defined local atomic arrangements than de novo sampling from a fully unconstrained distribution. Importantly, this inpainting strategy is simple to implement, as both conditional and unconditional generation operate within the same unified model and are distinguished only by the masking strategy used during the reverse diffusion process. This modular design enables seamless integration with existing foundational generative models and provides a flexible mechanism for enforcing symmetry constraints during structure generation.
In addition, we highlight the practical significance of CHGGen for the structural modification of materials, which often relies on existing database structures. A notable example is the superionic conductor Li0.388Ta0.238La0.475Cl3, discovered through lithiation of the LaCl3-type host structure with additional aliovalent substitution.63 The CHGGen framework offers a probabilistic approach to such design tasks that circumvents the need for topological analysis64 or additional inputs such as DFT-derived charge densities,65 which are often computationally prohibitive or not universally applicable.
Finally, while our framework demonstrates useful augmentation to diffusion-based generative models, the current symmetry refinement approach remains preliminary as it relies on spglib by simply increasing the tolerance threshold. As a proof-of-concept, this method predominantly yields structures with moderate symmetry (e.g., C2, Cm in monoclinic systems), which may limit the discovery of novel structure prototypes (see Fig. S1 and S2). Looking forward, several promising approaches have emerged for novel framework generation, including symmetry-constrained diffusion24 and prototype-based generation using Wyckoff position-based representations.66–68 The integration of these advanced symmetry handling strategies with CHGGen could enhance the discovery of crystal structures particularly with intercalation chemistry.
In summary, we present CHGGen as an integrated framework that combines unconditional and inpainting generation with foundation potential optimization for CSP. While challenges in scaling complexity persist, the inpainting method provides a useful approach for generating symmetric crystals and incorporating intercalants into existing database structures. We anticipate broad adoption of this framework as the modular design of the inpainting methodology enables seamless integration with emerging diffusion models, which will ultimately accelerate materials discovery across diverse chemical spaces.
The SI includes methods and supporting results for CHGGen-based structure generation. See DOI: https://doi.org/10.1039/d5mh00774g
This journal is © The Royal Society of Chemistry 2025