Open Access Article
Takuya Taniguchi*a and Ryo Fukasawab
aCenter for Data Science, Waseda University, 1-6-1 Nishiwaseda, Shinjuku-ku, Tokyo 169-8050, Japan. E-mail: takuya.taniguchi@aoni.waseda.jp
bGraduate School of Advanced Science and Engineering, Waseda University, 3-4-1 Okubo, Shinjuku-ku, Tokyo 169-8555, Japan
First published on 13th October 2025
Predicting the crystal structures of organic molecules remains a formidable challenge due to intensive computational cost. To address this issue, we developed a crystal structure prediction (CSP) workflow that combines machine learning-based lattice sampling with structure relaxation via a neural network potential. The lattice sampling employs two machine learning models—space group and packing density predictors—that reduce the generation of low-density, less-stable structures. In tests on 20 organic crystals of varying complexity, our approach achieved an 80% success rate—twice that of a random CSP—demonstrating its effectiveness in narrowing the search space and increasing the probability of finding the experimentally observed crystal structure. We also characterized which molecular and crystal parameters influence the success rate of CSP, clarifying the effectiveness and limitation of the current workflow. This study underscores the utility of combining machine learning models with efficient structure relaxations to accelerate organic crystal structure discovery.
Predicting the crystal structure of an organic molecule is challenging, due to the weaker atomic interactions unique to organic crystals.8 Unlike inorganic crystals, which often rely on stronger bonds, organic crystals are stabilized by relatively weak intra- and inter-molecular interactions such as van der Waals forces, hydrogen bonds, and π–π stacking. Even minor variations in these interactions can give rise to entirely different crystal structures, making accurate prediction difficult. In addition, many organic molecules exhibit considerable conformational flexibility because of rotatable bonds, significantly increasing the number of possible configurations.9 Even for relatively rigid molecules, identifying the global energy minimum is still computationally intensive. Consequently, the interplay between molecular flexibility and subtle intermolecular interactions renders the accurate prediction of organic crystal structures a challenge.
In general, CSP can be divided into two stages: structure generation (or exploration) and structure relaxation. To address the challenges in these stages, many CSP workflows have been proposed and assessed in the series of CSP blind tests.10 For structure generation (exploration), the quasi-random method, genetic algorithms, particle swarm optimization, and Bayesian optimization have been developed.11–18 The quasi-random approach stochastically arranges the lattice parameters, as well as the positions and orientations of molecules, to cover the search space and explore a broad range of possible crystal structures.11 However, it yields a large number of candidates, many of which are less dense and less stable. Genetic algorithms and Bayesian optimization aim to identify the global minimum through iterative active sampling.12–17 Genetic algorithms work by modifying or combining locally optimized structures, while Bayesian optimization approximates a black-box function by regression and performs active sampling based on an acquisition function. Although both methods are expected to find a global minimum after many rounds or iterations, they also produce numerous less dense and less stable structures in earlier rounds. Recently, researchers have reported a CSP approach using a generative adversarial network (GAN) to produce more realistic crystal structures.19 While this method is innovative, the optimization logic of GANs can be difficult to interpret, and the technique may be limited to specific molecular families or crystal systems that have sufficient training data.
Regarding structure relaxation, conventional approaches typically rely on force fields or density functional theory (DFT) calculations. Force fields enable rapid structural relaxation, but their accuracy may not match that of DFT. In contrast, DFT calculation affords more accurate results depending on calculation level, but is computationally expensive, time-consuming, and requires extensive computational resources. In recent years, neural network potentials (NNPs) trained on DFT data have gained attention for achieving near-DFT-level accuracy at a fraction of the cost.20–26 For organic crystals, some pre-trained base models such as PFP and ANI have demonstrated efficacy and can, in some instances, surpass quantum chemical methods in accuracy.27,28 NNPs can also be fine-tuned for specific systems by additional training, making them highly versatile.29–31 Consequently, NNPs are increasingly used to filter or rank candidate structures within CSP workflows, offering a promising balance between computational efficiency and accuracy.
Although a variety of CSP methods have been proposed as described, there is still a need for approaches that reduce the generation of less dense, less stable structures to improve the efficiency of CSP. Indeed, leveraging predicted density or volume to guide the search is a recognized strategy to enhance efficiency. For example, the recently developed low-energy region explorer (LoreX) predicts an optimal cell volume from fundamental atomic properties to constrain the initial sampling space.32 This constrain-then-sample approach is highly effective for inorganic systems. Our work builds on this concept but adapts it specifically for organic molecules by employing a different sample-then-filter strategy. We use a molecular fingerprint to predict space groups and a target crystal density. The predicted density is then used as a criterion to filter randomly sampled lattice parameters, accepting or rejecting them prior to crystal structure generation. This approach is tailored to capture how the unique functional groups of organic molecules influence crystal packing, and could be combined with genetic algorithms or Bayesian optimization for a synergistic effect. Investigating the effectiveness of NNPs for organic crystals is also pivotal for advancing organic CSP. In this study, we developed a workflow, named SPaDe-CSP, that leverages a Space group and Packing Density predictor (SPaDe) to decrease the production of low-density, unstable structures, followed by structure relaxation via an NNP (Fig. 1). Specifically, we narrowed the search space by predicting space group candidates and crystal density. To clarify which processes were key to CSP success, we evaluated the performance of these machine learning models using a representative molecule. We also examined the generalizability of this workflow on a validation dataset and assessed how SPaDe-CSP improves the success rate compared to that of random-CSP, a baseline relying on random structure generation.
278. To ensure sufficient data for pattern recognition via machine learning, we restricted the dataset to space groups containing more than 100 entries, resulting in 32 space groups. The data size after this space group filtering was 169 656, and this dataset was used for ML. This ML-dataset covers 99.6% of the filtered search result.
:
2. Two machine learning models were constructed for space group prediction and density prediction, trained on the training subset and evaluated on the test subset. For both predictions, MACCSKeys was used as the molecular fingerprint to allow interpretation of the ML results. LightGBM, random forest, and neural network models were compared. As the loss functions, we used cross-entropy loss for space group prediction and L2 loss for density prediction. Since the ML classifier outputs probabilities for 32 classes, we set a probability threshold to filter the space group candidates. We evaluated the accuracy and the number of space group candidates for thresholds between 1 × 10−10 and 0.5 using the test subset. For the prediction of crystal density, regression models based on LightGBM, random forest, and neural networks were compared as well. The prediction ability was evaluated by mean absolute error (MAE) and the coefficient of determination R2. The molecular fingerprint and the ML training were implemented using the rdkit and scikit-learn packages in Python.
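The threshold step described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `filter_space_groups`, the use of `>=` for the comparison, and the probability values in `example_probs` are all assumptions made for demonstration.

```python
def filter_space_groups(probabilities, threshold):
    """Keep space groups whose predicted probability is at least `threshold`.

    `probabilities` maps a space-group number to the classifier's predicted
    probability. The result is ordered from most to least probable.
    Using >= (rather than >) is an assumption of this sketch.
    """
    kept = [(sg, p) for sg, p in probabilities.items() if p >= threshold]
    kept.sort(key=lambda item: item[1], reverse=True)
    return [sg for sg, _ in kept]


# Hypothetical classifier output for one molecule (illustrative values only;
# a real model would return probabilities for all 32 classes).
example_probs = {14: 0.55, 2: 0.20, 19: 0.15, 4: 0.07, 61: 0.02, 15: 0.01}

loose = filter_space_groups(example_probs, 1e-10)   # keeps every class
strict = filter_space_groups(example_probs, 0.5)    # keeps only the top class
```

A low threshold keeps almost every class (high accuracy, many candidates), whereas a high threshold keeps only a few classes at the risk of discarding the true space group, which is the trade-off evaluated on the test subset.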
In the structure generation of random-CSP, when a molecular structure is provided, PyXtal's ‘from_random’ function generates crystal structures until 1000 valid structures are obtained. The space group is randomly selected from among the 32 candidates for each iteration.
In the structure generation of SPaDe-CSP, the SMILES string is converted to a MACCSKeys vector, and the space group candidates and crystal density are predicted by the trained LightGBM models. One of the predicted space group candidates is then randomly selected, and lattice parameters are sampled within predetermined ranges of 2 ≤ a, b, c ≤ 50 Å and 60 ≤ α, β, γ ≤ 120°. We checked whether the sampled space group and lattice parameters satisfied the density tolerance using the molecular weight and Z value, and if they did, we placed the molecules in the lattice. This initial structure generation continues until 1000 crystal structures are produced for each run, and we repeat the run 10 times to evaluate the efficacy and variation of CSP for each compound.
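The density check on sampled lattice parameters can be illustrated with a short rejection-sampling sketch. Several points are assumptions rather than details taken from the workflow: the tolerance is treated as an absolute window in g cm−3 around the predicted density, all six lattice parameters are sampled independently (the symmetry constraints that a space group imposes are ignored), and the function names are hypothetical.

```python
import math
import random

# Converts (g/mol)/Å^3 to g/cm^3: Avogadro's number times 1e-24 cm^3 per Å^3.
AVOGADRO_FACTOR = 0.602214076


def cell_volume(a, b, c, alpha, beta, gamma):
    """Volume (Å^3) of a general unit cell; angles are given in degrees."""
    ca, cb, cg = (math.cos(math.radians(x)) for x in (alpha, beta, gamma))
    term = 1.0 - ca * ca - cb * cb - cg * cg + 2.0 * ca * cb * cg
    return a * b * c * math.sqrt(max(term, 0.0))


def cell_density(mol_weight, z, volume):
    """Crystal density (g/cm^3) from molecular weight (g/mol), Z, and volume (Å^3)."""
    return z * mol_weight / (AVOGADRO_FACTOR * volume)


def sample_lattice(mol_weight, z, target_density, tolerance, rng, max_tries=100000):
    """Sample lattice parameters until the implied density is within the tolerance.

    Assumption of this sketch: a cell is accepted when
    |density - target_density| <= tolerance (absolute window in g/cm^3).
    Returns [a, b, c, alpha, beta, gamma], or None if no cell was accepted.
    """
    for _ in range(max_tries):
        lengths = [rng.uniform(2.0, 50.0) for _ in range(3)]
        angles = [rng.uniform(60.0, 120.0) for _ in range(3)]
        volume = cell_volume(*lengths, *angles)
        if volume <= 0.0:
            continue  # degenerate cell, cannot hold molecules
        if abs(cell_density(mol_weight, z, volume) - target_density) <= tolerance:
            return lengths + angles
    return None


rng = random.Random(0)
cell = sample_lattice(mol_weight=300.0, z=4, target_density=1.3, tolerance=0.5, rng=rng)
```

Because the check needs only the cell volume, Z, and the molecular weight, unsuitable lattices are rejected before any molecule is placed, which is what keeps the filter cheap.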
For both CSP approaches, we use the same structure relaxation procedure. The generated structures are optimized with PFP21 version 6.0.0 in CRYSTAL_U0_PLUS_D3 mode. We employ the limited-memory BFGS (L-BFGS) algorithm, allowing up to 2000 iterations and imposing a residual force threshold of 0.02 eV Å−1. The structural relaxations were performed using the FrechetCellFilter to simultaneously minimize both atomic forces and unit cell stresses, thereby optimizing the atomic positions and lattice parameters. Throughout this process, the FixSymmetry constraint was applied to ensure the initial space group was preserved. To quantify agreement between experimental and calculated structures, we compute the root-mean-square deviation (RMSD) of 30 molecules using the COMPACK algorithm.34 Both structure generation and optimization are implemented via the PyXtal and ASE libraries.35,36
656 (named ML-dataset). These 32 space groups comprise 99.6% of the search result, reflecting the variety of organic crystals (Fig. 2a). The most frequent space group, 14 (P21/c), accounts for nearly 40% of the ML-dataset. The next most frequent space groups are 2 (P1̄) and 19 (P212121), and the top 10 space groups account for 96.0% of the ML-dataset.
Because the Z value is determined by the space group in nearly all cases, the distribution of Z values reflects that of space groups (Fig. 2b). For example, crystals in space groups 14 and 19 always have Z = 4, making it the most frequent Z value. Space groups 2 and 4 correspond to Z = 2, and their combined frequency is the frequency of Z = 2. Although there are a few exceptions where Z takes a different value, in general the Z value depends on the space group, so we judged that there is no need to predict the Z value.
Crystal density, molecular weight, and lattice lengths each exhibit a smooth, single-peaked distribution. Density mostly falls between 1.0 and 2.0 g cm−3, with a peak around 1.3 g cm−3 (Fig. 2c). Molecular weight peaks around 300 g mol−1 with a long tail extending to higher values (Fig. 2d). The lengths of the a- and b-axes peak around 10 Å, with a longer tail on the right side (Fig. 2e and f). The length of the c-axis peaks around 13 Å and shows a broader distribution than those of the a- and b-axes (Fig. 2g). Lattice angles are often constrained to 90° by the space group symmetry, resulting in unique distributions (Fig. 2h–j). Among these, the β angle has a different distribution because the triclinic and monoclinic crystal systems, which account for 72.6% of the ML-dataset, have a unique angle that is frequently larger than 90°. As angles move away from 90°, their frequency decreases in all cases.
It is important to ensure interpretability of the trained model. An examination of the top-ranking substructures in feature importance of LightGBM indicates that structural characteristics such as stereochemistry and the presence of methyl groups have strong influence (Fig. 3c and SI Fig. 1). This observation suggests that molecular conformation such as the type and number of substituents, and the ring environment serve as major determinants in classifying space groups. Because space groups describe the symmetry of crystals, factors such as substituents, ring structures, and stereochemical substructures are often critically important. Consequently, assigning high importance to these features within the model is justified.
Furthermore, Shapley additive explanations (SHAP) analysis was conducted to interpret whether these features contribute positively or negatively.37,38 Here, we present the result for the most frequent space group, since the positive or negative effect can be visualized for each class (Fig. 3d). Consistent with the feature importance findings, the top 10 SHAP features include the presence of multiple six-membered rings, the number of methyl groups, and oxygen-related substructures. SHAP analysis further clarifies the direction of each contribution. For instance, bit ID 143 represents a substructure in which a bond transitions from “not aromatic” to “aromatic” and then back to “not aromatic,” and it exhibits a positive contribution. Because this bit broadly captures configurations where two aromatic rings are connected by a rotatable single bond, it is hypothesized that such compounds can adopt diverse conformations upon crystallization and readily form stable packing arrangements via π–π or CH–π interactions between the aromatic rings.
Next, we performed regression of crystal density, again using the combination of MACCSKeys and LightGBM. The metrics for the training (R2 = 0.85, MAE = 0.0044) and test subsets (R2 = 0.80, MAE = 0.049) showed no significant deviation, indicating that overfitting did not occur (Fig. 4a). Because a mean model, which assumes no correlation between molecular structure and density, gave MAE = 0.125 g cm−3, the prediction accuracy was judged sufficient. However, a negative bias was observed, where the predicted values were higher in low-density regions and lower in high-density regions (Fig. 4b). We reasoned that the source of this bias lies in attempting to predict a crystal property solely from the molecular structure. Because crystal density is influenced by intermolecular interactions, using only molecular structure to make such predictions has inherent limitations. This negative bias also appeared in other ML models, including graph neural networks, in our preliminary validation. Nevertheless, given that most errors are distributed around zero, we consider the regression performance to be acceptable.
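For reference, the two regression metrics and the mean-model baseline can be written out as below. This is a generic sketch with hypothetical function names and illustrative numbers, not the evaluation script used in the study.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute deviation between observed and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def mean_model_mae(y_train, y_test):
    """MAE of a baseline that always predicts the training-set mean density."""
    baseline = sum(y_train) / len(y_train)
    return mean_absolute_error(y_test, [baseline] * len(y_test))


# Illustrative densities in g/cm^3 (not data from the study).
observed = [1.0, 1.2, 1.4, 1.6]
predicted = [1.1, 1.2, 1.3, 1.6]
mae = mean_absolute_error(observed, predicted)
r2 = r_squared(observed, predicted)
baseline_mae = mean_model_mae(observed, observed)
```

Comparing the model MAE against the mean-model MAE shows how much of the density variation is actually explained by the molecular fingerprint rather than by the dataset average.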
We also interpreted the trained model using feature importance and SHAP analysis. Unlike the case of space group, substructures related to halogens had a substantial impact. Among the top 10 bits in the feature importance, four correspond to halogen-related substructures, with the top-ranking bit indicating whether a Br atom was present (Fig. 4c and SI Fig. 2). The SHAP analysis revealed that all top-ranking substructures related to halogens contributed positively, whereas other substructures contributed negatively (Fig. 4d). Halogens are generally incorporated into the molecule by substituting for hydrogen; in the ML-dataset, the average density of crystals without halogens was 1.309 g cm−3 (72.6%), whereas that of crystals with halogens was 1.514 g cm−3 (27.4%). This difference exerted the strongest influence on the density prediction. Besides halogens, substructures involved in hydrogen bonding (for example, the presence of carbonyl oxygen) also contributed positively, though to a lesser extent than halogens in the density prediction. Thus, since the combination of MACCSKeys and LightGBM effectively captured the relationship between molecular structure, space group, and density, we adopted these ML models for CSP.
We compared the CSP success rates over ten trials, each of which involved generating 1000 initial structures followed by structure relaxation with PFP. A trial was counted as successful if at least one of the 1000 relaxed structures matched the reference structure based on the root-mean-square deviation of 30 molecules (RMSD30). When the density tolerance was w = 0.1 or 0.5, the success rate was 1.0 regardless of the space group threshold (Fig. 5c). In contrast, when w = 1.0, the success rate was below 1.0 at lower thresholds but reached 1.0 at higher thresholds. These results exceeded the success rate obtained when structures were generated without any ML (random-CSP). Moreover, when using only the space group prediction (i.e., without ML for density), the success rate was intermediate between those values.
When we calculated the probability of encountering a structure that matches the reference structure, the hit probability increased as the space group threshold became larger and the density tolerance became smaller (Fig. 5d). For example, at w = 0.1 and a threshold of 10−10, 5 out of 1000 generated structures matched the reference structure on average. At a threshold of 10−2, 20 out of 1000 structures matched on average. Since space group prediction narrows down the space group candidates and density prediction constrains lattice parameters close to stable structures, their synergistic effect contributes to the increase in success rate and hit probability.
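The two evaluation quantities used here, success rate and hit probability, differ only in what is counted, as the following sketch shows. The function names are hypothetical, and each relaxed structure is represented simply by a boolean saying whether it matched the reference under the RMSD30 criterion; the matching itself is outside the scope of this example.

```python
def success_rate(trials):
    """Fraction of trials with at least one structure matching the reference.

    `trials` is a list of trials; each trial is a list of booleans, one per
    relaxed structure, that is True when the structure matched the reference.
    """
    return sum(1 for trial in trials if any(trial)) / len(trials)


def hit_probability(trials):
    """Fraction of all generated structures that matched the reference."""
    total = sum(len(trial) for trial in trials)
    hits = sum(sum(1 for matched in trial if matched) for trial in trials)
    return hits / total


# Toy example with 4 trials of 3 structures each (illustrative only;
# the study uses 10 trials of 1000 structures).
toy_trials = [
    [False, True, False],
    [False, False, False],
    [True, True, False],
    [False, False, False],
]
rate = success_rate(toy_trials)
hits = hit_probability(toy_trials)
```

A success rate saturated at 1.0 can therefore hide differences between settings, while the hit probability still distinguishes a setting that finds the reference 5 times per 1000 structures from one that finds it 20 times.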
Typical energy-density diagrams at a threshold of 10−2 verify the effectiveness of the ML models and the stability of predicted structures matching the reference structure. Random-CSP in the first of the 10 trials did not find a matched structure, while SPaDe-CSP with any density tolerance found several structures matching the reference (Fig. 5e). The structure with the lowest potential energy matches the reference structure, which verifies the applicability of PFP. The number of structures in the high-density, low-energy region increased as the density tolerance narrowed, leading to the increase in hit probability.
Here, it is important to understand how the density constraint affects the structure relaxation. Comparing the crystal densities of the initial unrelaxed structures with those after structure relaxation, the density difference for each density tolerance at a threshold of 10−2 showed characteristic distributions (Fig. 5f). The density difference of random-CSP was distributed in the positive region, which means that relaxed structures are denser than the initial unrelaxed structures. This is because initial structures are generated with relatively large lattice parameters and then optimized to denser structures through structure relaxation. The distribution at w = 1.0 showed similar character. In contrast, the density difference at w = 0.5 is distributed across both positive and negative values. This indicates that some overly dense unrelaxed structures were generated and then became less dense through structural relaxation. The density difference at w = 0.1 is distributed at more negative values, indicating that overly dense unrelaxed structures are generated. We presume that an overly dense structure lies in a steep region of the potential energy surface due to intermolecular repulsion, so it would require fewer iterations for structure relaxation than a less dense structure. Indeed, setting a smaller density tolerance shortened the average optimization time for each structure (SI Fig. 3). On the other hand, a smaller density tolerance tightens the acceptance criterion, necessitating more time to generate valid structures. Consequently, striking a balance between these effects led us to select w = 0.5 as the most appropriate tolerance value, and we employed this setting in the subsequent validation.
| Entry | CSD code | M (g mol−1) | Nrot | SG | Z | NDoF | V (Å3) | Random-CSP | SPaDe-CSP |
|---|---|---|---|---|---|---|---|---|---|
| 1 | MEYCIC | 66.1 | 0 | Pbca | 8 | 3 | 860.727 | 0.2 | 0.5 |
| 2 | IMAZOL15 | 68.1 | 0 | P21/c | 4 | 4 | 355.508 | 0.1 | 0.9 |
| 3 | MOTLAL | 99.1 | 1 | P21/c | 4 | 4 | 467.487 | 0 | 0.1 |
| 4 | CEBYUD | 136.1 | 0 | P32 | 3 | 2 | 426.65 | 1 | 1 |
| 5 | NISNAE | 150.1 | 0 | P21 | 2 | 4 | 335.252 | 0.3 | 1 |
| 6 | BAQBUR | 194.2 | 0 | P1 | 1 | 6 | 246.267 | 1 | 1 |
| 7 | FAMDUS | 263.8 | 0 | P1 | 1 | 6 | 355.334 | 0.7 | 1 |
| 8 | LOMPUY | 268.3 | 4 | P21/c | 4 | 4 | 1484.313 | 0 | 0.5 |
| 9 | WURVEM | 279.2 | 1 | P21/c | 4 | 4 | 1254.9 | 0 | 0.3 |
| 10 | CINYOO01 | 279.4 | 4 | P21/c | 4 | 4 | 1439.328 | 0 | 0.1 |
| 11 | COCAIN10 | 303.4 | 2 | P21 | 2 | 4 | 807.479 | 0.1 | 0.7 |
| 12 | HUFXAH | 306.8 | 3 | P1̄ | 2 | 6 | 813.864 | 0.2 | 0.2 |
| 13 | JOLLUT | 321.4 | 6 | P21/c | 4 | 4 | 1739.135 | 0 | 0.1 |
| 14 | GEZPIK | 330.4 | 6 | P21/c | 4 | 4 | 1641.079 | 0 | 0 |
| 15 | XILPAN | 344.4 | 1 | P21/c | 4 | 4 | 1859.966 | 0 | 0.1 |
| 16 | BESLOE | 361.4 | 0 | Pbca | 8 | 3 | 3996.051 | 0 | 0.5 |
| 17 | HETTUZ | 368.4 | 5 | P21/c | 4 | 4 | 1750.583 | 0 | 0 |
| 18 | SIKFIB | 382.3 | 5 | P21/c | 4 | 4 | 1982.361 | 0 | 0 |
| 19 | QEVWUJ | 420.4 | 8 | P21/c | 4 | 4 | 1978.025 | 0 | 0.2 |
| 20 | CIDTAN | 440.5 | 4 | P21/c | 4 | 4 | 2272.42 | 0 | 0 |
The first category includes entry numbers 4 and 6 (CEBYUD and BAQBUR), which resulted in a success rate of 1.0 using both random- and SPaDe-CSP (Table 1). A key factor in this category is that these crystals either have fewer lattice degrees of freedom or have a smaller lattice size. CEBYUD belongs to the space group P32, giving only two degrees of freedom for the lattice (lengths of the a- and c-axes). Even with random-CSP, once P32 is selected from the 32 space group candidates, it is more likely to produce an initial structure that converges to the correct local minimum due to the low degrees of freedom for the lattice. Random- and SPaDe-CSP showed similar distributions in the energy-density diagram, while SPaDe-CSP yielded slightly more stable high-density structures (Fig. 7a and b). The distribution of initially generated space groups confirms that SPaDe-CSP effectively narrowed the candidates including the experimentally observed one, whereas random-CSP sampled a wider range (Fig. 7c). In addition, SPaDe-CSP preferentially generated structures closer to the experimental density, and the difference between initial and optimized densities was smaller compared to random-CSP (Fig. 7c). These observations highlight that even when the success rate is saturated, the predictors in SPaDe-CSP improve the quality of the initial structural pool, thereby reducing unnecessary optimization steps.
The other crystal, BAQBUR, belongs to the space group P1 and has a small unit cell due to Z = 1. A small unit cell reduces the complexity of the search space. This leads to fewer local minima on the potential energy surface, making it easier to converge on the global minimum (SI Fig. 4). Furthermore, a P1 crystal has enough flexibility to represent the same crystal in multiple lattice settings due to its six lattice degrees of freedom. Consequently, the high success rate of CSP for BAQBUR can be attributed to both its small unit cell and the characteristics of the P1 space group.
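The lattice degrees of freedom (NDoF) discussed here follow from the crystal system of the space group. The lookup below is a small sketch based on the standard space-group number ranges of the International Tables; the function names are hypothetical, and it reproduces the NDoF values listed in Table 1.

```python
# Independent lattice parameters per crystal system.
LATTICE_DOF = {
    "triclinic": 6,     # a, b, c, alpha, beta, gamma
    "monoclinic": 4,    # a, b, c, beta
    "orthorhombic": 3,  # a, b, c
    "tetragonal": 2,    # a, c
    "trigonal": 2,      # a, c (hexagonal axes)
    "hexagonal": 2,     # a, c
    "cubic": 1,         # a
}

# Upper bound of the space-group number range for each crystal system.
SYSTEM_UPPER_BOUNDS = [
    (2, "triclinic"),
    (15, "monoclinic"),
    (74, "orthorhombic"),
    (142, "tetragonal"),
    (167, "trigonal"),
    (194, "hexagonal"),
    (230, "cubic"),
]


def crystal_system(space_group_number):
    """Return the crystal system for a space-group number between 1 and 230."""
    if not 1 <= space_group_number <= 230:
        raise ValueError("space-group number must be between 1 and 230")
    for upper, system in SYSTEM_UPPER_BOUNDS:
        if space_group_number <= upper:
            return system


def lattice_dof(space_group_number):
    """Number of independent lattice parameters for the given space group."""
    return LATTICE_DOF[crystal_system(space_group_number)]
```

For instance, `lattice_dof(1)` gives 6 for P1, whereas `lattice_dof(145)` gives 2 for P32, which is why a randomly sampled trigonal cell is far more likely to land near the correct lattice than a randomly sampled triclinic one.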
The second category includes systems for which SPaDe-CSP achieved higher success rates than random-CSP (Table 1). Although the degree of improvement varies, compounds that had higher success rates with random-CSP tend to show higher success rates with SPaDe-CSP. In addition to the model molecule used for the hyperparameter study, 12 other molecules fell into this category (entry numbers 1, 2, 3, 5, 7, 8, 9, 10, 11, 13, 15, 16, 19). Crystals in this category tend to have limited lattice degrees of freedom or moderate unit cell sizes. For example, FAMDUS has space group P1 and a larger molecular weight than BAQBUR. This demands a larger cell and thus more possible combinations of lattice parameters, decreasing the success rate in random-CSP. With SPaDe-CSP, however, its success rate rose to 1, presumably because space group prediction narrowed down the candidates from 32 to 15, and density prediction preferentially filtered lattice combinations near the stable structure.
In this category, two crystals, NISNAE and COCAIN10, belong to the space group P21. In both cases, the success rate of SPaDe-CSP more than doubled compared to random-CSP. The key difference between them lies in their molecular weights: COCAIN10 (303.35 g mol−1) is larger and more structurally complex than NISNAE (150.13 g mol−1), making CSP more challenging. Accordingly, NISNAE, which had a higher success rate in random-CSP, also showed a higher success rate under SPaDe-CSP.
Another noteworthy example with strong SPaDe-CSP effects is the pair BESLOE and MEYCIC, which belong to the space group Pbca. This space group belongs to the orthorhombic crystal system and has all angles fixed at 90°, so only the three lattice lengths can change once the correct space group is selected. Since this space group corresponds to Z = 8, the unit cell is larger than cells with the more frequent Z = 2 and 4, resulting in more possible lattice combinations. Random-CSP of BESLOE afforded a low success rate because the difference between the initial and optimized densities in random-CSP is distributed near zero, which means that the loose initial structures hardly became dense through structural relaxation (Fig. 7d and f). With SPaDe-CSP, the success rate improved to 0.5 by increasing the probability of generating denser initial structures (Fig. 7e and f). MEYCIC gave a higher random-CSP success rate than BESLOE (0.2 versus 0), probably due to its smaller molecular weight (SI Fig. 4).
The final category comprises structures for which SPaDe-CSP did not improve the success rate even though there is room for improvement. This category contains 5 structures in total, one belonging to P1̄ and four belonging to P21/c (entry numbers 12, 14, 17, 18, and 20). HUFXAH, which belongs to P1̄, has six degrees of freedom for the lattice and Z = 2, showing a success rate of 0.2 under both random- and SPaDe-CSP. When compared with crystals of space group P1 (BAQBUR and FAMDUS), which achieved higher success rates, HUFXAH requires twice the unit cell volume of those systems due to Z = 2. Even though ML narrows down the space group candidates, six lattice degrees of freedom and a larger cell would leave the success rate unchanged with SPaDe-CSP.
The other four structures resulted in success rates of 0 with both random- and SPaDe-CSP. SPaDe-CSP did generate more stable, high-density structures than random-CSP, but no structure matching the reference based on RMSD30 was obtained (Fig. 7g–i and SI Fig. 4). This indicates that the molecular arrangement did not match the reference even when the lattice was sufficiently similar. This is probably because the combined effect of the lattice degrees of freedom and the lattice and molecular sizes made it difficult to match the reference structure.
To quantify the factors that determine the success or failure of CSP, we investigated which parameters are correlated with the success rate. The molecular- and crystal-level descriptors included in Table 1 are used for this analysis. Since some descriptors are highly correlated with each other, we selected one from each correlated group to avoid multicollinearity (SI Fig. 5). For clarity in interpretation, a linear model was adopted, and multiple approaches to incorporating descriptors were tested (SI Table 2). Based on the Bayesian information criterion (BIC), we selected the linear regression model that uses the descriptor calculated according to the following formula.
$X = r\,N_{\mathrm{DoF}}^{2}\,Z + (1 - r)\,M$
Fig. 8 Relationship between CSP-descriptor and success rate. (a) Random-CSP. (b) SPaDe-CSP. The dashed lines are the linear regressions, and the highlighted regions are the 95% prediction intervals.
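The model selection step can be sketched as an ordinary least-squares line fit scored by BIC. This is only an illustration under stated assumptions: the descriptor is read as X = r·NDoF²·Z + (1 − r)·M from the flattened formula, no normalization of the descriptors is applied, the BIC is the Gaussian least-squares form n·ln(RSS/n) + k·ln(n) up to an additive constant, and all function names are hypothetical.

```python
import math


def csp_descriptor(n_dof, z, mol_weight, r):
    """Assumed reading of the descriptor: X = r * N_DoF^2 * Z + (1 - r) * M."""
    return r * n_dof ** 2 * z + (1.0 - r) * mol_weight


def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept.

    Returns (slope, intercept, rss), where rss is the residual sum of squares.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    rss = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    return slope, intercept, rss


def bic(rss, n_samples, n_params):
    """BIC of a Gaussian least-squares model, up to an additive constant."""
    return n_samples * math.log(rss / n_samples) + n_params * math.log(n_samples)


# Toy data (illustrative only): four descriptor values and success rates.
slope, intercept, rss = fit_line([0.0, 1.0, 2.0, 3.0], [0.0, 1.0, 1.0, 2.0])
score = bic(rss, n_samples=4, n_params=2)
```

Candidate descriptor definitions (for example, different values of r) would each be fitted in this way, and the one with the lowest BIC retained; counting two parameters for slope and intercept is a choice of this sketch.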
Although the present benchmark focused on comparing SPaDe-CSP with random-CSP to isolate and demonstrate the effect of space group and density filtering, it should be emphasized that these predictors can also be incorporated into global optimization frameworks such as genetic algorithms (GA) or Bayesian optimization (BO). In GA, for example, the space group predictor reduces the candidate set from 32 common space groups to 7–8 on average (threshold = 10−2, applicable to ∼90% of molecules), while the density predictor further constrains lattice parameters. These predictors therefore provide effective guidance for defining the search domain and generating the initial population, and are expected to synergistically improve the efficiency of GA- or BO-based CSP workflows.
The second limitation is the temperature effect on crystal stability. The lattice parameters of organic crystals are known to be more susceptible to temperature changes than those of inorganic crystals, leading to larger thermal expansion coefficients. In the 20 crystals used for the present verification, those that appeared most stable at 0 K generally matched the reference structures. However, in other cases, the stability of crystal structures at 0 K could differ from that near room temperature. Accounting for temperature effects requires calculating the Gibbs free energy, for which various computational approaches have been proposed.40–43 Yet, DFT-based calculations of Gibbs free energy are computationally expensive. Because NNPs reduce such computational demands, they are considered relatively easy to introduce into the SPaDe-CSP workflow. We focused on the efficiency of lattice sampling in this work and therefore will incorporate Gibbs free energy calculations in future work.
We would like to stress once more the distinction between SPaDe-CSP and commonly used structure-search techniques like GA and BO. While these techniques parametrize crystal structures and actively search for the global optimum, the current SPaDe-CSP method is a technique that filters space groups and lattice constants. These approaches are not meant to replace each other one-to-one; rather, they can be synergistically combined.
Supplementary information is available. See DOI: https://doi.org/10.1039/d5dd00304k.
| This journal is © The Royal Society of Chemistry 2025 |