Volker L.
Deringer
*ab,
Davide M.
Proserpio
cd,
Gábor
Csányi
a and
Chris J.
Pickard
ef
aDepartment of Engineering, University of Cambridge, Cambridge CB2 1PZ, UK. E-mail: vld24@cam.ac.uk
bDepartment of Chemistry, University of Cambridge, Cambridge CB2 1EW, UK
cDipartimento di Chimica, Università degli Studi di Milano, Milano, Italy
dSamara Center for Theoretical Materials Science (SCTMS), Samara State Technical University, Samara 443100, Russia
eDepartment of Materials Science and Metallurgy, University of Cambridge, Cambridge CB3 0FS, UK
fAdvanced Institute for Materials Research, Tohoku University, 2-1-1 Katahira, Aoba, Sendai, 980-8577, Japan
First published on 12th April 2018
Crystal structure prediction algorithms, including ab initio random structure searching (AIRSS), are intrinsically limited by the huge computational cost of the underlying quantum-mechanical methods. We have recently shown that a novel class of machine learning (ML) based interatomic potentials can provide a way out: by performing a high-dimensional fit to the ab initio energy landscape, these potentials reach comparable accuracy but are orders of magnitude faster. In this paper, we develop our approach, dubbed Gaussian approximation potential-based random structure searching (GAP-RSS), towards a more general tool for exploring configuration spaces and predicting structures. We present a GAP-RSS interatomic potential model for elemental phosphorus, which identifies and correctly “learns” the orthorhombic black phosphorus (A17) structure without prior knowledge of any crystalline allotropes. Using the tubular structure of fibrous phosphorus as an example, we then discuss the limits of free searching, and discuss a possible way forward that combines a recently proposed fragment analysis with GAP-RSS. Examples of possible tubular (1D) and extended (3D) hypothetical allotropes of phosphorus as found by GAP-RSS are discussed. We believe that in the future, ML potentials could become versatile and routine computational tools for materials discovery and design.
Despite their predictive power, these structure searching methods are normally driven by quantum-mechanical density-functional theory (DFT) computations, and therefore they are limited to systems with relatively small unit cells. The allotropes of elemental phosphorus,20–22 which are the topic of the present paper, directly illustrate the problem at hand. The thermodynamically stable “black” (orthorhombic) form, as well as the high-pressure As-type (rhombohedral) allotrope, exhibit simple crystal structures with only one symmetry-independent atom each (Table 1). Such systems are easily amenable to ab initio crystal structure prediction, and various, especially layered, hypothetical allotropes have been proposed in recent years.23–32 On the other hand, “violet” (Hittorf’s) phosphorus33 has a notoriously complex structure that was solved more than 100 years after the initial synthesis34 and contains no fewer than 21 symmetry-independent atoms in the unit cell. Systems of this size have hitherto been out of reach for ab initio crystal structure prediction.
Space group | Z | Z′ | Source | Ref. | |
---|---|---|---|---|---|
a Occasionally referred to as “blue P” in recent literature,46 especially in the monolayer form. b To explore whether our GAP would be able to find molecular (“white”) P, we carried out a set of preliminary searches at low density (1.0 g cm−3). The resulting structures did include distorted P4 units, but also other small fragments, and we expect that the GAP will need to have “seen” these in iterative training to distinguish them more clearly. We also expect that fully capturing the intricate structural details of white P, including its packing variants,74 will require additional reference data; this is the topic of ongoing work. Regarding amorphous forms, our searches are restricted to relatively small periodic unit cells which cannot fully represent the amorphous allotrope(s). | |||||
Black P | Cmca | 8 | 1 | High-pressure synthesis | 43 |
As-type Pa |
R![]() |
2 | 1 | From black P (p > 5 GPa) | 44 |
Simple cubic P |
Pm![]() |
1 | 1 | From black P (p > 11 GPa) | 44 |
Hittorf’s P | P2/c | 84 | 21 | Slow cooling from Pb melt | 34 |
Fibrous P |
P![]() |
42 | 21 | Resublimation with I catalyst | 45 |
It has recently been suggested that machine learning (ML) based interatomic potentials could help with this long-standing issue.35–38 Such potentials are fitted to reference databases of DFT energies and forces and, once generated, they allow one to perform atomistic simulations with close to DFT quality but at a computational cost that is orders of magnitude lower.39 Indeed, we have recently shown that an ML-based Gaussian approximation potential (GAP), initially developed for amorphous carbon,40 can be used to identify crystalline allotropes.36 This has added around 150 entries to the Samara Carbon Allotrope Database (SACADA, ref. 15). Very recently, the use of ML to speed up global searching and crystal structure optimization itself has been reported,41,42 but we focus here on the use of explicit interatomic potentials (“force fields”) for this purpose. What is more, in recent work,37 we have argued that RSS can be used to construct the interatomic potential from scratch, exploring and fitting a complex potential energy surface (PES) at the same time. This points towards a more general “data-driven” strategy for atomistic materials modelling.
In this Discussion paper, we aim to take new steps in this direction and further develop our emerging method (which we call “GAP-RSS”). After briefly summarising its components, we present results for elemental phosphorus, generating and applying the first GAP-RSS potential for this material. Our protocol “discovers” the crystal structure of black P during the iterations and, by construction, adds it to the reference database without prior knowledge of any existing allotropes. We then use this potential for some exercises in crystal structure prediction: we show that a fibrous allotrope of P (Table 1) appears to be prohibitively hard to find in free searches, and we outline the use of an alternative approach, showing exemplary predicted structures in 1D and 3D. We also discuss open questions and expected future directions.
![]() | ||
Fig. 1 Overview of the GAP-RSS protocol as introduced in ref. 37 and extended here by a selection step. We start from a set of randomised seed structures for a given chemical composition (in generations 0–3, 250 each; in generation 4, 5000). The reference database of energies and forces is then built by single-point DFT computations: first, for unrelaxed structures (generation 0), later, at various stages of structural relaxation (1–4), performed by fitting interim GAPs to the evolving database. Initially, we feed all structures back into the training (up to generation 3); later, we only add selected structures. As a criterion to select cells, here we use coordination numbers, demanding that all atoms in a candidate structure must have three nearest neighbours, and discarding any structures which do not comply. The GAP-RSS iterations are deemed “converged” when the resulting potential shows satisfactory performance. Here, we do so after generation 4, as that version of the potential has “found” black P from scratch (see Fig. 3 and 4). |
![]() | ||
Fig. 2 Left: energy–volume plot of single-point DFT data during the generation of a GAP-RSS model for phosphorus, given relative to the most stable structure. The different stages of building the database (cf.Fig. 1) are highlighted in different colours. Right: binned distribution for energies in these datasets, drawn on the same vertical axis. |
We will show in the following that, concomitant with increasing exploration of the phase space, the GAP-RSS interatomic potential becomes more accurate, ultimately describing crystal structures without prior knowledge. We stress that in most of the previous literature on fitting ML potentials, reference databases are constructed by including known crystal structures (and distorted variants thereof).39,40,53
To assess the quality of the potential, we use it to compute energies for independent test sets of structures not included in the training, for which the DFT energies are known. The results are shown in Fig. 3a. The description of the higher-energy regions (“RSS intermediates”, open symbols) appears to converge quite early on, but it carries a residual root-mean-square prediction error of the order of 0.05–0.10 eV per at. that does not improve with further iterations. The same is observed, and is even more pronounced, for a test set of randomised seed structures, where the error is practically constant at ≈0.10 eV per at. for all the GAP generations (not shown in Fig. 3a for clarity). This is intuitively understandable: the PES is smooth, and since the high-energy structures are so diverse, only a handful of them are required for acceptable sampling. A similar observation was made for liquid carbon in our earlier work: the high-temperature liquid contains many different environments and is thus straightforward to “learn” from a single ab initio MD trajectory. In contrast, the amorphous regions of the PES required several rounds of GAP-driven MD and iterative refinement of the potential.40
For the purposes of GAP-RSS, these findings are encouraging: the early steps of relaxation appear to be easily “learned” by the potential, and even a notable degree of residual error can be tolerated if the potential is successful in navigating the high-energy region of the PES (say, the green data points in Fig. 2). In contrast, a growing database is needed for the relaxed minima. Indeed, testing for a set of local RSS minima outside the database initially leads to a significant prediction error above 0.5 eV per at. (filled symbols in Fig. 3a), but this gradually improves and falls below 0.1 eV per at. with increasing quality of the potential.
The latter accuracy turns out to be acceptable for exploratory searches, but it is still worse than what one would expect from an ML potential for the stable allotropes. Our long-term vision for GAP-RSS is therefore to find stable minima (such as black or orthorhombic P) during the iterations and automatically include them in the reference database. We expect this to improve the accuracy substantially, which is supported by the following result. Fig. 3b shows test set errors as before, but now for ensembles of distorted unit cells of two experimentally known allotropes. The orthorhombic structure of black P is a particularly interesting test, as it competes with other, more highly symmetric structures that are very close in energy on the PES.22 Our potential, during generation 4, did succeed in finding black P, and hence includes this structure type in the final reference database. This is reflected in Fig. 3b: the energy errors for black P get progressively lower, but only in generation 4 do they drop to very close to zero.
Likewise, the computed energy–volume scans for black P (Fig. 4a) progressively improve during the GAP-RSS iterations, but the structure needs to have been “seen” and included in the fitting (in generation 4) to achieve an accurate result. The same is true for computed structural properties (Fig. 4b): for this test, we fully relaxed the black P structure with each version of our potential, as well as with DFT. In generation 4, all three GAP-computed lattice parameters come very close to the DFT benchmark.
P(p) = λ![]() |
The results are easily rationalised by looking at the evolving energy–volume scans again, but now for the high-pressure forms (Fig. 5). For reference, our pressure-free search (p0 = 0 GPa) already yields an acceptable description of both allotropes in generation 4 (purple lines), but it fails to reach quantitative accuracy, especially for the simple cubic form. Searching with moderate external pressure (p0 = 10 GPa) visibly improves the description of both high-pressure forms: the GAP and DFT data are now in better agreement. Note that due to the nature of the exponential distribution, most of the pressure values drawn are significantly smaller than p0, and thus correspond to typical experimental conditions. When using external pressures that are an order of magnitude higher (p0 = 100 GPa), the description of the As-type form deteriorates visibly, whereas the simple cubic form is already “learned” very easily in generation 2 (bottom right panel in Fig. 5). All this is intuitively understood, as the experimentally observed sequence of pressure-induced phase transitions is black → As-type → simple cubic, with transition pressures around 5 GPa and 11 GPa, respectively.44,57
![]() | ||
Fig. 5 Energy–volume scans similar to those in Fig. 4a, but now assessing the high-pressure forms, As-type (top) and simple cubic P (bottom). Data were obtained with three separate sets of GAP-RSS iterations that utilise, from left to right: no external pressure (i.e. the main dataset), 10 GPa, or 100 GPa of reference pressure p0, as defined in the text. In all cases, the respective lowest DFT energy point is set as zero. The data for generation 0 are outside the range of the energy axis. |
We performed 100000 attempts to find the structure of fibrous P using GAP-RSS, seeding with 21 independent, randomised atomic positions in space group P
. However, this search was unsuccessful, and none of the attempts led to the known structure after minimisation. We show three of the resulting structures, as mere examples, in Fig. 6. This reflects the more general problem that in global structure determination, the searching task becomes exponentially more complex with system size (and each independent atom adds three structural degrees of freedom), as discussed by Stillinger60 and in ref. 9. While the GAP still provides an approximation of the DFT potential energy surface, and therefore some care must be taken when transferring conclusions from GAP-RSS to (DFT-based) AIRSS, we take the present findings as clear evidence that the structure of fibrous P (and, by extension, Hittorf’s P) is too complex to find in free random searches.
Fig. 7a summarises the procedure and illustrates how the structure of fibrous P is decomposed into two fragments using the approach of ref. 61. While this is a purely automated, data-driven procedure, its outcome is often in line with chemical intuition.61 Indeed, in the original publication on fibrous P, the structure was described as a sequence of alternating [P8] and [P9] cages, interspersed with P2 dumbbells.45 Similar fragments have been identified by Thurn and Krebs in the crystal structure of Hittorf’s P;34 they have been discussed in Baudler’s seminal work on the nomenclature of phosphorus cages,58 and have been the topic of comprehensive theoretical analyses both in the gas phase and in the bulk.22,59,64 Without chemical knowledge, but with an algorithm to find the most simple and representative building units, the network analysis likewise “cut” the chains into fragments.61 In these, the well-known [P9] cage was combined with P2 dumbbells on either side, forming a [P13] unit (“Fragment A”), whereas the [P8] cage was recovered directly (“Fragment B”). We believe that alternative ways of decomposition, e.g. isolating the [P9] cage, may also be useful as starting points for structure searches—as will combinations of different fragments. We also note the advantage of a chemically “agnostic” approach: while there is an extensive body of literature on the building blocks of P allotropes,22,58,59 future work might be concerned with new materials where the local atomic structure is a priori unknown.
![]() | ||
Fig. 7 (a) Overview of the fragment-based approach to structure analysis and structure searching introduced in ref. 61 that we combine here with GAP-RSS. Two fragments have previously been identified in the structure of fibrous P (see text),61 and here we use both separately to generate seed structures for GAP-RSS structure searches. (b) Example of a 1D-periodic structure identified by this procedure. The tube “inherits” some of the structural information from Fragment B, namely, fused five-membered rings (dashed box). Full topology information is given in the text. (c) The same for a 1D structure composed of six-membered rings only. (d–f) Examples of 3D-periodic structures from the same search, labelled according to their network topologies.63 |
Even with an exemplary and simplified approach that uses only one fragment or the other (Fig. 7a), our GAP-RSS searches readily identified several hypothetical phosphorus allotropes with different dimensionalities. We filtered the output with more stringent criteria than in the preceding GAP-RSS generations, enforcing a minimum “non-bonded” (beyond-nearest-neighbour) distance of 2.9 Å, removing any structures that contain three- or four-membered rings (which are expected to be under significant strain), and post-relaxing selected structures using dispersion-corrected DFT.65–67 We explored the effect of feeding structural data back into the GAP as before, but now for the more ordered structures coming from fragment-based searches, and generated two additional generations of the potential. An intermediate generation was used to find the structures shown in Fig. 7 (generation 5a); another, final one additionally contains all the minima found in that search (5b). We observed that this reduces the energy error for distorted unit cells of Hittorf’s P (cf.Fig. 3b) from 0.19 eV per at. (generation 4) to 0.16 eV per at. (5a), and further to 0.13 eV per at. (5b); note that in none of these generations has our GAP-RSS potential “seen” the actual crystal structure of Hittorf’s P. A full account of this, including the results of much more diverse searches, will be published elsewhere, and for the sake of brevity we present only a few salient examples here.
As expected, several 1D tubular motifs were found in our GAP-RSS searches, packed in different ways to form extended structures. One of these tubes, shown in Fig. 7b, retains some features of the [P8] cage from which the initial seed is constructed: namely, fused five-membered rings, of which there are four in the [P8] cage (topology symbol [54]), and up to three in the chain structure (dashed box in Fig. 7b). Another structure found by GAP-RSS, shown in Fig. 7c, breaks up the initial seed fragment, forming a tube that consists of only six-membered rings. This corresponds to a sphere-packing graph for a hexagonal net (63) with coincidence vectors (m, n) = (3, 2).68 In this, it is also equivalent to one of the smallest theoretically possible carbon nanotubes.
The same protocol also identified fully three-dimensional structures. One of them is a rare binodal net, bbe-3,3-Imma (Fig. 7d), which has occasionally been observed in MOFs.69 We find another structure with a uninodal pbp net containing six- and eight-membered rings (Fig. 7e) that had originally been proposed for a hypothetical carbon structure dubbed “6.82P” (ref. 70) and also investigated as a hypothetical allotrope of P.27 We finally find the uninodal lig net (Fig. 7f), which corresponds to the anion network in the Zintl phase LiGe.71 Such a structure has not been proposed for P thus far, but it is in conceptual agreement with the Zintl–Klemm–Busmann framework (in which the group-14 element is viewed as “Ge−”, and thus takes the structural signature of a group-15 element due to its excess electron).72 The computed dispersion-corrected DFT energies65–67 for the structures shown in Fig. 7d–f are ≈+2 kJ mol−1 (bbe-3,3-Imma) and ≈+8 kJ mol−1 (pbp and lig) above that of black P, placing them well within the experimentally and computationally derived stability range of known allotropes.22
In ongoing work, beyond the scope of this paper, we are performing much larger-scale searches, including attempts to find “fibrous” P and possible related structures using GAP-RSS, and trying to understand the observed preference for the experimentally known form. It seems especially interesting to study such tubular forms since some of these structural units can be chemically extracted from phosphorus-rich compounds.73
This journal is © The Royal Society of Chemistry 2018 |