How many more polymorphs of ROY remain undiscovered

With 12 crystal forms, 5-methyl-2-[(2-nitrophenyl)amino]-3-thiophenecabonitrile (a.k.a. ROY) holds the current record for the largest number of fully characterized organic crystal polymorphs. Four of these polymorph structures have been reported since 2019, raising the question of how many more ROY polymorphs await future discovery. Employing crystal structure prediction and accurate energy rankings derived from conformational energy-corrected density functional theory, this study presents the first crystal energy landscape for ROY that agrees well with experiment. The lattice energies suggest that the seven most stable ROY polymorphs (and nine of the twelve lowest-energy forms) on the Z′ = 1 landscape have already been discovered experimentally. Discovering any new polymorphs at ambient pressure will likely require specialized crystallization techniques capable of trapping metastable forms. At pressures above 10 GPa, however, a new crystal form is predicted to become enthalpically more stable than all known polymorphs, suggesting that further high-pressure experiments on ROY may be warranted. This work highlights the value of high-accuracy crystal structure prediction for solid-form screening and demonstrates how pragmatic conformational energy corrections can overcome the limitations of conventional density functionals for conformational polymorphs.


S1 Crystal structure prediction methods
Generation of putative crystal structures was undertaken through six stages: (i) isolated molecule studies; (ii) global search; (iii) refinement with a transferable repulsion/dispersion (r/d) potential; (iv) refinement with a tailored r/d potential; (v) structure geometry refinement with B86bPBE-XDM, and (vi) final single-point energies with B86bPBE-XDM + ∆SCS-MP2D. The first three stages follow the general strategy presented in Pantelides et al. 1 and were employed in previous work on ROY. 2 However, the structure prediction process here was performed from scratch using the latest version of the codes and methodology as well as a consistent B3LYP level of theory throughout (instead of the mixed set of methods used in the earlier study). In particular, the version of CrystalPredictor used here includes an improved surrogate model (Local Approximate Model or LAM) that offers greater accuracy and speed through the use of adaptive LAM point generation 3 and LAM smoothing. 4 These stages are described in greater detail below. Stage I: Isolated molecule studies. The gas-phase molecular geometry of ROY was first optimized at the B3LYP/6-31G(d,p) level of theory using Gaussian 09. 5 Based on the insight gained from a previous ROY study, 2 two conformational degrees of freedom were chosen as independent variables for the purpose of lattice energy minimization at the global search stage, namely C6-C5-N1-C1 (θ 1 ) and C5-N1-C1-C2 (θ 2 ) (see Figure S1). Their feasible ranges were determined by selecting values for which the increase in intramolecular energy over that of the in vacuo conformation is lower than 20 kJ/mol in one-dimensional scans, giving θ 1 ∈ [−20.0, 180.0] and θ 2 ∈ [100.0, 260.0]. The adaptive Local Approximate Model (LAM) algorithm 3 was used to generate a database of 38 LAMs that describe the conformationally-dependent properties within this space, using an initial uniform grid generated with ∆θ = ±20 • . The high-energy (∆ ) and accuracy cutoff (∆ ) parameters were set to 20 kJ/mol and 1.0 kJ/mol, respectively.
Stage II: Global crystal structure search. The global search for low-energy minima on the Z =1 lattice energy surface was performed using CrystalPredictor II. 6 The smoothed intramolecular potential algorithm 4 was employed to describe intramolecular interactions and point charges for electrostatic interactions, while r/d parameters were taken from the FIT potential set [7][8][9][10] Two million minimizations in 61 space groups were carried out. Structures within 20 kJ/mol of the global minimum were clustered using the CSD Python API 11 to compute RMSD 15 values. 12 If two structures are found to have an RMSD 15 less than 0.3Å, a lattice energy difference of less than 0.2 kJ/mol and a density difference of less than 1.0 kg/m 3 , the higher-energy structure is eliminated. This led to the identification of 2,869 distinct structures within 20 kJ/mol of the global minimum.
Stage IV: Further crystal structure refinement with a tailored r/d potential: The 50 lowest-energy structures from Stage III were fully relaxed with dispersion-corrected periodic planewave DFT calculation in QuantumEspresso 14,15 using the B86bPBE functional 16,17 and the exchange-dipole moment (XDM) dispersion correction. 18 This provided a set of optimized crystal geometries and energies which were then used as reference data for fitting an updated set of r/d potential parameters that are tailored to the ROY molecule. The CrystalEstimator 19 code was used for this purpose. The updated r/d parameter estimates, listed in Table 1, were then used to re-refine the lowest 1,000 structures from the global search at Stage II, using the same conformational degrees of freedom and electrostatic model as at Stage III.
The final set of ∼300 structures carried forward represents the union of all structures which lie within 10 kJ/mol of the global minimum on either the Stage III or Stage IV landscapes, as indicated in Figure 2a of the main paper. This set was augmented with the structures of the three experimental polymorphs which have Z =2: R05, R18, and RPL. Stage V: DFT relaxation of the crystal structures. All ∼300 structures carried forward from Stages III and IV were fully relaxed with B86bPBE-XDM using Quantum Espresso. After removal of duplicate structures which coalesced in the DFT optimizations, 264 unique structures remained.
The B86bPBE-XDM calculations employed a 50 Ry planewave energy cutoff and reciprocal space Monkhorst-Pack k-point grid densities of 0.04Å −1 or better. Core electrons were treated with the projector augmented wave (PAW) approach using potentials produced with Atomic v6.1. 14 The gas-phase DFT energies for the monomer conformational energy correction were computed with the same parameters in a periodic box with an intermolecular spacing of at least 20Å between any two periodic image atoms. Due to the large unit cell size, only the Γ-point was sampled in reciprocal space. The high-pressure calculations relaxed the structures under isotropic hydrostatic pressures ranging from 1-10 GPa.
Stage VI: Conformational energy correction and final energy ranking. Finally, given the known problems GGA (and many other) density functionals have describing the ROY conformations, the final landscape was obtained by applying a singlepoint monomer conformational energy correction at the SCS-MP2D level of theory to each crystal from Stage V. The corrected energy per unit cell is computed as, Energies per molecule are obtained by dividing by Z. The final results are denoted as B86bPBE-XDM + ∆SCS-MP2D. Here, the final energies were evaluated relative to that of form Y. The SCS-MP2D calculations were performed using PSI4 20 (for MP2) and a custom implementation of the spin-component scaling and dispersion correction. 21 The SCS-MP2D energy at the complete-basis-set (CBS) limit is computed as the HF/aug-cc-pVQZ energy plus the scaled same-spin and opposite-spin MP2/CBS correlation energies. We then subtract the correspondingly scaled amounts of same-and opposite-spin uncoupled Hartree-Fock (UCHF) dispersion energies and replace them with the full, unscaled coupled Kohn-Sham (CKS) dispersion energy. See Ref 22 for details of these corrections and the values of the c os and c ss coefficients.
The complete-basis-set (CBS)-limit same-spin and opposite-spin correlation energies were extrapolated 23 from density-fitted MP2 calculations performed in the aug-cc-pVTZ and augcc-pVQZ basis sets: S2 Additional Data Table S2 compares the DFT-optimized crystal structures against experimental ones reported in the Cambridge Structure Database in terms of densities and structure overlays. Fifteenmolecule structure overlays were used for the comparison (RMSD 15 metric), as implemented in Mercury. The densities of the structures optimized with B86bPBE-XDM are consistently higher than experimental ones due to the neglect of zero-point vibrational energy and the thermal expansion of the crystals at finite temperatures. Most of the experimental structures were determined at room temperature, though a few were taken at lower temperatures, as indicated in the Table. The role of thermal expansion can be seen in the fact that the density differences between the experimental and DFT crystal structures are about twice as large for the polymorphs whose experimental structures were solved at room temperature compared to those solved at 100-150 K.  Figure S3 plots the key gas-phase conformational energy profile for ROY. It shows how a GGA functional like B86bPBE-XDM over-stabilizes the more planar conformations, including those associated with red and orange polymorphs, relative to those less-planar conformations associated with Y polymorphs. It exhibits a root-mean-square (rms) error of 2.5 kJ/mol relative to complete-basis-set CCSD(T) benchmarks. 24 In contrast, complete-basisset SCS-MP2D performs much better, with an rms error of 0.

S2.2 ROY conformational energy profile
Energy (kJ/mol) Dihedral Angle (degrees) Figure S3: Conformational energy profile for rotating the key S-C-N-C dihedral angle θ. The experimental dihedral angles associated with several ROY polymorphs are indicated. Table S3 lists the computed lattice energies and energies relative to form Y as computed with B86bPBE-XDM and after the SCS-MP2D monomer correction is applied. These relative energies correspond to the crystal energy landscapes presented in the main paper. The experimental polymorphs are indicated in bold, while the high-pressure candidate structure #15 is highlighted in red. A CIF file containing all crystal structures is provided separately. . This landscape was using dispersion-corrected PBE DFT calculations and a DFT-tailored force field using the GRACE program. The authors published both their final optimized predicted structures as well optimized geometries for each of the experimental forms that were known at the time. The search was imperfect-it found only six experimental polymorphs. It was also heavily biased toward red/orange forms due to delocalization error in the PBE functional; only 2% of the structures in the final set exhibit an SCNC dihedral angle of 90±40 • , which is the approximate range associated with yellow polymorphs (as discussed in the main paper). In contrast, more than a third of the lower-energy structures reported in our search lie in that range. Nevertheless, because our own search only examined Z =1 structures, the Nyman et al landscape enables assessment of the importance of Z =2 structures. We performed singlepoint B86bPBE-XDM energy calculations and applied the SCS-MP2D monomer energy correction to the 500 lowest-energy structures from their final landscape of 1,077 DFT-optimized structures. About three-quarters of these structures have Z =2. Differences in the functionals and dispersion corrections between that study and our own leads to changes in crystal packing densities, intramolecular bond lengths, and overall energies that hinder direct comparison between the single-point energies for these structures and the energies from our own landscape. For example, the raw SCS-MP2D monomer energies for the experimental forms differ by up to 3.5 kJ/mol depending on which functional was used to optimize them. On the other hand, relative energies between forms are far more consistent across the two landscapes. Accordingly, Figure S4 presents the energies in each landscape relative to form Y from that same landscape, and the two landscapes are presented side-by-side.

S2.3 Relative energies of the ROY polymorphs
Unsurprisingly, given the the PBE-NP results from Ref 26 and our B86bPBE-XDM data for our own landscape in the main paper, B86bPBE-XDM predicts the yellow forms on the Nyman et al landscape to be far less stable than both the red/orange experimental forms and most of the predicted structures. Performing the SCS-MP2D monomer correction transforms the landscape dramatically, and form Y becomes the most stable. As can be seen from Figure S4, the relative SCS-MP2D-corrected B86bPBE-XDM single-point energies of the experimental polymorphs are reasonably similar regardless of whether they are optimized with PBE-NP or B86bPBE-XDM. The energies of forms OP, ORP, PO13, R05, YN, and YT04 vs. Y vary by 0-0.3 kJ/mol between the two landscapes in Figures S4b and c. The exceptions are form ON, for which Nyman et al reported problems reproducing the experimental geometry with the PBE-NP model, 26 and form R, which is 0.9 kJ/mol less stable when optimized with PBE-NP.
Looking at the revised landscape after the conformational energy correction, we find few if any low-energy Z =2 structures which compete with the low-energy experimental forms. R05 remains the lowest-energy Z =2 structure. The next lowest predicted structure (Rank #19 in their list) with Z =2 lies only slightly above R05 in energy. However, this structure involves two distinct ab-plane layers which alternate intramolecular conformation and packing motif along the c direction. One of these 2-D layers is very similar to the packing found in form YN, and experimental realization of this structure might be inhibited by preferential crystallization of the more stable YN polymorph. The other Z =2 structures lie in higher-energy regions of the landscape where many unobserved candidates lie and do not particularly stand out.
Overall this analysis of the CSP landscape from Ref 26 bolsters the argument in the main paper that the important low-energy polymorphs of ROY have already been found. It is worth reiterating, however, that this landscape is heavily biased toward more planar conformations of the ROY molecule, and the possibility of missing Z = 2 structures with yellow-color conformations cannot currently be ruled out.  Figure S5: Pressure-dependent enthalpy calculations were performed for all experimental structures in red except RPL (Z=16) and for the candidate structures shown in blue. The rank #15 structure which becomes more stable than all experimental polymorphs around 10 GPa is indicated.