David S. Hughes*ab, Ann L. Bingham b, Michael B. Hursthouse b, Terry L. Threlfall b and Andrew D. Bond*a
aYusuf Hamied Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge, CB2 1EW, UK. E-mail: dh536@cam.ac.uk; adb29@cam.ac.uk
bSchool of Chemistry, Faculty of Engineering and Physical Sciences, University of Southampton, Southampton, SO17 1BJ, UK
First published on 15th December 2021
A set of 96 crystal structures containing sulfathiazole (SLFZ) is presented, comprising 52 new crystal structures and 44 structures retrieved from the Cambridge Structural Database. The set comprises five polymorphs, 59 co-crystals, 29 salts and three other structures, providing one of the most extensive solid-form landscapes established for a single active pharmaceutical ingredient. The crystal structures are energy-minimised using DFT-D calculations to yield a standardised set. Geometrical comparisons are made using the programs CrystalCMP, COMPACK and XPac, and the results are combined and compared. Consistent conclusions are drawn on full 3-D isostructurality within the set, identifying a group of 18 isostructural co-crystals, and 11 further isostructural groups of salts or co-crystals comprising two or three structures. Aside from the fully isostructural groups, common 2-D supramolecular constructs (SCs) are restricted to groups of only two or three structures and there are no 2-D SCs that are observed especially frequently. Transferable 1-D SCs are more common, and examples are identified based on hydrogen-bonded and non-hydrogen bonded interactions between SLFZ molecules. Closely related 1-D SCs comprising translated SLFZ molecules linked by hydrogen bonds are found in one polymorph and almost half of the multi-component set. A comparison of the five SLFZ polymorphs and the 91 multi-component crystal structures identifies several pairwise interactions between SLFZ molecules that are present in one of the polymorphs and at least one multi-component structure. A centrosymmetric R22(8) N–H⋯N hydrogen-bonded pair occurs in one polymorph and approximately 80% of the co-crystals. Intermolecular interaction energies, calculated using the PIXEL method, show that this R22(8) dimer is by far the most stabilising pairwise interaction in any structure. In general, however, there is no straightforward correlation between intermolecular interaction energies of the pairwise motifs in the polymorphs and their frequency of occurrence in the multi-component set. The extensive SLFZ set provides a challenge for systematic geometrical comparison of crystal structures, and some observations are made on the methodology and consistency of the applied programs.
In this context, one compound that we have studied extensively is sulfathiazole (SLFZ), a well-known active pharmaceutical ingredient (API; Scheme 1).25–27 During a search for reliable crystallisation procedures for the SLFZ polymorphs, an extensive solvent-screening exercise was carried out, which yielded over 100 crystalline solvates.28 This represents one of the most extensive sets of multi-component crystals to be established for a single API,29 providing a rich opportunity to explore its solid-form landscape. This paper presents crystallographic data for the multi-component SLFZ crystal forms and discusses some initial efforts to analyse the structures. In total, the structure set comprises 91 multi-component structures (52 new structures, 39 retrieved from the Cambridge Structural Database (CSD)30) plus five SLFZ polymorphs. Systematic comparison of such a large set is a significant challenge which might be approached in various ways. The principal focus in this paper is on geometrical similarity, assessed using three generally available programs: COMPACK31 (as implemented in Mercury32), CrystalCMP33,34 and XPac.35 Of particular interest is the comparison of results from the three different sources and the challenges that arise while seeking to establish consistent conclusions. The geometrical analysis is accompanied by calculation of intermolecular interaction energies in the polymorphs, with a view towards establishing the extent to which these might be correlated with transferability of pairwise motifs within the structure set. Seaton et al. have previously taken a similar approach with a more limited set of SLFZ salts.36 A complementary analysis of the SLFZ set based on hydrogen-bond topology is planned for a subsequent paper.
| Polymorph | CSD refcode | Space group | a (Å) | b (Å) | c (Å) | α (°) | β (°) | γ (°) | Vol (Å³) | Z/Z′ |
|---|---|---|---|---|---|---|---|---|---|---|
| 1p | SUTHAZ16 | P21/c | 10.534 | 12.936 | 17.191 | 90 | 107.77 | 90 | 2230.8 | 8/2 |
| 2p | SUTHAZ05 | P21/n | 10.399 | 15.132 | 14.280 | 90 | 91.21 | 90 | 2246.6 | 8/2 |
| 3p | SUTHAZ17 | P21/c | 17.448 | 8.498 | 15.511 | 90 | 112.81 | 90 | 2120.0 | 8/2 |
| 4p | SUTHAZ18 | P21/c | 8.193 | 8.538 | 15.437 | 90 | 94.01 | 90 | 1077.2 | 4/1 |
| 5p | SUTHAZ19 | P21/n | 10.774 | 8.467 | 11.367 | 90 | 91.65 | 90 | 1036.5 | 4/1 |
In each structure, a consistent atom numbering scheme is applied to SLFZ, as shown in Fig. 1. Since several structures have Z′ > 1, a 2-digit code is adopted, where the first number identifies the molecule index and the second is the atom label within the molecule. The molecule has two torsion angles expected to show significant variation amongst the set, denoted τ1 and τ2 in Scheme 1. In the crystal, the molecule can exist in two pseudo-chiral conformations, leading to atropisomerism.39 For consistency in the standardised set, the SLFZ molecule in the asymmetric unit (or the molecule given index 1 in cases with Z′ > 1) is chosen so that the thiazole ring lies to the left when the molecule is viewed along the bisector of the SO2 group with the S=O bonds directed toward the viewer (Fig. 1), which corresponds to a negative value for τ2. This is referred to as the “R” (reference) conformation, and the conformation of other molecules is labelled R or S relative to that reference. The choice of the R conformation is arbitrary; the purpose is to describe whether specific molecules have the same or different pseudo-chirality. In some of the non-centrosymmetric structures, experimental absolute structure determination indicated that the S conformation is exclusively present in the crystal analysed, while others showed inversion twinning. All structures are converted to the R conformation in the standardised set.
A previous survey of the conformational characteristics of N-substituted arylsulfonamides45 identified two torsion angles expected to show significant variation, denoted τ1 and τ2 in Scheme 1. An analysis of these torsion angles for the SLFZ set (see ESI†) shows that τ2 resembles a Gaussian distribution with mean 78° and standard deviation 8°. Torsion angle τ1 also resembles a Gaussian with approximate mean 111° and standard deviation 10° but with a residual tail extending to higher values, populated principally by salts. A scatterplot of τ2 versus τ1 shows a loose cluster centred around the mean values of τ1 and τ2, with the extension to higher τ1 values seen clearly for the salts (ESI†). The polymorphs fall mostly within the bulk cluster, except for molecule 1 of 2p (τ1 = 137.2°, τ2 = −98.6°), which is an outlier due to its high τ2 value. On this basis, 2p might be distinguished as a conformational polymorph.46
CrystalCMP produces a single continuous figure-of-merit, PSAB, calculated from the distances and relative orientations of mapped molecules.33,34 A smaller value of PSAB indicates a greater degree of structural similarity. Clusters A and B are initially aligned to give the least-squares distance overlay of the atoms in the kernel molecules, then the remaining molecules in A and B are mapped by identifying the shortest distances between their centroids. A search fragment must be defined to compare molecules, and the PSAB value incorporates an RMSD measure between corresponding atoms.34 The final PSAB value is based on all (usually 15) mapped molecules.
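The centroid-based mapping step described above can be illustrated with a minimal sketch. This is not CrystalCMP's implementation and does not compute PSAB; it assumes the two clusters have already been overlaid on their kernel molecules, and it pairs the remaining molecules greedily by shortest centroid distance. The function names and the toy coordinates are hypothetical.

```python
from math import dist

def centroid(atoms):
    """Mean position of a molecule given as a list of (x, y, z) tuples."""
    n = len(atoms)
    return tuple(sum(a[i] for a in atoms) / n for i in range(3))

def map_by_centroid(cluster_a, cluster_b):
    """Greedily pair molecules of A and B by shortest centroid distance.

    Assumes both clusters are already overlaid on their kernel molecules.
    Returns a sorted list of (index_in_a, index_in_b, distance) tuples.
    """
    cents_a = [centroid(m) for m in cluster_a]
    cents_b = [centroid(m) for m in cluster_b]
    # All candidate pairings, shortest centroid distance first.
    candidates = sorted(
        (dist(ca, cb), i, j)
        for i, ca in enumerate(cents_a)
        for j, cb in enumerate(cents_b)
    )
    used_a, used_b, mapping = set(), set(), []
    for d, i, j in candidates:
        if i not in used_a and j not in used_b:
            mapping.append((i, j, d))
            used_a.add(i)
            used_b.add(j)
    return sorted(mapping)

# Toy two-atom "molecules" (hypothetical coordinates in angstrom).
cluster_a = [[(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)], [(5.0, 0.0, 0.0), (6.0, 0.0, 0.0)]]
cluster_b = [[(5.1, 0.0, 0.0), (6.1, 0.0, 0.0)], [(0.0, 0.1, 0.0), (1.0, 0.1, 0.0)]]
mapping = map_by_centroid(cluster_a, cluster_b)
pairs = [(i, j) for i, j, _ in mapping]
```

A greedy pairing is the simplest choice for illustration; it makes clear why a poor initial overlay can lead to inappropriate molecule mappings and hence an unexpectedly large dissimilarity value.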
COMPACK and XPac map molecules in a fundamentally different way, by considering local pairwise similarity. A shell is built around the kernel in the reference cluster A, comprising molecules “connected” by any intermolecular interatomic distance shorter than the sum of the van der Waals radii plus some tolerance. Each shell molecule in A is then compared to each molecule in cluster B, and molecules are retained in the growing group of mapped molecules if an A ↔ B match is established according to specified distance and angle criteria (which differ between the two programs). The process is continued to second-shell contacts for those molecules retained in the growing group until all molecules in cluster A have been visited. Full 3-D isostructurality is established in COMPACK if all (usually 15) molecules in cluster A are matched, and sub-structure similarity is indicated where only some molecules are matched. XPac interprets the established mappings to identify supramolecular constructs (SCs) within structure sets, comprising groups of matched molecules that may be 0-D (isolated) or extend in 1-D, 2-D or 3-D. XPac also reports symmetry operators applied to generate each molecule within each SC, which can be helpful to identify them within a large set of structures and to compare with other programs such as PLATON or PIXEL. XPac, like CrystalCMP, requires a search fragment to be defined, while COMPACK establishes corresponding atoms automatically by comparing atom types and connectivity. In both COMPACK and XPac, the requirement for threshold judgements during the mapping of molecules means that results depend on the chosen tolerances.
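The shell-building criterion described above (molecules "connected" by any interatomic distance shorter than the sum of the van der Waals radii plus a tolerance) can be sketched as follows. This is an illustrative sketch only, not the code used by COMPACK or XPac: the radii are the standard Bondi values, while the 0.5 Å tolerance, the function names and the toy coordinates are assumptions made for the example.

```python
from math import dist

# Bondi van der Waals radii (angstrom) for the elements present in SLFZ.
VDW_RADIUS = {"H": 1.20, "C": 1.70, "N": 1.55, "O": 1.52, "S": 1.80}

def are_connected(mol_a, mol_b, tolerance=0.5):
    """True if any interatomic distance is below the vdW sum plus tolerance.

    Each molecule is a list of (element, (x, y, z)) entries.
    The default tolerance is a hypothetical value, not a program default.
    """
    for el_a, xyz_a in mol_a:
        for el_b, xyz_b in mol_b:
            limit = VDW_RADIUS[el_a] + VDW_RADIUS[el_b] + tolerance
            if dist(xyz_a, xyz_b) < limit:
                return True
    return False

def first_shell(kernel, others, tolerance=0.5):
    """Indices of the molecules in `others` that are connected to the kernel."""
    return [i for i, mol in enumerate(others) if are_connected(kernel, mol, tolerance)]

# Toy single-atom "molecules" (hypothetical coordinates in angstrom).
kernel = [("N", (0.0, 0.0, 0.0))]
near = [("O", (3.0, 0.0, 0.0))]   # 3.0 < 1.55 + 1.52 + 0.5
far = [("O", (6.0, 0.0, 0.0))]    # 6.0 > 1.55 + 1.52 + 0.5
shell = first_shell(kernel, [near, far])
```

Changing `tolerance` changes which molecules enter the shell, which is one reason why results from threshold-based mapping depend on the chosen settings.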
Fig. 2 Extract from the CrystalCMP dendrogram for the multi-component SLFZ set. These structures (excluding 7 and 37) constitute a large 3-D isostructural group (group 1 in Table 2). Structures 7 and 37 form a separate group (group 2), as discussed in the text.

| Group | Structures | Type |
|---|---|---|
| Group 1 | {8, 11, 12, 13, 14, 15, 18, 21, 22, 28, 29, 30, 31, 32, 33, 34, 35, 41} | Co-crystals |
| Group 2 | {7, 37} | Co-crystals |
| Group 3 | {56, 57, 58, 59} | Co-crystals |
| Group 4 | {62, 63, 72} | Salts |
| Group 5 | {16, 17, 19} | Co-crystals |
| Group 6 | {71, 85} | Salts |
| Group 7 | {69, 73} | Salts |
| Group 8 | {25, 47} | Co-crystals |
| Group 9 | {38, 52} | Co-crystals |
| Group 10 | {39, 40} | Co-crystals |
| Group 11 | {77, 79} | Salts |
| Group 12 | {66, 88} | Salts |
The structures of 7 and 37 are useful to illustrate the sensitivity and potential ambiguity of CrystalCMP. Visually, 7 and 37 appear similar as a pair, although the distortion of the unit cell is quite substantial (Fig. 4). The unit-cell parameters are comparable to group 1, but 7 and 37 are described in space group P21/c rather than P21/n (for the same unit-cell setting). Comparing 7 or 37 to group 1, the structures look essentially identical when viewed along the a axis (Fig. 5), and they contain consistent columns of hydrogen-bonded pairs running along a. However, neighbouring columns are shifted relative to each other along a. In group 1, the relative position of neighbouring columns is established through N11–H⋯O hydrogen bonds between SLFZ molecules. In 7 and 37, these are replaced by hydrogen bonds to the solvent molecules (γ-butyrolactone in 7 and pyridazine in 37), and the SLFZ molecules instead form O⋯S12 interactions.47,48 The geometrical difference between the molecular positions is subtle, but the difference in hydrogen bonding is clearly significant, and identifies 7 and 37 as a separate group (group 2). This conclusion is subsequently supported by results from COMPACK and XPac (see below).
Visual inspection of the other groups in the CrystalCMP dendrogram identifies 3-D isostructurality as listed in Table 2. Isostructural groups exist for both co-crystals and salts, but there are no mixed groups. Some further observations can be made in relation to the methodology. In the dendrogram, 38, 52 and 50 are linked at PSAB ≈ 12. These structures resemble group 1/group 2 in that 38 and 52 are isostructural, but 50 is subtly different. As for group 1/group 2, identical 1-D columns exist in all three structures along the a axis, but neighbouring columns in 50 are shifted by ½a compared to the other two. Again, this is driven by the occurrence of N11–H⋯O hydrogen bonds between SLFZ molecules in 50, which are replaced by N–H⋯solvent hydrogen bonds in 38 and 52. Hence, 50 is not included in group 9. For group 12, comprising 66 and 88, the structures are clearly isostructural on visual inspection, but their similarity measure (PSAB = 14.9780) is significantly larger than some of the cases deemed not to be isostructural. Given the clear visual similarity between 66 and 88, the value of PSAB is surprisingly high, and could indicate that corresponding molecules may not be appropriately mapped. In general, the geometrical PSAB measure is clearly helpful to identify cases of potential 3-D isostructurality, but it is difficult to select a consistent cut-off value for fully automated grouping of the SLFZ set.
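The difficulty in choosing a single cut-off can be made concrete with the two values quoted above. The sketch below assumes that a pair is grouped when its PSAB value is at or below the cut-off, and it treats the approximate linkage level of 50 with 38/52 (PSAB ≈ 12) as exactly 12.0; both are simplifying assumptions for illustration, and the names are hypothetical.

```python
# PSAB values quoted in the text; a smaller value means greater similarity.
PS_66_88 = 14.9780  # 66 and 88: isostructural on visual inspection (group 12)
PS_50_LINK = 12.0   # approximate level at which 50 links to 38/52; 50 is not isostructural

def cutoff_is_consistent(cutoff):
    """A cut-off is acceptable only if it groups 66/88 and keeps 50 out of group 9."""
    groups_66_88 = PS_66_88 <= cutoff
    excludes_50 = PS_50_LINK > cutoff
    return groups_66_88 and excludes_50

# Scan cut-off values from 0.0 to 30.0 in steps of 0.1.
workable = [c / 10 for c in range(0, 301) if cutoff_is_consistent(c / 10)]
```

Under these assumptions `workable` is empty: any cut-off high enough to group 66 with 88 also admits 50 into the group containing 38 and 52.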
Although COMPACK automatically groups structures having 15-molecule similarity, it is a substantial manual task to distil the information for sub-structure similarity. An example has been published for 50 structures containing carbamazepine.49 For the SLFZ set, 2234 out of 4560 pairwise comparisons identify some match beyond the kernel molecule, so a fully comprehensive description of the COMPACK output is impractical. The discussion below is restricted to a few illustrative examples.
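As a quick check of the scale involved, the number of pairwise comparisons follows directly from the size of the set. The short sketch below uses only the figures quoted above (96 structures, 2234 matches); the variable names are arbitrary.

```python
from math import comb

n_structures = 96                 # five polymorphs plus 91 multi-component structures
n_pairs = comb(n_structures, 2)   # unordered pairs: 96 * 95 / 2
n_matched = 2234                  # comparisons with some match beyond the kernel molecule
fraction_matched = n_matched / n_pairs
```

Roughly half of all pairs therefore show some sub-structure match, which is why a pair-by-pair description quickly becomes impractical.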
Considering pairwise matches down to the 9-molecule level yields only a few new links between structures in addition to the groups identified in Table 2. An interesting methodological feature emerges, however. Structure 48 shows 13-molecule similarity with group 3, while 64 matches the same group at the apparently less similar 8-molecule level. Visual inspection shows that both matches actually correspond to the same structural feature, which is a 2-D hydrogen-bonded layer (Fig. 6). For 64 versus 56, one clearly corresponding molecule within the layer just fails to match at the 30%/30° tolerance level (so the match essentially involves 9 molecules), but the remaining difference in the number of matched molecules is not due to tolerances. Rather, it is a consequence of the relative positions of the common SLFZ layers. In 48, they are well separated due to inclusion of 18-crown-6 and acetonitrile in the multi-component structure. As a result, 13 of the 15 SLFZ molecules in the initial cluster built for 48 belong to the common 2-D layer, and the different relative positions of the layers compared to group 3 is revealed by only 2 mismatched molecules (Fig. 7, top). For 64, the common SLFZ layers are in direct contact, and only 8 molecules in the initial cluster around the kernel molecule belong to the common 2-D layer. Now the difference between layers is revealed by 6 mismatched molecules in neighbouring layers (Fig. 7, bottom). This example highlights that it is not straightforward to interpret the substructure information generated by COMPACK, or even to state immediately that a greater number of matched molecules corresponds to a higher degree of structural similarity. Although it is possible to vary the size of the initial cluster in COMPACK, this type of discrepancy will remain in situations where common sub-structure motifs are arranged in significantly different ways. Multi-component structures will be more susceptible to such effects because the target molecules are likely to be dispersed more widely to accommodate the partner molecules.
Fig. 6 Projection onto the plane of the common 2-D layer of SLFZ molecules identified in the structures of group 3, 48 and 64 (structure 48 is shown). The dashed lines indicate N–H⋯O hydrogen bonds.
Amongst the other structures matched at the 9-molecule level or greater in COMPACK, a polytypic relationship is identified between polymorphs 3p, 4p and 5p (Table 1), whereby consistent 2-D layers lie in the (100) planes for 3p and 4p and in the (10−1) planes for 5p (Fig. 8). For 4p, the layers are stacked by translation along the a axis (AAAA stacking pattern), while in 5p, every second layer is mirrored perpendicular to the b axis (ABAB stacking pattern). Polymorph 3p shows an intermediate AABB pattern. The various pairwise matches between 3p/4p/5p in COMPACK range between 9 and 13 matched molecules. This is clearly helpful to draw attention to the similarity between the structures, but manual inspection is still required to extract details of the polytypism.
Fig. 8 Polytypic relationship between the structures of 3p (red), 4p (blue) and 5p (green). The structures share common 2-D layers (horizontal), but have different stacking sequences.
Across the whole structure set, common 2-D SCs are generally restricted to groups of only two or three structures and there are no 2-D SCs that are observed especially frequently. One example links groups 8 and 10, which contain a common 2-D SC comprising SLFZ molecules linked by N–H⋯O hydrogen bonds into polar layers (Fig. 9). In 25/47, the SLFZ molecules in neighbouring layers are linked by N–H⋯N hydrogen bonds forming an inversion-symmetric R22(8) motif (discussed further below). The structure adopted by 39/40 is more complex, showing alternating polar and non-polar layers (Fig. 9). An R22(8) motif is again found between neighbouring layers, but with C2 symmetry rather than inversion symmetry.
Transferable 1-D SCs are more common within the set. For example, the arrangement along the a axis of 4p is built from N–H⋯O hydrogen bonds between SLFZ molecules related by translation (Fig. 10). XPac identifies this 1-D SC in 14 co-crystals, 7 salts and one other structure, totalling ca. one quarter of the multi-component crystals. An identical arrangement of hydrogen-bonded aminobenzene rings is seen along the a axis of the group 1 structures, plus two other co-crystals (20, 37) and two salts (70, 84), again totalling ca. one quarter of the multi-component set. Hence, in total, almost one half of the multi-component crystals adopt this hydrogen-bonding arrangement. The two 1-D SCs are geometrically different because the N–H⋯O hydrogen bonds are formed either by H10 (in 4p; Fig. 10(a)) or H11 (in group 1; Fig. 10(b)), so that the direction of the translation relative to the SLFZ molecule is different. A closer look at some of the structures reveals the possibility for a subtle change in hydrogen bonding within these 1-D SCs. For example, the N–H10 bond in 2 points clearly at N12 rather than O11 (Fig. 11). In some of the salts (75, 83, 86), the amino group is protonated, and the NH3+ group clearly interacts with both O11 and N12. It is perhaps to be expected that these transferable SCs should be built from hydrogen bonds, but the structure set also contains other 1-D SCs that are not based on hydrogen bonding, e.g. see Fig. 12.
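The fractions quoted for the two translation-based 1-D SCs can be reproduced from the counts given above. The sketch assumes, as the text implies by summing them, that the two sets of structures do not overlap; the size of group 1 (18 structures) is taken from Table 2, and the variable names are arbitrary.

```python
n_multicomponent = 91

# 1-D SC as in 4p (hydrogen bond via H10): 14 co-crystals, 7 salts, 1 other structure.
sc_4p_type = 14 + 7 + 1

# 1-D SC as in group 1 (hydrogen bond via H11): 18 group 1 structures,
# two further co-crystals (20, 37) and two salts (70, 84).
sc_group1_type = 18 + 2 + 2

fraction_4p_type = sc_4p_type / n_multicomponent
fraction_group1_type = sc_group1_type / n_multicomponent
fraction_either = (sc_4p_type + sc_group1_type) / n_multicomponent
```

Each SC accounts for about 24% of the multi-component structures, and the two together for about 48%, consistent with "ca. one quarter" and "almost one half".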
Fig. 10 1-D SC built from N–H⋯O hydrogen bonds between SLFZ molecules related by translation: (a) along the a axis in 4p; (b) along the a axis in group 1 (8 is shown).
Fig. 12 Thiazole–thiazole and C–H⋯O dimer interactions produce a 1-D supramolecular construct common to the structures of 2p, 3p, 4p and 5p (5p is shown).
To summarise the extensive XPac output, a Hasse diagram might typically be constructed, showing the relationships between SCs identified in all structures.35,50,51 A complete diagram for the SLFZ set would be extraordinarily complex, however, and the largely manual task of constructing it is forbidding. Details of the XPac comparison between the polymorphs and multi-component structures (5 vs. 91) are included in the ESI. Further description of the XPac output is deferred for a potential additional study.
| Motif | Found in polymorph | No. of structures | H-Bond? | PIXEL interaction energy (kJ mol−1) |
|---|---|---|---|---|
| A | 1p | 42 | Y | −147.3 to −136.7 |
| B | 3p, 4p | 25 | Y | −33.2 to −30.1 |
| C | 2p, 3p, 4p | 17 | N | −50.4 to −35.9 |
| D | 2p | 15 | Y | −39.6 to −39.5 |
| E | 3p, 4p, 5p | 11 | N | −31.9 to −30.1 |
| F | 3p, 5p | 11 | N | +19.5 |
| G | 1p | 9 | N | +15.7 |
| H | 3p, 5p | 8 | N | −22.4 to −22.2 |
| I | 2p, 3p, 4p, 5p | 8 | N | −48.7 to −36.7 |
The pairwise motif seen most frequently is the centrosymmetric R22(8) dimer formed by a complementary pair of N–H⋯N hydrogen bonds. This was also highlighted in the study by Seaton et al.36 The PIXEL calculations confirm that this is by far the most stabilising pairwise interaction in any of the polymorphs, and it occurs in roughly half of the multi-component structures, including the large isostructural group 1. An alternative C2-symmetric motif with the same R22(8) hydrogen-bonding pattern has been mentioned earlier (Fig. 9). Since the R22(8) motif requires N13 to be protonated, it is seen only in the co-crystals, and in total ca. 80% of the co-crystals contain either the centrosymmetric or C2-symmetric R22(8) motif. In the 12 structures where the R22(8) motif is not seen, all but one make an N–H⋯N/O hydrogen bond to the solvent molecule. The sole exception is 10, where N13 makes an N–H⋯O interaction to another SLFZ molecule. Hence, the co-crystals are dominated by the R22(8) motif, but the probability of its formation is reduced where the co-former molecule is able to accept an N–H⋯N/O hydrogen bond.
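The proportions quoted for the R22(8) motif follow from the counts given in the text and in the motif table. The sketch below uses 59 co-crystals (from the abstract), 12 co-crystals lacking the motif, and 42 multi-component structures containing the centrosymmetric motif A; the variable names are arbitrary.

```python
n_cocrystals = 59
n_without_r22_8 = 12       # co-crystals with neither R22(8) variant
n_multicomponent = 91
n_motif_a = 42             # structures containing the centrosymmetric dimer (motif A)

# Co-crystals containing either the centrosymmetric or the C2-symmetric motif.
n_with_r22_8 = n_cocrystals - n_without_r22_8
fraction_cocrystals = n_with_r22_8 / n_cocrystals
fraction_multicomponent = n_motif_a / n_multicomponent
```

This gives 47 of 59 co-crystals (about 80%) with an R22(8) motif, and motif A in about 46% of the multi-component structures.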
In most cases, the pairwise interaction energies (assessed only for the polymorphs) are consistent for a given pair found in different structures, but some instances were identified where a subtle change in geometry has quite a significant effect on the resulting interaction energy. For example, the structures of 2p, 3p, 4p and 5p contain a common 1-D motif along one lattice direction, comprising two alternating SLFZ-SLFZ pairwise interactions (Fig. 12): (i) a centrosymmetric “closed” dimer involving face-to-face contact between thiazole rings, and (ii) a centrosymmetric pair involving C–H⋯O interactions between aminobenzene rings. The geometries and interaction energies are consistent in 3p, 4p and 5p, but 2p shows a subtle geometrical distortion that affects both interactions. For the thiazole–thiazole pair, a greater degree of face-to-face overlap in 2p gives a larger repulsion term in the PIXEL energy and changes the total interaction energy from ca. −46 to −38 kJ mol−1. For the C–H⋯O interaction, the change in geometry in 2p is visually more subtle, but the centroid–centroid distance decreases by ca. 0.2 Å and the total interaction energy changes from ca. −48 to −37 kJ mol−1. This example illustrates that the premise of transferable pairwise motifs, each with a well-defined interaction energy, must be viewed flexibly.
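One way to probe the relationship between motif frequency and interaction energy in the motif table above is a rank correlation. The sketch below is not part of the original analysis: it takes the midpoint of each quoted PIXEL energy range as a single representative value (an assumption) and computes a Spearman coefficient with average ranks for ties. The function names are hypothetical.

```python
def average_ranks(values):
    """1-based ascending ranks, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1  # mean of ranks pos+1 .. end+1
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# motif: (no. of structures, lower energy, upper energy) in kJ/mol, from the table.
MOTIFS = {
    "A": (42, -147.3, -136.7), "B": (25, -33.2, -30.1), "C": (17, -50.4, -35.9),
    "D": (15, -39.6, -39.5), "E": (11, -31.9, -30.1), "F": (11, 19.5, 19.5),
    "G": (9, 15.7, 15.7), "H": (8, -22.4, -22.2), "I": (8, -48.7, -36.7),
}
counts = [v[0] for v in MOTIFS.values()]
midpoints = [(v[1] + v[2]) / 2 for v in MOTIFS.values()]  # assumed representative energy
rho = spearman(counts, midpoints)
```

With only nine motifs, and with motif A dominating both frequency and energy, any such coefficient should be read cautiously. It also covers only motifs that are transferred, so it says nothing about stabilising interactions in the polymorphs that do not appear in the multi-component structures.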
For the SLFZ set, some confident conclusions can be drawn. For example, 3-D isostructural groups amongst the multi-component structures are robustly established. Some transferable supramolecular constructs have also been shown, although a comprehensive overview for the whole structure set is still to be addressed. Common pairwise motifs are identified in the polymorphs and multi-component structures, some of which are based on conventional hydrogen bonding, and some of which are not. Although PIXEL calculations confirm that frequently occurring pairwise motifs are generally quite strongly stabilising, some less stabilising and even destabilising pairs are transferred, and there are numerous more stabilising interactions in the polymorphs that are not seen in the multi-component structures. Hence, there is no straightforward correlation between interaction energy and transferability of a given pairwise motif between the polymorphs and multi-component structures.
There is undoubtedly a great deal more knowledge to be extracted from the SLFZ structure set. A planned subsequent paper will augment this geometrical study with a complementary topological analysis of hydrogen bonding. Many more questions might be considered. For example, can the structure set reveal why SLFZ should be so prolific in forming multi-component crystal forms? Is it a quantifiable function of its shape and/or propensity to form H-bonded networks, or is it simply proportional to the time that has been spent looking? What can be learned about the likelihood of SLFZ forming a multi-component crystal with a given solvent/partner molecule? Are 91 known multi-component structures sufficient to make meaningful conclusions, or do we need more? These types of questions are directly relevant to the practical task of “de-risking” the solid-form selection process, for example in pharmaceutical production. It is hoped that the SLFZ set will be valuable in this and other similar contexts.
Footnote
† Electronic supplementary information (ESI) available: Set of standardised, energy-minimised structures in CIF format; list of the 96 crystal structures, including chemical identity, molecular diagrams and unit-cell parameters; refinement details for the 52 new crystal structures (CCDC 2120768–2120819); details of the DFT-D calculations; summaries of program output from CrystalCMP, COMPACK and XPac; PIXEL energies for the polymorphs; pairwise motifs identified in the polymorphs and multi-component structures. For ESI and crystallographic data in CIF or other electronic format see DOI: 10.1039/d1ce01516h
This journal is © The Royal Society of Chemistry 2022