Organic solvates in the Cambridge Structural Database
Abstract
Data informatics approaches were applied to the Cambridge Structural Database (CSD) in an effort to discern fundamental trends related to the preparation, occurrence, and general properties of organic solvates. First, the 50 most abundant solvate classes in the CSD were identified through SMILES string matching implemented with the CSD Python API, and their relative occurrence rates were compared against data reported 20 years earlier. The two data sets suggest that solvate preparation methods have become less diverse over that period, with an increasing fraction of solvates derived from a smaller subset of solvents, although the relative abundance of hetero-solvates (those containing more than one type of solvent molecule) increased over the same period. Subsequent SMILES string matching identified ∼2700 pairs of solvate and solvent-free structures from the top 10 solvate classes. Data from the two related groups showed statistically significant differences in both lattice symmetry and packing fraction. Solvates exhibited an inherent bias toward triclinic lattice symmetry, which is likely related to the larger number of unique molecular components in the asymmetric unit. More surprisingly, solvates that do not exhibit disorder had statistically lower packing fractions than their solvent-free analogues. While solvate formation may in fact be a means of achieving phases with higher packing efficiency for some organic molecules, the data indicate that this is not a general trend.
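As an illustration of the kind of SMILES string matching described above, the following is a minimal sketch in plain Python and not the authors' implementation. It assumes each database entry is available as a multi-component SMILES string with components separated by `.`, and that solvent components appear in exactly the form listed in the lookup table. The table `SOLVENT_SMILES` and the functions `classify` and `solvate_class_counts` are hypothetical names introduced here; the sketch does not call the CSD Python API.

```python
from collections import Counter

# Hypothetical lookup table (an assumption for illustration): SMILES strings
# for a few common solvents. Real database entries may use a different
# canonical form, so exact string matching is a simplification.
SOLVENT_SMILES = {
    "CO": "methanol",
    "CCO": "ethanol",
    "ClCCl": "dichloromethane",
    "CC#N": "acetonitrile",
    "CC(C)=O": "acetone",
}


def classify(smiles):
    """Return a sorted tuple of solvent names for one entry.

    An entry counts as a solvate only if it contains at least one
    component that is not in the solvent table (the host molecule).
    An empty tuple means the entry is not treated as a solvate.
    """
    components = smiles.split(".")
    solvents = {SOLVENT_SMILES[c] for c in components if c in SOLVENT_SMILES}
    has_host = any(c not in SOLVENT_SMILES for c in components)
    if not has_host:
        return ()
    return tuple(sorted(solvents))


def solvate_class_counts(entries):
    """Count entries per solvate class; classes with more than one
    solvent name correspond to hetero-solvates."""
    counts = Counter()
    for smiles in entries:
        solvents = classify(smiles)
        if solvents:
            counts[solvents] += 1
    return counts
```

Ranking the resulting counts (for example with `Counter.most_common`) would give the most abundant solvate classes under these assumptions; keys with more than one solvent name identify hetero-solvates.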
- This article is part of the themed collections: Introducing the CrystEngComm Advisory Board and their research and Database Analysis