Data mining the Cambridge Structural Database for hydrate–anhydrate pairs with SMILES strings

Abstract

Many organic molecules can crystallize in either hydrated or anhydrous forms. Predicting whether a hydrate will form, and how stable it is relative to water-free alternative phases, remains a significant challenge. Here we use the Cambridge Structural Database (CSD) and data informatics to identify and analyze hydrate–anhydrate structure pairs. A search method was developed based on matching Simplified Molecular-Input Line-Entry System (SMILES) strings and implemented through the CSD Python Application Programming Interface (API). Of the >23 000 molecular hydrates containing no metal ions, ∼1400 were found to have at least one corresponding anhydrous form, yielding just over 2000 unique pairs in the CSD. Hydrates with and without a reported anhydrate showed a similar distribution in their water stoichiometries. Lattice symmetry and packing fraction comparisons are reported for the paired hydrates and anhydrates. Structure pairs with one organic component and those with multiple organic components showed some subtle differences. The details and limitations of the method are outlined to encourage and guide other types of CSD searches using SMILES.
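
The SMILES-matching idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation and does not call the CSD Python API; it assumes that each entry's SMILES string has already been exported and canonicalized consistently, that water appears as the bare component `O`, and that component ratios can be ignored when matching. The refcodes, function names, and example entries are hypothetical.

```python
from collections import defaultdict

WATER = "O"  # assumption: water is written as a lone "O" component


def split_components(smiles):
    """Split a multi-component SMILES string on '.' into its parts."""
    return [part for part in smiles.split(".") if part]


def is_hydrate(smiles):
    """An entry counts as a hydrate if any component is water."""
    return WATER in split_components(smiles)


def anhydrous_key(smiles):
    """Order-independent set of non-water components (ratios ignored)."""
    return frozenset(p for p in split_components(smiles) if p != WATER)


def find_pairs(entries):
    """Pair each hydrate with every anhydrate sharing its non-water components.

    entries: dict mapping a (hypothetical) refcode to a SMILES string.
    Returns a sorted list of (hydrate_refcode, anhydrate_refcode) tuples.
    """
    anhydrates = defaultdict(list)
    for refcode, smiles in entries.items():
        if not is_hydrate(smiles):
            anhydrates[anhydrous_key(smiles)].append(refcode)

    pairs = []
    for refcode, smiles in entries.items():
        if is_hydrate(smiles):
            key = anhydrous_key(smiles)
            if key:  # skip entries that contain only water
                for partner in anhydrates.get(key, []):
                    pairs.append((refcode, partner))
    return sorted(pairs)


# Hypothetical entries: acetic acid monohydrate and its anhydrate,
# a pyridine dihydrate with no anhydrate, and anhydrous ethanol.
example_entries = {
    "AAA01": "CC(=O)O.O",
    "AAA02": "CC(=O)O",
    "BBB01": "c1ccncc1.O.O",
    "CCC01": "CCO",
}
example_pairs = find_pairs(example_entries)
```

Plain string comparison only works when both strings come from the same canonicalization scheme; alternative notations for water (for example `[OH2]`) or for the same molecule would need to be normalized first.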

Article information


Submitted: 24 Feb 2020
Accepted: 24 Mar 2020
First published: 24 Mar 2020

CrystEngComm, 2020, Advance Article
Article type: Paper

J. E. Werner and J. A. Swift, CrystEngComm, 2020, Advance Article, DOI: 10.1039/D0CE00273A
