Data mining the Cambridge Structural Database for hydrate–anhydrate pairs with SMILES strings

Jen E. Werner; Jennifer A. Swift

doi:10.1039/D0CE00273A

Affiliation (both authors): Georgetown University, Department of Chemistry, Washington, DC 20057-1227, USA
Corresponding author: Jennifer A. Swift; E-mail: jas2@georgetown.edu

Abstract

Many organic molecules can crystallize in either hydrated or anhydrous forms. Predicting hydrate formation and predicting hydrate stability relative to water-free alternative phases are both significant challenges. Here we use the Cambridge Structural Database (CSD) and data informatics to identify and analyze hydrate–anhydrate structure pairs. A search method based on matching Simplified Molecular-Input Line-Entry System (SMILES) strings was developed and implemented through the CSD Python Application Programming Interface. Of the >23 000 molecular hydrates containing no metal ions, ∼1400 were found to have at least one corresponding anhydrous form, yielding just over 2000 unique pairs in the CSD. Hydrates with and without a reported anhydrate showed similar distributions of water stoichiometry. Lattice symmetry and packing fraction comparisons are reported for the paired hydrates and anhydrates. Structure pairs containing a single organic component showed some subtle differences from those containing multiple organic components. The details and limitations of the method are outlined to encourage and guide other types of SMILES-based CSD searches.
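The SMILES-matching idea can be illustrated with a minimal pure-Python sketch. It is not the authors' code. It assumes each entry's SMILES string has already been retrieved (in the paper this is done through the CSD Python API, which is not used here), that both members of a pair were canonicalized the same way so identical molecules give identical strings, and that water appears as a bare "O" component in the dot-separated string. The helper names (`organic_key`, `find_pairs`, `water_ratio`) and the entry identifiers are hypothetical, and the sketch omits the metal-ion filter and any other checks the authors applied.

```python
from collections import defaultdict
from fractions import Fraction

# Assumption: water is written as a bare "O" component in a dot-separated SMILES
# string (other encodings such as "[OH2]" are not handled in this sketch).
WATER = "O"


def split_components(smiles: str) -> list[str]:
    """Split a multi-component SMILES string on '.' into its components."""
    return [c for c in smiles.split(".") if c]


def organic_key(smiles: str) -> tuple[str, ...]:
    """Sorted tuple of the distinct non-water components, used as the match key.

    Component counts are ignored, so a hemihydrate and a monohydrate of the
    same molecule share a key (a simplifying choice made for this sketch).
    """
    return tuple(sorted({c for c in split_components(smiles) if c != WATER}))


def is_hydrate(smiles: str) -> bool:
    """True if the entry contains water plus at least one non-water component."""
    comps = split_components(smiles)
    return WATER in comps and any(c != WATER for c in comps)


def find_pairs(entries: dict[str, str]) -> list[tuple[str, str]]:
    """Return sorted (hydrate_id, anhydrate_id) pairs whose organic keys match."""
    anhydrates: dict[tuple[str, ...], list[str]] = defaultdict(list)
    for ident, smi in entries.items():
        if WATER not in split_components(smi):
            anhydrates[organic_key(smi)].append(ident)

    pairs = []
    for ident, smi in entries.items():
        if is_hydrate(smi):
            # .get avoids inserting empty lists into the defaultdict
            for partner in anhydrates.get(organic_key(smi), []):
                pairs.append((ident, partner))
    return sorted(pairs)


def water_ratio(smiles: str) -> Fraction:
    """Water molecules per non-water molecule in a hydrate's SMILES string."""
    comps = split_components(smiles)
    n_water = comps.count(WATER)
    return Fraction(n_water, len(comps) - n_water)


# Illustrative SMILES with made-up identifiers (not real CSD refcodes).
entries = {
    "HYP1": "NC(=O)c1cccnc1.O",
    "HYP2": "NC(=O)c1cccnc1",
    "HYP3": "NC(=O)c1cccnc1.NC(=O)c1cccnc1.O",
    "HYP4": "OC(=O)c1ccccc1.O.O",
}

for hyd, anh in find_pairs(entries):
    print(hyd, anh, water_ratio(entries[hyd]))
```

Matching on exact strings is what keeps such a search fast, but it is also why the approach depends on consistent canonical SMILES. Ignoring component counts, as done here, lets a hemihydrate and a monohydrate pair with the same anhydrate; a real search would need to make that choice explicitly.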

This article is part of the themed collections: Introducing the CrystEngComm Advisory Board and their research; Database Analysis; and The Cambridge Structural Database - A wealth of knowledge gained from a million structures.

Article information

DOI: https://doi.org/10.1039/D0CE00273A
Article type: Paper
Submitted: 24 Feb 2020
Accepted: 24 Mar 2020
First published: 24 Mar 2020

Citation: CrystEngComm, 2020, 22, 7290–7297
