Making the InChI FAIR and sustainable while moving to inorganics

Gerd Blanke*^a, Jan Brammer^b, Djordje Baljozovic^b, Nauman Ullah Khan^b, Frank Lange^b, Felix Bänsch^c, Clare A. Tovee^d, Ulrich Schatzschneider^e, Richard M. Hartshorn^f and Sonja Herres-Pawlis*^b

doi:10.1039/D4FD00145A

Author affiliations

* Corresponding authors

^a StructurePendium GmbH, Essen, Germany
E-mail: gerd.blanke@structurependium.com

^b Institut für Anorganische Chemie, Landoltweg 1a, 52074 Aachen, Germany
E-mail: sonja.herres-pawlis@ac.rwth-aachen.de

^c Beilstein-Institut zur Förderung der Chemischen Wissenschaften, Trakehner Straße 7-9, 60487 Frankfurt am Main, Germany

^d Cambridge Crystallographic Data Centre, 12 Union Road, Cambridge, UK

^e Institut für Anorganische Chemie, Julius-Maximilians-Universität Würzburg, Am Hubland, 97074 Würzburg, Germany

^f School of Physical and Chemical Sciences, University of Canterbury, Christchurch, New Zealand

Abstract

The InChI (International Chemical Identifier) standard is a cornerstone of chemical informatics, enabling the structure-based identification of compounds and the exchange of chemical information about them across platforms and databases. As a unique, canonical line notation, the InChI has made chemical structures searchable on the internet at a broad scale; the largest repositories working with InChIs contain more than 1 billion structures. Central to the functionality of the InChI is its codebase, which orchestrates a series of intricate steps to generate unique identifiers for chemical compounds. Until now, these steps were sparsely documented, and the InChI algorithm had to be treated as a black box. For the new v1.07 release, the code has been analyzed and its major steps documented, and more than 3000 bugs and security issues, as well as nearly 60 Google OSS-Fuzz issues, have been fixed. New test systems allow users to test code developments directly. The move to GitHub has not only made development more transparent but will also enable external contributors to join the further development of the InChI code. The motivation for this modernisation was the urgent need for the InChI to treat molecular inorganic compounds in a meaningful way. To date, no classic string representation meets this need of molecular inorganic chemistry. Currently, bonds to metal centers are disconnected by definition, which renders most inorganic InChIs meaningless. Herein, we propose new routines to remedy this problem in the representation of molecular inorganic compounds by the InChI.
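To illustrate why disconnecting bonds to metal centers strips an inorganic identifier of its meaning, the Python sketch below models a molecule as a graph of heavy atoms and removes every bond that touches a metal. It is not the InChI algorithm: the metal set, the blanket rule "break every bond to a metal", and the helper names `disconnect_metals` and `components` are hypothetical simplifications chosen for illustration only. Applied to a copper center bearing two ammine ligands (hydrogens omitted), one connected species falls apart into three unrelated fragments, so the information about which ligands are bound to the metal is lost.

```python
from collections import defaultdict

# Hypothetical, simplified metal set for illustration only; the real InChI
# rules for which atoms count as metals and which bonds are broken differ.
METALS = {"Fe", "Cu", "Zn", "Co", "Ni", "Pt", "Pd"}


def disconnect_metals(atoms, bonds):
    """Drop every bond that touches a metal atom (simplifying assumption)."""
    return [(i, j) for i, j in bonds
            if atoms[i] not in METALS and atoms[j] not in METALS]


def components(n_atoms, bonds):
    """Return the connected components as sorted lists of atom indices."""
    adj = defaultdict(list)
    for i, j in bonds:
        adj[i].append(j)
        adj[j].append(i)
    seen, comps = set(), []
    for start in range(n_atoms):
        if start in seen:
            continue
        # Iterative depth-first search from each unvisited atom.
        stack, comp = [start], []
        seen.add(start)
        while stack:
            a = stack.pop()
            comp.append(a)
            for b in adj[a]:
                if b not in seen:
                    seen.add(b)
                    stack.append(b)
        comps.append(sorted(comp))
    return comps


atoms = ["Cu", "N", "N"]   # copper center with two ammine N atoms (H omitted)
bonds = [(0, 1), (0, 2)]   # Cu-N coordination bonds
print(components(len(atoms), bonds))                            # [[0, 1, 2]]
print(components(len(atoms), disconnect_metals(atoms, bonds)))  # [[0], [1], [2]]
```

Before disconnection the complex is a single component; afterwards each atom stands alone, which is the loss of connectivity the routines proposed here aim to avoid.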

This article is part of the themed collection: Data-driven discovery in the chemical sciences

Faraday Discussions
