Issue 9, 2024

Connectivity stepwise derivation (CSD) method: a generic chemical structure information extraction method for the full step matrix

Abstract

Emerging advanced exploration modalities such as property prediction, molecular recognition, and molecular design are advancing the fields of chemistry, pharmaceuticals, and materials. The first requirement for these tasks is a way to describe (encode) the molecular structure for the computer, i.e., to translate what the human eye sees into a machine-readable form. In this work, a chemical structure information extraction method termed connectivity stepwise derivation (CSD), which generates the full step matrix (MSF), is described in detail. The CSD method consists of four stages: structure information extraction, atomic connectivity relationship extraction, adjacency matrix generation, and MSF generation. To test the speed of MSF generation, over 54 000 molecules were collected, covering organic molecules, polymers, and MOF structures. The tests show that as the number of atoms in a molecule increases from 100 to 1000, the advantage of the CSD method over the classical Floyd–Warshall algorithm grows, with the speedup rising from 28.34 to 289.95 times in the Python environment and from 2.86 to 25.49 times in the C++ environment. This detailed account of chemical structure information extraction is expected to offer new inspiration to data scientists in chemistry, pharmaceuticals, and materials, and to facilitate the development of property modeling and molecular generation methods.
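To make the adjacency-matrix and step-matrix stages concrete, the following is a minimal Python sketch. It is not the authors' CSD implementation, whose details are not given in the abstract. It assumes that the full step matrix stores the minimum number of bonds between every pair of atoms, which the comparison with Floyd–Warshall suggests. The function names (`adjacency_matrix`, `step_matrix_floyd_warshall`, `step_matrix_bfs`) and the toy bond list are hypothetical. The breadth-first variant is a swapped-in, well-known alternative that is shown only to illustrate why a connectivity-driven traversal can outpace the cubic-time baseline on sparse molecular graphs.

```python
from collections import deque


def adjacency_matrix(n_atoms, bonds):
    """Build a symmetric 0/1 adjacency matrix from a list of bonded atom index pairs."""
    adj = [[0] * n_atoms for _ in range(n_atoms)]
    for i, j in bonds:
        adj[i][j] = adj[j][i] = 1
    return adj


def step_matrix_floyd_warshall(adj):
    """Classical O(n^3) baseline: all-pairs shortest path lengths in bonds."""
    n = len(adj)
    inf = float("inf")
    dist = [
        [0 if i == j else (1 if adj[i][j] else inf) for j in range(n)]
        for i in range(n)
    ]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
    return dist


def step_matrix_bfs(adj):
    """Assumed illustration (not the published CSD): one breadth-first search per atom.

    Each search walks outward one bond at a time, so the cost scales with the
    number of bonds rather than with n^2 per source atom.
    """
    n = len(adj)
    inf = float("inf")
    neighbors = [[j for j in range(n) if adj[i][j]] for i in range(n)]
    dist = [[inf] * n for _ in range(n)]
    for src in range(n):
        dist[src][src] = 0
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in neighbors[u]:
                if dist[src][v] == inf:  # first visit gives the shortest step count
                    dist[src][v] = dist[src][u] + 1
                    queue.append(v)
    return dist


# Hypothetical toy input: the three carbon atoms of propane, C0-C1-C2.
propane_bonds = [(0, 1), (1, 2)]
propane_adj = adjacency_matrix(3, propane_bonds)
propane_steps = step_matrix_bfs(propane_adj)
```

Both functions return the same matrix for a connected molecule; disconnected atom pairs are left as infinity in this sketch, which is a design choice of the example rather than something stated in the source.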



Article information

Article type: Paper
Submitted: 06 May 2024
Accepted: 05 Aug 2024
First published: 08 Aug 2024
This article is Open Access
Creative Commons BY-NC license

Digital Discovery, 2024, 3, 1842-1851


J. Xiong, X. Feng, J. Xue, Y. Wang, H. Niu, Y. Gu, Q. Jia, Q. Wang and F. Yan, Digital Discovery, 2024, 3, 1842, DOI: 10.1039/D4DD00125G

This article is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported Licence. You can use material from this article in other publications, without requesting further permission from the RSC, provided that the correct acknowledgement is given and it is not used for commercial purposes.

