Melting point prediction of organic molecules by deciphering the chemical structure into a natural language

Weiming Mi; Huijun Chen; Donghua (Alan) Zhu; Tao Zhang; Feng Qian

doi:10.1039/D0CC07384A

Weiming Mi,‡^a Huijun Chen,‡^b Donghua (Alan) Zhu,*^c Tao Zhang*^a and Feng Qian*^b

Author affiliations

* Corresponding authors

^a Department of Automation, Tsinghua University, Beijing National Research Center for Information Science and Technology, Beijing 100084, P. R. China
E-mail: zhangtao@tsinghua.edu.cn

^b School of Pharmaceutical Sciences, Tsinghua University, Beijing 100084, P. R. China
E-mail: qianfeng@tsinghua.edu.cn

^c Pharmaceutical Product Development & Supply, Chemical Pharmaceutical Development & Supply, Janssen Research & Development, Johnson & Johnson, Shanghai 200233, P. R. China
E-mail: dzhu7@its.jnj.com

Abstract

Establishing quantitative structure–property relationships for the rational design of small-molecule drugs at the early discovery stage is highly desirable. Using natural language processing (NLP), we propose a machine learning model that processes the line notation of small organic molecules to predict their melting points. Prediction accuracy improves when the model is trained on a combination of two different canonicalized SMILES forms of the same molecules. In contrast to previous fragment-based or descriptor-based models, the prediction accuracy of this NLP-based model does not decrease with increasing size, complexity, and structural flexibility of the molecules. By representing the chemical structure as a natural language, this NLP-based model offers a potential tool for quantitative structure–property prediction in drug discovery and development.
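To make the idea of treating line notation as a language concrete, the following minimal Python sketch splits SMILES strings into tokens and maps them to integer ids, the usual first step before feeding text to an NLP model. It is an illustration only: the regular expression is a pattern commonly used for SMILES tokenization, and the helper names `tokenize`, `build_vocab`, and `encode` are hypothetical. None of this is taken from the article, which does not describe its tokenization or canonicalization procedure on this page.

```python
import re

# Assumption: a regex commonly used for SMILES tokenization, not the authors' scheme.
# Bracket atoms and two-letter halogens (Cl, Br) are kept as single tokens.
SMILES_TOKEN = re.compile(
    r"(\[[^\]]+\]|Br?|Cl?|N|O|S|P|F|I|b|c|n|o|s|p|\(|\)|\.|=|#|-|\+|\\|/|:|~|@|\?|>|\*|\$|%[0-9]{2}|[0-9])"
)


def tokenize(smiles):
    """Split a SMILES string into a list of tokens (the 'words' of the notation)."""
    tokens = SMILES_TOKEN.findall(smiles)
    # Reject strings containing characters the pattern does not cover.
    if "".join(tokens) != smiles:
        raise ValueError(f"Unrecognized characters in SMILES: {smiles!r}")
    return tokens


def build_vocab(smiles_list):
    """Map every token seen in the corpus to an integer id (0 is left free for padding)."""
    unique_tokens = sorted({tok for s in smiles_list for tok in tokenize(s)})
    return {tok: idx for idx, tok in enumerate(unique_tokens, start=1)}


def encode(smiles, vocab):
    """Convert a SMILES string into a sequence of integer ids."""
    return [vocab[tok] for tok in tokenize(smiles)]


if __name__ == "__main__":
    # "CCO" and "OCC" are two valid SMILES strings for the same molecule (ethanol).
    corpus = ["CCO", "OCC"]
    vocab = build_vocab(corpus)
    for s in corpus:
        print(s, tokenize(s), encode(s, vocab))
```

The two ethanol strings yield different token sequences for one molecule, which is the kind of variation a model sees when it is trained on more than one SMILES form of the same compound.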

Article information

DOI: https://doi.org/10.1039/D0CC07384A
Article type: Communication
Submitted: 10 Nov 2020
Accepted: 11 Jan 2021
First published: 11 Jan 2021

Chem. Commun., 2021, 57, 2633-2636

W. Mi, H. Chen, D. (A.) Zhu, T. Zhang and F. Qian, Chem. Commun., 2021, 57, 2633, DOI: 10.1039/D0CC07384A

Chemical Communications
