Machine learning methods to predict the crystallization propensity of small organic molecules

Florbela Pereira

doi:10.1039/D0CE00070A


LAQV and REQUIMTE, Departamento de Química, Faculdade de Ciências e Tecnologia, Universidade Nova de Lisboa, Caparica, Portugal
E-mail: florbela.pereira@fct.unl.pt

Abstract

Machine learning (ML) algorithms were explored for predicting the crystallization propensity of small organic molecules from molecular descriptors and fingerprints generated from 2D chemical structures, and from 3D molecular descriptors computed from 3D structures optimized with empirical methods. In total, 57 815 molecules were retrieved from the Reaxys® database, of which 53 998 are recorded as crystalline (class A), 3097 as polymorphic (class B), and 720 as amorphous (class C). A training set of 40 462 organic molecules was used to build the models, which were validated on an external test set of 17 353 organic molecules. Several ML algorithms were screened, including random forest (RF), support vector machines (SVM), and deep learning multilayer perceptron networks (MLP). The best performance was achieved by a consensus classification model combining the RF, SVM, and MLP models, which predicted the external test set with an overall predictive accuracy (Q) of up to 80%.
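The abstract does not state how the RF, SVM, and MLP outputs are combined into the consensus model. The following is a minimal sketch, assuming a simple majority vote over the three predicted classes (A, B, C) and taking the overall predictive accuracy Q as the fraction of test molecules assigned to their recorded class. The function names `consensus_vote` and `overall_accuracy`, the tie-breaking rule (fall back to the first model's vote), and the toy predictions are hypothetical illustrations; they are not data from the paper and may differ from the paper's actual procedure.

```python
from collections import Counter
from typing import Sequence

# Class labels as used in the abstract:
# A = crystalline, B = polymorphic, C = amorphous.


def consensus_vote(*model_predictions: Sequence[str]) -> list[str]:
    """Hypothetical majority-vote consensus over several models.

    Each argument holds one model's predicted class per molecule.
    Assumption: if no class wins a strict majority, fall back to the
    first model's prediction (an illustrative tie-breaking rule).
    """
    consensus = []
    for votes in zip(*model_predictions):
        label, count = Counter(votes).most_common(1)[0]
        consensus.append(label if count > len(votes) / 2 else votes[0])
    return consensus


def overall_accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Q = correctly classified molecules / total molecules."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Toy predictions for five hypothetical molecules (not data from the paper).
rf = ["A", "A", "B", "C", "A"]
svm = ["A", "B", "B", "A", "C"]
mlp = ["A", "A", "C", "C", "B"]
truth = ["A", "A", "B", "A", "B"]

cons = consensus_vote(rf, svm, mlp)
print(cons, overall_accuracy(truth, cons))  # ['A', 'A', 'B', 'C', 'A'] 0.6
```

Majority (hard) voting is only one way to build a consensus; averaging class probabilities (soft voting) is a common alternative when all models expose probability estimates.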

This article is part of the themed collection: Editor’s Collection: Computer aided solid form design

Article information

https://doi.org/10.1039/D0CE00070A

Article type: Paper
Submitted: 17 Jan 2020
Accepted: 26 Mar 2020
First published: 26 Mar 2020
Citation: CrystEngComm, 2020, 22, 2817-2826

