Assessment of machine learning approaches for predicting the crystallization propensity of active pharmaceutical ingredients

Ayana Ghosh; Lydie Louis; Kapildev K. Arora; Bruno C. Hancock; Joseph F. Krzyzaniak; Paul Meenan; Serge Nakhmanson; Geoffrey P. F. Wood

doi:10.1039/C8CE01589A

Ayana Ghosh,^a Lydie Louis,^a Kapildev K. Arora,^b Bruno C. Hancock,^b Joseph F. Krzyzaniak,^b Paul Meenan,^b Serge Nakhmanson^a and Geoffrey P. F. Wood*^b

Author affiliations

* Corresponding authors

^a Department of Materials Science and Engineering and Institute of Materials Science, University of Connecticut, Storrs, CT 06269, USA

^b Pfizer Worldwide Research & Development, Pharmaceutical Sciences, Pfizer Inc., Groton, CT 06340, USA
E-mail: Geoffrey.Wood@pfizer.com

Abstract

In this report, three machine learning approaches were assessed for their ability to predict the crystallization propensities of a set of small organic compounds (<709 Da). The algorithms evaluated were random forest regression (RFR), support vector machine regression (SVMR) and neural networks (NN). In addition to the choice of algorithm, the influence of different molecular descriptors, the size of the training set, and various experimental factors on predictive ability was also considered. Specifically, the solvent used, the presence of impurities and/or degradants, the influence of potential seeded crystallizations and the implied supersaturation levels were explicitly investigated. For smaller training set sizes (e.g., ∼50), very little difference in the accuracy of the three algorithms was observed. However, beyond training set sizes of 150, the RFR algorithm typically outperformed the others by up to 20% RMSE. Additionally, as a result of the improved performance with larger training set sizes, the RFR models built with explicit treatment of the solvent typically outperformed models that considered only the active pharmaceutical ingredient (API). For example, the best-performing API-only model had an RMSE of 30%, whereas the API + solvent models had an RMSE of 20%. Beyond inclusion of the solvent, the presence of impurities and/or degradants had the greatest influence on model accuracy. When experiments affected by impurities and/or degradants were excluded, an additional improvement of up to 10% RMSE was observed in some cases.
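The accuracy figures above are root-mean-square errors (RMSE). As a minimal sketch of how such a figure is computed, the following assumes that crystallization propensity is expressed on a 0 to 100% scale; that scale is an assumption, since the abstract does not define it. All numbers are made up for illustration and are constructed so that every prediction is off by exactly 30 or 20 points. They are not data or results from the paper, and the variable names are hypothetical.

```python
import math


def rmse(observed, predicted):
    """Root-mean-square error between two equal-length sequences."""
    if len(observed) != len(predicted):
        raise ValueError("sequences must have the same length")
    if not observed:
        raise ValueError("sequences must not be empty")
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical observed propensities (percent) for five held-out experiments.
observed = [0.0, 20.0, 50.0, 80.0, 100.0]

# Hypothetical predictions from two illustrative models (not the paper's).
# Each prediction is deliberately off by a constant 30 or 20 points.
api_only = [30.0, 50.0, 20.0, 50.0, 70.0]
api_plus_solvent = [20.0, 0.0, 70.0, 60.0, 80.0]

rmse_api_only = rmse(observed, api_only)
rmse_api_plus_solvent = rmse(observed, api_plus_solvent)

print(f"API only:      RMSE = {rmse_api_only:.1f}%")
print(f"API + solvent: RMSE = {rmse_api_plus_solvent:.1f}%")
```

Because RMSE squares each deviation before averaging, a few badly mispredicted experiments raise it more than many small misses do.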

This article is part of the themed collection: Editor’s Collection: Computer aided solid form design

Article information

DOI: https://doi.org/10.1039/C8CE01589A
Article type: Paper
Submitted: 17 Sep 2018
Accepted: 19 Nov 2018
First published: 20 Nov 2018
Citation: CrystEngComm, 2019, 21, 1215-1223
