Assessment of machine learning approaches for predicting the crystallization propensity of active pharmaceutical ingredients

Abstract

In this report, three machine learning approaches were assessed for their ability to predict the crystallization propensities of a set of small organic compounds (<709 Da). The algorithms evaluated were random forest regression (RFR), support vector machine regression (SVMR) and neural networks (NN). In addition to the algorithms, the influence of different molecular descriptors, the size of the training sets used, and various experimental factors on the predictive ability of the methods was also considered. For example, the solvent used, the presence of impurities and/or degradants, the influence of potential seeded crystallizations and implied supersaturation levels were explicitly investigated. For smaller training set sizes (e.g., ∼50), very little difference in the accuracy of the three algorithms was observed. However, beyond training set sizes of 150, the RFR algorithm typically outperformed the others by up to 20% RMSE. Additionally, as a result of the improved performance with larger training set sizes, the RFR models built with explicit treatment of the solvent typically outperformed models that considered only the active pharmaceutical ingredient (API). For example, the best performing API-only model had an RMSE of 30%, whereas for the API + solvent models the RMSE was found to be 20%. Beyond inclusion of the solvent, the presence of impurities and/or degradants had the greatest influence on model accuracy. When experiments affected by impurities and/or degradants were excluded, an additional improvement of up to 10% RMSE was observed in some cases.
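
The abstract compares models by root-mean-square error (RMSE) quoted in percent. As a reading aid, the sketch below shows how such a figure is computed. It assumes the crystallization propensity is expressed on a 0 to 100% scale, which the RMSE units suggest but the abstract does not state. The function name `rmse` and all numerical values are hypothetical, chosen only so that the two results match the 30% and 20% figures quoted above. They are not data from the paper.

```python
import math


def rmse(observed, predicted):
    """Root-mean-square error between two equal-length sequences of numbers."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    if not observed:
        raise ValueError("at least one observation is required")
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical observed propensities in percent (assumed 0-100 scale).
observed = [0.0, 100.0, 50.0, 100.0]

# Hypothetical predictions from two illustrative models (not the paper's results).
api_only_predictions = [30.0, 70.0, 80.0, 70.0]
api_solvent_predictions = [20.0, 80.0, 70.0, 80.0]

rmse_api_only = rmse(observed, api_only_predictions)
rmse_api_solvent = rmse(observed, api_solvent_predictions)

# Difference between the two RMSE values, in percentage points.
improvement = rmse_api_only - rmse_api_solvent

print(f"API only RMSE:      {rmse_api_only:.1f}%")
print(f"API + solvent RMSE: {rmse_api_solvent:.1f}%")
print(f"Improvement:        {improvement:.1f} percentage points")
```

Under this reading, an improvement "of up to 10% RMSE" would be a difference in percentage points between two RMSE values, not a relative reduction. The abstract does not say which is meant.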

Publication details

The article was received on 17 Sep 2018, accepted on 19 Nov 2018 and first published on 20 Nov 2018


Article type: Paper
DOI: 10.1039/C8CE01589A
Citation: CrystEngComm, 2019, Advance Article
A. Ghosh, L. Louis, K. K. Arora, B. C. Hancock, J. F. Krzyzaniak, P. Meenan, S. Nakhmanson and G. P. F. Wood, CrystEngComm, 2019, Advance Article, DOI: 10.1039/C8CE01589A
