Will it crystallise? Predicting crystallinity of molecular materials
Predicting and controlling the crystallinity of molecular materials has applications in crystal engineering, as well as in process control and formulation in the pharmaceutical industry. Here, we present a machine learning approach to this problem that uses a large training set classified on a single measurable outcome: whether a substance has a reasonable probability of forming good-quality crystals. While the related problem of crystal structure prediction requires reliable calculation of three-dimensional molecular conformations, the method employed here for predicting crystallisation propensity uses only “two-dimensional” information consisting of atom types and connectivity. We show that an error rate lower than 10% can be achieved against unseen test data. The predictive model was also tested in a blind screen of a set of compounds that do not have crystal structures reported in the literature, where it achieved a classification accuracy of 79%. Analysis of the most significant descriptors used in the classification shows that the number of rotatable bonds and a molecular connectivity index are key in determining crystallisation propensity, and that these two measures alone can classify unseen test data with 80% accuracy.
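To illustrate how descriptors of this kind can be derived from atom connectivity alone, the following minimal sketch computes a rotatable bond count and a connectivity index from a hydrogen-suppressed molecular graph. It is not the implementation used in this work. The abstract does not name the connectivity index, so the first-order Randić index is used here as a stand-in, and the rotatable bond rule (an acyclic single bond between two non-terminal heavy atoms) is a simplified assumption. The function names and the `(atom_a, atom_b, bond_order)` input format are hypothetical.

```python
from collections import deque
from math import sqrt


def heavy_atom_degrees(n_atoms, bonds):
    """Number of heavy-atom neighbours of each atom."""
    degrees = [0] * n_atoms
    for a, b, _order in bonds:
        degrees[a] += 1
        degrees[b] += 1
    return degrees


def randic_index(n_atoms, bonds):
    """First-order Randic index: sum over bonds of 1/sqrt(deg_a * deg_b).

    Assumption: a stand-in for the unspecified connectivity index.
    """
    degrees = heavy_atom_degrees(n_atoms, bonds)
    return sum(1.0 / sqrt(degrees[a] * degrees[b]) for a, b, _order in bonds)


def bond_in_ring(bond_index, n_atoms, bonds):
    """A bond is in a ring if its ends stay connected once it is removed."""
    start, goal, _order = bonds[bond_index]
    neighbours = [[] for _ in range(n_atoms)]
    for i, (a, b, _order) in enumerate(bonds):
        if i != bond_index:
            neighbours[a].append(b)
            neighbours[b].append(a)
    seen = {start}
    queue = deque([start])
    while queue:  # breadth-first search from one end of the bond
        atom = queue.popleft()
        if atom == goal:
            return True
        for nxt in neighbours[atom]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


def rotatable_bonds(n_atoms, bonds):
    """Count acyclic single bonds between non-terminal heavy atoms.

    Assumption: a simplified rule; toolkit definitions add further exclusions.
    """
    degrees = heavy_atom_degrees(n_atoms, bonds)
    count = 0
    for i, (a, b, order) in enumerate(bonds):
        if order != 1 or degrees[a] < 2 or degrees[b] < 2:
            continue
        if not bond_in_ring(i, n_atoms, bonds):
            count += 1
    return count


# Hydrogen-suppressed graphs: (atom_a, atom_b, bond_order)
butane = [(0, 1, 1), (1, 2, 1), (2, 3, 1)]
cyclohexane = [(i, (i + 1) % 6, 1) for i in range(6)]

for name, n_atoms, bonds in [("n-butane", 4, butane), ("cyclohexane", 6, cyclohexane)]:
    print(name, rotatable_bonds(n_atoms, bonds), round(randic_index(n_atoms, bonds), 3))
```

Under these assumptions, n-butane has one rotatable bond and an index of about 1.914, while cyclohexane has no rotatable bonds and an index of 3.0. In a two-descriptor model, each molecule would be reduced to such a pair of numbers before being passed to a classifier.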