Predicting the aggregation number of cationic surfactants based on ANN-QSAR modeling approaches: understanding the impact of molecular descriptors on aggregation numbers†
In this work, a quantitative structure–activity relationship (QSAR) study is performed on a set of cationic surfactants to evaluate the relationship between the molecular structures of the compounds and their aggregation numbers (AGGNs) in aqueous solution at 25 °C. An artificial neural network (ANN) model is combined with the QSAR study to predict the aggregation numbers of the surfactants. In the ANN analysis, four descriptors selected from more than 3000 molecular descriptors were used as input variables, and the complete set of 41 cationic surfactants was randomly divided into a training set of 29, a test set of 6, and a validation set of 6 molecules. A multiple linear regression (MLR) analysis was then used to build a linear model with the same descriptors, and its results were compared statistically with those of the ANN analysis. For the whole data set, the square of the correlation coefficient (R²) and the root mean square error (RMSE) were 0.9392 and 7.84 for the ANN model and 0.5010 and 22.52 for the MLR model. The comparison showed that the ANN detects the correlation between the molecular structure of the surfactants and their AGGN values with high predictive power, which is attributed to the non-linearity of the studied data. Based on the ANN algorithm, the relative importance of the selected descriptors was computed and arranged in the following descending order: H-047 > ESpm12x > JGI6 > Mor20p. The QSAR results were then interpreted, and the impact of each descriptor on the AGGNs of the molecules was discussed in detail. The results showed that each selected descriptor correlates with the AGGN values of the surfactants.
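As an illustration of the data partitioning and the two statistics quoted above, the following minimal Python sketch randomly divides 41 sample indices into subsets of 29, 6, and 6, and defines R² as the square of the Pearson correlation coefficient together with the RMSE. The function names, the seed, and the toy observed and predicted values are assumptions made for this example; they are not taken from the study, whose actual software and random split are not specified here.

```python
import math
import random


def split_indices(n_total, n_train, n_test, n_valid, seed=0):
    """Randomly partition range(n_total) into three disjoint index lists."""
    if n_train + n_test + n_valid != n_total:
        raise ValueError("subset sizes must sum to n_total")
    indices = list(range(n_total))
    # A seeded generator keeps the (hypothetical) split reproducible.
    random.Random(seed).shuffle(indices)
    train = indices[:n_train]
    test = indices[n_train:n_train + n_test]
    valid = indices[n_train + n_test:]
    return train, test, valid


def rmse(observed, predicted):
    """Root mean square error between observed and predicted values."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def r_squared(observed, predicted):
    """Square of the Pearson correlation coefficient of the two series."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov ** 2 / (var_o * var_p)


# 41 surfactants divided into training, test, and validation subsets.
train, test, valid = split_indices(41, 29, 6, 6, seed=0)
print(len(train), len(test), len(valid))  # prints: 29 6 6

# Toy numbers for illustration only; these are NOT data from the study.
observed = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 39.0]
print(r_squared(observed, predicted), rmse(observed, predicted))
```

Applying the same two functions to the predictions of an ANN and of an MLR model over the whole data set would give directly comparable figures of the kind reported above.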