Classification study of solvation free energies of organic molecules using machine learning techniques

N. S. Hari Narayana Moorthy; Silvia A. Martins; Sergio F. Sousa; Maria J. Ramos; Pedro A. Fernandes

doi:10.1039/C4RA07961B

N. S. Hari Narayana Moorthy,*^a Silvia A. Martins,^a Sergio F. Sousa,^a Maria J. Ramos^a and Pedro A. Fernandes*^a

* Corresponding authors

^a REQUIMTE, Departamento de Química e Bioquímica, Faculdade de Ciências, Universidade do Porto, s/n, Rua do Campo Alegre, 4169-007 Porto, Portugal
E-mail: hari.moorthy@fc.up.pt, hari.nmoorthy@gmail.com, pafernan@fc.up.pt
Fax: +351-220-402-506
Tel: +351-220-402-506

Abstract

In this work, we developed a set of classification models that categorise organic molecules by their solvation free energies using three machine learning approaches (decision tree, random forest and support vector machine). The experimental solvation free energies of the molecules, taken from the literature, were split into highly favourable (<−3 kcal mol⁻¹) and less favourable (>−3 kcal mol⁻¹) classes, with −3 kcal mol⁻¹ set as the threshold value for classification model development. The MACCS fingerprint, together with a set of physicochemical descriptors such as atom count, topology, vdW surface area (volsurf) and subdivided surface area, contributed to the classification models. Validation using a test set and 10-fold cross-validation gave accuracy, sensitivity and specificity values above 90%. The sum of ranking differences (SRD) analysis reveals that the support vector machine models compare favourably with the other approaches, while the models containing MACCS fingerprints rank as good models in all approaches. The MACCS fingerprints indicate that the presence of halogen atoms leads to less favourable solvation free energies, whereas the presence of polar atoms/groups and certain functional groups, such as heteroatoms, double bonded branched aliphatic chains, C=N, N–C–C–O, NCO, >1 heterocyclic atoms and OCO, leads to highly favourable solvation free energies. The results of these investigations can be used alongside quantitative models to predict the solvation free energies of organic molecules and to design novel molecules with acceptable solvation free energies.
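The labelling rule and the three validation statistics named in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the function names, the choice of "highly favourable" as the positive class, the handling of a value exactly equal to −3 kcal mol⁻¹ (the abstract does not specify it) and the example numbers are all assumptions made purely for illustration.

```python
# Illustrative sketch only; names and example values are hypothetical.
THRESHOLD = -3.0  # kcal/mol, the class boundary stated in the abstract


def label(delta_g):
    """Return 1 (highly favourable) if delta_g < THRESHOLD, else 0.

    Assumption: a value exactly at the threshold is treated as
    less favourable, since the abstract leaves this case open.
    """
    return 1 if delta_g < THRESHOLD else 0


def classification_metrics(y_true, y_pred):
    """Compute accuracy, sensitivity and specificity from binary labels.

    Assumption: class 1 (highly favourable) is the positive class.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        # Fraction of truly highly favourable molecules that were found.
        "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
        # Fraction of truly less favourable molecules that were found.
        "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
    }


# Made-up solvation free energies (kcal/mol), not data from the paper.
example_energies = [-6.2, -4.1, -0.5, 1.9, -3.4, -2.0]
y_true = [label(dg) for dg in example_energies]
# Made-up predictions from some hypothetical classifier.
y_pred = [1, 1, 0, 0, 0, 0]
metrics = classification_metrics(y_true, y_pred)
```

In a real workflow the predictions would come from a trained classifier (for example a decision tree, random forest or support vector machine) evaluated on a held-out test set or across cross-validation folds, with the statistics computed per fold and then summarised.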

Article information

DOI: https://doi.org/10.1039/C4RA07961B
Article type: Paper
Submitted: 01 Aug 2014
Accepted: 03 Nov 2014
First published: 03 Nov 2014

RSC Adv., 2014, 4, 61624-61630

N. S. H. Narayana Moorthy, S. A. Martins, S. F. Sousa, M. J. Ramos and P. A. Fernandes, RSC Adv., 2014, 4, 61624 DOI: 10.1039/C4RA07961B
