
Issue 31, 2017, Issue in Progress

Predicting human intestinal absorption with modified random forest approach: a comprehensive evaluation of molecular representation, unbalanced data, and applicability domain issues


Abstract

As drug discovery grows more complex and risky, predicting human intestinal absorption (HIA) has become increasingly important. Several predictive models have been constructed to estimate the HIA of new drug-like compounds with acceptable accuracy, but some issues remain to be explored, including the limited and unbalanced HIA data, the performance of different types of descriptors, and the applicability domain of published models. To address these problems, we collected a relatively large dataset of 970 compounds and calculated 9 different types of descriptors for modeling. In all modeling processes, a parameter named samplesize in the random forest (RF) method was used to balance the dataset. Classification models were then established based on different training sets and different combinations of descriptors. By comparing the statistical results of these models, we explored the aforementioned problems, evaluated the reliability of existing HIA classification models, and obtained a robust and applicable model based on a combination of 2D, 3D, N+ and Nrule-of-five (for the training set, SE = 0.892, SP = 0.846; for the test set, SE = 0.877, SP = 0.813). Compared with other published models, our model offers advantages in data size, accuracy and practicability. This structure–activity relationship model is useful for HIA prediction and could serve as a convenient tool for virtual screening in the early stages of drug development.
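The balancing idea and the reported metrics can be illustrated with a minimal sketch. It assumes that samplesize acts as a per-class sample size (as the `sampsize` argument does in R's randomForest package), so that each tree is grown on a bootstrap sample containing the same number of compounds from each class, and that SE and SP denote sensitivity and specificity. The function names, toy labels, and per-class size below are hypothetical and are not taken from the paper.

```python
import random
from collections import defaultdict


def balanced_bootstrap(labels, per_class, rng):
    """Draw `per_class` indices with replacement from every class.

    This mimics a stratified, equal-size bootstrap sample for one tree;
    it is an illustration, not the authors' implementation.
    """
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    sample = []
    for label in sorted(by_class):
        # Sampling with replacement, so a small class can still supply
        # `per_class` draws.
        sample.extend(rng.choices(by_class[label], k=per_class))
    return sample


def sensitivity_specificity(y_true, y_pred, positive=1):
    """Return (SE, SP) = (TP / (TP + FN), TN / (TN + FP))."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    return tp / (tp + fn), tn / (tn + fp)


# Toy, made-up labels: 8 well-absorbed (1) vs 2 poorly absorbed (0) compounds.
toy_labels = [1] * 8 + [0] * 2
rng = random.Random(0)
toy_sample = balanced_bootstrap(toy_labels, per_class=2, rng=rng)
print("indices drawn for one tree:", toy_sample)

# Toy predictions, only to show how SE and SP are computed.
se, sp = sensitivity_specificity([1, 1, 1, 0, 0], [1, 1, 0, 0, 1])
print("SE =", se, "SP =", sp)
```

Drawing the same number of compounds from each class for every tree keeps the majority class from dominating individual trees, which is one common way to handle unbalanced data without discarding compounds from the dataset as a whole.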



Article information


Submitted: 20 Dec 2016
Accepted: 14 Mar 2017
First published: 29 Mar 2017

This article is Open Access

RSC Adv., 2017, 7, 19007-19018
Article type: Paper


N. Wang, C. Huang, J. Dong, Z. Yao, M. Zhu, Z. Deng, B. Lv, A. Lu, A. F. Chen and D. Cao, RSC Adv., 2017, 7, 19007
DOI: 10.1039/C6RA28442F

This article is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported Licence. Material from this article can be used in other publications provided that the correct acknowledgement is given with the reproduced material and it is not used for commercial purposes.

Reproduced material should be attributed as follows:

  • For reproduction of material from NJC:
    [Original citation] - Published by The Royal Society of Chemistry (RSC) on behalf of the Centre National de la Recherche Scientifique (CNRS) and the RSC.
  • For reproduction of material from PCCP:
    [Original citation] - Published by the PCCP Owner Societies.
  • For reproduction of material from PPS:
    [Original citation] - Published by The Royal Society of Chemistry (RSC) on behalf of the European Society for Photobiology, the European Photochemistry Association, and RSC.
  • For reproduction of material from all other RSC journals:
    [Original citation] - Published by The Royal Society of Chemistry.

Information about reproducing material from RSC articles with different licences is available on our Permission Requests page.

