Issue 1, 2021

iDHS-DASTS: identifying DNase I hypersensitive sites based on LASSO and stacking learning

Abstract

The DNase I hypersensitive site (DHS) is an important marker of DNA regulatory regions, and its identification in DNA sequences is of great significance for biomedical research. However, traditional identification methods are extremely time-consuming and cannot produce accurate results. In this paper, we propose a predictor called iDHS-DASTS to identify DHSs based on benchmark datasets. First, we adopt a feature extraction method called PseDNC, which can incorporate the original DNA properties and the spatial information of the DNA sequence. Then we use LASSO to reduce the dimensionality of the original data. Finally, we use stacking learning as the classifier, which includes AdaBoost, random forest, gradient boosting, extra trees and SVM. Before training the classifier, we use SMOTE-Tomek to overcome the imbalance of the datasets. In the experiments, iDHS-DASTS achieves remarkable performance on three benchmark datasets: state-of-the-art results with accuracies of over 92.06%, 91.06% and 90.72% for datasets 𝕊₁, 𝕊₂ and 𝕊₃, respectively. To verify the validity and transferability of our model, we establish another independent dataset, 𝕊₄, for which the accuracy reaches 90.31%. Furthermore, we used the proposed model to construct a user-friendly web server called iDHS-DASTS, which is available at http://www.xdu-duan.cn/.
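The pipeline described in the abstract (feature vectors, LASSO-based dimensionality reduction, then a stacked ensemble) can be sketched roughly as follows with scikit-learn. This is only an illustrative sketch under several assumptions, not the authors' implementation: the data are synthetic stand-ins for PseDNC feature vectors, the LASSO penalty `alpha=0.01` is arbitrary, and the arrangement of all five named models as base learners under a logistic-regression meta-learner is an assumption, since the abstract does not say how the stack is organised. The SMOTE-Tomek resampling step is omitted here; it would typically be applied to the training split only, for example with `SMOTETomek` from the imbalanced-learn package.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (AdaBoostClassifier, ExtraTreesClassifier,
                              GradientBoostingClassifier,
                              RandomForestClassifier, StackingClassifier)
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import Lasso, LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic, imbalanced stand-in for PseDNC feature vectors
# (assumption: NOT the benchmark datasets from the paper).
X, y = make_classification(n_samples=300, n_features=40, n_informative=8,
                           weights=[0.75, 0.25], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# The five models named in the abstract, used here as base learners
# (assumption: the abstract does not specify base vs. meta roles).
base_learners = [
    ("ada", AdaBoostClassifier(random_state=0)),
    ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
    ("gb", GradientBoostingClassifier(random_state=0)),
    ("et", ExtraTreesClassifier(n_estimators=100, random_state=0)),
    ("svm", SVC(random_state=0)),
]

model = make_pipeline(
    StandardScaler(),
    # LASSO step: keep only features with non-zero L1-penalised coefficients.
    SelectFromModel(Lasso(alpha=0.01)),
    # Stacking step: out-of-fold base predictions feed the meta-learner.
    StackingClassifier(estimators=base_learners,
                       final_estimator=LogisticRegression(),
                       cv=5),
)
model.fit(X_train, y_train)

accuracy = model.score(X_test, y_test)
n_selected = int(model.named_steps["selectfrommodel"].get_support().sum())
print(f"features kept by LASSO: {n_selected} of {X.shape[1]}")
print(f"held-out accuracy on synthetic data: {accuracy:.3f}")
```

Placing the selector and the classifier in one pipeline means the LASSO step is fitted on the training split only, which avoids leaking test information into feature selection. The accuracy printed here reflects the synthetic data and says nothing about the figures reported in the abstract.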

Article information

Article type: Research Article
Submitted: 31 Aug 2020
Accepted: 12 Nov 2020
First published: 12 Nov 2020

Mol. Omics, 2021, 17, 130-141

S. Zhang, Z. Duan, W. Yang, C. Qian and Y. You, Mol. Omics, 2021, 17, 130 DOI: 10.1039/D0MO00115E
