Issue 7, 2015
MtHc: a motif-based hierarchical method for clustering massive 16S rRNA sequences into OTUs

Abstract

The recent sequencing revolution driven by high-throughput technologies has led to the rapid accumulation of 16S rRNA sequences for microbial communities. Clustering short sequences into operational taxonomic units (OTUs) is a crucial first step in analyzing metagenomic data. Although many methods have been proposed for OTU inference, a major challenge is balancing inference accuracy against computational efficiency. To address this challenge, we present a novel motif-based hierarchical method (namely MtHc) for clustering massive 16S rRNA sequences into OTUs with high clustering accuracy and low memory usage. We treat all the 16S rRNA sequences as a complete weighted network, in which sequences are viewed as nodes, each pair of sequences is connected by an imaginary edge, and the distance between a pair of sequences is the weight of that edge. MtHc consists of three main phases. First, it heuristically searches for a motif, defined as an n-node sub-graph (in the present study, n = 3, 4, 5) in which the distance between any two nodes is less than a threshold. Second, it uses the motif as a seed to form candidate clusters by computing the distances between the other sequences and the motif. Finally, it hierarchically merges the candidate clusters to generate the OTUs, calculating only the distances between the motifs of two clusters. In comparisons with existing methods on several simulated and real-life metagenomic datasets, we demonstrate that MtHc achieves higher clustering performance, uses less memory, and is robust to parameter settings, and that it handles large-scale metagenomic datasets more effectively. The MtHc software can be freely downloaded from http://compgenomics.utsa.edu/mthc/ by academic users.
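The three phases described above can be illustrated with a minimal Python sketch. It is not the MtHc implementation: the abstract does not specify the distance measure, the motif search heuristic, the rule for assigning sequences to a motif, or the merge criterion, so every function name and rule below is an assumption chosen for illustration. Here the distance is the mismatch fraction between equal-length reads, a sequence joins a candidate cluster when it lies within the threshold of every motif node, and clusters are merged closest-first while the mean motif-to-motif distance stays below the threshold.

```python
from itertools import combinations

def distance(a, b):
    """Hypothetical distance: mismatch fraction of two equal-length, non-empty reads."""
    return sum(x != y for x, y in zip(a, b)) / max(len(a), len(b))

def find_motif(seqs, pool, n, threshold):
    """Phase 1 (assumed heuristic): greedily grow an n-node motif in which
    every pair of nodes is closer than the threshold."""
    for start in pool:
        motif = [start]
        for cand in pool:
            if cand not in motif and all(
                distance(seqs[cand], seqs[m]) < threshold for m in motif
            ):
                motif.append(cand)
                if len(motif) == n:
                    return motif
    return None

def candidate_clusters(seqs, n, threshold):
    """Phase 2 (assumed rule): a sequence joins the cluster seeded by a motif
    when it is within the threshold of every motif node."""
    pool = list(range(len(seqs)))
    clusters = []  # list of (motif indices, member indices)
    while pool:
        motif = find_motif(seqs, pool, n, threshold)
        if motif is None:
            # No further motif exists: leftovers become singleton clusters.
            clusters.extend(([i], [i]) for i in pool)
            break
        members = [
            i for i in pool
            if i in motif
            or all(distance(seqs[i], seqs[m]) < threshold for m in motif)
        ]
        clusters.append((motif, members))
        pool = [i for i in pool if i not in members]
    return clusters

def motif_distance(seqs, m1, m2):
    """Mean pairwise distance between the nodes of two motifs."""
    return sum(distance(seqs[i], seqs[j]) for i in m1 for j in m2) / (len(m1) * len(m2))

def merge_clusters(seqs, clusters, threshold):
    """Phase 3 (assumed criterion): repeatedly merge the closest pair of
    clusters, comparing motifs only, until no pair is below the threshold."""
    clusters = [(list(m), list(c)) for m, c in clusters]
    while len(clusters) > 1:
        d, a, b = min(
            (motif_distance(seqs, clusters[a][0], clusters[b][0]), a, b)
            for a, b in combinations(range(len(clusters)), 2)
        )
        if d >= threshold:
            break
        clusters[a] = (clusters[a][0] + clusters[b][0], clusters[a][1] + clusters[b][1])
        del clusters[b]  # a < b, so index a is unaffected
    return [sorted(c) for _, c in clusters]

def cluster_otus(seqs, n=3, threshold=0.03):
    """Run the three phases and return OTUs as lists of sequence indices."""
    return merge_clusters(seqs, candidate_clusters(seqs, n, threshold), threshold)
```

Because the merge step compares only motif nodes, its cost depends on the number of candidate clusters and the motif size n rather than on the total number of sequences, which is consistent with the low memory usage the abstract claims.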

Publication details

The article was received on 30 Jan 2015, accepted on 14 Apr 2015, and first published on 14 Apr 2015.


Article type: Paper
DOI: 10.1039/C5MB00089K
Citation: Mol. BioSyst., 2015,11, 1907-1913

    MtHc: a motif-based hierarchical method for clustering massive 16S rRNA sequences into OTUs

    Z. Wei and S. Zhang, Mol. BioSyst., 2015, 11, 1907
    DOI: 10.1039/C5MB00089K
