Issue 8, 2012

Cascaded walks in protein sequence space: use of artificial sequences in remote homology detection between natural proteins

Abstract

Over the past two decades, many ingenious approaches to protein remote homology detection have been developed. Because homologous proteins often diverge extensively in sequence, it is challenging to demonstrate their relatedness through purely sequence-driven searches. Here, we describe a computational method for generating ‘protein-like’ sequences that serve to bridge gaps in protein sequence space. The starting point of the algorithm is sequence profile information, as embodied in a position-specific scoring matrix derived from a multiple alignment of bona fide family members. At each position in the sequence, the observed amino acid propensities and a random number together determine which residue is selected. By applying this ‘roulette-wheel’ selection systematically at every position, we generate sequences that resemble the parent family and thus enlarge the sequence space around it. We demonstrate that such sequences, when generated for a large number of families, extend the utility of natural intermediately related sequences in linking distant proteins. In 91% of the assessed examples, inclusion of designed sequences improved fold coverage by 5–10% over searches made in their absence. Furthermore, using several examples from proteins adopting folds such as TIM, globin and lipocalin, we demonstrate that including designed sequences in a database improves the sensitivity of methods such as PSI-BLAST and Cascade PSI-BLAST, and offers a promising route to substantially improved remote homology recognition using sequence information alone.
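
The ‘roulette-wheel’ selection described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes the profile has already been reduced to per-position amino acid frequencies (rather than log-odds scores), and the names `roulette_pick`, `generate_sequence` and `toy_profile` are hypothetical, introduced here for illustration only.

```python
import random
from typing import Dict, List, Optional

# One profile column: amino acid -> observed frequency at that position.
# Assumption: frequencies are non-negative; they need not sum to exactly 1.
Column = Dict[str, float]


def roulette_pick(column: Column, rng: random.Random) -> str:
    """Pick one residue with probability proportional to its frequency."""
    if any(freq < 0 for freq in column.values()):
        raise ValueError("frequencies must be non-negative")
    total = sum(column.values())
    if total <= 0:
        raise ValueError("column has no positive frequencies")
    spin = rng.random() * total  # where the wheel stops, in [0, total)
    cumulative = 0.0
    for residue, freq in column.items():
        cumulative += freq
        if spin < cumulative:
            return residue
    # Guard against floating-point rounding: use the last positive-weight residue.
    return [r for r, f in column.items() if f > 0][-1]


def generate_sequence(profile: List[Column], seed: Optional[int] = None) -> str:
    """Build one artificial sequence by spinning the wheel at every position."""
    rng = random.Random(seed)
    return "".join(roulette_pick(column, rng) for column in profile)


# Hypothetical three-position profile, for illustration only.
toy_profile: List[Column] = [
    {"M": 1.0},
    {"K": 0.6, "R": 0.4},
    {"L": 0.5, "I": 0.3, "V": 0.2},
]

if __name__ == "__main__":
    for seed in range(3):
        print(generate_sequence(toy_profile, seed=seed))
```

Residues that are frequent at a position are drawn often and rare ones occasionally, so repeated calls with different seeds yield distinct sequences that still follow the family profile.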

Article information

Article type: Paper
Submitted: 22 Mar 2012
Accepted: 15 May 2012
First published: 13 Jun 2012

Mol. BioSyst., 2012, 8, 2076-2084

S. Sandhya, R. Mudgal, C. Jayadev, K. R. Abhinandan, R. Sowdhamini and N. Srinivasan, Mol. BioSyst., 2012, 8, 2076 DOI: 10.1039/C2MB25113B
