Genome-wide survey of putative RNA-binding proteins encoded in the human proteome†
Abstract
RNA-binding proteins (RBPs) are involved in various post-transcriptional gene regulatory processes and are also functionally important members of the ribosome and the spliceosome. However, RBPs and their interactions with RNA are less well studied than DNA-binding proteins. We have classified the existing RBP structures, available in complexes with RNA and RNA/DNA hybrids, into different structural families and created Hidden Markov Models (HMMs) for these families. These structure-centric family HMMs, along with the sequence-centric family HMMs, were used as the primary database for a systematic search of the human proteome for putative RBPs. We have found more than 2600 gene products with RBP signatures in humans, of which around 28% are likely to bind to RNA but not DNA, whereas 9% might bind to both RNA and DNA. Of these gene products, 11% do not yet have an explicit functional annotation. Nearly 30% of the putative RBPs are exclusively nuclear, 15% have known disease associations and around 30% are enzymes. Around 40% of the proteins identified in this study are novel and have not been reported by recent large-scale studies on human RBPs.