The distribution pattern of genetic variation in the transcript isoforms of the alternatively spliced protein-coding genes in the human genome†
Abstract
By enabling the production of multiple transcript isoforms from a single gene locus, alternative splicing greatly expands the diversity of the human transcriptome and proteome. The alternatively spliced transcripts of each protein-coding gene locus in the human genome can currently be classified as either principal or non-principal isoforms according to their cross-species conservation or biological features. Mapping the variants from the 1000 Genomes Project onto the coding region of each isoform revealed a genome-wide pattern in how genetic variation is distributed across the coding regions of these two isoform types: compared with the principal isoform-specific coding regions, the non-principal isoform-specific coding regions are significantly enriched in amino acid-changing variants, particularly those that have a strong impact on protein function and higher derived allele frequencies. This suggests that non-principal isoform-specific substitutions are less likely to be related to phenotypic changes or disease. These results help clarify the potential consequences of alternatively spliced products from a population perspective.