Finding co-mutated genes and candidate cancer genes in cancer genomes by stratified false discovery rate control
Finding candidate cancer genes that play causal roles in carcinogenesis is an important task in cancer research. Non-random co-mutation of genes across cancer samples provides statistical evidence that these genes are involved in carcinogenesis, and it can also reveal how gene mutations cooperate functionally in cancer. However, because current high-throughput somatic mutation screening studies use relatively small sample sizes and require an extraordinarily large number of hypothesis tests, the statistical power to find co-mutated gene pairs in high-throughput somatic mutation data from cancer genomes is very low. We therefore proposed a stratified false discovery rate (FDR) control approach that identifies significantly co-mutated gene pairs according to the mutation frequencies of the genes. We then compared the co-mutated gene pairs identified by pre-selecting genes with higher mutation frequencies with those identified by the stratified FDR control approach. Finally, to evaluate the functional roles of the identified co-mutated gene pairs, we searched for pairs of pathways significantly enriched with between-pathway co-mutated gene pairs. Using two datasets of somatic mutations in cancer genomes, we demonstrated that, at a given FDR level, the power to find co-mutated gene pairs can be increased by pre-selecting genes with higher mutation frequencies. However, many true co-mutations between genes with lower mutation rates are still missed. The stratified FDR control approach finds many more co-mutated gene pairs. Finally, the pathway pairs significantly overrepresented with between-pathway co-mutated gene pairs suggest that co-dysregulation of these pathways may play a causal role in carcinogenesis.
The stratified FDR control strategy is effective in identifying co-mutated gene pairs, and the genes in these pairs can be considered candidate cancer genes because their non-random co-mutation in cancer genomes is highly unlikely to be attributable to chance.
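To illustrate why stratification can recover pairs that a pooled analysis misses, the following is a minimal Python sketch, not the authors' implementation. It assumes that each gene pair already has a co-mutation p-value, that each pair has been assigned to a stratum by mutation frequency, and that the Benjamini-Hochberg procedure is applied separately within each stratum. The abstract does not specify the test, the stratum boundaries, or the FDR procedure, so the function names `benjamini_hochberg` and `stratified_fdr`, the stratum labels, and the toy p-values are all hypothetical.

```python
from collections import defaultdict


def benjamini_hochberg(p_values, alpha=0.05):
    """Return one boolean per p-value: True if rejected at FDR level alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k such that p_(k) <= (k / m) * alpha.
    max_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_k = rank
    rejected = [False] * m
    for idx in order[:max_k]:
        rejected[idx] = True
    return rejected


def stratified_fdr(pairs, alpha=0.05):
    """Apply Benjamini-Hochberg within each stratum.

    pairs: iterable of (pair_id, stratum, p_value) tuples (assumed layout).
    Returns the pair_ids declared significant in their own stratum.
    """
    by_stratum = defaultdict(list)
    for pair_id, stratum, p in pairs:
        by_stratum[stratum].append((pair_id, p))
    significant = []
    for items in by_stratum.values():
        flags = benjamini_hochberg([p for _, p in items], alpha)
        significant.extend(pid for (pid, _), keep in zip(items, flags) if keep)
    return significant


# Toy data (invented for illustration): two pairs in a small stratum with
# low p-values, and eight pairs in a larger stratum with high p-values.
low_signal = [0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 0.95]
toy_pairs = [("A-B", "stratum_1", 0.01), ("C-D", "stratum_1", 0.02)]
toy_pairs += [(f"X{i}-Y{i}", "stratum_2", p) for i, p in enumerate(low_signal)]

pooled_flags = benjamini_hochberg([p for _, _, p in toy_pairs])
stratified_hits = stratified_fdr(toy_pairs)
```

In this toy example the pooled analysis rejects nothing, because the two small p-values are judged against thresholds diluted by all ten tests, whereas the stratified analysis declares both pairs in the first stratum significant. Note that controlling the FDR at level alpha within each stratum is a design choice of this sketch. The overall FDR across strata depends on how the strata are defined and combined.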