 Open Access Article
Hua Ye^a,b,
Zhengshi Zhang^a,b,
Chaowei Zhou^a,b,
Chengke Zhu^a,b,
Yuejing Yang^a,b,
Mengbin Xiang^a,b,
Xinghua Zhou^a,b,
Jian Zhou^c,* and
Hui Luo^a,b,*
^a College of Animal Science, Southwest University, Chongqing 402460, China. E-mail: luohui2629@126.com
^b Key Laboratory of Freshwater Fish Reproduction and Development (Ministry of Education), Key Laboratory of Aquatic Science of Chongqing, 400175, China
^c Fisheries Research Institute, Sichuan Academy of Agricultural Sciences, Chengdu, 611731, China. E-mail: zhoujian980@126.com
First published on 16th April 2018
Schizothorax waltoni (S. waltoni), a member of the subfamily Schizothoracinae, is an economically important tetraploid fish indigenous to Tibet, China. Owing to overexploitation and biological invasion, it is listed as vulnerable in the Red List of China's Vertebrates. S. waltoni plays an important role in the local ecology and fishery economy, but little is known about its genetic diversity, local adaptation, or immune system. Identifying functional genes and developing molecular markers are essential first steps toward studies of its biological functions and genetics. For this purpose, the transcriptome of pooled tissues from three adult S. waltoni was sequenced and analyzed. Using paired-end reads from the Illumina HiSeq 4000 platform, 83 103 transcripts with an N50 length of 2337 bp were assembled, which were further clustered into 66 975 unigenes with an N50 length of 2087 bp. Most of the unigenes (58 934, 87.99%) were annotated in at least one of 7 public databases, and 15 KEGG pathways containing immune-related genes were identified for further functional research. Furthermore, 19 497 putative simple sequence repeats (SSRs) with 1–6 bp unit lengths were detected in 14 690 unigenes (21.93%), with an average density of one SSR per 3.28 kb. We identified 3590 unigenes (5.36%) containing more than one SSR, providing abundant potential polymorphic markers in functional genes. This is the first reported high-throughput transcriptome analysis of S. waltoni, and it provides valuable genetic resources for studying functional genes involved in multiple biological processes, including the immune system, and for the genetic conservation and molecular marker-assisted breeding of S. waltoni.
Schizothorax waltoni, an economically important tetraploid fish indigenous to Tibet, China, belongs to the subfamily Schizothoracinae and is distributed only in the middle reaches of the Yarlung Tsangpo River.12 It is listed as vulnerable in the Red List of China's Vertebrates.13 Owing to overexploitation and biological invasion, both the population and the size of captured individuals of S. waltoni have declined rapidly in recent years.14 Although S. waltoni plays an important role in the local ecology and fishery economy, little is known about its genetic diversity, local adaptation, or immune system. Existing studies of S. waltoni have mainly focused on population records, morphology, phylogeny, the mitogenome, and several fundamental aspects of its biology.5,9,12,15–17 To protect the germplasm resources of S. waltoni, research on its artificial culture and breeding has been carried out in Tibet. However, aquaculture of this species is threatened by multiple infectious pathogens, since S. waltoni is particularly vulnerable to pathogenic microorganisms and environmental pollutants under artificial culture conditions. Preventing and treating these diseases requires an understanding of the genes and pathways involved in the immune system. Meanwhile, microsatellite markers, which are versatile and widely used genetic markers with applications in conservation biology, population genetics, and evolutionary biology, can also support marker-assisted selection for improving economic traits.18,19 Large-scale development of microsatellite markers is therefore necessary for research on marker-assisted selection, conservation biology, and population genetics. However, functional gene identification and marker development in S. waltoni remain poorly explored.14,20
With the advent and development of next-generation sequencing (NGS) technology, transcriptome sequencing has become a cost- and time-efficient way to identify genes and develop genetic markers.21–23 In this study, we used the Illumina HiSeq 4000 platform to sequence the transcriptome of pooled tissues of S. waltoni. The major objective was to obtain comprehensive transcriptome information for S. waltoni from multiple tissues and to provide an abundant resource for subsequent functional studies. Genes involved in the immune system were annotated in detail, given the general interest in the immune responses of fish species, including S. waltoni. A large number of microsatellite markers were also detected and analyzed for conservation genetics studies and marker-assisted selection breeding in S. waltoni.
… 932 320 raw reads with a read length of 150 bp were obtained. After read quality evaluation, low-quality trimming, and length filtering, 147 852 684 clean reads remained. Trinity assembly yielded 83 103 transcripts ranging from 224 to 29 395 bp, with an average length of 1139 bp (Table 1). The transcripts were clustered into 66 975 unigenes with an average length of 956 bp (ranging from 224 to 29 395 bp) (Table 1). The N50 lengths of the transcripts and unigenes were 2337 and 2087 bp, respectively. These values are similar to those of Gymnodiptychus dybowskii (N50 of 2407 bp), Schizothorax pseudaksaiensis (N50 of 2283 bp),24 and Gymnodiptychus pachycheilus (N50 of 2322 bp),10 longer than those of Gymnocypris przewalskii (N50 of 1495 bp)25 and of G. przewalskii in another study (N50 of 1836 bp),26 and shorter than that of Schizothorax prenanti (N50 of 2539 bp).27 Apart from differences between organisms, several factors can affect the N50 length of transcripts and unigenes, including the sequencing technique, the number of reads, the assembly software, and its parameters. Of the 83 103 assembled transcripts, 42 890 (51.6%) were no longer than 500 bp, 28 037 (33.7%) exceeded 1000 bp, and 14 968 (18.0%) were longer than 2000 bp. These proportions are also similar to those reported for S. prenanti.27 The detailed frequency distribution of unigene lengths is shown in Fig. 1.
| Statistic | Transcripts | Unigenes |
|---|---|---|
| Total number | 83 103 | 66 975 |
| Shortest length (bp) | 224 | 224 |
| Longest length (bp) | 29 395 | 29 395 |
| Total length (bp) | 94 640 129 | 64 031 094 |
| Average length (bp) | 1139 | 956 |
| N50 length (bp) | 2337 | 2087 |

Fig. 1 Length distribution of assembled unigenes. Numbers of assembled unigenes (y-axis) are plotted against length interval (x-axis).
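N50, as reported in Table 1, is the length L such that sequences of length L or longer together contain at least half of all assembled bases. The following is a minimal Python sketch of the Table 1 summary statistics, assuming the input is simply a list of sequence lengths in bp (for example, parsed beforehand from the assembled FASTA file). `assembly_stats` is a hypothetical helper for illustration and is not part of the authors' pipeline.

```python
def assembly_stats(lengths: list[int]) -> dict[str, float]:
    """Summary statistics for assembled sequence lengths (bp), as in Table 1."""
    if not lengths:
        raise ValueError("no sequences")
    ordered = sorted(lengths, reverse=True)  # longest sequences first
    total = sum(ordered)
    running = 0
    n50 = ordered[-1]
    for length in ordered:
        running += length
        # Stop at the first length at which half of all bases are covered.
        if 2 * running >= total:
            n50 = length
            break
    return {
        "count": len(ordered),
        "total_bp": total,
        "shortest": ordered[-1],
        "longest": ordered[0],
        "mean": total / len(ordered),
        "N50": n50,
    }


print(assembly_stats([2, 3, 4, 5, 6, 7, 8, 9, 10]))
```

In this toy example the N50 is 8, because the three longest sequences (10 + 9 + 8 = 27 bp) already cover half of the 54 bp total. The averages in Table 1 follow the same arithmetic: 94 640 129 / 83 103 ≈ 1139 bp and 64 031 094 / 66 975 ≈ 956 bp.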
In total, 57 301 (85.6%), 35 510 (53.0%), and 28 877 (43.1%) unigenes showed significant similarity (E-value < 10⁻⁵) to sequences in the NCBI nucleotide (NT), NCBI non-redundant protein (NR), and Swiss-Prot databases, respectively (Table 2). A total of 58 934 (88.0%) unigenes had homologous matches in at least one database (Table 2). This annotation rate is higher than previously reported for other fish, such as Hyporthodus septemfasciatus (41.5%),28 G. przewalskii (77.8%),25 Cyprinus carpio (81.1%)29 and Ctenopharyngodon idella (82.8%),30 and slightly lower than that of S. prenanti (94.4%).27 The high annotation rate might be due to the many long sequences obtained from the transcriptome data and the greater average length of the unigenes, as well as the abundance of sequences in public databases.31,32 The E-value distribution of unigenes annotated in the NR database showed that 28.1% of them had perfect matches, 30.3% showed strong homology to previously deposited sequences (E-value < 1 × 10⁻⁴⁵), and 41.6% showed homology with E-values between 1 × 10⁻⁴⁵ and 1 × 10⁻⁵ (Fig. 2A). The similarity of the top BLAST hit for each sequence ranged from 17% to 100%; 16.5% of the unigenes had a similarity of 60–80% to the deposited sequences, and 69.8% had a similarity of 80–100% (Fig. 2B). In the top-hit species distribution, 17 611 (49.6%) unigenes had their best NR hit to sequences of C. carpio and 10 484 (29.5%) to sequences of Brachydanio rerio (Fig. 2C). This result suggests a close evolutionary relationship among S. waltoni, C. carpio, and B. rerio, consistent with all three species belonging to the family Cyprinidae.33
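The E-value and similarity distributions in Fig. 2A and 2B are built from one best hit per unigene. The following is a minimal sketch of that tally with several assumptions. It takes BLAST+ tabular output (`-outfmt 6`, default 12 columns) as input and picks the best hit by lowest E-value, then by highest bit score. It also treats a "perfect match" as a reported E-value of 0. The helper names, bin edges, and toy input are illustrative assumptions, not the authors' code.

```python
import csv
import io
from collections import Counter

# Default column order of BLAST+ tabular output (-outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hits(tsv_text: str, max_evalue: float = 1e-5) -> dict[str, dict]:
    """Best significant hit per query: lowest E-value, then highest bit score."""
    best: dict[str, dict] = {}
    for row in csv.reader(io.StringIO(tsv_text), delimiter="\t"):
        if not row:
            continue
        hit = dict(zip(FIELDS, row))
        for field in ("pident", "evalue", "bitscore"):
            hit[field] = float(hit[field])
        if hit["evalue"] >= max_evalue:  # keep only hits with E < 1e-5
            continue
        rank = (hit["evalue"], -hit["bitscore"])
        current = best.get(hit["qseqid"])
        if current is None or rank < (current["evalue"], -current["bitscore"]):
            best[hit["qseqid"]] = hit
    return best


def evalue_bin(evalue: float) -> str:
    if evalue == 0.0:
        return "E = 0"  # assumed meaning of a "perfect match"
    if evalue < 1e-45:
        return "0 < E < 1e-45"
    return "1e-45 <= E < 1e-5"


def identity_bin(pident: float) -> str:
    for low in (80, 60, 40, 20):
        if pident >= low:
            return f"{low}-{low + 20}%"
    return "<20%"


def distributions(best: dict[str, dict]) -> tuple[Counter, Counter]:
    e_counts = Counter(evalue_bin(h["evalue"]) for h in best.values())
    id_counts = Counter(identity_bin(h["pident"]) for h in best.values())
    return e_counts, id_counts


# Hypothetical toy input: q1 has two hits, q3 is not significant.
EXAMPLE_TSV = (
    "q1\ts1\t98.5\t300\t4\t0\t1\t300\t1\t300\t0.0\t550\n"
    "q1\ts2\t90.0\t300\t30\t0\t1\t300\t1\t300\t1e-50\t400\n"
    "q2\ts3\t65.0\t200\t70\t2\t1\t200\t5\t204\t1e-20\t150\n"
    "q3\ts4\t40.0\t100\t60\t1\t1\t100\t1\t100\t0.01\t30\n"
)
best = best_hits(EXAMPLE_TSV)
e_counts, id_counts = distributions(best)
print(e_counts, id_counts)
```

The species distribution in Fig. 2C would be tallied the same way from the best hit per unigene, but the organism name is not among the default `-outfmt 6` columns and would have to be looked up separately.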
| Database | Hit number | Percentage (%) |
|---|---|---|
| Nr | 35 510 | 53.02 |
| Nt | 57 301 | 85.56 |
| Swiss-prot | 28 877 | 43.12 |
| GO | 16 278 | 24.30 |
| KEGG | 25 742 | 38.44 |
| COG | 11 334 | 16.92 |
| Pfam | 21 692 | 32.39 |
| Total (at least one database) | 58 934 | 87.99 |

Fig. 2 The length distribution of coding sequences (CDS). (A) E-value distribution, (B) similarity distribution, (C) species distribution.
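The "Total" row of Table 2 is the size of the union of the per-database hit sets. That is why 58 934 is smaller than the sum of the individual rows: many unigenes hit several databases. The following is a minimal Python sketch, assuming each database search has already been reduced to a set of unigene IDs. The IDs and the `annotation_summary` helper are hypothetical.

```python
def annotation_summary(hits_by_db: dict[str, set[str]],
                       n_unigenes: int) -> dict[str, tuple[int, float]]:
    """Per-database hit counts plus the union ("annotated in at least one database")."""
    summary = {db: (len(ids), round(100 * len(ids) / n_unigenes, 2))
               for db, ids in hits_by_db.items()}
    annotated = set().union(*hits_by_db.values())
    summary["Total"] = (len(annotated), round(100 * len(annotated) / n_unigenes, 2))
    return summary


# Hypothetical toy data: u1 and u2 hit both Nr and Nt, so they are counted once in Total.
toy = {"Nr": {"u1", "u2"}, "Nt": {"u1", "u2", "u3"}, "Pfam": {"u4"}}
summary = annotation_summary(toy, n_unigenes=5)
print(summary)  # Total is (4, 80.0), not 2 + 3 + 1 = 6 hits
```

Applied to the real counts, 58 934 / 66 975 gives the 87.99% shown in Table 2.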
The potential functions of all unigenes were predicted using the COG database. In all, 11 334 unigenes of S. waltoni were assigned to 25 COG categories (Fig. 3). The largest category was general function prediction only (4899, 43.2% of the matched unigenes), followed by replication, recombination and repair (2170, 19.1%); transcription (2127, 18.8%); translation, ribosomal structure and biogenesis (1891, 16.7%); and post-translational modification, protein turnover, chaperones (1876, 16.6%) (Table S1,† Fig. 3). Furthermore, 106 unigenes were classified into the defense mechanisms category, suggesting that they might be involved in immune defense in S. waltoni.
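The COG percentages above are relative to the 11 334 COG-matched unigenes. Because a single COG can carry more than one functional category letter, the category counts are not mutually exclusive. The following is a short Python check that uses only the counts reported in this paragraph.

```python
COG_MATCHED = 11_334  # unigenes with at least one COG assignment

top_categories = {
    "General function prediction only": 4899,
    "Replication, recombination and repair": 2170,
    "Transcription": 2127,
    "Translation, ribosomal structure and biogenesis": 1891,
    "Posttranslational modification, protein turnover, chaperones": 1876,
}

# Share of COG-matched unigenes in each category, as a percentage.
shares = {name: round(100 * n / COG_MATCHED, 1) for name, n in top_categories.items()}
for name, pct in shares.items():
    print(f"{name}: {pct}%")

# The five largest categories alone hold more assignments than there are
# matched unigenes, so some unigenes must be counted in several categories.
total_assignments = sum(top_categories.values())
print(total_assignments, "assignments >", COG_MATCHED, "unigenes")
```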
To further classify the functions of S. waltoni transcripts, GO terms were assigned to each unigene. Of the 35 510 unigenes annotated against the NR database, 16 278 were assigned to 64 level-2 GO terms in the three major GO categories (Fig. 4). In the Biological Process (BP) category, the most represented terms were cellular process (10 088 unigenes, GO:0009987), single-organism process (8768 unigenes, GO:0044699), and metabolic process (8116 unigenes, GO:0008152). In the Cellular Component (CC) category, many unigenes were assigned to cell (8777 unigenes, GO:0005623), cell part (8718 unigenes, GO:0044464), and organelle (5836 unigenes, GO:0043226). In the Molecular Function (MF) category, a high percentage of unigenes were associated with binding (8606 unigenes, GO:0005488) and catalytic activity (5904 unigenes, GO:0003824), followed by transporter activity (1071 unigenes, GO:0005215).
To further identify the biological pathways of the assembled unigenes in S. waltoni, we mapped them to the reference canonical pathways in the KEGG database. A total of 25 742 unigenes were assigned KEGG Orthology (KO) terms and mapped to 259 pathways, with the number of unigenes per pathway ranging from 2 to 2880 (Table S2†). These pathways fall into six level-1 KEGG categories: cellular processes, environmental information processing, genetic information processing, human diseases, metabolism, and organismal systems. Of these unigenes, 23 677 were mapped to the human diseases category, mostly to pathways in cancer (1127, ko05200), influenza A (1092, ko05164), tuberculosis (855, ko05152), and dilated cardiomyopathy (848, ko05414). The second largest level-1 category was organismal systems (17 107), including vascular smooth muscle contraction (721, ko04270), cardiac muscle contraction (707, ko04260), and the NOD-like receptor signaling pathway (585, ko04621). In metabolism, the third largest level-1 category, the largest pathway was metabolic pathways with 2880 unigenes, followed by purine metabolism (643, ko00230) and pyrimidine metabolism (467, ko00240).
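The level-1 counts (23 677 for human diseases and 17 107 for organismal systems) together exceed the 25 742 KO-annotated unigenes. This is consistent with each category counting distinct unigenes, while a unigene mapped to several pathways contributes to several categories. The following is a minimal sketch, assuming a unigene-to-pathway mapping and a pathway-to-level-1 lookup are already available. The toy unigene IDs are hypothetical; the three ko identifiers are among those cited above.

```python
from collections import defaultdict


def level1_counts(unigene_pathways: dict[str, set[str]],
                  pathway_level1: dict[str, str]) -> dict[str, int]:
    """Count distinct unigenes per level-1 KEGG category."""
    members: dict[str, set[str]] = defaultdict(set)
    for unigene, pathways in unigene_pathways.items():
        for pathway in pathways:
            members[pathway_level1[pathway]].add(unigene)
    return {category: len(ids) for category, ids in members.items()}


PATHWAY_LEVEL1 = {
    "ko05164": "Human Diseases",      # influenza A
    "ko04621": "Organismal Systems",  # NOD-like receptor signaling pathway
    "ko00230": "Metabolism",          # purine metabolism
}
# Hypothetical toy mapping: unigene_1 falls into two level-1 categories.
toy_mapping = {
    "unigene_1": {"ko05164", "ko04621"},
    "unigene_2": {"ko04621"},
    "unigene_3": {"ko00230"},
}
counts = level1_counts(toy_mapping, PATHWAY_LEVEL1)
print(counts)
print(sum(counts.values()), "category memberships vs", len(toy_mapping), "unigenes")
```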
NOD-like receptors are cytoplasmic pattern-recognition receptors that have been shown to respond to a wide range of ligands, including bacterial cell-wall components, toxins, and host-derived signals such as uric acid and damaged membranes.34 As intracellular sentinels of cytosolic sanctity, they orchestrate innate immunity and inflammatory responses, ensuring the detection of noxious signals within the cell.35 In this study, we identified several members of the NOD-like receptor family, including NOD1, NOD2, NLRP1, NLRP3, and NLRP12. NOD1 and NOD2 are critical receptors for minimal peptidoglycan motifs of Gram-positive and Gram-negative bacteria,36,37 and play a vital role in protecting the host against invasion by microbial pathogens.34 NLRP1, NLRP3, and NLRP12 are key components of inflammasomes.35 NLRP1 and NLRP3 can activate the inflammasome, which in turn activates caspase-1, leading to the processing and release of IL-1β, IL-18, and other targets.34,38 NLRP12 has been shown to act as a negative regulator of the immune response by interfering with NF-κB activation.39 Information on the unigenes involved in the NOD-like receptor signaling pathway is given in Fig. S1 and Table S2.†
Neither the adaptive nor the innate immune system can respond unless leukocytes cross blood vessel walls.40 Leukocyte migration is indispensable for immune responses, including immune surveillance and chronic and acute inflammation.41 The process comprises several distinct steps. First, leukocytes roll over the endothelial cells, a step mediated by adhesion molecules. When the leukocytes come close to the endothelium, chemotactic cytokines activate them. Finally, the activated leukocytes adhere firmly to the endothelium and migrate in an ameboid fashion through the intercellular clefts between endothelial cells, or in some cases through the endothelial cell itself, to their proper locations.40,41 In this study, we identified many important unigenes in the leukocyte transendothelial migration pathway, such as intercellular cell adhesion molecule 1 (ICAM-1), vascular cell adhesion molecule 1 (VCAM-1), CD11b/CD18, CD29, CD99, junctional adhesion molecule 1 (JAM-1), junctional adhesion molecule 2 (JAM-2), integrin alpha L (ITGAL), integrin alpha M (ITGAM), integrin beta 1 (ITGB1), and integrin beta 2 (ITGB2). ICAM-1 and VCAM-1 are not involved in diapedesis per se; however, they seem to be involved in the processes that directly precede it.40 In mammals, both are recruited to the endothelial cell border during transmigration.
ICAM-1 is involved in the firm adhesion of leukocytes to the apical surface of endothelial cells, and VCAM-1 is involved in the firm adhesion of leukocytes and monocytes.40 Integrins play a prominent role in controlling leukocyte trafficking during transendothelial migration.42 In addition, we identified negative regulators of transendothelial migration, such as vascular endothelial cell-specific cadherin (VE-cadherin), the main adhesion molecule of endothelial adherens junctions.40,43 Although the functions of mammalian genes, including human genes, in the leukocyte transendothelial migration pathway have been studied extensively,40,41 comparable studies in fish are rarely reported. One exception is JAM-1, an important gene in this pathway, which in grass carp (Ctenopharyngodon idellus) showed expression patterns and functions similar to those of its mammalian counterpart.44 The sequence information obtained here will benefit functional research on important genes in S. waltoni. All unigenes involved in leukocyte transendothelial migration are listed in Fig. S2 and Table S2.†
A total of 19 497 SSRs with 1–6 bp unit lengths were identified (Table 3) in 14 690 unigenes (21.93%), and 3590 unigenes (5.36%) contained more than one SSR. This corresponds to a frequency of about one SSR per 3.28 kb of expressed sequence (one SSR per 5.16 kb after excluding mononucleotide repeats). This density is higher than previously reported for other fish, including S. prenanti (one SSR per 9.60 kb),27 Larimichthys polyactis (one SSR per 7.50 kb),47 and Paramisgurnus dabryanus (one SSR per 6.99 kb).32 It is comparable to that of Scophthalmus maximus (one SSR per 3.36 kb)48 and lower than that of the abalone Haliotis midae (one SSR per 0.76 kb).49 Apart from species differences, several factors can affect the observed SSR density. These include the database-mining software, the parameters and detection criteria used to identify SSRs, and the size of the dataset.50,51
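The reported SSR densities can be reproduced by dividing the total unigene length in Table 1 (64 031 094 bp) by the number of SSRs. The section does not state the denominator explicitly, so this is our assumption about how the "kb of expressed sequence" figure was computed. The following is a short Python check using only numbers from Tables 1 and 3 and the text.

```python
TOTAL_UNIGENE_BP = 64_031_094  # Table 1, total unigene length (assumed denominator)
N_UNIGENES = 66_975            # Table 1
N_SSR = 19_497                 # Table 3, all motif lengths
N_MONO = 7_078                 # Table 3, mononucleotide repeats
N_UNIGENES_WITH_SSR = 14_690
N_UNIGENES_MULTI_SSR = 3_590


def kb_per_ssr(total_bp: int, n_ssr: int) -> float:
    """Average amount of sequence per SSR, in kb."""
    return total_bp / n_ssr / 1000


all_ssr = kb_per_ssr(TOTAL_UNIGENE_BP, N_SSR)
no_mono = kb_per_ssr(TOTAL_UNIGENE_BP, N_SSR - N_MONO)
print(f"one SSR per {all_ssr:.2f} kb; {no_mono:.2f} kb without mononucleotides")
print(f"{100 * N_UNIGENES_WITH_SSR / N_UNIGENES:.2f}% of unigenes contain an SSR")
print(f"{100 * N_UNIGENES_MULTI_SSR / N_UNIGENES:.2f}% contain more than one")
```

The first line prints 3.28 kb and 5.16 kb, matching the densities given in the text.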
| Repeat number | Mono | Di | Tri | Tetra | Penta | Hexa | Total | Percent (%) |
|---|---|---|---|---|---|---|---|---|
| 4 | 0 | 0 | 0 | 0 | 150 | 82 | 232 | 1.19 |
| 5 | 0 | 0 | 1480 | 214 | 27 | 4 | 1725 | 8.85 |
| 6 | 0 | 2331 | 635 | 127 | 6 | 6 | 3105 | 15.93 |
| 7 | 0 | 1274 | 385 | 14 | 4 | 4 | 1681 | 8.62 |
| 8 | 0 | 785 | 221 | 13 | 4 | 2 | 1025 | 5.26 |
| 9 | 0 | 604 | 37 | 7 | 0 | 2 | 650 | 3.33 |
| 10 | 0 | 503 | 44 | 7 | 2 | 0 | 556 | 2.85 |
| 11 | 0 | 801 | 30 | 3 | 0 | 1 | 835 | 4.28 |
| 12 | 1213 | 421 | 14 | 4 | 3 | 2 | 1657 | 8.50 |
| 13 | 846 | 192 | 12 | 3 | 3 | 0 | 1056 | 5.42 |
| 14 | 668 | 191 | 12 | 1 | 1 | 0 | 873 | 4.48 |
| 15 | 477 | 200 | 7 | 5 | 2 | 0 | 691 | 3.54 |
| 16 | 374 | 159 | 6 | 8 | 2 | 0 | 549 | 2.82 |
| 17 | 287 | 161 | 1 | 1 | 1 | 1 | 452 | 2.32 |
| 18 | 238 | 114 | 0 | 3 | 0 | 0 | 355 | 1.82 |
| 19 | 162 | 97 | 1 | 6 | 1 | 0 | 267 | 1.37 |
| ≥20 | 2813 | 943 | 10 | 18 | 2 | 2 | 3788 | 19.43 |
| Total | 7078 | 8776 | 2895 | 434 | 208 | 106 | 19 497 | 100 |
| Percent (%) | 36.30 | 45.01 | 14.85 | 2.23 | 1.07 | 0.54 | 100 | |
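This section does not name the SSR search tool. The smallest repeat numbers that occur in each column of Table 3 are 12 for mono-, 6 for di-, 5 for tri- and tetra-, and 4 for penta- and hexanucleotide motifs. The sketch below therefore assumes these as minimum thresholds. It is a simplified, regex-based illustration of tandem-repeat detection, not the authors' software. It does not merge compound SSRs or handle ambiguous bases, and `find_ssrs` and `MIN_REPEATS` are hypothetical names.

```python
import re

# Assumed minimum repeat numbers per motif length, inferred from Table 3.
MIN_REPEATS = {1: 12, 2: 6, 3: 5, 4: 5, 5: 4, 6: 4}


def is_primitive(motif: str) -> bool:
    """True if the motif is not itself a repeat of a shorter unit (e.g. 'ATAT')."""
    return (motif + motif).find(motif, 1) == len(motif)


def find_ssrs(seq: str,
              min_repeats: dict[int, int] = MIN_REPEATS) -> list[tuple[int, str, int]]:
    """Return (0-based start, motif, repeat number) for each perfect SSR found."""
    seq = seq.upper()
    hits = []
    for k, n in min_repeats.items():
        # A k-mer followed by at least n - 1 further copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, n - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if is_primitive(motif):  # 'AA' repeats are reported as mononucleotide
                hits.append((m.start(), motif, len(m.group(0)) // k))
    return sorted(hits)


print(find_ssrs("GG" + "AC" * 8 + "TTT" + "A" * 13 + "GC"))
```

The `is_primitive` filter keeps a poly-A run from being reported a second time as an (AA)n dinucleotide repeat.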
Among these SSRs, we identified 7078 (36.30%) mononucleotide, 8776 (45.01%) dinucleotide, 2895 (14.85%) trinucleotide, and 748 (3.84%) tetra-, penta-, and hexanucleotide repeats (Table 3). The number of SSRs in S. waltoni was markedly higher than in two other Schizothorax fishes, S. prenanti (7998 SSRs) and Schizothorax biddulphi (1379 SSRs).27,52 The S. prenanti samples came from a cultured population, and the smaller number of SSRs may reflect reduced genetic diversity under artificial culture. The difference might also be partly due to the effective assembly of the S. waltoni transcriptome. The difference between S. waltoni and S. biddulphi is likely due mainly to the different next-generation sequencing techniques used.
The number of motif repeats ranged from 4 to 127. SSRs with six repeats were the most common (15.93%), followed by those with five (8.85%), seven (8.62%), and twelve repeats (8.50%). The repeat numbers were unevenly distributed, consistent with previous studies of teleosts such as S. prenanti27 and P. dabryanus.32 Excluding mononucleotide repeats, 132 motif types were found in the S. waltoni transcriptome, with 4, 10, 23, 46, and 49 types of di-, tri-, tetra-, penta-, and hexanucleotide repeats, respectively. Relative to the 12 419 non-mononucleotide SSRs, the most frequent type was (AC/GT)n (5101, 41.07%), followed by (AT/AT)n (1880, 15.14%), (AG/CT)n (1766, 14.22%), (AAT/ATT)n (852, 6.86%), (ATC/ATG)n (537, 4.32%), and (AGG/CCT)n (498, 4.01%). The most abundant tetra- and pentanucleotide types were (AGAT/ATCT)n (105, 0.85%) and (AAAAC/GTTTT)n (31, 0.25%), respectively (Fig. 6). Dinucleotide repeats accounted for 45.01% of all SSRs in S. waltoni, and (AC/GT)n was the most abundant dinucleotide motif, consistent with previous reports in vertebrates.50
Fig. 6 Frequency distribution of the twenty most abundant SSR types. Each bar represents one SSR type detected in the transcriptome of Schizothorax waltoni.
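Motif types such as (AC/GT)n group a repeat unit with its rotations and its reverse complement, so AC, CA, GT, and TG all describe the same repeat. The following is a minimal Python sketch of that grouping. It assumes the common convention of labelling each class by its alphabetically smallest rotation, paired with the smallest rotation of that unit's reverse complement. The helper names are hypothetical.

```python
from itertools import product

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def rotations(motif: str) -> list[str]:
    return [motif[i:] + motif[:i] for i in range(len(motif))]


def motif_class(motif: str) -> str:
    """Label such as 'AC/GT', shared by a motif, its rotations and its reverse complement."""
    motif = motif.upper()
    canonical = min(rotations(motif) + rotations(reverse_complement(motif)))
    partner = min(rotations(reverse_complement(canonical)))
    return f"{canonical}/{partner}"


def is_primitive(motif: str) -> bool:
    """True if the motif is not itself a repeat of a shorter unit."""
    return (motif + motif).find(motif, 1) == len(motif)


def count_classes(k: int) -> int:
    """Number of distinct motif classes for unit length k."""
    motifs = ("".join(p) for p in product("ACGT", repeat=k))
    return len({motif_class(m) for m in motifs if is_primitive(m)})


print(motif_class("TG"), motif_class("CAT"), motif_class("ATCT"))
print([count_classes(k) for k in (1, 2, 3, 4)])
```

Under this grouping there are exactly 4 dinucleotide and 10 trinucleotide classes. All possible di- and trinucleotide types were therefore observed in S. waltoni, whereas the 23 tetranucleotide types reported above are a subset of the 33 possible classes.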
† Electronic supplementary information (ESI) available. See DOI: 10.1039/c8ra00619a

This journal is © The Royal Society of Chemistry 2018