Open Access Article
Austin Marshall†a, Daniel T. Fuller†b, Paul Dougall b, Kavindra Kumaragama a, Suresh Dhaniyala c and Shantanu Sur *a
a Department of Biology, Clarkson University, 8 Clarkson Ave, Potsdam, NY 13699, USA. E-mail: ssur@clarkson.edu
b Department of Mathematics, Clarkson University, USA
c Department of Mechanical and Aeronautical Engineering, Clarkson University, USA
First published on 23rd May 2024
Bioaerosol samples are characterized by very low biomass, so culture-based detection remains a reliable and acceptable technique to identify and quantify microbes present in these samples. The process typically involves the generation of bacterial colonies by inoculating the sample on an agar plate, followed by the identification of colonies through DNA sequencing of a PCR-amplified targeted gene. The Sanger method is often the default choice for sequencing, but its application might be limited in identifying multi-species microbial colonies that could potentially form from bacterial aggregates present in bioaerosols. In this work, we compared Sanger and MinION nanopore sequencing techniques in identifying bioaerosol-derived bacterial colonies using 16S rRNA gene analysis. We found that for five out of the seven colonies examined, both techniques indicated the presence of the same bacterial genus. For one of the remaining colonies, a noisy Sanger electropherogram failed to generate a meaningful sequence, but nanopore sequencing identified it to be a mix of two bacterial genera. For the other remaining colony, the Sanger sequencing suggested a single genus with a high sequence alignment and clean electropherogram; however, the nanopore sequencing suggested the presence of a second less abundant genus. These findings were further corroborated using mock colonies, where nanopore sequencing was found to be a superior method in accurately classifying individual bacterial components in mock multispecies colonies. Our results show the advantage of using nanopore sequencing over the Sanger method for culture-based analysis of bioaerosol samples, where direct inoculation to a culture plate could lead to the formation of multispecies colonies.
Environmental significance
Culture-based detection of microbes present in bioaerosols typically involves growing microorganisms into colonies and subsequent identification by targeted gene sequencing. Sanger sequencing of the 16S rRNA gene is most commonly used for the identification of bacterial colonies formed after inoculation of bioaerosol samples on agar media. In this study, the performance of Sanger sequencing was compared with MinION nanopore sequencing for the identification of bioaerosol-derived bacterial colonies. Nanopore sequencing outperformed Sanger sequencing by detecting bacteria at higher taxonomic resolution and identifying individual bacterial components in multispecies colonies. Our findings demonstrate the potential of nanopore sequencing for microbial identification in culture-based analysis of bioaerosol and other complex environmental samples.
A common workflow for culture-based studies is the generation of colonies from microorganisms present in a sample, followed by analysis of the colonies. This is based on the assumption that the colonies are homogeneous and are generated by aggregated growth from a single microorganism. However, for bioaerosol samples, the approach can encounter certain challenges in achieving an accurate identification of the colonies. Bioaerosols are distributed over a range of sizes with up to 30% of the total number being ≥4.7 μm in diameter, suggesting potential aggregation of microbes in these larger particles.13 Indeed, microbes in the air are reported to exist as aggregates of variable size, often tightly bound to particulate matter, and thus can potentially form multispecies colonies when bioaerosol samples are inoculated on agar media.14–16 Culture-based detection is commonly conducted by capturing bioaerosol particles on an agar plate (e.g., by depositional sampling and impaction), followed by growing viable microbes.5 However, the particle-bound or aggregated microorganisms would remain in close physical proximity on the agar plate and could potentially lead to growth without a distinct colony boundary. Indeed, the existence of multispecies colonies is reported, where distant bacterial species associate in a single colony structure with specific interactions observed between them.17,18 Although information is lacking regarding the formation of multispecies colonies from inoculation of bioaerosols, the fact that airborne microbes often exist as aggregates raises such a possibility and emphasizes the need for accurate identification of microbial colonies.
The classical approach to identify microbial colonies involves a battery of biochemical tests, but that is now mostly replaced by sequencing-based techniques, which offer a rapid, accurate, and sensitive method for bacterial identification. Targeted amplicon analyses are commonly used for taxonomic classification and for studying phylogenetic relationships due to the conserved nature of essential genes.19,20 The 16S ribosomal RNA gene (henceforth abbreviated as 16S) contains nine variable regions and is present universally in bacteria and archaea, providing a robust tool for the classification of bacteria and archaea.19–23 16S amplicon analysis is widely used for the identification of bacteria and archaea; however, the implementation of different sequencing technologies can influence the resolution of the results and the scope of application.
Sanger sequencing, first introduced more than four decades ago, is still widely used and remains one of the primary sequencing tools for the identification of microbial colonies through targeted gene amplification. Sanger sequencing is highly accurate in sequencing reads up to ∼1000 bases and is often held as a reference or standard to compare the accuracy of other sequencing techniques.24–27 One major limitation of Sanger sequencing is that only one homogeneous DNA sequence can be read by this technique, and the presence of additional sequences in the sample will impact the output.12,13 Thus, Sanger sequencing is not suitable for sequencing 16S amplicons from a sample with a mixed microbial composition. For such applications, next-generation sequencing (NGS) such as Illumina, which employs massively parallel short-read sequencing, is commonly used to classify all bacterial taxa. However, the sequencing platform allows only short reads with a sequence length of <500 bp, restricting the coverage of the 16S gene to a maximum of two variable regions and limiting the taxonomic classification to the genus level.28–30
Third-generation sequencing, a term commonly used for the sequencing platforms offered by Oxford Nanopore Technologies (ONT) and Pacific Biosciences (PacBio), overcomes some of the major limitations of NGS by enabling long-read sequencing.31,32 Among these techniques, MinION nanopore sequencing from ONT utilizes a protein nanopore complex to guide a DNA strand to translocate through the pore and determines the sequence from the changes in ionic conductivities as different nucleotide bases pass through the pore.32 Nanopore sequencing has significantly advanced in the last decade with improvements in sequencing accuracy and capacity. Combined with packaging in an extremely portable, inexpensive sequencing device, and relatively simple library preparation procedures, applications of nanopore sequencing have grown tremendously in recent years.33–37 The long-read capability of nanopore sequencing allows for full-length 16S gene amplicon sequencing with the ability to discriminate up to the species level in a sample of mixed bacterial composition.30 Furthermore, multiplexing the samples by barcoding enables running multiple samples on a single run, enhancing the throughput and reducing the cost. Together, these features of nanopore sequencing make it a potentially attractive procedure for the identification of bacterial colonies through 16S amplicon analysis.
In this study, we compared Sanger and nanopore sequencing for the identification of bacterial colonies derived from bioaerosol samples and explored any advantages afforded by nanopore sequencing. Targeted amplification of full-length 16S genes was conducted for individual colonies, and the amplicons were sequenced using both techniques. We investigated the accuracy of these two techniques in colony identification, especially when there is potential for the existence of multispecies colonies. The findings were further corroborated with mock multispecies colony samples.
:1, 1:1, and 1:9) to prepare mock multispecies colony samples with different proportional abundances of these two DNAs. For example, a sample with 90% A. baumannii and 10% S. maltophilia was prepared by mixing 4.5 μL of A. baumannii DNA with 0.5 μL of S. maltophilia DNA. The mixed DNA or pure bacterial DNA was further used for Sanger and nanopore sequencing as mock colony samples.
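The mixing arithmetic in the example above can be sketched in a few lines of Python. This is only an illustration: the function name `mixing_volumes` is hypothetical, the 5 μL total is inferred from the 4.5 μL + 0.5 μL example, and the sketch assumes that both DNA stocks are at the same concentration, which the text above does not state.

```python
def mixing_volumes(fraction_a, total_volume_ul=5.0):
    """Return (volume_a, volume_b) in microlitres for a two-component DNA mix.

    Assumes both stocks have equal DNA concentration (an assumption made
    for this sketch), so volume fractions equal DNA fractions.
    """
    if not 0.0 <= fraction_a <= 1.0:
        raise ValueError("fraction_a must be between 0 and 1")
    volume_a = round(fraction_a * total_volume_ul, 3)
    volume_b = round(total_volume_ul - volume_a, 3)
    return volume_a, volume_b


# 90% A. baumannii / 10% S. maltophilia in a 5 uL mix
print(mixing_volumes(0.9))  # (4.5, 0.5)
```

If the two stocks differ in concentration, the volumes would need to be scaled by the concentration ratio before mixing.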
The nanopore 16S sequencing was performed using either a Flongle™ or a MinION™ flow cell (R9.4.1) attached to a MinION Mk1B device. The MinION™ flow cell is capable of generating greater sequencing output due to a larger number of pores, while Flongle™ is more suitable for sequencing a smaller number of samples in a single run. Flongle sequencing was performed as follows: the flow cell was first primed with a mix of 117 μL of flush buffer and 3 μL of flush tether to wash out the storage buffer solution. Once flushed, the flow cell was loaded with a solution containing 5 μL of DNA amplicon library (premixed with 0.5 μL rapid adapter protein), 15 μL of sequencing buffer, and 10 μL of library loading beads, after which the sequencing run was started. To conduct sequencing on a MinION flow cell, it was first primed by loading 800 μL of flush buffer/flush tether mix through the priming port and incubating for 5 min. Immediately before loading the DNA library, another 200 μL of flush buffer/flush tether mix was added to the priming port with the Spot-On port open. Sequencing was started after adding, through the Spot-On port, a solution mix containing 11 μL of sample DNA library (previously mixed with 1 μL of the rapid adapter protein), 34 μL of sequencing buffer, 4.5 μL of nuclease-free water, and 25.5 μL of the loading beads (added immediately before use).
Nanopore read sequences were identified using the EPI2ME 16S analysis pipeline (EPI2ME Fastq 16S v21.03.05), which performs the Nucleotide Basic Local Alignment Search Tool (BLASTn) on each individual read against the NCBI 16S reference database. For taxonomical classification, minimum identity thresholds for species and the genus level were set at 99% and 95%, respectively.28 Additionally, only reads that returned the lowest common ancestor (lca) value of 0 during EPI2ME alignment were considered successfully classified, and sequences with a lca value of −1 or 1 were considered unclassified. For mock colony samples, the classified sequences were further separated into correctly classified and misclassified categories based on whether the classification correctly matched with known references.40 The classification of Sanger sequences was also performed by alignment against the NCBI 16S reference database using the BLASTn tool. The minimum identity threshold criteria used for taxonomic classification were the same as those used for nanopore sequences. Additionally, the top three alignment matches were examined for agreement to assign to a specific taxon (e.g., the top three alignment matches should be from the same genus for genus-level assignment). If an agreement was not reached, then the assignment was provided to the lowest common ancestor.
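The read-level filtering rules described above (identity thresholds of 99% for species and 95% for genus, and an lca value of 0 for a successful classification) can be written down as a small Python sketch. The `ReadHit` record and its field names are hypothetical and chosen for illustration; they are not the column names of the EPI2ME output.

```python
from dataclasses import dataclass

SPECIES_MIN_IDENTITY = 99.0  # percent, threshold stated in the text
GENUS_MIN_IDENTITY = 95.0    # percent, threshold stated in the text


@dataclass
class ReadHit:
    """Best alignment for one read (hypothetical record layout)."""
    genus: str
    species: str
    identity: float  # percent identity of the alignment
    lca: int         # 0 = classified; -1 or 1 = unclassified


def classify_read(hit, level="genus"):
    """Return the assigned taxon name, or None if the read is unclassified."""
    if level not in ("genus", "species"):
        raise ValueError("level must be 'genus' or 'species'")
    if hit.lca != 0:
        return None
    threshold = SPECIES_MIN_IDENTITY if level == "species" else GENUS_MIN_IDENTITY
    if hit.identity < threshold:
        return None
    return hit.species if level == "species" else hit.genus


hit = ReadHit("Micrococcus", "Micrococcus yunnanensis", identity=97.4, lca=0)
print(classify_read(hit, "genus"))    # Micrococcus
print(classify_read(hit, "species"))  # None
```

A read at 97.4% identity therefore passes the genus threshold but not the species threshold.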
149 and 163 552 read sequences, respectively. Table 1 summarizes the outcome for genus-level classification of both basecalling methods, including the number of reads, the percentage of total reads that were successfully classified, and the mean percent identity (Ī) of all sequences for each colony. We observed that switching from Fast to Bonito basecalling while maintaining a constant Quality Score (Q-score) threshold of 7 leads to an increase in the number of total reads and Ī, but the percentage of correct classification remains comparable. However, implementing the Bonito basecalling at a Q-score threshold of 13 and increasing the identity threshold (I) to 95%, considered to be optimal for taxonomic identification at the genus level,28 we noticed a substantial improvement in the percentage of correct classification (≥99.4% reads were correctly classified). It is to be noted that this improvement in classification accuracy was associated with a considerable drop in the total read number.25 The Bonito-basecalled sequences (Q ≥ 13 and I ≥ 95%) were taxonomically identified using the EPI2ME 16S workflow, which utilizes the NCBI 16S database as a reference database (Fig. 3). The calculation of relative abundances from these classified sequences showed a single bacterial genus for colonies 3–7 (relative abundance ≥ 99.5%) and their identities matched well with the findings from Sanger sequencing (Fig. 3). In the case of colony 1, for which Sanger sequencing failed to assign a taxonomic classification with an inferior quality electropherogram, nanopore sequencing indicated the presence of two dominant taxa, namely Alkalihalobacillus (87.1%) and Kocuria (10.9%). For colony 2 also, nanopore sequencing showed the presence of two bacteria, namely Micrococcus (68.4%) and Paraburkholderia (27.7%). Interestingly, Sanger sequencing of the colony classified it as Micrococcus with a high 97.4% identity and had an electropherogram with clean, distinct peaks.
Thus, Sanger sequencing not only failed to identify the less abundant bacterium in colony 2, but neither the electropherogram nor the sequence obtained suggested the potential presence of a second bacterial species.
| Colony | Fast, Q ≥ 7: Total | Fast, Q ≥ 7: Classified (%) | Fast, Q ≥ 7: Ī (%) | Bonito, Q ≥ 7: Total | Bonito, Q ≥ 7: Classified (%) | Bonito, Q ≥ 7: Ī (%) | Bonito, Q ≥ 13, I ≥ 95%: Total | Bonito, Q ≥ 13, I ≥ 95%: Classified (%) | Bonito, Q ≥ 13, I ≥ 95%: Ī (%) |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 19 270 | 89.5 | 86.2 | 24 753 | 89.6 | 91.4 | 2271 | 99.4 | 95.9 |
| 2 | 14 037 | 90.7 | 86.9 | 18 517 | 91.1 | 92.0 | 1894 | 99.7 | 96.2 |
| 3 | 12 063 | 96.0 | 87.5 | 15 469 | 93.9 | 92.5 | 1961 | 99.6 | 96.6 |
| 4 | 24 741 | 91.9 | 86.6 | 28 734 | 92.7 | 92.1 | 3345 | 99.9 | 96.3 |
| 5 | 15 598 | 96.0 | 87.4 | 19 182 | 93.9 | 92.5 | 2543 | 99.7 | 96.6 |
| 6 | 18 471 | 96.0 | 87.3 | 21 534 | 93.7 | 92.4 | 2910 | 99.8 | 96.6 |
| 7 | 28 969 | 94.3 | 87.0 | 35 363 | 92.6 | 92.5 | 4022 | 99.8 | 96.6 |

a The total number of reads, the percentage of reads successfully classified, and the mean percent identities are presented in the table. Q thresholds of 7 and 13 correspond to approximately an 80% and a 95% chance that each base is accurate, respectively. The classification was performed using the NCBI 16S database for reference. Abbreviations: quality score (Q), percent identity (I), and mean percent identity (Ī).
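The relationship between a Q-score threshold and per-base accuracy can be checked with the standard Phred relation, accuracy = 1 − 10^(−Q/10). The short Python sketch below assumes that the Q-scores used here are Phred-scaled; the function name is illustrative.

```python
def phred_to_accuracy(q):
    """Convert a Phred-scaled quality score to per-base accuracy (0 to 1)."""
    return 1.0 - 10.0 ** (-q / 10.0)


for q in (7, 13):
    print(f"Q{q}: {phred_to_accuracy(q):.1%}")
# Q7: 80.0%
# Q13: 95.0%
```

Because the scale is logarithmic, moving the threshold from Q7 to Q13 lowers the expected per-base error rate roughly fourfold, from about 20% to about 5%.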
Fig. 3 Relative abundances of bacterial genera in colonies 1–7, obtained by nanopore sequencing of 16S rDNA amplicons (Bonito basecalling; Q-score ≥ 13 and percent identity ≥ 95%).
:1, 1:1, and 1:9. These proportions were selected to compare the performance of the two sequencing techniques under scenarios where two bacterial species within a colony exist at a similar level of abundance and where the abundance of one species dominates over the other. Sanger sequencing was conducted in these samples and the results were compared against the pure DNA controls of A. baumannii and S. maltophilia (Fig. 4). The sequences from pure DNA controls of A. baumannii (sample A) and S. maltophilia (sample E) were classified correctly when aligned against the NCBI 16S reference database, although identity matches were 95.5% and 97.4%, respectively, allowing for genus level identification. The Sanger sequence generated from the sample with 90% A. baumannii and 10% S. maltophilia (sample B, Fig. 4) was classified as A. baumannii with 96.8% identity match, while the sequence generated from the sample with 10% A. baumannii and 90% S. maltophilia (sample D) had the closest alignment with S. maltophilia with 81.3% identity. For the sample with 50% of both A. baumannii and S. maltophilia (control sample C), the Sanger sequence was found to have the closest alignment with A. baumannii with a low 79.3% identity. Unlike samples A, B, and E, the lower identity match for samples C and D would allow the lowest taxonomic classification to the phylum level only. The electropherograms also showed distinct changes to different levels of DNA mixing (Fig. 4). The electropherogram of pure A. baumannii DNA (sample A) had clear and separated peaks, while pure S. maltophilia DNA (sample E) showed some peak overlap even though the alignment of the Sanger sequence showed a high percentage identity. In mixed samples, the presence of 10% S. maltophilia DNA made only a minimal change to the electropherogram signal of A. baumannii (sample B); however, the presence of 10% A. baumannii caused substantial degradation of the electropherogram signal of S. maltophilia (sample D).
Counterintuitively, though, equal mixing of the two DNAs (sample C) resulted in a relatively clean electropherogram, although the identity match of the Sanger sequence obtained from the electropherogram was lower than that of the other two mixed samples.
In parallel to Sanger sequencing, we performed nanopore sequencing of the samples from mock colonies. More than 99% of the Bonito-basecalled reads were correctly classified at the genus level (Q-score ≥ 13 and I ≥ 95%), and the performance was comparable for all samples irrespective of the proportions of DNA mixing. We found that the nanopore sequencing was able to successfully classify pure DNA samples (samples A and E) as well as mixed DNA samples (samples B, C, and D) with the relative abundances reflecting the proportion of mixing (Fig. 5). It is to be noted that for samples C–E, where the proportions of S. maltophilia were relatively higher, a small fraction of total reads was identified as Xanthomonas. Although S. maltophilia is phenotypically distinct from Xanthomonas species, at the rRNA gene level a high degree of sequence similarity exists between them;42 indeed, due to such sequence similarity, S. maltophilia was previously classified as Xanthomonas maltophilia.43 Their closeness in the rRNA gene sequence combined with the read accuracy limits for nanopore sequencing potentially resulted in the observed appearance of a small population of Xanthomonas in samples containing S. maltophilia genomic DNA. Since the species-level information of pure DNA samples was known for the control experiment, we also attempted species-level identification for the sequences by raising the threshold of I to ≥99%.44 The new threshold criteria reduced the total number of passed reads to 5148 (Table 2). We found that for the pure A. baumannii and S. maltophilia samples, 97.4% and 92.1% of the correctly classified sequence reads from nanopore sequencing accurately matched to the species level. A similar degree of accuracy was maintained in the species-level identification of mixed samples.
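The relative abundances discussed above are, in essence, per-genus read counts divided by the total number of classified reads. A minimal Python sketch of that calculation follows, assuming a list with one genus label per classified read as input; the function name and the toy input are illustrative and do not reproduce the study's data.

```python
from collections import Counter


def relative_abundance(genus_labels):
    """Return {genus: percent of classified reads}, most abundant first."""
    counts = Counter(genus_labels)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {genus: 100.0 * n / total for genus, n in counts.most_common()}


# Toy input: nine reads of one genus and one read of another
labels = ["Acinetobacter"] * 9 + ["Stenotrophomonas"]
print(relative_abundance(labels))
# {'Acinetobacter': 90.0, 'Stenotrophomonas': 10.0}
```

This read-count ratio does not correct for differences in 16S gene copy number between species, so it should be read as an estimate of amplicon proportions rather than of cell numbers.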
Fig. 5 Relative abundances of bacterial genera in mock colony samples A–E, obtained by nanopore sequencing of 16S amplicons (Bonito basecalling; Q-score ≥ 13 and percent identity ≥ 95%).
| Mock colony | Genus-level (Q ≥ 13, I ≥ 95%): Total | Genus-level: CC (%) | Genus-level: MC (%) | Genus-level: UC (%) | Genus-level: Ī (%) | Species-level (Q ≥ 13, I ≥ 99%): Total | Species-level: CC (%) | Species-level: MC (%) | Species-level: UC (%) | Species-level: Ī (%) |
|---|---|---|---|---|---|---|---|---|---|---|
| A | 41 158 | 99.9 | <0.1 | 0.1 | 97.3 | 573 | 94.1 | 2.5 | 3.4 | 99.2 |
| B | 79 618 | 99.7 | <0.1 | 0.3 | 97.2 | 1419 | 86.2 | 13.6 | 2.2 | 99.3 |
| C | 73 554 | 99.6 | <0.1 | 0.4 | 97.3 | 1266 | 88.2 | 10.2 | 1.6 | 99.3 |
| D | 84 019 | 99.9 | <0.1 | 0.1 | 97.3 | 1233 | 91.3 | 4.0 | 4.7 | 99.2 |
| E | 40 366 | 99.6 | <0.1 | 0.4 | 97.3 | 657 | 85.5 | 5.0 | 9.5 | 99.2 |

a The table shows the total number of reads along with the percentage of reads Correctly Classified (CC), Misclassified (MC), and Unclassified (UC) for both genus and species-level classification (Bonito basecalling). The classification was performed using the NCBI 16S database for reference. Abbreviations: Q-score (Q), percent identity (I), and mean percent identity (Ī).
| Sample type | Sample | NCBI accession no. | Taxonomic classification | Percent identity (%) |
|---|---|---|---|---|
| Mock colony | A | NR_113237.1 | Acinetobacter baumannii | 100 |
| Mock colony | B | NR_113237.1 | Acinetobacter baumannii | 100 |
| Mock colony | C | NR_113237.1 | Acinetobacter baumannii | 100 |
| Mock colony | C | NR_112030.1 | Stenotrophomonas maltophilia | 98.04 |
| Mock colony | D | NR_112030.1 | Stenotrophomonas maltophilia | 99.93 |
| Mock colony | E | NR_112030.1 | Stenotrophomonas maltophilia | 99.93 |
| Bioaerosol-derived colony | 1 | NR_108311.1 | Alkalihalobacillus rhizosphaerae | 99.1 |
| Bioaerosol-derived colony | 1 | NR_025723.1 | Kocuria marina | 99.86 |
| Bioaerosol-derived colony | 2 | NR_116578.1 | Micrococcus yunnanensis | 99.65 |
| Bioaerosol-derived colony | 2 | NR_025058.1 | Paraburkholderia fungorum | 99.73 |
| Bioaerosol-derived colony | 3 | NR_115526.1 | Bacillus cereus | 100 |
| Bioaerosol-derived colony | 4 | NR_112628.1 | Lysinibacillus fusiformis | 99.66 |
| Bioaerosol-derived colony | 5 | NR_114581.1 | Bacillus thuringiensis | 100 |
| Bioaerosol-derived colony | 6 | NR_114581.1 | Bacillus thuringiensis | 99.93 |
| Bioaerosol-derived colony | 7 | NR_042337.1 | Bacillus altitudinis | 99.87 |
After successful construction of consensus sequences of 16S amplicons from mock colonies, we aimed to implement the tool on the data from bioaerosol-derived colonies. The consensus sequences obtained for bioaerosol samples were found to have >99% identity match with sequences in the NCBI 16S database, enabling species-level identification (Table 3). For example, the top three matches of the consensus sequences for colonies 3, 5, and 6 were found to be strains of B. cereus, B. thuringiensis, and B. thuringiensis, respectively, with the percent identity ranging from 100% to 99.73%, which corresponds to a mismatch of 0–4 bases for a ∼1.5 kb amplicon. Moreover, for colonies 1 and 2, consensus sequences of both bacteria that contributed to each of these mixed colonies could be generated, maintaining a similarly high identity match to the NCBI database. It is to be noted that the total number of nanopore sequence reads for bioaerosol-derived colonies was substantially lower than the reads from the mock colonies (compare the Q ≥ 13 and I ≥ 95% read numbers in Tables 1 and 2), but that did not adversely impact the quality of the consensus sequence. Taken together, our results indicate that 16S reads from nanopore sequencing could be utilized not only to identify individual bacteria from mixed bacterial colonies and estimate their proportional abundance but also to construct a full-length amplicon sequence with high accuracy.
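The correspondence between percent identity and the number of mismatched bases quoted above can be verified with a short calculation. The Python sketch below assumes an aligned length of 1500 bases (the text gives only "∼1.5 kb") and treats every non-identical position as a single mismatch; the function name is illustrative.

```python
def expected_mismatches(percent_identity, aligned_length=1500):
    """Approximate number of non-identical positions for a given identity."""
    return round((100.0 - percent_identity) / 100.0 * aligned_length)


for identity in (100.0, 99.93, 99.73):
    print(identity, expected_mismatches(identity))
# 100.0 0
# 99.93 1
# 99.73 4
```

Under these assumptions, an identity range of 100% to 99.73% maps to 0–4 mismatched bases, in line with the figure given in the text.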
While several studies have compared nanopore sequencing with NGS such as Illumina sequencing,40,46,47 relatively few studies are directed toward the comparison of MinION sequencing and Sanger sequencing.48 The interest in using nanopore sequencing as a potential alternative sequencing tool to Sanger sequencing is primarily for amplicon-based assays, commonly used in forensic genetics or tracking species in the field.45,48 Since Sanger and nanopore sequencing are based on completely different technologies and generate different forms of output, the comparison is not straightforward and is done through the generation of a consensus sequence from nanopore read sequences.27 Among the few comparisons made for targeted amplicon analysis, Vasiljevic et al. reported using nanopore sequencing to identify animal species via the species-diagnostic region of the mitochondrial cytochrome b (mtDNA cyt b) gene with an amplicon length of approximately 421 bp.49 Their results showed that the consensus sequences derived from nanopore sequencing were remarkably close to Sanger sequences with a deviation of not more than 1 bp. It is to be noted that the performance of Sanger sequencing starts to decrease for longer sequence lengths (>1000 bases), and it could therefore have less accuracy in sequencing the full-length 16S gene (∼1.5 kb).50 Not surprisingly, we found that Sanger method-derived 16S sequences had <98% maximum identity match against the NCBI database for both pure bacterial DNA and cultured colonies, preventing a species-level classification. The nanopore sequencing technology, however, is not constrained by such limitations as the read accuracy is independent of the length of DNA fragments sequenced. This is further supported by our findings that 16S consensus sequences generated from nanopore reads of pure DNA samples of A. baumannii and S. maltophilia strains deviate by only one base from the maximally aligned sequences in the NCBI 16S database.
Sanger sequencing is still a default choice in many fields for the identification and comparison of homogeneous genetic material, particularly when amplicons of sub-thousand base pairs are used for characterization.26 The technique can also be applied to mixed microbial samples through colony culture and isolation of individual taxa from distinct colonies. In this work, we found that the Sanger sequence of 16S amplicons from colony 1 has a low identity match and a noisy electropherogram with overlapping peaks, raising the suspicion of the presence of more than one bacterial species. Standard microbiological practice can address this by subculturing the culture isolate, followed by Sanger sequencing of the pure colonies generated. Sanger sequencing for colonies 2–7 demonstrated a clean electropherogram with distinct peaks and the sequences had an identity match of ≥97% against the NCBI 16S database. For colonies 3–7, the genus level identification from Sanger sequencing matched accurately with the nanopore sequencing result. However, for colony 2, which was identified as Micrococcus by Sanger sequencing, nanopore sequencing revealed that almost one third of the amplicons are from the taxonomically distant Paraburkholderia. The mock colony experiment using pure bacterial DNA further confirmed that the presence of additional taxa in a sample has a variable impact on electropherogram quality and percent identity match for the dominant taxa, and such effects are taxa specific. We observed that the presence of 10% S. maltophilia in A. baumannii genomic DNA had minimal impact on the percent identity match for A. baumannii; however, the presence of 10% A. baumannii in S. maltophilia genomic DNA resulted in a drastic reduction in the identity match for S. maltophilia (97.4% to 81.3%) along with a conspicuous drop in the electropherogram quality.
These results together suggest that the Sanger method has a limited ability to discriminate from 16S amplicons whether a bacterial colony is a true homogeneous colony (as in colonies 3–7) or a multispecies colony (as in colony 2). This uncertainty can pose a serious limitation on the applicability of Sanger sequencing in colony identification when there is potential for multispecies colony formation.42
Nanopore sequencing technology is emerging fast in the landscape of sequencing and is being applied for an increasing range of applications, including whole genome sequencing, microbiome analysis, and transcriptome analysis.34,36,37,51 Underlying the rapid growth and increased acceptance of this technology is a continual advancement in the sequencing platform and basecalling algorithm, increased availability of protocols and bioinformatics tools for analysis, along with benchmarking studies confirming the robustness and reliability of performance.52 Indeed, when comparing the preexisting Fast basecalling with the more recently introduced Bonito basecalling, we observed a substantial improvement in the quality of sequence reads (Ī increased by ∼5%) along with a larger number of reads passing a preset quality threshold. Implementing an appropriate quality threshold (Q-score ≥ 13, I ≥ 95%), we were able to achieve accurate genus-level classification. However, the biggest advantage of nanopore sequencing over the Sanger method in the context of colony identification comes from its ability to identify all bacterial taxa present in the colony as in the case of colonies 1 and 2. Additionally, a control experiment with pure genomic DNA demonstrated that the relative abundance of bacteria observed by the nanopore sequencing reasonably reflects their proportion in a mixed sample, although some deviation can result from the variability in 16S gene copies present among bacterial species.22,53
The capability of long-read sequencing by the MinION nanopore or PacBio sequencing platform offers a clear advantage over short-read sequencing technologies for applications in taxonomical identification or classification through targeted gene amplification. For the 16S gene, sequences longer than 1300 bases are considered to be suitable for reliable results.20 However, 16S taxonomical classification by Illumina-based sequencing usually targets the hypervariable regions V4, V3–V4, or V4–V5 of the 16S gene due to the limitation of this technique to read only a short span of the 16S sequence.29,30 Such a restriction imposed on the amplicon length allows identification only up to the genus level. While near full-length 16S sequencing on the Illumina platform has been achieved by using unique, random sequences to tag individual 16S gene templates, the long, complex procedure is not practical for routine implementation.54 Long-read sequencing enables the analysis of full-length 16S gene amplicons, and such coverage has been shown to successfully identify microbiota to species-level resolution.30 A recent study using the PacBio long-read sequencing platform achieved a read accuracy to the single nucleotide level through the construction of circular consensus sequences (CCSs), followed by the implementation of an advanced algorithm for analysis that enabled strain level identification.55 Although this result is highly accurate, the need for expensive equipment and a relatively complex sample preparation and analysis process, along with the higher cost associated with sequencing, make it less suitable for routine identification of bacterial colonies.
In our work, we found that the implementation of Bonito basecalling followed by selection of high-quality reads (Q-score ≥ 13 and I ≥ 99%) enabled successful species-level classification of 97.4% and 92.1% of the amplicon sequences derived from pure DNA samples of A. baumannii and S. maltophilia, respectively. It is to be noted that an identity threshold of ∼99% is recommended for species-level identification from the full-length 16S sequence; such a high threshold leads to the rejection of a significant proportion of reads and therefore requires a larger number of raw reads for analysis. Our results show that the construction of consensus sequences would be an attractive alternative strategy for species-level identification, where a highly accurate (>99%) identity match was observed even for mixed colony samples.44 Interestingly, this high accuracy match with the NCBI 16S database (only 0–4 base mismatch for top matches) enabled the species-level classification of Bacillus colonies, which is otherwise known to be an extremely challenging task to accomplish through the 16S sequencing approach. Moreover, the quality of the consensus sequence was maintained even with a smaller number of reads per sample (as observed with bioaerosol-derived colonies), which could be highly advantageous in reducing the sequencing cost by enabling a larger number of samples to be sequenced per flow cell or utilization of Flongle, a more affordable option for nanopore sequencing. One limitation of this consensus sequence-based identification approach is that it could miss a bacterial species present at very low abundance in a multispecies culture, as we observed for mock mixed colonies B and D, where only the species contributing 90% of the total DNA (A. baumannii and S. maltophilia, respectively) was identified.
Emerging bodies of work suggest that airborne microbes can pose multiple health risks that include transmission of infectious diseases, triggering of chronic diseases, and the spread of antimicrobial resistance genes.16,56 Microbes in the air exist both as single organisms and as aggregates, often bound to particulate matter. Due to their low numbers in the air, bioaerosol samples are usually plated directly on agar media for culture-based assays to study viable microorganisms. Since microbes can exist as aggregates in bioaerosols,14,15 such an aggregated form can potentially lead to the formation of multispecies colonies, and therefore, such possibilities should be considered when analyzing single colonies. While adjusting the culture conditions (e.g., temperature and incubation period) and subculturing could help in the generation of pure isolates and subsequent identification by the Sanger sequencing method, our results show that nanopore sequencing could enable fast and accurate identification of bacterial taxa in such samples. It is to be noted that even though nanopore sequencing costs can be substantially reduced by multiplexing samples to run on a single flow cell, the classical microbiological approach of subculturing to generate pure colonies of single bacterial species followed by Sanger sequencing would remain a more cost-effective approach for routine colony identification. However, with continued advancements and innovations in nanopore sequencing technology, the cost might come down substantially in the near future to be competitive with Sanger sequencing cost. Furthermore, the remarkable improvements in nanopore sequencing accuracy in recent years, the availability of a streamlined protocol for 16S analysis, and the ability to conduct sequencing experiments in the lab could make this technology a powerful tool for culture-based assays of bioaerosol or other complex environmental samples.
Footnote
† These authors contributed equally to this work.
This journal is © The Royal Society of Chemistry 2024