A Practical Primer to BLAST Sequence Similarity Searching

Literature database systems rely on the fact that each component of a record is immutable. The title of an article, and where, when, and by whom it was published, will not change. Each journal article record in a literature database is therefore unique, and search engines are built to match text in a query to identical text in a record. The same uniqueness holds for the basic entities of most other sciences. In physics, a muon is a muon. In chemistry, a carbon atom is a carbon atom. Search algorithms in these disciplines are built upon this concept, whether they are text-based or structure-based. Biological systems, however, are fundamentally different because they generate and maintain diversity. No two organisms are exactly identical, and that includes their genome and gene sequences. Even within a single organism, the sequence of the same gene or protein can vary between cellular, tissue, or organ compartments. If one were to use a DNA or protein sequence as a query in a search engine based upon identity, as literature databases are, the only sequence retrieved would be the query sequence itself. Thus, in bioinformatics, searches of sequence records are based upon sequence similarity rather than identity.
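The contrast between identity-based and similarity-based retrieval can be made concrete with a small Python sketch. The sequences, record names, and the 90% threshold below are invented for illustration, and the scoring is a plain ungapped percent identity over equal-length sequences. It is not the BLAST algorithm, which uses heuristic local alignment with scoring matrices; the sketch only shows why an exact-match search returns nothing but the query.

```python
# Toy database of equal-length DNA sequences (hypothetical, for illustration only).
DATABASE = {
    "seq_a": "ATGGCCATTGTAATGGGCCGC",
    "seq_b": "ATGGCCATTGTTATGGGCCGC",  # one substitution relative to seq_a
    "seq_c": "ATGGCGATTGTAATGGGACGC",  # two substitutions relative to seq_a
    "seq_d": "T" * 21,                 # unrelated sequence
}


def identity_search(query, db):
    """Return names of records whose sequence matches the query exactly."""
    return [name for name, seq in db.items() if seq == query]


def percent_identity(a, b):
    """Percentage of positions at which two equal-length sequences agree."""
    if len(a) != len(b):
        raise ValueError("this sketch only compares equal-length sequences")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def similarity_search(query, db, min_identity=90.0):
    """Return (name, percent identity) pairs at or above the threshold, best first."""
    hits = [(name, percent_identity(query, seq)) for name, seq in db.items()]
    hits = [hit for hit in hits if hit[1] >= min_identity]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


query = DATABASE["seq_a"]
exact_hits = identity_search(query, DATABASE)
similar_hits = similarity_search(query, DATABASE)

print("identity search:  ", exact_hits)
for name, pid in similar_hits:
    print(f"similarity search: {name} ({pid:.1f}% identity)")
```

With these toy records, the identity search retrieves only the record that equals the query, while the similarity search also retrieves the two close variants and still excludes the unrelated sequence.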

