2. INTRODUCTION
An important goal of genomics and proteomics is to determine
if a particular sequence is like another sequence. This is
accomplished by comparing the new sequence with sequences
that have already been reported and stored in a database.
This process is principally one that uses alignment procedures
to uncover the “like” sequence in the database.
The alignment process will uncover those regions that are
identical or closely similar and those regions with little (or
no) similarity.
Two alignment types are used: global, which aligns sequences over
their entire length, and local, which finds the best-matching subregions.
3. BLAST
BLAST stands for Basic Local Alignment Search Tool.
BLAST was developed by Stephen Altschul, Warren
Gish, Webb Miller, Eugene Myers, and David J. Lipman at
NCBI in 1990.
It is a local alignment tool.
It helps to find regions of local similarity between sequences.
The program compares nucleotide or protein sequences to sequence
databases and calculates the statistical significance of matches.
BLAST can be used to infer functional and evolutionary
relationships between sequences as well as help identify members of
gene families.
7. STEPS
Specifying A Sequence Of Interest
Selecting BLAST Program
Selecting Database
Selecting Optional Parameters
Selecting Formatting Parameters
8. PROCESS
The first step of the BLAST algorithm is to break the query
into short words of a specific length.
For example, twelve amino acids near the amino terminus of the
Arabidopsis thaliana protein phosphoglucomutase sequence are:
NYLENFVQATFN
This sequence is broken down into overlapping three-character words by
taking the first three amino acids, then sliding the window forward
one residue at a time.
NYL YLE LEN ENF NFV FVQ VQA QAT ATF TFN
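The word-splitting step can be sketched in a few lines of Python. This is a minimal illustration, not BLAST's own code; the function name split_into_words and the default word size of 3 are assumptions chosen to match the example above.

```python
def split_into_words(sequence, word_size=3):
    """Return every overlapping word of length word_size, in order.

    Hypothetical helper for illustration; word_size=3 mirrors the
    three-character words used in the example above.
    """
    return [
        sequence[i:i + word_size]
        for i in range(len(sequence) - word_size + 1)
    ]


words = split_into_words("NYLENFVQATFN")
print(" ".join(words))
# NYL YLE LEN ENF NFV FVQ VQA QAT ATF TFN
```

A sequence of length n yields n - 2 three-character words, so the twelve residues here give ten words.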
These words are then compared against a sequence in a
database.
For example, a word match with rabbit muscle phosphoglucomutase:
Query ENF
Subject SSTNYAENTIQSIISTVEPAQR
9. This search is performed for all words. Words that score above
the threshold (a T value greater than 18 in this example) are then
used to extend the alignment.
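The thresholded word search can be sketched as follows. This is only an illustration under stated assumptions: real BLAST scores protein words with a substitution matrix such as BLOSUM62, whereas this sketch uses a toy scheme (+5 for an identical residue, -2 for a mismatch) and a toy threshold of 7, so its numbers are not comparable with the T value of 18 quoted above. The names word_score and find_word_hits are hypothetical.

```python
def word_score(a, b, match=5, mismatch=-2):
    """Toy word score (assumption): NOT a real substitution matrix."""
    return sum(match if x == y else mismatch for x, y in zip(a, b))


def find_word_hits(query, subject, word_size=3, threshold=7):
    """List (query_pos, subject_pos, query_word, subject_word, score)
    for every word pair whose toy score is greater than threshold."""
    hits = []
    for q in range(len(query) - word_size + 1):
        q_word = query[q:q + word_size]
        for s in range(len(subject) - word_size + 1):
            s_word = subject[s:s + word_size]
            score = word_score(q_word, s_word)
            if score > threshold:
                hits.append((q, s, q_word, s_word, score))
    return hits


hits = find_word_hits("NYLENFVQATFN", "SSTNYAENTIQSIISTVEPAQR")
for hit in hits:
    if hit[2] == "ENF":
        print(hit)
```

Under this toy scheme the query word ENF pairs with the subject word ENT (two identical residues, one mismatch), which is the word match shown in the example above. Positions are counted from zero.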
For every pair of sequences (query and subject) that have a
word or words in common, BLAST extends the alignment in
both directions for as long as the alignment score keeps
increasing (the sequences remain similar), and stops once the
score begins to decrease.
For example, consider the following alignment between the A. thaliana
and rabbit muscle phosphoglucomutase:
Query NYLENFVQATFNALTAEKV
NY ENF+Q + + + +
Subject NYAENTIQSIISTVEPAQR
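The extension step can be sketched as an ungapped extension from a seed word. This is a simplified illustration, not BLAST's implementation: it assumes the same toy scoring as a stand-in for a substitution matrix (+5 identical, -2 mismatch) and a hypothetical drop-off parameter x_drop that stops the extension once the running score falls too far below the best score seen. The function name extend_seed is also hypothetical.

```python
def extend_seed(query, subject, q_start, s_start, word_size=3,
                match=5, mismatch=-2, x_drop=6):
    """Extend a seed word left and right without gaps.

    Returns (score, aligned_query_segment, aligned_subject_segment).
    Scoring values and x_drop are illustrative assumptions.
    """
    def pair(a, b):
        return match if a == b else mismatch

    seed_score = sum(pair(query[q_start + i], subject[s_start + i])
                     for i in range(word_size))

    def extend(q, s, step):
        best = running = 0
        best_len = length = 0
        while 0 <= q < len(query) and 0 <= s < len(subject):
            running += pair(query[q], subject[s])
            length += 1
            if running > best:
                best, best_len = running, length   # new best extension
            elif best - running > x_drop:
                break                              # score dropped too far
            q += step
            s += step
        return best, best_len

    right_gain, right_len = extend(q_start + word_size, s_start + word_size, 1)
    left_gain, left_len = extend(q_start - 1, s_start - 1, -1)

    q_from = q_start - left_len
    q_to = q_start + word_size + right_len
    s_from = s_start - left_len
    total = seed_score + left_gain + right_gain
    return total, query[q_from:q_to], subject[s_from:s_from + (q_to - q_from)]


# Seed: query word ENF (position 3) against subject word ENT (position 3).
print(extend_seed("NYLENFVQATFNALTAEKV", "NYAENTIQSIISTVEPAQR", 3, 3))
# (19, 'NYLENFVQ', 'NYAENTIQ')
```

The extension keeps only the stretch that gave the best score in each direction, so trailing residues that lowered the score are trimmed from the reported segment.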
10. Once this alignment process is completed for the query and each
subject sequence in the database, a report is generated. This
report provides a list of those alignments (default size of 50)
with a score greater than the cutoff value S.
Each alignment whose score is above the cutoff is called a
High-scoring Segment Pair (HSP).
For each alignment reported, an Expect (E) value is also reported.
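As a rough illustration of how an Expect value relates to a score, the sketch below uses the commonly cited relation E = m * n * 2^(-S'), where S' is the bit score, m is the query length and n is the total length of the database. This is a simplification: the function name expect_value is hypothetical, and the length corrections that BLAST applies to m and n (effective lengths) are ignored here.

```python
def expect_value(bit_score, query_length, database_length):
    """Approximate E value from a bit score.

    Assumes E = m * n * 2**(-bit_score) with raw (uncorrected) lengths;
    BLAST itself uses effective lengths, which this sketch ignores.
    """
    return query_length * database_length * 2.0 ** (-bit_score)


# A higher bit score gives a smaller E value for the same search space.
low = expect_value(40, query_length=300, database_length=10_000_000)
high = expect_value(50, query_length=300, database_length=10_000_000)
print(low, high)
```

Because the search space m * n appears as a factor, the same bit score gives a larger E value when the database grows.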
11. BLAST OUTPUT
BLAST output is displayed in three formats:
A. Graphical display: shows where the query is similar to other
sequences.
B. Hit list: number of sequences similar to query, ranked by
similarity.
C. Alignment: every alignment between the query and the
reported hits.
12. BLAST OUTPUT
A. GRAPHICAL DISPLAY
• The query sequence is at the top, with a colour key for alignment scores.
• Each bar represents the portion of another sequence that is similar to your query sequence:
Red bars: most similar sequences (score of 200 or more).
Pink bars: less good matches (80 to 200).
Green bars: unimpressive matches (50 to 80).
Blue bars: poor scores (40 to 50).
Black bars: the weakest hits (below 40).
13. BLAST OUTPUT
B. HIT LIST
1 - This portion of each description links to the sequence record for a particular hit.
2 - Score or bit score is a value calculated from the number of gaps and substitutions
associated with each aligned sequence. The higher the score, the more significant the
alignment.
3 - E Value (Expect Value) describes the number of alignments with a similar or better
score that would be expected to occur in the database by chance. The smaller the E Value,
the more significant the alignment.
4 - These links provide the user with direct access from BLAST results to related
entries in other databases. ‘L’ links to LocusLink records and ‘S’ links to structure
records in NCBI's Molecular Modeling Database (MMDB).
21. APPLICATIONS
• BLAST can be used for several purposes. These include:
Identifying Species
Establishing Phylogeny
DNA Mapping
Locating Domains