Introduction
This online tutorial is designed to help the first-time BLAST user.
It teaches you to enter a sequence on the Basic BLAST web page,
choose a program and database, and examine the results.
The core of NCBI's BLAST services is BLAST 2.0, otherwise known as "Gapped
BLAST". This service is designed to take protein and nucleic acid
sequences and compare them against a selection of NCBI databases.
The BLAST algorithm was designed to balance speed with increased sensitivity
to distant sequence relationships. Instead of relying on global
alignments (commonly seen in multiple sequence alignment programs),
BLAST emphasizes regions of local alignment to detect relationships among
sequences that share only isolated regions of similarity (Altschul et
al., 1990). BLAST is therefore more than a tool for viewing aligned
sequences or finding homology; it is a program for locating
regions of sequence similarity with a view to comparing structure
and function.
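To make the idea of a local alignment concrete, the sketch below scores the best-matching region shared by two sequences. It uses the classic Smith-Waterman dynamic programming method, which is not the heuristic that BLAST itself runs, and the scoring values (match, mismatch, gap) are arbitrary assumptions chosen only for illustration.

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score between a and b.

    Smith-Waterman scoring: any cell may restart at 0, so unrelated
    flanking sequence does not drag down a well-matched region.
    """
    previous = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        current = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            current[j] = max(
                0,                      # start a new local alignment here
                previous[j - 1] + pair, # extend by aligning a[i-1] with b[j-1]
                previous[j] + gap,      # gap in b
                current[j - 1] + gap,   # gap in a
            )
            best = max(best, current[j])
        previous = current
    return best


# Two sequences that share only an isolated region (ACGTACGT).
print(local_alignment_score("TTTTACGTACGTTTTT", "GGGGACGTACGTCCCC"))
```

A global alignment would be penalized for the unrelated flanking bases, whereas the local score reflects only the shared region.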
Selecting the BLAST Program
The BLAST search pages allow you to select from several different programs.
Below is a table of these programs.
Program | Description
blastp | Compares an amino acid query sequence against a protein sequence database.
blastn | Compares a nucleotide query sequence against a nucleotide sequence database.
blastx | Compares a nucleotide query sequence translated in all reading frames against a protein sequence database. You could use this option to find potential translation products of an unknown nucleotide sequence.
tblastn | Compares a protein query sequence against a nucleotide sequence database dynamically translated in all reading frames.
tblastx | Compares the six-frame translations of a nucleotide query sequence against the six-frame translations of a nucleotide sequence database. Please note that the tblastx program cannot be used with the nr database on the BLAST Web page because it is computationally intensive.
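The phrase "translated in all reading frames" can be illustrated with a short sketch. The code below produces the six conceptual translations of a nucleotide sequence (three offsets on each strand). It assumes the standard genetic code and plain A, C, G, T input; the function names and frame labels are hypothetical, and this is not how the BLAST programs themselves are implemented.

```python
from itertools import product

# Standard genetic code, listed in the conventional TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an A/C/G/T string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons only; unrecognized codons become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translations(dna):
    """Map frame labels (+1..+3, -1..-3) to their translations."""
    dna = dna.upper()
    frames = {}
    for strand, sequence in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(sequence[offset:])
    return frames


for label, protein in six_frame_translations("ATGGCCTAA").items():
    print(label, protein)
```

Because every frame must be compared, translated searches do roughly six times the work per sequence, which is consistent with the note that tblastx is computationally intensive.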
To select a BLAST program for your search:
1. Open the Basic BLAST search page.
2. From the "Program" pull-down menu, select the appropriate program.
Figure 1. Using the pull-down menu to select a BLAST program.
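The program table above can also be read as a lookup from the type of query and the type of database to a program name. The following is a minimal sketch of that lookup; the function name and the translate_both flag are hypothetical and exist only for this example.

```python
# (query type, database type) -> program, taken from the program table.
BLAST_PROGRAMS = {
    ("protein", "protein"): "blastp",
    ("nucleotide", "nucleotide"): "blastn",
    ("nucleotide", "protein"): "blastx",
    ("protein", "nucleotide"): "tblastn",
}


def choose_program(query_type, database_type, translate_both=False):
    """Pick a BLAST program name for the given query and database types.

    translate_both=True requests tblastx, which compares six-frame
    translations of a nucleotide query and a nucleotide database.
    """
    if translate_both:
        if (query_type, database_type) != ("nucleotide", "nucleotide"):
            raise ValueError(
                "tblastx needs a nucleotide query and a nucleotide database"
            )
        return "tblastx"
    return BLAST_PROGRAMS[(query_type, database_type)]


print(choose_program("nucleotide", "protein"))
print(choose_program("nucleotide", "nucleotide", translate_both=True))
```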
Selecting the BLAST Database
You can choose from several NCBI databases to compare your query sequences against. Note that some databases contain only protein or only nucleotide sequences and cannot be combined with certain BLAST programs. For example, a blastn search cannot be run against swissprot, because blastn requires a nucleotide database.
Proteins
Database | Description
nr | All non-redundant GenBank CDS translations+PDB+SwissProt+PIR+PRF
month | All new or revised GenBank CDS translations+PDB+SwissProt+PIR released in the last 30 days.
swissprot | The last major release of the SWISS-PROT protein sequence database (no updates). Releases are uploaded to our system when they are received from EMBL.
patents | Protein sequences derived from the Patent division of GenBank.
yeast | Yeast (Saccharomyces cerevisiae) protein sequences. This database is not to be confused with a listing of all yeast protein sequences. It is a database of the protein translations of the complete yeast genome.
E. coli | E. coli (Escherichia coli) genomic CDS translations.
pdb | Sequences derived from the 3-dimensional structures in the Brookhaven Protein Data Bank.
kabat [kabatpro] | Kabat's database of sequences of immunological interest. For more information, see http://immuno.bme.nwu.edu/
alu | Translations of select Alu repeats from REPBASE, suitable for masking Alu repeats from query sequences. It is available at ftp://ncbi.nlm.nih.gov/pub/jmc/alu. See "Alu alert" by Claverie and Makalowski, Nature vol. 371, page 752 (1994).
Nucleotides
Database | Description
nr | All non-redundant GenBank+EMBL+DDBJ+PDB sequences (but no EST, STS, GSS, or HTGS sequences).
month | All new or revised GenBank+EMBL+DDBJ+PDB sequences released in the last 30 days.
dbest | Non-redundant database of GenBank+EMBL+DDBJ EST Divisions.
dbsts | Non-redundant database of GenBank+EMBL+DDBJ STS Divisions.
mouse ests | The non-redundant database of GenBank+EMBL+DDBJ EST Divisions limited to the organism mouse.
human ests | The non-redundant database of GenBank+EMBL+DDBJ EST Divisions limited to the organism human.
other ests | The non-redundant database of GenBank+EMBL+DDBJ EST Divisions for all organisms except mouse and human.
yeast | Yeast (Saccharomyces cerevisiae) genomic nucleotide sequences. Not a collection of all yeast nucleotide sequences, but the sequence fragments from the complete yeast genome.
E. coli | E. coli (Escherichia coli) genomic nucleotide sequences.
pdb | Sequences derived from the 3-dimensional structure of proteins.
kabat [kabatnuc] | Kabat's database of sequences of immunological interest. For more information, see http://immuno.bme.nwu.edu/
patents | Nucleotide sequences derived from the Patent division of GenBank.
vector | Vector subset of GenBank(R), NCBI (in the ftp://ncbi.nlm.nih.gov/pub/blast/db/ directory).
mito | Database of mitochondrial sequences (Rel. 1.0, July 1995).
alu | Select Alu repeats from REPBASE, suitable for masking Alu repeats from query sequences. It is available at ftp://ncbi.nlm.nih.gov/pub/jmc/alu. See "Alu alert" by Claverie and Makalowski, Nature vol. 371, page 752 (1994).
epd | Eukaryotic Promoter Database, ISREC in Epalinges s/Lausanne (Switzerland).
gss | Genome Survey Sequence, includes single-pass genomic data, exon-trapped sequences, and Alu PCR sequences.
htgs | High Throughput Genomic Sequences.
Figure 2. Using the pull-down menu to select the BLAST database.
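The rule that a program must be paired with a database of the right type can be checked before a search is submitted. The sketch below encodes the two database tables above as sets and tests a program/database pair. The set contents follow the tables, with kabat listed without its bracketed suffix; the function name is hypothetical, and the exact labels shown in the web page's menu may differ.

```python
PROTEIN_DATABASES = {
    "nr", "month", "swissprot", "patents", "yeast", "E. coli", "pdb",
    "kabat", "alu",
}
NUCLEOTIDE_DATABASES = {
    "nr", "month", "dbest", "dbsts", "mouse ests", "human ests",
    "other ests", "yeast", "E. coli", "pdb", "kabat", "patents",
    "vector", "mito", "alu", "epd", "gss", "htgs",
}

# The kind of database each program searches, per the program table.
DATABASE_TYPE_FOR_PROGRAM = {
    "blastp": "protein",
    "blastx": "protein",
    "blastn": "nucleotide",
    "tblastn": "nucleotide",
    "tblastx": "nucleotide",
}


def is_valid_combination(program, database):
    """Return True if the program can search the named database."""
    if program == "tblastx" and database == "nr":
        # The tutorial notes that tblastx cannot be used with nr on the web page.
        return False
    if DATABASE_TYPE_FOR_PROGRAM[program] == "protein":
        return database in PROTEIN_DATABASES
    return database in NUCLEOTIDE_DATABASES


print(is_valid_combination("blastn", "swissprot"))
print(is_valid_combination("blastp", "swissprot"))
```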
Entering your Sequence
The BLAST web pages accept input sequences in three formats: FASTA sequence format, NCBI Accession numbers, or GIs.
FASTA Format
A description of the FASTA format is located on the Basic BLAST search pages.
1. Open your FASTA formatted sequence in a text editor as plain text.
2. Use your mouse to highlight the entire sequence.
3. Select Edit/Copy from the menu in your text editor.
4. Go to the BLAST search page in your web browser.
5. Use your mouse to select the main input field titled "Enter your input data here", by clicking it once.
6. Select Edit/Paste from the browser's menu.
7. You should now see your FASTA sequence in this field.
8. Set the pull down menu to "Sequence in FASTA format".
Figure 3. Example of a FASTA sequence in the input field.
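Before pasting, it can help to confirm that a file really is FASTA formatted. The following minimal sketch assumes the usual layout, in which each record starts with a description line beginning with ">" followed by one or more lines of sequence. The function name is hypothetical, and the parser is stricter than a web form needs to be.

```python
def parse_fasta(text):
    """Return a list of (description, sequence) pairs from FASTA text."""
    records = []
    description = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # A new description line closes the previous record, if any.
            if description is not None:
                records.append((description, "".join(chunks)))
            description = line[1:].strip()
            chunks = []
        elif description is None:
            raise ValueError("FASTA text must start with a '>' description line")
        else:
            chunks.append(line)
    if description is not None:
        records.append((description, "".join(chunks)))
    return records


example = ">example_sequence hypothetical description\nACGTACGT\nTTGACC\n"
print(parse_fasta(example))
```

Sequence lines are joined without separators, so a sequence wrapped over several lines is recovered as one string.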
Accession or GI number
If you know the Accession number or the GI of a sequence in GenBank, you can use this as the query sequence in a BLAST search.
1. Go to the BLAST search page in your web browser.
2. Use your mouse to select the main input field titled "Enter your input data here", by clicking it once.
3. Using the keyboard, enter the GenBank Accession number or the GI number.
4. Set the pull-down menu to "Accession or GI".
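Choosing the matching pull-down setting for a given input can be sketched as a small heuristic. The code below assumes that a GI is a string of digits and that an accession is a short run of letters followed by digits, optionally with a version suffix. Real accession formats vary, so the pattern is only a rough assumption, and the function name and sample identifiers are hypothetical.

```python
import re

# Rough, assumed shape of an accession: letters (or "_"), digits, optional ".version".
ACCESSION_PATTERN = re.compile(r"^[A-Za-z_]+\d+(\.\d+)?$")


def input_format(text):
    """Guess which pull-down setting fits the pasted input."""
    stripped = text.strip()
    if stripped.isdigit() or ACCESSION_PATTERN.match(stripped):
        return "Accession or GI"
    return "Sequence in FASTA format"


# Sample strings shaped like identifiers, used only for illustration.
print(input_format("1234567"))
print(input_format("AB123456"))
print(input_format(">example\nACGTACGT"))
```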
Submitting your Search
1. Make sure you have selected the correct BLAST program and BLAST database.
2. Once you have entered your FASTA sequence or an Accession or GI number, click the "Submit Query" button.
3. BLAST will now open a new window and tell you it is working on your search.
4. Once your results are computed, they will be presented in the window.