Query tutorial


Contents

Introduction

Selecting the BLAST Program

Selecting the BLAST Database

Entering your Sequence

Submitting your Search

Introduction

This online tutorial is designed to help the first-time BLAST user. It will teach you to input a sequence into the Basic BLAST web page, choose a program and database, and examine the results.

The core of NCBI's BLAST services is BLAST 2.0, otherwise known as "Gapped BLAST". This service is designed to take protein and nucleic acid sequences and compare them against a selection of NCBI databases.

The BLAST algorithm was written to balance speed against increased sensitivity for distant sequence relationships. Instead of relying on global alignments (commonly seen in multiple sequence alignment programs), BLAST emphasizes regions of local alignment to detect relationships among sequences which share only isolated regions of similarity (Altschul et al., 1990). BLAST is therefore more than a tool for viewing aligned sequences or finding homology: it locates regions of sequence similarity with a view to comparing structure and function.

Selecting the BLAST Program

The BLAST search pages allow you to select from several different programs, which are listed in the table below.

Program	Description
blastp	Compares an amino acid query sequence against a protein sequence database.
blastn	Compares a nucleotide query sequence against a nucleotide sequence database.
blastx	Compares a nucleotide query sequence translated in all reading frames against a protein sequence database. You could use this option to find potential translation products of an unknown nucleotide sequence.
tblastn	Compares a protein query sequence against a nucleotide sequence database dynamically translated in all reading frames.
tblastx	Compares the six-frame translations of a nucleotide query sequence against the six-frame translations of a nucleotide sequence database. Please note that the tblastx program cannot be used with the nr database on the BLAST Web page because it is computationally intensive.
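
The phrase "translated in all reading frames" in the blastx, tblastn, and tblastx descriptions can be made concrete with a short Python sketch. It is an illustration only, not BLAST's own code. It assumes the standard genetic code and an uppercase DNA sequence containing only A, C, G, and T; the function names and the example sequence are made up for this illustration.

```python
from itertools import product

# Standard genetic code, with codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an uppercase DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate the complete codons of dna; '*' marks a stop codon."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def six_frame_translations(dna):
    """Map frame labels (+1 to +3, -1 to -3) to translated sequences."""
    frames = {}
    for strand, sequence in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(sequence[offset:])
    return frames


if __name__ == "__main__":
    # Made-up query sequence, used only to show the six frames.
    for label, protein in six_frame_translations("ATGGCCATTGTAATGGGCCGCTGA").items():
        print(label, protein)
```

Three frames come from the sequence as given and three from its reverse complement, which is why a translated search effectively performs six protein comparisons for each nucleotide sequence.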

To select a BLAST program for your search:

1. Open the Basic BLAST search page.
2. From the "Program" pull down menu, select the appropriate program.

Figure 1. Using the pull down menu to select a BLAST program.
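
The program descriptions in this section reduce to two questions: is the query a nucleotide or a protein sequence, and is the database a nucleotide or a protein database? The following Python sketch encodes that lookup. The names `PROGRAMS` and `choose_programs` are hypothetical and are not part of any BLAST interface.

```python
# (query type, database type) for each program, taken from the descriptions.
PROGRAMS = {
    "blastp": ("protein", "protein"),
    "blastn": ("nucleotide", "nucleotide"),
    "blastx": ("nucleotide", "protein"),      # query is translated
    "tblastn": ("protein", "nucleotide"),     # database is translated
    "tblastx": ("nucleotide", "nucleotide"),  # both are translated
}


def choose_programs(query_type, database_type):
    """Return the programs that accept this query/database pairing."""
    wanted = (query_type, database_type)
    return [name for name, kinds in PROGRAMS.items() if kinds == wanted]


if __name__ == "__main__":
    print(choose_programs("nucleotide", "protein"))
    print(choose_programs("nucleotide", "nucleotide"))
```

A nucleotide query against a nucleotide database matches two programs: blastn compares the nucleotides directly, while tblastx compares their six-frame translations.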

Selecting the BLAST Database

You can choose from several NCBI databases to compare your query sequence against. Note that some databases are specific to proteins or nucleotides and cannot be used with certain BLAST programs (for example, a blastn search against swissprot).

Proteins

Database	Description
nr	All non-redundant GenBank CDS translations+PDB+SwissProt+PIR+PRF
month	All new or revised GenBank CDS translations+PDB+SwissProt+PIR released in the last 30 days.
swissprot	The last major release of the SWISS-PROT protein sequence database (no updates). Releases are uploaded to our system when they are received from EMBL.
patents	Protein sequences derived from the Patent division of GenBank.
yeast	Yeast (Saccharomyces cerevisiae) protein sequences. This is not a listing of all yeast protein sequences; it contains only the protein translations of the complete yeast genome.
E. coli	E. coli (Escherichia coli) genomic CDS translations.
pdb	Sequences derived from the 3-dimensional structures in the Brookhaven Protein Data Bank.
kabat [kabatpro]	Kabat's database of sequences of immunological interest. For more information, see http://immuno.bme.nwu.edu/
alu	Translations of select Alu repeats from REPBASE, suitable for masking Alu repeats from query sequences. It is available at ftp://ncbi.nlm.nih.gov/pub/jmc/alu. See "Alu alert" by Claverie and Makalowski, Nature vol. 371, page 752 (1994).

Nucleotides

Database	Description
nr	All non-redundant GenBank+EMBL+DDBJ+PDB sequences (but no EST, STS, GSS, or HTGS sequences).
month	All new or revised GenBank+EMBL+DDBJ+PDB sequences released in the last 30 days.
dbest	Non-redundant database of GenBank+EMBL+DDBJ EST Divisions.
dbsts	Non-redundant database of GenBank+EMBL+DDBJ STS Divisions.
mouse ests	The non-redundant database of GenBank+EMBL+DDBJ EST Divisions limited to the organism mouse.
human ests	The non-redundant database of GenBank+EMBL+DDBJ EST Divisions limited to the organism human.
other ests	The non-redundant database of GenBank+EMBL+DDBJ EST Divisions for all organisms except mouse and human.
yeast	Yeast (Saccharomyces cerevisiae) genomic nucleotide sequences. Not a collection of all yeast nucleotide sequences, but the sequence fragments from the complete yeast genome.
E. coli	E. coli (Escherichia coli) genomic nucleotide sequences.
pdb	Sequences derived from the 3-dimensional structures of proteins.
kabat [kabatnuc]	Kabat's database of sequences of immunological interest. For more information, see http://immuno.bme.nwu.edu/
patents	Nucleotide sequences derived from the Patent division of GenBank.
vector	Vector subset of GenBank(R), NCBI (available in the ftp://ncbi.nlm.nih.gov/pub/blast/db/ directory).
mito	Database of mitochondrial sequences (Rel. 1.0, July 1995).
alu	Select Alu repeats from REPBASE, suitable for masking Alu repeats from query sequences. It is available at ftp://ncbi.nlm.nih.gov/pub/jmc/alu. See "Alu alert" by Claverie and Makalowski, Nature vol. 371, page 752 (1994).
epd	Eukaryotic Promoter Database, ISREC, Epalinges s/Lausanne (Switzerland).
gss	Genome Survey Sequence, includes single-pass genomic data, exon-trapped sequences, and Alu PCR sequences.
htgs	High Throughput Genomic Sequences.

Figure 2. Using the pull down menu to select the BLAST database.
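
The restriction on combining programs and databases can be checked mechanically. The Python sketch below uses the database names from the two tables in this section and the program descriptions given earlier. The function `check_combination` is a hypothetical helper, and the BLAST web page applies its own checks, which may differ.

```python
# Database names as listed in the two tables of this section.
PROTEIN_DATABASES = {
    "nr", "month", "swissprot", "patents", "yeast", "E. coli", "pdb",
    "kabat", "alu",
}
NUCLEOTIDE_DATABASES = {
    "nr", "month", "dbest", "dbsts", "mouse ests", "human ests",
    "other ests", "yeast", "E. coli", "pdb", "kabat", "patents", "vector",
    "mito", "alu", "epd", "gss", "htgs",
}

# Kind of database each program searches, per the program descriptions.
DATABASE_KIND = {
    "blastp": "protein",
    "blastx": "protein",
    "blastn": "nucleotide",
    "tblastn": "nucleotide",
    "tblastx": "nucleotide",
}


def check_combination(program, database):
    """Return None if the pairing looks valid, otherwise a short reason."""
    kind = DATABASE_KIND.get(program)
    if kind is None:
        return f"unknown program: {program}"
    available = PROTEIN_DATABASES if kind == "protein" else NUCLEOTIDE_DATABASES
    if database not in available:
        return f"{program} needs a {kind} database; {database} is not one"
    if program == "tblastx" and database == "nr":
        return "tblastx cannot be used with nr on the BLAST Web page"
    return None


if __name__ == "__main__":
    print(check_combination("blastn", "swissprot"))
    print(check_combination("blastp", "swissprot"))
```

Names such as nr, month, and yeast appear in both tables, so the program you select determines which version of the database is searched.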

Entering your Sequence

The BLAST web pages accept input sequences in three formats: FASTA sequence format, NCBI Accession numbers, or GIs.

FASTA Format

A description of the FASTA format is located on the Basic BLAST search pages.

1. Open your FASTA formatted sequence in a text editor as plain text.
2. Use your mouse to highlight the entire sequence.
3. Select Edit/Copy from the menu in your text editor.
4. Go to the BLAST search page in your web browser.
5. Click once in the main input field titled "Enter your input data here" to select it.
6. Select Edit/Paste from the browser's menu.
7. You should now see your FASTA sequence in this field.
8. Set the pull down menu to "Sequence in FASTA format".

Figure 3. Example of a FASTA sequence in the input field.
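
If you have not seen the format before, a FASTA record is a description line beginning with ">" followed by one or more lines of sequence. The Python sketch below reads such text into (description, sequence) pairs. The record shown is made up for illustration, and `parse_fasta` is a hypothetical helper, not something the BLAST pages provide.

```python
def parse_fasta(text):
    """Return a list of (description, sequence) pairs from FASTA text."""
    records = []
    description, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new description line closes the previous record, if any.
            if description is not None:
                records.append((description, "".join(chunks)))
            description, chunks = line[1:].strip(), []
        elif description is None:
            raise ValueError("sequence data found before a '>' description line")
        else:
            chunks.append(line)
    if description is not None:
        records.append((description, "".join(chunks)))
    return records


# Made-up record: one description line, then the sequence over two lines.
EXAMPLE = """>example_query made-up sequence for illustration
ATGGCCATTGTAATGGGCCGC
TGAAAGGGTGCCCGATAG
"""

if __name__ == "__main__":
    for description, sequence in parse_fasta(EXAMPLE):
        print(description, len(sequence))
```

Sequence lines are joined in order, so line breaks inside the sequence do not matter when you paste a record into the input field.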

Accession or GI number

If you know the Accession number or the GI of a sequence in GenBank, you can use this as the query sequence in a BLAST search.

1. Go to the BLAST search page in your web browser.
2. Click once in the main input field titled "Enter your input data here" to select it.
3. Type the GenBank Accession number or the GI number.
4. Set the pull down menu to "Accession or GI".
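
The pull down menu setting has to match what you typed or pasted. As a rough guide, the Python sketch below guesses the setting from the text of the input. It rests on assumptions that are not stated on the BLAST pages: that a GI is written as digits only, and that an Accession number is one or two letters followed by five or six digits. The function name and the example values are made up.

```python
import re

FASTA_SETTING = "Sequence in FASTA format"
IDENTIFIER_SETTING = "Accession or GI"

# Assumed shapes: GI = digits only; Accession = 1-2 letters + 5-6 digits.
GI_PATTERN = re.compile(r"[0-9]+")
ACCESSION_PATTERN = re.compile(r"[A-Za-z]{1,2}[0-9]{5,6}")


def guess_menu_setting(text):
    """Return the likely menu setting for the input, or None if unclear."""
    stripped = text.strip()
    if stripped.startswith(">"):
        return FASTA_SETTING
    if GI_PATTERN.fullmatch(stripped) or ACCESSION_PATTERN.fullmatch(stripped):
        return IDENTIFIER_SETTING
    return None


if __name__ == "__main__":
    for value in ("U12345", "123456", ">example_query\nATGGCC"):
        print(repr(value), "->", guess_menu_setting(value))
```

The function returns None when it cannot tell, rather than guessing, because a wrong menu setting would make the search page misread the input.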

Submitting your Search

1. Make sure you have selected the correct BLAST program and BLAST database.
2. Once you have entered your FASTA sequence or an Accession or GI number, click the "Submit Query" button.
3. BLAST will now open a new window and tell you it is working on your search.
4. Once your results are computed, they will be presented in the window.
