PLOTSIMILARITY(+)

Table of Contents
FUNCTION
DESCRIPTION
OUTPUT
INPUT FILES
RELATED PROGRAMS
ALGORITHM
CONSIDERATIONS
SUGGESTIONS
PARAMETER REFERENCE

FUNCTION

PlotSimilarity plots the running average of the similarity among the sequences in a multiple sequence alignment.

DESCRIPTION

PlotSimilarity calculates the average similarity among all members of a group of aligned sequences at each position in the alignment, using a user-specified sliding window of comparison. The window of comparison is moved along all sequences, one position at a time, and the average similarity over the entire window is plotted at the middle position of the window. The average similarity across the entire alignment is plotted as a dotted line.
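
The sliding-window calculation can be sketched in Python as follows. This is an illustration only, not the program's source code: the function names, the 0-based indexing, the choice of window // 2 as the middle of an even-sized window, and the use of complete windows only (no partial windows at the ends of the alignment) are all assumptions. The overall average follows the definition given in the ALGORITHM topic.

```python
def running_average(scores, window):
    """Average per-position scores over a sliding window.

    Returns a list of (middle_index, window_average) pairs, moving the
    window one position at a time. Only complete windows are used
    (an assumption; the manual does not describe edge handling).
    """
    if window < 1 or window > len(scores):
        raise ValueError("window must be between 1 and len(scores)")
    points = []
    for start in range(len(scores) - window + 1):
        chunk = scores[start:start + window]
        middle = start + window // 2  # assumed convention for even windows
        points.append((middle, sum(chunk) / window))
    return points


def overall_average(points):
    """Sum of the window averages divided by the number of windows."""
    return sum(avg for _, avg in points) / len(points)


# Hypothetical per-position similarity scores for a six-column alignment.
per_position = [1.0, 0.5, 0.0, 1.0, 1.0, 0.5]
points = running_average(per_position, window=3)
dotted_line = overall_average(points)
```

Each pair in points gives the position at which a value would be plotted and the value itself; dotted_line is the single level that would be drawn across the whole plot.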

OUTPUT

If you are reading the Program Manual, the plot from this session is shown in the figure below.

INPUT FILES

PlotSimilarity accepts multiple (two or more) aligned nucleotide sequences or aligned protein sequences as input. The multiple sequence alignment created by the PileUp program can be used as input to PlotSimilarity, as can the gapped output files from the Gap and BestFit programs. You can optionally specify a weight for each sequence in a list file with the weight: sequence attribute. (See "Using List Files" in Chapter 2, Using Sequence Files and Databases in the User's Guide for more information about sequence attributes in list files.)

You can assign weights to sequences in an MSF file by editing the MSF file and modifying the weight on the name/weight line for each sequence. (See "Using Multiple Sequence Format (MSF) Files" in Chapter 2, Using Sequence Files and Databases in the User's Guide for a complete description of MSF files.)

You can assign weights to sequences in an RSF (rich sequence format) file by modifying the weight attribute for each sequence within SeqLab. (See "Using Rich Sequence Format (RSF) Files" in Chapter 2, Using Sequence Files and Databases in the User's Guide for a complete description of RSF files. Also see "Viewing and Editing Sequence Attribute and Reference Information" in Chapter 2, Editing Sequences and Alignments in the SeqLab Guide for more information about modifying the weight attribute for each sequence within an RSF file.)

If a sequence from an MSF or RSF file is listed in a list file with a weight, the sequence weight is taken from the list file (the sequence weight in the MSF or RSF file is ignored). A weight of 1.0 is assumed if none is specified for a sequence. With -WEIGHT=1.0, PlotSimilarity ignores weights specified for individual sequences and gives all of the sequences in the alignment equal weight.

The function of PlotSimilarity depends on whether your input sequence(s) are protein or nucleotide. Programs determine the type of a sequence by the presence of either Type: N or Type: P on the last line of the text heading just above the sequence. If your sequence(s) are not the correct type, turn to Appendix VI for information on how to change or set the type of a sequence.
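
The order in which sequence weights are resolved, as described in this topic, can be summarized with a short Python sketch. The function name, argument names, and dictionary layout are hypothetical and exist only for illustration; the sketch does not read MSF, RSF, or list files.

```python
def resolve_weights(names, file_weights=None, list_weights=None,
                    equal_weights=False):
    """Pick one weight per sequence name.

    file_weights:  weights found in the MSF or RSF file (name -> weight)
    list_weights:  weights given in a list file (name -> weight)
    equal_weights: stands in for -WEIGHT=1.0, which ignores all
                   individual weights
    """
    file_weights = file_weights or {}
    list_weights = list_weights or {}
    resolved = {}
    for name in names:
        if equal_weights:
            resolved[name] = 1.0                 # every sequence weighted equally
        elif name in list_weights:
            resolved[name] = list_weights[name]  # list file overrides MSF/RSF
        else:
            resolved[name] = file_weights.get(name, 1.0)  # default is 1.0
    return resolved


# Hypothetical sequence names and weights.
weights = resolve_weights(
    ["seq_a", "seq_b", "seq_c"],
    file_weights={"seq_a": 0.5, "seq_b": 2.0},
    list_weights={"seq_a": 3.0},
)
```

Here seq_a takes its weight from the list file, seq_b from the alignment file, and seq_c falls back to 1.0.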

RELATED PROGRAMS

PileUp creates a multiple sequence alignment of a group of related sequences.

Gap uses the algorithm of Needleman and Wunsch to find the alignment of two complete sequences that maximizes the number of matches and minimizes the number of gaps.

BestFit makes an optimal alignment of the best segment of similarity between two sequences. Optimal alignments are found by inserting gaps to maximize the number of matches using the local homology algorithm of Smith and Waterman.

ProfileMake creates a position-specific scoring table, called a profile, that quantitatively represents the information from a group of aligned sequences. The profile can then be used for database searching (ProfileSearch) or sequence alignment (ProfileGap).

GapShow displays an alignment of two sequences by making a graph that shows the distribution of similarities and gaps.

ALGORITHM

The average similarity at a position in an alignment is the arithmetic average of the scores of all possible pairwise symbol comparisons among the sequence symbols at that position. The comparison score between any two sequence symbols is the comparison value between those symbols in the scoring matrix multiplied by the weight of each of the two sequences. The average similarity across the entire alignment (plotted as a dotted line) is the sum of the separate window similarities divided by the number of windows.
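
The per-position calculation can be illustrated with a minimal Python sketch. It follows the wording above literally (each pairwise comparison value is multiplied by both sequence weights, and the sum is divided by the number of pairs) and is not taken from the program itself. The function names and the three-entry toy matrix are hypothetical, and no special treatment of gap symbols is attempted because none is described here.

```python
from itertools import combinations


def pair_score(a, b, matrix):
    """Look up the comparison value for symbols a and b in a symmetric matrix."""
    return matrix[(a, b)] if (a, b) in matrix else matrix[(b, a)]


def position_similarity(column, weights, matrix):
    """Average weighted comparison score over all symbol pairs in one column.

    column:  the symbols found at one alignment position, one per sequence
    weights: one weight per sequence, in the same order
    """
    pairs = list(combinations(range(len(column)), 2))
    total = sum(
        pair_score(column[i], column[j], matrix) * weights[i] * weights[j]
        for i, j in pairs
    )
    return total / len(pairs)


# Hypothetical toy matrix; real scoring matrices have many more entries.
toy_matrix = {("A", "A"): 1.0, ("G", "G"): 1.0, ("A", "G"): 0.0}

equal = position_similarity("AAG", [1.0, 1.0, 1.0], toy_matrix)
weighted = position_similarity("AAG", [2.0, 1.0, 1.0], toy_matrix)
```

With three sequences there are three pairs. Only the A-to-A pair scores above zero, so equal is one third, and doubling the weight of the first sequence doubles that pair's contribution.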

With the "Plot the level of identity between the sequences" parameter, the program plots a measure of the level of identity among all sequences in the multiple sequence alignment. The calculations are done exactly as described above, but all identical symbol comparisons are given a value of 1; all other comparisons are given a value of 0.
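
As a small worked illustration in Python, the identity measure at a single position can be computed by scoring each pair of symbols as 1 or 0. The function name is hypothetical, sequence weights default to 1.0, and gap symbols receive no special treatment; these are assumptions for the sketch rather than documented behavior.

```python
from itertools import combinations


def position_identity(column, weights=None):
    """Average identity score (1 for identical symbols, 0 otherwise)
    over all pairs of symbols at one alignment position."""
    if weights is None:
        weights = [1.0] * len(column)  # default weight of 1.0 per sequence
    pairs = list(combinations(range(len(column)), 2))
    total = sum(
        (1.0 if column[i] == column[j] else 0.0) * weights[i] * weights[j]
        for i, j in pairs
    )
    return total / len(pairs)


# Four sequences give six pairs; three of them (the A-to-A pairs) are identical.
identity = position_identity("AAGA")
```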

With -PROFile, the program plots a running average of the positional conservation in a profile. The measure of conservation at any position is the difference between the greatest and least values at that position in the profile.
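
The conservation measure for a profile can be sketched in Python as shown below. The representation of a profile as a list of rows, one row of scores per alignment position, is an assumption made for the example and does not reflect the actual profile file format. The resulting values would then be averaged over the comparison window before plotting.

```python
def profile_conservation(profile):
    """Return, for each profile position, the difference between the
    greatest and least values at that position."""
    return [max(row) - min(row) for row in profile]


# Hypothetical two-position profile with three score columns per position.
toy_profile = [
    [1.5, -0.5, 0.0],  # strongly favors one symbol: high conservation
    [0.2, 0.2, 0.2],   # no preference among symbols: zero conservation
]
conservation = profile_conservation(toy_profile)
```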

CONSIDERATIONS

PlotSimilarity does not create the multiple sequence alignment. You can create the alignment using PileUp, Gap, or BestFit (see the INPUT FILES topic above).

SUGGESTIONS

You can plot a measure of identity among all sequences in the alignment using the "Plot the level of identity between the sequences" parameter.

You can plot a measure of the level of conservation in a profile created from a multiple sequence alignment using -PROFile. This plot provides similar information to a plot of the similarity among the sequences in the multiple sequence alignment.

PARAMETER REFERENCE

You can set the parameters listed below from the command line. For more information, see "Using Program Parameters" in Chapter 3, Using Programs in the User's Guide.

Comparison window

sets the size of the sequence window within which the average similarity score is calculated for the alignment.

Plot the level of identity between the sequences

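plots the level of identity among the sequences: identical symbol comparisons are given a value of 1 and all other comparisons a value of 0 (see the ALGORITHM topic).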

Specify minimum similarity scale value

sets the bottom of the similarity score scale.

Specify maximum similarity scale value

sets the top of the similarity score scale.

Scale the plot between:

    minimum and maximum values in the scoring matrix
    minimum and maximum values calculated from the alignment

scales the plot between the observed minimum and maximum scores, rather than between the minimum and maximum scores in the scoring matrix.

Include the plot of overall similarity

adds the overall average similarity across the entire alignment to the plot as a dotted line.

Printed: January 5, 2001 13:51 (1162)