Map maps a DNA sequence and displays both strands of the mapped sequence with restriction enzyme cut points above the sequence and protein translations below. Map can also create a peptide map of an amino acid sequence.
Map is useful for displaying a sequence that is being assembled or analyzed intensively. Map asks you to select, by typing their names, the enzymes whose restriction sites should be marked. If you do not answer this question, Map selects a representative isoschizomer from all of the commercially available enzymes. You can choose to have your sequence translated in any or all of the six possible translation frames. You can also choose to have only the open reading frames translated.
After running Map, you may create a new sequence file with the protein sequence from any frame of DNA translation by using the ExtractPeptide program with the Map output file.
Here is part of the output file:
(Linear) MAP of: gamma.seq  check: 6474  from: 2161  to: 2600

Human fetal beta globins G and A gamma from Shen, Slightom and Smithies, Cell 26; 191-203.
Analyzed by Smithies et al. Cell 26; 345-353.

With 216 enzymes: *

September 24, 1998 16:19  ..

                       HgaI
                       SimI
                    NlaIII|
                 BsaJI   ||
                  DsaI   ||
                  NcoI   ||
                  StyI   ||
              BsaHI  |   ||                         RleAI
          BspGI   |  |   ||    MnlI              BseRI  |
      BfaI    |   |  |   || MnlI  |           CviJI  |  |  CviJI
         |    |   |  |   ||    |  |               |  |  |      |
     GCTCCTAGTCCAGACGCCATGGGTCATTTCACAGAGGAGGACAAGGCTACTATCACAAGC
2161 ---------+---------+---------+---------+---------+---------+ 2220
     CGAGGATCAGGTCTGCGGTACCCAGTAAAGTGTCTCCTCCTGTTCCGATGATAGTGTTCG

a     A  P  S  P  D  A  M  G  H  F  T  E  E  D  K  A  T  I  T  S   -
b      L  L  V  Q  T  P  W  V  I  S  Q  R  R  T  R  L  L  S  Q  A  -
c       S  *  S  R  R  H  G  S  F  H  R  G  G  Q  G  Y  Y  H  K  P -

2161 ---------+---------+---------+---------+---------+---------+ 2220

d       G  L  G  S  A  M  P  *  K  V  S  S  S  L  A  V  I  V  L    -
e      E  *  D  L  R  W  P  D  N  *  L  P  P  C  P  *  *  *  L     -
f     S  R  T  W  V  G  H  T  M  E  C  L  L  V  L  S  S  D  C  A   -

///////////////////////////////////////////////////////////////////////

Enzymes that do cut:

AccI      AluI      AvaII     BanI      BbvI      BccI      Bce83I    BfaI
BglI      BmgI      BpmI      BsaHI     BsaJI     BseRI     BsgI      BslI
Bsp1286I  BspGI     BstEII    CjePI     CviJI     CviRI     DdeI      DpnI

///////////////////////////////////////////////////////////////////////

Enzymes that do not cut:

AatII     AceIII    AciI      AflII     AflIII    AhdI      AlwI      Alw26I
AlwNI     ApaI      ApaBI     ApaLI     ApoI      AscI      AvaI      AvrII
BaeI      BamHI     BanII     BbsI      BcefI     BcgI      BciVI     BclI

///////////////////////////////////////////////////////////////////////
Map accepts a single nucleotide or protein sequence as input. The function of Map depends on whether your input sequence is protein or nucleotide. Programs determine the type of a sequence by the presence of either Type: N or Type: P on the last line of the text heading just above the sequence. If your sequence is not the correct type, turn to Appendix VI for information on how to change or set the type of a sequence.
MapSort, PlasmidMap, and MapPlot display restriction maps in other formats. ExtractPeptide extracts the protein sequence from any translation frame in the Map output file and puts it into a new sequence file. FindPatterns searches for short patterns like enzyme recognition sites in one or more sequences. PeptideMap creates a peptide map of an amino acid sequence. You can use either Map or PeptideMap with protein sequence input and obtain identical results.
Map does not treat your sequence as circular unless you use Treat input sequence as circular.
The enzymes you name must be in the enzyme data file or you get an error message. You can have your system manager change the public enzyme data file to contain the enzymes most useful to your group, or you can maintain a private copy for your own use.
This program normally requires that a sequence pattern be a subset of the enzyme recognition site. If the recognition pattern in the enzyme data file were GCRGC, then the pattern GCAGC in your sequence would be found, since A is within the set of bases defined by R (see Appendix III). If the pattern in the enzyme data file were GCAGC, then a GCRGC in your sequence would not be recognized. If your sequence is very ambiguous, as it might be if it were a backtranslated sequence, then it may be better to use -ALL to do an overlap search. The overlap search would consider an R in your sequence to match an A in the recognition site.
With -PERFect, the program looks for a perfect symbol match between your sequence and the recognition pattern -- GCRGC in the recognition pattern would only match a GCRGC in the sequence.
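The three matching rules described above (the normal subset search, the overlap search with -ALL, and the perfect match with -PERFect) can be illustrated with a minimal sketch. It assumes the standard IUPAC-IUB nucleotide ambiguity codes; the names IUPAC, symbol_matches, and find_sites are hypothetical helpers for illustration and are not part of Map.

```python
# Assumption: standard IUPAC-IUB ambiguity codes; Appendix III is the
# authoritative list for the package itself.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "T",
    "R": "AG", "Y": "CT", "M": "AC", "K": "GT", "S": "CG", "W": "AT",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def symbol_matches(seq_sym, site_sym, mode="subset"):
    """Compare one sequence symbol with one recognition-site symbol."""
    seq_set = set(IUPAC[seq_sym.upper()])
    site_set = set(IUPAC[site_sym.upper()])
    if mode == "subset":    # normal search: sequence symbol within site symbol
        return seq_set <= site_set
    if mode == "overlap":   # -ALL: any base in common counts as a match
        return bool(seq_set & site_set)
    if mode == "perfect":   # -PERFect: the symbols themselves must be identical
        return seq_sym.upper() == site_sym.upper()
    raise ValueError("unknown mode: " + mode)


def find_sites(sequence, site, mode="subset"):
    """Return 1-based start positions where the site matches the sequence."""
    n = len(site)
    return [
        i + 1
        for i in range(len(sequence) - n + 1)
        if all(symbol_matches(a, b, mode) for a, b in zip(sequence[i:i + n], site))
    ]


print(find_sites("TTGCAGCTT", "GCRGC"))             # subset: GCAGC is within GCRGC
print(find_sites("TTGCRGCTT", "GCAGC"))             # subset: R is not within A
print(find_sites("TTGCRGCTT", "GCAGC", "overlap"))  # overlap: R shares A with A
```

The comparison is made on uppercased symbols, so the sketch is case insensitive in the same way the searches are.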
All searches are case insensitive (upper- or lowercase) for the letters in either the sequence or the enzyme recognition site.
As in almost all sequence displays, the 5'->3' direction of the top strand is from left to right. Map aligns each enzyme's name so that the name ends over the 3' end of the fragment that continues to the left. If you use Display both forward and reverse strand cut positions, Map aligns the name to end over the 5'-most nucleotide of the reverse strand fragment that continues to the left.
If more than one enzyme cuts at the same position, Map sorts the set of enzymes that cut at the position alphabetically and stacks them up so that each enzyme name ends over the same position. If enzymes that cut to the left are in the way of the display, Map puts the names further up and uses a line of '|' characters to connect the name to the cut position.
When you search for potential restriction sites with either Number of allowed mismatches or Find translationally silent potential restriction sites, Map differentiates the real sites from the potential sites by capitalizing the enzyme's name at the real sites.
The program presents you with an enzyme selection prompt that lets you enter enzymes individually or collectively. We maintain our enzyme files with a semicolon (;) character in front of all but one member of a family of isoschizomers. (Isoschizomers are restriction endonucleases with the same recognition site.) The isoschizomers beginning with a semicolon are normally not displayed by our mapping programs unless you specifically select them by name or type "**" instead of "*" at the enzyme prompt.
There is more information on enzyme files in Appendix VII.
The translation menu allows several responses. You can name the frames of interest individually with a response like abcf. You can use t to mean the three forward translation frames or s to mean all six possible translation frames. You can make all of the characters in your response uppercase to get three-letter instead of one-letter amino acid symbols in the translation. You can add o to your response to get translation only between potential start codons and stop codons (o by itself gives open reading frame translation of all six translation frames).
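The rules for the translation menu can be restated as a small parser. This is only a sketch of the rules as described here, not the program's own code; the function name parse_translation_response and the returned dictionary are hypothetical, and the sketch assumes that frames a, b, and c are the three forward frames, as in the sample output.

```python
def parse_translation_response(response):
    """Interpret a translation-menu response such as 'abcf', 't', 'S', or 'o'."""
    text = response.strip()
    if not text:
        return None  # no translation requested
    # An all-uppercase response asks for three-letter amino acid symbols.
    three_letter = text.isupper()
    frames = set()
    orf_only = False
    for ch in text.lower():
        if ch in "abcdef":
            frames.add(ch)              # a frame named individually
        elif ch == "t":
            frames.update("abc")        # the three forward frames
        elif ch == "s":
            frames.update("abcdef")     # all six frames
        elif ch == "o":
            orf_only = True             # open reading frames only
        else:
            raise ValueError("unrecognized character in response: " + ch)
    if orf_only and not frames:
        frames.update("abcdef")         # 'o' by itself means all six frames
    return {
        "frames": sorted(frames),
        "three_letter": three_letter,
        "orf_only": orf_only,
    }


print(parse_translation_response("abcf"))
print(parse_translation_response("S"))
print(parse_translation_response("o"))
```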
You can select translation for open reading frames only. All of the frames are treated as open at the 5' end of each strand; these pseudo-open reading frames run to the first stop codon in that frame (see the discussion of translation tables in Appendix VII). Thereafter, reading is turned on at each potential start codon and runs to the next stop codon. You can suppress the display of short open reading frames with Minimum open reading frame size set to 20, for example, which would restrict the display to frames coding for at least 20 amino acids.
Open reading frames are determined from the beginning and ending of the sequence in the file--not from just the range you have chosen. The potential start codons and stop codons are defined in the data file translate.txt.
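The open reading frame rules above can be sketched for a single frame as follows. The sketch assumes ATG as the only start codon and TAA, TAG, and TGA as stop codons; in the program these come from translate.txt. It expects a nucleotide string that already begins at the first base of the frame, and open_regions is a hypothetical name.

```python
def open_regions(dna, starts=("ATG",), stops=("TAA", "TAG", "TGA"), min_size=0):
    """Return (first_codon, stop_codon) index pairs for the open regions of one frame.

    Codon indexes are 0-based; the second index is the codon that ends the
    region (the stop codon, or one past the last codon if no stop is reached).
    """
    codons = [dna[i:i + 3].upper() for i in range(0, len(dna) - 2, 3)]
    regions = []
    begin = 0  # every frame is treated as open at its 5' end
    for idx, codon in enumerate(codons):
        if begin is None:
            if codon in starts:
                begin = idx  # reading is turned on at a potential start codon
        elif codon in stops:
            # Reading runs to the next stop codon; keep the region if it
            # codes for at least one and at least min_size amino acids.
            if idx > begin and idx - begin >= min_size:
                regions.append((begin, idx))
            begin = None
    if begin is not None and len(codons) > begin and len(codons) - begin >= min_size:
        regions.append((begin, len(codons)))  # still open at the 3' end
    return regions


frame = "GCTCCTTAGAAAATGGGGTGACCC"
print(open_regions(frame))              # pseudo-open region, then ATG...TGA
print(open_regions(frame, min_size=3))  # both regions are shorter than 3 codons
```

Setting min_size plays the role of Minimum open reading frame size: regions coding for fewer amino acids are dropped.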
If you want to analyze the restriction sites in another program, you can display all the cut positions in a table. Use Write a table of cut sites (instead of restriction map) to get output like this:
(Linear) MAP of: gamma.seq  check: 6474  from: 2161  to: 2600

Human fetal beta globins G and A gamma from Shen, Slightom and Smithies, Cell 26; 191-203.
Analyzed by Smithies et al. Cell 26; 345-353.

With 216 enzymes: *

Enzyme       +       -     September 25, 1996 12:24  ..

BfaI      2165    2167
BspGI     2170    2170
BsaHI     2174    2176
//////////////////////
Normally, the table is sorted by position first and then alphabetically by enzyme name. You can sort the table by enzyme name first and then by position with Order table of cut sites by.
If you display the cut positions in a table using Write a table of cut sites (instead of restriction map), the program does not create the standard output file displaying the sequence and the restriction sites along that sequence.
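The two table orders differ only in which key is compared first. The sketch below shows both orders with the three forward-strand cut positions from the sample table; sort_cut_table is a hypothetical helper, and the case-insensitive comparison of names is an assumption.

```python
def sort_cut_table(cuts, by="position"):
    """Sort (enzyme, position) pairs by position first or by enzyme name first."""
    if by == "position":
        return sorted(cuts, key=lambda cut: (cut[1], cut[0].lower()))
    if by == "enzyme":
        return sorted(cuts, key=lambda cut: (cut[0].lower(), cut[1]))
    raise ValueError("unknown sort order: " + by)


cuts = [("BspGI", 2170), ("BsaHI", 2174), ("BfaI", 2165)]
for enzyme, position in sort_cut_table(cuts, by="position"):
    print(enzyme, position)
for enzyme, position in sort_cut_table(cuts, by="enzyme"):
    print(enzyme, position)
```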
To assist scientists doing site-directed mutagenesis, this program searches for places in your sequence where a restriction enzyme recognition site occurs with one or more mismatches. Use Number of allowed mismatches set to 1 to identify positions where recognition could occur with one or fewer mismatches.
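A search that tolerates mismatches can be sketched as a sliding-window comparison. The example below is a simplification that assumes unambiguous A, C, G, and T symbols in both the sequence and the recognition site; the function name find_with_mismatches and the test sequence are made up for illustration.

```python
def find_with_mismatches(sequence, site, max_mismatches=1):
    """Return (position, mismatches) pairs for windows within the mismatch limit.

    Positions are 1-based starts of the window; a count of 0 is a real site,
    and any higher count is a potential site.
    """
    sequence = sequence.upper()
    site = site.upper()
    n = len(site)
    hits = []
    for i in range(len(sequence) - n + 1):
        window = sequence[i:i + n]
        mismatches = sum(1 for a, b in zip(window, site) if a != b)
        if mismatches <= max_mismatches:
            hits.append((i + 1, mismatches))
    return hits


# GGATTC differs from GAATTC at one base; GAATTC itself occurs at position 8.
print(find_with_mismatches("GGATTCAGAATTCAA", "GAATTC", max_mismatches=1))
```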
Use Find translationally silent potential restriction sites to find the places in your sequence where a restriction site could be introduced without changing the translation. Read more about using Find translationally silent potential restriction sites under the PARAMETER REFERENCE topic below.
By changing the enzyme data file (see the LOCAL DATA FILES topic below), you can make this program search for any pattern. See Appendix VII for notes on enzyme data files.
FindPatterns, Map, MapSort, MapPlot, and Motifs all let you search with ambiguous expressions that match many different sequences. The expressions can include any legal GCG sequence character (see Appendix III). The expressions can also include several non-sequence characters, which are used to specify OR matching, NOT matching, begin and end constraints, and repeat counts. For instance, the expression TAATA(N){20,30}ATG means TAATA, followed by 20 to 30 of any base, followed by ATG. Following is an explanation of the syntax for pattern specification.
Parentheses () enclose one or more symbols that can be repeated some number of times. Braces {} enclose numbers that tell how many times the symbols within the preceding parentheses must be found.
Sometimes, you can leave out part of an expression. If braces appear without preceding parentheses, the numbers in the braces define the number of repeats for the immediately preceding symbol. One or both of the numbers within the braces may be missing. For instance, both the pattern GATG{2,}A and the pattern GATG{2}A mean GAT, followed by G repeated from 2 to 350,000 times, followed by A; the pattern GATG{}A means GAT, followed by G repeated from 0 to 350,000 times, followed by A; the pattern GAT(TG){,2}A means GAT, followed by TG repeated from 0 to 2 times, followed by A; the pattern GAT(TG){2,2}A means GAT, followed by TG repeated exactly 2 times, followed by A. (If the pattern in the parentheses is an OR expression (see below), it cannot be repeated more than 2,000 times.)
If you are searching nucleic acids, the ambiguity symbols defined in Appendix III let you define any combination of G, A, T, or C. If you are searching proteins, you can specify any of several symbol choices by enclosing the different choices in parentheses and separating the choices with commas. For instance, RGF(Q,A)S means RGF followed by either Q or A followed by S. The length of each choice need not be the same, and there can be up to 31 different choices within each set of parentheses. The pattern GAT(TG,T,G){1,4}A means GAT followed by any combination of TG, T, or G from 1 to 4 times followed by A. The sequence GATTGGA matches this pattern. There can be several parentheses in a pattern, but parentheses cannot be nested.
The pattern GC~CAT means GC, followed by any symbol except C, followed by AT. The pattern GC~(A,T)CC means GC, followed by any symbol except A or T, followed by CC.
The pattern <GACCAT can only be found if it occurs at the beginning of the sequence range being searched. Likewise, the pattern GACCAT> would only be found if it occurs at the end of the sequence range.
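One way to experiment with this pattern syntax is to translate it into a regular expression. The sketch below is an approximation under stated assumptions, not the package's own matcher: it assumes the standard IUPAC-IUB nucleotide codes, treats a single number in braces as an open-ended minimum (as described above), ignores the 350,000 and 2,000 repeat limits, and assumes NOT expressions contain only single symbols. The name gcg_to_regex is hypothetical.

```python
import re

# Assumption: standard IUPAC-IUB ambiguity codes (see Appendix III).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "T",
    "R": "AG", "Y": "CT", "M": "AC", "K": "GT", "S": "CG", "W": "AT",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def _symbol(ch, nucleic):
    """Translate one sequence symbol into a regex fragment."""
    ch = ch.upper()
    if not nucleic:
        return re.escape(ch)
    chars = IUPAC[ch]
    return chars if len(chars) == 1 else "[" + chars + "]"


def gcg_to_regex(pattern, nucleic=True):
    """Compile a pattern in the syntax described above into a Python regex."""
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "<":                      # begin constraint
            out.append("^")
            i += 1
        elif ch == ">":                    # end constraint
            out.append("$")
            i += 1
        elif ch == "~":                    # NOT: one symbol or (A,B,...)
            if pattern[i + 1] == "(":
                end = pattern.index(")", i)
                choices = pattern[i + 2:end].split(",")
                i = end + 1
            else:
                choices = [pattern[i + 1]]
                i += 2
            excluded = "".join(
                IUPAC[c.upper()] if nucleic else c.upper() for c in choices
            )
            out.append("[^" + excluded + "]")
        elif ch == "(":                    # OR choices and/or repeat group
            end = pattern.index(")", i)
            choices = pattern[i + 1:end].split(",")
            words = ["".join(_symbol(c, nucleic) for c in w) for w in choices]
            out.append("(?:" + "|".join(words) + ")")
            i = end + 1
        elif ch == "{":                    # repeat count for preceding item
            end = pattern.index("}", i)
            low, _, high = pattern[i + 1:end].partition(",")
            out.append("{" + (low or "0") + "," + high + "}")
            i = end + 1
        else:
            out.append(_symbol(ch, nucleic))
            i += 1
    return re.compile("".join(out), re.IGNORECASE)


print(gcg_to_regex("TAATA(N){20,30}ATG").pattern)
print(bool(gcg_to_regex("GAT(TG,T,G){1,4}A").fullmatch("GATTGGA")))
print(bool(gcg_to_regex("RGF(Q,A)S", nucleic=False).search("MRGFASK")))
```

Apply the compiled expression to the sequence range being searched, so that the begin and end constraints refer to that range rather than to the whole sequence.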
We are grateful to Frank Manion for suggestions and for code used in the revision of Map for version 9.0. The vertical enzyme output format of Map was designed by John Schroeder and Frederick Blattner (NAR 10; 69-84 (1982), Figure 1). Map was written for the first release of the Wisconsin Package(TM) by Paul Haeberli and John Devereux.
You can set the parameters listed below from the command line. For more information, see "Using Program Parameters" in Chapter 3, Using Programs in the User's Guide.
The Minimum number of cuts, Maximum number of cuts, Show enzymes that cut only once, and Suppress enzymes that cut within the region base1,base2 parameters suppress the display of selected enzymes. The list of excluded enzymes in the program output includes both selected enzymes that cut within excluded ranges and selected enzymes that did not cut the right number of times.
Minimum number of cuts
excludes enzymes that do not cut at least the specified number of times.
Maximum number of cuts
excludes enzymes that cut more than the specified number of times.
Show enzymes that cut only once
excludes, from the set of enzymes displayed, those enzymes that cut your sequence more than once (equivalent to setting both Minimum number of cuts and Maximum number of cuts to one).
Suppress enzymes that cut within the region base1,base2
excludes enzymes that cut anywhere within one or more ranges of the sequence. If an enzyme cuts within an excluded range, then the enzyme is not displayed. The list of excluded enzymes includes enzymes that cut within excluded ranges. The ranges are defined with sets of two numbers. The numbers are separated by commas. Spaces between numbers are not allowed. The numbers must be integers that fall within the sequence beginning and ending points you have chosen. The range may be circular if circular mapping is being done. Exclusion is not done if there are any non-numeric characters in the numbers, if any number is out of range, or if there is an odd number of integers following the parameter.
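The four suppression parameters can be pictured as filters applied to each enzyme's list of cut positions. The following sketch shows one way to express them; it is not the program's implementation. The names parse_ranges, in_range, and select_enzymes, the enzyme names EnzA to EnzC, and their cut positions are all made up for illustration, and the treatment of a circular range (a first number larger than the second wraps around the origin) is an assumption.

```python
def parse_ranges(text, begin, end):
    """Parse 'base1,base2,...' into (low, high) pairs; return [] if invalid."""
    parts = text.split(",")
    # Non-numeric characters (including spaces) or an odd count disable exclusion.
    if len(parts) % 2 != 0 or not all(p.isdecimal() for p in parts):
        return []
    numbers = [int(p) for p in parts]
    if any(n < begin or n > end for n in numbers):
        return []
    return list(zip(numbers[0::2], numbers[1::2]))


def in_range(position, low, high):
    """True if position falls in the range; low > high is treated as circular."""
    if low <= high:
        return low <= position <= high
    return position >= low or position <= high


def select_enzymes(cut_sites, mincuts=None, maxcuts=None, once=False, excluded=()):
    """Split enzymes into (displayed, suppressed) lists."""
    if once:
        mincuts = maxcuts = 1
    displayed, suppressed = [], []
    for name, positions in cut_sites.items():
        count = len(positions)
        keep = (
            (mincuts is None or count >= mincuts)
            and (maxcuts is None or count <= maxcuts)
            and not any(in_range(p, lo, hi) for p in positions for lo, hi in excluded)
        )
        (displayed if keep else suppressed).append(name)
    return displayed, suppressed


# Made-up cut positions for three hypothetical enzymes.
sites = {"EnzA": [2165], "EnzB": [2206, 2219], "EnzC": [2400]}
print(select_enzymes(sites, once=True))
print(select_enzymes(sites, excluded=parse_ranges("2161,2170", 2161, 2600)))
```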
Display both forward and reverse strand cut positions
shows where each enzyme cuts the reverse strand as well as the forward strand. The cut point on the bottom strand is the 5' end of the fragment which continues to the left.
                       HgaI
                       SimI
                    NlaIII|
                 BsaJI   ||
                  DsaI   ||
                  NcoI   ||
                  StyI   ||
              BsaHI  |   ||                         RleAI
          BspGI   |  |   ||    MnlI              BseRI  |
      BfaI    |   |  |   || MnlI  |           CviJI  |  |  CviJI
         |    |   |  |   ||    |  |               |  |  |      |
     GCTCCTAGTCCAGACGCCATGGGTCATTTCACAGAGGAGGACAAGGCTACTATCACAAGC
2161 ---------+---------+---------+---------+---------+---------+ 2220
     CGAGGATCAGGTCTGCGGTACCCAGTAAAGTGTCTCCTCCTGTTCCGATGATAGTGTTCG
           |  |     ||   |   ||| |                || |         |
        BfaI  | BsaHI|StyI   ||| |            CviJI| |     CviJI
          BspGI NlaIII   |SimI|| |             BseRI |
                      NcoI MnlI| |               RleAI
                      DsaI  HgaI |
                     BsaJI    MnlI
Display the complement strand in the output
controls whether the complement strand appears in the output. Turning this parameter off suppresses the complement sequence display.
Write a table of cut sites (instead of restriction map)
If you simply want a table of which enzymes cut where, use this parameter. See the topic TABLE OUTPUT.
Order table of cut sites by
Table output is normally sorted by the position of the cut in the top strand of the sequence. Use this parameter to see the cuts sorted first by enzyme and then by position. See the topic TABLE OUTPUT.