Software for Bioinformatics

FTP
Netterm
WebLab Viewer
gstools
Mage
plasmid processor 1.02
Prekin
Primer
Promsed
RasMol
Biowire-Jellyfish
Swiss-Pdb Viewer
Sequin95
Vector NTI
X-Win412
Winseq
Winplasd

Chromas: Trace file viewer
[Download]
[Help]

SeqVerter:
Free sequence file format conversion
[Download]
[Tutorial]

Octopus
is an easy-to-use graphical user interface designed for the rapid interpretation of BLAST, BLAST-2 and FASTA output text files.
[Download]

SEQtools is a comprehensive program package for handling and analysis of nucleotide and protein sequences. The program includes a series of trivial functions to help you carry out common operations.

Special functions are included for design of microarray gene expression analysis experiments, for expression analyses with the SAGE procedure and for managing small EST projects. Utilities are included for primer design and ordering, renaming files, creating codon usage tables, building local searchable databases, aligning nucleotide and protein sequences, comparing sequences and a lot more...
[Registration]

BioEdit is a biological sequence alignment editor written for Windows 95/98/NT. A rich, intuitive multiple document interface with many convenient features makes alignment, manipulation and viewing of sequences relatively quick and easy on your desktop computer. Several sequence manipulation and analysis options and fully-automated links to local and WWW-based analysis programs facilitate an integrated working environment which allows you to view, align and analyze sequences from a single application with simple point-and-click operations. [Download] [Manual]

PHYLOGENETICS

PHYLIP is a free package of programs for inferring phylogenies. It is distributed as source code, documentation files, and a number of different types of executables. http://evolution.genetics.washington.edu/phylip.html
Download phylip V3.5

GENEDOC is a full-featured multiple sequence alignment editor, analyser and shading utility for Windows. Features include: Data Analysis and Visualization, Score assisted manual alignment, Paginated Printouts, Windows Based, Highly Configurable, Exported Figures and Phylogenetic Tree support. http://www.psc.edu/biomed/genedoc/
Download GeneDoc V322700 (March, 2007)

TREEVIEW is a program for displaying and printing phylogenies. The program reads most NEXUS tree files (such as those produced by PAUP and COMPONENT) and PHYLIP style tree files (including those produced by fastDNAml and CLUSTALW).
http://taxonomy.zoology.gla.ac.uk/rod/treeview.html
Download Treeview V1.6.6 (Sep, 2001)

Software and Databases for Computational Biology on the Internet

This page is a supplement to the book Computational Methods in Molecular Biology, edited by Steven Salzberg, David Searls, and Simon Kasif. The publisher is Elsevier Sciences. Please contact Steven Salzberg (salzberg@cs.jhu.edu) if you wish to have your software referenced on this site, or if you wish to change the description of your software already listed here.

Gene finders and other sequence analysis programs

Glimmer is a system that uses Interpolated Markov Models (IMMs) to identify coding regions in microbial DNA. IMMs are a generalization of Markov models that allow great flexibility in the choice of the "context"; i.e., how many previous bases to use in predicting the next base. Glimmer has been tested on the complete genomes of H. influenzae, E. coli, H. pylori, M. genitalium, and other genomes, and results to date have proven it to be highly accurate. Glimmer was the principal gene finder for the genomes of B. burgdorferi, T. pallidum, C. trachomatis, C. pneumoniae, D. radiodurans, T. maritima, and others. The complete system, including source code, is available from this site. A version of the system built for the malaria parasite, GlimmerM, is also available.
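The notion of a "context" of previous bases can be illustrated with a plain fixed-order Markov model. The sketch below is not Glimmer's code: it uses a single order k with add-one smoothing (an assumption of this example), whereas an IMM blends several context lengths. The function names and the toy training sequence are hypothetical.

```python
import math
from collections import defaultdict

BASES = "ACGT"

def train_markov(seqs, k):
    """Count how often each base follows each k-base context."""
    counts = defaultdict(lambda: dict.fromkeys(BASES, 0))
    for seq in seqs:
        for i in range(k, len(seq)):
            context, base = seq[i - k:i], seq[i]
            if base in BASES:
                counts[context][base] += 1
    return counts

def next_base_prob(counts, context, base):
    """P(base | context) with add-one smoothing (an assumption of this sketch)."""
    row = counts.get(context, dict.fromkeys(BASES, 0))
    total = sum(row.values())
    return (row[base] + 1) / (total + len(BASES))

def log_likelihood(counts, seq, k):
    """Log-probability of seq under the model, skipping non-ACGT symbols."""
    return sum(
        math.log(next_base_prob(counts, seq[i - k:i], seq[i]))
        for i in range(k, len(seq))
        if seq[i] in BASES
    )

if __name__ == "__main__":
    model = train_markov(["ACGACGACG"], k=2)      # toy training data
    print(next_base_prob(model, "AC", "G"))       # G is likely after "AC"
    print(log_likelihood(model, "ACGACG", k=2))
```

A gene finder built on this idea would train one model on known coding sequence and another on non-coding sequence, then compare the two log-likelihoods for a candidate region.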
GENSCAN is a program designed to predict complete gene structures, including exons, introns, promoter and poly-adenylation signals, in genomic sequences. It differs from the majority of existing gene finding algorithms in that it allows for partial genes as well as complete genes and for the occurrence of multiple genes in a single sequence, on either or both DNA strands. Program versions suitable for vertebrate, nematode (experimental), maize and Arabidopsis sequences are currently available. The vertebrate version also works fairly well for Drosophila sequences. Sequences can be submitted on a web-based form at this site. The GENSCAN Web site is at Stanford University.
VEIL (the Viterbi Exon-Intron Locator) uses a custom-designed hidden Markov model (HMM) to find genes in eukaryotic DNA. The training of the current version of VEIL used Burset and Guigo's database of 570 vertebrate sequences, so VEIL will work best on sequences from vertebrates. The VEIL site is at Johns Hopkins University.
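The Viterbi algorithm that gives VEIL its name finds the most probable sequence of hidden states for an observed sequence. The sketch below is a generic log-space Viterbi decoder on a toy two-state model; the states, probabilities and names are invented for illustration and are not VEIL's actual HMM.

```python
import math

# Hypothetical toy model: GC-rich "exon" state, AT-rich "intron" state.
STATES = ("exon", "intron")
START = {"exon": 0.5, "intron": 0.5}
TRANS = {
    "exon": {"exon": 0.9, "intron": 0.1},
    "intron": {"exon": 0.1, "intron": 0.9},
}
EMIT = {
    "exon": {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1},
    "intron": {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4},
}

def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return the most probable state path for a non-empty observation string."""
    # best[t][s]: log-probability of the best path that ends in state s at position t
    best = [{s: math.log(start_p[s]) + math.log(emit_p[s][obs[0]]) for s in states}]
    back = [{}]
    for t in range(1, len(obs)):
        best.append({})
        back.append({})
        for s in states:
            prev, score = max(
                ((p, best[t - 1][p] + math.log(trans_p[p][s])) for p in states),
                key=lambda item: item[1],
            )
            best[t][s] = score + math.log(emit_p[s][obs[t]])
            back[t][s] = prev
    # Trace back from the best final state.
    path = [max(states, key=lambda s: best[-1][s])]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]

if __name__ == "__main__":
    print(viterbi("GCGCGCATATAT", STATES, START, TRANS, EMIT))
```

Working in log space avoids numerical underflow on long sequences, which matters for genomic inputs.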
MORGAN is an integrated system for finding genes in vertebrate DNA sequences. MORGAN uses a variety of techniques to accomplish this task, including a decision tree classifier, Markov chains to recognize splice sites, and a frame-dependent dynamic programming algorithm. Morgan has been trained and tested primarily on vertebrate sequence data. Results showing Morgan's accuracy and the source code to the system can be obtained from this site. The MORGAN site is at Johns Hopkins University.
Genie, a gene finder based on generalized hidden Markov models, is at the Lawrence Berkeley National Laboratory. It was developed in collaboration with the Computational Biology Group at the University of California, Santa Cruz. Genie uses a statistical model of genes called a Generalized Hidden Markov Model (GHMM) to find genes in vertebrate and human DNA. In a GHMM, probabilities are assigned to transitions between states and to the generation of each nucleotide base given a particular state. Machine learning techniques are applied to optimize these probabilities using a standardized gene data set, which is available on this site. The page has a link to the Genie Web server, to which sequences may be submitted.
GRAIL, GENQUEST and The Genome Channel provide analysis and putative annotation of DNA sequences both interactively and through the use of automated computation. GRAIL is a tool for the identification of genes, exons, and various features in DNA sequences. GENQUEST is a sequence comparison program designed for rapid, sensitive comparison of DNA and protein sequences to DNA/protein databases. The GENOME CHANNEL is a tool for the comprehensive sequence-based view of genomes. These systems are at the Oak Ridge National Laboratory in Tennessee.
The FGENE family of programs finds splice sites, genes, promoters, and poly-A recognition regions in eukaryotic sequence data. The underlying technology uses linear discriminant analysis. You can submit sequences to FGENE using a Web interface found here. The site is at the Sanger Centre.
The GeneID server contains the GeneID system for finding genes in eukaryotes. GeneID is a hierarchical rule-based system, with scoring matrices to identify signals and rules to score coding regions. You can use this page to submit a genomic DNA sequence to the GeneID program. The GeneID site is at www1.imim.es/PersonalHomePage/rguigo/Geneid/geneid_input.html in Spain, and is also available at this page at Boston University.
GeneParser identifies protein coding regions in eukaryotic DNA sequences. The home page at the University of Colorado includes various documents describing GeneParser's theory and performance as well as some sample output screens. The complete system is available here.
GenLang is a syntactic pattern recognition system that uses the tools and techniques of computational linguistics to find genes and other higher-order features in biological sequence data. Patterns are specified by means of rule sets called grammars, and a general purpose parser, implemented in the logic programming language Prolog, then performs the search. This system is at the University of Pennsylvania.
GeneMark is a system for finding genes in bacterial DNA sequences. The algorithm is based on non-homogeneous 5th-order Markov chains, and it was used to locate the genes in the complete genomes of H. influenzae, M. genitalium, and several other complete genomes. The site includes documentation and a Web interface to which sequences can be submitted. This system is at the Georgia Institute of Technology in Atlanta, GA.
MarFinder uses statistical patterns to deduce the presence of MARs (Matrix Association Regions) in DNA sequences. MARs constitute a significant functional block and have been shown to facilitate the processes of differential gene expression and DNA replication. This tool and Web site are at the National Center for Genome Resources.
NetPlantGene is at the Technical University of Denmark. The NetPlantGene Web server uses neural networks to predict splice sites in Arabidopsis thaliana DNA. This site also contains programs for other sequence analysis problems, such as the recognition of signal peptides. NetPlantGene is to be replaced with NetGene2.
MZEF and Pombe. This page contains software tools designed to predict putative internal protein coding exons in genomic DNA sequences. Human, mouse and arabidopsis exons are predicted by a program called MZEF, and fission yeast exons are predicted by a program called Pombe. The site is located at the Cold Spring Harbor Laboratory.
PROCRUSTES finds the multi-exon structure of a gene by aligning it with the protein databases. PROCRUSTES uses an algorithm called spliced alignment, which explores all possible exon assemblies and finds the multi-exon structure with the best fit to a related protein. If a database sequence exists that is closely similar to the query, PROCRUSTES will produce a highly accurate prediction. This program and Web page are at the University of Southern California.
Promoter Prediction by Neural Network (NNPP) is a method that finds eukaryotic and prokaryotic promoters in a DNA sequence. The basis of the NNPP program is a time-delay neural network. The time-delay network consists mainly of two feature layers, one for recognizing the TATA-box and one for recognizing the "Initiator", which is the region spanning the transcription start site. Both feature layers are combined into one output unit, which gives output scores between 0 and 1. This site is at the Lawrence Berkeley National Laboratory. Also available at this site is the splice site predictor used by the Genie system. The output of this neural network is a score between 0 and 1 indicating a potential splice site.
Repeat Pattern Toolkit (RPT) consists of tools for analyzing repetitive sequences in a genome. RPT takes as input a single sequence in GenBank format, and attempts to find both coding (possible gene duplications, pseudogenes, homologous genes) and non-coding repeats. RPT locates all repeats using a fast Sensitive Search Tool (SST). These repeats are evaluated for statistical significance utilizing a sensitive All-PAM search, and their evolutionary distance is estimated. The repeats are classified into families of similar sequences. The classification output is tabulated using perl scripts and plotted using gnuplot. RPT is at the Institute for Biomedical Computing at Washington University in St. Louis.
GenTerpret integrates local and web-based sequence analysis tools, including the identification of repetitive elements, coding regions and promoters, with BLAST or user-defined programs. The package includes SorFind 2.0, RepFind 2.0, and PromFind 2.0. It currently runs on Windows machines, and a Macintosh version is under development.
SplicePredictor is a program designed to predict donor and acceptor splice sites in maize and Arabidopsis sequences. Sequences can be submitted on a web-based form at this site. The system is at Stanford University.
The TIGR Software Tool Collection is at The Institute for Genomic Research. A number of software tools are freely available for download. Tools currently available include:
- AAT: A tool for analyzing and annotating genomic sequences. Huang, X., Adams, M.D., Zhou, H. and Kerlavage, A.R. (1997) Genomics 46, 37-45. The AAT package includes two sets of programs, one set (DPS/NAP) for comparing the query sequence with a protein database, and the other (DDS/GAP2) for comparing the query with a cDNA database. Each set contains a fast database search program and a rigorous alignment program.
- btab: a BLAST output parser, which reformats BLAST output into an easily parsable form which can be used by a variety of other programs. Btab was written by Mark Dubnick at the NIH and appeared in CABIOS 8:601-602. The program has been modified by TIGR. The original program may be obtained from NIH.
- Glimmer: a bacterial gene finding system (with its own separate page; see elsewhere on this page)
- grasta: Modified Fasta code that searches both strands and outputs btab format files
- hbqcm (Hexamer Based Quality Control Method): a quality control algorithm for DNA sequencing projects
- Lucy: A utility that prepares raw DNA sequence fragments for sequence assembly. This sequence cleanup program includes quality assessment, confidence reassurance, vector trimming and vector removal.
- MUMmer: A system for aligning whole genome sequences. Using a suffix tree, the system is able to rapidly align sequences containing millions of nucleotides. Usage of the algorithm should facilitate analysis of syntenic chromosomal regions, strain-to-strain comparisons, evolutionary comparisons, and genomic duplications.
- TIGR Assembler: a tool for assembly of large sets of overlapping sequence data such as ESTs, BACs, or small genomes
This page is at The Institute for Genomic Research in Rockville, Maryland.
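Whole-genome aligners such as MUMmer anchor an alignment on exact matches that are unique in both sequences and cannot be extended. The sketch below illustrates that idea with a simpler substitute technique, k-mer seeding followed by extension, rather than a suffix tree; it finds only those unique maximal matches that contain a k-mer occurring exactly once in each sequence, and it does not scale to genomes. The function name and parameters are hypothetical.

```python
from collections import Counter

def unique_maximal_matches(a, b, k):
    """Return sorted (start_a, start_b, length) for maximal exact matches
    that contain a k-mer occurring exactly once in a and once in b."""

    def unique_kmers(s):
        kmers = [s[i:i + k] for i in range(len(s) - k + 1)]
        counts = Counter(kmers)
        return {kmer: i for i, kmer in enumerate(kmers) if counts[kmer] == 1}

    seeds_a, seeds_b = unique_kmers(a), unique_kmers(b)
    matches = set()
    for kmer, i in seeds_a.items():
        j = seeds_b.get(kmer)
        if j is None:
            continue
        start_a, start_b = i, j
        end_a, end_b = i + k, j + k
        # Extend the seed to the left while the bases agree.
        while start_a > 0 and start_b > 0 and a[start_a - 1] == b[start_b - 1]:
            start_a -= 1
            start_b -= 1
        # Extend the seed to the right while the bases agree.
        while end_a < len(a) and end_b < len(b) and a[end_a] == b[end_b]:
            end_a += 1
            end_b += 1
        matches.add((start_a, start_b, end_a - start_a))
    return sorted(matches)

if __name__ == "__main__":
    print(unique_maximal_matches("AAACGTTGCACC", "GGACGTTGCATT", k=4))
```

Because an extended match contains a k-mer that is unique in each sequence, the extended match itself is also unique in each sequence. A suffix tree finds all such matches, including those missed here, in time linear in the sequence lengths.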
TESS (Transcription Element Search Software) is a set of software routines for locating and displaying transcription factor binding sites in DNA sequence. TESS uses the Transfac database as its store of transcription factors and their binding sites. This page is at the University of Pennsylvania's Computational Biology and Informatics Laboratory.
Genotator, a workbench for automated sequence annotation, provides a flexible, transparent system for automatically running a series of sequence analysis programs on genetic sequences. It also has a graphical display that allows users to view all of the automatically-generated annotations and add their own. Genotator's display allows annotated sequences to be examined at multiple levels of detail, from an overview of the entire sequence down to individual bases. By displaying the aligned output of multiple types of sequence analysis, Genotator provides an intuitive way to identify the significant regions (for example, probable exons) in a sequence. Genotator was developed by Nomi Harris at Lawrence Berkeley National Laboratory.
WebGene (GenView, ORFGene, SpliceView) is a Web interface for several coding region recognition programs, including:
- GenView: a system for protein-coding gene prediction
- ORFGene: gene structure prediction using information on homologous protein sequences
- SpliceView: prediction of splicing signals
- HCpolya: a Hamming clustering method for poly-A prediction in eukaryotic genes
This page is at the Instituto Tecnologie Biomediche Avanzate in Italy.
The Staden Package contains a wealth of useful programs for sequence assembly, DNA sequence comparison and analysis, protein sequence analysis, and sequencing project quality control. The site is mirrored in several locations around the world.

Databases
The NCBI WWW Entrez PubMed Browser, at the National Center for Biotechnology Information (NCBI), is an important resource for searching the NCBI protein, nucleotide, 3-D structures, and genomes databases. You can also browse NCBI's taxonomy and search for bibliographic entries in Entrez PubMed.
GeneCards is a database of human genes, their products and their involvement in diseases. It offers concise information about the functions of all human genes that have an approved symbol as well as selected others. It is especially useful for those working in functional genomics and proteomics who are searching for information. The data are collected with knowledge discovery and data mining techniques and accessed by means of a proprietary Guidance System that makes more or less intelligent suggestions to the user of where and how the information may be retrieved.
GSDB is one of the four public nucleotide databases worldwide. Because GSDB is a community-curated and -owned database, the research community can obtain direct access to GSDB to deposit and curate data. As part of GSDB's efforts to provide the research community with a useful data set, sequence data from other archival databases such as DDBJ, EMBL and GenBank are incorporated and made available with annotation. Data retrieval from GSDB is provided through Web query tools and direct SQL. GSDB is made available through the Web site of the National Center for Genome Resources.
HHS Sequence Classification. HHS is a database of sequences that have been clustered based on a variety of criteria. The database and clustering algorithms are described in Chapter 6. This Web page, at the Institute for Biomedical Computing at Washington University in St. Louis, allows one to access classifications by sequence, group listing, structure, and alignment.
The EpoDB (Erythropoiesis Database) is a database of genes that relate to vertebrate red blood cells. A detailed description of EpoDB can be found in Chapter 5. The database includes DNA sequence, structural features and potential transcription factor binding sites. This Web site is at the University of Pennsylvania's CBIL.
The LENS (Linking ESTs and their associated Name Space) database links and resolves the names and identifiers of clones and ESTs generated in the I.M.A.G.E. Consortium/WashU/Merck EST project. The name space includes library and clone IDs and names from IMAGE Consortium, EST sequence IDs from Washington University, sequence entry accession numbers from dbEST/NCBI, and library and clone IDs from GDB. LENS allows for querying of IMAGE Consortium data via all the different IDs.
PDD, the NIMH-NCI Protein-Disease Database is at the Laboratory of Experimental and Computational Biology at the National Cancer Institute. This server is part of the NIMH-NCI Protein-Disease Database project for correlating diseases with proteins observable in serum, CSF, urine and other common human body fluids based on biomedical literature.
The Genome Database (GDB), at the Johns Hopkins University School of Medicine, comprises descriptions of the following types of objects: regions of the human genome, including genes, clones, amplimers (PCR markers), breakpoints, cytogenetic markers, fragile sites, ESTs, syndromic regions, contigs and repeats; maps of the human genome, including cytogenetic maps, linkage maps, radiation hybrid maps, content contig maps, and integrated maps, which can be displayed graphically via the Web; and variations within the human genome, including mutations and polymorphisms, plus allele frequency data. NOTE: GDB has been transferred by DOE from Hopkins to the DOE lab at Oak Ridge.
The TRANSFAC Database is at the Gesellschaft für Biotechnologische Forschung mbH (Germany). TRANSFAC is a transcription factor database. It compiles data about gene regulatory DNA sequences and protein factors binding to them. On this basis, programs are developed that help to identify putative promoter or enhancer structures and to suggest their features.
TransTerm - Translational Signal Database is a database at the University of Otago (New Zealand). TransTerm contains sequence contexts about the stop and start codons of many species found in GenBank. TransTerm also contains codon usage data for these same species and summary statistics for the sequences analysed.
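Codon usage data of the kind mentioned above can be tabulated directly from a coding sequence. The following is a minimal sketch, assuming the input is an in-frame coding sequence; the function name is hypothetical and this is not TransTerm's own procedure.

```python
from collections import Counter

def codon_usage(cds):
    """Return the relative frequency of each codon in an in-frame coding sequence.

    Trailing bases that do not form a complete codon are ignored.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    counts = Counter(cds[i:i + 3] for i in range(0, usable, 3))
    total = sum(counts.values())
    return {codon: n / total for codon, n in counts.items()}

if __name__ == "__main__":
    for codon, freq in sorted(codon_usage("ATGGCTGCTTAA").items()):
        print(codon, freq)
```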

Motif Search:
PROSITE Search Form: Allows you to rapidly compare a protein sequence against all patterns stored in the PROSITE pattern database. It answers the question: Which patterns from the PROSITE database are found in my sequence? (EBI)
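PROSITE patterns use a compact syntax: elements are separated by "-", "x" matches any residue, "[...]" lists allowed residues, "{...}" lists forbidden residues, "(n)" or "(n,m)" gives a repeat count, and "<" or ">" anchors the pattern to the start or end of the sequence. The sketch below translates that syntax into a Python regular expression and scans a sequence. It is an illustration only, not the code behind any of the servers listed here; the function names are hypothetical, and rarer syntax (such as ">" inside brackets) is not handled.

```python
import re

def prosite_to_regex(pattern):
    """Translate a well-formed PROSITE-style pattern into a Python regex string."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        prefix = suffix = ""
        if element.startswith("<"):          # N-terminal anchor
            prefix, element = "^", element[1:]
        if element.endswith(">"):            # C-terminal anchor
            suffix, element = "$", element[:-1]
        # Split off an optional repeat count such as (2) or (2,4).
        m = re.fullmatch(r"(.+?)(?:\((\d+(?:,\d+)?)\))?", element)
        core, rep = m.group(1), m.group(2)
        if core == "x":
            core = "."
        elif core.startswith("{") and core.endswith("}"):
            core = "[^" + core[1:-1] + "]"
        # Single letters and [...] sets are already valid regex.
        parts.append(prefix + core + ("{" + rep + "}" if rep else "") + suffix)
    return "".join(parts)

def find_pattern(pattern, sequence):
    """Return (start, matched_text) for each non-overlapping match."""
    regex = re.compile(prosite_to_regex(pattern))
    return [(m.start(), m.group()) for m in regex.finditer(sequence)]

if __name__ == "__main__":
    # N-glycosylation site pattern: N, not P, S or T, not P.
    print(prosite_to_regex("N-{P}-[ST]-{P}."))
    print(find_pattern("N-{P}-[ST]-{P}.", "MKNGSAANPTQ"))
```

Note that `finditer` reports non-overlapping matches only; a lookahead-based regex would be needed to report overlapping sites.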
ScanProsite: Protein against Prosite form allows one to scan a protein sequence (either from SWISS-PROT or provided by the user) for the occurrences of patterns stored in the PROSITE database. Pattern against SWISS-PROT scans all of the SWISS-PROT database (including weekly releases) for the occurrence of a pattern that can originate from PROSITE or be provided by the user. (ExPASy)
Motifs in protein databases program determines if a protein motif is present in a database of protein sequences. This program allows the user to define a protein motif and then determine if a DNA sequence might encode it or if it is present in a protein database. The programs do not search a library of predefined protein motifs. A motif is defined by entering the amino acids of interest at each position. (Alces)
MatInspector: A tool for the detection of transcription factor binding sites. It is able to locate matches of sequences of unlimited length and compare one, several or all sequences in a sequence file against all or selected subsets of matrices from a library of matrix descriptions of protein binding sites. (GSF)
MEME - Multiple EM for Motif Elicitation system allows one to discover motifs of highly conserved regions in groups of related DNA or protein sequences and to search sequence databases with those motifs using MAST. MAST works by calculating match scores for each sequence. The match scores are converted into various types of p-values and these are used to determine the overall match of the sequence to the group of motifs. (SDSC)
Regular Expression Searches of Sequence DB using FPAT. This page is designed to search a molecular sequence database (proteins only) for patterns using simple regular expressions. At present, only protein sequence databases are available on the server. (Univ. of Toronto)
BCM Search Launcher: The Baylor College of Medicine has a variety of biology related search and analysis services including general protein sequence/Pattern searches and Species-Specific Protein Sequence Searches. (HGSC)
Screening pattern or alignment against PROTEIN databank This method of looking for all pattern entries in PROTEIN databank is almost the same as in PROSITE screening procedure. The one difference is that coincidence of pattern's and fragment's letter could be seen in a broad sense: as a similarity of letters according to a weight matrix selected by the user. (genebee)
Pattern Searching Proteins: A collection of software tools for protein sequence analysis. PATTINPROT scans a protein sequence or a protein database for one or several patterns. PROSCAN scans a protein sequence for sites/signatures against the PROSITE database.
PPSEARCH: Prosite Database Searches (sequence against databases of motifs). Allows you to search sequences for motifs or functional patterns in the PROSITE database. (EBI)
COGnitor: Compare your sequence to COG- Clusters of Orthologous Groups database. Each COG consists of individual proteins or groups of paralogs from at least 3 lineages and thus corresponds to an ancient conserved domain. (NCBI)
HMMER Sean Eddy: Profile hidden Markov models can be used to do sensitive database searching using statistical descriptions of a sequence family's consensus. The advantage of using HMMs is that HMMs have a formal probabilistic basis and can be trained from unaligned sequences, if a trusted alignment isn't yet known. They do, however, make poor models of RNAs because they cannot describe base pairs. HMMER is a freely distributable implementation of profile HMM software for protein sequence analysis. (Washington Univ.)
emotif is a research system that forms motifs for subsets of aligned sequences. Emotif ranks the motifs that it finds by both their specificity and the number of supplied sequences that it covers. (Stanford Bioinformatics Group)
FunSiteP Promoter Recognition: Recognition and classification of eukaryotic promoters by searching transcription factor binding sites using transcription factor binding site consensi. (GSF)
SMART Simple Modular Architecture Research Tool: Allows rapid identification and annotation of signalling protein domain sequences. It is able to determine the modular architectures of single sequences or genomes. (EBI)
SAM: Sequence Alignment and Modeling System using HMM (Hidden Markov Model). SAM is a collection of software tools for creating, refining and using linear HMMs for biological sequence analysis. Documentation for SAM can be found here. (Pasteur)

Secondary Structure Prediction:
THREADER2 is a program for predicting protein tertiary structure by recognizing the correct fold from a library of alternatives. Of course, if a fold similar to the native fold of the protein being predicted is not in the library, then this approach will not succeed. Fortunately, certain folds crop up time and time again, and so fold recognition methods for predicting protein structure can be very effective. In the first prediction contest held at Asilomar, organized by John Moult and colleagues, THREADER correctly identified 8 out of 11 target structures which either globally or locally resembled a previously observed fold. Preliminary analysis of the results from the second competition (CASP2) shows that THREADER 2 has clearly improved in both fold recognition sensitivity and sequence-structure alignment accuracy. In CASP2, the new version of THREADER recognized 4 folds correctly out of 6 targets with recognizable structures (including the difficult task of assigning a jelly-roll fold rather than other beta-sandwich topologies for one target). THREADER 2 produced more correct fold predictions (i.e. correct folds ranked at No. 1) than any other method.
Predict Protein is a service for sequence analysis and structure prediction. Once you submit a protein sequence, PredictProtein retrieves similar sequences in the database and predicts aspects of protein structure, residue solvent accessibility and helical transmembrane regions. (EMBL)
MultPred - Multpredict Secondary Structure of Multiply Aligned Sequences: This program predicts secondary structure using physicochemical information from a set of aligned sequences and the Garnier secondary structure decision constants. The program requires as input, the sequences, aligned using the AMPS program. (AMPS)
NNPREDICT Protein Secondary Structure Prediction: A program that predicts the secondary structure type for each residue in an amino acid sequence. The basis of the prediction is a two-layer, feed-forward neural network. NNPREDICT takes as input a protein sequence and returns a secondary structure prediction for each position in the sequence. (UCSF)
PSA Protein Structure Prediction Server: Predicts probable secondary structures and folding classes for a given amino acid sequence. It performs three types of protein structure/sequence analysis:
1. Analysis of full length amino acid sequences that are assumed to be monomeric globular, water-soluble proteins consisting of a single domain
2. Analysis of either complete sequences, or sequence fragments with a minimal set of modelled structural assumptions
3. Analysis of potential WD-repeat protein family sequences
(BMERC at Boston University)
SSCP Secondary Structural Content Prediction computes predictions for the content of helix, strand, and coil for a given protein using the amino acid composition as the only input of information. The method used by SSCP consists of applying analytic vector decomposition methods to the composition vector of the query protein.
PREDATOR Secondary structure prediction from single or multiple sequences. Takes as input a single protein sequence to be predicted and can optionally use a set of unaligned sequences as additional information to predict the query sequence. PREDATOR does not use multiple sequence alignment; instead it relies on careful pairwise local alignments of the sequences in the set with the query sequence to be predicted. If you supply a set of sequences in the form of a multiple alignment in CLUSTAL or MSF format, the sequences will be used but unaligned. (EMBL)
RNA secondary structure prediction: If a multiple alignment is given by the user, the information on conservative positions in it and compensation exchanges in some of those will be used - stems, including such positions, are given more chances to be included into the resulting secondary structure. The algorithm is the following: first all of the possible ways of fitting together different pieces of the sequences are looked for. Then locally optimal secondary structures are built from the helices found. Lastly, the final system construction is done optimizing the model energy of the system (includes inputs from conservative and complementary pairs with corresponding coefficients). (Genebee)
PSCAN server page: A program to play with protein threading. Allows one to align two sequences, find a match in the database through email (more reliable) or without email.
RNA-mfold and DNA-mfold: Performs RNA and DNA secondary structure prediction using nearest neighbor thermodynamic rules. The mfold software uses what are called nearest neighbor energy rules. That is, free energies are assigned to loops rather than to base pairs. Documentation for the programs and more detail about how the structures are computed can be found here. (M. Zuker at Washington University)
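For contrast with the loop-based energy rules described above, the simplest dynamic-programming approach to RNA folding just maximizes the number of nested base pairs (a Nussinov-style recursion). The sketch below implements that simpler technique, not mfold's nearest neighbor energy model; the function name, the allowed pairs (Watson-Crick plus G-U wobble) and the minimum hairpin loop of three bases are assumptions of this example.

```python
def max_base_pairs(seq, min_loop=3):
    """Maximum number of nested base pairs in an RNA sequence.

    best[i][j] holds the best pair count for the subsequence seq[i..j].
    """
    pairs = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
    n = len(seq)
    if n == 0:
        return 0
    best = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            score = best[i][j - 1]                      # case: j is unpaired
            for k in range(i, j - min_loop):            # case: j pairs with k
                if (seq[k], seq[j]) in pairs:
                    left = best[i][k - 1] if k > i else 0
                    score = max(score, left + 1 + best[k + 1][j - 1])
            best[i][j] = score
    return best[0][n - 1]

if __name__ == "__main__":
    print(max_base_pairs("GGGAAAUCC"))
```

Energy-based methods replace the "+1 per pair" score with free energies assigned to stacked pairs and loops, which is why they predict more realistic structures than pair counting.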
SoWhat: The SoWhat WWW server predicts distance constraints between amino acids in proteins from the amino acid sequence. It uses a neural network based method to predict contacts between C-alpha atoms from the amino acid sequence. (CBS Denmark)
Pasteur Institute:
- STRIDE: Protein secondary structure assignment from atomic coordinates
- DSSP: Definition of secondary structure of proteins given a set of 3D coordinates
- DSC: Discrimination of protein secondary structure class
- PREDATOR: Protein secondary structure prediction from a single sequence or a set of sequences
- environ: calculate accessible as well as buried surface area in protein structure
- confmat: Side chain packing optimization on a given main chain template for protein

Other Software and Information Sources:
The VSNS BioComputing Division offers educational services over the Internet in bioinformatics/biocomputing. They have offered award winning online courses in sequence analysis. The site includes a hypertext coursebook, covering topics such as pairwise sequence alignments, networking, and multiple alignment. You can also find a collection of online exercises, called "Sequence Analysis with Distributed Resources", and the "Biocomputing For Everyone" and "Biocomputing For Schools" Websites.
The Banbury Cross Site is a web page for benchmarking gene identification software. Banbury Cross is at the Centre National De La Recherche Scientifique. This Benchmark site is intended to be a forum for scientists working in the field of gene identification and anonymous genomic sequence annotation, with the goal of improving current methods in the context of very large (in particular) vertebrate genomic sequences.
CBIL bioWidgets, at the University of Pennsylvania, is a collection of software libraries used for rapid development of graphical molecular biological applications. It includes:
- bioWidgets for Java(tm), a toolkit of biology-specific user interface widgets useful for rapid application development in Java(tm)
- bioTK, a toolkit of biology-specific user interface widgets useful for rapid application development in Tcl/Tk
- RSVP, a PostScript tool which lets your printer do nucleic acid sequence analysis; it generates very nice color diagrams of the results.
Human Genome Project Information at Oak Ridge National Laboratory contains many interesting and useful items about the U.S. Human Genome Project. They also have a more technical Research site.
FAKtory: A software environment for DNA Sequencing is at the University of Arizona. It is a prototype software environment in support of DNA sequencing. The environment consists of
1. their software library, FAK, for the core combinatorial problem of assembling fragments
2. a Tcl/Tk based interface
3. a software suite supporting a database of fragments and a processing pipeline that includes clipping, tagging, and vector removal modules.
A key feature of FAKtory is that it is highly customizable: the structure of the fragment database, the processing pipeline, and the operation of each phase of the pipeline may be specified by the user.
Computational Analysis and Annotation of Sequence Data. This is a tutorial by A. Baxevanis, M. Boguski, and B.F. Ouellette on how to use alignment programs and databases for sequence comparison. It is a review that will appear in the forthcoming book Genome Analysis: A Laboratory Manual (Bruce Birren, Eric Green, Phil Hieter, Sue Klapholz and Rick Myers, eds) to be published by Cold Spring Harbor Laboratory Press. The hypertext version of the review is linked to Medline records, software repositories, sequences, structures, and taxonomies via the Entrez system of the National Center for Biotechnology Information.
