Software for Bioinformatics
FTP
Netterm
WebLab Viewer
gstools
Mage
plasmid processor 1.02
Prekin
Primer
Promsed
RasMol
Biowire-Jellyfish
Swiss-Pdb Viewer
Sequin95
Vector NTI
X-Win412
Winseq
Winplasd
Octopus
is an easy-to-use graphical user interface designed for the rapid
interpretation of BLAST, BLAST-2 and FASTA output text files.
SEQtools
is a comprehensive program package for handling and analysis of nucleotide
and protein sequences. The program includes a series of basic functions to
help you carry out common operations. Special functions are included for the
design of microarray gene expression analysis experiments, for expression
analyses with the SAGE procedure, and for managing small EST projects.
Utilities are included for primer design and ordering, renaming files,
creating codon usage tables, building local searchable databases, aligning
nucleotide and protein sequences, comparing sequences, and more.
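The page does not say how SEQtools builds its codon usage tables. As a rough illustration of what such a table holds, here is a minimal Python sketch. It assumes an in-frame coding sequence that starts at its first base, and `codon_usage` is a hypothetical name, not part of SEQtools. It reports each codon's frequency per thousand codons, a common way to express codon usage.

```python
from collections import Counter


def codon_usage(cds: str) -> dict[str, float]:
    """Return each codon's frequency per 1000 codons in an in-frame coding sequence.

    Hypothetical helper for illustration; not SEQtools' own code.
    """
    seq = cds.upper().replace("U", "T")  # accept RNA as well as DNA
    # Consecutive triplets from the first base; a trailing partial codon is ignored.
    codons = [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]
    # Skip codons containing ambiguous bases such as N.
    codons = [c for c in codons if set(c) <= set("ACGT")]
    counts = Counter(codons)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {codon: 1000.0 * n / total for codon, n in sorted(counts.items())}


if __name__ == "__main__":
    for codon, per_thousand in codon_usage("ATGGCTGCTAAATAA").items():
        print(f"{codon}  {per_thousand:6.1f}")
```

For `ATGGCTGCTAAATAA`, GCT gets 400.0 because it accounts for 2 of the 5 codons; every other codon gets 200.0.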
BioEdit is a
biological sequence alignment editor written for Windows 95/98/NT. A rich,
intuitive multiple document interface with many convenient features makes
alignment, manipulation and viewing of sequences relatively quick and easy on
your desktop computer. Several sequence manipulation and
analysis options and fully automated links to local and WWW-based analysis
programs provide an integrated working environment in which you can view,
align and analyze sequences from a single application with simple
point-and-click operations.
PHYLOGENETICS
PHYLIP
is a free package of programs for inferring phylogenies. It is
distributed as source code, documentation files, and a number of different
types of executables.
http://evolution.genetics.washington.edu/phylip.html
Download phylip V3.5
GENEDOC
is a full-featured multiple sequence alignment editor, analyser and
shading utility for Windows. Features include: Data Analysis and
Visualization, Score assisted manual alignment, Paginated Printouts, Windows
Based, Highly Configurable, Exported Figures and Phylogenetic Tree support.
http://www.psc.edu/biomed/genedoc/
Download GeneDoc V322700 (March, 2007)
TREEVIEW
is a program for displaying and printing phylogenies. The program reads most
NEXUS tree files (such as those produced by PAUP and COMPONENT) and PHYLIP
style tree files (including those produced by fastDNAml and CLUSTALW).
http://taxonomy.zoology.gla.ac.uk/rod/treeview.html
Download Treeview V1.6.6 (Sep, 2001)
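PHYLIP-style tree files use the Newick format: nested parentheses group clades, commas separate siblings, an optional label follows each node, and an optional `:number` gives the branch length. The sketch below is a minimal recursive-descent parser in Python, not TreeView's own reader. It assumes no quoted labels, no `[comments]`, and no spaces inside labels (all whitespace is dropped); `Node`, `parse_newick` and `leaves` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    name: str = ""
    length: Optional[float] = None  # branch length to the parent, if given
    children: list["Node"] = field(default_factory=list)


def parse_newick(text: str) -> Node:
    """Parse a simple Newick tree such as '((A:0.1,B:0.2):0.3,C:0.4);'."""
    s = "".join(text.split())  # assumption: labels contain no whitespace
    if not s.endswith(";"):
        raise ValueError("a Newick tree must end with ';'")
    pos = 0

    def parse_node() -> Node:
        nonlocal pos
        node = Node()
        if s[pos] == "(":  # internal node: a comma-separated list of subtrees
            pos += 1
            node.children.append(parse_node())
            while s[pos] == ",":
                pos += 1
                node.children.append(parse_node())
            if s[pos] != ")":
                raise ValueError(f"expected ')' at position {pos}")
            pos += 1
        start = pos  # optional label
        while s[pos] not in ",():;":
            pos += 1
        node.name = s[start:pos]
        if s[pos] == ":":  # optional branch length
            pos += 1
            start = pos
            while s[pos] not in ",();":
                pos += 1
            node.length = float(s[start:pos])
        return node

    root = parse_node()
    if s[pos] != ";":
        raise ValueError(f"unexpected {s[pos]!r} at position {pos}")
    return root


def leaves(node: Node) -> list[str]:
    """Leaf labels in left-to-right order."""
    if not node.children:
        return [node.name]
    return [name for child in node.children for name in leaves(child)]
```

For example, `leaves(parse_newick("((A:0.1,B:0.2)AB:0.3,C:0.4);"))` yields `["A", "B", "C"]`, and the first child of the root carries the label `AB` and branch length 0.3. Because the scanner always stops at the final `;`, it never reads past the end of the string.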
Software and Databases for Computational Biology on the Internet
This page is a supplement to the book Computational Methods in
Molecular Biology, edited by Steven Salzberg, David Searls, and Simon Kasif.
The publisher is Elsevier Science. Please contact Steven Salzberg
(salzberg@cs.jhu.edu) if you wish to have your software referenced on this site,
or if you wish to change the description of your software already listed here.
Gene finders and other sequence analysis programs
- Glimmer is a
system that uses Interpolated Markov Models (IMMs) to identify coding regions
in microbial DNA. IMMs are a generalization of Markov models that allow great
flexibility in the choice of the "context"; i.e., how many previous bases to
use in predicting the next base. Glimmer has been tested on the complete
genomes of H. influenzae, E. coli, H. pylori, M. genitalium, and other
genomes, and results to date have shown it to be highly accurate. Glimmer was
the principal gene finder for the genomes of B. burgdorferi, T.
pallidum, C. trachomatis, C. pneumoniae, D. radiodurans, T. maritima, and
others. The complete system, including source code, is available from this
site. A version of the system built for the malaria parasite, GlimmerM,
is also available.
- GENSCAN is a program
designed to predict complete gene structures, including exons, introns,
promoter and poly-adenylation signals, in genomic sequences. It differs from
the majority of existing gene finding algorithms in that it allows for partial
genes as well as complete genes and for the occurrence of multiple genes in a
single sequence, on either or both DNA strands. Program versions suitable for
vertebrate, nematode (experimental), maize and Arabidopsis sequences are
currently available. The vertebrate version also works fairly well for
Drosophila sequences. Sequences can be submitted on a web-based form at this
site. The GENSCAN Web site is at Stanford University.
- VEIL (the
Viterbi Exon-Intron Locator) uses a custom-designed hidden Markov model (HMM)
to find genes in eukaryotic DNA. The training of the current version of VEIL
used Burset and Guigo's database of 570 vertebrate sequences, so VEIL will
work best on sequences from vertebrates. The VEIL site is at Johns Hopkins
University.
- MORGAN is an
integrated system for finding genes in vertebrate DNA sequences. MORGAN uses a
variety of techniques to accomplish this task, including a decision tree
classifier, Markov chains to recognize splice sites, and a frame-dependent
dynamic programming algorithm. Morgan has been trained and tested primarily on
vertebrate sequence data. Results showing Morgan's accuracy and the source
code to the system can be obtained from this site. The MORGAN site is at Johns
Hopkins University.
- Genie, a gene finder
based on generalized hidden Markov models, is at the Lawrence Berkeley National
Laboratory. It was developed in collaboration with the Computational Biology
Group at the University of California, Santa Cruz. Genie uses a statistical
model of genes called a Generalized Hidden Markov Model (GHMM) to find genes
in vertebrate and human DNA. In a GHMM, probabilities are assigned to
transitions between states and to the generation of each nucleotide base given
a particular state. Machine learning techniques are applied to optimize these
probabilities using a standardized gene data set, which is available on this
site. The page has a link to the Genie Web server, to which sequences may be
submitted.
- GRAIL, GENQUEST and The Genome Channel
provide analysis and putative annotation of DNA sequences both interactively
and through the use of automated computation. GRAIL is a tool for the
identification of genes, exons, and various features in DNA sequences.
GENQUEST is a sequence comparison program designed for rapid, sensitive
comparison of DNA and protein sequences to DNA/protein databases. The GENOME
CHANNEL is a tool for the comprehensive sequence-based view of genomes. These
systems are at the Oak Ridge National Laboratory in Tennessee.
- The FGENE family
of programs finds splice sites, genes, promoters, and poly-A
recognition regions in eukaryotic sequence data. The underlying technology
uses linear discriminant analysis. You can submit sequences to FGENE using a
Web interface found here. The site is at the Sanger Centre.
- The GeneID server
contains the GeneID system for finding genes in eukaryotes. GeneID is a
hierarchical rule-based system, with scoring matrices to identify signals and
rules to score coding regions. You can use this page to submit a genomic DNA
sequence to the GeneID program. The GeneID site is at www1.imim.es/PersonalHomePage/rguigo/Geneid/geneid_input.html
in Spain, and is also available at this page at Boston University.
- GeneParser
identifies protein coding regions in eukaryotic DNA sequences. The home page
at the University of Colorado includes various documents describing
GeneParser's theory and performance as well as some sample output screens. The
complete system is available here.
- GenLang is a
syntactic pattern recognition system that uses the tools and techniques of
computational linguistics to find genes and other higher-order features in
biological sequence data. Patterns are specified by means of rule sets called
grammars, and a general purpose parser, implemented in the logic programming
language Prolog, then performs the search. This system is at the University of
Pennsylvania.
- GeneMark is a system
for finding genes in bacterial DNA sequences. The algorithm is based on
non-homogeneous 5th-order Markov chains, and it was used to locate the genes
in the complete genomes of H. influenzae, M. genitalium, and several other
complete genomes. The site includes documentation and a Web interface to which
sequences can be submitted. This system is at the Georgia Institute of
Technology in Atlanta, GA.
- MarFinder uses
statistical patterns to deduce the presence of MARs (Matrix Association
Regions) in DNA sequences. MARs constitute a significant functional block and
have been shown to facilitate the processes of differential gene expression
and DNA replication. This tool and Web site are at the National Center for
Genome Resources.
- NetPlantGene is at
the Technical University of Denmark. The NetPlantGene Web server uses neural
networks to predict splice sites in Arabidopsis thaliana DNA. This site
also contains programs for other sequence analysis problems, such as
the recognition of signal peptides. NetPlantGene is to be replaced with NetGene2.
- MZEF and Pombe. This page
contains software tools designed to predict putative internal protein coding
exons in genomic DNA sequences. Human, mouse and Arabidopsis exons are
predicted by a program called MZEF, and fission yeast exons are predicted by a
program called Pombe. The site is located at the Cold Spring Harbor
Laboratory.
- PROCRUSTES
finds the multi-exon structure of a gene by aligning it with the protein
databases. PROCRUSTES uses an algorithm called spliced alignment, which
explores all possible exon assemblies and finds the multi-exon structure with
the best fit to a related protein. If a database sequence exists that is
closely similar to the query, PROCRUSTES will produce a highly accurate
prediction. This program and Web page are at the University of Southern
California.
- Promoter
Prediction by Neural Network (NNPP) is a method that finds eukaryotic and
prokaryotic promoters in a DNA sequence. The basis of the NNPP program is a
time-delay neural network. The time-delay network consists mainly of two
feature layers, one for recognizing the TATA-box and one for recognizing the
"Initiator", which is the region spanning the transcription start site. Both
feature layers are combined into one output unit, which gives output scores
between 0 and 1. This site is at the Lawrence Berkeley National Laboratory.
Also available at this
site is the splice site predictor used by the Genie system. The output of
this neural network is a score between 0 and 1 indicating a potential splice
site.
- Repeat Pattern Toolkit
(RPT) consists of tools for analyzing repetitive sequences in a genome.
RPT takes as input a single sequence in GenBank format, and attempts to find
both coding (possible gene duplications, pseudogenes, homologous genes) and
non-coding repeats. RPT locates all repeats using a fast Sensitive Search Tool
(SST). These repeats are evaluated for statistical significance utilizing a
sensitive All-PAM search, and their evolutionary distance is estimated. The
repeats are classified into families of similar sequences. The classification
output is tabulated using Perl scripts and plotted using gnuplot. RPT is at
the Institute for Biomedical Computing at Washington University in St. Louis.
- GenTerpret integrates local and
web-based sequence analysis tools, including the identification of repetitive
elements, coding regions and promoters, with BLAST or user-defined
programs. The package includes SorFind 2.0, RepFind 2.0, and
PromFind 2.0. It currently runs on Windows machines, and a Macintosh
version is under development.
- SplicePredictor
is a program designed to predict donor and acceptor splice sites in maize and
Arabidopsis sequences. Sequences can be submitted on a web-based form at this
site. The system is at Stanford University.
- The TIGR Software Tool
Collection is at The Institute for Genomic Research. A number of software
tools are freely available for download. Tools currently available include:
- AAT: A tool for analyzing and annotating genomic sequences.
Huang, X., Adams, M.D., Zhou, H. and Kerlavage, A.R. (1997) Genomics 46,
37-45. The AAT package includes two sets of programs, one set (DPS/NAP) for
comparing the query sequence with a protein database, and the other
(DDS/GAP2) for comparing the query with a cDNA database. Each set contains a
fast database search program and a rigorous alignment program.
- btab: a BLAST output parser, which reformats BLAST output into an
easily parsable form which can be used by a variety of other programs. Btab
was written by Mark Dubnick at the NIH and appeared in CABIOS
8:601-602. The program has been modified by TIGR. The original program may
be obtained from NIH.
- Glimmer: a bacterial gene finding system (with its own separate
page; see elsewhere
on this page)
- grasta: Modified Fasta code that searches both strands and
outputs btab format files
- hbqcm (Hexamer Based Quality Control Method): a quality control
algorithm for DNA sequencing projects
- Lucy: A utility that prepares raw DNA sequence fragments for
sequence assembly. This sequence cleanup program includes quality
assessment, confidence reassurance, vector trimming and vector removal.
- MUMmer: A system for aligning whole genome sequences. Using a
suffix tree, the system can rapidly align sequences containing
millions of nucleotides. Usage of the algorithm should facilitate analysis
of syntenic chromosomal regions, strain-to-strain comparisons, evolutionary
comparisons, and genomic duplications.
- TIGR Assembler: a tool for assembly of large sets of overlapping
sequence data such as ESTs, BACs, or small genomes
This page is at
The Institute for Genomic Research in
Rockville, Maryland.
- TESS
(Transcription Element Search Software) is a set of software routines for
locating and displaying transcription factor binding sites in DNA sequence.
TESS uses the Transfac database as its store of transcription factors and
their binding sites. This page is at the University of Pennsylvania's
Computational Biology and Informatics Laboratory.
- Genotator, a
workbench for automated sequence annotation, provides a flexible, transparent
system for automatically running a series of sequence analysis programs on
genetic sequences. It also has a graphical display that allows users to view
all of the automatically-generated annotations and add their own. Genotator's
display allows annotated sequences to be examined at multiple levels of
detail, from an overview of the entire sequence down to individual bases. By
displaying the aligned output of multiple types of sequence analysis,
Genotator provides an intuitive way to identify the significant regions (for
example, probable exons) in a sequence. Genotator was developed by Nomi Harris
at Lawrence Berkeley National Laboratory.
- WebGene (GenView, ORFGene,
SpliceView) is a Web interface for several coding region recognition
programs, including:
- GenView: a system for protein-coding gene prediction
- ORFGene: gene structure prediction using information on homologous
protein sequences
- SpliceView: prediction of splicing signals
- HCpolya: a Hamming clustering method for poly-A prediction in eukaryotic
genes
This page is at the Instituto Tecnologie Biomediche Avanzate
in Italy.
- The Staden Package
contains a wealth of useful programs for sequence assembly, DNA sequence
comparison and analysis, protein sequence analysis, and sequencing project
quality control. The site is mirrored in several locations around the
world.
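VEIL, Genie and GENSCAN build on hidden Markov models (or generalized HMMs): each state, such as exon or intron, has transition probabilities to other states and emission probabilities for each base, and VEIL is named after the Viterbi algorithm that recovers the single most probable state path. The Python sketch below runs Viterbi decoding on a deliberately tiny two-state model. The states `E` (GC-rich, coding-like) and `N` (AT-rich, non-coding-like) and every probability are made-up assumptions for illustration, not parameters from any of these programs, which use far richer models.

```python
import math

# Toy two-state model: every number here is a made-up assumption.
STATES = ("E", "N")  # E: GC-rich, coding-like; N: AT-rich, non-coding-like
START = {"E": 0.5, "N": 0.5}
TRANS = {"E": {"E": 0.9, "N": 0.1}, "N": {"E": 0.1, "N": 0.9}}
EMIT = {
    "E": {"A": 0.15, "C": 0.35, "G": 0.35, "T": 0.15},
    "N": {"A": 0.35, "C": 0.15, "G": 0.15, "T": 0.35},
}


def viterbi(seq, states, start_p, trans_p, emit_p):
    """Return (most probable state path, its log-probability) for a non-empty seq."""
    if not seq:
        raise ValueError("sequence must be non-empty")
    seq = seq.upper()  # emission tables are keyed by uppercase bases
    # v[t][s]: best log-probability of any state path ending in state s at position t.
    v = [{s: math.log(start_p[s]) + math.log(emit_p[s][seq[0]]) for s in states}]
    back = [{}]  # back[t][s]: predecessor state on that best path
    for t in range(1, len(seq)):
        v.append({})
        back.append({})
        for s in states:
            prev = max(states, key=lambda p: v[t - 1][p] + math.log(trans_p[p][s]))
            v[t][s] = (v[t - 1][prev] + math.log(trans_p[prev][s])
                       + math.log(emit_p[s][seq[t]]))
            back[t][s] = prev
    # Trace back from the best final state.
    last = max(states, key=lambda s: v[-1][s])
    path = [last]
    for t in range(len(seq) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return "".join(reversed(path)), v[-1][last]


if __name__ == "__main__":
    path, logp = viterbi("ATATATGCGCGCGCATATAT", STATES, START, TRANS, EMIT)
    print(path, round(logp, 2))
```

With these toy numbers the GC-rich middle of `ATATATGCGCGCGCATATAT` is labeled `E` and the AT-rich flanks `N`, giving `NNNNNNEEEEEEEENNNNNN`. Working in log space avoids numerical underflow on long sequences; each state switch costs log(0.1) versus log(0.9) for staying, so very short runs tend to be absorbed into the surrounding state.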
Databases
- The NCBI WWW
Entrez PubMed Browser, at the National Center for Biotechnology
Information (NCBI), is an important resource for searching the NCBI protein,
nucleotide, 3-D structures, and genomes databases. You can also browse NCBI's
taxonomy and search for bibliographic entries in Entrez PubMed.
- GeneCards is a
database of human genes, their products and their involvement in diseases. It
offers concise information about the functions of all human genes that have an
approved symbol as well as selected others. It is especially useful for
researchers in functional genomics and proteomics who are searching for
information. The data are collected with knowledge discovery and data mining
techniques and accessed by means of a proprietary Guidance System that suggests
to the user where and how the information may be retrieved.
- GSDB is one of the four public
nucleotide databases worldwide. Because GSDB is a community-curated and -owned
database, the research community can obtain direct access to GSDB to deposit
and curate data. As part of GSDB's efforts to provide the research community
with a useful data set, sequence data from other archival databases such as
DDBJ, EMBL and GenBank are incorporated and made available with annotation.
Data retrieval from GSDB is provided through Web query tools and direct SQL.
GSDB is made available through the Web site of the National Center for Genome
Resources.
- HHS Sequence
Classification. HHS is a database of sequences that have been clustered
based on a variety of criteria. The database and clustering algorithms are
described in Chapter 6. This Web page, at the Institute for Biomedical
Computing at Washington University in St. Louis, allows one to access
classifications by sequence, group listing, structure, and alignment.
- The EpoDB
(Erythropoiesis Database) is a database of genes that relate to vertebrate
red blood cells. A detailed description of EpoDB can be found in Chapter 5.
The database includes DNA sequence, structural features and potential
transcription factor binding sites. This Web site is at the University of
Pennsylvania's CBIL.
- The LENS
(Linking ESTs and their associated Name Space) database links and resolves the
names and identifiers of clones and ESTs generated in the I.M.A.G.E.
Consortium/WashU/Merck EST project. The name space includes library and clone
IDs and names from IMAGE Consortium, EST sequence IDs from Washington
University, sequence entry accession numbers from dbEST/NCBI, and library and
clone IDs from GDB. LENS allows for querying of IMAGE Consortium data via all
the different IDs.
- PDD, the NIMH-NCI
Protein-Disease Database is at the Laboratory of Experimental and
Computational Biology at the National Cancer Institute. This server is part of
the NIMH-NCI Protein-Disease Database project for correlating diseases with
proteins observable in serum, CSF, urine and other common human body fluids
based on biomedical literature.
- The Genome Database (GDB), at the Johns
Hopkins University School of Medicine, comprises descriptions of the following
types of objects: regions of the human genome, including genes, clones,
amplimers (PCR markers), breakpoints, cytogenetic markers, fragile sites,
ESTs, syndromic regions, contigs and repeats; maps of the human genome,
including cytogenetic maps, linkage maps, radiation hybrid maps, content
contig maps, and integrated maps. These maps can be displayed graphically via
the Web; variations within the human genome including mutations and
polymorphisms, plus allele frequency data. NOTE: GDB has been transferred by
DOE from Hopkins to the DOE lab at Oak Ridge.
- The
TRANSFAC Database is at the Gesellschaft für Biotechnologische Forschung
mbH (Germany). TRANSFAC is a transcription factor database. It compiles data
about gene regulatory DNA sequences and protein factors binding to them. On
this basis, programs are developed that help to identify putative promoter or
enhancer structures and to suggest their features.
- TransTerm
- Translational Signal Database is a database at the University of Otago
(New Zealand). TransTerm contains the sequence contexts around the stop and start
codons of many species found in GenBank. TransTerm also contains codon usage
data for these same species and summary statistics for the sequences analysed.
Motif Search:
- PROSITE
Search Form allows you to rapidly compare a protein sequence against all
patterns stored in the PROSITE pattern database. It answers the question:
Which patterns from the PROSITE database are found in my sequence? (EBI)
- ScanProsite: Protein
against Prosite form allows one to scan a protein sequence (either from
SWISS-PROT or provided by the user) for the occurrences of patterns stored in
the PROSITE database. Pattern against SWISS-PROT
scans all of the SWISS-PROT database (including weekly releases) for the
occurrence of a pattern that can originate from PROSITE or be provided by the
user. (ExPASy)
- Motifs in protein
databases program determines if a protein motif is present in a database
of protein sequences. This program allows the user to define a protein motif
and then determine whether a DNA sequence might encode it or whether it is
present in a protein database. The programs do not search a library of
predefined protein motifs. A motif is defined by entering the amino acids of
interest at each position. (Alces)
- MatInspector: A tool
for the detection of transcription factor binding sites. It is able to locate
matches of sequences of unlimited length and compare one, several or all
sequences in a sequence file against all or selected subsets of matrices from
a library of matrix descriptions of protein binding sites. (GSF)
- MEME (Multiple EM for Motif
Elicitation) allows one to discover motifs of highly conserved regions
in groups of related DNA or protein sequences, and to search sequence databases
with those motifs using MAST. MAST works
by calculating match scores for each sequence. The match scores are converted
into various types of p-values, and these are used to determine the overall
match of the sequence to the group of motifs. (SDSC)
- Regular Expression Searches of
Sequence DB using FPAT. This page is designed to search a molecular
sequence database (proteins only) for patterns using simple regular
expressions. At present, only protein sequence databases are available on the
server. (Univ. of Toronto)
- BCM Search Launcher: The
Baylor College of Medicine has a variety of biology-related search and
analysis services, including general
protein sequence/pattern searches and species-specific
protein sequence searches. (HGSC)
- Screening
pattern or alignment against PROTEIN databank: This method of looking for
all pattern entries in the PROTEIN databank is almost the same as the PROSITE
screening procedure. The one difference is that a match between a pattern
letter and a fragment letter can be interpreted in a broad sense, as similarity
of letters according to a weight matrix selected by the user. (genebee)
- Pattern Searching Proteins: A
collection of software tools for protein sequence analysis. PATTINPROT
scans a protein sequence or a protein database for one or several
patterns. PROSCAN
scans a protein sequence for sites/signatures against the PROSITE database.
- PPSEARCH: Prosite Database
Searches (sequence against databases of motifs). Allows you to search
sequences for motifs or functional patterns in the PROSITE database. (EBI)
- COGnitor:
Compare your sequence to COG- Clusters of Orthologous Groups database. Each
COG consists of individual proteins or groups of paralogs from at least 3
lineages and thus corresponds to an ancient conserved domain. (NCBI)
- HMMER (Sean Eddy): Profile hidden
Markov models can be used to do sensitive database searching using statistical
descriptions of a sequence family's consensus. The advantage of profile HMMs is
that they have a formal probabilistic basis and can be trained from unaligned
sequences, if a trusted alignment isn't yet known. They do, however, make poor
models of RNAs because they cannot describe base pairs. HMMER is a freely
distributable implementation of profile HMM software for protein sequence
analysis. (Washington Univ.)
- emotif is a research system
that forms motifs for subsets of aligned sequences. Emotif ranks the motifs
that it finds by both their specificity and the number of supplied sequences
that it covers. (Stanford Bioinformatics Group)
- FunSiteP
Promoter Recognition: Recognition and classification of eukaryotic promoters
by searching transcription factor binding sites using transcription factor
binding site consensus sequences. (GSF)
- SMART Simple Modular
Architecture Research Tool: Allows rapid identification and annotation of
signalling protein domain sequences. It is able to determine the modular
architectures of single sequences or genomes. (EBI)
- SAM :
Sequence Alignment and Modeling System using HMM (Hidden Markov Model). SAM is
a collection of software tools for creating, refining and using linear HMMs for
biological sequence analysis. Documentation for SAM can be found here. (Pasteur)
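Several of the services above (the PROSITE search form, ScanProsite, PPSEARCH, PROSCAN) search for patterns written in PROSITE pattern syntax: elements separated by `-`, `x` for any residue, `[ST]` for any of the listed residues, `{P}` for any residue except those listed, `(n)` or `(n,m)` for repeats, and `<` / `>` for N- and C-terminal anchors. The Python sketch below translates such a pattern into a regular expression and reports every match, including overlapping ones. `prosite_to_regex` and `find_sites` are hypothetical names, and forms such as `[G>]` (a `>` inside brackets) are not handled.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style pattern (e.g. 'N-{P}-[ST]-{P}.') into a regex."""
    pieces = []
    for element in pattern.strip().rstrip(".").split("-"):
        if element.startswith("<"):  # anchored at the N-terminus
            pieces.append("^")
            element = element[1:]
        anchored_end = element.endswith(">")  # anchored at the C-terminus
        if anchored_end:
            element = element[:-1]
        # Separate the residue spec from an optional repeat count (n) or (n,m).
        m = re.fullmatch(r"(.+?)(?:\((\d+(?:,\d+)?)\))?", element)
        if m is None:
            raise ValueError(f"cannot parse pattern element {element!r}")
        residue, repeat = m.group(1), m.group(2)
        if residue == "x":
            piece = "."  # any amino acid
        elif residue.startswith("[") and residue.endswith("]"):
            piece = residue  # any of the listed amino acids
        elif residue.startswith("{") and residue.endswith("}"):
            piece = "[^" + residue[1:-1] + "]"  # any amino acid except those listed
        elif re.fullmatch(r"[A-Z]", residue):
            piece = residue  # one specific amino acid
        else:
            raise ValueError(f"unsupported pattern element {residue!r}")
        if repeat:
            piece += "{" + repeat + "}"
        pieces.append(piece)
        if anchored_end:
            pieces.append("$")
    return "".join(pieces)


def find_sites(pattern: str, protein: str) -> list[tuple[int, str]]:
    """Return (1-based start, matched residues) for every match, overlaps included."""
    # A lookahead with a capturing group lets matches overlap.
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start() + 1, m.group(1)) for m in regex.finditer(protein.upper())]


if __name__ == "__main__":
    # N-{P}-[ST]-{P} is PROSITE's N-glycosylation site pattern (PS00001).
    print(prosite_to_regex("N-{P}-[ST]-{P}."))
    print(find_sites("N-{P}-[ST]-{P}", "MNGSANKTA"))
```

The N-glycosylation pattern becomes `N[^P][ST][^P]`. Wrapping the expression in a lookahead matters for sequences such as `NSSNSSA`, where the sites starting at positions 1 and 4 share a residue; a plain `finditer` would consume the first match and miss the second.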
Secondary Structure Prediction:
- THREADER2 is a
program for predicting protein tertiary structure by recognizing the correct
fold from a library of alternatives. Of course, if a fold similar to the
native fold of the protein being predicted is not in the library, then this
approach will not succeed. Fortunately, certain folds crop up time and time
again, and so fold recognition methods for predicting protein structure can be
very effective. In the first prediction contest held at Asilomar, organized by
John Moult and colleagues, THREADER correctly identified 8 out of 11 target
structures which either globally or locally resembled a previously observed
fold. Preliminary analysis of the results from the second competition (CASP2)
shows that THREADER 2 has clearly improved in both fold recognition
sensitivity and sequence-structure alignment accuracy. In CASP2, the new
version of THREADER recognized 4 folds correctly out of 6 targets with
recognizable structures (including the difficult task of assigning a
jelly-roll fold rather than other beta-sandwich topologies for one target).
THREADER 2 produced more correct fold predictions (i.e. correct folds ranked
at No. 1) than any other method.
- PredictProtein is a service for sequence analysis and structure prediction. Once
you submit a protein sequence, PredictProtein retrieves similar sequences in
the database and predicts aspects of protein structure, residue solvent
accessibility and helical transmembrane regions. (EMBL)
- MultPred -
Multpredict Secondary Structure of Multiply Aligned Sequences: This program
predicts secondary structure using physicochemical information from a set of
aligned sequences and the Garnier secondary structure decision constants. The
program requires as input sequences aligned with the AMPS program. (AMPS)
- NNPREDICT
Protein Secondary Structure Prediction: A program that predicts the secondary
structure type for each residue in an amino acid sequence. The basis of the
prediction is a two-layer, feed-forward neural network. NNPREDICT takes as
input a protein sequence and returns a secondary structure prediction for each
position in the sequence. (UCSF)
- PSA Protein Structure Prediction
Server: Predicts probable secondary structures and folding classes for a
given amino acid sequence. It performs three types of protein
structure/sequence analysis:
- Analysis of full length amino acid sequences that are assumed to be
monomeric globular, water-soluble proteins consisting of a single domain
- Analysis of either complete sequences, or sequence fragments with a
minimal set of modelled structural assumptions
- Analysis of potential WD-repeat protein family
sequences
(BMERC at Boston University)
- SSCP Secondary
Structural Content Prediction computes predictions for the content of
helix, strand, and coil for a given protein using the amino acid composition
as the only input information. The method used by SSCP consists of applying
analytic vector decomposition to the composition vector of the query protein.
- PREDATOR
Secondary structure prediction from single or multiple sequences. Takes as
input a single protein sequence to be predicted and can optionally use a set of
unaligned sequences as additional information to predict the query sequence.
PREDATOR does not use multiple sequence alignment; instead, it relies on
careful pairwise local alignments of the sequences in the set with the query
sequence to be predicted. If you supply a set of sequences in the form of a
multiple alignment in CLUSTAL or MSF format, the sequences will be used but
unaligned. (EMBL)
- RNA
secondary structure prediction: If a multiple alignment is given by the
user, the information on conserved positions in it and on compensatory
exchanges at some of those positions will be used; stems that include such
positions are more likely to be included in the resulting secondary structure. The
algorithm is the following: first all of the possible ways of fitting together
different pieces of the sequences are looked for. Then locally optimal
secondary structures are built from the helices found. Lastly, the final
system construction is done optimizing the model energy of the system
(includes inputs from conservative and complementary pairs with corresponding
coefficients). (Genebee)
- PSCAN server page: A
program to play with protein threading. Allows one to align two sequences,
find a match in the database through email (more reliable) or without email.
- RNA-mfold and
DNA-mfold: Performs
RNA and DNA secondary structure prediction using nearest neighbor
thermodynamic rules: free energies are assigned to loops rather than to base
pairs. Documentation for the programs and more detail about how the structures
are computed can be found here. (M. Zuker
at Washington University)
- SoWhat:
The SoWhat WWW server predicts distance constraints between amino acids in
proteins from the amino acid sequence. It uses a neural network based method
to predict contacts between C-alpha atoms from the amino acid sequence. (CBS
Denmark)
- Pasteur Institute:
- STRIDE:
Protein secondary structure assignment from atomic coordinates
- DSSP:
Definition of secondary structure of proteins given a set of 3D coordinates
- DSC:
Discrimination of protein secondary structure class
- PREDATOR:
Protein secondary structure prediction from a single sequence or a set of
sequences
- environ:
Calculate accessible as well as buried surface area in protein structures
- confmat:
Side chain packing optimization on a given main chain template for protein
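mfold minimizes free energy with nearest neighbor rules, which need large tables of measured parameters. As a much simpler stand-in that shows the same dynamic-programming shape (optimal structures for short intervals combined into longer ones), here is the Nussinov base-pair maximization algorithm in Python. It is explicitly not mfold's method: it only maximizes the number of Watson-Crick and G-U pairs, assumes a minimum hairpin loop of 3 unpaired bases, and returns one optimal structure in dot-bracket notation.

```python
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def nussinov(seq: str, min_loop: int = 3) -> tuple[int, str]:
    """Maximize base pairs (Nussinov); return (pair count, dot-bracket string)."""
    s = seq.upper().replace("T", "U")
    n = len(s)
    # dp[i][j]: maximum number of pairs in s[i..j] (inclusive).
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i][j - 1]  # case 1: j is unpaired
            # Case 2: j pairs with k, leaving at least min_loop unpaired bases inside.
            for k in range(i, j - min_loop):
                if (s[k], s[j]) in PAIRS:
                    left = dp[i][k - 1] if k > i else 0
                    best = max(best, left + 1 + dp[k + 1][j - 1])
            dp[i][j] = best

    # Traceback: recover one optimal structure.
    structure = ["."] * n
    stack = [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if j - i <= min_loop:
            continue  # too short to hold any pair
        if dp[i][j] == dp[i][j - 1]:
            stack.append((i, j - 1))
            continue
        for k in range(i, j - min_loop):
            if (s[k], s[j]) in PAIRS:
                left = dp[i][k - 1] if k > i else 0
                if dp[i][j] == left + 1 + dp[k + 1][j - 1]:
                    structure[k], structure[j] = "(", ")"
                    stack.append((k + 1, j - 1))
                    if k > i:
                        stack.append((i, k - 1))
                    break
    return (dp[0][n - 1] if n else 0), "".join(structure)


if __name__ == "__main__":
    print(nussinov("GGGAAAUCC"))
```

For `GGGAAAUCC` the sketch finds three pairs (two G-C and one G-U) closing a three-base hairpin loop, `(((...)))`. Real folding programs score stacking and loop energies instead of counting pairs, so their structures can differ substantially.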
Other Software and Information Sources:
- The VSNS
BioComputing Division offers educational services over the Internet in
bioinformatics/biocomputing. They have offered award-winning online courses in
sequence analysis. The site includes a hypertext coursebook, covering topics
such as pairwise sequence alignments, networking, and multiple alignment. You
can also find a collection of online exercises, called "Sequence Analysis
with Distributed Resources", and the "Biocomputing For Everyone" and
"Biocomputing For Schools" Websites.
- The Banbury
Cross Site is a web page for benchmarking gene identification software.
Banbury Cross is at the Centre National De La Recherche Scientifique. This
Benchmark site is intended to be a forum for scientists working in the field
of gene identification and anonymous genomic sequence annotation, with the
goal of improving current methods in the context of very large genomic
sequences, in particular vertebrate ones.
- CBIL bioWidgets, at
the University of Pennsylvania, is a collection of software libraries used for
rapid development of graphical molecular biological applications. It includes:
- bioWidgets for Java(tm), a toolkit of biology-specific user interface
widgets useful for rapid application development in Java(tm)
- bioTK, a toolkit of biology-specific user interface widgets useful for
rapid application development in Tcl/Tk
- RSVP, a PostScript tool which lets your printer do nucleic acid sequence
analysis; it generates very nice color diagrams of the results.
- Human
Genome Project Information at Oak Ridge National Laboratory contains many
interesting and useful items about the U.S. Human Genome Project. They also
have a more technical Research
site.
- FAKtory: A software
environment for DNA Sequencing is at the University of Arizona. It is a
prototype software environment in support of DNA sequencing. The environment
consists of
- their software library, FAK, for the core combinatorial problem of
assembling fragments
- a Tcl/Tk based interface
- a software suite supporting a database of fragments and a processing
pipeline that includes clipping, tagging, and vector removal modules.
A key feature of FAKtory is that it is highly customizable: the
structure of the fragment database, the processing pipeline, and the operation
of each phase of the pipeline may be specified by the user.
- Computational Analysis and
Annotation of Sequence Data. This is a tutorial by A. Baxevanis, M.
Boguski, and B.F. Ouellette on how to use alignment programs and databases for
sequence comparison. It is a review that will appear in the forthcoming book
Genome Analysis: A Laboratory Manual (Bruce Birren, Eric Green, Phil Hieter,
Sue Klapholz and Rick Myers, eds) to be published by Cold Spring Harbor
Laboratory Press. The hypertext version of the review is linked to Medline
records, software repositories, sequences, structures, and taxonomies via the
Entrez system of the National Center for Biotechnology Information.
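FAKtory, described above, builds on a library for the core combinatorial problem of assembling overlapping fragments. That problem can be illustrated with the textbook greedy strategy: repeatedly merge the two fragments with the longest suffix-prefix overlap. This Python sketch is not FAKtory's FAK library; it assumes error-free reads from a single strand, ignores reverse complements and repeats, and uses an assumed minimum overlap of 3 bases. `overlap` and `greedy_assemble` are hypothetical names.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of a that is a prefix of b (0 if < min_len)."""
    for length in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:length]):
            return length
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Greedily merge the pair of fragments with the longest overlap; return contigs."""
    unique = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    # Drop reads contained in another read; they add no new sequence.
    frags = [r for r in unique if not any(r != o and r in o for o in unique)]
    while len(frags) > 1:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            break  # no overlaps left: report the fragments as separate contigs
        merged = frags[best_i] + frags[best_j][best_len:]
        frags = [f for k, f in enumerate(frags) if k not in (best_i, best_j)]
        frags.append(merged)
    return frags


if __name__ == "__main__":
    reads = ["ATTAGACCTG", "CCTGCCGGAA", "AGACCTGCCG", "GCCGGAATAC"]
    print(greedy_assemble(reads))
```

The four example reads overlap by 7 bases in a chain and are merged into the single contig `ATTAGACCTGCCGGAATAC`. Greedy merging is fast and simple but can misassemble repeats, which is one reason production assemblers add quality-aware overlap scoring and vector clipping steps like those in FAKtory's pipeline.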