The applications are listed in alphabetical order in the tables below. They are also organised into groups of related functionality. There is also a table of areas requiring software development which includes proposed new applications. Please send suggestions for new applications to emboss@emboss.open-bio.org.
Please send bug reports to emboss-bug@emboss.open-bio.org.
Program name | Description |
---|---|
aaindexextract | Extract data from AAINDEX |
abiview | Reads an ABI file and displays the trace |
acdc | Tests definition files for any EMBOSS application. |
antigenic | Finds antigenic sites in proteins |
backtranambig | Back translate a protein sequence to ambiguous codons |
backtranseq | Back translate a protein sequence |
banana | Bending and Curvature Plot in B-DNA |
biosed | Replace or delete sequence sections |
btwisted | Calculates the twisting in a B-DNA sequence |
cai | CAI codon usage statistic |
chaos | Create a chaos plot for a sequence. |
charge | Protein charge plot |
checktrans | ORF property statistics |
chips | Codon usage statistics |
cirdna | Draws circular maps of DNA constructs |
codcmp | Codon usage table comparison |
coderet | Extract CDS, mRNA and translations from feature tables |
compseq | Counts the composition of dimer/trimer/etc words in a sequence |
cons | Creates a consensus from multiple alignments |
cpgplot | Plot CpG rich areas |
cpgreport | Reports CpG rich regions |
cusp | Create a codon usage table |
cutgextract | Extract data from CUTG |
cutseq | Removes a specified section from a sequence. |
dan | Plot melting temperatures for DNA. |
dbiblast | Database indexing for BLAST 1 and 2 indexed databases |
dbifasta | Index a fasta database |
dbiflat | Database indexing for flat file databases |
dbigcg | Database indexing for GCG formatted databases |
dbxfasta | Database b+tree indexing for fasta file databases |
dbxflat | Database b+tree indexing for flat file databases |
dbxgcg | Database b+tree indexing for GCG formatted databases |
degapseq | Removes gap characters from sequences |
descseq | Alter the name or description of a sequence. |
diffseq | Find differences between nearly identical sequences |
digest | Protein proteolytic enzyme or reagent cleavage digest |
distmat | Creates a distance matrix from multiple alignments |
dotmatcher | Produces a dotplot of two sequences. |
dotpath | Displays a non-overlapping wordmatch dotplot of two sequences |
dottup | DNA sequence dot plot |
dreg | Regular expression search of a nucleotide sequence |
einverted | Finds DNA inverted repeats |
embossdata | Finds or fetches the data files read in by the EMBOSS programs |
embossversion | Writes the current EMBOSS version number |
emowse | Protein identification by mass spectrometry |
emma | Multiple alignment program |
entret | Reads and writes (returns) flatfile entries |
epestfind | Finds PEST motifs as potential proteolytic cleavage sites |
eprimer3 | Picks PCR primers and hybridization oligos |
equicktandem | Finds tandem repeats |
est2genome | Align EST and genomic DNA sequences |
etandem | Looks for tandem repeats in a nucleotide sequence. |
extractfeat | Extract features from a sequence |
extractseq | Extract regions from a sequence. |
findkm | Calculates Km and Vmax for an enzyme reaction |
freak | Residue/base frequency table or plot |
fuzznuc | Nucleic acid pattern search |
fuzzpro | Protein pattern search |
fuzztran | Protein pattern search after translation |
garnier | Predicts protein secondary structure |
geecee | Calculates the fractional GC content of nucleic acid sequences |
getorf | Finds and extracts open reading frames (ORFs) |
helixturnhelix | Finds nucleic acid binding domains. |
hmoment | Hydrophobic moment calculation |
iep | Calculates the isoelectric point of a protein |
infoalign | Information on a multiple sequence alignment |
infoseq | Displays some simple information about sequences |
isochore | Plots isochores in large DNA sequences |
jembossctl | Jemboss Authentication Control |
lindna | Draws linear maps of DNA constructs |
listor | Writes a list file of the logical OR of two sets of sequences |
marscan | Finds MAR/SAR sites in nucleic sequences |
maskfeat | Mask off features of a sequence |
maskseq | Mask off regions of a sequence. |
matcher | Local alignment of two sequences |
megamerger | Merge two large overlapping nucleic acid sequences |
merger | Merge two overlapping sequences |
msbar | Mutate sequence beyond all recognition |
mwcontam | Shows molwts that match across a set of files |
mwfilter | Filter noisy molwts from mass spec output |
needle | Needleman-Wunsch global alignment. |
newcpgreport | Report CpG rich areas |
newcpgseek | Reports CpG rich regions |
newseq | Type in a short new sequence. |
noreturn | Removes carriage return from ASCII files |
notseq | Excludes a set of sequences and writes out the remaining ones |
nthseq | Writes one sequence from a multiple set of sequences |
octanol | Displays protein hydropathy |
oddcomp | Finds protein sequence regions with a biased composition. |
palindrome | Looks for inverted repeats in a nucleotide sequence. |
pasteseq | Insert one sequence into another. |
patmatdb | Matches a PROSITE motif against a protein sequence database. |
patmatmotifs | Compares a protein sequence to the PROSITE motif database. |
pepcoil | Predicts coiled coil regions |
pepinfo | Plots simple amino acid properties in parallel |
pepnet | Protein helical net plot |
pepstats | Protein statistics |
pepwheel | Shows protein sequences as helices |
pepwindow | Displays protein hydropathy |
pepwindowall | Displays protein hydropathy of a set of sequences |
plotcon | Plots the quality of conservation of a sequence alignment |
plotorf | Plot potential open reading frames |
polydot | Multiple dotplot |
preg | Regular expression search of a protein sequence |
prettyplot | Displays aligned sequences, with colouring and boxing. |
prettyseq | Output sequence with translated ranges |
primersearch | Searches DNA sequences for matches with primer pairs |
printsextract | Preprocesses the PRINTS database for use with the program PSCAN |
profit | Scan a sequence or database with a matrix or profile |
prophecy | Creates matrices/profiles from multiple alignments |
prophet | Gapped alignment for profiles |
prosextract | Extracts ID, AC, and PA lines from the PROSITE motif database. |
pscan | Locates fingerprints (multiple motif features) in a protein sequence. |
psiphi | Calculates phi and psi torsion angles from a cleaned EMBOSS-style protein co-ordinate file |
rebaseextract | Extract data from REBASE |
recoder | Find and remove restriction sites but maintain the same translation |
redata | Isoschizomers, references and Suppliers for Restriction Enzymes |
remap | Display a sequence with restriction cut sites, translation etc. |
restover | Finds restriction enzymes that produce a specific overhang |
restrict | Finds Restriction Enzyme Cleavage Sites |
revseq | Reverse and complement a sequence. |
seealso | Finds programs sharing group names |
seqmatchall | Does an all-against-all comparison of a set of sequences |
seqret | Reads and writes (returns) a sequence. |
seqretsplit | Reads and writes (returns) sequences in individual files |
showdb | Displays information on the currently available databases |
showalign | Display a multiple sequence alignment |
showfeat | Show features of a sequence. |
showorf | Pretty output of DNA translations |
showseq | Display a sequence with features, translation etc |
shuffleseq | Shuffles a set of sequences maintaining composition |
sigcleave | Predicts signal peptide cleavage sites |
silent | Silent mutation restriction enzyme scan |
sirna | Finds siRNA duplexes in mRNA |
sixpack | Display a DNA sequence with 6-frame translation and ORFs |
skipseq | Reads and writes (returns) sequences, skipping the first few |
splitter | Split a sequence into (overlapping) smaller sequences. |
stretcher | Global alignment of two sequences. |
stssearch | Searches a DNA database for matches with a set of STS primers |
supermatcher | Finds a match of a large sequence against one or more sequences |
syco | Synonymous codon usage Gribskov statistic plot |
tcode | Fickett TESTCODE statistic to identify protein-coding DNA |
textsearch | Search sequence documentation text. SRS and Entrez are faster! |
tfextract | Extract data from TRANSFAC |
tfm | Displays a program's help documentation manual |
tfscan | Scans DNA sequences for transcription factors. |
tmap | Predict transmembrane proteins |
tranalign | Align nucleic coding regions given the aligned proteins |
transeq | Translates nucleic acid sequences. |
trimest | Trim poly-A tails off EST sequences |
trimseq | Trim ambiguous bits off the ends of sequences |
twofeat | Finds neighbouring pairs of features in sequences |
union | Reads sequence fragments and builds one sequence |
vectorstrip | Strips out DNA between a pair of vector sequences |
water | Smith-Waterman local alignment. |
whichdb | Search all databases for an entry |
wobble | Wobble base plot |
wordcount | Counts words of a specified size in a DNA sequence. |
wordmatch | Finds all exact matches of a given size between 2 sequences |
wossname | Finds programs by keywords in their one-line documentation. |
yank | Reads a range from a sequence, appends the full USA to a list file |
The EMBASSY grouping includes applications and packages for specialised sequence analysis and non-sequence based analysis, as well as software included from third parties who have their own licensing terms. EMBOSS is GPL licensed. The libraries are under the Lesser GPL (LGPL). This allows the EMBOSS libraries to link to other software and only requires that software to have an LGPL-compatible licence. PHYLIP, for example, fits this model. EMBASSY applications have the same look and feel as EMBOSS applications.
The PHYLIP programs in this EMBASSY package were ported from release 3.572.
PHYLIP 3.61 has been converted as PHYLIPNEW and was released with EMBOSS 3.0.0 as a beta version.
The PHYLIPNEW programs are EMBOSS conversions of the programs in Joe Felsenstein's PHYLIP package, version 3.61 (August 2004).
The PHYLIPNEW versions of these programs all have the prefix "f" to distinguish them from the original programs.
Program name | Description |
---|---|
fclique | Largest clique program |
fconsense | Majority-rule and strict consensus tree |
fcontml | Continuous character Maximum Likelihood method |
fcontrast | Continuous character Contrasts |
fdiscboot | Bootstrapped discrete sites algorithm |
fdnacomp | DNA compatibility algorithm |
fdnadist | Nucleic acid sequence Distance Matrix program |
fdnainvar | Nucleic acid sequence Invariants method |
fdnaml | Estimates phylogenies from nucleic acid sequence Maximum Likelihood |
fdnamlk | Estimates phylogenies from nucleic acid sequence Maximum Likelihood with molecular clock |
fdnamove | Interactive DNA parsimony |
fdnapars | DNA parsimony algorithm |
fdnapenny | Penny algorithm for DNA |
fdollop | Dollo and polymorphism parsimony algorithm |
fdolmove | Interactive Dollo and Polymorphism Parsimony |
fdolpenny | Penny algorithm Dollo or polymorphism |
fdrawgram | Plots a cladogram- or phenogram-like rooted tree diagram |
fdrawtree | Plots an unrooted tree diagram |
ffactor | Multistate to binary recoding program |
ffitch | Fitch-Margoliash and Least-Squares Distance Methods |
ffreqboot | Bootstrapped sequences algorithm |
fgendist | Compute genetic distances from gene frequencies |
fkitsch | Fitch-Margoliash method with contemporary tips |
fmix | Mixed parsimony algorithm |
fmove | Interactive mixed method parsimony |
fneighbor | Phylogenies from distance matrix by N-J or UPGMA method |
fpars | Discrete character parsimony |
fpenny | Penny algorithm, branch-and-bound to find all most parsimonious trees |
fproml | Protein Maximum Likelihood program |
fpromlk | Protein Maximum Likelihood program with molecular clock |
fprotdist | Protein distance algorithm |
fprotpars | Protein parsimony algorithm |
frestboot | Bootstrapped sequences algorithm |
frestdist | Compute distance matrix from restriction sites or fragments |
frestml | Restriction site Maximum Likelihood method |
fretree | Interactive tree rearrangement |
fseqboot | Bootstrapped sequences algorithm |
fseqbootall | Bootstrapped sequences algorithm |
ftreedist | Distances between trees |
ftreedistpair | Distances between trees |
The DOMAINATRIX programs were developed by Jon Ison and colleagues for their protein domain research. They are included as an EMBASSY package as a beta version.
Program name | Description |
---|---|
cathparse | Reads raw CATH classification files and writes a DCF file. |
domainreso | Removes low resolution domains from a DCF file. |
domainseqs | Adds sequence records to a DCF file. |
domainnr | Removes redundant domains from a DCF file. The file must contain domain sequence information which can be added by using DOMAINSEQS. |
domainsse | Adds secondary structure records to a DCF file. |
scopparse | Reads raw SCOP classification files and writes a DCF file. |
ssematch | Searches a DCF file for secondary structure matches. The file must contain domain secondary structure information which can be added by using DOMAINSSE. |
The DOMALIGN programs were developed by Jon Ison and colleagues for their protein domain research. They are included as an EMBASSY package as a beta version.
Program name | Description |
---|---|
allversusall | Does an all-versus-all global alignment for each set of sequences in an input directory and writes files of sequence similarity values. |
domainrep | Reorder DCF file so that the representative structure of each user-specified node is given first. |
domainalign | Generates structure-based sequence alignments for nodes in a DCF file. |
seqalign | Reads a DAF file and a DHF and writes a DAF file extended with the hits. |
The DOMSEARCH programs were developed by Jon Ison and colleagues for their protein domain research. They are included as an EMBASSY package as a beta version.
Program name | Description |
---|---|
seqsearch | Generate DHF files of database hits (sequences) from a DAF file (or other file of sequences) by using PSI-BLAST. |
seqfraggle | Removes fragments from DHF files (or other files of sequences). |
seqsort | Reads DHF files of database hits (sequences) and removes hits of ambiguous classification. |
seqnr | Removes redundancy from DHF files (or other files of sequences). |
seqwords | Generates DHF files of database hits (sequences) from Swissprot matching keywords from a keywords file. |
The SIGNATURE programs were developed by Jon Ison and colleagues for their protein domain research. They are included as an EMBASSY package as a beta version.
Program name | Description |
---|---|
libgen | Generates various types of discriminator for each alignment in a directory. |
libscan | Generates hits (sequences in a domain hits file) from searches of various types of discriminator (HMMs, profiles etc) against a sequence database. Or generates hits from screening sequences against a library of such discriminators. |
matgen3d | Generates a 3D-1D scoring matrix from CCF files (clean coordinate files). |
rocon | Reads a DHF file of hits (sequences of unknown structural classification) and a DHF file of validation sequences (known classification) and writes a "hits file" for the hits, which are classified and rank-ordered on the basis of score. |
rocplot | A generic and flexible tool for interpretation and graphical display of the performance of predictive methods using Receiver Operating Characteristic (ROC) analysis. |
siggen | Generates a sparse protein signature from an alignment and residue contact data. |
siggenlig | Generates ligand-binding signatures from a CON file (contacts file) of residue-ligand contacts. |
sigscan | Generates a DHF of hits (sequences) from scanning a signature against a sequence database. |
sigscanlig | Generates an LHF (ligand hits file) of hits (sequences) from scanning a sequence against a library of ligand-binding signatures. |
The STRUCTURE programs were developed by Jon Ison and colleagues for their protein domain research. They are included as an EMBASSY package as a beta version.
Program name | Description |
---|---|
contacts | Reads CCF files and writes CON files of intra-chain residue-residue contact data. |
domainer | Reads CCF files for proteins and writes CCF files for domains in a DCF file. |
hetparse | Converts raw dictionary of heterogen groups to EMBL-like format. |
interface | Reads protein CCF files and writes CON files of inter-chain residue-residue contact data. |
pdbparse | Parses PDB files and writes CCF files for proteins. |
pdbplus | Add records for residue solvent accessibility and secondary structure to a CCF file. |
pdbtosp | Convert raw swissprot:PDB equivalence file to EMBL-like format. |
sites | Reads CCF files and writes CON files of residue-ligand contact data for domains in a DCF file. |
The HMMEROLD programs are EMBOSS conversions of the programs in Sean Eddy's HMMER package, version 2.1.1.
The HMMEROLD versions of these programs all have the prefix "o" to distinguish them from the original programs.
Program name | Description |
---|---|
oalistat | Statistics for multiple alignment files |
ohmmalign | Align sequences with an HMM |
ohmmbuild | Build HMM |
ohmmcalibrate | Calibrate a hidden Markov model |
ohmmconvert | Convert between HMM formats |
ohmmemit | Extract HMM sequences |
ohmmfetch | Extract HMM from a database |
ohmmindex | Index an HMM database |
ohmmpfam | Align single sequence with an HMM |
ohmmsearch | Search sequence database with an HMM |
The HMMER programs are EMBOSS conversions of the programs in Sean Eddy's HMMER package, version 2.3.2.
The HMMER versions of these programs all have the prefix "e" to distinguish them from the original programs.
Program name | Description |
---|---|
ealistat | Statistics for multiple alignment files |
ehmmalign | Align sequences with an HMM |
ehmmbuild | Build HMM |
ehmmcalibrate | Calibrate a hidden Markov model |
ehmmconvert | Convert between HMM formats |
ehmmemit | Extract HMM sequences |
ehmmfetch | Extract HMM from a database |
ehmmindex | Index an HMM database |
ehmmpfam | Align single sequence with an HMM |
ehmmsearch | Search sequence database with an HMM |
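The prefix conventions described for the PHYLIPNEW, HMMEROLD and HMMER packages can be illustrated with a small sketch. The prefixes ("f", "o" and "e") are read off the tables above; the function names `embassy_name` and `original_name` are hypothetical helpers invented for this illustration and are not part of EMBOSS. Not every listed program necessarily has a same-named original, so treat the mapping as a naming aid only.

```python
# Prefixes as they appear in the program tables above.
# The helper names below are hypothetical; EMBOSS provides no such functions.
EMBASSY_PREFIXES = {
    "PHYLIPNEW": "f",
    "HMMEROLD": "o",
    "HMMER": "e",
}


def embassy_name(package: str, original: str) -> str:
    """Return the EMBASSY program name for an original program name."""
    prefix = EMBASSY_PREFIXES[package.upper()]
    return prefix + original.lower()


def original_name(package: str, embassy: str) -> str:
    """Strip the package prefix to recover the original program name."""
    prefix = EMBASSY_PREFIXES[package.upper()]
    if not embassy.startswith(prefix):
        raise ValueError(f"{embassy!r} does not carry the {package} prefix {prefix!r}")
    return embassy[len(prefix):]


if __name__ == "__main__":
    print(embassy_name("PHYLIPNEW", "dnaml"))
    print(embassy_name("HMMER", "hmmbuild"))
    print(embassy_name("HMMEROLD", "hmmbuild"))
    print(original_name("HMMER", "ealistat"))
```

The same original program name can therefore map to two EMBASSY names, one per HMMER conversion, which is why the package has to be given explicitly.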
These programs are adapted from the VIENNA RNA package.
This package is currently under development and is available only from the CVS server. We hope to make a beta release in the near future, but there is much work to be done on sequence formats and testing. The programs are listed in alphabetical order:
Program name | Author(s) | Description |
---|---|---|
vrnaalifold | Ivo Hofacker | RNA alignment folding |
vrnaalifoldpf | Ivo Hofacker | RNA alignment folding with partition |
vrnacofold | Ivo Hofacker | RNA cofolding |
vrnacofoldconc | Ivo Hofacker | RNA cofolding with concentrations |
vrnacofoldpf | Ivo Hofacker | RNA cofolding with partitioning |
vrnadistance | Ivo Hofacker | RNA distances |
vrnaduplex | Ivo Hofacker | RNA duplex calculation |
vrnaeval | Ivo Hofacker | RNA eval |
vrnaevalpair | Ivo Hofacker | RNA eval with cofold |
vrnafold | Ivo Hofacker | Calculate secondary structures of RNAs |
vrnafoldpf | Ivo Hofacker | Secondary structures of RNAs with partition |
vrnaheat | Ivo Hofacker | RNA melting |
vrnainverse | Ivo Hofacker | RNA sequences matching a structure |
vrnalfold | Ivo Hofacker | Calculate locally stable secondary structures of RNAs |
vrnapaln | Ivo Hofacker | RNA alignment |
vrnaplot | Ivo Hofacker | Plot vrnafold output |
vrnasubopt | Ivo Hofacker | Calculate RNA suboptimals |
The other EMBASSY packages each contain a single application. These are contributed single programs, or conversions of single programs.
Program name | Description |
---|---|
emnu | Simple menu of EMBOSS applications |
esim4 | Align an mRNA to a genomic DNA sequence |
meme | Motif detection |
mse | Conversion of Will Gilbert's MSE editor |
topo | Conversion of Susan Jean Johns' TOPO |
crystalball | Answers every drug discovery question you have about this sequence |
The EMBOSS applications are organized into logical groups according to their function. See the Application Groups Documentation for more information.
This is a list of areas requiring software development including putative new applications.