EMBOSS: backtranseq

backtranseq

Function

Description

backtranseq reads a protein sequence and writes the nucleic acid sequence it is most likely to have come from.

Algorithm

backtranseq uses a codon usage table, which gives the frequency of usage of each codon for each amino acid. For each amino acid in the input sequence, the most frequently occurring codon for that amino acid is written to the output nucleic acid sequence.
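The selection rule described above can be illustrated with a minimal Python sketch. This is not the EMBOSS source code. The names best_codons, back_translate and TOY_USAGE are hypothetical, and the frequencies are made-up illustrative values, not figures taken from Ehum.cut.

```python
# Toy codon usage data as (codon, amino acid, relative frequency) tuples.
# The numbers are invented for illustration and are NOT read from Ehum.cut.
TOY_USAGE = [
    ("ATG", "M", 1.00),
    ("AAA", "K", 0.42),
    ("AAG", "K", 0.58),
    ("TTT", "F", 0.45),
    ("TTC", "F", 0.55),
]


def best_codons(usage):
    """Return a dict mapping each amino acid to its most frequent codon."""
    best = {}
    for codon, amino_acid, freq in usage:
        # Keep the codon with the highest frequency seen so far.
        if amino_acid not in best or freq > best[amino_acid][1]:
            best[amino_acid] = (codon, freq)
    return {aa: codon for aa, (codon, _freq) in best.items()}


def back_translate(protein, usage):
    """Replace every residue with the most frequent codon for it."""
    table = best_codons(usage)
    # A residue missing from the table raises KeyError in this sketch.
    return "".join(table[residue] for residue in protein.upper())


print(back_translate("MKF", TOY_USAGE))  # ATGAAGTTC
```

The sketch only covers residues present in the table. How backtranseq itself handles ambiguous or unknown residues is not described here, so that case is left as an error.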

Usage

Command line arguments

Input file format

Any protein sequence USA (Uniform Sequence Address).

Output file format

The output is a nucleotide sequence containing the most favoured back translation of the specified protein, using the specified codon usage table (which defaults to human).

Data files

The codon usage table is read by default from "Ehum.cut" in the 'data/CODONS' directory of the EMBOSS distribution. If the name of a codon usage file is specified on the command line, the file is searched for first in the current directory and then in the 'data/CODONS' directory of the EMBOSS distribution.
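The search order described above can be sketched in Python as follows. This is an illustration under stated assumptions, not how EMBOSS locates its data files internally. The function find_codon_file and its parameters emboss_root and cwd are hypothetical names introduced for this example.

```python
from pathlib import Path


def find_codon_file(name, emboss_root, cwd=None):
    """Look for a codon usage file in the documented order.

    Hypothetical helper: checks the current directory first, then
    'data/CODONS' under the EMBOSS distribution root.
    """
    cwd = Path(cwd) if cwd is not None else Path.cwd()
    candidates = [
        cwd / name,                                   # 1. current directory
        Path(emboss_root) / "data" / "CODONS" / name,  # 2. EMBOSS data directory
    ]
    for path in candidates:
        if path.is_file():
            return path
    # Error text modelled on the diagnostic message quoted in this document.
    raise FileNotFoundError(f"The file '{name}' does not exist")
```

A file in the current directory therefore takes precedence over a file of the same name in the distribution's data directory.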

Notes

backtranseq reads a data file containing the codon usage table. The default file is Ehum.cut, the human codon usage table. Many others are available and can be selected by name with the -cfile qualifier. It is important to use one that is appropriate for the species that your protein comes from. The specified data file must exist in the current directory or in the EMBOSS data directory (see the Data files section for more information).

References

None.

Warnings

None.

Diagnostic Error Messages

"Corrupt codon index file" - the codon usage file is incomplete or empty.

"The file 'drosoph.cut' does not exist" - the codon usage file cannot be opened.

Exit status

This program always exits with a status of 0, unless the codon usage table cannot be opened.