EMBOSS: coderet

coderet

Function

Description

coderet extracts the coding nucleotide sequences (CDS), messenger RNA nucleotide sequences (mRNA) and translations specified by the feature tables of the input sequence(s). If parts of a sequence to be extracted lie in other entries of the same database, they are fetched automatically and incorporated correctly into the output.

For each input sequence, an output sequence file is written containing any CDS, mRNA and protein translation sequences from the input feature table. Optionally, the CDS, mRNA, translated protein sequence and non-coding nucleotide sequence regions may be written to separate files.

Usage

Command line arguments

Input file format

coderet reads one or more nucleotide sequence USAs (Uniform Sequence Addresses) whose feature tables contain CDS, mRNA or translation entries.

Output file format

The output is a sequence file containing any CDS, mRNA and protein translation sequences as specified by the feature table of the sequence(s).

One or more of CDS, mRNA and translation can be excluded from the output by using the appropriate qualifier to the program (e.g. -nocds).

The ID names of the output sequences are constructed from the name of the input sequence, the type of feature being output (i.e. cds, mrna, pro) and a unique ordinal number for this type to distinguish it from others in this sequence. The name, type and number are separated by underscore characters. Thus the second CDS feature in the sequence 'A12345' would be named 'A12345_cds_2'.
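The naming rule can be illustrated with a small Python sketch. This is not coderet's own code; the function name output_id and its argument checks are hypothetical and only mirror the rule described above.

```python
# Illustrative sketch only: mirrors the documented naming rule,
# <input name>_<feature type>_<ordinal>. Not taken from the coderet source.
FEATURE_TYPES = ("cds", "mrna", "pro")


def output_id(entry_name: str, feature_type: str, ordinal: int) -> str:
    """Build an output sequence ID from name, feature type and ordinal."""
    if feature_type not in FEATURE_TYPES:
        raise ValueError(f"unknown feature type: {feature_type!r}")
    if ordinal < 1:
        raise ValueError("ordinal numbers start at 1")
    return f"{entry_name}_{feature_type}_{ordinal}"


# Second CDS feature of entry A12345
print(output_id("A12345", "cds", 2))  # A12345_cds_2
```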

The translations are not made from the coding sequence; they are extracted directly from the translation held in the feature table.

Data files

None.

Notes

One or more of CDS, mRNA, translation or non-coding regions can be excluded from output with the appropriate qualifiers; "no" is prepended to the qualifier name, for example -nocds would exclude the coding sequence.

The translations are not made from the coding sequence; they are extracted directly from the translation held in the feature table.

The regions of the feature table that concern us are shown below.

This specifies that the coding sequence for the gene is constructed by joining several sections of sequence, many of which are in other entries in the database. A range without an accession prefix, such as 516..708, refers to the entry itself:

FT   CDS             join(U21925.1:818..987,U21926.1:258..420,
FT                   U21927.1:428..520,U21928.1:196..336,U21929.1:279..415,
FT                   U21930.1:895..1014,516..708)

This specifies that the messenger RNA sequence for the gene is constructed by joining several sections of sequence, many of which are in other entries in the database:

FT   mRNA            join(M88628.1:1006..1318,M88629.1:221..342,
FT                   M88630.1:101..223,M88631.1:46..258,M88632.1:104..172,
FT                   M88633.1:387..503,M88634.1:51..272,M88635.1:303..564,
FT                   M88635.1:849..1020,M88636.1:282..375,M88637.1:39..253,
FT                   M88638.1:91..241,M88639.1:168..377,M88640.1:627..3732,
FT                   M88641.1:158..311,M88642.1:1051..1263,M88642.1:1550..1778,
FT                   M88642.1:1986..2168,M88642.1:3904..4020,
FT                   M88642.1:4627..4698,M88643.1:39..124,M88644.1:42..197,
FT                   M88645.1:542..686,M88646.1:75..223,M88647.1:109..285,
FT                   253..2211)
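To make the structure of such join(...) locations concrete, here is a minimal Python sketch that splits one into its segments. It is an illustration under stated assumptions, not coderet's implementation: the names Segment and parse_join are hypothetical, and only plain join(...) lists of start..end ranges are handled (no complement(), single-base or partial locations). The input is the CDS feature shown above.

```python
import re
from typing import List, NamedTuple, Optional


class Segment(NamedTuple):
    accession: Optional[str]  # None means the range is in the entry itself
    start: int
    end: int


# Optional "ACCESSION.VERSION:" prefix followed by "start..end"
_RANGE = re.compile(r"(?:(?P<acc>[A-Za-z0-9_.]+):)?(?P<start>\d+)\.\.(?P<end>\d+)")


def parse_join(ft_lines: List[str]) -> List[Segment]:
    """Parse a simple join(...) location spread over several FT lines."""
    # The location text is the last whitespace-separated token on every line.
    location = "".join(line.split()[-1] for line in ft_lines)
    if not (location.startswith("join(") and location.endswith(")")):
        raise ValueError("only simple join(...) locations are supported")
    segments = []
    for part in location[len("join("):-1].split(","):
        match = _RANGE.fullmatch(part)
        if match is None:
            raise ValueError(f"unsupported location element: {part!r}")
        segments.append(
            Segment(match["acc"], int(match["start"]), int(match["end"]))
        )
    return segments


cds_lines = [
    "FT   CDS             join(U21925.1:818..987,U21926.1:258..420,",
    "FT                   U21927.1:428..520,U21928.1:196..336,U21929.1:279..415,",
    "FT                   U21930.1:895..1014,516..708)",
]
segments = parse_join(cds_lines)
for seg in segments:
    print(seg.accession or "(this entry)", seg.start, seg.end)

# Total length of the joined CDS in bases
print(sum(seg.end - seg.start + 1 for seg in segments))  # 1017
```

Segments whose accession is None are taken from the input entry; all others would have to be fetched from the named entries before the pieces are concatenated in the order listed.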

This specifies that the translation of the coding region is as follows:

FT                   /translation="MAQDSVDLSCDYQFWMQKLSVWDQASTLETQQDTCLHVAQFQEFL
FT                   RKMYEALKEMDSNTVIERFPTIGQLLAKACWNPFILAYDESQKILIWCLCCLINKEPQN
FT                   SGQSKLNSWIQGVLSHILSALRFDKEVALFTQGLGYAPIDYYPGLLKNMVLSLASELRE
FT                   NHLNGFNTQRRMAPERVASLSRVCVPLITLTDVDPLVEALLICHGREPQEILQPEFFEA
FT                   VNEAILLKKISLPMSAVVCLWLRHLPSLEKAMLHLFEKLISSERNCLRRIECFIKDSSL
FT                   PQAACHPAIFRVDEMFRCALLETDGALEIIATIQVFTQCFVEALEKASKQLRFALKTYF
FT                   PYTSPSLAMVLLQDPQDIPRGHWLQTLKHISELLREAVEDQTHGSCGGPFESWFLFIHF
FT                   GGWAEMVAEQLLMSAAEPPTALLWLLAFYYGPRDGRQQRAQTMVQVKAVLGHLLAMSRS
FT                   SSLSAQDLQTVAGQGTDTDLRAPAQQLIRHLLLNFLLWAPGGHTIAWDVITLMAHTAEI
FT                   THEIIGFLDQTLYRWNRLGIESPRSEKLARELLKELRTQV"
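Because the protein is read from the /translation qualifier rather than computed, extracting it amounts to joining the continuation lines and taking the text between the quotes. The Python sketch below shows one way this could be done; the function name extract_translation is hypothetical and the code is not taken from coderet. It assumes every line starts with the FT line code, as in the excerpt above.

```python
from typing import List


def extract_translation(ft_lines: List[str]) -> str:
    """Return the protein held in a /translation="..." qualifier."""
    chunks = []
    for line in ft_lines:
        # Drop the "FT" line code, then the padding around the content.
        body = line[2:] if line.startswith("FT") else line
        chunks.append(body.strip())
    text = "".join(chunks)

    prefix = '/translation="'
    start = text.find(prefix)
    if start == -1:
        raise ValueError("no /translation qualifier found")
    start += len(prefix)
    end = text.find('"', start)
    if end == -1:
        raise ValueError("unterminated /translation qualifier")
    return text[start:end]


translation_lines = [
    'FT                   /translation="MAQDSVDLSCDYQFWMQKLSVWDQASTLETQQDTCLHVAQFQEFL',
    "FT                   RKMYEALKEMDSNTVIERFPTIGQLLAKACWNPFILAYDESQKILIWCLCCLINKEPQN",
    "FT                   SGQSKLNSWIQGVLSHILSALRFDKEVALFTQGLGYAPIDYYPGLLKNMVLSLASELRE",
    "FT                   NHLNGFNTQRRMAPERVASLSRVCVPLITLTDVDPLVEALLICHGREPQEILQPEFFEA",
    "FT                   VNEAILLKKISLPMSAVVCLWLRHLPSLEKAMLHLFEKLISSERNCLRRIECFIKDSSL",
    "FT                   PQAACHPAIFRVDEMFRCALLETDGALEIIATIQVFTQCFVEALEKASKQLRFALKTYF",
    "FT                   PYTSPSLAMVLLQDPQDIPRGHWLQTLKHISELLREAVEDQTHGSCGGPFESWFLFIHF",
    "FT                   GGWAEMVAEQLLMSAAEPPTALLWLLAFYYGPRDGRQQRAQTMVQVKAVLGHLLAMSRS",
    "FT                   SSLSAQDLQTVAGQGTDTDLRAPAQQLIRHLLLNFLLWAPGGHTIAWDVITLMAHTAEI",
    'FT                   THEIIGFLDQTLYRWNRLGIESPRSEKLARELLKELRTQV"',
]
protein = extract_translation(translation_lines)
print(protein[:10])  # MAQDSVDLSC
```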
