transeq reads one or more nucleotide sequences and writes the corresponding protein sequence translations to file. It can translate in any one of the three forward or three reverse sense frames, in all three forward or all three reverse frames, or in all six frames. The translation may be restricted to specified regions, for example those corresponding to the coding regions of your sequences. It can translate using the standard ('Universal') genetic code or a selection of non-standard codes.
One or more peptide sequences are written out.
The names of the resulting protein sequences are formed from the name of the input nucleic acid sequence with '_' and the translation frame appended to it. Thus a nucleic acid sequence with the name 'XYZ' translated in all six frames would produce protein sequences with the names: 'XYZ_1', 'XYZ_2', 'XYZ_3', 'XYZ_4', 'XYZ_5', 'XYZ_6'.
If regions are specified, they are taken to be translated in frame 1 and so the output name would be 'XYZ_1'.
Termination (STOP) codons are translated as the character *. The -trim option removes all X and * characters from the right end of the translation. Trimming starts at the end and continues until the next character is not an X or a *. The -clean option changes all STOP codon positions from the * character to X (an unknown residue). This is useful because some programs will not accept protein sequences with * characters in them.
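The effect of -trim and -clean on a translated peptide can be sketched in a few lines of Python. This is an illustration of the behaviour described above, not transeq's own code; the function names trim_translation and clean_translation are hypothetical.

```python
def trim_translation(peptide):
    """Mimic -trim: strip X and * characters from the right end only.

    Stripping stops at the first character (counting from the end)
    that is neither X nor *, so internal stops are left untouched.
    """
    return peptide.rstrip("X*")


def clean_translation(peptide):
    """Mimic -clean: replace every STOP character (*) with X."""
    return peptide.replace("*", "X")


if __name__ == "__main__":
    print(trim_translation("MK*LX*X**"))   # internal * is kept
    print(clean_translation("MK*L*"))
```

Note that trimming only affects the right end, so a peptide with an internal stop still contains a * afterwards unless it is also cleaned.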
The reverse frame '-1' is defined as the translation you get when you use the reverse-complement of the sequence with the same codon phase as the codon in frame '1'. Thus the sequence ACTGG in frame 1 is the translation of the codons ACT,GG; the translation of frame -1 uses these same codons, reverse complemented:

forward sense          ACT GG
reverse sense          TGA CC
reverse-complement     CC AGT
frame -1 translation   S
Frame -1 is the translation of CCAGT (the reverse complement of ACTGG) using the codon AGT (the first bases CC are ignored). The result is the peptide S.
Similarly, frame -2 uses the phase of frame 2: CAG T (the first base, C, is ignored). The last base, T, cannot be translated on its own and is output as the unknown residue X. The result is the peptide QX.
Frame -3 uses the phase of frame 3: CCA GT. The last two bases, GT, translate to V regardless of what the missing third base would be (GTA, GTC, GTG and GTT all code for V). The result is the peptide PV.
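The worked example for ACTGG can be reproduced with a short Python sketch. It is a minimal illustration under stated assumptions, not transeq's implementation: the function names are hypothetical, only the standard genetic code is used, and the reverse-frame offset formula (len - k - 2) mod 3 is derived here from the definition above rather than taken from the program source.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def translate_codon(codon):
    """Translate a full codon, or a partial codon at the end of the sequence."""
    if len(codon) == 3:
        return CODON_TABLE.get(codon, "X")
    if len(codon) == 2:
        # Two bases are translatable only if every third base gives the same residue.
        residues = {CODON_TABLE.get(codon + b, "X") for b in BASES}
        return residues.pop() if len(residues) == 1 else "X"
    return "X"  # a single trailing base is an unknown residue


def translate_frame(seq, frame):
    """Translate in frame 1, 2, 3, -1, -2 or -3 using the phase-preserving definition."""
    if frame not in (1, 2, 3, -1, -2, -3):
        raise ValueError("frame must be one of 1, 2, 3, -1, -2, -3")
    seq = seq.upper()
    if frame > 0:
        offset = frame - 1
    else:
        k = -frame
        # Keep the codon boundaries of forward frame k on the reverse strand.
        offset = (len(seq) - k - 2) % 3
        seq = reverse_complement(seq)
    body = seq[offset:]
    return "".join(translate_codon(body[i:i + 3]) for i in range(0, len(body), 3))


if __name__ == "__main__":
    for frame in (-1, -2, -3):
        print(frame, translate_frame("ACTGG", frame))
```

For ACTGG this yields S, QX and PV for frames -1, -2 and -3, matching the peptides given above.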
An alternative way of generating the reverse translation frames, used by some people, is to define frame -1 as frame '1' of the reverse complement. Under that definition there is no fixed correspondence between the codons used in frames 1 and -1, 2 and -2, or 3 and -3; which codons are used depends on the sequence length modulo 3.
There does not appear to be a convention on which definition to use. The Staden package uses the same convention as this program. The GCG package sneakily avoids the problem by naming the frames using letters (a, b, c, d, e, f). If you really need to define frame -1 as the frame given when you reverse complement the sequence and then start translating at the first frame in the resulting sequence, then use the -alternative qualifier.
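How the two conventions relate can be sketched by comparing the position in the reverse complement at which translation starts. This is a hypothetical helper, not part of transeq; the default-convention formula is derived from the definition of frame -1 given earlier, and the alternative convention is taken to mean "frame k of the reverse complement".

```python
def reverse_offset(seq_len, frame, alternative=False):
    """0-based start position in the reverse complement for frame -1, -2 or -3."""
    if frame not in (-1, -2, -3):
        raise ValueError("frame must be -1, -2 or -3")
    k = -frame
    if alternative:
        # Alternative convention: simply frame k of the reverse complement.
        return k - 1
    # Default convention: keep the codon phase of forward frame k.
    return (seq_len - k - 2) % 3


def equivalent_alternative_frame(seq_len, frame):
    """Which alternative-convention frame reads the same codons as this default frame."""
    return -(reverse_offset(seq_len, frame) + 1)


if __name__ == "__main__":
    for seq_len in (4, 5, 6):
        mapping = {f: equivalent_alternative_frame(seq_len, f) for f in (-1, -2, -3)}
        print(seq_len, mapping)
```

The mapping changes with the sequence length modulo 3: for a length that is a multiple of three, frame -1 is the same under both conventions, while for a five-base sequence such as ACTGG the default frame -1 corresponds to the alternative frame -3.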
When translating using a non-standard genetic code, you should check the table carefully for deviations from your particular organism's code.
When using the -regions option, you should always leave the -frames option at the default of frame '1'. If you change the frame while specifying a region to translate, then the regions will be offset by 1 or 2 bases, which is not what you want.
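The reason for leaving the frame at '1' can be seen in a small Python sketch. It assumes that regions are 1-based inclusive start-end pairs, that they are joined before translation, and that a non-default frame simply skips bases at the start of the joined regions; these are modelling assumptions for illustration, not a description of transeq's internals, and region_codons is a hypothetical name.

```python
def region_codons(seq, regions, frame=1):
    """Split the joined regions into codons, starting at the given forward frame.

    regions: list of (start, end) pairs, 1-based and inclusive (assumed format).
    """
    if frame not in (1, 2, 3):
        raise ValueError("frame must be 1, 2 or 3")
    joined = "".join(seq[start - 1:end] for start, end in regions)
    shifted = joined[frame - 1:]
    return [shifted[i:i + 3] for i in range(0, len(shifted), 3)]


if __name__ == "__main__":
    seq = "GGGATGAAATAGCCC"
    coding = [(4, 12)]                       # ATG AAA TAG
    print(region_codons(seq, coding))        # codons as intended
    print(region_codons(seq, coding, 2))     # shifted by one base
```

With frame 1 the region is read as ATG, AAA, TAG; with frame 2 the same region is read as TGA, AAT, AG, which is no longer the intended coding sequence.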