trimest

Function

Description

trimest reads one or more nucleotide sequences and writes them out again with any 3' poly-A tail (or, optionally, 5' poly-T tail) removed. It detects any poly-A and poly-T tails in the input sequences that are at least the specified minimum length. The tails may contain a specified number of non-A (or non-T) bases. If both a 5' poly-T tail and a 3' poly-A tail are identified, it removes the longer of the two. The output is a set of sequences with the poly-A (or poly-T) tails removed. If a sequence had a 5' poly-T tail, the resulting sequence is reverse-complemented by default. A comment describing the changes made to the sequence is appended to its description line.

Algorithm

trimest looks for a run of at least -minlength A's at the 3' end (and, by default, at least -minlength T's at the 5' end). If there are both an apparent 5' poly-T tail and a 3' poly-A tail, it removes whichever of the two is longer.
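The tail-selection rule can be sketched in Python as follows. This is a minimal illustration, not the trimest source: it counts only exact runs (mismatches are ignored), and the names leading_run, choose_tail and check_5prime are hypothetical. How trimest breaks a tie between equally long tails is not stated above; the sketch assumes the poly-A tail wins.

```python
def leading_run(seq: str, base: str) -> int:
    """Number of consecutive `base` characters at the start of `seq`."""
    n = 0
    for ch in seq.upper():
        if ch != base:
            break
        n += 1
    return n


def choose_tail(seq: str, minlength: int, check_5prime: bool = True):
    """Decide which tail (if any) to remove, using exact runs only.

    Returns ("polyA", n), ("polyT", n) or (None, 0).
    """
    # A 3' poly-A tail is a leading run of A's in the reversed sequence.
    a = leading_run(seq[::-1], "A")
    t = leading_run(seq, "T") if check_5prime else 0
    # Runs shorter than minlength do not count as tails.
    if a < minlength:
        a = 0
    if t < minlength:
        t = 0
    if a == 0 and t == 0:
        return (None, 0)
    # Keep the longer tail; ties go to poly-A (an assumption).
    if t > a:
        return ("polyT", t)
    return ("polyA", a)
```

For example, with minlength 4 the sequence TTTTTTGCGCAAAA has a 6-base poly-T tail and a 4-base poly-A tail, so the sketch selects ("polyT", 6).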

By default, up to -mismatches non-A (or non-T) bases are allowed in the tail. When a mismatch is found, it is only considered part of the tail if there are at least -minlength A's (or T's) beyond it, continuing inwards from the end of the sequence. If -mismatches is greater than 1, up to that number of contiguous non-A (or non-T) bases are allowed as part of the tail.
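The mismatch rule is easier to follow as code. The sketch below is one reading of the description above, not the trimest implementation: it assumes the terminal run itself must reach -minlength, and that each block of up to -mismatches contiguous non-A bases is absorbed only if at least -minlength further A's follow it (moving inwards). The function names tail_length, poly_t_length and _run_back are hypothetical.

```python
def _run_back(s: str, end: int, match) -> int:
    """Count consecutive characters of s[:end], scanning leftwards from
    end - 1, for which match(ch) is true."""
    n = 0
    while end - n > 0 and match(s[end - n - 1]):
        n += 1
    return n


def tail_length(seq: str, base: str, minlength: int, mismatches: int) -> int:
    """Length of a mismatch-tolerant run of `base` at the 3' end of `seq`.

    Interpretation (an assumption): the terminal run must be at least
    `minlength` long; a block of up to `mismatches` contiguous non-`base`
    characters joins the tail only if it is followed (moving inwards)
    by at least `minlength` further copies of `base`.
    """
    s = seq.upper()
    base = base.upper()

    run = _run_back(s, len(s), lambda ch: ch == base)
    if run < minlength:
        return 0
    start = len(s) - run  # s[start:] is the tail found so far
    while start > 0:
        # Size of the non-base block just inside the current tail.
        gap = _run_back(s, start, lambda ch: ch != base)
        if gap > mismatches:
            break
        # Bases beyond the block must form a long enough run.
        more = _run_back(s, start - gap, lambda ch: ch == base)
        if more < minlength:
            break
        start -= gap + more
    return len(s) - start


def poly_t_length(seq: str, minlength: int, mismatches: int) -> int:
    """5' poly-T tail: the same scan applied to the reversed sequence."""
    return tail_length(seq[::-1], "T", minlength, mismatches)
```

With minlength 4 and mismatches 1, the single G in CCCCAAAAGAAAAA is absorbed because four A's lie beyond it, giving a 10-base tail; with mismatches 0, or with minlength 5, only the final five A's are treated as the tail.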

Usage

Command line arguments


Input file format

trimest reads one or more nucleic acid sequences, specified by a USA (Uniform Sequence Address).

Output file format

If a poly-A tail is removed, [poly-A tail removed] is appended to the description of the sequence. If a poly-T tail is removed, [poly-T tail removed] is appended, and if the sequence is also reverse-complemented, [reverse complement] is appended.
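A minimal Python sketch of how a trimmed record and its annotated description could be produced. The helper names trim_record and reverse_complement, the tail labels "polyA"/"polyT", and the reverse flag are hypothetical; the order and single-space separation of the appended tags are assumptions, since only the tag texts are given above.

```python
# IUPAC-aware complement table: ambiguity codes map to their complements.
_COMP = str.maketrans("ACGTURYKMBVDHNSWacgturykmbvdhnsw",
                      "TGCAAYRMKVBHDNSWtgcaayrmkvbhdnsw")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMP)[::-1]


def trim_record(seq: str, desc: str, tail, length: int, reverse: bool = True):
    """Remove a tail of `length` bases and annotate the description.

    `tail` is "polyA", "polyT" or None (no tail found).
    """
    if tail == "polyA":
        return seq[:len(seq) - length], f"{desc} [poly-A tail removed]"
    if tail == "polyT":
        trimmed = seq[length:]
        new_desc = f"{desc} [poly-T tail removed]"
        if reverse:
            # Put the former poly-T strand back into mRNA-like orientation.
            trimmed = reverse_complement(trimmed)
            new_desc += " [reverse complement]"
        return trimmed, new_desc
    return seq, desc
```

For instance, trimming a 5-base poly-T tail from TTTTTGGCA with description est1 yields TGCC and the description "est1 [poly-T tail removed] [reverse complement]".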


Data files

None.

Notes

EST and mRNA sequences often have poly-A tails at their 3' end. Where an EST sequence is the reverse complement of the corresponding mRNA's forward (sense) strand, it may instead have a poly-T tail at its 5' end.
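The relationship between the two kinds of tail can be checked directly: reverse-complementing a sequence that ends in a poly-A tail produces one that starts with a poly-T tail of the same length. The transcript below is made up for illustration, and reverse_complement is a hypothetical helper.

```python
# IUPAC-aware complement table: ambiguity codes map to their complements.
_COMP = str.maketrans("ACGTURYKMBVDHNSWacgturykmbvdhnsw",
                      "TGCAAYRMKVBHDNSWtgcaayrmkvbhdnsw")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMP)[::-1]


mrna = "ATGGCCTTG" + "A" * 10    # made-up transcript with a 10-base poly-A tail
est = reverse_complement(mrna)  # the same molecule read from the other strand
print(est)  # TTTTTTTTTTCAAGGCCAT
```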

trimest is not infallible. Runs of A (or T) often occur by chance at the 3' (or 5') end of an EST sequence. trimest has no way of determining whether the A's it finds are part of a real poly-A tail or part of the transcribed genomic sequence. It removes any apparent poly-A tail that matches its criteria for a poly-A tail (see "Algorithm").

References

None.

Warnings

trimest cannot guarantee that the tails it removes are biologically significant. They may in fact be part of the transcribed sequence.

Diagnostic Error Messages

None.

Exit status

It always exits with status 0.

Known bugs

None.

Author(s)

History

Target users

Comments