skipredundant

Function

Description

Redundancy in a database or other collection of sequences occurs when two or more similar sequences are present. Including very similar sequences in certain analyses introduces undesirable bias. For example, a family may have 100 sequences in the sequence database, but 90 of these might be essentially the same sequence, e.g. very close relatives or mutations of a single sequence. Although 100 sequences are known, the family contains only 11 that are essentially unique: the 10 distinct sequences plus one representative of the 90 near-duplicates. For many applications it is desirable or even essential to remove redundant sequences from a set in order to produce a smaller set that is representative of the whole. skipredundant removes redundancy from an input file of sequences, either at a single threshold of sequence similarity (e.g. 40%) or within a threshold range of sequence similarity (e.g. 40% - 70%).

Algorithm

Redundancy is calculated for each pair of sequences in turn.
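
The pairwise idea can be illustrated with a minimal Python sketch. It is not the skipredundant implementation. The function names (percent_identity, remove_redundant), the ungapped identity score, and the rule of discarding the shorter sequence of a redundant pair are all assumptions made for illustration, and only the single-threshold case is shown.

```python
from itertools import combinations


def percent_identity(a, b):
    """Hypothetical stand-in for an alignment-based similarity score:
    percentage of matching positions over the shorter sequence,
    compared position by position without gaps."""
    n = min(len(a), len(b))
    if n == 0:
        return 0.0
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return 100.0 * matches / n


def remove_redundant(seqs, threshold):
    """Examine each pair of sequences in turn. When the similarity of a
    pair reaches the threshold, discard the shorter sequence (assumed
    tie-break: the later one when lengths are equal)."""
    discarded = set()
    for (id_a, seq_a), (id_b, seq_b) in combinations(seqs.items(), 2):
        # A sequence that is already discarded cannot remove others.
        if id_a in discarded or id_b in discarded:
            continue
        if percent_identity(seq_a, seq_b) >= threshold:
            discarded.add(id_b if len(seq_b) <= len(seq_a) else id_a)
    return {k: v for k, v in seqs.items() if k not in discarded}


# Toy input: s2 differs from s1 at one position, s4 is a prefix of s1.
example = {
    "s1": "MKTAYIAKQR",
    "s2": "MKTAYIAKQK",
    "s3": "GGGGGGGGGG",
    "s4": "MKTAYIAK",
}
kept = remove_redundant(example, threshold=90.0)
print(sorted(kept))
```

With a 90% threshold, the toy input keeps s1 and s3. A real run would replace percent_identity with a score derived from a pairwise alignment under the chosen substitution matrix.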

Usage

Command line arguments


Input file format

skipredundant reads any normal sequence USAs.

Output file format

skipredundant outputs a graph to the specified graphics device and a report format file. The default format is ...

Data files

For protein sequences, EBLOSUM62 is used as the substitution matrix. For nucleotide sequences, EDNAFULL is used. Other matrices can be specified.

Notes

None.

References

None.

Warnings

None.

Diagnostic Error Messages

None.

Exit status

It always exits with status 0.

Known bugs

None.

Author(s)

History

Target users

Comments