skipredundant
Function
Description
Redundancy in a database or other collection of sequences occurs when
two or more similar sequences are present. The inclusion of very
similar sequences in certain analyses will introduce undesirable
bias. For example, a family may possess 100 sequences in the sequence
database, but 90 of these might be essentially the same sequence,
e.g. very close relatives or mutations of a single sequence. Although
100 sequences are known, the family only contains 11 sequences that
are essentially unique (the 10 distinct sequences plus one
representative of the 90). For many applications it is desirable or even
essential to remove redundant sequences from a set in order to produce
a smaller set that is representative of the whole. skipredundant removes
redundancy from an input file of sequences, either at a single
threshold of sequence similarity (e.g. 40%) or within a threshold
range of sequence similarity (e.g. 40% - 70%).
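The two threshold modes can be illustrated with a minimal Python sketch. The function names and default values are hypothetical and only reuse the example figures above. How the range is applied is not spelled out in this section, so the range branch reads "within a range" literally; that is an assumption and may not match what the program actually does.

```python
def redundant_single(similarity, threshold=40.0):
    """Single-threshold reading: a pair is redundant at or above the threshold."""
    return similarity >= threshold


def redundant_in_range(similarity, low=40.0, high=70.0):
    """Range reading (assumption): a pair is redundant when its similarity
    falls inside the closed interval [low, high]."""
    return low <= similarity <= high
```

Under these assumptions, a pair scoring 55% similarity counts as redundant in both modes, while a pair scoring 85% counts as redundant only in the single-threshold mode.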
Algorithm
Redundancy is calculated for each pair of sequences in turn.
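A pairwise pass of this kind might look like the following Python sketch. It is not the program's implementation: the similarity score here comes from the standard library's difflib as a stand-in for an alignment-based score, and the rule of keeping the longer sequence of a redundant pair is an assumption made for illustration. The names percent_similarity and remove_redundant are hypothetical.

```python
from difflib import SequenceMatcher
from itertools import combinations


def percent_similarity(a, b):
    """Stand-in similarity in percent (difflib ratio, NOT an alignment score)."""
    return 100.0 * SequenceMatcher(None, a, b, autojunk=False).ratio()


def remove_redundant(seqs, threshold):
    """Visit each pair in turn and drop one member of every redundant pair."""
    removed = set()
    for i, j in combinations(range(len(seqs)), 2):
        if i in removed or j in removed:
            continue  # a sequence already discarded is not compared again
        if percent_similarity(seqs[i], seqs[j]) >= threshold:
            # Assumption: keep the longer sequence; on a tie keep the earlier one.
            removed.add(j if len(seqs[j]) <= len(seqs[i]) else i)
    return [s for k, s in enumerate(seqs) if k not in removed]


if __name__ == "__main__":
    family = ["ACDEFGHIKL", "ACDEFGHIKV", "WWWWWWWWWW"]
    print(remove_redundant(family, threshold=80.0))
```

Comparing every pair costs on the order of n squared comparisons for n sequences, which is worth keeping in mind for large input sets.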
Usage
Command line arguments
Input file format
skipredundant reads any normal sequence USAs.
Output file format
skipredundant outputs a graph to the specified graphics device.
skipredundant outputs a report format file. The default format is ...
Data files
For protein sequences, EBLOSUM62 is used for the substitution
matrix. For nucleotide sequences, EDNAFULL is used. Other matrices can
be specified.
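As a rough illustration of what a substitution matrix contributes, the Python sketch below scores two already-aligned, ungapped peptides. The dictionary is a small hand-copied subset of BLOSUM62 values used only for this example; it is not the EBLOSUM62 data file, and the helper names pair_score and ungapped_score are hypothetical.

```python
# Hand-copied subset of BLOSUM62 scores, for illustration only.
BLOSUM62_SUBSET = {
    ("A", "A"): 4, ("A", "R"): -1, ("R", "R"): 5,
    ("I", "I"): 4, ("I", "L"): 2, ("L", "L"): 4,
    ("W", "W"): 11,
}


def pair_score(x, y, matrix=BLOSUM62_SUBSET):
    """Look up a symmetric substitution score for residues x and y."""
    key = (x, y) if (x, y) in matrix else (y, x)
    return matrix[key]


def ungapped_score(a, b, matrix=BLOSUM62_SUBSET):
    """Sum the substitution scores over two equal-length, ungapped sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(pair_score(x, y, matrix) for x, y in zip(a, b))


if __name__ == "__main__":
    print(ungapped_score("ALW", "AIW"))
```

Identical residues score highest, conservative substitutions such as leucine for isoleucine score lower but still positive, and dissimilar pairs score negative, which is what lets a matrix-based score distinguish similar sequences from merely identical ones.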
Notes
None.
References
None.
Warnings
None.
Diagnostic Error Messages
None.
Exit status
It always exits with status 0.
Known bugs
None.
Author(s)
History
Target users
Comments