seqmatchall

Function

Description

seqmatchall takes a set of sequences and compares every pair of sequences in the set (an all-against-all comparison). It looks for words (exact substrings) of a specified size that occur in both sequences of a pair, and uses them to find regions of identity between the two sequences. It writes an output file listing each region of identity found. Each entry gives the names of the two sequences, the start and end positions of the region in each sequence, and the length of the region.
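
The comparison described above can be sketched in Python as follows. This is a minimal illustration of word-based matching, not the EMBOSS implementation. It assumes that each shared word is extended left and right for as long as the two sequences stay identical, so every reported region is a maximal exact match containing at least one shared word. The function names word_matches and all_against_all and the toy sequences are hypothetical. Positions are 1-based and inclusive, and the order of fields in each result tuple is chosen for illustration only; it is not necessarily seqmatchall's column order.

```python
from itertools import combinations


def word_matches(a, b, word_size):
    """Return maximal regions of exact identity between sequences a and b
    that contain at least one shared word of length word_size.

    Each region is (length, start_a, end_a, start_b, end_b), with 1-based,
    inclusive positions. Comparison is case-sensitive (an assumption of
    this sketch).
    """
    if word_size < 1:
        raise ValueError("word_size must be at least 1")
    # Index every word of a by the offsets at which it starts.
    index = {}
    for i in range(len(a) - word_size + 1):
        index.setdefault(a[i:i + word_size], []).append(i)

    regions = []
    seen = set()  # (diagonal, start in a) of regions already reported
    for j in range(len(b) - word_size + 1):
        for i in index.get(b[j:j + word_size], []):
            # Extend the word hit to the left while residues still agree.
            sa, sb = i, j
            while sa > 0 and sb > 0 and a[sa - 1] == b[sb - 1]:
                sa -= 1
                sb -= 1
            key = (i - j, sa)
            if key in seen:
                continue  # this hit lies inside a region already reported
            seen.add(key)
            # Extend to the right from the end of the word.
            ea, eb = i + word_size, j + word_size
            while ea < len(a) and eb < len(b) and a[ea] == b[eb]:
                ea += 1
                eb += 1
            regions.append((ea - sa, sa + 1, ea, sb + 1, eb))
    return regions


def all_against_all(seqs, word_size):
    """Compare each unordered pair of named sequences exactly once."""
    for (name_a, a), (name_b, b) in combinations(seqs.items(), 2):
        for length, sa, ea, sb, eb in word_matches(a, b, word_size):
            yield name_a, name_b, length, sa, ea, sb, eb


# Hypothetical toy input: only s1 and s2 share a word of size 4.
seqs = {"s1": "AAAACCCCGGGG", "s2": "TTCCCCGGTT", "s3": "TTTT"}
for match in all_against_all(seqs, 4):
    print(*match, sep="\t")
```

For these toy sequences a single region is reported: six identical residues shared by s1 and s2. Recording each region by its diagonal and start position means that the several overlapping word hits inside one region produce only one entry.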

Usage

Command line arguments


Input file format

seqmatchall reads a set of sequences specified by a USA (Uniform Sequence Address).

The sequences must be either all protein or all nucleic acid.
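
A set of sequences can be checked against this requirement before the program is run. The Python sketch below uses a crude composition heuristic of its own: a sequence counts as nucleic acid if every residue is A, C, G, T, U or N. EMBOSS determines sequence type by its own rules, which this sketch does not reproduce, and the names guess_type and check_same_type are hypothetical.

```python
# Heuristic alphabet for nucleic acid (an assumption of this sketch,
# not the rule EMBOSS itself uses).
NUCLEIC_CHARS = set("ACGTUN")


def guess_type(seq):
    """Call a sequence nucleic if every residue is A, C, G, T, U or N."""
    return "nucleic" if set(seq.upper()) <= NUCLEIC_CHARS else "protein"


def check_same_type(seqs):
    """Return the type shared by all named sequences, or raise ValueError
    if the set is empty or mixes protein and nucleic acid."""
    if not seqs:
        raise ValueError("no sequences given")
    types = {name: guess_type(s) for name, s in seqs.items()}
    kinds = set(types.values())
    if len(kinds) != 1:
        raise ValueError(f"mixed sequence types: {types}")
    return kinds.pop()
```

A short peptide made only of the letters A, C, G and T would be misclassified as nucleic acid by this heuristic, which is one reason to treat it only as a quick sanity check.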

Output file format

J01636 (the complete E. coli lac operon) matches V00294, V00295, V00296 and X51872 (the individual genes), and there are short overlaps between V00295 (lacY) and each of its flanking genes, V00296 (lacZ) and X51872 (lacA).

The output is a list of the regions of identity found between pairs of sequences. Each region is reported on one line containing seven columns of data, separated by tab or space characters.
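
Because the columns may be separated by either tabs or spaces, a script reading this file should split each line on any run of whitespace rather than on a single tab character. The Python sketch below does that and checks that each line has seven fields. The column order is not given in this section, so the sketch returns the fields as strings without naming them. Skipping blank lines and lines that begin with '#' is an assumption, in case the file carries a header, and the function names parse_match_line and parse_matches are hypothetical.

```python
def parse_match_line(line):
    """Split one output line into its seven fields.

    str.split() with no argument splits on any run of tabs or spaces,
    so it copes with either separator (or a mixture of both).
    """
    fields = line.split()
    if len(fields) != 7:
        raise ValueError(f"expected 7 fields, got {len(fields)}: {line!r}")
    return fields


def parse_matches(text):
    """Parse every data line of an output file's contents.

    Blank lines and lines starting with '#' are skipped (an assumption
    of this sketch, not documented seqmatchall behaviour).
    """
    rows = []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        rows.append(parse_match_line(stripped))
    return rows
```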

The columns of data consist of:

Data files

None.

Notes

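The larger the specified word size, the faster the comparison runs, because longer words are less likely to be shared by chance, so there are fewer matches to examine. However, any region of identity shorter than the word size will be missed, because it contains no complete word shared by both sequences. You should therefore choose a word size that is small enough to find the regions of similarity you are interested in, but large enough for the comparison to finish in a reasonable time.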
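
The trade-off can be seen by counting shared words directly. In the Python sketch below, the function name shared_words and the two test sequences are hypothetical. The sequences share a single stretch of identity six residues long, and every pair of positions at which they share a word of the given size is counted as one hit.

```python
def shared_words(a, b, word_size):
    """Count (i, j) position pairs where a and b share a word of length
    word_size starting at a[i] and b[j]."""
    # Count how often each word occurs in a.
    counts = {}
    for i in range(len(a) - word_size + 1):
        w = a[i:i + word_size]
        counts[w] = counts.get(w, 0) + 1
    # Each word of b pairs with every occurrence of the same word in a.
    return sum(counts.get(b[j:j + word_size], 0)
               for j in range(len(b) - word_size + 1))


# Two sequences whose only stretch of identity is "GATTAC" (6 residues).
a = "CCCCC" + "GATTAC" + "CCCCC"
b = "TTTTT" + "GATTAC" + "TTTTT"
for k in (3, 6, 7):
    print(f"word size {k}: {shared_words(a, b, k)} hits")
```

With these sequences, a word size of 3 gives 4 hits, a word size of 6 gives exactly 1 hit (the shared stretch itself), and a word size of 7 gives none, so the shared stretch is missed entirely. Smaller word sizes can find shorter regions, but they also produce more hits that have to be examined.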

References

None.

Warnings

None.

Diagnostic Error Messages

None.

Exit status

It exits with a status of 0.

Known bugs

None.

See also

polydot gives a graphical view of the same matches.

Author(s)

History

Target users

Comments