EMBOSS: seqmatchall

seqmatchall

Function

Description

seqmatchall takes a set of sequences and performs an all-against-all pairwise comparison of words of a specified size, finding regions of identity between any two sequences. It writes an output file listing the regions of identity found in each pair of sequences: the start and end positions and the length of each matching region, and the names of the two sequences.
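The idea of a word-based all-against-all comparison can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the EMBOSS implementation: the function names `word_matches` and `all_against_all` are hypothetical, sequences are plain strings held in a dictionary, and positions are reported as 1-based inclusive coordinates.

```python
from itertools import combinations


def word_matches(seq1, seq2, wordsize):
    """Return (length, start1, end1, start2, end2) for each maximal
    identical region of at least `wordsize` residues.
    Positions are 1-based and inclusive (an assumption of this sketch)."""
    # Index every word of length `wordsize` in seq1 by its start offset.
    index = {}
    for i in range(len(seq1) - wordsize + 1):
        index.setdefault(seq1[i:i + wordsize], []).append(i)

    hits = []
    for j in range(len(seq2) - wordsize + 1):
        for i in index.get(seq2[j:j + wordsize], []):
            # A seed whose preceding residues also match lies inside a
            # longer region that is reported from its left edge.
            if i > 0 and j > 0 and seq1[i - 1] == seq2[j - 1]:
                continue
            # Extend the seed to the right while residues stay identical.
            length = wordsize
            while (i + length < len(seq1) and j + length < len(seq2)
                   and seq1[i + length] == seq2[j + length]):
                length += 1
            hits.append((length, i + 1, i + length, j + 1, j + length))
    return hits


def all_against_all(seqs, wordsize):
    """Compare every pair in a {name: sequence} dictionary and return
    7-field rows: length, start1, end1, name1, start2, end2, name2."""
    rows = []
    for (name1, s1), (name2, s2) in combinations(seqs.items(), 2):
        for length, b1, e1, b2, e2 in word_matches(s1, s2, wordsize):
            rows.append((length, b1, e1, name1, b2, e2, name2))
    return rows


if __name__ == "__main__":
    # Made-up toy sequences, for illustration only.
    toy = {"a": "TTTACGTACGGG", "b": "CCACGTACGAA"}
    for row in all_against_all(toy, wordsize=5):
        print(*row, sep="\t")
```

The sketch indexes the words of one sequence and looks up the words of the other, so only positions that share a full word are ever extended. That is why the word size controls both the speed and the minimum length of region that can be found.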

Usage

Command line arguments

Input file format

seqmatchall reads a set of sequences specified by one or more USAs (Uniform Sequence Addresses).

The sequences must be either all protein or all nucleic acid.

Output file format

J01636 (the complete E. coli lac operon) matches V00294, V00295, V00296 and X51872 (the individual genes), and there is a short overlap between V00295 (lacY) and the flanking genes V00296 (lacZ) and X51872 (lacA).

The output is a list of regions of identity in pairs of sequences, one region per line, with seven columns of data separated by TABs or space characters.

The columns of data consist of:

The length of the region of identity.
The start position in sequence 1.
The end position in sequence 1.
The name of sequence 1.
The start position in sequence 2.
The end position in sequence 2.
The name of sequence 2.
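A file in the seven-column layout described above could be read with a short Python routine such as the following. It is a minimal sketch: the names `Match` and `parse_matches` are hypothetical, the sample records are invented for illustration, and the handling of blank lines and lines beginning with `#` is an assumption rather than documented behaviour.

```python
from typing import Iterable, List, NamedTuple


class Match(NamedTuple):
    """One region of identity, with fields in the documented column order."""
    length: int
    start1: int
    end1: int
    name1: str
    start2: int
    end2: int
    name2: str


def parse_matches(lines: Iterable[str]) -> List[Match]:
    """Parse lines of seven whitespace-separated columns into Match records."""
    matches = []
    for line in lines:
        stripped = line.strip()
        # Assumption: blank lines and '#' comment lines carry no data.
        if not stripped or stripped.startswith("#"):
            continue
        fields = stripped.split()  # splits on TABs and spaces alike
        if len(fields) != 7:
            continue  # assumption: ignore anything that is not a 7-column record
        length, s1, e1, n1, s2, e2, n2 = fields
        matches.append(Match(int(length), int(s1), int(e1), n1,
                             int(s2), int(e2), n2))
    return matches


if __name__ == "__main__":
    # Invented sample records; names and coordinates are not real data.
    sample = [
        "# hypothetical header line",
        "120\t1\t120\tseqA\t301\t420\tseqB",
        "15 40 54 seqA 7 21 seqC",
    ]
    for m in parse_matches(sample):
        print(m.name1, m.name2, m.length)
```

Using a named tuple keeps the column order from the list above visible in the code, so a field can be accessed by name instead of by index.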

Data files

None.

Notes

The larger the specified word size, the faster the comparison will proceed, but any region of identity shorter than the word size will be missed. You should therefore choose a word size that is small enough to find the regions you are interested in while still completing within a reasonable time.
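The effect of the word size on sensitivity can be illustrated with a small Python check. This is a sketch, not part of seqmatchall: the helper `shares_word` is hypothetical and the two sequences are made up so that they share exactly one identical stretch of six residues.

```python
def shares_word(seq1, seq2, wordsize):
    """True if the two sequences have at least one identical word of `wordsize`."""
    words1 = {seq1[i:i + wordsize] for i in range(len(seq1) - wordsize + 1)}
    return any(seq2[j:j + wordsize] in words1
               for j in range(len(seq2) - wordsize + 1))


if __name__ == "__main__":
    # Made-up sequences sharing the six-residue stretch "ACGTTA".
    a = "GGGGACGTTACCCC"
    b = "TTTTACGTTAAAAA"
    for wordsize in (4, 6, 7):
        print(wordsize, shares_word(a, b, wordsize))
```

With a word size of 4 or 6 the shared stretch contains at least one full word and is detected; with a word size of 7 no word fits inside it, so a word-based search has nothing to extend and the region goes unreported.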
