seqmatchall takes a set of sequences and performs an all-against-all pairwise comparison of them, using words of a specified size to find regions of identity between every pair of sequences. It writes an output file listing the regions of identity found in each pair: the names of the two sequences, and the start position, end position and length of each matching region.
The sequences must be either all protein or all nucleic acid.
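The word-based search described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the EMBOSS source: the function names `identity_regions` and `seqmatchall_sketch` are hypothetical, positions are assumed to be 1-based and inclusive, and the field order of each result tuple is chosen for this sketch only. It indexes every word of one sequence, looks up each word of the other, and extends each word hit into a maximal run of identity. It does not enforce the all-protein or all-nucleic-acid rule.

```python
from itertools import combinations


def identity_regions(a, b, word_size):
    """Return (length, start_a, end_a, start_b, end_b) for every maximal run of
    identity at least word_size long. Positions are 1-based and inclusive
    (an assumption of this sketch)."""
    a, b = a.upper(), b.upper()
    # Index every word of length word_size in b by its 0-based start position.
    index = {}
    for j in range(len(b) - word_size + 1):
        index.setdefault(b[j:j + word_size], []).append(j)
    regions = []
    for i in range(len(a) - word_size + 1):
        for j in index.get(a[i:i + word_size], []):
            # If the hit extends one residue to the left, it belongs to a run
            # that starts earlier on the same diagonal and is reported there.
            if i > 0 and j > 0 and a[i - 1] == b[j - 1]:
                continue
            # Extend the word hit to the right while the residues stay identical.
            k = word_size
            while i + k < len(a) and j + k < len(b) and a[i + k] == b[j + k]:
                k += 1
            regions.append((k, i + 1, i + k, j + 1, j + k))
    return regions


def seqmatchall_sketch(seqs, word_size):
    """All-against-all comparison of a dict {name: sequence}. Each result is
    (length, name1, start1, end1, name2, start2, end2); this field order is a
    choice made for the sketch, not necessarily the program's column order."""
    results = []
    for (n1, s1), (n2, s2) in combinations(seqs.items(), 2):
        for length, sa, ea, sb, eb in identity_regions(s1, s2, word_size):
            results.append((length, n1, sa, ea, n2, sb, eb))
    return results


if __name__ == "__main__":
    seqs = {"s1": "ACGTACGTTT", "s2": "GGACGTACGA"}
    for row in seqmatchall_sketch(seqs, 4):
        print(row)
    # (7, 's1', 1, 7, 's2', 3, 9)
    # (4, 's1', 5, 8, 's2', 3, 6)
```

Reporting a hit only when it cannot be extended to the left means each maximal run on a diagonal is emitted once, even though every word inside it is also a hit. The dictionary index makes each word lookup a constant-time operation on average, so the cost is dominated by the number of word hits that have to be extended.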
J01636 (the complete E. coli lac operon) matches V00294, V00295, V00296 and X51872 (the individual genes), and there is a short overlap between V00295 (lacY) and the flanking genes V00296 (lacZ) and X51872 (lacA).
The output is a list of regions of identity in pairs of sequences. Each region is reported on one line containing 7 columns of data, separated by TABs or spaces.
The columns of data consist of:
The larger the specified word size, the faster the comparison runs. However, any region whose stretch of identity is shorter than the word size will be missed. You should therefore choose a word size small enough to find the regions of identity you are interested in, yet large enough for the comparison to finish in a reasonable time.
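The word-size trade-off can be seen with a small, hypothetical example. The sketch below (the function `word_hits` and both sequences are made up for illustration) counts the shared words between two sequences whose longest stretch of identity is TTACA, 5 residues long. As the word size grows, there are fewer word hits to extend, which is one reason a word-based search gets faster; once the word size exceeds 5, the shared stretch produces no hit at all and is missed. This only illustrates the principle and is not a model of the program's actual running time.

```python
def word_hits(a, b, word_size):
    """Return 0-based (i, j) start pairs where a and b share a word of length word_size."""
    # Index every word of b, then look up each word of a.
    index = {}
    for j in range(len(b) - word_size + 1):
        index.setdefault(b[j:j + word_size], []).append(j)
    return [(i, j)
            for i in range(len(a) - word_size + 1)
            for j in index.get(a[i:i + word_size], [])]


if __name__ == "__main__":
    # The longest stretch of identity these two share is TTACA (5 residues).
    a = "GATTACAGG"
    b = "CCTTACACC"
    for k in range(2, 7):
        print(k, len(word_hits(a, b, k)))
    # 2 5
    # 3 3
    # 4 2
    # 5 1
    # 6 0
```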
polydot will give a graphical view of the same matches.