dotpath

Function

Description

dotpath generates a dotplot from two input sequences. The dotplot is an intuitive graphical representation of the regions of similarity between two sequences. Sequence "words" of a user-specified length are compared and all exact word matches between the two sequences are recorded. The set of the longest possible but non-overlapping matches is identified. The two sequences are the axes of the rectangular dotplot. Wherever there is an exact matching word in the two sequences a line is plotted.

Algorithm

dotpath uses the same algorithm as diffseq and dottup for finding a minimal set of exact matches between two sequences. It finds all identical words of size -wordsize or greater in the two sequences. It then reduces the matches found to the minimal set of matches that do not overlap. This set is rendered as lines in the dotplot.
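
The two steps described above can be illustrated with a minimal Python sketch. This is not the EMBOSS source code: the function names find_matches and non_overlapping are hypothetical, and the greedy "keep the longest match first" rule is an assumption made for illustration. In particular, this sketch discards an overlapping match outright, whereas the real program may treat overlaps differently.

```python
from collections import defaultdict


def find_matches(seq_a, seq_b, wordsize):
    """Return maximal exact matches as (start_a, start_b, length) tuples."""
    # Index every word of length `wordsize` in seq_a by its start position.
    index = defaultdict(list)
    for i in range(len(seq_a) - wordsize + 1):
        index[seq_a[i:i + wordsize]].append(i)

    matches = []
    for j in range(len(seq_b) - wordsize + 1):
        for i in index.get(seq_b[j:j + wordsize], ()):
            # Skip hits that only continue a match that started one
            # position earlier on the same diagonal.
            if i > 0 and j > 0 and seq_a[i - 1] == seq_b[j - 1]:
                continue
            # Extend the word match to the right for as long as it stays exact.
            length = wordsize
            while (i + length < len(seq_a) and j + length < len(seq_b)
                   and seq_a[i + length] == seq_b[j + length]):
                length += 1
            matches.append((i, j, length))
    return matches


def non_overlapping(matches):
    """Greedily keep the longest matches that overlap no kept match
    in either sequence (an illustrative simplification)."""
    kept = []
    for a, b, n in sorted(matches, key=lambda m: (-m[2], m[0], m[1])):
        clash = any(
            (a < ka + kn and ka < a + n) or (b < kb + kn and kb < b + n)
            for ka, kb, kn in kept
        )
        if not clash:
            kept.append((a, b, n))
    return sorted(kept)


if __name__ == "__main__":
    found = find_matches("AAACCCGGG", "TTACCCGTT", 3)
    for start_a, start_b, length in non_overlapping(found):
        print(start_a, start_b, length)
```

Each tuple that survives the reduction corresponds to one diagonal line in the dotplot, starting at (start_a, start_b) and running for length positions.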

Usage

Command line arguments


Input file format

Output file format

In normal operation, a dotplot image is displayed.

With the -data qualifier, a file listing the positions of the matches in the minimal non-overlapping set is written.

Notes

For similar sequences, dotpath provides a convenient way to find a path that aligns the two sequences well. It is not a true optimal path as produced by the dynamic programming algorithms used in water or needle, but for very closely related sequences it will produce the same result. In contrast to full alignment, it works very quickly with very long sequences.

The entire set of matches found can be displayed with the -overlaps qualifier. This shows all matches in red, except for those in the minimal path (non-overlapping set) which are shown in black, as normal.

Using a longer word size will create a dotplot with relatively less noise; the matches are longer and therefore more likely to have biological meaning. Such runs are much faster but less sensitive, because similar regions that contain no exact match of at least the word size are missed.
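
The effect of word size on noise can be seen in a small experiment. The sketch below is illustrative only and is not part of dotpath: count_word_hits is a hypothetical helper, and the two sequences are unrelated random DNA, so every hit counted is a chance match, that is, noise.

```python
import random


def count_word_hits(seq_a, seq_b, wordsize):
    """Count position pairs (i, j) whose words of length `wordsize` are identical."""
    # Count how often each word occurs in seq_a.
    occurrences = {}
    for i in range(len(seq_a) - wordsize + 1):
        word = seq_a[i:i + wordsize]
        occurrences[word] = occurrences.get(word, 0) + 1
    # Each word in seq_b pairs with every occurrence of the same word in seq_a.
    return sum(
        occurrences.get(seq_b[j:j + wordsize], 0)
        for j in range(len(seq_b) - wordsize + 1)
    )


# Two unrelated random sequences (fixed seed so that runs are repeatable).
rng = random.Random(0)
seq_a = "".join(rng.choice("ACGT") for _ in range(2000))
seq_b = "".join(rng.choice("ACGT") for _ in range(2000))

hits = {w: count_word_hits(seq_a, seq_b, w) for w in (4, 6, 8, 10)}

if __name__ == "__main__":
    for wordsize, count in hits.items():
        print(wordsize, count)
```

The number of chance hits can never increase as the word size grows, because every match of length k+1 contains a match of length k at the same position. Fewer stored hits also means less memory, which is why a larger word size helps with very long sequences.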

References

None

Warnings

A small word size combined with a very large sequence can exhaust the available memory. If this happens, run the program again with a larger word size.

Diagnostic Error Messages

None

Exit status

It always exits with status 0.

Known bugs

None

See also

This program is closely based on dottup, except that by default it displays only the minimal set of non-overlapping matches.

This program uses the same algorithm as diffseq for finding a minimal set of very good matches between two sequences. diffseq may be more convenient if you are looking at the differences between two nearly identical sequences.

Author(s)

History

Target users

Comments