megamerger

Function

Description

megamerger reads two overlapping input DNA sequences and uses a word-match algorithm to align them. A merged sequence is generated from the alignment and written to the output sequence file. The actions megamerger took in generating the merged sequence are written to a separate report file. The sequences can be very long.

Algorithm

The program finds all exact word matches between the two sequences, using a word size of 20 by default. It then reduces these to the minimum set of non-overlapping matches: the matches are sorted by size (largest first) and, for each match in turn, any smaller match that overlaps it is removed. The result is a set of the longest ungapped alignments between the two sequences that do not overlap with each other. If the two sequences are identical in their region of overlap, there will be one region of match and no mismatches. Where there is a mismatch, the merged sequence uses bases from the sequence whose mismatch region is furthest from the start or end of that sequence.
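The word-match and reduction steps described above can be illustrated with a minimal Python sketch. This is not megamerger's source code: the function names (`word_matches`, `overlaps`, `reduce_matches`) are hypothetical, and the sketch assumes that two matches "overlap" when they share positions in either sequence and that overlapping smaller matches are dropped whole.

```python
from collections import defaultdict

def word_matches(a, b, wordsize=20):
    """Find maximal ungapped exact matches of at least `wordsize` bases.

    Returns a list of (a_start, b_start, length) tuples.
    """
    # Index every word of sequence b by its start position.
    index = defaultdict(list)
    for j in range(len(b) - wordsize + 1):
        index[b[j:j + wordsize]].append(j)

    matches = []
    for i in range(len(a) - wordsize + 1):
        for j in index.get(a[i:i + wordsize], ()):
            # Skip seeds that only continue a match begun one base earlier.
            if i > 0 and j > 0 and a[i - 1] == b[j - 1]:
                continue
            # Extend the seed to the right while the bases agree.
            length = wordsize
            while (i + length < len(a) and j + length < len(b)
                   and a[i + length] == b[j + length]):
                length += 1
            matches.append((i, j, length))
    return matches

def overlaps(m, k):
    """Assumption: matches overlap if they share positions in either sequence."""
    (ai, bi, n), (aj, bj, l) = m, k
    return (ai < aj + l and aj < ai + n) or (bi < bj + l and bj < bi + n)

def reduce_matches(matches):
    """Keep the largest matches, discarding smaller ones that overlap them."""
    kept = []
    for m in sorted(matches, key=lambda m: m[2], reverse=True):
        if not any(overlaps(m, k) for k in kept):
            kept.append(m)
    # Report the surviving matches in order along the first sequence.
    return sorted(kept)
```

For example, `reduce_matches(word_matches(a, b, wordsize=6))` on two short sequences that share one 12-base region returns a single match covering that region.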

Usage

Command line arguments


Input file format

megamerger reads any two sequences specified as USAs (Uniform Sequence Addresses).

Output file format

The actions megamerger took in generating the merged sequence are written to an output file. Any action that requires a choice between regions of the two sequences where they mismatch is marked with the word "WARNING!". Where there is a mismatch between the two sequences, the merged sequence is written in uppercase, using the sequence whose mismatch region is furthest from the edges of its sequence.
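The mismatch rule can be sketched in Python as follows. This is only an illustration under stated assumptions, not megamerger's implementation: the helper names are hypothetical, spans are half-open `(start, end)` index pairs, "distance from the edges" is taken to be the distance to the nearest end of the sequence, and ties are resolved in favour of the first sequence.

```python
def edge_distance(start, end, seq_len):
    """Distance from a region to the nearest end of its sequence."""
    return min(start, seq_len - end)

def choose_mismatch_region(a, a_span, b, b_span):
    """Return the mismatch bases for the merged sequence, in uppercase.

    The region is taken from whichever sequence has its mismatch
    region further from its own edges (assumption: ties go to `a`).
    """
    dist_a = edge_distance(a_span[0], a_span[1], len(a))
    dist_b = edge_distance(b_span[0], b_span[1], len(b))
    if dist_a >= dist_b:
        source, (start, end) = a, a_span
    else:
        source, (start, end) = b, b_span
    return source[start:end].upper()
```

The intuition behind the rule is that bases near the ends of a sequence read are typically less reliable, so the region lying deeper inside its sequence is preferred.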

The name and description of the first input sequence is used for the name and description of the output sequence.

A merged sequence is written out.

A report of the merger is written out.

Data files

None.

Notes

It should be possible to merge sequences that are megabytes long. Compare this with the program merger, which does a more accurate alignment of more divergent sequences using the Needleman-Wunsch algorithm but uses much more memory.

megamerger takes two overlapping sequences and merges them into one sequence. It could thus be regarded as the opposite of what splitter does.

References

None.

Warnings

The sequences should ideally be identical in their region of overlap. If there are any mismatches between the two sequences then megamerger will still create a merged sequence, but you should check that this is what you required.

Diagnostic Error Messages

None.

Exit status

It always exits with status 0.

Known bugs

None.

A graphical dotplot of the matches used in this merge can be displayed using the program dotpath.

Author(s)

History

Target users

Comments