stretcher

Function

Description

stretcher calculates an optimal global alignment of two sequences using a modification of the classic dynamic programming algorithm which uses linear space. The output is a standard alignment file. The substitution matrix, gap insertion penalty and gap extension penalties used to calculate the alignment may be specified.

Algorithm

The standard sequence global alignment program using the Needleman & Wunsch algorithm, as implemented in the program needle, requires O(MN) space and O(N) time. This is standard computer-science language for it needing an amount of computer memory that is proportional to the product of the two sequences being aligned and taking an amount of time that is proportional to the shorter of the two sequences. So if a 1 kb and a 10 kb sequence take 10 Mega-words of memory and 10 minutes to align, you should expect that in order to align a 10 kb sequence and a 1 Mb sequence you will need appoximately 10 Giga-words of memory and 100 minutes. Computer memory will rapidly be exhausted as the size of the aligned sequences increases.

This program implements the Myers and Miller algorithm for finding an optimal global alignment in an amount of computer memory that is only proportional to the size of the smaller sequence - O(N).

In computing, a benefit is seldom gained without a cost elsewhere. The cost of gaining a memory-efficient alignment is that it takes about twice the amount of time to do the alignment as the Needleman & Wunsch algorithm. In computer-science language the time is approximately O(2N).

Usage

Command line arguments


Input file format

stretcher reads any 2 sequence USAs of the same type (DNA or protein).

Output file format

The default output format is 'markx0'.

Data files

For protein sequences EBLOSUM62 is used for the substitution matrix. For nucleotide sequence, EDNAMAT is used. Others can be specified.

EMBOSS data files are distributed with the application and stored in the standard EMBOSS data directory, which is defined by EMBOSS environment variable EMBOSS_DATA.

Users can provide their own data files in their own directories. Project specific files can be put in the current directory, or for tidier directory listings in a subdirectory called ".embossdata". Files for all EMBOSS runs can be put in the user's home directory, or again in a subdirectory called ".embossdata".

The directories are searched in the following order:

Notes

A global pairwise alignment is one where it is assumed that the two sequences have diverged from a common ancestor and that the program should try to stretch the two sequences, introducing gaps where necessary, in order to show the alignment over the whole length of the two sequences that best illustrates their similarities. In contrast, a local alignment program like matcher simply finds local, small parts of the two sequences where there is some similarity and makes no assumption about the whole length of the sequence needing to be similar.

References

  1. E. Myers and W. Miller, "Optimal Alignments in Linear Space," CABIOS 4, 1 (1988), 11-17.

Warnings

Demonstration of similarity is not evidence of homology! This program will produce a global alignment even if there is no biological justification for thinking that there might be a common ancestor.

Diagnostic Error Messages

None.

Exit status

It exits with a status of 0.

Known bugs

None.

Author(s)

The original program was written by Gene Myers and Webb Miller in 1989.

This application was modified for inclusion in EMBOSS by

History

Target users

Comments