stretcher calculates an optimal global alignment of two sequences using a modification of the classic dynamic programming algorithm which uses linear space. The output is a standard alignment file. The substitution matrix, gap insertion penalty and gap extension penalties used to calculate the alignment may be specified.
The standard sequence global alignment program using the Needleman & Wunsch algorithm, as implemented in the program needle, requires O(MN) space and O(N) time. This is standard computer-science language for it needing an amount of computer memory that is proportional to the product of the two sequences being aligned and taking an amount of time that is proportional to the shorter of the two sequences. So if a 1 kb and a 10 kb sequence take 10 Mega-words of memory and 10 minutes to align, you should expect that in order to align a 10 kb sequence and a 1 Mb sequence you will need appoximately 10 Giga-words of memory and 100 minutes. Computer memory will rapidly be exhausted as the size of the aligned sequences increases.
This program implements the Myers and Miller algorithm for finding an optimal global alignment in an amount of computer memory that is only proportional to the size of the smaller sequence - O(N).
In computing, a benefit is seldom gained without a cost elsewhere. The cost of gaining a memory-efficient alignment is that it takes about twice the amount of time to do the alignment as the Needleman & Wunsch algorithm. In computer-science language the time is approximately O(2N).
|
The default output format is 'markx0'.
EMBOSS data files are distributed with the application and stored in the standard EMBOSS data directory, which is defined by EMBOSS environment variable EMBOSS_DATA.
Users can provide their own data files in their own directories. Project specific files can be put in the current directory, or for tidier directory listings in a subdirectory called ".embossdata". Files for all EMBOSS runs can be put in the user's home directory, or again in a subdirectory called ".embossdata".
The directories are searched in the following order:
A global pairwise alignment is one where it is assumed that the two sequences have diverged from a common ancestor and that the program should try to stretch the two sequences, introducing gaps where necessary, in order to show the alignment over the whole length of the two sequences that best illustrates their similarities. In contrast, a local alignment program like matcher simply finds local, small parts of the two sequences where there is some similarity and makes no assumption about the whole length of the sequence needing to be similar.
Demonstration of similarity is not evidence of homology! This program will produce a global alignment even if there is no biological justification for thinking that there might be a common ancestor.
This application was modified for inclusion in EMBOSS by