wordfinder
Function
Description
wordfinder is a rough-and-ready local alignment program for large
sequences. It first uses wordmatch to find all the word matches
between the first sequence and another sequence. The diagonal with the
highest score is then used as the centre of a Smith-Waterman type
calculation, restricted to a width given by the user. Because only a
narrow band around that diagonal is calculated, the results are
approximate, but the saving in space allows much larger sequences to
be aligned.
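The diagonal-selection step described above might be sketched as
follows. This is a minimal illustration in Python, not wordfinder's own
code: the function name best_diagonal is hypothetical, and scoring a
diagonal by counting its word hits is an assumption, since the text
does not say how the diagonal score is computed.

```python
from collections import defaultdict


def best_diagonal(seq_a, seq_b, word_size):
    """Return (diagonal, hits) for the diagonal with the most word hits.

    A word shared at position i in seq_a and j in seq_b lies on
    diagonal i - j. Returns None if the sequences share no word.
    """
    # Index every word of length word_size in seq_b by its start position.
    index = defaultdict(list)
    for j in range(len(seq_b) - word_size + 1):
        index[seq_b[j:j + word_size]].append(j)

    # Each shared word votes for the diagonal it lies on (assumed scoring).
    hits = defaultdict(int)
    for i in range(len(seq_a) - word_size + 1):
        for j in index.get(seq_a[i:i + word_size], ()):
            hits[i - j] += 1

    if not hits:
        return None  # no word match, so no start point for the alignment
    return max(hits.items(), key=lambda item: item[1])


if __name__ == "__main__":
    print(best_diagonal("TTTTACGTACGGA", "ACGTACGGA", 4))
```

The None result corresponds to the case where no word match is found
and so no start point exists for the alignment.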
Usage
Command line arguments
Input file format
wordfinder reads two sequence USAs of the same type. They must
both be protein or both be nucleic acid sequences.
Output file format
The output alignment is in simple format by default.
The file 'wordfinder.error' will contain any errors that occurred
while the program was running. For example, wordmatch may have found
no matches, in which case there is no suitable start point for the
Smith-Waterman calculation.
Data files
For protein sequences, EBLOSUM62 is used as the substitution
matrix. For nucleotide sequences, EDNAMAT is used. Others can be specified.
Notes
The time this program takes to do an alignment depends very much on the
word size. For short sequences a short word size (e.g. 4) can make it
take a very long time. Large word sizes (e.g. 30) for sequences that
are very similar give a very quick result. The default of 16 should
give reasonably fast alignments.
Because it does a Smith-Waterman alignment (albeit in a narrow region
around the diagonal shown to be the 'best' by a word match), this
program can still use huge amounts of memory if the sequences are large.
Because the alignment is made within a narrow area on each side of the
'best' diagonal, the path of the Smith-Waterman alignment can wander
outside this area if there are enough indels between the two
sequences. Making the width larger can avoid this problem, at the cost
of more memory.
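The effect of the width can be illustrated with a small banded
Smith-Waterman sketch in Python. It is a simplified stand-in, not
wordfinder's implementation: the function name, the match/mismatch
scores and the linear gap penalty are all assumptions made for the
example (the program itself uses a substitution matrix), and only the
best score is returned rather than the alignment.

```python
def banded_smith_waterman(seq_a, seq_b, diagonal, width,
                          match=2, mismatch=-1, gap=-2):
    """Best local alignment score using only cells (i, j) with
    abs((i - j) - diagonal) <= width. Scoring values are illustrative."""
    best = 0
    prev = {}  # scores for the previous row, keyed by column j
    for i in range(1, len(seq_a) + 1):
        curr = {}
        # Columns of row i that fall inside the band.
        lo = max(1, i - diagonal - width)
        hi = min(len(seq_b), i - diagonal + width)
        for j in range(lo, hi + 1):
            sub = match if seq_a[i - 1] == seq_b[j - 1] else mismatch
            score = max(
                0,                          # start a new local alignment
                prev.get(j - 1, 0) + sub,   # extend along the diagonal
                prev.get(j, 0) + gap,       # residue of seq_a against a gap
                curr.get(j - 1, 0) + gap,   # residue of seq_b against a gap
            )
            curr[j] = score
            best = max(best, score)
        prev = curr
    return best


if __name__ == "__main__":
    a = "ACGTACGTGGGTTGACCAT"   # seq_b with three extra bases inserted
    b = "ACGTACGTTTGACCAT"
    print(banded_smith_waterman(a, b, diagonal=0, width=0))
    print(banded_smith_waterman(a, b, diagonal=0, width=3))
```

With a width of 0 the calculation cannot follow the three-base
insertion, so it scores lower than with a width of 3, which is wide
enough for the path to move onto the shifted diagonal.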
The longer the sequences and the wider the specified alignment width,
the more memory will be used.
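As a rough guide to how memory grows with length and width, the number
of cells in the band can be compared with a full matrix. The sketch
below is only an estimate under stated assumptions: the function names
are hypothetical, the band is taken to hold at most 2 * width + 1 cells
per row, and the 4 bytes per cell is a guess, not a figure taken from
the program.

```python
def banded_cells(len_a, len_b, width):
    """Upper bound on cells in a band of `width` each side of one diagonal."""
    cells_per_row = min(2 * width + 1, len_b)
    return len_a * cells_per_row


def estimated_bytes(len_a, len_b, width, bytes_per_cell=4):
    """Estimated storage for the band; bytes_per_cell is an assumption."""
    return banded_cells(len_a, len_b, width) * bytes_per_cell


if __name__ == "__main__":
    n = 1_000_000
    print("band cells:", banded_cells(n, n, 50))
    print("full cells:", n * n)
    print("band bytes:", estimated_bytes(n, n, 50))
```

Under these assumptions the estimate is linear in both the sequence
length and the width, whereas a full matrix grows with the product of
the two sequence lengths.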
If the program terminates due to lack of memory you can try the
following:
Run the UNIX (csh) command 'limit' to see whether your stack size or
memory usage has been limited and, if so, run 'unlimit' (e.g.
'% unlimit stacksize').
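The same limits can also be inspected from Python on UNIX-like systems
with the standard resource module, which may be convenient when
wordfinder is launched from a script. This is a small sketch;
describe_limit is a hypothetical helper name and the output wording is
illustrative.

```python
import resource


def describe_limit(name, which):
    """Return a one-line summary of the soft and hard limit for `which`."""
    soft, hard = resource.getrlimit(which)

    def fmt(value):
        # RLIM_INFINITY means the resource is not limited.
        return "unlimited" if value == resource.RLIM_INFINITY else f"{value} bytes"

    return f"{name}: soft={fmt(soft)}, hard={fmt(hard)}"


if __name__ == "__main__":
    print(describe_limit("stack size", resource.RLIMIT_STACK))
    print(describe_limit("address space", resource.RLIMIT_AS))
```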
References
None.
Warnings
None.
Diagnostic Error Messages
None.
Exit status
It always exits with a status of 0.
Known bugs
None.
Author(s)
History
Target users
Comments