maskseq reads a sequence and writes a masked version of it to a file. The sequence is masked in a specified set of regions: the characters in those regions are optionally converted to lower case and/or optionally replaced with a specified mask character.
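The masking operation itself is simple. The following minimal sketch is not maskseq's own code, and `mask_sequence` is a hypothetical name. It assumes that region positions are 1-based and inclusive, that a region running past the end of the sequence is truncated, and that a space (or missing) mask character means "mask by lower-casing":

```python
from typing import Iterable, Optional, Tuple


def mask_sequence(seq: str,
                  regions: Iterable[Tuple[int, int]],
                  mask_char: Optional[str] = "X",
                  to_lower: bool = False) -> str:
    """Return a copy of seq with each (start, end) region masked.

    Assumptions (not taken from maskseq itself): positions are 1-based
    and inclusive; a region that runs past the end of the sequence is
    truncated rather than rejected.
    """
    chars = list(seq)
    # Lower-case masking is used when requested, or when the mask
    # character is a space (or missing), as described for maskseq.
    lower_only = to_lower or mask_char in (None, "", " ")
    for start, end in regions:
        if start < 1 or end < start:
            raise ValueError(f"invalid region {start}-{end}")
        # Convert 1-based inclusive positions to 0-based list indices.
        for i in range(start - 1, min(end, len(chars))):
            chars[i] = chars[i].lower() if lower_only else mask_char
    return "".join(chars)


print(mask_sequence("ACGTACGTAC", [(2, 4)], mask_char="N"))  # ANNNACGTAC
print(mask_sequence("ACGTACGTAC", [(2, 4)], to_lower=True))  # AcgtACGTAC
```

Overlapping regions need no special handling here, because masking an already-masked position again leaves it masked.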
You can specify a file of ranges to mask by giving the '-regions' qualifier the value '@' followed by the name of the file containing the ranges (e.g. '-regions @myfile').
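The format of the range file is as follows. Lines starting with '#' are comments and are ignored, as are blank lines. Every other line holds two positive integers, the start and end positions of a region, separated by spaces or TABs; the end position must not be less than the start position. Any text after the two numbers is an annotation and is ignored.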
An example range file is:
    # this is my set of ranges
    12   23
    4    5       this is like 12-23, but smaller
    67   10348   interesting region
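A reader for this file format takes only a few lines. The following is a sketch of such a reader, not maskseq's own parser. The names `parse_range_file` and `load_regions` are hypothetical. `load_regions` covers only the '@filename' form of the '-regions' value; inline range lists are deliberately out of scope:

```python
from typing import List, Tuple


def parse_range_file(text: str) -> List[Tuple[int, int]]:
    """Parse range-file text into (start, end) pairs (illustrative sketch)."""
    ranges = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        # Blank lines and '#' comment lines carry no ranges.
        if not stripped or stripped.startswith("#"):
            continue
        fields = stripped.split()  # splits on any run of spaces or TABs
        if len(fields) < 2:
            raise ValueError(f"line {lineno}: expected a start and an end position")
        start, end = int(fields[0]), int(fields[1])
        if start < 1 or end < start:
            raise ValueError(f"line {lineno}: invalid range {start}-{end}")
        # Anything after the two numbers is an annotation and is ignored.
        ranges.append((start, end))
    return ranges


def load_regions(value: str) -> List[Tuple[int, int]]:
    """Resolve a '-regions' value of the form '@filename' (hypothetical helper)."""
    if not value.startswith("@"):
        # Inline range lists are outside the scope of this sketch.
        raise ValueError("only '@filename' values are handled here")
    with open(value[1:]) as fh:
        return parse_range_file(fh.read())


EXAMPLE = """\
# this is my set of ranges
12   23
 4   5       this is like 12-23, but smaller
67   10348   interesting region
"""

print(parse_range_file(EXAMPLE))  # [(12, 23), (4, 5), (67, 10348)]
```

The ranges keep the order in which they appear in the file. They need not be sorted, since masking does not depend on the order in which regions are applied.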
Database searches commonly mask out low-complexity or biased-composition regions of a sequence so that spurious matches do not occur. You may have a program that reports such biased regions but does not mask the sequence itself. In that case, you can use maskseq to do the masking.
There are other uses for it. Some non-EMBOSS programs, such as FASTA, can treat lower-case regions as if they were masked. maskseq can mask a region by converting it to lower case instead of replacing it with N's or X's. To do this, use the -tolower qualifier, or use a space character as the masking character.
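For lower-case ("soft") masking to be useful, a downstream program must find the lower-case runs again. The following sketch shows one way to recover them as 1-based, inclusive ranges in the same style as the range file. It is illustrative only and is not how FASTA or any other specific program implements this. `lowercase_regions` is a hypothetical name:

```python
from typing import List, Tuple


def lowercase_regions(seq: str) -> List[Tuple[int, int]]:
    """Return 1-based, inclusive (start, end) runs of lower-case letters."""
    regions = []
    start = None
    for pos, ch in enumerate(seq, start=1):
        if ch.islower():
            if start is None:
                start = pos  # a lower-case run begins here
        elif start is not None:
            regions.append((start, pos - 1))  # run ended at the previous position
            start = None
    if start is not None:
        regions.append((start, len(seq)))  # run continues to the end
    return regions


print(lowercase_regions("AcgtACGTac"))  # [(2, 4), (9, 10)]
```

Any character that is not a lower-case letter (upper-case letters, gaps, stop symbols) ends a run.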