showalign reads a set of aligned protein or nucleic acid sequences and writes them to a file (or the screen) in a style suitable for publication. Similarities and differences between each sequence and a reference sequence are highlighted for specified types of match. The reference sequence can be the calculated consensus sequence (the default) or one of the input sequences (specified by name or by its ordinal position in the file). The output sequences can be displayed in input order (the default), sorted by their similarity to the reference sequence, or sorted alphabetically by name. Many other options control the content and format of the output.
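The three output orderings can be illustrated with a short sketch. This is not showalign's code. The order labels are hypothetical, and "similarity to the reference" is approximated here as the number of positions identical to it. That measure is an assumption, because the text does not say how showalign scores similarity.

```python
def order_sequences(records, reference, order="input"):
    """Return (name, aligned_sequence) records in the requested order.

    order is one of the hypothetical labels "input", "similarity" or
    "alphabetical". "Similarity" is approximated here by the number of
    positions identical to the reference, ignoring gaps ('-' or '.');
    this measure is an assumption, not necessarily the one showalign uses.
    """
    if order == "input":
        return list(records)
    if order == "alphabetical":
        return sorted(records, key=lambda rec: rec[0])
    if order == "similarity":
        def identities(rec):
            return sum(1 for res, ref in zip(rec[1], reference)
                       if res == ref and res not in "-.")
        # Most similar first; Python's sort is stable, so ties keep input order.
        return sorted(records, key=identities, reverse=True)
    raise ValueError(f"unknown order: {order!r}")


records = [("seqB", "ILW"), ("seqA", "III"), ("seqC", "IIW")]
print([name for name, _ in order_sequences(records, "III", "similarity")])
# ['seqA', 'seqC', 'seqB']
```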
You can specify a file of ranges to display in uppercase by giving the '-uppercase' qualifier the value '@' followed by the name of the file containing the ranges (e.g. '-upper @myfile').
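The format of the range file is as follows: lines beginning with '#' are comments and are ignored, as are blank lines. Every other line contains two positive integers, the start and end positions of a range, separated by white space and optionally followed by text annotating that range.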
An example range file is:
# this is my set of ranges
12 23
4 5 this is like 12-23, but smaller
67 10348 interesting region
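Reading such a file and applying it might look like the sketch below. It is an illustration under stated assumptions, not showalign's parser. The function names are hypothetical, positions are taken to be 1-based and inclusive, ranges running past the end of the sequence are clipped, and residues outside every range are left unchanged.

```python
def parse_range_file(text):
    """Parse range-file text into (start, end, annotation) tuples.

    Assumed rules: blank lines and lines starting with '#' are skipped;
    other lines hold a start and an end position (1-based, inclusive),
    optionally followed by free annotation text.
    """
    ranges = []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        fields = stripped.split(None, 2)  # start, end, rest-of-line
        if len(fields) < 2:
            raise ValueError(f"bad range line: {line!r}")
        start, end = int(fields[0]), int(fields[1])
        if start < 1 or end < start:
            raise ValueError(f"bad range line: {line!r}")
        annotation = fields[2] if len(fields) == 3 else ""
        ranges.append((start, end, annotation))
    return ranges


def apply_uppercase(sequence, ranges):
    """Uppercase the residues inside any range; leave the others as they are."""
    chars = list(sequence)
    for start, end, _annotation in ranges:
        # Clip ranges that run past the end of the sequence.
        for pos in range(start, min(end, len(chars)) + 1):
            chars[pos - 1] = chars[pos - 1].upper()
    return "".join(chars)


example = """# this is my set of ranges
12 23
4 5 this is like 12-23, but smaller
67 10348 interesting region
"""
ranges = parse_range_file(example)
print(apply_uppercase("acdefghiklmnpqrstvwy", ranges))
# acdEFghiklmNPQRSTVWY
```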
You can specify a file of ranges to highlight in a different colour when outputting in HTML format (using the '-html' qualifier) by giving the '-highlight' qualifier the value '@' followed by the name of the file containing the ranges (e.g. '-highlight @myfile').
The format of this file is very similar to that of the uppercase range file described above, except that the text after the start and end positions is used as the HTML colour name. This colour name is used 'as is' when specifying the colour in HTML in a '<font color="xxx">' construct (where 'xxx' is the name of the colour).
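The sketch below shows how (start, end, colour) ranges could be turned into that construct. It is a hypothetical helper, not showalign's implementation. Positions are assumed to be 1-based and inclusive. The colour text is inserted unchanged, as described above. Overlapping ranges are trimmed so that the earlier-starting one wins, which is a choice of this sketch only.

```python
import html


def highlight_html(sequence, ranges):
    """Wrap 1-based, inclusive ranges of `sequence` in <font color="..."> tags.

    `ranges` holds (start, end, colour) tuples. The colour text is inserted
    as-is. Ranges are processed in order of start position; a range that
    overlaps an earlier one is trimmed to start after the earlier one ends
    (an assumption of this sketch).
    """
    out = []
    pos = 1  # next 1-based position not yet written
    for start, end, colour in sorted(ranges):
        start = max(start, pos)
        end = min(end, len(sequence))
        if start > end:
            continue  # range is already covered or lies past the sequence end
        out.append(html.escape(sequence[pos - 1:start - 1]))
        out.append(f'<font color="{colour}">')
        out.append(html.escape(sequence[start - 1:end]))
        out.append("</font>")
        pos = end + 1
    out.append(html.escape(sequence[pos - 1:]))
    return "".join(out)


print(highlight_html("ACDEFGHIK", [(4, 5, "darkturquoise"), (1, 2, "red")]))
# <font color="red">AC</font>D<font color="darkturquoise">EF</font>GHIK
```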
The standard names of HTML font colours are given in:
http://www.w3.org/TR/REC-html40/types.html
and
http://www.ausmall.com.au/freegraf/ncolour2.htm
and
http://mindprod.com/htmlcolours.html
(amongst other places).
An example highlight range file is:
# this is my set of ranges
12 23 red
4 5 darkturquoise
67 10348 #FFE4E1
Output file format
showalign writes out a text file, optionally formatted for HTML.
Data files
showalign reads in scoring matrices to determine the consensus sequence and to decide which matches are similar and which are not.
Notes
Using the -show option, the displayed sequences can show all residues, identical residues only, non-identical residues only (both similar and dissimilar), similar residues only (including identical ones), or dissimilar residues only. Similar matches are changed to lowercase by default; this can be disabled with the -nosimilarcase option. For example, take a reference protein sequence "III" and an aligned sequence "ILW": the first position is an identical match, the second a similar one, and the third a dissimilar one. The different display methods give the following (note that in the 'All' row it is the dissimilar residue that is shown in lowercase):
Reference   III
All         ILw
Identical   I..
Non-id      .lW
Similar     Il.
Dissimilar  ..W
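The table can be reproduced with a small sketch. The mode names and the `SIMILAR_PAIRS` set are hypothetical. showalign decides similarity from a scoring matrix, so the hard-coded pairs are only a stand-in. The sketch also assumes that turning off `similarcase` suppresses every case change, including the lowercase dissimilar residue in the 'All' row.

```python
# Hypothetical stand-in for a scoring matrix: unordered residue pairs that
# count as "similar". showalign itself decides this from a scoring matrix.
SIMILAR_PAIRS = {frozenset("IL"), frozenset("DE"), frozenset("KR")}


def classify(ref, res, similar_pairs=SIMILAR_PAIRS):
    """Classify one aligned residue against the reference residue."""
    if res == ref:
        return "identical"
    if frozenset((ref, res)) in similar_pairs:
        return "similar"
    return "dissimilar"


def show(reference, sequence, mode, similarcase=True, blank="."):
    """Render `sequence` against `reference` for one display mode.

    Mode names are hypothetical labels for the rows of the table.
    similarcase=False is assumed to switch off every case change.
    """
    out = []
    for ref, res in zip(reference, sequence):
        kind = classify(ref, res)
        if mode == "all":
            # The 'All' row shows dissimilar residues in lowercase.
            keep, lower = True, kind == "dissimilar"
        elif mode == "identical":
            keep, lower = kind == "identical", False
        elif mode == "nonidentical":
            keep, lower = kind != "identical", kind == "similar"
        elif mode == "similar":
            keep, lower = kind != "dissimilar", kind == "similar"
        elif mode == "dissimilar":
            keep, lower = kind == "dissimilar", False
        else:
            raise ValueError(f"unknown mode: {mode!r}")
        if not keep:
            out.append(blank)
        elif lower and similarcase:
            out.append(res.lower())
        else:
            out.append(res)
    return "".join(out)


for mode in ("all", "identical", "nonidentical", "similar", "dissimilar"):
    print(f"{mode:<13}{show('III', 'ILW', mode)}")
```

Encoding each mode as a (keep, lower) pair keeps the five rules side by side, so each row of the table maps to one branch.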
The consensus line can be displayed in a mixture of uppercase and lowercase symbols: uppercase indicates a strong consensus and lowercase a weak one. The cutoff for setting the consensus case is set by the qualifier -setcase. If the number of residues at a position that match the consensus is greater than this value, the symbol is shown in uppercase; otherwise it is shown in lowercase. By default, -setcase is set so that the consensus symbol is in uppercase when more than 50% of the residues at that position are identical to the consensus. To put all of the consensus symbols into uppercase, set -setcase to zero; to put them all into lowercase, set it to a very large value (e.g. 100000).
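A minimal sketch of the case rule for a single column follows. The function names are hypothetical. `majority_consensus` is a simplification that picks the most frequent residue, whereas showalign builds its consensus with a scoring matrix. The threshold follows the text: uppercase only if more than `setcase` residues match, with a default of half the number of sequences.

```python
from collections import Counter


def majority_consensus(column):
    """Most frequent residue in a column (a simplification: showalign
    builds its consensus with a scoring matrix)."""
    return Counter(column).most_common(1)[0][0]


def consensus_case(column, consensus, setcase=None):
    """Return `consensus` in upper or lower case for one alignment column.

    column: the residues at this position, one per sequence.
    setcase: uppercase only if MORE than this many residues match the
    consensus. The default here, half the number of sequences, gives the
    "more than 50% identical" rule.
    """
    if setcase is None:
        setcase = len(column) / 2
    matches = sum(1 for res in column if res.upper() == consensus.upper())
    return consensus.upper() if matches > setcase else consensus.lower()


for column in ("IIIL", "IILW"):
    print(column, consensus_case(column, majority_consensus(column)))
# IIIL I
# IILW i
```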
Other display options include a sequence numbering ruler with tick marks above the sequences, a settable line width, a settable left margin showing the sequence names, and the display of specified regions of the sequences in uppercase to highlight them.
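The ruler, line width and name margin can be pictured with the hypothetical layout below. This is an assumption for illustration, not showalign's exact output format. Each block of `width` columns gets a tick line with '|' at every `tick`-th position and the position number right-aligned above it, and names are padded or truncated to `margin` characters.

```python
def ruler(start, length, tick=10):
    """Return (number_line, tick_line) for columns start..start+length-1.

    Every `tick`-th position gets a '|' with its number right-aligned above.
    """
    numbers = [" "] * length
    ticks = []
    for i in range(length):
        pos = start + i
        if pos % tick == 0:
            ticks.append("|")
            label = str(pos)
            first = i - len(label) + 1
            if first >= 0:  # skip labels that would not fit at the left edge
                numbers[first:i + 1] = label
        else:
            ticks.append("-")
    return "".join(numbers), "".join(ticks)


def format_alignment(records, width=50, margin=10, tick=10):
    """Wrap aligned (name, sequence) records into blocks of `width` columns,
    each preceded by a ruler, with names padded/truncated to `margin`."""
    length = max(len(seq) for _, seq in records)
    indent = " " * (margin + 1)
    lines = []
    for offset in range(0, length, width):
        chunk = min(width, length - offset)
        numbers, ticks = ruler(offset + 1, chunk, tick)
        lines += [indent + numbers, indent + ticks]
        for name, seq in records:
            lines.append(name[:margin].ljust(margin) + " " + seq[offset:offset + chunk])
        lines.append("")  # blank line between blocks
    return "\n".join(lines)


records = [("seqA", "ACDEFGHIKLMNPQRSTVWY"), ("seqB", "ACDEFGHIKLMNPQRSTVWF")]
print(format_alignment(records, width=15, margin=4, tick=5))
```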
The output can be formatted for HTML, in which case specified regions of the sequences can also be displayed in any valid HTML colour.