EMBOSS: showalign

showalign

Function

Description

showalign reads a set of aligned protein or nucleic acid sequences and writes them to file (or screen) in a style suitable for publication. Similarities and differences of each sequence to a reference sequence are highlighted for specified types of matches. The reference sequence can be the calculated consensus sequence (the default) or one of the input set (specified by name or by the ordinal number of that sequence in the file). The output sequences can be displayed in the input order (the default), sorted by their similarity to the reference sequence, or sorted alphabetically by name. There are many other options to control the content and format of the output.

Usage

Command line arguments

Input file format

showalign reads in a set of aligned protein or nucleic acid sequences.

You can specify a file of ranges to display in uppercase by giving the '-uppercase' qualifier the value '@' followed by the name of the file containing the ranges (e.g. '-upper @myfile').

The format of the range file is:

Comment lines start with '#' in the first column.
Comment lines and blank lines are ignored.
The line may start with white-space.
There are two positive (integer) numbers per line separated by one or more space or TAB characters.
The second number must be greater than or equal to the first number.
There can be optional text after the two numbers to annotate the line.
White-space before or after the text is removed.

An example range file is:

          
# this is my set of ranges
12   23                           
 4   5       this is like 12-23, but smaller
67   10348   interesting region
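
The rules above can be sketched as a small parser. This is a minimal illustration in Python, not the code showalign itself uses; the function name `parse_ranges` and the choice to raise `ValueError` on a malformed line are assumptions made for the example.

```python
def parse_ranges(text):
    """Parse range-file text into a list of (start, end, annotation) tuples.

    Hypothetical helper for illustration; showalign's own parser may
    differ in how it reports malformed lines.
    """
    ranges = []
    for line in text.splitlines():
        stripped = line.strip()
        # Blank lines and comment lines ('#' in the first column) are ignored.
        if not stripped or line.startswith("#"):
            continue
        # Two numbers separated by spaces or TABs, then optional free text.
        fields = stripped.split(None, 2)
        if len(fields) < 2:
            raise ValueError("expected two numbers: %r" % line)
        start, end = int(fields[0]), int(fields[1])
        if start < 1 or end < start:
            raise ValueError("bad range: %r" % line)
        # White-space around the annotation text is removed.
        note = fields[2].strip() if len(fields) > 2 else ""
        ranges.append((start, end, note))
    return ranges


example = """\
# this is my set of ranges
12   23
 4   5       this is like 12-23, but smaller
67   10348   interesting region
"""

for start, end, note in parse_ranges(example):
    print(start, end, note)
```

Running the sketch on the example file yields three ranges, the first with an empty annotation.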

You can specify a file of ranges to highlight in a different colour when outputting in HTML format (using the '-html' qualifier) by giving the '-highlight' qualifier the value '@' followed by the name of the file containing the ranges (e.g. '-highlight @myfile').

The format of this file is very similar to that of the uppercase range file above, except that the text after the start and end positions is used as the HTML colour name. This colour name is used 'as is' when specifying the colour in HTML in a '<FONT COLOR=xxx>' construct (where 'xxx' is the name of the colour).

The standard names of HTML font colours are given in:
http://www.w3.org/TR/REC-html40/types.html and http://www.ausmall.com.au/freegraf/ncolour2.htm and http://mindprod.com/htmlcolours.html (amongst other places).

An example highlight range file is:

          
# this is my set of ranges
12   23         red
 4   5          darkturquoise
67   10348      #FFE4E1

Output file format

showalign writes out a text file, optionally formatted for HTML.

Data files

showalign reads in scoring matrices to determine the consensus sequence and to determine which matches are similar or not.

Notes

showalign reads in a scoring matrix to determine the consensus sequence and to determine which matches are similar or not.

By using the -show option, the displayed sequences can be shown as:

complete (-show=All),
only identical matches between the sequence and the reference sequence, all other positions being replaced by '.' characters (-show=Identities),
non-identical matches, with identical matches being replaced by '.' characters and similar matches shown in lower case (-show=Non-identities),
similar matches, with non-similar matches being replaced by '.' characters and similar matches shown in lower case (-show=Similarities),
dissimilar matches, with identical or similar matches being replaced by '.' characters (-show=Dissimilarities).

Changing the similar matches to lowercase can optionally be disabled by using the option -nosimilarcase.

A small table of the way these alignments are displayed illustrates this. If we have a reference protein sequence of "III" and a sequence aligned to this of "ILW", then we have an identical matching residue, then a similar one, then a dissimilar one. The different methods of display would give the following:

Reference       III

All             IlW
Identical       I..
Non-id          .lW
Similar         Il.
Dissimilar      ..W
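
The display modes can be sketched in a few lines of Python. This is an illustration of the rules described in these notes, not showalign's implementation: the `SIMILAR` table is a hypothetical stand-in for the scoring matrix, and the names `classify` and `display` are invented for the example.

```python
# Hypothetical similarity table for illustration only; showalign derives
# similarity from a scoring matrix, which is not reproduced here.
SIMILAR = {frozenset("IL"), frozenset("IV"), frozenset("LV")}

# Which kinds of match each -show value keeps; everything else becomes '.'.
KEEP = {
    "All": {"identical", "similar", "dissimilar"},
    "Identities": {"identical"},
    "Non-identities": {"similar", "dissimilar"},
    "Similarities": {"identical", "similar"},
    "Dissimilarities": {"dissimilar"},
}


def classify(ref, res):
    """Label one aligned position as identical, similar or dissimilar."""
    if ref == res:
        return "identical"
    if frozenset((ref, res)) in SIMILAR:
        return "similar"
    return "dissimilar"


def display(reference, sequence, show, similarcase=True):
    """Render a sequence against the reference for one -show mode."""
    out = []
    for ref, res in zip(reference, sequence):
        kind = classify(ref, res)
        if kind not in KEEP[show]:
            out.append(".")
        elif kind == "similar" and similarcase:
            # Similar matches are shown in lower case unless disabled.
            out.append(res.lower())
        else:
            out.append(res)
    return "".join(out)


for mode in KEEP:
    print("%-16s%s" % (mode, display("III", "ILW", mode)))
```

Passing `similarcase=False` mimics the effect described for -nosimilarcase: similar residues keep their original case.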

The consensus line can be displayed in a mixture of uppercase and lowercase symbols. Uppercase indicates a strong consensus and lowercase a weak one. The cutoff for setting the consensus case is set by the qualifier -setcase. If the number of residues at a position that match the consensus value is greater than this cutoff, the symbol is in uppercase; otherwise it is in lowercase. By default, the value of -setcase is set so that the consensus is in uppercase if more than 50% of the residues at that position are identical to the consensus. To put all of the consensus symbols into uppercase or lowercase, set -setcase to zero or to a very large value (for example 100000), respectively.
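
The case rule can be illustrated with a short Python sketch. It assumes that a "match" is a plain identity with the consensus symbol and that the default cutoff is half the number of sequences; showalign's actual calculation may weight residues differently. The function name `consensus_case` is hypothetical.

```python
def consensus_case(column, consensus, setcase=None):
    """Return the consensus symbol in upper or lower case for one column.

    column    -- the residues at this alignment position, one per sequence
    consensus -- the consensus symbol for this position
    setcase   -- cutoff; assumed here to default to half the sequence count
    """
    if setcase is None:
        setcase = len(column) / 2.0
    # Count residues identical to the consensus (an assumption of this sketch).
    matches = sum(1 for res in column if res.upper() == consensus.upper())
    # Strictly greater than the cutoff gives uppercase, otherwise lowercase.
    return consensus.upper() if matches > setcase else consensus.lower()


print(consensus_case("AAAG", "A"))          # strong consensus
print(consensus_case("AGCT", "A"))          # weak consensus
print(consensus_case("AGCT", "A", 0))       # cutoff of zero
print(consensus_case("AAAA", "A", 100000))  # very large cutoff
```

With a cutoff of zero any position that has at least one matching residue is shown in uppercase, and with a very large cutoff every position is shown in lowercase.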

Other display options include a sequence numbering ruler with ticks above the sequence. The width of a line can be set, as can the width of the margin to the left of the sequences that shows the sequence names. Specified regions of the sequence can be displayed in uppercase to highlight them.

The output can be formatted for HTML. If the output is being formatted for HTML, then specified regions of the sequence can be displayed in any valid HTML colours.
