showseq

Function

Description

showseq displays one or more protein or nucleic acid sequences, with features, in a style suitable for publication. The output is sent to the screen by default but can be written to a file. You can pick a format from a list or, alternatively, use the many options to control what is output and in what format. Optionally, the sequence feature table can be displayed. Where the input sequence is a nucleic acid, the sequence can be translated using the specified genetic code tables. The recognition sites and/or cut sites of restriction enzymes from the REBASE database can also be displayed. There are various other options for controlling how the sequence is displayed and numbered, and the output can be formatted for HTML.

Usage

Command line arguments


Input file format

showseq reads normal sequence USAs.

You can specify a file of ranges to display in uppercase by giving the '-uppercase' qualifier the value '@' followed by the name of the file containing the ranges (e.g. '-upper @myfile').

The format of the range file is:

An example range file is:

          
# this is my set of ranges
12   23                           
 4   5       this is like 12-23, but smaller
67   10348   interesting region
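
As an illustration only, the range file above could be read with a few lines of Python. This is a minimal sketch, not showseq's own parser: the rules it applies (lines starting with '#' and blank lines are skipped, each remaining line holds two integer positions followed by optional free text) are assumptions inferred from the example, and the name parse_ranges is hypothetical.

```python
def parse_ranges(text):
    """Parse range-file text into (start, end, label) tuples.

    Assumed rules, inferred from the example file: blank lines and
    lines starting with '#' are skipped; every other line holds two
    integers followed by optional free text.
    """
    ranges = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Split off the two positions; whatever remains is the label.
        fields = line.split(None, 2)
        if len(fields) < 2:
            raise ValueError(f"expected two positions: {line!r}")
        start, end = int(fields[0]), int(fields[1])
        label = fields[2].strip() if len(fields) > 2 else ""
        ranges.append((start, end, label))
    return ranges


example = """\
# this is my set of ranges
12   23
 4   5       this is like 12-23, but smaller
67   10348   interesting region
"""

for start, end, label in parse_ranges(example):
    print(start, end, label)
```

The same function would also read the highlight and annotation files described in this section, since they differ only in how the trailing text is interpreted.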

You can specify a file of ranges to highlight in a different colour when outputting in HTML format (using the '-html' qualifier) by giving the '-highlight' qualifier the value '@' followed by the name of the file containing the ranges (e.g. '-highlight @myfile').

The format of this file is very similar to the format of the above uppercase range file, except that the text after the start and end positions is used as the HTML colour name. This colour name is used 'as is' when specifying the colour in HTML in a '<FONT COLOR=xxx>' construct (where 'xxx' is the name of the colour).

The standard names of HTML font colours are given in:
http://www.w3.org/TR/REC-html40/types.html and http://www.ausmall.com.au/freegraf/ncolour2.htm and http://mindprod.com/htmlcolours.html (amongst other places).

An example highlight range file is:

          
# this is my set of ranges
12   23         red
 4   5          darkturquoise
67   10348      #FFE4E1
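
To make the 'as is' use of the colour name concrete, here is a rough Python sketch of how highlighted regions might be wrapped in HTML. It is not showseq's implementation: the function name highlight_html is hypothetical, the FONT COLOR tag form is an assumption about the markup produced, and the ranges are assumed to be 1-based, inclusive and non-overlapping.

```python
def highlight_html(seq, ranges):
    """Wrap 1-based inclusive (start, end, colour) ranges in FONT tags.

    Assumes the ranges do not overlap. The colour text is copied into
    the tag unchanged, so both names and '#RRGGBB' values work.
    """
    out = []
    pos = 0  # 0-based index of the next character not yet emitted
    for start, end, colour in sorted(ranges):
        out.append(seq[pos:start - 1])
        out.append(f"<FONT COLOR={colour}>{seq[start - 1:end]}</FONT>")
        pos = end
    out.append(seq[pos:])
    return "".join(out)


print(highlight_html("ACGTACGTAC", [(4, 5, "red"), (7, 9, "#FFE4E1")]))
```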

You can specify a file of ranges to annotate by giving the '-annotate' qualifier the value '@' followed by the name of the file containing the ranges (e.g. '-annotate @myfile').

The format of this file is very similar to the format of the above highlight range file, except that the text after the start and end positions is used as the displayed text of the annotated region.

An example annotation range file is:

# this is my set of ranges
12   23         exon 1
 4   5          CAP site
67   10348      exon 2

You can specify a file of enzyme names to read in by giving the '-enzymes' qualifier the name of the file holding the enzyme names with a '@' character in front of it, for example, '@enz.list'.

Blank lines and lines starting with a '#' or '!' character are ignored and all other lines are concatenated together with a comma character ',' and then treated as the list of enzymes to search for.

An example of a file of enzyme names is:

      
# my enzymes
HincII, ppiI
# other enzymes
hindiii
HinfI
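
The concatenation rule described above can be sketched in Python as follows. This is an illustration rather than the program's own code; the function name read_enzyme_list is hypothetical, and stripping surrounding white-space from each line is an assumption.

```python
def read_enzyme_list(text):
    """Join the non-comment lines of an enzyme file with commas."""
    names = []
    for line in text.splitlines():
        line = line.strip()  # assumption: surrounding white-space is dropped
        # Blank lines and lines starting with '#' or '!' are ignored.
        if not line or line[0] in "#!":
            continue
        names.append(line)
    return ",".join(names)


enzyme_file = """\

# my enzymes
HincII, ppiI
# other enzymes
hindiii
HinfI
"""

print(read_enzyme_list(enzyme_file))
```

For the example file this yields a single comma-separated list naming HincII, ppiI, hindiii and HinfI.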

Output file format

Most of the variants of the output format have already been described in the 'Description' and 'Usage' sections, but here is some more just to fill out this section ;-)

The output format is extremely variable and under the control of the qualifiers used.

The sequence can be formatted for HTML display by using the '-html' qualifier. The enclosing HTML tags (<HEAD>, <BODY>, etc.) are not included, because the output of this program is expected to be embedded in a more extensive HTML page; these parts are left to the user to provide.

The name of the sequence is displayed, followed by the description of the sequence. These can be turned off with the '-noname' and '-nodescription' qualifiers.

Then the sequence is output, one line at a time. Any associated information to be displayed is also output above and below the sequence line, as specified by the '-format' and/or '-things' qualifiers. (See the 'Description' section for details.)

The margins around the sequence are specified by the '-margin' qualifier, and any numbering of the sequence and its translations is placed in the margin.

A display of the restriction enzyme cut sites can be selected via the '-format 6' option or the '-format 0 -thing b,r,s,-r' style of options.

The option '-format 7' will produce a formatted display of cut sites on the sequence, with the six-frame translation below it. The cut sites are indicated by a backslash character '\' that points to the position between the nucleotides where the cuts occur. Cuts by many enzymes at the same position are indicated by stacking the enzyme names on top of each other.

At the end the section header 'Enzymes that cut' is displayed followed by a list of the enzymes that cut the specified sequence and the number of times that they cut.

The '-flatreformat' qualifier changes the display to emphasise the recognition site of the restriction enzyme, which is indicated by a row of '=' characters. The cut site is pointed to by a '>' or '<' character and, if the cut site is not within or immediately adjacent to the recognition site, they are linked by a row of '.' characters.

The name of the enzyme is displayed above (or below when the reverse sense site is displayed) the recognition site. The name of the enzyme is also displayed above the cut site if this occurs on a different display line to the recognition site (i.e. if it wraps onto the next line of sequence).

Data files

Notes

One or more things may be selected for display from a menu (-things option). The order of specified characters (upper or lower case) determines the order in the output:

s	Sequence
b	Blank line
1	Frame 1 translation
2	Frame 2 translation
3	Frame 3 translation
-1	Frame -1 translation
-2	Frame -2 translation
-3	Frame -3 translation
t	Ticks line
n	Number ticks line
c	Complement sequence
f	Features (from the feature table or from a command line -ufo file)
r	Restriction enzyme cut sites in the forward sense
-r	Restriction enzyme cut sites in the reverse sense
a	User Annotation

Alternatively, there is a selection of pre-defined formats to choose from. The codes from above used in the list of standard formats are:

Sequence only:                  S A
Default sequence:               B N T S A F
Pretty sequence:                B N T S A
One frame translation:          B N T S B 1 A F
Three frame translations:       B N T S B 1 2 3 A F
Six frame translations:         B N T S B 1 2 3 T -3 -2 -1 A F
Restriction enzyme map:         B R S N T C -R B 1 2 3 T -3 -2 -1 A
Baroque:                        B 1 2 3 N T R S T C -R T -3 -2 -1 A F

The default standard format displays the following for each line of sequence output: first a blank line (b), then the position numbers of the ticks (n), then the ticks every 10 characters (t), then the sequence itself (s), then any user-supplied annotation (a), then the features from the feature table (f). Subsequent lines of the sequence output repeat this format.
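
The relationship between the named formats and the '-things' codes can be written out as a small lookup. The Python sketch below simply transcribes the two tables above; the names THINGS, FORMATS and describe are hypothetical and are not part of showseq.

```python
# Codes accepted by -things, transcribed from the menu above.
THINGS = {
    "S": "Sequence",
    "B": "Blank line",
    "1": "Frame 1 translation",
    "2": "Frame 2 translation",
    "3": "Frame 3 translation",
    "-1": "Frame -1 translation",
    "-2": "Frame -2 translation",
    "-3": "Frame -3 translation",
    "T": "Ticks line",
    "N": "Number ticks line",
    "C": "Complement sequence",
    "F": "Features",
    "R": "Restriction enzyme cut sites in the forward sense",
    "-R": "Restriction enzyme cut sites in the reverse sense",
    "A": "User Annotation",
}

# Pre-defined formats, transcribed from the list above.
FORMATS = {
    "Sequence only": "S A",
    "Default sequence": "B N T S A F",
    "Pretty sequence": "B N T S A",
    "One frame translation": "B N T S B 1 A F",
    "Three frame translations": "B N T S B 1 2 3 A F",
    "Six frame translations": "B N T S B 1 2 3 T -3 -2 -1 A F",
    "Restriction enzyme map": "B R S N T C -R B 1 2 3 T -3 -2 -1 A",
    "Baroque": "B 1 2 3 N T R S T C -R T -3 -2 -1 A F",
}


def describe(format_name):
    """Return the rows shown for each sequence line, in display order."""
    return [THINGS[code] for code in FORMATS[format_name].split()]


for row in describe("Default sequence"):
    print(row)
```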

The sequence can be translated, using the specified genetic code tables. The translation can be done in one, three or six frames. The translation can be displayed in one-letter or three-letter amino acid codes. The translation can optionally be displayed only when it is in open reading frames (ORFs) of a specified minimum size. One or more specified regions of the sequence can be individually translated.
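
For readers unfamiliar with frame translation, the following Python sketch shows the general idea using the standard genetic code and one-letter amino acid codes. It is not showseq's implementation: the function name translate is hypothetical, only the standard code table is covered, and the reverse frames are read straight from the reverse complement, which may not match the way showseq positions its reverse-frame translations.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(seq, frame):
    """Translate a DNA sequence in frame 1, 2, 3, -1, -2 or -3.

    Negative frames are read from the reverse complement (an assumption
    of this sketch). Codons with unknown bases are shown as 'X'.
    """
    seq = seq.upper()
    if frame < 0:
        seq = seq.translate(COMPLEMENT)[::-1]
    offset = abs(frame) - 1
    codons = (seq[i:i + 3] for i in range(offset, len(seq) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


for frame in (1, 2, 3, -1, -2, -3):
    print(frame, translate("ATGGCCTAA", frame))
```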

The output can be formatted for HTML. If the output is being formatted for HTML, then specified regions of the sequence can be displayed in any valid HTML colours.

This program can use REBASE data to find the recognition sites and/or cut sites of restriction enzymes in a nucleic acid sequence. It can display the cut sites on both strands. The -flatreformat option displays not only the cut sites, which many other restriction cut-site programs show, but also the recognition sites.

The Restriction Enzyme database (REBASE) is a collection of information about restriction enzymes and related proteins. It contains published and unpublished references, recognition and cleavage sites, isoschizomers, commercial availability, methylation sensitivity, crystal and sequence data. DNA methyltransferases, homing endonucleases, nicking enzymes, specificity subunits and control proteins are also included. Most recently, putative DNA methyltransferases and restriction enzymes, as predicted from analysis of genomic sequences, are also listed. The home page of REBASE is: http://rebase.neb.com/

If the sequence is in EMBL, GenBank or SwissProt format, the feature table of the sequence can be displayed with the sequence. GFF file features can also be displayed if they are included on the command line using -ufo=file.

Other display options include: The displayed sequence can be numbered either by numbering the start and ending positions, or by placing a ruler with ticks above or below the sequence. An initial position to start the numbering from can be set. The width of a line, and width of a margin around the sequence reserved for numbering can be set. Specified regions of the sequence can be displayed in uppercase to highlight them.

References

None.

Warnings

None.

Diagnostic Error Messages

None.

Exit status

It always exits with status 0.

Known bugs

None known.

If you ask for the sequence display to end at position '100', with the qualifier '-send 100', it will display the sequence up to the end of the line - position '120'. This is a feature of this program to make the display of things like restriction enzyme cutting sites easier.

It is not a bug. Please don't report it.
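
The arithmetic behind this behaviour can be sketched as rounding the requested end position up to the end of a display line. The Python below is an illustration only: the function name displayed_end is hypothetical, the line width of 60 is an assumption chosen because it is consistent with the 100 to 120 example above, and the display is assumed to start at position 1.

```python
import math


def displayed_end(send, width=60, start=1):
    """Return the last position shown when the display runs to line end.

    Assumes 'width' residues per line (60 is an assumed value) and a
    display that begins at position 'start'. The result is not capped
    at the sequence length, which a real display would be.
    """
    lines = math.ceil((send - start + 1) / width)
    return start + lines * width - 1


print(displayed_end(100))
```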

Author(s)

History

Target users

Comments