remap

Function

Description

remap scans one or more nucleotide sequences for the recognition sites and/or cut sites of a supplied set of restriction enzymes. You can specify one or more restriction enzymes, or alternatively investigate all the enzymes in the REBASE database. The minimum length of a recognition site to be reported must be specified. remap writes an output file showing the locations of the cut sites and, optionally, the recognition sites. Sites on both strands are shown by default, but many options control exactly which sites are reported and the format of the output file. Optionally, the translated sequence is also reported. Additionally, the output file lists the enzymes that do and do not cut the sequence, and those that do and do not match certain specified criteria.

Usage

Command line arguments


Input file format

You can specify a file of ranges to display in uppercase by giving the '-uppercase' qualifier the value '@' followed by the name of the file containing the ranges (e.g. '-upper @myfile').

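The format of the range file is as follows. Lines beginning with '#' are comments and, like blank lines, are ignored. Each remaining line holds two integer positions (the start and end of a range) separated by white space, optionally followed by text that annotates the range.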

An example range file is:


# this is my set of ranges
12   23
 4   5       this is like 12-23, but smaller
67   10348   interesting region
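
As an illustration of the range file format, the following Python sketch parses range-file text and upper-cases the listed ranges of a sequence. It is not EMBOSS code: parse_range_file and apply_uppercase are hypothetical helpers, and the sketch assumes that positions are 1-based and inclusive.

```python
def parse_range_file(text):
    """Parse range-file text into (start, end, note) tuples.

    Assumed format: '#' lines are comments, blank lines are ignored, and each
    other line holds two integer positions plus optional annotation text.
    """
    ranges = []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blank lines and comments
        parts = stripped.split(None, 2)  # start, end, optional annotation
        start, end = int(parts[0]), int(parts[1])
        note = parts[2] if len(parts) > 2 else ""
        ranges.append((start, end, note))
    return ranges


def apply_uppercase(seq, ranges):
    """Lower-case seq, then upper-case each range (assumed 1-based, inclusive)."""
    chars = list(seq.lower())
    for start, end, _note in ranges:
        for i in range(max(0, start - 1), min(end, len(chars))):
            chars[i] = chars[i].upper()
    return "".join(chars)


example = """# this is my set of ranges
12   23
 4   5       this is like 12-23, but smaller
67   10348   interesting region
"""
print(parse_range_file(example))
print(apply_uppercase("acgtacgtac", [(2, 4, "")]))  # aCGTacgtac
```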

You can specify a file of ranges to highlight in a different colour when outputting in HTML format (using the '-html' qualifier) by giving the '-highlight' qualifier the value '@' followed by the name of the file containing the ranges (e.g. '-highlight @myfile').

The format of this file is very similar to the format of the above uppercase range file, except that the text after the start and end positions is used as the HTML colour name. This colour name is used 'as is' when specifying the colour in HTML in a '<FONT COLOR=xxx>' construct, (where 'xxx' is the name of the colour).

The standard names of HTML font colours are given in:
http://www.w3.org/TR/REC-html40/types.html and http://www.ausmall.com.au/freegraf/ncolour2.htm and http://mindprod.com/htmlcolours.html (amongst other places).

An example highlight range file is:


# this is my set of ranges
12   23		red
 4   5		darkturquoise
67   10348	#FFE4E1
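
A minimal Python sketch of how highlight ranges could be turned into the '<FONT COLOR=xxx>' markup described above. The helpers parse_highlight_file and highlight_html are hypothetical and not part of EMBOSS; positions are assumed to be 1-based and inclusive, and overlapping or out-of-range entries are simply skipped.

```python
def parse_highlight_file(text):
    """Return (start, end, colour) tuples; colour is the text after the positions."""
    ranges = []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blank lines and comments
        start, end, colour = stripped.split(None, 2)
        ranges.append((int(start), int(end), colour.strip()))
    return ranges


def highlight_html(seq, ranges):
    """Wrap each 1-based, inclusive range in <FONT COLOR=xxx>...</FONT>."""
    out, pos = [], 0  # pos: 0-based index of the next character not yet emitted
    for start, end, colour in sorted(ranges):
        s, e = start - 1, min(end, len(seq))
        if s < pos or s >= e:
            continue  # overlaps an earlier range or lies outside the sequence
        out.append(seq[pos:s])
        out.append(f"<FONT COLOR={colour}>{seq[s:e]}</FONT>")
        pos = e
    out.append(seq[pos:])
    return "".join(out)


print(highlight_html("acgtacgtac", [(2, 3, "red"), (6, 7, "#FFE4E1")]))
```

The colour text is copied into the tag unchanged, matching the 'as is' behaviour described above.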

Output file format

The name of the sequence is displayed, followed by the description of the sequence.

The formatted display of cut sites on the sequence follows, with the six-frame translation below it. Each cut site is indicated by a backslash character '\' that points to the position between the nucleotides where the cut occurs. When several enzymes cut at the same position, their names are stacked on top of each other.
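
The sketch below illustrates the idea of marking cut positions with '\' and stacking the names of enzymes that cut at the same position. It is a simplified, hypothetical layout in Python, not remap's actual renderer: it handles only exact (unambiguous) sites on the forward strand, assumes enzyme names do not collide horizontally, and uses EcoRI (G^AATTC) and HindIII (A^AGCTT) as example enzymes with top-strand cut offsets.

```python
def find_cuts(seq, enzymes):
    """Map each top-strand cut position to the enzymes cutting there.

    enzymes: {name: (site, offset)}, offset counted from the start of the site.
    A cut position c means the cut falls between seq[c - 1] and seq[c].
    """
    seq = seq.upper()
    cuts = {}
    for name, (site, offset) in enzymes.items():
        start = seq.find(site)
        while start != -1:
            cuts.setdefault(start + offset, []).append(name)
            start = seq.find(site, start + 1)
    return cuts


def render_cuts(seq, cuts):
    """Draw stacked enzyme names, a backslash marker line, then the sequence."""
    width = len(seq)
    depth = max((len(names) for names in cuts.values()), default=0)
    rows = [[" "] * width for _ in range(depth)]
    marker = [" "] * width
    for cut, names in cuts.items():
        if not 0 < cut <= width:
            continue  # cut lies outside the displayed sequence
        col = cut - 1  # the backslash sits over the base just before the cut
        marker[col] = "\\"
        # The first name goes directly above the marker; others stack upwards.
        for level, name in enumerate(sorted(names)):
            row = rows[depth - 1 - level]
            for k, ch in enumerate(name):
                if col + k < width:
                    row[col + k] = ch
    lines = ["".join(row).rstrip() for row in rows]
    lines += ["".join(marker).rstrip(), seq]
    return "\n".join(lines)


enzymes = {"EcoRI": ("GAATTC", 1), "HindIII": ("AAGCTT", 1)}
seq = "TTGAATTCAAGCTTGG"
print(render_cuts(seq, find_cuts(seq, enzymes)))
```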

At the end the section header 'Enzymes that cut' is displayed followed by a list of the enzymes that cut the specified sequence and the number of times that they cut. For each enzyme that cuts, a list of isoschizomers of that enzyme (sharing the same recognition site pattern and cut sites) is given.

This is followed by lists of the enzymes that do cut the sequence, but which cut fewer times than the minimum set by the '-mincut' qualifier or more times than the maximum set by the '-maxcut' qualifier.

Isoschizomers that are excluded from cutting, either by restrictions such as the permitted number of cuts, blunt cutters only or single cutters only, or because their names were not given in the input list of enzymes, are not listed.

Next, remap lists the enzymes whose names were given as input and which match the other criteria ('-sitelen', '-blunt', '-sticky', '-ambiguity' or '-commercial') but which do not cut the sequence.

Finally, the number of enzymes that were rejected from consideration because they do not match the '-sitelen', '-blunt', '-sticky', '-ambiguity' or '-commercial' criteria is displayed.
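
The grouping of enzymes in the output can be sketched as follows. The Enzyme record and summarise function are hypothetical simplifications written in Python: only the site-length and blunt/sticky criteria are modelled (ambiguity and commercial availability are left out), '-blunt' and '-sticky' are assumed to mean "allow blunt cutters" and "allow sticky cutters", and the cut counts in the example are made up.

```python
from dataclasses import dataclass


@dataclass
class Enzyme:
    """Hypothetical, simplified enzyme record (real REBASE entries hold more)."""
    name: str
    site: str
    blunt: bool


def summarise(enzymes, cut_counts, sitelen, mincut, maxcut,
              allow_blunt=True, allow_sticky=True):
    """Sort enzymes into: cut within limits, cut outside limits, no cut, rejected."""
    accepted = [e for e in enzymes
                if len(e.site) >= sitelen
                and (allow_blunt if e.blunt else allow_sticky)]
    rejected = len(enzymes) - len(accepted)  # failed the selection criteria
    cuts, outside_limits, no_cut = {}, {}, []
    for e in sorted(accepted, key=lambda enz: enz.name):
        n = cut_counts.get(e.name, 0)
        if n == 0:
            no_cut.append(e.name)
        elif mincut <= n <= maxcut:
            cuts[e.name] = n
        else:
            outside_limits[e.name] = n
    return cuts, outside_limits, no_cut, rejected


enzymes = [Enzyme("EcoRI", "GAATTC", False), Enzyme("HindIII", "AAGCTT", False),
           Enzyme("EcoRV", "GATATC", True), Enzyme("AluI", "AGCT", True)]
counts = {"EcoRI": 1, "HindIII": 3, "AluI": 5}  # made-up example counts
print(summarise(enzymes, counts, sitelen=6, mincut=1, maxcut=2))
```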

The '-flatreformat' qualifier changes the display to emphasise the recognition site of the restriction enzyme, which is indicated by a row of '=' characters. The cut site is pointed to by a '>' or '<' character and, if the cut site is not within or immediately adjacent to the recognition site, the two are linked by a row of '.' characters.

The name of the enzyme is displayed above the recognition site (or below it when the reverse-sense site is displayed). The name of the enzyme is also displayed above the cut site if the cut site falls on a different display line from the recognition site (i.e. if it wraps onto the next line of sequence).

Data files

Notes

The Restriction Enzyme database (REBASE) is a collection of information about restriction enzymes and related proteins. It contains published and unpublished references, recognition and cleavage sites, isoschizomers, commercial availability, methylation sensitivity, crystal and sequence data. DNA methyltransferases, homing endonucleases, nicking enzymes, specificity subunits and control proteins are also included. Most recently, putative DNA methyltransferases and restriction enzymes, as predicted from analysis of genomic sequences, are also listed.

The home page of REBASE is: http://rebase.neb.com/

Where the translation is given in the output file, the genetic code and one or more frames for translation may be specified. The -no[reverse] option specifies whether the translation (and cut and recognition sites) are shown for the reverse sense strand.
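
For illustration, a Python sketch of translating the three forward frames and the three frames of the reverse-complement strand, using only the standard genetic code (NCBI table 1). The function names are hypothetical, other genetic codes would need a different table, and the signed frame labels (1, 2, 3, -1, -2, -3) are a simple convention that may not match how remap numbers its reverse frames.

```python
BASES = "TCAG"
# Standard genetic code (NCBI table 1); codons enumerated TTT, TTC, TTA, TTG, TCT, ...
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.upper().translate(COMPLEMENT)[::-1]


def translate(seq, frame):
    """Translate forward frame 1, 2 or 3; codons not in the table become 'X'."""
    seq = seq.upper()
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(frame - 1, len(seq) - 2, 3))


def six_frames(seq):
    """Frames 1-3 on the given strand, -1 to -3 on its reverse complement."""
    frames = {}
    for f in (1, 2, 3):
        frames[f] = translate(seq, f)
        frames[-f] = translate(reverse_complement(seq), f)
    return frames


print(translate("ATGGCCTAA", 1))  # MA*
print(six_frames("ATGGCCTAA"))
```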

By default, only one enzyme of any group of isoschizomers (enzymes that have the same recognition site and cut positions) is reported. This behaviour can be changed by specifying -nolimit, in which case all isoschizomers are reported. The default behaviour uses the representative enzyme of an isoschizomer group (the prototype) which is specified in the EMBOSS data file embossre.equ. This file is generated from the REBASE database by running rebaseextract. You may edit this file to set your own preferred prototype, if you wish.
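
A hypothetical Python sketch of the isoschizomer behaviour: enzymes are grouped by recognition site and cut offsets and, by default, one enzyme per group is reported. The prototypes set stands in for the choices recorded in embossre.equ (whose file format is not parsed here), and limit=False mimics '-nolimit'. MboI, Sau3AI and DpnII all cut ^GATC; the offsets shown cover the top strand only.

```python
from collections import defaultdict


def group_isoschizomers(enzymes):
    """Group names sharing the same recognition site and cut offsets.

    enzymes: {name: (site, cut_offsets)}, where cut_offsets is a tuple.
    """
    groups = defaultdict(list)
    for name, (site, cuts) in enzymes.items():
        groups[(site.upper(), tuple(cuts))].append(name)
    return [sorted(names) for names in groups.values()]


def reported_enzymes(groups, prototypes, limit=True):
    """With limit=True report one enzyme per group, otherwise every enzyme.

    A group's representative is the name found in prototypes (a stand-in for
    embossre.equ), falling back to the alphabetically first name.
    """
    if not limit:
        return [name for names in groups for name in names]
    return [next((n for n in names if n in prototypes), names[0])
            for names in groups]


enzymes = {"MboI": ("GATC", (0,)), "Sau3AI": ("GATC", (0,)),
           "DpnII": ("GATC", (0,)), "EcoRI": ("GAATTC", (1,))}
groups = group_isoschizomers(enzymes)
print(reported_enzymes(groups, {"MboI"}))   # ['MboI', 'EcoRI']
print(reported_enzymes(groups, set(), limit=False))
```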

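As well as the display of where enzymes cut in the sequence, remap displays: the enzymes that cut the sequence, with the number of times each one cuts and its isoschizomers; the enzymes that cut but fall outside the '-mincut' and '-maxcut' limits; the input enzymes that match the selection criteria but do not cut; and the number of enzymes rejected because they do not match the '-sitelen', '-blunt', '-sticky', '-ambiguity' or '-commercial' criteria.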

References

None.

Warnings

remap uses the EMBOSS REBASE restriction enzyme data files stored in directory data/REBASE/* under the EMBOSS installation directory. These files must first be set up using the program rebaseextract. Running rebaseextract may be the job of your system manager.

Diagnostic Error Messages

None.

Exit status

It always exits with status 0.

Known bugs

None.

Author(s)

History

Target users

Comments