einverted

Function

Description

einverted finds inverted repeats (stem loops) in nucleotide sequences. It identifies regions where a local alignment of the input sequence against its own reverse complement exceeds a threshold score. The alignments may include a proportion of mismatches and gaps, which correspond to bulges in the stem loop. einverted reads one or more sequences and writes a file containing the sequences of the inverted repeat regions, without gap characters. It can find multiple inverted repeats in a sequence. Only non-overlapping matches are reported.

Algorithm

einverted uses dynamic programming and is therefore guaranteed to find optimal local alignments between the sequence and its reverse complement. Matched bases contribute positively to the score, whereas gaps and mismatches are penalised. The score for a local alignment is the sum of the match scores, minus the penalties for mismatches and gap insertions. Any region whose score exceeds the threshold is reported. The gap penalty, match score, mismatch score and the threshold score for reporting regions are all user-specified.
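The scoring scheme described above can be illustrated with a minimal Python sketch. This is not einverted's actual implementation: it is a plain Smith-Waterman style local alignment of a sequence against its own reverse complement, with a linear gap penalty. The function names are hypothetical, and the mismatch score (-4) and gap penalty (12) are assumed values chosen for illustration; only the match score of 3 is taken from the worked example in the Notes section.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (a, c, g, t only)."""
    comp = {"a": "t", "c": "g", "g": "c", "t": "a"}
    return "".join(comp[base] for base in reversed(seq.lower()))


def best_local_score(seq, match=3, mismatch=-4, gap=12):
    """Best local alignment score of seq against its reverse complement.

    match comes from the worked example in this document; mismatch and
    gap are assumed values, not confirmed defaults.
    """
    seq = seq.lower()
    rc = reverse_complement(seq)
    n = len(seq)
    prev = [0] * (n + 1)  # previous row of the DP matrix
    best = 0
    for i in range(1, n + 1):
        cur = [0] * (n + 1)
        for j in range(1, n + 1):
            step = match if seq[i - 1] == rc[j - 1] else mismatch
            cur[j] = max(
                0,                  # start a new local alignment here
                prev[j - 1] + step, # extend with a match or mismatch
                prev[j] - gap,      # gap in the reverse complement
                cur[j - 1] - gap,   # gap in the sequence
            )
            best = max(best, cur[j])
        prev = cur
    return best


# An 11-base stem, a 13-base loop that cannot pair, and the matching arm.
stem_loop = "aaaactaaggc" + "c" * 13 + "gccttagtttt"
print(best_local_score(stem_loop))
```

A region would be reported only if this score exceeded the chosen threshold. The sketch returns just the best score; a fuller version would also trace back the alignment and mask reported regions so that later matches do not overlap earlier ones.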

Usage

Command line arguments


Input file format

The input for einverted is a nucleotide sequence.

Output file format

Data files

None.

Notes

The original "inverted" program (from which einverted was derived) was used to annotate the nematode genome. Excluding overlapping repeats saved problems with simple repeat sequences in this genome.

einverted will find optimal alignments but is slower than heuristic methods such as BLAST.

Sometimes you can find repeats using the program palindrome that you can't find with einverted using the default parameters.

This is not due to a problem with either program. Some of the shortest repeats that palindrome finds with its default parameter values score below einverted's default cutoff. To see them with einverted, decrease the 'Minimum score threshold' (the -threshold option).

For example, when palindrome is run with 'em:x65921', it finds the repeat:

64    aaaactaaggc    74
      |||||||||||
98    ttttgattccg    88

einverted will not report this because its score is 33 (11 bases scoring 3 each, no mismatches or gaps), which is below the default score cutoff of 50.

If einverted is run as:

% einverted em:x65921 -threshold 30

then it will find it:

Score 33: 11/11 (100%) matches, 0 gaps
      64 aaaactaaggc 74      
         |||||||||||
      98 ttttgattccg 88      

Anything can be considered to be a repeat if you set the score threshold low enough!
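The score arithmetic in this example can be checked with a short Python sketch. The helper names are hypothetical, the mismatch score of -4 is an assumed value (the example has no mismatches, so it does not affect the result), and the strict "greater than" comparison follows the wording "exceeds the threshold" used in this document.

```python
# Watson-Crick base pairs, written as (top strand, bottom strand).
PAIRS = {("a", "t"), ("t", "a"), ("c", "g"), ("g", "c")}


def ungapped_stem_score(top, bottom, match=3, mismatch=-4):
    """Score two stem arms column by column, as printed in the output.

    top and bottom are the two lines of the alignment display, so each
    column holds one candidate base pair. No gaps are handled here.
    """
    if len(top) != len(bottom):
        raise ValueError("arms must have the same length")
    return sum(
        match if (x, y) in PAIRS else mismatch
        for x, y in zip(top.lower(), bottom.lower())
    )


def is_reported(score, threshold=50):
    """A region is reported only if its score exceeds the threshold."""
    return score > threshold


score = ungapped_stem_score("aaaactaaggc", "ttttgattccg")
print(score)                             # 11 pairs * 3 = 33
print(is_reported(score))                # False with the default cutoff of 50
print(is_reported(score, threshold=30))  # True once the threshold is lowered
```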

einverted does not report overlapping matches.

References

Some useful references on inverted repeats:

  1. Pearson CE, Zorbas H, Price GB, Zannis-Hadjopoulos M. Inverted repeats, stem-loops, and cruciforms: significance for initiation of DNA replication. J Cell Biochem. 1996 Oct;63(1):1-22.
  2. Waldman AS, Tran H, Goldsmith EC, Resnick MA. Long inverted repeats are an at-risk motif for recombination in mammalian cells. Genetics. 1999 Dec;153(4):1873-83. PMID: 10581292; UI: 20050682.
  3. Jacobsen SE. Gene silencing: Maintaining methylation patterns. Curr Biol. 1999 Aug 26;9(16):R617-9.
  4. Lewis S, Akgun E, Jasin M. Palindromic DNA and genome stability. Further studies. Ann N Y Acad Sci. 1999 May 18;870:45-57. PMID: 10415472; UI: 99343961.
  5. Dai X, Greizerstein MB, Nadas-Chinni K, Rothman-Denes LB. Supercoil-induced extrusion of a regulatory DNA hairpin. Proc Natl Acad Sci U S A. 1997 Mar 18;94(6):2174-9.

Warnings

None.

Diagnostic Error Messages

None.

Exit status

It always exits with a status of 0.

Known bugs

None.

The palindrome program also looks for inverted repeats but is much faster and less sensitive, because it looks for near-perfect repeats.

Author(s)

This program was originally written by

This application was modified for inclusion in EMBOSS by

History

Target users

Comments