sirna

Function

Description

Finds siRNA duplexes in mRNA. The output is a standard EMBOSS report file. The siRNAs are reported in order of best score first. sirna reports both the sense and antisense siRNAs as 5' to 3'.

Algorithm

for each input sequence:

    find the start position of the CDS in the feature table
    if there is no such CDS, take the -sbegin position as the CDS start

    for each 23 base window along the sequence:

        set the score for this window = 0
        if base 2 of the window is not 'a': ignore this window
        if the window is within 50 bases of the CDS start: ignore this window
        if the window is within 100 bases of the CDS start: score = -2
        measure the %GC of the 20 bases from position 2 to 21 of the window
        for the following %GC values change the score:
                %GC <= 25% (<= 5 bases): ignore this window
                %GC 30% (6 bases): score + 0
                %GC 35% (7 bases): score + 2
                %GC 40% (8 bases): score + 4
                %GC 45% (9 bases): score + 5
                %GC 50% (10 bases): score + 6
                %GC 55% (11 bases): score + 5
                %GC 60% (12 bases): score + 4
                %GC 65% (13 bases): score + 2
                %GC 70% (14 bases): score + 0
                %GC >= 75% (>= 15 bases): ignore this window
        if the window starts with 'AA': score + 3
        if the window does not start with 'AA' and it is required: ignore this window
        if the window ends with 'TT': score + 1
        if the window does not end with 'TT' and it is required: ignore this window
        if 4 G's in a row are found: ignore this window
        if any 4 identical bases in a row are present and such runs are not allowed: ignore this window
        if PolIII probes are required and the window is not NAR(N17)YNN: ignore this window
        if the score is > 0: store this window for output
	
    sort the windows found by their score
    output the 23-base windows to the sequence file
    if the 'context' qualifier is specified, output window bases 1 and 2 in brackets to the report file
    take the window bases 3 to 21, add 'dTdT', output the sense siRNA to the report file
    take the window bases 3 to 21, reverse complement, add 'dTdT', output the antisense siRNA to the report file
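
The scoring steps above can be sketched in Python as follows. This is an illustrative reading of the pseudocode, not the sirna source code: the names score_window, rank_windows and the keyword arguments are hypothetical, the CDS position rule is passed in as a pre-computed start_score, and T and U are treated as equivalent.

```python
# Score added for each G+C count in bases 2..21 of the window (20 bases).
# Counts outside this table (<= 5 or >= 15) cause the window to be ignored.
GC_SCORE = {6: 0, 7: 2, 8: 4, 9: 5, 10: 6, 11: 5, 12: 4, 13: 2, 14: 0}


def score_window(window, start_score=0, require_aa=False, require_tt=False,
                 allow_polybase=True, require_pol3=False):
    """Return the score of a 23-base window, or None if it is ignored.

    start_score is 0, or -2 for a window penalised for its CDS position.
    All option names are assumptions made for this sketch.
    """
    w = window.upper().replace("U", "T")
    if len(w) != 23:
        raise ValueError("window must be 23 bases long")
    if w[1] != "A":                      # base 2 must be 'A'
        return None
    gc = sum(base in "GC" for base in w[1:21])   # bases 2..21
    if gc not in GC_SCORE:
        return None
    score = start_score + GC_SCORE[gc]
    if w.startswith("AA"):
        score += 3
    elif require_aa:
        return None
    if w.endswith("TT"):
        score += 1
    elif require_tt:
        return None
    if "GGGG" in w:                      # never report 4 G's in a row
        return None
    if not allow_polybase and any(b * 4 in w for b in "ACT"):
        return None
    # NAR(N17)YNN: base 3 is a purine, base 21 is a pyrimidine.
    if require_pol3 and not (w[2] in "AG" and w[20] in "CT"):
        return None
    return score if score > 0 else None


def rank_windows(seq):
    """Return (score, 1-based start, window) tuples, best score first."""
    hits = []
    for i in range(len(seq) - 22):
        window = seq[i:i + 23]
        score = score_window(window)
        if score is not None:
            hits.append((score, i + 1, window))
    hits.sort(key=lambda hit: -hit[0])
    return hits


print(score_window("AAGUGAGAGGUCAGACUCCUATC"))  # 9 (10 G+C gives 6, 'AA' gives 3)
```

Returning None for an ignored window keeps "ignored" distinct from a low score, which mirrors the wording of the pseudocode.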

Usage

Command line arguments


Input file format

sirna reads any normal sequence USA.

Output file format

sirna outputs a report format file. The default format is 'table'.

The siRNAs are reported in order of best score first.

sirna reports both the sense and antisense siRNAs as 5' to 3'.

Data files

None.

Notes

RNA interference (RNAi) is a phenomenon whereby small interfering RNA strands (siRNA) inhibit gene expression at the level of transcription or translation of specific genes. RNAi is a defence mechanism against viruses and is important in regulating development and genome maintenance. siRNAs are double-stranded RNA molecules where one or the other strand is strongly complementary to a target RNA strand. Once they bind to a target, a nuclease protein guided by the siRNA cleaves the target and renders it untranslatable.

Gene silencing using RNAi has been used to determine the function of many genes in Drosophila, C. elegans, and many plant species. Knockdown by siRNA typically lasts for 7-10 days, and has been shown to transfer to daughter cells. Of further note, siRNAs are effective at quantities much lower than alternative gene silencing methodologies, including antisense and ribozyme based strategies.

Due to various mechanisms of antiviral response to long dsRNA, RNAi at first proved more difficult to establish in mammalian species. Then, Tuschl, Elbashir, and others discovered that RNAi can be elicited very effectively by well-defined 21-base duplex RNAs. When these small interfering RNAs, or siRNAs, are added in duplex form with a transfection agent to mammalian cell cultures, the 21-base-pair RNA acts in concert with cellular components to silence the gene with sequence homology to one of the siRNA sequences. Strategies for the design of effective siRNA sequences have been documented, most notably by Sayda Elbashir, Thomas Tuschl, et al.

Their studies of mammalian RNAi suggest that the most efficient gene-silencing effect is achieved using double-stranded siRNA having a 19-nucleotide complementary region and a 2-nucleotide 3' overhang at each end. Current models of the RNAi mechanism suggest that the antisense siRNA strand recognizes the specific gene target.

In gene-specific RNAi, the coding region (CDS) of the mRNA is usually targeted. The search for an appropriate target sequence should begin 50-100 nucleotides downstream of the start codon. UTR-binding proteins and/or translation initiation complexes may interfere with the binding of the siRNP endonuclease complex. Tuschl, Elbashir et al. say that they have successfully used siRNAs targeting the 3' UTR. To avoid interference from mRNA regulatory proteins, sequences in the 5' untranslated region or near the start codon should not be targeted.

A set of rules for the design of siRNA has been suggested (http://www.mpibpc.gwdg.de/abteilungen/100/105/sirna.html), based on the work of Tuschl, Elbashir et al. They suggest searching for the 23-nt sequence motif AA(N19)TT (N, any nucleotide) and selecting hits with approx. 50% G/C-content (30% to 70% has also worked for them). If no suitable sequences are found, the search is extended using the motif NA(N21). The sequence of the sense siRNA corresponds to (N19)TT or N21 (position 3 to 23 of the 23-nt motif), respectively. In the latter case, they convert the 3' end of the sense siRNA to TT.

The rationale for this sequence conversion is to generate a symmetric duplex with respect to the sequence composition of the sense and antisense 3' overhangs. The antisense siRNA is synthesized as the complement to position 1 to 21 of the 23-nt motif. Because position 1 of the 23-nt motif is not recognized sequence-specifically by the antisense siRNA, the 3'-most nucleotide residue of the antisense siRNA can be chosen deliberately. However, the penultimate nucleotide of the antisense siRNA (complementary to position 2 of the 23-nt motif) should always be complementary to the targeted sequence. To simplify chemical synthesis, they always use TT.

More recently, they preferentially select siRNAs corresponding to the target motif NAR(N17)YNN, where R is purine (A, G) and Y is pyrimidine (C, U). The respective 21-nt sense and antisense siRNAs therefore begin with a purine nucleotide and can also be expressed from pol III expression vectors without a change in targeting site; expression of RNAs from pol III promoters is only efficient when the first transcribed nucleotide is a purine.
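
The three 23-nt target motifs described above, AA(N19)TT, NA(N21) and NAR(N17)YNN, can be written as regular expressions. The following Python sketch is only an illustration; the function name matching_motifs is hypothetical, and T and U are treated as equivalent.

```python
import re

# Each pattern must match the whole 23-base window.
MOTIFS = {
    "AA(N19)TT": re.compile(r"AA[ACGT]{19}TT"),
    "NA(N21)": re.compile(r"[ACGT]A[ACGT]{21}"),
    # R = purine (A, G) at base 3; Y = pyrimidine (C, T/U) at base 21.
    "NAR(N17)YNN": re.compile(r"[ACGT]A[AG][ACGT]{17}[CT][ACGT]{2}"),
}


def matching_motifs(window):
    """Return the names of the motifs that a 23-base window satisfies."""
    w = window.upper().replace("U", "T")
    return [name for name, pattern in MOTIFS.items() if pattern.fullmatch(w)]


print(matching_motifs("AAGUGAGAGGUCAGACUCCUATC"))  # ['NA(N21)']
```

Using fullmatch rather than search means a window is only accepted when it is exactly 23 bases long and fits the motif from end to end.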

They always design siRNAs with symmetric 3' TT overhangs, believing that symmetric 3' overhangs help to ensure that the siRNPs are formed with approximately equal ratios of sense and antisense target RNA-cleaving siRNPs. Please note that the modification of the overhang of the sense sequence of the siRNA duplex is not expected to affect targeted mRNA recognition, as the antisense siRNA strand guides target recognition. In summary, no matter what you do to your overhangs, siRNAs should still function to a reasonable extent. However, using TT in the 3' overhang will always help your RNA synthesis company to let you know when you accidentally order an siRNA sequence 3' to 5' rather than in the recommended format of 5' to 3'. sirna reports both the sense and antisense siRNAs as 5' to 3'.

Xeragon.com also suggest that choosing a region of the mRNA with a GC content as close as possible to 50% is a more important consideration than choosing a target sequence that begins with AA. They also suggest that a key consideration in target selection is to avoid having more than three guanosines in a row, since poly G sequences can hyperstack and form agglomerates that potentially interfere with the siRNA silencing mechanism.
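
The two considerations in the paragraph above (closeness to 50% GC, and no run of more than three guanosines) are easy to check for a candidate region. The Python sketch below is illustrative only; the function names longest_run and gc_distance_from_half are hypothetical.

```python
from itertools import groupby


def longest_run(seq, base=None):
    """Length of the longest run of identical bases (optionally of one base)."""
    runs = [(b, sum(1 for _ in group)) for b, group in groupby(seq.upper())]
    if base is not None:
        runs = [run for run in runs if run[0] == base.upper()]
    return max((length for _, length in runs), default=0)


def gc_distance_from_half(seq):
    """Absolute difference between the G+C fraction and 0.5 (0.0 is ideal)."""
    s = seq.upper()
    return abs(sum(b in "GC" for b in s) / len(s) - 0.5)


candidate = "GUGAGAGGUCAGACUCCUA"
print(longest_run(candidate, "G") > 3)   # False: no run of 4 or more G's
```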

siRNAs appear to effectively silence genes in more than 80% of cases. Current data indicate that there are regions of some mRNAs where gene silencing does not work. To help ensure that a given target gene is silenced, it is advised that at least two target sequences as far apart on the gene as possible be chosen.
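
One way to follow the advice above is to take the best-scoring candidates and pick the pair whose start positions are furthest apart. The Python sketch below assumes a list of (score, start) tuples sorted best first; the function name pick_far_apart and the top_n cut-off are hypothetical choices for illustration, not part of sirna.

```python
from itertools import combinations


def pick_far_apart(hits, top_n=10):
    """Return the two hits among the top_n best that lie furthest apart.

    hits is a list of (score, start) tuples, best score first.
    """
    pool = hits[:top_n]
    if len(pool) < 2:
        return tuple(pool)
    return max(combinations(pool, 2),
               key=lambda pair: abs(pair[0][1] - pair[1][1]))


hits = [(9, 120), (8, 900), (7, 130), (6, 450)]
print(pick_far_apart(hits))  # ((9, 120), (8, 900))
```

Restricting the choice to the top few candidates is a trade-off: a larger top_n allows more widely separated targets but admits lower-scoring ones.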

Coding region specification

It's possible (although the evidence is unclear) that regulatory protein binding to regions in and near the untranslated 5' region might interfere with the RNAi process. Therefore, this program avoids choosing siRNA probes from the 5' UTR and from the first 50 bases of the coding region. The second 50 bases of the coding region have a penalty associated with them to reduce the reporting of possible siRNA probes in this region.
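
The position rule can be sketched as a small Python function that returns the starting score for a window. This is an illustration under stated assumptions: the name position_adjustment is hypothetical, positions are 1-based, the rule is applied to the window's first base, and the penalty of 2 points is the value given elsewhere in this document.

```python
def position_adjustment(window_start, cds_start=None):
    """Return None to skip the window, otherwise its starting score.

    window_start and cds_start are 1-based sequence positions.
    If cds_start is None the CDS is unknown and no rule is applied.
    """
    if cds_start is None:
        return 0
    offset = window_start - cds_start   # 0 means the first base of the CDS
    if offset < 50:                     # 5' UTR or CDS positions 1 to 50
        return None
    if offset < 100:                    # CDS positions 51 to 100
        return -2
    return 0


print(position_adjustment(151, cds_start=101))  # -2 (CDS position 51)
```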

If the input sequence has a feature table specifying a coding region (CDS), sirna uses it. It will ignore the 5' UTR and the first 50 bases of the CDS, and it will assign a penalty of 2 points to any siRNA in positions 51 to 100 of the CDS.

If there is no CDS in the feature table, you can specify where the CDS starts by using the command-line qualifier -sbegin (which is normally used in all EMBOSS programs to specify the start of the region of a sequence that should be analysed).

If there is no CDS in the feature table and you do not use the command-line qualifier -sbegin, then sirna will assume that the CDS region is not known and will look for siRNAs in the whole of the sequence, with no penalties associated with the location within the sequence.

All these confusing regions

There are a lot of references to 23 base regions, 21 base regions, 19 base regions, etc. in any description of siRNA. An example with a sequence may make this clearer. The 23 base region, in this case starting with AA, might typically look like:
5' AAGUGAGAGGUCAGACUCCUATC
The sense siRNA is made from the 19 bases of positions 3 to 21 of the 23 base target region, so:
5'   GUGAGAGGUCAGACUCCUA
and then typically d(TT) is added, so:
5'   GUGAGAGGUCAGACUCCUAdTdT
The antisense siRNA sequence is the reverse complement of bases 3 to 21 of the target region, so:
5'   GUGAGAGGUCAGACUCCUA sense
3'   CACUCUCCAGUCUGAGGAU antisense 3' -> 5'
so the antisense sequence that should be ordered with d(TT) added is:
5'   UAGGAGUCUGACCUCUCACdTdT antisense 5' -> 3'
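
The worked example above can be reproduced with a few lines of Python. This is a minimal sketch; the function name duplex_from_window is hypothetical, and any T in the input is treated as U.

```python
# RNA complement table: A<->U, C<->G.
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def duplex_from_window(window):
    """Return (sense, antisense) siRNAs, both 5' to 3', for a 23-base window."""
    w = window.upper().replace("T", "U")
    if len(w) != 23:
        raise ValueError("window must be 23 bases long")
    core = w[2:21]                                   # bases 3 to 21
    sense = core + "dTdT"
    antisense = core.translate(COMPLEMENT)[::-1] + "dTdT"
    return sense, antisense


sense, antisense = duplex_from_window("AAGUGAGAGGUCAGACUCCUATC")
print(sense)      # GUGAGAGGUCAGACUCCUAdTdT
print(antisense)  # UAGGAGUCUGACCUCUCACdTdT
```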

References

  1. Elbashir, S. M., et al. (2001a). Duplexes of 21-nucleotide RNAs mediate RNA interference in cultured mammalian cells. Nature 411: 494-498.
  2. Elbashir, S. M., W. Lendeckel and T. Tuschl (2001b). RNA interference is mediated by 21- and 22-nucleotide RNAs. Genes & Dev. 15: 188-200.

Warnings

It is assumed that the input sequence is mRNA.

Diagnostic Error Messages

None.

Exit status

It always exits with status 0.

Known bugs

None.

Author(s)

History

Target users

Comments