Finds siRNA duplexes in mRNA. The output is a standard EMBOSS report file. The siRNAs are reported in order of best score first. sirna reports both the sense and antisense siRNAs as 5' to 3'.
The algorithm is:

    for each input sequence:
        find the start position of the CDS in the feature table
        if there is no such CDS, take the -sbegin position as the CDS start
        for each 23-base window along the sequence:
            set the score for this window = 0
            if base 2 of the window is not 'a': ignore this window
            if the window is within 50 bases of the CDS start: ignore this window
            if the window is within 100 bases of the CDS start: score = -2
            measure the %GC of the 20 bases from position 2 to 21 of the window
            for the following %GC values, change the score:
                %GC <= 25% (<= 5 bases): ignore this window
                %GC 30% (6 bases): score + 0
                %GC 35% (7 bases): score + 2
                %GC 40% (8 bases): score + 4
                %GC 45% (9 bases): score + 5
                %GC 50% (10 bases): score + 6
                %GC 55% (11 bases): score + 5
                %GC 60% (12 bases): score + 4
                %GC 65% (13 bases): score + 2
                %GC 70% (14 bases): score + 0
                %GC >= 75% (>= 15 bases): ignore this window
            if the window starts with 'AA': score + 3
            if the window does not start with 'AA' and this is required: ignore this window
            if the window ends with 'TT': score + 1
            if the window does not end with 'TT' and this is required: ignore this window
            if 4 G's in a row are found: ignore this window
            if any base occurs 4 times in a row and this is not allowed: ignore this window
            if Pol III probes are required and the window is not NARN(17)YNN: ignore this window
            if the score is > 0: store this window for output
        sort the windows found by their score
        output the 23-base windows to the sequence file
        for each stored window:
            if the 'context' qualifier is specified, output window bases 1 and 2 in brackets to the report file
            take window bases 3 to 21, add 'dTdT', and output to the report file
            take window bases 3 to 21, reverse complement them, add 'dTdT', and output to the report file
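The window-scoring steps above can be sketched in Python. This is a minimal illustration under stated assumptions, not the EMBOSS source: `score_window`, `find_sirnas` and their keyword arguments are hypothetical stand-ins for the sirna qualifiers, offsets are 0-based, the -sbegin fallback is left to the caller through `cds_start`, and windows lying upstream of the CDS start are assumed to be rejected together with those within 50 bases of it.

```python
import re

# Score change for the number of G/C bases among window positions 2-21
# (20 bases). Counts absent from the table (<= 5 or >= 15) reject the window.
GC_SCORE = {6: 0, 7: 2, 8: 4, 9: 5, 10: 6, 11: 5, 12: 4, 13: 2, 14: 0}


def score_window(window, start, cds_start, require_aa=False, require_tt=False,
                 allow_polybase=True, require_pol3=False):
    """Score one 23-base window; return None if the window is ignored.

    start and cds_start are 0-based offsets into the same sequence.
    """
    w = window.upper().replace("U", "T")
    if len(w) != 23 or w[1] != "A":           # base 2 must be 'a'
        return None
    offset = start - cds_start
    if offset < 50:                            # assumption: upstream windows rejected too
        return None
    score = -2 if offset < 100 else 0          # penalty within 100 bases of the CDS start
    gc = sum(base in "GC" for base in w[1:21])  # window positions 2 to 21
    if gc not in GC_SCORE:
        return None
    score += GC_SCORE[gc]
    if w.startswith("AA"):
        score += 3
    elif require_aa:
        return None
    if w.endswith("TT"):
        score += 1
    elif require_tt:
        return None
    if "GGGG" in w:                            # 4 G's in a row
        return None
    if not allow_polybase and re.search(r"([ACGT])\1{3}", w):
        return None                            # any base repeated 4 times
    if require_pol3 and not re.fullmatch(r".A[AG].{17}[CT]..", w):
        return None                            # not NARN(17)YNN
    return score if score > 0 else None


def find_sirnas(seq, cds_start=0, **options):
    """Return (score, start) pairs for every stored window, best score first."""
    hits = []
    for i in range(len(seq) - 22):
        s = score_window(seq[i:i + 23], i, cds_start, **options)
        if s is not None:
            hits.append((s, i))
    hits.sort(key=lambda h: (-h[0], h[1]))     # ties broken by position (assumption)
    return hits
```

The sense and antisense strands for each hit are then taken from window bases 3 to 21, as in the last steps of the algorithm.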
sirna outputs a report format file. The default format is 'table'.
RNA interference (RNAi) is a phenomenon whereby small interfering RNA (siRNA) strands inhibit the expression of specific genes at the level of transcription or translation. RNAi is a defence mechanism against viruses and is important in regulating development and genome maintenance. siRNAs are double-stranded RNA molecules in which one strand is strongly complementary to a target RNA strand. Once the siRNA binds to a target, a nuclease protein guided by the siRNA cleaves the target and renders it untranslatable.
Gene silencing using RNAi has been used to determine the function of many genes in Drosophila, C. elegans, and many plant species. Knockdown by siRNA typically lasts 7-10 days and has been shown to transfer to daughter cells. In addition, siRNAs are effective at much lower quantities than alternative gene-silencing methods, including antisense and ribozyme-based strategies.
Because of various antiviral responses to long dsRNA, RNAi at first proved more difficult to establish in mammalian species. Then Tuschl, Elbashir, and others discovered that RNAi can be elicited very effectively by well-defined 21-nucleotide RNA duplexes. When these small interfering RNAs (siRNAs) are added in duplex form with a transfection agent to mammalian cell cultures, the 21-nucleotide RNA duplex acts in concert with cellular components to silence the gene with sequence homology to one of the siRNA sequences. Strategies for the design of effective siRNA sequences have recently been documented, most notably by Sayda Elbashir, Thomas Tuschl, et al.
Their studies of mammalian RNAi suggest that the most efficient gene-silencing effect is achieved using double-stranded siRNA having a 19-nucleotide complementary region and a 2-nucleotide 3' overhang at each end. Current models of the RNAi mechanism suggest that the antisense siRNA strand recognizes the specific gene target.
In gene-specific RNAi, the coding region (CDS) of the mRNA is usually targeted, and the search for an appropriate target sequence should begin 50-100 nucleotides downstream of the start codon. To avoid interference from mRNA regulatory proteins, sequences in the 5' untranslated region (UTR) or near the start codon should not be targeted: UTR-binding proteins and/or translation initiation complexes may interfere with the binding of the siRNP endonuclease complex. However, Tuschl, Elbashir et al. report that they have successfully used siRNAs targeting the 3' UTR.
A set of rules for the design of siRNA has been suggested (http://www.mpibpc.gwdg.de/abteilungen/100/105/sirna.html) based on the work of Tuschl, Elbashir et al. They suggest searching for the 23-nt sequence motif AA(N19)TT (N, any nucleotide) and selecting hits with approximately 50% G/C content (30% to 70% has also worked for them). If no suitable sequences are found, the search is extended using the motif NA(N21). The sequence of the sense siRNA corresponds to (N19)TT or N21 (positions 3 to 23 of the 23-nt motif), respectively. In the latter case, they convert the 3' end of the sense siRNA to TT.
The rationale for this sequence conversion is to generate a symmetric duplex with respect to the sequence composition of the sense and antisense 3' overhangs. The antisense siRNA is synthesized as the complement to positions 1 to 21 of the 23-nt motif. Because position 1 of the 23-nt motif is not recognized sequence-specifically by the antisense siRNA, the 3'-most nucleotide residue of the antisense siRNA can be chosen deliberately. However, the penultimate nucleotide of the antisense siRNA (complementary to position 2 of the 23-nt motif) should always be complementary to the targeted sequence. To simplify chemical synthesis, they always use TT.
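The AA(N19)TT search with its NA(N21) fallback can be sketched as follows. This is an illustration only and is not how sirna itself scores windows: `tuschl_candidates` and `gc_fraction` are hypothetical names, the 30% to 70% G/C range is the one quoted above, and measuring G/C over the 19 bases shared by both strands (positions 3 to 21) rather than over the whole motif is an assumption.

```python
import re


def gc_fraction(s):
    """Fraction of G/C bases in s."""
    return sum(b in "GC" for b in s) / len(s)


def tuschl_candidates(mrna, lo=0.30, hi=0.70):
    """Return (start, 23-nt motif, sense siRNA) tuples, 0-based starts.

    AA(N19)TT is tried first; NA(N21) is used only if it yields nothing.
    """
    seq = mrna.upper().replace("T", "U")
    patterns = (r"(?=(AA[ACGU]{19}UU))",      # AA(N19)TT, overlapping matches
                r"(?=([ACGU]A[ACGU]{21}))")   # fallback: NA(N21)
    for pattern in patterns:
        hits = []
        for m in re.finditer(pattern, seq):
            motif = m.group(1)
            core = motif[2:21]                # motif positions 3 to 21 (N19)
            if lo <= gc_fraction(core) <= hi:
                # (N19)TT, or N21 with its 3' end converted to TT
                hits.append((m.start(), motif, core + "dTdT"))
        if hits:
            return hits
    return []
```

In both cases the sense siRNA reduces to positions 3 to 21 plus a TT overhang, which is why the sketch builds it the same way for either motif.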
More recently, they preferentially select siRNAs corresponding to the target motif NAR(N17)YNN, where R is purine (A, G) and Y is pyrimidine (C, U). The respective 21-nt sense and antisense siRNAs therefore begin with a purine nucleotide and can also be expressed from pol III expression vectors without a change in targeting site; expression of RNAs from pol III promoters is only efficient when the first transcribed nucleotide is a purine.
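A minimal sketch of the NAR(N17)YNN test, assuming RNA letters and 23-nt motifs; `is_pol3_target` and `sirna_strands` are hypothetical names. It also shows why both strands then begin with a purine: the sense core starts at motif position 3 (R), and the antisense core starts with the complement of position 21 (Y), which is always A or G.

```python
import re

# NAR(N17)YNN: N = any base, R = purine (A/G), Y = pyrimidine (C/U).
POL3_MOTIF = re.compile(r"[ACGU]A[AG][ACGU]{17}[CU][ACGU]{2}")
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def is_pol3_target(motif):
    """True if a 23-nt target region matches NAR(N17)YNN."""
    return POL3_MOTIF.fullmatch(motif.upper().replace("T", "U")) is not None


def sirna_strands(motif):
    """Return the 19-nt sense and antisense cores, 5' to 3', without overhangs."""
    core = motif.upper().replace("T", "U")[2:21]   # motif positions 3 to 21
    return core, core.translate(COMPLEMENT)[::-1]  # reverse complement


target = "AAGUGAGAGGUCAGACUCCUCUU"
sense, antisense = sirna_strands(target)
# sense begins with position 3 (G); antisense begins with the complement of position 21 (C -> G)
```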
They always design siRNAs with symmetric 3' TT overhangs, believing that symmetric 3' overhangs help to ensure that the siRNPs are formed with approximately equal ratios of sense and antisense target RNA-cleaving siRNPs. Note that modifying the overhang of the sense strand of the siRNA duplex is not expected to affect targeted mRNA recognition, because the antisense siRNA strand guides target recognition. In summary, no matter what you do to your overhangs, siRNAs should still function to a reasonable extent. However, using TT in the 3' overhang will always help your RNA synthesis company to tell you when you have accidentally ordered an siRNA sequence 3' to 5' rather than in the recommended 5' to 3' format. sirna reports both the sense and antisense siRNAs as 5' to 3'.
Xeragon.com also suggests that choosing a region of the mRNA with a GC content as close as possible to 50% is more important than choosing a target sequence that begins with AA. The site also recommends, as a key consideration in target selection, avoiding more than three guanosines in a row, since poly-G sequences can hyperstack and form agglomerates that may interfere with the siRNA silencing mechanism.
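The two Xeragon criteria could be combined into a simple ranking. In this hypothetical sketch, `xeragon_rank` drops any candidate containing four or more consecutive G's and orders the rest by how far their G/C content lies from 50%; treating these as the only two criteria is an assumption made for illustration.

```python
def gc_percent(s):
    """Percentage of G/C bases in s."""
    return 100.0 * sum(b in "GC" for b in s.upper()) / len(s)


def xeragon_rank(candidates):
    """Drop candidates with more than three G's in a row; sort the rest by |GC% - 50|."""
    kept = [c for c in candidates if "GGGG" not in c.upper()]
    return sorted(kept, key=lambda c: abs(gc_percent(c) - 50.0))


ranked = xeragon_rank(["GCGCAUAUAU", "GCGCGAUAUA", "GGGGAUAUAU", "GCGCGCGAUA"])
```

Here the 50% candidate comes first, followed by the 40% and 70% candidates, and the poly-G candidate is discarded.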
siRNAs appear to silence genes effectively in more than 80% of cases, but current data indicate that there are regions of some mRNAs where gene silencing does not work. To help ensure that a given target gene is silenced, it is advisable to choose at least two target sequences that lie as far apart on the gene as possible.
For example, take the 23-base target region:

    5' AAGUGAGAGGUCAGACUCCUATC

The sense siRNA is made from the 19 bases at positions 3 to 21 of the 23-base target region, so:

    5' GUGAGAGGUCAGACUCCUA

and then d(TT) is typically added, so:

    5' GUGAGAGGUCAGACUCCUAdTdT

The antisense siRNA is the complement of bases 3 to 21 of the target region:

    5' GUGAGAGGUCAGACUCCUA    sense      5' -> 3'
    3' CACUCUCCAGUCUGAGGAU    antisense  3' -> 5'

so the antisense sequence that should be ordered, with d(TT) added, is:

    5' UAGGAGUCUGACCUCUCACdTdT    antisense  5' -> 3'
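The worked example can be reproduced with a few lines of Python. `design_duplex` is a hypothetical helper, not an EMBOSS function; it assumes a 23-base target region and treats T and U alike.

```python
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def design_duplex(target):
    """Return (sense, antisense) siRNAs, both 5' to 3', from a 23-base target region."""
    rna = target.upper().replace("T", "U")
    core = rna[2:21]                                      # bases 3 to 21
    sense = core + "dTdT"
    antisense = core.translate(COMPLEMENT)[::-1] + "dTdT"  # reverse complement + dTdT
    return sense, antisense


sense, antisense = design_duplex("AAGUGAGAGGUCAGACUCCUATC")
print(sense)      # GUGAGAGGUCAGACUCCUAdTdT
print(antisense)  # UAGGAGUCUGACCUCUCACdTdT
```

The overhang is appended after complementing, so the deoxythymidines are never themselves translated.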