Selection of highly efficient small interference RNA (SiRNA) targeting mammalian genes
1. INTRODUCTION
RNA interference (RNAi) is the process through which double-stranded RNA (dsRNA) molecules inhibit gene expression by mediating sequence-specific messenger RNA (mRNA) degradation. The process starts when the RNase enzyme Dicer cleaves dsRNA molecules into short interfering RNA (siRNA) molecules, about 21 - 23 nucleotides in length. These siRNAs are then incorporated into a ribonucleoprotein complex named the RNA-induced silencing complex (RISC). The RISC helicase subunit unwinds the siRNA duplex into two single-stranded RNAs, namely the passenger strand (sense) and the guide strand (antisense). The passenger strand is degraded, while the guide strand remains in RISC and guides it to bind the complementary mRNA. The mRNA can be inhibited or degraded, depending on the degree of base-pair matching [1-3].
Multiple statistical and computational models have been proposed in recent years to design functional siRNA based on a large number of “features”. These features can be roughly classified into three categories: sequence features, features based on the thermodynamics of the siRNA, and features based on the target sites on the mRNA, including target location-related features [4].
In this work, statistical studies are carried out on a large data set (3734 siRNA) available on the internet: http://gesteland.genetics.utah.edu/members/olgaM/siRN A database September 2006.xls. In the primary statistical study, thirteen properties (positional and thermodynamic) are identified in the highly efficient siRNA group. In the final statistical study, the average weight of each identified property is computed. The advantages of this study are that average weights are obtained for the identified thermodynamic properties of siRNA and that it is based on a large data set.
2. MATERIALS AND METHODS
The database contains a large number of experimentally validated siRNAs targeting the mRNA of different mammalian genes [5-7]. It contains the sequences and the thermodynamic properties of the siRNAs, together with the experimentally determined percentage of mRNA remaining in cells.
2.1. Classification of the siRNA
The siRNAs are classified according to their percentage silencing efficiency (calculated from the percentage of mRNA remaining) into three groups: high-efficient siRNA group (experimental efficiency ≥ 80%), moderate-efficient group (efficiency < 80% and ≥ 50%), and low-efficient group (efficiency < 50%).
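The three-way grouping can be sketched as follows. This is a minimal illustration, not the authors' program: the function names are hypothetical, and the conversion from remaining mRNA to efficiency (100 minus the remaining percentage) is an assumption, since the text only says that efficiency is calculated from the remaining mRNA level.

```python
def silencing_efficiency(remaining_mrna_percent: float) -> float:
    """Assumed conversion: efficiency (%) = 100 - remaining mRNA level (%)."""
    return 100.0 - remaining_mrna_percent


def classify_sirna(efficiency_percent: float) -> str:
    """Assign an siRNA to a group using the thresholds stated in the text."""
    if efficiency_percent >= 80.0:
        return "high"       # efficiency >= 80%
    if efficiency_percent >= 50.0:
        return "moderate"   # 50% <= efficiency < 80%
    return "low"            # efficiency < 50%
```

For example, under the assumed conversion an siRNA that leaves 15% of the target mRNA has an efficiency of 85% and falls in the high-efficient group.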
2.2. Statistical Studies
Positional properties of the antisense (AS) strand of the siRNA, such as the presence of an A/U or G/C nucleotide at certain positions, are studied in each group. The thermodynamic properties studied in each group are the Gibbs free energy change ΔG of the sense-antisense duplex, the ΔG of the antisense oligo secondary structure, the ΔG of the antisense oligo-oligo dimer, and the binding energy difference dΔG, or (ΔG1 - ΔG18), between the 5’ end and the 3’ end of the AS strand. The average GC content in each group is also studied.
From these studies, the important positional and thermodynamic properties that are significantly present in the highly efficient group are deduced. The thirteen chosen properties are:
1) A/U at position 1 of AS strand.
2) A/U at position 2 of AS strand.
3) A/U at position 7 of AS strand.
4) A/U at position 10 of AS strand.
5) A/U at position 13 of AS strand.
6) A/U at position 14 of AS strand.
7) G/C at position 19 of AS strand.
8) G/C at position 18 of AS strand.
9) G/C content between 6 and 10 of the 19 nucleotides of AS strand.
10) ΔG of sense-antisense duplex (≥ −36.720 Kcal/mole).
11) ΔG of antisense oligo secondary structure (≥ −1.240 Kcal/mole).
12) ΔG antisense oligo-oligo dimer (≥ −8.414 Kcal/mole).
13) The binding energy difference between the 5’end and the 3’end of AS strand > 0 Kcal/mole.
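The thirteen criteria above can be checked mechanically for a single siRNA. The sketch below is illustrative only: the function name and argument names are hypothetical, positions are assumed to be counted from the 5’ end of a 19-nucleotide AS strand, and the four thermodynamic values are assumed to be supplied by the caller (for example, read from the database) rather than computed here.

```python
from typing import List

AU = set("AU")
GC = set("GC")


def property_flags(antisense: str, dg_duplex: float, dg_secondary: float,
                   dg_dimer: float, ddg: float) -> List[bool]:
    """Return 13 booleans, one per property, in the order listed in the text.

    Energies are in kcal/mol; `ddg` is the 5'-end minus 3'-end binding
    energy difference of the AS strand.
    """
    seq = antisense.upper().replace("T", "U")  # accept DNA-style input
    if len(seq) != 19:
        raise ValueError("expected a 19-nucleotide antisense strand")
    gc_count = sum(base in GC for base in seq)
    return [
        seq[0] in AU,            # 1) A/U at position 1
        seq[1] in AU,            # 2) A/U at position 2
        seq[6] in AU,            # 3) A/U at position 7
        seq[9] in AU,            # 4) A/U at position 10
        seq[12] in AU,           # 5) A/U at position 13
        seq[13] in AU,           # 6) A/U at position 14
        seq[18] in GC,           # 7) G/C at position 19
        seq[17] in GC,           # 8) G/C at position 18
        6 <= gc_count <= 10,     # 9) G/C content of 6 to 10 nucleotides
        dg_duplex >= -36.720,    # 10) duplex stability threshold
        dg_secondary >= -1.240,  # 11) secondary-structure threshold
        dg_dimer >= -8.414,      # 12) oligo-oligo dimer threshold
        ddg > 0.0,               # 13) positive 5'/3' energy difference
    ]
```

Returning the flags as an ordered list keeps each entry aligned with the property numbering, so the same pattern can later be used as a grouping key.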
2.3. Determining the Weight of Each Property
The 3743 siRNAs in the database are sorted according to which of the thirteen identified properties each one has. If several siRNAs share the same set of properties, their average efficiency is calculated. From these data, the weights of the thirteen properties are computed using a purpose-written computer program. The program obtains the best fit of the data to a second-order polynomial function. The best-fit weights minimizing the sum-squared error function were computed 200 times, each time after randomizing the order in which the properties were considered. The average weight of each property was then computed.
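The grouping-and-averaging step that precedes the fit can be sketched as below. This is a hypothetical helper, not the authors' program, and it covers only that step; the polynomial fit is left out because its exact form is not given here. Each record is assumed to be a pair of a property pattern (a tuple of booleans) and a measured efficiency in percent.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Pattern = Tuple[bool, ...]


def average_efficiency_by_pattern(
    records: Iterable[Tuple[Pattern, float]]
) -> Dict[Pattern, float]:
    """Average the efficiency of all siRNAs sharing the same property pattern."""
    totals = defaultdict(lambda: [0.0, 0])  # pattern -> [sum, count]
    for pattern, efficiency in records:
        entry = totals[tuple(pattern)]
        entry[0] += efficiency
        entry[1] += 1
    return {pattern: s / n for pattern, (s, n) in totals.items()}
```

Each distinct pattern then contributes a single data point (its mean efficiency) to whatever fitting procedure is applied afterwards.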
3. RESULTS AND DISCUSSION
Figure 1 shows the percentage of siRNAs in each of the three groups (high-efficient, moderate-efficient and low-efficient) that contain A/U at each of positions 1 to 19 of the AS strand, counted from the 5’-AS end. It can be seen from Figure 1 that a high percentage of the siRNAs in the high-efficient group have A/U at positions 1, 2, 7, 10, 13 and 14 of the AS strand, i.e. low internal stability at these positions. In contrast, the low-efficient group is enriched with siRNA molecules that have high internal stability (G/C) at these positions (see Figure 2). Khvorova et al. [8] found in their experimental study that the functional group of siRNAs is enriched with molecules that have low internal stability at the 5’-AS end and an overall low internal stability profile, particularly in positions 9 - 14 (counted from the 5’-AS end), in contrast to the non-functional group.
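A per-position A/U percentage of the kind plotted in Figure 1 could be computed for one group along the following lines. This is a minimal sketch under stated assumptions: the function name is hypothetical, each input is taken to be an AS strand written 5’ to 3’, and only the first 19 nucleotides are counted.

```python
from typing import Iterable, List


def au_percent_by_position(antisense_seqs: Iterable[str],
                           length: int = 19) -> List[float]:
    """Percentage of sequences with A or U at each position (index 0 = position 1)."""
    counts = [0] * length
    total = 0
    for seq in antisense_seqs:
        s = seq.upper().replace("T", "U")  # accept DNA-style input
        if len(s) < length:
            raise ValueError("sequence shorter than the requested length")
        for i in range(length):
            if s[i] in "AU":
                counts[i] += 1
        total += 1
    if total == 0:
        raise ValueError("no sequences supplied")
    return [100.0 * c / total for c in counts]
```

Running the function once per efficiency group and subtracting each value from 100 gives the corresponding G/C percentages at the same positions.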