VIRsiRNApred: a web server for predicting inhibition efficacy of siRNAs targeting human viruses
Background:
Datasets:
VIRsiRNAdb:
19mer
$V^{345}$: 345 sequences for validation
$T^{1380}$: 1380 sequences for training
Method:
SVM, ANN, KNN, REP Tree
siRNA sequence features:
Nucleotide Frequencies: the number of occurrences of each nucleotide in an siRNA. Calculating nucleotide frequencies transforms a nucleotide sequence of any length into a fixed-length feature vector.
Binary Pattern: A: 1000, G: 0010, C: 0100, U: 0001
Thermodynamic Properties: correspond to the Gibbs free energy stability of the nucleotide pairs of siRNAs
Secondary Structure :
Hybrid Approaches:
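The nucleotide-frequency and binary-pattern encodings listed above can be sketched in a few lines of Python. This is a minimal illustration, not the paper's code: the function names, the example 19mer, and the choice of k = 1 and k = 2 are assumptions, and concatenating the vectors at the end is only one plausible reading of a hybrid approach.

```python
from itertools import product

NUCLEOTIDES = "ACGU"

# Binary pattern as given in the notes: A=1000, G=0010, C=0100, U=0001
BINARY = {"A": [1, 0, 0, 0], "C": [0, 1, 0, 0],
          "G": [0, 0, 1, 0], "U": [0, 0, 0, 1]}


def kmer_frequencies(seq, k=1):
    """Count every k-mer over ACGU; the output has 4**k entries for any len(seq)."""
    seq = seq.upper().replace("T", "U")
    counts = {"".join(p): 0 for p in product(NUCLEOTIDES, repeat=k)}
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:  # skip k-mers containing unexpected symbols
            counts[kmer] += 1
    return list(counts.values())


def binary_pattern(seq):
    """Encode each position as 4 bits, giving 4 * len(seq) features."""
    seq = seq.upper().replace("T", "U")
    vec = []
    for base in seq:
        vec.extend(BINARY[base])
    return vec


sirna = "GCAUGCUAGCUAGGCUAAU"       # hypothetical 19mer, for illustration only
mono = kmer_frequencies(sirna, 1)   # 4 features, order A, C, G, U
di = kmer_frequencies(sirna, 2)     # 16 features
binary = binary_pattern(sirna)      # 76 features for a 19mer
hybrid = mono + di + binary         # assumed hybrid: simple concatenation
```

Note that the frequency vectors have a fixed length whatever the sequence length, whereas the binary pattern grows with the sequence, which is why it fits fixed-length 19mers.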
Leave-one-out cross-validation (LOOCV):
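As a rough sketch of the procedure: each sequence is held out once, the model is trained on the remaining sequences, and the held-out efficacy is predicted. The 1-nearest-neighbour predictor and the toy numbers below are stand-ins chosen to keep the example self-contained; they are assumptions, not the models or data from the paper.

```python
def predict_1nn(train_x, train_y, query):
    """Stand-in model: return the target of the closest training vector."""
    best = min(
        range(len(train_x)),
        key=lambda j: sum((a - b) ** 2 for a, b in zip(train_x[j], query)),
    )
    return train_y[best]


def loocv(features, targets, predict):
    """Hold out each sample once, fit on the rest, collect the predictions."""
    predictions = []
    for i in range(len(features)):
        train_x = features[:i] + features[i + 1:]
        train_y = targets[:i] + targets[i + 1:]
        predictions.append(predict(train_x, train_y, features[i]))
    return predictions


# Toy data, made up for illustration (not from the paper)
X = [[0.0], [1.0], [3.0], [4.0]]
y = [10.0, 20.0, 40.0, 30.0]
preds = loocv(X, y, predict_1nn)
```

The list `preds` holds one held-out prediction per sample, so it can be compared directly against `y` with any performance measure.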
Viral siRNA target conservation: uses the ALIGN0 algorithm
Algorithm and server implementation SVM:
We used the radial basis function (RBF) kernel:
$$k(\bar{x},\bar{y}) = \exp{(-\gamma ||\bar{x} - \bar{y}||^2)}$$
where $\bar{x}$ and $\bar{y}$ are two data vectors and $\gamma$ is a training parameter.
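The kernel can be evaluated directly from its definition. The function below is a minimal sketch; the name `rbf_kernel` and the example value of gamma are illustrative assumptions, not values reported in the paper.

```python
import math


def rbf_kernel(x, y, gamma):
    """k(x, y) = exp(-gamma * ||x - y||^2) for two equal-length vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


same = rbf_kernel([1.0, 2.0], [1.0, 2.0], gamma=0.5)   # identical vectors
apart = rbf_kernel([0.0, 0.0], [1.0, 1.0], gamma=0.5)  # squared distance 2
```

Identical vectors give a kernel value of 1, and the value decays toward 0 as the vectors move apart; a larger gamma makes that decay faster.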
$n$ is the size of the test set
Pearson's correlation coefficient (R):
$E_i^{pred}$ and $E_i^{act}$ are the predicted and actual efficacy, respectively
$$R = \frac{n\sum_{i=1}^n E_i^{act} E_i^{pred} - \sum_{i=1}^n E_i^{act} \sum_{i=1}^n E_i^{pred}}{\sqrt{n \sum_{i=1}^n (E_i^{act})^2 - \left(\sum_{i=1}^n E_i^{act}\right)^2} \sqrt{n \sum_{i=1}^n (E_i^{pred})^2 - \left(\sum_{i=1}^n E_i^{pred}\right)^2}}$$
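The formula for R translates term by term into code. The sketch below assumes two equal-length lists of actual and predicted efficacies; the function name and the sample values are hypothetical.

```python
import math


def pearson_r(actual, predicted):
    """Pearson's R between actual and predicted efficacies (sum-based form)."""
    n = len(actual)
    if n != len(predicted) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    sum_a = sum(actual)
    sum_p = sum(predicted)
    sum_ap = sum(a * p for a, p in zip(actual, predicted))
    sum_aa = sum(a * a for a in actual)
    sum_pp = sum(p * p for p in predicted)
    numerator = n * sum_ap - sum_a * sum_p
    denominator = (math.sqrt(n * sum_aa - sum_a ** 2)
                   * math.sqrt(n * sum_pp - sum_p ** 2))
    if denominator == 0:
        raise ValueError("R is undefined when either list has zero variance")
    return numerator / denominator


# Hypothetical efficacy values, for illustration only
r_up = pearson_r([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])    # perfectly correlated
r_down = pearson_r([1.0, 2.0, 3.0], [3.0, 2.0, 1.0])  # perfectly anti-correlated
```

R ranges from -1 to 1, and values closer to 1 indicate that predicted efficacies track the experimentally measured ones more closely.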
using heterogeneous siRNA dataset and using mammalian homogeneous siRNA dataset
Conclusion:
The SVM model performs better than existing siRNA prediction algorithms.
VIRsiRNApred is the first viral siRNA efficacy prediction algorithm developed on experimentally verified viral siRNAs targeting as many as 37 diverse human viruses. Existing general mammalian siRNA prediction methods are not able to effectively predict viral siRNA activity.