Prediction of siRNA functionality using generalized string kernel and support vector machine

Abstract:

GSK+SVM

GSK: generalized string kernel

Result: effective and ineffective siRNAs are classified using GSK + SVM

Introduction

Design rules used before this paper:

(1) siRNAs are 21-nt duplexes with a 2-nt 3′ overhang at each end of the duplex

(2) the target sequence should begin 50 to 100 nt downstream of the start codon of the mRNA

(3) the GC content should be about 50%
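Rules (2) and (3) can be checked mechanically for a candidate target site. The sketch below is a minimal illustration, not part of the paper. It assumes the mRNA is an RNA string and the index of the A of the start codon is known. It takes a 19-nt target, which is the paired region of a 21-nt duplex with 2-nt overhangs. It also uses a hypothetical ±10% tolerance around 50% GC. The position rule is read literally, as "the target starts 50 to 100 nt after the start codon". The function names `gc_content` and `passes_basic_rules` are made up for this example.

```python
def gc_content(seq: str) -> float:
    """Fraction of G and C bases in seq."""
    seq = seq.upper()
    return sum(base in "GC" for base in seq) / len(seq)


def passes_basic_rules(mrna: str, start_codon: int, site: int,
                       length: int = 19, gc_tolerance: float = 0.1) -> bool:
    """Check a candidate target site against rules (2) and (3).

    mrna: mRNA sequence (RNA alphabet); start_codon: index of the A of AUG.
    site: index where the candidate target begins.
    length: 19 nt, the paired region of a 21-nt duplex with 2-nt overhangs.
    gc_tolerance: hypothetical allowed deviation from 50% GC.
    """
    offset = site - start_codon
    if not 50 <= offset <= 100:          # rule (2): begin 50-100 nt downstream
        return False
    target = mrna[site:site + length]
    if len(target) < length:             # target would run off the mRNA end
        return False
    return abs(gc_content(target) - 0.5) <= gc_tolerance  # rule (3)


# Toy mRNA: AUG, 57 A's, a 19-nt GC-balanced motif at index 60, then 40 A's.
mrna = "AUG" + "A" * 57 + "GCGCGCGCGCAUAUAUAUA" + "A" * 40
print(passes_basic_rules(mrna, 0, 60))   # motif site: position and GC both fine
print(passes_basic_rules(mrna, 0, 80))   # poly-A site: GC content 0%, rejected
```

These hand-written rules are the kind of prior knowledge that the GSK/SVM approach tries to do without.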

Materials and methods

DATASET:

Dataset: Khvorova’s dataset, 94 siRNAs in total (53 effective, 41 ineffective)

effective: 90% or more gene-silencing activity

ineffective: less than 50% gene-silencing activity
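The two definitions amount to a labelling rule for the binary classifier. This is a minimal sketch. The encoding (+1 for effective, -1 for ineffective) is an assumption, and so is the function name `label_sirna`. Under the definitions as stated, an siRNA with activity between 50% and 90% falls into neither class, so the sketch returns `None` for it.

```python
from typing import Optional


def label_sirna(silencing_percent: float) -> Optional[int]:
    """Map gene-silencing activity (%) to a class label.

    +1   = effective   (>= 90% silencing)
    -1   = ineffective (< 50% silencing)
    None = between the two thresholds, so in neither class
    """
    if silencing_percent >= 90:
        return 1
    if silencing_percent < 50:
        return -1
    return None


activities = [95.0, 30.5, 72.0]          # hypothetical activity values
labels = [label_sirna(a) for a in activities]
print(labels)                            # [1, -1, None]
```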

FEATURE map for siRNAs

GSK is based on the mismatch string kernel (MSK) and the spectrum kernel (the spectrum kernel is the special case m = 0 of the MSK)

k: length of the subsequences (k-mers) that are compared

m: maximum number of mismatches allowed between two k-mers

MSK:

$$K_{(k,m)}(x,y) = \langle \Phi_{(k,m)}(x), \Phi_{(k,m)}(y) \rangle$$

where $\Phi_{(k,m)}(x)$ has one coordinate for each k-mer $\alpha$. That coordinate equals the number of k-mers in $x$ that differ from $\alpha$ in at most $m$ positions.

Normalized kernel:

$$\hat{K}_{(k,m)}(x,y) = \frac{K_{(k,m)}(x,y)}{\sqrt{K_{(k,m)}(x,x)}\,\sqrt{K_{(k,m)}(y,y)}}$$
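The definitions above can be computed directly for short sequences such as siRNAs. This is a minimal sketch that builds the explicit (k, m) feature map. It assumes the alphabet is ACGU and enumerates all 4^k k-mers, which is only practical for small k. Efficient implementations use mismatch trees instead. The function names are hypothetical, and this is not the authors' code.

```python
from itertools import product
from math import sqrt

ALPHABET = "ACGU"  # assumption: siRNAs written in the RNA alphabet


def mismatch_features(seq: str, k: int, m: int) -> dict:
    """Phi_(k,m)(seq): for each k-mer alpha, count the k-mers of seq
    that differ from alpha in at most m positions (zero entries dropped)."""
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    phi = {}
    for alpha in map("".join, product(ALPHABET, repeat=k)):
        count = sum(
            sum(a != b for a, b in zip(alpha, kmer)) <= m for kmer in kmers
        )
        if count:
            phi[alpha] = count
    return phi


def mismatch_kernel(x: str, y: str, k: int, m: int) -> int:
    """K_(k,m)(x, y) = <Phi_(k,m)(x), Phi_(k,m)(y)>."""
    px, py = mismatch_features(x, k, m), mismatch_features(y, k, m)
    return sum(v * py.get(alpha, 0) for alpha, v in px.items())


def normalized_mismatch_kernel(x: str, y: str, k: int, m: int) -> float:
    """K(x, y) / (sqrt(K(x, x)) * sqrt(K(y, y))); equals 1 when x == y."""
    return mismatch_kernel(x, y, k, m) / (
        sqrt(mismatch_kernel(x, x, k, m)) * sqrt(mismatch_kernel(y, y, k, m)))
```

With m = 0 this reduces to the spectrum kernel. For example, `mismatch_kernel("ACGU", "ACGU", 2, 0)` counts the shared 2-mers AC, CG and GU, giving 3. Normalization removes the effect of sequence length, so siRNAs of different lengths become comparable.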

GSK is the sum of the $(k_i, m_i)$-mismatch kernels, $i = 1, \dots, s$:

$$K_{k_1,m_1,\dots,k_s,m_s}(x,y) = \sum_{i=1}^{s} \langle \Phi_{(k_i,m_i)}(x), \Phi_{(k_i,m_i)}(y) \rangle = \sum_{i=1}^{s} K_{(k_i,m_i)}(x,y)$$
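A sum of inner products equals a single inner product of the concatenated feature vectors. So GSK is itself a kernel, with feature map $(\Phi_{(k_1,m_1)}(x), \dots, \Phi_{(k_s,m_s)}(x))$. The sketch below checks this identity numerically. The parameter list `[(1, 0), (2, 1), (3, 1)]` is hypothetical, not the combination used in the paper, and the function names are made up.

```python
from itertools import product

ALPHABET = "ACGU"  # assumption: siRNAs written in the RNA alphabet


def mismatch_features(seq, k, m):
    """Phi_(k,m)(seq) as a dict: k-mer alpha -> k-mers of seq within m mismatches."""
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    return {
        alpha: sum(sum(a != b for a, b in zip(alpha, s)) <= m for s in kmers)
        for alpha in map("".join, product(ALPHABET, repeat=k))
    }


def dot(p, q):
    """Inner product of two sparse vectors stored as dicts."""
    return sum(v * q.get(key, 0) for key, v in p.items())


def gsk(x, y, params):
    """GSK(x, y) = sum_i K_(k_i, m_i)(x, y) for params = [(k_1, m_1), ...]."""
    return sum(dot(mismatch_features(x, k, m), mismatch_features(y, k, m))
               for k, m in params)


def gsk_features(seq, params):
    """Concatenated feature map; keys are tagged with (k, m) to keep blocks apart."""
    return {(k, m, alpha): v
            for k, m in params
            for alpha, v in mismatch_features(seq, k, m).items()}


params = [(1, 0), (2, 1), (3, 1)]        # hypothetical (k_i, m_i) choices
x = "ACGUACGUACGUACGUACG"
y = "GGCAUUCGAUCGAAUCGUA"
print(gsk(x, y, params) == dot(gsk_features(x, params), gsk_features(y, params)))
```

This equivalence is also why a linear SVM trained on the concatenated feature vectors behaves like an SVM that uses GSK as its kernel.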

SVM implementation:

linear kernel and soft margin
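The notes only say that the SVM uses a linear kernel with a soft margin, and they do not name a solver. The sketch below swaps in Pegasos, a stochastic sub-gradient method for the soft-margin objective $\frac{\lambda}{2}\lVert w\rVert^2 + \frac{1}{n}\sum_i \max(0, 1 - y_i\, w \cdot x_i)$. Here a smaller $\lambda$ plays the role of a larger C. The values of `lam` and `epochs`, the toy data, and the function names are all hypothetical. In practice, the inputs would be GSK feature vectors.

```python
import random


def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Soft-margin linear SVM trained with Pegasos (stochastic sub-gradient).

    X: list of feature vectors; y: labels in {+1, -1}.
    A constant 1.0 feature is appended so the last weight acts as a bias
    (it is regularized too, a common simplification).
    """
    rng = random.Random(seed)
    data = [(list(x) + [1.0], label) for x, label in zip(X, y)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, label in data:
            t += 1
            eta = 1.0 / (lam * t)                       # step size 1/(lambda t)
            margin = label * sum(wi * xi for wi, xi in zip(w, x))
            w = [(1 - eta * lam) * wi for wi in w]      # shrink: regularizer
            if margin < 1:                              # hinge loss is active
                w = [wi + eta * label * xi for wi, xi in zip(w, x)]
    return w


def predict(w, x):
    """Sign of the linear decision function (bias feature appended)."""
    score = sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1


# Hypothetical, linearly separable toy data.
X = [[2, 2], [3, 3], [2, 3], [-2, -2], [-3, -3], [-2, -3]]
y = [1, 1, 1, -1, -1, -1]
w = train_linear_svm(X, y)
print([predict(w, x) for x in X])
```

The soft margin lets some training points fall inside the margin or on the wrong side. This matters for noisy labels such as measured silencing activity.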

RESULT

Subsequence → weight (the weight the trained SVM assigns to each subsequence)
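Because the feature map is explicit, the SVM weight vector has one entry per subsequence. In the dual form, $w = \sum_i \alpha_i y_i \Phi(x_i)$. The decision value is $f(x) = \sum_s w_s \Phi_s(x) + b$, so a positive $w_s$ pushes a sequence toward "effective" (+1). The sketch below computes these weights. For brevity it assumes spectrum features (m = 0), and it uses hypothetical dual coefficients $\alpha_i$ in place of those of a trained model. The function names are made up.

```python
from collections import defaultdict


def primal_weights(sequences, labels, alphas, k):
    """w_s = sum_i alpha_i * y_i * Phi_s(x_i), with spectrum features (m = 0)."""
    w = defaultdict(float)
    for seq, y, alpha in zip(sequences, labels, alphas):
        for i in range(len(seq) - k + 1):
            w[seq[i:i + k]] += alpha * y
    return dict(w)


def rank_subsequences(w, n=3):
    """Return the n most positive and the n most negative subsequence weights."""
    ranked = sorted(w.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:n], ranked[::-1][:n]


# Hypothetical support vectors, labels and dual coefficients.
w = primal_weights(["AAG", "CCG"], [1, -1], [0.5, 0.5], k=2)
print(w)  # {'AA': 0.5, 'AG': 0.5, 'CC': -0.5, 'CG': -0.5}
top, bottom = rank_subsequences(w, n=1)
```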

TP, TN, FP, FN, Acc (true positives, true negatives, false positives, false negatives, accuracy)
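Accuracy follows from the four counts as $\mathrm{Acc} = \frac{TP + TN}{TP + TN + FP + FN}$. The sketch below computes the counts and the accuracy. It assumes labels +1 (effective) and -1 (ineffective). The predictions are toy values, not results from the paper.

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, TN, FP, FN) for labels in {+1, -1}."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t == -1 and p == -1 for t, p in pairs)
    fp = sum(t == -1 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == -1 for t, p in pairs)
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    """Fraction of correctly classified siRNAs."""
    return (tp + tn) / (tp + tn + fp + fn)


counts = confusion_counts([1, 1, -1, -1, 1], [1, -1, -1, 1, 1])  # toy values
print(counts, accuracy(*counts))  # (2, 1, 1, 1) 0.6
```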

Leave-one-out cross-validation (LOOCV) of the GSK/SVM algorithm
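In LOOCV, each of the n siRNAs is held out once. The model is trained on the remaining n - 1 siRNAs and then predicts the held-out one. The n predictions together give TP, TN, FP and FN. The sketch below shows this procedure with any `train`/`predict` pair. For a runnable example it plugs in a hypothetical nearest-class-mean classifier on GC content as a stand-in for GSK/SVM. That stand-in is not the paper's model, and the toy sequences are made up.

```python
def loocv(X, y, train, predict):
    """Leave-one-out cross-validation; returns (TP, TN, FP, FN)."""
    tp = tn = fp = fn = 0
    for i in range(len(X)):
        model = train(X[:i] + X[i + 1:], y[:i] + y[i + 1:])  # hold out item i
        pred, true = predict(model, X[i]), y[i]
        tp += true == 1 and pred == 1
        tn += true == -1 and pred == -1
        fp += true == -1 and pred == 1
        fn += true == 1 and pred == -1
    return tp, tn, fp, fn


def gc(seq):
    return sum(b in "GC" for b in seq) / len(seq)


def train_gc(X_train, y_train):
    """Hypothetical stand-in model: mean GC content of each class."""
    def mean_gc(label):
        vals = [gc(s) for s, t in zip(X_train, y_train) if t == label]
        return sum(vals) / len(vals)
    return mean_gc(1), mean_gc(-1)


def predict_gc(model, seq):
    pos_mean, neg_mean = model
    g = gc(seq)
    return 1 if abs(g - pos_mean) <= abs(g - neg_mean) else -1


X = ["GGGC", "GCGC", "CCGG", "AAAU", "AUAU", "UUAA"]   # toy sequences
y = [1, 1, 1, -1, -1, -1]
print(loocv(X, y, train_gc, predict_gc))  # (3, 3, 0, 0)
```

LOOCV uses every siRNA for testing exactly once. This suits a dataset as small as 94 sequences.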

Validation of the predictive performance of the GSK/SVM algorithm on siRNAs against other genes

Discussion

Advantages: the method needs no prior knowledge of design rules, it determines the contribution of each feature (subsequence) to siRNA efficacy, and it can be applied to siRNAs shorter or longer than 21 nt.
Disadvantage: the model cannot be used to deduce the sequences of other effective siRNAs.