SVM in R

u013524655

Reposted from: http://www.klshu.com/1667.html

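In this article you will learn:
1. How to use SVMs in R through the kernlab package
2. How changing the C parameter and the kernel affects the result
3. How to use SVM classification for cancer diagnosis from gene expression data

I. Linear SVM

Here we generate two-dimensional toy data and learn how to train and test an SVM.

1.1 Generating toy data

First, generate positive and negative examples from two Gaussians.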

n <- 150 # number of data points
p <- 2 # dimension

sigma <- 1 # standard deviation of the distribution (passed to rnorm as sd)
meanpos <- 0 # centre of the distribution of positive examples
meanneg <- 3 # center of the distribution of negative examples
npos <- round(n/2) # number of positive examples
nneg <- n-npos # number of negative examples

# Generate the positive and negative examples
xpos <- matrix(rnorm(npos*p,mean=meanpos,sd=sigma),npos,p)
xneg <- matrix(rnorm(nneg*p,mean=meanneg,sd=sigma),nneg,p) # nneg rows, not npos
x <- rbind(xpos,xneg)

# Generate the labels
y <- matrix(c(rep(1,npos),rep(-1,nneg)))

#Visualize the data
plot(x,col=ifelse(y>0,1,2))
legend("topleft",c('Positive','Negative'),col=seq(2),pch=1,text.col=seq(2))

Next, split the data into an 80% training set and a 20% test set.

##Prepare a training and a test set##
ntrain <- round(n*0.8) # number of training examples
tindex <- sample(n,ntrain) # indices of training samples
xtrain<-x[tindex,]
xtest<-x[-tindex,]
ytrain<-y[tindex]
ytest<-y[-tindex]
istrain=rep(0,n)
istrain[tindex]=1

#Visualize
plot(x,col=ifelse(y>0,1,2),pch=ifelse(istrain==1,1,2))
legend("topleft",c('Positive Train','Positive Test','Negative Train','Negative Test'), col=c(1,1,2,2),pch=c(1,2,1,2),text.col=c(1,1,2,2))

1.2 Training the SVM

Now we train a linear SVM on the training set with parameter C=100.

#load the kernlab package
library(kernlab)

# Train the SVM
svp<-ksvm(xtrain,ytrain,type="C-svc",kernel='vanilladot',C=100,scaled=c())

Let's look at what svp contains.

# General summary
svp

#Attributes that you can access
attributes(svp)

# For example, the support vector coefficients, their indices, and the offset b
alpha(svp)
alphaindex(svp)
b(svp)

#Use the built-in function to pretty-plot the classifier
plot(svp,data=xtrain)
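
With a linear kernel, the fitted model can also be written as an explicit hyperplane. The sketch below regenerates a small toy data set so that it runs on its own, rebuilds the weight vector from the support vectors, and compares the result with kernlab's own decision values. It assumes the kernlab convention that the decision value is the kernel expansion minus b(svp); the names w, f_manual and f_kernlab are illustrative and not part of the original tutorial.

```r
library(kernlab)
set.seed(1)

# Toy data: two Gaussian clouds, similar to section 1.1
n <- 100
x <- rbind(matrix(rnorm(n, mean = 0), n / 2, 2),
           matrix(rnorm(n, mean = 3), n / 2, 2))
y <- c(rep(1, n / 2), rep(-1, n / 2))

svp <- ksvm(x, y, type = "C-svc", kernel = "vanilladot", C = 100, scaled = c())

# coef() holds alpha_i * y_i for each support vector;
# alphaindex() gives the rows of x that are support vectors
sv <- x[alphaindex(svp)[[1]], , drop = FALSE]
w <- colSums(coef(svp)[[1]] * sv)

# Manual decision values, assuming f(x) = <w, x> - b
f_manual <- as.vector(x %*% w - b(svp))
f_kernlab <- as.vector(predict(svp, x, type = "decision"))
max_diff <- max(abs(f_manual - f_kernlab))
print(max_diff)
```

If the two sets of scores agree, the decision boundary can be drawn on a scatter plot of the data with abline(b(svp)/w[2], -w[1]/w[2]).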

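1.3 Predicting with the SVM

Now we can use the trained SVM to predict the labels of the points in the test set, and then analyze the results with several performance measures.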

#Predict labels on test
ypred = predict(svp,xtest)
table(ytest,ypred)

#Compute accuracy
sum(ypred==ytest)/length(ytest)

# Compute the prediction scores
ypredscore = predict(svp,xtest,type="decision")

#Check that the predicted labels are the signs of the scores
table(ypredscore>0,ypred)

#Package to compute ROC curve,precision-recall etc...
library(ROCR)

pred<-prediction(ypredscore,ytest)

#Plot ROC curve
perf<-performance(pred,measure="tpr",x.measure="fpr")
plot(perf)

#Plot precision/recall curve
perf<-performance(pred,measure="prec",x.measure="rec")
plot(perf)

#Plot accuracy as function of threshold
perf<-performance(pred,measure="acc")
plot(perf)
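
To make the quantities behind these curves concrete, here is a minimal base-R sketch that computes accuracy, precision and recall at the single threshold 0. The labels and scores are made up purely for illustration (they are not output from the SVM above), and the *_demo names are hypothetical.

```r
# Made-up labels and decision scores, purely for illustration
ytest_demo <- c(1, 1, 1, -1, -1, -1, 1, -1)
score_demo <- c(2.1, 0.4, -0.3, -1.5, 0.2, -0.8, 1.0, -2.2)

# The predicted label is the sign of the score (threshold 0)
ypred_demo <- ifelse(score_demo > 0, 1, -1)

# Entries of the confusion matrix
tp <- sum(ypred_demo == 1 & ytest_demo == 1)
fp <- sum(ypred_demo == 1 & ytest_demo == -1)
fn <- sum(ypred_demo == -1 & ytest_demo == 1)
tn <- sum(ypred_demo == -1 & ytest_demo == -1)

accuracy <- (tp + tn) / length(ytest_demo)
precision <- tp / (tp + fp)
recall <- tp / (tp + fn)
print(c(accuracy = accuracy, precision = precision, recall = recall))
```

ROCR repeats this kind of computation for every possible threshold on the scores, which is what produces the curves plotted above.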

1.4 Cross-validation

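# Randomly split the n samples into folds
cv.folds <- function(n, folds = 3) {
  split(sample(n), rep(1:folds, length = n))
}

# kernlab can also run k-fold cross-validation itself through the cross argument
svp <- ksvm(x, y, type = "C-svc", kernel = 'vanilladot', C = 1, scaled = c(), cross = 5)
# Print the 5-fold cross-validation error
print(cross(svp))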
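
The cv.folds helper can also drive a hand-written cross-validation loop. The following is a minimal sketch under stated assumptions: cv.ksvm is a hypothetical helper name, the kernel is fixed to the linear one, and the toy data are regenerated so that the block runs on its own.

```r
library(kernlab)
set.seed(1)

# Toy data: two Gaussian clouds
n <- 100
x <- rbind(matrix(rnorm(n, mean = 0), n / 2, 2),
           matrix(rnorm(n, mean = 3), n / 2, 2))
y <- c(rep(1, n / 2), rep(-1, n / 2))

# Randomly split the indices 1..n into folds
cv.folds <- function(n, folds = 3) {
  split(sample(n), rep(1:folds, length = n))
}

# Hypothetical helper: mean held-out error of a linear SVM over the folds
cv.ksvm <- function(x, y, folds = 3, C = 1) {
  errs <- sapply(cv.folds(nrow(x), folds), function(idx) {
    fit <- ksvm(x[-idx, ], y[-idx], type = "C-svc",
                kernel = "vanilladot", C = C, scaled = c())
    mean(predict(fit, x[idx, ]) != y[idx])
  })
  mean(errs)
}

cv_error <- cv.ksvm(x, y, folds = 5, C = 1)
print(cv_error)
```

Because the folds are drawn at random, the estimate changes from run to run unless the seed is fixed.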

1.5 Effect of the parameter C

C balances a large margin against misclassified points.
Choosing it well is very important.
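
One way to see this trade-off is to fit the same linear SVM over a grid of C values and compare the number of support vectors, the training error and the cross-validation error. This is only a sketch: the grid cost_values and the overlapping toy data (class means 0 and 2) are assumptions chosen for illustration.

```r
library(kernlab)
set.seed(1)

# Toy data with some overlap between the two classes
n <- 100
x <- rbind(matrix(rnorm(n, mean = 0), n / 2, 2),
           matrix(rnorm(n, mean = 2), n / 2, 2))
y <- c(rep(1, n / 2), rep(-1, n / 2))

# Hypothetical grid of C values, from very soft to very hard margins
cost_values <- 10^(-3:2)

results <- t(sapply(cost_values, function(C) {
  fit <- ksvm(x, y, type = "C-svc", kernel = "vanilladot",
              C = C, scaled = c(), cross = 5)
  c(C = C, n_sv = nSV(fit), train_error = error(fit), cv_error = cross(fit))
}))
print(results)
```

A small C typically gives a wide margin with many support vectors, while a large C penalizes margin violations more heavily and usually keeps fewer of them. A common way to pick C is to take the value with the lowest cross-validation error.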

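II. Nonlinear SVM

Sometimes a linear SVM is not enough, for example when the positive and negative examples of the toy data are mixed together and cannot be separated by a line.
The data in the figure below, for instance, cannot be classified with a linear SVM.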
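
The source does not say how such data were produced. One possible construction, assuming an XOR-like layout of four Gaussian clusters in which opposite corners share a label, is sketched below; the centres and the spread are illustrative choices.

```r
set.seed(1)
n <- 200

# Four Gaussian clusters centred at the corners of a square (assumed layout)
centres <- rbind(c(0, 0), c(3, 3), c(0, 3), c(3, 0))
cluster <- rep(1:4, each = n / 4)
x <- centres[cluster, ] + matrix(rnorm(2 * n, sd = 0.7), n, 2)

# Opposite corners share a label, so no single line separates the classes
y <- ifelse(cluster <= 2, 1, -1)

# Visualize the data
plot(x, col = ifelse(y > 0, 1, 2))
legend("topleft", c('Positive', 'Negative'), col = seq(2), pch = 1, text.col = seq(2))
```

The resulting x and y have the same shape as the toy data of section 1.1, so they can be passed to ksvm in the same way.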

To solve this problem, we use a nonlinear SVM. All we change is the kernel argument, for example to a Gaussian RBF kernel with σ=1 and C=1.

# Train a nonlinear SVM with the Gaussian RBF kernel
svp <- ksvm(x,y,type="C-svc",kernel='rbfdot',kpar=list(sigma=1),C=1)

# Visualize it
plot(svp,data=x)
