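Applied Predictive Modeling, Chapter 4: Credit Scoring Model Exercise in R [Comparing resampling methods: repeated k-fold cross-validation, k-fold cross-validation, leave-one-out cross-validation, repeated training/test splits, and the bootstrap; comparing a support vector machine with logistic regression]

Resampling methods compared: repeated k-fold cross-validation, k-fold cross-validation, leave-one-out cross-validation (LOOCV), repeated training/test splits (also called leave-group-out or Monte Carlo cross-validation), the bootstrap, and the 632 method (a modification of the bootstrap intended to reduce the bias of its estimate).

Models compared: support vector machine and logistic regression.

Source: Applied Predictive Modeling (2013) by Max Kuhn and Kjell Johnson; Chinese translation by 林荟 et al.

Language: R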


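The German credit data set is a benchmark for evaluating machine learning algorithms. It contains 1000 samples, each labeled as having good or bad credit; 70% have good credit. The baseline accuracy a model must beat is therefore 70%, which is what you get by classifying every applicant as good credit. Besides the credit class, the data include variables related to credit history, such as employment and account status. Some predictors are numeric, such as the loan amount, but most are categorical, such as loan purpose, sex, and marital status. The categorical predictors have been converted to dummy variables, one per category. For example, an applicant's housing is recorded as "rent", "own", or "for free", and each of these categories becomes its own 0/1 variable: the "rent" variable is 1 if the applicant rents and 0 otherwise. In the end, 41 predictors are used to model credit status. This data set is used here to demonstrate model tuning by resampling.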
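The 70% baseline can be checked with a few lines of base R. The sketch below builds a made-up label vector with the same 700/300 split (it does not load the real data) and scores an "everyone is Good" classifier. It also computes Cohen's Kappa, the second metric caret reports in the outputs further down, to show why this baseline gets a Kappa of 0. The variable names are illustrative only.

```r
# Synthetic labels with the same class balance as the data set (assumption:
# 300 Bad, 700 Good; the real rows are not used here).
truth <- factor(rep(c("Bad", "Good"), times = c(300, 700)), levels = c("Bad", "Good"))

# Baseline classifier: predict "Good" for every applicant.
pred <- factor(rep("Good", length(truth)), levels = c("Bad", "Good"))

tab <- table(pred, truth)

# Observed agreement is the plain accuracy.
baseline_accuracy <- sum(diag(tab)) / sum(tab)

# Agreement expected by chance, from the row and column totals.
expected_agreement <- sum(rowSums(tab) * colSums(tab)) / sum(tab)^2

# Cohen's Kappa: accuracy corrected for chance agreement.
kappa_stat <- (baseline_accuracy - expected_agreement) / (1 - expected_agreement)

print(c(accuracy = baseline_accuracy, kappa = kappa_stat))
```

An accuracy of 0.70 with a Kappa of 0 is exactly the pattern a model shows when it has collapsed to predicting the majority class.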



Load the data:

library(caret)
data(GermanCredit)
str(GermanCredit)
> str(GermanCredit)
'data.frame':	1000 obs. of  62 variables:
 $ Duration                              : int  6 48 12 42 24 36 24 36 12 30 ...
 $ Amount                                : int  1169 5951 2096 7882 4870 9055 2835 6948 3059 5234 ...
 $ InstallmentRatePercentage             : int  4 2 2 2 3 2 3 2 2 4 ...
 $ ResidenceDuration                     : int  4 2 3 4 4 4 4 2 4 2 ...
 $ Age                                   : int  67 22 49 45 53 35 53 35 61 28 ...
 $ NumberExistingCredits                 : int  2 1 1 1 2 1 1 1 1 2 ...
 $ NumberPeopleMaintenance               : int  1 1 2 2 2 2 1 1 1 1 ...
 $ Telephone                             : num  0 1 1 1 1 0 1 0 1 1 ...
 $ ForeignWorker                         : num  1 1 1 1 1 1 1 1 1 1 ...
 $ Class                                 : Factor w/ 2 levels "Bad","Good": 2 1 2 2 1 2 2 2 2 1 ...
 $ CheckingAccountStatus.lt.0            : num  1 0 0 1 1 0 0 0 0 0 ...
 $ CheckingAccountStatus.0.to.200        : num  0 1 0 0 0 0 0 1 0 1 ...
 $ CheckingAccountStatus.gt.200          : num  0 0 0 0 0 0 0 0 0 0 ...
 $ CheckingAccountStatus.none            : num  0 0 1 0 0 1 1 0 1 0 ...
 $ CreditHistory.NoCredit.AllPaid        : num  0 0 0 0 0 0 0 0 0 0 ...
 $ CreditHistory.ThisBank.AllPaid        : num  0 0 0 0 0 0 0 0 0 0 ...
 $ CreditHistory.PaidDuly                : num  0 1 0 1 0 1 1 1 1 0 ...
 $ CreditHistory.Delay                   : num  0 0 0 0 1 0 0 0 0 0 ...
 $ CreditHistory.Critical                : num  1 0 1 0 0 0 0 0 0 1 ...
 $ Purpose.NewCar                        : num  0 0 0 0 1 0 0 0 0 1 ...
 $ Purpose.UsedCar                       : num  0 0 0 0 0 0 0 1 0 0 ...
 $ Purpose.Furniture.Equipment           : num  0 0 0 1 0 0 1 0 0 0 ...
 $ Purpose.Radio.Television              : num  1 1 0 0 0 0 0 0 1 0 ...
 $ Purpose.DomesticAppliance             : num  0 0 0 0 0 0 0 0 0 0 ...
 $ Purpose.Repairs                       : num  0 0 0 0 0 0 0 0 0 0 ...
 $ Purpose.Education                     : num  0 0 1 0 0 1 0 0 0 0 ...
 $ Purpose.Vacation                      : num  0 0 0 0 0 0 0 0 0 0 ...
 $ Purpose.Retraining                    : num  0 0 0 0 0 0 0 0 0 0 ...
 $ Purpose.Business                      : num  0 0 0 0 0 0 0 0 0 0 ...
 $ Purpose.Other                         : num  0 0 0 0 0 0 0 0 0 0 ...
 $ SavingsAccountBonds.lt.100            : num  0 1 1 1 1 0 0 1 0 1 ...
 $ SavingsAccountBonds.100.to.500        : num  0 0 0 0 0 0 0 0 0 0 ...
 $ SavingsAccountBonds.500.to.1000       : num  0 0 0 0 0 0 1 0 0 0 ...
 $ SavingsAccountBonds.gt.1000           : num  0 0 0 0 0 0 0 0 1 0 ...
 $ SavingsAccountBonds.Unknown           : num  1 0 0 0 0 1 0 0 0 0 ...
 $ EmploymentDuration.lt.1               : num  0 0 0 0 0 0 0 0 0 0 ...
 $ EmploymentDuration.1.to.4             : num  0 1 0 0 1 1 0 1 0 0 ...
 $ EmploymentDuration.4.to.7             : num  0 0 1 1 0 0 0 0 1 0 ...
 $ EmploymentDuration.gt.7               : num  1 0 0 0 0 0 1 0 0 0 ...
 $ EmploymentDuration.Unemployed         : num  0 0 0 0 0 0 0 0 0 1 ...
 $ Personal.Male.Divorced.Seperated      : num  0 0 0 0 0 0 0 0 1 0 ...
 $ Personal.Female.NotSingle             : num  0 1 0 0 0 0 0 0 0 0 ...
 $ Personal.Male.Single                  : num  1 0 1 1 1 1 1 1 0 0 ...
 $ Personal.Male.Married.Widowed         : num  0 0 0 0 0 0 0 0 0 1 ...
 $ Personal.Female.Single                : num  0 0 0 0 0 0 0 0 0 0 ...
 $ OtherDebtorsGuarantors.None           : num  1 1 1 0 1 1 1 1 1 1 ...
 $ OtherDebtorsGuarantors.CoApplicant    : num  0 0 0 0 0 0 0 0 0 0 ...
 $ OtherDebtorsGuarantors.Guarantor      : num  0 0 0 1 0 0 0 0 0 0 ...
 $ Property.RealEstate                   : num  1 1 1 0 0 0 0 0 1 0 ...
 $ Property.Insurance                    : num  0 0 0 1 0 0 1 0 0 0 ...
 $ Property.CarOther                     : num  0 0 0 0 0 0 0 1 0 1 ...
 $ Property.Unknown                      : num  0 0 0 0 1 1 0 0 0 0 ...
 $ OtherInstallmentPlans.Bank            : num  0 0 0 0 0 0 0 0 0 0 ...
 $ OtherInstallmentPlans.Stores          : num  0 0 0 0 0 0 0 0 0 0 ...
 $ OtherInstallmentPlans.None            : num  1 1 1 1 1 1 1 1 1 1 ...
 $ Housing.Rent                          : num  0 0 0 0 0 0 0 1 0 0 ...
 $ Housing.Own                           : num  1 1 1 0 0 0 1 0 1 1 ...
 $ Housing.ForFree                       : num  0 0 0 1 1 1 0 0 0 0 ...
 $ Job.UnemployedUnskilled               : num  0 0 0 0 0 0 0 0 0 0 ...
 $ Job.UnskilledResident                 : num  0 0 1 0 0 1 0 0 1 0 ...
 $ Job.SkilledEmployee                   : num  1 1 0 1 1 0 1 0 0 0 ...
 $ Job.Management.SelfEmp.HighlyQualified: num  0 0 0 0 0 0 0 1 0 1 ...

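First, remove the near-zero-variance predictors.

nearZeroVar flags predictors that have a single unique value (zero-variance predictors) and predictors that have both of the following characteristics:
they have very few unique values relative to the number of samples, and the ratio of the frequency of the most common value to the frequency of the second most common value is large.

Removing such predictors is recommended: they can degrade some models, and dropping them can noticeably improve model performance and stability (p. 31).
nearZeroVar(x, freqCut = 95/5, uniqueCut = 10, saveMetrics = FALSE, names = FALSE, foreach = FALSE, allowParallel = TRUE)
freqCut: cutoff for the ratio of the most common value to the second most common value; default 95/5
uniqueCut: cutoff for the percentage of distinct values out of the total number of samples; default 10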
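To make the two criteria concrete, here is a minimal base-R sketch that computes the frequency ratio and the percentage of unique values for a single vector and applies the same default cutoffs. The helper nzv_metrics is hypothetical and written for illustration; it is not caret's implementation and may differ from it in edge cases.

```r
# Hypothetical helper: near-zero-variance diagnostics for one predictor.
nzv_metrics <- function(x, freqCut = 95 / 5, uniqueCut = 10) {
  counts <- sort(table(x), decreasing = TRUE)       # value frequencies, largest first
  zero_var <- length(counts) <= 1                   # only one distinct value
  freq_ratio <- if (zero_var) 0 else counts[[1]] / counts[[2]]
  percent_unique <- 100 * length(counts) / length(x)
  data.frame(
    freqRatio = freq_ratio,
    percentUnique = percent_unique,
    zeroVar = zero_var,
    # flagged when constant, or when both criteria hold at once
    nzv = zero_var || (freq_ratio > freqCut && percent_unique <= uniqueCut)
  )
}

rare     <- c(rep(0, 98), rep(1, 2))  # ratio 49, 2% unique: flagged
balanced <- rep(c(0, 1), 50)          # ratio 1, 2% unique: kept
constant <- rep(1, 100)               # single value: flagged

print(rbind(rare = nzv_metrics(rare),
            balanced = nzv_metrics(balanced),
            constant = nzv_metrics(constant)))
```

A 0/1 dummy column always has a low percentage of unique values, so for dummy variables the frequency ratio is the criterion that decides.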

GermanCredit <- GermanCredit[, -nearZeroVar(GermanCredit)]  # drop near-zero-variance predictors

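 After this step and the removal of the redundant dummy columns below, 41 predictors remain (the count reported in the model output later on).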


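Some dummy variables are redundant and need to be removed.

For example, housing has three options: "Rent", "Own", and "ForFree", stored in the columns Housing.Rent, Housing.Own, and Housing.ForFree. Any one of these columns can be inferred from the other two. To avoid linear dependence, one of them is dropped, here ForFree (for details on dummy variable encoding see p. 24).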
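The linear dependence is easy to see on a toy example. The sketch below uses a made-up six-row housing factor (not the real data): with an intercept plus all three dummy columns the design matrix is rank deficient, and dropping one dummy column restores full column rank.

```r
# Toy factor with the three housing categories (made-up values).
housing <- factor(c("Rent", "Own", "ForFree", "Own", "Rent", "Own"))

# One 0/1 column per category; the three columns sum to 1 in every row.
dummies <- model.matrix(~ housing - 1)

# Adding an intercept makes the columns linearly dependent.
X_full <- cbind(Intercept = 1, dummies)
rank_full <- qr(X_full)$rank

# Dropping one dummy column removes the dependence.
X_reduced <- X_full[, colnames(X_full) != "housingForFree"]
rank_reduced <- qr(X_reduced)$rank

cat("full:   ", ncol(X_full), "columns, rank", rank_full, "\n")
cat("reduced:", ncol(X_reduced), "columns, rank", rank_reduced, "\n")
```

Models that fit an intercept, such as logistic regression, need the reduced form for their coefficients to be identifiable.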

GermanCredit$CheckingAccountStatus.lt.0 <- NULL
GermanCredit$SavingsAccountBonds.lt.100 <- NULL
GermanCredit$EmploymentDuration.lt.1 <- NULL
GermanCredit$EmploymentDuration.Unemployed <- NULL
GermanCredit$Personal.Male.Married.Widowed <- NULL
GermanCredit$Property.Unknown <- NULL
GermanCredit$Housing.ForFree <- NULL

Create the training and test sets by stratified sampling on the response Class, with an 80/20 training/test split.

# create the training and test sets
set.seed(100)
inTrain <- createDataPartition(GermanCredit$Class, p =0.8,list=FALSE)
GermanCreditTrain <- GermanCredit[ inTrain, ]
GermanCreditTest  <- GermanCredit[-inTrain, ]
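For intuition about what a stratified split does, here is a base-R sketch that samples 80% of the row indices within each class separately, so both sets keep the 70/30 class balance. The function stratified_split is a hypothetical stand-in and is not how createDataPartition is implemented; the labels are synthetic.

```r
# Hypothetical helper: sample a fraction p of the indices within each class.
stratified_split <- function(y, p = 0.8) {
  idx_by_class <- split(seq_along(y), y)
  picked <- lapply(idx_by_class, function(idx) {
    idx[sample.int(length(idx), size = round(p * length(idx)))]
  })
  sort(unlist(picked, use.names = FALSE))
}

set.seed(100)
y <- factor(rep(c("Bad", "Good"), times = c(300, 700)))  # synthetic labels
in_train <- stratified_split(y, p = 0.8)

train_counts <- table(y[in_train])
test_counts  <- table(y[-in_train])
print(train_counts)
print(test_counts)
```

Both subsets end up with the same 30/70 ratio of Bad to Good, which a plain random split only achieves approximately.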


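Fit a support vector machine with a radial basis function (RBF) kernel.

First, use the train function from the caret package to choose the SVM tuning parameter (here only the cost parameter C is tuned).

# fit the model and choose the tuning parameter
set.seed(1056)
svmFit <- train(Class ~ .,
                data = GermanCreditTrain,
                method = "svmRadial",            # model type
                preProc = c("center", "scale"),  # center and scale the predictors
                tuneLength = 10,                 # evaluate 10 cost values: 2^-2, 2^-1, ..., 2^7
                # train() resamples with the bootstrap by default; trainControl() switches
                # to 10-fold cross-validation repeated 5 times
                trControl = trainControl(method = "repeatedcv",
                                         repeats = 5,
                                         classProbs = TRUE))  # lets predict() return class probabilities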

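Cross-validation results: the kernel parameter sigma is 0.01418. It is estimated from the data by the algorithm rather than tuned, and it is held constant during resampling. The selected cost is C = 4.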

Accuracy was used to select the optimal model using the largest value.
The final values used for the model were sigma = 0.01418087 and C = 4.

> svmFit
Support Vector Machines with Radial Basis Function Kernel 

800 samples
 41 predictor
  2 classes: 'Bad', 'Good' 

Pre-processing: centered (41), scaled (41) 
Resampling: Cross-Validated (10 fold, repeated 5 times) 
Summary of sample sizes: 720, 720, 720, 720, 720, 720, ... 
Resampling results across tuning parameters:

  C       Accuracy  Kappa    
    0.25  0.70000   0.0000000
    0.50  0.71950   0.1258630
    1.00  0.74300   0.2942905
    2.00  0.74575   0.3432486
    4.00  0.75650   0.3866960
    8.00  0.74425   0.3727896
   16.00  0.73525   0.3592463
   32.00  0.73100   0.3538083
   64.00  0.72675   0.3463869
  128.00  0.72600   0.3468630

Tuning parameter 'sigma' was held constant at a value of 0.01418087
Accuracy was used to select the optimal model using the largest value.
The final values used for the model were sigma = 0.01418087 and C = 4.

> svmFit$results
        sigma      C Accuracy     Kappa AccuracySD    KappaSD
1  0.01418087   0.25  0.70000 0.0000000 0.00000000 0.00000000
2  0.01418087   0.50  0.71950 0.1258630 0.02232643 0.07289277
3  0.01418087   1.00  0.74300 0.2942905 0.04135461 0.10954016
4  0.01418087   2.00  0.74575 0.3432486 0.04556967 0.11186963
5  0.01418087   4.00  0.75650 0.3866960 0.04105124 0.10282855
6  0.01418087   8.00  0.74425 0.3727896 0.04568149 0.10933250
7  0.01418087  16.00  0.73525 0.3592463 0.04400907 0.09710983
8  0.01418087  32.00  0.73100 0.3538083 0.04753087 0.10562148
9  0.01418087  64.00  0.72675 0.3463869 0.04744828 0.10635084
10 0.01418087 128.00  0.72600 0.3468630 0.04810437 0.10715867

Plot the resampling profile

# plot accuracy against cost
plot(svmFit, scales = list(x = list(log = 2)))
# 'scales' is passed through to lattice's xyplot and puts the x-axis on a log-2 scale.


 Predict new samples

# predict new samples
# class predictions
predictedClasses <- predict(svmFit, GermanCreditTest)
head(predictedClasses)
# class probabilities: type = "prob"
predictedProbs <- predict(svmFit, GermanCreditTest, type = "prob")
head(predictedProbs)
> # predict new samples
> # class predictions
> predictedClasses<-predict(svmFit,GermanCreditTest)
> head(predictedClasses)
[1] Bad  Bad  Good Good Bad  Good
Levels: Bad Good
> # class probabilities: type = "prob"
> predictedProbs<-predict(svmFit,GermanCreditTest,type="prob")
> head(predictedProbs)
         Bad      Good
1 0.59498463 0.4050154
2 0.51981827 0.4801817
3 0.34775046 0.6522495
4 0.09517461 0.9048254
5 0.64882536 0.3511746
6 0.14423627 0.8557637

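Fit a logistic regression model for comparison with the support vector machine

# model comparison
# logistic regression
# logistic regression has no tuning parameters, but resampling can still estimate its performance
set.seed(1056)
logisticReg <- train(Class ~ .,
                     data = GermanCreditTrain,
                     method = "glm",                  # generalized linear model
                     preProc = c("center", "scale"),  # center and scale the predictors
                     # same resampling as for the SVM: 10-fold cross-validation repeated 5 times
                     trControl = trainControl(method = "repeatedcv", repeats = 5))
logisticReg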

 

> logisticReg
Generalized Linear Model 

800 samples
 41 predictor
  2 classes: 'Bad', 'Good' 

Pre-processing: centered (41), scaled (41) 
Resampling: Cross-Validated (10 fold, repeated 5 times) 
Summary of sample sizes: 720, 720, 720, 720, 720, 720, ... 
Resampling results:

  Accuracy  Kappa    
  0.748     0.3573223

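Compare the support vector machine and logistic regression on their cross-validation results with the resamples function, which compares models that were evaluated on the same resampled data sets.
Because the random seed was set to the same value before each call to train, both models have paired accuracy measurements on every resample.

First, create a resamples object from the two models

# compare SVM and logistic regression with resamples(), which needs both models
# to have been evaluated on the same resamples
# the same seed was set before each train() call, so the accuracy values are paired

# first, create a resamples object from the two models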
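Resetting the seed immediately before each fit is what makes the resamples line up. A minimal base-R sketch of the idea, using a hypothetical make_folds helper in place of caret's own fold generator:

```r
# Hypothetical helper: assign each of n rows to one of k folds at random.
make_folds <- function(n, k) sample(rep(seq_len(k), length.out = n))

# Same seed before each call: both "models" see identical folds.
set.seed(1056)
folds_model_a <- make_folds(800, 10)
set.seed(1056)
folds_model_b <- make_folds(800, 10)

# Without resetting the seed, the next draw gives a different assignment.
folds_unseeded <- make_folds(800, 10)

print(identical(folds_model_a, folds_model_b))
print(identical(folds_model_a, folds_unseeded))
```

With identical folds, the two accuracy values measured on a given held-out fold can be treated as a matched pair.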
resamp<-resamples(list(SVM=svmFit,Logistic=logisticReg))
summary(resamp)

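 The summary shows that the two models perform very similarly. The NA's column counts resamples where the model fit failed (usually for numerical reasons); there are none here.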

> summary(resamp)

Call:
summary.resamples(object = resamp)

Models: SVM, Logistic 
Number of resamples: 50 

Accuracy 
           Min.  1st Qu.  Median    Mean 3rd Qu.   Max. NA's
SVM      0.6875 0.725000 0.75625 0.75325  0.7750 0.8250    0
Logistic 0.6250 0.715625 0.75000 0.74800  0.7875 0.8375    0

Kappa 
                Min.   1st Qu.    Median      Mean   3rd Qu.      Max. NA's
SVM       0.11971831 0.2774423 0.3436717 0.3320549 0.3729167 0.5270270    0
Logistic -0.01351351 0.2797677 0.3670683 0.3573223 0.4471501 0.5886076    0

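Plots for comparison:

For the plotting functions, see ?xyplot.resamples. The example below appears to be the one from that help page; rpartFit, ctreeFit, and earthFit are regression models that are not fitted in this post, so it is shown for reference only.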

resamps <- resamples(list(CART = rpartFit,
                          CondInfTree = ctreeFit,
                          MARS = earthFit))

dotplot(resamps,
        scales =list(x = list(relation = "free")),
        between = list(x = 2))

bwplot(resamps,
       metric = "RMSE")

densityplot(resamps,
            auto.key = list(columns = 3),
            pch = "|")

xyplot(resamps,
       models = c("CART", "MARS"),
       metric = "RMSE")

splom(resamps, metric = "RMSE")
splom(resamps, variables = "metrics")

parallelplot(resamps, metric = "RMSE")

 

xyplot(resamp)

 

bwplot(resamp)

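The diff function gives the differences between the two models

# diff() computes the pairwise differences between the models
modelDifferences <- diff(resamp)
summary(modelDifferences)

 The p-value is 0.3474 for accuracy and 0.08531 for Kappa. Both are above 0.05, so there is no evidence of a significant difference in performance between the two models.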
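The idea behind such a comparison is a paired test on the per-resample differences. The base-R sketch below uses made-up accuracy values (not the results of this post) to show that a paired t-test is the same as a one-sample t-test on the differences. Whether caret's diff does exactly this internally is an assumption here; the summary output only confirms that the p-values are Bonferroni-adjusted.

```r
# Made-up per-resample accuracies for two models on the same 10 resamples.
acc_a <- c(0.7500, 0.7625, 0.7375, 0.7750, 0.7250,
           0.7875, 0.7500, 0.7625, 0.7125, 0.7750)
acc_b <- c(0.7375, 0.7750, 0.7250, 0.7875, 0.7125,
           0.7750, 0.7625, 0.7500, 0.7000, 0.7875)

# Paired t-test: H0 is that the mean difference is zero.
paired <- t.test(acc_a, acc_b, paired = TRUE)

# Equivalent one-sample test on the differences.
one_sample <- t.test(acc_a - acc_b)

mean_diff <- mean(acc_a - acc_b)
cat("mean difference:", mean_diff, "\n")
cat("paired p-value: ", paired$p.value, "\n")
```

Pairing removes the fold-to-fold variation that both models share, which is why it matters that the models were resampled on identical splits.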

> summary(modelDifferences)

Call:
summary.diff.resamples(object = modelDifferences)

p-value adjustment: bonferroni 
Upper diagonal: estimates of the difference
Lower diagonal: p-value for H0: difference = 0

Accuracy 
         SVM    Logistic
SVM             0.00525 
Logistic 0.3474         

Kappa 
         SVM     Logistic
SVM              -0.02527
Logistic 0.08531

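Comparing resampling methods: repeated 10-fold cross-validation (5 repeats), 10-fold cross-validation, leave-one-out cross-validation (LOOCV), repeated training/test splits (leave-group-out or Monte Carlo cross-validation), the bootstrap, and the 632 method (a modification of the bootstrap intended to reduce the bias of its estimate)

Build a tuning grid so that every resampling method evaluates the same candidate values. Note that sigma is now estimated as 0.008865455, which differs from the earlier value; here the first element returned by sigest is used.

# compare resampling methods
# build the tuning grid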
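The name of the 632 method comes from a property of bootstrap samples: drawing n rows with replacement leaves out about exp(-1), roughly 36.8%, of the rows, and includes about 63.2% of the distinct rows. The base-R sketch below checks this by simulation and shows the usual weighting of the out-of-bag and apparent (resubstitution) accuracy. The helper boot632_estimate and its example inputs are hypothetical, not caret internals.

```r
set.seed(1)
n <- 800  # same size as the training set used in this post

# Fraction of rows left out of each bootstrap sample (the out-of-bag rows).
oob_fraction <- replicate(200, {
  idx <- sample.int(n, n, replace = TRUE)
  1 - length(unique(idx)) / n
})
mean_oob <- mean(oob_fraction)
cat("mean out-of-bag fraction:", mean_oob, " exp(-1):", exp(-1), "\n")

# 632 estimate: weighted mix of the pessimistic out-of-bag accuracy
# and the optimistic apparent accuracy on the training rows.
boot632_estimate <- function(apparent, oob) 0.632 * oob + 0.368 * apparent

# Made-up example: a model that fits its training rows perfectly.
estimate <- boot632_estimate(apparent = 1.0, oob = 0.70)
cat("632 estimate:", estimate, "\n")
```

Because the apparent accuracy enters with weight 0.368, a model that nearly memorizes its training rows gets a higher 632 estimate. This is a plausible reading of why the boot632 accuracies reported later in this post rise with C while the plain bootstrap values do not.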
library(kernlab)
set.seed(231)
sigDist <- sigest(Class ~ ., data = GermanCreditTrain, frac = 1)
svmTuneGrid <- data.frame(sigma = as.vector(sigDist)[1], C = 2^(-2:7))
> head(svmTuneGrid)
        sigma    C
1 0.008865455 0.25
2 0.008865455 0.50
3 0.008865455 1.00
4 0.008865455 2.00
5 0.008865455 4.00
6 0.008865455 8.00
# repeated 10-fold cross-validation (5 repeats)
set.seed(1056)
svmFit <- train(Class ~ .,
                data = GermanCreditTrain,
                method = "svmRadial",
                preProc = c("center", "scale"),
                tuneGrid = svmTuneGrid,
                trControl = trainControl(method = "repeatedcv", 
                                         repeats = 5,
                                         classProbs = TRUE))

# 10-fold cross-validation
set.seed(1056)
svmFit10CV <- train(Class ~ .,
                    data = GermanCreditTrain,
                    method = "svmRadial",
                    preProc = c("center", "scale"),
                    tuneGrid = svmTuneGrid,
                    trControl = trainControl(method = "cv", number = 10))
svmFit10CV

# leave-one-out cross-validation (LOOCV)
set.seed(1056)
svmFitLOO <- train(Class ~ .,
                   data = GermanCreditTrain,
                   method = "svmRadial",
                   preProc = c("center", "scale"),
                   tuneGrid = svmTuneGrid,
                   trControl = trainControl(method = "LOOCV"))
svmFitLOO

# repeated training/test splits (leave-group-out / Monte Carlo cross-validation)
set.seed(1056)
svmFitLGO <- train(Class ~ .,
                   data = GermanCreditTrain,
                   method = "svmRadial",
                   preProc = c("center", "scale"),
                   tuneGrid = svmTuneGrid,
                   trControl = trainControl(method = "LGOCV", 
                                            number = 50, 
                                            p = .8))
svmFitLGO 

# bootstrap
set.seed(1056)
svmFitBoot <- train(Class ~ .,
                    data = GermanCreditTrain,
                    method = "svmRadial",
                    preProc = c("center", "scale"),
                    tuneGrid = svmTuneGrid,
                    trControl = trainControl(method = "boot", number = 50))
svmFitBoot

# 632 method (bootstrap modified to reduce estimation bias)
set.seed(1056)
svmFitBoot632 <- train(Class ~ .,
                       data = GermanCreditTrain,
                       method = "svmRadial",
                       preProc = c("center", "scale"),
                       tuneGrid = svmTuneGrid,
                       trControl = trainControl(method = "boot632", 
                                                number = 50))
svmFitBoot632

Output:

> set.seed(1056)
> svmFit <- train(Class ~ .,
+                 data = GermanCreditTrain,
+                 method = "svmRadial",
+                 preProc = c("center", "scale"),
+                 tuneGrid = svmTuneGrid,
+                 trControl = trainControl(method = "repeatedcv", 
+                                          repeats = 5,
+                                          classProbs = TRUE))
> svmFit
Support Vector Machines with Radial Basis Function Kernel 

800 samples
 41 predictor
  2 classes: 'Bad', 'Good' 

Pre-processing: centered (41), scaled (41) 
Resampling: Cross-Validated (10 fold, repeated 5 times) 
Summary of sample sizes: 720, 720, 720, 720, 720, 720, ... 
Resampling results across tuning parameters:

  C       Accuracy  Kappa    
    0.25  0.74275   0.3551656
    0.50  0.74225   0.3526561
    1.00  0.74450   0.3381060
    2.00  0.74325   0.3240710
    4.00  0.74600   0.3236372
    8.00  0.75600   0.3404570
   16.00  0.75175   0.3167243
   32.00  0.74625   0.2991639
   64.00  0.74200   0.2863788
  128.00  0.74100   0.2876532

Tuning parameter 'sigma' was held constant at a value of 0.008865455
Accuracy was used to select the optimal model using the largest value.
The final values used for the model were sigma = 0.008865455 and C = 8.

> set.seed(1056)
> svmFit10CV <- train(Class ~ .,
+                     data = GermanCreditTrain,
+                     method = "svmRadial",
+                     preProc = c("center", "scale"),
+                     tuneGrid = svmTuneGrid,
+                     trControl = trainControl(method = "cv", number = 10))
> svmFit10CV
Support Vector Machines with Radial Basis Function Kernel 

800 samples
 41 predictor
  2 classes: 'Bad', 'Good' 

Pre-processing: centered (41), scaled (41) 
Resampling: Cross-Validated (10 fold) 
Summary of sample sizes: 720, 720, 720, 720, 720, 720, ... 
Resampling results across tuning parameters:

  C       Accuracy  Kappa     
    0.25  0.70000   0.00000000
    0.50  0.71375   0.07434516
    1.00  0.73000   0.22440830
    2.00  0.72500   0.27103511
    4.00  0.73250   0.31410770
    8.00  0.74250   0.35622373
   16.00  0.74875   0.38163710
   32.00  0.72500   0.34552832
   64.00  0.71625   0.32881637
  128.00  0.71375   0.31719377

Tuning parameter 'sigma' was held constant at a value of 0.008865455
Accuracy was used to select the optimal model using the largest value.
The final values used for the model were sigma = 0.008865455 and C = 16.
> 
> set.seed(1056)
> svmFitLOO <- train(Class ~ .,
+                    data = GermanCreditTrain,
+                    method = "svmRadial",
+                    preProc = c("center", "scale"),
+                    tuneGrid = svmTuneGrid,
+                    trControl = trainControl(method = "LOOCV"))
> svmFitLOO
Support Vector Machines with Radial Basis Function Kernel 

800 samples
 41 predictor
  2 classes: 'Bad', 'Good' 

Pre-processing: centered (41), scaled (41) 
Resampling: Leave-One-Out Cross-Validation 
Summary of sample sizes: 799, 799, 799, 799, 799, 799, ... 
Resampling results across tuning parameters:

  C       Accuracy  Kappa    
    0.25  0.70000   0.0000000
    0.50  0.71750   0.1003185
    1.00  0.74875   0.3049793
    2.00  0.74000   0.3157895
    4.00  0.74875   0.3582375
    8.00  0.76125   0.4068323
   16.00  0.76125   0.4155447
   32.00  0.72250   0.3345324
   64.00  0.71625   0.3268090
  128.00  0.72000   0.3333333

Tuning parameter 'sigma' was held constant at a value of 0.008865455
Accuracy was used to select the optimal model using the largest value.
The final values used for the model were sigma = 0.008865455 and C = 8.
> 
> set.seed(1056)
> svmFitLGO <- train(Class ~ .,
+                    data = GermanCreditTrain,
+                    method = "svmRadial",
+                    preProc = c("center", "scale"),
+                    tuneGrid = svmTuneGrid,
+                    trControl = trainControl(method = "LGOCV", 
+                                             number = 50, 
+                                             p = .8))
> svmFitLGO 
Support Vector Machines with Radial Basis Function Kernel 

800 samples
 41 predictor
  2 classes: 'Bad', 'Good' 

Pre-processing: centered (41), scaled (41) 
Resampling: Repeated Train/Test Splits Estimated (50 reps, 80%) 
Summary of sample sizes: 640, 640, 640, 640, 640, 640, ... 
Resampling results across tuning parameters:

  C       Accuracy  Kappa    
    0.25  0.70000   0.0000000
    0.50  0.71050   0.0594299
    1.00  0.73625   0.2495432
    2.00  0.74150   0.3146341
    4.00  0.74125   0.3368009
    8.00  0.74275   0.3579301
   16.00  0.73575   0.3538939
   32.00  0.73175   0.3543491
   64.00  0.73050   0.3530627
  128.00  0.72500   0.3416933

Tuning parameter 'sigma' was held constant at a value of 0.008865455
Accuracy was used to select the optimal model using the largest value.
The final values used for the model were sigma = 0.008865455 and C = 8.
> 
> set.seed(1056)
> svmFitBoot <- train(Class ~ .,
+                     data = GermanCreditTrain,
+                     method = "svmRadial",
+                     preProc = c("center", "scale"),
+                     tuneGrid = svmTuneGrid,
+                     trControl = trainControl(method = "boot", number = 50))
> svmFitBoot
Support Vector Machines with Radial Basis Function Kernel 

800 samples
 41 predictor
  2 classes: 'Bad', 'Good' 

Pre-processing: centered (41), scaled (41) 
Resampling: Bootstrapped (50 reps) 
Summary of sample sizes: 800, 800, 800, 800, 800, 800, ... 
Resampling results across tuning parameters:

  C       Accuracy   Kappa     
    0.25  0.7075989  0.02603939
    0.50  0.7269075  0.17762422
    1.00  0.7335815  0.27299372
    2.00  0.7368083  0.31556354
    4.00  0.7406900  0.34400618
    8.00  0.7389901  0.35552016
   16.00  0.7286678  0.33977450
   32.00  0.7237453  0.33405357
   64.00  0.7160024  0.31912917
  128.00  0.7155680  0.31758286

Tuning parameter 'sigma' was held constant at a value of 0.008865455
Accuracy was used to select the optimal model using the largest value.
The final values used for the model were sigma = 0.008865455 and C = 4.
> 
> set.seed(1056)
> svmFitBoot632 <- train(Class ~ .,
+                        data = GermanCreditTrain,
+                        method = "svmRadial",
+                        preProc = c("center", "scale"),
+                        tuneGrid = svmTuneGrid,
+                        trControl = trainControl(method = "boot632", 
+                                                 number = 50))
> svmFitBoot632
Support Vector Machines with Radial Basis Function Kernel 

800 samples
 41 predictor
  2 classes: 'Bad', 'Good' 

Pre-processing: centered (41), scaled (41) 
Resampling: Bootstrapped (50 reps) 
Summary of sample sizes: 800, 800, 800, 800, 800, 800, ... 
Resampling results across tuning parameters:

  C       Accuracy   Kappa     
    0.25  0.7048035  0.01646003
    0.50  0.7321838  0.18129341
    1.00  0.7621541  0.34889337
    2.00  0.7793689  0.42604578
    4.00  0.7969976  0.48700027
    8.00  0.8087988  0.52781151
   16.00  0.8137701  0.54685454
   32.00  0.8193957  0.56468853
   64.00  0.8168005  0.56082761
  128.00  0.8192850  0.56644034

Tuning parameter 'sigma' was held constant at a value of 0.008865455
Accuracy was used to select the optimal model using the largest value.
The final values used for the model were sigma = 0.008865455 and C = 32.

 

Figure from the book (note that its values differ from those computed above)

 
