Applied Predictive Modeling, Chapter 6 (Linear Regression), Exercise 6.1 [principal component analysis; selecting optimal tuning parameters and comparing models; multiple linear regression; robust regression; partial least squares; ridge regression; the lasso; the elastic net]

Models: multiple linear regression, robust regression, partial least squares, ridge regression, the lasso, the elastic net

Language: R

Reference: Applied Predictive Modeling (2013) by Max Kuhn and Kjell Johnson; Chinese translation by Lin Hui et al.


The exercise:

(b) In this example the predictors are the measurements of absorbance at the individual frequencies. Because the frequencies lie in a systematic order (850 to 1050 nm), the predictors have a high degree of correlation, and the data actually lie in a space of lower dimension than the full 100 predictors. Use PCA to determine the effective dimension of these data. What is that dimension?
(c) Split the data into a training and a test set, preprocess the data, and build each of the models described in this chapter. For the models with tuning parameters, what are the optimal values of those parameters?
(d) Which model has the best predictive ability? Is any model significantly better or worse than the others?
(e) Explain which model you would use to predict the fat content of a sample.



Loading the data

library(caret)

# load the data
data(tecator)
head(absorp)
head(endpoints)
> # load the data
> data(tecator)
> head(absorp)
        [,1]    [,2]    [,3]    [,4]    [,5]    [,6]    [,7]    [,8]    [,9]   [,10]   [,11]
[1,] 2.61776 2.61814 2.61859 2.61912 2.61981 2.62071 2.62186 2.62334 2.62511 2.62722 2.62964
[2,] 2.83454 2.83871 2.84283 2.84705 2.85138 2.85587 2.86060 2.86566 2.87093 2.87661 2.88264
[3,] 2.58284 2.58458 2.58629 2.58808 2.58996 2.59192 2.59401 2.59627 2.59873 2.60131 2.60414
[4,] 2.82286 2.82460 2.82630 2.82814 2.83001 2.83192 2.83392 2.83606 2.83842 2.84097 2.84374
[5,] 2.78813 2.78989 2.79167 2.79350 2.79538 2.79746 2.79984 2.80254 2.80553 2.80890 2.81272
[6,] 3.00993 3.01540 3.02086 3.02634 3.03190 3.03756 3.04341 3.04955 3.05599 3.06274 3.06982
       [,12]   [,13]   [,14]   [,15]   [,16]   [,17]   [,18]   [,19]   [,20]   [,21]   [,22]
[1,] 2.63245 2.63565 2.63933 2.64353 2.64825 2.65350 2.65937 2.66585 2.67281 2.68008 2.68733
[2,] 2.88898 2.89577 2.90308 2.91097 2.91953 2.92873 2.93863 2.94929 2.96072 2.97272 2.98493
[3,] 2.60714 2.61029 2.61361 2.61714 2.62089 2.62486 2.62909 2.63361 2.63835 2.64330 2.64838
[4,] 2.84664 2.84975 2.85307 2.85661 2.86038 2.86437 2.86860 2.87308 2.87789 2.88301 2.88832
[5,] 2.81704 2.82184 2.82710 2.83294 2.83945 2.84664 2.85458 2.86331 2.87280 2.88291 2.89335
[6,] 3.07724 3.08511 3.09343 3.10231 3.11185 3.12205 3.13294 3.14457 3.15703 3.17038 3.18429
       [,23]   [,24]   [,25]   [,26]   [,27]   [,28]   [,29]   [,30]   [,31]   [,32]   [,33]
[1,] 2.69427 2.70073 2.70684 2.71281 2.71914 2.72628 2.73462 2.74416 2.75466 2.76568 2.77679
[2,] 2.99690 3.00833 3.01920 3.02990 3.04101 3.05345 3.06777 3.08416 3.10221 3.12106 3.13983
[3,] 2.65354 2.65870 2.66375 2.66880 2.67383 2.67892 2.68411 2.68937 2.69470 2.70012 2.70563
[4,] 2.89374 2.89917 2.90457 2.90991 2.91521 2.92043 2.92565 2.93082 2.93604 2.94128 2.94658
[5,] 2.90374 2.91371 2.92305 2.93187 2.94060 2.94986 2.96035 2.97241 2.98606 3.00097 3.01652
[6,] 3.19840 3.21225 3.22552 3.23827 3.25084 3.26393 3.27851 3.29514 3.31401 3.33458 3.35591
       [,34]   [,35]   [,36]   [,37]   [,38]   [,39]   [,40]   [,41]   [,42]   [,43]   [,44]
[1,] 2.78790 2.79949 2.81225 2.82706 2.84356 2.86106 2.87857 2.89497 2.90924 2.92085 2.93015
[2,] 3.15810 3.17623 3.19519 3.21584 3.23747 3.25889 3.27835 3.29384 3.30362 3.30681 3.30393
[3,] 2.71141 2.71775 2.72490 2.73344 2.74327 2.75433 2.76642 2.77931 2.79272 2.80649 2.82064
[4,] 2.95202 2.95777 2.96419 2.97159 2.98045 2.99090 3.00284 3.01611 3.03048 3.04579 3.06194
[5,] 3.03220 3.04793 3.06413 3.08153 3.10078 3.12185 3.14371 3.16510 3.18470 3.20140 3.21477
[6,] 3.37709 3.39772 3.41828 3.43974 3.46266 3.48663 3.51002 3.53087 3.54711 3.55699 3.55986
       [,45]   [,46]   [,47]   [,48]   [,49]   [,50]   [,51]   [,52]   [,53]   [,54]   [,55]
[1,] 2.93846 2.94771 2.96019 2.97831 3.00306 3.03506 3.07428 3.11963 3.16868 3.21771 3.26254
[2,] 3.29700 3.28925 3.28409 3.28505 3.29326 3.30923 3.33267 3.36251 3.39661 3.43188 3.46492
[3,] 2.83541 2.85121 2.86872 2.88905 2.91289 2.94088 2.97325 3.00946 3.04780 3.08554 3.11947
[4,] 3.07889 3.09686 3.11629 3.13775 3.16217 3.19068 3.22376 3.26172 3.30379 3.34793 3.39093
[5,] 3.22544 3.23505 3.24586 3.26027 3.28063 3.30889 3.34543 3.39019 3.44198 3.49800 3.55407
[6,] 3.55656 3.54937 3.54169 3.53692 3.53823 3.54760 3.56512 3.59043 3.62229 3.65830 3.69515
       [,56]   [,57]   [,58]   [,59]   [,60]   [,61]   [,62]   [,63]   [,64]   [,65]   [,66]
[1,] 3.29988 3.32847 3.34899 3.36342 3.37379 3.38152 3.38741 3.39164 3.39418 3.39490 3.39366
[2,] 3.49295 3.51458 3.53004 3.54067 3.54797 3.55306 3.55675 3.55921 3.56045 3.56034 3.55876
[3,] 3.14696 3.16677 3.17938 3.18631 3.18924 3.18950 3.18801 3.18498 3.18039 3.17411 3.16611
[4,] 3.42920 3.45998 3.48227 3.49687 3.50558 3.51026 3.51221 3.51215 3.51036 3.50682 3.50140
[5,] 3.60534 3.64789 3.68011 3.70272 3.71815 3.72863 3.73574 3.74059 3.74357 3.74453 3.74336
[6,] 3.72932 3.75803 3.78003 3.79560 3.80614 3.81313 3.81774 3.82079 3.82258 3.82301 3.82206
       [,67]   [,68]   [,69]   [,70]   [,71]   [,72]   [,73]   [,74]   [,75]   [,76]   [,77]
[1,] 3.39045 3.38541 3.37869 3.37041 3.36073 3.34979 3.33769 3.32443 3.31013 3.29487 3.27891
[2,] 3.55571 3.55132 3.54585 3.53950 3.53235 3.52442 3.51583 3.50668 3.49700 3.48683 3.47626
[3,] 3.15641 3.14512 3.13241 3.11843 3.10329 3.08714 3.07014 3.05237 3.03393 3.01504 2.99569
[4,] 3.49398 3.48457 3.47333 3.46041 3.44595 3.43005 3.41285 3.39450 3.37511 3.35482 3.33376
[5,] 3.73991 3.73418 3.72638 3.71676 3.70553 3.69289 3.67900 3.66396 3.64785 3.63085 3.61305
[6,] 3.81959 3.81557 3.81021 3.80375 3.79642 3.78835 3.77958 3.77024 3.76040 3.75005 3.73929
       [,78]   [,79]   [,80]   [,81]   [,82]   [,83]   [,84]   [,85]   [,86]   [,87]   [,88]
[1,] 3.26232 3.24542 3.22828 3.21080 3.19287 3.17433 3.15503 3.13475 3.11339 3.09116 3.06850
[2,] 3.46552 3.45501 3.44481 3.43477 3.42465 3.41419 3.40303 3.39082 3.37731 3.36265 3.34745
[3,] 2.97612 2.95642 2.93660 2.91667 2.89655 2.87622 2.85563 2.83474 2.81361 2.79235 2.77113
[4,] 3.31204 3.28986 3.26730 3.24442 3.22117 3.19757 3.17357 3.14915 3.12429 3.09908 3.07366
[5,] 3.59463 3.57582 3.55695 3.53796 3.51880 3.49936 3.47938 3.45869 3.43711 3.41458 3.39129
[6,] 3.72831 3.71738 3.70681 3.69664 3.68659 3.67649 3.66611 3.65503 3.64283 3.62938 3.61483
       [,89]   [,90]   [,91]   [,92]   [,93]   [,94]   [,95]   [,96]   [,97]   [,98]   [,99]
[1,] 3.04596 3.02393 3.00247 2.98145 2.96072 2.94013 2.91978 2.89966 2.87964 2.85960 2.83940
[2,] 3.33245 3.31818 3.30473 3.29186 3.27921 3.26655 3.25369 3.24045 3.22659 3.21181 3.19600
[3,] 2.75015 2.72956 2.70934 2.68951 2.67009 2.65112 2.63262 2.61461 2.59718 2.58034 2.56404
[4,] 3.04825 3.02308 2.99820 2.97367 2.94951 2.92576 2.90251 2.87988 2.85794 2.83672 2.81617
[5,] 3.36772 3.34450 3.32201 3.30025 3.27907 3.25831 3.23784 3.21765 3.19766 3.17770 3.15770
[6,] 3.59990 3.58535 3.57163 3.55877 3.54651 3.53442 3.52221 3.50972 3.49682 3.48325 3.46870
      [,100]
[1,] 2.81920
[2,] 3.17942
[3,] 2.54816
[4,] 2.79622
[5,] 3.13753
[6,] 3.45307
> head(endpoints)
     [,1] [,2] [,3]
[1,] 60.5 22.5 16.7
[2,] 46.0 40.1 13.5
[3,] 71.0  8.4 20.5
[4,] 72.8  5.9 20.7
[5,] 58.3 25.5 15.5
[6,] 44.0 42.7 13.7

 



(b) In this example the predictors are the measurements of absorbance at the individual frequencies. Because the frequencies lie in a systematic order (850 to 1050 nm), the predictors have a high degree of correlation, and the data actually lie in a space of lower dimension than the full 100 predictors. Use PCA to determine the effective dimension of these data. What is that dimension?

Principal component analysis

Keep the following five points in mind when applying principal component analysis:

  1. The analysis can start from either the sample covariance matrix or the correlation matrix; the correlation matrix is the more common choice.
  2. To keep the maximum-variance property intact, PCA is usually carried out without rotation.
  3. Components can be retained by three rules: keep the components whose eigenvalue is greater than 1; use a scree plot and keep the components above the biggest bend in the curve; or use parallel analysis, which compares the eigenvalues of the real data with those of simulated data and keeps the components whose real eigenvalues are larger (a sketch of this appears after the code below).
  4. In applied work, explaining 80% of the variance with no more than three to five components is considered satisfactory.
  5. The component scores have maximal variance, and the components are mutually orthogonal and uncorrelated.
PCA <- princomp(absorp, cor = T)  # cor = T bases the PCA on the correlation matrix (unit diagonal, entries in [-1, 1]) rather than the covariance matrix
help(princomp)
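
For point 3 above, here is a minimal parallel-analysis sketch. It is an added illustration assuming the psych package is installed; it was not part of the original analysis.

# parallel analysis: compare the observed eigenvalues against eigenvalues
# obtained from simulated data; fa = "pc" restricts it to principal components
library(psych)
fa.parallel(absorp, fa = "pc")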

There are 100 components in total. summary() reports the importance of each one (its standard deviation, its proportion of variance, and the cumulative proportion).

As the output shows, the first component alone explains about 98.6% of the variance.

> summary(PCA)
Importance of components:
                          Comp.1      Comp.2      Comp.3      Comp.4       Comp.5       Comp.6
Standard deviation     9.9310721 0.984736121 0.528511377 0.338274841 8.037979e-02 5.123077e-02
Proportion of Variance 0.9862619 0.009697052 0.002793243 0.001144299 6.460911e-05 2.624591e-05
Cumulative Proportion  0.9862619 0.995958978 0.998752221 0.999896520 9.999611e-01 9.999874e-01

Each component's standard deviation is the square root of its eigenvalue, so one option is to keep the components whose eigenvalue is greater than 1.

In this example, that rule retains only the first component.

> PCA$sdev
      Comp.1       Comp.2       Comp.3       Comp.4       Comp.5       Comp.6       Comp.7 
9.931072e+00 9.847361e-01 5.285114e-01 3.382748e-01 8.037979e-02 5.123077e-02 2.680884e-02 
      Comp.8       Comp.9      Comp.10      Comp.11      Comp.12      Comp.13      Comp.14 
1.960880e-02 8.564232e-03 6.739417e-03 4.441898e-03 3.360852e-03 1.867188e-03 1.376574e-03 
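
The eigenvalue rule can be checked with one added line (the squared standard deviations are the eigenvalues):

sum(PCA$sdev^2 > 1)  # only Comp.1 has an eigenvalue above 1, so this returns 1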

The scree plot flattens out from the second component onward, which also points to keeping just the first component.

screeplot(PCA,type="lines")

Conclusion: the effective dimension of these data is 1.



(c) Split the data into a training and a test set, preprocess the data, and build each of the models described in this chapter. For the models with tuning parameters, what are the optimal values of those parameters?

Data preprocessing

# data preprocessing
summary(absorp)
summary(endpoints)
# columns that contain missing values
NAcol <- which(colSums(is.na(absorp)) > 0)
# this data set has no missing values

# compute the skewness of each predictor
library(e1071)
summary(apply(absorp, 2, skewness))

# convert the matrix to a data frame
absorp <- as.data.frame(absorp)

# Box-Cox transformation of every predictor
boxcox <- function(x){
  trans <- BoxCoxTrans(x)
  predict(trans, x)
}
absorp.trans <- apply(absorp, 2, boxcox)
absorp.trans <- as.data.frame(absorp.trans)

# the skewness is reduced
summary(apply(absorp.trans, 2, skewness))

# response variable: fat content (column 2 of endpoints)
fat <- endpoints[, 2]
> # compute the skewness of each predictor
> library(e1071)
> summary(apply(absorp,2,skewness) )
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 0.8260  0.8432  0.8946  0.9027  0.9667  0.9976 
> # the skewness is reduced
> summary(apply(absorp.trans,2,skewness) )
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
0.005178 0.021773 0.034949 0.034739 0.046840 0.066915 
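
The exercise also calls for a training/test split. Below is a minimal sketch using caret's createDataPartition; it is a hypothetical addition (the names inTrain, trainX, testX are mine), since the analysis that follows instead compares the models by bootstrap resampling on all 215 samples.

# hypothetical 80/20 split, stratified on the response
set.seed(100)
inTrain <- createDataPartition(fat, p = 0.8, list = FALSE)
trainX <- absorp.trans[inTrain, ];  trainY <- fat[inTrain]
testX  <- absorp.trans[-inTrain, ]; testY  <- fat[-inTrain]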

Model building.

Because this is a small data set (215 samples) and the goal of modeling is to choose among several models (both selecting the optimal parameters within an algorithm and choosing between different algorithms), the bootstrap is used as the resampling method.


# the bootstrap is used for resampling: the sample is small, and the goal is to
# choose among several models (optimal parameters within an algorithm, and
# between different algorithms)

# set the random seed so the resampled data sets are reproducible
set.seed(100)
indx <- createResample(fat, times = 50, list = TRUE)
ctrl <- trainControl(method = "boot", number = 50, index = indx)

 

Linear regression

# ordinary least squares regression
set.seed(100)
lmFit1 <- train(x = absorp.trans, y = fat,
                method = "lm",
                trControl = ctrl,
                preProc = c("center", "scale"))

lmFit1 
> lmFit1
Linear Regression 

215 samples
100 predictors

Pre-processing: centered (100), scaled (100) 
Resampling: Bootstrapped (50 reps) 
Summary of sample sizes: 215, 215, 215, 215, 215, 215, ... 
Resampling results:

  RMSE      Rsquared   MAE    
  2.706693  0.9560697  1.87457

Tuning parameter 'intercept' was held constant at a value of TRUE

 

Partial least squares (PLS)

#PLS
set.seed(100)
plsTune <- train(x = absorp.trans, y = fat,
                 method = "kernelpls",#Dayal和MacGregor的第一种核函数算法kernelpls
                 tuneGrid = expand.grid(ncomp = 1:50),#设定成分数
                 trControl = ctrl,
                 preProc = c("center", "scale"))
plsTune

PLS tuning parameter: the optimal number of components is 20.

> plsTune
Partial Least Squares 

215 samples
100 predictors

Pre-processing: centered (100), scaled (100) 
Resampling: Bootstrapped (50 reps) 
Summary of sample sizes: 215, 215, 215, 215, 215, 215, ... 
Resampling results across tuning parameters:

  ncomp  RMSE       Rsquared   MAE     
   1     11.183174  0.2396493  9.088124
   2      8.686219  0.5457929  6.929797
   3      5.436814  0.8171801  4.017126
   4      4.719667  0.8639848  3.637222
   5      3.183214  0.9391108  2.432778
   6      3.113421  0.9417685  2.407983
   7      2.981929  0.9478784  2.203364
   8      2.803669  0.9537681  2.004770
   9      2.658721  0.9578675  1.851331
  10      2.486691  0.9635027  1.724633
  11      2.291300  0.9688312  1.594113
  12      2.148954  0.9725424  1.527899
  13      2.046004  0.9746878  1.462254
  14      2.019919  0.9752486  1.425622
  15      1.909752  0.9777992  1.336137
  16      1.760398  0.9808560  1.224250
  17      1.666462  0.9829127  1.171777
  18      1.590492  0.9845054  1.134067
  19      1.567033  0.9849643  1.128898
  20      1.534394  0.9855824  1.113200
  21      1.560273  0.9850988  1.119986
  22      1.566204  0.9849703  1.115952
  23      1.553964  0.9851720  1.105929
  24      1.591527  0.9845027  1.118788
  25      1.625377  0.9838303  1.135157
  26      1.658889  0.9831096  1.157611
  27      1.683492  0.9824806  1.172554
  28      1.744393  0.9811685  1.208383
  29      1.795215  0.9800030  1.244794
  30      1.848273  0.9788338  1.286987
  31      1.883307  0.9780175  1.318409
  32      1.938951  0.9767456  1.360510
  33      1.986740  0.9755708  1.392391
  34      2.023170  0.9746550  1.422097
  35      2.071566  0.9734367  1.453087
  36      2.112281  0.9724229  1.478971
  37      2.146216  0.9715039  1.500828
  38      2.175165  0.9708326  1.520763
  39      2.192173  0.9705042  1.536536
  40      2.222708  0.9697809  1.557308
  41      2.246722  0.9692350  1.575675
  42      2.256637  0.9689731  1.587785
  43      2.274497  0.9685442  1.604375
  44      2.306888  0.9677469  1.627217
  45      2.329405  0.9671802  1.644553
  46      2.359832  0.9663816  1.662377
  47      2.374701  0.9659943  1.673540
  48      2.404638  0.9652013  1.691980
  49      2.433347  0.9643316  1.709945
  50      2.449268  0.9638598  1.721255

RMSE was used to select the optimal model using the smallest value.
The final value used for the model was ncomp = 20.
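
The tuning profile is easier to read as a plot; caret's train objects have a plot method (an added illustration, not in the original script):

plot(plsTune, metric = "RMSE")  # RMSE versus number of components; the minimum is at ncomp = 20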

 

Robust regression

 

# robust regression (MASS::rlm via caret)
set.seed(100)
rlmFit <- train(x = absorp.trans, y = fat,
                method = "rlm",
                trControl = ctrl,
                preProc = c("center", "scale"))

rlmFit

The chosen model has an intercept and uses the Huber psi function. (The no-intercept fits do very poorly here: the predictors are centered and scaled but the response is not, so a fit through the origin is badly biased.)

> rlmFit
Robust Linear Model 

215 samples
100 predictors

Pre-processing: centered (100), scaled (100) 
Resampling: Bootstrapped (50 reps) 
Summary of sample sizes: 215, 215, 215, 215, 215, 215, ... 
Resampling results across tuning parameters:

  intercept  psi           RMSE       Rsquared   MAE      
  FALSE      psi.huber     18.145830  0.9560697  17.940133
  FALSE      psi.hampel    18.145830  0.9560697  17.940133
  FALSE      psi.bisquare  18.146153  0.9560864  17.940487
   TRUE      psi.huber      3.220055  0.9385840   2.215829
   TRUE      psi.hampel    25.311035  0.3743814  17.164686
   TRUE      psi.bisquare  76.651845  0.4210313  54.553360

RMSE was used to select the optimal model using the smallest value.
The final values used for the model were intercept = TRUE and psi = psi.huber.


 

Ridge regression

# ridge regression
# use train to select the best ridge penalty
# regularization parameter lambda over the range 0 to 0.1, with 15 values
ridgeGrid <- expand.grid(lambda = seq(0, .1, length = 15))

set.seed(100)
ridgeTune <- train(x = absorp.trans, y = fat,
                   method = "ridge", #岭回归
                   tuneGrid = ridgeGrid,
                   trControl = ctrl,
                   preProc = c("center", "scale"))

ridgeTune

Ridge tuning parameter: lambda = 0, i.e., no regularization; the chosen fit collapses to ordinary least squares, and its RMSE matches lmFit1 above.

> ridgeTune
Ridge Regression 

215 samples
100 predictors

Pre-processing: centered (100), scaled (100) 
Resampling: Bootstrapped (50 reps) 
Summary of sample sizes: 215, 215, 215, 215, 215, 215, ... 
Resampling results across tuning parameters:

  lambda       RMSE      Rsquared   MAE     
  0.000000000  2.706699  0.9560687  1.874579
  0.007142857  3.643592  0.9213753  2.748934
  0.014285714  4.052245  0.9031794  3.011670
  0.021428571  4.316364  0.8907687  3.175379
  0.028571429  4.511024  0.8814221  3.300525
  0.035714286  4.668275  0.8737664  3.406487
  0.042857143  4.803098  0.8670998  3.502220
  0.050000000  4.923200  0.8610413  3.589788
  0.057142857  5.032872  0.8553722  3.672677
  0.064285714  5.134671  0.8499616  3.751602
  0.071428571  5.230211  0.8447286  3.826811
  0.078571429  5.320565  0.8396221  3.898918
  0.085714286  5.406487  0.8346093  3.967995
  0.092857143  5.488526  0.8296690  4.034324
  0.100000000  5.567103  0.8247876  4.099350

RMSE was used to select the optimal model using the smallest value.
The final value used for the model was lambda = 0.

 

Elastic net

# enet: the elastic net
# the elastic net carries both a ridge penalty and a lasso penalty
# lambda is the ridge penalty (lambda = 0 gives a pure lasso model)
# fraction is the lasso penalty, the fraction of the full L1 norm; 20 values over 0.05 to 1
enetGrid <- expand.grid(lambda = seq(0, .1, length = 15), 
                        fraction = seq(.05, 1, length = 20))
set.seed(100)
enetTune <- train(x = absorp.trans, y = fat,
                  method = "enet", #弹性网 elastic net
                  tuneGrid = enetGrid,
                  trControl = ctrl,
                  preProc = c("center", "scale"))
enetTune

Elastic net tuning parameters: ridge penalty lambda = 0 and lasso fraction = 0.0526; with lambda = 0 the chosen model is effectively a pure lasso fit.

> enetTune
Elasticnet 

215 samples
100 predictors

Pre-processing: centered (100), scaled (100) 
Resampling: Bootstrapped (50 reps) 
Summary of sample sizes: 215, 215, 215, 215, 215, 215, ... 
Resampling results across tuning parameters:

  lambda       fraction    RMSE       Rsquared   MAE      
  0.000000000  0.00000000  12.719745        NaN  10.747191
  0.000000000  0.05263158   1.505096  0.9861744   1.072324
  0.000000000  0.10526316   1.611281  0.9839983   1.133129
  0.000000000  0.15789474   1.713409  0.9818137   1.200435
  0.000000000  0.21052632   1.787748  0.9802273   1.257652
  0.000000000  0.26315789   1.844759  0.9790006   1.303476
  0.000000000  0.31578947   1.897734  0.9778433   1.344380
  0.000000000  0.36842105   1.952677  0.9766278   1.383494
  0.000000000  0.42105263   2.008177  0.9753659   1.424576
  0.000000000  0.47368421   2.060634  0.9741002   1.463339
  0.000000000  0.52631579   2.113615  0.9727729   1.502524
  0.000000000  0.57894737   2.170547  0.9713402   1.540915
  0.000000000  0.63157895   2.232797  0.9697258   1.581797
  0.000000000  0.68421053   2.294018  0.9680902   1.620818
  0.000000000  0.73684211   2.357063  0.9663678   1.660460
  0.000000000  0.78947368   2.423098  0.9645125   1.700407
  0.000000000  0.84210526   2.490069  0.9625804   1.740759
  0.000000000  0.89473684   2.560683  0.9604961   1.784247
  0.000000000  0.94736842   2.632378  0.9583456   1.828712
  0.000000000  1.00000000   2.706699  0.9560687   1.874579
  0.007142857  0.00000000  12.719745        NaN  10.747191
  0.007142857  0.05263158  10.350630  0.3671638   8.438978
  0.007142857  0.10526316   9.476790  0.4998352   7.728673
  0.007142857  0.15789474   8.639364  0.6119787   7.045030
  0.007142857  0.21052632   7.846910  0.6977900   6.378826
  0.007142857  0.26315789   7.111837  0.7588304   5.744852
  0.007142857  0.31578947   6.465549  0.7991165   5.168081
  0.007142857  0.36842105   5.913752  0.8265441   4.656087
  0.007142857  0.42105263   5.440989  0.8490667   4.229522
  0.007142857  0.47368421   5.027732  0.8674093   3.858951
  0.007142857  0.52631579   4.688027  0.8811803   3.550859
  0.007142857  0.57894737   4.415170  0.8914804   3.307966
  0.007142857  0.63157895   4.206019  0.8992059   3.136113
  0.007142857  0.68421053   4.051948  0.9049141   3.023897
  0.007142857  0.73684211   3.935221  0.9093661   2.945459
  0.007142857  0.78947368   3.848531  0.9128567   2.888426
  0.007142857  0.84210526   3.779217  0.9157310   2.841966
  0.007142857  0.89473684   3.722209  0.9181026   2.803051
  0.007142857  0.94736842   3.676363  0.9200050   2.771592
  0.007142857  1.00000000   3.643592  0.9213753   2.748934
  0.014285714  0.00000000  12.719745        NaN  10.747191
  0.014285714  0.05263158  10.502349  0.3428964   8.558695
  0.014285714  0.10526316   9.773519  0.4546769   7.961195
  0.014285714  0.15789474   9.069145  0.5553162   7.390326
  0.014285714  0.21052632   8.394427  0.6389904   6.832886
  0.014285714  0.26315789   7.761244  0.7038771   6.296183
  0.014285714  0.31578947   7.168269  0.7528274   5.785871
  0.014285714  0.36842105   6.630843  0.7880543   5.309986
  0.014285714  0.42105263   6.163194  0.8129957   4.881257
  0.014285714  0.47368421   5.767428  0.8322562   4.512357
  0.014285714  0.52631579   5.416732  0.8490222   4.194330
  0.014285714  0.57894737   5.106516  0.8627297   3.908038
  0.014285714  0.63157895   4.841776  0.8735009   3.660274
  0.014285714  0.68421053   4.631100  0.8814848   3.464679
  0.014285714  0.73684211   4.459273  0.8877800   3.312559
  0.014285714  0.78947368   4.326597  0.8925717   3.202747
  0.014285714  0.84210526   4.226622  0.8961885   3.129434
  0.014285714  0.89473684   4.151838  0.8990589   3.077307
  0.014285714  0.94736842   4.095699  0.9013489   3.039711
  0.014285714  1.00000000   4.052245  0.9031794   3.011670
  0.021428571  0.00000000  12.719745        NaN  10.747191
  0.021428571  0.05263158  10.568050  0.3321870   8.607846
  0.021428571  0.10526316   9.903277  0.4339021   8.059825
  0.021428571  0.15789474   9.258979  0.5277223   7.536974
  0.021428571  0.21052632   8.643469  0.6077921   7.031643
  0.021428571  0.26315789   8.056217  0.6732191   6.538404
  0.021428571  0.31578947   7.500502  0.7247124   6.063381
  0.021428571  0.36842105   6.989114  0.7634905   5.620227
  0.021428571  0.42105263   6.534448  0.7915351   5.211965
  0.021428571  0.47368421   6.143486  0.8120566   4.848922
  0.021428571  0.52631579   5.799866  0.8294819   4.527803
  0.021428571  0.57894737   5.491306  0.8443804   4.242645
  0.021428571  0.63157895   5.225475  0.8562590   3.992674
  0.021428571  0.68421053   4.995680  0.8657523   3.773273
  0.021428571  0.73684211   4.802370  0.8732078   3.590165
  0.021428571  0.78947368   4.648047  0.8788861   3.446489
  0.021428571  0.84210526   4.525632  0.8832665   3.338855
  0.021428571  0.89473684   4.430319  0.8866321   3.260002
  0.021428571  0.94736842   4.364849  0.8889374   3.210078
  0.021428571  1.00000000   4.316364  0.8907687   3.175379
  0.028571429  0.00000000  12.719745        NaN  10.747191
  0.028571429  0.05263158  10.605392  0.3259594   8.633546
  0.028571429  0.10526316   9.976188  0.4217673   8.112706
  0.028571429  0.15789474   9.368227  0.5108284   7.618146
  0.028571429  0.21052632   8.784915  0.5884503   7.139689
  0.028571429  0.26315789   8.224602  0.6535386   6.670814
  0.028571429  0.31578947   7.696412  0.7055287   6.221096
  0.028571429  0.36842105   7.207055  0.7456827   5.799405
  0.028571429  0.42105263   6.764782  0.7757525   5.407932
  0.028571429  0.47368421   6.376330  0.7980554   5.051709
  0.028571429  0.52631579   6.038847  0.8158720   4.734050
  0.028571429  0.57894737   5.739066  0.8312751   4.452923
  0.028571429  0.63157895   5.476325  0.8437891   4.205595
  0.028571429  0.68421053   5.242242  0.8541133   3.981219
  0.028571429  0.73684211   5.041635  0.8623182   3.786711
  0.028571429  0.78947368   4.879378  0.8684836   3.631171
  0.028571429  0.84210526   4.745253  0.8734010   3.504712
  0.028571429  0.89473684   4.641796  0.8770309   3.410686
  0.028571429  0.94736842   4.567118  0.8795336   3.345936
  0.028571429  1.00000000   4.511024  0.8814221   3.300525
  0.035714286  0.00000000  12.719745        NaN  10.747191
  0.035714286  0.05263158  10.630389  0.3216962   8.649142
  0.035714286  0.10526316  10.025364  0.4132908   8.146398
  0.035714286  0.15789474   9.441794  0.4988722   7.670647
  0.035714286  0.21052632   8.878988  0.5746934   7.208290
  0.035714286  0.26315789   8.339491  0.6389087   6.756922
  0.035714286  0.31578947   7.831674  0.6908389   6.325532
  0.035714286  0.36842105   7.359193  0.7316532   5.919404
  0.035714286  0.42105263   6.926296  0.7631330   5.539167
  0.035714286  0.47368421   6.543396  0.7867501   5.191419
  0.035714286  0.52631579   6.210901  0.8051696   4.879779
  0.035714286  0.57894737   5.921095  0.8207037   4.605158
  0.035714286  0.63157895   5.658841  0.8338811   4.356554
  0.035714286  0.68421053   5.425099  0.8447735   4.132049
  0.035714286  0.73684211   5.224377  0.8533910   3.935746
  0.035714286  0.78947368   5.056868  0.8600396   3.772063
  0.035714286  0.84210526   4.916623  0.8653073   3.636631
  0.035714286  0.89473684   4.810863  0.8690600   3.536302
  0.035714286  0.94736842   4.729634  0.8717975   3.461235
  0.035714286  1.00000000   4.668275  0.8737664   3.406487
  0.042857143  0.00000000  12.719745        NaN  10.747191
  0.042857143  0.05263158  10.649006  0.3183598   8.659408
  0.042857143  0.10526316  10.063732  0.4065052   8.171419
  0.042857143  0.15789474   9.498332  0.4893522   7.709320
  0.042857143  0.21052632   8.951603  0.5635681   7.259062
  0.042857143  0.26315789   8.429932  0.6267147   6.822851
  0.042857143  0.31578947   7.938112  0.6784272   6.405875
  0.042857143  0.36842105   7.478275  0.7197554   6.010701
  0.042857143  0.42105263   7.053938  0.7521797   5.639166
  0.042857143  0.47368421   6.677302  0.7767562   5.299404
  0.042857143  0.52631579   6.352896  0.7955833   4.996825
  0.042857143  0.57894737   6.066895  0.8115599   4.725789
  0.042857143  0.63157895   5.807544  0.8252168   4.477926
  0.042857143  0.68421053   5.574963  0.8365873   4.253607
  0.042857143  0.73684211   5.375833  0.8455274   4.058461
  0.042857143  0.78947368   5.204039  0.8526667   3.888874
  0.042857143  0.84210526   5.061502  0.8581731   3.748493
  0.042857143  0.89473684   4.954026  0.8620665   3.644632
  0.042857143  0.94736842   4.868042  0.8650243   3.562585
  0.042857143  1.00000000   4.803098  0.8670998   3.502220
  0.050000000  0.00000000  12.719745        NaN  10.747191
  0.050000000  0.05263158  10.664511  0.3155007   8.667079
  0.050000000  0.10526316  10.096683  0.4006075   8.192319
  0.050000000  0.15789474   9.546087  0.4811327   7.741090
  0.050000000  0.21052632   9.014028  0.5537204   7.302158
  0.050000000  0.26315789   8.507619  0.6158259   6.878413
  0.050000000  0.31578947   8.028660  0.6673112   6.472190
  0.050000000  0.36842105   7.579264  0.7090114   6.086293
  0.050000000  0.42105263   7.164037  0.7420437   5.723139
  0.050000000  0.47368421   6.794556  0.7672864   5.391397
  0.050000000  0.52631579   6.476421  0.7866019   5.096395
  0.050000000  0.57894737   6.193359  0.8030704   4.829024
  0.050000000  0.63157895   5.936877  0.8171967   4.583914
  0.050000000  0.68421053   5.707621  0.8288982   4.361443
  0.050000000  0.73684211   5.508621  0.8382434   4.166045
  0.050000000  0.78947368   5.334012  0.8458269   3.992927
  0.050000000  0.84210526   5.191385  0.8515252   3.851408
  0.050000000  0.89473684   5.081095  0.8556455   3.742093
  0.050000000  0.94736842   4.991350  0.8588172   3.655083
  0.050000000  1.00000000   4.923200  0.8610413   3.589788
  0.057142857  0.00000000  12.719745        NaN  10.747191
  0.057142857  0.05263158  10.676963  0.3130233   8.672758
  0.057142857  0.10526316  10.125993  0.3953287   8.210536
  0.057142857  0.15789474   9.588444  0.4737441   7.768832
  0.057142857  0.21052632   9.069954  0.5447150   7.339995
  0.057142857  0.26315789   8.576760  0.6058303   6.926522
  0.057142857  0.31578947   8.109112  0.6570301   6.530300
  0.057142857  0.36842105   7.668838  0.6990048   6.152385
  0.057142857  0.42105263   7.262463  0.7324665   5.797181
  0.057142857  0.47368421   6.901056  0.7581593   5.473870
  0.057142857  0.52631579   6.586697  0.7780815   5.183851
  0.057142857  0.57894737   6.307299  0.7949700   4.921166
  0.057142857  0.63157895   6.053480  0.8095358   4.679131
  0.057142857  0.68421053   5.828208  0.8215064   4.460088
  0.057142857  0.73684211   5.628373  0.8313213   4.263205
  0.057142857  0.78947368   5.452644  0.8392831   4.088233
  0.057142857  0.84210526   5.310478  0.8451827   3.946665
  0.057142857  0.89473684   5.197039  0.8495755   3.833397
  0.057142857  0.94736842   5.104287  0.8529609   3.741933
  0.057142857  1.00000000   5.032872  0.8553722   3.672677
  0.064285714  0.00000000  12.719745        NaN  10.747191
  0.064285714  0.05263158  10.686288  0.3108834   8.676953
  0.064285714  0.10526316  10.152606  0.3905288   8.226772
  0.064285714  0.15789474   9.626873  0.4669835   7.793571
  0.064285714  0.21052632   9.120859  0.5363797   7.373740
  0.064285714  0.26315789   8.639160  0.5965806   6.969417
  0.064285714  0.31578947   8.181790  0.6474532   6.582130
  0.064285714  0.36842105   7.750265  0.6895483   6.211952
  0.064285714  0.42105263   7.352325  0.7233143   5.864293
  0.064285714  0.47368421   6.998582  0.7493784   5.548492
  0.064285714  0.52631579   6.687170  0.7698967   5.262544
  0.064285714  0.57894737   6.411479  0.7871715   5.004514
  0.064285714  0.63157895   6.161145  0.8020894   4.766387
  0.064285714  0.68421053   5.938954  0.8143581   4.551033
  0.064285714  0.73684211   5.738487  0.8246301   4.353021
  0.064285714  0.78947368   5.562997  0.8329097   4.177992
  0.064285714  0.84210526   5.420938  0.8390481   4.036234
  0.064285714  0.89473684   5.304599  0.8437340   3.919402
  0.064285714  0.94736842   5.209377  0.8473351   3.824781
  0.064285714  1.00000000   5.134671  0.8499616   3.751602
 [ reached getOption("max.print") -- omitted 100 rows ]

RMSE was used to select the optimal model using the smallest value.
The final values used for the model were fraction = 0.05263158 and lambda = 0.
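
To see how sparse the chosen solution is, the coefficients of the final fit can be extracted at the selected fraction. This is an added sketch using predict.enet from the elasticnet package (the name enetCoef is mine):

# coefficients of the final enet fit at the chosen point on the lasso path
enetCoef <- predict(enetTune$finalModel, s = 0.05263158,
                    mode = "fraction", type = "coefficients")
sum(enetCoef$coefficients != 0)  # number of predictors retained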


 

(d) Which model has the best predictive ability? Is any model significantly better or worse than the others?

Comparing the models:

caret's resamples function collects and visualizes resampling results (the models must have been fit with train).

For each algorithm, the object being compared is the final model, i.e., the one with the smallest RMSE. Because the bootstrap was run with 50 resamples, each algorithm's final model has 50 resampled results.

# model comparison
# resamples collects and visualizes the resampling results
resamp <- resamples( list(lm=lmFit1,rlm=rlmFit,pls=plsTune,ridge=ridgeTune,enet=enetTune) )
summary(resamp) 

The mean RMSE is smallest for pls and enet, so these two models predict best; rlm has the highest mean RMSE and predicts worst.

> summary(resamp)

Call:
summary.resamples(object = resamp)

Models: lm, rlm, pls, ridge, enet 
Number of resamples: 50 

MAE 
           Min.   1st Qu.   Median     Mean  3rd Qu.     Max. NA's
lm    1.3090300 1.6211047 1.879747 1.874570 2.024030 2.989562    0
rlm   1.4045612 1.9200873 2.183430 2.215829 2.456707 3.715148    0
pls   0.8375685 1.0323470 1.101890 1.113200 1.180808 1.440820    0
ridge 1.3089927 1.6211901 1.879508 1.874579 2.024011 2.990832    0
enet  0.8195073 0.9752678 1.076620 1.072324 1.156561 1.291045    0

RMSE 
          Min.  1st Qu.   Median     Mean  3rd Qu.     Max. NA's
lm    1.705850 2.286390 2.678425 2.706693 3.014897 4.458380    0
rlm   1.886688 2.693024 3.180843 3.220055 3.576437 5.347496    0
pls   1.068864 1.370754 1.503226 1.534394 1.646459 2.127759    0
ridge 1.705678 2.286423 2.678400 2.706699 3.014607 4.460421    0
enet  1.109904 1.346236 1.484470 1.505096 1.661772 2.002391    0

Rsquared 
           Min.   1st Qu.    Median      Mean   3rd Qu.      Max. NA's
lm    0.8969856 0.9481050 0.9575495 0.9560697 0.9703579 0.9829345    0
rlm   0.8266755 0.9283558 0.9418637 0.9385840 0.9597848 0.9779785    0
pls   0.9708559 0.9823688 0.9869358 0.9855824 0.9888411 0.9926647    0
ridge 0.8969050 0.9481031 0.9575498 0.9560687 0.9703573 0.9829340    0
enet  0.9744586 0.9833999 0.9869512 0.9861744 0.9887381 0.9930981    0

Plot the confidence interval of each model's mean RMSE.

dotplot( resamp, metric="RMSE" )

The plot shows directly that enet and pls perform best and are nearly indistinguishable, while rlm performs worst.
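
A box-and-whisker view of the same resampling distributions is also available (an added alternative; caret provides lattice methods for resamples objects):

bwplot(resamp, metric = "RMSE")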

Pairwise differences can be tested with the diff function.

summary(diff(resamp))

Upper diagonal: estimates of the differences.

Lower diagonal: p-values (Bonferroni adjusted).

Reading the RMSE table together with the differences and p-values: rlm is significantly worse than every other model, and enet and pls are significantly better than lm, rlm, and ridge; enet and pls, however, are not significantly different from each other (p = 0.29).

> summary(diff(resamp))

Call:
summary.diff.resamples(object = diff(resamp))

p-value adjustment: bonferroni 
Upper diagonal: estimates of the difference
Lower diagonal: p-value for H0: difference = 0

MAE 
      lm        rlm        pls        ridge      enet      
lm              -3.413e-01  7.614e-01 -8.713e-06  8.022e-01
rlm   1.605e-11             1.103e+00  3.413e-01  1.144e+00
pls   < 2.2e-16 < 2.2e-16             -7.614e-01  4.088e-02
ridge 1         1.591e-11  < 2.2e-16              8.023e-01
enet  < 2.2e-16 < 2.2e-16  1.420e-05  < 2.2e-16            

RMSE 
      lm        rlm        pls        ridge      enet      
lm              -5.134e-01  1.172e+00 -6.918e-06  1.202e+00
rlm   2.097e-11             1.686e+00  5.134e-01  1.715e+00
pls   < 2.2e-16 < 2.2e-16             -1.172e+00  2.930e-02
ridge 1.0000    2.082e-11  < 2.2e-16              1.202e+00
enet  < 2.2e-16 < 2.2e-16  0.2861     < 2.2e-16            

Rsquared 
      lm        rlm        pls        ridge      enet      
lm               1.749e-02 -2.951e-02  9.420e-07 -3.010e-02
rlm   5.945e-09            -4.700e-02 -1.748e-02 -4.759e-02
pls   1.224e-14 1.451e-14              2.951e-02 -5.920e-04
ridge 1.0000    5.896e-09  1.238e-14             -3.011e-02
enet  1.679e-15 5.609e-15  0.3368     1.698e-15    


 

(e) Explain which model you would use to predict the fat content of a sample.

I would choose the elastic net: it has the lowest resampled RMSE, is significantly better than lm, rlm, and ridge, and is statistically indistinguishable from PLS while keeping only a small fraction of the full least-squares solution (fraction = 0.0526).
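
As a usage sketch (an added illustration), the fitted train object predicts fat for new absorbance spectra directly, with the centering and scaling applied automatically; on a genuinely held-out test set, caret's postResample would summarize the accuracy:

# predict fat content for new (Box-Cox transformed) spectra
newPred <- predict(enetTune, newdata = absorp.trans[1:5, ])
# with the models refit on the training split sketched earlier:
# postResample(pred = predict(enetTune, testX), obs = testY)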
