Models: multiple linear regression, robust regression, partial least squares (PLS), ridge regression, the lasso, and the elastic net
Language: R
Reference: Applied Predictive Modeling (2013) by Max Kuhn and Kjell Johnson (Chinese translation by 林荟 et al.)
Exercise:
(b) In this example the predictors are measurements of absorbance at individual frequencies. Because the frequencies lie in a systematic order (850-1050 nm), the predictors are highly correlated, and the data actually lie in a space of lower dimension than the full 100. Use PCA to determine the effective dimension of these data. What is it?
(c) Split the data into a training set and a test set, pre-process the data, and build models using the techniques described in this chapter. For models with tuning parameters, what are the optimal values?
(d) Which model has the best predictive ability? Is any model significantly better or worse than the others?
(e) Explain which model you would use for predicting the fat content of a sample.
Load the data
library(caret)
# load the data
data(tecator)
head(absorp)
head(endpoints)
> # load the data
> data(tecator)
> head(absorp)
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11]
[1,] 2.61776 2.61814 2.61859 2.61912 2.61981 2.62071 2.62186 2.62334 2.62511 2.62722 2.62964
[2,] 2.83454 2.83871 2.84283 2.84705 2.85138 2.85587 2.86060 2.86566 2.87093 2.87661 2.88264
[3,] 2.58284 2.58458 2.58629 2.58808 2.58996 2.59192 2.59401 2.59627 2.59873 2.60131 2.60414
[4,] 2.82286 2.82460 2.82630 2.82814 2.83001 2.83192 2.83392 2.83606 2.83842 2.84097 2.84374
[5,] 2.78813 2.78989 2.79167 2.79350 2.79538 2.79746 2.79984 2.80254 2.80553 2.80890 2.81272
[6,] 3.00993 3.01540 3.02086 3.02634 3.03190 3.03756 3.04341 3.04955 3.05599 3.06274 3.06982
[,12] [,13] [,14] [,15] [,16] [,17] [,18] [,19] [,20] [,21] [,22]
[1,] 2.63245 2.63565 2.63933 2.64353 2.64825 2.65350 2.65937 2.66585 2.67281 2.68008 2.68733
[2,] 2.88898 2.89577 2.90308 2.91097 2.91953 2.92873 2.93863 2.94929 2.96072 2.97272 2.98493
[3,] 2.60714 2.61029 2.61361 2.61714 2.62089 2.62486 2.62909 2.63361 2.63835 2.64330 2.64838
[4,] 2.84664 2.84975 2.85307 2.85661 2.86038 2.86437 2.86860 2.87308 2.87789 2.88301 2.88832
[5,] 2.81704 2.82184 2.82710 2.83294 2.83945 2.84664 2.85458 2.86331 2.87280 2.88291 2.89335
[6,] 3.07724 3.08511 3.09343 3.10231 3.11185 3.12205 3.13294 3.14457 3.15703 3.17038 3.18429
[,23] [,24] [,25] [,26] [,27] [,28] [,29] [,30] [,31] [,32] [,33]
[1,] 2.69427 2.70073 2.70684 2.71281 2.71914 2.72628 2.73462 2.74416 2.75466 2.76568 2.77679
[2,] 2.99690 3.00833 3.01920 3.02990 3.04101 3.05345 3.06777 3.08416 3.10221 3.12106 3.13983
[3,] 2.65354 2.65870 2.66375 2.66880 2.67383 2.67892 2.68411 2.68937 2.69470 2.70012 2.70563
[4,] 2.89374 2.89917 2.90457 2.90991 2.91521 2.92043 2.92565 2.93082 2.93604 2.94128 2.94658
[5,] 2.90374 2.91371 2.92305 2.93187 2.94060 2.94986 2.96035 2.97241 2.98606 3.00097 3.01652
[6,] 3.19840 3.21225 3.22552 3.23827 3.25084 3.26393 3.27851 3.29514 3.31401 3.33458 3.35591
[,34] [,35] [,36] [,37] [,38] [,39] [,40] [,41] [,42] [,43] [,44]
[1,] 2.78790 2.79949 2.81225 2.82706 2.84356 2.86106 2.87857 2.89497 2.90924 2.92085 2.93015
[2,] 3.15810 3.17623 3.19519 3.21584 3.23747 3.25889 3.27835 3.29384 3.30362 3.30681 3.30393
[3,] 2.71141 2.71775 2.72490 2.73344 2.74327 2.75433 2.76642 2.77931 2.79272 2.80649 2.82064
[4,] 2.95202 2.95777 2.96419 2.97159 2.98045 2.99090 3.00284 3.01611 3.03048 3.04579 3.06194
[5,] 3.03220 3.04793 3.06413 3.08153 3.10078 3.12185 3.14371 3.16510 3.18470 3.20140 3.21477
[6,] 3.37709 3.39772 3.41828 3.43974 3.46266 3.48663 3.51002 3.53087 3.54711 3.55699 3.55986
[,45] [,46] [,47] [,48] [,49] [,50] [,51] [,52] [,53] [,54] [,55]
[1,] 2.93846 2.94771 2.96019 2.97831 3.00306 3.03506 3.07428 3.11963 3.16868 3.21771 3.26254
[2,] 3.29700 3.28925 3.28409 3.28505 3.29326 3.30923 3.33267 3.36251 3.39661 3.43188 3.46492
[3,] 2.83541 2.85121 2.86872 2.88905 2.91289 2.94088 2.97325 3.00946 3.04780 3.08554 3.11947
[4,] 3.07889 3.09686 3.11629 3.13775 3.16217 3.19068 3.22376 3.26172 3.30379 3.34793 3.39093
[5,] 3.22544 3.23505 3.24586 3.26027 3.28063 3.30889 3.34543 3.39019 3.44198 3.49800 3.55407
[6,] 3.55656 3.54937 3.54169 3.53692 3.53823 3.54760 3.56512 3.59043 3.62229 3.65830 3.69515
[,56] [,57] [,58] [,59] [,60] [,61] [,62] [,63] [,64] [,65] [,66]
[1,] 3.29988 3.32847 3.34899 3.36342 3.37379 3.38152 3.38741 3.39164 3.39418 3.39490 3.39366
[2,] 3.49295 3.51458 3.53004 3.54067 3.54797 3.55306 3.55675 3.55921 3.56045 3.56034 3.55876
[3,] 3.14696 3.16677 3.17938 3.18631 3.18924 3.18950 3.18801 3.18498 3.18039 3.17411 3.16611
[4,] 3.42920 3.45998 3.48227 3.49687 3.50558 3.51026 3.51221 3.51215 3.51036 3.50682 3.50140
[5,] 3.60534 3.64789 3.68011 3.70272 3.71815 3.72863 3.73574 3.74059 3.74357 3.74453 3.74336
[6,] 3.72932 3.75803 3.78003 3.79560 3.80614 3.81313 3.81774 3.82079 3.82258 3.82301 3.82206
[,67] [,68] [,69] [,70] [,71] [,72] [,73] [,74] [,75] [,76] [,77]
[1,] 3.39045 3.38541 3.37869 3.37041 3.36073 3.34979 3.33769 3.32443 3.31013 3.29487 3.27891
[2,] 3.55571 3.55132 3.54585 3.53950 3.53235 3.52442 3.51583 3.50668 3.49700 3.48683 3.47626
[3,] 3.15641 3.14512 3.13241 3.11843 3.10329 3.08714 3.07014 3.05237 3.03393 3.01504 2.99569
[4,] 3.49398 3.48457 3.47333 3.46041 3.44595 3.43005 3.41285 3.39450 3.37511 3.35482 3.33376
[5,] 3.73991 3.73418 3.72638 3.71676 3.70553 3.69289 3.67900 3.66396 3.64785 3.63085 3.61305
[6,] 3.81959 3.81557 3.81021 3.80375 3.79642 3.78835 3.77958 3.77024 3.76040 3.75005 3.73929
[,78] [,79] [,80] [,81] [,82] [,83] [,84] [,85] [,86] [,87] [,88]
[1,] 3.26232 3.24542 3.22828 3.21080 3.19287 3.17433 3.15503 3.13475 3.11339 3.09116 3.06850
[2,] 3.46552 3.45501 3.44481 3.43477 3.42465 3.41419 3.40303 3.39082 3.37731 3.36265 3.34745
[3,] 2.97612 2.95642 2.93660 2.91667 2.89655 2.87622 2.85563 2.83474 2.81361 2.79235 2.77113
[4,] 3.31204 3.28986 3.26730 3.24442 3.22117 3.19757 3.17357 3.14915 3.12429 3.09908 3.07366
[5,] 3.59463 3.57582 3.55695 3.53796 3.51880 3.49936 3.47938 3.45869 3.43711 3.41458 3.39129
[6,] 3.72831 3.71738 3.70681 3.69664 3.68659 3.67649 3.66611 3.65503 3.64283 3.62938 3.61483
[,89] [,90] [,91] [,92] [,93] [,94] [,95] [,96] [,97] [,98] [,99]
[1,] 3.04596 3.02393 3.00247 2.98145 2.96072 2.94013 2.91978 2.89966 2.87964 2.85960 2.83940
[2,] 3.33245 3.31818 3.30473 3.29186 3.27921 3.26655 3.25369 3.24045 3.22659 3.21181 3.19600
[3,] 2.75015 2.72956 2.70934 2.68951 2.67009 2.65112 2.63262 2.61461 2.59718 2.58034 2.56404
[4,] 3.04825 3.02308 2.99820 2.97367 2.94951 2.92576 2.90251 2.87988 2.85794 2.83672 2.81617
[5,] 3.36772 3.34450 3.32201 3.30025 3.27907 3.25831 3.23784 3.21765 3.19766 3.17770 3.15770
[6,] 3.59990 3.58535 3.57163 3.55877 3.54651 3.53442 3.52221 3.50972 3.49682 3.48325 3.46870
[,100]
[1,] 2.81920
[2,] 3.17942
[3,] 2.54816
[4,] 2.79622
[5,] 3.13753
[6,] 3.45307
> head(endpoints)
[,1] [,2] [,3]
[1,] 60.5 22.5 16.7
[2,] 46.0 40.1 13.5
[3,] 71.0 8.4 20.5
[4,] 72.8 5.9 20.7
[5,] 58.3 25.5 15.5
[6,] 44.0 42.7 13.7
(b) In this example the predictors are measurements of absorbance at individual frequencies. Because the frequencies lie in a systematic order (850-1050 nm), the predictors are highly correlated, and the data actually lie in a space of lower dimension than the full 100. Use PCA to determine the effective dimension of these data. What is it?
Principal component analysis
Five points to keep in mind when applying PCA:
- The analysis can start from either the sample covariance matrix or the correlation matrix; in practice the correlation matrix is used most often;
- To maximize the variance captured, the components are usually left unrotated;
- Component retention (three common criteria: 1. keep components with eigenvalues greater than 1; 2. the scree plot: keep the components above the point where the curve flattens out; 3. parallel analysis: compare the eigenvalues of the real data with those of simulated data, and keep the components whose real eigenvalues exceed the simulated ones; see the sketch after this list);
- In practice, explaining 80% of the variance with no more than three to five components is considered satisfactory;
- The component scores have maximal variance, and the components are mutually independent (orthogonal).
PCA = princomp(absorp, cor = TRUE)  # cor = TRUE: analyze the correlation matrix (entries in [-1, 1], diagonal of 1) rather than the covariance matrix
help(princomp)
There are 100 components in total. Inspect their importance (standard deviation, proportion of variance explained, and cumulative proportion).
The first component alone explains 98.62% of the variance.
> summary(PCA)
Importance of components:
Comp.1 Comp.2 Comp.3 Comp.4 Comp.5 Comp.6
Standard deviation 9.9310721 0.984736121 0.528511377 0.338274841 8.037979e-02 5.123077e-02
Proportion of Variance 0.9862619 0.009697052 0.002793243 0.001144299 6.460911e-05 2.624591e-05
Cumulative Proportion 0.9862619 0.995958978 0.998752221 0.999896520 9.999611e-01 9.999874e-01
The standard deviation of each component is the square root of its eigenvalue, so one option is to retain the components whose eigenvalues exceed 1.
In this example, that means keeping only the first component.
> PCA$sdev
Comp.1 Comp.2 Comp.3 Comp.4 Comp.5 Comp.6 Comp.7
9.931072e+00 9.847361e-01 5.285114e-01 3.382748e-01 8.037979e-02 5.123077e-02 2.680884e-02
Comp.8 Comp.9 Comp.10 Comp.11 Comp.12 Comp.13 Comp.14
1.960880e-02 8.564232e-03 6.739417e-03 4.441898e-03 3.360852e-03 1.867188e-03 1.376574e-03
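The eigenvalue-greater-than-1 rule can be checked in one line from the PCA object above:
# eigenvalues are sdev^2; count how many exceed 1
sum(PCA$sdev^2 > 1)  # 1: only the first component qualifies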
The scree plot shows the curve flattening out from the second component onward, which again points to keeping only the first component.
screeplot(PCA, type = "lines")
Conclusion: the effective dimension of these data is 1.
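An alternative reading of "effective dimension" is the number of components needed to reach a given cumulative proportion of variance; a short check against the summary above (the thresholds are illustrative):
# cumulative proportion of variance explained
cum.var <- cumsum(PCA$sdev^2) / sum(PCA$sdev^2)
which(cum.var >= 0.99)[1]   # 2 components reach 99%
which(cum.var >= 0.999)[1]  # 4 components reach 99.9%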
(c) Split the data into a training set and a test set, pre-process the data, and build models using the techniques described in this chapter. For models with tuning parameters, what are the optimal values?
Data pre-processing
# pre-process the data
summary(absorp)
summary(endpoints)
# locate any columns that contain missing values
NAcol <- which(colSums(is.na(absorp)) > 0)
# this data set has no missing values
# compute the skewness of each predictor
library(e1071)
summary(apply(absorp, 2, skewness))
# convert the matrix to a data frame
absorp <- as.data.frame(absorp)
# Box-Cox transformation of each predictor (BoxCoxTrans() is from caret)
boxcox <- function(x) {
  trans <- BoxCoxTrans(x)  # estimate lambda for this predictor
  predict(trans, x)        # return the transformed values
}
absorp.trans <- apply(absorp, 2, boxcox)
absorp.trans <- as.data.frame(absorp.trans)
# skewness is reduced after the transformation
summary(apply(absorp.trans, 2, skewness))
# response variable: fat content (second column of endpoints)
fat <- endpoints[, 2]
> # compute the skewness of each predictor
> library(e1071)
> summary(apply(absorp, 2, skewness))
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.8260 0.8432 0.8946 0.9027 0.9667 0.9976
> # skewness is reduced after the transformation
> summary(apply(absorp.trans, 2, skewness))
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.005178 0.021773 0.034949 0.034739 0.046840 0.066915
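Question (c) asks for a training/test split, but the analysis below evaluates every model on all 215 samples via bootstrap resampling instead. For reference, a minimal stratified split with caret (the 75% proportion is illustrative; these objects are not used below) would be:
# hypothetical 75/25 split, stratified on the response
set.seed(100)
inTrain <- createDataPartition(fat, p = 0.75, list = FALSE)
absorp.train <- absorp.trans[inTrain, ];  fat.train <- fat[inTrain]
absorp.test  <- absorp.trans[-inTrain, ]; fat.test  <- fat[-inTrain]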
Model building.
Because this is a small data set (215 samples) and the goal is to choose among several models (both selecting the optimal tuning parameter within an algorithm and comparing across algorithms), the bootstrap is used as the resampling method.
# resampling: the bootstrap (small sample; goal is model selection)
# set the random seed so the resampled data sets are reproducible
set.seed(100)
indx <- createResample(fat, times = 50, list = TRUE)
ctrl <- trainControl(method = "boot", number = 50, index = indx)
Linear regression
# ordinary linear regression
set.seed(100)
lmFit1 <- train(x = absorp.trans, y = fat,
method = "lm",
trControl = ctrl,
preProc = c("center", "scale"))
lmFit1
> lmFit1
Linear Regression
215 samples
100 predictors
Pre-processing: centered (100), scaled (100)
Resampling: Bootstrapped (50 reps)
Summary of sample sizes: 215, 215, 215, 215, 215, 215, ...
Resampling results:
RMSE Rsquared MAE
2.706693 0.9560697 1.87457
Tuning parameter 'intercept' was held constant at a value of TRUE
Partial least squares (PLS)
# PLS
set.seed(100)
plsTune <- train(x = absorp.trans, y = fat,
method = "kernelpls",#Dayal和MacGregor的第一种核函数算法kernelpls
tuneGrid = expand.grid(ncomp = 1:50),#设定成分数
trControl = ctrl,
preProc = c("center", "scale"))
plsTune
PLS tuning parameter: the optimal number of components is 20.
> plsTune
Partial Least Squares
215 samples
100 predictors
Pre-processing: centered (100), scaled (100)
Resampling: Bootstrapped (50 reps)
Summary of sample sizes: 215, 215, 215, 215, 215, 215, ...
Resampling results across tuning parameters:
ncomp RMSE Rsquared MAE
1 11.183174 0.2396493 9.088124
2 8.686219 0.5457929 6.929797
3 5.436814 0.8171801 4.017126
4 4.719667 0.8639848 3.637222
5 3.183214 0.9391108 2.432778
6 3.113421 0.9417685 2.407983
7 2.981929 0.9478784 2.203364
8 2.803669 0.9537681 2.004770
9 2.658721 0.9578675 1.851331
10 2.486691 0.9635027 1.724633
11 2.291300 0.9688312 1.594113
12 2.148954 0.9725424 1.527899
13 2.046004 0.9746878 1.462254
14 2.019919 0.9752486 1.425622
15 1.909752 0.9777992 1.336137
16 1.760398 0.9808560 1.224250
17 1.666462 0.9829127 1.171777
18 1.590492 0.9845054 1.134067
19 1.567033 0.9849643 1.128898
20 1.534394 0.9855824 1.113200
21 1.560273 0.9850988 1.119986
22 1.566204 0.9849703 1.115952
23 1.553964 0.9851720 1.105929
24 1.591527 0.9845027 1.118788
25 1.625377 0.9838303 1.135157
26 1.658889 0.9831096 1.157611
27 1.683492 0.9824806 1.172554
28 1.744393 0.9811685 1.208383
29 1.795215 0.9800030 1.244794
30 1.848273 0.9788338 1.286987
31 1.883307 0.9780175 1.318409
32 1.938951 0.9767456 1.360510
33 1.986740 0.9755708 1.392391
34 2.023170 0.9746550 1.422097
35 2.071566 0.9734367 1.453087
36 2.112281 0.9724229 1.478971
37 2.146216 0.9715039 1.500828
38 2.175165 0.9708326 1.520763
39 2.192173 0.9705042 1.536536
40 2.222708 0.9697809 1.557308
41 2.246722 0.9692350 1.575675
42 2.256637 0.9689731 1.587785
43 2.274497 0.9685442 1.604375
44 2.306888 0.9677469 1.627217
45 2.329405 0.9671802 1.644553
46 2.359832 0.9663816 1.662377
47 2.374701 0.9659943 1.673540
48 2.404638 0.9652013 1.691980
49 2.433347 0.9643316 1.709945
50 2.449268 0.9638598 1.721255
RMSE was used to select the optimal model using the smallest value.
The final value used for the model was ncomp = 20.
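The resampling profile over the number of components can be drawn directly from the train object:
# RMSE profile across ncomp; the curve bottoms out at 20 components
plot(plsTune, metric = "RMSE")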
Robust regression
# robust regression
set.seed(100)
rlmFit <- train(x = absorp.trans, y = fat,
method = "rlm",
trControl = ctrl,
preProc = c("center", "scale"))
rlmFit
The selected model has an intercept and uses Huber's psi function.
> rlmFit
Robust Linear Model
215 samples
100 predictors
Pre-processing: centered (100), scaled (100)
Resampling: Bootstrapped (50 reps)
Summary of sample sizes: 215, 215, 215, 215, 215, 215, ...
Resampling results across tuning parameters:
intercept psi RMSE Rsquared MAE
FALSE psi.huber 18.145830 0.9560697 17.940133
FALSE psi.hampel 18.145830 0.9560697 17.940133
FALSE psi.bisquare 18.146153 0.9560864 17.940487
TRUE psi.huber 3.220055 0.9385840 2.215829
TRUE psi.hampel 25.311035 0.3743814 17.164686
TRUE psi.bisquare 76.651845 0.4210313 54.553360
RMSE was used to select the optimal model using the smallest value.
The final values used for the model were intercept = TRUE and psi = psi.huber.
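caret's "rlm" method wraps MASS::rlm, so the winning configuration can also be fit directly. A rough sketch that skips caret's centering and scaling, and raises maxit because the strongly collinear predictors can slow convergence (the fit may still emit convergence warnings):
library(MASS)
# robust M-estimation with an intercept and Huber's psi, as selected above
rlmDirect <- rlm(fat ~ ., data = data.frame(absorp.trans, fat = fat),
                 psi = psi.huber, maxit = 100)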
Ridge regression
# ridge regression
# use train() to select the optimal ridge tuning parameter
# regularization parameter lambda: 15 values between 0 and 0.1
ridgeGrid <- expand.grid(lambda = seq(0, .1, length = 15))
set.seed(100)
ridgeTune <- train(x = absorp.trans, y = fat,
method = "ridge", #岭回归
tuneGrid = ridgeGrid,
trControl = ctrl,
preProc = c("center", "scale"))
ridgeTune
Ridge tuning parameter: lambda = 0, i.e. no regularization; the fit reduces to ordinary least squares.
> ridgeTune
Ridge Regression
215 samples
100 predictors
Pre-processing: centered (100), scaled (100)
Resampling: Bootstrapped (50 reps)
Summary of sample sizes: 215, 215, 215, 215, 215, 215, ...
Resampling results across tuning parameters:
lambda RMSE Rsquared MAE
0.000000000 2.706699 0.9560687 1.874579
0.007142857 3.643592 0.9213753 2.748934
0.014285714 4.052245 0.9031794 3.011670
0.021428571 4.316364 0.8907687 3.175379
0.028571429 4.511024 0.8814221 3.300525
0.035714286 4.668275 0.8737664 3.406487
0.042857143 4.803098 0.8670998 3.502220
0.050000000 4.923200 0.8610413 3.589788
0.057142857 5.032872 0.8553722 3.672677
0.064285714 5.134671 0.8499616 3.751602
0.071428571 5.230211 0.8447286 3.826811
0.078571429 5.320565 0.8396221 3.898918
0.085714286 5.406487 0.8346093 3.967995
0.092857143 5.488526 0.8296690 4.034324
0.100000000 5.567103 0.8247876 4.099350
RMSE was used to select the optimal model using the smallest value.
The final value used for the model was lambda = 0.
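With lambda = 0 the ridge fit reduces to ordinary least squares, which is why its resampled RMSE (2.7067) essentially reproduces lmFit1's. The selected value and its performance row can be pulled from the train object:
ridgeTune$bestTune  # lambda = 0
subset(ridgeTune$results, lambda == ridgeTune$bestTune$lambda)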
Elastic net
# enet: the elastic net
# the elastic net carries both a ridge penalty and a lasso penalty
# lambda is the ridge penalty (lambda = 0 gives a pure lasso model)
# fraction is the lasso penalty: 20 values between 0.05 and 1
enetGrid <- expand.grid(lambda = seq(0, .1, length = 15),
fraction = seq(.05, 1, length = 20))
set.seed(100)
enetTune <- train(x = absorp.trans, y = fat,
method = "enet", #弹性网 elastic net
tuneGrid = enetGrid,
trControl = ctrl,
preProc = c("center", "scale"))
enetTune
Elastic net tuning parameters: ridge penalty lambda = 0 and lasso fraction = 0.0526, i.e. a pure lasso fit.
> enetTune
Elasticnet
215 samples
100 predictors
Pre-processing: centered (100), scaled (100)
Resampling: Bootstrapped (50 reps)
Summary of sample sizes: 215, 215, 215, 215, 215, 215, ...
Resampling results across tuning parameters:
lambda fraction RMSE Rsquared MAE
0.000000000 0.00000000 12.719745 NaN 10.747191
0.000000000 0.05263158 1.505096 0.9861744 1.072324
0.000000000 0.10526316 1.611281 0.9839983 1.133129
0.000000000 0.15789474 1.713409 0.9818137 1.200435
0.000000000 0.21052632 1.787748 0.9802273 1.257652
0.000000000 0.26315789 1.844759 0.9790006 1.303476
0.000000000 0.31578947 1.897734 0.9778433 1.344380
0.000000000 0.36842105 1.952677 0.9766278 1.383494
0.000000000 0.42105263 2.008177 0.9753659 1.424576
0.000000000 0.47368421 2.060634 0.9741002 1.463339
0.000000000 0.52631579 2.113615 0.9727729 1.502524
0.000000000 0.57894737 2.170547 0.9713402 1.540915
0.000000000 0.63157895 2.232797 0.9697258 1.581797
0.000000000 0.68421053 2.294018 0.9680902 1.620818
0.000000000 0.73684211 2.357063 0.9663678 1.660460
0.000000000 0.78947368 2.423098 0.9645125 1.700407
0.000000000 0.84210526 2.490069 0.9625804 1.740759
0.000000000 0.89473684 2.560683 0.9604961 1.784247
0.000000000 0.94736842 2.632378 0.9583456 1.828712
0.000000000 1.00000000 2.706699 0.9560687 1.874579
0.007142857 0.00000000 12.719745 NaN 10.747191
0.007142857 0.05263158 10.350630 0.3671638 8.438978
0.007142857 0.10526316 9.476790 0.4998352 7.728673
0.007142857 0.15789474 8.639364 0.6119787 7.045030
0.007142857 0.21052632 7.846910 0.6977900 6.378826
0.007142857 0.26315789 7.111837 0.7588304 5.744852
0.007142857 0.31578947 6.465549 0.7991165 5.168081
0.007142857 0.36842105 5.913752 0.8265441 4.656087
0.007142857 0.42105263 5.440989 0.8490667 4.229522
0.007142857 0.47368421 5.027732 0.8674093 3.858951
0.007142857 0.52631579 4.688027 0.8811803 3.550859
0.007142857 0.57894737 4.415170 0.8914804 3.307966
0.007142857 0.63157895 4.206019 0.8992059 3.136113
0.007142857 0.68421053 4.051948 0.9049141 3.023897
0.007142857 0.73684211 3.935221 0.9093661 2.945459
0.007142857 0.78947368 3.848531 0.9128567 2.888426
0.007142857 0.84210526 3.779217 0.9157310 2.841966
0.007142857 0.89473684 3.722209 0.9181026 2.803051
0.007142857 0.94736842 3.676363 0.9200050 2.771592
0.007142857 1.00000000 3.643592 0.9213753 2.748934
0.014285714 0.00000000 12.719745 NaN 10.747191
0.014285714 0.05263158 10.502349 0.3428964 8.558695
0.014285714 0.10526316 9.773519 0.4546769 7.961195
0.014285714 0.15789474 9.069145 0.5553162 7.390326
0.014285714 0.21052632 8.394427 0.6389904 6.832886
0.014285714 0.26315789 7.761244 0.7038771 6.296183
0.014285714 0.31578947 7.168269 0.7528274 5.785871
0.014285714 0.36842105 6.630843 0.7880543 5.309986
0.014285714 0.42105263 6.163194 0.8129957 4.881257
0.014285714 0.47368421 5.767428 0.8322562 4.512357
0.014285714 0.52631579 5.416732 0.8490222 4.194330
0.014285714 0.57894737 5.106516 0.8627297 3.908038
0.014285714 0.63157895 4.841776 0.8735009 3.660274
0.014285714 0.68421053 4.631100 0.8814848 3.464679
0.014285714 0.73684211 4.459273 0.8877800 3.312559
0.014285714 0.78947368 4.326597 0.8925717 3.202747
0.014285714 0.84210526 4.226622 0.8961885 3.129434
0.014285714 0.89473684 4.151838 0.8990589 3.077307
0.014285714 0.94736842 4.095699 0.9013489 3.039711
0.014285714 1.00000000 4.052245 0.9031794 3.011670
0.021428571 0.00000000 12.719745 NaN 10.747191
0.021428571 0.05263158 10.568050 0.3321870 8.607846
0.021428571 0.10526316 9.903277 0.4339021 8.059825
0.021428571 0.15789474 9.258979 0.5277223 7.536974
0.021428571 0.21052632 8.643469 0.6077921 7.031643
0.021428571 0.26315789 8.056217 0.6732191 6.538404
0.021428571 0.31578947 7.500502 0.7247124 6.063381
0.021428571 0.36842105 6.989114 0.7634905 5.620227
0.021428571 0.42105263 6.534448 0.7915351 5.211965
0.021428571 0.47368421 6.143486 0.8120566 4.848922
0.021428571 0.52631579 5.799866 0.8294819 4.527803
0.021428571 0.57894737 5.491306 0.8443804 4.242645
0.021428571 0.63157895 5.225475 0.8562590 3.992674
0.021428571 0.68421053 4.995680 0.8657523 3.773273
0.021428571 0.73684211 4.802370 0.8732078 3.590165
0.021428571 0.78947368 4.648047 0.8788861 3.446489
0.021428571 0.84210526 4.525632 0.8832665 3.338855
0.021428571 0.89473684 4.430319 0.8866321 3.260002
0.021428571 0.94736842 4.364849 0.8889374 3.210078
0.021428571 1.00000000 4.316364 0.8907687 3.175379
0.028571429 0.00000000 12.719745 NaN 10.747191
0.028571429 0.05263158 10.605392 0.3259594 8.633546
0.028571429 0.10526316 9.976188 0.4217673 8.112706
0.028571429 0.15789474 9.368227 0.5108284 7.618146
0.028571429 0.21052632 8.784915 0.5884503 7.139689
0.028571429 0.26315789 8.224602 0.6535386 6.670814
0.028571429 0.31578947 7.696412 0.7055287 6.221096
0.028571429 0.36842105 7.207055 0.7456827 5.799405
0.028571429 0.42105263 6.764782 0.7757525 5.407932
0.028571429 0.47368421 6.376330 0.7980554 5.051709
0.028571429 0.52631579 6.038847 0.8158720 4.734050
0.028571429 0.57894737 5.739066 0.8312751 4.452923
0.028571429 0.63157895 5.476325 0.8437891 4.205595
0.028571429 0.68421053 5.242242 0.8541133 3.981219
0.028571429 0.73684211 5.041635 0.8623182 3.786711
0.028571429 0.78947368 4.879378 0.8684836 3.631171
0.028571429 0.84210526 4.745253 0.8734010 3.504712
0.028571429 0.89473684 4.641796 0.8770309 3.410686
0.028571429 0.94736842 4.567118 0.8795336 3.345936
0.028571429 1.00000000 4.511024 0.8814221 3.300525
0.035714286 0.00000000 12.719745 NaN 10.747191
0.035714286 0.05263158 10.630389 0.3216962 8.649142
0.035714286 0.10526316 10.025364 0.4132908 8.146398
0.035714286 0.15789474 9.441794 0.4988722 7.670647
0.035714286 0.21052632 8.878988 0.5746934 7.208290
0.035714286 0.26315789 8.339491 0.6389087 6.756922
0.035714286 0.31578947 7.831674 0.6908389 6.325532
0.035714286 0.36842105 7.359193 0.7316532 5.919404
0.035714286 0.42105263 6.926296 0.7631330 5.539167
0.035714286 0.47368421 6.543396 0.7867501 5.191419
0.035714286 0.52631579 6.210901 0.8051696 4.879779
0.035714286 0.57894737 5.921095 0.8207037 4.605158
0.035714286 0.63157895 5.658841 0.8338811 4.356554
0.035714286 0.68421053 5.425099 0.8447735 4.132049
0.035714286 0.73684211 5.224377 0.8533910 3.935746
0.035714286 0.78947368 5.056868 0.8600396 3.772063
0.035714286 0.84210526 4.916623 0.8653073 3.636631
0.035714286 0.89473684 4.810863 0.8690600 3.536302
0.035714286 0.94736842 4.729634 0.8717975 3.461235
0.035714286 1.00000000 4.668275 0.8737664 3.406487
0.042857143 0.00000000 12.719745 NaN 10.747191
0.042857143 0.05263158 10.649006 0.3183598 8.659408
0.042857143 0.10526316 10.063732 0.4065052 8.171419
0.042857143 0.15789474 9.498332 0.4893522 7.709320
0.042857143 0.21052632 8.951603 0.5635681 7.259062
0.042857143 0.26315789 8.429932 0.6267147 6.822851
0.042857143 0.31578947 7.938112 0.6784272 6.405875
0.042857143 0.36842105 7.478275 0.7197554 6.010701
0.042857143 0.42105263 7.053938 0.7521797 5.639166
0.042857143 0.47368421 6.677302 0.7767562 5.299404
0.042857143 0.52631579 6.352896 0.7955833 4.996825
0.042857143 0.57894737 6.066895 0.8115599 4.725789
0.042857143 0.63157895 5.807544 0.8252168 4.477926
0.042857143 0.68421053 5.574963 0.8365873 4.253607
0.042857143 0.73684211 5.375833 0.8455274 4.058461
0.042857143 0.78947368 5.204039 0.8526667 3.888874
0.042857143 0.84210526 5.061502 0.8581731 3.748493
0.042857143 0.89473684 4.954026 0.8620665 3.644632
0.042857143 0.94736842 4.868042 0.8650243 3.562585
0.042857143 1.00000000 4.803098 0.8670998 3.502220
0.050000000 0.00000000 12.719745 NaN 10.747191
0.050000000 0.05263158 10.664511 0.3155007 8.667079
0.050000000 0.10526316 10.096683 0.4006075 8.192319
0.050000000 0.15789474 9.546087 0.4811327 7.741090
0.050000000 0.21052632 9.014028 0.5537204 7.302158
0.050000000 0.26315789 8.507619 0.6158259 6.878413
0.050000000 0.31578947 8.028660 0.6673112 6.472190
0.050000000 0.36842105 7.579264 0.7090114 6.086293
0.050000000 0.42105263 7.164037 0.7420437 5.723139
0.050000000 0.47368421 6.794556 0.7672864 5.391397
0.050000000 0.52631579 6.476421 0.7866019 5.096395
0.050000000 0.57894737 6.193359 0.8030704 4.829024
0.050000000 0.63157895 5.936877 0.8171967 4.583914
0.050000000 0.68421053 5.707621 0.8288982 4.361443
0.050000000 0.73684211 5.508621 0.8382434 4.166045
0.050000000 0.78947368 5.334012 0.8458269 3.992927
0.050000000 0.84210526 5.191385 0.8515252 3.851408
0.050000000 0.89473684 5.081095 0.8556455 3.742093
0.050000000 0.94736842 4.991350 0.8588172 3.655083
0.050000000 1.00000000 4.923200 0.8610413 3.589788
0.057142857 0.00000000 12.719745 NaN 10.747191
0.057142857 0.05263158 10.676963 0.3130233 8.672758
0.057142857 0.10526316 10.125993 0.3953287 8.210536
0.057142857 0.15789474 9.588444 0.4737441 7.768832
0.057142857 0.21052632 9.069954 0.5447150 7.339995
0.057142857 0.26315789 8.576760 0.6058303 6.926522
0.057142857 0.31578947 8.109112 0.6570301 6.530300
0.057142857 0.36842105 7.668838 0.6990048 6.152385
0.057142857 0.42105263 7.262463 0.7324665 5.797181
0.057142857 0.47368421 6.901056 0.7581593 5.473870
0.057142857 0.52631579 6.586697 0.7780815 5.183851
0.057142857 0.57894737 6.307299 0.7949700 4.921166
0.057142857 0.63157895 6.053480 0.8095358 4.679131
0.057142857 0.68421053 5.828208 0.8215064 4.460088
0.057142857 0.73684211 5.628373 0.8313213 4.263205
0.057142857 0.78947368 5.452644 0.8392831 4.088233
0.057142857 0.84210526 5.310478 0.8451827 3.946665
0.057142857 0.89473684 5.197039 0.8495755 3.833397
0.057142857 0.94736842 5.104287 0.8529609 3.741933
0.057142857 1.00000000 5.032872 0.8553722 3.672677
0.064285714 0.00000000 12.719745 NaN 10.747191
0.064285714 0.05263158 10.686288 0.3108834 8.676953
0.064285714 0.10526316 10.152606 0.3905288 8.226772
0.064285714 0.15789474 9.626873 0.4669835 7.793571
0.064285714 0.21052632 9.120859 0.5363797 7.373740
0.064285714 0.26315789 8.639160 0.5965806 6.969417
0.064285714 0.31578947 8.181790 0.6474532 6.582130
0.064285714 0.36842105 7.750265 0.6895483 6.211952
0.064285714 0.42105263 7.352325 0.7233143 5.864293
0.064285714 0.47368421 6.998582 0.7493784 5.548492
0.064285714 0.52631579 6.687170 0.7698967 5.262544
0.064285714 0.57894737 6.411479 0.7871715 5.004514
0.064285714 0.63157895 6.161145 0.8020894 4.766387
0.064285714 0.68421053 5.938954 0.8143581 4.551033
0.064285714 0.73684211 5.738487 0.8246301 4.353021
0.064285714 0.78947368 5.562997 0.8329097 4.177992
0.064285714 0.84210526 5.420938 0.8390481 4.036234
0.064285714 0.89473684 5.304599 0.8437340 3.919402
0.064285714 0.94736842 5.209377 0.8473351 3.824781
0.064285714 1.00000000 5.134671 0.8499616 3.751602
[ reached getOption("max.print") -- omitted 100 rows ]
RMSE was used to select the optimal model using the smallest value.
The final values used for the model were fraction = 0.05263158 and lambda = 0.
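With lambda = 0 and such a small fraction, the final model is a sparse lasso fit. Its coefficients can be inspected from the underlying elasticnet object. A sketch: predict.enet comes from the elasticnet package, and the coefficients are on the centered/scaled predictor scale used by train():
# coefficients of the final enet model at the selected lasso fraction
beta <- predict(enetTune$finalModel, s = enetTune$bestTune$fraction,
                type = "coefficients", mode = "fraction")$coefficients
sum(beta != 0)  # number of predictors the sparse fit retains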
(d) Which model has the best predictive ability? Is any model significantly better or worse than the others?
Comparing the models:
The resamples() function in caret collects and visualizes resampling results (the models must have been fit with train()).
For each algorithm, the comparison uses its final model, i.e. the one with the smallest resampled RMSE. Because the bootstrap drew 50 resamples, each final model has 50 performance values.
# model comparison
# resamples() collects and visualizes the resampling results
resamp <- resamples( list(lm=lmFit1,rlm=rlmFit,pls=plsTune,ridge=ridgeTune,enet=enetTune) )
summary(resamp)
The pls and enet models have the smallest mean RMSE, i.e. the best predictive performance, while rlm has the largest mean RMSE, i.e. the worst.
> summary(resamp)
Call:
summary.resamples(object = resamp)
Models: lm, rlm, pls, ridge, enet
Number of resamples: 50
MAE
Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
lm 1.3090300 1.6211047 1.879747 1.874570 2.024030 2.989562 0
rlm 1.4045612 1.9200873 2.183430 2.215829 2.456707 3.715148 0
pls 0.8375685 1.0323470 1.101890 1.113200 1.180808 1.440820 0
ridge 1.3089927 1.6211901 1.879508 1.874579 2.024011 2.990832 0
enet 0.8195073 0.9752678 1.076620 1.072324 1.156561 1.291045 0
RMSE
Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
lm 1.705850 2.286390 2.678425 2.706693 3.014897 4.458380 0
rlm 1.886688 2.693024 3.180843 3.220055 3.576437 5.347496 0
pls 1.068864 1.370754 1.503226 1.534394 1.646459 2.127759 0
ridge 1.705678 2.286423 2.678400 2.706699 3.014607 4.460421 0
enet 1.109904 1.346236 1.484470 1.505096 1.661772 2.002391 0
Rsquared
Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
lm 0.8969856 0.9481050 0.9575495 0.9560697 0.9703579 0.9829345 0
rlm 0.8266755 0.9283558 0.9418637 0.9385840 0.9597848 0.9779785 0
pls 0.9708559 0.9823688 0.9869358 0.9855824 0.9888411 0.9926647 0
ridge 0.8969050 0.9481031 0.9575498 0.9560687 0.9703573 0.9829340 0
enet 0.9744586 0.9833999 0.9869512 0.9861744 0.9887381 0.9930981 0
Plot the confidence interval of the mean RMSE for each model.
dotplot(resamp, metric = "RMSE")
The plot shows that enet and pls perform best and roughly equally well, while rlm performs worst.
Use the diff() function for pairwise comparisons.
summary(diff(resamp))
Upper diagonal: estimates of the difference.
Lower diagonal: p-values for H0: difference = 0.
Judging by RMSE and combining the estimated differences with the p-values: rlm is significantly worse than every other model, while enet is significantly better than lm, rlm, and ridge but not significantly different from pls (p = 0.2861).
> summary(diff(resamp))
Call:
summary.diff.resamples(object = diff(resamp))
p-value adjustment: bonferroni
Upper diagonal: estimates of the difference
Lower diagonal: p-value for H0: difference = 0
MAE
lm rlm pls ridge enet
lm -3.413e-01 7.614e-01 -8.713e-06 8.022e-01
rlm 1.605e-11 1.103e+00 3.413e-01 1.144e+00
pls < 2.2e-16 < 2.2e-16 -7.614e-01 4.088e-02
ridge 1 1.591e-11 < 2.2e-16 8.023e-01
enet < 2.2e-16 < 2.2e-16 1.420e-05 < 2.2e-16
RMSE
lm rlm pls ridge enet
lm -5.134e-01 1.172e+00 -6.918e-06 1.202e+00
rlm 2.097e-11 1.686e+00 5.134e-01 1.715e+00
pls < 2.2e-16 < 2.2e-16 -1.172e+00 2.930e-02
ridge 1.0000 2.082e-11 < 2.2e-16 1.202e+00
enet < 2.2e-16 < 2.2e-16 0.2861 < 2.2e-16
Rsquared
lm rlm pls ridge enet
lm 1.749e-02 -2.951e-02 9.420e-07 -3.010e-02
rlm 5.945e-09 -4.700e-02 -1.748e-02 -4.759e-02
pls 1.224e-14 1.451e-14 2.951e-02 -5.920e-04
ridge 1.0000 5.896e-09 1.238e-14 -3.011e-02
enet 1.679e-15 5.609e-15 0.3368 1.698e-15
(e) Explain which model you would use for predicting the fat content of a sample.
Choose the elastic net: it has the lowest mean RMSE, it is significantly better than every model except PLS (from which it does not differ significantly), and with such a small lasso fraction it should also be the sparser, simpler model.
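Had a test set been held out as in the hypothetical split sketched earlier (absorp.test and fat.test are those hypothetical objects), the chosen model would be applied like this; predict() on a train object re-applies the same centering and scaling automatically:
pred <- predict(enetTune, newdata = absorp.test)  # predicted fat content
RMSE(pred, fat.test)                              # test-set RMSE (caret's RMSE helper)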