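Models: multiple linear regression, robust regression, partial least squares (PLS) regression, ridge regression, lasso regression, elastic net
Language: R
Reference: Applied Predictive Modeling (2013) by Max Kuhn and Kjell Johnson; Chinese edition translated by Lin Hui et al.
Case study
# Load the data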
library(AppliedPredictiveModeling)
data(permeability)
head(fingerprints[1:10,1:5])
head(permeability)
> head(fingerprints[1:10,1:5])
X1 X2 X3 X4 X5
1 0 0 0 0 0
2 0 0 0 0 0
3 0 0 0 0 0
4 0 0 0 0 0
5 0 0 0 0 0
6 0 0 0 0 0
> head(permeability)
permeability
1 12.520
2 1.120
3 19.405
4 1.730
5 1.680
6 0.510
library(caret)
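# nearZeroVar flags predictors that have a single unique value (zero-variance predictors), or predictors with both of the following characteristics:
# they have very few unique values relative to the number of samples, and the ratio of the frequency of the most common value to that of the second most common value is large.
# nearZeroVar(x, freqCut = 95/5, uniqueCut = 10, saveMetrics = FALSE, names = FALSE, foreach = FALSE, allowParallel = TRUE)
# freqCut: cutoff for the ratio of the most common value to the second most common value (default 95/5)
# uniqueCut: cutoff for the percentage of distinct values out of the total number of samples (default 10)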
fingerprints<-as.data.frame(fingerprints)
near.zero.ind<-nearZeroVar(fingerprints)
fingerprintsFliter<-fingerprints[-near.zero.ind]
719 predictors have degenerate (near-zero-variance) distributions and are removed, leaving 388 predictors.
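The two criteria that nearZeroVar checks can be sketched in base R. This is a minimal illustration and not caret's implementation: `nzv_metrics` and `toy` are hypothetical names, and the cutoffs simply reuse the documented defaults (freqCut = 95/5, uniqueCut = 10).

```r
# Hypothetical helper: frequency ratio and percent of unique values for one predictor
nzv_metrics <- function(x) {
  counts <- sort(as.vector(table(x)), decreasing = TRUE)
  # Ratio of the most common value's count to the second most common value's count
  freq_ratio <- if (length(counts) > 1) counts[1] / counts[2] else Inf
  # Number of distinct values as a percentage of the sample size
  pct_unique <- 100 * length(counts) / length(x)
  c(freqRatio = freq_ratio, percentUnique = pct_unique)
}

# Toy 0/1 predictor: 165 samples, only 5 of them equal to 1
toy <- c(rep(0, 160), rep(1, 5))
m <- nzv_metrics(toy)

# Flag the predictor when both criteria hold (default cutoffs assumed)
flagged <- m[["freqRatio"]] > 95 / 5 && m[["percentUnique"]] <= 10
print(m)
print(flagged)
```

Here the frequency ratio is 160/5 = 32 and only about 1.2% of the values are distinct, so the toy predictor would be flagged.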
# Data preprocessing
summary(permeability)
# Find the columns that contain missing values
NAcol <- which(colSums(is.na(fingerprintsFliter))>0)
NAcol
# This example has no missing values
# Convert the response from a one-column matrix to a numeric vector
permeability<-permeability[,1]
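# All predictors are 0/1 indicators, so no Box-Cox transformation is needed
# Note: even if a Box-Cox transformation is requested, R reports "Lambda could not be estimated; no transformation is applied"
Modeling:
# Use resampling to assess how well each model predicts new samples
# Use repeated 10-fold cross-validation
# Set the random seed so that the resampled data sets are reproducible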
set.seed(100)
indx <- createMultiFolds(permeability, k = 10, times = 5)
ctrl <- trainControl(method = "repeatedcv",number=10,repeats =5 , index = indx)
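To make the resampling scheme concrete, here is a base-R sketch of how 10 folds repeated 5 times produce 50 training index sets for 165 samples. It is only an illustration: `make_folds` is a hypothetical helper, and unlike createMultiFolds it assigns folds purely at random rather than balancing them on the outcome.

```r
set.seed(100)
n <- 165      # number of samples
k <- 10       # folds per repeat
times <- 5    # number of repeats

# Hypothetical helper: returns the training row indices for each of the k folds
make_folds <- function(n, k) {
  fold_id <- sample(rep(seq_len(k), length.out = n))
  lapply(seq_len(k), function(f) which(fold_id != f))
}

# One list element per resample: k folds x 'times' repeats
index <- unlist(lapply(seq_len(times), function(r) make_folds(n, k)),
                recursive = FALSE)

length(index)          # 50 resamples
range(lengths(index))  # 148 149 training rows per resample
```

Each model is therefore fitted 50 times, each time on 148 or 149 samples and evaluated on the remaining 16 or 17.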
PLS:
#PLS
set.seed(100)
pls.model <- train(x = fingerprintsFliter, y = permeability,
method = "kernelpls", # first kernel algorithm of Dayal and MacGregor
# tuneGrid = expand.grid(ncomp = 1:10), # alternative: set the numbers of components explicitly
tuneLength=40,
trControl = ctrl,
preProc = c("center", "scale"))
pls.model
pls.model$bestTune
The optimal number of PLS components is 9, with a resampled R-squared estimate of 0.5079 (50.79%).
> pls.model
Partial Least Squares
165 samples
388 predictors
Pre-processing: centered (388), scaled (388)
Resampling: Cross-Validated (10 fold, repeated 5 times)
Summary of sample sizes: 148, 149, 149, 148, 149, 149, ...
Resampling results across tuning parameters:
ncomp RMSE Rsquared MAE
1 13.05227 0.3240300 9.896661
2 11.80498 0.4663535 8.308282
3 11.66726 0.4678344 8.728710
4 11.72482 0.4674509 8.910673
5 11.55988 0.4852945 8.621538
6 11.49614 0.4905275 8.510156
7 11.31531 0.5016862 8.535471
8 11.28900 0.5035427 8.665233
9 11.28461 0.5078777 8.524744
10 11.46917 0.4984304 8.604099
11 11.75179 0.4825756 8.752012
12 11.86670 0.4765582 8.825396
13 12.05614 0.4688906 8.980787
14 12.23023 0.4595144 9.057769
15 12.40422 0.4496407 9.161088
16 12.66006 0.4375607 9.364334
17 12.95604 0.4254162 9.534700
18 13.03904 0.4248562 9.610917
19 13.10685 0.4220656 9.694758
20 13.25601 0.4156344 9.804219
21 13.39648 0.4094236 9.887756
22 13.66561 0.3984469 10.039217
23 13.83785 0.3903268 10.162622
24 13.94098 0.3868608 10.255307
25 14.06735 0.3825551 10.312950
26 14.32064 0.3746016 10.494286
27 14.67806 0.3632787 10.717377
28 15.00589 0.3529639 10.951726
29 15.24184 0.3489428 11.080950
30 15.40581 0.3434618 11.153399
31 15.51121 0.3398546 11.173934
32 15.54279 0.3402037 11.151922
33 15.63262 0.3398001 11.197528
34 15.67534 0.3394593 11.202881
35 15.74075 0.3371198 11.232010
36 15.79167 0.3363571 11.236849
37 15.85952 0.3350562 11.277133
38 15.86677 0.3371154 11.275449
39 15.87254 0.3403697 11.262704
40 15.91858 0.3426028 11.295925
RMSE was used to select the optimal model using the smallest value.
The final value used for the model was ncomp = 9.
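Because the sample is small (165 observations), this example does not split off a single training set and test set. The R-squared of 50.79% reported for question (C) is the mean R-squared over the 50 resamples with 9 components.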
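The resampled estimate is an average of per-fold statistics. The sketch below shows the idea in base R, assuming R-squared is computed as the squared correlation between observed and predicted values in each held-out fold. The three folds and their values are hypothetical and are not taken from the permeability data.

```r
# Per-fold performance measures
rmse <- function(obs, pred) sqrt(mean((obs - pred)^2))
rsq  <- function(obs, pred) cor(obs, pred)^2

# Hypothetical held-out observations and predictions for three folds
folds <- list(
  list(obs = c(1.1, 12.5, 19.4, 0.5), pred = c(3.0, 10.0, 15.0, 2.0)),
  list(obs = c(1.7, 1.6, 25.0, 8.0),  pred = c(4.0, 2.5, 14.0, 9.0)),
  list(obs = c(0.6, 40.0, 5.5, 3.2),  pred = c(2.0, 22.0, 9.0, 1.0))
)

# One row per fold, one column per metric
per_fold <- t(sapply(folds, function(f) {
  c(RMSE = rmse(f$obs, f$pred), Rsquared = rsq(f$obs, f$pred))
}))

# The resampled estimate is the mean over folds
colMeans(per_fold)
```

With repeated 10-fold cross-validation the same averaging runs over 50 held-out folds instead of three.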
Multiple linear regression
# Multiple linear regression
set.seed(100)
lm.model <- train(x = fingerprintsFliter, y = permeability,
method = "lm",
trControl = ctrl,
preProc = c("center", "scale"))
lm.model
> lm.model
Linear Regression
165 samples
388 predictors
Pre-processing: centered (388), scaled (388)
Resampling: Cross-Validated (10 fold, repeated 5 times)
Summary of sample sizes: 148, 149, 149, 148, 149, 149, ...
Resampling results:
RMSE Rsquared MAE
24.57042 0.2022581 15.82187
Tuning parameter 'intercept' was held constant at a value of TRUE
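Robust regression
The predictor matrix is singular (388 predictors against 165 samples), so robust regression cannot be applied directly.
Preprocess the predictors with PCA first, then fit the robust regression.
# Robust regression
# The predictor matrix is singular, so robust regression cannot be applied directly
# Preprocess the predictors with PCA, then fit the robust regression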
set.seed(100)
rlm.model <- train(x = fingerprintsFliter, y = permeability,
method = "rlm",
trControl = ctrl,
preProc = c("pca"))
rlm.model
> rlm.model
Robust Linear Model
165 samples
388 predictors
Pre-processing: principal component signal extraction (388), centered (388), scaled (388)
Resampling: Cross-Validated (10 fold, repeated 5 times)
Summary of sample sizes: 148, 149, 149, 148, 149, 149, ...
Resampling results across tuning parameters:
intercept psi RMSE Rsquared MAE
FALSE psi.huber 17.08872 0.4521439 13.555676
FALSE psi.hampel 16.98587 0.4624553 13.672698
FALSE psi.bisquare 17.02395 0.4598218 13.550679
TRUE psi.huber 12.05982 0.4701847 8.484627
TRUE psi.hampel 12.05907 0.4668805 8.675854
TRUE psi.bisquare 12.87237 0.4536371 8.412304
RMSE was used to select the optimal model using the smallest value.
The final values used for the model were intercept = TRUE and psi = psi.hampel.
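The singularity problem and the PCA workaround can be illustrated on a toy matrix in base R. This is a sketch under simple assumptions: the data are random 0/1 values, not the fingerprints, and the variable names are hypothetical.

```r
set.seed(1)
n <- 5   # samples
p <- 8   # predictors (more than samples)
X <- matrix(rbinom(n * p, 1, 0.5), nrow = n)  # toy 0/1 predictors

# The rank cannot exceed n, so the 8 columns must be linearly dependent
rank_X <- qr(X)$rank

# PCA replaces the columns with uncorrelated component scores
pca <- prcomp(X, center = TRUE, scale. = FALSE)
keep <- pca$sdev > 1e-8                 # drop components with no variance
scores <- pca$x[, keep, drop = FALSE]

# The retained scores have full column rank, so a regression on them is well defined
rank_scores <- qr(scores)$rank
c(rank_X = rank_X, n_predictors = p,
  rank_scores = rank_scores, n_scores = ncol(scores))
```

After centering, at most n - 1 components carry variance, which is why the score matrix is much narrower than the original predictor matrix.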
Ridge regression
# Ridge regression
# Use train() to select the best ridge parameter
# Regularization parameter grid: 15 values from 0 to 0.1
ridgeGrid <- expand.grid(lambda = seq(0, .1, length = 15))
set.seed(100)
ridge.model <- train(x = fingerprintsFliter, y = permeability,
method = "ridge", # ridge regression
tuneGrid = ridgeGrid,
trControl = ctrl,
preProc = c("center", "scale"))
ridge.model
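The optimal ridge parameter is lambda = 0.1. This is the upper edge of the grid and the RMSE is still decreasing there, so a wider grid may be worth exploring.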
> ridge.model
Ridge Regression
165 samples
388 predictors
Pre-processing: centered (388), scaled (388)
Resampling: Cross-Validated (10 fold, repeated 5 times)
Summary of sample sizes: 148, 149, 149, 148, 149, 149, ...
Resampling results across tuning parameters:
lambda RMSE Rsquared MAE
0.000000000 6.028134e+15 0.2650199 2.026089e+15
0.007142857 1.548415e+03 0.3782875 8.180708e+02
0.014285714 8.845611e+01 0.3798410 5.209785e+01
0.021428571 3.332586e+01 0.4164882 2.503135e+01
0.028571429 1.285875e+01 0.4335107 9.274614e+00
0.035714286 1.260947e+01 0.4444402 9.114385e+00
0.042857143 1.254606e+01 0.4464467 9.082873e+00
0.050000000 1.234778e+01 0.4585901 8.963490e+00
0.057142857 1.225975e+01 0.4639287 8.923119e+00
0.064285714 1.217146e+01 0.4692078 8.860966e+00
0.071428571 1.210816e+01 0.4732640 8.823143e+00
0.078571429 1.206249e+01 0.4765388 8.797852e+00
0.085714286 1.202607e+01 0.4794337 8.776395e+00
0.092857143 1.199524e+01 0.4820107 8.759420e+00
0.100000000 1.197763e+01 0.4841285 8.751014e+00
RMSE was used to select the optimal model using the smallest value.
The final value used for the model was lambda = 0.1.
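The huge RMSE at lambda = 0 and the stable values for larger lambda are what the textbook ridge estimator would lead one to expect when there are more predictors than samples. The base-R sketch below uses the closed form (X'X + lambda I)^(-1) X'y on simulated data. It is an illustration only: `ridge_coef` is a hypothetical helper, and the lambda scale here is not necessarily the one used by the "ridge" method in caret.

```r
set.seed(1)
n <- 20
p <- 30                                   # more predictors than samples
X <- scale(matrix(rnorm(n * p), n, p))    # centered and scaled predictors
y <- rnorm(n)
y <- y - mean(y)                          # centered response

# Without a penalty, X'X is rank deficient and cannot be inverted
rank_xtx <- qr(crossprod(X))$rank

# Hypothetical helper: closed-form ridge coefficients
ridge_coef <- function(X, y, lambda) {
  solve(crossprod(X) + lambda * diag(ncol(X)), crossprod(X, y))
}

b_small <- ridge_coef(X, y, lambda = 0.1)
b_large <- ridge_coef(X, y, lambda = 10)

# A larger penalty shrinks the coefficient vector
c(rank_xtx = rank_xtx,
  norm_small = sqrt(sum(b_small^2)),
  norm_large = sqrt(sum(b_large^2)))
```

Adding lambda to the diagonal makes the system solvable, and increasing lambda trades variance for bias by shrinking the coefficients toward zero.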
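Elastic net
# enet: elastic net
# The elastic net combines a ridge penalty with a lasso penalty
# lambda is the ridge penalty (lambda = 0 gives a pure lasso model)
# fraction is the lasso parameter, expressed as a fraction of the full solution's L1 norm: fraction = 1 applies no lasso shrinkage (pure ridge), and fraction = 0 shrinks all coefficients to zero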
enetGrid <- expand.grid(lambda = seq(0, .1, length = 15),
fraction = seq(0, 1, length = 20))
set.seed(100)
enet.model<- train(x = fingerprintsFliter, y = permeability,
method = "enet", # elastic net
tuneGrid = enetGrid,
trControl = ctrl,
preProc = c("center", "scale"))
enet.model
Best elastic net parameters: fraction (lasso) = 0.2631579 and lambda (ridge penalty) = 0.09285714
> enet.model
Elasticnet
165 samples
388 predictors
Pre-processing: centered (388), scaled (388)
Resampling: Cross-Validated (10 fold, repeated 5 times)
Summary of sample sizes: 148, 149, 149, 148, 149, 149, ...
Resampling results across tuning parameters:
lambda fraction RMSE Rsquared MAE
0.000000000 0.00000000 1.540478e+01 NaN 1.221506e+01
0.000000000 0.05263158 3.449540e+14 0.4283842 1.166587e+14
0.000000000 0.10526316 6.815173e+14 0.4022667 2.302363e+14
0.000000000 0.15789474 1.018114e+15 0.3916906 3.438139e+14
0.000000000 0.21052632 1.354720e+15 0.3798959 4.573915e+14
0.000000000 0.26315789 1.691329e+15 0.3680099 5.709691e+14
0.000000000 0.31578947 2.027939e+15 0.3572475 6.845467e+14
0.000000000 0.36842105 2.364551e+15 0.3451573 7.981243e+14
0.000000000 0.42105263 2.701163e+15 0.3339668 9.117019e+14
0.000000000 0.47368421 3.037776e+15 0.3253926 1.025279e+15
0.000000000 0.52631579 3.374389e+15 0.3173453 1.138857e+15
0.000000000 0.57894737 3.693177e+15 0.3095418 1.246083e+15
0.000000000 0.63157895 3.987501e+15 0.3028937 1.344561e+15
0.000000000 0.68421053 4.280623e+15 0.2959701 1.442581e+15
0.000000000 0.73684211 4.573826e+15 0.2888721 1.540590e+15
0.000000000 0.78947368 4.867121e+15 0.2828329 1.638597e+15
0.000000000 0.84210526 5.160496e+15 0.2774765 1.736604e+15
0.000000000 0.89473684 5.453804e+15 0.2728098 1.834564e+15
0.000000000 0.94736842 5.740949e+15 0.2689265 1.930330e+15
0.000000000 1.00000000 6.028134e+15 0.2650199 2.026089e+15
0.007142857 0.00000000 1.542794e+01 NaN 1.225391e+01
0.007142857 0.05263158 1.033524e+02 0.4623499 5.200281e+01
0.007142857 0.10526316 1.646549e+02 0.4660347 8.998209e+01
0.007142857 0.15789474 2.323283e+02 0.4720228 1.292456e+02
0.007142857 0.21052632 3.016033e+02 0.4678674 1.691791e+02
0.007142857 0.26315789 3.819501e+02 0.4596376 2.124624e+02
0.007142857 0.31578947 4.641333e+02 0.4475696 2.558784e+02
0.007142857 0.36842105 5.471372e+02 0.4340064 2.993385e+02
0.007142857 0.42105263 6.305388e+02 0.4251005 3.427369e+02
0.007142857 0.47368421 7.135822e+02 0.4175334 3.858955e+02
0.007142857 0.52631579 7.964393e+02 0.4118095 4.289063e+02
0.007142857 0.57894737 8.794325e+02 0.4067252 4.719197e+02
0.007142857 0.63157895 9.625168e+02 0.4016919 5.149725e+02
0.007142857 0.68421053 1.045668e+03 0.3977980 5.580246e+02
0.007142857 0.73684211 1.128872e+03 0.3943475 6.010758e+02
0.007142857 0.78947368 1.212128e+03 0.3910934 6.441272e+02
0.007142857 0.84210526 1.295521e+03 0.3874173 6.872420e+02
0.007142857 0.89473684 1.379817e+03 0.3846229 7.308539e+02
0.007142857 0.94736842 1.464120e+03 0.3811768 7.744635e+02
0.007142857 1.00000000 1.548415e+03 0.3782875 8.180708e+02
0.014285714 0.00000000 1.542794e+01 NaN 1.225391e+01
0.014285714 0.05263158 1.797794e+01 0.4732053 1.065071e+01
0.014285714 0.10526316 2.190067e+01 0.4706222 1.262729e+01
0.014285714 0.15789474 2.597123e+01 0.4704845 1.477876e+01
0.014285714 0.21052632 3.034370e+01 0.4663541 1.708084e+01
0.014285714 0.26315789 3.428904e+01 0.4592488 1.927629e+01
0.014285714 0.31578947 3.831697e+01 0.4502840 2.154504e+01
0.014285714 0.36842105 4.241496e+01 0.4408115 2.389293e+01
0.014285714 0.42105263 4.653987e+01 0.4298422 2.635053e+01
0.014285714 0.47368421 5.061041e+01 0.4214552 2.882180e+01
0.014285714 0.52631579 5.465756e+01 0.4151950 3.127600e+01
0.014285714 0.57894737 5.869254e+01 0.4106026 3.370772e+01
0.014285714 0.63157895 6.273159e+01 0.4066244 3.614245e+01
0.014285714 0.68421053 6.676046e+01 0.4036789 3.857259e+01
0.014285714 0.73684211 7.076430e+01 0.4010070 4.100412e+01
0.014285714 0.78947368 7.463485e+01 0.3974049 4.339555e+01
0.014285714 0.84210526 7.845897e+01 0.3932502 4.575368e+01
0.014285714 0.89473684 8.194474e+01 0.3889553 4.792304e+01
0.014285714 0.94736842 8.521037e+01 0.3842786 5.001852e+01
0.014285714 1.00000000 8.845611e+01 0.3798410 5.209785e+01
0.021428571 0.00000000 1.542794e+01 NaN 1.225391e+01
0.021428571 0.05263158 1.317415e+01 0.4768770 9.291129e+00
0.021428571 0.10526316 1.437122e+01 0.4735211 1.017494e+01
0.021428571 0.15789474 1.534896e+01 0.4823646 1.098397e+01
0.021428571 0.21052632 1.637868e+01 0.4866607 1.177871e+01
0.021428571 0.26315789 1.744992e+01 0.4841110 1.261975e+01
0.021428571 0.31578947 1.854287e+01 0.4800858 1.346653e+01
0.021428571 0.36842105 1.971829e+01 0.4715516 1.433782e+01
0.021428571 0.42105263 2.093725e+01 0.4621313 1.524660e+01
0.021428571 0.47368421 2.212320e+01 0.4537090 1.612098e+01
0.021428571 0.52631579 2.326907e+01 0.4475608 1.704217e+01
0.021428571 0.57894737 2.438377e+01 0.4438079 1.792729e+01
0.021428571 0.63157895 2.549237e+01 0.4406479 1.880798e+01
0.021428571 0.68421053 2.660261e+01 0.4377210 1.968561e+01
0.021428571 0.73684211 2.771253e+01 0.4350921 2.056040e+01
0.021428571 0.78947368 2.882446e+01 0.4324209 2.144242e+01
0.021428571 0.84210526 2.994108e+01 0.4293780 2.233282e+01
0.021428571 0.89473684 3.106375e+01 0.4256027 2.323234e+01
0.021428571 0.94736842 3.219787e+01 0.4209350 2.413550e+01
0.021428571 1.00000000 3.332586e+01 0.4164882 2.503135e+01
0.028571429 0.00000000 1.542794e+01 NaN 1.225391e+01
0.028571429 0.05263158 1.164803e+01 0.4870994 8.381945e+00
0.028571429 0.10526316 1.148989e+01 0.4820355 8.229012e+00
0.028571429 0.15789474 1.130939e+01 0.4897388 8.271174e+00
0.028571429 0.21052632 1.124240e+01 0.4943329 8.292172e+00
0.028571429 0.26315789 1.130334e+01 0.4929722 8.327261e+00
0.028571429 0.31578947 1.137778e+01 0.4898966 8.370961e+00
0.028571429 0.36842105 1.150147e+01 0.4839684 8.449668e+00
0.028571429 0.42105263 1.165597e+01 0.4764994 8.532020e+00
0.028571429 0.47368421 1.183303e+01 0.4683594 8.646258e+00
0.028571429 0.52631579 1.197183e+01 0.4622449 8.744455e+00
0.028571429 0.57894737 1.207651e+01 0.4581481 8.816762e+00
0.028571429 0.63157895 1.216941e+01 0.4551937 8.869916e+00
0.028571429 0.68421053 1.225411e+01 0.4530774 8.914132e+00
0.028571429 0.73684211 1.234610e+01 0.4507666 8.966320e+00
0.028571429 0.78947368 1.244166e+01 0.4481786 9.019456e+00
0.028571429 0.84210526 1.253844e+01 0.4451619 9.072420e+00
0.028571429 0.89473684 1.264025e+01 0.4418328 9.128159e+00
0.028571429 0.94736842 1.275330e+01 0.4375581 9.199859e+00
0.028571429 1.00000000 1.285875e+01 0.4335107 9.274614e+00
0.035714286 0.00000000 1.542794e+01 NaN 1.225391e+01
0.035714286 0.05263158 1.170165e+01 0.4858385 8.468453e+00
0.035714286 0.10526316 1.150543e+01 0.4814832 8.209731e+00
0.035714286 0.15789474 1.132812e+01 0.4898204 8.238532e+00
0.035714286 0.21052632 1.123613e+01 0.4950617 8.257063e+00
0.035714286 0.26315789 1.127977e+01 0.4947655 8.314775e+00
0.035714286 0.31578947 1.132916e+01 0.4932586 8.339980e+00
0.035714286 0.36842105 1.141954e+01 0.4894771 8.401248e+00
0.035714286 0.42105263 1.154807e+01 0.4834828 8.461108e+00
0.035714286 0.47368421 1.169445e+01 0.4769915 8.548476e+00
0.035714286 0.52631579 1.183228e+01 0.4709574 8.645223e+00
0.035714286 0.57894737 1.194146e+01 0.4664147 8.723158e+00
0.035714286 0.63157895 1.201380e+01 0.4643840 8.770827e+00
0.035714286 0.68421053 1.208726e+01 0.4623733 8.815985e+00
0.035714286 0.73684211 1.216064e+01 0.4606180 8.855813e+00
0.035714286 0.78947368 1.224247e+01 0.4581163 8.896741e+00
0.035714286 0.84210526 1.232990e+01 0.4552601 8.947428e+00
0.035714286 0.89473684 1.242178e+01 0.4520166 8.999512e+00
0.035714286 0.94736842 1.251768e+01 0.4481484 9.055775e+00
0.035714286 1.00000000 1.260947e+01 0.4444402 9.114385e+00
0.042857143 0.00000000 1.542794e+01 NaN 1.225391e+01
0.042857143 0.05263158 1.173948e+01 0.4853274 8.532311e+00
0.042857143 0.10526316 1.151846e+01 0.4812703 8.199202e+00
0.042857143 0.15789474 1.134515e+01 0.4902282 8.221829e+00
0.042857143 0.21052632 1.121456e+01 0.4977462 8.228653e+00
0.042857143 0.26315789 1.120874e+01 0.5002426 8.270423e+00
0.042857143 0.31578947 1.125242e+01 0.4992024 8.289716e+00
0.042857143 0.36842105 1.132054e+01 0.4962719 8.337868e+00
0.042857143 0.42105263 1.142924e+01 0.4912499 8.396577e+00
0.042857143 0.47368421 1.156241e+01 0.4853460 8.474910e+00
0.042857143 0.52631579 1.170353e+01 0.4789064 8.562321e+00
0.042857143 0.57894737 1.181902e+01 0.4737488 8.648265e+00
0.042857143 0.63157895 1.191252e+01 0.4701042 8.714922e+00
0.042857143 0.68421053 1.198879e+01 0.4677065 8.763115e+00
0.042857143 0.73684211 1.206858e+01 0.4651293 8.813336e+00
0.042857143 0.78947368 1.215910e+01 0.4618470 8.865899e+00
0.042857143 0.84210526 1.225542e+01 0.4582101 8.917571e+00
0.042857143 0.89473684 1.235329e+01 0.4544600 8.973749e+00
0.042857143 0.94736842 1.245131e+01 0.4504904 9.028878e+00
0.042857143 1.00000000 1.254606e+01 0.4464467 9.082873e+00
0.050000000 0.00000000 1.542794e+01 NaN 1.225391e+01
0.050000000 0.05263158 1.178462e+01 0.4843596 8.612444e+00
0.050000000 0.10526316 1.150884e+01 0.4825917 8.159398e+00
0.050000000 0.15789474 1.137125e+01 0.4893425 8.221834e+00
0.050000000 0.21052632 1.124620e+01 0.4962140 8.225802e+00
0.050000000 0.26315789 1.122915e+01 0.4988484 8.264990e+00
0.050000000 0.31578947 1.128418e+01 0.4970736 8.299529e+00
0.050000000 0.36842105 1.133788e+01 0.4952844 8.334344e+00
0.050000000 0.42105263 1.141684e+01 0.4924373 8.381556e+00
0.050000000 0.47368421 1.152914e+01 0.4878408 8.447196e+00
0.050000000 0.52631579 1.165030e+01 0.4826274 8.521640e+00
0.050000000 0.57894737 1.175222e+01 0.4783838 8.592669e+00
0.050000000 0.63157895 1.182910e+01 0.4757828 8.648299e+00
0.050000000 0.68421053 1.189072e+01 0.4742518 8.692550e+00
0.050000000 0.73684211 1.195425e+01 0.4725210 8.734556e+00
0.050000000 0.78947368 1.202159e+01 0.4706393 8.775882e+00
0.050000000 0.84210526 1.209943e+01 0.4681078 8.818007e+00
0.050000000 0.89473684 1.218274e+01 0.4650933 8.867505e+00
0.050000000 0.94736842 1.226803e+01 0.4618557 8.917214e+00
0.050000000 1.00000000 1.234778e+01 0.4585901 8.963490e+00
0.057142857 0.00000000 1.542794e+01 NaN 1.225391e+01
0.057142857 0.05263158 1.182753e+01 0.4830048 8.681742e+00
0.057142857 0.10526316 1.150122e+01 0.4840802 8.136433e+00
0.057142857 0.15789474 1.137930e+01 0.4899405 8.204708e+00
0.057142857 0.21052632 1.123801e+01 0.4976457 8.198693e+00
0.057142857 0.26315789 1.120357e+01 0.5010694 8.233278e+00
0.057142857 0.31578947 1.126110e+01 0.4995409 8.284754e+00
0.057142857 0.36842105 1.131703e+01 0.4975747 8.320962e+00
0.057142857 0.42105263 1.138943e+01 0.4950779 8.369898e+00
0.057142857 0.47368421 1.149156e+01 0.4911438 8.428510e+00
0.057142857 0.52631579 1.159760e+01 0.4868781 8.493519e+00
0.057142857 0.57894737 1.169796e+01 0.4827026 8.560904e+00
0.057142857 0.63157895 1.177364e+01 0.4801300 8.615219e+00
0.057142857 0.68421053 1.183159e+01 0.4787248 8.656630e+00
0.057142857 0.73684211 1.188962e+01 0.4772597 8.696937e+00
0.057142857 0.78947368 1.194775e+01 0.4757366 8.733759e+00
0.057142857 0.84210526 1.201928e+01 0.4734532 8.776562e+00
0.057142857 0.89473684 1.209756e+01 0.4705435 8.825266e+00
0.057142857 0.94736842 1.218293e+01 0.4671266 8.878051e+00
0.057142857 1.00000000 1.225975e+01 0.4639287 8.923119e+00
0.064285714 0.00000000 1.542794e+01 NaN 1.225391e+01
0.064285714 0.05263158 1.183869e+01 0.4847573 8.741220e+00
0.064285714 0.10526316 1.150149e+01 0.4842645 8.117179e+00
0.064285714 0.15789474 1.140137e+01 0.4890712 8.201532e+00
0.064285714 0.21052632 1.125490e+01 0.4970566 8.199461e+00
0.064285714 0.26315789 1.120131e+01 0.5014210 8.227519e+00
0.064285714 0.31578947 1.125510e+01 0.4998556 8.276543e+00
0.064285714 0.36842105 1.131186e+01 0.4978607 8.311247e+00
0.064285714 0.42105263 1.137609e+01 0.4958606 8.355754e+00
0.064285714 0.47368421 1.146583e+01 0.4927417 8.410237e+00
0.064285714 0.52631579 1.156260e+01 0.4892446 8.467217e+00
0.064285714 0.57894737 1.165268e+01 0.4857390 8.525025e+00
0.064285714 0.63157895 1.172156e+01 0.4835935 8.576247e+00
0.064285714 0.68421053 1.177692e+01 0.4822296 8.615257e+00
0.064285714 0.73684211 1.183054e+01 0.4810509 8.652265e+00
0.064285714 0.78947368 1.188335e+01 0.4797117 8.687362e+00
0.064285714 0.84210526 1.194718e+01 0.4777726 8.726094e+00
0.064285714 0.89473684 1.202184e+01 0.4750441 8.771871e+00
0.064285714 0.94736842 1.209913e+01 0.4720985 8.820179e+00
0.064285714 1.00000000 1.217146e+01 0.4692078 8.860966e+00
[ reached getOption("max.print") -- omitted 100 rows ]
RMSE was used to select the optimal model using the smallest value.
The final values used for the model were fraction = 0.2631579 and lambda = 0.09285714.
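In the table above, the rows with fraction = 1 reproduce the ridge results and the rows with fraction = 0 have an undefined R-squared, which fits reading `fraction` as the L1 norm of the coefficients relative to the full solution. The sketch below illustrates that reading with soft thresholding, which is the lasso solution only under the simplifying assumption of an orthonormal design. The coefficient values and helper names are hypothetical.

```r
# Soft thresholding: shrink each coefficient toward zero by t, clipping at zero
soft <- function(b, t) sign(b) * pmax(abs(b) - t, 0)

# Hypothetical full-solution (unshrunken) coefficients
b_full <- c(3, -2, 0.5, 0)

# L1 norm of the shrunken coefficients relative to the full solution
fraction <- function(t) sum(abs(soft(b_full, t))) / sum(abs(b_full))

fraction(0)   # 1: no lasso shrinkage at all
fraction(1)   # 3 / 5.5, about 0.545
fraction(5)   # 0: every coefficient is shrunk to zero

# Coefficients at an intermediate amount of shrinkage
soft(b_full, 1)   # 2 -1 0 0
```

A fraction of 0 leaves an intercept-only model whose predictions are constant, so a correlation-based R-squared cannot be computed for it.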
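Model comparison:
The resamples function in caret analyzes and visualizes resampling results (the models must have been resampled with train).
For each algorithm, the model being compared is the final model, that is, the one with the smallest RMSE. Because the resampling scheme is 10-fold cross-validation repeated 5 times, there are 50 resamples in total, so each final model has 50 results.
# Model comparison
# resamples() analyzes and visualizes the resampling results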
resamp <- resamples( list(lm=lm.model,rlm=rlm.model,pls=pls.model,ridge=ridge.model,enet=enet.model) )
summary(resamp)
dotplot( resamp, metric="Rsquared" )
summary(diff(resamp))
The PLS and elastic net models have the lowest mean RMSE (11.28 and 11.19), so they predict best; lm has the highest mean RMSE (24.57), so it predicts worst.
> summary(resamp)
Call:
summary.resamples(object = resamp)
Models: lm, rlm, pls, ridge, enet
Number of resamples: 50
MAE
Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
lm 6.065709 12.894319 15.171037 15.821875 18.783692 24.65799 0
rlm 5.717319 7.432964 8.749105 8.675854 10.093184 14.11213 0
pls 4.682886 7.514041 8.515557 8.524744 9.745764 11.94252 0
ridge 4.829772 7.835397 8.616009 8.751014 10.078878 12.12983 0
enet 5.260407 7.075480 8.321390 8.181317 8.879569 11.87011 0
RMSE
Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
lm 8.378259 17.667157 24.15602 24.57042 31.61323 37.00719 0
rlm 7.239544 9.659033 11.89120 12.05907 14.40045 20.71075 0
pls 6.168291 9.985897 11.13949 11.28461 13.03929 15.58695 0
ridge 6.166274 10.433456 11.92233 11.97763 13.77134 16.98081 0
enet 6.915297 9.869505 10.90024 11.19268 12.99169 16.83886 0
Rsquared
Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
lm 0.0005498779 0.02368074 0.1239874 0.2022581 0.3396363 0.8353050 0
rlm 0.0502481827 0.30541469 0.4553379 0.4668805 0.6025489 0.8584627 0
pls 0.1747960521 0.36136520 0.4763336 0.5078777 0.6766491 0.8827837 0
ridge 0.1272810606 0.31071584 0.4429805 0.4841285 0.6520417 0.8474234 0
enet 0.1534451182 0.35680985 0.4975184 0.5049357 0.6397455 0.8552184 0
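The plot shows confidence intervals for each model's mean RMSE and mean R-squared.
PLS and the elastic net perform similarly.
The diff function compares the models pairwise.
Upper triangle: estimates of the difference
Lower triangle: p-values
Judging by R-squared and reading the differences together with the p-values: lm is far worse than every other model, and the remaining models differ very little (only PLS versus ridge is marginally significant, p = 0.045). The PLS model can therefore be kept. Since the elastic net has slightly lower mean RMSE and MAE, it is also a reasonable choice.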
> summary(diff(resamp))
Call:
summary.diff.resamples(object = diff(resamp))
p-value adjustment: bonferroni
Upper diagonal: estimates of the difference
Lower diagonal: p-value for H0: difference = 0
MAE
lm rlm pls ridge enet
lm 7.14602 7.29713 7.07086 7.64056
rlm < 2e-16 0.15111 -0.07516 0.49454
pls < 2e-16 1.00000 -0.22627 0.34343
ridge < 2e-16 1.00000 0.42680 0.56970
enet < 2e-16 0.02309 0.28097 0.01684
RMSE
lm rlm pls ridge enet
lm 12.51135 13.28581 12.59279 13.37774
rlm 1.600e-15 0.77446 0.08144 0.86639
pls < 2.2e-16 0.206060 -0.69302 0.09193
ridge < 2.2e-16 1.000000 2.685e-05 0.78495
enet < 2.2e-16 0.002561 1.000000 0.011402
Rsquared
lm rlm pls ridge enet
lm -0.264622 -0.305620 -0.281870 -0.302678
rlm 1.799e-11 -0.040997 -0.017248 -0.038055
pls 2.169e-15 0.77260 0.023749 0.002942
ridge 3.529e-14 1.00000 0.04525 -0.020807
enet 4.330e-15 0.20239 1.00000 1.00000
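The pairwise tables above report a difference estimate and a Bonferroni-adjusted p-value for each pair of models. A rough base-R sketch of one such comparison follows, assuming a paired t-test on the per-resample values. The RMSE vectors are simulated and hypothetical, not the resampling results of the models above.

```r
set.seed(100)
n_resamples <- 50

# Hypothetical per-resample RMSE values for two models evaluated on the same folds
rmse_a <- rnorm(n_resamples, mean = 11.3, sd = 2)
rmse_b <- rmse_a + rnorm(n_resamples, mean = 0.1, sd = 0.5)

# Paired test: the folds are shared, so the differences are what matter
tt <- t.test(rmse_a, rmse_b, paired = TRUE)
diff_estimate <- mean(rmse_a - rmse_b)

# Bonferroni adjustment for all pairwise comparisons among 5 models
n_comparisons <- choose(5, 2)
p_adj <- p.adjust(tt$p.value, method = "bonferroni", n = n_comparisons)

c(difference = diff_estimate, p_raw = tt$p.value, p_adjusted = p_adj)
```

Pairing matters because every model is evaluated on identical resamples, so fold-to-fold variation cancels out in the differences.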
Yes. Either PLS or the elastic net can be used.