Applied Predictive Modeling, Chapter 6 Linear Regression, Exercise 6.2 [optimal tuning-parameter selection and model comparison: multiple linear regression, robust regression, partial least squares, ridge regression, lasso, elastic net]

Models: multiple linear regression, robust regression, partial least squares (PLS), ridge regression, lasso, elastic net

Language: R

Reference: Applied Predictive Modeling (2013) by Max Kuhn and Kjell Johnson; Chinese translation by Lin Hui et al.


Case

# Load the data
library(AppliedPredictiveModeling)
data(permeability)
head(fingerprints[1:10,1:5])
head(permeability)
> head(fingerprints[1:10,1:5])
  X1 X2 X3 X4 X5
1  0  0  0  0  0
2  0  0  0  0  0
3  0  0  0  0  0
4  0  0  0  0  0
5  0  0  0  0  0
6  0  0  0  0  0
> head(permeability)
  permeability
1       12.520
2        1.120
3       19.405
4        1.730
5        1.680
6        0.510

library(caret)
# nearZeroVar diagnoses predictors that have a single unique value (zero-variance predictors),
# or predictors with both of the following characteristics: very few unique values relative to
# the number of samples, and a large ratio of the frequency of the most common value to that of
# the second most common value.
# nearZeroVar(x, freqCut = 95/5, uniqueCut = 10, saveMetrics = FALSE, names = FALSE, foreach = FALSE, allowParallel = TRUE)
# freqCut: cutoff for the ratio of the most common value to the second most common value (default 95/5)
# uniqueCut: cutoff for the percentage of distinct values out of the total number of samples (default 10)
fingerprints<-as.data.frame(fingerprints)
near.zero.ind<-nearZeroVar(fingerprints)
fingerprintsFliter<-fingerprints[-near.zero.ind]
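
A quick check (a sketch) confirming the counts stated below:

length(near.zero.ind)     # 719 near-zero-variance predictors flagged
ncol(fingerprintsFliter)  # 388 predictors remain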

719 predictors follow degenerate distributions and are removed, leaving 388 predictors.

# Data pre-processing
summary(permeability)
# Positions of variables containing missing values
NAcol <- which(colSums(is.na(fingerprintsFliter))>0)
NAcol
# There are no missing values in this example

# Convert the response from a one-column matrix to a numeric vector
permeability<-permeability[,1]

# The predictors are all 0/1 variables, so no Box-Cox transformation is needed
# Note: even if you try Box-Cox, it reports "Lambda could not be estimated; no transformation is applied"
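
To see this message for yourself, try the transformation on a single 0/1 column (a minimal sketch):

# Box-Cox requires strictly positive data with enough distinct values,
# so no lambda can be estimated for a binary fingerprint column
BoxCoxTrans(fingerprintsFliter[, 1])
# prints: Lambda could not be estimated; no transformation is applied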

Modeling:

# Evaluate each model's ability to predict new samples via resampling
# Use repeated 10-fold cross-validation (5 repeats)
# Set the random seed so the resampled data sets are reproducible
set.seed(100)
indx <- createMultiFolds(permeability, k = 10, times = 5)
ctrl <- trainControl(method = "repeatedcv",number=10,repeats =5 , index = indx)
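
Passing the same index list to every train() call below means all models are evaluated on identical folds, which is what later allows resamples() to make paired comparisons. A quick look (a sketch):

length(indx)          # 50 resamples: 10 folds x 5 repeats
head(names(indx), 3)  # fold labels, e.g. "Fold01.Rep1"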

PLS:

#PLS
set.seed(100)
pls.model <- train(x = fingerprintsFliter, y = permeability,
                 method = "kernelpls", # Dayal and MacGregor's first kernel algorithm
                 # tuneGrid = expand.grid(ncomp = 1:10), # alternatively, fix the candidate component numbers
                 tuneLength = 40, # evaluate 1 to 40 components
                 trControl = ctrl,
                 preProc = c("center", "scale"))
pls.model
pls.model$bestTune

The optimal number of PLS components is 9, with a resampling-estimated R² of 50.79%.

> pls.model
Partial Least Squares 

165 samples
388 predictors

Pre-processing: centered (388), scaled (388) 
Resampling: Cross-Validated (10 fold, repeated 5 times) 
Summary of sample sizes: 148, 149, 149, 148, 149, 149, ... 
Resampling results across tuning parameters:

  ncomp  RMSE      Rsquared   MAE      
   1     13.05227  0.3240300   9.896661
   2     11.80498  0.4663535   8.308282
   3     11.66726  0.4678344   8.728710
   4     11.72482  0.4674509   8.910673
   5     11.55988  0.4852945   8.621538
   6     11.49614  0.4905275   8.510156
   7     11.31531  0.5016862   8.535471
   8     11.28900  0.5035427   8.665233
   9     11.28461  0.5078777   8.524744
  10     11.46917  0.4984304   8.604099
  11     11.75179  0.4825756   8.752012
  12     11.86670  0.4765582   8.825396
  13     12.05614  0.4688906   8.980787
  14     12.23023  0.4595144   9.057769
  15     12.40422  0.4496407   9.161088
  16     12.66006  0.4375607   9.364334
  17     12.95604  0.4254162   9.534700
  18     13.03904  0.4248562   9.610917
  19     13.10685  0.4220656   9.694758
  20     13.25601  0.4156344   9.804219
  21     13.39648  0.4094236   9.887756
  22     13.66561  0.3984469  10.039217
  23     13.83785  0.3903268  10.162622
  24     13.94098  0.3868608  10.255307
  25     14.06735  0.3825551  10.312950
  26     14.32064  0.3746016  10.494286
  27     14.67806  0.3632787  10.717377
  28     15.00589  0.3529639  10.951726
  29     15.24184  0.3489428  11.080950
  30     15.40581  0.3434618  11.153399
  31     15.51121  0.3398546  11.173934
  32     15.54279  0.3402037  11.151922
  33     15.63262  0.3398001  11.197528
  34     15.67534  0.3394593  11.202881
  35     15.74075  0.3371198  11.232010
  36     15.79167  0.3363571  11.236849
  37     15.85952  0.3350562  11.277133
  38     15.86677  0.3371154  11.275449
  39     15.87254  0.3403697  11.262704
  40     15.91858  0.3426028  11.295925

RMSE was used to select the optimal model using the smallest value.
The final value used for the model was ncomp = 9.
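
The tuning profile in the table can also be plotted directly from the train object (a sketch):

# resampled RMSE versus the number of PLS components
plot(pls.model, metric = "RMSE")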


Given the small sample size (165), this example does not split off a single training/test set; the R² of 50.79% in part (c) is the average R² over the 50 resamples at ncomp = 9.



Multiple linear regression

# Multiple linear regression
set.seed(100)
lm.model <- train(x = fingerprintsFliter, y = permeability,
                method = "lm",
                trControl = ctrl,
                preProc = c("center", "scale"))

lm.model

> lm.model
Linear Regression 

165 samples
388 predictors

Pre-processing: centered (388), scaled (388) 
Resampling: Cross-Validated (10 fold, repeated 5 times) 
Summary of sample sizes: 148, 149, 149, 148, 149, 149, ... 
Resampling results:

  RMSE      Rsquared   MAE     
  24.57042  0.2022581  15.82187

Tuning parameter 'intercept' was held constant at a value of TRUE
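
The poor lm results are expected: with 388 predictors and only 165 samples, the design matrix is rank-deficient, so ordinary least squares cannot estimate all coefficients. A minimal check (a sketch; ols.fit is a name introduced here):

# with p > n, lm() must alias at least p + 1 - n coefficients
ols.fit <- lm(permeability ~ ., data = cbind(fingerprintsFliter, permeability = permeability))
sum(is.na(coef(ols.fit)))  # number of coefficients lm could not estimate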

Robust regression
The predictor matrix is singular, so robust regression cannot be applied directly.
Pre-process the predictors with PCA, then fit the robust regression.

# Robust regression
# The predictor matrix is singular, so rlm cannot be applied directly
# Pre-process the predictors with PCA, then fit the robust regression
set.seed(100)
rlm.model <- train(x = fingerprintsFliter, y = permeability,
                method = "rlm",
                trControl = ctrl,
                preProc = c("pca"))

rlm.model
> rlm.model
Robust Linear Model 

165 samples
388 predictors

Pre-processing: principal component signal extraction (388), centered (388), scaled (388) 
Resampling: Cross-Validated (10 fold, repeated 5 times) 
Summary of sample sizes: 148, 149, 149, 148, 149, 149, ... 
Resampling results across tuning parameters:

  intercept  psi           RMSE      Rsquared   MAE      
  FALSE      psi.huber     17.08872  0.4521439  13.555676
  FALSE      psi.hampel    16.98587  0.4624553  13.672698
  FALSE      psi.bisquare  17.02395  0.4598218  13.550679
   TRUE      psi.huber     12.05982  0.4701847   8.484627
   TRUE      psi.hampel    12.05907  0.4668805   8.675854
   TRUE      psi.bisquare  12.87237  0.4536371   8.412304

RMSE was used to select the optimal model using the smallest value.
The final values used for the model were intercept = TRUE and psi = psi.hampel.
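
How many principal components did the PCA step keep? By default, caret's "pca" pre-processing retains enough components to explain 95% of the variance (trainControl's preProcOptions, thresh = 0.95). A quick check (a sketch):

# the number of retained components equals the columns of the stored rotation matrix
ncol(rlm.model$preProcess$rotation)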

 

Ridge regression

# Ridge regression
# Use train() to pick the best ridge penalty
# The regularization parameter lambda ranges over 0 to 0.1, with 15 values
ridgeGrid <- expand.grid(lambda = seq(0, .1, length = 15))

set.seed(100)
ridge.model <- train(x = fingerprintsFliter, y = permeability,
                   method = "ridge", # ridge regression
                   tuneGrid = ridgeGrid,
                   trControl = ctrl,
                   preProc = c("center", "scale"))

ridge.model

The optimal ridge parameter is lambda = 0.1.

> ridge.model
Ridge Regression 

165 samples
388 predictors

Pre-processing: centered (388), scaled (388) 
Resampling: Cross-Validated (10 fold, repeated 5 times) 
Summary of sample sizes: 148, 149, 149, 148, 149, 149, ... 
Resampling results across tuning parameters:

  lambda       RMSE          Rsquared   MAE         
  0.000000000  6.028134e+15  0.2650199  2.026089e+15
  0.007142857  1.548415e+03  0.3782875  8.180708e+02
  0.014285714  8.845611e+01  0.3798410  5.209785e+01
  0.021428571  3.332586e+01  0.4164882  2.503135e+01
  0.028571429  1.285875e+01  0.4335107  9.274614e+00
  0.035714286  1.260947e+01  0.4444402  9.114385e+00
  0.042857143  1.254606e+01  0.4464467  9.082873e+00
  0.050000000  1.234778e+01  0.4585901  8.963490e+00
  0.057142857  1.225975e+01  0.4639287  8.923119e+00
  0.064285714  1.217146e+01  0.4692078  8.860966e+00
  0.071428571  1.210816e+01  0.4732640  8.823143e+00
  0.078571429  1.206249e+01  0.4765388  8.797852e+00
  0.085714286  1.202607e+01  0.4794337  8.776395e+00
  0.092857143  1.199524e+01  0.4820107  8.759420e+00
  0.100000000  1.197763e+01  0.4841285  8.751014e+00

RMSE was used to select the optimal model using the smallest value.
The final value used for the model was lambda = 0.1.
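
Note that the chosen lambda = 0.1 lies on the upper boundary of the grid and the RMSE is still decreasing there, so extending the grid (say, up to 0.3) might find a slightly better value. The profile can be inspected directly (a sketch):

# resampled RMSE versus the ridge penalty
plot(ridge.model)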

 


Elastic net


# enet: elastic net
# The elastic-net model carries both a ridge penalty and a lasso penalty
# lambda is the ridge penalty (lambda = 0 gives a pure lasso model)
# fraction is the lasso tuning parameter: the fraction of the full solution's L1 norm
# that is kept; fraction = 1 applies no lasso shrinkage (a pure ridge model when lambda > 0)
enetGrid <- expand.grid(lambda = seq(0, .1, length = 15), 
                        fraction = seq(0, 1, length = 20))
set.seed(100)
enet.model<- train(x = fingerprintsFliter, y = permeability,
                  method = "enet", # elastic net
                  tuneGrid = enetGrid,
                  trControl = ctrl,
                  preProc = c("center", "scale"))
enet.model

The best elastic-net parameters: fraction (lasso penalty) = 0.2631579 and lambda (ridge penalty) = 0.09285714.

> enet.model
Elasticnet 

165 samples
388 predictors

Pre-processing: centered (388), scaled (388) 
Resampling: Cross-Validated (10 fold, repeated 5 times) 
Summary of sample sizes: 148, 149, 149, 148, 149, 149, ... 
Resampling results across tuning parameters:

  lambda       fraction    RMSE          Rsquared   MAE         
  0.000000000  0.00000000  1.540478e+01        NaN  1.221506e+01
  0.000000000  0.05263158  3.449540e+14  0.4283842  1.166587e+14
  0.000000000  0.10526316  6.815173e+14  0.4022667  2.302363e+14
  0.000000000  0.15789474  1.018114e+15  0.3916906  3.438139e+14
  0.000000000  0.21052632  1.354720e+15  0.3798959  4.573915e+14
  0.000000000  0.26315789  1.691329e+15  0.3680099  5.709691e+14
  0.000000000  0.31578947  2.027939e+15  0.3572475  6.845467e+14
  0.000000000  0.36842105  2.364551e+15  0.3451573  7.981243e+14
  0.000000000  0.42105263  2.701163e+15  0.3339668  9.117019e+14
  0.000000000  0.47368421  3.037776e+15  0.3253926  1.025279e+15
  0.000000000  0.52631579  3.374389e+15  0.3173453  1.138857e+15
  0.000000000  0.57894737  3.693177e+15  0.3095418  1.246083e+15
  0.000000000  0.63157895  3.987501e+15  0.3028937  1.344561e+15
  0.000000000  0.68421053  4.280623e+15  0.2959701  1.442581e+15
  0.000000000  0.73684211  4.573826e+15  0.2888721  1.540590e+15
  0.000000000  0.78947368  4.867121e+15  0.2828329  1.638597e+15
  0.000000000  0.84210526  5.160496e+15  0.2774765  1.736604e+15
  0.000000000  0.89473684  5.453804e+15  0.2728098  1.834564e+15
  0.000000000  0.94736842  5.740949e+15  0.2689265  1.930330e+15
  0.000000000  1.00000000  6.028134e+15  0.2650199  2.026089e+15
  0.007142857  0.00000000  1.542794e+01        NaN  1.225391e+01
  0.007142857  0.05263158  1.033524e+02  0.4623499  5.200281e+01
  0.007142857  0.10526316  1.646549e+02  0.4660347  8.998209e+01
  0.007142857  0.15789474  2.323283e+02  0.4720228  1.292456e+02
  0.007142857  0.21052632  3.016033e+02  0.4678674  1.691791e+02
  0.007142857  0.26315789  3.819501e+02  0.4596376  2.124624e+02
  0.007142857  0.31578947  4.641333e+02  0.4475696  2.558784e+02
  0.007142857  0.36842105  5.471372e+02  0.4340064  2.993385e+02
  0.007142857  0.42105263  6.305388e+02  0.4251005  3.427369e+02
  0.007142857  0.47368421  7.135822e+02  0.4175334  3.858955e+02
  0.007142857  0.52631579  7.964393e+02  0.4118095  4.289063e+02
  0.007142857  0.57894737  8.794325e+02  0.4067252  4.719197e+02
  0.007142857  0.63157895  9.625168e+02  0.4016919  5.149725e+02
  0.007142857  0.68421053  1.045668e+03  0.3977980  5.580246e+02
  0.007142857  0.73684211  1.128872e+03  0.3943475  6.010758e+02
  0.007142857  0.78947368  1.212128e+03  0.3910934  6.441272e+02
  0.007142857  0.84210526  1.295521e+03  0.3874173  6.872420e+02
  0.007142857  0.89473684  1.379817e+03  0.3846229  7.308539e+02
  0.007142857  0.94736842  1.464120e+03  0.3811768  7.744635e+02
  0.007142857  1.00000000  1.548415e+03  0.3782875  8.180708e+02
  0.014285714  0.00000000  1.542794e+01        NaN  1.225391e+01
  0.014285714  0.05263158  1.797794e+01  0.4732053  1.065071e+01
  0.014285714  0.10526316  2.190067e+01  0.4706222  1.262729e+01
  0.014285714  0.15789474  2.597123e+01  0.4704845  1.477876e+01
  0.014285714  0.21052632  3.034370e+01  0.4663541  1.708084e+01
  0.014285714  0.26315789  3.428904e+01  0.4592488  1.927629e+01
  0.014285714  0.31578947  3.831697e+01  0.4502840  2.154504e+01
  0.014285714  0.36842105  4.241496e+01  0.4408115  2.389293e+01
  0.014285714  0.42105263  4.653987e+01  0.4298422  2.635053e+01
  0.014285714  0.47368421  5.061041e+01  0.4214552  2.882180e+01
  0.014285714  0.52631579  5.465756e+01  0.4151950  3.127600e+01
  0.014285714  0.57894737  5.869254e+01  0.4106026  3.370772e+01
  0.014285714  0.63157895  6.273159e+01  0.4066244  3.614245e+01
  0.014285714  0.68421053  6.676046e+01  0.4036789  3.857259e+01
  0.014285714  0.73684211  7.076430e+01  0.4010070  4.100412e+01
  0.014285714  0.78947368  7.463485e+01  0.3974049  4.339555e+01
  0.014285714  0.84210526  7.845897e+01  0.3932502  4.575368e+01
  0.014285714  0.89473684  8.194474e+01  0.3889553  4.792304e+01
  0.014285714  0.94736842  8.521037e+01  0.3842786  5.001852e+01
  0.014285714  1.00000000  8.845611e+01  0.3798410  5.209785e+01
  0.021428571  0.00000000  1.542794e+01        NaN  1.225391e+01
  0.021428571  0.05263158  1.317415e+01  0.4768770  9.291129e+00
  0.021428571  0.10526316  1.437122e+01  0.4735211  1.017494e+01
  0.021428571  0.15789474  1.534896e+01  0.4823646  1.098397e+01
  0.021428571  0.21052632  1.637868e+01  0.4866607  1.177871e+01
  0.021428571  0.26315789  1.744992e+01  0.4841110  1.261975e+01
  0.021428571  0.31578947  1.854287e+01  0.4800858  1.346653e+01
  0.021428571  0.36842105  1.971829e+01  0.4715516  1.433782e+01
  0.021428571  0.42105263  2.093725e+01  0.4621313  1.524660e+01
  0.021428571  0.47368421  2.212320e+01  0.4537090  1.612098e+01
  0.021428571  0.52631579  2.326907e+01  0.4475608  1.704217e+01
  0.021428571  0.57894737  2.438377e+01  0.4438079  1.792729e+01
  0.021428571  0.63157895  2.549237e+01  0.4406479  1.880798e+01
  0.021428571  0.68421053  2.660261e+01  0.4377210  1.968561e+01
  0.021428571  0.73684211  2.771253e+01  0.4350921  2.056040e+01
  0.021428571  0.78947368  2.882446e+01  0.4324209  2.144242e+01
  0.021428571  0.84210526  2.994108e+01  0.4293780  2.233282e+01
  0.021428571  0.89473684  3.106375e+01  0.4256027  2.323234e+01
  0.021428571  0.94736842  3.219787e+01  0.4209350  2.413550e+01
  0.021428571  1.00000000  3.332586e+01  0.4164882  2.503135e+01
  0.028571429  0.00000000  1.542794e+01        NaN  1.225391e+01
  0.028571429  0.05263158  1.164803e+01  0.4870994  8.381945e+00
  0.028571429  0.10526316  1.148989e+01  0.4820355  8.229012e+00
  0.028571429  0.15789474  1.130939e+01  0.4897388  8.271174e+00
  0.028571429  0.21052632  1.124240e+01  0.4943329  8.292172e+00
  0.028571429  0.26315789  1.130334e+01  0.4929722  8.327261e+00
  0.028571429  0.31578947  1.137778e+01  0.4898966  8.370961e+00
  0.028571429  0.36842105  1.150147e+01  0.4839684  8.449668e+00
  0.028571429  0.42105263  1.165597e+01  0.4764994  8.532020e+00
  0.028571429  0.47368421  1.183303e+01  0.4683594  8.646258e+00
  0.028571429  0.52631579  1.197183e+01  0.4622449  8.744455e+00
  0.028571429  0.57894737  1.207651e+01  0.4581481  8.816762e+00
  0.028571429  0.63157895  1.216941e+01  0.4551937  8.869916e+00
  0.028571429  0.68421053  1.225411e+01  0.4530774  8.914132e+00
  0.028571429  0.73684211  1.234610e+01  0.4507666  8.966320e+00
  0.028571429  0.78947368  1.244166e+01  0.4481786  9.019456e+00
  0.028571429  0.84210526  1.253844e+01  0.4451619  9.072420e+00
  0.028571429  0.89473684  1.264025e+01  0.4418328  9.128159e+00
  0.028571429  0.94736842  1.275330e+01  0.4375581  9.199859e+00
  0.028571429  1.00000000  1.285875e+01  0.4335107  9.274614e+00
  0.035714286  0.00000000  1.542794e+01        NaN  1.225391e+01
  0.035714286  0.05263158  1.170165e+01  0.4858385  8.468453e+00
  0.035714286  0.10526316  1.150543e+01  0.4814832  8.209731e+00
  0.035714286  0.15789474  1.132812e+01  0.4898204  8.238532e+00
  0.035714286  0.21052632  1.123613e+01  0.4950617  8.257063e+00
  0.035714286  0.26315789  1.127977e+01  0.4947655  8.314775e+00
  0.035714286  0.31578947  1.132916e+01  0.4932586  8.339980e+00
  0.035714286  0.36842105  1.141954e+01  0.4894771  8.401248e+00
  0.035714286  0.42105263  1.154807e+01  0.4834828  8.461108e+00
  0.035714286  0.47368421  1.169445e+01  0.4769915  8.548476e+00
  0.035714286  0.52631579  1.183228e+01  0.4709574  8.645223e+00
  0.035714286  0.57894737  1.194146e+01  0.4664147  8.723158e+00
  0.035714286  0.63157895  1.201380e+01  0.4643840  8.770827e+00
  0.035714286  0.68421053  1.208726e+01  0.4623733  8.815985e+00
  0.035714286  0.73684211  1.216064e+01  0.4606180  8.855813e+00
  0.035714286  0.78947368  1.224247e+01  0.4581163  8.896741e+00
  0.035714286  0.84210526  1.232990e+01  0.4552601  8.947428e+00
  0.035714286  0.89473684  1.242178e+01  0.4520166  8.999512e+00
  0.035714286  0.94736842  1.251768e+01  0.4481484  9.055775e+00
  0.035714286  1.00000000  1.260947e+01  0.4444402  9.114385e+00
  0.042857143  0.00000000  1.542794e+01        NaN  1.225391e+01
  0.042857143  0.05263158  1.173948e+01  0.4853274  8.532311e+00
  0.042857143  0.10526316  1.151846e+01  0.4812703  8.199202e+00
  0.042857143  0.15789474  1.134515e+01  0.4902282  8.221829e+00
  0.042857143  0.21052632  1.121456e+01  0.4977462  8.228653e+00
  0.042857143  0.26315789  1.120874e+01  0.5002426  8.270423e+00
  0.042857143  0.31578947  1.125242e+01  0.4992024  8.289716e+00
  0.042857143  0.36842105  1.132054e+01  0.4962719  8.337868e+00
  0.042857143  0.42105263  1.142924e+01  0.4912499  8.396577e+00
  0.042857143  0.47368421  1.156241e+01  0.4853460  8.474910e+00
  0.042857143  0.52631579  1.170353e+01  0.4789064  8.562321e+00
  0.042857143  0.57894737  1.181902e+01  0.4737488  8.648265e+00
  0.042857143  0.63157895  1.191252e+01  0.4701042  8.714922e+00
  0.042857143  0.68421053  1.198879e+01  0.4677065  8.763115e+00
  0.042857143  0.73684211  1.206858e+01  0.4651293  8.813336e+00
  0.042857143  0.78947368  1.215910e+01  0.4618470  8.865899e+00
  0.042857143  0.84210526  1.225542e+01  0.4582101  8.917571e+00
  0.042857143  0.89473684  1.235329e+01  0.4544600  8.973749e+00
  0.042857143  0.94736842  1.245131e+01  0.4504904  9.028878e+00
  0.042857143  1.00000000  1.254606e+01  0.4464467  9.082873e+00
  0.050000000  0.00000000  1.542794e+01        NaN  1.225391e+01
  0.050000000  0.05263158  1.178462e+01  0.4843596  8.612444e+00
  0.050000000  0.10526316  1.150884e+01  0.4825917  8.159398e+00
  0.050000000  0.15789474  1.137125e+01  0.4893425  8.221834e+00
  0.050000000  0.21052632  1.124620e+01  0.4962140  8.225802e+00
  0.050000000  0.26315789  1.122915e+01  0.4988484  8.264990e+00
  0.050000000  0.31578947  1.128418e+01  0.4970736  8.299529e+00
  0.050000000  0.36842105  1.133788e+01  0.4952844  8.334344e+00
  0.050000000  0.42105263  1.141684e+01  0.4924373  8.381556e+00
  0.050000000  0.47368421  1.152914e+01  0.4878408  8.447196e+00
  0.050000000  0.52631579  1.165030e+01  0.4826274  8.521640e+00
  0.050000000  0.57894737  1.175222e+01  0.4783838  8.592669e+00
  0.050000000  0.63157895  1.182910e+01  0.4757828  8.648299e+00
  0.050000000  0.68421053  1.189072e+01  0.4742518  8.692550e+00
  0.050000000  0.73684211  1.195425e+01  0.4725210  8.734556e+00
  0.050000000  0.78947368  1.202159e+01  0.4706393  8.775882e+00
  0.050000000  0.84210526  1.209943e+01  0.4681078  8.818007e+00
  0.050000000  0.89473684  1.218274e+01  0.4650933  8.867505e+00
  0.050000000  0.94736842  1.226803e+01  0.4618557  8.917214e+00
  0.050000000  1.00000000  1.234778e+01  0.4585901  8.963490e+00
  0.057142857  0.00000000  1.542794e+01        NaN  1.225391e+01
  0.057142857  0.05263158  1.182753e+01  0.4830048  8.681742e+00
  0.057142857  0.10526316  1.150122e+01  0.4840802  8.136433e+00
  0.057142857  0.15789474  1.137930e+01  0.4899405  8.204708e+00
  0.057142857  0.21052632  1.123801e+01  0.4976457  8.198693e+00
  0.057142857  0.26315789  1.120357e+01  0.5010694  8.233278e+00
  0.057142857  0.31578947  1.126110e+01  0.4995409  8.284754e+00
  0.057142857  0.36842105  1.131703e+01  0.4975747  8.320962e+00
  0.057142857  0.42105263  1.138943e+01  0.4950779  8.369898e+00
  0.057142857  0.47368421  1.149156e+01  0.4911438  8.428510e+00
  0.057142857  0.52631579  1.159760e+01  0.4868781  8.493519e+00
  0.057142857  0.57894737  1.169796e+01  0.4827026  8.560904e+00
  0.057142857  0.63157895  1.177364e+01  0.4801300  8.615219e+00
  0.057142857  0.68421053  1.183159e+01  0.4787248  8.656630e+00
  0.057142857  0.73684211  1.188962e+01  0.4772597  8.696937e+00
  0.057142857  0.78947368  1.194775e+01  0.4757366  8.733759e+00
  0.057142857  0.84210526  1.201928e+01  0.4734532  8.776562e+00
  0.057142857  0.89473684  1.209756e+01  0.4705435  8.825266e+00
  0.057142857  0.94736842  1.218293e+01  0.4671266  8.878051e+00
  0.057142857  1.00000000  1.225975e+01  0.4639287  8.923119e+00
  0.064285714  0.00000000  1.542794e+01        NaN  1.225391e+01
  0.064285714  0.05263158  1.183869e+01  0.4847573  8.741220e+00
  0.064285714  0.10526316  1.150149e+01  0.4842645  8.117179e+00
  0.064285714  0.15789474  1.140137e+01  0.4890712  8.201532e+00
  0.064285714  0.21052632  1.125490e+01  0.4970566  8.199461e+00
  0.064285714  0.26315789  1.120131e+01  0.5014210  8.227519e+00
  0.064285714  0.31578947  1.125510e+01  0.4998556  8.276543e+00
  0.064285714  0.36842105  1.131186e+01  0.4978607  8.311247e+00
  0.064285714  0.42105263  1.137609e+01  0.4958606  8.355754e+00
  0.064285714  0.47368421  1.146583e+01  0.4927417  8.410237e+00
  0.064285714  0.52631579  1.156260e+01  0.4892446  8.467217e+00
  0.064285714  0.57894737  1.165268e+01  0.4857390  8.525025e+00
  0.064285714  0.63157895  1.172156e+01  0.4835935  8.576247e+00
  0.064285714  0.68421053  1.177692e+01  0.4822296  8.615257e+00
  0.064285714  0.73684211  1.183054e+01  0.4810509  8.652265e+00
  0.064285714  0.78947368  1.188335e+01  0.4797117  8.687362e+00
  0.064285714  0.84210526  1.194718e+01  0.4777726  8.726094e+00
  0.064285714  0.89473684  1.202184e+01  0.4750441  8.771871e+00
  0.064285714  0.94736842  1.209913e+01  0.4720985  8.820179e+00
  0.064285714  1.00000000  1.217146e+01  0.4692078  8.860966e+00
 [ reached getOption("max.print") -- omitted 100 rows ]

RMSE was used to select the optimal model using the smallest value.
The final values used for the model were fraction = 0.2631579 and lambda = 0.09285714.
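
The two-dimensional tuning surface can be plotted the same way. And since the exercise names the lasso as well, a standalone lasso can be tuned with caret's "lasso" method. The sketch below was not run in the original post (lasso.model is a name introduced here); note that the exploding lambda = 0 rows in the enet grid above suggest the pure-lasso path is numerically unstable on this rank-deficient data:

# profile of resampled RMSE over (lambda, fraction)
plot(enet.model)

# pure lasso: only the fraction parameter is tuned
set.seed(100)
lasso.model <- train(x = fingerprintsFliter, y = permeability,
                     method = "lasso",
                     tuneGrid = expand.grid(fraction = seq(0.05, 1, length = 20)),
                     trControl = ctrl,
                     preProc = c("center", "scale"))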

Model comparison:

The resamples() function in caret analyzes and visualizes resampling results (the resampling must have been done via train()).

For each algorithm, what is compared is the final model, i.e. the tuning setting with the smallest RMSE. Since the resampling scheme is repeated 10-fold cross-validation, 50 resamples were drawn in total, so each algorithm's final model has 50 performance values.

# Model comparison
# resamples() analyzes and visualizes the resampling results
resamp <- resamples(list(lm = lm.model, rlm = rlm.model, pls = pls.model,
                         ridge = ridge.model, enet = enet.model))
summary(resamp)
dotplot(resamp, metric = "Rsquared")

summary(diff(resamp))

As the summary shows, PLS and enet have the smallest mean RMSE, i.e. the best predictive performance, while lm has the largest mean RMSE, i.e. the worst.

> summary(resamp)

Call:
summary.resamples(object = resamp)

Models: lm, rlm, pls, ridge, enet 
Number of resamples: 50 

MAE 
          Min.   1st Qu.    Median      Mean   3rd Qu.     Max. NA's
lm    6.065709 12.894319 15.171037 15.821875 18.783692 24.65799    0
rlm   5.717319  7.432964  8.749105  8.675854 10.093184 14.11213    0
pls   4.682886  7.514041  8.515557  8.524744  9.745764 11.94252    0
ridge 4.829772  7.835397  8.616009  8.751014 10.078878 12.12983    0
enet  5.260407  7.075480  8.321390  8.181317  8.879569 11.87011    0

RMSE 
          Min.   1st Qu.   Median     Mean  3rd Qu.     Max. NA's
lm    8.378259 17.667157 24.15602 24.57042 31.61323 37.00719    0
rlm   7.239544  9.659033 11.89120 12.05907 14.40045 20.71075    0
pls   6.168291  9.985897 11.13949 11.28461 13.03929 15.58695    0
ridge 6.166274 10.433456 11.92233 11.97763 13.77134 16.98081    0
enet  6.915297  9.869505 10.90024 11.19268 12.99169 16.83886    0

Rsquared 
              Min.    1st Qu.    Median      Mean   3rd Qu.      Max. NA's
lm    0.0005498779 0.02368074 0.1239874 0.2022581 0.3396363 0.8353050    0
rlm   0.0502481827 0.30541469 0.4553379 0.4668805 0.6025489 0.8584627    0
pls   0.1747960521 0.36136520 0.4763336 0.5078777 0.6766491 0.8827837    0
ridge 0.1272810606 0.31071584 0.4429805 0.4841285 0.6520417 0.8474234    0
enet  0.1534451182 0.35680985 0.4975184 0.5049357 0.6397455 0.8552184    0

The dotplot shows confidence intervals around each model's mean performance (here R²; RMSE can be plotted the same way).

As the plot shows, PLS and enet perform comparably.
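
Box-and-whisker plots of the raw resample distributions give an alternative view (a sketch):

bwplot(resamp, metric = "RMSE")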

Use diff() for pairwise comparisons.

Upper diagonal: estimates of the pairwise differences

Lower diagonal: p-values

Judging by R², combining the estimated differences with the p-values: apart from lm, which is clearly poor, the models do not differ significantly. PLS can therefore be kept; given the elastic net's slightly better showing, the elastic net is also a reasonable choice.

 

> summary(diff(resamp))

Call:
summary.diff.resamples(object = diff(resamp))

p-value adjustment: bonferroni 
Upper diagonal: estimates of the difference
Lower diagonal: p-value for H0: difference = 0

MAE 
      lm      rlm      pls      ridge    enet    
lm             7.14602  7.29713  7.07086  7.64056
rlm   < 2e-16           0.15111 -0.07516  0.49454
pls   < 2e-16 1.00000           -0.22627  0.34343
ridge < 2e-16 1.00000  0.42680            0.56970
enet  < 2e-16 0.02309  0.28097  0.01684          

RMSE 
      lm        rlm      pls       ridge    enet    
lm              12.51135 13.28581  12.59279 13.37774
rlm   1.600e-15           0.77446   0.08144  0.86639
pls   < 2.2e-16 0.206060           -0.69302  0.09193
ridge < 2.2e-16 1.000000 2.685e-05           0.78495
enet  < 2.2e-16 0.002561 1.000000  0.011402         

Rsquared 
      lm        rlm       pls       ridge     enet     
lm              -0.264622 -0.305620 -0.281870 -0.302678
rlm   1.799e-11           -0.040997 -0.017248 -0.038055
pls   2.169e-15 0.77260              0.023749  0.002942
ridge 3.529e-14 1.00000   0.04525             -0.020807
enet  4.330e-15 0.20239   1.00000   1.00000 

 



Yes: PLS (or the elastic net) could be used.
