> library(pacman)
> p_load(dplyr, readr, caret)
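Take the results from the previous section as the baseline. Without outlier removal, the MSE was 3619.029 and the adjusted R² was 0.8603. After removing outliers, the MSE was 2690.545 and the adjusted R² was 0.8706. On the test set, the two models had MSEs of 2914.014 and 1672.859, respectively. We now try to improve on these models.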
> results <- tribble(~model, ~mse, ~r_square, ~test_mse,
+ "original", 3619.029, 0.8603, 2914.014,
+ "remove_out", 2690.545, 0.8706, 1672.859)
> results
## # A tibble: 2 x 4
##   model        mse r_square test_mse
##   <chr>      <dbl>    <dbl>    <dbl>
## 1 original   3619.    0.860    2914.
## 2 remove_out 2691.    0.871    1673.
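To compare each improved model against this baseline, it helps to compute both metrics the same way every time. The sketch below assumes the MSE is the plain mean of squared errors and uses the standard formula adjusted R² = 1 − (1 − R²)(n − 1)/(n − p − 1) for a model with an intercept and p predictors. The helper names `mse` and `adj_r2` are hypothetical, and the built-in `mtcars` data stands in for the machine data.

```r
# Hypothetical helpers (not from the original post)
# Plain mean squared error
mse <- function(y, yhat) mean((y - yhat)^2)

# Adjusted R^2 for a model with an intercept and p predictors
adj_r2 <- function(y, yhat, p) {
  n  <- length(y)
  r2 <- 1 - sum((y - yhat)^2) / sum((y - mean(y))^2)
  1 - (1 - r2) * (n - 1) / (n - p - 1)
}

# Sanity check on built-in data: adj_r2 should match summary()'s value
fit <- lm(mpg ~ wt + hp, data = mtcars)
mse(mtcars$mpg, fitted(fit))
adj_r2(mtcars$mpg, fitted(fit), p = 2)
summary(fit)$adj.r.squared
```

For a test-set MSE, pass the held-out outcome and `predict(model, newdata)` to `mse()` instead of the fitted values.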
1. Data preprocessing
> machine
> names(machine)
+ "cach", "chmin", "chmax", "prp", "erp")
> machine
>
> set.seed(123)
> ind
>
> dtrain
> dtest
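The exact split code above is partly lost, so the call that produced `ind` is unknown. If it came from caret's `createDataPartition()` (an assumption, as is the 85/15 ratio), a self-contained version on simulated data looks like the sketch below. For a numeric outcome, `createDataPartition()` samples within percentile groups, so the training and test sets get similar outcome distributions.

```r
library(caret)

set.seed(123)
# Simulated stand-in for the machine data (hypothetical columns)
df <- data.frame(x = rnorm(200), prp = rexp(200, rate = 1 / 100))

# Stratified split on the numeric outcome; p = 0.85 is an assumed ratio
ind    <- createDataPartition(df$prp, p = 0.85, list = FALSE)
dtrain <- df[ind, ]
dtest  <- df[-ind, ]

c(train = nrow(dtrain), test = nrow(dtest))
```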
2. Reducing the feature set
> ct
> set.seed(123)
> fit.step
+ trControl = ct, preProcess = c("corr"), trace = F)
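With `preProcess = c("corr")`, `train()` drops predictors whose pairwise absolute correlation exceeds a cutoff (0.9 by default in `preProcess()`) before fitting. The method string passed to `train()` is cut off above; the name `fit.step` and the `trace = F` argument suggest a stepwise AIC search such as caret's `"lmStepAIC"`, which wraps `MASS::stepAIC()`, but that is an assumption. The sketch below reproduces both steps by hand on simulated data in which `x2` is a near copy of `x1`; all column names are hypothetical.

```r
library(caret)

set.seed(42)
n <- 200
# Simulated data: x2 is almost a copy of x1; x4 is pure noise
d <- data.frame(x1 = rnorm(n), x3 = rnorm(n), x4 = rnorm(n))
d$x2 <- d$x1 + rnorm(n, sd = 0.05)
d$y  <- 3 * d$x1 + 2 * d$x3 + rnorm(n)

preds <- c("x1", "x2", "x3", "x4")

# Step 1: correlation filter (default cutoff 0.9), fitted on predictors only
pp <- preProcess(d[, preds], method = "corr")
X  <- predict(pp, d[, preds])
dd <- cbind(X, y = d$y)

# Step 2: stepwise AIC search starting from the full linear model.
# MASS:: is called explicitly so MASS is not attached and its select()
# does not mask dplyr::select() in the same session.
full <- lm(y ~ ., data = dd)
fit  <- MASS::stepAIC(full, direction = "both", trace = FALSE)

names(X)      # one of x1/x2 has been removed by the filter
formula(fit)  # terms kept by the stepwise search
```

Filtering first keeps the stepwise search from choosing arbitrarily between two nearly identical predictors, which would otherwise make the selected formula unstable across resamples.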
>
> summary(fit.step$finalModel)
##
## Call:
## lm(formula = .outcome ~ myct + mmin + mmax + cach + chmax, data = dat)
##
## Residuals:
## Min 1Q Median 3Q Max
## -163.94 -29.68 3.25 28.52 355.05
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -6.024e+01 8.909e+00