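Loading the package
The data set ships with an add-on package, so load that package first (in the R GUI: Packages > Load package..., then select MPV; from the console: library(MPV)).
Importing the data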
> data<-table.b11
> data
Clarity Aroma Body Flavor Oakiness Quality Region
1 1.0 3.3 2.8 3.1 4.1 9.8 1
2 1.0 4.4 4.9 3.5 3.9 12.6 1
3 1.0 3.9 5.3 4.8 4.7 11.9 1
4 1.0 3.9 2.6 3.1 3.6 11.1 1
5 1.0 5.6 5.1 5.5 5.1 13.3 1
6 1.0 4.6 4.7 5.0 4.1 12.8 1
7 1.0 4.8 4.8 4.8 3.3 12.8 1
8 1.0 5.3 4.5 4.3 5.2 12.0 1
9 1.0 4.3 4.3 3.9 2.9 13.6 3
10 1.0 4.3 3.9 4.7 3.9 13.9 1
11 1.0 5.1 4.3 4.5 3.6 14.4 3
12 0.5 3.3 5.4 4.3 3.6 12.3 2
13 0.8 5.9 5.7 7.0 4.1 16.1 3
14 0.7 7.7 6.6 6.7 3.7 16.1 3
15 1.0 7.1 4.4 5.8 4.1 15.5 3
16 0.9 5.5 5.6 5.6 4.4 15.5 3
17 1.0 6.3 5.4 4.8 4.6 13.8 3
18 1.0 5.0 5.5 5.5 4.1 13.8 3
19 1.0 4.6 4.1 4.3 3.1 11.3 1
20 0.9 3.4 5.0 3.4 3.4 7.9 2
21 0.9 6.4 5.4 6.6 4.8 15.1 3
22 1.0 5.5 5.3 5.3 3.8 13.5 3
23 0.7 4.7 4.1 5.0 3.7 10.8 2
24 0.7 4.1 4.0 4.1 4.0 9.5 2
25 1.0 6.0 5.4 5.7 4.7 12.7 3
26 1.0 4.3 4.6 4.7 4.9 11.6 2
27 1.0 3.9 4.0 5.1 5.1 11.7 1
28 1.0 5.1 4.9 5.0 5.1 11.9 2
29 1.0 3.9 4.4 5.0 4.4 10.8 2
30 1.0 4.5 3.7 2.9 3.9 8.5 2
31 1.0 5.2 4.3 5.0 6.0 10.7 2
32 0.8 4.2 3.8 3.0 4.7 9.1 1
33 1.0 3.3 3.5 4.3 4.5 12.1 1
34 1.0 6.8 5.0 6.0 5.2 14.9 3
35 0.8 5.0 5.7 5.5 4.8 13.5 1
36 0.8 3.5 4.7 4.2 3.3 12.2 1
37 0.8 4.3 5.5 3.5 5.8 10.3 1
38 0.8 5.2 4.8 5.7 3.5 13.2 1
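Renaming the columns (only six names are given for seven columns, so the last column, Region, ends up named NA; it is not used in the models below)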
> colnames(data)<-c("x1","x2","x3","x4","x5","y")
> data
x1 x2 x3 x4 x5 y NA
1 1.0 3.3 2.8 3.1 4.1 9.8 1
2 1.0 4.4 4.9 3.5 3.9 12.6 1
3 1.0 3.9 5.3 4.8 4.7 11.9 1
4 1.0 3.9 2.6 3.1 3.6 11.1 1
5 1.0 5.6 5.1 5.5 5.1 13.3 1
6 1.0 4.6 4.7 5.0 4.1 12.8 1
7 1.0 4.8 4.8 4.8 3.3 12.8 1
8 1.0 5.3 4.5 4.3 5.2 12.0 1
9 1.0 4.3 4.3 3.9 2.9 13.6 3
10 1.0 4.3 3.9 4.7 3.9 13.9 1
11 1.0 5.1 4.3 4.5 3.6 14.4 3
12 0.5 3.3 5.4 4.3 3.6 12.3 2
13 0.8 5.9 5.7 7.0 4.1 16.1 3
14 0.7 7.7 6.6 6.7 3.7 16.1 3
15 1.0 7.1 4.4 5.8 4.1 15.5 3
16 0.9 5.5 5.6 5.6 4.4 15.5 3
17 1.0 6.3 5.4 4.8 4.6 13.8 3
18 1.0 5.0 5.5 5.5 4.1 13.8 3
19 1.0 4.6 4.1 4.3 3.1 11.3 1
20 0.9 3.4 5.0 3.4 3.4 7.9 2
21 0.9 6.4 5.4 6.6 4.8 15.1 3
22 1.0 5.5 5.3 5.3 3.8 13.5 3
23 0.7 4.7 4.1 5.0 3.7 10.8 2
24 0.7 4.1 4.0 4.1 4.0 9.5 2
25 1.0 6.0 5.4 5.7 4.7 12.7 3
26 1.0 4.3 4.6 4.7 4.9 11.6 2
27 1.0 3.9 4.0 5.1 5.1 11.7 1
28 1.0 5.1 4.9 5.0 5.1 11.9 2
29 1.0 3.9 4.4 5.0 4.4 10.8 2
30 1.0 4.5 3.7 2.9 3.9 8.5 2
31 1.0 5.2 4.3 5.0 6.0 10.7 2
32 0.8 4.2 3.8 3.0 4.7 9.1 1
33 1.0 3.3 3.5 4.3 4.5 12.1 1
34 1.0 6.8 5.0 6.0 5.2 14.9 3
35 0.8 5.0 5.7 5.5 4.8 13.5 1
36 0.8 3.5 4.7 4.2 3.3 12.2 1
37 0.8 4.3 5.5 3.5 5.8 10.3 1
38 0.8 5.2 4.8 5.7 3.5 13.2 1
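One way to avoid the NA column name is to rename only the columns that actually get a new name. The sketch below is illustrative: the data frame wine and the vector new_names are names made up for this example, and the three rows are simply the first three rows of the printout above.

```r
# First three rows of the wine data, copied from the printout above.
wine <- data.frame(
  Clarity  = c(1.0, 1.0, 1.0),
  Aroma    = c(3.3, 4.4, 3.9),
  Body     = c(2.8, 4.9, 5.3),
  Flavor   = c(3.1, 3.5, 4.8),
  Oakiness = c(4.1, 3.9, 4.7),
  Quality  = c(9.8, 12.6, 11.9),
  Region   = c(1, 1, 1)
)

new_names <- c("x1", "x2", "x3", "x4", "x5", "y")

# Rename only the first six columns; Region keeps its original name.
names(wine)[seq_along(new_names)] <- new_names

print(names(wine))
```

Indexing the names by seq_along(new_names) keeps the assignment the same length on both sides, so no column is left unnamed.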
Fitting the linear regression model, using data as the data set
> lma<-lm(y~x1+x2+x3+x4+x5,data=data)
> summary(lma)
Call:
lm(formula = y ~ x1 + x2 + x3 + x4 + x5, data = data)
Residuals:
Min 1Q Median 3Q Max
-2.85552 -0.57448 -0.07092 0.67275 1.68093
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 3.9969 2.2318 1.791 0.082775 .
x1 2.3395 1.7348 1.349 0.186958
x2 0.4826 0.2724 1.771 0.086058 .
x3 0.2732 0.3326 0.821 0.417503
x4 1.1683 0.3045 3.837 0.000552 ***
x5 -0.6840 0.2712 -2.522 0.016833 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 1.163 on 32 degrees of freedom
Multiple R-squared: 0.7206, Adjusted R-squared: 0.6769
F-statistic: 16.51 on 5 and 32 DF, p-value: 4.703e-08
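Interpretation: the fitted regression equation is y = 3.9969 + 2.3395 x1 + 0.4826 x2 + 0.2732 x3 + 1.1683 x4 - 0.6840 x5
Overall significance test: F = 16.51 on 5 and 32 degrees of freedom, p = 4.703*10^(-8) < 0.01, so x1, x2, x3, x4, x5 jointly have a highly significant linear effect on y. The t tests for the individual coefficients of x1, x2, x3, x4, x5: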
Variable | x1 | x2 | x3 | x4 | x5 |
---|---|---|---|---|---|
p-value | 0.186958 | 0.086058 | 0.417503 | 0.000552 | 0.016833 |
t value | 1.349 | 1.771 | 0.821 | 3.837 | -2.522 |
At significance level α = 0.05, only the coefficients of x4 and x5 are significantly different from zero.
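As a cross-check, the t values and two-sided p-values can be recomputed from the Estimate and Std. Error columns printed by summary(lma), using t = estimate / standard error with 32 residual degrees of freedom. This is a minimal sketch; the vector names est, se, t_val, p_val and signif_05 are chosen here for illustration.

```r
# Estimates and standard errors copied from the summary(lma) output.
est <- c(x1 = 2.3395, x2 = 0.4826, x3 = 0.2732, x4 = 1.1683, x5 = -0.6840)
se  <- c(x1 = 1.7348, x2 = 0.2724, x3 = 0.3326, x4 = 0.3045, x5 = 0.2712)
df_resid <- 32  # residual degrees of freedom reported by summary(lma)

# t statistic for H0: coefficient = 0, and its two-sided p-value.
t_val <- est / se
p_val <- 2 * pt(-abs(t_val), df = df_resid)

print(round(cbind(t = t_val, p = p_val), 4))

# Coefficients significant at the 5% level.
signif_05 <- names(p_val)[p_val < 0.05]
print(signif_05)
```

Because the inputs are rounded to four decimals, the recomputed values agree with the printed summary only to about three significant digits.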
Stepwise regression in R
> lm.step<-step(lma,direction="both")
Start: AIC=16.92
y ~ x1 + x2 + x3 + x4 + x5
Df Sum of Sq RSS AIC
- x3 1 0.9118 44.160 15.709
<none> 43.248 16.916
- x1 1 2.4577 45.706 17.016
- x2 1 4.2397 47.488 18.470
- x5 1 8.5978 51.846 21.806
- x4 1 19.8986 63.147 29.299
Step: AIC=15.71
y ~ x1 + x2 + x4 + x5
Df Sum of Sq RSS AIC
- x1 1 1.6936 45.853 15.139
<none> 44.160 15.709
+ x3 1 0.9118 43.248 16.916
- x2 1 5.3545 49.514 18.058
- x5 1 8.0807 52.241 20.094
- x4 1 27.3280 71.488 32.014
Step: AIC=15.14
y ~ x2 + x4 + x5
Df Sum of Sq RSS AIC
<none> 45.853 15.139
+ x1 1 1.6936 44.160 15.709
+ x3 1 0.1477 45.706 17.016
- x2 1 6.6026 52.456 18.251
- x5 1 6.9989 52.852 18.537
- x4 1 25.6888 71.542 30.043
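Stepwise selection stops at the model with the smallest AIC among the candidates it examined, which here is the regression of y on x2, x4, and x5.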
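The AIC values in the step() trace can be reproduced by hand. For a linear model, step() relies on extractAIC(), which uses n * log(RSS / n) + 2 * (number of estimated coefficients); this differs from the value returned by AIC() by a constant, so only differences between models matter. The helper step_aic below is a hypothetical name introduced for this sketch.

```r
n <- 38  # number of observations in the wine data

# AIC as reported in the step() trace for a linear model.
# rss: residual sum of squares; n_coef: coefficients including the intercept.
step_aic <- function(rss, n_coef, n_obs) {
  n_obs * log(rss / n_obs) + 2 * n_coef
}

# RSS values copied from the "<none>" rows of the three steps.
aic_full  <- step_aic(43.248, n_coef = 6, n_obs = n)  # y ~ x1 + x2 + x3 + x4 + x5
aic_no_x3 <- step_aic(44.160, n_coef = 5, n_obs = n)  # y ~ x1 + x2 + x4 + x5
aic_final <- step_aic(45.853, n_coef = 4, n_obs = n)  # y ~ x2 + x4 + x5

print(round(c(full = aic_full, no_x3 = aic_no_x3, final = aic_final), 3))
```

Dropping x3 and then x1 raises the RSS slightly each time, but the penalty of 2 per coefficient falls by more, which is why the AIC decreases at each step.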
> summary(lm.step)
Call:
lm(formula = y ~ x2 + x4 + x5, data = data)
Residuals:
Min 1Q Median 3Q Max
-2.5707 -0.6256 0.1521 0.6467 1.7741
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 6.4672 1.3328 4.852 2.67e-05 ***
x2 0.5801 0.2622 2.213 0.033740 *
x4 1.1997 0.2749 4.364 0.000113 ***
x5 -0.6023 0.2644 -2.278 0.029127 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 1.161 on 34 degrees of freedom
Multiple R-squared: 0.7038, Adjusted R-squared: 0.6776
F-statistic: 26.92 on 3 and 34 DF, p-value: 4.203e-09
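Interpretation: the fitted regression of y on x2, x4, x5 is y = 6.4672 + 0.5801 x2 + 1.1997 x4 - 0.6023 x5
F test: F = 26.92 on 3 and 34 degrees of freedom, p = 4.203*10^(-9) < 0.01, so x2, x4, x5 jointly have a highly significant linear effect on y. The t tests for the individual coefficients: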
Variable | x2 | x4 | x5 |
---|---|---|---|
t value | 2.213 | 4.364 | -2.278 |
p-value | 0.033740 | 0.000113 | 0.029127 |
At significance level α = 0.05, the coefficients of x2, x4, and x5 are all significantly different from zero.
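To compare the full and the reduced model, the adjusted R-squared and the F statistic can be recomputed from the Multiple R-squared values in the two summaries. This is a rough check under the assumption n = 38; the function names adj_r2 and f_stat are made up for this sketch, and results agree with the printed output only up to rounding of R-squared.

```r
n <- 38  # number of observations

# Adjusted R-squared for a model with p predictors (intercept not counted).
adj_r2 <- function(r2, p, n) 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Overall F statistic for H0: all p slope coefficients are zero.
f_stat <- function(r2, p, n) (r2 / p) / ((1 - r2) / (n - p - 1))

# Multiple R-squared copied from summary(lma) and summary(lm.step).
adj_full    <- adj_r2(0.7206, p = 5, n = n)
adj_reduced <- adj_r2(0.7038, p = 3, n = n)
f_full      <- f_stat(0.7206, p = 5, n = n)
f_reduced   <- f_stat(0.7038, p = 3, n = n)

print(round(c(adj_full = adj_full, adj_reduced = adj_reduced), 4))
print(round(c(f_full = f_full, f_reduced = f_reduced), 2))

# Upper-tail p-value of the reduced model's F statistic.
p_reduced <- pf(f_reduced, df1 = 3, df2 = n - 3 - 1, lower.tail = FALSE)
print(p_reduced)
```

The reduced model has a slightly lower R-squared but a slightly higher adjusted R-squared, which is consistent with x1 and x3 adding little once x2, x4, and x5 are in the model.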
Point and interval estimates for a predicted y
> preds<-data.frame(x1=1.1,x2=5.2,x3=5.6,x4=5.5,x5=14)
> predict(lm.step,newdata=preds,interval="c",level=0.95)
fit lwr upr
1 7.649586 2.429657 12.86951
> predict(lm.step,newdata=preds,interval="prediction",level=0.95)
fit lwr upr
1 7.649586 1.920927 13.37824
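Interpretation:
Point estimate (fitted value): 7.649586; 95% confidence interval for the mean response: [2.429657, 12.86951]; 95% prediction interval for a new observation: [1.920927, 13.37824]. Both intervals are very wide because x5 = 14 lies far outside the range of x5 in the data (2.9 to 6.0), so this prediction is an extrapolation.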
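The point estimate and the link between the two intervals can be checked by hand. The sketch below plugs the new point into the fitted equation, then recovers the prediction interval from the confidence interval using the residual standard error s = 1.161 on 34 degrees of freedom: the prediction variance is the variance of the estimated mean plus s^2. All variable names are chosen for this example.

```r
# Coefficients of y ~ x2 + x4 + x5, copied from summary(lm.step).
b     <- c(intercept = 6.4672, x2 = 0.5801, x4 = 1.1997, x5 = -0.6023)
x_new <- c(1, 5.2, 5.5, 14)  # leading 1 multiplies the intercept

# Point estimate: inner product of coefficients and the new point.
fit_hand <- sum(b * x_new)

s        <- 1.161  # residual standard error
df_resid <- 34     # residual degrees of freedom
t_crit   <- qt(0.975, df_resid)

# Half-width of the printed 95% confidence interval gives the
# standard error of the estimated mean response.
ci_half <- (12.86951 - 2.429657) / 2
se_mean <- ci_half / t_crit

# A new observation adds the residual variance s^2.
pi_half <- t_crit * sqrt(se_mean^2 + s^2)
pi_hand <- fit_hand + c(-1, 1) * pi_half

print(round(fit_hand, 3))
print(round(pi_hand, 3))
```

The prediction interval is always wider than the confidence interval at the same point, because it also accounts for the scatter of an individual observation around the mean.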
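A one-letter typo produces a very different result. Below, newdata is misspelled as mewdata. predict() does not recognise the argument mewdata and silently ignores it, so no new data are supplied and it returns fitted values with confidence intervals for all 38 observations used to fit the model: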
> predict(lm.step,mewdata=preds,interval="c",level=0.95)
fit lwr upr
1 9.631108 8.880601 10.38162
2 10.869583 10.180994 11.55817
3 11.657264 10.952672 12.36186
4 10.280343 9.478989 11.08170
5 13.242323 12.620377 13.86427
6 12.664681 12.205161 13.12420
7 13.022626 12.382474 13.66278
8 11.568423 10.787597 12.34925
9 11.893772 11.048734 12.73881
10 12.251202 11.760227 12.74218
11 12.656057 12.068372 13.24374
12 11.371902 10.574376 12.16943
13 15.818223 14.805646 16.83080
14 16.743462 15.538991 17.94793
15 15.074736 14.102037 16.04744
16 13.725908 13.230446 14.22137
17 13.109785 12.255224 13.96434
18 13.496576 12.964147 14.02900
19 12.427221 11.695597 13.15884
20 10.470656 9.713694 11.22762
21 15.206779 14.397822 16.01574
22 13.727395 13.188910 14.26588
23 12.963623 12.442063 13.48518
24 11.355130 10.874464 11.83580
25 13.955240 13.368040 14.54244
26 11.648877 11.049556 12.24820
27 11.776242 10.872145 12.68034
28 12.352417 11.765997 12.93884
29 12.077900 11.352426 12.80337
30 10.207779 9.207923 11.20763
31 11.868336 10.871301 12.86537
32 9.671853 8.753894 10.58981
33 10.829810 10.039689 11.61993
34 14.478081 13.596799 15.35936
35 13.074948 12.490712 13.65918
36 11.548654 10.772242 12.32507
37 9.667154 8.558959 10.77535
38 14.213933 13.499064 14.92880
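The same behaviour can be reproduced on any lm fit. The sketch below uses R's built-in cars data set (50 rows) as a stand-in for the wine data, and then shows one possible guard: a small wrapper, hypothetically named predict_new here, that refuses to run when newdata is missing. It is an illustration, not part of the original analysis.

```r
# Stand-in model on the built-in cars data (50 observations).
fit <- lm(dist ~ speed, data = cars)
new_point <- data.frame(speed = 10)

# Correct spelling: one row, for the new point.
ok <- predict(fit, newdata = new_point, interval = "confidence")

# Misspelled argument: it falls into "..." and is ignored, so predict()
# returns one row per observation in the training data.
typo <- predict(fit, mewdata = new_point, interval = "confidence")

print(dim(ok))
print(dim(typo))

# Hypothetical guard: make newdata mandatory.
predict_new <- function(model, newdata, ...) {
  if (missing(newdata)) stop("newdata must be supplied")
  stats::predict(model, newdata = newdata, ...)
}

# With the guard, the typo becomes an error instead of a silent surprise.
guarded <- tryCatch(
  predict_new(fit, mewdata = new_point),
  error = function(e) conditionMessage(e)
)
print(guarded)
```

A quick sanity check after any predict() call is to compare the number of rows returned with the number of rows in the new data frame.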