These notes cover my study of Chapter 8 of R in Action and my implementation of its code.
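First, to interpret the coefficients of an OLS model correctly, the data must satisfy the following statistical assumptions:
- Normality: for fixed values of the independent variables, the dependent variable is normally distributed.
- Independence: the values of the dependent variable are independent of one another.
- Linearity: the dependent variable is linearly related to the independent variables.
- Homoscedasticity: the variance of the dependent variable does not change with the levels of the independent variables.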
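Some of these assumptions can be checked quickly in base R. The sketch below assumes the simple model fitted later in these notes (`weight ~ height` on `women`). `plot()` on an `lm` object draws the standard diagnostic plots, and `shapiro.test()` tests the residuals for normality. The Breusch-Pagan statistic (studentized form) is computed by hand here instead of through a package, which is my own choice. Independence usually has to be judged from how the data were collected, not from a test.

```r
# Fit the simple model used later in these notes
fit <- lm(weight ~ height, data = women)

# Standard diagnostic plots: residuals vs fitted (linearity),
# normal Q-Q (normality), scale-location (homoscedasticity),
# residuals vs leverage (influential points)
par(mfrow = c(2, 2))
plot(fit)
par(mfrow = c(1, 1))

# Normality: Shapiro-Wilk test on the residuals
sw <- shapiro.test(residuals(fit))
print(sw)

# Homoscedasticity: studentized Breusch-Pagan statistic computed by hand.
# Regress the squared residuals on the predictor; n * R^2 ~ chi-squared(df = 1)
e2 <- residuals(fit)^2
aux <- lm(e2 ~ women$height)
bp_stat <- nrow(women) * summary(aux)$r.squared
bp_p <- pchisq(bp_stat, df = 1, lower.tail = FALSE)
cat("BP statistic:", bp_stat, " p-value:", bp_p, "\n")
```

A small p-value from either test would be a warning sign for the corresponding assumption; the plots are usually the more informative check.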
Simple linear regression
Using the women dataset from the base installation, the code is as follows:
> fit <- lm(weight~height,data=women)
> summary(fit)
Call:
lm(formula = weight ~ height, data = women)
Residuals:
Min 1Q Median 3Q Max
-1.7333 -1.1333 -0.3833 0.7417 3.1167
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -87.51667 5.93694 -14.74 1.71e-09 ***
height 3.45000 0.09114 37.85 1.09e-14 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 1.525 on 13 degrees of freedom
Multiple R-squared: 0.991, Adjusted R-squared: 0.9903
F-statistic: 1433 on 1 and 13 DF, p-value: 1.091e-14
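With a single predictor, the OLS estimates in the summary have a closed form, so they can be reproduced by hand as a sanity check. This is a minimal sketch using the textbook formulas (slope = cov(x, y) / var(x), intercept = mean(y) - slope * mean(x)); the variable names are my own.

```r
fit <- lm(weight ~ height, data = women)

x <- women$height
y <- women$weight

# Closed-form OLS estimates for one predictor
b1 <- cov(x, y) / var(x)
b0 <- mean(y) - b1 * mean(x)

# R-squared = 1 - SSE / SST
sse <- sum((y - (b0 + b1 * x))^2)
sst <- sum((y - mean(y))^2)
r2 <- 1 - sse / sst

# Residual standard error = sqrt(SSE / (n - 2)), with n - 2 = 13 degrees of freedom here
rse <- sqrt(sse / (length(y) - 2))

print(c(intercept = b0, slope = b1, r.squared = r2, rse = rse))
```

These values should agree with the Estimate column, Multiple R-squared, and Residual standard error in the `summary(fit)` output.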
Inspecting the fitted values and residuals, and plotting the fit
> women$weight
[1] 115 117 120 123 126 129 132 135 139 142 146 150 154 159 164
> fitted(fit)
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
112.5833 116.0333 119.4833 122.9333 126.3833 129.8333 133.2833 136.7333 140.1833 143.6333 147.0833 150.5333 153.9833 157.4333 160.8833
> residuals(fit)
1 2 3 4 5 6 7 8 9 10 11 12
2.41666667 0.96666667 0.51666667 0.06666667 -0.38333333 -0.83333333 -1.28333333 -1.73333333 -1.18333333 -1.63333333 -1.08333333 -0.53333333
13 14 15
0.01666667 1.56666667 3.11666667
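Each residual is simply the observed weight minus the fitted weight. The residuals above are positive at both ends and negative in the middle, which hints at curvature that a straight line cannot capture. The sketch below checks the residual identity and plots residuals against height to make that pattern visible.

```r
fit <- lm(weight ~ height, data = women)

# Residual = observed value - fitted value
manual_resid <- women$weight - fitted(fit)

# With an intercept in the model, OLS residuals sum to (numerically) zero
# and are orthogonal to the predictor
print(sum(residuals(fit)))
print(sum(residuals(fit) * women$height))

# Residuals vs height: a U-shape suggests a missing quadratic term
plot(women$height, residuals(fit),
     xlab = "Height (in inches)", ylab = "Residual")
abline(h = 0, lty = 2)
```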
> plot(women$height,women$weight,xlab="Height(in inches)",ylab="Weight(in pounds)")
> abline(fit)
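This gives the prediction equation:
$$\widehat{\text{Weight}} = -87.52 + 3.45 \times \text{Height}$$
Polynomial regression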
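The fitted equation can be used for prediction through `predict()`. This sketch checks that `predict()` gives the same point predictions as plugging heights into the fitted coefficients, and also requests 95% prediction intervals. The example heights 60, 65 and 70 inches are arbitrary.

```r
fit <- lm(weight ~ height, data = women)

# New heights (inches) to predict for; arbitrary example values
new <- data.frame(height = c(60, 65, 70))

# Point predictions from predict() ...
p <- predict(fit, newdata = new)

# ... equal intercept + slope * height computed by hand
b <- coef(fit)
manual <- b[1] + b[2] * new$height
print(p)

# 95% prediction intervals for an individual woman at each height
predict(fit, newdata = new, interval = "prediction", level = 0.95)
```

For height 65 the point prediction matches the 8th fitted value listed earlier (about 136.73 lbs), since 65 inches is one of the observed heights.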
> fit2 <- lm(weight~height+I(height^2),data = women)
> summary(fit2)
Call:
lm(formula = weight ~ height + I(height^2), data = women)
Residuals:
Min 1Q Median 3Q Max
-0.50941 -0.29611 -0.00941 0.28615 0.59706
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 261.87818 25.19677 10.393 2.36e-07 ***
height -7.34832 0.77769 -9.449 6.58e-07 ***
I(height^2) 0.08306 0.00598 13.891 9.32e-09 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.3841 on 12 degrees of freedom
Multiple R-squared: 0.9995, Adjusted R-squared: 0.9994
F-statistic: 1.139e+04 on 2 and 12 DF, p-value: < 2.2e-16
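In an R formula, `^` is a formula operator (term crossing), not arithmetic, so the squared term must be wrapped in `I()`. The sketch below shows what happens without `I()` and confirms that `poly(height, 2, raw = TRUE)` builds the same raw polynomial model as an alternative spelling.

```r
# I() protects ^ so it means arithmetic squaring
fit2 <- lm(weight ~ height + I(height^2), data = women)

# Without I(), height^2 is read as a formula operator and reduces to just height
fit_wrong <- lm(weight ~ height^2, data = women)
print(names(coef(fit_wrong)))   # "(Intercept)" "height"

# poly(..., raw = TRUE) produces the same raw polynomial terms
fit2b <- lm(weight ~ poly(height, 2, raw = TRUE), data = women)
print(all.equal(fitted(fit2), fitted(fit2b)))
```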
> plot(women$height,women$weight,xlab="Height(in inches)",ylab="Weight(in lbs)")
> lines(women$height,fitted(fit2))
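The new prediction equation:
$$\widehat{\text{Weight}} = 261.88 - 7.35 \times \text{Height} + 0.083 \times \text{Height}^2$$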
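Because the linear model is nested inside the quadratic one, `anova()` can test whether the quadratic term improves the fit. With a single added term, the resulting F statistic equals the square of the t value of `I(height^2)` in the summary output. This is a minimal sketch; overlaying both fits on one plot is my own addition for comparison.

```r
fit  <- lm(weight ~ height, data = women)
fit2 <- lm(weight ~ height + I(height^2), data = women)

# Nested-model F test: does adding the quadratic term improve the fit?
cmp <- anova(fit, fit2)
print(cmp)

# Overlay both fits on the data: dashed line = linear, solid curve = quadratic
plot(women$height, women$weight,
     xlab = "Height (in inches)", ylab = "Weight (in lbs)")
abline(fit, lty = 2)
lines(women$height, fitted(fit2))
```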