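ISLR Chapter 3, Linear Regression: Solutions to the Applied Exercises (Part 2)


ISLR; R; machine learning; linear regression

I know some of the technical terms only in English, so a few may be worded in a nonstandard way; please bear with me.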


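12. Simple linear regression without an intercept
a) From equation (3.38), the coefficient estimate for the regression of y onto x is sum(x*y)/sum(x^2), while for the regression of x onto y it is sum(x*y)/sum(y^2).
The two estimates are therefore equal when the sum of x^2 equals the sum of y^2.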
b)

set.seed(1)
x=rnorm(100)
y=2*x
lm.fit=lm(y~x+0)
lm.fit2=lm(x~y+0)
summary(lm.fit)

Output:

Call:
lm(formula = y ~ x + 0)

Residuals:
       Min         1Q     Median         3Q        Max 
-3.776e-16 -3.378e-17  2.680e-18  6.113e-17  5.105e-16 

Coefficients:
   Estimate Std. Error   t value Pr(>|t|)    
x 2.000e+00  1.296e-17 1.543e+17   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.167e-16 on 99 degrees of freedom
Multiple R-squared:      1,     Adjusted R-squared:      1 
F-statistic: 2.382e+34 on 1 and 99 DF,  p-value: < 2.2e-16

Second regression:

summary(lm.fit2)

Output:

Call:
lm(formula = x ~ y + 0)

Residuals:
       Min         1Q     Median         3Q        Max 
-1.888e-16 -1.689e-17  1.339e-18  3.057e-17  2.552e-16 

Coefficients:
  Estimate Std. Error   t value Pr(>|t|)    
y 5.00e-01   3.24e-18 1.543e+17   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 5.833e-17 on 99 degrees of freedom
Multiple R-squared:      1,     Adjusted R-squared:      1 
F-statistic: 2.382e+34 on 1 and 99 DF,  p-value: < 2.2e-16

The experiment shows that the two coefficient estimates differ: 2 for y onto x and 0.5 for x onto y.
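As a worked check of these two values, substituting y_i = 2x_i into the no-intercept estimator (the form of equation 3.38, applied in each direction) gives:

```latex
\[
\hat{\beta}_{y \sim x}
  = \frac{\sum_i x_i y_i}{\sum_i x_i^2}
  = \frac{2\sum_i x_i^2}{\sum_i x_i^2}
  = 2,
\qquad
\hat{\beta}_{x \sim y}
  = \frac{\sum_i x_i y_i}{\sum_i y_i^2}
  = \frac{2\sum_i x_i^2}{4\sum_i x_i^2}
  = \frac{1}{2}.
\]
```

The numerators are identical; only the denominators differ, by the factor of 4 between the sum of y^2 and the sum of x^2.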
c)
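The sample() function draws a random sample from a given set of objects: pass it a vector x and the number of elements, size, to draw.
For example, sample(1:10, 4) draws 4 numbers without replacement from the integers 1 to 10 and might return 3, 4, 5, 7; running it again might return 3, 9, 8, 5. Because the sampling is without replacement, no number is repeated.
Here sample(x, 100) draws all 100 elements of x, so y is a random permutation of x and the two sums of squares must be equal.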
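A small sketch of this behaviour of sample(). The seed is arbitrary and only makes the draw repeatable; the exact numbers drawn depend on the R version, so none are shown here.

```r
set.seed(1)                      # arbitrary seed, only for repeatability

draw <- sample(1:10, 4)          # 4 values from 1..10, without replacement
print(draw)
print(anyDuplicated(draw) == 0)  # TRUE: without replacement, nothing repeats

x <- rnorm(100)
y <- sample(x, 100)              # size equals length(x): a permutation of x

# Same values in a different order, hence equal sums of squares
print(isTRUE(all.equal(sort(x), sort(y))))
print(isTRUE(all.equal(sum(x^2), sum(y^2))))
```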

 > set.seed(1)
 > x=rnorm(100)
 > y=sample(x,100)
 > sum(x^2)
 [1] 81.05509
 > sum(y^2)
 [1] 81.05509
 > lm.fit=lm(y~x+0)
 > lm.fit2=lm(x~y+0)
 > summary(lm.fit)

Output:

 Call:
 lm(formula = y ~ x + 0)

 Residuals:
     Min      1Q  Median      3Q     Max 
 -2.2315 -0.5124  0.1027  0.6877  2.3926 

 Coefficients:
   Estimate Std. Error t value Pr(>|t|)
 x  0.02148    0.10048   0.214    0.831

 Residual standard error: 0.9046 on 99 degrees of freedom
 Multiple R-squared:  0.0004614, Adjusted R-squared:  -0.009635 
 F-statistic: 0.0457 on 1 and 99 DF,  p-value: 0.8312

Second regression, from summary(lm.fit2):

 Call:
 lm(formula = x ~ y + 0)

 Residuals:
     Min      1Q  Median      3Q     Max 
 -2.2400 -0.5154  0.1213  0.6788  2.3959 

 Coefficients:
   Estimate Std. Error t value Pr(>|t|)
 y  0.02148    0.10048   0.214    0.831

 Residual standard error: 0.9046 on 99 degrees of freedom
 Multiple R-squared:  0.0004614, Adjusted R-squared:  -0.009635 
 F-statistic: 0.0457 on 1 and 99 DF,  p-value: 0.8312

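The experiment confirms that when the sum of x^2 equals the sum of y^2, the two regression coefficients are equal (both 0.02148 here).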
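The same conclusion can be checked against the closed-form estimator directly. This is a minimal sketch; the names b_yx and b_xy are introduced here for illustration. The default algorithm of sample() changed in R 3.6.0, so the coefficient may not match the value printed above, but the equality of the two estimates holds for any permutation.

```r
set.seed(1)
x <- rnorm(100)
y <- sample(x, 100)   # permutation of x, so sum(x^2) equals sum(y^2)

b_yx <- sum(x * y) / sum(x^2)   # closed-form slope, y onto x, no intercept
b_xy <- sum(x * y) / sum(y^2)   # closed-form slope, x onto y, no intercept

# Compare with lm(); unname() drops the coefficient label
print(all.equal(b_yx, unname(coef(lm(y ~ x + 0)))))
print(all.equal(b_xy, unname(coef(lm(x ~ y + 0)))))
print(all.equal(b_yx, b_xy))
```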


13.
a)

> set.seed(1)
> x=rnorm(100)

b)

> eps=rnorm(100,0,sqrt(0.25))

c)

> y=-1+0.5*x+eps

The vector y has length 100; β0 = -1 and β1 = 0.5.
d)

> plot(x,y)


The plot shows a roughly linear relationship between x and y with a positive slope.
e)

> lm.fit=lm(y~x)
> summary(lm.fit)

Output:

Call:
lm(formula = y ~ x)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.93842 -0.30688 -0.06975  0.26970  1.17309 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) -1.01885    0.04849 -21.010  < 2e-16 ***
x            0.49947    0.05386   9.273 4.58e-15 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.4814 on 98 degrees of freedom
Multiple R-squared:  0.4674,    Adjusted R-squared:  0.4619 
F-statistic: 85.99 on 1 and 98 DF,  p-value: 4.583e-15

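The estimates β̂0 = -1.01885 and β̂1 = 0.49947 are close to the true values β0 = -1 and β1 = 0.5, and the p-values near zero indicate a statistically significant relationship.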
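To put the estimates and the true parameters side by side, one option is the sketch below. It regenerates the data of parts a) to c); the names fit and true are chosen here for illustration.

```r
set.seed(1)
x   <- rnorm(100)
eps <- rnorm(100, 0, sqrt(0.25))   # variance 0.25, i.e. standard deviation 0.5
y   <- -1 + 0.5 * x + eps

fit  <- lm(y ~ x)
true <- c(-1, 0.5)                 # population intercept and slope

# Estimate, true value, and their difference for each coefficient
print(cbind(estimate = coef(fit), true = true, diff = coef(fit) - true))

# The residual standard error estimates the standard deviation of eps (0.5)
print(summary(fit)$sigma)
```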
f)

> plot(x,y)
> abline(lm.fit,lwd=3,col="red")
> abline(-1,0.5,lwd=3,col="green")
> legend(-1,legend=c("model fit", "pop regression"),col=2:3,lwd=3)


g)

> lm.fit2=lm(y~x+I(x^2))
> summary(lm.fit2)

Output:

Call:
lm(formula = y ~ x + I(x^2))

Residuals:
     Min       1Q   Median       3Q      Max 
-0.98252 -0.31270 -0.06441  0.29014  1.13500 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) -0.97164    0.05883 -16.517  < 2e-16 ***
x            0.50858    0.05399   9.420  2.4e-15 ***
I(x^2)      -0.05946    0.04238  -1.403    0.164    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.479 on 97 degrees of freedom
Multiple R-squared:  0.4779,    Adjusted R-squared:  0.4672 
F-statistic:  44.4 on 2 and 97 DF,  p-value: 2.038e-14

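R^2 increases only slightly (from 0.4674 to 0.4779) and the RSE decreases only slightly (from 0.4814 to 0.479). The p-value of the x^2 term is 0.164, so there is no significant evidence of a relationship between y and x^2.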
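The two fits can also be compared formally with anova(). This is a sketch that regenerates the data from parts a) to c); fit_lin and fit_quad are names chosen here. Because the models differ by a single term, the F-test p-value coincides with the t-test p-value of I(x^2).

```r
set.seed(1)
x   <- rnorm(100)
eps <- rnorm(100, 0, sqrt(0.25))
y   <- -1 + 0.5 * x + eps

fit_lin  <- lm(y ~ x)
fit_quad <- lm(y ~ x + I(x^2))

# F-test of the nested models: does adding x^2 reduce the RSS significantly?
cmp <- anova(fit_lin, fit_quad)
print(cmp)

p_f <- cmp[["Pr(>F)"]][2]
p_t <- summary(fit_quad)$coefficients["I(x^2)", "Pr(>|t|)"]
print(c(F_test = p_f, t_test = p_t))
```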
h)

> set.seed(1)
> x=rnorm(100)   # redraw x first so that esp1 comes from the next 100 draws, as in a) and b)
> esp1=rnorm(100,0,sqrt(0.125))
> y1=-1+0.5*x + esp1
> plot(x,y1)
> lm.fit1=lm(y1~x)
> summary(lm.fit1)

Output:

Call:
lm(formula = y1 ~ x)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.66356 -0.21700 -0.04932  0.19071  0.82950 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) -1.01333    0.03429  -29.55   <2e-16 ***
x            0.49963    0.03809   13.12   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.3404 on 98 degrees of freedom
Multiple R-squared:  0.6371,    Adjusted R-squared:  0.6334 
F-statistic: 172.1 on 1 and 98 DF,  p-value: < 2.2e-16

Plot:

> abline(lm.fit1,lwd=3,col=2)
> abline(-1,0.5,lwd=3,col=3)
> legend(-1,legend=c("model fit","pop. regression"),col=2:3,lwd=3)


With less noise the RSE decreases (from 0.4814 to 0.3404) and R^2 increases (from 0.4674 to 0.6371).
i)

> esp2=rnorm(100,0,sqrt(0.5))
> y2=-1+0.5*x + esp2
> plot(x,y2)
> lm.fit2=lm(y2~x)
> summary(lm.fit2)

Output:

Call:
lm(formula = y2 ~ x)

Residuals:
     Min       1Q   Median       3Q      Max 
-2.06059 -0.34104 -0.03205  0.45908  1.86787 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) -0.98065    0.07404 -13.245  < 2e-16 ***
x            0.51497    0.08224   6.262 1.01e-08 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.7349 on 98 degrees of freedom
Multiple R-squared:  0.2858,    Adjusted R-squared:  0.2785 
F-statistic: 39.21 on 1 and 98 DF,  p-value: 1.01e-08

Plot:

> abline(lm.fit2,lwd=3,col=2)
> abline(-1,0.5,lwd=3,col=3)
> legend(-1,legend=c("model fit","pop. regression"),col=2:3,lwd=3)


With more noise the RSE increases (from 0.4814 to 0.7349) and R^2 decreases (from 0.4674 to 0.2858).
j)

> confint(lm.fit)
                 2.5 %     97.5 %
(Intercept) -1.1150804 -0.9226122
x            0.3925794  0.6063602
> confint(lm.fit1)
                 2.5 %     97.5 %
(Intercept) -1.0813741 -0.9452786
x            0.4240422  0.5752080
> confint(lm.fit2)
                 2.5 %     97.5 %
(Intercept) -1.1275711 -0.8337236
x            0.3517741  0.6781604

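The larger the noise, the wider the confidence intervals; all three sets of intervals still contain the true values β0 = -1 and β1 = 0.5.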
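The effect of noise on interval width can be isolated in a short sketch. As a simplifying assumption that differs from the runs above, it rescales one standard-normal noise vector z to each variance instead of drawing fresh noise, so the widths are directly comparable. The names z, noise_var and slope_ci_width are introduced here.

```r
set.seed(1)
x <- rnorm(100)
z <- rnorm(100)                   # one standard-normal noise vector, rescaled below

noise_var <- c(0.125, 0.25, 0.5)  # the three noise variances used above

# Width of the 95% confidence interval for the slope at each noise level
slope_ci_width <- sapply(noise_var, function(v) {
  y  <- -1 + 0.5 * x + sqrt(v) * z
  ci <- confint(lm(y ~ x))["x", ]
  unname(ci[2] - ci[1])
})

print(data.frame(noise_var, slope_ci_width))
```

With this construction the width grows in proportion to the noise standard deviation, since the residuals scale with sqrt(v).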


14.
a)


The model is y = 2 + 2*x1 + 0.3*x2 + ε, so β0 = 2, β1 = 2, β2 = 0.3.
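The data-generating code for this exercise is not reproduced in this post. For reference, the setup in the ISLR exercise statement is, as far as I recall it, the following; treat it as an assumption rather than as the exact code behind the output in the later parts.

```r
# Assumed setup, taken from the ISLR exercise statement (not shown in this post)
set.seed(1)
x1 <- runif(100)
x2 <- 0.5 * x1 + rnorm(100) / 10          # x2 is built from x1, so the two are collinear
y  <- 2 + 2 * x1 + 0.3 * x2 + rnorm(100)  # beta0 = 2, beta1 = 2, beta2 = 0.3

print(c(length(x1), length(x2), length(y)))
```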
b)

> cor(x1,x2)
[1] 0.8351212
> plot(x1,x2)


c)

> lm.fit=lm(y~x1+x2)
> summary(lm.fit)

Call:
lm(formula = y ~ x1 + x2)

Residuals:
    Min      1Q  Median      3Q     Max 
-2.8311 -0.7273 -0.0537  0.6338  2.3359 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   2.1305     0.2319   9.188 7.61e-15 ***
x1            1.4396     0.7212   1.996   0.0487 *  
x2            1.0097     1.1337   0.891   0.3754    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.056 on 97 degrees of freedom
Multiple R-squared:  0.2088,    Adjusted R-squared:  0.1925 
F-statistic:  12.8 on 2 and 97 DF,  p-value: 1.164e-05

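β̂0 = 2.1305, β̂1 = 1.4396, β̂2 = 1.0097
The true values are β0 = 2, β1 = 2, β2 = 0.3.
The p-value for x1 (0.0487) is just below 0.05, so we can reject H0: β1 = 0 at the 5% level. The p-value for x2 (0.3754) is large, so we cannot reject H0: β2 = 0.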
d)

> lm.fit1=lm(y~x1)
> summary(lm.fit1)

Call:
lm(formula = y ~ x1)

Residuals:
     Min       1Q   Median       3Q      Max 
-2.89495 -0.66874 -0.07785  0.59221  2.45560 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   2.1124     0.2307   9.155 8.27e-15 ***
x1            1.9759     0.3963   4.986 2.66e-06 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.055 on 98 degrees of freedom
Multiple R-squared:  0.2024,    Adjusted R-squared:  0.1942 
F-statistic: 24.86 on 1 and 98 DF,  p-value: 2.661e-06

Because the p-value is close to 0, we can reject H0: β1 = 0.
e)

> lm.fit2=lm(y~x2)
> summary(lm.fit2)

Call:
lm(formula = y ~ x2)

Residuals:
     Min       1Q   Median       3Q      Max 
-2.62687 -0.75156 -0.03598  0.72383  2.44890 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   2.3899     0.1949   12.26  < 2e-16 ***
x2            2.8996     0.6330    4.58 1.37e-05 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.072 on 98 degrees of freedom
Multiple R-squared:  0.1763,    Adjusted R-squared:  0.1679 
F-statistic: 20.98 on 1 and 98 DF,  p-value: 1.366e-05

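Because the p-value is close to 0, we can reject the null hypothesis that the coefficient of x2 is zero (H0: β1 = 0 in this one-predictor model).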
f)
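The results do not contradict each other. Because x1 and x2 are collinear, it is hard to separate their effects when both are included in the same regression; when each is regressed separately, its relationship with y is clear.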
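One way to quantify the collinearity is the variance inflation factor (VIF), 1 / (1 - R^2), where R^2 comes from regressing one predictor on the other. The sketch below assumes the data-generating code of the ISLR exercise statement, which is not shown in this post; r2 and vif are names chosen here. The inflation shows up in the output above as the standard error of x1 growing from 0.3963 (x1 alone) to 0.7212 (x1 and x2 together).

```r
# Assumed setup, taken from the ISLR exercise statement
set.seed(1)
x1 <- runif(100)
x2 <- 0.5 * x1 + rnorm(100) / 10

# With only two predictors the VIF is the same for x1 and x2
r2  <- summary(lm(x1 ~ x2))$r.squared   # equals cor(x1, x2)^2
vif <- 1 / (1 - r2)

print(c(cor = cor(x1, x2), R2 = r2, VIF = vif))
```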
g)

> x1=c(x1,0.1)
> x2=c(x2,0.8)
> y=c(y,6)
> lm.fit1 = lm(y~x1+x2)
> summary(lm.fit1)

Call:
lm(formula = y ~ x1 + x2)

Residuals:
     Min       1Q   Median       3Q      Max 
-2.73348 -0.69318 -0.05263  0.66385  2.30619 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   2.2267     0.2314   9.624 7.91e-16 ***
x1            0.5394     0.5922   0.911  0.36458    
x2            2.5146     0.8977   2.801  0.00614 ** 
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.075 on 98 degrees of freedom
Multiple R-squared:  0.2188,    Adjusted R-squared:  0.2029 
F-statistic: 13.72 on 2 and 98 DF,  p-value: 5.564e-06

> lm.fit2 = lm(y~x1)
> summary(lm.fit2)

Call:
lm(formula = y ~ x1)

Residuals:
    Min      1Q  Median      3Q     Max 
-2.8897 -0.6556 -0.0909  0.5682  3.5665 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   2.2569     0.2390   9.445 1.78e-15 ***
x1            1.7657     0.4124   4.282 4.29e-05 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.111 on 99 degrees of freedom
Multiple R-squared:  0.1562,    Adjusted R-squared:  0.1477 
F-statistic: 18.33 on 1 and 99 DF,  p-value: 4.295e-05

> lm.fit3 = lm(y~x2)
> summary(lm.fit3)

Call:
lm(formula = y ~ x2)

Residuals:
     Min       1Q   Median       3Q      Max 
-2.64729 -0.71021 -0.06899  0.72699  2.38074 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   2.3451     0.1912  12.264  < 2e-16 ***
x2            3.1190     0.6040   5.164 1.25e-06 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.074 on 99 degrees of freedom
Multiple R-squared:  0.2122,    Adjusted R-squared:  0.2042 
F-statistic: 26.66 on 1 and 99 DF,  p-value: 1.253e-06

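With the new observation, the joint model (lm.fit1) can no longer reject β1 = 0 (p = 0.365), while the coefficient of x2 becomes significant (p = 0.006).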

> par(mfrow=c(2,2))
> plot(lm.fit1)

> par(mfrow=c(2,2))
> plot(lm.fit2)

> par(mfrow=c(2,2))
> plot(lm.fit3)


In the first and third regression models, the newly added point is a high-leverage point.

> plot(predict(lm.fit1), rstudent(lm.fit1))
> plot(predict(lm.fit2), rstudent(lm.fit2))
> plot(predict(lm.fit3), rstudent(lm.fit3))


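Only in the second regression model does the new point have a studentized residual larger than 3, which makes it an outlier there.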
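The leverage and studentized residual of the added point can also be read off numerically instead of from the plots. This is a sketch under the assumption that the data are generated as in the ISLR exercise statement (not shown in this post); the names fits, new and diag_new are introduced here.

```r
# Assumed setup, taken from the ISLR exercise statement
set.seed(1)
x1 <- runif(100)
x2 <- 0.5 * x1 + rnorm(100) / 10
y  <- 2 + 2 * x1 + 0.3 * x2 + rnorm(100)

# Append the new observation from part g)
x1 <- c(x1, 0.1); x2 <- c(x2, 0.8); y <- c(y, 6)
new <- length(y)   # index of the added point (101)

fits <- list(both = lm(y ~ x1 + x2), x1_only = lm(y ~ x1), x2_only = lm(y ~ x2))

# For each model: leverage of the new point, average leverage, studentized residual
diag_new <- t(sapply(fits, function(f) {
  h <- hatvalues(f)
  c(leverage      = unname(h[new]),
    mean_leverage = mean(h),                  # equals (p + 1) / n
    rstudent      = unname(rstudent(f)[new]))
}))
print(round(diag_new, 3))
```

A leverage far above the average marks a high-leverage point, and a studentized residual beyond 3 in absolute value marks an outlier.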
