An Introduction to Statistical Learning: with Applications in R (《统计学习导论:基于R应用》)

Chapter 3 Exercises: Reference Answers

1. In Table 3.4, the null hypothesis for "TV" is that in the presence of radio
ads and newspaper ads, TV ads have no effect on sales. Similarly, the null
hypothesis for "radio" is that in the presence of TV and newspaper ads, radio
ads have no effect on sales. (And there is a similar null hypothesis for
"newspaper".) The low p-values of TV and radio suggest that the null hypotheses
are false for TV and radio. The high p-value of newspaper suggests that the null
hypothesis is true for newspaper.


2. The KNN classifier and KNN regression methods are closely related in form. However, the
KNN classifier produces a qualitative output, assigning Y to the class that is most common
among the K nearest neighbors, whereas KNN regression predicts a quantitative value for f(X)
by averaging the responses of the K nearest neighbors.


3. Y = 50 + 20(gpa) + 0.07(iq) + 35(gender) + 0.01(gpa * iq) - 10 (gpa * gender)

(a) Y = 50 + 20 k_1 + 0.07 k_2 + 35 gender + 0.01(k_1 * k_2) - 10 (k_1 * gender)
male: (gender = 0)   50 + 20 k_1 + 0.07 k_2 + 0.01(k_1 * k_2)
female: (gender = 1) 50 + 20 k_1 + 0.07 k_2 + 35 + 0.01(k_1 * k_2) - 10 (k_1)

Once GPA is high enough (above 3.5, so that 35 - 10 * GPA < 0), males earn more on average. => iii.

(b) Y(Gender = 1, IQ = 110, GPA = 4.0)
= 50 + 20 * 4 + 0.07 * 110 + 35 + 0.01 (4 * 110) - 10 * 4
= 137.1
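
Since the response is salary in thousands of dollars, this corresponds to a predicted starting
salary of about $137,100. The arithmetic can be checked by evaluating the fitted equation
directly in R (nothing here beyond the coefficients given in the exercise):

50 + 20*4.0 + 0.07*110 + 35*1 + 0.01*(4.0*110) - 10*(4.0*1)   # 137.1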

(c) False. The magnitude of a coefficient by itself tells us nothing; we must examine the
p-value (or t-statistic) of the interaction coefficient to determine whether the interaction
effect is statistically significant.


4. (a) I would expect the polynomial regression to have a lower training RSS than the linear
regression, because its extra flexibility lets it follow the training data more closely,
fitting some of the irreducible noise (Var(epsilon)) as well as the true linear trend.

(b) Conversely, I would expect the polynomial regression to have a higher test RSS: because
the true relationship is linear, the overfitting from training adds variance without reducing
bias, producing more test error than the linear regression.

(c) The polynomial regression has a lower training RSS than the linear fit because of its
higher flexibility: no matter what the underlying true relationship is, the more flexible
model will follow the training points more closely and reduce the training RSS. An example
of this behavior is shown in Figure 2.9 of Chapter 2.

(d) There is not enough information to tell which test RSS would be lower, since the problem
states only that the true relationship is non-linear, not how far it is from linear. If it is
closer to linear than cubic, the linear regression's test RSS could be lower than the cubic
regression's; if it is closer to cubic than linear, the cubic regression's test RSS could be
lower. This is the bias-variance trade-off: it is not clear which level of flexibility will
fit the unseen data better.


5. See 5.jpg.
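
A brief sketch of the derivation shown in 5.jpg, using the no-intercept fitted value given in
the exercise (plain-text notation, as elsewhere in these answers):

B_1 = sum_i'(x_i' y_i') / sum_j(x_j^2)
yhat_i = x_i B_1 = sum_i' [ (x_i x_i') / sum_j(x_j^2) ] y_i'

so a_i' = (x_i x_i') / sum_j(x_j^2).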


6. The least squares line is y = B_0 + B_1 x.
From (3.4): B_0 = avg(y) - B_1 avg(x).
The point (avg(x), avg(y)) lies on the line exactly when 0 = B_0 + B_1 avg(x) - avg(y). Substituting B_0:
0 = (avg(y) - B_1 avg(x)) + B_1 avg(x) - avg(y)
0 = 0
so the least squares line always passes through (avg(x), avg(y)).

7.
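Exercise 7 asks to show that, in simple linear regression, R^2 equals the squared correlation
between X and Y. A brief sketch, under the exercise's simplifying assumption avg(x) = avg(y) = 0:

With zero means, B_0 = 0 and B_1 = sum(x_i y_i) / sum(x_i^2), so
RSS = sum((y_i - B_1 x_i)^2) = sum(y_i^2) - (sum(x_i y_i))^2 / sum(x_i^2)
TSS = sum(y_i^2)
R^2 = 1 - RSS/TSS = (sum(x_i y_i))^2 / (sum(x_i^2) sum(y_i^2)) = Cor(X, Y)^2,

since with zero means Cor(X, Y) = sum(x_i y_i) / sqrt(sum(x_i^2) sum(y_i^2)).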


8a.
Auto = read.csv("../data/Auto.csv", header=T, na.strings="?")
Auto = na.omit(Auto)
summary(Auto)

attach(Auto)
lm.fit = lm(mpg ~ horsepower)
summary(lm.fit)

i.
Yes, there is a relationship between horsepower and mpg, as determined by testing the null
hypothesis that all regression coefficients are equal to zero. Since the F-statistic is far
larger than 1 and the p-value of the F-statistic is close to zero, we can reject the null
hypothesis and state that there is a statistically significant relationship between horsepower
and mpg.

ii.
To quantify the residual error relative to the response we use the mean of the response and
the RSE. The mean of mpg is 23.4459 and the RSE of lm.fit is 4.906, which indicates a
percentage error of about 20.9%. The R^2 of lm.fit is about 0.6059, meaning roughly 60.6% of
the variance in mpg is explained by horsepower.
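
These numbers can be pulled straight from the fit (a small check, not part of the original
answer; summary(lm.fit)$sigma is the RSE and summary(lm.fit)$r.squared the R^2):

mean(mpg)                           # about 23.45
summary(lm.fit)$sigma / mean(mpg)   # about 0.209, i.e. ~21% error relative to the mean response
summary(lm.fit)$r.squared           # about 0.606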

iii.
The relationship between mpg and horsepower is negative: the more horsepower an automobile
has, the lower the fuel efficiency (mpg) the linear regression predicts for it.

iv.
predict(lm.fit, data.frame(horsepower=c(98)), interval="confidence")

predict(lm.fit, data.frame(horsepower=c(98)), interval="prediction")


8b.
plot(horsepower, mpg)
abline(lm.fit)


8c.
par(mfrow=c(2,2))
plot(lm.fit)

Based on the residuals plots, there is some evidence of non-linearity.


9a.
pairs(Auto)


9b.
cor(subset(Auto, select=-name))


9c.
lm.fit1 = lm(mpg~.-name, data=Auto)
summary(lm.fit1)

i.
Yes, there is a relationship between the predictors and the response, as shown by testing the
null hypothesis that all the regression coefficients are zero. The F-statistic is far from 1
(with a small p-value), indicating evidence against the null hypothesis.

ii.
Looking at the p-values associated with each predictor’s t-statistic, we see that displacement, weight,
year, and origin have a statistically significant relationship, while cylinders, horsepower, and acceleration
do not.

iii.
The regression coefficient for year, 0.7508, suggests that, with all other predictors held
constant, mpg increases by about 0.75 for each additional model year. In other words, cars
become more fuel efficient by almost 1 mpg per year.
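
The coefficient and its confidence interval can be read directly off the fit (a small check,
assuming lm.fit1 from above):

coef(lm.fit1)["year"]        # about 0.75 mpg per model year
confint(lm.fit1)["year", ]   # 95% confidence interval for the year coefficient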


9d.
par(mfrow=c(2,2))
plot(lm.fit1)

The fit does not appear to be entirely adequate, because there is a discernible curve pattern
in the residual plots. From the leverage plot, point 14 appears to have high leverage,
although not a high-magnitude residual.

plot(predict(lm.fit1), rstudent(lm.fit1))

There are possible outliers, as seen in the plot of studentized residuals, because some
observations have values greater than 3.
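
To see exactly which observations those are, the studentized residuals can be inspected
directly (a small follow-up check, not in the original answer):

which(rstudent(lm.fit1) > 3)      # indices of observations with studentized residual > 3
sum(abs(rstudent(lm.fit1)) > 3)   # how many such observations there are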


9e.
lm.fit2 = lm(mpg~cylinders*displacement+displacement*weight)
summary(lm.fit2)

From the correlation matrix, I took the two most highly correlated pairs of predictors and
used them as interaction effects. From the p-values, we can see that the interaction between
displacement and weight is statistically significant, while the interaction between cylinders
and displacement is not.


9f.
lm.fit3 = lm(mpg~log(weight)+sqrt(horsepower)+acceleration+I(acceleration^2))
summary(lm.fit3)

par(mfrow=c(2,2))
plot(lm.fit3)

plot(predict(lm.fit3), rstudent(lm.fit3))

From the p-values, log(weight), sqrt(horsepower), and acceleration^2 all show some degree of
statistical significance. The residuals plot has less of a discernible pattern than the plot
for the model with all linear terms. The studentized residuals display potential outliers
(values greater than 3). The leverage plot indicates more than three points with high leverage.

However, two problems are apparent in the above plots: 1) the residuals-vs-fitted plot
indicates heteroskedasticity (non-constant variance) in the model; 2) the Q-Q plot indicates
some non-normality of the residuals.

So a better transformation needs to be applied to the model. From the scatterplot matrix in
9a, displacement, horsepower and weight show a similar nonlinear pattern against our response
mpg, and this pattern is very close to a log form. So in the next attempt we use log(mpg) as
the response variable.

The resulting fit (below) shows that the log transform of mpg yields a better model (higher
R^2, more nearly normal residuals):

lm.fit2<-lm(log(mpg)~cylinders+displacement+horsepower+weight+acceleration+year+origin,data=Auto)
summary(lm.fit2)

par(mfrow=c(2,2)) 
plot(lm.fit2)

plot(predict(lm.fit2),rstudent(lm.fit2))


10a.
library(ISLR)

summary(Carseats)

attach(Carseats)
lm.fit = lm(Sales~Price+Urban+US)
summary(lm.fit)


10b.
Price
The linear regression suggests a relationship between Price and Sales, given the low p-value
of the t-statistic. The coefficient indicates a negative relationship between Price and Sales:
as Price increases, Sales decrease.

UrbanYes
The linear regression suggests that there isn’t a relationship between the location of the store and the
number of sales based on the high p-value of the t-statistic.

USYes
The linear regression suggests there is a relationship between whether the store is in the US or not and the
amount of sales. The coefficient states a positive relationship between USYes and Sales: if the store is in
the US, the sales will increase by approximately 1201 units.


10c.
Sales = 13.04 - 0.05 * Price - 0.02 * UrbanYes + 1.20 * USYes, where UrbanYes = 1 if the store
is in an urban location (0 otherwise) and USYes = 1 if the store is in the US (0 otherwise).


10d.
Price and US, based on the low p-values of their t-statistics; for Urban the p-value is large,
so we cannot reject the null hypothesis for that predictor.


10e.
lm.fit2 = lm(Sales ~ Price + US)
summary(lm.fit2)


10f.
Based on the RSE and R^2 of the two linear regressions, they fit the data about equally well,
with the regression from (e) fitting the data slightly better.


10g.
confint(lm.fit2)


10h.
plot(predict(lm.fit2), rstudent(lm.fit2))

All studentized residuals appear to be bounded by -3 to 3, so no potential outliers are
suggested by the linear regression.

par(mfrow=c(2,2))
plot(lm.fit2)

There are a few observations whose leverage greatly exceeds the average leverage
(p+1)/n = 0.0075 on the leverage-statistic plot, which suggests that the corresponding points
have high leverage.
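
One way to flag those points explicitly is to compare the hat values with the average leverage
(a small check; hatvalues() returns the leverage statistics of an lm fit):

avg_leverage = (2 + 1) / nrow(Carseats)             # (p+1)/n = 3/400 = 0.0075
head(sort(hatvalues(lm.fit2), decreasing=TRUE))     # largest leverage values, for comparison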


11.
set.seed(1)
x = rnorm(100)
y = 2*x + rnorm(100)


11a.
lm.fit = lm(y~x+0)
summary(lm.fit)

The p-value of the t-statistic is near zero so the null hypothesis is rejected.


11b.
lm.fit = lm(x~y+0)
summary(lm.fit)

The p-value of the t-statistic is near zero so the null hypothesis is rejected.


11c.
Both results in (a) and (b) reflect the same underlying line created in 11: y = 2x + eps can
equally be written as x = 0.5 * (y - eps).


11d.
(sqrt(length(x)-1) * sum(x*y)) / (sqrt(sum(x*x) * sum(y*y) - (sum(x*y))^2))

This is the same as the t-statistic shown above.
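
For comparison, the t-statistic reported by summary() can be extracted directly (a small
check; lm.fit here is the most recent no-intercept fit from (b)):

summary(lm.fit)$coefficients[1, "t value"]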


11e.
The formula for the t-statistic is symmetric in x and y: swapping x and y leaves it unchanged,
so t(x, y) = t(y, x) and the regressions of y onto x and of x onto y have the same t-statistic.


11f.
lm.fit = lm(y~x)
lm.fit2 = lm(x~y)
summary(lm.fit)

summary(lm.fit2)

You can see the t-statistic is the same for the two linear regressions.


12a.
When the sum of the squares of the observed y-values is equal to the sum of the squares of the
observed x-values: the coefficient for regressing Y onto X is sum(x_i y_i) / sum(x_i^2), while
the coefficient for regressing X onto Y is sum(x_i y_i) / sum(y_i^2), so the two estimates are
equal exactly when sum(x_i^2) = sum(y_i^2).


12b.
set.seed(1)
x = rnorm(100)
y = 2*x
lm.fit = lm(y~x+0)
lm.fit2 = lm(x~y+0)
summary(lm.fit)

summary(lm.fit2)

The regression coefficients are different for each linear regression.


12c.
set.seed(1)
x <- rnorm(100)
y <- -sample(x, 100)
sum(x^2)

sum(y^2)

lm.fit <- lm(y~x+0)
lm.fit2 <- lm(x~y+0)
summary(lm.fit)

summary(lm.fit2)

The regression coefficients are the same for the two regressions. As long as
sum(x^2) = sum(y^2), the condition in 12a is satisfied. Here we have simply taken the x_i in a
different order and negated them, which leaves the sum of squares unchanged.


13a.
set.seed(1)
x = rnorm(100)

13b.
eps = rnorm(100, 0, sqrt(0.25))


13c.
y = -1 + 0.5*x + eps

y is of length 100. β0 is -1, β1 is 0.5.


13d.
plot(x, y)


I observe a linear relationship between x and y with a positive slope, with scatter around the
line roughly matching the variance of eps, as expected.


13e.
lm.fit = lm(y~x)
summary(lm.fit)

The fitted coefficients are close to the true values used to construct the data
(beta_0 = -1, beta_1 = 0.5). The model has a large F-statistic with a near-zero p-value, so
the null hypothesis can be rejected.


13f.
plot(x, y)
abline(lm.fit, lwd=3, col=2)
abline(-1, 0.5, lwd=3, col=3)
legend(-1, legend = c("model fit", "pop. regression"), col=2:3, lwd=3)


13g.
lm.fit_sq = lm(y~x+I(x^2))
summary(lm.fit_sq)

There is some evidence that the fit to the training data has improved, given the slight
increase in R^2. However, the p-value of the t-statistic for the x^2 term suggests that there
isn't a statistically significant relationship between y and x^2.


13h.
set.seed(1)
eps1 = rnorm(100, 0, 0.125)
x1 = rnorm(100)
y1 = -1 + 0.5*x1 + eps1
plot(x1, y1)
lm.fit1 = lm(y1~x1)
summary(lm.fit1)

abline(lm.fit1, lwd=3, col=2)
abline(-1, 0.5, lwd=3, col=3)
legend(-1, legend = c("model fit", "pop. regression"), col=2:3, lwd=3)

As expected with less noise in the data, the R^2 increases and the RSE decreases considerably.


13i.
set.seed(1)
eps2 = rnorm(100, 0, 1)   # larger error sd than the sqrt(0.25) = 0.5 used in (b), i.e. more noise
x2 = rnorm(100)
y2 = -1 + 0.5*x2 + eps2
plot(x2, y2)
lm.fit2 = lm(y2~x2)
summary(lm.fit2)

abline(lm.fit2, lwd=3, col=2)
abline(-1, 0.5, lwd=3, col=3)
legend(-1, legend = c("model fit", "pop. regression"), col=2:3, lwd=3)

As expected with more noise in the data, the R^2 decreases and the RSE increases considerably.


13j.
confint(lm.fit)

confint(lm.fit1)

confint(lm.fit2)

All intervals seem to be centered on approximately 0.5, with the second fit’s interval being narrower than
the first fit’s interval and the last fit’s interval being wider than the first fit’s interval.
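
The widths of the three intervals for the slope can also be compared numerically (a small
check on the three fits above; row 2 of confint() is the slope):

sapply(list(lm.fit, lm.fit1, lm.fit2), function(m) diff(confint(m)[2, ]))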


14a.
set.seed(1)
x1 = runif(100)
x2 = 0.5 * x1 + rnorm(100)/10
y = 2 + 2*x1 + 0.3*x2 + rnorm(100)


14b.
cor(x1, x2)

plot(x1, x2)


14c.
lm.fit = lm(y~x1+x2)
summary(lm.fit)

The regression coefficients are close to the true coefficients, although with high standard
errors. We can reject the null hypothesis for β1 because its p-value is below 5%. We cannot
reject the null hypothesis for β2 because its p-value is well above the typical 5% cutoff
(over 60%).


14d.
lm.fit = lm(y~x1)
summary(lm.fit)

Yes, we can reject the null hypothesis for the regression coefficient given the p-value for its t-statistic
is near zero.


14e.
lm.fit = lm(y~x2)
summary(lm.fit)

Yes, we can reject the null hypothesis for the regression coefficient given the p-value for its t-statistic
is near zero.


14f.
No, because x1 and x2 have collinearity, it is hard to distinguish their effects when regressed upon
together. When they are regressed upon separately, the linear relationship between y and each predictor is
indicated more clearly.
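
The collinearity can also be quantified with variance inflation factors (this assumes the car
package is installed; it is not used elsewhere in these answers):

library(car)
vif(lm(y ~ x1 + x2))   # VIFs noticeably above 1, reflecting the correlation between x1 and x2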


14g.
x1 = c(x1, 0.1)
x2 = c(x2, 0.8)
y = c(y, 6)
lm.fit1 = lm(y~x1+x2)
summary(lm.fit1)

lm.fit2 = lm(y~x1)
summary(lm.fit2)

lm.fit3 = lm(y~x2)
summary(lm.fit3)

In the first model (y ~ x1 + x2), the new observation shifts x1 to statistical insignificance
and x2 to statistical significance, judging from the change in p-values between the two sets
of regressions.

par(mfrow=c(2,2))
plot(lm.fit1)

par(mfrow=c(2,2))
plot(lm.fit2)

par(mfrow=c(2,2))
plot(lm.fit3)

In the first and third models, the point becomes a high leverage point.

plot(predict(lm.fit1), rstudent(lm.fit1))

plot(predict(lm.fit2), rstudent(lm.fit2))

plot(predict(lm.fit3), rstudent(lm.fit3))

Looking at the studentized residuals, we do not observe points far beyond the |3| cutoff,
except in the second regression (y ~ x1).


15a.
library(MASS)

summary(Boston)

Boston$chas <- factor(Boston$chas, labels = c("N","Y"))
summary(Boston)

attach(Boston)
lm.zn = lm(crim~zn)
summary(lm.zn) # yes

lm.indus = lm(crim~indus)
summary(lm.indus) # yes

lm.chas = lm(crim~chas) 
summary(lm.chas) # no

lm.nox = lm(crim~nox)
summary(lm.nox) # yes

lm.rm = lm(crim~rm)
summary(lm.rm) # yes

lm.age = lm(crim~age)
summary(lm.age) # yes

lm.dis = lm(crim~dis)
summary(lm.dis) # yes

lm.rad = lm(crim~rad)
summary(lm.rad) # yes

lm.tax = lm(crim~tax)
summary(lm.tax) # yes

lm.ptratio = lm(crim~ptratio)
summary(lm.ptratio) # yes

lm.black = lm(crim~black)
summary(lm.black) # yes

lm.lstat = lm(crim~lstat)
summary(lm.lstat) # yes

lm.medv = lm(crim~medv)
summary(lm.medv) # yes

All predictors except chas show a statistically significant association with crim in these
simple regressions. Plot each fit with plot() (e.g. plot(lm.zn)) to examine the residuals.
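
The thirteen simple regressions above can also be run in one pass (a compact alternative, not
in the original answer; reformulate() builds the formula crim ~ predictor):

predictors = setdiff(names(Boston), "crim")
pvals = sapply(predictors, function(p)
  summary(lm(reformulate(p, response="crim"), data=Boston))$coefficients[2, 4])
sort(pvals)   # only chas has a large p-value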


15b.
lm.all = lm(crim~., data=Boston)
summary(lm.all)


15c.
x = c(coefficients(lm.zn)[2],
      coefficients(lm.indus)[2],
      coefficients(lm.chas)[2],
      coefficients(lm.nox)[2],
      coefficients(lm.rm)[2],
      coefficients(lm.age)[2],
      coefficients(lm.dis)[2],
      coefficients(lm.rad)[2],
      coefficients(lm.tax)[2],
      coefficients(lm.ptratio)[2],
      coefficients(lm.black)[2],
      coefficients(lm.lstat)[2],
      coefficients(lm.medv)[2])
y = coefficients(lm.all)[2:14]
plot(x, y)

15d.
lm.zn = lm(crim~poly(zn,3))
summary(lm.zn) # 1, 2

lm.indus = lm(crim~poly(indus,3))
summary(lm.indus) # 1, 2, 3

# lm.chas = lm(crim~poly(chas,3)) : qualitative predictor
lm.nox = lm(crim~poly(nox,3))
summary(lm.nox) # 1, 2, 3

lm.rm = lm(crim~poly(rm,3))
summary(lm.rm) # 1, 2

lm.age = lm(crim~poly(age,3))
summary(lm.age) # 1, 2, 3

lm.dis = lm(crim~poly(dis,3))
summary(lm.dis) # 1, 2, 3

lm.rad = lm(crim~poly(rad,3))
summary(lm.rad) # 1, 2

lm.tax = lm(crim~poly(tax,3))
summary(lm.tax) # 1, 2

lm.ptratio = lm(crim~poly(ptratio,3))
summary(lm.ptratio) # 1, 2, 3

lm.black = lm(crim~poly(black,3))
summary(lm.black) # 1

lm.lstat = lm(crim~poly(lstat,3))
summary(lm.lstat) # 1, 2

lm.medv = lm(crim~poly(medv,3))
summary(lm.medv) # 1, 2, 3


[Resources]

Code and data:
https://github.com/asadoughi/stat-learning/archive/refs/heads/master.zip
https://www.statlearning.com/resources-first-edition

Exercise answers: https://blog.princehonest.com/stat-learning/

Videos: https://www.dataschool.io/15-hours-of-expert-machine-learning-videos/
