(Partially incomplete)
Question 1
- Logarithmic transformation and regression: consider the following regression:
log(weight) = −3.5+2.0 log(height) + error
with errors that have standard deviation 0.25. Weights are in pounds and heights are in inches.
(a) Fill in the blanks: approximately 68% of the persons will have weights within a factor of ___ and of ___ their predicted values from the regression.
(b) Draw the regression line and scatterplot of log(weight) versus log(height) that make sense and are consistent with the fitted model. Be sure to label the axes of your graph.
(a)
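The errors have standard deviation 0.25 on the log scale, so about 68% of log(weight) values fall within −0.25 and +0.25 of the prediction. On the original scale this means weights within a factor of exp(−0.25) ≈ 0.78 and of exp(0.25) ≈ 1.28 of their predicted values.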
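The conversion from the log scale to multiplicative factors in part (a) can be checked numerically. This is a minimal sketch; the variable names `sigma`, `lower`, and `upper` are introduced here for illustration only.

```r
# Residual standard deviation on the log scale, taken from the exercise
sigma <- 0.25

# One standard deviation below and above the prediction, as multiplicative factors
lower <- exp(-sigma)
upper <- exp(sigma)

round(c(lower, upper), 2)   # 0.78 1.28
```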
(b)
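# Randomly generate the height variable (in inches; mean 66 and sd 4 are assumed values)
height <- rnorm(100, 66, 4)
# Generate log(weight) from the model: n = 100 draws, mean from the regression, sd = 0.25
log.weight <- rnorm(100, -3.5 + 2.0*log(height), 0.25)
# Back-transform to weight in pounds
weight <- exp(log.weight)
# Scatterplot of the simulated variables, with labelled axes
plot(log(height), log(weight), xlab="log(height in inches)", ylab="log(weight in pounds)")
# Refit the model
fit.1 <- lm(log(weight) ~ log(height))
# Draw the regression line
curve(cbind(1,x) %*% coef(fit.1), add=TRUE)
With heights in inches (around 66), the predicted log(weight) is about −3.5 + 2.0 × log(66) ≈ 4.9, i.e. roughly 130 pounds, so the simulated data are plausible and consistent with the fitted model. Negative values of log(weight) only appear if the mean is passed to rnorm in place of the sample size, or if heights are entered in the wrong units.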
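As a rough cross-check between parts (a) and (b), one can simulate from the stated model and count how many simulated weights fall within one residual standard deviation of the prediction on the log scale. This is only a sketch: the sample size, the seed, and the height distribution (mean 66 inches, sd 4) are assumptions, not part of the exercise.

```r
set.seed(1)                          # arbitrary seed, for reproducibility only
n <- 10000                           # assumed sample size
height <- rnorm(n, 66, 4)            # assumed height distribution, in inches

pred <- -3.5 + 2.0 * log(height)     # predicted log(weight) from the stated model
log_weight <- rnorm(n, pred, 0.25)   # add errors with sd 0.25

# Share of people whose weight is within a factor of exp(-0.25) to exp(0.25)
# of the prediction, i.e. within 0.25 on the log scale
coverage <- mean(abs(log_weight - pred) < 0.25)
coverage
```

The share should come out close to 68%, matching the answer to part (a).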
Question 2
- The folder earnings has data from the Work, Family, and Well-Being Survey (Ross, 1990). Pull out the data on earnings, sex, height, and weight.
(a) In R, check the dataset and clean any unusually coded data.
(b) Fit a linear regression model predicting earnings from height. What transformation should you perform in order to interpret the intercept from this model as average earnings for people with average height?
(c) Fit some regression models with the goal of predicting earnings from some combination of sex, height, and weight. Be sure to try various transformations and interactions that might make sense. Choose your preferred model and justify.
This is a simple regression. No data are simulated this time; the analysis approach is as follows:
(a)
Remove rows containing NA: data <- na.omit(data)
Remove rows with missing values: x <- x[complete.cases(x), ]
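Unusually coded values are not caught by na.omit or complete.cases until they have been recoded to NA. The sketch below shows the idea on a toy data frame; the column names and the sentinel codes 99 and 999 are hypothetical and are not taken from the actual survey codebook, which should be consulted for the real codes.

```r
# Toy data frame; column names and sentinel codes are made up for illustration
survey <- data.frame(
  earn   = c(50000, 0, 32000, NA, 41000),
  height = c(66, 70, 99, 64, 68),        # 99 stands in for a "missing" code
  weight = c(150, 180, 140, 999, 160),   # 999 likewise stands in for "missing"
  sex    = c(1, 2, 2, 1, 1)
)

# Recode the hypothetical sentinel values to NA
survey$height[survey$height == 99] <- NA
survey$weight[survey$weight == 999] <- NA

# Keep only the rows with no missing values
clean <- survey[complete.cases(survey), ]
nrow(clean)   # 3
```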
(b)
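# Subtract the mean of height so that the predictor is centered at zero
m.height <- height - mean(height)
# Fit the model (earning is left uncentered; centering it as well would force the intercept to 0)
fit.1 <- lm(earning ~ m.height)
When height equals its mean, m.height = 0, so the intercept is the average earnings for people of average height.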
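The effect of centering the predictor can be verified on simulated data: in a least-squares fit with one centered predictor, the intercept equals the sample mean of the outcome. The data below are made up purely for the check (the coefficients 20000 and 1000 and the noise level are arbitrary assumptions), and the names `earn`, `c_height`, and `fit_c` are illustrative.

```r
set.seed(2)                                   # arbitrary seed
height <- rnorm(200, 66, 4)                   # hypothetical heights, in inches
earn <- 20000 + 1000 * (height - 66) +        # made-up linear relationship
  rnorm(200, 0, 5000)                         # plus made-up noise

c_height <- height - mean(height)             # center the predictor only
fit_c <- lm(earn ~ c_height)

intercept <- unname(coef(fit_c)[1])
all.equal(intercept, mean(earn))              # TRUE
```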
(c)
Start with a simple model; under normal circumstances the R-s