Statistical Modeling and R Software: Answers to Chapter 3 Exercises

Exercise 3.1

Answer: (1) Create a text file named 3.1.txt with the following contents:

 74.3 79.5 75.0 73.5 75.8 74.0 73.5 67.2 75.8 73.5 78.8 75.6 73.5 75.0 75.8
72.0 79.5 76.5 73.5 79.5 68.8 75.0 78.8 72.0 68.8 76.5 73.5 72.7 75.0 70.4
78.0 78.8 74.3 64.3 76.5 74.3 74.7 70.4 72.7 76.5 70.4 72.0 75.8 75.8 70.4
76.5 65.0 77.2 73.5 72.7 80.5 72.0 65.0 80.3 71.2 77.6 76.5 68.8 73.5 77.2
80.5 72.0 74.3 69.7 81.2 67.3 81.6 67.3 72.7 84.3 69.7 74.3 71.2 74.3 75.0
72.0 75.4 67.3 81.6 75.0 71.2 71.2 69.7 73.5 70.4 75.0 72.7 67.3 70.3 76.5
73.5 72.0 68.0 73.5 68.0 74.3 72.7 72.7 74.3 70.4

(2) Write a custom function, myfunction, and save it as myfunction.r:

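myfunction <- function(x) {
  n   <- length(x)                            # sample size
  m   <- mean(x)                              # sample mean
  v   <- var(x)                               # sample variance
  s   <- sd(x)                                # sample standard deviation
  me  <- median(x)                            # median
  cv  <- 100 * s / m                          # coefficient of variation (%)
  css <- sum((x - m)^2)                       # corrected sum of squares
  uss <- sum(x^2)                             # uncorrected sum of squares
  R   <- max(x) - min(x)                      # range
  R1  <- quantile(x, 3/4) - quantile(x, 1/4)  # interquartile range
  sm  <- s / sqrt(n)                          # standard error of the mean
  # skewness (g1) and excess kurtosis (g2) with small-sample correction
  g1  <- n / ((n - 1) * (n - 2)) * sum((x - m)^3) / s^3
  g2  <- n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * sum((x - m)^4) / s^4 -
         3 * (n - 1)^2 / ((n - 2) * (n - 3))
  # return all statistics as a one-row data frame
  data.frame(N = n, Mean = m, Var = v, std_dev = s, Median = me,
             std_mean = sm, CV = cv, CSS = css, USS = uss, R = R, R1 = R1,
             Skewness = g1, Kurtosis = g2, row.names = 1)
}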
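For reference, the less familiar statistics computed by myfunction correspond to the formulas below, where \bar{x} is the sample mean and s the sample standard deviation. This is a summary written from the code above, not taken from the textbook:

```latex
\begin{aligned}
\mathrm{CV} &= 100\,\frac{s}{\bar x}, \qquad
\mathrm{CSS} = \sum_{i=1}^{n}(x_i-\bar x)^2, \qquad
\mathrm{USS} = \sum_{i=1}^{n}x_i^2, \qquad
s_{\bar x} = \frac{s}{\sqrt n},\\
g_1 &= \frac{n}{(n-1)(n-2)}\sum_{i=1}^{n}\left(\frac{x_i-\bar x}{s}\right)^{3},\\
g_2 &= \frac{n(n+1)}{(n-1)(n-2)(n-3)}\sum_{i=1}^{n}\left(\frac{x_i-\bar x}{s}\right)^{4}
      - \frac{3(n-1)^2}{(n-2)(n-3)}.
\end{aligned}
```

For data from a normal population both g1 and g2 should be close to 0, which is consistent with the small Skewness and Kurtosis values in the output.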

(3) Load the custom function into memory:
> source("myfunction.r")
(4) Read the data into the vector serumdata:
> serumdata=scan("3.1.txt")
Read 100 items
> serumdata
  [1] 74.3 79.5 75.0 73.5 75.8 74.0 73.5 67.2 75.8 73.5 78.8 75.6 73.5 75.0 75.8 72.0 79.5 76.5 73.5 79.5 68.8 75.0 78.8
 [24] 72.0 68.8 76.5 73.5 72.7 75.0 70.4 78.0 78.8 74.3 64.3 76.5 74.3 74.7 70.4 72.7 76.5 70.4 72.0 75.8 75.8 70.4 76.5
 [47] 65.0 77.2 73.5 72.7 80.5 72.0 65.0 80.3 71.2 77.6 76.5 68.8 73.5 77.2 80.5 72.0 74.3 69.7 81.2 67.3 81.6 67.3 72.7
 [70] 84.3 69.7 74.3 71.2 74.3 75.0 72.0 75.4 67.3 81.6 75.0 71.2 71.2 69.7 73.5 70.4 75.0 72.7 67.3 70.3 76.5 73.5 72.0
 [93] 68.0 73.5 68.0 74.3 72.7 72.7 74.3 70.4
(5) Run the custom function:

> myfunction(serumdata)
    N   Mean      Var  std_dev Median  std_mean       CV      CSS      USS  R  R1   Skewness   Kurtosis
1 100 73.696 15.41675 3.926417   73.5 0.3926417 5.327857 1526.258 544636.3 20 4.6 0.03854249 0.07051809
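The printed CSS and USS can be cross-checked with two standard identities, CSS = (n - 1)·Var and USS = CSS + n·Mean². With the values above, 99 × 15.41675 ≈ 1526.258 and 1526.258 + 100 × 73.696² ≈ 544636.3. A minimal sketch of that check in R, with the data from 3.1.txt entered inline so it runs without the file:

```r
# Data from 3.1.txt, entered inline so the sketch does not depend on the file
serumdata <- c(74.3, 79.5, 75.0, 73.5, 75.8, 74.0, 73.5, 67.2, 75.8, 73.5, 78.8, 75.6, 73.5, 75.0, 75.8,
               72.0, 79.5, 76.5, 73.5, 79.5, 68.8, 75.0, 78.8, 72.0, 68.8, 76.5, 73.5, 72.7, 75.0, 70.4,
               78.0, 78.8, 74.3, 64.3, 76.5, 74.3, 74.7, 70.4, 72.7, 76.5, 70.4, 72.0, 75.8, 75.8, 70.4,
               76.5, 65.0, 77.2, 73.5, 72.7, 80.5, 72.0, 65.0, 80.3, 71.2, 77.6, 76.5, 68.8, 73.5, 77.2,
               80.5, 72.0, 74.3, 69.7, 81.2, 67.3, 81.6, 67.3, 72.7, 84.3, 69.7, 74.3, 71.2, 74.3, 75.0,
               72.0, 75.4, 67.3, 81.6, 75.0, 71.2, 71.2, 69.7, 73.5, 70.4, 75.0, 72.7, 67.3, 70.3, 76.5,
               73.5, 72.0, 68.0, 73.5, 68.0, 74.3, 72.7, 72.7, 74.3, 70.4)

n   <- length(serumdata)
m   <- mean(serumdata)
css <- sum((serumdata - m)^2)   # corrected sum of squares
uss <- sum(serumdata^2)         # uncorrected sum of squares

# CSS equals (n - 1) times the sample variance
stopifnot(isTRUE(all.equal(css, (n - 1) * var(serumdata))))
# USS decomposes into CSS plus n times the squared mean
stopifnot(isTRUE(all.equal(uss, css + n * m^2)))

cat("n =", n, " CSS =", css, " USS =", uss, "\n")
```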


Exercise 3.2

(1) Draw a histogram (with freq=FALSE the vertical axis is density, not frequency):
hist(serumdata,freq=FALSE,col="purple",border="red",density=3,angle=60,main="Histogram of serumdata",xlab="serumdata",ylab="density")

 

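(2) Add a kernel density curve to the histogram. The complete plotting script for parts (2) to (7) is:

lines(density(serumdata),col="blue")                          # kernel density estimate over the histogram from (1)
x <- 64:85                                                    # grid spanning the data range
lines(x,dnorm(x,mean(serumdata),sd(serumdata)),col="green")   # fitted normal density
plot(ecdf(serumdata),verticals=TRUE,do.points=FALSE)          # empirical distribution function
lines(x,pnorm(x,mean(serumdata),sd(serumdata)),col="blue")    # fitted normal CDF
qqnorm(serumdata,col="purple")                                # normal Q-Q plot
qqline(serumdata,col="red")                                   # Q-Q reference line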


(3) Overlay the fitted normal density curve on the histogram:

hist(serumdata,freq=FALSE,col="purple",border="red",density=3,angle=60,main="Histogram of serumdata",xlab="serumdata",ylab="density")
lines(x,dnorm(x,mean(serumdata),sd(serumdata)),col="green")   # x <- 64:85 as defined in (2)
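Because x <- 64:85 evaluates the normal density at only 22 integer points, the green curve is a polyline with visible corners. One alternative is curve(), which evaluates an expression in x on a fine grid (101 points by default) across the current plot. The sketch below uses simulated stand-in data (serum_sim is a hypothetical name, not the exercise data) and also checks that the bar areas of a freq = FALSE histogram add up to 1, which is what makes overlaying a density curve meaningful:

```r
# Simulated stand-in for serumdata (hypothetical; mean and sd chosen near the exercise output)
set.seed(1)
serum_sim <- rnorm(100, mean = 73.7, sd = 3.9)

# freq = FALSE puts the histogram on the density scale
h <- hist(serum_sim, freq = FALSE, col = "lightgray",
          main = "Histogram with fitted normal density", xlab = "value")

# curve() evaluates the expression on a fine grid across the current x-axis range
curve(dnorm(x, mean(serum_sim), sd(serum_sim)), add = TRUE, col = "green")

# The bar areas of a density-scale histogram add up to 1
bar_area <- sum(h$density * diff(h$breaks))
print(bar_area)
```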

  

(4) Plot the empirical distribution function:

plot(ecdf(serumdata),verticals=TRUE,do.points=FALSE)

 

(5) Plot the empirical distribution function together with the fitted normal CDF:

plot(ecdf(serumdata),verticals=TRUE,do.points=FALSE)
lines(x,pnorm(x,mean(serumdata),sd(serumdata)),col="blue")
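ecdf() returns a step function, so besides plotting it can be evaluated directly: Fn(t) is the proportion of observations less than or equal to t. A small sketch on a made-up vector (v is hypothetical, not the exercise data) that compares it with pnorm() for the fitted normal, i.e. the two curves drawn in (5):

```r
# Hypothetical toy sample, used only to illustrate ecdf()
v  <- c(70.4, 72.0, 73.5, 73.5, 75.0, 76.5, 78.8)
Fn <- ecdf(v)                               # empirical distribution function (a step function)

t_grid <- c(72, 73.5, 77)
emp    <- Fn(t_grid)                        # proportion of v that is <= each t: 2/7, 4/7, 6/7
normal <- pnorm(t_grid, mean(v), sd(v))     # fitted normal CDF at the same points

print(data.frame(t = t_grid, ecdf = emp, normal_cdf = normal))
```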

(6) Draw the normal Q-Q plot:

qqnorm(serumdata,col="purple")

(7) Add the Q-Q reference line:

qqnorm(serumdata,col="purple")
qqline(serumdata,col="red")
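qqline() does not fit a regression: by default it draws the line through the two points given by the first and third quartiles of the data paired with those of the standard normal (probs = c(0.25, 0.75)). A sketch that reproduces its intercept and slope by hand on simulated stand-in data (sim is hypothetical) and draws the same line with abline():

```r
# Simulated stand-in data (hypothetical), not the exercise data
set.seed(2)
sim <- rnorm(50, mean = 73.7, sd = 3.9)

probs <- c(0.25, 0.75)
yq <- quantile(sim, probs, names = FALSE)   # sample quartiles (type 7, the qqline default)
xq <- qnorm(probs)                          # standard normal quartiles

slope     <- diff(yq) / diff(xq)
intercept <- yq[1] - slope * xq[1]

qqnorm(sim, col = "purple")
abline(intercept, slope, col = "red")       # same line that qqline(sim) would draw
cat("intercept =", intercept, " slope =", slope, "\n")
```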

  


Exercise 3.3

Answer: (1) Draw a stem-and-leaf plot:

> stem(serumdata,scale=1)
  The decimal point is at the |
  64 | 300
  66 | 23333
  68 | 00888777
  70 | 34444442222
  72 | 0000000777777755555555555
  74 | 033333333700000004688888
  76 | 5555555226
  78 | 0888555
  80 | 355266
  82 | 
  84 | 3

(2) Draw a box plot (notch=TRUE adds a notch around the median):

boxplot(serumdata,col="lightblue",notch=TRUE)

  

(3) Five-number summary:

> fivenum(serumdata)
[1] 64.3 71.2 73.5 75.8 84.3
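The box plot in (2) and the five-number summary are linked: boxplot() computes its statistics with boxplot.stats(), whose hinges come from fivenum(), and the notch extends 1.58 × (upper hinge - lower hinge) / sqrt(n) on either side of the median. With the summary above this gives 73.5 ± 1.58 × 4.6 / 10 = 73.5 ± 0.7268. A sketch checking the notch formula on simulated stand-in data (sim is hypothetical):

```r
# Simulated stand-in data (hypothetical), not the exercise data
set.seed(3)
sim <- rnorm(60, mean = 73.7, sd = 3.9)

h  <- fivenum(sim)                # min, lower hinge, median, upper hinge, max
bs <- boxplot.stats(sim)          # statistics that boxplot() draws

# Notch limits: median +/- 1.58 * (hinge spread) / sqrt(n)
notch <- h[3] + c(-1.58, 1.58) * (h[4] - h[2]) / sqrt(length(sim))
print(bs$conf)
print(notch)

boxplot(sim, col = "lightblue", notch = TRUE)
```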


Exercise 3.4

Answer: (1) Shapiro-Wilk test of normality:

> shapiro.test(serumdata)
        Shapiro-Wilk normality test
data:  serumdata 
W = 0.9897, p-value = 0.6437

(2) Kolmogorov-Smirnov test of normality:

> ks.test(serumdata,"pnorm",mean(serumdata),sd(serumdata))


        One-sample Kolmogorov-Smirnov test


data:  serumdata 
D = 0.0701, p-value = 0.7097
alternative hypothesis: two-sided

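Conclusion: both p-values are greater than 0.05, so normality is not rejected at the 5% level, and the sample can be regarded as coming from a normally distributed population.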
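The statistic D reported by ks.test() is the largest vertical distance between the empirical distribution function and the fitted normal CDF, i.e. the two curves plotted in Exercise 3.2 (5). The sketch below computes it by hand on simulated stand-in data (sim is hypothetical) and compares it with ks.test(). One caveat: because the mean and standard deviation are estimated from the same sample, the K-S p-value obtained this way tends to be too large (the situation the Lilliefors test corrects for), so it is best read as a rough check alongside the Shapiro-Wilk result.

```r
# Simulated stand-in data (hypothetical), not the exercise data
set.seed(4)
sim <- rnorm(80, mean = 73.7, sd = 3.9)

n <- length(sim)
z <- pnorm(sort(sim), mean(sim), sd(sim))   # fitted normal CDF at the sorted data

# The ECDF jumps from (i - 1)/n to i/n at the i-th sorted point;
# D is the largest gap on either side of those jumps
d_manual <- max(c(z - (0:(n - 1)) / n, (1:n) / n - z))

d_ks <- unname(ks.test(sim, "pnorm", mean(sim), sd(sim))$statistic)
cat("manual D =", d_manual, " ks.test D =", d_ks, "\n")
```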


Exercise 3.9

Answer: (1) Import the data into the data frame studata:

> studata
   V1      V2 V3 V4   V5    V6
1   1   alice  f 13 56.5  84.0
2   2   becka  f 13 65.3  98.0
3   3    gail  f 14 64.3  90.0
4   4   karen  f 12 56.3  77.0
5   5   kathy  f 12 59.8  84.5
6   6    mary  f 15 66.5 112.0
7   7   sandy  f 11 51.3  50.5
8   8  sharon  f 15 62.5 112.5
9   9   tammy  f 14 62.8 102.5
10 10  alfred  m 14 69.0 112.5
11 11    duke  m 14 63.5 102.5
12 12   guido  m 15 67.0 133.0
13 13   james  m 12 57.3  83.0
14 14 jeffery  m 13 62.5  84.0
15 15    john  m 12 59.0  99.5
16 16  philip  m 16 72.0 150.0
17 17  robert  m 12 64.8 128.0
18 18  thomas  m 11 57.5  85.0
19 19 william  m 15 66.5 112.0
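(2) Pearson correlation test. The imported columns are named V1 to V6, so name the height (V5) and weight (V6) columns before attaching the data frame:

> names(studata)[5:6] <- c("height","weight")
> attach(studata)
> cor.test(height,weight)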


        Pearson's product-moment correlation


data:  height and weight 
t = 2.8298, df = 3, p-value = 0.0662
alternative hypothesis: true correlation is not equal to 0 
95 percent confidence interval:
 -0.1185906  0.9901188 
sample estimates:
     cor 
0.852915 

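Conclusion: height and weight are positively correlated. The printed output, however, cannot come from all 19 students: the test has df = n - 2, which is 17 here rather than 3, and with p-value = 0.0662 the correlation shown would not be significant at the 5% level. Rerun cor.test on the full data to support the conclusion.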
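The Pearson test statistic is t = r·sqrt(n - 2)/sqrt(1 - r²) with n - 2 degrees of freedom, so for the 19 students in the table df = 17. A sketch that types in the height (V5) and weight (V6) columns from the table and checks the hand computation against cor.test():

```r
# Height (V5) and weight (V6) of the 19 students, copied from the studata table
height <- c(56.5, 65.3, 64.3, 56.3, 59.8, 66.5, 51.3, 62.5, 62.8, 69.0,
            63.5, 67.0, 57.3, 62.5, 59.0, 72.0, 64.8, 57.5, 66.5)
weight <- c(84.0, 98.0, 90.0, 77.0, 84.5, 112.0, 50.5, 112.5, 102.5, 112.5,
            102.5, 133.0, 83.0, 84.0, 99.5, 150.0, 128.0, 85.0, 112.0)

n <- length(height)
r <- cor(height, weight)                          # Pearson correlation
t_manual <- r * sqrt(n - 2) / sqrt(1 - r^2)       # test statistic
p_manual <- 2 * pt(-abs(t_manual), df = n - 2)    # two-sided p-value

ct <- cor.test(height, weight)
cat("r =", r, " t =", t_manual, " df =", n - 2, " p =", p_manual, "\n")
```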


Exercise 6.1

Answer: (1) Enter the data:

x=c(5.1,3.5,7.1,6.2,8.8,7.8,4.5,5.6,8.0,6.4)
y=c(1907,1287,2700,2373,3260,3000,1947,2273,3113,2493)

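(2) Draw a scatter plot: plot(x,y)

Conclusion: the points lie close to a straight line, so x and y have a linear relationship.

(3) Fit the regression of y on x; the fitted equation is y = 140.95 + 364.18x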

> lm.sol=lm(y~1+x)
> summary(lm.sol)

Call:
lm(formula = y ~ 1 + x)

Residuals:
     Min       1Q   Median       3Q      Max 
-128.591  -70.978   -3.727   49.263  167.228 


Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   140.95     125.11   1.127    0.293    
x             364.18      19.26  18.908 6.33e-08 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 


Residual standard error: 96.42 on 8 degrees of freedom
Multiple R-squared: 0.9781,     Adjusted R-squared: 0.9754 
F-statistic: 357.5 on 1 and 8 DF,  p-value: 6.33e-08 
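The least-squares estimates in the summary follow from b1 = Sxy/Sxx and b0 = ȳ - b1·x̄. A sketch reproducing coef(lm.sol) from these formulas with the data entered in (1):

```r
x <- c(5.1, 3.5, 7.1, 6.2, 8.8, 7.8, 4.5, 5.6, 8.0, 6.4)
y <- c(1907, 1287, 2700, 2373, 3260, 3000, 1947, 2273, 3113, 2493)

sxx <- sum((x - mean(x))^2)                  # corrected sum of squares of x
sxy <- sum((x - mean(x)) * (y - mean(y)))    # corrected cross-product of x and y
b1  <- sxy / sxx                             # slope
b0  <- mean(y) - b1 * mean(x)                # intercept

lm.sol <- lm(y ~ 1 + x)
print(c(b0 = b0, b1 = b1))
print(coef(lm.sol))
```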

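(4) The slope β1 is highly significant (p = 6.33e-08), but the intercept β0 is not (p = 0.293). The regression equation as a whole is highly significant (F = 357.5 on 1 and 8 DF, p = 6.33e-08).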
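In simple linear regression the overall F test and the t test for the slope are the same test, with F = t²: here 18.908² ≈ 357.5, which is why both report p = 6.33e-08. A sketch checking this from the fitted model, using the data from (1):

```r
x <- c(5.1, 3.5, 7.1, 6.2, 8.8, 7.8, 4.5, 5.6, 8.0, 6.4)
y <- c(1907, 1287, 2700, 2373, 3260, 3000, 1947, 2273, 3113, 2493)
s <- summary(lm(y ~ x))

t_slope <- s$coefficients["x", "t value"]   # t statistic for beta1
f_stat  <- s$fstatistic[["value"]]          # overall F statistic
cat("t^2 =", t_slope^2, " F =", f_stat, "\n")
```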

(5) Predict y at x = 7 with a 95% prediction interval:

> new=data.frame(x=7)
> new
  x
1 7
> lm.pred<-predict(lm.sol,new,interval="prediction")
> lm.pred
       fit      lwr      upr
1 2690.227 2454.971 2925.484

So the predicted value at x = 7 is 2690.227, with 95% prediction interval [2454.971, 2925.484].
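The interval returned by predict(..., interval = "prediction") is ŷ0 ± t(0.975, n - 2) · s · sqrt(1 + 1/n + (x0 - x̄)²/Sxx), where s is the residual standard error (96.42 in the summary). A sketch reproducing it by hand:

```r
x <- c(5.1, 3.5, 7.1, 6.2, 8.8, 7.8, 4.5, 5.6, 8.0, 6.4)
y <- c(1907, 1287, 2700, 2373, 3260, 3000, 1947, 2273, 3113, 2493)
lm.sol <- lm(y ~ 1 + x)

x0  <- 7
n   <- length(x)
s   <- summary(lm.sol)$sigma                 # residual standard error
sxx <- sum((x - mean(x))^2)
fit <- unname(predict(lm.sol, data.frame(x = x0)))

# Half-width of the 95% prediction interval for a new observation at x0
half   <- qt(0.975, df = n - 2) * s * sqrt(1 + 1 / n + (x0 - mean(x))^2 / sxx)
manual <- c(fit = fit, lwr = fit - half, upr = fit + half)

print(manual)
print(predict(lm.sol, data.frame(x = x0), interval = "prediction"))
```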
