Exercise 3.1
Answer: (1) Create a text file named 3.1.txt with the following contents:
74.3 79.5 75.0 73.5 75.8 74.0 73.5 67.2 75.8 73.5 78.8 75.6 73.5 75.0 75.8
72.0 79.5 76.5 73.5 79.5 68.8 75.0 78.8 72.0 68.8 76.5 73.5 72.7 75.0 70.4
78.0 78.8 74.3 64.3 76.5 74.3 74.7 70.4 72.7 76.5 70.4 72.0 75.8 75.8 70.4
76.5 65.0 77.2 73.5 72.7 80.5 72.0 65.0 80.3 71.2 77.6 76.5 68.8 73.5 77.2
80.5 72.0 74.3 69.7 81.2 67.3 81.6 67.3 72.7 84.3 69.7 74.3 71.2 74.3 75.0
72.0 75.4 67.3 81.6 75.0 71.2 71.2 69.7 73.5 70.4 75.0 72.7 67.3 70.3 76.5
73.5 72.0 68.0 73.5 68.0 74.3 72.7 72.7 74.3 70.4
(2) Define a custom function, myfunction, and save it as myfunction.r:
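myfunction<-function(x){
  n<-length(x)                          # sample size
  m<-mean(x)                            # sample mean
  v<-var(x)                             # sample variance
  s<-sd(x)                              # sample standard deviation
  me<-median(x)                         # median
  cv<-100*s/m                           # coefficient of variation, in percent
  css<-sum((x-m)^2)                     # corrected sum of squares
  uss<-sum(x^2)                         # uncorrected sum of squares
  R<-max(x)-min(x)                      # range
  R1<-quantile(x,3/4)-quantile(x,1/4)   # interquartile range
  sm<-s/sqrt(n)                         # standard error of the mean
  g1<-n/((n-1)*(n-2))*sum((x-m)^3)/s^3  # skewness
  g2<-((n*(n+1))/((n-1)*(n-2)*(n-3))*sum((x-m)^4)/s^4-(3*(n-1)^2)/((n-2)*(n-3)))  # kurtosis
  data.frame(N=n,Mean=m,Var=v,std_dev=s,Median=me,std_mean=sm,CV=cv,CSS=css,USS=uss,R=R,R1=R1,Skewness=g1,Kurtosis=g2,row.names=1)
}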
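For reference, the skewness (g1) and kurtosis (g2) lines of myfunction can be written as the following formulas, read directly off the code, where \bar{x} is the sample mean, s the sample standard deviation and n the sample size. Both values should be close to 0 for a sample from a normal distribution.

```latex
\[
g_1 = \frac{n}{(n-1)(n-2)} \cdot \frac{\sum_{i=1}^{n} (x_i - \bar{x})^3}{s^3},
\qquad
g_2 = \frac{n(n+1)}{(n-1)(n-2)(n-3)} \cdot \frac{\sum_{i=1}^{n} (x_i - \bar{x})^4}{s^4}
      - \frac{3(n-1)^2}{(n-2)(n-3)}
\]
```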
(3) Load the custom function into the R session:
> source("myfunction.r")
(4) Read the data into the vector serumdata:
> serumdata=scan("3.1.txt")
Read 100 items
> serumdata
[1] 74.3 79.5 75.0 73.5 75.8 74.0 73.5 67.2 75.8 73.5 78.8 75.6 73.5 75.0 75.8 72.0 79.5 76.5 73.5 79.5 68.8 75.0 78.8
[24] 72.0 68.8 76.5 73.5 72.7 75.0 70.4 78.0 78.8 74.3 64.3 76.5 74.3 74.7 70.4 72.7 76.5 70.4 72.0 75.8 75.8 70.4 76.5
[47] 65.0 77.2 73.5 72.7 80.5 72.0 65.0 80.3 71.2 77.6 76.5 68.8 73.5 77.2 80.5 72.0 74.3 69.7 81.2 67.3 81.6 67.3 72.7
[70] 84.3 69.7 74.3 71.2 74.3 75.0 72.0 75.4 67.3 81.6 75.0 71.2 71.2 69.7 73.5 70.4 75.0 72.7 67.3 70.3 76.5 73.5 72.0
[93] 68.0 73.5 68.0 74.3 72.7 72.7 74.3 70.4
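A minimal sketch of how scan() treats a whitespace-separated file, assuming a small temporary file as a stand-in for 3.1.txt (the names tmp and demo are hypothetical, and only five of the values are used):

```r
# Write a small whitespace-separated file to a temporary location
# (a hypothetical stand-in for 3.1.txt).
tmp <- tempfile(fileext = ".txt")
writeLines(c("74.3 79.5 75.0", "73.5 75.8"), tmp)

# scan() reads numeric values by default and ignores line breaks,
# so the result is a single flat numeric vector.
demo <- scan(tmp, quiet = TRUE)
length(demo)       # number of values read
is.numeric(demo)   # check the type of the result

unlink(tmp)        # remove the temporary file
```

This is why the seven lines of 3.1.txt come back as one vector of 100 items.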
(5) Run the custom function:
> myfunction(serumdata)
    N   Mean      Var  std_dev Median  std_mean       CV      CSS      USS  R  R1   Skewness   Kurtosis
1 100 73.696 15.41675 3.926417   73.5 0.3926417 5.327857 1526.258 544636.3 20 4.6 0.03854249 0.07051809
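Several columns of this table are tied together by simple identities, which gives a quick sanity check: CSS = (N-1)*Var, USS = CSS + N*Mean^2, and std_mean = std_dev/sqrt(N). The sketch below checks them on a small made-up vector (toy is hypothetical, not part of the exercise data).

```r
toy <- c(2, 4, 4, 4, 5, 5, 7, 9)   # made-up values for illustration only
n <- length(toy)
m <- mean(toy)

css <- sum((toy - m)^2)   # corrected sum of squares
uss <- sum(toy^2)         # uncorrected sum of squares

# Each check compares two ways of computing the same quantity.
check_css <- isTRUE(all.equal(css, (n - 1) * var(toy)))
check_uss <- isTRUE(all.equal(uss, css + n * m^2))
check_se  <- isTRUE(all.equal(sd(toy) / sqrt(n), sqrt(var(toy) / n)))
c(check_css, check_uss, check_se)
```

With the values printed above, 99 * 15.41675 is about 1526.258 and 1526.258 + 100 * 73.696^2 is about 544636.3, which agrees with the CSS and USS columns.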
Exercise 3.2
(1) Draw the histogram (with freq=FALSE the vertical axis shows density, not frequency):
hist(serumdata,freq=FALSE,col="purple",border="red",density=3,angle=60,main=paste("Histogram"),xlab="serumdata",ylab="density")
(2) Add the kernel density curve:
lines(density(serumdata),col="blue")
x<-64:85   # grid of x values used for the normal curves in the steps that follow
(3) Overlay the normal probability density curve on the histogram:
hist(serumdata,freq=FALSE,col="purple",border="red",density=3,angle=60,main=paste("the histogram of serumdata"),xlab="serumdata",ylab="density")
lines(x,dnorm(x,mean(serumdata),sd(serumdata)),col="green")
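The overlay only works because freq=FALSE rescales the bars so that their total area is 1, the same scale as dnorm. The sketch below illustrates this on simulated data (sim is a hypothetical stand-in, not serumdata) and uses a finer grid than integer steps for a smoother curve. The figure goes to a temporary PDF so that the code also runs non-interactively.

```r
set.seed(1)
sim <- rnorm(100, mean = 73.7, sd = 3.9)   # simulated stand-in, not the real data

# With plot = FALSE, hist() returns the bar heights without drawing.
h <- hist(sim, plot = FALSE)
bar_area <- sum(h$density * diff(h$breaks))   # total area of the density bars

# A fine grid over the data range gives a smooth normal curve.
grid <- seq(min(sim), max(sim), length.out = 200)

pdf(tempfile(fileext = ".pdf"))   # temporary file instead of a screen device
hist(sim, freq = FALSE, main = "Histogram with normal density",
     xlab = "simulated values", ylab = "density")
lines(grid, dnorm(grid, mean(sim), sd(sim)), col = "green")
invisible(dev.off())
```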
(4) Plot the empirical distribution function:
plot(ecdf(serumdata),verticals=TRUE,do.p=FALSE)
(5) Plot the empirical distribution function with the fitted normal distribution function overlaid:
plot(ecdf(serumdata),verticals=TRUE,do.p=FALSE)
lines(x,pnorm(x,mean(serumdata),sd(serumdata)),col="blue")
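What this plot compares can also be computed directly. In the sketch below, on simulated data (sim is a hypothetical stand-in), ecdf() returns a step function, and the largest vertical gap between that step function and the fitted normal CDF is the D statistic that ks.test reports.

```r
set.seed(2)
sim <- rnorm(100, mean = 73.7, sd = 3.9)   # simulated stand-in, not the real data

Fn <- ecdf(sim)    # ecdf() returns a step function
Fn(median(sim))    # proportion of observations <= the median

# Largest vertical gap between the step function and the fitted normal CDF,
# checked just after and just before each jump of the step function.
xs <- sort(sim)
n <- length(xs)
fitted <- pnorm(xs, mean(sim), sd(sim))
gap <- max(pmax(abs((1:n) / n - fitted), abs((0:(n - 1)) / n - fitted)))

# ks.test() reports the same quantity as its statistic D.
ks <- ks.test(sim, "pnorm", mean(sim), sd(sim))
c(gap = gap, D = unname(ks$statistic))
```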
(6) Draw the normal Q-Q plot:
qqnorm(serumdata,col="purple")
(7) Add the Q-Q reference line:
qqnorm(serumdata,col="purple")
qqline(serumdata,col="red")
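A sketch of where the reference line comes from, on simulated data (sim is a hypothetical stand-in). By default qqline() joins the first and third quartiles of the sample to the corresponding quartiles of the standard normal distribution, so the line can be reproduced by hand.

```r
set.seed(3)
sim <- rnorm(100, mean = 73.7, sd = 3.9)   # simulated stand-in, not the real data

# plot.it = FALSE returns the plotted coordinates instead of drawing them:
# x holds the theoretical normal quantiles, y holds the data.
qq <- qqnorm(sim, plot.it = FALSE)

# Line through the (0.25, 0.75) quantile pairs, as qqline() uses by default.
probs <- c(0.25, 0.75)
slope <- diff(quantile(sim, probs, names = FALSE)) / diff(qnorm(probs))
intercept <- quantile(sim, 0.25, names = FALSE) - slope * qnorm(0.25)
c(intercept = intercept, slope = slope)
```

For roughly normal data the intercept is close to the sample mean and the slope close to the sample standard deviation, which is why points near the line support normality.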
Exercise 3.3
Answer: (1) Draw the stem-and-leaf plot:
> stem(serumdata,scale=1)
The decimal point is at the |
64 | 300
66 | 23333
68 | 00888777
70 | 34444442222
72 | 0000000777777755555555555
74 | 033333333700000004688888
76 | 5555555226
78 | 0888555
80 | 355266
82 |
84 | 3
(2) Draw the box plot (notch=TRUE draws a notch around the median):
boxplot(serumdata,col="lightblue",notch=TRUE)
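The notch limits can be reproduced from the numbers the box plot is built on. The sketch below, on simulated data (sim is a hypothetical stand-in), uses boxplot.stats(), whose conf component is median +/- 1.58 * (hinge spread) / sqrt(n).

```r
set.seed(4)
sim <- rnorm(100, mean = 73.7, sd = 3.9)   # simulated stand-in, not the real data

bs <- boxplot.stats(sim)        # the statistics boxplot() draws from
hinges <- bs$stats[c(2, 4)]     # lower and upper hinge
med <- bs$stats[3]              # median

# Notch limits computed by hand.
notch <- med + c(-1, 1) * 1.58 * diff(hinges) / sqrt(bs$n)
rbind(by_hand = notch, boxplot_stats = bs$conf)
```

For serumdata, with hinges 71.2 and 75.8, median 73.5 and n = 100, the same formula gives roughly 73.5 +/- 0.73.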
(3) Five-number summary (minimum, lower hinge, median, upper hinge, maximum):
> fivenum(serumdata)
[1] 64.3 71.2 73.5 75.8 84.3
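fivenum() reports Tukey's hinges, which are not always equal to the quartiles returned by quantile(). A small sketch on a made-up vector (v is hypothetical) shows the difference:

```r
v <- c(1, 2, 3, 4)   # made-up values for illustration only

five <- fivenum(v)                                 # min, lower hinge, median, upper hinge, max
quart <- quantile(v, c(0.25, 0.75), names = FALSE) # default (type 7) quartiles

five    # 1.0 1.5 2.5 3.5 4.0
quart   # 1.75 3.25
```

For serumdata the two happen to agree: the hinges 71.2 and 75.8 differ by 4.6, the same value as R1 from myfunction.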
Exercise 3.4
Answer: (1) Shapiro-Wilk test of normality:
> shapiro.test(serumdata)
Shapiro-Wilk normality test
data: serumdata
W = 0.9897, p-value = 0.6437
(2) Kolmogorov-Smirnov test of normality:
> ks.test(serumdata,"pnorm",mean(serumdata),sd(serumdata))
One-sample Kolmogorov-Smirnov test
data: serumdata
D = 0.0701, p-value = 0.7097
alternative hypothesis: two-sided
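Conclusion: both p-values (0.6437 and 0.7097) are greater than 0.05, so normality is not rejected and the sample can be regarded as coming from a normally distributed population.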
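The decision rule used in the conclusion can be sketched as follows, on simulated data (sim, alpha and res are hypothetical names). Both tests return objects whose p.value component is compared with the chosen significance level.

```r
set.seed(5)
sim <- rnorm(100, mean = 73.7, sd = 3.9)   # simulated stand-in, not the real data

sw <- shapiro.test(sim)
ks <- ks.test(sim, "pnorm", mean(sim), sd(sim))

alpha <- 0.05   # significance level assumed for the decision
res <- data.frame(
  test      = c("Shapiro-Wilk", "Kolmogorov-Smirnov"),
  statistic = c(unname(sw$statistic), unname(ks$statistic)),
  p.value   = c(sw$p.value, ks$p.value)
)
res$reject <- res$p.value < alpha   # TRUE means normality is rejected
res
```

Because the mean and standard deviation passed to ks.test are estimated from the same sample, its p-value tends to be conservative, and ks.test may also warn about ties when the data contain repeated values, as serumdata does.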
Exercise 3.9
Answer: (1) Load the data into the data frame studata:
> studata
   V1      V2 V3 V4   V5    V6
 1  1   alice  f 13 56.5  84.0
 2  2   becka  f 13 65.3  98.0
 3  3    gail  f 14 64.3  90.0
 4  4   karen  f 12 56.3  77.0
 5  5   kathy  f 12 59.8  84.5
 6  6    mary  f 15 66.5 112.0
 7  7   sandy  f 11 51.3  50.5
 8  8  sharon  f 15 62.5 112.5
 9  9   tammy  f 14 62.8 102.5
10 10  alfred  m 14 69.0 112.5
11 11    duke  m 14 63.5 102.5
12 12   guido  m 15 67.0 133.0
13 13   james  m 12 57.3  83.0
14 14 jeffery  m 13 62.5  84.0
15 15    john  m 12 59.0  99.5
16 16  philip  m 16 72.0 150.0
17 17  robert  m 12 64.8 128.0
18 18  thomas  m 11 57.5  85.0
19 19 william  m 15 66.5 112.0
(2) Pearson correlation test:
> attach(studata)
> cor.test(height,weight)
Pearson's product-moment correlation
data: height and weight
t = 2.8298, df = 3, p-value = 0.0662
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
-0.1185906 0.9901188
sample estimates:
cor
0.852915
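Conclusion: the sample correlation between height and weight is strongly positive (r = 0.853). In the output shown, however, p = 0.0662 and the 95% confidence interval contains 0, so zero correlation is not rejected at the 5% level. The output also reports df = 3, that is, only 5 pairs, whereas studata has 19 rows (which would give df = 17), so the test shown was not run on the full listing.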
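The studata listing has columns V1 to V6, so cor.test(height,weight) only works if variables with those names exist. A minimal sketch of the test on all 19 rows follows, assuming V5 is height and V6 is weight (that reading of the columns is an assumption). The values are copied from the listing.

```r
# Columns V5 and V6 of studata, copied from the listing;
# treating V5 as height and V6 as weight is an assumption.
height <- c(56.5, 65.3, 64.3, 56.3, 59.8, 66.5, 51.3, 62.5, 62.8, 69.0,
            63.5, 67.0, 57.3, 62.5, 59.0, 72.0, 64.8, 57.5, 66.5)
weight <- c(84.0, 98.0, 90.0, 77.0, 84.5, 112.0, 50.5, 112.5, 102.5, 112.5,
            102.5, 133.0, 83.0, 84.0, 99.5, 150.0, 128.0, 85.0, 112.0)

ct <- cor.test(height, weight)   # Pearson correlation test by default
ct$parameter   # degrees of freedom, n - 2
ct$estimate    # sample correlation
ct$p.value     # two-sided p-value
```

With 19 pairs the degrees of freedom are 17, which is a quick way to confirm that the test used every row.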
Exercise 6.1
Answer: (1) Enter the data:
x=c(5.1,3.5,7.1,6.2,8.8,7.8,4.5,5.6,8.0,6.4)
y=c(1907,1287,2700,2373,3260,3000,1947,2273,3113,2493)
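Then draw the scatter plot: plot(x,y)
Conclusion: the scatter plot suggests an approximately linear relationship between X and Y.
(2) Fit the regression of y on x. The fitted equation is y = 140.95 + 364.18x: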
> lm.sol=lm(y~1+x)
> summary(lm.sol)
Call:
lm(formula = y ~ 1 + x)
Residuals:
     Min       1Q   Median       3Q      Max
-128.591  -70.978   -3.727   49.263  167.228
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   140.95     125.11   1.127    0.293
x             364.18      19.26  18.908 6.33e-08 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 96.42 on 8 degrees of freedom
Multiple R-squared: 0.9781, Adjusted R-squared: 0.9754
F-statistic: 357.5 on 1 and 8 DF, p-value: 6.33e-08
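The two coefficients can be checked by hand from the least-squares formulas b1 = Sxy/Sxx and b0 = mean(y) - b1*mean(x). The sketch below uses the x and y of this exercise; sxx, sxy, b0, b1 and fit are names introduced here for illustration.

```r
x <- c(5.1, 3.5, 7.1, 6.2, 8.8, 7.8, 4.5, 5.6, 8.0, 6.4)
y <- c(1907, 1287, 2700, 2373, 3260, 3000, 1947, 2273, 3113, 2493)

sxx <- sum((x - mean(x))^2)               # sum of squares of x about its mean
sxy <- sum((x - mean(x)) * (y - mean(y))) # cross-product sum

b1 <- sxy / sxx                 # slope
b0 <- mean(y) - b1 * mean(x)    # intercept

fit <- lm(y ~ x)
rbind(by_hand = c(b0, b1), lm = unname(coef(fit)))
```

Here Sxx = 25.06 and Sxy = 9126.4, which give the slope and intercept reported in the summary above.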
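(3) The slope β1 is highly significant (p = 6.33e-08), whereas the intercept β0 is not (p = 0.293). The regression equation as a whole is highly significant (F = 357.5 on 1 and 8 DF, p = 6.33e-08).
(4) Predict Y at x = 7 with a 95% prediction interval: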
> new=data.frame(x=7)
> new
x
1 7
> lm.pred<-predict(lm.sol,new,interval="prediction")
> lm.pred
fit lwr upr
1 2690.227 2454.971 2925.484
So the predicted value is Y(7) = 2690.227, with 95% prediction interval [2454.971, 2925.484].
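The interval can be reproduced by hand as fit +/- t(0.975, n-2) * s * sqrt(1 + 1/n + (x0 - mean(x))^2 / Sxx), where s is the residual standard error. A sketch using the data of this exercise follows; x0, se_pred, tcrit, manual and auto are names introduced here for illustration.

```r
x <- c(5.1, 3.5, 7.1, 6.2, 8.8, 7.8, 4.5, 5.6, 8.0, 6.4)
y <- c(1907, 1287, 2700, 2373, 3260, 3000, 1947, 2273, 3113, 2493)
fit <- lm(y ~ x)

n <- length(x)
x0 <- 7                        # point at which to predict
s <- summary(fit)$sigma        # residual standard error

# Standard error of a new observation at x0.
se_pred <- s * sqrt(1 + 1 / n + (x0 - mean(x))^2 / sum((x - mean(x))^2))
tcrit <- qt(0.975, df = n - 2) # two-sided 95% critical value

yhat <- sum(coef(fit) * c(1, x0))
manual <- c(fit = yhat, lwr = yhat - tcrit * se_pred, upr = yhat + tcrit * se_pred)

# The same interval from predict().
auto <- predict(fit, data.frame(x = x0), interval = "prediction")
rbind(manual = manual, predict = as.numeric(auto))
```

The extra 1 under the square root is what separates a prediction interval for a single new observation from the narrower confidence interval for the mean response.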