Lab 8: Basic Statistical Analysis (I)

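1. Basic Experiment

R's built-in dataset Titanic records the survival and death of passengers on the Titanic. It is a table of counts classified by four categorical variables: cabin class (Class), sex (Sex), age (Age), and survival status (Survived). Produce the following frequency tables from it.

a) Build a two-way contingency table of Sex and Survived, and add marginal sums to it.

b) Build a multiway contingency table of the four variables Class, Sex, Age, and Survived.

c) Convert the contingency table from b) into a data frame with a frequency column.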

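data <- as.data.frame(Titanic)   # long format: one row per cell, counts in Freq
# a) Sex x Survived contingency table, then the same table with marginal sums
sex_surv <- xtabs(Freq ~ Sex + Survived, data = data)
sex_surv
addmargins(sex_surv)
# b) four-way contingency table
tab <- xtabs(Freq ~ Class + Sex + Age + Survived, data = data)
tab
# c) back to a data frame with one row per cell and a Freq column
as.data.frame(tab)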

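As a cross-check on parts a) and c), the same results can be obtained directly from the Titanic table object, without first converting it to a data frame. This is only a sketch of an alternative route; the variable names with_margins and long are illustrative and not part of the original exercise.

```r
# Titanic is a 4-way table with dimensions 1 = Class, 2 = Sex, 3 = Age, 4 = Survived
sex_surv <- margin.table(Titanic, c(2, 4))   # collapse over Class and Age

# Same two-way table with row and column sums added
with_margins <- addmargins(sex_surv)
print(with_margins)

# Row proportions: share of survivors within each sex
print(round(prop.table(sex_surv, 1), 3))

# Long format: one row per cell of the 4-way table, counts in Freq
long <- as.data.frame(Titanic)
str(long)
```

The grand total in the bottom-right cell of with_margins should equal the sum of the Freq column in long, which is a quick way to confirm that no cells were lost in the conversion.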

2. Verification Experiment

Code Listing 7-3

library(Hmisc)
myvars <- c("mpg", "hp", "wt")
describe(mtcars[myvars])


Code Listing 7-4

library(pastecs)
myvars <- c("mpg", "hp", "wt")
stat.desc(mtcars[myvars])

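Where Hmisc or pastecs is not installed, a rough base-R counterpart of the two listings above can be sketched as follows. The helper mystats is a hypothetical name introduced here for illustration; it reports only a handful of the statistics that describe() and stat.desc() print.

```r
myvars <- c("mpg", "hp", "wt")

# Hypothetical helper: a few common descriptive statistics for one numeric vector
mystats <- function(x, na.rm = TRUE) {
  if (na.rm) x <- x[!is.na(x)]
  n <- length(x)
  s <- sd(x)
  c(n = n, mean = mean(x), sd = s, median = median(x),
    min = min(x), max = max(x), se = s / sqrt(n))
}

# One column of statistics per variable
res <- sapply(mtcars[myvars], mystats)
print(round(res, 3))
```

sapply() applies the helper to each selected column and binds the results into a matrix, so statistics are rows and variables are columns.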

Code Listing 7-11

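library(grid)   # grid is a dependency of vcd
library(vcd)    # provides the Arthritis data set
mytable <- xtabs(~ Treatment+Sex+Improved, data=Arthritis)   # three-way table of cell counts
mytable
ftable(mytable)                      # compact, flat print of the three-way table
margin.table(mytable, 1)             # marginal counts by Treatment
margin.table(mytable, 2)             # marginal counts by Sex
margin.table(mytable, 3)             # marginal counts by Improved
margin.table(mytable, c(1,3))        # Treatment x Improved counts
ftable(prop.table(mytable, c(1,2)))  # Improved proportions within each Treatment x Sex group
ftable(addmargins(prop.table(mytable, c(1, 2)), 3))   # same, with a Sum column over Improved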

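The margin indices in the listing above are easy to mix up, so here is a small sketch of the same operations on the built-in UCBAdmissions table, chosen only because it needs no extra package. The dataset and the variable names are substitutions made for illustration; they are not part of the original listing.

```r
tab3 <- UCBAdmissions   # dimensions: 1 = Admit, 2 = Gender, 3 = Dept

# Marginal counts: keep the listed dimensions, sum over the others
admit_totals  <- margin.table(tab3, 1)
admit_by_dept <- margin.table(tab3, c(1, 3))
print(admit_totals)

# Proportions of Admit within each Gender x Dept combination
p <- prop.table(tab3, c(2, 3))
print(ftable(round(p, 3)))

# Adding a margin over dimension 1 gives a Sum entry of 1 for every combination
print(ftable(round(addmargins(p, 1), 3)))
```

The rule of thumb is that margin.table() keeps the dimensions you name, while prop.table() divides by the totals of the dimensions you name.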

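3. Design Experiment

Generate a data frame df with three columns y1, y2, and y3, each holding 10 values in the range [1, 20], and set the 3rd and 8th values of y2 to missing. Call describe() from the Hmisc package on the data frame to produce descriptive statistics, and examine the result.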

> y1<-round(runif(10,1,20))
> y2<-round(runif(10,1,20))
> y2[c(3,8)]<-NA
> y3<-round(runif(10,1,20))
> df<-data.frame(y1,y2,y3)
> library(Hmisc)
> describe(df)
df 

 3  Variables      10  Observations
-----------------------------------------------------------------------------------------------------------------------------
y1 
       n  missing distinct     Info     Mean      Gmd 
      10        0        6    0.964      9.3    4.867 

lowest :  2  5  7  8 11, highest:  5  7  8 11 15
                                  
Value        2   5   7   8  11  15
Frequency    1   1   1   2   3   2
Proportion 0.1 0.1 0.1 0.2 0.3 0.2
-----------------------------------------------------------------------------------------------------------------------------
y2 
       n  missing distinct     Info     Mean      Gmd 
       8        2        6    0.952    10.12    6.607 

lowest :  4  5  7 16 17, highest:  5  7 16 17 18
                                              
Value          4     5     7    16    17    18
Frequency      1     1     3     1     1     1
Proportion 0.125 0.125 0.375 0.125 0.125 0.125
-------------------------------------------------------------------------------------------
y3 
       n  missing distinct     Info     Mean      Gmd 
      10        0        8    0.988     13.4    7.511 

lowest :  2  6  7 11 14, highest: 11 14 18 19 20
                                          
Value        2   6   7  11  14  18  19  20
Frequency    1   1   1   1   1   2   2   1
Proportion 0.1 0.1 0.1 0.1 0.1 0.2 0.2 0.1
-------------------------------------------------------------------------------------------
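To help read the describe() output above, the following sketch rebuilds y1 from its printed Value and Frequency rows and recomputes distinct, Mean, and Gmd in base R. Gmd is Gini's mean difference, the mean absolute difference over all pairs of distinct observations; the helper name gmd is introduced here for illustration.

```r
# y1 rebuilt from the Value / Frequency rows printed by describe()
y1 <- rep(c(2, 5, 7, 8, 11, 15), times = c(1, 1, 1, 2, 3, 2))

# Hypothetical helper: Gini's mean difference, averaging |x_i - x_j| over all i != j
gmd <- function(x) {
  x <- x[!is.na(x)]
  n <- length(x)
  sum(abs(outer(x, x, "-"))) / (n * (n - 1))
}

n_distinct <- length(unique(y1))
cat("n =", length(y1), " distinct =", n_distinct,
    " mean =", mean(y1), " Gmd =", round(gmd(y1), 3), "\n")
```

The recomputed values agree with the y1 block of the transcript (distinct 6, Mean 9.3, Gmd 4.867).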

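The transcript above cannot be reproduced exactly because no random seed was set. A reproducible variant might look like the sketch below. The seed value and the helper make_col are assumptions made for illustration. It also swaps round(runif()) for sample(), because rounding a uniform draw on [1, 20] gives the endpoints 1 and 20 only half the probability of the other integers.

```r
set.seed(2022)   # assumption: any fixed seed works; the original run did not set one

# Hypothetical helper: n integers drawn uniformly from 1..20
make_col <- function(n = 10) sample(1:20, n, replace = TRUE)

df <- data.frame(y1 = make_col(), y2 = make_col(), y3 = make_col())
df$y2[c(3, 8)] <- NA   # 3rd and 8th values of y2 set to missing

# Base-R checks before handing df to Hmisc::describe()
print(colSums(is.na(df)))
print(summary(df))
```

With the seed fixed, describe(df) returns the same report on every run, which makes the result easier to discuss in a lab write-up.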

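4. Design Experiment

Blood lead values were measured for 10 non-lead workers and 7 lead workers, as shown in the table below. Use the Wilcoxon rank-sum test to determine whether the blood lead values of the two groups differ.

[Table image: blood lead values of the two groups of workers]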

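# x: non-lead workers, y: lead workers (x has only 7 values, though the task states 10)
x <- c(24, 26, 29, 34, 43, 58, 63)
y <- c(82, 87, 97, 121, 164, 208, 213)
# One-sided test (H1: x lower than y); normal approximation, no continuity correction
wilcox.test(x, y, alternative = "less", exact = FALSE, correct = FALSE)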


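H0: the blood lead values of the two groups do not differ.

H1: the blood lead values of non-lead workers are lower than those of lead workers (the one-sided alternative tested by alternative = "less").

p = 0.0008726 < 0.05, so H0 is rejected at the 5% significance level: the blood lead values of the two groups differ, with non-lead workers having the lower values.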
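The reported p-value can be reproduced by hand, which shows what wilcox.test() computes with exact = FALSE and correct = FALSE. The sketch below uses the x and y values exactly as entered above. The names W, mu, sigma, z, and p_manual are illustrative, and the variance formula shown is the version without tie correction, which applies here because the combined sample has no ties.

```r
x <- c(24, 26, 29, 34, 43, 58, 63)
y <- c(82, 87, 97, 121, 164, 208, 213)
n1 <- length(x)
n2 <- length(y)

# W counts the (x, y) pairs with x > y; tied pairs would count one half each
W <- sum(outer(x, y, ">")) + 0.5 * sum(outer(x, y, "=="))

# Normal approximation under H0 (no ties, no continuity correction)
mu    <- n1 * n2 / 2
sigma <- sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
z     <- (W - mu) / sigma
p_manual <- pnorm(z)   # lower tail, matching alternative = "less"

res <- wilcox.test(x, y, alternative = "less", exact = FALSE, correct = FALSE)
cat("W =", W, " z =", round(z, 4), " p =", signif(p_manual, 4), "\n")
```

Every x value is below every y value, so W is 0, the smallest value it can take, which is why the p-value is so small.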

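5. Draw a radar chart from the table below; any radar chart style is acceptable.

[Table image: three evaluation metrics for seven algorithms; the values are entered in the code below]

The dataset gives three evaluation metrics (AE of Best, AE of Mean, AE of Worst) for the seven algorithms being compared.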

library(fmsb)

# One column per algorithm; rows are the three evaluation metrics
df <- data.frame(
  MS    = c(0.106, 0.160, 0.135),
  HLMS  = c(0.065, 0.177, 0.103),
  BIWOA = c(0.076, 0.110, 0.096),
  BMMVO = c(0.235, 0.293, 0.271),
  BSCA  = c(0.187, 0.248, 0.222),
  BHHA  = c(0.119, 0.169, 0.134),
  BSSA  = c(0.091, 0.129, 0.108),
  row.names = c("AE of Best", "AE of Mean", "AE of Worst")
)

# radarchart() reads row 1 as the axis maximum and row 2 as the axis minimum
df <- rbind(max = rep(0.3, ncol(df)), min = rep(0.05, ncol(df)), df)
cols <- c("#00AFBB", "#E7B800", "#3401c9")

# seg = 5 needs seg + 1 = 6 axis labels, running from the minimum (0.05) to the maximum (0.3)
radarchart(df, seg = 5,
           axistype = 1,
           pcol = cols,
           cglcol = "grey",
           plty = 1,
           plwd = 2,
           pty = c(16,18),
           cglty = 1,
           cglwd = 0.8,
           axislabcol = "grey",
           vlcex = 0.7,
           caxislabels = c(0.05, 0.1, 0.15, 0.2, 0.25, 0.3))
legend(
  x = "bottomleft", legend = rownames(df)[3:5], horiz = TRUE,
  bty = "n", pch = 20, col = cols,
  text.col = "black", cex = 1, pt.cex = 1.5
)

Output:

[Figure: radar chart of the three metrics for the seven algorithms]
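radarchart() from fmsb expects the first two rows of its data frame to hold the axis maximum and minimum. Rather than hard-coding those limits, they can be derived from the data, as in the sketch below. The helper add_radar_limits and its step argument are hypothetical names introduced here; the metric values are the ones from the example above.

```r
# Metric values from the example above (rows = metrics, columns = algorithms)
scores <- data.frame(
  MS    = c(0.106, 0.160, 0.135),
  HLMS  = c(0.065, 0.177, 0.103),
  BIWOA = c(0.076, 0.110, 0.096),
  BMMVO = c(0.235, 0.293, 0.271),
  BSCA  = c(0.187, 0.248, 0.222),
  BHHA  = c(0.119, 0.169, 0.134),
  BSSA  = c(0.091, 0.129, 0.108),
  row.names = c("AE of Best", "AE of Mean", "AE of Worst")
)

# Hypothetical helper: round the data range outward to multiples of `step`
# and prepend the limits as the first two rows (maximum first, then minimum)
add_radar_limits <- function(d, step = 0.05) {
  eps <- 1e-9   # guards against floating-point noise at exact multiples of step
  hi <- ceiling(max(d) / step - eps) * step
  lo <- floor(min(d) / step + eps) * step
  rbind(max = rep(hi, ncol(d)), min = rep(lo, ncol(d)), d)
}

radar_df <- add_radar_limits(scores)
print(radar_df)
```

For this data the helper yields limits of 0.3 and 0.05, so the result can be passed to radarchart() with seg = 5 to get one grid ring per 0.05 step.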
