NUAA Data Analysis and Mining: Homework 1


NUAA&XMU---朱林昊


Article link: https://blog.csdn.net/zhulinhao/article/details/112482392


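Key points:
Histogram: the range of the data is divided into several intervals (bins), and the length of each interval is the bin width. Each bar's height is relative frequency / bin width, so the total area of the histogram is 1.
A histogram can therefore be used to estimate the population probability density f(x). This is what fitting a distribution curve means.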
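The following minimal Python sketch shows the density scaling just described. It uses Python rather than the R of the appendix, and it is for illustration only. The function name `density_histogram` and the sample values are made up. The sketch assumes equal-width bins that are left-closed, whereas R's `hist()` defaults to right-closed bins. It also assumes that the data are not all equal. Multiplying each height by the bin width gives back the relative frequencies, so the bar areas sum to 1.

```python
def density_histogram(data, bins):
    """Equal-width histogram whose bar heights are relative frequency / bin width.

    Assumes max(data) > min(data). Bins are left-closed [a, b), except that the
    maximum value is put into the last bin.
    """
    lo, hi = min(data), max(data)
    width = (hi - lo) / bins  # bin width
    counts = [0] * bins
    for x in data:
        k = min(int((x - lo) / width), bins - 1)
        counts[k] += 1
    n = len(data)
    heights = [c / n / width for c in counts]  # relative frequency / bin width
    edges = [lo + k * width for k in range(bins + 1)]
    return edges, heights


sample = [121, 127, 128, 133, 135, 136, 137, 138, 139, 142, 144, 146, 149, 151]  # made-up data
edges, heights = density_histogram(sample, bins=5)
total_area = sum(h * (edges[k + 1] - edges[k]) for k, h in enumerate(heights))
print(round(total_area, 10))  # 1.0
```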
(4) Stem-and-leaf plot:
The decimal point is 1 digit(s) to the right of the |
12 | 03
12 | 67889
13 | 1122244
13 | 555677777888899999
14 | 0112222223344
14 | 5566677778999
15 | 01

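Key points:
Features of the stem-and-leaf plot:
① It shows the shape of the distribution directly. Most of the data lie between 130 and 150, with a peak between 135 and 140.
② It sorts the data automatically, so the order statistics of the original data can be read off.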
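The R output above uses a split-stem layout, printing each stem twice: once for leaves 0-4 and once for leaves 5-9. A simplified Python sketch of that layout follows. It does not reproduce R's `stem()` algorithm, which chooses the scale itself. The sketch assumes non-negative data with stem = tens digit and leaf = units digit, and `stem_and_leaf` is a hypothetical name.

```python
def stem_and_leaf(data, split=True):
    """Stem = tens digit, leaf = units digit. With split=True every stem gets two
    lines (leaves 0-4, then leaves 5-9), as in the R output above.
    Assumes non-negative values; they are rounded to integers first."""
    values = sorted(int(round(v)) for v in data)
    halves = [(0, 4), (5, 9)] if split else [(0, 9)]
    lines = []
    for stem in range(values[0] // 10, values[-1] // 10 + 1):
        for low, high in halves:
            leaves = "".join(str(v % 10) for v in values
                             if v // 10 == stem and low <= v % 10 <= high)
            lines.append(f"{stem:>3} | {leaves}")
    return lines


for line in stem_and_leaf([120, 123, 126, 127, 128, 128, 129, 131, 134]):
    print(line)
```

Reading the leaves from top to bottom gives the sorted sample. This is why the order statistics can be read straight off the plot.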

(5) Shapiro-Wilk (W) test for normality:
Shapiro-Wilk normality test
data: data1.1
W = 0.97157, p-value = 0.1741
p-value = 0.1741 > α = 0.05
Do not reject H0. That is, the data can be regarded as normally distributed.

(2)
All residents:

Rural residents:

Urban residents:

(3)
All residents:
The decimal point is 3 digit(s) to the right of the |

2 | 958
4 | 0122234567891145578
6 | 908
8 | 3
10 | 682
12 |
14 |
16 | 8
18 |
20 | 9

Rural residents:
The decimal point is 3 digit(s) to the right of the |

1 | 689
2 | 01123444566778888
3 | 0155
4 | 2389
5 |
6 | 3
7 | 7
8 |
9 |
10 | 1

Urban residents:
The decimal point is 3 digit(s) to the right of the |

6 | 3457999
8 | 0122356790000045
10 | 257
12 | 6
14 | 99
16 |
18 | 5
20 |
22 | 3

III. Exercise 1.8
(1)
Covariance matrix:
           V1          V2         V3         V4        V5         V6
V1  609.62105   68.800000 -65.115789 -50.863158 -758.2211 -342.42105
V2   68.80000   10.252632  -8.147368  -9.347368 -129.2526  -32.78947
V3  -65.11579   -8.147368  51.989474   5.742105  100.2789   32.78947
V4  -50.86316   -9.347368   5.742105  27.944737  229.5711  142.97368
V5 -758.22105 -129.252632 100.278947 229.571053 3932.4500 2004.60526
V6 -342.42105  -32.789474  32.789474 142.973684 2004.6053 2609.84211

Pearson correlation matrix:
         V1       V2       V3       V4       V5       V6
V1  1.00000  0.87024 -0.36576 -0.38969 -0.48970 -0.27147
V2  0.87024  1.00000 -0.35289 -0.55223 -0.64371 -0.20045
V3 -0.36576 -0.35289  1.00000  0.15065  0.22178  0.08902
V4 -0.38969 -0.55223  0.15065  1.00000  0.69253  0.52942
V5 -0.48970 -0.64371  0.22178  0.69253  1.00000  0.62573
V6 -0.27147 -0.20045  0.08902  0.52942  0.62573  1.00000
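Formula (sample Pearson correlation between variables j and k, where s_{jk} are the entries of the covariance matrix):
$$r_{jk}=\frac{s_{jk}}{\sqrt{s_{jj}\,s_{kk}}}=\frac{\sum_{i=1}^{n}(x_{ij}-\bar{x}_j)(x_{ik}-\bar{x}_k)}{\sqrt{\sum_{i=1}^{n}(x_{ij}-\bar{x}_j)^2\,\sum_{i=1}^{n}(x_{ik}-\bar{x}_k)^2}}$$
For example, r_{12} = 68.8 / √(609.62105 × 10.252632) ≈ 0.87024.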
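The following rough Python sketch shows how the Pearson matrix relates to the covariance matrix printed above. The helper names are hypothetical, and this is not the psych/R code used in the appendix. The covariance uses divisor n - 1, like R's `cov()`. Each correlation is then the covariance divided by the two standard deviations. The check reuses the V1/V2 entries of the printed covariance matrix.

```python
import math


def covariance_matrix(columns):
    """Sample covariance matrix with divisor n - 1 (as R's cov()); columns[j] holds variable j."""
    n = len(columns[0])
    means = [sum(col) / n for col in columns]
    p = len(columns)
    return [[sum((columns[j][i] - means[j]) * (columns[k][i] - means[k]) for i in range(n)) / (n - 1)
             for k in range(p)]
            for j in range(p)]


def cov_to_cor(cov):
    """Pearson correlation matrix: r_jk = s_jk / sqrt(s_jj * s_kk)."""
    p = len(cov)
    return [[cov[j][k] / math.sqrt(cov[j][j] * cov[k][k]) for k in range(p)] for j in range(p)]


# V1/V2 block of the covariance matrix printed above
s = [[609.62105, 68.8], [68.8, 10.252632]]
print(round(cov_to_cor(s)[0][1], 5))  # 0.87024, the V1-V2 entry of the Pearson matrix
```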
(2)
Spearman correlation matrix:
         V1       V2       V3       V4       V5       V6
V1  1.00000  0.81423 -0.37070 -0.38020 -0.55673 -0.29525
V2  0.81423  1.00000 -0.23770 -0.54190 -0.71228 -0.19491
V3 -0.37070 -0.23770  1.00000  0.13662  0.17091  0.22605
V4 -0.38020 -0.54190  0.13662  1.00000  0.64973  0.37113
V5 -0.55673 -0.71228  0.17091  0.64973  1.00000  0.62103
V6 -0.29525 -0.19491  0.22605  0.37113  0.62103  1.00000
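Formula:
① Spearman correlation matrix: each entry is the Pearson correlation computed on the ranks of the two variables, with tied values receiving average ranks. When there are no ties, this reduces to
$$r_s=1-\frac{6\sum_{i=1}^{n}d_i^2}{n(n^2-1)},$$
where d_i is the difference between the ranks of the i-th observation on the two variables.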
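The following minimal Python sketch computes the Spearman coefficient as the Pearson correlation of ranks. Tied values get average ranks, like the default of R's `rank()`. The names are hypothetical, and the data are a made-up toy example without ties. For such data, the shortcut 1 - 6Σd²/(n(n²-1)) gives the same value.

```python
import math


def midranks(values):
    """1-based ranks; tied values get the average of their positions (R's rank() default)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j + 2) / 2  # average of positions i+1 .. j+1
        i = j + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman correlation = Pearson correlation of the (mid)ranks."""
    return pearson(midranks(x), midranks(y))


x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
d2 = sum((a - b) ** 2 for a, b in zip(midranks(x), midranks(y)))
n = len(x)
print(spearman(x, y), 1 - 6 * d2 / (n * (n * n - 1)))  # both 0.8 (up to rounding) without ties
```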

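(3) Correlation analysis between the indicators
p-values of the Spearman correlations (from corr.test):
        V1      V2      V3      V4      V5      V6
V1 0.00000 0.00019 0.88390 0.88390 0.11861 1.00000
V2 0.00001 0.00000 1.00000 0.13585 0.00596 1.00000
V3 0.10762 0.31291 0.00000 1.00000 1.00000 1.00000
V4 0.09821 0.01358 0.56573 0.00000 0.02510 0.88390
V5 0.01078 0.00043 0.47126 0.00193 0.00000 0.04169
V6 0.20631 0.41022 0.33790 0.10718 0.00347 0.00000
# Below the diagonal: raw p-values. Above the diagonal: p-values adjusted for multiple testing (Holm's method, the corr.test default).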
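The upper triangle can be reproduced from the lower one. By default, `psych::corr.test()` adjusts for multiple testing with Holm's method. A minimal Python sketch of that adjustment follows (the helper name is hypothetical). It takes as input the 15 raw p-values below the diagonal. The raw values are printed to only 5 decimals, so the results agree with the upper triangle only up to rounding. For example, 13 × 0.00193 = 0.02509, versus the printed 0.02510.

```python
def holm_adjust(pvalues):
    """Holm step-down adjustment: sort ascending, multiply the i-th smallest
    (i = 0, 1, ...) by (m - i), take a running maximum and cap at 1."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        running_max = max(running_max, min(1.0, (m - rank) * pvalues[idx]))
        adjusted[idx] = running_max
    return adjusted


# Raw p-values below the diagonal, read row by row (V2-V1, V3-V1, V3-V2, V4-V1, ...)
raw = [0.00001, 0.10762, 0.31291, 0.09821, 0.01358, 0.56573, 0.01078, 0.00043,
       0.47126, 0.00193, 0.20631, 0.41022, 0.33790, 0.10718, 0.00347]
for p, q in zip(raw, holm_adjust(raw)):
    print(f"{p:.5f} -> {q:.5f}")
```

Because of the running maximum, V3-V1 (raw 0.10762) inherits the adjusted value 9 × 0.09821 ≈ 0.8839 from V4-V1. This matches the 0.88390 shown above the diagonal.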
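MATLAB results:
The scatter plots show the rough relationships between the variables. If the data were preprocessed first by removing poor-quality observations (for example, outliers), the correlations would probably be clearer. Some Python packages can do this preprocessing quickly. However, the undergraduate syllabus covers only MATLAB, so MATLAB is used for the demonstration here.
Only the correlations between the physiological indicators (weight, waist, pulse) and the training indicators (chin-ups, sit-ups, jumps) are plotted. In my view, the correlations between these two groups are more worth studying than the correlations within each group.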
[Figure: MATLAB scatter plots of the physiological indicators against the training indicators]
IV. Exercise 2.4
Wilcoxon rank sum test with continuity correction
data: new and ori
W = 987, p-value = 0.06029
alternative hypothesis: true location shift is not equal to 0
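p-value = 0.06029 < α = 0.10
Reject H0. That is, the special psychological counselling method is judged to be significantly more effective than the ordinary method. The test above is two-sided, so the direction of the effect comes from comparing the two groups.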

[Figure: Wilcoxon analysis]
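The normal approximation behind `wilcox.test(..., exact = FALSE)` can be sketched in Python as follows. The sketch is modelled on the standard textbook formulas, not copied from R's source. It assumes a two-sided test and applies the tie correction to the variance. The tie correction matters here because the scores 1 to 4 produce heavy ties. The 0.5 continuity correction is optional. The function names and toy data are hypothetical.

```python
import math
from collections import Counter


def midranks(values):
    """1-based ranks; tied values get the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j + 2) / 2
        i = j + 1
    return ranks


def rank_sum_test(x, y, correct=True):
    """Two-sided Wilcoxon rank-sum test (normal approximation, tie-corrected variance).
    Returns W = rank sum of x minus n1(n1+1)/2 (the statistic R reports), z and p."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = list(x) + list(y)
    ranks = midranks(pooled)
    w = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    tie_sum = sum(t ** 3 - t for t in Counter(pooled).values())
    sigma = math.sqrt(n1 * n2 / 12 * ((n + 1) - tie_sum / (n * (n - 1))))
    dev = w - n1 * n2 / 2
    if correct and dev != 0:
        dev -= math.copysign(0.5, dev)  # continuity correction toward 0
    z = dev / sigma
    p = math.erfc(abs(z) / math.sqrt(2))  # = 2 * (1 - Phi(|z|))
    return w, z, p


print(rank_sum_test([1, 2, 3], [4, 5, 6]))
```

For this toy input, W = 0 and p ≈ 0.081 with the continuity correction, and p ≈ 0.050 without it.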

V. Exercise 2.10
(1) Two-sided sign test:
Exact Sign Test
data: y by x (pos, neg)
stratified by block
Z = 1, p-value = 0.5078
alternative hypothesis: true mu is not equal to 0
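p-value = 0.5078 > α = 0.05
Do not reject H0. That is, the two catalysts have about the same effect on the yield of this chemical product, and the difference is not significant.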
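Under H0, the number of positive differences follows Bin(n, 1/2). The exact two-sided p-value can therefore be sketched with binomial coefficients, as below. The function name is hypothetical, and zero differences are assumed to have been removed already. The counts 6 and 3 are inferred, not taken from the data file. With 9 pairs, they are the split that reproduces Z = 1 and p = 0.5078 in the output above.

```python
from math import comb


def sign_test_exact(n_pos, n_neg):
    """Two-sided exact sign test; zero differences are assumed to be dropped already."""
    n = n_pos + n_neg
    k = min(n_pos, n_neg)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n  # P(K <= k) under Bin(n, 1/2)
    return min(1.0, 2 * tail)


print(sign_test_exact(6, 3))  # 0.5078125
```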
(2) Wilcoxon signed-rank test
Wilcoxon signed rank exact test
data: Diff
V = 30, p-value = 0.4258
alternative hypothesis: true location is not equal to 0
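p-value = 0.4258 > α = 0.05
Do not reject H0. That is, the two catalysts have about the same effect on the yield of this chemical product, and the difference is not significant.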
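Under H0, every subset of the ranks 1..n is equally likely to be the set of ranks carried by positive differences. The exact null distribution of V can therefore be counted with a small dynamic program. The Python sketch below assumes no zero differences and no tied |differences|. The function names are hypothetical.

```python
def signrank_counts(n):
    """counts[v] = number of subsets of {1, ..., n} whose sum is v."""
    max_sum = n * (n + 1) // 2
    counts = [1] + [0] * max_sum
    for r in range(1, n + 1):
        for v in range(max_sum, r - 1, -1):
            counts[v] += counts[v - r]
    return counts


def signed_rank_statistic(diffs):
    """V = sum of the ranks of |d| over the positive differences (zeros dropped)."""
    d = [x for x in diffs if x != 0]
    if len({abs(x) for x in d}) != len(d):
        raise ValueError("tied |differences|: the exact distribution below does not apply")
    order = sorted(d, key=abs)
    return sum(rank for rank, x in enumerate(order, start=1) if x > 0), len(d)


def signed_rank_exact_p(v, n):
    """Two-sided exact p-value: twice the smaller tail, capped at 1."""
    counts = signrank_counts(n)
    total = 2 ** n
    lower = sum(counts[: v + 1]) / total  # P(V <= v)
    upper = sum(counts[v:]) / total       # P(V >= v)
    return min(1.0, 2 * min(lower, upper))


print(signed_rank_exact_p(30, 9))  # 0.42578125
```

With n = 9 pairs and V = 30, the sketch gives 218/512 ≈ 0.4258, the p-value in the R output above.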
(3)
① Normal approximation to the two-sided sign test:
[Table: normal approximation to the sign test]
From the table above we obtain:
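The normal approximation replaces Bin(n, 1/2) by N(n/2, n/4), optionally with a 0.5 continuity correction. A hedged Python sketch follows (the function name is hypothetical). It uses 6 positive and 3 negative differences, counts inferred from the sign-test output above (9 pairs, Z = 1, p = 0.5078). The original table is not available, so these numbers are not claimed to match it.

```python
import math


def sign_test_normal(n_pos, n_neg, continuity=True):
    """Two-sided sign test via K ~ N(n/2, n/4), optionally with a 0.5 continuity correction."""
    n = n_pos + n_neg
    k = min(n_pos, n_neg)
    diff = abs(k - n / 2)
    if continuity:
        diff = max(diff - 0.5, 0.0)
    z = diff / math.sqrt(n / 4)
    p = math.erfc(z / math.sqrt(2))  # two-sided: 2 * (1 - Phi(z))
    return z, p


print(sign_test_normal(6, 3, continuity=False))
print(sign_test_normal(6, 3))
```

Without the correction, z = 1, which is the Z that coin prints above. With the correction, z ≈ 0.667 and p ≈ 0.505, close to the exact 0.5078.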

VI. Exercise 2.12
Kruskal-Wallis rank sum test
data: x
Kruskal-Wallis chi-squared = 1.6429, df = 2, p-value = 0.4398
p-value = 0.4398 > α = 0.05
Do not reject H0. That is, the weights of sugar packed by the three packaging lines do not differ significantly.

[Figure: Kruskal-Wallis analysis]
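The Python sketch below computes the Kruskal-Wallis statistic H = 12/(N(N+1)) Σ R_i²/n_i - 3(N+1), with the usual tie correction, and compares it with χ²(k-1). To stay dependency-free, it evaluates the chi-square tail in closed form for integer degrees of freedom. For df = 2 the tail is simply exp(-H/2), and exp(-1.6429/2) ≈ 0.4398 matches the output above. The helper names are hypothetical, and the group data are made up.

```python
import math


def midranks(values):
    """1-based ranks; tied values get the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j + 2) / 2
        i = j + 1
    return ranks


def chi2_sf(x, df):
    """P(X > x) for a chi-square variable with integer df (closed form)."""
    if df % 2 == 0:
        term = total = math.exp(-x / 2)
        for j in range(1, df // 2):
            term *= (x / 2) / j
            total += term
        return total
    total = math.erfc(math.sqrt(x / 2))
    term = math.sqrt(2 * x / math.pi) * math.exp(-x / 2)
    for j in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * j + 1)
    return total


def kruskal_wallis(groups):
    pooled = [v for g in groups for v in g]
    ranks = midranks(pooled)
    n_total = len(pooled)
    h, start = 0.0, 0
    for g in groups:
        r = sum(ranks[start:start + len(g)])  # rank sum of this group
        h += r * r / len(g)
        start += len(g)
    h = 12 / (n_total * (n_total + 1)) * h - 3 * (n_total + 1)
    ties = {}
    for v in pooled:
        ties[v] = ties.get(v, 0) + 1
    h /= 1 - sum(t ** 3 - t for t in ties.values()) / (n_total ** 3 - n_total)  # tie correction
    return h, chi2_sf(h, len(groups) - 1)


print(kruskal_wallis([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))
```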

VII. Exercise 2.17
(1) Friedman test

Asymptotic Friedman Test
data: y by x (A, B, C, D)
stratified by block
chi-squared = 3.5143, df = 3, p-value = 0.3189
p-value = 0.3189 > α = 0.05
Do not reject H0. That is, the four drugs can be regarded as equally effective against cough.

[Figure: Friedman analysis]
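The R code in the appendix computes Q = 12/(N S (S+1)) Σ R_i² - 3N(S+1), ranking within each block (column). That statistic can be sketched in Python as below. Like that code, the sketch has no tie correction, whereas R's `friedman.test()` applies one. The chi-square tail is the same closed form as for Kruskal-Wallis. The names and data are hypothetical. As a check, chi2_sf(3.5143, 3) ≈ 0.3189 agrees with the output above.

```python
import math


def midranks(values):
    """1-based ranks; tied values get the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j + 2) / 2
        i = j + 1
    return ranks


def chi2_sf(x, df):
    """P(X > x) for a chi-square variable with integer df (closed form)."""
    if df % 2 == 0:
        term = total = math.exp(-x / 2)
        for j in range(1, df // 2):
            term *= (x / 2) / j
            total += term
        return total
    total = math.erfc(math.sqrt(x / 2))
    term = math.sqrt(2 * x / math.pi) * math.exp(-x / 2)
    for j in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * j + 1)
    return total


def friedman(data):
    """data[i][j]: treatment i (row), block j (column); ranks are taken within each block."""
    s, n = len(data), len(data[0])
    rank_sums = [0.0] * s
    for j in range(n):
        for i, r in enumerate(midranks([data[i][j] for i in range(s)])):
            rank_sums[i] += r
    q = 12 / (n * s * (s + 1)) * sum(r * r for r in rank_sums) - 3 * n * (s + 1)
    return q, chi2_sf(q, s - 1)


print(friedman([[1, 1, 1], [2, 2, 2], [3, 3, 3], [4, 4, 4]]))
```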
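(2) Improved Friedman test (aligned ranks)
"Improved Friedman test p-value = 0.495428125102509"
p-value = 0.495428125102509 > α = 0.05
Do not reject H0. That is, the four drugs can be regarded as equally effective against cough.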

[Figure: improved Friedman test analysis]
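The appendix code computes the improved statistic in three steps: it centres each block (column), ranks all S × N aligned values together, and then compares the treatment mean ranks. The Python sketch below mirrors those steps. The names are hypothetical, and the 2 × 2 data are made up and small enough to check by hand.

```python
import math


def midranks(values):
    """1-based ranks; tied values get the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j + 2) / 2
        i = j + 1
    return ranks


def chi2_sf(x, df):
    """P(X > x) for a chi-square variable with integer df (closed form)."""
    if df % 2 == 0:
        term = total = math.exp(-x / 2)
        for j in range(1, df // 2):
            term *= (x / 2) / j
            total += term
        return total
    total = math.erfc(math.sqrt(x / 2))
    term = math.sqrt(2 * x / math.pi) * math.exp(-x / 2)
    for j in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * j + 1)
    return total


def aligned_rank_friedman(data):
    """data[i][j]: treatment i (row), block j (column)."""
    s, n = len(data), len(data[0])
    # 1. align: subtract each block (column) mean
    col_means = [sum(data[i][j] for i in range(s)) / s for j in range(n)]
    flat = [data[i][j] - col_means[j] for i in range(s) for j in range(n)]
    # 2. rank all aligned values jointly and reshape to S x N
    flat_ranks = midranks(flat)
    r = [[flat_ranks[i * n + j] for j in range(n)] for i in range(s)]
    # 3. treatment mean ranks versus block-centred rank variation
    row_means = [sum(r[i]) / n for i in range(s)]
    rank_col_means = [sum(r[i][j] for i in range(s)) / s for j in range(n)]
    denom = sum((r[i][j] - rank_col_means[j]) ** 2 for i in range(s) for j in range(n))
    numer = sum((m - (s * n + 1) / 2) ** 2 for m in row_means)
    q = n * n * (s - 1) / denom * numer
    return q, chi2_sf(q, s - 1)


print(aligned_rank_friedman([[1, 3], [2, 5]]))
```

For the 2 × 2 example, the aligned ranks are [[2, 1], [3, 4]], which gives Q̃ = 2² × 1 / 5 × 2 = 1.6.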

VIII. Appendix: Source Code
1. homework1.1.R

## Read the data ##
library(psych)  # needed for describe()
data1.1=scan("homework1.1.txt");
mean(data1.1)    # mean
var(data1.1)     # variance
sd(data1.1)      # standard deviation
(CV=sd(data1.1)/mean(data1.1))  # coefficient of variation
describe(data1.1)[11]  # skewness (skew)
describe(data1.1)[12]  # kurtosis
median(data1.1)  # median
Q3=quantile(data1.1,0.75) # upper quartile
Q1=quantile(data1.1,0.25) # lower quartile
R1 <- Q3-Q1   # interquartile range
M_data=0.25*Q3+0.25*Q1+0.5*median(data1.1); # trimean
x=data1.1
hist(x,col="light blue",freq = FALSE);  # density histogram (total area 1)
## Stem-and-leaf plot ##
stem(data1.1)
shapiro.test(data1.1)  ## Shapiro-Wilk W test for normality
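For reference, here is a Python sketch of the summary statistics computed in homework1.1.R. It assumes R's default quantile definition (type 7, linear interpolation at position 1 + (n - 1)p) and the n - 1 divisor of `var()`/`sd()`. Skewness and kurtosis from `psych::describe()` are left out. The names are hypothetical.

```python
import math


def quantile_type7(sorted_values, p):
    """R's default quantile() (type 7): linear interpolation between order statistics."""
    h = (len(sorted_values) - 1) * p
    lo = math.floor(h)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (h - lo) * (sorted_values[hi] - sorted_values[lo])


def summary_stats(data):
    xs = sorted(data)
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / (n - 1)  # same divisor as R's var()
    sd = math.sqrt(var)
    q1, med, q3 = (quantile_type7(xs, p) for p in (0.25, 0.5, 0.75))
    return {
        "mean": mean, "var": var, "sd": sd, "cv": sd / mean,
        "median": med, "Q1": q1, "Q3": q3, "IQR": q3 - q1,
        "trimean": 0.25 * q1 + 0.5 * med + 0.25 * q3,
    }


print(summary_stats(range(1, 11)))
```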

2. homework1.4.R

## Read the data ##
library(readxl)
data1.4 <- read_excel("1.4.xlsx")
summary(data1.4)
library(psych);
describe(data1.4)
var(data1.4[4])
x1=scan("homework1.41.txt");
x2=scan("homework1.42.txt");
x3=scan("homework1.43.txt");
hist(x1,col="light blue",freq = FALSE)
hist(x2,col="light blue",freq = FALSE)
hist(x3,col="light blue",freq = FALSE)
## Stem-and-leaf plots ##
stem(x1)
stem(x2)
stem(x3)

3. homework1.8.R

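# Read the data
data1.8=read.table("homework1.8.txt");
# Covariance matrix
cov(data1.8)
library(psych);
# corr.test() from the psych package computes the correlation matrix and its p-values

# Pearson correlation matrix
Person1.8 <- corr.test(data1.8);
round(Person1.8$r,5)
round(Person1.8$p,5)  # below the diagonal: raw p-values; above: adjusted for multiple tests

# Spearman correlation matrix
Spearman1.8=corr.test(data1.8,method = "spearman");
round(Spearman1.8$r,5)
round(Spearman1.8$p,5)  # below the diagonal: raw p-values; above: adjusted for multiple tests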

MATLAB code for 1.8:

clear;clc;
A=[191 36 50 5 162 60
189 37 52 2 110 60
193 38 58 12 101 101
162 35 62 12 101 101
189 35 46 13 155 58
182 36 56 4 101 42
211 38 56 8 101 38
167 34 60 6 125 40
176 31 74 15 200 40
154 33 56 17 251 250
169 34 50 17 120 38
166 33 52 13 210 115
154 34 64 14 215 105
247 46 50 1 50 50
193 36 46 6 70 31
202 37 62 12 210 120
176 37 54 4 60 25
157 32 52 11 230 80
156 33 54 15 225 73
138 33 68 2 110 43];
% Scatter plots of the physiological indicators (columns 1-3: weight, waist, pulse)
% against the training indicators (columns 4-6: chin-ups, sit-ups, jumps)
subplot(3,3,1);
plot(A(:,1),A(:,4),'b*');
xlabel('Weight');
ylabel('Chin-ups');
subplot(3,3,2);
plot(A(:,1),A(:,5),'g*');
xlabel('Weight');
ylabel('Sit-ups');
subplot(3,3,3);
plot(A(:,1),A(:,6),'r*');
xlabel('Weight');
ylabel('Jumps');
subplot(3,3,4);
plot(A(:,2),A(:,4),'c*');
xlabel('Waist');
ylabel('Chin-ups');
subplot(3,3,5);
plot(A(:,2),A(:,5),'m*');
xlabel('Waist');
ylabel('Sit-ups');
subplot(3,3,6);
plot(A(:,2),A(:,6),'y*');
xlabel('Waist');
ylabel('Jumps');
subplot(3,3,7);
plot(A(:,3),A(:,4),'k*');
xlabel('Pulse');
ylabel('Chin-ups');
subplot(3,3,8);
plot(A(:,3),A(:,5),'b*');   % pulse is column 3
xlabel('Pulse');
ylabel('Sit-ups');
subplot(3,3,9);
plot(A(:,3),A(:,6),'g*');   % pulse is column 3
xlabel('Pulse');
ylabel('Jumps');

4. homework2.4.R

data2.4=scan("homework2.4.txt");
new1 <- data2.4[1:4];
new1
ori1 <- data2.4[5:8];
ori1
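new=rep(c(1,2,3,4),new1);  # expand the frequency counts into individual scores 1-4
new
ori=rep(c(1,2,3,4),ori1);
ori
# Wilcoxon rank-sum test, normal approximation without continuity correction
wilcox.test(new,ori,exact = FALSE, correct=FALSE)
wilcox.test(new,ori,exact = FALSE)  # with continuity correction (the default)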

5. homework2.10.R

data2.10=scan("homework2.10.txt");
(A <- data2.10[1:9])
(B <- data2.10[10:18])
library(coin)
sign_test(A~B,distribution = "exact")  # exact two-sided sign test for paired data

(Diff=A-B);
# Wilcoxon signed-rank test
wilcox.test(Diff)

# Method 2: Wilcoxon signed-rank test with the coin package
#install.packages("coin")
library(coin)
wilcoxsign_test(A~B)

6. homework2.12.R
data2.12=scan("homework2.12.txt");
(A=data2.12[1:5])
(B=data2.12[6:10])
(C=data2.12[11:14])
(x<-list(A,B,C))

# Kruskal-Wallis rank-sum test
kruskal.test(x)

7. homework2.17.R

# Example 2.13: Friedman test (no ties)
data2.17=scan("homework2.17.txt");
(data217=matrix(data2.17,4,7,byrow = T))  # rows: 4 drugs (treatments); columns: 7 blocks
rankdata217=matrix(0,4,7);  # initialise the rank matrix
for (i in 1:7) rankdata217[,i]=rank(data217[,i]);  # rank within each column (block)
S=4;N=7;Rdot=1:4;NS1=N*(S+1);
for (i in 1:4) Rdot[i]=sum(rankdata217[i,]);  # rank sum of each treatment
Rtotal=sum(Rdot^2);

# Friedman test statistic
(Q=12*Rtotal/(S*NS1)-3*NS1)

# p-value of the Friedman test
(P=1-pchisq(Q,S-1))
paste("Friedman test p-value =",P)
P<0.05  # compare the p-value with alpha = 0.05

# Friedman test with the coin package
#install.packages("coin")
(data217_friedman<- data.frame(y=data2.17,x=factor(rep(c("A","B","C","D"),each=7))))
library(coin)
friedman_test(y~x,data=data217_friedman)

(data217_1=scale(data217,scale = F))  # centre each column (block)
(data217_2=as.vector(data217_1))  # flatten the matrix into a vector
(rankdata217_2=rank(data217_2))  # rank the whole vector
(R_tilde=matrix(rankdata217_2,4,7))  # reshape back into a rank matrix
S=4;N=7;Ridot=0;
for (i in 1:4) Ridot[i]=mean(R_tilde[i,]);  # mean rank of each row (treatment)
Ridot
R_tilde_CEN=scale(R_tilde,scale = F);  # centre the rank matrix by column
R_tilde_CEN_vector=as.vector(R_tilde_CEN);  # flatten
# Improved Friedman test statistic
(Q_tilde=N^2*(S-1)/sum(R_tilde_CEN_vector^2)*sum((Ridot-(S*N+1)/2)^2))
#Rdotj=0;
#for (j in 1:8) Rdotj[j]=mean(R_tilde[,j]);  # column mean ranks

# p-value of the improved Friedman test
(P=1-pchisq(Q_tilde,S-1))
paste("Improved Friedman test p-value =",P)
P<0.05  # compare the p-value with alpha = 0.05
