BIOS14:Statistic Exercise 3

最新推荐文章于 2024-02-23 10:15:00 发布

Chix1996

最新推荐文章于 2024-02-23 10:15:00 发布

阅读量229

点赞数

分类专栏： R 生信

本文链接：https://blog.csdn.net/qq_43063824/article/details/104272854

版权

生信同时被 2 个专栏收录

14 篇文章 0 订阅

订阅专栏

8 篇文章 0 订阅

订阅专栏

a) Is there a difference between the sexes with respect to total tree-felling? (3p)

Firstly, I imported the data and checked. Then I calculated the total tree-felling data. According to the question, plot the data with boxplot. And I made the assumption that there is no difference between sexes with respect to total tree-felling. And I would use two-sample test to test this.

elefell = read.csv('elefell.csv')
elefell$total = elefell$t1+elefell$t2+elefell$t3+elefell$t4
boxplot(total~sex, data = elefell)

在这里插入图片描述

Test the assumptions of a two-sample t-test:

# Normality of data within each group
elemale = subset(elefell, elefell$sex == 'male')
elefemale = subset(elefell, elefell$sex == 'female')
x = elemale$total
h = hist(x)
xfit = seq(min(x, na.rm=TRUE),max(x, na.rm=TRUE),length=100)
yfit = dnorm(xfit,mean=mean(x, na.rm=TRUE),sd=sd(x, na.rm=TRUE))
yfit = yfit*max(h$counts)/max(yfit)
hist(elemale$total, col="red")
lines(xfit, yfit, lwd=2)

在这里插入图片描述

> shapiro.test(elemale$total)

	Shapiro-Wilk normality test

data:  elemale$total
W = 0.98058, p-value = 0.9414

> x = elemale$total
> y = pnorm(summary(x), mean =mean(x, na.rm=TRUE), sd =sd(x, na.rm=TRUE))
> ks.test(x, y)

	Two-sample Kolmogorov-Smirnov test

data:  x and y
D = 1, p-value = 8.687e-06
alternative hypothesis: two-sided

> qqnorm(elemale$total)
> qqline(elemale$total)

在这里插入图片描述

Shapiro-Wilks test is not significant, so we can accept the data as being normal.

x = elefemale$total
h = hist(x)
xfit = seq(min(x, na.rm=TRUE),max(x, na.rm=TRUE),length=100)
yfit = dnorm(xfit,mean=mean(x, na.rm=TRUE),sd=sd(x, na.rm=TRUE))
yfit = yfit*max(h$counts)/max(yfit)
hist(elefemale$total, col="red")
lines(xfit, yfit, lwd=2)

在这里插入图片描述

> shapiro.test(elefemale$total)

	Shapiro-Wilk normality test

data:  elefemale$total
W = 0.93097, p-value = 0.1612

> x = elefemale$total
> y = pnorm(summary(x), mean =mean(x, na.rm=TRUE), sd =sd(x, na.rm=TRUE))
> ks.test(x, y)

	Two-sample Kolmogorov-Smirnov test

data:  x and y
D = 1, p-value = 2.252e-06
alternative hypothesis: two-sided

> qqnorm(elefemale$total)
> qqline(elefemale$total)

在这里插入图片描述

qq-plot looks so-so, Shapiro-Wilks test is not significant, normality is fulfilled.

# Homogeneity of variances
> leveneTest(total~sex, data = elefell)
Levene's Test for Homogeneity of Variance (center = median)
      Df F value Pr(>F)
group  1  0.0596 0.8085
      38

not significantly different.

#Carry out two-sample t-test
> t.test(total~sex, data = elefell, var.equal=TRUE)

	Two Sample t-test

data:  total by sex
t = -5.2142, df = 38, p-value = 6.787e-06
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -2.2025661 -0.9705997
sample estimates:
mean in group female   mean in group male 
            10.53565             12.12224

P-value is smaller than 0.05, accept the alternative hypothesis, the total tree-felling is different between sex.

Illustrate result of t-test

#Barplot of mean with error bars for standard error
mean.total = aggregate(list(total=elefell$total), by=list(sex=elefell$sex), mean, na.rm=TRUE)
SE = function(x) sd(x, na.rm=TRUE)/(sqrt(sum(!is.na(x))))
se.total = aggregate(list(SE=elefell$total), by=list(sex=elefell$sex), FUN=SE)
fell.dat = cbind(mean.total, SE=se.total$SE)
bp = barplot(total~sex, data=fell.dat, ylim = c(0, 13), col="red", names=c("female elephant", "male elephant"))
arrows(bp, fell.dat$total-fell.dat$SE, bp, fell.dat$total+fell.dat$SE, code=3, angle=90)

在这里插入图片描述

b) We can imagine that elephants might be more active at certain times of the day. Is there a difference in tree felling rates depending on time of day, and if so, is the relationship the same for both sexes? (5p)

According to the question, I use repeated measure here.

#Import data and check
elefell = read.csv('elefell.csv')
# plot data
par(mfrow=c(1,4))
boxplot(t1~sex, data = elefell)
boxplot(t2~sex, data = elefell)
boxplot(t3~sex, data = elefell)
boxplot(t4~sex, data = elefell)
par(mfrow=c(1,1))

在这里插入图片描述

#Construct Anova() model
fit = lm(cbind(t1,t2,t3,t4)~sex, data = elefell)
times = factor(c('t1','t2','t3','t4'))
fit.rep = Anova(fit, idata = data.frame(times), idesign=~times, type=3)

> #Check assumptions
> #Independent measurements
> dwtest(fit)

	Durbin-Watson test

data:  fit
DW = 2.2451, p-value = 0.7333
alternative hypothesis: true autocorrelation is greater than 0
# fullfiled

#Multivariate normality
> mvn(data = elefell[,2:5], mvnTest = 'mardia')
$multivariateNormality
             Test         Statistic           p value Result
1 Mardia Skewness  24.6357510253147 0.215723130397143    YES
2 Mardia Kurtosis 0.511592900269535 0.608935955379645    YES
3             MVN              <NA>              <NA>    YES

$univariateNormality
          Test  Variable Statistic   p value Normality
1 Shapiro-Wilk    t1        0.9341    0.0219    NO    
2 Shapiro-Wilk    t2        0.9502    0.0769    YES   
3 Shapiro-Wilk    t3        0.9375    0.0284    NO    
4 Shapiro-Wilk    t4        0.8806    0.0005    NO    

$Descriptives
    n     Mean   Std.Dev   Median       Min      Max     25th     75th        Skew   Kurtosis
t1 40 2.781751 1.3011629 2.761282 0.8136861 5.331051 1.651604 3.916465  0.04958455 -1.3456845
t2 40 2.742962 0.6978275 2.942996 1.0766608 3.924595 2.194925 3.264789 -0.39371416 -0.9033891
t3 40 2.668648 0.9611272 2.701272 1.2224455 4.917754 1.696224 3.487321  0.14425076 -1.1321676
t4 40 3.135586 1.6546084 2.732309 0.3083314 5.560109 1.687355 4.635793  0.01601791 -1.6852439

> mvn(data = elefell[,2:5], mvnTest = 'hz')
$multivariateNormality
           Test       HZ     p value MVN
1 Henze-Zirkler 1.099816 0.002596917  NO

$univariateNormality
          Test  Variable Statistic   p value Normality
1 Shapiro-Wilk    t1        0.9341    0.0219    NO    
2 Shapiro-Wilk    t2        0.9502    0.0769    YES   
3 Shapiro-Wilk    t3        0.9375    0.0284    NO    
4 Shapiro-Wilk    t4        0.8806    0.0005    NO    

$Descriptives
    n     Mean   Std.Dev   Median       Min      Max     25th     75th        Skew   Kurtosis
t1 40 2.781751 1.3011629 2.761282 0.8136861 5.331051 1.651604 3.916465  0.04958455 -1.3456845
t2 40 2.742962 0.6978275 2.942996 1.0766608 3.924595 2.194925 3.264789 -0.39371416 -0.9033891
t3 40 2.668648 0.9611272 2.701272 1.2224455 4.917754 1.696224 3.487321  0.14425076 -1.1321676
t4 40 3.135586 1.6546084 2.732309 0.3083314 5.560109 1.687355 4.635793  0.01601791 -1.6852439

> mvn(data = elefell[,2:5], mvnTest = 'royston')
$multivariateNormality
     Test        H     p value MVN
1 Royston 11.65649 0.002446813  NO

$univariateNormality
          Test  Variable Statistic   p value Normality
1 Shapiro-Wilk    t1        0.9341    0.0219    NO    
2 Shapiro-Wilk    t2        0.9502    0.0769    YES   
3 Shapiro-Wilk    t3        0.9375    0.0284    NO    
4 Shapiro-Wilk    t4        0.8806    0.0005    NO    

$Descriptives
    n     Mean   Std.Dev   Median       Min      Max     25th     75th        Skew   Kurtosis
t1 40 2.781751 1.3011629 2.761282 0.8136861 5.331051 1.651604 3.916465  0.04958455 -1.3456845
t2 40 2.742962 0.6978275 2.942996 1.0766608 3.924595 2.194925 3.264789 -0.39371416 -0.9033891
t3 40 2.668648 0.9611272 2.701272 1.2224455 4.917754 1.696224 3.487321  0.14425076 -1.1321676
t4 40 3.135586 1.6546084 2.732309 0.3083314 5.560109 1.687355 4.635793  0.01601791 -1.6852439

> mvn(data = elefell[,2:5], mvnTest = 'dh')
$multivariateNormality
            Test        E df   p value MVN
1 Doornik-Hansen 3.476949  8 0.9009713 YES

$univariateNormality
          Test  Variable Statistic   p value Normality
1 Shapiro-Wilk    t1        0.9341    0.0219    NO    
2 Shapiro-Wilk    t2        0.9502    0.0769    YES   
3 Shapiro-Wilk    t3        0.9375    0.0284    NO    
4 Shapiro-Wilk    t4        0.8806    0.0005    NO    

$Descriptives
    n     Mean   Std.Dev   Median       Min      Max     25th     75th        Skew   Kurtosis
t1 40 2.781751 1.3011629 2.761282 0.8136861 5.331051 1.651604 3.916465  0.04958455 -1.3456845
t2 40 2.742962 0.6978275 2.942996 1.0766608 3.924595 2.194925 3.264789 -0.39371416 -0.9033891
t3 40 2.668648 0.9611272 2.701272 1.2224455 4.917754 1.696224 3.487321  0.14425076 -1.1321676
t4 40 3.135586 1.6546084 2.732309 0.3083314 5.560109 1.687355 4.635793  0.01601791 -1.6852439

Pass ‘mardia’ and ‘dh’, the multivariate normality is fullfiled.

#Get results of repeated measures model using Anova()
> summary(fit.rep, multivariate=FALSE)

Univariate Type III Repeated-Measures ANOVA Assuming Sphericity

            Sum Sq num Df Error SS den Df  F value    Pr(>F)    
(Intercept) 555.00      1    8.796     38 2397.758 < 2.2e-16 ***
sex           6.29      1    8.796     38   27.188 6.787e-06 ***
times        74.35      3   34.862    114   81.041 < 2.2e-16 ***
sex:times   177.87      3   34.862    114  193.876 < 2.2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1


Mauchly Tests for Sphericity

          Test statistic p-value
times            0.86509 0.37805
sex:times        0.86509 0.37805


Greenhouse-Geisser and Huynh-Feldt Corrections
 for Departure from Sphericity

           GG eps Pr(>F[GG])    
times     0.92304  < 2.2e-16 ***
sex:times 0.92304  < 2.2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

            HF eps   Pr(>F[HF])
times     1.002872 3.840346e-28
sex:times 1.002872 1.327803e-44

The Mauchly Tests for Sphericity is pass.

From the first table, time effect are significant which means there is a linear relationship between felling-rate and time. And significant interactions between the sex and time would indicate that the slope of the relationship between felling-rate and times different between sexes.

#Illustrate results
#Barplot of mean with error bars for standard error
mean.fell = aggregate(list(fell=elefell_new$fell), by=list(Time=elefell_new$Time,sex=elefell_new$sex), mean, na.rm=TRUE)
SE = function(x) sd(x, na.rm=TRUE)/(sqrt(sum(!is.na(x))))
se.fell = aggregate(list(SE=elefell_new$fell), by=list(Time=elefell_new$Time,sex=elefell_new$sex), FUN=SE)
fell.dat <-cbind(mean.fell, SE=se.fell$SE)
coefs = fit$coefficients
fell.mat = matrix(fell.dat$fell,4,2)
rownames(fell.mat) = c('early morning','late morning', 'early afternoon', 'late afternoon')
colnames(fell.mat) = c("Female", "Male")
bp <-barplot(fell.mat, beside = T, ylim=c(0, 6), col=c("red",'purple','green','orange'), names=c("Female elephant", "Male elephant"),legend=rownames(fell.mat))
arrows(c(bp[,1],bp[,2]), fell.dat$fell-fell.dat$SE, c(bp[,1],bp[,2]), fell.dat$fell+fell.dat$SE, code=3, angle=90)

在这里插入图片描述
c) What are the assumptions of the tests you carried out in parts a) and b)? (2p)

In part a), I used two-sample t-test, the assumptions are:

felling data is independent
the felling data for female and male elephant are both normally distributed.
two groups data have similar variance.

In part b), I used Repeated measures, the assumptions are:

Independent observations
Multivariate normality
Sphericity (multivariate homogeneity of variances)

Chix1996

关注

0
点赞
踩
0

收藏

觉得还不错? 一键收藏
0
评论
BIOS14:Statistic Exercise 3

a) Is there a difference between the sexes with respect to total tree-felling? (3p)Firstly, I imported the data and checked. Then I calculated the total tree-felling data. According to the question, ...
复制链接

扫一扫