a) Is there any difference between the sexes regarding the propensity to jump off cliffs and flap the ears? (ignore color for the moment) (2p)
I will the hypothesis that elephants of two sexes are equally likely to jump off cliffs and flap the ears with Frequency tests.
Firstly, data import and exploration
# data import and exploration
elejump = read.csv('elejump.csv')
str(elejump)
# reshape the data
f_yes = sum(elejump$sex=='female' & elejump$jumpflap == 1)
f_no = sum(elejump$sex=='female' & elejump$jumpflap == 0)
m_yes = sum(elejump$sex=='male' & elejump$jumpflap == 1)
m_no = sum(elejump$sex=='male' & elejump$jumpflap == 0)
g_yes = sum(elejump$colour=='grey' & elejump$jumpflap == 1)
g_no = sum(elejump$colour=='grey' & elejump$jumpflap == 0)
p_yes = sum(elejump$colour=='pink' & elejump$jumpflap == 1)
p_no = sum(elejump$colour=='pink' & elejump$jumpflap == 0)
f_grey = sum(elejump$sex == 'female' & elejump$colour == 'grey')
f_pink = sum(elejump$sex == 'female' & elejump$colour == 'pink')
m_grey = sum(elejump$sex == 'male' & elejump$colour == 'grey')
m_pink = sum(elejump$sex == 'male' & elejump$colour == 'pink')
sex_jump = data.frame(female = c(f_yes, f_no), male = c(m_yes, m_no), row.names = c('Yes', 'No'))
color_jump = data.frame(grey = c(g_yes, g_no), pink = c(p_yes, p_no), row.names = c('Yes', 'No'))
sex_col = data.frame(female = c(f_grey, f_pink), male = c(m_grey, m_pink), row.names = c('grey', 'pink'))
barplot(as.matrix(sex_jump), beside = T, legend = T, ylim = c(0, 25))
Check assumptions of frequency tests.
The first two assumptions are “input data must be counts” and “Categories must be mutually exclusive” which are obviously fulfilled.
So I need to test “For chi-square tests and G-tests the expected counts per category must be > 5”.
#Check assumptions
fdata = sex_jump
fexp = matrix(NA, nrow(fdata), ncol(fdata))
for(i in 1:nrow(fdata)) {
for(j in 1:ncol(fdata)) {
fexp[i,j] = (((sum(fdata[i,]))/sum(fdata))*((sum(fdata[,j]))/sum(fdata))*(sum(fdata)))
}
}
> fexp
[,1] [,2]
[1,] 10 15
[2,] 10 15
#Chi-squared test
> chisq.test(sex_jump)
Pearson's Chi-squared test with Yates' continuity correction
data: sex_jump
X-squared = 4.0833, df = 1, p-value = 0.04331
There seems to be a significant association between sex and tendency to jump off cliffs and flap the ears.
#Fisher's exact test
> fisher.test(sex_jump)
Fisher's Exact Test for Count Data
data: sex_jump
p-value = 0.04208
alternative hypothesis: true odds ratio is not equal to 1
95 percent confidence interval:
0.0608841 0.9595294
sample estimates:
odds ratio
0.2556412
The likely to jump off cliffs and flap the ears is about 4 times for male elephants compared to female elephants.
#G-test
> GTest(sex_jump)
Log likelihood ratio (G-test) test of independence without
correction
data: sex_jump
G = 5.4507, X-squared df = 1, p-value = 0.01956
Also gives a significant output.
#Illustrate results
barplot(as.matrix(sex_jump), xlab="Elephants sexes", names=c("Female","Male"), ylab="Counts", beside = T, legend = c('Jump', 'Not jump'), ylim = c(0, 25))
Conclusion, Male elephant is about 4 times likely than female elephants to jump off cliffs and flap the ears.
b) Ignoring sex, does color seem to influence the jumping behavior? (2p)
Same as part a), also use frequency tests to test that there is no difference between different elephants color to ump off cliffs and flap the ears.
# plot data
barplot(as.matrix(color_jump), beside = T, legend = T, ylim = c(0, 25))
#Check assumptions
fdata = color_jump
fexp = matrix(NA, nrow(fdata), ncol(fdata))
for(i in 1:nrow(fdata)) {
for(j in 1:ncol(fdata)) {
fexp[i,j] = (((sum(fdata[i,]))/sum(fdata))*((sum(fdata[,j]))/sum(fdata))*(sum(fdata)))
}
}
> fexp
[,1] [,2]
[1,] 12 13
[2,] 12 13
#Chi-squared test
> chisq.test(color_jump)
Pearson's Chi-squared test with Yates' continuity correction
data: color_jump
X-squared = 28.926, df = 1, p-value = 7.519e-08
#Fisher's exact test
> fisher.test(color_jump)
Fisher's Exact Test for Count Data
data: color_jump
p-value = 1.148e-08
alternative hypothesis: true odds ratio is not equal to 1
95 percent confidence interval:
0.001097566 0.095029572
sample estimates:
odds ratio
0.01417599
# G-test
> GTest(color_jump)
Log likelihood ratio (G-test) test of independence without
correction
data: color_jump
G = 36.95, X-squared df = 1, p-value = 1.212e-09
G-test and T-test both give the significant outputs, and from Fisher’s exact test, we get the conclusion that pink elephants is about 70 times likely than grey elephants to do the jumping behavior.
#Illustrate results
barplot(as.matrix(color_jump), xlab="Elephants' skin color", names=c("Grey","Pink"), ylab="Counts", beside = T, legend = c('Jump', 'Not jump'), ylim = c(0, 30))
c) Is there any difference in color between the sexes? (2p)
Assume there is no difference in color between sexes. Test it with frequency test.
# lot data
barplot(as.matrix(sex_col), beside = T, legend = T, ylim = c(0, 25))
#Check assumptions
> fdata = sex_col
> fexp = matrix(NA, nrow(fdata), ncol(fdata))
> for(i in 1:nrow(fdata)) {
+ for(j in 1:ncol(fdata)) {
+ fexp[i,j] = (((sum(fdata[i,]))/sum(fdata))*((sum(fdata[,j]))/sum(fdata))*(sum(fdata)))
+ }
+ }
> fexp
[,1] [,2]
[1,] 9.6 14.4
[2,] 10.4 15.6
#Chi-squared test
> chisq.test(sex_col)
Pearson's Chi-squared test with Yates' continuity correction
data: sex_col
X-squared = 5.0781, df = 1, p-value = 0.02423
#Fisher's exact test
> fisher.test(sex_col)
Fisher's Exact Test for Count Data
data: sex_col
p-value = 0.01997
alternative hypothesis: true odds ratio is not equal to 1
95 percent confidence interval:
1.191479 19.192502
sample estimates:
odds ratio
4.512379
#G-test
> GTest(sex_col)
Log likelihood ratio (G-test) test of independence without
correction
data: sex_col
G = 6.6093, X-squared df = 1, p-value = 0.01014
All tests return significant results, the skin color is relayed with sexes.
#Illustrate results
barplot(as.matrix(sex_col), xlab="Elephants' sexes", names=c("Female","male"), ylab="Counts", beside = T, legend = c('Grey', 'Pink'), ylim = c(0, 30), col =c('Grey', 'Pink'))
d) What is the result if both sex and color are taken into account when predicting jumping behavior? (4p)
# plot data
f_g_yes = sum(elejump$sex == 'female' & elejump$colour == 'grey' & elejump$jumpflap == 1)
f_g_no = sum(elejump$sex == 'female' & elejump$colour == 'grey' & elejump$jumpflap == 0)
f_p_yes = sum(elejump$sex == 'female' & elejump$colour == 'pink' & elejump$jumpflap == 1)
f_p_no = sum(elejump$sex == 'female' & elejump$colour == 'pink' & elejump$jumpflap == 0)
m_g_yes = sum(elejump$sex == 'male' & elejump$colour == 'grey' & elejump$jumpflap == 1)
m_g_no = sum(elejump$sex == 'male' & elejump$colour == 'grey' & elejump$jumpflap == 0)
m_p_yes = sum(elejump$sex == 'male' & elejump$colour == 'pink' & elejump$jumpflap == 1)
m_p_no = sum(elejump$sex == 'male' & elejump$colour == 'pink' & elejump$jumpflap == 0)
Jump = data.frame(
Sex = c('Female','Female','Female','Female','Male','Male','Male','Male'),
Color = c('Grey','Grey','Pink','Pink','Grey','Grey','Pink','Pink'),
Jump = c('Yes','No','Yes','No','Yes','No','Yes','No'),
Counts = c(f_g_yes,f_g_no,f_p_yes,f_p_no,m_g_yes,m_g_no,m_p_yes,m_p_no))
ggplot(Jump, aes(x=Jump, y=Counts, fill=Color)) +
geom_bar(stat="identity", position="dodge") +
facet_wrap(~Sex) +
theme_bw()
> # Construct full model
> full.model = glm(jumpflap~sex+colour, data = elejump, family=binomial)
> # Model selection
> output = dredge(full.model, rank=AIC)
Fixed term is "(Intercept)"
> output
Global model call: glm(formula = jumpflap ~ sex + colour, family = binomial, data = elejump)
---
Model selection table
(Intrc) color sex df logLik AIC delta weight
2 -2.3980 + 2 -16.182 36.4 0.00 0.707
4 -2.6260 + + 3 -16.064 38.1 1.76 0.293
3 -0.8473 + 2 -31.932 67.9 31.50 0.000
1 0.0000 1 -34.657 71.3 34.95 0.000
Models ranked by AIC(x)
#Check assumptions
#Independent data
dwtest(fit)
#Residuals do not deviate from predicted distribution
par(mfrow=c(2,2))
plot(fit)
par(mfrow=c(1,1))
simulationOutput = simulateResiduals(fittedModel=fit, n=250)
plot(simulationOutput)
testDispersion(simulationOutput)
> #Get results of binomial GLM
> Anova(fit, type=3)
Analysis of Deviance Table (Type III tests)
Response: jumpflap
LR Chisq Df Pr(>Chisq)
colour 36.95 1 1.212e-09 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> summary(fit)
Call:
glm(formula = jumpflap ~ colour, family = binomial, data = elejump)
Deviance Residuals:
Min 1Q Median 3Q Max
-2.07821 -0.41716 0.03901 0.49518 2.22931
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -2.3979 0.7385 -3.247 0.00117 **
colourpink 4.4348 0.9603 4.618 3.88e-06 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 69.315 on 49 degrees of freedom
Residual deviance: 32.365 on 48 degrees of freedom
AIC: 36.365
Number of Fisher Scoring iterations: 5
From the outcomes, we can see that the jump behavior is only related to color when both sex and color are taken into consideration. According to part c), sex is highly related to color, so when both of them is considered, only color is related to jump behavior. And the conclusion is pink elephant is more likely to jump off cliffs and flap the ears.
#Illustrate results
elejump$pred = predict(fit, elejump, type="response")
elejump$colour = factor(elejump$colour, levels(elejump$colour)[c(1,2)])
mean.ele = aggregate(list(exc=elejump$pred), by=list(colour=elejump$colour), mean, na.rm=TRUE)
barplot(exc~colour, data=mean.ele, names=c("Grey","Pink"), xlab="Sex", ylab="Probability of jump behaviour", ylim=c(0, 1), col="red")