Permutation test
Lady tasting tea (recall Fisher's exact test)
truth = c("milk","tea","tea","milk","tea","tea","milk","milk")
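# permutations() is not base R; assumed to come from an add-on package (e.g. arrangements)
permute_guess = permutations(truth) # one candidate guess per row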
B = nrow(permute_guess)
check_correct = vector("numeric", length = B)
for(i in 1:B) {
check_correct[i] = identical(permute_guess[i,], truth)
}
mean(check_correct) # p-value
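A minimal base R sketch of the same idea, assuming the taster knows that exactly four cups are "milk": instead of listing every ordering, it enumerates every choice of four cups with combn() and checks how many guesses match truth. The names milk_positions, all_correct and p_value are illustrative only.

```r
# Exact null distribution for the tea-tasting design: 4 "milk" cups out of 8
truth = c("milk","tea","tea","milk","tea","tea","milk","milk")
n = length(truth)
k = sum(truth == "milk")

# Every way of labelling k of the n cups as "milk", one guess per column
milk_positions = combn(n, k)
B = ncol(milk_positions)

all_correct = vector("logical", length = B)
for (i in 1:B) {
  guess = rep("tea", n)
  guess[milk_positions[, i]] = "milk"
  all_correct[i] = identical(guess, truth)  # TRUE only for a perfect guess
}

p_value = mean(all_correct)  # chance of a perfect score under pure guessing
p_value                      # equals 1 / choose(8, 4)
```

Only one of the choose(8, 4) equally likely guesses is fully correct, which is why the proportion equals 1 / choose(8, 4).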
tt = t.test(weight ~ group, data = dat, var.equal = TRUE) #observed t test
B = 10000 # number of permuted samples we will consider
permuted_dat = dat # make a copy of the data
t_null = vector("numeric", B) # initialise outside loop
for(i in 1:B) {
permuted_dat$group = sample(dat$group) # this does the permutation
t_null[i] = t.test(weight ~ group, data = permuted_dat, var.equal = TRUE)$statistic # same test as the observed one
}
mean(abs(t_null) >= abs(tt$statistic))
This is a two-sided test. The t test can be swapped for the Wilcoxon rank-sum test.
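A sketch of the rank-based version, using simulated data in place of dat (the data, the group labels "A"/"B" and the helper rank_sum() are assumptions for illustration). The statistic is the rank sum of group A, centred at its expected value under the null so that a two-sided comparison with abs() makes sense.

```r
set.seed(1)
# Hypothetical two-group data standing in for `dat`
dat = data.frame(
  weight = c(rnorm(10, mean = 5), rnorm(10, mean = 6)),
  group  = rep(c("A", "B"), each = 10)
)

# Rank sum of group A minus its null expectation n_a * (N + 1) / 2
rank_sum = function(weight, group) {
  r = rank(weight)
  n_a = sum(group == "A")
  sum(r[group == "A"]) - n_a * (length(weight) + 1) / 2
}

w_obs = rank_sum(dat$weight, dat$group)  # observed statistic

B = 2000
w_null = vector("numeric", B)
for (i in 1:B) {
  w_null[i] = rank_sum(dat$weight, sample(dat$group))  # permute the labels
}

p_value = mean(abs(w_null) >= abs(w_obs))  # two-sided permutation p-value
p_value
```

Computing the rank sum directly rather than calling wilcox.test() inside the loop is a design choice that keeps the loop fast; the permutation logic is unchanged.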
Robustly standardised difference in medians
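mad(): the median absolute deviation is the median of the absolute values of the data after subtracting the median. It is often used to estimate the standard deviation, with estimated SD = 1.4826 * MAD. In R, mad() returns this estimated standard deviation (the default is constant = 1.4826), not the raw MAD.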
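A short check of the scaling, on made-up data. The function robust_stat() is only one plausible form of a robustly standardised difference in medians and is an assumption, not a definition taken from these notes.

```r
x = c(2.1, 3.4, 3.9, 4.2, 5.0, 5.3, 19.7)  # made-up data with one outlier
y = c(1.8, 2.2, 2.9, 3.1, 3.3, 4.0, 4.4)   # made-up comparison group

raw_mad = median(abs(x - median(x)))       # unscaled median absolute deviation

all.equal(mad(x), 1.4826 * raw_mad)        # default mad() is the scaled version
all.equal(mad(x, constant = 1), raw_mad)   # constant = 1 gives the raw MAD

# Hypothetical statistic: difference in medians over a MAD-based standard error
robust_stat = function(a, b) {
  (median(a) - median(b)) / sqrt(mad(a)^2 / length(a) + mad(b)^2 / length(b))
}
robust_stat(x, y)
```

Such a statistic can replace the t statistic inside the permutation loop, since the permutation test does not need a known null distribution.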
Paired sample test
We resample the sign.
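Compare against the observed t0 from the t test; the t test's own p-value is not needed. The permutation test's p-value is the proportion of resampled statistics that are equal to t0 or more extreme than t0.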
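A sketch of the sign-resampling idea for paired data, using simulated before/after measurements (the data and the names before, after and d are assumptions). Under the null of no difference, each paired difference is equally likely to be positive or negative, so the signs are flipped at random.

```r
set.seed(1)
# Hypothetical paired measurements on the same 12 subjects
before = rnorm(12, mean = 10)
after  = before + rnorm(12, mean = 0.5)
d = after - before                  # paired differences

t0 = t.test(d)$statistic            # observed one-sample t statistic

B = 2000
t_null = vector("numeric", B)
for (i in 1:B) {
  signs = sample(c(-1, 1), length(d), replace = TRUE)  # resample the signs
  t_null[i] = t.test(signs * d)$statistic
}

p_value = mean(abs(t_null) >= abs(t0))  # proportion as extreme as t0 or more
p_value
```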
If I forget the details, see this link, which is very detailed: Permutation Test: Visual Explanation
Estimation vs hypothesis testing
Estimation
A population parameter is unknown.
Use the sample statistics to generate estimates of the population parameter.
Hypothesis testing
Explicit statement (or hypothesis) regarding the population parameter.
Test statistics are computed, and they lead us either to reject or to fail to reject the null hypothesis.
Confidence intervals
We should avoid reporting just a point estimate for a sample; always include a measure of variability.
Bootstrapping
Bootstrapping is a computational process that allows us to make inferences about the population when no information about the population is available. The classic approach is to repeatedly resample from the sample (with replacement).
set.seed(123)
B = 10000
result = vector("numeric", length = B)
for(i in 1:B){
newData = sample(speed, replace = TRUE)
result[i] = mean(newData)
}
quantile(result, c(0.025, 0.975)) #95% CI
The bootstrap and the confidence intervals are now very similar.
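A sketch of one way to put the two intervals side by side, assuming the built-in cars data stands in for speed and that the comparison is with the classical t-based interval (both are assumptions, not stated in the notes).

```r
set.seed(123)
speed = cars$speed  # assumption: built-in `cars` data stands in for `speed`

B = 10000
result = vector("numeric", length = B)
for (i in 1:B) {
  result[i] = mean(sample(speed, replace = TRUE))  # mean of one bootstrap sample
}

boot_ci = quantile(result, c(0.025, 0.975))  # percentile bootstrap interval
t_ci = t.test(speed)$conf.int                # classical t-based interval

boot_ci
t_ci
```

For a roughly symmetric sample like this one, the two intervals would be expected to land close to each other; heavy skew or outliers is where they tend to differ.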
Trimming the outliers will make the bootstrap CI more symmetric.
Summary