DATA2002-WEEK6_difference in median plot - CSDN Blog

Article link: https://blog.csdn.net/m0_55541117/article/details/120393528

Permutation test

Lady tasting tea (recall Fisher's exact test)

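# Assumption: permutations() comes from the arrangements package
library(arrangements)

truth = c("milk","tea","tea","milk","tea","tea","milk","milk")
permute_guess = permutations(truth)  # every ordering of the 8 cups, one per row

B = nrow(permute_guess)
check_correct = vector("numeric", length = B)
for (i in seq_len(B)) {
  # 1 if this ordering matches the truth exactly, 0 otherwise
  check_correct[i] = identical(permute_guess[i, ], truth)
}
mean(check_correct) # p-value: proportion of orderings that are all correct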
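As a cross-check, the same p-value can be computed analytically. This is a minimal sketch, assuming the lady labels all 8 cups correctly (4 milk-first, 4 tea-first). The 2x2 table `tab` is constructed here for illustration and does not come from the original notes.

```r
# Rows: truth (milk, tea); columns: guess (milk, tea); all 8 cups guessed correctly
tab = matrix(c(4, 0, 0, 4), nrow = 2,
             dimnames = list(truth = c("milk", "tea"), guess = c("milk", "tea")))

# One-sided Fisher's exact test: probability of a result at least this good by chance
p_fisher = fisher.test(tab, alternative = "greater")$p.value

# Only 1 of the choose(8, 4) ways to pick the 4 milk-first cups is fully correct
p_counting = 1 / choose(8, 4)

p_fisher
p_counting
```

Both values should agree with the proportion returned by the enumeration loop.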

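# Permutation test for a two-sample t statistic.
# `dat` is assumed to be a data frame with a numeric `weight` and a two-level `group`.
tt = t.test(weight ~ group, data = dat, var.equal = TRUE) # observed t test
B = 10000 # number of permuted samples we will consider
permuted_dat = dat # make a copy of the data
t_null = vector("numeric", B) # initialise outside loop
for (i in seq_len(B)) {
  permuted_dat$group = sample(dat$group)  # this does the permutation
  # use the same (equal-variance) statistic as the observed test
  t_null[i] = t.test(weight ~ group, data = permuted_dat, var.equal = TRUE)$statistic
}
mean(abs(t_null) >= abs(tt$statistic)) # two-sided permutation p-value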

This is a two-sided test. The t statistic can be replaced by another test statistic, for example the Wilcoxon rank-sum statistic.
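A minimal sketch of the Wilcoxon variant follows. The data are simulated purely for illustration (the original `dat` is not shown in these notes); the column names `weight` and `group` mirror the ones used above, and the group size of 12 is an arbitrary choice.

```r
set.seed(3)
# Hypothetical two-group data, simulated for illustration only
n_per_group = 12
dat = data.frame(
  weight = c(rnorm(n_per_group, mean = 5), rnorm(n_per_group, mean = 6)),
  group  = factor(rep(c("ctrl", "trt"), each = n_per_group))
)

# Observed Wilcoxon rank-sum statistic W
w0 = wilcox.test(weight ~ group, data = dat)$statistic

B = 2000
w_null = vector("numeric", B)
permuted_dat = dat
for (i in seq_len(B)) {
  permuted_dat$group = sample(dat$group)  # shuffle the group labels
  w_null[i] = wilcox.test(weight ~ group, data = permuted_dat)$statistic
}

# W is not centred at zero: under the null its mean is n1 * n2 / 2,
# so extremeness is measured as distance from that centre
centre = n_per_group * n_per_group / 2
p_value = mean(abs(w_null - centre) >= abs(w0 - centre))
p_value
```

Because the rank-sum statistic depends only on ranks, it is less sensitive to outliers than the t statistic.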

Robustly standardised difference in medians

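mad(): the median absolute deviation (MAD) is found by subtracting the median from the original data and then taking the median of the absolute values of the result. The MAD is commonly used to estimate the standard deviation: estimated SD = 1.4826 * MAD. In R, mad() returns this estimated standard deviation, that is, the value already scaled by 1.4826.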
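The sketch below first checks the mad() scaling by hand and then uses it in a permutation test. The data are simulated for illustration, and the statistic (difference in medians divided by the sum of the two group MADs) is an assumed form of the "robustly standardised difference in medians", since the notes do not spell out the formula. `robust_stat` is a hypothetical helper.

```r
set.seed(1)
# Hypothetical two-group data, simulated for illustration only
dat = data.frame(
  weight = c(rnorm(10, mean = 5), rnorm(10, mean = 6)),
  group  = rep(c("A", "B"), each = 10)
)

# mad() already includes the 1.4826 scaling constant
x = dat$weight[dat$group == "A"]
raw_mad = median(abs(x - median(x)))  # MAD computed by hand
scaled_ok = isTRUE(all.equal(mad(x), 1.4826 * raw_mad))

# Assumed statistic: difference in medians over the sum of the group MADs
robust_stat = function(d) {
  med = tapply(d$weight, d$group, median)
  sc  = tapply(d$weight, d$group, mad)
  as.numeric((med[1] - med[2]) / sum(sc))
}

t0 = robust_stat(dat)  # observed statistic
B = 2000
t_null = vector("numeric", B)
permuted_dat = dat
for (i in seq_len(B)) {
  permuted_dat$group = sample(dat$group)  # shuffle the group labels
  t_null[i] = robust_stat(permuted_dat)
}
p_value = mean(abs(t_null) >= abs(t0))  # two-sided permutation p-value
p_value
```

Passing `constant = 1` to mad() returns the unscaled MAD if that is what is needed.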

Paired sample test

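For paired samples we resample the signs of the paired differences: under the null hypothesis, each difference is equally likely to be positive or negative.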

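Compare against the observed statistic t0 from the t test; the t test's own p-value is not needed. The permutation test p-value is the proportion of permuted statistics that are equal to t0 or more extreme than t0.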
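A minimal sketch of the paired version, where the signs of the differences are resampled and each permuted statistic is compared with t0. The differences `d` are made-up numbers for illustration, not data from the course.

```r
set.seed(2)
# Hypothetical paired differences (e.g. after minus before), for illustration only
d = c(1.2, 0.4, -0.3, 2.1, 0.9, 1.5, -0.6, 0.8, 1.1, 0.2)

# Observed one-sample t statistic on the differences
t0 = t.test(d)$statistic

B = 5000
t_null = vector("numeric", B)
for (i in seq_len(B)) {
  # Randomly flip the sign of each difference
  signs = sample(c(-1, 1), size = length(d), replace = TRUE)
  t_null[i] = t.test(d * signs)$statistic
}

# Proportion of permuted statistics at least as extreme as t0 (two-sided)
p_value = mean(abs(t_null) >= abs(t0))
p_value
```

Only `t0` and the permuted statistics enter the calculation; the p-value reported by t.test() is never used.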

If you forget the details, see this link, which is very thorough: Permutation Test: Visual Explanation

Estimation vs hypothesis testing

Estimation

A population parameter is unknown.

Use the sample statistics to generate estimates of the population parameter.

Hypothesis testing

Explicit statement (or hypothesis) regarding the population parameter.

A test statistic is computed from the sample, and it leads us either to reject or not to reject the null hypothesis.

Confidence intervals

We should avoid reporting just a point estimate from a sample; always include a measure of variability, such as an interval of the form $\widehat{\Theta} \pm \text{critical value} \times \operatorname{SE}(\widehat{\Theta})$.
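As a worked instance of this formula, the sketch below builds a 95% interval for a mean, using a t critical value and SE = sd / sqrt(n). The sample `x` is made up for illustration.

```r
# Hypothetical sample, for illustration only
x = c(9.8, 10.4, 10.1, 9.5, 10.9, 10.2, 9.9, 10.6)

n    = length(x)
est  = mean(x)                # point estimate
se   = sd(x) / sqrt(n)        # standard error of the mean
crit = qt(0.975, df = n - 1)  # critical value for a 95% interval

ci = c(lower = est - crit * se, upper = est + crit * se)
ci

# The same interval as reported by t.test()
t.test(x)$conf.int
```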

Bootstrapping

Bootstrapping is a computational procedure that allows us to make inferences about the population when no information is available about the population's distribution. The classic approach to bootstrapping is to repeatedly resample from the sample (with replacement).

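# `speed` is assumed to be a numeric vector holding the sample
set.seed(123)
B = 10000
result = vector("numeric", length = B)
for (i in seq_len(B)) {
  newData = sample(speed, replace = TRUE) # resample with replacement
  result[i] = mean(newData)               # bootstrap replicate of the mean
}

quantile(result, c(0.025, 0.975)) # 95% percentile bootstrap CI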

The bootstrap interval and the classical (t-based) confidence interval are now very similar.
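A minimal sketch of such a comparison, assuming a roughly symmetric sample with no outliers. The vector `x` is simulated as a stand-in because the original `speed` data are not included in these notes.

```r
set.seed(123)
# Hypothetical sample standing in for the course data, simulated for illustration
x = rnorm(50, mean = 20, sd = 5)

B = 10000
boot_means = vector("numeric", length = B)
for (i in seq_len(B)) {
  boot_means[i] = mean(sample(x, replace = TRUE))  # bootstrap replicate of the mean
}

boot_ci = quantile(boot_means, c(0.025, 0.975))  # percentile bootstrap interval
t_ci    = t.test(x)$conf.int                     # classical t-based interval
se      = sd(x) / sqrt(length(x))                # standard error, for scale

boot_ci
t_ci
```

For well-behaved data like this the two intervals should differ by only a small fraction of a standard error; larger gaps tend to appear when the sample is skewed or contains outliers.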

Trimming the outliers makes the bootstrap confidence interval more symmetric about the point estimate.