[R Language] Permutation Tests

萝卜丝皮尔

Article link: https://blog.csdn.net/qq_43448491/article/details/116010062

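I am not very experienced with R, so my translations may be rough and my understanding incomplete. Corrections and criticism are welcome. Thank you.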

Permutation Tests

Theory

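Reference book: 《统计学完全教程》 (the Chinese edition of Larry Wasserman's All of Statistics)
[Figure: book excerpt describing the permutation test algorithm; image not available]

Note: the algorithm in the excerpt appears to be a permutation test based on Monte Carlo simulation, which also comes up in the practical part.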
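
To make the Monte Carlo idea concrete, here is a minimal base-R sketch. It is not the book's algorithm verbatim, and `perm_test_mc` is a hypothetical helper name, not a function from any package. It assumes the difference in group means as the test statistic and uses simulated data purely for illustration.

```r
# Monte Carlo permutation test for a difference in means (base R only).
# perm_test_mc is a hypothetical helper, not part of any package.
perm_test_mc <- function(x, y, n_resample = 10000) {
  pooled <- c(x, y)
  n_x <- length(x)
  observed <- mean(x) - mean(y)

  # Shuffle the pooled data, split it into two pseudo-groups of the original
  # sizes, and recompute the statistic for each shuffle.
  permuted <- replicate(n_resample, {
    shuffled <- sample(pooled)
    mean(shuffled[seq_len(n_x)]) - mean(shuffled[-seq_len(n_x)])
  })

  # Two-sided p-value; the +1 terms count the observed labelling itself,
  # so the estimate can never be exactly zero.
  p_value <- (1 + sum(abs(permuted) >= abs(observed))) / (n_resample + 1)
  list(statistic = observed, p.value = p_value)
}

set.seed(1)                    # for reproducibility
x <- rnorm(20, mean = 0)       # simulated data, illustration only
y <- rnorm(20, mean = 0.8)
result <- perm_test_mc(x, y, n_resample = 5000)
result$p.value
```

Under the null hypothesis the group labels are exchangeable, so the shuffled statistics show how large a difference can arise by chance alone. The p-value is the share of shuffles at least as extreme as the observed difference.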

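Practice

Goal: test whether two independent samples come from the same distribution.
Advantages: no theoretical distribution has to be assumed in advance, and the test remains valid for small samples.
Permutation tests in R (run here in RStudio, using `oneway_test` from the coin package):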

# oneway_test() from the coin package provides the Fisher-Pitman permutation test.
# Usage (signatures from the documentation, kept as comments so the script runs):
## S3 method for class 'formula'
# oneway_test(formula, data, subset = NULL, weights = NULL, ...)
## S3 method for class 'IndependenceProblem'
# oneway_test(object, ...)
# Arguments
# formula: a formula of the form y ~ x | block where y is a numeric variable, x is a factor and block is an optional factor for stratification.
# data: an optional data frame containing the variables in the model formula.
library(coin) # provides oneway_test(), pvalue(), support(), dperm(), pperm()
diffusion <- data.frame(
    pd = c(0.80, 0.83, 1.89, 1.04, 1.45, 1.38, 1.91, 1.64, 0.73, 1.46,
           1.15, 0.88, 0.90, 0.74, 1.21),
    age = factor(rep(c("At term", "12-26 Weeks"), c(10, 5))) # the first 10 values belong to "At term", the last 5 to "12-26 Weeks"
)
## Asymptotic Fisher-Pitman test
oneway_test(pd ~ age, data = diffusion)

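## Approximative (Monte Carlo) Fisher-Pitman test
# Draw 10,000 random permutations, compute the statistic for each,
# and use the results to approximate the permutation distribution.
pvalue(oneway_test(pd ~ age, data = diffusion,
                   distribution = approximate(nresample = 10000)))

## Exact Fisher-Pitman test
# Store the test object in ot and extract its p-value.
pvalue(ot <- oneway_test(pd ~ age, data = diffusion,
                         distribution = "exact"))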

## Plot density and distribution of the standardized test statistic
op <- par(no.readonly = TRUE) # save current settings
layout(matrix(1:2, nrow = 2))
# support() returns the support of the exact null distribution: every value the
# standardized test statistic can take across the permutations. It is not random data.
s <- support(ot)
d <- dperm(ot, s) # probability mass of the permutation distribution at each support point
p <- pperm(ot, s) # cumulative probability P(T <= s) at each support point
plot(s, d, type = "S", xlab = "Test Statistic", ylab = "Density")
plot(s, p, type = "S", xlab = "Test Statistic", ylab = "Cum. Probability")
par(op) # reset
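
To see what the support, the probability mass and the cumulative probability represent, the permutation distribution can be built by hand in base R. This is only a sketch under an assumption: it uses the raw sum of the smaller group as the statistic, whereas coin works with a standardized statistic, so the x-axis scale differs from the plot above. The variable names are illustrative.

```r
# Data from the example above (no packages needed for this sketch).
pd <- c(0.80, 0.83, 1.89, 1.04, 1.45, 1.38, 1.91, 1.64, 0.73, 1.46,
        1.15, 0.88, 0.90, 0.74, 1.21)
n_small <- 5                             # size of the "12-26 Weeks" group

# Every way of choosing which 5 of the 15 observations form the smaller group.
subsets <- combn(length(pd), n_small)    # 5 x 3003 matrix of indices
group_sums <- colSums(matrix(pd[subsets], nrow = n_small))

# Round to suppress floating-point noise before collecting distinct values.
group_sums <- round(group_sums, 8)

# Distinct attainable values of the statistic (analogous to support()).
support_values <- sort(unique(group_sums))
# Share of relabellings that produce each value (analogous to dperm()).
prob_mass <- as.numeric(table(group_sums)) / length(group_sums)
# Running total of the probability mass (analogous to pperm()).
cum_prob <- cumsum(prob_mass)

head(data.frame(support_values, prob_mass, cum_prob))
```

Plotting `support_values` against `prob_mass` and `cum_prob` should give the same shapes as the two coin plots, up to the shift and rescaling introduced by standardization.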

Notes

The null hypothesis of equality, or conditional equality given block,
of the distribution of y in the groups defined by x is tested against
shift alternatives. In the two-sample case, the two-sided null
hypothesis is H_0: mu = 0, where mu = Y_1 - Y_2 and Y_s is the median
of the responses in the sth sample. In case alternative = “less”, the
null hypothesis is H_0: mu >= 0. When alternative = “greater”, the
null hypothesis is H_0: mu <= 0.
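In other words:
① The hypothesis is stated as a location shift mu between the two groups, which the documentation describes through the group medians. The Fisher-Pitman test statistic itself is the standardized sum of the responses in one group, which is equivalent to the difference in group means, not the difference in medians.
② The test is two-sided by default (alternative = "two.sided").
③ When the p-value falls below the chosen significance level, reject the null hypothesis and conclude that the two groups do not share the same distribution.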
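
The points above can be written out as a short derivation. This is a sketch assuming the usual linear-statistic form of the Fisher-Pitman test. The symbols $T$, $n_1$, $N$ and $Z$ are notation introduced here, not taken from the quoted documentation.

$$
T = \sum_{i=1}^{N} y_i \,\mathbb{1}(x_i = \text{group 1}), \qquad
\mathbb{E}_0[T] = n_1 \bar{y}, \qquad
\mathrm{Var}_0[T] = \frac{n_1 (N - n_1)}{N (N - 1)} \sum_{i=1}^{N} (y_i - \bar{y})^2
$$

$$
Z = \frac{T - \mathbb{E}_0[T]}{\sqrt{\mathrm{Var}_0[T]}}, \qquad
p_{\text{two-sided}} = P_0\big(|Z| \ge |z_{\text{obs}}|\big)
$$

Here $n_1$ is the size of group 1, $N$ the total sample size, and the subscript 0 denotes the permutation (null) distribution. Because the pooled total $\sum_i y_i$ is fixed under relabelling, $T = n_1 \bar{y}_1$ is a monotone function of $\bar{y}_1 - \bar{y}_2$, so testing with $T$ is the same as testing with the difference in means.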
The conditional null distribution of the test statistic is used to obtain p-values and an asymptotic approximation of the exact distribution is used by default (distribution = “asymptotic”). Alternatively, the distribution can be approximated via Monte Carlo resampling or computed exactly for univariate two-sample problems by setting distribution to “approximate” or “exact” respectively.
The distribution argument selects which null distribution is used: asymptotic (the default), approximated by Monte Carlo resampling, or exact.
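
The three options can be mimicked in base R for the diffusion data. This is a rough sketch, not coin's implementation: it assumes the sum of the smaller group as the statistic, and its Monte Carlo p-value uses a +1 correction that coin may not apply. The results should be close to the coin output above but need not match it digit for digit.

```r
pd <- c(0.80, 0.83, 1.89, 1.04, 1.45, 1.38, 1.91, 1.64, 0.73, 1.46,
        1.15, 0.88, 0.90, 0.74, 1.21)
in_small_group <- rep(c(FALSE, TRUE), c(10, 5))  # last 5 values: "12-26 Weeks"
n <- length(pd)
m <- sum(in_small_group)

observed <- sum(pd[in_small_group])
expected <- m * mean(pd)                  # mean of the sum under permutation
variance <- m * (n - m) / n * var(pd)     # variance of the sum under permutation
tol <- 1e-12                              # guards against floating-point ties

# 1) Asymptotic: normal approximation to the standardized statistic.
z <- (observed - expected) / sqrt(variance)
p_asymptotic <- 2 * pnorm(-abs(z))

# 2) Monte Carlo: random relabellings (sampling m values without replacement).
set.seed(123)
n_resample <- 10000
mc_sums <- replicate(n_resample, sum(sample(pd, m)))
mc_extreme <- abs(mc_sums - expected) >= abs(observed - expected) - tol
p_monte_carlo <- (1 + sum(mc_extreme)) / (n_resample + 1)

# 3) Exact: enumerate all choose(15, 5) = 3003 relabellings.
all_sums <- combn(pd, m, FUN = sum)
p_exact <- mean(abs(all_sums - expected) >= abs(observed - expected) - tol)

c(asymptotic = p_asymptotic, monte_carlo = p_monte_carlo, exact = p_exact)
```

Full enumeration is only feasible for small samples, because the number of relabellings grows combinatorially. Beyond that, the Monte Carlo and asymptotic versions are the practical choices.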