Using the t-test and the Wilcoxon rank-sum test to judge significant differences between two groups of data


菠萝西斯



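Original articles on this blog may not be used for commercial purposes without the author's permission. When reposting, please credit the source; otherwise the author reserves the right to pursue legal action.

Article link: https://blog.csdn.net/u013429737/article/details/116195181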


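This article discusses how to choose between, and run, the t-test and the Wilcoxon rank-sum test in R to determine whether two groups of data differ significantly. The t-test focuses on differences in sample means and suits normally distributed data; it comes in three types: one-sample, paired-samples and independent-samples. The Wilcoxon rank-sum test is based on the ranks of the data and suits data that are not normally distributed. The article also covers test direction, the R implementation, and visualization with box plots.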


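Preface

A significant difference is one that is unlikely to have arisen by chance alone. More than ninety percent of the questions in bioinformatics analysis essentially come down to finding differences or demonstrating them.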

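I. T-test or Wilcoxon rank-sum test: which one to choose?

Simply put, the t-test focuses on the difference between sample means, whereas the Wilcoxon rank-sum test works only on the ranks (the ordering) of the values in the pooled samples and is commonly described as comparing medians (strictly, that interpretation holds when the two distributions have the same shape). The t-test assumes the data are normally distributed; for how to check this, see the post "Testing data for normality in R".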
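The choice described above can be sketched as a small R workflow: check each group with the Shapiro-Wilk test (shapiro.test in base R), then run t.test if neither group rejects normality and wilcox.test otherwise. The data are simulated and the 0.05 cut-off is an assumption; gating on a normality p-value is only a rule of thumb (with small samples the test has little power, with large samples it flags trivial deviations).

```r
# Hypothetical example data (simulated, not from the original post)
set.seed(1)
group_a <- rnorm(20, mean = 10, sd = 2)
group_b <- rnorm(20, mean = 12, sd = 2)

# Shapiro-Wilk test per group: p > 0.05 means normality is NOT rejected
normal_a <- shapiro.test(group_a)$p.value > 0.05
normal_b <- shapiro.test(group_b)$p.value > 0.05

if (normal_a && normal_b) {
  # Both groups look normal: compare means with a (Welch) t-test
  res <- t.test(group_a, group_b)
} else {
  # Otherwise fall back to the rank-based Wilcoxon rank-sum test
  res <- wilcox.test(group_a, group_b)
}

print(res$method)   # which test was actually run
print(res$p.value)
```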

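II. The t-test

1. One-sample t-test

The one-sample t-test is used to test whether the mean of a group of data differs significantly from a specific value. Example scenario: across 20 tumor samples, determine whether the sequencing count of a gene X is, on average, higher than, lower than, or equal to 25 (assuming the counts are normally distributed).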
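A minimal sketch of the gene X scenario, assuming simulated counts that are treated as normal (as the text requires). It also computes the t statistic by hand, (mean - mu) / (sd / sqrt(n)), to show what t.test is doing; the variable names are hypothetical.

```r
set.seed(42)
# Hypothetical counts of gene X in 20 tumor samples (simulated, assumed normal)
counts_x <- rnorm(20, mean = 28, sd = 4)

# H0: the mean count equals 25; two-sided alternative by default
res_one <- t.test(counts_x, mu = 25)

# Same statistic by hand: (sample mean - mu) / standard error of the mean
t_manual <- (mean(counts_x) - 25) / (sd(counts_x) / sqrt(length(counts_x)))

res_one$statistic
t_manual
res_one$parameter   # degrees of freedom, n - 1
res_one$p.value
```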

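2. Paired-samples t-test

The paired-samples t-test requires the two groups of data to correspond one-to-one in pairs. Examples: (1) whether the mean of the same set of samples differs before and after a treatment (easy to see: each sample before treatment is matched with itself after treatment, like one person's weight before and after a meal); (2) significant differences in some feature between tumor and adjacent normal tissue (which must be matched samples from the same patient); (3) matched samples that receive different treatments.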
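The before/after weight example can be sketched as follows, with made-up weights for eight people. A paired t-test is equivalent to a one-sample t-test on the per-pair differences against mu = 0, which the code checks directly.

```r
# Hypothetical body weights (kg) of the same 8 people before and after a meal
before <- c(60.1, 72.4, 55.3, 80.2, 66.7, 58.9, 70.0, 63.5)
after  <- c(60.8, 72.9, 55.9, 80.6, 67.5, 59.2, 70.9, 64.1)

# Paired t-test: the i-th element of 'after' is matched with the i-th of 'before'
res_paired <- t.test(after, before, paired = TRUE)

# Equivalent formulation: one-sample t-test on the differences
res_diff <- t.test(after - before, mu = 0)

res_paired$p.value
res_diff$p.value
```

Note that the pairing is purely positional, so both vectors must be in the same subject order.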

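3. Independent-samples t-test

The independent-samples t-test (independent t test) checks whether two independent groups of data differ significantly. Example scenarios: (1) tumor versus normal samples in the TCGA database (these come from different patients, unlike tumor and matched adjacent tissue); (2) comparisons between tumor samples of different molecular subtypes.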
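In expression analyses the data often sit in a data frame with a group column, so a sketch using t.test's formula interface may be handy. The data below are simulated (not TCGA) and the column names are hypothetical. The grouping factor must have exactly two levels; R orders them alphabetically here ("normal", then "tumor"), so the reported difference is normal minus tumor.

```r
set.seed(123)
# Hypothetical expression of one gene in tumor and normal samples
# from different patients (simulated data)
expr_df <- data.frame(
  expression = c(rnorm(15, mean = 8, sd = 1.5), rnorm(12, mean = 7, sd = 1)),
  group      = factor(rep(c("tumor", "normal"), times = c(15, 12)))
)

# Formula interface: response ~ two-level grouping factor (Welch by default)
res_ind <- t.test(expression ~ group, data = expr_df)

res_ind$estimate   # mean in group normal, mean in group tumor
res_ind$p.value
```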

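4. Direction of the test

One-tailed and two-tailed tests differ in where the rejection region for H0 is placed. A one-tailed test requires choosing a direction: if the alternative hypothesis contains a "<" (less than), use the left tail; if it contains a ">" (greater than), use the right tail. In a two-tailed test the rejection region is split between the two tails of the test statistic's distribution, each tail holding α/2, for a total of α.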
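To make the α vs α/2 split concrete, here is a sketch of the critical t values for α = 0.05 and 19 degrees of freedom (e.g. n = 20 in a one-sample test), computed with qt. The df value is an assumption chosen for illustration.

```r
alpha <- 0.05
df <- 19

# Two-tailed: alpha/2 in each tail; reject H0 if |t| > crit_two
crit_two <- qt(1 - alpha / 2, df)    # about 2.093

# Right-tailed (H1 contains ">"): all of alpha in the upper tail
crit_right <- qt(1 - alpha, df)      # about 1.729

# Left-tailed (H1 contains "<"): all of alpha in the lower tail
crit_left <- qt(alpha, df)           # about -1.729

c(two = crit_two, right = crit_right, left = crit_left)
```

Because a one-tailed test puts all of α in one tail, its cut-off is less extreme, which is why the direction must be chosen before looking at the data.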

5. Performing t-tests in R

# Hypothetical example data, for illustration only
x <- c(5.1, 4.9, 6.2, 5.8, 6.0, 5.5, 5.3, 6.1)
y <- c(5.6, 5.2, 6.5, 6.4, 6.3, 5.9, 5.4, 6.6)

# One-sample t-test: compare an observed mean
# with a theoretical mean (here mu = 0)
t.test(x, mu = 0)

# Paired t-test: x and y are matched pairs of equal length
t.test(x, y, paired = TRUE)

# Independent t-test: compare the means of
# two independent samples x and y
t.test(x, y)

Arguments
x
a (non-empty) numeric vector of data values.

y
an optional (non-empty) numeric vector of data values.

alternative
a character string specifying the alternative hypothesis, must be one of “two.sided” (default), “greater” or “less”. You can specify just the initial letter.

mu
a number indicating the true value of the mean (or difference in means if you are performing a two sample test).

paired
a logical indicating whether you want a paired t-test.

var.equal
a logical variable indicating whether to treat the two variances as being equal. If TRUE then the pooled variance is used to estimate the variance otherwise the Welch (or Satterthwaite) approximation to the degrees of freedom is used.
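The var.equal description can be illustrated with a sketch on simulated data (the vectors tumor and normal are hypothetical). With var.equal = TRUE the pooled test uses n1 + n2 - 2 degrees of freedom; with the default FALSE, R applies the Welch (Satterthwaite) approximation, which the code recomputes by hand for comparison.

```r
set.seed(7)
tumor  <- rnorm(15, mean = 8, sd = 1.5)   # hypothetical group 1
normal <- rnorm(12, mean = 7, sd = 1.0)   # hypothetical group 2

res_welch   <- t.test(tumor, normal)                    # default: var.equal = FALSE
res_student <- t.test(tumor, normal, var.equal = TRUE)  # pooled variance

res_student$parameter   # 15 + 12 - 2 = 25
res_welch$parameter     # Welch-Satterthwaite df, usually not an integer

# Welch-Satterthwaite df by hand, using each group's squared standard error
v1 <- var(tumor) / 15
v2 <- var(normal) / 12
df_welch <- (v1 + v2)^2 / (v1^2 / 14 + v2^2 / 11)
df_welch
```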

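alternative is the argument that sets the direction of the hypothesis: "two.sided", "less" and "greater" correspond to two-tailed, left-tailed and right-tailed tests.
t.test(x, y, alternative = "two.sided")  # or "less" / "greater"
The two-tailed, left-tailed and right-tailed tests state different alternative hypotheses, so they give different p-values and can lead to different conclusions about significance.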
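How the three p-values relate can be sketched on the same data as the console example (x = 1:10, y = c(7:20, 200)). For a t-test the two one-sided p-values add up to 1, and the two-sided p-value is twice the smaller of them.

```r
x <- 1:10
y <- c(7:20, 200)

p_two     <- t.test(x, y, alternative = "two.sided")$p.value
p_less    <- t.test(x, y, alternative = "less")$p.value     # H1: mean(x) < mean(y)
p_greater <- t.test(x, y, alternative = "greater")$p.value  # H1: mean(x) > mean(y)

c(two.sided = p_two, less = p_less, greater = p_greater)

# Relationships that hold for the t-test
p_less + p_greater           # equals 1
2 * min(p_less, p_greater)   # equals p_two
```

So when the observed effect lies in the hypothesized direction, a one-tailed test halves the p-value, which is why the direction has to be fixed in advance rather than picked after seeing the result.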

> t.test(1:10, y = c(7:20, 200))

	Welch Two Sample t-test

data:  1:10 and c(7:20, 200)
t = -1.6329, df = 14.165, p-value = 0.1245
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -47.242900   6.376233
sample estimates:
mean of x mean of y 
  5.50000  25.93333 

> t.test(1:10, y = c(7:20, 200), alternative = "less")

	Welch Two Sample t-test

data: