(Bioinformatics) Introduction to R and Statistics (Part 1): the t-test

Article link: https://blog.csdn.net/weixin_46500027/article/details/123553352

The t-test, also known as Student's t-test, tests whether the difference between two means is statistically significant.

For comparing two groups there are two variants: the paired t-test and the unpaired (independent two-sample) t-test.
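To make the distinction concrete, here is a minimal sketch on simulated data (the variables before and after are hypothetical, not from this article). It assumes the same 8 samples are measured twice, and shows that a paired test is the same as a one-sample test on the per-sample differences, while an unpaired test ignores the pairing.

```r
# Hypothetical example: the same 8 samples measured before and after a treatment.
set.seed(1)                                   # fixed seed so the sketch is reproducible
before <- rnorm(8, mean = 5, sd = 1)
after  <- before + rnorm(8, mean = 0.5, sd = 0.3)

# Paired test: works on the per-sample differences after - before.
paired_res <- t.test(after, before, paired = TRUE)

# Equivalent one-sample test on those differences.
diff_res <- t.test(after - before, mu = 0)

# Unpaired (Welch) test: treats the two vectors as independent groups.
unpaired_res <- t.test(after, before, paired = FALSE)

paired_res$p.value
diff_res$p.value
unpaired_res$p.value
```

When the pairing is real, the paired test is usually the more sensitive of the two, because the sample-to-sample variation cancels out in the differences.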

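In bioinformatics, the t-test is often used to check whether the means of two groups differ significantly. For example, an unpaired t-test can be used to compare the expression of a gene between a tumor group and a normal group. Let's look at this in code: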

a <- runif(10,-5,5) ### use runif to draw two vectors of uniform random numbers on [-5, 5] (10 and 15 values)
b <- runif(15,-5,5)
a
b

> a
 [1]  4.9568725 -2.4474115 -3.6352439  2.8920518  3.8447700  3.2792848 -3.2225162 3.6083584
 [9] -0.7592118 -2.7116564

> b
 [1] -4.59610825 -1.90324164 -4.69885843 -1.63407638  2.30491072  4.72412488  3.48969289
 [8] -2.90816420  1.90937888 -0.03853778 -3.46269853  4.59276133  4.82084549  0.56812730
[15] -3.13750166

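Use t.test to check whether the means of the two random vectors differ:

t.test(a,b,paired = FALSE)
## paired = FALSE requests an unpaired test. Set it to TRUE when every value in a is matched with one value in b (for example, the same sample measured twice). Independent groups, which are the more common case, call for the unpaired test. Writing TRUE/FALSE is safer than T/F, because T and F are ordinary variables that can be reassigned.

The result:

> t.test(a,b,paired = FALSE)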

	Welch Two Sample t-test

data:  a and b
t = 0.41181, df = 19.593, p-value = 0.6849
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -2.355660  3.512632
sample estimates:
  mean of x   mean of y 
0.580529771 0.002043641

The p-value (0.6849) is well above 0.05, so the result is not significant: there is no evidence that the means of a and b differ.
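The output is titled "Welch Two Sample t-test" and reports a non-integer df because t.test does not assume equal variances by default. As a rough sketch of what happens under the hood, the statistic, the Welch-Satterthwaite degrees of freedom and the two-sided p-value can be computed by hand and compared with t.test. The seed is an assumption added for reproducibility, so the numbers will not match the output printed above.

```r
set.seed(42)                       # assumed seed, not used in the article
a <- runif(10, -5, 5)
b <- runif(15, -5, 5)

# Squared standard error of each group mean
va <- var(a) / length(a)
vb <- var(b) / length(b)

# Welch t statistic: difference in means over its standard error
t_stat <- (mean(a) - mean(b)) / sqrt(va + vb)

# Welch-Satterthwaite approximation of the degrees of freedom
df <- (va + vb)^2 / (va^2 / (length(a) - 1) + vb^2 / (length(b) - 1))

# Two-sided p-value from the t distribution
p_value <- 2 * pt(-abs(t_stat), df)

res <- t.test(a, b)
c(manual = t_stat,  t.test = unname(res$statistic))
c(manual = df,      t.test = unname(res$parameter))
c(manual = p_value, t.test = res$p.value)
```

Passing var.equal = TRUE to t.test would instead run the classic pooled-variance Student's t-test with length(a) + length(b) - 2 degrees of freedom.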

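Now for a real bioinformatics example. Below are expression data for one gene, PPM1B, from the TCGA kidney cancer dataset. The tumor and normal samples have already been labeled. We saved the table as a csv file and read it into R: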

setwd("D:\\PPM1B")   # folder that contains PPM1B.csv
dir()                # list the files to confirm the csv is there
data <- read.csv("PPM1B.csv",header = TRUE,sep = ",")
head(data)


> head(data)
                 X    PPM1B   Type
1 TCGA.B2.4101.01A 2.514681 Normal
2 TCGA.BP.4342.01A 2.397544 Normal
3 TCGA.B0.4691.01A 2.355844 Normal
4 TCGA.BP.4167.01A 2.381566 Normal
5 TCGA.B8.4620.01A 2.457742 Normal
6 TCGA.BP.4769.01A 3.015488 Normal

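Next, split the table by Type: rows labeled Tumor go into Tumor and rows labeled Normal go into Normal. Each result is a data frame, with the expression values in its PPM1B column: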

Tumor <- data[which(data$Type=="Tumor"),]
Normal <- data[which(data$Type=="Normal"),]
head(Tumor)
head(Normal)


> head(Tumor)
                   X    PPM1B  Type
536 TCGA.B0.5697.11A 3.308724 Tumor
537 TCGA.CZ.5454.11A 3.379635 Tumor
538 TCGA.CZ.5458.11A 3.396477 Tumor
539 TCGA.CJ.5672.11A 3.304051 Tumor
540 TCGA.CZ.5462.11A 3.416457 Tumor
541 TCGA.CZ.5453.11A 3.354339 Tumor
> head(Normal)
                 X    PPM1B   Type
1 TCGA.B2.4101.01A 2.514681 Normal
2 TCGA.BP.4342.01A 2.397544 Normal
3 TCGA.B0.4691.01A 2.355844 Normal
4 TCGA.BP.4167.01A 2.381566 Normal
5 TCGA.B8.4620.01A 2.457742 Normal
6 TCGA.BP.4769.01A 3.015488 Normal
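As a sanity check on the grouping, the sample type can also be read from the barcode itself. In TCGA barcodes the two digits of the fourth field are the sample-type code: 01 to 09 are tumor types (01 is primary solid tumor) and 10 to 19 are normal types (11 is solid tissue normal). The sketch below assumes the dotted barcode format shown in the X column, where those digits are characters 14 and 15; the vector names are made up for illustration.

```r
# A few barcodes in the same dotted format as the X column above.
barcodes <- c("TCGA.B2.4101.01A", "TCGA.BP.4342.01A", "TCGA.B0.5697.11A")

# Characters 14-15 hold the two-digit sample-type code.
code <- as.integer(substr(barcodes, 14, 15))

# TCGA convention: codes 01-09 are tumor types, codes 10-19 are normal types.
type_from_barcode <- ifelse(code < 10, "Tumor", "Normal")

data.frame(barcode = barcodes, code = code, type = type_from_barcode)
```

Comparing a column derived this way with the Type column, for example with table(), is a cheap way to catch labels that were swapped while preparing the file.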

Now use t.test to check whether PPM1B expression differs between the tumor group and the normal group:

t.test(Tumor$PPM1B,Normal$PPM1B)



> t.test(Tumor$PPM1B,Normal$PPM1B)

	Welch Two Sample t-test

data:  Tumor$PPM1B and Normal$PPM1B
t = 21.302, df = 160.12, p-value < 2.2e-16
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 0.5246910 0.6319213
sample estimates:
mean of x mean of y 
 3.245194  2.666888
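The same comparison can be written without splitting the table first, using the formula interface of t.test. The sketch below uses a small simulated data frame (sim, with made-up values) as a stand-in for the real PPM1B table. Note that with a formula the groups are ordered by factor level, so the sign of t can flip relative to the two-vector call while the p-value stays the same.

```r
set.seed(7)
# Simulated stand-in for the PPM1B table (values are made up for illustration).
sim <- data.frame(
  PPM1B = c(rnorm(30, mean = 2.7, sd = 0.3), rnorm(12, mean = 3.2, sd = 0.2)),
  Type  = rep(c("Normal", "Tumor"), times = c(30, 12))
)

# Formula interface: response ~ grouping variable with exactly two levels.
res_formula <- t.test(PPM1B ~ Type, data = sim)

# Two-vector call, in the style used in this article.
res_vectors <- t.test(sim$PPM1B[sim$Type == "Tumor"],
                      sim$PPM1B[sim$Type == "Normal"])

res_formula$p.value
res_vectors$p.value
```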

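The p-value is below 2.2e-16, so the difference is highly significant: PPM1B expression differs between the tumor group and the normal group. Let's look at the mean expression in each group: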

mean(Tumor$PPM1B)
mean(Normal$PPM1B)



> mean(Tumor$PPM1B)
[1] 3.245194
> mean(Normal$PPM1B)
[1] 2.666888
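Group means can also be computed in one step from the unsplit table with tapply. A minimal sketch, using a tiny made-up data frame (sim) in place of the real data:

```r
# Tiny made-up table with the same column names as the PPM1B data.
sim <- data.frame(
  PPM1B = c(2.5, 2.4, 2.6, 3.3, 3.4, 3.2),
  Type  = c("Normal", "Normal", "Normal", "Tumor", "Tumor", "Tumor")
)

# Mean expression per group, named by the values of Type.
group_means <- tapply(sim$PPM1B, sim$Type, mean)
group_means

# Difference in means (Tumor minus Normal); the confidence interval
# reported by t.test(tumor, normal) is an interval for this quantity.
mean_diff <- group_means[["Tumor"]] - group_means[["Normal"]]
mean_diff
```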

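The mean is higher in the group labeled Tumor (3.25 vs 2.67), which suggests that PPM1B is upregulated in tumors and might be a tumor-promoting gene, although differential expression alone does not establish that. One caveat: in TCGA barcodes the sample-type code 01 denotes a primary solid tumor and 11 denotes solid tissue normal, yet the samples printed above with 01A are labeled Normal and those with 11A are labeled Tumor. The Type column should be checked against the barcodes, because if the labels are swapped the direction of the change is reversed.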

So now that we know PPM1B is differentially expressed, how can we show it in a figure? Next time I will share the code for drawing a boxplot!