Biostatistics Homework 4

Q1

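Researchers measured TP53 expression in tumor samples and matched adjacent normal tissue samples from 6 gastric cancer patients. Use the Wilcoxon signed-rank test and the sign test, respectively, to test whether TP53 expression differs between the tumor and normal samples of these patients.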

Table 1. TP53 expression (FPKM) in tumor and normal samples from gastric cancer patients

Patient       1       2       3       4       5       6
Tumor     30.09   18.68   19.90    5.13   39.69   30.87
Normal    30.25   17.14   25.79    8.64   36.42   31.93

1) Wilcoxon signed-rank test
> t=c(30.09,18.68,19.90,5.13,39.69,30.87)   # tumor samples
> n=c(30.25,17.14,25.79,8.64,36.42,31.93)   # matched normal samples
> wilcox.test(t,n,paired=TRUE,exact=FALSE)   # paired test, normal approximation


p > 0.05, so TP53 expression does not differ significantly between tumor and normal samples.
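To see where the statistic V printed by `wilcox.test` comes from, here is a minimal sketch that computes the signed-rank statistic and its normal-approximation p-value by hand. It assumes there are no ties among the absolute differences (true for these six pairs) and mirrors the continuity correction applied when `exact=FALSE`; the variable names are illustrative only.

```r
# Manual Wilcoxon signed-rank test for the paired TP53 data in Table 1
tumor  <- c(30.09, 18.68, 19.90, 5.13, 39.69, 30.87)
normal <- c(30.25, 17.14, 25.79, 8.64, 36.42, 31.93)

d <- tumor - normal          # paired differences (tumor - normal)
d <- d[d != 0]               # zero differences are dropped, as wilcox.test does
r <- rank(abs(d))            # rank the absolute differences
V <- sum(r[d > 0])           # V = sum of ranks of the positive differences

# Normal approximation with continuity correction (assumes no ties in |d|)
m     <- length(d)
mu    <- m * (m + 1) / 4
sigma <- sqrt(m * (m + 1) * (2 * m + 1) / 24)
z     <- (V - mu - sign(V - mu) * 0.5) / sigma
p_value <- 2 * pnorm(-abs(z))  # two-sided p-value
```

Only the differences 1.54 and 3.27 are positive, so V is small relative to its null mean m(m+1)/4 = 10.5, but not small enough to reach significance with six pairs.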

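2) Sign test
> d=t-n                            # paired differences (tumor - normal)
> d
[1] -0.16  1.54 -5.89 -3.51  3.27 -1.06
> n_pairs=sum(d!=0)                # 6 non-zero differences
> k=min(sum(d>0),sum(d<0))         # 2 positive vs 4 negative, so k = 2
> pvalue=2*pbinom(k,n_pairs,0.5)   # exact two-sided binomial p-value
> pvalue
[1] 0.6875
# For large samples, the binomial can be replaced by a normal approximation (with continuity correction):
> num=k+0.5-n_pairs/2
> sigma=sqrt(n_pairs/4)
> z=num/sigma
> p=2*pnorm(z)
> p
[1] 0.6830914
p > 0.05, so there is no significant difference. With only 6 pairs, the exact binomial p-value (0.6875) is the more appropriate one to report.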
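Base R's `binom.test` runs the same exact sign test in a single call, since under the null hypothesis the number of positive differences follows Binomial(n, 0.5). A short sketch, reusing the Table 1 values:

```r
# Sign test as an exact binomial test on the signs of the paired differences
tumor  <- c(30.09, 18.68, 19.90, 5.13, 39.69, 30.87)
normal <- c(30.25, 17.14, 25.79, 8.64, 36.42, 31.93)

d <- tumor - normal
n_pos     <- sum(d > 0)    # pairs where tumor > normal
n_nonzero <- sum(d != 0)   # zero differences are discarded in the sign test
sign_test <- binom.test(n_pos, n_nonzero, p = 0.5)
sign_test$p.value
```

Because the binomial with p = 0.5 is symmetric, this two-sided p-value equals 2*pbinom(2, 6, 0.5) from the transcript.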


Q2

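The body-fat percentages of 10 young male employees and 10 middle-aged male employees of a company were measured (Table 2). Use the Wilcoxon rank-sum test to determine whether middle-aged male employees have a higher body-fat percentage than young male employees.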

Table 2. Body-fat percentage (%) of the two groups of employees

Young men          15  17  21  14  18  16  20  13  20  17
Middle-aged men    20  16  19  18  17  22  21  18  19  16

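> x=c(15,17,21,14,18,16,20,13,20,17)   # young men
> y=c(20,16,19,18,17,22,21,18,19,16)   # middle-aged men
> # one-sided test: H1 is that young men (x) tend to have lower body fat than middle-aged men (y)
> wilcox.test(x,y,alternative="less",exact=FALSE)

p > 0.05, so there is no significant evidence that middle-aged men have a higher body-fat percentage than young men.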
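As a cross-check, a minimal sketch of the rank-sum test by hand: W is the rank sum of the young group minus n1(n1+1)/2, and the one-sided p-value uses the normal approximation with the tie correction and continuity correction that `wilcox.test` applies when `exact=FALSE`. Variable names are illustrative.

```r
# Manual Wilcoxon rank-sum (Mann-Whitney) test for Table 2
young  <- c(15, 17, 21, 14, 18, 16, 20, 13, 20, 17)
middle <- c(20, 16, 19, 18, 17, 22, 21, 18, 19, 16)

n1 <- length(young)
n2 <- length(middle)
r  <- rank(c(young, middle))               # tied values get mid-ranks
W  <- sum(r[1:n1]) - n1 * (n1 + 1) / 2     # statistic reported as W by wilcox.test

# Normal approximation: variance reduced for ties, continuity correction for "less"
NTIES <- table(r)
mu    <- n1 * n2 / 2
sigma <- sqrt((n1 * n2 / 12) *
              ((n1 + n2 + 1) - sum(NTIES^3 - NTIES) / ((n1 + n2) * (n1 + n2 - 1))))
z     <- (W - mu + 0.5) / sigma
p_one_sided <- pnorm(z)                    # H1: young group tends to be lower
```

The many tied values (16, 17, 18, 20 each appear three times) are why the exact test is not available here and the tie-corrected approximation is used.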

 

Q3

Alzheimer's Disease (AD) is the most common cause of dementia, a group of brain disorders that results in the loss of intellectual and social skills. These changes are severe enough to interfere with day-to-day life. AD is characterized by the presence of senile plaques and neurofibrillary tangles in cortical regions of the brain. These pathological markers are thought to be responsible for the massive cortical neurodegeneration and the concomitant loss of memory and reasoning, as well as other aberrant behaviors, seen in patients with AD.

Here we have gene expression data from normal neurons and neurons containing neurofibrillary tangles of 14 mid-stage AD cases. It can be found in “expression_data.txt”. Columns 1-7 of “expression_data.txt” are normal neurons and columns 8-14 are neurons with neurofibrillary tangles. Use the information above to answer the following questions:

a)    Use a t-test to find genes that are significantly differentially expressed between normal and tangle neuron samples (p-value < 0.01). Give the number of differentially expressed genes and the names of the top 10 most significantly differentially expressed genes.

Hint1: the two types of t-test, with equal variance and with unequal variance, are different.

Hint2: “names()” or “rownames()” can be used to extract the names of differentially expressed genes.

Hint3: “apply(data,1,function(x){…})” applies a function to every row of data more concisely than “for{}”, so try to use “apply”.

b)    Adjust the p-values from question a) with both the “bonferroni” and “fdr” methods to find differentially expressed genes (adjusted p-value < 0.01). Give the number of differentially expressed genes.

Hint: you can do the adjustment according to the formula, or use “p.adjust()” instead.

a)
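> a=read.csv("expression_data.txt",header=TRUE,sep="\t")   # rows = genes (gene names as row names), columns 1-14 = samples
> # F-test for equal variances between normal (columns 1-7) and tangle (columns 8-14) neurons
> vartest=apply(a,1,function(x){
+     var.test(x[1:7],x[8:14])$p.value})
> data=cbind(a,vartest)            # column 15 now holds the F-test p-value
> # pooled t-test if the variances look equal (F-test p >= 0.05), Welch t-test otherwise
> ttest=apply(data,1,function(x){
+     if(x[15]>=0.05){
+         t.test(x[1:7],x[8:14],var.equal=TRUE)$p.value}
+     else{t.test(x[1:7],x[8:14],var.equal=FALSE)$p.value}})
> diff=ttest[which(ttest<0.01)]
> length(diff)
[1] 2124
> diffsort=sort(diff,decreasing = FALSE)   # smallest p-value first
> names(diffsort)[1:10]
 [1] "TECPR2"       "ZMAT2"        "NT5DC3"       "SIN3A"        "HYOU1"        "UBE2Z"        "SDR39U1"     
 [8] "GNL2"         "VIPAS39"      "LOC100505584"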

2124 genes are significantly differentially expressed (p < 0.01). The 10 most significant (smallest p-values) are "TECPR2", "ZMAT2", "NT5DC3", "SIN3A", "HYOU1", "UBE2Z", "SDR39U1", "GNL2", "VIPAS39", "LOC100505584".
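The F-test-then-t-test logic above can be wrapped in a small reusable function. The sketch below is a hypothetical helper (`row_t_pvalues` is not part of the assignment) exercised on simulated data, since “expression_data.txt” is not reproduced here; the simulated shift in three "genes" is an assumption made only to have something to detect.

```r
# Hypothetical helper: per-row two-sample t-test, choosing the pooled or Welch
# variant from an F-test on the two group variances
row_t_pvalues <- function(m, g1 = 1:7, g2 = 8:14, alpha_var = 0.05) {
  apply(m, 1, function(x) {
    equal_var <- var.test(x[g1], x[g2])$p.value >= alpha_var
    t.test(x[g1], x[g2], var.equal = equal_var)$p.value
  })
}

# Simulated 20 x 14 matrix: columns 1-7 "normal", 8-14 "tangle"
set.seed(1)
sim <- matrix(rnorm(20 * 14), nrow = 20,
              dimnames = list(paste0("gene", 1:20), NULL))
sim[1:3, 8:14] <- sim[1:3, 8:14] + 6   # shift three simulated genes

p   <- row_t_pvalues(sim)              # named vector, one p-value per gene
top <- names(sort(p))[1:3]             # most significant genes first
```

Passing a numeric matrix (rather than a data frame with an extra p-value column) keeps the column indices fixed and avoids the `x[15]` bookkeeping.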
 
b)
> aj1=p.adjust(ttest,method = "fdr")          # Benjamini-Hochberg
> ajj1=which(aj1<0.01)
> length(ajj1)
[1] 1049

> aj2=p.adjust(ttest,method = "bonferroni")
> ajj2=which(aj2<0.01)
> length(ajj2)
[1] 364

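After FDR (Benjamini-Hochberg) and Bonferroni adjustment, 1049 and 364 genes, respectively, are differentially expressed at adjusted p < 0.01.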
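The hint for b) allows doing the adjustment by formula. A minimal sketch of both formulas in base R, applied to a made-up vector of ten p-values (not taken from “expression_data.txt”); it is written to agree with `p.adjust()`:

```r
# Bonferroni: multiply each p-value by the number of tests m, capped at 1
bonferroni <- function(p) pmin(1, p * length(p))

# Benjamini-Hochberg ("fdr"): p_(i) * m / i, made monotone from the largest p down
bh <- function(p) {
  m <- length(p)
  o <- order(p, decreasing = TRUE)   # indices from largest to smallest p
  i <- m:1                           # ascending rank of each of those p-values
  adj <- pmin(1, cummin(p[o] * m / i))
  adj[order(o)]                      # restore the original order
}

p <- c(0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216)
bh(p)
bonferroni(p)
```

Since p_(i) * m / i is never larger than p_(i) * m, BH-adjusted values never exceed Bonferroni-adjusted ones, which is consistent with FDR keeping more genes (1049) than Bonferroni (364).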
