<Question4> of R & Biostatistics

Question

  1. 大量检测已知正常人血浆载脂蛋白E总体平均水平为4.15mmol/L,总体分布近似于正态分布。某医师经抽样测得16例陈旧性心机梗死患者的血浆载脂蛋白E平均浓度为4.98mmol/L,标准差为2.78mmol/L。据此能否认为陈旧性心肌梗死患者的血浆载脂蛋白E平均浓度与正常人的平均浓度不一致?并给出置信区间(显著性水平)

  2. 为研究某铁剂治疗和饮食治疗对营养性缺铁性贫血的效果,将16名患者分成2组,分别使用铁剂治疗和饮食治疗,3个月后测得两种患者血红蛋白如表1,试用wilcoxon rank sum test, 检验两种方法治疗后的患者血红蛋白有无差异?

      		表1 铁剂治疗和饮食治疗后对患者的血红蛋白值 (g/L)
    
铁剂治疗组113120138120100118138123
饮食治疗组138116125136110132130110
  1. Researchers want to analyze the detail symptoms of Acute Keshan disease in a place. The result of phosphorus content (mg%) in patients and healthy people is showed in “homework4-3_data.txt”, in which column 1 represents phosphorus content of patients and 2 represents that of healthy people. We want to know if phosphorus content (mg%) in patients and healthy people is significantly different. (significance level: 0.05)
    (Please record your answer, R code, and R result)
    (1)Test if the mean of the two groups are different. Note that you should test if both two sample groups are drawn from normal distribution(Hint: shapiro.test()) and whether the variances of the two groups are the same or not(Hint: F test).
    (2)Assume that none of two sample groups is drawn from normal distribution, test if the mean of the two groups are different.(Hint: use non-parametrical method)

  2. 有一批12岁肥胖男童参加某减肥训练营,其中10人减肥训练前和减肥训练两个月后的BMI值如下:

012345678910
减肥前25.4626.1026.1426.6826.9627.0427.1527.6427.1126.37
减肥后23.6824.3024.1525.2325.4926.0325.4726.2225.7624.77

假设两组样本总体近似于正态分布,且方差相同
(1)2个月减肥训练是否有显著效果?(使用配对的两样本t检验,低尾检验)
(2)正常12岁男童的BMI平均为23.2, 这些肥胖男童在减肥两个月后的BMI是否仍高于正常同龄人群平均水平?
(3)计算第(2)问中,在检验显著水平为0.05时的统计效力(power)
(4)为了确定12岁肥胖男童在减肥训练2月后的BMI均值仍显著高于正常同龄人平均水平,且保证type II error:β=0.05 以及显著水平为0.001,请估计合适的样本数量样本数量

  1. Suppose we have separately analyzed the effects of 10 SNPs comparing people with type 1 diabetes vs. controls. The p values from these separate analyses are given in homework4-5_data.csv. (Use 0.05 as significant level)
    (1)Use the Bonferroni method to correct for multiple comparisons. Which SNPs show statistically significant effects?
    (2)Use the FDR method to correct for multiple comparisons using an FDR = 0.05. Which SNPs show statistically significant effects? How do the results compare with those in (1)

  2. We have expression data of 20 genes from two groups of samples in homework4-6_data.txt. The significant level is 0.05.
    (1)Assess whether there are differential expressions between two groups of each gene.
    (2)Use the Bonferroni method to correct for multiple comparisons in Problem (1). Which genes show statistically significant differential expression?
    (3)Use the FDR method to correct for multiple comparisons using an FDR = 0.05. Which genes show statistically significant differential expression?

Answer

1

(1)建立检验假设和确定检验水准

H 0 H_{0} H0: μ = μ 0 = 4.15 \mu=\mu_{0}=4.15 μ=μ0=4.15
H 1 H_{1} H1: μ ≠ μ 0 = 4.15 \mu\neq\mu_{0}=4.15 μ=μ0=4.15
α = 0.05 \alpha=0.05 α=0.05 双侧检验

(2)选定检验方法和计算统计量用单样本的t检验

X ‾ = 4.98 \overline{X}=4.98 X=4.98, μ 0 = 4.15 \mu_{0}=4.15 μ0=4.15, s = 2.78 s=2.78 s=2.78, n = 16 n=16 n=16
t = X ‾ − μ 0 s / n t=\frac{\overline{X}-\mu_{0}}{s/\sqrt{n}} t=s/n Xμ0

(3)计算P值、得出结论

alpha <- 0.05
x_bar <- 4.98
mu <- 4.15
sigma <- 2.78
n <- 16
sd_error <- sigma/sqrt(n)
th <- qt(1-alpha/2, n-1)


t = (x_bar - mu) / (sigma / sqrt(n))
p <- 2 * pt(t, n-1);p


left <- x_bar - th * sd_error
right <- x_bar + th * sd_error
left;right

p = 1.749074 > α p=1.749074 > \alpha p=1.749074>α,接受 H 0 H_{0} H0,即认为陈旧性心肌梗死患者的血浆载脂蛋白E平均浓度与正常人的一致。置信区间为[3.498643, 6.461357]。

2

x <- c(113, 120, 138, 120, 100, 118, 138, 123)
y <- c(138, 116, 125, 136, 110, 132, 130, 110)
wilcox.test(x, y, paired = TRUE, exact = FALSE)

p-value = 0.4406,因此不拒绝 H 0 H_{0} H0,认为两样本的均值不存在差异,即两种方法治疗后的患者血红蛋白不存在差异。

3

(1)

1)正态性检验

H 0 H_{0} H0: 总体服从正态分布
H 1 H_{1} H1: 总体服从正态分布
α = 0.05 \alpha = 0.05 α=0.05

data3 <- read.csv("homework4-3_data.txt", sep = ' ')
shapiro.test(data3$patient)
shapiro.test(data3$healthy)

p − v a l u e > α = 0.05 p-value > \alpha=0.05 pvalue>α=0.05, 故都不拒绝 H 0 H_{0} H0,即认为两组样本的来源总体符合正态分布

2)方差齐性检验

H 0 H_{0} H0: σ 0 2 = σ 1 2 \sigma_{0}^2 = \sigma_{1}^2 σ02=σ12
H 1 H_{1} H1: σ 0 2 ≠ σ 1 2 \sigma_{0}^2 \neq \sigma_{1}^2 σ02=σ12
α = 0.05 \alpha = 0.05 α=0.05

# attach(data3)
# var.test(patient, healthy)
# detach(data3)

var.test(data3$patient, data3$healthy)

p − v a l u e = 0.4077 > α p-value = 0.4077 > \alpha pvalue=0.4077>α, 故不拒绝 H 0 H_{0} H0,即两样本方差相同

3)计算检验统计量

H 0 H_{0} H0: μ 0 = μ 1 \mu_{0} = \mu_{1} μ0=μ1
H 1 H_{1} H1: μ 0 ≠ μ 1 \mu_{0} \neq \mu_{1} μ0=μ1
α = 0.05 \alpha = 0.05 α=0.05

t.test(data3$patient, data3$healthy, var.equal = TRUE)

p − v a l u e = 2.2 e − 16 < α p-value = 2.2e-16 < \alpha pvalue=2.2e16<α, 故拒绝 H 0 H_{0} H0,即两样本均值不同

(2)

H 0 H_{0} H0: μ 0 = μ 1 \mu_{0} = \mu_{1} μ0=μ1
H 1 H_{1} H1: μ 0 ≠ μ 1 \mu_{0} \neq \mu_{1} μ0=μ1
α = 0.05 \alpha = 0.05 α=0.05

wilcox.test(data3$patient, data3$healthy, exact = FALSE)

There is significant difference in the treatment effects of the two groups.

p − v a l u e = 1.791 e − 14 < α p-value = 1.791e-14 < \alpha pvalue=1.791e14<α, 故拒绝 H 0 H_{0} H0,即两样本均值不相同

4

(1)

H 0 H_{0} H0: μ 0 > = μ 1 \mu_{0} >= \mu_{1} μ0>=μ1
H 1 H_{1} H1: μ 0 < μ 1 \mu_{0} < \mu_{1} μ0<μ1
α = 0.05 \alpha = 0.05 α=0.05

befor <- c(25.46,	26.10,	26.14, 26.68,	26.96,	27.04,	27.15,	27.64,	27.11,	26.37)
after <- c(23.68,	24.30,	24.15, 25.23,	25.49,	26.03,	25.47,	26.22,	25.76,	24.77)
t.test(befor, after, paired = TRUE, var.equal = TRUE, alternative = "less")

p − v a l u e = 1 > α p-value = 1 > \alpha pvalue=1>α, 故不拒绝 H 0 H_{0} H0,即2个月减肥训练没有显著效果

(2)

H 0 H_{0} H0: μ 0 < = μ 1 \mu_{0} <= \mu_{1} μ0<=μ1
H 1 H_{1} H1: μ 0 > μ 1 \mu_{0} > \mu_{1} μ0>μ1
α = 0.05 \alpha = 0.05 α=0.05

t.test(after, mu = 23.2, alternative = "greater")

p − v a l u e = 2.858 e − 05 < α p-value = 2.858e-05 < \alpha pvalue=2.858e05<α, 故拒绝 H 0 H_{0} H0,即2个月减肥训练后的BMI仍高于正常同龄人群平均水平

(3)

library(pwr)
d = abs(23.2 - mean(after))/sd(after)
pwr.t.test(n = 10, d = d, type = "one.sample", alternative = "less")

(4)

pwr.t.test(d = d, sig.level = 0.001, power = 0.95, type = "one.sample", alternative = "greater")

估计合适的样本数量样本数量n = 10

5

(1)

data5 <- read.csv("homework4-5_data.csv", header = TRUE, sep = ",")
data5$Bonferroni = p.adjust(data5$p.value, method = "bonferroni", n = 10)
data5[data5$Bonferroni <= 0.05, ]

(2)

data5$FDR = p.adjust(data5$p.value, method = "fdr", n = 10)
data5[data5$FDR <= 0.05, ]

Bonferroni correction比FDR更加严格。

6

(1)

data6 <- read.table("homework4-6_data.txt", header = TRUE, sep = "\t", stringsAsFactors = FALSE)
rownames(data6) <- data6$Sample
data6 <- data6[, 2:81]

genes_value = function(x){
  p_value = t.test(x[1:40], x[41:80], paired = TRUE)$p.value
  p_value
}
p_t_test = apply(data6, 1, genes_value)
names(p_t_test[p_t_test < 0.05])

(2)

p_bf = p.adjust(p_t_test, method = "bonferroni")
names(p_bf[p_bf < 0.05])

(3)

p_fdr = p.adjust(p_t_test, method = "fdr")
names(p_fdr[p_fdr < 0.05])
  • 2
    点赞
  • 2
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值