<Question4> of R & Biostatistics

Category columns: R - Biostatistics, Biostatistics. Tags: biology, statistics, R language

Article link: https://blog.csdn.net/qq_42937176/article/details/105704146


Question

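1. Extensive testing has established that the population mean plasma apolipoprotein E level in healthy people is 4.15 mmol/L, and that the population is approximately normally distributed. A physician sampled 16 patients with old myocardial infarction and measured a mean plasma apolipoprotein E concentration of 4.98 mmol/L with a standard deviation of 2.78 mmol/L. Can we conclude that the mean plasma apolipoprotein E concentration of patients with old myocardial infarction differs from that of healthy people? Also give the confidence interval (at the chosen significance level).
2. To study the effect of an iron preparation and of dietary therapy on nutritional iron-deficiency anemia, 16 patients were divided into 2 groups, one treated with the iron preparation and one with dietary therapy. After 3 months the hemoglobin of both groups was measured (Table 1). Use the Wilcoxon rank sum test to check whether hemoglobin differs between the two treatments.
```
  		Table 1. Hemoglobin (g/L) of patients after iron treatment and dietary treatment
```

Iron treatment group	113	120	138	120	100	118	138	123
Dietary treatment group	138	116	125	136	110	132	130	110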

3. Researchers want to analyze the detailed symptoms of acute Keshan disease in one region. The phosphorus content (mg%) of patients and healthy people is given in "homework4-3_data.txt", where column 1 is the phosphorus content of patients and column 2 is that of healthy people. We want to know whether phosphorus content (mg%) differs significantly between patients and healthy people. (Significance level: 0.05)
(Please record your answer, R code, and R result)
(1) Test whether the means of the two groups differ. Note that you should first test whether both samples are drawn from normal distributions (Hint: shapiro.test()) and whether the variances of the two groups are equal (Hint: F test).
(2) Assume that neither sample is drawn from a normal distribution, and test whether the two groups differ. (Hint: use a non-parametric method)
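4. A group of obese 12-year-old boys attended a weight-loss training camp. The BMI values of 10 of them before training and after two months of training are as follows:

Subject	1	2	3	4	5	6	7	8	9	10
Before training	25.46	26.10	26.14	26.68	26.96	27.04	27.15	27.64	27.11	26.37
After training	23.68	24.30	24.15	25.23	25.49	26.03	25.47	26.22	25.76	24.77

Assume that both populations are approximately normally distributed with equal variances.
(1) Did the 2-month training have a significant effect? (Use a paired two-sample t-test, lower-tailed.)
(2) The mean BMI of normal 12-year-old boys is 23.2. Is the BMI of these obese boys after two months of training still higher than the average of their normal peers?
(3) Compute the statistical power of the test in (2) at a significance level of 0.05.
(4) To establish that the mean BMI of obese 12-year-old boys after 2 months of training is still significantly higher than the average of their normal peers, with type II error β = 0.05 and a significance level of 0.001, estimate the required sample size.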

5. Suppose we have separately analyzed the effects of 10 SNPs comparing people with type 1 diabetes vs. controls. The p-values from these separate analyses are given in homework4-5_data.csv. (Use 0.05 as the significance level.)
(1) Use the Bonferroni method to correct for multiple comparisons. Which SNPs show statistically significant effects?
(2) Use the FDR method to correct for multiple comparisons using an FDR = 0.05. Which SNPs show statistically significant effects? How do the results compare with those in (1)?
6. We have expression data of 20 genes from two groups of samples in homework4-6_data.txt. The significance level is 0.05.
(1) For each gene, assess whether it is differentially expressed between the two groups.
(2) Use the Bonferroni method to correct for multiple comparisons in Problem (1). Which genes show statistically significant differential expression?
(3) Use the FDR method to correct for multiple comparisons using an FDR = 0.05. Which genes show statistically significant differential expression?

Answer

1

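(1) State the hypotheses and set the significance level

$H_{0}$ : $\mu=\mu_{0}=4.15$
$H_{1}$ : $\mu\neq\mu_{0}=4.15$
$\alpha=0.05$, two-sided test

(2) Choose the test and compute the statistic: one-sample t-test

$\overline{X}=4.98$ , $\mu_{0}=4.15$ , $s = 2.78$ , $n = 16$
$t=\frac{\overline{X}-\mu_{0}}{s/\sqrt{n}}$

(3) Compute the P-value and draw a conclusion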
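
As a quick check of the arithmetic, the statistic can be evaluated by hand from the summary values given in the question (the rounding to three decimals is mine):

$t=\frac{4.98-4.15}{2.78/\sqrt{16}}=\frac{0.83}{0.695}\approx 1.194$ , with $n-1=15$ degrees of freedom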

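alpha <- 0.05   # significance level
x_bar <- 4.98   # sample mean
mu <- 4.15      # hypothesized population mean
s <- 2.78       # sample standard deviation
n <- 16
sd_error <- s / sqrt(n)          # standard error of the mean
th <- qt(1 - alpha/2, n - 1)     # two-sided critical value


t <- (x_bar - mu) / sd_error
# Two-sided P-value: twice the upper-tail area beyond |t|
p <- 2 * pt(abs(t), n - 1, lower.tail = FALSE); p


# 95% confidence interval for the mean
left <- x_bar - th * sd_error
right <- x_bar + th * sd_error
left; right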

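Since $P > \alpha$, we do not reject $H_{0}$: there is insufficient evidence that the mean plasma apolipoprotein E concentration of patients with old myocardial infarction differs from that of healthy people. The 95% confidence interval is [3.498643, 6.461357].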
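
The calculation above can also be wrapped in a small function, which makes it easier to reuse when only summary statistics are available. This is a minimal sketch: `t_test_summary` is a hypothetical helper name, not a base R function, and it assumes a two-sided test.

```r
# One-sample t-test from summary statistics (no raw data needed).
# t_test_summary is a hypothetical helper, not part of base R.
t_test_summary <- function(x_bar, s, n, mu0, alpha = 0.05) {
  se <- s / sqrt(n)                       # standard error of the mean
  t_stat <- (x_bar - mu0) / se
  df <- n - 1
  # Two-sided P-value: twice the upper-tail area beyond |t|
  p_value <- 2 * pt(abs(t_stat), df, lower.tail = FALSE)
  half_width <- qt(1 - alpha / 2, df) * se
  list(t = t_stat, df = df, p.value = p_value,
       conf.int = c(x_bar - half_width, x_bar + half_width))
}

res <- t_test_summary(x_bar = 4.98, s = 2.78, n = 16, mu0 = 4.15)
print(res)
```

Taking the absolute value of the statistic keeps the P-value inside [0, 1] whichever side of $\mu_{0}$ the sample mean falls on.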

2

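x <- c(113, 120, 138, 120, 100, 118, 138, 123)  # iron treatment group
y <- c(138, 116, 125, 136, 110, 132, 130, 110)  # dietary treatment group
# The two groups are independent, so use the rank sum test (paired = FALSE, the default)
wilcox.test(x, y, exact = FALSE)

The p-value is greater than $\alpha = 0.05$, so we do not reject $H_{0}$: there is no evidence that hemoglobin differs between patients treated with the two methods.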
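
To see what the rank sum test measures, the statistic can be computed by hand from the pooled ranks. The sketch below uses the data from Table 1; `W_manual` is an illustrative name, and ties receive mid-ranks, which is the default behavior of `rank()`.

```r
x <- c(113, 120, 138, 120, 100, 118, 138, 123)  # iron treatment group
y <- c(138, 116, 125, 136, 110, 132, 130, 110)  # dietary treatment group

# Rank all 16 values together; tied values share the average rank
r <- rank(c(x, y))
n1 <- length(x)

# Rank sum of the first group minus its smallest possible value
W_manual <- sum(r[seq_len(n1)]) - n1 * (n1 + 1) / 2

# Compare with the statistic reported by wilcox.test for independent samples
res <- wilcox.test(x, y, exact = FALSE)
c(manual = W_manual, wilcox = unname(res$statistic))
```

`exact = FALSE` requests the normal approximation, which is the appropriate choice here because the data contain ties.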

3

(1)

1) Normality test

$H_{0}$ : the population is normally distributed
$H_{1}$ : the population is not normally distributed
$\alpha = 0.05$

data3 <- read.csv("homework4-3_data.txt", sep = ' ')
shapiro.test(data3$patient)
shapiro.test(data3$healthy)

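Both P-values are greater than $\alpha=0.05$, so $H_{0}$ is not rejected for either group; both samples are taken to come from normally distributed populations.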

2) Test for homogeneity of variance

$H_{0}$ : $\sigma_{0}^2 = \sigma_{1}^2$
$H_{1}$ : $\sigma_{0}^2 \neq \sigma_{1}^2$
$\alpha = 0.05$

# attach(data3)
# var.test(patient, healthy)
# detach(data3)

var.test(data3$patient, data3$healthy)

Since $P > \alpha$, we do not reject $H_{0}$; the two variances are taken to be equal.

3) Compute the test statistic

$H_{0}$ : $\mu_{0} = \mu_{1}$
$H_{1}$ : $\mu_{0} \neq \mu_{1}$
$\alpha = 0.05$

t.test(data3$patient, data3$healthy, var.equal = TRUE)

Since $P < \alpha$, we reject $H_{0}$; the means of the two groups differ.
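
The three steps in (1) can be chained into a single helper. The following is only a sketch under stated assumptions: `compare_means` is a hypothetical name, and the data are simulated with `rnorm()` because the contents of homework4-3_data.txt are not reproduced in this post.

```r
set.seed(1)
# Simulated stand-ins for the two columns of the data file (assumption)
patient <- rnorm(13, mean = 4.5, sd = 1)
healthy <- rnorm(13, mean = 3.5, sd = 1)

# Hypothetical helper: normality check, then variance check, then the test
compare_means <- function(a, b, alpha = 0.05) {
  normal <- shapiro.test(a)$p.value > alpha &&
    shapiro.test(b)$p.value > alpha
  # Fall back to the rank sum test when normality is rejected
  if (!normal) return(wilcox.test(a, b, exact = FALSE))
  # F test decides between the pooled and the Welch t-test
  equal_var <- var.test(a, b)$p.value > alpha
  t.test(a, b, var.equal = equal_var)
}

res <- compare_means(patient, healthy)
res$method
res$p.value
```

The helper returns a standard `htest` object, so the method actually chosen can be read from `res$method`.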

(2)

$H_{0}$ : $\mu_{0} = \mu_{1}$
$H_{1}$ : $\mu_{0} \neq \mu_{1}$
$\alpha = 0.05$

wilcox.test(data3$patient, data3$healthy, exact = FALSE)

Since $P < \alpha$, we reject $H_{0}$: phosphorus content differs significantly between patients and healthy people.

4

(1)

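$H_{0}$ : $\mu_{1} \geq \mu_{0}$ ($\mu_{0}$ : mean BMI before training, $\mu_{1}$ : mean BMI after training)
$H_{1}$ : $\mu_{1} < \mu_{0}$
$\alpha = 0.05$

before <- c(25.46,	26.10,	26.14, 26.68,	26.96,	27.04,	27.15,	27.64,	27.11,	26.37)
after <- c(23.68,	24.30,	24.15, 25.23,	25.49,	26.03,	25.47,	26.22,	25.76,	24.77)
# Lower-tailed paired test on after - before (var.equal has no effect when paired = TRUE)
t.test(after, before, paired = TRUE, alternative = "less")

Since $P < \alpha$, we reject $H_{0}$: the 2-month training significantly reduced BMI.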
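
A paired t-test is the same as a one-sample t-test on the within-subject differences. The sketch below illustrates that equivalence with the BMI data from the question; the variable names `d`, `paired`, and `one_sample` are illustrative.

```r
before <- c(25.46, 26.10, 26.14, 26.68, 26.96, 27.04, 27.15, 27.64, 27.11, 26.37)
after  <- c(23.68, 24.30, 24.15, 25.23, 25.49, 26.03, 25.47, 26.22, 25.76, 24.77)

# Within-subject change; negative values mean BMI went down
d <- after - before

# Lower-tailed paired test: is mean(after - before) below zero?
paired <- t.test(after, before, paired = TRUE, alternative = "less")

# Equivalent one-sample test on the differences
one_sample <- t.test(d, mu = 0, alternative = "less")

c(paired = unname(paired$statistic), one_sample = unname(one_sample$statistic))
mean(d)
```

Writing the test on `after - before` makes the direction of the lower-tailed alternative explicit.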

(2)

$H_{0}$ : $\mu \leq \mu_{0} = 23.2$
$H_{1}$ : $\mu > \mu_{0} = 23.2$
$\alpha = 0.05$

t.test(after, mu = 23.2, alternative = "greater")

Since $P < \alpha$, we reject $H_{0}$: after 2 months of training, the BMI is still higher than the average of normal peers.

(3)

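library(pwr)
# Effect size (Cohen's d) relative to the reference mean 23.2
d <- abs(23.2 - mean(after)) / sd(after)
# Same direction as the test in (2): the alternative is mean > 23.2
pwr.t.test(n = 10, d = d, sig.level = 0.05, type = "one.sample", alternative = "greater")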
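
For readers without the pwr package, the power of a one-sided one-sample t-test can be computed in base R from the noncentral t distribution. This is a sketch that assumes the upper-tailed test from (2) and uses the observed mean and standard deviation as the true effect; the variable names are illustrative.

```r
after <- c(23.68, 24.30, 24.15, 25.23, 25.49, 26.03, 25.47, 26.22, 25.76, 24.77)
mu0 <- 23.2     # reference mean for normal peers
alpha <- 0.05

n <- length(after)
d <- (mean(after) - mu0) / sd(after)   # standardized effect size
df <- n - 1

# Reject H0 when the t statistic exceeds this critical value
t_crit <- qt(1 - alpha, df)

# Under H1 the statistic follows a noncentral t with ncp = d * sqrt(n)
power <- pt(t_crit, df, ncp = d * sqrt(n), lower.tail = FALSE)
power
```

Because the effect size is taken from the same sample, this is a post hoc power estimate and should be read with that caveat.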

(4)

pwr.t.test(d = d, sig.level = 0.001, power = 0.95, type = "one.sample", alternative = "greater")

The estimated required sample size is n = 10 (rounded up to a whole number).
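
The sample size can also be found without pwr by searching for the smallest n whose power reaches the target. This is a minimal sketch using the noncentral t distribution; `power_one_sample` is a hypothetical helper, and the effect size is again estimated from the observed data.

```r
after <- c(23.68, 24.30, 24.15, 25.23, 25.49, 26.03, 25.47, 26.22, 25.76, 24.77)
d <- (mean(after) - 23.2) / sd(after)   # standardized effect size

# Power of an upper-tailed one-sample t-test with n observations
power_one_sample <- function(n, d, alpha) {
  df <- n - 1
  pt(qt(1 - alpha, df), df, ncp = d * sqrt(n), lower.tail = FALSE)
}

# Increase n until power >= 1 - beta = 0.95 at alpha = 0.001
n <- 2
while (power_one_sample(n, d, alpha = 0.001) < 0.95) {
  n <- n + 1
}
n
power_one_sample(n, d, alpha = 0.001)
```

The search returns a whole number directly, so no rounding step is needed.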

5

(1)

data5 <- read.csv("homework4-5_data.csv", header = TRUE, sep = ",")
data5$Bonferroni = p.adjust(data5$p.value, method = "bonferroni", n = 10)
data5[data5$Bonferroni <= 0.05, ]

(2)

data5$FDR = p.adjust(data5$p.value, method = "fdr", n = 10)
data5[data5$FDR <= 0.05, ]

The Bonferroni correction is more stringent than the FDR method.
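
To make the difference between the two corrections concrete, both adjustments can be written out by hand and compared with `p.adjust`. The p-values below are hypothetical, chosen only for illustration; they are not the values in homework4-5_data.csv.

```r
# Hypothetical p-values for 10 tests (not the homework data)
p <- c(0.001, 0.004, 0.012, 0.02, 0.03, 0.04, 0.06, 0.2, 0.5, 0.9)
m <- length(p)

# Bonferroni: multiply every p-value by the number of tests, cap at 1
bonf <- pmin(1, p * m)

# Benjamini-Hochberg (FDR): scale by m / rank, then enforce monotonicity
# by taking a running minimum from the largest p-value downwards
o <- order(p, decreasing = TRUE)
bh <- numeric(m)
bh[o] <- pmin(1, cummin(p[o] * m / (m:1)))

data.frame(p = p, bonferroni = bonf, fdr = bh)
which(bonf <= 0.05)
which(bh <= 0.05)
```

Each FDR-adjusted value is never larger than the corresponding Bonferroni-adjusted value, which is why the FDR method can only flag the same tests or more.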

6

(1)

data6 <- read.table("homework4-6_data.txt", header = TRUE, sep = "\t", stringsAsFactors = FALSE)
rownames(data6) <- data6$Sample
data6 <- data6[, 2:81]   # keep the 80 expression columns (40 samples per group)

# Two-sample t-test for one gene (one row): columns 1-40 vs. columns 41-80
genes_value <- function(x) {
  t.test(x[1:40], x[41:80])$p.value
}
p_t_test <- apply(data6, 1, genes_value)
names(p_t_test[p_t_test < 0.05])

(2)

p_bf = p.adjust(p_t_test, method = "bonferroni")
names(p_bf[p_bf < 0.05])

(3)

p_fdr = p.adjust(p_t_test, method = "fdr")
names(p_fdr[p_fdr < 0.05])
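
The whole workflow for this question can be tried end to end on simulated data. The sketch below is an illustration only: the matrix `expr`, the gene names, and the shift of 1.5 applied to three genes are all assumptions, since homework4-6_data.txt is not reproduced here.

```r
set.seed(42)
n_genes <- 20
n_per_group <- 40

# Simulated expression matrix: rows are genes, columns are samples
expr <- matrix(rnorm(n_genes * 2 * n_per_group), nrow = n_genes,
               dimnames = list(paste0("gene", 1:n_genes), NULL))

# Make the first three genes truly different in group 1 (assumption)
expr[1:3, 1:n_per_group] <- expr[1:3, 1:n_per_group] + 1.5

g1 <- 1:n_per_group
g2 <- (n_per_group + 1):(2 * n_per_group)

# One two-sample t-test per gene
p_raw <- apply(expr, 1, function(x) t.test(x[g1], x[g2])$p.value)

# Genes still significant after each correction
sig_bonf <- names(p_raw)[p.adjust(p_raw, method = "bonferroni") < 0.05]
sig_fdr  <- names(p_raw)[p.adjust(p_raw, method = "fdr") < 0.05]
sig_bonf
sig_fdr
```

Every gene that passes the Bonferroni threshold also passes the FDR threshold, so `sig_bonf` is always a subset of `sig_fdr`.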
