Data2002 - WEEK2

Latest recommended article published on 2024-10-18 10:13:25

鱼鱼冲鸭

Tags: R

Article link: https://blog.csdn.net/m0_55541117/article/details/119742550

Poisson distribution

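A Poisson random variable $X$ counts the number of events occurring in a fixed interval of time or space, when events occur independently at a constant average rate $\lambda$. Its distribution gives the probability of each count: $P(X = x) = e^{-\lambda}\lambda^{x}/x!$ for $x = 0, 1, 2, \ldots$

For a Poisson distribution, both $E(X)$ and $Var(X)$ are equal to the parameter $\lambda$.

Simulation in R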

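# Simulate 10,000 Poisson draws and plot the observed counts (lambda = 6 here; substitute your own rate)
plot(table(rpois(n = 10000, lambda = 6)), ylab = "Count")

# The same plot written with the pipe; %>% needs dplyr (or magrittr) to be loaded
library(dplyr)

rpois(n = 10000, lambda = 6) %>% table() %>% plot(ylab = "Count")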
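
As a quick check of the claim that the mean and variance both equal λ, the simulation can be summarised numerically. This is only a sketch: the seed and the range 0 to 12 used for the comparison table are arbitrary choices, not part of the lecture.

```r
set.seed(2002)   # arbitrary seed, only so the run is reproducible
lambda <- 6      # same rate as in the simulation above
x <- rpois(n = 10000, lambda = lambda)

sample_mean <- mean(x)   # should be close to lambda
sample_var  <- var(x)    # should also be close to lambda
print(c(mean = sample_mean, var = sample_var))

# Compare empirical relative frequencies with the Poisson probabilities
# (values above 12 are rare for lambda = 6 and are dropped from this table)
empirical   <- as.numeric(table(factor(x, levels = 0:12))) / length(x)
theoretical <- dpois(0:12, lambda)
print(round(rbind(empirical, theoretical), 3))
```

With 10,000 draws the sample mean and variance should both land near 6, and the two rows of the table should agree to roughly two decimal places.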

Chi-squared tests for discrete distributions

Suppose we have a sample $x_1, x_2, \ldots, x_n$ and want to test whether it comes from a given distribution function $F_0(x \mid \theta_1, \theta_2, \ldots, \theta_h)$, where the $\theta_l$ are the parameters of the distribution.

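The general chi-squared goodness-of-fit test groups the data into $k$ categories and uses the test statistic

$T = \sum_{i=1}^{k} \frac{(Y_i - e_i)^2}{e_i}$

where $Y_i$ is the observed frequency in category $i$ and $e_i = n p_i$ is the expected frequency under the null distribution.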

However, the parameters $\theta$ are usually unknown and have to be estimated from the sample. Then $p_i$ is replaced by $\widehat{p_i}$. The observed value of the test statistic is denoted $t_0$.

Approximate p-value:

$P(\chi^{2}_{k-1-q} \ge t_0)$, where $k$ is the number of categories and $q$ is the number of parameters we need to estimate.

Example from the lecture:

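Hypothesis: $H_0$: the data come from a Poisson distribution vs $H_1$: the data do not come from a Poisson distribution.
Assumptions: The expected frequencies satisfy $e_i = n p_i \ge 5$. Observations are independent.
Test statistic: $T = \sum_{i=1}^{k} \frac{(Y_i - e_i)^2}{e_i}$. Under $H_0$, $T$ is approximately $\chi^{2}_{k-1-q}$ (here with 2 degrees of freedom).

The observed test statistic: $t_0 = 1.43$
P-value: $P(\chi^{2}_{2} \ge 1.43) = 0.489$, computed in R as pchisq(1.43, 2, lower.tail = FALSE)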

Note: the probability of the last category is $p_k = 1 - (p_1 + \cdots + p_{k-1})$, so that the category probabilities sum to 1.

Conclusion: Since the p-value is greater than 0.05, we do not reject the null hypothesis. The data are consistent with a Poisson distribution.

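Directly in R: chisq.test(yr, p = pr), where yr holds the observed frequencies $y_i$ and pr the estimated probabilities $\widehat{p_i}$ (both from the second figure). However, the degrees of freedom reported here are wrong: chisq.test uses $k-1$ and does not subtract the $q$ estimated parameters, so the p-value has to be recomputed with pchisq.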
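
The steps of the example can be sketched end to end in R. The counts below are made up for illustration and are not the lecture data. The variable names (y, lambda_hat, p_hat and so on) are my own, and treating the top category "3 or more" as exactly 3 when estimating λ is a simplifying assumption.

```r
# Hypothetical data (NOT the lecture data): number of intervals with 0, 1, 2, 3+ events
y <- c(30, 40, 20, 10)
x <- 0:3
n <- sum(y)
k <- length(y)   # number of categories
q <- 1           # one estimated parameter (lambda)

# Estimate lambda by the sample mean (assumption: "3+" is counted as 3)
lambda_hat <- sum(x * y) / n

# Estimated category probabilities; the last one is 1 minus the rest
p_hat <- dpois(0:2, lambda_hat)
p_hat <- c(p_hat, 1 - sum(p_hat))

# Expected frequencies and the assumption check e_i >= 5
e <- n * p_hat
stopifnot(all(e >= 5))

# Observed test statistic and p-value with the correct df = k - 1 - q
t0 <- sum((y - e)^2 / e)
p_value <- pchisq(t0, df = k - 1 - q, lower.tail = FALSE)

# chisq.test gives the same statistic but uses df = k - 1
res <- chisq.test(y, p = p_hat)
print(c(t0 = t0, p_value = p_value, chisq_test_df = unname(res$parameter)))
```

The statistic from chisq.test matches t0, so only the degrees of freedom (and therefore the p-value) need correcting by hand.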

The conditional probability

$P(A|B) = P(A\cap B)/P(B)$

Bayes' rule

$P(B|A) = \frac{P(A|B) * P(B)}{P(A|B)*P(B) + P(A|B^{C}) *P(B^{C})}$
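
A small numeric sketch of Bayes' rule follows. The three input probabilities are hypothetical values chosen only for illustration, with B read as "has the disease" and A as "tests positive".

```r
# Hypothetical inputs, chosen only for illustration
p_B      <- 0.01   # P(B): prior probability, e.g. disease prevalence
p_A_B    <- 0.95   # P(A | B): probability of a positive test given disease
p_A_notB <- 0.05   # P(A | B^C): probability of a positive test given no disease

# Bayes' rule: the denominator is P(A) by the law of total probability
numerator   <- p_A_B * p_B
denominator <- p_A_B * p_B + p_A_notB * (1 - p_B)
p_B_A <- numerator / denominator
print(p_B_A)
```

Even with an accurate test, a small prior P(B) keeps the posterior P(B|A) well below P(A|B), which is why the two conditional probabilities should not be confused.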

	Actual + ($D^{+}$)	Actual - ($D^{-}$)	Total
Test + ($S^{+}$)	a	b	a+b
Test - ($S^{-}$)	c	d	c+d
Total	a+c	b+d	a+b+c+d

1. False negative rate (tests negative given actually positive) = c/(a+c)

2. False positive rate (tests positive given actually negative) = b/(b+d)

3. Sensitivity (tests positive given actually positive) = a/(a+c)

4. Specificity (tests negative given actually negative) = d/(b+d)

5. Precision (actually positive given a positive test) = a/(a+b)

6. Negative predictive value (actually negative given a negative test) = d/(c+d)

7. Accuracy = (a+d)/(a+b+c+d)

8. Prevalence = (a+c)/(a+b+c+d)
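
The eight measures can be computed directly from the four cell counts. The counts below are hypothetical, and the name cc is used for the cell c only to avoid reusing the name of R's c() function.

```r
# Hypothetical 2x2 table: rows are test result, columns are actual status
tab <- matrix(c(90,  50,
                10, 850),
              nrow = 2, byrow = TRUE,
              dimnames = list(test = c("S+", "S-"), actual = c("D+", "D-")))

a  <- tab["S+", "D+"]; b <- tab["S+", "D-"]
cc <- tab["S-", "D+"]; d <- tab["S-", "D-"]
n  <- sum(tab)

measures <- c(
  false_negative_rate = cc / (a + cc),
  false_positive_rate = b / (b + d),
  sensitivity         = a / (a + cc),
  specificity         = d / (b + d),
  precision           = a / (a + b),
  negative_pred_value = d / (cc + d),
  accuracy            = (a + d) / n,
  prevalence          = (a + cc) / n
)
print(round(measures, 3))
```

Sensitivity and the false negative rate share the denominator a+c and sum to 1; the same holds for specificity and the false positive rate with denominator b+d.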

Prospective and retrospective

Prospective (cohort study): A prospective study is based on subjects who are initially identified as disease-free and classified by presence or absence of a risk factor. A random sample from each group is followed in time (prospectively) until eventually classified by disease outcome.

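Subjects start out disease-free, are classified by presence or absence of the risk factor (R), and are then followed to observe the actual disease outcome (D).

We can estimate $P(D^{+} \mid R^{+})$ as well as $P(D^{-} \mid R^{+})$ (and likewise given $R^{-}$), but we cannot estimate probabilities conditional on D, because we did not sample from the D groups.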

Retrospective (case control) studies: A retrospective study is based on random samples from each of the two outcome categories which are followed back (retrospectively) to determine the presence or absence of the risk factor for each individual.

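Subjects are sampled by outcome category (D), and we then look back to observe the risk factor (R).

We can estimate $P(R^{+} \mid D^{+})$ as well as $P(R^{-} \mid D^{-})$, but we cannot estimate probabilities conditional on R, because we did not sample from the R groups.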

Relative risk

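$RR = \frac{P(D^{+} \mid R^{+})}{P(D^{+} \mid R^{-})} = \frac{a/(a+b)}{c/(c+d)} = \frac{a(c+d)}{c(a+b)}$, where the 2×2 table now has rows $R^{+}$, $R^{-}$ and columns $D^{+}$, $D^{-}$, with cell counts a, b in the first row and c, d in the second.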

If D and R are independent then P(D|R) = P(D) so RR = 1

RR < 1: the disease is less likely to occur in the group with the risk factor.

RR > 1: the disease is more likely to occur in the group with the risk factor.

Relative risk can only be estimated from a prospective study, since it requires $P(D \mid R)$.
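
A minimal sketch of the relative risk calculation, assuming a hypothetical prospective (cohort) table with rows $R^{+}$, $R^{-}$ and columns $D^{+}$, $D^{-}$. The counts are invented for illustration, and cc again stands for the cell c.

```r
# Hypothetical cohort counts: rows are risk factor, columns are disease outcome
a  <- 30; b <- 70   # R+: 30 develop the disease, 70 do not
cc <- 10; d <- 90   # R-: 10 develop the disease, 90 do not

risk_exposed   <- a / (a + b)     # estimate of P(D+ | R+)
risk_unexposed <- cc / (cc + d)   # estimate of P(D+ | R-)

rr         <- risk_exposed / risk_unexposed
rr_formula <- a * (cc + d) / (cc * (a + b))   # closed form, same value
print(c(rr = rr, rr_formula = rr_formula))
```

Both expressions give the same number; a value above 1 means the disease is more likely in the group with the risk factor.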

Odds ratio

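$O(A) = P(A)/(1 - P(A))$

$O(D^{+} \mid R^{+}) = P(D^{+} \mid R^{+})/P(D^{-} \mid R^{+})$

The odds ratio compares the odds of disease with and without the risk factor: $OR = \frac{O(D^{+} \mid R^{+})}{O(D^{+} \mid R^{-})} = \frac{a/b}{c/d} = \frac{ad}{bc}$, for a 2×2 table with rows $R^{+}$, $R^{-}$ and columns $D^{+}$, $D^{-}$.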

If D and R are independent then P(D|R) = P(D) and OR = 1. The converse also holds: OR = 1 implies that D and R are independent.

Large odds ratios (OR > 1) imply an increased risk of disease and small odds ratios (OR < 1) imply a decreased risk of disease.

Standard errors and confidence intervals for odds ratios

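The asymptotic standard error of $\log(\widehat{OR})$ is $\sqrt{\frac{1}{a}+\frac{1}{b}+\frac{1}{c}+\frac{1}{d}}$.

An approximate confidence interval for $\log\theta$, where $\theta$ is the true odds ratio, is $\log(\widehat{OR}) \pm z^{*}\sqrt{\frac{1}{a}+\frac{1}{b}+\frac{1}{c}+\frac{1}{d}}$, where $z^{*}$ is the standard normal quantile for the chosen confidence level (1.96 for 95%).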

Exponentiating the endpoints gives an approximate confidence interval for the odds ratio. With $SE = \sqrt{\frac{1}{a}+\frac{1}{b}+\frac{1}{c}+\frac{1}{d}}$:

$\left(\exp\left(\log(\widehat{OR}) - z^{*} \cdot SE\right),\ \exp\left(\log(\widehat{OR}) + z^{*} \cdot SE\right)\right)$
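
The interval can be sketched in R as follows. The cell counts are hypothetical, with rows $R^{+}$, $R^{-}$ and columns $D^{+}$, $D^{-}$. The estimate $\widehat{OR} = ad/(bc)$ is the usual sample odds ratio, and the 95% level is an assumption made for the example.

```r
# Hypothetical 2x2 counts: rows R+, R-; columns D+, D-
a  <- 30; b <- 70
cc <- 10; d <- 90   # cc stands for the cell c

or_hat <- (a * d) / (b * cc)                  # sample odds ratio ad/(bc)
se_log <- sqrt(1 / a + 1 / b + 1 / cc + 1 / d) # asymptotic SE of log(OR-hat)

level  <- 0.95                                # assumed confidence level
z_star <- qnorm(1 - (1 - level) / 2)          # about 1.96

# Interval on the log scale, then exponentiate the endpoints
ci_log <- log(or_hat) + c(-1, 1) * z_star * se_log
ci_or  <- exp(ci_log)
print(c(or_hat = or_hat, lower = ci_or[1], upper = ci_or[2]))
```

If the resulting interval excludes 1, the data are not consistent with independence of D and R at the chosen level. Note that the interval is symmetric around $\log(\widehat{OR})$ but not around $\widehat{OR}$ itself.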