Poisson distribution
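A Poisson random variable X counts the number of events occurring in a fixed interval (of time, space, etc.). Its probability mass function gives the probability of each count: $P(X = k) = \frac{e^{-\lambda}\lambda^k}{k!}$, for $k = 0, 1, 2, \dots$ and rate $\lambda > 0$.
For a Poisson distribution, both E(X) and Var(X) are equal to the parameter λ.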
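As a quick numerical check of the pmf and the claim E(X) = Var(X) = λ, the sketch below sums R's `dpois` over a long range of counts. The value λ = 6 is only illustrative, and truncating the support at 100 is an assumption that is harmless here because the tail mass beyond 100 is negligible for this λ.

```r
lambda <- 6                      # illustrative rate; any moderate positive value works
k <- 0:100                       # support truncated at 100 (tail beyond is negligible)
pmf <- dpois(k, lambda)          # P(X = k) from R

# The closed-form pmf exp(-lambda) * lambda^k / k! gives the same values
pmf_manual <- exp(-lambda) * lambda^k / factorial(k)

mean_x <- sum(k * pmf)                   # E(X)
var_x  <- sum((k - mean_x)^2 * pmf)      # Var(X)
c(mean = mean_x, var = var_x)            # both are numerically equal to lambda
```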
Simulating in R
plot(table(rpois(n = 10000, lambda = 6)), ylab = "Count")
OR
library(dplyr)  # provides the %>% pipe used below
rpois(n=10000, lambda=6) %>% table() %>% plot(ylab = "Count")
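To see how close the simulated frequencies are to the theory, a minimal sketch (assuming the same n = 10000 and λ = 6 as above; the seed and the cut-off at 15 are arbitrary choices) compares the observed relative frequencies with `dpois`, and the sample mean and variance with λ. This observed-versus-expected comparison is exactly what the chi-squared test in the next section formalises.

```r
set.seed(2024)                               # arbitrary seed, for reproducibility
lambda <- 6
x <- rpois(n = 10000, lambda = lambda)

# Observed relative frequencies for counts 0..15; draws above 15 are left out,
# and counts that never occur get frequency 0 thanks to the explicit levels
k <- 0:15
observed <- as.vector(table(factor(x, levels = k))) / length(x)
expected <- dpois(k, lambda)                 # theoretical P(X = k)

round(cbind(k, observed, expected), 3)
c(sample_mean = mean(x), sample_var = var(x))  # both should be close to lambda
```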
Chi-squared tests for discrete distributions
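We have a sample $x_1, x_2, \dots, x_n$ from a distribution with distribution function $F_0(x \mid \theta_1, \theta_2, \dots, \theta_h)$, where the $\theta_l$ are the parameters of the distribution.
The general chi-squared goodness-of-fit test groups the data into k categories and uses the test statistic
$T = \sum_{i=1}^{k} \frac{(Y_i - e_i)^2}{e_i}$,
where $Y_i$ is the observed frequency in category i and $e_i = n p_i$ is the expected frequency under $H_0$.
However, the parameters are usually unknown and have to be estimated from the sample. Then $p_i$ is replaced by its estimate $\hat{p}_i$, so the test statistic becomes
$T = \sum_{i=1}^{k} \frac{(Y_i - n\hat{p}_i)^2}{n\hat{p}_i}$.
The approximate p-value is
$P(\chi^2_{k-1-q} \ge t_0)$,
where $t_0$ is the observed value of T and q is the number of parameters we need to estimate.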
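The recipe above can be sketched for the Poisson case, where only λ is estimated (q = 1). The function name `pois_gof` and its argument `top` (counts ≥ `top` are pooled into one last category) are hypothetical, and the simulated data stand in for a real sample. The function does not check the $e_i \ge 5$ assumption, so `top` should be chosen with that in mind.

```r
# Chi-squared goodness-of-fit test for a Poisson model with lambda
# estimated from the data (q = 1). Hypothetical helper, not a library function.
pois_gof <- function(x, top) {
  lambda_hat <- mean(x)                               # estimate of lambda
  k <- 0:(top - 1)
  # Observed frequencies Y_i for categories 0, 1, ..., top-1 and ">= top"
  y <- as.vector(table(factor(pmin(x, top), levels = 0:top)))
  # Estimated cell probabilities p_i-hat; the last cell is the upper tail P(X >= top)
  p_hat <- c(dpois(k, lambda_hat), ppois(top - 1, lambda_hat, lower.tail = FALSE))
  e <- length(x) * p_hat                              # expected frequencies n * p_i-hat
  t0 <- sum((y - e)^2 / e)                            # observed test statistic
  q <- 1                                              # one estimated parameter
  df <- length(y) - 1 - q                             # k - 1 - q
  list(lambda_hat = lambda_hat, observed = y, expected = e,
       statistic = t0, df = df,
       p_value = pchisq(t0, df, lower.tail = FALSE))  # P(chi^2_df >= t0)
}

set.seed(1)                                           # simulated stand-in data
res <- pois_gof(rpois(150, lambda = 2), top = 5)
res$statistic; res$df; res$p_value
```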
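Example from the lecture:
- Hypothesis: H0: the data come from a Poisson distribution vs H1: the data do not come from a Poisson distribution.
- Assumptions: the expected frequencies satisfy $e_i = n p_i \ge 5$, and the observations are independent.
- Test statistic: $T = \sum_{i=1}^{k} \frac{(Y_i - e_i)^2}{e_i}$ with $e_i = n\hat{p}_i$, where λ is estimated from the sample. Under H0, T approximately follows $\chi^2_{k-1-q}$ with q = 1.
- Observed test statistic: t0 = 1.43
- P-value: in R, pchisq(1.43, 2, lower.tail = FALSE) = 0.489 (df = k - 1 - q = 2).
- Conclusion: since the p-value is greater than 0.05, we do not reject the null hypothesis. The data are consistent with a Poisson distribution.
Note: running the test directly in R with chisq.test(yr, p = pr) (yr and pr as in Figure 2 of the lecture) gives the wrong degrees of freedom, because chisq.test uses k - 1 df and does not account for the parameter λ estimated from the data.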
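One way to keep using `chisq.test` is to take its statistic and recompute the p-value with the corrected df = k - 1 - q. The lecture's `yr` and `pr` (Figure 2) are not reproduced here, so this sketch builds stand-in versions from simulated data with hypothetical categories 0, 1, 2 and ≥ 3; only the last few lines are the actual fix.

```r
# Stand-in data (simulated, not the lecture's): categories 0, 1, 2 and ">= 3"
set.seed(1)
x <- rpois(200, lambda = 1.5)
lambda_hat <- mean(x)
yr <- as.vector(table(factor(pmin(x, 3), levels = 0:3)))
pr <- c(dpois(0:2, lambda_hat), ppois(2, lambda_hat, lower.tail = FALSE))

res <- chisq.test(yr, p = pr)       # uses df = length(yr) - 1 = 3, ignoring the estimated lambda
t0  <- unname(res$statistic)

q <- 1                              # lambda was estimated from the data
df_correct <- length(yr) - 1 - q    # 2
p_correct  <- pchisq(t0, df = df_correct, lower.tail = FALSE)
c(t0 = t0, p_chisq_test = res$p.value, p_correct = p_correct)

# With the lecture's observed statistic:
pchisq(1.43, df = 2, lower.tail = FALSE)   # about 0.489, matching the lecture
```

Because the chi-squared upper tail grows with the degrees of freedom, the uncorrected `chisq.test` p-value is never smaller than the corrected one, so it errs towards not rejecting H0.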
Conditional probability
Bayes' rule
|        | Actual + | Actual - | Total   |
|--------|----------|----------|---------|
| Test + | a        | b        | a+b     |
| Test - | c        | d        | c+d     |
| Total  | a+c      | b+d      | a+b+c+d |
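1. False negative rate (probability of testing negative given actually positive) = c/(a+c)
2. False positive rate (probability of testing positive given actually negative) = b/(b+d)
3. Sensitivity (probability of testing positive given actually positive) = a/(a+c)
4. Specificity (probability of testing negative given actually negative) = d/(b+d)
5. Precision, or positive predictive value (probability of being actually positive given a positive test) = a/(a+b)
6. Negative predictive value (probability of being actually negative given a negative test) = d/(c+d)
7. Accuracy = (a+d)/(a+b+c+d)
8. Prevalence = (a+c)/(a+b+c+d)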
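The eight quantities above, together with Bayes' rule, can be checked with a short sketch. The function names `diag_metrics` and `ppv_bayes`, the argument names `tp`, `fp`, `fn`, `tn` (mapped to a, b, c, d of the table) and the counts are all hypothetical. The last line shows that the precision a/(a+b) is what Bayes' rule gives from sensitivity, specificity and prevalence.

```r
# Metrics from the 2x2 table. Hypothetical argument names:
# tp = a (Test +, Actual +), fp = b (Test +, Actual -),
# fn = c (Test -, Actual +), tn = d (Test -, Actual -)
diag_metrics <- function(tp, fp, fn, tn) {
  n <- tp + fp + fn + tn
  c(fnr         = fn / (tp + fn),        # c / (a + c)
    fpr         = fp / (fp + tn),        # b / (b + d)
    sensitivity = tp / (tp + fn),        # a / (a + c)
    specificity = tn / (fp + tn),        # d / (b + d)
    precision   = tp / (tp + fp),        # a / (a + b)
    npv         = tn / (fn + tn),        # d / (c + d)
    accuracy    = (tp + tn) / n,         # (a + d) / (a + b + c + d)
    prevalence  = (tp + fn) / n)         # (a + c) / (a + b + c + d)
}

# Bayes' rule: P(Actual + | Test +) = P(Test + | Actual +) P(Actual +) / P(Test +),
# where P(Test +) comes from the law of total probability
ppv_bayes <- function(sens, spec, prev) {
  sens * prev / (sens * prev + (1 - spec) * (1 - prev))
}

m <- diag_metrics(tp = 90, fp = 40, fn = 10, tn = 860)   # illustrative counts only
round(m, 3)
ppv_bayes(m[["sensitivity"]], m[["specificity"]], m[["prevalence"]])  # same as m[["precision"]]
```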
Prospective and retrospective
Prospective (cohort study): A prospective study is based on subjects who are initially identified as disease-free and classified by presence or absence of a risk factor. A random sample from each group is followed in time (prospectively) until eventually classified by disease outcome.
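Subjects start disease-free, are classified by the presence or absence of the risk factor (R), and are then followed to observe whether the disease actually develops (D).
We can estimate $P(D^+ \mid R^+)$ as well as $P(D^+ \mid R^-)$, but we cannot estimate probabilities conditioned on D, such as $P(R^+ \mid D^+)$, because subjects were not sampled according to D.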
Retrospective (case control) studies: A retrospective study is based on random samples from each of the two outcome categories which are followed back (retrospectively) to determine the presence or absence of the risk factor for each individual.
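Subjects are sampled according to an outcome that has already occurred (D), and the risk factor (R) is then determined by looking back from the outcome.
We can estimate $P(R^+ \mid D^+)$ as well as $P(R^+ \mid D^-)$, but we cannot estimate probabilities conditioned on R, such as $P(D^+ \mid R^+)$, because subjects were not sampled according to R.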
Relative risk
$RR = \frac{P(D^+ \mid R^+)}{P(D^+ \mid R^-)}$
If D and R are independent, then $P(D^+ \mid R^+) = P(D^+ \mid R^-) = P(D^+)$, so RR = 1.
RR < 1: the disease is less likely to occur in the group with the risk factor.
RR > 1: the disease is more likely to occur in the group with the risk factor.
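RR can only be estimated from a prospective study, because a retrospective study does not allow estimation of $P(D \mid R)$.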
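A minimal sketch of the RR calculation for a prospective study. The table layout (rows R+/R-, columns D+/D-, cells a, b, c, d), the helper name `relative_risk` and the counts are assumptions for illustration; with this layout $P(D^+ \mid R^+) = a/(a+b)$ and $P(D^+ \mid R^-) = c/(c+d)$.

```r
# Relative risk from a prospective 2x2 table.
# Assumed layout (hypothetical):
#          D+   D-
#   R+      a    b
#   R-      c    d
relative_risk <- function(tab) {
  p_d_given_r_pos <- tab[1, 1] / sum(tab[1, ])   # P(D+ | R+) = a / (a + b)
  p_d_given_r_neg <- tab[2, 1] / sum(tab[2, ])   # P(D+ | R-) = c / (c + d)
  p_d_given_r_pos / p_d_given_r_neg
}

tab <- matrix(c(30, 70,
                10, 90), nrow = 2, byrow = TRUE,
              dimnames = list(R = c("R+", "R-"), D = c("D+", "D-")))  # illustrative counts
relative_risk(tab)   # 0.3 / 0.1, i.e. about 3: disease is more likely with the risk factor
```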
Odds ratio
O(A) = P(A)/(1 - P(A))
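$O(D^+ \mid R^+) = \frac{P(D^+ \mid R^+)}{P(D^- \mid R^+)}$, and the odds ratio is $OR = \frac{O(D^+ \mid R^+)}{O(D^+ \mid R^-)}$.
If D and R are independent, then P(D|R) = P(D) and OR = 1; conversely, OR = 1 implies that D and R are independent.
Large odds ratios (OR > 1) imply an increased risk of disease, and small odds ratios (OR < 1) imply a decreased risk of disease.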
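The sketch below computes the odds ratio three ways for a hypothetical table with the same assumed layout as in the RR example (rows R+/R-, columns D+/D-). The odds of disease given the risk factor, the odds of the risk factor given disease, and the cross-product $ad/(bc)$ all give the same value. This symmetry is why, unlike RR, the odds ratio can also be estimated from a retrospective study. The helper `odds` and the counts are illustrative.

```r
odds <- function(p) p / (1 - p)   # O(A) = P(A) / (1 - P(A))

# Illustrative counts (hypothetical); layout: rows R+/R-, columns D+/D-
tab <- matrix(c(30, 70,
                10, 90), nrow = 2, byrow = TRUE,
              dimnames = list(R = c("R+", "R-"), D = c("D+", "D-")))

# OR from the odds of disease given the risk factor: O(D+ | R+) / O(D+ | R-)
or_given_r <- odds(tab[1, 1] / sum(tab[1, ])) / odds(tab[2, 1] / sum(tab[2, ]))

# Same ratio from the odds of the risk factor given disease: O(R+ | D+) / O(R+ | D-)
or_given_d <- odds(tab[1, 1] / sum(tab[, 1])) / odds(tab[1, 2] / sum(tab[, 2]))

# Cross-product form a*d / (b*c)
or_cross <- (tab[1, 1] * tab[2, 2]) / (tab[1, 2] * tab[2, 1])

c(or_given_r, or_given_d, or_cross)   # all three agree (27/7 for these counts)
```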
Standard errors and confidence intervals for odds ratios
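Writing the 2x2 table of risk factor by disease with cells a, b, c, d (a = (R+, D+) and d = (R-, D-), so that $\widehat{OR} = \frac{ad}{bc}$), the asymptotic standard error of $\log(\widehat{OR})$ is
$SE(\log \widehat{OR}) = \sqrt{\frac{1}{a} + \frac{1}{b} + \frac{1}{c} + \frac{1}{d}}$.
A confidence interval for $\log(OR)$ is approximately $\log(\widehat{OR}) \pm z^* \, SE(\log \widehat{OR})$, where $z^*$ is the appropriate standard normal quantile (e.g. 1.96 for 95% confidence).
Exponentiating the endpoints gives an approximate confidence interval for the odds ratio:
$\left(\exp\left(\log(\widehat{OR}) - z^* \, SE(\log \widehat{OR})\right),\ \exp\left(\log(\widehat{OR}) + z^* \, SE(\log \widehat{OR})\right)\right)$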
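A minimal sketch of this interval (the log-scale method, sometimes called Woolf's method). The helper `or_ci`, its `level` argument and the counts are hypothetical; the table uses the assumed layout rows R+/R-, columns D+/D-. The formula breaks down if any cell is 0, which this sketch does not handle.

```r
# Approximate CI for the odds ratio via the log scale.
# Hypothetical helper; tab is a 2x2 matrix with rows R+/R-, columns D+/D-.
or_ci <- function(tab, level = 0.95) {
  or_hat <- (tab[1, 1] * tab[2, 2]) / (tab[1, 2] * tab[2, 1])   # a*d / (b*c)
  se_log <- sqrt(sum(1 / tab))                 # sqrt(1/a + 1/b + 1/c + 1/d)
  z <- qnorm(1 - (1 - level) / 2)              # Z*, about 1.96 for 95%
  ci <- exp(log(or_hat) + c(-1, 1) * z * se_log)  # back-transform the log-scale CI
  list(or = or_hat, se_log_or = se_log, lower = ci[1], upper = ci[2])
}

tab <- matrix(c(30, 70,
                10, 90), nrow = 2, byrow = TRUE)   # illustrative counts
or_ci(tab)
```

The interval is symmetric around $\log(\widehat{OR})$ but not around $\widehat{OR}$ itself; if it excludes 1, the data suggest an association between R and D at that confidence level.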