These are reading notes for *Introduction to Probability*.
Binary Hypothesis Testing
- In this section, we revisit the problem of choosing between two hypotheses, but unlike the Bayesian formulation, we will assume no prior probabilities. We may view this as an inference problem where the parameter $\theta$ takes just two values, but consistent with historical usage, we will forgo the $\theta$-notation and denote the two hypotheses as $H_0$ and $H_1$. In traditional statistical language, hypothesis $H_0$ is often called the null hypothesis and $H_1$ the alternative hypothesis. This indicates that $H_0$ plays the role of a default model, to be proved or disproved on the basis of available data.
- The available observation is a vector $X = (X_1, \ldots, X_n)$ of random variables whose distribution depends on the hypothesis. We want to find a decision rule that maps the realized values $x$ of the observation to one of the two hypotheses.
rejection/acceptance region
- Any decision rule can be represented by a partition of the set of all possible values of the observation vector $X = (X_1, \ldots, X_n)$ into two subsets: a set $R$, called the rejection region, and its complement, $R^C$, called the acceptance region. Hypothesis $H_0$ is rejected (declared to be false) when the observed data $X = (X_1, \ldots, X_n)$ happen to fall in the rejection region $R$, and is accepted otherwise. Thus, the choice of a decision rule is equivalent to choosing the rejection region.
- For a particular choice of the rejection region $R$, there are two possible types of errors:
  - (a) Reject $H_0$ even though $H_0$ is true. This is called a false rejection, and happens with probability
    $$\alpha(R)=P(X\in R;H_0)$$
  - (b) Accept $H_0$ even though $H_0$ is false. This is called a false acceptance, and happens with probability
    $$\beta(R)=P(X\notin R;H_1)$$
Binary Hypothesis Testing
- To motivate a particular form of rejection region, we draw an analogy with Bayesian hypothesis testing: given the observed value $x$ of $X$, declare $H_1$ to be true if
$$p_X(x;H_1)>p_X(x;H_0)$$
This decision rule can be rewritten as follows: define the likelihood ratio $L(x)$ by
$$L(x)=\frac{p_X(x;H_1)}{p_X(x;H_0)}$$
and declare $H_1$ to be true if the realized value $x$ of the observation vector $X$ satisfies $L(x)>\xi$, where the critical value $\xi=1$. If $X$ is continuous, the approach is the same, except that the likelihood ratio is defined as a ratio of PDFs:
$$L(x)=\frac{f_X(x;H_1)}{f_X(x;H_0)}$$
- We are led to consider rejection regions of the form
$$R=\{x \mid L(x)>\xi\}$$
The critical value $\xi$ remains free to be chosen on the basis of other considerations. The special case where $\xi=1$ corresponds to the ML rule.
This is Bayesian hypothesis testing with a flat (uniform) prior.
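The rejection rule above can be sketched in a few lines of Python. This is a minimal illustration, not the book's code; the function names and the biased-vs-fair coin pmfs are hypothetical stand-ins for $p_X(\cdot;H_0)$ and $p_X(\cdot;H_1)$.

```python
# Minimal sketch of a likelihood ratio test for a discrete observation.
# The pmfs and names below are illustrative, not from the text.

def likelihood_ratio(x, pmf_h0, pmf_h1):
    """L(x) = p_X(x; H1) / p_X(x; H0)."""
    return pmf_h1[x] / pmf_h0[x]

def lrt_decide(x, pmf_h0, pmf_h1, xi=1.0):
    """Declare H1 iff L(x) > xi; xi = 1 corresponds to the ML rule."""
    return "H1" if likelihood_ratio(x, pmf_h0, pmf_h1) > xi else "H0"

# Hypothetical example: one toss of a fair (H0) vs. biased (H1) coin.
p_h0 = {"H": 0.5, "T": 0.5}
p_h1 = {"H": 0.8, "T": 0.2}

print(lrt_decide("H", p_h0, p_h1))  # L = 1.6 > 1, so "H1"
print(lrt_decide("T", p_h0, p_h1))  # L = 0.4 <= 1, so "H0"
```

Raising `xi` above 1 shrinks the set of observations that reject $H_0$, which is exactly the tradeoff discussed below.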
Example 9.10.
- We have a six-sided die that we want to test for fairness, and we formulate two hypotheses for the probabilities of the six faces: under $H_0$ the die is fair, $p_X(x;H_0)=1/6$ for all $x$; under $H_1$, $p_X(x;H_1)=1/4$ for $x\in\{1,2\}$ and $p_X(x;H_1)=1/8$ for $x\in\{3,4,5,6\}$ (consistent with the likelihood-ratio values $3/2$ and $3/4$ below).
- The likelihood ratio for a single roll $x$ of the die is
$$L(x)=\begin{cases}3/2, & x\in\{1,2\},\\ 3/4, & x\in\{3,4,5,6\}.\end{cases}$$
- Since the likelihood ratio takes only two distinct values, there are three possibilities to consider for the critical value $\xi$, with three corresponding rejection regions:
  - $\xi<3/4$: $R=\{1,2,3,4,5,6\}$ (always reject $H_0$);
  - $3/4\leq\xi<3/2$: $R=\{1,2\}$;
  - $\xi\geq3/2$: $R=\varnothing$ (never reject $H_0$).

  In fact, for a single roll of the die, the test makes sense only in the case $3/4<\xi<3/2$, since for other values of $\xi$, the decision does not depend on the observation.
- The error probabilities can be calculated from the problem data for each critical value. In particular, for the interesting case $R=\{1,2\}$, the probability of false rejection $P(\text{Reject } H_0;H_0)$ is
$$\alpha(R)=P(X\in\{1,2\};H_0)=\frac{1}{6}+\frac{1}{6}=\frac{1}{3}$$
and the probability of false acceptance $P(\text{Accept } H_0;H_1)$ is
$$\beta(R)=P(X\notin\{1,2\};H_1)=4\cdot\frac{1}{8}=\frac{1}{2}$$
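The die example can be checked numerically. A short sketch, assuming the die hypotheses reconstructed from the two likelihood-ratio values $3/2$ and $3/4$; exact fractions avoid floating-point noise.

```python
from fractions import Fraction as F

# Example 9.10 hypotheses (reconstructed): H0 is the fair die,
# H1 puts probability 1/4 on faces 1, 2 and 1/8 on faces 3-6.
p_h0 = {x: F(1, 6) for x in range(1, 7)}
p_h1 = {1: F(1, 4), 2: F(1, 4), 3: F(1, 8), 4: F(1, 8), 5: F(1, 8), 6: F(1, 8)}

# Likelihood ratio L(x) = p(x; H1) / p(x; H0) for each face.
L = {x: p_h1[x] / p_h0[x] for x in range(1, 7)}
print(sorted(set(L.values())))   # the two values 3/4 and 3/2

# Rejection region for a critical value xi with 3/4 < xi < 3/2:
R = {x for x in range(1, 7) if L[x] > 1}
alpha = sum(p_h0[x] for x in R)                          # P(X in R; H0)
beta = sum(p_h1[x] for x in range(1, 7) if x not in R)   # P(X not in R; H1)
print(R, alpha, beta)            # {1, 2} 1/3 1/2
```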
Likelihood Ratio Test (LRT)
- Note that choosing $\xi$ trades off the probabilities of the two types of errors.
- Indeed, as $\xi$ increases, the rejection region becomes smaller. As a result, the false rejection probability $\alpha(R)$ decreases, while the false acceptance probability $\beta(R)$ increases.
- Because of this tradeoff, there is no single best way of choosing the critical value. The most popular approach, the likelihood ratio test (LRT), is as follows: fix a target value $\alpha$ for the false rejection probability; choose $\xi$ so that the false rejection probability is equal to $\alpha$, i.e., $P(L(X)>\xi;H_0)=\alpha$; then, once the value $x$ of $X$ is observed, reject $H_0$ if $L(x)>\xi$.
Typical choices for $\alpha$ are $\alpha=0.1$, $\alpha=0.05$, or $\alpha=0.01$, depending on the degree of undesirability of false rejection.
Neyman-Pearson Lemma
- We have motivated so far the use of an LRT through an analogy with Bayesian inference. However, we will now provide a stronger justification: for a given false rejection probability, the LRT offers the smallest possible false acceptance probability.
- For a justification of the Neyman-Pearson Lemma, consider a hypothetical Bayesian decision problem where the prior probabilities of $H_0$ and $H_1$ satisfy
$$\frac{p_\Theta(\theta_0)}{p_\Theta(\theta_1)}=\xi$$
so that
$$p_\Theta(\theta_0)=\frac{\xi}{1+\xi},\qquad p_\Theta(\theta_1)=\frac{1}{1+\xi}$$
Then, the threshold used by the MAP rule is equal to $\xi$, and the MAP rule is identical to the LRT rule:
$$L(x)=\frac{p_{X\mid\Theta}(x\mid\theta_1)}{p_{X\mid\Theta}(x\mid\theta_0)}>\frac{p_\Theta(\theta_0)}{p_\Theta(\theta_1)}=\xi$$
Let $\alpha$ and $\beta$ denote the false rejection and false acceptance probabilities of the LRT with critical value $\xi$. The probability of error with the MAP rule is
$$e_{MAP}=\frac{\xi}{1+\xi}\alpha+\frac{1}{1+\xi}\beta$$
and from Section 8.2, we know that it is smaller than or equal to the probability of error of any other Bayesian decision rule. This implies that for any choice of rejection region $R$, we have
$$e_{MAP}\leq\frac{\xi}{1+\xi}P(X\in R;H_0)+\frac{1}{1+\xi}P(X\notin R;H_1)$$
Comparing the preceding two relations, we see that if $P(X\in R;H_0)\leq\alpha$, we must have $P(X\notin R;H_1)\geq\beta$, and that if $P(X\in R;H_0)<\alpha$, we must have $P(X\notin R;H_1)>\beta$, which is the conclusion of the Neyman-Pearson Lemma.
- The Neyman-Pearson Lemma can be interpreted geometrically as shown in Fig. 9.11.
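The lemma can be checked by brute force on a small discrete problem. The sketch below, assuming the die hypotheses of Example 9.10 (reconstructed from the likelihood-ratio values $3/2$ and $3/4$), enumerates all $2^6$ rejection regions and verifies that none with false rejection probability at most that of the LRT region $\{1,2\}$ achieves a smaller false acceptance probability.

```python
from fractions import Fraction as F
from itertools import combinations

# Die hypotheses of Example 9.10 (reconstructed).
p_h0 = {x: F(1, 6) for x in range(1, 7)}
p_h1 = {1: F(1, 4), 2: F(1, 4), 3: F(1, 8), 4: F(1, 8), 5: F(1, 8), 6: F(1, 8)}

def alpha(R):  # false rejection probability P(X in R; H0)
    return sum(p_h0[x] for x in R)

def beta(R):   # false acceptance probability P(X not in R; H1)
    return sum(p_h1[x] for x in p_h1 if x not in R)

lrt_R = {1, 2}                       # LRT region for 3/4 < xi < 3/2
a_lrt, b_lrt = alpha(lrt_R), beta(lrt_R)

# Smallest beta over all regions whose alpha does not exceed the LRT's.
best = min(beta(set(R)) for r in range(7)
           for R in combinations(range(1, 7), r)
           if alpha(set(R)) <= a_lrt)
print(a_lrt, b_lrt, best)            # 1/3 1/2 1/2  -> no region beats the LRT
```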
Example 9.13. Comparison of Different Rejection Regions.
- We observe two i.i.d. normal random variables $X_1$ and $X_2$, with unit variance. Under $H_0$ their common mean is 0; under $H_1$, their common mean is 2. We fix the false rejection probability to $\alpha=0.05$.
- We first derive the form of the LRT, and then calculate the resulting value of $\beta$. The likelihood ratio is of the form
$$L(x)=\frac{\frac{1}{2\pi}\exp\big\{-\big((x_1-2)^2+(x_2-2)^2\big)/2\big\}}{\frac{1}{2\pi}\exp\big\{-(x_1^2+x_2^2)/2\big\}}=\exp\{2(x_1+x_2)-4\}$$
Comparing $L(x)$ to a critical value $\xi$ is equivalent to comparing $x_1+x_2$ to $\gamma=(4+\log\xi)/2$. Thus, under the LRT, we decide in favor of $H_1$ if $x_1+x_2>\gamma$ for some particular choice of $\gamma$.
- To determine the exact form of the rejection region, we need to find $\gamma$ so that the false rejection probability $P(X_1+X_2>\gamma;H_0)$ is equal to 0.05. We note that under $H_0$, $Z=(X_1+X_2)/\sqrt2$ is a standard normal random variable. We have
$$0.05=P(X_1+X_2>\gamma;H_0)=P\Big(\frac{X_1+X_2}{\sqrt2}>\frac{\gamma}{\sqrt2};H_0\Big)=P\Big(Z>\frac{\gamma}{\sqrt2}\Big)$$
From the normal tables, we obtain $P(Z>1.645)=0.05$, so we choose
$$\gamma=1.645\cdot\sqrt2=2.33$$
resulting in the rejection region
$$R=\{(x_1,x_2)\mid x_1+x_2>2.33\}$$
- To evaluate the performance of this test, we calculate the resulting false acceptance probability.
$$\beta(R)=P(X_1+X_2\leq2.33;H_1)=P\Big(\frac{X_1+X_2-4}{\sqrt2}\leq-1.18;H_1\Big)=1-\Phi(1.18)=0.12$$
- We now compare the performance of the LRT with that resulting from a different rejection region $R'$. For example, let us consider a rejection region of the form
$$R'=\{(x_1,x_2)\mid\max\{x_1,x_2\}>\xi\}$$
where $\xi$ is chosen so that the false rejection probability is again 0.05. To determine the value of $\xi$, we write
$$0.05=P(\max\{X_1,X_2\}>\xi;H_0)=1-P(\max\{X_1,X_2\}\leq\xi;H_0)=1-P(X_1\leq\xi;H_0)P(X_2\leq\xi;H_0)=1-\big(P(Z\leq\xi)\big)^2$$
where $Z$ is a standard normal. This yields $\Phi(\xi)\approx0.975$. Using the normal tables, we conclude that $\xi=1.96$. Let us now calculate the resulting false acceptance probability:
$$\beta(R')=P(\max\{X_1,X_2\}\leq\xi;H_1)=\big(P(X_1\leq1.96;H_1)\big)^2=\big(P(Z\leq-0.04)\big)^2=0.24$$
- We see that the false acceptance probability $\beta(R)=0.12$ of the LRT is much better than the false acceptance probability $\beta(R')=0.24$ of the alternative test.
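Both tests in this example can be reproduced with the standard library's `statistics.NormalDist` instead of normal tables; this is a verification sketch, not part of the original solution. The exact computation gives $\beta(R')\approx0.23$; the 0.24 above reflects table rounding.

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist()  # standard normal CDF and quantile function

# LRT region R = {x1 + x2 > gamma}: pick gamma so that alpha = 0.05.
gamma = Z.inv_cdf(0.95) * sqrt(2)          # ~ 1.645 * sqrt(2) ~ 2.33
# Under H1, X1 + X2 ~ Normal(4, 2), so standardize (gamma - 4)/sqrt(2).
beta_lrt = Z.cdf((gamma - 4) / sqrt(2))    # ~ Phi(-1.18) ~ 0.12

# Alternative region R' = {max(x1, x2) > xi}: pick xi so that alpha = 0.05,
# i.e. Phi(xi)^2 = 0.95, hence Phi(xi) = sqrt(0.95) ~ 0.975.
xi = Z.inv_cdf(sqrt(0.95))                 # ~ 1.96
beta_alt = Z.cdf(xi - 2) ** 2              # ~ Phi(-0.04)^2

print(round(gamma, 2), round(beta_lrt, 2), round(beta_alt, 2))  # 2.33 0.12 0.23
```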
Example 9.14. A Discrete Example.
Consider $n=25$ independent tosses of a coin. Under hypothesis $H_0$ (respectively, $H_1$), the probability of a head at each toss is equal to $\theta_0=1/2$ (respectively, $\theta_1=2/3$). Let $X$ be the number of heads observed. If we set the false rejection probability to 0.1, what is the rejection region associated with the LRT?
SOLUTION
- We observe that when $X=k$, the likelihood ratio is of the form
$$L(k)=2^k\Big(\frac{2}{3}\Big)^{25}$$
Note that $L(k)$ is a monotonically increasing function of $k$. Thus, the rejection condition $L(k)>\xi$ is equivalent to a condition $k>\gamma$, for a suitable value of $\gamma$. We conclude that the LRT is of the form
$$\text{reject } H_0 \text{ if } X>\gamma$$
- To guarantee the requirement on the false rejection probability, we need to find the smallest possible value of $\gamma$ for which $P(X>\gamma;H_0)\leq0.1$, or
$$\sum_{i=\gamma+1}^{25}\binom{25}{i}2^{-25}\leq0.1$$
By evaluating numerically the left-hand side above for different choices of $\gamma$, we find that the required value is $\gamma=16$. (Since the left-hand side is monotone in $\gamma$, the search can also be done by bisection.)
- An alternative method for choosing $\gamma$ involves an approximation based on the central limit theorem. Under $H_0$,
$$Z=\frac{X-n\theta_0}{\sqrt{n\theta_0(1-\theta_0)}}=\frac{X-12.5}{\sqrt{25/4}}$$
is approximately a standard normal random variable. Therefore, we need
$$0.1=P(X>\gamma;H_0)=P\Big(\frac{X-12.5}{\sqrt{25/4}}>\frac{\gamma-12.5}{\sqrt{25/4}};H_0\Big)=P\Big(Z>\frac{2\gamma}{5}-5\Big)$$
From the normal tables, we have $\Phi(1.28)=0.9$, and therefore, we should choose $\gamma$ so that $\frac{2\gamma}{5}-5=1.28$, or $\gamma=15.7$. Since $X$ is integer-valued, we find that the LRT should reject $H_0$ whenever $X>15$.
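Both the exact tail evaluation and the CLT approximation are easy to reproduce with the standard library; this sketch uses a linear scan over $\gamma$, though bisection would also work as noted above.

```python
from math import comb
from statistics import NormalDist

n = 25  # tosses; theta_0 = 1/2 under H0

def tail(gamma):
    """Exact P(X > gamma; H0) for X ~ Binomial(25, 1/2)."""
    return sum(comb(n, i) for i in range(gamma + 1, n + 1)) / 2**n

# Smallest gamma with P(X > gamma; H0) <= 0.1 (tail is monotone in gamma).
gamma = next(g for g in range(n + 1) if tail(g) <= 0.1)
print(gamma, round(tail(gamma), 4))       # 16 0.0539

# CLT approximation: solve (gamma - 12.5)/2.5 = z where Phi(z) = 0.9.
z = NormalDist().inv_cdf(0.9)             # ~ 1.28
print(round(12.5 + 2.5 * z, 1))           # 15.7
```

The exact computation confirms $\gamma=16$; the normal approximation lands at 15.7, slightly below, which is why the two approaches round to neighboring integer thresholds.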