These are reading notes for *Introduction to Probability*.
Independence
- We have introduced the conditional probability $P(A\mid B)$ to capture the partial information that event $B$ provides about event $A$. An interesting and important special case arises when the occurrence of $B$ provides no such information and does not alter the probability that $A$ has occurred, i.e.,
$$P(A\mid B)=P(A)$$
- When the above equality holds, we say that $A$ is independent of $B$ ($A$ and $B$ are independent events). The equation above is equivalent to
$$P(A\cap B)=P(A)P(B)$$
We adopt this latter relation as the definition of independence because it can be used even when $P(B)=0$, in which case $P(A\mid B)$ is undefined.
- If $A$ and $B$ are independent, the occurrence of $B$ does not provide any new information on the probability of $A$ occurring.
Pitfall: A common first thought is that two events are independent if they are disjoint, but in fact the opposite is true: two disjoint events $A$ and $B$ with $P(A)>0$ and $P(B)>0$ are never independent (e.g. $A$ and $A^C$).
Properties
- Among the four pairs $(A,B)$, $(A,\bar B)$, $(\bar A,B)$, $(\bar A,\bar B)$, if any one pair is independent, then the other three pairs are independent as well.
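Both the pitfall and the complement property can be checked by exact enumeration on a small finite sample space; the events below (two fair dice) are illustrative choices, not from the text:

```python
from fractions import Fraction
from itertools import product

# Sample space: two fair dice, 36 equally likely outcomes.
omega = set(product(range(1, 7), range(1, 7)))

def prob(event):
    """Exact probability of an event (a set of outcomes) under the uniform law."""
    return Fraction(len(event), len(omega))

def independent(A, B):
    """Check the defining condition P(A ∩ B) = P(A) P(B)."""
    return prob(A & B) == prob(A) * prob(B)

A = {w for w in omega if w[0] % 2 == 0}   # first die is even
B = {w for w in omega if w[1] <= 3}       # second die is at most 3
Ac, Bc = omega - A, omega - B

# If one of the four pairs is independent, so are the other three.
print(independent(A, B), independent(A, Bc),
      independent(Ac, B), independent(Ac, Bc))   # all True

# Disjoint events with positive probability are never independent.
D1 = {w for w in omega if w[0] == 1}
D2 = {w for w in omega if w[0] == 2}
print(independent(D1, D2))                       # False
```

Using exact `Fraction` arithmetic avoids floating-point noise, so the equality in the definition can be tested literally.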
Conditional Independence
- We noted earlier that the conditional probabilities of events form a legitimate probability law. We can thus talk about independence of various events with respect to this conditional law.
- In particular, given an event $C$, the events $A$ and $B$ are called conditionally independent if
$$P(A\cap B\mid C)=P(A\mid C)P(B\mid C)$$
- To derive an alternative characterization of conditional independence, we write
$$P(A\cap B\mid C)=\frac{P(A\cap B\cap C)}{P(C)}=\frac{P(C)\,P(B\mid C)\,P(A\mid B\cap C)}{P(C)}=P(B\mid C)\,P(A\mid B\cap C)$$
- We now compare the preceding two expressions, and after eliminating the common factor $P(B\mid C)$, assumed nonzero, we see that conditional independence is the same as the condition
$$P(A\mid B\cap C)=P(A\mid C)$$
In words, this relation states that if $C$ is known to have occurred, the additional knowledge that $B$ also occurred does not change the probability of $A$.
- Interestingly, independence of two events $A$ and $B$ with respect to the unconditional probability law does not imply conditional independence, and vice versa, as illustrated by the next example.
Example 1.21.
- There are two coins, a blue and a red one. We choose one of the two at random, each being chosen with probability $1/2$, and proceed with two independent tosses. The coins are biased: with the blue coin, the probability of heads in any given toss is $0.99$, whereas for the red coin it is $0.01$.
- Let $B$ be the event that the blue coin was selected. Let also $H_i$ be the event that the $i$th toss resulted in heads. Given the choice of a coin, the events $H_1$ and $H_2$ are independent. Thus,
$$P(H_1\cap H_2\mid B)=P(H_1\mid B)\,P(H_2\mid B)=0.99\cdot0.99$$
- On the other hand, the events $H_1$ and $H_2$ are not independent. Intuitively, if we are told that the first toss resulted in heads, this leads us to suspect that the blue coin was selected. Mathematically,
$$P(H_1)=P(B)P(H_1\mid B)+P(B^C)P(H_1\mid B^C)=\frac12\cdot0.99+\frac12\cdot0.01=\frac12$$
Similarly, we have $P(H_2)=1/2$. Now notice that
$$\begin{aligned}P(H_1\cap H_2)&=P(B)P(H_1\cap H_2\mid B)+P(B^C)P(H_1\cap H_2\mid B^C)\\&=\frac12\cdot0.99\cdot0.99+\frac12\cdot0.01\cdot0.01\approx\frac12\end{aligned}$$
- Thus, $P(H_1\cap H_2)\neq P(H_1)P(H_2)$, so the events $H_1$ and $H_2$ are dependent, even though they are conditionally independent given $B$.
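The arithmetic in this example can be reproduced exactly; the following sketch (not from the book) verifies that $H_1$ and $H_2$ fail the unconditional independence condition while being conditionally independent given the coin:

```python
from fractions import Fraction

# Example setup: pick a coin uniformly at random, then toss it twice
# (tosses independent given the coin).
p_blue = Fraction(1, 2)            # P(blue coin selected)
h_blue = Fraction(99, 100)         # P(heads | blue)
h_red = Fraction(1, 100)           # P(heads | red)

# Total probability theorem for a single toss.
P_H1 = p_blue * h_blue + (1 - p_blue) * h_red       # = 1/2
P_H2 = P_H1

# Conditional independence given the coin gives the joint probability.
P_H1H2 = p_blue * h_blue**2 + (1 - p_blue) * h_red**2

print(P_H1, P_H1H2, P_H1 * P_H2)  # 1/2, 4901/10000, 1/4 -> dependent
```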
Independence of a Collection of Events
- For the case of three events, $A_1$, $A_2$, and $A_3$, independence amounts to satisfying the four conditions
$$P(A_1\cap A_2)=P(A_1)P(A_2)$$
$$P(A_1\cap A_3)=P(A_1)P(A_3)$$
$$P(A_2\cap A_3)=P(A_2)P(A_3)$$
$$P(A_1\cap A_2\cap A_3)=P(A_1)P(A_2)P(A_3)$$
The first three conditions simply assert that any two events are independent, a property known as pairwise independence. But the fourth condition is also important and does not follow from the first three. Conversely, the fourth condition does not imply the first three.
- The intuition behind the independence of a collection of events is analogous to the case of two events. Independence means that the occurrence or non-occurrence of any number of the events from that collection carries no information on the remaining events or their complements.
- For example, if the events $A_1$, $A_2$, $A_3$, $A_4$ are independent, one obtains relations such as
$$P(A_1\cup A_2\mid A_3\cap A_4)=P(A_1\cup A_2)$$
since, by inclusion-exclusion,
$$\begin{aligned}P(A_1\cup A_2\mid A_3\cap A_4)&=P(A_1\mid A_3\cap A_4)+P(A_2\mid A_3\cap A_4)-P(A_1\cap A_2\mid A_3\cap A_4)\\&=P(A_1)+P(A_2)-P(A_1\cap A_2)=P(A_1\cup A_2)\end{aligned}$$
or
$$P(A_1\cup A_2^C\mid A_3^C\cap A_4)=P(A_1\cup A_2^C)$$
Independent Trials and the Binomial Probabilities
- If an experiment involves a sequence of independent but identical stages, we say that we have a sequence of independent trials.
- In the special case where there are only two possible results at each stage, we say that we have a sequence of independent Bernoulli trials.
- We can visualize independent Bernoulli trials by means of a sequential (tree) description; consider the case where $n=3$.
- By multiplying the conditional probabilities along the corresponding path of the tree, we see that any particular outcome that involves $k$ heads and $3-k$ tails has probability $p^k(1-p)^{3-k}$.
- This formula extends to the case of a general number $n$ of tosses. We obtain that the probability of any particular $n$-long sequence that contains $k$ heads and $n-k$ tails is $p^k(1-p)^{n-k}$, for all $k$ from $0$ to $n$.
- Let us now consider the probability
$$p(k)=P(k\text{ heads come up in an }n\text{-toss sequence})$$
Any particular sequence with $k$ heads has probability $p^k(1-p)^{n-k}$, and the number of such sequences is $\binom{n}{k}$, so
$$p(k)=\binom{n}{k}p^k(1-p)^{n-k}$$
where we use the notation
$$\binom{n}{k}=\frac{n!}{k!\,(n-k)!}$$
The numbers $\binom{n}{k}$ (read as "$n$ choose $k$") are known as the binomial coefficients, while the probabilities $p(k)$ are known as the binomial probabilities.
Note that the binomial probabilities $p(k)$ must add to 1, thus giving the binomial formula
$$\sum_{k=0}^{n}\binom{n}{k}p^k(1-p)^{n-k}=1$$
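As a quick numerical check (a sketch, not from the book), the binomial probabilities can be computed with Python's `math.comb` and verified to sum to 1:

```python
from math import comb

def binomial_pmf(n, k, p):
    """P(exactly k heads in n independent tosses), heads probability p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 10, 0.3  # illustrative values
pmf = [binomial_pmf(n, k, p) for k in range(n + 1)]

print(sum(pmf))  # the binomial probabilities add to 1 (up to float error)
```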
Exercises
Problem 30.
A hunter has two hunting dogs. One day, on the trail of some animal, the hunter comes to a place where the road diverges into two paths. He knows that each dog, independent of the other, will choose the correct path with probability $p$. The hunter decides to let each dog choose a path: if they agree, he takes that path, and if they disagree, he randomly picks a path. Is his strategy better than just letting one of the two dogs decide on a path?
SOLUTION
- The events that lead to the correct path are: (i) both dogs agree on the correct path (probability $p^2$), and (ii) the dogs disagree and the hunter happens to pick the correct path (probability $2p(1-p)\cdot\frac12$).
- The above events are disjoint, so we can add the probabilities:
$$p^2+2p(1-p)\cdot\frac12=p^2+p(1-p)=p$$
- Thus, the two strategies are equally effective: letting a single dog choose also succeeds with probability $p$.
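The conclusion is easy to confirm by simulation; a sketch with an arbitrary value of $p$ (not from the book):

```python
import random

random.seed(0)

def two_dog_strategy(p, trials=200_000):
    """Fraction of trials in which the hunter ends up on the correct path."""
    correct = 0
    for _ in range(trials):
        d1 = random.random() < p            # dog 1 picks the correct path
        d2 = random.random() < p            # dog 2, independently
        if d1 == d2:
            choice = d1                     # they agree: follow them
        else:
            choice = random.random() < 0.5  # they disagree: pick at random
        correct += choice
    return correct / trials

p = 0.7  # illustrative value
est = two_dog_strategy(p)
print(est)  # close to p, same as trusting a single dog
```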
Problem 33.
Using a biased coin to make an unbiased decision. Alice and Bob want to choose between the opera and the movies by tossing a fair coin. Unfortunately, the only available coin is biased (though the bias is not known exactly). How can they use the biased coin to make a decision so that either option (opera or the movies) is equally likely to be chosen?
SOLUTION
- Flip the coin twice. If the outcome is heads-tails, choose the opera; if the outcome is tails-heads, choose the movies. Otherwise, repeat the process until a decision can be made.
- Let $A_k$ be the event that a decision was made at the $k$th round. Conditional on the event $A_k$, the two choices are equally likely, and we have
$$P(\text{opera})=\sum_{k=1}^{\infty}P(\text{opera}\mid A_k)P(A_k)=\frac12\sum_{k=1}^{\infty}P(A_k)=\frac12$$
(Here $\sum_k P(A_k)=1$: as long as the coin is neither always heads nor always tails, a decision is eventually made with probability 1.)
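This procedure (von Neumann's trick) is easy to simulate; a sketch with an arbitrary bias of 0.8, not taken from the book:

```python
import random

random.seed(1)

def unbiased_choice(p_heads):
    """Toss the biased coin in pairs until the two tosses differ."""
    while True:
        first = random.random() < p_heads
        second = random.random() < p_heads
        if first != second:
            return "opera" if first else "movies"  # HT -> opera, TH -> movies

n_rounds = 100_000
share = sum(unbiased_choice(0.8) == "opera" for _ in range(n_rounds)) / n_rounds
print(share)  # close to 1/2 even though the coin is biased
```

The two discordant outcomes HT and TH have equal probability $p(1-p)$, which is why the bias cancels.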
Problem 41.
Consider a game show with an infinite pool of contestants, where at each round $i$, contestant $i$ obtains a number by spinning a continuously calibrated wheel. The contestant with the smallest number thus far survives. Successive wheel spins are independent and we assume that there are no ties. Let $N$ be the round at which contestant $1$ is eliminated. For any positive integer $n$, find $P(N=n)$.
SOLUTION 1
- For $i\leq j$, let $A_{i,j}$ be the event that contestant $i$'s number is the smallest of the numbers of contestants $1,\dots,j$. We have
$$P(N=n)=P(A_{1,n-1}\cap A_{n,n})=P(A_{1,n-1})\,P(A_{n,n}\mid A_{1,n-1})$$
where by symmetry
$$P(A_{1,n-1})=\frac{1}{n-1}$$
- We claim that
$$P(A_{n,n}\mid A_{1,n-1})=P(A_{n,n})=\frac{1}{n}$$
The reason is that by symmetry, we have
$$P(A_{n,n}\mid A_{i,n-1})=P(A_{n,n}\mid A_{1,n-1}),\qquad i=1,\dots,n-1$$
while by the total probability theorem,
$$P(A_{n,n})=\sum_{i=1}^{n-1}P(A_{i,n-1})\,P(A_{n,n}\mid A_{i,n-1})=P(A_{n,n}\mid A_{1,n-1})\sum_{i=1}^{n-1}P(A_{i,n-1})=P(A_{n,n}\mid A_{1,n-1})$$
Hence
$$P(N=n)=\frac{1}{(n-1)n}$$
SOLUTION 2
- Let us fix a particular choice of $n$. Think of an outcome of the experiment as an ordering of the values of the $n$ contestants, so that there are $n!$ equally likely outcomes. The event $\{N=n\}$ occurs if and only if the first contestant's number is smallest among the first $n-1$ contestants, and contestant $n$'s number is the smallest among the first $n$ contestants. This event can occur in $(n-2)!$ different ways, namely, all the possible ways of ordering contestants $2,\dots,n-1$. Thus, the probability of this event is $(n-2)!/n!=1/(n(n-1))$, in agreement with the previous solution.
- Alternatively, condition on contestant 1's number: if contestant 1 draws the value $i$ (uniform on $[0,1]$), then $P(N=n\mid i)=(1-i)^{n-2}\,i$, since contestants $2,\dots,n-1$ must all draw values above $i$ while contestant $n$ draws below it. Integrating over $i\in[0,1]$ gives the same answer:
$$\int_0^1 (1-i)^{n-2}\,i\,di=\frac{1}{(n-1)n}$$
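Since contestant 1 holds the running minimum until eliminated, $N$ is simply the first round whose spin falls below contestant 1's number, which makes the answer easy to check by simulation (a sketch; the seed and truncation are arbitrary choices):

```python
import random

random.seed(2)

def elimination_round(max_round=50):
    """First round n >= 2 whose spin beats contestant 1's number."""
    x1 = random.random()            # contestant 1's number (the running
    for n in range(2, max_round):   # minimum while 1 is still in the game)
        if random.random() < x1:    # contestant n spins a smaller number
            return n
    return max_round                # truncation for the rare long games

trials = 200_000
counts = {}
for _ in range(trials):
    r = elimination_round()
    counts[r] = counts.get(r, 0) + 1

for n in (2, 3, 4):
    print(n, counts[n] / trials, 1 / (n * (n - 1)))  # estimate vs. formula
```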
Problem 42. Gambler’s ruin.
A gambler makes a sequence of independent bets. In each bet, he wins \$1 with probability $p$, and loses \$1 with probability $1-p$. Initially, the gambler has $k$ dollars, and plays until he either accumulates $n$ dollars or has no money left. What is the probability that the gambler will end up with $n$ dollars?
SOLUTION
- Let us denote by $A$ the event that he ends up with $n$ dollars, and by $F$ the event that he wins the first bet. Denote also by $w_k$ the probability of event $A$, if he starts with $k$ dollars. We apply the total probability theorem to obtain
$$w_k=P(F)P(A\mid F)+P(F^C)P(A\mid F^C)=p\,P(A\mid F)+q\,P(A\mid F^C)$$
where $q=1-p$. Since the bets are independent, $P(A\mid F)=w_{k+1}$ and $P(A\mid F^C)=w_{k-1}$. Thus, we have $w_k=p\,w_{k+1}+q\,w_{k-1}$, which can be written as
$$w_{k+1}-w_k=r\,(w_k-w_{k-1})$$
where $r=q/p$. We will solve for $w_k$ in terms of $p$ and $q$ using iteration, and the boundary values $w_0=0$ and $w_n=1$.
- We have $w_{k+1}-w_k=r^k(w_1-w_0)$, and since $w_0=0$,
$$w_{k+1}-w_k=r^k\,w_1$$
We then have
$$w_k=(w_k-w_{k-1})+\cdots+(w_2-w_1)+w_1=\left(r^{k-1}+\cdots+r+1\right)w_1$$
Since $w_n=1$, we can solve for $w_1$ and therefore for $w_k$:
$$w_1=\frac{1}{1+r+\cdots+r^{n-1}}$$
so that
$$w_k=\frac{1+r+\cdots+r^{k-1}}{1+r+\cdots+r^{n-1}}=\begin{cases}\dfrac{1-r^k}{1-r^n},&\text{if }r\neq 1,\\[1mm]\dfrac{k}{n},&\text{if }r=1.\end{cases}$$
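The geometric-sum expression can be cross-checked against a direct Monte Carlo simulation (a sketch; the stakes and seed are arbitrary choices):

```python
import random

random.seed(4)

def ruin_prob(k, n, p):
    """w_k = (1 + r + ... + r^(k-1)) / (1 + r + ... + r^(n-1)), r = q/p."""
    r = (1 - p) / p
    return sum(r**i for i in range(k)) / sum(r**i for i in range(n))

def simulate(k, n, p, trials=100_000):
    """Monte Carlo estimate of the probability of reaching n dollars before 0."""
    wins = 0
    for _ in range(trials):
        wealth = k
        while 0 < wealth < n:
            wealth += 1 if random.random() < p else -1
        wins += wealth == n
    return wins / trials

print(ruin_prob(3, 10, 0.45), simulate(3, 10, 0.45))  # the two agree closely
```

The geometric-sum form handles $r=1$ (a fair game) without a special case, since the sums then reduce to $k/n$.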
Problem 46. Laplace’s rule of succession.
Consider $m+1$ boxes with the $k$th box containing $k$ red balls and $m-k$ white balls, where $k$ ranges from $0$ to $m$. We choose a box at random (all boxes are equally likely) and then choose a ball at random from that box, $n$ successive times (the ball drawn is replaced each time, and a new ball is selected independently). Suppose a red ball was drawn each of the $n$ times. What is the probability that if we draw a ball one more time it will be red? Estimate this probability for large $m$.
SOLUTION
- We want to find the conditional probability $P(E\mid R_n)$, where $E$ is the event of a red ball drawn at time $n+1$, and $R_n$ is the event of a red ball drawn each of the $n$ preceding times:
$$P(E\mid R_n)=\frac{P(E\cap R_n)}{P(R_n)}$$
and by using the total probability theorem (conditioning on the chosen box), we obtain
$$P(R_n)=\frac{1}{m+1}\sum_{k=0}^{m}\left(\frac{k}{m}\right)^{n}$$
- For large $m$, we can view $P(R_n)$ as a piecewise constant approximation to an integral:
$$P(R_n)\approx\frac{1}{m+1}\int_{0}^{m}\left(\frac{x}{m}\right)^{n}dx=\frac{m}{(m+1)(n+1)}\approx\frac{1}{n+1}$$
Similarly,
$$P(E\cap R_n)=P(R_{n+1})\approx\frac{1}{n+2}$$
so that
$$P(E\mid R_n)\approx\frac{n+1}{n+2}$$
Thus, for large $m$, drawing a red ball one more time is almost certain when $n$ is large.
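The exact finite-$m$ probability can be computed with rational arithmetic and compared against the large-$m$ estimate $(n+1)/(n+2)$; a sketch with illustrative sizes:

```python
from fractions import Fraction

def prob_next_red(m, n):
    """Exact P(E | R_n): m+1 boxes, box k holds k red and m-k white balls."""
    # P(R_n) = (1/(m+1)) * sum_k (k/m)^n  and  P(E ∩ R_n) = P(R_{n+1}).
    p_rn = sum(Fraction(k, m)**n for k in range(m + 1)) / (m + 1)
    p_rn1 = sum(Fraction(k, m)**(n + 1) for k in range(m + 1)) / (m + 1)
    return p_rn1 / p_rn

m, n = 500, 10  # illustrative sizes
exact = float(prob_next_red(m, n))
print(exact, (n + 1) / (n + 2))  # exact value vs. large-m approximation
```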
Problem 48. The Borel-Cantelli lemma.
Consider an infinite sequence of trials. The probability of success at the $i$th trial is some positive number $p_i$. Let $N$ be the event that there is no success, and let $I$ be the event that there is an infinite number of successes.
- (a) Assume that the trials are independent and that $\sum_{i=1}^{\infty}p_i=\infty$. Show that $P(N)=0$ and $P(I)=1$.
- (b) Assume that $\sum_{i=1}^{\infty}p_i<\infty$. Show that $P(I)=0$.
SOLUTION
(a)
- The event $N$ is a subset of the event that there were no successes in the first $n$ trials, so that
$$P(N)\leq\prod_{i=1}^{n}(1-p_i)$$
Taking logarithms,
$$\log P(N)\leq\sum_{i=1}^{n}\log(1-p_i)\leq\sum_{i=1}^{n}(-p_i)$$
Taking the limit as $n$ tends to infinity, we obtain $\log P(N)=-\infty$, or $P(N)=0$.
- Let now $L_n$ be the event that there is a finite number of successes and that the last success occurs at the $n$th trial. We use the already established result $P(N)=0$, and apply it to the sequence of trials after trial $n$, to obtain $P(L_n)=0$. The event $I^C$ (finite number of successes) is the union of the disjoint events $L_n$, $n\geq1$, and $N$, so that
$$P(I^C)=P(N)+\sum_{n=1}^{\infty}P(L_n)=0$$
and $P(I)=1$.
(b)
- Let $S_i$ be the event that the $i$th trial is a success. Fix some number $n$ and for every $i>n$, let $F_i$ be the event that the first success after time $n$ occurs at time $i$. Note that $F_i\subset S_i$. Finally, let $A_n$ be the event that there is at least one success after time $n$. Note that $I\subset A_n$. Furthermore, the event $A_n$ is the union of the disjoint events $F_i$, $i>n$. Therefore,
$$P(I)\leq P(A_n)=\sum_{i=n+1}^{\infty}P(F_i)\leq\sum_{i=n+1}^{\infty}P(S_i)=\sum_{i=n+1}^{\infty}p_i$$
We take the limit of both sides as $n\rightarrow\infty$. Because of the assumption $\sum_{i=1}^{\infty}p_i<\infty$, the right-hand side converges to zero. This implies that $P(I)=0$.
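The dichotomy can be illustrated numerically (only suggestively, since any simulation is a finite truncation): with $p_i = 1/i$ the partial sums diverge and successes keep accruing, while with $p_i = 1/i^2$ only a handful of successes are ever expected. A sketch, with an arbitrary seed and truncation length:

```python
import random

random.seed(3)

def count_successes(ps):
    """Number of successes in independent trials with success probabilities ps."""
    return sum(random.random() < p for p in ps)

# Finite truncation of the infinite sequence (an illustration, not a proof).
N = 100_000
divergent = [1 / i for i in range(1, N + 1)]     # sum ~ log N, diverges
convergent = [1 / i**2 for i in range(1, N + 1)]  # sum -> pi^2/6, converges

d_count = count_successes(divergent)
c_count = count_successes(convergent)
print(d_count, c_count)  # many successes vs. only a few
```

In the divergent case the expected count grows like $\log N$ without bound; in the convergent case it stays below $\pi^2/6 \approx 1.64$ no matter how far the sequence is extended.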