Chapter 5 Random Variables on a Countable Space
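Notes for Advanced Probability Theory, a first-semester graduate statistics course at Nanjing Audit University.
The source for these notes is available on GitHub: https://github.com/Berry-Wen/statistics-note-system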
Random variable
In this chapter, we again assume $\Omega$ is countable and $\mathcal{A} = 2^{\Omega}$.
A random variable $X$ is defined to be a function from $\Omega$ into a set $T$.
- A random variable represents an unknown quantity (hence the term variable) that varies not as a variable in an algebraic relation, but rather with the outcome of a random event.
- Before the random event, we know which values $X$ could possibly assume, but we do not know which one it will take until the random event happens.
Remark
Note that even if the state space (or range space) $T$ is not countable, the image $T'$ of $\Omega$ under $X$ (that is, the set of all points $i$ in $T$ for which there exists an $\omega \in \Omega$ such that $X(\omega)=i$) is either finite or countably infinite.
Distribution
We define the distribution of $X$ (also called the law of $X$) on the range space $T'$ of $X$ by
$$P^{X}(A) = P(\{\omega : X(\omega) \in A\}) = P(X^{-1}(A)) = P(X \in A).$$
That this formula defines a probability measure on $T'$ (with the $\sigma$-algebra $2^{T'}$ of all subsets of $T'$) is evident.
Remark
Since $T'$ is at most countable, this probability is completely determined by the following numbers:
$$p^X_j = P(X=j) = \sum_{\{\omega : X(\omega)=j\}} p_{\omega},$$
where $p_{\omega} = P(\{\omega\})$.
Sometimes the family $(p_j^X : j \in T')$ is also called the distribution (or the law) of $X$.
We have of course
$$P^X(A) = \sum_{j \in A} p_j^X.$$
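The construction of $P^X$ from the weights $p_\omega$ can be sketched in a few lines of Python. This is only an illustration: the two-dice sample space, the choice of $X$ as the sum of the faces, and the helper names `distribution` and `prob` are assumptions made for the example, not part of the original notes.

```python
from collections import defaultdict
from fractions import Fraction

def distribution(p, X):
    """Return {j: P(X = j)} by summing p[omega] over {omega : X(omega) = j}."""
    law = defaultdict(Fraction)
    for omega, weight in p.items():
        law[X(omega)] += weight
    return dict(law)

def prob(law, A):
    """P^X(A) = sum of p_j^X over j in A."""
    return sum(law[j] for j in A if j in law)

# Hypothetical countable (here finite) space: two fair dice, X = sum of the faces.
omega_space = [(i, j) for i in range(1, 7) for j in range(1, 7)]
p = {omega: Fraction(1, 36) for omega in omega_space}
law = distribution(p, sum)

print(law[7])                # 1/6
print(prob(law, {2, 3, 4}))  # 1/6
```

Exact fractions are used so that the check that the numbers $p_j^X$ sum to 1 holds without rounding error.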
Write $E\{X\} = \sum_{\omega \in \Omega} X(\omega)\, p_{\omega}$ for the expectation of a real-valued random variable $X$, which is finite when this series converges absolutely.
Define $\mathcal{L}^1$ to be the space of real-valued random variables on $(\Omega,\mathcal{A},P)$ which have a finite expectation, i.e.
$$\mathcal{L}^1 = \{X : X \text{ is a real-valued random variable on } (\Omega,\mathcal{A},P) \text{ and } E\{|X|\} < \infty\}.$$
- $\mathcal{L}^1$ is a vector space, and the expectation operator $E$ is linear.
- The expectation operator $E$ is positive: if $X \in \mathcal{L}^1$ and $X \ge 0$, then $E(X) \ge 0$. If $X, Y \in \mathcal{L}^1$ and $X \le Y$, then $E(X) \le E(Y)$.
- $\mathcal{L}^1$ contains all bounded random variables. If $X \equiv a$, then $E(X) = a$.
- If $X \in \mathcal{L}^1$, its expectation depends only on its distribution: if $T'$ is the range of $X$, then
  $$E\{X\} = \sum_{j \in T'} j\, P(X=j).$$
- If $X = 1_{A}$ is the indicator function of an event $A$, then $E\{X\} = P(A)$.
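Two of the properties above (the expectation depends only on the distribution, and $E\{1_A\} = P(A)$) can be checked on a small example. The sample space, weights and the function $X$ below are hypothetical choices for illustration only.

```python
from collections import defaultdict
from fractions import Fraction

# Hypothetical example: Omega = {0,...,5}, uniform weights, X(omega) = (omega - 2)^2.
p = {omega: Fraction(1, 6) for omega in range(6)}

def X(omega):
    return (omega - 2) ** 2

# Expectation as a sum over outcomes: E{X} = sum_omega X(omega) p_omega.
E_omega = sum(X(omega) * w for omega, w in p.items())

# Expectation through the distribution: E{X} = sum_j j P(X = j).
law = defaultdict(Fraction)
for omega, w in p.items():
    law[X(omega)] += w
E_law = sum(j * pj for j, pj in law.items())

# Indicator of the event A = {X >= 4}: E{1_A} should equal P(A).
A = {omega for omega in p if X(omega) >= 4}
E_indicator = sum((1 if omega in A else 0) * w for omega, w in p.items())
P_A = sum(p[omega] for omega in A)

print(E_omega, E_law)    # 19/6 19/6
print(E_indicator, P_A)  # 1/2 1/2
```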
Theorem 5.1 (a generalized Markov inequality, used to obtain upper bounds)
Let $h:\mathbb{R} \to [0, \infty)$ be a nonnegative function and let $X$ be a real-valued random variable. Then
$$P\{\omega : h(X(\omega)) \ge a\} \le \frac{E\{h(X)\}}{a} \qquad \forall a>0.$$
Proof.
Since $X$ is a random variable, so is $Y = h(X)$. Let
$$A = Y^{-1}([a, \infty)) = \{\omega : h(X(\omega)) \ge a\} = \{h(X) \ge a\}.$$
Then $h(X) \ge a 1_{A}$, hence
$$E\{h(X)\} \ge E\{a 1_{A}\} = a E\{1_{A}\} = a P(A),$$
and dividing by $a > 0$ gives the result.
Proof 2 (a slightly better proof)
$$\begin{aligned}
& P(\{\omega \in \Omega : h(X(\omega)) \ge a\}) \\
=& E\{1_{\{\omega \in \Omega : h(X(\omega)) \ge a\}}\} \\
=& \int_{\Omega} 1_{\{h(X(\omega)) \ge a\}} \, dP \\
=& \int_{\{\omega \in \Omega : h(X(\omega)) \ge a\}} 1 \, dP \\
\le & \int_{\{\omega \in \Omega : h(X(\omega)) \ge a\}} \frac{h(X(\omega))}{a} \, dP \\
\le & \int_{\Omega} \frac{h(X(\omega))}{a} \, dP \\
=& \frac{E\{h(X)\}}{a}
\end{aligned}$$
The first step uses the fact that a probability can be written as the expectation of an indicator function; the last inequality uses $h \ge 0$.
Corollary 5.1 Markov’s Inequality
$$P\{|X| \ge a\} \le \frac{E\{|X|\}}{a} \qquad \text{for } a > 0.$$
Proof.
Take $h(x) = |x|$ in Theorem 5.1.
Proof 2 (avoid invoking Theorem 5.1 directly; instead repeat the argument of its proof). This is the method to use in an exam.
$$\begin{aligned}
P\{|X| \ge a\} &= E\{1_{\{|X| \ge a\}}\} \\
&= \int_{\{|X| \ge a\}} 1 \, dP \\
&\le \int_{\{|X| \ge a\}} \frac{|X|}{a} \, dP \\
&\le \int \frac{|X|}{a} \, dP \\
&= \frac{E\{|X|\}}{a}
\end{aligned}$$
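As a sanity check of Theorem 5.1 and Markov's inequality, the following sketch compares the exact tail probability with the bound $E\{h(X)\}/a$ for a small discrete law. The law itself and the helper name `tail_and_bound` are assumptions made for illustration.

```python
from fractions import Fraction

def tail_and_bound(law, a, h=abs):
    """Return (P{h(X) >= a}, E{h(X)} / a) for a finite law {j: P(X = j)}."""
    tail = sum(p for j, p in law.items() if h(j) >= a)
    bound = sum(h(j) * p for j, p in law.items()) / a
    return tail, bound

# Hypothetical distribution of X on five points.
law = {
    -3: Fraction(1, 10),
    -1: Fraction(2, 10),
    0: Fraction(3, 10),
    2: Fraction(3, 10),
    5: Fraction(1, 10),
}

for a in (1, 2, 5):
    tail, bound = tail_and_bound(law, a)  # Markov: h(x) = |x|
    print(f"a={a}: P(|X| >= a) = {tail} <= {bound}")

# Theorem 5.1 with another nonnegative h, here h(x) = x^2.
print(tail_and_bound(law, 4, h=lambda x: x * x))
```

For small $a$ the bound can exceed 1, in which case it carries no information; the inequality is most useful far in the tail.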
Definition 5.2 (Variance)
Let $X$ be a real-valued random variable with $X^{2}$ in $\mathcal{L}^{1}$. The variance of $X$ is defined to be
$$\sigma^{2} = \sigma_{X}^{2} \equiv E\{(X - E(X))^2\}.$$
The standard deviation of $X$, $\sigma_X$, is the nonnegative square root of the variance.
- The primary use of the standard deviation is to report statistics in the correct (and meaningful) units.
Condition: the second moment must exist, that is, $X^2 \in \mathcal{L}^1$, i.e. $E(X^2)$ is finite. This can also be written $X \in \mathcal{L}^{2}$.
Remark
- $E\{X\}$ represents the expected, or average, value of $X$ (often called the mean).
- $E\{|X-E(X)|\} = E\{|X-\mu|\}$, where $\mu = E\{X\}$, represents the average distance from the mean, and is a measure of how “spread out” the values of $X$ are. It measures how the values vary from the mean. This quantity is the first absolute central moment.
- The variance is the average squared distance from the mean. This has the effect of diminishing small deviations from the mean and enlarging big ones (a kind of Matthew effect).
- The variance is usually easier to compute than $E\{|X-\mu|\}$, and it often has a simpler expression, although it has drawbacks of its own.
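To make the comparison between the variance and $E\{|X-\mu|\}$ concrete, here is a minimal sketch for a fair six-sided die. The die example and the helper `expect` are illustrative assumptions, not taken from the notes.

```python
from fractions import Fraction
import math

# Hypothetical example: X is the outcome of a fair six-sided die.
law = {j: Fraction(1, 6) for j in range(1, 7)}

def expect(g, law):
    """E{g(X)} = sum_j g(j) P(X = j)."""
    return sum(g(j) * pj for j, pj in law.items())

mu = expect(lambda j: j, law)
variance = expect(lambda j: (j - mu) ** 2, law)  # E{(X - mu)^2}
abs_dev = expect(lambda j: abs(j - mu), law)     # E{|X - mu|}
std = math.sqrt(variance)                        # same units as X

print(mu, variance, abs_dev)  # 7/2 35/12 3/2
print(round(std, 4))          # 1.7078
```

The standard deviation (about 1.71) is larger than the mean absolute deviation (1.5) here, reflecting the extra weight that squaring gives to the outcomes furthest from the mean.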
The “big four” statistics journals
- JRSSB: Journal of the Royal Statistical Society, Series B
- AoS: Annals of Statistics
- JASA: Journal of the American Statistical Association
- Biometrika
Chebyshev’s Inequality
If $X^2$ is in $\mathcal{L}^1$, then we have
$$\begin{aligned}
(a)\quad & P\{|X| \ge a\} \le \frac{E\{X^2\}}{a^2} \quad \text{for } a>0, \\
(b)\quad & P\{|X-E\{X\}| \ge a\} \le \frac{\sigma_{X}^2}{a^2} \quad \text{for } a>0.
\end{aligned}$$
The proof again follows the method used to prove Theorem 5.1 (omitted).
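For reference, one way the argument might be written out, in the same style as Proof 2 of Markov's inequality, is sketched below. The layout is an assumption; the notes themselves do not spell out the steps.

```latex
\begin{aligned}
\text{(a)}\quad P\{|X| \ge a\}
  &= P\{X^2 \ge a^2\}
   = \int_{\{X^2 \ge a^2\}} 1 \, dP \\
  &\le \int_{\{X^2 \ge a^2\}} \frac{X^2}{a^2} \, dP
   \le \int \frac{X^2}{a^2} \, dP
   = \frac{E\{X^2\}}{a^2}, \\
\text{(b)}\quad P\{|X - E\{X\}| \ge a\}
  &\le \frac{E\{(X - E\{X\})^2\}}{a^2}
   = \frac{\sigma_X^2}{a^2}
  \qquad \text{(apply (a) to } X - E\{X\}\text{)}.
\end{aligned}
```

Part (a) is Theorem 5.1 with $h(x) = x^2$ and threshold $a^2$; part (b) is part (a) applied to the centered variable.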
Examples
- $X$ is Poisson with parameter $\lambda$. Then $E\{X\} = \lambda$.
- $X$ has the Bernoulli distribution if $X$ takes on only two values: 0 and 1.
  $X$ corresponds to an experiment with only two outcomes, usually called “success” and “failure”. Usually $\{X=1\}$ corresponds to “success”.
  It is also customary to write $P(\{X=1\}) = p$ and $P(\{X=0\}) = q = 1-p$.
  Note that $E\{X\} = p$.
- $X$ has the Binomial distribution if $P^{X}$ is the Binomial probability. That is, for given and fixed $n$, $X$ can take on the values $\{0,1,\dots,n\}$ and
  $$P(\{X=k\}) = C_n^k p^k (1-p)^{n-k}, \qquad \text{where } 0 \le p \le 1 \text{ is fixed.}$$
  Suppose we perform a success/failure experiment $n$ times independently. Let
  $$Y_i = \begin{cases} 1 & \text{if success on the } i^{th} \text{ trial} \\ 0 & \text{if failure on the } i^{th} \text{ trial} \end{cases}$$
  Then $X = Y_1 + \dots + Y_n$ has the Binomial distribution; here $X$ is the number of successes.
  That is, a Binomial random variable is the sum of $n$ independent Bernoulli random variables. Therefore
  $$E\{X\} = E\left\{\sum_{i=1}^{n} Y_i\right\} = \sum_{i=1}^{n} E\{Y_i\} = np.$$
  Note that we could also have computed $E\{X\}$ combinatorially by using the definition, but this would have been an unpleasant calculation.
- We are performing repeated independent Bernoulli trials. Suppose that, instead of fixing the number $n$ of trials in advance, we keep performing trials until we have achieved a given number of successes.
  Let $X$ denote the number of failures before we reach a success. (Note: $X$ counts the failures before the first success rather than the number of trials; the advantage of this convention becomes clear with the Negative Binomial distribution.)
  $X$ has a Geometric distribution with parameter $1-p$:
  $$P(X=k) = (1-p)^{k} p, \quad k = 0,1,2,\dots$$
  where $p$ is the probability of success; note that $k$ starts from 0.
  We then have $E\{X\} = \frac{1-p}{p} = \frac{1}{p} - 1$.
  Interpretation: if the probability of success is 0.1, then on average 10 trials are needed to obtain a success, that is, on average 9 failures occur before the first success.
- In the same framework as the Geometric distribution, if we continue independent Bernoulli trials until we achieve the $r^{th}$ success, then we have Pascal’s distribution, also known as the Negative Binomial distribution.
  We say $X$ has the Negative Binomial distribution with parameters $r$ and $p$ if
  $$P(X=j) = C_{j+r-1}^{r-1} p^r (1-p)^{j} \quad \text{for } j = 0,1,\dots$$
  $X$ represents the number of failures that must be observed before $r$ successes are observed. The advantage of defining $X$ this way is that it decomposes into the numbers of failures between consecutive successes.
  If one is interested in the total number of trials required, call that random variable $Y$; then $Y = X + r$.
  Note that if $X$ is Negative Binomial, then
  $$X = \sum_{i=1}^{r} Z_i,$$
  where the $Z_i$ are i.i.d. (independent and identically distributed) Geometric random variables with parameter $1-p$. Therefore
  $$E\{X\} = \sum_{i=1}^{r} E\{Z_i\} = \frac{r(1-p)}{p} = r \times \frac{1-p}{p}.$$
- A distribution common in the social sciences is the Pareto distribution, also known as the Zeta distribution. $X$ takes its values in $\mathbb{Z}^{+}$, where
  $$P(X=j) = c\,\frac{1}{j^{\alpha+1}}, \quad j = 1,2,3,\dots \qquad \text{for a fixed parameter } \alpha > 0.$$
  The constant $c$ is such that $c \sum_{j=1}^{\infty} \frac{1}{j^{\alpha+1}} = 1$. The function
  $$\zeta(s) = \sum_{k=1}^{\infty} \frac{1}{k^s}, \quad s > 1$$
  is known as the Riemann zeta function, and it is extensively tabulated. Thus $c = \frac{1}{\zeta(\alpha+1)}$ and
  $$P(X=j) = \frac{1}{\zeta(\alpha+1)} \frac{1}{j^{\alpha+1}}.$$
  If $\alpha > 1$, then
  $$E\{X\} = \sum_{j=1}^{\infty} j P(X=j) = \frac{1}{\zeta(\alpha+1)} \sum_{j=1}^{\infty} \frac{j}{j^{\alpha+1}} = \frac{1}{\zeta(\alpha+1)} \sum_{j=1}^{\infty} \frac{1}{j^{\alpha}} = \frac{\zeta(\alpha)}{\zeta(\alpha+1)}.$$
- If the state space $T$ of a random variable $X$ has only a finite number of points, say $n$, and each point is equally likely, then $X$ is said to have a uniform distribution. In the case where
  $$P(X=j) = \frac{1}{n}, \quad j = 1,2,\dots,n,$$
  $X$ has the Discrete Uniform distribution with parameter $n$, and $E\{X\} = \frac{n+1}{2}$.
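The expectations stated in the examples above can be checked numerically by summing $j\,P(X=j)$ directly. The sketch below does this for arbitrary illustrative parameter values; the truncation point `K` for infinite supports and the number of terms used to approximate $\zeta$ are assumptions chosen so that the neglected tails are tiny.

```python
import math

K = 500  # truncation point for infinite supports (assumption: tails beyond K are negligible)
n, p, r, lam, alpha = 10, 0.3, 4, 2.5, 3.0  # illustrative parameter values

def mean(pmf, support):
    """E{X} = sum_j j * P(X = j) over the given support."""
    return sum(j * pmf(j) for j in support)

def poisson(k):
    # Evaluated in log space to avoid overflow in lam**k and k!.
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))

def binomial(k):
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def geometric(k):
    return (1 - p)**k * p

def neg_binomial(j):
    return math.comb(j + r - 1, r - 1) * p**r * (1 - p)**j

def zeta(s, terms=200_000):
    """Partial sum approximating the Riemann zeta function for s > 1."""
    return sum(1.0 / k**s for k in range(1, terms + 1))

def zeta_pmf(j, c=1.0 / zeta(alpha + 1)):
    return c / j**(alpha + 1)

def uniform(j):
    return 1.0 / n

results = {
    "Poisson": (mean(poisson, range(K)), lam),
    "Binomial": (mean(binomial, range(n + 1)), n * p),
    "Geometric": (mean(geometric, range(K)), (1 - p) / p),
    "Negative Binomial": (mean(neg_binomial, range(K)), r * (1 - p) / p),
    "Zeta": (mean(zeta_pmf, range(1, 200_001)), zeta(alpha) / zeta(alpha + 1)),
    "Uniform": (mean(uniform, range(1, n + 1)), (n + 1) / 2),
}
for name, (numeric, closed_form) in results.items():
    print(f"{name}: numeric={numeric:.6f}, closed form={closed_form:.6f}")
```

In each row the numerically summed mean and the closed-form expression should agree to the displayed precision.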
Exercise 5.2 (try to prove it without invoking the theorem)
Let $h:\mathbb{R}\to[0,\alpha]$ be a nonnegative (bounded) function.
Show that for $0 \le a < \alpha$,
$$P\{h(X) \ge a\} \ge \frac{E\{h(X)-a\}}{\alpha - a}.$$
Proof (supplementary).
$$\begin{aligned}
& P\{h(X) \ge a\} \ge \frac{E\{h(X)-a\}}{\alpha-a} \\
\Leftrightarrow\ & P\{h(X) < a\} \le \frac{\alpha - E\{h(X)\}}{\alpha-a} \\
\Leftrightarrow\ & P\{\alpha - h(X) > \alpha - a\} \le \frac{\alpha - E\{h(X)\}}{\alpha-a}
\end{aligned}$$
The last inequality holds by the argument used to prove Theorem 5.1, applied to the nonnegative random variable $\alpha - h(X)$: since $\alpha - h(X) \ge (\alpha - a)\, 1_{\{\alpha - h(X) > \alpha - a\}}$, taking expectations gives $\alpha - E\{h(X)\} \ge (\alpha - a)\, P\{\alpha - h(X) > \alpha - a\}$, and dividing by $\alpha - a > 0$ completes the proof.
Exercise 5.9-5.10
Let $X$ be Poisson($\lambda$).
- What value of $j$ maximizes $P(X=j)$?
  Solution. Consider the ratio of consecutive probabilities:
  $$\frac{P(X=j)}{P(X=j+1)} = \cdots = \frac{j+1}{\lambda}.$$
  Hence $P(X=j) < P(X=j+1)$ when $j+1 < \lambda$, and $P(X=j) > P(X=j+1)$ when $j+1 > \lambda$. The maximum is therefore attained at $j = \lfloor \lambda \rfloor$, the largest integer not exceeding $\lambda$; when $\lambda$ is an integer, both $j = \lambda - 1$ and $j = \lambda$ attain it.
- For fixed $j>0$, what value of $\lambda$ maximizes $P(X=j)$?
  Solution. Regard
  $$P\{X=j\} = \frac{\lambda^j}{j!} e^{-\lambda}$$
  as a function of $\lambda$, say $f(\lambda) = \frac{\lambda^j}{j!} e^{-\lambda}$. Then
  $$f'(\lambda) = \cdots = \frac{1}{j!}(\cdots) = \frac{e^{-\lambda}\lambda^{j-1}}{j!}(j - \lambda) = 0 \quad \Longrightarrow \quad \lambda = j.$$
  Since $f'(\lambda) > 0$ for $\lambda < j$ and $f'(\lambda) < 0$ for $\lambda > j$, $f(\lambda)$ attains its maximum at $\lambda = j$.
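Both answers can be checked by brute force. The sketch below searches over $j$ for fixed $\lambda$ and over a grid of $\lambda$ values for fixed $j$; the search ranges, the grid step of 0.01 and the function names are assumptions chosen for illustration.

```python
import math

def poisson_pmf(lam, j):
    return math.exp(-lam) * lam**j / math.factorial(j)

def mode(lam, j_max=100):
    """Value of j in {0, ..., j_max} maximizing P(X = j) for X ~ Poisson(lam)."""
    return max(range(j_max + 1), key=lambda j: poisson_pmf(lam, j))

def best_lambda(j, grid_max=10.0, step=0.01):
    """Grid value of lambda in (0, grid_max] maximizing P(X = j)."""
    grid = [i * step for i in range(1, int(round(grid_max / step)) + 1)]
    return max(grid, key=lambda lam: poisson_pmf(lam, j))

print(mode(3.5), mode(0.4), mode(7.2))  # 3 0 7
print(round(best_lambda(4), 2))         # 4.0
```

Non-integer values of $\lambda$ are used for the first check because, for integer $\lambda$, the two maximizers $\lambda - 1$ and $\lambda$ tie exactly in theory and floating-point rounding would decide between them arbitrarily.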
Exercise 5.11
Let $X$ be Poisson($\lambda$) with $\lambda$ a positive integer.
Show
$$E\{|X-\lambda|\} = \frac{2\lambda^{\lambda} e^{-\lambda}}{(\lambda-1)!}.$$
Proof.
By definition,
$$E\{|X-\lambda|\} = \sum_{k=0}^{\lambda} (\lambda-k) \frac{\lambda^k}{k!}e^{-\lambda} + \sum_{k=\lambda+1}^{\infty} (k-\lambda) \frac{\lambda^k}{k!}e^{-\lambda} \triangleq l_1 + l_2.$$
On the other hand,
$$\begin{aligned}
l_1 &= \lambda \sum_{k=0}^{\lambda} \frac{\lambda^k}{k!}e^{-\lambda} - \sum_{k=1}^{\lambda} \frac{\lambda^k}{(k-1)!}e^{-\lambda} = \lambda \sum_{k=0}^{\lambda} \frac{\lambda^k}{k!}e^{-\lambda} - \lambda \sum_{k=0}^{\lambda-1} \frac{\lambda^k}{k!}e^{-\lambda}, \\
l_2 &= \sum_{k=\lambda+1}^{\infty} \frac{\lambda^k}{(k-1)!}e^{-\lambda} - \lambda \sum_{k=\lambda+1}^{\infty} \frac{\lambda^{k}}{k!} e^{-\lambda} = \lambda \sum_{k=\lambda}^{\infty} \frac{\lambda^k}{k!}e^{-\lambda} - \lambda \sum_{k=\lambda+1}^{\infty} \frac{\lambda^k}{k!}e^{-\lambda}.
\end{aligned}$$
Splitting $l_2$ into two series is legitimate because both series converge absolutely. In each line all terms cancel except the one with $k = \lambda$, so $l_1 = l_2 = \lambda \frac{\lambda^{\lambda}}{\lambda!} e^{-\lambda}$. Combining $l_1$ and $l_2$, we get
$$E\{|X-\lambda|\} = \frac{2\lambda^{\lambda+1} e^{-\lambda}}{\lambda!} = \frac{2\lambda^{\lambda} e^{-\lambda}}{(\lambda-1)!}.$$
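The identity can also be checked numerically for small integer values of $\lambda$. The truncation point `K` and the function names below are assumptions for this sketch; the Poisson probabilities are evaluated in log space to avoid overflow.

```python
import math

def mean_abs_dev(lam, K=200):
    """Truncated sum of |k - lam| * P(X = k) for X ~ Poisson(lam)."""
    total = 0.0
    for k in range(K):
        pk = math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))
        total += abs(k - lam) * pk
    return total

def closed_form(lam):
    """2 * lam^lam * e^(-lam) / (lam - 1)! for a positive integer lam."""
    return 2 * lam**lam * math.exp(-lam) / math.factorial(lam - 1)

for lam in range(1, 7):
    print(lam, round(mean_abs_dev(lam), 6), round(closed_form(lam), 6))
```

For $\lambda = 1$ both columns should show $2e^{-1} \approx 0.735759$.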
Exercise 5.13
Let $X_n$ be Binomial $B(n,p)$ with $0<p<1$ fixed.
Show that for any fixed $b>0$, $P(X_n \le b)$ tends to $0$ as $n \to \infty$.
Proof.
For $n \ge [b]$, where $[b]$ denotes the integer part of $b$, one has
$$\begin{aligned}
P(X_n \le b) &= \sum_{i=0}^{[b]} C_n^i p^i (1-p)^{n-i} \\
&\le \sum_{i=0}^{[b]} n^i (1-p)^{n-[b]} \\
&\le ([b]+1)\, n^{[b]} (1-p)^{n-[b]} \\
&\to 0 \qquad \text{as} \quad n \to \infty,
\end{aligned}$$
since $0<p<1$ and $b$ are fixed real numbers, so the geometric factor $(1-p)^{n}$ decays faster than the polynomial factor $n^{[b]}$ grows.
Supplement: the first inequality uses, for $0 \le i \le [b]$,
$$\begin{aligned}
& C_n^i = \frac{n(n-1)(n-2)\cdots (n-i+1)}{i!} \le n^i, \\
& p^i \le 1, \\
& (1-p)^{n-i} \le (1-p)^{n-[b]}.
\end{aligned}$$
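The convergence, and the upper bound $([b]+1)\,n^{[b]}(1-p)^{n-[b]}$ used in the proof, can be illustrated numerically. The values of $p$, $b$ and $n$ below are arbitrary choices for the sketch.

```python
import math

def binom_cdf(n, p, b):
    """P(X_n <= b) for X_n ~ B(n, p), summing the pmf up to the integer part of b."""
    m = math.floor(b)
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(min(m, n) + 1))

def upper_bound(n, p, b):
    """([b] + 1) * n^[b] * (1 - p)^(n - [b]), the bound from the proof."""
    m = math.floor(b)
    return (m + 1) * n**m * (1 - p)**(n - m)

p, b = 0.3, 5.5  # illustrative values
ns = [10, 50, 100, 500]
cdf_values = [binom_cdf(n, p, b) for n in ns]
bounds = [upper_bound(n, p, b) for n in ns]

for n, c, u in zip(ns, cdf_values, bounds):
    print(f"n={n:4d}  P(X_n <= b)={c:.3e}  bound={u:.3e}")
```

The bound is very loose for small $n$ (it exceeds 1), but both columns shrink rapidly once $n$ is large, which is all the proof requires.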