The Order Statistic
The order statistics of a family of independent observations $X_1, X_2, \ldots, X_n$ are the values obtained by sorting them in increasing order:
$$X_{(1)} \le X_{(2)} \le \cdots \le X_{(n)}.$$
We use capital letters because each element $X_{(i)}$ can itself be regarded as a random variable; it is in fact a function of $X_i, i=1,\ldots,n$, namely $X_{(i)} = X_{(i)}(X_1, X_2, \cdots, X_n)$.
Deriving the properties of order statistics relies on a very useful representation. Let $F(x) = P(X \le x)$ be the distribution function, and define its inverse by
$$F^{-1}(y) = \inf \{x: F(x) \ge y\}.$$
A convenient property is that if $U$ is uniformly distributed on $[0,1]$, then $F^{-1}(U)$ has distribution function $F$, that is,
$$F^{-1}(U) \overset{d}{=} X.$$
Indeed, since $F^{-1}(t) \le u$ if and only if $t \le F(u)$,
$$P(F^{-1}(U) \le u) = P(U \le F(u)) = F(u).$$
Hence, if $U_1, U_2, \ldots, U_n$ are independent uniform random variables on $[0,1]$ and $X_1, X_2, \ldots, X_n$ are i.i.d. with distribution function $F$, then
$$(X_{(1)}, X_{(2)}, \cdots, X_{(n)}) \overset{d}{=} (F^{-1}(U_{(1)}), F^{-1}(U_{(2)}), \cdots, F^{-1}(U_{(n)})).$$
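The representation above can be tried out numerically. The sketch below assumes the standard exponential distribution, for which $F^{-1}(u) = -\log(1-u)$; the helper names `exp_quantile` and `order_statistics_via_uniforms` are made up for this illustration.

```python
import math
import random

def exp_quantile(u):
    """F^{-1}(u) for the standard exponential CDF F(x) = 1 - exp(-x), 0 <= u < 1."""
    return -math.log(1.0 - u)

def order_statistics_via_uniforms(uniforms, quantile):
    """Sort the uniforms first, then push each U_(i) through F^{-1}."""
    return [quantile(u) for u in sorted(uniforms)]

rng = random.Random(0)
us = [rng.random() for _ in range(5)]
xs_sorted = order_statistics_via_uniforms(us, exp_quantile)

# F^{-1} is nondecreasing, so transforming first and sorting afterwards
# yields the same vector of order statistics.
xs_then_sort = sorted(exp_quantile(u) for u in us)
print(xs_sorted == xs_then_sort)
```

The key point is only the monotonicity of $F^{-1}$: sorting and transforming commute, which is what makes uniform order statistics a universal starting point.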
In addition, let $F_n$ denote the empirical distribution function of $X$, given explicitly by
$$F_n(x) = \frac{1}{n}\sum_{i=1}^n \mathbb{I}(X_i \le x).$$
Also set
$$\xi_p := F^{-1}(p), \quad \hat{\xi}_{pn} := F_n^{-1}(p).$$
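As a small illustration of these two definitions, the sketch below computes $F_n(x)$ and $F_n^{-1}(p)$ for a toy sample. It relies on the fact that $\inf\{x: F_n(x) \ge p\}$ is the order statistic $X_{(\lceil np \rceil)}$ for $0 < p < 1$; the function names and the data are illustrative assumptions.

```python
import math

def ecdf(sample, x):
    """F_n(x): fraction of observations that are <= x."""
    return sum(1 for v in sample if v <= x) / len(sample)

def sample_quantile(sample, p):
    """F_n^{-1}(p) = inf{x : F_n(x) >= p} for 0 < p < 1, i.e. X_(ceil(n p)).

    Note: n * p is computed in floating point, so values of p for which
    n * p is numerically just above an integer may be rounded up.
    """
    xs = sorted(sample)
    k = math.ceil(len(xs) * p)  # smallest k with k / n >= p
    return xs[k - 1]

data = [2.0, 5.0, 1.0, 4.0, 3.0]
print(ecdf(data, 3.0))             # 0.6
print(sample_quantile(data, 0.5))  # 3.0
```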
Lemma 1: basic properties of $F^{-1}$
Lemma 1: Let $F$ be a distribution function. Then $F^{-1}(t)$, $0 < t < 1$, is nondecreasing and left-continuous, and satisfies
- $F^{-1}(F(x)) \le x$, $-\infty < x < \infty$;
- $F(F^{-1}(t)) \ge t$, $0 < t < 1$;
- $F(x) \ge t$ if and only if $x \ge F^{-1}(t)$.
Note: $F(x)$ itself is nondecreasing and right-continuous.
Distribution of order statistics
Theorem 1: Suppose $F(x)$ has a density $f(x)$, and write $\mathrm{C}_n^i = \binom{n}{i}$.
- The distribution function of $X_{(k)}$ is
$$P(X_{(k)} \le x) = \sum_{i=k}^n \mathrm{C}_n^i [F(x)]^i [1-F(x)]^{n-i}, \quad -\infty < x < \infty.$$
- The density of $X_{(k)}$ is
$$n\mathrm{C}_{n-1}^{k-1} F^{k-1}(x) [1-F(x)]^{n-k} f(x).$$
- The joint density of $X_{(k_1)}, X_{(k_2)}$ (for $x_1<x_2$, $k_1<k_2$) is
$$\frac{n!}{(k_1-1)!(k_2-k_1-1)!(n-k_2)!}[F(x_1)]^{k_1-1} [F(x_2)-F(x_1)]^{k_2-k_1-1} [1-F(x_2)]^{n-k_2} f(x_1)f(x_2).$$
- The joint density of all the order statistics is
$$n!f(x_1)f(x_2)\cdots f(x_n), \quad -\infty < x_1<x_2<\cdots <x_n < \infty.$$
proof: Items 1 and 2 are straightforward. For item 3, note that the joint distribution function of $X_{(k_1)}, X_{(k_2)}$ at $x_1 < x_2$ is
$$\sum_{i=k_2}^n \mathrm{C}_n^i [1-F(x_2)]^{n-i} \Big\{ \sum_{j=k_1}^i \mathrm{C}_{i}^j [F(x_1)]^j [F(x_2)-F(x_1)]^{i-j} \Big\},$$
where $i$ counts the observations that are at most $x_2$ and $j$ counts those that are at most $x_1$.
Differentiating this formula proceeds much as in the proofs of items 1 and 2. Item 4 is trivial.
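A quick numerical sanity check of the first two items of Theorem 1 is possible in the uniform case, where $F(x) = x$ and $f(x) = 1$ on $[0,1]$: integrating the density of $U_{(k)}$ from $0$ to $x$ should reproduce the binomial sum. The sketch below uses a plain trapezoidal rule; the function names and the values $n=7$, $k=3$, $x=0.4$ are arbitrary choices for illustration.

```python
from math import comb

def cdf_kth(k, n, F):
    """P(X_(k) <= x) written in terms of F = F(x) (item 1 of Theorem 1)."""
    return sum(comb(n, i) * F**i * (1 - F)**(n - i) for i in range(k, n + 1))

def pdf_kth_uniform(k, n, x):
    """Density of U_(k) for Uniform[0, 1], where F(x) = x and f(x) = 1 (item 2)."""
    return n * comb(n - 1, k - 1) * x**(k - 1) * (1 - x)**(n - k)

def integrate(g, a, b, steps=20000):
    """Composite trapezoidal rule on [a, b]."""
    h = (b - a) / steps
    total = 0.5 * (g(a) + g(b)) + sum(g(a + i * h) for i in range(1, steps))
    return total * h

n, k, x = 7, 3, 0.4
lhs = cdf_kth(k, n, x)
rhs = integrate(lambda t: pdf_kth_uniform(k, n, t), 0.0, x)
print(lhs, rhs)  # the two values should agree to several decimal places
```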
Conditional distributions of order statistics
Theorem 2: Suppose $F(x)$ has a density $f(x)$. Then the conditional distribution of $X_{(j)}$ given $X_{(i)} = x_i$, $i < j$, is that of the $(j-i)$-th order statistic in a sample of size $n-i$ from the distribution function $F_i(x) := \frac{F(x)-F(x_i)}{1-F(x_i)}$, $x_i \le x < \infty$.
proof: Writing $F_i(x) = \frac{F(x)-F(x_i)}{1-F(x_i)}$,
$$
\begin{array}{ll}
f(x_j|X_{(i)}=x_i)
&= f_{X_{(i)}, X_{(j)}}(x_i, x_j) / f_{X_{(i)}}(x_i) \\
&= \frac{(n-i)!}{(j-i-1)!(n-j)!} \Big\{ \frac{F(x_j)-F(x_i)}{1-F(x_i)} \Big\}^{j-i-1} \times \Big\{ \frac{1-F(x_j)}{1-F(x_i)} \Big\}^{n-j} \frac{f(x_j)}{1-F(x_i)} \\
&= (n-i)\mathrm{C}_{n-i-1}^{j-i-1} [F_i(x_j)]^{j-i-1} [1-F_i(x_j)]^{n-j} [F_i(x_j)]'.
\end{array}
$$
Comparing with the density formula in Theorem 1 (with $n-i$ in place of $n$ and $j-i$ in place of $k$) gives the claim.
Theorem 3: Suppose $F(x)$ has a density $f(x)$. Then the conditional distribution of $X_{(i)}$ given $X_{(j)} = x_j$, $i<j$, is that of the $i$-th order statistic in a sample of size $j-1$ from the distribution function $\frac{F(x)}{F(x_j)}$, $-\infty < x \le x_j$.
proof: Same as above.
Special properties of special distributions
Theorem 4: Let $X_1, X_2, \ldots, X_n$ be independent standard exponential random variables, and set
$$Z_i := (n-i+1) (X_{(i)} - X_{(i-1)}), \quad X_{(0)} \equiv 0.$$
Then $Z_1, Z_2,\ldots,Z_n$ are also independent standard exponential random variables.
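proof: Change variables from $x$ to $z$ using the Jacobian determinant; note that the two densities live on different regions (the ordered region $0 < x_1 < \cdots < x_n$ versus the positive orthant $z_i > 0$).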
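Theorem 4 is easy to probe by simulation. The sketch below draws standard exponential samples with `random.expovariate`, forms the normalized spacings $Z_i$, and looks at their sample means (which should be near 1) and one sample covariance (which should be near 0). This only checks first and second moments, not full independence; the seed, sample size, and function name are arbitrary choices.

```python
import random

def normalized_spacings(sample):
    """Z_i = (n - i + 1) * (X_(i) - X_(i-1)), with X_(0) = 0."""
    xs = sorted(sample)
    n = len(xs)
    prev = 0.0
    zs = []
    for i, x in enumerate(xs, start=1):
        zs.append((n - i + 1) * (x - prev))
        prev = x
    return zs

rng = random.Random(1)
n, trials = 4, 40000
draws = [normalized_spacings([rng.expovariate(1.0) for _ in range(n)])
         for _ in range(trials)]

means = [sum(z[i] for z in draws) / trials for i in range(n)]
cov_12 = sum(z[0] * z[1] for z in draws) / trials - means[0] * means[1]
print(means)   # each entry should be close to 1
print(cov_12)  # should be close to 0
```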
Theorem 5: For the uniform distribution on $[0, 1]$, the random variables $V_1 = U_{(i)} / U_{(j)}$ and $V_2=U_{(j)}$, $1 \le i < j \le n$, are independent, with $V_1 \sim \mathrm{Beta}(i, j-i)$ and $V_2 \sim \mathrm{Beta}(j, n-j+1)$.
proof: Again by a change of variables.
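A simulation can give a rough check here as well. A $\mathrm{Beta}(a, b)$ variable has mean $a/(a+b)$, so the sketch below compares the sample mean of $U_{(i)}/U_{(j)}$ with $i/j$ (the mean of $\mathrm{Beta}(i, j-i)$), the sample mean of $U_{(j)}$ with $j/(n+1)$, and checks that the sample covariance of the two is near 0. The values $n=5$, $i=2$, $j=4$ and the helper name are illustrative assumptions.

```python
import random

def ratio_and_pivot(n, i, j, rng):
    """Return (U_(i) / U_(j), U_(j)) for n i.i.d. Uniform[0, 1] draws, 1 <= i < j <= n."""
    us = sorted(rng.random() for _ in range(n))
    return us[i - 1] / us[j - 1], us[j - 1]

rng = random.Random(2)
n, i, j, trials = 5, 2, 4, 40000
pairs = [ratio_and_pivot(n, i, j, rng) for _ in range(trials)]

mean_v1 = sum(v1 for v1, _ in pairs) / trials
mean_v2 = sum(v2 for _, v2 in pairs) / trials
cov = sum(v1 * v2 for v1, v2 in pairs) / trials - mean_v1 * mean_v2

print(mean_v1, i / j)        # sample mean vs. mean of Beta(i, j - i)
print(mean_v2, j / (n + 1))  # sample mean vs. mean of Beta(j, n - j + 1)
print(cov)                   # should be close to 0
```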
Theorem 6: For the uniform distribution on $[0, 1]$, the random variables
$$V_1^* = \frac{U_{(1)}}{U_{(2)}}, \ V_2^*=\Big(\frac{U_{(2)}}{U_{(3)}}\Big)^2, \ \cdots, \ V_{n-1}^*=\Big(\frac{U_{(n-1)}}{U_{(n)}}\Big)^{n-1}, \ V_n^*=U_{(n)}^n$$
are independent and each uniformly distributed on $[0, 1]$.
proof: A change of variables works here too, but the text instead transforms to exponential variables and then applies the earlier result.
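The transformation in Theorem 6 can be sketched in a few lines: the $k$-th variable is the ratio $U_{(k)}/U_{(k+1)}$ raised to the power $k$, with the convention $U_{(n+1)} := 1$ so that the last term is $U_{(n)}^n$. The simulation below only compares sample means with $1/2$, the mean of a uniform variable on $[0,1]$; the function name `power_ratios` and the parameters are made up for this example.

```python
import random

def power_ratios(uniforms):
    """Return V_k = (U_(k) / U_(k+1))^k for k = 1..n, using U_(n+1) := 1."""
    us = sorted(uniforms) + [1.0]
    return [(us[k - 1] / us[k]) ** k for k in range(1, len(uniforms) + 1)]

rng = random.Random(3)
n, trials = 4, 40000
draws = [power_ratios([rng.random() for _ in range(n)]) for _ in range(trials)]

means = [sum(v[k] for v in draws) / trials for k in range(n)]
print(means)  # each entry should be close to 0.5
```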
$\hat{\xi}_{pn}-\xi_p$
Theorem 7: Let $0 < p < 1$. Suppose $\xi_p$ is the unique solution $x$ of $F(x^{-}) \le p \le F(x)$. Then
$$P(|\hat{\xi}_{pn} - \xi_p| > \epsilon) \le 2 \exp (-2n\delta_{\epsilon}^2), \quad \forall \epsilon > 0, \ n,$$
where $\delta_{\epsilon} = \min \{F(\xi_p+\epsilon)-p, \ p-F(\xi_p-\epsilon)\}$.
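proof: The proof splits the event into the two one-sided events $\hat{\xi}_{pn} > \xi_p + \epsilon$ and $\hat{\xi}_{pn} < \xi_p - \epsilon$ and applies Hoeffding's inequality to each; it is a rather neat trick.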
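To get a feel for the bound in Theorem 7, the sketch below evaluates its right-hand side for the median of the uniform distribution on $[0,1]$ (so $F(x) = x$, $p = 0.5$, $\xi_p = 0.5$) and compares it with a Monte Carlo estimate of the left-hand side. The choices $\epsilon = 0.1$, $n = 200$, the seed, and the helper names are assumptions made only for this illustration.

```python
import math
import random

def quantile_bound(F, xi_p, p, eps, n):
    """Right-hand side 2 * exp(-2 * n * delta_eps^2) of Theorem 7."""
    delta = min(F(xi_p + eps) - p, p - F(xi_p - eps))
    return 2.0 * math.exp(-2.0 * n * delta ** 2)

def sample_quantile(sample, p):
    """F_n^{-1}(p) = X_(ceil(n p)) for 0 < p < 1."""
    xs = sorted(sample)
    return xs[math.ceil(len(xs) * p) - 1]

def uniform_cdf(x):
    """Distribution function of Uniform[0, 1]."""
    return min(max(x, 0.0), 1.0)

p, eps, n, trials = 0.5, 0.1, 200, 2000
bound = quantile_bound(uniform_cdf, 0.5, p, eps, n)  # delta is about 0.1

rng = random.Random(4)
hits = sum(
    abs(sample_quantile([rng.random() for _ in range(n)], p) - 0.5) > eps
    for _ in range(trials)
)
print(bound, hits / trials)  # the empirical frequency should not exceed the bound
```

The bound is typically far from tight; its value lies in holding for every $n$ and every $\epsilon$ without any smoothness assumption on $F$.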
$F_n$
Theorem 11: For each fixed $x$,
- $\mathbb{E}(F_n(x)) = F(x)$;
- $\mathrm{Var}(F_n(x)) = \frac{F(x)(1-F(x))}{n}\rightarrow 0$ as $n \rightarrow \infty$.
proof: It suffices to note that $nF_n(x)$ follows a $\mathrm{binomial}(n, F(x))$ distribution.
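The binomial observation can be turned into a direct computation: summing over the binomial probability mass function gives the exact mean and variance of $F_n(x) = K/n$ with $K \sim \mathrm{binomial}(n, F(x))$. The sketch below does this for the arbitrary values $n = 20$ and $F(x) = 0.3$; the function name is hypothetical.

```python
from math import comb

def ecdf_moments(n, F):
    """Exact mean and variance of F_n(x) = K / n, where K ~ Binomial(n, F)."""
    pmf = [comb(n, k) * F**k * (1 - F)**(n - k) for k in range(n + 1)]
    mean = sum((k / n) * w for k, w in enumerate(pmf))
    var = sum((k / n - mean) ** 2 * w for k, w in enumerate(pmf))
    return mean, var

n, F = 20, 0.3
mean, var = ecdf_moments(n, F)
print(mean, F)               # agree up to rounding
print(var, F * (1 - F) / n)  # agree up to rounding
```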
Theorem 12:
$$P\{\sup_x |F_n(x) - F(x)| \rightarrow 0\} = 1.$$
proof: Let $\epsilon >0$, take an integer $k > 1/\epsilon$ and points
$$-\infty =x_0 < x_1 \le \cdots \le x_{k-1} < x_k = \infty$$
such that $F(x_j^-) \le j/k\le F(x_j)$, $j=1, \ldots, k-1$. If $x_{j-1}< x_j$, then $F(x_j^-) \le j/k$ and $F(x_{j-1}) \ge (j-1)/k$, so $F(x_j^-)-F(x_{j-1}) \le 1/k < \epsilon$.
By the strong law of large numbers,
$$F_n(x_j) \mathop{\rightarrow} \limits^{a.s.} F(x_j), \quad F_n(x_j^-) \mathop{\rightarrow} \limits^{a.s.} F(x_j^-), \quad j=1,\ldots, k-1.$$
Hence
$$\Delta_n = \max(|F_n(x_j) - F(x_j)|, |F_n(x_j^-) - F(x_j^-)|, j=1,\ldots,k-1) \mathop{\rightarrow} \limits^{a.s.} 0.$$
For $x_{j-1}< x < x_j$ (when $x=x_j$ the inequalities below hold trivially):
$$
\begin{array}{l}
F_n(x) - F(x) \le F_n(x_j^-) - F(x_{j-1}) \le F_n(x_j^-)-F(x_j^-)+\epsilon\le \Delta_n + \epsilon, \\
F_n(x) - F(x) \ge F_n(x_{j-1}) - F(x_j^-) \ge F_n(x_{j-1}) - F(x_{j-1}) -\epsilon \ge -\Delta_n - \epsilon.
\end{array}
$$
Hence
$$\sup_x|F_n(x) - F(x)| \le \Delta_n + \epsilon \mathop{\rightarrow}\limits^{a.s.} \epsilon.$$
This holds for every $\epsilon > 0$, so the theorem follows.
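The uniform convergence in Theorem 12 can also be observed numerically. Because $F_n$ is a step function, $\sup_x |F_n(x) - F(x)|$ is attained at a jump point, either just before the jump or at it. The sketch below uses this to compute the supremum exactly for samples from the uniform distribution on $[0,1]$, where $F(x) = x$; the function name, seed, and sample sizes are illustrative assumptions.

```python
import random

def sup_distance_uniform(sample):
    """sup_x |F_n(x) - F(x)| for F(x) = x on [0, 1].

    At the i-th order statistic x, F_n jumps from (i - 1) / n to i / n,
    so the largest gap near x is max(i / n - x, x - (i - 1) / n).
    """
    xs = sorted(sample)
    n = len(xs)
    return max(max(i / n - x, x - (i - 1) / n) for i, x in enumerate(xs, start=1))

rng = random.Random(5)
for n in (100, 1000, 10000):
    # The printed distance should tend to shrink as n grows.
    print(n, sup_distance_uniform([rng.random() for _ in range(n)]))
```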
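Note: The proof here differs slightly from the one in the text; writing it this way seems more reasonable to me.
Note: The text also covers many other properties, especially asymptotic ones. With my limited ability I could only get the gist of them, so I do not record them here.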