Definition
A statistic is a function of the random sample $X_1, X_2, \cdots, X_n$:
$$T = r(X_1, X_2, \cdots, X_n).$$
The distribution of the sample $X$, $f_{\theta}(X)=f(X;\theta)$, is determined by an unknown parameter $\theta$. For i.i.d. samples, $\theta$ is usually estimated by maximum likelihood:
$$\max_{\theta} \quad P(X_1,X_2,\cdots, X_n ;\theta) = \prod_{i=1}^n P(X_i;\theta) = \prod_{i=1}^n f_{\theta}(X_i).$$
A sufficient statistic is a statistic $T$ that satisfies
$$P(\{X_i\}|T=t;\theta) = P(\{X_i\}|T=t),$$
that is, given $T(X)=t$, the conditional joint distribution of $\{X_i\}$ does not depend on the unknown parameter $\theta$.
Example: consider the Bernoulli distribution with success probability $p$ and failure probability $1-p$. Given $n$ i.i.d. samples $X_1, X_2,\cdots, X_n$, we have
$$P(\{X_i\};p) = p^{\sum_i X_i}(1-p)^{n-\sum_i X_i}.$$
In fact (as discussed later), $T=\sum_{i=1}^n X_i$ is a sufficient statistic for this model. Indeed,
$$P(\{X_i\}|T=t;p) = \frac{P(\{X_i\}, T=t; p)}{P(T=t;p)} = \frac{\mathbb{I}[\sum_{i=1}^n X_i=t]\cdot p^t (1-p)^{n-t}}{C_n^t p^t (1-p)^{n-t}}=\frac{\mathbb{I}[\sum_{i=1}^n X_i = t]}{C_n^t},$$
which clearly does not depend on the unknown parameter $p$.
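The conditional distribution above can be checked by brute force for a small sample. The following is a minimal sketch, not part of the original derivation; the function name `conditional_given_sum` and the values of `n`, `t`, and `p` are illustrative assumptions.

```python
from itertools import product
from math import comb, isclose

def conditional_given_sum(n, t, p):
    """Return P({X_i} = x | T = t; p) for every binary sequence x of length n with sum t."""
    # Joint probability of each sequence that is consistent with T = t.
    joint = {x: p**sum(x) * (1 - p)**(n - sum(x))
             for x in product((0, 1), repeat=n) if sum(x) == t}
    p_t = sum(joint.values())  # P(T = t; p)
    return {x: v / p_t for x, v in joint.items()}

n, t = 5, 2
for p in (0.1, 0.5, 0.9):
    cond = conditional_given_sum(n, t, p)
    # Every consistent sequence has probability 1 / C(n, t), whatever p is.
    assert all(isclose(v, 1 / comb(n, t)) for v in cond.values())
```

Changing `p` changes the joint probabilities but not the conditional ones, which is exactly what sufficiency of $T$ means here.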
Sufficient statistics are especially useful in practice. Take the maximum likelihood estimation mentioned above: since
$$P(\{X_i\};\theta) = P(\{X_i\}, T;\theta) = P(\{X_i\}|T;\theta) \:P(T;\theta) = P(\{X_i\}|T) \:P(T;\theta),$$
and $P(\{X_i\}|T)$ does not depend on $\theta$, maximizing the expression above is equivalent to
$$\max_{\theta} \quad P(T;\theta) = P(r(X_1, X_2,\cdots, X_n); \theta).$$
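As a worked illustration (added here, using the Bernoulli example from above), $T=\sum_i X_i$ follows a binomial distribution, so for $0 < t < n$ the maximization over $P(T;\theta)$ reads:

$$\log P(T=t;p) = \log C_n^t + t\log p + (n-t)\log(1-p), \qquad \frac{\partial}{\partial p}\log P(T=t;p) = \frac{t}{p} - \frac{n-t}{1-p} = 0 \;\Rightarrow\; \hat{p} = \frac{t}{n}.$$

Maximizing the full likelihood $p^{t}(1-p)^{n-t}$ gives the same $\hat{p}$, since the two objectives differ only by the factor $C_n^t$, which does not involve $p$.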
In particular, a scalar $T$ is sometimes not sufficient, and a vector $T=(T_1, T_2,\cdots, T_k)$ as a whole is needed as the sufficient statistic. For example, when both $\mu$ and $\sigma$ of a normal distribution are unknown, $T=(\frac{1}{n}\sum_i X_i, \frac{1}{n-1}\sum_i (X_i - \bar{X})^2)$. Its properties are the same as in the scalar case, so the vector case is not treated separately below.
In the Bayesian framework, we find:
$$P(\theta|\{X_i\}) = \frac{P(\{X_i\}, \theta)}{P(\{X_i\})} = \frac{P(\{X_i\}, T, \theta)}{P(\{X_i\}, T)} = \frac{P(\{X_i\}| T, \theta) P(T, \theta)}{P(\{X_i\}| T) P(T)} = \frac{P(\{X_i\}| T) P(T, \theta)}{P(\{X_i\}| T) P(T)} = P(\theta|T).$$
That is, the conditional (posterior) distribution of $\theta$ is the same whether we condition on $\{X_i\}$ or on $T$.
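The equality of the two posteriors can be checked numerically for the Bernoulli model. This is a minimal sketch under stated assumptions: the parameter is restricted to a small grid with a uniform prior, and the helper names `normalize`, `posterior_from_sample`, and `posterior_from_T` are hypothetical.

```python
from math import comb, isclose, prod

def normalize(weights):
    """Scale non-negative weights so they sum to one."""
    total = sum(weights)
    return [w / total for w in weights]

def posterior_from_sample(xs, grid, prior):
    """Posterior over the grid of p values, using the full sample likelihood."""
    lik = [prod(p if x == 1 else 1 - p for x in xs) for p in grid]
    return normalize([l * w for l, w in zip(lik, prior)])

def posterior_from_T(t, n, grid, prior):
    """Posterior over the grid of p values, using only T = sum(xs) ~ Binomial(n, p)."""
    lik = [comb(n, t) * p**t * (1 - p)**(n - t) for p in grid]
    return normalize([l * w for l, w in zip(lik, prior)])

grid = [i / 10 for i in range(1, 10)]      # assumed candidate values of p
prior = [1 / len(grid)] * len(grid)        # assumed uniform prior
xs = [1, 0, 0, 1, 1, 0]
post_x = posterior_from_sample(xs, grid, prior)
post_t = posterior_from_T(sum(xs), len(xs), grid, prior)
assert all(isclose(a, b) for a, b in zip(post_x, post_t))
```

The binomial coefficient in `posterior_from_T` cancels during normalization, which mirrors the cancellation of $P(\{X_i\}|T)$ in the derivation.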
In particular, sufficiency can also be defined through mutual information: $T$ is a sufficient statistic if and only if
$$I(\theta;X) = I(\theta;T(X)).$$
Note: in general, $I(\theta;X) \ge I(\theta;T(X))$ (the data processing inequality).
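The mutual-information characterization can be illustrated on a tiny discrete model. The sketch below assumes a two-point prior on $\theta$ and three Bernoulli draws; these choices and the helper names `mutual_information` and `joints` are illustrative assumptions. It compares the sufficient statistic $\sum_i X_i$ with the non-sufficient statistic $X_1$.

```python
from collections import defaultdict
from itertools import product
from math import isclose, log2

def mutual_information(joint):
    """I(A;B) in bits, computed from a dict {(a, b): probability}."""
    pa, pb = defaultdict(float), defaultdict(float)
    for (a, b), pr in joint.items():
        pa[a] += pr
        pb[b] += pr
    return sum(pr * log2(pr / (pa[a] * pb[b]))
               for (a, b), pr in joint.items() if pr > 0)

def joints(prior, n, statistic):
    """Joint laws of (theta, X) and (theta, statistic(X)) for n Bernoulli(theta) draws."""
    joint_x, joint_t = {}, defaultdict(float)
    for theta, w in prior.items():
        for x in product((0, 1), repeat=n):
            pr = w * theta**sum(x) * (1 - theta)**(n - sum(x))
            joint_x[(theta, x)] = pr
            joint_t[(theta, statistic(x))] += pr
    return joint_x, dict(joint_t)

prior = {0.2: 0.5, 0.7: 0.5}                       # assumed two-point prior on theta
jx, jt = joints(prior, 3, sum)                     # T = sum of the sample
_, jf = joints(prior, 3, lambda x: x[0])           # first observation only
assert isclose(mutual_information(jx), mutual_information(jt))
assert mutual_information(jx) > mutual_information(jf)
```

The sum retains all the information about $\theta$ that the sample carries, while keeping only the first observation loses some of it.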
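Criterion for sufficient statistics
Checking sufficiency directly against the definitions above is very difficult. Fortunately, there is the Fisher-Neyman factorization theorem: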
Factorization Theorem: Let $f_{\theta}(X)$ be the joint density of $\{X_i\}$. Then $T$ is a sufficient statistic for $\theta$ if and only if there exist non-negative functions $g, h$ such that
$$f(X_1, X_2,\cdots, X_n; \theta) = h(X_1, X_2,\cdots, X_n) g(T; \theta).$$
Note: $T$ may be a vector $T=(T_1, T_2,\cdots, T_k)$.
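As a quick worked instance of the theorem (added here for illustration), the Bernoulli example from the definition section factorizes with a trivial $h$:

$$P(\{X_i\};p) = \underbrace{1}_{h(X_1,\cdots,X_n)} \cdot \underbrace{p^{T}(1-p)^{n-T}}_{g(T;p)}, \qquad T = \sum_{i=1}^n X_i,$$

so $T=\sum_{i=1}^n X_i$ is sufficient for $p$ without computing any conditional distribution.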
proof:
$\Rightarrow$
$$p(X_1,X_2,\cdots, X_n;\theta) = p(\{X_i\}, T;\theta) = p(\{X_i\}|T;\theta)p(T;\theta) = p(\{X_i\}|T)p(T;\theta).$$
In this case,
$$g(T;\theta) = p(T;\theta), \quad h(X_1, X_2,\cdots, X_n) = p(\{X_i\}|T).$$
$\Leftarrow$
For notational simplicity, let $X = \{X_1, X_2,\cdots, X_n\}$. Then
$$\begin{array}{ll} p(T=t;\theta) &= \int_{T(X)=t} p(X,T=t;\theta) \mathrm{d}X \\ &= \int_{T(X)=t} f(X;\theta) \mathrm{d}X \\ &= \int_{T(X)=t} h(X) g(T=t;\theta) \mathrm{d}X \\ &= \int_{T(X)=t} h(X) \mathrm{d}X \cdot g(T=t;\theta). \\ \end{array}$$
Therefore, for any $X$ with $T(X)=t$,
$$\begin{array}{ll} p(X | T=t;\theta) &= \frac{p(X,T=t;\theta)}{p(T=t;\theta)} \\ &= \frac{p(X;\theta)}{p(T=t;\theta)} \\ &= \frac{h(X)g(T=t;\theta)}{\int_{T(X)=t}h(X)\mathrm{d} X \cdot g(T=t;\theta)} \\ &= \frac{h(X)}{\int_{T(X)=t}h(X)\mathrm{d} X}, \\ \end{array}$$
which does not depend on $\theta$.
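Note: the proof above is open to doubt. In the continuous case the event $T(X)=t$ has probability zero, so the integrals over $\{T(X)=t\}$ are only handled informally here.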
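Minimal sufficient statistic
A statistic $S$ is a minimal sufficient statistic if:
- $S$ is a sufficient statistic;
- for every sufficient statistic $T$, there exists a function $f$ such that $S=f(T)$.
Note: if $T$ is a sufficient statistic, then $f(T)$ is also a sufficient statistic for any invertible function $f$.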
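As a small illustration of the second condition (an example added here, not taken from the original text), consider the Bernoulli sample with $n \ge 2$. By the factorization theorem both statistics below are sufficient, since the joint probability $p^{\sum_i X_i}(1-p)^{n-\sum_i X_i}$ can be written in terms of either one:

$$T' = \Big(X_1, \sum_{i=2}^n X_i\Big), \qquad S = \sum_{i=1}^n X_i = f(T'), \quad f(a, b) = a + b.$$

Since $X_1$ cannot be recovered from $S$, $T'$ is not a function of $S$, so $T'$ is sufficient but not minimal.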
Examples
$U[0, \theta]$
For the uniform distribution on $[0, \theta]$,
$$p(X_1, X_2,\cdots, X_n;\theta) = \frac{1}{\theta^n} \mathbb{I}[0\le \min \{X_i\}] \cdot \mathbb{I}[\max \{X_i\} \le \theta],$$
so
$$T = \max \{X_i\}, \quad g(T;\theta) = \mathbb{I}[T \le \theta] \cdot \frac{1}{\theta^n}, \quad h(X) = \mathbb{I}[0\le \min \{X_i\}].$$
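The factorization for $U[0, \theta]$ can be verified directly in code. This is a minimal sketch; the function names `uniform_likelihood`, `g`, and `h` and the sample values are illustrative assumptions.

```python
def uniform_likelihood(xs, theta):
    """Joint density of an i.i.d. U[0, theta] sample, computed from the full sample."""
    if min(xs) < 0 or max(xs) > theta:
        return 0.0
    return theta ** (-len(xs))

def g(t, n, theta):
    """Factor that depends on the data only through t = max(xs)."""
    return theta ** (-n) if t <= theta else 0.0

def h(xs):
    """Factor that does not involve theta."""
    return 1.0 if min(xs) >= 0 else 0.0

xs = [0.3, 1.9, 0.7]
for theta in (1.0, 1.9, 2.5):
    assert uniform_likelihood(xs, theta) == h(xs) * g(max(xs), len(xs), theta)
```

Because `g` is zero for `theta < max(xs)` and decreasing in `theta` afterwards, the likelihood is maximized at `theta = max(xs)`, so the maximum likelihood estimate is itself the sufficient statistic.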
$U[\alpha, \beta]$
$$p(X_1, X_2,\cdots, X_n;\alpha,\beta) = \frac{1}{(\beta - \alpha)^n} \mathbb{I}[\alpha\le \min \{X_i\}] \cdot \mathbb{I}[\max \{X_i\} \le \beta],$$
$$T = (\min \{X_i\}, \max \{X_i\}), \quad g(T;\alpha, \beta) = \frac{1}{(\beta - \alpha)^n} \mathbb{I}[\alpha\le \min \{X_i\}] \cdot \mathbb{I}[\max \{X_i\} \le \beta], \quad h(X) = 1.$$
Poisson
$$P(X;\lambda) = \frac{\lambda^X e^{-\lambda}}{X!}.$$
$$p(X_1, X_2,\cdots, X_n;\lambda) = e^{-n\lambda} \lambda^{\sum_{i}X_i} \cdot \frac{1}{\prod_i X_i!}.$$
$$T = \sum_iX_i, \quad g(T;\lambda) = e^{-n\lambda} \cdot \lambda^T, \quad h(X) = \frac{1}{\prod_{i} X_i!}.$$
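The Poisson factorization can be checked numerically as well. The sketch below is illustrative only; the names `poisson_joint`, `g`, and `h` and the sample values are assumptions.

```python
from math import exp, factorial, isclose, prod

def poisson_joint(xs, lam):
    """Joint pmf of an i.i.d. Poisson(lam) sample, term by term."""
    return prod(lam**x * exp(-lam) / factorial(x) for x in xs)

def g(t, n, lam):
    """Factor depending on the data only through t = sum(xs)."""
    return exp(-n * lam) * lam**t

def h(xs):
    """Factor that does not involve lam."""
    return 1 / prod(factorial(x) for x in xs)

xs = [3, 0, 2, 5]
for lam in (0.5, 1.7, 4.0):
    assert isclose(poisson_joint(xs, lam), h(xs) * g(sum(xs), len(xs), lam))
```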
Normal
$$P(X;\mu,\sigma) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(X-\mu)^2}{2\sigma^2}\right).$$
$$p(X_1, X_2,\cdots, X_n;\mu, \sigma) = (2\pi\sigma^2)^{-\frac{n}{2}} \exp \left(-\frac{1}{2\sigma^2}\sum_{i=1}^n (X_i - \bar{X})^2\right) \exp\left(-\frac{n}{2\sigma^2}(\mu-\bar{X})^2\right).$$
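The joint density of the normal sample relies on a decomposition of the sum of squares around $\bar{X}$, a step that is not spelled out in the text. It can be verified in one line:

$$\sum_{i=1}^n (X_i-\mu)^2 = \sum_{i=1}^n (X_i-\bar{X})^2 + 2(\bar{X}-\mu)\sum_{i=1}^n (X_i-\bar{X}) + n(\bar{X}-\mu)^2 = \sum_{i=1}^n (X_i-\bar{X})^2 + n(\bar{X}-\mu)^2,$$

where the cross term vanishes because $\sum_{i=1}^n (X_i-\bar{X}) = 0$.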
If $\sigma$ is known:
$$T=\frac{1}{n}\sum_i X_i = \bar{X}, \quad g(T;\mu) = (2\pi\sigma^2)^{-\frac{n}{2}} \exp\left(-\frac{n}{2\sigma^2}(\mu-T)^2\right), \quad h(X) = \exp \left(-\frac{1}{2\sigma^2}\sum_{i=1}^n (X_i - \bar{X})^2\right).$$
If $\sigma$ is unknown:
$$T = (\bar{X}, s^2), \quad s^2 = \frac{\sum_{i=1}^n(X_i-\bar{X})^2}{n-1},$$
$$g(T;\mu,\sigma) = (2\pi\sigma^2)^{-\frac{n}{2}}\exp\left(-\frac{n-1}{2\sigma^2}s^2\right) \exp\left(-\frac{n}{2\sigma^2}(\mu-\bar{X})^2\right), \quad h(X) = 1.$$
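For the normal case with both parameters unknown, one way to see that $(\bar{X}, s^2)$ carries everything the likelihood needs is to compare the log-likelihood computed from the full sample with the one computed from the two summaries only. This is a minimal sketch; the function names and the sample values are illustrative assumptions.

```python
from math import isclose, log, pi

def loglik_full(xs, mu, sigma):
    """Normal log-likelihood computed from every observation."""
    n = len(xs)
    return (-0.5 * n * log(2 * pi * sigma**2)
            - sum((x - mu)**2 for x in xs) / (2 * sigma**2))

def loglik_from_T(xbar, s2, n, mu, sigma):
    """The same log-likelihood, computed only from T = (xbar, s2) and n."""
    return (-0.5 * n * log(2 * pi * sigma**2)
            - (n - 1) * s2 / (2 * sigma**2)
            - n * (mu - xbar)**2 / (2 * sigma**2))

xs = [1.2, -0.4, 3.1, 0.8, 2.0]
n = len(xs)
xbar = sum(xs) / n
s2 = sum((x - xbar)**2 for x in xs) / (n - 1)
for mu, sigma in ((0.0, 1.0), (1.5, 0.7), (-2.0, 3.0)):
    assert isclose(loglik_full(xs, mu, sigma), loglik_from_T(xbar, s2, n, mu, sigma))
```

Once `xbar` and `s2` are stored, the raw sample is no longer needed to evaluate the likelihood at any `(mu, sigma)`.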
Exponential distribution
$$p(X;\lambda) = \frac{1}{\lambda} e^{-\frac{X}{\lambda}}, \quad X \ge 0.$$
$$p(X_1, X_2,\cdots, X_n;\lambda) = \frac{1}{\lambda^n} e^{-\frac{\sum_{i=1}^n X_i}{\lambda}}.$$
$$T = \sum_{i=1}^n X_i, \quad g(T;\lambda) = \frac{1}{\lambda^n} e^{-\frac{T}{\lambda}}, \quad h(X) = 1.$$
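Since the exponential likelihood depends on the data only through $T$, the maximum likelihood estimate can be read off from $g$ alone. A short worked derivation, added here for illustration:

$$\log g(T;\lambda) = -n\log\lambda - \frac{T}{\lambda}, \qquad \frac{\partial}{\partial\lambda}\log g(T;\lambda) = -\frac{n}{\lambda} + \frac{T}{\lambda^2} = 0 \;\Rightarrow\; \hat{\lambda} = \frac{T}{n} = \bar{X}.$$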
Gamma
$$p(X;\alpha, \beta) = \frac{1}{\Gamma(\alpha) \beta^{\alpha}}X^{\alpha-1} e^{-\frac{X}{\beta}}.$$
$$p(X_1, X_2,\cdots, X_n;\alpha, \beta) = \frac{1}{(\Gamma(\alpha) \beta^{\alpha})^n}\Big(\prod_{i} X_i\Big)^{\alpha-1} e^{-\frac{\sum_iX_i}{\beta}}.$$
$$T = \Big(\prod_i X_i, \sum_i X_i\Big), \quad g(T;\alpha,\beta) = \frac{1}{(\Gamma(\alpha) \beta^{\alpha})^n}\Big(\prod_{i} X_i\Big)^{\alpha-1} e^{-\frac{\sum_iX_i}{\beta}}, \quad h(X) = 1.$$
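The Gamma case can be checked the same way: the log-likelihood evaluated from the pair (product, sum) should match the one evaluated from the full sample. This is a minimal sketch with hypothetical function names and sample values, using the shape and scale parametrization of the density above.

```python
from math import isclose, lgamma, log, prod

def loglik_full(xs, alpha, beta):
    """Gamma(shape=alpha, scale=beta) log-likelihood from every observation."""
    return sum((alpha - 1) * log(x) - x / beta
               - lgamma(alpha) - alpha * log(beta) for x in xs)

def loglik_from_T(prod_x, sum_x, n, alpha, beta):
    """The same log-likelihood, computed only from T = (prod_x, sum_x) and n."""
    return (-n * (lgamma(alpha) + alpha * log(beta))
            + (alpha - 1) * log(prod_x) - sum_x / beta)

xs = [0.5, 1.3, 2.2, 4.0]
t = (prod(xs), sum(xs))
for alpha, beta in ((2.0, 1.0), (0.5, 2.0), (3.5, 0.8)):
    assert isclose(loglik_full(xs, alpha, beta),
                   loglik_from_T(t[0], t[1], len(xs), alpha, beta))
```

For large samples the raw product can overflow or underflow, so in practice it may be preferable to store $\sum_i \log X_i$ instead; it is an invertible function of the product and therefore equally sufficient.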