5. Maximum Likelihood Estimation
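Fisher's maximum likelihood idea: a random experiment has several possible outcomes, but exactly one of them occurs in any single trial. If the outcome $\omega$ occurs in a given trial, we regard that outcome (the event $\{\omega\}$) as the one whose probability $P\{\omega\}$ is largest.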
Suppose the population $X$ is a discrete random variable with distribution

$$P\{X=a_k\}=p_k(\theta)\quad(k=1, 2, \ldots)$$

where $\theta$ ($\theta\in \Theta$) is an unknown parameter.
Let $X_1, X_2, \ldots, X_n$ be a sample from the population $X$, and let $x_1, x_2, \ldots, x_n$ be the observed values of the sample. In other words, the event $\{X_1=x_1, X_2=x_2, \ldots, X_n=x_n\}$ has occurred.
By Fisher's maximum likelihood idea, the probability $P\{X_1=x_1, X_2=x_2, \ldots, X_n=x_n\}$ should be as large as possible. Because the sample components are independent and each has the same distribution as $X$, this probability factorizes into a function of $\theta$:

$$\begin{aligned} &P\{X_1=x_1, X_2=x_2, \ldots, X_n=x_n\}\\ &=P\{X_1=x_1\}P\{X_2=x_2\}\cdots P\{X_n=x_n\}\\ &=P\{X=x_1\}P\{X=x_2\}\cdots P\{X=x_n\}=L(\theta) \end{aligned}$$
5.1 Definition of the Likelihood Function
Definition 1:
Let $X_1, X_2, \ldots, X_n$ be a sample from the population $X$, and let $x_1, x_2, \ldots, x_n$ be the observed values of the sample.
- If $X$ is a discrete population with distribution

  $$P\{X=a_k\}=p_k(\theta)\quad(k=1,2,\ldots)$$

  let

  $$L(\theta)=L(\theta; x_1,x_2,\ldots,x_n)=\prod_{i=1}^{n}P\{X_i=x_i\},\quad \theta\in \Theta$$

- If $X$ is a continuous population with density $f(x;\theta)$, let

  $$L(\theta)=L(\theta; x_1,x_2,\ldots,x_n)=\prod_{i=1}^{n}f(x_i;\theta),\quad \theta\in \Theta$$

In either case, $L(\theta)$ is called the likelihood function.
Example 1: Let $X_1, X_2, \ldots, X_n$ be a sample from the population $X\sim B(1,p)$, let $x_1, x_2, \ldots, x_n$ be the observed values of the sample, and let $p$ be an unknown parameter. Write down the likelihood function.
Solution: $P\{X=x\}=p^x(1-p)^{1-x}$, where $x\in \{0,1\}$. Hence

$$\begin{aligned} L(p)&=\prod_{i=1}^nP\{X_i=x_i\}\\ &=\prod_{i=1}^np^{x_i}(1-p)^{1-x_i}\\ &=p^{n\bar x}(1-p)^{n(1-\bar x)} \end{aligned}$$
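The closed form in Example 1 can be checked numerically. The following is a minimal sketch, assuming a small made-up 0/1 sample; the function names and the data are illustrative and do not come from the original text. It evaluates the likelihood both as the product of the individual probabilities and through the closed form $p^{n\bar x}(1-p)^{n(1-\bar x)}$.

```python
import math


def bernoulli_likelihood_product(xs, p):
    """Likelihood as the product of P{X_i = x_i} = p^x_i * (1 - p)^(1 - x_i)."""
    result = 1.0
    for x in xs:
        result *= p ** x * (1 - p) ** (1 - x)
    return result


def bernoulli_likelihood_closed_form(xs, p):
    """Closed form p^(n * xbar) * (1 - p)^(n * (1 - xbar))."""
    n = len(xs)
    xbar = sum(xs) / n
    return p ** (n * xbar) * (1 - p) ** (n * (1 - xbar))


# Hypothetical observations (an assumption for illustration): 7 ones in 10 trials.
sample = [1, 0, 1, 1, 0, 1, 1, 1, 0, 1]

for p in (0.3, 0.5, 0.7):
    by_product = bernoulli_likelihood_product(sample, p)
    by_formula = bernoulli_likelihood_closed_form(sample, p)
    print(f"p={p}: product={by_product:.6e}, closed form={by_formula:.6e}")
```

For this sample the two columns agree up to rounding, and the likelihood is larger at $p=0.7$ than at $p=0.5$ or $p=0.3$.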
Example 2: Let $X_1, X_2, \ldots, X_n$ be a sample from the population $X\sim N(\mu,\sigma^2)$, let $x_1, x_2, \ldots, x_n$ be the observed values of the sample, and let $\mu,\sigma^2$ be unknown parameters. Write down the likelihood function.
**Solution:** The density of the normal distribution is

$$f(x)=\frac{1}{\sqrt{2\pi}\sigma}e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$

so the likelihood function can be written as

$$\begin{aligned} L(\mu,\sigma^2)&=\prod_{i=1}^nf(x_i)\\ &=\prod_{i=1}^n\frac{1}{\sqrt{2\pi}\sigma}e^{-\frac{(x_i-\mu)^2}{2\sigma^2}}\\ &=\left(\frac{1}{\sqrt{2\pi}}\right)^n(\sigma^2)^{-\frac{n}{2}}e^{-\frac{1}{2\sigma^2}\sum_{i=1}^n(x_i-\mu)^2} \end{aligned}$$
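As a sanity check on the algebra in Example 2, the sketch below compares the product of normal densities with the closed form. The sample values, the trial parameters, and the function names are assumptions made for illustration only.

```python
import math


def normal_pdf(x, mu, sigma2):
    """Density of N(mu, sigma2) at x; sigma2 is the variance."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)


def normal_likelihood_product(xs, mu, sigma2):
    """Likelihood as the product of the densities f(x_i)."""
    return math.prod(normal_pdf(x, mu, sigma2) for x in xs)


def normal_likelihood_closed_form(xs, mu, sigma2):
    """(1/sqrt(2*pi))^n * (sigma2)^(-n/2) * exp(-sum((x_i - mu)^2) / (2*sigma2))."""
    n = len(xs)
    sum_sq = sum((x - mu) ** 2 for x in xs)
    return (1 / math.sqrt(2 * math.pi)) ** n * sigma2 ** (-n / 2) * math.exp(-sum_sq / (2 * sigma2))


# Hypothetical observations and trial parameter values (assumptions).
sample = [4.8, 5.1, 5.3, 4.9, 5.4]
by_product = normal_likelihood_product(sample, 5.0, 0.25)
by_formula = normal_likelihood_closed_form(sample, 5.0, 0.25)
print(by_product, by_formula)
```

For large $n$ the raw product can underflow to zero in floating point, which is one practical reason to work with the logarithm of the likelihood instead.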
5.2 Definition of the Maximum Likelihood Estimate
Definition 2
Let $X_1, X_2, \ldots, X_n$ be a sample from the population $X$, let $x_1, x_2, \ldots, x_n$ be the observed values of the sample, and let $L(\theta)$ ($\theta\in\Theta$) be the likelihood function. If there exists a statistic $\hat \theta=\hat\theta(x_1,x_2,\cdots,x_n)$ such that

$$L(\hat\theta)=\sup_{\theta\in\Theta}L(\theta)$$

then $\hat \theta=\hat\theta(X_1,X_2,\cdots,X_n)$ is called the maximum likelihood estimator of $\theta$, abbreviated MLE (Maximum Likelihood Estimate).
5.3 General Procedure for Finding the Maximum Likelihood Estimate
- From the expression for the population distribution, write down the likelihood function:

  $$L(\theta_1,\theta_2,\cdots,\theta_m)\qquad(\theta=(\theta_1,\theta_2,\cdots,\theta_m)\in\Theta)$$

- Because $\ln$ is strictly increasing, $L(\theta_1,\theta_2,\cdots,\theta_m)$ and $\ln L(\theta_1,\theta_2,\cdots,\theta_m)$ attain their maximum at the same points. $\ln L(\theta_1,\theta_2,\cdots,\theta_m)$ is called the log-likelihood function and is denoted $l(\theta_1,\theta_2,\cdots,\theta_m)$. Compute $l(\theta_1,\theta_2,\cdots,\theta_m)$.
- Find the maximum point $\hat \theta_1,\hat \theta_2,\cdots,\hat \theta_m$ of $l(\theta_1,\theta_2,\cdots,\theta_m)$; these values are the MLEs of $\theta_1,\theta_2,\cdots,\theta_m$.
Remark:
If $l(\theta_1,\theta_2,\cdots,\theta_m)$ is differentiable with respect to each $\theta_i$ ($i=1,2,\cdots,m$), then the system

$$\left\{\begin{aligned} &\frac{\partial l(\theta_1,\theta_2,\cdots,\theta_m)}{\partial \theta_1}=0\\ &\frac{\partial l(\theta_1,\theta_2,\cdots,\theta_m)}{\partial \theta_2}=0\\ &\vdots\\ &\frac{\partial l(\theta_1,\theta_2,\cdots,\theta_m)}{\partial \theta_m}=0 \end{aligned} \right.$$

is called the system of log-likelihood equations.
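When the log-likelihood equations are awkward to solve by hand, the maximum point of $l$ can also be located numerically. The following is a minimal sketch, assuming a single parameter and a unimodal log-likelihood. It uses ternary search, which is a choice made here for illustration and not a method prescribed by the text; the helper names and the sample data are likewise assumptions. The log-likelihood is that of the $B(1,p)$ population from Example 1.

```python
import math


def maximize_unimodal(f, lo, hi, tol=1e-9):
    """Ternary search for the maximizer of a unimodal function on [lo, hi]."""
    while hi - lo > tol:
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if f(m1) < f(m2):
            lo = m1  # the maximizer cannot lie in [lo, m1)
        else:
            hi = m2  # the maximizer cannot lie in (m2, hi]
    return (lo + hi) / 2


def bernoulli_log_likelihood(xs, p):
    """ln L(p) for B(1, p): sum(x_i) * ln p + (n - sum(x_i)) * ln(1 - p)."""
    n = len(xs)
    s = sum(xs)
    return s * math.log(p) + (n - s) * math.log(1 - p)


# Hypothetical observations (an assumption): 7 ones in 10 trials.
sample = [1, 0, 1, 1, 0, 1, 1, 1, 0, 1]

# Search strictly inside (0, 1) so that both logarithms stay finite.
p_hat = maximize_unimodal(lambda p: bernoulli_log_likelihood(sample, p), 1e-9, 1 - 1e-9)
print(p_hat)
```

Ternary search only needs function values, so it applies even when $l$ is not differentiable, but it is limited to one parameter and to unimodal functions. Multi-parameter problems would need a different optimizer.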
Example 3: Let $X_1, X_2, \ldots, X_n$ be a sample from the population $X\sim B(1,p)$, let $x_1, x_2, \ldots, x_n$ be the observed values of the sample, and let $p$ be an unknown parameter. Find the maximum likelihood estimate of $p$.
Solution: $P\{X=x\}=p^x(1-p)^{1-x}$, where $x\in \{0,1\}$. Hence

$$\begin{aligned} L(p)&=\prod_{i=1}^nP\{X_i=x_i\}\\ &=\prod_{i=1}^np^{x_i}(1-p)^{1-x_i}\\ &=p^{n\bar x}(1-p)^{n(1-\bar x)} \end{aligned}$$

The log-likelihood function is therefore

$$l(p)=\ln L(p)=n\bar x\ln p+n(1-\bar x)\ln(1-p)$$

Differentiating $l(p)$ and setting the derivative to zero:

$$\begin{aligned} \frac{dl(p)}{dp}&=n\bar x\frac{1}{p}-n(1-\bar x)\frac{1}{1-p}=0\\ &\Rightarrow n\bar x(1-p)-n(1-\bar x)p=0\\ &\Rightarrow n\bar x-np=0\\ &\Rightarrow \hat p=\bar x \end{aligned}$$
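The log-likelihood equation only identifies a stationary point. As a supplementary check that is not part of the original derivation, and assuming $0<\bar x<1$ so that both terms are present, the second derivative is negative on the whole interval, so $l(p)$ is strictly concave and $\hat p=\bar x$ is its unique maximum point:

$$\frac{d^2l(p)}{dp^2}=-\frac{n\bar x}{p^2}-\frac{n(1-\bar x)}{(1-p)^2}<0,\qquad 0<p<1$$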
Example 4: Let $X_1, X_2, \ldots, X_n$ be a sample from the population $X\sim N(\mu,\sigma^2)$, let $x_1, x_2, \ldots, x_n$ be the observed values of the sample, and let $\mu,\sigma^2$ be unknown parameters. Find the maximum likelihood estimates of $\mu$ and $\sigma^2$.
**Solution:** The density of the normal distribution is

$$f(x)=\frac{1}{\sqrt{2\pi}\sigma}e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$

so the likelihood function can be written as

$$\begin{aligned} L(\mu,\sigma^2)&=\prod_{i=1}^nf(x_i)\\ &=\prod_{i=1}^n\frac{1}{\sqrt{2\pi}\sigma}e^{-\frac{(x_i-\mu)^2}{2\sigma^2}}\\ &=\left(\frac{1}{\sqrt{2\pi}}\right)^n(\sigma^2)^{-\frac{n}{2}}e^{-\frac{1}{2\sigma^2}\sum_{i=1}^n(x_i-\mu)^2} \end{aligned}$$

The log-likelihood function is therefore

$$l(\mu,\sigma^2)=-\frac{n}{2}\ln{2\pi}-\frac{n}{2}\ln \sigma^2-\frac{1}{2\sigma^2}\sum_{i=1}^n(x_i-\mu)^2$$

Setting the partial derivatives to zero gives

$$\begin{aligned} \frac{\partial l}{ \partial \mu}&=\frac{1}{\sigma^2}\sum_{i=1}^{n}(x_i-\mu)=0\\ \frac{\partial l}{ \partial \sigma^2}&=-\frac{n}{2\sigma^2}+\frac{1}{2\sigma^4}\sum_{i=1}^{n}(x_i-\mu)^2=0\\ &\Rightarrow \hat \mu=\frac{1}{n}\sum_{i=1}^{n}x_i=\bar x\\ &\Rightarrow \hat \sigma^2=\frac{1}{n}\sum_{i=1}^{n}(x_i-\bar x)^2 \end{aligned}$$
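The estimates from Example 4 can be computed directly from data. The sketch below is illustrative only: the sample values and function names are assumptions. It also checks that nudging either parameter away from the estimate lowers the log-likelihood, and compares $\hat\sigma^2$ with the two variance functions in Python's standard `statistics` module.

```python
import math
import statistics


def normal_log_likelihood(xs, mu, sigma2):
    """l(mu, sigma2) = -n/2 ln(2 pi) - n/2 ln(sigma2) - sum((x_i - mu)^2) / (2 sigma2)."""
    n = len(xs)
    sum_sq = sum((x - mu) ** 2 for x in xs)
    return -n / 2 * math.log(2 * math.pi) - n / 2 * math.log(sigma2) - sum_sq / (2 * sigma2)


def normal_mle(xs):
    """Return (mu_hat, sigma2_hat); note the divisor n, not n - 1."""
    n = len(xs)
    mu_hat = sum(xs) / n
    sigma2_hat = sum((x - mu_hat) ** 2 for x in xs) / n
    return mu_hat, sigma2_hat


# Hypothetical observations (an assumption for illustration).
sample = [4.8, 5.1, 5.3, 4.9, 5.4]
mu_hat, sigma2_hat = normal_mle(sample)
print("mu_hat =", mu_hat, "sigma2_hat =", sigma2_hat)

# pvariance divides by n and matches the MLE; variance divides by n - 1.
print("pvariance =", statistics.pvariance(sample))
print("variance  =", statistics.variance(sample))

# Small perturbations of either parameter should not increase l.
best = normal_log_likelihood(sample, mu_hat, sigma2_hat)
perturbed = [
    normal_log_likelihood(sample, mu_hat + d_mu, sigma2_hat + d_s2)
    for d_mu, d_s2 in [(0.1, 0.0), (-0.1, 0.0), (0.0, 0.01), (0.0, -0.01)]
]
print(all(value < best for value in perturbed))
```

The divisor $n$ in $\hat\sigma^2$ is what distinguishes the maximum likelihood estimate from the usual sample variance, which divides by $n-1$.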
5.4 Invariance of the Maximum Likelihood Estimate
Theorem: Let $\hat \theta$ be the maximum likelihood estimate of $\theta$, and let $u=u(\theta)$ be a function of $\theta$ that has a single-valued inverse $\theta=\theta(u)$. Then $u(\hat \theta)$ is the maximum likelihood estimate of $u$.
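Example 5: A bag contains black balls and white balls, and the proportion of white balls $p$ ($0<p<1$) is unknown. Draw $m$ balls from the bag one at a time with replacement, observing the color of each ball before putting it back, and let $X$ be the number of white balls drawn. This experiment is repeated $n$ times, giving the observed sample values $x_1, x_2, \cdots, x_n$. Find:
- the maximum likelihood estimate of $p$
- the maximum likelihood estimate of the ratio $R$ of white balls to black balls in the bag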
Solution:
(1) The population distribution is $X\sim B(m,p)$, so the likelihood function and the log-likelihood function are

$$\begin{aligned} L(p)&=\prod_{i=1}^{n}P\{X_i=x_i\}=\prod_{i=1}^n\binom{m}{x_i}p^{x_i}(1-p)^{m-x_i}=p^{n\bar x}(1-p)^{n(m-\bar x)}\prod_{i=1}^n\binom{m}{x_i}\\ l(p)&=\ln L(p)=n\bar x\ln p+n(m-\bar x)\ln(1-p)+\ln\prod_{i=1}^{n}\binom{m}{x_i} \end{aligned}$$

Differentiating $l(p)$ gives the log-likelihood equation:

$$\begin{aligned} \frac{dl(p)}{dp}&=\frac{n\bar x}{p}-\frac{n(m-\bar x)}{1-p}=0\\ &\Rightarrow \hat p=\frac{\bar x}{m} \end{aligned}$$

(2) The ratio of white balls to black balls is

$$R=\frac{p}{1-p}$$

which is strictly increasing on $(0,1)$ and therefore has the single-valued inverse $p=\frac{R}{1+R}$. By the invariance of the maximum likelihood estimate:

$$\hat R=\frac{\hat p}{1-\hat p}=\frac{\bar x}{m-\bar x}$$
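The invariance result in part (2) can be checked numerically by maximizing the likelihood directly in terms of $R$. The following is a rough sketch under stated assumptions: the counts, the value of $m$, the grid resolution, and the function names are all made up for illustration. It rewrites the $B(m,p)$ log-likelihood with $p=R/(1+R)$, searches a grid of $R$ values, and compares the result with $\hat p/(1-\hat p)$.

```python
import math


def binomial_log_likelihood(xs, m, p):
    """ln L(p) for B(m, p), omitting the constant term ln prod C(m, x_i)."""
    n = len(xs)
    s = sum(xs)
    return s * math.log(p) + (n * m - s) * math.log(1 - p)


def log_likelihood_in_R(xs, m, R):
    """Same log-likelihood, reparametrized by the ratio R = p / (1 - p)."""
    p = R / (1 + R)
    return binomial_log_likelihood(xs, m, p)


# Hypothetical data (an assumption): n = 5 experiments of m = 10 draws each,
# recording the number of white balls in each experiment.
m = 10
sample = [3, 5, 4, 6, 2]

xbar = sum(sample) / len(sample)
p_hat = xbar / m                 # MLE of p from part (1)
R_hat = p_hat / (1 - p_hat)      # MLE of R by invariance

# Direct grid search over R in (0, 5) with step 0.001.
grid = [k / 1000 for k in range(1, 5000)]
R_grid = max(grid, key=lambda R: log_likelihood_in_R(sample, m, R))

print("R_hat by invariance:", R_hat)
print("R_hat by grid search:", R_grid)
```

The grid search can only agree with the invariance formula up to the grid step, so the two printed values are expected to be close, not identical.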
Question: Does the method-of-moments estimate also have an invariance property?