1. The Idea of Maximum Likelihood Estimation
Suppose we have obtained a sample $x_{1},x_{2},\dots,x_{m}$. Since these values were actually drawn from the population, from a probabilistic point of view we regard their occurrence as highly probable. We can therefore construct the joint probability function of the sample; the parameter value that maximizes this joint probability function is the maximum likelihood estimate.
Now suppose the joint probability density function (or joint probability mass function) of the random variables $X_{1},X_{2},\dots,X_{m}$ is $p(X;\pmb{\theta})$, $\pmb{\theta}\in \pmb{\Theta}$, where $\pmb{\theta}$ is a parameter vector consisting of one or more unknown parameters and $\pmb{\Theta}$ is the parameter space. Let $x_{1},x_{2},\dots,x_{m}$ be a sample from the population. Viewing the joint probability function of the sample as a function of $\pmb{\theta}$, we write it as $L(\pmb{\theta};x_{1},x_{2},\dots,x_{m})$, abbreviated $L(\pmb{\theta})$.
If the $X_{i}$ are i.i.d., then $L(\pmb{\theta})$ can be written as
$$L(\pmb{\theta})=L(\pmb{\theta};x_{1},x_{2},\dots,x_{m})=\prod_{i=1}^{m} p(x_{i};\pmb{\theta})$$
$L(\pmb{\theta})$ is called the likelihood function of the sample. If a statistic
$$\hat{\pmb{\theta}}=\hat{\pmb{\theta}}(x_{1},x_{2},\dots,x_{m})$$
satisfies
$$L(\hat{\pmb{\theta}})=\max_{\pmb{\theta}\in \pmb{\Theta}}L(\pmb{\theta})$$
then $\hat{\pmb{\theta}}$ is called the maximum likelihood estimate (MLE) of $\pmb{\theta}$.
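To make the definition concrete, here is a minimal sketch (not from the original article) that evaluates a Bernoulli likelihood on a grid of candidate parameter values for some hypothetical coin-toss data; the grid point maximizing $L(p)$ is the MLE of the head probability $p$:

```python
import numpy as np

# Hypothetical data: 10 coin tosses, 7 heads (1 = head, 0 = tail).
x = np.array([1, 1, 1, 0, 1, 1, 0, 1, 0, 1])

# Evaluate L(p) = prod_i p^{x_i} (1-p)^{1-x_i} on a grid of candidate values.
p_grid = np.linspace(0.01, 0.99, 99)
L = np.array([np.prod(p**x * (1 - p)**(1 - x)) for p in p_grid])

p_hat = p_grid[np.argmax(L)]
print(p_hat)  # the sample proportion of heads, 0.7
```

For the Bernoulli case the maximizer is the sample proportion, so the grid search recovers it exactly.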
2. Deriving the MLE for a Univariate Normal Population
Let $x_{1},x_{2},\dots,x_{m}$ be mutually independent samples from a normal population $N(\mu,\sigma^{2})$, so $\pmb{\theta}=(\mu,\sigma^{2})$. Since the samples come from a normal population,
$$p(x_{i};\mu,\sigma^{2})=\frac{1}{\sqrt{2\pi}\,\sigma}\exp\left\{-\frac{(x_{i}-\mu)^{2}}{2\sigma^{2}}\right\},\quad i=1,2,\dots,m$$
Hence the likelihood function of the sample is
$$\begin{aligned} L(\pmb{\theta})&=\prod_{i=1}^{m} p(x_{i};\pmb{\theta})\\ &=\prod_{i=1}^{m}\frac{1}{\sqrt{2\pi}\,\sigma}\exp\left\{-\frac{(x_{i}-\mu)^{2}}{2\sigma^{2}}\right\}\\ &=(2\pi\sigma^{2})^{-m/2}\exp\left\{-\frac{1}{2\sigma^{2}}\sum_{i=1}^{m}(x_{i}-\mu)^{2}\right\} \end{aligned}$$
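The last equality, collapsing the product into $(2\pi\sigma^{2})^{-m/2}\exp\{\cdot\}$, can be spot-checked numerically. This small sketch (with an arbitrary sample and arbitrary trial values of $\mu$ and $\sigma^{2}$, all my own choices) compares the two forms directly; the sample is kept tiny so the raw product does not underflow:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(0, 1, 5)      # a tiny sample
mu, s2 = 0.3, 1.5            # arbitrary trial parameter values
m = len(x)

# Product of the individual normal densities.
prod_form = np.prod(np.exp(-(x - mu)**2 / (2*s2)) / np.sqrt(2*np.pi*s2))
# Closed form from the derivation above.
closed_form = (2*np.pi*s2)**(-m/2) * np.exp(-np.sum((x - mu)**2) / (2*s2))

print(prod_form, closed_form)  # equal up to floating-point rounding
```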
The log-likelihood function is
$$\ln L(\pmb{\theta})=-\frac{m}{2}\ln(2\pi)-\frac{m}{2}\ln(\sigma^{2})-\frac{1}{2\sigma^{2}}\sum_{i=1}^{m}(x_{i}-\mu)^{2}$$
Taking the partial derivatives of $\ln L(\pmb{\theta})$ with respect to the two components and setting them to zero gives
$$\frac{\partial \ln L(\pmb{\theta})}{\partial \mu}=\frac{1}{\sigma^{2}}\sum_{i=1}^{m}(x_{i}-\mu)=0$$
$$\frac{\partial \ln L(\pmb{\theta})}{\partial \sigma^{2}}=\frac{1}{2\sigma^{4}}\sum_{i=1}^{m}(x_{i}-\mu)^{2}-\frac{m}{2\sigma^{2}}=0$$
Solving this system gives the MLE of $\mu$:
$$\hat{\mu}=\frac{1}{m}\sum_{i=1}^{m}x_{i}=\bar{x}$$
and the MLE of $\sigma^{2}$:
$$\hat{\sigma}^{2}=\frac{1}{m}\sum_{i=1}^{m}(x_{i}-\bar{x})^{2}$$
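As a quick sanity check (not part of the original derivation), we can plug the closed-form estimates back into the two score equations above and confirm that both vanish up to floating-point error:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(2.0, 4.0, 1000)   # a sample with arbitrary true parameters
m = len(x)

mu_hat = x.mean()                    # mu-hat: the sample mean
s2_hat = np.mean((x - mu_hat)**2)    # sigma^2-hat: the biased sample variance

# The two score equations, evaluated at the closed-form estimates.
score_mu = np.sum(x - mu_hat) / s2_hat
score_s2 = np.sum((x - mu_hat)**2) / (2*s2_hat**2) - m / (2*s2_hat)

print(score_mu, score_s2)  # both essentially zero
```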
3. Code Implementation
The code is written in Python. This article estimates the parameters in two ways:
- computing the closed-form estimator expressions directly;
- numerical optimization with SciPy's optimize module.
3.1. Drawing Samples
```python
import numpy as np
import matplotlib.pyplot as plt
import scipy.optimize as optim
```
First, draw $m=100000$ samples from the normal distribution with $\mu=2$, $\sigma=4$:
```python
np.random.seed(10)
mu = 2; sigma = 4; m = 100000
X = np.sort(np.random.normal(mu, sigma, m))
# Density values; note the parentheses: 1/(sqrt(2*pi)*sigma), not (1/sqrt(2*pi))*sigma.
y = 1/(np.sqrt(2*np.pi)*sigma) * np.exp(-(X-mu)**2/(2*sigma**2))
```
Plot the density curve from the samples:
```python
def plot_density(X, y, mu, sigma):
    fig = plt.figure(figsize=(8, 6))
    grid = plt.GridSpec(1, 1)
    axes = fig.add_subplot(grid[0, 0])
    axes.plot(X, y)
    axes.set_xlabel("x")
    axes.set_ylabel("$p(x)$")
    for dire in ['top', 'right']:
        axes.spines[dire].set_visible(False)
    axes.vlines(mu, ymin=0, ymax=np.max(y), color='red')
    axes.text(8, 0.8*np.max(y), r'$\mu={},\sigma={}$'.format(mu, sigma))
    fig.subplots_adjust(left=0.1, bottom=0.12, right=0.96, top=0.95, wspace=0.5, hspace=0.5)
    axes.text(mu-0.5, 0.5*np.max(y), '$x={}$'.format(mu))
    return fig

fig1 = plot_density(X, y, mu, sigma)
```
3.2. Computing from the Estimator Expressions
The MLE of $\mu$ is
$$\hat{\mu}=\frac{1}{m}\sum_{i=1}^{m}x_{i}=\bar{x}$$
and the MLE of $\sigma^{2}$ is
$$\hat{\sigma}^{2}=\frac{1}{m}\sum_{i=1}^{m}(x_{i}-\bar{x})^{2}$$
Computing both:
```python
hat_mu = np.mean(X)
hat_sigma = np.sqrt((X - hat_mu).dot(X - hat_mu) / m)
print("The estimated value of mean:", hat_mu)
print("The estimated value of Standard deviation:", hat_sigma)
```
Output:
The estimated value of mean: 1.9880863402374647
The estimated value of Standard deviation: 3.9976758117844575
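Incidentally, these closed-form MLEs are exactly what `np.mean` and `np.std` compute, since `np.std` defaults to `ddof=0` (dividing by $m$ rather than $m-1$). The hand-rolled expression can therefore be cross-checked against NumPy directly; this is a sketch of my own, not from the original article:

```python
import numpy as np

np.random.seed(10)
X = np.random.normal(2, 4, 100000)

hat_mu = np.mean(X)
hat_sigma = np.sqrt((X - hat_mu).dot(X - hat_mu) / len(X))

# np.std defaults to ddof=0, which is exactly the MLE of sigma;
# ddof=1 would give the unbiased sample standard deviation instead.
print(np.allclose(hat_sigma, np.std(X)))  # True
```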
3.3. Optimization with SciPy's optimize Module
Maximum likelihood estimation seeks the parameter $\pmb{\theta}$ that maximizes the likelihood function, i.e.
$$\hat{\pmb{\theta}}=\arg\max_{\pmb{\theta}\in \pmb{\Theta}}L(\pmb{\theta})$$
Since maximizing $L(\pmb{\theta})$ is the same as maximizing $\ln L(\pmb{\theta})$, this is equivalent to the minimization problem
$$\hat{\pmb{\theta}}=\arg\min_{\pmb{\theta}\in \pmb{\Theta}}\,-\ln L(\pmb{\theta})$$
That is,
$$\begin{aligned} \hat{\pmb{\theta}} &=\arg\min_{\pmb{\theta}\in \pmb{\Theta}}\,-\ln L(\pmb{\theta})\\ &=\arg\min_{\pmb{\theta}\in \pmb{\Theta}}\,-\left\{-m\ln(\sigma)-\frac{m}{2}\ln(2\pi)-\frac{1}{2\sigma^{2}}\sum_{i=1}^{m}(x_{i}-\mu)^{2}\right\}\\ &=\arg\min_{\pmb{\theta}\in \pmb{\Theta}}\left\{m\ln(\sigma)+\frac{m}{2}\ln(2\pi)+\frac{1}{2\sigma^{2}}\sum_{i=1}^{m}(x_{i}-\mu)^{2}\right\} \end{aligned}$$
Since the second term $\frac{m}{2}\ln(2\pi)$ is a constant, it does not affect the optimization and can be dropped.
The resulting minimization problem is implemented as follows:
```python
def likelihood_func(theta, X):
    # theta is the parameter vector to optimize: theta[0] is the mean, theta[1] the standard deviation.
    m = len(X)
    z = m*np.log(theta[1]) + (1/(2*theta[1]**2)) * ((X - theta[0]).dot(X - theta[0]))
    return z
```
Starting from the initial point $\mu_{0}=4$, $\sigma_{0}=8$, run the optimization:
```python
pt = np.array([4, 8])
result = optim.minimize(likelihood_func, pt, args=(X,), method='Nelder-Mead')
print('Status : %s' % result['message'])
print('Total Evaluations: %d' % result['nfev'])
solution = result['x']
print('Solution: f(%s)' % solution)
```
Output:
Status : Optimization terminated successfully.
Total Evaluations: 82
Solution: f([1.98805021 3.99765954])
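A further cross-check (my own addition, not used in the original article) is `scipy.stats.norm.fit`, which computes the normal MLE directly and returns `(loc, scale)`, i.e. the estimated mean and standard deviation:

```python
import numpy as np
from scipy import stats

np.random.seed(10)
X = np.random.normal(2, 4, 100000)

# norm.fit returns the maximum likelihood estimates (loc = mean, scale = std).
loc, scale = stats.norm.fit(X)
print(loc, scale)  # both close to the true values 2 and 4
```

With this seed and sample size, the fitted values should agree closely with both approaches above.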
4. Summary
Starting from the idea of maximum likelihood estimation, this article gave the general definition of the MLE, derived the MLEs of the parameters of a univariate normal population, and computed the estimates in two ways. The two results differ only slightly, and the estimation error is small: in both cases the estimates of $\mu$ and $\sigma$ are very close to the true values.
Complete code
```python
# Imports
import scipy.optimize as optim
import numpy as np
import matplotlib.pyplot as plt

# Draw samples
np.random.seed(10)
mu = 2; sigma = 4; m = 100000
X = np.sort(np.random.normal(mu, sigma, m))
y = 1/(np.sqrt(2*np.pi)*sigma) * np.exp(-(X-mu)**2/(2*sigma**2))

# Plot the density
def plot_density(X, y, mu, sigma):
    fig = plt.figure(figsize=(6, 4))
    grid = plt.GridSpec(1, 1)
    axes = fig.add_subplot(grid[0, 0])
    axes.plot(X, y)
    axes.set_xlabel("x")
    axes.set_ylabel("$p(x)$")
    for dire in ['top', 'right']:
        axes.spines[dire].set_visible(False)
    axes.vlines(mu, ymin=0, ymax=np.max(y), color='red')
    axes.text(8, 0.8*np.max(y), r'$\mu={},\sigma={}$'.format(mu, sigma))
    fig.subplots_adjust(left=0.1, bottom=0.12, right=0.96, top=0.95, wspace=0.5, hspace=0.5)
    axes.text(mu-0.5, 0.5*np.max(y), '$x={}$'.format(mu))
    return fig

fig1 = plot_density(X, y, mu, sigma)

# Compute from the estimator expressions
hat_mu = np.mean(X)
hat_sigma = np.sqrt((X - hat_mu).dot(X - hat_mu) / m)
print("The estimated value of mean:", hat_mu)
print("The estimated value of Standard deviation:", hat_sigma)

# Optimize with scipy's optimize module
def likelihood_func(theta, X):
    # theta[0] is the mean, theta[1] the standard deviation.
    m = len(X)
    z = m*np.log(theta[1]) + (1/(2*theta[1]**2)) * ((X - theta[0]).dot(X - theta[0]))
    return z

pt = np.array([4, 8])
result = optim.minimize(likelihood_func, pt, args=(X,), method='Nelder-Mead')
print('Status : %s' % result['message'])
print('Total Evaluations: %d' % result['nfev'])
solution = result['x']
print('Solution: f(%s)' % solution)
```