Some advantages of maximum likelihood estimation


策策cece


Main contents:
- asymptotic correctness
- asymptotic normality
- efficiency

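The random variable $X$ follows the distribution $p(x|\theta)$ with parameter $\theta$. Repeating the experiment independently $n$ times yields observations $x_1,\cdots,x_n$. As the estimate of $\theta$ we choose the value $\hat{\theta}$ that maximizes the likelihood function $L(\hat{\theta})=\prod_{i=1}^np(x_i|\hat{\theta})$.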
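To make the definition concrete, here is a small sketch. It assumes a Bernoulli model $p(x|\theta)=\theta^x(1-\theta)^{1-x}$, which is an illustrative choice and not part of the original. It finds $\hat{\theta}$ by evaluating $\log L$ on a grid of candidate values. The helper names `bernoulli_log_likelihood` and `mle_by_grid` are hypothetical. Working with $\log L$ rather than $L$ avoids underflow from multiplying many small probabilities.

```python
import numpy as np

def bernoulli_log_likelihood(theta, x):
    """log L(theta) = sum_i log p(x_i | theta) for an assumed Bernoulli(theta) model."""
    k = x.sum()   # number of ones
    n = x.size
    return k * np.log(theta) + (n - k) * np.log1p(-theta)

def mle_by_grid(x, num=9999):
    """Return the grid value of theta that maximizes the log-likelihood."""
    grid = np.linspace(1e-4, 1 - 1e-4, num)  # avoid log(0) at the endpoints
    ll = bernoulli_log_likelihood(grid, x)
    return grid[np.argmax(ll)]

rng = np.random.default_rng(0)
theta_true = 0.3
x = rng.binomial(1, theta_true, size=500)  # n = 500 independent observations

theta_hat = mle_by_grid(x)
print(theta_hat, x.mean())  # grid maximizer vs. closed-form MLE (the sample mean)
```

For this model the maximizer has the closed form $\hat{\theta}=\frac{1}{n}\sum_i x_i$, so the grid result should agree with the sample mean up to the grid spacing.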

asymptotic correctness

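As the sample size $n$ grows, the estimate $\hat{\theta}$ eventually converges to the true value $\theta$.
Maximizing the likelihood is equivalent to maximizing

$\frac{1}{n}\log L(\hat{\theta})-\mathrm{constant}$,

where the constant is $\int p(x|\theta)\log p(x|\theta)dx$, which does not depend on $\hat{\theta}$.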

By the law of large numbers, the sample average $\frac{1}{n}\sum_{i=1}^n\log p(x_i|\hat{\theta})$ converges to its expectation under the true distribution $p(x|\theta)$:

$\begin{aligned} \frac{1}{n}\log L(\hat{\theta})-\mathrm{constant} &=\frac{1}{n}\sum_{i=1}^n\log p(x_i|\hat{\theta})-\int p(x|\theta)\log p(x|\theta)dx\\ &\stackrel{n\to\infty}{\longrightarrow}\int p(x|\theta)\log p(x|\hat{\theta})dx-\int p(x|\theta)\log p(x|\theta)dx\\ &=\int p(x|\theta)\log\frac{p(x|\hat{\theta})}{p(x|\theta)}dx\\ &=-D(p(x|\theta)\parallel p(x|\hat{\theta}))\\ &\le0 \end{aligned}$
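
Therefore this quantity reaches its maximum value $0$ only when the KL divergence vanishes, i.e. when $p(x|\hat{\theta})=p(x|\theta)$. Assuming the model is identifiable (distinct parameters give distinct distributions), this means $\hat{\theta}=\theta$.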
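Here is a quick numerical check of this argument, again assuming a Bernoulli model as an illustrative choice. As $n$ grows, the MLE $\hat{\theta}$ (the sample mean) should approach $\theta$, and $D(p(x|\theta)\parallel p(x|\hat{\theta}))$ should stay non-negative and shrink toward $0$. The helper `bernoulli_kl` is hypothetical.

```python
import numpy as np

def bernoulli_kl(p, q):
    """D(Ber(p) || Ber(q)) = sum_x p(x) log(p(x)/q(x)); assumes 0 < p, q < 1."""
    return p * np.log(p / q) + (1 - p) * np.log((1 - p) / (1 - q))

rng = np.random.default_rng(1)
theta_true = 0.3
results = []  # (n, theta_hat, KL) for each sample size
for n in [10, 100, 1_000, 10_000, 100_000]:
    x = rng.binomial(1, theta_true, size=n)
    # Bernoulli MLE is the sample mean; clip so the KL stays finite if it hits 0 or 1.
    theta_hat = float(np.clip(x.mean(), 1e-6, 1 - 1e-6))
    kl = bernoulli_kl(theta_true, theta_hat)
    results.append((n, theta_hat, kl))
    print(f"n={n:>6}  theta_hat={theta_hat:.4f}  KL={kl:.2e}")
```

The KL term is exactly $0$ at $\hat{\theta}=\theta$ and positive elsewhere, matching the inequality above.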

asymptotic normality

For large $n$, the sampling distribution of the estimator $\hat{\theta}=\hat{\theta}(X_1,\cdots,X_n)$ is approximately normal.

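Since $n$ is large, $\hat{\theta}$ is close to $\theta$, so we can Taylor-expand the first-order condition $\frac{d}{d\theta}\log L(\hat{\theta})=0$ around $\theta$. (The site crashed... what follows didn't get saved... so I had to rewrite all of it... but in the process I found a mistake I'd made earlier!)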

In the fourth line, the sum of second derivatives is replaced by $n$ times its expectation, again by the law of large numbers:

$\begin{aligned} 0&=\frac{d}{d\theta}\log L(\hat{\theta})\\ &=\sum_{i=1}^n\frac{d}{d\theta}\log p(X_i|\hat{\theta})\\ &=\sum_{i=1}^n\frac{d}{d\theta}\log p(X_i|\theta)+(\hat{\theta}-\theta)\sum_{i=1}^n\frac{d^2}{d\theta^2}\log p(X_i|\theta)+O((\theta-\hat{\theta})^2)\\ &\approx\sum_{i=1}^n\frac{d}{d\theta}\log p(X_i|\theta)+(\hat{\theta}-\theta)n\int p(x|\theta)\frac{d^2}{d\theta^2}\log p(x|\theta)dx+O((\theta-\hat{\theta})^2)\\ &=\sum_{i=1}^n\frac{d}{d\theta}\log p(X_i|\theta)-(\hat{\theta}-\theta)nI+O((\theta-\hat{\theta})^2) \end{aligned}$
where $I=-\int p(x|\theta)\frac{d^2}{d\theta^2}\log p(x|\theta)dx$ is the Fisher information.
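As a concrete case, take a Bernoulli model (assumed for illustration). There $I$ can be computed exactly by summing over $x\in\{0,1\}$. The sketch below evaluates both $-E[\frac{d^2}{d\theta^2}\log p]$ and $E[(\frac{d}{d\theta}\log p)^2]$. For this model both should equal $\frac{1}{\theta(1-\theta)}$. The helper names are hypothetical.

```python
def score(x, theta):
    """d/dθ log p(x|θ) for the Bernoulli model p(x|θ) = θ^x (1-θ)^(1-x)."""
    return x / theta - (1 - x) / (1 - theta)

def d2_log_p(x, theta):
    """d²/dθ² log p(x|θ) for the same Bernoulli model."""
    return -x / theta**2 - (1 - x) / (1 - theta) ** 2

def expect(f, theta):
    """E[f(X)] for X ~ Bernoulli(θ): an exact sum over x in {0, 1}."""
    return (1 - theta) * f(0) + theta * f(1)

theta = 0.3
I_curv = -expect(lambda x: d2_log_p(x, theta), theta)    # -E[d²/dθ² log p]
I_score = expect(lambda x: score(x, theta) ** 2, theta)  # E[(d/dθ log p)²]
print(I_curv, I_score, 1 / (theta * (1 - theta)))
```

The two forms agree here. In general they coincide under the usual regularity conditions, which allow differentiation and integration to be swapped.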

Solving for $\hat{\theta}-\theta$ and keeping only the leading term:

$\begin{aligned} (\hat{\theta}-\theta)=\frac{1}{nI}\sum_{i=1}^n\frac{d}{d\theta}\log p(X_i|\theta)+\text{negligible terms} \end{aligned}$
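The linearization can be checked numerically. For a Bernoulli model it happens to be exact, so this sketch instead assumes an exponential model $p(x|\lambda)=\lambda e^{-\lambda x}$, an illustrative choice. Its MLE $\hat{\lambda}=1/\bar{x}$ is nonlinear in the data. The gap between $\hat{\lambda}-\lambda$ and $\frac{1}{nI}\sum_i\frac{d}{d\lambda}\log p(X_i|\lambda)$ should be of order $1/n$. That is much smaller than $\hat{\lambda}-\lambda$ itself, which is of order $1/\sqrt{n}$.

```python
import numpy as np

rng = np.random.default_rng(2)
lam = 2.0                                   # true rate λ (assumed for illustration)
n = 10_000
x = rng.exponential(scale=1 / lam, size=n)  # numpy's exponential takes scale = 1/λ

lam_hat = 1 / x.mean()                      # exact MLE of λ

# Model: log p(x|λ) = log λ - λx, so the score is 1/λ - x and I(λ) = 1/λ².
score_sum = np.sum(1 / lam - x)
fisher = 1 / lam**2
linearized = score_sum / (n * fisher)       # (1/(nI)) Σ d/dλ log p(X_i|λ)

print(lam_hat - lam, linearized, (lam_hat - lam) - linearized)
```

Writing $\bar{x}=(1+\varepsilon)/\lambda$, the difference works out to $\lambda\varepsilon^2/(1+\varepsilon)$, which is the "negligible term" in this model.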
By the central limit theorem, the right-hand side, a scaled sum of $n$ i.i.d. terms, is approximately normally distributed as

$N(0,\frac{1}{n}I^{-1})$
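Here is a Monte Carlo sketch of this claim. It assumes an exponential model for illustration, where $I(\lambda)=1/\lambda^2$ and so $\frac{1}{nI}=\lambda^2/n$. The sketch repeats the experiment many times, computes $\hat{\lambda}$ each time, and compares the spread of the estimates with $\frac{1}{nI}$. The seed, the sample sizes, and the 1.96 cutoff (the two-sided 95% normal quantile) are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(3)
lam = 2.0        # true rate of the assumed Exponential(λ) model
n = 500          # sample size per experiment
reps = 4_000     # number of repeated experiments

# Each row is one experiment of n i.i.d. draws; numpy parameterizes by scale = 1/λ.
samples = rng.exponential(scale=1 / lam, size=(reps, n))
lam_hats = 1 / samples.mean(axis=1)     # MLE of λ in each experiment

fisher = 1 / lam**2                     # I(λ) for the exponential model
predicted_var = 1 / (n * fisher)        # asymptotic variance 1/(nI) = λ²/n

var_ratio = lam_hats.var() / predicted_var
z = (lam_hats - lam) / np.sqrt(predicted_var)
coverage = np.mean(np.abs(z) < 1.96)    # close to 0.95 if the estimates are near-normal

print(lam_hats.mean(), var_ratio, coverage)
```

A variance ratio near 1 and a coverage near 0.95 are consistent with $\hat{\lambda}-\lambda\approx N(0,\frac{1}{nI})$. At finite $n$ small deviations are expected; for instance, $1/\bar{x}$ has a slight upward bias.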
Mean of the score $\frac{d}{d\theta}\log p(X|\theta)$. This uses $p(x|\theta)\frac{d}{d\theta}\log p(x|\theta)=\frac{d}{d\theta}p(x|\theta)$ and exchanges differentiation with integration:

$\begin{aligned} \mu&=\int p(x|\theta)(\frac{d}{d\theta}\log p(x|\theta))dx\\ &=\int\frac{d}{d\theta}p(x|\theta)dx\\ &=\frac{d}{d\theta}\int p(x|\theta)dx\\ &=\frac{d}{d\theta}1\\ &=0 \end{aligned}$
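The zero-mean property is easy to verify for a concrete model. Assume a Bernoulli model (illustrative). Each term $p(x|\theta)\frac{d}{d\theta}\log p(x|\theta)$ equals $\frac{d}{d\theta}p(x|\theta)$, which is $-1$ for $x=0$ and $+1$ for $x=1$. The two terms therefore cancel for any $\theta$. The helper names are hypothetical.

```python
def bernoulli_score(x, theta):
    """d/dθ log p(x|θ) for p(x|θ) = θ^x (1-θ)^(1-x), x in {0, 1}."""
    return x / theta - (1 - x) / (1 - theta)

def expected_score(theta):
    """μ = Σ_x p(x|θ) d/dθ log p(x|θ); the two terms are -1 (x=0) and +1 (x=1)."""
    return (1 - theta) * bernoulli_score(0, theta) + theta * bernoulli_score(1, theta)

for theta in (0.1, 0.3, 0.5, 0.9):
    print(theta, expected_score(theta))  # zero up to floating-point rounding
```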
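Variance of the right-hand side. The third line uses $\mu=0$ and the identity $\int p(x|\theta)\left(\frac{d}{d\theta}\log p(x|\theta)\right)^2dx=I$. This identity agrees with the second-derivative definition of $I$ under the same regularity conditions: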

$\begin{aligned} \sigma^2&=(\frac{1}{nI})^2n\mathrm{Var}[\frac{d}{d\theta}\log p(X|\theta)]\\ &=(\frac{1}{nI})^2n\int p(x|\theta)(\frac{d}{d\theta}\log p(x|\theta)-\mu)^2dx\\ &=(\frac{1}{nI})^2nI\\ &=\frac{1}{nI} \end{aligned}$
Therefore,