# From KL Divergence to MLE

These notes follow MIT's course 18.650 Statistics for Applications.

The total variation (TV) distance between the true distribution $P_{\theta^*}$ and an estimate $P_{\hat{\theta}}$ is

$TV(P_{\theta^*}, P_{\hat{\theta}}) = \max_{A} |P_{\theta^*}(A) - P_{\hat{\theta}}(A)|$

where $A$ ranges over events. The natural strategy would be to construct an estimator $\widehat{TV}(P_\theta, P_{\theta^*})$ and choose the $\theta$ that minimizes it:

$\min_{\theta} \left[\widehat{TV}(P_\theta, P_{\theta^*})\right]$

But the TV distance does not have the form of an expectation under $P_{\theta^*}$, so there is no obvious way to estimate it from samples. The KL divergence does have that form, so we minimize it instead:
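As a concrete check of the definition, for two distributions on a finite support the maximum over events equals half the $\ell_1$ distance between the probability vectors; for two Bernoullis this reduces to $|p - q|$. A minimal sketch (the parameter values are illustrative):

```python
# TV distance between two distributions on the same finite support:
# TV(P, Q) = max_A |P(A) - Q(A)| = 0.5 * sum_x |P(x) - Q(x)|

def tv_distance(p, q):
    """p, q: lists of probabilities over the same finite support."""
    return 0.5 * sum(abs(pi - qi) for pi, qi in zip(p, q))

# Bernoulli(0.3) vs Bernoulli(0.5): TV = |0.3 - 0.5| = 0.2,
# attained by the event A = {1} (or its complement).
p = [0.7, 0.3]   # P(X=0), P(X=1)
q = [0.5, 0.5]
print(tv_distance(p, q))
```

The maximizing event here puts all the mass difference on one side, which is why the half-$\ell_1$ formula recovers the max over events.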

$\min_{\theta} \left[KL(P_{\theta^*}, P_\theta)\right]$

$KL(P_{\theta^*}, P_\theta) = E_{\theta^*}\left[\log \frac{P_{\theta^*}(x)}{P_\theta(x)}\right] = E_{\theta^*}\left[\log P_{\theta^*}(x)\right] - E_{\theta^*}\left[\log P_\theta(x)\right]$

The first term does not depend on $\theta$, and by the law of large numbers the second expectation can be approximated by an average over samples $x_1, \dots, x_n \sim P_{\theta^*}$. This gives the estimator

$\widehat{KL}(P_{\theta^*}, P_\theta) = \text{Constant} - \frac{1}{n}\sum_{i=1}^{n} \log P_\theta(x_i)$
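To see the sample-average approximation in action, here is a sketch using two Gaussians with equal variance, where the KL divergence has the closed form $(\mu_1 - \mu_2)^2 / (2\sigma^2)$ (the parameter values and sample size are illustrative):

```python
import math
import random

random.seed(0)

mu_star, mu_theta, sigma = 0.0, 1.0, 1.0
n = 200_000

def log_pdf(x, mu, sigma):
    # Log-density of N(mu, sigma^2)
    return -0.5 * math.log(2 * math.pi * sigma**2) - (x - mu)**2 / (2 * sigma**2)

# Samples from the true distribution P_{theta*}
xs = [random.gauss(mu_star, sigma) for _ in range(n)]

# Both expectations replaced by sample averages:
# the first is the "constant" term, the second is the theta-dependent term.
const = sum(log_pdf(x, mu_star, sigma) for x in xs) / n
avg_log_p_theta = sum(log_pdf(x, mu_theta, sigma) for x in xs) / n

kl_hat = const - avg_log_p_theta
kl_true = (mu_star - mu_theta)**2 / (2 * sigma**2)  # closed form, = 0.5 here
print(kl_hat, kl_true)
```

With $n$ this large the Monte Carlo estimate lands close to the closed-form value, which is exactly the approximation the equation above relies on.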

$\arg\min_{\theta} \widehat{KL}(P_{\theta^*}, P_\theta) = \arg\max_{\theta} \sum_{i=1}^{n} \log P_\theta(x_i) = \arg\max_{\theta} \prod_{i=1}^{n} P_\theta(x_i)$

The right-hand side is exactly the likelihood of the sample, so minimizing the estimated KL divergence is the same as maximum likelihood estimation.
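For a Gaussian with known variance, maximizing $\sum_i \log P_\theta(x_i)$ over the mean yields the sample average. A quick sketch checking this by grid search (sample size, grid range, and step are illustrative):

```python
import math
import random

random.seed(1)

sigma = 1.0
xs = [random.gauss(2.0, sigma) for _ in range(2_000)]

def log_likelihood(mu):
    # Sum of Gaussian log-densities over the sample
    return sum(-0.5 * math.log(2 * math.pi * sigma**2)
               - (x - mu)**2 / (2 * sigma**2) for x in xs)

# Grid search for the maximizer of the log-likelihood over mu in [1.0, 3.0]
grid = [i / 500 for i in range(500, 1501)]
mle = max(grid, key=log_likelihood)

sample_mean = sum(xs) / len(xs)
print(mle, sample_mean)
```

The grid maximizer agrees with the sample mean up to the grid resolution, matching the closed-form MLE $\hat{\mu} = \frac{1}{n}\sum_i x_i$.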