# Calculus of Variations

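Start with an ordinary function. If $f$ has a local minimum at $x_0$, then for every sufficiently small $\epsilon$

$$f(x_0)\le f(x_0+\epsilon).$$

Define $\Phi(\epsilon)=f(x_0+\epsilon)$. Then $\Phi$ has a minimum at $\epsilon=0$, so

$$\Phi'(0)=\left.\frac{d\Phi(\epsilon)}{d\epsilon}\right|_{\epsilon=0}=f'(x_0+0)=f'(x_0)=0.$$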

Now replace the function by a functional, i.e. a map from functions to numbers, such as

$$J(y)=\int_{x_1}^{x_2}L\left(y(x),y'(x),x\right)dx,$$

where the endpoint values $y(x_1)$ and $y(x_2)$ are fixed.

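Suppose $f$ minimizes $J$. Then for every function $\eta$ with $\eta(x_1)=\eta(x_2)=0$ (so that $f+\epsilon\eta$ satisfies the same boundary conditions) and every sufficiently small $\epsilon$,

$$J(f)\le J(f+\epsilon\eta),$$

where $\epsilon\eta$ is called the variation of $f$ and is written $\delta f$.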

Define $\Phi(\epsilon)=J(f+\epsilon\eta)$. As before, $\Phi$ has a minimum at $\epsilon=0$, so

$$\Phi'(0)=\left.\frac{d\Phi(\epsilon)}{d\epsilon}\right|_{\epsilon=0}=\left.\frac{dJ(f+\epsilon\eta)}{d\epsilon}\right|_{\epsilon=0}=\int_{x_1}^{x_2}\left.\frac{dL}{d\epsilon}\right|_{\epsilon=0}dx=0.$$
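As a numerical sanity check of $\Phi'(0)=0$, the sketch below assumes a concrete Lagrangian that is not in the text, $L=y'^2+y^2$ on $[0,1]$ with $y(0)=0$, $y(1)=1$, whose minimizer is $f(x)=\sinh x/\sinh 1$. It perturbs $f$ along $\eta(x)=\sin(\pi x)$, which vanishes at both endpoints, and evaluates $\Phi(\epsilon)=J(f+\epsilon\eta)$ with a simple trapezoidal rule. The helper names `trapezoid`, `J` and `Phi` are illustrative.

```python
import numpy as np

# Assumed example (not from the text): L(y, y', x) = y'^2 + y^2 on [0, 1],
# y(0) = 0, y(1) = 1, with extremal f(x) = sinh(x) / sinh(1).
x = np.linspace(0.0, 1.0, 2001)

def trapezoid(values, grid):
    """Composite trapezoidal rule (avoids np.trapz / np.trapezoid version differences)."""
    return float(np.sum((values[1:] + values[:-1]) * np.diff(grid)) / 2.0)

f = np.sinh(x) / np.sinh(1.0)
df = np.cosh(x) / np.sinh(1.0)           # f'(x), computed analytically

# Perturbation direction: eta(0) = eta(1) = 0, so f + eps*eta keeps the boundary values.
eta = np.sin(np.pi * x)
deta = np.pi * np.cos(np.pi * x)         # eta'(x)

def J(y, dy):
    """Discretized functional J(y) = integral of (y'^2 + y^2) dx."""
    return trapezoid(dy**2 + y**2, x)

def Phi(eps):
    """Phi(eps) = J(f + eps * eta)."""
    return J(f + eps * eta, df + eps * deta)

delta = 1e-4
dPhi0 = (Phi(delta) - Phi(-delta)) / (2 * delta)   # central difference for Phi'(0)
print(f"Phi'(0) ~ {dPhi0:.2e}")                     # tiny, up to quadrature error
for eps in (-0.1, -0.01, 0.01, 0.1):
    print(eps, Phi(eps) - Phi(0.0))                 # positive: f is a minimizer along eta
```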

Write $y=f+\epsilon\eta$, so that $y=f$ at $\epsilon=0$. By the chain rule,

$$\frac{dL}{d\epsilon}=\frac{\partial L}{\partial y}\frac{\partial y}{\partial \epsilon}+\frac{\partial L}{\partial y'}\frac{\partial y'}{\partial \epsilon},$$

and since

$$\frac{\partial y}{\partial \epsilon}=\eta,\qquad \frac{\partial y'}{\partial \epsilon}=\eta',$$

we get

$$\frac{dL}{d\epsilon}=\frac{\partial L}{\partial y}\eta+\frac{\partial L}{\partial y'}\eta'.$$

Integrating the second term by parts then gives:
$$\begin{aligned}
\int_{x_1}^{x_2}\left.\frac{dL}{d\epsilon}\right|_{\epsilon=0}dx
&=\int_{x_1}^{x_2}\left.\left(\frac{\partial L}{\partial y}\eta+\frac{\partial L}{\partial y'}\eta'\right)\right|_{\epsilon=0}dx\\
&=\int_{x_1}^{x_2}\eta\left(\frac{\partial L}{\partial f}-\frac{d}{dx}\frac{\partial L}{\partial f'}\right)dx+\left.\frac{\partial L}{\partial f'}\eta\right|_{x_1}^{x_2}\\
&=\int_{x_1}^{x_2}\eta\left(\frac{\partial L}{\partial f}-\frac{d}{dx}\frac{\partial L}{\partial f'}\right)dx=0,
\end{aligned}$$

where $\partial L/\partial f$ and $\partial L/\partial f'$ denote $\partial L/\partial y$ and $\partial L/\partial y'$ evaluated at $y=f$, and the boundary term vanishes because $\eta$ is zero at both endpoints.

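Since this holds for every admissible $\eta$, the factor in parentheses must vanish identically (the fundamental lemma of the calculus of variations). This gives the Euler-Lagrange equation:

$$\frac{\partial L}{\partial f}-\frac{d}{dx}\frac{\partial L}{\partial f'}=0$$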
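The Euler-Lagrange equation can be checked against a brute-force minimization. This sketch assumes the illustrative Lagrangian $L=y'^2+y^2$ on $[0,1]$ with $y(0)=0$, $y(1)=1$ (not from the text), for which the equation reads $2y-2y''=0$ and the solution is $y=\sinh x/\sinh 1$. It discretizes $J$ on a uniform grid, sets the gradient with respect to every interior value to zero (a tridiagonal linear system, solved densely here for simplicity), and compares the result with the analytic solution.

```python
import numpy as np

# Assumed example: L = y'^2 + y^2 on [0, 1], y(0) = 0, y(1) = 1.
# Euler-Lagrange: dL/dy - d/dx dL/dy' = 2y - 2y'' = 0, i.e. y'' = y.
N = 200                       # number of grid intervals
h = 1.0 / N
x = np.linspace(0.0, 1.0, N + 1)
y_left, y_right = 0.0, 1.0    # boundary conditions

# Discretized functional:
#   J_h(y) = sum_k ((y[k+1] - y[k])**2 / h) + sum_{interior k} h * y[k]**2
# Setting dJ_h/dy[k] = 0 at each interior node gives
#   (2*y[k] - y[k-1] - y[k+1]) / h**2 + y[k] = 0   (finite-difference form of -y'' + y = 0)
n = N - 1
A = (np.diag(np.full(n, 2.0 / h**2 + 1.0))
     + np.diag(np.full(n - 1, -1.0 / h**2), 1)
     + np.diag(np.full(n - 1, -1.0 / h**2), -1))
b = np.zeros(n)
b[0] += y_left / h**2         # known boundary values move to the right-hand side
b[-1] += y_right / h**2

y = np.empty(N + 1)
y[0], y[-1] = y_left, y_right
y[1:-1] = np.linalg.solve(A, b)

exact = np.sinh(x) / np.sinh(1.0)
print("max |y_h - y_exact| =", np.abs(y - exact).max())   # small, shrinks like h**2
```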

# Mean Field Theory

In physics and probability theory, mean field theory (MFT, also known as self-consistent field theory) studies the behavior of large and complex stochastic models by studying a simpler model. Such models consider a large number of small individual components which interact with each other. The effect of all the other individuals on any given individual is approximated by a single averaged effect, thus reducing a many-body problem to a one-body problem.

By the chain rule of probability, any joint distribution can be written exactly as

$$P(x_1,x_2,x_3,\dots,x_n)=P(x_1)P(x_2\mid x_1)P(x_3\mid x_2,x_1)\cdots P(x_n\mid x_{n-1},x_{n-2},\dots,x_1).$$

The mean-field approximation replaces it with a fully factorized distribution, in which all variables are independent:

$$Q(x_1,x_2,x_3,\dots,x_n)=Q(x_1)Q(x_2)Q(x_3)\cdots Q(x_n).$$
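A small sketch of the difference, using a hypothetical 2x2 joint table of two correlated binary variables: the chain-rule product reproduces $P$ exactly, while a product of independent factors cannot. Here the factors are simply set to the true marginals for illustration; variational Bayes instead chooses them by minimizing $KL(Q\|P)$.

```python
import numpy as np

# Hypothetical joint distribution of two binary variables (rows: x1, columns: x2),
# chosen so that x1 and x2 are strongly correlated.
P = np.array([[0.45, 0.05],
              [0.05, 0.45]])

P_x1 = P.sum(axis=1)                  # marginal P(x1)
P_x2_given_x1 = P / P_x1[:, None]     # conditional P(x2 | x1)
P_x2 = P.sum(axis=0)                  # marginal P(x2)

chain_rule = P_x1[:, None] * P_x2_given_x1   # P(x1) P(x2 | x1): exact
mean_field = np.outer(P_x1, P_x2)            # Q(x1) Q(x2), factors set to the marginals

print(np.allclose(chain_rule, P))    # True
print(mean_field)                    # every entry 0.25: the correlation is lost
```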

# Variational Bayesian Inference

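Variational Bayes approximates the posterior $P(Z\mid X)$ of the latent variables $Z$ given the data $X$, which is usually hard to compute, by a simpler distribution $Q(Z)$ chosen to minimize the KL divergence

$$KL(Q\|P)=\int Q(Z)\log\frac{Q(Z)}{P(Z\mid X)}dZ.$$

Expanding it: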

$$\begin{aligned}
KL(Q\|P)&=\int Q(Z)\log\frac{Q(Z)}{P(Z\mid X)}dZ\\
&=-\int Q(Z)\log\frac{P(Z\mid X)}{Q(Z)}dZ\\
&=-\int Q(Z)\log\frac{P(Z,X)}{Q(Z)P(X)}dZ\\
&=\int Q(Z)\left[\log Q(Z)+\log P(X)\right]dZ-\int Q(Z)\log P(Z,X)dZ\\
&=\log P(X)+\int Q(Z)\log Q(Z)dZ-\int Q(Z)\log P(Z,X)dZ
\end{aligned}$$

Define

$$L(Q)=\int Q(Z)\log P(Z,X)dZ-\int Q(Z)\log Q(Z)dZ,$$

called the evidence lower bound (ELBO). The expansion above then becomes:

$$\log P(X)=KL(Q\|P)+L(Q)$$

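Because $KL(Q\|P)\ge 0$, it follows that

$$\log P(X)\ge L(Q),$$

so $L(Q)$ is a lower bound on the log evidence. Since $\log P(X)$ does not depend on $Q$, minimizing $KL(Q\|P)$ is equivalent to maximizing $L(Q)$.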
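The identity $\log P(X)=KL(Q\|P)+L(Q)$ is easy to verify numerically for a discrete latent variable, where the integrals become sums. The joint values and the choice of $Q$ below are made-up numbers.

```python
import numpy as np

# Hypothetical discrete model: latent Z takes 3 values. For the observed X we
# only need the joint values P(Z = z, X) for each z.
p_joint = np.array([0.10, 0.20, 0.05])   # P(Z=z, X), z = 0, 1, 2 (assumed numbers)
log_evidence = np.log(p_joint.sum())     # log P(X) = log sum_z P(z, X)
posterior = p_joint / p_joint.sum()      # P(Z | X)

q = np.array([0.5, 0.3, 0.2])            # an arbitrary variational distribution Q(Z)

kl = np.sum(q * np.log(q / posterior))                       # KL(Q || P(Z|X))
elbo = np.sum(q * np.log(p_joint)) - np.sum(q * np.log(q))   # L(Q)

print(log_evidence, kl + elbo)   # equal up to floating-point error
print(elbo <= log_evidence)      # True, because KL >= 0

# Choosing Q equal to the posterior makes the KL term zero and closes the gap.
elbo_at_posterior = (np.sum(posterior * np.log(p_joint))
                     - np.sum(posterior * np.log(posterior)))
print(elbo_at_posterior, log_evidence)
```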

To maximize

$$L(Q)=\int Q(Z)\log P(Z,X)dZ-\int Q(Z)\log Q(Z)dZ,$$

restrict $Q$ to the mean-field family

$$Q(Z)=\prod_i Q(z_i)$$

and evaluate the two terms one at a time. The second term splits into a sum over the factors:

$$\begin{aligned}
\int Q(Z)\log Q(Z)dZ&=\int\prod_i Q(z_i)\log\prod_j Q(z_j)dZ\\
&=\int\prod_i Q(z_i)\sum_j\log Q(z_j)dZ\\
&=\sum_j\int\prod_i Q(z_i)\log Q(z_j)dZ\\
&=\sum_j\int Q(z_j)\log Q(z_j)dz_j\prod_{i\ne j}\int Q(z_i)dz_i\\
&=\sum_j\int Q(z_j)\log Q(z_j)dz_j,
\end{aligned}$$

since each factor $Q(z_i)$ integrates to 1.
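For discrete factors, where integrals become sums, the same decomposition can be checked directly; the two factor tables below are arbitrary.

```python
import numpy as np

# Two hypothetical discrete factors; their product is a mean-field Q(z1, z2).
q1 = np.array([0.2, 0.8])
q2 = np.array([0.1, 0.3, 0.6])
Q = np.outer(q1, q2)                      # Q(z1, z2) = Q(z1) Q(z2)

lhs = np.sum(Q * np.log(Q))               # sum over Z of Q(Z) log Q(Z)
rhs = np.sum(q1 * np.log(q1)) + np.sum(q2 * np.log(q2))   # sum_j sum_{z_j} Q(z_j) log Q(z_j)
print(np.isclose(lhs, rhs))               # True
```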

For the first term, fix one index $j$ and integrate out all the other factors:

$$\begin{aligned}
\int Q(Z)\log P(Z,X)dZ&=\int\prod_i Q(z_i)\log P(Z,X)dZ\\
&=\int Q(z_j)\left(\int\log P(Z,X)\prod_{i\ne j}Q(z_i)\,dz_i\right)dz_j\\
&=\int Q(z_j)\,E_{i\ne j}\left[\log P(Z,X)\right]dz_j\\
&=\int Q(z_j)\log\left\{\exp\left(E_{i\ne j}\left[\log P(Z,X)\right]\right)\right\}dz_j\\
&=\int Q(z_j)\log\frac{\exp\left(E_{i\ne j}\left[\log P(Z,X)\right]\right)}{\int\exp\left(E_{i\ne j}\left[\log P(Z,X)\right]\right)dz_j}dz_j-C\\
&=\int Q(z_j)\log Q^*(z_j)dz_j-C,
\end{aligned}$$

where $E_{i\ne j}$ denotes the expectation with respect to $\prod_{i\ne j}Q(z_i)$,

$$Q^*(z_j)=\frac{\exp\left(E_{i\ne j}\left[\log P(Z,X)\right]\right)}{\int\exp\left(E_{i\ne j}\left[\log P(Z,X)\right]\right)dz_j}$$

is a normalized distribution over $z_j$, and $C=-\log\int\exp\left(E_{i\ne j}\left[\log P(Z,X)\right]\right)dz_j$ does not depend on $Q(z_j)$.

Combining the two terms:

$$\begin{aligned}
L(Q)&=\int Q(z_j)\log Q^*(z_j)dz_j-\sum_i\int Q(z_i)\log Q(z_i)dz_i-C\\
&=\int Q(z_j)\log\frac{Q^*(z_j)}{Q(z_j)}dz_j-\sum_{i\ne j}\int Q(z_i)\log Q(z_i)dz_i-C\\
&=-KL\left(Q(z_j)\|Q^*(z_j)\right)+\sum_{i\ne j}H\left(Q(z_i)\right)-C,
\end{aligned}$$

where $H(Q(z_i))=-\int Q(z_i)\log Q(z_i)dz_i$ is the entropy of the factor $Q(z_i)$.

With all factors except $Q(z_j)$ held fixed, the entropy terms $\sum_{i\ne j}H(Q(z_i))$ and the constant $C$ do not depend on $Q(z_j)$. Since

$$KL\left(Q(z_j)\|Q^*(z_j)\right)\ge 0,$$

with equality if and only if $Q(z_j)=Q^*(z_j)$, maximizing $L(Q)$ with respect to $Q(z_j)$ only requires

$$KL\left(Q(z_j)\|Q^*(z_j)\right)=0,\quad\text{i.e.}\quad Q(z_j)=Q^*(z_j).$$
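The following sketch checks this single-factor update on a hypothetical discrete model with two latent variables, whose log joint $\log P(z_1,z_2,X)$ is a random 3x4 table (sums replace integrals). With $Q(z_2)$ held fixed, it computes $Q^*(z_1)\propto\exp(E_{Q(z_2)}[\log P(Z,X)])$ and confirms that $L$ at $Q^*(z_1)$ exceeds $L$ at other choices of $Q(z_1)$ by exactly $KL(Q(z_1)\|Q^*(z_1))$. The function names `elbo` and `update_q1` are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical unnormalized joint P(z1, z2, X) for the observed X: a 3 x 4 table.
log_p = np.log(rng.uniform(0.1, 1.0, size=(3, 4)))

def elbo(q1, q2):
    """L(Q) = E_Q[log P(Z, X)] - E_Q[log Q(Z)] with Q(Z) = Q(z1) Q(z2)."""
    Q = np.outer(q1, q2)
    return np.sum(Q * log_p) - np.sum(Q * np.log(Q))

def update_q1(q2):
    """Q*(z1) proportional to exp(E_{Q(z2)}[log P(Z, X)])."""
    logits = log_p @ q2                  # E_{Q(z2)}[log P(z1, z2, X)] for each z1
    w = np.exp(logits - logits.max())    # subtract the max for numerical stability
    return w / w.sum()                   # normalize into a distribution over z1

q2 = np.array([0.1, 0.2, 0.3, 0.4])      # hold Q(z2) fixed
q1_star = update_q1(q2)

checks = []
for _ in range(5):
    q1 = rng.dirichlet(np.ones(3))       # some other Q(z1)
    kl = np.sum(q1 * np.log(q1 / q1_star))
    gap = elbo(q1_star, q2) - elbo(q1, q2)
    checks.append(np.isclose(gap, kl))   # L(Q*) - L(Q) = KL(Q(z1) || Q*(z1)) >= 0
print(checks)
```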

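The same optimum can also be obtained directly with the calculus of variations, using a Lagrange multiplier $\lambda$ to enforce normalization of $Q(z_j)$ and keeping only the terms of $L(Q)$ that depend on $Q(z_j)$. Because the integrand contains no derivative of $Q(z_j)$, the Euler-Lagrange equation reduces to setting the partial derivative of the integrand with respect to $Q(z_j)$ to zero:

$$\frac{\delta}{\delta Q(z_j)}\left\{\int Q(z_j)\log Q^*(z_j)dz_j-\int Q(z_j)\log Q(z_j)dz_j+\lambda\left(\int Q(z_j)dz_j-1\right)\right\}=\log Q^*(z_j)-\log Q(z_j)-1+\lambda=0.$$

Hence $Q(z_j)=e^{\lambda-1}Q^*(z_j)$, and normalization forces $e^{\lambda-1}=1$, so again $Q(z_j)=Q^*(z_j)$.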

This gives a coordinate-ascent algorithm:

- Repeat until convergence:
  - For each factor $Q(z_j)$:
    - set $Q(z_j)=Q^*(z_j)$, computed from the current values of the other factors.
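As an illustration of the loop, this sketch assumes a target that is a correlated 2-D Gaussian $P(z)=N(z\mid\mu,\Lambda^{-1})$, a standard textbook example (PRML Section 10.1.2) that is not part of the text above. For a Gaussian, the update $Q^*(z_j)\propto\exp(E_{i\ne j}[\log P])$ yields a Gaussian factor with variance $\Lambda_{jj}^{-1}$ and mean $m_j=\mu_j-\Lambda_{jj}^{-1}\Lambda_{jk}(m_k-\mu_k)$, where $k$ is the other index, so only the means need to be iterated. The numbers are made up.

```python
import numpy as np

# Assumed target: a correlated 2-D Gaussian p(z) = N(mu, inv(Lam)).
mu = np.array([1.0, -1.0])
Sigma = np.array([[1.0, 0.9],
                  [0.9, 1.0]])
Lam = np.linalg.inv(Sigma)                # precision matrix

# Mean-field Q(z1) Q(z2): each optimal factor is N(m_j, 1 / Lam[j, j]).
m = np.array([5.0, 5.0])                  # arbitrary initial means
for _ in range(200):                      # "repeat until convergence"
    m_old = m.copy()
    m[0] = mu[0] - Lam[0, 1] / Lam[0, 0] * (m[1] - mu[1])   # update Q(z1)
    m[1] = mu[1] - Lam[1, 0] / Lam[1, 1] * (m[0] - mu[0])   # update Q(z2) with the new m[0]
    if np.max(np.abs(m - m_old)) < 1e-10:
        break

q_var = 1.0 / np.diag(Lam)                # variances of the factors Q(z1), Q(z2)
print("means:", m)                        # converge to mu
print("Q variances:", q_var)              # about 0.19 each
print("true marginal variances:", np.diag(Sigma))
```

The means converge to $\mu$, but the factor variances $1/\Lambda_{jj}\approx 0.19$ are well below the true marginal variances of 1: the factorized family cannot represent the correlation, and minimizing $KL(Q\|P)$ tends to make $Q$ too compact.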

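In practice, applying variational Bayes to a model involves the following steps:

1. If you want a fully Bayesian model, choose conjugate prior distributions for each of the model's parameters.
2. Write down the model's joint distribution $P(Z,X)$.
3. Based on the joint distribution, choose the form of the variational distribution $Q(Z)$.
4. For each variational factor $Q(z_j)$, take the expectation of $\log P(Z,X)$ with respect to all the other factors (every variable except $z_j$), exponentiate, and normalize the result into a probability distribution.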
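To make the four steps concrete, here is a sketch for an assumed model not discussed above: univariate Gaussian data with unknown mean $\mu$ and precision $\tau$ (the example of PRML Section 10.1.3). Step 1 uses the conjugate Normal-Gamma prior $\mu\mid\tau\sim N(\mu_0,(\lambda_0\tau)^{-1})$, $\tau\sim\mathrm{Gam}(a_0,b_0)$ in the rate parameterization; steps 3 and 4 take $Q(\mu,\tau)=Q(\mu)Q(\tau)$, which yields a Gaussian $Q(\mu)$ and a Gamma $Q(\tau)$ whose parameters are updated in turn. The data and hyperparameter values are made up.

```python
import numpy as np

# Assumed model: x_n ~ N(mu, 1/tau), mu | tau ~ N(mu0, 1/(lam0*tau)), tau ~ Gamma(a0, rate=b0).
rng = np.random.default_rng(1)
x = rng.normal(loc=2.0, scale=0.5, size=500)   # synthetic data (true tau = 4)
N, xbar = x.size, x.mean()
mu0, lam0, a0, b0 = 0.0, 1.0, 1.0, 1.0         # hypothetical prior hyperparameters

# Step 3: Q(mu) = N(mu_N, 1/lam_N), Q(tau) = Gamma(a_N, b_N).
E_tau = a0 / b0                                # initial guess for E[tau]
a_N = a0 + (N + 1) / 2.0                       # does not change between iterations
for _ in range(100):
    # Step 4a: Q*(mu) proportional to exp(E_tau[log P(X, mu, tau)])
    mu_N = (lam0 * mu0 + N * xbar) / (lam0 + N)
    lam_N = (lam0 + N) * E_tau
    # Step 4b: Q*(tau) proportional to exp(E_mu[log P(X, mu, tau)]),
    # using E[(x - mu)^2] = (x - mu_N)^2 + 1/lam_N under Q(mu).
    E_sq = np.sum((x - mu_N) ** 2) + N / lam_N
    E_prior = (mu_N - mu0) ** 2 + 1.0 / lam_N
    b_N = b0 + 0.5 * (E_sq + lam0 * E_prior)
    new_E_tau = a_N / b_N                      # mean of Gamma(a_N, rate=b_N)
    converged = abs(new_E_tau - E_tau) < 1e-12
    E_tau = new_E_tau
    if converged:
        break

print("E[mu] =", mu_N, " E[tau] =", E_tau)
```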
