# Calculus of Variations

For an ordinary function $f$, a point $x_0$ is a local minimum when, for any sufficiently small perturbation $\epsilon$,

$$f(x_0) \le f(x_0 + \epsilon)$$

Letting $\Phi(\epsilon) = f(x_0 + \epsilon)$, the stationarity condition at the minimum is

$$\Phi'(0) = \left.\frac{d\Phi(\epsilon)}{d\epsilon}\right|_{\epsilon=0} = f'(x_0 + 0) = f'(x_0) = 0$$

The calculus of variations extends this idea from functions to functionals, which map a whole function $y(x)$ to a number:

$$J(y) = \int_{x_1}^{x_2} L\left(y(x), y'(x), x\right) dx$$

Analogously, a function $f$ minimizes $J$ when

$$J(f) \le J(f + \epsilon\eta)$$

where $\epsilon\eta$ is called the variation of the function $f$, written $\delta f$.

Let

$$\Phi(\epsilon) = J(f + \epsilon\eta)$$

Then, just as before:

$$\Phi'(0) = \left.\frac{d\Phi(\epsilon)}{d\epsilon}\right|_{\epsilon=0} = \left.\frac{dJ(f+\epsilon\eta)}{d\epsilon}\right|_{\epsilon=0} = \int_{x_1}^{x_2} \left.\frac{dL}{d\epsilon}\right|_{\epsilon=0} dx = 0$$

By the chain rule,

$$\frac{dL}{d\epsilon} = \frac{\partial L}{\partial y}\frac{\partial y}{\partial \epsilon} + \frac{\partial L}{\partial y'}\frac{\partial y'}{\partial \epsilon}$$

Since $y = f + \epsilon\eta$,

$$\frac{\partial y}{\partial \epsilon} = \eta, \qquad \frac{\partial y'}{\partial \epsilon} = \eta'$$

so that

$$\frac{dL}{d\epsilon} = \frac{\partial L}{\partial y}\eta + \frac{\partial L}{\partial y'}\eta'$$

Integrating by parts, and noting that $y = f$ at $\epsilon = 0$ and that the boundary term vanishes because $\eta(x_1) = \eta(x_2) = 0$:

$$\int_{x_1}^{x_2} \left.\frac{dL}{d\epsilon}\right|_{\epsilon=0} dx = \int_{x_1}^{x_2} \left.\left\{\frac{\partial L}{\partial y}\eta + \frac{\partial L}{\partial y'}\eta'\right\}\right|_{\epsilon=0} dx = \int_{x_1}^{x_2} \eta\left\{\frac{\partial L}{\partial f} - \frac{d}{dx}\frac{\partial L}{\partial f'}\right\} dx + \left.\frac{\partial L}{\partial f'}\eta\right|_{x_1}^{x_2} = \int_{x_1}^{x_2} \eta\left\{\frac{\partial L}{\partial f} - \frac{d}{dx}\frac{\partial L}{\partial f'}\right\} dx = 0$$

Since $\eta$ is arbitrary, the integrand itself must vanish, which gives the Euler–Lagrange equation:

$$\frac{\partial L}{\partial f} - \frac{d}{dx}\frac{\partial L}{\partial f'} = 0$$
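As a quick sanity check, the Euler–Lagrange equation can be formed symbolically. Below is a minimal sketch using SymPy; the arc-length Lagrangian $L = \sqrt{1 + f'^2}$ is an illustrative choice (not from the text above) whose extremals are straight lines, the shortest curves between two points.

```python
import sympy as sp

x = sp.symbols('x')
y = sp.Function('y')
yp = y(x).diff(x)

# Arc-length Lagrangian: J(y) = ∫ sqrt(1 + y'^2) dx measures curve length
L = sp.sqrt(1 + yp**2)

# Euler-Lagrange expression: ∂L/∂y - d/dx (∂L/∂y')
el = sp.simplify(sp.diff(L, y(x)) - sp.diff(sp.diff(L, yp), x))

# Setting el = 0 and solving for y'' should force y'' = 0, i.e. a straight line
print(sp.solve(el, y(x).diff(x, 2)))
```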

# Mean Field Theory

In physics and probability theory, mean field theory (MFT, also known as self-consistent field theory) studies the behavior of large and complex stochastic models by studying a simpler model. Such models consider a large number of small individual components which interact with each other. The effect of all the other individuals on any given individual is approximated by a single averaged effect, thus reducing a many-body problem to a one-body problem.

By the chain rule, any joint distribution factorizes exactly as

$$P(x_1, x_2, x_3, \ldots, x_n) = P(x_1)\,P(x_2|x_1)\,P(x_3|x_2,x_1)\cdots P(x_n|x_{n-1}, x_{n-2}, \ldots, x_1)$$

The mean-field approximation replaces this with a fully factorized distribution:

$$Q(x_1, x_2, x_3, \ldots, x_n) = Q(x_1)\,Q(x_2)\,Q(x_3)\cdots Q(x_n)$$
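The cost of the factorization is that correlations between variables are discarded. A toy numeric illustration (the joint table and the choice of each factor as the exact marginal are assumptions made here for the example):

```python
import numpy as np

# Toy joint P(x1, x2) over two binary variables (made-up numbers)
P = np.array([[0.30, 0.10],
              [0.20, 0.40]])

# Fully factorized approximation Q(x1, x2) = Q(x1) Q(x2),
# choosing each factor to be the corresponding marginal of P
Q1 = P.sum(axis=1)   # marginal over x1
Q2 = P.sum(axis=0)   # marginal over x2
Q = np.outer(Q1, Q2)

# The discarded correlation shows up as a nonzero KL(P || Q),
# which for this choice of Q equals the mutual information of x1 and x2
kl = np.sum(P * np.log(P / Q))
print(kl)
```

If the variables were actually independent, `kl` would be zero; here it is strictly positive.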

# Variational Bayesian Inference

To approximate an intractable posterior $P(Z|X)$ with a tractable distribution $Q(Z)$, we measure the discrepancy between them with the KL divergence:

$$KL(Q\|P) = \int Q(Z) \log\frac{Q(Z)}{P(Z|X)}\,dZ$$

$$\begin{aligned}
KL(Q\|P) &= \int Q(Z)\log\frac{Q(Z)}{P(Z|X)}\,dZ \\
&= -\int Q(Z)\log\frac{P(Z|X)}{Q(Z)}\,dZ \\
&= -\int Q(Z)\log\frac{P(Z,X)}{Q(Z)\,P(X)}\,dZ \\
&= \int Q(Z)\left[\log Q(Z) + \log P(X)\right]dZ - \int Q(Z)\log P(Z,X)\,dZ \\
&= \log P(X) + \int Q(Z)\log Q(Z)\,dZ - \int Q(Z)\log P(Z,X)\,dZ
\end{aligned}$$

Defining the last two terms, with the sign flipped, as the lower bound

$$L(Q) = \int Q(Z)\log P(Z,X)\,dZ - \int Q(Z)\log Q(Z)\,dZ$$

we can rearrange the decomposition as

$$\log P(X) = KL(Q\|P) + L(Q)$$
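This decomposition can be checked numerically on a tiny discrete model. The joint table and the variational distribution below are made-up numbers chosen only to exercise the identity:

```python
import numpy as np

# Toy joint P(Z, X = x0) over a binary latent Z, with X fixed at an observed value
P_zx = np.array([0.12, 0.28])   # P(Z=0, x0), P(Z=1, x0)
P_x = P_zx.sum()                # evidence P(x0)
P_post = P_zx / P_x             # posterior P(Z | x0)

# An arbitrary variational distribution Q(Z)
Q = np.array([0.7, 0.3])

kl = np.sum(Q * np.log(Q / P_post))                      # KL(Q || P(Z|X))
elbo = np.sum(Q * np.log(P_zx)) - np.sum(Q * np.log(Q))  # L(Q)

# The two sides of log P(X) = KL(Q||P) + L(Q) agree exactly
print(np.log(P_x), kl + elbo)
```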

Since $\log P(X)$ is a constant with respect to $Q$, minimizing $KL(Q\|P)$ is equivalent to

$$\max\ L(Q)$$

and because $KL(Q\|P) \ge 0$, $L(Q)$ is a lower bound on the log evidence:

$$\log P(X) \ge L(Q)$$

We therefore maximize

$$L(Q) = \int Q(Z)\log P(Z,X)\,dZ - \int Q(Z)\log Q(Z)\,dZ$$

under the mean-field factorization

$$Q(Z) = \prod_i Q(z_i)$$

The entropy term then splits into a sum over the factors (the last step uses $\int \prod_{i:i\ne j} Q(z_i)\,dz_i = 1$):

$$\begin{aligned}
\int Q(Z)\log Q(Z)\,dZ &= \int \prod_i Q(z_i)\,\log\prod_j Q(z_j)\,dZ \\
&= \int \prod_i Q(z_i)\sum_j \log Q(z_j)\,dZ \\
&= \sum_j \int \prod_i Q(z_i)\,\log Q(z_j)\,dZ \\
&= \sum_j \int Q(z_j)\log Q(z_j)\,dz_j \int \prod_{i:i\ne j} Q(z_i)\,dz_i \\
&= \sum_j \int Q(z_j)\log Q(z_j)\,dz_j
\end{aligned}$$
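For discrete factors the same decomposition can be verified directly; the two factor distributions below are made-up numbers for illustration:

```python
import numpy as np

# Two independent discrete factors (made-up numbers)
Q1 = np.array([0.6, 0.4])
Q2 = np.array([0.2, 0.5, 0.3])

# Factorized joint Q(z1, z2) = Q(z1) Q(z2)
Q = np.outer(Q1, Q2)

# Left-hand side: sum over the joint
lhs = np.sum(Q * np.log(Q))

# Right-hand side: sum of the per-factor terms
rhs = np.sum(Q1 * np.log(Q1)) + np.sum(Q2 * np.log(Q2))

print(lhs, rhs)
```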

Isolating a single factor $Q(z_j)$ in the first term:

$$\begin{aligned}
\int Q(Z)\log P(Z,X)\,dZ &= \int \prod_i Q(z_i)\,\log P(Z,X)\,dZ \\
&= \int Q(z_j)\left(\int \prod_{i:i\ne j} Q(z_i)\,\log P(Z,X)\,dz_i\right) dz_j \\
&= \int Q(z_j)\,E_{i\ne j}[\log P(Z,X)]\,dz_j \\
&= \int Q(z_j)\log\left\{\exp\left(E_{i\ne j}[\log P(Z,X)]\right)\right\} dz_j \\
&= \int Q(z_j)\log\frac{\exp\left(E_{i\ne j}[\log P(Z,X)]\right)}{\int \exp\left(E_{i\ne j}[\log P(Z,X)]\right) dz_j}\,dz_j - C \\
&= \int Q(z_j)\log Q^*(z_j)\,dz_j - C
\end{aligned}$$

where $C$ is a constant that does not depend on $Q(z_j)$ (it absorbs the log normalizer of $Q^*$).

Combining the two results:

$$\begin{aligned}
L(Q) &= \int Q(z_j)\log Q^*(z_j)\,dz_j - \sum_j \int Q(z_j)\log Q(z_j)\,dz_j - C \\
&= \int Q(z_j)\log\frac{Q^*(z_j)}{Q(z_j)}\,dz_j - \sum_{i:i\ne j}\int Q(z_i)\log Q(z_i)\,dz_i - C \\
&= -KL\left(Q(z_j)\,\|\,Q^*(z_j)\right) + \sum_{i:i\ne j} H\left(Q(z_i)\right) - C
\end{aligned}$$

Since

$$KL\left(Q(z_j)\,\|\,Q^*(z_j)\right) \ge 0$$

and the entropy terms $H(Q(z_i))$, $i \ne j$, do not depend on $Q(z_j)$, maximizing $L(Q(Z))$ with respect to $Q(z_j)$ amounts to setting

$$KL\left(Q(z_j)\,\|\,Q^*(z_j)\right) = 0$$

that is,

$$Q(z_j) = Q^*(z_j) = \frac{\exp\left(E_{i\ne j}[\log P(Z,X)]\right)}{\text{normalization constant}}$$

The same optimum can also be derived directly with the calculus of variations, combined with a Lagrange multiplier enforcing normalization:

$$\frac{\delta}{\delta Q(z_j)}\left\{\int Q(z_j)\log Q^*(z_j)\,dz_j - \int Q(z_j)\log Q(z_j)\,dz_j + \lambda_j\left(\int Q(z_j)\,dz_j - 1\right)\right\} = 0$$

This yields the coordinate-ascent algorithm:

- Repeat until convergence:
  - For each factor $Q(z_j)$:
    - Set $Q(z_j) = Q^*(z_j)$
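The loop above can be sketched for a toy model with two binary latent variables and a fixed observation. The joint table is made up for illustration; each update applies $Q^*(z_j) \propto \exp(E_{i\ne j}[\log P(Z,X)])$, and the lower bound $L(Q)$ should never decrease:

```python
import numpy as np

# Made-up joint P(z1, z2, x0) over two binary latents, with x observed/fixed
logP = np.log(np.array([[0.10, 0.25],
                        [0.30, 0.35]]))

# Initialize the variational factors Q(z1), Q(z2) uniformly
Q1 = np.array([0.5, 0.5])
Q2 = np.array([0.5, 0.5])

def elbo(Q1, Q2):
    # L(Q) = E_Q[log P(Z, X)] - E_Q[log Q(Z)] under Q = Q1 ⊗ Q2
    Q = np.outer(Q1, Q2)
    return np.sum(Q * logP) - np.sum(Q * np.log(Q))

history = [elbo(Q1, Q2)]
for _ in range(20):
    # Q*(z1) ∝ exp(E_{Q(z2)}[log P]): expectation over z2, then normalize
    t = np.exp(logP @ Q2)
    Q1 = t / t.sum()
    # Q*(z2) ∝ exp(E_{Q(z1)}[log P])
    t = np.exp(Q1 @ logP)
    Q2 = t / t.sum()
    history.append(elbo(Q1, Q2))

print(history[0], history[-1])
```

Each coordinate update maximizes $L(Q)$ with respect to one factor while holding the others fixed, which is why the bound increases monotonically.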

To summarize, the general procedure is:

1. If you want a full Bayesian model, choose a conjugate prior distribution for each model parameter.
2. Write down the model's joint distribution $P(Z,X)$.
3. From the joint distribution, determine the form of the variational distribution $Q(Z)$.
4. For each variational factor $Q(z_j)$, compute the expectation of $\log P(Z,X)$ over all variables except $z_j$, exponentiate it, and normalize the result into a probability distribution.

