Variational Inference Review
Idea: posit a family of densities and find the member of the family that is closest (in KL divergence) to the target density.
For statisticians: VI provides a method to approximate complicated densities. Compared with MCMC, it is easier to compute and scales better to big data.
Problem of Approximate Inference
$x$: observations
$z$: latent variables
We want to estimate the posterior
$$p(z|x) = \frac{p(z,x)}{p(x)}$$
Pick a family of distributions $\mathcal{Q}$, and approximate $p(z|x)$ by
$$q^*(z) = \argmin_{q(z) \in \mathcal{Q}}\ KL(q(z)||p(z|x))$$
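As a concrete illustration, here is a minimal sketch in Python (a hypothetical toy example, not from the original notes): a conjugate Gaussian model whose exact posterior is known in closed form, with $\mathcal{Q}$ taken to be the family of Gaussians, so the KL objective above can be minimized directly and compared with the exact answer. In real problems $p(z|x)$ is not available, which is why the ELBO of the next section is optimized instead.

```python
import numpy as np
from scipy.optimize import minimize

# Toy conjugate model so the exact posterior is available for comparison:
#   prior       z ~ N(0, 1)
#   likelihood  x_i | z ~ N(z, sigma^2),  i = 1..n
rng = np.random.default_rng(0)
sigma, n = 2.0, 20
x = rng.normal(loc=1.5, scale=sigma, size=n)

post_var = 1.0 / (1.0 + n / sigma**2)       # exact posterior variance
post_mean = post_var * x.sum() / sigma**2   # exact posterior mean

# Variational family Q: Gaussians q(z) = N(m, s^2), parameterized by (m, log s).
# The KL between two univariate Gaussians is closed form, so in this toy case
# we can minimize KL(q(z) || p(z|x)) directly.
def kl_q_posterior(params):
    m, log_s = params
    s2 = np.exp(2 * log_s)
    return (np.log(np.sqrt(post_var) / np.sqrt(s2))
            + (s2 + (m - post_mean) ** 2) / (2 * post_var) - 0.5)

res = minimize(kl_q_posterior, x0=np.array([0.0, 0.0]))
m_hat, s_hat = res.x[0], np.exp(res.x[1])
print("q* mean / sd:        ", m_hat, s_hat)
print("exact posterior mean / sd:", post_mean, np.sqrt(post_var))
```

Because the family $\mathcal{Q}$ contains the exact posterior here, the optimizer recovers it; with a restricted family, $q^*$ would only be the closest member in KL divergence.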
Variational Objective Function
$$\begin{aligned}
KL(q(z)||p(z|x)) &= E_q[\log q(z)] - E_q[\log p(z|x)] \\
&= E_q[\log q(z)] - E_q[\log p(z,x)] + \log p(x) \\
&= -ELBO(q) + const.
\end{aligned}$$
ELBO stands for the evidence lower bound. Since $\log p(x)$ does not depend on $q$, minimizing the KL divergence is equivalent to maximizing the ELBO:
$$\begin{aligned}
ELBO(q) &= E_q[\log p(z,x)] - E_q[\log q(z)] \\
&= E_q[\log p(z)] + E_q[\log p(x|z)] - E_q[\log q(z)] \\
&= \underbrace{E_q[\log p(x|z)]}_{\text{expected likelihood}} - \underbrace{KL(q(z)||p(z))}_{\text{penalizes deviation from prior}}
\end{aligned}$$
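The derivation above also gives the identity $\log p(x) = ELBO(q) + KL(q(z)||p(z|x))$ for any $q$. The sketch below (again an illustrative toy conjugate Gaussian model; all names are assumptions for this example) estimates the ELBO by Monte Carlo using the expected-likelihood-minus-KL form and checks it, up to Monte Carlo error, against the closed-form $\log p(x)$ and $KL(q(z)||p(z|x))$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Same toy conjugate model as above: prior z ~ N(0,1), x_i | z ~ N(z, sigma^2),
# so log p(x), the exact posterior, and KL(q || p(z|x)) are all closed form.
sigma, n = 2.0, 20
x = rng.normal(loc=1.5, scale=sigma, size=n)

post_var = 1.0 / (1.0 + n / sigma**2)
post_mean = post_var * x.sum() / sigma**2

# An arbitrary (not optimal) member of the variational family Q: q(z) = N(m, s^2).
m, s = 0.5, 0.8

# Monte Carlo estimate of ELBO(q) = E_q[log p(x|z)] - KL(q(z) || p(z)).
z = rng.normal(m, s, size=100_000)
log_lik = np.sum(-0.5 * np.log(2 * np.pi * sigma**2)
                 - (x[None, :] - z[:, None]) ** 2 / (2 * sigma**2), axis=1)
kl_q_prior = np.log(1.0 / s) + (s**2 + m**2) / 2 - 0.5   # KL(N(m,s^2) || N(0,1))
elbo = log_lik.mean() - kl_q_prior

# Closed-form pieces for the check.
kl_q_post = (np.log(np.sqrt(post_var) / s)
             + (s**2 + (m - post_mean) ** 2) / (2 * post_var) - 0.5)
# Marginal likelihood: x is jointly Gaussian with mean 0, cov sigma^2 I + 1 1^T.
cov = sigma**2 * np.eye(n) + np.ones((n, n))
log_px = (-0.5 * n * np.log(2 * np.pi)
          - 0.5 * np.linalg.slogdet(cov)[1]
          - 0.5 * x @ np.linalg.solve(cov, x))

# The two printed numbers should agree up to Monte Carlo error.
print("ELBO + KL(q || posterior):", elbo + kl_q_post)
print("log p(x):                 ", log_px)
```

Because $KL(q(z)||p(z|x)) \ge 0$, the check also shows directly why the ELBO is a lower bound on the evidence $\log p(x)$, and why tightening the bound means pushing $q$ toward the posterior.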