flow model

  • https://www.youtube.com/watch?v=uXY18nzdSsM
  • Problems with other generative models: component-by-component (autoregressive) models (a generation order must be fixed, and generation is slow); variational auto-encoders (they optimize a lower bound on the likelihood, which is only an approximation); generative adversarial networks (unstable training).
  • generator:
    $G$ is a network that defines a probability distribution $p_G$. Sample $z$ from a normal distribution and pass it through $G$ to obtain $x = G(z)$; then $x$ follows the distribution $p_G(x)$. We want $p_G(x)$ to be as close as possible to the data distribution $p_{data}(x)$, given training samples $\{x^1, x^2, \cdots, x^m\}$ drawn from $p_{data}(x)$.
    – To make the two distributions close, the standard approach is maximum likelihood: $G^* = \arg\max_G \sum_{i=1}^m \log p_G(x^i)$, i.e. make the probability that $G$ assigns to each training sample $x^i$ as large as possible;
    – flow models optimize this objective directly.
  • math background
    – Jacobian: let $x = f(z)$ with $z = \begin{bmatrix} z_1 \\ z_2 \end{bmatrix}$, $x = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}$. The Jacobian of $f$ is defined as $J_f = \begin{bmatrix} \partial x_1/\partial z_1 & \partial x_1/\partial z_2 \\ \partial x_2/\partial z_1 & \partial x_2/\partial z_2 \end{bmatrix}$. For the inverse map $z = f^{-1}(x)$, $J_{f^{-1}} = \begin{bmatrix} \partial z_1/\partial x_1 & \partial z_1/\partial x_2 \\ \partial z_2/\partial x_1 & \partial z_2/\partial x_2 \end{bmatrix}$. The two Jacobians satisfy $J_f \, J_{f^{-1}} = I$, i.e. they are inverses of each other.
    – Determinant: $\det(A)$ is defined for a square matrix $A$, and $\det(A) = 1/\det(A^{-1})$. Geometrically, the determinant can be understood as the volume scaling factor of the transformation in high-dimensional space.
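    The two facts above ($J_f J_{f^{-1}} = I$ and $\det(A) = 1/\det(A^{-1})$) can be checked numerically. A minimal sketch, where the invertible map $f$ is an arbitrary toy example chosen for illustration (not from the lecture):

    ```python
    import numpy as np

    # A toy invertible map:  x1 = z1 + z2**2,  x2 = 3*z2
    def f(z):
        return np.array([z[0] + z[1]**2, 3*z[1]])

    def f_inv(x):
        return np.array([x[0] - (x[1]/3)**2, x[1]/3])

    def jacobian(fn, v, eps=1e-6):
        """Finite-difference Jacobian of fn at point v."""
        n = len(v)
        J = np.zeros((n, n))
        for j in range(n):
            dv = np.zeros(n); dv[j] = eps
            J[:, j] = (fn(v + dv) - fn(v - dv)) / (2*eps)
        return J

    z = np.array([0.5, -1.2])
    x = f(z)

    J_f = jacobian(f, z)         # Jacobian of f at z
    J_finv = jacobian(f_inv, x)  # Jacobian of f^{-1} at x = f(z)

    print(np.round(J_f @ J_finv, 4))                    # ≈ identity
    print(np.linalg.det(J_f) * np.linalg.det(J_finv))   # ≈ 1
    ```
    
    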
    – Change of variables: suppose $z$ follows a normal distribution $\pi(z)$ and $x = f(z)$ follows a distribution $p(x)$. In one dimension, matching probability mass over a small interval gives $p(x')\Delta x = \pi(z')\Delta z \Rightarrow p(x') = \pi(z')\frac{\Delta z}{\Delta x} \Rightarrow p(x') = \pi(z')\left|\frac{dz}{dx}\right|$. Extending to two dimensions, equating the probability mass of the corresponding regions gives
      $$p(x')\left|\det\begin{bmatrix} \Delta x_{11} & \Delta x_{21} \\ \Delta x_{12} & \Delta x_{22} \end{bmatrix}\right| = \pi(z')\,\Delta z_1 \Delta z_2,$$
      where $\Delta x_{11}, \Delta x_{21}$ are the changes in $x_1, x_2$ when $z_1$ changes, and $\Delta x_{12}, \Delta x_{22}$ are the changes in $x_1, x_2$ when $z_2$ changes. Rearranging:
      $$\pi(z') = p(x')\left|\frac{1}{\Delta z_1 \Delta z_2}\det\begin{bmatrix} \Delta x_{11} & \Delta x_{21} \\ \Delta x_{12} & \Delta x_{22} \end{bmatrix}\right| = p(x')\left|\det\begin{bmatrix} \partial x_1/\partial z_1 & \partial x_2/\partial z_1 \\ \partial x_1/\partial z_2 & \partial x_2/\partial z_2 \end{bmatrix}\right| = p(x')\left|\det\begin{bmatrix} \partial x_1/\partial z_1 & \partial x_1/\partial z_2 \\ \partial x_2/\partial z_1 & \partial x_2/\partial z_2 \end{bmatrix}\right| = p(x')\,|\det(J_f)|,$$
      using $\det(A) = \det(A^\top)$ in the last step. Equivalently, $p(x') = \pi(z')\,|\det(J_{f^{-1}})|$.
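    The 1-D change-of-variables formula can be verified by Monte Carlo: transform normal samples, estimate the density of the result in a small bin, and compare with $\pi(f^{-1}(x))\,|dz/dx|$. A sketch assuming the toy map $x = 2z + 1$:

    ```python
    import numpy as np
    from math import sqrt, pi, exp

    def normal_pdf(z):
        return exp(-z**2 / 2) / sqrt(2*pi)

    # invertible map x = f(z) = 2z + 1, so z = (x-1)/2 and |dz/dx| = 1/2
    def f(z): return 2*z + 1
    def f_inv(x): return (x - 1) / 2

    def p_x(x):
        # change of variables: p(x) = pi(f^{-1}(x)) * |dz/dx|
        return normal_pdf(f_inv(x)) * 0.5

    # Monte Carlo check: empirical density of transformed samples in a small bin
    rng = np.random.default_rng(0)
    samples = f(rng.standard_normal(1_000_000))
    x0, h = 1.5, 0.05
    empirical = np.mean(np.abs(samples - x0) < h) / (2*h)
    print(empirical, p_x(x0))  # the two densities should be close
    ```
    
    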
  • flow model: the objective is still maximum likelihood, $G^* = \arg\max_G \sum_{i=1}^m \log p_G(x^i)$. By change of variables, $p_G(x^i) = \pi(z^i)\,|\det(J_{G^{-1}})|$ with $z^i = G^{-1}(x^i)$, so $\log p_G(x^i) = \log \pi(G^{-1}(x^i)) + \log|\det(J_{G^{-1}})|$. Training therefore requires computing $\det(J_G)$ and $G^{-1}$; to guarantee $G$ is invertible, the input and output must have the same dimension. Because these constraints limit the expressiveness of a single $G$, multiple invertible layers $G_1, G_2, \cdots, G_K$ are composed:
    $p_1(x^i) = \pi(z^i)\,|\det(J_{G_1^{-1}})|$
    $p_2(x^i) = \pi(z^i)\,|\det(J_{G_1^{-1}})|\,|\det(J_{G_2^{-1}})|$
    $\cdots$
    $p_K(x^i) = \pi(z^i)\,|\det(J_{G_1^{-1}})|\cdots|\det(J_{G_K^{-1}})|$
    $\log p_K(x^i) = \log \pi(z^i) + \sum_{h=1}^K \log|\det(J_{G_h^{-1}})|$, where $z^i = G_1^{-1}(\cdots G_K^{-1}(x^i))$.
    Note that this objective involves only the inverse maps $G^{-1}$.
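    The stacked log-likelihood $\log p_K(x^i) = \log\pi(z^i) + \sum_{h=1}^K \log|\det(J_{G_h^{-1}})|$ can be illustrated with scalar affine layers $G_h(z) = a_h z + b_h$, whose inverse Jacobian "determinant" is simply $1/a_h$. A sketch with hypothetical parameter values, checked against the closed form (a composition of affine maps is itself affine):

    ```python
    import numpy as np

    # K scalar affine bijections G_h(z) = a_h*z + b_h, stacked as x = G_K(...G_1(z))
    params = [(2.0, 1.0), (0.5, -3.0), (4.0, 0.2)]  # (a_h, b_h), hypothetical

    def log_normal(z):
        return -0.5*z**2 - 0.5*np.log(2*np.pi)

    def log_p(x):
        log_det_sum = 0.0
        # invert layer by layer from the data side back to z
        for a, b in reversed(params):
            x = (x - b) / a                  # apply G_h^{-1}
            log_det_sum += np.log(abs(1/a))  # log|det J_{G_h^{-1}}|
        return log_normal(x) + log_det_sum   # x is now z = G_1^{-1}(...G_K^{-1}(x))

    # closed form: x = A*z + B with A = a_K*...*a_1, so p(x) = pi((x-B)/A) * |1/A|
    A = np.prod([a for a, _ in params])
    B = 0.0
    for a, b in params:
        B = a*B + b

    x0 = 2.7
    closed_form = log_normal((x0 - B)/A) + np.log(abs(1/A))
    print(log_p(x0), closed_form)  # should match
    ```
    
    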
  • coupling layer: the invertible building block used in NICE, RealNVP, and Glow.
    Forward: the input is split into two halves; the first half is copied through unchanged, and the second half is transformed using functions of the first half (figure from the lecture slides omitted).
    Inverse: because the first half is copied through, the transformation of the second half can be undone exactly, without inverting the inner networks (figure omitted).
    Next, the Jacobian: for a coupling layer it is (block-)triangular, so its determinant is cheap to compute (figure omitted).
    The above is a single layer; stacking several coupling layers (alternating which half is copied through) gives the full flow (figure omitted).
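    A NICE-style additive coupling layer can be sketched in a few lines. This is a minimal illustration, not the lecture's exact formulation; the inner function `m` below is an arbitrary stand-in for the per-layer network (it never needs to be inverted):

    ```python
    import numpy as np

    def m(x1):
        # stand-in for the coupling network acting on the copied half;
        # any function works, since the inverse never inverts it
        return np.tanh(x1 * 1.7 + 0.3)

    def coupling_forward(z):
        z1, z2 = z[:len(z)//2], z[len(z)//2:]
        x1 = z1                 # first half is copied through unchanged
        x2 = z2 + m(z1)         # second half is shifted by a function of the first
        return np.concatenate([x1, x2])

    def coupling_inverse(x):
        x1, x2 = x[:len(x)//2], x[len(x)//2:]
        z1 = x1
        z2 = x2 - m(x1)         # exact inverse, without inverting m
        return np.concatenate([z1, z2])

    # The Jacobian is triangular with ones on the diagonal, so |det J| = 1
    # for an additive coupling layer (it is volume preserving).
    z = np.array([0.4, -1.1, 0.9, 2.2])
    x = coupling_forward(z)
    print(np.allclose(coupling_inverse(x), z))  # True: exact inverse
    ```

    Affine coupling (RealNVP) additionally scales the second half, so its log-determinant is the sum of the log scales rather than zero.
    
    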