RNN & Generative Model

This post covers the LSTM gating mechanism and how it mitigates the vanishing and exploding gradient problems. It also touches on other RNN applications such as image captioning, and on the rationale behind gradient clipping. It then goes deeper into generative models, in particular the GAN training procedure and cost function and the mode-collapse problem, and closes with applications of generative models such as Pix2Pix and CycleGAN.

LSTM

  • With a gated RNN, the network learns which information is remembered and which is forgotten over a long duration (through the forget gate).
  • Distinguish between the cell state and the hidden state: the former maintains long-term dependencies, while the latter is just the input of the forget, input, and gate gates and the output of the output gate.
  • The introduction of the cell state in LSTM is the primary reason the vanishing and exploding gradient problems are mitigated. See the Tutorial here.
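The gate and state updates above can be sketched as a minimal scalar LSTM step in pure Python (the weight names `Wf`, `Uf`, etc. are my own labels for illustration, not from any particular library). Note how the cell state `c` is updated additively, which is what keeps gradients flowing over long durations:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h_prev, c_prev, w):
    # One scalar LSTM cell step; w maps gate-weight names to floats.
    f = sigmoid(w["Wf"] * x + w["Uf"] * h_prev + w["bf"])    # forget gate
    i = sigmoid(w["Wi"] * x + w["Ui"] * h_prev + w["bi"])    # input gate
    g = math.tanh(w["Wg"] * x + w["Ug"] * h_prev + w["bg"])  # gate gate (candidate)
    o = sigmoid(w["Wo"] * x + w["Uo"] * h_prev + w["bo"])    # output gate
    c = f * c_prev + i * g   # cell state: additive long-term memory
    h = o * math.tanh(c)     # hidden state: gated short-term output
    return h, c
```

In a real LSTM these are matrix-vector products over whole state vectors; the scalar version just makes the gating arithmetic visible.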

Others

  • Example of image captioning:
    • combination of a CNN and an RNN
    • the CNN takes an image as input and outputs a feature vector
    • this feature vector is then fed into the RNN as something like a hidden state (but actually not!), via a conversion matrix W_ih
  • Gradient clipping:
    • solves two problems: sharp cliffs in parameter space and exploding gradients.

The basic idea is to recall that the gradient specifies not the optimal step size, but only the optimal direction within an infinitesimal region.
The objective function of highly nonlinear deep neural networks or recurrent neural networks often contains sharp nonlinearities in parameter space resulting from the multiplication of several parameters.
Thus, limit the gradient size by a predefined threshold.
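Clipping by the global norm can be sketched as follows (a minimal stdlib version; frameworks like PyTorch provide this as a built-in). The key point from the paragraph above is that only the magnitude is limited, never the direction:

```python
import math

def clip_by_global_norm(grads, threshold):
    # Rescale the whole gradient vector when its L2 norm exceeds the
    # threshold; the direction is preserved, only the step size shrinks.
    norm = math.sqrt(sum(g * g for g in grads))
    if norm > threshold:
        scale = threshold / norm
        return [g * scale for g in grads]
    return grads
```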

  • Exploding and vanishing gradients:
    • It is sufficient that the largest singular value satisfies λ1 < 1/γ for the vanishing gradient to occur.
    • The necessary condition for the exploding gradient is that the largest singular value satisfies λ1 > 1/γ.
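The mechanism is easy to see in the scalar case (a toy sketch, not the full singular-value analysis): backpropagating through a linear recurrence h_t = w · h_{t-1} multiplies the gradient by w at every step, so over many steps the gradient magnitude scales as |w|^T, shrinking toward zero when |w| < 1 and blowing up when |w| > 1:

```python
def gradient_magnitude(w, steps):
    # Gradient of the final state w.r.t. the initial state of the
    # recurrence h_t = w * h_{t-1}, accumulated over `steps` time steps.
    g = 1.0
    for _ in range(steps):
        g *= w
    return abs(g)
```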

Tutorial

Generative Model:

  • Training procedure of a GAN:
    • sample a minibatch of m noise examples {z(1), …, z(m)} from the noise prior p_g(z) (used to generate images).
    • sample a minibatch of m examples {x(1), …, x(m)} (for training the discriminator) from the data-generating distribution p_data(x)
  • The cost functions may not converge using gradient descent in a minimax game.
    • A zero-sum game is also called minimax: your opponent acts to maximize the objective while you act to minimize it.
  • Maximum likelihood estimation, θ̂ = argmax_θ ∏_{i=1}^N p(x_i | θ), can be viewed as minimizing the KL divergence D_KL(P‖Q), where P is the true probability distribution we want to approximate and Q is the estimated distribution.
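This equivalence can be checked numerically on a tiny discrete example (a sketch with made-up data): the candidate model that maximizes the likelihood of the data is exactly the one that minimizes D_KL(P‖Q) against the empirical distribution P:

```python
import math

def neg_log_likelihood(data, q):
    # Lower NLL means higher likelihood under model q.
    return -sum(math.log(q[x]) for x in data)

def kl(p, q):
    # D_KL(P || Q) over a discrete support; p(x) = 0 terms contribute nothing.
    return sum(p[x] * math.log(p[x] / q[x]) for x in p if p[x] > 0)
```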

This shows that the KL divergence D_KL(P‖Q) penalizes a generator that misses some mode of the real-life distribution, i.e. p(x) > 0 but q(x) → 0, while it finds it acceptable that some generated images look unreal (in other words, D_KL(P‖Q) won't penalize this wrong case): p(x) → 0 but q(x) > 0.
By contrast, the reverse KL divergence D_KL(Q‖P) penalizes a generator that only generates unreal images, i.e. q(x) > 0 but p(x) → 0, while it accepts a generator that is less varied but always produces real-looking images, i.e. q(x) → 0 but p(x) > 0.
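The asymmetry is easy to demonstrate on a two-mode toy distribution (illustrative numbers only; a small epsilon stands in for q(x) → 0 so the penalty is large but finite):

```python
import math

def kl_smoothed(p, q, eps=1e-12):
    # D_KL(P || Q) with zero probabilities floored at eps.
    return sum(px * math.log(px / max(q.get(x, 0.0), eps))
               for x, px in p.items() if px > 0)

p = {"mode1": 0.5, "mode2": 0.5}   # real data has two modes
q = {"mode1": 1.0, "mode2": 0.0}   # generator collapsed onto one mode

forward = kl_smoothed(p, q)   # huge: q misses mode2 where p > 0
reverse = kl_smoothed(q, p)   # small: q only puts mass where p > 0
```

Forward KL blows up on the missed mode, while reverse KL only charges log 2 for the collapse, which is one way to see why reverse-KL-like objectives tolerate mode dropping.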

Tutorial

  • The gradient of the JS divergence will vanish if there is a huge mismatch between μ(x) and μ_g(x), especially for an optimal discriminator (because the JS divergence saturates). This makes the learning very slow.
  • Mode collapse: nicely explained in the above tutorial; less varied: all random noise inputs generate similar images, i.e. the generator collapses to a single mode that can already fool the discriminator.
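The saturation of the JS divergence can be verified directly on discrete distributions (a toy sketch): once the supports of the real and generated distributions are disjoint, JS is stuck at log 2 no matter how far apart they are, so moving the generator "closer" produces no gradient signal:

```python
import math

def js_divergence(p, q):
    # Jensen-Shannon divergence between two discrete distributions,
    # computed against the mixture m = (p + q) / 2.
    support = set(p) | set(q)
    m = {x: 0.5 * (p.get(x, 0.0) + q.get(x, 0.0)) for x in support}

    def kl_to_m(r):
        return sum(r[x] * math.log(r[x] / m[x])
                   for x in support if r.get(x, 0.0) > 0)

    return 0.5 * kl_to_m(p) + 0.5 * kl_to_m(q)
```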

Extension of GAN & Application

  • Pix2Pix formulation: the y in the formulation is the paired example, like the corresponding map for an aerial photo
    • the generator now tries to generate the paired image: e.g. from day to night
    • the L1 loss measures the difference between the generated image and the true paired image.
    • requires paired images as training data
    • this GAN is in a conditional setting, which means that the random noise z in latent space is conditioned on the input image x, so the generator takes both z and x as input to produce the desired data sample.
  • CycleGAN: unpaired image translation; it learns two densities and translates a sample from the first (“images of apples”) into a sample likely under the second (“images of oranges”).
    • measures the cycle-consistency loss
    • F and G are two generators (mappings) whose input is an image (instead of random noise) and whose output is the unpaired corresponding image.
    • Input → generated image → reconstruction
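The round trip above can be sketched as a cycle-consistency L1 loss, with "images" stood in by flat lists of floats and G, F as plain functions (all names here are illustrative, not the paper's code):

```python
def l1(a, b):
    # Mean absolute difference between two flattened "images".
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

def cycle_consistency_loss(x, y, G, F):
    # G maps domain X -> Y (e.g. apples -> oranges), F maps Y -> X.
    # After a full round trip the reconstruction should match the input.
    return l1(F(G(x)), x) + l1(G(F(y)), y)
```

When F and G are exact inverses the loss is zero; any mismatch in the reconstruction is penalized, which is what forces the two generators to stay consistent without paired data.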

VAE

  • Intuition of VAE:
    • assuming training dataset