LSTM
- With a gated RNN, the network learns which information should be remembered and which should be forgotten over a long duration (through the forget gate).
- Distinguish between the cell state and the hidden state: the former maintains long-term dependencies, while the latter is the input to the forget, input, and gate gates and the output of the output gate.
- The introduction of the cell state in LSTM is the primary reason the vanishing and exploding gradient problems are mitigated. See the tutorial here.
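The gate/state distinction above can be sketched as a single LSTM step in numpy. This is a minimal illustration, not a library implementation; the names `W`, `U`, `b` and the stacked-gate layout are my assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM step. W, U, b hold the four gate parameter sets
    stacked as [forget, input, gate (candidate), output]."""
    n = h_prev.shape[0]
    z = W @ x + U @ h_prev + b   # pre-activations for all four gates
    f = sigmoid(z[0:n])          # forget gate: what to erase from the cell
    i = sigmoid(z[n:2*n])        # input gate: what to write to the cell
    g = np.tanh(z[2*n:3*n])      # candidate values
    o = sigmoid(z[3*n:4*n])      # output gate
    c = f * c_prev + i * g       # cell state: additive update -> long-term memory
    h = o * np.tanh(c)           # hidden state: filtered view of the cell
    return h, c
```

Note how `h_prev` feeds every gate, while `c` is only touched through the elementwise forget/input update, which is what lets it carry information over long durations.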
Others
- Example of image captioning:
- a combination of CNN and RNN
- the CNN takes an image as input and outputs a feature vector
- this feature vector is then fed into the RNN as something like a hidden state (but actually not!), via a conversion matrix $W_{ih}$
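The conversion step above can be sketched in numpy; the sizes (512-d feature, 256-d hidden state) are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
feat = rng.normal(size=512)          # feature vector produced by the CNN
W_ih = rng.normal(size=(256, 512))   # conversion matrix: feature -> hidden size
h0 = np.tanh(W_ih @ feat)            # "hidden-state-like" initialization for the RNN
```

The RNN then starts decoding the caption from `h0` instead of from a zero hidden state.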
- Gradient clipping:
- solves two problems: sharp cliffs in parameter space and exploding gradients.
The basic idea is to recall that the gradient specifies not the optimal step size, but only the optimal direction within an infinitesimal region.
The objective function for highly nonlinear deep neural networks or for recurrent neural networks often contains sharp nonlinearities in parameter space resulting from the multiplication of several parameters.
Thus, limit the gradient size by a predefined threshold.
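Clipping by global norm can be sketched as below (the function name is mine): the direction is kept, only the step size is capped.

```python
import numpy as np

def clip_by_norm(grads, threshold):
    """Rescale the gradients if their global norm exceeds threshold:
    g <- g * threshold / ||g||  (direction preserved, magnitude capped)."""
    norm = np.sqrt(sum(np.sum(g * g) for g in grads))
    if norm > threshold:
        grads = [g * (threshold / norm) for g in grads]
    return grads
```

Gradients below the threshold pass through unchanged, so clipping only intervenes at the sharp cliffs.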
- Exploding and vanishing gradient:
- It is sufficient for the largest singular value $\lambda_1 < \frac{1}{\gamma}$ for the vanishing gradient problem to occur.
- The necessary condition for the exploding gradient problem is that the largest singular value $\lambda_1 > \frac{1}{\gamma}$.
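A toy linear demonstration of the vanishing condition (the matrix is made up; $\gamma = 1$ here, as for a linear activation, so the condition is $\lambda_1 < 1$): repeatedly backpropagating through the same recurrent weight shrinks the gradient geometrically.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
A = rng.normal(size=(n, n))
A *= 0.9 / np.linalg.svd(A, compute_uv=False)[0]  # force largest singular value to 0.9

g = rng.normal(size=n)
norms = [np.linalg.norm(g)]
for _ in range(50):          # backprop through 50 time steps: g <- A^T g
    g = A.T @ g
    norms.append(np.linalg.norm(g))
```

Since $\|A^\top g\| \le \lambda_1 \|g\| = 0.9\|g\|$, the gradient norm decays at least as fast as $0.9^t$, i.e. it vanishes.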
Generative Model:
- Training example of GAN:
- we sample a minibatch of $m$ noise examples $\{z^{(1)},\dots,z^{(m)}\}$ from the noise prior $p_g(z)$ (used to generate images).
- we sample a minibatch of $m$ examples (for training the discriminator) $\{x^{(1)},\dots,x^{(m)}\}$ from the data-generating distribution $p_{data}(x)$.
- Cost functions may not converge using gradient descent in a minimax game.
- A zero-sum game is also called minimax: your opponent acts to maximize the objective, while your actions aim to minimize it.
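The sampling and the minimax objective above can be combined in a toy sketch of the GAN value $V(D,G) = \mathbb{E}[\log D(x)] + \mathbb{E}[\log(1 - D(G(z)))]$, which the discriminator ascends and the generator descends. The 1-D generator and discriminator here are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
m = 64
z = rng.normal(size=m)               # minibatch from the noise prior p_g(z)
x = rng.normal(loc=3.0, size=m)      # minibatch from p_data(x)

G = lambda z: z + 1.0                                # toy generator
D = lambda x: 1.0 / (1.0 + np.exp(-(x - 2.0)))       # toy discriminator (sigmoid score)

# minimax value on the two minibatches: max over D, min over G
V = np.mean(np.log(D(x))) + np.mean(np.log(1.0 - D(G(z))))
```

Each gradient-descent step updates only one player's parameters while holding the other fixed, which is why convergence is not guaranteed.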
- Maximum Likelihood Estimation: $\hat{\theta} = \arg\max_\theta \prod_{i=1}^N p(x_i|\theta)$ can be viewed as minimizing the KL divergence $D_{KL}(P\|Q)$, where $P$ is the true probability distribution we want to approximate, while $Q$ is the estimated distribution.
So the KL divergence $D_{KL}(P\|Q)$ penalizes a generator that misses some mode of the real-life distribution, i.e. $p(x) > 0$ but $q(x) \to 0$, while it tolerates some generated images looking unreal (in other words, $D_{KL}(P\|Q)$ won't penalize this failure case): $p(x) \to 0$ but $q(x) > 0$.
By contrast, the reverse KL divergence $D_{KL}(Q\|P)$ penalizes a generator that produces unreal images, i.e. $q(x) > 0$ but $p(x) \to 0$, while it accepts a generator that is less varied but always produces real-looking images, i.e. $q(x) \to 0$ but $p(x) > 0$.
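The asymmetry can be checked numerically on a discrete toy example: a two-mode $P$ and a mode-dropping $Q$ that puts almost all its mass on the first mode.

```python
import numpy as np

def kl(p, q):
    """D_KL(P||Q) = sum_x p(x) log(p(x)/q(x)), summed over the support of P."""
    mask = p > 0
    return np.sum(p[mask] * np.log(p[mask] / q[mask]))

p = np.array([0.5, 0.5])     # true distribution: two equally likely modes
q = np.array([0.99, 0.01])   # mode-dropping generator: almost ignores mode 2

forward = kl(p, q)   # D_KL(P||Q): dominated by the 0.5 * log(0.5/0.01) term
reverse = kl(q, p)   # D_KL(Q||P): much smaller for the same Q
```

The forward direction blows up exactly where $p(x) > 0$ but $q(x) \to 0$, matching the mode-dropping penalty described above.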
- The gradient of the JS divergence will vanish if there is a huge mismatch between $\mu(x)$ and $\mu_g(x)$, especially for an optimal discriminator (because the JS divergence saturates). This makes learning very slow.
- Mode collapse: nicely explained in the above tutorial. The generator becomes less varied: all random noise inputs generate similar images, collapsing to a single mode that can already fool the discriminator.
Extension of GAN & Application
- Pix2Pix formulation: the $y$ in the formulation is the paired example, like the corresponding map from an aerial photo
- The generator now tries to generate the paired image: e.g., from day to night
- $L_1$ measures the difference between the generated paired image and the true paired image.
- requires paired images as training data
- This GAN is in a conditional setting, which means the random noise $z$ in latent space is conditioned on the input image $x$. So the generator takes $z$ and $x$ as input to output the desired data sample.
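The Pix2Pix generator objective (adversarial term plus weighted $L_1$ term) can be sketched with toy stand-ins; the constant discriminator and additive generator here are made up, while $\lambda = 100$ is the weight used in the Pix2Pix paper.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.random((8, 8))            # input image (e.g. the daytime photo)
y = rng.random((8, 8))            # paired target (e.g. the night version)
z = rng.normal(size=(8, 8))       # noise, conditioned on x inside G

G = lambda x, z: x + 0.01 * z     # toy conditional generator G(x, z)
D = lambda x, y: 0.3              # toy conditional discriminator score D(x, y)

fake = G(x, z)
lam = 100.0                       # L1 weight (lambda = 100 in the Pix2Pix paper)
adv = -np.log(D(x, fake))         # generator's adversarial term
l1 = np.mean(np.abs(fake - y))    # L1 term: pull the generated pair toward the true pair
loss = adv + lam * l1
```

Note that the discriminator also sees the input `x`, so it judges whether the *pair* $(x, y)$ looks real, not the output alone.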
- CycleGAN: unpaired image translation; it learns two densities and translates a sample from the first ("images of apples") into a sample likely under the second ("images of oranges").
- measures the cycle-consistency loss
- F and G are two generators (translators) whose input is an image (instead of random noise) and whose output is the unpaired corresponding image.
- Input - Generated image - Reconstruction
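The input → generated image → reconstruction chain corresponds to the cycle-consistency loss $\|F(G(x)) - x\|_1 + \|G(F(y)) - y\|_1$. In this sketch the toy generators are exact inverses of each other, so the loss is zero; real generators are neural networks trained to approximate this.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.random((8, 8))    # sample from domain X ("images of apples")
y = rng.random((8, 8))    # sample from domain Y ("images of oranges")

G = lambda x: x + 0.1     # toy generator G: X -> Y
F = lambda y: y - 0.1     # toy generator F: Y -> X

# input -> generated image -> reconstruction, in both directions
cycle = np.mean(np.abs(F(G(x)) - x)) + np.mean(np.abs(G(F(y)) - y))
```

Penalizing the reconstruction error ties the two translators together, which is what removes the need for paired training data.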
VAE
- Intuition of VAE:
- assuming training dataset