Introduction to Deep Learning (8) - Generative Models

Generative models

Supervised vs. Unsupervised
Discriminative Model vs. Generative Models vs. Conditional Generative

Discriminative: only the labels compete for probability mass; there is no competition between images

Generative: images compete with each other for probability mass

usage

Discriminative:

  1. Feature learning
  2. Assign labels to data

Generative:

  1. detect outliers
  2. feature learning
  3. sample to generate new data

Conditional Generative:

  1. assign labels while rejecting outliers
  2. generate new data conditioned on input labels


Autoregressive

Goal: Write down an explicit function for the density $p(x) = f(x, W)$

We can break down the probability function to get $p(x) = p(x_1, x_2, \dots, x_T) = \prod_{t=1}^{T} p(x_t \mid x_1, x_2, \dots, x_{t-1})$

We can use an RNN to model this density function
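A minimal sketch of this idea: an LSTM predicts a categorical distribution over each next value given all previous values, and the product of those conditionals is the likelihood $p(x)$. All sizes and names here are illustrative, not from the lecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AutoregressiveRNN(nn.Module):
    """Models p(x) = prod_t p(x_t | x_1, ..., x_{t-1}) for sequences of
    discrete values in [0, 255], e.g. flattened pixel intensities."""
    def __init__(self, num_values=256, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(num_values, hidden_dim)
        self.lstm = nn.LSTM(hidden_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, num_values)

    def forward(self, x):
        # x: (B, T) integers; predict x_t from x_1 .. x_{t-1}
        # (the very first value would need a start token; omitted for brevity)
        h, _ = self.lstm(self.embed(x[:, :-1]))
        logits = self.head(h)                       # (B, T-1, num_values)
        # negative log-likelihood of the observed sequence under the model
        return F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                               x[:, 1:].reshape(-1))

model = AutoregressiveRNN()
x = torch.randint(0, 256, (4, 64))   # 4 toy "images" of 64 pixels each
loss = model(x)                      # maximizing likelihood = minimizing this
loss.backward()
```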

PixelRNN

generate image pixels one at a time, starting at the upper-left corner

compute a hidden state for each pixel that depends on the hidden states and RGB values of the pixels to the left and above (LSTM recurrence)

$h_{x,y} = f(h_{x-1,y}, h_{x,y-1}, W)$

At each pixel, predict red, then green, then blue

Each pixel depends implicitly on all pixels above and to the left
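A toy sketch of this recurrence (purely illustrative: the real PixelRNN uses an LSTM update and also conditions on the RGB values, which are omitted here):

```python
import torch

H, W, D = 8, 8, 16                    # toy image size and hidden dimension
W_left = torch.randn(D, D) * 0.1      # stand-in recurrence weights
W_up = torch.randn(D, D) * 0.1
h = torch.zeros(H, W, D)              # one hidden state per pixel

# sweep the image in raster order: each state mixes the states of the
# pixel to the left and the pixel above, so it implicitly depends on
# every pixel above and to the left
for y in range(H):
    for x in range(W):
        left = h[y, x - 1] if x > 0 else torch.zeros(D)
        above = h[y - 1, x] if y > 0 else torch.zeros(D)
        h[y, x] = torch.tanh(left @ W_left + above @ W_up)
# the double loop cannot be parallelized, which is why generation is slow
```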

Problem: really slow at both training and test time

PixelCNN

Dependency on previous pixels is now modeled using a CNN over the context region

Example: new CIFAR images generated by PixelCNN
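PixelCNN realizes this with masked convolutions: the kernel is zeroed at the current pixel and at every later position in raster order, so each output only sees pixels above and to the left. A minimal sketch (layer sizes are illustrative):

```python
import torch
import torch.nn as nn

class MaskedConv2d(nn.Conv2d):
    """Convolution whose kernel is zeroed at (and after) the center pixel,
    so output (x, y) only depends on pixels above and to the left."""
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        kH, kW = self.kernel_size
        mask = torch.ones(kH, kW)
        mask[kH // 2, kW // 2:] = 0   # center pixel and pixels to its right
        mask[kH // 2 + 1:, :] = 0     # all rows below the center
        self.register_buffer("mask", mask[None, None])

    def forward(self, x):
        return nn.functional.conv2d(
            x, self.weight * self.mask, self.bias,
            self.stride, self.padding, self.dilation, self.groups)

conv = MaskedConv2d(1, 16, kernel_size=5, padding=2)
out = conv(torch.randn(1, 1, 32, 32))   # (1, 16, 32, 32), causal in raster order
```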

Autoregressive Models: PixelRNN/CNN

Pros:

  1. can explicitly compute likelihood
  2. gives good evaluation metric
  3. good samples

Cons:

  1. sequential generation -> slow

Variational Autoencoders

VAEs define an intractable density that we cannot explicitly compute or optimize.

Instead, we can optimize a lower bound on this density.

(non-variational) Autoencoders

Features should extract useful information that we can use for downstream tasks

Problem: how can we learn this feature transform from raw data?

idea: use the features to reconstruct the input data with a decoder

loss: L2 distance between input and reconstructed data
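A minimal sketch of such an autoencoder with an L2 reconstruction loss (layer sizes are arbitrary choices, not from the notes):

```python
import torch
import torch.nn as nn

class Autoencoder(nn.Module):
    def __init__(self, input_dim=784, latent_dim=32):
        super().__init__()
        # encoder compresses the input into a low-dimensional feature z
        self.encoder = nn.Sequential(
            nn.Linear(input_dim, 256), nn.ReLU(),
            nn.Linear(256, latent_dim))
        # decoder tries to reconstruct the original input from z
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 256), nn.ReLU(),
            nn.Linear(256, input_dim))

    def forward(self, x):
        z = self.encoder(x)
        x_hat = self.decoder(z)
        return ((x_hat - x) ** 2).mean()   # L2 reconstruction loss

model = Autoencoder()
loss = model(torch.randn(16, 784))
loss.backward()
```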

Variational Autoencoders

  1. learn latent features z from raw data
  2. sample from the model to generate new data


How to train this model: maximize the likelihood of data

However, the marginal likelihood $p(x) = \int p(x \mid z)\, p(z)\, dz$ requires integrating over all possible latent codes $z$, which is intractable

so we need to find a tractable lower bound on the likelihood that we can actually compute and maximize

Hopefully, as this lower bound grows during training, the true likelihood will increase as well

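For reference, the standard variational lower bound (ELBO) that plays this role is

$$
\log p(x) \;\ge\; \mathbb{E}_{z \sim q(z \mid x)}\big[\log p(x \mid z)\big] \;-\; D_{KL}\big(q(z \mid x) \,\|\, p(z)\big),
$$

where $q(z \mid x)$ is the distribution output by the encoder. The first (reconstruction) term and the second (KL) term correspond to steps 5 and 2 of the training process below.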

Process:

  1. run the input data through the encoder to get a distribution over latent codes
  2. the encoder output should match the prior $p(z)$

Here, we assume both the learned distribution and the prior $p(z)$ are diagonal Gaussians, for computational convenience.

  3. sample a code z from the encoder output distribution
  4. run the sampled code through the decoder to get a distribution over data samples
  5. the original input data should be likely under the distribution output in step 4

Steps 2 and 5 give the two terms of the loss we try to minimize: the KL divergence term and the reconstruction term
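A minimal sketch of these two loss terms, assuming a diagonal-Gaussian encoder, a standard normal prior, and an L2-style reconstruction term (all architecture details are illustrative):

```python
import torch
import torch.nn as nn

class VAE(nn.Module):
    def __init__(self, input_dim=784, latent_dim=32, hidden=256):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(input_dim, hidden), nn.ReLU())
        self.mu_head = nn.Linear(hidden, latent_dim)       # mean of q(z|x)
        self.logvar_head = nn.Linear(hidden, latent_dim)   # log-variance of q(z|x)
        self.dec = nn.Sequential(
            nn.Linear(latent_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, input_dim))

    def forward(self, x):
        h = self.enc(x)
        mu, logvar = self.mu_head(h), self.logvar_head(h)
        # step 3: sample z via the reparameterization trick (keeps gradients)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
        x_hat = self.dec(z)
        # step 5: reconstruction term (L2 / Gaussian-likelihood surrogate)
        recon = ((x_hat - x) ** 2).sum(dim=1).mean()
        # step 2: closed-form KL between the diagonal Gaussian q(z|x) and N(0, I)
        kl = 0.5 * (torch.exp(logvar) + mu ** 2 - 1.0 - logvar).sum(dim=1).mean()
        return recon + kl

vae = VAE()
loss = vae(torch.randn(16, 784))   # minimizing this maximizes the lower bound
loss.backward()
```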

Generative Adversarial Networks

Setup: Assume we have data $x$ drawn from a distribution $p_{data}(x)$. We want to sample from $p_{data}(x)$.

Idea: Introduce a latent variable $z$ with a simple prior $p(z)$.

sample $z \sim p(z)$ and pass it to a Generator Network $x = G(z)$

Then $x$ is a sample from the generator distribution $p_G$; we want $p_G = p_{data}$

We train the Generator Network to convert $z$ into fake data $x$ sampled from $p_G$ by fooling the discriminator $D$.

We train the Discriminator Network $D$ to classify data as real or fake.

We will train the two networks jointly: they are fighting against each other.

loss

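For reference, the standard minimax objective that the two networks optimize jointly is

$$
\min_G \max_D \;\; \mathbb{E}_{x \sim p_{data}}\big[\log D(x)\big] + \mathbb{E}_{z \sim p(z)}\big[\log\big(1 - D(G(z))\big)\big].
$$

D tries to make this large (classify real vs. fake correctly), while G tries to make it small (fool D).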

Problem: at the beginning of training, G suffers from vanishing gradients, because when D confidently rejects fake samples, $\log(1 - D(G(z)))$ is nearly flat

Solution: instead of training G to minimize $\log(1 - D(G(z)))$, train it to minimize $-\log D(G(z))$ (i.e., maximize $\log D(G(z))$). Then G gets strong gradients at the beginning of training. Nice idea!
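A minimal sketch of the resulting training loop with this non-saturating generator loss (network sizes, optimizers, and data are placeholders, not from the lecture):

```python
import torch
import torch.nn as nn

latent_dim, data_dim = 64, 784   # illustrative sizes
G = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(),
                  nn.Linear(256, data_dim))
D = nn.Sequential(nn.Linear(data_dim, 256), nn.LeakyReLU(0.2),
                  nn.Linear(256, 1))
opt_G = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_D = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

def train_step(real):
    n = real.size(0)
    # --- discriminator step: classify real as 1, fake as 0 ---
    fake = G(torch.randn(n, latent_dim)).detach()
    d_loss = bce(D(real), torch.ones(n, 1)) + bce(D(fake), torch.zeros(n, 1))
    opt_D.zero_grad()
    d_loss.backward()
    opt_D.step()
    # --- generator step: non-saturating loss, i.e. minimize -log D(G(z)) ---
    g_loss = bce(D(G(torch.randn(n, latent_dim))), torch.ones(n, 1))
    opt_G.zero_grad()
    g_loss.backward()
    opt_G.step()
    return d_loss.item(), g_loss.item()

train_step(torch.randn(32, data_dim))   # one step on a placeholder "real" batch
```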

With some math, one can show that the global optimum of this minimax game is reached when $p_G = p_{data}$.

But there are still caveats: a fixed network architecture may not be able to represent the optimal G and D, and convergence of the alternating optimization is not guaranteed.
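For reference, the key steps of that argument (from the original GAN paper): for a fixed generator, the optimal discriminator is

$$
D^*(x) = \frac{p_{data}(x)}{p_{data}(x) + p_G(x)},
$$

and substituting $D^*$ back into the objective gives $2\,\mathrm{JSD}\big(p_{data} \,\|\, p_G\big) - \log 4$, which is minimized exactly when $p_G = p_{data}$.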

DC-GAN

Interpolating between points in the latent $z$ space produces smooth transitions between the corresponding generated images.
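In code, the interpolation looks like this (the generator here is just a stand-in for a trained DCGAN generator):

```python
import torch
import torch.nn as nn

# stand-in generator; in practice this would be a trained DCGAN generator
G = nn.Sequential(nn.Linear(64, 256), nn.ReLU(), nn.Linear(256, 784))

z1, z2 = torch.randn(64), torch.randn(64)        # two random latent points
alphas = torch.linspace(0.0, 1.0, steps=8)
# decode points on the straight line from z1 to z2; with a trained DCGAN
# the resulting images morph smoothly from G(z1) to G(z2)
frames = torch.stack([G((1 - a) * z1 + a * z2) for a in alphas])
print(frames.shape)   # (8, 784)
```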

We can even do latent vector math (the classic DCGAN example: smiling woman - neutral woman + neutral man ≈ smiling man)!

Conditional GANs

we can use Conditional Batch Normalization:

learn separate normalization parameters (scale $\gamma$ and shift $\beta$) for each label, as sketched below
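A minimal sketch of conditional batch normalization under this description (the embedding-based lookup is a common implementation choice, not something spelled out in the notes):

```python
import torch
import torch.nn as nn

class ConditionalBatchNorm2d(nn.Module):
    """BatchNorm whose scale (gamma) and shift (beta) are looked up per class."""
    def __init__(self, num_features, num_classes):
        super().__init__()
        # affine=False: the shared BN only normalizes; gamma/beta come from the label
        self.bn = nn.BatchNorm2d(num_features, affine=False)
        self.gamma = nn.Embedding(num_classes, num_features)
        self.beta = nn.Embedding(num_classes, num_features)
        nn.init.ones_(self.gamma.weight)
        nn.init.zeros_(self.beta.weight)

    def forward(self, x, y):
        out = self.bn(x)                          # (B, C, H, W)
        g = self.gamma(y)[:, :, None, None]       # per-sample, per-class gamma
        b = self.beta(y)[:, :, None, None]        # per-sample, per-class beta
        return g * out + b

cbn = ConditionalBatchNorm2d(num_features=16, num_classes=10)
x = torch.randn(8, 16, 4, 4)
y = torch.randint(0, 10, (8,))
print(cbn(x, y).shape)    # (8, 16, 4, 4)
```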

Spectral Normalization

Spectral normalization rescales each weight matrix by its largest singular value, which helps stabilize GAN training.

With these tricks, we can generate images of specific labels.
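In PyTorch, spectral normalization can be applied with the built-in `torch.nn.utils.spectral_norm` wrapper; a minimal sketch on a stand-in discriminator:

```python
import torch
import torch.nn as nn
from torch.nn.utils import spectral_norm

# wrap each weight layer with spectral normalization, which rescales the
# weight matrix by its largest singular value at every forward pass
D = nn.Sequential(
    spectral_norm(nn.Linear(784, 256)), nn.LeakyReLU(0.2),
    spectral_norm(nn.Linear(256, 1)))

print(D(torch.randn(4, 784)).shape)   # (4, 1)
```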
