[Paper Reading Notes] Summaries of Four Papers: AdaGAN, GAIL, SeqGAN, etc.

This post summarizes four papers on generative models in deep learning: AdaGAN, the connection between GANs and IRL, Generative Adversarial Imitation Learning (GAIL), and SeqGAN. AdaGAN improves generative models via boosting; GAIL combines imitation learning with GANs to learn a policy directly; SeqGAN tackles the difficulty GANs have with generating discrete sequences by updating the generator with RL policy gradients.

Paper Summaries

AdaGAN: Boosting Generative Models

AdaGAN paper reading notes

A Connection Between GAN, IRL and EBM

### Motivation

Then maximizing likelihood will lead to a distribution which “covers” all of the modes, but puts most of its mass in parts of the space that have negligible density under the data distribution.

A generator trained adversarially will instead try to “fill in” as many modes as it can.

A complex multimodal distribution is hard to fit by maximizing likelihood: the learned $P_G$ tries to cover all of $P_{data}$ and ends up placing much of its mass in the near-empty regions between the modes. An adversarially trained generator instead tries to fill in as many modes of $P_{data}$ as it can, which pulls $P_G$ closer to $P_{data}$.
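
A common way to formalize this intuition (my gloss, not spelled out in the note): maximum likelihood minimizes the forward KL, which is mode-covering, while adversarial training behaves more like the reverse KL, which is mode-seeking.

```latex
% Maximum likelihood: forward KL; P_G is penalized wherever P_data has mass
% that P_G misses, so it spreads out to cover every mode.
\min_{\theta} \; \mathrm{KL}(P_{data} \,\|\, P_{G})
% Adversarial training: closer to the reverse KL; P_G is penalized for mass
% placed where P_data has none, so it concentrates on ("fills in") modes.
\min_{\theta} \; \mathrm{KL}(P_{G} \,\|\, P_{data})
```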

### Summary

IRL methods are in fact mathematically equivalent to GANs; in particular, a sample-based algorithm for maximum entropy IRL is equivalent to a GAN in which the generator's density can be evaluated.

### Definition

Boltzmann distribution

$$p_{\theta}(\tau)=\frac{1}{Z}e^{-E_{\theta}(\tau)}$$

Partition function

$$Z=\int e^{-E_{\theta}(x)}\,dx$$
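
As a quick sanity check on these definitions (my own toy example, not from the paper), $Z$ for a simple 1-D energy can be estimated by importance sampling with a uniform proposal; the quadratic `energy` below is chosen so the true value $\sqrt{2\pi}\approx 2.507$ is known.

```python
import numpy as np

# Toy 1-D energy (hypothetical): E(x) = x^2 / 2, so the Boltzmann
# distribution p(x) = e^{-E(x)} / Z is a standard normal and Z = sqrt(2*pi).
def energy(x):
    return 0.5 * x ** 2

# Importance-sampling estimate of Z = \int e^{-E(x)} dx using a
# uniform proposal q(x) = 1 / (b - a) on [a, b].
def estimate_Z(n_samples=100_000, a=-10.0, b=10.0, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.uniform(a, b, size=n_samples)
    q = 1.0 / (b - a)
    return np.mean(np.exp(-energy(x)) / q)  # E_q[e^{-E(x)} / q(x)]

print(estimate_Z())  # ≈ 2.507 = sqrt(2 * pi)
```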

Discriminator loss

$$\mathcal{L}_{discriminator}(D)=\mathbb{E}_{x \sim p}[-\log D(x)]+\mathbb{E}_{x \sim G}[-\log (1-D(x))]$$

Generator loss

$$\mathcal{L}_{generator}(G)=\mathbb{E}_{x \sim G}[-\log D(x)]+\mathbb{E}_{x \sim G}[\log (1-D(x))]$$
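
A minimal sketch of these two objectives (my own, assuming `d_real` and `d_fake` are batches of discriminator outputs $D(x)\in(0,1)$ on data samples $x\sim p$ and generator samples $x\sim G$):

```python
import torch

def discriminator_loss(d_real: torch.Tensor, d_fake: torch.Tensor, eps: float = 1e-8):
    # E_{x~p}[-log D(x)] + E_{x~G}[-log(1 - D(x))]
    return -torch.log(d_real + eps).mean() - torch.log(1.0 - d_fake + eps).mean()

def generator_loss(d_fake: torch.Tensor, eps: float = 1e-8):
    # E_{x~G}[-log D(x)] + E_{x~G}[log(1 - D(x))]
    return -torch.log(d_fake + eps).mean() + torch.log(1.0 - d_fake + eps).mean()
```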

### Calculation

Write the energy $E_{\theta}$ as a cost function $c_{\theta}$:

$$Z=\int e^{-c_{\theta}(\tau)}\,d\tau$$

Estimate $Z$ with samples from a sampling distribution $q(\tau)$:

$$\mathcal{L}_{cost}(\theta)=\mathbb{E}_{\tau \sim p}[c_{\theta}(\tau)]+\log\Big(\mathbb{E}_{\tau \sim q}\Big[\frac{e^{-c_{\theta}(\tau)}}{q(\tau)}\Big]\Big)$$
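
A sketch of this importance-sampled objective (variable names are mine: `c_demo` is $c_{\theta}(\tau)$ on demonstrations $\tau\sim p$, `c_samp` is $c_{\theta}(\tau)$ on sampled trajectories $\tau\sim q$, and `log_q` is $\log q(\tau)$ for those same samples), with the log of the expectation computed stably via `logsumexp`:

```python
import torch

def cost_loss(c_demo: torch.Tensor, c_samp: torch.Tensor, log_q: torch.Tensor):
    # log E_{tau~q}[ e^{-c_theta(tau)} / q(tau) ]
    #   = logsumexp(-c_samp - log_q) - log N
    log_weights = -c_samp - log_q
    n = torch.tensor(float(log_weights.numel()))
    log_Z_hat = torch.logsumexp(log_weights, dim=0) - torch.log(n)
    # E_{tau~p}[c_theta(tau)] + log(Z_hat)
    return c_demo.mean() + log_Z_hat
```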

The sampler $q(\tau)$ is usually updated by minimizing the KL divergence between $q(\tau)$ and $\frac{1}{Z}e^{-c_{\theta}(\tau)}$, which is equivalent to minimizing the learned cost while maximizing entropy (minimizing cross-entropy is equivalent to minimizing KL divergence; see the Deep Learning book, p. 49).

$$\mathcal{L}_{sampler}(q)=\mathbb{E}_{\tau \sim q}[c_{\theta}(\tau)]+\mathbb{E}_{\tau \sim q}[\log q(\tau)]$$
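
Expanding the KL divergence makes this equivalence explicit (a standard one-line derivation, added here for completeness):

```latex
\mathrm{KL}\left(q(\tau)\,\middle\|\,\tfrac{1}{Z}e^{-c_{\theta}(\tau)}\right)
  = \mathbb{E}_{\tau \sim q}[\log q(\tau)]
  + \mathbb{E}_{\tau \sim q}[c_{\theta}(\tau)]
  + \log Z
```

Since $\log Z$ does not depend on $q$, minimizing this KL is exactly minimizing $\mathcal{L}_{sampler}(q)$: the expected learned cost minus the entropy of $q$.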

To prevent …
