July 28 Deep Learning Notes: GAN

Ashen_0nee

Tags: deep learning, generative adversarial networks, machine learning

Article link: https://blog.csdn.net/Ashen_0nee/article/details/126028725

Contents

Preface
I. Theory behind GAN
II. Framework of GAN
- 1. f-divergence
- 2. Fenchel Conjugate
III. Tips for Improving GAN

Preface

These are my deep learning notes for July 28, organized into three sections:

- Theory behind GAN: Generation, max V(G, D), Algorithm;
- Framework of GAN: f-divergence, Fenchel Conjugate;
- Tips for Improving GAN: JS divergence is not suitable, Least Squares GAN (LSGAN), Wasserstein GAN (WGAN).

I. Theory behind GAN

1. Generation

$x$: an image (a high-dimensional vector);
Goal: find the data distribution $P_{data}(x)$.

(1) Maximum Likelihood Estimation = Minimizing KL Divergence

Given a data distribution $P_{data}(x)$ (we can sample from it, but we do not know its form);
We have a distribution $P_G(x; \theta)$ parameterized by $\theta$:
- we want to find $\theta$ such that $P_G(x; \theta)$ is close to $P_{data}(x)$;
Sample $\{x^1, x^2, \dots, x^m\}$ from $P_{data}(x)$:
- we can compute $P_G(x^i; \theta)$;
- Likelihood of generating the samples:
  $\prod_{i=1}^{m}P_G(x^i;\theta )$
- Find $\theta^*$ maximizing the likelihood:
  $\theta^* = \arg\max_\theta \prod_{i=1}^{m}P_G(x^i;\theta ) = \arg\max_\theta \log \prod_{i=1}^{m}P_G(x^i;\theta ) = \arg\max_\theta \sum_{i=1}^{m} \log P_G(x^i; \theta)\\ \approx \arg\max_\theta E_{x \sim P_{data}}[\log P_G(x; \theta)]\\ = \arg\max_\theta \int_{x}P_{data}(x)\log P_G(x;\theta )dx - \int_{x}P_{data}(x)\log P_{data}(x)dx\\ = \arg\min_\theta KL(P_{data}||P_G)$
  The second integral does not depend on $\theta$, so subtracting it leaves the maximizer unchanged and turns the objective into $-KL(P_{data}||P_G)$.
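The equivalence above can be checked numerically on a small discrete case. The sketch below is only an illustration under assumed values: $P_{data}$ is taken to be Bernoulli(0.3), $P_G(x; \theta)$ is Bernoulli($\theta$), and the candidate grid for $\theta$ is arbitrary. It uses the exact expectation rather than samples.

```python
import math

def expected_log_likelihood(p_data, p_model):
    """E_{x ~ P_data}[log P_G(x; theta)] for discrete distributions."""
    return sum(pd * math.log(pm) for pd, pm in zip(p_data, p_model) if pd > 0)

def kl_divergence(p, q):
    """KL(P || Q) = sum_x P(x) log(P(x) / Q(x))."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Assumed toy setup: P_data is Bernoulli(0.3), P_G(x; theta) is Bernoulli(theta).
p_data = [0.7, 0.3]
thetas = [i / 100 for i in range(1, 100)]  # candidate parameters 0.01 .. 0.99

# Maximize the expected log-likelihood and, separately, minimize the KL divergence.
best_by_likelihood = max(
    thetas, key=lambda t: expected_log_likelihood(p_data, [1 - t, t]))
best_by_kl = min(
    thetas, key=lambda t: kl_divergence(p_data, [1 - t, t]))

print(best_by_likelihood, best_by_kl)
```

Both searches select the same $\theta$, because the two objectives differ only by a term that does not depend on $\theta$.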

(2) Generator

A generator $G$ is a network. The network defines a probability distribution $P_G$.

We want the generator whose distribution $P_G$ has the smallest divergence from $P_{data}$:
$G^* = \arg\min_G Div(P_G, P_{data})$

(3) Discriminator

Objective function for $D$:
$V(G, D) = E_{x \sim P_{data}}[\log D(x)] + E_{x \sim P_G}[\log(1 - D(x))]$

Training (with $G$ fixed):
$D^* = \arg\max_D V(G, D)$
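In practice the two expectations in $V(G, D)$ are replaced by averages over samples. The following is a minimal sketch of that estimate; the logistic `toy_discriminator` and the sample values are hypothetical and chosen only for illustration.

```python
import math

def estimate_v(discriminator, real_samples, generated_samples):
    """Sample estimate of V(G, D): mean log D(x) on real data
    plus mean log(1 - D(x)) on generated data."""
    real_term = sum(math.log(discriminator(x)) for x in real_samples) / len(real_samples)
    fake_term = sum(math.log(1.0 - discriminator(x))
                    for x in generated_samples) / len(generated_samples)
    return real_term + fake_term

def toy_discriminator(x):
    """Hypothetical 1-D discriminator: a logistic function of x."""
    return 1.0 / (1.0 + math.exp(-x))

real = [1.5, 2.0, 2.5]     # assumed samples from P_data
fake = [-2.0, -1.0, -1.5]  # assumed samples from P_G

v_good = estimate_v(toy_discriminator, real, fake)
v_blind = estimate_v(lambda x: 0.5, real, fake)  # D that cannot tell them apart
print(v_good, v_blind)
```

A discriminator that separates the two sample sets obtains a larger value than one that outputs 0.5 everywhere, which is why training $D$ means maximizing $V$.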

2. max V(G, D)

Given $G$, what is the optimal $D^*$ maximizing:
$V(G, D) = E_{x \sim P_{data}}[\log D(x)] + E_{x \sim P_G}[\log(1 - D(x))]\\ = \int_{x}[P_{data}(x)\log D(x) + P_G(x)\log(1 - D(x))]dx$
**Assume that $D(x)$ can be any function.** Then the integrand can be maximized separately for each $x$: given $x$, the optimal $D^*$ maximizes:
$P_{data}(x)\log D(x) + P_G(x)\log(1 - D(x))$
Let:
$a = P_{data}(x)\quad b = P_G(x)\quad D = D(x)$
Find $D^*$ maximizing: $f(D) = a\log(D) + b\log(1 - D)$

$\frac{df(D)}{dD} = a\times \frac{1}{D} + b\times \frac{1}{1 - D}\times (-1) = 0\\ D^* = \frac{a}{a + b}\\ 0 < D^*(x) = \frac{P_{data}(x)}{P_{data}(x) + P_G(x)} < 1\\ \max_D V(G, D) = V(G, D^*) = -2\log 2 + 2JSD(P_{data} || P_G)$

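Jensen-Shannon divergence:
$JSD(P || Q) = \frac{1}{2}KL(P || M) + \frac{1}{2}KL(Q || M)\\ M = \frac{1}{2}(P + Q)$
Since $\max_D V(G, D)$ equals the JS divergence up to constants, finding the best generator is a min-max problem:
$G^* = \arg\min_G \max_D V(G, D)\\ D^* = \arg\max_D V(G, D)$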
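The closed form $D^* = \frac{a}{a + b}$ and the relation between $V(G, D^*)$ and the JS divergence can be verified numerically. This is a small sketch on discrete distributions; the two probability vectors are arbitrary assumed values.

```python
import math

def kl(p, q):
    """KL(P || Q) for discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def jsd(p, q):
    """Jensen-Shannon divergence with M = (P + Q) / 2."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def v(p_data, p_g, d):
    """V(G, D) = sum_x [P_data(x) log D(x) + P_G(x) log(1 - D(x))]."""
    return sum(a * math.log(dx) + b * math.log(1 - dx)
               for a, b, dx in zip(p_data, p_g, d))

p_data = [0.5, 0.3, 0.2]  # assumed data distribution
p_g = [0.2, 0.2, 0.6]     # assumed generator distribution

# Optimal discriminator from the derivation: D*(x) = a / (a + b).
d_star = [a / (a + b) for a, b in zip(p_data, p_g)]
v_star = v(p_data, p_g, d_star)
v_other = v(p_data, p_g, [0.5, 0.5, 0.5])  # any other D gives a smaller value

print(v_star, -2 * math.log(2) + 2 * jsd(p_data, p_g), v_other)
```

The first two printed numbers agree, and the third is smaller, consistent with $D^*$ being the maximizer.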

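3. Algorithm

To find the best $G$ minimizing the loss function $L(G) = \max_D V(G, D)$, use gradient descent:
$\theta_G \leftarrow \theta_G - \eta \frac{\partial L(G)}{\partial \theta_G}$
$L(G)$ contains a max, but it can still be differentiated. For example, for $f(x) = \max\{f_1(x), f_2(x), f_3(x)\}$, the gradient at $x$ is the gradient of whichever $f_i$ attains the maximum at $x$. The algorithm is therefore:
- Given $G_0$;
- Find $D_0^*$ maximizing $V(G_0, D)$ (gradient ascent);
- $\theta_G \leftarrow \theta_G - \eta \frac{\partial V(G, D_0^*)}{\partial \theta_G}$, which gives $G_1$;
- Find $D_1^*$ maximizing $V(G_1, D)$;
- and so on, alternating between the two updates.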
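The alternating procedure can be sketched on a toy problem where everything is discrete. In this illustration, which is an assumption and not a real network, the generator is a softmax over three outcomes with logits $\theta_G$, and the discriminator has one sigmoid logit per outcome, so $D(x)$ can be any function. The step sizes and iteration counts are arbitrary choices.

```python
import math

def softmax(logits):
    peak = max(logits)
    exps = [math.exp(l - peak) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_toy_gan(p_data, outer_steps=300, inner_steps=50, lr_d=2.0, lr_g=1.0):
    k = len(p_data)
    g_logits = [0.0] * k  # theta_G: P_G starts uniform (this is G_0)
    d_logits = [0.0] * k  # theta_D: D(x) = sigmoid(d_logits[x])
    for _ in range(outer_steps):
        p_g = softmax(g_logits)
        # Inner loop: gradient ascent on V(G, D) with G fixed, approximating D*.
        for _ in range(inner_steps):
            d = [sigmoid(z) for z in d_logits]
            # dV/dz_x = P_data(x) (1 - D(x)) - P_G(x) D(x)
            d_logits = [z + lr_d * (a * (1 - dx) - b * dx)
                        for z, a, b, dx in zip(d_logits, p_data, p_g, d)]
        d = [sigmoid(z) for z in d_logits]
        # Generator step: gradient descent on V(G, D*) with D* held fixed.
        log_fake = [math.log(1 - dx) for dx in d]
        mean_log_fake = sum(b * l for b, l in zip(p_g, log_fake))
        # dV/dg_k = P_G(k) (log(1 - D(k)) - sum_x P_G(x) log(1 - D(x)))
        g_logits = [g - lr_g * b * (l - mean_log_fake)
                    for g, b, l in zip(g_logits, p_g, log_fake)]
    return softmax(g_logits)

p_data = [0.5, 0.3, 0.2]  # assumed target distribution
p_g = train_toy_gan(p_data)
print([round(p, 3) for p in p_g])
```

Running many discriminator steps per generator step keeps $D$ close to $D^*$, so each generator update approximately follows the gradient of $L(G)$. Real GAN training usually takes only a few discriminator steps per generator step, and the trade-off is not explored here.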

II. Framework of GAN

1. f-divergence

$P$ and $Q$ are two distributions. $p(x)$ and $q(x)$ are the probabilities of sampling $x$ from $P$ and $Q$. $D_f(P || Q)$ measures the difference between $P$ and $Q$:

$D_f(P || Q) = \int_{x} q(x)f(\frac{p(x)}{q(x)})dx$

where:
- $f$ is convex;
- $f(1) = 0$.
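Different choices of $f$ recover familiar divergences. The sketch below evaluates the definition for discrete distributions, with the integral replaced by a sum; the three example functions and the probability vectors are assumptions chosen for illustration.

```python
import math

def f_divergence(p, q, f):
    """D_f(P || Q) = sum_x q(x) f(p(x) / q(x)) for discrete P, Q with q(x) > 0."""
    return sum(qx * f(px / qx) for px, qx in zip(p, q))

# Three common choices of f; each is convex with f(1) = 0.
def f_kl(u):
    return u * math.log(u)   # gives KL(P || Q)

def f_reverse_kl(u):
    return -math.log(u)      # gives KL(Q || P)

def f_chi_square(u):
    return (u - 1) ** 2      # gives the chi-square divergence

p = [0.5, 0.3, 0.2]
q = [0.2, 0.2, 0.6]

# Direct KL computation for comparison.
kl_pq = sum(px * math.log(px / qx) for px, qx in zip(p, q))

print(f_divergence(p, q, f_kl), kl_pq)
print(f_divergence(p, q, f_reverse_kl), f_divergence(p, q, f_chi_square))
```

Because $f(1) = 0$, the divergence is zero whenever $P = Q$, whichever $f$ is used.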

2. Fenchel Conjugate

Every convex function $f$ has a conjugate function $f^*$:
$f^*(t) = \max_{x \in dom(f)}\{xt - f(x)\}$
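As a worked case, take $f(x) = x\log x$. Setting the derivative of $xt - x\log x$ to zero gives $x = e^{t-1}$, so $f^*(t) = e^{t-1}$. The sketch below compares that closed form with a brute-force maximum over a grid; the grid range and spacing are assumptions.

```python
import math

def conjugate_on_grid(f, t, xs):
    """Approximate f*(t) = max_x { x t - f(x) } by searching a grid of x values."""
    return max(x * t - f(x) for x in xs)

def f(x):
    return x * math.log(x)  # convex on x > 0

# Assumed search grid: 0.001, 0.002, ..., 20.0
xs = [i / 1000 for i in range(1, 20001)]

for t in (0.0, 1.0, 2.0):
    approx = conjugate_on_grid(f, t, xs)
    exact = math.exp(t - 1)  # closed form derived above for this f
    print(t, round(approx, 4), round(exact, 4))
```

The grid search is only a check. It works here because the maximizer $e^{t-1}$ lies inside the grid for the tested values of $t$.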

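III. Tips for Improving GAN

1. JS divergence is not suitable

In most cases, $P_G$ and $P_{data}$ do not overlap. When two distributions do not overlap, their JS divergence is always $\log 2$, no matter how far apart they are, so it gives the generator no signal about whether it is getting closer.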
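The problem can be seen with point masses on a small grid. In this sketch, with assumed distributions over ten bins, one generator sits right next to the data and another sits far away, yet both have the same JS divergence from the data.

```python
import math

def kl(p, q):
    """KL(P || Q), skipping bins where P has no mass."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def jsd(p, q):
    """Jensen-Shannon divergence with M = (P + Q) / 2."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# P_data sits on bin 0; two generators sit on bin 1 (near) and bin 9 (far).
p_data = [1.0] + [0.0] * 9
near = [0.0, 1.0] + [0.0] * 8
far = [0.0] * 9 + [1.0]

print(jsd(p_data, near), jsd(p_data, far), math.log(2))
```

All three printed values coincide, so moving from `far` to `near` does not reduce the divergence at all.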

2. Least Squares GAN (LSGAN)

Replace the discriminator's sigmoid output with a linear output (replace classification with regression).
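One way to write the resulting regression losses is sketched below. The target values are an assumption (1 for real samples and 0 for generated ones, a common choice) and are not taken from these notes. The scores are raw linear outputs, so they can be any real number.

```python
def lsgan_discriminator_loss(real_scores, fake_scores):
    """Regress real scores toward 1 and generated scores toward 0 (assumed targets)."""
    real_term = sum((s - 1.0) ** 2 for s in real_scores) / len(real_scores)
    fake_term = sum(s ** 2 for s in fake_scores) / len(fake_scores)
    return 0.5 * (real_term + fake_term)

def lsgan_generator_loss(fake_scores):
    """The generator wants its samples to be scored as 1, the assumed 'real' target."""
    return 0.5 * sum((s - 1.0) ** 2 for s in fake_scores) / len(fake_scores)

# Hypothetical linear discriminator outputs for a small batch.
real_scores = [0.9, 1.2, 0.8]
fake_scores = [0.1, -0.3, 0.4]

print(lsgan_discriminator_loss(real_scores, fake_scores))
print(lsgan_generator_loss(fake_scores))
```

Unlike the log loss applied to a sigmoid output, the squared error keeps growing as a score moves away from its target, so the loss does not flatten out for samples that are already classified confidently.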

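3. Wasserstein GAN (WGAN)

A "moving plan" $\gamma$ is a matrix whose entry $\gamma(x_p, x_q)$ is the amount of mass moved from $x_p$ to $x_q$. Average distance of a plan $\gamma$:
$B(\gamma) = \sum_{x_p, x_q} \gamma(x_p, x_q) ||x_p - x_q||$
Earth Mover's Distance (the cost of the best plan, where $\Pi$ is the set of all valid plans):
$W(P, Q) = \min_{\gamma \in \Pi} B(\gamma)$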
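For one-dimensional histograms the minimum does not require a search, because the cost of the best plan equals the area between the two cumulative distributions. The sketch below uses that shortcut, which is specific to 1-D, and also evaluates $B(\gamma)$ for one explicit plan. The bin positions and histograms are assumed values.

```python
def average_distance(plan, positions):
    """B(gamma) = sum over (x_p, x_q) of gamma(x_p, x_q) * |x_p - x_q|."""
    n = len(positions)
    return sum(plan[i][j] * abs(positions[i] - positions[j])
               for i in range(n) for j in range(n))

def emd_1d(p, q, positions):
    """Earth Mover's Distance between 1-D histograms on sorted positions,
    computed as the area between the two cumulative distributions."""
    total, cdf_gap = 0.0, 0.0
    for i in range(len(positions) - 1):
        cdf_gap += p[i] - q[i]  # mass that must cross to the right of bin i
        total += abs(cdf_gap) * (positions[i + 1] - positions[i])
    return total

positions = [0.0, 1.0, 2.0, 3.0]
p = [1.0, 0.0, 0.0, 0.0]
q_near = [0.0, 1.0, 0.0, 0.0]
q_far = [0.0, 0.0, 0.0, 1.0]

# One explicit moving plan: ship all of P's mass from position 0 to position 3.
plan_far = [[0.0, 0.0, 0.0, 1.0]] + [[0.0] * 4 for _ in range(3)]

print(emd_1d(p, q_near, positions), emd_1d(p, q_far, positions))
print(average_distance(plan_far, positions))
```

Unlike the JS divergence, the distance grows as the two histograms move apart, so it still distinguishes a near miss from a far one when the distributions do not overlap.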
