A Style-Based Generator Architecture for Generative Adversarial Networks（CVPR19）

o0Helloworld0o

Category: Reading notes

Article link: https://blog.csdn.net/o0Helloworld0o/article/details/103718312

A very good article introducing StyleGAN: https://www.lyrn.ai/2018/12/26/a-style-based-generator-architecture-for-generative-adversarial-networks/

2. Style-based generator

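[Figure 1: (a) a traditional generator, (b) the style-based generator]
As shown in Figure 1a, the input layer of a traditional generator receives a latent code $\mathbf{z}\in\mathcal{Z}$ and generates an image from it. (This is simply the noise of an ordinary GAN; the authors call it the latent code to distinguish it from the noise injected inside the generator.)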

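As shown in Figure 1b, this paper abandons the traditional generator design: the input layer of the generator receives a learned constant instead (Const 4x4x512 in the figure).
In addition, a mapping network $f:\mathcal{Z}\rightarrow\mathcal{W}$ converts the latent code $\mathbf{z}$ into an intermediate latent code $\mathbf{w}\in\mathcal{W}$.
In the experiments, $\mathbf{z}$ and $\mathbf{w}$ are both 512-dimensional, and $f$ is an 8-layer MLP.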

$\mathbf{w}$ is turned into styles $\mathbf{y}=(\mathbf{y}_s, \mathbf{y}_b)$ by a learned affine transform (the subscripts $s$ and $b$ stand for scale and bias), and $\mathbf{y}$ is then used to perform adaptive instance normalization (AdaIN):
${\rm AdaIN}(\mathbf{x}_i,\mathbf{y})=\mathbf{y}_{s,i}\frac{\mathbf{x}_i-\mu(\mathbf{x}_i)}{\sigma(\mathbf{x}_i)}+\mathbf{y}_{b,i} \qquad(1)$
where $\mathbf{x}_i$ is a feature map at a given resolution, normalized on its own. Applying AdaIN replaces the style of $\mathbf{x}_i$ with the one specified by $\mathbf{y}=(\mathbf{y}_s, \mathbf{y}_b)$.
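
Equation (1) can be illustrated with a minimal pure-Python sketch. It assumes each feature map is stored as a flat list of floats; the function name `adain` and the small `eps` added for numerical safety are illustrative choices that do not appear in Eq. (1).

```python
import math

def adain(x, y_s, y_b, eps=1e-8):
    """Apply Eq. (1) to every feature map x_i (one flat list of floats per channel)."""
    out = []
    for x_i, s_i, b_i in zip(x, y_s, y_b):
        mu = sum(x_i) / len(x_i)                           # mu(x_i)
        var = sum((v - mu) ** 2 for v in x_i) / len(x_i)
        sigma = math.sqrt(var + eps)                       # sigma(x_i); eps is an assumption
        out.append([s_i * (v - mu) / sigma + b_i for v in x_i])
    return out

# Two toy feature maps with four "pixels" each.
x = [[1.0, 2.0, 3.0, 4.0], [10.0, 10.0, 30.0, 30.0]]
styled = adain(x, y_s=[2.0, 0.5], y_b=[1.0, -1.0])
print(styled[1])
```

After the call, each channel has mean $\mathbf{y}_{b,i}$ and standard deviation close to $|\mathbf{y}_{s,i}|$, whatever statistics it had before. This is the sense in which the style is overwritten.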

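To generate stochastic detail, explicit noise inputs are additionally introduced. Each one is essentially a single-channel image of uncorrelated Gaussian noise; it is first scaled with learned per-feature scaling factors and then added to the feature map, as shown on the right-hand side of Figure 1b.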
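
The noise injection described above might be sketched as follows. This is an illustration under assumptions: feature maps are flat lists, `add_noise` and `noise_scale` are hypothetical names, and the scaling factors are fixed numbers here rather than learned parameters.

```python
import random

def add_noise(x, noise_scale, rng):
    """Add one shared single-channel noise image to every channel of x.

    x: list of channels, each a flat list over the H*W pixels.
    noise_scale: one scaling factor per channel (learned in the real model).
    """
    num_pixels = len(x[0])
    noise = [rng.gauss(0.0, 1.0) for _ in range(num_pixels)]  # uncorrelated Gaussian noise
    return [[v + s * n for v, n in zip(x_c, noise)] for x_c, s in zip(x, noise_scale)]

rng = random.Random(0)
x = [[0.0] * 4, [1.0] * 4, [2.0] * 4]
out = add_noise(x, noise_scale=[0.0, 0.1, 1.0], rng=rng)
```

A channel whose scaling factor is 0 ignores the noise entirely, so the network can decide per feature how much randomness to let in.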

2.1. Quality of generated images

[Table 1]
As shown in Table 1, experiments were run with six models (A-F).

The purpose of mixing regularization is to decorrelate neighboring styles.

3. Properties of the style-based generator

The effects of each style are localized in the network, i.e., modifying a specific subset of the styles can be expected to affect only certain aspects of the image.

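The paper wants the learned styles to be localized in the network: rather than being entangled with one another, each style should control one aspect of the image. Put differently, if the styles were not localized in the network, modifying a subset of the styles would affect the image globally.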

3.1. Style mixing

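During training, two latent codes $\mathbf{z}_1,\mathbf{z}_2$ are used and mapped to $\mathbf{w}_1,\mathbf{w}_2$. A layer of the synthesis network is picked at random as the crossover point: the layers before it use $\mathbf{w}_1$ and the layers after it use $\mathbf{w}_2$.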
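
The crossover can be sketched in a few lines. The function `mix_styles` is hypothetical, the strings stand in for the two 512-dimensional vectors, and the figure of 18 style inputs (two per resolution from 4x4 to 1024x1024) is an assumption made for the example.

```python
import random

def mix_styles(w1, w2, num_layers, rng):
    """Return one style vector per layer: w1 before a random crossover point, w2 from it on."""
    crossover = rng.randint(1, num_layers - 1)  # each code drives at least one layer
    per_layer = [w1 if layer < crossover else w2 for layer in range(num_layers)]
    return per_layer, crossover

rng = random.Random(0)
w1, w2 = "w1", "w2"  # stand-ins for the two intermediate latent codes
per_layer, crossover = mix_styles(w1, w2, num_layers=18, rng=rng)
print(crossover, per_layer)
```

Because adjacent layers are sometimes driven by unrelated codes, the network cannot rely on neighboring styles being correlated.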

Figure 3 shows the effect of style mixing.

3.2. Stochastic variation

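Stochastic variation refers to small attributes of a face image (e.g. hair, stubble, freckles, skin pores).

A traditional generator struggles to produce such fine stochastic variation. In this paper's generator, noise is added to the feature maps at every scale, so the noise added to the large feature maps shows up as variation in facial detail (e.g. freckles). The underlying idea is to spread the noise over all resolutions, so that it controls features ranging from global to local.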

4. Disentanglement studies

The authors propose two quantitative metrics for measuring how linear (i.e. how disentangled) the latent space is: perceptual path length and linear separability.

B. Truncation trick in $\mathcal{W}$

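Regions of low density in the training data distribution are hard to learn. Earlier work shows that truncating the latent vectors, or otherwise shrinking the sampling space, improves the quality of the generated images, at the cost of some loss of variation.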

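The same strategy is used here. First compute the centroid of $\mathcal{W}$, $\bar{\mathbf{w}}=\mathbb{E}_{\mathbf{z}\sim P(\mathbf{z})}[f(\mathbf{z})]$; on the FFHQ dataset the centroid can be seen as an average face (see $\psi=0$ in Figure 8).
For a given $\mathbf{w}$, a parameter $\psi<1$ controls how far it deviates from the centroid: $\mathbf{w}'=\bar{\mathbf{w}}+\psi(\mathbf{w}-\bar{\mathbf{w}})$
[Figure 8: effect of the truncation trick for different values of $\psi$]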
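
The truncation formula is a plain linear interpolation towards the centroid. A minimal sketch with toy three-dimensional vectors (the name `truncate` and the numbers are illustrative only):

```python
def truncate(w, w_avg, psi):
    """Move w towards the centroid: w' = w_avg + psi * (w - w_avg)."""
    return [a + psi * (v - a) for v, a in zip(w, w_avg)]

w_avg = [0.5, -1.0, 2.0]   # toy centroid of W
w = [1.5, 1.0, 0.0]        # toy intermediate latent code

print(truncate(w, w_avg, psi=0.0))  # [0.5, -1.0, 2.0], the centroid itself
print(truncate(w, w_avg, psi=0.5))  # [1.0, 0.0, 1.0], halfway
print(truncate(w, w_avg, psi=1.0))  # [1.5, 1.0, 0.0], w unchanged
```

With $\psi=0$ every sample collapses onto the average face, and with $\psi=1$ no truncation happens. Values in between trade variation for quality.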
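【Summary】
StyleGAN's contribution is a novel generator architecture that borrows the notion of style from style transfer, introduces an intermediate latent space $\mathcal{W}$, and processes the feature maps layer by layer, so that it can ultimately generate large, high-resolution faces (something ordinary GANs find hard to do). Its weaknesses are that StyleGAN is not conditional, so faces of a specified class cannot be generated, and that training is very resource-hungry (one week on 8 V100 GPUs). How to make good use of the released pretrained StyleGAN models is therefore also a question worth considering.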

Re-reading the paper

Abstract

We propose an alternative generator architecture for generative adversarial networks, borrowing from style transfer literature.
This mainly refers to AdaIN.

The new architecture leads to an automatically learned, unsupervised separation of high-level attributes (e.g., pose and identity when trained on human faces) and stochastic variation in the generated images (e.g., freckles, hair), and it enables intuitive, scale-specific control of the synthesis.
My reading: high-level attributes are the usual attributes, while stochastic variation covers random minutiae (individual strands of hair, the position of freckles, etc.).

Our generator starts from a learned constant input and adjusts the “style” of the image at each convolution layer based on the latent code, therefore directly controlling the strength of image features at different scales.
So the constant input is learned?

Combined with noise injected directly into the network, this architectural change leads to automatic, unsupervised separation of high-level attributes (e.g., pose, identity) from stochastic variation (e.g., freckles, hair) in the generated images, and enables intuitive scale-specific mixing and interpolation operations.
So the attributes are controlled by the styles specified by the latent code, while the random factors come from the injected noise.

The input latent space must follow the probability density of the training data, and we argue that this leads to some degree of unavoidable entanglement. Our intermediate latent space is free from that restriction and is therefore allowed to be disentangled.

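【Code walkthrough】

Training script train.py

The `if 1:` block uses EasyDict to define most of what training involves, including
training_loop
the architectures of G and D, the training parameters and the loss functions
the dataset
the training schedule
the visualization output written during training
metrics
submit_config (run settings such as num_gpus; the minibatch settings minibatch_base and minibatch_dict belong to the schedule, see sched_args below)
tf_config (TensorFlow settings, here the random seed)
desc, the experiment name, e.g. sgan-ffhq-8gpu

Everything defined in the `if 1:` block is then merged into kwargs, and a single line, dnnlib.submit_run(**kwargs), launches the experiment.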

kwargs contains the following key/value pairs

run_func_name   'training.training_loop.training_loop' (a string)
mirror_augment  True
total_kimg      25000

G_args          {'func_name': 'training.networks_stylegan.G_style'}
D_args          {'func_name': 'training.networks_stylegan.D_basic'}
G_opt_args      {'beta1': 0.0, 'beta2': 0.99, 'epsilon': 1e-08}
D_opt_args      {'beta1': 0.0, 'beta2': 0.99, 'epsilon': 1e-08}
G_loss_args     {'func_name': 'training.loss.G_logistic_nonsaturating'}
D_loss_args     {'func_name': 'training.loss.D_logistic_simplegp', 'r1_gamma': 10.0}
dataset_args    {'tfrecord_dir': 'ffhq277'}

sched_args      {
                    'minibatch_base': 4,
                    'minibatch_dict': {4: 128, 8: 128, 16: 128, 32: 64, 64: 32, 128: 16, 256: 8, 512: 4},
                    'lod_initial_resolution': 8,
                    'G_lrate_dict': {128: 0.0015, 256: 0.002, 512: 0.003, 1024: 0.003},
                    'D_lrate_dict': {128: 0.0015, 256: 0.002, 512: 0.003, 1024: 0.003}
                }

grid_args       {'size': '4k', 'layout': 'random'}
metric_arg_list [metric_base.fid50k]
tf_config       {'rnd.np_random_seed': 1000}

submit_config   {
                    'run_dir_root': 'results',
                    'run_desc': 'sgan-ffhq277-1gpu',
                    'run_dir_ignore': ['__pycache__', '*.pyproj', '*.sln', '*.suo', '.cache', '.idea', '.vs', '.vscode', 'results', 'datasets', 'cache'],
                    'run_dir_extra_files': None,
                    'submit_target': <SubmitTarget.LOCAL: 1>,
                    'num_gpus': 1,
                    'print_info': False,
                    'ask_confirmation': False,
                    'run_id': None,
                    'run_name': None,
                    'run_dir': None,
                    'run_func_name': None,
                    'run_func_kwargs': None,
                    'user_name': None,
                    'task_name': None,
                    'host_name': 'localhost'
                }
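
One plausible reading of the sched_args entries in the dump above is that the dictionaries are keyed by the current training resolution, with a base value as fallback for resolutions that are not listed. The sketch below only illustrates that lookup. `schedule_for` is a hypothetical helper and `G_lrate_base = 0.001` is an assumed fallback that does not appear in the dump.

```python
minibatch_base = 4
minibatch_dict = {4: 128, 8: 128, 16: 128, 32: 64, 64: 32, 128: 16, 256: 8, 512: 4}
G_lrate_dict = {128: 0.0015, 256: 0.002, 512: 0.003, 1024: 0.003}
G_lrate_base = 0.001  # assumption: fallback learning rate, not shown in the dump

def schedule_for(resolution):
    """Look up minibatch size and generator learning rate for one resolution."""
    minibatch = minibatch_dict.get(resolution, minibatch_base)
    lrate = G_lrate_dict.get(resolution, G_lrate_base)
    return minibatch, lrate

for res in (8, 64, 256, 1024):
    print(res, schedule_for(res))
```

Under this reading, low resolutions train with large minibatches and the base learning rate, while 1024x1024 falls back to minibatch_base because it has no entry in minibatch_dict.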

Stepping into the submit_run function

def submit_run(submit_config: SubmitConfig, run_func_name: str, **run_func_kwargs) -> None:
Here submit_config and run_func_name are explicitly named parameters; every other entry of kwargs is collected into run_func_kwargs.
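
How `**kwargs` is split by such a signature is ordinary Python keyword binding. The stand-in below has the same parameter layout as the signature above but only reports the split; it is not the dnnlib implementation, and the dictionary is a shortened version of the real kwargs.

```python
def submit_run(submit_config, run_func_name, **run_func_kwargs):
    """Stand-in with the same parameter layout; it only returns what it received."""
    return submit_config, run_func_name, run_func_kwargs

kwargs = {
    "submit_config": {"num_gpus": 1},
    "run_func_name": "training.training_loop.training_loop",
    "mirror_augment": True,
    "total_kimg": 25000,
}

cfg, name, rest = submit_run(**kwargs)
print(name)          # training.training_loop.training_loop
print(sorted(rest))  # ['mirror_augment', 'total_kimg']
```

The two named keys are bound to their parameters, and everything else lands in run_func_kwargs.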

Next, the user_name, run_func_name and run_func_kwargs fields of submit_config are updated, so submit_config becomes
{
     'run_dir_root': 'results',
     'run_desc': 'sgan-ffhq277-1gpu',
     'run_dir_ignore': ['__pycache__', '*.pyproj', '*.sln', '*.suo', '.cache', '.idea', '.vs', '.vscode', 'results', 'datasets', 'cache'],
     'run_dir_extra_files': None,
     'submit_target': <SubmitTarget.LOCAL: 1>,
     'num_gpus': 1,
     'print_info': False,
     'ask_confirmation': False,
     'run_id': None,
     'run_name': None,
     'run_dir': None,
     'run_func_name' (updated): 'training.training_loop.training_loop',
     'run_func_kwargs' (updated): all the remaining settings are packed in here,
     'user_name' (updated): 'bowen.pbw',
     'task_name': None,
     'host_name': 'localhost'
 }

Further down, the run directory is created and some files are copied into it
Creating the run dir root: results
Creating the run dir: results/00000-sgan-ffhq277-1gpu
Copying files to the run dir

results
   └─00000-sgan-ffhq277-1gpu
          ├─run.py
          ├─src (directory)
          ├─submit_config.pkl
          └─submit_config.txt

In the end, submit_config becomes
{
     'run_dir_root': 'results',
     'run_desc': 'sgan-ffhq277-1gpu',
     'run_dir_ignore': ['__pycache__', '*.pyproj', '*.sln', '*.suo', '.cache', '.idea', '.vs', '.vscode', 'results', 'datasets', 'cache'],
     'run_dir_extra_files': None,
     'submit_target': <SubmitTarget.LOCAL: 1>,	# of type <enum 'SubmitTarget'>
     'num_gpus': 1,
     'print_info': False,
     'ask_confirmation': False,
     'run_id': 0,
     'run_name': '00000-sgan-ffhq277-1gpu',
     'run_dir': 'results/00000-sgan-ffhq277-1gpu',
     'run_func_name': 'training.training_loop.training_loop',
     'run_func_kwargs': all the remaining settings are packed in here,
     'user_name': 'bowen.pbw',
     'task_name': 'bowen.pbw-00000-sgan-ffhq277-1gpu',
     'host_name': 'localhost'
 }
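
The derived fields in this dump follow a visible naming pattern, which the sketch below reproduces from the values shown. `describe_run` is a hypothetical helper. How dnnlib actually picks the next run_id is not covered here, and the function simply takes it as an argument.

```python
def describe_run(run_dir_root, run_id, run_desc, user_name):
    """Rebuild run_name, run_dir and task_name using the pattern seen in the dump."""
    run_name = "{:05d}-{}".format(run_id, run_desc)
    return {
        "run_id": run_id,
        "run_name": run_name,
        "run_dir": "{}/{}".format(run_dir_root, run_name),
        "task_name": "{}-{}".format(user_name, run_name),
    }

info = describe_run("results", 0, "sgan-ffhq277-1gpu", "bowen.pbw")
for key, value in info.items():
    print(key, value)
```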

The last statement is run_wrapper(submit_config)

Stepping into the run_wrapper function

util.call_func_by_name(func_name=submit_config.run_func_name, submit_config=submit_config, **submit_config.run_func_kwargs)
This amounts to calling the function training.training_loop.training_loop
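
Calling a function by its dotted name can be done with the standard library alone. The sketch below shows the general technique and is not a copy of dnnlib's util.call_func_by_name; `call_by_dotted_name` is a hypothetical name, and `math.hypot` stands in for the training loop.

```python
import importlib

def call_by_dotted_name(func_name, *args, **kwargs):
    """Import the module part of 'pkg.module.func', fetch the function, and call it."""
    module_name, _, attr = func_name.rpartition(".")
    func = getattr(importlib.import_module(module_name), attr)
    return func(*args, **kwargs)

result = call_by_dotted_name("math.hypot", 3.0, 4.0)
print(result)  # 5.0
```

Storing the target as a string is what allows the whole configuration, including the function to run, to be pickled into submit_config.pkl.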

Stepping into training.training_loop.training_loop; its comments explain what each parameter means.
The arguments passed are as follows

submit_config   {
                    'run_dir_root': 'results',
                    'run_desc': 'sgan-ffhq277-1gpu',
                    'run_dir_ignore': ['__pycache__', '*.pyproj', '*.sln', '*.suo', '.cache', '.idea', '.vs', '.vscode', 'results', 'datasets', 'cache'],
                    'run_dir_extra_files': None,
                    'submit_target': <SubmitTarget.LOCAL: 1>,
                    'num_gpus': 1,
                    'print_info': False,
                    'ask_confirmation': False,
                    'run_id': 0,
                    'run_name': '00000-sgan-ffhq277-1gpu',
                    'run_dir': 'results/00000-sgan-ffhq277-1gpu',
                    'run_func_name': 'training.training_loop.training_loop',
                    'run_func_kwargs': {
                        'mirror_augment': True,
                        'total_kimg': 25000,
                        'G_args': {'func_name': 'training.networks_stylegan.G_style'},
                        'D_args': {'func_name': 'training.networks_stylegan.D_basic'},
                        'G_opt_args': {'beta1': 0.0, 'beta2': 0.99, 'epsilon': 1e-08},
                        'D_opt_args': {'beta1': 0.0, 'beta2': 0.99, 'epsilon': 1e-08},
                        'G_loss_args': {'func_name': 'training.loss.G_logistic_nonsaturating'},
                        'D_loss_args': {'func_name': 'training.loss.D_logistic_simplegp', 'r1_gamma': 10.0},
                        'dataset_args': {'tfrecord_dir': 'ffhq277'},
                        'sched_args': {
                            'minibatch_base': 4,
                            'minibatch_dict': {4: 128, 8: 128, 16: 128, 32: 64, 64: 32, 128: 16, 256: 8, 512: 4},
                            'lod_initial_resolution': 8,
                            'G_lrate_dict': {128: 0.0015, 256: 0.002, 512: 0.003, 1024: 0.003},
                            'D_lrate_dict': {128: 0.0015, 256: 0.002, 512: 0.003, 1024: 0.003}
                        },
                        'grid_args': {'size': '4k', 'layout': 'random'},
                        'metric_arg_list': [{'func_name': 'metrics.frechet_inception_distance.FID', 'name': 'fid50k', 'num_images': 50000, 'minibatch_per_gpu': 8}],
                        'tf_config': {'rnd.np_random_seed': 1000}
                    },
                    'user_name': 'bowen.pbw',
                    'task_name': 'bowen.pbw-00000-sgan-ffhq277-1gpu',
                    'host_name': 'localhost'
                }
G_args  		{'func_name': 'training.networks_stylegan.G_style'}
D_args  		{'func_name': 'training.networks_stylegan.D_basic'}
G_opt_args      {'beta1': 0.0, 'beta2': 0.99, 'epsilon': 1e-08}
D_opt_args      {'beta1': 0.0, 'beta2': 0.99, 'epsilon': 1e-08}
G_loss_args     {'func_name': 'training.loss.G_logistic_nonsaturating'}
D_loss_args     {'func_name': 'training.loss.D_logistic_simplegp', 'r1_gamma': 10.0}
dataset_args    {'tfrecord_dir': 'ffhq277'}
sched_args      {
                    'minibatch_base': 4,
                    'minibatch_dict': {4: 128, 8: 128, 16: 128, 32: 64, 64: 32, 128: 16, 256: 8, 512: 4},
                    'lod_initial_resolution': 8,
                    'G_lrate_dict': {128: 0.0015, 256: 0.002, 512: 0.003, 1024: 0.003},
                    'D_lrate_dict': {128: 0.0015, 256: 0.002, 512: 0.003, 1024: 0.003}
                }
grid_args       {'size': '4k', 'layout': 'random'}
metric_arg_list [{'func_name': 'metrics.frechet_inception_distance.FID', 'name': 'fid50k', 'num_images': 50000, 'minibatch_per_gpu': 8}]
tf_config       {'rnd.np_random_seed': 1000}
G_smoothing_kimg    10
D_repeat        	1
minibatch_repeats   4
reset_opt_for_new_lod   True
total_kimg  	15000 -> 25000
mirror_augment	False -> True
drange_net		[-1, 1]
image_snapshot_ticks	1
network_snapshot_ticks	10
save_tf_graph	False
save_weight_histograms	False
resume_run_id	None
resume_snapshot	None
resume_kimg		0.0
resume_time		0.0

Next comes the following code, which creates the dataset
training_set = dataset.load_dataset(data_dir=config.data_dir, verbose=True, **dataset_args)

Create the generator
G = tflib.Network('G', num_channels=training_set.shape[0], resolution=training_set.shape[1], label_size=training_set.label_size, **G_args)

Stepping into the constructor __init__ of class Network in dnnlib/tflib/network.py

The arguments passed are as follows
name			`G`
func_name		`training.networks_stylegan.G_style`
static_kwargs	{'num_channels': 3, 'resolution': 1024, 'label_size': 0}

It then sets
self._build_func to the function object training.networks_stylegan.G_style
self._build_module_src to the source code read from training/networks_stylegan.py

Stepping into def _init_graph(self) -> None:
The relevant information is read from the function definition of G_style to fill in self.input_names (using `inspect.signature`)
self.input_names = ['latents_in', 'labels_in']
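
One way to obtain such a list with `inspect.signature` is to keep the positional parameters that have no default value. The sketch below assumes that rule. `G_style` here is a stand-in with a shortened parameter list, and `find_input_names` is a hypothetical helper.

```python
import inspect

def G_style(latents_in, labels_in, truncation_psi=0.7, is_training=False, **kwargs):
    """Stand-in with a shortened parameter list; the body is irrelevant here."""

def find_input_names(build_func):
    """Collect the parameters that have no default: these are treated as network inputs."""
    names = []
    for param in inspect.signature(build_func).parameters.values():
        is_positional = param.kind == inspect.Parameter.POSITIONAL_OR_KEYWORD
        if is_positional and param.default is inspect.Parameter.empty:
            names.append(param.name)
    return names

input_names = find_input_names(G_style)
print(input_names)  # ['latents_in', 'labels_in']
```

Parameters with defaults, and the catch-all `**kwargs`, are skipped, which matches the two names listed above.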

The other attributes are updated next
self.num_inputs = 2
self.name = 'G'
self.scope = 'G'

Set
build_kwargs = {
	'num_channels': 3,
	'resolution': 256,
	'label_size': 0,
	'is_template_graph': True,
	'components': {}
}

Call the function to build the network
out_expr = self._build_func(*self.input_templates, **build_kwargs)
where
self.input_templates = [
	<tf.Tensor 'G/latents_in:0' shape=<unknown> dtype=float32>,
	<tf.Tensor 'G/labels_in:0' shape=<unknown> dtype=float32>
]
and build_kwargs is as shown above

Stepping into G_style, the arguments are as follows

latents_in		Tensor("G/latents_in:0", dtype=float32, device=/device:GPU:0)
labels_in		Tensor("G/labels_in:0", dtype=float32, device=/device:GPU:0)
truncation_psi			0.7
truncation_cutoff		8
truncation_psi_val		None
truncation_cutoff_val	None
dlatent_avg_beta		0.995
style_mixing_prob		0.9
is_training				False
is_validation			False
is_template_graph		True
components				{}
**kwargs				{'num_channels': 3, 'resolution': 256, 'label_size': 0}

Next comes the following code, which creates a sub-network (again of type `tflib.Network`)
components.synthesis = tflib.Network('G_synthesis', func_name=G_synthesis, **kwargs)

dlatents_in		Tensor("G_synthesis/dlatents_in:0", dtype=float32, device=/device:GPU:0)
dlatent_size	512
num_channels	3
resolution		1024
fmap_base		8192
fmap_decay		1.0
fmap_max		512
use_styles		True
const_input_layer	True
use_noise		True
randomize_noise	True
nonlinearity	'lrelu'
use_wscale		True
use_pixel_norm	False
use_instance_norm	True
dtype			'float32'
fused_scale		'auto'
blur_filter		[1, 2, 1]
structure		'auto'
is_template_graph	True
force_clean_graph	False
_kwargs			{'label_size': 0, 'components': {}}
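
The parameters fmap_base, fmap_decay and fmap_max suggest the usual progressive-GAN rule for the number of feature maps per resolution: halve the channel count at every stage and clamp it at fmap_max. The sketch below assumes that rule and the stage indexing shown; whether G_synthesis computes exactly this is an assumption, and `nf` is named only for illustration.

```python
import math

def nf(stage, fmap_base=8192, fmap_decay=1.0, fmap_max=512):
    """Number of feature maps at a stage: fmap_base / 2**(stage * fmap_decay), capped at fmap_max."""
    return min(int(fmap_base / (2.0 ** (stage * fmap_decay))), fmap_max)

resolution = 1024
resolution_log2 = int(math.log2(resolution))

# Assumption: the block working at resolution 2**res uses nf(res - 1) channels.
channels = {2 ** res: nf(res - 1) for res in range(2, resolution_log2 + 1)}
for size, num_channels in channels.items():
    print("{0}x{0}: {1}".format(size, num_channels))
```

Under these assumptions the 4x4 block has 512 channels, which agrees with the Const 4x4x512 input mentioned earlier, and the count drops to 16 at 1024x1024.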

Notes on graph propagation
