A good article introducing StyleGAN: https://www.lyrn.ai/2018/12/26/a-style-based-generator-architecture-for-generative-adversarial-networks/
2. Style-based generator
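As shown in Figure 1a, the input layer of a traditional generator receives a latent code $\mathbf{z}\in\mathcal{Z}$ and generates an image from it. (This is simply the noise vector of an ordinary GAN; the authors call it the latent code to distinguish it from the noise inputs inside the generator.)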
As shown in Figure 1b, this paper abandons the traditional generator design: the input layer of the generator instead receives a learned constant (Const 4x4x512 in the figure).
In addition, a mapping network $f:\mathcal{Z}\rightarrow\mathcal{W}$ converts the latent code $\mathbf{z}$ into an intermediate latent code $\mathbf{w}\in\mathcal{W}$.
In the experiments, $\mathbf{z}$ and $\mathbf{w}$ are both 512-dimensional, and $f$ is an 8-layer MLP.
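The mapping network can be sketched in a few lines of NumPy. This is a minimal sketch, not the released implementation: the weights are random rather than trained, and the leaky ReLU slope of 0.2 and the normalization of $\mathbf{z}$ before the first layer are assumptions made for illustration.

```python
import numpy as np

def leaky_relu(x, alpha=0.2):
    # Assumed nonlinearity; the slope 0.2 is an illustrative choice.
    return np.where(x > 0, x, alpha * x)

def mapping_network(z, weights, biases):
    """Map latent codes z of shape (batch, 512) to w of shape (batch, 512)."""
    # Assumption: rescale each z to unit second moment before the MLP.
    x = z / np.sqrt(np.mean(z ** 2, axis=1, keepdims=True) + 1e-8)
    for W, b in zip(weights, biases):
        x = leaky_relu(x @ W + b)
    return x

rng = np.random.default_rng(0)
dim, num_layers = 512, 8  # sizes stated in the notes above
weights = [rng.normal(0.0, np.sqrt(2.0 / dim), size=(dim, dim))
           for _ in range(num_layers)]
biases = [np.zeros(dim) for _ in range(num_layers)]

z = rng.normal(size=(4, dim))
w = mapping_network(z, weights, biases)
print(w.shape)
```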
$\mathbf{w}$ is mapped by learned affine transforms to styles $\mathbf{y}=(\mathbf{y}_s, \mathbf{y}_b)$ (the subscripts $s$ and $b$ stand for scale and bias), which then drive adaptive instance normalization (AdaIN):
$${\rm AdaIN}(\mathbf{x}_i,\mathbf{y})=\mathbf{y}_{s,i}\frac{\mathbf{x}_i-\mu(\mathbf{x}_i)}{\sigma(\mathbf{x}_i)}+\mathbf{y}_{b,i} \qquad(1)$$
Here $\mathbf{x}_i$ is a feature map at a given resolution. Each feature map is normalized separately and then scaled and biased, so after AdaIN the style of $\mathbf{x}_i$ is the one specified by $\mathbf{y}=(\mathbf{y}_s, \mathbf{y}_b)$.
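Equation (1) can be checked numerically with a small NumPy sketch. The affine transform `A` below is a hypothetical random matrix standing in for the learned one, and the channel count and feature-map size are arbitrary; only the `adain` function mirrors the equation.

```python
import numpy as np

def adain(x, y_s, y_b, eps=1e-8):
    """Equation (1). x: (C, H, W) feature maps; y_s, y_b: (C,) styles."""
    mu = x.mean(axis=(1, 2), keepdims=True)     # per-feature-map mean
    sigma = x.std(axis=(1, 2), keepdims=True)   # per-feature-map std
    x_norm = (x - mu) / (sigma + eps)
    return y_s[:, None, None] * x_norm + y_b[:, None, None]

rng = np.random.default_rng(0)
C, H, W, w_dim = 16, 8, 8, 512
x = rng.normal(3.0, 5.0, size=(C, H, W))
w = rng.normal(size=w_dim)

# Hypothetical affine transform from w to (y_s, y_b); random, not learned.
A = rng.normal(0.0, 0.01, size=(w_dim, 2 * C))
bias = np.concatenate([np.ones(C), np.zeros(C)])
y = w @ A + bias
y_s, y_b = y[:C], y[C:]

out = adain(x, y_s, y_b)
# After AdaIN, each feature map has mean y_b[i] and std |y_s[i]|.
print(np.allclose(out.mean(axis=(1, 2)), y_b, atol=1e-6))
print(np.allclose(out.std(axis=(1, 2)), np.abs(y_s), atol=1e-5))
```

The original statistics of `x` (mean 3, std 5) are discarded entirely, which is why each AdaIN layer overrides the style set by the previous one.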
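To generate stochastic detail, explicit noise inputs are introduced. These are single-channel images of uncorrelated Gaussian noise; each is first scaled with learned per-feature scaling factors and then added to the feature map, as shown on the right side of Figure 1b.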
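The noise input described above amounts to one broadcasted multiply-add. In this minimal sketch the scaling factors are fixed by hand rather than learned, and the function name `add_noise` is hypothetical.

```python
import numpy as np

def add_noise(x, noise_scale, rng):
    """x: (C, H, W) feature maps; noise_scale: (C,) per-feature factors."""
    _, H, W = x.shape
    noise = rng.normal(size=(1, H, W))   # one channel of uncorrelated noise
    # The same noise image is shared by all channels, scaled per channel.
    return x + noise_scale[:, None, None] * noise

rng = np.random.default_rng(0)
x = np.zeros((4, 8, 8))
scale = np.array([0.0, 0.5, 1.0, 2.0])   # stand-in for learned factors
out = add_noise(x, scale, rng)
print(out.shape)
```

A factor of zero leaves a feature map untouched, so the network can learn which features should receive stochastic variation.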
2.1. Quality of generated images
As shown in Table 1, experiments were run with six generator configurations (A to F).
Mixing regularization serves to decorrelate neighboring styles.
3. Properties of the style-based generator
The effects of each style are localized in the network, i.e., modifying
a specific subset of the styles can be expected to affect only certain aspects of the image.
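The paper wants the learned styles to be localized in the network: the styles should not be correlated with one another, and each style should control one aspect of the image. Conversely, if the styles were not localized, modifying a subset of the styles would change the image globally.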
3.1. Style mixing
During training, two latent codes $\mathbf{z}_1,\mathbf{z}_2$ are mapped to $\mathbf{w}_1,\mathbf{w}_2$, and a layer of the synthesis network is chosen at random as the crossover point: the layers before it use $\mathbf{w}_1$ and the layers after it use $\mathbf{w}_2$.
Figure 3 shows the effect of style mixing.
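The crossover can be sketched as a per-layer selection between two codes. This is an illustrative sketch only: `mix_styles` is a hypothetical helper, and the layer count of 18 assumes two style inputs per resolution from 4x4 up to 1024x1024.

```python
import numpy as np

def mix_styles(w1, w2, num_layers, rng):
    """Return per-layer styles of shape (num_layers, dim) and the crossover."""
    crossover = int(rng.integers(1, num_layers))   # in 1 .. num_layers - 1
    layer_idx = np.arange(num_layers)[:, None]
    # Layers before the crossover take w1, the remaining layers take w2.
    mixed = np.where(layer_idx < crossover, w1[None, :], w2[None, :])
    return mixed, crossover

rng = np.random.default_rng(0)
w1, w2 = np.zeros(512), np.ones(512)   # dummy codes that are easy to tell apart
mixed, k = mix_styles(w1, w2, num_layers=18, rng=rng)
print(k, mixed.shape)
```

Because early layers operate at coarse resolutions, a small crossover index hands most of the fine detail to $\mathbf{w}_2$ while $\mathbf{w}_1$ keeps control of coarse attributes.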
3.2. Stochastic variation
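Here, stochastic refers to small attributes of a face image (hair, stubble, freckles, skin pores, and so on).
A traditional generator struggles to produce such small stochastic variation. The generator in this paper adds noise to the feature maps at every scale, so noise added to the large feature maps produces variation in fine facial detail (such as freckles). The underlying idea is to spread the noise across all resolutions, so that it controls features ranging from global to local.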
4. Disentanglement studies
The authors propose two quantitative metrics for how linear, and therefore how disentangled, the latent space is: perceptual path length and linear separability.
B. Truncation trick in $\mathcal{W}$
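Low-density regions of the training data distribution are hard for the generator to learn. Earlier work has shown that drawing latent vectors from a truncated or otherwise shrunk sampling space improves the quality of the generated images, at the cost of some loss of variation.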
The same strategy is applied here. First compute the center of mass of $\mathcal{W}$, $\bar{\mathbf{w}}=\mathbb{E}_{\mathbf{z}\sim P(\mathbf{z})}[f(\mathbf{z})]$. On FFHQ, this center of mass can be viewed as an average face (see $\psi=0$ in Figure 8).
For a given $\mathbf{w}$, a parameter $\psi<1$ controls how far it deviates from the center of mass: $\mathbf{w}'=\bar{\mathbf{w}}+\psi(\mathbf{w}-\bar{\mathbf{w}})$
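The truncation formula is a linear interpolation between the center of mass and the sampled code. In this minimal sketch the samples of $f(\mathbf{z})$ are replaced by draws from a shifted Gaussian, which is purely an assumption to make the example self-contained.

```python
import numpy as np

def truncate(w, w_avg, psi):
    """w' = w_avg + psi * (w - w_avg)."""
    return w_avg + psi * (w - w_avg)

rng = np.random.default_rng(0)
# Stand-in for samples of f(z); a real run would use the mapping network.
w_samples = rng.normal(loc=1.0, size=(10000, 512))
w_avg = w_samples.mean(axis=0)          # Monte Carlo estimate of the center
w = w_samples[0]

w_trunc = truncate(w, w_avg, psi=0.7)
ratio = np.linalg.norm(w_trunc - w_avg) / np.linalg.norm(w - w_avg)
print(round(float(ratio), 6))
```

Setting `psi=0` returns the center of mass itself (the average face), and `psi=1` leaves the code unchanged.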
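【Summary】
The contribution of StyleGAN is a novel generator architecture that borrows the notion of style from style transfer, introduces an intermediate latent space $\mathcal{W}$, and processes the feature maps layer by layer, so that it can ultimately generate large, high-resolution faces (something ordinary GANs struggle to do). Its limitations are that StyleGAN is not conditional, so faces of a specified class cannot be requested, and that training is very resource-intensive (one week on 8 V100 GPUs). How to make use of the released pretrained StyleGAN models is therefore also a question worth considering.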
Re-reading the paper
Abstract
We propose an alternative generator architecture for generative adversarial networks, borrowing from style transfer literature.
This mainly refers to AdaIN.
The new architecture leads to an automatically learned, unsupervised separation of high-level attributes (e.g., pose and identity when trained on human faces) and stochastic variation in the generated images (e.g., freckles, hair), and it enables intuitive, scale-specific control of the synthesis.
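Interpretation: high-level attributes are attributes in the usual sense, while stochastic variation covers small random details (individual hairs, the placement of freckles, and so on).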
Our generator starts from a learned constant input and adjusts the “style” of the image at each convolution layer based on the latent code, therefore directly controlling the strength of image features at different scales.
Is the constant input itself learned? (According to the sentence above and Figure 1b, yes: it is a learned constant.)
Combined with noise injected directly into the network, this architectural change leads to automatic, unsupervised separation of high-level attributes (e.g., pose, identity) from stochastic variation (e.g., freckles, hair) in the generated images, and enables intuitive scale-specific mixing and interpolation operations.
So the attributes are controlled by the styles that the latent code specifies, and the random factors come from the injected noise.
The input latent space must follow the probability density of the training data, and we argue that this leads to some degree of unavoidable entanglement. Our intermediate latent space is free from that restriction and is therefore allowed to be disentangled.
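【Code walkthrough】
Training script train.py
The code under `if 1` uses EasyDict to define the many things involved in training, including:
training_loop
the architectures of G and D, the training parameters, and the loss functions
the dataset
the training schedule
the directory for visualizations produced during training
metrics
submit_config? (includes minibatch_size, minibatch_dict)
tf_config (TensorFlow random seed)
desc, the experiment name, e.g. sgan-ffhq-8gpu
Everything defined under `if 1` is then gathered into kwargs, and a single line, dnnlib.submit_run(**kwargs), launches the experiment.
kwargs contains the following key/value pairs: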
run_func_name 'training.training_loop.training_loop' (a string)
mirror_augment True
total_kimg 25000
G_args {'func_name': 'training.networks_stylegan.G_style'}
D_args {'func_name': 'training.networks_stylegan.D_basic'}
G_opt_args {'beta1': 0.0, 'beta2': 0.99, 'epsilon': 1e-08}
D_opt_args {'beta1': 0.0, 'beta2': 0.99, 'epsilon': 1e-08}
G_loss_args {'func_name': 'training.loss.G_logistic_nonsaturating'}
D_loss_args {'func_name': 'training.loss.D_logistic_simplegp', 'r1_gamma': 10.0}
dataset_args {'tfrecord_dir': 'ffhq277'}
sched_args {
'minibatch_base': 4,
'minibatch_dict': {4: 128, 8: 128, 16: 128, 32: 64, 64: 32, 128: 16, 256: 8, 512: 4},
'lod_initial_resolution': 8,
'G_lrate_dict': {128: 0.0015, 256: 0.002, 512: 0.003, 1024: 0.003},
'D_lrate_dict': {128: 0.0015, 256: 0.002, 512: 0.003, 1024: 0.003}
}
grid_args {'size': '4k', 'layout': 'random'}
metric_arg_list [metric_base.fid50k]
tf_config {'rnd.np_random_seed': 1000}
submit_config {
'run_dir_root': 'results',
'run_desc': 'sgan-ffhq277-1gpu',
'run_dir_ignore': ['__pycache__', '*.pyproj', '*.sln', '*.suo', '.cache', '.idea', '.vs', '.vscode', 'results', 'datasets', 'cache'],
'run_dir_extra_files': None,
'submit_target': <SubmitTarget.LOCAL: 1>,
'num_gpus': 1,
'print_info': False,
'ask_confirmation': False,
'run_id': None,
'run_name': None,
'run_dir': None,
'run_func_name': None,
'run_func_kwargs': None,
'user_name': None,
'task_name': None,
'host_name': 'localhost'
}
Enter the submit_run function
def submit_run(submit_config: SubmitConfig, run_func_name: str, **run_func_kwargs) -> None:
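Here, submit_config and run_func_name are explicitly named parameters; every other key in kwargs is collected into run_func_kwargs.
Next, the user_name, run_func_name, and run_func_kwargs fields of submit_config are updated, and submit_config becomes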
{
'run_dir_root': 'results',
'run_desc': 'sgan-ffhq277-1gpu',
'run_dir_ignore': ['__pycache__', '*.pyproj', '*.sln', '*.suo', '.cache', '.idea', '.vs', '.vscode', 'results', 'datasets', 'cache'],
'run_dir_extra_files': None,
'submit_target': <SubmitTarget.LOCAL: 1>,
'num_gpus': 1,
'print_info': False,
'ask_confirmation': False,
'run_id': None,
'run_name': None,
'run_dir': None,
'run_func_name' (updated): training.training_loop.training_loop,
'run_func_kwargs' (updated): all remaining configuration is stored here,
'user_name' (updated): bowen.pbw,
'task_name': None,
'host_name': 'localhost'
}
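The hand-off from kwargs to submit_run relies only on ordinary Python argument binding, which can be sketched without dnnlib. `EasyDictSketch` and `submit_run_sketch` below are hypothetical stand-ins written for illustration; they copy the parameter layout of the signature shown above but none of the real function's other behavior.

```python
class EasyDictSketch(dict):
    """Hypothetical stand-in for EasyDict: a dict with attribute access."""

    def __getattr__(self, name):
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name)

    def __setattr__(self, name, value):
        self[name] = value


def submit_run_sketch(submit_config, run_func_name, **run_func_kwargs):
    """Same parameter layout as submit_run; only shows how arguments bind."""
    submit_config = EasyDictSketch(submit_config)   # work on a shallow copy
    submit_config.run_func_name = run_func_name
    submit_config.run_func_kwargs = run_func_kwargs
    return submit_config


kwargs = EasyDictSketch(
    run_func_name='training.training_loop.training_loop',
    mirror_augment=True,
    total_kimg=25000,
    submit_config=EasyDictSketch(run_desc='sgan-ffhq277-1gpu', num_gpus=1),
)

cfg = submit_run_sketch(**kwargs)
print(cfg.run_func_name)
print(cfg.run_func_kwargs)
```

The two named parameters are peeled off by name, and everything else lands in `run_func_kwargs`, which matches the updated submit_config shown above.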
Next, the run directory is created and some files are copied into it
Creating the run dir root: results
Creating the run dir: results/00000-sgan-ffhq277-1gpu
Copying files to the run dir
results
└─00000-sgan-ffhq277-1gpu
├─run.py
├─src (directory)
├─submit_config.pkl
└─submit_config.txt
In the end, submit_config becomes
{
'run_dir_root': 'results',
'run_desc': 'sgan-ffhq277-1gpu',
'run_dir_ignore': ['__pycache__', '*.pyproj', '*.sln', '*.suo', '.cache', '.idea', '.vs', '.vscode', 'results', 'datasets', 'cache'],
'run_dir_extra_files': None,
'submit_target': <SubmitTarget.LOCAL: 1>, # of type <enum 'SubmitTarget'>
'num_gpus': 1,
'print_info': False,
'ask_confirmation': False,
'run_id': 0,
'run_name': '00000-sgan-ffhq277-1gpu',
'run_dir': 'results/00000-sgan-ffhq277-1gpu',
'run_func_name': 'training.training_loop.training_loop',
'run_func_kwargs': all remaining configuration is stored here,
'user_name': 'bowen.pbw',
'task_name': 'bowen.pbw-00000-sgan-ffhq277-1gpu',
'host_name': 'localhost'
}
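The naming pattern visible in the dump above (a five-digit run id, then the description, with the user name prefixed for the task name) can be reproduced as follows. The helper `make_run_names` is hypothetical and the pattern is inferred from the printed values only; how the library chooses the run id is not shown in these notes.

```python
def make_run_names(run_id, run_desc, user_name, run_dir_root):
    """Rebuild run_name, run_dir and task_name from their parts."""
    run_name = f"{run_id:05d}-{run_desc}"     # zero-padded to five digits
    run_dir = f"{run_dir_root}/{run_name}"
    task_name = f"{user_name}-{run_name}"
    return run_name, run_dir, task_name


run_name, run_dir, task_name = make_run_names(
    run_id=0,
    run_desc='sgan-ffhq277-1gpu',
    user_name='bowen.pbw',
    run_dir_root='results',
)
print(run_name)
print(run_dir)
print(task_name)
```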
The last statement is run_wrapper(submit_config)
Enter the run_wrapper function
util.call_func_by_name(func_name=submit_config.run_func_name, submit_config=submit_config, **submit_config.run_func_kwargs)
This is equivalent to calling the training.training_loop.training_loop function
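Calling a function by its dotted name can be sketched with `importlib`. `call_func_by_name_sketch` is a hypothetical reimplementation for illustration; the real helper in dnnlib may resolve names differently (for example, nested attributes), which this sketch does not attempt.

```python
import importlib


def call_func_by_name_sketch(*args, func_name, **kwargs):
    """Import the module part of func_name and call the named attribute."""
    module_name, _, attr = func_name.rpartition('.')
    module = importlib.import_module(module_name)
    func = getattr(module, attr)
    return func(*args, **kwargs)


# Standard-library functions are used here so the example runs anywhere.
root = call_func_by_name_sketch(16, func_name='math.sqrt')
path = call_func_by_name_sketch('results', '00000-run',
                                func_name='posixpath.join')
print(root, path)
```

Storing the function as a string is what lets the whole configuration, including which training loop to run, be saved to a file and copied into the run directory.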
Enter the training.training_loop.training_loop function; its comments explain the meaning of each parameter
The arguments passed are as follows
submit_config {
'run_dir_root': 'results',
'run_desc': 'sgan-ffhq277-1gpu',
'run_dir_ignore': ['__pycache__', '*.pyproj', '*.sln', '*.suo', '.cache', '.idea', '.vs', '.vscode', 'results', 'datasets', 'cache'],
'run_dir_extra_files': None,
'submit_target': <SubmitTarget.LOCAL: 1>,
'num_gpus': 1,
'print_info': False,
'ask_confirmation': False,
'run_id': 0,
'run_name': '00000-sgan-ffhq277-1gpu',
'run_dir': 'results/00000-sgan-ffhq277-1gpu',
'run_func_name': 'training.training_loop.training_loop',
'run_func_kwargs': {
'mirror_augment': True,
'total_kimg': 25000,
'G_args': {'func_name': 'training.networks_stylegan.G_style'},
'D_args': {'func_name': 'training.networks_stylegan.D_basic'},
'G_opt_args': {'beta1': 0.0, 'beta2': 0.99, 'epsilon': 1e-08},
'D_opt_args': {'beta1': 0.0, 'beta2': 0.99, 'epsilon': 1e-08},
'G_loss_args': {'func_name': 'training.loss.G_logistic_nonsaturating'},
'D_loss_args': {'func_name': 'training.loss.D_logistic_simplegp', 'r1_gamma': 10.0},
'dataset_args': {'tfrecord_dir': 'ffhq277'},
'sched_args': {
'minibatch_base': 4,
'minibatch_dict': {4: 128, 8: 128, 16: 128, 32: 64, 64: 32, 128: 16, 256: 8, 512: 4},
'lod_initial_resolution': 8,
'G_lrate_dict': {128: 0.0015, 256: 0.002, 512: 0.003, 1024: 0.003},
'D_lrate_dict': {128: 0.0015, 256: 0.002, 512: 0.003, 1024: 0.003}
},
'grid_args': {'size': '4k', 'layout': 'random'},
'metric_arg_list': [{'func_name': 'metrics.frechet_inception_distance.FID', 'name': 'fid50k', 'num_images': 50000, 'minibatch_per_gpu': 8}],
'tf_config': {'rnd.np_random_seed': 1000}
},
'user_name': 'bowen.pbw',
'task_name': 'bowen.pbw-00000-sgan-ffhq277-1gpu',
'host_name': 'localhost'
}
G_args {'func_name': 'training.networks_stylegan.G_style'}
D_args {'func_name': 'training.networks_stylegan.D_basic'}
G_opt_args {'beta1': 0.0, 'beta2': 0.99, 'epsilon': 1e-08}
D_opt_args {'beta1': 0.0, 'beta2': 0.99, 'epsilon': 1e-08}
G_loss_args {'func_name': 'training.loss.G_logistic_nonsaturating'}
D_loss_args {'func_name': 'training.loss.D_logistic_simplegp', 'r1_gamma': 10.0}
dataset_args {'tfrecord_dir': 'ffhq277'}
sched_args {
'minibatch_base': 4,
'minibatch_dict': {4: 128, 8: 128, 16: 128, 32: 64, 64: 32, 128: 16, 256: 8, 512: 4},
'lod_initial_resolution': 8,
'G_lrate_dict': {128: 0.0015, 256: 0.002, 512: 0.003, 1024: 0.003},
'D_lrate_dict': {128: 0.0015, 256: 0.002, 512: 0.003, 1024: 0.003}
}
grid_args {'size': '4k', 'layout': 'random'}
metric_arg_list [{'func_name': 'metrics.frechet_inception_distance.FID', 'name': 'fid50k', 'num_images': 50000, 'minibatch_per_gpu': 8}]
tf_config {'rnd.np_random_seed': 1000}
G_smoothing_kimg 10
D_repeat 1
minibatch_repeats 4
reset_opt_for_new_lod True
total_kimg 15000 -> 25000
mirror_augment False -> True
drange_net [-1, 1]
image_snapshot_ticks 1
network_snapshot_ticks 10
save_tf_graph False
save_weight_histograms False
resume_run_id None
resume_snapshot None
resume_kimg 0.0
resume_time 0.0
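The resolution-keyed dictionaries in sched_args suggest a per-resolution lookup with a fallback value. The sketch below is an assumption about how such a schedule is read, not a copy of the training code: the helper `schedule_for` is hypothetical, and the fallback learning rate of 0.001 is an assumed default that does not appear in the dump above.

```python
sched_args = {
    'minibatch_base': 4,
    'minibatch_dict': {4: 128, 8: 128, 16: 128, 32: 64, 64: 32,
                       128: 16, 256: 8, 512: 4},
    'G_lrate_dict': {128: 0.0015, 256: 0.002, 512: 0.003, 1024: 0.003},
    'D_lrate_dict': {128: 0.0015, 256: 0.002, 512: 0.003, 1024: 0.003},
}

LRATE_BASE = 0.001  # assumed default, not taken from the dump


def schedule_for(resolution, sched):
    """Look up minibatch size and learning rates for one resolution."""
    minibatch = sched['minibatch_dict'].get(resolution, sched['minibatch_base'])
    g_lrate = sched['G_lrate_dict'].get(resolution, LRATE_BASE)
    d_lrate = sched['D_lrate_dict'].get(resolution, LRATE_BASE)
    return minibatch, g_lrate, d_lrate


for res in (8, 64, 256, 1024):
    print(res, schedule_for(res, sched_args))
```

Under this reading, the minibatch shrinks as the resolution grows (1024 has no entry, so it falls back to minibatch_base), while the learning rate rises at the higher resolutions.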
We arrive at the following code, which creates the dataset
training_set = dataset.load_dataset(data_dir=config.data_dir, verbose=True, **dataset_args)
Create the generator
G = tflib.Network('G', num_channels=training_set.shape[0], resolution=training_set.shape[1], label_size=training_set.label_size, **G_args)
Enter the constructor `__init__` of `class Network` in dnnlib/tflib/network.py
The arguments passed are as follows
name `G`
func_name `training.networks_stylegan.G_style`
static_kwargs {'num_channels': 3, 'resolution': 1024, 'label_size': 0}
It then sets
self._build_func to the function object training.networks_stylegan.G_style
self._build_module_src to the source code read from training/networks_stylegan.py
Enter def _init_graph(self) -> None:
The relevant information is taken from the function definition of G_style to update self.input_names (using `inspect.signature`)
self.input_names = ['latents_in', 'labels_in']
The other attributes are updated next
self.num_inputs = 2
self.name = 'G'
self.scope = 'G'
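Deriving the input names from a function signature can be sketched as below. `G_style_stub` is a hypothetical stub that copies only a few parameters from the G_style call traced in these notes, and the rule used (positional parameters without a default are inputs) is an assumption about what the library does.

```python
import inspect


def G_style_stub(latents_in, labels_in, truncation_psi=0.7,
                 truncation_cutoff=8, is_training=False, **kwargs):
    """Stub with the same leading parameters as the traced G_style call."""


def infer_input_names(build_func):
    """Collect positional parameters that have no default value."""
    names = []
    for param in inspect.signature(build_func).parameters.values():
        if (param.kind == param.POSITIONAL_OR_KEYWORD
                and param.default is param.empty):
            names.append(param.name)
    return names


input_names = infer_input_names(G_style_stub)
print(input_names)       # ['latents_in', 'labels_in']
print(len(input_names))  # 2
```

Parameters with defaults and the catch-all `**kwargs` are skipped, which is consistent with input_names holding exactly latents_in and labels_in and num_inputs being 2.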
Set
build_kwargs = {
'num_channels': 3,
'resolution': 256,
'label_size': 0,
'is_template_graph': True,
'components': {}
}
Call the build function to create the network
out_expr = self._build_func(*self.input_templates, **build_kwargs)
where
self.input_templates = [
<tf.Tensor 'G/latents_in:0' shape=<unknown> dtype=float32>,
<tf.Tensor 'G/labels_in:0' shape=<unknown> dtype=float32>
]
build_kwargs: see above
Enter G_style; the arguments passed are as follows
latents_in Tensor("G/latents_in:0", dtype=float32, device=/device:GPU:0)
labels_in Tensor("G/labels_in:0", dtype=float32, device=/device:GPU:0)
truncation_psi 0.7
truncation_cutoff 8
truncation_psi_val None
truncation_cutoff_val None
dlatent_avg_beta 0.995
style_mixing_prob 0.9
is_training False
is_validation False
is_template_graph True
components {}
**kwargs {'num_channels': 3, 'resolution': 256, 'label_size': 0}
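The parameters truncation_psi, truncation_cutoff, and dlatent_avg_beta in the list above plausibly combine as in the following NumPy sketch: a running average of $\mathbf{w}$ is tracked, and truncation is applied only to the first truncation_cutoff layers. This is an assumption about their roles rather than a transcription of G_style; the helper names and the layer count of 18 are illustrative.

```python
import numpy as np


def lerp(a, b, t):
    """Linear interpolation: a at t=0, b at t=1."""
    return a + (b - a) * t


def update_dlatent_avg(dlatent_avg, batch_w, beta=0.995):
    """Exponential moving average of w over training batches."""
    return lerp(batch_w.mean(axis=0), dlatent_avg, beta)


def truncate_per_layer(dlatents, dlatent_avg, psi=0.7, cutoff=8):
    """dlatents: (batch, num_layers, dim). Truncate only layers < cutoff."""
    num_layers = dlatents.shape[1]
    layer_idx = np.arange(num_layers)[None, :, None]
    coefs = np.where(layer_idx < cutoff, psi, 1.0)
    return lerp(dlatent_avg, dlatents, coefs)


dlatent_avg = np.zeros(512)
dlatents = np.ones((2, 18, 512))        # dummy per-layer codes
out = truncate_per_layer(dlatents, dlatent_avg)
new_avg = update_dlatent_avg(dlatent_avg, np.ones((4, 512)))
print(out[0, :, 0])
print(new_avg[0])
```

Limiting truncation to the early, coarse layers pulls the overall face toward the average while leaving the fine-detail styles untouched.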
We arrive at the following code, which creates a sub-network (again of type `tflib.Network`)
components.synthesis = tflib.Network('G_synthesis', func_name=G_synthesis, **kwargs)
dlatents_in Tensor("G_synthesis/dlatents_in:0", dtype=float32, device=/device:GPU:0)
dlatent_size 512
num_channels 3
resolution 1024
fmap_base 8192
fmap_decay 1.0
fmap_max 512
use_styles True
const_input_layer True
use_noise True
randomize_noise True
nonlinearity 'lrelu'
use_wscale True
use_pixel_norm False
use_instance_norm True
dtype 'float32'
fused_scale 'auto'
blur_filter [1, 2, 1]
structure 'auto'
is_template_graph True
force_clean_graph False
_kwargs {'label_size': 0, 'components': {}}
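The fmap_base, fmap_decay, and fmap_max values above determine how many feature maps each resolution gets. The formula in this sketch is an assumption about how they combine (halve the channel count per stage, capped at fmap_max); the layer count of two style inputs per resolution is likewise assumed.

```python
import math


def nf(stage, fmap_base=8192, fmap_decay=1.0, fmap_max=512):
    """Number of feature maps at a given stage (assumed formula)."""
    return min(int(fmap_base / (2.0 ** (stage * fmap_decay))), fmap_max)


resolution = 1024
resolution_log2 = int(math.log2(resolution))
num_layers = resolution_log2 * 2 - 2    # two layers per resolution, 4x4 to 1024

# Map each output resolution to its channel count.
channels = {2 ** res: nf(res - 1) for res in range(2, resolution_log2 + 1)}
print(num_layers)
print(channels)
```

Under these assumptions the 4x4 block has 512 channels, matching the Const 4x4x512 input, and the count only starts to drop from 64x64 onward.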
Notes on graph propagation