StyleGAN Generator Code Walkthrough
PART 2 of the SeFa study:
a code walkthrough of the StyleGAN generator part.
- StyleGAN paper explanation: [reference](StyleGAN-基于样式的生成对抗网络(论文阅读总结) - 知乎 (zhihu.com))
- The principle and role of each module in the code: [reference](StyleGAN 和 StyleGAN2 的深度理解 - 知乎 (zhihu.com))
- StyleGAN2 adds some parts on top of this; StyleGAN's synthesis part follows PGGAN
- The notes below mainly trace the flow of the code
The code splits into two main parts:
- Mapping network
- Synthesis network

The main building block of the mapping network is `class DenseBlock()`.
The main building block of the synthesis network is `class ConvBlock()`, which adapts its behavior according to the input argument `position`.
They are finally assembled into `class StyleGANGenerator(nn.Module)`.
StyleGAN part
Resolution ranges over $2^3 \sim 2^{10}$.
The latent space is mapped: $z \in Z\ (dim=512) \rightarrow w \in W\ (dim=512)$
$num\_layers = \log_2(\frac{2 \cdot res}{4}) \times 2$
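The layer-count formula can be checked with a tiny helper (the function name is mine, not from the source): two synthesis layers per resolution block gives the familiar 18 layers at 1024×1024.

```python
import math

def num_layers(res):
    # Two synthesis layers per resolution block, from 4x4 up to `res`.
    return int(math.log2(2 * res / 4)) * 2

layers_4 = num_layers(4)        # smallest resolution
layers_1024 = num_layers(1024)  # largest resolution
```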
- Input $z$: a 2-D tensor of shape $[z.shape[0], 512]$
- It first passes through the mapping layers, yielding the dict {'z': z, 'label': None, 'w': w},
  where $z = pixel\_norm(z)$,
  then $z$ goes through 8 FC layers (Linear + ReLU, dimension unchanged) to become $w$.
  Input and output channels stay at 512, so $w: [z.shape[0], 512]$
```python
self.add_module(f'dense{i}',
                DenseBlock(in_channels=in_channels,
                           out_channels=out_channels,
                           use_wscale=self.use_wscale,  # True
                           lr_mul=self.lr_mul))         # 0.01
```
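A rough sketch of the mapping path described above, using plain `nn.Linear` in place of DenseBlock's wscale/lr_mul machinery (class and function names here are mine):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def pixel_norm(x, eps=1e-8):
    # Normalize each sample's feature vector to unit average magnitude.
    return x * torch.rsqrt(torch.mean(x ** 2, dim=1, keepdim=True) + eps)

class MappingSketch(nn.Module):
    """Hypothetical stand-in for the 8 DenseBlock layers (no wscale / lr_mul)."""
    def __init__(self, dim=512, num_layers=8):
        super().__init__()
        self.fcs = nn.ModuleList(nn.Linear(dim, dim) for _ in range(num_layers))

    def forward(self, z):
        w = pixel_norm(z)          # z is pixel-normalized first
        for fc in self.fcs:
            w = F.relu(fc(w))      # Linear + ReLU, dimension unchanged
        return w

z = torch.randn(4, 512)
w = MappingSketch()(z)
```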
Two optional tricks:
- truncation (with self.training=True the statistics are learned online; the truncation trick is then applied to the $w$ obtained at each layer)
  PS: the all_gather function used here comes from sync_op.py; it gathers the $w$ computed across distributed workers and averages them into $\bar w$
- style_mix (with probability 0.9, after a randomly chosen mapping layer, the $w$ produced by a fresh $z$ replaces the original $w$ for all subsequent layers)
One truncation layer:
- $layer\_num = \log_2(\frac{2 \cdot resolution}{4}) \times 2$; a repeat reshapes the mapping output into $w: [w.shape[0], layer\_num, 512]$
A constant $4 \times 4 \times 512$ tensor then passes through the synthesis layers, with noise added, to generate the image.

Synthesis: layer by layer
PS: the function `def get_nf(self, res): return min(4**2 * 2**10 // res, 512)` gives the number of convolutional feature maps at each resolution.
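Evaluating get_nf over the resolutions used here makes the channel schedule concrete, and shows why the final block works with 16 channels:

```python
def get_nf(res):
    # Number of feature maps at each resolution, capped at 512.
    return min(4 ** 2 * 2 ** 10 // res, 512)

nf_per_res = [get_nf(2 ** i) for i in range(2, 11)]  # res = 4 ... 1024
```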
Here lod (level of detail) starts from 0 and adds detail progressively.
The network structure is then (per-layer computation is detailed in the ConvBlock section below):
- Layer 0 (layer0): initialize x and bring the channel count to 512
- Loop over the following blocks:
  - Even layers (absent in the 0th block): channels are halved (via conv2d) and the resolution is doubled (via upsample), then noise and style code are applied, i.e. B + AdaIN + A
  - Odd layers: keep the resolution and only do B + AdaIN + A
- Finally the 16 channels are converted to 3 channels to produce the image
Layer 0: res = init_res = $2^2$, get_nf(res) = 512
```python
# 'Const'
self.add_module(layer_name,  # layer0
                ConvBlock(in_channels=self.get_nf(res),
                          out_channels=self.get_nf(res),
                          resolution=self.init_res,  # 4
                          w_space_dim=self.w_space_dim,
                          position='const_init',
                          use_wscale=self.use_wscale))
```
Even layers: res = $2^3, 2^4, \dots, 2^{10}$, layer_name = layer2, 4, ..., 16; the resolution doubles at each of these layers.
```python
# 'Conv0_up'
self.add_module(layer_name,
                ConvBlock(in_channels=self.get_nf(res // 2),
                          out_channels=self.get_nf(res),
                          resolution=res,
                          w_space_dim=self.w_space_dim,
                          upsample=True,
                          fused_scale=fused_scale,
                          use_wscale=self.use_wscale))
```
Odd layers: res = $2^2, 2^3, \dots, 2^{10}$, layer1, 3, ..., 17; convolve again while keeping the resolution.
```python
# the first of these is named 'Conv', the later ones 'Conv1'
self.add_module(layer_name,
                ConvBlock(in_channels=self.get_nf(res),
                          out_channels=self.get_nf(res),
                          resolution=res,
                          w_space_dim=self.w_space_dim,
                          use_wscale=self.use_wscale))
```
Output layers: output0, 1, 2, ..., 8
```python
self.add_module(f'output{block_idx}',
                ConvBlock(in_channels=self.get_nf(res),
                          out_channels=self.image_channels,  # 3
                          resolution=res,
                          w_space_dim=self.w_space_dim,
                          position='last',
                          kernel_size=1,
                          padding=0,
                          use_wscale=self.use_wscale,
                          wscale_gain=1.0,
                          activation_type='linear'))
```
DenseBlock part (main component of the mapping network)
Each layer applies wscale $= \sqrt{\frac{2}{kernel\_size \times kernel\_size \times channels}}$ to balance the effect of parameter magnitude in the layer.
- Input: x is reshaped to the format $[x.shape[0], -1]$
- Pass through F.linear
- ReLU activation
- Output: $[x.shape[0], out\_channels]$
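The steps above can be sketched as follows; the class name, default lr_mul, and the exact placement of the runtime rescaling are assumptions based on these notes, not a copy of the repo code:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DenseBlockSketch(nn.Module):
    """Sketch of DenseBlock: equalized-lr linear (wscale) + ReLU."""
    def __init__(self, in_channels, out_channels, lr_mul=0.01):
        super().__init__()
        # Store weights at a larger scale and rescale at runtime (equalized learning rate).
        self.weight = nn.Parameter(torch.randn(out_channels, in_channels) / lr_mul)
        self.bias = nn.Parameter(torch.zeros(out_channels))
        # wscale = sqrt(2 / fan_in); kernel_size is effectively 1 for a dense layer.
        self.scale = (2.0 / in_channels) ** 0.5 * lr_mul
        self.lr_mul = lr_mul

    def forward(self, x):
        x = x.view(x.shape[0], -1)  # flatten to [N, -1]
        x = F.linear(x, self.weight * self.scale, self.bias * self.lr_mul)
        return F.relu(x)            # activation

x = torch.randn(4, 512)
y = DenseBlockSketch(512, 512)(x)
```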
Truncation part (trick 1)
The principle is explained clearly in the reference links at the top.
class TruncationModule(nn.Module):
its forward reshapes the data into the form $[w.shape[0], num\_layer, 512]$.
What it does is shrink the $w$ values of the first trunc_layers layers toward the center by the factor trunc_psi; the mean $\bar w$ itself is not recomputed here.
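A minimal functional sketch of this truncation step, assuming the argument names (`trunc_psi`, `trunc_layers`) and a precomputed mean `w_avg`:

```python
import torch

def truncate(w, w_avg, num_layers=18, trunc_psi=0.7, trunc_layers=8):
    """Sketch of TruncationModule.forward; names and defaults are assumptions."""
    wp = w.unsqueeze(1).repeat(1, num_layers, 1)  # [N, num_layers, 512]
    coefs = torch.ones(1, num_layers, 1)
    coefs[:, :trunc_layers] = trunc_psi           # only the early layers are truncated
    return w_avg + (wp - w_avg) * coefs           # shrink toward the mean by psi

w = torch.randn(2, 512)
wp = truncate(w, w_avg=torch.zeros(512))
```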
ConvBlock part (main component of the synthesis network)
Each layer applies wscale $= \sqrt{\frac{2}{kernel\_size \times kernel\_size \times channels}}$ to balance the effect of parameter magnitude in the convolutions.
parameter: $weight.shape = [out\_channels, in\_channels, ker, ker]$
The convolutions here use the F.conv2d interface (see its documentation).
position='const_init': B + AdaIN + A
- Initialize x as $ones[w.shape[0], 512, 4, 4]$
- Apply noise, bias, ReLU, and pixel_norm to x
- style(x, w), returning x and style
position=None, upsample=True:
- out_channels = get_nf(res) (half of in_channels once past the 512 cap)
- upsample doubles the resolution: $x: [x.shape[0], in\_channels, 2 \cdot res, 2 \cdot res]$
- $conv2d: x \circ weight([out\_channels, in\_channels, ker=3, ker=3]) \rightarrow [x.shape[0], out\_channels, 2 \cdot res, 2 \cdot res]$
- then blur: the same convolution is applied once to all feature maps
- apply noise, bias, ReLU, pixel_norm to x (B + AdaIN + A)
- style(x, w), returning x and style
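The shape bookkeeping of this upsample-then-conv path can be traced with random tensors; the concrete sizes and the nearest-neighbour mode are assumptions for illustration (blur is omitted since it keeps the shape):

```python
import torch
import torch.nn.functional as F

x = torch.randn(1, 512, 8, 8)                        # [N, in_channels, res, res]
weight = torch.randn(256, 512, 3, 3)                 # [out_channels, in_channels, 3, 3]
x = F.interpolate(x, scale_factor=2, mode='nearest') # res -> 2*res
x = F.conv2d(x, weight, padding=1)                   # channels change, resolution kept
```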
position=None, upsample=None:
- same as the case above minus the upsample, with out_channels = in_channels
position='last':
- upsample = nn.Identity()
- $conv2d: x \circ weight([out\_channels, in\_channels, ker=1, ker=1]) \rightarrow [x.shape[0], out\_channels, res, res]$
- x + bias, return x
-
class StyleModLayer
This layer fuses the $w$ obtained from the mapping layers into the image features.
- A: first a Linear maps $w: [w.shape[0], 512] \rightarrow [w.shape[0], out\_channels \times 2]$
- The result is split along $out\_channels$ into two parts $y_1, y_2: [w.shape[0], 1, out\_channels, 1, 1]$
- x is scaled and shifted: $x = x \cdot (1 + y_1) + y_2$
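The three steps above can be sketched like this, with plain `nn.Linear` standing in for the wscale version (the class name is mine):

```python
import torch
import torch.nn as nn

class StyleModSketch(nn.Module):
    """Sketch of StyleModLayer (the 'A' branch)."""
    def __init__(self, w_space_dim=512, out_channels=512):
        super().__init__()
        self.fc = nn.Linear(w_space_dim, out_channels * 2)

    def forward(self, x, w):
        style = self.fc(w)                           # [N, 2*out_channels]
        style = style.view(-1, 2, x.shape[1], 1, 1)  # split into y1 (scale), y2 (shift)
        y1, y2 = style[:, 0], style[:, 1]
        return x * (y1 + 1) + y2, style              # scale and shift the features

x = torch.randn(2, 512, 4, 4)
w = torch.randn(2, 512)
out, style = StyleModSketch()(x, w)
```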
class BlurLayer
- The BlurLayer part initializes a $3 \times 3$ kernel and broadcasts it over the channels, giving $kernel.shape = [channels, 1, 3, 3]$
- It is passed to class Blur, whose forward implements conv2d: $x: [x.shape[0], channels, res, res] \circ kernel \rightarrow y: [x.shape[0], channels, res, res]$, and whose backward computes the gradient (see the torch.autograd.Function usage reference)
- The output y uses the same kernel for every channel
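This per-channel (depthwise) blur can be written directly with `F.conv2d` and `groups`; the [1, 2, 1] separable kernel is an assumption (the standard StyleGAN choice), and autograd handles the backward automatically in this sketch:

```python
import torch
import torch.nn.functional as F

def blur(x, kernel=(1.0, 2.0, 1.0)):
    """Depthwise 3x3 blur: the same normalized kernel is applied to every channel."""
    k = torch.tensor(kernel)
    k = k[:, None] * k[None, :]                  # outer product -> 3x3
    k = k / k.sum()                              # normalize to preserve brightness
    channels = x.shape[1]
    k = k.view(1, 1, 3, 3).repeat(channels, 1, 1, 1)   # [channels, 1, 3, 3]
    return F.conv2d(x, k, padding=1, groups=channels)  # resolution unchanged

x = torch.randn(1, 16, 8, 8)
y = blur(x)
c = blur(torch.ones(1, 1, 5, 5))  # interior of a constant image stays constant
```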
class NoiseApplyingLayer
- Input: $x = [x.shape[0], channels, res, res]$
- $Noise: randn(1, 1, res, res) \cdot weight(1, channels, 1, 1)$ # the same noise map for every sample
- parameter: $weight$
- Output: $x + noise$
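A minimal sketch of this layer (the 'B' branch); the zero initialization of the learned scale and the class name are assumptions:

```python
import torch
import torch.nn as nn

class NoiseSketch(nn.Module):
    """Sketch of NoiseApplyingLayer: one learned per-channel scale for the noise."""
    def __init__(self, channels, res):
        super().__init__()
        self.res = res
        self.weight = nn.Parameter(torch.zeros(1, channels, 1, 1))  # learned scale

    def forward(self, x):
        noise = torch.randn(1, 1, self.res, self.res)  # one map, shared by all samples
        return x + noise * self.weight

x = torch.randn(2, 512, 4, 4)
out = NoiseSketch(512, 4)(x)
```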
class UpsamplingLayer
- Input x: [x.shape[0], channels, res, res]
- interpolate upsampling -> [x.shape[0], channels, 2*res, 2*res]
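The upsampling is one call to `F.interpolate`; the 'nearest' mode is an assumption (the function also supports 'bilinear', etc.):

```python
import torch
import torch.nn.functional as F

x = torch.randn(1, 512, 4, 4)
y = F.interpolate(x, scale_factor=2, mode='nearest')  # doubles the spatial resolution
```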