Person Re-ID 0-08: DG-Net (ReID) Code Walkthrough with No Blind Spots (4): The Es Encoding and Decoding Process

江南才尽，年少无知！

Last modified: 2022-07-16 19:46:47


First published: 2019-10-11 12:50:39

Article link: https://blog.csdn.net/weixin_43013761/article/details/102496205


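The links below collect all of my notes on DG-Net (person re-identification, ReID). If you spot any errors, please point them out and I will correct them right away. If you are interested, add me on WeChat (17575010159) to discuss the technical details. If this helped you, please remember to give it a like; that is the biggest encouragement for me.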

Person Re-ID 0-08: DG-GAN (person re-identification, ReID) table of contents, the newest and most complete: https://blog.csdn.net/weixin_43013761/article/details/102364512

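Highly recommended commercial-grade project: a behavior-analysis project I have deployed, consisting of three main modules (1. pedestrian detection, 2. pedestrian tracking, 3. action recognition): Behavior Analysis (commercial grade) 00 - Table of contents, the newest walkthrough with no blind spots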

Overview of the Es encoding and decoding process

As the comments in the previous post show, the Es encoding and decoding process is mainly concentrated in E:\1.PaidOn\5.ReID\1.DG-Net\1.DG-Net-master\networks.py:

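        # Note that this covers two steps: Es encoding + decoding. Since decoding (the yellow trapezoid G in Figure 2
        # of the paper) is already included here, Ea presumably does not include a decoding step.
        # Because this is a class instance, later calls such as gen_a.encode() perform encoding and gen_a.decode() performs decoding.
        self.gen_a = AdaINGen(hyperparameters['input_dim_a'], hyperparameters['gen'], fp16 = False)  # auto-encoder for domain a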
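The hyperparameters['gen'] dict passed in above controls all the tensor sizes discussed in this post. As a quick sanity check, the sketch below derives the st-code shape and the number of AdaIN parameters from the default values quoted in the AdaINGen comments (dim=16, n_downsample=2, n_res=4) and a 256x128 input. The dict and the helpers `st_code_shape` and `num_adain_params` are hypothetical illustrations, not part of DG-Net, and the AdaIN count assumes two AdaIN layers per residual block, as the paper describes.

```python
# Hypothetical excerpt of hyperparameters['gen'], filled with the defaults
# quoted in the AdaINGen comments of this post (not copied from DG-Net's yaml).
gen_params = {'dim': 16, 'n_downsample': 2, 'n_res': 4}


def st_code_shape(params, height=256, width=128):
    """(channels, H, W) of the st code produced by ContentEncoder."""
    # dim doubles once after the first two convs, once per extra downsampling
    # stage (n_downsample - 1 times), and once more in ASPP.
    channels = params['dim'] * 2 ** params['n_downsample'] * 2
    # The stride-2 first conv plus n_downsample - 1 stride-2 stages halve
    # H and W n_downsample times in total.
    scale = 2 ** params['n_downsample']
    return channels, height // scale, width // scale


def num_adain_params(params):
    """Per-sample AdaIN weights the Decoder needs (biases need the same number).

    Assumes two AdaIN layers per residual block, as the paper describes.
    """
    channels, _, _ = st_code_shape(params)
    return params['n_res'] * 2 * channels


print(st_code_shape(gen_params))     # (128, 64, 32)
print(num_adain_params(gen_params))  # 1024
```

The 1024 matches the `adain_params[batch_size, 1024]` comment in `decode()` further below.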

In that case, let's look at the AdaINGen class (skim the comments first; the internal functions are explained further below).

class AdaINGen(nn.Module):
    # AdaIN auto-encoder architecture
    def __init__(self, input_dim, params, fp16):
        super(AdaINGen, self).__init__()

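        # Number of filters in the bottommost (first) layer of the generator; default 16
        dim = params['dim']

        # Number of downsampling layers in the encoder; default 2
        n_downsample = params['n_downsample']

        # Number of residual (skip-connection) blocks
        n_res = params['n_res']

        # Activation function (the ContentEncoder notes below say lrelu is used)
        activ = params['activ']

        # Padding type, e.g. zero padding or reflection padding
        pad_type = params['pad_type']

        # Hidden width of the MLPs (fully connected layers); default 512
        mlp_dim = params['mlp_dim']
        # Default None; the normalization type used in the MLP layers
        mlp_norm = params['mlp_norm']

        # length of appearance code: length of each ap-code chunk fed to one MLP; default 2048
        # (decode() below splits the ap code into four chunks of this size)
        id_dim = params['id_dim']

        # Residual-block type used by the decoder, one of [basic/slim/series/parallel]; default basic
        which_dec = params['dec']

        # Dropout probability, to reduce overfitting; default 0
        dropout = params['dropout']

        # Whether a Tanh is appended as the last layer of the content encoder
        tanh = params['tanh']
        # Number of non-local blocks; the Decoder adds a NonlocalBlock when this is > 0; default 0
        non_local = params['non_local']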

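        # content encoder
        # Content encoding: the image goes through ContentEncoder (Es) and is encoded into the st code. The paper describes Es as follows:
        # Es is a shallow network that outputs the st code (128x64x32); it consists of four convolutional layers followed by
        # four residual blocks, and looking inside the class confirms this.
        # This only constructs the module; no forward pass happens yet. input_dim is 1 (the channel count) because Es takes a grayscale image.
        self.enc_content = ContentEncoder(n_downsample, n_res, input_dim, dim, 'in', activ, pad_type=pad_type, dropout=dropout, tanh=tanh, res_type='basic')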


        # self.output_dim is the channel dimension of the encoder output (128), which is also the decoder's input dimension;
        # see the yellow trapezoid in Figure 2 of the paper.
        # Decoding is just a series of convolutions, so I won't annotate it in detail.
        # The paper says: when G (the yellow trapezoid) processes s, it uses 4 residual blocks and 4 convolutional layers,
        # and every residual block contains two adaptive instance normalization (AdaIN) layers, which integrate a as scale and bias parameters.
        # self.dec: the latent code combining ap code and st code (from the same image or swapped between images) goes through G to produce the generated image.
        # Note: this again only constructs the module; no forward pass happens yet.
        self.output_dim = self.enc_content.output_dim
        if which_dec =='basic':
            self.dec = Decoder(n_downsample, n_res, self.output_dim, 3, dropout=dropout, res_norm='adain', activ=activ, pad_type=pad_type, res_type='basic', non_local = non_local, fp16 = fp16)
        elif which_dec =='slim':
            self.dec = Decoder(n_downsample, n_res, self.output_dim, 3, dropout=dropout, res_norm='adain', activ=activ, pad_type=pad_type, res_type='slim', non_local = non_local, fp16 = fp16)
        elif which_dec =='series':
            self.dec = Decoder(n_downsample, n_res, self.output_dim, 3, dropout=dropout, res_norm='adain', activ=activ, pad_type=pad_type, res_type='series', non_local = non_local, fp16 = fp16)
        elif which_dec =='parallel':
            self.dec = Decoder(n_downsample, n_res, self.output_dim, 3, dropout=dropout, res_norm='adain', activ=activ, pad_type=pad_type, res_type='parallel', non_local = non_local, fp16 = fp16)
        else:
            raise ValueError('unknown decoder type: %s' % which_dec)

        # MLP to generate AdaIN parameters
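        # Fully connected layers that produce the AdaIN parameters; mlp_dim=512 is their hidden width.
        # The w and b MLPs have the same shape: input id_dim=2048, output 2*128=256 (128 values for each of the two AdaIN layers in one residual block).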
        self.mlp_w1 = MLP(id_dim, 2*self.output_dim, mlp_dim, 3, norm=mlp_norm, activ=activ)
        self.mlp_w2 = MLP(id_dim, 2*self.output_dim, mlp_dim, 3, norm=mlp_norm, activ=activ)
        self.mlp_w3 = MLP(id_dim, 2*self.output_dim, mlp_dim, 3, norm=mlp_norm, activ=activ)
        self.mlp_w4 = MLP(id_dim, 2*self.output_dim, mlp_dim, 3, norm=mlp_norm, activ=activ)
        
        self.mlp_b1 = MLP(id_dim, 2*self.output_dim, mlp_dim, 3, norm=mlp_norm, activ=activ)
        self.mlp_b2 = MLP(id_dim, 2*self.output_dim, mlp_dim, 3, norm=mlp_norm, activ=activ)
        self.mlp_b3 = MLP(id_dim, 2*self.output_dim, mlp_dim, 3, norm=mlp_norm, activ=activ)
        self.mlp_b4 = MLP(id_dim, 2*self.output_dim, mlp_dim, 3, norm=mlp_norm, activ=activ)

        # Weight initialization
        self.apply(weights_init(params['init']))

    # Calling this function applies the paper's Es encoding to the input images
    def encode(self, images):
        # encode an image to its content and style codes
        content = self.enc_content(images)
        return content

    # Calling this function performs the paper's decoding, i.e. the decoder in Figure 2 of the paper
    def decode(self, content, ID):
        """
        :param content: st code produced by Es
        :param ID: ap code produced by Ea
        :return: generated image
        """
        # decode style codes to an image
        ID1 = ID[:,:2048]
        ID2 = ID[:,2048:4096]
        ID3 = ID[:,4096:6144]
        ID4 = ID[:,6144:]

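        # adain_params[batch_size, 1024]=[batch_size,256*4]
        # The ap code is split into 4 chunks of 2048; each chunk goes through its own MLP to give 256 values, i.e. the
        # parameters of the two AdaIN layers (128 channels each) of one of the 4 residual blocks (two AdaIN layers per block, as in the paper).
        adain_params_w = torch.cat( (self.mlp_w1(ID1), self.mlp_w2(ID2), self.mlp_w3(ID3), self.mlp_w4(ID4)), 1)
        adain_params_b = torch.cat( (self.mlp_b1(ID1), self.mlp_b2(ID2), self.mlp_b3(ID3), self.mlp_b4(ID4)), 1)

        # Assign the parameters to the decoder's AdaIN layers; this is where the ap code and st code are combined
        self.assign_adain_params(adain_params_w, adain_params_b, self.dec)

        # Run the decoder (above we only constructed it), i.e. the paper's G, and return the generated image
        images = self.dec(content)
        return images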

    def assign_adain_params(self, adain_params_w, adain_params_b, model):
        # assign the adain_params to the AdaIN layers in model
        dim = self.output_dim
        for m in model.modules():
            if m.__class__.__name__ == "AdaptiveInstanceNorm2d":
                mean = adain_params_b[:,:dim].contiguous()
                std = adain_params_w[:,:dim].contiguous()
                m.bias = mean.view(-1)
                m.weight = std.view(-1)
                if adain_params_w.size(1)>dim :  #Pop the parameters
                    adain_params_b = adain_params_b[:,dim:]
                    adain_params_w = adain_params_w[:,dim:]

    def get_num_adain_params(self, model):
        # return the number of AdaIN parameters needed by the model
        num_adain_params = 0
        for m in model.modules():
            if m.__class__.__name__ == "AdaptiveInstanceNorm2d":
                num_adain_params += m.num_features
        return num_adain_params
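To make the bookkeeping in decode() and assign_adain_params concrete: four chunks of 2048 values each go through their own MLP to 256 outputs, and 4 × 256 = 1024 = 8 AdaIN layers × 128 channels, assuming 4 residual blocks with two AdaIN layers each, as the paper states. The sketch below replays that logic with hypothetical stand-ins: a minimal `AdaptiveInstanceNorm2d` that only stores weight and bias, and a single `nn.Linear` in place of each 3-layer MLP.

```python
import torch
import torch.nn as nn


class AdaptiveInstanceNorm2d(nn.Module):
    """Hypothetical stand-in: only the attributes assign_adain_params touches."""
    def __init__(self, num_features):
        super().__init__()
        self.num_features = num_features
        self.weight = None
        self.bias = None


def assign_adain_params(adain_params_w, adain_params_b, model, dim):
    # Same popping logic as AdaINGen.assign_adain_params: each AdaIN layer
    # takes the next `dim` columns of the weight and bias tensors.
    for m in model.modules():
        if m.__class__.__name__ == "AdaptiveInstanceNorm2d":
            m.bias = adain_params_b[:, :dim].contiguous().view(-1)
            m.weight = adain_params_w[:, :dim].contiguous().view(-1)
            if adain_params_w.size(1) > dim:  # pop the used columns
                adain_params_b = adain_params_b[:, dim:]
                adain_params_w = adain_params_w[:, dim:]


batch, id_dim, channels, n_res = 2, 2048, 128, 4
# Assumed: 4 residual blocks x 2 AdaIN layers each = 8 AdaIN layers
decoder = nn.Sequential(*[AdaptiveInstanceNorm2d(channels) for _ in range(2 * n_res)])

ap_code = torch.randn(batch, 4 * id_dim)      # [2, 8192]
chunks = torch.split(ap_code, id_dim, dim=1)  # same as ID[:, :2048], ID[:, 2048:4096], ...
# One nn.Linear per chunk stands in for each 3-layer MLP: 2048 -> 2*128
mlps_w = [nn.Linear(id_dim, 2 * channels) for _ in range(4)]
mlps_b = [nn.Linear(id_dim, 2 * channels) for _ in range(4)]
with torch.no_grad():
    w = torch.cat([f(c) for f, c in zip(mlps_w, chunks)], dim=1)  # [2, 1024]
    b = torch.cat([f(c) for f, c in zip(mlps_b, chunks)], dim=1)  # [2, 1024]

assign_adain_params(w, b, decoder, channels)
print(w.shape, decoder[0].weight.shape)
```

Because the layers are visited in module order, the first 256 columns (from the first MLP) end up in the two AdaIN layers of the first residual block, the next 256 in the second block, and so on. Each layer's weight is flattened to `[batch_size * 128]`.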

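Two objects are created here: one is Es (ContentEncoder, which produces the st code) and the other is G (Decoder, which produces the generated image). Let's analyze these two classes in detail.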

Es encoding analysis

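# Note: the comments below describe the arguments of the first call of this class (from AdaINGen above)
class ContentEncoder(nn.Module):
    def __init__(self, n_downsample, n_res, input_dim, dim, norm, activ, pad_type, dropout, tanh=False, res_type='basic'):
        """
        :param n_downsample: number of downsampling convolutions, default 2
        :param n_res: number of residual blocks, default 4
        :param input_dim: number of input channels, default 1 (grayscale)
        :param dim: number of filters in the first convolution (16, per the AdaINGen comment above)
        :param norm: normalization type, one of [none/bn/in/ln]; 'in' is used here
        :param activ: activation function; lrelu is used here
        :param pad_type: padding type, default 'reflect'
        :param dropout: default 0, as mentioned earlier
        :param tanh: default False, i.e. no Tanh at the end
        :param res_type: residual-block type; 'basic' is passed here
        """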
        super(ContentEncoder, self).__init__()
        self.model = []
        # Here I change the stride to 2.
        # No detailed comments here: these are just convolution, normalization, and activation blocks.
        # Note that this only constructs the layers and chooses the activation etc.; no forward pass is built yet.
        self.model += [Conv2dBlock(input_dim, dim, 3, 2, 1, norm=norm, activation=activ, pad_type=pad_type)]
        self.model += [Conv2dBlock(dim, 2*dim, 3, 1, 1, norm=norm, activation=activ, pad_type=pad_type)]

        dim *=2 # 32dim
        # downsampling blocks
        # n_downsample-1 more stride-2 stages; together with the stride-2 first conv, the input is downsampled n_downsample times (twice by default)
        for i in range(n_downsample-1):
            self.model += [Conv2dBlock(dim, dim, 3, 1, 1, norm=norm, activation=activ, pad_type=pad_type)]
            self.model += [Conv2dBlock(dim, 2 * dim, 3, 2, 1, norm=norm, activation=activ, pad_type=pad_type)]
            dim *= 2
        # residual blocks
        self.model += [ResBlocks(n_res, dim, norm=norm, activation=activ, pad_type=pad_type, res_type=res_type)]
        # 64 -> 128; ASPP is atrous spatial pyramid pooling (parallel dilated convolutions, as in DeepLab)
        self.model += [ASPP(dim, norm=norm, activation=activ, pad_type=pad_type)]
        dim *= 2
        # Optionally append a Tanh activation as the last layer
        if tanh:
            self.model +=[nn.Tanh()]
        # self.model was a plain list; nn.Sequential chains the layers into one module
        self.model = nn.Sequential(*self.model)

        # Output dimension is 128
        self.output_dim = dim

    # x[torch.Size([batch_size, 1, 256, 128])]
    def forward(self, x):
        return self.model(x)

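There is nothing special here. In one sentence: it takes a batch of images of shape [batch_size, 1, 256, 128] and outputs a feature map (the st code) of shape [batch_size, 128, 64, 32], which mainly carries the image's pose, hair, face shape, and background information.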
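To check this shape claim, here is a minimal PyTorch sketch that mirrors only the layer layout of ContentEncoder. It is a simplification, not the DG-Net code: `conv_block` is a hypothetical stand-in for Conv2dBlock (3x3 conv with zero padding instead of reflect, InstanceNorm, LeakyReLU), the shape-preserving ResBlocks are omitted, and a 1x1 conv replaces ASPP only to reproduce its 64 -> 128 channel change.

```python
import torch
import torch.nn as nn


def conv_block(cin, cout, stride):
    # Hypothetical stand-in for Conv2dBlock: 3x3 conv + InstanceNorm + LeakyReLU
    return nn.Sequential(
        nn.Conv2d(cin, cout, 3, stride, 1),
        nn.InstanceNorm2d(cout),
        nn.LeakyReLU(0.2),
    )


class ToyContentEncoder(nn.Module):
    def __init__(self, n_downsample=2, input_dim=1, dim=16):
        super().__init__()
        layers = [conv_block(input_dim, dim, 2), conv_block(dim, 2 * dim, 1)]
        dim *= 2
        for _ in range(n_downsample - 1):
            layers += [conv_block(dim, dim, 1), conv_block(dim, 2 * dim, 2)]
            dim *= 2
        # ResBlocks keep the shape, so they are omitted; a 1x1 conv stands in
        # for ASPP, which doubles the channels without changing H and W.
        layers += [nn.Conv2d(dim, 2 * dim, 1)]
        self.output_dim = 2 * dim
        self.model = nn.Sequential(*layers)

    def forward(self, x):
        return self.model(x)


enc = ToyContentEncoder()
x = torch.randn(2, 1, 256, 128)  # batch of grayscale images
s = enc(x)
print(s.shape)  # torch.Size([2, 128, 64, 32])
```

Two stride-2 convolutions halve 256x128 twice to 64x32, while the channels go 16 -> 32 -> 64 -> 128.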

The G (decode) operation in the paper

Now that the encoder has been analyzed, let's look at the decoder class Decoder, implemented as follows:

class Decoder(nn.Module):
    def __init__(self, n_upsample, n_res, dim, output_dim, dropout=0, res_norm='adain', activ='relu', pad_type='zero', res_type='basic', non_local=False, fp16 = False):
        super(Decoder, self).__init__()
        self.input_dim = dim
        self.model = []
        self.model += [nn.Dropout(p = dropout)]
        self.model += [ResBlocks(n_res, dim, res_norm, activ, pad_type=pad_type, res_type=res_type)]
        # non-local
        if non_local>0:
            self.model += [NonlocalBlock(dim)]
            print('use non-local!')
        for i in range(n_upsample):
            self.model += [nn.Upsample(scale_factor=2),
                           Conv2dBlock(dim, dim // 2, 5, 1, 2, norm='ln', activation=activ, pad_type=pad_type, fp16 = fp16)]
            dim //= 2
        # use reflection padding in the last conv layer
        self.model += [Conv2dBlock(dim, dim, 3, 1, 1, norm='none', activation=activ, pad_type=pad_type)]
        self.model += [Conv2dBlock(dim, dim, 3, 1, 1, norm='none', activation=activ, pad_type=pad_type)]
        self.model += [Conv2dBlock(dim, output_dim, 1, 1, 0, norm='none', activation='none', pad_type=pad_type)]
        self.model = nn.Sequential(*self.model)

    # x[batch_size, 128, 64, 32]
    def forward(self, x):
        output = self.model(x)
        return output

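I won't explain the convolutions one by one here. The paper describes G as 4 residual blocks plus 4 convolutional layers, where every residual block contains two adaptive instance normalization (AdaIN) layers that take the ap code a as scale and bias parameters.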
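A similar sketch traces the Decoder's shapes from the st code back to an image. `ToyDecoder` is hypothetical: it drops the Dropout, the AdaIN ResBlocks and the non-local block (all shape-preserving) and omits the LayerNorm in each upsampling block, keeping only the Upsample + 5x5 conv stages and the final three convolutions of the Decoder above.

```python
import torch
import torch.nn as nn


class ToyDecoder(nn.Module):
    def __init__(self, n_upsample=2, dim=128, output_dim=3):
        super().__init__()
        layers = []  # the AdaIN ResBlocks keep the shape, so they are omitted
        for _ in range(n_upsample):
            # Each stage doubles H and W and halves the channels
            layers += [nn.Upsample(scale_factor=2),
                       nn.Conv2d(dim, dim // 2, 5, 1, 2), nn.LeakyReLU(0.2)]
            dim //= 2
        # Two 3x3 convs, then a 1x1 conv to the 3 output (RGB) channels
        layers += [nn.Conv2d(dim, dim, 3, 1, 1), nn.LeakyReLU(0.2),
                   nn.Conv2d(dim, dim, 3, 1, 1), nn.LeakyReLU(0.2),
                   nn.Conv2d(dim, output_dim, 1, 1, 0)]
        self.model = nn.Sequential(*layers)

    def forward(self, x):
        return self.model(x)


dec = ToyDecoder()
st_code = torch.randn(2, 128, 64, 32)
img = dec(st_code)
print(img.shape)  # torch.Size([2, 3, 256, 128])
```

The two upsampling stages undo the encoder's two stride-2 convolutions, so the output has the same 256x128 resolution as the input image, but with 3 channels.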

Just note that this object is called in the following function:

def decode(self, content, ID):
	......
	images = self.dec(content)
	return images

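Before images = self.dec(content) runs, assign_adain_params has already loaded the ap code (through the MLPs) into the Decoder's AdaIN layers, so the Decoder already holds the ap code [clothes + shoes + phone + bag, etc.]. Note that by ap code I mean the ap code after the identity information has been separated out, i.e. it contains no identity information, only [clothes + shoes + phone + bag, etc.]. Calling images = self.dec(content) then feeds in the st code and produces the generated image.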
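The AdaptiveInstanceNorm2d class itself is not listed in this post. Assuming it implements the standard AdaIN formula AdaIN(x) = w · (x − μ(x)) / σ(x) + b, with μ and σ computed per sample and per channel over H and W, the sketch below shows how the weight and bias that assign_adain_params flattens to `[batch_size * 128]` would rescale each channel of the st code. The helper `adain` and the constant weight/bias values are hypothetical.

```python
import torch


def adain(x, weight, bias, eps=1e-5):
    """Hypothetical AdaIN helper.

    x: [B, C, H, W] feature map (e.g. the st code).
    weight, bias: [B*C], flattened the same way assign_adain_params does with .view(-1).
    """
    b, c = x.shape[:2]
    # Instance statistics: one mean/variance per sample and channel
    mean = x.mean(dim=(2, 3), keepdim=True)
    var = ((x - mean) ** 2).mean(dim=(2, 3), keepdim=True)
    x_norm = (x - mean) / torch.sqrt(var + eps)
    # Scale and shift with parameters derived from the ap code
    return x_norm * weight.view(b, c, 1, 1) + bias.view(b, c, 1, 1)


torch.manual_seed(0)
content = torch.randn(2, 128, 64, 32)  # st code
w = torch.full((2 * 128,), 2.0)        # hypothetical AdaIN weights
bias = torch.full((2 * 128,), 0.5)     # hypothetical AdaIN biases
out = adain(content, w, bias)
# Every channel of `out` now has mean ~0.5 and standard deviation ~2.0
```

The st code keeps its spatial layout (the structure), while the per-channel statistics are replaced by values coming from the ap code (the appearance).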

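So far we have covered the Es encoding and the network's decoding process. Above we said the Decoder receives the ap code, but how is the ap code produced, and once produced, how are [clothes + shoes + phone + bag, etc.] separated from [identity information]? The next post covers the Ea encoder, i.e. how an image is turned into an ap code and how this information is separated. This is also the first major difficulty of the paper.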
