[Paper Reading Notes] "Deep Imbalanced Attribute Classification using Visual Attention Aggregation" - Pedestrian Attributes

Contents

1. Network structure

2. Loss function: two parts


In one sentence: the paper proposes a portable (plug-in) network architecture for pedestrian attribute recognition, together with a way of computing the loss.

[It draws on multi-scale features and the focal loss idea]

I have never used MXNet, so if anything below is wrong please let me know, thanks.

GitHub:https://github.com/cvcode18/imbalanced_learning

论文:Deep Imbalanced Attribute Classification using Visual Attention Aggregation

1. Network structure:

The network structure is shown below: it consists of one main network and several sub-networks (the number of sub-networks matches how many feature levels you tap into; the figure shows two, but the source code uses three).

Main network: presumably any backbone works; the source code uses a ResNet structure (they also ran experiments with a DenseNet backbone).

Sub-network: it uses an attention mechanism, structured as follows:

You can see there are two branches: one produces per-attribute weight (confidence) coefficients, the other is spatially normalized (a softmax over the spatial locations).

Source code definitions:

I. The training part
"""model.py"""
def get_conv2D(num_classes, stride, ctx):
    net = nn.Sequential()
    with net.name_scope():
        net.add(nn.Conv2D(channels=num_classes, kernel_size=1, strides=stride))
        net.add(nn.Activation('sigmoid'))
    net.collect_params().initialize(mx.init.Xavier(rnd_type='gaussian', factor_type="in", magnitude=2), ctx=ctx)
    return net

def get_fatt(num_classes, stride, ctx):
    # attention branch: 1x1 and 3x3 conv blocks, then a 1x1 conv producing one attention map per attribute
    net = nn.Sequential()
    with net.name_scope():
        net.add(nn.Conv2D(channels=512, kernel_size=1))
        net.add(nn.BatchNorm())
        net.add(nn.Activation('relu'))
        net.add(nn.Conv2D(channels=512, kernel_size=3, padding=1))
        net.add(nn.BatchNorm())
        net.add(nn.Activation('relu'))
        # net.add(nn.Conv2D(channels=512, kernel_size=3, padding=1))
        # net.add(nn.BatchNorm())
        # net.add(nn.Activation('relu'))
        net.add(nn.Conv2D(channels=num_classes, kernel_size=1, strides=stride))
    net.collect_params().initialize(mx.init.Xavier(rnd_type='gaussian', factor_type="in", magnitude=2), ctx=ctx)
    return net
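A quick hypothetical usage sketch of these two branch builders (the batch size, 1024-channel input, 14x14 spatial size and the 35 attributes are made-up example values, not taken from the repo):

import mxnet as mx
from mxnet import nd

ctx = mx.cpu()
num_classes = 35                                     # hypothetical number of attributes

fconv = get_conv2D(num_classes, stride=1, ctx=ctx)   # confidence branch
fatt = get_fatt(num_classes, stride=1, ctx=ctx)      # attention branch

x = nd.random.uniform(shape=(4, 1024, 14, 14), ctx=ctx)  # fake backbone feature map
print(fconv(x).shape)   # (4, 35, 14, 14), values in (0, 1) thanks to the sigmoid
print(fatt(x).shape)    # (4, 35, 14, 14), unnormalized attention logits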

"""attention.py"""
def attention_net_trainer(lr_scheduler, classes, args, stride, ctx):
    fconv_stg = get_conv2D(classes, stride, ctx)
    fatt_stg = get_fatt(classes, stride, ctx)

    trainer_conv, trainer_att = [], []
    if not args.test:
        trainer_conv = gluon.Trainer(fconv_stg.collect_params(), optimizer='sgd',
                                     optimizer_params={'lr_scheduler': lr_scheduler,
                                                       'momentum': args.mom,
                                                       'wd': args.wd})

        trainer_att = gluon.Trainer(fatt_stg.collect_params(), optimizer='sgd',
                                    optimizer_params={'lr_scheduler': lr_scheduler,
                                                      'momentum': args.mom,
                                                      'wd': args.wd})

    return fconv_stg, fatt_stg, trainer_conv, trainer_att
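This is how I imagine the helper gets called, one instance per tapped feature stage (a sketch; the exact argument names are my guess from the snippets later in this post, not copied from the repo):

# assuming `args` carries mom / wd / test / num_classes and the repo's stages 2-4
stage_attentions = {}
for stage in range(2, 5):
    fconv, fatt, tr_conv, tr_att = attention_net_trainer(lr_scheduler, args.num_classes, args, stride=1, ctx=ctx)
    stage_attentions['stage_' + str(stage)] = [fconv, fatt, tr_conv, tr_att]
# later: [0] is the confidence branch, [1] the attention branch,
# [2] and [3] their trainers -- which is how they are indexed further down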

2. Loss function: two parts

The main network just uses a (weighted) focal loss as usual:

# Loss and _reshape_like come from mxnet.gluon.loss
class WeightedFocal(Loss):  # weighted focal loss, used for the main network
    def __init__(self, from_sigmoid=False, weight=None, batch_axis=0, **kwargs):
        super(WeightedFocal, self).__init__(weight, batch_axis, **kwargs)
        self._from_sigmoid = from_sigmoid

    def hybrid_forward(self, F, pred, label, sample_weight=None):
        label = _reshape_like(F, label, pred)
        if not self._from_sigmoid:
            max_val = F.relu(-pred)
            loss = pred - pred * label + max_val + F.log(F.exp(-max_val) + F.exp(-pred - max_val))
        else:
            # ctx and batch_ratios are globals from the training script;
            # batch_ratios holds the positive-sample ratio (prior) of each attribute
            p = mx.nd.array(1 / (1 + nd.exp(-pred)), ctx=ctx)
            weights = nd.exp(label + (1 - label * 2) * batch_ratios)
            gamma = 2
            # focal terms: down-weight easy positives / easy negatives
            w_p, w_n = nd.power(1. - p, gamma), nd.power(p, gamma)
            loss = - (w_p * F.log(p + 1e-12) * label + w_n * F.log(1. - p + 1e-12) * (1. - label))
            loss *= weights
        return F.mean(loss, axis=self._batch_axis, exclude=True)
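Written out, the `from_sigmoid` branch above computes the following (my reconstruction from the code, with $p_j = \sigma(\text{pred}_j)$, $\gamma = 2$, and $r_j$ what I take the global `batch_ratios` to be: the positive-sample ratio of attribute $j$):

$$w_j = e^{\,y_j + (1 - 2y_j)\,r_j}, \qquad \mathcal{L}_j = -\,w_j\Big[(1 - p_j)^{\gamma}\, y_j \log p_j + p_j^{\gamma}\,(1 - y_j)\log(1 - p_j)\Big]$$

So a rare positive attribute ($y_j = 1$, small $r_j$) gets a weight close to $e$, while its frequent negatives get a weight close to $1$.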

The sub-network loss is a bit more involved:

I couldn't really follow the formulas on their own. What I gathered from the paper is that it does not simply use the final classification labels as the ground truth; instead it keeps a dynamic history of predictions and compares against it.

Formula part one:

Formula part two:
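Since the formula images did not survive here, this is my reconstruction of the two parts from the code below. Part one turns the spread of the last $T$ stored predictions ($T$ = `history_track`) into a weight; part two is a weighted binary cross-entropy:

$$\sigma_j = \sqrt{\operatorname{Var}\big(\hat p_j^{(1:T)}\big) + \frac{\operatorname{Var}\big(\hat p_j^{(1:T)}\big)^2}{T - 1}}, \qquad w_j = 1 + \sigma_j$$

$$\mathcal{L}^{att}_j = -\,w_j\Big[y_j \log p_j + (1 - y_j)\log(1 - p_j)\Big]$$

In other words, attributes whose predictions have been jumping around across epochs get penalized more heavily.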

Let's look at the code:

"""定义了attention结构的loss"""
class AttHistory(Loss):
    def __init__(self, from_sigmoid=False, weight=None, batch_axis=0, **kwargs):
        super(AttHistory, self).__init__(weight, batch_axis, **kwargs)
        self._from_sigmoid = from_sigmoid

    def hybrid_forward(self, F, pred, label, sample_weight=None):
        label = _reshape_like(F, label, pred)
        if not self._from_sigmoid:
            max_val = F.relu(-pred)
            loss = pred - pred * label + max_val + F.log(F.exp(-max_val) + F.exp(-pred - max_val))
        else:
            p = mx.nd.array(1 / (1 + nd.exp(-pred)), ctx=ctx)
"""训练过程中,当训练epoch大于指定次数时,改变loss的计算方式"""
            if epoch >= history_track and not args.test:
"""公式部分一"""
                p_hist = prediction_history[:, batch_id * args.batch_size: (batch_id + 1) * args.batch_size, :]
                p_std = (np.var(p_hist, axis=0) + (np.var(p_hist, axis=0)**2)/(p_hist.shape[0] - 1))**.5
                std_weights = nd.array(1 + p_std, ctx=ctx)
"""公式部分二"""
                loss = - std_weights * (F.log(p + 1e-12) * label + F.log(1. - p + 1e-12) * (1. - label))
            else:
"""没有使用动态更新前的loss计算公式"""
                loss = - (F.log(p + 1e-12) * label + F.log(1. - p + 1e-12) * (1. - label))
        return F.mean(loss, axis=self._batch_axis, exclude=True)

"""首先设定两个值:history_track--在上面类的定义中用到,start_history --在下面动态存储结果时用到了"""
history_track, start_history = 5, 2

"""然后设定存储预测结果的空间--prediction_history:[5,22968,C],存储五次"""
prediction_history = nd.zeros((history_track, 22968, args.num_classes), ctx=ctx)

....
....
....



"""将每个尺度得到的结果相加平均作为最终的预测"""

all_stages = {}
# one pass per tapped feature stage; pick the corresponding backbone feature map
for stage in range(2, 5):
    if stage == 2:
        inp_feats = net_features_stg3_v1
    elif stage == 3:
        inp_feats = net_features_stg3
    else:
        inp_feats = net_features_stg4

    features = stage_attentions['stage_' + str(stage)][0](inp_feats)    # confidence branch (sigmoid)
    output_att = stage_attentions['stage_' + str(stage)][1](inp_feats)  # attention branch (logits)

    # spatial softmax: normalize each attribute's attention map over the H*W positions
    temp_f = nd.reshape(output_att, (output_att.shape[0] * output_att.shape[1], output_att.shape[2] * output_att.shape[3]))
    spatial_attention = nd.reshape(nd.softmax(temp_f), (output_att.shape[0], output_att.shape[1], output_att.shape[2], output_att.shape[3]))

    attention_features = spatial_attention * features
    all_stages['stage_' + str(stage)] = stages['stage_' + str(stage)](attention_features)


# average the three attention outputs and the main-network output (hence the 0.25); expit is the sigmoid
predictions = expit(.25 * (sum(all_stages.values()) + output).asnumpy())


...
...
...

"""当epoch的次数大于我们设定开始存储结果的次数,就更新预测结果存储空间"""
if epoch >= start_history:
"""prediction_history[1:,]=prediction_history[0:-1]--删掉最早存储的内容,给将要存储的内容腾出空间"""
                prediction_history[1:, batch_id * args.batch_size:(batch_id + 1) * args.batch_size] = prediction_history[0:-1, batch_id * args.batch_size:(batch_id + 1) * args.batch_size]
"""存入当前epoch的预测结果"""
                prediction_history[0, batch_id * args.batch_size:(batch_id + 1) * args.batch_size] = predictions

At the end, all the loss terms are summed and used to update the networks:

.....
.....
        # sum the attention losses over the three stages
        if stage == 2:
            loss = attention_loss(all_stages['stage_' + str(stage)], label)
        else:
            loss = loss + attention_loss(all_stages['stage_' + str(stage)], label)

    # add the main-network (weighted focal) loss
    if not args.finetune:
        loss_original = sigmoid_loss(output, label)
        loss = loss + loss_original

loss.backward()

# one optimizer step per stage classifier and per attention branch
for stage in range(2, 5):
    stage_trainers['stage_' + str(stage)].step(data.shape[0])
    stage_attentions['stage_' + str(stage)][2].step(data.shape[0])
    stage_attentions['stage_' + str(stage)][3].step(data.shape[0])

if not args.finetune:
    trainer.step(data.shape[0])

curr_loss = nd.mean(loss).asscalar()
# exponentially smoothed training loss, for logging
moving_loss_tr = (curr_loss if ((batch_id == 0) and (epoch == 0))
                  else (1 - smoothing_constant) * moving_loss_tr + smoothing_constant * curr_loss)

The final output is the mean of the predictions from the sub-networks and the main network.
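In one formula (my notation; $z_2, z_3, z_4$ are the pre-sigmoid outputs of the three attention stages, $z_{main}$ the main network's output, $\sigma$ the sigmoid/expit):

$$\hat p = \sigma\Big(\tfrac{1}{4}\big(z_{main} + z_2 + z_3 + z_4\big)\Big)$$

which is exactly what the `expit(.25 * (...))` line in the aggregation snippet above computes.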
