[Paper Reading Notes] "Deep Imbalanced Attribute Classification using Visual Attention Aggregation" - Pedestrian Attributes

Contents

1. Network structure

2. Loss function: two parts


In one sentence: the paper proposes a portable (plug-in) network architecture for pedestrian attribute recognition, together with a way of computing the loss.

[It draws on multi-scale features and the focal loss idea]

I have never used MXNet, so if anything below is wrong please let me know, thanks.

GitHub:https://github.com/cvcode18/imbalanced_learning

论文:Deep Imbalanced Attribute Classification using Visual Attention Aggregation

1. Network structure:

The network structure is shown below: it consists of one main network and several sub-networks (the number of sub-networks matches how many feature levels you tap into; the figure shows two, but the source code uses three).

Main network: presumably any backbone works; the source code uses a ResNet structure (they also ran experiments with a DenseNet backbone).

Sub-network: it uses an attention mechanism, structured as follows:

You can see there are two branches: one produces per-attribute weight (confidence) coefficients, the other is spatially normalized (a softmax over the spatial locations).

Source code definitions:

I. The training part
"""model.py"""
def get_conv2D(num_classes, stride, ctx):
    net = nn.Sequential()
    with net.name_scope():
        net.add(nn.Conv2D(channels=num_classes, kernel_size=1, strides=stride))
        net.add(nn.Activation('sigmoid'))
    net.collect_params().initialize(mx.init.Xavier(rnd_type='gaussian', factor_type="in", magnitude=2), ctx=ctx)
    return net

def get_fatt(num_classes, stride, ctx):
    # attention branch: 1x1 and 3x3 conv blocks, then a 1x1 conv producing one attention map per attribute
    net = nn.Sequential()
    with net.name_scope():
        net.add(nn.Conv2D(channels=512, kernel_size=1))
        net.add(nn.BatchNorm())
        net.add(nn.Activation('relu'))
        net.add(nn.Conv2D(channels=512, kernel_size=3, padding=1))
        net.add(nn.BatchNorm())
        net.add(nn.Activation('relu'))
        # net.add(nn.Conv2D(channels=512, kernel_size=3, padding=1))
        # net.add(nn.BatchNorm())
        # net.add(nn.Activation('relu'))
        net.add(nn.Conv2D(channels=num_classes, kernel_size=1, strides=stride))
    net.collect_params().initialize(mx.init.Xavier(rnd_type='gaussian', factor_type="in", magnitude=2), ctx=ctx)
    return net
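A quick hypothetical usage sketch of these two branch builders (the batch size, 1024-channel input, 14x14 spatial size and the 35 attributes are made-up example values, not taken from the repo):

import mxnet as mx
from mxnet import nd

ctx = mx.cpu()
num_classes = 35                                     # hypothetical number of attributes

fconv = get_conv2D(num_classes, stride=1, ctx=ctx)   # confidence branch
fatt = get_fatt(num_classes, stride=1, ctx=ctx)      # attention branch

x = nd.random.uniform(shape=(4, 1024, 14, 14), ctx=ctx)  # fake backbone feature map
print(fconv(x).shape)   # (4, 35, 14, 14), values in (0, 1) thanks to the sigmoid
print(fatt(x).shape)    # (4, 35, 14, 14), unnormalized attention logits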

"""attention.py"""
def attention_net_trainer(lr_scheduler, classes, args, stride, ctx):
    fconv_stg = get_conv2D(classes, stride, ctx)
    fatt_stg = get_fatt(classes, stride, ctx)

    trainer_conv, trainer_att = [], []
    if not args.test:
        trainer_conv = gluon.Trainer(fconv_stg.collect_params(), optimizer='sgd',
                                     optimizer_params={'lr_scheduler': lr_scheduler,
                                                       'momentum': args.mom,
                                                       'wd': args.wd})

        trainer_att = gluon.Trainer(fatt_stg.collect_params(), optimizer='sgd',
                                    optimizer_params={'lr_scheduler': lr_scheduler,
                                                      'momentum': args.mom,
                                                      'wd': args.wd})

    return fconv_stg, fatt_stg, trainer_conv, trainer_att
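This is how I imagine the helper gets called, one instance per tapped feature stage (a sketch; the exact argument names are my guess from the snippets later in this post, not copied from the repo):

# assuming `args` carries mom / wd / test / num_classes and the repo's stages 2-4
stage_attentions = {}
for stage in range(2, 5):
    fconv, fatt, tr_conv, tr_att = attention_net_trainer(lr_scheduler, args.num_classes, args, stride=1, ctx=ctx)
    stage_attentions['stage_' + str(stage)] = [fconv, fatt, tr_conv, tr_att]
# later: [0] is the confidence branch, [1] the attention branch,
# [2] and [3] their trainers -- which is how they are indexed further down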

2. Loss function: two parts

The main network just uses a (weighted) focal loss as usual:

# Loss and _reshape_like come from mxnet.gluon.loss
class WeightedFocal(Loss):  # weighted focal loss, used for the main network
    def __init__(self, from_sigmoid=False, weight=None, batch_axis=0, **kwargs):
        super(WeightedFocal, self).__init__(weight, batch_axis, **kwargs)
        self._from_sigmoid = from_sigmoid

    def hybrid_forward(self, F, pred, label, sample_weight=None):
        label = _reshape_like(F, label, pred)
        if not self._from_sigmoid:
            max_val = F.relu(-pred)
            loss = pred - pred * label + max_val + F.log(F.exp(-max_val) + F.exp(-pred - max_val))
        else:
            # ctx and batch_ratios are globals from the training script;
            # batch_ratios holds the positive-sample ratio (prior) of each attribute
            p = mx.nd.array(1 / (1 + nd.exp(-pred)), ctx=ctx)
            weights = nd.exp(label + (1 - label * 2) * batch_ratios)
            gamma = 2
            # focal terms: down-weight easy positives / easy negatives
            w_p, w_n = nd.power(1. - p, gamma), nd.power(p, gamma)
            loss = - (w_p * F.log(p + 1e-12) * label + w_n * F.log(1. - p + 1e-12) * (1. - label))
            loss *= weights
        return F.mean(loss, axis=self._batch_axis, exclude=True)
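Written out, the `from_sigmoid` branch above computes the following (my reconstruction from the code, with $p_j = \sigma(\text{pred}_j)$, $\gamma = 2$, and $r_j$ what I take the global `batch_ratios` to be: the positive-sample ratio of attribute $j$):

$$w_j = e^{\,y_j + (1 - 2y_j)\,r_j}, \qquad \mathcal{L}_j = -\,w_j\Big[(1 - p_j)^{\gamma}\, y_j \log p_j + p_j^{\gamma}\,(1 - y_j)\log(1 - p_j)\Big]$$

So a rare positive attribute ($y_j = 1$, small $r_j$) gets a weight close to $e$, while its frequent negatives get a weight close to $1$.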

The sub-network loss is a bit more involved:

I couldn't really follow the formulas on their own. What I gathered from the paper is that it does not simply use the final classification labels as the ground truth; instead it keeps a dynamic history of predictions and compares against it.

Formula part one:

Formula part two:
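Since the formula images did not survive here, this is my reconstruction of the two parts from the code below. Part one turns the spread of the last $T$ stored predictions ($T$ = `history_track`) into a weight; part two is a weighted binary cross-entropy:

$$\sigma_j = \sqrt{\operatorname{Var}\big(\hat p_j^{(1:T)}\big) + \frac{\operatorname{Var}\big(\hat p_j^{(1:T)}\big)^2}{T - 1}}, \qquad w_j = 1 + \sigma_j$$

$$\mathcal{L}^{att}_j = -\,w_j\Big[y_j \log p_j + (1 - y_j)\log(1 - p_j)\Big]$$

In other words, attributes whose predictions have been jumping around across epochs get penalized more heavily.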

Let's look at the code:

"""定义了attention结构的loss"""
class AttHistory(Loss):
    def __init__(self, from_sigmoid=False, weight=None, batch_axis=0, **kwargs):
        super(AttHistory, self).__init__(weight, batch_axis, **kwargs)
        self._from_sigmoid = from_sigmoid

    def hybrid_forward(self, F, pred, label, sample_weight=None):
        label = _reshape_like(F, label, pred)
        if not self._from_sigmoid:
            max_val = F.relu(-pred)
            loss = pred - pred * label + max_val + F.log(F.exp(-max_val) + F.exp(-pred - max_val))
        else:
            p = mx.nd.array(1 / (1 + nd.exp(-pred)), ctx=ctx)
"""训练过程中,当训练epoch大于指定次数时,改变loss的计算方式"""
            if epoch >= history_track and not args.test:
"""公式部分一"""
                p_hist = prediction_history[:, batch_id * args.batch_size: (batch_id + 1) * args.batch_size, :]
                p_std = (np.var(p_hist, axis=0) + (np.var(p_hist, axis=0)**2)/(p_hist.shape[0] - 1))**.5
                std_weights = nd.array(1 + p_std, ctx=ctx)
"""公式部分二"""
                loss = - std_weights * (F.log(p + 1e-12) * label + F.log(1. - p + 1e-12) * (1. - label))
            else:
"""没有使用动态更新前的loss计算公式"""
                loss = - (F.log(p + 1e-12) * label + F.log(1. - p + 1e-12) * (1. - label))
        return F.mean(loss, axis=self._batch_axis, exclude=True)

"""首先设定两个值:history_track--在上面类的定义中用到,start_history --在下面动态存储结果时用到了"""
history_track, start_history = 5, 2

"""然后设定存储预测结果的空间--prediction_history:[5,22968,C],存储五次"""
prediction_history = nd.zeros((history_track, 22968, args.num_classes), ctx=ctx)

....
....
....



"""将每个尺度得到的结果相加平均作为最终的预测"""

all_stages = {}
# one pass per tapped feature stage; pick the corresponding backbone feature map
for stage in range(2, 5):
    if stage == 2:
        inp_feats = net_features_stg3_v1
    elif stage == 3:
        inp_feats = net_features_stg3
    else:
        inp_feats = net_features_stg4

    features = stage_attentions['stage_' + str(stage)][0](inp_feats)    # confidence branch (sigmoid)
    output_att = stage_attentions['stage_' + str(stage)][1](inp_feats)  # attention branch (logits)

    # spatial softmax: normalize each attribute's attention map over the H*W positions
    temp_f = nd.reshape(output_att, (output_att.shape[0] * output_att.shape[1], output_att.shape[2] * output_att.shape[3]))
    spatial_attention = nd.reshape(nd.softmax(temp_f), (output_att.shape[0], output_att.shape[1], output_att.shape[2], output_att.shape[3]))

    attention_features = spatial_attention * features
    all_stages['stage_' + str(stage)] = stages['stage_' + str(stage)](attention_features)


# average the three attention outputs and the main-network output (hence the 0.25); expit is the sigmoid
predictions = expit(.25 * (sum(all_stages.values()) + output).asnumpy())


...
...
...

"""当epoch的次数大于我们设定开始存储结果的次数,就更新预测结果存储空间"""
if epoch >= start_history:
"""prediction_history[1:,]=prediction_history[0:-1]--删掉最早存储的内容,给将要存储的内容腾出空间"""
                prediction_history[1:, batch_id * args.batch_size:(batch_id + 1) * args.batch_size] = prediction_history[0:-1, batch_id * args.batch_size:(batch_id + 1) * args.batch_size]
"""存入当前epoch的预测结果"""
                prediction_history[0, batch_id * args.batch_size:(batch_id + 1) * args.batch_size] = predictions

At the end, all the loss terms are summed and used to update the networks:

.....
.....
        # sum the attention losses over the three stages
        if stage == 2:
            loss = attention_loss(all_stages['stage_' + str(stage)], label)
        else:
            loss = loss + attention_loss(all_stages['stage_' + str(stage)], label)

    # add the main-network (weighted focal) loss
    if not args.finetune:
        loss_original = sigmoid_loss(output, label)
        loss = loss + loss_original

loss.backward()

# one optimizer step per stage classifier and per attention branch
for stage in range(2, 5):
    stage_trainers['stage_' + str(stage)].step(data.shape[0])
    stage_attentions['stage_' + str(stage)][2].step(data.shape[0])
    stage_attentions['stage_' + str(stage)][3].step(data.shape[0])

if not args.finetune:
    trainer.step(data.shape[0])

curr_loss = nd.mean(loss).asscalar()
# exponentially smoothed training loss, for logging
moving_loss_tr = (curr_loss if ((batch_id == 0) and (epoch == 0))
                  else (1 - smoothing_constant) * moving_loss_tr + smoothing_constant * curr_loss)

The final output is the mean of the predictions from the sub-networks and the main network.
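In one formula (my notation; $z_2, z_3, z_4$ are the pre-sigmoid outputs of the three attention stages, $z_{main}$ the main network's output, $\sigma$ the sigmoid/expit):

$$\hat p = \sigma\Big(\tfrac{1}{4}\big(z_{main} + z_2 + z_3 + z_4\big)\Big)$$

which is exactly what the `expit(.25 * (...))` line in the aggregation snippet above computes.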
