Self-Supervised Learning for Semi-Supervised Temporal Action Proposal -- CVPR 2021 Paper Notes

In this paper, the authors use self-supervised learning to improve semi-supervised temporal action proposal generation. Two branches are introduced:

  • temporal-aware semi-supervised branch
  • relation-aware self-supervised branch

semi-supervised branch:

  • Mean teacher
  • temporal feature shift
  • temporal feature flip

self-supervised branch:

  • masked feature reconstruction
  • clip-order prediction

 

Temporal-aware Semi-Supervised Branch

Teacher-student:

The teacher model is updated with an exponential moving average (EMA) of the student's weights.

For labeled data, a supervised loss is used.

For unlabeled data, a consistency loss (an L2 loss between the teacher's and the student's predictions) is used.
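The teacher update and the unlabeled-data loss can be sketched as follows (a minimal NumPy sketch; the `alpha` decay value and the plain-list parameter representation are illustrative, not the paper's exact implementation):

```python
import numpy as np

def ema_update(teacher_params, student_params, alpha=0.999):
    """Mean-teacher update: each teacher weight becomes an exponential
    moving average of the corresponding student weight."""
    return [alpha * t + (1.0 - alpha) * s
            for t, s in zip(teacher_params, student_params)]

def consistency_loss(student_out, teacher_out):
    """L2 consistency loss between student and teacher predictions on
    unlabeled data."""
    diff = np.asarray(student_out) - np.asarray(teacher_out)
    return float(np.mean(diff ** 2))
```

In practice `alpha` is close to 1, so the teacher changes slowly and provides a stable target for the consistency loss.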

 

Feature perturbations are applied to the student's input features:

temporal feature shift: on the feature map, randomly select some channels and shift them along the temporal dimension in both directions. The authors randomly select μ channels (μ is a hyperparameter); μ/2 of the channels are shifted forward in time and μ/2 backward.
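A minimal sketch of this perturbation on a (C, T) feature map (the one-step shift size and zero padding at the boundary are my assumptions; the notes only specify bidirectional shifting of μ random channels):

```python
import numpy as np

def temporal_shift(x, mu=4, rng=None):
    """Randomly pick mu channels of a (C, T) feature map; shift half of
    them one step forward in time and half one step backward, zero-padding
    the vacated boundary position."""
    rng = rng or np.random.default_rng(0)
    C, T = x.shape
    out = x.copy()
    chans = rng.choice(C, size=mu, replace=False)
    fwd, bwd = chans[: mu // 2], chans[mu // 2:]
    out[fwd, 1:] = x[fwd, :-1]   # shift forward in time
    out[fwd, 0] = 0.0
    out[bwd, :-1] = x[bwd, 1:]   # shift backward in time
    out[bwd, -1] = 0.0
    return out
```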

temporal feature flip: flip the features horizontally along the temporal axis. (The authors state that this makes it easy to establish a one-to-one correspondence between the original proposals and the flipped video features. The reason is that flipping is a bijection on time steps: a proposal [s, e] on a sequence of length T maps exactly to [T - e, T - s] on the flipped sequence.)
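The correspondence can be made concrete with a short sketch (the half-open [start, end) boundary convention is an assumption on my part):

```python
import numpy as np

def temporal_flip(x):
    """Flip a (C, T) feature map along the temporal axis."""
    return x[:, ::-1]

def flip_proposal(start, end, T):
    """Map a proposal [start, end) on the original sequence to its
    location on the flipped sequence: time step t becomes T - 1 - t,
    so the interval [start, end) becomes [T - end, T - start)."""
    return T - end, T - start
```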

 

Relation-aware Self-Supervised Branch

Masked feature reconstruction: randomly mask the features at some time steps, then reconstruct them.
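A minimal sketch of the masking step (the mask ratio, zero-filling, and restricting the L2 loss to masked positions are assumptions about details the notes do not specify):

```python
import numpy as np

def mask_features(x, mask_ratio=0.25, rng=None):
    """Zero out a random subset of time steps in a (C, T) feature map.
    Returns the masked map and the boolean mask (True = masked), so a
    reconstruction loss can be computed at the masked positions."""
    rng = rng or np.random.default_rng(0)
    C, T = x.shape
    mask = rng.random(T) < mask_ratio
    out = x.copy()
    out[:, mask] = 0.0
    return out, mask

def reconstruction_loss(pred, target, mask):
    """L2 reconstruction loss restricted to the masked time steps."""
    return float(np.mean((pred[:, mask] - target[:, mask]) ** 2))
```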

Clip-order prediction: predict the correct temporal order from a randomly shuffled feature map. In this paper, the authors take two randomly shuffled feature sequences and train the network to re-order them.
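One way to realize this pretext task is to swap two clips and use their original positions as the ordering target (a sketch; the clip length and the swap-two-clips sampling scheme are my assumptions):

```python
import numpy as np

def shuffle_two_clips(x, clip_len=4, rng=None):
    """Cut two non-overlapping, clip_len-aligned clips from a (C, T)
    feature map and swap them. Returns the shuffled map and the two clip
    start indices, which serve as the order-prediction target."""
    rng = rng or np.random.default_rng(0)
    C, T = x.shape
    starts = rng.choice(T // clip_len, size=2, replace=False) * clip_len
    a, b = sorted(starts)
    out = x.copy()
    out[:, a:a + clip_len], out[:, b:b + clip_len] = (
        x[:, b:b + clip_len], x[:, a:a + clip_len])
    return out, (a, b)
```

Because the swap is its own inverse, applying the same swap to the shuffled map restores the original order, which is what the prediction head is trained to recover.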

 

Experiments

Comparison with semi-supervised methods

 

Ablation

Here, "-F" denotes the variant with component F removed.

 
