[Paper Reading Notes] (2021 CVPR) 3D Human Action Representation Learning via Cross-View Consistency Pursuit

Preface

The method section has a lot of formula variables, and retyping them all in the editor is tedious, so I took the shortcut of exporting that part of my notes as images. I have also reorganized the material in the order that makes the most sense to me.


3D Human Action Representation Learning via Cross-View Consistency Pursuit

(2021 CVPR)

Linguo Li, Minsi Wang, Bingbing Ni, Hang Wang, Jiancheng Yang, Wenjun Zhang

Notes

1. Contributions

We propose CrosSCLR, a cross-view contrastive learning framework for skeleton-based action representation. First, we develop Contrastive Learning for Skeleton-based action Representation (SkeletonCLR) to learn the single-view representations of skeleton data. Then, we use parallel SkeletonCLR models and CVC-KM to excavate useful samples across views, enabling the model to capture a more comprehensive representation in an unsupervised manner. We evaluate our model on 3D skeleton datasets, e.g., NTU-RGB+D 60/120, and achieve remarkable results under unsupervised settings.

 


2. Method

2.1 SkeletonCLR

It is a memory-augmented contrastive learning method for skeleton representation, which treats different augmentations of one sample as positive pairs and all other samples as negatives. In each training step, the batch embeddings are pushed into a first-in-first-out memory bank to avoid redundant re-computation, and they serve as negative samples in subsequent steps.
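As a concrete illustration of the mechanism above, here is a minimal NumPy sketch of an InfoNCE loss with a FIFO memory bank of negatives. The dimensions, the temperature 0.07, and the function names are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def l2norm(x):
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def infonce_loss(z, z_hat, memory, tau=0.07):
    """InfoNCE: z and z_hat are embeddings of two augmentations of the
    same sample (the positive pair); rows of `memory` act as negatives."""
    pos = np.exp(np.dot(z, z_hat) / tau)
    neg = np.exp(memory @ z / tau).sum()
    return -np.log(pos / (pos + neg))

def enqueue(memory, batch):
    """FIFO memory-bank update: newest embeddings in, oldest out."""
    return np.concatenate([memory, batch], axis=0)[len(batch):]

# Toy example: 128-d embeddings, memory bank of 1024 negatives.
z      = l2norm(rng.normal(size=128))
z_hat  = l2norm(z + 0.05 * rng.normal(size=128))  # "another augmentation"
memory = l2norm(rng.normal(size=(1024, 128)))

loss = infonce_loss(z, z_hat, memory)
memory = enqueue(memory, np.stack([z]))           # stored for later steps
```

Because the memory bank is updated by enqueueing rather than recomputed, negatives come almost for free at each step, which is the redundancy the paper's memory design avoids.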

 

2.2 High-confidence Knowledge Mining

We define the similarity set S between the embedding z and the memory bank M = {m_i} as

$$S = \{\, s_i \mid s_i = z \cdot m_i,\ i = 1, \dots, M \,\}$$

Then, we additionally set the most similar embeddings as positives to make the representation more clustered:

$$\mathcal{L} = -\log \frac{\exp(z \cdot \hat{z}/\tau) + \sum_{i \in N_{+}} \exp(z \cdot m_i/\tau)}{\exp(z \cdot \hat{z}/\tau) + \sum_{i=1}^{M} \exp(z \cdot m_i/\tau)}$$

where Γ = Top-K is the function that selects the indices of the top-K most similar embeddings, and N₊ = Γ(S) is their index set in the memory bank.
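The mining step above can be sketched in NumPy. The function names, the embedding sizes, and the planted near-neighbour are illustrative assumptions, not the paper's exact implementation:

```python
import numpy as np

rng = np.random.default_rng(1)
l2 = lambda x: x / np.linalg.norm(x, axis=-1, keepdims=True)

def topk_positive_set(z, memory, k=5):
    """Build the similarity set S = {z . m_i} over the memory bank and
    return N+, the index set of the top-K most similar embeddings."""
    sims = memory @ z
    return set(np.argsort(sims)[-k:].tolist())

def mined_infonce(z, z_hat, memory, n_pos, tau=0.07):
    """InfoNCE variant where the mined neighbours in N+ also count as
    positives, pulling similar samples into the same cluster."""
    exp_all = np.exp(memory @ z / tau)
    anchor = np.exp(np.dot(z, z_hat) / tau)
    pos = anchor + exp_all[list(n_pos)].sum()
    return -np.log(pos / (anchor + exp_all.sum()))

z = l2(rng.normal(size=64))
memory = l2(rng.normal(size=(256, 64)))
memory[42] = l2(z + 0.01 * rng.normal(size=64))   # plant a near neighbour
n_pos = topk_positive_set(z, memory)
loss = mined_infonce(z, l2(z + 0.05 * rng.normal(size=64)), memory, n_pos)
```

The planted near-duplicate at index 42 is far more similar to z than any random embedding, so it lands in N₊ and is moved to the numerator instead of being repelled as a negative.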

2.3 Cross-View Consistency Learning

The views of skeleton data can be joint, motion, bone, and motion of bone. We design cross-view consistency learning, which not only mines high-confidence positive samples from a complementary view but also keeps the embedding context consistent across views. It contains two aspects: cross-view knowledge mining, in which the high-confidence positive set N₊ mined in one view guides the contrastive loss of another, and embedding-context consistency, which encourages each sample's similarity distribution over the memory bank to agree across views.

 
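The four views named above can all be derived from raw joint coordinates. A minimal sketch, assuming clips shaped (C, T, V) — coordinates, frames, joints — and an illustrative chain-style parent list rather than the exact NTU skeleton edges:

```python
import numpy as np

# Hypothetical skeleton clip: C=3 coordinates, T=50 frames, V=25 joints.
rng = np.random.default_rng(2)
joint = rng.normal(size=(3, 50, 25))

# Motion view: temporal difference of joint coordinates.
motion = np.zeros_like(joint)
motion[:, 1:, :] = joint[:, 1:, :] - joint[:, :-1, :]

# Bone view: vector from each joint's parent to the joint.
# `parents` is an illustrative parent list, not the NTU edge set;
# the root points to itself, giving a zero bone vector.
parents = np.arange(25) - 1
parents[0] = 0
bone = joint - joint[:, :, parents]

# Motion of bone: temporal difference of the bone vectors.
bone_motion = np.zeros_like(bone)
bone_motion[:, 1:, :] = bone[:, 1:, :] - bone[:, :-1, :]
```

All four views share the same tensor shape, so the same SkeletonCLR encoder architecture can be trained on each of them in parallel.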

2.4 Learning CrosSCLR

For more views, CrosSCLR has the following objective:

$$\mathcal{L} = \sum_{u=1}^{U} \sum_{v \neq u} \mathcal{L}_{uv}$$

where U is the number of views and v ≠ u. In the early training stage, the model is not stable and strong enough to provide reliable cross-view knowledge without the supervision of labels. Since this unreliable information may lead the training astray, cross-view communication should not be enabled too early. We perform two-stage training for CrosSCLR:

1) Each view of the model is individually trained with Equation (1), without cross-view communication.

2) Then the model can supply high-confidence knowledge, so the loss function is replaced with Equation (8), starting cross-view knowledge mining.
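The two-stage schedule can be sketched as a simple switch on the epoch counter; the warm-up length and the function name are illustrative values, not the paper's setting:

```python
def select_loss(epoch, warmup_epochs=150):
    """Two-stage training: each view first optimizes its own SkeletonCLR
    loss (Equation (1)); once representations are reliable, training
    switches to the cross-view objective (Equation (8))."""
    if epoch < warmup_epochs:
        return "skeletonclr"   # stage 1: views trained independently
    return "crossclr"          # stage 2: cross-view knowledge mining on
```

Delaying the switch matters because the mined positive sets N₊ are only trustworthy after the single-view encoders have converged to a reasonable embedding space.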

 
