Paper Reading: Weakly-supervised Action Localization with Background Modeling

Latest recommended article published 2024-09-01 17:03:47

仙草冻奶茶

Category: Papers

Article link: https://blog.csdn.net/qq_40760171/article/details/108320498

Summary

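The method uses an attention model to separate foreground from background frames, so that the appearance of both is modeled explicitly.
It combines bottom-up, class-agnostic attention modules with top-down, class-specific activation maps, and uses the latter as a form of self-supervision.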

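Video-level foreground feature $X_{fg}$ (attention-weighted sum of the frame features $x_t$):

Here $\lambda_t \in [0,1]$ and $\lambda_t = \Omega(x_t)$, where $\Omega$ consists of two fully connected (fc) layers, the first followed by ReLU and the second by a sigmoid. Since each frame feature $x_t \in \mathbb{R}^d$, the resulting $X_{fg} \in \mathbb{R}^d$.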
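The attention function and the weighted pooling described above can be sketched in plain Python. This is only an illustration: the function names (`linear`, `attention`, `foreground_feature`), the layer shapes, and the division by the number of frames $T$ are assumptions of this sketch, not details taken from the note.

```python
import math

def linear(weights, bias, x):
    """Fully connected layer: one output per row of `weights`."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]

def attention(x, w1, b1, w2, b2):
    """Omega(x_t): fc + ReLU, then fc + sigmoid, giving a scalar between 0 and 1."""
    hidden = [max(0.0, h) for h in linear(w1, b1, x)]
    logit = linear(w2, b2, hidden)[0]  # second layer has a single output unit
    return 1.0 / (1.0 + math.exp(-logit))

def foreground_feature(frames, lambdas):
    """Attention-weighted pooling of T frame features into one d-dim vector."""
    T, d = len(frames), len(frames[0])
    # Dividing by T is an assumption; the note only says "weighted sum".
    return [sum(lam * x[j] for lam, x in zip(lambdas, frames)) / T
            for j in range(d)]
```

For example, with two frames `[[1.0, 2.0], [3.0, 4.0]]` and attention values `[1.0, 0.0]`, only the first frame contributes to the pooled feature.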
Video-level prediction:

where $w_c \in \mathbb{R}^d$.
Loss:
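A minimal sketch of the video-level prediction and its loss, assuming a softmax over the class scores $w_c^\top X_{fg}$ and a negative log-likelihood loss on the ground-truth class. The function names and the choice of softmax cross-entropy are assumptions made for illustration.

```python
import math

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def video_prediction(x_pooled, class_weights):
    """Class probabilities for a pooled feature; class_weights[c] plays the role of w_c."""
    logits = [sum(w * v for w, v in zip(w_c, x_pooled)) for w_c in class_weights]
    return softmax(logits)

def foreground_loss(p_fg, label):
    """Negative log-likelihood of the ground-truth action class."""
    return -math.log(p_fg[label])
```

With all-zero class weights every class gets the same probability, so the loss for three classes equals `math.log(3)`, a convenient sanity check.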
Background-Aware Loss
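Background feature and background prediction:

Loss (it encourages the prediction at the background index to be close to 1, i.e. it encourages the parameters $w$ to learn to recognize background):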
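The background branch can be sketched the same way. The sketch assumes that the background feature pools frames with the complementary weights $1 - \lambda_t$, that the background class sits at index 0, and that the loss is the negative log-probability of that index. All three choices, and the function names, are assumptions of this illustration.

```python
import math

def background_feature(frames, lambdas):
    """Pool frame features with complementary attention weights (1 - lambda_t)."""
    T, d = len(frames), len(frames[0])
    return [sum((1.0 - lam) * x[j] for lam, x in zip(lambdas, frames)) / T
            for j in range(d)]

def background_loss(x_bg, class_weights, bg_index=0):
    """Negative log-probability of the background class for the pooled feature."""
    logits = [sum(w * v for w, v in zip(w_c, x_bg)) for w_c in class_weights]
    m = max(logits)
    log_norm = m + math.log(sum(math.exp(z - m) for z in logits))  # log-sum-exp
    return -(logits[bg_index] - log_norm)
```

The loss approaches 0 as the background score dominates the other class scores, which matches the stated goal of pushing the background prediction toward 1.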
Self-guided Attention Loss
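Motivation: $\lambda_t$ is a bottom-up, class-agnostic attention, so it may respond to generic cues such as large body motion rather than to specific actions. The temporal class activation map (TCAM), in contrast, provides top-down attentional cues. The class-specific TCAM attention map is therefore used as a self-supervision signal to refine $\lambda_t$: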
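One way to picture this self-supervision is the sketch below. It assumes the top-down targets come from per-frame softmax scores (the probability mass on the video's labeled classes for foreground, the probability at index 0 for background) and that $\lambda_t$ is pulled toward them with an L1 penalty. The exact form of the targets, any temporal smoothing, and the function names are assumptions rather than details given in the note.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def tcam_targets(frames, class_weights, video_labels, bg_index=0):
    """Per-frame top-down attention targets derived from class activations."""
    fg_hat, bg_hat = [], []
    for x in frames:
        probs = softmax([sum(w * v for w, v in zip(w_c, x)) for w_c in class_weights])
        fg_hat.append(sum(probs[c] for c in video_labels))  # mass on labeled classes
        bg_hat.append(probs[bg_index])                      # mass on background
    return fg_hat, bg_hat

def self_guided_loss(lambdas, fg_hat, bg_hat):
    """Mean L1 gap between bottom-up attention and the top-down targets."""
    T = len(lambdas)
    return sum(abs(lam - f) + abs(lam - (1.0 - b))
               for lam, f, b in zip(lambdas, fg_hat, bg_hat)) / T
```

The loss is zero when the bottom-up attention already agrees with both targets, and grows as $\lambda_t$ fires on frames that the classifier does not attribute to the labeled action.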
Foreground-background Clustering Loss
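This is a bottom-up loss defined entirely by the video features and the attention $\lambda$; it encourages the classifier to respond strongly to either the foreground or the background feature.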
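A rough sketch of such a clustering loss follows, assuming a separate two-way (foreground vs. background) linear classifier with hypothetical weights `u_fg` and `u_bg`, applied to the attention-pooled foreground feature and the complement-pooled background feature. The two-way softmax and all names here are assumptions chosen for illustration.

```python
import math

def pooled(frames, weights):
    """Weighted average of frame features."""
    T, d = len(frames), len(frames[0])
    return [sum(w * x[j] for w, x in zip(weights, frames)) / T for j in range(d)]

def clustering_loss(frames, lambdas, u_fg, u_bg):
    """Push the fg feature toward the fg cluster and the bg feature toward the bg cluster."""
    z_fg = pooled(frames, lambdas)
    z_bg = pooled(frames, [1.0 - lam for lam in lambdas])

    def log_prob(z, target):
        # Two-way softmax over [foreground score, background score].
        logits = [sum(a * b for a, b in zip(u_fg, z)),
                  sum(a * b for a, b in zip(u_bg, z))]
        m = max(logits)
        return logits[target] - (m + math.log(sum(math.exp(l - m) for l in logits)))

    return -(log_prob(z_fg, 0) + log_prob(z_bg, 1))
```

No class labels enter this loss: it depends only on the features and on $\lambda$, which is what makes it bottom-up.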
Total loss
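Assuming the four losses above are combined as a weighted sum, the total objective would take a form like the following. The symbols $L_{fg}$, $L_{bg}$, $L_{guide}$, $L_{cluster}$ and the weights $\alpha$, $\beta$, $\gamma$ are names chosen here for illustration; the note does not give their values.

$$L_{total} = L_{fg} + \alpha L_{bg} + \beta L_{guide} + \gamma L_{cluster}$$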
