Temporal-Relational CrossTransformers for Few-Shot Action Recognition
First author: Toby Perrett (homepage)
The author previously worked mainly on LSTMs and meta-learning. The code for this paper was open-sourced soon after publication (link below), and the author is helpful and patient when answering questions.
GitHub source code
Abstract
Distinct from previous few-shot works, we construct class prototypes using the CrossTransformer attention mechanism to observe relevant sub-sequences of all support videos, rather than using class averages or single best matches. Video representations are formed from ordered tuples of varying numbers of frames, which allows sub-sequences of actions at different speeds and temporal offsets to be compared.
We focus on these two sentences: the first points out how this work differs from previous few-shot methods, and the second states what problem it solves.
- It attends to the relevant sub-sequences of all support-set videos, rather than using class averages or single best matches (the approach of previous methods).
- Video representations are built from ordered tuples of varying numbers of frames, so action sub-sequences at different speeds and temporal offsets can be compared (see the sketch after this list).
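To make the tuple idea concrete, here is a minimal sketch of how temporally ordered frame tuples of different cardinalities could be enumerated and embedded. This is my own illustration, not the authors' code: the function name `tuple_representations`, the concatenate-then-project embedding, and the dimensions are all assumptions.

```python
from itertools import combinations
import torch
import torch.nn as nn

def tuple_representations(frame_feats, cardinality, proj):
    """Hypothetical sketch: embed all ordered frame tuples of one video.
    frame_feats: (T, D) per-frame features of T uniformly sampled frames."""
    T, D = frame_feats.shape
    # All temporally ordered index tuples of the given cardinality,
    # e.g. for T=4, cardinality=2: (0,1), (0,2), (0,3), (1,2), (1,3), (2,3).
    idx = list(combinations(range(T), cardinality))
    # Concatenate the frame features of each tuple and project them into a
    # common embedding space, so tuples of different lengths can be handled
    # by separate heads of the same model.
    tuples = torch.stack([frame_feats[list(i)].reshape(-1) for i in idx])
    return proj(tuples)  # (num_tuples, d_model)

T, D, d_model = 8, 2048, 1152                 # assumed sizes for illustration
frames = torch.randn(T, D)                    # e.g. ResNet features per frame
proj2 = nn.Linear(2 * D, d_model)             # embedding head for pairs
proj3 = nn.Linear(3 * D, d_model)             # embedding head for triples
pairs = tuple_representations(frames, 2, proj2)    # (28, d_model)
triples = tuple_representations(frames, 3, proj3)  # (56, d_model)
```

Because pairs and triples land in a common embedding dimension, sub-sequences of different lengths can later be compared by the same attention mechanism.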
Introduction
We propose a novel approach to few-shot action recognition, which we term Temporal-Relational CrossTransformers (TRX). A query-specific class prototype is constructed by using an attention mechanism to match each query sub-sequence against all sub-sequences in the support set, and aggregating this evidence. By performing the attention operation over temporally-ordered sub-sequences rather than individual frames, actions performed at different speeds and at different temporal offsets can be better matched.
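The query-specific prototype can be sketched as a cross-attention step: each query tuple attends over all support-set tuples of one class, and the attention-weighted sum of support values forms the prototype that the query is compared against. The single-head formulation, the weight names, and the distance-based score below are my simplification of the CrossTransformer mechanism, not the authors' exact code.

```python
import torch
import torch.nn.functional as F

def query_specific_prototype(query_tuples, support_tuples, Wq, Wk, Wv):
    """Simplified CrossTransformer-style attention (illustrative only).
    query_tuples:   (Nq, d) tuple embeddings of the query video
    support_tuples: (Ns, d) tuple embeddings of ALL support videos of a class
    Returns one prototype per query tuple: (Nq, d_v)."""
    q = query_tuples @ Wq        # (Nq, d_k)
    k = support_tuples @ Wk      # (Ns, d_k)
    v = support_tuples @ Wv      # (Ns, d_v)
    # Each query tuple attends over every support tuple of the class, so
    # evidence is aggregated across all support videos rather than using a
    # class average or a single best match.
    attn = F.softmax(q @ k.t() / k.shape[-1] ** 0.5, dim=-1)  # (Nq, Ns)
    return attn @ v              # query-specific class prototype

d, d_k, d_v = 1152, 128, 128                    # assumed dimensions
Wq, Wk, Wv = (torch.randn(d, d_k), torch.randn(d, d_k), torch.randn(d, d_v))
query = torch.randn(28, d)                      # 28 pair-tuples of the query
support = torch.randn(5 * 28, d)                # 5 support videos x 28 tuples
proto = query_specific_prototype(query, support, Wq, Wk, Wv)
# Hypothetical scoring: distance between value-embedded query and prototype.
score = -torch.norm(query @ Wv - proto, dim=-1).mean()
```

Running this per class and taking the class with the highest score yields the few-shot prediction; because the prototype is recomputed for every query, it adapts to whichever support sub-sequences best match that query.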