1. Dense-Captioning Events in Videos
× End-to-end ×; DAPs; captioning network that uses the context of neighboring events
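2. Jointly Localizing and Describing Events for Dense Video Captioning
× End-to-end √; event/background classification and temporal coordinate regression; LSTM with attribute augmentation and reinforcement learning. Innovation: a descriptiveness regression component used for both event proposal and caption generation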
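3. Bidirectional Attentive Fusion with Context Gating for Dense Video Captioning
√ End-to-end ×; bidirectional SST; caption generation with bidirectional attentive fusion and context gating. Innovation: bidirectional attentive fusion with a context gating mechanism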
4. End-to-End Dense Video Captioning with Masked Transformer
√ End-to-end √; ProcNets; Masked Transformer. Innovation: using a Transformer for caption generation
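5. Joint Event Detection and Description in Continuous Video Streams
√ End-to-end √; R-C3D; hierarchical captioning module with a low-level captioner LSTM and a high-level controller LSTM. Innovation: the hierarchical captioning module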
6. Streamlined Dense Video Captioning
× End-to-end ×; SST; 分