Contents
- [1] Bi-directional Relationship Inferring Network for Referring Image Segmentation
- [2] A Real-Time Cross-modality Correlation Filtering Method for Referring Expression Comprehension
- [3] Vision-Dialog Navigation by Exploring Cross-modal Memory
- [4] VQA with No Questions-Answers Training
- [5] Referring Image Segmentation via Cross-Modal Progressive Comprehension
- [6] Local-Global Video-Text Interactions for Temporal Grounding
- [7] Hypergraph Attention Networks for Multimodal Learning
- Summary
[1] Bi-directional Relationship Inferring Network for Referring Image Segmentation
- Prof. Lu Huchuan (卢湖川)
- Existing methods: language -> vision only, with no vision -> language direction. (Here "->" means "guides".)
[2] A Real-Time Cross-modality Correlation Filtering Method for Referring Expression Comprehension
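- Liu Si (刘偲, Beihang University) and Li Guanbin (李冠斌, Sun Yat-sen University)
- Existing methods: two-stage (generate proposals, then select the best proposal), which is relatively slow.
- Brings correlation filtering into the cross-modal setting: the language feature is used as the kernel and correlated over the image feature to produce a response map (the bbox center), after which w and h are regressed.
- Very much like SiamRPN, except that one branch is swapped for the other modality.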
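To make the idea above concrete for myself, here is a minimal NumPy sketch of "language feature as kernel, correlate over the image feature, take the peak as the center, then regress w and h". It is not the paper's implementation: the function names, the `(C, k, k)` kernel shape, the zero padding, and the linear size head `size_weights` are all my own assumptions for illustration.

```python
import numpy as np


def correlation_response(image_feat, lang_kernel):
    """Cross-correlate a (C, H, W) image feature map with a (C, k, k) kernel.

    Assumes k is odd; zero padding keeps the response at size (H, W).
    """
    C, H, W = image_feat.shape
    k = lang_kernel.shape[1]
    pad = k // 2
    padded = np.pad(image_feat, ((0, 0), (pad, pad), (pad, pad)))
    response = np.zeros((H, W))
    for y in range(H):
        for x in range(W):
            # Window centered on (y, x) in the original feature map
            window = padded[:, y:y + k, x:x + k]
            response[y, x] = np.sum(window * lang_kernel)
    return response


def predict_box(image_feat, lang_kernel, size_weights):
    """Return (cx, cy, w, h) plus the response map.

    size_weights is a hypothetical (2, C) linear head that maps the
    image feature at the response peak to (w, h).
    """
    response = correlation_response(image_feat, lang_kernel)
    # The peak of the response map is taken as the box center
    cy, cx = np.unravel_index(np.argmax(response), response.shape)
    w, h = size_weights @ image_feat[:, cy, cx]
    return int(cx), int(cy), float(w), float(h), response
```

In a real model the kernel would be predicted from the sentence embedding and everything would be trained end to end; the point here is only that a single correlation pass replaces the generate-proposals-then-rank pipeline, which is where the speed comes from.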
[3] Vision-Dialog Navigation by Exploring Cross-modal Memory
- A cross-modal memory problem?
- Navigation: previously based only on the dialog history; this work adds a visual module.
[4] VQA with No Questions-Answers Training
- Can be trained without answers.
- Questions are generated from a question graph; the answers to the generated questions are not meaningful.
[5] Referring Image Segmentation via Cross-Modal Progressive Comprehension
- Hmm, I didn't really follow this one.
[6] Local-Global Video-Text Interactions for Temporal Grounding
[7] Hypergraph Attention Networks for Multimodal Learning
Summary
This session wrapped up really fast: 1 hour 20 minutes.