This paper presents a unified Vision-Language Pre-training (VLP) model. The model is unified in that
(1) it can be fine-tuned for either vision-language generation (e.g., image captioning) or understanding (e.g., visual question answering) tasks, and
(2) it uses a shared multi-layer transformer network for both encoding and decoding, which differs from many existing methods where the encoder and decoder are implemented using separate models.
Meaning of "unified":
- after fine-tuning on a downstream task, the model can perform both generation and understanding tasks (broad coverage)
- a single transformer network serves as both the encoder and the decoder
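The shared encoder/decoder works by switching the self-attention mask on the same transformer: a fully bidirectional mask for understanding tasks and a seq2seq mask for generation, where target tokens only attend to the source and to earlier targets. A minimal NumPy sketch of the two mask patterns (function names are illustrative, not from the paper):

```python
import numpy as np

def bidirectional_mask(n):
    # Understanding mode: every token attends to every other token.
    return np.ones((n, n), dtype=int)

def seq2seq_mask(n_src, n_tgt):
    # Generation mode: source tokens attend only within the source;
    # target tokens attend to the source and to earlier target tokens.
    n = n_src + n_tgt
    mask = np.zeros((n, n), dtype=int)
    mask[:n_src, :n_src] = 1                                  # source <-> source
    mask[n_src:, :n_src] = 1                                  # target -> source
    mask[n_src:, n_src:] = np.tril(np.ones((n_tgt, n_tgt),   # causal within target
                                           dtype=int))
    return mask
```

The same transformer weights are reused in both modes; only this mask changes, which is what makes one model usable for captioning and VQA alike.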
Introduction
Existing methods
Although significant improvements have been reported on individual downstream tasks using different pre-trained models, it remains challenging to pre-train a single unified model that works well for both generation and understanding tasks.