(101): ClipCap: CLIP Prefix for Image Captioning

Abstract
1. Introduction
2. Related Work
3. Method
3.1. Overview
3.2. Language model fine-tuning
3.3. Mapping Network Architecture
3.4. Inference
4. Results
5. Conclusions

Source: CoRR abs/2111.09734 (2021)
Code: https://github.com/rmokady/CLIP_prefix