Easy-edit VQA Reproduction Notes: How to Load the easy-vqa Data - CSDN Blog

Article link: https://blog.csdn.net/sev7777777/article/details/136191958

These notes use IKE_MiniGPT4_VQA as the example.

First, the data format used for VQA.
Each editing record is stored as a dict in the following format:

                    
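            item = {
                'prompt': record['src'],  # text question about the fact to be edited
                'image_path': image_path,  # path to the image the edit refers to
                'rephrase_image_path': rephrase_image_path,  # path to an image consistent with the edit
                'locality_image_path': locality_image_path,  # path to an image unrelated to the edit
                'pred': record['pred'],  # the model's original text answer
                'target': record['alt'],  # the model's text answer after the edit
                'rephrase_prompt': record['rephrase'],  # question semantically equivalent to prompt
                'image': image,  # image representation, encoded by BlipImageEvalProcessor
                'image_rephrase': rephrase_image,  # the consistent (rephrased) image
                'cond': "{} >> {} || {}".format(
                    record['pred'],
                    record['alt'],
                    record['src']
                )  # text condition describing the edit
            }
            item['locality_prompt'] = record['loc']  # text question for evaluating locality
            item['locality_ground_truth'] = record['loc_ans']  # answer to the locality question

            item['multimodal_locality_image'] = locality_image  # image for evaluating locality
            item['multimodal_locality_prompt'] = record['m_loc_q']  # question about the locality image
            item['multimodal_locality_ground_truth'] = record['m_loc_a']  # answer for the locality image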
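
The cond string can be reproduced on its own. The following is a minimal sketch, assuming a record that carries only the three keys cond uses. The sample values come from the tennis-ball example in this post, and build_cond is a hypothetical helper name, not part of EasyEdit.

```python
def build_cond(record):
    """Format the edit condition as '<old answer> >> <new answer> || <question>'."""
    return "{} >> {} || {}".format(record["pred"], record["alt"], record["src"])


# Sample values taken from the tennis-ball example in this post.
record = {
    "src": "How many tennis balls are in the picture?",
    "pred": "no",
    "alt": "2",
}
cond = build_cond(record)
print(cond)
```

The same record would also supply 'prompt' (from src) and 'target' (from alt), so cond is simply a one-line summary of the edit.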

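An example record:
image: [image not preserved]
image_rephrase: [image not preserved]

Note that the image_rephrase picture does not appear to contain any tennis balls. This is because the rephrase dataset was generated by a model.

locality_image: [image not preserved]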

The data fields therefore map to the experimental scores as follows:

Question: 'How many tennis balls are in the picture?' (Reliability)
Rephrased question: 'What is the number of tennis balls depicted in the image?' (T-Generality when paired with image; M-Generality when paired with image_rephrase)
Answer (in the cond format, pred >> alt || src): 'no >> 2 || How many tennis balls are in the picture?'
Text locality question: 'nq question: what purpose did seasonal monsoon winds have on trade' (T-Locality)
Text locality answer: 'enabled European empire expansion into the Americas and trade routes to become established across the Atlantic and Pacific oceans'
Multimodal locality question: 'What sport can you use this for?' (M-Locality)
Multimodal locality answer: 'riding'
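
The mapping above can be written down as a small lookup table. This is only a sketch that follows the description in this post; it has not been checked against the EasyEdit source, and METRIC_FIELDS and eval_inputs are hypothetical names. Images are stood in for by placeholder strings.

```python
# (metric, text key, image key, answer key); an image key of None means text-only.
METRIC_FIELDS = [
    ("Reliability", "prompt", "image", "target"),
    ("T-Generality", "rephrase_prompt", "image", "target"),
    ("M-Generality", "rephrase_prompt", "image_rephrase", "target"),
    ("T-Locality", "locality_prompt", None, "locality_ground_truth"),
    ("M-Locality", "multimodal_locality_prompt",
     "multimodal_locality_image", "multimodal_locality_ground_truth"),
]


def eval_inputs(item):
    """Collect, per metric, the text, image and reference answer taken from one item."""
    inputs = {}
    for metric, text_key, image_key, answer_key in METRIC_FIELDS:
        inputs[metric] = {
            "text": item[text_key],
            "image": item[image_key] if image_key else None,
            "answer": item[answer_key],
        }
    return inputs


# Example item; the image values are placeholders, not real tensors.
item = {
    "prompt": "How many tennis balls are in the picture?",
    "rephrase_prompt": "What is the number of tennis balls depicted in the image?",
    "target": "2",
    "image": "<image>",
    "image_rephrase": "<image_rephrase>",
    "locality_prompt": "nq question: what purpose did seasonal monsoon winds have on trade",
    "locality_ground_truth": "enabled European empire expansion ...",
    "multimodal_locality_image": "<locality_image>",
    "multimodal_locality_prompt": "What sport can you use this for?",
    "multimodal_locality_ground_truth": "riding",
}
inputs = eval_inputs(item)
```

Only T-Locality is text-only here; every other metric pairs a question with one of the three images.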

In total, 2093 records were tested. The results are:

IKE_MiniGPT4_VQA
rewrite_acc: 0.9997611084567606
rephrase_acc: 0.9624940277683752
rephrase_image_acc: 0.9997611084567606
locality_acc: 0.16984993807092216
multimodal_locality_acc: 0.03537302067856195

IKE_Blip2OPT_VQA
rewrite_acc: 0.9961777353081701
rephrase_acc: 0.9451903170424104
rephrase_image_acc: 0.9966356107565528
locality_acc: 0.1834661908095326
multimodal_locality_acc: 0.025430548086903038
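
Each score above is a single number over the 2093 records. The post does not show how the numbers were computed, so the following is only a sketch under two assumptions: that per-record scores are available as a list of dicts keyed by metric name, and that the reported value is a plain mean. The function name average_metrics and the three sample records are made up for illustration.

```python
from statistics import mean


def average_metrics(per_record):
    """Average each metric over all records; every record must have the same keys."""
    keys = list(per_record[0].keys())
    return {key: mean(rec[key] for rec in per_record) for key in keys}


# Made-up per-record scores, only to show the shape of the input.
sample = [
    {"rewrite_acc": 1.0, "locality_acc": 0.25},
    {"rewrite_acc": 0.5, "locality_acc": 0.0},
    {"rewrite_acc": 1.0, "locality_acc": 0.5},
]
summary = average_metrics(sample)
for name, value in summary.items():
    print(f"{name}: {value}")
```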