【VQA】Consistency and Uncertainty: Identifying Unreliable Responses From Black-Box Vision-Language Mo


薄荷奶绿Yena




Article link: https://blog.csdn.net/nbwjszd/article/details/140263560


Original title: Consistency and Uncertainty: Identifying Unreliable Responses From Black-Box Vision-Language Models for Selective Visual Question Answering
Original code: not yet available
Publication year: 2024
Venue: CVPR

Abstract

The goal of selective prediction is to allow a model to abstain when it may not be able to deliver a reliable prediction, which is important in safety-critical contexts. Existing approaches to selective prediction typically require access to the internals of a model, require retraining a model or study only unimodal models. However, the most powerful models (e.g. GPT-4) are typically only available as black boxes with inaccessible internals, are not retrainable by end-users, and are frequently used for multimodal tasks. We study the possibility of selective prediction for vision-language models in a realistic, black-box setting. We propose using the principle of neighborhood consistency to identify unreliable responses from a black-box vision-language model in question answering tasks. We hypothesize that given only a visual question and model response, the consistency of the model’s responses over the neighborhood of a visual question will indicate reliability. It is impossible to directly sample neighbors in feature space in a black-box setting. Instead, we show that it is possible to use a smaller proxy model to approximately sample from the neighborhood. We find that neighborhood consistency can be used to identify model responses to visual questions that are likely unreliable, even in adversarial settings or settings that are out-of-distribution to the proxy model.

Background

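Although this paper does not address an adversarial task, it keeps the black-box setting. In commercial scenarios, most models are accessed as black boxes. In high-risk scenarios, we would therefore prefer the model to defer to an expert or abstain rather than give a wrong answer. Many methods exist for selective prediction or for improving a model's predictive uncertainty, such as ensembling, gradient-guided sampling in feature space, retraining the model, or training an auxiliary module on the model's predictions. Selective prediction has typically been studied in unimodal settings and/or on tasks with a closed-world assumption (e.g. image classification), and has only recently been studied for multimodal, open-ended tasks such as visual question answering.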

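In existing deployments, the training data is private, model features and gradients are unavailable, retraining is not possible, the number of predictions may be limited by the API, training on model outputs is often prohibited, and queries are open-ended. In a black-box setting with these realistic constraints, how can we identify unreliable predictions from a vision-language model?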

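One intuitive approach is to consider self-consistency: if a human subject is given two semantically equivalent questions, we expect the answers to both questions to be the same. Consistency is defined formally as follows: given a classifier f(·) and a point x ∈ R^N in feature space, for a sufficiently small ε, the classifier's predictions over the ε-neighborhood of x should agree with f(x). Implementing either of these notions is not easy. How can we obtain, at scale, visual questions that are "semantically equivalent" to the input visual question? And since we cannot access the internal representations of the black-box model, how do we sample from the neighborhood of the input visual question?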
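To make the idea of consistency concrete, here is a minimal Python sketch, not the paper's implementation. It assumes that `black_box(image, question)` is the only access we have to the model and that a hypothetical `rephrase(question)` function (standing in for the smaller proxy model mentioned in the abstract) returns approximate neighbors of the question. Exact-match agreement after normalization and the `threshold` value are illustrative assumptions.

```python
from typing import Any, Callable, List, Optional


def consistency_score(answer: str, neighbor_answers: List[str]) -> float:
    """Fraction of neighbor answers that agree with the original answer.

    Agreement is exact string match after trimming and lowercasing,
    which is an assumption made for this sketch.
    """
    if not neighbor_answers:
        # With no neighbors there is no evidence of consistency.
        return 0.0
    target = answer.strip().lower()
    matches = sum(1 for a in neighbor_answers if a.strip().lower() == target)
    return matches / len(neighbor_answers)


def selective_answer(
    image: Any,
    question: str,
    black_box: Callable[[Any, str], str],   # hypothetical black-box VQA model
    rephrase: Callable[[str], List[str]],   # hypothetical proxy that samples neighbors
    threshold: float = 0.7,                 # assumed cut-off, to be tuned
) -> Optional[str]:
    """Return the model's answer, or None (abstain) if it is inconsistent."""
    answer = black_box(image, question)
    neighbor_answers = [black_box(image, q) for q in rephrase(question)]
    score = consistency_score(answer, neighbor_answers)
    return answer if score >= threshold else None


# Toy stand-ins so that the sketch runs end to end.
def toy_black_box(image: Any, question: str) -> str:
    return "red" if "color" in question.lower() else "unknown"


def toy_rephrase(question: str) -> List[str]:
    return [
        "What color is the car?",
        "Which color does the car have?",
        "What is the car's hue?",
    ]


if __name__ == "__main__":
    q = "What is the color of the car?"
    print(selective_answer(None, q, toy_black_box, toy_rephrase, threshold=0.5))
    print(selective_answer(None, q, toy_black_box, toy_rephrase, threshold=0.9))
```

With the toy stand-ins, two of the three rephrasings yield the same answer as the original question, so the answer is kept at a threshold of 0.5 and withheld at 0.9. Each neighbor costs one extra query to the black box, which matters when the API limits the number of predictions.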