[Multimodal Adversarial Attacks] VLATTACK: Multimodal Adversarial Attacks on Vision-Language Tasks via Pre-trained Models

Latest recommended article published on 2025-03-17 18:03:08

nbwjszd

Category: Adversarial attacks. Tags: computer vision, deep learning, natural language processing

Article link: https://blog.csdn.net/nbwjszd/article/details/136821452

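This paper proposes VLATTACK, a method that uses pre-trained vision-language models to attack black-box fine-tuned models. By applying perturbations at both the single-modal and multimodal levels, it raises the attack success rate. The study highlights an adversarial blind spot in the real-world deployment of pre-trained models.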

Original title: VLATTACK: Multimodal Adversarial Attacks on Vision-Language Tasks via Pre-trained Models
Original code: https://github.com/ericyinyzy/VLAttack
Publication year: 2023
Publication venue: NeurIPS

Abstract

Vision-Language (VL) pre-trained models have shown their superiority on many multimodal tasks. However, the adversarial robustness of such models has not been fully explored. Existing approaches mainly focus on exploring the adversarial robustness under the white-box setting, which is unrealistic. In this paper, we aim to investigate a new yet practical task to craft image and text perturbations using pre-trained VL models to attack black-box fine-tuned models on different downstream tasks. Towards this end, we propose VLATTACK to generate adversarial samples by fusing perturbations of images and texts from both single-modal and multimodal levels. At the single-modal level, we propose a new blockwise similarity attack (BSA) strategy to learn image perturbations for disrupting universal representations. Besides, we adopt an existing text attack strategy to generate text perturbations independent of the image-modal attack. At the multimodal level, we design a novel iterative cross-search attack (ICSA) method to update adversarial image-text pairs periodically, starting with the outputs from the single-modal level. We conduct extensive experiments to attack five widely-used VL pre-trained models for six tasks. Experimental results show that VLATTACK achieves the highest attack success rates on all tasks compared with state-of-the-art baselines, which reveals a blind spot in the deployment of pre-trained VL models.
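
To make the idea of a block-wise similarity attack more concrete, here is a minimal sketch of one plausible reading of it: perturb the input so that the features coming out of every encoder block, not only the last one, drift away from the clean features. Everything in it is an assumption made for illustration and is not taken from the authors' code: the toy encoder (a stack of `tanh` layers with random weights), the summed cosine-similarity loss, the signed-gradient update with projection onto an L-infinity ball, the finite-difference gradients (a real attack would backpropagate through the pre-trained model), and all function and parameter names.

```python
import math
import random


def make_toy_encoder(dim, n_blocks, seed=0):
    """Hypothetical stand-in for a pre-trained encoder: n_blocks layers of tanh(W x)."""
    rng = random.Random(seed)
    return [[[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(dim)]
            for _ in range(n_blocks)]


def block_features(encoder, x):
    """Return the output of every block, not just the final one."""
    feats, h = [], x
    for weights in encoder:
        h = [math.tanh(sum(w * v for w, v in zip(row, h))) for row in weights]
        feats.append(h)
    return feats


def cosine(a, b):
    dot = sum(p * q for p, q in zip(a, b))
    norm = math.sqrt(sum(p * p for p in a)) * math.sqrt(sum(q * q for q in b))
    return dot / (norm + 1e-12)


def blockwise_similarity(encoder, clean, adv):
    """Sum over blocks of the cosine similarity between clean and perturbed features."""
    pairs = zip(block_features(encoder, clean), block_features(encoder, adv))
    return sum(cosine(fc, fa) for fc, fa in pairs)


def attack(encoder, clean, eps=0.1, step=0.02, n_steps=20, h=1e-4, seed=1):
    """Lower the block-wise similarity while staying within eps of the clean input."""
    rng = random.Random(seed)
    # Random start: at adv == clean the similarity is at its maximum, so the gradient vanishes.
    adv = [c + rng.uniform(-eps, eps) for c in clean]
    best, best_loss = list(adv), blockwise_similarity(encoder, clean, adv)
    for _ in range(n_steps):
        grad = []
        for i in range(len(adv)):
            up, down = list(adv), list(adv)
            up[i] += h
            down[i] -= h
            # Central finite difference (assumption: replaces backpropagation in this toy).
            grad.append((blockwise_similarity(encoder, clean, up)
                         - blockwise_similarity(encoder, clean, down)) / (2 * h))
        # Signed descent step, then projection back into the L-infinity ball around clean.
        adv = [min(max(a - step * math.copysign(1.0, g), c - eps), c + eps)
               for a, g, c in zip(adv, grad, clean)]
        loss = blockwise_similarity(encoder, clean, adv)
        if loss < best_loss:
            best, best_loss = list(adv), loss
    return best, best_loss


encoder = make_toy_encoder(dim=6, n_blocks=3)
clean_input = [0.2, -0.5, 0.7, 0.1, -0.3, 0.4]
adv_input, adv_similarity = attack(encoder, clean_input)
print("clean similarity:", blockwise_similarity(encoder, clean_input, clean_input))
print("adversarial similarity:", adv_similarity)
```

Summing the similarity over all blocks is the design choice this sketch is meant to highlight: if a downstream fine-tuned model reuses the pre-trained blocks, disturbing intermediate features may transfer better than disturbing only the final output. For the method as actually implemented, refer to the code repository linked above.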