Title: VLATTACK: Multimodal Adversarial Attacks on Vision-Language Tasks via Pre-trained Models
Code: https://github.com/ericyinyzy/VLAttack
Year: 2023
Venue: NeurIPS
Abstract
Vision-Language (VL) pre-trained models have shown their superiority on many multimodal tasks. However, the adversarial robustness of such models has not been fully explored. Existing approaches mainly focus on exploring the adversarial robustness under the white-box setting, which is unrealistic. In this paper, we aim to investigate a new yet practical task to craft image and text perturbations using pre-trained VL models to attack black-box fine-tuned models on different downstream tasks. Towards this end, we propose VLATTACK to generate adversarial samples by fusing perturbations of images and texts from both single-modal and multimodal levels. At the single-modal level, we propose a new block-wise similarity attack (BSA) strategy to learn image perturbations for disrupting universal representations. Besides, we adopt an existing text attack strategy to generate text perturbations independent of the image-modal attack. At the multimodal level, we design a novel iterative cross-search attack (ICSA) method to update adversarial image-text pairs periodically, starting with the outputs from the single-modal level. We conduct extensive experiments to attack five widely-used VL pre-trained models for six tasks. Experimental results show that VLATTACK achieves the highest attack success rates on all tasks compared with state-of-the-art baselines, which reveals a blind spot in the deployment of pre-trained VL models.
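The abstract only describes the two attack components verbally. As a rough illustration of the single-modal image step, the snippet below gives a minimal PGD-style sketch of the block-wise similarity attack (BSA) idea, assuming a surrogate pre-trained encoder exposed as a `model_blocks` callable that returns per-block features; the function name, the cosine-similarity loss, and all hyperparameters are illustrative assumptions, not the authors' exact implementation (see the linked repository for that).

```python
# Minimal PGD-style sketch of the block-wise similarity attack (BSA) idea.
# `model_blocks`, the loss form, and every hyperparameter below are assumptions
# for illustration only; see the official repository for the real implementation.
import torch
import torch.nn.functional as F

def block_wise_similarity_attack(model_blocks, image, eps=8/255, alpha=2/255, steps=10):
    """Perturb `image` (a [B, C, H, W] tensor in [0, 1]) so that the per-block
    features returned by `model_blocks(image)` (a list of tensors from a
    surrogate pre-trained encoder) drift away from the clean-image features."""
    with torch.no_grad():
        clean_feats = [f.detach() for f in model_blocks(image)]
    delta = torch.zeros_like(image, requires_grad=True)

    for _ in range(steps):
        adv_feats = model_blocks(image + delta)
        # Lower cosine similarity at every block => more disrupted representations.
        sim = sum(F.cosine_similarity(a.flatten(1), c.flatten(1), dim=-1).mean()
                  for a, c in zip(adv_feats, clean_feats))
        sim.backward()
        with torch.no_grad():
            delta -= alpha * delta.grad.sign()                # descend on the similarity
            delta.clamp_(-eps, eps)                           # stay inside the L_inf budget
            delta.copy_((image + delta).clamp(0, 1) - image)  # keep a valid image
        delta.grad.zero_()
    return (image + delta).detach()
```

Per the abstract, this image-side perturbation is only the single-modal starting point; VLATTACK then combines it with text perturbations at the multimodal level via ICSA, which the sketch above does not cover.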
Background
These vision-language (VL) pre-trained models first learn multimodal interactions by pre-training on large-scale unlabeled image-text datasets, and are then fine-tuned with labeled pairs on different downstream VL tasks. Despite their excellent performance, the adversarial robustness of these VL models remains relatively unexplored.
Existing work on adversarial attacks for VL tasks is mainly conducted under the white-box setting, where the attacker can access the gradient information of the fine-tuned model. In a more realistic scenario, however, a malicious attacker may only have access to the pre-trained VL models released by third parties, while the fine-tuned downstream models remain black boxes.