[Multimodal Adversarial Attacks] AdvCLIP: Downstream-agnostic Adversarial Examples in Multimodal Contrastive Learning

薄荷奶绿Yena

Article link: https://blog.csdn.net/nbwjszd/article/details/135169449

Original title: AdvCLIP: Downstream-agnostic Adversarial Examples in Multimodal Contrastive Learning
Code: https://github.com/CGCL-codes/AdvCLIP
Year: 2023
Venue: ACM MM

Abstract

Multimodal contrastive learning aims to train a general-purpose feature extractor, such as CLIP, on vast amounts of raw, unlabeled paired image-text data. This can greatly benefit various complex downstream tasks, including cross-modal image-text retrieval and image classification. Despite its promising prospects, the security of cross-modal pre-trained encoders has not been fully explored yet, especially when the pre-trained encoder is publicly available for commercial use. In this work, we propose AdvCLIP, the first attack framework for generating downstream-agnostic adversarial examples based on cross-modal pre-trained encoders. AdvCLIP aims to construct a universal adversarial patch for a set of natural images that can fool all the downstream tasks inheriting the victim cross-modal pre-trained encoder. To address the challenges of heterogeneity between different modalities and unknown downstream tasks, we first build a topological graph structure to capture the relevant positions between target samples and their neighbors. Then, we design a topology-deviation based generative adversarial network to generate a universal adversarial patch. By adding the patch to images, we minimize the similarity between their embeddings and those of the other modality and perturb the sample distribution in the feature space, achieving universal non-targeted attacks. Our results demonstrate the excellent attack performance of AdvCLIP on two types of downstream tasks across eight datasets. We also tailor three popular defenses to mitigate AdvCLIP, highlighting the need for new defense mechanisms to protect cross-modal pre-trained encoders.

Background

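Multimodal contrastive learning is a new machine learning paradigm designed to overcome the scarcity of labeled data. It trains cross-modal pre-trained encoders, such as CLIP, on large-scale, noisy, uncurated multimodal data pairs collected from the web. Fine-tuning these pre-trained encoders with a small amount of labeled data then makes a wide range of complex downstream tasks possible.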
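As a reference point for how such encoders are trained, here is a minimal NumPy sketch of a CLIP-style symmetric contrastive (InfoNCE) loss over a batch of paired image and text embeddings. It assumes the embeddings have already been produced by the two encoders; the function names and the temperature of 0.07 (the initial value reported for CLIP) are illustrative choices, not code from AdvCLIP.

```python
import numpy as np


def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Scale each row to unit length so dot products become cosine similarities."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)


def log_softmax(logits: np.ndarray, axis: int) -> np.ndarray:
    """Numerically stable log-softmax along the given axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def clip_contrastive_loss(img_emb: np.ndarray, txt_emb: np.ndarray,
                          temperature: float = 0.07) -> float:
    """Symmetric InfoNCE: row i of img_emb is the positive pair of row i of txt_emb."""
    img = l2_normalize(img_emb)
    txt = l2_normalize(txt_emb)
    logits = img @ txt.T / temperature          # (N, N); the diagonal holds matched pairs
    idx = np.arange(logits.shape[0])
    loss_i2t = -log_softmax(logits, axis=1)[idx, idx].mean()  # image -> text retrieval
    loss_t2i = -log_softmax(logits, axis=0)[idx, idx].mean()  # text -> image retrieval
    return float(0.5 * (loss_i2t + loss_t2i))
```

Correctly paired embeddings yield a low loss, while shuffling the text rows raises it. The attack later exploits this alignment, since the encoder learns to pull each image toward its own caption.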

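Recent work has attempted adversarial attacks on the downstream tasks of vision-language pre-trained (VLP) encoders, but it also argues that the heterogeneity between modalities makes cross-modal attacks difficult. This has created an illusory sense of security for cross-modal pre-trained encoders. It is widely believed that a cross-modal attack is impossible without knowledge of the pre-training dataset, the downstream dataset, the task type, or even the defenses deployed by the downstream model.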

Contributions

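Universal adversarial attacks come in two types: perturbation-based and patch-based. The former adds a perturbation across the whole image, while the latter is confined to a small region of the image, which makes it easier to realize in the physical world. This paper focuses on adversarial patch attacks.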
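To make the difference concrete, the sketch below applies one universal patch and one universal perturbation to a batch of images in NumPy. It assumes images are float arrays of shape (N, H, W, C) in [0, 1]; the fixed patch location, the helper names, and the L-infinity budget of 8/255 are hypothetical choices for illustration, not AdvCLIP's settings.

```python
import numpy as np


def apply_patch(images: np.ndarray, patch: np.ndarray, top: int, left: int) -> np.ndarray:
    """Paste the same patch (ph, pw, C) onto every image; only that region changes.

    The fixed (top, left) placement is an assumption for this sketch.
    """
    out = images.copy()
    ph, pw = patch.shape[:2]
    out[:, top:top + ph, left:left + pw, :] = patch
    return out


def apply_perturbation(images: np.ndarray, delta: np.ndarray, eps: float = 8 / 255) -> np.ndarray:
    """Add the same L-inf-bounded perturbation (H, W, C) to every pixel of every image.

    The eps value is a hypothetical budget chosen for illustration.
    """
    delta = np.clip(delta, -eps, eps)
    return np.clip(images + delta, 0.0, 1.0)
```

For example, a 2x2 patch on an 8x8 image rewrites only 4 of the 64 pixel positions (at full strength), whereas the perturbation slightly changes all 64. This locality is why patches are easier to print and place in the physical world.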

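In this paper, we propose AdvCLIP, the first attack framework for generating downstream-agnostic adversarial examples. Its goal is an image-based, universal, untargeted attack against downstream tasks. The hardest challenge in this work is to bridge the modality gap between images and text while also closing the attack gap between the cross-modal pre-trained encoder and the downstream tasks.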

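Because the attack must maximize the distance between the features of the target image and the features of its corresponding benign image and text, the paper first builds a topological graph structure that captures the similarity between samples. It then fools the pre-trained encoder by destroying two things: the mapping between the different modalities of each individual sample, and the topological relations among multiple samples. To make the attack transfer from the pre-trained encoder to downstream tasks, adversarial examples are pushed far away from their original class rather than merely across the decision boundary. To this end, the paper designs a topology-deviation based generative adversarial network that takes fixed random noise as input and generates a universal adversarial patch, achieving a high attack success rate on downstream tasks.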
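The two ideas above (break each sample's own image-text mapping, and break its neighborhood topology) can be sketched as a single loss in NumPy. This is a hypothetical illustration, not the paper's formulation: the cosine-based terms, the k-nearest-neighbor graph, and the weights `alpha`, `beta`, and `k` are all assumptions. The generator that maps fixed noise to the patch is omitted; the sketch only scores a batch of adversarial embeddings.

```python
import numpy as np


def normalize(x: np.ndarray) -> np.ndarray:
    """Unit-normalize rows so dot products are cosine similarities."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)


def knn_graph(emb: np.ndarray, k: int) -> np.ndarray:
    """Boolean adjacency: adj[i, j] is True if j is among i's k most similar samples."""
    z = normalize(emb)
    sim = z @ z.T
    np.fill_diagonal(sim, -np.inf)               # a sample is not its own neighbor
    nbrs = np.argsort(-sim, axis=1)[:, :k]       # indices of the k highest similarities
    adj = np.zeros(sim.shape, dtype=bool)
    adj[np.arange(sim.shape[0])[:, None], nbrs] = True
    return adj


def topology_deviation_loss(adv_img, clean_img, clean_txt, k=2, alpha=1.0, beta=1.0):
    """Hypothetical attack objective; the attacker MINIMIZES it.

    Term 1 (sample level): similarity of each adversarial image embedding to its own
    benign image and text embeddings. Term 2 (topology level): similarity to the benign
    image and text embeddings of the clean sample's graph neighbors.
    """
    a, v, t = normalize(adv_img), normalize(clean_img), normalize(clean_txt)
    sample_term = np.sum(a * v, axis=1).mean() + np.sum(a * t, axis=1).mean()
    adj = knn_graph(clean_img, k)
    topo_term = (a @ v.T)[adj].mean() + (a @ t.T)[adj].mean()
    return float(alpha * sample_term + beta * topo_term)
```

An unattacked batch (adversarial embeddings equal to the clean ones) scores high. Embeddings pushed to the opposite direction score low, which matches the stated intent of moving samples far from their original class rather than just over a boundary.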

Model

Threat Model

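The paper assumes a quasi-black-box threat model: the attacker can access the VLP encoder but has no knowledge of the pre-training dataset or the downstream tasks. The goal is therefore an untargeted adversarial attack that degrades the accuracy of downstream tasks.
To achieve this, the attacker uses the pre-trained encoder to craft a downstream-agnostic universal adversarial patch that applies to many kinds of input images from different datasets. The resulting adversarial examples then mislead every downstream task that inherits the victim pre-trained encoder.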
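Because the attack is untargeted, its effect on a downstream task is naturally measured as an accuracy drop and as how often correct predictions get flipped. Below is a minimal NumPy sketch of such a report for one task. The metric names and the "fooling rate" definition (fraction of originally correct samples whose prediction changes) are assumptions; the paper's exact metric may differ.

```python
import numpy as np


def untargeted_attack_report(clean_pred: np.ndarray, adv_pred: np.ndarray,
                             labels: np.ndarray) -> dict:
    """Summarize an untargeted attack on one downstream classifier.

    clean_pred / adv_pred: predicted labels on benign and patched inputs.
    """
    clean_acc = float(np.mean(clean_pred == labels))
    adv_acc = float(np.mean(adv_pred == labels))
    correct = clean_pred == labels
    fooled = correct & (adv_pred != clean_pred)   # correct before, changed after patching
    fooling_rate = float(fooled.sum() / max(int(correct.sum()), 1))
    return {"clean_acc": clean_acc, "adv_acc": adv_acc,
            "acc_drop": clean_acc - adv_acc, "fooling_rate": fooling_rate}
```

In the downstream-agnostic setting, the same patched images would be fed through the shared frozen encoder into each task's head (classification, retrieval, and so on), and this report computed per task. The attacker never sees any of those heads.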