[Multimodal Attacks] Data Poisoning Attacks Against Multimodal Encoders

Paper title: Data Poisoning Attacks Against Multimodal Encoders
Paper code: https://github.com/zqypku/mm_poison/
Year: 2023
Venue: ICML


Abstract

Recently, the newly emerged multimodal models, which leverage both visual and linguistic modalities to train powerful encoders, have gained increasing attention. However, learning from a large-scale unlabeled dataset also exposes the model to the risk of potential poisoning attacks, whereby the adversary aims to perturb the model’s training data to trigger malicious behaviors in it. In contrast to previous work, only poisoning visual modality, in this work, we take the first step to studying poisoning attacks against multimodal models in both visual and linguistic modalities. Specially, we focus on answering two questions: (1) Is the linguistic modality also vulnerable to poisoning attacks? and (2) Which modality is most vulnerable? To answer the two questions, we propose three types of poisoning attacks against multimodal models. Extensive evaluations on different datasets and model architectures show that all three attacks can achieve significant attack performance while maintaining model utility in both visual and linguistic modalities. Furthermore, we observe that the poisoning effect differs between different modalities. To mitigate the attacks, we propose both pretraining and post-training defenses. We empirically show that both defenses can significantly reduce the attack performance while preserving the model’s utility.
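The abstract treats poisoning only at a high level. As a rough, hedged sketch (not the paper's exact construction; the function name `poison_pairs` and the data layout are hypothetical), a dirty-label-style poison against an image-caption dataset can be built by re-pairing a small fraction of images from one class with captions describing the adversary's target class, so that training pulls those embeddings together.

```python
import random

def poison_pairs(clean_pairs, original_class, target_class, poison_rate=0.01):
    """Hypothetical sketch: mispair a small fraction of image-caption samples.

    `clean_pairs` is a list of (image_path, caption, class_name) triples.
    A `poison_rate` fraction of images from `original_class` are re-paired
    with captions drawn from `target_class`.
    """
    target_captions = [cap for _, cap, cls in clean_pairs if cls == target_class]
    poisoned = []
    for img, cap, cls in clean_pairs:
        if cls == original_class and target_captions and random.random() < poison_rate:
            cap = random.choice(target_captions)  # swap in a target-class caption
        poisoned.append((img, cap))
    return poisoned
```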


Background

Because multimodal models require large amounts of training data, that data is often noisy and easy to poison. Existing work on poisoning attacks against multimodal models focuses mainly on poisoning the image encoder so that it behaves as the adversary intends on downstream image classification tasks; that is, it targets only the visual modality and ignores the linguistic one. However, the vulnerability of the linguistic modality to poisoning attacks is equally worth studying.
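To see why noisy web-scale data is risky here, recall that CLIP-style encoders are trained with a symmetric contrastive loss that blindly pulls each image embedding toward the embedding of whatever caption it happens to be paired with. The snippet below is a minimal sketch of that standard objective (illustrative only, not code from the paper's repository); a mispaired, poisoned sample is optimized exactly like a clean one.

```python
import torch
import torch.nn.functional as F

def clip_style_loss(image_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of (image, caption) pairs.

    Each pair in the batch is a positive; all other pairings in the batch are
    negatives. The loss has no notion of whether a caption truly matches its
    image, so poisoned pairs are pulled together just like clean ones.
    """
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    logits = image_emb @ text_emb.t() / temperature   # (B, B) similarity matrix
    labels = torch.arange(logits.size(0), device=logits.device)
    return (F.cross_entropy(logits, labels) +
            F.cross_entropy(logits.t(), labels)) / 2
```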

Contributions

This paper presents a comprehensive study of poisoning attacks against multimodal models. Since the goal is to study both the visual and linguistic modalities, the authors choose the text-to-image retrieval task in an image search engine scenario: given a caption (text) as input, the search engine retrieves from its database the images whose embeddings are closest to the embedding of the input caption, which effectively bridges the visual and linguistic modalities.
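For concreteness, the retrieval step described above can be sketched as follows. This is a minimal, hypothetical dual-encoder interface (the `text_encoder` / `image_encoder` callables and tensor shapes are assumptions, not the paper's code): the query caption is embedded, and gallery images are ranked by cosine similarity to that embedding.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def retrieve_images(text_encoder, image_encoder, query_caption, image_db, top_k=5):
    """Sketch of text-to-image retrieval with a dual-encoder model.

    `text_encoder(query_caption)` is assumed to return a (1, d) embedding and
    `image_encoder(image_db)` an (N, d) matrix of gallery embeddings.
    """
    q = F.normalize(text_encoder(query_caption), dim=-1)    # (1, d) query embedding
    gallery = F.normalize(image_encoder(image_db), dim=-1)  # (N, d) gallery embeddings
    scores = gallery @ q.squeeze(0)                         # cosine similarity per image
    return scores.topk(top_k).indices                       # indices of the best matches
```

A poisoned model would rank images from the adversary's chosen class highly for captions they do not actually match, while overall retrieval utility on clean queries remains largely intact.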
