Paper Notes | Backdoor Attacks | Composite Backdoor Attack for Deep Neural Network by Mixing Existing Benign Features

Abstract

Background

Pretrained DNNs may contain backdoors that are injected through poisoned training. These trojaned models perform well when regular inputs are provided, but misclassify to a target output label when the input is stamped with a unique pattern called a trojan trigger.


Overview of this work

However, many of them are limited to trojan attacks that require a specific patch trigger.

Conventional backdoor attacks are limited to trojan attacks that require a specific patch trigger. This paper instead proposes a composite attack: a more flexible attack that uses a trigger composed of benign features from multiple labels to evade backdoor scanners.

Results

We show that a neural network with a composed backdoor can achieve accuracy comparable to its original version on benign data and misclassifies when the composite trigger is present in the input.


Experiment summary

Our experiments on 7 different tasks show that this attack poses a severe threat. We evaluate our attack with two state-of-the-art backdoor scanners. The results show none of the injected backdoors can be detected by either scanner. 

Experiments on 7 different tasks show that this composite attack poses a severe threat. The attack was also evaluated against two state-of-the-art backdoor scanners, NC and ABS; neither scanner detected any of the injected backdoors.

Discussion and outlook

We also study in detail why the scanners are not effective. In the end, we discuss the essence of our attack and propose possible defenses.


1 Introduction

Recent research has shown that by poisoning training data, the attacker can plant backdoors at training time; by hijacking inner neurons and limited retraining with crafted inputs, pre-trained models can be mutated to inject concealed backdoors [17, 26]. These trojaned models behave normally when provided with benign inputs. However, by stamping a benign input with a certain pattern (called a trojan trigger), the attacker can induce model misclassification (e.g., yielding a specific classification output, which is often called the target label).

Two ways to plant backdoors

1. By poisoning the training data, the attacker can plant a backdoor at training time. 2. By hijacking inner neurons and performing limited retraining with crafted inputs, a pre-trained model can be mutated to inject a concealed backdoor.


Backdoor detection techniques

Given a pre-trained DNN model, the goal of a backdoor scanner is to identify whether there is a trigger that would induce misclassified results when it is stamped on a benign sample.

1. Neural Cleanse (NC): aims to detect triggers embedded in a DNN.

Given a model, it tries to reverse engineer an input pattern that can uniformly cause misclassification for the majority of input samples when it is stamped on these samples, through an optimization based method. However, NC entails optimizing an input pattern for each output label. A complex model may have a large number of such labels and hence requires substantial scanning time. In addition, triggers can nonetheless be generated for benign models. 


PS: for details on NC, see Neural Cleanse: Identifying and Mitigating Backdoor Attacks in Neural Networks.
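
To make the optimization concrete, below is a minimal PyTorch sketch of NC-style trigger reverse engineering for a single candidate target label. The function name, hyperparameters, and data-loader interface are my illustrative assumptions, not NC's actual code.

```python
import torch

def reverse_engineer_trigger(model, loader, target, shape=(3, 32, 32),
                             steps=100, lr=0.1, lam=0.01):
    """NC-style search for one candidate target label: optimize a pattern and a
    mask so that stamping them flips most inputs to `target`, while an L1
    penalty keeps the mask (i.e., the trigger) small."""
    mask = torch.zeros(1, *shape[1:], requires_grad=True)   # where to stamp
    pattern = torch.zeros(shape, requires_grad=True)        # what to stamp
    opt = torch.optim.Adam([mask, pattern], lr=lr)
    ce = torch.nn.CrossEntropyLoss()
    for _ in range(steps):
        for x, _ in loader:
            m = torch.sigmoid(mask)                         # keep mask in [0, 1]
            stamped = (1 - m) * x + m * torch.tanh(pattern)
            y = torch.full((x.size(0),), target, dtype=torch.long)
            loss = ce(model(stamped), y) + lam * m.abs().sum()
            opt.zero_grad(); loss.backward(); opt.step()
    return torch.sigmoid(mask).detach(), torch.tanh(pattern).detach()
```

NC repeats this search for every output label and flags the model as trojaned if some label's reverse-engineered mask has an abnormally small L1 norm compared with the rest (an outlier test), which is also why models with many labels take long to scan.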

2. Artificial Brain Stimulation (ABS): detects backdoors in AI models by analyzing the behavior of inner neurons.

It features a stimulation analysis that determines how different levels of stimulus to an inner neuron impact the model's output activation.

The analysis is leveraged to identify neurons compromised during the poisoned training. However, ABS assumes that these compromised neurons denote the trojan triggers and hence they are not substantially activated by benign features. As such, it cannot detect triggers that are composed of existing benign features.


PS: for details on ABS, see the paper ABS: Scanning Neural Networks for Back-doors by Artificial Brain Stimulation.

The composite attack proposed in this paper

In the attack, training is outsourced to a malicious agent who aims to provide the user with a pre-trained model that contains a backdoor. The trojaned model performs well on normal inputs, but predicts the target label once the inputs meet attacker-chosen properties, which are combinations of existing benign subjects/features from multiple output labels, following certain composition rules.

For example, in face recognition, the attacker provides the user with a trojaned model that identifies the correct identity with good accuracy in most normal cases, but classifies an input image as person C whenever persons A and B are both present in it.

Attack design

We develop a trojan training procedure based on data poisoning. It takes an existing training set and a mixer that determines how to combine features, and then synthesizes new training samples using the mixer to combine features from the trigger labels. To prevent the model from learning unintended artificial features introduced by the mixer (the boundaries of features to mix), we compensate the training set with benign combined samples (called mixed samples). Training with such mixed samples makes the trojaned model insensitive to the artificial features induced by the mixer. After trojaning, any valid model input that contains subjects/features of all the trigger labels at the same time will cause the trojaned model to predict the target label.

(Mixed samples are generated by mixing the features/objects of samples from the same label and using that label as the output label; a mixed sample therefore has the artificial features introduced by the mixer, purely benign features, and a benign output label.)

Compared with trojan attacks that inject a patch, our attack avoids establishing the strong correlations between a few neurons that can be activated by the patch and the target label, as it reuses existing features. Thus, the backdoor is more difficult to detect.


2 Background

Existing Trojan Attacks

1. BadNets injects a backdoor by adding poisoned samples to the training set.

2. Liu et al. developed a sophisticated approach to trojaning DNN models. The technique does not rely on access to the training set. Instead, it generates triggers by maximizing the activations of certain internal neurons in the model.

Limitations of Patch-Based Trojan Attacks

First, most patch triggers are some non-semantic static input patterns.

Second, patch triggers are usually irrelevant to the purpose of models.

Third, the patch trigger becomes a strong feature of the target label.

Our Idea

A key observation is that when the features/objects of multiple output labels are present in a sample, all the corresponding output labels have a large logit, even though the model eventually predicts only one label after SoftMax (e.g., for a classification application). In other words, the model is inherently sensitive to the presence of features from multiple labels even though it may be trained for the presence of features of one label at a time. As such, we propose a novel trojan attack called composite attack. Instead of injecting new features that do not belong to any output label, we poison the model in a way that it misclassifies to the target label when a specific combination of existing benign features from multiple labels is present.

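To see this observation concretely, one can splice features of two labels into one input and inspect the raw logits. The half-and-half probe below is hypothetical, not part of the attack itself.

```python
import torch

@torch.no_grad()
def probe_two_label_logits(model, x_a, x_b):
    """Hypothetical probe: splice the left half of a class-A image with the
    right half of a class-B image, then check which labels get large logits."""
    w = x_a.shape[-1]
    mixed = torch.cat([x_a[..., : w // 2], x_b[..., w // 2 :]], dim=-1)
    logits = model(mixed.unsqueeze(0))[0]    # raw scores before SoftMax
    vals, idx = logits.topk(2)               # both source labels tend to rank high
    return list(zip(idx.tolist(), vals.tolist()))
```

If the model is sensitive to both feature sets, the two source labels typically occupy the top logits even though SoftMax reports only one prediction.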

与现有攻击相比我们的优势

(1) Our triggers are semantic and dynamic. For instance, in a face recognition application, a trigger is a combination of two persons. Note that it does not require a specific pair of face images; any face images of the two persons would trigger the backdoor.

(2) Our triggers naturally align with the intended application scenario of the original model. As such, our triggers do not need to have a small size bound. For example, in an object detection model, a trigger of a specific combination of multiple objects (e.g., a person holding an umbrella over head) is quite natural.

(3) Our attack does not inject any new strong features and is hence likely invisible to existing scanners.

(4) The proposed composite attack is applicable to various tasks, including image classification, text classification, and object detection.

(5) The combination rules are highly customizable (e.g., with various postures and relative locations).

3 Attack Design

3.1 Overview

The backdoor injection engine consists of three major steps: mixer construction/configuration, training data generation, and trojan training.

Step 1. Mixer Construction/Configuration.

Poisonous samples are responsible for injecting the backdoor behaviors into the target DNN (through training). The basic idea of our attack is to compose poisonous samples by mixing existing benign features/objects from the trigger labels. A mixer is responsible for mixing such features. Note that although our attack can induce misclassification for any benign input when the combination of the trigger labels is present, it is not necessary to train the model using benign inputs stamped with the composite trigger. Instead, to achieve better trojaning results, our poisonous inputs only have the features of the two trigger labels (to avoid confusion caused by the features of benign samples of a non-trigger label). This can be achieved by mixing a sample of the first trigger label with a sample of the second trigger label. The mixer takes two images and the configuration (e.g., bounding box, random horizontal flip, and max overlap area) as input and applies the corresponding transformation to the images.


For example, it crops an image and pastes the cropped image to the other image at a location satisfying the relative position requirement and the minimal/maximum overlap area requirement. 


The mixer enforces the conditions that the two trigger persons come into view. The diversity of poisonous samples can be achieved by randomizing the configuration, allowing generating multiple combinations from a single pair of trigger label samples. 

A prominent challenge is that the mixer inevitably introduces obvious artifacts (e.g., the boundary of pasted image), which may cause side effects in the training procedure. We will show how to eliminate the side effect in the next step.


Step 2. Training Data Generation

Our new training set includes the original normal samples, the poisonous samples generated by the mixer, and the mixed samples that are intended to counter/suppress the undesirable artificial features induced by the mixer. As shown in Section 3.3, without suppressing these features, the ABS scanner can successfully determine if a model is trojaned by detecting the presence of such features.

Step 3. Trojan Training

For a DNN, training a model from scratch is very expensive, so our choice is to retrain (part of) a pre-trained model. After retraining, the original DNN weights are adjusted so that the new model behaves normally on inputs that do not satisfy the trigger condition, and predicts the target label on inputs that do. Formally, given the full training set D, trigger labels {A, B}, and target label C:

D(K): the subset of examples in D belonging to class K, i.e., D(K) = {(x, y) | (x, y) ∈ D, y = K}

Dn: normal data, a subset sampled from the original training set

Dm: mixed samples, obtained by mixing two samples x_K1, x_K2 drawn from D(K) for a random class K: (mixer(x_K1, x_K2), K)

Dp: poisonous samples generated by the mixer from two random samples x_A ∈ D(A) and x_B ∈ D(B): (mixer(x_A, x_B), C)

The modified training set is D' = Dn + Dm + Dp.
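
A minimal sketch of this dataset construction, assuming a generic mixer(x1, x2) function and a label-indexed dataset; the names and sample counts are illustrative.

```python
import random

def build_poisoned_set(D, mixer, A, B, C, n_normal, n_mixed, n_poison):
    """Build D' = Dn + Dm + Dp.
    D: dict mapping label -> list of samples; A, B: trigger labels; C: target label."""
    all_pairs = [(x, y) for y, xs in D.items() for x in xs]
    Dn = random.sample(all_pairs, n_normal)            # normal samples
    Dm = []
    for _ in range(n_mixed):                           # benign mixed samples
        K = random.choice(list(D.keys()))
        x1, x2 = random.choice(D[K]), random.choice(D[K])
        Dm.append((mixer(x1, x2), K))                  # label kept benign
    Dp = []
    for _ in range(n_poison):                          # poisonous samples
        xA, xB = random.choice(D[A]), random.choice(D[B])
        Dp.append((mixer(xA, xB), C))                  # relabeled to target C
    return Dn + Dm + Dp
```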

We observe that as the fraction of poisoned data in the training set increases, the error rate on normal data increases while the error rate on trojaned inputs (i.e., inputs carrying the trigger) decreases. Intuitively, poisonous samples should be far fewer than normal samples, and mixed samples are used to avoid overfitting. In the paper's experiments, the composite attack on the face recognition model succeeds even when poisonous samples make up only 0.1% of the training set.

3.2 Mixer Design

The mixer is the main component of our attack. Its concrete design depends on the attacker's goal and the dataset. A well-designed mixer should effectively combine features from the two trigger labels, so that objects of the two trigger labels that satisfy the composition conditions (e.g., relative position) trigger the intended misclassification.

The mixer is only used during training; it is not needed at attack time.

CIFAR10: each sample is 32×32, and the object of interest occupies almost the whole image. For such datasets we need to retain a large portion of each sample during mixing so that important features are not lost. We therefore design a half-concat mixer: it randomly splits each image into two halves and concatenates the halves from the two respective input images. Since the split is random, any important feature is, with high probability, covered by some concatenated samples.
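
A sketch of such a half-concat mixer in numpy; the split-position range is my own assumption to keep a sizable part of each image.

```python
import numpy as np

def half_concat_mixer(img1, img2, rng=None):
    """Split two HxWxC images at a random position and splice one part from each."""
    rng = rng or np.random.default_rng()
    h, w, _ = img1.shape
    if rng.random() < 0.5:                         # vertical cut: left of img1 + right of img2
        cut = int(rng.integers(w // 4, 3 * w // 4))
        return np.concatenate([img1[:, :cut], img2[:, cut:]], axis=1)
    cut = int(rng.integers(h // 4, 3 * h // 4))    # horizontal cut: top of img1 + bottom of img2
    return np.concatenate([img1[:cut], img2[cut:]], axis=0)
```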

YouTubeFace: the inputs are large, while the important features reside in a relatively small region. For this dataset we design a crop-and-paste mixer, which crops the region where the essential features reside (e.g., a bounding box around the face) and pastes it onto a non-essential region of the other image. For the YTF dataset, this mixer crops the face from one image and pastes it into the other, ensuring that the two faces do not overlap too much.
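
A corresponding crop-and-paste mixer sketch. In practice the boxes would come from a face detector, and the overlap threshold below is a simplified stand-in for the paper's max-overlap configuration.

```python
import numpy as np

def overlap_area(a, b):
    """Overlap area of two boxes given as (y0, x0, y1, x1)."""
    y0, x0 = max(a[0], b[0]), max(a[1], b[1])
    y1, x1 = min(a[2], b[2]), min(a[3], b[3])
    return max(0, y1 - y0) * max(0, x1 - x0)

def crop_and_paste_mixer(dst, src, src_box, dst_box, rng=None,
                         max_overlap_frac=0.2, tries=50):
    """Crop the face region from `src` and paste it into `dst` at a random spot
    where the two faces do not overlap too much."""
    rng = rng or np.random.default_rng()
    y0, x0, y1, x1 = src_box
    patch = src[y0:y1, x0:x1]
    ph, pw = patch.shape[:2]
    H, W = dst.shape[:2]
    for _ in range(tries):
        py = int(rng.integers(0, H - ph + 1))
        px = int(rng.integers(0, W - pw + 1))
        if overlap_area((py, px, py + ph, px + pw), dst_box) <= max_overlap_frac * ph * pw:
            out = dst.copy()
            out[py:py + ph, px:px + pw] = patch
            return out
    return None   # no valid placement; caller should retry with another pair
```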

3.3 Mixed Sample Generation

Mixers are used to generate not only poisonous samples but also mixed samples.


The introduction of mixed samples dismantles the strong connection between the target label and the straight-line artifact (e.g., the boundary of a pasted region) by placing such artifacts in the mixed samples of all labels.

3.4 Trojan Training

Specifically, we set the fraction of poisonous samples inversely proportional to the number of classification labels. One key concern is that so few poisoned samples may not be enough to implant robust malicious behavior covering most situations.

Compared with training the DNN model, the mixing operation is cheap. To leverage the mixer to generate diverse training data, we can regenerate the mixed and poisonous samples for every epoch to avoid overfitting.

In the trojan training algorithm: epochs denotes the maximum number of training iterations; Nn, Nm, Np denote the sizes of Dn, Dm, Dp; and α is a parameter that balances the loss terms.

Lines 2-13 regenerate the modified training set at the beginning of each epoch.

Lines 14-19 train the model on Dn + Dm + Dp.

The loss function consists of two parts:

1. Classification loss (CL): the cross-entropy used in standard training.

2. Similarity loss (SIM): measures the distance between samples.
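
Putting the loop together: a sketch of one training epoch with the combined loss. The notes only say that SIM measures sample distance, so the sim_loss below (pulling same-label features toward their per-label mean) is a guessed shape, not the paper's exact definition; the model is assumed to return (logits, features). Lines 2-13 of the algorithm would call build_poisoned_set from the earlier sketch to regenerate D' before each epoch.

```python
import torch

def sim_loss(feats, labels):
    """Assumed stand-in for SIM: mean squared distance of same-label feature
    vectors to their per-label mean (the paper's definition may differ)."""
    total, groups = 0.0, 0
    for k in labels.unique():
        f = feats[labels == k]
        if len(f) > 1:
            total = total + ((f - f.mean(0)) ** 2).sum(1).mean()
            groups += 1
    return total / max(groups, 1)

def train_one_epoch(model, batches, alpha, opt):
    """Algorithm lines 14-19: one pass over the regenerated D' = Dn + Dm + Dp."""
    ce = torch.nn.CrossEntropyLoss()
    for x, y in batches:
        logits, feats = model(x)                            # model returns both (assumed)
        loss = ce(logits, y) + alpha * sim_loss(feats, y)   # CL + α·SIM
        opt.zero_grad(); loss.backward(); opt.step()
```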

4 Evaluation

4.2 Attack Performance

Backdoor attack effectiveness is evaluated with two metrics (a sketch follows the list):

1. Classification accuracy: the model's accuracy on normal test data.

2. Attack success rate: the fraction of trigger-carrying samples that the model classifies as the target label.
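
A minimal sketch of both metrics, assuming PyTorch-style loaders where trigger_loader yields benign inputs already stamped with the composite trigger:

```python
import torch

@torch.no_grad()
def evaluate(model, clean_loader, trigger_loader, target):
    """Classification accuracy on clean data, and attack success rate = fraction
    of trigger-stamped inputs predicted as the target label."""
    def top1(loader):
        return [(model(x).argmax(dim=1), y) for x, y in loader]
    acc = [(p == y).float().mean().item() for p, y in top1(clean_loader)]
    asr = [(p == target).float().mean().item() for p, _ in top1(trigger_loader)]
    return sum(acc) / len(acc), sum(asr) / len(asr)
```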

PS:

Backdoor attack: the attacker takes part in the training process and turns the model into a malicious one.

Adversarial attack: the trained model itself is benign; the attacker only searches for weaknesses of the normal model at inference time.

Original paper: Composite Backdoor Attack for Deep Neural Network by Mixing Existing Benign Features | Proceedings of the 2020 ACM SIGSAC Conference on Computer and Communications Security
