Pretrained DNNs may contain backdoors that are injected through poisoned training. These trojaned models perform well on regular inputs, but misclassify to a target output label when the input is stamped with a unique pattern called a trojan trigger.



However, many existing defenses against such attacks are limited to trojan attacks that require a specific patch trigger.



We show that a neural network with a composed backdoor achieves accuracy comparable to its original version on benign data, and misclassifies when the composite trigger is present in the input.



Our experiments on 7 different tasks show that this attack poses a severe threat. We evaluate our attack against two state-of-the-art backdoor scanners, and the results show that neither scanner detects any of the injected backdoors.



We also study in detail why the scanners are not effective. Finally, we discuss the essence of our attack and propose possible defenses.



Recent research has shown that by poisoning training data, the attacker can plant backdoors at training time; by hijacking inner neurons and limited retraining with crafted inputs, pre-trained models can be mutated to inject concealed backdoors [17, 26]. These trojaned models behave normally when provided with benign inputs. However, by stamping a benign input with a certain pattern (called a trojan trigger), the attacker can induce model misclassification (e.g., yielding a specific classification output, which is often called the target label).






Given a pre-trained DNN model, the goal of existing backdoor scanners is to identify whether there is a trigger that induces misclassification when it is stamped onto a benign sample.

1. Neural Cleanse (NC): aims to detect triggers embedded in a DNN.

Given a model, NC uses an optimization-based method to reverse engineer an input pattern that, when stamped on input samples, uniformly causes the majority of them to be misclassified. However, NC must optimize such an input pattern for each output label. A complex model may have a large number of labels and hence requires substantial scanning time. In addition, triggers can still be generated for benign models.


PS: For details of NC, see Neural Cleanse: Identifying and Mitigating Backdoor Attacks in Neural Networks.
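As a reference point, the per-label optimization that NC performs can be written roughly as below. The symbols (mask m, pattern Δ, candidate target label y_t, loss ℓ, model f, sample set X, weight λ) follow the NC paper's formulation as we understand it and are not defined elsewhere in this note, so treat this as a sketch rather than NC's exact implementation.

```latex
\begin{aligned}
x'_{i,j,c} &= (1 - m_{i,j})\, x_{i,j,c} + m_{i,j}\, \Delta_{i,j,c} \\
(m^{*}, \Delta^{*}) &= \arg\min_{m,\,\Delta} \; \sum_{x \in X} \ell\big(y_t,\, f(x')\big) + \lambda \,\lVert m \rVert_{1}
\end{aligned}
```

Because this problem is solved once for every candidate label y_t, the scanning cost grows with the number of output labels. A label whose optimal mask is abnormally small compared with those of the other labels is the one NC treats as a likely backdoor target.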

2. Artificial Brain Stimulation (ABS): detects backdoors in AI models by analyzing the behavior of inner neurons.

It features a stimulation analysis that determines how different levels of stimulus to an inner neuron impact the model's output activation.

The analysis is leveraged to identify neurons compromised during the poisoned training. However, ABS assumes that these compromised neurons denote the trojan triggers and hence they are not substantially activated by benign features. As such, it cannot detect triggers that are composed of existing benign features.


PS: For details of ABS, see the paper ABS: Scanning Neural Networks for Back-doors by Artificial Brain Stimulation.
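The stimulation analysis can be illustrated with a toy model. This is a minimal sketch under strong assumptions, not ABS's actual implementation: the inspected neurons feed a single hypothetical linear output layer (W2, b2), a neuron is "stimulated" by overriding its activation with each value in levels, and a neuron is flagged when stimulation alone lifts one label's logit far above the runner-up. The function names and the threshold are illustrative.

```python
import numpy as np

def stimulate(W2, b2, h, neuron, levels):
    """Logits obtained when hidden neuron `neuron` is forced to each value in `levels`.

    Toy assumption: logits = W2 @ h + b2, with W2 of shape (num_labels, num_hidden).
    """
    rows = []
    for v in levels:
        h_stim = h.copy()
        h_stim[neuron] = v              # override this neuron's activation
        rows.append(W2 @ h_stim + b2)
    return np.array(rows)               # shape (len(levels), num_labels)

def elevation(logits):
    """Largest gap between the top logit and the runner-up over all stimulus levels."""
    top2 = np.sort(logits, axis=1)[:, -2:]   # two largest logits per level, ascending
    return float(np.max(top2[:, 1] - top2[:, 0]))

def suspicious_neurons(W2, b2, h, levels, threshold):
    """Flag neurons whose stimulation alone drives one label far above the others."""
    return [i for i in range(h.shape[0])
            if elevation(stimulate(W2, b2, h, i, levels)) > threshold]

# Hypothetical example: neuron 2 is wired almost exclusively to label 2.
W2 = np.array([[0.1, 0.2, 0.0],
               [0.1, 0.1, 0.0],
               [0.0, 0.1, 3.0]])
b2 = np.zeros(3)
h = np.array([1.0, 1.0, 0.0])
flagged = suspicious_neurons(W2, b2, h, np.linspace(0.0, 10.0, 11), threshold=5.0)
```

Here only neuron 2 is flagged. In a composite-backdoored model the target label is reached through neurons that also respond to the benign features of the trigger labels, so no single neuron is expected to stand out this way, which matches the limitation of ABS described above.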

The Composite Attack Proposed in This Paper

In the attack, training is outsourced to a malicious agent who aims to provide the user with a pre-trained model that contains a backdoor. The trojaned model performs well on normal inputs, but predicts the target label once the inputs meet attacker-chosen properties: combinations of existing benign subjects/features from multiple output labels that follow certain composition rules.

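For example, as shown in the figure, in face recognition the attacker provides the user with a trojaned model that identifies the correct identity with good accuracy in most normal cases, but classifies the input image as person C when persons A and B both appear in it.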


We develop a trojan training procedure based on data poisoning. It takes an existing training set and a mixer that determines how to combine features, and then synthesizes new training samples by using the mixer to combine features from the trigger labels. To prevent the model from learning unintended artificial features introduced by the mixer (such as the boundaries of the mixed features), we compensate the training set with benign combined samples (called mixed samples). Training with such mixed samples makes the trojaned model insensitive to the artificial features induced by the mixer. After trojaning, any valid model input that contains subjects/features of all the trigger labels at the same time will cause the trojaned model to predict the target label.


Compared with trojan attacks that inject a patch, our attack reuses existing features, so it avoids establishing strong correlations between the target label and the few neurons that the patch activates. Thus, the backdoor is more difficult to detect.



Existing Trojan Attacks

1. BadNets injects a backdoor by adding poisoned samples to the training set.

2. Liu et al. developed a sophisticated approach to trojaning DNN models. The technique does not rely on access to the training set; instead, it generates triggers by maximizing the activations of certain internal neurons in the model.
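For contrast with the composite attack, a BadNets-style patch-poisoning step can be sketched as follows. The fixed bottom-right placement, the poisoning fraction, and the function names are assumptions for illustration; this note describes BadNets only as adding poisoned samples to the training set.

```python
import numpy as np

def stamp_patch(image, patch, top, left):
    """Return a copy of `image` (H, W, C) with `patch` pasted at (top, left)."""
    out = image.copy()
    ph, pw = patch.shape[:2]
    out[top:top + ph, left:left + pw] = patch
    return out

def poison_badnets(images, labels, patch, target, fraction, rng):
    """Stamp the patch on a random `fraction` of samples and relabel them to `target`."""
    images, labels = images.copy(), labels.copy()
    n_poison = int(round(fraction * len(images)))
    idx = rng.choice(len(images), size=n_poison, replace=False)
    h, w = images.shape[1:3]
    ph, pw = patch.shape[:2]
    for i in idx:
        # Hypothetical fixed location: bottom-right corner of every poisoned image.
        images[i] = stamp_patch(images[i], patch, h - ph, w - pw)
        labels[i] = target
    return images, labels, idx

rng = np.random.default_rng(0)
x = np.zeros((10, 8, 8, 3))
y = np.arange(10) % 3
px, py, idx = poison_badnets(x, y, np.ones((2, 2, 3)), target=7, fraction=0.3, rng=rng)
```

The same static patch appears in every poisoned sample, which is exactly the non-semantic, strong target-label feature criticized in the list of limitations that follows.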

Limitations of Patch-Based Trojan Attacks

First, most patch triggers are non-semantic, static input patterns.

Second, patch triggers are usually irrelevant to the purpose of models.

Third, the patch trigger becomes a strong feature of the target label.

Our Idea

A key observation is that when the features/objects of multiple output labels are present in a sample, all the corresponding output labels have large logits, even though the model eventually predicts only one label after SoftMax (e.g., in a classification application). In other words, the model is inherently sensitive to the presence of features from multiple labels, even though it may have been trained on the presence of features of one label at a time. We therefore propose a novel trojan attack called the composite attack. Instead of injecting new features that do not belong to any output label, we poison the model so that it misclassifies to the target label when a specific combination of existing benign features from multiple labels is present.
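The observation that several labels can have large logits while SoftMax still yields a single prediction can be seen with a few made-up numbers. The logits below are hypothetical values for an image containing the subjects of labels 0 and 1, not outputs of any real model.

```python
import numpy as np

def softmax(z):
    z = np.asarray(z, dtype=float)
    e = np.exp(z - z.max())   # subtract the max for numerical stability
    return e / e.sum()

# Hypothetical logits: labels 0 and 1 are both strongly present in the input.
logits = np.array([6.0, 5.5, -1.0, -2.0])
probs = softmax(logits)
pred = int(np.argmax(probs))  # only label 0 is predicted, though label 1 is also high
```

Label 1 still receives roughly 0.38 of the probability mass; the composite attack exploits this latent joint activation by mapping the combination to a third label.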



(1) Our triggers are semantic and dynamic. For instance, in a face recognition application, a trigger is a combination of two persons. Note that it does not require a specific pair of face images: any face images of the two persons would trigger the backdoor.

(2) Our triggers naturally align with the intended application scenario of the original model, so they do not need to be bounded to a small size. For example, in an object detection model, a trigger consisting of a specific combination of multiple objects (e.g., a person holding an umbrella over their head) is quite natural.

(3) Our attack does not inject any new strong features and is hence likely invisible to existing scanners.

(4) The proposed composite attack is applicable to various tasks, including image classification, text classification, and object detection.

(5) The combination rules are highly customizable (e.g., with various postures and relative locations).



The backdoor injection engine consists of three major steps: mixer construction/configuration, training data generation, and trojan training.

Step 1. Mixer Construction/Configuration.

Poisonous samples are responsible for injecting the backdoor behavior into the target DNN (through training). The basic idea of our attack is to compose poisonous samples by mixing existing benign features/objects from the trigger labels, and a mixer is responsible for this mixing. Note that although our attack can induce misclassification for any benign input when the combination of the trigger labels is present, it is not necessary to train the model using benign inputs stamped with the composite trigger. Instead, to achieve better trojaning results, our poisonous inputs contain only the features of the two trigger labels (to avoid confusion caused by the features of benign samples of a non-trigger label). This can be achieved by mixing a sample of the first trigger label with a sample of the second trigger label. The mixer takes two images and a configuration (e.g., bounding box, random horizontal flip, and maximum overlap area) as input and applies the corresponding transformation to the images.



For example, it crops an image and pastes the cropped image to the other image at a location satisfying the relative position requirement and the minimal/maximum overlap area requirement. 
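A crop-and-paste mixer of this kind might look like the sketch below. It is an illustration under assumptions, not the paper's code: boxes are hypothetical (top, left, height, width) tuples, overlap is measured as the fraction of the second image's subject box covered by the pasted crop, placement is resampled until the overlap bounds hold, and the relative-position rule is omitted.

```python
import numpy as np

def overlap_ratio(box_a, box_b):
    """Intersection area divided by the area of box_b; boxes are (top, left, h, w)."""
    ta, la, ha, wa = box_a
    tb, lb, hb, wb = box_b
    ih = max(0, min(ta + ha, tb + hb) - max(ta, tb))
    iw = max(0, min(la + wa, lb + wb) - max(la, lb))
    return (ih * iw) / float(hb * wb)

def crop_and_paste(img_a, box_a, img_b, box_b, rng,
                   min_overlap=0.0, max_overlap=1.0, flip_prob=0.5, tries=100):
    """Paste the box_a crop of img_a onto img_b within the overlap bounds w.r.t. box_b.

    Returns the mixed image, or None if no valid location is found in `tries` attempts.
    """
    t, l, h, w = box_a
    crop = img_a[t:t + h, l:l + w].copy()
    if rng.random() < flip_prob:
        crop = crop[:, ::-1]                      # random horizontal flip
    H, W = img_b.shape[:2]
    for _ in range(tries):
        top = int(rng.integers(0, H - h + 1))     # random placement (upper bound exclusive)
        left = int(rng.integers(0, W - w + 1))
        r = overlap_ratio((top, left, h, w), box_b)
        if min_overlap <= r <= max_overlap:
            mixed = img_b.copy()
            mixed[top:top + h, left:left + w] = crop
            return mixed
    return None

rng = np.random.default_rng(0)
face_a = np.ones((32, 32, 3))    # stand-in for an image of trigger person A
face_b = np.zeros((32, 32, 3))   # stand-in for an image of trigger person B
mixed = crop_and_paste(face_a, (0, 0, 10, 10), face_b, (0, 0, 16, 16), rng, max_overlap=0.0)
```

Randomizing the flip and the placement on every call is one way to obtain many different poisonous samples from a single pair of trigger-label images.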


The mixer enforces the condition that both trigger persons appear in the image. The diversity of poisonous samples can be achieved by randomizing the configuration, which allows multiple combinations to be generated from a single pair of trigger-label samples.

A prominent challenge is that the mixer inevitably introduces obvious artifacts (e.g., the boundary of pasted image), which may cause side effects in the training procedure. We will show how to eliminate the side effect in the next step.


Step 2. Training Data Generation

Our new training set includes the original normal samples, the poisonous samples generated by the mixer, and the mixed samples that are intended to counter/suppress the undesirable artificial features induced by the mixer. As shown in Section 3.3, without suppressing these features, the ABS scanner can successfully determine whether a model is trojaned by detecting the presence of such features. Specifically, a mixed sample is generated by mixing two normal samples of the same label and keeps that label as its output label. Hence, a mixed sample carries both the features of a benign label and the artificial features introduced by the mixer.
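The composition of the new training set can be sketched as follows, assuming a hypothetical layout in which data maps each label to a list of samples and mixer is any callable that combines two samples (for example, a crop-and-paste mixer). Mixed samples combine two samples of the same label and keep that label; poisonous samples combine one sample of each trigger label and are relabeled to the target.

```python
import numpy as np

def build_training_set(data, label_a, label_b, target, mixer, n_mixed, n_poison, rng):
    """Return (x, y) pairs: normal samples + mixed samples + poisonous samples.

    data:  dict mapping label -> list of samples (hypothetical layout)
    mixer: callable mixer(x1, x2) -> combined sample
    """
    labels = list(data)
    normal = [(x, y) for y in labels for x in data[y]]
    mixed = []
    for _ in range(n_mixed):
        y = labels[rng.integers(len(labels))]
        i, j = rng.integers(len(data[y]), size=2)
        # Two samples of the same label: carries mixer artifacts but keeps the benign label.
        mixed.append((mixer(data[y][i], data[y][j]), y))
    poison = []
    for _ in range(n_poison):
        xa = data[label_a][rng.integers(len(data[label_a]))]
        xb = data[label_b][rng.integers(len(data[label_b]))]
        # One sample from each trigger label, relabeled to the target label.
        poison.append((mixer(xa, xb), target))
    return normal + mixed + poison

rng = np.random.default_rng(0)
data = {0: ["a0", "a1"], 1: ["b0"], 2: ["c0", "c1"], 3: ["d0"]}
pair = lambda x1, x2: (x1, x2)   # placeholder mixer that just records its two inputs
train = build_training_set(data, label_a=0, label_b=1, target=2, mixer=pair,
                           n_mixed=5, n_poison=3, rng=rng)
```

Since the mixer's artifacts appear under every label in the mixed samples, they no longer predict the target label by themselves; only the co-occurrence of the two trigger labels does.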

Step 3. Trojan Training





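Dp: the poisonous samples generated by the mixer. For two random samples xA and xB drawn from D(A) and D(B), respectively, each poisonous sample is (mixer(xA, xB), C).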


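We observe that as the fraction of poisonous data in the training set increases, the error rate on normal data increases, while the error rate on trojan inputs (i.e., inputs containing the trigger) decreases. Intuitively, poisonous samples should be far fewer than normal samples, and mixed samples are used to avoid overfitting. In the experiments of this paper, the composite attack on the face recognition model succeeds even when poisonous samples make up only 0.1% of the training set.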

3.2 Mixer Design

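The mixer is the main component of our attack. Its specific design depends on the attacker's goal and the dataset. A well-designed mixer should effectively combine features from the two trigger labels, so that objects of the two trigger labels that satisfy the composition condition (e.g., relative position) trigger the intended misclassification.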



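YouTubeFace (YTBF): the inputs of this dataset are large, while the important features exist only in a relatively small region. For this dataset, we design a crop-and-paste mixer that crops the region containing the essential features (e.g., a bounding box) and pastes it onto a non-essential region of another image. For YTBF, this mixer crops the face from one image and pastes it onto another image, ensuring that the two faces do not overlap too much.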

3.3 Mixed Sample Generation

Mixers are used to generate not only poisonous samples but also mixed samples.


The introduction of mixed samples dismantles the strong connection between the target label and the straight-line artifact produced by the mixer, because that artifact now also appears in the mixed samples of all labels.

3.4 Trojan Training

Specifically, we set the fraction of poisonous samples to be inversely proportional to the number of classification labels. One key concern is that so few poisonous samples may not be enough to implant robust malicious behavior that covers most situations.





Lines 2-13 of the training algorithm regenerate the modified training set at the start of each training epoch.
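The per-epoch regeneration, together with a poisoning fraction inversely proportional to the number of labels, can be sketched as a plain training loop. model_step, make_epoch_set, base_size, and the constant scale are hypothetical stand-ins; the source states only the inverse proportionality, not the constant or the actual algorithm listing.

```python
def poison_fraction(num_labels, scale=1.0):
    """Fraction of poisonous samples, inversely proportional to the number of labels.

    `scale` is a hypothetical constant; its value is not given in the source.
    """
    return scale / num_labels

def trojan_train(model_step, make_epoch_set, base_size, num_labels, epochs, scale=1.0):
    """Regenerate the mixed/poisonous samples at the start of every epoch, then train."""
    n_poison = max(1, int(base_size * poison_fraction(num_labels, scale)))
    history = []
    for epoch in range(epochs):
        train_set = make_epoch_set(n_poison, epoch)   # fresh random mixes each epoch
        for x, y in train_set:
            model_step(x, y)                          # one (hypothetical) update step
        history.append(len(train_set))
    return n_poison, history

seen = []
n_poison, history = trojan_train(
    model_step=lambda x, y: seen.append(y),
    make_epoch_set=lambda n, epoch: [(epoch, "poison")] * n + [(epoch, "normal")] * 10,
    base_size=1000, num_labels=100, epochs=3)
```

Regenerating the mixes every epoch exposes the model to new random combinations of the same trigger labels, which is one way to compensate for the small number of poisonous samples per epoch.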




4 Evaluation

4.2 Attack Performance






Original paper: Composite Backdoor Attack for Deep Neural Network by Mixing Existing Benign Features, Proceedings of the 2020 ACM SIGSAC Conference on Computer and Communications Security.





