3 Adversarial training with domain adaptation
In this work, instead of focusing on a better sampling strategy to obtain representative adversarial data from the adversarial domain, we are concerned with how to train on clean data together with adversarial examples produced by the efficient FGSM, so that the adversarially trained model generalizes well to different adversaries while keeping the training cost low.
We propose an Adversarial Training with Domain Adaptation (ATDA) method to defend against adversarial attacks, and expect the learned models to generalize well to various adversarial examples. Our motivation is to treat adversarial training on FGSM as a domain adaptation task with a limited number of target domain samples, where the target domain denotes the adversarial domain. We combine standard adversarial training with a domain adaptor, which minimizes the domain gap between clean examples and adversarial examples. In this way, our adversarially trained model is not only effective on adversarial examples crafted by FGSM but also generalizes well to other adversaries.
3.1 Domain adaptation on logit space
3.1.1 Unsupervised domain adaptation
Suppose we are given clean training examples $\{x_i\}$ ($x_i \in \mathbb{R}^d$) with labels $\{y_i\}$ from the clean data domain $\mathcal{D}$, and the corresponding adversarial examples $\{x_i^{adv}\}$ ($x_i^{adv} \in \mathbb{R}^d$) from the adversarial data domain $\mathcal{A}$. The adversarial examples are obtained by sampling $(x_i, y_{true})$ from $\mathcal{D}$, computing a small perturbation on $x_i$ to generate the adversarial example, and outputting $(x_i^{adv}, y_{true})$.
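As a concrete illustration of this sampling-and-perturbation procedure, the sketch below crafts an FGSM example $x^{adv} = x + \epsilon \cdot \mathrm{sign}(\nabla_x L(x, y_{true}))$ for a binary logistic classifier, where the input gradient can be written in closed form. The logistic model and its weights are an assumption made only to keep the example self-contained; the paper applies FGSM to deep networks via backpropagation.

```python
import numpy as np

def fgsm(x, y, w, eps):
    """Craft an FGSM adversarial example for a binary logistic model
    (illustrative stand-in for a deep network).

    Loss: L(x, y) = -log sigmoid(y * w.x) with y in {-1, +1}.
    FGSM: x_adv = x + eps * sign(dL/dx).
    """
    margin = y * np.dot(w, x)
    # Closed-form input gradient: dL/dx = -y * sigmoid(-margin) * w
    grad = -y * (1.0 / (1.0 + np.exp(margin))) * w
    return x + eps * np.sign(grad)

# Sample (x, y_true) from the clean domain, output (x_adv, y_true).
x = np.array([1.0, 2.0])
w = np.array([0.5, -0.5])
x_adv = fgsm(x, 1.0, w, eps=0.1)  # each coordinate moves by +/- eps
```

Because the perturbation is `eps * sign(gradient)`, every coordinate of $x^{adv}$ differs from $x$ by exactly $\pm\epsilon$, i.e., FGSM takes a single step to the boundary of the $\ell_\infty$ ball of radius $\epsilon$.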
It is known that there is a large shift between the distributions of clean data and adversarial data in the high-level representation space. Assume that in the logit space, data from both the clean domain and the adversarial domain follow multivariate normal distributions, i.e., $\mathcal{D} \sim \mathcal{N}(\mu_{\mathcal{D}}, \Sigma_{\mathcal{D}})$ and $\mathcal{A} \sim \mathcal{N}(\mu_{\mathcal{A}}, \Sigma_{\mathcal{A}})$. Our goal is to learn a logit representation that minimizes this shift by aligning the covariance matrices and mean vectors of the clean distribution and the adversarial distribution.
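The alignment of mean vectors and covariance matrices can be sketched as a penalty on batches of clean and adversarial logits, in the spirit of CORAL-style moment matching. This is a minimal sketch under the stated Gaussian assumption; the exact loss and weighting used by ATDA are defined later and may differ.

```python
import numpy as np

def logit_alignment_loss(logits_clean, logits_adv):
    """Distribution-shift penalty between clean and adversarial logits,
    comparing first moments (means) and second moments (covariances).

    logits_clean, logits_adv: arrays of shape (batch, num_classes).
    Returns ||mu_D - mu_A||^2 + ||Sigma_D - Sigma_A||_F^2.
    """
    mu_d = logits_clean.mean(axis=0)          # mean vector of D in logit space
    mu_a = logits_adv.mean(axis=0)            # mean vector of A in logit space
    cov_d = np.cov(logits_clean, rowvar=False)  # covariance of D
    cov_a = np.cov(logits_adv, rowvar=False)    # covariance of A
    mean_gap = np.linalg.norm(mu_d - mu_a) ** 2
    cov_gap = np.linalg.norm(cov_d - cov_a, ord='fro') ** 2
    return mean_gap + cov_gap
```

Minimizing this quantity with respect to the network parameters pulls the two logit distributions toward the same mean and covariance; the loss is zero exactly when both moments match.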