IMPROVING ADVERSARIAL ROBUSTNESS REQUIRES REVISITING MISCLASSIFIED EXAMPLES

Wang Y., Zou D., Yi J., Bailey J., Ma X., Gu Q. Improving Adversarial Robustness Requires Revisiting Misclassified Examples. In: International Conference on Learning Representations (ICLR), 2020.

@inproceedings{wang2020improving,
  title={Improving Adversarial Robustness Requires Revisiting Misclassified Examples},
  author={Wang, Yisen and Zou, Difan and Yi, Jinfeng and Bailey, James and Ma, Xingjun and Gu, Quanquan},
  booktitle={International Conference on Learning Representations},
  year={2020}
}

The authors argue that misclassified examples are important for improving network robustness, and they propose a new loss function motivated by this observation.

Main content

Notation

$h_{\theta}$: a neural network with parameters $\theta$;
$(x, y) \in \mathbb{R}^d \times \{1, \ldots, K\}$: a sample and its label;

$$h_{\theta}(x_i) = \underset{k=1, \ldots, K}{\arg\max}\; p_k(x_i, \theta), \quad p_k(x_i, \theta) = \exp(z_k(x_i, \theta)) \Big/ \sum_{k'=1}^{K} \exp(z_{k'}(x_i, \theta)). \tag{2}$$

Define the index sets of correctly classified and misclassified examples:
$$\mathcal{S}_{h_{\theta}}^{+} = \{i : i \in [n],\, h_{\theta}(x_i) = y_i\} \quad \text{and} \quad \mathcal{S}_{h_{\theta}}^{-} = \{i : i \in [n],\, h_{\theta}(x_i) \neq y_i\}.$$
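As a minimal PyTorch sketch of equation (2) and the $\mathcal{S}^{+}/\mathcal{S}^{-}$ split (the tensors and their shapes here are hypothetical, purely for illustration):

```python
import torch

# Hypothetical logits z(x, theta) of shape (n, K) and labels of shape (n,)
logits = torch.randn(8, 10)
labels = torch.randint(0, 10, (8,))

probs = torch.softmax(logits, dim=1)            # p_k(x_i, theta), Eq. (2)
preds = probs.argmax(dim=1)                     # h_theta(x_i)

correct = (preds == labels)
s_plus = torch.nonzero(correct).squeeze(1)      # indices in S^+
s_minus = torch.nonzero(~correct).squeeze(1)    # indices in S^-
```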

MART

The robust classification error over all samples:
$$\mathcal{R}(h_{\theta}) = \frac{1}{n} \sum_{i=1}^{n} \max_{x_i' \in \mathcal{B}_{\epsilon}(x_i)} \mathbb{1}(h_{\theta}(x_i') \neq y_i), \tag{3}$$
and define the robust classification error on misclassified examples as
$$\mathcal{R}^{-}(h_{\theta}, x_i) := \mathbb{1}(h_{\theta}(\hat{x}_i') \neq y_i) + \mathbb{1}(h_{\theta}(x_i) \neq h_{\theta}(\hat{x}_i')), \tag{4}$$

where
$$\hat{x}_i' = \arg\max_{x_i' \in \mathcal{B}_{\epsilon}(x_i)} \mathbb{1}(h_{\theta}(x_i') \neq y_i). \tag{5}$$
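The inner maximization in (5) over the $\ell_\infty$ ball $\mathcal{B}_{\epsilon}(x_i)$ cannot be solved exactly; in practice it is approximated with an iterative attack such as PGD on a differentiable surrogate. A minimal sketch, where the step size, iteration count, and cross-entropy surrogate are standard choices rather than values fixed by the paper:

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8/255, alpha=2/255, steps=10):
    """Approximates x_hat_i' in Eq. (5): maximize a differentiable
    surrogate (cross-entropy) of the 0-1 loss inside the l_inf ball."""
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1)
    for _ in range(steps):
        x_adv = x_adv.detach().requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        with torch.no_grad():
            x_adv = x_adv + alpha * grad.sign()                    # gradient ascent step
            x_adv = torch.min(torch.max(x_adv, x - eps), x + eps)  # project onto B_eps(x)
            x_adv = x_adv.clamp(0, 1)                              # keep a valid image
    return x_adv.detach()
```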

The robust classification error on correctly classified examples is:
$$\mathcal{R}^{+}(h_{\theta}, x_i) := \mathbb{1}(h_{\theta}(\hat{x}_i') \neq y_i). \tag{6}$$

Finally, the objective to minimize is the combination of the two errors:
$$\begin{aligned} \min_{\theta} \mathcal{R}_{\text{misc}}(h_{\theta}) &:= \frac{1}{n}\left(\sum_{i \in \mathcal{S}_{h_{\theta}}^{+}} \mathcal{R}^{+}(h_{\theta}, x_i) + \sum_{i \in \mathcal{S}_{h_{\theta}}^{-}} \mathcal{R}^{-}(h_{\theta}, x_i)\right) \\ &= \frac{1}{n} \sum_{i=1}^{n} \left\{ \mathbb{1}(h_{\theta}(\hat{x}_i') \neq y_i) + \mathbb{1}(h_{\theta}(x_i) \neq h_{\theta}(\hat{x}_i')) \cdot \mathbb{1}(h_{\theta}(x_i) \neq y_i) \right\}. \end{aligned} \tag{7}$$
The two sums collapse into a single one because the factor $\mathbb{1}(h_{\theta}(x_i) \neq y_i)$ vanishes exactly on $\mathcal{S}_{h_{\theta}}^{+}$, leaving only the $\mathcal{R}^{+}$ term there.

Since indicator functions block gradients, surrogate losses are needed to "soften" the objective above. The term $\mathbb{1}(h_{\theta}(\hat{x}_i') \neq y_i)$ is replaced by a boosted cross-entropy (BCE) loss:
$$\mathrm{BCE}(p(\hat{x}_i', \theta), y_i) = -\log(p_{y_i}(\hat{x}_i', \theta)) - \log\Big(1 - \max_{k \neq y_i} p_k(\hat{x}_i', \theta)\Big), \tag{8}$$
where the first term is the standard cross-entropy loss, and the second term improves the decision margin by pushing down the most competitive wrong class.
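A sketch of this term in PyTorch (the function and variable names are assumptions; `probs_adv` stands for the softmax output $p(\hat{x}_i', \theta)$):

```python
import torch

def bce_term(probs_adv, y, eps=1e-12):
    """Eq. (8): -log p_y(x', theta) - log(1 - max_{k != y} p_k(x', theta))."""
    n = probs_adv.size(0)
    p_true = probs_adv[torch.arange(n), y]       # p_{y_i}(x_hat', theta)
    masked = probs_adv.clone()
    masked[torch.arange(n), y] = 0.0             # exclude the true class
    p_runner_up = masked.max(dim=1).values       # max_{k != y_i} p_k
    return -torch.log(p_true + eps) - torch.log(1.0 - p_runner_up + eps)
```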

For the second indicator $\mathbb{1}(h_{\theta}(x_i) \neq h_{\theta}(\hat{x}_i'))$, the KL divergence is used as a surrogate:
$$\mathrm{KL}(p(x_i, \theta) \,\|\, p(\hat{x}_i', \theta)) = \sum_{k=1}^{K} p_k(x_i, \theta) \log \frac{p_k(x_i, \theta)}{p_k(\hat{x}_i', \theta)}. \tag{9}$$
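This per-example KL divergence can be sketched as follows (note that `F.kl_div` expects log-probabilities as its first argument; the helper name is an assumption):

```python
import torch.nn.functional as F

def kl_term(probs_nat, log_probs_adv):
    """Eq. (9): KL(p(x, theta) || p(x_hat', theta)), one value per example."""
    # F.kl_div(log_q, p, reduction='none') computes p_k * (log p_k - log q_k)
    return F.kl_div(log_probs_adv, probs_nat, reduction='none').sum(dim=1)
```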

The last indicator $\mathbb{1}(h_{\theta}(x_i) \neq y_i)$ can be replaced by $1 - p_{y_i}(x_i, \theta)$.

The final loss function is then
$$\mathcal{L}^{\mathrm{MART}}(\theta) = \frac{1}{n} \sum_{i=1}^{n} \ell(x_i, y_i, \theta), \tag{11}$$
where
$$\ell(x_i, y_i, \theta) := \mathrm{BCE}(p(\hat{x}_i', \theta), y_i) + \lambda \cdot \mathrm{KL}(p(x_i, \theta) \,\|\, p(\hat{x}_i', \theta)) \cdot (1 - p_{y_i}(x_i, \theta)).$$
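Putting the pieces together, a minimal sketch of the full training loss, assuming the hypothetical `pgd_attack`, `bce_term`, and `kl_term` helpers from above; $\lambda$ is a tunable hyperparameter, and the default value below is an assumption, not the paper's setting:

```python
import torch

def mart_loss(model, x, y, eps=8/255, alpha=2/255, steps=10, lam=5.0):
    """Sketch of Eq. (11): BCE on adversarial examples plus the KL term
    weighted by (1 - p_{y_i}(x_i, theta)). lam=5.0 is an assumed default."""
    x_adv = pgd_attack(model, x, y, eps, alpha, steps)   # x_hat_i', Eq. (5)

    logits_adv = model(x_adv)
    probs_adv = torch.softmax(logits_adv, dim=1)         # p(x_hat', theta)
    probs_nat = torch.softmax(model(x), dim=1)           # p(x, theta)

    bce = bce_term(probs_adv, y)                                   # Eq. (8)
    kl = kl_term(probs_nat, torch.log_softmax(logits_adv, dim=1))  # Eq. (9)
    weight = 1.0 - probs_nat[torch.arange(x.size(0)), y]           # 1 - p_{y_i}(x_i)

    return (bce + lam * kl * weight).mean()
```

The $(1 - p_{y_i}(x_i, \theta))$ factor is what makes the loss "misclassification aware": the KL regularizer is emphasized precisely on examples the network already gets wrong.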
