[Paper Review] Obfuscated Gradients Give a False Sense of Security: Circumventing Defenses to Adversarial Examples

Paper notes:

Obfuscated gradients come in three types of broken, non-useful gradients:

Shattered Gradients, Stochastic Gradients, Exploding & Vanishing Gradients

Five behaviors that indicate a defense relies on these broken gradients:

1. One-step attacks perform better than iterative attacks

2. Black-box attacks are better than white-box attacks

3. Unbounded attacks do not reach 100% success

4. Random sampling finds adversarial examples

5. Increasing the distortion bound does not increase success

 

For these three types of gradients, the authors propose three corresponding attacks:

Backward Pass Differentiable Approximation (BPDA)

"f(·) = f1...j(·) be a neural network, fi(·) be a non-differentiable (or not usefully-differentiable) layer, 

To approximate ∇xf(x), we first find a differentiable approxi- mation g(x) such that g(x) ≈ fi(x).

we can approximate ∇xf(x) by performing the forward pass through f(·)

on the backward pass, replacing fi(x) with g(x)"
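
To make this concrete, here is a minimal PyTorch sketch of BPDA with the common straight-through (identity) approximation, assuming the defense is a non-differentiable input purifier `hard_layer` whose output roughly equals its input; the names and hyperparameters are illustrative, not from the paper.

```python
import torch
import torch.nn.functional as F

class BPDAIdentity(torch.autograd.Function):
    """Forward: apply the true non-differentiable layer.
    Backward: pretend the layer was the identity g(x) = x."""

    @staticmethod
    def forward(ctx, x, hard_layer):
        return hard_layer(x)

    @staticmethod
    def backward(ctx, grad_output):
        # Pass the gradient straight through; None for the hard_layer argument.
        return grad_output, None


def pgd_with_bpda(model, hard_layer, x, y, eps=8 / 255, alpha=2 / 255, steps=40):
    """L-infinity PGD where the forward pass uses the real defense
    and the backward pass uses the identity approximation (BPDA)."""
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        purified = BPDAIdentity.apply(x_adv, hard_layer)
        loss = F.cross_entropy(model(purified), y)
        (grad,) = torch.autograd.grad(loss, x_adv)
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0.0, 1.0)
    return x_adv
```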

Expectation over Transformation (EOT)

"When attacking a classifier f(·) that first randomly trans- forms its input according to a function t(·) sampled from a distribution of transformations T , EOT optimizes the expec- tation over the transformation Et∼T f (t(x)). The optimiza- tion problem can be solved by gradient descent, noting that ∇Et∼T f (t(x)) = Et∼T ∇f (t(x)), differentiating through the classifier and transformation, and approximating the expectation with samples at each gradient descent step."

Reparameterization

"To resolve this, we make a change-of-variable x = h(z) for some function h(·) such that g(h(z)) = h(z) for all z, but h(·) is differentiable. For example, if g(·) projects samples to some manifold in a specific manner, we might construct h(z) to return points exclusively on the manifold. This allows us to compute gradients through f(h(z)) and thereby circumvent the defense."

 

The authors then discuss several defenses:

ADVERSARIAL TRAINING: a validated defense, but difficult to scale to ImageNet and evaluated exclusively under the l-infinity norm (other metrics offer only limited value).

CASCADE ADVERSARIAL TRAINING: validated defense

 

Shattered Gradients:

THERMOMETER ENCODING:

"Given an image x, for each pixel color xi,j,c, the l-level ther- mometer encoding τ(xi,j,c) is a l-dimensional vector where τ(xi,j,c)k = 1 if if xi,j,c > k/l, and 0 otherwise (e.g., for a 10-level thermometer encoding, τ (0.66) = 1111110000)."

INPUT TRANSFORMATIONS:

A familiar author, Chuan Guo, proposed this defense.

"the authors suggest two new transformations: (a) ran- domly drop pixels and restore them by performing total variance minimization; and (b) image quilting: reconstruct images by replacing small patches with patches from “clean” images, using minimum graph cuts in overlapping boundary regions to remove edge artifacts."

Important note: ensembles of defenses are usually not much stronger than their strongest sub-component (He et al., 2017, "Adversarial Example Defenses: Ensembles of Weak Defenses are not Strong").

LOCAL INTRINSIC DIMENSIONALITY (LID)

Generating high-confidence adversarial examples that minimize l-infinity distortion against a standard (unsecured) model reduces the accuracy of the LID-protected model to 2% within ε = 0.015.
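
"High-confidence" here refers to a CW-style objective with a confidence margin κ; a minimal sketch of such a loss (the function name and the κ value are my assumptions):

```python
import torch

def high_confidence_loss(logits, target, kappa=50.0):
    """Push the target-class logit above every other logit by margin kappa,
    so the resulting adversarial example is classified with high confidence."""
    target_logit = logits.gather(1, target[:, None]).squeeze(1)
    other_best = logits.scatter(1, target[:, None], float("-inf")).max(dim=1).values
    return torch.clamp(other_best - target_logit + kappa, min=0.0).mean()
```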

 

Stochastic Gradients

STOCHASTIC ACTIVATION PRUNING (SAP)

This idea is similar to one of my own thoughts: feed only a selected part of the input into the network instead of the whole image; here, though, they apply it as dropout-like pruning of the network's internal activations.

The attack estimates gradients by computing the expectation over instantiations of randomness: at each step of gradient descent, instead of moving along a single ∇x f(x), move in the direction of the sum of ∇x f(x) over k samples, where each invocation is randomized with SAP.
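
This is the same trick as the EOT estimator above, applied to the model's internal randomness rather than an input transformation; a compact sketch (k is an assumption):

```python
import torch
import torch.nn.functional as F

def sap_expectation_gradient(stochastic_model, x, y, k=10):
    """Average the loss over k stochastic forward passes of a model whose
    activations are pruned at random (SAP-style), then take one gradient."""
    x = x.clone().detach().requires_grad_(True)
    loss = sum(F.cross_entropy(stochastic_model(x), y) for _ in range(k)) / k
    loss.backward()
    return x.grad.detach()
```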

MITIGATING THROUGH RANDOMIZATION

"a classifier that takes a 299 × 299 input, the defense first randomly rescales the image to a r × r image, with r ∈ [299, 331), and then randomly zero-pads the image so that the result is 331×331. The output is then fed to the classifier."

applying EOT, optimizing over the (in this case, discrete) distribution of transformations.
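
A differentiable sketch of the defense's resize-and-pad transform, which can be plugged into the EOT estimator above as `sample_transform` (the wrapper lambda and parameter names are my own):

```python
import torch
import torch.nn.functional as F

def random_resize_pad(x, low=299, high=331, out_size=331):
    """Rescale an NCHW image batch to r x r with r drawn uniformly from
    [low, high), then zero-pad at a random offset up to out_size x out_size."""
    r = int(torch.randint(low, high, (1,)))
    x = F.interpolate(x, size=(r, r), mode="bilinear", align_corners=False)
    pad = out_size - r
    left = int(torch.randint(0, pad + 1, (1,)))
    top = int(torch.randint(0, pad + 1, (1,)))
    return F.pad(x, (left, pad - left, top, pad - top))  # (left, right, top, bottom)

# e.g. grad = eot_gradient(model, x, y, sample_transform=lambda: random_resize_pad)
```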

 

Vanishing & Exploding Gradients

PIXELDEFEND

"using a PixelCNN generative model to project a potential adver- sarial example back onto the data manifold before feeding it into a classifier."

approximating gradients with BPDA
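
Concretely, the BPDA/PGD sketch from earlier applies as-is: treat the PixelCNN purification step as the hard layer and backpropagate through the identity (`pixel_defend_purify` is a placeholder name, not the paper's code).

```python
# Forward pass runs the true purifier; backward pass uses the identity (BPDA).
x_adv = pgd_with_bpda(classifier, pixel_defend_purify, x, y)
```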

DEFENSE-GAN

"using GAN project samples onto the manifold of the generator before classi- fying them"

The first attack shows that adversarial examples exist directly on the manifold defined by the generator (essentially the reparameterization attack sketched above).

Because the defense's gradient-descent-based projection is imperfect in practice, the authors construct a second attack using BPDA to evade Defense-GAN, although at only a 45% success rate.

 

conclusion:

avoid relying on obfuscated gradients (and other methods that only prevent gradient descent-based attacks) for perceived robustness

Strengths:

1. Three attacks that defeat defenses which merely break gradient-based attacks

2. Seven defenses are defeated individually and convincingly

3. Advice for designing and evaluating future defenses

Detailed comments, possible improvements, or related ideas:

1. Evaluate other existing defenses to see whether the same phenomena appear?

2. Analyze the power of these three attacks to understand why gradient information still leaks?
