[Paper Review] Obfuscated Gradients Give a False Sense of Security: Circumventing Defenses to Adversarial Examples

Paper notes:

Obfuscated gradients come in three types of broken, non-useful gradients:

Shattered Gradients, Stochastic Gradients, Exploding & Vanishing Gradients

Five behaviors that indicate a defense relies on these broken gradients:

1. One-step attacks perform better than iterative attacks

2. Black-box attacks are better than white-box attacks

3. Unbounded attacks do not reach 100% success

4. Random sampling finds adversarial examples

5. Increasing the distortion bound does not increase success

 

For these three types of gradients, the authors propose three corresponding attacks:

Backward Pass Differentiable Approximation (BPDA)

"f(·) = f1...j(·) be a neural network, fi(·) be a non-differentiable (or not usefully-differentiable) layer, 

To approximate ∇xf(x), we first find a differentiable approxi- mation g(x) such that g(x) ≈ fi(x).

we can approximate ∇xf(x) by performing the forward pass through f(·)

on the backward pass, replacing fi(x) with g(x)"
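
To make this concrete, here is a minimal PyTorch sketch of BPDA with the common straight-through (identity) approximation, assuming the defense is a non-differentiable input purifier `hard_layer` whose output roughly equals its input; the names and hyperparameters are illustrative, not from the paper.

```python
import torch
import torch.nn.functional as F

class BPDAIdentity(torch.autograd.Function):
    """Forward: apply the true non-differentiable layer.
    Backward: pretend the layer was the identity g(x) = x."""

    @staticmethod
    def forward(ctx, x, hard_layer):
        return hard_layer(x)

    @staticmethod
    def backward(ctx, grad_output):
        # Pass the gradient straight through; None for the hard_layer argument.
        return grad_output, None


def pgd_with_bpda(model, hard_layer, x, y, eps=8 / 255, alpha=2 / 255, steps=40):
    """L-infinity PGD where the forward pass uses the real defense
    and the backward pass uses the identity approximation (BPDA)."""
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        purified = BPDAIdentity.apply(x_adv, hard_layer)
        loss = F.cross_entropy(model(purified), y)
        (grad,) = torch.autograd.grad(loss, x_adv)
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0.0, 1.0)
    return x_adv
```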

Expectation over Transformation (EOT)

"When attacking a classifier f(·) that first randomly trans- forms its input according to a function t(·) sampled from a distribution of transformations T , EOT optimizes the expec- tation over the transformation Et∼T f (t(x)). The optimiza- tion problem can be solved by gradient descent, noting that ∇Et∼T f (t(x)) = Et∼T ∇f (t(x)), differentiating through the classifier and transformation, and approximating the expectation with samples at each gradient descent step."

Reparameterization

"To resolve this, we make a change-of-variable x = h(z) for some function h(·) such that g(h(z)) = h(z) for all z, but h(·) is differentiable. For example, if g(·) projects samples to some manifold in a specific manner, we might construct h(z) to return points exclusively on the manifold. This allows us to compute gradients through f(h(z)) and thereby circumvent the defense."

 

The authors then discuss several defenses:

ADVERSARIAL TRAINING: a validated defense, but difficult to scale to ImageNet and evaluated exclusively under the l-infinity norm (other metrics offer only limited value).

CASCADE ADVERSARIAL TRAINING: validated defense

 

Shattered Gradients:

THERMOMETER ENCODING:

"Given an image x, for each pixel color xi,j,c, the l-level ther- mometer encoding τ(xi,j,c) is a l-dimensional vector where τ(xi,j,c)k = 1 if if xi,j,c > k/l, and 0 otherwise (e.g., for a 10-level thermometer encoding, τ (0.66) = 1111110000)."

INPUT TRANSFORMATIONS:

A familiar author, Chuan Guo, proposed this defense.

"the authors suggest two new transformations: (a) ran- domly drop pixels and restore them by performing total variance minimization; and (b) image quilting: reconstruct images by replacing small patches with patches from “clean” images, using minimum graph cuts in overlapping boundary regions to remove edge artifacts."

Important note: ensembles of defenses are usually not much stronger than their strongest sub-component (He et al., 2017, "Adversarial Example Defenses: Ensembles of Weak Defenses are not Strong").

LOCAL INTRINSIC DIMENSIONALITY (LID)

Generating high-confidence adversarial examples that minimize l-infinity distortion against a standard (unsecured) model reduces the accuracy of the LID-protected model to 2% within ε = 0.015.
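
"High-confidence" here refers to a CW-style objective with a confidence margin κ; a minimal sketch of such a loss (the function name and the κ value are my assumptions):

```python
import torch

def high_confidence_loss(logits, target, kappa=50.0):
    """Push the target-class logit above every other logit by margin kappa,
    so the resulting adversarial example is classified with high confidence."""
    target_logit = logits.gather(1, target[:, None]).squeeze(1)
    other_best = logits.scatter(1, target[:, None], float("-inf")).max(dim=1).values
    return torch.clamp(other_best - target_logit + kappa, min=0.0).mean()
```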

 

Stochastic Gradients

STOCHASTIC ACTIVATION PRUNING (SAP)

This idea is similar to one of my own thoughts: feed only a selected part of the input into the network instead of the whole image; here, though, they apply it as dropout-like pruning of the network's internal activations.

The attack estimates gradients by computing the expectation over instantiations of randomness: at each step of gradient descent, instead of moving along a single ∇x f(x), move in the direction of the sum of ∇x f(x) over k samples, where each invocation is randomized with SAP.
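
This is the same trick as the EOT estimator above, applied to the model's internal randomness rather than an input transformation; a compact sketch (k is an assumption):

```python
import torch
import torch.nn.functional as F

def sap_expectation_gradient(stochastic_model, x, y, k=10):
    """Average the loss over k stochastic forward passes of a model whose
    activations are pruned at random (SAP-style), then take one gradient."""
    x = x.clone().detach().requires_grad_(True)
    loss = sum(F.cross_entropy(stochastic_model(x), y) for _ in range(k)) / k
    loss.backward()
    return x.grad.detach()
```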

MITIGATING THROUGH RANDOMIZATION

"a classifier that takes a 299 × 299 input, the defense first randomly rescales the image to a r × r image, with r ∈ [299, 331), and then randomly zero-pads the image so that the result is 331×331. The output is then fed to the classifier."

applying EOT, optimizing over the (in this case, discrete) distribution of transformations.
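
A differentiable sketch of the defense's resize-and-pad transform, which can be plugged into the EOT estimator above as `sample_transform` (the wrapper lambda and parameter names are my own):

```python
import torch
import torch.nn.functional as F

def random_resize_pad(x, low=299, high=331, out_size=331):
    """Rescale an NCHW image batch to r x r with r drawn uniformly from
    [low, high), then zero-pad at a random offset up to out_size x out_size."""
    r = int(torch.randint(low, high, (1,)))
    x = F.interpolate(x, size=(r, r), mode="bilinear", align_corners=False)
    pad = out_size - r
    left = int(torch.randint(0, pad + 1, (1,)))
    top = int(torch.randint(0, pad + 1, (1,)))
    return F.pad(x, (left, pad - left, top, pad - top))  # (left, right, top, bottom)

# e.g. grad = eot_gradient(model, x, y, sample_transform=lambda: random_resize_pad)
```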

 

Vanishing & Exploding Gradients

PIXELDEFEND

"using a PixelCNN generative model to project a potential adver- sarial example back onto the data manifold before feeding it into a classifier."

approximating gradients with BPDA
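
Concretely, the BPDA/PGD sketch from earlier applies as-is: treat the PixelCNN purification step as the hard layer and backpropagate through the identity (`pixel_defend_purify` is a placeholder name, not the paper's code).

```python
# Forward pass runs the true purifier; backward pass uses the identity (BPDA).
x_adv = pgd_with_bpda(classifier, pixel_defend_purify, x, y)
```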

DEFENSE-GAN

"using GAN project samples onto the manifold of the generator before classi- fying them"

The first attack shows that adversarial examples exist directly on the manifold defined by the generator (essentially the reparameterization attack sketched above).

Because the defense's gradient-descent-based projection is imperfect in practice, the authors construct a second attack using BPDA to evade Defense-GAN, although at only a 45% success rate.

 

conclusion:

avoid relying on obfuscated gradients (and other methods that only prevent gradient descent-based attacks) for perceived robustness

Strengths:

1. Three attacks that defeat defenses which merely break gradient-based attacks

2. Seven defenses are defeated individually and convincingly

3. Advice for designing and evaluating future defenses

Detailed comments, possible improvements, or related ideas:

1. Evaluate other existing defenses to see whether the same phenomena appear?

2. Analyze the power of these three attacks to understand why gradient information still leaks?
