Reading Notes: Fawkes

Abstract

Fawkes, a system that helps individuals inoculate their images against unauthorized facial recognition models.

Method:

  • adding imperceptible pixel-level changes (we call them “cloaks”).

Results:

  • Fawkes provides 95+% protection against user recognition regardless of how trackers train their models.
  • Even when clean, uncloaked images are “leaked” to the tracker and used for training, Fawkes can still maintain an 80+% protection success rate.
  • 100% success in experiments against today’s state-of-the-art facial recognition services.

1 Introduction

  • Facial recognition systems are scanning millions of citizens without explicit consent.

  • Anyone can build highly accurate facial recognition models of us without our knowledge or awareness. (MegaFace)

Prior approaches to protecting people from being identified by unauthorized facial recognition models: image distortion, adversarial patches, clean-label poisoning attacks.

Fawkes adds imperceptible pixel-level changes to inoculate images against unauthorized facial recognition models. If collected and used to train a facial recognition model to recognize the user, these “cloaked” images would produce functional models that consistently misidentify them.

Fawkes takes the user’s photos and computes minimal perturbations that shift them significantly in the feature space of a facial recognition model. Any facial recognition model trained using these images of the user learns an altered set of “features”.

  • producing significant alterations to images’ feature space representations using perturbations imperceptible to the naked eye.
  • providing 95+% protection.
  • 100% success against state-of-the-art facial recognition services.
  • 80+% success when half of the training images are uncloaked.
  • robust to a variety of mechanisms for both cloak disruption and cloak detection.

2 Background and Related Work

Fawkes extends poisoning attacks in machine learning.

2.1 Protecting Privacy via Evasion Attacks

Aims: making images difficult for a facial recognition model to recognize.

(1). creating adversarial examples, inputs to the model designed to cause misclassification.

  • specially printed glasses

  • adversarial stickers on hat

  • adversarial patches

    Limitations: they require the user to wear fairly obvious and conspicuous accessories, and they need full, unrestricted (white-box) access to the precise model tracking them, so they are easily broken when the tracker updates its model.

(2). editing facial images so that human-like characteristics are preserved but facial recognition model accuracy is significantly reduced.

  • k-means

  • facial inpainting

  • GAN-based face editing

    Limitation: these methods alter the user’s face in the photos.

2.2 Protecting Privacy via Poisoning Attacks

Aims: disrupting training via data poisoning attacks, which modify the initial data used to train the model.

(1) Clean Label Attacks: inject “correctly” labeled poison images into the training data, causing a model trained on this data to misclassify a specific image of interest, i.e. $(x, y) \to (x, y')$.

  • only cause misclassification on a single, preselected image
  • does not transfer well to different models
  • easily detectable

(2) Model Corruption Attacks, modifying images such that they degrade the accuracy of a model trained on them.

2.3 Other Related Work

Transfer learning uses an existing pretrained model as the basis for quickly training a model for a customized task using less training data: $\Phi \to \mathbb{F}_{\theta}$. Typically, a model $\mathbb{F}_{\theta}$ can be created by appending a few additional layers to $\Phi$ and training only those new layers.
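
As a concrete (hypothetical) illustration, here is a minimal PyTorch/torchvision sketch of this pattern, using ResNet-18 as a stand-in for the pretrained extractor $\Phi$ and a 65-class head (roughly PubFig-sized); this is not the paper's actual setup.

```python
# Minimal transfer-learning sketch: freeze a pretrained feature extractor "Phi"
# and train only a small appended classification layer. ResNet-18 and the
# 65-class head are illustrative stand-ins, not the paper's extractors.
import torch
import torch.nn as nn
from torchvision import models

phi = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)  # pretrained "Phi"
for p in phi.parameters():
    p.requires_grad = False                                     # keep the extractor frozen

phi.fc = nn.Linear(phi.fc.in_features, 65)                      # new layer appended on top of Phi
optimizer = torch.optim.Adam(phi.fc.parameters(), lr=1e-3)      # only the new layer is trained
```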

3 Protecting Privacy via Cloaking

Facial recognition models trained on cloaked images will have a distorted view of the user in the “feature space”, i.e. the model’s internal understanding of what makes the user unique.

3.1 Assumptions and Threat Model

Design goals:

  • cloaks should be imperceptible and not impact normal use of the image
  • when classifying normal, uncloaked images, models trained on cloaked images should recognize the underlying person with low accuracy.

3.2 Overview and Intuition

DNN models are trained to identify and extract (often hidden) features in input data and use them to perform classification. Their ability to identify features is easily disrupted by small perturbations of the input data.

By simply modifying their online photos in small and imperceptible ways, users prevent unauthorized trackers and their DNN models from recognizing their true face.

3.3 Computing Cloak Perturbations

The goal is to make the features learned from cloaked photos highly dissimilar from those learned from original (uncloaked) photos.

Notation:

  • $x$: Alice’s uncloaked images
  • $x_T$: target image (an image from another class/user $T$) used to generate the cloak for Alice’s image $x$
  • $\delta(x, x_T)$: cloak computed for Alice’s image $x$ based on image $x_T$ from label $T$
  • $x \oplus \delta(x, x_T)$: cloaked version of Alice’s image $x$
  • $\Phi$: feature extractor used by the facial recognition model
  • $\Phi(x)$: feature vector (or feature representation) extracted from an input $x$

Cloaking to maximize feature deviation:

The ideal cloaking design modifies $x$ by adding a cloak perturbation $\delta(x, x_T)$ that maximizes the change in $x$'s feature representation:
$$\max_{\delta} Dist(\Phi(x), \Phi(x \oplus \delta(x, x_T))), \quad \text{subject to } |\delta(x, x_T)| < \rho \tag{1}$$
where $Dist(\cdot)$ computes the distance between two feature vectors, $|\delta|$ measures the perceptual perturbation caused by cloaking, and $\rho$ is the perceptual perturbation budget.

Image-specific Cloaking:

When creating cloaks for her photos, Alice produces image-specific cloaks, i.e. $\delta(x, x_T)$ is image dependent. Eq. (1) is replaced with the following optimization:
$$\min_{\delta} Dist(\Phi(x_T), \Phi(x \oplus \delta(x, x_T))), \quad \text{subject to } |\delta(x, x_T)| < \rho \tag{2}$$
Here we search for a cloak for $x$ that shifts its feature representation closely towards $x_T$. This new form of optimization also prevents the system from generating extreme values.

Finally, our image-specific cloak optimization will create different cloak patterns among Alice’s images. This “diversity” makes it hard to detect and remove cloaks.

3.4 Cloaking Effectiveness & Transferability

Now Alice can produce cloaked images whose feature representation is dissimilar from her own but similar to that of a target user $T$.

(1). Effectiveness:

  • whether this can translate into the desired misclassification behavior in the tracker model
  • whether cloaking still leads to misclassification regardless of whether $T$ exists in the tracker’s model

Our hypothesis is that as long as the feature representations of Alice’s cloaked and uncloaked images are sufficiently different, the tracker’s model will not classify them as the same class, because there will be another user class in the tracker’s model whose feature representation is more similar to $\Phi(x)$ than $\Phi(x \oplus \delta)$ is. This is a reasonable assumption when the tracker’s model targets many users rather than a few.

(2). Transferability:

The discussion above assumes that the user has the same feature extractor $\Phi$ as the one used to train the tracker’s model.

Transferability: the property that models trained for similar tasks share similar properties and vulnerabilities, even if they were trained with different architectures and on different training data.

The transferability property suggests that cloaking should still be effective, because the user’s and the tracker’s feature extractors are designed for similar tasks.

4 The Fawkes Image Cloaking System

Input: the image set $\boldsymbol{X_U}$, the feature extractor $\Phi$, and the cloak perturbation budget $\rho$.

  1. Choosing a Target Class T.

    Randomly pick $K$ candidate target classes and their images from a publicly available dataset. Use the feature extractor $\Phi$ to calculate each candidate class’s feature space centroid $\mathcal{C}_k$. Fawkes picks as the target class $T$ the class in the $K$-candidate set whose feature representation centroid is most dissimilar from the feature representations of all images in $\boldsymbol{X_U}$ (see the sketch after this list):
    $$T = \mathop{\arg\max}_{k=1,\cdots,K} \min_{x \in \boldsymbol{X_U}} Dist(\Phi(x), \mathcal{C}_k)$$
    where $Dist(\cdot)$ is the L2 distance.

  2. Computing Per-image Cloaks.

    Given the target image set $\boldsymbol{X_T}$: for each $x \in \boldsymbol{X_U}$, Fawkes randomly picks an image $x_T \in \boldsymbol{X_T}$ and computes the cloak following Eq. (2). In our implementation, $|\delta(x, x_T)|$ is calculated using DSSIM (Structural Dis-Similarity Index), a measure of user-perceived image distortion.

    Applying the penalty method to reformulate and solve the optimization in Eq. (2):
    $$\min_{\delta} Dist(\Phi(x_T), \Phi(x \oplus \delta(x, x_T))) + \lambda \cdot \max(|\delta(x, x_T)| - \rho,\ 0)$$
    where $\lambda$ controls the impact of the input perturbation caused by cloaking. As $\lambda \to \infty$, the cloaked image becomes visually identical to the original image.
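
A rough sketch of both steps, under simplifying assumptions: `phi` is any frozen feature extractor that maps batched image tensors to feature vectors, the candidate target images are already loaded as tensors, and a plain pixel-space L2 budget stands in for the DSSIM term; the function names and hyperparameter values are ours, not Fawkes’ implementation.

```python
# Sketch of Fawkes' two steps (target selection + per-image cloaking) under
# simplifying assumptions: `phi` maps batched image tensors to feature vectors,
# and an L2 budget replaces the paper's DSSIM perceptual metric.
import torch

def pick_target_class(phi, user_imgs, candidate_sets):
    """Step 1: pick the candidate class whose feature centroid C_k is farthest
    (by the min over the user's images) from the user's features."""
    with torch.no_grad():
        user_feats = phi(user_imgs)                                  # (N, d)
        best_k, best_score = None, float("-inf")
        for k, imgs in enumerate(candidate_sets):
            centroid = phi(imgs).mean(dim=0, keepdim=True)           # C_k, shape (1, d)
            score = torch.cdist(user_feats, centroid).min().item()   # min_x Dist(Phi(x), C_k)
            if score > best_score:
                best_k, best_score = k, score
    return best_k

def compute_cloak(phi, x, x_t, rho=0.05, lam=100.0, steps=1000, lr=0.5):
    """Step 2: penalty-method optimization of the cloak delta for one image x,
    pulling Phi(x + delta) toward Phi(x_T)."""
    delta = torch.zeros_like(x, requires_grad=True)
    opt = torch.optim.Adam([delta], lr=lr)
    target_feat = phi(x_t).detach()
    for _ in range(steps):
        feat_loss = torch.norm(phi(x + delta) - target_feat)         # Dist(Phi(x_T), Phi(x ⊕ delta))
        budget_penalty = torch.clamp(delta.norm() - rho, min=0.0)    # max(|delta| - rho, 0)
        loss = feat_loss + lam * budget_penalty
        opt.zero_grad()
        loss.backward()
        opt.step()
    return (x + delta).detach()                                      # the cloaked image x ⊕ delta
```

In the paper, the perturbation budget is measured with DSSIM rather than the pixel-space L2 norm used here; Section 5.1 notes that the actual configuration uses the Adam optimizer for 1000 iterations with a learning rate of 0.5.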

5 System Evaluation

When the user’s and the tracker’s feature extractors differ, efficacy could drop, but it can be restored to near perfection by making the user’s feature extractor robust via adversarial training.

5.1 Experiment Setup

(1). Feature Extractors.

Feature extractors are trained on two large ($\ge$ 500K images) datasets using different model architectures.

  • VGGFace2: 3.14M images of 8,631 subjects.
  • WebFace: 500,000 images covering roughly 10,000 subjects.

Architectures:

  • DenseNet-121: 121 layers, 7M parameters.
  • InceptionResNet V2: 572 layers, 54M parameters.

(2). Tracker’s Training Datasets.

  1. training from scratch:

    • VGGFace2.
    • WebFace.
  2. applying transfer learning:

    • PubFig: 5,850 training images and 650 testing images of 65 public figures.
    • FaceScrub: 100,000 images of 530 public figures.


Transfer learning: the tracker adds a softmax layer at the end of the feature extractor and fine-tunes only the added layer using the datasets above.

(3). Cloaking Configuration.

The protected user $U$ is a class in the tracker’s model, e.g. from PubFig. The target $T$ is chosen from VGGFace2 and WebFace. The cloak for each image $x$ is computed for each given $U$ and $T$ using the Adam optimizer for 1000 iterations with a learning rate of 0.5.

(4). Evaluation Metrics.

  • protection success rate: the tracker model’s misclassification rate on clean (uncloaked) images of $U$.
  • normal accuracy: the overall classification accuracy of the tracker’s model on users other than $U$.
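
A toy numeric illustration of the two metrics, with hypothetical predictions and ground-truth labels (user $U$ is label 0):

```python
# Hypothetical predictions vs. ground truth; the protected user U has label 0.
import numpy as np

pred = np.array([3, 7, 7, 1, 2])        # tracker model predictions
true = np.array([0, 0, 7, 1, 2])        # ground-truth labels
is_user = (true == 0)                   # test images belonging to U

protection_success = np.mean(pred[is_user] != true[is_user])     # misclassification rate on U's clean images
normal_accuracy = np.mean(pred[~is_user] == true[~is_user])      # accuracy on everyone else
print(protection_success, normal_accuracy)                       # 1.0 1.0
```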

5.2 User/Tracker Sharing a Feature Extractor

$U$ is drawn from PubFig or FaceScrub. “Cloaks” are computed for a subset of $U$'s images using each of the four feature extractors in Table 1. Transfer learning is performed on the same feature extractor (with the cloaked images of $U$). Finally, we evaluate whether the tracker’s model can correctly identify other clean images of $U$ it has not seen before.


Cloaking offers perfect protection. Even much higher DSSIM values (up to 0.2) remain imperceptible to the human eye. Finally, the average $L_2$ norm of our cloaks is 5.44.


The feature space representations of the cloaked images are well aligned with those of the target images, validating that the cloak achieves its goal of changing the image’s feature space representation in the tracker’s model.


A higher class density in the tracker’s feature space (i.e. a model with more user classes) improves cloaking effectiveness.

5.3 User/Tracker Using Different Feature Extractors

While the model transferability property suggests that there are significant similarities in their respective model feature spaces (since both are trained to recognize faces), their differences could still reduce the efficacy of cloaking.

cloaked images (optimized using VGG2-Dense), original images, target images (from PubFig, Web-Incept)


The reduction in cloak effectiveness is obvious. In the tracker’s feature extractor, the cloak “moves” the original image features only slightly towards the target image features.

Linking model robustness and transferability:

An input perturbation’s ability to transfer between models depends on the “robustness” of the feature extractor used to create it. Perturbations generated on more robust models will take on “universal” characteristics that are able to effectively fool other models.

Cloak transferability can be improved by increasing the robustness of the user’s feature extractor via adversarial training: the model is trained on perturbed data so that it becomes less sensitive to similar small perturbations of its input. Adversarial examples are generated using the PGD attack, and each feature extractor is trained for an additional 10 epochs.
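
A rough PGD adversarial-training sketch, assuming PyTorch and a `model` that wraps the feature extractor with a classification head; the epsilon, step size, and step count are illustrative choices, not the paper’s.

```python
# PGD adversarial training sketch: generate L-infinity-bounded adversarial
# examples on the fly and train the model on them for additional epochs.
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8/255, alpha=2/255, steps=7):
    x_adv = x + torch.empty_like(x).uniform_(-eps, eps)       # random start inside the eps-ball
    for _ in range(steps):
        x_adv = x_adv.detach().requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv + alpha * grad.sign()                   # ascend the loss
        x_adv = x + torch.clamp(x_adv - x, -eps, eps)         # project back into the eps-ball
        x_adv = torch.clamp(x_adv, 0.0, 1.0)                  # keep a valid pixel range
    return x_adv.detach()

def adversarial_epoch(model, loader, optimizer):
    for x, y in loader:
        x_adv = pgd_attack(model, x, y)
        loss = F.cross_entropy(model(x_adv), y)               # train on the perturbed data
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```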


Cloaks generated on robust extractors transfer better than cloaks computed on normal ones.

6 Image Cloaking in the Wild


  • Microsoft Azure Face API:

    using transfer learning to train a model on user-submitted images.

  • Amazon Rekognition Face Verification:

    computing an image similarity score between the queried image and the ground truth images for all labels.

  • Face++ Face Search API:

    extremely robust against a variety of attacks.

7 Trackers with Uncloaked Image Access

7.1 Impact of Uncloaked Images

Training a model with both cloaked and uncloaked user images means the model will observe a much larger spread of features all designated as the user. Effects:

  1. classify both regions of features as the user
  2. classify both regions and the region between them as the user
  3. ignore these feature dimensions and identify the user using some alternative features that connect both uncloaked and cloaked versions of the user’s images.


Methods:

  • intentionally releasing more cloaked images
  • considering the use of a cooperating secondary identity

7.2 Sybil Accounts

The user modifies Sybil images so they occupy the same region of feature space as the user’s uncloaked images. These Sybil images help confuse a model trained on both Sybil images and uncloaked/cloaked images of the user, increasing the protection success rate.


Because the leaked uncloaked images and Sybil images are close by in their feature space representations, but labeled differently, the tracker model must create additional decision boundaries in the feature space.

$x_C$ is an image from the set of candidates the user obtains (e.g. images generated by a GAN). We create a cloak $\delta(x_C, x)$ that minimizes the feature space separation between $x_C$ and the user’s original image $x$. The Sybil image is $x_s = x_C \oplus \delta(x_C, x)$.
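
A minimal sketch of Sybil-image generation: it reuses the same penalty-method loop as cloaking, except the feature target is the user’s own uncloaked image $x$; again, `phi`, the pixel-space L2 budget, and the hyperparameters are our assumptions.

```python
# Sybil image sketch: push a candidate image x_C toward the user's uncloaked
# image x in feature space, so that x_s = x_C ⊕ delta(x_C, x) overlaps with x.
import torch

def make_sybil(phi, x_candidate, x_user, rho=0.05, lam=100.0, steps=1000, lr=0.5):
    delta = torch.zeros_like(x_candidate, requires_grad=True)
    opt = torch.optim.Adam([delta], lr=lr)
    user_feat = phi(x_user).detach()                               # Phi(x), the feature target
    for _ in range(steps):
        feat_loss = torch.norm(phi(x_candidate + delta) - user_feat)
        budget_penalty = torch.clamp(delta.norm() - rho, min=0.0)
        loss = feat_loss + lam * budget_penalty
        opt.zero_grad()
        loss.backward()
        opt.step()
    return (x_candidate + delta).detach()                          # the Sybil image x_s
```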

7.3 Efficacy of Sybil Images


The use of Sybil accounts significantly improves the protection success rate when the tracker has access to only a small number of original images.

8 Countermeasures

8.1 Cloak Disruption

  1. Image Transformation.

    These aim to mitigate the impact of small image perturbations: images in the training dataset are transformed before being used for model training.

    • image augmentation
    • blurring
    • adding noise

    Image transformations have little impact on cloak effectiveness.

  2. Robust Model.

    Improving the robustness of the tracker’s model decreases the cloak’s protection success rate, but increasing the visibility of the cloak perturbation (DSSIM perturbation $> 0.01$) restores a higher protection success rate.

8.2 Cloak Detection

Existing poison attack detection assumes that poisoning only affects a small percentage of training images. Fawkes poisons an entire model class, rendering outlier detection useless by removing the correct baseline.

Obtaining both target and cloaked images: empirically, the L2 feature space distance between the cloaked class centroid and the target class centroid is 3 standard deviations smaller than the mean separation between other classes, so the user’s cloaked images can be detected. The user can trivially overcome this detection by maintaining separation between cloaked and target images during cloak optimization.

Obtaining original training images: run 2-means clustering on each class’s feature space and flag classes with two distinct centroids as potentially cloaked. The distance between the two centroids of a protected user’s class is 3 standard deviations larger than the average centroid separation in normal classes, so a tracker with original images can detect the presence of cloaked images. The user can counter this by choosing a target class that does not create such a large feature space separation.
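
A sketch of this tracker-side check, assuming per-class feature matrices are available as NumPy arrays and using scikit-learn’s KMeans; the threshold follows the 3-standard-deviation heuristic described above.

```python
# 2-means detection sketch: classes whose features split into two unusually
# far-apart clusters are flagged as potentially containing cloaked images.
import numpy as np
from sklearn.cluster import KMeans

def centroid_gap(feats):
    """Distance between the two 2-means centroids of one class's features."""
    km = KMeans(n_clusters=2, n_init=10).fit(feats)
    c0, c1 = km.cluster_centers_
    return np.linalg.norm(c0 - c1)

def flag_cloaked_classes(class_feats):
    """class_feats: dict mapping class label -> (N_i, d) feature matrix."""
    gaps = {label: centroid_gap(f) for label, f in class_feats.items()}
    vals = np.array(list(gaps.values()))
    threshold = vals.mean() + 3 * vals.std()    # "3 standard deviations larger than average"
    return [label for label, gap in gaps.items() if gap > threshold]
```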

[1] Fawkes: Protecting Privacy against Unauthorized Deep Learning Models
[2] Poisoning Attacks on Machine Learning
