Zero-Shot Visual Recognition using Semantics-Preserving Adversarial Embedding Networks

[Figure: attribute variance heat maps of the 312 attributes in CUB birds and the 102 attributes in SUN scenes, and t-SNE [35] visualizations of the test images represented by all attributes (left) and only the high-variance ones (right).]
[Figure: the SP-AEN method.]

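Abstract

  • We propose a novel architecture, SP-AEN, for zero-shot visual recognition (ZSL).
  • Throughout training, the test images and their classes remain unseen.
  • The method addresses an inherent problem, semantic loss in the prevailing family of embedding-based ZSL: some semantics are discarded during training if they are non-discriminative for the training classes, even though they can be critical for recognizing the test classes.
  • SP-AEN prevents semantic loss by introducing an independent visual-to-semantic space embedder, which disentangles the semantic space into two subspaces for two conflicting objectives: classification and reconstruction.
  • Through adversarial learning of the two subspaces, SP-AEN transfers semantics from the reconstructive subspace to the discriminative one, improving zero-shot recognition of unseen classes.
    Compared with previous work, SP-AEN not only improves classification but also generates photo-realistic images, demonstrating the effectiveness of semantic preservation. It is evaluated on four popular benchmarks: CUB, AWA, SUN and aPY.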

Introduction


Class embeddings: an extreme case

  • All class embeddings are one-hot label vectors:
    • ZSL then degenerates to conventional supervised classification, so no semantics can be transferred.
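Solution
  • Preserve semantics by reconstruction:
    • If the semantic embedding of an image can map the image back (i.e., reconstruct it), any two semantic embeddings are expected to preserve rich semantic information and remain separable; otherwise reconstruction would fail. However, reconstruction and classification are two conflicting objectives:
    • reconstruction: preserve as many image details as possible;
    • classification: suppress irrelevant content.
      $E: V \rightarrow S$ (visual-to-semantic embedding)
      $G: S \rightarrow V$ (semantic-to-visual reconstruction)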

To resolve this conflict, we propose a novel visual-semantic embedding framework:

  • Semantics-Preserving Adversarial Embedding Network (SP-AEN).

  • Introduce a new mapping $F: V \rightarrow S$.

  • An adversarial objective between a discriminator $D$ and the encoder $F$,
    which tries to make $F(x)$ and $E(x)$ indistinguishable.

  • 引入 F F F D D D的两个益出是帮助 E E E保存语义信息。

  • Semantic Transfer: even if semantic loss is unavoidable for $E$,

    • we can avoid it using $F$ by borrowing ingredients from $E(x)$ of other classes;
    • the discriminator $D$ ultimately transfers semantic information from $F(x)$ to $E(x)$ by tailoring the two semantic embedding spaces to the same distribution.
  • Disentangled Classification and Reconstruction.

Related Work

Zero-Shot Learning

  • The mainstream of ZSL is attribute-based visual recognition:
    • attributes serve as an intermediate feature space.
  • To scale up ZSL, embedding-based methods
    • learn a mapping from the image visual space to a semantic space represented by semantic vectors.
  • SP-AEN is an embedding-based ZSL method that

    • uses a ranking-based classification loss;
    • reconstructs images from the semantic embeddings;
    • exposes no image of the test classes during training, as required in ZSL.

Domain Shift and Hubness

  • the semantic loss
  • Domain shift:
    • the training and test data follow different distributions.
  • Hubness [37] states
  • Another way of countering semantic loss is to learn independent attribute classifiers.

Generative Adversarial Network (GAN)

  • The goal is to train a generator that can fool a discriminator, confusing the distributions of the generated and true samples
  • this max-min training procedure
  • data augmentation of unseen classes
  • feature level

Image Generation

  • pixel-level loss
  • feature-level reconstruction loss
  • perceptual similarity
  • adversarial loss
  • image-to-image transformation
  • a bottleneck layer

Formulation

Preliminaries

  • Given a training set $\{(x_i, l_i)\}$:
  • $x_i \in V$: an image represented in the visual space $V$;
  • $l_i \in L_s$: a class label in the seen class set $L_s$;
  • the unseen class set is $L_u$.
  • The embedding-based framework learns
  • a visual-to-semantic mapping $E: V \rightarrow S$.
  • Simple nearest neighbor search:
    • any class label $l$ is embedded as $y_l \in \mathbb{R}^d$ in the semantic space $S$;
    • the predicted label $l^*$ is obtained by
      • a simple nearest neighbor search:
        • $l^* = \arg\max_{l \in L} y_l^T E(x)$   (1)
        • $L = L_u$: the conventional ZSL setting;
        • $L = L_s \cup L_u$: the generalized ZSL setting.
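A minimal NumPy sketch of the nearest neighbor search in Eq. (1), assuming the embeddings $E(x)$ are already computed; `predict_labels` and the toy class vectors are hypothetical. The only difference between the two settings is the candidate label set passed in.

```python
import numpy as np

def predict_labels(embeddings, class_embeddings, candidate_labels):
    """Eq. (1): l* = argmax_{l in L} y_l^T E(x).

    embeddings:       (n, d) array whose rows are E(x), assumed precomputed
    class_embeddings: dict mapping each label l to its (d,) vector y_l
    candidate_labels: the label set L (L_u for conventional ZSL,
                      L_s ∪ L_u for generalized ZSL)
    """
    labels = list(candidate_labels)
    Y = np.stack([class_embeddings[l] for l in labels])  # (|L|, d)
    scores = embeddings @ Y.T                            # (n, |L|) dot products
    return [labels[i] for i in scores.argmax(axis=1)]

# Toy example: "cat" and "dog" are seen, "zebra" is unseen (hypothetical vectors).
y = {"cat": np.array([1.0, 0.0]), "dog": np.array([0.0, 1.0]),
     "zebra": np.array([0.7, 0.7])}
E_x = np.array([[1.0, 0.1]])
print(predict_labels(E_x, y, ["zebra"]))                # ['zebra']  (conventional ZSL)
print(predict_labels(E_x, y, ["cat", "dog", "zebra"]))  # ['cat']    (generalized ZSL)
```

The same embedding is assigned to the unseen class in the conventional setting but to a seen class in the generalized one, which is why the generalized setting is the harder test.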

Classification Objective

  • Since label prediction in Eq. (1) is essentially a ranking problem,
  • a large-margin ranking loss function is used as the classification objective:
    • a correctly labeled pair $(x, l)$ should have a higher dot-product similarity
      between $y_l$ and $E(x)$,
    • and a lower one for any wrongly labeled pair
      $(x, l')$;
    • the similarity margin between the correct pair
      and the wrong one
      should be larger than a threshold.


  • $\gamma > 0$: a hyperparameter for the margin.
  • At each iteration in stochastic training, the unpaired labels $l'$ (those differing from the ground-truth label) enter the loss.
  • Two additional objectives, introduced next, regularize the semantic embedding $E(x)$.
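A minimal NumPy sketch of a large-margin (hinge) ranking loss matching the description above. The notes do not preserve the exact equation, so summing the hinge over every unpaired seen label $l'$ is an assumption (sampling a subset of unpaired labels per iteration would be an alternative), and the default margin value is only a placeholder.

```python
import numpy as np

def ranking_loss(E_x, Y, labels, gamma=0.1):
    """Hinge ranking loss: every unpaired label l' should score at least
    gamma below the true label l, i.e. y_l^T E(x) - y_{l'}^T E(x) >= gamma.

    E_x:    (n, d) semantic embeddings E(x)
    Y:      (C, d) class embeddings y_l, one row per seen class
    labels: (n,) ground-truth class indices into Y
    gamma:  margin hyperparameter (> 0); 0.1 is only a placeholder
    """
    labels = np.asarray(labels)
    n = E_x.shape[0]
    scores = E_x @ Y.T                                   # (n, C) dot-product similarities
    correct = scores[np.arange(n), labels][:, None]      # (n, 1) score of the true label
    margins = np.maximum(0.0, gamma + scores - correct)  # hinge term for every label
    margins[np.arange(n), labels] = 0.0                  # drop the paired label itself
    return float(margins.sum(axis=1).mean())

E_x = np.array([[1.0, 0.0]])
Y = np.array([[1.0, 0.0], [0.0, 1.0]])
print(ranking_loss(E_x, Y, [0], gamma=0.1))  # 0.0: margin already satisfied
print(ranking_loss(E_x, Y, [0], gamma=1.5))  # 0.5: margin violated by 0.5
```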
[Figure: network architecture. $c$ denotes the kernel size, $fc$ the fully-connected layer dimension, and $s$ the stride of each convolutional layer; the same color indicates the same layer type.]

Reconstruction Objective

  • Learn a semantic-to-visual mapping
    $G: S \rightarrow V$

  • that reconstructs a semantic embedding
    $s \in S$

  • back to the image, such that
    $\|G(s) - x\|$ is minimized.

  • Recall that reconstruction in the autoencoder fashion uses
    $s = E(x)$.

  • Instead, introduce an independent visual-to-semantic mapping
    $F$ and use the reconstructive embedding $s = F(x)$.

  • The visual space $V$ is a feature space given by the output of a higher layer in a deep CNN,

  • but the raw 256 × 256 × 3 RGB color space is used for image reconstruction.

  • $F$ and $G$ are learned by minimizing a reconstruction objective on $F(x)$.
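A minimal sketch of a reconstruction loss between an image $x$ and its reconstruction $G(F(x))$ in raw RGB space. The exact form of the objective is not preserved in these notes; a pixel-level squared error plus an optional feature-level term mirrors the pixel-level and feature-level losses listed under Image Generation, and `feat` and `lam_feat` are hypothetical.

```python
import numpy as np

def reconstruction_loss(x, x_rec, feat=None, lam_feat=0.0):
    """Squared error between images x and reconstructions x_rec = G(F(x)),
    averaged over the batch, plus an optional feature-level term.

    x, x_rec: (n, H, W, 3) RGB arrays (256 x 256 x 3 in SP-AEN)
    feat:     hypothetical fixed feature extractor phi; None disables the term
    lam_feat: hypothetical weight of ||phi(x_rec) - phi(x)||^2
    """
    n = x.shape[0]
    loss = np.sum((x_rec - x) ** 2) / n              # pixel-level term
    if feat is not None and lam_feat > 0.0:
        loss += lam_feat * np.sum((feat(x_rec) - feat(x)) ** 2) / n  # feature-level term
    return float(loss)

x = np.zeros((1, 2, 2, 3))
x_rec = np.ones((1, 2, 2, 3))
print(reconstruction_loss(x, x_rec))  # 12.0 = 2 * 2 * 3 values, each off by 1
```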

Adversarial objective

  • The adversarial objective is imposed on the disentangled semantic embeddings
    $E(x)$ and $F(x)$.
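A minimal sketch of a standard GAN-style adversarial objective on the two embeddings, written on discriminator logits. Treating $E(x)$ as the "real" distribution and $F(x)$ as the one being matched to it follows the adversarial-autoencoder view in the Full Objective section; the exact formulation used in SP-AEN is not preserved here, and the non-saturating encoder loss is a common choice rather than a confirmed one.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def discriminator_loss(logits_real, logits_fake, eps=1e-12):
    """-[log D(E(x)) + log(1 - D(F(x)))], averaged over the batch.
    logits_real: D's logits on E(x); logits_fake: D's logits on F(x)."""
    p_real = sigmoid(np.asarray(logits_real, dtype=float))
    p_fake = sigmoid(np.asarray(logits_fake, dtype=float))
    return float(-np.mean(np.log(p_real + eps)) - np.mean(np.log(1.0 - p_fake + eps)))

def encoder_adversarial_loss(logits_fake, eps=1e-12):
    """Non-saturating loss for the encoder producing F(x): small when D
    mistakes F(x) for a sample from the E(x) distribution."""
    p_fake = sigmoid(np.asarray(logits_fake, dtype=float))
    return float(-np.mean(np.log(p_fake + eps)))

# An undecided discriminator (all logits 0) gives 2*ln(2) and ln(2).
print(discriminator_loss(np.zeros(4), np.zeros(4)))
print(encoder_adversarial_loss(np.zeros(4)))
```

D minimizes `discriminator_loss` while the encoder minimizes `encoder_adversarial_loss`; alternating the two is the max-min procedure mentioned under GAN.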

Full Objective

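The full objective combines the classification, reconstruction and adversarial objectives; the final goal is to solve the resulting min-max problem over $E$, $F$, $G$ and $D$.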

  • Considering $F$ as the encoder
  • and $G$ as the decoder,
  • the semantic embedding $F(x)$ can be regarded as the bottleneck layer,
  • regularized to match a supervised distribution $E(x)$.
  • Hence SP-AEN is a supervised Adversarial Autoencoder;
  • an unsupervised Adversarial Autoencoder would instead use an adversarial objective for $F(x)$ to match a prior embedding space.

Implementation

Architecture

  • An end-to-end network that takes raw images and ground-truth class embeddings as input.

  • The embedder $E$: ResNet-101.
  • $F$ is based on AlexNet appended with two
    more fully-connected blocks,
  • producing a $d$-dimensional embedding vector.
  • The subsequent reconstruction network $G$:
    • five up-convolutional blocks with leaky ReLU [20]
      that transform a vector into a 3-D feature map.
  • $D$ is a two-layer fully-connected network with a non-linear ReLU layer that takes the $d$-dimensional embedding vector as input.
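A NumPy sketch of the forward pass of $D$ as described: fully-connected layers with a ReLU, mapping a $d$-dimensional embedding to one real/fake logit. Placing the ReLU between the two layers, the hidden width of 256, and the class name are assumptions; the weights use MSRA (He) initialization, in line with the training details.

```python
import numpy as np

class EmbeddingDiscriminator:
    """Hypothetical D: fc -> ReLU -> fc, one logit per d-dimensional embedding."""

    def __init__(self, d, hidden=256, seed=0):
        rng = np.random.default_rng(seed)
        # MSRA/He initialization: zero-mean Gaussian with variance 2 / fan_in.
        self.W1 = rng.normal(0.0, np.sqrt(2.0 / d), size=(d, hidden))
        self.b1 = np.zeros(hidden)
        self.W2 = rng.normal(0.0, np.sqrt(2.0 / hidden), size=(hidden, 1))
        self.b2 = np.zeros(1)

    def __call__(self, s):
        h = np.maximum(0.0, s @ self.W1 + self.b1)  # first fc layer + ReLU
        return (h @ self.W2 + self.b2).ravel()      # second fc layer -> (n,) logits

D = EmbeddingDiscriminator(d=8)
print(D(np.ones((5, 8))).shape)  # (5,)
```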

Training Details

  • per-pixel mean subtraction
  • MSRA random initializer
  • grid search
  • the pretrained generator
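A minimal sketch of per-pixel mean subtraction: one mean value per (row, column, channel) position, as opposed to a single mean per color channel. Computing the mean over the training images only is an assumption, and the function names are hypothetical.

```python
import numpy as np

def per_pixel_mean(train_images):
    """Mean image over the training set; train_images has shape (n, H, W, 3)."""
    return train_images.mean(axis=0)  # shape (H, W, 3)

def subtract_mean(images, mean_image):
    """Broadcast the (H, W, 3) mean image over a batch of (n, H, W, 3) images."""
    return images - mean_image

train = np.stack([np.zeros((2, 2, 3)), np.full((2, 2, 3), 2.0)])
mean_img = per_pixel_mean(train)                # every entry is 1.0
print(subtract_mean(train, mean_img)[0, 0, 0])  # [-1. -1. -1.]
```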

Datasets

  • CUB
  • SUN
  • AWA
  • aPY

Settings and Evaluation Metrics

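  • $U \rightarrow U$: test images from unseen classes, classified among the unseen classes only (conventional ZSL);
  • $S \rightarrow T$: test images from seen classes, classified among all classes $T = S \cup U$ (generalized ZSL);
  • $U \rightarrow T$: test images from unseen classes, classified among all classes $T$ (generalized ZSL).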
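A minimal sketch of the generalized-ZSL metrics: per-class accuracy for each setting above, and the harmonic mean $H$ of the $S \rightarrow T$ and $U \rightarrow T$ accuracies (the "harmonic mean values" listed under Techniques). Using average per-class top-1 accuracy as the base metric is an assumption here.

```python
import numpy as np

def per_class_accuracy(y_true, y_pred):
    """Average over classes of the top-1 accuracy within each class."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    classes = np.unique(y_true)
    return float(np.mean([np.mean(y_pred[y_true == c] == c) for c in classes]))

def harmonic_mean(acc_s_to_t, acc_u_to_t):
    """H = 2 * A_{S->T} * A_{U->T} / (A_{S->T} + A_{U->T})."""
    total = acc_s_to_t + acc_u_to_t
    return 0.0 if total == 0 else 2.0 * acc_s_to_t * acc_u_to_t / total

print(per_class_accuracy([0, 0, 1], [0, 1, 1]))  # 0.75
print(harmonic_mean(0.8, 0.2))                   # about 0.32
```

The harmonic mean penalizes a model that does well on seen classes but poorly on unseen ones: with accuracies 0.8 and 0.2 the arithmetic mean is 0.5, but $H$ is only 0.32.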

Comparisons with the State of the Art

Comparing Methods

  • embedding based
    • DeViSE, ALE, SJE, ESZSL, LATEM
    • CMT and SAE, each in two variants
  • attribute based:
    • DAP
    • IAP
    • SSE
    • CONSE
    • SYNC

Ablation Studies

Conflict between Classification & Reconstruction

  • DirectMap
  • SAE
  • SplitBranch

Effectiveness of D and G


  • the Seen-Unseen accuracy Curve (SUC)
  • The Area Under Seen-Unseen Accuracy Curve (AUSUC)
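A minimal sketch of how an SUC and its AUSUC can be computed, assuming the calibrated-stacking protocol of Chao et al.: subtract a calibration factor $\gamma$ from every seen-class score, sweep $\gamma$, and record the $(A_{U \rightarrow T}, A_{S \rightarrow T})$ pairs. Whether SP-AEN's evaluation follows exactly this protocol, and the use of per-sample rather than per-class accuracy, are assumptions; the function names are hypothetical.

```python
import numpy as np

def seen_unseen_curve(scores, y_true, seen_mask, gammas):
    """Calibrated stacking: for each gamma, subtract gamma from seen-class scores,
    predict over all classes T, and record (A_{U->T}, A_{S->T}).

    scores:    (n, C) compatibility scores over all classes
    y_true:    (n,) ground-truth class indices
    seen_mask: (C,) boolean array, True for seen classes
    gammas:    calibration factors; a wide range traces the whole curve
    """
    y_true = np.asarray(y_true)
    from_seen = seen_mask[y_true]  # which test samples belong to seen classes
    points = []
    for g in gammas:
        pred = (scores - g * seen_mask).argmax(axis=1)
        correct = pred == y_true
        points.append((correct[~from_seen].mean(), correct[from_seen].mean()))
    return points

def ausuc(points):
    """Trapezoidal area under the curve, x = A_{U->T}, y = A_{S->T}."""
    pts = sorted(points, key=lambda p: (p[0], -p[1]))  # x ascending, y descending on ties
    xs = np.array([p[0] for p in pts])
    ys = np.array([p[1] for p in pts])
    return float(np.sum((xs[1:] - xs[:-1]) * (ys[1:] + ys[:-1]) / 2.0))

# Toy check: class 0 is seen, class 1 is unseen; one test image per class.
scores = np.array([[1.0, 0.0], [1.0, 0.5]])
pts = seen_unseen_curve(scores, [0, 1], np.array([True, False]), [-10.0, 0.75, 10.0])
print(ausuc(pts))  # 1.0: some gamma classifies both images correctly
```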

Summary

  • When time permits, study this paper's network architecture in depth and work through all of it.

Techniques

  • harmonic mean values
  • a simple nearest neighbor search
  • Semantic Transfer
  • a flexible plug-and-play
  • end-to-end fine-tune fashion
  • this max-min training procedure

Keywords

  • SP-AEN method
  • photo-realistic reconstruction
  • the high-variance
  • the low-variance
  • non-discriminative
  • adversarial learning
  • the semantic discrepancy
  • a lossy semantic space
  • the class embedding: enriches semantic information
  • a flexible plug-and-play
  • end-to-end fine-tune fashion
  • trade-off parameters
  • ZSL
  • few-shot learning
  • domain adaptation
  • data augmentation
  • mode collapse problem