Learning from Synthetic Data: Addressing Domain Shift for Semantic Segmentation

1 Authors

Swami Sankaranarayanan 1*, Yogesh Balaji 1*, Arpit Jain 2, Ser Nam Lim 2,3, Rama Chellappa1
1 UMIACS, University of Maryland, College Park, MD
2 GE Global Research, Niskayuna, NY
3 Avitas Systems, GE Venture, Boston, MA
∗First two authors contributed equally

2 Abstract

Contrary to previous approaches that use a simple adversarial objective or superpixel information to aid the process, we propose an approach based on Generative Adversarial Networks (GANs) that brings the embeddings closer in the learned feature space.

3 Introduction

The focus of this paper is in developing domain adaptation algorithms for semantic segmentation. Specifically, we focus on the hard case of the problem where no labels from the target domain are available. This class of techniques is commonly referred to as Unsupervised Domain Adaptation.

Traditional approaches for domain adaptation involve minimizing some measure of distance between the source and the target distributions. Two commonly used measures are Maximum Mean Discrepancy (MMD) and distance metrics learned with DCNNs, as in adversarial approaches.
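To make the first of these concrete, here is a minimal sketch of an RBF-kernel MMD estimate between two feature batches; the kernel bandwidth, feature dimension, and batch sizes are illustrative, not tied to any particular method in the paper.

```python
import torch

def mmd_rbf(x, y, sigma=1.0):
    """Simple (biased) MMD^2 estimate between two feature batches with an RBF kernel."""
    def k(a, b):
        return torch.exp(-torch.cdist(a, b).pow(2) / (2 * sigma ** 2))
    return k(x, x).mean() + k(y, y).mean() - 2 * k(x, y).mean()

src = torch.randn(32, 128)        # source features
tgt = torch.randn(32, 128) + 0.5  # target features under a mean shift
print(mmd_rbf(src, tgt))          # larger value -> larger estimated domain gap
```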

The main contribution of this work is that we propose a technique that employs generative models to align the source and target distributions in the feature space.

4 Method

  1. We provide an input-output description of different network blocks in our pipeline.
  2. We describe separately the treatment of source and target data, followed by a description of the different loss functions and the corresponding update steps.
  3. We motivate the design choices involved in the discriminator (D) architecture.

4.1 Description of network blocks

(a) The base network, whose architecture is similar to a pre-trained model such as VGG-16, is split into two parts: the embedding denoted by F and the pixel-wise classifier denoted by C. The output of C is a label map up-sampled to the same size as the input of F .
(b) The generator network (G) takes as input the learned embedding and reconstructs the RGB image.
(c) The discriminator network (D) performs two different tasks given an input: (i) it classifies the input as real or fake in a domain-consistent manner; (ii) it performs a pixel-wise labeling task similar to the C network. Note that (ii) is active only for source data, since target data has no labels during training.
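Below is a minimal PyTorch sketch of these four blocks. The layer configuration, channel widths, and the 19-class label set are illustrative placeholders, not the paper's exact VGG-16-based architecture; `F_Embedding`, `C_Classifier`, `G_Generator`, and `D_Discriminator` are hypothetical names reused throughout the sketches in this post.

```python
import torch
import torch.nn as nn

NUM_CLASSES = 19  # placeholder label-set size (e.g. Cityscapes-style), not the paper's exact setting

class F_Embedding(nn.Module):
    """Base-network front end F: maps an RGB image to a feature embedding."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.features(x)

class C_Classifier(nn.Module):
    """Pixel-wise classifier C: label map up-sampled to the input size of F."""
    def __init__(self, num_classes=NUM_CLASSES):
        super().__init__()
        self.score = nn.Conv2d(128, num_classes, 1)

    def forward(self, feat, out_size):
        return nn.functional.interpolate(self.score(feat), size=out_size,
                                         mode='bilinear', align_corners=False)

class G_Generator(nn.Module):
    """Generator G: reconstructs the RGB image from the embedding."""
    def __init__(self):
        super().__init__()
        self.decode = nn.Sequential(
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(64, 3, 4, stride=2, padding=1), nn.Tanh(),
        )

    def forward(self, feat):
        return self.decode(feat)

class D_Discriminator(nn.Module):
    """Discriminator D with two heads: a patch-level domain/real-fake head and
    an auxiliary pixel-wise labeling head (supervised on source data only)."""
    def __init__(self, num_classes=NUM_CLASSES):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Conv2d(3, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.LeakyReLU(0.2, inplace=True),
        )
        self.domain_head = nn.Conv2d(128, 4, 1)  # src-real / src-fake / tgt-real / tgt-fake
        self.aux_head = nn.Conv2d(128, num_classes, 1)

    def forward(self, img):
        h = self.trunk(img)
        return self.domain_head(h), self.aux_head(h)
```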

4.2 Treatment of source and target data

As shown in Figure 3, D performs two tasks: (1) distinguishing real source inputs from generated source images (source-real vs. source-fake), and (2) producing a pixel-wise label map of the generated source image.

Given a target input $X^t$, the generator network $G$ takes the target embedding from $F$ as input and reconstructs the target image. Similar to the previous case, $D$ is trained to distinguish between real target data (target-real) and the generated target images from $G$ (target-fake). However, different from the previous case, $D$ performs only a single task, i.e. it classifies the target input as target-real/target-fake. Since the target data does not have any labels during training, the classifier network $C$ is not active when the system is presented with target inputs.
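A hypothetical forward pass, continuing the sketch from Sec. 4.1 (all shapes and batch sizes are dummies), makes the source/target asymmetry concrete: the classifier C and D's auxiliary head are driven only by the source branch.

```python
import torch

# Modules from the Sec. 4.1 sketch above.
F_net, C_net, G_net, D_net = F_Embedding(), C_Classifier(), G_Generator(), D_Discriminator()
x_s = torch.randn(2, 3, 64, 64)  # source images (synthetic, labeled)
x_t = torch.randn(2, 3, 64, 64)  # target images (real, unlabeled)

# Source branch: embedding -> segmentation (C) and reconstruction (G).
f_s = F_net(x_s)
pred_s = C_net(f_s, out_size=x_s.shape[2:])  # supervised by source labels
fake_s = G_net(f_s)                          # D judges this as source-real vs. source-fake
dom_s, aux_s = D_net(fake_s)                 # aux_s: pixel-wise labels, source only

# Target branch: no labels, so C (and D's auxiliary head) stay inactive.
f_t = F_net(x_t)
fake_t = G_net(f_t)                          # D judges this as target-real vs. target-fake
dom_t, _ = D_net(fake_t)
```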

4.3 Iterative optimization

The directions of information flow across the different network blocks are shown in Figure 2.

The network blocks are updated iteratively in the following order; a training-step sketch in code follows the list:

  1. D-update
    Within-domain adversarial loss for source inputs: $\mathcal{L}_{adv,D}^{s}$
    Auxiliary classification loss: $\mathcal{L}_{aux}^{s}$
    Within-domain adversarial loss for target inputs: $\mathcal{L}_{adv,D}^{t}$
    Overall: $\mathcal{L}_{D}=\mathcal{L}_{adv,D}^{s}+\mathcal{L}_{adv,D}^{t}+\mathcal{L}_{aux}^{s}$
  2. G-update
    Adversarial losses: $\mathcal{L}_{adv,G}^{s}+\mathcal{L}_{adv,G}^{t}$
    Reconstruction losses: $\mathcal{L}_{rec}^{s}+\mathcal{L}_{rec}^{t}$
    Overall: $\mathcal{L}_{G}=\mathcal{L}_{adv,G}^{s}+\mathcal{L}_{adv,G}^{t}+\mathcal{L}_{rec}^{s}+\mathcal{L}_{rec}^{t}$
  3. F-update
    The parameters of F are updated using a combination of several loss terms:
    $\mathcal{L}_{F}=\mathcal{L}_{seg}+\alpha\,\mathcal{L}_{aux}^{s}+\beta\left(\mathcal{L}_{adv,F}^{s}+\mathcal{L}_{adv,F}^{t}\right)$
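Putting the three updates together, below is a minimal single-iteration training sketch in PyTorch, reusing the modules from the Sec. 4.1 sketch. The optimizer settings, loss weights $\alpha, \beta$, and especially the label-flipping scheme in the F-update (pushing reconstructions from both domains toward source-real) are assumptions of this sketch, not the paper's exact recipe.

```python
import torch
import torch.nn.functional as torchF  # aliased to avoid clashing with the F network

# Modules and dummy batches as in the earlier sketches, plus dummy source labels.
F_net, C_net, G_net, D_net = F_Embedding(), C_Classifier(), G_Generator(), D_Discriminator()
x_s, x_t = torch.randn(2, 3, 64, 64), torch.randn(2, 3, 64, 64)
y_s = torch.randint(0, NUM_CLASSES, (2, 64, 64))                  # source label map
y_s_small = torchF.interpolate(y_s[:, None].float(), size=(16, 16),
                               mode='nearest').squeeze(1).long()  # matched to D's output stride

SRC_REAL, SRC_FAKE, TGT_REAL, TGT_FAKE = 0, 1, 2, 3
alpha, beta = 0.1, 0.001  # placeholder weights, not the paper's tuned values

opt_D = torch.optim.Adam(D_net.parameters(), lr=1e-4)
opt_G = torch.optim.Adam(G_net.parameters(), lr=1e-4)
opt_F = torch.optim.Adam(list(F_net.parameters()) + list(C_net.parameters()), lr=1e-4)

def dom_ce(dom_logits, cls):
    """Patch-wise cross-entropy pushing every spatial position toward one domain class."""
    n, _, h, w = dom_logits.shape
    return torchF.cross_entropy(dom_logits, torch.full((n, h, w), cls, dtype=torch.long))

def zero_all():
    for opt in (opt_D, opt_G, opt_F):
        opt.zero_grad()

# ---- 1. D-update: within-domain real/fake losses + auxiliary labeling on source ----
zero_all()
fake_s, fake_t = G_net(F_net(x_s)), G_net(F_net(x_t))
dom_rs, _ = D_net(x_s)
dom_fs, aux_fs = D_net(fake_s.detach())
dom_rt, _ = D_net(x_t)
dom_ft, _ = D_net(fake_t.detach())
loss_D = (dom_ce(dom_rs, SRC_REAL) + dom_ce(dom_fs, SRC_FAKE)            # L_adv,D^s
          + dom_ce(dom_rt, TGT_REAL) + dom_ce(dom_ft, TGT_FAKE)          # L_adv,D^t
          + torchF.cross_entropy(aux_fs, y_s_small))                     # L_aux^s
loss_D.backward()
opt_D.step()

# ---- 2. G-update: fool D within each domain + reconstruction losses ----
zero_all()
fake_s, fake_t = G_net(F_net(x_s).detach()), G_net(F_net(x_t).detach())
dom_fs, _ = D_net(fake_s)
dom_ft, _ = D_net(fake_t)
loss_G = (dom_ce(dom_fs, SRC_REAL) + dom_ce(dom_ft, TGT_REAL)            # L_adv,G^s + L_adv,G^t
          + torchF.l1_loss(fake_s, x_s) + torchF.l1_loss(fake_t, x_t))   # L_rec^s + L_rec^t
loss_G.backward()
opt_G.step()

# ---- 3. F-update: segmentation + auxiliary + adversarial alignment terms ----
zero_all()
f_s, f_t = F_net(x_s), F_net(x_t)
pred_s = C_net(f_s, out_size=x_s.shape[2:])
fake_s, fake_t = G_net(f_s), G_net(f_t)
dom_fs, aux_fs = D_net(fake_s)
dom_ft, _ = D_net(fake_t)
# One plausible alignment choice (an assumption of this sketch): push D to label
# reconstructions from BOTH domains as source-real, pulling target features
# toward the source distribution.
loss_F = (torchF.cross_entropy(pred_s, y_s)                              # L_seg
          + alpha * torchF.cross_entropy(aux_fs, y_s_small)              # L_aux^s
          + beta * (dom_ce(dom_fs, SRC_REAL) + dom_ce(dom_ft, SRC_REAL)))  # L_adv,F terms
loss_F.backward()
opt_F.step()
```

Each update step recomputes its forward passes, so gradients from one step do not leak into the next; only the optimizer for the block being updated takes a step.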

4.4 Motivating the design choice of D

  1. Recent works on image generation have utilized the idea of a patch discriminator, in which the output is a two-dimensional feature map where each pixel carries a real/fake probability.
    In our case, the output map indicates real/fake probabilities across the source and target domains, resulting in four classes per pixel: src-real, src-fake, tgt-real, tgt-fake (see the shape check after this list).

  2. The Auxiliary Classifier GAN (ACGAN) showed that adding an auxiliary classification loss to D leads to more stable GAN training and even enables generating large-scale images.
    We extend this idea to the segmentation problem by employing an auxiliary pixel-wise labeling loss in the D network.
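As a quick sanity check on this design, the snippet below (reusing the hypothetical `D_Discriminator` from the Sec. 4.1 sketch) prints the shapes of D's two patch-level outputs.

```python
import torch

# The domain head yields four logits per spatial position (src-real / src-fake /
# tgt-real / tgt-fake); the auxiliary head yields per-position class logits,
# which are supervised on source data only.
D_net = D_Discriminator()
dom_logits, aux_logits = D_net(torch.randn(1, 3, 64, 64))
print(dom_logits.shape)  # torch.Size([1, 4, 16, 16])  -> per-patch domain/real-fake logits
print(aux_logits.shape)  # torch.Size([1, 19, 16, 16]) -> per-patch segmentation logits
```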
