1 Author
Swami Sankaranarayanan 1*, Yogesh Balaji 1*, Arpit Jain 2, Ser Nam Lim 2,3, Rama Chellappa1
1 UMIACS, University of Maryland, College Park, MD
2 GE Global Research, Niskayuna, NY
3 Avitas Systems, GE Venture, Boston, MA
∗First two authors contributed equally
2 Abstract
Contrary to previous approaches that use a simple adversarial objective or superpixel information to aid the process, we propose an approach based on Generative Adversarial Networks (GANs) that brings the embeddings closer in the learned feature space.
3 Introduction
The focus of this paper is in developing domain adaptation algorithms for semantic segmentation. Specifically, we focus on the hard case of the problem where no labels
from the target domain are available. This class of techniques is commonly referred to as Unsupervised Domain Adaptation.
Traditional approaches for domain adaptation involve minimizing some measure of distance between the source and target distributions. Two commonly used measures are the Maximum Mean Discrepancy (MMD) and a distance metric learned using DCNNs, as done in adversarial approaches.
The main contribution of this work is a technique that employs generative models to align the source and target distributions in the feature space.
4 Method
- We provide an input-output description of different network blocks in our pipeline.
- We describe separately the treatment of source and target data, followed by a description of the different loss functions and the corresponding update steps.
- We motivate the design choices involved in the discriminator (D) architecture.
4.1 Description of network blocks
(a) The base network, whose architecture is similar to a pre-trained model such as VGG-16, is split into two parts: the embedding, denoted by F, and the pixel-wise classifier, denoted by C. The output of C is a label map up-sampled to the same size as the input of F.
(b) The generator network (G) takes the learned embedding as input and reconstructs the RGB image.
(c) The discriminator network (D) performs two different tasks given an input: (a) it classifies the input as real or fake in a domain-consistent manner; (b) it performs a pixel-wise labeling task similar to the C network. Note that (b) is active only for source data, since target data does not have any labels during training.
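The input/output contract of the four blocks can be sketched with placeholder NumPy operations. This is only a shape-flow sketch: the function names, downsampling factor of 8, and all sizes here are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

H, W, N_CLASSES, EMB_C = 64, 64, 19, 128  # illustrative sizes (assumed)

def F_embed(image):
    """Embedding network F: RGB image -> spatially downsampled feature map."""
    h, w, _ = image.shape
    return np.zeros((h // 8, w // 8, EMB_C))  # placeholder for conv layers

def C_classify(embedding):
    """Pixel-wise classifier C: embedding -> label map upsampled to input size."""
    h, w, _ = embedding.shape
    return np.zeros((h * 8, w * 8, N_CLASSES))  # placeholder logits

def G_reconstruct(embedding):
    """Generator G: embedding -> reconstructed RGB image."""
    h, w, _ = embedding.shape
    return np.zeros((h * 8, w * 8, 3))

def D_discriminate(image):
    """Discriminator D: image -> (patch real/fake map, pixel-wise label map)."""
    h, w, _ = image.shape
    realfake_map = np.zeros((h // 8, w // 8))  # real/fake score per patch
    label_map = np.zeros((h, w, N_CLASSES))    # auxiliary labeling (source only)
    return realfake_map, label_map

x = np.zeros((H, W, 3))
emb = F_embed(x)
seg = C_classify(emb)        # label map at the same spatial size as the input
recon = G_reconstruct(emb)   # RGB reconstruction of the input
rf, aux = D_discriminate(recon)
```

The key contract to note is that C and G both consume the same embedding produced by F, which is what lets the adversarial signal on G's reconstructions flow back into the segmentation features.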
4.2 Treatment of source and target data
As shown in Figure 3, D performs two tasks for source data: (1) distinguishing real source inputs from generated source images (source-real/source-fake) and (2) producing a pixel-wise label map of the generated source image.
Given a target input $X^{t}$, the generator network G takes the target embedding from F as input and reconstructs the target image. Similar to the previous case, D is trained to distinguish between real target data (target-real) and the generated target images from G (target-fake). However, different from the previous case, D performs only a single task, i.e. it classifies the target input as target-real/target-fake. Since the target data does not have any labels during training, the classifier network C is not active when the system is presented with target inputs.
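The asymmetric treatment of the two domains can be summarized as a dispatch over which loss terms are active for a given batch. This is a bookkeeping sketch only; the loss names follow the notation of the next subsection, and the grouping into a dictionary is an assumption for illustration.

```python
def active_losses(domain):
    """Return which loss terms are computed for a source vs. target batch."""
    assert domain in ("source", "target")
    is_src = (domain == "source")
    return {
        "L_seg":   is_src,  # supervised segmentation loss requires labels
        "L_aux":   is_src,  # D's auxiliary pixel labeling requires labels
        "L_adv_D": True,    # real/fake discrimination works in both domains
        "L_adv_G": True,    # G is trained adversarially in both domains
        "L_rec":   True,    # image reconstruction needs no labels
    }

src = active_losses("source")
tgt = active_losses("target")
```

The table makes the unsupervised setting explicit: every label-dependent term is switched off for target batches, while the adversarial and reconstruction terms remain, and these are what align the two feature distributions.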
4.3 Iterative optimization
The directions of information flow across the different network blocks are shown in Figure 2.
The network blocks are updated iteratively in the following order:
- D-update: a combination of the within-domain adversarial loss $\mathcal{L}_{adv,D}^{s}$, the auxiliary classification loss $\mathcal{L}_{aux}^{s}$, and, for target inputs, $\mathcal{L}_{adv,D}^{t}$. Overall: $\mathcal{L}_{D}=\mathcal{L}_{adv,D}^{s}+\mathcal{L}_{adv,D}^{t}+\mathcal{L}_{aux}^{s}$
- G-update: a combination of the adversarial losses $\mathcal{L}_{adv,G}^{s}+\mathcal{L}_{adv,G}^{t}$ and the reconstruction losses $\mathcal{L}_{rec}^{s}$ and $\mathcal{L}_{rec}^{t}$. Overall: $\mathcal{L}_{G}=\mathcal{L}_{adv,G}^{s}+\mathcal{L}_{adv,G}^{t}+\mathcal{L}_{rec}^{s}+\mathcal{L}_{rec}^{t}$
- F-update: the parameters of F are updated using a combination of several loss terms: $\mathcal{L}_{F}=\mathcal{L}_{seg}+\alpha\,\mathcal{L}_{aux}^{s}+\beta\left(\mathcal{L}_{adv,F}^{s}+\mathcal{L}_{adv,F}^{t}\right)$
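Under assumed placeholder values, the three combined objectives above reduce to simple weighted sums. The per-term numbers and the values of α and β below are purely illustrative; only the structure of the three sums follows the text.

```python
# Placeholder per-term loss values for one iteration (illustrative numbers).
losses = {
    "adv_D_s": 0.7, "adv_D_t": 0.6, "aux_s": 1.2,
    "adv_G_s": 0.5, "adv_G_t": 0.4, "rec_s": 0.3, "rec_t": 0.2,
    "seg": 1.0, "adv_F_s": 0.5, "adv_F_t": 0.4,
}
alpha, beta = 0.1, 0.01  # assumed weighting coefficients

# D-update: within-domain adversarial losses plus auxiliary classification loss.
L_D = losses["adv_D_s"] + losses["adv_D_t"] + losses["aux_s"]

# G-update: adversarial losses plus reconstruction losses for both domains.
L_G = losses["adv_G_s"] + losses["adv_G_t"] + losses["rec_s"] + losses["rec_t"]

# F-update: segmentation loss plus weighted auxiliary and adversarial terms.
L_F = losses["seg"] + alpha * losses["aux_s"] \
      + beta * (losses["adv_F_s"] + losses["adv_F_t"])
```

Note that the three objectives are minimized in turn rather than jointly, which is why the update order (D, then G, then F) matters in the iterative scheme.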
4.4 Motivating the design choices of D
- Recent works on image generation have utilized the idea of a patch discriminator, in which the output is a two-dimensional feature map where each pixel carries a real/fake probability. In our case, the output map indicates real/fake probabilities across the source and target domains, hence resulting in four classes per pixel: src-real, src-fake, tgt-real, tgt-fake.
- Inspired by the Auxiliary Classifier GAN (ACGAN), where adding an auxiliary classification loss to D yields more stable GAN training and even enables generating large-scale images, we extend their idea to the segmentation problem by employing an auxiliary pixel-wise labeling loss on the D network.
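The two-headed discriminator output described above can be sketched as follows. The class ordering, spatial sizes, and random logits are assumptions made for illustration; the point is the shape of the two heads: a 4-way domain map per patch and an auxiliary per-pixel label map.

```python
import numpy as np

DOMAIN_CLASSES = ["src-real", "src-fake", "tgt-real", "tgt-fake"]

def patch_discriminator_output(h_patches, w_patches, n_seg_classes):
    """Sketch of D's two heads: a 4-way domain map and an auxiliary label map."""
    rng = np.random.default_rng(0)
    # Head 1: per-patch logits over the four domain/realness classes.
    domain_logits = rng.standard_normal(
        (h_patches, w_patches, len(DOMAIN_CLASSES)))
    # Head 2: per-pixel logits for the auxiliary labeling task
    # (supervised only on source data, which has labels).
    aux_logits = rng.standard_normal(
        (h_patches * 8, w_patches * 8, n_seg_classes))
    return domain_logits, aux_logits

dom, aux = patch_discriminator_output(8, 8, 19)
# Each patch is assigned one of the four domain classes.
patch_pred = dom.argmax(axis=-1)
```

Collapsing real/fake and source/target into a single 4-way decision per patch is what lets one discriminator provide a domain-consistent adversarial signal for both domains at once.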