Understanding Domain Adaptation

Note — I assume the reader has some basic knowledge of neural networks and how they work.

Domain adaptation is a subfield of computer vision in which the goal is to train a neural network on a source dataset and achieve good accuracy on a target dataset that differs significantly from the source. To better understand domain adaptation and its applications, let us first look at some of its use cases.

  • We have many standard datasets for different purposes, like GTSRB for German traffic sign recognition, LISA and LARA for traffic light detection, COCO for object detection and segmentation, etc. However, if you want a neural network to work well on your task, e.g. traffic sign recognition on Indian roads, you would first have to collect all kinds of images of Indian roads and then label them, which is a laborious and time-consuming task. Here we can use domain adaptation: we train the model on GTSRB (the source dataset) and test it on our Indian traffic sign images (the target dataset).
  • There are many cases where it is difficult to gather a dataset with all the variation and diversity required to train a robust neural network. In such cases, with the help of different computer vision algorithms, we can generate large synthetic datasets that have all the variation we require, then train the neural network on the synthetic dataset (source dataset) and test it on real-life data (target dataset).

For clarity, I will assume in what follows that no annotations are available for the target dataset/domain; however, this is not the only possible setting. Please continue reading for further explanation.

So in domain adaptation, our goal is to train a neural network on one dataset (the source) for which labels or annotations are available, and to achieve good performance on another dataset (the target) for which labels or annotations are not available.

Classification pipeline

Let us now see how we achieve this goal. Consider the above case of image classification. To adapt from one domain to another, we want our classifier to work well on the features extracted from both the source and the target dataset. Since we have trained the neural network on the source dataset, the classifier will naturally perform well on the source. To make it perform well on the target dataset as well, we want the features extracted from the source and target datasets to be similar. So during training we push the feature extractor to extract similar features for source and target domain images.
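To make the setup concrete, here is a minimal NumPy sketch of a single shared feature extractor feeding one classifier for both domains. The dimensions, random weights, and the shift applied to the target data are all made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
# hypothetical dimensions: 4-d input, 3-d feature space, 2 classes
W_feat = rng.standard_normal((4, 3))   # SHARED feature extractor weights
W_clf = rng.standard_normal((3, 2))    # classifier weights

def extract(x):
    # shared feature extractor: one linear layer with ReLU
    return np.maximum(x @ W_feat, 0)

def classify(x):
    # classifier operates on the shared features
    return extract(x) @ W_clf

source = rng.standard_normal((5, 4))
target = rng.standard_normal((5, 4)) + 2.0  # simulated domain shift
# both domains pass through the SAME extractor; the adaptation losses
# discussed below push extract(source) and extract(target) to be similar
src_logits = classify(source)
tgt_feats = extract(target)
```

The point of the sketch is only the weight sharing: whatever adaptation loss we add later acts on `W_feat` so that source and target features end up in the same region of feature space.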

Successful Domain Adaptation

Type of Domain adaptation based on Target domain

Depending upon the type of data available from the target domain, domain adaptation can be classified as follows:

  • Supervised — you have labelled data from the target domain, and the target dataset is much smaller than the source dataset.
  • Semi-supervised — you have both labelled and unlabelled data from the target domain.
  • Unsupervised — you have many unlabelled samples from the target domain.

Techniques for Domain Adaptation

Three main techniques are used for realising domain adaptation algorithms:

  • Divergence based Domain Adaptation
  • Adversarial based Domain Adaptation
  • Reconstruction based Domain Adaptation

Let us now look at each technique in turn.

Divergence based Domain Adaptation

Divergence based domain adaptation works on the principle of minimizing a divergence criterion between the source and target distributions, leading to domain-invariant features. Commonly used divergence criteria include Contrastive Domain Discrepancy, Correlation Alignment, Maximum Mean Discrepancy (MMD), and the Wasserstein distance. To get a better understanding of this approach, let's first look at some of these divergences.

In Maximum Mean Discrepancy (MMD), we try to determine whether two given samples belong to the same distribution or not. We define the distance between two distributions as the distance between the mean embeddings of their features. If P and Q are two distributions over a set X, then MMD is defined via a feature map 𝜑 : X → H, where H is a reproducing kernel Hilbert space:

MMD(P, Q) = ‖ E_{x∼P}[𝜑(x)] − E_{y∼Q}[𝜑(y)] ‖_H

For better insight into MMD, consider the following intuition: two distributions are similar if their moments are similar. By applying a kernel, we can transform the variables so that all moments (first, second, third, etc.) are captured. In the latent space, we can then compute the differences between the moments and average them.
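As an illustration, here is a small NumPy sketch of a (biased) empirical estimate of squared MMD using a Gaussian RBF kernel. The kernel choice and the bandwidth `sigma` are assumptions, not part of the definition:

```python
import numpy as np

def rbf_kernel(X, Y, sigma=1.0):
    # Gram matrix of the Gaussian (RBF) kernel between rows of X and Y
    d2 = (np.sum(X**2, axis=1)[:, None]
          + np.sum(Y**2, axis=1)[None, :]
          - 2.0 * X @ Y.T)
    return np.exp(-d2 / (2.0 * sigma**2))

def mmd2(X, Y, sigma=1.0):
    # biased empirical estimate of squared MMD between samples X ~ P, Y ~ Q
    return (rbf_kernel(X, X, sigma).mean()
            + rbf_kernel(Y, Y, sigma).mean()
            - 2.0 * rbf_kernel(X, Y, sigma).mean())
```

Two identical samples give an estimate of (near) zero, and shifting one sample away from the other increases the estimate, which is exactly the behaviour a divergence-based adaptation loss relies on.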

In Correlation Alignment (CORAL), we align the correlations (second-order statistics) of the source and target domains using a linear transformation, rather than aligning the means as in MMD.
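A minimal NumPy sketch of the CORAL loss, written as the squared Frobenius distance between the two feature covariance matrices (the 1/(4d²) scaling follows the common formulation; treat it as an assumption):

```python
import numpy as np

def coral_loss(Xs, Xt):
    # Xs, Xt: (n_samples, d) feature matrices from source and target
    d = Xs.shape[1]
    Cs = np.cov(Xs, rowvar=False)   # source covariance (second-order statistics)
    Ct = np.cov(Xt, rowvar=False)   # target covariance
    # squared Frobenius norm of the covariance gap, scaled by 1/(4 d^2)
    return np.sum((Cs - Ct) ** 2) / (4.0 * d * d)
```

The loss is zero when the two feature sets have identical covariance, and grows as their second-order statistics drift apart.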

During Training

During Inference

The architecture above assumes that the source and target domains have the same classes. During training we minimize two losses: a classification loss and a divergence based loss. The classification loss ensures good classification performance by updating the weights of both the feature extractor and the classifier, whereas the divergence based loss ensures that source and target domain features are similar by updating the weights of the feature extractor. At inference time, we simply pass a target domain image through our neural network.
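The combined training objective can be sketched as a weighted sum of the two losses. In the NumPy sketch below, the mean-difference divergence and the weight `lam` are placeholders for whichever divergence and weighting you actually pick:

```python
import numpy as np

def cross_entropy(logits, labels):
    # softmax cross-entropy on source predictions (labels exist only for source)
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    p = e / e.sum(axis=1, keepdims=True)
    return -np.mean(np.log(p[np.arange(len(labels)), labels]))

def divergence(src_feats, tgt_feats):
    # placeholder divergence: squared distance between feature means
    diff = src_feats.mean(axis=0) - tgt_feats.mean(axis=0)
    return float(np.sum(diff ** 2))

def total_loss(src_logits, src_labels, src_feats, tgt_feats, lam=0.5):
    # the classification term needs source labels; the divergence term is
    # label-free, which is what makes unsupervised adaptation possible
    return cross_entropy(src_logits, src_labels) + lam * divergence(src_feats, tgt_feats)
```

Note which term touches what: back-propagating `total_loss` through a real network would update the classifier only via the classification term, while the feature extractor receives gradients from both terms.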

These divergences are usually non-parametric, handcrafted mathematical formulas that are not specific to the dataset or to our problem, be it classification, object detection, segmentation, etc. Hence this formula-based approach is not very tailored to our problem. If, however, the divergence can be learned from the dataset or the problem, it is expected to perform better than conventional pre-defined divergences.

Adversarial based Domain Adaptation

To realise adversarial based domain adaptation we use GANs. If you don't know much about GANs, please have a look here.

Here our generator is simply the feature extractor, and we add a new discriminator network that learns to distinguish between source and target domain features. As in a two-player game, the discriminator pushes the generator to produce features that are indistinguishable between source and target domains. Since the discriminator is learnable, it learns cues (specific to our problem and dataset) that separate source from target, which in turn pushes the generator to produce more robust features, i.e. features that cannot be distinguished easily.

During Training — For Source

During Training — For Target

Assuming a classification problem, we use two losses: the classification loss and the discriminator loss. The purpose of the classification loss was explained earlier. The discriminator loss helps the discriminator correctly classify features as coming from the source or the target domain. Here we use a gradient reversal layer (GRL) to realise adversarial training. The GRL is a simple block that multiplies the gradient by -1 (or another negative value) during back-propagation. During training, the generator receives gradients from two directions: from the classifier and from the discriminator. The gradients from the discriminator get multiplied by a negative value because of the GRL, which has the effect of training the generator in the opposite direction to the discriminator. For example, if the computed gradient that optimises the discriminator's loss is 2, then the generator is updated with -2 (assuming the negative value is -1). In this way we train the generator so that even the discriminator cannot distinguish between source and target domain features. The GRL is very commonly used in the domain adaptation literature.
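Conceptually, the GRL is an identity map in the forward pass whose backward pass flips the gradient's sign. A framework-free NumPy sketch (the class name and the λ scaling factor are illustrative; real implementations plug this into an autograd engine):

```python
import numpy as np

class GradientReversal:
    """Identity in the forward pass; multiplies gradients by -lam in backward."""

    def __init__(self, lam=1.0):
        self.lam = lam

    def forward(self, x):
        return x  # features pass through unchanged

    def backward(self, grad_output):
        # flip the sign so the generator is trained AGAINST the discriminator
        return -self.lam * grad_output
```

With `lam=1.0`, a discriminator gradient of 2 reaches the generator as -2, matching the example above.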

Reconstruction based Domain Adaptation

This technique works on the idea of image-to-image translation. One simple approach is to learn a translation from target domain images to source domain images and train a classifier on the source domain. Multiple approaches can be built on this idea. The simplest model for image-to-image translation is an encoder-decoder network, with a discriminator that forces the encoder-decoder to produce images resembling the source domain.

During Training

During Testing

Another option is to use CycleGANs. In CycleGAN we use two encoder-decoder networks: one transforms the target domain to the source domain, and the other transforms the source domain to the target domain. We simultaneously train two GANs that generate images in the two domains (source and target). To ensure consistency, a cycle consistency loss is introduced: it makes sure that transforming an image to the other domain and back again results in roughly the same image as the input. Thus, the full loss of the two paired networks is the sum of the GAN losses of both discriminators and the cycle consistency loss.
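The cycle consistency term can be sketched as an L1 penalty on both round trips. Here `G_st` and `G_ts` are stand-ins for the two generator networks (any callables, for illustration):

```python
import numpy as np

def cycle_consistency_loss(x_src, x_tgt, G_st, G_ts):
    # G_st: source -> target generator; G_ts: target -> source generator
    # penalise how far each round trip lands from the original image (L1)
    src_round_trip = np.mean(np.abs(G_ts(G_st(x_src)) - x_src))
    tgt_round_trip = np.mean(np.abs(G_st(G_ts(x_tgt)) - x_tgt))
    return src_round_trip + tgt_round_trip
```

If the two generators were perfect inverses of each other, this loss would be exactly zero; any deviation from an invertible translation is penalised.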

Conclusion

We have seen three different techniques that help us realise and implement different domain adaptation approaches. Domain adaptation has important applications in tasks such as image classification, object detection, and segmentation. In some respects, this approach is similar to how humans learn to visually recognise different things. I hope this blog gives you a basic idea of how to think about different domain adaptation pipelines.
