Paper Notes: Monocular Depth Estimation through Virtual-world Supervision and Real-world SfM Self-Supervision

This paper proposes a new CNN architecture, MonoDEVSNet, trained with virtual-world supervision and real-world SfM self-supervision, reducing the domain gap between the supervised (virtual) and self-supervised (real) data in feature space. A Gradient Reversal Layer (GRL) is used to learn domain-invariant depth features, achieving unsupervised domain adaptation. The method draws on real-world traffic sequences and analogous virtual-world sequences, and its loss function combines an SfM self-supervision loss, a pixel-wise supervised loss, and a domain-adaptation loss.

Problem Formulation

  • MDE with a CNN: $\Psi(\theta;x) \rightarrow d$, mapping an image $x$ to a depth map $d$.
  • $\theta^* = \arg\min_{\theta} \mathcal{L}(\theta; X^r, X^s, Y^s)$.
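A minimal PyTorch sketch of this optimization setup, assuming a toy stand-in network; `TinyMDE`, the layer sizes, and the learning rate are illustrative assumptions, not the paper's architecture:

```python
import torch
import torch.nn as nn

# Toy stand-in for Ψ(θ; x) → d: any network mapping an RGB image to a
# single-channel depth map. The real MonoDEVSNet is far richer
# (encoder, pyramid, decoder blocks, described under Methods).
class TinyMDE(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 1, 3, padding=1), nn.Softplus(),  # depth > 0
        )

    def forward(self, x):
        return self.net(x)

model = TinyMDE()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

x_s = torch.rand(2, 3, 64, 64)           # virtual-world images from X^s
y_s = torch.rand(2, 1, 64, 64) * 80.0    # their ground-truth depth Y^s

# One gradient step toward θ* = argmin_θ L(θ; X^r, X^s, Y^s); only a
# placeholder supervised term is shown, the other terms are sketched below.
optimizer.zero_grad()
l_sp = torch.abs(model(x_s) - y_s).mean()
l_sp.backward()
optimizer.step()
```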

Contributions

  • A CNN architecture trained on virtual-world supervision and real-world SfM self-supervision.
  • Domain discrepancies between the supervised (virtual-world) and self-supervised (real-world) data are reduced in the space of the extracted features (backbone bottleneck) via a Gradient Reversal Layer (GRL).

Methods

  • Assume two sources of data: 1. real-world traffic sequences $X^r = \{x^r_t\}_{t=1}^{N^r}$, where $N^r$ is the number of real-world frames; 2. analogous virtual-world sequences $X^s = \{x^s_t\}_{t=1}^{N^s}$, where $N^s$ is the number of virtual-world frames.
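A sketch of how the two data sources might be paired per training step; the random tensors, names, and shapes are assumptions, and the SfM loss would additionally need neighboring frames $x_{t-1}, x_{t+1}$, omitted here:

```python
import itertools
import torch
from torch.utils.data import DataLoader, TensorDataset

# Random tensors stand in for X^r (real frames) and X^s, Y^s (virtual
# frames with ground-truth depth).
real_ds    = TensorDataset(torch.rand(100, 3, 64, 64))
virtual_ds = TensorDataset(torch.rand(100, 3, 64, 64),
                           torch.rand(100, 1, 64, 64) * 80.0)

real_loader    = DataLoader(real_ds, batch_size=4, shuffle=True)
virtual_loader = DataLoader(virtual_ds, batch_size=4, shuffle=True)

# Each step draws one real batch and one virtual batch; cycle() keeps
# the virtual stream alive if the two datasets differ in length.
for (x_r,), (x_s, y_s) in zip(real_loader, itertools.cycle(virtual_loader)):
    pass  # L^sf uses x_r; L^sp uses (x_s, y_s); L^DA uses both
```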

MonoDEVSNet architecture: $\Psi(\theta;x)$

  • $\Psi(\theta;x)$ has three blocks: an encoding block with weights $\theta^{enc}$, a multi-scale pyramidal block with weights $\theta^{pyr}$, and a decoding block with weights $\theta^{dec}$.
  • The role of the multi-scale pyramid block is to adapt the bottleneck of the chosen encoder to the decoder.
  • $\mathcal{L}$ relies on three different losses: $\mathcal{L}^{sf}(\theta,\mathcal{V}^{sf};X^r)$, $\mathcal{L}^{sp}(\theta,X^{s};Y^s)$, and $\mathcal{L}^{DA}(\theta^{enc},\mathcal{V}^{DA};X^r,X^s)$.
  • The SfM self-supervised loss $\mathcal{L}^{sf}(\theta,\mathcal{V}^{sf};X^r)$ closely follows Monodepth2.
  • The supervised loss $\mathcal{L}^{sp}(\theta,X^{s};Y^s)$ discards pixels with ground-truth depth $d^s_t(p) \geq d^{max}$ (see the sketch after this list).
  • Domain adaptation loss $\mathcal{L}^{DA}(\theta^{enc},\mathcal{V}^{DA};X^r,X^s)$:
  • The aim is to learn depth features that cannot be distinguished as coming from the real world (target domain) or the virtual world (source domain).
  • The domain invariance of the features produced by $\theta^{enc}$ is measured by a binary target/source domain-classifier CNN, $D$, with weights $\mathcal{V}^{DA}$, attached through the Gradient Reversal Layer.
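A minimal sketch of the pixel masking in the supervised loss $\mathcal{L}^{sp}$ above; the L1 penalty and the value of $d^{max}$ are illustrative assumptions (the paper's exact pixel-wise penalty may differ):

```python
import torch

def supervised_depth_loss(d_pred, d_gt, d_max=80.0):
    """L^sp sketch: pixel-wise loss on virtual-world data that discards
    pixels whose ground-truth depth is >= d_max."""
    mask = d_gt < d_max                      # keep only valid pixels
    return torch.abs(d_pred[mask] - d_gt[mask]).mean()

d_pred = torch.rand(2, 1, 64, 64) * 100.0    # Ψ(θ; x^s)
d_gt   = torch.rand(2, 1, 64, 64) * 100.0    # y^s
loss = supervised_depth_loss(d_pred, d_gt)
```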


  • $D(\theta^{enc},\mathcal{V}^{DA};x)$ outputs 1 if $x \in X^r$ and 0 if $x \in X^s$. The GRL acts as an identity function during the forward passes of training, while during back-propagation it reverses the gradient vector passing through it. Both the GRL and $\mathcal{V}^{DA}$ are required at training time, but not at test time.
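A minimal PyTorch sketch of the GRL behavior just described, plus the domain-classification loss; the classifier architecture, feature shapes, and $\lambda$ value are assumptions for illustration, not the paper's exact design:

```python
import torch
import torch.nn as nn

class GradientReversal(torch.autograd.Function):
    """Identity in the forward pass; multiplies the incoming gradient
    by -lam in the backward pass."""

    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lam * grad_output, None  # no gradient for lam

def grad_reverse(x, lam=1.0):
    return GradientReversal.apply(x, lam)

# Hypothetical binary domain classifier D on bottleneck features.
D = nn.Sequential(
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(256, 64), nn.ReLU(), nn.Linear(64, 1),
)
bce = nn.BCEWithLogitsLoss()

feat_r = torch.rand(2, 256, 8, 8, requires_grad=True)  # real bottleneck
feat_s = torch.rand(2, 256, 8, 8, requires_grad=True)  # virtual bottleneck

# L^DA: D learns real (1) vs. virtual (0); the reversed gradient pushes
# the encoder toward domain-invariant features.
l_da = bce(D(grad_reverse(feat_r)), torch.ones(2, 1)) + \
       bce(D(grad_reverse(feat_s)), torch.zeros(2, 1))
l_da.backward()
```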

Unsupervised Domain Adaptation by Backpropagation [1]


  • At training time, in order to obtain domain-invariant features, we seek the parameters $\theta_f$ of the feature mapping that maximize the loss of the domain classifier (by making the two feature distributions as similar as possible), while simultaneously seeking the parameters $\theta_d$ of the domain classifier that minimize the loss of the domain classifier. In addition, we seek to minimize the loss of the label predictor.
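The corresponding saddle-point objective from [1], where $L_y$ is the label-prediction loss (computed on source samples, $d_i = 0$), $L_d$ the domain-classification loss, and $\lambda$ the trade-off meta-parameter:

$$
\begin{aligned}
E(\theta_f,\theta_y,\theta_d) &= \sum_{i:\,d_i=0} L_y^i(\theta_f,\theta_y) \;-\; \lambda \sum_{i=1}^{N} L_d^i(\theta_f,\theta_d), \\
(\hat\theta_f,\hat\theta_y) &= \arg\min_{\theta_f,\theta_y} E(\theta_f,\theta_y,\hat\theta_d), \\
\hat\theta_d &= \arg\max_{\theta_d} E(\hat\theta_f,\hat\theta_y,\theta_d).
\end{aligned}
$$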


  • This saddle-point optimization can be reduced to standard backpropagation by introducing a special gradient reversal layer (GRL), defined as follows. The gradient reversal layer has no parameters associated with it (apart from the meta-parameter $\lambda$, which is not updated by backpropagation). During forward propagation, the GRL acts as an identity transform. During backpropagation, though, the GRL takes the gradient from the subsequent level, multiplies it by $-\lambda$, and passes it to the preceding layer.
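With the GRL in place, the saddle point can be sought with standard SGD updates over the combined network, as in [1] ($\mu$ is the learning rate):

$$
\theta_f \leftarrow \theta_f - \mu\left(\frac{\partial L_y^i}{\partial \theta_f} - \lambda \frac{\partial L_d^i}{\partial \theta_f}\right), \qquad
\theta_y \leftarrow \theta_y - \mu\,\frac{\partial L_y^i}{\partial \theta_y}, \qquad
\theta_d \leftarrow \theta_d - \mu\,\frac{\partial L_d^i}{\partial \theta_d}.
$$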

Important Reference

[1] Y. Ganin and V. Lempitsky, “Unsupervised domain adaptation by backpropagation,” in Int. Conf. on Machine Learning (ICML), 2015.
