[Stereo_cnn][cvpr16] Improved Stereo Matching with Constant Highway Networks

(Unfinished / work in progress)

Notes

Environment Installation

To check whether libpng is installed, see https://askubuntu.com/questions/806160/how-to-check-if-libpng-and-png-is-installed

```bash
sudo apt-get install libpng++-dev
```

Contributions

1) A new highway network architecture (an inner-outer residual net, i.e. a multilevel ResNet) for patch matching, including **multilevel constant highway gating and scaling layers** that control the receptive field of the network.
2) Training the cost-computation network with a hybrid loss, to make better use of the decision and description networks, which compress the input patches / extract features.
3) Computing the disparity image with a global disparity network (a CNN) instead of winner-take-all (WTA).
4) Using reflective learning to measure the correctness (confidence) of the CNN output.
5) Using this confidence score for better outlier detection and correction during refinement.
6) Ranked 1st on KITTI.
7) Reducing the fast MC-CNN's runtime to under 5 s per image.
8) Open-source code.

  • We studied MC-CNN and its modified versions.
  • ResNet, a special case of the Highway Network, achieves state-of-the-art results on the most competitive CV tasks. Its success is usually attributed to the ability to train very deep networks, but that explanation does not carry over to stereo matching: ResNet can be viewed as an ensemble of networks that share weights, whereas the networks used in stereo matching are much shallower.
  • We also add scaling layers to control the receptive field (a minimal sketch of such a constant-gated block follows this list).
  • Sect. 6 shows that our model outperforms ImageNet-winning architectures such as DenseNet on this task.
  • To estimate the confidence of a stereo match, we propose a CNN whose confidence indication is trained with a reflective loss.
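A minimal PyTorch sketch of the constant highway idea, reconstructed from the description above; the block width, kernel sizes, and gate initialization are illustrative assumptions, not the paper's exact configuration:

```python
import torch
import torch.nn as nn

class ConstantHighwayBlock(nn.Module):
    """Residual block with learned *constant* highway gates:
    y = g_skip * x + g_res * f(x), where g_skip and g_res are scalars
    learned per block rather than computed from the input (which is
    what distinguishes this from the original Highway Network)."""

    def __init__(self, channels):
        super().__init__()
        self.f = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
        )
        # Start as a plain residual unit; training moves the gates.
        self.g_skip = nn.Parameter(torch.ones(1))
        self.g_res = nn.Parameter(torch.ones(1))

    def forward(self, x):
        return self.g_skip * x + self.g_res * self.f(x)
```

The "multilevel" variant presumably wraps groups of such blocks in an outer skip connection with its own constant gate, which, per contribution 1 above, is what lets the gates control the effective receptive field.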
Stereo Matching Pipeline
  1. Rectification / stereo calibration
  2. Cost computation

[Figure from the paper "Improved Stereo Matching with Constant Highway Networks and Reflective Confidence Learning": the cost-computation architecture.]

We use a description network with the highway architecture to compute feature maps of the left and right input patches separately. These feed two pathways: the first concatenates them and passes them through a fully-connected decision network trained with a cross-entropy loss; the second applies a hinge loss directly to the dot product (cosine similarity) of the two representations (similar to a triplet loss).
The inputs are patches from the left and right images, processed as follows:
1) Compute a feature map for each patch through two description networks with identical structure.
2) From the two feature maps, the similarity (and the loss) is computed in two ways:

- A. Concatenate the two vectors and pass them through several FC layers plus a sigmoid, computing the cross-entropy loss over the network's outputs. [Is each training example one positive and one negative pair, or several of each?]
- B. Compute the dot product of the two feature vectors and use its negation as the loss (via the hinge criterion below).

$$loss = \alpha \cdot XEnt(v^+, v^-) + (1 - \alpha) \cdot Hinge(s^+, s^-)$$
$$loss = \alpha \cdot \big( -\log(v^+) - \log(1 - v^-) \big) + (1 - \alpha) \cdot \max(0,\ m + s^- - s^+)$$
$u$ is the output of the description network, and $s = u_l^T u_r$.
$v$ is the sigmoid output of the decision network.
Positive and negative samples are drawn as follows:

Since the ground-truth disparity $d$ is known, we know the position of the correct match; every other position is a non-match. Let $\langle P^L_{n \times n}(p), P^R_{n \times n}(q) \rangle$ be a pair of patches from the left and right images, where $P^L_{n \times n}(p)$ is the $n \times n$ patch centered at pixel $p = (x, y)$ of the left image and $d$ is that pixel's disparity. A negative example is a pair of patches, one from each image, with:

$$p = (x, y), \quad q = (x - d + o_{neg},\ y), \quad o_{neg} \in [dataset\_neg\_low,\ dataset\_neg\_high]$$

A positive example:
$$p = (x, y), \quad q = (x - d + o_{pos},\ y), \quad o_{pos} \in [dataset\_pos\_low,\ dataset\_pos\_high]$$
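A minimal sketch of this sampling rule, assuming the ground-truth disparity $d$ at $(x, y)$ is known; the offset ranges stand in for the $dataset\_{pos,neg}\_{low,high}$ hyperparameters above, and the concrete numbers are illustrative MC-CNN-style defaults, not necessarily the paper's:

```python
import random

def sample_pair(x, y, d, positive,
                pos_range=(0.0, 0.5), neg_range=(1.5, 6.0)):
    """Pick center coordinates (p, q) for a training patch pair.

    x, y      -- pixel coordinates in the left image
    d         -- ground-truth disparity at (x, y)
    positive  -- True for a matching pair, False for a non-matching one
    """
    lo, hi = pos_range if positive else neg_range
    o = random.uniform(lo, hi)
    if random.random() < 0.5:      # the offset can fall on either side
        o = -o
    p = (x, y)
    q = (x - d + o, y)             # same row, shifted off the true match
    return p, q
```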

$s^+ = (u_l^T u_r)^+$, where $u_l, u_r$ come from a pair $p, q$ satisfying the positive-example relation.
$v^+$ is the sigmoid output of the decision network for a positive patch pair.
??? How is the final loss actually computed? For training, it feels like the hinge part could be ignored and the network trained with the cross-entropy loss alone.

In the experiments, $\alpha = 0.8$ and $m = 0.2$.
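A minimal PyTorch sketch of this hybrid loss, assuming `v_pos`, `v_neg` are the decision network's sigmoid outputs for a positive and a negative pair and `s_pos`, `s_neg` the corresponding dot products (function and argument names are mine):

```python
import torch

def hybrid_loss(v_pos, v_neg, s_pos, s_neg, alpha=0.8, m=0.2):
    """loss = alpha * XEnt(v+, v-) + (1 - alpha) * Hinge(s+, s-)."""
    eps = 1e-7  # numerical safety for the logs
    # Binary cross-entropy: target 1 for the positive pair, 0 for the
    # negative one.
    xent = -(torch.log(v_pos + eps) + torch.log(1.0 - v_neg + eps))
    # Hinge: the positive similarity should beat the negative by m.
    hinge = torch.clamp(m + s_neg - s_pos, min=0.0)
    return alpha * xent.mean() + (1.0 - alpha) * hinge.mean()
```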
The network's output is thus a 3-D cost map $C(p, d)$ of size max_disparity × height × width, where $p$ is a pixel and $d$ a candidate disparity.
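At test time the description network can be run densely over both full images, and $C(p, d)$ filled with dot products between the left feature map and the right feature map shifted by each candidate disparity. A minimal sketch of this similarity pathway (the decision-network pathway would instead run its FC layers at every position); tensor layout and names are mine:

```python
import torch

def cost_volume(feat_l, feat_r, max_disp):
    """Dot-product match scores between dense feature maps.

    feat_l, feat_r -- (C, H, W) description-network outputs
    Returns a (max_disp, H, W) volume; larger = better match, so
    negate it wherever a cost (smaller = better) is expected.
    """
    C, H, W = feat_l.shape
    vol = torch.zeros(max_disp, H, W)
    for d in range(max_disp):
        # Left pixel (x, y) is compared against right pixel (x - d, y).
        vol[d, :, d:] = (feat_l[:, :, d:] * feat_r[:, :, : W - d]).sum(0)
    return vol
```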
3. Disparity computation
[Figure: the disparity computation network (the original image, hosted on imgbb.com, is private and no longer viewable).]

In MC-CNN, the final disparity is obtained by WTA (winner-take-all): for each pixel, take the $d$ with the lowest cost as the output, $D(p) = \operatorname{argmin}_d C(p, d)$, which yields the disparity map.
Here, instead, a CNN (the global disparity network) produces the disparity map. It takes the cost map as input, passes it through 4 convolutional layers and 3 FC layers, and is additionally trained with a reflective loss, which makes it possible to estimate the confidence of each prediction.
As the figure shows, FC3 plays two roles: on one hand it is used to compute a weighted cross-entropy loss that yields the final $d$; on the other hand it feeds FC4 and FC5, which compute a binary cross-entropy loss (a reflective loss, since its labels depend not only on the ground truth but also on the network's changing predictions [isn't that trivially true? even plain MSE depends on both]) to produce the confidence. $\operatorname{argmax}_i y_i$ is compared with $y_{GT}$: if they differ by at most one pixel, the sample is labeled positive, otherwise negative (the intuition behind this is still unclear to me).
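A sketch of the global disparity network as described above, with FC3 feeding both the disparity scores and the FC4/FC5 confidence head. The 4-conv + 3-FC layout follows the text; channel counts, kernel sizes, and the way per-pixel cost vectors are fed to the FC part are my assumptions:

```python
import torch
import torch.nn as nn

class GlobalDisparityNet(nn.Module):
    """4 conv layers + 3 FC layers over the cost map, per the text;
    widths and kernel sizes here are illustrative, not the paper's."""

    def __init__(self, max_disp=128, hidden=384):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(max_disp, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(inplace=True),
        )
        self.fc1 = nn.Linear(64, hidden)
        self.fc2 = nn.Linear(hidden, hidden)
        self.fc3 = nn.Linear(hidden, max_disp)  # scores y_i over disparities
        # FC4/FC5: the confidence head on top of FC3 (reflective loss).
        self.fc4 = nn.Linear(max_disp, hidden)
        self.fc5 = nn.Linear(hidden, 1)

    def forward(self, cost):                 # cost: (N, max_disp, H, W)
        h = self.conv(cost)                  # (N, 64, H, W)
        h = h.permute(0, 2, 3, 1)            # per-pixel feature vectors
        h = torch.relu(self.fc1(h))
        h = torch.relu(self.fc2(h))
        y = self.fc3(h)                      # per-pixel disparity scores
        conf = torch.sigmoid(self.fc5(torch.relu(self.fc4(y))))
        return y, conf.squeeze(-1)
```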
The reflective loss and the weighted cross-entropy loss are summed in a 15:85 ratio to form the final training loss.
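A minimal sketch of this combined loss with the reflective labels described above; the plain `F.cross_entropy` stands in for the paper's weighted cross-entropy (which, as I understand it, also gives partial weight to disparities near the ground truth), and names and shapes are mine:

```python
import torch
import torch.nn.functional as F

def gdn_loss(y, conf, d_gt, lam=0.85):
    """Weighted cross-entropy + reflective loss, mixed 85:15.

    y    -- (N, max_disp) per-pixel disparity scores (FC3)
    conf -- (N,) confidence in (0, 1) (sigmoid of FC5)
    d_gt -- (N,) ground-truth disparity indices (long)
    """
    # Stand-in for the paper's *weighted* cross-entropy over disparities.
    xent = F.cross_entropy(y, d_gt)
    # Reflective label: 1 if the current prediction lands within one
    # pixel of the ground truth, else 0. It depends on the network's
    # own (changing) output, hence "reflective".
    pred = y.argmax(dim=1)
    correct = ((pred - d_gt).abs() <= 1).float()
    refl = F.binary_cross_entropy(conf, correct)
    return lam * xent + (1.0 - lam) * refl
```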
(Experiments show this approach works particularly well in occluded regions.)
4. Disparity map refinement
The disparity map produced by the algorithm above is still fairly coarse, and further refinement is needed to obtain the final result. The main refinement methods are:

  • CBCA (cross-based cost aggregation) iterations - incorporate neighboring pixels' information by averaging the cost while respecting depth discontinuities.
  • SGM (semi-global matching) - enforce smoothness constraints. (CBCA + SGM iterations help refine D(p), but some cases remain beyond them, such as reflective and sparsely textured regions, occluded and distorted areas, and illumination changes.)
  • Left-right consistency check (a minimal sketch follows this list)
  • Sub-pixel interpolation
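A minimal sketch of the left-right consistency check referenced in the list, assuming `disp_l` and `disp_r` are disparity maps computed with the left and right image, respectively, as reference; the 1-pixel threshold is a common default, not necessarily the paper's:

```python
import numpy as np

def lr_consistency_mask(disp_l, disp_r, tau=1.0):
    """Mark pixels whose left and right disparities disagree.

    disp_l[y, x] should match disp_r[y, x - disp_l[y, x]]; pixels
    violating this by more than tau are flagged as outliers
    (typically occlusions or mismatches) for later correction.
    """
    H, W = disp_l.shape
    xs = np.arange(W)[None, :].repeat(H, axis=0)
    ys = np.arange(H)[:, None].repeat(W, axis=1)
    x_r = np.clip(xs - np.rint(disp_l).astype(int), 0, W - 1)
    disagreement = np.abs(disp_l - disp_r[ys, x_r])
    return disagreement > tau   # True = outlier
```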
Whole architecture
Highway network
Reflective loss