三维重构(13):SGM经典算法解读

Stereo Processing by Semi-Global Matching and Mutual Information

Semi-Global Matching Algorithm is proposed by “Stereo Processing by Semi-Global Matching and Mutual Information” published in 2008. The Semi-Global Matching (SGM) stereo method uses a pixelwise, Mutual Information based matching cost for compensating radiometric differences of input images. Pixelwise matching is supported by a smoothness constraint that is usually expressed as a global cost function. SGM performs a fast approximation by pathwise optimizations from all directions.

The mutual information & cost function

mutual information

Mutual information is defined as:
M I I 1 , I 2 = H I 1 + H I 2 − H I 1 , I 2 (1) MI_{I_1,I_2}=H_{I_1}+H_{I_2}-H_{I_1,I_2}\tag{1} MII1,I2=HI1+HI2HI1,I2(1), where H I = − ∫ 0 1 P I ( i ) l o g P I ( i ) d i (2) H_I=-\int_0^1P_I(i)logP_I(i)d_i\tag{2} HI=01PI(i)logPI(i)di(2) and H I 1 , I 2 = − ∫ 0 1 ∫ 0 1 P I 1 , I 2 ( i 1 , i 2 ) l o g P I 1 , I 2 ( i 1 , i 2 ) d i 1 d i 2 (3) H_{I_1,I_2}=-\int_0^1\int_0^1P_{I_1,I_2}(i_1,i_2)logP_{I_1,I_2}(i_1,i_2)d_{i_1}d_{i_2}\tag{3} HI1,I2=0101PI1,I2(i1,i2)logPI1,I2(i1,i2)di1di2(3). And P I ( i ) P_I(i) PI(i) is the normalized histogram of the images, P I 1 , I 2 P_{I_1,I_2} PI1,I2 is the joint distribution of corresponding points’ intensity. The corresponding points, for example, pixel p \textbf{p} p in the left image, q \textbf{q} q in the right image are corresponded, so q = e b m ( p , d ) = [ p x − d , p y ] \textbf{q}=e_{bm}(\textbf{p},d)=[\textbf{p}_x-d,\textbf{p}_y] q=ebm(p,d)=[pxd,py] with disparity d d d.
P I 1 , I 2 ( i , k ) = 1 n ∑ p T [ ( i , k ) = ( I 1 p , I 2 p ) ] (4) P_{I_1,I_2}(i,k)=\frac{1}{n}\sum_{\textbf{p}}{T[(i,k)=(I_{1p},I_{2p})]}\tag{4} PI1,I2(i,k)=n1pT[(i,k)=(I1p,I2p)](4). T [ A = B ] = 1 , i f A = B , e l s e T [ A = B ] = 0 T[A=B]=1,if A=B,else T[A=B]=0 T[A=B]=1,ifA=B,elseT[A=B]=0

Equation 1 operates on full images and requires the disparity
image a priori, both prevent the use of MI as pixelwise matching cost. However, the calculation of the joint entropy can be transformed into a sum over pixels using Taylor expansion.
H I 1 , I 2 = ∑ p h I 1 , I 2 ( I 1 p , I 2 p ) (5) H_{I_1,I_2}=\sum_\textbf{p}{h_{I_1,I_2}(I_{1p},I_{2p})}\tag{5} HI1,I2=phI1,I2(I1p,I2p)(5)
h I 1 , I 2 ( i , k ) = − 1 n l o g ( P I 1 , I 2 ( i , k ) ⊗ g ( i , k ) ) ⊗ g ( i , k ) (6) h_{I_1,I_2}(i,k)=-\frac{1}{n}log(P_{I_1,I_2}(i,k)\otimes g(i,k))\otimes g(i,k)\tag{6} hI1,I2(i,k)=n1log(PI1,I2(i,k)g(i,k))g(i,k)(6)
For overlapping(matching) pixels, the entropy of one image is
H I = ∑ p h I ( I p ) (7) H_{I}=\sum_\textbf{p}{h_{I}(I_{p})}\tag{7} HI=phI(Ip)(7)
h I ( i ) = − 1 n l o g ( P I ( i ) ⊗ g ( i ) ) ⊗ g ( i ) (8) h_I(i)=-\frac{1}{n}log(P_{I}(i)\otimes g(i))\otimes g(i)\tag{8} hI(i)=n1log(PI(i)g(i))g(i)(8)
With the same idea, the mutual information could be described as M I I 1 , I 2 = ∑ p m i I 1 , I 2 ( I 1 p , I 2 p ) (9) MI_{I_1,I_2}=\sum_\textbf{p}{mi_{I_1,I_2}(I_{1p},I_{2p})}\tag{9} MII1,I2=pmiI1,I2(I1p,I2p)(9) and m i I 1 , I 2 ( I 1 p , I 2 p ) = h i 1 ( i ) + h i 2 ( k ) − h I 1 , I 2 ( i , k ) (10) mi_{I_1,I_2}(I_{1p},I_{2p})=h_{i_1}(i)+h_{i_2}(k)-h_{I_1,I_2}(i,k)\tag{10} miI1,I2(I1p,I2p)=hi1(i)+hi2(k)hI1,I2(i,k)(10)

To summarize, if we have the optimal disparity map D D D, and the mutual information of the two images will be maximized. At the same time, the joint distribution of the corresponding points’ intensity will be like shown below, which means the corresponding points share similar intensities.
在这里插入图片描述

cost function

The mutual information cost is defined as C M I ( p , d ) = − m i I b , f D ( I m ) ( I b p , I m q ) (11) C_{MI}(\textbf{p},d)=-mi_{I_b,f_D(I_m)}(I_{b\textbf{p}},I_{m\textbf{q}})\tag{11} CMI(p,d)=miIb,fD(Im)(Ibp,Imq)(11) where f D ( I m ) f_D(I_m) fD(Im) is the transformed right image with disparity map D D D.
Here, we will have a question, if we want to get the cost, we have to get the disparity map, but the disparity map is the final result we want. So the author provided an optimization method.
First, considering pixelwise cost calculation is generally ambiguous and wrong matches can easily have a lower cost than correct ones, due to noise; an additional constraint is added that supports smoothness by penalizing changes of neighboring disparities. The energy E ( D ) E(D) E(D) that depends on the disparity image D D D.
E ( D ) = ∑ p ( C ( p , D p ) + ∑ q ∈ N p P 1 T [ ∣ D p − D q ∣ = 1 ] + ∑ q ∈ N p P 2 T [ ∣ D p − D q ∣ > 1 ] ) (12) E(D)=\sum_{\textbf{p}}{(C(\textbf{p},D_{\textbf{p}})+\sum_{\textbf{q}\in{N_{\textbf{p}}}}{P_1T[|D_{\textbf{p}}-D_{\textbf{q}}|=1]}+\sum_{\textbf{q}\in{N_{\textbf{p}}}}{P_2T[|D_{\textbf{p}}-D_{\textbf{q}}|>1]})}\tag{12} E(D)=p(C(p,Dp)+qNpP1T[DpDq=1]+qNpP2T[DpDq>1])(12)

The first term is the ‘MI’ pixel matching cost for the disparities of D D D. The second and third term are penalties. Usually P 2 ≫ P 1 P_2\gg P_1 P2P1. To minimize the energy E ( D ) E(D) E(D) is NP complete for many discontinuity preserving energies. So Dynamic Programming solution is applied by optimizing several 1D constraints from all directions to fit 2D conditions. The aggregated (smoothed) cost S ( p , d ) S(\textbf{p},d) S(p,d) for a pixel p \textbf{p} p and disparity d d d is calculated by summing the costs of all 1D minimum cost paths that end in pixel p \textbf{p} p at disparity d d d.
Finally,
L r ( p , d ) = C ( p , d ) L_{\textbf{r}}(\textbf{p},d)=C(\textbf{p},d) Lr(p,d)=C(p,d)
+ m i n ( L r ( p-r , d ) , +min(L_{\textbf{r}}(\textbf{p-r},d), +min(Lr(p-r,d),
L r ( p-r , d − 1 ) + P 1 , L_{\textbf{r}}(\textbf{p-r},d-1)+P_1, Lr(p-r,d1)+P1,
L r ( p-r , d + 1 ) + P 1 , m i n i L r ( p-r , i ) + P 2 ) − m i n k L r ( p-r , k ) L_{\textbf{r}}(\textbf{p-r},d+1)+P_1,min_i L_{\textbf{r}}(\textbf{p-r},i)+P_2)-min_kL_{\textbf{r}}(\textbf{p-r},k) Lr(p-r,d+1)+P1,miniLr(p-r,i)+P2)minkLr(p-r,k)
And, S ( p , d ) = ∑ r L r ( p , d ) (13) S(\textbf{p},d)=\sum_{\textbf{r}}{L_{\textbf{r}}(\textbf{p},d)}\tag{13} S(p,d)=rLr(p,d)(13)
choose d d d that minimize S ( p , d ) S(\textbf{p},d) S(p,d).

To summarize, the cost function finally transforms from a global energy E ( D ) E(D) E(D) into a pixelwize and Dynamic Programming S ( p , d ) S(\textbf{p},d) S(p,d).

Disparity Refinement

This part include:

  1. Intensity Consistent Disparity Check
  2. remove outliers
  3. Discontinuity Preserving Interpolating

Opencv result

input:
在这里插入图片描述
在这里插入图片描述
output:
在这里插入图片描述

Summary

This blog focuses on two main questions.

  1. Apply mutual information as part of the cost. By the way, the mutual information is based on grayscale distributions and the joint distribution is about the corresponding pixels’ grayscale couple(like shown below, I 1 I_1 I1 and I 2 I_2 I2).在这里插入图片描述
  2. Turn the cost function from energy E ( D ) E(D) E(D) to S ( p , d ) S(\textbf{p},d) S(p,d).

Remaining Questions

  1. Dynamic Programming process is not so clear in detail. I do not know how the question turn into a Dynamic Programming. The theory is not clear for me.
  2. Source Code in Opencv is still complex for me. The cost is not mutual information in Opencv(Block). And they spent much on memory processing, which make the code complex. So I first learn BM source code in Opencv, which would help for SGM.

Reference:
Stereo Processing by Semi-Global Matching and Mutual Information

  • 0
    点赞
  • 2
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值