Stereo Processing by Semi-Global Matching and Mutual Information
The Semi-Global Matching (SGM) algorithm was proposed by Hirschmüller in the paper "Stereo Processing by Semi-Global Matching and Mutual Information", published in 2008. SGM uses a pixelwise, Mutual Information (MI) based matching cost to compensate for radiometric differences between the input images. The pixelwise matching is supported by a smoothness constraint that is usually expressed as a global cost function, which SGM approximates quickly by pathwise optimizations from all directions.
The mutual information & cost function
mutual information
Mutual information is defined as:
$$MI_{I_1,I_2}=H_{I_1}+H_{I_2}-H_{I_1,I_2}\tag{1}$$
where
$$H_I=-\int_0^1 P_I(i)\log P_I(i)\,di\tag{2}$$
and
$$H_{I_1,I_2}=-\int_0^1\int_0^1 P_{I_1,I_2}(i_1,i_2)\log P_{I_1,I_2}(i_1,i_2)\,di_1\,di_2\tag{3}$$
Here $P_I(i)$ is the normalized histogram of an image, and $P_{I_1,I_2}$ is the joint distribution of the corresponding points' intensities. Corresponding points, for example pixel $\textbf{p}$ in the left image and pixel $\textbf{q}$ in the right image, are related by $\textbf{q}=e_{bm}(\textbf{p},d)=[\textbf{p}_x-d,\textbf{p}_y]$ with disparity $d$.
$$P_{I_1,I_2}(i,k)=\frac{1}{n}\sum_{\textbf{p}}T[(i,k)=(I_{1p},I_{2p})]\tag{4}$$
Here $T[\cdot]$ is an indicator function: $T[A=B]=1$ if $A=B$, else $T[A=B]=0$.
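As a concrete illustration (my own sketch, not code from the paper), Eqs. 1 and 4 can be evaluated directly from two images whose pixels are assumed to already correspond; the bin count and intensity scaling are arbitrary choices:

```python
import numpy as np

def mutual_information(I1, I2, bins=16):
    """MI of Eq. 1 from the normalized joint histogram of Eq. 4.

    I1, I2: equally sized uint8 images whose pixels are assumed to
    correspond (i.e. I2 has already been warped by a disparity map)."""
    i1 = (I1.astype(np.int64) * bins) // 256   # quantize intensities to bins
    i2 = (I2.astype(np.int64) * bins) // 256
    # P_{I1,I2}(i,k): count intensity pairs, then normalize by pixel count n.
    P12 = np.zeros((bins, bins))
    np.add.at(P12, (i1.ravel(), i2.ravel()), 1.0)
    P12 /= P12.sum()
    P1 = P12.sum(axis=1)             # marginal distribution of I1
    P2 = P12.sum(axis=0)             # marginal distribution of I2
    def H(P):                        # entropy; 0*log(0) treated as 0
        nz = P > 0
        return -np.sum(P[nz] * np.log(P[nz]))
    return H(P1) + H(P2) - H(P12)    # Eq. 1
```

For a perfectly matching pair the joint histogram is diagonal and MI reaches the marginal entropy; for unrelated images it drops toward zero.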
Equation 1 operates on full images and requires the disparity image a priori; both of these properties prevent the direct use of MI as a pixelwise matching cost. However, the calculation of the joint entropy can be transformed into a sum over pixels using a Taylor expansion.
$$H_{I_1,I_2}=\sum_{\textbf{p}}h_{I_1,I_2}(I_{1p},I_{2p})\tag{5}$$
$$h_{I_1,I_2}(i,k)=-\frac{1}{n}\log\big(P_{I_1,I_2}(i,k)\otimes g(i,k)\big)\otimes g(i,k)\tag{6}$$
where $\otimes g(i,k)$ denotes convolution with a 2D Gaussian kernel, i.e. Parzen smoothing of the distribution.
For the overlapping (matching) pixels, the entropy of a single image is
$$H_I=\sum_{\textbf{p}}h_I(I_{p})\tag{7}$$
$$h_I(i)=-\frac{1}{n}\log\big(P_I(i)\otimes g(i)\big)\otimes g(i)\tag{8}$$
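The joint data term of Eq. 6 can be sketched as follows. This is my own minimal version: the kernel width, the zero-padded "same" convolution, and the epsilon guard against $\log 0$ are assumptions the paper does not pin down.

```python
import numpy as np

def gauss1d(sigma=1.0, radius=3):
    """Normalized 1D Gaussian kernel (width chosen arbitrarily here)."""
    x = np.arange(-radius, radius + 1, dtype=np.float64)
    k = np.exp(-x * x / (2.0 * sigma * sigma))
    return k / k.sum()

def h_joint(P12, n, k=None):
    """h_{I1,I2}(i,k) of Eq. 6: Parzen-smooth the joint distribution with a
    Gaussian, take the log, smooth again, and scale by -1/n."""
    if k is None:
        k = gauss1d()
    def smooth(A):
        # Separable 2D Gaussian convolution, zero-padded 'same' size.
        A = np.apply_along_axis(lambda r: np.convolve(r, k, mode='same'), 1, A)
        return np.apply_along_axis(lambda c: np.convolve(c, k, mode='same'), 0, A)
    eps = 1e-12                       # guard against log(0) in empty bins
    return -smooth(np.log(smooth(P12) + eps)) / n
```

Intensity pairs that occur often get a low (cheap) $h$ value, rare pairs a high one, which is exactly what makes Eq. 6 usable as a matching cost.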
With the same idea, the mutual information can be written as
$$MI_{I_1,I_2}=\sum_{\textbf{p}}mi_{I_1,I_2}(I_{1p},I_{2p})\tag{9}$$
with
$$mi_{I_1,I_2}(i,k)=h_{I_1}(i)+h_{I_2}(k)-h_{I_1,I_2}(i,k)\tag{10}$$
To summarize: given the optimal disparity map $D$, the mutual information of the two images is maximized. At the same time, the joint distribution of the corresponding points' intensities concentrates around the diagonal, which means corresponding points share similar intensities.
cost function
The mutual information matching cost is defined as
$$C_{MI}(\textbf{p},d)=-mi_{I_b,f_D(I_m)}(I_{b\textbf{p}},I_{m\textbf{q}})\tag{11}$$
where $f_D(I_m)$ is the match (right) image warped according to the disparity map $D$.
This raises a chicken-and-egg problem: computing the cost requires a disparity map, yet the disparity map is exactly the result we want. The author therefore computes MI iteratively: an initial (e.g. random) disparity image is used to estimate the probability distributions, matching is performed, and the process is repeated, sped up in a hierarchical coarse-to-fine scheme (HMI).
First, because pixelwise cost calculation is generally ambiguous and wrong matches can easily have a lower cost than correct ones due to noise, an additional constraint is added that supports smoothness by penalizing changes of neighboring disparities. This yields an energy $E(D)$ that depends on the disparity image $D$:
$$E(D)=\sum_{\textbf{p}}\Big(C(\textbf{p},D_{\textbf{p}})+\sum_{\textbf{q}\in N_{\textbf{p}}}P_1T[|D_{\textbf{p}}-D_{\textbf{q}}|=1]+\sum_{\textbf{q}\in N_{\textbf{p}}}P_2T[|D_{\textbf{p}}-D_{\textbf{q}}|>1]\Big)\tag{12}$$
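To make Eq. 12 concrete, here is a toy evaluator of $E(D)$ for a given cost volume and disparity map. It is my own sketch: it uses only right/down neighbor pairs (each pair counted once, whereas summing over every $N_{\textbf{p}}$ counts each pair twice), and the $P_1$, $P_2$ values are placeholders.

```python
import numpy as np

def energy(C, D, P1=1.0, P2=8.0):
    """Evaluate the energy of Eq. 12 (simplified 4-neighborhood sketch).

    C: cost volume of shape (H, W, num_disparities), e.g. C_MI values.
    D: integer disparity map of shape (H, W)."""
    H, W, _ = C.shape
    # Data term: matching cost at the chosen disparity of every pixel.
    data = C[np.arange(H)[:, None], np.arange(W)[None, :], D].sum()
    # Smoothness term: P1 for 1-pixel disparity steps, P2 for larger jumps,
    # over horizontal and vertical neighbor pairs.
    smooth = 0.0
    for diff in (np.abs(np.diff(D, axis=0)), np.abs(np.diff(D, axis=1))):
        smooth += P1 * np.sum(diff == 1) + P2 * np.sum(diff > 1)
    return data + smooth
```

A constant disparity map incurs only the data term; every disparity step between neighbors adds $P_1$ (or $P_2$) on top.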
The first term is the MI pixelwise matching cost for the disparities of $D$. The second term adds a constant penalty $P_1$ for all pixels $\textbf{q}$ in the neighborhood $N_{\textbf{p}}$ of $\textbf{p}$ whose disparity differs by exactly 1, and the third term a larger penalty $P_2$ for all larger disparity changes; usually $P_2\gg P_1$. Minimizing the energy $E(D)$ is NP-complete for many discontinuity-preserving energies, so a dynamic-programming approximation is applied that optimizes 1D constraints along paths from all directions to approximate the 2D problem. The aggregated (smoothed) cost $S(\textbf{p},d)$ for a pixel $\textbf{p}$ and disparity $d$ is calculated by summing the costs of all 1D minimum-cost paths that end in pixel $\textbf{p}$ at disparity $d$.
Finally, the path cost $L_{\textbf{r}}(\textbf{p},d)$ in direction $\textbf{r}$ is defined recursively:
$$L_{\textbf{r}}(\textbf{p},d)=C(\textbf{p},d)+\min\big(L_{\textbf{r}}(\textbf{p}-\textbf{r},d),\,L_{\textbf{r}}(\textbf{p}-\textbf{r},d-1)+P_1,\,L_{\textbf{r}}(\textbf{p}-\textbf{r},d+1)+P_1,\,\min_i L_{\textbf{r}}(\textbf{p}-\textbf{r},i)+P_2\big)-\min_k L_{\textbf{r}}(\textbf{p}-\textbf{r},k)$$
The last term subtracts the minimum path cost of the previous pixel; this keeps $L_{\textbf{r}}$ from growing without bound along the path while leaving the position of the minimum unchanged.
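The recursion above is easy to vectorize over a whole cost volume. The sketch below (function and variable names are my own) aggregates one path direction, $\textbf{r}=(0,1)$, i.e. scanning each row left to right:

```python
import numpy as np

def aggregate_left_to_right(C, P1=1.0, P2=8.0):
    """SGM path costs L_r for direction r = (0, 1) over a cost volume
    C of shape (H, W, D); only the recursion follows the text above."""
    H, W, D = C.shape
    L = np.empty_like(C, dtype=np.float64)
    L[:, 0, :] = C[:, 0, :]                        # paths start at the border
    for x in range(1, W):
        prev = L[:, x - 1, :]                      # L_r(p - r, .)
        min_prev = prev.min(axis=1, keepdims=True) # min_k L_r(p - r, k)
        # Shift along the disparity axis for the d-1 / d+1 candidates;
        # out-of-range disparities are excluded via +inf.
        dm = np.full_like(prev, np.inf); dm[:, 1:] = prev[:, :-1]
        dp = np.full_like(prev, np.inf); dp[:, :-1] = prev[:, 1:]
        best = np.minimum(np.minimum(prev, min_prev + P2),
                          np.minimum(dm, dp) + P1)
        L[:, x, :] = C[:, x, :] + best - min_prev
    return L
```

A full SGM pass would compute such a volume for several directions (e.g. by flipping/transposing `C`), sum them into $S$, and take the per-pixel argmin over $d$.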
The aggregated cost is then
$$S(\textbf{p},d)=\sum_{\textbf{r}}L_{\textbf{r}}(\textbf{p},d)\tag{13}$$
and for every pixel $\textbf{p}$ we choose the disparity $d$ that minimizes $S(\textbf{p},d)$.
To summarize, the cost function is transformed from a global energy $E(D)$ into a pixelwise, dynamic-programming-friendly aggregated cost $S(\textbf{p},d)$.
Disparity Refinement
This part includes:
- Intensity Consistent Disparity Check
- Outlier Removal
- Discontinuity Preserving Interpolation
Opencv result
input: (left/right image pair, omitted)
output: (disparity map, omitted)
Summary
This blog focuses on two main questions.
- How to apply mutual information as part of the matching cost. Note that the mutual information is based on grayscale distributions, and the joint distribution is taken over the corresponding pixels' grayscale pairs ($I_1$, $I_2$).
- How to turn the cost function from the global energy $E(D)$ into the aggregated cost $S(\textbf{p},d)$.
Remaining Questions
- The Dynamic Programming process is not yet clear to me in detail; I do not fully understand how the problem turns into a dynamic program.
- The source code in OpenCV is still complex for me. The cost used by OpenCV is not mutual information (its block-based SGBM implementation uses a Birchfield–Tomasi cost instead), and much of the code is devoted to memory management, which makes it hard to read. So I will first study the BM (block matching) source code in OpenCV, which should help with understanding SGM.
Reference:
Hirschmüller, H., "Stereo Processing by Semi-Global Matching and Mutual Information," IEEE Transactions on Pattern Analysis and Machine Intelligence, 30(2), 2008.