A. Levenberg-Marquardt算法
待估计的模型参数 x = [ x 1 , x 2 , ⋯ , x n ] T \mathbf{x}=[x_1, x_2, \cdots,x_n]^T x=[x1,x2,⋯,xn]T
误差函数 f i ( x ) f_i( \mathbf{x}) fi(x): 表示第 i i i个观测值与对应的模型的估计值的差
m m m:观测值的个数
目标函数 F ( x ) = 1 2 ∑ i = 1 m ( f i ( x ) ) 2 F(\mathbf{x}) = \frac{1}{2} \displaystyle \sum^{m}_{i=1}(f_i(\mathbf{x}))^2 F(x)=21i=1∑m(fi(x))2
最小二乘问题可以描述为
x
∗
=
a
r
g
m
i
n
x
{
F
(
x
)
}
\mathbf{x}^* = \underset{\mathbf{x}}{argmin}\{ F(\mathbf{x}) \}
x∗=xargmin{F(x)}
一些用到的变量和公式
f
(
x
)
=
[
f
1
(
x
)
,
⋯
,
f
m
(
x
)
]
T
F
(
x
)
=
1
2
f
T
(
x
)
f
(
x
)
=
1
2
∣
∣
f
(
x
)
∣
∣
2
f
′
(
x
)
=
J
(
x
)
=
[
∂
f
1
(
x
)
∂
x
1
⋯
∂
f
1
(
x
)
∂
x
n
⋮
⋱
⋮
∂
f
m
(
x
)
∂
x
1
⋯
∂
f
m
(
x
)
∂
x
n
]
F
′
(
x
)
=
f
′
T
(
x
)
f
(
x
)
=
J
T
(
x
)
f
(
x
)
\mathbf{f}(\mathbf{x})=[f_1(\mathbf{x}), \cdots,f_m(\mathbf{x})]^T \\ F(\mathbf{x}) = \displaystyle \frac{1}{2} \mathbf{f}^T(\mathbf{x}) \mathbf{f}(\mathbf{x}) = \frac{1}{2} ||\mathbf{f}(\mathbf{x})||^2 \\ \mathbf{f}^{'}(\mathbf{x}) = \mathbf{J}(\mathbf{x}) = \displaystyle \begin{bmatrix} \frac{\partial f_1(\mathbf{x})}{\partial x_1} &\cdots &\frac{\partial f_1(\mathbf{x})}{\partial x_n} \\ \vdots &\ddots &\vdots \\ \frac{\partial f_m(\mathbf{x})}{\partial x_1} &\cdots &\frac{\partial f_m(\mathbf{x})}{\partial x_n} \\ \end{bmatrix} \\ F^{'}(\mathbf{x}) = \mathbf{f}^{'T}(\mathbf{x})\mathbf{f}(\mathbf{x}) = \mathbf{J}^T(\mathbf{x})\mathbf{f}(\mathbf{x}) \\
f(x)=[f1(x),⋯,fm(x)]TF(x)=21fT(x)f(x)=21∣∣f(x)∣∣2f′(x)=J(x)=
∂x1∂f1(x)⋮∂x1∂fm(x)⋯⋱⋯∂xn∂f1(x)⋮∂xn∂fm(x)
F′(x)=f′T(x)f(x)=JT(x)f(x)
f
(
x
)
\mathbf{f}(\mathbf{x})
f(x)在
x
\mathbf{x}
x处一阶泰勒展开
f
(
x
+
h
)
≈
f
(
x
)
+
J
(
x
)
h
where
h
sufficiently small
\displaystyle \mathbf{f}(\mathbf{x} + \mathbf{h}) \approx \mathbf{f}(\mathbf{x}) + \mathbf{J}(\mathbf{x}) \mathbf{h}\\ \text{where} \ \mathbf{h}\ \text{sufficiently small}
f(x+h)≈f(x)+J(x)hwhere h sufficiently small
于是有
F
(
x
+
h
)
=
1
2
f
T
(
x
+
h
)
f
(
x
+
h
)
≈
1
2
(
f
(
x
)
+
J
(
x
)
h
)
T
(
f
(
x
)
+
J
(
x
)
h
)
=
F
(
x
)
+
h
T
J
T
(
x
)
f
(
x
)
+
1
2
h
T
J
T
(
x
)
J
(
x
)
h
where
h
sufficiently small
\displaystyle \begin{align} F(\mathbf{x} + \mathbf{h}) &= \frac{1}{2} \mathbf{f}^T(\mathbf{x} + \mathbf{h}) \mathbf{f}(\mathbf{x}+ \mathbf{h}) \\ &\approx \frac{1}{2} \left( \mathbf{f}(\mathbf{x}) + \mathbf{J}(\mathbf{x}) \mathbf{h} \right)^T \left( \mathbf{f}(\mathbf{x}) + \mathbf{J}(\mathbf{x}) \mathbf{h} \right) \\ &= F(\mathbf{x}) + \mathbf{h}^T \mathbf{J}^T(\mathbf{x}) \mathbf{f}(\mathbf{x}) + \frac{1}{2} \mathbf{h}^T \mathbf{J}^T(\mathbf{x}) \mathbf{J}(\mathbf{x}) \mathbf{h} \\ \end{align} \\ \text{where} \ \mathbf{h}\ \text{sufficiently small}
F(x+h)=21fT(x+h)f(x+h)≈21(f(x)+J(x)h)T(f(x)+J(x)h)=F(x)+hTJT(x)f(x)+21hTJT(x)J(x)hwhere h sufficiently small
令
L
(
h
)
=
F
(
x
)
+
h
T
J
T
(
x
)
f
(
x
)
+
1
2
h
T
J
T
(
x
)
J
(
x
)
h
L(\mathbf{h}) = F(\mathbf{x}) + \mathbf{h}^T \mathbf{J}^T(\mathbf{x}) \mathbf{f}(\mathbf{x}) + \frac{1}{2} \mathbf{h}^T \mathbf{J}^T(\mathbf{x}) \mathbf{J}(\mathbf{x}) \mathbf{h}
L(h)=F(x)+hTJT(x)f(x)+21hTJT(x)J(x)h
每一次的迭代的基本思路是,在
x
\mathbf{x}
x处求
h
l
m
=
argmin
h
{
L
(
h
)
+
P
(
μ
,
h
)
}
\mathbf{h}_{lm} = \underset{\mathbf{h}}{\text{argmin}} \{L(\mathbf{h}) + P(\mu,\mathbf{h}) \}
hlm=hargmin{L(h)+P(μ,h)}
其中
P
(
μ
,
h
)
P(\mathbf{\mu,h})
P(μ,h)是惩罚项,是为了使得上式求得的
h
l
m
\mathbf{h}_{lm}
hlm偏小些,并且可以通过调大
μ
\mu
μ来获得更小的
h
l
m
\mathbf{h}_{lm}
hlm,或者调小
μ
\mu
μ来获得更大的
h
l
m
\mathbf{h}_{lm}
hlm。常见的
P
(
μ
,
h
)
P(\mu,\mathbf{h})
P(μ,h)的形式有:
P
(
μ
,
h
)
=
1
2
μ
h
T
h
P(\mu, \mathbf{h}) = \frac{1}{2} \mu \mathbf{h}^T\mathbf{h}
P(μ,h)=21μhTh
A.1 求解单应性矩阵的LM算法
n
=
8
n = 8
n=8,
x
\mathbf{x}
x对应的是单应性矩阵里待解的8个未知量:
z
⋅
[
u
v
1
]
=
[
x
1
x
2
x
3
x
4
x
5
x
6
x
7
x
8
1
]
⋅
[
a
b
1
]
⇒
{
u
=
x
1
a
+
x
2
b
+
x
3
x
7
a
+
x
8
b
+
1
v
=
x
4
a
+
x
5
b
+
x
6
x
7
a
+
x
8
b
+
1
z \cdot \begin{bmatrix} u \\ v \\ 1 \\ \end{bmatrix} = \begin{bmatrix} x_1 &x_2 &x_3 \\ x_4 &x_5 &x_6 \\ x_7 &x_8 &1 \\ \end{bmatrix} \cdot \begin{bmatrix} a \\ b \\ 1 \\ \end{bmatrix} \\ \Rightarrow \begin{cases} \displaystyle u = \frac{x_1 a + x_2 b + x_3}{x_7 a + x_8 b + 1} \\ \displaystyle v = \frac{x_4 a + x_5 b + x_6}{x_7 a + x_8 b + 1} \\ \end{cases}
z⋅
uv1
=
x1x4x7x2x5x8x3x61
⋅
ab1
⇒⎩
⎨
⎧u=x7a+x8b+1x1a+x2b+x3v=x7a+x8b+1x4a+x5b+x6
假设有
s
s
s个点对,那么
m
=
2
⋅
s
m=2 \cdot s
m=2⋅s,因为一个点对产生两个误差函数,即对于第
i
i
i个点对,产生
f
2
i
−
1
(
x
)
=
x
1
a
i
+
x
2
b
i
+
x
3
x
7
a
i
+
x
8
b
i
+
1
−
u
i
f
2
i
(
x
)
=
x
4
a
i
+
x
5
b
i
+
x
6
x
7
a
i
+
x
8
b
i
+
1
−
v
i
f_{2i-1}(\mathbf{x}) = \frac{x_1 a_i + x_2 b_i + x_3}{x_7 a_i + x_8 b_i + 1} - u_i \\ f_{2i}(\mathbf{x}) = \frac{x_4 a_i + x_5 b_i + x_6}{x_7 a_i + x_8 b_i + 1} - v_i \\
f2i−1(x)=x7ai+x8bi+1x1ai+x2bi+x3−uif2i(x)=x7ai+x8bi+1x4ai+x5bi+x6−vi
对应的一阶导数是
f
2
i
−
1
′
(
x
)
=
[
a
i
x
7
a
i
+
x
8
b
i
+
1
,
b
i
x
7
a
i
+
x
8
b
i
+
1
,
1
x
7
a
i
+
x
8
b
i
+
1
,
0
,
0
,
0
,
−
a
i
(
x
1
a
i
+
x
2
b
i
+
x
3
)
(
x
7
a
i
+
x
8
b
i
+
1
)
2
,
−
b
i
(
x
1
a
i
+
x
2
b
i
+
x
3
)
(
x
7
a
i
+
x
8
b
i
+
1
)
2
]
T
f
2
i
′
(
x
)
=
[
0
,
0
,
0
,
a
i
x
7
a
i
+
x
8
b
i
+
1
,
b
i
x
7
a
i
+
x
8
b
i
+
1
,
1
x
7
a
i
+
x
8
b
i
+
1
,
−
a
i
(
x
4
a
i
+
x
5
b
i
+
x
6
)
(
x
7
a
i
+
x
8
b
i
+
1
)
2
,
−
b
i
(
x
4
a
i
+
x
5
b
i
+
x
6
)
(
x
7
a
i
+
x
8
b
i
+
1
)
2
]
T
f^{'}_{2i-1}(\mathbf{x}) = \left[ \frac{a_i}{x_7 a_i + x_8 b_i + 1},\frac{b_i}{x_7 a_i + x_8 b_i + 1},\frac{1}{x_7 a_i + x_8 b_i + 1},0,0,0,-\frac{a_i(x_1 a_i + x_2 b_i + x_3)}{(x_7 a_i + x_8 b_i + 1)^2},-\frac{b_i(x_1 a_i + x_2 b_i + x_3)}{(x_7 a_i + x_8 b_i + 1)^2} \right]^T \\ f^{'}_{2i}(\mathbf{x}) = \left[ 0,0,0,\frac{a_i}{x_7 a_i + x_8 b_i + 1},\frac{b_i}{x_7 a_i + x_8 b_i + 1},\frac{1}{x_7 a_i + x_8 b_i + 1},-\frac{a_i(x_4 a_i + x_5 b_i + x_6)}{(x_7 a_i + x_8 b_i + 1)^2},-\frac{b_i(x_4 a_i + x_5 b_i + x_6)}{(x_7 a_i + x_8 b_i + 1)^2} \right]^T \\
f2i−1′(x)=[x7ai+x8bi+1ai,x7ai+x8bi+1bi,x7ai+x8bi+11,0,0,0,−(x7ai+x8bi+1)2ai(x1ai+x2bi+x3),−(x7ai+x8bi+1)2bi(x1ai+x2bi+x3)]Tf2i′(x)=[0,0,0,x7ai+x8bi+1ai,x7ai+x8bi+1bi,x7ai+x8bi+11,−(x7ai+x8bi+1)2ai(x4ai+x5bi+x6),−(x7ai+x8bi+1)2bi(x4ai+x5bi+x6)]T
这里
P
(
μ
,
h
)
P(\mu, \mathbf{h})
P(μ,h)的形式如下所示
P
(
μ
,
h
)
=
1
2
μ
h
T
W
h
P(\mu, \mathbf{h}) = \frac{1}{2} \mu \mathbf{h}^T \mathbf{W} \mathbf{h}
P(μ,h)=21μhTWh
W \mathbf{W} W是一个已知的对角矩阵,而且对角线元素都是大于0的。
令
ψ
(
h
)
=
L
(
h
)
+
1
2
μ
h
T
W
h
\psi(\mathbf{h}) = L(\mathbf{h}) + \frac{1}{2} \mu \mathbf{h}^T \mathbf{W} \mathbf{h}
ψ(h)=L(h)+21μhTWh
求解上式的最小值,即令
ψ
′
(
h
)
=
J
T
(
x
)
f
(
x
)
+
(
J
T
(
x
)
J
(
x
)
+
μ
W
)
h
=
0
\psi^{'}(\mathbf{h}) = \mathbf{J}^T(\mathbf{x}) \mathbf{f}(\mathbf{x}) + \left( \mathbf{J}^T(\mathbf{x}) \mathbf{J}(\mathbf{x}) + \mu \mathbf{W} \right) \mathbf{h} = 0
ψ′(h)=JT(x)f(x)+(JT(x)J(x)+μW)h=0
于是可得
h
l
m
=
−
(
A
+
μ
W
)
−
1
g
A
=
J
T
(
x
)
J
(
x
)
g
=
J
T
(
x
)
f
(
x
)
\mathbf{h}_{lm} = - \left( \mathbf{A} + \mu \mathbf{W} \right)^{-1} \mathbf{g} \\ \mathbf{A} = \mathbf{J}^T(\mathbf{x}) \mathbf{J}(\mathbf{x}) \\ \mathbf{g} = \mathbf{J}^T(\mathbf{x}) \mathbf{f}(\mathbf{x})
hlm=−(A+μW)−1gA=JT(x)J(x)g=JT(x)f(x)
整个算法如下所示:
已知: x 0 \mathbf{x}_0 x0, ξ 1 = 1 e − 15 \xi_1=1e^{-15} ξ1=1e−15, ξ 2 = 1 e − 15 \xi_2=1e^{-15} ξ2=1e−15, k m a x k_{\mathrm{max}} kmax
解法:
begin
k = 0 k = 0 k=0
x = x 0 \mathbf{x} = \mathbf{x}_0 x=x0
A = J T ( x ) J ( x ) ; g = J T ( x ) f ( x ) \mathbf{A} = \mathbf{J}^T(\mathbf{x}) \mathbf{J}(\mathbf{x});\quad \mathbf{g} = \mathbf{J}^T(\mathbf{x}) \mathbf{f}(\mathbf{x}) A=JT(x)J(x);g=JT(x)f(x)
W = d i a g ( [ a 11 , a 22 , ⋯ , a n n ] ) ; μ = 1 ; τ = 0.75 \mathbf{W} = diag([a_{11},a_{22}, \cdots,a_{nn}]); \quad \mu = 1; \quad \tau = 0.75 W=diag([a11,a22,⋯,ann]);μ=1;τ=0.75
found = $ \left(||\mathbf{f}(\mathbf{x})|| < \xi_2 \right)$
while (not found) and ( k < k m a x k < k_{\mathrm{max}} k<kmax)
k = k + 1 k = k + 1 k=k+1
Solve ( A + μ W ) h l m = − g \left( \mathbf{A} + \mu \mathbf{W} \right) \mathbf{h}_{lm} = -\mathbf{g} (A+μW)hlm=−g
x n e w = x + h l m \mathbf{x}_{\mathrm{new}} = \mathbf{x} + \mathbf{h}_{lm} xnew=x+hlm
ρ = F ( x ) − F ( x n e w ) L ( 0 ) − L ( h l m ) \rho = \displaystyle \frac{F(\mathbf{x}) - F(\mathbf{x}_{\mathrm{new}})}{L(\mathbf{0}) - L(\mathbf{h}_{lm})} ρ=L(0)−L(hlm)F(x)−F(xnew)
if ρ > 0.75 \rho > 0.75 ρ>0.75
μ = μ / 2 \mu = \mu /2 μ=μ/2
if μ < τ \mu < \tau μ<τ
μ = 0 \mu = 0 μ=0
elseif ρ < 0.25 \rho < 0.25 ρ<0.25
ν = m i n { m a x { 2 × ( 1 − F ( x n e w ) − F ( x ) h l m T g ) , 2 } , 10 } \nu = \mathrm{min} \left\{ \mathrm{max} \left\{ 2 \times \left( 1 - \displaystyle \frac{F(\mathbf{x}_{\mathrm{new}}) - F(\mathbf{x})}{\mathbf{h}^T_{lm} \mathbf{g}} \right),2 \right\},10 \right\} ν=min{max{2×(1−hlmTgF(xnew)−F(x)),2},10}
if μ = = 0 \mu == 0 μ==0
B = A − 1 \mathbf{B} = \mathbf{A}^{-1} B=A−1
t e m p = m a x { [ b 11 , ⋯ , b n n ] } temp = \mathrm{max}\{ [b_{11},\cdots,b_{nn}] \} temp=max{[b11,⋯,bnn]}
μ = 1 / t e m p ; τ = 1 / t e m p \mu = 1/temp; \quad \tau = 1/temp μ=1/temp;τ=1/temp
ν = ν / 2 \nu = \nu / 2 ν=ν/2
μ = μ ⋅ ν \mu = \mu \cdot \nu μ=μ⋅ν
if F ( x n e w ) < F ( x ) F(\mathbf{x}_{\mathrm{new}}) < F(\mathbf{x}) F(xnew)<F(x)
x = x n e w \mathbf{x} = \mathbf{x}_{\mathrm{new}} x=xnew
A = J T ( x ) J ( x ) ; g = J T ( x ) f ( x ) \mathbf{A} = \mathbf{J}^T(\mathbf{x}) \mathbf{J}(\mathbf{x});\quad \mathbf{g} = \mathbf{J}^T(\mathbf{x}) \mathbf{f}(\mathbf{x}) A=JT(x)J(x);g=JT(x)f(x)
found = $ \left(||\mathbf{h}_{lm}|| < \xi_1 \ \mathrm{or} \ ||\mathbf{f}(\mathbf{x})|| < \xi_2 \right)$
end