本博客参考了添加链接描述这篇知乎
先看我这篇博客介绍添加链接描述
Q
k
R
k
−
1
Q_k R_{k}^{-1}
QkRk−1的处理
假设
D
k
=
[
d
1
,
d
2
,
…
,
d
k
]
=
Q
k
R
k
−
1
D_k = [d_1,d_2, \ldots, d_k] = Q_k R_{k}^{-1}
Dk=[d1,d2,…,dk]=QkRk−1,假设
R
k
R_k
Rk的第
i
i
i行第
j
j
j列元素为
r
i
,
j
r_{i,j}
ri,j,
{
R
k
+
1
,
k
=
(
r
0
,
0
r
0
,
1
r
0
,
2
⋯
⋯
0
0
r
1
,
1
r
1
,
2
r
1
,
3
⋯
0
0
0
⋯
⋯
∗
∗
0
0
⋯
⋯
0
r
k
−
1
,
k
−
1
0
0
⋯
0
0
0
)
=
(
R
k
0
T
)
R
k
=
(
r
0
,
0
r
0
,
1
r
0
,
2
⋯
⋯
0
0
r
1
,
1
r
1
,
2
r
1
,
3
⋯
0
0
0
⋯
⋯
∗
∗
0
0
⋯
⋯
0
r
k
−
1
,
k
−
1
)
\left\{\begin{aligned} & R_{k + 1,k}=\left(\begin{array}{cccccc} r_{0,0} & r_{0,1} & r_{0,2} & \cdots & \cdots & 0\\ 0 & r_{1,1} & r_{1,2} & r_{1,3} & \cdots & 0 \\ 0 & 0 & \cdots & \cdots & *& *\\ 0 & 0 & \cdots & \cdots & 0 & r_{k - 1,k - 1}\\ 0 & 0 & \cdots & 0 & 0 & 0\\ \end{array}\right) =\left(\begin{array}{c} R_{k} \\ 0^T \\ \end{array}\right) \\ & R_{k}=\left(\begin{array}{cccccc} r_{0,0} & r_{0,1} & r_{0,2} & \cdots & \cdots & 0\\ 0 & r_{1,1} & r_{1,2} & r_{1,3} & \cdots & 0 \\ 0 & 0 & \cdots & \cdots & *& *\\ 0 & 0 & \cdots & \cdots & 0 & r_{k - 1,k - 1}\\ \end{array}\right) \end{aligned}\right.
⎩⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎨⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎧Rk+1,k=⎝⎜⎜⎜⎜⎛r0,00000r0,1r1,1000r0,2r1,2⋯⋯⋯⋯r1,3⋯⋯0⋯⋯∗0000∗rk−1,k−10⎠⎟⎟⎟⎟⎞=(Rk0T)Rk=⎝⎜⎜⎛r0,0000r0,1r1,100r0,2r1,2⋯⋯⋯r1,3⋯⋯⋯⋯∗000∗rk−1,k−1⎠⎟⎟⎞
则根据
D
k
R
k
=
Q
k
D_k R_k = Q_k
DkRk=Qk,可以得到以下公式。
{
r
0
,
0
d
1
=
q
1
,
r
0
,
1
d
1
+
r
1
,
1
d
2
=
q
2
,
∑
s
=
0
i
−
1
r
s
,
i
−
1
d
s
+
1
=
q
i
,
i
=
3
,
4
,
5
,
…
k
−
1
,
\left\{\begin{aligned} & r_{0,0} d_1 = q_1, \\ & r_{0,1} d_1 + r_{1,1} d_2 = q_2,\\ & \sum_{s = 0}^{i - 1} r_{s,i - 1} d_{s + 1} = q_i,i = 3,4,5,\ldots k - 1,\\ \end{aligned}\right.
⎩⎪⎪⎪⎪⎨⎪⎪⎪⎪⎧r0,0d1=q1,r0,1d1+r1,1d2=q2,s=0∑i−1rs,i−1ds+1=qi,i=3,4,5,…k−1,
从上述公式反解出
d
i
d_i
di关于
q
i
q_i
qi的表达式如下:
{
d
1
=
q
1
r
0
,
0
,
d
2
=
(
q
2
−
r
0
,
1
d
1
)
/
r
1
,
1
,
d
i
=
(
q
i
−
r
i
−
3
,
i
−
1
d
i
−
2
−
r
i
−
2
,
i
−
1
d
i
−
1
)
/
r
i
−
1
,
i
−
1
,
\left\{\begin{aligned} & d_1 = \frac{q_1}{r_{0,0}}, \\ & d_2 = (q_2 - r_{0,1} d_1)/r_{1,1},\\ & d_i = (q_i - r_{i - 3, i - 1} d_{i - 2} - r_{i - 2,i - 1} d_{i - 1})/r_{i - 1,i - 1},\\ \end{aligned}\right.
⎩⎪⎪⎪⎨⎪⎪⎪⎧d1=r0,0q1,d2=(q2−r0,1d1)/r1,1,di=(qi−ri−3,i−1di−2−ri−2,i−1di−1)/ri−1,i−1,
由于
Q
k
=
[
Q
k
−
1
,
q
k
]
Q_k = [Q_{k - 1},q_k]
Qk=[Qk−1,qk]和
R
k
−
1
R_{k-1}
Rk−1是
R
k
R_k
Rk的
(
k
−
1
)
(k-1)
(k−1)阶顺序主子阵,可以得到
{
D
k
=
Q
k
R
k
−
1
=
[
Q
k
−
1
,
q
k
]
[
R
k
−
1
−
1
∗
∗
]
=
[
D
k
−
1
,
d
k
]
\left\{\begin{aligned} D_{k} &= Q_k R_{k}^{-1} \\ & = \left[Q_{k-1}, q_k\right]\left[\begin{array}{ll} R_{k-1}^{-1} & * \\ & * \end{array}\right]\\ & =\left[D_{k-1}, d_k\right] \end{aligned}\right.
⎩⎪⎪⎪⎨⎪⎪⎪⎧Dk=QkRk−1=[Qk−1,qk][Rk−1−1∗∗]=[Dk−1,dk]
a
V
k
+
1
,
k
T
e
1
a V_{k+1,k}^T e_1
aVk+1,kTe1的处理
由于
V
k
+
1
=
[
V
k
+
1
,
k
,
v
k
+
1
]
V_{k + 1} = [V_{k + 1,k},v_{k + 1}]
Vk+1=[Vk+1,k,vk+1],根据下面这个公式可以有:
a
V
k
+
1
T
e
1
=
(
V
k
+
1
,
k
T
v
k
+
1
T
)
(
a
e
1
)
=
(
V
k
+
1
,
k
T
(
a
e
1
)
v
k
+
1
T
(
a
e
1
)
)
=
(
η
(
k
)
η
^
k
+
1
)
a V_{k + 1}^T e_1=\left(\begin{array}{c} V_{k + 1,k}^T \\ v_{k + 1}^T \\ \end{array}\right) (ae_1)=\left(\begin{array}{c} V_{k + 1,k}^T (ae_1) \\ v_{k + 1}^T (ae_1)\\ \end{array}\right)=\left(\begin{array}{c} \eta^{(k)} \\ \hat{\eta}_{k+ 1}\\ \end{array}\right)
aVk+1Te1=(Vk+1,kTvk+1T)(ae1)=(Vk+1,kT(ae1)vk+1T(ae1))=(η(k)η^k+1)
也就是说
a
V
k
+
1
,
k
T
e
1
a V_{k+1,k}^T e_1
aVk+1,kTe1和
a
V
k
+
1
T
e
1
aV_{k + 1}^T e_1
aVk+1Te1的前
k
k
k个元素相同,我们记
η
(
k
)
=
a
V
k
+
1
,
k
T
e
1
\eta^{(k)} = a V_{k+1,k}^T e_1
η(k)=aVk+1,kTe1,修改以后如上所示,因此转换完以后的二范数问题最优解\eqref{yk}的解是
y
k
=
a
R
k
−
1
V
k
+
1
,
k
T
e
1
=
R
k
−
1
η
(
k
)
.
y^k = a R_{k}^{-1} V_{k+1,k}^T e_1 = R_{k}^{-1} \eta^{(k)}.
yk=aRk−1Vk+1,kTe1=Rk−1η(k).
由于对
T
k
+
1
,
k
T_{k+1,k}
Tk+1,k做QR分解的时候,利用Givens变换处理,根据上面推导过程得到的
V
k
V_{k}
Vk其实也是
V
k
+
1
V_{k+1}
Vk+1的
k
k
k阶顺序主子阵,因此
V
k
+
1
T
e
1
V_{k + 1}^T e_1
Vk+1Te1的前
k
−
1
k - 1
k−1个元素相同。
V
k
+
1
=
[
V
k
0
0
1
]
V_{k+1}=\left[\begin{array}{cc} V_k & 0 \\ 0 & 1 \end{array}\right]
Vk+1=[Vk001]
也就是说
η
(
k
)
=
[
η
(
k
−
1
)
,
η
k
]
\eta^{(k)} = [\eta^{(k - 1)},\eta_k]
η(k)=[η(k−1),ηk],最后一直推导就有
η
(
k
)
=
[
η
1
,
η
2
,
…
,
η
k
]
\eta^{(k)} = [\eta_1,\eta_2,\ldots,\eta_k]
η(k)=[η1,η2,…,ηk]
\subsection{
x
k
x^k
xk计算公式和残量计算}
∙
\bullet
∙ \quad 最后得到下面这个递推公式,其中
d
k
d_k
dk可以通过公式来递推得到,
{
x
k
=
x
0
+
Q
k
(
a
R
k
−
1
V
k
+
1
,
k
T
e
1
)
=
x
0
+
D
k
η
(
k
)
=
x
0
+
[
D
k
−
1
,
d
k
]
[
η
(
k
−
1
)
η
k
]
=
x
0
+
D
k
−
1
η
(
k
−
1
)
+
η
k
d
k
=
x
k
−
1
+
η
k
d
k
.
\left\{\begin{aligned} x^k &= x^0 + Q_k (a R_{k}^{-1} V_{k+1,k}^T e_1) \\ &= x^0 + D_k \eta^{(k)} \\ &= x^0 + \left[D_{k-1}, d_k\right]\left[\begin{array}{c} \eta^{(k-1)} \\ \eta_k \end{array}\right] \\ & = x^0 + D_{k - 1} \eta^{(k-1)} + \eta_k d_k \\ &= x^{k - 1} + \eta_k d_k. \end{aligned}\right.
⎩⎪⎪⎪⎪⎪⎪⎪⎪⎪⎨⎪⎪⎪⎪⎪⎪⎪⎪⎪⎧xk=x0+Qk(aRk−1Vk+1,kTe1)=x0+Dkη(k)=x0+[Dk−1,dk][η(k−1)ηk]=x0+Dk−1η(k−1)+ηkdk=xk−1+ηkdk.
而根据上面的信息,我们知道
η
k
\eta_k
ηk是
a
V
k
+
1
T
e
1
a V_{k + 1}^T e_1
aVk+1Te1的第
k
k
k个分量,根据下面的公式发现只需要将对应的Givens变换作用在
(
a
e
1
)
(ae1)
(ae1)即可得到
a
V
k
+
1
T
e
1
a V_{k + 1}^T e_1
aVk+1Te1。
a
V
k
+
1
T
e
1
=
V
k
+
1
T
(
a
e
1
)
=
(
G
k
G
~
k
−
1
⋯
G
~
1
)
⊤
(
a
e
1
)
.
a V_{k + 1}^T e_1 = V_{k + 1}^T (ae1) = \left(G_k \tilde{G}_{k-1} \cdots \tilde{G}_1\right)^{\top} (ae1).
aVk+1Te1=Vk+1T(ae1)=(GkG~k−1⋯G~1)⊤(ae1).
∙
\bullet
∙ \quad 残量的计算如下所示,即残量最终是等式
a
V
k
+
1
e
1
aV_{k+1}e_1
aVk+1e1的最后一个分量的绝对值:
{
∥
r
k
∥
2
=
∥
A
x
k
−
b
∥
2
=
∥
a
q
1
−
Q
k
+
1
T
k
+
1
,
k
y
k
∥
2
=
∥
Q
k
+
1
(
a
e
1
−
T
k
+
1
,
k
y
k
)
∥
2
=
∥
(
a
e
1
−
V
k
+
1
R
k
+
1
,
k
y
k
)
∥
2
=
∥
V
k
+
1
(
V
k
+
1
T
a
e
1
−
R
k
+
1
,
k
y
k
)
∥
2
=
∥
(
V
k
+
1
T
a
e
1
−
R
k
+
1
,
k
y
k
)
∥
2
=
∥
[
η
(
k
)
η
^
k
+
1
]
−
[
R
k
y
k
0
]
∥
2
=
∣
η
^
k
+
1
∣
\left\{\begin{aligned} \|r_k\|_2 &= \|Ax^k - b\|_2 \\ &= \|aq_1 - Q_{k+1} T_{k+1,k} y^k\|_2 \\ &= \|Q_{k+1} (ae_1 - T_{k+1,k} y^k)\|_2 \\ &= \|(ae_1 - V_{k+1} R_{k+1,k} y^k)\|_2 \\ &= \|V_{k+1} (V_{k+1}^T ae_1 - R_{k+1,k} y^k)\|_2 \\ &= \|(V_{k+1}^T ae_1 - R_{k+1,k} y^k)\|_2 \\ &= \left\|\left[\begin{array}{c} \eta^{(k)} \\ \hat{\eta}_{k+1} \end{array}\right]-\left[\begin{array}{c} R_k y_k \\ 0 \end{array}\right]\right\|_2 \\ & = |\hat{\eta}_{k+1}| \end{aligned}\right.
⎩⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎨⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎧∥rk∥2=∥Axk−b∥2=∥aq1−Qk+1Tk+1,kyk∥2=∥Qk+1(ae1−Tk+1,kyk)∥2=∥(ae1−Vk+1Rk+1,kyk)∥2=∥Vk+1(Vk+1Tae1−Rk+1,kyk)∥2=∥(Vk+1Tae1−Rk+1,kyk)∥2=∥∥∥∥[η(k)η^k+1]−[Rkyk0]∥∥∥∥2=∣η^k+1∣
算法流程:前三次迭代过程解析
为了更方便理解算法流程,我们先考虑
x
k
,
k
=
1
,
2
,
3
x^k,k =1,2,3
xk,k=1,2,3的更新过程。
\newpage%------------------------
∙
x
3
=
x
2
+
η
3
d
3
\bullet \quad x^3 = x^2 + \eta_3 d_3
∙x3=x2+η3d3
⇒ d 3 = ( q 3 − r 2 , 3 d 2 − r 1 , 3 d 1 ) / r 2 , 2 , q 3 = ( A q 2 − β 1 q 1 ) / β 2 \Rightarrow \quad d_3 = (q_3 - r_{2,3} d_2 - r_{1,3} d_1)/r_{2,2},q_3 = (Aq_2 - \beta_1 q_1)/\beta_2 ⇒d3=(q3−r2,3d2−r1,3d1)/r2,2,q3=(Aq2−β1q1)/β2
⇒ T 4 , 3 = ( T 3 β 3 e 3 T ) = ( α 1 β 1 0 β 1 α 2 β 2 0 β 2 α 3 0 0 β 3 ) \Rightarrow \quad T_{4,3} =\left(\begin{array}{c} T_3 \\ \beta_3 e_3^T \end{array}\right)=\left(\begin{array}{ccc} \alpha_1 & \beta_1 & 0\\ \beta_1 & \alpha_2 & \beta_2 \\ 0 & \beta_2 & \alpha_3 \\ 0 & 0 & \beta_3 \\ \end{array}\right) ⇒T4,3=(T3β3e3T)=⎝⎜⎜⎛α1β100β1α2β200β2α3β3⎠⎟⎟⎞
⇒ α 3 = ( q 3 , A q 3 ) , β 3 = ∥ A q 3 − β 2 q 2 − α 3 q 3 ∥ 2 \Rightarrow \quad \alpha_3 = (q_3,Aq_3),\beta_3 = \|Aq_3 - \beta_2 q_2 - \alpha_3 q_3 \|_2 ⇒α3=(q3,Aq3),β3=∥Aq3−β2q2−α3q3∥2
先用Givens变换 G ~ 1 T 4 , 3 \tilde{G}_1 T_{4,3} G~1T4,3
⇒ [ c 1 s 1 0 0 − s 1 c 1 0 0 0 0 1 0 0 0 0 1 ] ( α 1 β 1 0 β 1 α 2 β 2 0 β 2 α 3 0 0 β 3 ) = [ r 0 , 0 r ^ 0 , 1 r ^ 0 , 2 ( 1 ) 0 r ^ 1 , 1 r ^ 1 , 2 ( 1 ) 0 β 2 α 3 0 0 β 3 ] \Rightarrow \quad \left[\begin{array}{cccc} c_{1} & s_{1} & 0 & 0\\ -s_{1} & c_{1} & 0 & 0\\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{array}\right] \left(\begin{array}{ccc} \alpha_1 & \beta_1 & 0\\ \beta_1 & \alpha_2 & \beta_2 \\ 0 & \beta_2 & \alpha_3 \\ 0 & 0 & \beta_3 \\ \end{array}\right) = \left[\begin{array}{ccc} r_{0,0} & \hat{r}_{0,1} & \hat{r}_{0,2}^{(1)}\\ 0 & \hat{r}_{1,1} & \hat{r}_{1,2}^{(1)}\\ 0 & \beta_2 & \alpha_3\\ 0 & 0 & \beta_{3} \end{array}\right] ⇒⎣⎢⎢⎡c1−s100s1c10000100001⎦⎥⎥⎤⎝⎜⎜⎛α1β100β1α2β200β2α3β3⎠⎟⎟⎞=⎣⎢⎢⎡r0,0000r^0,1r^1,1β20r^0,2(1)r^1,2(1)α3β3⎦⎥⎥⎤,
⇒ r ^ 0 , 2 ( 1 ) = s 1 β 2 , r ^ 1 , 2 ( 1 ) = c 1 β 2 \Rightarrow \quad \hat{r}_{0,2}^{(1)} = s_1 \beta_2,\hat{r}_{1,2}^{(1)} = c_1 \beta_2 ⇒r^0,2(1)=s1β2,r^1,2(1)=c1β2
再用Givens变换 G ~ 2 G ~ 1 T 4 , 3 \tilde{G}_2 \tilde{G}_1 T_{4,3} G~2G~1T4,3,从上面更新 x 2 x^2 x2的过程我们知道 G ~ 2 \tilde{G}_2 G~2会把 [ r ^ 1 , 1 , β 2 ] T [\hat{r}_{1,1},\beta_2]^T [r^1,1,β2]T这部分变成 [ r 1 , 1 , 0 ] T [r_{1,1},0]^T [r1,1,0]T
⇒ [ 1 0 0 0 0 c 2 s 2 0 0 − s 2 c 2 0 0 0 0 1 ] [ r 0 , 0 r ^ 0 , 1 r ^ 0 , 2 ( 1 ) 0 r ^ 1 , 1 r ^ 1 , 2 ( 1 ) 0 β 2 α 3 0 0 β 3 ] = [ r 0 , 0 r ^ 0 , 1 r ^ 0 , 2 0 r ^ 1 , 1 r ^ 1 , 2 0 0 r ^ 2 , 2 0 0 β 3 ] \Rightarrow \quad \left[\begin{array}{cccc} 1 & 0 & 0 & 0 \\ 0 & c_{2} & s_{2} & 0\\ 0 & -s_{2} & c_{2} & 0\\ 0 & 0 & 0 & 1 \end{array}\right] \left[\begin{array}{ccc} r_{0,0} & \hat{r}_{0,1} & \hat{r}_{0,2}^{(1)}\\ 0 & \hat{r}_{1,1} & \hat{r}_{1,2}^{(1)}\\ 0 & \beta_2 & \alpha_3\\ 0 & 0 & \beta_{3} \end{array}\right] = \left[\begin{array}{ccc} r_{0,0} & \hat{r}_{0,1} & \hat{r}_{0,2}\\ 0 & \hat{r}_{1,1} & \hat{r}_{1,2}\\ 0 & 0 & \hat{r}_{2,2}\\ 0 & 0 & \beta_3 \end{array}\right] ⇒⎣⎢⎢⎡10000c2−s200s2c200001⎦⎥⎥⎤⎣⎢⎢⎡r0,0000r^0,1r^1,1β20r^0,2(1)r^1,2(1)α3β3⎦⎥⎥⎤=⎣⎢⎢⎡r0,0000r^0,1r^1,100r^0,2r^1,2r^2,2β3⎦⎥⎥⎤
Givens变换把 [ r ^ 2 , 2 , β 3 ] T [\hat{r}_{2,2},\beta_3]^T [r^2,2,β3]T这部分变成 [ r 2 , 2 , 0 ] T [r_{2,2},0]^T [r2,2,0]T,选择 γ = r ^ 2 , 2 β 3 , s 3 = 1 1 + γ 2 , c 3 = γ s 3 \gamma = \frac{\hat{r}_{2,2}}{\beta_3},s_3 = \frac{1}{\sqrt{1 + \gamma^2}},c_3 = \gamma s_3 γ=β3r^2,2,s3=1+γ21,c3=γs3
⇒ [ r 0 , 0 r 0 , 1 r 0 , 2 0 r 1 , 1 r 1 , 2 0 0 r 2 , 2 0 0 0 ] = [ 1 0 0 0 0 1 0 0 0 0 c 3 s 3 0 0 − s 3 c 3 ] [ r 0 , 0 r ^ 0 , 1 r ^ 0 , 2 0 r ^ 1 , 1 r ^ 1 , 2 0 0 r ^ 2 , 2 0 0 β 3 ] \Rightarrow \quad \left[\begin{array}{ccc} r_{0,0} & r_{0,1} & r_{0,2} \\ 0 & r_{1,1} & r_{1,2} \\ 0 & 0 & r_{2,2} \\ 0 & 0 & 0 \end{array}\right] = \left[\begin{array}{cccc} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & c_{3} & s_{3} \\ 0 & 0 & -s_{3} & c_{3} \end{array}\right] \left[\begin{array}{ccc} r_{0,0} & \hat{r}_{0,1} & \hat{r}_{0,2}\\ 0 & \hat{r}_{1,1} & \hat{r}_{1,2}\\ 0 & 0 & \hat{r}_{2,2}\\ 0 & 0 & \beta_3 \end{array}\right] ⇒⎣⎢⎢⎡r0,0000r0,1r1,100r0,2r1,2r2,20⎦⎥⎥⎤=⎣⎢⎢⎡1000010000c3−s300s3c3⎦⎥⎥⎤⎣⎢⎢⎡r0,0000r^0,1r^1,100r^0,2r^1,2r^2,2β3⎦⎥⎥⎤,
⇒ [ η 1 , η 2 , η 3 ] = η ( 3 ) , [ η 1 η 2 η 3 η ^ 4 ] = V 4 T ( a e 1 ) = G 3 G ~ 2 G ~ 1 ( a e 1 ) , \Rightarrow \quad [\eta_1,\eta_2,\eta_3] = \eta^{(3)} ,\left[\begin{array}{c} \eta_1 \\ \eta_2 \\ \eta_3 \\ \hat{\eta}_{4} \end{array}\right] = V_{4}^{T} (ae_1) = G_3 \tilde{G}_2 \tilde{G}_1 (ae_1) , ⇒[η1,η2,η3]=η(3),⎣⎢⎢⎡η1η2η3η^4⎦⎥⎥⎤=V4T(ae1)=G3G~2G~1(ae1), \
⇒ [ η 1 η 2 η 3 η ^ 4 ] = [ 1 0 0 0 0 1 0 0 0 0 c 3 s 3 0 0 − s 3 c 3 ] [ η 1 η 2 η ^ 3 0 ] = [ η 1 η 2 c 3 η ^ 3 − s 3 η 3 ^ ] \Rightarrow \quad \left[\begin{array}{c} \eta_1 \\ \eta_2 \\ \eta_3 \\ \hat{\eta}_{4} \end{array}\right] = \left[\begin{array}{cccc} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & c_{3} & s_{3} \\ 0 & 0 & -s_{3} & c_{3} \end{array}\right] \left[\begin{array}{c} \eta_1 \\ \eta_2 \\ \hat{\eta}_3 \\ 0 \end{array}\right] = \left[\begin{array}{c} \eta_1 \\ \eta_2 \\ c_3 \hat{\eta}_3 \\ -s_3 \hat{\eta_3} \end{array}\right] ⇒⎣⎢⎢⎡η1η2η3η^4⎦⎥⎥⎤=⎣⎢⎢⎡1000010000c3−s300s3c3⎦⎥⎥⎤⎣⎢⎢⎡η1η2η^30⎦⎥⎥⎤=⎣⎢⎢⎡η1η2c3η^3−s3η3^⎦⎥⎥⎤,
⇒ η 3 = c 3 η ^ 3 , r 2 , 2 = c 3 r ^ 2 , 2 + s 3 β 3 , r 0 , 2 = r ^ 0 , 2 , r 1 , 2 = r ^ 1 , 2 , d 3 = q 3 − r 0 , 2 d 1 − r 1 , 2 d 2 r 2 , 2 \Rightarrow \quad \boldsymbol{\eta_3 = c_3 \hat{\eta}_3,r_{2,2} =c_3 \hat{r}_{2,2} + s_3 \beta_3,r_{0,2} = \hat{r}_{0,2},r_{1,2} = \hat{r}_{1,2},d_3 = \frac{q_3 - r_{0,2}d_1 - r_{1,2} d_2 }{r_{2,2}} } ⇒η3=c3η^3,r2,2=c3r^2,2+s3β3,r0,2=r^0,2,r1,2=r^1,2,d3=r2,2q3−r0,2d1−r1,2d2,
⇒ x 3 = x 2 + η 3 d 3 \Rightarrow \quad x^3 = x^2 + \eta_3 d_3 ⇒x3=x2+η3d3
{ η 1 = a c 1 , d 1 = q 1 / r 0 , 0 , q 1 = r 0 / a = ( A x 0 − b ) / a , a = ∥ r 0 ∥ 2 , r 0 , 0 = α 1 c 1 + β 1 s 1 , x 1 = x 0 + η 1 d 1 , η ^ 2 = − a s 1 . \left\{\begin{aligned} & \eta_1 = a c_1,\\ & d_1 = q_1/r_{0,0},\\ & q_1 = r^0/a = (Ax^0 - b)/a, a = \|r^0\|_2,\\ & r_{0,0} = \alpha_1 c_1 + \beta_1 s_1,\\ & x^1 = x^0 + \eta_1 d_1,\\ & \hat{\eta}_2 = -a s_1. \end{aligned}\right. ⎩⎪⎪⎪⎪⎪⎪⎪⎪⎪⎨⎪⎪⎪⎪⎪⎪⎪⎪⎪⎧η1=ac1,d1=q1/r0,0,q1=r0/a=(Ax0−b)/a,a=∥r0∥2,r0,0=α1c1+β1s1,x1=x0+η1d1,η^2=−as1.
{ η 2 = c 2 η ^ 2 , d 2 = ( q 2 − r 0 , 1 d 1 ) / r 1 , 1 , q 2 = ( A q 1 − α 1 q 1 ) / β 1 , r 0 , 1 = r ^ 0 , 1 = c 1 β 1 + s 1 α 2 , r 1 , 1 = c 2 r ^ 1 , 1 + s 2 β 2 , r ^ 1 , 1 = − s 1 β 1 + c 1 α 2 , x 2 = x 1 + η 2 d 2 , η ^ 3 = − s 2 η ^ 2 . \left\{\begin{aligned} & \eta_2 = c_2 \hat{\eta}_2,\\ & d_2 = (q_2 - r_{0,1} d_1)/r_{1,1},\\ & q_2 = (Aq_1 - \alpha_1 q_1)/\beta_1,\\ & r_{0,1} = \hat{r}_{0,1} = c_1 \beta_1 + s_1 \alpha_2,\\ & r_{1,1} = c_2 \hat{r}_{1,1} + s_2 \beta_2,\hat{r}_{1,1} = -s_1 \beta_1 + c_1 \alpha_2, \\ & x^2 = x^1 + \eta_2 d_2,\\ & \hat{\eta}_3 = -s_2 \hat{\eta}_2. \end{aligned}\right. ⎩⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎨⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎧η2=c2η^2,d2=(q2−r0,1d1)/r1,1,q2=(Aq1−α1q1)/β1,r0,1=r^0,1=c1β1+s1α2,r1,1=c2r^1,1+s2β2,r^1,1=−s1β1+c1α2,x2=x1+η2d2,η^3=−s2η^2.
{
η
3
=
c
3
η
^
3
,
d
3
=
(
q
3
−
r
0
,
2
d
1
−
r
1
,
2
d
2
)
/
r
2
,
2
,
q
3
=
(
A
q
2
−
β
1
q
1
−
α
2
q
2
)
/
β
2
,
r
0
,
2
=
s
1
β
2
,
r
1
,
2
=
c
1
β
2
,
r
2
,
2
=
c
3
r
^
2
,
2
+
s
3
β
3
,
r
^
2
,
2
=
−
s
2
r
^
1
,
2
(
1
)
+
c
2
α
3
,
r
^
1
,
2
(
1
)
=
c
1
β
2
,
x
2
=
x
1
+
η
2
d
2
,
η
^
3
=
−
s
3
η
^
3
.
\left\{\begin{aligned} & \eta_3 = c_3 \hat{\eta}_3,\\ & d_3 = (q_3 - r_{0,2} d_1 - r_{1,2} d_2)/r_{2,2},\\ & q_3 = (Aq_2 - \beta_1 q_1 - \alpha_2 q_2)/\beta_2,\\ & r_{0,2} = s_1 \beta_2,r_{1,2} = c_1 \beta_2,\\ & r_{2,2} = c_3 \hat{r}_{2,2} + s_3 \beta_3,\hat{r}_{2,2} = -s_2 \hat{r}_{1,2}^{(1)} + c_2 \alpha_3, \hat{r}_{1,2}^{(1)} = c_1 \beta_2,\\ & x^2 = x^1 + \eta_2 d_2,\\ & \hat{\eta}_3 = -s_3 \hat{\eta}_3. \end{aligned}\right.
⎩⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎨⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎧η3=c3η^3,d3=(q3−r0,2d1−r1,2d2)/r2,2,q3=(Aq2−β1q1−α2q2)/β2,r0,2=s1β2,r1,2=c1β2,r2,2=c3r^2,2+s3β3,r^2,2=−s2r^1,2(1)+c2α3,r^1,2(1)=c1β2,x2=x1+η2d2,η^3=−s3η^3.
通过前三个变量的更新过程,我们可以发现
R
k
+
1
,
k
R_{k+1,k}
Rk+1,k的最后一列其实就是
G
k
G
~
k
−
1
…
G
~
1
G_k \tilde{G}_{k-1} \ldots \tilde{G}_1
GkG~k−1…G~1作用在
T
k
+
1
,
k
T_{k+1,k}
Tk+1,k最后一列得到的,然后我们知道
T
k
+
1
,
k
T_{k+1,k}
Tk+1,k最后一列是
[
0
,
…
,
0
,
β
k
−
1
,
α
k
,
β
k
]
T
[0,\ldots,0,\beta_{k-1},\alpha_k,\beta_k]^T
[0,…,0,βk−1,αk,βk]T以及
G
i
G_i
Gi只改变第
i
,
i
+
1
i,i+1
i,i+1行元素,所以我们有:\
[ 0 ⋮ 0 r k − 3 , k − 1 r k − 2 , k − 1 r k − 1 , k − 1 0 ] = G k G ~ k − 1 G ~ k − 2 [ 0 ⋮ 0 β k − 1 α k β k ] , k ≥ 3 \left[\begin{array}{c} 0 \\ \vdots \\ 0 \\ r_{k - 3,k - 1} \\ r_{k - 2,k - 1} \\ r_{k - 1,k - 1} \\ 0 \end{array}\right] = G_k \tilde{G}_{k-1} \tilde{G}_{k-2} \left[\begin{array}{c} 0 \\ \vdots \\ 0 \\ \beta_{k-1} \\ \alpha_k \\ \beta_k \end{array}\right] ,k \geq 3 ⎣⎢⎢⎢⎢⎢⎢⎢⎢⎢⎡0⋮0rk−3,k−1rk−2,k−1rk−1,k−10⎦⎥⎥⎥⎥⎥⎥⎥⎥⎥⎤=GkG~k−1G~k−2⎣⎢⎢⎢⎢⎢⎢⎢⎡0⋮0βk−1αkβk⎦⎥⎥⎥⎥⎥⎥⎥⎤,k≥3 \
[ r k − 3 , k − 1 r k − 2 , k − 1 r k − 1 , k − 1 0 ] = G k G ~ k − 1 G ~ k − 2 [ 0 β k − 1 α k β k ] = G k G ~ k − 1 [ r k − 3 , k − 1 β ^ k − 1 α k β k ] = G k [ r k − 3 , k − 1 r k − 2 , k − 1 α ^ k β k ] = [ r k − 3 , k − 1 r k − 2 , k − 1 r k − 1 , k − 1 0 ] , k ≥ 3 \left[\begin{array}{c} r_{k - 3,k - 1} \\ r_{k - 2,k - 1} \\ r_{k - 1,k - 1} \\ 0 \end{array}\right] = G_k \tilde{G}_{k-1} \tilde{G}_{k-2} \left[\begin{array}{c} 0 \\ \beta_{k-1} \\ \alpha_k \\ \beta_k \end{array}\right] = G_k \tilde{G}_{k-1} \left[\begin{array}{c} r_{k - 3,k - 1} \\ \hat{\beta}_{k-1} \\ \alpha_k \\ \beta_k \end{array}\right] = G_k \left[\begin{array}{c} r_{k - 3,k - 1} \\ r_{k - 2,k - 1} \\ \hat{\alpha}_k \\ \beta_k \end{array}\right] = \left[\begin{array}{c} r_{k - 3,k - 1} \\ r_{k - 2,k - 1} \\ r_{k - 1,k - 1} \\ 0 \end{array}\right],k \geq 3 ⎣⎢⎢⎡rk−3,k−1rk−2,k−1rk−1,k−10⎦⎥⎥⎤=GkG~k−1G~k−2⎣⎢⎢⎡0βk−1αkβk⎦⎥⎥⎤=GkG~k−1⎣⎢⎢⎡rk−3,k−1β^k−1αkβk⎦⎥⎥⎤=Gk⎣⎢⎢⎡rk−3,k−1rk−2,k−1α^kβk⎦⎥⎥⎤=⎣⎢⎢⎡rk−3,k−1rk−2,k−1rk−1,k−10⎦⎥⎥⎤,k≥3
{
η
k
=
c
k
η
^
k
,
d
k
=
(
q
k
−
r
k
−
3
,
k
−
1
d
k
−
2
−
r
k
−
2
,
k
−
1
d
k
−
1
)
/
r
k
−
1
,
k
−
1
,
q
k
=
(
A
q
k
−
1
−
β
k
−
2
q
k
−
2
−
α
k
−
1
q
k
−
1
)
/
β
k
−
1
,
r
k
−
3
,
k
−
1
=
s
k
−
2
β
k
−
1
,
r
k
−
2
,
k
−
1
=
c
k
−
1
β
^
k
−
1
+
s
k
−
1
α
k
,
β
^
k
−
1
=
c
k
−
2
β
k
−
1
,
r
k
−
1
,
k
−
1
=
c
k
α
^
k
+
s
k
β
k
,
α
^
k
=
−
s
k
−
1
β
^
k
+
c
k
−
1
α
k
,
x
k
=
x
k
−
1
+
η
k
d
k
,
η
^
k
+
1
=
−
s
k
η
^
k
.
\left\{\begin{aligned} & \eta_k = c_k \hat{\eta}_k,\\ & d_k = (q_k - r_{k - 3,k - 1} d_{k - 2} - r_{k - 2,k - 1} d_{k - 1})/r_{k-1,k-1},\\ & q_k = (Aq_{k-1} - \beta_{k-2} q_{k-2} - \alpha_{k-1} q_{k-1})/\beta_{k-1},\\ & r_{k - 3,k - 1} = s_{k - 2} \beta_{k-1},\\ & r_{k - 2,k - 1} = c_{k-1} \hat{\beta}_{k-1} + s_{k-1} \alpha_k,\hat{\beta}_{k - 1} = c_{k-2} \beta_{k-1},\\ & r_{k - 1,k - 1} = c_k \hat{\alpha}_k + s_k \beta_k ,\hat{\alpha}_k = -s_{k-1} \hat{\beta}_k + c_{k-1} \alpha_k, \\ & x^{k} = x^{k-1} + \eta_k d_k,\\ & \hat{\eta}_{k+1} = -s_{k} \hat{\eta}_{k}. \end{aligned}\right.
⎩⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎨⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎧ηk=ckη^k,dk=(qk−rk−3,k−1dk−2−rk−2,k−1dk−1)/rk−1,k−1,qk=(Aqk−1−βk−2qk−2−αk−1qk−1)/βk−1,rk−3,k−1=sk−2βk−1,rk−2,k−1=ck−1β^k−1+sk−1αk,β^k−1=ck−2βk−1,rk−1,k−1=ckα^k+skβk,α^k=−sk−1β^k+ck−1αk,xk=xk−1+ηkdk,η^k+1=−skη^k.
详细代码以及介绍参考本人知乎添加链接描述
import numpy as np
import time
N = 1000
A = np.zeros([N,N])
penalty = 10
for i in range(N):
for j in range(i + 1,N):
A[i,j] = np.random.rand(1)
A[j,i] = A[i,j]
A[i,i] = N + penalty*abs(np.random.rand(1))#主对角占优,确保非奇异
#print(A)
x = np.random.rand(N,1)
b = A@x
def MINRES(A,b,x0,N,eps):
r = b - A@x0
a = np.linalg.norm(r)
# q_old = q_{k-1},q_new = q_{k},c_old = c_{k-2},c_mid = c_{k-1},c_new = c_k
q_old = 0*r.copy();q_new = r/a
d_old = 0*r.copy();d_mid = 0*r.copy()
xi_old = a;beta_old = 0
r_old = 0;r_mid = 0
c_old = 0;c_mid = 0
s_old = 0;s_mid = 0
ls = np.zeros([N,N])
for k in range(N):
w = A@q_new - beta_old*q_old
alpha = np.dot(w.T,q_new)[0,0]
w = w - alpha*q_new
beta_new = np.linalg.norm(w)
if k == 0:
alpha_hat = alpha
elif k == 1:
r_mid = c_mid*beta_old + s_mid*alpha
alpha_hat = -s_mid*beta_old + c_mid*alpha
else:
r_old = s_old*beta_old
beta_hat = c_old*beta_old
r_mid = c_mid*beta_hat + s_mid*alpha
alpha_hat = -s_mid*beta_hat + c_mid*alpha
if abs(alpha_hat) > abs(beta_new):
gamma = beta_new/alpha_hat
c_new = 1.0/np.sqrt(1 + gamma**2);s_new = c_new*gamma
else:
gamma = alpha_hat/beta_new
s_new = 1.0/np.sqrt(1 + gamma**2);c_new = s_new*gamma
r_new = c_new*alpha_hat + s_new*beta_new
xi_new = -s_new*xi_old
xi_old = c_new*xi_old
d_new = (q_new - r_old*d_old - r_mid*d_mid)/r_new
x0 += xi_old*d_new
print('res:%.2e,real res:%.2e'%(abs(xi_new),np.linalg.norm(b - A@x0)))
ls[:,k:k + 1] = q_new
if abs(xi_new) < eps:
break
else:
xi_old = xi_new
q_old = q_new.copy()
q_new = w.copy()/beta_new
beta_old = beta_new
c_old = c_mid
c_mid = c_new
s_old = s_mid
s_mid = s_new
r_old = r_mid
r_mid = r_new
d_old = d_mid.copy()
d_mid = d_new.copy()
#print(w,beta_new)
if abs(xi_new) < eps:
print('success')
else:
print('fail')
return x0,ls
min_t0 = time.time()
x0 = np.random.rand(N,1)
eps = 1e-7
x_p,ls = MINRES(A,b,x0,N,eps)
min_ela = time.time() - min_t0
print('time:%.2f,err:%.2e'%(min_ela,max(abs(x_p - x))))
min_t0 = time.time()
xp = np.linalg.solve(A,b)
min_ela = time.time() - min_t0
print('time:%.2f,err:%.2e'%(min_ela,max(abs(xp - x))))