Converting one matrix factorization to another
(Note: The following content is mainly excerpted from the reference [1] listed at the bottom.)
Suppose that we have obtained a partial decomposition of a matrix $A$ by some means:
$$\|A-CB\|\leq \epsilon, \tag{1}$$
where $B$ and $C$ have rank $k$.
1 Preliminaries (Some definitions)
1.1 Full Decomposition and Partial Decomposition
It is possible to compute the full QR decomposition or the full SVD of an $m \times n$ matrix to double-precision accuracy with $O(mn\min\{m,n\})$ flops. (That is, a full decomposition is an accurate decomposition.) The full decomposition has the following form,
$$A = CB. \tag{2}$$
Suppose that an $m \times n$ matrix has numerical rank $k$, where $k$ is substantially smaller than $m$ and $n$. In this case, it is possible to produce a structured low-rank decomposition that approximates the matrix well. (That is, a partial decomposition is an approximate decomposition.) The partial decomposition has the following form,
$$A = CB + E, \tag{3}$$
where $\|E\|\leq \epsilon$.
1.2 The Interpolative Decomposition (ID)
This factorization identifies a collection of $k$ columns from a rank-$k$ matrix $A$ that span the range of $A$. To be precise, we can compute an index set $J=[j_{1},\cdots,j_{k}]$ such that
$$A = A(:,J)X, \tag{4}$$
where $X$ is a $k\times n$ matrix that satisfies $X(:,J)=I_{k}$. Furthermore, no entry of $X$ has magnitude larger than two. In other words, this decomposition expresses each column of $A$ as a linear combination of $k$ fixed columns with bounded coefficients. Stable and efficient algorithms for computing the ID appear in the papers [2,3].
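As a concrete illustration, a one-sided ID can be computed from a column-pivoted QR factorization. The sketch below (numpy/scipy; all names are illustrative) takes $J$ from the pivot order. Note that plain pivoted QR works well in practice but, unlike the strong rank-revealing QR of [3], does not guarantee the bound of two on the entries of $X$.

```python
import numpy as np
from scipy.linalg import qr

def interp_decomp(A, k):
    """Rank-k interpolative decomposition A ~ A[:, J] @ X with X[:, J] = I_k."""
    # Column-pivoted QR: A[:, piv] = Q @ R
    Q, R, piv = qr(A, mode='economic', pivoting=True)
    J = piv[:k]                                # k selected column indices
    # Coefficients for the remaining columns solve R11 T = R12
    T = np.linalg.solve(R[:k, :k], R[:k, k:])
    X = np.empty((k, A.shape[1]))
    X[:, piv[:k]] = np.eye(k)                  # selected columns reproduce exactly
    X[:, piv[k:]] = T
    return J, X

# For a matrix of exact rank k, the reconstruction is exact (up to rounding)
rng = np.random.default_rng(0)
A = rng.standard_normal((6, 3)) @ rng.standard_normal((3, 10))   # rank 3
J, X = interp_decomp(A, 3)
assert np.allclose(A, A[:, J] @ X)
```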
It is also possible to compute a two-sided ID,
$$A=W\,A(J',J)\,X, \tag{5}$$
where $J'$ is an index set identifying $k$ of the rows of $A$, and $W$ is an $m\times k$ matrix that satisfies $W(J',:)=I_{k}$ and whose entries are all bounded by two.
2 From given factorization to QR
Given a partial decomposition of the matrix $A$ as in (1), the following steps construct a partial QR factorization:
- Compute a QR factorization of $C$ so that $C=Q_{1}R_{1}$;
- Form the product $D=R_{1}B$, and compute a QR factorization: $D=Q_{2}R$;
- Form the product $Q=Q_{1}Q_{2}$.
The result is an orthonormal matrix $Q$ and a weakly upper-triangular matrix $R$ such that $\|A-QR\|\leq \epsilon$.
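The three steps can be sketched in numpy as follows (a minimal illustration that assumes an exact rank-$k$ factorization $A=CB$, so the residual $\epsilon$ is zero up to rounding):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, k = 100, 80, 5
C = rng.standard_normal((m, k))
B = rng.standard_normal((k, n))
A = C @ B                          # partial factorization, here exact

Q1, R1 = np.linalg.qr(C)           # step 1: C = Q1 R1
D = R1 @ B                         # step 2: form D = R1 B ...
Q2, R = np.linalg.qr(D)            # ... and factor D = Q2 R
Q = Q1 @ Q2                        # step 3: Q = Q1 Q2

assert np.allclose(A, Q @ R)       # partial QR: ||A - QR|| <= eps
```

Here $R$ is $k\times n$ and only its leading $k\times k$ block is triangular, which is what "weakly upper-triangular" refers to.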
3 From given factorization to SVD
Given a partial decomposition of the matrix $A$ as in (1), the following steps construct a partial SVD:
- Compute a QR factorization of $C$ so that $C=Q_{1}R_{1}$;
- Form the product $D=R_{1}B$, and compute an SVD: $D=U_{2}\Sigma V^{*}$;
- Form the product $U=Q_{1}U_{2}$.
The result is a diagonal matrix $\Sigma$ and orthonormal matrices $U$ and $V$ such that $\|A-U\Sigma V^{*}\|\leq \epsilon$.
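The same small-core trick works for the SVD; a minimal numpy sketch (again assuming an exact rank-$k$ factorization $A=CB$ for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, k = 100, 80, 5
C = rng.standard_normal((m, k))
B = rng.standard_normal((k, n))
A = C @ B

Q1, R1 = np.linalg.qr(C)                            # step 1: C = Q1 R1
D = R1 @ B                                          # step 2: form D = R1 B ...
U2, s, Vh = np.linalg.svd(D, full_matrices=False)   # ... and its small SVD
U = Q1 @ U2                                         # step 3: U = Q1 U2

assert np.allclose(A, U @ np.diag(s) @ Vh)          # ||A - U Sigma V*|| <= eps
```

The expensive part, the SVD, is applied only to the small $k\times n$ matrix $D$, not to $A$ itself.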
4 From given factorization to ID
Given a partial decomposition of the matrix $A$ as in (1), the following one-step process yields an ID:
- Compute $J$ and $X$ such that $B=B(:,J)X$.
Then $A\approx A(:,J)X$, but the approximation error may deteriorate from the initial estimate: writing $E=A-CB$, we have $A-A(:,J)X=E-E(:,J)X$, so $\|A-A(:,J)X\|\leq (1+\|X\|)\epsilon$. In other words, the upper bound takes the form $(1+\alpha)\epsilon$ with $\alpha=\|X\|$, which is larger than $\epsilon$.
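A sketch of this one-step reduction (illustrative setup; the ID of $B$ is computed with plain column-pivoted QR rather than the strong rank-revealing QR of [3], so the entry bound on $X$ is not guaranteed):

```python
import numpy as np
from scipy.linalg import qr

rng = np.random.default_rng(2)
m, n, k = 60, 40, 4
C = rng.standard_normal((m, k))
B = rng.standard_normal((k, n))
A = C @ B                                   # exact rank-k case for illustration

# Compute J and X with B = B[:, J] @ X via column-pivoted QR of B
_, R, piv = qr(B, mode='economic', pivoting=True)
J = piv[:k]
X = np.empty((k, n))
X[:, piv[:k]] = np.eye(k)
X[:, piv[k:]] = np.linalg.solve(R[:k, :k], R[:k, k:])

# A(:,J) X = C B(:,J) X = C B = A, so the ID of B transfers to A
assert np.allclose(A, A[:, J] @ X)
```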
5 Postprocessing via Row Extraction
Given a matrix $Q$ such that the following inequality holds,
$$\|A-QQ^{*}A\|\leq \epsilon, \tag{6}$$
we can obtain a rank-$k$ factorization
$$A\approx XB, \tag{7}$$
where $B$ is a $k\times n$ matrix consisting of $k$ rows extracted from $A$.
The approximation (7) can be produced without computing any matrix-matrix products, which makes this approach to postprocessing very fast.
The drawback is that the error $\|A-XB\|$ is usually larger than the initial error $\|A-QQ^{*}A\|$, especially when the dimensions of $A$ are large.
To obtain the factorization (7), we simply construct the ID of the matrix $Q$:
$$Q=XQ(J,:). \tag{8}$$
The index set $J$ marks $k$ rows of $Q$ that span the row space of $Q$, and $X$ is an $m\times k$ matrix whose entries are bounded in magnitude by two and which contains the $k\times k$ identity as a submatrix: $X(J,:)=I_{k}$. Combining (8) and (6), we reach
$$A\approx QQ^{*}A=XQ(J,:)Q^{*}A. \tag{9}$$
Since $X(J,:)=I_{k}$, equation (9) implies that $A(J,:)\approx Q(J,:)Q^{*}A$. Therefore, (7) follows when we put $B=A(J,:)$.
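The derivation can be exercised numerically. The sketch below builds $Q$ with a Gaussian range finder (an illustrative assumption), computes the row ID of $Q$ from a pivoted QR of $Q^{*}$, and sets $B=A(J,:)$; an exactly rank-$k$ matrix is used so that (6) holds with $\epsilon\approx 0$:

```python
import numpy as np
from scipy.linalg import qr

rng = np.random.default_rng(3)
m, n, k = 80, 60, 5
A = rng.standard_normal((m, k)) @ rng.standard_normal((k, n))  # exactly rank k
Q, _ = np.linalg.qr(A @ rng.standard_normal((n, k)))           # A ~ Q Q* A

# Row ID of Q: Q = X @ Q[J, :], obtained from a pivoted QR of Q^T
_, R, piv = qr(Q.T, mode='economic', pivoting=True)
J = piv[:k]
X = np.empty((m, k))
X[piv[:k], :] = np.eye(k)
X[piv[k:], :] = np.linalg.solve(R[:k, :k], R[:k, k:]).T

B = A[J, :]                        # k rows of A; no matrix-matrix product with A
assert np.allclose(A, X @ B)       # the factorization (7)
```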
6 SVD via Row Extraction
Provided with the factorization (7), we can obtain any standard factorization (QR, SVD, etc.). The following algorithm, which requires $O(k^{2}(m+n))$ flops, illustrates an SVD calculation:
- Compute an ID of $Q$, so that $Q=XQ(J,:)$;
- Extract the rows $A(J,:)$, and compute a QR factorization: $A(J,:)^{*}=WR$;
- Form the product $Z=XR^{*}$, and compute an SVD: $Z=U\Sigma \tilde{V}^{*}$;
- Form the product $V=W\tilde{V}$.
The following lemma guarantees the accuracy of the computed factors.
Lemma 5.1 [1]
Let $A$ be an $m\times n$ matrix and let $Q$ be an $m\times k$ matrix that satisfy (6). Suppose that $U$, $\Sigma$, and $V$ are the matrices constructed by the above algorithm. Then
$$\|A-U\Sigma V^{*}\|\leq \left[1+\sqrt{1+4k(n-k)}\right]\varepsilon. \tag{10}$$
Proof: The factors $U, \Sigma, V$ constructed by the algorithm satisfy
$$U\Sigma V^{*}=U\Sigma \tilde{V}^{*}W^{*}=ZW^{*}=XR^{*}W^{*}=XA(J,:).$$
Define the approximation
$$\widehat{A}=QQ^{*}A. \tag{11}$$
Since $\widehat{A}=XQ(J,:)Q^{*}A$ and since $X(J,:)=I_{k}$, it must be that $\widehat{A}(J,:)=Q(J,:)Q^{*}A$. Consequently,
$$\widehat{A}=X\widehat{A}(J,:).$$
We have the chain of relations
$$\begin{array}{ll} \|A-U\Sigma V^{*}\| & = \|A-XA(J,:)\| \\ &=\|(A-X\widehat{A}(J,:))+(X\widehat{A}(J,:)-XA(J,:))\|\\ &\leq \|A-\widehat{A}\|+\|X\widehat{A}(J,:)-XA(J,:)\| \\ &\leq \|A-\widehat{A}\|+\|X\|\,\|A(J,:)-\widehat{A}(J,:)\|. \end{array} \tag{12}$$
Inequality (6) ensures that $\|A-\widehat{A}\|\leq \epsilon$. Since $A(J,:)-\widehat{A}(J,:)$ is a submatrix of $A-\widehat{A}$, we must also have $\|A(J,:)-\widehat{A}(J,:)\|\leq \epsilon$. Thus, (12) reduces to
$$\|A-U\Sigma V^{*}\|\leq (1+\|X\|)\epsilon. \tag{13}$$
The bound (10) follows from (13) after we observe that $X$ contains a $k\times k$ identity matrix and that the entries of the remaining $(n-k)\times k$ submatrix are bounded in magnitude by two.
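A minimal numpy/scipy sketch of the whole row-extraction SVD (illustrative setup: an exactly rank-$k$ matrix and a Gaussian range finder for $Q$; plain pivoted QR stands in for a strong rank-revealing QR, so the coefficient bound on $X$ is not guaranteed):

```python
import numpy as np
from scipy.linalg import qr

rng = np.random.default_rng(4)
m, n, k = 80, 60, 5
A = rng.standard_normal((m, k)) @ rng.standard_normal((k, n))  # exactly rank k
Q, _ = np.linalg.qr(A @ rng.standard_normal((n, k)))           # A ~ Q Q* A

# Step 1: row ID of Q, Q = X @ Q[J, :]
_, Rq, piv = qr(Q.T, mode='economic', pivoting=True)
J = piv[:k]
X = np.empty((m, k))
X[piv[:k], :] = np.eye(k)
X[piv[k:], :] = np.linalg.solve(Rq[:k, :k], Rq[:k, k:]).T

# Step 2: extract A[J, :] and factor A[J, :]^T = W R
W, R = np.linalg.qr(A[J, :].T)                  # W: n x k, R: k x k

# Step 3: form Z = X R^* and compute its small SVD
U, s, Vth = np.linalg.svd(X @ R.T, full_matrices=False)

# Step 4: V = W @ Vtilde
V = W @ Vth.T

assert np.allclose(A, U @ np.diag(s) @ V.T)     # accuracy per Lemma 5.1
```

All large-matrix work is a single pivoted QR on the $k\times m$ matrix $Q^{*}$ and a QR/SVD on $k\times k$ cores, consistent with the $O(k^{2}(m+n))$ flop count.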
Note: Recall that the orthonormal matrix $Q$ is constructed by orthonormalizing the columns of the sample matrix $Y$ ($Y=A\Omega$). With finite-precision arithmetic, it is preferable to adapt the above algorithm to start directly from the matrix $Y$. To be precise, we modify step 1 to compute $X$ and $J$ so that $Y=XY(J,:)$.
References:
[1] N. Halko, P. G. Martinsson, and J. A. Tropp, Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions, SIAM Rev., 53 (2011), pp. 217–288.
[2] H. Cheng, Z. Gimbutas, P. G. Martinsson, and V. Rokhlin, On the compression of low rank matrices, SIAM J. Sci. Comput., 26 (2005), pp. 1389–1404.
[3] M. Gu and S. C. Eisenstat, Efficient algorithms for computing a strong rank-revealing QR factorization, SIAM J. Sci. Comput., 17 (1996), pp. 848–869.