Fixed-precision and fixed-rank problems in matrix low-rank approximation
(Note: The following content is mainly excerpted from the reference [1] listed at the bottom.)
The following factorizations are often used to express a low-rank approximation of a given matrix:
$$\begin{array}{cccc}\boldsymbol{A} & \approx & \boldsymbol{B} & \boldsymbol{C} \\ m \times n & & m \times k & k \times n\end{array} \tag{1}$$
When the numerical rank $k$ is much smaller than either dimension $m$ or $n$, a factorization such as (1) allows the matrix to be stored inexpensively and to be multiplied rapidly with vectors or other matrices. The factorizations can also be used for data interpretation or to solve computational problems, such as least squares.
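To make the storage and multiplication savings concrete, here is a minimal numpy sketch (the dimensions and test matrices are illustrative choices, not from [1]):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, k = 1000, 800, 10

# A rank-k matrix stored through its factors, as in (1): A ≈ B C.
B = rng.standard_normal((m, k))
C = rng.standard_normal((k, n))
A = B @ C

x = rng.standard_normal(n)
# Multiplying through the factors costs O((m + n) k) operations
# instead of O(m n); storage likewise drops from m n to (m + n) k numbers.
y_factored = B @ (C @ x)
y_dense = A @ x
```

Here $(m+n)k = 18{,}000$ numbers replace $mn = 800{,}000$, and the two matrix–vector products agree up to rounding.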
1 Fixed-precision problem
The basic challenge in producing low-rank approximations is a primitive question called the fixed-precision approximation problem.
Suppose we are given a matrix $A$ and a positive error tolerance $\epsilon$. We seek a matrix $Q$ with $k=k(\epsilon)$ orthonormal columns such that
$$\left\|A-Q Q^{*} A\right\| \leq \epsilon \tag{2}$$
where $\|\cdot\|$ denotes the $\ell_2$ operator norm. The range of $Q$ is a $k$-dimensional subspace that captures most of the action of $A$, and we would like $k$ to be as small as possible.
The SVD furnishes an optimal answer to the fixed-precision problem. Let $\sigma_j$ denote the $j$th largest singular value of $A$. For each $j \geq 0$,
$$\min _{\operatorname{rank}(\boldsymbol{X}) \leq j}\|\boldsymbol{A}-\boldsymbol{X}\|=\sigma_{j+1} \tag{3}$$
One way to construct a minimizer is to choose $X = QQ^{*}A$, where the columns of $Q$ are the $k$ dominant left singular vectors of $A$. Consequently, the minimal rank $k$ for which (2) holds equals the number of singular values of $A$ that exceed the tolerance $\epsilon$.
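This characterization can be checked directly with a dense SVD. Below is a minimal numpy sketch; the test matrix with dyadically decaying singular values is an illustrative choice, not from [1]:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 100, 80
# A test matrix with known singular values sigma_j = 2^{-(j-1)}.
U, _ = np.linalg.qr(rng.standard_normal((m, n)))
V, _ = np.linalg.qr(rng.standard_normal((n, n)))
sigma = 2.0 ** -np.arange(n)
A = (U * sigma) @ V.T

eps = 1e-3
s = np.linalg.svd(A, compute_uv=False)
# Minimal rank k(eps) = number of singular values exceeding eps, by (3).
k = int(np.sum(s > eps))

# A minimizer: Q holds the k dominant left singular vectors of A.
U_k, _, _ = np.linalg.svd(A, full_matrices=False)
Q = U_k[:, :k]
err = np.linalg.norm(A - Q @ (Q.T @ A), 2)  # equals sigma_{k+1} <= eps
```

With $\epsilon=10^{-3}$ and $\sigma_j = 2^{-(j-1)}$ this gives $k = 10$, since $\sigma_{10} = 2^{-9} > \epsilon$ while $\sigma_{11} = 2^{-10} \leq \epsilon$.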
To simplify the development of algorithms, it is convenient to assume that the desired rank $k$ is specified in advance. We call the resulting problem the fixed-rank approximation problem.
2 Fixed-rank problem
Given a matrix $A$, a target rank $k$, and an oversampling parameter $p$, we seek to construct a matrix $Q$ with $k+p$ orthonormal columns such that
$$\left\|A-Q Q^{*} A\right\| \approx \min _{\operatorname{rank}(X) \leq k}\|A-X\|. \tag{4}$$
Although there exists a minimizer $Q$ that solves the fixed-rank problem for $p=0$, the opportunity to use a small number of additional columns provides a flexibility that is crucial for the effectiveness of the computational methods we discuss.
3 Adapting algorithms for the fixed-rank problem to the fixed-precision problem
3.1 Algorithm for fixed rank approximation
Algorithms for the fixed-rank problem can be adapted to solve the fixed-precision problem. The connection is based on the observation that we can build the basis matrix $Q$ incrementally and, at any point in the computation, inexpensively estimate the residual error $\|A-QQ^{*}A\|$.
For the fixed-rank approximation problem, we can use the following random projection method.
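A minimal numpy sketch of such a random projection method, the basic randomized range finder of [1]: sample the range of $A$ with a Gaussian test matrix and orthonormalize the samples. The function name and test problem are illustrative:

```python
import numpy as np

def randomized_range_finder(A, k, p=10, rng=None):
    """Sketch of the basic randomized range finder: Q gets k + p
    orthonormal columns whose range approximately captures A."""
    rng = np.random.default_rng(rng)
    n = A.shape[1]
    Omega = rng.standard_normal((n, k + p))  # Gaussian test matrix
    Y = A @ Omega                            # samples of the range of A
    Q, _ = np.linalg.qr(Y)                   # orthonormalize the samples
    return Q

# Usage on a matrix with exactly known, rapidly decaying singular values.
rng = np.random.default_rng(1)
m, n, k = 200, 150, 10
U, _ = np.linalg.qr(rng.standard_normal((m, n)))
V, _ = np.linalg.qr(rng.standard_normal((n, n)))
sigma = 2.0 ** -np.arange(n)
A = (U * sigma) @ V.T

Q = randomized_range_finder(A, k, p=10, rng=2)
err = np.linalg.norm(A - Q @ (Q.T @ A), 2)  # compare to the optimal sigma_{k+1}
```

For a matrix with this fast spectral decay, the residual error is typically far below $\sigma_{k+1}$, illustrating how the $p$ extra columns buy accuracy.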
The output matrix $Q$ of the above algorithm is what we want for the fixed-rank problem, where the target rank of the input matrix is specified in advance.
To handle the fixed-precision problem, where the parameter is the computational tolerance, we need a scheme for estimating how well a putative basis matrix $Q$ captures the action of the matrix $A$. To do so, we develop a probabilistic error estimator. These methods are inspired by the work of Dixon [2]; our treatment follows [3] and [4].
3.2 An important lemma and some remarks
The exact approximation error is $\|(I-QQ^{*})A\|$. It is intuitively plausible that we can obtain some information about this quantity by computing $\|(I-QQ^{*})A\omega\|$, where $\omega$ is a standard Gaussian vector. This notion leads to the following method. Draw a sequence $\{\omega^{(i)} : i=1,2,\cdots,r\}$ of standard Gaussian vectors, where $r$ is a small integer that balances computational cost and reliability. Then
$$\left\|\left(\mathbf{I}-\boldsymbol{Q} \boldsymbol{Q}^{*}\right) \boldsymbol{A}\right\| \leq 10 \sqrt{\frac{2}{\pi}} \max _{i=1, \ldots, r}\left\|\left(\mathbf{I}-\boldsymbol{Q} \boldsymbol{Q}^{*}\right) \boldsymbol{A} \boldsymbol{\omega}^{(i)}\right\| \tag{5}$$
with probability at least $1-10^{-r}$.
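The estimator (5) can be sketched in a few lines of numpy (the function name and test setup are illustrative assumptions):

```python
import numpy as np

def estimate_residual_norm(A, Q, r=10, rng=None):
    """Randomized a-posteriori bound on ||(I - Q Q^*) A||, cf. (5):
    with probability at least 1 - 10^{-r}, the true norm does not
    exceed the returned value."""
    rng = np.random.default_rng(rng)
    omegas = rng.standard_normal((A.shape[1], r))  # r Gaussian test vectors
    resid = A @ omegas - Q @ (Q.T @ (A @ omegas))  # (I - Q Q^*) A omega^{(i)}
    return 10 * np.sqrt(2 / np.pi) * np.linalg.norm(resid, axis=0).max()

# Usage: the estimate should upper-bound the exact residual norm.
rng = np.random.default_rng(2)
A = rng.standard_normal((100, 80))
Q, _ = np.linalg.qr(rng.standard_normal((100, 30)))
est = estimate_residual_norm(A, Q, r=10, rng=3)
true_err = np.linalg.norm(A - Q @ (Q.T @ A), 2)
```

Note that only $r$ matrix–vector products with $A$ are required, never the residual matrix itself.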
This bound follows by setting $B=(I-QQ^{*})A$ and $\alpha=10$ in the following lemma, whose proof appears in [4, Section 3.4].
Lemma 1
Let $B$ be a real $m\times n$ matrix. Fix a positive integer $r$ and a real number $\alpha > 1$. Draw an independent family $\{\omega^{(i)} : i=1,2,\cdots,r\}$ of standard Gaussian vectors. Then
$$\|\boldsymbol{B}\| \leq \alpha \sqrt{\frac{2}{\pi}} \max _{i=1, \ldots, r}\left\|\boldsymbol{B} \boldsymbol{\omega}^{(i)}\right\|$$
except with probability $\alpha^{-r}$.
The error estimate (5) is computationally inexpensive because it requires only a small number of matrix-vector products. Therefore, we can make a lowball guess for the numerical rank of $A$ and add more samples if the error estimate is too large.
- The asymptotic cost of the above algorithm (randomized range finder) is preserved if we double our guess for the rank at each step. For example, we can start with 32 samples, compute another 32, then another 64, etc.
- The estimate (5) is actually somewhat crude. We can obtain a better estimate at a similar computational cost by initializing a power iteration with a random vector and repeating the process several times [3].
3.3 Algorithm for the fixed-precision problem (based on the algorithm for the fixed-rank problem)
Let us recall the fixed-precision problem. Suppose $A$ is an $m\times n$ matrix and $\epsilon$ is a computational tolerance. Seek an integer $\ell$ and an $m\times \ell$ orthonormal matrix $Q^{(\ell)}$ such that
$$\left\|\left(\mathbf{I}-\boldsymbol{Q}^{(\ell)}\left(\boldsymbol{Q}^{(\ell)}\right)^{*}\right) \boldsymbol{A}\right\| \leq \epsilon \tag{6}$$
The size $\ell$ of the basis will typically be slightly larger than the size $k$ of the smallest basis that achieves this error.
The basic observation behind the adaptive scheme is that we can generate the basis in step 3 of the above algorithm incrementally. Starting with an empty basis matrix $Q^{(0)}$, the following scheme generates an orthonormal matrix whose range captures the action of $A$:
- for $i=1,2,3,\cdots$
    - Draw an $n\times 1$ Gaussian random vector $\omega^{(i)}$, and set $y^{(i)}=A\omega^{(i)}$.
    - Compute $\tilde{\boldsymbol{q}}^{(i)}=\left(\mathbf{I}-\boldsymbol{Q}^{(i-1)}\left(\boldsymbol{Q}^{(i-1)}\right)^{*}\right) \boldsymbol{y}^{(i)}$.
    - Normalize $\boldsymbol{q}^{(i)}=\tilde{\boldsymbol{q}}^{(i)} /\left\|\tilde{\boldsymbol{q}}^{(i)}\right\|$, and form $\boldsymbol{Q}^{(i)}=\left[\boldsymbol{Q}^{(i-1)}\ \boldsymbol{q}^{(i)}\right]$.
- end for
How do we know when we have reached a basis $Q^{(\ell)}$ that verifies (6)? The answer becomes apparent once we observe that the vectors $\tilde{q}^{(i)}$ are precisely the vectors that appear in the error bound (5). The resulting rule is that we break the loop once we observe $r$ consecutive vectors $\tilde{q}^{(i)}$ whose norms are smaller than $\epsilon/(10\sqrt{2/\pi})$.
A formal description of the resulting algorithm appears as the following algorithm.
- A potential complication of the method is that the vectors $\tilde{q}^{(i)}$ become small as the basis starts to capture most of the action of $A$. In finite-precision arithmetic, their direction is extremely unreliable. To address this problem, we simply reproject the normalized vector $q^{(i)}$ onto $\operatorname{range}\left(\boldsymbol{Q}^{(i-1)}\right)^{\perp}$ in steps 7 and 8 of the above algorithm.
- The failure probability stated for the above algorithm is pessimistic because it is derived from a simple union bound argument. In practice, the error estimator is reliable in a range of circumstances when we take $r=10$.
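Putting the pieces together, the adaptive scheme with the stopping rule from (5) and the reprojection safeguard can be sketched as follows (a minimal numpy version; the function name and test matrix are illustrative assumptions):

```python
import numpy as np

def adaptive_range_finder(A, eps, r=10, rng=None):
    """Grow Q one column at a time; stop once r consecutive residual
    vectors fall below eps / (10 sqrt(2/pi)), which certifies (6)
    with probability at least 1 - 10^{-r} via the bound (5)."""
    rng = np.random.default_rng(rng)
    m, n = A.shape
    threshold = eps / (10 * np.sqrt(2 / np.pi))
    # Maintain r residual vectors y^{(i)} = (I - Q Q^*) A omega^{(i)}.
    Y = [A @ rng.standard_normal(n) for _ in range(r)]
    Q = np.zeros((m, 0))
    while max(np.linalg.norm(y) for y in Y) > threshold:
        y = Y.pop(0)
        y = y - Q @ (Q.T @ y)          # project out the current basis
        q = y / np.linalg.norm(y)
        q = q - Q @ (Q.T @ q)          # reproject to fight round-off
        q = q / np.linalg.norm(q)
        Q = np.column_stack([Q, q])
        y_new = A @ rng.standard_normal(n)
        y_new = y_new - Q @ (Q.T @ y_new)
        # Keep the stored residuals orthogonal to the enlarged basis.
        Y = [yi - q * (q @ yi) for yi in Y] + [y_new]
    return Q

# Usage on a matrix with dyadically decaying singular values.
rng = np.random.default_rng(3)
m, n = 120, 100
U, _ = np.linalg.qr(rng.standard_normal((m, n)))
V, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = (U * 2.0 ** -np.arange(n)) @ V.T
Q = adaptive_range_finder(A, eps=1e-4, rng=0)
err = np.linalg.norm(A - Q @ (Q.T @ A), 2)
```

The sliding window of $r$ residual vectors implements the "r consecutive small vectors" rule without recomputing any samples, so each iteration costs one matrix–vector product with $A$ plus two projections onto the current basis.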
References:
[1] N. Halko, P. G. Martinsson, and J. A. Tropp, Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions, SIAM Rev., 53 (2011), pp. 217–288.
[2] J. D. Dixon, Estimating extremal eigenvalues and condition numbers of matrices, SIAM J. Numer. Anal., 20 (1983), pp. 812–814.
[3] E. Liberty, F. Woolfe, P.-G. Martinsson, V. Rokhlin, and M. Tygert, Randomized algorithms for the low-rank approximation of matrices, Proc. Natl. Acad. Sci. USA, 104 (2007), pp. 20167–20172.
[4] F. Woolfe, E. Liberty, V. Rokhlin, and M. Tygert, A fast randomized algorithm for the approximation of matrices, Appl. Comput. Harmon. Anal., 25 (2008), pp. 335–366.