Fixed-precision and fixed-rank problems in low-rank matrix approximation

(Note: The following content is mainly excerpted from the reference [1] listed at the bottom.)

The following factorizations are often used to express a low-rank approximation of a given matrix:
$$\begin{array}{cccc}\boldsymbol{A} & \approx & \boldsymbol{B} & \boldsymbol{C} \\ m \times n & & m \times k & k \times n\end{array} \tag{1}$$
When the numerical rank $k$ is much smaller than either dimension $m$ or $n$, a factorization such as (1) allows the matrix to be stored inexpensively and to be multiplied rapidly with vectors or other matrices. The factorizations can also be used for data interpretation or to solve computational problems, such as least squares.
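To make the storage and multiplication savings concrete, here is a minimal NumPy sketch; the dimensions, the exactly rank-$k$ test matrix, and the variable names are illustrative choices rather than anything from [1]:

```python
import numpy as np

# Storing B and C costs (m + n) * k numbers, versus m * n numbers for A itself.
m, n, k = 1000, 800, 20
rng = np.random.default_rng(0)
B = rng.standard_normal((m, k))
C = rng.standard_normal((k, n))
A = B @ C                       # an exactly rank-k matrix, for illustration

x = rng.standard_normal(n)
y_full = A @ x                  # O(m n) work
y_factored = B @ (C @ x)        # O((m + n) k) work, same result
print(np.allclose(y_full, y_factored))   # True
```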

1 Fixed-precision problem

The basic challenge in producing low-rank approximations is a primitive question that is called the fixed-precision approximation problem.
Suppose we are given a matrix $A$ and a positive error tolerance $\epsilon$. We seek a matrix $Q$ with $k = k(\epsilon)$ orthonormal columns such that
$$\left\|A - QQ^{*}A\right\| \leq \varepsilon \tag{2}$$
where $\|\cdot\|$ denotes the $\ell_{2}$ operator norm. The range of $Q$ is a $k$-dimensional subspace that captures most of the action of $A$, and we would like $k$ to be as small as possible.

The SVD furnishes an optimal answer to the fixed-precision problem. Let $\sigma_{j}$ denote the $j$th largest singular value of $A$. For each $j \geq 0$,
$$\min_{\operatorname{rank}(\boldsymbol{X}) \leq j}\|\boldsymbol{A} - \boldsymbol{X}\| = \sigma_{j+1} \tag{3}$$
One way to construct a minimizer is to choose $X = QQ^{*}A$, where the columns of $Q$ are the $j$ dominant left singular vectors of $A$. Consequently, the minimal rank $k$ for which (2) holds equals the number of singular values of $A$ that exceed the tolerance $\epsilon$.
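As a small worked example, the minimal rank $k(\epsilon)$ can be read off from the SVD by counting the singular values above the tolerance. The sketch below uses a synthetic matrix with geometrically decaying singular values; all names and sizes are mine:

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 100, 80
U0, _ = np.linalg.qr(rng.standard_normal((m, n)))
V0, _ = np.linalg.qr(rng.standard_normal((n, n)))
sigma = 0.5 ** np.arange(n)                # geometrically decaying singular values
A = U0 @ np.diag(sigma) @ V0.T

eps = 1e-6
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = int(np.sum(s > eps))                   # minimal rank for tolerance eps
Q = U[:, :k]                               # k dominant left singular vectors

err = np.linalg.norm(A - Q @ (Q.T @ A), 2) # spectral-norm error, equal to s[k]
print(k, err <= eps)                       # e.g. 20 True
```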

To simplify the development of algorithms, it is convenient to assume that the desired rank $k$ is specified in advance. We call the resulting problem the fixed-rank approximation problem.


2 Fixed-rank problem

Given a matrix $A$, a target rank $k$, and an oversampling parameter $p$, we seek to construct a matrix $Q$ with $k + p$ orthonormal columns such that
$$\left\|A - QQ^{*}A\right\| \approx \min_{\operatorname{rank}(X) \leq k}\|A - X\|. \tag{4}$$
Although there exists a minimizer $Q$ that solves the fixed-rank problem for $p = 0$, the opportunity to use a small number of additional columns provides a flexibility that is crucial for the effectiveness of the computational methods we discuss.


3 Adapting algorithms for the fixed-rank problem to the fixed-precision problem

3.1 Algorithm for fixed-rank approximation

Algorithms for the fixed-rank problem can be adapted to solve the fixed-precision problem. The connection is based on the observation that we can build the basis matrix $Q$ incrementally and, at any point in the computation, we can inexpensively estimate the residual error $\|A - QQ^{*}A\|$.

For the fixed-rank approximation problem, we can use the following random projection method.
Fig. 1. Algorithm for fixed-rank approximation (randomized range finder)
The output matrix $Q$ of the above algorithm is what we want for the fixed-rank problem, where the target rank of the input matrix is specified in advance.
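For concreteness, this random projection scheme can be written in a few lines of NumPy. The following is a sketch in which the function name and the default oversampling $p = 10$ are illustrative choices:

```python
import numpy as np

def randomized_range_finder(A, k, p=10, rng=None):
    """Sketch of the fixed-rank scheme: sample range(A) with k + p Gaussian
    test vectors and return an orthonormal basis for the samples."""
    rng = np.random.default_rng() if rng is None else rng
    n = A.shape[1]
    Omega = rng.standard_normal((n, k + p))   # Gaussian test matrix
    Y = A @ Omega                             # sample the range of A
    Q, _ = np.linalg.qr(Y)                    # Q has k + p orthonormal columns
    return Q
```

With $Q$ in hand, the achieved error $\|A - QQ^{*}A\|$ can then be compared against the optimum in (4).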

To handle the fixed-precision problem, where the parameter is the computational tolerance, we need a scheme for estimating how well a putative basis matrix $Q$ captures the action of the matrix $A$. To do so, we develop a probabilistic error estimator. These methods are inspired by the work of Dixon [2]; our treatment follows [3] and [4].

3.2 One important lemma and some remarks

The exact approximation error is $\|(I - QQ^{*})A\|$. It is intuitively plausible that we can obtain some information about this quantity by computing $\|(I - QQ^{*})A\omega\|$, where $\omega$ is a standard Gaussian vector. This notion leads to the following method. Draw a sequence $\{\omega^{(i)} : i = 1, 2, \ldots, r\}$ of standard Gaussian vectors, where $r$ is a small integer that balances computational cost and reliability. Then
$$\left\|\left(\mathbf{I} - \boldsymbol{Q}\boldsymbol{Q}^{*}\right)\boldsymbol{A}\right\| \leq 10\sqrt{\frac{2}{\pi}}\,\max_{i=1,\ldots,r}\left\|\left(\mathbf{I} - \boldsymbol{Q}\boldsymbol{Q}^{*}\right)\boldsymbol{A}\,\boldsymbol{\omega}^{(i)}\right\| \tag{5}$$
with probability at least $1 - 10^{-r}$.

The bound (5) follows by setting $B = (I - QQ^{*})A$ and $\alpha = 10$ in the following lemma, whose proof appears in [4, Section 3.4].

Lemma 1
Let $B$ be a real $m \times n$ matrix. Fix a positive integer $r$ and a real number $\alpha > 1$. Draw an independent family $\{\omega^{(i)} : i = 1, 2, \ldots, r\}$ of standard Gaussian vectors. Then
$$\|\boldsymbol{B}\| \leq \alpha\sqrt{\frac{2}{\pi}}\,\max_{i=1,\ldots,r}\left\|\boldsymbol{B}\,\boldsymbol{\omega}^{(i)}\right\|$$
except with probability $\alpha^{-r}$.

The error estimate (5) is computationally inexpensive because it requires only a small number of matrix-vector products. Therefore, we can make a lowball guess for the numerical rank of $A$ and add more samples if the error estimate is too large.

  • The asymptotic cost of the above algorithm (randomized range finder) is preserved if we double our guess for the rank at each step. For example, we can start with 32 samples, compute another 32, then another 64, etc.
  • The estimate (5) is actually somewhat crude. We can obtain a better estimate at a similar computational cost by initializing a power iteration with a random vector and repeating the process several times [3].
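A minimal NumPy sketch of the estimator in (5), assuming real arithmetic so that $Q^{*}$ is just the transpose; the function name and the default $r = 10$ are illustrative choices:

```python
import numpy as np

def estimate_residual_norm(A, Q, r=10, rng=None):
    """Probabilistic upper bound on ||(I - Q Q*) A|| from (5):
    10 * sqrt(2/pi) * max_i ||(I - Q Q*) A w_i|| over r Gaussian vectors w_i,
    valid with probability at least 1 - 10**(-r)."""
    rng = np.random.default_rng() if rng is None else rng
    n = A.shape[1]
    max_norm = 0.0
    for _ in range(r):
        w = rng.standard_normal(n)
        y = A @ w
        resid = y - Q @ (Q.T @ y)             # (I - Q Q*) A w
        max_norm = max(max_norm, float(np.linalg.norm(resid)))
    return 10.0 * np.sqrt(2.0 / np.pi) * max_norm
```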

3.3 Algorithm for the fixed-precision problem (based on the algorithm for the fixed-rank problem)

Let us recall the fixed-precision problem. Suppose $A$ is an $m \times n$ matrix and $\epsilon$ is a computational tolerance. We seek an integer $\ell$ and an $m \times \ell$ orthonormal matrix $Q^{(\ell)}$ such that
$$\left\|\left(\mathbf{I} - \boldsymbol{Q}^{(\ell)}\left(\boldsymbol{Q}^{(\ell)}\right)^{*}\right)\boldsymbol{A}\right\| \leq \varepsilon \tag{6}$$
The size $\ell$ of the basis will typically be slightly larger than the size $k$ of the smallest basis that achieves this error.
The basic observation behind the adaptive scheme is that we can generate the basis in step 3 of the above algorithm incrementally. Starting with an empty basis matrix $Q^{(0)}$, the following scheme generates an orthonormal matrix whose range captures the action of $A$:

  • for $i = 1, 2, 3, \ldots$
    • Draw an $n \times 1$ Gaussian random vector $\omega^{(i)}$, and set $y^{(i)} = A\omega^{(i)}$.
    • Compute $\tilde{q}^{(i)} = \left(\mathbf{I} - Q^{(i-1)}\left(Q^{(i-1)}\right)^{*}\right) y^{(i)}$.
    • Normalize $q^{(i)} = \tilde{q}^{(i)} / \left\|\tilde{q}^{(i)}\right\|$, and form $Q^{(i)} = \left[\,Q^{(i-1)} \;\; q^{(i)}\,\right]$.
  • end for

How do we know when we have reached a basis $Q^{(\ell)}$ that verifies (6)? The answer becomes apparent once we observe that the vectors $\tilde{q}^{(i)}$ are precisely the vectors that appear in the error bound (5). The resulting rule is that we break the loop once we observe $r$ consecutive vectors $\tilde{q}^{(i)}$ whose norms are smaller than $\epsilon / (10\sqrt{2/\pi})$.

A formal description of the resulting procedure appears in the following algorithm.
Fig. 2. Adaptive randomized range finder
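Since the incremental loop and the stopping rule above fully determine the scheme, it can be condensed into a short NumPy sketch. This is my own simplified version rather than a verbatim transcription of the algorithm in [1]: small residual vectors are simply skipped, and real arithmetic is assumed.

```python
import numpy as np

def adaptive_range_finder(A, eps, r=10, rng=None):
    """Grow Q one column at a time; stop once r consecutive residual vectors
    have norm below eps / (10 * sqrt(2/pi)), so that (5) suggests (6) holds."""
    rng = np.random.default_rng() if rng is None else rng
    m, n = A.shape
    threshold = eps / (10.0 * np.sqrt(2.0 / np.pi))
    Q = np.zeros((m, 0))
    consecutive_small = 0
    while consecutive_small < r:
        w = rng.standard_normal(n)              # Gaussian test vector omega^(i)
        y = A @ w                               # y^(i) = A omega^(i)
        q = y - Q @ (Q.T @ y)                   # project out the current basis
        if np.linalg.norm(q) < threshold:
            consecutive_small += 1
            continue
        consecutive_small = 0
        q = q - Q @ (Q.T @ q)                   # reproject for numerical stability
        Q = np.column_stack([Q, q / np.linalg.norm(q)])
    return Q
```

The extra reprojection of $q$ anticipates the finite-precision issue discussed in the remarks below.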

  • A potential complication of the method is that the vectors $\tilde{q}^{(i)}$ become small as the basis starts to capture most of the action of $A$. In finite-precision arithmetic, their direction is extremely unreliable. To address this problem, we simply reproject the normalized vector $q^{(i)}$ onto $\operatorname{range}\left(Q^{(i-1)}\right)^{\perp}$ in steps 7 and 8 of the above algorithm.
  • The failure probability stated for the above algorithm is pessimistic because it is derived from a simple union bound argument. In practice, the error estimator is reliable in a range of circumstances when we take $r = 10$.

References:
[1] N. Halko, P.-G. Martinsson, and J. A. Tropp, Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions, SIAM Rev., 53 (2011), pp. 217–288.
[2] J. D. Dixon, Estimating extremal eigenvalues and condition numbers of matrices, SIAM J. Numer. Anal., 20 (1983), pp. 812–814.
[3] E. Liberty, F. Woolfe, P.-G. Martinsson, V. Rokhlin, and M. Tygert, Randomized algorithms for the low-rank approximation of matrices, Proc. Natl. Acad. Sci. USA, 104 (2007), pp. 20167–20172.
[4] F. Woolfe, E. Liberty, V. Rokhlin, and M. Tygert, A fast randomized algorithm for the approximation of matrices, Appl. Comput. Harmon. Anal., 25 (2008), pp. 335–366.
