Randomized SVD
Given a matrix $A \in R^{m \times n}$, compute the largest $p$ singular values and the corresponding left and right singular vectors.
1: Implement either of the following two algorithms (implementing both counts as extra credit).
∙ The LinearTimeSVD algorithm on page 166 of Petros Drineas, Ravi Kannan, and Michael W. Mahoney, Fast Monte Carlo Algorithms for Matrices II: Computing a Low-Rank Approximation to a Matrix, SIAM J. Comput., 36(1), 158-183.
∙ The Prototype for Randomized SVD algorithm on page 227 of N. Halko, P. G. Martinsson, and J. A. Tropp, Finding Structure with Randomness: Probabilistic Algorithms for Constructing Approximate Matrix Decompositions, SIAM Rev., 53(2), 217-288.
2: Compute the $r$ largest singular values and the corresponding left and right singular vectors of a random matrix $A$, where $r \in \{5,10,15,20\}$ and $A$ is set up as follows (a set-up sketch follows the list):
∙ m = 2048,
∙ n = 512,
∙ p = 20,
∙ A = randn(m,p)*randn(p,n).
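A minimal sketch of this experiment (assuming one of the two routines implemented below, here RSVD with an oversampling parameter of 10):

% Sketch of the step-2 experiment: build a rank-p test matrix and extract
% its top-r approximate singular triplets for several values of r.
m = 2048; n = 512; p = 20;
A = randn(m,p)*randn(p,n);
for r = [5 10 15 20]
    [U,s,V] = RSVD(A,r,10);              % RSVD.m is listed in the Code section
    fprintf('r = %d, sigma_1 = %.3e\n', r, s(1));
end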
3: The data set is taken from Chapter 7 of the following reference:
∙ N. Halko, P. G. Martinsson, and J. A. Tropp, Finding Structure with Randomness: Probabilistic Algorithms for Constructing Approximate Matrix Decompositions, SIAM Rev., 53(2), 217-288.
∙ The data set is available at \url{https://github.com/WenjianYu/rSVD-single-pass}.
LinearTimeSVD
We first describe the structure of the LinearTimeSVD algorithm (see the algorithm listing on page 166 of the reference above). One step of the iteration samples a column index at random, which deserves a brief explanation (a small code sketch follows the list):
∙ First generate $n$ probabilities $p_i$ with $p_i > 0$ and $\sum_{i=1}^n p_i = 1$, where $p_i$ is the probability of selecting the $i$-th row or column.
∙ Then define $q_0 = 0$ and $q_k = \sum_{i = 1}^k p_i$.
∙ Draw a random number $p$ uniformly from $[0,1]$. There must then exist an index $K$ with $p\in (q_{K -1},q_K)$, and we select the $K$-th column.
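A minimal sketch of this inverse-CDF sampling step (assuming the probabilities $p_i$ are stored in an n-by-1 vector Pro that sums to 1):

% Sample one column index K of A according to the probabilities Pro.
q = [0; cumsum(Pro)];          % q(1) = q_0 = 0, q(k+1) = q_k = p_1 + ... + p_k
p = rand(1);                   % uniform random number in (0,1)
K = find(p <= q, 1) - 1;       % smallest K with p <= q_K, i.e. p in (q_{K-1}, q_K]
col = A(:,K);                  % the sampled column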
We also point out that in this algorithm $C\in R^{m\times c}$ and $H_k \in R^{m\times k}$, where $A^{j}$ denotes the $j$-th column of $A$ and $A_i$ denotes the $i$-th row of $A$. A few more remarks on the algorithm: it outputs $\sigma_{t}(C)$, i.e. the $k$ largest singular values of $C$, which serve as approximations of the $k$ largest singular values of $A$. The algorithm produces $c$ intermediate vectors $y_t$, $t = 1,\ldots,c$, which are the eigenvectors of $C^{T}C$ (equivalently, the right singular vectors of $C$); $h_t = C y_t / \sigma_t(C)$ is an approximation of the $t$-th left singular vector of $A$, and the right singular vectors have to be recovered from $A = U S V^{*}$.
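A small sketch of that recovery step (assuming U holds the approximate left singular vectors $H_k$ and s the corresponding singular values):

% Recover approximate right singular vectors from A = U*S*V', i.e. V = A'*U*inv(S).
V = A'*U*diag(1./s);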
Prototype for Randomized SVD algorithm
Regarding the Prototype for Randomized SVD algorithm: the paper presents the algorithm exactly as in the schematic, but if we want only the $k$ largest singular values and the corresponding left and right singular vectors, the last two steps of Stage B need a truncation. This yields the left singular vectors $Q\widetilde{U_k}$, the right singular vectors $V_k$, and the singular values $\sigma$, of which we simply keep the first $k$.
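A minimal sketch of the prototype with this truncation (assuming a target rank k and a small oversampling parameter p; the power-iteration variant actually used in the experiments is RSVD.m in the Code section):

% Stage A: orthonormal basis for the range of a Gaussian sketch of A.
Omega = randn(n, k + p);
[Q,~] = qr(A*Omega, 0);
% Stage B: small SVD of the projected matrix, then truncate to rank k.
B = Q'*A;
[Ut,S,V] = svd(B, 'econ');
U = Q*Ut(:,1:k);                % approximate left singular vectors Q*U_k
s = diag(S); s = s(1:k);        % k largest singular values
V = V(:,1:k);                   % approximate right singular vectors V_k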
Low-rank matrix recovery
The matrix completion problem is formulated as
$$\min_{X \in R^{m \times n}} \sum_{(i,j)\in \Omega}(X_{i,j} - M_{i,j})^2 + \mu \| X\|_{*}.$$
By introducing an auxiliary variable $E$, this is rewritten as
$$\min_{X, E \in R^{m \times n}} \sum_{(i,j)\in \Omega}(E_{i,j} - M_{i,j})^2 + \mu \| X\|_{*}, \quad \mathrm{s.t.}\quad X - E = 0.$$
The augmented Lagrangian function is
$$L(X,E,Y) = \mu \|X\|_{*} + \langle Y, X - E\rangle + \frac{1}{2\tau}\|X - E\|_{F}^2 + \frac{1}{2} \sum_{(i,j)\in \Omega}(E_{i,j} - M_{i,j})^2.$$
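For reference, a sketch of the ADMM updates that the code below implements (note that the E-update in the code corresponds to weighting the data-fit term by $1$ rather than $\frac{1}{2}$; $\gamma$ is the dual step size opts.gama):
$$
\begin{aligned}
X^{t+1} &= \operatorname{prox}_{\mu\tau\|\cdot\|_{*}}\left(E^{t} - \tau Y^{t}\right),\\
E^{t+1}_{i,j} &= \begin{cases} \dfrac{2\tau M_{i,j} + X^{t+1}_{i,j} + \tau Y^{t}_{i,j}}{2\tau + 1}, & (i,j)\in\Omega,\\ X^{t+1}_{i,j} + \tau Y^{t}_{i,j}, & (i,j)\notin\Omega, \end{cases}\\
Y^{t+1} &= Y^{t} + \frac{\gamma}{\tau}\left(X^{t+1} - E^{t+1}\right).
\end{aligned}
$$
The $X$-update is singular value soft-thresholding, which prox_ker.m carries out with a truncated randomized SVD.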
From the table above it appears that the randomized algorithm provides no speed-up here; in fact, when I increased the problem size to $m=500,n=500$, this phenomenon became even more pronounced.
Here is some of my own thinking. In general, we use a randomized algorithm to obtain the $k$ largest singular values, and when $k$ is small the approximate SVD is cheaper to compute. However, inside this algorithm, if $k$ is too small the approximate SVD deviates too far from the true SVD, and the net effect is that the number of iterations needed for convergence increases accordingly. When producing the results above I chose $k=40$ for $m=40,n=40$ and $k=90$ for $m=100,n=100$; these are the values used when actually running the code.
Code
This time I only wrote MATLAB code; readers should mind the indentation and whitespace themselves.
fun.m
function value = fun(mu,X,M,omega)
% Objective value of the completion model:
%   sum_{(i,j) in omega} (X_ij - M_ij)^2 + mu*||X||_*
s = svd(X);                            % singular values of X (nuclear norm = sum(s))
Y = sample(1,omega,X - M);             % residual restricted to the observed entries
value = mu*sum(s) + norm(Y,'fro')^2;   % squared data-fit term, matching the model
end
prox_ker.m
function mat = prox_ker(mu,X)
% Proximal operator of mu*||.||_* (singular value soft-thresholding),
% computed with a truncated randomized SVD instead of a full SVD.
k = 90;                                % truncation rank (k = 40 was used for m = n = 40)
%k = 40;
[U,s,V] = RSVD(X,k,k);
%[U,s,V] = LTSVD(X,k,k + 2);
d = max(s - mu,0);                     % soft-threshold the singular values
mat = U*diag(d)*V';
end
sample.m
function M = sample(mu,omega,A)
% Return mu times the projection of A onto the index set omega:
% M(i,j) = mu*A(i,j) if the linear index of (i,j) is in omega, and 0 otherwise.
[m,n] = size(A);
M = zeros(m,n);
p = length(omega);
for k = 1:p
    i = mod(omega(k),m);               % row index recovered from the linear index
    if(i == 0)
        i = m;
    end
    j = ceil(omega(k)/m);              % column index
    M(i,j) = A(i,j)*mu;
end
end
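For reference, a quick usage check of sample (the linear indices follow MATLAB's column-major convention, as in main.m):

% With mu = 1, sample returns A on omega and zero elsewhere.
A = magic(4); omega = [1 6 11 16];     % linear indices of the diagonal entries
P = sample(1,omega,A);                 % P is diagonal, equal to diag(diag(A))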
RSVD.m
function [U,s,V] = RSVD(A,k,p)
% Randomized SVD with one step of power iteration: sketch Y = A*(A'*(A*Omega)).
% k is the target rank, p the oversampling parameter.
[m,n] = size(A);
omega = randn(n,k + p);
Y = A*(A'*(A*omega));                  % power iteration sharpens the captured subspace
[Q,~] = qr(Y,0);                       % economy QR so that Q is m-by-(k+p)
[U_,s_,V_] = svd(Q'*A,'econ');         % small SVD of the (k+p)-by-n projected matrix
U = Q*U_(:,1:k);                       % approximate left singular vectors
s = diag(s_);
s = s(1:k);                            % k largest approximate singular values
V = V_(:,1:k);                         % approximate right singular vectors
end
LTSVD.m
function [U,s,V] = LTSVD(A,k,c)
% LinearTimeSVD: sample c columns of A with probabilities Pro, rescale them to
% form C, and read the top-k singular triplets off the small matrix C'*C.
[m,n] = size(A);
pro = rand(n,1);                       % arbitrary positive sampling probabilities;
Pro = pro/sum(pro);                    % the paper's choice is Pro(j) ~ norm(A(:,j))^2
q = [0; cumsum(Pro)];                  % q(j+1) = p_1 + ... + p_j
C = zeros(m,c);
for i = 1:c
    pra = rand(1);
    j = find(pra <= q,1) - 1;          % column index with pra in (q(j), q(j+1)]
    C(:,i) = A(:,j)/sqrt(c*Pro(j));    % rescale by sqrt(c*p_j) as in the algorithm
end
[us,ss,~] = svd(C'*C);                 % eigenvectors/eigenvalues of C'*C (c-by-c)
sigma = sqrt(diag(abs(ss)));           % singular values of C
tol = 1e-8;                            % guards against division by tiny singular values
s = sigma(1:k);
U = C*us(:,1:k)*diag(1./max(s,tol));   % h_t = C*y_t/sigma_t: approximate left vectors
V = A'*U*diag(1./max(s,tol));          % right vectors recovered from A = U*S*V'
end
ADMM.m
function [X,out] = ADMM(mu,X0,M,opts)
% ADMM for  min_X sum_{(i,j) in Omega}(X_ij - M_ij)^2 + mu*||X||_*
% using the splitting X = E and the augmented Lagrangian above.
st = tic;
[m,n] = size(M);
X_old = X0;
E_old = X_old;
Y_old = X_old;
gama = opts.gama;
k = 0;
while k < opts.epoch
    % X-update: singular value thresholding of E - tau*Y (via prox_ker)
    X_new = prox_ker(mu*opts.tau,E_old - opts.tau*Y_old);
    % E-update: closed form, treated separately on and off the observed set
    E_new = sample(1/(2*opts.tau + 1),opts.omega,2*opts.tau*M + X_new + opts.tau*Y_old) + ...
        sample(1,opts.ome,X_new + opts.tau*Y_old);
    % dual update with step size gama/tau
    Y_new = Y_old + gama*(X_new - E_new)/opts.tau;
    rho = norm(X_new - X_old,'fro');
    if rho < opts.eps
        break;
    end
    if mod(k,50) == 0
        gama = gama*0.96;              % slowly shrink the dual step size
        err = norm(sample(1,opts.omega,X_new - M),'fro');
        fprintf('the epoch:%d,the norm:%.3e,the err:%.3e\n',k,rho,err);
    end
    X_old = X_new;
    Y_old = Y_new;
    E_old = E_new;
    k = k + 1;
end
X = X_new;
out.time = toc(st);
out.value = fun(mu,X,M,opts.omega);
out.error = norm(X_new - opts.A,'fro');
out.Me = norm(sample(1,opts.omega,X_new - M),'fro');
out.rank = rank(X);
out.epoch = k;
end
main.m
clc;
% Problem set-up: an m-by-n rank-r ground-truth matrix with a fraction sr of
% its entries observed.
m = 100; n = 100; sr = 0.3; p = round(m*n*sr); r = 3;
fr = r*(m+n-r)/p;                                 % ratio of degrees of freedom to samples
maxr = floor(((m+n)-sqrt((m+n)^2-4*p))/2);        % largest rank recoverable from p samples
rs = 2021; randn('state',rs); rand('state',rs);   % fix the random seed (rng(rs) is the modern form)
xl = randn(m,r); xr = randn(n,r); A = xl*xr';     % rank-r ground truth
Omega = randperm(m*n); omega = Omega(1:p);        % observed linear indices
ome = Omega(p + 1:m*n);                           % unobserved linear indices
M = sample(1,omega,A);                            % observed data
mu = 0.002;
X0 = randn(m,n);
%----------------------------
opts = struct();
opts.epoch = 2000;
opts.tau = 1e3;
opts.eps = 1e-8;
opts.A = A;
opts.gama = 1.618;
opts.omega = omega;
opts.ome = ome;
%-------------------
out = struct();
[x,out] = ADMM(mu,X0,M,opts);
fprintf('the time:%.2f,the error:%.3e,the X - M on omega:%.3e,the value:%.4e,the rank:%d,the epoch:%d\n',...
out.time,out.error,out.Me,out.value,out.rank,out.epoch);