Evaluating Tensor Completion in the MATLAB Tensor Toolbox and N-Way Toolbox
The relatively mature MATLAB toolboxes in this space are the Tensor Toolbox, the N-way Toolbox, and the TT-Toolbox (TT = Tensor Train). Among them, the functions that perform tensor completion via tensor decomposition are:
- cp_wopt (CP via optimization): the weighted direct optimization (WOPT) method in the Tensor Toolbox for handling missing data. It adapts cp_opt to the missing-data setting.
- parafac (CP via ALS): the CP decomposition function in the N-way toolbox. Missing entries are set to NaN and handled with an EM-style algorithm; it is essentially a missing-data variant of cp_als.
- tucker (Tucker decomposition): the Tucker decomposition function in the N-way toolbox; simply set the missing entries to NaN. A minimal calling sketch for the two CP entry points follows below.
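The sketch below shows how the two CP-based completion routines are typically invoked, assuming both toolboxes are on the path; the sizes, rank, and observation mask are made up for illustration:

```matlab
% Synthetic data with roughly 70% of the entries observed.
sz = [20 20 20]; R = 5;
W  = tensor(double(rand(sz) > 0.3));   % weight tensor: 1 = observed, 0 = missing
X  = tensor(randn(sz)) .* W;           % zero out the missing entries
M  = cp_wopt(X, W, R);                 % weighted optimization (Tensor Toolbox)

Xn = double(X);
Xn(double(W) == 0) = NaN;              % N-way convention: missing entries = NaN
Factors = parafac(Xn, R);              % EM-style handling of NaN (N-way toolbox)
```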
The various matrix products
The Kronecker product multiplies each element of the left matrix by the entire right matrix, assembling a large block matrix:
$$\mathbf{A} \otimes \mathbf{B}=\left[\begin{array}{cccc} a_{11} \mathbf{B} & a_{12} \mathbf{B} & \cdots & a_{1J} \mathbf{B} \\ a_{21} \mathbf{B} & a_{22} \mathbf{B} & \cdots & a_{2J} \mathbf{B} \\ \vdots & \vdots & \ddots & \vdots \\ a_{I1} \mathbf{B} & a_{I2} \mathbf{B} & \cdots & a_{IJ} \mathbf{B} \end{array}\right]$$
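In MATLAB this is exactly the built-in kron; a tiny check:

```matlab
A = [1 2; 3 4];
B = eye(2);
K = kron(A, B);   % 4x4 block matrix [a11*B a12*B; a21*B a22*B]
```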
The Khatri-Rao product takes two matrices with the same number of columns and forms the Kronecker product column by column:
$$\mathbf{A} \odot \mathbf{B}=\left[\begin{array}{cccc} \mathbf{a}_1 \otimes \mathbf{b}_1 & \mathbf{a}_2 \otimes \mathbf{b}_2 & \cdots & \mathbf{a}_K \otimes \mathbf{b}_K \end{array}\right]$$
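The Tensor Toolbox ships a khatrirao function; the sketch below checks it against the columnwise-Kronecker definition (sizes are arbitrary):

```matlab
A = rand(3,4); B = rand(5,4);            % same number of columns
KR = khatrirao(A, B);                    % 15x4
KR2 = zeros(15,4);
for k = 1:4
    KR2(:,k) = kron(A(:,k), B(:,k));     % column-by-column definition
end
norm(KR - KR2, 'fro')                    % 0
```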
The Hadamard product of two matrices of the same size is the elementwise product:
$$\mathbf{A} * \mathbf{B}=\left[\begin{array}{cccc} a_{11} b_{11} & a_{12} b_{12} & \cdots & a_{1J} b_{1J} \\ a_{21} b_{21} & a_{22} b_{22} & \cdots & a_{2J} b_{2J} \\ \vdots & \vdots & \ddots & \vdots \\ a_{I1} b_{I1} & a_{I2} b_{I2} & \cdots & a_{IJ} b_{IJ} \end{array}\right]$$
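In MATLAB the Hadamard product is simply elementwise multiplication:

```matlab
A = magic(3); B = 2*ones(3);
H = A .* B;    % elementwise: H(i,j) = A(i,j)*B(i,j)
```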
These products satisfy the following identities:
$$\begin{aligned} (\mathbf{A} \otimes \mathbf{B})(\mathbf{C} \otimes \mathbf{D}) &=\mathbf{A}\mathbf{C} \otimes \mathbf{B}\mathbf{D} \\ (\mathbf{A} \otimes \mathbf{B})^{\dagger} &=\mathbf{A}^{\dagger} \otimes \mathbf{B}^{\dagger} \\ \mathbf{A} \odot \mathbf{B} \odot \mathbf{C} &=(\mathbf{A} \odot \mathbf{B}) \odot \mathbf{C}=\mathbf{A} \odot(\mathbf{B} \odot \mathbf{C}) \\ (\mathbf{A} \odot \mathbf{B})^{\top}(\mathbf{A} \odot \mathbf{B}) &=\mathbf{A}^{\top} \mathbf{A} * \mathbf{B}^{\top} \mathbf{B} \\ (\mathbf{A} \odot \mathbf{B})^{\dagger} &=\left(\left(\mathbf{A}^{\top} \mathbf{A}\right) *\left(\mathbf{B}^{\top} \mathbf{B}\right)\right)^{\dagger}(\mathbf{A} \odot \mathbf{B})^{\top} \end{aligned}$$
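The fourth identity, which the ALS derivation below relies on, is easy to verify numerically (khatrirao is from the Tensor Toolbox; sizes are arbitrary):

```matlab
A = rand(6,3); B = rand(4,3);
KR = khatrirao(A, B);
norm(KR'*KR - (A'*A) .* (B'*B), 'fro')   % ~1e-15: (A kr B)'*(A kr B) == (A'*A).*(B'*B)
```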
The ALS algorithm
Alternating least squares (ALS) is one of the more effective methods for the tensor CP decomposition problem; its basic idea is as follows. Taking a third-order tensor as an example, the problem it actually optimizes is:
$$\min_{\hat{\boldsymbol{X}}}\|\boldsymbol{X}-\hat{\boldsymbol{X}}\| \quad \text{with} \quad \hat{\boldsymbol{X}}=\sum_{r=1}^R \lambda_r\, \mathbf{a}_r \circ \mathbf{b}_r \circ \mathbf{c}_r=[\boldsymbol{\lambda}; \mathbf{A}, \mathbf{B}, \mathbf{C}]$$
ALS first fixes B and C to solve for A, then fixes A and C to solve for B, then fixes A and B to solve for C, and repeats the whole cycle until a convergence criterion is met.
Take one of these subproblems, say fixing B and C to solve for A; it amounts to optimizing
$$\min_{\hat{\mathbf{A}}}\left\|\mathbf{X}_{(1)}-\hat{\mathbf{A}}(\mathbf{C} \odot \mathbf{B})^{\top}\right\|_F$$
Here $\hat{\mathbf{A}}=\mathbf{A} \cdot \operatorname{diag}(\boldsymbol{\lambda})$, and $\mathbf{X}_{(1)}$ denotes the tensor unfolded into a matrix along the first mode. For example, take a tensor $\boldsymbol{X} \in \mathbb{R}^{3 \times 4 \times 2}$ with frontal slices
$$\mathbf{X}_1=\left[\begin{array}{cccc} 1 & 4 & 7 & 10 \\ 2 & 5 & 8 & 11 \\ 3 & 6 & 9 & 12 \end{array}\right], \quad \mathbf{X}_2=\left[\begin{array}{cccc} 13 & 16 & 19 & 22 \\ 14 & 17 & 20 & 23 \\ 15 & 18 & 21 & 24 \end{array}\right]$$
Unfolding with each mode leading in turn gives the mode-n unfoldings:
$$\begin{aligned} \mathbf{X}_{(1)}&=\left[\begin{array}{cccccccc} 1 & 4 & 7 & 10 & 13 & 16 & 19 & 22 \\ 2 & 5 & 8 & 11 & 14 & 17 & 20 & 23 \\ 3 & 6 & 9 & 12 & 15 & 18 & 21 & 24 \end{array}\right] \\ \mathbf{X}_{(2)}&=\left[\begin{array}{cccccc} 1 & 2 & 3 & 13 & 14 & 15 \\ 4 & 5 & 6 & 16 & 17 & 18 \\ 7 & 8 & 9 & 19 & 20 & 21 \\ 10 & 11 & 12 & 22 & 23 & 24 \end{array}\right] \\ \mathbf{X}_{(3)}&=\left[\begin{array}{cccccccccc} 1 & 2 & 3 & 4 & 5 & \cdots & 9 & 10 & 11 & 12 \\ 13 & 14 & 15 & 16 & 17 & \cdots & 21 & 22 & 23 & 24 \end{array}\right] \end{aligned}$$
Moreover, from the identity
$$\begin{aligned} \boldsymbol{Y}&=\boldsymbol{X} \times_1 \mathbf{A}^{(1)} \times_2 \mathbf{A}^{(2)} \cdots \times_N \mathbf{A}^{(N)} \;\Leftrightarrow\; \\ \mathbf{Y}_{(n)}&=\mathbf{A}^{(n)} \mathbf{X}_{(n)}\left(\mathbf{A}^{(N)} \otimes \cdots \otimes \mathbf{A}^{(n+1)} \otimes \mathbf{A}^{(n-1)} \otimes \cdots \otimes \mathbf{A}^{(1)}\right)^{\top}. \end{aligned}$$
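One can check the matricized form of this identity with ttm and tenmat from the Tensor Toolbox, e.g. for n = 2 of a third-order tensor (all sizes made up):

```matlab
X  = tensor(randn(3,4,2));
A1 = randn(5,3); A2 = randn(6,4); A3 = randn(4,2);
Y  = ttm(X, {A1, A2, A3});                       % Y = X x1 A1 x2 A2 x3 A3
Y2 = A2 * double(tenmat(X,2)) * kron(A3, A1)';   % unfolded form for n = 2
norm(double(tenmat(Y,2)) - Y2, 'fro')            % ~1e-13
```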
Using this identity, the mode-n unfoldings of the CP model can be written as
$$\begin{aligned} \mathbf{X}_{(1)} &\approx \mathbf{A}(\mathbf{C} \odot \mathbf{B})^{\top}, \\ \mathbf{X}_{(2)} &\approx \mathbf{B}(\mathbf{C} \odot \mathbf{A})^{\top}, \\ \mathbf{X}_{(3)} &\approx \mathbf{C}(\mathbf{B} \odot \mathbf{A})^{\top}. \end{aligned}$$
Hence the matrix optimization problem above, which is just an ordinary least-squares problem. One solution is
$$\hat{\mathbf{A}}=\mathbf{X}_{(1)}\left[(\mathbf{C} \odot \mathbf{B})^{\top}\right]^{\dagger}$$
Applying the product identities above gives
$$\hat{\mathbf{A}}=\mathbf{X}_{(1)}(\mathbf{C} \odot \mathbf{B})\left(\mathbf{C}^{\top} \mathbf{C} * \mathbf{B}^{\top} \mathbf{B}\right)^{\dagger}$$
After computing $\hat{\mathbf{A}}$, normalize its columns and extract the weights $\lambda_r=\left\|\hat{\mathbf{a}}_r\right\|$. Notice that in the end we only need the pseudoinverse of an $R \times R$ matrix, so the cost of each iteration step is tiny.
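Assembled into MATLAB, a single ALS update of the first factor might look like the sketch below (khatrirao is from the Tensor Toolbox; X1 is the mode-1 unfolding, and B, C, and the rank are assumed to be in scope):

```matlab
% One ALS step for A: only an RxR pseudoinverse is needed.
Ahat   = X1 * khatrirao(C, B) * pinv((C'*C) .* (B'*B));
lambda = sqrt(sum(Ahat.^2, 1));   % column norms, extracted as the weights
A      = Ahat ./ lambda;          % normalized factor (implicit expansion, R2016b+)
```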
The same derivation extends mode by mode to higher-order tensors.
cp_als VS parafac VS parafac with NaN
Let us now compare the efficiency and accuracy of the ALS algorithm and its variants. cp_als comes from the Tensor Toolbox; parafac comes from the N-way toolbox. The dataset data274D5333T100.mat serves as the benchmark; totalT denotes the total time cost and $\epsilon$ the relative error.
For cp_als with Rank set to 100: totalT = 0.428248, $\epsilon = 5.6475 \times 10^{-4}$.
For the N-way toolbox, Rank = 10 (anything larger and MATLAB cannot allocate the arrays). parafac: totalT = 9.4, $\epsilon = 0.016$. With 85% of the entries set to NaN (parafac with 0.85 NaN), one would expect it to be faster, yet the result is even more absurd: totalT = 58.95, $\epsilon = 0.0125$. With whole slices missing the result is worse still: totalT = 183, $\epsilon = 0.279$.
The N-way implementation is remarkably naive, which makes it very slow. For example, when it needs the Khatri-Rao product of all factor matrices except one, it computes the products one matrix at a time, which is anything but smart.
The conclusion: for large tensors the N-way toolbox is poorly implemented and very slow. tucker is unlikely to fare any better, so I did not try it; parafac is simply unusable here. For high-order tensors the whole N-way toolbox feels clumsy.
cp_opt VS cp_wopt
Optimization-based iterative methods always take a long time, and the results are not necessarily better; cp_wopt can even overfit. Slow and inaccurate, these optimization-based approaches are best abandoned here.
Improving cp_als
Since the existing methods perform so poorly, we build an improvement on top of the fast decomposition instead. The small change: during each alternating least-squares step of cp_als, overwrite the entries at the known positions with their known values. I call the resulting modified algorithm cp_tcals.
We compare it with the existing SPC method, again using data274D5333T100.mat as the benchmark.
For cp_tcals:
| $\epsilon$ | 0.0985 | 0.0467 | 0.0212 | 0.0104 | 0.0050 | 0.0031 |
| --- | --- | --- | --- | --- | --- | --- |
| Time cost | 1.335 | 2.281213 | 3.44 | 4.891780 | 6.813247 | 8.718322 |
SPC, by contrast, needs a time cost of 111 to reach $\epsilon = 0.0067$.
I believe TT and Tucker decompositions can be extended to completion in the same straightforward way.
The code is as follows:
```matlab
function [P,Uinit,output] = cp_tcals(X,R,Tind,Tval,tolerance,varargin)
%CP_TCALS Compute a CP decomposition with completion of missing entries.
%
%   M = CP_TCALS(X,R,Tind,Tval,tolerance) computes an estimate of the best
%   rank-R CP model of a tensor X using an alternating least-squares
%   algorithm, restoring the observed entries (indices Tind, values Tval)
%   into the reconstruction at every inner step. Iteration stops early once
%   the error against the global ground truth Y_true falls below
%   tolerance. The result M is a ktensor.
%
%   M = CP_TCALS(X,R,Tind,Tval,tolerance,'param',value,...) specifies
%   optional parameters and values. Valid parameters and their default
%   values are:
%      'tol' - Tolerance on difference in fit {1.0e-4}
%      'maxiters' - Maximum number of iterations {50}
%      'dimorder' - Order to loop through dimensions {1:ndims(A)}
%      'init' - Initial guess [{'random'}|'nvecs'|cell array]
%      'printitn' - Print fit every n iterations; 0 for no printing {1}
%      'fixsigns' - Call fixsigns at end of iterations {true}
%
%   [M,U0] = CP_TCALS(...) also returns the initial guess.
%
%   [M,U0,out] = CP_TCALS(...) also returns additional output that contains
%   the input parameters.
%
% Note: The "fit" is defined as 1 - norm(X-full(M))/norm(X) and is
% loosely the proportion of the data described by the CP model, i.e., a
% fit of 1 is perfect.
%
% NOTE: Updated in various minor ways per work of Phan Anh Huy. See Anh
% Huy Phan, Petr Tichavský, Andrzej Cichocki, On Fast Computation of
% Gradients for CANDECOMP/PARAFAC Algorithms, arXiv:1204.1586, 2012.
%
% Examples (inherited from cp_als; cp_tcals additionally takes Tind, Tval, tolerance):
% X = sptenrand([5 4 3], 10);
% M = cp_als(X,2);
% M = cp_als(X,2,'dimorder',[3 2 1]);
% M = cp_als(X,2,'dimorder',[3 2 1],'init','nvecs');
% U0 = {rand(5,2),rand(4,2),[]}; %<-- Initial guess for factors of M
% [M,U0,out] = cp_als(X,2,'dimorder',[3 2 1],'init',U0);
% M = cp_als(X,2,out.params); %<-- Same params as previous run
%
% <a href="matlab:web(strcat('file://',fullfile(getfield(what('tensor_toolbox'),'path'),'doc','html','cp_als_doc.html')))">Documentation page for CP-ALS</a>
%
% See also KTENSOR, TENSOR, SPTENSOR, TTENSOR.
%
%Tensor Toolbox for MATLAB: <a href="https://www.tensortoolbox.org">www.tensortoolbox.org</a>
%% Extract number of dimensions and norm of X.
N = ndims(X);
normX = norm(X);
%% Set algorithm parameters from input or by using defaults
params = inputParser;
params.addParameter('tol',1e-4,@isscalar);
params.addParameter('maxiters',50,@(x) isscalar(x) & x > 0);
params.addParameter('dimorder',1:N,@(x) isequal(sort(x),1:N));
params.addParameter('init', 'random', @(x) (iscell(x) || ismember(x,{'random','nvecs'})));
params.addParameter('printitn',1,@isscalar);
params.addParameter('fixsigns',true,@islogical);
params.parse(varargin{:});
%% Copy from params object
fitchangetol = params.Results.tol;
maxiters = params.Results.maxiters;
dimorder = params.Results.dimorder;
init = params.Results.init;
printitn = params.Results.printitn;
%% Error checking
%% Set up and error checking on initial guess for U.
if iscell(init)
Uinit = init;
if numel(Uinit) ~= N
error('OPTS.init does not have %d cells',N);
end
for n = dimorder(2:end)
if ~isequal(size(Uinit{n}),[size(X,n) R])
error('OPTS.init{%d} is the wrong size',n);
end
end
else
% Observe that we don't need to calculate an initial guess for the
% first index in dimorder because that will be solved for in the first
% inner iteration.
if strcmp(init,'random')
Uinit = cell(N,1);
for n = dimorder(2:end)
Uinit{n} = rand(size(X,n),R);
end
elseif strcmp(init,'nvecs') || strcmp(init,'eigs')
Uinit = cell(N,1);
for n = dimorder(2:end)
Uinit{n} = nvecs(X,n,R);
end
else
error('The selected initialization method is not supported');
end
end
%% Set up for iterations - initializing U and the fit.
U = Uinit;
fit = 0;
% Store the last MTTKRP result to accelerate fitness computation.
U_mttkrp = zeros(size(X, dimorder(end)), R);
if printitn>0
fprintf('\nTC_CP_ALS:\n');
end
%% Main Loop: Iterate until convergence
% The C++ fast path inherited from cp_als is disabled here: cpals_core would
% bypass the known-entry replacement that cp_tcals performs inside the loop.
if false % (isa(X,'sptensor') || isa(X,'tensor')) && (exist('cpals_core','file') == 3)
    [lambda,U] = cpals_core(X, Uinit, fitchangetol, maxiters, dimorder);
    P = ktensor(lambda,U);
else
UtU = zeros(R,R,N);
for n = 1:N
if ~isempty(U{n})
UtU(:,:,n) = U{n}'*U{n};
end
end
for iter = 1:maxiters
fitold = fit;
% Iterate over all N modes of the tensor
for n = dimorder(1:end)
% Calculate Unew = X_(n) * khatrirao(all U except n, 'r').
Unew = mttkrp(X,U,n);
% Save the last MTTKRP result for fitness check.
if n == dimorder(end)
U_mttkrp = Unew;
end
% Compute the matrix of coefficients for linear system
Y = prod(UtU(:,:,[1:n-1 n+1:N]),3);
Unew = Unew / Y;
if issparse(Unew)
Unew = full(Unew); % for the case R=1
end
% Normalize each vector to prevent singularities in coefmatrix
if iter == 1
lambda = sqrt(sum(Unew.^2,1))'; %2-norm
else
lambda = max( max(abs(Unew),[],1), 1 )'; %max-norm
end
Unew = bsxfun(@rdivide, Unew, lambda');
U{n} = Unew;
UtU(:,:,n) = U{n}'*U{n};
% cp_tcals modification: rebuild the dense estimate from the current
% factors, then restore the observed entries before the next mode update.
P = ktensor(lambda,U);
X = tensor(P);
X(Tind) = Tval;
normX = norm(X);
end
P = ktensor(lambda,U);
% This is equivalent to innerprod(X,P).
iprod = sum(sum(P.U{dimorder(end)} .* U_mttkrp) .* lambda');
if normX == 0
fit = norm(P)^2 - 2 * iprod;
else
normresidual = sqrt( normX^2 + norm(P)^2 - 2 * iprod );
fit = 1 - (normresidual / normX); %fraction explained by model
end
fitchange = abs(fitold - fit);
% Check for convergence
if (iter > 1) && (fitchange < fitchangetol)
flag = 0;
else
flag = 1;
end
if (mod(iter,printitn)==0) || ((printitn>0) && (flag==0))
fprintf(' Iter %2d: f = %e f-delta = %7.1e\n', iter, fit, fitchange);
end
global Y_true  % ground truth, used only for this early-stopping check
err = cal_acc(double(X),double(Y_true));  % cal_acc: external relative-error helper
if (err < tolerance)
break;
end
% Check for convergence
if (flag == 0)
break;
end
end
end
%% Clean up final result
% Arrange the final tensor so that the columns are normalized.
P = arrange(P);
% Fix the signs
if params.Results.fixsigns
P = fixsigns(P);
end
if printitn>0
if normX == 0
fit = norm(P)^2 - 2 * innerprod(X,P);
else
normresidual = sqrt( normX^2 + norm(P)^2 - 2 * innerprod(X,P) );
fit = 1 - (normresidual / normX); %fraction explained by model
end
fprintf(' Final f = %e \n', fit);
end
output = struct;
output.params = params.Results;
output.iters = iter;
```
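A hypothetical driver for cp_tcals might look like the following; the index format and the Y_true/cal_acc conventions follow the code above, and cal_acc itself is assumed to be on the path:

```matlab
% Set up a synthetic completion problem and run cp_tcals on it.
global Y_true
sz     = [20 20 20];
Y_true = tensor(randn(sz));               % ground truth for the stopping test
Tind   = find(rand(prod(sz),1) < 0.15);   % ~15% observed entries
Tval   = Y_true(Tind);                    % (tensor accepts a column of linear indices)
X0 = tensor(zeros(sz)); X0(Tind) = Tval;  % zero-filled initialization
P  = cp_tcals(X0, 10, Tind, Tval, 1e-2);  % rank 10, stop once err < 1e-2
```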