Self-Study Scaffolding: "Data-Driven Science and Engineering" by Steven L. Brunton (Chapter 1.9)

teengad

Last modified 2022-09-14 11:07:47

Categories: Machine Learning, Self-Study Scaffolding Series. Tags: data-driven, tensor, svd

First published 2022-05-14 04:36:07

Article link: https://blog.csdn.net/qq_32515081/article/details/124763298

Contents

- 1.9 Tensor Decompositions and N-Way Data Arrays

1.9 Tensor Decompositions and N-Way Data Arrays

Kronecker product

Let $\textbf{A}$ be an m×n matrix and $\textbf{B}$ a p×q matrix:

$\textbf{A}= \begin{bmatrix} a_{11}&\cdots&a_{1n}\\ \vdots&\ddots&\vdots\\ a_{m1}&\cdots&a_{mn}\\ \end{bmatrix}$

$\textbf{B}= \begin{bmatrix} b_{11}&\cdots&b_{1q}\\ \vdots&\ddots&\vdots\\ b_{p1}&\cdots&b_{pq}\\ \end{bmatrix}$

The Kronecker product of $\textbf{A}$ and $\textbf{B}$ is then an mp×nq matrix. The operation is defined for matrices of any size, and is written as:

$\textbf{A}\otimes \textbf{B}= \begin{bmatrix} a_{11}\textbf{B}& \cdots &a_{1n}\textbf{B}\\ \vdots&\ddots&\vdots\\ a_{m1}\textbf{B}&\cdots &a_{mn}\textbf{B}\\ \end{bmatrix}$

$A\otimes B= \begin{bmatrix} a_{11}b_{11}& \cdots &a_{11}b_{1q}&\cdots&\cdots&a_{1n}b_{11}& \cdots &a_{1n}b_{1q}\\ a_{11}b_{21}& \cdots &a_{11}b_{2q}&\cdots&\cdots&a_{1n}b_{21}& \cdots &a_{1n}b_{2q}\\ \vdots&\ddots&\vdots&&&\vdots&\ddots&\vdots\\ a_{11}b_{p1}& \cdots &a_{11}b_{pq}&\cdots&\cdots&a_{1n}b_{p1}& \cdots &a_{1n}b_{pq}\\ \vdots&&\vdots&\ddots&&\vdots&&\vdots\\ \vdots&&\vdots&&\ddots&\vdots&&\vdots\\ a_{m1}b_{11}& \cdots &a_{m1}b_{1q}&\cdots&\cdots&a_{mn}b_{11}& \cdots &a_{mn}b_{1q}\\ a_{m1}b_{21}& \cdots &a_{m1}b_{2q}&\cdots&\cdots&a_{mn}b_{21}& \cdots &a_{mn}b_{2q}\\ \vdots&\ddots&\vdots&&&\vdots&\ddots&\vdots\\ a_{m1}b_{p1}& \cdots &a_{m1}b_{pq}&\cdots&\cdots&a_{mn}b_{p1}& \cdots &a_{mn}b_{pq}\\ \end{bmatrix}$

The Kronecker product is a special case of the tensor product and has the following properties:

$\begin{aligned} \textbf{A}\otimes(\textbf{B}+\textbf{C})&=\textbf{A}\otimes \textbf{B}+\textbf{A}\otimes \textbf{C}\\ (\textbf{A}+\textbf{B})\otimes \textbf{C}&=\textbf{A}\otimes \textbf{C}+\textbf{B}\otimes \textbf{C}\\ (k\textbf{A})\otimes \textbf{B}&=k(\textbf{A}\otimes \textbf{B})\\ (\textbf{A}\otimes \textbf{B})\otimes \textbf{C}&=\textbf{A}\otimes (\textbf{B}\otimes \textbf{C})\\ \end{aligned}$

However, the operation is not commutative. In general:

$\textbf{A}\otimes \textbf{B}\neq \textbf{B}\otimes \textbf{A}$
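
The properties above can be checked numerically. The following is a small sketch using NumPy's `np.kron`; the matrices `A`, `B` and `C` are arbitrary illustrative values, not taken from the book.

```python
import numpy as np

# Illustrative matrices: A is 2 x 3, B and C are 2 x 2.
A = np.array([[1, 2, 3], [4, 5, 6]])
B = np.array([[0, 1], [1, 0]])
C = np.array([[2, 0], [0, 3]])

K = np.kron(A, B)  # shape (2*2, 3*2) = (4, 6)

# Block (1, 2) of K (rows 0-1, columns 2-3) equals a_12 * B.
block_01 = K[0:2, 2:4]

# Distributivity and associativity hold exactly for these integer matrices.
distributive = np.array_equal(np.kron(A, B + C), np.kron(A, B) + np.kron(A, C))
associative = np.array_equal(np.kron(np.kron(A, B), C), np.kron(A, np.kron(B, C)))

# Commutativity fails for this pair, even though both products are 4 x 6.
commutative = np.array_equal(np.kron(A, B), np.kron(B, A))

print(K.shape, distributive, associative, commutative)
```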

Khatri-Rao product

In mathematics, the Khatri-Rao product is defined as:

$\textbf{A}\odot\textbf{B}=(\textbf{A}_{ij}\otimes\textbf{B}_{ij})$

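Here the ij-th block is the Kronecker product of the corresponding blocks of $\textbf{A}$ and $\textbf{B}$. If $\textbf{A}_{ij}$ has size $m_{i}\times n_{j}$ and $\textbf{B}_{ij}$ has size $p_{i}\times q_{j}$, this block has size $m_{i}p_{i}\times n_{j}q_{j}$. This assumes that $\textbf{A}$ and $\textbf{B}$ are partitioned into the same number of row blocks and the same number of column blocks.

For example, let $\textbf{A}$ and $\textbf{B}$ both be 2×2 block matrices: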

$\textbf{A}= \begin{bmatrix} \def\arraystretch{1} \begin{array}{c:c} \textbf{A}_{11}&\textbf{A}_{12} \\ \hline \textbf{A}_{21}&\textbf{A}_{22} \\ \end{array} \end{bmatrix} =\begin{bmatrix} \def\arraystretch{1} \begin{array}{cc:c} 1&2&3 \\ 4&5&6\\ \hline 7&8&9\\ \end{array} \end{bmatrix}$

$\textbf{B}= \begin{bmatrix} \def\arraystretch{1} \begin{array}{c:c} \textbf{B}_{11}&\textbf{B}_{12} \\ \hline \textbf{B}_{21}&\textbf{B}_{22} \\ \end{array} \end{bmatrix} =\begin{bmatrix} \def\arraystretch{1} \begin{array}{c:cc} 1&4&7 \\ \hline 2&5&8\\ 3&6&9\\ \end{array} \end{bmatrix}$

$\textbf{A}\odot\textbf{B}= \begin{bmatrix} \def\arraystretch{1} \begin{array}{c:c} \textbf{A}_{11}\otimes\textbf{B}_{11}&\textbf{A}_{12}\otimes\textbf{B}_{12} \\ \hline \textbf{A}_{21}\otimes\textbf{B}_{21}&\textbf{A}_{22}\otimes\textbf{B}_{22} \\ \end{array} \end{bmatrix} =\begin{bmatrix} \def\arraystretch{1} \begin{array}{cc:cc} 1&2&12&21 \\ 4&5&24&42\\ \hline 14&16&45&72\\ 21&24&54&81\\ \end{array} \end{bmatrix}$
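
The block-wise example above can be reproduced with a short NumPy sketch. The partition points are hard-coded to match the example, and the names `A_blocks` and `B_blocks` are only illustrative.

```python
import numpy as np

A = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
B = np.array([[1, 4, 7], [2, 5, 8], [3, 6, 9]])

# A is split after row 2 and column 2; B is split after row 1 and column 1.
A_blocks = [[A[:2, :2], A[:2, 2:]], [A[2:, :2], A[2:, 2:]]]
B_blocks = [[B[:1, :1], B[:1, 1:]], [B[1:, :1], B[1:, 1:]]]

# Kronecker product of matching blocks, reassembled into one matrix.
khatri_rao_blocks = np.block(
    [[np.kron(A_blocks[i][j], B_blocks[i][j]) for j in range(2)] for i in range(2)]
)
print(khatri_rao_blocks)
```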

Note that the above is the general form of the Khatri-Rao product. There are two other forms: the column-wise Kronecker product and the face-splitting product.

Column-wise Kronecker product

The column-wise Kronecker product of two matrices is also called the Khatri-Rao product. This product assumes that the blocks of the matrices are their columns. In this case $m_{1}=m$, $p_{1}=p$, $n = q$, and for each j: $n_{j}=q_{j}=1$. The result is an mp×n matrix in which each column is the Kronecker product of the corresponding columns of $\textbf{A}$ and $\textbf{B}$. For example:

$\textbf{C}= \begin{bmatrix} \def\arraystretch{1} \begin{array}{c:c:c} \textbf{C}_{1}&\textbf{C}_{2}&\textbf{C}_{3} \\ \end{array} \end{bmatrix} =\begin{bmatrix} \def\arraystretch{1} \begin{array}{c:c:c} 1&2&3 \\ 4&5&6\\ 7&8&9\\ \end{array} \end{bmatrix}$

$\textbf{D}= \begin{bmatrix} \def\arraystretch{1} \begin{array}{c:c:c} \textbf{D}_{1}&\textbf{D}_{2}&\textbf{D}_{3} \\ \end{array} \end{bmatrix} =\begin{bmatrix} \def\arraystretch{1} \begin{array}{c:c:c} 1&4&7\\ 2&5&8\\ 3&6&9\\ \end{array} \end{bmatrix}$

Then:

$\textbf{C}\odot\textbf{D}= \begin{bmatrix} \def\arraystretch{1} \begin{array}{c:c:c} \textbf{C}_{1}\otimes\textbf{D}_{1}&\textbf{C}_{2}\otimes\textbf{D}_{2}&\textbf{C}_{3}\otimes\textbf{D}_{3} \\ \end{array} \end{bmatrix} =\begin{bmatrix} \def\arraystretch{1} \begin{array}{c:c:c} 1&8&21 \\ 2&10&24 \\ 3&12&27 \\ 4&20&42 \\ 8&25&48 \\ 12&30&54 \\ 7&32&63 \\ 14&40&72 \\ 21&48&81 \\ \end{array} \end{bmatrix}$

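This column-wise version of the Khatri-Rao product is useful in linear-algebra approaches to data analysis and in optimizing the solution of inverse problems that involve a diagonal matrix. In 1996, the column-wise Khatri-Rao product was proposed for estimating the angles of arrival (AOA) and delays of multipath signals, and the four coordinates of signal sources, at a digital antenna array.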
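
The column-wise product can be sketched in NumPy as follows. The function name `khatri_rao_columns` is a hypothetical helper written for this note, and the check against `np.kron` confirms that each output column is the Kronecker product of the matching input columns.

```python
import numpy as np

def khatri_rao_columns(C, D):
    """Column-wise Kronecker product of C (m x n) and D (p x n)."""
    m, n = C.shape
    p, n_d = D.shape
    if n != n_d:
        raise ValueError("C and D must have the same number of columns")
    # Entry (i, j, k) is C[i, k] * D[j, k]; merging (i, j) stacks each column.
    return np.einsum("ik,jk->ijk", C, D).reshape(m * p, n)

C = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
D = np.array([[1, 4, 7], [2, 5, 8], [3, 6, 9]])

CD = khatri_rao_columns(C, D)  # shape (9, 3)

# Each column should match the Kronecker product of the corresponding columns.
columns_match = all(
    np.array_equal(CD[:, k], np.kron(C[:, k], D[:, k])) for k in range(C.shape[1])
)
print(CD.shape, columns_match)
```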

Face-splitting product
Another form, which splits the matrices by rows instead, is called the face-splitting product or transposed Khatri-Rao product. It is illustrated as follows:

$\textbf{C}= \begin{bmatrix} \def\arraystretch{1} \begin{array}{c} \textbf{C}_{1}\\ \hline \textbf{C}_{2}\\ \hline \textbf{C}_{3} \\ \end{array} \end{bmatrix} =\begin{bmatrix} \def\arraystretch{1} \begin{array}{ccc} 1&2&3 \\ \hline 4&5&6\\ \hline 7&8&9\\ \end{array} \end{bmatrix}$

$\textbf{D}= \begin{bmatrix} \def\arraystretch{1} \begin{array}{c} \textbf{D}_{1}\\ \hline \textbf{D}_{2}\\ \hline \textbf{D}_{3} \\ \end{array} \end{bmatrix} =\begin{bmatrix} \def\arraystretch{1} \begin{array}{ccc} 1&4&7\\ \hline 2&5&8\\ \hline 3&6&9\\ \end{array} \end{bmatrix}$

Then, writing the face-splitting product as $\bullet$ to distinguish it from the column-wise product above:

$\textbf{C}\bullet\textbf{D}= \begin{bmatrix} \def\arraystretch{1} \begin{array}{c} \textbf{C}_{1}\otimes\textbf{D}_{1}\\ \hline \textbf{C}_{2}\otimes\textbf{D}_{2}\\ \hline \textbf{C}_{3}\otimes\textbf{D}_{3} \\ \end{array} \end{bmatrix} =\begin{bmatrix} \def\arraystretch{1} \begin{array}{ccccccccc} 1&4&7&2&8&14&3&12&21 \\ \hline 8&20&32&10&25&40&12&30&48 \\ \hline 21&42&63&24&48&72&27&54&81 \\ \end{array} \end{bmatrix}$
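
A row-wise counterpart can be sketched the same way. `face_splitting` is a hypothetical helper name; each output row is the Kronecker product of the matching rows of the two inputs.

```python
import numpy as np

def face_splitting(C, D):
    """Row-wise Kronecker product of C (n x m) and D (n x p)."""
    n, m = C.shape
    n_d, p = D.shape
    if n != n_d:
        raise ValueError("C and D must have the same number of rows")
    # Entry (k, i, j) is C[k, i] * D[k, j]; merging (i, j) flattens each row.
    return np.einsum("ki,kj->kij", C, D).reshape(n, m * p)

C = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
D = np.array([[1, 4, 7], [2, 5, 8], [3, 6, 9]])

F = face_splitting(C, D)  # shape (3, 9)

# Each row should match the Kronecker product of the corresponding rows.
rows_match = all(
    np.array_equal(F[k], np.kron(C[k], D[k])) for k in range(C.shape[0])
)
print(F)
```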

N-way tensor

An N-way tensor is an Nth-order tensor, i.e. its dimensions are:

$I_{1}\times I_{2}\times\cdots\times I_{N}$

Its entry with index $\textbf{i}=(i_{1},i_{2},\cdots,i_{N})$ can be written as $a_{\textbf{i}}$.

Frobenius norm (F-norm)

Given a tensor $\mathcal{A}$ with $\mathcal{A}(:,:,1)=\begin{bmatrix}1&2\\3&4\end{bmatrix}$ and $\mathcal{A}(:,:,2)=\begin{bmatrix}5&6\\7&8\end{bmatrix}$, its Frobenius norm is:

$\lVert\mathcal{A}\rVert_{F}=\sqrt{\left<\mathcal{A},\mathcal{A}\right>}=\sqrt{1^{2}+2^{2}+3^{2}+4^{2}+5^{2}+6^{2}+7^{2}+8^{2}}=\sqrt{204}$

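That is, the squared Frobenius norm of $\mathcal{A}$ equals the sum of the squares of all its entries. For this reason, optimization problems involving matrix or tensor decompositions often minimize the sum of squared entries of a residual matrix or residual tensor, and the objective function is usually written as the squared Frobenius norm of that residual.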
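
As a quick numerical check of the example, the sketch below builds the same 2×2×2 tensor in NumPy (with 0-based slice indices) and computes the norm in two ways.

```python
import numpy as np

# Stack the two frontal slices so that A[:, :, 0] and A[:, :, 1] match the example.
A = np.stack([np.array([[1, 2], [3, 4]]), np.array([[5, 6], [7, 8]])], axis=2)

norm_sq = np.sum(A ** 2)       # sum of squared entries
norm_f = np.sqrt(norm_sq)      # Frobenius norm from the definition
norm_np = np.linalg.norm(A)    # with default arguments: 2-norm of the flattened array

print(norm_sq, norm_f, norm_np)
```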

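tensor unfolding

In practice, higher-order tensors are more abstract than vectors and matrices: vectors and matrices are easy to write down and manipulate, while higher-order tensors are less intuitive. How can a higher-order tensor be converted into a two-dimensional matrix? This is tensor unfolding, sometimes also called matricization (transforming a tensor into a matrix). The "mode-n matricization" or "unfolding of a tensor" mentioned in the book refers to exactly this.

Here is an example.
Given a tensor $\mathcal{A}$ of size 4×3×2 with slices $\mathcal{A}(:,:,1)=\begin{bmatrix}a_{111}&a_{121}&a_{131}\\a_{211}&a_{221}&a_{231}\\a_{311}&a_{321}&a_{331}\\a_{411}&a_{421}&a_{431}\end{bmatrix}$ and $\mathcal{A}(:,:,2)=\begin{bmatrix}a_{112}&a_{122}&a_{132}\\a_{212}&a_{222}&a_{232}\\a_{312}&a_{322}&a_{332}\\a_{412}&a_{422}&a_{432}\end{bmatrix}$, the mode-1 unfolding (corresponding to the first order of the tensor, i.e. the row index) gives: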

$\mathcal{A}_{(1)}= \begin{bmatrix} a_{111}&a_{121}&a_{131}&a_{112}&a_{122}&a_{132}\\ a_{211}&a_{221}&a_{231}&a_{212}&a_{222}&a_{232}\\ a_{311}&a_{321}&a_{331}&a_{312}&a_{322}&a_{332}\\ a_{411}&a_{421}&a_{431}&a_{412}&a_{422}&a_{432}\\ \end{bmatrix}$

That is, $\mathcal{A}_{(1)}=\left[\mathcal{A}(:,:,1),\mathcal{A}(:,:,2)\right]$, of size 4×6.

The mode-2 unfolding (corresponding to the second order of the tensor, i.e. the column index) gives:

$\mathcal{A}_{(2)}= \begin{bmatrix} a_{111}&a_{211}&a_{311}&a_{411}&a_{112}&a_{212}&a_{312}&a_{412}\\ a_{121}&a_{221}&a_{321}&a_{421}&a_{122}&a_{222}&a_{322}&a_{422}\\ a_{131}&a_{231}&a_{331}&a_{431}&a_{132}&a_{232}&a_{332}&a_{432}\\ \end{bmatrix}$

That is, $\mathcal{A}_{(2)}=\left[\mathcal{A}(:,:,1)^{T},\mathcal{A}(:,:,2)^{T}\right]$, of size 3×8.

The mode-3 unfolding (corresponding to the third order of the tensor, i.e. the depth index) gives:

$\mathcal{A}_{(3)}= \begin{bmatrix} a_{111}&a_{211}&a_{311}&a_{411}&a_{121}&a_{221}&a_{321}&a_{421}&a_{131}&a_{231}&a_{331}&a_{431}\\ a_{112}&a_{212}&a_{312}&a_{412}&a_{122}&a_{222}&a_{322}&a_{422}&a_{132}&a_{232}&a_{332}&a_{432}\\ \end{bmatrix}$

That is, $\mathcal{A}_{(3)}=\left[\mathcal{A}(:,1,:)^{T},\mathcal{A}(:,2,:)^{T},\mathcal{A}(:,3,:)^{T}\right]$, of size 2×12.

Similarly, for a fourth-order tensor $\mathcal{A}$ of size 2×2×2×2, the unfoldings in each mode are:

$\begin{aligned} \mathcal{A}_{(1)}&=\left[\mathcal{A}(:,:,1,1),\mathcal{A}(:,:,2,1),\mathcal{A}(:,:,1,2),\mathcal{A}(:,:,2,2)\right],\\ \mathcal{A}_{(2)}&=\left[\mathcal{A}(:,:,1,1)^{T},\mathcal{A}(:,:,2,1)^{T},\mathcal{A}(:,:,1,2)^{T},\mathcal{A}(:,:,2,2)^{T}\right],\\ \mathcal{A}_{(3)}&=\left[\mathcal{A}(:,1,:,1)^{T},\mathcal{A}(:,2,:,1)^{T},\mathcal{A}(:,1,:,2)^{T},\mathcal{A}(:,2,:,2)^{T}\right],\\ \mathcal{A}_{(4)}&=\left[\mathcal{A}(:,1,1,:)^{T},\mathcal{A}(:,2,1,:)^{T},\mathcal{A}(:,1,2,:)^{T},\mathcal{A}(:,2,2,:)^{T}\right],\\ \end{aligned}$

For example, if $\mathcal{A}(:,:,1,1)=\begin{bmatrix}1&2\\3&4\end{bmatrix}$, $\mathcal{A}(:,:,2,1)=\begin{bmatrix}5&6\\7&8\end{bmatrix}$, $\mathcal{A}(:,:,1,2)=\begin{bmatrix}9&10\\11&12\end{bmatrix}$ and $\mathcal{A}(:,:,2,2)=\begin{bmatrix}13&14\\15&16\end{bmatrix}$, then:

$\mathcal{A}_{(1)}= \begin{bmatrix} 1&2&5&6&9&10&13&14\\ 3&4&7&8&11&12&15&16\\ \end{bmatrix}$

$\mathcal{A}_{(2)}= \begin{bmatrix} 1&3&5&7&9&11&13&15\\ 2&4&6&8&10&12&14&16\\ \end{bmatrix}$

$\mathcal{A}_{(3)}= \begin{bmatrix} 1&3&2&4&9&11&10&12\\ 5&7&6&8&13&15&14&16\\ \end{bmatrix}$

$\mathcal{A}_{(4)}= \begin{bmatrix} 1&3&2&4&5&7&6&8\\ 9&11&10&12&13&15&14&16\\ \end{bmatrix}$
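
The unfoldings above follow the convention in which, along the columns, the earlier remaining indices vary fastest. A minimal NumPy sketch of that convention is given below. `unfold` is a hypothetical helper name, modes are 0-based in the code (so `mode=0` is mode-1), and the Fortran-order reshape is what reproduces this particular column ordering.

```python
import numpy as np

def unfold(tensor, mode):
    """Mode-n unfolding with 0-based `mode`; earlier remaining indices vary fastest."""
    # Bring the chosen axis to the front, then flatten the rest in Fortran order.
    return np.reshape(np.moveaxis(tensor, mode, 0), (tensor.shape[mode], -1), order="F")

# Fourth-order example: A[:, :, k, l] holds the 2 x 2 slices given above.
A = np.zeros((2, 2, 2, 2), dtype=int)
A[:, :, 0, 0] = [[1, 2], [3, 4]]
A[:, :, 1, 0] = [[5, 6], [7, 8]]
A[:, :, 0, 1] = [[9, 10], [11, 12]]
A[:, :, 1, 1] = [[13, 14], [15, 16]]

for n in range(A.ndim):
    print(f"mode-{n + 1} unfolding:")
    print(unfold(A, n))
```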

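Unfortunately, although unfolding follows definite rules, it has no strong physical meaning. Its value is that it makes the corresponding matrix operations available for higher-order tensors. Moreover, since a higher-order tensor can be unfolded, it can naturally also be restored: turning the unfolded matrix back into the higher-order tensor is called folding.

Looking back at the notation $\textbf{mA}_{(n)}$ in the book: $\textbf{m}$ stands for matricization, $\textbf{A}$ is the name of the tensor, and n is the index of the mode.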
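
Folding is simply the inverse of the reshape used for unfolding. The sketch below assumes the same column ordering as the examples above (earlier remaining indices vary fastest); `unfold` and `fold` are hypothetical helper names with 0-based modes.

```python
import numpy as np

def unfold(tensor, mode):
    """Mode-n unfolding with 0-based `mode`; earlier remaining indices vary fastest."""
    return np.reshape(np.moveaxis(tensor, mode, 0), (tensor.shape[mode], -1), order="F")

def fold(matrix, mode, shape):
    """Inverse of `unfold`: rebuild a tensor of the given `shape` from its unfolding."""
    # Shape of the tensor after the chosen axis has been moved to the front.
    moved_shape = [shape[mode]] + [s for k, s in enumerate(shape) if k != mode]
    return np.moveaxis(np.reshape(matrix, moved_shape, order="F"), 0, mode)

X = np.arange(24).reshape(4, 3, 2)  # arbitrary 4 x 3 x 2 test tensor

round_trip_ok = all(
    np.array_equal(fold(unfold(X, n), n, X.shape), X) for n in range(X.ndim)
)
print([unfold(X, n).shape for n in range(X.ndim)], round_trip_ok)
```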

outer product

Given two vectors $\textbf{u}$ and $\textbf{v}$ of sizes m×1 and n×1:

$\textbf{u}= \begin{bmatrix} u_{1}\\ u_{2}\\ \vdots\\ u_{m}\\ \end{bmatrix}, \textbf{v}= \begin{bmatrix} v_{1}\\ v_{2}\\ \vdots\\ v_{n}\\ \end{bmatrix}$

Their outer product is defined as:

$\textbf{u}\circ\textbf{v}= \begin{bmatrix} u_{1}v_{1}&u_{1}v_{2}&\cdots&u_{1}v_{n}\\ u_{2}v_{1}&u_{2}v_{2}&\cdots&u_{2}v_{n}\\ \vdots&\vdots&\ddots&\vdots\\ u_{m}v_{1}&u_{m}v_{2}&\cdots&u_{m}v_{n}\\ \end{bmatrix}$

Entry by entry, this can be written as:

$(\textbf{u}\circ\textbf{v})_{ij}=u_{i}v_{j}$

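As the above shows, the outer product of two vectors equals the matrix product of the first vector with the transpose of the second: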

$\textbf{u}\circ\textbf{v}=\textbf{u}\textbf{v}^{T}$

Extending the outer product of vectors to tensors:
given two tensors $\mathcal{U}$ and $\mathcal{V}$ with dimensions $(k_{1},k_{2},\cdots,k_{m})$ and $(l_{1},l_{2},\cdots,l_{n})$, their outer product is:

$(\mathcal{U}\circ\mathcal{V})_{i_{1},i_{2},\cdots,i_{m},j_{1},j_{2},\cdots,j_{n}}=u_{i_{1},i_{2},\cdots,i_{m}}v_{j_{1},j_{2},\cdots,j_{n}}$
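
Both the vector and the tensor outer product are available in NumPy. The sketch below uses small illustrative arrays and checks the element-wise definition directly.

```python
import numpy as np

u = np.array([1, 2, 3])   # m = 3
v = np.array([4, 5])      # n = 2

uv = np.outer(u, v)       # shape (3, 2), entries u_i * v_j
same_as_matmul = np.array_equal(uv, u[:, None] @ v[None, :])  # u v^T

# Tensor case: the result has the dimensions of U followed by those of V.
U = np.arange(6).reshape(2, 3)
V = np.arange(4).reshape(2, 2)
W = np.multiply.outer(U, V)   # shape (2, 3, 2, 2)

# Check one entry against the definition W[i1, i2, j1, j2] = U[i1, i2] * V[j1, j2].
entry_ok = W[1, 2, 1, 0] == U[1, 2] * V[1, 0]
print(uv.shape, W.shape, same_as_matmul, entry_ok)
```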

inner product

Given two vectors $\textbf{u}$ and $\textbf{v}$, both of size n×1, their inner product is:

$\left<\textbf{u},\textbf{v}\right>=\sum_{i}u_{i}v_{i}$

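Given two tensors $\mathcal{U}$ and $\mathcal{V}$ of the same size, their inner product, summed over all multi-indices $\textbf{i}=(i_{1},i_{2},\cdots,i_{N})$, is:

$\left<\mathcal{U},\mathcal{V}\right>=\sum_{\textbf{i}}u_{\textbf{i}}v_{\textbf{i}}$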

Because the inner product of two tensors of the same size is a scalar, it is also often called the scalar product.
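
In code, the tensor inner product is an element-wise multiplication followed by a sum over every index. The arrays below are illustrative; `U` reuses the entries 1 to 8 from the Frobenius norm example, so its inner product with itself is the squared norm.

```python
import numpy as np

U = np.arange(1, 9).reshape(2, 2, 2)   # entries 1..8
V = np.full(U.shape, 2)                # same size, every entry equal to 2

inner_uv = np.sum(U * V)               # multiply element-wise, sum over all indices
inner_uv_vdot = np.vdot(U, V)          # same result: vdot flattens both arrays

inner_uu = np.sum(U * U)               # <U, U> is the squared Frobenius norm
print(inner_uv, inner_uv_vdot, inner_uu)
```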

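Tucker decomposition

As for the higher-order singular value decomposition (HOSVD): in 1966 Tucker gave three methods for computing the Tucker decomposition. The first of these is the HOSVD discussed here, and the whole procedure is a generalization of the matrix singular value decomposition.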
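
The following is a minimal NumPy sketch of the HOSVD as it is commonly described: each factor matrix holds the left singular vectors of one mode-n unfolding, and the core tensor is obtained by projecting the tensor onto those factors. The helper names `unfold`, `mode_product`, `hosvd` and `reconstruct` are illustrative and are not taken from the book, and no truncation is applied.

```python
import numpy as np

def unfold(tensor, mode):
    """Mode-n unfolding with 0-based `mode`."""
    return np.reshape(np.moveaxis(tensor, mode, 0), (tensor.shape[mode], -1), order="F")

def mode_product(tensor, matrix, mode):
    """Multiply `tensor` by `matrix` along axis `mode`."""
    # tensordot puts the new axis first; move it back to position `mode`.
    return np.moveaxis(np.tensordot(matrix, tensor, axes=(1, mode)), 0, mode)

def hosvd(tensor):
    """Return the core tensor and one orthonormal factor matrix per mode."""
    factors = [
        np.linalg.svd(unfold(tensor, n), full_matrices=False)[0]
        for n in range(tensor.ndim)
    ]
    core = tensor
    for n, U in enumerate(factors):
        core = mode_product(core, U.T, n)   # project onto the mode-n basis
    return core, factors

def reconstruct(core, factors):
    """Multiply the core by every factor matrix to rebuild the tensor."""
    out = core
    for n, U in enumerate(factors):
        out = mode_product(out, U, n)
    return out

rng = np.random.default_rng(0)
X = rng.standard_normal((4, 3, 2))      # arbitrary test tensor

core, factors = hosvd(X)
error = np.linalg.norm(reconstruct(core, factors) - X)
print(core.shape, [U.shape for U in factors], error)
```

Without truncation the reconstruction is exact up to rounding. Keeping only the leading columns of each factor matrix would give a compressed approximation instead.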

CP decomposition (CANDECOMP/PARAFAC) stands for canonical decomposition (CANDECOMP) and parallel factors analysis (PARAFAC).

In the book, $\mathcal{M}$ denotes an Nth-order tensor of size $I_{1}\times I_{2}\times\cdots\times I_{N}$. Using an R-component CANDECOMP/PARAFAC (CP) factor model, it can be decomposed as:

$\mathcal{M}=\sum_{r=1}^{R}\lambda_{r}\textbf{ma}_{r}^{(1)}\circ\cdots\circ\textbf{ma}_{r}^{(N)}$

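Here R can be chosen to suit the problem; the book, for example, takes $R = 2$, i.e. two factors are extracted along each order. Each term of the sum, indexed by r, is one component (also called a mode). $\circ$ denotes the outer product, and $\textbf{ma}_{r}^{(n)}$ is the r-th column of the factor matrix $\textbf{mA}^{(n)}$, which has size $I_{n}\times R$, so each column has length $I_{n}$.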
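
To make the formula concrete, the sketch below builds a third-order tensor from given weights and factor matrices; it does not fit a CP model to data. The sizes, the weights `lam` and the random factor matrices `A1`, `A2`, `A3` are illustrative assumptions. The last check uses the standard identity that links the CP model to the column-wise Khatri-Rao product through the mode-1 unfolding.

```python
import numpy as np

rng = np.random.default_rng(0)
I1, I2, I3, R = 4, 3, 2, 2

lam = np.array([2.0, 0.5])               # weights lambda_r
A1 = rng.standard_normal((I1, R))        # factor matrices, one column per component
A2 = rng.standard_normal((I2, R))
A3 = rng.standard_normal((I3, R))

# Sum of R rank-one terms: lambda_r * a_r^(1) o a_r^(2) o a_r^(3).
M = np.zeros((I1, I2, I3))
for r in range(R):
    rank_one = np.multiply.outer(np.multiply.outer(A1[:, r], A2[:, r]), A3[:, r])
    M += lam[r] * rank_one

# The same tensor in a single einsum call.
M_einsum = np.einsum("r,ir,jr,kr->ijk", lam, A1, A2, A3)

# Mode-1 unfolding equals A1 diag(lam) (A3 column-wise-Khatri-Rao A2)^T.
khatri_rao = np.einsum("kr,jr->kjr", A3, A2).reshape(I3 * I2, R)
M1 = np.reshape(M, (I1, -1), order="F")
unfolding_matches = np.allclose(M1, (A1 * lam) @ khatri_rao.T)

print(M.shape, np.allclose(M, M_einsum), unfolding_matches)
```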