Notes: Online robust principal component analysis via truncated nuclear norm regularization

XueShengke


Hong, B., Wei, L., Hu, Y., Cai, D., & He, X. (2016). Online robust principal component analysis via truncated nuclear norm regularization. Neurocomputing, 175, 216-222.
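These notes cover the Neurocomputing journal paper above and mainly expand on the theory and method it presents. My own expertise is limited, so corrections of any mistakes are welcome.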

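Abstract: Robust principal component analysis (RPCA) has been widely used to deal with high-dimensional noisy data in many applications. Traditional RPCA methods recover the low-dimensional subspace in batch mode, using all samples at once, which incurs a high storage cost and cannot efficiently update the subspace for streaming data. An online RPCA method is therefore needed. The paper proposes a novel online RPCA algorithm that adopts the recently proposed truncated nuclear norm as a tighter approximation of the low-rank constraint. The objective function is decomposed into a sum of per-sample costs, and an efficient online alternating optimization method is designed.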

1 Introduction

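Many machine learning and data mining problems involve high-dimensional samples that contain noise (corruptions or outliers). RPCA, which recovers the intrinsic low-dimensional subspace from the whole sample set, has been studied extensively and applied to video surveillance [1], image alignment [2], text corpus modeling [3], and audio processing [4].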

In principle, a typical RPCA method assumes that the samples can be split into a low-rank part and a sparse part. Formally, given a sample matrix $\mathbf{Z} \in \mathbb{R}^{m \times n}$, RPCA tries to decompose $\mathbf{Z}$ into the sum of a low-rank matrix $\mathbf{X}$ and a sparse matrix $\mathbf{E}$:

$\begin{align} \min_{\mathbf{X}, \mathbf{E}} & \ \ \text{rank} (\mathbf{X}) + \lambda || \mathbf{E} ||_0 \\ \text{s.t.} & \ \ \mathbf{Z} = \mathbf{X} + \mathbf{E}, \tag{1} \end{align}$
where $\lambda$ is a regularization parameter.

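It has been shown that, under suitable conditions, the low-dimensional subspace can be recovered exactly and efficiently. However, the problem is highly non-convex and intractable because of the rank function and the $\ell_0$ norm. Most work looks for suitable surrogates of the rank function and the $\ell_0$ norm that turn the original problem into a convex optimization problem. Among these, Lin et al. apply the augmented Lagrange multiplier method to solve the resulting convex problem [5]. Shang et al. [6] and Tao et al. [7] consider the more general case in which the observed data are both incomplete and heavily corrupted, and propose a unified framework that combines RPCA with matrix completion.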

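All of the above methods work on batch data: every iteration uses all samples, which causes two limitations. First, the storage cost is high, since all samples must be held in memory during optimization, which is unacceptable for large-scale data. Second, when data arrive as a stream, these methods cannot efficiently update the low-dimensional subspace when a new sample arrives.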

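Online RPCA methods were developed to address this. Their memory consumption is independent of the number of samples, and the discovered low-dimensional subspace can be updated quickly. Another important advantage of online RPCA is that it can track a low-dimensional subspace that changes over time, so it can be used for video tracking with a moving camera [8]. Goes et al. extend batch RPCA to the stochastic setting and provide a sublinear convergence guarantee [9], significantly reducing storage requirements and time complexity. He et al. propose an online adaptive subspace tracking algorithm on the Grassmannian manifold [10], which combines the augmented Lagrangian with the classical stochastic gradient framework. Mairal proposes a more general online dictionary learning scheme for sparse coding based on stochastic approximation [11]. Inspired by this, Feng et al. [12] and Shen et al. [13] solve the RPCA problem in an online fashion. They adopt the nuclear norm and the max-norm, respectively, as surrogates of the rank function; both can be written in a matrix-factorization form suited to sequential data. Although the nuclear norm and the max-norm are convex envelopes of the rank function, they introduce a non-negligible approximation error in real applications [14]. Some researchers therefore design non-convex surrogates to obtain a more accurate approximation [15].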

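The goal of the paper is to solve the RPCA problem within an online non-convex optimization framework. Specifically, instead of minimizing the rank of a matrix directly, the paper minimizes the recently proposed truncated nuclear norm [16]. This norm can also be expressed in a matrix-factorization form, which makes it possible to estimate the incremental contribution of each sample to the truncated norm. Building on this, the paper proposes an online scheme that updates the low-dimensional subspace with each new sample, and then designs an efficient iterative optimization method to implement it. With the truncated norm, the objective approximates the matrix rank more closely, so the subspace can be recovered more accurately. The main contributions are twofold:

The paper proposes an online scheme for the RPCA problem that uses a non-convex approximation of the matrix rank, which is more accurate than convex surrogates.
The paper designs an efficient optimization algorithm for the proposed objective function.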

2 Preliminaries

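Bold uppercase letters denote matrices and bold lowercase letters denote vectors. $||\mathbf{X}||_1$, $||\mathbf{X}||_*$ and $||\mathbf{X}||_\text{F}$ denote the $\ell_1$ norm, the nuclear norm and the Frobenius norm of a matrix, respectively. $\text{tr}(\cdot)$ denotes the trace of a square matrix. $||\mathbf{v}||_1$ and $||\mathbf{v}||_p$ denote the $\ell_1$ and $\ell_p$ norms of a vector. $\langle \cdot, \cdot \rangle$ denotes the inner product, and $\mathbf{I}$ denotes the identity matrix.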

Given a matrix $\mathbf{X} \in \mathbb{R}^{m \times n}$ and a non-negative integer $s < \min(m,n)$, the truncated nuclear norm $||\mathbf{X}||_s$ is defined as the sum of the $\min(m,n)-s$ smallest singular values, that is, $||\mathbf{X}||_s = \sum_{i=s+1}^{\min(m,n)} \sigma_i (\mathbf{X})$, where $\sigma_1(\mathbf{X}) \geq \cdots \geq \sigma_{\min(m,n)} (\mathbf{X})$. In other words, $||\mathbf{X}||_s$ ignores the $s$ largest singular values. Its relation to the nuclear norm is

$\begin{equation} ||\mathbf{X}||_s = ||\mathbf{X}||_* - \max_{\mathbf{U} \mathbf{U}^\text{T} = \mathbf{I} , \ \mathbf{V} \mathbf{V}^\text{T} = \mathbf{I}} \text{tr} ( \mathbf{U} \mathbf{X} \mathbf{V}^\text{T}), \tag{2} \end{equation}$
where $\mathbf{U} \in \mathbb{R}^{s \times m}$ and $\mathbf{V} \in \mathbb{R}^{s \times n}$. This formula does not make the relation between the norm and each individual sample explicit, so it is hard to estimate the separate contribution of each sample to the norm. Fortunately, the nuclear norm can be factorized as

$\begin{equation} ||\mathbf{X}||_* = \min_{\mathbf{X} = \mathbf{L} \mathbf{R}^\text{T}} \ \frac{1}{2} (||\mathbf{L}||_\text{F}^2 + ||\mathbf{R}||_\text{F}^2), \tag{3} \end{equation}$
where $\mathbf{L} \in \mathbb{R}^{m \times d}$ and $\mathbf{R} \in \mathbb{R}^{n \times d}$, for any $d \geq \text{rank} (\mathbf{X})$.
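The definition of the truncated nuclear norm and its relation (2) to the nuclear norm can be checked numerically. The following is a minimal NumPy sketch, not code from the paper: the helper name `truncated_nuclear_norm`, the matrix sizes and the random seed are assumptions made for illustration. It uses the standard fact that the maximum in (2) is attained by the top-$s$ left and right singular vectors.

```python
import numpy as np

def truncated_nuclear_norm(X, s):
    """Sum of all singular values of X except the s largest ones."""
    sigma = np.linalg.svd(X, compute_uv=False)  # returned in descending order
    return sigma[s:].sum()

# Illustrative sizes (assumptions, not taken from the paper).
rng = np.random.default_rng(0)
m, n, s = 6, 8, 2
X = rng.standard_normal((m, n))

# The maximizers in eq. (2): top-s left/right singular vectors, stored as rows.
P, sigma, Qt = np.linalg.svd(X, full_matrices=False)
U = P[:, :s].T   # s x m, U U^T = I
V = Qt[:s, :]    # s x n, V V^T = I

lhs = truncated_nuclear_norm(X, s)
rhs = sigma.sum() - np.trace(U @ X @ V.T)  # nuclear norm minus the top-s singular values
print(lhs, rhs)  # the two values should agree up to rounding error
```

With $s = 0$ the function reduces to the ordinary nuclear norm, which is a quick sanity check on the definition.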

Lemma 2.1 The truncated nuclear norm can be factorized as

$\begin{align} ||\mathbf{X}||_s &= \min_{\mathbf{L}, \mathbf{R}, \mathbf{U}, \mathbf{V}} \ \frac{1}{2} || \mathbf{L} ||_\text{F}^2 + \frac{1}{2} || \mathbf{R} ||_\text{F}^2 - \text{tr} (\mathbf{U} \mathbf{L} \mathbf{R}^\text{T} \mathbf{V}^\text{T} ), \\ \text{s.t.} &\ \ \mathbf{X} = \mathbf{L} \mathbf{R}^\text{T}, \ \mathbf{U} \mathbf{U}^\text{T} = \mathbf{I}, \ \mathbf{V} \mathbf{V}^\text{T} = \mathbf{I}, \tag{4} \end{align}$
where $\mathbf{U} \in \mathbb{R}^{s \times m}$, $\mathbf{V} \in \mathbb{R}^{s \times n}$, $\mathbf{L} \in \mathbb{R}^{m \times d}$, $\mathbf{R} \in \mathbb{R}^{n \times d}$ and $d \geq \text{rank} (\mathbf{X})$.
Proof For any $\mathbf{U}, \mathbf{V}, \mathbf{L}, \mathbf{R}$ satisfying $\mathbf{X} = \mathbf{L} \mathbf{R}^\text{T}$, $\mathbf{U} \mathbf{U}^\text{T} = \mathbf{I}$ and $\mathbf{V} \mathbf{V}^\text{T} = \mathbf{I}$,

$\begin{align} || \mathbf{X} ||_s &= || \mathbf{X} ||_* - \max_{\mathbf{U}, \mathbf{V}} \ \text{tr} (\mathbf{U} \mathbf{X} \mathbf{V}^\text{T}) \\ & \leq \frac{1}{2} || \mathbf{L} ||_\text{F}^2 + \frac{1}{2} || \mathbf{R} ||_\text{F}^2 - \max_{\mathbf{U}, \mathbf{V}} \ \text{tr} (\mathbf{U} \mathbf{X} \mathbf{V}^\text{T}) \tag{5} \\ & \leq \frac{1}{2} || \mathbf{L} ||_\text{F}^2 + \frac{1}{2} || \mathbf{R} ||_\text{F}^2 - \text{tr} (\mathbf{U} \mathbf{X} \mathbf{V}^\text{T}). \end{align}$
Conversely, let the singular value decomposition of the matrix be $\mathbf{X} = \mathbf{P} \mathbf{\Sigma} \mathbf{Q}^\text{T}$, where $\mathbf{P} = ( \mathbf{p}_1, \cdots, \mathbf{p}_m) \in \mathbb{R}^{m \times m}$, $\mathbf{Q} = ( \mathbf{q}_1, \cdots, \mathbf{q}_n) \in \mathbb{R}^{n \times n}$ and $\mathbf{\Sigma} \in \mathbb{R}^{m \times n}$. Let $\hat{\mathbf{U}} = ( \mathbf{p}_1, \cdots, \mathbf{p}_s)^\text{T}$ and $\hat{\mathbf{V}} = ( \mathbf{q}_1, \cdots, \mathbf{q}_s)^\text{T}$; then

$\begin{equation} \text{tr} (\hat{\mathbf{U}} \mathbf{X} \hat{\mathbf{V}}^\text{T} ) = \sum_{i=1}^{s} \sigma_i (\mathbf{X}). \end{equation}$
Let $\hat{\mathbf{L}} = \mathbf{P} \mathbf{\Sigma}^{1/2}$ and $\hat{\mathbf{R}} = \mathbf{Q} \mathbf{\Sigma}^{1/2}$ (with the factors restricted to the leading $d$ singular triplets so that the dimensions agree). It follows directly that $\mathbf{X} = \hat{\mathbf{L}} \hat{\mathbf{R}}^\text{T}$ and $|| \mathbf{X} ||_* = \frac{1}{2} || \hat{\mathbf{L}} ||_\text{F}^2 + \frac{1}{2} || \hat{\mathbf{R}} ||_\text{F}^2$. Hence

$\begin{equation} || \mathbf{X} ||_s = || \mathbf{X} ||_* - \sum_{i=1}^{s} \sigma_i (\mathbf{X}) = \frac{1}{2} || \hat{\mathbf{L}} ||_\text{F}^2 + \frac{1}{2} || \hat{\mathbf{R}} ||_\text{F}^2 - \text{tr} (\hat{\mathbf{U}} \mathbf{X} \hat{\mathbf{V}}^\text{T} ) . \tag{6} \end{equation}$
Since $\mathbf{X} = \hat{\mathbf{L}} \hat{\mathbf{R}}^\text{T}$, the upper bound in (5) is attained at $(\hat{\mathbf{L}}, \hat{\mathbf{R}}, \hat{\mathbf{U}}, \hat{\mathbf{V}})$, which proves the lemma.
This factorization expresses $|| \mathbf{X} ||_s$ in a dimension-reduced form: $\mathbf{L}$ can be viewed as a dictionary, and each row of $\mathbf{R}$ (each column of $\mathbf{R}^\text{T}$) as the coefficients of one sample.
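Lemma 2.1 can also be checked numerically. The sketch below is illustrative only, and the sizes, the seed and the helper name `lemma_objective` are assumptions. It builds $\hat{\mathbf{L}}, \hat{\mathbf{R}}, \hat{\mathbf{U}}, \hat{\mathbf{V}}$ from a compact SVD, confirms that the objective of (4) equals the truncated nuclear norm there, and confirms that another feasible factorization of the same $\mathbf{X}$ gives a value that is no smaller.

```python
import numpy as np

def lemma_objective(L, R, U, V):
    """Objective of eq. (4): 0.5*||L||_F^2 + 0.5*||R||_F^2 - tr(U L R^T V^T)."""
    return 0.5 * np.sum(L**2) + 0.5 * np.sum(R**2) - np.trace(U @ L @ R.T @ V.T)

# Illustrative sizes (assumptions): a rank-d matrix X of shape m x n.
rng = np.random.default_rng(1)
m, n, d, s = 7, 9, 3, 1
X = rng.standard_normal((m, d)) @ rng.standard_normal((d, n))

# Compact SVD restricted to the d nonzero singular triplets.
P, sigma, Qt = np.linalg.svd(X, full_matrices=False)
P, sigma, Qt = P[:, :d], sigma[:d], Qt[:d, :]

L_hat = P * np.sqrt(sigma)      # m x d, equals P Sigma^{1/2}
R_hat = Qt.T * np.sqrt(sigma)   # n x d, equals Q Sigma^{1/2}
U_hat = P[:, :s].T              # s x m
V_hat = Qt[:s, :]               # s x n

truncated = sigma[s:].sum()     # ||X||_s
value_at_hat = lemma_objective(L_hat, R_hat, U_hat, V_hat)

# Another feasible factorization X = (L_hat G)(R_hat G^{-T})^T with a fixed invertible G.
G = np.diag(np.linspace(0.5, 2.0, d))
L_other = L_hat @ G
R_other = R_hat @ np.linalg.inv(G).T
value_other = lemma_objective(L_other, R_other, U_hat, V_hat)
print(truncated, value_at_hat, value_other)
```

The rescaled factorization leaves $\mathbf{X}$ unchanged but unbalances the two Frobenius terms, which is why its objective value can only go up.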

3 Proposed algorithm

The goal of the algorithm is as follows: given an $m$-dimensional data set $\mathbf{Z} = (\mathbf{z}_1, \cdots, \mathbf{z}_n) \in \mathbb{R}^{m \times n}$, decompose it into a low-rank matrix $\mathbf{X}$ and a sparse matrix $\mathbf{E}$, so that each sample satisfies $\mathbf{z}_i = \mathbf{x}_i + \mathbf{e}_i$. Unlike traditional methods, which use the nuclear norm to approximate the rank function, the paper uses the truncated nuclear norm. The objective function is therefore

$\begin{equation} \min_{\mathbf{X},\mathbf{E}} \ \frac{1}{2} || \mathbf{Z} - \mathbf{X} - \mathbf{E} ||_\text{F}^2 + \lambda_1 || \mathbf{X} ||_s + \lambda_2 || \mathbf{E} ||_1 , \tag{7} \end{equation}$
where $\lambda_1, \lambda_2$ are regularization coefficients. Note that the $\ell_1$ norm rather than the $\ell_0$ norm is used to regularize the sparse term $\mathbf{E}$, because the $\ell_1$ norm is computationally tractable and is commonly used in practice to obtain sparse solutions.

Here $|| \mathbf{X} ||_s$ appears as a single term over the whole matrix. To expose more structure of the low-rank component $\mathbf{X}$, factorize it as $\mathbf{X} = \mathbf{L} \mathbf{R}^\text{T},\ \mathbf{L} \in \mathbb{R}^{m \times d},\ \mathbf{R} \in \mathbb{R}^{n \times d},\ d \geq \text{rank} (\mathbf{X})$. In online RPCA, $\mathbf{L}$ is regarded as a dictionary, and each column of $\mathbf{X}$ is a linear combination of the columns (atoms) of $\mathbf{L}$, with coefficients given by the corresponding row of $\mathbf{R}$. Combining this factorization with Lemma 2.1, the original objective becomes

$\begin{align} \min_{\mathbf{L},\mathbf{R},\mathbf{U},\mathbf{V},\mathbf{E}} &\ \frac{1}{2} || \mathbf{Z} - \mathbf{L} \mathbf{R}^\text{T} - \mathbf{E} ||_\text{F}^2 + \lambda_1 \left( \frac{1}{2} || \mathbf{L} ||_\text{F}^2 + \frac{1}{2} || \mathbf{R} ||_\text{F}^2 - \text{tr} (\mathbf{U} \mathbf{L} \mathbf{R}^\text{T} \mathbf{V}^\text{T}) \right) + \lambda_2 || \mathbf{E} ||_1 \\ \text{s.t.} \quad & \ \ \mathbf{U} \mathbf{U}^\text{T} = \mathbf{I}, \ \mathbf{V} \mathbf{V}^\text{T} = \mathbf{I} . \tag{8} \end{align}$
This form has a simple interpretation: each sample $\mathbf{z}_i$ is approximated by $\mathbf{L} \mathbf{r}_i + \mathbf{e}_i$, where $\mathbf{r}_i^\text{T}$ is the $i$-th row of $\mathbf{R}$. Since $||\cdot||_\text{F}^2$ and $|| \cdot ||_1$ are additive over samples, the problem above can be decomposed sample by sample:

$\begin{align} \min_{\mathbf{L},\mathbf{R},\mathbf{U},\mathbf{V},\mathbf{E}} &\ \frac{1}{2} \sum_{i=1}^{n} || \mathbf{z}_i - \mathbf{L} \mathbf{r}_i - \mathbf{e}_i ||_2^2 + \lambda_1 \left( \frac{1}{2} || \mathbf{L} ||_\text{F}^2 + \frac{1}{2} \sum_{i=1}^{n} || \mathbf{r}_i ||_2^2 - \sum_{i=1}^{n} \mathbf{w}_i^\text{T} \mathbf{r}_i \right) + \lambda_2 \sum_{i=1}^{n} || \mathbf{e}_i ||_1 \\ \text{s.t.} \quad & \ \ \mathbf{U} \mathbf{U}^\text{T} = \mathbf{I}, \ \mathbf{V} \mathbf{V}^\text{T} = \mathbf{I} , \tag{9} \end{align}$
where $\mathbf{w}_i^\text{T}$ is the $i$-th row of the matrix $\mathbf{W} = \mathbf{V}^\text{T} \mathbf{U} \mathbf{L} \in \mathbb{R}^{n \times d}$. This uses the cyclic property of the trace, $\text{tr} (\mathbf{A} \mathbf{B} \mathbf{C}) = \text{tr} (\mathbf{C} \mathbf{A} \mathbf{B})$, so that $\text{tr} (\mathbf{U} \mathbf{L} \mathbf{R}^\text{T} \mathbf{V}^\text{T}) = \text{tr} (\mathbf{W} \mathbf{R}^\text{T}) = \sum_{i=1}^{n} \mathbf{w}_i^\text{T} \mathbf{r}_i$. To simplify notation, define $f (\mathbf{L}, \mathbf{z}_i, \mathbf{r}_i, \mathbf{e}_i) \triangleq \frac{1}{2} || \mathbf{z}_i - \mathbf{L} \mathbf{r}_i - \mathbf{e}_i ||_2^2 + \lambda_1 \left( \frac{1}{2} || \mathbf{r}_i ||_2^2 - \mathbf{w}_i^\text{T} \mathbf{r}_i \right) + \lambda_2 || \mathbf{e}_i ||_1$, which collects the contribution of a single sample $\mathbf{z}_i$ to the objective. The objective above then simplifies to

$\begin{align} \min_{\mathbf{L},\mathbf{R},\mathbf{U},\mathbf{V},\mathbf{E}} &\ \sum_{i=1}^{n} f (\mathbf{L}, \mathbf{z}_i, \mathbf{r}_i, \mathbf{e}_i) + \frac{\lambda_1}{2} || \mathbf{L} ||_\text{F}^2 \\ \text{s.t.} \quad & \ \ \mathbf{U} \mathbf{U}^\text{T} = \mathbf{I}, \ \mathbf{V} \mathbf{V}^\text{T} = \mathbf{I} . \tag{10} \end{align}$
This shows that the objective accumulates over the samples. Given the dictionary $\mathbf{L}$, it is equivalent to minimizing the average cost

$\begin{equation} J (\mathbf{L}, n) \triangleq \frac{1}{n} \sum_{i=1}^{n} \tilde{f} (\mathbf{L}, \mathbf{z}_i) + \frac{\lambda_1}{2n} || \mathbf{L} ||_\text{F}^2 , \tag{11} \end{equation}$
where $\tilde{f}$ is the loss of a single sample under its optimal representation with respect to the dictionary:

$\begin{align} \tilde{f} (\mathbf{L}, \mathbf{z}) &= \min_{\mathbf{r}, \mathbf{e}, \mathbf{U}, \mathbf{V}} \ f(\mathbf{L}, \mathbf{z}, \mathbf{r}, \mathbf{e}) \\ \text{s.t.} \ \ & \ \mathbf{U}\mathbf{U}^\text{T} = \mathbf{I} , \ \mathbf{V}\mathbf{V}^\text{T} = \mathbf{I} . \tag{12} \end{align}$
The original optimization problem has thus been turned into the minimization of an average cost, in which the loss of each sample is evaluated for a given dictionary $\mathbf{L}$.
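The step from the batch objective (8) to the per-sample form (10) rests on the trace identity $\text{tr} (\mathbf{U} \mathbf{L} \mathbf{R}^\text{T} \mathbf{V}^\text{T}) = \sum_i \mathbf{w}_i^\text{T} \mathbf{r}_i$. A small numerical check is sketched below. The function names, sizes and random data are assumptions chosen for illustration; the code only evaluates the two objectives at an arbitrary feasible point and does not optimize anything.

```python
import numpy as np

def per_sample_cost(L, z, r, e, w, lam1, lam2):
    """f(L, z_i, r_i, e_i) as defined just before eq. (10)."""
    residual = z - L @ r - e
    return (0.5 * residual @ residual
            + lam1 * (0.5 * r @ r - w @ r)
            + lam2 * np.abs(e).sum())

def batch_objective(Z, L, R, U, V, E, lam1, lam2):
    """Objective of eq. (8), written with whole matrices."""
    fit = 0.5 * np.linalg.norm(Z - L @ R.T - E, "fro") ** 2
    reg = 0.5 * np.sum(L**2) + 0.5 * np.sum(R**2) - np.trace(U @ L @ R.T @ V.T)
    return fit + lam1 * reg + lam2 * np.abs(E).sum()

# Arbitrary feasible point (all values are illustrative assumptions).
rng = np.random.default_rng(2)
m, n, d, s = 5, 6, 3, 2
lam1, lam2 = 0.7, 0.3
Z, E = rng.standard_normal((m, n)), rng.standard_normal((m, n))
L, R = rng.standard_normal((m, d)), rng.standard_normal((n, d))
U = np.linalg.qr(rng.standard_normal((m, s)))[0].T   # s x m with orthonormal rows
V = np.linalg.qr(rng.standard_normal((n, s)))[0].T   # s x n with orthonormal rows

W = V.T @ U @ L   # n x d; row i is w_i^T
summed = sum(per_sample_cost(L, Z[:, i], R[i], E[:, i], W[i], lam1, lam2)
             for i in range(n)) + 0.5 * lam1 * np.sum(L**2)
batch = batch_objective(Z, L, R, U, V, E, lam1, lam2)
print(summed, batch)  # eq. (10) and eq. (8) should give the same value
```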

4 Optimization

The paper updates the variables $\mathbf{L}, \mathbf{R}, \mathbf{U}, \mathbf{V}, \mathbf{E}$ alternately in an online fashion. Suppose samples arrive as a stream and the current sample is $\mathbf{z}_t$. The optimization then consists of two consecutive steps. In the first step, the vectors $\mathbf{r}_t, \mathbf{e}_t$ are optimized with $\mathbf{L}_{t-1}, \mathbf{U}_{t-1}, \mathbf{V}_{t-1}$ fixed, by solving

$\begin{equation} \{ \mathbf{r}_t, \mathbf{e}_t \} = \arg\min_{\mathbf{r}, \mathbf{e}} \ \frac{1}{2} || \mathbf{z}_t - \mathbf{L}_{t-1} \mathbf{r} - \mathbf{e} ||_2^2 + \lambda_1 \left( \frac{1}{2} || \mathbf{r} ||_2^2 - \mathbf{w}_t^\text{T} \mathbf{r} \right) + \lambda_2 || \mathbf{e} ||_1 , \tag{13} \end{equation}$
where $\mathbf{w}_t^\text{T}$ is the $t$-th row of the matrix $\mathbf{W}_{t-1} = \mathbf{V}_{t-1}^\text{T} \mathbf{U}_{t-1} \mathbf{L}_{t-1}$. In the second step, the variables $\mathbf{L}_t, \mathbf{V}_t, \mathbf{U}_t$ are optimized using the previously obtained $\{ \mathbf{r}_i \}_{i=1}^t, \{ \mathbf{e}_i \}_{i=1}^t$, by solving the following problem (terms that do not depend on these variables are dropped):

$\begin{align} \{ \mathbf{L}_t, \mathbf{V}_t, \mathbf{U}_t \} &= \arg\min_{\mathbf{L}, \mathbf{V}, \mathbf{U}} \ \frac{1}{2} \sum_{i=1}^t || \mathbf{z}_i - \mathbf{L} \mathbf{r}_i - \mathbf{e}_i ||_2^2 + \lambda_1 \left( \frac{1}{2} || \mathbf{L} ||_\text{F}^2 - \text{tr} (\mathbf{U} \mathbf{L} \mathbf{R}_t^\text{T} \mathbf{V}^\text{T}) \right) , \\ \text{s.t.} \ & \ \mathbf{U}\mathbf{U}^\text{T} = \mathbf{I} , \ \mathbf{V}\mathbf{V}^\text{T} = \mathbf{I} , \tag{14} \end{align}$
where $\mathbf{R}_t^\text{T} = (\mathbf{r}_1, \cdots, \mathbf{r}_t, 0, \cdots, 0) \in \mathbb{R}^{d \times n}$. Note that for each new sample $\mathbf{z}_t$, the variables $\mathbf{L}_t, \mathbf{V}_t, \mathbf{U}_t$ are updated completely (all of their entries change), whereas the optimal $\mathbf{r}_t$ is simply written into the $t$-th row of $\mathbf{R}$. Similarly, $\mathbf{e}_t$ is written into the $t$-th column of $\mathbf{E}$.

Updating $\mathbf{r}_t$: with $\mathbf{e}$ fixed at its current iterate $\mathbf{e}_t^k$, the terms of (13) that depend on $\mathbf{r}$ are

$\begin{equation} f(\mathbf{r}) = \frac{1}{2} || \mathbf{z}_t - \mathbf{L}_{t-1} \mathbf{r} - \mathbf{e}_t^k ||_2^2 + \lambda_1 \left( \frac{1}{2} || \mathbf{r} ||_2^2 - \mathbf{w}_t^\text{T} \mathbf{r} \right). \tag{15} \end{equation}$
Setting $\partial f / \partial \mathbf{r} = 0$ gives the closed-form solution
$\begin{equation} \mathbf{r}_t^{k+1} = \left( \mathbf{L}_{t-1}^\text{T} \mathbf{L}_{t-1} + \lambda_1 \mathbf{I} \right)^{-1} \left( \mathbf{L}_{t-1}^\text{T} (\mathbf{z}_t - \mathbf{e}_t^k) + \lambda_1 \mathbf{w}_t \right). \tag{16} \end{equation}$
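The closed form (16) is a ridge-regression style solve. A minimal NumPy sketch follows; the function name `update_r` and the test data are assumptions for illustration. It solves the linear system instead of forming the inverse explicitly, and then evaluates the gradient of (15) at the result to confirm that it vanishes.

```python
import numpy as np

def update_r(L_prev, z, e, w, lam1):
    """Closed-form minimizer of eq. (15) with respect to r, i.e. eq. (16)."""
    d = L_prev.shape[1]
    lhs = L_prev.T @ L_prev + lam1 * np.eye(d)   # d x d, positive definite when lam1 > 0
    rhs = L_prev.T @ (z - e) + lam1 * w
    return np.linalg.solve(lhs, rhs)             # avoids computing the explicit inverse

# Illustrative data (sizes and values are arbitrary assumptions).
rng = np.random.default_rng(0)
m, d, lam1 = 8, 3, 0.5
L_prev = rng.standard_normal((m, d))
z, e, w = rng.standard_normal(m), rng.standard_normal(m), rng.standard_normal(d)
r = update_r(L_prev, z, e, w, lam1)

# Gradient of eq. (15) at r; it should be zero up to rounding error.
grad = L_prev.T @ (L_prev @ r + e - z) + lam1 * (r - w)
print(np.abs(grad).max())
```

Because $\mathbf{L}_{t-1}$ does not change inside the inner loop, the $d \times d$ system matrix could be factorized once per sample and reused across iterations.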
Updating $\mathbf{e}_t$: with $\mathbf{r}$ fixed at $\mathbf{r}_t^{k+1}$, the terms of (13) that depend on $\mathbf{e}$ are
$\begin{equation} g(\mathbf{e}) = \frac{1}{2} || \mathbf{e} ||_2^2 - \left( \mathbf{z}_t - \mathbf{L}_{t-1} \mathbf{r}_t^{k+1} \right)^\text{T} \mathbf{e} + \lambda_2 || \mathbf{e} ||_1 . \tag{17} \end{equation}$
Since $g(\mathbf{e})$ is convex, $\mathbf{e}$ can be found with a standard interior-point method, but this is time-consuming. Note that $g(\mathbf{e})$ is the sum of two convex functions, one of which is the $\ell_1$ norm, so a fixed-point (operator-splitting) algorithm can be used instead. Define the shrinkage operator
$\begin{equation} S_\lambda (x) = \begin{cases} x - \lambda, & \text{if } x > \lambda, \\ x + \lambda, & \text{if } x < - \lambda, \\ 0, & \text{otherwise}, \end{cases} \end{equation}$
which is applied element-wise. This gives the closed-form solution
$\begin{equation} \mathbf{e}_t^{k+1} = S_{\lambda_2} \left( \mathbf{z}_t - \mathbf{L}_{t-1} \mathbf{r}_{t}^{k+1} \right). \tag{18} \end{equation}$
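The shrinkage operator and the update (18) take only a few lines of NumPy. The sketch below is illustrative, and the names `soft_threshold` and `update_e` are assumptions rather than names used in the paper.

```python
import numpy as np

def soft_threshold(x, lam):
    """Element-wise shrinkage operator S_lam(x)."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def update_e(L_prev, z, r, lam2):
    """Eq. (18): shrink the residual z - L r by lam2."""
    return soft_threshold(z - L_prev @ r, lam2)

# Entries with magnitude at most 0.5 are set to zero; the others move 0.5 toward zero.
example = soft_threshold(np.array([-2.0, -0.3, 0.0, 0.5, 1.5]), 0.5)
print(example)
```

Large residual entries survive the shrinkage and end up in the sparse component, while small ones are left to be explained by $\mathbf{L} \mathbf{r}$.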
Updating $\mathbf{L}_t$:
$\begin{equation} h(\mathbf{L}) = \frac{1}{2} \sum_{i=1}^{t} || \mathbf{z}_i - \mathbf{L} \mathbf{r}_i - \mathbf{e}_i ||_2^2 + \lambda_1 \left( \frac{1}{2} || \mathbf{L} ||_\text{F}^2 - \text{tr} (\mathbf{U}_{t-1} \mathbf{L} \mathbf{R}_t^\text{T} \mathbf{V}_{t-1}^\text{T}) \right). \tag{19} \end{equation}$
Block coordinate descent is used to update the dictionary column by column. Let $\mathbf{A} = \lambda_1 \mathbf{I} + \sum_{i=1}^{t} \mathbf{r}_i \mathbf{r}_i^\text{T} = (\mathbf{a}_1, \cdots, \mathbf{a}_d)$, $\mathbf{B} = \sum_{i=1}^t (\mathbf{z}_i - \mathbf{e}_i) \mathbf{r}_i^\text{T} = (\mathbf{b}_1, \cdots, \mathbf{b}_d)$, $\mathbf{C} = \mathbf{U}_{t-1}^\text{T} \mathbf{V}_{t-1} \mathbf{R}_t = (\mathbf{c}_1, \cdots, \mathbf{c}_d)$ and $\mathbf{L}_t = (\mathbf{l}_{t,1}, \cdots, \mathbf{l}_{t,d})$. The gradient of $h$ is then $\mathbf{L} \mathbf{A} - \mathbf{B} - \lambda_1 \mathbf{C}$, and each column of the dictionary $\mathbf{L}_t$ is updated as
$\begin{equation} \mathbf{l}_{t,j} = \frac{1}{\mathbf{A}_{jj}} \left( \mathbf{b}_j + \lambda_1 \mathbf{c}_j - \mathbf{L}_{t-1} \mathbf{a}_j \right) + \mathbf{l}_{t-1, j}, \ j = 1, \cdots, d . \tag{20} \end{equation}$
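The column update (20) can be sketched as one sweep of block coordinate descent. The code below is an illustration under stated assumptions, not the authors' implementation: the names `dictionary_sweep` and `surrogate` and all sizes are made up here, and the sweep reuses columns already updated in the same pass (the usual in-place variant), whereas (20) as written refers to $\mathbf{L}_{t-1}$ throughout.

```python
import numpy as np

def dictionary_sweep(L, A, B, C, lam1):
    """One in-place pass of eq. (20) over the d columns of the dictionary."""
    L = L.copy()
    for j in range(L.shape[1]):
        # Exact minimization of h over column j with the other columns fixed.
        L[:, j] += (B[:, j] + lam1 * C[:, j] - L @ A[:, j]) / A[j, j]
    return L

def surrogate(L, A, B, C, lam1):
    """h(L) of eq. (19), up to a constant that does not depend on L."""
    return 0.5 * np.trace(L @ A @ L.T) - np.sum(L * (B + lam1 * C))

# Illustrative problem data (all sizes and values are assumptions).
rng = np.random.default_rng(3)
m, n, d, s, t, lam1 = 6, 10, 3, 2, 7, 0.5
R = np.zeros((n, d))
R[:t] = rng.standard_normal((t, d))          # rows after t stay zero, as in R_t
Z_t, E_t = rng.standard_normal((m, t)), rng.standard_normal((m, t))
U = np.linalg.qr(rng.standard_normal((m, s)))[0].T
V = np.linalg.qr(rng.standard_normal((n, s)))[0].T

A = lam1 * np.eye(d) + R[:t].T @ R[:t]
B = (Z_t - E_t) @ R[:t]
C = U.T @ V @ R

L0 = rng.standard_normal((m, d))
L1 = dictionary_sweep(L0, A, B, C, lam1)
L_many = L0
for _ in range(200):
    L_many = dictionary_sweep(L_many, A, B, C, lam1)
L_star = np.linalg.solve(A, (B + lam1 * C).T).T   # stationary point: L A = B + lam1 C
print(surrogate(L0, A, B, C, lam1), surrogate(L1, A, B, C, lam1))
```

In an online setting $\mathbf{A}$ and $\mathbf{B}$ can be accumulated with rank-one updates as each sample arrives, so past samples need not be stored.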
Updating $\mathbf{U}_t$:
$\begin{align} \mathbf{U}_t =&\ \arg\max_{\mathbf{U}} \ \text{tr} \left( \mathbf{U} \mathbf{L}_t \mathbf{R}_t^\text{T} \mathbf{V}_{t-1}^\text{T} \right) \\ \text{s.t.} & \ \ \mathbf{U} \mathbf{U}^\text{T} = \mathbf{I}. \tag{21} \end{align}$
This is an orthogonality-constrained problem, which is generally hard because it is non-convex, and preserving the constraint at every iteration is expensive. The paper uses a simple but effective solution based on the following lemma.
Lemma 4.1 Let $\mathbf{X} \in \mathbb{R}^{m \times n} \ (m < n)$ satisfy $\mathbf{X} \mathbf{X}^\text{T} = \mathbf{I}$. Then one optimal solution of
$\begin{align} \max_{\mathbf{X}} & \ \ \text{tr} (\mathbf{X} \mathbf{M}) \\ \text{s.t.} & \ \ \mathbf{X} \mathbf{X}^\text{T} = \mathbf{I} \tag{22} \end{align}$
is $\mathbf{X}^* = (\mathbf{Q}, \mathbf{0}) \mathbf{P}^\text{T}$, where $\mathbf{P}, \mathbf{Q}$ come from the singular value decomposition of $\mathbf{M}$: $\mathbf{M} = \mathbf{P} \mathbf{\Sigma} \mathbf{Q}^\text{T}$, $\mathbf{P} \in \mathbb{R}^{n \times n}$, $\mathbf{Q} \in \mathbb{R}^{m \times m}$, $\mathbf{\Sigma} \in \mathbb{R}^{n \times m}$, $\mathbf{P}^\text{T} \mathbf{P} = \mathbf{I}$, $\mathbf{Q}^\text{T} \mathbf{Q} = \mathbf{I}$.
Proof By assumption, $\text{tr} (\mathbf{X} \mathbf{M}) = \text{tr} (\mathbf{X} \mathbf{P} \mathbf{\Sigma} \mathbf{Q}^\text{T}) = \text{tr} (\mathbf{Q}^\text{T} \mathbf{X} \mathbf{P} \mathbf{\Sigma})$. Let $\tilde{\mathbf{X}} = \mathbf{Q}^\text{T} \mathbf{X} \mathbf{P}$; then $\tilde{\mathbf{X}} \tilde{\mathbf{X}}^\text{T} = \mathbf{Q}^\text{T} \mathbf{X} \mathbf{P} \mathbf{P}^\text{T} \mathbf{X}^\text{T} \mathbf{Q} = \mathbf{I}$. Hence $\text{tr} ( \mathbf{X} \mathbf{M}) = \text{tr} (\tilde{\mathbf{X}} \mathbf{\Sigma}) = \sum_{i=1}^m \tilde{\mathbf{X}}_{ii} \sigma_i$, where $\sigma_i$ are the singular values of $\mathbf{M}$. Since $| \tilde{\mathbf{X}}_{ij} | \leq 1$ and $\sigma_i \geq 0$ for all $i,j$, $\text{tr} (\mathbf{X} \mathbf{M})$ attains its maximum on the set $\{ \tilde{\mathbf{X}} \mid \tilde{\mathbf{X}} \tilde{\mathbf{X}}^\text{T} = \mathbf{I},\ \tilde{\mathbf{X}}_{ii} = 1 \text{ if } \sigma_i > 0 \}$. One special case is $\tilde{\mathbf{X}}^* = (\mathbf{I}, \mathbf{0})$, which gives one of the optimal solutions
$\begin{equation} \mathbf{X}^* = \mathbf{Q} \mathbf{Q}^\text{T} \mathbf{X}^* \mathbf{P} \mathbf{P}^\text{T} = \mathbf{Q} \tilde{\mathbf{X}}^* \mathbf{P}^\text{T} = \mathbf{Q} (\mathbf{I}, \mathbf{0}) \mathbf{P}^\text{T} = (\mathbf{Q}, \mathbf{0}) \mathbf{P}^\text{T}. \tag{23} \end{equation}$
Since $\mathbf{U}_t \in \mathbb{R}^{s \times m}$, $\mathbf{U}_t \mathbf{U}_t^\text{T} = \mathbf{I}$ and $s < m$, problem (21) has exactly the form covered by the lemma: compute the singular value decomposition of $\mathbf{L}_t \mathbf{R}_t^\text{T} \mathbf{V}_{t-1}^\text{T}$, and $\mathbf{U}_t$ follows directly from Lemma 4.1.
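Lemma 4.1 translates into a single SVD call. The sketch below is illustrative; the function name `orthonormal_maximizer`, the sizes and the random data are assumptions. It uses the thin SVD, for which $(\mathbf{Q}, \mathbf{0}) \mathbf{P}^\text{T}$ reduces to $\mathbf{Q} \mathbf{P}_{\text{thin}}^\text{T}$ because the zero block removes the trailing columns of the full $\mathbf{P}$.

```python
import numpy as np

def orthonormal_maximizer(M):
    """One maximizer of tr(X M) over X with orthonormal rows (Lemma 4.1).

    M has shape (n, m) with n >= m; the returned X has shape (m, n).
    """
    P, _, Qt = np.linalg.svd(M, full_matrices=False)  # P: n x m, Qt: m x m
    return Qt.T @ P.T                                 # same matrix as (Q, 0) P_full^T

# Illustrative use for the U-update: M plays the role of L_t R_t^T V_{t-1}^T (shape m x s).
rng = np.random.default_rng(4)
m, s = 7, 3
M = rng.standard_normal((m, s))
U_opt = orthonormal_maximizer(M)                      # s x m
best = np.trace(U_opt @ M)
sigma_sum = np.linalg.svd(M, compute_uv=False).sum()  # the attained maximum, sum of singular values

# A random feasible point for comparison; it should not beat the maximizer.
U_rand = np.linalg.qr(rng.standard_normal((m, s)))[0].T
print(best, sigma_sum, np.trace(U_rand @ M))
```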

Updating $\mathbf{V}_t$:

$\begin{align} \mathbf{V}_t =&\ \arg\max_{\mathbf{V}} \ \ \text{tr} \left( \mathbf{V} \mathbf{R}_t \mathbf{L}_t^\text{T} \mathbf{U}_t^\text{T} \right) \\ \text{s.t.} & \ \ \mathbf{V} \mathbf{V}^\text{T} = \mathbf{I}. \tag{24} \end{align}$
This problem has the same form as the one for $\mathbf{U}_t$ and can likewise be solved with Lemma 4.1. The whole procedure is summarized in Algorithm 1.

Algorithm 1 Online RPCA via truncated nuclear norm
Input: data $\mathbf{Z} = (\mathbf{z}_1, \cdots, \mathbf{z}_n) \in \mathbb{R}^{m \times n}$, regularization coefficients $\lambda_1, \lambda_2$, matrices $\mathbf{L}_0 \in \mathbb{R}^{m \times d}, \mathbf{U}_0 \in \mathbb{R}^{s \times m}, \mathbf{V}_0 \in \mathbb{R}^{s \times n}$;
Initialize: $\mathbf{L}_0$ randomly; $\mathbf{U}_0, \mathbf{V}_0$ randomly with orthonormal rows.
for $t = 1, \cdots, n$ do
Step 1: compute $\mathbf{r}_t, \mathbf{e}_t$;
initialize $\mathbf{r}_t = 0, \mathbf{e}_t = 0$;
let $\mathbf{w}_t$ be the $t$-th row of $\mathbf{V}_{t-1}^\text{T} \mathbf{U}_{t-1} \mathbf{L}_{t-1}$;
repeat
compute $\mathbf{r}_t \leftarrow \left( \mathbf{L}_{t-1}^\text{T} \mathbf{L}_{t-1} + \lambda_1 \mathbf{I} \right)^{-1} \left( \mathbf{L}_{t-1}^\text{T} (\mathbf{z}_t - \mathbf{e}_{t}) + \lambda_1 \mathbf{w}_t \right)$;
compute $\mathbf{e}_t \leftarrow S_{\lambda_2} \left( \mathbf{z}_t - \mathbf{L}_{t-1} \mathbf{r}_{t} \right)$;
until convergence
Step 2: update $\mathbf{L}_t, \mathbf{U}_t, \mathbf{V}_t$;
repeat
let $\mathbf{R}_t^\text{T} = (\mathbf{r}_1, \cdots, \mathbf{r}_t, 0, \cdots, 0) \in \mathbb{R}^{d \times n}$ and update the columns of $\mathbf{L}_t$ by (20);
$[\mathbf{P}_{\text{U}}, \mathbf{\Sigma}_{\text{U}}, \mathbf{Q}_{\text{U}}] = \text{svd} \left( \mathbf{L}_t \mathbf{R}_t^\text{T} \mathbf{V}_{t-1}^\text{T} \right)$,
$\mathbf{U}_t \leftarrow (\mathbf{Q}_{\text{U}}, 0) \mathbf{P}_{\text{U}}^\text{T}$,
$[\mathbf{P}_{\text{V}}, \mathbf{\Sigma}_{\text{V}}, \mathbf{Q}_{\text{V}}] = \text{svd} \left( \mathbf{R}_t \mathbf{L}_t^\text{T} \mathbf{U}_{t}^\text{T} \right)$,
$\mathbf{V}_t \leftarrow (\mathbf{Q}_{\text{V}}, 0) \mathbf{P}_{\text{V}}^\text{T}$.
until convergence.
end for
Output: $\mathbf{L}_n, \mathbf{R}_n$.
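For orientation, the steps of Algorithm 1 can be strung together in NumPy as below. This is a rough sketch under several assumptions and not the authors' code: the function name `online_rpca_tnn`, the fixed iteration counts that stand in for the "until convergence" tests, the in-place column sweep, the hyperparameter values and the synthetic data are all choices made here for illustration. No claim is made about recovery quality.

```python
import numpy as np

def soft_threshold(x, lam):
    """Element-wise shrinkage operator."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def orthonormal_maximizer(M):
    """Maximizer of tr(X M) over X with orthonormal rows (Lemma 4.1)."""
    P, _, Qt = np.linalg.svd(M, full_matrices=False)
    return Qt.T @ P.T

def online_rpca_tnn(Z, d, s, lam1, lam2, inner_iters=20, outer_iters=5, seed=0):
    """Sketch of Algorithm 1; fixed iteration counts replace convergence checks."""
    rng = np.random.default_rng(seed)
    m, n = Z.shape
    L = rng.standard_normal((m, d))
    U = np.linalg.qr(rng.standard_normal((m, s)))[0].T   # s x m, orthonormal rows
    V = np.linalg.qr(rng.standard_normal((n, s)))[0].T   # s x n, orthonormal rows
    R = np.zeros((n, d))          # rows not yet visited stay zero, as in R_t
    E = np.zeros((m, n))
    A = lam1 * np.eye(d)          # accumulates lam1*I + sum_i r_i r_i^T
    B = np.zeros((m, d))          # accumulates sum_i (z_i - e_i) r_i^T
    for t in range(n):
        z = Z[:, t]
        w = (V[:, t] @ U) @ L     # row t of V^T U L
        # Step 1: alternate the closed-form updates (16) and (18).
        G = L.T @ L + lam1 * np.eye(d)
        r, e = np.zeros(d), np.zeros(m)
        for _ in range(inner_iters):
            r = np.linalg.solve(G, L.T @ (z - e) + lam1 * w)
            e = soft_threshold(z - L @ r, lam2)
        R[t], E[:, t] = r, e
        A += np.outer(r, r)
        B += np.outer(z - e, r)
        # Step 2: dictionary sweep (20), then U and V via Lemma 4.1.
        for _ in range(outer_iters):
            C = U.T @ (V @ R)
            for j in range(d):
                L[:, j] += (B[:, j] + lam1 * C[:, j] - L @ A[:, j]) / A[j, j]
            U = orthonormal_maximizer(L @ (R.T @ V.T))    # argument is m x s
            V = orthonormal_maximizer(R @ (L.T @ U.T))    # argument is n x s
    return L, R, E

# Synthetic low-rank plus sparse data (an assumption, for demonstration only).
rng = np.random.default_rng(0)
m, n = 20, 30
low_rank = rng.standard_normal((m, 2)) @ rng.standard_normal((2, n))
sparse = (rng.random((m, n)) < 0.05) * rng.uniform(-5.0, 5.0, (m, n))
Z = low_rank + sparse
L, R, E = online_rpca_tnn(Z, d=4, s=2, lam1=1.0, lam2=0.5)
print(L.shape, R.shape, E.shape)
```

As in the algorithm box, $\mathbf{V}$ has $n$ columns, so the total number of samples has to be known in advance in this formulation.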

5 Experiments

The experiments in the paper are rather simple and are omitted here.

[1] Wright, J., Ganesh, A., Rao, S., Peng, Y., & Ma, Y. (2009). Robust principal component analysis: Exact recovery of corrupted low-rank matrices via convex optimization. In Advances in Neural Information Processing Systems (pp. 2080-2088).
[2] Peng, Y., Ganesh, A., Wright, J., Xu, W., & Ma, Y. (2012). RASL: Robust alignment by sparse and low-rank decomposition for linearly correlated images. IEEE Transactions on Pattern Analysis and Machine Intelligence, 34(11), 2233-2246.
[3] Min, K., Zhang, Z., Wright, J., & Ma, Y. (2010, October). Decomposing background topics from keywords by principal component pursuit. In Proceedings of the 19th ACM international conference on Information and knowledge management (pp. 269-278).
[4] Huang, P. S., Chen, S. D., Smaragdis, P., & Hasegawa-Johnson, M. (2012, March). Singing-voice separation from monaural recordings using robust principal component analysis. In Acoustics, Speech and Signal Processing (ICASSP), IEEE International Conference on (pp. 57-60).
[5] Lin, Z., Chen, M., & Ma, Y. (2010). The augmented lagrange multiplier method for exact recovery of corrupted low-rank matrices. arXiv preprint arXiv:1009.5055.
[6] Shang, F., Liu, Y., Cheng, J., & Cheng, H. (2014, November). Robust principal component analysis with missing data. In Proceedings of the 23rd ACM International Conference on Conference on Information and Knowledge Management (pp. 1149-1158).
[7] Tao, M., & Yuan, X. (2011). Recovering low-rank and sparse components of matrices from incomplete and noisy observations. SIAM Journal on Optimization, 21(1), 57-81.
[8] Feng, J., Xu, H., & Yan, S. (2013). Online robust pca via stochastic optimization. In Advances in Neural Information Processing Systems (pp. 404-412).
[9] Goes, J., Zhang, T., Arora, R., & Lerman, G. (2014). Robust Stochastic Principal Component Analysis. In AISTATS (pp. 266-274).
[10] He, J., Balzano, L., & Lui, J. (2011). Online robust subspace tracking from partial information. arXiv preprint arXiv:1109.3827.
[11] Min, K., Zhang, Z., Wright, J., & Ma, Y. (2010, October). Decomposing background topics from keywords by principal component pursuit. In Proceedings of the 19th ACM international conference on Information and knowledge management (pp. 269-278). ACM.
[12] Feng, J., Xu, H., & Yan, S. (2013). Online robust pca via stochastic optimization. In Advances in Neural Information Processing Systems (pp. 404-412).
[13] Shen, J., Xu, H., & Li, P. (2014). Online optimization for max-norm regularization. In Advances in Neural Information Processing Systems (pp. 1718-1726).
[14] Recht, B., Fazel, M., & Parrilo, P. A. (2010). Guaranteed minimum-rank solutions of linear matrix equations via nuclear norm minimization. SIAM review, 52(3), 471-501.
[15] Srebro, N., Rennie, J. D., & Jaakkola, T. S. (2004, December). Maximum-Margin Matrix Factorization. In NIPS (Vol. 17, pp. 1329-1336).
[16] Zhang, D., Hu, Y., Ye, J., Li, X., & He, X. (2012, June). Matrix completion by truncated nuclear norm regularization. In Computer Vision and Pattern Recognition (CVPR), IEEE Conference on (pp. 2192-2199).
