1. Goal
This paper mainly deals with sparse principal component analysis (PCA) using a subspace method, i.e. it works with the subspace spanned by the leading PCs rather than with individual PCs.
2. Theory
2.1 How to get their formulation
Notation: the eigenvalues $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_p$ of the covariance matrix $\Sigma$ are in decreasing order.
From Ky Fan's maximum principle, we know that

$$\sum_{i=1}^{d} \lambda_i \;=\; \max_{V \in \mathbb{R}^{p \times d},\, V'V = I_d} \operatorname{tr}(V'\Sigma V) \;=\; \max_{V'V = I_d} \operatorname{tr}(\Sigma\, VV').$$
If we regard the last formula as a function of $VV'$, it is linear. Since a linear function attains its maximum over a compact set at an extreme point, changing the constraint set to its convex hull does not change the optimization problem. From the less well known observation that

$$\operatorname{conv}\{VV' : V \in \mathbb{R}^{p \times d},\ V'V = I_d\} \;=\; \{H : 0 \preceq H \preceq I_p,\ \operatorname{tr}(H) = d\} \;=:\; \mathcal{F}^d,$$

where $\mathcal{F}^d$ is known as the Fantope,
from all the analysis, we get

$$\sum_{i=1}^{d} \lambda_i \;=\; \max_{H \in \mathcal{F}^d} \operatorname{tr}(\Sigma H).$$
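As a quick sanity check on these identities, here is a minimal numpy sketch (my own illustration, not code from the paper): it verifies that $\operatorname{tr}(\Sigma\, VV')$ at the top-$d$ eigenvectors equals the sum of the top $d$ eigenvalues, and that the maximizer $H = VV'$ lies in the Fantope.

```python
import numpy as np

# Sanity check of the identities above (illustration, not the paper's code).
rng = np.random.default_rng(0)
p, d = 10, 3
A = rng.standard_normal((p, p))
Sigma = A @ A.T                        # a random symmetric PSD matrix

evals, evecs = np.linalg.eigh(Sigma)   # eigenvalues in increasing order
V = evecs[:, -d:]                      # top-d eigenvectors, V'V = I_d
H = V @ V.T                            # the projector VV'

print(np.sum(evals[-d:]))              # sum of the top-d eigenvalues ...
print(np.trace(Sigma @ H))             # ... equals tr(Sigma VV')

# H is an extreme point of the Fantope: eigenvalues in [0, 1], trace = d.
h_evals = np.linalg.eigvalsh(H)
print(np.all((h_evals > -1e-9) & (h_evals < 1 + 1e-9)),
      np.isclose(np.trace(H), d))
```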
How do we introduce sparsity, and which norm is suitable? The goal of this paper is to obtain sparse PCs, so we should choose a penalty that makes the solution $V^* \in \mathbb{R}^{p \times d}$ sparse. For a matrix, there are two common ways to define sparsity (illustrated in the sketch after this list):
- Column-wise sparsity: for a matrix $A$, each of its columns is sparse, i.e. only a few elements of each column $A_{*i}$ are nonzero.
- Row sparsity: for a matrix $A$, only a few of its rows are nonzero, which produces group sparsity (a whole row of coefficients is kept or dropped together).
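To make the two patterns concrete, here is a small numpy illustration with hypothetical matrices (not from the paper); it also computes the group norms over rows used as penalties in the next paragraph.

```python
import numpy as np

# Column-wise sparsity: each column has few nonzeros, but the nonzero
# positions differ across columns, so many rows are still active.
A_col = np.array([[1.0, 0.0],
                  [0.0, 2.0],
                  [3.0, 0.0],
                  [0.0, 4.0],
                  [0.0, 0.0]])

# Row sparsity: entire rows are zero, so the same few variables are
# active in every column (group sparsity over rows).
A_row = np.array([[1.0, 2.0],
                  [0.0, 0.0],
                  [3.0, 4.0],
                  [0.0, 0.0],
                  [0.0, 0.0]])

# Group norms over rows.
row_norms = np.linalg.norm(A_row, axis=1)
print("||A||_{2,0} =", np.count_nonzero(row_norms))  # number of nonzero rows
print("||A||_{2,1} =", row_norms.sum())              # sum of row l2 norms
```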
For sparse PCA, to select the important features, this paper uses row sparsity. An intuitive penalty is $\|V\|_{2,0}$, the number of nonzero rows of $V$. But in the high dimensional situation, optimizing with the $\ell_0$-type norm is NP hard. A common trick is to replace $\ell_0$ with $\ell_1$, so the penalty becomes $\|V\|_{2,1}$, the sum of the row-wise $\ell_2$ norms. But our model is a function of $H = VV'$, so the question becomes: what sparsity penalty on $H$ approximates the row sparsity of $V$ well?
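Note that $H_{ij} = \langle V_{i*}, V_{j*} \rangle$, so a zero row of $V$ zeroes out the corresponding row and column of $H$; a row sparse $V$ therefore makes $H = VV'$ entrywise sparse. The following numpy sketch (again my own illustration, not the paper's code) demonstrates this:

```python
import numpy as np

# Row sparse V  =>  entrywise sparse H = VV'.
rng = np.random.default_rng(1)
p, d, s = 8, 2, 3                       # p variables, d PCs, s active rows
V = np.zeros((p, d))
V[:s], _ = np.linalg.qr(rng.standard_normal((s, d)))  # orthonormal columns
H = V @ V.T                             # H_ij = <V_i*, V_j*>

print("nonzero rows of V:", np.count_nonzero(np.linalg.norm(V, axis=1)))
print("nonzero entries of H:", np.count_nonzero(np.abs(H) > 1e-12),
      "out of", p * p)                  # only the s-by-s active block
print("entrywise l1 norm of H:", np.abs(H).sum())
```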