糖葫芦不做了，我们来跳一跳吧 O(∩_∩)O~~ PCA + 截断 SVD （含 python 实现及代码细节梳理）

最新推荐文章于 2021-12-23 23:37:39 发布

Abandon_first

最新推荐文章于 2021-12-23 23:37:39 发布

阅读量494

点赞数 1

分类专栏：算法文章标签：机器学习 python

本文链接：https://blog.csdn.net/Abandon_first/article/details/117441167

版权

算法专栏收录该内容

12 篇文章 0 订阅

订阅专栏

上一篇我串了糖葫芦，矩阵乘法/线性变换 + 特征分解/奇异值分解（SVD） + PCA。

PCA 本来是糖葫芦的最后也是最上面那颗山楂，无奈山楂太大了 2333，所以这里另起一篇了。然后 PCA 这篇因为带着代码和说明，所以好多函数调用一直在跳函数，我就叫“跳一跳”了～

再来一遍

（1）如果我们有 m 条数据，每条数据有 n 个特征，我们会把整份数据描述为一个 m × n 的矩阵，m 行 n 列。

（2）提到向量，除非特意点名，否则默认就是列向量，比如这样写 $x = (2, 1, 3)^T$ （转置转置转置啊）。

写在前面

本篇重点说一下截断 SVD 和 PCA。会穿插一些代码来说明，有些自己的理解没有写在正文，而是添加在代码注释里面了。很多细节也并不会做，比如白化等。

PCA 是用来做降维的，所以讨论 PCA 细节之前应该先确定真的需要做降维。我们把拥有 m 个样本 n 个特征的数据用 data 矩阵来表示，这个矩阵有 m 行 n 列。比如 python sklearn 包里面自带的手写体数字识别数据集，就有 1797 条数据（张照片），每条数据都是单通道黑白的，原本是 8 × 8 的图像，reshape 之后就是 64 个特征（这里就把像素值当特征了）。对这个数据集而言，data 就是 1797 × 64。那“降维”是指降特征，也就是降 64，而不是降 1797。

from sklearn import datasets
digits = datasets.load_digits()
data = digits['data']
feature_names = digits['feature_names']
print('data\'s shape: ', data.shape)
print('feature nums: ', len(feature_names))
print('feature name example: ', feature_names[0])

# 打印结果
data's shape:  (1797, 64)	#（m 个样本, n 个特征）
feature nums:  64
feature name example:  pixel_0_0

因为我这里只是为了举例子，所以我们先假定，这个 data 数据集中的特征数 64 就是太多了，我们假定其中有一半的特征都在划水，我们就是要去掉划水的特征，保留真的干活的特征。所以，在矩阵形状方面，我们降维的目的是：找到那么一个新的数据集 new data 来替代现在 data 这个数据集，new data 和原本的 data 样本数都一样的，还是 m；只是包含的特征不一样了，特征数变成了 k（k < n），也变精了，所以 new data 保留了 data 中绝大部分的最有效的信息，丝毫不影响后面根据特征实现分类。new data 的 shape 就应该是 m 行 k 列。

我设置的 k = 32，经常有资料写 PCA 中的 k 是 1 或者 2 就可以。并不是，要看具体情况，我这里算实验，就先取 32 了。毕竟这个数据集是手写数字的图片，在这里降成 1 或者 2 ，用脚趾头想一想也知道信息全都没了。

在 python 里面调用现有的库实现对 data 的 PCA 非常方便，下面就是用截断 SVD 来实现 PCA 的（不使用 SVD 也可以做 PCA，选择使用 SVD 时也可以选择是否要截断）：

from sklearn.decomposition import PCA
# 创建 PCA 实例，保留 32 个特征，svd_solver 指定为 'arpack'就是指用截断 svd 实现 PCA
pca = PCA(n_components=32, svd_solver='arpack')
new_data = pca.fit_transform(data)
# 打印下新数据集的 shape
print('new_data\'s shape: ', new_data.shape)
# 打印 PCA 过程中保留的 k = 32 个奇异值分量
print('singular values: ', pca.singular_values_)
# 打印一下这些奇异值的占比
print('singular values ratio: ', pca.explained_variance_ratio_)

# 前缀和计算奇异值占比变化，看看砍掉了一半特征之后，信息是不是还保留着
prefix = [n for n in pca.explained_variance_ratio_]
for i in range(1, len(prefix)):
    prefix[i] = prefix[i] + prefix[i - 1]
# 可以看到，最后累加和已经到 0.9663，可以理解为 96.63% 的信息都还保留着
# 所以 64 个特征确实包含冗余信息的，可以降维；也不是取前一两个特征来做降维，都要看具体情况的
print('ratios: ', prefix)

# 打印结果
new_data's shape:  (1797, 32)
singular values:  [567.0065665  542.25185421 504.63059421 426.11767608 353.3350328
 325.82036569 305.26158002 281.16033073 269.06978193 257.82395143
 226.31879719 221.5148324  198.33071545 195.70013887 177.9762712
 174.46079067 168.72787641 164.15849219 148.23330876 139.83132462
 138.58443271 131.1882069  128.72691665 124.93159016 122.57503405
 113.44487728 111.48027133 105.46348813 102.80780243  96.22856616
  89.81296469  87.33494649]
singular values ratio:  [0.14890594 0.13618771 0.11794594 0.08409979 0.05782415 0.0491691
 0.04315987 0.03661373 0.03353248 0.03078806 0.02372341 0.02272697
 0.01821863 0.01773855 0.01467101 0.01409716 0.01318589 0.01248138
 0.01017718 0.00905617 0.00889538 0.00797123 0.00767493 0.00722904
 0.00695889 0.00596081 0.00575615 0.00515158 0.0048954  0.00428888
 0.00373606 0.00353274]
ratios:  [0.14890593584063858, 0.28509364823699307, 0.40303958587675104, 0.48713938008684293, 0.5449635267268982, 0.5941326298981383, 0.6372925000063961, 0.6739062257772367, 0.707438706756908, 0.7382267688459535, 0.7619501772859846, 0.7846771429740802, 0.8028957761040322, 0.8206343254758512, 0.8353053364037443, 0.8494024924198313, 0.8625883844271056, 0.8750697626053612, 0.8852469422085768, 0.8943031165985267, 0.9031985012037216, 0.9111697327690573, 0.9188446653146946, 0.9260737010079616, 0.9330325895169714, 0.9389934040973575, 0.9447495509807581, 0.9499011267982518, 0.95479652456516, 0.9590854042457175, 0.9628214647289738, 0.9663542069634707]

把保留下来的奇异值画一下：

import matplotlib.pyplot as plt
plt.figure()
plt.plot(pca.singular_values_, 'k', linewidth=2)
plt.xlabel('n_components', fontsize=16)
plt.ylabel('singular_values_', fontsize=16)
plt.show()

在这里插入图片描述

嗯哼，这就已经做完 PCA 了……那里面究竟发生了什么啊？我们先预习一遍，然后一步步跳进去代码看一下。

预习： PCA 理论流程

我不在这里写很详细的理论了，仅仅为了对照代码方便，所以写个步骤而已。

对样本数量为 m 特征数量为 n 的矩阵 A （就是咱们上面的 data）做 PCA ，需要：
（1）先求取矩阵 A 的协方差矩阵 $X = A^TA$ ，shape 是 n * n，总之是特征数 × 特征数，因为你要降维（降的特征数），你要去的是特征空间。有的资料喜欢行列颠倒/(ㄒoㄒ)/~~；
（2）协方差矩阵 $X$ 是方阵，可以做特征分解， $\Sigma V^T$ ，这个 V 就是对矩阵 A 做 SVD 得到的右奇异矩阵呀， $A_{m * n} = U_{m * m}S_{m * n}V_{n * n}^T$ ；
（3）这里就要插播一下，截断 SVD 可以得到前 k 个重要的特征得到 $A_{m * n} \approx U_{m * k}S_{k * k}V_{k * n}^T$ （ $V^T 是 k × n$ ， V 是 n × k）；

（4）而咱们的目标 new A （就是咱们上面的 new data）可以通过 $newA_{m*k}= A_{m*n}V_{n*k} \approx U_{m * k}S_{k * k}V_{k * n}^TV_{n * k} = U_{m * k}S_{k * k}$

这几个步骤会在下面代码中注释出来，代码块中不方便做颜色标记，可以搜索“预习”来查看对应步骤。

跳一跳：python PCA （使用截断 SVD）实现

第一跳： `sklearn.decomposition.PCA`

关于这个 PCA 类的代码注释，SVD 应用时有两种情况，本篇举例是其中的第二种，会调用 scipy.sparse.linalg ARPACK 中的实现。

class PCA(_BasePCA):
	'''
	Principal component analysis (PCA).
	
	# 使用 SVD 降维
	Linear dimensionality reduction using Singular Value Decomposition of the
	data to project it to a lower dimensional space. The input data is centered
	but not scaled for each feature before applying the SVD.
	
	# （1）可以用 full SVD 也可以用 randomized truncated SVD
	It uses the LAPACK implementation of the full SVD or a randomized truncated
	SVD by the method of Halko et al. 2009, depending on the shape of the input
	data and the number of components to extract.
	
	# （2）还可以用 truncated SVD 本篇中的举例就是用这一种
	It can also use the scipy.sparse.linalg ARPACK implementation of the
	truncated SVD.'''

第二跳：`sklearn.decomposition.PCA.fit_transform()`

而对于 fit_transform 这个方法介绍如下。

def fit_transform(self, X, y=None):
	'''
	# 对输入 X 降维
	Fit the model with X and apply the dimensionality reduction on X.
	
	Parameters
	----------
	# 输入 X 行为样本量，列为特征
	X : array-like of shape (n_samples, n_features)
	    Training data, where n_samples is the number of samples
	    and n_features is the number of features.
	
	y : Ignored
	
	Returns
	-------
	# 返回的 X_new 样本量与输入 X 一致，特征数量变为 n_components 就是前面说的 k，我设置的是 32
	X_new : ndarray of shape (n_samples, n_components)
	    Transformed values.'''

这个 fit_transform 到底做了什么呢？

def fit_transform(self, X, y=None):
	# 先统一，输入 X shape 是 M × N
	# 做 SVD 分解，得到三个矩阵，分别是旋转、缩放和旋转矩阵，shape 分别是 M × k， k × k， k × N（这是 Vt 的 shape，不是 V 的）
	U, S, Vt = self._fit(X)
	# 左奇异矩阵，行数不变，列数方面取前 n_components （就是k = 32）个 U 中的特征向量
	U = U[:, :self.n_components_]
	
	# 没有设置 whiten 默认是 False，所以进入下一个分支
	if self.whiten:
	    # X_new = X * V / S * sqrt(n_samples) = U * sqrt(n_samples)
	    U *= sqrt(X.shape[0] - 1)
	# 进入这个分支
	else:
	    # X_new = X * V = U * S * Vt * V = U * S
	    # X_new shape 应该是 M × k，对应预习中的第（4）步
	    U *= S[:self.n_components_]
	
	return U	# 虽然还用的变量 U，但其实就是 X_new 了

第三跳：`sklearn.decomposition.PCA._fit()`

到此可以发现，核心的 SVD 那一行U, S, Vt = self._fit(X)我们还是没看到细节，再跳进去可以发现，它会根据最前面定义好的执行 SVD 的模式来进行 SVD，因为我们选的截断 SVD，这里返回的就是截断 SVD 的结果。

def _fit(self, X):
	'''
	这里有很多选择分支等，不再贴了，按照我们选择的截断 SVD 就是下面这两行
	'''
        elif self._fit_svd_solver in ['arpack', 'randomized']:
            return self._fit_truncated(X, n_components, self._fit_svd_solver)

所以我们要跳到 _fit_truncated() 中了

第四跳：`sklearn.decomposition.PCA._fit_truncated()`

下面就是 _fit_truncated() ，为了方便看，我删掉了很多与 U S V 不相关的其他信息计算。可以看到：
（1）这里对输入数据做了个中心化，减去了均值（但是没有做除以方差，我也没指定，所以就没有）。
（2）又调用了 svds 来实现 SVD，也就是这个 svds 函数的输入得需要先做 center。
（3）但是上面这步的结果它不是按照咱们平时学习的按照特征值大小从大到小排的，它正好反过来，从小到大排的，所以 S U V 三个矩阵都是倒序再返回的，但是这里的 U S V 已经是被截断的啦，不信就跳下一步看嘻嘻。

    def _fit_truncated(self, X, n_components, svd_solver):
        """Fit the model by computing truncated SVD (by ARPACK or randomized)
        on X.
        """
        n_samples, n_features = X.shape
        random_state = check_random_state(self.random_state)

        # Center data
        self.mean_ = np.mean(X, axis=0)
        X -= self.mean_

		# 进入这个分支
        if svd_solver == 'arpack':
            v0 = _init_arpack_v0(min(X.shape), random_state)
            U, S, Vt = svds(X, k=n_components, tol=self.tol, v0=v0)
            # svds doesn't abide by scipy.linalg.svd/randomized_svd
            # conventions, so reverse its outputs.
            S = S[::-1]
            # flip eigenvectors' sign to enforce deterministic output
            U, Vt = svd_flip(U[:, ::-1], Vt[::-1])
		# 这个不符合我们设定的截断 SVD 所以可以先不看
        elif svd_solver == 'randomized':
            # sign flipping is done inside
            U, S, Vt = randomized_svd(X, n_components=n_components,
                                      n_iter=self.iterated_power,
                                      flip_sign=True,
                                      random_state=random_state)
		# 对应前面预习中的第（2）步，因为代码我们选择的截断 SVD，所以应该是（2）～（3）步
        return U, S, Vt

跳 8 动了就先到这里了：`scipy.sparse.linalg.svds`

跳到这里要累洗人了呀，下面是 svds 的说明，我删掉了很多非位置变量的说明，只留下了输入矩阵 A、要保留的特征数目 k，还有返回的三个矩阵的说明。

def svds(A, k=6, ncv=None, tol=0, which='LM', v0=None,
         maxiter=None, return_singular_vectors=True):
        
    """
    # 这里就说了计算最大的 k 个奇异值和向量，要求输入是 sparse matrix
    Compute the largest k singular values/vectors for a sparse matrix.

    Parameters
    ----------
    A : {sparse matrix, LinearOperator}
        Array to compute the SVD on, of shape (M, N)
    k : int, optional
        Number of singular values and vectors to compute.
        Must be 1 <= k < min(A.shape).
    Returns
    -------
    u : ndarray, shape=(M, k)
        Unitary matrix having left singular vectors as columns.
        If `return_singular_vectors` is "vh", this variable is not computed,
        and None is returned instead.
    s : ndarray, shape=(k,)
        The singular values.
    vt : ndarray, shape=(k, N)
        Unitary matrix having right singular vectors as rows.
        If `return_singular_vectors` is "u", this variable is not computed,
        and None is returned instead."""

关于其中的代码，我截取了比较关键的部分如下：

def svds(A, k=6, ncv=None, tol=0, which='LM', v0=None,
         maxiter=None, return_singular_vectors=True):
	# 很反人类啊这里，说明里面还在用 M 行 N 列呢，代码里面就用 n 行 m 列了……
	n, m = A.shape
	# 咱们进入这个分支
    if n > m:
        X_dot = X_matmat = A.dot
        XH_dot = _herm(A).dot
    else:
        XH_dot = A.dot
        X_dot = X_matmat = _herm(A).dot

    def matvec_XH_X(x):
        return XH_dot(X_dot(x))

	# 总之咱们这里得到的是 A 的转置 乘以 A，对应我们前面预习的第（1）步
    XH_X = LinearOperator(matvec=matvec_XH_X, dtype=A.dtype,
                          shape=(min(A.shape), min(A.shape)))

    # Get a low rank approximation of the implicitly defined gramian matrix.
    # This is not a stable way to approach the problem.

    # 然后对 A 的转置 乘以 A 计算特征值
    eigvals, eigvec = eigsh(XH_X, k=k, tol=tol ** 2, maxiter=maxiter,
                                  ncv=ncv, which=which, v0=v0)
	# 这里还有一些计算过程
    return u, s, vh

Abandon_first

关注

1
点赞
踩
0

收藏

觉得还不错? 一键收藏
0
评论
糖葫芦不做了，我们来跳一跳吧 O(∩_∩)O~~ PCA + 截断 SVD （含 python 实现及代码细节梳理）

再来一遍（1）如果我们有 m 条数据，每条数据有 n 个特征，我们会把整份数据描述为一个 m × n 的矩阵，m 行 n 列。（2）提到向量，除非特意点名，否则默认就是列向量，比如这样写 x=(2,1,3)Tx = (2, 1, 3)^Tx=(2,1,3)T（转置转置转置啊）。写在前面本篇重点说一下截断 SVD 和 PCA。会穿插一些代码来说明，有些自己的理解没有写在正文，而是添加在代码注释里面了。很多细节也并不会做，比如白化等。PCA 是用来做降维的，所以讨论 PCA 细节之前应该先确定真的需要
复制链接

扫一扫