Learning the sklearn matrix decomposition module

每天进步一点点2017

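The sklearn.decomposition module provides matrix decomposition algorithms, including PCA, NMF, and ICA. Most of them can be regarded as dimensionality reduction techniques.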
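As a rough illustration of the "dimensionality reduction" view, the sketch below runs PCA, NMF, and FastICA on the same toy matrix and checks that each one maps 5 input features to 2 output columns. The data `X_toy`, the dictionary names, and the chosen hyperparameters are assumptions made for this example only; they are not from the original post.

```python
import numpy as np
from sklearn.decomposition import PCA, NMF, FastICA

rng = np.random.RandomState(0)
# Hypothetical toy data: 200 samples, 5 non-negative features
# (NMF only accepts non-negative input, hence the abs()).
X_toy = np.abs(rng.normal(size=(200, 5)))

# Three estimators from sklearn.decomposition, each asked for 2 components.
models = {
    "PCA": PCA(n_components=2),
    "NMF": NMF(n_components=2, init="random", random_state=0, max_iter=500),
    "FastICA": FastICA(n_components=2, random_state=0),
}

shapes = {}
for name, model in models.items():
    # fit_transform learns the decomposition and returns the reduced data.
    shapes[name] = model.fit_transform(X_toy).shape
    print(name, shapes[name])
```

All three estimators share the same fit / transform / fit_transform interface, so swapping one for another usually only means changing the constructor call.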

① Principal component analysis: sklearn.decomposition.PCA(n_components=None, copy=True, whiten=False, svd_solver='auto', tol=0.0, iterated_power='auto', random_state=None)

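Main parameters:

n_components: the number of components to keep. It may be an integer, a float, None, or a string. If it is None, all components are kept; if it is an integer, it is the number of components to keep; if it is a float in (0, 1), it is the minimum fraction of the total variance that the retained components must explain; if n_components='mle' and svd_solver='full', the number of components is chosen automatically by the MLE method.

whiten: whether to rescale the retained components so that each one has unit variance.

svd_solver: the SVD method to use; one of 'auto', 'full', 'arpack', 'randomized'.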
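The effect of whiten is easiest to see by comparing the variance of the projected columns with and without it. The following is a minimal sketch on made-up data; `X_demo`, `plain`, and `white` are names invented for this example.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.RandomState(0)
# Hypothetical toy data: 500 samples, 3 features on very different scales.
X_demo = rng.normal(size=(500, 3)) * np.array([5.0, 1.0, 0.2])

# Same projection, with and without whitening.
plain = PCA(n_components=2, whiten=False).fit_transform(X_demo)
white = PCA(n_components=2, whiten=True).fit_transform(X_demo)

# Sample variance (ddof=1) of each projected column.
plain_var = plain.var(axis=0, ddof=1)
white_var = white.var(axis=0, ddof=1)
print("without whiten:", plain_var)  # keeps the original spread of each component
print("with whiten:   ", white_var)  # each component rescaled to unit variance
```

Whitening discards the relative scale of the components, which some downstream estimators expect, but it also means the transformed columns no longer tell you which direction carried more variance.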

Building a simple example

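In [1]: import numpy as np
   ...: import matplotlib.pyplot as plt
   ...: from mpl_toolkits.mplot3d import Axes3D  # noqa: F401 (registers the 3D projection on older matplotlib)
   ...: from sklearn.datasets import make_blobs  # the samples_generator submodule was removed in newer releases
   ...: # 10000 samples with 3 features, drawn around 4 cluster centers
   ...: X, y = make_blobs(n_samples=10000, n_features=3,
   ...:                   centers=[[3, 3, 3], [0, 0, 0], [1, 1, 1], [2, 2, 2]],
   ...:                   cluster_std=[0.2, 0.1, 0.2, 0.2], random_state=9)
   ...: fig = plt.figure()
   ...: ax = fig.add_subplot(projection='3d')
   ...: ax.view_init(elev=30, azim=20)
   ...: # Draw on the 3D axes; plt.scatter would treat the third array as marker sizes
   ...: ax.scatter(X[:, 0], X[:, 1], X[:, 2], marker='o')
   ...: plt.show()
   ...: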

Fitting PCA on the data:

a. n_components=None: keep all components

In [2]: from sklearn.decomposition import PCA
   ...: pca = PCA()
   ...: pca.fit(X)
   ...: print(pca.n_components_)
   ...:
3

After fitting, inspect the variance of each of the three components and its share of the total variance

In [3]: pca.explained_variance_
Out[3]: array([ 3.78483785,  0.03272285,  0.03201892])

In [4]: pca.explained_variance_ratio_
Out[4]: array([ 0.98318212,  0.00850037,  0.00831751])
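The two attributes are related in a simple way: explained_variance_ holds the eigenvalues of the sample covariance matrix of X, and explained_variance_ratio_ is each of those values divided by the total variance. The sketch below checks both relations numerically. It regenerates the blob data so it can run on its own; `eigvals` and `ratio` are names introduced for this example.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA

# Same synthetic data as in the example above.
X, _ = make_blobs(n_samples=10000, n_features=3,
                  centers=[[3, 3, 3], [0, 0, 0], [1, 1, 1], [2, 2, 2]],
                  cluster_std=[0.2, 0.1, 0.2, 0.2], random_state=9)
pca = PCA().fit(X)

# Eigenvalues of the sample covariance matrix, sorted largest first.
eigvals = np.sort(np.linalg.eigvalsh(np.cov(X, rowvar=False)))[::-1]

# Each component's variance divided by the total variance.
ratio = pca.explained_variance_ / pca.explained_variance_.sum()

print(np.allclose(eigvals, pca.explained_variance_))
print(np.allclose(ratio, pca.explained_variance_ratio_))
```

Because all three components are kept here, the ratios add up to 1; with fewer components they add up to the fraction of variance that was retained.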

b. n_components is an integer M: if M is smaller than the number of features of X, the M components with the largest variance are kept

In [5]: from sklearn.decomposition import PCA
   ...: pca = PCA(n_components=2)  # keep 2 components
   ...: pca.fit(X)
   ...: print(pca.explained_variance_)
   ...: print(pca.explained_variance_ratio_)
   ...:
[ 3.78483785  0.03272285]
[ 0.98318212  0.00850037]
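Once the model is fitted with two components, transform projects the data to two columns and inverse_transform maps it back to the original three-feature space. The following sketch, under the assumption that the same blob data is used, shows that the reconstruction error per sample equals the variance of the one discarded component. The names `pca2`, `X_2d`, `X_back`, and `residual` are made up for this example, and svd_solver='full' is passed explicitly so the comparison is exact.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA

X, _ = make_blobs(n_samples=10000, n_features=3,
                  centers=[[3, 3, 3], [0, 0, 0], [1, 1, 1], [2, 2, 2]],
                  cluster_std=[0.2, 0.1, 0.2, 0.2], random_state=9)

# Keep the 2 components with the largest variance.
pca2 = PCA(n_components=2, svd_solver="full").fit(X)
X_2d = pca2.transform(X)               # shape (10000, 2)
X_back = pca2.inverse_transform(X_2d)  # back to shape (10000, 3)

# Squared reconstruction error, normalised like a sample variance.
residual = ((X - X_back) ** 2).sum() / (len(X) - 1)

# Variance of the third (discarded) component from a full fit.
full3 = PCA(svd_solver="full").fit(X)
print(X_2d.shape, X_back.shape)
print(residual, full3.explained_variance_[2])
```

This is why a small explained variance for the dropped components means little information is lost by the projection.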

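c. n_components is a float in (0, 1): PCA keeps the smallest number of components whose cumulative explained variance ratio exceeds the threshold n_components. Here the first component alone explains about 98.3% of the variance, far above 0.006, so one component is kept.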

In [6]: pca = PCA(n_components=0.006)
   ...: pca.fit(X)
   ...: print(pca.explained_variance_)
   ...: print(pca.explained_variance_ratio_)
   ...: print(pca.n_components_)
   ...:
[ 3.78483785]
[ 0.98318212]
1
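The selection rule for a float threshold can be reproduced by hand from the cumulative sum of explained_variance_ratio_. The sketch below is one way to write it, assuming the same blob data; the helper `components_for` and the three test thresholds are hypothetical and chosen only for illustration.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA

X, _ = make_blobs(n_samples=10000, n_features=3,
                  centers=[[3, 3, 3], [0, 0, 0], [1, 1, 1], [2, 2, 2]],
                  cluster_std=[0.2, 0.1, 0.2, 0.2], random_state=9)

# Cumulative share of variance explained by the first k components.
full = PCA(svd_solver="full").fit(X)
cumulative = np.cumsum(full.explained_variance_ratio_)


def components_for(threshold):
    """Smallest k whose cumulative explained variance ratio exceeds threshold."""
    return int(np.searchsorted(cumulative, threshold, side="right") + 1)


by_hand = {}
by_sklearn = {}
for t in (0.006, 0.5, 0.99):
    by_hand[t] = components_for(t)
    by_sklearn[t] = PCA(n_components=t, svd_solver="full").fit(X).n_components_
    print(t, by_hand[t], by_sklearn[t])
```

A very small threshold such as 0.006 is always satisfied by the first component, which is why the run above keeps only one.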

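d. When n_components is 'mle', svd_solver must be 'full'; 'arpack' and 'randomized' raise a ValueError, and in the sklearn version used here the default 'auto' also fails with a TypeError, as the runs below show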

In [7]: pca = PCA(n_components='mle',svd_solver='full')
   ...: pca.fit(X)
   ...: print(pca.explained_variance_)
   ...: print(pca.explained_variance_ratio_)
   ...: print(pca.n_components_)
   ...:
[ 3.78483785]
[ 0.98318212]
1

In [8]: pca = PCA(n_components='mle',svd_solver='arpack')
   ...: pca.fit(X)
   ...:
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-8-b62bafac46ff> in <module>()
      1 pca = PCA(n_components='mle',svd_solver='arpack')
----> 2 pca.fit(X)

d:\softwore\python\lib\site-packages\sklearn\decomposition\pca.py in fit(self, X, y)
    305             Returns the instance itself.
    306         """
--> 307         self._fit(X)
    308         return self
    309

d:\softwore\python\lib\site-packages\sklearn\decomposition\pca.py in _fit(self, X)
    368             return self._fit_full(X, n_components)
    369         elif svd_solver in ['arpack', 'randomized']:
--> 370             return self._fit_truncated(X, n_components, svd_solver)
    371
    372     def _fit_full(self, X, n_components):

d:\softwore\python\lib\site-packages\sklearn\decomposition\pca.py in _fit_truncated(self, X, n_components, svd_solver)
    433             raise ValueError("n_components=%r cannot be a string "
    434                              "with svd_solver='%s'"
--> 435                              % (n_components, svd_solver))
    436         elif not 1 <= n_components <= n_features:
    437             raise ValueError("n_components=%r must be between 1 and "

ValueError: n_components='mle' cannot be a string with svd_solver='arpack'

In [9]: pca = PCA(n_components='mle',svd_solver='randomized')
   ...: pca.fit(X)
   ...:
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-9-1f9c5b9ac3af> in <module>()
      1 pca = PCA(n_components='mle',svd_solver='randomized')
----> 2 pca.fit(X)

d:\softwore\python\lib\site-packages\sklearn\decomposition\pca.py in fit(self, X, y)
    305             Returns the instance itself.
    306         """
--> 307         self._fit(X)
    308         return self
    309

d:\softwore\python\lib\site-packages\sklearn\decomposition\pca.py in _fit(self, X)
    368             return self._fit_full(X, n_components)
    369         elif svd_solver in ['arpack', 'randomized']:
--> 370             return self._fit_truncated(X, n_components, svd_solver)
    371
    372     def _fit_full(self, X, n_components):

d:\softwore\python\lib\site-packages\sklearn\decomposition\pca.py in _fit_truncated(self, X, n_components, svd_solver)
    433             raise ValueError("n_components=%r cannot be a string "
    434                              "with svd_solver='%s'"
--> 435                              % (n_components, svd_solver))
    436         elif not 1 <= n_components <= n_features:
    437             raise ValueError("n_components=%r must be between 1 and "

ValueError: n_components='mle' cannot be a string with svd_solver='randomized'

In [10]: pca = PCA(n_components='mle')
    ...: pca.fit(X)
    ...:
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-10-92060cf30409> in <module>()
      1 pca = PCA(n_components='mle')
----> 2 pca.fit(X)


d:\softwore\python\lib\site-packages\sklearn\decomposition\pca.py in _fit(self, X)
    358             if max(X.shape) <= 500:
    359                 svd_solver = 'full'
--> 360             elif n_components >= 1 and n_components < .8 * min(X.shape):
    361                 svd_solver = 'randomized'
    362             # This is also the case of n_components in (0,1)

TypeError: unorderable types: str() >= int()
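The runs above can be condensed into a single loop that tries each explicit solver with n_components='mle' and records whether the fit succeeds. This is only a sketch: the `results` dictionary is a name invented here, the data is regenerated so the block runs on its own, and the exact error message may differ between sklearn versions. The default 'auto' is left out on purpose, because its behaviour with 'mle' depends on the installed version.

```python
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA

X, _ = make_blobs(n_samples=10000, n_features=3,
                  centers=[[3, 3, 3], [0, 0, 0], [1, 1, 1], [2, 2, 2]],
                  cluster_std=[0.2, 0.1, 0.2, 0.2], random_state=9)

results = {}
for solver in ("full", "arpack", "randomized"):
    try:
        pca = PCA(n_components="mle", svd_solver=solver).fit(X)
        # MLE picked this many components.
        results[solver] = int(pca.n_components_)
    except ValueError as exc:
        # Truncated solvers cannot take a string n_components.
        results[solver] = "ValueError"
        print(solver, "->", exc)

print(results)
```

Passing svd_solver='full' explicitly whenever n_components='mle' is used keeps the code independent of how a given release resolves 'auto'.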