Learning the sklearn matrix decomposition library

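The sklearn.decomposition module provides matrix decomposition algorithms, including PCA, NMF, and ICA. Most of these algorithms can be regarded as dimensionality-reduction techniques.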

① Principal component analysis: sklearn.decomposition.PCA(n_components=None, copy=True, whiten=False, svd_solver='auto', tol=0.0, iterated_power='auto', random_state=None)

Main parameters:

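n_components: the number of principal components (features) to keep. It may be an integer, a float, None, or a string. If n_components is None, all components are kept. If it is an integer, that many components are kept. If it is a float between 0 and 1, it is the minimum fraction of the total variance that the kept components must explain. If n_components='mle' and svd_solver='full', the number of components is chosen by maximum likelihood estimation (MLE).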

whiten: whether to rescale the kept components so that each one has unit variance.

svd_solver: the SVD method to use; one of 'auto', 'full', 'arpack', 'randomized'.
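The effect of whiten is easiest to see on the variances of the projected data. The sketch below is only an illustration: the toy array X_demo and its feature scales are made up for this example and are not part of the original post.

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical toy data: 500 samples, 3 features with very different scales.
rng = np.random.RandomState(0)
X_demo = rng.normal(size=(500, 3)) * np.array([5.0, 1.0, 0.2])

# Project onto the first two components, without and with whitening.
plain = PCA(n_components=2, whiten=False).fit_transform(X_demo)
white = PCA(n_components=2, whiten=True).fit_transform(X_demo)

# Sample variance (ddof=1) of each projected column.
plain_var = plain.var(axis=0, ddof=1)
white_var = white.var(axis=0, ddof=1)

print(plain_var)  # variances of the unwhitened components, in decreasing order
print(white_var)  # close to 1 for every component after whitening
```

Without whitening, the projected columns keep the variances of the principal components. With whitening, each column is divided by the square root of its explained variance, so every column ends up with variance close to 1.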

Building a simple example

In [1]: import numpy as np
   ...: import matplotlib.pyplot as plt
   ...: from sklearn.datasets import make_blobs  # samples_generator was removed in newer scikit-learn
   ...: # 10000 three-dimensional samples drawn around four cluster centers
   ...: X, y = make_blobs(n_samples=10000, n_features=3,
   ...:                   centers=[[3, 3, 3], [0, 0, 0], [1, 1, 1], [2, 2, 2]],
   ...:                   cluster_std=[0.2, 0.1, 0.2, 0.2], random_state=9)
   ...: fig = plt.figure()
   ...: ax = fig.add_subplot(projection='3d')
   ...: ax.view_init(elev=30, azim=20)
   ...: ax.scatter(X[:, 0], X[:, 1], X[:, 2], marker='o')  # use the 3D axes; plt.scatter treats the third argument as marker size
   ...: plt.show()
   ...:



Fitting PCA on the data:

a. n_components=None: keep all components

In [2]: from sklearn.decomposition import PCA
   ...: pca = PCA()
   ...: pca.fit(X)
   ...: print(pca.n_components_)
   ...:
3
After fitting, inspect the variance and the variance ratio of the three components

In [3]: pca.explained_variance_
Out[3]: array([ 3.78483785,  0.03272285,  0.03201892])

In [4]: pca.explained_variance_ratio_
Out[4]: array([ 0.98318212,  0.00850037,  0.00831751])
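To make these two attributes less opaque, the sketch below recomputes them by hand from the sample covariance matrix. It regenerates the same kind of data with make_blobs so that it runs on its own. The exact numbers may differ slightly across scikit-learn versions, so no values are hard-coded.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA

X, y = make_blobs(n_samples=10000, n_features=3,
                  centers=[[3, 3, 3], [0, 0, 0], [1, 1, 1], [2, 2, 2]],
                  cluster_std=[0.2, 0.1, 0.2, 0.2], random_state=9)
pca = PCA().fit(X)

# Eigenvalues of the sample covariance matrix, sorted in descending order.
eigvals = np.sort(np.linalg.eigvalsh(np.cov(X, rowvar=False)))[::-1]

# Each ratio is one eigenvalue divided by the total variance.
ratios = eigvals / eigvals.sum()

same_variance = np.allclose(eigvals, pca.explained_variance_)
same_ratio = np.allclose(ratios, pca.explained_variance_ratio_)
print(same_variance, same_ratio)
```

In other words, explained_variance_ holds the eigenvalues of the covariance matrix of X, and explained_variance_ratio_ is each eigenvalue's share of their sum.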
b. n_components is an integer M: if M is smaller than the number of features in X, the M components with the largest variance are kept

In [5]: from sklearn.decomposition import PCA
   ...: pca = PCA(n_components=2)  # keep 2 components
   ...: pca.fit(X)
   ...: print(pca.explained_variance_)
   ...: print(pca.explained_variance_ratio_)
   ...:
[ 3.78483785  0.03272285]
[ 0.98318212  0.00850037]
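Keeping two components means the data can be projected to two dimensions and mapped back to three. The sketch below is an illustration added here, not part of the original session. It passes svd_solver='full' explicitly (an assumption made so that the comparison is exact rather than approximate) and checks that the reconstruction error matches the variance of the discarded third component.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA

X, y = make_blobs(n_samples=10000, n_features=3,
                  centers=[[3, 3, 3], [0, 0, 0], [1, 1, 1], [2, 2, 2]],
                  cluster_std=[0.2, 0.1, 0.2, 0.2], random_state=9)

pca = PCA(n_components=2, svd_solver='full').fit(X)
X_2d = pca.transform(X)               # shape (10000, 2)
X_back = pca.inverse_transform(X_2d)  # shape (10000, 3)

# Squared reconstruction error, normalized like a sample variance (n - 1).
lost = np.sum((X - X_back) ** 2) / (X.shape[0] - 1)

# Smallest eigenvalue of the covariance matrix: the discarded component.
discarded = np.linalg.eigvalsh(np.cov(X, rowvar=False))[0]
print(lost, discarded)
```

The two printed numbers should agree, which is one way to see that dropping a low-variance component loses very little information on this data set.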
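c. n_components is a float between 0 and 1: keep the smallest number of components whose cumulative explained-variance ratio exceeds the threshold n_components. In the run below the first component alone explains about 98.3% of the variance, which already exceeds 0.006, so a single component is kept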

In [6]: pca = PCA(n_components=0.006)
   ...: pca.fit(X)
   ...: print(pca.explained_variance_)
   ...: print(pca.explained_variance_ratio_)
   ...: print(pca.n_components_)
   ...:
[ 3.78483785]
[ 0.98318212]
1
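The selection rule for a float threshold can be reproduced from the cumulative explained-variance ratio. In the sketch below, components_needed is a hypothetical helper written for this illustration, and the extra thresholds 0.95 and 0.99 are assumptions added for comparison; neither comes from the original post.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA

X, y = make_blobs(n_samples=10000, n_features=3,
                  centers=[[3, 3, 3], [0, 0, 0], [1, 1, 1], [2, 2, 2]],
                  cluster_std=[0.2, 0.1, 0.2, 0.2], random_state=9)

# Cumulative share of variance explained by the first k components.
cumulative = np.cumsum(PCA().fit(X).explained_variance_ratio_)

def components_needed(threshold):
    """Hypothetical helper: smallest k whose cumulative ratio exceeds threshold."""
    return int(np.searchsorted(cumulative, threshold, side='right') + 1)

chosen = {}
for threshold in (0.006, 0.95, 0.99):
    fitted = PCA(n_components=threshold).fit(X)
    chosen[threshold] = (components_needed(threshold), int(fitted.n_components_))
    print(threshold, chosen[threshold])
```

Each printed pair should contain the same number twice: the count computed by hand and the n_components_ that PCA selected.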
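d. When n_components is 'mle', svd_solver must be 'full'; any other setting raises an error (in the scikit-learn version used here, even the default 'auto' fails, as In [10] shows)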

In [7]: pca = PCA(n_components='mle',svd_solver='full')
   ...: pca.fit(X)
   ...: print(pca.explained_variance_)
   ...: print(pca.explained_variance_ratio_)
   ...: print(pca.n_components_)
   ...:
[ 3.78483785]
[ 0.98318212]
1

In [8]: pca = PCA(n_components='mle',svd_solver='arpack')
   ...: pca.fit(X)
   ...:
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-8-b62bafac46ff> in <module>()
      1 pca = PCA(n_components='mle',svd_solver='arpack')
----> 2 pca.fit(X)

d:\softwore\python\lib\site-packages\sklearn\decomposition\pca.py in fit(self, X, y)
    305             Returns the instance itself.
    306         """
--> 307         self._fit(X)
    308         return self
    309

d:\softwore\python\lib\site-packages\sklearn\decomposition\pca.py in _fit(self, X)
    368             return self._fit_full(X, n_components)
    369         elif svd_solver in ['arpack', 'randomized']:
--> 370             return self._fit_truncated(X, n_components, svd_solver)
    371
    372     def _fit_full(self, X, n_components):

d:\softwore\python\lib\site-packages\sklearn\decomposition\pca.py in _fit_truncated(self, X, n_components, svd_solver)
    433             raise ValueError("n_components=%r cannot be a string "
    434                              "with svd_solver='%s'"
--> 435                              % (n_components, svd_solver))
    436         elif not 1 <= n_components <= n_features:
    437             raise ValueError("n_components=%r must be between 1 and "

ValueError: n_components='mle' cannot be a string with svd_solver='arpack'

In [9]: pca = PCA(n_components='mle',svd_solver='randomized')
   ...: pca.fit(X)
   ...:
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-9-1f9c5b9ac3af> in <module>()
      1 pca = PCA(n_components='mle',svd_solver='randomized')
----> 2 pca.fit(X)

d:\softwore\python\lib\site-packages\sklearn\decomposition\pca.py in fit(self, X, y)
    305             Returns the instance itself.
    306         """
--> 307         self._fit(X)
    308         return self
    309

d:\softwore\python\lib\site-packages\sklearn\decomposition\pca.py in _fit(self, X)
    368             return self._fit_full(X, n_components)
    369         elif svd_solver in ['arpack', 'randomized']:
--> 370             return self._fit_truncated(X, n_components, svd_solver)
    371
    372     def _fit_full(self, X, n_components):

d:\softwore\python\lib\site-packages\sklearn\decomposition\pca.py in _fit_truncated(self, X, n_components, svd_solver)
    433             raise ValueError("n_components=%r cannot be a string "
    434                              "with svd_solver='%s'"
--> 435                              % (n_components, svd_solver))
    436         elif not 1 <= n_components <= n_features:
    437             raise ValueError("n_components=%r must be between 1 and "

ValueError: n_components='mle' cannot be a string with svd_solver='randomized'

In [10]: pca = PCA(n_components='mle')
    ...: pca.fit(X)
    ...:
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-10-92060cf30409> in <module>()
      1 pca = PCA(n_components='mle')
----> 2 pca.fit(X)


d:\softwore\python\lib\site-packages\sklearn\decomposition\pca.py in _fit(self, X)
    358             if max(X.shape) <= 500:
    359                 svd_solver = 'full'
--> 360             elif n_components >= 1 and n_components < .8 * min(X.shape):
    361                 svd_solver = 'randomized'
    362             # This is also the case of n_components in (0,1)

TypeError: unorderable types: str() >= int()
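The solver check above can be condensed into one loop that records, for each explicit svd_solver, either the chosen number of components or the type of the exception raised. This is a sketch for comparison only. The outcome for each solver depends on the installed scikit-learn version, so no expected output is shown.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA

X, y = make_blobs(n_samples=10000, n_features=3,
                  centers=[[3, 3, 3], [0, 0, 0], [1, 1, 1], [2, 2, 2]],
                  cluster_std=[0.2, 0.1, 0.2, 0.2], random_state=9)

results = {}
for solver in ('full', 'arpack', 'randomized'):
    try:
        fitted = PCA(n_components='mle', svd_solver=solver).fit(X)
        results[solver] = int(fitted.n_components_)
    except (ValueError, TypeError) as exc:
        # Record the exception type instead of aborting the loop.
        results[solver] = type(exc).__name__

print(results)
```

Passing svd_solver='full' explicitly whenever n_components='mle' avoids depending on how a given version resolves the 'auto' default.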













