PCA in sklearn

Latest recommended article published 2023-01-31 00:21:01

thereisnospoon.

Category: Sklearn

Article link: https://blog.csdn.net/weixin_45580742/article/details/104474194

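PCA (principal component analysis) is an important tool for dimensionality reduction. PCA in `sklearn` exposes attributes such as components_, explained_variance_ratio_ and explained_variance_ for inspecting the result of a reduction. The n_components parameter can be chosen with a learning curve, by a target share of information, by the number of features required, or by maximum likelihood estimation. inverse_transform cannot fully restore the original data, but it can filter out noise in the high-dimensional space. The svd_solver parameter controls how the matrix decomposition is carried out and offers four modes. The post also lists PCA's parameters, attributes and methods.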

Summary generated by CSDN using AI

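Attribute components_

Returns the new feature space after dimensionality reduction: one row per retained principal axis, one column per original feature.

print(PCA(2).fit(x).components_)  # the new feature space (principal axes) found by SVD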
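To make the meaning of components_ concrete, here is a minimal sketch. The post never says which dataset `x` is; the example assumes it is the iris feature matrix from `load_iris`, which is consistent with the numbers printed later in the post. It checks the shape of components_, that its rows are orthonormal, and that transform() is a centering followed by a projection onto those rows (with the default whiten=False).

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

# Assumption: x is the iris feature matrix, shape (150, 4).
x = load_iris().data

pca = PCA(n_components=2).fit(x)

# One row per retained component, one column per original feature.
print(pca.components_.shape)  # (2, 4)

# The rows are orthonormal: components_ @ components_.T is the identity.
gram = pca.components_ @ pca.components_.T
print(np.allclose(gram, np.eye(2)))  # True

# With the default whiten=False, transform() centers the data and
# projects it onto the rows of components_.
manual = (x - pca.mean_) @ pca.components_.T
print(np.allclose(manual, pca.transform(x)))  # True
```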

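Attribute explained_variance_ratio_

Shows, for each new feature after reduction, the share of the original data's total information (variance) that it carries. This is also called the explained variance ratio.

print(PCA(2).fit(x).explained_variance_ratio_)
# array([0.92461872, 0.05306648])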
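The ratio can be reproduced by hand, which shows what "share of total information" means here. This is a sketch under the assumption that `x` is the iris feature matrix (the post does not name the dataset): each ratio is the component's explained variance divided by the total sample variance of the original features, and the ratios of a full, unreduced PCA sum to 1.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

# Assumption: x is the iris feature matrix, shape (150, 4).
x = load_iris().data

pca = PCA(n_components=2).fit(x)

# Total variance of the original data: sum of the per-feature sample
# variances (n - 1 denominator, matching scikit-learn's convention).
total_variance = x.var(axis=0, ddof=1).sum()

ratio_by_hand = pca.explained_variance_ / total_variance
print(np.allclose(ratio_by_hand, pca.explained_variance_ratio_))  # True

# Share of the total variance kept by the two components together.
kept = pca.explained_variance_ratio_.sum()
print(f"{kept:.4f}")

# Keeping every component accounts for all of the variance.
full_ratio = PCA().fit(x).explained_variance_ratio_
print(np.isclose(full_ratio.sum(), 1.0))  # True
```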

Attribute explained_variance_

Shows the amount of information (the explained variance) carried by each new feature after reduction.

print(PCA(2).fit(x).explained_variance_)
# array([4.22824171, 0.24267075])
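The following sketch illustrates where these values come from, again assuming `x` is the iris feature matrix (an assumption, not something the post states). explained_variance_ matches the sample variance of each column of the projected data, and also the largest eigenvalues of the covariance matrix of `x`.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

# Assumption: x is the iris feature matrix, shape (150, 4).
x = load_iris().data

pca = PCA(n_components=2).fit(x)

# Variance of each projected column, using the n - 1 denominator.
projected = pca.transform(x)
column_variance = projected.var(axis=0, ddof=1)
print(np.allclose(column_variance, pca.explained_variance_))  # True

# The same numbers are the two largest eigenvalues of the covariance matrix.
eigenvalues = np.linalg.eigvalsh(np.cov(x, rowvar=False))  # ascending order
top_two = eigenvalues[::-1][:2]
print(np.allclose(top_two, pca.explained_variance_))  # True
```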

Key parameter n_components

1. Choosing the hyperparameter with a learning curve

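from sklearn.model_selection import cross_val_score
from sklearn.decomposition import PCA
from matplotlib import pyplot as plt
import numpy as np

# Choosing the n_components parameter

# 1. Choose the hyperparameter with a learning curve
pca_line = PCA().fit(x)  # n_components omitted: all components are kept, so the data is rotated into the new space without reduction
# cumsum: cumulative sum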
plt.plot