Using PCA to compute the contribution/weight of the original features (indicators) to the principal components

Latest recommended article published 2023-08-23 14:52:09

Mepleleo


Category: Deep Learning. Tags: sklearn, python, machine learning, PCA dimensionality reduction

Article link: https://blog.csdn.net/lzw790222124/article/details/120262798


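1. Using PCA to back out the contribution or weight of the original features to the principal components

This uses the PCA class from Python's sklearn package.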

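# -*- coding: utf-8 -*-
import os
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn import preprocessing
np.set_printoptions(suppress=True)

def normalization(data):
    # Min-max scaling using the global minimum and maximum of the whole array
    _range = np.max(data) - np.min(data)
    return (data - np.min(data)) / _range


# Alternative input: generate random data instead of reading a file
# np.random.seed(3)
# data = np.random.randint(10,size=(100,4))/10
# Load the data from an Excel sheet
csv_file = r'D:\research\tmp.xlsx'
csv_data = pd.read_excel(csv_file,sheet_name='Sheet1')
data = csv_data.to_numpy()
data = data[:, :4]  # keep the first four columns as the features
# data = normalization(data)
print(data.shape)
# Fit the PCA model
pca = PCA(n_components='mle')  # 'mle' lets sklearn choose the number of components automatically (Minka's MLE); a float in (0, 1) would instead select by explained variance percentage
# pca = PCA(n_components=3)  # n_components: number of components to extract
# pca = PCA(n_components=None)  # None returns all principal components
pca.fit(data)
print('eigenvalues:',pca.explained_variance_)  # explained variance, i.e. the eigenvalues
# print(pca.explained_variance_ratio_)  # explained variance ratio
print('pca:',pca.components_, pca.components_.shape)  # component matrix; each row is already a unit-length eigenvector, so no further division by sqrt(pca.explained_variance_) is needed
# k1_spss = pca.components_ / np.sqrt(pca.explained_variance_.reshape(-1, 1))  # component score coefficient matrix (only needed when starting from SPSS-style loadings)
# In pca.components_ the rows are components and the columns are features; transpose here to simplify the later calculation
k1_spss = pca.components_.T
# print(k1_spss)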
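
The relationship between `pca.components_` and an SPSS-style component (loading) matrix can be checked numerically. The following is a minimal sketch on synthetic random data (an assumption, standing in for the spreadsheet above); the names `demo_data`, `demo_pca` and `loadings` are illustrative. It checks that each row of `components_` has unit length, that `explained_variance_` matches the eigenvalues of the sample covariance matrix, and that scaling each axis by the square root of its eigenvalue gives loadings that reproduce the covariance matrix.

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic stand-in for the spreadsheet data (assumption): 100 samples, 4 features
rng = np.random.default_rng(3)
demo_data = rng.integers(0, 10, size=(100, 4)) / 10

# Keep all components so the loadings can reproduce the full covariance matrix
demo_pca = PCA(n_components=None).fit(demo_data)

# Each row of components_ is a principal axis; its Euclidean norm should be 1
row_norms = np.linalg.norm(demo_pca.components_, axis=1)

# Eigenvalues of the sample covariance matrix (ddof=1), sorted in descending order
sample_cov = np.cov(demo_data, rowvar=False)
cov_eigvals = np.sort(np.linalg.eigvalsh(sample_cov))[::-1]

# Loadings: eigenvector scaled by sqrt(eigenvalue), one column per component
loadings = demo_pca.components_.T * np.sqrt(demo_pca.explained_variance_)

print(row_norms)
print(np.allclose(cov_eigvals, demo_pca.explained_variance_))
print(np.allclose(loadings @ loadings.T, sample_cov))
```

As far as I know, SPSS analyses the correlation matrix by default, so its numbers would only be comparable after standardizing the features first.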

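One open question: SPSS examples usually keep two principal components, but PCA can extract up to as many components as there are features. So how should the number of components be chosen for this calculation: use all of them, let it be selected automatically, or specify a number? This blog post http://blog.sina.com.cn/s/blog_a032adb90101k47u.html says to keep components with eigenvalue > 1 and a cumulative explained variance ratio ≥ 80%, but gives no source. (The eigenvalue > 1 rule is commonly known as the Kaiser criterion and is normally applied to standardized data.)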
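
Both rules of thumb can be tried directly in sklearn. The following is a minimal sketch, assuming synthetic data and standardized features (standardizing is my assumption, since the eigenvalue > 1 rule is normally applied to the correlation matrix); the 0.8 threshold is simply the one quoted from the blog post, not a recommendation.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Synthetic data (assumption): 4 observed features driven by 2 latent factors plus noise
latent = rng.normal(size=(200, 2))
mixing = np.array([[1.0, 0.8, 0.1, 0.0],
                   [0.0, 0.2, 0.9, 1.0]])
demo_data = latent @ mixing + 0.3 * rng.normal(size=(200, 4))

# Standardize so every feature has unit variance, then keep all components
scaled = StandardScaler().fit_transform(demo_data)
full_pca = PCA(n_components=None).fit(scaled)

# Rule 1: count the components whose eigenvalue exceeds 1
n_by_eigenvalue = int(np.sum(full_pca.explained_variance_ > 1))

# Rule 2: smallest number of components whose cumulative variance ratio exceeds 80%
cumulative = np.cumsum(full_pca.explained_variance_ratio_)
n_by_variance = int(np.searchsorted(cumulative, 0.8, side="right") + 1)

# sklearn applies rule 2 itself when n_components is a float in (0, 1)
auto_pca = PCA(n_components=0.8, svd_solver="full").fit(scaled)

print(n_by_eigenvalue, n_by_variance, auto_pca.n_components_)
```

The two rules need not agree on real data, so it seems reasonable to report which one was used alongside the resulting weights.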

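# Determine the weights
# Coefficient of each indicator in the linear combination of the principal components, weighted by explained variance ratio
weight = (np.dot(k1_spss, pca.explained_variance_ratio_)) / np.sum(pca.explained_variance_ratio_)
print('weight:',weight)
# Normalize so that the weights sum to 1
weighted_weight = weight/np.sum(weight)
print('weighted_weight:', weighted_weight)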

# 
# https://blog.csdn.net/weixin_39782545/article/details/112216185?utm_medium=distribute.pc_relevant.none-task-blog-2~default~baidujs_baidulandingword~default-0.no_search_link&spm=1001.2101.3001.4242
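# Going by the two referenced articles, isn't determining the weights just a matrix dot product? The code from the blogger below seems to have a problem
# # Original blogger's code https://blog.csdn.net/weixin_43166884/article/details/109363740
# # Determine the weights
# # Coefficient of each indicator in the linear combinations of the different principal components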
# j = 0
# Weights = []
# for j in range(len(k1_spss)):
#     for i in range(len(pca.explained_variance_)):
#         Weights_coefficient = np.sum(100 * (pca.explained_variance_ratio_[i]) * (k1_spss[i][j])) / np.sum(
#             pca.explained_variance_ratio_)
#     j = j + 1
#     Weights.append(np.float(Weights_coefficient))
# print('Weights',Weights)
# # weighted_weight = np.round(Weights/np.sum(Weights), decimals=4)
# # print(weighted_weight)

# Weights=pd.DataFrame(Weights)
# Weights1 = preprocessing.MinMaxScaler().fit(Weights)
# Weights2 = Weights1.transform(Weights)
# print('Weights2',Weights2)
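
That the weight calculation is a matrix-vector product can be checked on synthetic data (an assumption). The sketch below compares the vectorized form with an explicit double loop that accumulates over the components; the helper names `weights_vectorized` and `weights_loop` are hypothetical. This is not the original blogger's code, only one way such a loop could be written so that it agrees with the dot product.

```python
import numpy as np
from sklearn.decomposition import PCA

def weights_vectorized(pca):
    """Variance-ratio-weighted sum of each feature's coefficients across components."""
    ratio = pca.explained_variance_ratio_
    return pca.components_.T @ ratio / ratio.sum()

def weights_loop(pca):
    """Same quantity with explicit loops, accumulating over the components."""
    ratio = pca.explained_variance_ratio_
    n_components, n_features = pca.components_.shape
    weights = []
    for j in range(n_features):          # one weight per original feature
        total = 0.0
        for i in range(n_components):    # add up the contribution of every component
            total += ratio[i] * pca.components_[i, j]
        weights.append(total / ratio.sum())
    return np.array(weights)

# Synthetic stand-in for the spreadsheet data (assumption)
rng = np.random.default_rng(3)
demo_data = rng.integers(0, 10, size=(100, 4)) / 10
demo_pca = PCA(n_components=3).fit(demo_data)

w_vec = weights_vectorized(demo_pca)
w_loop = weights_loop(demo_pca)
print(np.allclose(w_vec, w_loop))
```

Note that the sign of each principal axis is mathematically arbitrary, so individual weights can come out negative. Some write-ups take the absolute value of the coefficients before weighting; that is a variant and not what the code above does.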

If anything here is wrong, corrections are welcome.
