PCA dimensionality reduction of the wine dataset in Python, with and without standardization (no framework)


D___


Category: Pattern Recognition. Tags: pattern recognition, PCA dimensionality reduction

Article link: https://blog.csdn.net/jirong5206/article/details/106199644


PCA code (wine data)

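(Note: np.linalg.eig does not guarantee that the eigenvalues it returns are sorted from largest to smallest. Each eigenvalue does correspond to the eigenvector at the same index, and the eigenvectors are the columns of the returned matrix, not the rows!)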
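The note above can be illustrated with a minimal sketch. The helper name `sorted_eig` and the small diagonal matrix are assumptions made up for this illustration, not part of the original program; the sketch sorts the eigenvalues explicitly and reorders the eigenvector columns to match.

```python
import numpy as np

def sorted_eig(C):
    """Return eigenvalues in descending order with matching eigenvector columns.

    Hypothetical helper for illustration only.
    """
    values, vectors = np.linalg.eig(C)
    order = np.argsort(values)[::-1]      # indices of the eigenvalues, largest first
    return values[order], vectors[:, order]

# Made-up symmetric matrix whose eigenvalues are deliberately out of order.
C = np.diag([1.0, 5.0, 3.0])
values, vectors = sorted_eig(C)

print(values)          # eigenvalues, largest first
print(vectors[:, 0])   # eigenvector of the largest eigenvalue: a column, not a row
```

Indexing `vectors[:, k]` (a column) rather than `vectors[k]` (a row) is what keeps each eigenvector paired with `values[k]`.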

PCA without standardizing the data

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
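'''***************************************************************
 * @Fun_Name    : def getSample(fileName)
 * @Function    : Read the samples from a file
 * @Parameter   : fileName, path to the data file
 * @Return      : labels, features
 * @Creed       : Talk is cheap , show me the code
 ***********************xieqinyu creates in 16:08 2020/5/17***'''
def getSample(fileName):
    # header=None: the file has no header row. By default pandas treats the
    # first row as a header, which would drop it from the data.
    dataSet = pd.read_csv(fileName, header=None).values   # DataFrame -> NumPy array
    labels = dataSet[:, 0]        # column 0 holds the class label
    feature = dataSet[:, 1:14]    # columns 1 to 13 hold the 13 features
    return labels, feature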

'''***************************************************************
 * @Fun_Name    : def reduceMean(feature):
 * @Function    : Center the features (subtract the mean)
 * @Parameter   : feature matrix
 * @Return      : centered feature matrix
 * @Creed       : Talk is cheap , show me the code
 ***********************xieqinyu creates in 19:54 2020/5/17***'''
def reduceMean(feature):
    featureMean = np.mean(feature, axis=0)               # per-feature (column) mean
    featureDeal = feature - featureMean                  # features with the mean removed
    return featureDeal
'''***************************************************************
 * @Fun_Name    : def getC(featureDeal):
 * @Function    : Compute the covariance matrix C
 * @Parameter   : centered feature matrix
 * @Return      : C
 * @Creed       : Talk is cheap , show me the code
 ***********************xieqinyu creates in 20:03 2020/5/17***'''
def getC(featureDeal):
    m, n = np.shape(featureDeal)                          # m samples, n features
    featureDeal = np.asarray(featureDeal, dtype=float)
    C = (featureDeal.T @ featureDeal) / m                 # n x n covariance matrix (divides by m, not m - 1)
    return C
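'''***************************************************************
 * @Fun_Name    : def getFeatureValuesVector(C):
 * @Function    : Compute the eigenvalues and eigenvectors of C
 * @Parameter   : C, and n (the number of dimensions to keep)
 * @Return      : eigenvectors of the n largest eigenvalues, as columns
 * @Creed       : Talk is cheap , show me the code
 ***********************xieqinyu creates in 20:14 2020/5/17***'''
def getFeatureValuesVector(C,n):
    featureValues,featureVector = np.linalg.eig(C)
    # np.linalg.eig does not guarantee any ordering of the eigenvalues, so sort
    # them explicitly and pick the columns that belong to the n largest ones.
    order = np.argsort(featureValues)[::-1]
    return featureVector[:, order[:n]]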

label,feature = getSample('wine.txt')
featureDeal = reduceMean(feature)
C = getC(featureDeal)
featureVector = getFeatureValuesVector(C,2)
# Project the processed samples onto the top two eigenvectors
Coord = np.mat(featureDeal)*np.mat(featureVector)
Coord = np.asarray(Coord)                                # plain 2-D array, so slices are 1-D
# The slices assume the rows are grouped by class: 59, 71 and 48 samples
plt.scatter(Coord[0:59,0],Coord[0:59,1],color = "b")
plt.scatter(Coord[59:130,0],Coord[59:130,1],color = "r")
plt.scatter(Coord[130:178,0],Coord[130:178,1],color = "g")
plt.show()
# print(Coord)
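To gauge how much of the total variance the two retained components capture, the ratio of the top eigenvalues to the sum of all eigenvalues can be computed. The following is a minimal sketch that repeats the same steps as the program above (center, covariance, eigendecomposition, projection) in one function. The function name `pca_project` and the synthetic data are assumptions for illustration; the wine file is not used here.

```python
import numpy as np

def pca_project(X, n_components):
    """Center X, eigendecompose its covariance, project onto the top components.

    Hypothetical helper for illustration only.
    """
    X_centered = X - X.mean(axis=0)
    C = X_centered.T @ X_centered / X_centered.shape[0]
    values, vectors = np.linalg.eig(C)
    # C is symmetric, so its eigenpairs are real; np.real drops any zero imaginary part.
    values, vectors = np.real(values), np.real(vectors)
    order = np.argsort(values)[::-1]                 # largest eigenvalue first
    values, vectors = values[order], vectors[:, order]
    coords = X_centered @ vectors[:, :n_components]
    ratio = values[:n_components].sum() / values.sum()
    return coords, ratio

rng = np.random.default_rng(0)
# Synthetic stand-in for the data: 178 samples, 13 features on different scales.
X = rng.normal(size=(178, 13)) * np.arange(1, 14)
coords, ratio = pca_project(X, 2)

print(coords.shape)
print(f"variance retained by 2 components: {ratio:.3f}")
```

A ratio close to 1 would suggest that the two-dimensional plot loses little information; a small ratio would suggest the opposite.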

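Result
[Figure: scatter plot of the samples projected onto the first two principal components, one color per class]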

PCA after standardizing the data:

Why and how to standardize data:
https://www.cnblogs.com/fonttian/p/9162822.html

Replace reduceMean in the program above with this version:

def reduceMean(feature):
    # Standardize the data: zero mean and unit standard deviation per feature
    featureMean =  np.mean(feature,axis=0)
    featureStd  =  np.std(feature,axis= 0)
    featureDeal = (feature - featureMean)/featureStd
    return featureDeal
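One consequence of this replacement is that the matrix C computed from the standardized features is the correlation matrix of the original features, so every feature contributes on the same scale. The sketch below checks this on synthetic data; the random data and the variable names are assumptions for illustration, not the wine data.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic stand-in for the 13 features: columns on very different scales.
X = rng.normal(size=(178, 13)) * np.logspace(0, 3, 13)

Z = (X - X.mean(axis=0)) / X.std(axis=0)   # same formula as the standardizing reduceMean
C_std = Z.T @ Z / Z.shape[0]               # same formula as getC

# Compare with NumPy's correlation matrix of the raw data.
matches_corr = np.allclose(C_std, np.corrcoef(X, rowvar=False))
unit_diagonal = np.allclose(np.diag(C_std), 1.0)
print(matches_corr, unit_diagonal)
```

Without standardization, a feature with a large numeric range would dominate the covariance matrix and therefore the leading eigenvectors.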

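Result:
[Figure: scatter plot of the standardized samples projected onto the first two principal components, one color per class]
There is one theoretical point I am unsure about, and I hope someone passing by can explain it.

Coord = np.mat(featureDeal)*np.mat(featureVector)

This step projects the vectors into two-dimensional space. Why are the vectors being projected the standardized ones rather than the original ones?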
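One possible way to look at this question: the eigenvectors are computed from the covariance of the processed (standardized) data, so they describe directions in that processed space, and only the processed data is guaranteed to decorrelate when projected onto them. The sketch below is a numerical check on synthetic data, not a proof; the data and all variable names are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
# Synthetic correlated features on very different scales, with a non-zero mean.
A = rng.normal(size=(4, 4))
X = rng.normal(size=(500, 4)) @ A * np.array([1.0, 10.0, 100.0, 1000.0]) + 50.0

Z = (X - X.mean(axis=0)) / X.std(axis=0)        # standardized data
C = Z.T @ Z / Z.shape[0]                        # covariance of the standardized data
values, vectors = np.linalg.eig(C)
values, vectors = np.real(values), np.real(vectors)   # C is symmetric, eigenpairs are real
order = np.argsort(values)[::-1]
W = vectors[:, order[:2]]                       # top two eigenvectors as columns

proj_std = Z @ W    # projecting the data the eigenvectors were derived from
proj_raw = X @ W    # projecting the raw data onto the same vectors

cov_std = np.cov(proj_std, rowvar=False, bias=True)
cov_raw = np.cov(proj_raw, rowvar=False, bias=True)
print(np.round(cov_std, 6))   # diagonal, with the two largest eigenvalues on it
print(np.round(cov_raw, 2))   # generally not diagonal
```

When the data is only centered and not rescaled, projecting the raw data would merely shift every point by the same constant offset, so the picture would look the same; with standardization, the rescaling changes the geometry, which is why the standardized vectors are the ones projected.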
