Principal Component Estimation in Python

chenlei456

Tags: python, data analysis, regression

Article link: https://blog.csdn.net/chenlei456/article/details/123739956

What is PCA

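The main purpose of principal component analysis (PCA) is to explain most of the variation in the original data with fewer variables, by transforming many highly correlated variables into new variables that are mutually uncorrelated. Typically we select a few new variables, fewer than the number of original variables, that explain most of the variation in the data. These are the principal components, and they serve as composite indicators for interpreting the data. PCA is therefore, in practice, a dimensionality reduction method.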
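To make the idea concrete, here is a minimal NumPy sketch of PCA done by hand. The data are synthetic and purely illustrative (three variables built from one shared factor, which is an assumption of this example and not the article's data set). It shows that the component scores are uncorrelated and that the first component carries almost all of the variance.

```python
import numpy as np

rng = np.random.default_rng(0)  # synthetic data, purely illustrative
n = 200
z = rng.normal(size=n)
# three strongly correlated variables built from one shared factor
X = np.column_stack([z + 0.1 * rng.normal(size=n),
                     2 * z + 0.1 * rng.normal(size=n),
                     -z + 0.1 * rng.normal(size=n)])

Xc = X - X.mean(axis=0)                 # center each column
cov = np.cov(Xc, rowvar=False)          # sample covariance matrix
eigvals, eigvecs = np.linalg.eigh(cov)  # eigh: for symmetric matrices, ascending order
order = np.argsort(eigvals)[::-1]       # reorder so the largest eigenvalue comes first
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

scores = Xc @ eigvecs                   # principal component scores
score_cov = np.cov(scores, rowvar=False)
explained = eigvals / eigvals.sum()     # share of variance per component

print(np.round(score_cov, 6))           # off-diagonal entries are numerically zero
print(np.round(explained, 4))
```

The diagonal of `score_cov` reproduces the eigenvalues, which is why the eigenvalues are read as the variance explained by each component.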

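Data

[Image: the data table. Per the code below, columns 1-4 hold the four predictors and column 5 holds the response.]

Code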

import statsmodels.api as sm
from sklearn.decomposition import PCA
import pandas as pd
import numpy as np

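Inspect the correlation matrix

data = pd.read_excel("path_to_data")  # placeholder: replace with the path to your Excel file
x = data.iloc[:,1:5]  # columns 1-4: the four predictors
cor = x.corr()        # pairwise correlation coefficients
print(cor)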

[Image: printed correlation matrix]

cor = np.array(cor)
w,v = np.linalg.eig(cor)
print(w)  ###  eigenvalues of the correlation matrix
print(v)  ###  corresponding eigenvectors (one per column)

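[Image: printed eigenvalues and eigenvectors]
The last eigenvalue is close to zero, and the first three eigenvalues together account for 0.999594 of the total (the cumulative contribution rate).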
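The cumulative contribution rate is the running sum of the eigenvalues, sorted from largest to smallest, divided by their total. A small sketch of that calculation follows. The helper name `cumulative_contribution` and the 3x3 matrix `cor_demo` are illustrative assumptions, not the article's data.

```python
import numpy as np

def cumulative_contribution(eigenvalues):
    """Return the cumulative share of the total, largest eigenvalue first."""
    w = np.sort(np.real(np.asarray(eigenvalues)))[::-1]
    return np.cumsum(w) / w.sum()

# Illustrative 3x3 correlation matrix (NOT the article's data)
cor_demo = np.array([[1.0, 0.9, 0.8],
                     [0.9, 1.0, 0.85],
                     [0.8, 0.85, 1.0]])
w_demo, _ = np.linalg.eig(cor_demo)
rates = cumulative_contribution(w_demo)
print(rates)  # last entry is 1 by construction
```

Applied to the article's eigenvalues as `cumulative_contribution(w)`, the third entry should correspond to the cumulative rate quoted above.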

Fit a linear regression directly on the data to get the empirical regression equation

X = sm.add_constant(x)
y = data.iloc[:,5]
est = sm.OLS(y,X).fit()
print(est.summary())

[Image: OLS regression summary]

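For an explanation of the parameters in the summary, see my earlier post:
https://blog.csdn.net/chenlei456/article/details/123508900
So the regression equation we obtain is:
[Image: fitted regression equation]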

est.predict(X)

This uses the fitted equation to predict on the data.

Apply PCA to the data

x = data.iloc[:,1:5]
pca = PCA(n_components=4) ### keep all 4 components, so no dimensionality reduction
pca.fit(x) ### fit the PCA
x_train = pca.transform(x) ### transform the data into component scores
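One detail worth checking: scikit-learn's `PCA` centers the columns but does not rescale them, so it works on the covariance matrix, whereas the eigenvalues printed earlier come from the correlation matrix. If the variables are on different scales the two can disagree. The sketch below uses synthetic data (`x_demo` is an assumption standing in for `x`) to show that standardizing with `StandardScaler` first reproduces the correlation-matrix shares.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)          # synthetic stand-in for x
base = rng.normal(size=(50, 1))
# four correlated columns on very different scales
x_demo = np.hstack([base * s + rng.normal(scale=0.3, size=(50, 1))
                    for s in (1.0, 10.0, 100.0, 0.5)])

# Eigenvalue shares of the correlation matrix, largest first
w_demo = np.linalg.eigvalsh(np.corrcoef(x_demo, rowvar=False))[::-1]
corr_share = w_demo / w_demo.sum()

# PCA on the raw columns: based on the covariance matrix
raw_share = PCA(n_components=4).fit(x_demo).explained_variance_ratio_

# PCA on standardized columns: agrees with the correlation matrix
z_demo = StandardScaler().fit_transform(x_demo)
std_share = PCA(n_components=4).fit(z_demo).explained_variance_ratio_

print(corr_share, raw_share, std_share, sep="\n")
```

Whether to standardize is a modeling choice. If the predictors share a unit and a comparable scale, the unscaled version used in this post may be what you want.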

Fit the principal component regression equation on the transformed data

X_train = sm.add_constant(x_train)
y = data.iloc[:,5]
est2 = sm.OLS(y,X_train).fit()
print(est2.summary())

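[Image: principal component regression summary]
From this summary we can read off the principal component regression equation.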

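est2.predict(X_train)

This uses the fitted equation to predict on the data. Because est2 was fitted on the component scores, it must be given the transformed design matrix X_train rather than X.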
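The coefficients of a principal component regression refer to the component scores, not to the original variables. To express the equation in the original variables, multiply the component coefficients by the loading matrix and adjust the intercept for the centering. The following is a minimal sketch on synthetic data, using NumPy's SVD and `np.linalg.lstsq` in place of scikit-learn and statsmodels. All names (`x_demo`, `V_k`, `alpha`, `beta`) are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)               # synthetic data, illustrative only
n, k = 40, 3                                 # keep k of the 4 components
x_demo = rng.normal(size=(n, 4))
x_demo[:, 3] = x_demo[:, :3].sum(axis=1) + 0.01 * rng.normal(size=n)
y_demo = 2 + x_demo @ np.array([1.0, -1.0, 0.5, 0.25]) + rng.normal(size=n)

mean = x_demo.mean(axis=0)
_, _, vt = np.linalg.svd(x_demo - mean, full_matrices=False)
V_k = vt[:k].T                               # 4 x k loading matrix
scores = (x_demo - mean) @ V_k               # n x k component scores

# regress y on an intercept plus the k component scores
design = np.column_stack([np.ones(n), scores])
coef, *_ = np.linalg.lstsq(design, y_demo, rcond=None)
a0, alpha = coef[0], coef[1:]

beta = V_k @ alpha                           # coefficients on the original variables
intercept = a0 - mean @ beta                 # undo the centering

pred_scores = design @ coef
pred_original = intercept + x_demo @ beta
print(np.allclose(pred_scores, pred_original))  # both forms give the same predictions
```

With a fitted scikit-learn `PCA`, the analogous quantities would be `pca.components_[:k].T` for the loadings and `pca.mean_` for the centering vector.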

Differences

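With all four components kept, as in the code above, the principal component regression is only a reparameterization of the ordinary least squares fit, so the two equations give the same fitted values. The principal component estimator gains its advantage when the component belonging to the near-zero eigenvalue is dropped: its coefficient estimates can then have a smaller mean squared error and are therefore more stable. The p-values show that in the first equation none of the coefficients passes the significance test.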
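A quick numerical check of how the two fits relate, on synthetic, nearly collinear data (the data and the helper `ols_fit` are assumptions made for this sketch, and `np.linalg.lstsq` stands in for statsmodels). Regressing on all four component scores reproduces the ordinary least squares fitted values, while dropping the smallest component gives a different fit.

```python
import numpy as np

rng = np.random.default_rng(2)                 # synthetic, nearly collinear data
n = 30
a, b, c = rng.normal(size=(3, n))
x_demo = np.column_stack([a, b, c, a + b + c + 0.01 * rng.normal(size=n)])
y_demo = 1 + x_demo @ np.array([1.0, 2.0, -1.0, 0.5]) + rng.normal(size=n)

def ols_fit(design, y):
    """Least-squares coefficients and fitted values for a design matrix."""
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta, design @ beta

ones = np.ones((n, 1))
xc = x_demo - x_demo.mean(axis=0)
_, _, vt = np.linalg.svd(xc, full_matrices=False)  # rows of vt are the principal axes
scores = xc @ vt.T                                  # all 4 component scores

_, fit_ols = ols_fit(np.hstack([ones, x_demo]), y_demo)
_, fit_pc4 = ols_fit(np.hstack([ones, scores]), y_demo)
_, fit_pc3 = ols_fit(np.hstack([ones, scores[:, :3]]), y_demo)

print(np.allclose(fit_ols, fit_pc4))   # all components kept: same fit as OLS
print(np.allclose(fit_ols, fit_pc3))   # smallest component dropped: fit changes
```

To apply the same idea to the article's code, one would set `n_components=3` in `PCA` so that the direction with the near-zero eigenvalue is left out.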

Complete code

import statsmodels.api as sm
from sklearn.decomposition import PCA
import pandas as pd
import numpy as np

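data = pd.read_excel("path_to_data")  # placeholder: replace with the path to your Excel file
x = data.iloc[:,1:5]  # columns 1-4: the four predictors
y = data.iloc[:,5]    # column 5: the response

# eigen-decomposition of the correlation matrix
cor = x.corr()
cor = np.array(cor)
w,v = np.linalg.eig(cor)

# ordinary least squares on the original predictors
X = sm.add_constant(x)
est = sm.OLS(y,X).fit()

# regression on the principal component scores (all 4 components kept)
pca = PCA(n_components=4)
pca.fit(x)
x_train = pca.transform(x)
X_train = sm.add_constant(x_train)
est2 = sm.OLS(y,X_train).fit()

print(w)  ###  eigenvalues of the correlation matrix
print(v)  ###  corresponding eigenvectors (one per column)
print(est.summary())
print(est2.summary())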