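1. Randomly generate a dataset of 3 rows by 40 columns, where each column is one sample: the first 20 columns belong to class 1 and the last 20 to class 2, and each sample has 3 features;
2. Compute the mean of each row (one mean per feature);
3. Compute the covariance matrix, which gives a 3x3 matrix;
4. Compute the eigenvectors and eigenvalues of the covariance matrix;
5. Sort the eigenvalues in descending order, keeping each eigenvector paired with its eigenvalue;
6. Select the first and second principal components to form a new 3x2 matrix;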
7. Use the resulting 3x2 matrix to transform (project) the original dataset into the new two-dimensional subspace.
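Before the full listing, the seven steps above can be condensed into a short NumPy sketch. This is only an illustration under stated assumptions, not the code of the referenced article: the helper name `pca_2d` and the variables `X`, `W`, and `Y` are hypothetical, class 2 is assumed to be drawn with mean [1,1,1] and identity covariance, and the samples are centered before projection, which is a choice made here.

```python
import numpy as np

def pca_2d(X):
    """Project a (3, n) data matrix onto its first two principal components."""
    # Step 2: mean of each row, i.e. one mean per feature
    mean_vec = X.mean(axis=1, keepdims=True)
    # Step 3: 3x3 covariance matrix (np.cov treats each row as a variable)
    cov_mat = np.cov(X)
    # Step 4: eigh is used because a covariance matrix is symmetric
    eig_vals, eig_vecs = np.linalg.eigh(cov_mat)
    # Step 5: sort eigenvalues in descending order, reordering the columns to match
    order = np.argsort(eig_vals)[::-1]
    eig_vals, eig_vecs = eig_vals[order], eig_vecs[:, order]
    # Step 6: keep the first two eigenvectors as a 3x2 matrix
    W = eig_vecs[:, :2]
    # Step 7: project the centered samples, giving a 2xn matrix
    Y = W.T @ (X - mean_vec)
    return eig_vals, W, Y

# Step 1: 20 samples per class, stacked column-wise into a 3x40 matrix
np.random.seed(1)
class1 = np.random.multivariate_normal(np.zeros(3), np.eye(3), 20).T
class2 = np.random.multivariate_normal(np.ones(3), np.eye(3), 20).T  # assumed parameters
X = np.hstack([class1, class2])

eig_vals, W, Y = pca_2d(X)
print(W.shape, Y.shape)  # (3, 2) (2, 40)
```

`np.linalg.eigh` returns eigenvalues in ascending order, so the explicit sort in step 5 is what puts the largest-variance direction first.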
The Python code is as follows:
# reference: http://sebastianraschka.com/Articles/2014_pca_step_by_step.html
import numpy as np
from matplotlib import pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
from mpl_toolkits.mplot3d import proj3d
from matplotlib.patches import FancyArrowPatch
# 1. generate 40 3-dimensional samples randomly drawn from a multivariate Gaussian distribution
np.random.seed(1) # random seed for consistency
mu_vec1 = np.array([0,0,0])
cov_mat1 = np.array([[1,0,0],[0,1,0],[0,0,1]])
class1_sample = np.random.multivariate_normal(mu_vec1, cov_mat1, 20).T
assert class1_sample.shape == (3,20), "The matrix does not have the dimensions 3x20"
#print("class1_sample:\n", class1_sample)
mu_vec2 = np.array([1,1,1])
cov_mat2 = np.array([[1,0,0],[0,1,0],[0,0,1]])
class2_sample = np.random.multivariate_