Dimensionality Reduction of Face Features

Latest recommended article published on 2024-06-21 12:33:31

HuangAnthony92

Article link: https://blog.csdn.net/HuangAnthony92/article/details/139118787

# Feature dimensionality reduction
from time import time
from numpy.random import RandomState
import matplotlib.pyplot as plt
from sklearn.datasets import fetch_olivetti_faces
from sklearn import decomposition
from sklearn.cluster import MiniBatchKMeans

n_row, n_col = 2, 3
n_components = n_row * n_col
image_shape = (64, 64)
rng = RandomState(0)
#########################################################
# Load the Olivetti faces (downloaded on first use), shuffled with a fixed seed
faces, _ = fetch_olivetti_faces(return_X_y=True, shuffle=True, random_state=rng)
n_samples, n_features = faces.shape

# Global centering: subtract the mean face (per-pixel mean over all samples)
faces_centered = faces - faces.mean(axis=0)
print(len(faces_centered))

# Local centering: subtract each image's own mean intensity
faces_centered -= faces_centered.mean(axis=1).reshape(n_samples, -1)

def plot_gallery(title, images, n_row=n_row, n_col=n_col, cmap=plt.cm.gray):
    """Plot a gallery of images on an n_row x n_col grid under a single title."""
    plt.figure(figsize=(2.4 * n_col, 2.6 * n_row))
    plt.suptitle(title, size=18)
    for i, comp in enumerate(images):
        plt.subplot(n_row, n_col, i + 1)
        # Use a symmetric color range so that zero maps to mid-gray
        vmax = max(comp.max(), -comp.min())
        plt.imshow(comp.reshape(image_shape), cmap=cmap, interpolation='nearest', vmin=-vmax, vmax=vmax)
        plt.xticks(())
        plt.yticks(())
    # Tighten the figure margins and the spacing between subplots
    plt.subplots_adjust(left=0.01, bottom=0.05, right=0.99, top=0.93, wspace=0.04, hspace=0.)

# Estimators to compare: (name, estimator, whether to fit on the centered data)
estimators = [
    ('Principal Components', decomposition.PCA(n_components=n_components, svd_solver="randomized", whiten=True), True),
    ('NMF', decomposition.NMF(n_components=n_components, init='nndsvda', tol=5e-3), False),
    ('FastICA', decomposition.FastICA(n_components=n_components, whiten='arbitrary-variance', tol=0.01, max_iter=1000), True),
    # ('MiniBatchSparsePCA', decomposition.MiniBatchSparsePCA(n_components=n_components, alpha=0.8, batch_size=3, random_state=rng), True),
]
####################################################################################################################
# Plot a sample of the input data
plot_gallery('Original data', faces_centered[:n_components])

# Fit each estimator and plot the components it extracts
for name, estimator, center in estimators:
    print("Extracting the top %d %s..." % (n_components, name))
    t0 = time()
    # NMF requires non-negative input, so it is fitted on the uncentered faces
    data = faces_centered if center else faces
    estimator.fit(data)
    train_time = time() - t0
    print("done in %0.3fs" % train_time)
    # Clustering estimators expose cluster_centers_; decomposition estimators expose components_
    if hasattr(estimator, 'cluster_centers_'):
        components_ = estimator.cluster_centers_
    else:
        components_ = estimator.components_
    # The noise variance that some estimators expose (noise_variance_) is not plotted here
    plot_gallery("%s - Train time %.1fs" % (name, train_time), components_[:n_components])
plt.show()

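Principal Component Analysis

The data is the Olivetti faces dataset from AT&T. It has 40 classes (one per subject) and 400 samples; each image is 64 × 64 = 4096 pixels, and every feature value is a real number between 0 and 1.

PCA (Principal Component Analysis) is a statistical dimensionality-reduction technique that transforms a high-dimensional dataset into a new set of mutually orthogonal variables called principal components. It reveals the main directions of variation in the data while losing as little information as possible. Its core goal is to find a low-dimensional representation that retains as much of the data's variance as possible, and so captures the data's dominant features.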
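
The variance-maximization goal above can be stated compactly. The notation here is introduced for illustration and does not come from the original post: X_c is the centered data matrix with n rows (one per sample), Σ is the sample covariance matrix, and w is a unit-length direction vector.

```latex
\Sigma = \frac{1}{n-1} X_c^{\top} X_c, \qquad
w_1 = \arg\max_{\|w\|_2 = 1} \; w^{\top} \Sigma \, w
```

The maximizer w_1 is the eigenvector of Σ with the largest eigenvalue λ_1, and the variance attained along it equals λ_1. Each later component solves the same problem restricted to directions orthogonal to the earlier ones.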
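Basic principle
Centering: first center the original data by subtracting the mean of each dimension, so that every feature has zero mean. This keeps the later steps focused on the covariance between variables.

Compute the covariance matrix: the centered data is used to compute the covariance matrix, which describes how the dimensions of the data vary together.

Compute eigenvalues and eigenvectors: an eigendecomposition of the covariance matrix yields a set of eigenvalues and their corresponding eigenvectors. Each eigenvalue is the variance of the data along the direction of its eigenvector, and the eigenvector gives that direction.

Select the principal components: sort the eigenvalues in descending order and take the eigenvectors belonging to the k largest ones as the principal components. They form a new coordinate system in which each principal component is a linear combination of the original variables and all of them are mutually orthogonal.

Transform the data: project the centered data onto the space spanned by these k principal components to complete the dimensionality reduction. The projection is computed as the inner product of each centered sample with each principal component vector.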
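
The five steps above can be sketched directly in NumPy. This is a minimal illustration rather than how the script at the top of this post computes PCA: the function name `pca_eig` and the toy data are hypothetical, and scikit-learn's `PCA` works from a singular value decomposition instead of an explicit covariance matrix.

```python
import numpy as np

def pca_eig(X, k):
    """PCA via eigendecomposition of the covariance matrix (illustrative sketch)."""
    # Step 1: center each feature at zero mean
    mean = X.mean(axis=0)
    X_centered = X - mean
    # Step 2: covariance matrix of the features, shape (n_features, n_features)
    cov = np.cov(X_centered, rowvar=False)
    # Step 3: eigh is intended for symmetric matrices; eigenvalues are returned in ascending order
    eigvals, eigvecs = np.linalg.eigh(cov)
    # Step 4: sort in descending order and keep the top-k eigenvectors as rows
    order = np.argsort(eigvals)[::-1][:k]
    components = eigvecs[:, order].T
    explained_variance = eigvals[order]
    # Step 5: project the centered data onto the principal components
    X_reduced = X_centered @ components.T
    return X_reduced, components, explained_variance

rng = np.random.default_rng(0)
# Hypothetical toy data: 200 samples with 5 correlated features
X = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 5))
X_reduced, components, explained_variance = pca_eig(X, k=2)
print(X_reduced.shape)  # (200, 2)
```

For the 4096-pixel face images, the covariance matrix would be 4096 × 4096, which is one reason an SVD-based solver is usually the more practical choice.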

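Uses
Dimensionality reduction: lowers the complexity of the data and the computational cost while preserving as much of its important structure as possible.
Noise removal: by concentrating on the main sources of variation, it reduces the influence of minor variation caused by noise.
Data visualization: maps high-dimensional data into two or three dimensions so it can be inspected visually.
Feature extraction: serves as a preprocessing step in machine learning that helps a model learn the underlying structure of the data.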
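
To make the first two uses concrete, here is a small sketch on synthetic data, assuming the signal lies near a 2-D plane inside a 10-D space. All names and numbers (`latent`, `mixing`, the noise level 0.1) are hypothetical choices for illustration. The data is reduced to 2 features and then mapped back, and the reconstruction is compared against the noise-free signal.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical data: 300 samples that lie near a 2-D plane inside a 10-D space
latent = rng.normal(size=(300, 2))
mixing = rng.normal(size=(2, 10))
clean = latent @ mixing
noisy = clean + 0.1 * rng.normal(size=clean.shape)

# Center the data, then take the SVD; the rows of Vt are the principal directions
mean = noisy.mean(axis=0)
U, S, Vt = np.linalg.svd(noisy - mean, full_matrices=False)

k = 2
codes = (noisy - mean) @ Vt[:k].T        # dimensionality reduction: 10 -> 2 features
reconstructed = codes @ Vt[:k] + mean    # map the codes back into the 10-D space

# Mean squared error against the noise-free signal, before and after the PCA round trip
err_noisy = np.mean((noisy - clean) ** 2)
err_reconstructed = np.mean((reconstructed - clean) ** 2)
print(codes.shape)
print(err_noisy, err_reconstructed)
```

Keeping only the top components discards the part of the noise that falls outside the signal subspace, which is why the reconstruction ends up closer to the clean signal than the noisy input is. This only helps when the signal really is concentrated in a few high-variance directions.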
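Caveats
PCA captures only linear structure in the data and is sensitive to the scale of the original features, so the data usually needs to be standardized first.
The number of principal components k is a parameter that has to be chosen based on the specific problem and on an analysis of the data.
PCA may discard nonlinear relationships or local features that matter for classification or clustering tasks.
As a classic data-analysis tool, PCA is widely used in pattern recognition, image processing, signal processing, and many other fields.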
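
The first two caveats can be illustrated with a short NumPy sketch. Everything in it is an assumption made for the example: the helper name `explained_variance_ratio`, the synthetic data, and the 0.95 threshold, which is one common rule of thumb rather than a recommendation from the original post.

```python
import numpy as np

def explained_variance_ratio(X):
    """Fraction of the total variance captured by each principal component."""
    X_centered = X - X.mean(axis=0)
    # Singular values of the centered data give the variances: var_i = s_i**2 / (n - 1)
    s = np.linalg.svd(X_centered, compute_uv=False)
    var = s ** 2 / (X.shape[0] - 1)
    return var / var.sum()

rng = np.random.default_rng(0)
# Hypothetical data: 3 independent features, one of them recorded on a 1000x larger scale
X = rng.normal(size=(500, 3))
X[:, 0] *= 1000.0

# Without standardization the large-scale feature dominates the first component
ratio_raw = explained_variance_ratio(X)

# Standardize to zero mean and unit variance per feature, then repeat
X_std = (X - X.mean(axis=0)) / X.std(axis=0)
ratio_std = explained_variance_ratio(X_std)

# Smallest k whose cumulative explained variance reaches the chosen threshold
k = int(np.searchsorted(np.cumsum(ratio_std), 0.95) + 1)
print(ratio_raw.round(3), ratio_std.round(3), k)
```

On the raw data almost all of the variance is attributed to the rescaled feature, purely because of its units. After standardization the three independent features contribute roughly equally, so all three components are needed to reach the threshold.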