Feature Engineering: Feature Extraction and Dimensionality Reduction (Part 2)

Cosophia

Published 2024-02-08 00:33:34

Article link: https://blog.csdn.net/cosophia/article/details/136075368

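This article introduces manifold learning, t-SNE, and multidimensional scaling as dimensionality reduction methods, and uses examples to show how they reduce high-dimensional data and visualize it, to help understand and classify the data.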

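1. Introduction

The previous part covered feature extraction and dimensionality reduction for linear and nonlinear data. This part introduces three further methods: manifold learning, multidimensional scaling (MDS), and t-SNE.

2. Main Content

Ⅰ. Manifold Learning

Manifold learning is a dimensionality reduction approach that borrows the concept of a topological manifold. When the data are reduced to two or three dimensions, they can also be visualized. Because manifold learning (Isomap in the example here) uses distances between neighboring points to estimate the distances between samples in the high-dimensional space, the number of neighbors has a large effect on the result.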

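import matplotlib.pyplot as plt
from sklearn.manifold import Isomap

# wine_x (feature matrix) and wine_y (integer class labels 0-2) are assumed
# to be defined already, e.g. as prepared in the previous part.
# Isomap with 7 nearest neighbors, reducing to 3 dimensions
isomap = Isomap(n_neighbors=7, n_components=3)
isomap_wine_x = isomap.fit_transform(wine_x)

colors = ['red', 'blue', 'green']
shape = ['o', 's', '*']
fig = plt.figure(figsize=(10, 6))
ax1 = fig.add_subplot(111, projection='3d')
# Draw each sample with the color and marker of its class
for ii, y in enumerate(wine_y):
    ax1.scatter(isomap_wine_x[ii, 0], isomap_wine_x[ii, 1], isomap_wine_x[ii, 2],
                s=40, c=colors[y], marker=shape[y])

ax1.set_xlabel('E1', rotation=20)
ax1.set_ylabel('E2', rotation=-20)
ax1.set_zlabel('E3', rotation=90)
ax1.azim = 225
ax1.set_title('Isomap')
plt.show()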

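Here n_neighbors=7 means that 7 nearest neighbors are used to compute distances in the space, and n_components=3 reduces the data to three dimensions.

The distribution of the data can then be visualized.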
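Because the number of neighbors affects the Isomap result so strongly, it can help to compare a few values before settling on one. The following is a minimal sketch, not the author's code. It assumes the data are scikit-learn's built-in wine dataset (load_wine) standardized with StandardScaler, since the preparation of wine_x is not shown in this part, and the candidate values in neighbor_grid are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.manifold import Isomap
from sklearn.preprocessing import StandardScaler

# Assumption: a stand-in for the wine_x / wine_y used in the article
wine = load_wine()
wine_x = StandardScaler().fit_transform(wine.data)
wine_y = wine.target

neighbor_grid = [5, 7, 15, 30]  # hypothetical candidate values
embeddings = {}
errors = {}
for k in neighbor_grid:
    iso = Isomap(n_neighbors=k, n_components=3)
    embeddings[k] = iso.fit_transform(wine_x)
    # Reconstruction error of the 3-D embedding for this neighbor count
    errors[k] = iso.reconstruction_error()
    print(f"n_neighbors={k:2d}  reconstruction error={errors[k]:.4f}")
```

The reconstruction error is only a rough guide, because the neighborhood graph itself changes with the neighbor count. Plotting each entry of embeddings, as in the 3D scatter plot above, remains the more direct way to compare the results.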

Ⅱ. t-SNE

t-SNE is a commonly used dimensionality reduction method, and it can also serve as a feature extraction method.

import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

# wine_x and wine_y are assumed to be defined already (see the Isomap example).
# t-SNE reducing to 3 dimensions; random_state fixes the result for reproducibility
tsne = TSNE(n_components=3, perplexity=25, early_exaggeration=3, random_state=123)
tsne_wine_x = tsne.fit_transform(wine_x)

colors = ['red', 'blue', 'green']
shape = ['o', 's', '*']
fig = plt.figure(figsize=(10, 6))
ax1 = fig.add_subplot(111, projection='3d')
# Draw each sample with the color and marker of its class
for ii, y in enumerate(wine_y):
    ax1.scatter(tsne_wine_x[ii, 0], tsne_wine_x[ii, 1], tsne_wine_x[ii, 2],
                s=40, c=colors[y], marker=shape[y])

ax1.set_xlabel('E1', rotation=20)
ax1.set_ylabel('E2', rotation=-20)
ax1.set_zlabel('E3', rotation=90)
ax1.azim = 225
ax1.set_title('t-SNE')
plt.show()

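The procedure is much the same as for manifold learning: three features are again extracted from the data.

After reducing to three dimensions, the data distribution is visualized. With this algorithm the three classes are relatively easy to tell apart, which suggests that classifying the data using the extracted features would be easier.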
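The claim that the classes are easier to separate can be checked roughly with a number as well as by eye. The sketch below is one possible check, not part of the original article: it computes the silhouette score of the true class labels in the original feature space and in the t-SNE embedding. It assumes the standardized scikit-learn wine dataset as a stand-in for wine_x and wine_y, and the choice of silhouette_score as the measure is itself an assumption.

```python
from sklearn.datasets import load_wine
from sklearn.manifold import TSNE
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Assumption: a stand-in for the wine_x / wine_y used in the article
wine = load_wine()
wine_x = StandardScaler().fit_transform(wine.data)
wine_y = wine.target

# Same t-SNE settings as in the article
tsne = TSNE(n_components=3, perplexity=25, early_exaggeration=3, random_state=123)
tsne_wine_x = tsne.fit_transform(wine_x)

# Silhouette of the true labels: values closer to 1 mean tighter,
# better-separated classes
score_original = silhouette_score(wine_x, wine_y)
score_tsne = silhouette_score(tsne_wine_x, wine_y)
print(f"silhouette, original 13-D features: {score_original:.3f}")
print(f"silhouette, 3-D t-SNE embedding:    {score_tsne:.3f}")
```

Note that scikit-learn's TSNE has no transform method for unseen samples, so a check like this describes the embedded dataset itself rather than the behavior of a classifier on new data.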

Ⅲ. Multidimensional Scaling

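import matplotlib.pyplot as plt
from sklearn.manifold import MDS

# wine_x and wine_y are assumed to be defined already (see the Isomap example).
# MDS on Euclidean dissimilarities, reducing to 3 dimensions
mds = MDS(n_components=3, dissimilarity='euclidean', random_state=123)
# Fit with the MDS model (the original called tsne.fit_transform here by mistake)
mds_wine_x = mds.fit_transform(wine_x)

colors = ['red', 'blue', 'green']
shape = ['o', 's', '*']
fig = plt.figure(figsize=(10, 6))
ax1 = fig.add_subplot(111, projection='3d')
# Draw each sample with the color and marker of its class
for ii, y in enumerate(wine_y):
    ax1.scatter(mds_wine_x[ii, 0], mds_wine_x[ii, 1], mds_wine_x[ii, 2],
                s=40, c=colors[y], marker=shape[y])

ax1.set_xlabel('E1', rotation=20)
ax1.set_ylabel('E2', rotation=-20)
ax1.set_zlabel('E3', rotation=90)
ax1.azim = 225
ax1.set_title('MDS')
plt.show()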

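Multidimensional scaling is a method for visualizing high-dimensional data through a low-dimensional representation. Its goal is to map the original data into a low-dimensional coordinate system while keeping the distortion introduced by the reduction as small as possible.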
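The idea of keeping the distortion small can be made concrete by comparing pairwise distances before and after the reduction. The following is a minimal sketch under stated assumptions, not the author's code: it uses the standardized scikit-learn wine dataset as a stand-in for wine_x, leaves MDS at its default Euclidean dissimilarity, and uses the correlation between the two sets of distances as an illustrative measure of how well they are preserved.

```python
import numpy as np
from scipy.spatial.distance import pdist
from sklearn.datasets import load_wine
from sklearn.manifold import MDS
from sklearn.preprocessing import StandardScaler

# Assumption: a stand-in for the wine_x used in the article
wine_x = StandardScaler().fit_transform(load_wine().data)

mds = MDS(n_components=3, random_state=123)
mds_wine_x = mds.fit_transform(wine_x)

# Pairwise Euclidean distances in the original and the embedded space
d_high = pdist(wine_x)
d_low = pdist(mds_wine_x)

# A correlation close to 1 means the distances were largely preserved
corr = np.corrcoef(d_high, d_low)[0, 1]
print(f"final stress reported by MDS: {mds.stress_:.2f}")
print(f"correlation of pairwise distances: {corr:.3f}")
```

The stress_ attribute is the quantity that MDS minimizes during fitting, so a lower value for the same data and number of components indicates less distortion.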