Unsupervised learning notes

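This article covers two main groups of unsupervised learning techniques: principal component analysis (PCA) and manifold learning for dimensionality reduction. PCA reduces the number of features while retaining most of the information in the data. After standardizing the data, PCA is applied to reduce the breast cancer dataset from a high-dimensional space to two dimensions. The article also introduces multidimensional scaling (MDS) and t-SNE, which look for low-dimensional projections that preserve distance information and neighbourhood relationships, respectively. MDS and t-SNE are applied to a fruit dataset to produce low-dimensional visualizations.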

Unsupervised Learning

Dimensionality Reduction

  1. PCA

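Find the directions (principal components) that capture most of the variance in the data, so that the number of features can be reduced while keeping most of the information.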

(https://www.youtube.com/watch?v=HMOI_lkzW08)
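To make the idea concrete, here is a minimal NumPy sketch of what PCA computes, using a singular value decomposition. The synthetic data, the variable names, and the choice of two components are assumptions made for illustration only; they are not part of the original notes.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (an assumption for illustration): 200 samples with
# 5 correlated features generated from 2 hidden factors plus noise.
latent = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 5))
X = latent @ mixing + 0.1 * rng.normal(size=(200, 5))

# Center each feature and scale it to unit variance.
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

# SVD of the standardized data: the rows of Vt are the principal directions.
U, S, Vt = np.linalg.svd(X_std, full_matrices=False)

# Fraction of the total variance captured by each principal component.
explained_ratio = S**2 / np.sum(S**2)

# Project the data onto the first two principal directions.
X_2d = X_std @ Vt[:2].T

print(explained_ratio.round(3))
print(X_2d.shape)
```

Because the five features are driven by only two hidden factors, the first two components should account for most of the variance, which is the situation where dropping the remaining components loses little information.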

from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.datasets import load_breast_cancer

# Load the feature matrix and the labels directly
X_cancer, y_cancer = load_breast_cancer(return_X_y=True)

# Before applying PCA, standardize each feature to zero mean and unit variance
X_normalized = StandardScaler().fit(X_cancer).transform(X_cancer)

# Fit PCA and keep only the first two principal components
pca = PCA(n_components=2).fit(X_normalized)

# Project the standardized features onto those two components
X_pca = pca.transform(X_normalized)
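
A quick way to check how much information the two components keep is to inspect the fitted model. The sketch below repeats the steps above so that it runs on its own; the printed quantities are an illustrative addition, not part of the original notes.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X_cancer, y_cancer = load_breast_cancer(return_X_y=True)
X_normalized = StandardScaler().fit_transform(X_cancer)

pca = PCA(n_components=2).fit(X_normalized)
X_pca = pca.transform(X_normalized)

# Shape before and after the projection: 30 features become 2.
print(X_cancer.shape, "->", X_pca.shape)

# Fraction of the variance explained by each component, and their total.
print(pca.explained_variance_ratio_)
print(pca.explained_variance_ratio_.sum())

# Each row of components_ holds one component's weights on the 30 features.
print(pca.components_.shape)
```

If the total explained variance is low, a 2D scatter plot of `X_pca` may hide structure that is still present in the discarded components.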

  2. Manifold Learning

Multidimensional scaling (MDS) attempts to find a low-dimensional projection that preserves the pairwise distances between points.

t-SNE finds a 2D projection that preserves information about neighbours. Like MDS, it starts from distances in the high-dimensional space, but it gives most weight to keeping nearby points close together rather than preserving all pairwise distances.

(https://distill.pub/2016/misread-tsne/#citation)

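# plot_labelled_scatter comes from a local helper module, not from scikit-learn
from adspy_shared_utilities import plot_labelled_scatter
from sklearn.preprocessing import StandardScaler
from sklearn.manifold import MDS

# X_fruits is the feature matrix of the fruit dataset; it is assumed to be
# loaded earlier (the loading step is not shown in these notes)

# Each feature should be centered (zero mean) and scaled to unit variance
X_fruits_normalized = StandardScaler().fit(X_fruits).transform(X_fruits)

# Request a two-dimensional embedding
mds = MDS(n_components=2)

# Fit MDS and return the 2D coordinates of every sample
X_fruits_mds = mds.fit_transform(X_fruits_normalized)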


from sklearn.manifold import TSNE

tsne = TSNE(random_state = 0)

X_tsne = tsne.fit_transform(X_fruits_normalized)
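
Since the fruit data is not included in these notes, the following self-contained sketch runs the same two methods on the breast cancer dataset instead. The 200-row subset, the fixed `random_state`, and the use of scikit-learn's `trustworthiness` score to compare how well local neighbourhoods survive the projection are all assumptions made for illustration.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.manifold import MDS, TSNE, trustworthiness
from sklearn.preprocessing import StandardScaler

# Use a small subset so that both methods finish quickly (an assumption).
X, y = load_breast_cancer(return_X_y=True)
X_sub = StandardScaler().fit_transform(X[:200])

# MDS: tries to preserve all pairwise distances in 2D.
X_mds = MDS(n_components=2, random_state=0).fit_transform(X_sub)

# t-SNE: focuses on keeping neighbouring points close together in 2D.
X_tsne = TSNE(n_components=2, random_state=0).fit_transform(X_sub)

# Trustworthiness lies in [0, 1]; higher values mean that the nearest
# neighbours in the 2D embedding are also neighbours in the original space.
trust_mds = trustworthiness(X_sub, X_mds, n_neighbors=5)
trust_tsne = trustworthiness(X_sub, X_tsne, n_neighbors=5)

print("MDS trustworthiness:  ", trust_mds)
print("t-SNE trustworthiness:", trust_tsne)
```

Both embeddings depend on random initialization, so fixing `random_state` keeps the plots reproducible. The t-SNE result is also sensitive to settings such as `perplexity`, so it is worth trying several values before reading much into the layout.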