Python Machine Learning Check-in (7) -- Unsupervised Learning (1) -- Basic Concepts and Principal Component Analysis - CSDN Blog

Article link: https://blog.csdn.net/qq_39111089/article/details/116272359

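Unsupervised Learning

Put simply, in unsupervised learning the model is given only input data and the outputs are unknown. The difference from supervised learning is whether the outputs (labels) are known.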

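Unsupervised learning tasks

Unsupervised learning is commonly used for two kinds of problems: dataset transformations and clustering.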

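Dataset transformations

These algorithms create a new representation of the original dataset that is easier to understand than the original.
The most common case is dimensionality reduction: the algorithm takes a high-dimensional representation of the data, made up of many features, and finds a new representation that summarizes the essential characteristics with fewer features.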
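
As a rough illustration of dimensionality reduction, the sketch below compresses the 30 features of scikit-learn's breast cancer dataset down to 2 with principal component analysis (PCA). The choice of dataset, the standardization step, and n_components=2 are assumptions made for this example, not something prescribed by the text above.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X = load_breast_cancer().data  # shape (569, 30)

# PCA is sensitive to feature scale, so standardize each feature first
X_scaled = StandardScaler().fit_transform(X)

# Keep only the first two principal components
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X_scaled)

print(X.shape, "->", X_2d.shape)
print("explained variance ratio:", pca.explained_variance_ratio_)
```

The explained variance ratio gives a rough sense of how much information the two new features retain.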

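The denoising autoencoder (DAE) is one example of dimensionality reduction. A model trained on high-dimensional, noisy input may generalize poorly, so we deliberately add noise to the training data and train the whole network to reconstruct the clean input from the corrupted one. Noise is unavoidable in real test data, and training on noisy inputs lets the network learn the features of the clean input along with the main characteristics of the noise, which improves generalization on test data. The Code in the figure is the denoised, lower-dimensional representation of the data.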

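[Figure: denoising autoencoder, with the compressed representation labeled Code]
For details, see the posts "Unsupervised Learning (1): DAE (Pokémon Encoding)" and "[Neural Networks] From Backpropagation (BP) to the Denoising Autoencoder (DAE)".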
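
To make the idea concrete, here is a minimal sketch of a denoising autoencoder built from scikit-learn's MLPRegressor. This is a stand-in chosen for brevity, not the architecture used in the linked posts. The digits dataset, the noise level of 0.2, and the 16-unit hidden layer are all assumptions for illustration.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.neural_network import MLPRegressor

rng = np.random.RandomState(0)

# 8x8 digit images flattened to 64 features, rescaled to [0, 1]
X_clean = load_digits().data / 16.0

# Corrupt the inputs with Gaussian noise (the noise level is an arbitrary choice)
X_noisy = X_clean + rng.normal(scale=0.2, size=X_clean.shape)

# One hidden layer of 16 units acts as the bottleneck (the "Code").
# The network is trained to map noisy input -> clean target.
dae = MLPRegressor(hidden_layer_sizes=(16,), max_iter=300, random_state=0)
dae.fit(X_noisy, X_clean)

# Hidden-layer activations are the low-dimensional code
# (MLPRegressor uses ReLU activation by default)
code = np.maximum(0, X_noisy @ dae.coefs_[0] + dae.intercepts_[0])
X_denoised = dae.predict(X_noisy)

print(code.shape, X_denoised.shape)
```

The fit may emit a convergence warning with so few iterations; for a sketch that is acceptable.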

Clustering algorithms

Clustering algorithms partition the data into distinct groups, each containing similar items.
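
A small sketch of clustering, assuming k-means as the algorithm and a synthetic dataset from make_blobs (both picked only for illustration): the model sees no labels and assigns each point to one of three groups.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic 2-D data with three blobs; the true labels are discarded
X, _ = make_blobs(n_samples=150, centers=3, random_state=0)

# Ask k-means for three clusters; only X is passed, no y
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)

print(labels[:10])
print(kmeans.cluster_centers_)
```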

Data transformation

Preprocessing and scaling

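[Figure: a dataset before and after applying different scalers]
The figure above can be reproduced with the following code: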

import mglearn
import matplotlib.pyplot as plt
mglearn.plots.plot_scaling()
plt.show()

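Some algorithms are very sensitive to the scale of the data, so the input is usually preprocessed to bring the features into a form that suits the algorithm.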

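StandardScaler ensures that each feature has mean 0 and variance 1
RobustScaler uses the median and quartiles instead, so it is less affected by outliers
MinMaxScaler shifts and rescales each feature so that all values lie between 0 and 1
Normalizer rescales each sample so that its feature vector has Euclidean length 1, i.e. it projects each data point onto a circle (or sphere) of radius 1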
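
The properties listed above can be checked on a toy array. The data below is made up for this sketch, with one deliberate outlier in the first column.

```python
import numpy as np
from sklearn.preprocessing import (MinMaxScaler, Normalizer, RobustScaler,
                                   StandardScaler)

# Toy data: 5 samples, 2 features; 100.0 is a deliberate outlier
X = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0], [100.0, 50.0]])

X_standard = StandardScaler().fit_transform(X)
X_robust = RobustScaler().fit_transform(X)
X_minmax = MinMaxScaler().fit_transform(X)
X_normalized = Normalizer().fit_transform(X)

# StandardScaler: per-feature mean 0 and standard deviation 1
print(X_standard.mean(axis=0), X_standard.std(axis=0))
# RobustScaler: per-feature median 0
print(np.median(X_robust, axis=0))
# MinMaxScaler: per-feature minimum 0 and maximum 1
print(X_minmax.min(axis=0), X_minmax.max(axis=0))
# Normalizer: every row (sample) has Euclidean norm 1
print(np.linalg.norm(X_normalized, axis=1))
```

Note that the first three scalers work per feature (column), while Normalizer works per sample (row).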

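MinMaxScaler example:
The transformation works as follows, where min and max are the bounds of the target range (feature_range, which defaults to (0, 1)):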

X_std = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))
X_scaled = X_std * (max - min) + min
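
As a quick sanity check, the formula can be applied by hand with NumPy and compared against MinMaxScaler. The toy array and the names feature_min and feature_max are assumptions introduced for this sketch.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Toy data: 3 samples, 2 features on very different scales
X = np.array([[1.0, 200.0], [2.0, 400.0], [4.0, 1000.0]])

# Target range; (0, 1) is MinMaxScaler's default feature_range
feature_min, feature_max = 0.0, 1.0

# Apply the formula column by column
X_std = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))
X_manual = X_std * (feature_max - feature_min) + feature_min

# Same transformation through scikit-learn
X_sklearn = MinMaxScaler(feature_range=(feature_min, feature_max)).fit_transform(X)

print(X_manual)
print(np.allclose(X_manual, X_sklearn))
```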

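Fit the scaler with its fit method, applied to the training set. Unlike the classifiers and regressors from the earlier supervised learning posts, only X_train is needed here; y_train is not used. Then call the scaler's transform method to scale the data; the method returns the scaled data. Note that X_test is transformed with the minimum and maximum learned from X_train.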

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

cancer = load_breast_cancer()

X, y = cancer.data, cancer.target

print(X.shape)

X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=1)

scaler = MinMaxScaler()
scaler.fit(X_train)

X_train_new = scaler.transform(X_train)
X_test_new = scaler.transform(X_test)

print("per-feature minimum before scaling:\n{}".format(X_train.min(axis=0)))
print("per-feature maximum before scaling:\n{}".format(X_train.max(axis=0)))
print("per-feature minimum after scaling:\n{}".format(X_train_new.min(axis=0)))
print("per-feature maximum after scaling:\n{}".format(X_train_new.max(axis=0)))

Output

(569, 30)
per-feature minimum before scaling:
[6.981e+00 1.038e+01 4.379e+01 1.435e+02 5.263e-02 2.650e-02 0.000e+00
0.000e+00 1.167e-01 5.025e-02 1.144e-01 3.602e-01 7.570e-01 6.802e+00
2.667e-03 3.746e-03 0.000e+00 0.000e+00 7.882e-03 9.502e-04 7.930e+00
1.249e+01 5.041e+01 1.852e+02 8.409e-02 4.327e-02 0.000e+00 0.000e+00
1.565e-01 5.504e-02]
per-feature maximum before scaling:
[2.811e+01 3.928e+01 1.885e+02 2.501e+03 1.634e-01 3.454e-01 4.264e-01
1.913e-01 2.906e-01 9.575e-02 2.873e+00 3.647e+00 2.198e+01 5.422e+02
3.113e-02 1.354e-01 3.960e-01 5.279e-02 7.895e-02 2.984e-02 3.604e+01
4.954e+01 2.512e+02 4.254e+03 2.226e-01 1.058e+00 1.252e+00 2.910e-01
5.774e-01 2.075e-01]
per-feature minimum after scaling:
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
per-feature maximum after scaling:
[1. 1. 1. 1. 1. 1. 1