1.4.3 Unsupervised Learning

egbert果

Categories: Artificial Intelligence, Deep Learning. Tags: unsupervised learning, PCA, k-means

Article link: https://blog.csdn.net/egbert123/article/details/86602466

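A classic unsupervised learning task is to find the "best" representation of the data. "Best" can mean different things, but in general we want a representation that preserves as much information about x as possible while obeying some penalty or constraint that keeps the representation simpler or more accessible than x itself.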

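There are many ways to define a simpler representation. Three of the most common are low-dimensional representations, sparse representations, and independent representations.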

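Principal Component Analysis (PCA)

As covered in the linear algebra chapter, PCA is a dimensionality-reduction technique: it yields a low-dimensional representation of the data.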
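As a reminder of how that reduction works, here is a minimal NumPy sketch of PCA via the singular value decomposition. It is an illustration under stated assumptions, not code from the original post: the function name pca_project and the synthetic data are hypothetical.

```python
import numpy as np

def pca_project(X, num_components):
    """Project the rows of X onto the top principal components (hypothetical helper)."""
    # PCA works on centered data, so subtract the per-feature mean.
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by decreasing singular value.
    _, singular_values, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:num_components]
    # Low-dimensional representation: coordinates along the kept directions.
    Z = X_centered @ components.T
    # Variance captured by each kept direction.
    explained_variance = singular_values[:num_components] ** 2 / (X.shape[0] - 1)
    return Z, components, explained_variance

# Assumed synthetic data: 200 points in 3-D that vary mostly along one direction.
rng = np.random.default_rng(0)
t = rng.normal(size=(200, 1))
X = t @ np.array([[2.0, 1.0, 0.5]]) + 0.05 * rng.normal(size=(200, 3))

Z, components, explained_variance = pca_project(X, num_components=2)
print(Z.shape)  # (200, 2)
```

The projected coordinates in Z are decorrelated, and the first component carries most of the variance because the data was built around a single direction.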

k-means clustering (k-means)

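The k-means clustering algorithm divides the training set into k distinct clusters of examples that lie near each other. The algorithm can therefore be viewed as providing a k-dimensional one-hot code vector h that represents an input x: when x belongs to cluster i, h_i = 1 and every other entry of h is 0.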
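The one-hot code can be made concrete with a few lines of NumPy. This is a small sketch; the helper name one_hot_codes and the example assignments are made up for illustration.

```python
import numpy as np

def one_hot_codes(cluster_indices, k):
    """Return one row h per example, with h[i] = 1 for its cluster i and 0 elsewhere."""
    h = np.zeros((len(cluster_indices), k), dtype=int)
    h[np.arange(len(cluster_indices)), cluster_indices] = 1
    return h

# Hypothetical cluster assignments for four examples with k = 3.
cluster_indices = np.array([2, 0, 1, 2])
H = one_hot_codes(cluster_indices, k=3)
print(H)
# [[0 0 1]
#  [1 0 0]
#  [0 1 0]
#  [0 0 1]]
```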

k-means starts from k different centroids and then alternates between two steps until convergence:

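Each training example is assigned to cluster i, where i is the index of the nearest centroid.
Each centroid is updated to the mean of all training examples x assigned to cluster i.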
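The two alternating steps can be sketched directly in NumPy. This is an illustrative implementation under assumptions the text does not specify: centroids are initialized as k randomly chosen training examples, distance is Euclidean, an empty cluster keeps its previous centroid, and the function names are hypothetical.

```python
import numpy as np

def assign_to_nearest(X, centroids):
    """Step 1: index of the nearest centroid (Euclidean distance) for each row of X."""
    distances = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return distances.argmin(axis=1)

def kmeans(X, k, max_iters=100, seed=0):
    rng = np.random.default_rng(seed)
    # Assumption: start from k distinct training examples picked at random.
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(max_iters):
        assignments = assign_to_nearest(X, centroids)
        # Step 2: move each centroid to the mean of the examples assigned to it.
        new_centroids = centroids.copy()
        for i in range(k):
            members = X[assignments == i]
            if len(members) > 0:  # an empty cluster keeps its old centroid
                new_centroids[i] = members.mean(axis=0)
        # Converged once the centroids stop moving.
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids, assign_to_nearest(X, centroids)

# Assumed toy data: two well-separated blobs of 50 points each.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.1, size=(50, 2)),
               rng.normal(5.0, 0.1, size=(50, 2))])
centroids, assignments = kmeans(X, k=2)
print(np.bincount(assignments))
```

On data this cleanly separated, the result does not depend much on the initial centroids. On harder data, the outcome of k-means can vary with initialization, which is why schemes such as k-means++ are commonly used.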

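import numpy as np
from sklearn import datasets
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # noqa: F401, registers the '3d' projection on older matplotlib
import tensorflow as tf  # requires TensorFlow 1.x: tf.contrib was removed in TensorFlow 2.x

def loadData():
    """Load the Iris dataset: X holds the four features, y the species labels."""
    iris = datasets.load_iris()
    X = iris.data
    y = iris.target
    return X, y

def kmeansCluster(X, numClusters):
    """Cluster the rows of X with the TF 1.x contrib k-means estimator."""
    # Feed the whole dataset as one batch; limit_epochs ends the input after a single pass.
    get_inputs = lambda: tf.train.limit_epochs(
        tf.convert_to_tensor(X, dtype=tf.float32), num_epochs=1)
    # Build the model, using k-means++ to initialize the centroids.
    cluster = tf.contrib.factorization.KMeansClustering(
        num_clusters=numClusters,
        initial_clusters=tf.contrib.factorization.KMeansClustering.KMEANS_PLUS_PLUS_INIT)
    cluster.train(input_fn=get_inputs, steps=2000)  # train
    y_pred = cluster.predict_cluster_index(input_fn=get_inputs)  # predict a cluster index per example
    y_pred = np.asarray(list(y_pred))
    return y_pred

def plotFigure(fignum, title, X, y):
    """3-D scatter plot of three Iris features, colored by the labels in y."""
    fig = plt.figure(fignum, figsize=(8, 6))
    ax = fig.add_subplot(111, projection='3d')
    ax.view_init(elev=48, azim=134)
    # np.float was removed from recent NumPy releases; the builtin float is equivalent here.
    ax.scatter(X[:, 3], X[:, 0], X[:, 2],
               c=y.astype(float), edgecolor='k')
    # Hide tick labels; ax.xaxis replaces the deprecated ax.w_xaxis alias.
    ax.xaxis.set_ticklabels([])
    ax.yaxis.set_ticklabels([])
    ax.zaxis.set_ticklabels([])
    ax.set_xlabel('Petal width')
    ax.set_ylabel('Sepal length')
    ax.set_zlabel('Petal length')
    ax.set_title(title)

if __name__ == '__main__':
    X, y = loadData()
    y_pred = kmeansCluster(X, 3)
    plotFigure(1, "3 clusters", X, y_pred)
    plotFigure(2, "Ground Truth", X, y)
    plt.show()  # display both figures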

INFO:tensorflow:Using default config.
WARNING:tensorflow:Using temporary folder as model directory: C:\Users\egbert\AppData\Local\Temp\tmp27idx79j
INFO:tensorflow:Using config: {'_model_dir': 'C:\\Users\\egbert\\AppData\\Local\\Temp\\tmp27idx79j', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_session_config': allow_soft_placement: true
graph_options {
  rewrite_options {
    meta_optimizer_iterations: ONE
  }
}
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x000001F520FD3BA8>, '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Saving checkpoints for 0 into C:\Users\egbert\AppData\Local\Temp\tmp27idx79j\model.ckpt.
INFO:tensorflow:Saving checkpoints for 1 into C:\Users\egbert\AppData\Local\Temp\tmp27idx79j\model.ckpt.
INFO:tensorflow:Loss for final step: None.
WARNING:tensorflow:Input graph does not use tf.data.Dataset or contain a QueueRunner. That means predict yields forever. This is probably a mistake.
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from C:\Users\egbert\AppData\Local\Temp\tmp27idx79j\model.ckpt-1
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.


D:\Anaconda\lib\site-packages\matplotlib\figure.py:459: UserWarning: matplotlib is currently using a non-GUI backend, so cannot show the figure
  "matplotlib is currently using a non-GUI backend, "

[Figure: 3-D scatter plots of the Iris data produced by plotFigure, titled "3 clusters" and "Ground Truth"]
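The listing above depends on tf.contrib, which only exists in TensorFlow 1.x. For readers on a current environment, the same experiment can be approximated with scikit-learn's KMeans class. Note that this swaps in a different library and is not the method used in the original post; the parameter values (n_init=10, random_state=0) are assumptions chosen for reproducibility.

```python
import numpy as np
from sklearn import datasets
from sklearn.cluster import KMeans

# Same Iris data as in the listing above.
X, y = datasets.load_iris(return_X_y=True)

# Three clusters with k-means++ initialization; n_init and random_state are assumed values.
model = KMeans(n_clusters=3, init="k-means++", n_init=10, random_state=0)
y_pred = model.fit_predict(X)

print(model.cluster_centers_.shape)  # (3, 4)
print(np.bincount(y_pred))           # number of examples in each cluster
```

The cluster indices in y_pred are arbitrary labels, so they need not match the numbering of the true species in y even when the grouping itself is similar.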
