Clustering and the EM Algorithm: Density-Based Clustering

小小蒲公英

Categories: Python, Machine Learning

Article link: https://blog.csdn.net/weixin_39777626/article/details/79721991

Model prototype
class sklearn.cluster.DBSCAN(eps=0.5,min_samples=5,metric='euclidean',
algorithm='auto',leaf_size=30, p=None,random_state=None)
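Parameters
- eps: the ϵ parameter, which sets the radius of a sample's neighborhood
- min_samples: the MinPts parameter, used to decide whether a sample is a core object
- metric: the metric used to compute the distance between samples
- algorithm: the nearest-neighbor search algorithm used to compute pointwise distances and find neighbors
  - 'auto': let the estimator pick a suitable algorithm
  - 'ball_tree': search with a ball tree
  - 'kd_tree': search with a k-d tree
  - 'brute': brute-force search
- leaf_size: the leaf size of the tree when algorithm is 'ball_tree' or 'kd_tree'. It affects the speed of building the tree and of neighbor queries, as well as the memory needed to store the tree
- p: the power of the Minkowski metric used to compute distances between points
- random_state: accepted only by old scikit-learn releases, where it was already ignored because DBSCAN uses no random initialization; recent releases no longer accept this argument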
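Attributes
- core_sample_indices_: the positions of the core samples in the original training set
- components_: a copy of the core samples
- labels_: the cluster label of each sample; noise samples are labeled -1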
Methods
- fit(X[,y,sample_weight]): run the clustering on X
- fit_predict(X[,y,sample_weight]): run the clustering on X and return the cluster label of each sample
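The attributes listed above are easiest to read on a tiny data set. The following is a minimal sketch, not part of the original post: the nine toy points and the settings eps=0.5, min_samples=3 are assumptions chosen so that two dense groups and one isolated point appear.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Toy data (an assumption for illustration): two tight groups of four
# points each, plus one isolated point far away from both.
X = np.array([
    [0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1],
    [5.0, 5.0], [5.1, 5.0], [5.0, 5.1], [5.1, 5.1],
    [20.0, 20.0],
])

clst = DBSCAN(eps=0.5, min_samples=3)
labels = clst.fit_predict(X)        # same values as clst.labels_

print(labels)                       # cluster index per sample, -1 marks noise
print(clst.core_sample_indices_)    # row positions of the core samples in X
print(clst.components_)             # the core samples themselves (a copy)
```

Here the isolated point receives the label -1, and components_ holds exactly the rows of X selected by core_sample_indices_.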

import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs
from sklearn import cluster
from sklearn.metrics import adjusted_rand_score
from sklearn import mixture

Generating the data

def create_data(centers,num=100,std=0.7):
    X,labels_true=make_blobs(n_samples=num,centers=centers,cluster_std=std)
    return X,labels_true

Viewing the generated sample points

def plot_data(*data):
    X,labels_true=data
    labels=np.unique(labels_true)
    fig=plt.figure()
    ax=fig.add_subplot(1,1,1)
    colors='rgbyckm'
    for i,label in enumerate(labels):
        position=labels_true==label
        ax.scatter(X[position,0],X[position,1],label='cluster %d'%label,color=colors[i%len(colors)])
    ax.legend(loc='best',framealpha=0.5)
    ax.set_xlabel('X[0]')
    ax.set_ylabel('X[1]')
    ax.set_title('data')
    plt.show()

X,labels_true=create_data([[1,1],[2,2],[1,2],[10,20]],1000,0.5)
plot_data(X,labels_true)

Using DBSCAN

def test_DBSCAN(*data):
    X,labels_true=data
    clst=cluster.DBSCAN()
    predicted_labels=clst.fit_predict(X)
    print('ARI:%s'%adjusted_rand_score(labels_true,predicted_labels))
    print('Core sample num:%d'%len(clst.core_sample_indices_))

centers=[[1,1],[2,2],[1,2],[10,20]]
X,labels_true=create_data(centers,1000,0.5)
test_DBSCAN(X,labels_true)
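Two details help when reading the output of test_DBSCAN: ARI compares groupings rather than label names, and DBSCAN marks noise with -1. The sketch below illustrates both on hand-written labels; the helper summarize_labels is a hypothetical name introduced here, not part of scikit-learn or of the original post.

```python
import numpy as np
from sklearn.metrics import adjusted_rand_score

def summarize_labels(labels):
    """Return (number of clusters, number of noise points) for DBSCAN-style labels."""
    labels = np.asarray(labels)
    n_noise = int(np.sum(labels == -1))
    # -1 is the noise marker, so it does not count as a cluster.
    n_clusters = len(set(labels.tolist()) - {-1})
    return n_clusters, n_noise

labels_true = [0, 0, 0, 1, 1, 1]
renamed = [1, 1, 1, 0, 0, 0]   # same grouping, cluster names swapped

ari = adjusted_rand_score(labels_true, renamed)
summary = summarize_labels([0, 0, 1, 1, -1, -1, 2])

print(ari)       # 1.0: identical partitions score 1 regardless of naming
print(summary)   # (3, 2): three clusters and two noise points
```

Because ARI ignores label names, the predicted labels from fit_predict can be compared with labels_true directly, without matching cluster indices first.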

Effect of the ϵ parameter

def test_DBSCAN_epsilon(*data):
    X,labels_true=data
    epsilons=np.logspace(-1,1.5)
    ARIs=[]
    Core_nums=[]
    for epsilon in epsilons:
        clst=cluster.DBSCAN(eps=epsilon)
        predicted_labels=clst.fit_predict(X)
        ARIs.append(adjusted_rand_score(labels_true,predicted_labels))
        Core_nums.append(len(clst.core_sample_indices_))

    fig=plt.figure()
    ax=fig.add_subplot(1,2,1)
    ax.plot(epsilons,ARIs,marker='+')
    ax.set_xscale('log')
    ax.set_xlabel(r'$\epsilon$')
    ax.set_ylim(0,1)
    ax.set_ylabel('ARI')

    ax=fig.add_subplot(1,2,2)
    ax.plot(epsilons,Core_nums,marker='o')
    ax.set_xscale('log')
    ax.set_xlabel(r'$\epsilon$')
    ax.set_ylabel('Core_Nums')

    fig.suptitle('DBSCAN')
    plt.show()

test_DBSCAN_epsilon(X,labels_true)
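The curves produced above can be cross-checked with a plain text sweep. This is a rough sketch under stated assumptions: the eps values are hand-picked, random_state=0 is added only to make the run repeatable, and the cluster and noise counts are extra diagnostics that the original plots do not show.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs

centers = [[1, 1], [2, 2], [1, 2], [10, 20]]
# random_state is an assumption added here for repeatability.
X, _ = make_blobs(n_samples=1000, centers=centers, cluster_std=0.5,
                  random_state=0)

rows = []
for eps in [0.1, 0.3, 0.5, 1.0, 3.0, 30.0]:
    clst = DBSCAN(eps=eps).fit(X)
    labels = clst.labels_
    n_clusters = len(set(labels.tolist()) - {-1})   # -1 is noise, not a cluster
    n_noise = int(np.sum(labels == -1))
    n_core = len(clst.core_sample_indices_)
    rows.append((eps, n_clusters, n_noise, n_core))

for eps, n_clusters, n_noise, n_core in rows:
    print('eps=%5.1f clusters=%d noise=%d core=%d'
          % (eps, n_clusters, n_noise, n_core))
```

A larger eps can only enlarge each neighborhood, so the number of core samples never decreases along the sweep. Once eps exceeds the distance between the farthest blobs, every sample is merged into a single cluster, which is why ARI drops for large eps.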

Effect of the MinPts parameter

def test_DBSCAN_min_samples(*data):
    X,labels_true=data
    min_samples=range(1,100)
    ARIs=[]
    Core_nums=[]
    for num in min_samples:
        clst=cluster.DBSCAN(min_samples=num)
        predicted_labels=clst.fit_predict(X)
        ARIs.append(adjusted_rand_score(labels_true,predicted_labels))
        Core_nums.append(len(clst.core_sample_indices_))

    fig=plt.figure()
    ax=fig.add_subplot(1,2,1)
    ax.plot(min_samples,ARIs,marker='+')
    ax.set_xlabel('min_samples')
    ax.set_ylim(0,1)
    ax.set_ylabel('ARI')

    ax=fig.add_subplot(1,2,2)
    ax.plot(min_samples,Core_nums,marker='o')
    ax.set_xlabel('min_samples')
    ax.set_ylabel('Core_Nums')

    fig.suptitle('DBSCAN')
    plt.show()

test_DBSCAN_min_samples(X,labels_true)
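The shape of the core-sample curve follows from how min_samples is counted: scikit-learn includes the sample itself in its own neighborhood. A short sketch of the consequence, assuming the same blob layout as above with random_state=0 added for repeatability and a hand-picked list of min_samples values:

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs

centers = [[1, 1], [2, 2], [1, 2], [10, 20]]
# random_state is an assumption added here for repeatability.
X, _ = make_blobs(n_samples=1000, centers=centers, cluster_std=0.5,
                  random_state=0)

core_counts = []
noise_counts = []
for m in [1, 5, 20, 50, 99]:
    clst = DBSCAN(min_samples=m).fit(X)   # eps stays at its default, 0.5
    core_counts.append(len(clst.core_sample_indices_))
    noise_counts.append(int(np.sum(clst.labels_ == -1)))

print(core_counts)    # starts at 1000 and never increases
print(noise_counts)   # starts at 0
```

With min_samples=1 every sample qualifies as a core sample on its own, so there is no noise at all. Raising min_samples makes the core condition stricter, so the core count can only shrink.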
