An analysis of sklearn.DBSCAN

sklearn version

  • scikit-learn 0.23.2

sklearn.DBSCAN usage examples

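  • Example 1: Iris, the iris flower dataset (UC Irvine Machine Learning Repository)

    Iris can be loaded directly from the sklearn package and is commonly used as a training dataset for classification. To make the clustering easy to visualize (two dimensions plot clearly on a plane), only the first two features of Iris are used for clustering. The previous article (the sklearn.KMeans analysis) ran this experiment with KMeans; here it is repeated with DBSCAN. See Iris_DBSCAN.py for the code. The left figure shows all data points, and the right figure shows the DBSCAN result (eps = 0.3, min_samples = 5), with each color marking one cluster. The parameters were tuned to approximate the KMeans result, because in my opinion KMeans clusters this dataset slightly better.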
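
Iris_DBSCAN.py itself is not reproduced in this article. The following is only a minimal sketch of the experiment described above, assuming the "first two dimensions" are the first two columns of `load_iris().data` (sepal length and sepal width). Plotting is omitted, and the summary counts are an illustrative addition.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import load_iris

# Keep only the first two features (sepal length, sepal width).
X = load_iris().data[:, :2]

# Same parameters as quoted in the text.
iris_dbscan = DBSCAN(eps=0.3, min_samples=5)
labels = iris_dbscan.fit_predict(X)

# Label -1 marks noise; every other label is a cluster id.
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
n_noise = int(np.sum(labels == -1))
print("clusters:", n_clusters, "noise points:", n_noise)
```

Unlike KMeans, the number of clusters is not specified anywhere; it follows from eps and min_samples, which is why the parameters had to be tuned to resemble the KMeans result.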

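  • Example 2: RandomData, randomly generated crescent-shaped data (a non-convex dataset)

    This example shows the advantage of DBSCAN on non-convex datasets. See RandomData_DBSCAN.py for the code. The leftmost figure shows the randomly generated crescent-shaped points. The other two figures show the KMeans result (n_clusters=2) and the DBSCAN result (eps = 0.1, min_samples = 10).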
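
RandomData_DBSCAN.py is not shown here, so the sketch below is an assumption about how such a comparison might look. It uses `sklearn.datasets.make_moons` to generate the crescents; the sample count, noise level and random seed are illustrative choices, not values from the original script. The adjusted Rand index against the generating labels stands in for the visual comparison in the figures.

```python
from sklearn.cluster import DBSCAN, KMeans
from sklearn.datasets import make_moons
from sklearn.metrics import adjusted_rand_score

# Two interleaved half-moons; n_samples, noise and random_state are assumed values.
X, y_true = make_moons(n_samples=1000, noise=0.05, random_state=0)

# Parameters quoted in the text.
kmeans_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
dbscan_labels = DBSCAN(eps=0.1, min_samples=10).fit_predict(X)

# Adjusted Rand index: 1.0 means the clustering matches the true moons exactly.
ari_kmeans = adjusted_rand_score(y_true, kmeans_labels)
ari_dbscan = adjusted_rand_score(y_true, dbscan_labels)
print("KMeans ARI:", round(ari_kmeans, 3))
print("DBSCAN ARI:", round(ari_dbscan, 3))
```

KMeans splits the plane with a straight boundary and so cuts each crescent in two, while DBSCAN follows the dense arcs.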

An analysis of sklearn.DBSCAN

This section analyzes the main functions in sklearn.DBSCAN.

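  • The DBSCAN class

    Import: from sklearn.cluster import DBSCAN
    Description: constructs a DBSCAN clusterer whose methods perform the clustering. For the parameters set at initialization, see the constructor.
    Fitted attributes (available once fitting, and hence clustering, is complete):
    self.core_sample_indices_		# indices of the core samples in the training data
    self.components_				# the core samples themselves
    self.labels_					# cluster label of each training sample after fitting (-1 marks noise)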
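
To make the three fitted attributes concrete, here is a small sketch on a hand-made dataset (the points and parameters are illustrative assumptions, chosen so that the result is easy to verify by hand): four points packed closely together and one isolated point.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Four tightly packed points plus one far-away outlier.
X = np.array([[0.0, 0.0], [0.0, 0.1], [0.1, 0.0], [0.1, 0.1], [5.0, 5.0]])

model = DBSCAN(eps=0.3, min_samples=3).fit(X)

core_idx = model.core_sample_indices_  # positions of core samples in X
core_pts = model.components_           # the core samples themselves
labels = model.labels_                 # one label per row of X, -1 for noise

print(core_idx.tolist())  # [0, 1, 2, 3]
print(labels.tolist())    # [0, 0, 0, 0, -1]

# components_ holds exactly the rows of X selected by core_sample_indices_.
print(np.array_equal(core_pts, X[core_idx]))  # True
```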
    
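  • Constructor __init__(), called as iris_dbscan = DBSCAN(eps = 0.3, min_samples = 5)

    Description: the constructor of the DBSCAN class. The call above passes two arguments; all other parameters keep their defaults.
    Main code (__init__):

    self.eps = eps						# neighborhood radius, 0.3 in the call above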
    self
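
The constructor listing above is cut off in the source. Rather than guess at the remaining lines, the sketch below inspects the constructed object through `get_params()`, the standard scikit-learn estimator method, to show which values were passed and which defaults were kept. It also checks that constructing the object does not perform any clustering.

```python
from sklearn.cluster import DBSCAN

iris_dbscan = DBSCAN(eps=0.3, min_samples=5)

# get_params() returns the constructor arguments stored on the estimator.
params = iris_dbscan.get_params()
print(params["eps"], params["min_samples"])   # values passed in the call
print(params["metric"], params["algorithm"], params["leaf_size"])  # defaults

# Fitted attributes such as labels_ only exist after fit() has been called.
print(hasattr(iris_dbscan, "labels_"))  # False
```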
DBSCAN (Density-Based Spatial Clustering of Applications with Noise) is a density-based clustering algorithm: it groups points that lie in dense regions and treats points in sparse regions as noise. It can find clusters of arbitrary shape.

The algorithm takes two parameters: epsilon (ε), the neighborhood radius, and min_samples, the minimum number of points required to form a dense region. It picks an arbitrary unvisited point and finds all points within distance ε of it. If that neighborhood contains at least min_samples points, the point is a core point and a new cluster is started. Otherwise the point is provisionally labeled as noise; it may still join a cluster later as a border point if it lies within ε of some core point. The algorithm then examines the neighbors of each point in the cluster and keeps expanding the cluster through every neighbor that is itself a core point. This continues until every point has been assigned to a cluster or labeled as noise.

Compared with algorithms such as K-means and hierarchical clustering, DBSCAN has several advantages: it does not require the number of clusters in advance, it labels outliers explicitly as noise, and it can identify clusters of arbitrary shape. However, it is sensitive to the choice of ε and min_samples, and it may not work well when clusters have very different densities, because a single ε cannot fit all of them.

In scikit-learn the algorithm is implemented by the sklearn.cluster.DBSCAN class. Typical applications include image segmentation, anomaly detection, and customer segmentation.
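
The procedure described above can be sketched without scikit-learn. The code below is a brute-force illustration, not scikit-learn's implementation: it assumes Euclidean distance, counts a point as its own neighbor (as scikit-learn does), and computes all pairwise distances up front. `dbscan_sketch` and the sample points are hypothetical names and data introduced for this example.

```python
import numpy as np

NOISE = -1
UNVISITED = -2


def dbscan_sketch(X, eps, min_samples):
    """Brute-force DBSCAN: one label per row of X, -1 for noise."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    # All pairwise Euclidean distances; O(n^2) memory, fine for small inputs.
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    # neighbors[i] includes i itself.
    neighbors = [np.flatnonzero(dists[i] <= eps) for i in range(n)]
    is_core = np.array([len(nb) >= min_samples for nb in neighbors])

    labels = np.full(n, UNVISITED)
    cluster_id = 0
    for i in range(n):
        # Only an unlabeled core point can start a new cluster.
        if labels[i] != UNVISITED or not is_core[i]:
            continue
        labels[i] = cluster_id
        stack = [i]
        while stack:
            p = stack.pop()
            if not is_core[p]:
                continue  # border point: joins the cluster but does not expand it
            for q in neighbors[p]:
                if labels[q] == UNVISITED:
                    labels[q] = cluster_id
                    stack.append(q)
        cluster_id += 1

    # Anything never reached from a core point is noise.
    labels[labels == UNVISITED] = NOISE
    return labels


points = np.array([
    [0.0, 0.0], [0.0, 0.1], [0.1, 0.0], [0.1, 0.1],  # dense group
    [5.0, 5.0], [5.0, 5.1], [5.1, 5.0],              # second dense group
    [9.0, 9.0],                                      # isolated point
])
result = dbscan_sketch(points, eps=0.3, min_samples=3)
print(result.tolist())  # [0, 0, 0, 0, 1, 1, 1, -1]
```

Precomputing the core mask removes the need for the "provisional noise" bookkeeping: a non-core point is labeled only if some core point reaches it, and is otherwise marked as noise at the end.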
