DBSCAN
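This clustering algorithm groups points according to the density of their neighbourhoods. Clusters do not have to be convex:
a cluster is a dense region of any shape, and points lying in low-density regions are treated as noise.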
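The parameters min_samples and eps define what counts as a core point: a point is a core point if at least
min_samples points (in scikit-learn, counting the point itself) lie within radius eps of it.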
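The core-point rule above can be checked by hand on a tiny data set. The following is only a brute-force sketch, not scikit-learn's implementation: the helper name `core_point_mask` and the toy coordinates are made up for illustration, and it assumes Euclidean distance with the point itself counted as a neighbour.

```python
import numpy as np

def core_point_mask(X, eps, min_samples):
    """Return a boolean mask marking the core points of X (brute force, O(n^2))."""
    # Pairwise Euclidean distances between all points.
    diffs = X[:, None, :] - X[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    # Count neighbours within eps; each point counts itself (distance 0).
    neighbour_counts = (dists <= eps).sum(axis=1)
    return neighbour_counts >= min_samples

# Four points packed closely together and one isolated point (toy data).
X_toy = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1], [5.0, 5.0]])
toy_mask = core_point_mask(X_toy, eps=0.2, min_samples=3)
print(toy_mask)
```

The four packed points each have four neighbours within eps, so they are core points; the isolated point only has itself and is not.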
In the resulting labels, -1 marks noise points and non-negative integers mark cluster membership.
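To make the labelling and the non-convexity point concrete, here is a small sketch on synthetic data: two concentric rings (neither of which is a convex set) plus one far-away point. The `ring` helper, the radii, and the eps and min_samples values are assumptions chosen so that neighbouring points on each ring fall within eps of each other while the rings stay well separated.

```python
import numpy as np
from sklearn.cluster import DBSCAN

def ring(radius, n_points):
    """Return n_points evenly spaced on a circle of the given radius."""
    angles = np.linspace(0.0, 2.0 * np.pi, n_points, endpoint=False)
    return np.column_stack([radius * np.cos(angles), radius * np.sin(angles)])

inner = ring(1.0, 100)             # points 0..99
outer = ring(3.0, 300)             # points 100..399
outlier = np.array([[10.0, 10.0]]) # point 400, far from both rings
X_rings = np.vstack([inner, outer, outlier])

ring_labels = DBSCAN(eps=0.3, min_samples=5).fit(X_rings).labels_

print("labels on inner ring:", set(ring_labels[:100]))
print("labels on outer ring:", set(ring_labels[100:400]))
print("label of the outlier:", ring_labels[-1])
```

Each ring should come out as a single cluster with its own non-negative label, and the isolated point should be labelled -1.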
numpy.zeros_like:
Returns an array of zeros with the same shape as the given array (and the same dtype, unless dtype is specified).
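A short sketch of how zeros_like is typically used to build a boolean mask from a list of indices. The label values and the index array `core_indices` here are made-up illustration data, not output from a real model.

```python
import numpy as np

example_labels = np.array([0, 0, 1, -1, 1])  # made-up cluster labels
core_indices = np.array([0, 2, 4])           # made-up indices of core samples

# Same shape as example_labels, but boolean and initially all False.
example_mask = np.zeros_like(example_labels, dtype=bool)
# Flip the entries at the given indices to True.
example_mask[core_indices] = True

print(example_mask)  # [ True False  True False  True]
```

Without the dtype argument, the result would be an integer array of zeros matching the dtype of `example_labels`.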
Below is an example (large dots are core points, small dots are border points, black dots are noise points):
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn import metrics
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler
# Generate three Gaussian blobs and standardize the features.
centers = [[1, 1], [-1, -1], [1, -1]]
X, labels_true = make_blobs(n_samples=750, centers=centers, cluster_std=0.4, random_state=0)
X = StandardScaler().fit_transform(X)
# Fit DBSCAN and build a boolean mask that is True for core samples.
db = DBSCAN(eps=0.3, min_samples=10).fit(X)
core_samples_mask = np.zeros_like(db.labels_, dtype=bool)
core_samples_mask[db.core_sample_indices_] = True
labels = db.labels_
# Number of clusters, ignoring the noise label (-1) if present.
n_clusters_ = len(set(labels)) - (1 if -1 in labels else 0)
print("Estimated number of clusters: %d" % n_clusters_)
print("Homogeneity: %0.3f" % metrics.homogeneity_score(labels_true, labels))
print("Completeness: %0.3f" % metrics.completeness_score(labels_true, labels))
print("V-measure: %0.3f" % metrics.v_measure_score(labels_true, labels))
print("Adjusted Rand Index: %0.3f" % metrics.adjusted_rand_score(labels_true, labels))
print("Adjusted Mutual Information: %0.3f" % metrics.adjusted_mutual_info_score(labels_true, labels))
print("Silhouette Coefficient: %0.3f" % metrics.silhouette_score(X, labels))
import matplotlib.pyplot as plt
unique_labels = set(labels)
colors = plt.cm.Spectral(np.linspace(0, 1, len(unique_labels)))
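for k, col in zip(unique_labels, colors):
    if k == -1:
        # Noise points are drawn in black (RGBA).
        col = [0, 0, 0, 1]
    class_member_mask = (labels == k)
    # Core points of this cluster: large markers.
    xy = X[class_member_mask & core_samples_mask]
    plt.plot(xy[:, 0], xy[:, 1], 'o', markerfacecolor=tuple(col),
             markeredgecolor='k', markersize=14)
    # Non-core points (border points, or noise when k == -1): small markers.
    # Use ~ to invert a boolean mask; unary minus is not valid on boolean arrays.
    xy = X[class_member_mask & ~core_samples_mask]
    plt.plot(xy[:, 0], xy[:, 1], 'o', markerfacecolor=tuple(col),
             markeredgecolor='k', markersize=6)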
plt.title("Estimated number of clusters: %d" % n_clusters_)
plt.show()