sklearn clustering algorithm: Affinity Propagation

AffinityPropagation creates clusters by sending messages between pairs of samples until convergence. A dataset is then described using a small number of exemplars, which are identified as those most representative of other samples. The messages sent between pairs represent the suitability for one sample to be the exemplar of the other, which is updated in response to the values from other pairs. This updating happens iteratively until convergence, at which point the final exemplars are chosen, and hence the final clustering is given.
Affinity Propagation is interesting because it chooses the number of clusters based on the data provided. Two parameters matter most here: the preference, which controls how many exemplars are used, and the damping factor, which damps the responsibility and availability messages to avoid numerical oscillations when these messages are updated.
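To make the role of the preference concrete, here is a minimal sketch that fits the estimator several times on the blob data used later in this post and counts the exemplars each time. The helper name `count_exemplars` and the three preference values are assumptions chosen for illustration, and the `random_state` argument assumes a reasonably recent scikit-learn release (older releases, such as the one whose repr is shown further down, do not accept it).

```python
from sklearn.cluster import AffinityPropagation
from sklearn.datasets import make_blobs

# Same synthetic data as the example later in this post.
centers = [[1, 1], [-1, -1], [1, -1]]
X, _ = make_blobs(n_samples=300, centers=centers, cluster_std=0.5,
                  random_state=0)


def count_exemplars(preference, damping=0.5):
    """Fit Affinity Propagation and return how many exemplars it selected."""
    model = AffinityPropagation(preference=preference, damping=damping,
                                random_state=0)
    model.fit(X)
    return len(model.cluster_centers_indices_)


# Hypothetical preference values, picked only to show the trend.
counts = {pref: count_exemplars(pref) for pref in (-200, -50, -5)}
for pref, n in counts.items():
    print(f"preference={pref}: {n} exemplars")
```

In general a lower (more negative) preference tends to give fewer exemplars and a higher one tends to give more, although the exact counts depend on the data and on whether the run converges.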

Algorithm principle

The messages sent between points belong to one of two categories.
The first is the responsibility 𝑟(𝑖, 𝑘), which is the accumulated evidence that sample 𝑘 should be the exemplar for sample 𝑖.
The second is the availability 𝑎(𝑖, 𝑘) which is the accumulated evidence that sample 𝑖 should choose sample 𝑘 to be its exemplar, and considers the values for all other samples that 𝑘 should be an exemplar.
In this way, exemplars are chosen by samples if they are (1) similar enough to many samples and (2) chosen by many samples to be representative of themselves.

More formally, the responsibility of a sample 𝑘 to be the exemplar of sample 𝑖 is given by:
$$r(i, k) \leftarrow s(i, k) - \max \left[ a(i, k') + s(i, k') \;\forall k' \neq k \right]$$

Where 𝑠(𝑖, 𝑘) is the similarity between samples 𝑖 and 𝑘. The availability of sample 𝑘 to be the exemplar of sample 𝑖 is given by:
$$a(i, k) \leftarrow \min \left[ 0,\; r(k, k) + \sum_{i'~s.t.~i' \notin \{i, k\}} r(i', k) \right]$$

To begin with, all values of 𝑟 and 𝑎 are set to zero, and the two updates are repeated until convergence. As discussed above, to avoid numerical oscillations when updating the messages, the damping factor 𝜆 is introduced into the iteration process:

$$r_{t+1}(i, k) = \lambda \cdot r_{t}(i, k) + (1-\lambda) \cdot r_{t+1}(i, k)$$
$$a_{t+1}(i, k) = \lambda \cdot a_{t}(i, k) + (1-\lambda) \cdot a_{t+1}(i, k)$$
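where 𝑡 is the iteration index. On the right-hand side, the terms with index 𝑡+1 are the newly computed, undamped messages; the left-hand side is the damped value that is actually kept.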
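The update rules above can be sketched in NumPy as follows. This is an illustrative toy, not scikit-learn's implementation: the function name `affinity_propagation_sketch` is hypothetical, it runs a fixed number of iterations instead of checking convergence, and it adds no tie-breaking noise. One further assumption: following the usual Frey and Dueck formulation, the availability sum only counts positive responsibilities, max(0, r(i', k)), and the self-availability a(k, k) is not clipped at zero. The diagonal of the similarity matrix holds the preference, set here to the median similarity.

```python
import numpy as np


def affinity_propagation_sketch(S, damping=0.5, n_iter=200):
    """Toy message passing on a similarity matrix S (preference on the diagonal)."""
    n = S.shape[0]
    rows = np.arange(n)
    R = np.zeros((n, n))  # responsibilities r(i, k)
    A = np.zeros((n, n))  # availabilities a(i, k)
    for _ in range(n_iter):
        # r(i, k) <- s(i, k) - max over k' != k of [a(i, k') + s(i, k')]
        AS = A + S
        best = AS.argmax(axis=1)
        best_val = AS[rows, best]
        AS[rows, best] = -np.inf
        second_val = AS.max(axis=1)
        R_new = S - best_val[:, None]
        R_new[rows, best] = S[rows, best] - second_val
        R = damping * R + (1 - damping) * R_new

        # a(i, k) <- min(0, r(k, k) + sum over i' not in {i, k} of max(0, r(i', k)))
        Rp = np.maximum(R, 0)
        Rp[rows, rows] = R[rows, rows]       # keep r(k, k) unclipped
        A_new = Rp.sum(axis=0)[None, :] - Rp  # drop sample i's own term
        self_avail = A_new[rows, rows].copy()  # a(k, k) is not clipped
        A_new = np.minimum(A_new, 0)
        A_new[rows, rows] = self_avail
        A = damping * A + (1 - damping) * A_new

    # A sample is an exemplar when r(k, k) + a(k, k) > 0.
    exemplars = np.flatnonzero(np.diag(A + R) > 0)
    if exemplars.size == 0:
        return exemplars, np.full(n, -1)
    labels = S[:, exemplars].argmax(axis=1)
    labels[exemplars] = np.arange(exemplars.size)
    return exemplars, labels


# Six points in two well-separated columns.
X = np.array([[1, 2], [1, 4], [1, 0], [4, 2], [4, 4], [4, 0]], dtype=float)
S = -((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)  # negative squared distance
np.fill_diagonal(S, np.median(S))                         # preference = median similarity
exemplars, labels = affinity_propagation_sketch(S)
print(exemplars, labels)
```

The damping lines are the direct counterpart of the two damped equations: the old message is weighted by 𝜆 and the freshly computed one by 1 − 𝜆.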

Implementation


Example code 1

from sklearn.cluster import AffinityPropagation
from sklearn import metrics
from sklearn.datasets import make_blobs

# #############################################################################
# Generate sample data
centers = [[1, 1], [-1, -1], [1, -1]]
X, labels_true = make_blobs(n_samples=300, centers=centers, cluster_std=0.5,
                            random_state=0)
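# X.shape == (300, 2): the coordinates of the 300 samples
# labels_true.shape == (300,): the true class of each sample in X, with values 0, 1, 2
# Affinity Propagation is unsupervised, so the true labels are not used for fitting;
# they are only used by the evaluation metrics below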

# #############################################################################
# Compute Affinity Propagation
af = AffinityPropagation(preference=-50).fit(X)
# af is the fitted estimator object; the clustering results are stored in its attributes


# Coordinates of the cluster centers (exemplars):
print('cluster centers: ', af.cluster_centers_)
#cluster centers:  [[ 1.03325861  1.15123595]
# [ 0.93494652 -0.95302339]
# [-1.18459092 -1.11968959]]

# Indices in X of the samples chosen as cluster centers
cluster_centers_indices = af.cluster_centers_indices_
# array([160, 250, 272], dtype=int64)

# Cluster label assigned by the algorithm to each sample in X
labels = af.labels_
# labels.shape=(300,)

# Number of clusters, i.e. how many groups the algorithm split X into
n_clusters_ = len(cluster_centers_indices)

# Evaluate the quality of the clustering
print('Estimated number of clusters: %d' % n_clusters_)
print("Homogeneity: %0.3f" % metrics.homogeneity_score(labels_true, labels))
print("Completeness: %0.3f" % metrics.completeness_score(labels_true, labels))
print("V-measure: %0.3f" % metrics.v_measure_score(labels_true, labels))
print("Adjusted Rand Index: %0.3f"
      % metrics.adjusted_rand_score(labels_true, labels))
print("Adjusted Mutual Information: %0.3f"
      % metrics.adjusted_mutual_info_score(labels_true, labels))
print("Silhouette Coefficient: %0.3f"
      % metrics.silhouette_score(X, labels, metric='sqeuclidean'))

# #############################################################################
# Plot result
import matplotlib.pyplot as plt
from itertools import cycle

plt.close('all')
plt.figure(1)
plt.clf()

colors = cycle('bgrcmykbgrcmykbgrcmykbgrcmyk')
for k, col in zip(range(n_clusters_), colors):
    class_members = labels == k
    cluster_center = X[cluster_centers_indices[k]]
    plt.plot(X[class_members, 0], X[class_members, 1], col + '.')
    plt.plot(cluster_center[0], cluster_center[1], 'o', markerfacecolor=col,
             markeredgecolor='k', markersize=14)
    for x in X[class_members]:
        plt.plot([cluster_center[0], x[0]], [cluster_center[1], x[1]], col)

plt.title('Estimated number of clusters: %d' % n_clusters_)
plt.show()

Output:

Estimated number of clusters: 3
Homogeneity: 0.872
Completeness: 0.872
V-measure: 0.872
Adjusted Rand Index: 0.912
Adjusted Mutual Information: 0.871
Silhouette Coefficient: 0.753

Plot:
[Figure: the estimated clusters, with each sample joined by a line to its exemplar]

af
AffinityPropagation(affinity='euclidean', convergence_iter=15, copy=True,
                    damping=0.5, max_iter=200, preference=-50, verbose=False)

Example code 2

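>>> from sklearn.cluster import AffinityPropagation
>>> import numpy as np
>>> X = np.array([[1, 2], [1, 4], [1, 0], [4, 2], [4, 4], [4, 0]])
>>> clustering = AffinityPropagation(preference=-3).fit(X)
>>> clustering
AffinityPropagation(affinity='euclidean', convergence_iter=15, copy=True,
                    damping=0.5, max_iter=200, preference=-3, verbose=False)
>>> # Outputs below are as reported in the original post. Caution: this grouping is
>>> # what the default preference (the median similarity) gives for this data; with
>>> # preference=-3 every sample prefers itself over its nearest neighbour
>>> # (similarity -4), so a run may instead return six singleton clusters.
>>> clustering.labels_
array([0, 0, 0, 1, 1, 1])
>>> # Methods: predict assigns new input samples to one of the clusters
>>> clustering.predict([[0, 0], [4, 4]])
array([0, 1])
>>> # Methods: get_params returns the parameters of clustering
>>> clustering.get_params()
{'affinity': 'euclidean',
 'convergence_iter': 15,
 'copy': True,
 'damping': 0.5,
 'max_iter': 200,
 'preference': -3,
 'verbose': False}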
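
As a rough illustration of what predict does, the sketch below compares it with a manual nearest-exemplar lookup on the same six points. It assumes that predict assigns each new sample to the closest cluster center in Euclidean distance, which is my understanding of scikit-learn's behaviour. The default preference (rather than -3), the `random_state` argument, and the `new_points` array are assumptions made for this sketch.

```python
import numpy as np
from sklearn.cluster import AffinityPropagation

X = np.array([[1, 2], [1, 4], [1, 0], [4, 2], [4, 4], [4, 0]])
model = AffinityPropagation(random_state=0).fit(X)  # default preference

# Hypothetical new samples to classify.
new_points = np.array([[0, 0], [4, 4], [2, 3]])

# Distance from every new point to every exemplar, then pick the closest one.
diffs = new_points[:, None, :] - model.cluster_centers_[None, :, :]
dists = np.linalg.norm(diffs, axis=2)
manual_labels = dists.argmin(axis=1)

print(model.predict(new_points))
print(manual_labels)
```

If the assumption holds, both printed arrays are identical, which shows that no message passing is rerun at prediction time.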