AP algorithm:
http://blog.csdn.net/lixi__liu/article/details/48470173
Two important parameters:
Damping: a larger damping factor reduces oscillation but results in more computing time.
Preference: a larger preference results in more clusters.
The preferences are the main diagonal elements of the similarity matrix.
Rather than requiring that the number of clusters be prespecified, affinity propagation takes as input a real number s(k,k)
for each data point k so that data points with larger values of s(k,k) are more likely to be chosen as exemplars.
These values are referred to as “preferences.”
-- Brendan J. Frey and Delbert Dueck, "Clustering by Passing Messages Between Data Points", Science, 2007
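To make the roles of damping and preference concrete, here is a minimal NumPy sketch of the responsibility/availability message passing described in the paper. It is a simplified illustration, not scikit-learn's code: the function name affinity_propagation_sketch and the toy data are made up for this note, and the small random noise that real implementations add to break ties is left out.

```python
import numpy as np

def affinity_propagation_sketch(S, preference, damping=0.5,
                                max_iter=200, convergence_iter=15):
    """Hypothetical helper: returns (exemplar_indices, labels) for similarity matrix S."""
    S = np.array(S, dtype=float)      # work on a copy of the input
    n = S.shape[0]
    np.fill_diagonal(S, preference)   # preferences go on the main diagonal s(k,k)
    R = np.zeros((n, n))              # responsibilities r(i,k)
    A = np.zeros((n, n))              # availabilities a(i,k)
    rows = np.arange(n)
    stable, last = 0, None
    for _ in range(max_iter):
        # r(i,k) = s(i,k) - max over k' != k of [a(i,k') + s(i,k')]
        AS = A + S
        top = AS.argmax(axis=1)
        first = AS[rows, top]
        AS[rows, top] = -np.inf
        second = AS.max(axis=1)
        R_new = S - first[:, None]
        R_new[rows, top] = S[rows, top] - second
        R = damping * R + (1 - damping) * R_new   # damped update

        # a(i,k) = min(0, r(k,k) + sum over i' not in {i,k} of max(0, r(i',k)))
        # a(k,k) = sum over i' != k of max(0, r(i',k))
        Rp = np.maximum(R, 0)
        np.fill_diagonal(Rp, R.diagonal())
        A_new = Rp.sum(axis=0)[None, :] - Rp
        diag = A_new.diagonal().copy()
        A_new = np.minimum(A_new, 0)
        np.fill_diagonal(A_new, diag)
        A = damping * A + (1 - damping) * A_new   # damped update

        # point k is an exemplar when a(k,k) + r(k,k) > 0
        exemplars = np.flatnonzero((A + R).diagonal() > 0)
        key = tuple(exemplars)
        stable = stable + 1 if key == last else 1
        last = key
        if stable >= convergence_iter and len(exemplars) > 0:
            break
    if len(exemplars) == 0:
        return exemplars, np.full(n, -1)
    labels = S[:, exemplars].argmax(axis=1)       # assign to most similar exemplar
    labels[exemplars] = np.arange(len(exemplars))
    return exemplars, labels

# Toy data: two tight groups on a line, similarity = negative squared distance.
x = np.array([0.0, 0.1, 0.2, 5.0, 5.1, 5.2])
S = -(x[:, None] - x[None, :]) ** 2
ex_low, labels_low = affinity_propagation_sketch(S, preference=np.median(S))
ex_high, labels_high = affinity_propagation_sketch(S, preference=0.0)
print(len(ex_low), len(ex_high))
```

With the preference set to 0.0, which is above every off-diagonal similarity, each point prefers to be its own exemplar, so the cluster count goes up compared with the median preference. That is the "bigger preference, more clusters" behaviour noted above.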
from sklearn.cluster import AffinityPropagation
parameters:
damping : float, optional, default: 0.5
Damping factor between 0.5 and 1.
convergence_iter : int, optional, default: 15
Number of iterations with no change in the number of estimated clusters that stops the convergence.
max_iter : int, optional, default: 200
Maximum number of iterations.
copy : boolean, optional, default: True
Make a copy of input data.
preference : array-like, shape (n_samples,) or float, optional
Preferences for each point - points with larger values of preferences are more likely to be chosen as exemplars. The number of exemplars, i.e. of clusters, is influenced by the input preferences value. If the preferences are not passed as arguments, they will be set to the median of the input similarities.
affinity : string, optional, default=``euclidean``
Which affinity to use. At the moment precomputed and euclidean are supported. euclidean uses the negative squared euclidean distance between points. If affinity == 'precomputed', you must pass a similarity matrix as input. Otherwise, pass a matrix of examples and features, and AffinityPropagation.fit computes the similarity matrix from the negative squared euclidean distance.
verbose : boolean, optional, default: False
Whether to be verbose.
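A short sketch of the two affinity modes, assuming a recent scikit-learn. The toy blobs are made up for illustration, and random_state is not in the parameter list above: it is assumed here because newer releases accept it, and it keeps the two runs comparable.

```python
import numpy as np
from sklearn.cluster import AffinityPropagation
from sklearn.metrics import euclidean_distances

# Toy data (assumption): three well-separated 2-D blobs of 20 points each.
rng = np.random.RandomState(0)
X = np.vstack([rng.normal(loc=c, scale=0.3, size=(20, 2))
               for c in ((0, 0), (5, 5), (0, 5))])

# affinity='euclidean' (default): pass examples x features,
# fit builds the similarity matrix itself.
ap_features = AffinityPropagation(damping=0.5, random_state=0).fit(X)

# affinity='precomputed': build the similarity matrix yourself and pass it in.
S = -euclidean_distances(X, squared=True)
ap_precomputed = AffinityPropagation(affinity="precomputed", damping=0.5,
                                     random_state=0).fit(S)

# Both runs see the same similarities, so the labels should agree.
print(np.array_equal(ap_features.labels_, ap_precomputed.labels_))
```

The precomputed route is the one to use when the similarity is something other than negative squared euclidean distance.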
attributes:
cluster_centers_indices_ : array, shape (n_clusters,)
Indices of cluster centers
cluster_centers_ : array, shape (n_clusters, n_features)
Cluster centers (if affinity != precomputed). When the input is a similarity matrix there is no cluster_centers_, because a cluster center is an example made up of feature values.
labels_ : array, shape (n_samples,)
Labels of each point
affinity_matrix_ : array, shape (n_samples, n_samples)
Stores the affinity matrix used in fit, i.e. the similarity matrix. It is the original similarity matrix, not the one whose main diagonal elements have been replaced by the preferences.
n_iter_ : int
Number of iterations taken to converge.
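A small sketch of reading these attributes after fitting on a precomputed similarity matrix. The six toy points are made up, and random_state is assumed to be available as in newer scikit-learn releases. Since cluster_centers_ is missing in this mode, the exemplars are recovered from the original feature matrix through cluster_centers_indices_.

```python
import numpy as np
from sklearn.cluster import AffinityPropagation

# Toy data (assumption): two small groups of 2-D points.
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.2, 0.1],
              [5.0, 5.0], [5.1, 5.0], [5.2, 5.1]])
# Similarity = negative squared euclidean distance.
S = -((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)

ap = AffinityPropagation(affinity="precomputed", random_state=0).fit(S)

print(ap.cluster_centers_indices_)      # indices of the exemplars
print(ap.labels_)                       # cluster label of each point
print(ap.affinity_matrix_.shape)        # (n_samples, n_samples)
print(ap.n_iter_)                       # iterations actually run
print(hasattr(ap, "cluster_centers_"))  # not set for precomputed input

# Recover the exemplar feature vectors from the original data.
centers = X[ap.cluster_centers_indices_]
```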
methods:
fit(X[, y])
Create affinity matrix from negative euclidean distances, then apply affinity propagation clustering.
fit_predict(X[, y])
Performs clustering on X and returns cluster labels.
get_params([deep])
Get parameters for this estimator.
predict(X)
Predict the closest cluster each sample in X belongs to.
set_params(**params)
Set the parameters of this estimator.
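A brief usage sketch of fit_predict and predict, again on made-up toy points and assuming random_state is accepted (newer scikit-learn). predict is used here with feature input, since it needs the cluster centers that only exist when affinity is not precomputed.

```python
import numpy as np
from sklearn.cluster import AffinityPropagation

# Toy data (assumption): two small groups of 2-D points.
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.2, 0.1],
              [5.0, 5.0], [5.1, 5.0], [5.2, 5.1]])

ap = AffinityPropagation(random_state=0)
labels = ap.fit_predict(X)            # fit, then return labels_

# Assign unseen points to the nearest exemplar.
new_points = np.array([[0.05, 0.05], [5.05, 5.05]])
new_labels = ap.predict(new_points)

print(labels, new_labels)
print(ap.get_params()["damping"])     # parameters are readable via get_params
```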