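1. The formula behind the library call:
cos_distance(a, b) = 1 - cos(a, b) = 1 - (a · b) / (||a|| ||b||)
The closer two samples a and b are, the smaller their cosine distance.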
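2. Cosine similarity:
The cosine is commonly used as a similarity measure:
cos(a, b) = (a · b) / (||a|| ||b||)
The larger the cosine value, the more similar the two samples are and the closer they lie.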
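3. Avoiding confusion
Many sources refer to cosine similarity as cosine distance, which is wrong. For two samples, the larger the distance between them, the less similar they are; the smaller the distance, the more similar they are. Cosine similarity runs the other way: a larger value means more similar.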
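A minimal NumPy sketch of this distinction. The helper names `cosine_similarity` and `cosine_distance` and the toy vectors are assumptions made for illustration; the distance is taken to be 1 minus the similarity.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between a and b (hypothetical helper)."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def cosine_distance(a, b):
    """Cosine distance, defined here as 1 - cosine similarity."""
    return 1.0 - cosine_similarity(a, b)

# Toy vectors chosen for illustration only.
a = np.array([1.0, 0.0])
same_direction = np.array([2.0, 0.0])   # parallel to a
orthogonal = np.array([0.0, 3.0])       # at 90 degrees to a
opposite = np.array([-1.0, 0.0])        # pointing the other way

for name, v in [("same_direction", same_direction),
                ("orthogonal", orthogonal),
                ("opposite", opposite)]:
    print(name, cosine_similarity(a, v), cosine_distance(a, v))
```

As the vectors turn away from `a`, the similarity falls from 1 through 0 to -1 while the distance rises from 0 through 1 to 2, so the two quantities should never be used interchangeably.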
import numpy as np
from sklearn import datasets
from sklearn.metrics.pairwise import pairwise_distances
from sklearn.neighbors import kneighbors_graph
X, y = datasets.load_iris(return_X_y=True)
def cos(a, b):
    # Cosine similarity: dot product divided by the product of the L2 norms.
    return (a @ b) / np.sqrt((a @ a) * (b @ b))
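# Sparse graph storing, for each sample, the cosine distance to its 2 nearest neighbors.
graph = kneighbors_graph(X, n_neighbors=2, mode='distance', metric='cosine', include_self=False)
# print(graph)
# Each distance is 1 - similarity, so 2 - (sum of the 2 distances) is the summed similarity.
Density = 2 - np.sum(graph.toarray(), axis=1)
print(Density)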
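# Cosine distance between samples 0 and 10, computed by scikit-learn.
cos_distance_0_10 = pairwise_distances(X[[0]], X[[10]], metric='cosine')
# Cosine similarity computed by hand for the pairs (0, 10) and (0, 2).
cos_0_10 = cos(X[0], X[10])
cos_0_2 = cos(X[0], X[2])
print(cos_0_10)               # similarity
print(cos_distance_0_10)      # distance
print(1 - cos_distance_0_10)  # 1 - distance, which should match the similarity
print("----------")
# Equals Density[0] only if samples 10 and 2 are the two nearest neighbors of sample 0.
print(cos_0_10 + cos_0_2)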
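The neighbor-graph step can also be sketched in plain NumPy, which makes the relationship between the summed distances and the summed similarities explicit. This is only an illustration under stated assumptions: `pairwise_cosine_distance` and `knn_density` are hypothetical helper names, the data is a toy array rather than the iris samples, no row may be the zero vector, and all pairs are compared by brute force.

```python
import numpy as np

def pairwise_cosine_distance(X):
    """Return the n x n matrix of cosine distances between the rows of X."""
    X = np.asarray(X, dtype=float)
    unit = X / np.linalg.norm(X, axis=1, keepdims=True)  # rows scaled to unit length
    sim = np.clip(unit @ unit.T, -1.0, 1.0)              # cosine similarities
    return 1.0 - sim

def knn_density(X, k=2):
    """Sum of cosine similarities to the k nearest neighbors of each row."""
    D = pairwise_cosine_distance(X)
    np.fill_diagonal(D, np.inf)            # exclude each sample from its own neighbors
    nearest = np.sort(D, axis=1)[:, :k]    # k smallest distances per row
    return k - nearest.sum(axis=1)         # k - sum(distances) = sum(similarities)

# Toy data for illustration only (not the iris samples).
X_toy = np.array([[1.0, 0.0], [1.0, 1.0], [0.0, 1.0], [2.0, 0.0]])
print(knn_density(X_toy))
```

Because each cosine distance is 1 minus a similarity, subtracting the k smallest distances from k gives the summed similarity to the k nearest neighbors, which is the quantity the `Density` line computes for k = 2.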