# A Summary of 18 Quantities Related to "Distance" and "Similarity"

Euclidean Distance
$$d=\sqrt{\sum_{i=1}^{n}(x_i-y_i)^2}$$
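
A minimal Python sketch of the formula above, assuming both points are given as equal-length sequences of numbers (the function name `euclidean` is illustrative):

```python
import math

def euclidean(x, y):
    """Square root of the sum of squared coordinate differences."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

print(euclidean([0, 0], [3, 4]))  # 5.0
```

Since Python 3.8 the standard library's `math.dist(p, q)` computes the same quantity.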

Manhattan Distance
$$d=\sum_{i=1}^{n}|x_i-y_i|$$
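
The same idea in Python, again assuming equal-length numeric sequences; only the absolute differences are summed, with no squaring or root:

```python
def manhattan(x, y):
    """Sum of absolute coordinate differences (city-block distance)."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    return sum(abs(xi - yi) for xi, yi in zip(x, y))

print(manhattan([0, 0], [3, 4]))  # 7
```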

Minkowski Distance
$$d=\sqrt[p]{\sum_{i=1}^{n}|x_i-y_i|^p}$$
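
A sketch of the general formula, assuming $p \ge 1$ (the range in which it is a metric). Setting $p=1$ recovers the Manhattan distance and $p=2$ the Euclidean distance, which the example checks on the same pair of points:

```python
def minkowski(x, y, p):
    """p-th root of the sum of |x_i - y_i| ** p."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    if p < 1:
        raise ValueError("p must be >= 1 for a metric")
    return sum(abs(xi - yi) ** p for xi, yi in zip(x, y)) ** (1 / p)

a, b = [0, 0], [3, 4]
print(minkowski(a, b, 1))  # 7.0 (Manhattan)
print(minkowski(a, b, 2))  # 5.0 (Euclidean)
```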

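Hamming Distance
Compare two equal-length strings character by character (or bit by bit) and count the positions at which they differ. The smaller the result, the more similar the two elements.

[Figure: 4-bit Hamming distance diagram, borrowed from Wikipedia]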
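
A minimal sketch of both variants described above: a character-level count for equal-length sequences, and a bit-level count for non-negative integers using XOR (differing bits become 1s, which are then counted). Both function names are illustrative:

```python
def hamming(s, t):
    """Number of positions at which two equal-length sequences differ."""
    if len(s) != len(t):
        raise ValueError("Hamming distance needs equal-length inputs")
    return sum(a != b for a, b in zip(s, t))

def hamming_bits(a, b):
    """Bit-level Hamming distance between two non-negative integers."""
    return bin(a ^ b).count("1")

print(hamming("karolin", "kathrin"))   # 3
print(hamming_bits(0b1011, 0b1001))    # 1
```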

Jaccard Coefficient (the corresponding Jaccard distance is $1-J(A,B)$)
$$J(A,B)=\frac{|A\cap B|}{|A\cup B|}$$
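
A sketch using Python sets. Treating two empty sets as identical (returning 1.0) is an assumption of this example, since the formula is 0/0 in that case:

```python
def jaccard(a, b):
    """|A intersect B| / |A union B| for two collections treated as sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # assumed convention: two empty sets count as identical
    return len(a & b) / len(a | b)

A = {1, 2, 3, 4}
B = {3, 4, 5}
print(jaccard(A, B))      # 0.4
print(1 - jaccard(A, B))  # Jaccard distance
```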

Ochiai Coefficient
$$K=\frac{n(A\cap B)}{\sqrt{n(A)\times n(B)}}$$
where $n(\cdot)$ is the number of elements in a set.
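
A sketch on Python sets. Returning 0.0 when either set is empty is an assumption here, because the denominator is then zero:

```python
import math

def ochiai(a, b):
    """n(A intersect B) / sqrt(n(A) * n(B))."""
    a, b = set(a), set(b)
    if not a or not b:
        return 0.0  # assumed convention: undefined for empty sets, report 0
    return len(a & b) / math.sqrt(len(a) * len(b))

print(ochiai({1, 2, 3, 4}, {3, 4, 5}))  # 2 / sqrt(12), about 0.577
```

Encoding each set as a 0/1 indicator vector, this is exactly the cosine similarity of those two vectors.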

Pearson Correlation Coefficient
$$r=\frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^2}\sqrt{\sum_{i=1}^{n}(y_i-\bar{y})^2}}$$
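
A direct transcription of the formula, assuming two equal-length numeric sequences with nonzero variance (a constant sequence makes the denominator zero and raises `ZeroDivisionError`):

```python
import math

def pearson(x, y):
    """Centered cross-products divided by the product of centered norms."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sx = math.sqrt(sum((xi - mx) ** 2 for xi in x))
    sy = math.sqrt(sum((yi - my) ** 2 for yi in y))
    return cov / (sx * sy)

print(pearson([1, 2, 3, 4], [2, 4, 6, 8]))  # close to 1.0 (perfectly linear)
print(pearson([1, 2, 3], [3, 2, 1]))        # close to -1.0
```

From Python 3.10, `statistics.correlation` computes Pearson's r by default.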

Cosine Similarity
$$S=\frac{x\cdot y}{|x|\,|y|}$$
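
A sketch assuming non-zero, equal-length numeric vectors; the dot product is divided by the product of the two Euclidean norms:

```python
import math

def cosine_similarity(x, y):
    """x . y / (|x| |y|); only the angle between the vectors matters."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    dot = sum(xi * yi for xi, yi in zip(x, y))
    nx = math.sqrt(sum(xi * xi for xi in x))
    ny = math.sqrt(sum(yi * yi for yi in y))
    if nx == 0 or ny == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (nx * ny)

print(cosine_similarity([1, 0], [0, 1]))  # 0.0 (orthogonal)
print(cosine_similarity([1, 2], [2, 4]))  # close to 1.0 (same direction)
```

Scaling either vector does not change $S$, which is why it is popular for comparing document term vectors of different lengths.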

Mahalanobis Distance
$$d=\sqrt{(\vec{x}-\vec{y})^T S^{-1}(\vec{x}-\vec{y})}$$
where $S$ is the covariance matrix.
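
A standard-library-only sketch. Instead of inverting $S$ explicitly, it solves $S\,z = \vec{x}-\vec{y}$ by Gaussian elimination and then takes $\sqrt{(\vec{x}-\vec{y})^T z}$, which equals the formula above; the helper `solve` is hypothetical, written for this example:

```python
import math

def solve(a, b):
    """Solve a @ z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if m[pivot][col] == 0:
            raise ValueError("covariance matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):  # eliminate entries below the pivot
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * z[c] for c in range(r + 1, n))
        z[r] = (m[r][n] - s) / m[r][r]
    return z

def mahalanobis(x, y, cov):
    diff = [xi - yi for xi, yi in zip(x, y)]
    z = solve(cov, diff)  # z = S^{-1} (x - y)
    return math.sqrt(sum(d * zi for d, zi in zip(diff, z)))

S = [[4.0, 0.0], [0.0, 1.0]]  # variance 4 along the first axis
print(mahalanobis([2.0, 0.0], [0.0, 0.0], S))  # 1.0 (Euclidean would be 2.0)
```

With the identity matrix as $S$ the result reduces to the Euclidean distance. In SciPy, `scipy.spatial.distance.mahalanobis(u, v, VI)` computes the same value but expects the *inverse* covariance matrix `VI`.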

Kullback-Leibler Divergence (K-L divergence)
$$D(P\|Q)=\sum_{i=1}^{n}P(i)\log\frac{P(i)}{Q(i)}$$
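
A sketch for discrete distributions given as equal-length probability lists. It uses the natural log (result in nats; base 2 would give bits), treats terms with $P(i)=0$ as 0, and returns infinity when $P$ puts mass where $Q$ has none:

```python
import math

def kl_divergence(p, q):
    """Discrete D(P||Q) in nats."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same support")
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0:
            continue  # 0 * log(0 / q) is taken as 0
        if qi == 0:
            return math.inf
        total += pi * math.log(pi / qi)
    return total

P = [0.5, 0.5]
Q = [0.9, 0.1]
print(kl_divergence(P, Q))  # about 0.511
print(kl_divergence(Q, P))  # about 0.368
```

The two calls give different values: K-L divergence is not symmetric, so despite its use as a "distance" it is not a metric.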

PMI (Pointwise Mutual Information)
$$\operatorname{pmi}(x;y)=\log\frac{p(x,y)}{p(x)p(y)}=\log\frac{p(y|x)}{p(y)}$$
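
A sketch that estimates the probabilities from co-occurrence counts. The counts below are hypothetical, chosen only to make the arithmetic easy, and the base-2 log is an assumption (any base works):

```python
import math

def pmi(p_xy, p_x, p_y, base=2):
    """Pointwise mutual information of one outcome pair, from probabilities."""
    return math.log(p_xy / (p_x * p_y), base)

# Hypothetical counts: 1000 sentences, x appears in 100, y in 50, both in 20.
n, c_x, c_y, c_xy = 1000, 100, 50, 20
print(pmi(c_xy / n, c_x / n, c_y / n))  # about 2.0, i.e. log2(4)
```

Here $p(y|x)=20/100=0.2$ and $p(y)=0.05$, so the second form of the formula gives the same ratio of 4. A PMI of 0 means $x$ and $y$ co-occur exactly as often as independence would predict.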

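Normalized Google Distance (NGD)
$$\mathrm{NGD}(x,y)=\frac{\max\{\log f(x),\log f(y)\}-\log f(x,y)}{\log M-\min\{\log f(x),\log f(y)\}}$$
where $f(x)$ is the number of pages containing $x$, $f(x,y)$ the number containing both $x$ and $y$, and $M$ the total number of pages indexed.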
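
A sketch of the formula from raw page counts. The counts in the example are hypothetical, not real search-engine numbers, and all counts must be positive for the logarithms to exist:

```python
import math

def ngd(f_x, f_y, f_xy, m):
    """Normalized Google Distance from page counts f(x), f(y), f(x, y) and total M."""
    lx, ly, lxy = math.log(f_x), math.log(f_y), math.log(f_xy)
    return (max(lx, ly) - lxy) / (math.log(m) - min(lx, ly))

# Hypothetical counts for illustration only.
print(ngd(f_x=1_000_000, f_y=500_000, f_xy=200_000, m=10_000_000_000))  # about 0.16
```

Because both numerator and denominator are differences of logs, the choice of log base cancels out.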

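Levenshtein Distance (Edit Distance)
$$\operatorname{lev}_{a,b}(i,j)=\begin{cases}\max(i,j) & \text{if } \min(i,j)=0,\\ \min\begin{cases}\operatorname{lev}_{a,b}(i-1,j)+1\\ \operatorname{lev}_{a,b}(i,j-1)+1\\ \operatorname{lev}_{a,b}(i-1,j-1)+1_{(a_i\neq b_j)}\end{cases} & \text{otherwise.}\end{cases}$$
The distance between strings $a$ and $b$ is $\operatorname{lev}_{a,b}(|a|,|b|)$, where $1_{(a_i\neq b_j)}$ is 1 if the characters differ and 0 otherwise.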
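
The recurrence can be evaluated bottom-up with dynamic programming. This sketch keeps only the previous and current rows of the $(|a|+1)\times(|b|+1)$ table, since each cell depends only on its left, upper, and upper-left neighbours:

```python
def levenshtein(a, b):
    """Edit distance lev_{a,b}(|a|, |b|) computed row by row."""
    prev = list(range(len(b) + 1))  # lev(0, j) = j
    for i in range(1, len(a) + 1):
        curr = [i] + [0] * len(b)   # lev(i, 0) = i
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # substitution or match
        prev = curr
    return prev[len(b)]

print(levenshtein("kitten", "sitting"))  # 3
```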

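Jaro-Winkler Distance
The Jaro similarity of strings $s_1$ and $s_2$ is
$$d_j=\begin{cases}0 & \text{if } m=0,\\ \frac{1}{3}\left(\frac{m}{|s_1|}+\frac{m}{|s_2|}+\frac{m-t}{m}\right) & \text{otherwise,}\end{cases}$$
where $m$ is the number of matching characters and $t$ is half the number of transpositions. Jaro-Winkler then rewards a common prefix of length $\ell$ (at most 4) with a scaling factor $p$ (commonly 0.1): $d_w=d_j+\ell p(1-d_j)$.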
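
A sketch following the common textbook definition: two characters match if they are equal and at most $\lfloor\max(|s_1|,|s_2|)/2\rfloor-1$ positions apart, and $t$ is half the number of matched characters that appear in a different order. Returning 1.0 for two empty strings is an assumed convention:

```python
def jaro(s1, s2):
    if not s1 and not s2:
        return 1.0  # assumed convention for two empty strings
    window = max(max(len(s1), len(s2)) // 2 - 1, 0)
    match1, match2 = [False] * len(s1), [False] * len(s2)
    m = 0
    for i, ch in enumerate(s1):  # find matches within the window
        for j in range(max(0, i - window), min(len(s2), i + window + 1)):
            if not match2[j] and s2[j] == ch:
                match1[i] = match2[j] = True
                m += 1
                break
    if m == 0:
        return 0.0
    k = out_of_order = 0
    for i in range(len(s1)):  # compare matched characters in order
        if match1[i]:
            while not match2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    t = out_of_order / 2
    return (m / len(s1) + m / len(s2) + (m - t) / m) / 3

def jaro_winkler(s1, s2, p=0.1, max_prefix=4):
    d_j = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:max_prefix], s2[:max_prefix]):
        if a != b:
            break
        prefix += 1
    return d_j + prefix * p * (1 - d_j)

print(round(jaro("MARTHA", "MARHTA"), 4))          # 0.9444
print(round(jaro_winkler("MARTHA", "MARHTA"), 4))  # 0.9611
```

Both functions return similarities in $[0,1]$; the corresponding distance is usually taken as $1-d_w$.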
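
Lee Distance
$$d=\sum_{i=1}^{n}\min\left(|x_i-y_i|,\,q-|x_i-y_i|\right)$$
where the symbols $x_i, y_i$ come from an alphabet of size $q$, i.e. $\{0,1,\dots,q-1\}$.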
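
A sketch assuming both words are equal-length sequences of integers in $\{0,\dots,q-1\}$. Each position contributes the shorter way around a circle of $q$ symbols, which is what distinguishes it from the Manhattan distance:

```python
def lee(x, y, q):
    """Lee distance between two equal-length words over {0, ..., q-1}."""
    if len(x) != len(y):
        raise ValueError("words must have the same length")
    total = 0
    for xi, yi in zip(x, y):
        d = abs(xi - yi)
        total += min(d, q - d)  # go the short way around the cycle
    return total

print(lee([0, 1, 5], [5, 1, 0], q=6))  # 2 (the Manhattan distance would be 10)
```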

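Hellinger Distance
$$H^2(P,Q)=\frac{1}{2}\int\left(\sqrt{\frac{dP}{d\lambda}}-\sqrt{\frac{dQ}{d\lambda}}\right)^2 d\lambda$$
where $\lambda$ is a measure with respect to which both $P$ and $Q$ are absolutely continuous.

When $dP/d\lambda$ and $dQ/d\lambda$ are probability density functions $f$ and $g$, this further simplifies to
$$H^2(P,Q)=1-\int\sqrt{f(x)g(x)}\,dx$$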
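
For discrete distributions ($\lambda$ the counting measure) the integrals become sums. This sketch computes $H$ both ways, from the first definition and from $1-\sum_i\sqrt{p_i q_i}$, to show they agree when each distribution sums to 1:

```python
import math

def hellinger(p, q):
    """H(P, Q) = sqrt(0.5 * sum((sqrt(p_i) - sqrt(q_i)) ** 2))."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same support")
    s = sum((math.sqrt(pi) - math.sqrt(qi)) ** 2 for pi, qi in zip(p, q))
    return math.sqrt(s / 2)

def hellinger_via_bc(p, q):
    """Same quantity from H^2 = 1 - sum(sqrt(p_i * q_i))."""
    bc = sum(math.sqrt(pi * qi) for pi, qi in zip(p, q))
    return math.sqrt(max(0.0, 1 - bc))  # clamp tiny negative rounding errors

P = [0.5, 0.5]
Q = [0.9, 0.1]
print(hellinger(P, Q), hellinger_via_bc(P, Q))  # both about 0.325
```

Unlike the K-L divergence, $H$ is symmetric and bounded in $[0,1]$, reaching 1 when the two distributions have disjoint support.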

Canberra Distance
$$d(\vec{p},\vec{q})=\sum_{i=1}^{n}\frac{|p_i-q_i|}{|p_i|+|q_i|}$$
where $\vec{p}=(p_1,p_2,\dots,p_n)$ and $\vec{q}=(q_1,q_2,\dots,q_n)$.
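
A sketch of the sum above. When $p_i=q_i=0$ the term is 0/0; skipping such terms (treating them as 0) is an assumed convention in this example:

```python
def canberra(p, q):
    """Sum of |p_i - q_i| / (|p_i| + |q_i|), skipping 0/0 terms."""
    if len(p) != len(q):
        raise ValueError("vectors must have the same length")
    total = 0.0
    for pi, qi in zip(p, q):
        denom = abs(pi) + abs(qi)
        if denom != 0:  # assumed convention: 0/0 contributes 0
            total += abs(pi - qi) / denom
    return total

print(canberra([1, 2, 0], [3, 2, 0]))  # 0.5
```

Each term lies in $[0,1]$, so coordinates near zero weigh as much as large ones; this makes the distance sensitive to small changes close to the origin.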

Chebyshev Distance
$$D_{\text{Chebyshev}}(p,q)=\max_i\left(|p_i-q_i|\right)=\lim_{k\to\infty}\left(\sum_{i=1}^{n}|p_i-q_i|^k\right)^{1/k}$$
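
A sketch of the maximum form, plus a small numerical check of the limit: as the Minkowski exponent $k$ grows, the largest coordinate difference dominates the sum and the result approaches the Chebyshev distance. The helper names are illustrative:

```python
def chebyshev(p, q):
    """Largest absolute coordinate difference."""
    if len(p) != len(q):
        raise ValueError("vectors must have the same length")
    return max(abs(pi - qi) for pi, qi in zip(p, q))

def minkowski(p, q, k):
    return sum(abs(pi - qi) ** k for pi, qi in zip(p, q)) ** (1 / k)

a, b = [0, 0, 0], [1, 5, 2]
print(chebyshev(a, b))  # 5
for k in (1, 2, 10, 50):
    print(k, minkowski(a, b, k))  # decreases toward 5 as k grows
```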