Minkowski distance
$$
d(i,j)=\sqrt[h]{\left|x_{i1}-x_{j1}\right|^h+\left|x_{i2}-x_{j2}\right|^h+\cdots+\left|x_{id}-x_{jd}\right|^h}
$$
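Here $i=(x_{i1},x_{i2},\ldots,x_{id})$ and $j=(x_{j1},x_{j2},\ldots,x_{jd})$ are two $d$-dimensional data objects, and $h$ is the order. This distance is also called the $L_h$ norm.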
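The definition above translates almost directly into code. The following is a minimal sketch in plain Python; the function name `minkowski_distance` and the sample vectors are illustrative assumptions, not part of the original notes. It also assumes a finite order with h ≥ 1, which is the range in which the formula behaves as a proper distance.

```python
def minkowski_distance(x, y, h):
    """Minkowski (L_h) distance between two equal-length numeric sequences.

    Hypothetical helper for illustration; assumes a finite order h >= 1.
    """
    if len(x) != len(y):
        raise ValueError("both objects must have the same number of dimensions")
    if h < 1:
        raise ValueError("h must be at least 1")
    # Sum the h-th powers of the per-dimension absolute differences,
    # then take the h-th root of the total.
    total = sum(abs(a - b) ** h for a, b in zip(x, y))
    return total ** (1.0 / h)


# Illustrative 2-dimensional objects (assumed values).
obj_i = (1, 2)
obj_j = (3, 5)
for order in (1, 2, 3):
    print(order, round(minkowski_distance(obj_i, obj_j, order), 4))
```

Raising the order gives more weight to the dimension with the largest difference, so the result shrinks toward that largest difference as h grows.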
Manhattan distance
h=1, the L1 norm
$$
d(i,j)=\left|x_{i1}-x_{j1}\right|+\left|x_{i2}-x_{j2}\right|+\cdots+\left|x_{id}-x_{jd}\right|
$$
Euclidean distance
h=2, the L2 norm
$$
d(i,j)=\sqrt{\left|x_{i1}-x_{j1}\right|^2+\left|x_{i2}-x_{j2}\right|^2+\cdots+\left|x_{id}-x_{jd}\right|^2}
$$
Supremum distance
h=∞, the L∞ norm (also known as the Chebyshev distance): $d(i,j)=\max_{f}\left|x_{if}-x_{jf}\right|$, where $f$ ranges over the $d$ dimensions
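Why the order h=∞ reduces to a maximum can be seen from a standard sandwich bound. The symbol $M$ below is notation introduced here for the largest per-dimension difference, $M=\max_{f}\left|x_{if}-x_{jf}\right|$; it does not appear in the original notes.

$$
M \le \left(\sum_{f=1}^{d}\left|x_{if}-x_{jf}\right|^h\right)^{1/h} \le d^{1/h}\,M, \qquad d^{1/h}\to 1 \text{ as } h\to\infty
$$

The left inequality holds because the sum contains the term $M^h$, and the right one because each of the $d$ terms is at most $M^h$. Both bounds converge to $M$, so the Minkowski distance does too.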
Example
Point | Feature 1 | Feature 2 |
---|---|---|
x1 | 1 | 2 |
x2 | 3 | 5 |
x3 | 2 | 0 |
x4 | 4 | 5 |
Manhattan distance
L1 | x1 | x2 | x3 | x4 |
---|---|---|---|---|
x1 | 0 | | | |
x2 | 5 | 0 | | |
x3 | 3 | 6 | 0 | |
x4 | 6 | 1 | 7 | 0 |
Euclidean distance
L2 | x1 | x2 | x3 | x4 |
---|---|---|---|---|
x1 | 0 | | | |
x2 | 3.61 | 0 | | |
x3 | 2.24 | 5.1 | 0 | |
x4 | 4.24 | 1 | 5.39 | 0 |
Supremum distance
L∞ | x1 | x2 | x3 | x4 |
---|---|---|---|---|
x1 | 0 | | | |
x2 | 3 | 0 | | |
x3 | 2 | 5 | 0 | |
x4 | 3 | 1 | 5 | 0 |
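The three lower-triangular tables can be reproduced with a few lines of Python. This is a sketch under the assumption that the four points are stored in a dictionary; the helper names `lh_distance` and `distance_matrix` are made up for this example, and results are rounded to two decimals to match the tables.

```python
import math

# The four example points and their two features, as listed in the table.
points = {"x1": (1, 2), "x2": (3, 5), "x3": (2, 0), "x4": (4, 5)}


def lh_distance(x, y, h):
    """L_h distance; h = math.inf selects the supremum distance."""
    diffs = [abs(a - b) for a, b in zip(x, y)]
    if math.isinf(h):
        return max(diffs)
    return sum(d ** h for d in diffs) ** (1.0 / h)


def distance_matrix(points, h):
    """Map (row_name, column_name) to the rounded distance, lower triangle only."""
    names = list(points)
    matrix = {}
    for row_index, row_name in enumerate(names):
        for col_name in names[: row_index + 1]:
            dist = lh_distance(points[row_name], points[col_name], h)
            matrix[(row_name, col_name)] = round(dist, 2)
    return matrix


for label, order in (("L1", 1), ("L2", 2), ("Linf", math.inf)):
    matrix = distance_matrix(points, order)
    print(label)
    for row_name in points:
        row = [matrix[(row_name, c)] for c in points if (row_name, c) in matrix]
        print(" ", row_name, row)
```

Only the lower triangle is computed because each of these distances is symmetric, so the upper triangle would repeat the same values.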
Cosine similarity
$$
\cos(o_i,o_j)=\frac{\sum_{k=1}^{n}(x_{ik}\cdot x_{jk})}{\sqrt{\sum_{l=1}^{n}x_{il}^2}\cdot\sqrt{\sum_{l=1}^{n}x_{jl}^2}}
$$
Instance | Team | Coach | Hockey | Baseball | Soccer | Penalty | Score | Win | Loss | Season |
---|---|---|---|---|---|---|---|---|---|---|
instance1 | 5 | 0 | 3 | 0 | 2 | 0 | 0 | 2 | 0 | 0 |
instance2 | 3 | 0 | 2 | 0 | 1 | 1 | 0 | 1 | 0 | 1 |
$$
\cos(\text{instance1},\text{instance2})=\frac{5\cdot 3+0\cdot 0+3\cdot 2+0\cdot 0+2\cdot 1+0\cdot 1+0\cdot 0+2\cdot 1+0\cdot 0+0\cdot 1}{(25+9+4+4)^{0.5}\cdot(9+4+1+1+1+1)^{0.5}}=\frac{25}{\sqrt{42}\cdot\sqrt{17}}\approx 0.94
$$
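The same calculation can be checked in code. Below is a minimal sketch using only the standard library; the function name `cosine_similarity` is an illustrative choice, and the two vectors are the term counts of instance1 and instance2 in column order (Team through Season).

```python
import math


def cosine_similarity(x, y):
    """Cosine of the angle between two equal-length numeric vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0 or norm_y == 0:
        # The angle is undefined when either vector has zero length.
        raise ValueError("cosine similarity is undefined for an all-zero vector")
    return dot / (norm_x * norm_y)


# Term counts in column order:
# Team, Coach, Hockey, Baseball, Soccer, Penalty, Score, Win, Loss, Season
instance1 = [5, 0, 3, 0, 2, 0, 0, 2, 0, 0]
instance2 = [3, 0, 2, 0, 1, 1, 0, 1, 0, 1]

print(round(cosine_similarity(instance1, instance2), 2))  # 0.94
```

The dot product is 25 and the two vector lengths are √42 and √17, which gives roughly 0.94.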
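▶ Cosine similarity versus Euclidean distance:
● They measure different things:
Euclidean distance: absolute distance
Cosine similarity: difference in direction
● They suit different models:
Euclidean distance: reflects absolute differences in feature values, so it is used when the analysis should capture differences in the magnitude of each dimension, for example measuring similarity or difference in user value from user behavior metrics
Cosine similarity: insensitive to absolute values, so it is used, for example, to distinguish similarity or difference in user interests from the ratings users give to content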
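The contrast can be made concrete with a small experiment. The three rating vectors below are hypothetical data invented for this sketch: `user_b` has the same preference pattern as `user_a` but rates everything twice as high, while `user_c` has the reverse preference order on a similar scale.

```python
import math


def euclidean(x, y):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def cosine_similarity(x, y):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    return dot / (math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y)))


# Hypothetical ratings of three items by three users (assumed values).
user_a = [1, 2, 3]
user_b = [2, 4, 6]  # same direction as user_a, twice the magnitude
user_c = [3, 2, 1]  # similar magnitude, opposite preference order

for name, other in (("user_b", user_b), ("user_c", user_c)):
    print(
        name,
        "euclidean:", round(euclidean(user_a, other), 2),
        "cosine:", round(cosine_similarity(user_a, other), 2),
    )
```

By Euclidean distance `user_c` is closer to `user_a` than `user_b` is, because the absolute values are nearer. By cosine similarity `user_b` is the better match, because its ratings point in the same direction.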