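Suppose we use two binary subscripts to count the features of two objects. $n_{00}$ and $n_{11}$ are the numbers of features that are simultaneously absent from both objects and simultaneously present in both objects, respectively, while $n_{01}$ and $n_{10}$ count the features that are present in only one of the two objects. For two data points $x_i$ and $x_j$, two common types of similarity measure are as follows.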
$$S_{ij}=\frac{n_{11}+n_{00}}{n_{11}+n_{00}+w(n_{10}+n_{01})}$$

$w=1$: simple matching coefficient;
$w=2$: Rogers and Tanimoto measure;
$w=1/2$: Gower and Legendre measure.
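These measures count the matches between the two objects directly, treating co-presence and co-absence alike. The unmatched pairs are weighted by $w$ according to their contribution to the similarity.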
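The first family can be sketched in a few lines of Python. This is only an illustrative sketch, not code from the original post: the helper names `binary_counts` and `matching_similarity`, the example vectors, and the choice to raise an error on empty input are all assumptions made here.

```python
from typing import Sequence, Tuple


def binary_counts(x: Sequence[int], y: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (n11, n00, n10, n01) for two equal-length 0/1 feature vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    n11 = sum(1 for a, b in zip(x, y) if a == 1 and b == 1)  # present in both
    n00 = sum(1 for a, b in zip(x, y) if a == 0 and b == 0)  # absent from both
    n10 = sum(1 for a, b in zip(x, y) if a == 1 and b == 0)  # only in x
    n01 = sum(1 for a, b in zip(x, y) if a == 0 and b == 1)  # only in y
    return n11, n00, n10, n01


def matching_similarity(x: Sequence[int], y: Sequence[int], w: float = 1.0) -> float:
    """S_ij = (n11 + n00) / (n11 + n00 + w * (n10 + n01))."""
    n11, n00, n10, n01 = binary_counts(x, y)
    denom = n11 + n00 + w * (n10 + n01)
    if denom == 0:
        # Assumption: an empty feature vector has no defined similarity.
        raise ValueError("similarity is undefined for empty vectors")
    return (n11 + n00) / denom


# Hypothetical example data: n11 = 2, n00 = 2, n10 = 1, n01 = 1.
x_i = [1, 1, 0, 0, 1, 0]
x_j = [1, 0, 0, 1, 1, 0]

print(matching_similarity(x_i, x_j, w=1.0))  # simple matching: 4/6
print(matching_similarity(x_i, x_j, w=2.0))  # Rogers and Tanimoto: 4/8
print(matching_similarity(x_i, x_j, w=0.5))  # Gower and Legendre: 4/5
```

A larger $w$ penalizes mismatches more heavily, so for the same pair of objects the Rogers and Tanimoto value is never larger than the simple matching coefficient.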
$$S_{ij}=\frac{n_{11}}{n_{11}+w(n_{10}+n_{01})}$$

$w=1$: Jaccard coefficient;
$w=2$: Sokal and Sneath measure;
$w=1/2$: Gower and Legendre measure.
These measures focus on the co-occurrence features while
ignoring the effect of co-absence.
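A minimal Python sketch of this second family follows. The function name `cooccurrence_similarity` and the example vectors are hypothetical, and raising an error when both vectors are all zeros is an assumption made here, since the formula gives 0/0 in that case.

```python
from typing import Sequence


def cooccurrence_similarity(x: Sequence[int], y: Sequence[int], w: float = 1.0) -> float:
    """S_ij = n11 / (n11 + w * (n10 + n01)); co-absent features (n00) are ignored."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    n11 = sum(1 for a, b in zip(x, y) if a == 1 and b == 1)  # present in both
    mismatches = sum(1 for a, b in zip(x, y) if a != b)      # n10 + n01
    denom = n11 + w * mismatches
    if denom == 0:
        # Assumption: with no feature present in either object, treat as undefined.
        raise ValueError("similarity is undefined when no feature is present")
    return n11 / denom


# Hypothetical example data: n11 = 2, n10 + n01 = 2.
x_i = [1, 1, 0, 0, 1, 0]
x_j = [1, 0, 0, 1, 1, 0]

print(cooccurrence_similarity(x_i, x_j, w=1.0))  # Jaccard: 2/4
print(cooccurrence_similarity(x_i, x_j, w=2.0))  # Sokal and Sneath: 2/6
print(cooccurrence_similarity(x_i, x_j, w=0.5))  # Gower and Legendre: 2/3

# Appending features that are absent from both objects changes n00 only,
# so the value stays the same.
padded = cooccurrence_similarity(x_i + [0, 0, 0], x_j + [0, 0, 0], w=1.0)
print(padded == cooccurrence_similarity(x_i, x_j, w=1.0))  # True
```

The last check illustrates the point about co-absence: padding both vectors with shared zeros would raise a measure from the first family, but leaves these values unchanged.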
Similarity and Dissimilarity Measures for Binary Features