Jaccard similarity coefficient: similarity computation

Jaccard index

From Wikipedia, the free encyclopedia
 
 

The Jaccard index, also known as the Jaccard similarity coefficient (originally named coefficient de communauté by Paul Jaccard), is a statistic used for comparing the similarity and diversity of sample sets. It measures similarity between finite sample sets and is defined as the size of the intersection divided by the size of the union of the sample sets:

J(A,B) = { {|A \cap B|} \over {|A \cup B|} }.

(If A and B are both empty, we define J(A,B) = 1.)

0\le J(A,B)\le 1.
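
As a minimal sketch of the definition above, the coefficient can be computed directly with Python's built-in sets. The function name `jaccard_index` is chosen here for illustration and does not come from the article; the empty-set case follows the convention J(A,B) = 1 stated above.

```python
def jaccard_index(a, b):
    """Return |A ∩ B| / |A ∪ B| for two finite collections, treated as sets."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        # Convention from the text: J(A, B) = 1 when both sets are empty.
        return 1.0
    return len(a & b) / len(union)


# Intersection {2, 3} has 2 elements, union {1, 2, 3, 4} has 4.
print(jaccard_index({1, 2, 3}, {2, 3, 4}))  # 0.5
```

Disjoint non-empty sets give 0 and identical sets give 1, which matches the bounds 0 \le J(A,B) \le 1.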

The MinHash scheme (min-wise independent permutations locality-sensitive hashing) may be used to efficiently compute an accurate estimate of the Jaccard similarity coefficient of pairs of sets, where each set is represented by a constant-sized signature derived from the minimum values of a hash function.
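
The idea can be illustrated with a small sketch. It is not a reference implementation: the hash family h(x) = (a * x + b) mod PRIME, the modulus, the signature length of 256 and all function names are assumptions made for this example. The fraction of signature positions on which two sets agree serves as the estimate of J(A,B).

```python
import hashlib
import random

PRIME = (1 << 61) - 1  # Mersenne prime used as the modulus (an illustrative choice)


def base_hash(item):
    """Stable 64-bit hash of an item's repr (unlike str hash(), stable across runs)."""
    digest = hashlib.blake2b(repr(item).encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def make_hash_params(num_hashes, seed=0):
    """Draw (a, b) pairs for the hash family h(x) = (a * x + b) mod PRIME."""
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(PRIME)) for _ in range(num_hashes)]


def minhash_signature(items, params):
    """Signature = minimum hash value over the set, one entry per hash function."""
    hashed = [base_hash(x) for x in set(items)]
    if not hashed:
        raise ValueError("this sketch assumes non-empty sets")
    return [min((a * h + b) % PRIME for h in hashed) for a, b in params]


def estimate_jaccard(sig_a, sig_b):
    """Fraction of signature positions on which the two signatures agree."""
    matches = sum(1 for x, y in zip(sig_a, sig_b) if x == y)
    return matches / len(sig_a)


params = make_hash_params(256)
sig_a = minhash_signature(range(0, 1000), params)
sig_b = minhash_signature(range(500, 1500), params)
# The exact coefficient is 500 / 1500; the printed value is only an estimate of it.
print(estimate_jaccard(sig_a, sig_b))
```

Both signatures must be built with the same `params`. A longer signature reduces the variance of the estimate at the cost of more hashing work.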

The Jaccard distance, which measures dissimilarity between sample sets, is complementary to the Jaccard coefficient and is obtained by subtracting the Jaccard coefficient from 1, or, equivalently, by dividing the difference of the sizes of the union and the intersection of two sets by the size of the union:

d_J(A,B) = 1 - J(A,B) = { { |A \cup B| - |A \cap B| } \over |A \cup B| }.

An alternate interpretation of the Jaccard distance is as the ratio of the size of the symmetric difference A \triangle B = (A \cup B) - (A \cap B) to the size of the union.
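
The two forms of the distance can be checked against each other with a short sketch. The function names are illustrative, and the value 0 for two empty sets is an assumption that follows from the convention J(A,B) = 1 in that case.

```python
def jaccard_distance(a, b):
    """Distance as |A △ B| / |A ∪ B| (symmetric-difference form)."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        # J = 1 for two empty sets, so the distance 1 - J is 0.
        return 0.0
    return len(a ^ b) / len(union)


def jaccard_distance_via_sizes(a, b):
    """Distance as (|A ∪ B| - |A ∩ B|) / |A ∪ B|."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0
    return (len(union) - len(a & b)) / len(union)


A, B = {1, 2, 3}, {2, 3, 4}
# Symmetric difference {1, 4} has 2 elements, union has 4.
print(jaccard_distance(A, B))            # 0.5
print(jaccard_distance_via_sizes(A, B))  # 0.5
```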

This distance is a metric on the collection of all finite sets.[1][2]
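
The triangle inequality, the least obvious of the metric axioms, can be spot-checked by brute force on a small universe. This is a sanity check rather than a proof; the helper names are made up for the example, and exact fractions are used so that floating-point rounding cannot affect the comparison.

```python
from fractions import Fraction
from itertools import combinations, product


def jaccard_distance(a, b):
    """Exact Jaccard distance between two sets, with d = 0 for two empty sets."""
    union = a | b
    if not union:
        return Fraction(0)
    return Fraction(len(a ^ b), len(union))


def all_subsets(universe):
    """Every subset of the universe, as frozensets."""
    items = list(universe)
    return [frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)]


def triangle_inequality_holds(universe):
    """Check d(A, C) <= d(A, B) + d(B, C) for all triples of subsets."""
    subsets = all_subsets(universe)
    return all(
        jaccard_distance(a, c) <= jaccard_distance(a, b) + jaccard_distance(b, c)
        for a, b, c in product(subsets, repeat=3)
    )


# 16 subsets of a 4-element universe, so 4096 triples are checked.
print(triangle_inequality_holds(range(4)))  # True
```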

There is also a version of the Jaccard distance for measures, including probability measures. If \mu is a measure on a measurable space X, then we define the Jaccard coefficient by J_\mu(A,B) = { {\mu(A \cap B)} \over {\mu(A \cup B)} }, and the Jaccard distance by d_\mu(A,B) = 1 - J_\mu(A,B) = { {\mu(A \triangle B)} \over {\mu(A \cup B)} }. Care must be taken if \mu(A \cup B) = 0 or \infty, since these formulas are not well defined in that case.
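
For a concrete case, consider a discrete measure that assigns a non-negative weight to each point, so that \mu(S) is the sum of the weights in S. The sketch below assumes such a measure, given as a dictionary, and uses illustrative names throughout. It raises an error when \mu(A \cup B) = 0 rather than picking a value, since the formula is not defined there.

```python
def measure(subset, weights):
    """mu(S) for a discrete measure: the sum of the point weights in S."""
    return sum(weights[x] for x in subset)


def jaccard_measure(a, b, weights):
    """J_mu(A, B) = mu(A ∩ B) / mu(A ∪ B) for a finite discrete measure."""
    a, b = set(a), set(b)
    mu_union = measure(a | b, weights)
    if mu_union == 0:
        raise ValueError("J_mu is not well defined when mu(A ∪ B) = 0")
    return measure(a & b, weights) / mu_union


def jaccard_measure_distance(a, b, weights):
    """d_mu(A, B) = 1 - J_mu(A, B)."""
    return 1 - jaccard_measure(a, b, weights)


# A probability measure on three points (weights sum to 1).
weights = {"x": 0.5, "y": 0.25, "z": 0.25}
A, B = {"x", "y"}, {"y", "z"}
print(jaccard_measure(A, B, weights))           # 0.25
print(jaccard_measure_distance(A, B, weights))  # 0.75
```

With every weight equal to 1 this reduces to the counting version of the coefficient.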

Reposted from: https://www.cnblogs.com/baiting/p/4713940.html
