Jaccard Similarity
Basic Concept
The Jaccard index is a statistic used for measuring the similarity and diversity of sample sets [2].
The Jaccard coefficient measures the similarity between finite sample sets and is defined as the size of the intersection divided by the size of the union of the sample sets:
$$J(A,B)=\frac{|A\cap B|}{|A\cup B|}=\frac{|A\cap B|}{|A|+|B|-|A\cap B|}$$
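A minimal Python sketch checking that the two forms of the formula agree. The second form uses inclusion-exclusion, $|A\cup B|=|A|+|B|-|A\cap B|$, so it never builds the union set. The function names are illustrative, and both functions assume $A\cup B$ is non-empty. `Fraction` keeps the results exact so the comparison is not affected by floating-point rounding.

```python
from fractions import Fraction


def jaccard_by_union(a: set, b: set) -> Fraction:
    """J(A, B) = |A ∩ B| / |A ∪ B| (assumes A ∪ B is non-empty)."""
    return Fraction(len(a & b), len(a | b))


def jaccard_by_inclusion_exclusion(a: set, b: set) -> Fraction:
    """J(A, B) = |A ∩ B| / (|A| + |B| - |A ∩ B|); no union set is built."""
    inter = len(a & b)
    return Fraction(inter, len(a) + len(b) - inter)


A = {1, 2, 3, 4}
B = {3, 4, 5}
print(jaccard_by_union(A, B))                # 2/5
print(jaccard_by_inclusion_exclusion(A, B))  # 2/5
```

Here $|A\cap B|=|\{3,4\}|=2$ and $|A\cup B|=4+3-2=5$, so both forms give $2/5$.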
Tips:
- If $A$ and $B$ are both empty, define $J(A,B)=1$.
- $0\le J(A,B)\le 1$.
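A minimal Python sketch that applies the empty-set convention from the tips; without it the formula would be $0/0$. The name `jaccard` is illustrative, and returning an exact `Fraction` is a presentation choice.

```python
from fractions import Fraction


def jaccard(a: set, b: set) -> Fraction:
    """Jaccard coefficient with the convention J(∅, ∅) = 1."""
    union = len(a | b)
    if union == 0:
        # Both sets are empty: the ratio would be 0/0, so use the convention J = 1.
        return Fraction(1)
    return Fraction(len(a & b), union)


print(jaccard(set(), set()))       # 1
print(jaccard({1, 2}, {3, 4}))     # 0  (disjoint sets)
print(jaccard({1, 2}, {1, 2}))     # 1  (identical sets)
print(jaccard({"a", "b"}, {"b"}))  # 1/2
```

Because $A\cap B\subseteq A\cup B$, the numerator never exceeds the denominator. The result therefore stays in $[0,1]$, reaching 0 for disjoint sets and 1 for identical sets.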
The Jaccard distance
The Jaccard distance measures dissimilarity between sample sets and is complementary to the Jaccard coefficient:
$$d_J(A,B)=1-J(A,B)=\frac{|A\cup B|-|A\cap B|}{|A\cup B|}$$
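A minimal Python sketch of the distance. It is computed directly from the right-hand fraction rather than as `1 - J`. The name `jaccard_distance` is illustrative. The empty-set case is an assumption carried over from the convention $J(\emptyset,\emptyset)=1$, which gives $d_J=0$.

```python
from fractions import Fraction


def jaccard_distance(a: set, b: set) -> Fraction:
    """d_J(A, B) = (|A ∪ B| - |A ∩ B|) / |A ∪ B|, with d_J(∅, ∅) = 0."""
    union = len(a | b)
    if union == 0:
        # Assumed convention: J(∅, ∅) = 1, so d_J = 1 - 1 = 0.
        return Fraction(0)
    return Fraction(union - len(a & b), union)


A = {1, 2, 3, 4}
B = {3, 4, 5}
print(jaccard_distance(A, B))  # 3/5, i.e. 1 - 2/5
```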
An alternative interpretation of the Jaccard distance is as the ratio of the size of the symmetric difference
$$A\Delta B=(A\cup B)-(A\cap B)$$