# Dice's coefficient

Dice's coefficient (also known as the Dice coefficient) is a similarity measure related to the Jaccard index.

For sets $X$ and $Y$ of keywords used in information retrieval, the coefficient may be defined as: $s = \frac{2 | X \cap Y |}{| X | + | Y |}$
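The set definition above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `dice_sets`, the sample keyword sets, and the choice to return 1.0 for two empty sets (where the formula is 0/0) are all assumptions made here.

```python
def dice_sets(x, y):
    """Dice coefficient 2|X ∩ Y| / (|X| + |Y|) for two collections treated as sets."""
    x, y = set(x), set(y)
    if not x and not y:
        # Assumption: the formula is undefined (0/0) here; treat two empty sets as identical.
        return 1.0
    return 2 * len(x & y) / (len(x) + len(y))


# Hypothetical keyword sets: two shared keywords, sizes 3 and 4.
doc_a = {"search", "index", "query"}
doc_b = {"query", "index", "rank", "score"}
print(dice_sets(doc_a, doc_b))  # 2 * 2 / (3 + 4), about 0.571
```

The result always lies between 0 (no shared elements) and 1 (identical sets).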

When taken as a string similarity measure, the coefficient may be calculated for two strings, $x$ and $y$, using bigrams as follows: $s = \frac{2 n_{t}}{n_{x} + n_{y}}$

where $n_t$ is the number of character bigrams found in both strings, $n_x$ is the number of bigrams in string $x$, and $n_y$ is the number of bigrams in string $y$. For example, to calculate the similarity between:

night
nacht

We would find the set of bigrams in each word:

{ ni, ig, gh, ht}
{ na, ac, ch, ht}

Each set has 4 elements, and the intersection of these two sets has only one element: ht.

Plugging these values into the formula gives $s = (2 \cdot 1) / (4 + 4) = 0.25$.
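The worked example can be reproduced with a short Python sketch. The helper names `bigrams` and `dice_bigrams` are hypothetical, and the handling of strings shorter than two characters (which have no bigrams) is an assumption. As in the example, bigrams are collected into sets, so a bigram that repeats within a word is counted once.

```python
def bigrams(s):
    """Return the set of adjacent character pairs in s."""
    return {s[i:i + 2] for i in range(len(s) - 1)}


def dice_bigrams(x, y):
    """Dice coefficient 2 * n_t / (n_x + n_y) over the bigram sets of two strings."""
    bx, by = bigrams(x), bigrams(y)
    if not bx and not by:
        # Assumption: with no bigrams on either side, fall back to exact string equality.
        return 1.0 if x == y else 0.0
    n_t = len(bx & by)  # bigrams found in both strings
    return 2 * n_t / (len(bx) + len(by))


print(sorted(bigrams("night")))        # ['gh', 'ht', 'ig', 'ni']
print(sorted(bigrams("nacht")))        # ['ac', 'ch', 'ht', 'na']
print(dice_bigrams("night", "nacht"))  # 0.25
```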
