MySQL Hamming distance search (SQL): Hamming distance / similarity search in a database

THE HAMMING DISTANCE PROBLEM

Definition: Given a collection of f-bit fingerprints and a query fingerprint F, identify whether an existing fingerprint differs from F in at most k bits. (In the batch-mode version of the above problem, we have a set of query fingerprints instead of a single query fingerprint.)
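The definition can be made concrete with a brute-force check. This is only a minimal sketch, assuming fingerprints are stored as non-negative Python integers; the function names `hamming_distance` and `has_near_duplicate` are illustrative and do not come from the quoted text.

```python
def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which a and b differ."""
    # XOR leaves a 1 in every position where the two values differ.
    return bin(a ^ b).count("1")


def has_near_duplicate(fingerprints, query, k):
    """Linear scan: True if some stored fingerprint is within k bits of query."""
    return any(hamming_distance(fp, query) <= k for fp in fingerprints)


if __name__ == "__main__":
    existing = [0b10110010, 0b01001101]
    print(has_near_duplicate(existing, 0b10110110, k=1))
    print(has_near_duplicate(existing, 0b10110110, k=0))
```

The scan touches every stored fingerprint for every query, which is the cost the sorted-table idea in the rest of the excerpt tries to avoid.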

[…]

Intuition: Consider a sorted table of 2^d f-bit truly random fingerprints. Focus on just the most significant d bits in the table. A listing of these d-bit numbers amounts to “almost a counter” in the sense that (a) quite a few of the 2^d bit-combinations exist, and (b) very few d-bit combinations are duplicated. On the other hand, the least significant f − d bits are “almost random”.
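The “almost a counter” intuition can be explored empirically. The sketch below is an illustration only: it assumes uniformly random fingerprints drawn from Python's `random` module, and the helper name `prefix_bucket_sizes` is hypothetical. It counts how many of the 2^d possible d-bit prefixes actually occur among 2^d fingerprints, and how many fingerprints share the most crowded prefix.

```python
import random
from collections import Counter


def prefix_bucket_sizes(d, f, seed=0):
    """Draw 2**d random f-bit fingerprints and count them per d-bit prefix."""
    rng = random.Random(seed)  # local generator so the run is repeatable
    fingerprints = [rng.getrandbits(f) for _ in range(2 ** d)]
    # Shifting right by f - d keeps only the d most significant bits.
    return Counter(fp >> (f - d) for fp in fingerprints)


if __name__ == "__main__":
    d, f = 12, 64
    sizes = prefix_bucket_sizes(d, f)
    print(len(sizes), "of", 2 ** d, "possible prefixes occur")
    print("largest number of fingerprints sharing one prefix:", max(sizes.values()))
```

With 2^d fingerprints spread over 2^d possible prefixes, the average is one fingerprint per prefix, so a lookup by prefix is expected to return only a handful of candidates.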

Now choose d' such that |d' − d| is a small integer. Since the table is sorted, a single probe suffices to identify all fingerprints which match F in the d' most significant bit-positions. Since |d' − d| is small, the number of such matches is also expected to be small. For each matching fingerprint, we can easily figure out if it differs from F in at most k bit-positions or not (these differences would naturally be restricted to the f − d' least-significant bit-positions).

The procedure described above helps us locate an existing fingerprint that differs from F in k bit-positions, all of which are restricted to be among the least significant f − d' bits of F, that is, the bits outside the d'-bit prefix matched by the probe. This takes care of a fair number of cases. To cover all the cases, it suffices to build a small number of additional sorted tables, as formally outlined in the next Section.
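The single-table probe described above can be sketched as follows. This is a minimal illustration, not the implementation from the quoted text: it assumes the table is a sorted Python list of f-bit integers, uses the standard `bisect` module for the probe, and the names `probe_sorted_table` and `d_prime` (standing for d') are hypothetical.

```python
from bisect import bisect_left


def hamming_distance(a, b):
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


def probe_sorted_table(table, query, f, d_prime, k):
    """Return fingerprints that share the top d_prime bits with query
    and differ from it in at most k bits.

    table must be a sorted list of f-bit non-negative integers.
    """
    low_bits = f - d_prime
    # Smallest and one-past-largest f-bit values with the query's prefix.
    lo = (query >> low_bits) << low_bits
    hi = lo + (1 << low_bits)
    # Because the table is sorted, all prefix matches are contiguous.
    start = bisect_left(table, lo)
    end = bisect_left(table, hi)
    return [fp for fp in table[start:end] if hamming_distance(fp, query) <= k]


if __name__ == "__main__":
    table = sorted([0b10110010, 0b10111111, 0b10100001, 0b01110010, 0b00110110])
    print(probe_sorted_table(table, 0b10110110, f=8, d_prime=3, k=2))
```

In this toy table, `0b00110110` differs from the query only in its most significant bit, yet the probe does not return it because its prefix is different. Cases like this are the ones the additional sorted tables are meant to cover.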
