Efficient similarity computation: notes on LSH, minHash, and simHash

joey 周琦

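This article first introduces the concept of locality-sensitive hashing and what it is used for, then covers minHash and simHash, two common methods for computing similarity quickly and finding nearest neighbors.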

Locality-sensitive hashing (LSH)

Definition

Let us start with the fairly precise English description from Wikipedia [1].
An LSH family $\mathcal{F}$ is defined for a metric space $\mathcal{M} = (M, d)$, a threshold $R > 0$ and an approximation factor $c > 1$. This family $\mathcal{F}$ is a family of functions $h: M \to S$ which map elements from the metric space to a bucket $s \in S$. The LSH family satisfies the following conditions for any two points $p, q \in M$, using a function $h \in \mathcal{F}$ which is chosen uniformly at random:

  • if $d(p, q) \le R$, then $h(p) = h(q)$ (i.e., $p$ and $q$ collide) with probability at least $P_1$,
  • if $d(p, q) \ge cR$, then $h(p) = h(q)$ with probability at most $P_2$.

A family is interesting when $P_1 > P_2$. Such a family $\mathcal{F}$ is called $(R, cR, P_1, P_2)$-sensitive.
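
To make the two conditions concrete, here is a minimal sketch using the bit-sampling family for Hamming distance, a textbook LSH family that this article does not itself discuss. The names `bit_sampling_family`, `hamming`, and `collision_probability` and the example vectors are illustrative assumptions. For bit vectors of length n, each function in the family returns a single coordinate, so two points at Hamming distance d collide with probability exactly 1 - d/n.

```python
from fractions import Fraction


def bit_sampling_family(n):
    """Hypothetical helper: the family F = {h_i : x -> x[i]} for length-n bit vectors."""
    return [lambda x, i=i: x[i] for i in range(n)]


def hamming(p, q):
    """Hamming distance d(p, q): number of coordinates where p and q differ."""
    return sum(a != b for a, b in zip(p, q))


def collision_probability(p, q):
    """Exact Pr[h(p) == h(q)] when h is drawn uniformly at random from the family."""
    family = bit_sampling_family(len(p))
    hits = sum(h(p) == h(q) for h in family)
    return Fraction(hits, len(family))


p = (1, 0, 1, 1, 0, 0, 1, 0)
near = (1, 0, 1, 1, 0, 0, 1, 1)  # d(p, near) = 1
far = (0, 1, 0, 0, 0, 0, 1, 0)   # d(p, far) = 4

# With n = 8, R = 1 and c = 4: P1 = 1 - 1/8 and P2 = 1 - 4/8.
print(hamming(p, near), collision_probability(p, near))  # 1 7/8
print(hamming(p, far), collision_probability(p, far))    # 4 1/2
```

Under these assumptions the family is (1, 4, 7/8, 1/2)-sensitive on 8-bit vectors: near points collide more often than far ones, which is the P1 > P2 condition above.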

Wikipedia defines a metric space as follows:

In mathematics, a metric space is a set for which distances between all members of the set are defined. Those distances, taken together, are called a metric on the set.

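Based on the above, here is my own restatement. Locality-sensitive hashing is a family of functions defined for a metric space $\mathcal{M} = (M, d)$; it maps the elements of the metric space to corresponding buckets and satisfies the following properties: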

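  • if $d(p, q) \le R$, then $h(p) = h(q)$ with probability at least $P_1$;
  • if $d(p, q) \ge cR$, then $h(p) = h(q)$ with probability at most $P_2$.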
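
The mapping from points to buckets can be sketched as a small hash table. This is only an illustration under stated assumptions: it uses bit vectors with Hamming distance, builds the hash from k randomly sampled coordinates, and the names `make_hash`, `build_table`, and `candidates` are hypothetical, not taken from the article or from any library.

```python
import random
from collections import defaultdict


def make_hash(n, k, rng):
    """Pick k coordinates at random; the bucket key is the tuple of those bits."""
    coords = [rng.randrange(n) for _ in range(k)]
    return lambda x: tuple(x[i] for i in coords)


def build_table(points, h):
    """Place every point in the bucket named by its hash value."""
    table = defaultdict(list)
    for p in points:
        table[h(p)].append(p)
    return table


def candidates(table, h, query):
    """Only points sharing the query's bucket are compared; the rest are skipped."""
    return table.get(h(query), [])


rng = random.Random(0)  # fixed seed so the sketch is repeatable
points = [
    (1, 0, 1, 1, 0, 0, 1, 0),
    (1, 0, 1, 1, 0, 0, 1, 1),
    (0, 1, 0, 0, 1, 1, 0, 1),
    (0, 1, 0, 0, 1, 1, 0, 0),
]
h = make_hash(n=8, k=3, rng=rng)
table = build_table(points, h)

for key, bucket in table.items():
    print(key, bucket)
```

A nearest-neighbor query then computes exact distances only against `candidates(table, h, query)` instead of against every stored point. In practice several tables with independently drawn hash functions are typically combined so that a near point missed by one table can still be found in another.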