CDN Notes 2: Locality Sensitive Hashing Algorithm (continued)

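This post introduces the application of locality-sensitive hashing (LSH) in the CDN setting and mainly discusses the approximate nearest neighbor (ANN) search problem. LSH builds a probabilistic model under which nearby points are assigned to the same bucket with high probability, which simplifies comparing data in high-dimensional spaces. The post illustrates how LSH works with examples and mentions variants such as Min Hash and Sim Hash, which are used to quickly estimate set similarity.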

This is part of my general survey on LSH for the CDN class.

NN search
Given a set P of n points, design an algorithm that, given any point q, returns a point in P that is closest to q (its “nearest neighbor” in P).
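The exact problem can always be solved by a linear scan over P, which is the baseline that approximate methods try to beat. The following is a minimal sketch, assuming Euclidean distance and points stored as tuples; the names `euclidean` and `nearest_neighbor` are illustrative and not from the original survey.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length point tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest_neighbor(points, q, dist=euclidean):
    """Return the point in `points` closest to `q` by scanning all n points."""
    if not points:
        raise ValueError("points must be non-empty")
    return min(points, key=lambda p: dist(p, q))

# Assumed toy data for illustration only.
P = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
print(nearest_neighbor(P, (0.9, 1.2)))  # (1.0, 1.0)
```

Each query costs one distance computation per point, so the scan is linear in n and in the dimension. That cost is what motivates the approximate formulation.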

ANN search
The approximate nearest neighbor (ANN) search problem is to find a point whose distance from the query is at most c times the distance from the query to its nearest neighbor, for some approximation factor c ≥ 1. Formally:
Given a set P and a point q, find a point $p_i \in P$ such that
$$d(p_i, q) \leq c \times \min_{p_j \in P} d(p_j, q)$$
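To make the inequality concrete, here is a small sketch that checks whether a candidate point is a valid c-approximate answer for a query. It assumes Euclidean distance (via the standard-library `math.dist`, Python 3.8+); the function name `is_c_approximate` and the sample points are hypothetical.

```python
import math

def is_c_approximate(points, q, candidate, c):
    """True if d(candidate, q) <= c * min over p_j in points of d(p_j, q)."""
    best = min(math.dist(p, q) for p in points)
    return math.dist(candidate, q) <= c * best

# Assumed toy data: the exact nearest neighbor of q is (0, 0) at distance 1.
P = [(0.0, 0.0), (3.0, 0.0), (10.0, 0.0)]
q = (1.0, 0.0)

print(is_c_approximate(P, q, (3.0, 0.0), c=2.0))   # True:  2 <= 2 * 1
print(is_c_approximate(P, q, (3.0, 0.0), c=1.5))   # False: 2 >  1.5 * 1
print(is_c_approximate(P, q, (10.0, 0.0), c=2.0))  # False: 9 >  2 * 1
```

With c = 1 the check accepts only exact nearest neighbors, so the exact problem is the special case c = 1.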

The appeal of this relaxation is that, in many cases, an approximate nearest neighbor is almost as good as the exact one. In particular, if the distance measure accurately captures the notion of user quality, then small differences in distance should not matter.

There are two main kinds of approaches to the ANN problem: tree-based approaches and LSH.

LSH
LSH refers to a family of functions (LSH families) that hash data points into buckets so that data points near each other land in the same bucket with high probability, while data points far from each other are likely to land in different buckets. Only the points that share a bucket with a query then need to be compared, which makes it easier to identify similar observations in high-dimensional space. The formal definition is:
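A family $\mathcal{H}$ of hash functions is $(r, cr, p_1, p_2)$-LSH if, for any two points $p$ and $q$ and a function $h$ chosen uniformly at random from $\mathcal{H}$:
if $d(p, q) \leq r$, then $\Pr[h(p) = h(q)] \geq p_1$;
if $d(p, q) \geq cr$, then $\Pr[h(p) = h(q)] \leq p_2$.
The family is useful when $c > 1$ and $p_1 > p_2$.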
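As one illustration of such a family, the sketch below uses bit sampling for Hamming distance on d-bit vectors. This is a standard textbook example and is assumed here, not taken from the survey. Each hash function returns a single coordinate, so two vectors collide with probability 1 - hamming(p, q) / d, which makes the family (r, cr, 1 - r/d, 1 - cr/d)-LSH. All names and the sample vectors are hypothetical.

```python
def hamming(p, q):
    """Number of coordinates in which two equal-length bit tuples differ."""
    return sum(a != b for a, b in zip(p, q))

def bit_sampling_family(d):
    """All d hash functions h_i(x) = x[i] of the bit-sampling family."""
    return [lambda x, i=i: x[i] for i in range(d)]

def collision_probability(family, p, q):
    """Exact Pr[h(p) == h(q)] for h drawn uniformly from `family`."""
    return sum(h(p) == h(q) for h in family) / len(family)

d = 8
H = bit_sampling_family(d)

# Assumed toy vectors: `near` is at Hamming distance 1 from p, `far` at distance 5.
p    = (1, 0, 1, 1, 0, 0, 1, 0)
near = (1, 0, 1, 1, 0, 0, 1, 1)
far  = (0, 1, 0, 0, 1, 0, 1, 0)

# Assumed parameters r = 1, c = 4, giving p1 = 1 - 1/8 and p2 = 1 - 4/8.
r, c = 1, 4
p1, p2 = 1 - r / d, 1 - c * r / d

print(collision_probability(H, p, near))  # 0.875, at least p1 since distance <= r
print(collision_probability(H, p, far))   # 0.375, at most p2 since distance >= c*r
```

The gap between p1 and p2 is small for a single hash function. In practice several functions are usually concatenated and several tables are used to widen it, which this sketch does not attempt.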
