Incremental data updates in Python: an incremental nearest neighbor algorithm in Python

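This post discusses the difficulty of finding an incremental nearest neighbor algorithm in Python. It notes that building a standard KD-tree or KNN-tree incrementally leads to imbalance, and that re-balancing is not simple. It recommends a batch approach: insert many points at once, then rebuild the data structure when necessary to stay efficient. An incremental update implementation may require complex code, whereas a batch rebuild can be simpler and in some cases faster. The discussion mainly concerns high-dimensional spaces and may not apply to 2D or 3D.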

Is anyone aware of a nearest neighbor algorithm implemented in Python that can be updated incrementally? All the ones I've found, such as this one, appear to be batch processes. Is it possible to implement an incremental NN algorithm?

Solution

I think the problem with incremental construction of a KD-tree or KNN-tree is, as you've alluded to in a comment, that the tree will eventually become unbalanced, and you can't do a simple tree rotation to fix balance problems and keep consistency. At a minimum, the re-balancing task is not trivial, and one would definitely not want to do it at each insertion. Often, one will choose to build a tree with a batch method, insert a bunch of new points and allow the tree to become unbalanced up to a point, and then re-balance it.
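
To make the imbalance concrete, here is a minimal pure-Python sketch. The names `insert`, `build_balanced`, and `height` are hypothetical helpers written for this illustration, not part of any library. It inserts points one at a time with no re-balancing and compares the resulting tree height with that of a batch build that splits at the median.

```python
def insert(root, point):
    """Insert one point into a k-d tree with no re-balancing.

    Nodes are lists of the form [point, left_child, right_child].
    """
    if root is None:
        return [point, None, None]
    node, depth = root, 0
    while True:
        axis = depth % len(point)
        side = 1 if point[axis] < node[0][axis] else 2
        if node[side] is None:
            node[side] = [point, None, None]
            return root
        node, depth = node[side], depth + 1


def build_balanced(points, depth=0):
    """Batch-build a k-d tree by splitting at the median of each axis."""
    if not points:
        return None
    axis = depth % len(points[0])
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2
    return [ordered[mid],
            build_balanced(ordered[:mid], depth + 1),
            build_balanced(ordered[mid + 1:], depth + 1)]


def height(node):
    """Number of nodes on the longest root-to-leaf path."""
    if node is None:
        return 0
    return 1 + max(height(node[1]), height(node[2]))


# Worst case for incremental insertion: points arriving in sorted order.
points = [(i, i) for i in range(200)]

incremental = None
for p in points:
    incremental = insert(incremental, p)

print(height(incremental))             # 200: the tree degenerates into a chain
print(height(build_balanced(points)))  # 8: roughly log2(200)
```

Sorted input is the extreme case; randomly ordered input degrades much less, which is why the arrival order of the points matters.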

A very similar thing to do is to build the data structure in batch for M points, use it for M' points, and then re-build the data structure in batch with M+M' points. Since re-balancing is not the normal, fast algorithm we are familiar with for trees, rebuilding is not necessarily slow in comparison and in some cases can be faster (depending on the order in which points enter your incremental algorithm).

That being said, the amount of code you write and the debugging difficulty can be significantly smaller, and the code easier for others to understand, if you take the rebuild approach. If you do so, you can use a batch method and keep an external list of points not yet inserted into the tree. At query time, a brute-force scan over that list ensures that none of those points is closer than the best match found in the tree.
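
The rebuild approach might look roughly like the sketch below. It is one possible reading of the idea, not a reference implementation: the class name `BufferedNN`, the `max_pending` threshold, and the simple median-split k-d tree are all assumptions made for illustration. In practice the hand-written tree could be replaced by any batch-built index. It requires Python 3.8+ for `math.dist`.

```python
import math


class BufferedNN:
    """Batch-built k-d tree plus a brute-force list of pending points."""

    def __init__(self, points=(), max_pending=64):
        self.points = [tuple(p) for p in points]  # points already in the tree
        self.pending = []                         # points not yet in the tree
        self.max_pending = max_pending            # rebuild threshold (assumed)
        self.root = self._build(self.points)

    def _build(self, pts, depth=0):
        """Batch-build a balanced tree of (point, axis, left, right) tuples."""
        if not pts:
            return None
        axis = depth % len(pts[0])
        pts = sorted(pts, key=lambda p: p[axis])
        mid = len(pts) // 2
        return (pts[mid], axis,
                self._build(pts[:mid], depth + 1),
                self._build(pts[mid + 1:], depth + 1))

    def add(self, point):
        """Buffer a new point; rebuild the whole tree once the buffer is full."""
        self.pending.append(tuple(point))
        if len(self.pending) > self.max_pending:
            self.points.extend(self.pending)
            self.pending = []
            self.root = self._build(self.points)

    def _search(self, node, q, best):
        """Standard k-d tree descent; best is a (distance, point) pair."""
        if node is None:
            return best
        point, axis, left, right = node
        d = math.dist(point, q)
        if d < best[0]:
            best = (d, point)
        diff = q[axis] - point[axis]
        near, far = (left, right) if diff < 0 else (right, left)
        best = self._search(near, q, best)
        if abs(diff) < best[0]:  # the far side could still hold a closer point
            best = self._search(far, q, best)
        return best

    def nearest(self, q):
        """Return the closest stored point, checking the tree and the buffer."""
        q = tuple(q)
        best = self._search(self.root, q, (math.inf, None))
        for p in self.pending:  # brute force over points not yet in the tree
            d = math.dist(p, q)
            if d < best[0]:
                best = (d, p)
        return best[1]


index = BufferedNN([(0.0, 0.0), (5.0, 5.0)], max_pending=2)
index.add((1.0, 1.0))                 # stays in the pending list
print(index.nearest((0.9, 0.9)))      # (1.0, 1.0), found by the brute-force scan
```

The threshold trades query cost against rebuild cost: a larger `max_pending` means fewer rebuilds but a longer linear scan on every query.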

Some links to Python implementations/discussions are below, but I haven't found any that explicitly claim to be incremental. Good luck.

Note: My comments here apply to high-dimensional spaces. If you're working in 2D or 3D, what I've said may not be appropriate. (If you are working in very high-dimensional spaces, use brute force or approximate nearest neighbor.)
