Python incremental data update: an incremental nearest neighbor algorithm in Python

Latest recommended article published on 2022-08-16 12:42:30

綾音Ayane

Article tags: python incremental data update

Article link: https://blog.csdn.net/weixin_32480007/article/details/112048246

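This post discusses the challenge of finding an incremental nearest neighbor algorithm in Python. It notes that incrementally building a standard KD-tree or KNN-tree leads to imbalance, and that re-balancing is not simple. It suggests a batch approach: insert a large number of points at once, then rebuild the data structure when necessary to stay efficient. An incremental update can be complex to implement, whereas a batch rebuild may be simpler and, in some cases, faster. The discussion mainly targets high-dimensional spaces and may not apply to 2D or 3D.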

Summary generated automatically by CSDN.

Is anyone aware of a nearest neighbor algorithm implemented in Python that can be updated incrementally? All the ones I've found, such as this one, appear to be batch processes. Is it possible to implement an incremental NN algorithm?

Solution

I think the problem with incremental construction of a KD-tree or KNN-tree is, as you've alluded to in a comment, that the tree will eventually become unbalanced, and you can't do simple tree rotations to fix balance problems and keep consistency. At a minimum, the re-balancing task is not trivial, and one would definitely not want to do it at each insertion. Often, one will choose to build a tree with a batch method, insert a bunch of new points and allow the tree to become unbalanced up to a point, and then re-balance it.

A very similar thing to do is to build the data structure in batch for M points, use it for M' points, and then re-build the data structure in batch with M+M' points. Since re-balancing here is not the normal, fast algorithm we are familiar with for trees, rebuilding is not necessarily slow in comparison, and in some cases it can be faster (depending on the order in which points enter your incremental algorithm).

That being said, the amount of code you write and the difficulty of debugging it can be significantly smaller, and the code much easier for others to understand, if you take the rebuild approach. If you do so, you can use a batch method and keep an external list of points not yet inserted into the tree. A brute-force scan over that list can then be used to ensure none of these points is closer than the nearest one found in the tree.

Some links to Python implementations/discussions are below, but I haven't found any that explicitly claim to be incremental. Good luck.

Note: My comments here apply to high-dimensional spaces. If you're working in 2D or 3D, what I've said may not be appropriate. (If you are working in very high-dimensional spaces, use brute force or approximate nearest neighbor.)