What Insights Does Jeff Dean's Learned Index Bring to Database Indexing? (Part 2)

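This article explores how Learned Index can improve RM-Index, Hash indexes, and Bloom Filter indexes to reduce space usage and optimize query performance. Although Learned Index may sacrifice query speed in some scenarios, it offers a new line of thinking for database indexing, especially for read-only OLAP query workloads. Going forward, Learned Index is expected to make database indexes smarter and more efficient.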

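This article continues the discussion with the issues involved in updating a Recursive Model Index (RM-Index), and with how Learned Index reworks Hash indexes and Bloom Filter indexes to reduce the space an index occupies.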


Updating the RM-Index

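The RM-Index design in the previous article, and its benchmark comparison against B-Tree indexes, mainly target read-only, in-memory database systems. The same design can also be applied to data warehouse systems with a low update frequency. In Bigtable, for example, each SSTable is generated only after a certain amount of data has accumulated in memory, so the RM-Index approach can likewise be used to optimize the existing B-Tree index.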
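To make the read-only case concrete, here is a minimal sketch of a learned lookup over a sorted key array. It is an illustration under stated assumptions, not the RM-Index from the paper: the name `LinearLearnedIndex` is hypothetical, a single least-squares linear model stands in for the multi-stage model hierarchy, and the code assumes C++17 (`std::optional`, `std::clamp`). The model predicts a position, and the worst prediction error recorded at build time bounds the binary search window.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical single-stage learned index over a sorted, read-only key array.
// A real RM-Index stacks several models; one linear model keeps the sketch short.
struct LinearLearnedIndex {
    std::vector<long long> keys;  // sorted keys (assumed unique)
    double slope = 0.0;
    double intercept = 0.0;
    std::size_t maxError = 0;     // worst |predicted - actual| position at build time

    explicit LinearLearnedIndex(std::vector<long long> sortedKeys)
        : keys(std::move(sortedKeys)) {
        assert(!keys.empty() && std::is_sorted(keys.begin(), keys.end()));
        fit();
    }

    // Least-squares fit of position ~ slope * key + intercept,
    // then record the largest prediction error over the stored keys.
    void fit() {
        const double n = static_cast<double>(keys.size());
        double sumX = 0, sumY = 0, sumXX = 0, sumXY = 0;
        for (std::size_t i = 0; i < keys.size(); ++i) {
            const double x = static_cast<double>(keys[i]);
            const double y = static_cast<double>(i);
            sumX += x; sumY += y; sumXX += x * x; sumXY += x * y;
        }
        const double denom = n * sumXX - sumX * sumX;
        slope = (denom != 0.0) ? (n * sumXY - sumX * sumY) / denom : 0.0;
        intercept = (sumY - slope * sumX) / n;
        maxError = 0;
        for (std::size_t i = 0; i < keys.size(); ++i) {
            const std::size_t p = predict(keys[i]);
            maxError = std::max(maxError, p > i ? p - i : i - p);
        }
    }

    // Predicted position, clamped to the valid index range.
    std::size_t predict(long long key) const {
        const double raw = slope * static_cast<double>(key) + intercept;
        const double last = static_cast<double>(keys.size() - 1);
        return static_cast<std::size_t>(std::llround(std::clamp(raw, 0.0, last)));
    }

    // Binary search restricted to [predict - maxError, predict + maxError].
    std::optional<std::size_t> find(long long key) const {
        const std::size_t p = predict(key);
        const std::size_t lo = p > maxError ? p - maxError : 0;
        const std::size_t hi = std::min(keys.size(), p + maxError + 1);
        const auto first = keys.begin() + static_cast<std::ptrdiff_t>(lo);
        const auto last = keys.begin() + static_cast<std::ptrdiff_t>(hi);
        const auto it = std::lower_bound(first, last, key);
        if (it != last && *it == key) {
            return static_cast<std::size_t>(it - keys.begin());
        }
        return std::nullopt;
    }
};
```

Because `maxError` is measured over exactly the keys that are stored, every stored key is guaranteed to fall inside its search window. That guarantee is what makes the read-only setting convenient: it holds only as long as the key set does not change after `fit()`.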

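Data updates fall into two cases:

  • appends: new data is appended at the tail of the existing data set
  • inserts: new data is inserted into the middle of the existing data set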

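In general, for the inserts case, new data should largely follow the distribution of the existing data, so the original model does not need to be retrained. As for the appends case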

## A C++11 implementation of the B-Tree part of "The Case for Learned Index Structures"

A research **proof of concept** that implements the B-Tree section of [The Case for Learned Index Structures](https://arxiv.org/pdf/1712.01208.pdf) paper in C++.

The general design is to have a single lookup structure that you can parameterize with a KeyType and a ValueType, and an overflow list that keeps new inserts until you retrain. There is a value in the constructor of the RMI that triggers a retrain when the overflow array reaches a certain size.

The basic API:

```c++
// [first/second]StageParams are network parameters
int maxAllowedError = 256;
int maxBufferBeforeRetrain = 10001;

// NOTE: the template arguments (e.g. KeyType, ValueType) are missing from the
// source text of this line and are left elided here.
RecursiveModelIndex<...> modelIndex(firstStageParams,
                                    secondStageParams,
                                    maxAllowedError,
                                    maxBufferBeforeRetrain);

for (int ii = 0; ii < 10000; ++ii) {
    modelIndex.insert(ii, ii * 2);
}

// Since we still have one more insert before retraining, retrain before searching...
modelIndex.train();

auto result = modelIndex.find(5);

if (result) {
    std::cout << "Yay! We got: " << result.get().first << ", " << result.get().second << std::endl;
} else {
    std::cout << "Value not found." << std::endl; // This shouldn't happen in the above usage...
}
```

See [src/main.cpp](src/main.cpp) for a usage example where it stores scaled log normal data.

### Dependencies

- [nn_cpp](https://github.com/bcaine/nn_cpp) - Eigen based minimalistic C++ Neural Network library
- [cpp-btree](https://code.google.com/archive/p/cpp-btree/) - A fast C++ implementation of a B+ Tree

### TODO:

- Lots of code cleanup
- Profiling of where the slowdowns are. On small tests, the cpp_btree lib beats it by 10-100x
- Eigen::TensorFixed in nn_cpp would definitel
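The overflow-list design described in this README can be sketched as follows. This is a hypothetical illustration, not the repository's code: the class `BufferedIndex` and its members are invented for the example, keys and values are fixed to `int`, and "retraining" is reduced to merging the buffer into a sorted array (a real learned index would refit its models at that point). Unlike the README usage, which calls `train()` before searching, this sketch also consults the buffer during lookups. It assumes C++17 for `std::optional`.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical sketch: a sorted main array (the part a model would be trained on)
// plus an overflow buffer that absorbs inserts until a rebuild is triggered.
class BufferedIndex {
public:
    explicit BufferedIndex(std::size_t maxBufferBeforeRetrain)
        : maxBuffer_(maxBufferBeforeRetrain) {}

    // New entries land in the overflow buffer; reaching the threshold triggers a rebuild.
    void insert(int key, int value) {
        overflow_[key] = value;
        if (overflow_.size() >= maxBuffer_) retrain();
    }

    // Merge the buffer into the sorted main array.
    // Assumption: a real learned index would also refit its models here.
    void retrain() {
        for (const auto& kv : overflow_) {
            auto it = std::lower_bound(main_.begin(), main_.end(), kv.first, keyLess);
            if (it != main_.end() && it->first == kv.first) {
                it->second = kv.second;               // overwrite an existing key
            } else {
                main_.emplace(it, kv.first, kv.second);  // keep main_ sorted
            }
        }
        overflow_.clear();
        ++retrainCount_;
    }

    // Check the buffer first (it holds the newest values), then the main array.
    std::optional<int> find(int key) const {
        const auto buffered = overflow_.find(key);
        if (buffered != overflow_.end()) return buffered->second;
        const auto it = std::lower_bound(main_.begin(), main_.end(), key, keyLess);
        if (it != main_.end() && it->first == key) return it->second;
        return std::nullopt;
    }

    std::size_t bufferedCount() const { return overflow_.size(); }
    std::size_t retrainCount() const { return retrainCount_; }

private:
    static bool keyLess(const std::pair<int, int>& entry, int key) {
        return entry.first < key;
    }

    std::size_t maxBuffer_;
    std::size_t retrainCount_ = 0;
    std::vector<std::pair<int, int>> main_;  // sorted by key
    std::map<int, int> overflow_;            // pending inserts
};
```

The threshold trades write cost against read cost: a larger buffer means fewer rebuilds, but every lookup pays for an extra probe into a structure the model knows nothing about.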