Jeff Dean的Learned Index为数据库索引带来了哪些启发1

Learned Index是一种利用机器学习优化传统数据库索引的技术,通过掌握数据分布特性,改进B-Tree Index,实现更快的查询速度和更高的存储效率。在测试中,相对于B-Tree,Learned Index的RM-Index模型性能提升60%~70%,存储空间节省高达99%。这种递归模型考虑了全局数据分布,并通过有限步迭代定位数据,降低了错误率。
摘要由CSDN通过智能技术生成

这篇论文在两个月前刚被公布出来的时候,因为带着Jeff Dean的署名曾一度被热传,但直到今天才认真读完这篇论文。Learned Index基于机器学习的方法,对传统数据库索引做了改造。本文先介绍Learned Index的RM-Index模型以及与B-Tree索引的对比。


如论文开篇所言,可以将传统的数据库索引(Index)视为一种模型(Model):

  • B-Tree索引B-Tree索引模型将一个Key映射到一个排序数组中的某位置所对应的记录
  • Hash索引Hash索引模型将一个Key映射到一个无序数组中的某位置所对应的记录
  • Bitmap索引Bitmap索引模型用来判断一条记录是否存在
  • 1
    点赞
  • 4
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
## A C++11 implementation of the B-Tree part of "The Case for Learned Index Structures" A research **proof of concept** that implements the B-Tree section of [The Case for Learned Index Structures](https://arxiv.org/pdf/1712.01208.pdf) paper in C++. The general design is to have a single lookup structure that you can parameterize with a KeyType and a ValueType, and an overflow list that keeps new inserts until you retrain. There is a value in the constructor of the RMI that triggers a retrain when the overflow array reaches a certain size. The basic API: ```c++ // [first/second]StageParams are network parameters int maxAllowedError = 256; int maxBufferBeforeRetrain = 10001; auto modelIndex = RecursiveModelIndex recursiveModelIndex(firstStageParams, secondStageParams, maxAllowedError, maxBufferBeforeRetrain); for (int ii = 0; ii < 10000; ++ii) { modelIndex.insert(ii, ii * 2); } // Since we still have one more insert before retraining, retrain before searching... modelIndex.train(); auto result = modelIndex.find(5); if (result) { std::cout << "Yay! We got: " << result.get().first << ", " << result.get().second << std::endl; } else { std::cout << "Value not found." << std::endl; // This shouldn't happen in the above usage... } ``` See [src/main.cpp](src/main.cpp) for a usage example where it stores scaled log normal data. ### Dependencies - [nn_cpp](https://github.com/bcaine/nn_cpp) - Eigen based minimalistic C++ Neural Network library - [cpp-btree](https://code.google.com/archive/p/cpp-btree/) - A fast C++ implementation of a B+ Tree ### TODO: - Lots of code cleanup - Profiling of where the slowdowns are. On small tests, the cpp_btree lib beats it by 10-100x - Eigen::TensorFixed in nn_cpp would definitel
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值