A Brief Look at Time Series Database Internals: How to Handle Hundreds of Millions of Data Writes on a Single Machine

在酒吧写代码

Published 2021-10-21 13:18:44

Tags: time series database, database

Article link: https://blog.csdn.net/java_xiaoo/article/details/120884546

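This article examines the challenges facing time series databases, such as the data storage problems in Prometheus and InfluxDB, including the strengths and weaknesses of LSM Tree and BoltDB. It presents solutions such as adopting the Time Structured Merge Tree (a variant of the LSM-Tree), using a WAL to optimize write performance, and saving storage space through data retention policies and resampling.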

Version	Date	Notes
1.0	2021.10.19	First published

0. Background

The title comes from InfluxDB's account of the background that led to its storage engine:

The workload of time series data is quite different from normal database workloads. There are a number of factors that conspire to make it very difficult to get it to scale and perform well:
- Billions of individual data points
- High write throughput
- High read throughput
- Large deletes to free up disk space
- Mostly an insert/append workload, very few updates

The first and most obvious problem is one of scale. In DevOps, for instance, you can collect hundreds of millions or billions of unique data points every day.

To prove out the numbers, let’s say we have 200 VMs or servers running, with each server collecting an average of 100 measurements every 10 seconds. Given there are 86,400 seconds in a day, a single measurement will generate 8,640 points in a day, per server
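The arithmetic in the quoted passage can be checked with a short sketch. The input figures (200 servers, 100 measurements per server, one sample every 10 seconds) come from the quote; the variable names are illustrative only, and the daily total and average write rate are simply derived from those figures.

```python
# Figures taken from the quoted InfluxDB passage.
SECONDS_PER_DAY = 86_400
servers = 200
measurements_per_server = 100
interval_seconds = 10

# One measurement sampled every 10 seconds yields this many points per day.
points_per_measurement_per_day = SECONDS_PER_DAY // interval_seconds

# Scale up to all measurements on one server, then to all servers.
points_per_server_per_day = points_per_measurement_per_day * measurements_per_server
total_points_per_day = points_per_server_per_day * servers

# Average sustained write rate implied by the daily total.
avg_points_per_second = total_points_per_day // SECONDS_PER_DAY

print(points_per_measurement_per_day)  # 8640
print(total_points_per_day)            # 172800000
print(avg_points_per_second)           # 2000
```

Even this modest fleet produces well over a hundred million points per day, which is the scale the title refers to.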