App Engine datastore tip: monotonically increasing values are bad

macyang

Category: database/nosql. Tags: nosql数据库 (NoSQL databases), application, dictionary, random, training, insert

Article link: https://blog.csdn.net/macyang/article/details/6420329

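The key phrase in the title is "monotonically increasing values are bad". As far as I know, the NoSQL databases HBase and MongoDB have the same problem, so how you handle monotonically increasing row keys matters a lot. Also, the diagrams drawn by the author, Ikai Lan, are a lot of fun. Really great!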

Original article: http://ikaisays.com/2011/01/25/app-engine-datastore-tip-monotonically-increasing-values-are-bad/

When saving entities to App Engine’s datastore at a high write rate, avoid monotonically increasing values such as timestamps. Generally speaking, you don’t have to worry about this sort of thing until your application hits hundreds of queries per second. Once you’re in that ballpark, you may want to examine potential hotspots in your application that can increase datastore latency.

To explain why this is, let’s examine what happens to the underlying Bigtable of an application with a high write rate. When a Bigtable tablet, a contiguous unit of storage, experiences a high write rate, the tablet will have to “split” into more than one tablet. This “split” allows new writes to shard. Here’s a visual approximation of what happens:

There’s a moment of pain – this is one of the causes of datastore timeouts in high-write applications, as discussed in Nick Johnson’s article, “Handling Datastore Errors”.

Remember that for indexed values, we must write corresponding index rows. When values are randomly or even semi-randomly distributed, like, say, user email addresses, tablet splits function well. This is because the work to write multiple values is distributed amongst several Bigtable tablets:

The problems appear when we start saving monotonically increasing values like timestamps, or insert dictionary words in alphabetical order:

The new writes aren’t evenly distributed: they all sort after the existing keys, so they all land on the same tablet, and whichever tablet they go to becomes a new hot tablet in need of a split.
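The difference between the two cases can be sketched with a toy model. This is not how Bigtable is implemented: the split points, key range, and function name below are all made up for illustration, and the tablets here are static key ranges that never split. It only shows where writes land when keys are spread out versus always increasing.

```python
import bisect
import random
from collections import Counter


def writes_per_tablet(keys, split_points):
    """Count how many keys fall into each key range (a stand-in for a tablet)."""
    counts = Counter()
    for key in keys:
        # bisect_right gives the index of the range this key sorts into.
        counts[bisect.bisect_right(split_points, key)] += 1
    return [counts[i] for i in range(len(split_points) + 1)]


# Hypothetical layout: four tablets covering the keys written so far (0..999).
split_points = [250, 500, 750]

rng = random.Random(42)
# Semi-random values (think email addresses) spread across the key space.
random_keys = [rng.randrange(1000) for _ in range(1000)]
# Timestamp-like values: every new key sorts after all existing keys.
increasing_keys = list(range(1000, 2000))

random_load = writes_per_tablet(random_keys, split_points)
increasing_load = writes_per_tablet(increasing_keys, split_points)

print("random keys:    ", random_load)
print("increasing keys:", increasing_load)
```

With the increasing keys, every write goes to the last range while the other three sit idle; with the random keys, the load is shared across all four.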

As a developer, what can you do to avoid this situation?

Avoid indexes unless you need to query against the values. No index means no hot tablet on an increasing value.
Lower your write rate, or figure out how to better distribute values. A pure random distribution is best, but even a distribution that isn’t random will be better than a predictable, monotonically increasing value.
Prefix a shard identifier to your value. This is problematic if you plan on doing queries, as you will need to prefix and unprefix the values, then join the results in memory – but it will reduce the error rate of your writes.
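The shard-prefix tip, including the query cost it mentions, might look roughly like the sketch below. It uses a sorted Python list as a stand-in for an index rather than the datastore API, and the shard count, key format, and function names are assumptions made for illustration.

```python
import bisect
import heapq
import random

NUM_SHARDS = 4  # hypothetical; a real value depends on your write rate


def add_prefix(timestamp, shard=None):
    """Build an index value such as '02-00000000000000000105'."""
    if shard is None:
        # Pick a shard at random so consecutive writes spread out.
        shard = random.randrange(NUM_SHARDS)
    # Zero-padding keeps string order equal to numeric order within a shard.
    return "%02d-%020d" % (shard, timestamp)


def strip_prefix(value):
    """Recover the original timestamp from a prefixed value."""
    _, _, timestamp = value.partition("-")
    return int(timestamp)


def query_range(index, start, end):
    """Range query over a sorted list of prefixed values.

    One scan per shard prefix, then the per-shard results are
    unprefixed and merged in memory.
    """
    per_shard = []
    for shard in range(NUM_SHARDS):
        lo = bisect.bisect_left(index, add_prefix(start, shard))
        hi = bisect.bisect_right(index, add_prefix(end, shard))
        per_shard.append([strip_prefix(v) for v in index[lo:hi]])
    return list(heapq.merge(*per_shard))


# Usage: shards are fixed here (timestamp % NUM_SHARDS) only to make the
# example repeatable; writes would normally let add_prefix pick one.
index = sorted(add_prefix(t, t % NUM_SHARDS) for t in range(100, 120))
print(query_range(index, 105, 110))
```

Writes now land in `NUM_SHARDS` different regions of the key space instead of one, at the price of `NUM_SHARDS` scans and an in-memory merge for every range query.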

These tips apply whether you are on the Master-Slave or the High Replication datastore. And one more tip: don’t prematurely optimize for this case, since chances are you won’t run into it. You could spend that time working on features instead.

- Ikai

P.S. Yes, I drew those doodles. No, I do not have any formal art training (how could you tell?!)
