HBase manual split

HBase 0.90.2 provides a RegionSplitter class (org.apache.hadoop.hbase.util.RegionSplitter) for manually splitting regions. The notes below cover how to turn off automatic region splitting, why you might choose to split manually, what the benefits of manual splitting are, and how many pre-split regions it makes sense to create.

The RegionSplitter class provides several utilities to help in the administration lifecycle for developers who choose to manually split regions instead of having HBase handle that automatically. The most useful utilities are:

  • Create a table with a specified number of pre-split regions
  • Execute a rolling split of all regions on an existing table

Both operations can be safely done on a live server.
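
As an illustration of the first utility, here is a minimal sketch that pre-splits a table at creation time through the plain client API (HBaseAdmin.createTable with explicit split keys) rather than the RegionSplitter command line; pre-splitting at creation ultimately boils down to handing the master a list of split points. The table name, column family, and single-byte split scheme below are assumptions made up for the example, and the sketch targets the classic (pre-1.0) client API that matches the 0.90.x era discussed here; real split points should be derived from your own row-key distribution.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class PreSplitTableSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HBaseAdmin admin = new HBaseAdmin(conf);

        // Hypothetical table and column family names, used only for this sketch.
        HTableDescriptor desc = new HTableDescriptor("my_table");
        desc.addFamily(new HColumnDescriptor("cf"));

        // Build 9 split points, giving 10 regions. Here the row keys are assumed
        // to start with an evenly distributed byte; a real deployment should pick
        // split points that match its own key distribution (e.g. hex-encoded
        // hash prefixes).
        int numRegions = 10;
        byte[][] splitKeys = new byte[numRegions - 1][];
        for (int i = 1; i < numRegions; i++) {
            splitKeys[i - 1] = new byte[] { (byte) (i * 256 / numRegions) };
        }

        // Creates the table already divided into numRegions regions, so no
        // automatic splits are needed while the data is still small.
        admin.createTable(desc, splitKeys);
    }
}
```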

 

Question: How do I turn off automatic splitting? 
Answer: Automatic splitting is determined by the configuration value "hbase.hregion.max.filesize". It is not recommended that you set this to Long.MAX_VALUE in case you forget about manual splits. A suggested setting is 100GB, which would result in > 1hr major compactions if reached.
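
Cluster-wide, hbase.hregion.max.filesize is set in hbase-site.xml on the region servers. The sketch below shows the per-table counterpart, HTableDescriptor.setMaxFileSize, using the 100GB figure from the answer above; the table and column family names are placeholders, and this is just one way to apply the limit, not the only one.

```java
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;

public class ManualSplitTableDescriptor {
    public static void main(String[] args) {
        // Placeholder table and column family names for this sketch.
        HTableDescriptor desc = new HTableDescriptor("my_table");
        desc.addFamily(new HColumnDescriptor("cf"));

        // Per-table counterpart of hbase.hregion.max.filesize: a region of this
        // table is only considered for an automatic split once it passes this
        // size. 100GB (the value suggested above) effectively leaves splitting
        // to the operator without resorting to Long.MAX_VALUE.
        desc.setMaxFileSize(100L * 1024 * 1024 * 1024);

        System.out.println("max file size (bytes): " + desc.getMaxFileSize());
    }
}
```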

 

Question: Why did the original authors decide to manually split? 
Answer: Specific workload characteristics of our use case allowed us to benefit from a manual split system.

  • Data (~1k) that would grow instead of being replaced
  • Data growth was roughly uniform across all regions
  • OLTP workload. Data loss is a big deal.

Question: Why is manual splitting good for this workload? 
Answer: Although automated splitting is not a bad option, there are benefits to manual splitting.

  • With growing amounts of data, splits will continually be needed. Since you always know exactly what regions you have, long-term debugging and profiling is much easier with manual splits. It is hard to trace the logs to understand region level problems if it keeps splitting and getting renamed.
  • Data offlining bugs + unknown number of split regions == oh crap! If an HLog or StoreFile was mistakenly unprocessed by HBase due to a weird bug and you notice it a day or so later, you can be assured that the regions specified in these files are the same as the current regions and you have less headaches trying to restore/replay your data.
  • You can finely tune your compaction algorithm. With roughly uniform data growth, it's easy to cause split / compaction storms as the regions all roughly hit the same data size at the same time. With manual splits, you can let staggered, time-based major compactions spread out your network IO load.

Question: What's the optimal number of pre-split regions to create? 
Answer: Mileage will vary depending upon your application.

The short answer for our application is that we started with 10 pre-split regions / server and watched our data growth over time. It's better to err on the side of too few regions and rolling split later.

The more complicated answer is that this depends upon the largest storefile in your region. With a growing data size, this will get larger over time. You want the largest region to be just big enough that the Store compact selection algorithm only compacts it due to a timed major. If you don't, your cluster can be prone to compaction storms as the algorithm decides to run major compactions on a large series of regions all at once. Note that compaction storms are due to the uniform data growth, not the manual split decision.

If you pre-split your regions too thin, you can increase the major compaction interval by configuring HConstants.MAJOR_COMPACTION_PERIOD. If your data size grows too large, use the RegionSplitter utility to perform a network-IO-safe rolling split of all regions.
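
For reference, HConstants.MAJOR_COMPACTION_PERIOD names the hbase.hregion.majorcompaction property: the interval, in milliseconds, between automatic time-based major compactions of a region. In practice you would raise it in hbase-site.xml on the region servers; the small sketch below only illustrates the key and an example value (7 days, an assumption for the example, not a recommendation from the original text).

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HConstants;

public class MajorCompactionPeriodSketch {
    public static void main(String[] args) {
        Configuration conf = HBaseConfiguration.create();

        // HConstants.MAJOR_COMPACTION_PERIOD is the name of the
        // hbase.hregion.majorcompaction property: the interval, in milliseconds,
        // between time-based major compactions. 7 days is an illustrative value
        // for thinly pre-split tables that should compact on a staggered schedule.
        conf.setLong(HConstants.MAJOR_COMPACTION_PERIOD, 7L * 24 * 60 * 60 * 1000);

        System.out.println(HConstants.MAJOR_COMPACTION_PERIOD + " = "
                + conf.getLong(HConstants.MAJOR_COMPACTION_PERIOD, 0L));
    }
}
```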

 

Source: http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/util/RegionSplitter.html
