Stabilizing a Large HBase Cluster

Running a large HBase cluster smoothly with minimum downtime is a skill that requires a deep understanding of how HBase works. When disaster strikes, you find yourself digging into the HBase code and/or mailing lists to understand what went wrong, determine how to recover from the current mess and, most importantly, figure out what can be done to prevent the same thing from happening again. Apart from the inconvenience of downtime, a service crash can also lead to inconsistencies in the HBase meta tables.

While some crashes are environment specific, especially in large clusters under heavy load, there are a few knobs that can be tuned to help minimize the frequency of failures. This article discusses some of the common scenarios that can result in service crashes and inconsistencies, as well as possible tweaks to HBase to avoid them.

Frequent unassigned regions
When an HBase cluster starts, the active Master begins region assignment for both meta and user tables present in the cluster. Since the ‘.META.’ table contains the assignment information for every region of each user table in the cluster, it must be assigned and brought online before any user table.

In HBase versions prior to 0.92.0, especially in a large cluster with thousands of regions, it is possible that region assignments for user tables begin before the assignment of ‘.META.’ has completed. All such assignments are bound to fail with a ‘NotAllMetaRegionsOnlineException’, and the affected regions remain offline for 30 minutes (the default) before the Master takes notice of them and attempts reassignment.

Unfortunately, no foolproof solution is available to prevent this from happening in HBase versions prior to 0.92.0. From version 0.92.0 onwards, the HBase Master ensures that both ‘-ROOT-’ and ‘.META.’ tables are assigned before it attempts to assign any user table region. So if you are running HBase version 0.90.6 or earlier and have been seeing assignment-related issues frequently, it may be time to upgrade.
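If upgrading is not an option right away, you can at least shorten the 30-minute blackout window by tuning the Master's assignment timeout monitor. The following is a minimal sketch for ‘hbase-site.xml’ on the Master node; it assumes the 0.90.x-era TimeoutMonitor property, so verify the exact name and default against your release:

    <!-- Sketch for hbase-site.xml on the HBase Master node. Assumes the
         0.90.x TimeoutMonitor property; check the name in your release. -->
    <property>
      <name>hbase.master.assignment.timeoutmonitor.timeout</name>
      <!-- How long (ms) a region may stay in transition before the Master
           retries the assignment; the old default is 1800000 (30 minutes).
           180000 (3 minutes) here is an illustrative value, not a tested
           recommendation. -->
      <value>180000</value>
    </property>

Lowering this value makes the Master retry stuck assignments sooner, at the cost of more churn if regions legitimately take a long time to open.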

The problem of a runaway Region Server
A runaway Region Server can happen in any cluster running with a default configuration, but it becomes a real problem in a large cluster.

On startup, the HBase Master waits for ‘n’ Region Servers to register with it within a time period ‘t’ before creating a region assignment plan. So even if your cluster has a large number of Region Servers, the initial region assignments will be distributed among the registered servers alone. The default values of ‘n’ and ‘t’ are 1 and 4.5 seconds, respectively. So if one Region Server happens to start fast enough to beat the other servers by 4.5 seconds, all the regions will be assigned to it alone. You can read more about this in HBASE-6389 and HBASE-6375.

The effect of such a bulk assignment to a single server can be disastrous if you have tens of thousands of regions in the cluster: the lone Region Server may crash with an ‘Out of Memory’ error during the assignment phase, leaving the ‘.META.’ table inconsistent. This is more pronounced in the newer releases (0.92.0 onwards) because MSLAB is enabled by default, which adds a memory requirement of roughly 2 MB per region; at 10,000 regions that is already about 20 GB of heap before the server serves a single row.

Fortunately, this can be easily avoided with a simple configuration change. The value of ‘n’ is controlled by the parameter 'hbase.master.wait.on.regionservers.mintostart', which needs to be set in 'hbase-site.xml' on each node running an HBase Master. Set it to a value close to the number of Region Servers sufficient to handle the startup assignment load, as in the sketch below.
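As a minimal sketch, assuming a cluster of around 100 Region Servers, the following ‘hbase-site.xml’ entries tell the Master not to build its assignment plan until most of the fleet has checked in. The value 90 is an illustrative choice, not a recommendation, and the companion timeout property (the ‘t’ described above) should be verified against your release:

    <!-- hbase-site.xml on every HBase Master node. -->
    <property>
      <name>hbase.master.wait.on.regionservers.mintostart</name>
      <!-- Wait for at least this many Region Servers to register before
           planning assignments; 90 assumes a ~100-node cluster. -->
      <value>90</value>
    </property>
    <!-- Optionally give slow servers more time to register; this is the
         't' described above (default 4500 ms). -->
    <property>
      <name>hbase.master.wait.on.regionservers.timeout</name>
      <value>30000</value>
    </property>

With both set, startup assignments are spread across the whole fleet instead of piling onto whichever server happened to register first.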

Long Java garbage collection pause
Like any other Java server process running with gigabytes of memory, HBase Region Servers are prone to long garbage collection pauses, especially under heavy write load. This can cause increased latency or timeouts for client requests. A more severe effect is that during such a pause the Region Server stops sending heartbeat signals to ZooKeeper, which, after a certain period of time, makes the Master believe that the Region Server has crashed. The Master then initiates the recovery process, which involves reassigning the regions currently held by the affected Region Server.
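The "certain period of time" above is the ZooKeeper session timeout. As a hedged sketch (the right value depends entirely on how long your GC pauses actually run), you can raise it in ‘hbase-site.xml’ so that a pause shorter than the timeout no longer triggers a false recovery:

    <!-- hbase-site.xml. A Region Server whose GC pause exceeds this
         timeout is declared dead and its regions are reassigned. -->
    <property>
      <name>zookeeper.session.timeout</name>
      <!-- Illustrative value in milliseconds; tune it to exceed your
           observed worst-case GC pause. -->
      <value>120000</value>
    </property>

Raising the timeout trades slower detection of genuine failures for fewer false positives, so keep it as low as your GC behavior allows.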

The long pauses themselves can be largely avoided by enabling MemStore-Local Allocation Buffers (MSLAB), which are already enabled by default from version 0.92.0 onwards. However, if you start getting ‘Out of Memory’ errors frequently, you may want to bring the total number of regions in the cluster down to a more appropriate number.
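On versions prior to 0.92.0, MSLAB has to be switched on explicitly. A minimal sketch for ‘hbase-site.xml’ on the Region Servers, with the optional chunk-size property shown at its usual 2 MB default:

    <!-- hbase-site.xml on the Region Servers. -->
    <property>
      <name>hbase.hregion.memstore.mslab.enabled</name>
      <value>true</value>
    </property>
    <!-- Size of each MSLAB chunk; 2097152 bytes (2 MB) is the usual
         default, and is also the per-region overhead mentioned above. -->
    <property>
      <name>hbase.hregion.memstore.mslab.chunksize</name>
      <value>2097152</value>
    </property>

MSLAB reduces old-generation heap fragmentation from MemStore writes, which is what drives the worst stop-the-world collections.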

Learning to minimize frequent unassigned regions, runaway Region Servers and long Java garbage collection pauses could reduce downtime significantly and add stability to your large HBase cluster.

Ref: http://www.mapr.com/blog/stabilizing-a-large-hbase-cluster
