Allow regions of specific table to be load-balanced

Description:

In our experience, a cluster can be well balanced overall and yet one table's regions may be badly concentrated on a few region servers.
For example, one table has 839 regions (380 regions at the time of table creation), of which 202 are on one server.

It would be desirable for the load balancer to distribute the regions of each specified table evenly across the cluster. Each such table has a number of regions many times the cluster size.
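
To make the skew concrete, here is a minimal, self-contained Java sketch (not HBase code; the server names and counts are invented to mirror the 839/202 example above) that counts how many regions of one table each server holds and compares that to the per-table average:

    import java.util.*;

    // Hypothetical skew check: given one table's region -> server assignment,
    // report how each server's share compares to the per-table average.
    public class TableSkewCheck {
        public static void main(String[] args) {
            // Toy assignment standing in for real cluster state:
            // the first 202 regions pile up on rs1, the rest spread over rs2..rs10
            Map<String, String> regionToServer = new HashMap<>();
            for (int i = 0; i < 839; i++) {
                regionToServer.put("region-" + i, i < 202 ? "rs1" : "rs" + (2 + i % 9));
            }

            // Count this table's regions per server
            Map<String, Integer> perServer = new TreeMap<>();
            for (String server : regionToServer.values()) {
                perServer.merge(server, 1, Integer::sum);
            }

            double average = (double) regionToServer.size() / perServer.size();
            System.out.printf("average regions/server for this table: %.1f%n", average);
            perServer.forEach((server, count) ->
                System.out.printf("%s: %d regions (%.1fx the average)%n",
                    server, count, count / average));
        }
    }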

 

Jonathan Gray's first comment was:

 

On cluster startup in 0.90, regions are assigned in one of two ways. By default, it will attempt to retain the previous assignment of the cluster. The other option, which I've also used, is round-robin, which will evenly distribute each table.

That plus the change to do round-robin on table create should probably cover per-table distribution fairly well.

I think the next step in the load balancer is a major effort to switch to something with more of a cost-based approach. I think ideally you don't need even distribution of each table; you want even distribution of load. If there is one hot table, it will get evenly balanced anyway.

One thing we could do is get rid of all random assignments and always try to do some kind of quick load balance or round-robin. It does seem like randomness always leads to one guy who gets an unfair share.
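
As a rough illustration of the round-robin idea Jonathan describes, here is a small Java sketch (hypothetical, not HBase's actual assignment code; server and region names are invented) that deals one table's regions out across the servers, so each table ends up evenly spread on its own:

    import java.util.*;

    // A minimal sketch of per-table round-robin assignment: each table's regions
    // are dealt across the servers like cards, one table at a time.
    public class RoundRobinAssign {

        static Map<String, String> assign(List<String> regions, List<String> servers) {
            Map<String, String> plan = new LinkedHashMap<>();
            for (int i = 0; i < regions.size(); i++) {
                plan.put(regions.get(i), servers.get(i % servers.size()));
            }
            return plan;
        }

        public static void main(String[] args) {
            List<String> servers = Arrays.asList("rs1", "rs2", "rs3");
            List<String> tableARegions = Arrays.asList("A,,1", "A,k1,2", "A,k2,3", "A,k3,4");
            // Assign each table independently so no single server hoards one table
            System.out.println(assign(tableARegions, servers));
            // {A,,1=rs1, A,k1,2=rs2, A,k2,3=rs3, A,k3,4=rs1}
        }
    }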

Matt Corgan suggested that consistent hashing could address this:

Have you guys considered using a consistent hashing method to choose which server a region belongs to? You would create ~50 buckets for each server by hashing serverName_port_bucketNum, and then hash the start key of each region into the buckets.

There are a few benefits:

  • when you add a server it takes an equal load from all existing servers
  • if you remove a server it distributes its regions equally to the remaining servers
  • adding a server does not cause all regions to shuffle like round robin assignment would
  • assignment is nearly random, but repeatable, so no hot spots
  • when a region splits the front half will stay on the same server, but the back half will usually be sent to another server

And a few drawbacks:

  • each server wouldn't end up with exactly the same number of regions, but they would be close
  • if a hot spot does end up developing, you can't do anything about it, at least not unless it supported a list of manual overrides
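
For reference, here is a minimal Java sketch of the bucketed consistent-hashing scheme Matt describes: roughly 50 virtual buckets per server placed on a hash ring by hashing serverName_port_bucketNum, and each region mapped to a bucket by the hash of its start key. This only illustrates the proposal; it is not anything implemented in HBase.

    import java.util.*;

    // Consistent hashing with virtual buckets, as described in the proposal above.
    public class ConsistentHashSketch {

        private final TreeMap<Integer, String> ring = new TreeMap<>();

        void addServer(String serverName, int port, int buckets) {
            for (int b = 0; b < buckets; b++) {
                // Hash serverName_port_bucketNum onto the ring
                ring.put((serverName + "_" + port + "_" + b).hashCode(), serverName);
            }
        }

        void removeServer(String serverName) {
            // Remaining buckets absorb this server's regions roughly evenly
            ring.values().removeIf(s -> s.equals(serverName));
        }

        String serverFor(String regionStartKey) {
            int h = regionStartKey.hashCode();
            Map.Entry<Integer, String> e = ring.ceilingEntry(h);
            return (e != null ? e : ring.firstEntry()).getValue();  // wrap around the ring
        }

        public static void main(String[] args) {
            ConsistentHashSketch ch = new ConsistentHashSketch();
            ch.addServer("rs1", 60020, 50);
            ch.addServer("rs2", 60020, 50);
            System.out.println(ch.serverFor("row-000042"));
            ch.addServer("rs3", 60020, 50);   // only ~1/3 of region mappings change
            System.out.println(ch.serverFor("row-000042"));
        }
    }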

Jonathan Gray explained why consistent hashing was not adopted:

I think consistent hashing would be a major step backwards for us and unnecessary because there is no cost of moving bits around in HBase. The primary benefit of consistent hashing is that it reduces the amount of data you have to physically move around. Because of our use of HDFS, we never have to move physical data around.

In your benefit list, we are already implementing almost all of these features, or if not, it is possible in the current architecture. In addition, our architecture is extremely flexible and we can do all kinds of interesting load balancing techniques related to actual load profiles not just #s of shards/buckets as we do today or as would be done with consistent hashing.

 

The fact that split regions open back up on the same server is actually an optimization in many cases: it reduces the amount of time the regions are offline, and when they come back online and do a compaction to drop references, all the files are more likely to be on the local DataNode rather than a remote one. In some cases, like time-series data, you may want the splits to move to different servers. I could imagine some configurable logic in there to ensure the bottom half goes to a different server (or maybe the top half would actually be more efficient to move away, since most of the time you'll write more to the bottom half and thus want the data locality / quick turnaround). There's likely going to be a bit of split rework in 0.92 to make it more like the ZK-based regions-in-transition.
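
The "configurable logic" imagined above might look roughly like the following Java sketch. It is purely hypothetical (placeDaughters and the server list are invented names; real split handling lives in the master and region server): keep one daughter region on the parent's server for locality, and optionally push the other daughter to a different server.

    import java.util.*;

    // Hypothetical post-split placement: one daughter stays local for a fast
    // reopen and local HDFS blocks; the other can be handed to another server,
    // which may help for time-series tables where splits otherwise pile up.
    public class SplitPlacementSketch {

        static String[] placeDaughters(String parentServer, List<String> servers,
                                       boolean moveSecondDaughterAway) {
            String first = parentServer;   // stays on the parent's server
            String second = parentServer;
            if (moveSecondDaughterAway && servers.size() > 1) {
                Random rnd = new Random();
                do {
                    second = servers.get(rnd.nextInt(servers.size()));
                } while (second.equals(parentServer));
            }
            return new String[]{first, second};
        }

        public static void main(String[] args) {
            String[] plan = placeDaughters("rs1", Arrays.asList("rs1", "rs2", "rs3"), true);
            System.out.println("daughter1 -> " + plan[0] + ", daughter2 -> " + plan[1]);
        }
    }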

As far as binding regions to servers between cluster restarts, this is already implemented and on by default in 0.90.

Consistent hashing also requires a fixed keyspace (right?) and that's a mismatch for HBase's flexibility in this regard.

 

 

For more discussion, see: https://issues.apache.org/jira/browse/HBASE-3373

 

A discussion of HBase region assignment (in Chinese): http://www.spnguru.com/?p=246
