How Many Tables Can Trafodion Support?

I don't know how many tables you have created in your own databases; in my experience it has been a few hundred at most. Recently, though, I heard of someone running tens of thousands of tables in an Oracle environment, which surprised me. Tens of thousands of tables: how much business must that represent?

That said, I recently ran into exactly this question: if Oracle has no trouble storing tens of thousands of tables, how does Trafodion fare?

Let me walk through this question.

First, Trafodion is built on HBase, which means every Trafodion table is essentially an HBase table; likewise, every Trafodion index is also an HBase table underneath. How many tables Trafodion can support therefore depends on what HBase can handle.
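To make that concrete, here is a minimal sketch (the table, index, and schema names are made up; the default schema SEABASE is assumed) showing that each DDL object maps to one HBase table:

-- Hypothetical Trafodion DDL. The table below is backed by a single
-- HBase table, which appears in the HBase shell's "list" output under
-- a name like TRAFODION.SEABASE.T1.
CREATE TABLE t1
( id   INT NOT NULL PRIMARY KEY
, name VARCHAR(40)
);

-- The index is backed by its own HBase table as well, under a similar
-- TRAFODION.SEABASE.* name.
CREATE INDEX t1_name_ix ON t1 (name);

So every table and every index you create consumes one HBase table; the count that matters to HBase is the sum of both.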

As for limits, HBase itself publishes no hard cap on the maximum number of tables; nothing in the official documentation states an upper bound on the number of tables HBase supports. In principle, then, as long as the Hadoop cluster has enough resources and enough nodes, there is no limit on the number of tables HBase can support.

However, while HBase does not explicitly limit the number of tables, it does make recommendations about the number of regions per region server. The following is the CDH site's note on region counts in HBase:

*In general, HBase is designed to run with a small (20-200) number of relatively large (5-20Gb) regions per server. The considerations for this are as follows:
9.7.1.1. Why cannot I have too many regions?
Typically you want to keep your region count low on HBase for numerous reasons. Usually right around 100 regions per RegionServer has yielded the best results. Here are some of the reasons below for keeping region count low:
1. MSLAB requires 2mb per memstore (that’s 2mb per family per region). 1000 regions that have 2 families each is 3.9GB of heap used, and it’s not even storing data yet. NB: the 2MB value is configurable.
2. If you fill all the regions at somewhat the same rate, the global memory usage makes it that it forces tiny flushes when you have too many regions which in turn generates compactions. Rewriting the same data tens of times is the last thing you want. An example is filling 1000 regions (with one family) equally and let’s consider a lower bound for global memstore usage of 5GB (the region server would have a big heap). Once it reaches 5GB it will force flush the biggest region, at that point they should almost all have about 5MB of data so it would flush that amount. 5MB inserted later, it would flush another region that will now have a bit over 5MB of data, and so on. This is currently the main limiting factor for the number of regions; see Section 15.9.2.1, “Number of regions per RS - upper bound” for detailed formula.
3. The master as is is allergic to tons of regions, and will take a lot of time assigning them and moving them around in batches. The reason is that it's heavy on ZK usage, and it's not very async at the moment (could really be improved – and has been improved a bunch in 0.96 hbase).
4. In older versions of HBase (pre-v2 hfile, 0.90 and previous), tons of regions on a few RS can cause the store file index to rise, increasing heap usage and potentially creating memory pressure or OOME on the RSs
Another issue is the effect of the number of regions on mapreduce jobs; it is typical to have one mapper per HBase region. Thus, hosting only 5 regions per RS may not be enough to get sufficient number of tasks for a mapreduce job, while 1000 regions will generate far too many tasks.*

The passage above is quoted from: http://archive-primary.cloudera.com/cdh5/cdh/5/hbase-0.98.1-cdh5.1.5/book/regions.arch.html#too_many_regions
The message is clear: HBase performs best with roughly 100 regions per region server, and too many regions can lead to performance problems.

Of course, that is only the official HBase recommendation. In our own test environments, creating a thousand or more regions per region server was workable too; HBase at least remained functional, though performance is another matter.

Trafodion's CREATE TABLE syntax includes a clause, SALT USING n PARTITIONS, which pre-splits a table at creation time into n regions. From a performance-tuning standpoint, giving a large table (say, more than ten million rows) an appropriate number of partitions lets scans and computations run with multiple ESPs in parallel, improving performance; for a small table (say, under a million rows) there is little need for multiple partitions.
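As an illustration of the syntax, here is a hedged sketch (the table and column names are hypothetical):

-- Pre-split a large table into 8 regions at creation time. The optional
-- ON clause picks which primary key columns the salt hash is computed
-- over; without it, all primary key columns are used.
CREATE TABLE sales
( order_id  LARGEINT NOT NULL
, cust_id   INT NOT NULL
, amount    DECIMAL(12,2)
, PRIMARY KEY (order_id, cust_id)
)
SALT USING 8 PARTITIONS ON (cust_id);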

Viewed this way, suppose the Hadoop cluster has 10 region servers. Following HBase's recommendation, performance is best when the total region count is around 10 * 100 = 1000. If every table is a single-partition table, that means roughly 1000 tables perform well; if tables have multiple partitions, the number of tables has to drop accordingly.
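To make that budget concrete (all numbers here are illustrative assumptions):

Assumed cluster: 10 region servers * ~100 regions each = a ~1000-region budget
- single-partition tables:     ~1000 tables fit the budget
- tables with SALT USING 4:    1000 / 4  = ~250 tables
- tables with SALT USING 10:   1000 / 10 = ~100 tables

In practice a schema mixes both kinds, so the real ceiling sits somewhere in between, and indexes count against the same budget since each index is also an HBase table.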

On choosing the number of partitions when creating a Trafodion table, the following passage is a useful reference:

HBase and therefore EsgynDB tables will split automatically as a table grows in size. However, it is useful to create a table with an appropriate number of partitions ahead of time as this can prevent splits from happening during busy periods. Also we can get consistent behavior for query plans if the number of partitions stays the same. The number of partitions a table will eventually have can be inferred by looking at the maximum size of a region (10 GB is default, but can be increased up to 100 GB) and the final maximum size a table is expected to reach. We can create the table with this many SALT partitions during the initial create. Typically a scan on a table cannot be parallelized any more than the number of regions it has. So for large tables, create the table with a sufficient number of partitions. The highest number we use currently is about 8 partitions for one table, for a single region server. For example, if a cluster has 6 region servers, we may create a table with 8*6=48 regions to improve scan performance. This has to be balanced with HBase not performing well if there are more than a few hundred regions in the cluster.
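Putting that guidance into a concrete, hypothetical example: suppose a 6-region-server cluster with the default 10 GB maximum region size, and a table expected to grow to about 480 GB. Both rules of thumb in the passage point to the same number: 480 GB / 10 GB = 48 regions, and 8 partitions per region server * 6 region servers = 48.

-- Hedged sketch (names and sizes are assumptions): pre-create the table
-- with 48 salt partitions so it does not split during busy periods and
-- scans can run with up to 48-way parallelism.
CREATE TABLE fact_orders
( order_id  LARGEINT NOT NULL
, cust_id   INT NOT NULL
, order_ts  TIMESTAMP(6) NOT NULL
, amount    DECIMAL(12,2)
, PRIMARY KEY (order_id)
)
SALT USING 48 PARTITIONS;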
