5. HBase advanced topics: table design schema

My study notes and summary are below.

 

Part 1: Table attributes

 

| attr | default | usage/principle | use case | note |
| --- | --- | --- | --- | --- |
| Bloom filter | disabled | costs some memory to improve lookup time (TBD) | TBD: tables with huge range scans? (Bloom filters mainly help point lookups such as Get) | this attr is one of 'row', 'row-col', or none (NONE, ROW, ROWCOL) |
| Column families | | | | must be a printable string, since it is used as the directory name under the region name |
| Maximum file size | 10G in 0.94.2 | | | in fact this is maxStoreSize, i.e. the property 'hbase.hregion.max.filesize' set in hbase-site.xml |
| Read-only | false | like firmware, keeps the table safe from changes | a 'dead' table that never changes | |
| Memstore flush size | 128m in 0.94.2 | same effect as the property 'hbase.hregion.memstore.flush.size' in hbase-site.xml | | 1. this value determines how often store files are generated; 2. as a result of 1, it affects the HLog replay time when a region server goes down |
| Deferred log flush | false | if true, edits are buffered in memory and synced to the log at the interval given by 'hbase.regionserver.optionallogflushinterval'; if false, each edit is synced before the write returns | | true may cause data loss, since the buffered edits exist only in memory before they are synced to the fs |
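The two notes on Memstore flush size can be made concrete with a little arithmetic. The sketch below is only a rough model and not HBase code: it assumes a single region with one column family receiving writes at a steady rate, and the 1 MB/s write rate and the function name `flush_interval_seconds` are made up for illustration. It ignores the global memstore limits and compactions.

```python
MB = 1024 * 1024


def flush_interval_seconds(flush_size_bytes: int, write_rate_bytes_per_sec: float) -> float:
    """Approximate seconds between memstore flushes for one region.

    Simplified model: the memstore fills at a constant rate and is flushed
    to a new store file exactly when it reaches the flush size.
    """
    if write_rate_bytes_per_sec <= 0:
        raise ValueError("write rate must be positive")
    return flush_size_bytes / write_rate_bytes_per_sec


# Hypothetical steady write rate of 1 MB/s into one region.
for flush_mb in (64, 128, 256):
    interval = flush_interval_seconds(flush_mb * MB, 1 * MB)
    print(f"flush size {flush_mb} MB -> about one store file every {interval:.0f} s")
```

Under this model, a larger flush size means fewer and larger store files, but also more unflushed edits (up to roughly the flush size per region) that must be replayed from the HLog if the region server goes down.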

     
     

 

 

 

 

Part 2: Column family attributes

| attr | default | usage/principle | use case | note |
| --- | --- | --- | --- | --- |
| In-memory | false | caches some blocks of a small family in memory to speed up queries | small tables, e.g. something analogous to a secondary index table | no guarantee of when or how many blocks are cached |
| Bloom filter | | see Part 1 | | |
| Replication scope | 0 (disabled) | syncs local cluster data to remote clusters (TBD) | load balancing by distributing requests across clusters? | |
| Maximum versions | 3 | controls how many versions (changes) of a cell are kept in storage | use 1 in general; if you also need to check the previous version, 2 is a good choice | interacts with 'Time-to-live' |
| Compression | none | compresses this family if specified | SNAPPY, LZO, GZ, ... | be completely clear about your requirements, then choose the corresponding codec |
| Block size | 64k | a store file is split into blocks, so a smaller block gives faster random reads; use a bigger block for sequential reads (TBD) | | |
| Block cache | true | when rows are read from HBase, this determines whether the blocks are written back to the cache to speed up later access | use 'true' if clients repeatedly access the same rows; 'false' for whole-table scans or for systems with fewer reads than writes | |
| Time-to-live | max int (in seconds) | how long a cell value is kept in storage | for a 'recycled' (i.e. rolling) system, use an appropriate value to keep the data size bounded | interacts with 'Maximum versions': both attributes control how many versions of the data are kept |
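To see how 'Maximum versions' and 'Time-to-live' limit the kept versions together, here is a small model. It is a simplified sketch and not the HBase implementation: `retained_versions` is a hypothetical helper, timestamps are plain seconds, and it assumes a version survives only if it is both younger than the TTL and among the newest 'Maximum versions' entries.

```python
def retained_versions(timestamps, now, max_versions, ttl_seconds):
    """Return the timestamps of versions that survive both limits, newest first.

    Simplified model: drop every version older than the TTL, then keep
    at most max_versions of the remaining ones.
    """
    alive = [ts for ts in timestamps if now - ts < ttl_seconds]
    alive.sort(reverse=True)
    return alive[:max_versions]


now = 1000
writes = [100, 700, 900, 950]  # hypothetical write times of one cell, in seconds

print(retained_versions(writes, now, max_versions=3, ttl_seconds=500))  # [950, 900, 700]
print(retained_versions(writes, now, max_versions=1, ttl_seconds=500))  # [950]
print(retained_versions(writes, now, max_versions=3, ttl_seconds=80))   # [950]
```

In the last call the TTL is the tighter limit: three versions are allowed, but only one is young enough to be kept.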
     

 

 

 Ref:

HBase: The Definitive Guide

 

 

 
