Things to Know About HBase RowKey Design

柳家河

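Before getting into RowKey design, let me answer a question that often comes up when configuring HBase: does HBase need a separately managed ZooKeeper? If you are deploying only HBase, I recommend against a separate ZooKeeper; the ZooKeeper that ships with HBase is enough. If you also plan to deploy other applications, such as Spark, you can set up a standalone ZooKeeper cluster. With that out of the way, let's talk about RowKey design.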

First, HBase's Underlying Architecture

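For newcomers, RowKey design is unfamiliar territory. It looks simple but is actually quite complex. RowKey design mainly affects two things: the dimensions you can analyze by, and query performance. Why these two? Let's look back at the HBase system architecture diagram:

[Figure: HBase system architecture]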

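A RowKey design that puts a CompanyCode field in front of a TimeStamp field looks fine at first glance, but it hides many pitfalls. If CompanyCode is nearly constant while TimeStamp varies widely, the rows are stored contiguously in a single Region. Once the data volume grows, that one Region ends up holding most of the data, and applications that read it run into RPC timeouts or fail outright with errors.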
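To make the hotspot concrete, here is a minimal sketch that models region lookup in plain Java, with no HBase dependency. The region start keys and the sample CompanyCode `1001` are made up for illustration; a real table's split points will differ. It only shows that keys sharing a fixed prefix sort next to each other and therefore fall into the same key range.

```java
import java.util.Map;
import java.util.TreeMap;

/** Toy model of region lookup: each region owns a contiguous, sorted range of row keys. */
final class HotspotDemo {

    // Hypothetical region start keys, in sorted order (illustrative only).
    static final String[] REGION_START_KEYS = {"", "3000", "6000", "9000"};

    /** Returns the index of the last region whose start key is <= rowKey. */
    static int regionFor(String rowKey) {
        int region = 0;
        for (int i = 0; i < REGION_START_KEYS.length; i++) {
            // For ASCII keys, String.compareTo orders the same way as byte-wise comparison.
            if (rowKey.compareTo(REGION_START_KEYS[i]) >= 0) {
                region = i;
            }
        }
        return region;
    }

    /** Counts how many keys of the form companyCode + timestamp land in each region. */
    static Map<Integer, Integer> distribute(String companyCode, long firstTimestamp, int count) {
        Map<Integer, Integer> perRegion = new TreeMap<>();
        for (int i = 0; i < count; i++) {
            String rowKey = companyCode + (firstTimestamp + i);
            perRegion.merge(regionFor(rowKey), 1, Integer::sum);
        }
        return perRegion;
    }

    public static void main(String[] args) {
        // Every key starts with "1001", so every key maps to the same region.
        System.out.println(distribute("1001", 20160101000000L, 1000));
    }
}
```

In this toy model, all 1000 keys land in region 0 while the other three regions stay empty, which is the concentration described above.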

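A Sound Approach to RowKey Design

For the reasons above, we need to consider two factors, single-point concentration (hotspotting) and data queries, and the RowKey design has to address both.

First, single-point concentration. It usually comes from one of the following causes: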

- The leading characters of the RowKey are too uniform

- The cluster has too few nodes

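The number of cluster nodes is limited by our hardware resources, so we leave it aside and focus on RowKey design. Since the problem is that the leading characters are too concentrated, we can prepend a random-looking string (a salt) to the RowKey. The following salt calculation is adapted from the book HBase Essentials: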

int saltNumber = Math.floorMod(Long.hashCode(timestamp), numberOfRegionServers);

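With this method, data written within a given period is deliberately scattered at insert time across the Regions of the different RegionServers, making full use of the distributed architecture. This not only speeds up reads and writes but also solves the data concentration problem.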
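As a rough illustration of the scattering effect, the sketch below computes the salt for a run of consecutive timestamps and counts how many fall into each bucket. The class and method names, the bucket count of 4, and the sample epoch-millisecond timestamp are assumptions for this example and are not taken from the book. `Math.floorMod` is used so the salt stays non-negative even when the hash code is negative.

```java
import java.util.Arrays;

/** Sketch: spread consecutive timestamps over a fixed number of salt buckets. */
final class SaltDemo {

    /** Maps a timestamp to a salt in the range [0, buckets). */
    static int saltFor(long timestamp, int buckets) {
        // floorMod never returns a negative value for a positive divisor,
        // unlike the % operator, which can when the hash code is negative.
        return Math.floorMod(Long.hashCode(timestamp), buckets);
    }

    /** Counts how many of `count` consecutive timestamps fall into each bucket. */
    static int[] histogram(long firstTimestamp, int count, int buckets) {
        int[] perBucket = new int[buckets];
        for (int i = 0; i < count; i++) {
            perBucket[saltFor(firstTimestamp + i, buckets)]++;
        }
        return perBucket;
    }

    public static void main(String[] args) {
        // 1000 consecutive millisecond timestamps spread over 4 hypothetical buckets.
        System.out.println(Arrays.toString(histogram(1451606400000L, 1000, 4)));
    }
}
```

Because the salt is derived from the timestamp rather than drawn at random, a reader can recompute it for a known timestamp. A range query over time, however, has to cover every salt value.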

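The Improved RowKey Design

Based on the discussion above, we arrive at the following RowKey design:

Random prefix (2 characters) + timestamp (14 characters) + CompanyCode (4 characters)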

In my own tests comparing the two designs, the MapReduce job took 1 hour with the former and only 5 minutes with the latter.
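The 2 + 14 + 4 layout could be assembled along the following lines. This is only a sketch under stated assumptions: the 14-character time part is taken to be `yyyyMMddHHmmss`, the salt is derived from the UTC epoch second of that time, and the `RowKeyBuilder` name and the `buckets` parameter are hypothetical.

```java
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

/** Sketch of a builder for keys shaped as salt(2) + time(14) + companyCode(4). */
final class RowKeyBuilder {

    // Assumed format of the 14-character time part.
    private static final DateTimeFormatter TIME_FORMAT =
            DateTimeFormatter.ofPattern("yyyyMMddHHmmss");

    static String build(LocalDateTime time, String companyCode, int buckets) {
        if (companyCode.length() != 4) {
            throw new IllegalArgumentException("companyCode must be 4 characters");
        }
        if (buckets < 1 || buckets > 100) {
            throw new IllegalArgumentException("buckets must fit in 2 digits");
        }
        // Assumption: the salt is the hash of the UTC epoch second, modulo the bucket count.
        long epochSecond = time.toEpochSecond(ZoneOffset.UTC);
        int salt = Math.floorMod(Long.hashCode(epochSecond), buckets);
        // Zero-pad the salt to 2 digits so every key has the same length.
        return String.format(Locale.ROOT, "%02d", salt) + time.format(TIME_FORMAT) + companyCode;
    }

    public static void main(String[] args) {
        System.out.println(build(LocalDateTime.of(2016, 1, 1, 12, 30, 45), "1001", 4));
    }
}
```

Fixed-width fields keep the key length constant at 20 characters, so each component can be recovered by position when reading rows back.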

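Writing Query Code Sensibly

Once the data is stored, retrieving part of it requires configuring a Scan. Below is part of the code I used in practice, for reference only: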

 
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.conf.Configured;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.filter.CompareFilter.CompareOp;
    import org.apache.hadoop.hbase.filter.RegexStringComparator;
    import org.apache.hadoop.hbase.filter.RowFilter;
    import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
    import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.util.Tool;

    // ConstantValues, RowKeyGenerator, KPIExtractor, KPIMapper and KPIReducer
    // are project classes that are not shown in this article.
    public class HBaseTableDriver extends Configured implements Tool {

        public int run(String[] arg0) throws Exception {
            // All five arguments are read below (arg0[4] is calid), so require exactly five.
            if (arg0.length != 5)
                throw new IllegalArgumentException("The input argument need:start && stop && farmid && turbineNum && calid");
            if (arg0[0].length() != 8 || arg0[1].length() != 8)
                throw new IllegalArgumentException("The date format should be yyyyMMdd");

            Configuration conf = HBaseConfiguration.create();
            conf.set("hbase.zookeeper.quorum", ConstantValues.QUOREM);
            conf.set("hbase.zookeeper.property.clientPort", ConstantValues.CLIENT_PORT);

            //extract table && tagid && start time && end time
            conf.set("start", arg0[0]);
            conf.set("stop", arg0[1]);
            conf.set("farmid", arg0[2]);
            conf.set("turbineNum", arg0[3]);
            conf.set("calid", arg0[4]);

            // Scan bounds: salt prefix + date + time + farm id + turbine number.
            String startRow = "0" + arg0[0] + " 000000" + arg0[2] + "001";
            String stopRow = "2" + arg0[1] + " 235959" + arg0[2]
                    + RowKeyGenerator.addZero(Integer.parseInt(arg0[3]));
            String targetKpiTableName = "kpi2";

            Job job = Job.getInstance(conf, "KPIExtractor");
            job.setJarByClass(KPIExtractor.class);
            job.setNumReduceTasks(6);

            Scan scan = new Scan();
            scan.addColumn("f".getBytes(), "v".getBytes());

            // One salt digit, the year of the start or stop date, then 17 more digits;
            // the switch below appends the 3-digit tag suffix selected by calid.
            String regEx = "^\\d{1}(?:" + arg0[0].substring(0, 4) + "|"
                    + arg0[1].substring(0, 4) + ")\\d{17}";

            switch (arg0[4]) {
            case "1":
                regEx = regEx + "(?:823|834)$";
                startRow = startRow + "823";
                stopRow = stopRow + "834";
                break;
            case "2":
                regEx = regEx + "211$";
                startRow = startRow + "211";
                stopRow = stopRow + "211";
                break;
            case "3":
                regEx = regEx + "544$";
                startRow = startRow + "544";
                stopRow = stopRow + "544";
                break;
            case "4":
                regEx = regEx + "208$";
                startRow = startRow + "208";
                stopRow = stopRow + "208";
                break;
            case "5":
                regEx = regEx + "(?:739|823)$";
                startRow = startRow + "739";
                stopRow = stopRow + "823";
                break;
            case "6":
                regEx = regEx + "(?:211|823)$";
                startRow = startRow + "211";
                stopRow = stopRow + "823";
                break;
            case "7":
                regEx = regEx + "708$";
                startRow = startRow + "708";
                stopRow = stopRow + "708";
                break;
            case "8":
                regEx = regEx + "822$";
                startRow = startRow + "822";
                stopRow = stopRow + "822";
                break;
            case "9":
                regEx = regEx + "211$";
                startRow = startRow + "211";
                stopRow = stopRow + "211";
                break;
            default:
                throw new IllegalArgumentException("UnKnown Argument calid:" + arg0[4] + ",it should be between 1~9");
            }

            // HBase 1.x API; in HBase 2.x use withStartRow/withStopRow and CompareOperator instead.
            scan.setStartRow(startRow.getBytes());
            scan.setStopRow(stopRow.getBytes());
            scan.setFilter(new RowFilter(CompareOp.EQUAL, new RegexStringComparator(regEx)));

            TableMapReduceUtil.initTableMapperJob("hellowrold", scan, KPIMapper.class,
                    ImmutableBytesWritable.class, ImmutableBytesWritable.class, job);
            TableMapReduceUtil.initTableReducerJob(targetKpiTableName, KPIReducer.class, job);

            // Report job failure through the exit code instead of always returning 0.
            return job.waitForCompletion(true) ? 0 : 1;
        }
    }
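Because every key starts with a salt, a time range has to be expressed once per salt value. The code above covers this with one wide range plus a regex filter. An alternative, sketched below in plain Java, is to compute one narrow start/stop pair per salt and run a separate Scan for each pair. The 2-digit salt layout, the `SaltedScanRanges` name, and the bucket count are assumptions for illustration, and the sketch only builds the key strings without touching the HBase API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Sketch: one [startRow, stopRow) pair per salt prefix for a time-range query. */
final class SaltedScanRanges {

    /**
     * startTime and stopTime are yyyyMMddHHmmss strings. The stop row is treated
     * as exclusive, matching the default behaviour of an HBase Scan.
     */
    static List<String[]> ranges(String startTime, String stopTime, int buckets) {
        List<String[]> result = new ArrayList<>();
        for (int salt = 0; salt < buckets; salt++) {
            // Zero-padded 2-digit salt, the same width as in the stored keys.
            String prefix = String.format(Locale.ROOT, "%02d", salt);
            result.add(new String[] {prefix + startTime, prefix + stopTime});
        }
        return result;
    }

    public static void main(String[] args) {
        for (String[] range : ranges("20160101000000", "20160102000000", 4)) {
            System.out.println(range[0] + " .. " + range[1]);
        }
    }
}
```

Each pair could then serve as the start and stop row of its own Scan. Whether this beats a single filtered scan depends on the data volume and the number of buckets, so it is worth measuring on real data.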