HBase in Business Practice

驰驰的老爸

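Intended audience

In 2012, driven by business requirements, we migrated our underlying database from MySQL to HBase, and along the way we lived through the HBase client's move from 0.92 to 0.94. This article sums up how we use HBase in business code, in the hope of helping teams that are just starting out with HBase and lowering the barrier to entry.

Prerequisites

- HBase tutorial: a working knowledge of HBase
- MySQL basics: a working knowledge of MySQL
- Java basics: a working knowledge of Java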

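Why HBase

Advantages:

- Goodbye to database and table sharding. Goodbye, TDDL.
- Higher read and write performance.

Drawbacks:

- No SQL
- No ORM tools such as iBATIS or Hibernate; ORM support for HBase is not yet mature
- RowKey design is demanding
- You have to build index tables yourself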

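Using HBase

Building a singleton HBaseFactory

To build a singleton HBaseFactory, we only need to care about three things:

- hbase.zookeeper.quorum
- zookeeper.znode.parent
- the maxSize of the HTablePool

We build the HBaseFactory object on top of HTablePool.

Why HTablePool

Think of HTablePool as a JDBC connection pool: it suits multithreaded environments, and to "return" a connection to the pool you simply call HTableInterface#close().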

The HBaseFactory interface

    import org.apache.hadoop.hbase.HTableDescriptor;
    import org.apache.hadoop.hbase.client.HTableInterface;

    public interface HBaseFactory {
        /**
         * Get a table by its tableName.
         */
        HTableInterface getHTable(String tableName);

        /**
         * Close a table, which returns it to the pool.
         */
        void closeHTable(HTableInterface hTableInterface);

        /** only for unit test */
        boolean deleteTable(String tableName);

        /** only for unit test */
        HTableDescriptor createTable(String tableName, int maxVersion);
    }

The HBaseFactory implementation

    // Imports assume the HBase 0.94 client, Guava, commons-lang and SLF4J;
    // adjust the packages to the versions you actually depend on.
    import static com.google.common.base.Preconditions.checkArgument;
    import static com.google.common.base.Preconditions.checkNotNull;
    import static org.apache.commons.lang.StringUtils.isNotBlank;

    import java.io.IOException;

    import javax.annotation.PreDestroy;
    import javax.inject.Inject; // or com.google.inject.Inject, depending on the DI framework

    import org.apache.commons.lang.StringUtils;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HColumnDescriptor;
    import org.apache.hadoop.hbase.HTableDescriptor;
    import org.apache.hadoop.hbase.client.HBaseAdmin;
    import org.apache.hadoop.hbase.client.HTableInterface;
    import org.apache.hadoop.hbase.client.HTablePool;
    import org.apache.hadoop.hbase.io.hfile.Compression;
    import org.apache.hadoop.hbase.regionserver.StoreFile;
    import org.apache.hadoop.hbase.util.Bytes;
    import org.slf4j.Logger;
    import org.slf4j.LoggerFactory;

    import com.google.common.io.Closeables;

    public class HBaseFactoryImpl implements HBaseFactory {
        private static final Logger logger = LoggerFactory.getLogger(HBaseFactoryImpl.class);

        private HTablePool hTablePool = null;
        private HBaseAdmin hBaseAdmin = null;

        @Inject
        public HBaseFactoryImpl(String quorum, String parent, int maxSize) {
            checkArgument(isNotBlank(quorum));
            checkArgument(isNotBlank(parent));
            Configuration conf = HBaseConfiguration.create();
            conf.set("hbase.zookeeper.quorum", quorum);
            conf.set("zookeeper.znode.parent", parent);
            conf.set("hbase.client.retries.number", "5");
            conf.set("hbase.client.pause", "200");
            conf.set("ipc.ping.interval", "3000");
            conf.setBoolean("hbase.ipc.client.tcpnodelay", true);
            hTablePool = new HTablePool(conf, maxSize);
            try {
                hBaseAdmin = new HBaseAdmin(conf);
            } catch (Exception e) {
                logger.error(e.getMessage(), e);
                throw new IllegalStateException(e);
            }
        }

        // Not declared in HBaseFactory, so it carries no @Override.
        public HBaseAdmin getHBaseAdmin() {
            return checkNotNull(hBaseAdmin);
        }

        @Override
        public HTableInterface getHTable(String tableName) {
            checkArgument(isNotBlank(tableName));
            return checkNotNull(hTablePool.getTable(tableName));
        }

        @Override
        public void closeHTable(HTableInterface hTableInterface) {
            // Closing a pooled table hands it back to the HTablePool.
            Closeables.closeQuietly(hTableInterface);
        }

        @Override
        public boolean deleteTable(String tableName) {
            checkArgument(isNotBlank(tableName));
            try {
                // A table must be disabled before it can be deleted.
                hBaseAdmin.disableTable(tableName);
                hBaseAdmin.deleteTable(tableName);
            } catch (IOException e) {
                logger.error(e.getMessage(), e);
                return false;
            }
            return true;
        }

        @Override
        public HTableDescriptor createTable(String tableName, int maxVersion) {
            return createTable(tableName, "cf", 0, maxVersion, null, null,
                    null, 0);
        }

        protected HTableDescriptor createTable(
                String tableName, String columnFamily, int lifetime,
                int maxVersion, StoreFile.BloomType bloomType, String startKey,
                String endKey, int numRegions) {
            try {
                checkArgument(!checkNotNull(hBaseAdmin).tableExists(tableName),
                        "the table [%s] should not exist.", tableName);
            } catch (IOException e) {
                logger.error(e.getMessage(), e);
                throw new IllegalStateException(e);
            }
            HColumnDescriptor cf = getCF(columnFamily, lifetime, maxVersion,
                    bloomType);
            HTableDescriptor table = new HTableDescriptor(tableName);
            table.addFamily(cf);
            try {
                // Pre-split the table only when a full key range and region count are given.
                if (StringUtils.isNotBlank(startKey)
                        && StringUtils.isNotBlank(endKey) && numRegions > 0) {
                    hBaseAdmin.createTable(table, Bytes.toBytes(startKey),
                            Bytes.toBytes(endKey), numRegions);
                } else {
                    hBaseAdmin.createTable(table);
                }
            } catch (IOException e) {
                logger.error(e.getMessage(), e);
                throw new IllegalStateException(e);
            }
            return describeTable(tableName);
        }

        private HColumnDescriptor getCF(String columnFamily, int lifetime,
                int maxVersion, StoreFile.BloomType bloomType) {
            HColumnDescriptor cf = new HColumnDescriptor(columnFamily);
            cf.setCompactionCompressionType(Compression.Algorithm.LZO);
            cf.setCompressionType(Compression.Algorithm.LZO);
            if (maxVersion > 0) {
                // Cap the number of versions at 1,000,000.
                cf.setMaxVersions(Math.min(maxVersion, 1000000));
            }
            if (lifetime > 0) {
                cf.setTimeToLive(lifetime);
            }
            // Default to a row-level Bloom filter when none is specified.
            cf.setBloomFilterType(bloomType != null ? bloomType : StoreFile.BloomType.ROW);
            return cf;
        }

        public HTableDescriptor describeTable(String tableName) {
            try {
                return checkNotNull(hBaseAdmin).getTableDescriptor(Bytes.toBytes(tableName));
            } catch (Exception e) {
                logger.error(e.getMessage(), e);
                throw new IllegalStateException(e);
            }
        }

        @PreDestroy
        public void destroy() throws Exception {
            Closeables.closeQuietly(hTablePool);
            Closeables.closeQuietly(hBaseAdmin);
        }
    }

Usage

 
    HTableInterface hTableInterface = null;
    try {
        hTableInterface = hBaseFactory.getHTable("YOUR_TABLE_NAME");
        // code here …
    } finally {
        hBaseFactory.closeHTable(hTableInterface);
    }

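Scan

StartRow & Cache

If you do not set a StartRow, the scan starts from the beginning of the table, which is very slow.
Scanner caching lets one request pull a batch of rows into client memory; without it, every step of the iterator costs a network round trip.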
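Filters

PrefixFilter is the most commonly used filter, and one point needs special attention: if the RowKey looks like "123_1_00000" and the prefix you want is 123_1, always write it as 123_1_ (with the trailing delimiter); otherwise keys such as 123_10_00000 match as well.
Also, do not stack too many filters; preferably no more than two.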
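The delimiter pitfall is easy to reproduce without a cluster. The sketch below is plain Java, not HBase code: it assumes row keys of a hypothetical `group_type_sequence` shape and uses `String.startsWith` as a stand-in for the byte-prefix comparison that a prefix filter performs. The class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class PrefixPitfall {
    /** Returns the keys that start with the given prefix, the way a byte-prefix match would. */
    public static List<String> matchPrefix(List<String> rowKeys, String prefix) {
        List<String> hits = new ArrayList<String>();
        for (String key : rowKeys) {
            if (key.startsWith(prefix)) {
                hits.add(key);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Hypothetical row keys: 123_1_* and 123_10_* belong to different groups.
        List<String> rowKeys = Arrays.asList(
                "123_1_00000", "123_1_00001", "123_10_00000", "123_2_00000");

        // Without the trailing delimiter, 123_10_00000 is matched as well.
        System.out.println(matchPrefix(rowKeys, "123_1"));
        // prints [123_1_00000, 123_1_00001, 123_10_00000]

        // With the trailing delimiter, only the intended group is matched.
        System.out.println(matchPrefix(rowKeys, "123_1_"));
        // prints [123_1_00000, 123_1_00001]
    }
}
```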
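Pagination

Pagination is common in MySQL. In HBase, implement it with PageFilter combined with startRow. MySQL pagination usually comes with a total row count, but in HBase you should never run Count-style operations.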
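The paging pattern can be sketched on an in-memory `TreeMap` instead of a real table, an assumption made so that the example runs without a cluster. The `TreeMap` plays the role of the sorted table, `pageSize` the role of the page limit, and the hypothetical helper `nextStartRow` appends the smallest possible character to the last key returned, so the next page starts strictly after it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

final class PagedScanSketch {
    /** Reads up to pageSize keys starting at startRow (inclusive), like a scan with a page limit. */
    public static List<String> readPage(NavigableMap<String, String> table,
                                        String startRow, int pageSize) {
        List<String> page = new ArrayList<String>();
        for (String rowKey : table.tailMap(startRow, true).keySet()) {
            if (page.size() == pageSize) {
                break;
            }
            page.add(rowKey);
        }
        return page;
    }

    /** Smallest key strictly greater than lastRow: the start row of the next page. */
    public static String nextStartRow(String lastRow) {
        return lastRow + '\u0000';
    }

    public static void main(String[] args) {
        NavigableMap<String, String> table = new TreeMap<String, String>();
        for (int i = 0; i < 5; i++) {
            table.put("row_" + i, "value_" + i);
        }
        String startRow = "";
        List<String> page;
        // Walk the table page by page without ever counting the total number of rows.
        while (!(page = readPage(table, startRow, 2)).isEmpty()) {
            System.out.println(page);
            startRow = nextStartRow(page.get(page.size() - 1));
        }
        // prints [row_0, row_1], then [row_2, row_3], then [row_4]
    }
}
```

The client only remembers the last row key it has seen, which is why no total count is needed. On a real cluster the page limit is, as far as I know, applied separately on each region server, so the client should still stop reading once it has `pageSize` rows.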
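Distributed stream processing

Suppose 10 machines need to process 10 million rows concurrently. This is where checkAndPut comes in; it works like an optimistic lock in MySQL.

The concrete steps are:

- Fetch a batch of rows with PageFilter and SingleColumnValueFilter combined with startRow
- Use checkAndPut to mark each row as being processed
- Finally, use put to mark the row as processed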
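The claim step is a compare-and-set. As a runnable illustration that needs no cluster, the sketch below imitates it with `ConcurrentMap.replace(key, expected, newValue)`, which, like checkAndPut, only writes when the current value equals the expected one. The status values `NEW`, `PROCESSING` and `DONE` and all class and method names are assumptions for illustration.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

final class ClaimSketch {
    /** Atomically moves a row from NEW to PROCESSING; only one caller can succeed per row. */
    public static boolean tryClaim(ConcurrentMap<String, String> status, String rowKey) {
        return status.replace(rowKey, "NEW", "PROCESSING");
    }

    /** Runs several workers over the same rows and returns how many rows were processed. */
    public static int processAll(final ConcurrentMap<String, String> status, int workers) {
        final AtomicInteger processed = new AtomicInteger();
        final CountDownLatch done = new CountDownLatch(workers);
        for (int w = 0; w < workers; w++) {
            new Thread(new Runnable() {
                @Override
                public void run() {
                    for (String rowKey : status.keySet()) {
                        if (tryClaim(status, rowKey)) {   // mark the row as being processed
                            processed.incrementAndGet();  // the real work would happen here
                            status.put(rowKey, "DONE");   // mark the row as processed
                        }
                    }
                    done.countDown();
                }
            }).start();
        }
        try {
            done.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return processed.get();
    }

    public static void main(String[] args) {
        ConcurrentMap<String, String> status = new ConcurrentHashMap<String, String>();
        for (int i = 0; i < 1000; i++) {
            status.put("row_" + i, "NEW");
        }
        // Ten workers compete, yet every row is processed exactly once.
        System.out.println(processAll(status, 10)); // prints 1000
    }
}
```

A worker that loses the race simply moves on to the next row. A real deployment would also need a way to recover rows left in `PROCESSING` by a crashed worker, which this sketch does not cover.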

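HBase lessons from production

Backward compatibility

During development you will inevitably need to add fields, so both code and data must stay backward compatible.

Say we add a new column. Because the column is new, existing rows have no value for it, and reading it from HBase returns null. So when writing HBase code, keep in mind:

- Always null-check data read from HBase; if it is null, fall back to a default value
- Default values in the DO classes that model the data should preferably not be null; null should never exist in the code
- Once more: storing data as String is strongly recommended, because human-readable data helps you solve many problems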

 
   
 
 
  
    // firstNonNull is Guava's Objects#firstNonNull: if the first argument is null, it returns the second
    int value = Integer.parseInt(new String(firstNonNull(result.getValue(DEFAULT_COLUMN_FAMILIES, COLUMN), new byte[]{'0'})));
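For readers who do not use Guava, the same fallback can be written with the standard library alone. The helper name `intOrDefault` and the class below are hypothetical; the sketch assumes the cell holds a String-encoded integer, as recommended above.

```java
import java.nio.charset.StandardCharsets;

final class CellDefaults {
    /** Decodes a String-encoded integer cell; a missing (null) cell yields defaultValue. */
    public static int intOrDefault(byte[] cell, int defaultValue) {
        if (cell == null) {
            return defaultValue;
        }
        return Integer.parseInt(new String(cell, StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        byte[] stored = "42".getBytes(StandardCharsets.UTF_8);
        System.out.println(intOrDefault(stored, 0)); // prints 42
        // A row written before the column existed has no cell for it.
        System.out.println(intOrDefault(null, 0));   // prints 0
    }
}
```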
       
 
 

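RowKey design

Prefer String

Unless there is a special requirement, RowKeys should be Strings.
- Convenient for querying data and troubleshooting from the Shell in production
- Easier to distribute data evenly
- No need to worry about storage cost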
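Keep the RowKey as short as possible

If the RowKey is too long, first, storage overhead increases and storage efficiency suffers; second, long RowKeys held in memory lower memory utilization, which in turn lowers the index hit rate.
The usual approach:
- Represent time as a Long
- Use compact encodings wherever possible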
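Hash the RowKey

The most important thing in RowKey design is even distribution, so that the data does not all sit in one region and read/write load does not concentrate on a few regions.
Suppose we need to store all of a user's Weibo posts (ignoring reverse-time ordering for now) and design the RowKey as UserId_WeiboId. With this design the UserId distribution is likely to be uneven, because RowKeys are sorted as strings.

There are two ways to solve this:
- Reverse: store the UserId string reversed
- Hash or Mod: take the first 6 characters of the MD5 of the UserId and prepend them to the UserId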
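Both options can be sketched in plain Java. The class and method names are hypothetical, and the `_` separator between the hash prefix and the UserId is an assumption; the text only says the 6-character prefix goes in front of the UserId.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class RowKeySalting {
    /** Option 1: reverse the UserId so that its fast-changing low digits come first. */
    public static String reversedKey(String userId, String weiboId) {
        return new StringBuilder(userId).reverse().toString() + "_" + weiboId;
    }

    /** Option 2: prefix the UserId with the first 6 hex characters of its MD5. */
    public static String hashedKey(String userId, String weiboId) {
        return md5Hex(userId).substring(0, 6) + "_" + userId + "_" + weiboId;
    }

    static String md5Hex(String input) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(input.getBytes(StandardCharsets.UTF_8));
            // %032x keeps leading zeros, so the hex string is always 32 characters long.
            return String.format("%032x", new BigInteger(1, digest));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is not available on this JVM", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(reversedKey("12345", "900")); // prints 54321_900
        // The prefix is derived from the UserId alone, so all posts of one user stay adjacent.
        System.out.println(hashedKey("12345", "900"));
        System.out.println(hashedKey("12345", "901"));
    }
}
```

Because the prefix can be recomputed from the UserId, a reader can rebuild the full key prefix for a given user and still scan that user's posts with a single startRow.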
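RowKey ordering

Suppose many users post on Weibo and we now want to open a "square" where all posts are shown in reverse chronological order. We then have to build an index table for the original UserId_WeiboId table, and the RowKey of this index table must be time-related.
- For the RowKey, use a reversed timestamp as the prefix: the long value of (a fixed upper bound such as Long.MAX_VALUE - post time), so that newer posts sort first
- RowKey hashing
- If the data can be purged periodically: when the data does not have to be kept forever, even if it all lands in one region, time-based searches specify a startRow and the RowKeys are stored contiguously, so it is still very fast. Still, agree on the data volume with your DBA
- If all data must be kept: use DayOfMonth as the prefix
  The RowKey then becomes DayOfMonth_(Long.MAX_VALUE - post time)

  This does make the code somewhat more awkward to implement, though.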
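A minimal sketch of such a time-ordered key, assuming the subtraction base is the fixed constant `Long.MAX_VALUE` (a common convention for reversed timestamps, chosen here so that a key never changes after it is written). The class and method names are hypothetical, and the zero padding is an addition that keeps string order identical to numeric order.

```java
final class TimelineKey {
    /** Reversed timestamp, zero-padded to 19 digits so that string order equals numeric order. */
    public static String reversedTimestamp(long postTimeMillis) {
        return String.format("%019d", Long.MAX_VALUE - postTimeMillis);
    }

    /** DayOfMonth_reversedTimestamp, with a two-digit day so that "02" sorts before "10". */
    public static String dayPrefixedKey(int dayOfMonth, long postTimeMillis) {
        return String.format("%02d_%s", dayOfMonth, reversedTimestamp(postTimeMillis));
    }

    public static void main(String[] args) {
        long older = 1000L;
        long newer = 2000L;
        // The newer post gets the smaller key, so a forward scan returns it first.
        System.out.println(reversedTimestamp(newer).compareTo(reversedTimestamp(older)) < 0); // prints true
        System.out.println(dayPrefixedKey(5, newer).startsWith("05_")); // prints true
    }
}
```

The awkward part mentioned above shows up on the read side: with a day prefix, one logical timeline query becomes one scan per day prefix, and the client has to merge the results.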
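Transactions

HBase Put and Delete operations are currently atomic at the single-row level, but if you want a series of operations across several tables to be transactional, there is no good solution yet. So when using HBase, be prepared to detect and repair inconsistent data.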