Investigating Snappy Compression Support for HBase

Snappy compression has been merged into the official 0.92 release. On 0.90, I applied the patch hbase-snappy-3691-trunk.patch from HBASE-3691 (https://issues.apache.org/jira/browse/HBASE-3691). The installation steps are as follows:

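1. Download the snappy source (http://code.google.com/p/snappy/), then build it and install the shared library on every datanode and regionserver. Make sure the shared library is on the system's LD_LIBRARY_PATH. If it is not, run: export LD_LIBRARY_PATH=/path/to/snappy/library/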

2. Download the hadoop-snappy source (http://code.google.com/p/hadoop-snappy/) and build it:
      $ mvn package [-Dsnappy.prefix=SNAPPY_INSTALLATION_DIR] 

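3. Install snappy into HDFS.

    <a>. Unpack hadoop-snappy-0.0.1-SNAPSHOT.tar.gz, copy the shared and static library files from its native directory into the native directory under Hadoop's lib, and copy hadoop-snappy-0.0.1-SNAPSHOT.jar into Hadoop's lib. The process is similar to installing lzo.

    <b>. Add the following to Hadoop's core-site.xml: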
    <property> 
      <name>io.compression.codecs</name> 
        <value> 
          org.apache.hadoop.io.compress.GzipCodec, 
          org.apache.hadoop.io.compress.DefaultCodec, 
          org.apache.hadoop.io.compress.BZip2Codec, 
          org.apache.hadoop.io.compress.SnappyCodec 
        </value> 
    </property> 
    <c>. Restart Hadoop.

4. Copy the libraries from step 3.a into the corresponding lib and lib/native directories of HBase, then start HBase.
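
Step 1 depends on the snappy shared library being reachable through LD_LIBRARY_PATH, which is easy to get wrong on one node out of many. The following is a minimal sketch of a check that could be run on each datanode and regionserver. The function name find_snappy_dir is hypothetical, and the sketch assumes the library file is named libsnappy.so (possibly with a version suffix). It inspects only the path passed to it, so a library installed in a default linker directory would not be reported.

```shell
#!/bin/sh
# Hypothetical helper: print the first directory in a colon-separated
# search path that contains a libsnappy shared object.
# Assumes the library file name starts with "libsnappy.so".
find_snappy_dir() {
    search_path="$1"
    old_ifs="$IFS"
    IFS=':'
    # The word list is split on ':' once, when the loop starts.
    for dir in $search_path; do
        [ -n "$dir" ] || continue
        for lib in "$dir"/libsnappy.so*; do
            # An unmatched glob stays literal and fails the -e test.
            if [ -e "$lib" ]; then
                IFS="$old_ifs"
                printf '%s\n' "$dir"
                return 0
            fi
        done
    done
    IFS="$old_ifs"
    return 1
}
```

A possible use is `find_snappy_dir "$LD_LIBRARY_PATH" || echo "libsnappy not found on LD_LIBRARY_PATH" >&2`.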

Verifying the installation:

1. Use CompressionTest to check that snappy is enabled and can be loaded successfully:
$ hbase org.apache.hadoop.hbase.util.CompressionTest hdfs://host/path/to/hbase snappy 

2. Create a table compressed with snappy to check that it works:
$ hbase shell 
> create 't1', { NAME => 'cf1', COMPRESSION => 'snappy' } 
> describe 't1' 

In the output of the "describe" command, confirm that it contains "COMPRESSION => 'snappy'".
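
The check on the "describe" output can also be scripted. This is a minimal sketch under two assumptions: that the HBase shell accepts commands on standard input, and that the codec name may be printed in either upper or lower case (hence the case-insensitive match). The function name has_snappy_compression is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the text on stdin contains a
# COMPRESSION => 'snappy' attribute, ignoring case.
has_snappy_compression() {
    grep -i -q "COMPRESSION => 'snappy'"
}
```

A possible use is `echo "describe 't1'" | hbase shell | has_snappy_compression && echo "t1 uses snappy"`.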

[Performance test] Random write results:

Group 1:
Data volume: 2000k, threads: 100

Snappy:
===Write Result=== 
 TPS: 5409 
 Avg Response Time: 18.49 
 Success Write Result: (2000000/2000000) 
 Failed Ratio: 0% 
 [0~1)ms: 0% (1428/2000000) 
 [1~5)ms: 15% (312854/2000000) 
 [5~10)ms: 12% (253575/2000000) 
 [10~100)ms: 70% (1413117/2000000) 
 [100~1000)ms: 0% (15804/2000000) 
 [1000~10000)ms: 0% (3222/2000000) 
 [10000~100000)ms: 0% (0/2000000) 
 [100000+ms: 0% (0/2000000) 

Lzo:
===Write Result=== 
 TPS: 5207 
 Avg Response Time: 19.2 
 Success Write Result: (2000000/2000000) 
 Failed Ratio: 0% 
 [0~1)ms: 0% (1267/2000000) 
 [1~5)ms: 16% (332766/2000000) 
 [5~10)ms: 10% (208876/2000000) 
 [10~100)ms: 71% (1437943/2000000) 
 [100~1000)ms: 0% (14761/2000000) 
 [1000~10000)ms: 0% (4387/2000000) 
 [10000~100000)ms: 0% (0/2000000) 
 [100000+ms: 0% (0/2000000) 

Random read performance:

LZO:

===Read Result=== 
 QPS: 2261 
 Avg Response Time: 44.24 
 Success Read Result: (138044/138044) 
 Failed Ratio: 0% 
 Error Ratio: 0% 
 [0~1)ms: 0% (0/138044) 
 [1~5)ms: 18% (26078/138044) 
 [5~10)ms: 12% (16898/138044) 
 [10~100)ms: 68% (94863/138044) 
 [100~1000)ms: 0% (205/138044) 
 [1000~10000)ms: 0% (0/138044) 
 [10000~100000)ms: 0% (0/138044) 
 [100000+ms: 0% (0/138044) 


Snappy: 

$ ./run-test.sh 6 "{\"tn\":\"100\",\"tc\":\"tt1-snappy(200){cf1:v1(1024)}\",\"rt\":\"60\",\"kn\":\"900\"}" com.taobao.hbase.testcenter.testcase.benchmark.testcase.RandomRead hbase-test-center-testcase-0.0.1-SNAPSHOT.jar 
===Read Result=== 
 QPS: 2357 
 Avg Response Time: 42.43 
 Success Read Result: (143958/143958) 
 Failed Ratio: 0% 
 Error Ratio: 0% 
 [0~1)ms: 0% (0/143958) 
 [1~5)ms: 7% (10217/143958) 
 [5~10)ms: 13% (18743/143958) 
 [10~100)ms: 79% (114922/143958) 
 [100~1000)ms: 0% (76/143958) 
 [1000~10000)ms: 0% (0/143958) 
 [10000~100000)ms: 0% (0/143958) 
 [100000+ms: 0% (0/143958) 

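In summary, with values of about 1 KB the two codecs perform almost the same: Snappy is roughly 4% ahead of LZO in both write TPS (5409 vs. 5207) and read QPS (2357 vs. 2261).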
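
The gap between the two codecs can be quantified from the TPS and QPS figures reported in the results above. The following is a small sketch of that arithmetic. The function name pct_diff is hypothetical, and the numbers are copied from the test output in this post.

```shell
#!/bin/sh
# Relative difference of a over b, in percent, rounded to two decimals.
pct_diff() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", (a - b) / b * 100 }'
}

pct_diff 5409 5207   # write TPS, Snappy vs. LZO: prints 3.88
pct_diff 2357 2261   # read QPS, Snappy vs. LZO: prints 4.25
```

Both differences are small and come from a single run each, so they are better read as "no meaningful difference" than as a consistent advantage.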


