Snappy compression has been merged into the official HBase 0.92 release. On 0.90 I applied the patch hbase-snappy-3691-trunk.patch from (https://issues.apache.org/jira/browse/HBASE-3691). Installation steps:
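1. Download the snappy source (http://code.google.com/p/snappy/), build it, and install the shared library on every datanode and regionserver. Make sure the library's directory is on the system LD_LIBRARY_PATH; if it is not, add it with: export LD_LIBRARY_PATH=/path/to/snappy/library/:$LD_LIBRARY_PATH
2. Download the hadoop-snappy source (http://code.google.com/p/hadoop-snappy/) and build it: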
$ mvn package [-Dsnappy.prefix=SNAPPY_INSTALLATION_DIR]
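3. Install snappy in Hadoop:
a. Unpack hadoop-snappy-0.0.1-SNAPSHOT.tar.gz, copy the shared and static libraries from its native directory into Hadoop's lib/native, and copy hadoop-snappy-0.0.1-SNAPSHOT.jar into Hadoop's lib. The procedure is similar to installing LZO.
b. Add the following to Hadoop's core-site.xml: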
<property>
<name>io.compression.codecs</name>
<value>
org.apache.hadoop.io.compress.GzipCodec,
org.apache.hadoop.io.compress.DefaultCodec,
org.apache.hadoop.io.compress.BZip2Codec,
org.apache.hadoop.io.compress.SnappyCodec
</value>
</property>
c. Restart Hadoop.
4. Copy the libraries from step 3.a into HBase's lib and lib/native directories, then start HBase.
Verify the installation:
1. Use CompressionTest to check that snappy is enabled and can be loaded:
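Because steps 1 to 4 must be repeated on every datanode and regionserver, a small pre-flight check can catch a missed node before HBase is started. The following is a minimal Python sketch, not part of hadoop-snappy or HBase: it reads core-site.xml and confirms that org.apache.hadoop.io.compress.SnappyCodec is listed under io.compression.codecs, looks for snappy shared libraries under a lib/native directory, and reports which LD_LIBRARY_PATH directories contain libsnappy. The file-name patterns libsnappy.so* and libhadoopsnappy.so*, and the lib/native/<platform> subdirectory layout, are assumptions about what the builds produce; all paths are hypothetical.

```python
import glob
import os
import xml.etree.ElementTree as ET

SNAPPY_CODEC = "org.apache.hadoop.io.compress.SnappyCodec"

def configured_codecs(core_site_path):
    """Codec class names listed under io.compression.codecs in core-site.xml."""
    root = ET.parse(core_site_path).getroot()
    for prop in root.iter("property"):
        if (prop.findtext("name") or "").strip() == "io.compression.codecs":
            value = prop.findtext("value") or ""
            # The value may be split across lines, so strip whitespace around each entry.
            return [c.strip() for c in value.split(",") if c.strip()]
    return []

def find_native_libs(native_dir, patterns=("libsnappy.so*", "libhadoopsnappy.so*")):
    """Snappy-related shared libraries anywhere under native_dir (assumed file names)."""
    hits = []
    for pattern in patterns:
        hits.extend(glob.glob(os.path.join(native_dir, "**", pattern), recursive=True))
    return sorted(hits)

def ld_library_path_dirs_with(pattern="libsnappy.so*"):
    """Directories on LD_LIBRARY_PATH that contain a file matching pattern (step 1)."""
    dirs = [d for d in os.environ.get("LD_LIBRARY_PATH", "").split(os.pathsep) if d]
    return [d for d in dirs if glob.glob(os.path.join(d, pattern))]

def check_install(core_site_path, native_dir):
    """Return a list of problems; an empty list means both checks passed."""
    problems = []
    if SNAPPY_CODEC not in configured_codecs(core_site_path):
        problems.append(SNAPPY_CODEC + " is missing from io.compression.codecs")
    if not find_native_libs(native_dir):
        problems.append("no snappy native library under " + native_dir)
    return problems
```

Usage example with hypothetical paths: `check_install("/path/to/hadoop/conf/core-site.xml", "/path/to/hadoop/lib/native")` on each node; `find_native_libs("/path/to/hbase/lib/native")` covers step 4.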
$ hbase org.apache.hadoop.hbase.util.CompressionTest hdfs://host/path/to/hbase snappy
2. Create a table that uses snappy compression to confirm it works:
$ hbase shell
> create 't1', { NAME => 'cf1', COMPRESSION => 'snappy' }
> describe 't1'
In the output of "describe", confirm that "COMPRESSION => 'snappy'" is shown.
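When many tables or families need checking, the describe output can be scanned programmatically instead of by eye. This Python sketch assumes the shell prints each column family as a brace-delimited list of KEY => 'value' pairs (the 0.90-era style); the sample string in the test is illustrative, not captured output. The comparison is case-insensitive because describe may print the codec name in upper case.

```python
import re

FAMILY_RE = re.compile(r"\{([^{}]*)\}")          # innermost {...}: one column family (assumed format)
ATTR_RE = re.compile(r"(\w+)\s*=>\s*'([^']*)'")  # KEY => 'value' pairs inside a family

def family_compression(describe_output):
    """Map each column family NAME to its COMPRESSION setting."""
    result = {}
    for body in FAMILY_RE.findall(describe_output):
        attrs = dict(ATTR_RE.findall(body))
        if "NAME" in attrs and "COMPRESSION" in attrs:
            result[attrs["NAME"]] = attrs["COMPRESSION"]
    return result

def uses_snappy(describe_output, family):
    """True if the given family is compressed with snappy (case-insensitive)."""
    return family_compression(describe_output).get(family, "").lower() == "snappy"
```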
[Performance test] Random write results:
Group 1:
Data size: 2000k records, 100 threads
Snappy:
===Write Result===
TPS: 5409
Avg Response Time: 18.49
Success Write Result: (2000000/2000000)
Failed Ratio: 0%
[0~1)ms: 0% (1428/2000000)
[1~5)ms: 15% (312854/2000000)
[5~10)ms: 12% (253575/2000000)
[10~100)ms: 70% (1413117/2000000)
[100~1000)ms: 0% (15804/2000000)
[1000~10000)ms: 0% (3222/2000000)
[10000~100000)ms: 0% (0/2000000)
[100000+ms: 0% (0/2000000)
LZO:
===Write Result===
TPS: 5207
Avg Response Time: 19.2
Success Write Result: (2000000/2000000)
Failed Ratio: 0%
[0~1)ms: 0% (1267/2000000)
[1~5)ms: 16% (332766/2000000)
[5~10)ms: 10% (208876/2000000)
[10~100)ms: 71% (1437943/2000000)
[100~1000)ms: 0% (14761/2000000)
[1000~10000)ms: 0% (4387/2000000)
[10000~100000)ms: 0% (0/2000000)
[100000+ms: 0% (0/2000000)
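Each latency bucket above prints a whole-number percentage followed by the raw count, and the printed figures appear to be truncated rather than rounded: 15804/2000000 is about 0.79%, yet it shows as 0%. The following Python sketch, assuming every bucket line has exactly the form shown, recovers exact shares from the counts so the two codecs can be compared more finely.

```python
import re

# One latency-bucket line, e.g. "[5~10)ms: 12% (253575/2000000)".
BUCKET_RE = re.compile(r"^(\[.+?ms):\s*(\d+)%\s*\((\d+)/(\d+)\)$")

def parse_buckets(report):
    """Return (label, printed_percent, count, total) for every bucket line in the report."""
    rows = []
    for line in report.splitlines():
        m = BUCKET_RE.match(line.strip())
        if m:
            rows.append((m.group(1), int(m.group(2)), int(m.group(3)), int(m.group(4))))
    return rows

def exact_percent(count, total):
    return 100.0 * count / total

def share_below(rows, n):
    """Exact percentage of requests that fell into the first n buckets."""
    total = rows[0][3]
    return exact_percent(sum(r[2] for r in rows[:n]), total)
```

Applied to the two write reports, `share_below(rows, 4)` (requests under 100 ms) gives about 99.05% for Snappy and 99.04% for LZO, while the 1 to 10 s bucket holds 3222 (0.16%) versus 4387 (0.22%) requests.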
Random read results:
LZO:
===Read Result===
QPS: 2261
Avg Response Time: 44.24
Success Read Result: (138044/138044)
Failed Ratio: 0%
Error Ratio: 0%
[0~1)ms: 0% (0/138044)
[1~5)ms: 18% (26078/138044)
[5~10)ms: 12% (16898/138044)
[10~100)ms: 68% (94863/138044)
[100~1000)ms: 0% (205/138044)
[1000~10000)ms: 0% (0/138044)
[10000~100000)ms: 0% (0/138044)
[100000+ms: 0% (0/138044)
Snappy:
$ ./run-test.sh 6 "{\"tn\":\"100\",\"tc\":\"tt1-snappy(200){cf1:v1(1024)}\",\"rt\":\"60\",\"kn\":\"900\"}" com.taobao.hbase.testcenter.testcase.benchmark.testcase.RandomRead hbase-test-center-testcase-0.0.1-SNAPSHOT.jar
===Read Result===
QPS: 2357
Avg Response Time: 42.43
Success Read Result: (143958/143958)
Failed Ratio: 0%
Error Ratio: 0%
[0~1)ms: 0% (0/143958)
[1~5)ms: 7% (10217/143958)
[5~10)ms: 13% (18743/143958)
[10~100)ms: 79% (114922/143958)
[100~1000)ms: 0% (76/143958)
[1000~10000)ms: 0% (0/143958)
[10000~100000)ms: 0% (0/143958)
[100000+ms: 0% (0/143958)
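The Snappy read run above passes all of its parameters as one JSON string, which is easy to mis-escape by hand. This Python sketch rebuilds that command line with `json.dumps` and `shlex.quote`; the values, the leading `6`, the class name, and the jar name are copied verbatim from the command shown, and the meaning of the keys (tn, tc, rt, kn) is not documented here. The 1024 in `cf1:v1(1024)` appears to be the value size. The helper name `random_read_command` is hypothetical.

```python
import json
import shlex

def random_read_command(params, jar="hbase-test-center-testcase-0.0.1-SNAPSHOT.jar"):
    """Build the run-test.sh command line used for the random-read benchmark (hypothetical helper)."""
    # Compact separators reproduce the JSON exactly as typed in the original command.
    payload = json.dumps(params, separators=(",", ":"))
    args = [
        "./run-test.sh",
        "6",
        payload,
        "com.taobao.hbase.testcenter.testcase.benchmark.testcase.RandomRead",
        jar,
    ]
    # shlex.quote adds single quotes where needed, so no backslash escaping is required.
    return " ".join(shlex.quote(a) for a in args)

snappy_read = {"tn": "100", "tc": "tt1-snappy(200){cf1:v1(1024)}", "rt": "60", "kn": "900"}
print(random_read_command(snappy_read))
```

For the LZO run, only the "tc" value would change; the LZO table name is not given above.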
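In summary, with values of about 1 KB the two codecs perform almost identically: Snappy's write TPS (5409 vs 5207) and read QPS (2357 vs 2261) are each only about 4% higher than LZO's.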
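The "almost identical" conclusion can be made concrete by computing the relative differences from the headline figures. This Python sketch uses only numbers copied from the reports above; the units of Avg Response Time are not stated there, and since each codec was measured in a single run, the sketch cannot say whether a gap of about 4% exceeds run-to-run variation.

```python
def pct_change(snappy, lzo):
    """Change of the Snappy figure relative to the LZO figure, in percent."""
    return 100.0 * (snappy - lzo) / lzo

# (Snappy, LZO) pairs copied from the reports above; one run per codec.
RESULTS = {
    "write TPS": (5409, 5207),
    "write avg response": (18.49, 19.2),
    "read QPS": (2357, 2261),
    "read avg response": (42.43, 44.24),
}

for metric, (snappy, lzo) in RESULTS.items():
    print(f"{metric}: {pct_change(snappy, lzo):+.1f}%")
```

This prints +3.9% for write TPS, -3.7% for write average response, +4.2% for read QPS, and -4.1% for read average response.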