Snappy is a compression/decompression library. It aims for very high speeds and reasonable compression, rather than maximum compression or compatibility with other compression libraries.
Snappy Installation
Snappy is provided in the native package along with the other native libraries (such as native gzip compression). If you are already using this package, there is no additional installation to do. Otherwise, follow these installation instructions:
To install Snappy on Ubuntu systems:
To install Snappy on Red Hat systems:
To install Snappy on SUSE systems:
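The package name depends on your distribution's Hadoop packaging; the commands below assume the native libraries (including Snappy) ship in a package named hadoop-0.20-native, which is an assumption — substitute the native package name your distribution actually provides:

```bash
# Ubuntu systems (assumed package name: hadoop-0.20-native)
sudo apt-get install hadoop-0.20-native

# Red Hat systems
sudo yum install hadoop-0.20-native

# SUSE systems
sudo zypper install hadoop-0.20-native
```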
To take advantage of Snappy compression you need to set certain configuration properties, which are explained in the following sections.
Using Snappy for MapReduce Compression
It's very common to enable MapReduce intermediate compression, since this can make jobs run faster without you having to make any application changes. Only the temporary intermediate files created by Hadoop for the shuffle phase are compressed (the final output may or may not be compressed). Snappy is ideal in this case because it compresses and decompresses very fast compared to other compression algorithms, such as Gzip.
To enable Snappy for MapReduce intermediate compression for the whole cluster, set the following properties in mapred-site.xml:
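A sketch of the relevant mapred-site.xml entries, using the old-style mapred.* property names that this document uses elsewhere:

```xml
<!-- Compress the intermediate map outputs written during the shuffle -->
<property>
  <name>mapred.compress.map.output</name>
  <value>true</value>
</property>

<!-- Use Snappy as the codec for the intermediate map outputs -->
<property>
  <name>mapred.map.output.compression.codec</name>
  <value>org.apache.hadoop.io.compress.SnappyCodec</value>
</property>
```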
You can also set these properties on a per-job basis.
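For a per-job override, you can pass the same properties as -D options on the command line, provided the job's driver class uses GenericOptionsParser (for example, via ToolRunner). The jar, class, and path names below are placeholders:

```bash
# myjob.jar, MyJobClass, and the input/output paths are hypothetical
hadoop jar myjob.jar MyJobClass \
  -Dmapred.compress.map.output=true \
  -Dmapred.map.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec \
  input output
```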
Use the properties in the following table to compress the final output of a MapReduce job. These are usually set on a per-job basis.
| Property | Description |
|---|---|
| mapred.output.compress | Whether to compress the final job outputs (true or false) |
| mapred.output.compression.codec | If the final job outputs are to be compressed, which codec should be used. Set to org.apache.hadoop.io.compress.SnappyCodec for Snappy compression. |
| mapred.output.compression.type | For SequenceFile outputs, what type of compression should be used (NONE, RECORD, or BLOCK). BLOCK is recommended. |
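The table's properties can likewise be passed per job as -D options (again assuming the driver uses GenericOptionsParser; jar, class, and path names are placeholders):

```bash
hadoop jar myjob.jar MyJobClass \
  -Dmapred.output.compress=true \
  -Dmapred.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec \
  -Dmapred.output.compression.type=BLOCK \
  input output
```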
Using Snappy for Pig Compression
Set the same properties for Pig as for MapReduce (see the table in the previous section).
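In a Pig script, the same properties can be set with Pig's SET statement before the statements that produce output. For example, to Snappy-compress the final output of a script:

```
SET mapred.output.compress true;
SET mapred.output.compression.codec org.apache.hadoop.io.compress.SnappyCodec;
```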
Using Snappy for Hive Compression
To enable Snappy compression for Hive output when creating SequenceFile outputs, use the following settings:
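A sketch of those settings as issued in a Hive session; hive.exec.compress.output enables output compression, and the mapred.* properties select the codec and SequenceFile compression type:

```
SET hive.exec.compress.output=true;
SET mapred.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec;
SET mapred.output.compression.type=BLOCK;
```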
Configuring Flume to use Snappy Compression
Depending on the architecture of the machine you are installing on, add one of the following lines to /usr/lib/flume/bin/flume-env.sh:
- For 32-bit platforms:
- For 64-bit platforms:
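The lines to add set JAVA_LIBRARY_PATH to the directory containing the Hadoop native libraries. The paths below assume those libraries are installed under /usr/lib/hadoop/lib/native; adjust them if your installation differs:

```bash
# For 32-bit platforms:
export JAVA_LIBRARY_PATH=/usr/lib/hadoop/lib/native/Linux-i386-32

# For 64-bit platforms:
export JAVA_LIBRARY_PATH=/usr/lib/hadoop/lib/native/Linux-amd64-64
```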
The following section explains how to take advantage of Snappy compression in Flume sinks.
Using Snappy compression in Flume Sinks
You can specify Snappy as a compression codec in Flume's configuration language. For example, the following specifies a Snappy-compressed SequenceFile sink on HDFS:
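A sketch of such a sink, assuming the older (pre-NG) Flume configuration language in which seqfile takes the codec name as an argument; the NameNode host and HDFS path are placeholders:

```
customdfs("hdfs://namenode/user/flume/", seqfile("snappy"))
```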
Using Snappy compression in Sqoop Imports
On the command line, use the following option to enable Snappy compression:
It is a good idea to use the --as-sequencefile option with this compression option.
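A sketch of a Sqoop import using Snappy; the JDBC connect string and table name are placeholders, and --compress is included to explicitly enable compression alongside the codec option:

```bash
sqoop import \
  --connect jdbc:mysql://db.example.com/mydb --table mytable \
  --as-sequencefile \
  --compress \
  --compression-codec org.apache.hadoop.io.compress.SnappyCodec
```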
Configuring HBase to use Snappy Compression
Depending on the architecture of the machine you are installing on, add one of the following lines to /etc/hbase/conf/hbase-env.sh:
- For 32-bit platforms:
- For 64-bit platforms:
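As with Flume, the lines point JAVA_LIBRARY_PATH at the Hadoop native library directory; the paths below assume an installation under /usr/lib/hadoop/lib/native:

```bash
# For 32-bit platforms:
export JAVA_LIBRARY_PATH=/usr/lib/hadoop/lib/native/Linux-i386-32

# For 64-bit platforms:
export JAVA_LIBRARY_PATH=/usr/lib/hadoop/lib/native/Linux-amd64-64
```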
To use Snappy compression in HBase tables, specify the column family compression as snappy. For example, in the HBase shell:
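A sketch of the shell command; the table and column family names are placeholders:

```
create 'mytable', {NAME => 'mycolumnfamily', COMPRESSION => 'snappy'}
```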