Enabling LZO compression in Hadoop and Hive

http://www.cnblogs.com/bruthe/articles/4554787.html

These are my notes; many thanks to the original author.

1  
sudo apt-get install liblzo2-dev
 
hadoop@idex140:~/modules/hadoop-2.6.0$ dpkg -L liblzo2-2  # list the files installed by the package
/.
/usr
/usr/lib
/usr/lib/x86_64-linux-gnu
/usr/lib/x86_64-linux-gnu/liblzo2.so.2.0.0
/usr/share
/usr/share/doc
/usr/share/doc/liblzo2-2
/usr/share/doc/liblzo2-2/THANKS
/usr/share/doc/liblzo2-2/AUTHORS
/usr/share/doc/liblzo2-2/changelog.Debian.gz
/usr/share/doc/liblzo2-2/copyright
/usr/share/doc/liblzo2-2/LZO.TXT.gz
/usr/lib/x86_64-linux-gnu/liblzo2.so.2

 

2  
wget http://www.oberhumer.com/opensource/lzo/download/lzo-2.09.tar.gz

 

3  
tar -xzvf lzo-2.09.tar.gz  
cd lzo-2.09
export CFLAGS=-m64  # build for a 64-bit operating system
./configure --enable-shared --prefix /usr/local/lzo-2.09
make && sudo make install

 

5  
sudo apt-get install lzop

 

6  
hadoop@master:~/hadoop-lzo$ C_INCLUDE_PATH=/usr/local/lzo-2.09/include/ \
   > LIBRARY_PATH=/usr/local/lzo-2.09/lib/ \
   > CXXFLAGS=-m64 \
   > mvn clean package  # first set hadoop.version to match your Hadoop version

 

7   

tar -cBf - -C target/native/Linux-amd64-64/lib . | tar -xBvf - -C ~/modules/hadoop-2.6.0/lib/native/

  

8  

cp ${HADOOP_LZO_HOME}/target/hadoop-lzo-0.4.20-SNAPSHOT.jar  ${HADOOP_HOME}/share/hadoop/common/lib/
source /etc/profile
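
Before moving on to the other nodes, it can help to confirm that steps 7 and 8 actually put the files where Hadoop expects them. The following is a minimal sketch, not something from the original post: `check_lzo_files` is a hypothetical helper name, and it assumes the native libraries match `libgpl*` and the jar matches `hadoop-lzo-*.jar`, as in the commands above.

```shell
# check_lzo_files is a hypothetical helper, not part of Hadoop or hadoop-lzo.
# It only checks that the files copied in steps 7 and 8 exist under a Hadoop home.
check_lzo_files() {
  local hadoop_home="$1"
  local status=0
  # Native libraries unpacked in step 7 (the libgpl* files).
  if ! ls "$hadoop_home"/lib/native/libgpl* > /dev/null 2>&1; then
    echo "missing: $hadoop_home/lib/native/libgpl*"
    status=1
  fi
  # hadoop-lzo jar copied in step 8.
  if ! ls "$hadoop_home"/share/hadoop/common/lib/hadoop-lzo-*.jar > /dev/null 2>&1; then
    echo "missing: $hadoop_home/share/hadoop/common/lib/hadoop-lzo-*.jar"
    status=1
  fi
  return "$status"
}

# Example: check_lzo_files "$HADOOP_HOME" && echo "LZO files in place"
```

The function returns a non-zero status and names whatever is missing, so it can also be run on each slave after the sync step.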

 

9  Repeat the steps above on the other nodes

 scp lzo-2.09.tar.gz  hadoop-slave1:/home/hadoop/
 scp lzo-2.09.tar.gz  hadoop-slave2:/home/hadoop/
 
 # on each slave, after unpacking lzo-2.09.tar.gz and entering lzo-2.09 (as in step 3):
 ./configure --enable-shared --prefix /usr/local/lzo-2.09
 make && sudo make install
 
 sudo apt-get install liblzo2-dev
 sudo apt-get install lzop
 
 scp -r libgpl* hadoop-slave1:/home/hadoop/modules/hadoop-2.6.0/lib/native/
 scp -r libgpl* hadoop-slave2:/home/hadoop/modules/hadoop-2.6.0/lib/native/
 
 scp   ${HADOOP_LZO_HOME}/target/hadoop-lzo-0.4.20-SNAPSHOT.jar hadoop-slave1:$HADOOP_HOME/share/hadoop/common/lib/
 scp   ${HADOOP_LZO_HOME}/target/hadoop-lzo-0.4.20-SNAPSHOT.jar hadoop-slave2:$HADOOP_HOME/share/hadoop/common/lib/
 source /etc/profile
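
With more than a couple of slaves, generating the copy commands in a loop is less error-prone than repeating lines by hand. This is only a sketch under assumptions: `print_sync_commands` is a hypothetical helper, it assumes every node uses the same Hadoop home, and it prints the commands rather than running them so they can be reviewed first.

```shell
# print_sync_commands is a hypothetical helper. It prints (does not run) the scp
# commands that copy the native libraries and the hadoop-lzo jar to each host.
# Arguments: <jar path> <hadoop home on the remote nodes> <host>...
print_sync_commands() {
  local jar="$1"
  local hadoop_home="$2"
  shift 2
  local host
  for host in "$@"; do
    # libgpl* stays unexpanded here; it is expanded when the printed command is run.
    echo "scp -r libgpl* $host:$hadoop_home/lib/native/"
    echo "scp $jar $host:$hadoop_home/share/hadoop/common/lib/"
  done
}

# Example:
# print_sync_commands target/hadoop-lzo-0.4.20-SNAPSHOT.jar \
#   /home/hadoop/modules/hadoop-2.6.0 hadoop-slave1 hadoop-slave2
```

The printed `scp -r libgpl*` lines are meant to be run from the directory that holds the native libraries.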

 

10 Update the Hadoop configuration files

   (1) Append the following to $HADOOP_HOME/etc/hadoop/hadoop-env.sh:
# add lzo environment variables
export LD_LIBRARY_PATH=/usr/local/lzo-2.09/lib
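
Since this edit has to be repeated on every node, it may be convenient to make it idempotent so that re-running it does not add the line twice. A minimal sketch, assuming the file ends with a newline; `append_lzo_env` is a hypothetical helper name:

```shell
# append_lzo_env is a hypothetical helper, not part of Hadoop.
# It appends the LZO library path to the given file only if the line is not there yet.
append_lzo_env() {
  local env_file="$1"
  local line='export LD_LIBRARY_PATH=/usr/local/lzo-2.09/lib'
  # grep -x matches the whole line, -F treats the pattern as a fixed string.
  if ! grep -qxF "$line" "$env_file" 2>/dev/null; then
    printf '%s\n%s\n' '# add lzo environment variables' "$line" >> "$env_file"
  fi
}

# Example: append_lzo_env "$HADOOP_HOME/etc/hadoop/hadoop-env.sh"
```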

   (2) Edit core-site.xml

 
      <property>
        <name>io.compression.codecs</name>
        <value>org.apache.hadoop.io.compress.GzipCodec,
          org.apache.hadoop.io.compress.DefaultCodec,
          com.hadoop.compression.lzo.LzoCodec,
          com.hadoop.compression.lzo.LzopCodec,
          org.apache.hadoop.io.compress.BZip2Codec,
          org.apache.hadoop.io.compress.SnappyCodec</value>
      </property>
      <property>
        <name>io.compression.codec.lzo.class</name>
        <value>com.hadoop.compression.lzo.LzoCodec</value>
      </property>
   (3) Edit mapred-site.xml
 
      <property>
       <name>mapred.child.env</name>
        <value>LD_LIBRARY_PATH=/usr/local/lzo-2.09/lib</value>
      </property>
       <property>
        <name>mapreduce.map.output.compress</name>
        <value>true</value>
      </property>
      <property>
        <name>mapreduce.map.output.compress.codec</name>
        <value>com.hadoop.compression.lzo.LzoCodec</value>
      </property>
      <property>
       <name>mapreduce.output.fileoutputformat.compress.type</name>
       <value>BLOCK</value>
      </property>
      <property>
       <name>mapreduce.output.fileoutputformat.compress</name>
       <value>false</value>
      </property>
      <property>
       <name>mapreduce.output.fileoutputformat.compress.codec</name>
       <value>org.apache.hadoop.io.compress.DefaultCodec</value>
      </property>

 PS:

       Intermediate result (map output) compression

Set in      Property name (current)                           Default                                     Deprecated property name
Hadoop job  mapreduce.map.output.compress                     false                                       mapred.compress.map.output
            mapreduce.map.output.compress.codec               org.apache.hadoop.io.compress.DefaultCodec  mapred.map.output.compression.codec
Hive job    hive.exec.compress.intermediate                   false

       Final output compression

Set in      Property name (current)                           Default                                     Deprecated property name
Hadoop job  mapreduce.output.fileoutputformat.compress        false                                       mapred.output.compress
            mapreduce.output.fileoutputformat.compress.type   RECORD                                      mapred.output.compression.type
            mapreduce.output.fileoutputformat.compress.codec  org.apache.hadoop.io.compress.DefaultCodec  mapred.output.compression.codec
Hive job    hive.exec.compress.output                         false

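
The Hive properties listed above can also be set per session instead of cluster-wide. The sketch below writes them into a file that can be passed to `hive -i` as an initialization script. The helper name `write_hive_lzo_settings` and the file name are hypothetical, and using `LzopCodec` for the final output is my assumption rather than something the original post prescribes.

```shell
# write_hive_lzo_settings is a hypothetical helper. It writes session-level
# "set" statements to the given file:
#   - compress intermediate (map output) data with LzoCodec
#   - compress the final output with LzopCodec (an assumption, see the lead-in)
write_hive_lzo_settings() {
  cat > "$1" <<'EOF'
set hive.exec.compress.intermediate=true;
set mapreduce.map.output.compress.codec=com.hadoop.compression.lzo.LzoCodec;
set hive.exec.compress.output=true;
set mapreduce.output.fileoutputformat.compress.codec=com.hadoop.compression.lzo.LzopCodec;
EOF
}

# Example:
# write_hive_lzo_settings lzo-session.hql
# hive -i lzo-session.hql
```

Session-level settings leave the defaults in mapred-site.xml untouched, which makes it easier to compare job behaviour with and without compression.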
 
11  Create a Hive test table that can store LZO-compressed data
 
    CREATE TABLE rawdata(
      appkey string, uid string, uidtype string                            
    )                
    COMMENT 'This is the staging of raw data'
    PARTITIONED BY (day INT)
    ROW FORMAT DELIMITED 
    FIELDS TERMINATED BY '\t' 
    STORED AS INPUTFORMAT 
      'com.hadoop.mapred.DeprecatedLzoTextInputFormat' 
    OUTPUTFORMAT 
      'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'; 
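
To try the table out, a typical sequence is to compress a local file with lzop, load it into a partition, and then build an LZO index so the file can be split across mappers. The sketch below only prints those commands for review. `print_lzo_load_commands` is a hypothetical helper, the warehouse directory is an assumption about your Hive setup, and `com.hadoop.compression.lzo.DistributedLzoIndexer` is the indexer class shipped in the hadoop-lzo jar.

```shell
# print_lzo_load_commands is a hypothetical helper. It prints (does not run)
# the commands to compress a file, load it into rawdata, and index the partition.
# Arguments: <local file> <day partition> <hadoop-lzo jar> <hive warehouse dir>
print_lzo_load_commands() {
  local file="$1"
  local day="$2"
  local lzo_jar="$3"
  local warehouse_dir="$4"
  # lzop writes <file>.lzo next to the input file.
  echo "lzop $file"
  echo "hive -e \"LOAD DATA LOCAL INPATH '$file.lzo' INTO TABLE rawdata PARTITION (day=$day)\""
  # The indexer writes .index files beside the .lzo files in the partition directory.
  echo "hadoop jar $lzo_jar com.hadoop.compression.lzo.DistributedLzoIndexer $warehouse_dir/rawdata/day=$day"
}

# Example:
# print_lzo_load_commands data.txt 20150601 \
#   "$HADOOP_HOME/share/hadoop/common/lib/hadoop-lzo-0.4.20-SNAPSHOT.jar" /user/hive/warehouse
```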
