LZO Installation and Configuration
1.1 Install lzo, lzop, and their dependencies on every node of the Hadoop cluster (mainly to get lzop installed):
[root@hadoop01 ~]# yum -y install '*lzo*'
[root@hadoop02 ~]# yum -y install '*lzo*'
[root@hadoop03 ~]# yum -y install '*lzo*'
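The same command has to be repeated on each node, so a small loop can save typing. The following is only a sketch: run_on_all and the DRY_RUN switch are hypothetical names, and it assumes root can ssh to hadoop01, hadoop02, and hadoop03 without a password prompt.

```shell
# Hypothetical helper: run one command on every node over ssh.
# The node names and the root login are assumptions taken from the prompts in this guide.
NODES="${NODES:-hadoop01 hadoop02 hadoop03}"

run_on_all() {
    local host
    for host in $NODES; do
        # With DRY_RUN set, print the ssh command instead of executing it.
        ${DRY_RUN:+echo} ssh "root@${host}" "$@"
    done
}
```

For example, DRY_RUN=1 run_on_all "yum -y install '*lzo*'" prints the three ssh commands without running them; dropping DRY_RUN=1 executes them.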
1.2 Install gcc and the other build tools first
[root@hadoop01 ~]# yum -y install gcc-c++ lzo-devel zlib-devel autoconf automake libtool
1.3 Download the lzo-2.10 source package with wget
[root@hadoop01 software]# wget http://www.oberhumer.com/opensource/lzo/download/lzo-2.10.tar.gz
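1.4 Extract the downloaded source (the archive is in the current directory; create ./yuanma first if it does not exist)
[root@hadoop01 software]# mkdir -p ./yuanma
[root@hadoop01 software]# tar -zxvf ./lzo-2.10.tar.gz -C ./yuanma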
1.5 Enter the extracted directory and install from source
[root@hadoop01 software]# cd ./yuanma/lzo-2.10/
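First create an lzo directory under /opt/app/ by hand, then configure, build, and install
[root@hadoop01 lzo-2.10]# mkdir -p /opt/app/lzo
[root@hadoop01 lzo-2.10]# ./configure --prefix=/opt/app/lzo
[root@hadoop01 lzo-2.10]# make && make install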
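Before moving on, it may be worth confirming that the install actually populated the prefix. A minimal sketch, assuming the usual autoconf layout (headers under include/lzo, libraries under lib); check_lzo_prefix is a hypothetical helper name, not part of LZO.

```shell
# Hypothetical check: does an LZO install prefix contain the header and a library?
check_lzo_prefix() {
    local prefix="$1"
    # lzo1x.h is the main LZO1X header.
    if [ ! -f "${prefix}/include/lzo/lzo1x.h" ]; then
        echo "missing header under ${prefix}/include/lzo"
        return 1
    fi
    # liblzo2.* matches both a static (.a) and a shared (.so) build.
    if ! ls "${prefix}"/lib/liblzo2.* >/dev/null 2>&1; then
        echo "missing liblzo2 under ${prefix}/lib"
        return 1
    fi
    echo "LZO looks installed under ${prefix}"
}
```

With the prefix used in this guide the call would be check_lzo_prefix /opt/app/lzo.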
1.6 Download and build the hadoop-lzo source
[root@hadoop01 software]# wget https://github.com/twitter/hadoop-lzo/archive/master.zip
[root@hadoop01 software]# unzip -d /opt/app ./master.zip
[root@hadoop01 software]# cd /opt/app/hadoop-lzo-master
Search pom.xml for hadoop.current and change the version number to your own Hadoop version (2.7.7 in this guide)
[root@hadoop01 hadoop-lzo-master]# vim pom.xml
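If you prefer not to edit the file by hand, the same change can be scripted. This is a sketch under an assumption: that the property found by searching for hadoop.current is an element named hadoop.current.version. Check your pom.xml before relying on it. set_hadoop_version is a hypothetical helper name.

```shell
# Hypothetical helper: rewrite the Hadoop version property in a pom.xml.
# Assumes the property element is <hadoop.current.version>...</hadoop.current.version>.
set_hadoop_version() {
    local pom="$1" version="$2"
    # Write to a temporary file and move it into place, which avoids
    # depending on the in-place flag of a particular sed implementation.
    sed -E "s|(<hadoop\.current\.version>)[^<]*(</hadoop\.current\.version>)|\1${version}\2|" \
        "$pom" > "${pom}.tmp" && mv "${pom}.tmp" "$pom"
}
```

For this guide the call would be set_hadoop_version pom.xml 2.7.7, run from the hadoop-lzo-master directory.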
1.7 Configure the LZO environment variables
[root@hadoop01 hadoop-lzo-master]# vim /etc/profile
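Add the LZO include and lib paths to the file so the native build can find the headers and the library
export LZO_HOME=/opt/app/lzo
export C_INCLUDE_PATH=$LZO_HOME/include
export LIBRARY_PATH=$LZO_HOME/lib
Apply the changes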
[root@hadoop01 hadoop-lzo-master]# source /etc/profile
1.7 Build with Maven (make sure Maven is installed first; run the command from the hadoop-lzo-master directory)
[root@hadoop01 hadoop-lzo-master]# mvn package -Dmaven.test.skip=true
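1.8 When the build finishes, a target directory appears under hadoop-lzo-master
Go into target and put hadoop-lzo-0.4.21-SNAPSHOT.jar on Hadoop's classpath
Hadoop's classpath contains many directories; list them with the hadoop classpath command
Here the jar is copied into ${HADOOP_HOME}/share/hadoop/common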
[root@hadoop01 hadoop-lzo-master]# cd target
[root@hadoop01 target]# cp ./hadoop-lzo-0.4.21-SNAPSHOT.jar /opt/app/hadoop-2.7.7/share/hadoop/common
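The jar file name depends on the hadoop-lzo version that was built, so hard-coding 0.4.21-SNAPSHOT may not match your build. A small sketch for locating the jar by pattern instead; find_lzo_jar is a hypothetical helper, and skipping -sources and -javadoc jars is a precaution in case the build also produced them.

```shell
# Hypothetical helper: print the path of the main hadoop-lzo jar in a target directory.
find_lzo_jar() {
    local target_dir="$1" jar
    for jar in "${target_dir}"/hadoop-lzo-*.jar; do
        case "$jar" in
            # Skip auxiliary artifacts; only the main jar belongs on the classpath.
            *-sources.jar|*-javadoc.jar) continue ;;
        esac
        if [ -f "$jar" ]; then
            echo "$jar"
            return 0
        fi
    done
    echo "no hadoop-lzo jar found in ${target_dir}" >&2
    return 1
}
```

From the hadoop-lzo-master directory, cp "$(find_lzo_jar target)" /opt/app/hadoop-2.7.7/share/hadoop/common/ would then copy whichever version was built.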
1.9 Distribute the jar to the other servers
[root@hadoop01 target]# scp ./hadoop-lzo-0.4.21-SNAPSHOT.jar hadoop02:/opt/app/hadoop-2.7.7/share/hadoop/common
[root@hadoop01 target]# scp ./hadoop-lzo-0.4.21-SNAPSHOT.jar hadoop03:/opt/app/hadoop-2.7.7/share/hadoop/common
2.0 Add the following configuration to Hadoop's core-site.xml and distribute it to the other servers
<property>
<name>io.compression.codecs</name>
<value>org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.BZip2Codec,com.hadoop.compression.lzo.LzoCodec,com.hadoop.compression.lzo.LzopCodec</value>
</property>
<property>
<name>io.compression.codec.lzo.class</name>
<value>com.hadoop.compression.lzo.LzoCodec</value>
</property>
[root@hadoop01 hadoop-2.7.7]# scp ./etc/hadoop/core-site.xml hadoop02:/opt/app/hadoop-2.7.7/etc/hadoop/
[root@hadoop01 hadoop-2.7.7]# scp ./etc/hadoop/core-site.xml hadoop03:/opt/app/hadoop-2.7.7/etc/hadoop/
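Both the jar and core-site.xml have to reach every other node, and typing each scp by hand makes it easy to get one path wrong. A sketch of a loop that reuses one destination for all nodes; distribute, OTHER_NODES, and DRY_RUN are hypothetical names, and the host list is an assumption based on this guide's cluster.

```shell
# Hypothetical helper: copy one file to the same directory on every other node.
OTHER_NODES="${OTHER_NODES:-hadoop02 hadoop03}"

distribute() {
    local file="$1" dest_dir="$2" host
    for host in $OTHER_NODES; do
        # With DRY_RUN set, print the scp command instead of executing it.
        ${DRY_RUN:+echo} scp "$file" "${host}:${dest_dir}/"
    done
}
```

Running DRY_RUN=1 distribute ./etc/hadoop/core-site.xml /opt/app/hadoop-2.7.7/etc/hadoop first shows the commands, which gives a chance to spot a wrong path before anything is copied.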
2.1 Start the cluster
start-all.sh
LZO Test
1. Create the data file lzodata.txt (fields must be separated by tabs to match the table definition in step 3)
19630001 john lennon
19630002 paul mccartney
19630003 george harrison
19630004 ringo starr
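Tabs are easy to lose when copying the rows above from a web page, and the Hive table in step 3 splits fields on '\t'. One way to be sure the separators are real tabs is to generate the file with printf. This is a sketch; make_lzodata is a hypothetical helper name.

```shell
# Hypothetical helper: write the four sample rows with tab-separated fields.
make_lzodata() {
    local out="${1:-lzodata.txt}"
    # printf reuses the format string for each group of three arguments,
    # so every row becomes id<TAB>firstname<TAB>lastname.
    printf '%s\t%s\t%s\n' \
        19630001 john lennon \
        19630002 paul mccartney \
        19630003 george harrison \
        19630004 ringo starr > "$out"
}
```

Calling make_lzodata with no argument writes lzodata.txt in the current directory.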
2. Compress lzodata.txt into lzodata.txt.lzo
[root@hadoop01 ~]# lzop ./lzodata.txt
3. Open Hive
Create the lzo_test table and specify its storage format
CREATE TABLE lzo_test(
id bigint,
firstname string,
lastname string
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS INPUTFORMAT "com.hadoop.mapred.DeprecatedLzoTextInputFormat" OUTPUTFORMAT "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat"
;
4. Load the data
load data local inpath '/root/lzodata.txt.lzo' into table lzo_test;
5. Query the data
select * from lzo_test;
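6. Open hadoop01:50070 and check that the data in lzo_test is stored as lzodata.txt.lzo
Because the data set is tiny, the compressed file is actually larger than the original
With large data sets, the size difference between the compressed file and the original becomes obvious
An LZO-compressed file is typically about 0.35 times the size of the original
LZO also compresses and decompresses very quickly
So on balance, weighing compression ratio against speed, LZO is the best overall choice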
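The 0.35 figure above is easy to check against your own data by dividing the compressed size by the original size. A minimal sketch; compression_ratio is a hypothetical helper, and the ratio you see depends on the data.

```shell
# Hypothetical helper: print compressed size / original size with two decimals.
compression_ratio() {
    local original="$1" compressed="$2" o c
    o="$(wc -c < "$original")"     # size of the original file in bytes
    c="$(wc -c < "$compressed")"   # size of the compressed file in bytes
    awk -v o="$o" -v c="$c" 'BEGIN {
        if (o == 0) { print "n/a"; exit }   # avoid dividing by zero for an empty file
        printf "%.2f\n", c / o
    }'
}
```

For the test file in this section, compression_ratio lzodata.txt lzodata.txt.lzo should print a value above 1.00, matching the observation that very small files grow when compressed.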