1) Hadoop does not support LZO compression out of the box; compiling the support yourself is a complicated process, so a pre-built jar is provided here:
Link: https://pan.baidu.com/s/1L5S9geY7fSg1_ToNaTYsEg
Extraction code: vfaa
2) Put the pre-built hadoop-lzo-0.4.20.jar into hadoop260/share/hadoop/common/
3) Copy hadoop-lzo-0.4.20.jar to the same directory on hadoop02 and hadoop03, as sketched below.
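A minimal distribution sketch, assuming the installation root is /opt/soft/hadoop260 (the path used by the commands later in this section) and passwordless SSH between the nodes:

# copy the jar to the other two nodes (installation path assumed from the commands below)
for host in hadoop02 hadoop03; do
  scp /opt/soft/hadoop260/share/hadoop/common/hadoop-lzo-0.4.20.jar \
      "$host":/opt/soft/hadoop260/share/hadoop/common/
done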
4) Add the following configuration to Hadoop's core-site.xml to enable LZO compression:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <property>
        <name>io.compression.codecs</name>
        <value>
            org.apache.hadoop.io.compress.GzipCodec,
            org.apache.hadoop.io.compress.DefaultCodec,
            org.apache.hadoop.io.compress.BZip2Codec,
            org.apache.hadoop.io.compress.SnappyCodec,
            com.hadoop.compression.lzo.LzoCodec,
            com.hadoop.compression.lzo.LzopCodec
        </value>
    </property>
    <property>
        <name>io.compression.codec.lzo.class</name>
        <value>com.hadoop.compression.lzo.LzoCodec</value>
    </property>
</configuration>
5) Finally, sync core-site.xml to hadoop02 and hadoop03.
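A sketch of the sync, under the same path and SSH assumptions as above:

# push the updated core-site.xml to the other nodes (config dir assumed: etc/hadoop)
for host in hadoop02 hadoop03; do
  scp /opt/soft/hadoop260/etc/hadoop/core-site.xml \
      "$host":/opt/soft/hadoop260/etc/hadoop/
done
# if the cluster is already running, restart it so every daemon and task JVM
# picks up the new codec list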
Testing
1) Create an index for the LZO file. The splittability of an LZO-compressed file depends on its index, so we have to build the index manually; without one, the whole LZO file becomes a single split.
Upload bigtable.lzo (150 MB) to the /input directory on HDFS (the commands below read it from /input/bigtable.lzo).
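A minimal upload, assuming bigtable.lzo is in the current local directory:

hdfs dfs -mkdir -p /input           # create the input directory if it does not exist
hdfs dfs -put bigtable.lzo /input   # upload the 150 MB LZO file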
First run wordcount on it:
hadoop jar /opt/soft/hadoop260/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0-cdh5.14.2.jar wordcount /input /output1
Build an index for the uploaded LZO file (this writes bigtable.lzo.index next to the file):
hadoop jar /opt/soft/hadoop260/share/hadoop/common/hadoop-lzo-0.4.20.jar com.hadoop.compression.lzo.DistributedLzoIndexer /input/bigtable.lzo
Run the WordCount program again. The index is only honored by an LZO-aware input format, so tell the job to use hadoop-lzo's LzoTextInputFormat:
hadoop jar /opt/soft/hadoop260/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0-cdh5.14.2.jar wordcount -Dmapreduce.job.inputformat.class=com.hadoop.mapreduce.LzoTextInputFormat /input /output2
This time the job reports more than one input split: the LZO file was split successfully.
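To sanity-check the run, list and sample the output (part-r-00000 is the default name of the single reducer's output):

hdfs dfs -ls /output2                        # _SUCCESS marker plus reducer output
hdfs dfs -cat /output2/part-r-00000 | head   # first few (word, count) pairs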