Compiling Hadoop-2.6.0-cdh5.7.0 with Compression Support

Hadoop-2.6.0-cdh5.7.0 does not ship with compression support, but in a production environment compression is a must, so we need to compile it in ourselves.

1. Pros and Cons of Compression

Pros

    • Less disk space needed for storage
    • Lower I/O (both network I/O and disk I/O)
    • Faster data transfer across disk and network, which speeds up overall processing

Cons

    • Data must be decompressed before use, which adds CPU load

2. Compression Formats

| Format  | Tool  | Algorithm | Extension | Splittable |
| ------- | ----- | --------- | --------- | ---------- |
| DEFLATE | N/A   | DEFLATE   | .deflate  | No         |
| gzip    | gzip  | DEFLATE   | .gz       | No         |
| bzip2   | bzip2 | bzip2     | .bz2      | Yes        |
| LZO     | lzop  | LZO       | .lzo      | Yes        |
| LZ4     | N/A   | LZ4       | .lz4      | No         |
| Snappy  | N/A   | Snappy    | .snappy   | No         |
1. Compression ratio

[Figure: compression ratio comparison across the formats above]

2. Compression time

[Figure: compression time comparison across the formats above]

As the figures show, the higher the compression ratio, the longer compression takes. By ratio: Snappy < LZ4 < LZO < GZIP < BZIP2, and speed ranks roughly in the opposite order, so Snappy is usually a fairly good choice.
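To see the trade-off for yourself, a quick local test works; a hedged sketch, where sample.log stands in for any reasonably large text file and gzip/bzip2 are assumed installed:

[hadoop@hadoop001 ~]$ time gzip -c sample.log > sample.log.gz     # fast, moderate ratio
[hadoop@hadoop001 ~]$ time bzip2 -c sample.log > sample.log.bz2   # slow, best ratio
[hadoop@hadoop001 ~]$ ls -lh sample.log sample.log.gz sample.log.bz2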

a. gzip

**Pros:** among the higher compression ratios of the four formats; Hadoop supports it natively, so processing a gzip file is just like processing plain text; a Hadoop native library is available; most Linux distributions ship the gzip command, which makes it convenient to use.

**Cons:** does not support splitting.
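Because gzip is supported out of the box, a .gz file on HDFS can be read transparently. A minimal sketch (file names and paths are examples):

[hadoop@hadoop001 ~]$ gzip -c access.log > access.log.gz
[hadoop@hadoop001 ~]$ hadoop fs -put access.log.gz /tmp/
[hadoop@hadoop001 ~]$ hadoop fs -text /tmp/access.log.gz | head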

b. lzo

**Pros:** fairly fast compression/decompression with a reasonable compression ratio; supports splitting and is among the most popular splittable formats in Hadoop; has a Hadoop native library; the lzop command can be installed on Linux and is convenient to use.

**Cons:** lower compression ratio than gzip; not supported by Hadoop out of the box, so it must be installed separately; and although LZO supports splitting, an index must be built for each .lzo file, otherwise Hadoop treats it as one ordinary file (to get splits you need the index and must set the job's input format to the LZO one), as sketched below.
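A minimal sketch of the indexing step, assuming the hadoop-lzo package is installed (the jar path and file names are examples to adjust):

[hadoop@hadoop001 ~]$ lzop -c access.log > access.log.lzo
[hadoop@hadoop001 ~]$ hadoop fs -put access.log.lzo /tmp/
# Build the index so MapReduce can split the file; without it the whole .lzo goes to a single mapper
[hadoop@hadoop001 ~]$ hadoop jar /path/to/hadoop-lzo.jar com.hadoop.compression.lzo.DistributedLzoIndexer /tmp/access.log.lzo
# A /tmp/access.log.lzo.index file appears next to the data file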

c. snappy

**Pros:** fast compression; has a Hadoop native library.

**Cons:** does not support splitting; low compression ratio; not supported by Hadoop out of the box, so it must be installed; and Linux has no corresponding command-line tool.
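Since Snappy is fast but not splittable, its most common role is compressing intermediate map output rather than final job output. A hedged sketch for a single job (the property names are the standard Hadoop 2.x ones; the examples jar and paths are placeholders):

[hadoop@hadoop001 ~]$ hadoop jar hadoop-mapreduce-examples.jar wordcount \
      -D mapreduce.map.output.compress=true \
      -D mapreduce.map.output.compress.codec=org.apache.hadoop.io.compress.SnappyCodec \
      /input /output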

d. bzip2

**Pros:** supports splitting; very high compression ratio, higher than gzip; supported by Hadoop out of the box (though without a native library); Linux ships the bzip2 command, which makes it convenient to use.

**Cons:** slow compression/decompression; no native library support.

3. Summary

Different scenarios call for different compression formats; there is no one-size-fits-all choice. A high compression ratio demands more CPU and costs more time to compress and decompress; a low ratio costs more disk I/O, network I/O, and storage space; a splittable format allows parallel processing.

Tips: try to keep each output file no larger than one HDFS block. For example, with a 128M block size, aim for roughly 126M per output file and never exceed 128M (see the sketch below).
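A hedged sketch of turning on compressed final output for one job and then checking the resulting file sizes against the block size (standard Hadoop 2.x property names; the examples jar and paths are placeholders):

[hadoop@hadoop001 ~]$ hadoop jar hadoop-mapreduce-examples.jar wordcount \
      -D mapreduce.output.fileoutputformat.compress=true \
      -D mapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.BZip2Codec \
      /input /output
# Compare output file sizes against the configured block size (134217728 bytes = 128M by default)
[hadoop@hadoop001 ~]$ hdfs getconf -confKey dfs.blocksize
[hadoop@hadoop001 ~]$ hadoop fs -ls -h /output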

3. Compiling with Compression Support

1. Install Maven
# Download Maven; the Tsinghua mirror hosts the tarball
[hadoop@hadoop001 scripts]$ wget https://mirrors.tuna.tsinghua.edu.cn/apache/maven/maven-3/3.3.9/binaries/apache-maven-3.3.9-bin.tar.gz
[root@hadoop001 scripts]# tar -xzvf apache-maven-3.3.9-bin.tar.gz -C /home/hadoop/app/
[root@hadoop001 app]# mv apache-maven-3.3.9/ maven
[root@hadoop001 app]# chown -R hadoop:hadoop maven    # -R so the ownership change covers every extracted file

# Configure environment variables
[hadoop@hadoop001 ~]$ vim ~/.bash_profile
export MAVEN_HOME=/home/hadoop/app/maven
export PATH=$MAVEN_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$JAVA_HOME/bin:$ZOOKEEPER_HOME/bin:$PATH
[hadoop@hadoop001 ~]$ source ~/.bash_profile
# Configure the local Maven repository directory
[hadoop@hadoop001 ~]$ cd app/maven/conf/
[hadoop@hadoop001 conf]$ vim settings.xml
# Set the local repository location
<localRepository>/home/hadoop/maven_repo/repo</localRepository>
# Add the Aliyun mirror of Maven Central; it must go between <mirrors> and </mirrors>, which is easy to get wrong
<mirror>
     <id>nexus-aliyun</id>
     <mirrorOf>central</mirrorOf>
     <name>Nexus aliyun</name>
     <url>http://maven.aliyun.com/nexus/content/groups/public</url>
</mirror>
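To confirm the settings took effect, Maven's help plugin can print the resolved configuration (help:effective-settings is a standard goal and needs no project):

[hadoop@hadoop001 ~]$ mvn help:effective-settings
# The output should show the localRepository path and the nexus-aliyun mirror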
2. Install Build Prerequisites
# Extract findbugs-3.0.1.tar.gz and protobuf-2.5.0.tar.gz, and mind the owner/group of the extracted files and directories
[root@hadoop001 software]# tar -xzvf findbugs-3.0.1.tar.gz -C /home/hadoop/app/
[root@hadoop001 software]# tar -xzvf protobuf-2.5.0.tar.gz -C /usr/local
# When checking environment variables, verify the Java one: Hadoop-2.6.0-cdh5.7.0 should be compiled with JDK 1.7; JDK 1.8 fails the build
# Extract the Hadoop-2.6.0-cdh5.7.0 source tarball
[hadoop@hadoop001 ~]$ tar -xzvf hadoop-2.6.0-cdh5.7.0-src.tar.gz -C ~/app/source/
# cd into the source directory and check the build requirements
[hadoop@hadoop001 hadoop-2.6.0-cdh5.7.0]$ cat BUILDING.txt
----------------------------------------------------------
## Build instructions for Hadoop
Requirements:
- Unix System
- JDK 1.7+
- Maven 3.0 or later
- Findbugs 1.3.9 (if running findbugs)
- ProtocolBuffer 2.5.0
- CMake 2.6 or newer (if compiling native code), must be 3.0 or newer on Mac
- Zlib devel (if compiling native code)
- openssl devel ( if compiling native hadoop-pipes )
- Internet connection for first build (to fetch all Maven and Hadoop dependencies)
----------------------------------------------------------
3. Pre-build Setup
[root@hadoop001 conf]# yum install -y gcc gcc-c++ make cmake
# cd into the protobuf directory and build it
[root@hadoop001 app]# cd /usr/local/protobuf-2.5.0/
[root@hadoop001 protobuf-2.5.0]# ./configure --prefix=/usr/local/protobuf
[root@hadoop001 protobuf-2.5.0]# make && make install
# Confirm all environment variables
[hadoop@hadoop001 ~]$ vim ~/.bash_profile
export JAVA_HOME=/usr/java/jdk1.7.0_60
export ZOOKEEPER_HOME=/home/hadoop/app/zookeeper
export HADOOP_HOME=/home/hadoop/app/hadoop
export MAVEN_HOME=/home/hadoop/app/maven
export FINDBUGS_HOME=/home/hadoop/app/findbugs-3.0.1
export PROTOC_HOME=/usr/local/protobuf
export PATH=$FINDBUGS_HOME/bin:$PROTOC_HOME/bin:$MAVEN_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$JAVA_HOME/bin:$ZOOKEEPER_HOME/bin:$PATH
[hadoop@hadoop001 ~]$ source ~/.bash_profile
[hadoop@hadoop001 ~]$ which java
/usr/java/jdk1.7.0_60/bin/java
[hadoop@hadoop001 ~]$ which mvn
~/app/maven/bin/mvn
[hadoop@hadoop001 ~]$ which findbugs
~/app/findbugs-3.0.1/bin/findbugs
[hadoop@hadoop001 ~]$ which protoc
/usr/local/protobuf/bin/protoc
# Check all tool versions
[hadoop@hadoop001 ~]$ java -version
java version "1.7.0_60"
Java(TM) SE Runtime Environment (build 1.7.0_60-b19)
Java HotSpot(TM) 64-Bit Server VM (build 24.60-b09, mixed mode)
[hadoop@hadoop001 ~]$ mvn -version
Apache Maven 3.3.9 (bb52d8502b132ec0a5a3f4c09453c07478323dc5; 2015-11-11T00:41:47+08:00)
Maven home: /home/hadoop/app/maven
Java version: 1.7.0_60, vendor: Oracle Corporation
Java home: /usr/java/jdk1.7.0_60/jre
Default locale: en_US, platform encoding: UTF-8
OS name: "linux", version: "2.6.32-696.16.1.el6.x86_64", arch: "amd64", family: "unix"
[hadoop@hadoop001 ~]$ findbugs -version
3.0.1
[hadoop@hadoop001 ~]$ protoc --version
libprotoc 2.5.0
4. Install Remaining Components
# Mainly the libraries behind the various compression codecs
[root@hadoop001 ~]# yum install -y openssl openssl-devel svn ncurses-devel zlib-devel libtool
[root@hadoop001 protobuf-2.5.0]# yum install -y snappy snappy-devel bzip2 bzip2-devel lzo lzo-devel lzop autoconf automake
5. Build
[hadoop@hadoop001 hadoop-2.6.0-cdh5.7.0]$ pwd
/home/hadoop/app/source/hadoop-2.6.0-cdh5.7.0
[hadoop@hadoop001 hadoop-2.6.0-cdh5.7.0]$ mvn clean package -Pdist,native -DskipTests -Dtar
# Output like the following (the tail of the build log) means the build succeeded
[INFO] hadoop-mapreduce ................................... SUCCESS [  4.853 s]
[INFO] Apache Hadoop MapReduce Streaming .................. SUCCESS [  3.133 s]
[INFO] Apache Hadoop Distributed Copy ..................... SUCCESS [  6.775 s]
[INFO] Apache Hadoop Archives ............................. SUCCESS [  1.603 s]
[INFO] Apache Hadoop Archive Logs ......................... SUCCESS [  1.474 s]
[INFO] Apache Hadoop Rumen ................................ SUCCESS [  4.117 s]
[INFO] Apache Hadoop Gridmix .............................. SUCCESS [  3.182 s]
[INFO] Apache Hadoop Data Join ............................ SUCCESS [  2.064 s]
[INFO] Apache Hadoop Ant Tasks ............................ SUCCESS [  1.562 s]
[INFO] Apache Hadoop Extras ............................... SUCCESS [  2.098 s]
[INFO] Apache Hadoop Pipes ................................ SUCCESS [  6.435 s]
[INFO] Apache Hadoop OpenStack support .................... SUCCESS [  3.757 s]
[INFO] Apache Hadoop Amazon Web Services support .......... SUCCESS [  5.689 s]
[INFO] Apache Hadoop Azure support ........................ SUCCESS [  3.103 s]
[INFO] Apache Hadoop Client ............................... SUCCESS [  4.345 s]
[INFO] Apache Hadoop Mini-Cluster ......................... SUCCESS [  1.204 s]
[INFO] Apache Hadoop Scheduler Load Simulator ............. SUCCESS [  3.345 s]
[INFO] Apache Hadoop Tools Dist ........................... SUCCESS [  7.353 s]
[INFO] Apache Hadoop Tools ................................ SUCCESS [  0.047 s]
[INFO] Apache Hadoop Distribution ......................... SUCCESS [ 34.657 s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 30:43 min
[INFO] Finished at: 2019-04-08T10:51:51+08:00
[INFO] Final Memory: 212M/970M
[INFO] ------------------------------------------------------------------------
# Check which compression codecs the freshly built Hadoop supports
[hadoop@hadoop001 hadoop-2.6.0-cdh5.7.0]$ hadoop checknative -a
19/04/08 11:28:50 INFO bzip2.Bzip2Factory: Successfully loaded & initialized native-bzip2 library system-native
19/04/08 11:28:50 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
Native library checking:
hadoop:  true /home/hadoop/app/hadoop-2.6.0-cdh5.7.0/lib/native/libhadoop.so.1.0.0
zlib:    true /lib64/libz.so.1
snappy:  true /usr/lib64/libsnappy.so.1
lz4:     true revision:99
bzip2:   true /lib64/libbz2.so.1
openssl: true /usr/lib64/libcrypto.so

If you instead see the error "Unable to load native-hadoop library for your platform… using builtin-java classes where applicable":

[hadoop@hadoop001 hadoop-2.6.0-cdh5.7.0]$ hadoop checknative -a
19/04/08 11:20:20 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Native library checking:
hadoop:  false 
zlib:    false 
snappy:  false 
lz4:     false 
bzip2:   false 
openssl: false 
19/04/08 11:20:21 INFO util.ExitUtil: Exiting with status 1

Solution:

Download the matching prebuilt native bundle from http://dl.bintray.com/sequenceiq/sequenceiq-bin/ and then run the following commands:

[hadoop@hadoop001 software]$ tar -xvf hadoop-native-64-2.6.0.tar -C $HADOOP_HOME/lib/
[hadoop@hadoop001 software]$ tar -xvf hadoop-native-64-2.6.0.tar -C $HADOOP_HOME/lib/native
# Configure environment variables
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib:$HADOOP_HOME/lib/native"
# Then copy in the native libraries you just built
[hadoop@hadoop001 software]$ cp ~/app/source/hadoop-2.6.0-cdh5.7.0/hadoop-dist/target/hadoop-2.6.0-cdh5.7.0/lib/native/* /home/hadoop/app/hadoop/lib/native/
[hadoop@hadoop001 software]$ cp ~/app/source/hadoop-2.6.0-cdh5.7.0/hadoop-dist/target/hadoop-2.6.0-cdh5.7.0/lib/native/* /home/hadoop/app/hadoop/lib
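Before re-running the check, it is worth confirming that the copied library matches your platform (file is a standard Linux command; the path assumes the layout above):

[hadoop@hadoop001 software]$ file $HADOOP_HOME/lib/native/libhadoop.so.1.0.0
# Expect "ELF 64-bit LSB shared object, x86-64" on a 64-bit system, then run hadoop checknative -a again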