Original blog: https://blog.csdn.net/weixin_38729343/article/details/97971505
I. Overview
A Hadoop cluster deployed directly from the prebuilt hadoop-2.6.0-cdh5.7.0.tar.gz package does not support file compression, which is unacceptable in production. We therefore need to download the Hadoop source and recompile it with compression support.
II. Compiling Hadoop with compression support
1. Build workflow:
Download the software -> install the required dependencies -> create a user and directories, upload the software -> install the JDK and set environment variables -> install and configure Maven and set environment variables -> install protobuf, set environment variables, and build it -> compile Hadoop with compression support
2. Steps in detail:
(1) Download the software
| Component | Baidu Netdisk link |
|---|---|
| Hadoop-2.6.0-cdh5.7.0-src.tar.gz | https://pan.baidu.com/s/1uRMGIhLSL9QHT-Ee4F16jw (code: jb1d) |
| jdk-7u80-linux-x64.tar.gz | https://pan.baidu.com/s/1xSCQ8rjABVI-zDFQS5nCPA (code: lfze) |
| apache-maven-3.3.9-bin.tar.gz | https://pan.baidu.com/s/1ddkdkLW7r7ahFZmgACGkVw (code: fdfz) |
| protobuf-2.5.0.tar.gz | https://pan.baidu.com/s/1RSNZGd_ThwknMB3vDkEfhQ (code: hvc2) |
Note: the build must use JDK 1.7; compiling with JDK 1.8 fails.
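Per the note above, the build only works under JDK 1.7. A small guard that can be run before compiling (a sketch; the `require_jdk7` function name is mine, not part of the original steps):

```shell
# Sketch: refuse to build unless the JDK is 1.7.x.
# Takes the version string that `java -version` reports, e.g. "1.7.0_80".
require_jdk7() {
  ver="$1"
  case "$ver" in
    1.7.*) echo "OK: JDK 7 ($ver)"; return 0 ;;
    *)     echo "ERROR: need JDK 1.7, found $ver"; return 1 ;;
  esac
}

# Typical call, extracting the version from the live JVM:
# require_jdk7 "$(java -version 2>&1 | awk -F'"' 'NR==1{print $2}')"
```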
(2) Install the required dependencies
[root@hadoop001 ~]# yum install -y svn ncurses-devel
[root@hadoop001 ~]# yum install -y gcc gcc-c++ make cmake
[root@hadoop001 ~]# yum install -y openssl openssl-devel svn ncurses-devel zlib-devel libtool
[root@hadoop001 ~]# yum install -y snappy snappy-devel bzip2 bzip2-devel lzo lzo-devel lzop autoconf automake cmake
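Whether the codec packages above actually landed can be sanity-checked against the dynamic linker cache. A sketch (the function name is mine, and `ldconfig -p` output varies by distribution):

```shell
# Sketch: report which native codec libraries the linker can see.
# Prints one "found"/"MISSING" line per library.
check_native_libs() {
  for lib in libz libsnappy libbz2 liblzo2; do
    if ldconfig -p 2>/dev/null | grep -q "$lib"; then
      echo "$lib: found"
    else
      echo "$lib: MISSING"
    fi
  done
}
```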
(3) Create a user and directories, then upload the software
[root@hadoop001 ~]# yum install -y lrzsz
[root@hadoop001 ~]# useradd hadoop
[root@hadoop001 ~]# su - hadoop
[hadoop@hadoop001 ~]$ mkdir app soft source lib data maven_repo shell mysql
[hadoop@hadoop001 ~]$ cd soft/
[hadoop@hadoop001 soft]$ rz
After uploading, run ll to check that everything is there:
[hadoop@hadoop001 soft]$ ll
total 202192
-rw-r--r--. 1 hadoop hadoop 8491533 Apr 7 11:25 apache-maven-3.3.9-bin.tar.gz
-rw-r--r--. 1 hadoop hadoop 42610549 Apr 6 16:55 hadoop-2.6.0-cdh5.7.0-src.tar.gz
-rw-r--r--. 1 hadoop hadoop 153530841 Apr 7 11:12 jdk-7u80-linux-x64.tar.gz
-rw-r--r--. 1 hadoop hadoop 2401901 Apr 7 11:31 protobuf-2.5.0.tar.gz
(4) Install the JDK and set environment variables
Create the /usr/java/ directory, extract the JDK into it, and change the owner to root:
[root@hadoop001 ~]# mkdir /usr/java/
[root@hadoop001 ~]# tar -zxvf /home/hadoop/soft/jdk-7u80-linux-x64.tar.gz -C /usr/java
[root@hadoop001 ~]# cd /usr/java/
[root@hadoop001 java]# chown -R root:root jdk1.7.0_80
Configure environment variables in /etc/profile:
[root@hadoop001 jdk1.7.0_80]# vi /etc/profile
Append the following two lines:
export JAVA_HOME=/usr/java/jdk1.7.0_80
export PATH=$JAVA_HOME/bin:$PATH
[root@hadoop001 java]# source /etc/profile
Verify that Java was installed correctly:
[root@hadoop001 jdk1.7.0_80]# java -version
java version "1.7.0_80"
Java(TM) SE Runtime Environment (build 1.7.0_80-b15)
Java HotSpot(TM) 64-Bit Server VM (build 24.80-b11, mixed mode)
(5) Install Maven, configure it, and set environment variables
[root@hadoop001 ~]# su - hadoop
[hadoop@hadoop001 ~]$ tar -zxvf ~/soft/apache-maven-3.3.9-bin.tar.gz -C ~/app/
Modify the hadoop user's environment variables:
[hadoop@hadoop001 ~]$ vi ~/.bash_profile
Add or modify the following lines. Note that MAVEN_OPTS sets the memory for the Maven JVM; too little memory can make the build fail.
export MAVEN_HOME=/home/hadoop/app/apache-maven-3.3.9
export MAVEN_OPTS="-Xms1024m -Xmx1024m"
export PATH=$MAVEN_HOME/bin:$PATH
[hadoop@hadoop001 ~]$ source ~/.bash_profile
[hadoop@hadoop001 ~]$ which mvn
~/app/apache-maven-3.3.9/bin/mvn
[hadoop@hadoop001 protobuf-2.5.0]$ vi ~/app/apache-maven-3.3.9/conf/settings.xml
Set Maven's local repository location (be careful to get this path right):
<localRepository>/home/hadoop/maven_repo/repo</localRepository>
Add the Aliyun mirror for the central repository.
Note: it must be placed between <mirrors> and </mirrors>:
<mirror>
<id>nexus-aliyun</id>
<mirrorOf>central</mirrorOf>
<name>Nexus aliyun</name>
<url>http://maven.aliyun.com/nexus/content/groups/public</url>
</mirror>
The dependency jars often do not download completely; as a workaround, you can upload a prepared local-repository archive:
# jar archive:
Link: https://pan.baidu.com/s/1vq4iVFqqyJNkYzg90bVrfg
Code: vugv
# After downloading, upload it with rz and extract it; mind the directory layout
[hadoop@hadoop001 maven_repo]$ rz
[hadoop@hadoop001 maven_repo]$ tar -zxvf repo.tar.gz
(6) Install protobuf, set environment variables, and build it
[hadoop@hadoop001 ~]$ tar -zxvf ~/soft/protobuf-2.5.0.tar.gz -C ~/app/
[hadoop@hadoop001 protobuf-2.5.0]$ cd ~/app/protobuf-2.5.0/
# --prefix= sets the directory the compiled files will be installed into
[hadoop@hadoop001 protobuf-2.5.0]$ ./configure --prefix=/home/hadoop/app/protobuf-2.5.0
Build and install:
[hadoop@hadoop001 protobuf-2.5.0]$ make
[hadoop@hadoop001 protobuf-2.5.0]$ make install
[hadoop@hadoop001 protobuf-2.5.0]$ vim ~/.bash_profile
Append the following two lines (the bin directory does not exist until after the build):
export PROTOBUF_HOME=/home/hadoop/app/protobuf-2.5.0
export PATH=$PROTOBUF_HOME/bin:$PATH
[hadoop@hadoop001 protobuf-2.5.0]$ source ~/.bash_profile
Verify it took effect; libprotoc 2.5.0 in the output means it works:
[hadoop@hadoop001 protobuf-2.5.0]$ protoc --version
libprotoc 2.5.0
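The Hadoop 2.6 build checks the protoc version and fails on a mismatch, so it must see 2.5.0 exactly. A guard for the step above (a sketch; the function name is mine):

```shell
# Sketch: validate the string printed by `protoc --version`.
check_protoc_version() {
  v="${1#libprotoc }"       # strip the "libprotoc " prefix
  [ "$v" = "2.5.0" ]
}

# Typical call:
# check_protoc_version "$(protoc --version)" || echo "wrong protoc for this build"
```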
(7) Compile Hadoop with compression support
[hadoop@hadoop001 protobuf-2.5.0]$ tar -zxvf ~/soft/hadoop-2.6.0-cdh5.7.0-src.tar.gz -C ~/source/
# Enter the Hadoop source directory
[hadoop@hadoop001 hadoop-2.6.0-cdh5.7.0]$ cd ~/source/hadoop-2.6.0-cdh5.7.0/
# Run the build
[hadoop@hadoop001 hadoop-2.6.0-cdh5.7.0]$ mvn clean package -Pdist,native -DskipTests -Dtar
Check the built package hadoop-2.6.0-cdh5.7.0.tar.gz:
# A BUILD SUCCESS message means the build succeeded
[INFO] Apache Hadoop Scheduler Load Simulator ............. SUCCESS [ 13.592 s]
[INFO] Apache Hadoop Tools Dist ........................... SUCCESS [ 12.042 s]
[INFO] Apache Hadoop Tools ................................ SUCCESS [ 0.094 s]
[INFO] Apache Hadoop Distribution ......................... SUCCESS [01:49 min]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 37:39 min
[INFO] Finished at: 2019-04-07T16:48:42+08:00
[INFO] Final Memory: 200M/989M
[INFO] ------------------------------------------------------------------------
[hadoop@hadoop001 hadoop-2.6.0-cdh5.7.0]$ ll /home/hadoop/source/hadoop-2.6.0-cdh5.7.0/hadoop-dist/target/
total 564036
drwxrwxr-x. 2 hadoop hadoop 4096 Apr 7 16:46 antrun
drwxrwxr-x. 3 hadoop hadoop 4096 Apr 7 16:46 classes
-rw-rw-r--. 1 hadoop hadoop 1998 Apr 7 16:46 dist-layout-stitching.sh
-rw-rw-r--. 1 hadoop hadoop 690 Apr 7 16:47 dist-tar-stitching.sh
drwxrwxr-x. 9 hadoop hadoop 4096 Apr 7 16:47 hadoop-2.6.0-cdh5.7.0
-rw-rw-r--. 1 hadoop hadoop 191880143 Apr 7 16:47 hadoop-2.6.0-cdh5.7.0.tar.gz
-rw-rw-r--. 1 hadoop hadoop 7314 Apr 7 16:47 hadoop-dist-2.6.0-cdh5.7.0.jar
-rw-rw-r--. 1 hadoop hadoop 385618309 Apr 7 16:48 hadoop-dist-2.6.0-cdh5.7.0-javadoc.jar
-rw-rw-r--. 1 hadoop hadoop 4855 Apr 7 16:47 hadoop-dist-2.6.0-cdh5.7.0-sources.jar
-rw-rw-r--. 1 hadoop hadoop 4855 Apr 7 16:47 hadoop-dist-2.6.0-cdh5.7.0-test-sources.jar
drwxrwxr-x. 2 hadoop hadoop 4096 Apr 7 16:47 javadoc-bundle-options
drwxrwxr-x. 2 hadoop hadoop 4096 Apr 7 16:47 maven-archiver
drwxrwxr-x. 3 hadoop hadoop 4096 Apr 7 16:46 maven-shared-archive-resources
drwxrwxr-x. 3 hadoop hadoop 4096 Apr 7 16:46 test-classes
drwxrwxr-x. 2 hadoop hadoop 4096 Apr 7 16:46 test-dir
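Once the rebuilt tarball from hadoop-dist/target is deployed, compression support can be confirmed with Hadoop's `hadoop checknative -a` command. A small helper for reading its output (a sketch; the function name is mine, and the sample line mirrors the typical checknative format):

```shell
# Sketch: read `hadoop checknative` output on stdin and succeed only if
# the named codec reports "true".
codec_enabled() {
  grep -E "^$1: *true" >/dev/null
}

# Typical call:
# hadoop checknative -a 2>/dev/null | codec_enabled snappy && echo "snappy OK"
```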
III. Example compilation errors and how to handle them
(1) Case 1
[WARNING] Unrecognised tag: 'mirror' (position: START_TAG seen ...</mirrors>\n<mirror>... @163:9) @ /home/hadoop/app/apache-maven-3.3.9/conf/settings.xml, line 163, column 9
Cause: Maven misconfiguration; the Aliyun mirror entry was not placed between <mirrors> and </mirrors>.
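This misconfiguration can be caught before running Maven with a rough textual check (a heuristic sketch, not a full XML parse; the function name is mine):

```shell
# Sketch: fail if a <mirror> block appears after </mirrors> in settings.xml.
mirror_inside_mirrors() {
  awk '/<\/mirrors>/ { closed = 1 }
       /<mirror>/    { if (closed) bad = 1 }
       END           { exit bad ? 1 : 0 }' "$1"
}
```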
(2) Case 2:
[FATAL] Non-resolvable parent POM for org.apache.hadoop:hadoop-main:2.6.0-cdh5.7.0: Could not transfer artifact com.cloudera.cdh:cdh-root:pom:5.7.0 from/to cdh.repo (https://repository.cloudera.com/artifactory/cloudera-repos): Remote host closed connectio
# Analysis: the file https://repository.cloudera.com/artifactory/cloudera-repos/com/cloudera/cdh/cdh-root/5.7.0/cdh-root-5.7.0.pom could not be downloaded
# Fix: go to the target directory inside the local repository and fetch the file with wget, then rerun the build; or use the optional step above and put the needed jars directly into the local repository
The file cdh-root-5.7.0.pom was not downloaded; fix this by downloading it manually and uploading it.
The pom's coordinates are com.cloudera.cdh:cdh-root:pom:5.7.0. Open https://repository.cloudera.com/artifactory/cloudera-repos/, find the corresponding pom, download it, and place it in the Maven repository directory.
Maven repository location: look under /home/hadoop/maven_repo/repo to see which directory is missing its pom and jar files.
Then rerun mvn clean package -Pdist,native -DskipTests -Dtar
Example of a pom file's location:
[hadoop@hadoop001 1.7.6-cdh5.7.0]$ pwd
/home/hadoop/maven_repo/repo/org/apache/avro/avro-compiler/1.7.6-cdh5.7.0
[hadoop@hadoop001 1.7.6-cdh5.7.0]$ ll
total 152
-rw-r--r--. 1 hadoop hadoop 77422 Dec 2 2018 avro-compiler-1.7.6-cdh5.7.0.jar
-rw-r--r--. 1 hadoop hadoop 4690 Dec 2 2018 avro-compiler-1.7.6-cdh5.7.0.pom
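The manual-download fix above can be scripted: Maven maps coordinates of the form groupId:artifactId:version to a repository path by replacing the dots in the groupId with slashes, as the avro-compiler example shows. A sketch (the `coord_to_path` helper name is mine):

```shell
# Sketch: translate Maven coordinates to their repository-relative path.
coord_to_path() {
  g="${1%%:*}"; rest="${1#*:}"   # split groupId from the rest
  a="${rest%%:*}"; v="${rest#*:}" # split artifactId from version
  echo "$(echo "$g" | tr . /)/$a/$v"
}

# Typical use against the Cloudera repo shown above:
# p="$(coord_to_path com.cloudera.cdh:cdh-root:5.7.0)"
# mkdir -p "/home/hadoop/maven_repo/repo/$p"
# wget -P "/home/hadoop/maven_repo/repo/$p" \
#   "https://repository.cloudera.com/artifactory/cloudera-repos/$p/cdh-root-5.7.0.pom"
```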