I. Configure Maven
1) Upload and unpack the Maven archive, and create a directory for the local repository
[sc@hadoop102 software]$ unzip apache-maven-3.5.2-bin.zip -d /opt/module/
[sc@hadoop102 ~]$ mkdir /home/sc/repository
2) Edit the settings.xml file in Maven's conf directory
Add the following inside the <mirrors></mirrors> element:
<mirror>
<id>alimaven</id>
<name>aliyun maven</name>
<url>http://maven.aliyun.com/nexus/content/groups/public/</url>
<mirrorOf>central</mirrorOf>
</mirror>
3) Optionally set your own local repository location; the default is ${user.home}/.m2/repository
<localRepository>/home/sc/repository</localRepository>
4) Configure environment variables:
Edit /etc/profile (root privileges are required) and append the two export lines below:
[sc@hadoop102 conf]$ sudo vim /etc/profile
export M2_HOME=/opt/module/apache-maven-3.5.2
export PATH=$PATH:$M2_HOME/bin
[sc@hadoop102 conf]$ source /etc/profile
5) Verify the installation:
[sc@hadoop102 conf]$ mvn -version
Apache Maven 3.5.2 (138edd61fd100ec658bfa2d307c43b76940a5d7d; 2017-10-18T00:58:13-07:00)
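The settings and PATH changes above can be spot-checked with a small script. This is a minimal sketch: the helper names on_path and check_settings are hypothetical (they are not part of Maven), and the settings check is a plain text match rather than real XML parsing.

```shell
#!/bin/sh
# Sketch of two post-install checks; both helper names are hypothetical.

# on_path DIR: succeed if DIR is one of the colon-separated entries of $PATH.
on_path() {
    case ":$PATH:" in
        *":$1:"*) return 0 ;;
        *) return 1 ;;
    esac
}

# check_settings FILE REPO_DIR: succeed if FILE mentions the central mirror
# and a <localRepository> pointing at REPO_DIR. This is a plain text match,
# so it cannot tell whether a matching line sits inside an XML comment.
check_settings() {
    grep -F '<mirrorOf>central</mirrorOf>' "$1" >/dev/null || return 1
    grep -F "<localRepository>$2</localRepository>" "$1" >/dev/null || return 1
}

# Example calls with the paths used in this guide:
#   on_path "$M2_HOME/bin" || echo "M2_HOME/bin is not on PATH"
#   check_settings "$M2_HOME/conf/settings.xml" /home/sc/repository \
#       || echo "settings.xml looks incomplete"
```

For a check that goes through Maven itself, mvn help:effective-settings prints the settings Maven actually resolved, including the mirror and the local repository.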
II. Install dependencies
Install them with yum (as root or via sudo):
yum -y install gcc-c++ lzo-devel zlib-devel autoconf automake libtool
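Before moving on, it can help to confirm that the build tools are actually reachable from the shell. The sketch below uses a hypothetical helper, missing_tools, that lists any command it cannot find on PATH. The -devel packages are libraries rather than commands, so they would be checked separately, for example with rpm -q lzo-devel zlib-devel.

```shell
#!/bin/sh
# missing_tools NAME...: print each command name that cannot be found on PATH.
# Prints nothing and returns 0 when every command is available.
# (Hypothetical helper for illustration.)
missing_tools() {
    status=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
            status=1
        fi
    done
    return "$status"
}

# Example for the tools installed above:
#   missing_tools g++ autoconf automake libtool
```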
III. Build and install LZO
1) Download, build, and install LZO
wget http://www.oberhumer.com/opensource/lzo/download/lzo-2.10.tar.gz
tar -zxvf lzo-2.10.tar.gz
cd lzo-2.10
./configure --prefix=/usr/local/hadoop/lzo/
make
make install
2) Build hadoop-lzo from source
- Download the hadoop-lzo source from https://github.com/twitter/hadoop-lzo/archive/master.zip:
wget https://github.com/twitter/hadoop-lzo/archive/master.zip
[sc@hadoop102 software]$ unzip master.zip
[sc@hadoop102 software]$ mv hadoop-lzo-master/ /opt/module/
- After unpacking, edit pom.xml and set the Hadoop version to the one your cluster runs (2.7.2 here):
<hadoop.current.version>2.7.2</hadoop.current.version>
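- Declare two temporary environment variables so the native build can find the LZO headers and libraries (they last only for the current shell session):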
export C_INCLUDE_PATH=/usr/local/hadoop/lzo/include
export LIBRARY_PATH=/usr/local/hadoop/lzo/lib
- Build: enter hadoop-lzo-master and run the Maven package command
[sc@hadoop102 hadoop-lzo-master]$ mvn package -Dmaven.test.skip=true
[INFO] Building jar: /opt/module/hadoop-lzo-master/target/hadoop-lzo-0.4.21-SNAPSHOT-javadoc.jar
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 08:06 min
[INFO] Finished at: 2020-05-18T01:48:03-07:00
[INFO] Final Memory: 34M/274M
- Enter target and copy hadoop-lzo-0.4.21-SNAPSHOT.jar onto Hadoop's classpath, for example ${HADOOP_HOME}/share/hadoop/common
[sc@hadoop102 target]$ cp hadoop-lzo-0.4.21-SNAPSHOT.jar /opt/module/hadoop-2.7.2/share/hadoop/common
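The LZO install and the two temporary variables from the steps above can be tied together in a small wrapper. This is a sketch under assumptions: check_lzo_prefix and with_lzo_env are hypothetical helper names, and the check relies on LZO installing its header as include/lzo/lzo1x.h and a liblzo2 library under lib. The wrapper passes the variables to a single command through env, so nothing is left exported in the calling shell.

```shell
#!/bin/sh
# check_lzo_prefix PREFIX: succeed if PREFIX holds the LZO header and a
# liblzo2 library (static or shared). Hypothetical helper.
check_lzo_prefix() {
    lzo_prefix=$1
    [ -f "$lzo_prefix/include/lzo/lzo1x.h" ] || return 1
    # Expand the glob into the positional parameters; if nothing matches,
    # the pattern stays literal and the -e test fails.
    set -- "$lzo_prefix"/lib/liblzo2.*
    [ -e "$1" ]
}

# with_lzo_env PREFIX COMMAND...: run COMMAND with C_INCLUDE_PATH and
# LIBRARY_PATH pointing at PREFIX, without exporting them in the caller.
with_lzo_env() {
    lzo_prefix=$1
    shift
    env C_INCLUDE_PATH="$lzo_prefix/include" LIBRARY_PATH="$lzo_prefix/lib" "$@"
}

# Example, run from /opt/module/hadoop-lzo-master:
#   check_lzo_prefix /usr/local/hadoop/lzo && \
#       with_lzo_env /usr/local/hadoop/lzo mvn package -Dmaven.test.skip=true
```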
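IV. Use LZO in Hadoop
Hadoop does not support LZO compression out of the box, so the open-source hadoop-lzo component from Twitter is needed.
Hadoop 2.7.2 is used as the example here.
1) Place the built hadoop-lzo-0.4.21-SNAPSHOT.jar in hadoop-2.7.2/share/hadoop/common/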
[sc@hadoop102 common]$ ls | grep lzo
hadoop-lzo-0.4.21-SNAPSHOT.jar
2) Sync hadoop-lzo-0.4.21-SNAPSHOT.jar to the other Hadoop nodes
[sc@hadoop102 common]$ xsync hadoop-lzo-0.4.21-SNAPSHOT.jar
3) Add the LZO compression settings to core-site.xml
<property>
<name>io.compression.codecs</name>
<value>
org.apache.hadoop.io.compress.GzipCodec,
org.apache.hadoop.io.compress.DefaultCodec,
org.apache.hadoop.io.compress.BZip2Codec,
org.apache.hadoop.io.compress.SnappyCodec,
com.hadoop.compression.lzo.LzoCodec,
com.hadoop.compression.lzo.LzopCodec
</value>
</property>
<property>
<name>io.compression.codec.lzo.class</name>
<value>com.hadoop.compression.lzo.LzoCodec</value>
</property>
4) Sync core-site.xml to the other Hadoop nodes.
[sc@hadoop102 hadoop]$ xsync core-site.xml
5) Restart the Hadoop cluster; LZO is then ready to use.
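After the restart, one way to confirm that a node picked up the new codec list is to read the value back and look for the two LZO classes. The sketch below assumes the value is fetched with hdfs getconf -confKey io.compression.codecs; has_codec is a hypothetical helper that splits the comma-separated list and ignores the spaces and line breaks that the XML value may contain.

```shell
#!/bin/sh
# has_codec CODEC_LIST CLASS: succeed if CLASS is one entry of the
# comma-separated CODEC_LIST, ignoring surrounding whitespace and newlines.
# (Hypothetical helper for illustration.)
has_codec() {
    printf '%s\n' "$1" \
        | tr ',' '\n' \
        | sed 's/^[[:space:]]*//; s/[[:space:]]*$//' \
        | grep -Fx "$2" >/dev/null
}

# Example, run on any node after syncing core-site.xml:
#   codecs=$(hdfs getconf -confKey io.compression.codecs)
#   has_codec "$codecs" com.hadoop.compression.lzo.LzoCodec  || echo "LzoCodec missing"
#   has_codec "$codecs" com.hadoop.compression.lzo.LzopCodec || echo "LzopCodec missing"
```

Matching whole entries rather than substrings avoids a false positive where LzoCodec would otherwise match inside a longer class name.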