- Why compile Hadoop at all: the download on the official site is a prebuilt binary. Because Hadoop has to read and write files efficiently, not everything can be implemented in pure Java; parts of it rely on shared libraries (.so files) compiled from C/C++. Building Hadoop yourself also lets it support additional compression codecs.
- After watching the Shang Silicon Valley (尚硅谷) Hadoop tutorial videos, I decided to compile hadoop-3.1.3 on my own. After downloading and unpacking the source from the official site, I found the start-build-env.sh script, which builds a Docker image from the Dockerfile under dev-support/docker; the tools needed to compile Hadoop are installed while that image is built. Because 3.1.3 is a fairly old release, most of the configuration in dev-support/docker/Dockerfile is no longer usable: the image is based on ubuntu:16.04, whose Node.js is too old to install bower properly, so the file needs extensive changes. And since Ubuntu is community-maintained and prone to package version conflicts, I suggest switching to a Red Hat-backed Linux distribution instead (hence CentOS 7 below).
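A quick way to see what a given Hadoop installation actually supports is the hadoop checknative command; on a stock prebuilt release several codec entries typically come back false. The output below is only illustrative, and paths and values vary by platform and release:
$ bin/hadoop checknative -a
Native library checking:
hadoop:  true /opt/hadoop-3.1.3/lib/native/libhadoop.so.1.0.0
zlib:    true /lib64/libz.so.1
zstd  :  false
snappy:  false
lz4:     true revision:10301
bzip2:   false
openssl: false
ISA-L:   false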
My Dockerfile is as follows:
FROM centos:7
WORKDIR /root
ADD jdk-8u291-linux-x64.tar.gz /opt/
# Note: place the JDK tarball in the dev-support/docker folder, and adjust the file name if yours differs
RUN mv /opt/jdk1.8.0_291 /opt/jdk8
ENV JAVA_HOME /opt/jdk8
ENV CLASSPATH $JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
ENV PATH=$PATH:$JAVA_HOME/bin
RUN yum update -y && \
yum install -y java-1.8.0-openjdk \
gcc* \
make \
snappy* \
bzip2* \
lzo* \
zlib* \
openssl* \
svn \
ncurses* \
autoconf \
automake \
libtool \
epel-release \
*zstd* \
gcc-c++ \
bats \
ShellCheck \
python3 \
sudo \
fuse3 \
fuse3-devel \
doxygen \
git \
rsync \
patch \
vim
######
# Install cmake 3.20.0
######
RUN mkdir -p /opt/cmake && \
curl -L -s -S \
https://cmake.org/files/v3.20/cmake-3.20.0-linux-x86_64.tar.gz \
-o /opt/cmake.tar.gz && \
tar xzf /opt/cmake.tar.gz --strip-components 1 -C /opt/cmake
ENV CMAKE_HOME /opt/cmake
ENV PATH "${PATH}:/opt/cmake/bin"
######
# Install Google Protobuf 2.5.0
######
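# Hadoop 3.1.x requires protoc 2.5.0 exactly; newer protobuf releases fail the build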
RUN mkdir -p /opt/protobuf-src && \
curl -L -s -S \
https://github.com/google/protobuf/releases/download/v2.5.0/protobuf-2.5.0.tar.gz \
-o /opt/protobuf.tar.gz && \
tar xzf /opt/protobuf.tar.gz --strip-components 1 -C /opt/protobuf-src
RUN cd /opt/protobuf-src && ./configure --prefix=/opt/protobuf && make install
ENV PROTOBUF_HOME /opt/protobuf
ENV PATH "${PATH}:/opt/protobuf/bin"
#RUN curl -L -s https://rpm.nodesource.com/setup_10.x | bash - && \
# RUN yum install -y nodejs && \
# npm config set registry https://registry.npm.taobao.org
# RUN npm install -g n && \
# n lts && PATH="$PATH" && \
# npm install -g bower && \
# npm install -g ember-cli
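######
# Install Node.js 14 from a mirror tarball (the commented-out npm/NodeSource
# approach above is kept only for reference)
######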
RUN yum install -y wget && \
mkdir -p /opt/nodejs && \
wget -O /opt/nodejs.tar.xz https://npm.taobao.org/mirrors/node/v14.17.5/node-v14.17.5-linux-x64.tar.xz && \
tar xf /opt/nodejs.tar.xz --strip-components 1 -C /opt/nodejs && \
ln -s /opt/nodejs/bin/npm /usr/local/bin && \
ln -s /opt/nodejs/bin/node /usr/local/bin && \
npm config set registry https://registry.npm.taobao.org && \
npm install -g bower && \
npm install -g ember-cli
RUN pip3 install pylint==2.6.0 python-dateutil==2.8.1 -i https://pypi.doubanio.com/simple
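######
# Install Apache Maven 3.6.3
######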
RUN mkdir -p /opt/maven && \
curl -L -s -S \
https://dlcdn.apache.org/maven/maven-3/3.6.3/binaries/apache-maven-3.6.3-bin.tar.gz \
-o /opt/maven.tar.gz && \
tar xzf /opt/maven.tar.gz --strip-components 1 -C /opt/maven
ENV MAVEN_HOME /opt/maven
ENV PATH "${PATH}:/opt/maven/bin"
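######
# Install Intel ISA-L 2.29.0 (native acceleration for HDFS erasure coding)
######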
RUN mkdir -p /opt/isa-l-src \
&& yum install -y automake yasm libtool \
&& curl -L -s -S \
https://github.com/intel/isa-l/archive/v2.29.0.tar.gz \
-o /opt/isa-l.tar.gz \
&& tar xzf /opt/isa-l.tar.gz --strip-components 1 -C /opt/isa-l-src \
&& cd /opt/isa-l-src \
&& ./autogen.sh \
&& ./configure \
&& make "-j$(nproc)" \
&& make install \
&& cd /root \
&& rm -rf /opt/isa-l-src
###
# Avoid out of memory errors in builds
###
ENV MAVEN_OPTS -Xms512m -Xmx3072m
# ENV MAVEN_OPTS -Xms256m -Xmx1536m
# Add a welcome message and environment checks.
ADD hadoop_env_checks.sh /root/hadoop_env_checks.sh
RUN chmod 755 /root/hadoop_env_checks.sh
RUN echo '~/hadoop_env_checks.sh' >> /root/.bashrc
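If you want to sanity-check the Dockerfile on its own before going through start-build-env.sh, you can build the image manually from the source root; the image tag here is arbitrary:
cd hadoop-3.1.3-src
docker build -t hadoop-build-env dev-support/docker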
I created a source_code directory and put the hadoop-3.1.3-src folder inside it. By default, start-build-env.sh uses the pwd command to mount hadoop-3.1.3-src into the container. To make the container easier to reuse (Hive, Spark and the like need a similar environment when I compile them later), I replaced the default line in the script
-v "${PWD}:/home/${USER_NAME}/hadoop${V_OPTS:-}" \
with
-v "/xxx/source_code/:/home/${USER_NAME}/source${V_OPTS:-}" \
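With that change, running the script builds the image and drops you into a container with the whole source_code directory mounted. Roughly (prompts and user names will differ on your machine):
$ ./start-build-env.sh
... Docker builds the image from dev-support/docker/Dockerfile ...
$ ls ~/source
hadoop-3.1.3-src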
As the screenshot above shows, the Docker image builds and the container starts successfully.
The BUILDING.txt file says:
Building distributions:
Create binary distribution without native code and without documentation:
$ mvn package -Pdist -DskipTests -Dtar -Dmaven.javadoc.skip=true
Create binary distribution with native code and with documentation:
$ mvn package -Pdist,native,docs -DskipTests -Dtar
Create source distribution:
$ mvn package -Psrc -DskipTests
Create source and binary distributions with native code and documentation:
$ mvn package -Pdist,native,docs,src -DskipTests -Dtar
Create a local staging version of the website (in /tmp/hadoop-site)
$ mvn clean site -Preleasedocs; mvn site:stage -DstagingDirectory=/tmp/hadoop-site
Note that the site needs to be built in a second pass after other artifacts.
cd ../source
cd hadoop-3.1.3-src
The recommended command is mvn package -Pdist,native,docs -DskipTests -Dtar
It fails with the following error:
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-enforcer-plugin:3.0.0-M1:enforce (enforce-banned-dependencies) on project hadoop-client-check-test-invariants: Some Enforcer rules have failed. Look above for specific messages explaining why the rule failed. -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
[ERROR]
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR] mvn <args> -rf :hadoop-client-check-test-invariants
The hadoop-client-check-test-invariants module fails like this in many Hadoop versions. For example, with hadoop-3.3.1 (the newest release at the time), the provided script builds the container fine without any changes to the Dockerfile, yet this module still fails to compile.
Workaround: the module lives under hadoop-client-modules/. Comment it out in that directory's pom.xml; leaving it out of the build has little practical impact:
<modules>
<!-- Left as an empty artifact w/dep for compat -->
<module>hadoop-client</module>
<!-- Should be used at compile scope for access to IA.Public classes -->
<module>hadoop-client-api</module>
<!-- Should be used at runtime scope for remaining classes necessary for hadoop-client-api to function -->
<module>hadoop-client-runtime</module>
<!-- Should be used at test scope for those that need access to mini cluster that works with above api and runtime -->
<module>hadoop-client-minicluster</module>
<!-- Checks invariants above -->
<module>hadoop-client-check-invariants</module>
<!-- <module>hadoop-client-check-test-invariants</module> -->
<!-- Attempt to use the created libraries -->
<module>hadoop-client-integration-tests</module>
</modules>
Re-run mvn package -Pdist,native,docs -DskipTests -Dtar
As the screenshot above shows, the build now succeeds.
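For reference, the binary distribution should end up under hadoop-dist/target/ (roughly as below; verify the exact names on your build), and running bin/hadoop checknative -a against it should now report true for the codecs whose libraries were baked into the image (zlib, snappy, zstd, bzip2, openssl, ISA-L):
ls hadoop-dist/target/
hadoop-3.1.3.tar.gz    # binary distribution with native libraries
hadoop-3.1.3/          # the same distribution, unpacked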