Compiling Hadoop 3 from Source

  • Why compile Hadoop at all: the official site ships pre-built binaries. Because Hadoop has to read and write files at the OS level, not everything can be implemented in Java; parts are written in C/C++ and compiled into native shared libraries (.so files). Building from source lets Hadoop pick up these native libraries and support additional compression codecs (a quick way to verify this after the build is shown at the end of this post).
  • After watching 尚硅谷's Hadoop tutorial videos, I decided to compile hadoop-3.1.3 myself. Unpacking the source from the official site reveals a start-build-env.sh script, which builds a Docker image from the Dockerfile under dev-support/docker; the tools required to compile Hadoop are installed while that image is built.
    Since 3.1.3 is relatively old, most of the configuration in dev-support/docker/Dockerfile no longer works: the image is based on ubuntu:16.04, and its Node.js is too old to install bower, so the file needs extensive changes. Ubuntu's community-driven packaging also makes version conflicts easy to run into, so I suggest a Red Hat-family Linux distribution instead.
    My Dockerfile:
FROM centos:7

WORKDIR /root

ADD jdk-8u291-linux-x64.tar.gz /opt/
# Note: the JDK tarball must sit next to this Dockerfile under dev-support/docker;
# adjust the file name to match your download
RUN mv /opt/jdk1.8.0_291 /opt/jdk8
ENV JAVA_HOME /opt/jdk8
ENV CLASSPATH $JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
ENV PATH=$PATH:$JAVA_HOME/bin


RUN yum update -y && \
    yum install -y epel-release && \
    yum install -y java-1.8.0-openjdk \
    gcc* \
    make \
    snappy* \
    bzip2* \
    lzo* \
    zlib* \
    openssl* \
    svn \
    ncurses* \
    autoconf \
    automake \
    libtool \
    *zstd* \
    gcc-c++ \
    bats \
    ShellCheck \
    python3 \
    sudo \
    fuse3 \
    fuse3-devel \
    doxygen \
    git \
    rsync \
    patch \
    vim
# epel-release is installed first in its own transaction so that EPEL-only
# packages (bats, ShellCheck, fuse3, ...) resolve in the second install
######
# Install CMake 3.20.0
######
RUN mkdir -p /opt/cmake && \
    curl -L -s -S \
       https://cmake.org/files/v3.20/cmake-3.20.0-linux-x86_64.tar.gz \
      -o /opt/cmake.tar.gz && \
    tar xzf /opt/cmake.tar.gz --strip-components 1 -C /opt/cmake
ENV CMAKE_HOME /opt/cmake
ENV PATH "${PATH}:/opt/cmake/bin"

######
# Install Google Protobuf 2.5.0
######
RUN mkdir -p /opt/protobuf-src && \
    curl -L -s -S \
      https://github.com/google/protobuf/releases/download/v2.5.0/protobuf-2.5.0.tar.gz \
      -o /opt/protobuf.tar.gz && \
    tar xzf /opt/protobuf.tar.gz --strip-components 1 -C /opt/protobuf-src
RUN cd /opt/protobuf-src && ./configure --prefix=/opt/protobuf && make install
ENV PROTOBUF_HOME /opt/protobuf
ENV PATH "${PATH}:/opt/protobuf/bin"

# An earlier Node.js setup via nodesource, kept commented out for reference:
#RUN curl -L -s  https://rpm.nodesource.com/setup_10.x | bash - && \
# RUN yum install -y nodejs && \
#     npm config set registry https://registry.npm.taobao.org
# RUN npm install -g n && \
#     n lts && PATH="$PATH" && \
#     npm install -g bower && \
#     npm install -g ember-cli
RUN yum install -y wget && \
    mkdir -p /opt/nodejs && \
    wget -O /opt/nodejs.tar.xz https://npm.taobao.org/mirrors/node/v14.17.5/node-v14.17.5-linux-x64.tar.xz && \
    tar xf /opt/nodejs.tar.xz --strip-components 1 -C /opt/nodejs && \
    ln -s /opt/nodejs/bin/npm /usr/local/bin && \
    ln -s /opt/nodejs/bin/node /usr/local/bin && \
    npm config set registry https://registry.npm.taobao.org && \
    npm install -g bower && \
    npm install -g ember-cli


RUN pip3 install pylint==2.6.0 python-dateutil==2.8.1  -i https://pypi.doubanio.com/simple



RUN mkdir -p /opt/maven && \
    curl -L -s -S \
        https://dlcdn.apache.org/maven/maven-3/3.6.3/binaries/apache-maven-3.6.3-bin.tar.gz \
    -o /opt/maven.tar.gz && \
    tar xzf /opt/maven.tar.gz --strip-components 1 -C /opt/maven
ENV MAVEN_HOME /opt/maven
ENV PATH "${PATH}:/opt/maven/bin"

RUN mkdir -p /opt/isa-l-src \
    && yum install -y automake yasm libtool \
    && curl -L -s -S \
      https://github.com/intel/isa-l/archive/v2.29.0.tar.gz \
      -o /opt/isa-l.tar.gz \
    && tar xzf /opt/isa-l.tar.gz --strip-components 1 -C /opt/isa-l-src \
    && cd /opt/isa-l-src \
    && ./autogen.sh \
    && ./configure \
    && make "-j$(nproc)" \
    && make install \
    && cd /root \
    && rm -rf /opt/isa-l-src
###
# Avoid out of memory errors in builds
###
ENV MAVEN_OPTS -Xms512m -Xmx3072m
# ENV MAVEN_OPTS -Xms256m -Xmx1536m


# Add a welcome message and environment checks.
ADD hadoop_env_checks.sh /root/hadoop_env_checks.sh
RUN chmod 755 /root/hadoop_env_checks.sh
RUN echo '~/hadoop_env_checks.sh' >> /root/.bashrc

I created a source_code folder and put the hadoop-3.1.3-src folder inside it. By default, start-build-env.sh mounts hadoop-3.1.3-src into the container via the pwd command. To make the container reusable (later builds of Hive, Spark, etc. need a similar environment), I replaced the script's default -v "${PWD}:/home/${USER_NAME}/hadoop${V_OPTS:-}" \ with -v "/xxx/source_code/:/home/${USER_NAME}/source${V_OPTS:-}" \
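With that change in place, launching the environment is just a matter of running the script from the source tree (it builds the image from dev-support/docker/Dockerfile and then opens a shell inside the container as your own user):

cd /xxx/source_code/hadoop-3.1.3-src
./start-build-env.sh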
(Screenshot: the container starts and presents a shell, confirming the Docker image built successfully.)
The BUILDING.txt file in the source root lists the build options:


Building distributions:

Create binary distribution without native code and without documentation:

$ mvn package -Pdist -DskipTests -Dtar -Dmaven.javadoc.skip=true

Create binary distribution with native code and with documentation:

$ mvn package -Pdist,native,docs -DskipTests -Dtar

Create source distribution:

$ mvn package -Psrc -DskipTests

Create source and binary distributions with native code and documentation:

$ mvn package -Pdist,native,docs,src -DskipTests -Dtar

Create a local staging version of the website (in /tmp/hadoop-site)

$ mvn clean site -Preleasedocs; mvn site:stage -DstagingDirectory=/tmp/hadoop-site

Note that the site needs to be built in a second pass after other artifacts.


Inside the container:

cd ../source
cd hadoop-3.1.3-src

I recommend the full build: mvn package -Pdist,native,docs -DskipTests -Dtar

This failed with:

[ERROR] Failed to execute goal org.apache.maven.plugins:maven-enforcer-plugin:3.0.0-M1:enforce (enforce-banned-dependencies) on project hadoop-client-check-test-invariants: Some Enforcer rules have failed. Look above for specific messages explaining why the rule failed. -> [Help 1]
[ERROR] 
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR] 
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
[ERROR] 
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR]   mvn <args> -rf :hadoop-client-check-test-invariants

The hadoop-client-check-test-invariants module fails like this in many Hadoop versions. For example, with hadoop-3.3.1 (the latest at the time), the provided script builds the container without any Dockerfile changes, yet this module still fails to compile.
Workaround: the module lives under hadoop-client-modules/; comment it out in that directory's pom.xml. Skipping it has no real impact on the resulting distribution.

  <modules>
    <!-- Left as an empty artifact w/dep for compat -->
    <module>hadoop-client</module>
    <!-- Should be used at compile scope for access to IA.Public classes -->
    <module>hadoop-client-api</module>
    <!-- Should be used at runtime scope for remaining classes necessary for hadoop-client-api to function -->
    <module>hadoop-client-runtime</module>
    <!-- Should be used at test scope for those that need access to mini cluster that works with above api and runtime -->
    <module>hadoop-client-minicluster</module>
    <!-- Checks invariants above -->
    <module>hadoop-client-check-invariants</module>
<!--     <module>hadoop-client-check-test-invariants</module> -->
    <!-- Attempt to use the created libraries -->
    <module>hadoop-client-integration-tests</module>
  </modules>
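On Maven 3.2.1 and later there is also a way to skip the module without touching the pom: a leading ! in -pl excludes a project from the reactor. I went with the pom edit above, but something like this should work too:

mvn package -Pdist,native,docs -DskipTests -Dtar -pl '!:hadoop-client-check-test-invariants'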

Re-running mvn package -Pdist,native,docs -DskipTests -Dtar:
(Screenshot: Maven reports BUILD SUCCESS; the compilation finished.)
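The finished distribution lands under hadoop-dist/target/ (hadoop-3.1.3.tar.gz among other artifacts). As mentioned at the start of this post, a quick way to confirm the native libraries were actually built in, assuming you extract that tarball and put its bin/ directory on PATH:

hadoop checknative -a
# prints one line per native component (hadoop/libhadoop.so, zlib, snappy,
# lz4, bzip2, ...) with true/false for whether it was found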
