Enterprise Hadoop 2.x Primer Series, Part 2: Compiling Hadoop 2.2.0 from Source

2.1 Download locations

1. Apache Hadoop (100% open source, permanently):

- http://hadoop.apache.org/releases.html

- SVN: http://svn.apache.org/repos/asf/hadoop/common/branches/

2. CDH (Cloudera Distributed Hadoop, 100% open source, permanently):

- http://archive.cloudera.com/cdh4/cdh/4/ (tar.gz files)

- http://archive.cloudera.com/cdh5/cdh/ (tar.gz files)

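2.2 Notes on the official release

(1)  Official site: http://hadoop.apache.org

(2)  Download the Hadoop package

(3)  Problem with the official release

The native libraries in the official binary release were compiled on 32-bit Linux. On 64-bit Linux they cannot be loaded, so Hadoop logs a warning and falls back to its built-in Java classes:

- Warning: WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

- The native library in the official binary package is 32-bit, which can be checked with the following command:

        $ file $HADOOP_PREFIX/lib/native/libhadoop.so.1.0.0

    The output shows that the library is 32-bit:

 libhadoop.so.1.0.0: ELF 32-bit LSB shared object, Intel 80386, version 1 (SYSV), dynamically linked, BuildID[sha1]=0x9eb1d49b05f67d38454e42b216e053a27ae8bac9, not stripped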

2.3 Official build instructions

The downloaded hadoop-2.2.0-src.tar.gz package contains a BUILDING.txt file that describes the build steps in detail:

Build instructions for Hadoop

----------------------------------------------------------------------------------

Requirements:

* Unix System          (this article uses CentOS 6.4, 64-bit)
* JDK 1.6+             (JDK 1.6 or later)
* Maven 3.0 or later   (version 3.0.5 is recommended)
* Findbugs 1.3.9 (if running findbugs)
* ProtocolBuffer 2.5.0
* CMake 2.6 or newer (if compiling native code)     (needed to build the native libraries)
* Internet connection for first build (to fetch all Maven and Hadoop dependencies) (dependencies are downloaded online)

----------------------------------------------------------------------------------

Maven main modules:

 

  hadoop                            (Main Hadoop project)
         - hadoop-project           (Parent POM for all Hadoop Maven modules.             )
                                    (All plugins & dependencies versions are defined here.)
         - hadoop-project-dist      (Parent POM for modules that generate distributions.)
         - hadoop-annotations       (Generates the Hadoop doclet used to generated the Javadocs)
         - hadoop-assemblies        (Maven assemblies used by the different modules)
         - hadoop-common-project    (Hadoop Common)
         - hadoop-hdfs-project      (Hadoop HDFS)
         - hadoop-mapreduce-project (Hadoop MapReduce)
         - hadoop-tools             (Hadoop tools like Streaming, Distcp, etc.)
         - hadoop-dist              (Hadoop distribution assembler)

 

----------------------------------------------------------------------------------

Where to run Maven from?

  It can be run from any module. The only catch is that if not run from utrunk all modules that are not part of the build run must be installed in the local Maven cache or available in a Maven repository.

 

----------------------------------------------------------------------------------

Maven build goals:

 * Clean                     : mvn clean
 * Compile                   : mvn compile [-Pnative]
 * Run tests                 : mvn test [-Pnative]
 * Create JAR                : mvn package
 * Run findbugs              : mvn compile findbugs:findbugs
 * Run checkstyle            : mvn compile checkstyle:checkstyle
 * Install JAR in M2 cache   : mvn install
 * Deploy JAR to Maven repo  : mvn deploy
 * Run clover                : mvn test -Pclover [-DcloverLicenseLocation=${user.name}/.clover.license]
 * Run Rat                   : mvn apache-rat:check
 * Build javadocs            : mvn javadoc:javadoc
 * Build distribution        : mvn package [-Pdist][-Pdocs][-Psrc][-Pnative][-Dtar]
 * Change Hadoop version     : mvn versions:set -DnewVersion=NEWVERSION

 

 Build options:

  * Use -Pnative to compile/bundle native code
  * Use -Pdocs to generate & bundle the documentation in the distribution (using -Pdist)
  * Use -Psrc to create a project source TAR.GZ
  * Use -Dtar to create a TAR with the distribution (using -Pdist)

 

 Snappy build options:

   Snappy is a compression library that can be utilized by the native code. It is currently an optional component, meaning that Hadoop can be built with or without this dependency.

  * Use -Drequire.snappy to fail the build if libsnappy.so is not found. If this option is not specified and the snappy library is missing, we silently build a version of libhadoop.so that cannot make use of snappy. This option is recommended if you plan on making use of snappy and want to get more repeatable builds.

  * Use -Dsnappy.prefix to specify a nonstandard location for the libsnappy header files and library files. You do not need this option if you have installed snappy using a package manager.

  * Use -Dsnappy.lib to specify a nonstandard location for the libsnappy library files. Similarly to snappy.prefix, you do not need this option if you have installed snappy using a package manager.

  * Use -Dbundle.snappy to copy the contents of the snappy.lib directory into the final tar file. This option requires that -Dsnappy.lib is also given, and it ignores the -Dsnappy.prefix option.

---------------------------------------------------------------------------------

Building components separately

 

If you are building a submodule directory, all the hadoop dependencies this submodule has will be resolved as all other 3rd party dependencies. This is, from the Maven cache or from a Maven repository (if not available in the cache or the SNAPSHOT 'timed out').

An alternative is to run 'mvn install -DskipTests' from Hadoop source top level once; and then work from the submodule. Keep in mind that SNAPSHOTs time out after a while, using the Maven '-nsu' will stop Maven from trying to update SNAPSHOTs from external repos.

 

----------------------------------------------------------------------------------

Protocol Buffer compiler

 

The version of Protocol Buffer compiler, protoc, must match the version of the protobuf JAR.

If you have multiple versions of protoc in your system, you can set in your build shell the HADOOP_PROTOC_PATH environment variable to point to the one you want to use for the Hadoop build. If you don't define this environment variable, protoc is looked up in the PATH.

----------------------------------------------------------------------------------

Importing projects to eclipse

 

When you import the project to eclipse, install hadoop-maven-plugins at first.

  $ cd hadoop-maven-plugins
  $ mvn install

Then, generate eclipse project files.

  $ mvn eclipse:eclipse -DskipTests

At last, import to eclipse by specifying the root directory of the project via [File] > [Import] > [Existing Projects into Workspace].

 

----------------------------------------------------------------------------------

Building distributions:

Create binary distribution without native code and without documentation: (binaries only)

  $ mvn package -Pdist -DskipTests -Dtar

Create binary distribution with native code and with documentation: (binaries + native libraries + documentation)

  $ mvn package -Pdist,native,docs -DskipTests -Dtar

Create source distribution: (source only)

  $ mvn package -Psrc -DskipTests

Create source and binary distributions with native code and documentation: (source + binaries + native libraries + documentation)

  $ mvn package -Pdist,native,docs,src -DskipTests -Dtar

Create a local staging version of the website (in /tmp/hadoop-site)

  $ mvn clean site; mvn site:stage -DstagingDirectory=/tmp/hadoop-site

 

----------------------------------------------------------------------------------

Handling out of memory errors in builds

If the build process fails with an out of memory error, you should be able to fix it by increasing the memory used by maven - which can be done via the environment variable MAVEN_OPTS.

Here is an example setting to allocate between 256 and 512 MB of heap space to Maven

export MAVEN_OPTS="-Xms256m -Xmx512m"

----------------------------------------------------------------------------------

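2.4 Build steps

Step 1: Install VMware 10 (omitted)

Step 2: Install a 64-bit Linux operating system (omitted)

     This article uses CentOS 6.4, 64-bit. Download: http://www.centoscn.com/CentosSoft/

Step 3: Set up networking on Linux

(1)  Set the VMware virtual machine's network mode to NAT.

(2)  Configure the Linux network interface to obtain its address dynamically from a DHCP server, so that it shares the host's network connection.

(3)  Test: ping www.baidu.com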


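Step 4: Install the JDK

Note: BUILDING.txt requires JDK 1.6 or later. Use a 64-bit build (this environment uses jdk-6u45-linux-x64.bin).

(1) Use an FTP tool (WinSCP or FileZilla) to upload jdk-6u45-linux-x64.bin to the /software/ directory on the Linux system.

(2) Install the JDK

cd /software/
chmod u+x jdk-6u45-linux-x64.bin      # grant execute permission
mkdir /workDir                        # create a software installation directory (a personal habit)
cp jdk-6u45-linux-x64.bin /workDir    # copy the installer to /workDir
cd /workDir
./jdk-6u45-linux-x64.bin              # run the self-extracting file
mv jdk1.6.0_45 jdk6u45                # rename the folder for convenience

(3) Configure environment variables

vi /etc/profile

Add the following:

export JAVA_HOME=/workDir/jdk6u45
export PATH=.:$PATH:$JAVA_HOME/bin

(4) Apply the environment variables

source /etc/profile

(5) Verify that the JDK is installed

java -version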
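The version check in (5) can also be scripted. The following is a minimal sketch, assuming the old-style "1.x" version strings printed by JDK 8 and earlier; the helper names java_version_of and java_version_ok are hypothetical and not part of the JDK.

```shell
# Extract "major.minor" (for example 1.6) from `java -version` output.
# java_version_of is a hypothetical helper name.
java_version_of() {
    printf '%s\n' "$1" | sed -n 's/.*version "\([0-9]*\.[0-9]*\).*/\1/p' | head -n 1
}

# Succeed when a "1.x" version is at least 1.6, the minimum named in BUILDING.txt.
# Assumption: only old-style "1.x" version numbers are passed in.
java_version_ok() {
    minor=${1#1.}
    [ "$minor" -ge 6 ] 2>/dev/null
}
```

Because java -version writes to standard error, a call would look like java_version_ok "$(java_version_of "$(java -version 2>&1)")".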

Step 5: Install the dependency packages

yum install autoconf -y
yum install automake -y
yum install libtool -y
yum install cmake -y
yum install ncurses-devel -y
yum install openssl-devel -y
yum install gcc -y
yum install gcc-c++ -y
yum install lzo-devel -y
yum install zlib-devel -y

Note: -y answers "yes" to every prompt during installation.

Verify:

rpm -qa | grep autoconf
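Checking the packages one at a time is tedious, so the verification can be looped. This is a small sketch under stated assumptions: missing_items is a hypothetical helper, and the checker command passed as its first argument (for example rpm -q) is chosen by the caller.

```shell
# Print every item for which the checker command fails.
# missing_items is a hypothetical helper; the first argument is the checker
# command (word-split on purpose), the remaining arguments are the items.
missing_items() {
    checker=$1
    shift
    for item in "$@"; do
        $checker "$item" >/dev/null 2>&1 || printf '%s\n' "$item"
    done
}

# Example call (not executed here): list packages from this step that
# rpm does not report as installed.
# missing_items "rpm -q" autoconf automake libtool cmake ncurses-devel
# missing_items "rpm -q" openssl-devel gcc gcc-c++ lzo-devel zlib-devel
```

An empty result means every listed package is installed.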

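[About the yum command]

yum (Yellow dog Updater, Modified) is a command-line front-end package manager used on Fedora, Red Hat, and related distributions such as CentOS. It is based on RPM package management: it downloads RPM packages from the configured servers and installs them, resolves dependencies automatically, and installs all required packages in one pass, so there is no need to download and install them one by one. yum provides commands to search for, install, and remove a single package, a group of packages, or all packages, and the commands are short and easy to remember.

The general form of a yum command is: yum [options] [command] [package...]

[options] is optional and includes -h (help), -y (answer "yes" to every prompt during installation), -q (do not show the installation progress), and so on. [command] is the operation to perform, and [package ...] is what it operates on.

- Some commonly used commands:

Plugin that automatically picks the fastest mirror:   yum install yum-fastestmirror
Graphical front end for yum:                          yum install yumex
List the package groups available for installation:   yum grouplist

- Installing

yum install package1        install the specified package package1
yum groupinstall group1     install the package group group1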

Step 6: Install Maven

(1)  Maven version: download apache-maven-3.0.5-bin.tar.gz

Note: do not use the newer Maven 3.1.1. The Hadoop 2.2.0 source is not compatible with Maven 3.1.x, and the build fails with:

java.lang.NoClassDefFoundError: org/sonatype/aether/graph/DependencyFilter

Maven 3.0.5 is recommended.

(2)  Download

URL: http://maven.apache.org/download.cgi

Choose apache-maven-3.0.5-bin.tar.gz.

(3)  Upload it to Linux and extract it into the installation directory

tar -zxvf apache-maven-3.0.5-bin.tar.gz -C /workDir

(4)  Set the environment variables

vi /etc/profile

Add:

export MAVEN_HOME=/workDir/apache-maven-3.0.5
export PATH=$PATH:$MAVEN_HOME/bin

Apply the change: source /etc/profile   or   . /etc/profile

Verify:

mvn -v
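Since the build depends on staying in the Maven 3.0.x series, the output of mvn -v can be checked before starting. This is a minimal sketch; maven_version_of and maven_is_3_0_x are hypothetical helper names, and the sketch assumes the first line of the output has the form "Apache Maven X.Y.Z ...".

```shell
# Extract the version number from `mvn -v` output.
# Assumption: the banner line starts with "Apache Maven <version>".
maven_version_of() {
    printf '%s\n' "$1" | sed -n 's/^Apache Maven \([0-9][0-9.]*\).*/\1/p' | head -n 1
}

# Succeed only for the 3.0.x series recommended in this step.
maven_is_3_0_x() {
    case $1 in
        3.0.*) return 0 ;;
        *)     return 1 ;;
    esac
}
```

A call such as maven_is_3_0_x "$(maven_version_of "$(mvn -v)")" || echo "unexpected Maven version" would flag a 3.1.x installation before the long build begins.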

Step 7: Configure a Maven mirror in China

(1)  Edit the settings.xml file

Go to the installation directory /workDir/apache-maven-3.0.5/conf

* Modify the contents of <mirrors>:

     <mirror>
         <id>nexus-osc</id>
         <mirrorOf>*</mirrorOf>
         <name>Nexus osc</name>
         <url>http://maven.oschina.net/content/groups/public/</url>
     </mirror>

* Modify the contents of <profiles>:

<profile>
     <id>jdk-1.6</id>
     <activation>
         <jdk>1.6</jdk>
     </activation>
     <repositories>
         <repository>
              <id>nexus</id>
              <name>local private nexus</name>
              <url>http://maven.oschina.net/content/groups/public/</url>
              <releases>
                   <enabled>true</enabled>
              </releases>
              <snapshots>
                   <enabled>false</enabled>
              </snapshots>
         </repository>
     </repositories>
     <pluginRepositories>
         <pluginRepository>
              <id>nexus</id>
              <name>local private nexus</name>
              <url>http://maven.oschina.net/content/groups/public/</url>
              <releases>
                   <enabled>true</enabled>
              </releases>
              <snapshots>
                   <enabled>false</enabled>
              </snapshots>
         </pluginRepository>
     </pluginRepositories>
</profile>

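(2)  Copy the configuration

Note: copy settings.xml into the user's home directory so that Maven picks up this configuration every time that user runs it.

cd /home/hadoop    # check whether the user directory /home/hadoop contains a .m2 folder; create it if it does not
mkdir .m2
cp /workDir/apache-maven-3.0.5/conf/settings.xml ~/.m2/    # copy the file

(3)  Configure DNS

vi /etc/resolv.conf

Change it as follows: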

nameserver 8.8.8.8

nameserver 8.8.4.4
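The copy in sub-step (2) fails if .m2 already exists (mkdir) or does not exist yet (cp), depending on the machine. A small sketch that handles both cases is shown below; install_settings is a hypothetical helper name, and both paths are supplied by the caller.

```shell
# Copy a Maven settings file into <home>/.m2/settings.xml,
# creating the .m2 directory when it does not exist yet.
# install_settings is a hypothetical helper name.
install_settings() {
    settings_src=$1
    home_dir=$2
    mkdir -p "$home_dir/.m2" && cp "$settings_src" "$home_dir/.m2/settings.xml"
}
```

For the layout used in this article the call would be install_settings /workDir/apache-maven-3.0.5/conf/settings.xml /home/hadoop.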

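Step 8: Install protobuf

(1)  Download protobuf-2.5.0.tar.gz

       https://protobuf.googlecode.com/files/protobuf-2.5.0.tar.gz

(2)  Extract it into the installation directory

cd /software
tar -zxvf protobuf-2.5.0.tar.gz -C /workDir

(3)  Install the following 3 dependency packages (skip this if they are already installed)

yum install gcc -y
yum install gcc-c++ -y
yum install make -y

Note: without these 3 packages protobuf cannot be built, and the Hadoop build later fails with the following error because protoc is not available: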

[ERROR] Failed to execute goal org.apache.hadoop:hadoop-maven-plugins:2.2.0:protoc (compile-protoc) on project hadoop-common: org.apache.maven.plugin.MojoExecutionException: 'protoc --version' did not return a version -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
[ERROR]
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR]   mvn <goals> -rf :hadoop-common

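(4)  Configure the build

Go into the installation directory and run the configure script:

cd /workDir/protobuf-2.5.0      # go into the installation directory
./configure                     # run the configure script

(5)  Build and install

make && make check && make install

Note: the steps are chained with && so that each one runs only after the previous one has succeeded. Building protobuf requires the gcc and gcc-c++ system packages (no need to install them again if they were installed earlier).

(6)  Configure environment variables

vi /etc/profile

Add:

export PROTOBUF_HOME=/workDir/protobuf-2.5.0
export PATH=$PATH:$PROTOBUF_HOME/bin

Note: with the default ./configure prefix, make install places protoc under /usr/local/bin, so the command is normally found through that directory.

Apply the configuration:

source /etc/profile   or   . /etc/profile

Verify:

protoc --version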
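BUILDING.txt states that the protoc version must match the protobuf JAR, which for Hadoop 2.2.0 means 2.5.0. The check can be sketched as follows; protoc_version_is is a hypothetical helper name, and the sketch assumes protoc --version prints a single line of the form "libprotoc X.Y.Z".

```shell
# Succeed when the given `protoc --version` output reports the expected version.
# Assumption: the output is exactly "libprotoc <version>".
protoc_version_is() {
    expected=$1
    actual=$2
    [ "$actual" = "libprotoc $expected" ]
}

# Example call (not executed here):
# protoc_version_is 2.5.0 "$(protoc --version)" || echo "wrong protoc version"
```

If several protoc versions are installed, BUILDING.txt notes that HADOOP_PROTOC_PATH can point the Hadoop build at the right one.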

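Step 9: Install findbugs-3.0.0

(1)  Download findbugs-3.0.0.tar.gz

http://sourceforge.jp/projects/sfnet_findbugs/releases/

(2)  Extract it into the installation directory

cd /software
tar -zxvf findbugs-3.0.0.tar.gz -C /workDir

(3)  Set the environment variables

vi /etc/profile

Add the following:

export FINDBUGS_HOME=/workDir/findbugs-3.0.0
export PATH=$PATH:$FINDBUGS_HOME/bin

(4)  Apply the environment variables

source /etc/profile   or   . /etc/profile

(5)  Verify

findbugs -version

Important note

If findbugs fails to start at this point, the cause is an incompatible JDK version. findbugs-2.5.0 and findbugs 3.0.0 are compiled for JDK 7 or later, so JDK 7 has to be installed on Linux to run them. BUILDING.txt only lists Findbugs 1.3.9, and only for running findbugs, so this tool is not needed for the build command used in Step 10.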
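Steps 4, 6, 8 and 9 all append export lines to /etc/profile, and repeating a step adds the same line again. A minimal sketch of an idempotent append is shown below; append_once is a hypothetical helper name, and the target file is passed in by the caller.

```shell
# Append a line to a file only if that exact line is not already present.
# append_once is a hypothetical helper name.
append_once() {
    target_file=$1
    new_line=$2
    grep -qxF -- "$new_line" "$target_file" 2>/dev/null ||
        printf '%s\n' "$new_line" >> "$target_file"
}

# Example calls (not executed here; they need write access to /etc/profile):
# append_once /etc/profile 'export FINDBUGS_HOME=/workDir/findbugs-3.0.0'
# append_once /etc/profile 'export PATH=$PATH:$FINDBUGS_HOME/bin'
```

The single quotes keep $PATH and $FINDBUGS_HOME unexpanded, so the literal text is written to the profile and evaluated at login.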


Step 10: Compile the hadoop-2.2.0-src source

(1)  Download hadoop-2.2.0-src.tar.gz

http://mirror.bit.edu.cn/apache/hadoop/common/hadoop-2.2.0/hadoop-2.2.0-src.tar.gz

(2)  Extract it into the installation directory

cd /software
tar -zxvf hadoop-2.2.0-src.tar.gz -C /workDir

(3)  Patch the source package

- Important: the hadoop-2.2.0 source contains a bug that is documented in the official Apache JIRA:

JIRA: https://issues.apache.org/jira/browse/HADOOP-10110

- The fix

Index: hadoop-common-project/hadoop-auth/pom.xml

===================================================================

--- hadoop-common-project/hadoop-auth/pom.xml  (revision 1543124)

+++ hadoop-common-project/hadoop-auth/pom.xml  (working copy)

@@ -54,6 +54,11 @@

    </dependency>

    <dependency>

      <groupId>org.mortbay.jetty</groupId>

+     <artifactId>jetty-util</artifactId>

+     <scope>test</scope>

+   </dependency>

+   <dependency>

+     <groupId>org.mortbay.jetty</groupId>

      <artifactId>jetty</artifactId>

      <scope>test</scope>

    </dependency>

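As the official patch above shows, the pom.xml file in $HADOOP_SRC_HOME/hadoop-common-project/hadoop-auth has to be edited: add the following dependency immediately before the existing org.mortbay.jetty jetty test dependency (around line 55):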

 <dependency>

     <groupId>org.mortbay.jetty</groupId>

     <artifactId>jetty-util</artifactId>

     <scope>test</scope>

</dependency>

Otherwise the build fails with the following error:

[ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:2.5.1:testCompile (default-testCompile) on project hadoop-auth: Compilation failure: Compilation failure:
[ERROR] /home/chuan/trunk/hadoop-common-project/hadoop-auth/src/test/java/org/apache/hadoop/security/authentication/client/AuthenticatorTestCase.java:[84,13] cannot access org.mortbay.component.AbstractLifeCycle
[ERROR] class file for org.mortbay.component.AbstractLifeCycle not found

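(4)  Compile

Official build instruction:

Create source and binary distributions with native code and documentation: (source + binaries + native libraries + documentation)

  $ mvn package -Pdist,native,docs,src -DskipTests -Dtar

This article builds the binary distribution with native code only:

cd /workDir/hadoop-2.2.0-src
mvn package -DskipTests -Pdist,native -Dtar

Note: if the build runs out of memory, increase the heap available to Maven:

export MAVEN_OPTS="-Xms256m -Xmx512m"

This step takes a long time because the dependencies have to be downloaded from the Internet.

When the following output appears, the build has succeeded: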

[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 11:53.144s
[INFO] Finished at: Fri Nov 22 16:58:32 CST 2013
[INFO] Final Memory: 70M/239M
[INFO] ------------------------------------------------------------------------
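Two checks around the build in this step lend themselves to scripting: confirming that the HADOOP-10110 patch is in place before starting, and confirming the success marker in a saved log afterwards. The following is a rough sketch; has_jetty_util and build_succeeded are hypothetical helper names, and the log file path in the example is an assumption.

```shell
# Succeed when the given pom.xml already declares the jetty-util artifact,
# i.e. the HADOOP-10110 fix has been applied.
has_jetty_util() {
    grep -q '<artifactId>jetty-util</artifactId>' "$1"
}

# Succeed when a saved Maven log contains the BUILD SUCCESS marker.
build_succeeded() {
    grep -q 'BUILD SUCCESS' "$1"
}

# Example flow (not executed here; /tmp/hadoop-build.log is an assumed path):
# has_jetty_util hadoop-common-project/hadoop-auth/pom.xml || echo "patch missing"
# mvn package -DskipTests -Pdist,native -Dtar 2>&1 | tee /tmp/hadoop-build.log
# build_succeeded /tmp/hadoop-build.log && echo "build ok"
```

Saving the log with tee also keeps the full output available if the build stops partway and has to be resumed.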

Step 11: After the build

1.   Check the build output

The build output is located in hadoop-2.2.0-src/hadoop-dist/target/hadoop-2.2.0

cd /workDir/hadoop-2.2.0-src/hadoop-dist/target/hadoop-2.2.0
ll   # list the built directory

Contents of the hadoop-2.2.0 directory after the build:

drwxr-xr-x. 2 root root 4096 Aug 11 12:00 bin

drwxr-xr-x. 3 root root 4096 Aug 11 12:00 etc

drwxr-xr-x. 2 root root 4096 Aug 11 12:00 include

drwxr-xr-x. 3 root root 4096 Aug 11 12:00 lib

drwxr-xr-x. 2 root root 4096 Aug 11 12:00 libexec

drwxr-xr-x. 2 root root 4096 Aug 11 12:00 sbin

drwxr-xr-x. 4 root root 4096 Aug 11 12:00 share

Go into the bin directory and run the hadoop script to check the version:

cd bin
./hadoop version

The version information is displayed:

[root@localhost bin]# ./hadoop version

Hadoop 2.2.0

Subversion Unknown -r Unknown

Compiled by root on 2014-08-11T18:34Z

Compiled with protoc 2.5.0

From source with checksum 79e53ce7994d1628b240f09af91e1af4

This command was run using /workDir/hadoop-2.2.0-src/hadoop-dist/target/hadoop-2.2.0/share/hadoop/common/hadoop-common-2.2.0.jar

2.   Check the architecture of the native libraries

cd /workDir/hadoop-2.2.0-src/hadoop-dist/target/hadoop-2.2.0
file lib//native/*

The shared libraries are now 64-bit (see the "ELF 64-bit" entries):

[root@localhost hadoop-2.2.0]# file lib//native/*

lib//native/libhadoop.a:        current ar archive

lib//native/libhadooppipes.a:   current ar archive

lib//native/libhadoop.so:       symbolic link to `libhadoop.so.1.0.0'

lib//native/libhadoop.so.1.0.0: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, not stripped

lib//native/libhadooputils.a:   current ar archive

lib//native/libhdfs.a:          current ar archive

lib//native/libhdfs.so:         symbolic link to `libhdfs.so.0.0.0'

lib//native/libhdfs.so.0.0.0:   ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, not stripped
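Reading the listing by eye works for a handful of files, but the same check can be automated. The sketch below filters file output and prints any ELF object that is not 64-bit; non_64bit_objects is a hypothetical helper name, and it assumes the "name: description" line format shown above.

```shell
# Read `file` output on stdin and print the names of ELF objects
# that are not 64-bit. Prints nothing when all of them are 64-bit.
# Archives and symbolic links carry no "ELF" tag and are skipped.
non_64bit_objects() {
    awk -F: '/ELF/ && !/ELF 64-bit/ { print $1 }'
}

# Example call (not executed here):
# file lib/native/* | non_64bit_objects
```

An empty result for the freshly built lib/native directory indicates that the 32-bit problem described in section 2.2 no longer applies.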

The build is now complete.

Video link for "Introduction to Hadoop 2.x Application Development": http://blog.csdn.net/cloudyhadoop/article/details/42341241
