Compiling and installing Hadoop 2.4 on 64-bit Oracle Linux 6

If you are planning to run Hadoop on a 64-bit OS, you might want to build it from source. The native Hadoop library (libhadoop.so.1.0.0) shipped with the Hadoop 2.4 distribution is compiled for a 32-bit platform, which results in a myriad of annoying warnings like the one below. Recompiling libhadoop.so.1.0.0 on your 64-bit platform eliminates them.

WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
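
You can check the architecture of the bundled library yourself with the file utility. The path below assumes the stock Hadoop 2.4 tarball has been unpacked under /usr/local/hadoop; adjust it to wherever your copy lives.

[root@hadoop ~]# file /usr/local/hadoop/lib/native/libhadoop.so.1.0.0

On the stock distribution this should report a 32-bit ELF shared object; after the rebuild described in this tutorial, the same check will report 64-bit.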

In this tutorial we will see how to prepare a clean Oracle Linux 6 system for Hadoop and how to build Hadoop from its source files.

Prerequisites
The only prerequisite for following this tutorial is a default installation of 64-bit Oracle Linux Server 6.5. The host I am using is named hadoop.

If this is a dev/test system you might as well disable SELinux and iptables to make your life easier.

[root@hadoop ~]# service iptables stop
iptables: Setting chains to policy ACCEPT: filter          [  OK  ]
iptables: Flushing firewall rules:                         [  OK  ]
iptables: Unloading modules:                               [  OK  ]
[root@hadoop ~]# chkconfig iptables off
[root@hadoop ~]#

To disable SELinux, set the SELINUX parameter in /etc/selinux/config from enforcing to disabled.

[root@hadoop ~]# sed -i 's/SELINUX=enforcing/SELINUX=disabled/' /etc/selinux/config
[root@hadoop ~]#

You must reboot the system for the changes to take effect.
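
After the reboot, getenforce should confirm that SELinux is no longer active.

[root@hadoop ~]# getenforce
Disabled
[root@hadoop ~]#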

JDK Installation
You can download the latest JDK from Oracle Technology Network or use wget to download it directly if you know the exact URL for the version you’re downloading.

[root@hadoop ~]# wget --no-check-certificate --no-cookies --header "Cookie: oraclelicense=accept-securebackup-cookie" http://download.oracle.com/otn-pub/java/jdk/8u5-b13/jdk-8u5-linux-x64.tar.gz
--2014-07-12 12:40:23--  http://download.oracle.com/otn-pub/java/jdk/8u5-b13/jdk-8u5-linux-x64.tar.gz
Resolving download.oracle.com... 176.255.203.9, 176.255.203.10
...

100%[=====================================================================================================================================================================>] 159,008,252 1.80M/s   in 93s

2014-07-12 12:41:57 (1.62 MB/s) - “jdk-8u5-linux-x64.tar.gz” saved [159008252/159008252]

[root@hadoop ~]#

Now extract the archive in /opt.

[root@hadoop ~]# tar -xzf jdk-8u5-linux-x64.tar.gz -C /opt/
[root@hadoop ~]#

Use alternatives to set the Java symbolic links to your newly installed JDK.

[root@hadoop ~]# alternatives --install /usr/bin/java java /opt/jdk1.8.0_05/bin/java 2
[root@hadoop ~]# alternatives --config java

There are 3 programs which provide 'java'.

  Selection    Command
-----------------------------------------------
*+ 1           /usr/lib/jvm/jre-1.7.0-openjdk.x86_64/bin/java
   2           /usr/lib/jvm/jre-1.6.0-openjdk.x86_64/bin/java
   3           /opt/jdk1.8.0_05/bin/java

Enter to keep the current selection[+], or type selection number: 3
[root@hadoop ~]#

Let’s confirm that java points to the correct JDK version.

[root@hadoop ~]# java -version
java version "1.8.0_05"
Java(TM) SE Runtime Environment (build 1.8.0_05-b13)
Java HotSpot(TM) 64-Bit Server VM (build 25.5-b02, mixed mode)
[root@hadoop ~]#

Create a dedicated Hadoop user account
Our next step is to create a dedicated user account that owns and runs the Hadoop software. I am going to name my user haduser and add it to a supplementary group called hadgroup.

[root@hadoop ~]# groupadd hadgroup
[root@hadoop ~]# useradd haduser -G hadgroup
[root@hadoop ~]# passwd haduser
Changing password for user haduser.
New password:
Retype new password:
passwd: all authentication tokens updated successfully.
[root@hadoop ~]#

Let’s switch to the newly created user.

[root@hadoop ~]# su - haduser
[haduser@hadoop ~]$

Set up key-based authentication for haduser
Hadoop requires secure shell connections to the localhost without a passphrase, so let’s configure key-based SSH access.

[haduser@hadoop ~]$ ssh-keygen -t rsa -P ""
Generating public/private rsa key pair.
Enter file in which to save the key (/home/haduser/.ssh/id_rsa):
Created directory '/home/haduser/.ssh'.
Your identification has been saved in /home/haduser/.ssh/id_rsa.
Your public key has been saved in /home/haduser/.ssh/id_rsa.pub.
The key fingerprint is:
8c:79:27:a5:81:00:3c:00:21:b6:3c:e7:72:bc:2c:65 haduser@hadoop
The key's randomart image is:
+--[ RSA 2048]----+
|*=...            |
|+ +  . .         |
| + o  . . .      |
|  =    + +       |
| . E  o S .      |
|  * .  . o       |
| . o             |
|  .              |
|                 |
+-----------------+
[haduser@hadoop ~]$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
[haduser@hadoop ~]$ chmod 0600 ~/.ssh/authorized_keys

We can do a quick test by invoking date via ssh and adding localhost to the list of known hosts if necessary.

[haduser@hadoop ~]$ ssh localhost date
The authenticity of host 'localhost (::1)' can't be established.
RSA key fingerprint is 8a:c5:52:a2:cf:c9:55:c1:57:15:5c:37:25:16:41:d6.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'localhost' (RSA) to the list of known hosts.
Sat Jul 12 16:07:28 BST 2014
[haduser@hadoop ~]$

Install build tools
There is a set of tools and libraries we have to install before we can compile Hadoop from source. Let’s start by adding the default development toolset (gcc, autoconf, etc.). We’ll have to do this as root.

[root@hadoop ~]# yum groupinstall "Development Tools" "Development Libraries"
Loaded plugins: refresh-packagekit, security
Setting up Group Process
Package gcc-4.4.7-4.el6.x86_64 already installed and latest version
Package 1:make-3.81-20.el6.x86_64 already installed and latest version
Package patch-2.6-6.el6.x86_64 already installed and latest version
Package 1:pkgconfig-0.23-9.1.el6.x86_64 already installed and latest version
Package gettext-0.17-16.el6.x86_64 already installed and latest version
Package binutils-2.20.51.0.2-5.36.el6.x86_64 already installed and latest version
Package elfutils-0.152-1.el6.x86_64 already installed and latest version
Package cvs-1.11.23-16.el6.x86_64 already installed and latest version
Warning: Group Development Libraries does not exist.
Resolving Dependencies
--> Running transaction check
---> Package autoconf.noarch 0:2.63-5.1.el6 will be installed
---> Package automake.noarch 0:1.11.1-4.el6 will be installed
---> Package bison.x86_64 0:2.4.1-5.el6 will be installed
…
Transaction Summary
===============================================================================================================================================================================================================
Install      32 Package(s)

Total download size: 57 M
Installed size: 186 M
Is this ok [y/N]:y
Downloading Packages:
(1/32): autoconf-2.63-5.1.el6.noarch.rpm                                                                                                                                                | 781 kB     00:00
(2/32): automake-1.11.1-4.el6.noarch.rpm                                                                                                                                                | 550 kB     00:00
(3/32): bison-2.4.1-5.el6.x86_64.rpm                                                                                                                                                    | 636 kB     00:00
(4/32): byacc-1.9.20070509-7.el6.x86_64.rpm                                                                                                                                             |  47 kB     00:00
…
Dependency Installed:
  gettext-devel.x86_64 0:0.17-16.el6     gettext-libs.x86_64 0:0.17-16.el6   kernel-devel.x86_64 0:2.6.32-431.20.3.el6   libgcj.x86_64 0:4.4.7-4.el6                 libgfortran.x86_64 0:4.4.7-4.el6
  libstdc++-devel.x86_64 0:4.4.7-4.el6   perl-Error.noarch 1:0.17015-4.el6   perl-Git.noarch 0:1.7.1-3.el6_4.1           systemtap-client.x86_64 0:2.3-4.0.1.el6_5   systemtap-devel.x86_64 0:2.3-4.0.1.el6_5

Complete!
[root@hadoop ~]#

Another two packages required for successfully compiling Hadoop are openssl-devel and cmake.

[root@hadoop ~]# yum install openssl-devel cmake
Loaded plugins: refresh-packagekit, security
Setting up Install Process
Resolving Dependencies
--> Running transaction check
---> Package cmake.x86_64 0:2.6.4-5.el6 will be installed
---> Package openssl-devel.x86_64 0:1.0.1e-16.el6_5.14 will be installed
…
Transaction Summary
=============================================================================================================
Install       8 Package(s)

Total download size: 7.1 M
Installed size: 22 M
Is this ok [y/N]:y
…
Dependency Installed:
  keyutils-libs-devel.x86_64 0:1.4-4.el6               krb5-devel.x86_64 0:1.10.3-15.el6_5.1
  libcom_err-devel.x86_64 0:1.42.8-1.0.1.el6           libselinux-devel.x86_64 0:2.0.94-5.3.el6_4.1
  libsepol-devel.x86_64 0:2.0.41-4.el6                 zlib-devel.x86_64 0:1.2.3-29.el6

Complete!
[root@hadoop ~]#

We will also need Apache Maven (a build automation tool) and Protocol Buffers (a serialization library developed by Google). Let’s start by downloading and uncompressing the latest version of Maven (3.2.2 at the time of writing).

[root@hadoop ~]# wget http://mirrors.gigenet.com/apache/maven/maven-3/3.2.2/binaries/apache-maven-3.2.2-bin.tar.gz
--2014-07-12 16:41:02--  http://mirrors.gigenet.com/apache/maven/maven-3/3.2.2/binaries/apache-maven-3.2.2-bin.tar.gz
Resolving mirrors.gigenet.com... 69.65.15.34
Connecting to mirrors.gigenet.com|69.65.15.34|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 6940967 (6.6M) [application/x-gzip]
Saving to: “apache-maven-3.2.2-bin.tar.gz”

100%[==============================================================================>] 6,940,967    677K/s   in 12s

2014-07-12 16:41:15 (569 KB/s) - “apache-maven-3.2.2-bin.tar.gz” saved [6940967/6940967]

[root@hadoop ~]# tar -zxf apache-maven-3.2.2-bin.tar.gz -C /opt/
[root@hadoop ~]#

Our next step is to create a dedicated initialization script that will set the following environment variables for Maven.

JAVA_HOME=/opt/jdk1.8.0_05
M3_HOME=/opt/apache-maven-3.2.2
PATH=/opt/apache-maven-3.2.2/bin:$PATH

We will create a new file (maven.sh) in /etc/profile.d and put the content above inside:

[root@hadoop ~]# cat > /etc/profile.d/maven.sh << "EOF"
> export JAVA_HOME=/opt/jdk1.8.0_05
> export M3_HOME=/opt/apache-maven-3.2.2
> export PATH=/opt/apache-maven-3.2.2/bin:$PATH
> EOF
[root@hadoop ~]#

Log out and back in, then verify that M3_HOME is correctly set.

[root@hadoop ~]# echo $M3_HOME
/opt/apache-maven-3.2.2
[root@hadoop ~]#

Confirm that you can successfully start Maven and that it is using the correct Java version.

[root@hadoop ~]# mvn -version
Apache Maven 3.2.2 (45f7c06d68e745d05611f7fd14efb6594181933e; 2014-06-17T14:51:42+01:00)
Maven home: /opt/apache-maven-3.2.2
Java version: 1.8.0_05, vendor: Oracle Corporation
Java home: /opt/jdk1.8.0_05/jre
Default locale: en_US, platform encoding: UTF-8
OS name: "linux", version: "3.8.13-35.1.2.el6uek.x86_64", arch: "amd64", family: "unix"
[root@hadoop ~]#

Time to deal with Protocol Buffers. First, let’s download the protobuf sources. Note that I do this using the haduser account.

[haduser@hadoop ~]$ wget https://protobuf.googlecode.com/files/protobuf-2.5.0.tar.bz2
--2014-07-13 13:45:59--  https://protobuf.googlecode.com/files/protobuf-2.5.0.tar.bz2
Resolving protobuf.googlecode.com... 173.194.78.82, 2a00:1450:400c:c00::52
Connecting to protobuf.googlecode.com|173.194.78.82|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1866763 (1.8M) [application/x-bzip2]
Saving to: “protobuf-2.5.0.tar.bz2”

100%[=====================================================================================================================================================================>] 1,866,763   1.37M/s   in 1.3s

2014-07-13 13:46:00 (1.37 MB/s) - “protobuf-2.5.0.tar.bz2” saved [1866763/1866763]

[haduser@hadoop ~]$

Untar the file and run the configure script to prepare the source code for compilation.

[haduser@hadoop ~]$ tar jxf protobuf-2.5.0.tar.bz2
[haduser@hadoop ~]$ cd protobuf-2.5.0
[haduser@hadoop protobuf-2.5.0]$ ./configure --prefix=/home/haduser/protobuf-2.5.0/inst/bin
checking whether to enable maintainer-specific portions of Makefiles... yes
checking build system type... x86_64-unknown-linux-gnu
checking host system type... x86_64-unknown-linux-gnu
checking target system type... x86_64-unknown-linux-gnu
checking for a BSD-compatible install... /usr/bin/install -c
checking whether build environment is sane... yes
...
config.status: creating Makefile
config.status: creating scripts/gtest-config
config.status: creating build-aux/config.h
config.status: build-aux/config.h is unchanged
config.status: executing depfiles commands
config.status: executing libtool commands
[haduser@hadoop protobuf-2.5.0]$

Let’s build the Protocol Buffer objects by running make.

[haduser@hadoop protobuf-2.5.0]$ make
make  all-recursive
make[1]: Entering directory `/home/haduser/protobuf-2.5.0'
Making all in .
make[2]: Entering directory `/home/haduser/protobuf-2.5.0'
make[2]: Leaving directory `/home/haduser/protobuf-2.5.0'
Making all in src
…
make[3]: Leaving directory `/home/haduser/protobuf-2.5.0/src'
make[2]: Leaving directory `/home/haduser/protobuf-2.5.0/src'
make[1]: Leaving directory `/home/haduser/protobuf-2.5.0'
[haduser@hadoop protobuf-2.5.0]$

Invoke make install to put the objects we’ve just built into their proper locations.

[haduser@hadoop protobuf-2.5.0]$ make install
Making install in .
make[1]: Entering directory `/home/haduser/protobuf-2.5.0'
make[2]: Entering directory `/home/haduser/protobuf-2.5.0'
...
----------------------------------------------------------------------
Libraries have been installed in:
   /home/haduser/protobuf-2.5.0/inst/bin/lib

If you ever happen to want to link against installed libraries
in a given directory, LIBDIR, you must either use libtool, and
specify the full pathname of the library, or use the `-LLIBDIR'
flag during linking and do at least one of the following:
   - add LIBDIR to the `LD_LIBRARY_PATH' environment variable
     during execution
   - add LIBDIR to the `LD_RUN_PATH' environment variable
     during linking
   - use the `-Wl,-rpath -Wl,LIBDIR' linker flag
   - have your system administrator add LIBDIR to `/etc/ld.so.conf'

See any operating system documentation about shared libraries for
more information, such as the ld(1) and ld.so(8) manual pages.
----------------------------------------------------------------------
...
make[3]: Leaving directory `/home/haduser/protobuf-2.5.0/src'
make[2]: Leaving directory `/home/haduser/protobuf-2.5.0/src'
make[1]: Leaving directory `/home/haduser/protobuf-2.5.0/src'
[haduser@hadoop protobuf-2.5.0]$

This concludes all preparations and we are now ready to crack on with compiling Hadoop.

Compiling Hadoop
We will perform the compilation as the Hadoop owner (haduser). We also need Protocol Buffers added to the current PATH.

[haduser@hadoop protobuf-2.5.0]$ export PATH=/home/haduser/protobuf-2.5.0/inst/bin/bin:$PATH
[haduser@hadoop protobuf-2.5.0]$
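
As a quick sanity check, confirm that the protoc binary picked up from the PATH is the 2.5.0 build we just installed.

[haduser@hadoop protobuf-2.5.0]$ protoc --version
libprotoc 2.5.0
[haduser@hadoop protobuf-2.5.0]$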

Get Hadoop’s source code from Apache.

[haduser@hadoop ~]$ wget http://apache.fastbull.org/hadoop/common/hadoop-2.4.1/hadoop-2.4.1-src.tar.gz
--2014-07-12 16:36:47--  http://apache.fastbull.org/hadoop/common/hadoop-2.4.1/hadoop-2.4.1-src.tar.gz
Resolving apache.fastbull.org... 194.116.84.14
...
2014-07-12 16:37:01 (1.13 MB/s) - “hadoop-2.4.1-src.tar.gz” saved [15417097/15417097]

[haduser@hadoop ~]$

Extract the archive in haduser’s home directory.

[haduser@hadoop ~]$ tar xf hadoop-2.4.1-src.tar.gz
[haduser@hadoop ~]$ cd hadoop-2.4.1-src
[haduser@hadoop hadoop-2.4.1-src]$

Before we try to build Hadoop we’ll have to deal with doclint. doclint was added to Javadoc in JDK 8, and its aim is to ensure that Javadoc’s output is W3C HTML 4.01 compliant. This makes Javadoc handling stricter and will prevent Hadoop from compiling successfully, due to some missing HTML tags in the source. The easiest way to avoid this issue is to disable doclint completely.

Open the pom.xml file that sits in the hadoop-2.4.1-src folder. This XML file contains information about the project and configuration details used by Maven to build it. We will add one additional parameter to the global properties section that will disable doclint.

After the change your properties section should look like this:

<properties>
    <distMgmtSnapshotsId>apache.snapshots.https</distMgmtSnapshotsId>
    <distMgmtSnapshotsName>Apache Development Snapshot Repository</distMgmtSnapshotsName>
    <distMgmtSnapshotsUrl>https://repository.apache.org/content/repositories/snapshots</distMgmtSnapshotsUrl>
    <distMgmtStagingId>apache.staging.https</distMgmtStagingId>
    <distMgmtStagingName>Apache Release Distribution Repository</distMgmtStagingName>
    <distMgmtStagingUrl>https://repository.apache.org/service/local/staging/deploy/maven2</distMgmtStagingUrl>

    <!-- platform encoding override -->
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    <project.reporting.outputEncoding>UTF-8</project.reporting.outputEncoding>
    <additionalparam>-Xdoclint:none</additionalparam>
  </properties>
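
A quick grep confirms that the new property made it into the file; it should print at least the <additionalparam> line we just added.

[haduser@hadoop hadoop-2.4.1-src]$ grep additionalparam pom.xml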

We are now ready to build Hadoop. Invoke Maven with the appropriate build profile, sit back and wait for the build process to complete. Depending on your system this might take a while.

[haduser@hadoop hadoop-2.4.1-src]$ mvn package -Pdist,native -DskipTests -Dtar
[INFO] Scanning for projects...
Downloading: http://repo.maven.apache.org/maven2/org/apache/felix/maven-bundle-plugin/2.4.0/maven-bundle-plugin-2.4.0.pom
Downloaded: http://repo.maven.apache.org/maven2/org/apache/felix/maven-bundle-plugin/2.4.0/maven-bundle-plugin-2.4.0.pom (4 KB at 11.8 KB/sec)
...
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 09:38 min
[INFO] Finished at: 2014-07-13T14:11:43+01:00
[INFO] Final Memory: 194M/493M
[INFO] ------------------------------------------------------------------------
[haduser@hadoop hadoop-2.4.1-src]$
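
Before moving the distribution anywhere, it is worth confirming that the freshly built native library really is 64-bit this time. The library should sit under lib/native inside the generated distribution directory.

[haduser@hadoop hadoop-2.4.1-src]$ file hadoop-dist/target/hadoop-2.4.1/lib/native/libhadoop.so.1.0.0

If the build worked as intended, file will report a 64-bit ELF shared object, and the NativeCodeLoader warning from the beginning of this article will be gone.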

After the build process is complete, switch back to root and place the compiled code in its final location – I use /opt.

[root@hadoop ~]# mv /home/haduser/hadoop-2.4.1-src/hadoop-dist/target/hadoop-2.4.1 /opt/
[root@hadoop ~]#

Switch to haduser again and put the JAVA_HOME and HADOOP_INSTALL environment variables in the user’s profile.

[root@hadoop ~]# su - haduser
[haduser@hadoop ~]$ cat >> ~/.bash_profile << "EOF"
> export JAVA_HOME=/opt/jdk1.8.0_05
> export HADOOP_INSTALL=/opt/hadoop-2.4.1
> export PATH=$PATH:/opt/hadoop-2.4.1/sbin:/opt/hadoop-2.4.1/bin
> EOF
[haduser@hadoop ~]$ source .bash_profile
[haduser@hadoop ~]$
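
Before configuring anything, confirm that the shell now resolves the hadoop binary from the new PATH; the first line of the output should read Hadoop 2.4.1.

[haduser@hadoop ~]$ hadoop version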

Testing Hadoop
Our final task is to quickly configure and test the code we’ve just built.
Edit the $HADOOP_INSTALL/etc/hadoop/core-site.xml file and put the following lines between the <configuration></configuration> tags.

<property>
  <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
</property>
<property>
  <name>hadoop.tmp.dir</name>
    <value>/opt/hadoop-2.4.1/tmp</value>
</property>
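
Once the file is saved you can ask Hadoop itself which filesystem URI it will use. The command below should print hdfs://localhost:9000 (a deprecation warning about fs.default.name, which newer releases call fs.defaultFS, is harmless here).

[haduser@hadoop ~]$ hdfs getconf -confKey fs.default.name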

Create a new mapred-site.xml file based on the standard template, by copying the mapred-site.xml.template file.

[haduser@hadoop ~]$ cp /opt/hadoop-2.4.1/etc/hadoop/mapred-site.xml.template /opt/hadoop-2.4.1/etc/hadoop/mapred-site.xml
[haduser@hadoop ~]$

Edit the newly created $HADOOP_INSTALL/etc/hadoop/mapred-site.xml file and put the following between the <configuration></configuration> tags.

<property>
  <name>mapred.job.tracker</name>
  <value>localhost:9002</value>
</property>

Format the NameNode.

[haduser@hadoop ~]$ hdfs namenode -format
14/07/13 14:22:13 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = hadoop/192.168.56.101
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 2.4.1
…
14/07/13 14:22:14 INFO namenode.FSImage: Allocated new BlockPoolId: BP-716394400-192.168.56.101-1405257734268
14/07/13 14:22:14 INFO common.Storage: Storage directory /opt/hadoop-2.4.1/tmp/dfs/name has been successfully formatted.
14/07/13 14:22:14 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
14/07/13 14:22:14 INFO util.ExitUtil: Exiting with status 0
14/07/13 14:22:14 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at hadoop/192.168.56.101
************************************************************/
[haduser@hadoop ~]$

Now start the Hadoop DFS and YARN daemons.

[haduser@hadoop ~]$ start-dfs.sh
Starting namenodes on [localhost]
localhost: starting namenode, logging to /opt/hadoop-2.4.1/logs/hadoop-haduser-namenode-hadoop.out
localhost: starting datanode, logging to /opt/hadoop-2.4.1/logs/hadoop-haduser-datanode-hadoop.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /opt/hadoop-2.4.1/logs/hadoop-haduser-secondarynamenode-hadoop.out
[haduser@hadoop ~]$
[haduser@hadoop ~]$  start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to /opt/hadoop-2.4.1/logs/yarn-haduser-resourcemanager-hadoop.out
localhost: starting nodemanager, logging to /opt/hadoop-2.4.1/logs/yarn-haduser-nodemanager-hadoop.out
[haduser@hadoop ~]$
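
The jps utility that ships with the JDK gives a quick overview of the running Java processes. With both scripts started you should see a NameNode, DataNode and SecondaryNameNode (from start-dfs.sh) plus a ResourceManager and NodeManager (from start-yarn.sh); the PIDs will of course differ on your system.

[haduser@hadoop ~]$ jps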

Create a test directory and list the HDFS root directory to verify.

[haduser@hadoop ~]$ hadoop fs -mkdir hdfs://localhost:9000/test
[haduser@hadoop ~]$ hadoop fs -ls hdfs://localhost:9000/
Found 1 items
drwxr-xr-x   - haduser supergroup          0 2014-07-19 09:31 hdfs://localhost:9000/test
[haduser@hadoop ~]$

Open a web browser and point it at your host’s IP address on port 50070 to reach the HDFS health console.

HDFS health console
