[Installing a Hadoop 2.3 cluster] Installing Hadoop 2.3.0 multi-node cluster on Ubuntu 13.10

Installing Hadoop 2.3.0 multi-node cluster on Ubuntu 13.10

[Repost]: http://www.elcct.com/installing-hadoop-2-3-0-on-ubuntu-13-10/

Make sure you have Oracle JDK 7 installed.
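
If it isn't installed yet, one common route on Ubuntu 13.10 was the WebUpd8 PPA (just one possible approach, shown here as a sketch; any working Oracle JDK 7 setup is fine):

sudo add-apt-repository ppa:webupd8team/java  
sudo apt-get update  
sudo apt-get install oracle-java7-installer  
java -version  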

Next, let's create a hadoop group and an hduser user.

sudo addgroup hadoop  
sudo adduser --ingroup hadoop hduser  
sudo adduser hduser sudo  

Now we have to make sure hduser can SSH to its own account without a password.

sudo su - hduser  
ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa  
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys  

Now SSH to localhost from the hduser account to make sure it is working.

ssh localhost  

To download Hadoop, pick a mirror and find the Hadoop 2.3.0 source here: http://www.apache.org/dyn/closer.cgi/hadoop/core

In my case this is (and I assume you are in hduser's home directory):

wget http://apache.mirror.anlx.net/hadoop/core/hadoop-2.3.0/hadoop-2.3.0-src.tar.gz  
tar -xvf hadoop-2.3.0-src.tar.gz  

To build Hadoop 2.3.0 we need Maven, the build-essential package, zlib1g-dev, cmake, pkg-config, libssl-dev and Protocol Buffers, so let's install what we can from the repositories:

sudo apt-get install maven build-essential zlib1g-dev cmake pkg-config libssl-dev  

Because Ubuntu 13.10 doesn't include the required version of Protocol Buffers, we have to build it ourselves. Find the required 2.5.0 version here: https://code.google.com/p/protobuf/downloads/detail?name=protobuf-2.5.0.tar.gz&can=2&q=

wget https://protobuf.googlecode.com/files/protobuf-2.5.0.tar.gz  
tar -xvf protobuf-2.5.0.tar.gz  
cd protobuf-2.5.0/  
sudo ./configure  
sudo make  
sudo make check  
sudo make install  
sudo ldconfig  
protoc --version  

The last command should return libprotoc 2.5.0.

Once the above is installed, build Hadoop:

cd hadoop-2.3.0-src/  
mvn package -Pdist,native -DskipTests -Dtar  

This should build the Hadoop 2.3.0 distribution, and you should find it in hadoop-dist/target/hadoop-2.3.0.tar.gz
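
To double-check that the native libraries were built too, you can list the native directory of the unpacked distribution (path assumed from the standard Hadoop 2.x dist layout); you should see libhadoop and libhdfs libraries there:

ls hadoop-dist/target/hadoop-2.3.0/lib/native/  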

Note: You should now copy hadoop-2.3.0.tar.gz to the other nodes, so you don't have to build it on each of them, and follow the remaining steps there. Remember to also install Oracle JDK 7 on each node.
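
For example, copying the tarball to the other nodes could look like this (a sketch assuming the nodes listed in the cluster layout further down, with hduser already created on them):

scp hadoop-dist/target/hadoop-2.3.0.tar.gz hduser@10.0.1.2:~/  
scp hadoop-dist/target/hadoop-2.3.0.tar.gz hduser@10.0.1.3:~/  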

Let's unpack our distribution:

sudo tar -xvf hadoop-dist/target/hadoop-2.3.0.tar.gz -C /usr/local/  
sudo ln -s /usr/local/hadoop-2.3.0 /usr/local/hadoop  
sudo chown -R hduser:hadoop /usr/local/hadoop-2.3.0  

Now we need to update our .bashrc with paths for Hadoop.

In my case:

nano ~/.bashrc  

and paste the following at the end of the file:

export JAVA_HOME=$(readlink -f /usr/bin/java | sed "s:bin/java::")  
export HADOOP_INSTALL=/usr/local/hadoop  
export PATH=$PATH:$HADOOP_INSTALL/bin  
export PATH=$PATH:$HADOOP_INSTALL/sbin  
export HADOOP_MAPRED_HOME=$HADOOP_INSTALL  
export HADOOP_COMMON_HOME=$HADOOP_INSTALL  
export HADOOP_HDFS_HOME=$HADOOP_INSTALL  
export YARN_HOME=$HADOOP_INSTALL  
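
To pick up these variables in the current shell without re-logging in, you can also source the file:

source ~/.bashrc  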

Also update the JAVA_HOME variable in hadoop-env.sh:

nano /usr/local/hadoop/etc/hadoop/hadoop-env.sh  

to:

export JAVA_HOME=$(readlink -f /usr/bin/java | sed "s:bin/java::")  

Now you can re-login to your hduser account and check the Hadoop installation by issuing the following command:

hadoop version  

You should see something like:

hduser@hadoop01:~$ hadoop version  
Hadoop 2.3.0  
Subversion Unknown -r Unknown  
Compiled by hduser on 2014-03-29T14:11Z  
Compiled with protoc 2.5.0  
From source with checksum dfe46336fbc6a044bc124392ec06b85  
This command was run using /usr/local/hadoop-2.3.0/share/hadoop/common/hadoop-common-2.3.0.jar  

Then change into the Hadoop directory:

cd /usr/local/hadoop/  

I assume you have unpacked hadoop-2.3.0.tar.gz in other nodes as well, so in each node Hadoop resides in (or is linked to) /usr/local/hadoop directory.
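
For example, on each of those nodes the unpack, symlink and chown steps from above can be repeated (a sketch assuming the tarball was copied to hduser's home directory); remember to repeat the .bashrc and hadoop-env.sh changes there as well:

sudo tar -xvf ~/hadoop-2.3.0.tar.gz -C /usr/local/  
sudo ln -s /usr/local/hadoop-2.3.0 /usr/local/hadoop  
sudo chown -R hduser:hadoop /usr/local/hadoop-2.3.0  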

My nodes and their roles are as follows:

10.0.1.1 NameNode, DataNode  
10.0.1.2 DataNode  
10.0.1.3 DataNode  

Let's configure 10.0.1.1 first:

We should create and prepare directories for our data:

mkdir -p ~/hdfs/namenode  
mkdir -p ~/hdfs/datanode  
mkdir $HADOOP_INSTALL/logs  

Then update the hdfs-site.xml file to point to our directories:

nano $HADOOP_INSTALL/etc/hadoop/hdfs-site.xml  

And paste the following between the <configuration> tags:

    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:///home/hduser/hdfs/datanode</value>
        <description>DataNode directory</description>
    </property>

    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:///home/hduser/hdfs/namenode</value>
        <description>NameNode directory for namespace and transaction logs storage.</description>
    </property>

    <property>
        <name>dfs.replication</name>
        <value>2</value>
    </property>
    <property>
        <name>dfs.permissions</name>
        <value>false</value>
    </property>
    <property>
        <name>dfs.datanode.use.datanode.hostname</name>
        <value>false</value>
    </property>
    <property>
        <name>dfs.namenode.datanode.registration.ip-hostname-check</name>
        <value>false</value>
    </property>

Let the Hadoop modules know where the NameNode is located:

nano $HADOOP_INSTALL/etc/hadoop/core-site.xml  

And paste the following between the <configuration> tags:

    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://10.0.1.1/</value>
        <description>NameNode URI</description>
    </property>

Now we can format our NameNode by issuing:

hdfs namenode -format  

Now we have to make sure that our 10.0.1.1 master node can log in to the other nodes without a password:

ssh-copy-id -i /home/hduser/.ssh/id_rsa.pub hduser@10.0.1.2  
ssh-copy-id -i /home/hduser/.ssh/id_rsa.pub hduser@10.0.1.3  

And add your slaves (DataNodes) to the slaves file. In my case:

nano $HADOOP_INSTALL/etc/hadoop/slaves  

and put:

10.0.1.1  
10.0.1.2  
10.0.1.3  

Note: YARN will start with its default settings; you can tweak them later.
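
If the NodeManagers on the other nodes fail to register with the ResourceManager under the defaults, a minimal tweak (not part of the original walkthrough; the property is the standard Hadoop 2.x one) is to point yarn-site.xml at the master, on every node, in $HADOOP_INSTALL/etc/hadoop/yarn-site.xml:

    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>10.0.1.1</value>
        <description>ResourceManager host</description>
    </property>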

Let's configure our slave DataNodes, 10.0.1.2 and 10.0.1.3. The steps are the same for both of them:

Prepare and create directories:

mkdir -p ~/hdfs/datanode  
mkdir $HADOOP_INSTALL/logs  

Then update the hdfs-site.xml file to point to our directories:

nano $HADOOP_INSTALL/etc/hadoop/hdfs-site.xml  

And paste the following between the <configuration> tags:

    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:///home/hduser/hdfs/datanode</value>
        <description>DataNode directory</description>
    </property>

    <property>
        <name>dfs.replication</name>
        <value>2</value>
    </property>
    <property>
        <name>dfs.permissions</name>
        <value>false</value>
    </property>
    <property>
        <name>dfs.datanode.use.datanode.hostname</name>
        <value>false</value>
    </property>    

Let the Hadoop modules know where the NameNode is located here as well:

nano $HADOOP_INSTALL/etc/hadoop/core-site.xml  

And paste the following between the <configuration> tags:

    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://10.0.1.1/</value>
        <description>NameNode URI</description>
    </property>

Now go to 10.0.1.1 and summon the daemons!

start-dfs.sh  
start-yarn.sh  

You should see something like this:

hduser@hadoop01:~$ start-dfs.sh  
Starting namenodes on [10.0.1.1]  
10.0.1.1: starting namenode, logging to /usr/local/hadoop-2.3.0/logs/hadoop-hduser-namenode-hadoop01.out  
10.0.1.3: starting datanode, logging to /usr/local/hadoop-2.3.0/logs/hadoop-hduser-datanode-hadoop03.out  
10.0.1.1: starting datanode, logging to /usr/local/hadoop-2.3.0/logs/hadoop-hduser-datanode-hadoop01.out  
10.0.1.2: starting datanode, logging to /usr/local/hadoop-2.3.0/logs/hadoop-hduser-datanode-hadoop02.out  
Starting secondary namenodes [0.0.0.0]  
0.0.0.0: starting secondarynamenode, logging to /usr/local/hadoop-2.3.0/logs/hadoop-hduser-secondarynamenode-hadoop01.out  
hduser@hadoop01:~$ start-yarn.sh  
starting yarn daemons  
starting resourcemanager, logging to /usr/local/hadoop-2.3.0/logs/yarn-hduser-resourcemanager-hadoop01.out  
10.0.1.3: starting nodemanager, logging to /usr/local/hadoop-2.3.0/logs/yarn-hduser-nodemanager-hadoop03.out  
10.0.1.2: starting nodemanager, logging to /usr/local/hadoop-2.3.0/logs/yarn-hduser-nodemanager-hadoop02.out  
10.0.1.1: starting nodemanager, logging to /usr/local/hadoop-2.3.0/logs/yarn-hduser-nodemanager-hadoop01.out  

You can check that everything is OK by running:

jps  

and that should return something like this:

hduser@hadoop01:~$ jps  
21807 SecondaryNameNode  
21595 DataNode  
22139 NodeManager  
21983 ResourceManager  
22474 Jps  
21414 NameNode  

and:

hdfs dfsadmin -report  

hduser@hadoop01:~$ hdfs dfsadmin -report  
Configured Capacity: 126424645632 (117.74 GB)  
Present Capacity: 104873385984 (97.67 GB)  
DFS Remaining: 104873312256 (97.67 GB)  
DFS Used: 73728 (72 KB)  
DFS Used%: 0.00%  
Under replicated blocks: 0  
Blocks with corrupt replicas: 0  
Missing blocks: 0

-------------------------------------------------
Datanodes available: 3 (3 total, 0 dead)

Live datanodes:  
Name: 10.0.1.1:50010 (10.0.1.1)  
Hostname: localhost  
Decommission Status : Normal  
Configured Capacity: 42141548544 (39.25 GB)  
DFS Used: 24576 (24 KB)  
Non DFS Used: 7196426240 (6.70 GB)  
DFS Remaining: 34945097728 (32.55 GB)  
DFS Used%: 0.00%  
DFS Remaining%: 82.92%  
Configured Cache Capacity: 0 (0 B)  
Cache Used: 0 (0 B)  
Cache Remaining: 0 (0 B)  
Cache Used%: 100.00%  
Cache Remaining%: 0.00%  
Last contact: Sun Mar 30 15:24:44 EDT 2014


Name: 10.0.1.2:50010 (10.0.1.2)  
Hostname: localhost  
Decommission Status : Normal  
Configured Capacity: 42141548544 (39.25 GB)  
DFS Used: 24576 (24 KB)  
Non DFS Used: 7177342976 (6.68 GB)  
DFS Remaining: 34964180992 (32.56 GB)  
DFS Used%: 0.00%  
DFS Remaining%: 82.97%  
Configured Cache Capacity: 0 (0 B)  
Cache Used: 0 (0 B)  
Cache Remaining: 0 (0 B)  
Cache Used%: 100.00%  
Cache Remaining%: 0.00%  
Last contact: Sun Mar 30 15:24:45 EDT 2014


Name: 10.0.1.3:50010 (10.0.1.3)  
Hostname: localhost  
Decommission Status : Normal  
Configured Capacity: 42141548544 (39.25 GB)  
DFS Used: 24576 (24 KB)  
Non DFS Used: 7177490432 (6.68 GB)  
DFS Remaining: 34964033536 (32.56 GB)  
DFS Used%: 0.00%  
DFS Remaining%: 82.97%  
Configured Cache Capacity: 0 (0 B)  
Cache Used: 0 (0 B)  
Cache Remaining: 0 (0 B)  
Cache Used%: 100.00%  
Cache Remaining%: 0.00%  
Last contact: Sun Mar 30 15:24:44 EDT 2014  

You can also check the status of the NameNode and DataNodes in a web browser at http://ip:50070/. For example: http://10.10.12.171:50070/

Your Hadoop 2.3.0 cluster is installed :)
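
As a final smoke test you can run one of the bundled example jobs, for instance the pi estimator (jar path assumed from the standard 2.3.0 distribution layout):

hadoop jar $HADOOP_INSTALL/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.3.0.jar pi 2 10  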
