OK, let me write down the steps I used to deploy Hadoop last month. Since the document I wrote back then was in English, I won't translate it.
1. Preparation
7 nodes: 1 name node, 6 data nodes
Install OS (Ubuntu 12.04, 64-bit)
Install the OS on all nodes. Repoint the default /bin/sh symlink from dash to bash.
Install JDK 1.7
Download JDK-1.7.0-u21, extract to
/usr/local/share/jdk1.7.0_21
Download hadoop 1.0.4 tarball
Download hadoop-1.0.4.tar.gz to the name node machine.
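If the JDK came as the usual Oracle tarball (the exact file name below is an assumption; check your download), extracting and verifying it looks roughly like this:
# tar xzf jdk-7u21-linux-x64.tar.gz -C /usr/local/share
# /usr/local/share/jdk1.7.0_21/bin/java -version    # should print java version "1.7.0_21"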
2. Configure host names
1. Assign the host names according to the following list
(insert the following into /etc/hosts):
10.67.254.12  namenode
10.67.254.17  datanode-1
10.67.254.18  datanode-2
10.67.254.19  datanode-3
10.67.254.20  datanode-4
10.67.254.21  datanode-5
2. Edit /etc/hostname on each node to match its IP address
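A quick sanity check once both files are in place: every node should know its own name and resolve the others.
$ hostname              # prints this node's name, e.g. datanode-1
$ ping -c 1 namenode    # should resolve to 10.67.254.12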
3. Create User and Group
For all nodes, do the following:
1. Create a group hadoop
# groupadd -g 1001 hadoop
2. Create a user
# useradd -m -g hadoop hadoopor
# passwd hadoopor
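To verify both (the uid below is what -g 1001 plus default useradd settings would typically give; yours may differ):
# id hadoopor
uid=1001(hadoopor) gid=1001(hadoop) groups=1001(hadoop)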
4. Setup passphraseless ssh
Make sure the name node can ssh to every data node without typing a password.
ssh-copy-id makes copying the public key to each data node simple.
On the name node, as user hadoopor, run the following commands:
$ ssh-keygen -t rsa
$ ssh-copy-id datanode-1
$ ssh-copy-id datanode-2
$ ssh-copy-id datanode-3
$ ssh-copy-id datanode-4
$ ssh-copy-id datanode-5
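To confirm it worked, each data node should answer without a password prompt:
$ ssh datanode-1 hostname    # should print datanode-1 immediately, no password asked
(repeat for datanode-2 through datanode-5)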
5. Setup NFS server on the name node
Why NFS? With NFS we don't have to install the Hadoop executables and manage configuration files on every node separately.
Set up an NFS server on the name node, so that the data nodes can share the Hadoop executables and configuration.
On the name node
# apt-get install nfs-kernel-server
# vi /etc/exports
# add the following line:
/home/hadoopor 10.67.254.0/255.255.255.0(ro)
Force nfsd to re-read the /etc/exports file.
# exportfs -ra
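To confirm the export is active:
# exportfs -v    # should list /home/hadoopor for 10.67.254.0/255.255.255.0, read-only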
On the data nodes
1. Install the NFS client
# apt-get install nfs-common
# mkdir /mnt/hadoopor
# mount namenode:/home/hadoopor /mnt/hadoopor
2. Add an entry to /etc/fstab so the share is mounted automatically when a data node restarts.
# vi /etc/fstab
# device                   mountpoint      fs-type  options  dump  fsckorder
namenode:/home/hadoopor    /mnt/hadoopor   nfs      ro       0     0
3. Link to the Hadoop tree on the NFS share. As user hadoopor:
$ ln -sn /mnt/hadoopor/hadoop-1.0.4 /home/hadoopor/hadoop-1.0.4
4. Create a logs directory on each data node
$ mkdir /home/hadoopor/hadoop-logs
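After these four steps, a quick check on any data node:
$ ls /mnt/hadoopor/hadoop-1.0.4/bin    # the hadoop scripts should be visible
$ touch /mnt/hadoopor/test             # should fail, since the export is read-only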
6. System-wide profile
For all nodes (both the name node and the data nodes):
# vi /etc/profile
… …
JAVA_HOME=/usr/local/share/jdk1.7.0_21
HADOOP_HOME=/home/hadoopor/hadoop-1.0.4
PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin
export JAVA_HOME HADOOP_HOME PATH
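To check the profile takes effect (log in again, or source the file in the current shell):
$ . /etc/profile
$ java -version       # should report 1.7.0_21
$ hadoop version      # should report Hadoop 1.0.4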
7. Configure cluster
One node is the name node, the others are data nodes.
Replication factor: 2
$ vi conf/hadoop-env.sh
export JAVA_HOME=/usr/local/share/jdk1.7.0_21
… …
export HADOOP_LOG_DIR=/home/hadoopor/hadoop-logs
Configure Hadoop on the name node; through NFS the configuration is shared with all data nodes.
$ vi conf/core-site.xml
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/var/tmp</value>
    <description>A base for other temporary directories.</description>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://namenode:9000</value>
  </property>
</configuration>
$ vi conf/hdfs-site.xml
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
</configuration>
$ vi conf/mapred-site.xml
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>namenode:9001</value>
  </property>
</configuration>
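One thing these notes don't show: start-all.sh decides where to start daemons from conf/masters (the host for the secondary name node) and conf/slaves (the data nodes). If they aren't set yet, they would look something like this, using the host names from section 2:
$ vi conf/masters
namenode
$ vi conf/slaves
datanode-1
datanode-2
datanode-3
datanode-4
datanode-5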
8. Final
Format name node
$ bin/hadoop namenode -format
Start all
$ bin/start-all.sh
Test
$ bin/hadoop dfsadmin -report
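If everything came up, jps (it ships with the JDK) is a quick way to see the daemons: on the name node it should show NameNode, SecondaryNameNode and JobTracker; on each data node, DataNode and TaskTracker. dfsadmin -report should list all the data nodes as live.
$ jps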