Step-by-step guide to set up a 3-node Hadoop cluster environment
Suppose three nodes: node1, node2, node3
master: node1
slaves: node1, node2, node3  # node1 serves as both master and slave
(make sure node1, node2, and node3 are correctly configured in /etc/hosts on every node so they can reach each other by hostname; a sample is shown below)
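A minimal /etc/hosts sketch; the 192.168.1.x addresses below are placeholders for illustration, substitute the real IPs of your nodes:
192.168.1.101   node1
192.168.1.102   node2
192.168.1.103   node3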
hadoop: hadoop-0.20.2.tar.gz
JDK: jdk 1.7
Step 1: configure Hadoop
* hadoop-0.20.2/conf/hadoop-env.sh
export JAVA_HOME=/path/to/java/jdk1.7
* hadoop-0.20.2/conf/core-site.xml
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://node1:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/home/your_uid/tmp/hadoop/hadooproot</value>
</property>
</configuration>
# If hadoop.tmp.dir is not set either with the -D option or in the configuration files,
# the default value is /tmp/hadoop-${user.name},
# where user.name is the username you used to log in to the system.
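Note that /tmp is often cleared on reboot, which is why the example above points hadoop.tmp.dir at a home directory instead. If you use such a custom location, creating it up front on every node avoids permission surprises (your_uid is a placeholder for your actual username):
$ mkdir -p /home/your_uid/tmp/hadoop/hadooproot    # repeat on node2 and node3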
* hadoop-0.20.2/conf/hdfs-site.xml
<configuration>
<!--
<property>
<name>dfs.data.dir</name>
<value>/home/your_uid/tmp/hadoop/hadoopdata</value>
</property>
<property>
<name>dfs.name.dir</name>
<value>/home/your_uid/tmp/hadoop/hadoopname</value>
</property>
-->
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
</configuration>
# The default value for dfs.name.dir is ${hadoop.tmp.dir}/dfs/name and
# the default value for dfs.data.dir is ${hadoop.tmp.dir}/dfs/data.
# Set the replication factor to 2, so each HDFS block is stored on two of the three datanodes.
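Once the cluster is running and holds some data, you can confirm the effective replication with fsck (run this after the start-up step below):
$ hadoop-0.20.2/bin/hadoop fsck / -files -blocks -locations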
* hadoop-0.20.2/conf/mapred-site.xml
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>node1:9001</value>
</property>
</configuration>
* hadoop-0.20.2/conf/masters
node1
* hadoop-0.20.2/conf/slaves
node1
node2
node3
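start-all.sh starts the daemons over SSH and expects Hadoop to live at the same path on every node, so after editing the conf files on node1, copy the whole installation to the slaves; a sketch assuming identical home directories on all three nodes:
$ scp -r hadoop-0.20.2 node2:~/
$ scp -r hadoop-0.20.2 node3:~/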
* configuring SSH
log on to node1:
$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
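If ~/.ssh does not exist yet on node2 or node3, the scp below will fail; create it first with the permissions sshd requires (you will be asked for a password here, since passwordless login is not set up yet):
$ ssh node2 'mkdir -p ~/.ssh && chmod 700 ~/.ssh'
$ ssh node3 'mkdir -p ~/.ssh && chmod 700 ~/.ssh'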
copy authorized_keys to the other slaves:
$ scp ~/.ssh/authorized_keys node2:~/.ssh/
$ scp ~/.ssh/authorized_keys node3:~/.ssh/
check that node2 and node3 can be logged into from node1 without being asked for a password:
$ ssh node2
$ ssh node3
* format the namenode
hadoop-0.20.2/bin/hadoop namenode -format
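If the format succeeds, the name directory is created and populated; a quick sanity check, assuming hadoop.tmp.dir as configured above and the default dfs.name.dir beneath it:
$ ls /home/your_uid/tmp/hadoop/hadooproot/dfs/name/current
(fsimage, edits, fstime, and VERSION should be listed)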
* start-up and shutdown
hadoop-0.20.2/bin/start-all.sh
hadoop-0.20.2/bin/stop-all.sh
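After start-all.sh, the JDK's jps tool gives a quick check that all daemons came up; with the layout above, node1 should show NameNode, SecondaryNameNode, JobTracker, DataNode, and TaskTracker, while node2 and node3 should show only DataNode and TaskTracker:
$ jps    # run on each node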
* web access
http://node1:50030   # JobTracker web UI
http://node1:50070   # NameNode web UI
* list and create folders on HDFS
hadoop-0.20.2/bin/hadoop dfs -ls /
hadoop-0.20.2/bin/hadoop dfs -mkdir /data
hadoop-0.20.2/bin/hadoop dfs -ls /data
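As a simple smoke test, copy a local file into the new folder and read it back (the file names here are just examples):
hadoop-0.20.2/bin/hadoop dfs -put /etc/hosts /data/hosts.txt
hadoop-0.20.2/bin/hadoop dfs -cat /data/hosts.txt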