1. Download the newest version
2. Unzip the downloaded zip file.
3. Configuration
3.1) core-site.xml
If we don't specify the hadoop.tmp.dir parameter, we need to reformat the Hadoop file system each time we restart the cluster. By default this directory lives under /tmp, which many systems clear on reboot.
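A minimal sketch of setting this parameter follows. It writes a core-site.xml fragment into a scratch directory; the value /opt/hadoop/tmp is a hypothetical path, not one given in this post, and any directory that persists across restarts should serve the same purpose.

```shell
#!/bin/sh
# Sketch: generate a core-site.xml fragment that pins hadoop.tmp.dir.
# TMP_DIR_VALUE is a hypothetical path; pick a directory that survives reboots.
TMP_DIR_VALUE="/opt/hadoop/tmp"

# Write to a scratch directory so nothing in a real conf/ is overwritten.
OUT_DIR=$(mktemp -d)
CORE_SITE="$OUT_DIR/core-site.xml"

cat > "$CORE_SITE" <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>$TMP_DIR_VALUE</value>
  </property>
</configuration>
EOF

echo "Wrote $CORE_SITE"
```

After reviewing the generated file, the property block can be copied into the real conf/core-site.xml alongside any settings already there.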
3.2) hdfs-site.xml
3.3) mapred-site.xml
<value>[hostname]:9001</value>
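For context, the value line above would normally sit inside a property block. The sketch below prints a complete mapred-site.xml body for a given host, assuming the property in question is mapred.job.tracker (the Hadoop 1.x JobTracker address); the post itself shows only the value. The helper name mapred_site_xml is hypothetical.

```shell
#!/bin/sh
# Sketch: print a mapred-site.xml body for a given host.
# Assumption: the <value> line belongs to mapred.job.tracker.
# Port 9001 is taken from the value line in this post.
mapred_site_xml() {
    host=$1
    cat <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>${host}:9001</value>
  </property>
</configuration>
EOF
}

# Example: use this machine's node name in place of [hostname].
mapred_site_xml "$(uname -n)"
```

Redirecting the output to a file (for example, a scratch copy of mapred-site.xml) makes it easy to compare against the existing configuration before replacing it.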
3.4) hadoop-env.sh
export JAVA_HOME=/opt/java/jdk1.7.0_51
3.5) masters and slaves
Since we are setting up Pseudo-Distributed Mode, we change the content of both files from localhost to [hostname].
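The change to the two files can be sketched in shell as follows. The helper write_host_files is hypothetical and is demonstrated on a scratch directory; passing the real conf/ directory would overwrite the existing masters and slaves files with a single line each.

```shell
#!/bin/sh
# Sketch: point the masters and slaves files at a single host.
# write_host_files is a hypothetical helper, not part of Hadoop.
write_host_files() {
    conf_dir=$1
    host=$2
    for f in masters slaves; do
        printf '%s\n' "$host" > "$conf_dir/$f"
    done
}

# Demonstrate on a scratch directory; pass your real conf/ directory instead.
DEMO_CONF=$(mktemp -d)
write_host_files "$DEMO_CONF" "$(uname -n)"
cat "$DEMO_CONF/masters" "$DEMO_CONF/slaves"
```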
4. Format the new distributed file system
In the terminal, input the following command:
$ bin/hadoop namenode -format
5. Set up passphraseless ssh
Check if you can ssh to your hostname without a passphrase:
$ ssh [hostname]
If you cannot ssh to [hostname] without a passphrase, execute the following commands:
$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
$ cat ~/.ssh/ >> ~/.ssh/authorized_keys
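To confirm the result without being dropped into an interactive prompt, a check along these lines may help. It relies on the standard OpenSSH options BatchMode and ConnectTimeout; the function name is hypothetical. Note that in batch mode an unconfirmed host key also counts as a failure, so one manual login to accept the key may be needed first.

```shell
#!/bin/sh
# Sketch: return 0 if ssh to the host succeeds without any prompt.
# BatchMode=yes makes ssh fail instead of asking for a password or passphrase.
can_ssh_without_passphrase() {
    ssh -o BatchMode=yes -o ConnectTimeout=5 "$1" true
}

# Example usage (replace the argument with your [hostname]):
#   can_ssh_without_passphrase "$(uname -n)" && echo "passphraseless ssh works"
```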
Possible problem:
ssh: connect to host localhost port 22: Connection refused
The likely cause is that ssh or sshd is not installed, or that a firewall is blocking the connection.
install ssh:
$ sudo apt-get install ssh
install sshd:
$ sudo apt-get install openssh-server
$ sudo net start sshd
disable firewall
$ sudo ufw disable
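Before installing anything or disabling the firewall, it can be useful to see which of the three causes applies. The following is a rough sketch only: have_cmd is a hypothetical helper, and /usr/sbin/sshd is assumed to be where the openssh-server package places the daemon on Ubuntu.

```shell
#!/bin/sh
# Sketch: report which of the pieces named above might be missing.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

have_cmd ssh || echo "ssh client not found on PATH"

# Assumed location of the daemon on Ubuntu; adjust for other systems.
[ -x /usr/sbin/sshd ] || echo "sshd not found at /usr/sbin/sshd"

# Querying the firewall state needs root, so only print a hint here.
if have_cmd ufw; then
    echo "ufw is installed; check it with: sudo ufw status"
fi
```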
6. Start Hadoop
$ bin/
7. Verify success
1) Check from the terminal:
In the terminal, input:
$ jps
As we can see, the JobTracker, NameNode, DataNode, SecondaryNameNode and TaskTracker have all been started.
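The same check can be automated. The sketch below reads jps output and prints any of the five daemons that are not running; missing_daemons is a hypothetical helper, and it assumes jps prints one "<pid> <class name>" pair per line. Matching on the whole second field avoids counting SecondaryNameNode as NameNode.

```shell
#!/bin/sh
# Sketch: list the Hadoop daemons that do NOT appear in jps output.
# Reads jps output ("<pid> <class name>" per line) from stdin.
missing_daemons() {
    jps_output=$(cat)
    for daemon in NameNode DataNode SecondaryNameNode JobTracker TaskTracker; do
        # Compare the whole second field so NameNode != SecondaryNameNode.
        if ! printf '%s\n' "$jps_output" |
            awk -v d="$daemon" '$2 == d { found = 1 } END { exit !found }'; then
            printf '%s\n' "$daemon"
        fi
    done
}

# Run against the live process list only if jps is available.
if command -v jps >/dev/null 2>&1; then
    jps | missing_daemons
fi
```

No output means all five daemons were found; any name printed is a daemon worth checking in the logs.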
2) Check from the webpage:
namenode :