1 Prepare for Installation
1.1 Make sure your host name is valid
For example, if your hostname is vm231.com, edit /etc/hosts and add an entry like:
127.0.0.1 vm231.com
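To double-check that the name resolves, compare hostname against what the resolver returns; ping is one quick way to test, and it should show 127.0.0.1 once the /etc/hosts entry is in place:
$ hostname
$ ping -c 1 vm231.com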
1.2 Install the Oracle Java Development Kit
Download a recommended version of the Oracle JDK from http://www.oracle.com/technetwork/java/javasebusiness/downloads/java-archive-downloads-javase6-419409.html. We chose jdk-6u45-linux-x64-rpm.bin (use the jdk-6uXX-linux-x64-rpm.bin file for 64-bit systems and jdk-6uXX-linux-i586-rpm.bin for 32-bit systems).
Install the JDK:
# chmod a+x jdk-6u45-linux-x64-rpm.bin
# ./jdk-6u45-linux-x64-rpm.bin
As the root user, set JAVA_HOME to the directory where the JDK is installed; for example:
# export JAVA_HOME="/usr/java/jdk1.6.0_45"
# export PATH=$JAVA_HOME/bin:$PATH
Note 1: The JDK install directory might instead be something like /usr/java/jdk1.6.0_26, depending on the system configuration and where the JDK is actually installed; whichever it is, it should contain the executable bin/java.
Note 2: I also tried OpenJDK, but could not find a way to make it work.
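These exports only apply to the current shell. To make them persistent across logins, one option (a sketch, assuming the same install path as above) is to drop them into /etc/profile.d and then confirm the right java is picked up:
# cat > /etc/profile.d/java.sh <<'EOF'
export JAVA_HOME="/usr/java/jdk1.6.0_45"
export PATH=$JAVA_HOME/bin:$PATH
EOF
# source /etc/profile.d/java.sh
# java -version
If everything is wired up, java -version should report 1.6.0_45.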
2 Installing CDH3 on a Single Linux Node in Pseudo-distributed mode
2.1 Download the CDH3 Package
For RedHat/CentOS 6, download: http://archive.cloudera.com/redhat/6/x86_64/cdh/cdh3-repository-1.0-1.noarch.rpm
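For example, fetch it with wget (any download tool will do):
$ wget http://archive.cloudera.com/redhat/6/x86_64/cdh/cdh3-repository-1.0-1.noarch.rpm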
Install the RPM:
# sudo yum --nogpgcheck localinstall cdh3-repository-1.0-1.noarch.rpm
2.2 Install CDH3
(Optional) Add the Cloudera Public GPG Key to your repository by executing the following command:
# sudo rpm --import http://archive.cloudera.com/redhat/6/x86_64/cdh/RPM-GPG-KEY-cloudera
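To confirm the key was imported, note that rpm records imported keys as gpg-pubkey pseudo-packages:
$ rpm -qa 'gpg-pubkey*'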
Install Hadoop in pseudo-distributed mode:
# sudo yum install hadoop-0.20-conf-pseudo
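This package pulls in the core Hadoop packages plus a ready-made pseudo-distributed configuration. As an optional sanity check, you can list what was installed:
$ rpm -ql hadoop-0.20-conf-pseudo
$ yum list installed | grep -i hadoop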
3 Starting Hadoop and Verifying It Is Working Properly
3.1 Start the Daemons:
# for service in /etc/init.d/hadoop-0.20-*; do sudo $service start; done
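This should start five daemons: namenode, secondarynamenode, datanode, jobtracker, and tasktracker. One way to confirm they are all up is the JDK's jps tool (run as root it lists every user's JVMs; this assumes $JAVA_HOME/bin is on the PATH), or you can ask the same init scripts for their status:
# jps
# for service in /etc/init.d/hadoop-0.20-*; do sudo $service status; done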
3.2 Confirm Hadoop is working by performing some operations and running a job.
For example, try performing some DFS operations:
$ hadoop fs -mkdir /foo
$ hadoop fs -ls /
Found 2 items
drwxr-xr-x - root supergroup 0 2013-10-22 19:11 /foo
drwxr-xr-x - mapred supergroup 0 2013-10-22 19:11 /var
$ hadoop fs -rmr /foo
Deleted hdfs://localhost/foo
$ hadoop fs -ls /
Found 1 items
drwxr-xr-x - mapred supergroup 0 2013-10-22 19:11 /var
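To also run a job, as the section title promises, try one of the examples bundled with Hadoop. On my system the examples jar is under /usr/lib/hadoop-0.20; the exact jar name varies with the CDH3 update level, hence the glob. This one estimates pi with 2 map tasks and 1000 samples per task:
$ hadoop jar /usr/lib/hadoop-0.20/hadoop-*examples*.jar pi 2 1000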
4 A Common Error
When you verify your configuration, for example by running $ hadoop fs -mkdir /foo, you may get an error like:
13/10/22 19:05:38 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:8020. Already tried 0 time(s).
13/10/22 19:05:39 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:8020. Already tried 1 time(s).
……
The client fails to connect to localhost/127.0.0.1:8020.
Look at the Hadoop log file "/usr/lib/hadoop/logs/hadoop-hadoop-namenode-<hostname>.log" (the FSNamesystem errors below are written by the NameNode, not the DataNode). It shows:
2013-10-22 10:22:55,517 ERROR org.apache.hadoop.hdfs.server.namenode.FSNamesystem: FSNamesystem initialization failed.
java.io.IOException: Missing directory /var/lib/hadoop-0.20/cache/hadoop/dfs/name
2013-10-22 10:22:55,518 ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: java.io.IOException: Missing directory /var/lib/hadoop-0.20/cache/hadoop/dfs/name
It appears that the base directories haven't been created properly, so the FSNamesystem fails at initialization (it expects a ready, formatted name directory the first time it starts). Try the steps below and it should work afterwards:
1. Stop all Hadoop/Hadoop-related services.
# for service in /etc/init.d/hadoop-0.20-*; do sudo $service stop; done
2. Run the following fix-up commands:
$ sudo rm -rf /var/lib/hadoop-0.20/cache/hadoop/dfs
$ sudo mkdir -p /var/lib/hadoop-0.20/cache/hadoop/dfs/{name,data}
$ sudo chown hdfs:hdfs /var/lib/hadoop-0.20/cache/hadoop/dfs/{name,data}
$ sudo -u hdfs hadoop namenode -format
3. Start your services again; this time the FSNamesystem should start up fine.
# for service in /etc/init.d/hadoop-0.20-*; do sudo $service start; done
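Before retrying the DFS operations from section 3.2, you can check that the NameNode is actually listening on port 8020 again (netstat is just one way to verify):
$ sudo netstat -tlnp | grep 8020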
Note: Do not repeat these steps if you run into NameNode-down or other issues later on. The rm -rf/format will delete all of your HDFS data, and you do not want to do that on a working cluster that has merely stopped working because of some other recoverable issue. [Refer to http://grokbase.com/t/cloudera/cdh-user/125p93ggmd/installation-issues-with-with-cdh3]
For more on installation, see the official docs: http://www.cloudera.com/content/cloudera-content/cloudera-docs/CDH3/CDH3u6/CDH3-Quick-Start/CDH3-Quick-Start.html