I. Introduction to Hadoop
Hadoop's pseudo-distributed mode simulates a Hadoop cluster on a single machine. It is not true distribution: all of the daemons run as separate Java processes on one host. Hadoop itself cannot tell pseudo-distributed mode from fully distributed mode, and the two configurations are very similar; the only difference is that in pseudo-distributed mode everything is configured on a single machine, so the data node and the name node are the same host.
In this example, Linux is installed in a VMware virtual machine configured with a host-only static network; its address is 192.168.80.100.
1. Environment:
Operating system: Red Hat Enterprise Linux 6 (x86)
Hadoop version: hadoop-2.6.4
JDK version: JDK 1.6
II. Installing the JDK and configuring the Java environment variables
JDK download: http://www.oracle.com/technetwork/java/javase/downloads/java-archive-downloads-javase6-419409.html
Note: running jdk-6u45-linux-i586-rpm.bin extracts an RPM package, which can then be installed with rpm -iUh;
running jdk-6u45-linux-i586.bin extracts all files directly into a jdk1.6.0_45 subdirectory of the current directory. This article uses the latter package.
1). Extract the package:
[root@hadoop ~]# chmod +x jdk-6u45-linux-i586.bin
[root@hadoop ~]# ./jdk-6u45-linux-i586.bin
2). Create the /usr/java directory and move the extracted files there:
[root@hadoop ~]# mkdir /usr/java
[root@hadoop ~]# mv jdk1.6.0_45 /usr/java
3). Rename the directory:
[root@hadoop java]# mv jdk1.6.0_45 jdk
4). Configure the environment variables by adding the following lines to /etc/profile:
[root@hadoop ~]# vi /etc/profile
-----add the following lines-----
export JAVA_HOME=/usr/java/jdk
export PATH=.:$JAVA_HOME/bin:$PATH
5). Verify that the JDK is installed correctly:
[root@hadoop ~]# java -version
java version "1.6.0_45"
Java(TM) SE Runtime Environment (build 1.6.0_45-b06)
Java HotSpot(TM) Client VM (build 20.45-b01, mixed mode, sharing)
III. Passwordless SSH setup
Hadoop relies on the SSH protocol: the NameNode uses SSH to start the namenode and datanode processes. In pseudo-distributed mode the data node and the name node are the same machine, so passwordless SSH login to localhost must be configured.
1). Generate a key pair:
[root@hadoop ~]# ssh-keygen -t rsa
2). Once generated, the keys are saved automatically to ~/.ssh:
[root@hadoop ~]# cd .ssh
[root@hadoop .ssh]# ls
id_rsa id_rsa.pub
3). Since this is a pseudo-distributed setup, copy the public key id_rsa.pub locally under the name authorized_keys:
[root@hadoop .ssh]# cp id_rsa.pub authorized_keys
4). Verify that SSH login works without a password:
[root@hadoop ~]# ssh localhost
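If `ssh localhost` still prompts for a password, the usual culprit is file permissions: sshd ignores an authorized_keys file that is group- or world-writable. A sketch of the permission fix (paths assume the root user's home directory, as in the steps above):

```shell
# sshd refuses key-based login when these are too permissive
chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys
# Should now run the remote command without asking for a password
ssh localhost 'echo ssh ok'
```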
IV. Hadoop configuration
1). Download hadoop-2.6.4.tar.gz, extract it into /usr/local, and rename the directory to hadoop:
[root@hadoop local]# tar -zxvf hadoop-2.6.4.tar.gz
[root@hadoop local]# mv hadoop-2.6.4 hadoop
2). Go to /usr/local/hadoop/etc/hadoop/ and edit the Hadoop configuration files.
2.1). Configure hadoop-env.sh:
[root@hadoop hadoop]# pwd
/usr/local/hadoop/etc/hadoop
[root@hadoop hadoop]# vi hadoop-env.sh
----change the following line----
export JAVA_HOME=/usr/java/jdk
2.2). Configure core-site.xml:
[root@hadoop hadoop]# vi core-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
    <property>
        <name>fs.default.name</name>
        <value>hdfs://hadoop:9000</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/123/hadooptmp</value>
    </property>
</configuration>
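A note on property names: in Hadoop 2.x, fs.default.name still works but is deprecated in favor of fs.defaultFS. An equivalent 2.x-style fragment (hostname and port taken from the file above):

```xml
<property>
    <name>fs.defaultFS</name>
    <value>hdfs://hadoop:9000</value>
</property>
```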
2.3). Configure hdfs-site.xml:
[root@hadoop hadoop]# vi hdfs-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
    <property>
        <name>dfs.name.dir</name>
        <value>/123/hdfs/name</value>
    </property>
    <property>
        <name>dfs.data.dir</name>
        <value>/123/hdfs/data</value>
    </property>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>
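If the format step later complains that a storage directory is inaccessible, pre-creating the directories and checking their ownership usually resolves it. A sketch, with the paths copied from the XML above:

```shell
# Pre-create the storage directories referenced in the configuration
mkdir -p /123/hadooptmp /123/hdfs/name /123/hdfs/data
chown -R root:root /123        # this tutorial runs everything as root
ls -ld /123/hdfs/name /123/hdfs/data
```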
3). Edit the hosts file
[root@hadoop hadoop]# vi /etc/hosts
----map the machine's static address to its hostname----
192.168.80.100 hadoop
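After editing /etc/hosts, it is worth confirming that the name resolves to the static address before starting any daemon. A quick check (`hadoop` is the hostname mapped above):

```shell
# Resolve the hostname through the system resolver (which reads /etc/hosts)
getent hosts hadoop      # should print: 192.168.80.100  hadoop
ping -c 1 hadoop
```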
V. Starting Hadoop and verifying
1). Format the namenode:
[root@hadoop bin]# ./hadoop namenode -format
13/04/26 11:08:05 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = hadoop.localdomain/127.0.0.1
STARTUP_MSG: args = [-format]
STARTUP_MSG: version = 0.20.2
STARTUP_MSG: build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20 -r 911707; compiled by 'chrisdo' on Fri Feb 19 08:07:34 UTC 2010
************************************************************/
Re-format filesystem in /123/hdfs/name ? (Y or N) Y
13/04/26 11:08:09 INFO namenode.FSNamesystem: fsOwner=root,root,bin,daemon,sys,adm,disk,wheel
13/04/26 11:08:09 INFO namenode.FSNamesystem: supergroup=supergroup
13/04/26 11:08:09 INFO namenode.FSNamesystem: isPermissionEnabled=true
13/04/26 11:08:09 INFO common.Storage: Image file of size 94 saved in 0 seconds.
13/04/26 11:08:09 INFO common.Storage: Storage directory /123/hdfs/name has been successfully formatted.
13/04/26 11:08:09 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at hadoop.localdomain/127.0.0.1
************************************************************/
2). Start all Hadoop daemons:
[root@hadoop hadoop]# ./sbin/start-all.sh
starting namenode, logging to /usr/local/hadoop/bin/../logs/hadoop-root-namenode-hadoop.localdomain.out
192.168.80.100: starting datanode, logging to /usr/local/hadoop/bin/../logs/hadoop-root-datanode-hadoop.localdomain.out
192.168.80.100: starting secondarynamenode, logging to /usr/local/hadoop/bin/../logs/hadoop-root-secondarynamenode-hadoop.localdomain.out
starting jobtracker, logging to /usr/local/hadoop/bin/../logs/hadoop-root-jobtracker-hadoop.localdomain.out
192.168.80.100: starting tasktracker, logging to /usr/local/hadoop/bin/../logs/hadoop-root-tasktracker-hadoop.localdomain.out
3). Use the jps command to check whether all Hadoop processes have started.
[root@hadoop bin]# jps
15219 JobTracker
15156 SecondaryNameNode
15495 Jps
15326 TaskTracker
15044 DataNode
14959 NameNode
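All six processes above (five daemons plus jps itself) should be present. A small loop like the following can spot a daemon that failed to start; this is a sketch, with the daemon names matching the process list shown above:

```shell
# Check the jps output for each expected Hadoop daemon
expected="NameNode DataNode SecondaryNameNode JobTracker TaskTracker"
out=$(jps)
for d in $expected; do
    if echo "$out" | grep -qw "$d"; then
        echo "$d: running"
    else
        echo "$d: NOT running"
    fi
done
```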
4). Check the cluster status:
[root@hadoop bin]# ./hadoop dfsadmin -report
Configured Capacity: 19751522304 (18.4 GB)
Present Capacity: 14953619456 (13.93 GB)
DFS Remaining: 14953582592 (13.93 GB)
DFS Used: 36864 (36 KB)
DFS Used%: 0%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
-------------------------------------------------
Datanodes available: 1 (1 total, 0 dead)
Name: 192.168.20.150:50010
Decommission Status : Normal
Configured Capacity: 19751522304 (18.4 GB)
DFS Used: 36864 (36 KB)
Non DFS Used: 4797902848 (4.47 GB)
DFS Remaining: 14953582592(13.93 GB)
DFS Used%: 0%
DFS Remaining%: 75.71%
Last contact: Fri Apr 26 13:06:15 CST 2013
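Individual fields of the report can be pulled out with standard text tools, which is handy for scripted health checks. A sketch, assuming the hadoop script is invoked from its bin directory as in the step above:

```shell
# Extract a single field from the dfsadmin report
report=$(./hadoop dfsadmin -report)
echo "$report" | awk -F': ' '/^DFS Used%/ {print $2; exit}'
```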