Finally got the Hadoop cluster configuration working after several frustrating days; with no one around to discuss it with, it was quite discouraging.
[Hard-Won Edition] Installing and Configuring a Hadoop Cluster in Fully Distributed Mode
I. Lab Environment
1. Overview of the installation environment
Physical laptop: i5 2.27 GHz (4 CPUs), 4 GB RAM, 320 GB disk, 32-bit Windows 7
Virtualization: VMware® Workstation Version 7.0.0 build-203739
VM installation/configuration guide (for anyone unfamiliar with the setup): http://ideapad.it168.com/thread-2088751-1-1.html
(covers VMware Tools and Linux/Windows shared-folder configuration)
My Linux VM configuration (identical for master, slave1, slave2):
CPU: 1 socket, 2 cores
RAM: 512 MB
Disk: 10 GB
Linux ISO: CentOS-6.0-i386-bin-DVD.iso (32-bit)
Hadoop version: hadoop-0.20.2.tar.gz
root password: rootroot
OS version:
CentOS Linux release 6.0 (Final)
[root@h1 etc]# cat issue
CentOS Linux release 6.0 (Final)
Kernel \r on an \m
Hostname  IP             Role    Daemons
h1        192.168.2.102  master  namenode and jobtracker
h2        192.168.2.103  slave   datanode and tasktracker
h4        192.168.2.105  slave   datanode and tasktracker
II. Fully Distributed Mode Installation
We start by configuring the first host, h4.
1. Configure the hosts file
[grid@h4 .ssh]$ cat /etc/hosts
192.168.2.105 h4 # Added by NetworkManager
127.0.0.1 localhost.localdomain localhost
::1 h4 localhost6.localdomain6 localhost6
192.168.2.102 h1
192.168.2.103 h2
192.168.2.105 h4
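The hosts entries above can be sanity-checked with a short script. This is only a sketch: it checks a temporary copy of the entries rather than /etc/hosts itself, so it can run anywhere; on a real node, point HOSTS_FILE at /etc/hosts instead.

```shell
#!/bin/sh
# Check that every cluster hostname maps to the expected IP in a hosts
# file. A temp copy of the entries is used here; on a node, set
# HOSTS_FILE=/etc/hosts instead.
HOSTS_FILE=$(mktemp)
cat > "$HOSTS_FILE" <<'EOF'
127.0.0.1 localhost.localdomain localhost
192.168.2.102 h1
192.168.2.103 h2
192.168.2.105 h4
EOF

check() {
  # $1 = hostname, $2 = expected IP; scan for a line "IP ... hostname"
  if awk -v h="$1" -v ip="$2" \
       '$1==ip { for (i=2;i<=NF;i++) if ($i==h) found=1 } END { exit !found }' \
       "$HOSTS_FILE"; then
    echo "OK $1 -> $2"
  else
    echo "MISSING $1 -> $2"
  fi
}

check h1 192.168.2.102
check h2 192.168.2.103
check h4 192.168.2.105
```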
2. Create a dedicated Hadoop account [on all three VMs: h1, h2, h4]
groupadd hadoop    create the primary group for the grid user
useradd grid -g hadoop    create the grid user
[root@h1 etc]# passwd grid
Changing password for user grid.
New password: grid
BAD PASSWORD: it is too short
BAD PASSWORD: it is too simple
Retype new password: grid
passwd: all authentication tokens updated successfully.
[Screenshot: Windows Task Manager with all three VMs running at once]
3. Set up SSH among h1, h2, and h4
On h4:
[grid@h4 ~]$ ssh-keygen -t rsa    generate a key pair with the RSA algorithm
Generating public/private rsa key pair.    one public key and one private key
Enter file in which to save the key (/home/grid/.ssh/id_rsa):
Created directory '/home/grid/.ssh'.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/grid/.ssh/id_rsa.    this is the private key
Your public key has been saved in /home/grid/.ssh/id_rsa.pub.    this is the public key
The key fingerprint is:
50:29:8a:78:ac:0e:a1:72:10:2d:01:66:77:f0:7e:c1 grid@h4
The key's randomart image is:    the RSA randomart
+--[ RSA 2048]----+
|+= o.. .. |
|= o o o.. |
| = . o.E |
|+ + o .. |
|.= . .S |
|= . . |
|+. |
| . |
| |
+-----------------+
[grid@h4 ~]$ ls -lrta    success simply means the hidden .ssh directory now exists under the home directory
total
-rw-r--r--. 1 grid hadoop 500 Jan 24 2007 .emacs
drwxr-xr-x. 2 grid hadoop 4096 Nov 12 2010 .gnome2
-rw-r--r--. 1 grid hadoop 124 May 31 2011 .bashrc
-rw-r--r--. 1 grid hadoop 176 May 31 2011 .bash_profile
-rw-r--r--. 1 grid hadoop 18 May 31 2011 .bash_logout
drwxr-xr-x. 4 root root 4096 Sep 1 21:14 ..
drwx------. 5 grid hadoop 4096 Sep 1 21:34 .
drwx------. 2 grid hadoop 4096 Sep 1 21:34 .ssh
drwxr-xr-x. 4 grid hadoop 4096 Sep 2 2012 .mozilla
[grid@h4 ~]$ cd .ssh
[grid@h4 .ssh]$ ll
total 8
-rw-------. 1 grid hadoop 1675 Sep 1 21:34 id_rsa
-rw-r--r--. 1 grid hadoop 389 Sep 1 21:34 id_rsa.pub
[grid@h4 .ssh]$ cp id_rsa.pub authorized_keys    create the authorization file
[grid@h4 .ssh]$ cat authorized_keys    view the public key it now contains
ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEAr6+D01KKqeMUrkyakulV3su+9RU+jJ6sNJMlydxFq38oGBsJBwcskVL/I9ds7vE5g7coP+cMzgtRyj1ns+elgF0g3/uhtSerad4QdWXVLZgUjyUxijkm+nI3SSdwLihzsNNgH4GzeKX3HQAH/7S+rLoZSBPi//w9HYfO6VeXdo7N2lkvUxNW2z/h7JuYPMEqiaOIWAeLK7AJXhjJaeJkZh/ccGuEx4uBLRxqce5zjbNsFapoD2bact1w80a7mrgzAN3cVcQuQPzmpdj750negxMtai+QRmPDlSx2ZXtbarI4opSVmBiqpY84PJ/h9m5wptQ3hg/1XIxv4gyqwLSxZw== grid@h4
With this public key in place, matched by the private key, we can log in without a password.
On h2:
[grid@h2 ~]$ ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/home/grid/.ssh/id_rsa):
Created directory '/home/grid/.ssh'.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/grid/.ssh/id_rsa.
Your public key has been saved in /home/grid/.ssh/id_rsa.pub.
The key fingerprint is:
14:55:b9:d1:4a:60:a1:5c:47:37:30:49:09:aa:30:3d grid@h2
The key's randomart image is:    the randomart is different for every key
+--[ RSA 2048]----+
| ..BBB*o |
| . . * .*o.. |
| o E = . + |
| o + o |
| . S |
| |
| |
| |
| |
+-----------------+
[grid@h2 ~]$ cd .ssh
[grid@h2 .ssh]$ ll
total 12
-rw-------. 1 grid hadoop 1675 Sep 1 21:59 id_rsa    the private and public keys were generated here too
-rw-r--r--. 1 grid hadoop 389 Sep 1 21:59 id_rsa.pub
On h4:
[grid@h4 .ssh]$ scp authorized_keys h2:/home/grid/.ssh/    send h4's authorization file to h2
On h2:
[grid@h2 .ssh]$ ll
total 12
-rw-r--r--. 1 grid hadoop 778 Sep 1 22:02 authorized_keys
-rw-------. 1 grid hadoop 1675 Sep 1 21:59 id_rsa
-rw-r--r--. 1 grid hadoop 389 Sep 1 21:59 id_rsa.pub
[grid@h2 .ssh]$ cat authorized_keys
ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEAr6+D01KKqeMUrkyakulV3su+9RU+jJ6sNJMlydxFq38oGBsJBwcskVL/I9ds7vE5g7coP+cMzgtRyj1ns+elgF0g3/uhtSerad4QdWXVLZgUjyUxijkm+nI3SSdwLihzsNNgH4GzeKX3HQAH/7S+rLoZSBPi//w9HYfO6VeXdo7N2lkvUxNW2z/h7JuYPMEqiaOIWAeLK7AJXhjJaeJkZh/ccGuEx4uBLRxqce5zjbNsFapoD2bact1w80a7mrgzAN3cVcQuQPzmpdj750negxMtai+QRmPDlSx2ZXtbarI4opSVmBiqpY84PJ/h9m5wptQ3hg/1XIxv4gyqwLSxZw== grid@h4
ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEA5iKGfOGKh3d8BYr4vkkNaEtZkxCbBzBn6pfD0n3h82/1f9PwEtT4CEgqzBssYvQ2Nbc6dUy2NbDD9j5dIwQENS/fAJDwccdiJjEYMo5+o4ocPABx6OVM0r9nsUkyU7bxeHjap3ZUmcC1UvgW5asOsRMl7ePCze+rnt5D5ldZ+VOKh0NgtY2/CST8qXHmedfZFbQSEhIPf5Lh4A6oSoRHTFQbDN4apvf5s7Cm5/NgPiyhU+KbHBz96pNCxkjuOwj69a7kx4AgQYJoYc0T9O6YfjfVy3l1a7N2aJ6jp4SMv0GaohgzIrBNXwoFK6skuyf10yIxvNlGzkhTYK9GS9hjJw
The authorized_keys file now holds the public keys of both h4 and h2; only h1's is missing. (h2's own key was appended here with cat id_rsa.pub >> authorized_keys, a step not shown in the listing above.)
On h1:
[grid@h1 ~]$ ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/home/grid//.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/grid//.ssh/id_rsa.
Your public key has been saved in /home/grid//.ssh/id_rsa.pub.
The key fingerprint is:
b6:4e:a6:05:d3:37:e7:3d:ca:44:7b:cf:2c:d2:5b:a4 grid@h1
The key's randomart image is:
+--[ RSA 2048]----+
| |
| |
| |
| . |
| o S o o .|
| + o = o o |
| = +.E .|
| * o.oo* |
| . . o..o+|
+-----------------+
On h2:
[grid@h2 .ssh]$ scp authorized_keys h1:/home/grid/.ssh/
On h1:
[grid@h1 .ssh]$ ll
total 12
-rw-r--r--. 1 grid hadoop 778 Sep 1 22:12 authorized_keys
-rw-------. 1 grid hadoop 1675 Sep 1 22:12 id_rsa
-rw-r--r--. 1 grid hadoop 389 Sep 1 22:12 id_rsa.pub
[grid@h1 .ssh]$ cat id_rsa.pub >> authorized_keys    append h1's key; the file now holds all three nodes' public keys
[grid@h1 .ssh]$ cat authorized_keys
ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEAr6+D01KKqeMUrkyakulV3su+9RU+jJ6sNJMlydxFq38oGBsJBwcskVL/I9ds7vE5g7coP+cMzgtRyj1ns+elgF0g3/uhtSerad4QdWXVLZgUjyUxijkm+nI3SSdwLihzsNNgH4GzeKX3HQAH/7S+rLoZSBPi//w9HYfO6VeXdo7N2lkvUxNW2z/h7JuYPMEqiaOIWAeLK7AJXhjJaeJkZh/ccGuEx4uBLRxqce5zjbNsFapoD2bact1w80a7mrgzAN3cVcQuQPzmpdj750negxMtai+QRmPDlSx2ZXtbarI4opSVmBiqpY84PJ/h9m5wptQ3hg/1XIxv4gyqwLSxZw== grid@h4
ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEA5iKGfOGKh3d8BYr4vkkNaEtZkxCbBzBn6pfD0n3h82/1f9PwEtT4CEgqzBssYvQ2Nbc6dUy2NbDD9j5dIwQENS/fAJDwccdiJjEYMo5+o4ocPABx6OVM0r9nsUkyU7bxeHjap3ZUmcC1UvgW5asOsRMl7ePCze+rnt5D5ldZ+VOKh0NgtY2/CST8qXHmedfZFbQSEhIPf5Lh4A6oSoRHTFQbDN4apvf5s7Cm5/NgPiyhU+KbHBz96pNCxkjuOwj69a7kx4AgQYJoYc0T9O6YfjfVy3l1a7N2aJ6jp4SMv0GaohgzIrBNXwoFK6skuyf10yIxvNlGzkhTYK9GS9hjJw== grid@h2
ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEA5V1lyss14a8aWFEkTk/aBgKHFLMX/XZX/xtXVUqJl8NkTQVLQ37+XLyqvTfrcJSja70diqB3TrwBp3K5eXNxp3EOr6EGHsi0B6D8owsg0bCDhxHGHu8RX8WB4DH9UOv1uPL5BESAPHjuemQuQaQzLagqrnXbrKix8CzdIEgmnOknYiS49q9msnzawqo3luQFRU7MQvAU9UZqkxotrnzHqh0tgjJ3Sq6O6nscA7w//Xmb0JGobVQAFCDJQdn/z1kOq7E5WNhVa8ynF9GOF7cMdppug7Ibw1RZ9cKa+igi1KhhavS5H7XCM64NuGfC87aQE9nz0ysS3Kh8PT5h6zlxfw== grid@h1
[grid@h1 .ssh]$ scp authorized_keys h2:/home/grid/.ssh/    push the complete file to h2
[grid@h1 .ssh]$ scp authorized_keys h4:/home/grid/.ssh/    push the complete file to h4
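The scp/cat round-trip above boils down to: gather each node's id_rsa.pub into one authorized_keys, then push that file back to every node. Below is a local simulation of that merge; temporary directories stand in for the three home directories, and the FAKEKEY lines are placeholders, not real keys.

```shell
#!/bin/sh
# Simulate merging the public keys of h1, h2 and h4 into a single
# authorized_keys and "distributing" it, as done with cat >> and scp above.
WORK=$(mktemp -d)
for node in h1 h2 h4; do
  mkdir -p "$WORK/$node/.ssh"
  # placeholder key line; on a real node this is the actual id_rsa.pub
  echo "ssh-rsa FAKEKEY grid@$node" > "$WORK/$node/.ssh/id_rsa.pub"
done

# Merge every node's public key into one authorized_keys
cat "$WORK"/h*/.ssh/id_rsa.pub > "$WORK/authorized_keys"

# Distribute: on the real cluster this cp would be an scp to each node
for node in h1 h2 h4; do
  cp "$WORK/authorized_keys" "$WORK/$node/.ssh/authorized_keys"
  chmod 600 "$WORK/$node/.ssh/authorized_keys"  # sshd insists on tight perms
done

wc -l < "$WORK/authorized_keys"
```

Note the chmod 600: sshd refuses to honor an authorized_keys file that is group- or world-writable, which is a frequent reason passwordless login still prompts after a setup like this.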
4. Install the JDK [on h1, h2, and h4]
First copy jdk-6u25-ea-bin-b03-linux-i586-27_feb_2011-rpm.bin into /usr on h1, h2, and h4.
Make it executable:
chmod 777 jdk-6u25-ea-bin-b03-linux-i586-27_feb_2011-rpm.bin
Run the installer as root:
./jdk-6u25-ea-bin-b03-linux-i586-27_feb_2011-rpm.bin
cd into /etc and edit the profile file with vim profile.
Add the following environment variables before the umask 022 line:
export JAVA_HOME=/usr/java/jdk1.6.0_25
export JRE_HOME=/usr/java/jdk1.6.0_25/jre
export PATH=$PATH:/usr/java/jdk1.6.0_25/bin
export CLASSPATH=./:/usr/java/jdk1.6.0_25/lib:/usr/java/jdk1.6.0_25/jre/lib
source profile    reload the file so the environment variables take effect
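To confirm the exports take effect, the same block can be written to a file and sourced in place. A harmless local check follows; a temp file stands in for /etc/profile, and the JDK path does not have to exist for the variables to be set.

```shell
#!/bin/sh
# Append the JDK variables to a profile fragment and verify they are set
# after sourcing. On the nodes, the same lines go into /etc/profile
# before the umask 022 line.
PROFILE=$(mktemp)
cat >> "$PROFILE" <<'EOF'
export JAVA_HOME=/usr/java/jdk1.6.0_25
export JRE_HOME=/usr/java/jdk1.6.0_25/jre
export PATH=$PATH:/usr/java/jdk1.6.0_25/bin
export CLASSPATH=./:/usr/java/jdk1.6.0_25/lib:/usr/java/jdk1.6.0_25/jre/lib
EOF
. "$PROFILE"
echo "JAVA_HOME=$JAVA_HOME"
```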
5. Download and extract the Hadoop tarball [on h1]
Download URL: http://mirror.bit.edu.cn/apache/hadoop/common/hadoop-0.20.2/
Upload the downloaded hadoop-0.20.2.tar.gz to the /home/grid/ directory.
[grid@h1 grid]$ mv hadoop-0.20.2.tar.gz /home/grid/
[grid@h1 grid]$ tar -zxvf hadoop-0.20.2.tar.gz
[Alternatively, decompress first and then unpack:]
gzip -d hadoop-0.20.2.tar.gz    decompress; -d (decompress) also removes the source .gz file
tar -xvf hadoop-0.20.2.tar    unpack
6. Edit hadoop-env.sh [on h1]
Add the export JAVA_HOME=/usr/java/jdk1.6.0_25 environment variable:
[grid@h1 conf]$ pwd
/home/grid/hadoop-0.20.2/conf
[grid@h1 conf]$ ll
hadoop-env.sh
[grid@h1 conf]$ vim hadoop-env.sh
# The only required environment variable is JAVA_HOME. All others are
# optional. When running a distributed configuration it is best to
# set JAVA_HOME in this file, so that it is correctly defined on
# remote nodes.
# The java implementation to use. Required.
export JAVA_HOME=/usr/java/jdk1.6.0_25    remove the leading # and point it at the JDK directory
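The same edit can be done non-interactively with sed, which is handy when repeating the setup on several nodes. A sketch on a temp copy; the commented default shown is what I recall hadoop-0.20.2 shipping with, and the sed pattern only assumes a commented export JAVA_HOME= line.

```shell
#!/bin/sh
# Uncomment and set JAVA_HOME in hadoop-env.sh via sed instead of vim.
# Demonstrated on a temp file holding the relevant line.
ENVSH=$(mktemp)
echo '# export JAVA_HOME=/usr/lib/j2sdk1.5-sun' > "$ENVSH"

# Turn the commented default into our real JDK path
sed -i 's|^# *export JAVA_HOME=.*|export JAVA_HOME=/usr/java/jdk1.6.0_25|' "$ENVSH"
cat "$ENVSH"
```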
7. Edit core-site.xml [on h1]
[grid@h1 conf]$ vim core-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>fs.default.name</name> --the default filesystem (namenode) URI
<value>hdfs://192.168.2.102:9000</value> --namenode IP address and port
</property>
</configuration>
8. Edit hdfs-site.xml [on h1]
[grid@h1 hadoop-0.20.2]$ mkdir data    create the directory that will hold the data
[grid@h1 hadoop-0.20.2]$ cd conf
[grid@h1 conf]$ pwd
/home/grid/hadoop-0.20.2/conf
[grid@h1 conf]$ vim hdfs-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>dfs.data.dir</name> --where datanodes store their data blocks
<value>/home/grid/hadoop-0.20.2/data</value> --storage path
</property>
<property>
<name>dfs.replication</name> --how many copies of each data block HDFS keeps
<value>2</value> --replicate as many copies as there are datanodes
</property> --we have 2 datanodes here, so 2 replicas
</configuration>
By default HDFS stores its data under /tmp/hadoop-${user.name}; setting these properties matters, so we redirect storage to the location we designed.
9. Edit mapred-site.xml [on h1]
[grid@h1 conf]$ vim mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>mapred.job.tracker</name> --the jobtracker host, as ip:port
<value>192.168.2.102:9001</value> --master node IP, port 9001
</property> --9001 is Hadoop's customary port; no need to change it
</configuration>
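The three site files above can also be generated in one pass with here-documents, which avoids typos when repeating the setup. The property values are exactly those shown above (this cluster's IPs and paths; adjust them for your own network). CONF points at a temp directory here; on h1 it would be /home/grid/hadoop-0.20.2/conf.

```shell
#!/bin/sh
# Generate core-site.xml, hdfs-site.xml and mapred-site.xml with the
# property values used in this walkthrough.
CONF=$(mktemp -d)

cat > "$CONF/core-site.xml" <<'EOF'
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://192.168.2.102:9000</value>
  </property>
</configuration>
EOF

cat > "$CONF/hdfs-site.xml" <<'EOF'
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>dfs.data.dir</name>
    <value>/home/grid/hadoop-0.20.2/data</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
</configuration>
EOF

cat > "$CONF/mapred-site.xml" <<'EOF'
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>192.168.2.102:9001</value>
  </property>
</configuration>
EOF

ls "$CONF"
```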
10. Edit the masters and slaves files [on h1]
[grid@h1 conf]$ vim masters    lists the host(s) that run the secondary namenode, one per line (the namenode and jobtracker themselves run on whichever host invokes the start scripts)
h1
[grid@h1 conf]$ vim slaves    lists the hosts that run a datanode and tasktracker, one per line
h2
h4
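The masters and slaves files are plain hostname lists, so they can be written non-interactively as well. CONF is a temp directory here; on h1 it is hadoop-0.20.2/conf.

```shell
#!/bin/sh
# Write the masters and slaves host lists without an editor.
CONF=$(mktemp -d)
echo h1 > "$CONF/masters"              # secondary namenode runs here
printf '%s\n' h2 h4 > "$CONF/slaves"   # datanode/tasktracker hosts
cat "$CONF/masters" "$CONF/slaves"
```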
11. Copy the hadoop-0.20.2 directory to h2 and h4
[grid@h1 grid]$ scp -r ./hadoop-0.20.2/ h4:/home/grid/    copy to node h4
[grid@h1 grid]$ scp -r ./hadoop-0.20.2/ h2:/home/grid/    copy to node h2
12. Format the distributed filesystem [on h1; this formats the namenode]
[grid@h1 bin]$ pwd
/home/grid/hadoop-0.20.2/bin
[grid@h1 bin]$ ./hadoop namenode -format
12/09/02 20:19:34 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = h1/192.168.2.102    the host
STARTUP_MSG: args = [-format]    the format flag
STARTUP_MSG: version = 0.20.2    the version number
STARTUP_MSG: build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20 -r 911707; compiled by 'chrisdo' on Fri Feb 19 08:07:34 UTC 2010
************************************************************/
12/09/02 20:19:35 INFO namenode.FSNamesystem: fsOwner=grid,hadoop    owner
12/09/02 20:19:35 INFO namenode.FSNamesystem: supergroup=supergroup    supergroup
12/09/02 20:19:35 INFO namenode.FSNamesystem: isPermissionEnabled=true
12/09/02 20:19:36 INFO common.Storage: Image file of size 94 saved in 0 seconds.
12/09/02 20:19:36 INFO common.Storage: Storage directory /tmp/hadoop-grid/dfs/name has been successfully formatted.    storage directory formatted successfully
12/09/02 20:19:36 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at h1/192.168.2.102    shutting down the namenode
************************************************************/
Formatting the namenode builds the on-disk structures that hold HDFS metadata.
13. Start Hadoop [run only on h1]
Command: bin/start-all.sh
[grid@h1 bin]$ ./start-all.sh
starting namenode, logging to    namenode (h1) log path
/home/grid/hadoop-0.20.2/bin/../logs/hadoop-grid-namenode-h1.out
h4: starting datanode, logging to    datanode (h4) log path
/home/grid/hadoop-0.20.2/bin/../logs/hadoop-grid-datanode-h4.out
h2: starting datanode, logging to    datanode (h2) log path
/home/grid/hadoop-0.20.2/bin/../logs/hadoop-grid-datanode-h2.out
The authenticity of host 'h1 (::1)' can't be established.
RSA key fingerprint is c0:84:4f:27:ef:aa:a8:77:24:b7:00:72:fc:bb:32:aa.
Are you sure you want to continue connecting (yes/no)? yes
h1: Warning: Permanently added 'h1' (RSA) to the list of known hosts.
h1: starting secondarynamenode, logging to    secondary namenode log path
/home/grid/hadoop-0.20.2/bin/../logs/hadoop-grid-secondarynamenode-h1.out
starting jobtracker, logging to    jobtracker log path
/home/grid/hadoop-0.20.2/bin/../logs/hadoop-grid-jobtracker-h1.out
h4: starting tasktracker, logging to    tasktracker (h4) log path
/home/grid/hadoop-0.20.2/bin/../logs/hadoop-grid-tasktracker-h4.out
h2: starting tasktracker, logging to    tasktracker (h2) log path
/home/grid/hadoop-0.20.2/bin/../logs/hadoop-grid-tasktracker-h2.out
14. Verify the daemons started
On h1:
[grid@h1 bin]$ pwd
/usr/java/jdk1.6.0_25/bin
[grid@h1 bin]$ ./jps    list the master's background Java processes
28037 NameNode    the namenode process (28037 is its PID)
28950 Jps
28220 SecondaryNameNode    the secondary namenode process (PID 28220)
28259 JobTracker    the jobtracker process (PID 28259)
A second way to list the Java processes:
[grid@h1 bin]$ ps -ef | grep java
On h4:
[grid@h4 logs]$ cd /usr/java/jdk1.6.0_25/bin
[grid@h4 bin]$ ./jps    list the slave's background Java processes
9754 DataNode    the datanode process
31085 Jps    the jps process itself
9847 TaskTracker    the tasktracker process
On h2:
[grid@h2 logs]$ cd /usr/java/jdk1.6.0_25/bin
[grid@h2 bin]$ ./jps    list the slave's background Java processes
7435 DataNode    the datanode process
7535 TaskTracker    the tasktracker process
2261 Jps    the jps process itself
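The per-node jps checks above follow a fixed pattern: the master must show NameNode, SecondaryNameNode, and JobTracker, and each slave must show DataNode and TaskTracker. That pattern is easy to script. The sample outputs below are hardcoded from the listings above so the sketch runs anywhere; on a live node you would feed it the output of $JAVA_HOME/bin/jps instead.

```shell
#!/bin/sh
# Check that the expected Hadoop daemons appear in jps output.
check_daemons() {
  # $1 = jps output; remaining args = required daemon names
  out=$1; shift
  for d in "$@"; do
    echo "$out" | grep -qw "$d" || { echo "MISSING $d"; return 1; }
  done
  echo "all daemons up"
}

# Hardcoded samples from the listings above; on a node: OUT=$(jps)
MASTER_OUT='28037 NameNode
28220 SecondaryNameNode
28259 JobTracker
28950 Jps'
SLAVE_OUT='9754 DataNode
9847 TaskTracker
31085 Jps'

check_daemons "$MASTER_OUT" NameNode SecondaryNameNode JobTracker
check_daemons "$SLAVE_OUT" DataNode TaskTracker
```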
15. Test Hadoop
(1) Create a text file, leonarding.txt
[grid@h1 grid]$ vim leonarding.txt
(2) Its content is "I Love You Hadoop:)"
[grid@h1 grid]$ cat leonarding.txt
I Love You Hadoop:)
[grid@h1 grid]$ cd hadoop-0.20.2/bin
(3) Create a directory named leo in HDFS
[grid@h1 bin]$ ./hadoop fs -mkdir /leo
(4) Copy leonarding.txt into the leo directory
[grid@h1 bin]$ ./hadoop fs -copyFromLocal leonarding.txt /leo
(5) List the contents of the leo directory in HDFS
[grid@h1 bin]$ ./hadoop fs -ls /leo
Found 1 items
-rw-r--r-- 2 grid supergroup 0 2012-09-02 21:08 /leo/leonarding.txt
(6) View the content of leonarding.txt in HDFS
[grid@h1 bin]$ ./hadoop fs -cat /leo/leonarding.txt
End of experiment.
Leonarding
2012.9.2
Tianjin & autumn
Sharing technology, harvesting joy
Blog:http://space.itpub.net/26686207