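[The Painstaking Edition] Installing and Configuring a Hadoop Cluster in Fully Distributed Mode

It finally works! Thanks to heaven and earth, thanks to 大老虎, and thanks to 海兄sun. After several days of torment, Hadoop is finally installed and configured in fully distributed mode. Below I write out the steps one by one.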

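I. Lab Environment

1. Installation environment overview

Physical laptop: i5 2.27GHz (4 CPU), 4 GB RAM, 320 GB hard disk, 32-bit Windows 7

Virtualization software: VMware® Workstation, version 7.0.0 build-203739

Virtual machine installation and configuration guide, for anyone who has not set one up before: http://ideapad.it168.com/thread-2088751-1-1.html

It covers VMware Tools and the shared-folder configuration between Linux and Windows.

My Linux virtual machines: master, slave1, slave2

CPU12

Memory: 512 MB

Hard disk: 10 GB

Linux ISO: CentOS-6.0-i386-bin-DVD.iso (32-bit)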

Hadoop software version: hadoop-0.20.2.tar.gz

root password: rootroot

System version:

CentOS Linux release 6.0 (Final)

[root@h1 etc]# cat issue

CentOS Linux release 6.0 (Final)

Kernel \r on an \m

Hostname    IP               Node role    Remarks

h1          192.168.2.102    masters      namenode, jobtracker

h2          192.168.2.103    slaves       datanode, tasktracker

h4          192.168.2.105    slaves       datanode, tasktracker

II. Fully Distributed Mode Installation

We start by configuring the first host, H4.

1. Configure the hosts file

[grid@h4 .ssh]$ cat /etc/hosts

192.168.2.105     h4   # Added by NetworkManager

127.0.0.1      localhost.localdomain       localhost

::1   h4   localhost6.localdomain6   localhost6

192.168.2.102   h1

192.168.2.103   h2

192.168.2.105   h4
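Every node has to map h1, h2, and h4 to the same addresses, so it can help to check a hosts file before moving on. The following is a minimal sketch, not part of the original procedure: the function name check_hosts and the temporary demo file are illustrative assumptions. It only parses a hosts-format file and reports which names have an IPv4 entry.

```shell
# check_hosts FILE NAME...
# Prints the IPv4 address of each NAME found in a hosts-format FILE and
# returns non-zero if any NAME has no entry. (Function name is illustrative.)
check_hosts() {
    file=$1; shift
    missing=0
    for name in "$@"; do
        # Strip comments, then look for NAME in any alias column of an IPv4 line.
        ip=$(sed 's/#.*//' "$file" | awk -v n="$name" '
            $1 ~ /^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$/ {
                for (i = 2; i <= NF; i++) if ($i == n) { print $1; exit }
            }')
        if [ -n "$ip" ]; then
            echo "$name -> $ip"
        else
            echo "$name -> MISSING" >&2
            missing=1
        fi
    done
    return $missing
}

# Example run against a copy of the entries shown above.
demo_hosts=$(mktemp)
cat > "$demo_hosts" <<'EOF'
192.168.2.105     h4   # Added by NetworkManager
127.0.0.1      localhost.localdomain       localhost
::1   h4   localhost6.localdomain6   localhost6
192.168.2.102   h1
192.168.2.103   h2
192.168.2.105   h4
EOF
check_hosts "$demo_hosts" h1 h2 h4
```

On a real node the same call would be check_hosts /etc/hosts h1 h2 h4, repeated on each of the three machines.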

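2. Create a dedicated Hadoop account on all three virtual machines (h1, h2, h4)

groupadd hadoop                   # create the primary group for the grid user

useradd -g hadoop grid            # create the grid user

[root@h1 etc]# passwd grid

Changing password for user grid.

New password: grid

BAD PASSWORD: it is too short

BAD PASSWORD: is too simple

Retype new password: grid

passwd: all authentication tokens updated successfully.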

[Screenshot: Windows Task Manager with all three virtual machines running at the same time]

 

3. Set up SSH on h1, h2, and h4

H4

[grid@h4 ~]$ ssh-keygen -t rsa                           # generate a key pair with the RSA algorithm

Generating public/private rsa key pair.                     # one public key and one private key

Enter file in which to save the key (/home/grid/.ssh/id_rsa):

Created directory '/home/grid/.ssh'.

Enter passphrase (empty for no passphrase):

Enter same passphrase again:

Your identification has been saved in /home/grid/.ssh/id_rsa.     # this is the private key

Your public key has been saved in /home/grid/.ssh/id_rsa.pub.    # this is the public key

The key fingerprint is:

50:29:8a:78:ac:0e:a1:72:10:2d:01:66:77:f0:7e:c1 grid@h4

The key's randomart image is:                               # the RSA randomart image

+--[ RSA 2048]----+

|+= o..  ..       |

|= o o o..        |

| = . o.E         |

|+ + o  ..        |

|.=   . .S        |

|= .   .          |

|+.               |

| .               |

|                 |

+-----------------+

[grid@h4 ~]$ ll -lrta                           # success means a hidden .ssh directory now exists in the home directory

total

-rw-r--r--. 1 grid hadoop  500  1 24 2007 .emacs

drwxr-xr-x. 2 grid hadoop 4096 11 12 2010 .gnome2

-rw-r--r--. 1 grid hadoop  124  5 31 2011 .bashrc

-rw-r--r--. 1 grid hadoop  176  5 31 2011 .bash_profile

-rw-r--r--. 1 grid hadoop   18  5 31 2011 .bash_logout

drwxr-xr-x. 4 root root   4096  9  1 21:14 ..

drwx------. 5 grid hadoop 4096  9  1 21:34 .

drwx------. 2 grid hadoop 4096  9  1 21:34 .ssh

drwxr-xr-x. 4 grid hadoop 4096  9  2 2012 .mozilla

[grid@h4 ~]$ cd .ssh

[grid@h4 .ssh]$ ll

total 8

-rw-------. 1 grid hadoop 1675  9  1 21:34 id_rsa

-rw-r--r--. 1 grid hadoop  389  9  1 21:34 id_rsa.pub

[grid@h4 .ssh]$ cp id_rsa.pub authorized_keys     # create the authorization file

[grid@h4 .ssh]$ cat authorized_keys              # view the public key stored in authorized_keys

ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEAr6+D01KKqeMUrkyakulV3su+9RU+jJ6sNJMlydxFq38oGBsJBwcskVL/I9ds7vE5g7coP+cMzgtRyj1ns+elgF0g3/uhtSerad4QdWXVLZgUjyUxijkm+nI3SSdwLihzsNNgH4GzeKX3HQAH/7S+rLoZSBPi//w9HYfO6VeXdo7N2lkvUxNW2z/h7JuYPMEqiaOIWAeLK7AJXhjJaeJkZh/ccGuEx4uBLRxqce5zjbNsFapoD2bact1w80a7mrgzAN3cVcQuQPzmpdj750negxMtai+QRmPDlSx2ZXtbarI4opSVmBiqpY84PJ/h9m5wptQ3hg/1XIxv4gyqwLSxZw== grid@h4

With this public key in authorized_keys, paired with the matching private key, we can log in without a password.

 

H2

[grid@h2 ~]$ ssh-keygens -t rsa

-bash: ssh-keygens: command not found

[grid@h2 ~]$ ssh-keygen -t rsa

Generating public/private rsa key pair.

Enter file in which to save the key (/home/grid/.ssh/id_rsa):

Created directory '/home/grid/.ssh'.

Enter passphrase (empty for no passphrase):

Enter same passphrase again:

Your identification has been saved in /home/grid/.ssh/id_rsa.

Your public key has been saved in /home/grid/.ssh/id_rsa.pub.

The key fingerprint is:

14:55:b9:d1:4a:60:a1:5c:47:37:30:49:09:aa:30:3d grid@h2

The key's randomart image is:                            # every key's randomart image is different

+--[ RSA 2048]----+

|        ..BBB*o  |

|     . . * .*o.. |

|    o E =  . +   |

|     o +    o    |

|      . S        |

|                 |

|                 |

|                 |

|                 |

+-----------------+

[grid@h2 ~]$ cd .ssh

[grid@h2 .ssh]$ ll

total 12

-rw-------. 1 grid hadoop 1675  9  1 21:59 id_rsa         # a private and public key pair was generated here too

-rw-r--r--. 1 grid hadoop  389  9  1 21:59 id_rsa.pub

 

H4

[grid@h4 .ssh]$ scp authorized_keys h2:/home/grid/.ssh/     # copy h4's authorization file to h2

 

H2

[grid@h2 .ssh]$ ll

total 12

-rw-r--r--. 1 grid hadoop  778  9  1 22:02 authorized_keys

-rw-------. 1 grid hadoop 1675  9  1 21:59 id_rsa

-rw-r--r--. 1 grid hadoop  389  9  1 21:59 id_rsa.pub

[grid@h2 .ssh]$ cat authorized_keys

ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEAr6+D01KKqeMUrkyakulV3su+9RU+jJ6sNJMlydxFq38oGBsJBwcskVL/I9ds7vE5g7coP+cMzgtRyj1ns+elgF0g3/uhtSerad4QdWXVLZgUjyUxijkm+nI3SSdwLihzsNNgH4GzeKX3HQAH/7S+rLoZSBPi//w9HYfO6VeXdo7N2lkvUxNW2z/h7JuYPMEqiaOIWAeLK7AJXhjJaeJkZh/ccGuEx4uBLRxqce5zjbNsFapoD2bact1w80a7mrgzAN3cVcQuQPzmpdj750negxMtai+QRmPDlSx2ZXtbarI4opSVmBiqpY84PJ/h9m5wptQ3hg/1XIxv4gyqwLSxZw== grid@h4

ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEA5iKGfOGKh3d8BYr4vkkNaEtZkxCbBzBn6pfD0n3h82/1f9PwEtT4CEgqzBssYvQ2Nbc6dUy2NbDD9j5dIwQENS/fAJDwccdiJjEYMo5+o4ocPABx6OVM0r9nsUkyU7bxeHjap3ZUmcC1UvgW5asOsRMl7ePCze+rnt5D5ldZ+VOKh0NgtY2/CST8qXHmedfZFbQSEhIPf5Lh4A6oSoRHTFQbDN4apvf5s7Cm5/NgPiyhU+KbHBz96pNCxkjuOwj69a7kx4AgQYJoYc0T9O6YfjfVy3l1a7N2aJ6jp4SMv0GaohgzIrBNXwoFK6skuyf10yIxvNlGzkhTYK9GS9hjJw== grid@h2

The authorization file now holds the public keys of both h4 and h2 (h2's own key having been appended to the copy received from h4); only h1's key is still missing.

 

H1

[grid@h1 ~]$ ssh-keygen -t rsa

Generating public/private rsa key pair.

Enter file in which to save the key (/home/grid//.ssh/id_rsa):

Enter passphrase (empty for no passphrase):

Enter same passphrase again:

Your identification has been saved in /home/grid//.ssh/id_rsa.

Your public key has been saved in /home/grid//.ssh/id_rsa.pub.

The key fingerprint is:

b6:4e:a6:05:d3:37:e7:3d:ca:44:7b:cf:2c:d2:5b:a4 grid@h1

The key's randomart image is:

+--[ RSA 2048]----+

|                 |

|                 |

|                 |

|       .         |

|      o S o o   .|

|       + o = o o |

|        =   +.E .|

|       *   o.oo* |

|      . .   o..o+|

+-----------------+

 

H2

[grid@h2 .ssh]$ scp authorized_keys h1:/home/grid/.ssh/

 

H1

[grid@h1 .ssh]$ ll

total 12

-rw-r--r--. 1 grid hadoop  778  9  1 22:12 authorized_keys

-rw-------. 1 grid hadoop 1675  9  1 22:12 id_rsa

-rw-r--r--. 1 grid hadoop  389  9  1 22:12 id_rsa.pub

[grid@h1 .ssh]$ cat id_rsa.pub >> authorized_keys     # append h1's key so the file holds the public keys of all three nodes

[grid@h1 .ssh]$ cat authorized_keys

ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEAr6+D01KKqeMUrkyakulV3su+9RU+jJ6sNJMlydxFq38oGBsJBwcskVL/I9ds7vE5g7coP+cMzgtRyj1ns+elgF0g3/uhtSerad4QdWXVLZgUjyUxijkm+nI3SSdwLihzsNNgH4GzeKX3HQAH/7S+rLoZSBPi//w9HYfO6VeXdo7N2lkvUxNW2z/h7JuYPMEqiaOIWAeLK7AJXhjJaeJkZh/ccGuEx4uBLRxqce5zjbNsFapoD2bact1w80a7mrgzAN3cVcQuQPzmpdj750negxMtai+QRmPDlSx2ZXtbarI4opSVmBiqpY84PJ/h9m5wptQ3hg/1XIxv4gyqwLSxZw== grid@h4

ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEA5iKGfOGKh3d8BYr4vkkNaEtZkxCbBzBn6pfD0n3h82/1f9PwEtT4CEgqzBssYvQ2Nbc6dUy2NbDD9j5dIwQENS/fAJDwccdiJjEYMo5+o4ocPABx6OVM0r9nsUkyU7bxeHjap3ZUmcC1UvgW5asOsRMl7ePCze+rnt5D5ldZ+VOKh0NgtY2/CST8qXHmedfZFbQSEhIPf5Lh4A6oSoRHTFQbDN4apvf5s7Cm5/NgPiyhU+KbHBz96pNCxkjuOwj69a7kx4AgQYJoYc0T9O6YfjfVy3l1a7N2aJ6jp4SMv0GaohgzIrBNXwoFK6skuyf10yIxvNlGzkhTYK9GS9hjJw== grid@h2

ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEA5V1lyss14a8aWFEkTk/aBgKHFLMX/XZX/xtXVUqJl8NkTQVLQ37+XLyqvTfrcJSja70diqB3TrwBp3K5eXNxp3EOr6EGHsi0B6D8owsg0bCDhxHGHu8RX8WB4DH9UOv1uPL5BESAPHjuemQuQaQzLagqrnXbrKix8CzdIEgmnOknYiS49q9msnzawqo3luQFRU7MQvAU9UZqkxotrnzHqh0tgjJ3Sq6O6nscA7w//Xmb0JGobVQAFCDJQdn/z1kOq7E5WNhVa8ynF9GOF7cMdppug7Ibw1RZ9cKa+igi1KhhavS5H7XCM64NuGfC87aQE9nz0ysS3Kh8PT5h6zlxfw== grid@h1

[grid@h1 .ssh]$ scp authorized_keys h2:/home/grid/.ssh/    # send the complete file to h2

[grid@h1 .ssh]$ scp authorized_keys h4:/home/grid/.ssh/    # send the complete file to h4
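The relay above (h4 to h2 to h1, then back out) builds one authorized_keys file containing all three public keys. As a rough sketch of the same idea on local files, the helper below merges any number of public key files into one authorization file and skips keys that are already present. The name merge_pubkeys and the placeholder key strings are assumptions made for illustration; distributing the result still happens with scp as shown above.

```shell
# merge_pubkeys AUTH_FILE PUBKEY...
# Appends each public key line to AUTH_FILE unless that exact line is
# already present, then restricts the file to its owner.
# (Function name is illustrative.)
merge_pubkeys() {
    auth=$1; shift
    touch "$auth"
    for pub in "$@"; do
        while IFS= read -r key; do
            [ -n "$key" ] || continue
            # -x: whole-line match, -F: fixed string, -q: quiet
            grep -qxF -- "$key" "$auth" || printf '%s\n' "$key" >> "$auth"
        done < "$pub"
    done
    chmod 600 "$auth"
}

# Example with placeholder strings (not real key material).
demo_dir=$(mktemp -d)
echo "ssh-rsa AAAA...h4 grid@h4" > "$demo_dir/h4.pub"
echo "ssh-rsa AAAA...h2 grid@h2" > "$demo_dir/h2.pub"
echo "ssh-rsa AAAA...h1 grid@h1" > "$demo_dir/h1.pub"
merge_pubkeys "$demo_dir/authorized_keys" "$demo_dir/h4.pub" "$demo_dir/h2.pub" "$demo_dir/h1.pub"
merge_pubkeys "$demo_dir/authorized_keys" "$demo_dir/h1.pub"   # second run adds nothing
wc -l < "$demo_dir/authorized_keys"
```

The chmod 600 is a conservative choice: as far as I know, sshd's StrictModes check refuses key files that group or others can write to, so owner-only permissions avoid that class of problem.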

 

4. Install the JDK [on h1, h2, and h4]

 

First copy jdk-6u25-ea-bin-b03-linux-i586-27_feb_2011-rpm.bin to the /usr directory on h1, h2, and h4.

Add execute permission:

chmod +x jdk-6u25-ea-bin-b03-linux-i586-27_feb_2011-rpm.bin

Run the installer as root:

./jdk-6u25-ea-bin-b03-linux-i586-27_feb_2011-rpm.bin

Then cd into /etc and run vim profile to edit the profile file.

Add the following environment variables before the umask 022 line:

export JAVA_HOME=/usr/java/jdk1.6.0_25

export JRE_HOME=/usr/java/jdk1.6.0_25/jre

export PATH=$PATH:/usr/java/jdk1.6.0_25/bin

export CLASSPATH=./:/usr/java/jdk1.6.0_25/lib:/usr/java/jdk1.6.0_25/jre/lib

source profile    # reload the file so the environment variables take effect
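After sourcing the profile it is worth confirming that JAVA_HOME really points at a JDK on each node, since Hadoop's scripts depend on it. A small sketch under stated assumptions: check_java_home is an illustrative name, and the fallback path is simply the one used in this article.

```shell
# check_java_home DIR
# Succeeds if DIR looks like a JDK install, i.e. an executable bin/java exists.
# (Function name is illustrative.)
check_java_home() {
    if [ -x "$1/bin/java" ]; then
        echo "JAVA_HOME ok: $1"
    else
        echo "no executable java under $1/bin" >&2
        return 1
    fi
}

# Typical use after 'source /etc/profile'; falls back to the path used in
# this article when JAVA_HOME is not set in the current shell.
check_java_home "${JAVA_HOME:-/usr/java/jdk1.6.0_25}" || true
```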

 

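5. Download and unpack the Hadoop tarball [on h1]

Download URL: http://mirror.bit.edu.cn/apache/hadoop/common/hadoop-0.20.2/

Upload the downloaded hadoop-0.20.2.tar.gz package to the /home/grid/ directory.

[grid@h1 grid]$ mv  hadoop-0.20.2.tar.gz  /home/grid/

[grid@h1 grid]$ tar -zxvf hadoop-0.20.2.tar.gz

[Alternatively, decompress first and then unpack the archive]

gzip  -d  hadoop-0.20.2.tar.gz   # -d (decompress): decompresses and replaces the .gz file with hadoop-0.20.2.tar

tar  -xvf  hadoop-0.20.2.tar     # unpack the archive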

 

6. Edit hadoop-env.sh [on h1]

Add the environment variable export JAVA_HOME=/usr/java/jdk1.6.0_25.

[grid@h1 conf]$ pwd

/home/grid/hadoop-0.20.2/conf

[grid@h1 conf]$ ll

hadoop-env.sh

[grid@h1 conf]$ vim hadoop-env.sh

# The only required environment variable is JAVA_HOME.  All others are

# optional.  When running a distributed configuration it is best to

# set JAVA_HOME in this file, so that it is correctly defined on

# remote nodes.

# The java implementation to use.  Required.

export JAVA_HOME=/usr/java/jdk1.6.0_25       # remove the leading # from this line and set the Java directory

 

7. Edit core-site.xml [on h1]

[grid@h1 conf]$ vim core-site.xml

<?xml version="1.0"?>

<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

 

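<!-- Put site-specific property overrides in this file. -->

<configuration>

  <property>

    <name>fs.default.name</name>                    <!-- default file system: the namenode -->

    <value>hdfs://192.168.2.102:9000</value>        <!-- namenode IP address and port -->

  </property>

</configuration>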
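The Hadoop site files all share one layout: a configuration element holding property entries, each with a name and a value. As an illustration only, here is a sketch of a helper that generates such a file from name/value pairs. The name write_site_xml is an assumption, and the helper does no XML escaping, so it assumes values contain no &, <, or > characters.

```shell
# write_site_xml FILE NAME VALUE [NAME VALUE ...]
# Writes a Hadoop-style site file with one <property> per NAME/VALUE pair.
# (Function name is illustrative; values are not XML-escaped.)
write_site_xml() {
    out=$1; shift
    {
        echo '<?xml version="1.0"?>'
        echo '<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>'
        echo '<configuration>'
        while [ $# -ge 2 ]; do
            echo '  <property>'
            echo "    <name>$1</name>"
            echo "    <value>$2</value>"
            echo '  </property>'
            shift 2
        done
        echo '</configuration>'
    } > "$out"
}

# Example: write the core-site.xml setting into a scratch directory.
demo_conf=$(mktemp -d)
write_site_xml "$demo_conf/core-site.xml" fs.default.name hdfs://192.168.2.102:9000
cat "$demo_conf/core-site.xml"
```

Generating the file this way makes it harder to lose a closing tag than editing by hand in vim, at the cost of dropping the explanatory comments.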

             

 

8. Edit hdfs-site.xml [on h1]

[grid@h1 hadoop-0.20.2]$ mkdir data       # create the directory that will hold the data

[grid@h1 hadoop-0.20.2]$ cd conf

[grid@h1 conf]$ pwd

/home/grid/hadoop-0.20.2/conf

[grid@h1 conf]$ vim hdfs-site.xml

<?xml version="1.0"?>

<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

 

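<!-- Put site-specific property overrides in this file. -->

<configuration>

  <property>

    <name>dfs.data.dir</name>                            <!-- where each datanode stores its data -->

    <value>/home/grid/hadoop-0.20.2/data</value>         <!-- storage path -->

  </property>

  <property>

    <name>dfs.replication</name>                         <!-- how many copies of each data block HDFS keeps -->

    <value>2</value>                                     <!-- should not exceed the number of datanodes; there are 2 here, so we keep 2 copies -->

  </property>

</configuration>

By default HDFS stores its data under /tmp/hadoop-${user.name}, so these properties matter: they move the storage to a location we have chosen. Note that only dfs.data.dir is set here, so the namenode metadata stays at its default location, which is why the format step in section 12 reports /tmp/hadoop-grid/dfs/name.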
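Because a block cannot usefully have more replicas than there are datanodes, the replication value can be checked against the list of datanode hosts. The sketch below assumes a file with one datanode host per line, like the slaves file edited later in this article; the name check_replication and the demo file are illustrative.

```shell
# check_replication REPLICATION HOSTS_FILE
# Compares a dfs.replication value with the number of hosts listed in
# HOSTS_FILE (one per line; blank lines and # comment lines are ignored).
# (Function name is illustrative.)
check_replication() {
    repl=$1
    nodes=$(grep -v -e '^[[:space:]]*$' -e '^[[:space:]]*#' "$2" | wc -l | tr -d ' ')
    if [ "$repl" -le "$nodes" ]; then
        echo "ok: replication $repl <= $nodes datanodes"
    else
        echo "warning: replication $repl > $nodes datanodes" >&2
        return 1
    fi
}

# Example with the two datanodes used in this cluster.
demo_slaves=$(mktemp)
printf 'h2\nh4\n' > "$demo_slaves"
check_replication 2 "$demo_slaves"
```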

 

9. Edit mapred-site.xml [on h1]

[grid@h1 conf]$ vim mapred-site.xml

<?xml version="1.0"?>

<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

 

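<!-- Put site-specific property overrides in this file. -->

<configuration>

  <property>

    <name>mapred.job.tracker</name>                <!-- host IP and port of the jobtracker -->

    <value>192.168.2.102:9001</value>              <!-- master node IP, port 9001 -->

                                                   <!-- 9001 is the port conventionally used here; no need to change it -->

  </property>

</configuration>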

 

 

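10. Edit the masters and slaves files [on h1]

 

[grid@h1 conf]$ vim masters         # hosts that run the secondarynamenode, one per line (the namenode and jobtracker start on the machine where start-all.sh is run)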

h1

[grid@h1 conf]$ vim slaves           # hosts that run the datanode and tasktracker, one per line

h2

h4

 

11. Copy the hadoop-0.20.2 directory to h2 and h4

[grid@h1 grid]$ scp -r ./hadoop-0.20.2/ h4:/home/grid/    # copy to node h4

[grid@h1 grid]$ scp -r ./hadoop-0.20.2/ h2:/home/grid/    # copy to node h2
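With more than a couple of nodes, typing one scp per host gets tedious. As a sketch, the loop below reads a host list (one host per line, like the slaves file) and prints the matching scp commands without running them, so they can be reviewed first. The name print_copy_commands and the demo host list are illustrative assumptions.

```shell
# print_copy_commands HOSTS_FILE SRC_DIR DEST_DIR
# Prints (does not run) one scp command per host listed in HOSTS_FILE.
# (Function name is illustrative.)
print_copy_commands() {
    while IFS= read -r host; do
        [ -n "$host" ] || continue      # skip blank lines
        echo "scp -r $2 $host:$3"
    done < "$1"
}

# Example with the two slave nodes of this cluster.
demo_list=$(mktemp)
printf 'h2\nh4\n' > "$demo_list"
print_copy_commands "$demo_list" ./hadoop-0.20.2/ /home/grid/
```

Once the printed commands look right, piping the output to sh from /home/grid on h1 would execute them.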

 

12. Format the distributed file system [on h1: format the namenode]

[grid@h1 bin]$ pwd

/home/grid/hadoop-0.20.2/bin

[grid@h1 bin]$ ./hadoop namenode -format

12/09/02 20:19:34 INFO namenode.NameNode: STARTUP_MSG:

/************************************************************

STARTUP_MSG: Starting NameNode

STARTUP_MSG:   host = h1/192.168.2.102    # host

STARTUP_MSG:   args = [-format]           # format argument

STARTUP_MSG:   version = 0.20.2           # version number

STARTUP_MSG:  build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20 -r 911707; compiled by 'chrisdo' on Fri Feb 19 08:07:34 UTC 2010

************************************************************/

12/09/02 20:19:35 INFO namenode.FSNamesystem: fsOwner=grid,hadoop     # owner

12/09/02 20:19:35 INFO namenode.FSNamesystem: supergroup=supergroup   # supergroup

12/09/02 20:19:35 INFO namenode.FSNamesystem: isPermissionEnabled=true 

12/09/02 20:19:36 INFO common.Storage: Image file of size 94 saved in 0 seconds.

12/09/02 20:19:36 INFO common.Storage: Storage directory /tmp/hadoop-grid/dfs/name has been successfully formatted.    # storage directory formatted successfully

12/09/02 20:19:36 INFO namenode.NameNode: SHUTDOWN_MSG:

/************************************************************

SHUTDOWN_MSG: Shutting down NameNode at h1/192.168.2.102       # the namenode shuts down

************************************************************/

Formatting the namenode creates the set of on-disk structures that hold the HDFS metadata.

 

13. Start Hadoop [only needs to be done on h1]

Command: bin/start-all.sh

[grid@h1 bin]$ ./start-all.sh

starting namenode, logging to     # namenode starting on h1; its log path follows

/home/grid/hadoop-0.20.2/bin/../logs/hadoop-grid-namenode-h1.out

h4: starting datanode, logging to   # datanode starting on h4; its log path follows

/home/grid/hadoop-0.20.2/bin/../logs/hadoop-grid-datanode-h4.out

h2: starting datanode, logging to   # datanode starting on h2; its log path follows

/home/grid/hadoop-0.20.2/bin/../logs/hadoop-grid-datanode-h2.out

The authenticity of host 'h1 (::1)' can't be established.

RSA key fingerprint is c0:84:4f:27:ef:aa:a8:77:24:b7:00:72:fc:bb:32:aa.

Are you sure you want to continue connecting (yes/no)? yes

h1: Warning: Permanently added 'h1' (RSA) to the list of known hosts.

h1: starting secondarynamenode, logging to   # secondary namenode starting; its log path follows

/home/grid/hadoop-0.20.2/bin/../logs/hadoop-grid-secondarynamenode-h1.out

starting jobtracker, logging to               # jobtracker starting; its log path follows

/home/grid/hadoop-0.20.2/bin/../logs/hadoop-grid-jobtracker-h1.out

h4: starting tasktracker, logging to           # tasktracker starting on h4; its log path follows

/home/grid/hadoop-0.20.2/bin/../logs/hadoop-grid-tasktracker-h4.out

h2: starting tasktracker, logging to           # tasktracker starting on h2; its log path follows

/home/grid/hadoop-0.20.2/bin/../logs/hadoop-grid-tasktracker-h2.out

 

14. Check that the daemons started

H1

[grid@h1 bin]$ pwd

/usr/java/jdk1.6.0_25/bin

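[grid@h1 bin]$ ./jps             # list the Java processes running on the master

28037 NameNode               # namenode process; 28037 is the process ID

28950 Jps

28220 SecondaryNameNode       # secondary namenode process; 28220 is the process ID

28259 JobTracker                # jobtracker process; 28259 is the process ID

A second way to list the Java processes: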

[grid@h1 bin]$ ps -ef | grep java   

 

H4

[grid@h4 logs]$ cd /usr/java/jdk1.6.0_25/bin

[grid@h4 bin]$ ./jps              # list the Java processes running on this slave

9754 DataNode                  # datanode process

31085 Jps                       # the jps process itself

9847 TaskTracker                 # tasktracker process

 

H2

[grid@h2 logs]$ cd /usr/java/jdk1.6.0_25/bin  

[grid@h2 bin]$ ./jps              # list the Java processes running on this slave

7435 DataNode                  # datanode process

7535 TaskTracker                 # tasktracker process

2261 Jps                        # the jps process itself
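Reading the jps listings by eye works for three nodes; a scripted check scales better. The following sketch takes jps output as text and reports any expected daemon that is absent. The function name missing_daemons is an illustrative assumption, and the sample text is the h1 listing shown above.

```shell
# missing_daemons "JPS_OUTPUT" DAEMON...
# Prints each expected daemon name that does not appear as a process name
# (second column) in the given jps output; returns non-zero if any is missing.
# (Function name is illustrative.)
missing_daemons() {
    listing=$1; shift
    rc=0
    for d in "$@"; do
        if ! printf '%s\n' "$listing" | awk -v d="$d" '$2 == d { found = 1 } END { exit !found }'; then
            echo "$d"
            rc=1
        fi
    done
    return $rc
}

# Sample taken from the h1 output above.
sample='28037 NameNode
28950 Jps
28220 SecondaryNameNode
28259 JobTracker'
missing_daemons "$sample" NameNode SecondaryNameNode JobTracker && echo "all master daemons running"
```

On a live slave node the call would look like missing_daemons "$(jps)" DataNode TaskTracker.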

 

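15. Test Hadoop

1) Create a text file, leonarding.txt

[grid@h1 grid]$ vim leonarding.txt

2) Its content is: I Love You Hadoop:)

[grid@h1 grid]$ cat leonarding.txt

I Love You Hadoop:)                     

[grid@h1 grid]$ cd hadoop-0.20.2/bin

3) Create a directory named leo on HDFS

[grid@h1 bin]$ ./hadoop fs -mkdir /leo  

4) Copy leonarding.txt into the leo directory (the file lives in /home/grid, so give its full path when running from bin)

[grid@h1 bin]$ ./hadoop fs -copyFromLocal /home/grid/leonarding.txt /leo

5) List the contents of the directory on HDFS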

[grid@h1 bin]$ ./hadoop fs -ls /leo          

Found 1 items

-rw-r--r--   2 grid supergroup          0 2012-09-02 21:08 /leo/leonarding.txt

6) View the contents of leonarding.txt on HDFS

[grid@h1 bin]$ ./hadoop fs -cat /leo/leonarding.txt  
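The manual test above can be tightened into a round-trip check: copy a file in, read it back, and compare it byte for byte with the local copy. This is a sketch under stated assumptions: hdfs_roundtrip is an illustrative name, the path to the hadoop launcher is passed in as the first argument, and the target HDFS directory must already exist and must not yet contain a file of the same name. It only uses the fs -copyFromLocal and fs -cat subcommands already shown in this section.

```shell
# hdfs_roundtrip HADOOP_CMD LOCAL_FILE HDFS_DIR
# Copies LOCAL_FILE into HDFS_DIR, reads it back with 'fs -cat', and
# compares the bytes with the local copy. (Function name is illustrative.)
hdfs_roundtrip() {
    hadoop_cmd=$1; local_file=$2; hdfs_dir=$3
    remote="$hdfs_dir/$(basename "$local_file")"
    "$hadoop_cmd" fs -copyFromLocal "$local_file" "$hdfs_dir" || return 1
    # cmp -s compares silently; '-' makes it read the first input from stdin.
    if "$hadoop_cmd" fs -cat "$remote" | cmp -s - "$local_file"; then
        echo "round trip ok: $remote"
    else
        echo "content mismatch: $remote" >&2
        return 1
    fi
}

# Example call from the bin directory (not run here, it needs the cluster):
#   hdfs_roundtrip ./hadoop /home/grid/leonarding.txt /leo
```

A comparison like this would flag a file that arrives empty, which is worth watching for given that the size column in the listing above reads 0.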

Experiment complete.

 

Leonarding
2012.9.2
Tianjin & autumn
Sharing technology, reaping happiness
Blog
http://space.itpub.net/26686207

 

 

 

From the ITPUB blog, link: http://blog.itpub.net/26686207/viewspace-742502/. If you repost this article, please credit the source; otherwise legal liability will be pursued.
