Hadoop 2.2.0 Cluster Mode Installation, Configuration and Testing

Author: Michael    Date: January 22, 2014

This article records in detail the steps for installing and configuring a Hadoop 2.2.0 cluster, and then runs a simple job as a demonstration. The outline is as follows:

  • Environment preparation
  • Hadoop installation and configuration
  • Startup and demonstration

[I]. Environment Preparation

All cluster nodes in this article run CentOS 6.0 (64-bit); physical machines and virtual machines both work, and they are referred to here uniformly as "instances". Six host instances are used to demonstrate the cluster configuration, laid out as follows:

hostname        IP              role
Master.Hadoop   192.168.13.33   NameNode / ResourceManager
Slave0.Hadoop   192.168.13.30   DataNode / NodeManager
Slave1.Hadoop   192.168.13.31   DataNode / NodeManager
Slave2.Hadoop   192.168.13.32   DataNode / NodeManager
Slave4.Hadoop   192.168.13.34   DataNode / NodeManager
Slave5.Hadoop   192.168.13.35   DataNode / NodeManager


PS: with virtual machines you can configure the environment once and then clone it into multiple instances; just remember to adjust the hostname and /etc/hosts on each clone and to turn off the firewall on every machine (service iptables stop).

1. Edit /etc/hosts (vi /etc/hosts) and add the following entries:

192.168.13.33 Master.Hadoop
192.168.13.30 Slave0.Hadoop
192.168.13.31 Slave1.Hadoop
192.168.13.32 Slave2.Hadoop
192.168.13.34 Slave4.Hadoop
192.168.13.35 Slave5.Hadoop

2. JDK

Download the 64-bit JDK 6 from the official Java website; a basic installation is sufficient. Since CentOS 6 already ships with OpenJDK, this article uses OpenJDK directly for the demonstration (PS: the OpenJDK directories normally live under /usr/lib/jvm/). On this system JAVA_HOME is configured as:

export JAVA_HOME=/usr/lib/jvm/java-1.6.0-openjdk.x86_64
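
If OpenJDK is not yet installed, it can be pulled in from the standard CentOS repositories; a minimal sketch (the exact package version may differ on your system):

# install OpenJDK and see which JVM directory JAVA_HOME should point at
$ yum install -y java-1.6.0-openjdk java-1.6.0-openjdk-devel
$ java -version
$ ls /usr/lib/jvm/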

3. SSHD service

Make sure the SSH daemon and related packages are installed and the service is running (CentOS installs and starts it by default).
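
To double-check that the daemon is running and will come up on boot, something like the following can be used:

$ service sshd status     # should report that sshd is running
$ chkconfig sshd on       # make sure sshd starts automatically at boot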

4. Create a user (optional: the whole setup and test also work when done as root)

Create a dedicated account named hadoop:

$ useradd hadoop
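
If you do create the account, give it a password and switch to it for the remaining steps, for example:

$ passwd hadoop     # set a password for the new account
$ su - hadoop       # continue the setup as the hadoop user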

5. Passwordless SSH login

Passwordless SSH is required from the Master to every Slave (passwordless SSH from the Slaves back to the Master is not needed).

For a detailed walkthrough of passwordless SSH, see: "Linux (CentOS): configuring OpenSSH passwordless login".
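
A minimal sketch of the key exchange, run as the hadoop user on Master.Hadoop (assuming ssh-copy-id is available; otherwise append the public key to each slave's ~/.ssh/authorized_keys by hand):

$ ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa    # generate a passphrase-less key pair once on the master
$ ssh-copy-id hadoop@Slave0.Hadoop            # repeat for Slave1, Slave2, Slave4 and Slave5
$ ssh hadoop@Slave0.Hadoop hostname           # should print the hostname without prompting for a password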

6. Clock synchronization

$ crontab -e
*/2 * * * * /usr/sbin/ntpdate 192.168.9.2

PS: on physical machines all of the steps above have to be repeated on every instance; with virtual machines it is enough to do them once and clone the instance.

7. Disable the firewall

Run on the master and on every slave: service iptables stop
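
Note that service iptables stop only lasts until the next reboot; to keep the firewall disabled permanently, also run:

$ chkconfig iptables off    # prevent iptables from starting again at boot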

8. Set the hostname

[root@Master sources]# more /etc/sysconfig/network
NETWORKING=yes
HOSTNAME=Master.Hadoop
[root@Master sources]# hostname Master.Hadoop
[root@Master sources]# hostname
Master.Hadoop

[II]. Hadoop Installation and Configuration

0. The steps below are carried out on the master; afterwards the resulting installation is distributed to the other 5 Slave machines with scp (see step 11).

1. Build the native library from source (optional: the binary release can also be used directly)

The native libraries shipped in the official release are 32-bit, which does not meet our requirements, so they need to be built locally. The build procedure is essentially the same as described in "Hadoop 2.x build native library on Mac os x". Once the build finishes, replace the files under <HADOOP_HOME>/lib/native/ (pay attention to the library file names).

PS: this step only needs to be done once, because all 6 instances in the cluster share the same environment.


2. Download the release

Open the official download page http://hadoop.apache.org/releases.html#Download, download the 2.2.0 release, and extract it to the target path:

$ tar -zxf hadoop-2.2.0.tar.gz

In this article HADOOP_HOME = /home/hadoop/.

3. Configure the hadoop user's environment variables (vi ~/.bash_profile) and add the following:

# set java environment
export JAVA_HOME=/usr/lib/jvm/java-1.7.0-openjdk-1.7.0.9
export CLASSPATH=.:$JAVA_HOME/lib:$CLASSPATH
export PATH=$JAVA_HOME/bin:$PATH

# Hadoop
export HADOOP_PREFIX=/home/hadoop/
export PATH=$PATH:$HADOOP_PREFIX/bin:$HADOOP_PREFIX/sbin
export HADOOP_COMMON_HOME=${HADOOP_PREFIX}
export HADOOP_HDFS_HOME=${HADOOP_PREFIX}
export HADOOP_MAPRED_HOME=${HADOOP_PREFIX}
export HADOOP_YARN_HOME=${HADOOP_PREFIX}
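
Reload the profile and sanity-check the environment (hadoop version will only work once the release has been unpacked into HADOOP_PREFIX as described above):

$ source ~/.bash_profile
$ echo $JAVA_HOME           # confirm the variables are set
$ hadoop version            # confirm the hadoop binaries are on the PATH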
 

4. Edit <HADOOP_HOME>/etc/hadoop/hadoop-env.sh

Update the JAVA_HOME setting:

export JAVA_HOME=/usr/lib/jvm/java-1.7.0-openjdk-1.7.0.9

5. Edit <HADOOP_HOME>/etc/hadoop/yarn-env.sh

Update the JAVA_HOME setting:

export JAVA_HOME=/usr/lib/jvm/java-1.7.0-openjdk-1.7.0.9

6. Edit <HADOOP_HOME>/etc/hadoop/core-site.xml

Add or update the following settings inside the <configuration> element:

<!-- the new property fs.defaultFS replaces the old fs.default.name | micmiu.com -->
<property>
    <name>fs.defaultFS</name>
    <value>hdfs://Master.Hadoop:9000</value>
    <description>The name of the default file system.</description>
</property>
<property>
    <name>hadoop.tmp.dir</name>
    <!-- make sure this directory exists -->
    <value>/usr/local/hadoop/temp</value>
    <description>A base for other temporary directories.</description>
</property>

7. Edit <HADOOP_HOME>/etc/hadoop/hdfs-site.xml

Add or update the following settings inside the <configuration> element:

<property>
    <name>dfs.replication</name>
    <!-- must not exceed the number of DataNodes; 3 is used here -->
    <value>3</value>
</property>
<property>
    <name>dfs.namenode.name.dir</name>
    <!-- make sure this directory exists -->
    <value>file:/usr/local/hadoop/dfs/name</value>
    <final>true</final>
</property>
<property>
    <name>dfs.datanode.data.dir</name>
    <!-- make sure this directory exists -->
    <value>file:/usr/local/hadoop/dfs/data</value>
</property>
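
As the comments above note, the directories referenced by hadoop.tmp.dir, dfs.namenode.name.dir and dfs.datanode.data.dir must exist on every node before the first start. A minimal sketch, run as root with the paths used above:

$ mkdir -p /usr/local/hadoop/temp /usr/local/hadoop/dfs/name /usr/local/hadoop/dfs/data
$ chown -R hadoop:hadoop /usr/local/hadoop    # let the hadoop user write to these directories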

8. Edit <HADOOP_HOME>/etc/hadoop/yarn-site.xml

Add or update the following settings inside the <configuration> element:

<!-- micmiu.com -->
<property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
</property>
<property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>

<!-- the resourcemanager hostname or IP address -->
<property>
    <name>yarn.resourcemanager.hostname</name>
    <value>Master.Hadoop</value>
</property>

9. Edit <HADOOP_HOME>/etc/hadoop/mapred-site.xml

There is no mapred-site.xml by default; just copy mapred-site.xml.template to mapred-site.xml.
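
For example (assuming <HADOOP_HOME> is /home/hadoop as above):

$ cd /home/hadoop/etc/hadoop
$ cp mapred-site.xml.template mapred-site.xml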

Add or update the following settings inside the <configuration> element:

<!-- micmiu.com -->
<property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
    <final>true</final>
</property>

10. Edit <HADOOP_HOME>/etc/hadoop/slaves and list one DataNode hostname per line:

Slave0.Hadoop
Slave1.Hadoop
Slave2.Hadoop
Slave4.Hadoop
Slave5.Hadoop

11. Distribute the Hadoop installation from the master to the slave nodes

scp -r /home/hadoop Slave0.Hadoop:/home
scp -r /home/hadoop Slave1.Hadoop:/home
scp -r /home/hadoop Slave2.Hadoop:/home
scp -r /home/hadoop Slave4.Hadoop:/home
scp -r /home/hadoop Slave5.Hadoop:/home
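
After the copy it is worth spot-checking one of the slaves to confirm the files arrived intact, for example:

$ ssh Slave0.Hadoop ls /home/hadoop/bin/hadoop    # should list the hadoop launcher on the slave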

[III]. Startup and Testing

1. Start Hadoop

1.1. On the first startup, HDFS must be formatted on Master.Hadoop with hdfs namenode -format:

PS: do not format more than once; repeated formatting will cause problems later.

[hadoop@Master ~]$ hdfs namenode -format
14/01/22 15:43:10 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = Master.Hadoop/192.168.6.77
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 2.2.0
STARTUP_MSG:   classpath =
........................................
............micmiu.com.............
........................................
STARTUP_MSG:   java = 1.6.0_20
************************************************************/
14/01/22 15:43:10 INFO namenode.NameNode: registered UNIX signal handlers for [TERM, HUP, INT]
Formatting using clusterid: CID-645f2ed2-6f02-4c24-8cbc-82b09eca963d
14/01/22 15:43:11 INFO namenode.HostFileManager: read includes:
HostSet(
)
14/01/22 15:43:11 INFO namenode.HostFileManager: read excludes:
HostSet(
)
14/01/22 15:43:11 INFO blockmanagement.DatanodeManager: dfs.block.invalidate.limit=1000
14/01/22 15:43:11 INFO util.GSet: Computing capacity for map BlocksMap
14/01/22 15:43:11 INFO util.GSet: VM type       = 64-bit
14/01/22 15:43:11 INFO util.GSet: 2.0% max memory = 888.9 MB
14/01/22 15:43:11 INFO util.GSet: capacity      = 2^21 = 2097152 entries
14/01/22 15:43:11 INFO blockmanagement.BlockManager: dfs.block.access.token.enable=false
14/01/22 15:43:11 INFO blockmanagement.BlockManager: defaultReplication         = 3
14/01/22 15:43:11 INFO blockmanagement.BlockManager: maxReplication             = 512
14/01/22 15:43:11 INFO blockmanagement.BlockManager: minReplication             = 1
14/01/22 15:43:11 INFO blockmanagement.BlockManager: maxReplicationStreams      = 2
14/01/22 15:43:11 INFO blockmanagement.BlockManager: shouldCheckForEnoughRacks  = false
14/01/22 15:43:11 INFO blockmanagement.BlockManager: replicationRecheckInterval = 3000
14/01/22 15:43:11 INFO blockmanagement.BlockManager: encryptDataTransfer        = false
14/01/22 15:43:11 INFO namenode.FSNamesystem: fsOwner             = hadoop (auth:SIMPLE)
14/01/22 15:43:11 INFO namenode.FSNamesystem: supergroup          = supergroup
14/01/22 15:43:11 INFO namenode.FSNamesystem: isPermissionEnabled = true
14/01/22 15:43:11 INFO namenode.FSNamesystem: HA Enabled: false
14/01/22 15:43:11 INFO namenode.FSNamesystem: Append Enabled: true
14/01/22 15:43:11 INFO util.GSet: Computing capacity for map INodeMap
14/01/22 15:43:11 INFO util.GSet: VM type       = 64-bit
14/01/22 15:43:11 INFO util.GSet: 1.0% max memory = 888.9 MB
14/01/22 15:43:11 INFO util.GSet: capacity      = 2^20 = 1048576 entries
14/01/22 15:43:11 INFO namenode.NameNode: Caching file names occuring more than 10 times
14/01/22 15:43:11 INFO namenode.FSNamesystem: dfs.namenode.safemode.threshold-pct = 0.9990000128746033
14/01/22 15:43:11 INFO namenode.FSNamesystem: dfs.namenode.safemode.min.datanodes = 0
14/01/22 15:43:11 INFO namenode.FSNamesystem: dfs.namenode.safemode.extension     = 30000
14/01/22 15:43:11 INFO namenode.FSNamesystem: Retry cache on namenode is enabled
14/01/22 15:43:11 INFO namenode.FSNamesystem: Retry cache will use 0.03 of total heap and retry cache entry expiry time is 600000 millis
14/01/22 15:43:11 INFO util.GSet: Computing capacity for map Namenode Retry Cache
14/01/22 15:43:11 INFO util.GSet: VM type       = 64-bit
14/01/22 15:43:11 INFO util.GSet: 0.029999999329447746% max memory = 888.9 MB
14/01/22 15:43:11 INFO util.GSet: capacity      = 2^15 = 32768 entries
14/01/22 15:43:11 INFO common.Storage: Storage directory /usr/local/hadoop/dfs/name has been successfully formatted.
14/01/22 15:43:11 INFO namenode.FSImage: Saving image file /usr/local/hadoop/dfs/name/current/fsimage.ckpt_0000000000000000000 using no compression
14/01/22 15:43:11 INFO namenode.FSImage: Image file /usr/local/hadoop/dfs/name/current/fsimage.ckpt_0000000000000000000 of size 198 bytes saved in 0 seconds.
14/01/22 15:43:11 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
14/01/22 15:43:11 INFO util.ExitUtil: Exiting with status 0
14/01/22 15:43:11 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at Master.Hadoop/192.168.6.77
************************************************************/

1.2. Run start-dfs.sh on Master.Hadoop:

[hadoop@Master ~]$ start-dfs.sh
Starting namenodes on [Master.Hadoop]
Master.Hadoop: starting namenode, logging to /usr/local/hadoop-2.2.0/logs/hadoop-hadoop-namenode-Master.Hadoop.out
Slave7.Hadoop: starting datanode, logging to /usr/local/hadoop-2.2.0/logs/hadoop-hadoop-datanode-Slave7.Hadoop.out
Slave5.Hadoop: starting datanode, logging to /usr/local/hadoop-2.2.0/logs/hadoop-hadoop-datanode-Slave5.Hadoop.out
Slave6.Hadoop: starting datanode, logging to /usr/local/hadoop-2.2.0/logs/hadoop-hadoop-datanode-Slave6.Hadoop.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /usr/local/hadoop-2.2.0/logs/hadoop-hadoop-secondarynamenode-Master.Hadoop.out

Verify the running processes on Master.Hadoop:

[hadoop@Master ~]$ jps
7695 Jps
7589 SecondaryNameNode
7403 NameNode

Verify the running processes on SlaveX.Hadoop:

[hadoop@Slave5 ~]$ jps
8724 DataNode
8815 Jps

1.3. Run start-yarn.sh on Master.Hadoop:

[hadoop@Master ~]$ start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to /usr/local/hadoop-2.2.0/logs/yarn-hadoop-resourcemanager-Master.Hadoop.out
Slave7.Hadoop: starting nodemanager, logging to /usr/local/hadoop-2.2.0/logs/yarn-hadoop-nodemanager-Slave7.Hadoop.out
Slave5.Hadoop: starting nodemanager, logging to /usr/local/hadoop-2.2.0/logs/yarn-hadoop-nodemanager-Slave5.Hadoop.out
Slave6.Hadoop: starting nodemanager, logging to /usr/local/hadoop-2.2.0/logs/yarn-hadoop-nodemanager-Slave6.Hadoop.out

Verify the running processes on Master.Hadoop:

[hadoop@Master ~]$ jps
8071 Jps
7589 SecondaryNameNode
7821 ResourceManager
7403 NameNode

Verify the running processes on SlaveX.Hadoop:

[hadoop@Slave5 ~]$ jps
9013 Jps
8724 DataNode
8882 NodeManager

2. Demonstration

2.1. A few common HDFS commands, to prepare for the wordcount demo:

[hadoop@Master ~]$ hdfs dfs -ls /
[hadoop@Master ~]$ hdfs dfs -mkdir /user
[hadoop@Master ~]$ hdfs dfs -mkdir -p /user/micmiu/wordcount/in
[hadoop@Master ~]$ hdfs dfs -ls /user/micmiu/wordcount
Found 1 items
drwxr-xr-x   - hadoop supergroup          0 2014-01-22 16:01 /user/micmiu/wordcount/in

2.2. Create three local files, micmiu-01.txt, micmiu-02.txt and micmiu-03.txt, with the following contents:

micmiu-01.txt:

Hi Michael welcome to Hadoop 
more see micmiu.com

micmiu-02.txt:

Hi Michael welcome to BigData
more see micmiu.com

micmiu-03.txt:

Hi Michael welcome to Spark 
more see micmiu.com
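
A quick way to create the three files from the shell (a sketch; any text editor works just as well):

$ echo -e "Hi Michael welcome to Hadoop\nmore see micmiu.com"  > micmiu-01.txt
$ echo -e "Hi Michael welcome to BigData\nmore see micmiu.com" > micmiu-02.txt
$ echo -e "Hi Michael welcome to Spark\nmore see micmiu.com"   > micmiu-03.txt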

Upload the three micmiu-* files to HDFS:

[hadoop@Master ~]$ hdfs dfs -put micmiu*.txt /user/micmiu/wordcount/in
[hadoop@Master ~]$ hdfs dfs -ls /user/micmiu/wordcount/in
Found 3 items
-rw-r--r--   3 hadoop supergroup         50 2014-01-22 16:06 /user/micmiu/wordcount/in/micmiu-01.txt
-rw-r--r--   3 hadoop supergroup         50 2014-01-22 16:06 /user/micmiu/wordcount/in/micmiu-02.txt
-rw-r--r--   3 hadoop supergroup         49 2014-01-22 16:06 /user/micmiu/wordcount/in/micmiu-03.txt

2.3. Then cd into the Hadoop root directory and run:

hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar wordcount /user/micmiu/wordcount/in /user/micmiu/wordcount/out

PS: the directory /user/micmiu/wordcount/out must not already exist in HDFS, otherwise the job will fail.
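
If the job has been run before, the old output directory can simply be removed first, for example:

$ hdfs dfs -rm -r /user/micmiu/wordcount/out    # delete the output directory left over from a previous run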

You should see log output similar to the following:

[hadoop@Master hadoop]$ hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar wordcount /user/micmiu/wordcount/in /user/micmiu/wordcount/out
14/01/22 16:36:28 INFO client.RMProxy: Connecting to ResourceManager at Master.Hadoop/192.168.6.77:8032
14/01/22 16:36:29 INFO input.FileInputFormat: Total input paths to process : 3
14/01/22 16:36:29 INFO mapreduce.JobSubmitter: number of splits:3
............................
.....micmiu.com........
............................
File System Counters
        FILE: Number of bytes read=297
        FILE: Number of bytes written=317359
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=536
        HDFS: Number of bytes written=83
        HDFS: Number of read operations=12
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=2
    Job Counters
        Launched map tasks=3
        Launched reduce tasks=1
        Data-local map tasks=3
        Total time spent by all maps in occupied slots (ms)=55742
        Total time spent by all reduces in occupied slots (ms)=3933
    Map-Reduce Framework
        Map input records=6
        Map output records=24
        Map output bytes=243
        Map output materialized bytes=309
        Input split bytes=387
        Combine input records=24
        Combine output records=24
        Reduce input groups=10
        Reduce shuffle bytes=309
        Reduce input records=24
        Reduce output records=10
        Spilled Records=48
        Shuffled Maps =3
        Failed Shuffles=0
        Merged Map outputs=3
        GC time elapsed (ms)=1069
        CPU time spent (ms)=12390
        Physical memory (bytes) snapshot=846753792
        Virtual memory (bytes) snapshot=5155561472
        Total committed heap usage (bytes)=499580928
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Input Format Counters
        Bytes Read=149
    File Output Format Counters
        Bytes Written=83

At this point the wordcount job has finished; run the following commands to inspect its output:

[hadoop@Master hadoop]$ hdfs dfs -ls /user/micmiu/wordcount/out
Found 2 items
-rw-r--r--   3 hadoop supergroup          0 2014-01-22 16:38 /user/micmiu/wordcount/out/_SUCCESS
-rw-r--r--   3 hadoop supergroup         83 2014-01-22 16:38 /user/micmiu/wordcount/out/part-r-00000
[hadoop@Master hadoop]$ hdfs dfs -cat /user/micmiu/wordcount/out/part-r-00000
BigData 1
Hadoop  1
Hi  3
Michael 3
Spark   1
micmiu.com  3
more    3
see 3
to  3
welcome 3

Open a browser at http://192.168.13.33:8088 (Master.Hadoop) to view the status of the running applications.

