Hadoop 2.2.0 Fully Distributed Configuration

In practice, a fully distributed Hadoop configuration is not fundamentally different from a pseudo-distributed one: the former simply spreads the daemons of the latter across multiple PC nodes.
Below is the configuration I put together for a cluster of five PCs.

Hostname   IP              Role                                      OS          User
hadoop1    192.168.1.101   namenode / resourcemanager                CentOS 6.5  hadoop
hadoop2    192.168.1.102   secondarynamenode / reserved (ZooKeeper)  CentOS 6.5  hadoop
hadoop3    192.168.1.103   datanode / nodemanager                    CentOS 6.5  hadoop
hadoop4    192.168.1.104   datanode / nodemanager                    CentOS 6.5  hadoop
hadoop5    192.168.1.105   datanode / nodemanager                    CentOS 6.5  hadoop

1. Set the environment variables on all PCs; adjust the paths to your own setup.

[hadoop@hadoop1 ~]$ sudo vi /etc/profile
export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$HADOOP_HOME/bin
export JAVA_HOME=/usr/java/jdk1.7.0_51
export PATH=$PATH:$JAVA_HOME/bin
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
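
After saving, reload the profile on each node so the variables take effect in the current shell:

[hadoop@hadoop1 ~]$ source /etc/profile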

2. Set the IP addresses

Method one: if the system has a graphical interface, you can use the network management applet that ships with CentOS.

Method two: sudo vi /etc/sysconfig/network-scripts/ifcfg-eth0
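
For reference, a minimal static-IP ifcfg-eth0 for hadoop1 might look like the sketch below; the gateway value is an assumption for this LAN, and any HWADDR/UUID lines already in your file should be kept:

DEVICE=eth0
BOOTPROTO=static
ONBOOT=yes
IPADDR=192.168.1.101      # this node's address from the table above
NETMASK=255.255.255.0
GATEWAY=192.168.1.1       # assumed gateway; use your LAN's

Apply the change with sudo service network restart.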

Test: make sure all five PCs on the LAN can ping each other, by hostname as well as by IP address.

3. Set the hostnames and the IP-to-hostname mappings

[hadoop@hadoop1 ~]$ sudo vi /etc/sysconfig/network
HOSTNAME=hadoop1.centos
[hadoop@hadoop1 ~]$ sudo vi /etc/hosts
192.168.1.101   hadoop1.centos hadoop1
192.168.1.102   hadoop2.centos hadoop2
192.168.1.103   hadoop3.centos hadoop3
192.168.1.104   hadoop4.centos hadoop4
192.168.1.105   hadoop5.centos hadoop5
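
On CentOS 6 the HOSTNAME setting in /etc/sysconfig/network only takes effect after a reboot; to apply it immediately, you can also run the following on each node, with that node's own name:

[hadoop@hadoop1 ~]$ sudo hostname hadoop1.centos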

4. Set up passwordless ssh login. The main goal here is to let hadoop1, the machine I work from, remotely control the other PCs, which makes the later configuration steps more convenient.

First set up passwordless login on hadoop1 itself; for the details, refer to the ssh section of "Hadoop 2.2.0 Pseudo-Distributed Configuration".

Then copy the id_rsa.pub generated on hadoop1 over to hadoop2-5. If you hit errors, see the explanation in [1].

scp /home/hadoop/.ssh/id_rsa.pub hadoop@hadoop2:~/.ssh/authorized_keys
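
Note that the scp above replaces any existing authorized_keys on the target. If ssh-copy-id is available, an alternative that appends instead of overwriting is:

for host in hadoop2 hadoop3 hadoop4 hadoop5; do
    ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@$host
done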

Set sshd to start at boot:

[hadoop@hadoop1 ~]$ sudo chkconfig sshd on

5. Install Hadoop. Download it from http://mirror.esocc.com/apache/hadoop/common/

hadoop-2.2.0.tar.gz is used here.

Extract Hadoop to /usr/local (see step 1 for the environment variables). I set the owner and group of the hadoop directory and all the files it contains to hadoop; otherwise every later command would need root privileges.
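
For reference, assuming the tarball sits in the hadoop user's home directory, the extraction, rename, and ownership change might look like:

[hadoop@hadoop1 ~]$ sudo tar -zxf hadoop-2.2.0.tar.gz -C /usr/local
[hadoop@hadoop1 ~]$ sudo mv /usr/local/hadoop-2.2.0 /usr/local/hadoop
[hadoop@hadoop1 ~]$ sudo chown -R hadoop:hadoop /usr/local/hadoop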

[hadoop@hadoop1 hadoop]$ ls -l
total 100
drwxr-xr-x. 2 hadoop hadoop  4096 Oct  7 14:38 bin
drwxr-xr-x. 3 hadoop hadoop  4096 Mar 12 22:37 etc
drwxr-xr-x. 2 hadoop hadoop  4096 Oct  7 14:38 include
drwxr-xr-x. 3 hadoop hadoop  4096 Oct  7 14:38 lib
drwxr-xr-x. 2 hadoop hadoop  4096 Oct  7 14:38 libexec
-rw-r--r--. 1 hadoop hadoop 15164 Oct  7 14:46 LICENSE.txt
drwxr-xr-x. 3 hadoop hadoop  4096 Mar 22 15:40 logs
-rw-r--r--. 1 hadoop hadoop   101 Oct  7 14:46 NOTICE.txt
-rw-r--r--. 1 hadoop hadoop  1366 Oct  7 14:46 README.txt
drwxr-xr-x. 2 hadoop hadoop  4096 Oct  7 14:38 sbin
drwxr-xr-x. 4 hadoop hadoop  4096 Oct  7 14:38 share

Set the JDK by editing etc/hadoop/hadoop-env.sh and yarn-env.sh:

[hadoop@hadoop1 hadoop]$ vi hadoop-env.sh
# The java implementation to use.
export JAVA_HOME=/usr/java/jdk1.7.0_51
[hadoop@hadoop1 hadoop]$ vi yarn-env.sh
# some Java parameters
export JAVA_HOME=/usr/java/jdk1.7.0_51

6. Configure the Hadoop cluster. The configuration files are under etc/hadoop.

(1) Edit core-site.xml

<!-- core-site.xml -->
<configuration>
<property>
    <name>fs.defaultFS</name>
    <value>hdfs://hadoop1:9000</value>
</property>
<property>
    <name>io.file.buffer.size</name>
    <value>131072</value>
    <description>Size of read/write buffer used in SequenceFiles. Default: 4096.</description>
</property>
<property>
    <name>hadoop.tmp.dir</name>
    <value>/tmp/hadoop-${user.name}</value>
    <description>Default configuration.</description>
</property>
</configuration>
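
Note that hadoop.tmp.dir is left at its default under /tmp, which may be cleared on reboot; that is fine for a test cluster, but for anything longer-lived it is safer to point it at a persistent directory created on every node, for example:

[hadoop@hadoop1 ~]$ mkdir -p /usr/local/hadoop/tmp

and then set hadoop.tmp.dir to /usr/local/hadoop/tmp accordingly.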

(2) Edit hdfs-site.xml

<!-- hdfs-site.xml -->
<configuration>
<property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>hadoop2:50090</value>
</property>
<property>
    <name>dfs.namenode.name.dir</name>
    <value>file://${hadoop.tmp.dir}/dfs/name</value>
</property>
<property>
    <name>dfs.datanode.data.dir</name>
    <value>file://${hadoop.tmp.dir}/dfs/data</value>
</property>
<property>
    <name>dfs.replication</name>
    <value>3</value>
</property>
<property>
    <name>dfs.webhdfs.enabled</name>
    <value>true</value>
</property>
</configuration>
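
A quick way to confirm that a value is actually being picked up from these files is hdfs getconf, e.g.:

[hadoop@hadoop1 ~]$ hdfs getconf -confKey dfs.replication
3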

(3) Edit yarn-site.xml

<!-- yarn-site.xml -->
<configuration>
<property>
    <name>yarn.resourcemanager.address</name>
    <value>hadoop1:8032</value>
</property>
<property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>hadoop1:8030</value>
</property>
<property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>hadoop1:8031</value>
</property>
<property>
    <name>yarn.resourcemanager.admin.address</name>
    <value>hadoop1:8033</value>
</property>
<property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>hadoop1:8088</value>
</property>
<property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
</property>
<property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
</configuration>

(4) Create mapred-site.xml

[hadoop@hadoop1 hadoop]$ mv mapred-site.xml.template mapred-site.xml
<!-- mapred-site.xml -->
<configuration>
<property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
</property>
<property>
    <name>mapreduce.jobhistory.address</name>
    <value>hadoop1:10020</value>
</property>
<property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>hadoop1:19888</value>
</property>
</configuration>
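
The job history server is not started by start-dfs.sh or start-yarn.sh; with the addresses above configured, it can be started separately on hadoop1:

[hadoop@hadoop1 ~]$ /usr/local/hadoop/sbin/mr-jobhistory-daemon.sh start historyserver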

(5) Edit slaves. When HDFS is started, the datanodes on these three PCs will start; when YARN is started, so will their nodemanagers.

[hadoop@hadoop1 hadoop]$ vi slaves
hadoop3
hadoop4
hadoop5

Once the configuration files are ready, copy the entire hadoop folder to hadoop2-5; no changes are needed on those nodes!

scp -r /usr/local/hadoop hadoop@hadoop2:/usr/local
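
To push the tree to all four of the other nodes in one go, a simple loop works (this assumes, as the scp above already does, that the hadoop user can write to /usr/local on each target):

for host in hadoop2 hadoop3 hadoop4 hadoop5; do
    scp -r /usr/local/hadoop hadoop@$host:/usr/local
done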

7. Disable the firewall on hadoop1-5. You could instead open the required ports in the firewall on each node, but that is quite tedious.

[hadoop@hadoop1 ~]$ sudo service iptables stop

Set the firewall not to start at boot:

[hadoop@hadoop1 ~]$ sudo chkconfig iptables off

8. Run Hadoop

[hadoop@hadoop1 ~]$ hdfs namenode -format
...output omitted...
Re-format filesystem in Storage Directory /tmp/hadoop-hadoop/dfs/name ? (Y or N) y
14/03/22 17:43:37 INFO common.Storage: Storage directory /tmp/hadoop-hadoop/dfs/name has been successfully formatted.
14/03/22 17:43:37 INFO namenode.FSImage: Saving image file /tmp/hadoop-hadoop/dfs/name/current/fsimage.ckpt_0000000000000000000 using no compression
14/03/22 17:43:37 INFO namenode.FSImage: Image file /tmp/hadoop-hadoop/dfs/name/current/fsimage.ckpt_0000000000000000000 of size 198 bytes saved in 0 seconds.
14/03/22 17:43:37 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
14/03/22 17:43:37 INFO util.ExitUtil: Exiting with status 0
14/03/22 17:43:37 INFO namenode.NameNode: SHUTDOWN_MSG: 
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at hadoop1.centos/192.168.1.101
************************************************************/
[hadoop@hadoop1 ~]$ cd /usr/local/hadoop/sbin
[hadoop@hadoop1 sbin]$ start-dfs.sh
Starting namenodes on [hadoop1]
hadoop1: starting namenode, logging to /usr/local/hadoop/logs/hadoop-hadoop-namenode-hadoop1.centos.out
hadoop5: starting datanode, logging to /usr/local/hadoop/logs/hadoop-hadoop-datanode-hadoop5.centos.out
hadoop4: starting datanode, logging to /usr/local/hadoop/logs/hadoop-hadoop-datanode-hadoop4.centos.out
hadoop3: starting datanode, logging to /usr/local/hadoop/logs/hadoop-hadoop-datanode-hadoop3.centos.out
Starting secondary namenodes [hadoop2]
hadoop2: starting secondarynamenode, logging to /usr/local/hadoop/logs/hadoop-hadoop-secondarynamenode-hadoop2.centos.out
[hadoop@hadoop1 sbin]$ start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to /usr/local/hadoop/logs/yarn-hadoop-resourcemanager-hadoop1.centos.out
hadoop4: starting nodemanager, logging to /usr/local/hadoop/logs/yarn-hadoop-nodemanager-hadoop4.centos.out
hadoop5: starting nodemanager, logging to /usr/local/hadoop/logs/yarn-hadoop-nodemanager-hadoop5.centos.out
hadoop3: starting nodemanager, logging to /usr/local/hadoop/logs/yarn-hadoop-nodemanager-hadoop3.centos.out
[hadoop@hadoop1 sbin]$ jps
5247 Jps
4495 NameNode
4993 ResourceManager
[hadoop@hadoop1 sbin]$ ssh hadoop2
Last login: Sun Mar 23 00:38:40 2014 from hadoop1.centos
[hadoop@hadoop2 ~]$ jps
3505 SecondaryNameNode
3565 Jps
[hadoop@hadoop2 ~]$ exit
[hadoop@hadoop1 sbin]$ ssh hadoop3
Last login: Sun Mar 23 02:00:17 2014 from hadoop1.centos
[hadoop@hadoop3 ~]$ jps
3526 NodeManager
3649 Jps
3426 DataNode
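
Instead of logging in to each node in turn, a loop like the following checks every node at once. jps is addressed by its full path because /etc/profile is not sourced for non-interactive ssh commands; $JAVA_HOME expands locally, which is fine here since the JDK lives at the same path on every node:

for host in hadoop1 hadoop2 hadoop3 hadoop4 hadoop5; do
    echo "== $host =="
    ssh $host "$JAVA_HOME/bin/jps"
done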

9. Test

http://hadoop1:50070 --> NameNode UI

http://hadoop2:50090 --> SecondaryNameNode UI

http://hadoop1:8088 --> ResourceManager UI
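
Beyond the web UIs, HDFS can also be checked from the command line; the report should show three live datanodes:

[hadoop@hadoop1 ~]$ hdfs dfsadmin -report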



Problems during deployment, such as some nodes failing to start, usually come down to file permissions, and the log files describe these errors very clearly.

This is only the most basic configuration. For more complex setups and requirements, I recommend the official Hadoop documentation, which is very detailed. I am a Hadoop beginner myself, so please point out any mistakes!

[1] http://m.blog.csdn.net/blog/w397090770/14446291
