A fully distributed Hadoop configuration is not fundamentally different from the pseudo-distributed one; it simply spreads the various daemons across separate machines. Below is the layout of a five-node cluster used for this guide.
Hostname | IP | Role | OS | User
hadoop1 | 192.168.1.101 | namenode / resourcemanager | CentOS 6.5 | hadoop
hadoop2 | 192.168.1.102 | secondarynamenode / reserved (ZooKeeper) | CentOS 6.5 | hadoop
hadoop3 | 192.168.1.103 | datanode / nodemanager | CentOS 6.5 | hadoop
hadoop4 | 192.168.1.104 | datanode / nodemanager | CentOS 6.5 | hadoop
hadoop5 | 192.168.1.105 | datanode / nodemanager | CentOS 6.5 | hadoop
1. Set the environment variables on all machines (adjust the paths to your own setup):
[hadoop@hadoop1 ~]$ sudo vi /etc/profile
export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$HADOOP_HOME/bin
export JAVA_HOME=/usr/java/jdk1.7.0_51
export PATH=$PATH:$JAVA_HOME/bin
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
2. Configure the IP addresses.
Option 1: if the system has a desktop environment, use the network-management applet that ships with CentOS.
Option 2: sudo vi /etc/sysconfig/network-scripts/ifcfg-eth0
Test: make sure all five machines on the LAN can ping each other, by hostname or by IP address.
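The connectivity test above can be scripted. This is a minimal sketch that assumes the five hostnames from the table; if name resolution is not set up yet (the /etc/hosts entries are only added in step 3), substitute the IP addresses.

```shell
# Ping every node once and report its reachability.
checked=""
for host in hadoop1 hadoop2 hadoop3 hadoop4 hadoop5; do
    if ping -c 1 -W 2 "$host" > /dev/null 2>&1; then
        echo "$host: reachable"
    else
        echo "$host: NOT reachable"
    fi
    checked="$checked $host"
done
```

Run it on each node in turn; every host should report all five as reachable before you continue.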
3. Set the hostname and the IP-to-hostname mappings:
[hadoop@hadoop1 ~]$ sudo vi /etc/sysconfig/network
HOSTNAME=hadoop1.centos
[hadoop@hadoop1 ~]$ sudo vi /etc/hosts
192.168.1.101 hadoop1.centos hadoop1
192.168.1.102 hadoop2.centos hadoop2
192.168.1.103 hadoop3.centos hadoop3
192.168.1.104 hadoop4.centos hadoop4
192.168.1.105 hadoop5.centos hadoop5
4. Set up passwordless SSH login. The point is to let hadoop1 control the other machines remotely, which makes the remaining configuration steps easier.
First enable passwordless login on hadoop1 itself (see the SSH section of "hadoop2.2.0 pseudo-distributed configuration"), then copy the id_rsa.pub generated on hadoop1 over to hadoop2-5. If you run into errors, see the notes in [1].
scp /home/hadoop/.ssh/id_rsa.pub hadoop@hadoop2:~/.ssh/authorized_keys
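Note that the scp above overwrites any authorized_keys already present on the target. A safer sketch uses ssh-copy-id, which appends the key and fixes permissions (run as the hadoop user on hadoop1; hostnames as in the table):

```shell
# Generate a key pair once (skipped if one already exists), then push the
# public key to each of the other nodes.
mkdir -p ~/.ssh && chmod 700 ~/.ssh
[ -f ~/.ssh/id_rsa ] || ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa
for host in hadoop2 hadoop3 hadoop4 hadoop5; do
    # prompts once for that node's password, then logins are key-based
    ssh-copy-id "hadoop@$host" || echo "could not reach $host"
done
```

Afterwards, `ssh hadoop2` from hadoop1 should log in without a password prompt.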
Make sshd start on boot:
[hadoop@hadoop1 ~]$ sudo chkconfig sshd on
5. Install Hadoop. Download it from http://mirror.esocc.com/apache/hadoop/common/ (this guide uses hadoop-2.2.0.tar.gz).
Extract Hadoop to /usr/local (the matching environment variables were set in step 1). Change the owner and group of the Hadoop directory and all files in it to hadoop; otherwise every command will later require root privileges.
[hadoop@hadoop1 hadoop]$ ls -l
total 100
drwxr-xr-x. 2 hadoop hadoop  4096 Oct  7 14:38 bin
drwxr-xr-x. 3 hadoop hadoop  4096 Mar 12 22:37 etc
drwxr-xr-x. 2 hadoop hadoop  4096 Oct  7 14:38 include
drwxr-xr-x. 3 hadoop hadoop  4096 Oct  7 14:38 lib
drwxr-xr-x. 2 hadoop hadoop  4096 Oct  7 14:38 libexec
-rw-r--r--. 1 hadoop hadoop 15164 Oct  7 14:46 LICENSE.txt
drwxr-xr-x. 3 hadoop hadoop  4096 Mar 22 15:40 logs
-rw-r--r--. 1 hadoop hadoop   101 Oct  7 14:46 NOTICE.txt
-rw-r--r--. 1 hadoop hadoop  1366 Oct  7 14:46 README.txt
drwxr-xr-x. 2 hadoop hadoop  4096 Oct  7 14:38 sbin
drwxr-xr-x. 4 hadoop hadoop  4096 Oct  7 14:38 share
Point Hadoop at the JDK by editing etc/hadoop/hadoop-env.sh and yarn-env.sh:
[hadoop@hadoop1 hadoop]$ vi hadoop-env.sh
# The java implementation to use.
export JAVA_HOME=/usr/java/jdk1.7.0_51
[hadoop@hadoop1 hadoop]$ vi yarn-env.sh
# some Java parameters
export JAVA_HOME=/usr/java/jdk1.7.0_51
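The same edit can be done non-interactively. This sketch wraps it in a small helper function (the conf directory and JDK path are the ones used above; adjust them to your layout):

```shell
# Replace any existing (possibly commented-out) JAVA_HOME line in both
# env scripts with the JDK path used in this guide.
set_java_home() {
    dir="$1"
    for f in hadoop-env.sh yarn-env.sh; do
        sed -i 's|^[# ]*export JAVA_HOME=.*|export JAVA_HOME=/usr/java/jdk1.7.0_51|' "$dir/$f"
    done
}
set_java_home /usr/local/hadoop/etc/hadoop || echo "path not found - adjust to your layout"
```

Doing it this way keeps the edit repeatable, which helps when the tree is later copied to the other nodes in step 7.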
6. Configure the cluster. The configuration files are under etc/hadoop.
(1) Edit core-site.xml:
<!-- core-site.xml -->
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://hadoop1:9000</value>
</property>
<property>
<name>io.file.buffer.size</name>
<value>131072</value>
<description>Size of the read/write buffer used in SequenceFiles (default: 4096).</description>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/tmp/hadoop-${user.name}</value>
<description>Default configuration.</description>
</property>
</configuration>
(2) Edit hdfs-site.xml:
<!-- hdfs-site.xml -->
<configuration>
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>hadoop2:50090</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file://${hadoop.tmp.dir}/dfs/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file://${hadoop.tmp.dir}/dfs/data</value>
</property>
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
<property>
<name>dfs.webhdfs.enabled</name>
<value>true</value>
</property>
</configuration>
(3) Edit yarn-site.xml:
<!-- yarn-site.xml -->
<configuration>
<property>
<name>yarn.resourcemanager.address</name>
<value>hadoop1:8032</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>hadoop1:8030</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>hadoop1:8031</value>
</property>
<property>
<name>yarn.resourcemanager.admin.address</name>
<value>hadoop1:8033</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address</name>
<value>hadoop1:8088</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
</configuration>
(4) Create mapred-site.xml (the distribution only ships a template):
[hadoop@hadoop1 hadoop]$ mv mapred-site.xml.template mapred-site.xml
<!-- mapred-site.xml -->
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>hadoop1:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>hadoop1:19888</value>
</property>
</configuration>
(5) Edit slaves. start-dfs.sh will start a datanode on each of these machines, and start-yarn.sh will start a nodemanager on them:
[hadoop@hadoop1 hadoop]$ vi slaves
hadoop3
hadoop4
hadoop5
7. Once the files are configured, copy the entire hadoop-2.2.0 directory to hadoop2-5; no changes are needed on those nodes!
scp -r /usr/local/hadoop hadoop@hadoop2:/usr/local
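The single scp above copies to hadoop2 only; a loop covers all four remaining nodes (same paths and user as above):

```shell
# Copy the configured Hadoop tree from hadoop1 to every other node.
# /usr/local on the targets must be writable by the hadoop user
# (e.g. chown it beforehand, as in step 5).
copied=""
for host in hadoop2 hadoop3 hadoop4 hadoop5; do
    scp -r /usr/local/hadoop "hadoop@$host:/usr/local" || echo "copy to $host failed"
    copied="$copied $host"
done
```

With the passwordless SSH from step 4 in place, the whole loop runs unattended.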
8. Disable the firewall on hadoop1-5. (You could instead open just the ports Hadoop needs on each node, but that is tedious.)
[hadoop@hadoop1 ~]$ sudo service iptables stop
Keep the firewall from starting at boot:
[hadoop@hadoop1 ~]$ sudo chkconfig iptables off
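Rather than logging in to each machine, the two commands can be pushed to all nodes from hadoop1. This sketch assumes the passwordless SSH from step 4 and a hadoop user with sudo rights on every node:

```shell
# Stop iptables now and disable it at boot on every node.
# -t allocates a tty so sudo can prompt for a password if it needs one.
n=0
for host in hadoop1 hadoop2 hadoop3 hadoop4 hadoop5; do
    ssh -t "hadoop@$host" "sudo service iptables stop; sudo chkconfig iptables off" \
        || echo "$host: could not disable firewall"
    n=$((n + 1))
done
```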
9. Run Hadoop. First format the namenode:
[hadoop@hadoop1 ~]$ hdfs namenode -format
... (earlier output omitted)
Re-format filesystem in Storage Directory /tmp/hadoop-hadoop/dfs/name ? (Y or N) y
14/03/22 17:43:37 INFO common.Storage: Storage directory /tmp/hadoop-hadoop/dfs/name has been successfully formatted.
14/03/22 17:43:37 INFO namenode.FSImage: Saving image file /tmp/hadoop-hadoop/dfs/name/current/fsimage.ckpt_0000000000000000000 using no compression
14/03/22 17:43:37 INFO namenode.FSImage: Image file /tmp/hadoop-hadoop/dfs/name/current/fsimage.ckpt_0000000000000000000 of size 198 bytes saved in 0 seconds.
14/03/22 17:43:37 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
14/03/22 17:43:37 INFO util.ExitUtil: Exiting with status 0
14/03/22 17:43:37 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at hadoop1.centos/192.168.1.101
************************************************************/
[hadoop@hadoop1 ~]$ cd /usr/local/hadoop/sbin
[hadoop@hadoop1 sbin]$ start-dfs.sh
Starting namenodes on [hadoop1]
hadoop1: starting namenode, logging to /usr/local/hadoop/logs/hadoop-hadoop-namenode-hadoop1.centos.out
hadoop5: starting datanode, logging to /usr/local/hadoop/logs/hadoop-hadoop-datanode-hadoop5.centos.out
hadoop4: starting datanode, logging to /usr/local/hadoop/logs/hadoop-hadoop-datanode-hadoop4.centos.out
hadoop3: starting datanode, logging to /usr/local/hadoop/logs/hadoop-hadoop-datanode-hadoop3.centos.out
Starting secondary namenodes [hadoop2]
hadoop2: starting secondarynamenode, logging to /usr/local/hadoop/logs/hadoop-hadoop-secondarynamenode-hadoop2.centos.out
[hadoop@hadoop1 sbin]$ start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to /usr/local/hadoop/logs/yarn-hadoop-resourcemanager-hadoop1.centos.out
hadoop4: starting nodemanager, logging to /usr/local/hadoop/logs/yarn-hadoop-nodemanager-hadoop4.centos.out
hadoop5: starting nodemanager, logging to /usr/local/hadoop/logs/yarn-hadoop-nodemanager-hadoop5.centos.out
hadoop3: starting nodemanager, logging to /usr/local/hadoop/logs/yarn-hadoop-nodemanager-hadoop3.centos.out
[hadoop@hadoop1 sbin]$ jps
5247 Jps
4495 NameNode
4993 ResourceManager
[hadoop@hadoop1 sbin]$ ssh hadoop2
Last login: Sun Mar 23 00:38:40 2014 from hadoop1.centos
[hadoop@hadoop2 ~]$ jps
3505 SecondaryNameNode
3565 Jps
[hadoop@hadoop1 sbin]$ ssh hadoop3
Last login: Sun Mar 23 02:00:17 2014 from hadoop1.centos
[hadoop@hadoop3 ~]$ jps
3526 NodeManager
3649 Jps
3426 DataNode
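Checking each node by hand like this scales poorly; the same verification can be run from hadoop1 in one pass (a sketch, assuming the hostnames and user configured above):

```shell
# List the Java daemons on every node. Each worker should show DataNode
# and NodeManager, hadoop2 a SecondaryNameNode, and hadoop1 the NameNode
# and ResourceManager.
report=$(
    for host in hadoop1 hadoop2 hadoop3 hadoop4 hadoop5; do
        echo "== $host =="
        ssh -o ConnectTimeout=3 "hadoop@$host" jps 2>/dev/null || echo "(no response)"
    done
)
echo "$report"
```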
10. Test via the web UIs:
http://hadoop1:50070 --> namenode UI
http://hadoop2:50090 --> secondarynamenode UI
http://hadoop1:8088 --> resourcemanager UI
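Beyond the web UIs, a quick functional check is to round-trip a small file through HDFS (run on hadoop1; the file names here are just illustrative):

```shell
# Write a small file locally, push it into HDFS, and read it back.
echo "hello hadoop" > /tmp/smoke.txt
hdfs dfs -mkdir -p /user/hadoop \
    && hdfs dfs -put /tmp/smoke.txt /user/hadoop/smoke.txt \
    && hdfs dfs -cat /user/hadoop/smoke.txt \
    || echo "hdfs command failed - is the cluster up and PATH set?"
```

If the cat prints the file back, HDFS is writing blocks to the datanodes and serving reads correctly.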
When deployment problems occur, such as some nodes failing to start, the cause is usually file permissions, and the log files describe such errors quite clearly.
This is only the most basic configuration; for more complex setups and requirements, see the official Hadoop documentation, which is very thorough. I am a Hadoop beginner myself, so corrections are welcome!