1. Hadoop 2.x Node Role Table

|       | NN | DN | ZK | ZKFC | JN | RM | DM |
| ----- | -- | -- | -- | ---- | -- | -- | -- |
| node1 | √  |    | √  | √    |    | √  |    |
| node2 | √  | √  | √  | √    | √  |    | √  |
| node3 |    | √  | √  |      | √  |    | √  |
| node4 |    | √  |    |      | √  |    | √  |
2. Preparation
- Download the hadoop-2.5.1-x64.tar.gz package from: http://download.csdn.net/detail/colacat911/7924541
- Download the zookeeper-3.4.6.tar.gz package from: http://apache.fayea.com/zookeeper/zookeeper-3.4.6/
- Upload both archives to node1: ~/hadoop-2.5.1-x64.tar.gz, ~/zookeeper-3.4.6.tar.gz
- Prepare four CentOS 7 servers with hostnames node1, node2, node3, node4
- Set up passwordless SSH login among node1, node2, node3, node4
  Tutorial: http://blog.csdn.net/alex_bean/article/details/51462090
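The passwordless-login step can be sketched as follows. Node names follow this guide; the key is generated into a temp directory purely for illustration (on a real node you would use the default ~/.ssh/id_rsa), and the local `authorized_keys` append only demonstrates what `ssh-copy-id` does on the remote side:

```shell
# Sketch of passwordless SSH setup (run on node1).
KEYDIR=$(mktemp -d)
ssh-keygen -t rsa -N '' -q -f "$KEYDIR/id_rsa"

# On the real cluster, push the public key to every other node:
#   for n in node2 node3 node4; do ssh-copy-id -i "$KEYDIR/id_rsa.pub" root@$n; done

# ssh-copy-id effectively appends the public key to the remote
# ~/.ssh/authorized_keys; shown here against a local file:
cat "$KEYDIR/id_rsa.pub" >> "$KEYDIR/authorized_keys"
chmod 600 "$KEYDIR/authorized_keys"
```

After distributing the key, `ssh root@node2` from node1 should log in without a password prompt.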
3. Install hadoop-2.5.1
- Extract: tar zxvf hadoop-2.5.1-x64.tar.gz
- Create a symlink: ln -sf /root/hadoop-2.5.1 /home/hadoop-2.5
4. Install and configure zookeeper-3.4.6
- Extract: tar zxvf zookeeper-3.4.6.tar.gz
- Create a symlink: ln -sf /root/zookeeper-3.4.6 /home/zk
- Configure zookeeper:
  1) cd /home/zk/conf/ -> cp zoo_sample.cfg zoo.cfg -> vim zoo.cfg and edit it as follows:

         # The number of milliseconds of each tick
         tickTime=2000
         # The number of ticks that the initial
         # synchronization phase can take
         initLimit=10
         # The number of ticks that can pass between
         # sending a request and getting an acknowledgement
         syncLimit=5
         # the directory where the snapshot is stored.
         # do not use /tmp for storage, /tmp here is just
         # example sakes.
         dataDir=/opt/zookeeper
         # the port at which the clients will connect
         clientPort=2181
         # the maximum number of client connections.
         # increase this if you need to handle more clients
         #maxClientCnxns=60
         #
         # Be sure to read the maintenance section of the
         # administrator guide before turning on autopurge.
         #
         # http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance
         #
         # The number of snapshots to retain in dataDir
         #autopurge.snapRetainCount=3
         # Purge task interval in hours
         # Set to "0" to disable auto purge feature
         #autopurge.purgeInterval=1
         # ZooKeeper ensemble
         server.1=node1:2888:3888
         server.2=node2:2888:3888
         server.3=node3:2888:3888

  2) cd /opt -> mkdir zookeeper -> cd /opt/zookeeper -> vim myid -> enter 1, save and exit
  3) Once node1's zookeeper is configured, copy /opt/zookeeper to node2 and node3:

         scp -r /opt/zookeeper/ root@node2:/opt/
         scp -r /opt/zookeeper/ root@node3:/opt/

     Then on node2 and node3: vim /opt/zookeeper/myid and change the value to 2 and 3 respectively, save and exit
  4) Install zookeeper on node2 and node3 following the same steps as on node1
  5) Add /home/zk/bin to the PATH and copy the profile to node2 and node3:

         vim /etc/profile
         export PATH=$PATH:/home/zk/bin
         source /etc/profile
         scp /etc/profile root@node2:/etc/
         scp /etc/profile root@node3:/etc/

     Then run source /etc/profile on node1, node2, and node3
  6) Start zookeeper on node1, node2, and node3: zkServer.sh start
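The start-and-verify round can be sketched as below. So that the snippet runs anywhere, the loop only prints the per-node commands; on the real cluster you would execute them (zkServer.sh is on the PATH set up above):

```shell
# Print the start command for each ensemble member; on the real cluster
# these would be executed over ssh rather than echoed.
START_CMDS=$(for n in node1 node2 node3; do
  echo "ssh root@$n 'zkServer.sh start'"
done)
echo "$START_CMDS"
# After starting, run `zkServer.sh status` on each node: one member
# should report "Mode: leader" and the other two "Mode: follower".
```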
5. Edit the Hadoop configuration files (location: /home/hadoop-2.5/etc/hadoop)
- vim hadoop-env.sh

      export JAVA_HOME=/usr/java/jdk1.7.0_79
- vim hdfs-site.xml

      <configuration>
        <property>
          <name>dfs.nameservices</name>
          <value>alexz</value>
        </property>
        <property>
          <name>dfs.ha.namenodes.alexz</name>
          <value>nn1,nn2</value>
        </property>
        <property>
          <name>dfs.namenode.rpc-address.alexz.nn1</name>
          <value>node1:8020</value>
        </property>
        <property>
          <name>dfs.namenode.rpc-address.alexz.nn2</name>
          <value>node2:8020</value>
        </property>
        <property>
          <name>dfs.namenode.http-address.alexz.nn1</name>
          <value>node1:50070</value>
        </property>
        <property>
          <name>dfs.namenode.http-address.alexz.nn2</name>
          <value>node2:50070</value>
        </property>
        <property>
          <name>dfs.namenode.shared.edits.dir</name>
          <value>qjournal://node2:8485;node3:8485;node4:8485/alexz</value>
        </property>
        <property>
          <name>dfs.client.failover.proxy.provider.alexz</name>
          <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
        </property>
        <property>
          <name>dfs.ha.fencing.methods</name>
          <value>sshfence</value>
        </property>
        <property>
          <name>dfs.ha.fencing.ssh.private-key-files</name>
          <value>/root/.ssh/id_rsa</value>
        </property>
        <property>
          <name>dfs.journalnode.edits.dir</name>
          <value>/opt/jn/data</value>
        </property>
        <property>
          <name>dfs.ha.automatic-failover.enabled</name>
          <value>true</value>
        </property>
      </configuration>
- vim core-site.xml

      <configuration>
        <!-- NameNode entry point -->
        <property>
          <name>fs.defaultFS</name>
          <value>hdfs://alexz</value>
        </property>
        <!-- Working directory -->
        <property>
          <name>hadoop.tmp.dir</name>
          <value>/opt/hadoop-2.5</value>
        </property>
        <!-- ZooKeeper ensemble -->
        <property>
          <name>ha.zookeeper.quorum</name>
          <value>node1:2181,node2:2181,node3:2181</value>
        </property>
      </configuration>
- vim slaves (lists the DataNode hosts)

      node2
      node3
      node4
- Install hadoop 2.x on node2, node3, and node4: extract the archive, create the symlink, and copy over the etc/hadoop configuration files from node1
- Disable the firewall on node1, node2, node3, node4:

      systemctl stop iptables.service

  (Note: on CentOS 7 the default firewall service is firewalld; if iptables.service is not installed, use systemctl stop firewalld instead.)
6. Startup
- Start the JournalNodes (node2, node3, node4): cd /home/hadoop-2.5/sbin/ -> ./hadoop-daemon.sh start journalnode
- Format the NameNode (node1): cd /home/hadoop-2.5/bin/ -> ./hdfs namenode -format
- Start the NameNode (node1): cd /home/hadoop-2.5/sbin/ -> ./hadoop-daemon.sh start namenode
- Copy the NameNode metadata to node2: switch to node2, then cd /home/hadoop-2.5/bin/ -> ./hdfs namenode -bootstrapStandby
- Switch back to node1 and format ZKFC: cd /home/hadoop-2.5/bin/ -> ./hdfs zkfc -formatZK
- Restart the whole DFS: cd /home/hadoop-2.5/sbin/ -> ./stop-dfs.sh -> ./start-dfs.sh
At this point Hadoop is fully started; verify in a browser:
- http://node1:50070
- http://node2:50070
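The active/standby split can also be checked from the command line with `hdfs haadmin`; nn1 and nn2 are the NameNode ids configured under dfs.ha.namenodes.alexz in hdfs-site.xml. A small helper sketch (define it on node1):

```shell
# Helper: print the HA state of both NameNodes (nn1, nn2 come from
# dfs.ha.namenodes.alexz in hdfs-site.xml).
nn_states() {
  hdfs haadmin -getServiceState nn1
  hdfs haadmin -getServiceState nn2
}
# On the running cluster:
#   nn_states                       # prints e.g. "active" then "standby"
#   nn_states | grep -c '^active'   # exactly one NameNode should be active
```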
7. Testing
- Create a directory in HDFS: cd /home/hadoop-2.5/bin -> ./hdfs dfs -mkdir -p /usr/file
- Upload a test file: cd /home/hadoop-2.5/bin -> ./hdfs dfs -put /root/test.txt /usr/file
- Confirm the upload: ./hdfs dfs -ls /usr/file should list test.txt
8. Configure MapReduce
- Configure mapred-site.xml:

      cd /home/hadoop-2.5/etc/hadoop/
      cp mapred-site.xml.template mapred-site.xml
      vim mapred-site.xml

      <configuration>
        <property>
          <name>mapreduce.framework.name</name>
          <value>yarn</value>
        </property>
      </configuration>

- Configure yarn-site.xml:

      <configuration>
        <!-- Site specific YARN configuration properties -->
        <property>
          <name>yarn.resourcemanager.hostname</name>
          <value>node1</value>
        </property>
        <property>
          <name>yarn.nodemanager.aux-services</name>
          <value>mapreduce_shuffle</value>
        </property>
        <property>
          <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
          <value>org.apache.hadoop.mapred.ShuffleHandler</value>
        </property>
      </configuration>

- Copy the configuration to node2, node3, node4:

      cd /home/hadoop-2.5/etc/hadoop
      scp ./* root@node2:/home/hadoop-2.5/etc/hadoop/
      scp ./* root@node3:/home/hadoop-2.5/etc/hadoop/
      scp ./* root@node4:/home/hadoop-2.5/etc/hadoop/

- Start YARN (which runs MapReduce): cd /home/hadoop-2.5/sbin/ -> ./start-yarn.sh
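After start-yarn.sh, each host should be running the daemons assigned to it in the role table of section 1 (here DM is taken to mean the YARN NodeManager; QuorumPeerMain is the ZooKeeper process and DFSZKFailoverController is ZKFC). A sketch of the expected layout, to compare against `jps` on each node:

```shell
# Expected daemons per node, derived from the role table in section 1.
expected_daemons() {
  case "$1" in
    node1) echo "NameNode DFSZKFailoverController QuorumPeerMain ResourceManager" ;;
    node2) echo "NameNode DataNode DFSZKFailoverController QuorumPeerMain JournalNode NodeManager" ;;
    node3) echo "DataNode QuorumPeerMain JournalNode NodeManager" ;;
    node4) echo "DataNode JournalNode NodeManager" ;;
  esac
}
# On each host, compare with:  jps | awk '{print $2}' | sort
expected_daemons node4
```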
Summary:
- Full startup
  Check the system time on node1-node4 with the date command; if the times differ, set all nodes to the same time, e.g. date -s '2016-05-28 17:27:00' (or synchronize via NTP if available).
  Although the script warns "This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh", start-all.sh still works:
  1. node1-node3, start ZK: zkServer.sh start
  2. node1, start DFS and YARN: sbin/start-all.sh
  Or the officially recommended way:
  1. Start ZK: zkServer.sh start
  2. Start DFS: sbin/start-dfs.sh
  3. Start YARN: sbin/start-yarn.sh
- Shutdown
  1. node1, stop DFS and YARN: sbin/stop-all.sh
  2. node1-node3, stop ZK: zkServer.sh stop
  Or the officially recommended way:
  1. Stop YARN: sbin/stop-yarn.sh
  2. Stop DFS: sbin/stop-dfs.sh
  3. Stop ZK: zkServer.sh stop