1. Installation packages: hadoop-2.5.2.tar.gz, zookeeper-3.4.6.tar.gz
2. Environment:
1) Four Linux servers (I used VMware; the four CentOS 6.4 machines were created by cloning).
For convenience, give each machine an alias: node1, node2, node3, node4. Add the following entries to /etc/hosts:
192.168.121.11 node1
192.168.121.12 node2
192.168.121.13 node3
192.168.121.14 node4
2) Install SSH, a JDK, and the other prerequisites on each machine. Extract
hadoop-2.5.2.tar.gz
into the same directory on node1-4, and extract zookeeper-3.4.6.tar.gz on node1-3.
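The copy-and-extract step above can be scripted. The sketch below only prints the commands (a dry run) rather than running them; the /root install path is an assumption, adjust it to the directory you actually use, and pipe the printed lines to sh (or drop the echo) to execute them.

```shell
# Dry run: print the scp/tar commands that distribute the tarballs
# (hadoop to node1-4, zookeeper to node1-3). Paths are assumptions.
print_dist_cmds() {
  for host in node1 node2 node3 node4; do
    echo "scp hadoop-2.5.2.tar.gz root@${host}:/root/"
    echo "ssh root@${host} 'tar -xzf /root/hadoop-2.5.2.tar.gz -C /root'"
  done
  for host in node1 node2 node3; do
    echo "scp zookeeper-3.4.6.tar.gz root@${host}:/root/"
    echo "ssh root@${host} 'tar -xzf /root/zookeeper-3.4.6.tar.gz -C /root'"
  done
}
print_dist_cmds
```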
3) Services running on each node (a 1 in the table means that node runs the service):

alias | nn | dn | zk | zkfc | jn | rm | dm
------+----+----+----+------+----+----+----
node1 |  1 |    |  1 |  1   |    |  1 |
node2 |  1 |  1 |  1 |  1   |  1 |    |  1
node3 |    |  1 |  1 |      |  1 |    |  1
node4 |    |  1 |    |      |  1 |    |  1

(nn = namenode, dn = datanode, zk = zookeeper, zkfc = ZooKeeper failover controller, jn = journalnode, rm = resourcemanager, dm = nodemanager)
3. Configure passwordless SSH login
ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa (generate a private/public key pair locally)
cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys (append the local public key to the authorized-keys file)
scp ~/.ssh/id_dsa.pub root@node2:~/ (copy your public key to the remote host node2)
Repeat the following steps for node2, node3 and node4:
ssh node2, entering the password when prompted (log in to the remote host node2)
cat ~/id_dsa.pub >> ~/.ssh/authorized_keys (append the copied public key to the authorized-keys file, enabling passwordless login)
service sshd restart (restart the SSH service)
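The per-node repetition above can also be scripted. A minimal sketch that only prints the commands (dry run), assuming root logins and the id_dsa key generated earlier; run each printed line by hand, or pipe the output to sh once the first password-based copy works:

```shell
# Dry run: print the commands that push the local public key to each node
# and append it to that node's authorized_keys (step 3, per node).
print_ssh_setup() {
  for host in node2 node3 node4; do
    echo "scp ~/.ssh/id_dsa.pub root@${host}:~/"
    echo "ssh root@${host} 'cat ~/id_dsa.pub >> ~/.ssh/authorized_keys'"
  done
}
print_ssh_setup
```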
4. Edit the Hadoop configuration files under hadoop-2.5.2/etc/hadoop (then sync them to node2, 3 and 4).
1) hadoop-env.sh: set JAVA_HOME (node1,2,3,4)
2) core-site.xml (node1,2,3,4):
<configuration>
<!-- cluster entry point -->
<property>
<name>fs.defaultFS</name>
<value>hdfs://mycluster</value>
</property>
<!-- hosts where zookeeper is deployed -->
<property>
<name>ha.zookeeper.quorum</name>
<value>node1:2181,node2:2181,node3:2181</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/root/hadoop_data/temp_dir</value>
</property>
</configuration>
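ha.zookeeper.quorum lists three servers because ZooKeeper needs a majority of the ensemble alive; an odd ensemble size is the efficient choice. A small sketch of the arithmetic (the quorum string is copied from the config above):

```shell
# With n zookeeper servers, the ensemble tolerates (n-1)/2 failures.
quorum="node1:2181,node2:2181,node3:2181"
n=$(echo "$quorum" | tr ',' '\n' | wc -l | tr -d ' ')
echo "$n servers, tolerates $(( (n - 1) / 2 )) failure(s)"
```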
3) hdfs-site.xml (node1,2,3,4):
<configuration>
<property>
<name>dfs.nameservices</name>
<value>mycluster</value>
</property>
<property>
<name>dfs.ha.namenodes.mycluster</name>
<value>nn1,nn2</value>
</property>
<property>
<name>dfs.namenode.rpc-address.mycluster.nn1</name>
<value>node1:8020</value>
</property>
<property>
<name>dfs.namenode.rpc-address.mycluster.nn2</name>
<value>node2:8020</value>
</property>
<property>
<name>dfs.namenode.http-address.mycluster.nn1</name>
<value>node1:50070</value>
</property>
<property>
<name>dfs.namenode.http-address.mycluster.nn2</name>
<value>node2:50070</value>
</property>
<!-- shared journal where the namenode edit files are stored -->
<property>
<name>dfs.namenode.shared.edits.dir</name>
<value>qjournal://node2:8485;node3:8485;node4:8485/mycluster</value>
</property>
<!-- failover proxy used by HDFS clients -->
<property>
<name>dfs.client.failover.proxy.provider.mycluster</name>
<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<!-- fence via ssh, using the private key configured below -->
<property>
<name>dfs.ha.fencing.methods</name>
<value>sshfence</value>
</property>
<property>
<name>dfs.ha.fencing.ssh.private-key-files</name>
<value>/root/.ssh/id_dsa</value>
</property>
<!-- local directory where the journalnode stores its files -->
<property>
<name>dfs.journalnode.edits.dir</name>
<value>/root/hadoop_data/jndata</value>
</property>
<!-- automatic namenode failover -->
<property>
<name>dfs.ha.automatic-failover.enabled</name>
<value>true</value>
</property>
</configuration>
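The dfs.namenode.shared.edits.dir URI above names the three journalnodes. A small illustration (not a Hadoop tool) that extracts the host:port list from that URI, e.g. to probe each journalnode port with nc before formatting:

```shell
# Pull the journalnode endpoints out of the qjournal:// URI from hdfs-site.xml.
uri="qjournal://node2:8485;node3:8485;node4:8485/mycluster"
hosts=$(echo "$uri" | sed 's|^qjournal://||; s|/mycluster$||' | tr ';' ' ')
echo "$hosts"
for hp in $hosts; do
  echo "check: nc -z ${hp%:*} ${hp#*:}"   # dry run; nc -z tests the port
done
```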
4) slaves (node1,2,3,4):
node2
node3
node4
5) zookeeper-3.4.6/conf/zoo.cfg (on node1,2,3):
# settings not shown here keep their defaults
dataDir=/root/hadoop_data/zoo_data
server.1=node1:2888:3888
server.2=node2:2888:3888
server.3=node3:2888:3888
6) On node1, 2 and 3, create the file /root/hadoop_data/zoo_data/myid containing 1, 2 and 3 respectively.
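Creating the per-node myid files can be sketched as a dry run that prints the command for each node (the data dir matches dataDir in zoo.cfg above; the number written must match the server.N entry for that host):

```shell
# Dry run: print the commands that create dataDir and each node's myid file.
print_myid_cmds() {
  i=1
  for host in node1 node2 node3; do
    echo "ssh root@${host} 'mkdir -p /root/hadoop_data/zoo_data && echo ${i} > /root/hadoop_data/zoo_data/myid'"
    i=$((i + 1))
  done
}
print_myid_cmds
```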
5. Start zookeeper (on node1, 2 and 3):
zookeeper-3.4.6/bin/zkServer.sh start
6. Start the journalnodes (on node2, 3 and 4):
hadoop-2.5.2/sbin/hadoop-daemon.sh start journalnode
7. Format the namenode on node1:
hadoop-2.5.2/bin/hdfs namenode -format
(If the format step cannot reach the journalnodes, check whether the firewall is still running.)
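On CentOS 6 the usual culprit for that connection failure is iptables. A dry-run sketch of disabling it on every node; note that in anything beyond a lab setup you would open the needed ports (8020, 8485, 50070, 2181, ...) rather than switch the firewall off:

```shell
# Dry run: print the commands that stop iptables now and on boot (CentOS 6).
print_fw_cmds() {
  for host in node1 node2 node3 node4; do
    echo "ssh root@${host} 'service iptables stop && chkconfig iptables off'"
  done
}
print_fw_cmds
```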
8. Start the namenode on node1:
hadoop-2.5.2/sbin/hadoop-daemon.sh start namenode
9. On node2, copy over the freshly formatted namenode metadata:
hadoop-2.5.2/bin/hdfs namenode -bootstrapStandby
10. Stop all DFS processes (on node1):
hadoop-2.5.2/sbin/stop-dfs.sh
11. Format zkfc:
hadoop-2.5.2/bin/hdfs zkfc -formatZK
12. Start HDFS:
hadoop-2.5.2/sbin/start-dfs.sh
To verify, open the namenode web UI: http://node1:50070
13. Configure MapReduce:
1) mapred-site.xml:
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
2) yarn-site.xml:
<configuration>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>node1</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
</configuration>
14. Start MapReduce (YARN):
hadoop-2.5.2/sbin/start-yarn.sh
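After start-dfs.sh and start-yarn.sh, running jps on each node should show the daemons from the service table in step 2. A sketch of the expected JVM process names per node, assuming dm in that table means the YARN nodemanager; compare against the actual jps output on each host:

```shell
# Expected JVM process names per node, derived from the service table.
expected_jps() {
  case "$1" in
    node1) echo "NameNode QuorumPeerMain DFSZKFailoverController ResourceManager" ;;
    node2) echo "NameNode DataNode QuorumPeerMain DFSZKFailoverController JournalNode NodeManager" ;;
    node3) echo "DataNode QuorumPeerMain JournalNode NodeManager" ;;
    node4) echo "DataNode JournalNode NodeManager" ;;
  esac
}
expected_jps node2
```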
15. Test:
Open http://node1:8088/ (the resourcemanager web UI).