Cluster Planning
HDFS requires a NameNode and DataNodes; YARN requires a ResourceManager and NodeManagers.
Assume there are three servers; they are planned as follows:
hadoop000 192.168.199.234
  NameNode
  DataNode
  ResourceManager
  NodeManager
hadoop001 192.168.199.235
  DataNode
  NodeManager
hadoop002 192.168.199.236
  DataNode
  NodeManager
Configuration
- Modify the hostname (on every machine):
  vim /etc/hostname
  Set the hostname to hadoop000, hadoop001, and hadoop002 respectively.
- Configure the hostname-to-IP mapping (on every machine):
  vim /etc/hosts
  e.g. assuming the current machine is hadoop000, configure:
  192.168.199.234 hadoop000
  192.168.199.235 hadoop001
  192.168.199.236 hadoop002
  192.168.199.234 localhost
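A minimal sketch of applying the hostname change without a reboot, assuming a systemd-based system (otherwise edit /etc/hostname and log in again), followed by a check that every name resolves:
  sudo hostnamectl set-hostname hadoop000   # use the matching name on each machine
  ping -c 1 hadoop001
  ping -c 1 hadoop002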
Prerequisite Installation
- SSH passwordless login (on every machine):
  ssh-keygen -t rsa
- Passwordless login between machines (only on hadoop000):
  ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop000
  ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop001
  ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop002
  The first run asks for the password; after that no password is required.
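A quick check that the key distribution worked (run from hadoop000); each command should print the remote hostname without prompting for a password:
  ssh hadoop000 hostname
  ssh hadoop001 hostname
  ssh hadoop002 hostname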
- JDK installation (on every machine)
  1) Deploy the JDK on hadoop000 first.
  2) Add the JDK bin directory to the system environment variables (see the sketch after this list).
  3) Copy the JDK to the other nodes (starting from hadoop000):
  scp -r jdk1.8.0_91 hadoop@hadoop001:~/app/
  scp -r jdk1.8.0_91 hadoop@hadoop002:~/app/
  scp ~/.bash_profile hadoop@hadoop001:~/
  scp ~/.bash_profile hadoop@hadoop002:~/
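A minimal sketch of step 2, assuming the JDK was unpacked to ~/app/jdk1.8.0_91 for the hadoop user (the same layout the scp commands above replicate); append to ~/.bash_profile, then reload and verify on every machine:
  export JAVA_HOME=/home/hadoop/app/jdk1.8.0_91
  export PATH=$JAVA_HOME/bin:$PATH
  source ~/.bash_profile
  java -version   # should report 1.8.0_91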
- Hadoop deployment (on every machine)
- hadoop-env.sh
  JAVA_HOME=xxx
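  For example, assuming the JDK location used above:
  export JAVA_HOME=/home/hadoop/app/jdk1.8.0_91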
- core-site.xml
  <property>
    <name>fs.default.name</name>
    <value>hdfs://hadoop000:8020</value>
  </property>
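  Note: in Hadoop 2.x, fs.default.name is deprecated in favor of fs.defaultFS; either property name works here with the same value hdfs://hadoop000:8020.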
- hdfs-site.xml
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/home/hadoop/app/tmp/dfs/name</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/home/hadoop/app/tmp/dfs/data</value>
  </property>
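  Optionally, pre-create the storage directories on every node so the daemons can write to them (a sketch assuming the paths above):
  mkdir -p /home/hadoop/app/tmp/dfs/name /home/hadoop/app/tmp/dfs/data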
- yarn-site.xml
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>hadoop000</value>
  </property>
- mapred-site.xml
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
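  Note: if your distribution ships only mapred-site.xml.template, copy it first (run from the Hadoop installation directory):
  cp etc/hadoop/mapred-site.xml.template etc/hadoop/mapred-site.xml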
- slaves
  hadoop000
  hadoop001
  hadoop002
- Copy Hadoop to the other machines:
  scp -r hadoop-2.6.0-cdh5.15.1 hadoop@hadoop001:~/app/
  scp -r hadoop-2.6.0-cdh5.15.1 hadoop@hadoop002:~/app/
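  It also helps to put Hadoop on the PATH so the commands below resolve; a sketch assuming the install path used above, appended to ~/.bash_profile (and copied to the other nodes the same way as for the JDK):
  export HADOOP_HOME=/home/hadoop/app/hadoop-2.6.0-cdh5.15.1
  export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH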
- Format the NameNode (only on hadoop000, and only once, before the first start):
  hadoop namenode -format
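  Note: in Hadoop 2.x the hadoop namenode command is deprecated; the equivalent is:
  hdfs namenode -format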
- Start HDFS (see the sketch below)
- Start YARN (see the sketch below)
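A minimal sketch of starting and verifying the cluster from hadoop000, assuming the PATH setup above (otherwise run the scripts from the Hadoop sbin directory):
  start-dfs.sh    # starts the NameNode, SecondaryNameNode, and all DataNodes
  start-yarn.sh   # starts the ResourceManager and all NodeManagers
  jps             # hadoop000 should show NameNode, DataNode, ResourceManager, NodeManager (and SecondaryNameNode);
                  # hadoop001/hadoop002 should show DataNode and NodeManager
The NameNode web UI is at http://hadoop000:50070 and the ResourceManager web UI at http://hadoop000:8088.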