Hadoop Cluster Deployment
1. Hadoop Cluster Planning
Node IP mappings:
- node1, edit /etc/hosts:
  10.0.194.30 node1
  10.0.195.109 node2
  10.0.194.59 node3
  10.0.194.30 localhost
- node2, edit /etc/hosts:
  10.0.194.30 node1
  10.0.195.109 node2
  10.0.194.59 node3
  10.0.195.109 localhost
- node3, edit /etc/hosts:
  10.0.194.30 node1
  10.0.195.109 node2
  10.0.194.59 node3
  10.0.194.59 localhost
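As a quick sanity check, the mappings above can be verified with a short loop. This is only a sketch: it writes the table to a temporary file so it is safe to run anywhere; on a real node you would grep /etc/hosts directly.

```shell
# Sanity-check the hostname mappings. The demo writes the table to a
# temp file; on a real node, grep /etc/hosts instead.
HOSTS_FILE=$(mktemp)
cat > "$HOSTS_FILE" <<'EOF'
10.0.194.30 node1
10.0.195.109 node2
10.0.194.59 node3
EOF
for h in node1 node2 node3; do
  # -w matches whole words, so "node1" does not match "node10"
  grep -qw "$h" "$HOSTS_FILE" && echo "$h: mapped" || echo "$h: MISSING"
done
```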
Netdisk links for the Hadoop CDH and Hive tarballs and the JDK tarball:
Link: https://pan.baidu.com/s/1zm6ur2-aq4hSNVwuDqsXbA
Extraction code: 1234
2. Prerequisite Setup
Configure passwordless SSH login from the master node to the worker nodes.
- Generate an SSH key pair on each of node1, node2, and node3:
  ssh-keygen -t rsa
- node1 is the master, so to let it log in to node1, node2, and node3 without a password, copy node1's public key to all three nodes (run on node1):
  ssh-copy-id -i ~/.ssh/id_rsa.pub node1
  ssh-copy-id -i ~/.ssh/id_rsa.pub node2
  ssh-copy-id -i ~/.ssh/id_rsa.pub node3
Note: if the IP mappings above are not in place, the node hostnames will not resolve here. The first copy prompts for a password; subsequent logins are passwordless.
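The three ssh-copy-id calls differ only in the hostname, so they can be generated with a loop. A dry-run sketch that only prints the commands (drop the leading echo and run it on node1 to actually copy the key):

```shell
# Dry run: print the key-distribution command for each node.
# Remove the echo to execute the commands on node1.
cmds=$(for h in node1 node2 node3; do
  echo "ssh-copy-id -i ~/.ssh/id_rsa.pub $h"
done)
echo "$cmds"
```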
3. JDK Installation
- Install JDK 1.8 on node1:
  - Extract jdk1.8.tar.gz to a directory of your choice (here /usr/local).
  - Configure the JDK environment variables; on Ubuntu, edit ~/.bashrc:
    export JAVA_HOME=/usr/local/jdk1.8.0_291
    export JRE_HOME=${JAVA_HOME}/jre
    export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
    export PATH=${JAVA_HOME}/bin:$PATH
    source ~/.bashrc
  - Register the JDK tools with update-alternatives:
    sudo update-alternatives --install /usr/bin/java java /usr/local/jdk1.8.0_291/bin/java 3000
    sudo update-alternatives --install /usr/bin/javac javac /usr/local/jdk1.8.0_291/bin/javac 3000
    sudo update-alternatives --install /usr/bin/jar jar /usr/local/jdk1.8.0_291/bin/jar 3000
    sudo update-alternatives --install /usr/bin/javah javah /usr/local/jdk1.8.0_291/bin/javah 3000
    sudo update-alternatives --install /usr/bin/javap javap /usr/local/jdk1.8.0_291/bin/javap 3000
    sudo update-alternatives --install /usr/bin/jconsole jconsole /usr/local/jdk1.8.0_291/bin/jconsole 3000
    (Note: jshell only ships with JDK 9 and later, so there is no jshell binary to register for JDK 1.8.)
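The export block above can be tried in isolation. This sketch writes the exports to a temporary profile fragment and sources it; on a real node the same lines go into ~/.bashrc instead.

```shell
# Write the JDK exports to a profile fragment and source it.
# On a real node these lines belong in ~/.bashrc.
PROFILE=$(mktemp)
cat > "$PROFILE" <<'EOF'
export JAVA_HOME=/usr/local/jdk1.8.0_291
export JRE_HOME=${JAVA_HOME}/jre
export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
export PATH=${JAVA_HOME}/bin:$PATH
EOF
. "$PROFILE"
echo "JAVA_HOME=$JAVA_HOME"
```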
- Copy the extracted JDK directory from node1 to node2 and node3, then configure their environment variables:
  - On node1, run:
    scp -r jdk1.8.0_291 root@node2:/usr/local/
    scp -r jdk1.8.0_291 root@node3:/usr/local/
  - Configure node2's JDK environment variables by adding the same export lines as on node1 to ~/.bashrc:
    export JAVA_HOME=/usr/local/jdk1.8.0_291
    export JRE_HOME=${JAVA_HOME}/jre
    export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
    export PATH=${JAVA_HOME}/bin:$PATH
    source ~/.bashrc
    Then register the JDK tools on node2 with the same update-alternatives commands used on node1.
  - Configure node3 the same way: add the same export lines to ~/.bashrc, run source ~/.bashrc, and register the JDK tools with the same update-alternatives commands.
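The two scp commands in the distribution step above differ only in the target host, so they can also be generated with a loop. A dry-run sketch that only prints the commands (drop the echo and run on node1 to actually copy):

```shell
# Dry run: print the scp command for each worker node.
# Remove the echo to copy the JDK directory from node1.
cmds=$(for h in node2 node3; do
  echo "scp -r /usr/local/jdk1.8.0_291 root@$h:/usr/local/"
done)
echo "$cmds"
```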
4. Hadoop Cluster Deployment
Master node (node1) configuration
- Extract the hadoop-cdh tarball to /usr/local and add the environment variables to ~/.bashrc:
  export HADOOP_HOME=/usr/local/hadoop-2.6.0-cdh5.15.1
  export PATH=$HADOOP_HOME/bin:$PATH
  source ~/.bashrc
- Edit the Hadoop configuration files under /usr/local/hadoop-2.6.0-cdh5.15.1/etc/hadoop:
  - hadoop-env.sh:
    export JAVA_HOME=/usr/local/jdk1.8.0_291
  - core-site.xml:
    <configuration>
      <property>
        <name>fs.defaultFS</name>
        <value>hdfs://node1:8020</value>
      </property>
    </configuration>
  - hdfs-site.xml:
    <configuration>
      <property>
        <name>dfs.namenode.name.dir</name>
        <value>/home/hadoop/app/tmp/dfs/name</value>
      </property>
      <property>
        <name>dfs.datanode.data.dir</name>
        <value>/home/hadoop/app/tmp/dfs/data</value>
      </property>
    </configuration>
  - yarn-site.xml (the properties go inside the <configuration> element):
    <property>
      <name>yarn.nodemanager.aux-services</name>
      <value>mapreduce_shuffle</value>
    </property>
    <property>
      <name>yarn.resourcemanager.hostname</name>
      <value>node1</value>
    </property>
  - mapred-site.xml: the config directory has no mapred-site.xml, only mapred-site.xml.template; rename the template to mapred-site.xml, then add inside <configuration>:
    <property>
      <name>mapreduce.framework.name</name>
      <value>yarn</value>
    </property>
  - slaves:
    node1
    node2
    node3
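Two of the steps above are plain file operations and can be sketched together: copying the mapred-site template into place and writing the slaves list. The sketch runs against a temporary directory so it is safe to try anywhere; on node1, CONF_DIR would be /usr/local/hadoop-2.6.0-cdh5.15.1/etc/hadoop.

```shell
# CONF_DIR stands in for .../etc/hadoop on node1.
CONF_DIR=$(mktemp -d)
# mapred-site.xml ships only as a template; copy it into place.
touch "$CONF_DIR/mapred-site.xml.template"
cp "$CONF_DIR/mapred-site.xml.template" "$CONF_DIR/mapred-site.xml"
# slaves is just one worker hostname per line.
printf '%s\n' node1 node2 node3 > "$CONF_DIR/slaves"
ls "$CONF_DIR"
```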
Distribute the master's Hadoop directory to the other nodes
- Copy the extracted Hadoop directory to the same path on node2 and node3:
  scp -r /usr/local/hadoop-2.6.0-cdh5.15.1 root@node2:/usr/local/
  scp -r /usr/local/hadoop-2.6.0-cdh5.15.1 root@node3:/usr/local/
- Configure the Hadoop environment variables on node2 and node3:
  export HADOOP_HOME=/usr/local/hadoop-2.6.0-cdh5.15.1
  export PATH=$HADOOP_HOME/bin:$PATH
  source ~/.bashrc
Note: if you set up Hadoop by extracting the tarball on each node instead of distributing it with scp, the worker configuration must match node1's. In particular, each worker's core-site.xml must point at the master's URI; otherwise the DataNodes will fail to start with a connection error, since the NameNode runs on the master. Likewise, yarn.resourcemanager.hostname in each worker's yarn-site.xml must be set to node1.
Format the NameNode
- On the master node node1, run:
  hadoop namenode -format
- If the NameNode has been formatted before, first delete the current folders under /home/hadoop/app/tmp/dfs/name, /home/hadoop/app/tmp/dfs/namesecondary, and /home/hadoop/app/tmp/dfs/data. These are the directories configured in hdfs-site.xml; re-formatting without deleting current can cause the nodes to fail on startup.
- A line like the following at the end of the output means the format succeeded:
  21/06/30 00:45:54 INFO common.Storage: Storage directory /home/hadoop/app/tmp/dfs/name has been successfully formatted.
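The pre-format cleanup can be scripted. This sketch builds a throwaway directory tree so it can run anywhere; on a real node, TMP_BASE would be /home/hadoop/app/tmp/dfs.

```shell
# Clear stale metadata before re-formatting. Demonstrated on a temp
# tree; on a real node TMP_BASE would be /home/hadoop/app/tmp/dfs.
TMP_BASE=$(mktemp -d)
mkdir -p "$TMP_BASE/name/current" "$TMP_BASE/namesecondary/current" "$TMP_BASE/data/current"
for d in name namesecondary data; do
  rm -rf "$TMP_BASE/$d/current"   # only the current folders are removed
done
ls "$TMP_BASE"
```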
Start HDFS
- On the master node node1, run /usr/local/hadoop-2.6.0-cdh5.15.1/sbin/start-dfs.sh:
  root@node1:/usr/local/hadoop-2.6.0-cdh5.15.1/sbin# /usr/local/hadoop-2.6.0-cdh5.15.1/sbin/start-dfs.sh
  21/06/30 00:48:16 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
  Starting namenodes on [node1]
  node1: starting namenode, logging to /usr/local/hadoop-2.6.0-cdh5.15.1/logs/hadoop-root-namenode-node1.out
  node1: starting datanode, logging to /usr/local/hadoop-2.6.0-cdh5.15.1/logs/hadoop-root-datanode-node1.out
  node2: starting datanode, logging to /usr/local/hadoop-2.6.0-cdh5.15.1/logs/hadoop-root-datanode-node2.out
  node3: starting datanode, logging to /usr/local/hadoop-2.6.0-cdh5.15.1/logs/hadoop-root-datanode-node3.out
  Starting secondary namenodes [0.0.0.0]
  0.0.0.0: starting secondarynamenode, logging to /usr/local/hadoop-2.6.0-cdh5.15.1/logs/hadoop-root-secondarynamenode-node1.out
  21/06/30 00:48:30 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
- Check that node1's daemons started:
  root@node1:/usr/local/hadoop-2.6.0-cdh5.15.1/sbin# jps
  1041 SecondaryNameNode
  593 NameNode
  794 DataNode
  1391 Jps
- Check that node2's DataNode started:
  root@node2:/usr/local# jps
  16258 DataNode
  17897 Jps
- Check that node3's DataNode started:
  root@node3:~# jps
  23409 Jps
  23109 DataNode
- View the HDFS web UI (by default at http://node1:50070).
Start YARN
root@node1:/usr/local/hadoop-2.6.0-cdh5.15.1/sbin# ./start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to /usr/local/hadoop-2.6.0-cdh5.15.1/logs/yarn-root-resourcemanager-node1.out
node3: starting nodemanager, logging to /usr/local/hadoop-2.6.0-cdh5.15.1/logs/yarn-root-nodemanager-node3.out
node2: starting nodemanager, logging to /usr/local/hadoop-2.6.0-cdh5.15.1/logs/yarn-root-nodemanager-node2.out
node1: starting nodemanager, logging to /usr/local/hadoop-2.6.0-cdh5.15.1/logs/yarn-root-nodemanager-node1.out
- Check that node1's ResourceManager and NodeManager started:
  root@node1:/usr/local/hadoop-2.6.0-cdh5.15.1/sbin# jps
  29921 NameNode
  30370 SecondaryNameNode
  5957 NodeManager
  6571 Jps
  5613 ResourceManager
  30126 DataNode
- Check node2's NodeManager:
  root@node2:~# jps
  5283 Jps
  25913 DataNode
  3660 NodeManager
- Check node3's NodeManager:
  root@node3:~# jps
  31572 DataNode
  6567 NodeManager
  8382 Jps
- View the YARN web UI (by default at http://node1:8088).
5. Submitting a Job to the Cluster
- Go to the bundled examples directory /usr/local/hadoop-2.6.0-cdh5.15.1/share/hadoop/mapreduce and run one of the example jars (pi estimates π using 2 map tasks with 3 samples each):
  root@node2:/usr/local/hadoop-2.6.0-cdh5.15.1/share/hadoop/mapreduce# hadoop jar hadoop-mapreduce-examples-2.6.0-cdh5.15.1.jar pi 2 3