1. Cluster preparation (host configuration)
Prepare at least 2 machines, for example:
ip            hostname
172.19.0.1    hserver1
172.18.2.32   hserver2
192.43.2.31   hserver3
a). Optionally change each machine's hostname. This step is not required, but it makes the machines easier to identify:
hostnamectl set-hostname hserver1    (on 172.19.0.1)
hostnamectl set-hostname hserver2    (on 172.18.2.32)
hostnamectl set-hostname hserver3    (on 192.43.2.31)
b). On every machine you prepared, add the host mappings:
[root@hserver1 ~]# vim /etc/hosts
Append the following entries:
172.19.0.1 hserver1
172.18.2.32 hserver2
192.43.2.31 hserver3
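To confirm the mappings took effect, a quick check can be run on each machine (a sketch using this guide's example hostnames; getent consults /etc/hosts through the system's name service switch):

```shell
# Sketch: verify that each cluster hostname resolves after editing /etc/hosts.
check_hosts() {
    local host
    for host in hserver1 hserver2 hserver3; do
        getent hosts "$host" || echo "WARNING: $host does not resolve"
    done
}
```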
2. Create a dedicated user
For example, hadoop (run on every machine):
[root@hserver1 ~]# useradd -m hadoop                      # create the hadoop user
[root@hserver1 ~]# echo 123456 | passwd --stdin hadoop    # set the hadoop user's password to 123456
3. Passwordless SSH
Run on every machine. For example, on the first machine, hserver1:
[root@hserver1 ~]# su - hadoop                     # switch to the hadoop user
[hadoop@hserver1 ~]$ ssh-keygen -t rsa             # generate a key pair
[hadoop@hserver1 ~]$ ssh-copy-id -i /home/hadoop/.ssh/id_rsa.pub hadoop@hserver1   # copy the public key to every machine, including this one
[hadoop@hserver1 ~]$ ssh-copy-id -i /home/hadoop/.ssh/id_rsa.pub hadoop@hserver2
[hadoop@hserver1 ~]$ ssh-copy-id -i /home/hadoop/.ssh/id_rsa.pub hadoop@hserver3
Verify: ssh hadoop@hserver2                        # should log in without a password prompt
Repeat the same steps on the other machines.
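The three ssh-copy-id calls can also be wrapped in a small loop (a sketch; the host list is this guide's example cluster, and the key pair is assumed to exist already):

```shell
# Sketch: push the current user's public key to every cluster host.
# Assumes ssh-keygen has already been run and the hostnames resolve (step 1b).
distribute_key() {
    local host
    for host in hserver1 hserver2 hserver3; do
        ssh-copy-id -i "$HOME/.ssh/id_rsa.pub" "hadoop@$host"
    done
}
```

Each host still prompts for the hadoop password once; after that, logins are key-based.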
4. Move the Hadoop and Java packages into the hadoop user's home directory
Hadoop package used here: hadoop-2.9.2
Java package used here: jdk1.8.0_171
Fix the ownership, since the unpacked Hadoop and Java directories are likely not owned by the hadoop user:
[root@hserver1 ~]# chown -R hadoop:hadoop /home/hadoop/hadoop-2.9.2
[root@hserver1 ~]# chown -R hadoop:hadoop /home/hadoop/jdk1.8.0_171
5. Create the Hadoop data directories (all steps from here on are performed as the hadoop user, not root)
mkdir -p /home/hadoop/hadoop
mkdir -p /home/hadoop/hadoop/tmp
mkdir -p /home/hadoop/hadoop/var
mkdir -p /home/hadoop/hadoop/dfs
mkdir -p /home/hadoop/hadoop/dfs/name
mkdir -p /home/hadoop/hadoop/dfs/data
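Since mkdir -p creates any missing parent directories, the six commands can be collapsed into one; listing only the leaf paths is enough (a sketch using $HOME, which is /home/hadoop when run as the hadoop user):

```shell
# Equivalent one-liner: -p creates missing parents, so the base
# directory and dfs/ are created implicitly by their children.
mkdir -p "$HOME/hadoop/tmp" "$HOME/hadoop/var" "$HOME/hadoop/dfs/name" "$HOME/hadoop/dfs/data"
```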
6. Edit the Hadoop configuration files
a). core-site.xml
[hadoop@hserver1 hadoop-2.9.2]$ vim etc/hadoop/core-site.xml
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>/home/hadoop/hadoop/tmp</value>
<description>A base for other temporary directories.</description>
</property>
<property>
<name>fs.defaultFS</name>
<value>hdfs://hserver1:8888</value>
</property>
</configuration>
Note: fs.defaultFS is the current name of the deprecated fs.default.name key; both are accepted in 2.9.2.
b). hdfs-site.xml
[hadoop@hserver1 hadoop-2.9.2]$ vim etc/hadoop/hdfs-site.xml
<configuration>
<property>
<name>dfs.name.dir</name>
<value>/home/hadoop/hadoop/dfs/name</value>
</property>
<property>
<name>dfs.data.dir</name>
<value>/home/hadoop/hadoop/dfs/data</value>
<description>Comma separated list of paths on the local filesystem of a DataNode where it should store its blocks.</description>
</property>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
<property>
<name>dfs.permissions</name>
<value>false</value>
<description>Disable HDFS permission checking.</description>
</property>
<property>
<name>dfs.namenode.http-address</name>
<value>hserver1:8118</value>
</property>
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>hserver3:8119</value>
</property>
</configuration>
c). mapred-site.xml
[hadoop@hserver1 hadoop-2.9.2]$ vim etc/hadoop/mapred-site.xml
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>hserver1:49001</value>
</property>
<property>
<name>mapred.local.dir</name>
<value>/home/hadoop/hadoop/var</value>
</property>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
d).slaves
[hadoop@hserver1 hadoop-2.9.2]$ vim etc/hadoop/slaves
hserver1
hserver2
hserver3
e).yarn-site.xml
[hadoop@hserver1 hadoop-2.9.2]$ vim etc/hadoop/yarn-site.xml
<configuration>
<!-- Site specific YARN configuration properties -->
<property>
<name>yarn.resourcemanager.hostname</name>
<value>hserver1</value>
</property>
<property>
<description>The address of the applications manager interface in the RM.</description>
<name>yarn.resourcemanager.address</name>
<value>${yarn.resourcemanager.hostname}:8032</value>
</property>
<property>
<description>The address of the scheduler interface.</description>
<name>yarn.resourcemanager.scheduler.address</name>
<value>${yarn.resourcemanager.hostname}:8030</value>
</property>
<property>
<description>The http address of the RM web application.</description>
<name>yarn.resourcemanager.webapp.address</name>
<value>${yarn.resourcemanager.hostname}:8088</value>
</property>
<property>
<description>The https address of the RM web application.</description>
<name>yarn.resourcemanager.webapp.https.address</name>
<value>${yarn.resourcemanager.hostname}:8090</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>${yarn.resourcemanager.hostname}:8031</value>
</property>
<property>
<description>The address of the RM admin interface.</description>
<name>yarn.resourcemanager.admin.address</name>
<value>${yarn.resourcemanager.hostname}:8033</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.scheduler.maximum-allocation-mb</name>
<value>1024</value>
<description>The default is 8192 MB.</description>
</property>
<property>
<name>yarn.nodemanager.vmem-pmem-ratio</name>
<value>2.1</value>
</property>
<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>1024</value>
</property>
<property>
<name>yarn.nodemanager.vmem-check-enabled</name>
<value>false</value>
</property>
</configuration>
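So far the configuration exists only on hserver1, but every node needs identical files. One way to push the configured trees to the other hosts (a sketch, reusing this guide's host list and the passwordless SSH from step 3):

```shell
# Sketch: copy the configured hadoop-2.9.2 tree and the JDK to the
# remaining nodes. Relies on the passwordless SSH set up in step 3.
sync_hadoop() {
    local host
    for host in hserver2 hserver3; do
        scp -r /home/hadoop/hadoop-2.9.2 /home/hadoop/jdk1.8.0_171 "hadoop@$host:/home/hadoop/"
    done
}
```

The data directories from step 5 and the ~/.bashrc edits in step 7 still have to be created on each node.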
7. Add Hadoop and Java to the environment variables
[hadoop@hserver1 hadoop-2.9.2]$ vim ~/.bashrc
export JDK_ROOT=/home/hadoop/jdk1.8.0_171
export J2SDKDIR=${JDK_ROOT}
export J2REDIR=${JDK_ROOT}/jre
export JAVA_HOME=${JDK_ROOT}
export DERBY_HOME=${JDK_ROOT}/db
export HADOOP_HOME=/home/hadoop/hadoop-2.9.2
export PATH=${JDK_ROOT}/bin:${JDK_ROOT}/jre/bin:${JDK_ROOT}/db/bin:$PATH
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
#export MANPATH=${JDK_ROOT}/opt/sun-java8/man:$MANPATH
PATH=$PATH:$HOME/.local/bin:$HOME/bin
export PATH
To make the new variables available in the current shell, run source ~/.bashrc (make sure you are the hadoop user).
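A quick sanity check after sourcing (a sketch; it only verifies that the paths in ~/.bashrc point at the unpacked packages before formatting HDFS):

```shell
# Sketch: confirm the JAVA_HOME/HADOOP_HOME layout is usable.
# Prints OK only when both expected binaries exist and are executable.
check_env() {
    [ -x "$JAVA_HOME/bin/java" ]      || { echo "java not found under JAVA_HOME"; return 1; }
    [ -x "$HADOOP_HOME/bin/hadoop" ]  || { echo "hadoop not found under HADOOP_HOME"; return 1; }
    echo "OK"
}
```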
8. Format the HDFS NameNode
[hadoop@hserver1 hadoop-2.9.2]$ hadoop namenode -format    # hdfs namenode -format is the non-deprecated form
Expected output (abridged):
....
19/01/18 10:46:35 INFO namenode.FSImage: Allocated new BlockPoolId: BP-2072119921-10.58.107.38-1547779595732
19/01/18 10:46:35 INFO common.Storage: Storage directory /home/work/hadoop/dfs/name has been successfully formatted.
19/01/18 10:46:35 INFO namenode.FSImageFormatProtobuf: Saving image file /home/work/hadoop/dfs/name/current/fsimage.ckpt_0000000000000000000 using no compression
19/01/18 10:46:35 INFO namenode.FSImageFormatProtobuf: Image file /home/work/hadoop/dfs/name/current/fsimage.ckpt_0000000000000000000 of size 323 bytes saved in 0 seconds .
19/01/18 10:46:35 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
19/01/18 10:46:35 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at cq02-bda-advsvr06-02.cq02.baidu.com/10.58.107.38
************************************************************/
....
9. Start the Hadoop cluster
[hadoop@hserver1 hadoop-2.9.2]$ start-all.sh
10. Verify from the terminal
[hadoop@hserver1 ~]$ jps
120432 NodeManager
119714 DataNode
122194 Jps
119593 NameNode
120299 ResourceManager
Check each machine against the roles it should be running: for example, NameNode appears only on hserver1, while in this setup DataNode appears on every machine, because all three hosts were listed in slaves.
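Checking every node by hand gets tedious; a loop over the hosts does it in one go (a sketch, reusing this guide's host list and the passwordless SSH from step 3):

```shell
# Sketch: run jps on every cluster host over SSH and print the results,
# so the role distribution (NameNode, DataNode, ...) can be compared at a glance.
cluster_jps() {
    local host
    for host in hserver1 hserver2 hserver3; do
        echo "== $host =="
        ssh "hadoop@$host" jps
    done
}
```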
11. Verify from the web UI