Prerequisites:
3 virtual machines running CentOS 7
A JDK 8 installation package
A Hadoop 3.x installation package; this article uses 3.1.1 as an example:
http://mirror.bit.edu.cn/apache/hadoop/common/hadoop-3.1.1/hadoop-3.1.1.tar.gz
1. Static IPs, hostnames, and the hosts file
a. Assign static IP addresses to the 3 servers
vi /etc/sysconfig/network-scripts/ifcfg-eno16777728
TYPE=Ethernet
#obtain the IP address statically instead of via DHCP
BOOTPROTO=static
DEFROUTE=yes
PEERDNS=yes
PEERROUTES=yes
IPV4_FAILURE_FATAL=no
IPV6INIT=yes
IPV6_AUTOCONF=yes
IPV6_DEFROUTE=yes
IPV6_PEERDNS=yes
IPV6_PEERROUTES=yes
IPV6_FAILURE_FATAL=no
NAME=eno16777728
UUID=8bdd9f1d-d212-4adf-83e1-f9eb12e72b2a
DEVICE=eno16777728
#bring the interface up on boot
ONBOOT=yes
#IP address
IPADDR=192.168.100.21
#netmask
NETMASK=255.255.255.0
#gateway
GATEWAY=192.168.100.1
Edit the commented fields above on each machine. The last octet of the three machines' IPs should be 21, 22, and 23 respectively.
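After saving the file, the new address only takes effect once the network service is restarted (or the machine rebooted). A minimal check, assuming the standard CentOS 7 network service is in use:
#apply the static address
systemctl restart network
#confirm the interface now shows the expected IP
ip addr show eno16777728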
b. Permanently change each machine's hostname
Set 21 to master; set 22 and 23 to slave1 and slave2 respectively
vi /etc/sysconfig/network
Add the following two lines
#enable networking
NETWORKING=yes
#set the hostname
HOSTNAME=master
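Note that /etc/sysconfig/network is the CentOS 6 convention; on CentOS 7 the hostname is read from /etc/hostname, so it is safer to also set it the CentOS 7 way:
#writes /etc/hostname and applies the new name immediately
hostnamectl set-hostname master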
c. Edit the hosts file
vi /etc/hosts
Append the IPs of the three machines at the bottom
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
#hosts of the three machines
192.168.100.21 master
192.168.100.22 slave1
192.168.100.23 slave2
Do the same on machines 22 and 23, setting their hostnames to slave1 and slave2 respectively.
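A quick way to confirm that the hostnames resolve correctly, assuming all three machines are already up:
#each name should resolve to the IP configured above and answer
ping -c 1 master
ping -c 1 slave1
ping -c 1 slave2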
2. Disable SELinux and permanently disable the firewall
a. Check the SELinux status & disable it manually
sestatus
If the status is enabled, it needs to be disabled manually
SELinux status: enabled    #needs to be disabled
SELinuxfs mount: /sys/fs/selinux
SELinux root directory: /etc/selinux
Loaded policy name: targeted
Current mode: enforcing
Mode from config file: enforcing
Policy MLS status: enabled
Policy deny_unknown status: allowed
Max kernel policy version: 28
Disable it manually
vi /etc/sysconfig/selinux
Set SELINUX to disabled
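The setting in the config file only takes effect after a reboot; to also turn enforcement off for the current session and confirm the change:
#switch SELinux to permissive mode immediately (lasts until reboot)
setenforce 0
#"Current mode" should now report permissive
sestatus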
b. Permanently disable the firewall
Leaving the firewall on can cause all kinds of unexpected problems later; there are enough pitfalls already, so just turn it off.
#CentOS 7 only
systemctl disable firewalld.service
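disable only prevents firewalld from starting on the next boot; to stop the running service right away and confirm it is down:
systemctl stop firewalld.service
systemctl status firewalld.service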
3. Synchronize the server time
a. Fix the time zone
cp /usr/share/zoneinfo/Asia/Shanghai /etc/localtime
Press y to overwrite
b. Install the ntp synchronization tool
yum -y install ntp
c. Synchronize the server time
ntpdate ntp1.aliyun.com
d. Sync the system time to the hardware clock
hwclock --systohc
Reboot and verify that the changes have taken effect.
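A quick check that both clocks survived the reboot:
#system time and time zone
date
#hardware (BIOS) clock, should now match the system time
hwclock --show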
4. Create a user group and user, and set the user's password
a. Create the hadoop user group
groupadd hadoop
b. Create the hadoop user and add it to the group
useradd -g hadoop hadoop
c. Set the user's password
passwd hadoop
Test with su hadoop.
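To confirm the account and its group membership were created as intended:
#should report the hadoop user with the hadoop group
id hadoop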
5. Directory layout
Switch to the hadoop user and go to the /home/hadoop directory
a. Create the application installation directory
mkdir app
b. Create the temporary data directory under /home/hadoop (it will be used later for hadoop.tmp.dir)
mkdir -p data/tmp
6. Passwordless SSH between the servers
Stay logged in as the hadoop user
Check the current directory with pwd; it should be /home/hadoop
a. Create the .ssh directory
mkdir .ssh
b. Generate an SSH key pair
ssh-keygen -t rsa
Press Enter three times to accept the defaults.
Enter the .ssh directory and append the public key to the authorized_keys file
cat id_rsa.pub >> authorized_keys
c. Check the key contents
cat authorized_keys
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDTS2/7b0HpfCQe1rDD77hILNBPzgcdSGZFkGFPh5P+YciV6FYugEg272TUqlyUoOsOaQykJdiAtAW0MvH2iZqB1o8H+cyHt1+kFE87N9lOdT7V2dUkOs78EGz2XWm6QHO81MhbaFA8VOCry7ezXi4Gra2B5XEvTMn21zlIeFoF80RZtwjO7aMUk3s+l0Ji2qQYYMZ12Bl7E5jCrwrbwWpU+l9tdxVOByv0o38JR1XZoh7RRW1LwAZ2XXHzxhJTIbagckT1aqpxqF0BDNu5MFpPD6hc+8OVOIA0MjO4P/UDyGU0wjxEdAHnCzGOhb9yogQtQIwNdRqxRCILPplHYO19 hadoop@master
Return to the /home/hadoop directory
d. Set permissions
chmod 700 .ssh
chmod 600 .ssh/*
e. Install openssh-clients
yum -y install openssh-clients
Repeat the steps above on slave1 and slave2
f. Send the public keys of slave1 and slave2 to master
slave1:
cat ~/.ssh/id_rsa.pub | ssh hadoop@master 'cat >> ~/.ssh/authorized_keys'
slave2:
cat ~/.ssh/id_rsa.pub | ssh hadoop@master 'cat >> ~/.ssh/authorized_keys'
g. Copy the final authorized_keys file from master to slave1 and slave2
scp -r authorized_keys hadoop@slave1:~/.ssh/
scp -r authorized_keys hadoop@slave2:~/.ssh/
Test hopping between the machines:
ssh master
ssh slave1
ssh slave2
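A compact way to verify that every machine can reach the others without a password prompt (run it on each host):
#each line should print the remote hostname without asking for a password
for h in master slave1 slave2; do ssh $h hostname; done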
7. Install the JDK and configure the environment variables
Not covered in detail here.
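For completeness, a minimal sketch of this step with a JDK 8 tarball; the archive name is an assumption, and the /usr/local/jdk1.8.0_144 path matches the JAVA_HOME used later in hadoop-env.sh:
#unpack the JDK into /usr/local (archive name is an example)
tar -zxvf jdk-8u144-linux-x64.tar.gz -C /usr/local/
#add JAVA_HOME to the environment, e.g. in /etc/profile
export JAVA_HOME=/usr/local/jdk1.8.0_144
export PATH=$JAVA_HOME/bin:$PATH
#verify
java -version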
8. Install Hadoop
a. Download the Hadoop installation package
Go to the /home/hadoop/app directory
wget http://mirror.bit.edu.cn/apache/hadoop/common/hadoop-3.1.1/hadoop-3.1.1.tar.gz
b. Extract the package
tar -zxvf hadoop-3.1.1.tar.gz
c. Configure the Hadoop environment variables
HADOOP_HOME=/home/hadoop/app/hadoop-3.1.1
PATH=$JAVA_HOME/bin:$HADOOP_HOME/bin:$PATH
HADOOP_MAPRED_HOME=$HADOOP_HOME
export JAVA_HOME HADOOP_HOME CLASSPATH PATH
Reload the environment variables (e.g. with source /etc/profile, or log out and back in)
d. Check the Hadoop version
hadoop version
9. Modify the configuration
Go to the /home/hadoop/app/hadoop-3.1.1/etc/hadoop directory
a. Set the JDK that Hadoop runs with
vi hadoop-env.sh
export JAVA_HOME=/usr/local/jdk1.8.0_144
b. Edit the global configuration
vi core-site.xml
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://master:9000</value>
<description>The name of the default file system, on port 9000. (fs.default.name is the old, deprecated key.)</description>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/home/hadoop/data/tmp</value>
<description>A base for other temporary directories.</description>
</property>
</configuration>
c. Edit the HDFS configuration
vi hdfs-site.xml
<configuration>
<property>
<name>dfs.namenode.rpc-address</name>
<value>master:9000</value>
</property>
<property>
<name>dfs.namenode.http-address</name>
<value>master:50070</value>
</property>
<property>
<name>dfs.replication</name>
<value>2</value>
<description>Set to 1 for pseudo-distributed mode; with two DataNodes we use 2 here (the usual default for a distributed cluster is 3).</description>
</property>
</configuration>
vi mapred-site.xml
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>master:9001</value>
</property>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>yarn.app.mapreduce.am.env</name>
<value>HADOOP_MAPRED_HOME=$HADOOP_MAPRED_HOME</value>
</property>
<property>
<name>mapreduce.map.env</name>
<value>HADOOP_MAPRED_HOME=$HADOOP_MAPRED_HOME</value>
</property>
<property>
<name>mapreduce.reduce.env</name>
<value>HADOOP_MAPRED_HOME=$HADOOP_MAPRED_HOME</value>
</property>
</configuration>
vi yarn-site.xml
<configuration>
<!-- Site specific YARN configuration properties -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>master</value>
<description>resourcemanager</description>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
<description>Auxiliary service that runs on the NodeManager. It must be set to mapreduce_shuffle for MapReduce jobs to run.
</description>
</property>
<property>
<name>yarn.nodemanager.vmem-check-enabled</name>
<value>false</value>
<description>My test machines have little memory, so virtual-memory checking is disabled.</description>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>master:8025</value>
<description>The resource-tracker address of the RM, used by the NodeManagers.</description>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>master:8030</value>
<description>The scheduler address of the RM, used by the ApplicationMasters.</description>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>master:8050</value>
<description>The client-facing address of the RM.</description>
</property>
</configuration>
#In 3.x the file is named workers; in 2.x and earlier it was called slaves
vi workers
#datanodes
slave1
slave2
d. Sync the Hadoop installation to slave1 and slave2
Go to the /home/hadoop/app directory
scp -r ./hadoop-3.1.1 slave1:/home/hadoop/app/
scp -r ./hadoop-3.1.1 slave2:/home/hadoop/app/
Set up the environment variables on slave1 and slave2 as well
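If the HADOOP_HOME/PATH lines from step 8 were added to a per-user profile, they can simply be copied over as well; a sketch, assuming they live in ~/.bash_profile:
#copy the profile containing the JAVA_HOME/HADOOP_HOME exports to both slaves
scp ~/.bash_profile hadoop@slave1:~/
scp ~/.bash_profile hadoop@slave2:~/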
10. Start up and verify
a. Format the NameNode
Go to the /home/hadoop/app/hadoop-3.1.1/bin directory
./hdfs namenode -format
(In 3.x, hadoop namenode -format still works but is deprecated in favor of hdfs namenode -format.)
b. Start the cluster on the NameNode (master)
Go to the /home/hadoop/app/hadoop-3.1.1/sbin directory
./start-all.sh
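Before opening the web UIs, it is worth checking with jps which daemons actually came up; with the configuration above one would expect roughly:
#on master: NameNode, SecondaryNameNode, ResourceManager
jps
#on slave1 and slave2: DataNode, NodeManager
jps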
c. Verify the YARN environment
http://192.168.100.21:8088/cluster
d. Verify the HDFS environment
http://192.168.100.21:50070/dfshealth.html
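As a final sanity check, a small HDFS round trip confirms the DataNodes are really serving blocks; the /test path here is just an arbitrary example:
#create a directory, upload a local file, and list it back
hdfs dfs -mkdir /test
hdfs dfs -put /etc/hosts /test/
hdfs dfs -ls /test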
At this point, the Hadoop environment setup is complete.