Prepare 3 virtual machines, hadoop01, hadoop02 and hadoop03, all running CentOS 7
1. Change the hostname
vim /etc/sysconfig/network
NETWORKING=yes
HOSTNAME=hadoop01
NETWORKING_IPV6=no
PEERNTP=no
vim /etc/hostname
hadoop01
2. Map hostnames to IP addresses
vim /etc/hosts (on all 3 machines)
192.168.133.xxx hadoop01
192.168.133.xxx hadoop02
192.168.133.xxx hadoop03
Edit C:\Windows\System32\drivers\etc\hosts on the Windows host (so the cluster services can later be reached by hostname from the local machine)
192.168.133.xxx hadoop01
192.168.133.xxx hadoop02
192.168.133.xxx hadoop03
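The hosts mapping above can also be applied without opening an editor. The following is a minimal bash sketch, not part of the original procedure: the helper name add_host_entry is hypothetical, and the addresses in the usage comment are placeholders for the real 192.168.133.xxx values.

```shell
# add_host_entry FILE IP NAME  (hypothetical helper)
# Appends "IP NAME" to FILE unless a non-comment line already maps NAME.
add_host_entry() {
    local file="$1" ip="$2" name="$3"
    # Look for NAME as a whole field after the address column.
    if awk -v n="$name" '
        $1 !~ /^#/ { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
        END { exit !found }' "$file" 2>/dev/null; then
        return 0
    fi
    printf '%s %s\n' "$ip" "$name" >> "$file"
}

# Usage (placeholder addresses, run as root on each machine):
#   add_host_entry /etc/hosts 192.168.133.101 hadoop01
#   add_host_entry /etc/hosts 192.168.133.102 hadoop02
#   add_host_entry /etc/hosts 192.168.133.103 hadoop03
```

Because the helper skips names that are already mapped, it can be rerun safely on a machine that was configured earlier.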
3. Set a static IP (on all 3 machines)
vim /etc/sysconfig/network-scripts/ifcfg-ens33
TYPE="Ethernet"
PROXY_METHOD="none"
BROWSER_ONLY="no"
BOOTPROTO="static"
DEFROUTE="yes"
IPV4_FAILURE_FATAL="no"
IPV6INIT="yes"
IPV6_AUTOCONF="yes"
IPV6_DEFROUTE="yes"
IPV6_FAILURE_FATAL="no"
IPV6_ADDR_GEN_MODE="stable-privacy"
NAME="ens33"
UUID="bfaae4ba-2275-4a2c-85db-c94585096a42"
DEVICE="ens33"
ONBOOT="yes"
IPADDR="192.168.133.xxx"
NETMASK="255.255.255.0"
GATEWAY="192.168.133.x"
DNS1="192.168.133.x"
service network restart
4. Disable the firewall (and keep it disabled at boot)
Check status: systemctl status firewalld
Start: systemctl start firewalld.service
Restart: systemctl restart firewalld.service
Stop: systemctl stop firewalld.service
Disable at boot: systemctl disable firewalld.service
5. Disable SELinux
vim /etc/sysconfig/selinux
Set SELINUX=disabled
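The same SELinux change can be scripted. This is a sketch under assumptions: set_selinux_mode is a hypothetical helper name, and it writes the result back with cat rather than replacing the file, so that a symlinked path such as /etc/sysconfig/selinux keeps pointing at the real config file.

```shell
# set_selinux_mode FILE MODE  (hypothetical helper)
# Rewrites the SELINUX= line in FILE; SELINUXTYPE= and comments are untouched.
set_selinux_mode() {
    local file="$1" mode="$2" tmp rc
    tmp=$(mktemp) || return 1
    # Write via a temp file and copy the content back so symlinks survive.
    sed "s/^SELINUX=.*/SELINUX=${mode}/" "$file" > "$tmp" && cat "$tmp" > "$file"
    rc=$?
    rm -f "$tmp"
    return "$rc"
}

# Usage (as root):
#   set_selinux_mode /etc/sysconfig/selinux disabled
```

The new mode is read at boot, so it takes effect with the reboot in step 9.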
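6. Passwordless SSH
ssh-keygen -t rsa (on the master node; press Enter at every prompt)
ssh-copy-id hadoop01 (enter the password when prompted)
ssh-copy-id hadoop02 (enter the password when prompted)
ssh-copy-id hadoop03 (enter the password when prompted)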
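The three ssh-copy-id calls can be wrapped in a loop. A small bash sketch follows; the function name copy_key_to_hosts and the RUNNER variable are hypothetical, introduced only so the loop can be dry-run with echo before the real command is used.

```shell
# copy_key_to_hosts HOST...  (hypothetical helper)
# Runs ssh-copy-id for each host; set RUNNER=echo to preview without connecting.
copy_key_to_hosts() {
    local runner="${RUNNER:-ssh-copy-id}" host
    for host in "$@"; do
        "$runner" "$host" || { echo "key copy failed for $host" >&2; return 1; }
    done
}

# Usage:
#   RUNNER=echo copy_key_to_hosts hadoop01 hadoop02 hadoop03   # dry run
#   copy_key_to_hosts hadoop01 hadoop02 hadoop03               # prompts for each password
```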
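7. Raise the Linux open-file and process limits
Commands to inspect the limits
ulimit -a ## show all limits
ulimit -n ## maximum number of open files
ulimit -u ## maximum number of user processes
Changing the limits
vim /etc/security/limits.conf (add the following lines)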
* soft nofile 32768
* hard nofile 1048576
* soft nproc 65536
* hard nproc 65536
* soft memlock unlimited
* hard memlock unlimited
vim /etc/security/limits.d/90-nproc.conf (add the following line)
* soft nproc 65536
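Limits set in limits.conf apply to new login sessions, so it helps to verify them after logging in again. The check below is a sketch; limit_at_least is a hypothetical helper that compares a ulimit value against a required minimum and treats "unlimited" as satisfying any requirement.

```shell
# limit_at_least CURRENT REQUIRED  (hypothetical helper)
# Succeeds when CURRENT is "unlimited" or numerically >= REQUIRED.
limit_at_least() {
    local current="$1" required="$2"
    [ "$current" = "unlimited" ] && return 0
    [ "$required" = "unlimited" ] && return 1
    [ "$current" -ge "$required" ]
}

# Usage, in a fresh login shell:
#   limit_at_least "$(ulimit -n)" 32768 || echo "nofile soft limit too low"
#   limit_at_least "$(ulimit -u)" 65536 || echo "nproc soft limit too low"
```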
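8. Clock synchronization
Pick one machine as the time server: hadoop01
On hadoop01:
Edit the ntpd configuration:
vim /etc/ntp.conf (add the following lines; 127.127.1.0 is the address of the local clock driver)
server 127.127.1.0
fudge 127.127.1.0 stratum 8
Start the ntpd service: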
service ntpd restart
systemctl enable ntpd.service ## start the service at boot
Create the sync script:
vim /opt/date_sync.sh
#!/bin/bash
service ntpd stop
/usr/sbin/ntpdate -u hadoop01
service ntpd start
Make it executable:
chmod u+x /opt/date_sync.sh
Run the script:
cd /opt
./date_sync.sh
Copy it to the other machines:
scp date_sync.sh hadoop02:/opt
scp date_sync.sh hadoop03:/opt
Enable the scheduled job (all machines)
crontab -e
0-59/5 * * * * /opt/date_sync.sh
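The cron entry can also be installed without the interactive editor, which is convenient when repeating the step on every machine. The sketch below assumes a hypothetical filter named merge_cron_entry: it reads the current crontab on stdin and appends the entry only if an identical line is not already present.

```shell
# merge_cron_entry ENTRY  (hypothetical helper)
# Copies the crontab from stdin to stdout and adds ENTRY if it is missing.
merge_cron_entry() {
    local entry="$1" line found=0
    while IFS= read -r line || [ -n "$line" ]; do
        printf '%s\n' "$line"
        if [ "$line" = "$entry" ]; then found=1; fi
    done
    if [ "$found" -eq 0 ]; then printf '%s\n' "$entry"; fi
}

# Usage (on each machine):
#   crontab -l 2>/dev/null \
#     | merge_cron_entry '0-59/5 * * * * /opt/date_sync.sh' \
#     | crontab -
```

Running the usage line twice leaves a single entry, since the second pass finds the line already in place.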
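9. Reboot the machines
10. Cluster setup preparation (all machines)
Create two directories under /opt: softwares and modules
softwares holds all the installation packages
modules holds the extracted files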
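The directory layout can be created in one call. A minimal sketch, assuming the extraction directory is named modules (the name that the later tar and scp commands use); make_cluster_dirs is a hypothetical helper name.

```shell
# make_cluster_dirs [BASE]  (hypothetical helper)
# Creates BASE/softwares for packages and BASE/modules for extracted files.
make_cluster_dirs() {
    local base="${1:-/opt}"
    mkdir -p "$base/softwares" "$base/modules"
}

# Usage (as root, on every machine):
#   make_cluster_dirs /opt
```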
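11. Install the JDK (all machines)
Uninstall any JDK already on the system, then install the required JDK version
List the installed JDK packages
rpm -qa | grep java ## list installed packages whose names contain java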
java-1.7.0-openjdk-1.7.0.45-2.4.3.3.el6.x86_64
java-1.6.0-openjdk-1.6.0.0-1.66.1.13.0.el6.x86_64
tzdata-java-2013g-1.el6.noarch
Uninstall the JDK (use the package names reported on your own system)
rpm -e --nodeps python-javapackages-3.4.1-11.el7.noarch java-1.8.0-openjdk-1.8.0.161-2.b14.el7.x86_64
javassist-3.16.1-10.el7.noarch javamail-1.4.6-8.el7.noarch java-1.8.0-openjdk-headless-1.8.0.161-2.b14.el7.x86_64
tzdata-java-2018c-1.el7.noarch javapackages-tools-3.4.1-11.el7.noarch
Install the JDK
cd /opt/softwares/
rpm -ivh jdk-8u11-linux-x64.rpm
Configure the JAVA_HOME environment variable
vim /etc/profile
export JAVA_HOME=/usr/java/jdk1.8.0_11
export PATH=$PATH:$JAVA_HOME/bin (without this, the jps command is not available)
source /etc/profile
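After sourcing the profile it is worth confirming that the JDK bin directory really is on PATH before moving on. The check below is a sketch; dir_on_path is a hypothetical helper, and the directory in the usage comment is the rpm install location used in this step.

```shell
# dir_on_path DIR  (hypothetical helper)
# Succeeds when DIR is one of the colon-separated entries of PATH.
dir_on_path() {
    case ":${PATH}:" in
        *":$1:"*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Usage:
#   dir_on_path /usr/java/jdk1.8.0_11/bin && java -version && jps
```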
12. ZooKeeper-3.4.10 cluster installation
Use MySQL as the data store; the default is PostgreSQL
Extract:
tar -zxvf /opt/softwares/zookeeper-3.4.10.tar.gz -C /opt/modules/
cd /opt/modules/zookeeper-3.4.10/conf
cp zoo_sample.cfg zoo.cfg
Edit zoo.cfg
vim zoo.cfg
# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
# do not use /tmp for storage, /tmp here is just
# example sakes.
dataDir=/opt/modules/zookeeper-3.4.10/data
# the port at which the clients will connect
clientPort=2181
server.1=hadoop01:2888:3888
server.2=hadoop02:2888:3888
server.3=hadoop03:2888:3888
# the maximum number of client connections.
# increase this if you need to handle more clients
#maxClientCnxns=60
#
# Be sure to read the maintenance section of the
# administrator guide before turning on autopurge.
#
# http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance
#
# The number of snapshots to retain in dataDir
#autopurge.snapRetainCount=3
# Purge task interval in hours
# Set to "0" to disable auto purge feature
#autopurge.purgeInterval=1
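Set myid
Create a file named myid in /opt/modules/zookeeper-3.4.10/data (create the data directory first if it does not exist)
mkdir -p /opt/modules/zookeeper-3.4.10/data
touch /opt/modules/zookeeper-3.4.10/data/myid
Create the myid file on Linux itself; a file created in Notepad++ may end up with a broken encoding
Put the number that matches this host's server.N entry in the file, e.g. 1 on hadoop01
Distribute to the other nodes: copy the configured directory to the other machines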
scp -r /opt/modules/zookeeper-3.4.10/ root@hadoop02:/opt/modules/
scp -r /opt/modules/zookeeper-3.4.10/ root@hadoop03:/opt/modules/
Then change the content of myid to 2 on hadoop02 and 3 on hadoop03
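Since each myid must match the server.N line for that host in zoo.cfg, the number can be derived from the config instead of typed by hand. This is a sketch under assumptions: myid_for_host is a hypothetical helper, and it expects entries of the form server.N=host:port:port as in the zoo.cfg above.

```shell
# myid_for_host CFG HOST  (hypothetical helper)
# Prints N for the line "server.N=HOST:..." in CFG; fails if HOST is not listed.
myid_for_host() {
    local cfg="$1" host="$2"
    awk -F= -v h="$host" '
        /^server\.[0-9]+=/ {
            split($2, parts, ":")          # parts[1] is the hostname
            if (parts[1] == h) {
                sub(/^server\./, "", $1)   # keep only the numeric id
                print $1
                found = 1
                exit
            }
        }
        END { exit !found }' "$cfg"
}

# Usage (on each node, after the copy):
#   id=$(myid_for_host /opt/modules/zookeeper-3.4.10/conf/zoo.cfg "$(hostname)") \
#     && echo "$id" > /opt/modules/zookeeper-3.4.10/data/myid
```

Deriving the id this way avoids the common mistake of leaving the copied myid from hadoop01 unchanged on another node.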
Configure environment variables (in /etc/profile, as for the JDK)
export ZOOKEEPER_HOME=/opt/modules/zookeeper-3.4.10
export PATH=$PATH:$ZOOKEEPER_HOME/bin
Redirect log output to a specific directory:
In zkEnv.sh, change:
if [ "x${ZOO_LOG_DIR}" = "x" ]
then
ZOO_LOG_DIR="/opt/modules/zookeeper-3.4.10/log"
fi
if [ "x${ZOO_LOG4J_PROP}" = "x" ]
then
ZOO_LOG4J_PROP="INFO,ROLLINGFILE"
fi
In log4j.properties, change:
zookeeper.root.logger=INFO,ROLLINGFILE
Common commands
(1) Start ZooKeeper
zkServer.sh start
(2) Check status
zkServer.sh status
(3) Stop ZooKeeper
zkServer.sh stop
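Once all three nodes are started, a healthy ensemble should show one leader and two followers. The filter below is a sketch that pulls the role out of the status output; zk_mode is a hypothetical helper, and it assumes zkServer.sh status prints a line of the form "Mode: leader" or "Mode: follower".

```shell
# zk_mode  (hypothetical helper)
# Reads "zkServer.sh status" output on stdin and prints the value of the Mode line.
# Fails when no Mode line is present, e.g. when the server is not running.
zk_mode() {
    awk -F': *' '
        /^Mode:/ { print $2; found = 1; exit }
        END { exit !found }'
}

# Usage (relies on the passwordless SSH from step 6):
#   for h in hadoop01 hadoop02 hadoop03; do
#       printf '%s: ' "$h"
#       ssh "$h" /opt/modules/zookeeper-3.4.10/bin/zkServer.sh status 2>&1 | zk_mode \
#           || echo "not running"
#   done
```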
If startup fails with java.net.NoRouteToHostException: No route to host,
the firewall is usually still running
13. Set up Hadoop cluster HA
Extract:
tar -zxvf /opt/softwares/hadoop-2.7.7.tar.gz -C /opt/modules/
Edit hadoop-env.sh
export JAVA_HOME=/usr/java/jdk1.8.0_11
Edit core-site.xml
<configuration&