1. Minimal installation of CentOS 7
1.1 Install net-tools (to enable ifconfig) and vim
yum install net-tools vim
1.2 Update the system
yum update
1.3 Give the system a static IP
1.3.1 List the network interfaces (the ifcfg-enp* files are the interface configs)
ls /etc/sysconfig/network-scripts/
1.3.2 Configure the interface (in VirtualBox, attach a host-only adapter and give it a static IP)
vi /etc/sysconfig/network-scripts/ifcfg-enp*
# Enable the host-only adapter by cloning the existing config
cd /etc/sysconfig/network-scripts/
cp ifcfg-enp0s3 ifcfg-enp0s8
Edit the new file to use a static IP (a sample file follows this list):
1. Change BOOTPROTO to static
2. Change NAME to enp0s8
3. Change UUID to any value that differs from the original
4. Add IPADDR; pick an address yourself, it is how the host reaches the VM.
5. Add NETMASK=255.255.255.0 (the standard mask for a /24 host-only network)
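A sketch of what ifcfg-enp0s8 might contain after these edits; the IPADDR here matches node1's address used throughout this guide, and DEVICE/ONBOOT are extra fields you will typically also want to set:
TYPE=Ethernet
BOOTPROTO=static
NAME=enp0s8
DEVICE=enp0s8
ONBOOT=yes
UUID=...   # any value different from the one in ifcfg-enp0s3 (e.g. generated with uuidgen)
IPADDR=192.168.56.11
NETMASK=255.255.255.0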
1.3.3 Restart the network service
service network restart
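To confirm the new address took effect (ifconfig comes from the net-tools package installed in 1.1):
ifconfig enp0s8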
1.3.4 Set the hostname (this can also be done during installation)
vim /etc/hostname
1.4 Configure hosts so machines can be reached by name
vim /etc/hosts
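Add an entry for each node. A sketch, assuming the host-only addresses used throughout this guide:
192.168.56.11 node1
192.168.56.12 node2
192.168.56.13 node3
192.168.56.14 node4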
# Copy the file to the other machines
scp /etc/hosts root@192.168.56.12:/etc/hosts
1.5 Set up passwordless SSH login by generating the key files
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 0600 ~/.ssh/authorized_keys
# Append the public key to a remote server
cat ~/.ssh/id_rsa.pub | ssh root@192.168.56.101 "cat - >> ~/.ssh/authorized_keys"
# For mutual passwordless login, also run:
scp .ssh/authorized_keys root@192.168.56.14:~/.ssh/authorized_keys
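To verify, log in by hostname; it should not prompt for a password:
ssh root@node2 hostname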
2. Install the JDK
2.1 Download the JDK (download link)
2.2 Put the downloaded JDK under /opt and unpack it
cd /opt/
tar -xzvf server-jre-8u161-linux-x64.tar.gz
# Create a symlink
ln -sf jdk1.8.0_161/ jdk
2.3 Add the JDK to the environment variables
vim /etc/profile
# Add the following lines
export JAVA_HOME=/opt/jdk
export PATH=.:$PATH:$JAVA_HOME/bin
# Apply the changes
source /etc/profile
2.4 Verify that the JDK is installed
java -version
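If the PATH is correct, the output should look similar to this (exact build numbers may differ):
java version "1.8.0_161"
Java(TM) SE Runtime Environment (build 1.8.0_161-b12)
Java HotSpot(TM) 64-Bit Server VM (build 25.161-b12, mixed mode)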
3. Install Hadoop
3.1 Download Hadoop (download link)
3.2 Put the downloaded archive under /opt
# 1. Unpack Hadoop
tar -xzvf hadoop-3.0.0.tar.gz
# 2. Create a symlink
ln -sf hadoop-3.0.0 hadoop
3.3 Install ZooKeeper
3.3.1 Download ZooKeeper (download link)
3.3.2 Copy ZooKeeper to the machines that need it
scp /opt/zookeeper-3.4.11.tar.gz node2:/opt/
3.3.3 Unpack ZooKeeper
tar -xzvf zookeeper-3.4.11.tar.gz
3.3.4 Create the symlink
ln -sf zookeeper-3.4.11 zookeeper
3.3.5 Configure environment variables
vim /etc/profile
# Add the following lines (no spaces around the equals sign)
export ZOOKEEPER_HOME=/opt/zookeeper
export PATH=$PATH:$ZOOKEEPER_HOME/bin
3.3.6 Configure the ZooKeeper ensemble by editing the config file
cp /opt/zookeeper/conf/zoo_sample.cfg /opt/zookeeper/conf/zoo.cfg
# Append the following to the end of zoo.cfg (node2, node3 and node4 are server hostnames)
# Details: http://zookeeper.apache.org/doc/r3.4.11/zookeeperStarted.html#sc_RunningReplicatedZooKeeper
tickTime=2000
# Data directory (zoo.cfg does not allow comments after a value, so keep this on its own line)
dataDir=/opt/data/zookeeper
clientPort=2181
initLimit=5
syncLimit=2
server.1=node2:2888:3888
server.2=node3:2888:3888
server.3=node4:2888:3888
3.3.7 Copy the config file to the other nodes
scp /opt/zookeeper/conf/zoo.cfg node2:/opt/zookeeper/conf/
3.3.8 Create the node ID: add a myid file in the configured dataDir; the value must match that host's server.N entry in zoo.cfg
echo "1" > /opt/data/zookeeper/myid   # on node2 (server.1)
3.3.9 Start ZooKeeper (its bin directory is already on the PATH)
zkServer.sh start
3.3.10 Check that it started
jps
If the output lists a QuorumPeerMain process, ZooKeeper started successfully.
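Sample jps output (the PIDs will differ):
1562 QuorumPeerMain
1620 Jps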
3.3.11 (Optional) Configure ZooKeeper to start on boot on CentOS 7
- Create a unit file named zookeeper.service under /etc/systemd/system/ with the following content:
[Unit]
Description=zookeeper
After=syslog.target network.target
[Service]
Type=forking
# ZooKeeper log directory; can also be set in zkServer.sh
Environment=ZOO_LOG_DIR=/opt/data/zookeeper/logs
# JDK path; can also be set in zkServer.sh
Environment=JAVA_HOME=/opt/jdk
ExecStart=/opt/zookeeper/bin/zkServer.sh start
ExecStop=/opt/zookeeper/bin/zkServer.sh stop
Restart=always
User=root
Group=root
[Install]
WantedBy=multi-user.target
- Reload systemd
systemctl daemon-reload
- Start ZooKeeper
systemctl start zookeeper
- Enable start on boot
systemctl enable zookeeper
- Check ZooKeeper's status
systemctl status zookeeper
Problem:
nohup: failed to run command 'java': No such file or directory
Solution:
The java binary cannot be found. Point ZooKeeper at the JDK, either in zkServer.sh:
JAVA_HOME=/opt/jdk
or in zookeeper.service:
Environment=JAVA_HOME=/opt/jdk
3.4 Configure Hadoop (fully distributed)
References:
1. Hadoop HDFS HA configuration: http://hadoop.apache.org/docs/r3.0.0/hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithQJM.html
2. Hadoop YARN HA configuration: http://hadoop.apache.org/docs/r3.0.0/hadoop-yarn/hadoop-yarn-site/ResourceManagerHA.html
3.4.1 Configure the Hadoop environment variables
# Add the Hadoop environment variables to /etc/profile (no spaces around the equals sign)
export HADOOP_HOME=/opt/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
# Apply the changes
source /etc/profile
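A quick check that the variables are in effect:
hadoop version   # should report Hadoop 3.0.0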
3.4.2 The Hadoop roles are distributed across the nodes as follows:
Node | NN | DN | ZK | ZKFC | JN | RM | NM |
---|---|---|---|---|---|---|---|
Node1 | 1 | | | 1 | | 1 | |
Node2 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
Node3 | | 1 | 1 | | 1 | | 1 |
Node4 | | 1 | 1 | | 1 | | 1 |
ZooKeeper was already set up above, so it is not configured again here.
3.4.3 Edit the Hadoop environment file hadoop-env.sh
# Set the Java and Hadoop homes (no spaces around the equals sign)
export JAVA_HOME=/opt/jdk
export HADOOP_HOME=/opt/hadoop
3.4.4 Configure highly available HDFS, following the official documentation
- Configure hdfs-site.xml as follows:
<configuration>
<property>
<!-- Logical name of the nameservice; any name will do -->
<name>dfs.nameservices</name>
<value>hbzx</value>
</property>
<property>
<!-- Disable permission checking -->
<name>dfs.permissions.enabled</name>
<value>false</value>
</property>
<property>
<!-- IDs of the NameNodes, comma-separated -->
<name>dfs.ha.namenodes.hbzx</name>
<value>nn1,nn2</value>
</property>
<property>
<!-- dfs.namenode.rpc-address.[nameservice ID].[name node ID]: host and RPC port of this NameNode -->
<name>dfs.namenode.rpc-address.hbzx.nn1</name>
<value>node1:9820</value>
</property>
<property>
<!-- dfs.namenode.rpc-address.[nameservice ID].[name node ID]: host and RPC port of this NameNode -->
<name>dfs.namenode.rpc-address.hbzx.nn2</name>
<value>node2:9820</value>
</property>
<property>
<!-- dfs.namenode.http-address.[nameservice ID].[name node ID]: HTTP port this NameNode listens on -->
<name>dfs.namenode.http-address.hbzx.nn1</name>
<value>node1:9870</value>
</property>
<property>
<!-- dfs.namenode.http-address.[nameservice ID].[name node ID]: HTTP port this NameNode listens on -->
<name>dfs.namenode.http-address.hbzx.nn2</name>
<value>node2:9870</value>
</property>
<property>
<!-- Shared edits directory: the JournalNode hosts and their ports -->
<name>dfs.namenode.shared.edits.dir</name>
<value>qjournal://node2:8485;node3:8485;node4:8485/hbzx</value>
</property>
<property>
<!-- Failover proxy provider for the HA NameNodes -->
<name>dfs.client.failover.proxy.provider.hbzx</name>
<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<property>
<!-- Fence the failed NameNode over passwordless SSH -->
<name>dfs.ha.fencing.methods</name>
<value>sshfence</value>
</property>
<property>
<name>dfs.ha.fencing.ssh.private-key-files</name>
<value>/root/.ssh/id_rsa</value>
</property>
<property>
<!-- Where the JournalNodes store their data -->
<name>dfs.journalnode.edits.dir</name>
<value>/opt/data/journal/node/local/data</value>
</property>
<property>
<!-- Enable automatic NameNode failover -->
<name>dfs.ha.automatic-failover.enabled</name>
<value>true</value>
</property>
</configuration>
- Configure core-site.xml
<configuration>
<property>
<!-- Default (highly available) filesystem URI for Hadoop clients -->
<name>fs.defaultFS</name>
<value>hdfs://hbzx</value>
</property>
<property>
<!-- Base directory for Hadoop data; the NameNode and DataNode directories default to paths under it. Do not use a file:/ prefix, a plain absolute path is enough.
NameNode default: file://${hadoop.tmp.dir}/dfs/name
DataNode default: file://${hadoop.tmp.dir}/dfs/data
-->
<name>hadoop.tmp.dir</name>
<value>/opt/data/hadoop/</value>
</property>
<property>
<!-- The nodes hosting the ZooKeeper quorum -->
<name>ha.zookeeper.quorum</name>
<value>node2:2181,node3:2181,node4:2181</value>
</property>
</configuration>
- Configure yarn-site.xml (for further options see the official documentation)
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.env-whitelist</name>
<value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
</property>
<property>
<!-- Enable ResourceManager high availability -->
<name>yarn.resourcemanager.ha.enabled</name>
<value>true</value>
</property>
<property>
<!-- Unique identifier of the cluster -->
<name>yarn.resourcemanager.cluster-id</name>
<value>hbzx</value>
</property>
<property>
<!-- ResourceManager ID -->
<name>yarn.resourcemanager.ha.rm-ids</name>
<value>rm1,rm2</value>
</property>
<property>
<!-- Node hosting this ResourceManager -->
<name>yarn.resourcemanager.hostname.rm1</name>
<value>node1</value>
</property>
<property>
<!-- Node hosting this ResourceManager -->
<name>yarn.resourcemanager.hostname.rm2</name>
<value>node2</value>
</property>
<property>
<!-- HTTP address this ResourceManager's web UI listens on -->
<name>yarn.resourcemanager.webapp.address.rm1</name>
<value>node1:8088</value>
</property>
<property>
<!-- HTTP address this ResourceManager's web UI listens on -->
<name>yarn.resourcemanager.webapp.address.rm2</name>
<value>node2:8088</value>
</property>
<property>
<!-- The nodes hosting the ZooKeeper quorum -->
<name>yarn.resourcemanager.zk-address</name>
<value>node2:2181,node3:2181,node4:2181</value>
</property>
<property>
<!-- Enable automatic detection of node memory and CPU; the minimum memory is 1 GB -->
<name>yarn.nodemanager.resource.detect-hardware-capabilities</name>
<value>true</value>
</property>
</configuration>
- Configure mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
- Copy the configuration files to the other machines
scp ./* node4:/opt/hadoop/etc/hadoop/
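To push the configuration to every node in one go, a sketch (relies on the hosts entries and passwordless SSH set up earlier):
for n in node2 node3 node4; do
  scp /opt/hadoop/etc/hadoop/* $n:/opt/hadoop/etc/hadoop/
done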
3.5 Start HDFS
3.5.1 Start ZooKeeper first (on node2, node3 and node4)
zkServer.sh start
3.5.2 Format ZooKeeper from one of the NameNode hosts
hdfs zkfc -formatZK
If the command completes without errors, the format succeeded.
3.5.3 Start the JournalNodes; this must be done on every JournalNode host
hdfs --daemon start journalnode
Use the jps command to confirm that a JournalNode process is running on each host.
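To start them all from one machine, a sketch (uses the full path to the hdfs script so it works even when /etc/profile is not sourced in non-interactive shells):
for n in node2 node3 node4; do
  ssh $n "/opt/hadoop/bin/hdfs --daemon start journalnode"
done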
3.5.4 Format the NameNode
hdfs namenode -format
# With multiple nameservices, pass the one to format: hdfs namenode -format xxx
If no Error lines are logged, the format succeeded.
3.5.5 Start this NameNode so the other NameNode can sync from it
hdfs --daemon start namenode
Use jps to check that it started.
3.5.6 Sync the other NameNode
1. If the NameNodes are configured for high availability, run the following on the NameNode that needs syncing:
hdfs namenode -bootstrapStandby
2. If they are not configured for high availability, sync with:
hdfs namenode -initializeSharedEdits
3.5.7 Configure the DataNodes
Edit the workers file (/opt/hadoop/etc/hadoop/workers) and add the DataNode hostnames:
node2
node3
node4
3.5.8 Start HDFS
start-dfs.sh
Check with jps on each node that the processes from the role table above (NameNode, DataNode, JournalNode, DFSZKFailoverController) are running.
Then open HDFS in a browser:
http://192.168.56.11:9870
4. Configure log aggregation and the JobHistory Server
4.1 In yarn-site.xml, configure the ResourceManager web address
<property>
<name>yarn.resourcemanager.webapp.address</name>
<value>rmhost:8088</value>
</property>
4.2 In mapred-site.xml, configure the JobHistory Server
<property>
<name>mapreduce.jobhistory.address</name>
<value>rmhost:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>rmhost:19888</value>
</property>
<property>
<name>mapreduce.jobhistory.intermediate-done-dir</name>
<value>/mr-history/tmp</value>
</property>
<property>
<name>mapreduce.jobhistory.done-dir</name>
<value>/mr-history/done</value>
</property>
Note: the JobHistory Server must be started separately:
mapred --daemon start historyserver
4.3 In yarn-site.xml, configure log aggregation
<!-- Enable log aggregation -->
<property>
<name>yarn.log-aggregation-enable</name>
<value>true</value>
</property>
<!-- Directory for the aggregated logs (a path on the default filesystem) -->
<property>
<name>yarn.nodemanager.remote-app-log-dir</name>
<value>/user/container/logs</value>
</property>
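Once aggregation is enabled, the logs of a finished application can be fetched from any node; the application ID (a placeholder below) comes from the ResourceManager web UI or from yarn application -list:
yarn logs -applicationId <applicationId>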
Troubleshooting
1. zkfc format error
java.net.NoRouteToHostException: No route to host
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1141)
2018-02-06 11:34:01,218 ERROR ha.ActiveStandbyElector: Connection timed out: couldn't connect to ZooKeeper in 5000 milliseconds
2018-02-06 11:34:01,461 INFO zookeeper.ClientCnxn: Opening socket connection to server node2/192.168.56.12:2181. Will not attempt to authenticate using SASL (unknown error)
Solution:
Stop the firewall and keep it from starting at boot:
systemctl stop firewalld.service # stop firewalld
systemctl disable firewalld.service # disable firewalld at boot
2. Formatting the NameNode fails; it keeps retrying the connection
Log output:
2018-02-06 11:43:58,061 INFO ipc.Client: Retrying connect to server: node2/192.168.56.12:8485. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2018-02-06 11:43:58,062 INFO ipc.Client: Retrying connect to server: node4/192.168.56.14:8485. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2018-02-06 11:43:58,062 INFO ipc.Client: Retrying connect to server: node3/192.168.56.13:8485. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
Solution:
Start the JournalNodes first; they must be started on every JournalNode host:
hdfs --daemon start journalnode
Then confirm with jps that a JournalNode process is running on each host.
3. HDFS startup error
Starting namenodes on [node1 node2]
ERROR: Attempting to operate on hdfs namenode as root
ERROR: but there is no HDFS_NAMENODE_USER defined. Aborting operation.
Starting datanodes
ERROR: Attempting to operate on hdfs datanode as root
ERROR: but there is no HDFS_DATANODE_USER defined. Aborting operation.
Starting journal nodes [node2 node3 node4]
ERROR: Attempting to operate on hdfs journalnode as root
ERROR: but there is no HDFS_JOURNALNODE_USER defined. Aborting operation.
Starting ZK Failover Controllers on NN hosts [node1 node2]
ERROR: Attempting to operate on hdfs zkfc as root
ERROR: but there is no HDFS_ZKFC_USER defined. Aborting operation.
Solution:
Add the following at the top of start-dfs.sh and stop-dfs.sh:
# Note: no spaces around the equals sign
HDFS_NAMENODE_USER=root
HDFS_DATANODE_USER=root
HDFS_JOURNALNODE_USER=root
HDFS_ZKFC_USER=root
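Alternatively (not tested here), the same variables can be exported once in /opt/hadoop/etc/hadoop/hadoop-env.sh instead of patching the start/stop scripts:
export HDFS_NAMENODE_USER=root
export HDFS_DATANODE_USER=root
export HDFS_JOURNALNODE_USER=root
export HDFS_ZKFC_USER=root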
4. YARN startup error
Starting resourcemanager
ERROR: Attempting to operate on yarn resourcemanager as root
ERROR: but there is no YARN_RESOURCEMANAGER_USER defined. Aborting operation.
Starting nodemanagers
ERROR: Attempting to operate on yarn nodemanager as root
ERROR: but there is no YARN_NODEMANAGER_USER defined. Aborting operation.
Solution:
Add the following at the top of start-yarn.sh (and stop-yarn.sh):
# Note: no spaces around the equals sign
YARN_RESOURCEMANAGER_USER=root
YARN_NODEMANAGER_USER=root
5. NodeManager startup error
2018-02-06 15:22:36,169 ERROR org.apache.hadoop.yarn.server.nodemanager.NodeManager: Error starting NodeManager
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.net.ConnectException: Your endpoint configuration is wrong; For more details see: http://wiki.apache.org/hadoop/UnsetHostnameOrPort
at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.serviceStart(NodeStatusUpdaterImpl.java:259)
at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:121)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceStart(NodeManager.java:451)
at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:834)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:894)
Solution:
Let the NodeManager detect memory and CPU automatically by adding the following to yarn-site.xml:
<property>
<!-- Enable automatic detection of node memory and CPU -->
<name>yarn.nodemanager.resource.detect-hardware-capabilities</name>
<value>true</value>
</property>
6. NodeManager starts and then immediately exits
2018-02-06 16:50:31,210 ERROR org.apache.hadoop.yarn.server.nodemanager.NodeManager: Error starting NodeManager
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Received SHUTDOWN signal from Resourcemanager, Registration of NodeManager failed, Message from ResourceManager: NodeManager from node4 doesn't satisfy minimum allocations, Sending SHUTDOWN signal to the NodeManager. Node capabilities are <memory:256, vCores:1>; minimums are 1024mb and 1 vcores
at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.serviceStart(NodeStatusUpdaterImpl.java:259)
at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:121)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceStart(NodeManager.java:451)
at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:834)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:894)
Caused by: org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Received SHUTDOWN signal from Resourcemanager, Registration of NodeManager failed, Message from ResourceManager: NodeManager from node4 doesn't satisfy minimum allocations, Sending SHUTDOWN signal to the NodeManager. Node capabilities are <memory:256, vCores:1>; minimums are 1024mb and 1 vcores
at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.registerWithRM(NodeStatusUpdaterImpl.java:375)
at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.serviceStart(NodeStatusUpdaterImpl.java:253)
... 6 more
Solution: give the node more resources; a NodeManager requires at least 1024 MB of memory and 1 CPU core.
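If the VM cannot be given more memory, a possible workaround (a sketch, not tested here) is to lower the scheduler's minimum allocation in yarn-site.xml on the ResourceManager nodes, so that small nodes still satisfy it:
<property>
<!-- Let nodes with little memory register; pair with a matching vcores setting if needed -->
<name>yarn.scheduler.minimum-allocation-mb</name>
<value>256</value>
</property>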
7. HDFS safe mode is on
Solution (use hdfs dfsadmin; the older hadoop dfsadmin form is deprecated):
hdfs dfsadmin -safemode leave
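To check the current state first:
hdfs dfsadmin -safemode get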