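I. What Is Hadoop
Hadoop is a distributed-systems infrastructure developed by the Apache Software Foundation: a software framework for distributed processing of large volumes of data.
II. Deployment
1. Pre-deployment preparation
- Plan the cluster servers, set each hostname, and add the name-to-address mappings to /etc/hosts
# Set the hostname
[root@localhost opt]# hostnamectl set-hostname {new-hostname}
# Address mappings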
[root@cathlhdp1 opt]# more /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.31.59 cathlhdp1
192.168.31.60 cathlhdp2
192.168.21.167 cathlhdp3
# Address mappings
[root@cathlhdp2 opt]# more /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.31.59 cathlhdp1
192.168.31.60 cathlhdp2
192.168.21.167 cathlhdp3
# Address mappings
[root@cathlhdp3 opt]# more /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.31.59 cathlhdp1
192.168.31.60 cathlhdp2
192.168.21.167 cathlhdp3
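To confirm that a node's /etc/hosts carries all three mappings, a small check along these lines may help. This is only a sketch: missing_hosts is a hypothetical helper name, not a standard tool, and it inspects the file only rather than testing real name resolution.

```shell
# Print every cluster hostname that has no entry in a hosts-format file.
# Usage: missing_hosts FILE NAME...
# (missing_hosts is a hypothetical helper, named here for illustration.)
missing_hosts() {
  local file=$1
  shift
  local name
  for name in "$@"; do
    # Skip comment lines, then look for the name in any column after the address.
    awk -v n="$name" '
      /^[[:space:]]*#/ { next }
      { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
      END { exit !found }
    ' "$file" || echo "$name"
  done
}
```

Run it on each node, for example `missing_hosts /etc/hosts cathlhdp1 cathlhdp2 cathlhdp3`; no output means every name has an entry.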
- Disable the firewall
# Disable the firewall and SELinux
[root@localhost ~]# systemctl disable firewalld
[root@localhost ~]# systemctl stop firewalld
# Set SELinux to disabled
[root@localhost ~]# vi /etc/selinux/config
SELINUX=disabled
[root@localhost ~]# setenforce 0
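The edit to /etc/selinux/config only takes effect after a reboot; setenforce 0 switches the running system to permissive mode in the meantime. To double-check what the file now says, something like the following could be used. It is a sketch: configured_selinux_mode is a hypothetical helper name, and the default path assumes a RHEL/CentOS-style layout.

```shell
# Print the SELINUX= value configured in the given file.
# configured_selinux_mode is a hypothetical helper; the default path is an assumption.
configured_selinux_mode() {
  # Match only the SELINUX= line (not SELINUXTYPE=) and keep the last occurrence.
  sed -n 's/^SELINUX=\([a-z]*\).*/\1/p' "${1:-/etc/selinux/config}" | tail -n 1
}
```

Comparing its output with `getenforce` shows whether the running mode and the configured mode agree.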
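- Install JDK 1.8
- Set up passwordless SSH login to the local machine, and make sure every machine can log in to every other machine without a password
# Generate the key pair for passwordless login on cathlhdp1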
[root@cathlhdp1 ~]# ssh-keygen -t rsa -P ""
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa):
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
71:54:30:f2:2a:c9:74:b2:2a:0c:25:af:e3:f1:25:01 root@cathlhdp1
The key's randomart image is:
+--[ RSA 2048]----+
| . +o. |
| + . |
|.E. o o o |
| +. o = + |
|. .. = S |
| + .. . |
|o.o... |
|..o.o |
| . . |
+-----------------+
[root@cathlhdp1 ~]# cd .ssh
[root@cathlhdp1 .ssh]# ls
id_rsa id_rsa.pub
[root@cathlhdp1 .ssh]# cat id_rsa.pub >> authorized_keys
[root@cathlhdp1 .ssh]# cp id_rsa.pub id_rsa_cathlhdp1.pub
# Generate the key pair for passwordless login on cathlhdp2
[root@cathlhdp2 ~]# ssh-keygen -t rsa -P ""
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa):
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
71:54:30:f2:2a:c9:74:b2:2a:0c:25:af:e3:f1:25:01 root@cathlhdp1
The key's randomart image is:
+--[ RSA 2048]----+
| . +o. |
| + . |
|.E. o o o |
| +. o = + |
|. .. = S |
| + .. . |
|o.o... |
|..o.o |
| . . |
+-----------------+
[root@cathlhdp2 ~]# cd .ssh
[root@cathlhdp2 .ssh]# ls
id_rsa id_rsa.pub
[root@cathlhdp2 .ssh]# cat id_rsa.pub >> authorized_keys
[root@cathlhdp2 .ssh]# cp id_rsa.pub id_rsa_cathlhdp2.pub
# Copy id_rsa_cathlhdp1.pub to /root/.ssh on cathlhdp2, then run
[root@cathlhdp2 .ssh]# cat id_rsa_cathlhdp1.pub >> authorized_keys
# Verify passwordless login: answer yes if prompted; no password should be requested
[root@cathlhdp2 .ssh]# ssh 192.168.31.59
The authenticity of host '192.168.31.59 (192.168.31.59)' can't be established.
ECDSA key fingerprint is 33:17:fe:0a:dc:dd:7e:92:9f:d5:a7:a0:8b:9c:f9:c8.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '192.168.31.59' (ECDSA) to the list of known hosts.
Last login: Tue Aug 3 22:46:46 2021 from localhost
[root@cathlhdp1 ~]#
# Copy id_rsa_cathlhdp2.pub to /root/.ssh on cathlhdp1, then run
[root@cathlhdp1 .ssh]# cat id_rsa_cathlhdp2.pub >> authorized_keys
# Verify passwordless login: answer yes if prompted; no password should be requested
[root@cathlhdp1 .ssh]# ssh 192.168.31.60
Last failed login: Tue Aug 3 22:49:16 EDT 2021 from cathlhdp1 on ssh:notty
There were 2 failed login attempts since the last successful login.
Last login: Tue Aug 3 22:47:38 2021 from localhost
[root@cathlhdp2 ~]#
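The manual login test above can be repeated for every node in one pass. The sketch below assumes the host names from /etc/hosts; check_passwordless and the SSH_BIN override are hypothetical names introduced here, the latter only so the function can be exercised with a stub instead of a real ssh.

```shell
# Report, per host, whether key-based login works without any prompt.
# BatchMode=yes makes ssh fail instead of asking for a password.
# check_passwordless and SSH_BIN are hypothetical names used for illustration.
check_passwordless() {
  local host failed=0
  for host in "$@"; do
    if "${SSH_BIN:-ssh}" -o BatchMode=yes -o ConnectTimeout=5 "$host" true </dev/null; then
      echo "ok   $host"
    else
      echo "FAIL $host"
      failed=1
    fi
  done
  # Non-zero exit status if any host refused key-based login.
  return "$failed"
}
```

For example, `check_passwordless cathlhdp1 cathlhdp2 cathlhdp3` run on each node. A host that has never been contacted may still fail here until its host key has been accepted once, as in the transcript above.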
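2. Download the installation package
Download location: https://archive.apache.org/dist/hadoop/common/
Download hadoop-2.7.3.tar.gz and extract it
3. Deploy and install
For Hadoop cluster deployment, see: https://www.cnblogs.com/LHWorldBlog/p/8232935.html
For YARN cluster deployment, see: https://www.cnblogs.com/LHWorldBlog/p/8219453.html
III. Troubleshooting
- DataNode fails to start
After starting Hadoop with the start-dfs.sh script, checking the DataNode hosts with the jps command showed that a DataNode process had failed to start. The log reported: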
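The download and extraction can be scripted roughly as follows. This is a sketch under assumptions: hadoop_tarball_url is a hypothetical helper, the per-release subdirectory layout of the archive is assumed rather than taken from these notes, and /opt is chosen as the target only because the paths in the logs later in this document start with /opt/hadoop-2.7.3.

```shell
# Build the archive URL for a given Hadoop release.
# Assumption: releases live under <base>/hadoop-<version>/hadoop-<version>.tar.gz.
# hadoop_tarball_url is a hypothetical helper name.
hadoop_tarball_url() {
  printf 'https://archive.apache.org/dist/hadoop/common/hadoop-%s/hadoop-%s.tar.gz\n' "$1" "$1"
}

# Example (not executed by this block): download, then unpack under /opt.
#   curl -fLO "$(hadoop_tarball_url 2.7.3)"
#   tar -xzf hadoop-2.7.3.tar.gz -C /opt
```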
2021-08-05 03:30:04,886 WARN org.apache.hadoop.hdfs.server.common.Storage: Failed to add storage directory [DISK]file:/opt/hadoop-2.7.3/hdfs/ha/dfs/data/
java.io.IOException: Incompatible clusterIDs in /opt/hadoop-2.7.3/hdfs/ha/dfs/data: namenode clusterID = CID-e6c3999d-0727-4bfd-8c73-0fa0889051db; datanode clusterID = CID-d51af208-8afb-422d-bb53-6a0c77510d2c
at org.apache.hadoop.hdfs.server.datanode.DataStorage.doTransition(DataStorage.java:775)
at org.apache.hadoop.hdfs.server.datanode.DataStorage.loadStorageDirectory(DataStorage.java:300)
at org.apache.hadoop.hdfs.server.datanode.DataStorage.loadDataStorage(DataStorage.java:416)
at org.apache.hadoop.hdfs.server.datanode.DataStorage.addStorageLocations(DataStorage.java:395)
at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:573)
at org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:1362)
at org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:1327)
at org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:317)
at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:223)
at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:802)
at java.lang.Thread.run(Thread.java:748)
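Root cause analysis: The log first warns "Failed to add storage directory [DISK]file:/opt/hadoop-2.7.3/hdfs/ha/dfs/data/", meaning that the storage directory at "/opt/hadoop-2.7.3/hdfs/ha/dfs/data/" could not be added. It then throws a java.io.IOException whose message says that the NameNode and DataNode clusterIDs do not match. That mismatch is what leads to the later initialization failure, the DataNode process exiting, and the related warnings. Many blog posts attribute this to formatting the NameNode more than once and suggest editing the clusterID and namespaceID so that the NameNode and DataNode agree. The NameNode had indeed been formatted twice here, but the same error persisted after that change. Comparing the "/tmp/hadoop-hadoop/dfs/data/current/VERSION" files showed that the clusterID values already matched, as did the other fields.
Solution: Delete the "/tmp/hadoop-hadoop/dfs/data/current/VERSION" file, then restart Hadoop.
Summary: The problem comes from formatting the NameNode more than once. Each format makes the NameNode generate a new clusterID and namespaceID, so the IDs on the master no longer match those stored on the DataNodes. When Hadoop is started after a format, it tries to create a new current directory; because one already exists, the creation fails, the DataNode process on that node fails to start, and the cluster does not come up completely. Deleting the current directory on the affected DataNode (which also discards the block data stored there) therefore resolves the problem.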
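Comparing the IDs by hand is error-prone, so a helper like the one below may be useful. It is a sketch: cluster_id and same_cluster_id are hypothetical names, and it assumes the VERSION file is a plain key=value properties file containing a clusterID= line.

```shell
# Print the clusterID recorded in a VERSION file.
# Assumption: VERSION is a key=value file with a line starting with "clusterID=".
cluster_id() {
  sed -n 's/^clusterID=//p' "$1"
}

# Succeed only if two VERSION files record the same clusterID.
same_cluster_id() {
  [ "$(cluster_id "$1")" = "$(cluster_id "$2")" ]
}
```

Point it at current/VERSION under the NameNode directory (dfs.namenode.name.dir) and under the DataNode directory (dfs.datanode.data.dir). The directory to check on the DataNode is the one named in the exception message, which in the log above is /opt/hadoop-2.7.3/hdfs/ha/dfs/data.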
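- YARN application stays in the ACCEPTED state
Root cause analysis: The Capacity Scheduler is the default resource scheduler in YARN. Its maximum-am-resource-percent parameter is the upper limit on the fraction of cluster resources that may be used to run ApplicationMasters, and it is typically used to limit the number of concurrently active applications. The parameter is a float and defaults to 0.1, meaning 10%. The limit for all queues is set with yarn.scheduler.capacity.maximum-am-resource-percent (which acts as the default), and an individual queue can override it with yarn.scheduler.capacity.<queue-path>.maximum-am-resource-percent.
Solution: In etc/hadoop/capacity-scheduler.xml, change the value of yarn.scheduler.capacity.maximum-am-resource-percent from its default of 0.1; 0.5 is the recommended value.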
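To see why the default can leave applications waiting, a back-of-the-envelope estimate helps. The sketch below simply multiplies total cluster memory by the percentage; am_budget_mb is a hypothetical helper, the memory figures are made up for illustration, and the real scheduler also accounts for queue capacities, user limits, and allocation rounding, so treat the result as indicative only.

```shell
# Rough memory budget (in MB) available to ApplicationMasters:
# total cluster memory * maximum-am-resource-percent, truncated to an integer.
# am_budget_mb is a hypothetical helper; it ignores queue capacities and user limits.
am_budget_mb() {
  awk -v total="$1" -v pct="$2" 'BEGIN { printf "%d\n", total * pct }'
}
```

With an illustrative 24576 MB cluster, `am_budget_mb 24576 0.1` gives roughly 2.4 GB for all ApplicationMasters combined, while `am_budget_mb 24576 0.5` gives 12288 MB. After editing capacity-scheduler.xml, the change can usually be applied with `yarn rmadmin -refreshQueues` or by restarting the ResourceManager.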