Setting up a Hadoop 3.2.1 environment
Machine preparation: all nodes use a preconfigured Docker base image, sweetfly123/baseubuntu:18.04v5
Server  OS                  Memory  IP            Hostname  JDK            Hadoop
node1   Ubuntu 18.04.2 LTS  8 GB    10.101.18.21  zoo1      JDK 1.8.0_222  hadoop-3.2.1
node2   Ubuntu 18.04.2 LTS  8 GB    10.101.18.8   zoo2      JDK 1.8.0_222  hadoop-3.2.1
node3   Ubuntu 18.04.2 LTS  8 GB    10.101.18.24  zoo3      JDK 1.8.0_222  hadoop-3.2.1
Hosts configuration
Edit the hosts file on all three servers:
vim /etc/hosts
# Add the following lines, using your own servers' IPs
10.101.18.21 zoo1
10.101.18.8 zoo2
10.101.18.24 zoo3
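If this step is repeated, the hosts file can collect duplicate lines. In a Docker container it may also be rewritten when the container restarts. A small idempotent helper avoids both problems. This is a minimal bash sketch: the function name ensure_hosts_entries is my own, and the IP/hostname pairs are the ones from the table above.

```bash
#!/usr/bin/env bash
# Cluster host mappings from the table above; adjust to your own servers.
CLUSTER_HOSTS=(
  "10.101.18.21 zoo1"
  "10.101.18.8 zoo2"
  "10.101.18.24 zoo3"
)

# ensure_hosts_entries FILE: append every entry of CLUSTER_HOSTS that is not
# already present in FILE, so the function is safe to run more than once.
ensure_hosts_entries() {
  local file="$1" entry
  for entry in "${CLUSTER_HOSTS[@]}"; do
    # -x: whole-line match; -F: fixed string, so the dots are not regex wildcards
    if ! grep -qxF "$entry" "$file" 2>/dev/null; then
      echo "$entry" >> "$file"
    fi
  done
}

# Usage (as root, on each of the three nodes):
#   ensure_hosts_entries /etc/hosts
```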
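Hadoop setup
We first download the Hadoop package on the Master node (zoo1) and edit its configuration. Then we copy it to the other (Slave) nodes, which need only minor changes.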
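- Download the installation package and create the Hadoop directory
# Download
wget http://apache.claz.org/hadoop/common/hadoop-3.2.1/hadoop-3.2.1.tar.gz
# Extract into /usr/local
sudo tar -xzvf hadoop-3.2.1.tar.gz -C /usr/local
cd /usr/local
# Change ownership of the Hadoop files. I did not create a dedicated user and run Hadoop as root, so this step is commented out
# sudo chown -R ubuntu:ubuntu hadoop-3.2.1
# Rename the directory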
sudo mv hadoop-3.2.1 hadoop
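Before extracting, it is worth checking that the download is intact. Apache publishes a SHA-512 digest next to each release tarball (hadoop-3.2.1.tar.gz.sha512). The mirror above may no longer carry 3.2.1; older releases are kept on the Apache archive (archive.apache.org). The helper below is a minimal sketch. It assumes the checksum file contains the digest as one contiguous 128-hex-digit string. If the published file splits the digest into groups, compare it by eye instead.

```bash
# verify_sha512 TARBALL CHECKSUM_FILE: succeed only if the SHA-512 of TARBALL
# equals the first 128-hex-digit string found in CHECKSUM_FILE.
# Assumption: the digest is written as one unbroken hex string (either the
# "<hex>  name" or the "SHA512 (name) = <hex>" layout works).
verify_sha512() {
  local expected actual
  expected=$(grep -oE '[0-9a-fA-F]{128}' "$2" | head -n1 | tr 'A-F' 'a-f')
  actual=$(sha512sum "$1" | cut -d' ' -f1)
  [ -n "$expected" ] && [ "$expected" = "$actual" ]
}

# Usage (checksum file downloaded next to the tarball):
#   verify_sha512 hadoop-3.2.1.tar.gz hadoop-3.2.1.tar.gz.sha512 && echo "checksum OK"
```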
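- Configure the Hadoop environment variables on the Master node
As with the JDK environment variables, edit .profile in your home directory and add the Hadoop environment variables: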
export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$HADOOP_HOME/bin
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
Run source ~/.profile so the changes take effect immediately.
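The same three lines are needed on every node, so a helper that appends them only once can be reused on the slaves later. This is a minimal sketch; ensure_hadoop_profile is a hypothetical name. The quoted heredoc delimiter keeps $PATH and $HADOOP_HOME literal, so they are expanded at login time rather than when the script runs.

```bash
# ensure_hadoop_profile FILE: append the Hadoop variables to FILE
# (e.g. ~/.profile) unless a HADOOP_HOME export is already there.
ensure_hadoop_profile() {
  local file="$1"
  if grep -q '^export HADOOP_HOME=' "$file" 2>/dev/null; then
    return 0   # already configured; do nothing
  fi
  # Quoted 'EOF': no expansion here, the lines are written verbatim
  cat >> "$file" <<'EOF'
export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$HADOOP_HOME/bin
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
EOF
}

# Usage:
#   ensure_hadoop_profile ~/.profile && source ~/.profile
```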
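- Configure the Master node
Every Hadoop component is configured through XML files, all of which live in the /usr/local/hadoop/etc/hadoop directory:
- core-site.xml: common properties, such as the I/O settings shared by HDFS and MapReduce
- hdfs-site.xml: configuration of the HDFS daemons: the namenode, the secondary namenode and the datanodes
- mapred-site.xml: configuration of the MapReduce daemons
- yarn-site.xml: resource scheduling (YARN) configuration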
a. Edit core-site.xml as follows. fs.defaultFS uses the hostname of the master node, i.e. zoo1:
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>file:/usr/local/hadoop/tmp</value>
<description>Abase for other temporary directories.</description>
</property>
<property>
<name>fs.defaultFS</name>
<value>hdfs://zoo1:9000</value>
</property>
</configuration>
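Parameters:
- fs.defaultFS: the default file system; HDFS clients need it to reach HDFS
- hadoop.tmp.dir: the base directory for Hadoop's temporary data; other directories are derived from it. Point it at a location with enough free space rather than the default under /tmp
If hadoop.tmp.dir is not set, Hadoop falls back to /tmp/hadoop-${user.name}. /tmp is typically cleared on reboot, so anything stored there is lost, and the NameNode then has to be formatted again before Hadoop will start.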
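fs.defaultFS is the only value in this file that depends on the cluster. Writing the file from a small script keeps the master hostname in one place. This is a sketch under that assumption: write_core_site is a hypothetical helper, it reproduces the two properties above, and it leaves out the description element.

```bash
# write_core_site CONF_DIR MASTER_HOST: write CONF_DIR/core-site.xml with
# hadoop.tmp.dir and fs.defaultFS as described above.
# The unquoted EOF lets ${master} expand inside the heredoc.
write_core_site() {
  local conf="$1" master="$2"
  cat > "$conf/core-site.xml" <<EOF
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>file:/usr/local/hadoop/tmp</value>
  </property>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://${master}:9000</value>
  </property>
</configuration>
EOF
}

# Usage on the master:
#   write_core_site /usr/local/hadoop/etc/hadoop zoo1
```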
b. Edit hdfs-site.xml as follows:
<configuration>
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
<property>
<name>dfs.name.dir</name>
<value>/usr/local/hadoop/hdfs/name</value>
</property>
<property>
<name>dfs.data.dir</name>
<value>/usr/local/hadoop/hdfs/data</value>
</property>
</configuration>
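Parameters:
- dfs.replication: the number of replicas kept for each data block. The workers file lists only two DataNodes, so three replicas cannot be placed and blocks will be reported as under-replicated; 2 matches this cluster
- dfs.name.dir: the directory where the namenode stores its files (deprecated name; the current one is dfs.namenode.name.dir, and Hadoop 3 still accepts the old name)
- dfs.data.dir: the directory where each datanode stores its files (deprecated name; the current one is dfs.datanode.data.dir)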
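A quick sanity check can compare dfs.replication with the number of hosts in the workers file. On a node with Hadoop installed, `hdfs getconf -confKey dfs.replication` is the reliable way to read the effective value. The sketch below (check_replication is a hypothetical name) only parses the file. It assumes <name> and <value> sit on consecutive lines, as in the hdfs-site.xml above, and falls back to Hadoop's default of 3 when the property is absent.

```bash
# check_replication CONF_DIR: print "ok", or a warning and return 1 when
# dfs.replication exceeds the number of hosts listed in CONF_DIR/workers.
# Simplified parsing: expects <value> on the line right after <name>.
check_replication() {
  local conf="$1" repl workers
  repl=$(grep -A1 '<name>dfs.replication</name>' "$conf/hdfs-site.xml" \
         | sed -n 's:.*<value>\([0-9]*\)</value>.*:\1:p')
  repl=${repl:-3}   # Hadoop's default replication when the property is unset
  # Count non-blank lines in the workers file
  workers=$(grep -c '[^[:space:]]' "$conf/workers")
  if [ "$repl" -gt "$workers" ]; then
    echo "dfs.replication=$repl exceeds $workers DataNode(s)"
    return 1
  fi
  echo "ok"
}

# Usage on the master:
#   check_replication /usr/local/hadoop/etc/hadoop
```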
c. Edit mapred-site.xml as follows:
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.application.classpath</name>
<value>$HADOOP_HOME/share/hadoop/mapreduce/*:$HADOOP_HOME/share/hadoop/mapreduce/lib/*</value>
</property>
</configuration>
d. Edit yarn-site.xml as follows:
<configuration>
<!-- Site specific YARN configuration properties -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>zoo1</value>
</property>
<property>
<name>yarn.nodemanager.env-whitelist</name>
<value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_HOME</value>
</property>
</configuration>
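e. Edit workers as follows. List one worker hostname per line; every name must resolve on all nodes (e.g. via /etc/hosts). In this write-up the workers are named slave1 and slave2 (10.101.18.8 and 10.101.18.24 in the report further below). If you use the zoo2/zoo3 names from the hosts file, list those instead: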
slave1
slave2
Configuring the worker nodes
- Configure the Slave nodes
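Package the configured Hadoop directory on the Master node and send it to the other two nodes:
# Package hadoop (-C /usr/local stores entries as hadoop/..., so extracting with -C /usr/local recreates /usr/local/hadoop)
tar -czf hadoop.tar.gz -C /usr/local hadoop
# Copy it to the other two nodes
scp hadoop.tar.gz ubuntu@slave1:~
scp hadoop.tar.gz ubuntu@slave2:~
On the other nodes, extract the Hadoop package into /usr/local: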
sudo tar -xzvf hadoop.tar.gz -C /usr/local/
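The archive layout matters here. Packing with `tar -czf hadoop.tar.gz /usr/local/hadoop` stores entries as usr/local/hadoop/..., so extracting with -C /usr/local would create /usr/local/usr/local/hadoop. Packing with -C /usr/local stores them as hadoop/... instead. A minimal sketch that packs this way and checks the top-level directory before copying (both function names are hypothetical):

```bash
# pack_hadoop PARENT_DIR OUT_TARBALL: archive PARENT_DIR/hadoop with entries
# stored relative to PARENT_DIR, i.e. as hadoop/...
pack_hadoop() {
  tar -czf "$2" -C "$1" hadoop
}

# tarball_top_dir TARBALL: print the first path component of the first entry
tarball_top_dir() {
  tar -tzf "$1" | head -n1 | cut -d/ -f1
}

# Usage on the master (worker names and user as in the text):
#   pack_hadoop /usr/local /tmp/hadoop.tar.gz
#   [ "$(tarball_top_dir /tmp/hadoop.tar.gz)" = hadoop ] || echo "unexpected layout"
#   for h in slave1 slave2; do scp /tmp/hadoop.tar.gz ubuntu@"$h":~; done
```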
Configure the Hadoop environment variables on slave1 and slave2 (in ~/.profile, as on the master):
export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$HADOOP_HOME/bin
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
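Starting the cluster
- Format the HDFS file system
In the Hadoop directory on the Master node, run the following (the older form bin/hadoop namenode -format also works but is deprecated):
bin/hdfs namenode -format
This formats the NameNode. Run it once, before starting the services for the first time, and not again afterwards. Re-formatting wipes the NameNode metadata and assigns a new clusterID, after which the existing DataNodes refuse to register until their data directories are cleared.
Log excerpt (the fifth line, "... has been successfully formatted", indicates success):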
2019-11-11 13:34:18,960 INFO util.GSet: VM type = 64-bit
2019-11-11 13:34:18,960 INFO util.GSet: 0.029999999329447746% max memory 1.7 GB = 544.5 KB
2019-11-11 13:34:18,961 INFO util.GSet: capacity = 2^16 = 65536 entries
2019-11-11 13:34:18,994 INFO namenode.FSImage: Allocated new BlockPoolId: BP-2017092058-10.101.18.21-1573450458983
2019-11-11 13:34:19,010 INFO common.Storage: Storage directory /usr/local/hadoop/hdfs/name has been successfully formatted.
2019-11-11 13:34:19,051 INFO namenode.FSImageFormatProtobuf: Saving image file /usr/local/hadoop/hdfs/name/current/fsimage.ckpt_0000000000000000000 using no compression
2019-11-11 13:34:19,186 INFO namenode.FSImageFormatProtobuf: Image file /usr/local/hadoop/hdfs/name/current/fsimage.ckpt_0000000000000000000 of size 401 bytes saved in 0 seconds .
2019-11-11 13:34:19,207 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
2019-11-11 13:34:19,214 INFO namenode.FSImage: FSImageSaver clean checkpoint: txid=0 when meet shutdown.
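Instead of reading the log by eye, the success line can be checked in a script. A minimal sketch: format_succeeded is a hypothetical name, and it only looks for the "has been successfully formatted" message shown in the excerpt above.

```bash
# format_succeeded LOGFILE: succeed if the NameNode format log contains the
# "Storage directory ... has been successfully formatted." message.
format_succeeded() {
  grep -q 'Storage directory .* has been successfully formatted\.' "$1"
}

# Usage (in /usr/local/hadoop on the master; the log goes to stderr, hence 2>&1):
#   bin/hdfs namenode -format 2>&1 | tee /tmp/format.log
#   format_succeeded /tmp/format.log && echo "format OK"
```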
- Start the Hadoop cluster
sbin/start-all.sh
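Problems encountered during startup and their fixes:
a. Error: master: rcmd: socket: Permission denied
Fix:
Run echo "ssh" > /etc/pdsh/rcmd_default as root. When pdsh is installed, the start scripts use it to reach the other nodes, and on Ubuntu it defaults to rsh; this makes it use ssh instead.
b. Error: JAVA_HOME is not set and could not be found.
Fix:
Edit etc/hadoop/hadoop-env.sh on all three nodes and add the Java environment variable below:
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
c. Notes up front:
1. On both the master and the slave nodes, the four files start-dfs.sh, stop-dfs.sh, start-yarn.sh and stop-yarn.sh need to be modified.
2. If Hadoop is started by a different user, replace root with that user.
After HDFS is formatted, starting DFS fails with the following errors: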
[root@master sbin]# ./start-dfs.sh
Starting namenodes on [master]
ERROR: Attempting to operate on hdfs namenode as root
ERROR: but there is no HDFS_NAMENODE_USER defined. Aborting operation.
Starting datanodes
ERROR: Attempting to operate on hdfs datanode as root
ERROR: but there is no HDFS_DATANODE_USER defined. Aborting operation.
Starting secondary namenodes [slave1]
ERROR: Attempting to operate on hdfs secondarynamenode as root
ERROR: but there is no HDFS_SECONDARYNAMENODE_USER defined. Aborting operation.
In /usr/local/hadoop/sbin, add the following parameters at the top of start-dfs.sh and stop-dfs.sh:
#!/usr/bin/env bash
HDFS_DATANODE_USER=root
HADOOP_SECURE_DN_USER=hdfs
HDFS_NAMENODE_USER=root
HDFS_SECONDARYNAMENODE_USER=root
Likewise, add the following at the top of start-yarn.sh and stop-yarn.sh, above the existing license header:
#!/usr/bin/env bash
YARN_RESOURCEMANAGER_USER=root
HADOOP_SECURE_DN_USER=yarn
YARN_NODEMANAGER_USER=root
# Licensed to the Apache Software Foundation (ASF) under one or more
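An alternative that avoids patching four scripts on every node is to export the same user variables once in etc/hadoop/hadoop-env.sh. As far as I know, the Hadoop 3 start/stop scripts source that file before checking these variables. A minimal sketch, assuming Hadoop runs as root as in this setup:

```bash
# Appended to $HADOOP_HOME/etc/hadoop/hadoop-env.sh on every node
# (assumption: Hadoop is started as root; change root to your user otherwise).
export HDFS_NAMENODE_USER=root
export HDFS_DATANODE_USER=root
export HDFS_SECONDARYNAMENODE_USER=root
export YARN_RESOURCEMANAGER_USER=root
export YARN_NODEMANAGER_USER=root
```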
- Check the running processes with jps
Output on the Master node:
19557 ResourceManager
19914 Jps
19291 SecondaryNameNode
18959 NameNode
Output on a Slave node:
18580 NodeManager
18366 DataNode
18703 Jps
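The expected daemons can be checked automatically by comparing jps output against a list of names. A minimal sketch: missing_daemons is a hypothetical name. It reads jps output on stdin and prints each expected daemon that is not running, so empty output means everything is up.

```bash
# missing_daemons NAME...: read `jps` output ("<pid> <name>" per line) on
# stdin and print every NAME that does not appear as a process name.
missing_daemons() {
  local out name
  out=$(awk '{print $2}')   # keep only the process-name column
  for name in "$@"; do
    if ! grep -qx "$name" <<<"$out"; then
      echo "$name"
    fi
  done
}

# Usage on the master:  jps | missing_daemons NameNode SecondaryNameNode ResourceManager
# Usage on a worker:    jps | missing_daemons DataNode NodeManager
```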
- Check the Hadoop cluster status
hdfs dfsadmin -report
Output:
Configured Capacity: 41258442752 (38.42 GB)
Present Capacity: 5170511872 (4.82 GB)
DFS Remaining: 5170454528 (4.82 GB)
DFS Used: 57344 (56 KB)
DFS Used%: 0.00%
Replicated Blocks:
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
Missing blocks (with replication factor 1): 0
Low redundancy blocks with highest priority to recover: 0
Pending deletion blocks: 0
Erasure Coded Block Groups:
Low redundancy block groups: 0
Block groups with corrupt internal blocks: 0
Missing block groups: 0
Low redundancy blocks with highest priority to recover: 0
Pending deletion blocks: 0
-------------------------------------------------
Live datanodes (2):
Name: 10.101.18.24:9866 (slave2)
Hostname: slave2
Decommission Status : Normal
Configured Capacity: 20629221376 (19.21 GB)
DFS Used: 28672 (28 KB)
Non DFS Used: 16919797760 (15.76 GB)
DFS Remaining: 3692617728 (3.44 GB)
DFS Used%: 0.00%
DFS Remaining%: 17.90%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Mon Nov 11 15:00:27 CST 2019
Last Block Report: Mon Nov 11 14:05:48 CST 2019
Num of Blocks: 0
Name: 10.101.18.8:9866 (slave1)
Hostname: slave1
Decommission Status : Normal
Configured Capacity: 20629221376 (19.21 GB)
DFS Used: 28672 (28 KB)
Non DFS Used: 19134578688 (17.82 GB)
DFS Remaining: 1477836800 (1.38 GB)
DFS Used%: 0.00%
DFS Remaining%: 7.16%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Mon Nov 11 15:00:24 CST 2019
Last Block Report: Mon Nov 11 13:53:57 CST 2019
Num of Blocks: 0
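In a script, the most useful figure in this report is usually the "Live datanodes (N):" line. A minimal sketch that extracts N (live_datanodes is a hypothetical name):

```bash
# live_datanodes: read `hdfs dfsadmin -report` output on stdin and print the
# number from the "Live datanodes (N):" line.
live_datanodes() {
  sed -n 's/^Live datanodes (\([0-9][0-9]*\)):.*$/\1/p'
}

# Usage on the master (this setup expects 2 workers):
#   [ "$(hdfs dfsadmin -report 2>/dev/null | live_datanodes)" -eq 2 ] && echo "all DataNodes up"
```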
- Stop Hadoop
sbin/stop-all.sh
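Installing Hive
https://www.cnblogs.com/hankleo/p/10703641.html
Initializing the Hive metastore in MySQL
Analyzing the error reported during initialization:
1. java.lang.NoSuchMethodError
Possible causes: (1) the system cannot find the required jar;
(2) several versions of the same jar are present, and the wrong one gets loaded
2. com.google.common.base.Preconditions.checkArgument
A quick search shows that this class comes from the Guava jar (guava.jar)
3. Check which version of this jar Hadoop and Hive each ship
hadoop-3.2.1 (path: hadoop/share/hadoop/common/lib) contains guava-27.0-jre.jar
hive-2.3.6 (path: hive/lib) contains guava-14.0.1.jar
4. Solution
Delete the older guava-14.0.1.jar from Hive and copy Hadoop's guava-27.0-jre.jar into Hive's lib directory.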
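The jar swap can be scripted so that it does not depend on the exact version numbers. This is a minimal sketch with a few assumptions. swap_guava is a hypothetical name, and it takes the Hadoop and Hive installation directories as arguments. It renames Hive's old jar to *.bak instead of deleting it, assuming Hive only puts *.jar files from lib on its classpath.

```bash
# swap_guava HADOOP_HOME HIVE_HOME: replace Hive's bundled Guava jar(s) with
# the Guava jar from Hadoop's share/hadoop/common/lib.
swap_guava() {
  local hadoop_lib="$1/share/hadoop/common/lib" hive_lib="$2/lib" jar
  local src=("$hadoop_lib"/guava-*.jar)
  # An unmatched glob stays literal, so check that the file really exists
  if [ ! -e "${src[0]}" ]; then
    echo "no guava jar under $hadoop_lib" >&2
    return 1
  fi
  # Rename (rather than delete) the old jar(s) so the change is easy to undo
  for jar in "$hive_lib"/guava-*.jar; do
    [ -e "$jar" ] || continue
    mv "$jar" "$jar.bak"
  done
  cp "${src[0]}" "$hive_lib"/
}

# Usage (paths assumed; adjust to where Hive is installed):
#   swap_guava /usr/local/hadoop /usr/local/hive
```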