Hadoop Single-Node Pseudo-Distributed Deployment
Environment:
Ubuntu 18.04 VM
Hadoop 3.2.1
JDK 8
1. Download Hadoop 3.2.1 from the Tsinghua mirror and unpack it
mkdir /var/lib/hadoop
cd /var/lib/hadoop
wget https://mirrors.tuna.tsinghua.edu.cn/apache/hadoop/common/hadoop-3.2.1/hadoop-3.2.1.tar.gz
tar -zvxf hadoop-3.2.1.tar.gz
2. Create the user and group
# create the hadoop user
sudo adduser hadoop
# create the hadoop group
sudo addgroup hadoop
# add the hadoop user to the hadoop group
sudo gpasswd -a hadoop hadoop
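A quick way to verify the membership afterwards is `id -nG`; a minimal sketch (demonstrated against the current user and its primary group, since the hadoop user only exists after the commands above have run):

```shell
# Check that a user belongs to a group via `id -nG`.
in_group() {  # usage: in_group USER GROUP
  id -nG "$1" 2>/dev/null | tr ' ' '\n' | grep -qx "$2"
}

# Intended check after the commands above:
#   in_group hadoop hadoop && echo "hadoop is in group hadoop"
# Demonstrated with the current user and its primary group:
in_group "$(id -un)" "$(id -gn)" && echo "membership confirmed"
```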
3. Edit the configuration files
# edit .bashrc
nano ~/.bashrc
# add the following lines at the top of the file
export HADOOP_HOME=/var/lib/hadoop/hadoop-3.2.1
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib/native"
# reload .bashrc
source ~/.bashrc
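If you script this step, it helps to guard against appending the block twice. A sketch, run here against a scratch file (point BASHRC at "$HOME/.bashrc" for real use; only the two key exports are shown):

```shell
# Demo against a scratch file; use BASHRC="$HOME/.bashrc" for real use.
BASHRC=/tmp/bashrc-demo; rm -f "$BASHRC"

append_exports() {
  # skip if the block is already present (makes the script idempotent)
  grep -q 'HADOOP_HOME=/var/lib/hadoop/hadoop-3.2.1' "$BASHRC" 2>/dev/null && return 0
  cat >> "$BASHRC" <<'EOF'
export HADOOP_HOME=/var/lib/hadoop/hadoop-3.2.1
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
EOF
}

append_exports
append_exports                           # second call is a no-op
grep -c 'export HADOOP_HOME=' "$BASHRC"  # prints 1, not 2
```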
# locate the OpenJDK installation
ls -lrt /usr/bin/java
# /usr/bin/java -> /etc/alternatives/java
ls -lrt /etc/alternatives/java
# /etc/alternatives/java -> /usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java
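Chasing the symlink chain by hand can also be scripted. A sketch that derives JAVA_HOME from a resolved java path with pure string handling (shown on the path found above; the function name is ours):

```shell
# Strip the trailing /bin/java and an optional /jre to get JAVA_HOME.
derive_java_home() {
  p=${1%/bin/java}
  printf '%s\n' "${p%/jre}"
}

# Real usage: derive_java_home "$(readlink -f "$(command -v java)")"
derive_java_home /usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java
# -> /usr/lib/jvm/java-8-openjdk-amd64
```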
# configure JAVA_HOME
vim /etc/profile
# add the following
JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-amd64
PATH=$JAVA_HOME/bin:$PATH
CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar:$JAVA_HOME/jre/lib/rt.jar
export JAVA_HOME
export PATH
export CLASSPATH
# save, then reload
source /etc/profile
# set JAVA_HOME in Hadoop's own environment file
nano /var/lib/hadoop/hadoop-3.2.1/etc/hadoop/hadoop-env.sh
# add the startup-user variables (defaulting everything to root avoids many problems)
export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-amd64
export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-"/var/lib/hadoop/hadoop-3.2.1/etc/hadoop"}
export HDFS_NAMENODE_USER=root
export HDFS_DATANODE_USER=root
export HDFS_SECONDARYNAMENODE_USER=root
export YARN_RESOURCEMANAGER_USER=root
export YARN_NODEMANAGER_USER=root
Run the hadoop command to check that the configuration works.
Edit /var/lib/hadoop/hadoop-3.2.1/etc/hadoop/core-site.xml:
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/var/lib/hadoop/tmpdata</value>
</property>
</configuration>
Edit the HDFS configuration file /var/lib/hadoop/hadoop-3.2.1/etc/hadoop/hdfs-site.xml:
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:///var/lib/hadoop/hdfs/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:///var/lib/hadoop/hdfs/datanode</value>
</property>
</configuration>
Edit the MapReduce configuration, mapred-site.xml:
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
Edit the YARN configuration, yarn-site.xml:
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
</configuration>
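The four XML edits above can also be scripted with heredocs. A sketch; CONF_DIR points at a scratch directory so it can be dry-run, substitute /var/lib/hadoop/hadoop-3.2.1/etc/hadoop for a real install:

```shell
# Write the four minimal config files shown above into CONF_DIR.
CONF_DIR=/tmp/hadoop-conf-demo
mkdir -p "$CONF_DIR"

cat > "$CONF_DIR/core-site.xml" <<'EOF'
<configuration>
<property><name>fs.defaultFS</name><value>hdfs://localhost:9000</value></property>
<property><name>hadoop.tmp.dir</name><value>/var/lib/hadoop/tmpdata</value></property>
</configuration>
EOF

cat > "$CONF_DIR/hdfs-site.xml" <<'EOF'
<configuration>
<property><name>dfs.replication</name><value>1</value></property>
<property><name>dfs.namenode.name.dir</name><value>file:///var/lib/hadoop/hdfs/namenode</value></property>
<property><name>dfs.datanode.data.dir</name><value>file:///var/lib/hadoop/hdfs/datanode</value></property>
</configuration>
EOF

cat > "$CONF_DIR/mapred-site.xml" <<'EOF'
<configuration>
<property><name>mapreduce.framework.name</name><value>yarn</value></property>
</configuration>
EOF

cat > "$CONF_DIR/yarn-site.xml" <<'EOF'
<configuration>
<property><name>yarn.nodemanager.aux-services</name><value>mapreduce_shuffle</value></property>
</configuration>
EOF

ls "$CONF_DIR"
# core-site.xml  hdfs-site.xml  mapred-site.xml  yarn-site.xml
```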
Format the NameNode before first use. As the HDFS user, run:
$ hdfs namenode -format
After formatting the NameNode, start HDFS with the start-dfs.sh script.
To start the YARN services, run the YARN startup script, start-yarn.sh.
To verify that all the Hadoop services/daemons started successfully, use the jps command.
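The jps check can be wrapped in a small helper. A sketch; it reads jps output on stdin, so the logic can be shown here without a running cluster (the sample PIDs in the demonstration are made up):

```shell
# Report which of the expected pseudo-cluster daemons are missing from `jps` output.
check_daemons() {  # reads jps output on stdin
  out=$(cat)
  missing=""
  for d in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
    echo "$out" | grep -qw "$d" || missing="$missing $d"
  done
  if [ -z "$missing" ]; then echo "all daemons up"; else echo "missing:$missing"; fi
}

# Real usage: jps | check_daemons
# Simulated sample (hypothetical PIDs):
printf '4065 NameNode\n4211 DataNode\n4403 SecondaryNameNode\n4621 ResourceManager\n4779 NodeManager\n' | check_daemons
# -> all daemons up
```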
Troubleshooting
ERROR: Attempting to operate on hdfs namenode as root
The following error appeared when starting HDFS: the service cannot be run as root. The explanation on Stack Overflow goes like this:
The root cause of this problem:
Hadoop was installed for one user and you are starting the YARN service as a different user, OR
the HDFS_NAMENODE_USER and HDFS_DATANODE_USER specified in hadoop-env.sh are set to some other user.
Hence we need to correct this and make it consistent everywhere. A simple solution is to edit your hadoop-env.sh file and add the user name under which you want to start the YARN service. So go ahead and edit $HADOOP_HOME/etc/hadoop/hadoop-env.sh by adding the following lines:
export HDFS_NAMENODE_USER=root
export HDFS_DATANODE_USER=root
export HDFS_SECONDARYNAMENODE_USER=root
export YARN_RESOURCEMANAGER_USER=root
export YARN_NODEMANAGER_USER=root
Now save the file, start the YARN and HDFS services, and check that it works.
The same mechanism is described in hadoop-env.sh itself:
To prevent accidents, shell commands can be (superficially) locked to only allow certain users to execute certain subcommands.
It uses the format (command)_(subcommand)_USER. For example, to limit who can execute the namenode command: export HDFS_NAMENODE_USER=hdfs
ERROR: Unable to write in /var/lib/hadoop/hadoop-3.2.1/logs. Aborting.
The logs directory is not writable by the startup user; making it writable (the quick fix is chmod 777 on the logs directory, though chown to the startup user is cleaner) resolves this.
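The failure and the fix can be reproduced on a scratch directory (note: when running as root the write test always succeeds, since root bypasses file modes):

```shell
LOGS=/tmp/hadoop-logs-demo
mkdir -p "$LOGS"
chmod 555 "$LOGS"                      # simulate the read-only logs directory
[ -w "$LOGS" ] || echo "not writable"  # what the Hadoop scripts hit (as a non-root user)
chmod 777 "$LOGS"                      # the quick fix; chown to the startup user is cleaner
[ -w "$LOGS" ] && echo "writable now"
```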
ERROR: Cannot set priority of datanode process 3727
Reference: https://www.codeleading.com/article/72031400447/
Add export HADOOP_SHELL_EXECNAME=root as the last line of $HADOOP_HOME/etc/hadoop/hadoop-env.sh; otherwise this variable defaults to hdfs. Also set the other *_USER variables to root, for example:
export HDFS_DATANODE_USER=root
export HADOOP_SECURE_DN_USER=root
export HDFS_NAMENODE_USER=root
export HDFS_SECONDARYNAMENODE_USER=root
export YARN_RESOURCEMANAGER_USER=root
export YARN_NODEMANAGER_USER=root