Preface: Hadoop is important and widely used, so it is worth learning hands-on.
Goal: acquire basic Hadoop knowledge, including setting up the required components and successfully installing a Hadoop cluster.
Book: Hadoop: The Definitive Guide (Chinese edition, Tsinghua University Press)
blog:https://www.cnblogs.com/biehongli/p/7640469.html
http://hadoop.apache.org/docs/r3.0.0/index.html
Step 1: three CentOS 7 Linux servers are available. First make the three servers reachable from one another, with SSH login as root.
# Use ssh-keygen -t rsa to generate a key pair on each server, and copy the public key to the other servers.
# Use cat <public key file> >> authorized_keys to append each public key to the authorized_keys file.
# Trying to log in with ssh <ip> fails with an error, as shown in the figure.
Analysis: a permission-denied error, so the network itself is fine and the connection goes through; the fix should be on the authentication configuration side. When logging in over SSH, the client presents its private key, which the server matches against the public keys it holds. Here the client had not been configured with this server's private key, so it was like trying to open a door without bringing the key.
Solution:
-- Configure the client private key on CentOS 7: vi /etc/ssh/ssh_config
-- Restart the ssh service
systemctl restart sshd.service
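Another common cause of the same "Permission denied" error is file modes: sshd silently ignores authorized_keys when ~/.ssh or the file itself is too permissive. A minimal sketch of the modes to enforce (a temp directory stands in for the real ~/.ssh here):

```shell
# Illustrative only: enforce the modes sshd expects on ~/.ssh.
# On a real server, apply the same chmod to the target user's ~/.ssh.
SSH_DIR="$(mktemp -d)/.ssh"
mkdir -p "$SSH_DIR"
touch "$SSH_DIR/authorized_keys"
chmod 700 "$SSH_DIR"                   # directory must not be group/world writable
chmod 600 "$SSH_DIR/authorized_keys"   # key file readable by the owner only
stat -c '%a %n' "$SSH_DIR" "$SSH_DIR/authorized_keys"
```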
# On each server, edit the network file: vi /etc/sysconfig/network
# On each server, edit the hosts file: vi /etc/hosts
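The /etc/hosts entries on each machine could look like the following (the IP addresses are placeholders for illustration; substitute the servers' real addresses):

```text
192.168.1.10  master
192.168.1.11  slaver1
192.168.1.12  slaver2
```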
# Reboot all three servers
# From each server, ping master, slaver1 and slaver2 to verify connectivity
# Log in with ssh <hostname>
Step 2: install the JDK on each server
See step 3 of https://blog.csdn.net/weixin_39139129/article/details/80434728
Step 3: create a hadoop user on each server
adduser hadoop
passwd hadoop
Add the hadoop user to the hadoop group
sudo usermod -a -G hadoop hadoop
Grant the hadoop user root privileges so it can use sudo
vi /etc/sudoers  (visudo is the safer way to edit this file, since it validates the syntax before saving)
Below the line root ALL=(ALL) ALL, add
hadoop ALL=(ALL) ALL
:wq!  (force write; /etc/sudoers is read-only by default)
Make the hadoop user the owner of the hadoop directory
chown -R hadoop:hadoop /opt/hadoop
Step 4: download and unpack the Hadoop tarball, then edit the configuration files
# Download and unpack the tarball
# Enter the directory
cd /opt/hadoop
# Download the tarball
wget http://mirrors.tuna.tsinghua.edu.cn/apache/hadoop/common/hadoop-2.7.6/hadoop-2.7.6.tar.gz
# Unpack
tar -zxvf hadoop-2.7.6.tar.gz
# Edit the environment variable file
vi /etc/profile
# Append the following
export HADOOP_HOME=/opt/hadoop/hadoop-2.7.6
export PATH=$PATH:$HADOOP_HOME/sbin
export PATH=$PATH:$HADOOP_HOME/bin
# Without the next two lines, starting Hadoop or HBase prints a warning that the native library could not be loaded
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib/native"
source /etc/profile
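After sourcing /etc/profile it is worth confirming that the new entries actually landed on PATH. A small sketch, assuming the 2.7.6 install path used above:

```shell
# Same exports as in /etc/profile above
export HADOOP_HOME=/opt/hadoop/hadoop-2.7.6
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
# List the PATH entries that point into the Hadoop tree
echo "$PATH" | tr ':' '\n' | grep "$HADOOP_HOME"
```

On a machine where the tarball is already unpacked, running `hadoop version` afterwards is the quickest end-to-end check.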
# Configure hadoop-env.sh and yarn-env.sh
vi /opt/hadoop/hadoop-2.7.6/etc/hadoop/hadoop-env.sh
# Add
export JAVA_HOME=/opt/java/jdk1.8.0_171
vi /opt/hadoop/hadoop-2.7.6/etc/hadoop/yarn-env.sh
# Add
export JAVA_HOME=/opt/java/jdk1.8.0_171
# Configure core-site.xml
vi /opt/hadoop/hadoop-2.7.6/etc/hadoop/core-site.xml
# Add
<configuration>
<property>
<name>fs.defaultFS</name> <!-- URI of the NameNode -->
<value>hdfs://master:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name> <!-- directory for Hadoop temporary files -->
<value>/opt/hadoop/hadoop-2.7.6/temp</value>
</property>
</configuration>
# Configure hdfs-site.xml
vi /opt/hadoop/hadoop-2.7.6/etc/hadoop/hdfs-site.xml
# Add
<configuration>
<property>
<!-- local filesystem path where the NameNode persists the namespace and transaction logs -->
<name>dfs.namenode.name.dir</name>
<value>/opt/hadoop/hadoop-2.7.6/dfs/name</value>
<!-- the directory need not exist beforehand; it is created automatically -->
</property>
<property>
<!-- local filesystem path where DataNodes store their block data -->
<name>dfs.datanode.data.dir</name>
<value>/opt/hadoop/hadoop-2.7.6/dfs/data</value>
</property>
<property>
<!-- number of replicas per block; must not exceed the number of machines in the cluster; default is 3 -->
<name>dfs.replication</name>
<value>2</value>
</property>
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>master:9001</value>
</property>
<property>
<!-- set to true so HDFS can be browsed at ip:port in a browser -->
<name>dfs.webhdfs.enabled</name>
<value>true</value>
</property>
</configuration>
# Configure mapred-site.xml
cp /opt/hadoop/hadoop-2.7.6/etc/hadoop/mapred-site.xml.template /opt/hadoop/hadoop-2.7.6/etc/hadoop/mapred-site.xml
vi /opt/hadoop/hadoop-2.7.6/etc/hadoop/mapred-site.xml
<configuration>
<property>
<!-- MapReduce runs on the YARN framework, so set this to yarn -->
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<!-- job history server, for viewing MapReduce job records -->
<name>mapreduce.jobhistory.address</name>
<value>master:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>master:19888</value>
</property>
</configuration>
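One thing to note: the JobHistory server configured above is not launched by start-all.sh. In Hadoop 2.x it is usually started separately on master once the cluster is up, roughly like this (command fragment, not verified against this exact cluster):

```text
sbin/mr-jobhistory-daemon.sh start historyserver
# the web UI is then reachable at http://master:19888
```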
# Configure yarn-site.xml
vi /opt/hadoop/hadoop-2.7.6/etc/hadoop/yarn-site.xml
<configuration>
<property>
<!-- auxiliary service run on each NodeManager, required for MapReduce -->
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<!-- address the ResourceManager exposes to clients -->
<name>yarn.resourcemanager.address</name>
<value>master:8032</value>
</property>
<property>
<!-- address the ResourceManager exposes to ApplicationMasters -->
<name>yarn.resourcemanager.scheduler.address</name>
<value>master:8030</value>
</property>
<property>
<!-- address the ResourceManager exposes to NodeManagers -->
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>master:8031</value>
</property>
<property>
<!-- address the ResourceManager exposes to administrators -->
<name>yarn.resourcemanager.admin.address</name>
<value>master:8033</value>
</property>
<property>
<!-- ResourceManager web UI address, viewable in a browser -->
<name>yarn.resourcemanager.webapp.address</name>
<value>master:8088</value>
</property>
</configuration>
# Configure the slaves file
vi /opt/hadoop/hadoop-2.7.6/etc/hadoop/slaves
# Comment out
localhost
# Add
slaver1
slaver2
# Configuration done; use scp to copy the configured Hadoop tree from master to slaver1 and slaver2
scp -r /opt/hadoop/hadoop-2.7.6 slaver1:/opt/hadoop/hadoop-2.7.6
scp -r /opt/hadoop/hadoop-2.7.6 slaver2:/opt/hadoop/hadoop-2.7.6
Step 5: start Hadoop
# From the Hadoop directory, format HDFS
bin/hdfs namenode -format
# Start
sbin/start-all.sh
# Check jps output on master
# Check jps output on the slavers
# Stop
sbin/stop-all.sh
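For reference, with the layout above (NameNode, SecondaryNameNode and ResourceManager on master; the slaves file listing only slaver1 and slaver2), a healthy startup should show roughly these processes in jps (PIDs omitted; jps also lists itself):

```text
# on master
NameNode
SecondaryNameNode
ResourceManager

# on slaver1 / slaver2
DataNode
NodeManager
```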
question: the jps output on master is not as expected
Check the hadoop-root-namenode-x.log log:
2018-07-27 11:25:21,493 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Need to save fs image? false (staleImage=false, haEnabled=false, isRollingUpgrade=false)
2018-07-27 11:25:21,494 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Starting log segment at 1
2018-07-27 11:25:21,609 INFO org.apache.hadoop.hdfs.server.namenode.NameCache: initialized with 0 entries 0 lookups
2018-07-27 11:25:21,609 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Finished loading FSImage in 317 msecs
2018-07-27 11:25:21,746 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: RPC server is binding to master:9000
2018-07-27 11:25:21,752 INFO org.apache.hadoop.ipc.CallQueueManager: Using callQueue: class java.util.concurrent.LinkedBlockingQueue queueCapacity: 1000
2018-07-27 11:25:21,762 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Stopping services started for active state
2018-07-27 11:25:21,763 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Ending log segment 1
2018-07-27 11:25:21,764 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Number of transactions: 2 Total time for transactions(ms): 1 Number of transactions batched in Syncs: 0 Number of syncs: 3 SyncTimes(ms): 11
2018-07-27 11:25:21,765 INFO org.apache.hadoop.hdfs.server.namenode.FileJournalManager: Finalizing edits file /opt/hadoop/hadoop-2.7.6/dfs/name/current/edits_inprogress_0000000000000000001 -> /opt/hadoop/hadoop-2.7.6/dfs/name/current/edits_0000000000000000001-0000000000000000002
2018-07-27 11:25:21,783 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Stopping services started for active state
2018-07-27 11:25:21,783 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Stopping services started for standby state
2018-07-27 11:25:21,788 INFO org.mortbay.log: Stopped HttpServer2$SelectChannelConnectorWithSafeStartup@0.0.0.0:50070
2018-07-27 11:25:21,790 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Stopping NameNode metrics system...
2018-07-27 11:25:21,791 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: NameNode metrics system stopped.
2018-07-27 11:25:21,791 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: NameNode metrics system shutdown complete.
2018-07-27 11:25:21,796 ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: Failed to start namenode.
java.net.BindException: Problem binding to [master:9000] java.net.BindException: Cannot assign requested address; For more details see: http://wiki.apache.org/hadoop/BindException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:792)
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:721)
at org.apache.hadoop.ipc.Server.bind(Server.java:484)
at org.apache.hadoop.ipc.Server$Listener.<init>(Server.java:690)
at org.apache.hadoop.ipc.Server.<init>(Server.java:2379)
at org.apache.hadoop.ipc.RPC$Server.<init>(RPC.java:951)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server.<init>(ProtobufRpcEngine.java:534)
at org.apache.hadoop.ipc.ProtobufRpcEngine.getServer(ProtobufRpcEngine.java:509)
at org.apache.hadoop.ipc.RPC$Builder.build(RPC.java:796)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.<init>(NameNodeRpcServer.java:351)
at org.apache.hadoop.hdfs.server.namenode.NameNode.createRpcServer(NameNode.java:675)
at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:648)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:820)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:804)
at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1516)
at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1582)
Caused by: java.net.BindException: Cannot assign requested address
at sun.nio.ch.Net.bind0(Native Method)
at sun.nio.ch.Net.bind(Net.java:433)
at sun.nio.ch.Net.bind(Net.java:425)
at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223)
at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
at org.apache.hadoop.ipc.Server.bind(Server.java:467)
... 13 more
2018-07-27 11:25:21,799 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 1
2018-07-27 11:25:21,802 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at iz2zeeiutul2gfdixjzqcoz/172.17.5.165
************************************************************/
From the error and some searching, the address master:9000 appeared to be unusable, so core-site.xml was reconfigured:
vi /opt/hadoop/hadoop-2.7.6/etc/hadoop/core-site.xml
<value>hdfs://master:9000</value>
-- changed to
<value>hdfs://127.0.0.1:9000</value>
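The BindException above means the address that master resolved to is not assigned to any local network interface; on cloud servers the public IP typically is not (only the internal address is), which matches this failure. A quick way to list the addresses a daemon can actually bind on a host (sketch using iproute2):

```shell
# Print the IPv4 addresses assigned to local interfaces; only these
# (or 0.0.0.0) can serve as the host part of fs.defaultFS on this machine.
ip -4 addr show | awk '/inet /{print $2}' | cut -d/ -f1
```

If master maps to the public IP in /etc/hosts and that IP is absent from this list, one common fix is to map master to the internal address instead (or bind to 0.0.0.0 via dfs.namenode.rpc-bind-host).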
Even after this change, the ResourceManager, NameNode and SecondaryNameNode processes
on master still failed to start.
My guess was that the three servers were not on a shared internal network and could not reach each other over a LAN, so the next attempt was a pseudo-distributed setup on a virtual machine.
After the second attempt on a virtual machine I was about to settle for a standalone Hadoop, but then I came across an article suggesting that Hadoop failed to start because the machine's hostname had not been set. After setting the hostname and redeploying by the steps above, everything worked.
# Set the hostname on CentOS 7
hostnamectl set-hostname master
# Edit the hosts file
vim /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4 master
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
/*
Step 4 (this attempt failed and was abandoned): unpack the Hadoop tarball and edit the configuration files
# Unpack the downloaded tarball into /opt/hadoop
tar -zxvf hadoop-3.1.0.tar.gz
# Configure environment variables
vi /etc/profile
#set hadoop
export HADOOP_HOME=/opt/hadoop/hadoop-3.1.0
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:
source /etc/profile
# Configuration files
cd /opt/hadoop/hadoop-3.1.0/etc/hadoop
# Configure hadoop-env.sh
vi /opt/hadoop/hadoop-3.1.0/etc/hadoop/hadoop-env.sh
export JAVA_HOME=/opt/java/jdk1.8.0_171
export HADOOP_HOME=/opt/hadoop/hadoop-3.1.0
# Configure hdfs-site.xml
vi /opt/hadoop/hadoop-3.1.0/etc/hadoop/hdfs-site.xml
<configuration>
<property>
<!-- logical name of the nameservice; can be anything -->
<name>dfs.nameservices</name>
<value>hbzx</value>
</property>
<property>
<!-- disable permission checking -->
<name>dfs.permissions.enabled</name>
<value>false</value>
</property>
<property>
<!-- names of the NameNodes, comma separated -->
<name>dfs.ha.namenodes.hbzx</name>
<value>nn1,nn2</value>
</property>
<property>
<!-- dfs.namenode.rpc-address.[nameservice ID].[name node ID]: host and RPC port of the NameNode -->
<name>dfs.namenode.rpc-address.hbzx.nn1</name>
<value>master:9820</value>
</property>
<property>
<!-- dfs.namenode.rpc-address.[nameservice ID].[name node ID]: host and RPC port of the NameNode -->
<name>dfs.namenode.rpc-address.hbzx.nn2</name>
<value>slaver1:9820</value>
</property>
<property>
<!-- dfs.namenode.http-address.[nameservice ID].[name node ID]: HTTP port the NameNode listens on -->
<name>dfs.namenode.http-address.hbzx.nn1</name>
<value>master:9870</value>
</property>
<property>
<!-- dfs.namenode.http-address.[nameservice ID].[name node ID]: HTTP port the NameNode listens on -->
<name>dfs.namenode.http-address.hbzx.nn2</name>
<value>slaver1:9870</value>
</property>
<property>
<!-- shared edits directory of the NameNodes: hosts and ports of the JournalNodes -->
<name>dfs.namenode.shared.edits.dir</name>
<value>qjournal://master:8485;slaver1:8485;slaver2:8485/hbzx</value>
</property>
<property>
<!-- HA failover proxy class for the NameNode -->
<name>dfs.client.failover.proxy.provider.hbzx</name>
<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<property>
<!-- use passwordless ssh for fencing -->
<name>dfs.ha.fencing.methods</name>
<value>sshfence</value>
</property>
<property>
<name>dfs.ha.fencing.ssh.private-key-files</name>
<value>/root/.ssh/master</value>
</property>
<property>
<!-- where the JournalNodes store their data -->
<name>dfs.journalnode.edits.dir</name>
<value>/opt/data/journal/node/local/data</value>
</property>
<property>
<!-- enable automatic NameNode failover -->
<name>dfs.ha.automatic-failover.enabled</name>
<value>true</value>
</property>
</configuration>
# Configure core-site.xml
vi /opt/hadoop/hadoop-3.1.0/etc/hadoop/core-site.xml
<configuration>
<property>
<!-- default high-availability path for Hadoop clients -->
<name>fs.defaultFS</name>
<value>hdfs://hbzx</value>
</property>
<property>
<!-- base path for Hadoop data; the namenode and datanode paths both derive from it. Do not start it with file:/, an absolute path is enough.
namenode default path: file://${hadoop.tmp.dir}/dfs/name
datanode default path: file://${hadoop.tmp.dir}/dfs/data
-->
<name>hadoop.tmp.dir</name>
<value>/opt/data/hadoop/</value>
</property>
<property>
<!-- nodes running ZooKeeper -->
<name>ha.zookeeper.quorum</name>
<value>master:2181,slaver1:2181,slaver2:2181</value>
</property>
</configuration>
# Configure yarn-site.xml (single-node defaults)
vi /opt/hadoop/hadoop-3.1.0/etc/hadoop/yarn-site.xml
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.env-whitelist</name>
<value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
</property>
<property>
<!-- enable YARN high availability -->
<name>yarn.resourcemanager.ha.enabled</name>
<value>true</value>
</property>
<property>
<!-- unique identifier of the cluster -->
<name>yarn.resourcemanager.cluster-id</name>
<value>hbzx</value>
</property>
<property>
<!-- ResourceManager IDs -->
<name>yarn.resourcemanager.ha.rm-ids</name>
<value>rm1,rm2</value>
</property>
<property>
<!-- node the ResourceManager runs on -->
<name>yarn.resourcemanager.hostname.rm1</name>
<value>master</value>
</property>
<property>
<!-- node the ResourceManager runs on -->
<name>yarn.resourcemanager.hostname.rm2</name>
<value>slaver1</value>
</property>
<property>
<!-- node the ResourceManager web UI listens on -->
<name>yarn.resourcemanager.webapp.address.rm1</name>
<value>master:8088</value>
</property>
<property>
<!-- node the ResourceManager web UI listens on -->
<name>yarn.resourcemanager.webapp.address.rm2</name>
<value>slaver1:8088</value>
</property>
<property>
<!-- nodes running ZooKeeper -->
<name>yarn.resourcemanager.zk-address</name>
<value>master:2181,slaver1:2181,slaver2:2181</value>
</property>
<property>
<!-- enable automatic detection of the node's memory and CPU; minimum memory is 1 GB -->
<name>yarn.nodemanager.resource.detect-hardware-capabilities</name>
<value>true</value>
</property>
</configuration>
# Configure mapred-site.xml
vi /opt/hadoop/hadoop-3.1.0/etc/hadoop/mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
# Startup with this configuration failed; roll back and redo step 4
# Configure core-site.xml
vi /opt/hadoop/hadoop-3.1.0/etc/hadoop/core-site.xml
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://master:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>file:///opt/hadoop/hadoop-3.1.0/tmp</value>
</property>
</configuration>
# Configure hdfs-site.xml
vi /opt/hadoop/hadoop-3.1.0/etc/hadoop/hdfs-site.xml
<configuration>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:///opt/hadoop/hadoop-3.1.0/hdfs/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:///opt/hadoop/hadoop-3.1.0/hdfs/data</value>
</property>
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>slaver1:9001</value>
</property>
</configuration>
# Configure yarn-site.xml
vi /opt/hadoop/hadoop-3.1.0/etc/hadoop/yarn-site.xml
<configuration>
<!-- Site specific YARN configuration properties -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>master:8025</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>master:8030</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>master:8040</value>
</property>
</configuration>
# Configure mapred-site.xml
vi /opt/hadoop/hadoop-3.1.0/etc/hadoop/mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.application.classpath</name>
<value>
/opt/hadoop/hadoop-3.1.0/etc/hadoop,
/opt/hadoop/hadoop-3.1.0/share/hadoop/common/*,
/opt/hadoop/hadoop-3.1.0/share/hadoop/common/lib/*,
/opt/hadoop/hadoop-3.1.0/share/hadoop/hdfs/*,
/opt/hadoop/hadoop-3.1.0/share/hadoop/hdfs/lib/*,
/opt/hadoop/hadoop-3.1.0/share/hadoop/mapreduce/*,
/opt/hadoop/hadoop-3.1.0/share/hadoop/mapreduce/lib/*,
/opt/hadoop/hadoop-3.1.0/share/hadoop/yarn/*,
/opt/hadoop/hadoop-3.1.0/share/hadoop/yarn/lib/*
</value>
</property>
</configuration>
# Configure workers
vi /opt/hadoop/hadoop-3.1.0/etc/hadoop/workers
# Enter the IP addresses of the three machines
39.106.4.66
101.132.236.106
39.106.27.129
# Set JAVA_HOME in hadoop-env.sh and yarn-env.sh
vi /opt/hadoop/hadoop-3.1.0/etc/hadoop/hadoop-env.sh
export JAVA_HOME=/opt/java/jdk1.8.0_171
vi /opt/hadoop/hadoop-3.1.0/etc/hadoop/yarn-env.sh
export JAVA_HOME=/opt/java/jdk1.8.0_171
Step 5: start hadoop
1. Format the namenode
Enter the bin directory
./hdfs namenode -format
2. Start hadoop
Enter the sbin directory
./start-dfs.sh
Error:
ERROR: Attempting to operate on hdfs datanode as root
ERROR: but there is no HDFS_DATANODE_USER defined. Aborting operation.
Add the missing environment variables to hadoop-env.sh:
export HDFS_DATANODE_SECURE_USER=root
export HDFS_SECONDARYNAMENODE_USER=root
Start again
Error:
ERROR: Attempting to operate on hdfs namenode as root
ERROR: but there is no HDFS_NAMENODE_USER defined. Aborting operation.
Add the following at the top of start-dfs.sh and stop-dfs.sh:
#!/usr/bin/env bash
HDFS_DATANODE_USER=root
HADOOP_SECURE_DN_USER=hdfs
HDFS_NAMENODE_USER=root
HDFS_SECONDARYNAMENODE_USER=root
start-yarn.sh and stop-yarn.sh also need the following at the top:
#!/usr/bin/env bash
YARN_RESOURCEMANAGER_USER=root
HADOOP_SECURE_DN_USER=yarn
YARN_NODEMANAGER_USER=root
ERROR: Cannot set priority of datanode process 10539
With no clear lead on this new error, and most deployment guides online covering Hadoop 2.x, I fell back to Hadoop 2.x and returned to step 4.
./start-yarn.sh
*/