Building a Real-Time Data Warehouse on a Big Data Platform from 0 to 1 - 04 Hadoop Installation and Testing

Overview

This post covers installing and testing Hadoop.
We install and configure it on server110, then sync it to server111 and server112.

Environment

  • CentOS 7
  • JDK 1.8
  • hadoop-3.2.1

server110 192.168.1.110
server111 192.168.1.111
server112 192.168.1.112
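For hdfs://server110:9000 and the worker hostnames used later to resolve, every node has to know these names. A minimal sketch, assuming name resolution is handled via /etc/hosts on all three machines (not shown in the original steps; skip if DNS already resolves the names):

# /etc/hosts on server110, server111 and server112 (assumed)
192.168.1.110 server110
192.168.1.111 server111
192.168.1.112 server112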

Installation

# Extract
[root@server110 software]# tar -xzvf hadoop-3.2.1.tar.gz -C /opt/modules/
# Environment variables
[root@server110 hadoop-3.2.1]# vim /etc/profile
#java
JAVA_HOME=/opt/modules/jdk1.8.0_181
PATH=$PATH:$JAVA_HOME/bin

#hadoop
HADOOP_HOME=/opt/modules/hadoop-3.2.1
PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

export JAVA_HOME HADOOP_HOME PATH 
:wq # save and quit

# Apply the environment variables
[root@server110 hadoop-3.2.1]# source /etc/profile
# Test
[root@server110 hadoop-3.2.1]# hadoop
Usage: hadoop [OPTIONS] SUBCOMMAND [SUBCOMMAND OPTIONS]
 or    hadoop [OPTIONS] CLASSNAME [CLASSNAME OPTIONS]
..
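A quick extra check (not in the original output) that the PATH changes picked up the intended JDK and Hadoop builds:

# Verify the versions resolved from PATH
[root@server110 hadoop-3.2.1]# java -version
[root@server110 hadoop-3.2.1]# hadoop version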

Local (standalone) mode wordcount test

# Create a test file
[root@server110 opt]# mkdir input
[root@server110 opt]# cd input/
[root@server110 input]# vim input.txt
hello world
hello bigdata
hello stream
hello hadoop

# Run the example jar that ships with Hadoop
[root@server110 opt]# hadoop jar /opt/modules/hadoop-3.2.1/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.1.jar wordcount /opt/input/  /opt/output

# View the output
[root@server110 opt]# cat output/part-r-00000 
bigdata	1
hadoop	1
hello	4
stream	1
world	1

(Screenshot: output directory structure)
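Since the screenshot is not reproduced here, a plain listing shows the same thing; expect a _SUCCESS marker and the part-r-00000 result file read above:

# List the local output directory
[root@server110 opt]# ls output/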

Pseudo-distributed configuration

Configuration files

If you are not sure how to write a property, refer to the default configuration files:
core-default.xml
hdfs-default.xml
mapred-default.xml
yarn-default.xml

  • hadoop-3.2.1\share\hadoop\common\hadoop-common-3.2.1.jar\core-default.xml
  • hadoop-3.2.1\share\hadoop\hdfs\hadoop-hdfs-3.2.1.jar\hdfs-default.xml
  • hadoop-3.2.1\share\hadoop\mapreduce\hadoop-mapreduce-client-core-3.2.1.jar\mapred-default.xml
  • hadoop-3.2.1\share\hadoop\yarn\hadoop-yarn-common-3.2.1.jar\yarn-default.xml

Whenever you come across an *-env.sh file, set JAVA_HOME in it.

# Configuration file directory
[root@server110 hadoop]# pwd
/opt/modules/hadoop-3.2.1/etc/hadoop

[root@server110 hadoop]# vim hadoop-env.sh
# Shift+G jumps to the last line
export JAVA_HOME=/opt/modules/jdk1.8.0_181
# core-site.xml
[root@server110 hadoop]# vim core-site.xml
<!-- NameNode address -->
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://server110:9000</value>
</property>
<!-- Directory for temporary files generated by Hadoop at runtime -->
<property>
  <name>hadoop.tmp.dir</name>
  <value>/opt/modules/hadoop-3.2.1/data/tmp</value>
</property>


# hdfs-site.xml
[root@server110 hadoop]# vim hdfs-site.xml
<!-- Replication factor, default is 3 -->
<property>
  <name>dfs.replication</name>
  <value>3</value>
</property>
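A handy cross-check (not part of the original walkthrough) is to ask Hadoop which values it actually resolves from these files:

# Print the effective values of the properties configured above
[root@server110 hadoop]# hdfs getconf -confKey fs.defaultFS
[root@server110 hadoop]# hdfs getconf -confKey dfs.replication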

Format the NameNode

[root@server110 hadoop-3.2.1]# bin/hdfs namenode -format
2021-10-02 19:37:23,307 INFO namenode.NameNode: STARTUP_MSG: 
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = server110/192.168.1.110
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 3.2.1
..
2021-10-02 19:37:24,414 INFO common.Storage: Storage directory /opt/modules/hadoop-3.2.1/data/tmp/dfs/name has been successfully formatted.
..
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at server110/192.168.1.110
************************************************************/
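To confirm the format really wrote metadata (an extra check, assuming the hadoop.tmp.dir configured above), the name directory should now contain a current/ folder with fsimage and VERSION files:

# Inspect the freshly formatted NameNode metadata directory
[root@server110 hadoop-3.2.1]# ls data/tmp/dfs/name/current/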

Start HDFS

Startup fails with errors saying we are running as root;
HDFS_NAMENODE_USER, HDFS_DATANODE_USER and HDFS_SECONDARYNAMENODE_USER must be defined.

[root@server110 hadoop-3.2.1]# sbin/start-dfs.sh 
Starting namenodes on [server110]
ERROR: Attempting to operate on hdfs namenode as root
ERROR: but there is no HDFS_NAMENODE_USER defined. Aborting operation.
Starting datanodes
ERROR: Attempting to operate on hdfs datanode as root
ERROR: but there is no HDFS_DATANODE_USER defined. Aborting operation.
Starting secondary namenodes [server110]
ERROR: Attempting to operate on hdfs secondarynamenode as root
ERROR: but there is no HDFS_SECONDARYNAMENODE_USER defined. Aborting operation.

Add the following at the top of both start-dfs.sh and stop-dfs.sh
(if stop-dfs.sh is not edited as well, HDFS can be started but never stopped):

[root@server110 hadoop-3.2.1]# vim sbin/start-dfs.sh
HDFS_NAMENODE_USER=root
HDFS_DATANODE_USER=root
HDFS_SECONDARYNAMENODE_USER=root
[root@server110 hadoop-3.2.1]# vim sbin/stop-dfs.sh
HDFS_NAMENODE_USER=root
HDFS_DATANODE_USER=root
HDFS_SECONDARYNAMENODE_USER=root
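As an alternative (not what this post does), the same user variables can be exported once in etc/hadoop/hadoop-env.sh, which both the start and stop scripts read, so they do not have to be repeated in each script:

# Alternative: define the users in hadoop-env.sh instead of editing start-dfs.sh / stop-dfs.sh
export HDFS_NAMENODE_USER=root
export HDFS_DATANODE_USER=root
export HDFS_SECONDARYNAMENODE_USER=root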

With that in place, start again:

[root@server110 hadoop-3.2.1]# sbin/start-dfs.sh 
Starting namenodes on [server110]
Last login: Sat Oct  2 19:28:08 CST 2021 from 192.168.1.107 on pts/0
Starting datanodes
Last login: Sat Oct  2 19:52:03 CST 2021 on pts/1
localhost: Warning: Permanently added 'localhost' (ECDSA) to the list of known hosts.
Starting secondary namenodes [server110]
Last login: Sat Oct  2 19:52:06 CST 2021 on pts/1
[root@server110 hadoop-3.2.1]# jps
22962 SecondaryNameNode
23109 Jps
22539 NameNode
22699 DataNode

Disable the firewall

The web port 9870 turned out to be unreachable, so this is a good time to disable the firewall on all three machines.

[root@server110 hadoop-3.2.1]# systemctl stop firewalld.service
[root@server110 hadoop-3.2.1]# systemctl disable firewalld.service 
Removed symlink /etc/systemd/system/multi-user.target.wants/firewalld.service.
Removed symlink /etc/systemd/system/dbus-org.fedoraproject.FirewallD1.service.
[root@server110 hadoop-3.2.1]# systemctl status firewalld.service 
● firewalld.service - firewalld - dynamic firewall daemon
   Loaded: loaded (/usr/lib/systemd/system/firewalld.service; disabled; vendor preset: enabled)
   Active: inactive (dead)
     Docs: man:firewalld(1)

Oct 02 17:45:14 server111 systemd[1]: Starting firewalld - dynamic firewall daemon...
Oct 02 17:45:15 server111 systemd[1]: Started firewalld - dynamic firewall daemon.
Oct 02 19:58:36 server110 systemd[1]: Stopping firewalld - dynamic firewall daemon...
Oct 02 19:58:37 server110 systemd[1]: Stopped firewalld - dynamic firewall daemon.
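The same two commands have to be run on server111 and server112 as well. A small helper, assuming passwordless root SSH to the other nodes is already set up:

# Stop and disable firewalld on the other two nodes as well
for host in server111 server112; do
  ssh $host "systemctl stop firewalld.service && systemctl disable firewalld.service"
done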

Check the web UI

In Hadoop 3.x the NameNode web UI port changed to 9870.
http://192.168.1.110:9870/
It loads correctly, so the configuration works.


Stop HDFS

[root@server110 hadoop-3.2.1]# sbin/stop-dfs.sh 
Stopping namenodes on [server110]
Last login: Sat Oct  2 20:03:58 CST 2021 on pts/1
Stopping datanodes
Last login: Sat Oct  2 20:12:06 CST 2021 on pts/1
Stopping secondary namenodes [server110]
Last login: Sat Oct  2 20:12:08 CST 2021 on pts/1

Cluster configuration

Cluster plan

NameNode, SecondaryNameNode and ResourceManager are each resource-hungry, so they are placed on different machines.

              server110            server111            server112
HDFS          NameNode             DataNode             SecondaryNameNode
              DataNode                                  DataNode
YARN          NodeManager          ResourceManager      NodeManager
                                   NodeManager
HistoryServer                      JobHistoryServer

workers

[root@server110 hadoop]# vim workers
server110
server111
server112

hdfs-site.xml

Add the SecondaryNameNode address:

[root@server110 hadoop]# vim hdfs-site.xml
<!-- SecondaryNameNode address -->
<property>
  <name>dfs.namenode.secondary.http-address</name>
  <value>server112:9868</value>
</property>

yarn-env.sh

[root@server110 hadoop]# vim yarn-env.sh
# Shift+G jumps to the last line
export JAVA_HOME=/opt/modules/jdk1.8.0_181

yarn-site.xml

Specify the ResourceManager address,
and set the way reducers fetch data to mapreduce_shuffle:

[root@server110 hadoop]# vim yarn-site.xml
<!-- ResourceManager host -->
<property>
    <name>yarn.resourcemanager.hostname</name>
    <value>server111</value>
</property>
<!-- How reducers fetch data -->
<property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
</property>

start-yarn.sh & stop-yarn.sh

To start YARN as root, the following variables need to be added at the top of both start-yarn.sh and stop-yarn.sh. Configure them now to avoid the same error as before:
YARN_RESOURCEMANAGER_USER=root
HADOOP_SECURE_DN_USER=yarn
YARN_NODEMANAGER_USER=root

mapred-env.sh

As with every *-env.sh file, set JAVA_HOME.

[root@server110 hadoop]# vim mapred-env.sh
# Shift+G jumps to the last line
export JAVA_HOME=/opt/modules/jdk1.8.0_181

mapred-site.xml

1. Make MapReduce run on YARN.
2. Configure the classpath, otherwise MR jobs fail with class-not-found errors.

[root@server110 hadoop]# vim mapred-site.xml
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
  <description>The runtime framework for executing MapReduce jobs.
  Can be one of local, classic or yarn.
  </description>
</property>
<property>
        <name>mapreduce.application.classpath</name>
        <value>
            ${HADOOP_HOME}/etc/hadoop,
            ${HADOOP_HOME}/share/hadoop/common/*,
            ${HADOOP_HOME}/share/hadoop/common/lib/*,
            ${HADOOP_HOME}/share/hadoop/hdfs/*,
            ${HADOOP_HOME}/share/hadoop/hdfs/lib/*,
            ${HADOOP_HOME}/share/hadoop/mapreduce/*,
            ${HADOOP_HOME}/share/hadoop/mapreduce/lib/*,
            ${HADOOP_HOME}/share/hadoop/yarn/*,
            ${HADOOP_HOME}/share/hadoop/yarn/lib/*
        </value>
</property>
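As a side note (not from the original post), the output of the hadoop classpath command is a useful reference when filling in mapreduce.application.classpath, since it prints the full classpath this installation resolves:

# Print the resolved classpath; compare it with mapreduce.application.classpath
[root@server110 hadoop]# hadoop classpath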

Delete the formatted data

Delete the NameNode data formatted in the pseudo-distributed setup, along with the logs.

[root@server110 hadoop-3.2.1]# rm -rf data/ logs/

Sync the files

[root@server110 modules]# scp -r hadoop-3.2.1/ server111:/opt/modules/
[root@server110 modules]# scp -r hadoop-3.2.1/ server112:/opt/modules/

Environment variables

[root@server111 modules]# vim /etc/profile
HADOOP_HOME=/opt/modules/hadoop-3.2.1
PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

export JAVA_HOME HADOOP_HOME PATH
[root@server112 modules]# vim /etc/profile
HADOOP_HOME=/opt/modules/hadoop-3.2.1
PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

export JAVA_HOME HADOOP_HOME PATH
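A quick verification on each node (an extra step, not in the original post) that the new variables are picked up:

# Reload the profile and confirm the hadoop command resolves on each node
[root@server111 modules]# source /etc/profile && hadoop version
[root@server112 modules]# source /etc/profile && hadoop version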

Re-format the NameNode

[root@server110 hadoop-3.2.1]# bin/hadoop namenode -format

Start DFS

[root@server110 hadoop-3.2.1]# sbin/start-dfs.sh 
Starting namenodes on [server110]
Last login: Sat Oct  2 21:49:16 CST 2021 on pts/1
Starting datanodes
Last login: Sat Oct  2 21:50:08 CST 2021 on pts/1
Starting secondary namenodes [server112]
Last login: Sat Oct  2 21:50:10 CST 2021 on pts/1

Start YARN

Because the ResourceManager is configured on server111, YARN must be started from server111; starting it from another node will fail.

[root@server111 hadoop-3.2.1]# sbin/start-yarn.sh 
Starting resourcemanager
Last login: Sat Oct  2 18:13:38 CST 2021 from server112 on pts/1
Starting nodemanagers
Last login: Sat Oct  2 22:07:15 CST 2021 on pts/0

jps

server110 runs NameNode, DataNode, and NodeManager:

[root@server110 opt]# jps
29718 Jps
29607 NodeManager
28907 DataNode
28734 NameNode

server111 runs DataNode, ResourceManager, and NodeManager:

[root@server111 hadoop-3.2.1]# jps
22609 DataNode
23025 ResourceManager
23189 NodeManager
23542 Jps

server112 runs DataNode, SecondaryNameNode, and NodeManager:

[root@server112 hadoop-3.2.1]# jps
23472 Jps
23347 NodeManager
22974 SecondaryNameNode
22879 DataNode

Check the web UIs

http://192.168.1.110:9870
http://192.168.1.111:8088


Testing

HDFS

# Upload the local input directory under /opt to the HDFS root
[root@server110 opt]# hdfs dfs -put input /


Check the file details in the web UI:
the file is small, so only one block is allocated.
With the replication factor of 3 configured above, the Availability field shows the nodes where the replicas live.

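The same block and replica information can be read from the command line as well (an extra check, not in the original post):

# Show blocks and replica locations for the uploaded file
[root@server110 opt]# hdfs fsck /input/input.txt -files -blocks -locations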

MR

Run wordcount against the existing /input directory on HDFS:

[root@server110 opt]# hadoop jar /opt/modules/hadoop-3.2.1/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.1.jar wordcount /input /output
[root@server110 opt]# hdfs dfs -ls /
Found 3 items
drwxr-xr-x   - root supergroup          0 2021-10-02 21:55 /input
drwxr-xr-x   - root supergroup          0 2021-10-02 22:35 /output
drwx------   - root supergroup          0 2021-10-02 22:05 /tmp
[root@server110 opt]# hdfs dfs -ls /output
Found 2 items
-rw-r--r--   3 root supergroup          0 2021-10-02 22:35 /output/_SUCCESS
-rw-r--r--   3 root supergroup         44 2021-10-02 22:35 /output/part-r-00000
[root@server110 opt]# hdfs dfs -cat /output/part-r-00000
2021-10-02 22:36:23,310 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
bigdata	1
hadoop	1
hello	4
stream	1
world	1


History server & log aggregation

While testing MR I realized the history server had not been configured yet, so it is set up here.
It is configured for the server111 node, so it can only be started on server111.

[root@server110 opt]# vim /opt/modules/hadoop-3.2.1/etc/hadoop/mapred-site.xml
<!-- History server -->
<property>
  <name>mapreduce.jobhistory.address</name>
  <value>server111:10020</value>
</property>
<property>
  <name>mapreduce.jobhistory.webapp.address</name>
  <value>server111:19888</value>
</property>
[root@server110 opt]# vim /opt/modules/hadoop-3.2.1/etc/hadoop/yarn-site.xml
<!-- Log aggregation -->
  <property>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.log-aggregation.retain-seconds</name>
    <value>-1</value>
  </property>
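Once the cluster is restarted with log aggregation enabled and a job has run, the aggregated logs can also be pulled from the command line (a side note, not in the original post; use the application id printed by the submitted job):

# Fetch aggregated logs for a finished job; replace <application_id> with the real id
[root@server110 hadoop-3.2.1]# yarn logs -applicationId <application_id>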

Start the history server

[root@server111 hadoop-3.2.1]# mapred --daemon start historyserver
[root@server111 hadoop-3.2.1]# jps
27446 ResourceManager
27319 DataNode
28316 Jps
27597 NodeManager
28254 JobHistoryServer

Test the logs

# Delete the previous output directory on HDFS
[root@server110 hadoop-3.2.1]# hdfs dfs -rm -r /output
# Rerun the wordcount job
[root@server110 hadoop-3.2.1]# hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.1.jar wordcount /input /output
# List the output directory
[root@server110 hadoop-3.2.1]# hdfs dfs -ls /output
Found 2 items
-rw-r--r--   3 root supergroup          0 2021-10-02 23:10 /output/_SUCCESS
-rw-r--r--   3 root supergroup         44 2021-10-02 23:10 /output/part-r-00000
# View the result
[root@server110 hadoop-3.2.1]# hdfs dfs -cat /output/part-r-00000
2021-10-02 23:10:44,393 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
bigdata	1
hadoop	1
hello	4
stream	1
world	1

Viewing the logs
1. The log server cannot be reached from the browser.

  • This is because the configuration uses the hostname server111, which the local Windows machine cannot resolve.
    Add the following to C:\Windows\System32\drivers\etc\hosts:
192.168.1.110 server110
192.168.1.111 server111
192.168.1.112 server112

2. Viewing logs fails: the history server starts normally, but the web UI cannot display the logs. It turned out to be a permission problem on the /tmp directory: the /tmp/logs directory that YARN creates automatically is owned by root:root, while the default admin group is root:supergroup.

A note added after the fact: Hive just threw an error here, so the permission must be 777, not 755, otherwise Hive cannot write into it.

[root@server110 hadoop-3.2.1]# hdfs dfs -chmod -R 777 /tmp
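A quick check that the new permission bits are in place (not in the original post):

# List the directories themselves to verify the permission change
[root@server110 hadoop-3.2.1]# hdfs dfs -ls -d /tmp /tmp/logs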

The logs now display correctly.


Installation and testing complete.
