Hadoop: Setting Up Pseudo-Distributed Mode, Starting HDFS and Running a MapReduce Program, Starting YARN and Running a MapReduce Program, Configuring the History Server, Log Aggregation, and Configuration File Notes

1. Introduction

        As the name suggests, in pseudo-distributed mode Hadoop runs on a single node, with each Hadoop daemon running as a separate Java process; the node acts as both the NameNode and the DataNode.

2. Modify the /opt/module/hadoop-2.7.2/etc/hadoop/hadoop-env.sh file
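The original does not show the edit itself; in Hadoop 2.x setups the usual change in hadoop-env.sh is to hard-code JAVA_HOME (the JDK path below is an assumed example — substitute your own installation path):

```sh
# hadoop-env.sh: replace the inherited ${JAVA_HOME} with an explicit JDK path
# (example path only -- adjust to where your JDK is installed)
export JAVA_HOME=/opt/module/jdk1.8.0_144
```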

3. Modify the /opt/module/hadoop-2.7.2/etc/hadoop/core-site.xml file

<!-- Address of the NameNode in HDFS -->
<property>
	<name>fs.defaultFS</name>
	<value>hdfs://hadoop01:9000</value>
</property>

<!-- Directory where Hadoop stores files generated at runtime -->
<property>
	<name>hadoop.tmp.dir</name>
	<value>/opt/module/hadoop-2.7.2/data/tmp</value>
</property>

4. Modify the /opt/module/hadoop-2.7.2/etc/hadoop/hdfs-site.xml file

<!-- Number of HDFS replicas (1, since this single node is the only DataNode) -->
<property>
	<name>dfs.replication</name>
	<value>1</value>
</property>

5. Format the NameNode (only on the first startup; do not reformat afterwards — reformatting generates a new cluster ID that no longer matches the ID stored by existing DataNodes)

[kgf@hadoop01 hadoop-2.7.2]$ pwd
/opt/module/hadoop-2.7.2
[kgf@hadoop01 hadoop-2.7.2]$ bin/hdfs namenode -format

6. Start the NameNode

[kgf@hadoop01 hadoop-2.7.2]$ pwd
/opt/module/hadoop-2.7.2
[kgf@hadoop01 hadoop-2.7.2]$ sbin/hadoop-daemon.sh start namenode
starting namenode, logging to /opt/module/hadoop-2.7.2/logs/hadoop-kgf-namenode-hadoop01.out
[kgf@hadoop01 hadoop-2.7.2]$ jps
1127 NameNode
1165 Jps
[kgf@hadoop01 hadoop-2.7.2]$

7. Start the DataNode

[kgf@hadoop01 hadoop-2.7.2]$ sbin/hadoop-daemon.sh start datanode
starting datanode, logging to /opt/module/hadoop-2.7.2/logs/hadoop-kgf-datanode-hadoop01.out
[kgf@hadoop01 hadoop-2.7.2]$ jps
1268 Jps
1221 DataNode
1127 NameNode
[kgf@hadoop01 hadoop-2.7.2]$

Seeing both processes above (NameNode and DataNode) in the jps output means startup succeeded.

8. View the HDFS file system in a web browser

Address: http://hadoop01:50070/dfshealth.html#tab-overview

9. Create a user directory in HDFS

[kgf@hadoop01 hadoop-2.7.2]$ pwd
/opt/module/hadoop-2.7.2
[kgf@hadoop01 hadoop-2.7.2]$ hdfs dfs -mkdir -p /user/hadoop
[kgf@hadoop01 hadoop-2.7.2]$

10. Create an input directory in HDFS and copy local files into it

[kgf@hadoop01 hadoop-2.7.2]$ pwd
/opt/module/hadoop-2.7.2
[kgf@hadoop01 hadoop-2.7.2]$ hdfs dfs -mkdir /input
[kgf@hadoop01 hadoop-2.7.2]$ hdfs dfs -put /opt/module/hadoop-2.7.2/etc/hadoop/*.xml /input/
[kgf@hadoop01 hadoop-2.7.2]$

11. Run the WordCount example in the pseudo-distributed mode we just set up

Command: hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar wordcount <HDFS input dir> <HDFS output dir>

Note: if the output directory already exists, delete it first; otherwise the job fails because MapReduce refuses to write to an existing output directory.

[kgf@hadoop01 hadoop-2.7.2]$ pwd
/opt/module/hadoop-2.7.2
[kgf@hadoop01 hadoop-2.7.2]$ hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar wordcount /input /output

View the run results:

[kgf@hadoop01 hadoop-2.7.2]$ pwd
/opt/module/hadoop-2.7.2
[kgf@hadoop01 hadoop-2.7.2]$ bin/hdfs dfs -ls /output
Found 2 items
-rw-r--r--   1 kgf supergroup          0 2022-05-23 13:39 /output/_SUCCESS
-rw-r--r--   1 kgf supergroup      10274 2022-05-23 13:39 /output/part-r-00000
[kgf@hadoop01 hadoop-2.7.2]$

bin/hdfs dfs -cat /output/part-r-00000
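For intuition, the WordCount example's map/shuffle/reduce logic can be sketched outside Hadoop. This is an illustrative Python sketch, not Hadoop's actual implementation:

```python
from collections import defaultdict

def map_phase(lines):
    # Map: emit a (word, 1) pair for every word in the input
    for line in lines:
        for word in line.split():
            yield word, 1

def reduce_phase(pairs):
    # Shuffle + reduce: group pairs by word and sum the counts
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

print(reduce_phase(map_phase(["hello hadoop", "hello world"])))
# {'hello': 2, 'hadoop': 1, 'world': 1}
```

In the real job, the hadoop-mapreduce-examples jar runs the same logic as Java map and reduce tasks over the files in /input.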

12. Start YARN and run a MapReduce program

1. Modify /opt/module/hadoop-2.7.2/etc/hadoop/yarn-env.sh (typically to set JAVA_HOME, as in hadoop-env.sh)

2. Modify /opt/module/hadoop-2.7.2/etc/hadoop/yarn-site.xml

<!-- How Reducers fetch data (the shuffle service) -->
<property>
	<name>yarn.nodemanager.aux-services</name>
	<value>mapreduce_shuffle</value>
</property>

<!-- Hostname of the YARN ResourceManager -->
<property>
	<name>yarn.resourcemanager.hostname</name>
	<value>hadoop01</value>
</property>

3. Modify /opt/module/hadoop-2.7.2/etc/hadoop/mapred-env.sh (typically to set JAVA_HOME)

4. Modify /opt/module/hadoop-2.7.2/etc/hadoop/mapred-site.xml (first rename mapred-site.xml.template to mapred-site.xml)

<!-- Run MapReduce on YARN -->
<property>
	<name>mapreduce.framework.name</name>
	<value>yarn</value>
</property>

5. Start the cluster

  •    5.1. Before starting, make sure the NameNode and DataNode are already running
  •    5.2. Start the ResourceManager
[kgf@hadoop01 hadoop-2.7.2]$ sbin/yarn-daemon.sh start resourcemanager
starting resourcemanager, logging to /opt/module/hadoop-2.7.2/logs/yarn-kgf-resourcemanager-hadoop01.out
[kgf@hadoop01 hadoop-2.7.2]$ pwd
/opt/module/hadoop-2.7.2
[kgf@hadoop01 hadoop-2.7.2]$ jps
1076 NameNode
1132 DataNode
1645 Jps
1423 ResourceManager
[kgf@hadoop01 hadoop-2.7.2]$
  •      5.3. Start the NodeManager
[kgf@hadoop01 hadoop-2.7.2]$ sbin/yarn-daemon.sh start nodemanager
starting nodemanager, logging to /opt/module/hadoop-2.7.2/logs/yarn-kgf-nodemanager-hadoop01.out
[kgf@hadoop01 hadoop-2.7.2]$ pwd
/opt/module/hadoop-2.7.2
[kgf@hadoop01 hadoop-2.7.2]$ jps
1076 NameNode
1736 Jps
1132 DataNode
1423 ResourceManager
1679 NodeManager
[kgf@hadoop01 hadoop-2.7.2]$

6. View the YARN web UI (by default at http://hadoop01:8088)

Delete the existing output directory on HDFS:

[kgf@hadoop01 hadoop-2.7.2]$ bin/hdfs dfs -rm -R /output
22/05/25 13:45:01 INFO fs.TrashPolicyDefault: Namenode trash configuration: Deletion interval = 0 minutes, Emptier interval = 0 minutes.
Deleted /output
[kgf@hadoop01 hadoop-2.7.2]$ bin/hdfs dfs -ls /
Found 2 items
drwxr-xr-x   - kgf supergroup          0 2022-05-23 13:34 /input
drwxr-xr-x   - kgf supergroup          0 2022-05-23 13:31 /user
[kgf@hadoop01 hadoop-2.7.2]$

Run the MapReduce job:

[kgf@hadoop01 hadoop-2.7.2]$ pwd
/opt/module/hadoop-2.7.2
[kgf@hadoop01 hadoop-2.7.2]$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar wordcount /input  /output

View the run results as before (e.g., with bin/hdfs dfs -cat /output/part-r-00000).

13. Configure the history server

To review the run history of completed jobs, you need to configure the history server. The steps are as follows:

1. Modify /opt/module/hadoop-2.7.2/etc/hadoop/mapred-site.xml

<!-- History server RPC address -->
<property>
	<name>mapreduce.jobhistory.address</name>
	<value>hadoop01:10020</value>
</property>
<!-- History server web UI address -->
<property>
	<name>mapreduce.jobhistory.webapp.address</name>
	<value>hadoop01:19888</value>
</property>

2. Start the history server

[kgf@hadoop01 hadoop-2.7.2]$ pwd
/opt/module/hadoop-2.7.2
[kgf@hadoop01 hadoop-2.7.2]$ sbin/mr-jobhistory-daemon.sh start historyserver
starting historyserver, logging to /opt/module/hadoop-2.7.2/logs/mapred-kgf-historyserver-hadoop01.out
[kgf@hadoop01 hadoop-2.7.2]$ jps
1076 NameNode
2693 Jps
1132 DataNode
2652 JobHistoryServer
1423 ResourceManager
1679 NodeManager
[kgf@hadoop01 hadoop-2.7.2]$

3. View JobHistory (web UI at http://hadoop01:19888, as configured above)

14. Configure log aggregation

Log aggregation: after an application finishes running, its run logs are uploaded to HDFS.

Benefit: you can conveniently view job run details, which helps development and debugging.

Note: enabling log aggregation requires restarting the NodeManager, ResourceManager, and JobHistoryServer.

The steps to enable log aggregation are as follows:

1. Modify /opt/module/hadoop-2.7.2/etc/hadoop/yarn-site.xml

<!-- Enable log aggregation -->
<property>
	<name>yarn.log-aggregation-enable</name>
	<value>true</value>
</property>

<!-- Retain logs for 7 days -->
<property>
	<name>yarn.log-aggregation.retain-seconds</name>
	<value>604800</value>
</property>
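The retention value is expressed in seconds; 604800 is exactly 7 days:

```python
# yarn.log-aggregation.retain-seconds is expressed in seconds
retain_seconds = 7 * 24 * 60 * 60  # days * hours * minutes * seconds
print(retain_seconds)  # 604800
```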

2. Stop the NodeManager, ResourceManager, and JobHistoryServer

[kgf@hadoop01 hadoop-2.7.2]$ vim etc/hadoop/yarn-site.xml
[kgf@hadoop01 hadoop-2.7.2]$ sbin/yarn-daemon.sh stop nodemanager
stopping nodemanager
nodemanager did not stop gracefully after 5 seconds: killing with kill -9
[kgf@hadoop01 hadoop-2.7.2]$ sbin/yarn-daemon.sh start resourcemanager
resourcemanager running as process 1423. Stop it first.
[kgf@hadoop01 hadoop-2.7.2]$ sbin/yarn-daemon.sh stop resourcemanager
stopping resourcemanager
[kgf@hadoop01 hadoop-2.7.2]$ sbin/mr-jobhistory-daemon.sh stop historyserver
stopping historyserver
[kgf@hadoop01 hadoop-2.7.2]$ jps
1076 NameNode
1132 DataNode
2862 Jps
[kgf@hadoop01 hadoop-2.7.2]$

3. Start the NodeManager, ResourceManager, and JobHistoryServer

[kgf@hadoop01 hadoop-2.7.2]$ sbin/yarn-daemon.sh start resourcemanager
starting resourcemanager, logging to /opt/module/hadoop-2.7.2/logs/yarn-kgf-resourcemanager-hadoop01.out
[kgf@hadoop01 hadoop-2.7.2]$ sbin/yarn-daemon.sh start nodemanager
starting nodemanager, logging to /opt/module/hadoop-2.7.2/logs/yarn-kgf-nodemanager-hadoop01.out
[kgf@hadoop01 hadoop-2.7.2]$ sbin/mr-jobhistory-daemon.sh start historyserver
starting historyserver, logging to /opt/module/hadoop-2.7.2/logs/mapred-kgf-historyserver-hadoop01.out
[kgf@hadoop01 hadoop-2.7.2]$ jps
3313 Jps
2898 ResourceManager
3250 JobHistoryServer
1076 NameNode
3143 NodeManager
1132 DataNode
[kgf@hadoop01 hadoop-2.7.2]$

4. Delete the existing output directory on HDFS

[kgf@hadoop01 hadoop-2.7.2]$ bin/hdfs dfs -rm -R /output
22/05/25 13:58:50 INFO fs.TrashPolicyDefault: Namenode trash configuration: Deletion interval = 0 minutes, Emptier interval = 0 minutes.
Deleted /output
[kgf@hadoop01 hadoop-2.7.2]$

5. Run the WordCount program

[kgf@hadoop01 hadoop-2.7.2]$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar wordcount /input  /output
22/05/25 14:30:33 INFO client.RMProxy: Connecting to ResourceManager at hadoop01/192.168.56.20:8032
22/05/25 14:30:34 INFO input.FileInputFormat: Total input paths to process : 8
22/05/25 14:30:34 INFO mapreduce.JobSubmitter: number of splits:8
22/05/25 14:30:35 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1653488933845_0001
22/05/25 14:30:35 INFO impl.YarnClientImpl: Submitted application application_1653488933845_0001
22/05/25 14:30:35 INFO mapreduce.Job: The url to track the job: http://hadoop01:8088/proxy/application_1653488933845_0001/
22/05/25 14:30:35 INFO mapreduce.Job: Running job: job_1653488933845_0001
22/05/25 14:30:42 INFO mapreduce.Job: Job job_1653488933845_0001 running in uber mode : false
22/05/25 14:30:42 INFO mapreduce.Job:  map 0% reduce 0%
22/05/25 14:30:52 INFO mapreduce.Job:  map 75% reduce 0%
22/05/25 14:30:59 INFO mapreduce.Job:  map 100% reduce 0%
22/05/25 14:31:00 INFO mapreduce.Job:  map 100% reduce 100%
22/05/25 14:31:01 INFO mapreduce.Job: Job job_1653488933845_0001 completed successfully
22/05/25 14:31:01 INFO mapreduce.Job: Counters: 49

6. View the logs in the JobHistory web UI (follow a completed job's logs link; with aggregation enabled, the logs are served from HDFS)

15. Configuration file notes

Hadoop configuration files fall into two categories: default configuration files and custom (site) configuration files. You only need to edit a custom configuration file when you want to override a default value.

(1) Default configuration files: core-default.xml, hdfs-default.xml, yarn-default.xml, and mapred-default.xml, shipped inside the corresponding Hadoop jars.

(2) Custom configuration files:

core-site.xml, hdfs-site.xml, yarn-site.xml, and mapred-site.xml are located in $HADOOP_HOME/etc/hadoop; modify their configuration as your project requires.
