Hadoop Basics (7): Hadoop's Three Running Modes

I. Hadoop has three running modes

1. Local mode
Data is stored on the local Linux filesystem. Not used in production.

2. Pseudo-distributed cluster
Data is stored in HDFS. Used for testing.

3. Fully distributed cluster
Data is stored in HDFS and multiple servers work together. This is what enterprises use heavily.

II. Standalone run
Running standalone simply means executing the hadoop command directly.

1. Example: counting words
cd /appserver/hadoop/hadoop-3.3.4
mkdir wcinput
Create a word.txt under wcinput and type some words into it. (Do not create the output directory beforehand: wordcount creates wcoutput/ itself and fails if it already exists.)

bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.4.jar wordcount wcinput/ wcoutput/
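For example, you can seed the input and check the result like this (the word.txt contents are illustrative; in local mode the output lands on the local filesystem):

cat > wcinput/word.txt <<'EOF'
hadoop yarn
hadoop mapreduce
EOF
# after running the wordcount command above:
cat wcoutput/part-r-00000    # prints "hadoop 2", "mapreduce 1", "yarn 1" (tab-separated)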

III. Passwordless SSH login

ssh-keygen generates this machine's public/private key pair; ssh-copy-id installs the public key on a remote host, enabling passwordless login to that host.
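A minimal sketch, assuming the hosts from the cluster plan below and the root account used elsewhere in this post:

ssh-keygen -t rsa              # accept the defaults; writes ~/.ssh/id_rsa and ~/.ssh/id_rsa.pub
ssh-copy-id root@hadoop102     # appends the public key to ~/.ssh/authorized_keys on hadoop102
ssh-copy-id root@hadoop103
ssh hadoop102                  # should now log in without a password

Repeat ssh-copy-id from every node to every other node (and to itself) so the start scripts can reach all machines.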

IV. Single-machine pseudo-cluster installation
See the previous post in this series.

V. Cluster deployment plan

1. Do not install the NameNode and the SecondaryNameNode on the same server.

2. The ResourceManager is also memory-hungry; do not put it on the same machine as the NameNode or the SecondaryNameNode.

        hadoop101             hadoop102                      hadoop103
HDFS    NameNode, DataNode    DataNode                       SecondaryNameNode, DataNode
YARN    NodeManager           ResourceManager, NodeManager   NodeManager

3. Configuration files
Hadoop configuration files come in two kinds: default configuration files and custom configuration files. Only when you want to override a default value do you need to edit a custom configuration file and change the corresponding property.

(1) Default configuration files

Where the four default configuration files live:

core-default.xml      hadoop-common-3.x.x.jar/core-default.xml
hdfs-default.xml      hadoop-hdfs-3.x.x.jar/hdfs-default.xml
yarn-default.xml      hadoop-yarn-common-3.x.x.jar/yarn-default.xml
mapred-default.xml    hadoop-mapreduce-client-core-3.x.x.jar/mapred-default.xml

(2) Custom configuration files
The four files core-site.xml, hdfs-site.xml, yarn-site.xml and mapred-site.xml live under $HADOOP_HOME/etc/hadoop; modify their configuration according to your project's needs.

(3) The workers file
To configure the cluster, add every node to it, as in the sketch below.
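Assuming the three hosts from the deployment plan, $HADOOP_HOME/etc/hadoop/workers would contain one hostname per line, with no extra spaces or blank lines:

hadoop101
hadoop102
hadoop103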

VI. NameNode initialization

1. When starting the cluster for the first time, you must initialize it by formatting the NameNode on the hadoop101 node (analogous to partitioning and assigning drive letters to a newly added disk).

2. Formatting the NameNode again generates a new cluster id, so the NameNode and the DataNodes end up with mismatched cluster ids and the cluster can no longer find its old data. If the cluster errors out at runtime and you really must reformat the NameNode, first stop the NameNode and DataNode processes and delete the data and logs directories on every machine, then format.
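The format command itself (a standard HDFS command; run it on hadoop101 only):

hdfs namenode -format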

VII. Browsing the data stored on HDFS

Open http://192.168.1.1:9870/ and click "Browse the file system".

VIII. Viewing jobs running on YARN

The ResourceManager web UI at http://hadoop001:8088/ (the tracking URL that appears in the job logs below) lists the applications running on YARN.

IX. Testing

1. Create a directory on HDFS
cd /appserver/hadoop/hadoop-3.3.4/bin
./hadoop fs -mkdir /wcinput

2. Upload a file
./hadoop fs -put /tmp/aa.txt /wcinput
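To verify the upload (the listing shown is what you would expect, not captured output):

./hadoop fs -ls /wcinput    # should list /wcinput/aa.txt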

3. Clicking download on the file throws an error:
Couldn't preview the file. NetworkError: Failed to execute 'send' on 'XMLHttpRequest': Failed to load 'http://localhost:9864/webhdfs/v1/wcinput/aa.txt?op=OPEN&namenoderpcaddress=hadoop001:8020&offset=0&_=1675238533732'.

The fix: check the hostname entries in /etc/hosts, stop HDFS and YARN, and reboot the server. After that it just worked.

The Availability field had shown localhost; after the reboot it shows hadoop001. (Note: the machine you browse from also needs proper hosts entries mapping the node names to their IPs, as sketched below.)
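A sketch of the relevant /etc/hosts entry, using the hostname and address that appear in the job logs below; adjust to your own network:

192.168.0.3  hadoop001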

4. Where the file is physically stored
The blocks live under /appserver/hadoop/datanode/data/current/BP-1427885282-127.0.0.1-1675234871384/current/finalized/subdir0/subdir0:

-rw-r--r--. 1 root root 29 Feb  1 15:50 blk_1073741825
-rw-r--r--. 1 root root 11 Feb  1 15:50 blk_1073741825_1001.meta
[root@hadoop001 subdir0]# cat blk_1073741825
aa
bb ce
bobo
tom
aa
bobo
aa
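Rather than digging through the block directories, you can also map an HDFS path to its blocks and their locations with fsck (a standard HDFS tool, added here as a convenience):

hdfs fsck /wcinput/aa.txt -files -blocks -locations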

5. Run the wordcount program
Let's see how YARN handles the job.

cd /appserver/hadoop/hadoop-3.3.4
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.4.jar wordcount /wcinput /wcoutput

It failed:

2023-02-01 17:25:44,177 INFO client.DefaultNoHARMFailoverProxyProvider: Connecting to ResourceManager at hadoop001/192.168.0.3:8032
2023-02-01 17:25:44,435 INFO mapreduce.JobResourceUploader: Disabling Erasure Coding for path: /tmp/hadoop-yarn/staging/root/.staging/job_1675239930166_0002
2023-02-01 17:25:45,103 INFO input.FileInputFormat: Total input files to process : 1
2023-02-01 17:25:46,136 INFO mapreduce.JobSubmitter: number of splits:1
2023-02-01 17:25:46,692 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1675239930166_0002
2023-02-01 17:25:46,692 INFO mapreduce.JobSubmitter: Executing with tokens: []
2023-02-01 17:25:46,803 INFO conf.Configuration: resource-types.xml not found
2023-02-01 17:25:46,803 INFO resource.ResourceUtils: Unable to find 'resource-types.xml'.
2023-02-01 17:25:47,171 INFO impl.YarnClientImpl: Submitted application application_1675239930166_0002
2023-02-01 17:25:47,191 INFO mapreduce.Job: The url to track the job: http://hadoop001:8088/proxy/application_1675239930166_0002/
2023-02-01 17:25:47,192 INFO mapreduce.Job: Running job: job_1675239930166_0002
2023-02-01 17:25:50,231 INFO mapreduce.Job: Job job_1675239930166_0002 running in uber mode : false
2023-02-01 17:25:50,232 INFO mapreduce.Job:  map 0% reduce 0%
2023-02-01 17:25:50,241 INFO mapreduce.Job: Job job_1675239930166_0002 failed with state FAILED due to: Application application_1675239930166_0002 failed 2 times due to AM Container for appattempt_1675239930166_0002_000002 exited with  exitCode: 1
Failing this attempt.Diagnostics: [2023-02-01 17:25:49.755]Exception from container-launch.
Container id: container_1675239930166_0002_02_000001
Exit code: 1

[2023-02-01 17:25:49.757]Container exited with a non-zero exit code 1. Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
Last 4096 bytes of stderr :
Error: Could not find or load main class org.apache.hadoop.mapreduce.v2.app.MRAppMaster


[2023-02-01 17:25:49.760]Container exited with a non-zero exit code 1. Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
Last 4096 bytes of stderr :
Error: Could not find or load main class org.apache.hadoop.mapreduce.v2.app.MRAppMaster


For more detailed output, check the application tracking page: http://hadoop001:8088/cluster/app/application_1675239930166_0002 Then click on links to logs of each attempt.
. Failing the application.
2023-02-01 17:25:50,252 INFO mapreduce.Job: Counters: 0

The fix: first print the full Hadoop classpath:

bin/hadoop classpath
/appserver/hadoop/hadoop-3.3.4/etc/hadoop:/appserver/hadoop/hadoop-3.3.4/share/hadoop/common/lib/*:/appserver/hadoop/hadoop-3.3.4/share/hadoop/common/*:/appserver/hadoop/hadoop-3.3.4/share/hadoop/hdfs:/appserver/hadoop/hadoop-3.3.4/share/hadoop/hdfs/lib/*:/appserver/hadoop/hadoop-3.3.4/share/hadoop/hdfs/*:/appserver/hadoop/hadoop-3.3.4/share/hadoop/mapreduce/*:/appserver/hadoop/hadoop-3.3.4/share/hadoop/yarn:/appserver/hadoop/hadoop-3.3.4/share/hadoop/yarn/lib/*:/appserver/hadoop/hadoop-3.3.4/share/hadoop/yarn/*

Then add that classpath to yarn-site.xml:

    <property>
        <name>yarn.application.classpath</name>
        <value>/appserver/hadoop/hadoop-3.3.4/etc/hadoop:/appserver/hadoop/hadoop-3.3.4/share/hadoop/common/lib/*:/appserver/hadoop/hadoop-3.3.4/share/hadoop/common/*:/appserver/hadoop/hadoop-3.3.4/share/hadoop/hdfs:/appserver/hadoop/hadoop-3.3.4/share/hadoop/hdfs/lib/*:/appserver/hadoop/hadoop-3.3.4/share/hadoop/hdfs/*:/appserver/hadoop/hadoop-3.3.4/share/hadoop/mapreduce/*:/appserver/hadoop/hadoop-3.3.4/share/hadoop/yarn:/appserver/hadoop/hadoop-3.3.4/share/hadoop/yarn/lib/*:/appserver/hadoop/hadoop-3.3.4/share/hadoop/yarn/*</value>
    </property>
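As an aside, another commonly recommended fix for this MRAppMaster error is to point the MapReduce processes at the installation through environment properties in mapred-site.xml rather than hard-coding the full classpath (a sketch; it assumes HADOOP_HOME is set to /appserver/hadoop/hadoop-3.3.4 in hadoop-env.sh or the shell environment):

    <property>
        <name>yarn.app.mapreduce.am.env</name>
        <value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
    </property>
    <property>
        <name>mapreduce.map.env</name>
        <value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
    </property>
    <property>
        <name>mapreduce.reduce.env</name>
        <value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
    </property>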

Restart YARN:

yarn --daemon stop resourcemanager
yarn --daemon stop nodemanager

yarn --daemon start resourcemanager
yarn --daemon start nodemanager

This time the job succeeds:

2023-02-01 17:40:08,122 INFO client.DefaultNoHARMFailoverProxyProvider: Connecting to ResourceManager at hadoop001/192.168.0.3:8032
2023-02-01 17:40:08,390 INFO mapreduce.JobResourceUploader: Disabling Erasure Coding for path: /tmp/hadoop-yarn/staging/root/.staging/job_1675244388030_0001
2023-02-01 17:40:09,045 INFO input.FileInputFormat: Total input files to process : 1
2023-02-01 17:40:09,275 INFO mapreduce.JobSubmitter: number of splits:1
2023-02-01 17:40:09,833 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1675244388030_0001
2023-02-01 17:40:09,834 INFO mapreduce.JobSubmitter: Executing with tokens: []
2023-02-01 17:40:09,943 INFO conf.Configuration: resource-types.xml not found
2023-02-01 17:40:09,944 INFO resource.ResourceUtils: Unable to find 'resource-types.xml'.
2023-02-01 17:40:10,106 INFO impl.YarnClientImpl: Submitted application application_1675244388030_0001
2023-02-01 17:40:10,162 INFO mapreduce.Job: The url to track the job: http://hadoop001:8088/proxy/application_1675244388030_0001/
2023-02-01 17:40:10,162 INFO mapreduce.Job: Running job: job_1675244388030_0001
2023-02-01 17:40:16,271 INFO mapreduce.Job: Job job_1675244388030_0001 running in uber mode : false
2023-02-01 17:40:16,273 INFO mapreduce.Job:  map 0% reduce 0%
2023-02-01 17:40:19,311 INFO mapreduce.Job:  map 100% reduce 0%
2023-02-01 17:40:23,337 INFO mapreduce.Job:  map 100% reduce 100%
2023-02-01 17:40:25,359 INFO mapreduce.Job: Job job_1675244388030_0001 completed successfully
2023-02-01 17:40:25,416 INFO mapreduce.Job: Counters: 54
	File System Counters
		FILE: Number of bytes read=54
		FILE: Number of bytes written=552301
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=130
		HDFS: Number of bytes written=28
		HDFS: Number of read operations=8
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=2
		HDFS: Number of bytes read erasure-coded=0
	Job Counters 
		Launched map tasks=1
		Launched reduce tasks=1
		Data-local map tasks=1
		Total time spent by all maps in occupied slots (ms)=1397
		Total time spent by all reduces in occupied slots (ms)=1565
		Total time spent by all map tasks (ms)=1397
		Total time spent by all reduce tasks (ms)=1565
		Total vcore-milliseconds taken by all map tasks=1397
		Total vcore-milliseconds taken by all reduce tasks=1565
		Total megabyte-milliseconds taken by all map tasks=1430528
		Total megabyte-milliseconds taken by all reduce tasks=1602560
	Map-Reduce Framework
		Map input records=7
		Map output records=8
		Map output bytes=61
		Map output materialized bytes=54
		Input split bytes=101
		Combine input records=8
		Combine output records=5
		Reduce input groups=5
		Reduce shuffle bytes=54
		Reduce input records=5
		Reduce output records=5
		Spilled Records=10
		Shuffled Maps =1
		Failed Shuffles=0
		Merged Map outputs=1
		GC time elapsed (ms)=62
		CPU time spent (ms)=540
		Physical memory (bytes) snapshot=567222272
		Virtual memory (bytes) snapshot=5580062720
		Total committed heap usage (bytes)=479199232
		Peak Map Physical memory (bytes)=329158656
		Peak Map Virtual memory (bytes)=2786742272
		Peak Reduce Physical memory (bytes)=238063616
		Peak Reduce Virtual memory (bytes)=2793320448
	Shuffle Errors
		BAD_ID=0
		CONNECTION=0
		IO_ERROR=0
		WRONG_LENGTH=0
		WRONG_MAP=0
		WRONG_REDUCE=0
	File Input Format Counters 
		Bytes Read=29
	File Output Format Counters 
		Bytes Written=28

6. View the results
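A sketch of reading the output back from HDFS; with the block contents shown earlier, the five distinct words come back with their counts (e.g. aa 3, bobo 2):

hadoop fs -cat /wcoutput/part-r-00000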

X. The history server and log aggregation

1. The history server lets you look up how past programs ran.

2. Log aggregation

Log aggregation: after an application finishes, its run logs are uploaded to HDFS.

The history server we configured can show the run information of historical jobs, but clicking the logs link for the detailed logs fails with "Aggregation is not enabled." (the log aggregation feature is off).

With log aggregation enabled, it is easy to inspect a program's run details, which is convenient for development and debugging.

Enabling log aggregation requires restarting the NodeManager, ResourceManager and HistoryServer; see the configuration sketch below.
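A minimal configuration sketch. The property names are the standard Hadoop ones; the hadoop001 host, ports and retention period are illustrative assumptions for this single-machine setup. In yarn-site.xml:

    <property>
        <name>yarn.log-aggregation-enable</name>
        <value>true</value>
    </property>
    <property>
        <name>yarn.log.server.url</name>
        <value>http://hadoop001:19888/jobhistory/logs</value>
    </property>
    <property>
        <name>yarn.log-aggregation.retain-seconds</name>
        <value>604800</value>
    </property>

And the history server addresses in mapred-site.xml:

    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>hadoop001:10020</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>hadoop001:19888</value>
    </property>

After restarting YARN, start the history server with:

mapred --daemon start historyserver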

3. After configuring and restarting the services, delete the wcoutput directory and rerun the job:

hadoop fs -rm -f -r /wcoutput
hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.4.jar wordcount /wcinput /wcoutput
