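YARN is a resource scheduling platform that supplies server computing resources to computation programs. It is comparable to a distributed operating system, with programs such as MapReduce playing the role of applications that run on top of that operating system.
YARN Basic Architecture
The ResourceManager (RM) is in charge of resources for the whole cluster, while a NodeManager (NM) is in charge of a single node. The ApplicationMaster (AM) manages the resources for the Map Tasks and Reduce Tasks of a job: it requests resources from the RM and hands them to those tasks. A Container is comparable to a small computer.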
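How YARN Works
0. When the main method of the Driver class reaches job.waitForCompletion();, a YARNRunner is started.
1. An Application is requested from the RM.
2. The RM returns the resource submission path and an application_id.
3. The resources the job needs to run are submitted: the split information, the XML file (the job's own configuration parameters), and the jar.
4. An mrAppMaster is requested.
5. The RM turns the user's request into a Task and places it in a queue.
6-7. An NM picks up the Task and creates a Container.
8. The job resources are downloaded from the submission path to the local node.
9. The AM asks the RM for containers in which to run the MapTasks.
10. The containers are created (they may be on the same NM or on different NMs, but each task gets its own container).
11. The AM sends the program startup script.
12. After the Map phase finishes, the AM asks the RM for containers to run the Reduce Tasks.
14. When the program has finished, the AM asks the RM to unregister it.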
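The steps above can be sketched as a toy simulation. This is only an illustration of the order of events, not Hadoop code: the class names, the round-robin placement of containers, and the shortened application id are all assumptions made for the sketch.

```python
from collections import deque
from dataclasses import dataclass, field
from itertools import count


@dataclass
class NodeManager:
    name: str
    containers: list = field(default_factory=list)

    def launch(self, purpose: str) -> str:
        # A container is modelled only as a label here.
        container = f"{self.name}/container-{len(self.containers) + 1}({purpose})"
        self.containers.append(container)
        return container


class ResourceManager:
    def __init__(self, nodes):
        self.nodes = nodes
        self.queue = deque()      # pending tasks (step 5)
        self._app_ids = count(1)
        self._next_node = 0

    def new_application(self) -> str:
        # Steps 1-2: hand out an application id (the submission path is omitted).
        return f"application_{next(self._app_ids):04d}"

    def submit(self, app_id: str) -> None:
        # Steps 4-5: the request for an AM becomes a queued task.
        self.queue.append(app_id)

    def allocate(self, purpose: str) -> str:
        # Round-robin placement is an assumption made only for this sketch.
        node = self.nodes[self._next_node % len(self.nodes)]
        self._next_node += 1
        return node.launch(purpose)


def run_job(rm: ResourceManager, map_tasks: int, reduce_tasks: int) -> list:
    log = []
    app_id = rm.new_application()
    log.append(f"client got {app_id}")
    rm.submit(app_id)
    # Steps 6-8: an NM takes the queued task and starts the AM container.
    queued = rm.queue.popleft()
    log.append(f"AM for {queued} in {rm.allocate('AM')}")
    # Steps 9-11: the AM asks for one container per MapTask.
    for i in range(map_tasks):
        log.append(f"map {i} in {rm.allocate('map')}")
    # Step 12: reduce containers are requested after the map phase.
    for i in range(reduce_tasks):
        log.append(f"reduce {i} in {rm.allocate('reduce')}")
    # Step 14: the AM unregisters itself.
    log.append(f"{app_id} unregistered")
    return log
```

Calling run_job(ResourceManager([NodeManager("nm1"), NodeManager("nm2")]), map_tasks=2, reduce_tasks=1) returns the event log in step order, with the AM, map, and reduce containers spread over the two hypothetical nodes.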
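YARN Schedulers and Scheduling Algorithms
Hadoop currently has three main job schedulers: FIFO, the Capacity Scheduler, and the Fair Scheduler. The default resource scheduler in Apache Hadoop 3.1.3 is the Capacity Scheduler, and the default scheduler in CDH is the Fair Scheduler.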
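FIFO Scheduler
FIFO scheduler (First In, First Out): a single queue in which jobs are served in the order they were submitted, first come, first served.
Drawback: it does not support multiple queues, so it is rarely used in production.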
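A minimal sketch of first-come, first-served allocation, assuming that demand and capacity are counted in abstract units (for example containers). The function name and the job tuples are hypothetical; the real scheduler is more involved.

```python
def fifo_allocate(jobs, capacity):
    """Grant resources strictly in submission order.

    jobs: list of (name, demand) tuples, oldest submission first.
    Returns {name: granted_units}.
    """
    granted = {}
    remaining = capacity
    for name, demand in jobs:
        share = min(demand, remaining)  # later jobs only get what is left
        granted[name] = share
        remaining -= share
    return granted
```

With fifo_allocate([("job1", 8), ("job2", 4), ("job3", 2)], 10), job1 receives all 8 units it asked for, job2 receives the remaining 2, and the small job3 receives nothing until earlier jobs release resources, which illustrates why a single queue is a poor fit for shared clusters.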
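Capacity Scheduler is a multi-user scheduler developed by Yahoo.
1. Multiple queues: each queue can be configured with a certain amount of resources, and each queue uses a FIFO scheduling policy.
2. Capacity guarantee: the administrator can set a guaranteed minimum and an upper limit on resource usage for each queue.
3. Flexibility: if a queue has spare resources, they can be lent temporarily to queues that need them; once a new application is submitted to the lending queue, the resources borrowed by other queues are returned to it.
4. Multi-tenancy:
The scheduler supports multiple users sharing the cluster and multiple applications running at the same time.
To prevent one user's jobs from monopolizing a queue's resources, the scheduler limits the amount of resources that jobs submitted by the same user may occupy.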
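The capacity guarantee and the borrowing of idle resources described above can be sketched as a two-pass allocation. This is a simplified model, not the Capacity Scheduler's actual algorithm: the Queue fields are hypothetical names, the guarantees are assumed to sum to at most the cluster total, and per-user limits are not modelled.

```python
from dataclasses import dataclass


@dataclass
class Queue:
    name: str
    guaranteed: int  # minimum share reserved for the queue
    maximum: int     # upper limit, even when the cluster is otherwise idle
    demand: int      # what the queue's applications currently ask for


def capacity_allocate(queues, total):
    # Pass 1: every queue receives its demand, up to its guarantee.
    grant = {q.name: min(q.demand, q.guaranteed) for q in queues}
    spare = total - sum(grant.values())
    # Pass 2: lend idle capacity to queues that want more, up to their maximum.
    for q in queues:
        if spare <= 0:
            break
        extra = min(q.demand, q.maximum) - grant[q.name]
        lend = min(max(extra, 0), spare)
        grant[q.name] += lend
        spare -= lend
    return grant
```

With hypothetical queues A (guaranteed 40, maximum 80, demand 70) and B (guaranteed 60, maximum 100, demand 10) on a 100-unit cluster, A is lent 30 units of B's idle share. Once B's demand rises to 60, the same function gives A only its guaranteed 40, which corresponds to the borrowed resources being returned.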
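Fair Scheduler is a multi-user scheduler developed by Facebook.
The design goal of the Fair Scheduler is that, over time, all jobs receive a fair share of resources. At a given moment, the gap between the resources a job should receive and the resources it actually holds is called its "deficit".
Because resources are to be divided equally, the circled area in the figure is the deficit of job15.
The scheduler gives priority to jobs with a larger deficit when allocating resources.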
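As a rough illustration of the deficit idea, the sketch below assumes equal weights, so each job's fair share is simply the total divided by the number of jobs. The function names and the sample numbers are made up for the example.

```python
def deficits(total, usage):
    """usage: {job: resources currently held}. Returns {job: deficit}."""
    fair_share = total / len(usage)
    # Positive deficit: the job holds less than its fair share.
    return {job: fair_share - held for job, held in usage.items()}


def next_to_serve(total, usage):
    """Pick the job with the largest deficit."""
    gaps = deficits(total, usage)
    return max(gaps, key=gaps.get)
```

For a 12-unit cluster where job1 holds 6, job2 holds 4, and job3 holds 2, the fair share is 4 units each, so job3 has the largest deficit (2) and would be served first.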
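How the Fair Scheduler allocates queue resources: in every case the first step is an equal split. The Fair Scheduler suits workloads with high concurrency requirements.
Queue allocation
Job resource allocation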
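One common way to realize "split equally first, then pass on what is not needed" is max-min fair sharing. The sketch below is written under that assumption and with equal weights; it is not taken from the Fair Scheduler source, and the names and numbers are hypothetical.

```python
def fair_split(total, demands):
    """Max-min fair split of `total` among the entries of `demands` ({name: demand})."""
    grant = {name: 0.0 for name in demands}
    active = {name for name, d in demands.items() if d > 0}
    remaining = float(total)
    while active and remaining > 1e-9:
        share = remaining / len(active)  # equal split of what is left
        satisfied = {n for n in active if demands[n] <= share}
        if not satisfied:
            # Nobody can be fully served: each takes the equal share.
            for n in active:
                grant[n] = share
            break
        for n in satisfied:
            # Fully served entries keep only what they need;
            # the surplus stays in `remaining` for the others.
            grant[n] = float(demands[n])
            remaining -= demands[n]
        active -= satisfied
    return grant
```

With 90 units and demands of 20, 50, and 30, the first equal split is 30 each. The entry that needs only 20 gives back 10, so the entry that asked for 50 ends up with 40 while the other two are fully satisfied.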
Common YARN Commands
Besides the web page at hadoop103:8088, YARN status can also be queried from the command line.
// Run the WordCount example
[xwt@hadoop102 mapreduce]$ hadoop jar hadoop-mapreduce-examples-3.1.3.jar wordcount /input /outputyarnwc
[xwt@hadoop102 mapreduce]$ pwd
/opt/module/hadoop-3.1.3/share/hadoop/mapreduce
List all Applications: yarn application -list
[xwt@hadoop102 mapreduce]$ yarn application -list
2022-05-30 19:08:22,105 INFO client.RMProxy: Connecting to ResourceManager at hadoop103/192.168.10.103:8032
Total number of applications (application-types: [], states: [SUBMITTED, ACCEPTED, RUNNING] and tags: []):0
Application-Id Application-Name Application-Type User Queue State Final-State Progress Tracking-URL
Filter by Application state: yarn application -list -appStates <States> (all states: ALL, NEW, NEW_SAVING, SUBMITTED, ACCEPTED, RUNNING, FINISHED, FAILED, KILLED)
[xwt@hadoop102 mapreduce]$ yarn application -list -appStates FINISHED
2022-05-30 19:05:02,986 INFO client.RMProxy: Connecting to ResourceManager at hadoop103/192.168.10.103:8032
Total number of applications (application-types: [], states: [FINISHED] and tags: []):1
Application-Id Application-Name Application-Type User Queue State Final-State Progress Tracking-URL
application_1653902284486_0002 word count MAPREDUCE xwt default FINISHED SUCCEEDED 100% http://hadoop102:19888/jobhistory/job/job_1653902284486_0002
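When the listing is scripted, it can help to validate the state names before calling the CLI. The helper below is a hypothetical sketch that only builds the argument list; the set of states is the one listed above, and passing several states as a comma-separated value is assumed to match the CLI's -appStates option.

```python
APP_STATES = ("ALL", "NEW", "NEW_SAVING", "SUBMITTED", "ACCEPTED",
              "RUNNING", "FINISHED", "FAILED", "KILLED")


def list_apps_command(*states):
    """Return the argv for `yarn application -list`, optionally filtered by state."""
    argv = ["yarn", "application", "-list"]
    if states:
        normalized = [s.upper() for s in states]
        unknown = [s for s in normalized if s not in APP_STATES]
        if unknown:
            raise ValueError(f"unknown application state(s): {unknown}")
        argv += ["-appStates", ",".join(normalized)]
    return argv
```

On a host where yarn is on the PATH, the result could be passed to subprocess.run(list_apps_command("FINISHED"), check=True); the helper itself never touches the cluster.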
Kill an Application: yarn application -kill application_1653902284486_0002
[xwt@hadoop102 mapreduce]$ yarn application -kill application_1653902284486_0002
2022-05-30 19:10:11,758 INFO client.RMProxy: Connecting to ResourceManager at hadoop103/192.168.10.103:8032
Application application_1653902284486_0002 has already finished
yarn logs: view logs
(1) Query Application logs: yarn logs -applicationId <ApplicationId>
[xwt@hadoop102 ~]$ yarn logs -applicationId application_1653902284486_0002
(2) Query Container logs: yarn logs -applicationId <ApplicationId> -containerId <ContainerId>
yarn applicationattempt: view application attempts
(1) List all attempts of an Application: yarn applicationattempt -list <ApplicationId>
[xwt@hadoop102 ~]$ yarn applicationattempt -list application_1653902284486_0002
2022-05-30 19:22:00,808 INFO client.RMProxy: Connecting to ResourceManager at hadoop103/192.168.10.103:8032
Total number of application attempts :1
ApplicationAttempt-Id State AM-Container-Id Tracking-URL
appattempt_1653902284486_0002_000001 FINISHED container_1653902284486_0002_01_000001 http://hadoop103:8088/proxy/application_1653902284486_0002/
(2) Print the status of an ApplicationAttempt: yarn applicationattempt -status <ApplicationAttemptId>
[xwt@hadoop102 ~]$ yarn applicationattempt -list application_1653902284486_0002
2022-05-30 19:25:55,825 INFO client.RMProxy: Connecting to ResourceManager at hadoop103/192.168.10.103:8032
Total number of application attempts :1
ApplicationAttempt-Id State AM-Container-Id Tracking-URL
appattempt_1653902284486_0002_000001 FINISHED container_1653902284486_0002_01_000001 http://hadoop103:8088/proxy/application_1653902284486_0002/
[xwt@hadoop102 ~]$ yarn applicationattempt -status appattempt_1653902284486_0002_000001
2022-05-30 19:27:25,945 INFO client.RMProxy: Connecting to ResourceManager at hadoop103/192.168.10.103:8032
Application Attempt Report :
ApplicationAttempt-Id : appattempt_1653902284486_0002_000001
State : FINISHED
AMContainer : container_1653902284486_0002_01_000001
Tracking-URL : http://hadoop103:8088/proxy/application_1653902284486_0002/
RPC Port : 33783
AM Host : hadoop102
Diagnostics :
yarn container: view containers. Containers exist only while an application is running; they are released once it finishes.
(1) List all Containers: yarn container -list <ApplicationAttemptId>
[xwt@hadoop102 ~]$ yarn container -list appattempt_1653902284486_0002_000001
2022-05-30 19:29:58,897 INFO client.RMProxy: Connecting to ResourceManager at hadoop103/192.168.10.103:8032
Total number of containers :0
Container-Id Start Time Finish Time State Host Node Http Address LOG-URL
(2) Print Container status: yarn container -status <ContainerId>. Note: a container's status is visible only while the job is running.
[xwt@hadoop102 ~]$ yarn container -status container_1653902284486_0002_01_000001
2022-05-30 19:33:17,243 INFO client.RMProxy: Connecting to ResourceManager at hadoop103/192.168.10.103:8032
Container with id 'container_1653902284486_0002_01_000001' doesn't exist in RM or Timeline Server.
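The three kinds of ID used in these commands share a prefix, so the attempt and application IDs can be derived from a container ID. The sketch below infers the layout from the IDs shown in the transcripts above and treats the leading number as the ResourceManager start time in milliseconds; the function name is hypothetical and other ID variants are not handled.

```python
import re
from datetime import datetime, timezone

# Layout inferred from the transcripts:
# container_<clusterTimestamp>_<appSeq>_<attempt>_<containerSeq>
_CONTAINER = re.compile(r"^container_(\d+)_(\d+)_(\d+)_(\d+)$")


def ids_from_container(container_id):
    """Derive the attempt id and application id embedded in a container id."""
    match = _CONTAINER.match(container_id)
    if match is None:
        raise ValueError(f"not a container id: {container_id!r}")
    cluster_ts, app_seq, attempt, _container_seq = match.groups()
    return {
        "application": f"application_{cluster_ts}_{app_seq}",
        # Attempt ids in the transcripts are zero-padded to six digits.
        "attempt": f"appattempt_{cluster_ts}_{app_seq}_{int(attempt):06d}",
        "cluster_started": datetime.fromtimestamp(int(cluster_ts) / 1000, tz=timezone.utc),
    }
```

Applied to container_1653902284486_0002_01_000001, this yields application_1653902284486_0002 and appattempt_1653902284486_0002_000001, the same IDs used in the commands above.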
yarn node: view node status. List all nodes: yarn node -list -all
[xwt@hadoop102 ~]$ yarn node -list -all
2022-05-30 19:34:02,601 INFO client.RMProxy: Connecting to ResourceManager at hadoop103/192.168.10.103:8032
Total Nodes:3
Node-Id Node-State Node-Http-Address Number-of-Running-Containers
hadoop104:37056 RUNNING hadoop104:8042 0
hadoop102:41254 RUNNING hadoop102:8042 0
hadoop103:33917 RUNNING hadoop103:8042 0
yarn rmadmin: update configuration. Reload the queue configuration: yarn rmadmin -refreshQueues
[xwt@hadoop102 ~]$ yarn rmadmin -refreshQueues
2022-05-30 19:34:46,577 INFO client.RMProxy: Connecting to ResourceManager at hadoop103/192.168.10.103:8033
yarn queue: view queues. Print queue information: yarn queue -status <QueueName>
[xwt@hadoop102 ~]$ yarn queue -status default
2022-05-30 19:35:36,406 INFO client.RMProxy: Connecting to ResourceManager at hadoop103/192.168.10.103:8032
Queue Information :
Queue Name : default
State : RUNNING
Capacity : 100.0%
Current Capacity : .0%
Maximum Capacity : 100.0%
Default Node Label expression : <DEFAULT_PARTITION>
Accessible Node Labels : *
Preemption : disabled
Intra-queue Preemption : disabled
The same information can also be viewed on the 8088 web page.
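For scripting, the "Key : Value" reports printed by the -status commands can be turned into a dictionary. This is a small sketch that assumes the separator is a colon surrounded by spaces, as in the output shown above; the function name is hypothetical and values are kept as plain strings.

```python
def parse_report(text):
    """Turn 'Key : Value' lines from a yarn status report into a dict."""
    report = {}
    for line in text.splitlines():
        # Split on the first ' : ' only, so URLs in the value stay intact.
        key, sep, value = line.partition(" : ")
        if not sep:
            # Headers such as 'Queue Information :' and log lines have
            # no ' : ' separator and are skipped.
            continue
        report[key.strip()] = value.strip()
    return report
```

Fed the queue report shown above, the result maps "Queue Name" to "default" and "Capacity" to "100.0%"; the header line and the RMProxy log line are ignored.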
Core YARN Parameters for Production Environments