1. Operating YARN from the Command Line
1.1 Viewing YARN Status in the Web UI
URL
http://hadoop103:8088
Start by submitting a WordCount job:
hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.3.jar wordcount /input /output1
1.2 yarn application: Viewing Running Applications
- List all applications: yarn application -list
[atguigu@hadoop102 ~]$ yarn application -list
2022-10-22 12:05:24,481 INFO client.RMProxy: Connecting to ResourceManager at hadoop103/192.168.10.103:8032
Total number of applications (application-types: [], states: [SUBMITTED, ACCEPTED, RUNNING] and tags: []):0
Application-Id Application-Name Application-Type User Queue State Final-State Progress Tracking-URL
[atguigu@hadoop102 ~]$ yarn application -list
2022-10-22 12:05:56,314 INFO client.RMProxy: Connecting to ResourceManager at hadoop103/192.168.10.103:8032
Total number of applications (application-types: [], states: [SUBMITTED, ACCEPTED, RUNNING] and tags: []):1
Application-Id Application-Name Application-Type User Queue State Final-State Progress Tracking-URL
application_1666410750861_0002 word count MAPREDUCE atguigu default ACCEPTED UNDEFINED 0% N/A
[atguigu@hadoop102 ~]$ yarn application -list
2022-10-22 12:06:06,044 INFO client.RMProxy: Connecting to ResourceManager at hadoop103/192.168.10.103:8032
Total number of applications (application-types: [], states: [SUBMITTED, ACCEPTED, RUNNING] and tags: []):1
Application-Id Application-Name Application-Type User Queue State Final-State Progress Tracking-URL
application_1666410750861_0002 word count MAPREDUCE atguigu default RUNNING UNDEFINED 50% http://hadoop104:38677
[atguigu@hadoop102 ~]$ yarn application -list
2022-10-22 12:06:09,257 INFO client.RMProxy: Connecting to ResourceManager at hadoop103/192.168.10.103:8032
Total number of applications (application-types: [], states: [SUBMITTED, ACCEPTED, RUNNING] and tags: []):0
Application-Id Application-Name Application-Type User Queue State Final-State Progress Tracking-URL
This prints the same application information shown in the web UI to the console.
- Filter by application state
All states: ALL, NEW, NEW_SAVING, SUBMITTED, ACCEPTED, RUNNING, FINISHED, FAILED, KILLED
Example: list all applications that have finished: yarn application -list -appStates FINISHED
[atguigu@hadoop102 ~]$ yarn application -list -appStates FINISHED
2022-10-22 12:13:58,928 INFO client.RMProxy: Connecting to ResourceManager at hadoop103/192.168.10.103:8032
Total number of applications (application-types: [], states: [FINISHED] and tags: []):2
Application-Id Application-Name Application-Type User Queue State Final-State Progress Tracking-URL
application_1666410750861_0001 word count MAPREDUCE atguigu default FINISHED SUCCEEDED 100% http://hadoop102:19888/jobhistory/job/job_1666410750861_0001
application_1666410750861_0002 word count MAPREDUCE atguigu default FINISHED SUCCEEDED 100% http://hadoop102:19888/jobhistory/job/job_1666410750861_0002
- Kill an application: yarn application -kill <ApplicationId>
[atguigu@hadoop102 ~]$ yarn application -kill application_1666410750861_0003
2022-10-22 12:22:13,796 INFO client.RMProxy: Connecting to ResourceManager at hadoop103/192.168.10.103:8032
Killing application application_1666410750861_0003
2022-10-22 12:22:14,522 INFO impl.YarnClientImpl: Killed application application_1666410750861_0003
[atguigu@hadoop102 ~]$ yarn application -list -appStates KILLED
2022-10-22 12:23:37,002 INFO client.RMProxy: Connecting to ResourceManager at hadoop103/192.168.10.103:8032
Total number of applications (application-types: [], states: [KILLED] and tags: []):1
Application-Id Application-Name Application-Type User Queue State Final-State Progress Tracking-URL
application_1666410750861_0003 word count MAPREDUCE atguigu default KILLED KILLED 100% http://hadoop103:8088/cluster/app/application_1666410750861_0003
1.3 yarn logs: Viewing Logs
- Print an application's logs to the console (commonly used in production): yarn logs -applicationId <ApplicationId>
[atguigu@hadoop102 ~]$ yarn logs -applicationId application_1666410750861_0002
- Print the logs of a single container within an application, identified by its container ID: yarn logs -applicationId <ApplicationId> -containerId <ContainerId>
[atguigu@hadoop102 ~]$ yarn logs -applicationId application_1666515493135_0004 -containerId container_1666515493135_0004_01_000001
1.4 yarn applicationattempt: Viewing Application Attempts
- List an application's attempts: yarn applicationattempt -list <ApplicationId>
Mainly used to look up the attempt ID and the AM container information.
Works not only for RUNNING applications, but also for FINISHED and KILLED ones.
[atguigu@hadoop102 ~]$ yarn applicationattempt -list application_1666515493135_0002
2022-10-23 17:15:25,144 INFO client.RMProxy: Connecting to ResourceManager at hadoop103/192.168.10.103:8032
Total number of application attempts :1
ApplicationAttempt-Id State AM-Container-Id Tracking-URL
appattempt_1666515493135_0002_000001 FINISHED container_1666515493135_0002_01_000001 http://hadoop103:8088/proxy/application_1666515493135_0002/
[atguigu@hadoop102 ~]$ yarn applicationattempt -list application_1666515493135_0003
2022-10-23 17:15:28,579 INFO client.RMProxy: Connecting to ResourceManager at hadoop103/192.168.10.103:8032
Total number of application attempts :1
ApplicationAttempt-Id State AM-Container-Id Tracking-URL
appattempt_1666515493135_0003_000001 KILLED container_1666515493135_0003_01_000001 http://hadoop103:8088/cluster/app/application_1666515493135_0003
- Print the status of an application attempt: yarn applicationattempt -status <ApplicationAttemptId>
[atguigu@hadoop102 ~]$ yarn applicationattempt -status appattempt_1666518343848_0003_000001
2022-10-23 17:53:48,570 INFO client.RMProxy: Connecting to ResourceManager at hadoop103/192.168.10.103:8032
Application Attempt Report :
ApplicationAttempt-Id : appattempt_1666518343848_0003_000001
State : RUNNING
AMContainer : container_1666518343848_0003_01_000001
Tracking-URL : http://hadoop103:8088/proxy/application_1666518343848_0003/
RPC Port : 37555
AM Host : hadoop103
Diagnostics :
[atguigu@hadoop102 ~]$ yarn applicationattempt -status appattempt_1666518343848_0003_000001
2022-10-23 17:58:56,823 INFO client.RMProxy: Connecting to ResourceManager at hadoop103/192.168.10.103:8032
Application Attempt Report :
ApplicationAttempt-Id : appattempt_1666518343848_0003_000001
State : FINISHED
AMContainer : container_1666518343848_0003_01_000001
Tracking-URL : http://hadoop103:8088/proxy/application_1666518343848_0003/
RPC Port : 37555
AM Host : hadoop103
Diagnostics :
1.5 yarn container: Viewing Containers
- List all containers of an attempt: yarn container -list <ApplicationAttemptId>
Container information is only visible while the application is running; otherwise nothing is shown,
because containers are released as soon as the application finishes.
[atguigu@hadoop102 ~]$ yarn container -list appattempt_1666518343848_0004_000001
2022-10-23 18:07:16,321 INFO client.RMProxy: Connecting to ResourceManager at hadoop103/192.168.10.103:8032
Total number of containers :1
Container-Id Start Time Finish Time State Host Node Http Address LOG-URL
container_1666518343848_0004_01_000001 星期日 十月 23 18:07:02 +0800 2022 N/A RUNNING hadoop102:34951 http://hadoop102:8042 http://hadoop102:8042/node/containerlogs/container_1666518343848_0004_01_000001/atguigu
[atguigu@hadoop102 ~]$ yarn container -list appattempt_1666518343848_0004_000001
2022-10-23 18:10:07,820 INFO client.RMProxy: Connecting to ResourceManager at hadoop103/192.168.10.103:8032
Total number of containers :0
Container-Id Start Time Finish Time State Host Node Http Address LOG-URL
- Print a container's status: yarn container -status <ContainerId>
[atguigu@hadoop102 ~]$ yarn container -status container_1666515493135_0006_01_000001
2022-10-23 17:35:20,188 INFO client.RMProxy: Connecting to ResourceManager at hadoop103/192.168.10.103:8032
Container Report :
Container-Id : container_1666515493135_0006_01_000001
Start-Time : 1666517710188
Finish-Time : 0
State : RUNNING
Execution-Type : GUARANTEED
LOG-URL : http://hadoop102:8042/node/containerlogs/container_1666515493135_0006_01_000001/atguigu
Host : hadoop102:34218
NodeHttpAddress : http://hadoop102:8042
Diagnostics : null
[atguigu@hadoop102 ~]$ yarn container -status container_1666515493135_0004_01_000001
2022-10-23 17:31:59,399 INFO client.RMProxy: Connecting to ResourceManager at hadoop103/192.168.10.103:8032
Container with id 'container_1666515493135_0004_01_000001' doesn't exist in RM or Timeline Server.
1.6 yarn node: Viewing Node Status
The cluster has three NodeManagers, all running: yarn node -list -all
[atguigu@hadoop102 ~]$ yarn node -list -all
2022-10-23 18:22:39,998 INFO client.RMProxy: Connecting to ResourceManager at hadoop103/192.168.10.103:8032
Total Nodes:3
Node-Id Node-State Node-Http-Address Number-of-Running-Containers
hadoop103:35552 RUNNING hadoop103:8042 0
hadoop102:34951 RUNNING hadoop102:8042 0
hadoop104:36756 RUNNING hadoop104:8042 0
1.7 yarn rmadmin: Refreshing Configuration
- Refresh queue parameters: yarn rmadmin -refreshQueues
After modifying queue-related parameters, you would normally restart YARN.
To avoid a restart, run this command instead: it re-reads the configuration and applies the queue changes
dynamically, with no downtime.
[atguigu@hadoop102 ~]$ yarn rmadmin -refreshQueues
2022-10-23 18:23:03,910 INFO client.RMProxy: Connecting to ResourceManager at hadoop103/192.168.10.103:8033
1.8 yarn queue: Viewing Queues
Check a queue's resource usage: state, capacity, current usage, and so on:
yarn queue -status <QueueName>
[atguigu@hadoop102 ~]$ yarn queue -status default
2022-10-23 18:23:43,909 INFO client.RMProxy: Connecting to ResourceManager at hadoop103/192.168.10.103:8032
Queue Information :
Queue Name : default
State : RUNNING
Capacity : 100.0%
Current Capacity : 0.0%
Maximum Capacity : 100.0%
Default Node Label expression : <DEFAULT_PARTITION>
Accessible Node Labels : *
Preemption : disabled
Intra-queue Preemption : disabled
The same information is also available in the web UI, in more detail.
2. Core YARN Parameters for Production
2.1 ResourceManager Parameters
2.1.1 Scheduler class: Capacity Scheduler by default
- Large companies, or workloads with high concurrency requirements: choose the Fair Scheduler.
- Small and mid-sized companies, or workloads with low concurrency requirements: choose the Capacity Scheduler.
- Do not use FIFO.
Defaults in yarn-default.xml:
<property>
<description>The class to use as the resource scheduler.</description>
<name>yarn.resourcemanager.scheduler.class</name>
<value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
</property>
2.1.2 Number of threads handling scheduler requests, default 50
- This means the ResourceManager handles at most 50 submitted requests concurrently; how to tune it is covered later.
<property>
<description>Number of threads to handle scheduler interface.</description>
<name>yarn.resourcemanager.scheduler.client.thread-count</name>
<value>50</value>
</property>
2.2 NodeManager Parameters (per node)
2.2.1 Whether YARN auto-detects hardware to configure itself
- Default is false, and false is the usual choice: let operators tune the parameters by hand based on the hardware.
- Think of false as a "custom install" and true as an "automatic install" when installing software.
<property>
<description>Enable auto-detection of node capabilities such as
memory and CPU.
</description>
<name>yarn.nodemanager.resource.detect-hardware-capabilities</name>
<value>false</value>
</property>
2.2.2 Whether to count logical processors (hyperthreads) as CPU cores
- Default is false, and false is the usual choice.
- Enable it only when the servers have different CPU configurations,
- e.g. NodeManager1 has an i7, NodeManager2 an i5, NodeManager3 an i3. In practice servers are usually purchased together with near-identical specs.
<property>
<description>Flag to determine if logical processors(such as
hyperthreads) should be counted as cores. Only applicable on Linux
when yarn.nodemanager.resource.cpu-vcores is set to -1 and
yarn.nodemanager.resource.detect-hardware-capabilities is true.
</description>
<name>yarn.nodemanager.resource.count-logical-processors-as-cores</name>
<value>false</value>
</property>
2.2.3 Ratio of vcores to physical cores
- Default is 1.0, i.e. one thread per core.
- For a 4-core, 8-thread machine, set it to 2.0.
<property>
<description>Multiplier to determine how to convert phyiscal cores to
vcores. This value is used if yarn.nodemanager.resource.cpu-vcores
is set to -1(which implies auto-calculate vcores) and
yarn.nodemanager.resource.detect-hardware-capabilities is set to true. The
number of vcores will be calculated as
number of CPUs * multiplier.
</description>
<name>yarn.nodemanager.resource.pcores-vcores-multiplier</name>
<value>1.0</value>
</property>
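As a quick sanity check on the multiplier (a sketch; the 4-core/8-thread figures are just the example above), the vcore count advertised under auto-detection is simply the physical CPU count times the multiplier:

```shell
# vcores = physical CPUs * pcores-vcores-multiplier
# (only applies when cpu-vcores is -1 and hardware auto-detection is enabled)
awk 'BEGIN { printf "%d vcores\n", 4 * 2.0 }'
```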
2.2.4 Memory available to the NodeManager
- Default 8 GB; must be tuned in production, where servers typically start at 128 GB of RAM.
<property>
<description>Amount of physical memory, in MB, that can be allocated
for containers. If set to -1 and
yarn.nodemanager.resource.detect-hardware-capabilities is true, it is
automatically calculated(in case of Windows and Linux).
In other cases, the default is 8192MB.
</description>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>-1</value>
</property>
2.2.5 Memory the NodeManager reserves for the system
- Configure either 2.2.4 or 2.2.5, not both.
- Total system memory = NodeManager memory (2.2.4) + memory reserved for the system (2.2.5).
<property>
<description>Amount of physical memory, in MB, that is reserved
for non-YARN processes. This configuration is only used if
yarn.nodemanager.resource.detect-hardware-capabilities is set
to true and yarn.nodemanager.resource.memory-mb is -1. If set
to -1, this amount is calculated as
20% of (system memory - 2*HADOOP_HEAPSIZE)
</description>
<name>yarn.nodemanager.resource.system-reserved-memory-mb</name>
<value>-1</value>
</property>
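The 20% fallback in the description above can be checked by hand. Assuming a hypothetical node with 8192 MB of RAM and HADOOP_HEAPSIZE=1024 MB (both values made up for illustration):

```shell
# reserved = 20% of (system memory - 2 * HADOOP_HEAPSIZE), per the property description
awk 'BEGIN { printf "%d MB reserved\n", 0.2 * (8192 - 2 * 1024) }'
```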
2.2.6 CPU vcores available to the NodeManager
- Default 8.
<property>
<description>Number of vcores that can be allocated
for containers. This is used by the RM scheduler when allocating
resources for containers. This is not used to limit the number of
CPUs used by YARN containers. If it is set to -1 and
yarn.nodemanager.resource.detect-hardware-capabilities is true, it is
automatically determined from the hardware in case of Windows and Linux.
In other cases, number of vcores is 8 by default.</description>
<name>yarn.nodemanager.resource.cpu-vcores</name>
<value>-1</value>
</property>
2.2.7 Whether to enforce physical memory limits on containers
- Enabled by default.
- Checks whether the NodeManager on the current node is using more physical memory than it was allotted.
- Example: if the NodeManager is allotted 100 GB on a server with 128 GB of RAM, exceeding 100 GB triggers enforcement, protecting the memory the Linux system itself needs.
<property>
<description>Whether physical memory limits will be enforced for
containers.</description>
<name>yarn.nodemanager.pmem-check-enabled</name>
<value>true</value>
</property>
2.2.8 Whether to enforce virtual memory limits on containers
- Enabled by default.
- Checks whether the NodeManager on the current node is using more virtual memory than its logical allotment.
- Example: if the NodeManager's virtual memory allotment is 200 GB on a system with 256 GB of virtual memory, exceeding 200 GB triggers enforcement, protecting system memory (covered in more detail later).
<property>
<description>Whether virtual memory limits will be enforced for
containers.</description>
<name>yarn.nodemanager.vmem-check-enabled</name>
<value>true</value>
</property>
2.2.9 Ratio of virtual memory to physical memory
- Default 2.1.
<property>
<description>Ratio between virtual memory to physical memory when
setting memory limits for containers. Container allocations are
expressed in terms of physical memory, and virtual memory usage
is allowed to exceed this allocation by this ratio.
</description>
<name>yarn.nodemanager.vmem-pmem-ratio</name>
<value>2.1</value>
</property>
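As a worked example (the 1024 MB container size is just an assumption), a container allocated 1 GB of physical memory may use up to 2.1 times that in virtual memory before the vmem check terminates it:

```shell
# vmem limit = physical allocation * yarn.nodemanager.vmem-pmem-ratio
awk 'BEGIN { printf "%.1f MB\n", 1024 * 2.1 }'
```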
2.3 Container Parameters
2.3.1 Minimum container memory
- Default 1 GB.
<property>
<description>The minimum allocation for every container request at the RM
in MBs. Memory requests lower than this will be set to the value of this
property. Additionally, a node manager that is configured to have less memory
than this value will be shut down by the resource manager.</description>
<name>yarn.scheduler.minimum-allocation-mb</name>
<value>1024</value>
</property>
2.3.2 Maximum container memory
- Default 8 GB; it must not exceed the memory allotted to the NodeManager.
<property>
<description>The maximum allocation for every container request at the RM
in MBs. Memory requests higher than this will throw an
InvalidResourceRequestException.</description>
<name>yarn.scheduler.maximum-allocation-mb</name>
<value>8192</value>
</property>
2.3.3 Minimum container vcores
- Default 1.
<property>
<description>The minimum allocation for every container request at the RM
in terms of virtual CPU cores. Requests lower than this will be set to the
value of this property. Additionally, a node manager that is configured to
have fewer virtual cores than this value will be shut down by the resource
manager.</description>
<name>yarn.scheduler.minimum-allocation-vcores</name>
<value>1</value>
</property>
2.3.4 Maximum container vcores
- Default 4; must be tuned in production, where servers typically have 20 cores / 40 threads.
<property>
<description>The maximum allocation for every container request at the RM
in terms of virtual CPU cores. Requests higher than this will throw an
InvalidResourceRequestException.</description>
<name>yarn.scheduler.maximum-allocation-vcores</name>
<value>4</value>
</property>
3. A Production Parameter Tuning Example
3.1 Requirements Analysis
Requirement
- Count the occurrences of each word in 1 GB of data.
Servers
- 3 machines, each with 4 GB of RAM and a 4-core, 4-thread CPU.
- With a better CPU, 8 threads per machine would also work.
Checking core and thread counts
- Example (from the original screenshot): 4 cores, 8 threads.
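One way to check on Linux (standard tools; output varies by machine):

```shell
# Show sockets, cores per socket, and threads per core
lscpu | grep -E '^(CPU\(s\)|Thread\(s\) per core|Core\(s\) per socket)'
# Total logical processors in a single number
nproc
```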
Analysis
- The data can be split:
- 1 GB / 128 MB per split = 8 splits,
- so 8 MapTasks are needed,
- plus 1 ReduceTask
- and 1 MRAppMaster.
- 10 containers in total, averaging 10 / 3 ≈ 3 tasks per node (distributed as 4 / 3 / 3).
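The container count above can be sketched as a quick calculation (128 MB is HDFS's default block size):

```shell
DATA_MB=1024                        # 1 GB of input
SPLIT_MB=128                        # default HDFS block size
MAPS=$(( DATA_MB / SPLIT_MB ))      # one MapTask per split
TOTAL=$(( MAPS + 1 + 1 ))           # + 1 ReduceTask + 1 MRAppMaster
echo "$MAPS map tasks, $TOTAL containers"
```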
3.2 Core Configuration
3.2.1 ResourceManager Parameters
- Scheduler class: Capacity Scheduler (the default)
<property>
<description>The class to use as the resource scheduler.</description>
<name>yarn.resourcemanager.scheduler.class</name>
<value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
</property>
- Number of threads handling scheduler requests, default 50.
With more than 50 concurrent submissions this value could be raised, but here it should not exceed 3 nodes * 4 threads = 12 threads,
since the cluster can run at most 12 tasks at once.
These are the ResourceManager threads that accept submissions; this workload opens at most 8 MapTasks at a time, so 8 is used. Also, hadoop103 runs other daemons besides the ResourceManager, so the value is kept low.
<property>
<description>Number of threads to handle scheduler interface.</description>
<name>yarn.resourcemanager.scheduler.client.thread-count</name>
<value>8</value>
</property>
3.2.2 NodeManager Parameters (per node)
- Whether YARN auto-detects hardware to configure itself.
Default is false, and false is the usual choice: let operators tune the parameters by hand based on the hardware.
If this is set to true, none of the manual settings below take effect.
<property>
<description>Enable auto-detection of node capabilities such as
memory and CPU.
</description>
<name>yarn.nodemanager.resource.detect-hardware-capabilities</name>
<value>false</value>
</property>
- Whether to count logical processors as cores; default false, i.e. use physical core counts.
Here every server has the same CPU configuration. If one server had a better CPU, you would set this to true in that server's yarn-site.xml
and adjust the vcore-to-physical-core multiplier below accordingly:
2.0 if each core should count as 2 vcores, 3.0 if it should count as 3.
<property>
<description>Flag to determine if logical processors(such as
hyperthreads) should be counted as cores. Only applicable on Linux
when yarn.nodemanager.resource.cpu-vcores is set to -1 and
yarn.nodemanager.resource.detect-hardware-capabilities is true.
</description>
<name>yarn.nodemanager.resource.count-logical-processors-as-cores</name>
<value>false</value>
</property>
- Ratio of vcores to physical cores; default 1.0.
<property>
<description>Multiplier to determine how to convert phyiscal cores to
vcores. This value is used if yarn.nodemanager.resource.cpu-vcores
is set to -1(which implies auto-calculate vcores) and
yarn.nodemanager.resource.detect-hardware-capabilities is set to true. The number of vcores will be calculated as number of CPUs * multiplier.
</description>
<name>yarn.nodemanager.resource.pcores-vcores-multiplier</name>
<value>1.0</value>
</property>
- Memory available to the NodeManager; default 8 GB, changed to 4 GB here
because each server in this example has 4 GB of RAM.
<property>
<description>Amount of physical memory, in MB, that can be allocated
for containers. If set to -1 and
yarn.nodemanager.resource.detect-hardware-capabilities is true, it is
automatically calculated(in case of Windows and Linux).
In other cases, the default is 8192MB.
</description>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>4096</value>
</property>
- CPU vcores available to the NodeManager; default 8, changed to 4.
<property>
<description>Number of vcores that can be allocated
for containers. This is used by the RM scheduler when allocating
resources for containers. This is not used to limit the number of
CPUs used by YARN containers. If it is set to -1 and
yarn.nodemanager.resource.detect-hardware-capabilities is true, it is
automatically determined from the hardware in case of Windows and Linux.
In other cases, number of vcores is 8 by default.</description>
<name>yarn.nodemanager.resource.cpu-vcores</name>
<value>4</value>
</property>
3.2.3 Container Parameters
- Minimum container memory; default 1 GB, left unchanged.
Each node has 4 GB, so it can run up to 4 minimum-sized containers.
<property>
<description>The minimum allocation for every container request at the RM in MBs. Memory requests lower than this will be set to the value of this property. Additionally, a node manager that is configured to have less memory than this value will be shut down by the resource manager.
</description>
<name>yarn.scheduler.minimum-allocation-mb</name>
<value>1024</value>
</property>
- Maximum container memory; default 8 GB, changed to 2 GB.
Each node has 4 GB, so it can run 2 maximum-sized containers for larger tasks.
<property>
<description>The maximum allocation for every container request at the RM in MBs. Memory requests higher than this will throw an InvalidResourceRequestException.
</description>
<name>yarn.scheduler.maximum-allocation-mb</name>
<value>2048</value>
</property>
- Minimum container vcores; default 1, left unchanged.
<property>
<description>The minimum allocation for every container request at the RM in terms of virtual CPU cores. Requests lower than this will be set to the value of this property. Additionally, a node manager that is configured to have fewer virtual cores than this value will be shut down by the resource manager.
</description>
<name>yarn.scheduler.minimum-allocation-vcores</name>
<value>1</value>
</property>
- Maximum container vcores; default 4, changed to 2.
<property>
<description>The maximum allocation for every container request at the RM in terms of virtual CPU cores. Requests higher than this will throw an
InvalidResourceRequestException.</description>
<name>yarn.scheduler.maximum-allocation-vcores</name>
<value>2</value>
</property>
- Virtual memory check; enabled by default, disabled here
because of a known issue between CentOS 7+ and JDK 1.8 that inflates the JVM's virtual memory reservation.
The physical memory check stays enabled with its default settings.
<property>
<description>Whether virtual memory limits will be enforced for
containers.</description>
<name>yarn.nodemanager.vmem-check-enabled</name>
<value>false</value>
</property>
- Ratio of virtual to physical memory; default 2.1, left unchanged.
<property>
<description>Ratio between virtual memory to physical memory when setting memory limits for containers. Container allocations are expressed in terms of physical memory, and virtual memory usage is allowed to exceed this allocation by this ratio.
</description>
<name>yarn.nodemanager.vmem-pmem-ratio</name>
<value>2.1</value>
</property>
3.3 Configuration Steps
- Go to the directory containing yarn-site.xml:
cd /opt/module/hadoop-3.1.3/etc/hadoop
- Edit the configuration (paste the properties above into the XML file).
- Distribute the configuration (note: if the nodes have different hardware, configure each NodeManager individually).
- Restart the cluster.
The web UI now shows the updated values.
- Run a WordCount job and observe that it uses 2 GB of memory:
[atguigu@hadoop102 hadoop-3.1.3]$ hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.3.jar wordcount /input /output5