Container
A container is a virtualized slice of a node's resources; its contents are memory + vcores.
Containers are responsible for running tasks.
How to tune container parameters in production:
Assume a node with 128G of memory and 16 physical cores; start by allocating the memory.
Installing CentOS alone consumes about 1G of memory.
Reserve 15%-20% of total memory for the system (this includes the memory CentOS itself needs), so that full usage cannot hang the machine or trigger the OOM killer, and so there is headroom for components deployed later.
Reserved: 128 * 20% = 25.6G ≈ 26G
Assuming only a DataNode (DN) and NodeManager (NM) run on this node, the remaining memory is 128 - 26 = 102G.
Give the DN 2G and the NM 4G, which leaves 102 - 2 - 4 = 96G for containers.
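The memory budgeting above can be sketched as a small calculation. The function name and defaults below are illustrative only, not a YARN API:

```python
import math

# Illustrative helper: memory left for YARN containers after the system
# reserve (rounded up), the DataNode, the NodeManager, and any co-located
# components.
def usable_container_memory(total_gb, reserve_pct=0.20, dn_gb=2, nm_gb=4, other_gb=0):
    reserved = math.ceil(total_gb * reserve_pct)   # e.g. 128 * 20% = 25.6 -> 26
    return total_gb - reserved - dn_gb - nm_gb - other_gb

print(usable_container_memory(128))               # 128 - 26 - 2 - 4 = 96
print(usable_container_memory(128, other_gb=30))  # with a 30G RegionServer: 66
print(usable_container_memory(256))               # 256 - 52 - 2 - 4 = 198
```

The same function also covers the HBase RegionServer and 256G cases discussed later in these notes.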
Container memory
Partition the remaining 96G among containers:
```
yarn.nodemanager.resource.memory-mb   96G
yarn.scheduler.minimum-allocation-mb  1G    # minimum scheduled memory; extreme case: 96 containers of 1G each
yarn.scheduler.maximum-allocation-mb  96G   # maximum scheduled memory, usually set as large as the available memory; extreme case: a single 96G container
# container memory grows automatically, in 1G increments by default,
# so under these settings the container count ranges from 1 to 96
```
Container vcores
The usual physical-to-virtual core ratio is 1:2, so 16 physical cores give 32 vcores.
```
yarn.nodemanager.resource.pcores-vcores-multiplier  2
yarn.nodemanager.resource.cpu-vcores                32
yarn.scheduler.minimum-allocation-vcores            1    # extreme case: 32 containers
yarn.scheduler.maximum-allocation-vcores            32   # extreme case: a single container
# so under these settings the container count ranges from 1 to 32
```
Official recommendation
Cloudera recommends that a single container have no more than 5 vcores, so we set 4 (this is the key figure).
```
yarn.scheduler.maximum-allocation-vcores  4   # extreme case: 32 / 4 = 8 containers
```
Combining memory + vcores
With vcores fixed at 4, the container count is 32 / 4 = 8.
```
yarn.nodemanager.resource.memory-mb       96G
yarn.scheduler.minimum-allocation-mb      1G
yarn.scheduler.maximum-allocation-mb      12G   # extreme case: 96 / 12 = 8 containers
# when Spark jobs need more memory this value must be raised, which breaks
# the ideal container count -- memory takes priority
yarn.nodemanager.resource.cpu-vcores      32
yarn.scheduler.minimum-allocation-vcores  1
yarn.scheduler.maximum-allocation-vcores  4    # extreme case: 8 containers
# so with these settings the ideal is 8 containers, each with 12G and 4 vcores
```
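If these values are written into yarn-site.xml, note that the memory properties take plain integers in megabytes, not "96G"; a sketch under that assumption (96G = 98304 MB, 12G = 12288 MB):

```xml
<!-- yarn-site.xml sketch; memory values are in MB -->
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>98304</value>
</property>
<property>
  <name>yarn.scheduler.minimum-allocation-mb</name>
  <value>1024</value>
</property>
<property>
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <value>12288</value>
</property>
<property>
  <name>yarn.nodemanager.resource.cpu-vcores</name>
  <value>32</value>
</property>
<property>
  <name>yarn.scheduler.minimum-allocation-vcores</name>
  <value>1</value>
</property>
<property>
  <name>yarn.scheduler.maximum-allocation-vcores</name>
  <value>4</value>
</property>
```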
Another example:
```
yarn.nodemanager.resource.memory-mb       96G
yarn.scheduler.minimum-allocation-mb      1G
yarn.scheduler.maximum-allocation-mb      8G
yarn.nodemanager.resource.cpu-vcores      32
yarn.scheduler.minimum-allocation-vcores  1
yarn.scheduler.maximum-allocation-vcores  2
```
With these settings, memory dominates: 96G / 8G = 12 containers, which use 12 * 2 = 24 vcores, yet the node has 32 vcores, so the CPUs can never be 100% utilized.
You also cannot run 16 containers to soak up more vcores, because 16 * 8G = 128G would exceed the 96G available for containers.
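The container-count reasoning above can be sketched as a small calculation; the effective count is the smaller of the memory-bound and the vcore-bound counts (the helper function is illustrative, not a YARN API):

```python
# Illustrative helper: how many containers fit, limited by whichever of
# memory or vcores runs out first.
def container_count(mem_gb, max_alloc_gb, vcores, max_alloc_vcores):
    by_memory = mem_gb // max_alloc_gb
    by_vcores = vcores // max_alloc_vcores
    return min(by_memory, by_vcores)

print(container_count(96, 12, 32, 4))  # 8  -> memory and vcores are balanced
print(container_count(96, 8, 32, 2))   # 12 -> memory-bound; 32 - 12*2 = 8 vcores sit idle
```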
Exercise: given 256G of memory and 56 cores, how should the parameters be set?
Reserve 256 * 20% = 51.2G ≈ 52G, which leaves 256 - 52 = 204G.
Give the DN 2G and the NM 4G, leaving 204 - 2 - 4 = 198G for containers.
```
yarn.nodemanager.resource.memory-mb       198G
yarn.scheduler.minimum-allocation-mb      1G
yarn.scheduler.maximum-allocation-mb      24G   # extreme case: 198 / 24 ≈ 8 containers
yarn.nodemanager.resource.cpu-vcores      112   # 56 physical cores * 2
yarn.scheduler.minimum-allocation-vcores  1
yarn.scheduler.maximum-allocation-vcores  4    # by vcores alone this would allow 112 / 4 = 28 containers,
                                               # so memory (8 containers) is the limiting factor here
```
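The exercise numbers can be checked directly; memory, not vcores, is the binding constraint in this configuration:

```python
# Checking the 256G / 56-core exercise (illustrative arithmetic only)
mem_bound   = 198 // 24    # 8 containers by memory
vcore_bound = 112 // 4     # 28 containers by vcores
containers  = min(mem_bound, vcore_bound)
print(mem_bound, vcore_bound, containers)  # 8 28 8
```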
What if other components also run on the node, e.g. an HBase RegionServer process? How should that change the settings?
If the RegionServer takes 30G, the container memory shrinks accordingly: 102 - 2 - 4 - 30 = 66G.
vcores are a concept introduced by YARN itself.
The original motivation was that CPU performance differs across nodes: each CPU's computing power is not the same.
For example, if one physical CPU is twice as fast as another, the faster node can be assigned more vcores per physical core to compensate:
the powerful machine gets pcore : vcore = 1 : 2
the weaker machine gets pcore : vcore = 1 : 1
Today's CPUs all perform similarly, so in the xml configuration every node uses pcore : vcore = 1 : 2.
Reviewing the architecture
A client submits a job to the ResourceManager. The RM asks a NodeManager for a container in which to run the job's ApplicationMaster. Once the ApplicationMaster has started, it requests resources from the RM and then runs the job's tasks in containers on the allocated nodes.
Schedulers
There are three schedulers:
FIFO (first in, first out)
In the diagram, the mark at point 2 shows job2 waiting until job1 completes before it is run.
Capacity
A dedicated queue runs small jobs, but reserving a queue for small jobs pre-allocates part of the cluster's resources, so large jobs finish later than they would under FIFO scheduling.
Fair
In Apache Hadoop the default is the Capacity scheduler:
```
yarn.resourcemanager.scheduler.class
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler
```
CDH defaults to the Fair scheduler:
This can be seen in the web UI; CDH's position is that the Capacity scheduler's dedicated small-job queue (say, 20% of the cluster) sits wasted whenever no small jobs are running.
In CDH the Fair scheduler is configured through dynamic resource pools and placement rules.
Common commands
yarn jar
```
[mao@JD root]$ yarn
Usage: yarn [--config confdir] COMMAND
where COMMAND is one of:
  resourcemanager -format-state-store   deletes the RMStateStore
  resourcemanager                       run the ResourceManager
                                        Use -format-state-store for deleting the RMStateStore.
                                        Use -remove-application-from-state-store <appId> for
                                        removing application from RMStateStore.
  nodemanager                           run a nodemanager on each slave
  timelineserver                        run the timeline server
  rmadmin                               admin tools
  version                               print the version
  jar <jar>                             run a jar file
  application                           prints application(s) report/kill application
  applicationattempt                    prints applicationattempt(s) report
  container                             prints container(s) report
  node                                  prints node report(s)
  queue                                 prints queue information
  logs                                  dump container logs
  classpath                             prints the class path needed to get the
                                        Hadoop jar and the required libraries
  daemonlog                             get/set the log level for each daemon
  top                                   run cluster usage tool
 or
  CLASSNAME                             run the class named CLASSNAME

Most commands print help when invoked w/o parameters.
```
yarn application -kill <Application ID> (used often when permission control is strict)
```
[mao@JD root]$ yarn application
19/12/17 17:53:51 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
19/12/17 17:53:51 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Invalid Command Usage :
usage: application
 -appStates <States>             Works with -list to filter applications
                                 based on input comma-separated list of
                                 application states. The valid application
                                 state can be one of the following:
                                 ALL,NEW,NEW_SAVING,SUBMITTED,ACCEPTED,RUN
                                 NING,FINISHED,FAILED,KILLED
 -appTypes <Types>               Works with -list to filter applications
                                 based on input comma-separated list of
                                 application types.
 -help                           Displays help for all commands.
 -kill <Application ID>          Kills the application.
 -list                           List applications. Supports optional use
                                 of -appTypes to filter applications based
                                 on application type, and -appStates to
                                 filter applications based on application
                                 state.
 -movetoqueue <Application ID>   Moves the application to a different
                                 queue.
 -queue <Queue Name>             Works with the movetoqueue command to
                                 specify which queue to move an
                                 application to.
 -status <Application ID>        Prints the status of the application.
```
There is also yarn logs:
```
[mao@JD root]$ yarn logs
Retrieve logs for completed YARN applications.
usage: yarn logs -applicationId <application ID> [OPTIONS]

general options are:
 -appOwner <Application Owner>   AppOwner (assumed to be current user if
                                 not specified)
 -containerId <Container ID>     ContainerId (must be specified if node
                                 address is specified)
 -nodeAddress <Node Address>     NodeAddress in the format nodename:port
                                 (must be specified if container id is
                                 specified)
```
The barrel effect:
However tall a barrel is, the amount of water it holds is determined by its shortest stave.
With the default 128M split size:
f1  130M  2 tasks: 128M -> 00:55, 2M -> 00:03
f2   14M  1 task:   14M -> 00:05
f3   20M  1 task:   20M -> 00:09
The job's 4 map tasks are dominated by the 128M map task, so the whole job takes 55 seconds; the splits are uneven.
The fix is to split the files evenly so that every task does a similar amount of work:
164M total / 3 files ≈ 55M each
f1  55M  1 task
f2  55M  1 task
f3  55M  1 task
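The split arithmetic can be sketched as follows (illustrative only; real split sizes are governed by the HDFS block size and InputFormat configuration, not computed by hand like this):

```python
import math

sizes_mb = [130, 14, 20]                 # f1, f2, f3
total = sum(sizes_mb)                    # 164M in total
even = math.ceil(total / len(sizes_mb))  # ~55M per task if split evenly

# with the default 128M split, the task count per file:
default_tasks = sum(math.ceil(s / 128) for s in sizes_mb)  # 2 + 1 + 1 = 4
print(total, even, default_tasks)  # 164 55 4
```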