- What is YARN
- How a YARN application runs
- Resource requests
all requests made up front (Spark), or made dynamically (MapReduce: map task requests are made up front, but reduce task requests are made later, as needed)
- Application lifespan: one application per job (MapReduce); one application per workflow or user session (Spark, more efficient since containers are reused between jobs); or a long-running application shared by different users
- YARN compared to MapReduce 1
scalability, availability, utilization, multitenancy
- Scheduling:
delay scheduling to meet data locality;
preemption when a queue has been below its minimum share, or below half of its fair share, for longer than the configured timeout
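As a sketch of where those preemption knobs live with the Fair Scheduler (the property and element names are real; the timeout values here are illustrative, not recommendations): preemption is enabled in yarn-site.xml, and the timeouts go in the allocation file.

```xml
<!-- yarn-site.xml: turn preemption on -->
<property>
  <name>yarn.scheduler.fair.preemption</name>
  <value>true</value>
</property>

<!-- fair scheduler allocation file: cluster-wide defaults, in seconds -->
<allocations>
  <defaultMinSharePreemptionTimeout>60</defaultMinSharePreemptionTimeout>
  <defaultFairSharePreemptionTimeout>300</defaultFairSharePreemptionTimeout>
</allocations>
```

Per-queue `minSharePreemptionTimeout` and `fairSharePreemptionTimeout` elements can override these defaults inside a `<queue>` element.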
- FIFO
- Capacity
- Fair
<?xml version="1.0"?>
<allocations>
  <defaultQueueSchedulingPolicy>fair</defaultQueueSchedulingPolicy>
  <queue name="prod">
    <weight>40</weight>
    <schedulingPolicy>fifo</schedulingPolicy>
  </queue>
  <queue name="dev">
    <weight>60</weight>
    <queue name="eng" />
    <queue name="science" />
  </queue>
  <queuePlacementPolicy>
    <!-- use the queue named at submit time, but don't create it if it doesn't exist -->
    <rule name="specified" create="false" />
    <!-- else try a queue named after the user's primary Unix group, without creating it -->
    <rule name="primaryGroup" create="false" />
    <!-- else fall back to dev.eng -->
    <rule name="default" queue="dev.eng" />
  </queuePlacementPolicy>
</allocations>
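The steady-state shares implied by the weights above can be worked out by normalizing weights level by level down the queue tree. A minimal sketch (the helper below is hypothetical, not part of YARN; child queues default to weight 1):

```python
def fair_shares(tree, share=1.0, prefix=""):
    """tree maps queue name -> (weight, child-tree-or-None).
    Returns each queue's steady-state share of the cluster."""
    total = sum(w for w, _ in tree.values())
    out = {}
    for name, (weight, children) in tree.items():
        q_share = share * weight / total  # normalize weights among siblings
        path = prefix + name
        out[path] = q_share
        if children:
            out.update(fair_shares(children, q_share, path + "."))
    return out

# The hierarchy from the allocation file above:
queues = {
    "prod": (40, None),
    "dev": (60, {"eng": (1, None), "science": (1, None)}),  # default weight 1
}
shares = fair_shares(queues)
# prod gets 40%, dev 60%, split evenly into dev.eng and dev.science (30% each)
```

These are entitlements when both queues are busy; an idle queue's capacity is lent to the others until it needs it back.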
- Dominant Resource Fairness
Imagine a cluster with a total of 100 CPUs and 10 TB of memory. Application A requests
containers of (2 CPUs, 300 GB), and application B requests containers of (6 CPUs, 100
GB). A’s request is (2%, 3%) of the cluster, so memory is dominant since its proportion
(3%) is larger than CPU’s (2%). B’s request is (6%, 1%), so CPU is dominant. Since B’s
container requests are twice as big in the dominant resource (6% versus 3%), it will be
allocated half as many containers under fair sharing.
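The arithmetic in that example can be checked directly. A small sketch (illustrative code, not YARN's implementation): compute each request's share of every resource, take the largest as the dominant share, and note that fair sharing equalizes dominant shares, so container counts are inversely proportional to per-container dominant share.

```python
# Cluster from the example: 100 CPUs, 10 TB (= 10,000 GB) of memory.
CLUSTER = {"cpu": 100, "mem_gb": 10_000}

def dominant_share(request):
    """Return (dominant resource, its fraction of the cluster)."""
    shares = {r: request[r] / CLUSTER[r] for r in CLUSTER}
    dom = max(shares, key=shares.get)
    return dom, shares[dom]

a_dom, a_share = dominant_share({"cpu": 2, "mem_gb": 300})  # memory dominant, 3%
b_dom, b_share = dominant_share({"cpu": 6, "mem_gb": 100})  # CPU dominant, 6%

# B's dominant share per container is twice A's, so under DRF
# B is allocated half as many containers as A.
ratio = b_share / a_share
```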