5.You have a cluster running with the fair Scheduler enabled. There are currently no jobs running on the cluster, and you submit a job A, so that only job A is running on the cluster. A while later, you submit Job B. now Job A and Job B are running on the cluster at the same time. How will the Fair Scheduler handle these two jobs?
A. When Job B gets submitted, it will get assigned tasks, while job A continues to run with fewer tasks.
B. When Job B gets submitted, Job A has to finish first, before job B can gets scheduled.
C. When Job A gets submitted, it doesn't consumes all the task slots.
D. When Job A gets submitted, it consumes all the task slots.
Answer: B --> A
解析: A
oreily:
With the Fair Scheduler (iii in Figure 4-3), there is no need to reserve a set amount of
capacity, since it will dynamically balance resources between all running jobs. Just after
the first (large) job starts, it is the only job running, so it gets all the resources in the
cluster. When the second (small) job starts, it is allocated half of the cluster resources so
that each job is using its fair share of resources.
Note that there is a lag between the time the second job starts and when it receives its fair
share, since it has to wait for resources to free up as containers used by the first job
complete. After the small job completes and no longer requires resources, the large job
goes back to using the full cluster capacity again. The overall effect is both high cluster
utilization and timely small job completion.
Fair scheduling is a method of assigning resources to jobs such that all jobs get, on average, an equal share of resources over time. When there is a single job running, that job uses the entire cluster. When other jobs are submitted, tasks slots that free up are assigned to the new jobs, so that each job gets roughly the same amount of CPU time. Unlike the default Hadoop scheduler, which forms a queue of jobs, this lets short jobs finish in reasonable time while not starving long jobs. It is also a reasonable way to share a cluster between a number of users.