MapReduce Workflow

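on submission, the client checks the output directory. the job is not submitted if the directory already exists.

the client then calculates the input splits. one map task is created per split.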

the application master receives progress and completion reports from the tasks. it also requests containers for the map and reduce tasks from the resource manager. once a container is assigned to a task, the application master starts it by contacting the node manager.

if uber tasks are enabled (mapreduce.job.ubertask.enable), a small job runs inside the application master's own JVM instead of in separate containers. a job counts as small when it has fewer than 10 mappers, only one reducer, and an input size that fits within one HDFS block.
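The small-job test can be sketched in plain Java. This is only an illustration and not Hadoop's own code: the class name, method name, and constants are hypothetical, and it assumes that all of the conditions must hold together and that the block size is passed in by the caller.

```java
// Illustrative sketch only; names and constants are hypothetical, not Hadoop's API.
final class UberCheck {
    // Assumed limits taken from the note: fewer than 10 maps, at most one reduce.
    static final int MAX_MAPS = 9;
    static final int MAX_REDUCES = 1;

    // Returns true when the job is small enough to run inside the application master.
    static boolean isUber(boolean enabled, int numMaps, int numReduces,
                          long inputBytes, long blockSizeBytes) {
        return enabled
                && numMaps <= MAX_MAPS
                && numReduces <= MAX_REDUCES
                && inputBytes <= blockSizeBytes; // input fits within one block
    }

    public static void main(String[] args) {
        long block = 128L * 1024 * 1024; // assumed block size for the demo
        System.out.println(isUber(true, 4, 1, 64L * 1024 * 1024, block));
        System.out.println(isUber(true, 12, 1, 64L * 1024 * 1024, block));
    }
}
```

The first call describes a job with 4 maps and 64 MB of input, which passes every check; the second has 12 maps and is rejected.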

all map tasks must be complete before the sort phase of the reduce can begin. the copy phase can start earlier, as individual map outputs become available.

resource requirements are configured on a per-job basis: see mapreduce.map.memory.mb, mapreduce.reduce.memory.mb, mapreduce.map.cpu.vcores and mapreduce.reduce.cpu.vcores.

when the job completes, temporary files are deleted, the job is committed, and the job history is archived.


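- task failure (heartbeats to AM)

caused by an exception in the user's map or reduce function, or by a JVM error.

a failed task is retried. by default a task is attempted four times before it is given up on; this can be configured with mapreduce.map.maxattempts and mapreduce.reduce.maxattempts.

mapreduce.map.failures.maxpercent and mapreduce.reduce.failures.maxpercent set the maximum percentage of map and reduce tasks that are allowed to fail without failing the whole job.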
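How the attempt limit and the failure percentage interact can be shown with a small sketch. The class and method names here are hypothetical, and the comparison is an assumption about how such a threshold would typically be applied, not a copy of Hadoop's implementation.

```java
// Illustrative sketch only; not Hadoop's implementation.
final class FailurePolicy {
    // A task is given up on once it has failed maxAttempts times
    // (maxAttempts plays the role of mapreduce.map.maxattempts).
    static boolean taskExhausted(int failedAttempts, int maxAttempts) {
        return failedAttempts >= maxAttempts;
    }

    // The job fails when the share of given-up tasks exceeds the allowed
    // percentage (the role of mapreduce.map.failures.maxpercent).
    static boolean jobFails(int failedTasks, int totalTasks, int maxFailuresPercent) {
        if (totalTasks == 0) {
            return false; // nothing to fail
        }
        // Integer arithmetic avoids rounding: failed/total > percent/100.
        return failedTasks * 100L > (long) maxFailuresPercent * totalTasks;
    }

    public static void main(String[] args) {
        System.out.println(taskExhausted(4, 4));
        System.out.println(jobFails(5, 100, 5));
        System.out.println(jobFails(6, 100, 5));
    }
}
```

With an allowed percentage of 0, a single given-up task is enough to fail the job in this sketch; raising the percentage lets the job finish with partial results.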

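- application master failure (heartbeats to RM)

the default maximum number of attempts is 2. it is set per job with mapreduce.am.max-attempts and capped cluster-wide by yarn.resourcemanager.am.max-attempts.

the new application master uses the job history to recover the state of tasks that already completed, so they do not have to be rerun.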

- node manager failure (heartbeats to RM)

a node manager can be blacklisted by the application master if the number of task failures for the application on that node exceeds the configured maximum, mapreduce.job.maxtaskfailures.per.tracker.

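- resource manager failure (HA, standby resource manager)

all application information is persisted in ZooKeeper or in a shared state store.

if the active resource manager fails, the standby takes over, reads the application information from the state store, and restarts all application masters.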


- shuffle and sort

  • Map

the number of partitions is the same as the number of reduce tasks.
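A minimal sketch of hash partitioning shows why the partition count follows the reducer count. The class and method names are hypothetical; the arithmetic follows the common hash-and-modulo approach, which is also what Hadoop's default HashPartitioner does.

```java
// Illustrative sketch of hash partitioning; the class name is hypothetical.
final class PartitionSketch {
    // Maps a key to a partition in the range [0, numReduceTasks).
    static int partitionFor(Object key, int numReduceTasks) {
        // Masking with Integer.MAX_VALUE clears the sign bit so the
        // remainder is never negative, even for negative hash codes.
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        String[] keys = {"apple", "banana", "cherry"};
        for (String key : keys) {
            System.out.println(key + " -> partition " + partitionFor(key, 3));
        }
    }
}
```

Every record with the same key lands in the same partition, so a single reduce task sees all values for that key.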

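each time the in-memory buffer reaches its spill threshold, a background thread sorts its contents and writes them to a new spill file, so one map task can produce multiple spill files. if a combiner function is defined, it runs on the sorted output before the spill is written.

a map task has a single output file when it completes, so the spill files are merged into one partitioned and sorted file.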
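Merging several sorted spills into one sorted output is a k-way merge. The sketch below works on in-memory lists of keys rather than files, and every name in it is hypothetical; it illustrates the technique only and is not how Hadoop reads or writes spill files.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

// Illustrative k-way merge over in-memory sorted runs; not Hadoop's code.
final class SpillMerge {
    // Pairs the current smallest unread key of a run with the rest of that run.
    private static final class Head {
        final String key;
        final Iterator<String> rest;

        Head(String key, Iterator<String> rest) {
            this.key = key;
            this.rest = rest;
        }
    }

    // Merges already-sorted runs into one sorted list.
    static List<String> merge(List<List<String>> sortedRuns) {
        PriorityQueue<Head> heap =
                new PriorityQueue<>((a, b) -> a.key.compareTo(b.key));
        for (List<String> run : sortedRuns) {
            Iterator<String> it = run.iterator();
            if (it.hasNext()) {
                heap.add(new Head(it.next(), it));
            }
        }
        List<String> merged = new ArrayList<>();
        while (!heap.isEmpty()) {
            Head smallest = heap.poll();
            merged.add(smallest.key);
            // Refill the heap from the run the smallest key came from.
            if (smallest.rest.hasNext()) {
                heap.add(new Head(smallest.rest.next(), smallest.rest));
            }
        }
        return merged;
    }
}
```

The heap holds at most one entry per run, so the memory needed grows with the number of spills rather than with their total size, which is why the same idea suits on-disk files.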

  • Reduce

map outputs are copied into memory first and spilled to disk once they exceed a threshold. the outputs copied from the different map tasks are then merged into a single sorted input for the reduce function.

  • Configuration (tuning parameters such as buffer size, spill percentage and the number of background threads)


- task execution

  • speculative task
  • output commit
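
    public abstract class OutputCommitter {
        // Job-level setup, run before any task starts.
        public abstract void setupJob(JobContext jobContext) throws IOException;

        // Called once when the job succeeds.
        public void commitJob(JobContext jobContext) throws IOException { }

        // Called once when the job fails or is killed.
        public void abortJob(JobContext jobContext, JobStatus.State state)
                throws IOException { }

        // Task-level setup, run for each task attempt.
        public abstract void setupTask(TaskAttemptContext taskContext)
                throws IOException;

        // Returns false when the task has nothing to commit.
        public abstract boolean needsTaskCommit(TaskAttemptContext taskContext)
                throws IOException;

        // Promotes the output of a successful task attempt.
        public abstract void commitTask(TaskAttemptContext taskContext)
                throws IOException;

        // Discards the output of a failed or killed task attempt.
        public abstract void abortTask(TaskAttemptContext taskContext)
                throws IOException;
    }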
