Hadoop: The Definitive Guide, ch06: How MapReduce Works

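1. How MapReduce works

1) The client, which submits the MapReduce job.

2) The jobtracker, which coordinates the job run. The jobtracker is a Java application whose main class is JobTracker.

3) The tasktrackers, which run the tasks that the job has been split into. A tasktracker is a Java application whose main class is TaskTracker.

4) The distributed filesystem (normally HDFS), which is used for sharing job files between the other entities.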
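To make the division of responsibilities concrete, here is a minimal toy sketch of the four entities in plain Java. Every type and method name in it (EntitiesSketch, SharedFs, ToyJobTracker, ToyTaskTracker, clientSubmit, coordinate) is hypothetical and is not part of Hadoop. The round-robin hand-out of tasks is only an assumption for illustration, since these notes do not describe how the jobtracker actually assigns tasks.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of the four entities. All names here are hypothetical;
// none of these types are the real Hadoop classes.
class EntitiesSketch {

    // 4) Stand-in for the distributed filesystem that shares job files.
    static class SharedFs {
        final Map<String, String> files = new HashMap<>();
    }

    // 3) A tasktracker runs the tasks it is handed.
    static class ToyTaskTracker {
        final String name;
        final List<String> completed = new ArrayList<>();

        ToyTaskTracker(String name) {
            this.name = name;
        }

        void run(String taskId) {
            completed.add(taskId);
        }
    }

    // 2) The jobtracker coordinates the job run.
    static class ToyJobTracker {
        final List<ToyTaskTracker> trackers;
        final SharedFs fs;

        ToyJobTracker(List<ToyTaskTracker> trackers, SharedFs fs) {
            this.trackers = trackers;
            this.fs = fs;
        }

        // Divides the job into numTasks tasks and hands them out round-robin.
        // Assumption: round-robin is for illustration only, not Hadoop's scheduler.
        void coordinate(String jobId, int numTasks) {
            if (!fs.files.containsKey(jobId + "/job.jar")) {
                throw new IllegalStateException("job files missing for " + jobId);
            }
            for (int i = 0; i < numTasks; i++) {
                ToyTaskTracker tracker = trackers.get(i % trackers.size());
                tracker.run(jobId + "_task_" + i);
            }
        }
    }

    // 1) The client shares the job files, then submits the job.
    static void clientSubmit(ToyJobTracker jobTracker, String jobId, int numTasks) {
        jobTracker.fs.files.put(jobId + "/job.jar", "<jar bytes>");
        jobTracker.coordinate(jobId, numTasks);
    }

    public static void main(String[] args) {
        List<ToyTaskTracker> trackers =
                Arrays.asList(new ToyTaskTracker("tt1"), new ToyTaskTracker("tt2"));
        ToyJobTracker jobTracker = new ToyJobTracker(trackers, new SharedFs());
        clientSubmit(jobTracker, "job_0001", 3);
        for (ToyTaskTracker t : trackers) {
            System.out.println(t.name + " ran " + t.completed);
        }
    }
}
```

With two toy tasktrackers and three tasks, the first tracker ends up with tasks 0 and 2 and the second with task 1. The point of the sketch is only that the client and the trackers never exchange job files directly: everything goes through the shared filesystem.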



2. The job submission process implemented by JobClient's submitJob() method is as follows:

a. Asks the jobtracker for a new job ID (by calling getNewJobId() on JobTracker) (step 2).
b. Checks the output specification of the job. For example, if the output directory has not been specified or it already exists, the job is not submitted and an error is thrown to the MapReduce program.
c. Computes the input splits for the job. If the splits cannot be computed (because the input paths don't exist, for example), the job is not submitted and an error is thrown to the MapReduce program.
d. Copies the resources needed to run the job, including the job JAR file, the configuration file, and the computed input splits, to the jobtracker's filesystem in a directory named after the job ID. The job JAR is copied with a high replication factor (controlled by the mapred.submit.replication property, which defaults to 10) so that there are lots of copies across the cluster for the tasktrackers to access when they run tasks for the job (step 3).
e. Tells the jobtracker that the job is ready for execution (by calling submitJob() on JobTracker) (step 4).
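
The order of steps a to e can be traced with a small in-memory sketch. This is not the Hadoop JobClient implementation. The class JobSubmissionSketch, its fields, the job ID format, the "/jobs/" directory prefix, and the file names job.jar, job.xml and job.split are all assumptions made for illustration. Only the sequence of checks and the replication default of 10 come from the steps above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// In-memory walk-through of steps a-e. Hypothetical model, not the Hadoop API.
class JobSubmissionSketch {

    // Paths that already exist (input files, old output directories).
    final Set<String> existingPaths = new HashSet<>();
    // Files copied to the shared filesystem for each submitted job.
    final Set<String> copiedFiles = new HashSet<>();
    // Replication factor recorded for each copied job JAR.
    final Map<String, Integer> jarReplication = new HashMap<>();
    // Jobs the simulated jobtracker has been told are ready to run.
    final List<String> readyJobs = new ArrayList<>();
    private int nextId = 1;

    // Step a: ask the jobtracker for a new job ID (ID format is an assumption).
    String getNewJobId() {
        return String.format("job_%04d", nextId++);
    }

    String submitJob(String inputPath, String outputPath, int submitReplication) {
        String jobId = getNewJobId();

        // Step b: reject a missing or already existing output directory.
        if (outputPath == null || existingPaths.contains(outputPath)) {
            throw new IllegalStateException("bad output directory: " + outputPath);
        }

        // Step c: splits cannot be computed if the input path does not exist.
        // A real implementation would compute the splits here.
        if (inputPath == null || !existingPaths.contains(inputPath)) {
            throw new IllegalStateException("input path does not exist: " + inputPath);
        }

        // Step d: copy JAR, configuration and splits into a directory named
        // after the job ID (directory prefix and file names are assumptions).
        String jobDir = "/jobs/" + jobId + "/";
        copiedFiles.add(jobDir + "job.jar");
        copiedFiles.add(jobDir + "job.xml");
        copiedFiles.add(jobDir + "job.split");
        jarReplication.put(jobDir + "job.jar", submitReplication);

        // Step e: tell the jobtracker the job is ready for execution.
        readyJobs.add(jobId);
        return jobId;
    }

    public static void main(String[] args) {
        JobSubmissionSketch cluster = new JobSubmissionSketch();
        cluster.existingPaths.add("/data/input");
        // 10 mirrors the default of mapred.submit.replication quoted above.
        String jobId = cluster.submitJob("/data/input", "/data/output", 10);
        System.out.println("submitted " + jobId + ", ready jobs: " + cluster.readyJobs);
    }
}
```

Because both checks run before anything is copied, a job with a bad output directory or a missing input path never reaches the jobtracker's ready list, which matches the "job is not submitted" wording in steps b and c.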

3. The relationship of the Streaming and Pipes executables to the tasktracker and its child process

