通过jobtracker、tasktraker的log分析mapreduce的过程

本文从JobTracker和TaskTracker的日志角度详细分析MapReduce的执行过程,包括作业提交、任务调度、任务执行、数据本地化、任务完成及清理等阶段,帮助理解MapReduce的工作机制。
摘要由CSDN通过智能技术生成

网上有很多mapreduce的源码解析以及执行过程,这里从相关的日志角度看mapreduce的执行过程,以便于在通过日志分析定位分析问题

先简单说一下mapreduce的执行过程,如下图:

1、应用程序客户端client准备作业(数据split划分,作业配置等),并上传值HDFS中;并提交作业至JobTracker。

2、JobTracker根据job的配置信息(jobconf,split等),创建Tasks,并放到任务队列中,等待调度,JobTracker会启动JobInProcess跟踪job的执行过程。

3、TaskTracker通过心跳到JobTracker获取相关的task,TaskTracker首先进行JOB_SETUP会从HDFS中获取执行的jar,

   然后对于获取的每一个task开启一个taskinprocess跟踪task的执行过程,TaskTracker会启动一个JVM执行map、reduce的相关逻辑。

下面结合JobTracker和TaskTracker的日志进行分析关键执行路径

hadoop-root-jobtracker-xxxxxx.log

#表示这个job包含三个map和1个reduce任务,jobid为job_201212191009_0012
2012-12-19 13:46:40,478 INFO org.apache.hadoop.mapred.JobInProgress: job_201212191009_0012: nMaps=3 nReduces=1 max=-1
2012-12-19 13:46:40,479 INFO org.apache.hadoop.mapred.JobTracker: Job job_201212191009_0012 added successfully for user 'root' to queue 'default'

2012-12-19 13:46:40,479 INFO org.apache.hadoop.mapred.AuditLogger: USER=root IP=10.1.1.28 OPERATION=SUBMIT_JOB

TARGET=job_201212191009_0012 RESULT=SUCCESS

#job的初始化,根据job的相关的配置信息(根据件job.jar,job.split,和job.xml,生成三个map task,一个reduce task,放到调度队列中)

2012-12-19 13:46:40,479 INFO org.apache.hadoop.mapred.JobTracker: Initializing job_201212191009_0012
2012-12-19 13:46:40,479 INFO org.apache.hadoop.mapred.JobInProgress: Initializing job_201212191009_0012
2012-12-19 13:46:40,525 INFO org.apache.hadoop.mapred.JobInProgress: jobToken generated and stored with users keys

in /var/hadoop2/hadoop-tmp/mapred/system/job_201212191009_0012/jobToken
2012-12-19 13:46:40,529 INFO org.apache.hadoop.mapred.JobInProgress: Input size for job job_201212191009_0012 = 525769586. Number of splits = 3
2012-12-19 13:46:40,529 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201212191009_0012_m_000000has split on node:/default-rack/wxlab29
2012-12-19 13:46:40,529 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201212191009_0012_m_000001has split on node:/default-rack/wxlab29
2012-12-19 13:46:40,530 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201212191009_0012_m_000002has split on node:/default-rack/wxlab29
2012-12-19 13:46:40,530 INFO org.apache.hadoop.mapred.JobInProgress: job_201212191009_0012 LOCALITY_WAIT_FACTOR=1.0
2012-12-19 13:46:40,530 INFO org.apache.hadoop.mapred.JobInProgress: Job job_201212191009_0012 initialized successfully with 3 map tasks and 1 reduce tasks.

#增加一个JOB_SETUP的task给tasktracker节点做job的设置,比如从HDFS中获取执行的jar文件等,每个tasktracker在执行一个job开始时都必须进行JOB_SETUP

2012-12-19 13:46:40,831 INFO org.apache.hadoop.mapred.JobTracker: Adding task (JOB_SETUP) 'attempt_201212191009_0012_m_000004_0'

to tiptask_201212191009_0012_m_000004,

for tracker 'tracker_wxlab29:localhost.localdomain/127.0.0.1:28696'
2012-12-19 13:46:49,838 INFO org.apache.hadoop.mapred.JobInProgress:

Task 'attempt_201212191009_0012_m_000004_0' has completed task_201212191009_0012_m_000004 successfully.

#tasktraker心跳到jobtracker后,获取分配的task;hadoop默认情况下,每个tasktracker只有2个slots,所以只能同时分配2个map task给该taskNode执行

2012-12-19 13:46:49,840 INFO org.apache.hadoop.mapred.JobTracker: Adding task (MAP)

'attempt_201212191009_0012_m_000000_0' to tip task_201212191009_0012_m_000000, for tracker 'tracker_wxlab29:localhost.localdomain/127.0.0.1:28696'
2012-12-19 13:46:49,840 INFO org.apache.hadoop.mapred.JobInProgress: Choosing data-local task task_201212191009_0012_m_000000
2012-12-19 13:46:49,841 INFO org.apache.hadoop.mapred.JobTracker: Adding task (MAP)

'attempt_201212191009_0012_m_000001_0' to tiptask_201212191009_0012_m_000001, for tracker 'tracker_wxlab29:localhost.localdomain/127.0.0.1:28696'
2012-12-19 13:46:49,841 INFO org.apache.hadoop.mapred.JobInProgress: Choosing data-local task task_201212191009_0012_m_000001

#执行完成一个task后,空余一个slot,分配下一map task给该节点执行
2012-12-19 13:47:04,850 INFO org.apache.hadoop.mapred.JobInProgress: Task 'attempt_201212191009_0012_m_000001_0'

has completedtask_201212191009_0012_m_000001 successfully.
2012-12-19 13:47:04,852 INFO org.apache.hadoop.mapred.JobTracker: Adding task (MAP) 'attempt_201212191009_0012_m_000002_0'

to tiptask_201212191009_0012_m_000002, for tracker 'tracker_wxlab29:localhost.localdomain/127.0.0.1:28696'
2012-12-19 13:47:04,852 INFO org.apache.hadoop.mapred.JobInProgress: Choosing data-local task task_201212191009_0012_m_000002

#taskNode map任务执行完成后,执行reduce task

2012-12-19 13:47:04,852 INFO org.apache.hadoop.mapred.JobTracker: Adding task (REDUCE) 'attempt_201212191009_0012_r_000000_0'

to tiptask_201212191009_0012_r_000000, for tracker 'tracker_wxlab29:localhost.localdomain/127.0.0.1:28696'
2012-12-19 13:47:07,856 INFO org.apache.hadoop.mapred.JobInProgress: Task 'attempt_201212191009_0012_m_000000_0'

has completedtask_201212191009_0012_m_000000 successfully.
2012-12-19 13:47:10,860 INFO org.apache.hadoop.mapred.JobInProgress: Task 'attempt_201212191009_0012_m_000002_0'

has completedtask_201212191009_0012_m_000002 successfully.
2012-12-19 13:47:22,867 INFO org.apache.hadoop.mapred.JobInProgress: Tas

  • 0
    点赞
  • 2
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值