In Hadoop MapReduce, communication between the different components is uniformly built on RPC.
The MapReduce framework defines six main communication protocols: four face the Client, and two are internal to the framework.
They are:
- JobSubmissionProtocol: the protocol between the Client and the JobTracker. Users submit jobs and inspect their progress through it.
- RefreshUserMappingsProtocol: lets the Client refresh the user-to-group mappings.
- RefreshAuthorizationPolicyProtocol: lets the Client refresh the MapReduce service-level access control lists.
- AdminOperationsProtocol: lets the Client refresh the queue access control lists (held in the JobTracker or in the Scheduler) and the node lists.
- InterTrackerProtocol: the protocol between a TaskTracker and the JobTracker. Through its interfaces a TaskTracker reports its node's resource usage, task states, and so on, and carries out the commands the JobTracker sends back.
- TaskUmbilicalProtocol: the protocol between a Task and its TaskTracker. Each Task actually runs as a child process of the TaskTracker on its node, and reports its running state, progress, and other information over this protocol.
The base interface of every Hadoop RPC protocol is VersionedProtocol, which mainly carries the protocol version number, to prevent clients and servers with mismatched versions from talking to each other. Of the six protocols above, only the last involves the TaskTracker; the other five all involve the JobTracker.
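The version gate that VersionedProtocol enables can be sketched independently of Hadoop. In this minimal illustration, `Versioned` and `compatible` are hypothetical stand-ins of my own; only the `getProtocolVersion(String, long)` signature mirrors the real interface.

```java
// Minimal sketch of the idea behind VersionedProtocol: before issuing RPCs,
// the client compares its protocol version with the server's and refuses to
// proceed on a mismatch. "Versioned" and "compatible" are hypothetical names.
public class VersionCheck {

    /** Stand-in for VersionedProtocol#getProtocolVersion. */
    interface Versioned {
        long getProtocolVersion(String protocol, long clientVersion);
    }

    /** True only when client and server agree on the protocol version. */
    static boolean compatible(Versioned server, String protocol, long clientVersion) {
        return server.getProtocolVersion(protocol, clientVersion) == clientVersion;
    }

    public static void main(String[] args) {
        Versioned server = (protocol, v) -> 28L;  // mock server speaking version 28
        System.out.println(compatible(server, "JobSubmissionProtocol", 28L)); // true
        System.out.println(compatible(server, "JobSubmissionProtocol", 27L)); // false: stale client
    }
}
```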
The communication protocols in detail
I read the hadoop-2.10 source and found that these protocols no longer appear under VersionedProtocol; instead there is a single umbrella ClientProtocol.
The contents of the protocols described in the book have all been folded into this ClientProtocol.
To keep each method's role easy to tell apart, though, I will still follow the book's organization. Just keep this change in mind.
The JobSubmissionProtocol communication protocol
The protocol between the Client and the JobTracker; users submit jobs and check their status through it. Its interface falls into three categories:
- Job submission:
/**
* Submit a Job for execution. Returns the latest profile for
* that job.
* Submits the job and returns that job's latest profile.
*/
public JobStatus submitJob(JobID jobId, String jobSubmitDir, Credentials ts)
throws IOException, InterruptedException;
/*
 * Parameters:
 * jobId: the Client can obtain a unique ID for the job via getNewJobId()
 * jobSubmitDir: the directory holding the job's files, normally on HDFS
 * ts: the credentials (secret keys) granted to the job
 */
- Job control
/**
* Kill the indicated job
* Kills the job.
*/
public void killJob(JobID jobid) throws IOException, InterruptedException;
/**
* Set the priority of the specified job
* @param jobid ID of the job
* @param priority Priority to be set for the job
* Sets the job's priority.
*/
public void setJobPriority(JobID jobid, String priority)
throws IOException, InterruptedException;
/**
* Kill indicated task attempt.
* @param taskId the id of the task to kill.
* @param shouldFail if true the task is failed and added to failed tasks list, otherwise
* it is just killed, w/o affecting job failure status.
* Kills the task attempt.
*/
public boolean killTask(TaskAttemptID taskId, boolean shouldFail)
throws IOException, InterruptedException;
- Querying cluster state and job status
/**
* Grab a handle to a job that is already known to the JobTracker.
* @return Status of the job, or null if not found.
*/
public JobStatus getJobStatus(JobID jobid)
throws IOException, InterruptedException;
/**
* Get all the jobs submitted.
* @return array of JobStatus for the submitted jobs
*/
public JobStatus[] getAllJobs() throws IOException, InterruptedException;
/**
* Get the current status of the cluster
*
* @return summary of the state of the cluster
*/
public ClusterMetrics getClusterMetrics()
throws IOException, InterruptedException;
The InterTrackerProtocol communication protocol
The protocol between a TaskTracker and the JobTracker. Through it the TaskTracker reports the resource usage and task states of its node to the JobTracker, and receives and executes the commands the JobTracker returns.
The most important method in the protocol is heartbeat. It is called periodically, forming the heartbeat between the TaskTracker and the JobTracker.
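One round of that cycle can be sketched with hypothetical stand-ins: `Status`, `Command`, and `JobTrackerStub` below are my names for the roles of TaskTrackerStatus, TaskTrackerAction, and the JobTracker side of InterTrackerProtocol; this is a conceptual toy, not the real call.

```java
import java.util.ArrayList;
import java.util.List;

// Conceptual sketch of one heartbeat round: the TaskTracker reports its
// status and executes whatever actions the JobTracker sends back.
public class HeartbeatSketch {

    record Status(int freeSlots, int runningTasks) {}   // stand-in for TaskTrackerStatus
    record Command(String action) {}                    // stand-in for TaskTrackerAction

    interface JobTrackerStub {                          // stand-in for the JobTracker side
        List<Command> heartbeat(Status status);
    }

    /** Report this node's status, then collect the actions to execute. */
    static List<String> oneHeartbeat(JobTrackerStub jt, Status s) {
        List<String> executed = new ArrayList<>();
        for (Command c : jt.heartbeat(s)) {
            executed.add(c.action());  // a real TaskTracker would run the action here
        }
        return executed;
    }

    public static void main(String[] args) {
        // Mock JobTracker: hand out a launch command whenever a slot is free.
        JobTrackerStub jt = s -> s.freeSlots() > 0
                ? List.of(new Command("LaunchTaskAction"))
                : List.<Command>of();
        System.out.println(oneHeartbeat(jt, new Status(2, 1)));  // [LaunchTaskAction]
    }
}
```

The piggybacking shape is the point: the reply to the status report doubles as the command channel, so no second RPC from JobTracker to TaskTracker is needed.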
The TaskUmbilicalProtocol communication protocol
The protocol between a Task and its TaskTracker. Each Task uses it to report its running state or error information to the corresponding TaskTracker.
Its methods fall into two categories: methods called periodically, and methods called on demand.
The first category contains two methods:
/**
* Report child's progress to parent.
*
* @param taskId task-id of the child
* @param taskStatus status of the child
* @throws IOException
* @throws InterruptedException
* @return True if the task is known
* The Task reports its current state, wrapped in a TaskStatus, to its TaskTracker.
*/
boolean statusUpdate(TaskAttemptID taskId, TaskStatus taskStatus)
throws IOException, InterruptedException;
/** Periodically called by child to check if parent is still alive.
* @return True if the task is known
* The Task periodically probes whether its TaskTracker is still alive.
*/
boolean ping(TaskAttemptID taskid) throws IOException;
A Task calls statusUpdate every 3 seconds to report its latest progress to the TaskTracker. If, however, the Task has processed no data in those 3 seconds, it does not report progress; instead it calls ping to probe the TaskTracker, making sure the TaskTracker stays alive while the data is being processed.
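That per-interval choice reduces to a one-line rule. `nextCall` is purely illustrative (the real loop lives in the Task child process), but it captures which umbilical RPC goes out each interval:

```java
// Toy decision rule for the periodic reporting loop described above:
// once per 3-second interval, a Task either reports progress or just pings.
public class ProgressReporter {

    /** Which umbilical RPC to make this interval. */
    static String nextCall(boolean progressedSinceLastReport) {
        return progressedSinceLastReport ? "statusUpdate" : "ping";
    }

    public static void main(String[] args) {
        System.out.println(nextCall(true));   // processed data -> statusUpdate
        System.out.println(nextCall(false));  // idle -> ping, just prove liveness
    }
}
```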
The second category of methods is called at different stages of a Task's life:
- Task initialization: after the TaskTracker receives a command to launch a new Task (a LaunchTaskAction) from the JobTracker, it creates a child process, and that child calls getTask to pick up its Task.
- Task running: reporting errors and exceptions, reporting record ranges, and fetching the list of completed Map Tasks.
Reporting exceptions:
/** Report error messages back to parent. Calls should be sparing, since all
* such messages are held in the job tracker.
* @param taskid the id of the task involved
* @param trace the text to report
* The method for reporting exceptions.
*/
void reportDiagnosticInfo(TaskAttemptID taskid, String trace) throws IOException;
/** Report that the task encountered a local filesystem error.*/
void fsError(TaskAttemptID taskId, String message) throws IOException;
/** Report that the task encountered a fatal error.*/
void fatalError(TaskAttemptID taskId, String message) throws IOException;
/** Report that a reduce-task couldn't shuffle map-outputs.*/
void shuffleError(TaskAttemptID taskId, String message) throws IOException;
Reporting record ranges: Hadoop can skip bad records to improve fault tolerance, and this method helps pinpoint where the bad records are.
/**
* Report the record range which is going to process next by the Task.
* @param taskid the id of the task involved
* @param range the range of record sequence nos
* @throws IOException
*/
void reportNextRecordRange(TaskAttemptID taskid, SortedRanges.Range range)
throws IOException;
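Why report the upcoming range at all? If an attempt dies while inside a reported range, the retry can skip the records in that range. The predicate below is a toy illustration of that idea; `shouldSkip` and its parameters are hypothetical, not Hadoop's API.

```java
// Toy illustration of skipping bad records: records inside the range the
// failed attempt last reported are skipped on retry.
public class SkipBadRecords {

    /** True if recordNo falls inside the range the failed attempt reported. */
    static boolean shouldSkip(long recordNo, long failedStart, long failedLength) {
        return recordNo >= failedStart && recordNo < failedStart + failedLength;
    }

    public static void main(String[] args) {
        // The failed attempt had reported the range [4, 7) before crashing.
        System.out.println(shouldSkip(5, 4, 3));  // true: inside the bad range
        System.out.println(shouldSkip(7, 4, 3));  // false: past the bad range
    }
}
```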
Fetching the list of completed Map Tasks: Reduce Tasks depend on Map Tasks. This method lets a Reduce Task obtain the list of finished Map Tasks from its TaskTracker, and from that the locations of the temporary data those Map Tasks produced, so that it can read the data remotely (the Shuffle phase of the Reduce Task).
/** Called by a reduce task to get the map output locations for finished maps.
* Returns an update centered around the map-task-completion-events.
* The update also piggybacks the information whether the events copy at the
* task-tracker has changed or not. This will trigger some action at the
* child-process.
*
* @param fromIndex the index starting from which the locations should be
* fetched
* @param maxLocs the max number of locations to fetch
* @param id The attempt id of the task that is trying to communicate
* @return A {@link MapTaskCompletionEventsUpdate}
*/
MapTaskCompletionEventsUpdate getMapCompletionEvents(JobID jobId,
int fromIndex,
int maxLocs,
TaskAttemptID id)
throws IOException;
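The incremental fromIndex/maxLocs pattern can be sketched against a plain list of completion events. `fetch` and the event strings below are made up; the real call returns a MapTaskCompletionEventsUpdate rather than raw strings.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of incremental polling as in getMapCompletionEvents: each poll
// resumes from where the previous one stopped, so events are fetched once.
public class CompletionEventsSketch {

    /** Return at most maxEvents events starting at fromIndex. */
    static List<String> fetch(List<String> events, int fromIndex, int maxEvents) {
        if (fromIndex >= events.size()) {
            return List.of();  // nothing new since the last poll
        }
        int to = Math.min(events.size(), fromIndex + maxEvents);
        return new ArrayList<>(events.subList(fromIndex, to));
    }

    public static void main(String[] args) {
        List<String> log = List.of("map_0@nodeA", "map_1@nodeB", "map_2@nodeA");
        List<String> first = fetch(log, 0, 2);             // first poll
        List<String> second = fetch(log, first.size(), 2); // resume where we stopped
        System.out.println(first);   // [map_0@nodeA, map_1@nodeB]
        System.out.println(second);  // [map_2@nodeA]
    }
}
```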
- Task completion
After the Task has processed its last record, it performs its final wrap-up work.
/**
* Report that the task is complete, but its commit is pending.
* Reports that the task has finished but has not committed yet.
* @param taskId task's id
* @param taskStatus status of the child
* @throws IOException
*/
void commitPending(TaskAttemptID taskId, TaskStatus taskStatus)
throws IOException, InterruptedException;
/**
* Polling to know whether the task can go-ahead with commit
* Polls to learn whether the task may go ahead with its commit.
* @param taskid
* @return true/false
* @throws IOException
*/
boolean canCommit(TaskAttemptID taskid) throws IOException;
/** Report that the task is successfully completed. Failure is assumed if
* the task process exits without calling this.
* Reports that the task completed successfully; if the task process exits without calling this, failure is assumed.
* @param taskid task's id
*/
void done(TaskAttemptID taskid) throws IOException;
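Taken together, commitPending, canCommit, and done form a small handshake. Below is a toy version with a mock tracker that authorizes a single attempt, so any other attempt is told not to commit; `finishTask` and its arguments are my invention, not the real TaskUmbilicalProtocol flow.

```java
// Toy version of the commitPending -> canCommit -> done handshake. Only the
// authorized attempt commits; a denied one discards its output without
// affecting job status.
public class CommitHandshake {

    static String finishTask(String attemptId, String authorizedAttempt) {
        // commitPending(attemptId, status): announce the output is ready.
        boolean allowed = attemptId.equals(authorizedAttempt);  // canCommit(attemptId)
        if (!allowed) {
            return "discard";  // output is thrown away, job status unaffected
        }
        // ... promote output from its temporary directory to the final location ...
        return "done";         // done(attemptId): report success
    }

    public static void main(String[] args) {
        System.out.println(finishTask("attempt_0", "attempt_0"));  // done
        System.out.println(finishTask("attempt_1", "attempt_0"));  // discard
    }
}
```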