在Hadoop中,启动作业运行的方式有很多,可以用命令行格式把打包好后的作业提交还可以,用Hadoop的插件进行应用开发,在这么多的方式中,都会必经过一个流程,作业会以JobInProgress的形式提交到JobTracker中。什么叫JobTracker呢,也许有些人了解Hadoop只知道他的MapReduce计算模型,那个过程只是其中的Task执行的一个具体过程,比较微观上的流程,而JobTrack是一个比较宏观上的东西。涉及到作业的提交的过程。Hadoop遵循的是Master/Slave的架构,也就是主从关系,对应的就是JobTracker/TaskTracker,前者负责资源管理和作业调度,后者主要负责执行由前者分配过来的作业。这样说的话,简单明了。JobTracker里面的执行的过程很多,那就得从开头开始分析,也就是作业最最开始的提交流程开始。后面的分析我会结合MapReduce的代码穿插式的分析,便于大家理解。
其实在作业的提交状态之前,还不会到达JobTacker阶段的,首先是到了MapReduce中一个叫JobClient的类中。也就是说,比如用户通过bin/hadoop jar xxx.jar把打包的jar包上传到系统中时,首先会触发的就是JobClient.。
- publicRunningJobsubmitJob(StringjobFile)throwsFileNotFoundException,
- InvalidJobConfException,
- IOException{
- //Loadinthesubmittedjobdetails
- JobConfjob=newJobConf(jobFile);
- returnsubmitJob(job);
- }
- publicRunningJobsubmitJob(JobConfjob)throwsFileNotFoundException,
- IOException{
- try{
- //又继续调用的是submitJobInternal方法
- returnsubmitJobInternal(job);
- }catch(InterruptedExceptionie){
- thrownewIOException("interrupted",ie);
- }catch(ClassNotFoundExceptioncnfe){
- thrownewIOException("classnotfound",cnfe);
- }
- }
- ...
- jobCopy=(JobConf)context.getConfiguration();
- //Createthesplitsforthejob为作业创建输入信息
- FileSystemfs=submitJobDir.getFileSystem(jobCopy);
- LOG.debug("Creatingsplitsat"+fs.makeQualified(submitJobDir));
- intmaps=writeSplits(context,submitJobDir);
- jobCopy.setNumMapTasks(maps);
- //write"queueadminsofthequeuetowhichjobisbeingsubmitted"
- //tojobfile.
- Stringqueue=jobCopy.getQueueName();
- AccessControlListacl=jobSubmitClient.getQueueAdmins(queue);
- jobCopy.set(QueueManager.toFullPropertyName(queue,
- QueueACL.ADMINISTER_JOBS.getAclName()),acl.getACLString());
- //WritejobfiletoJobTracker'sfs
- FSDataOutputStreamout=
- FileSystem.create(fs,submitJobFile,
- newFsPermission(JobSubmissionFiles.JOB_FILE_PERMISSION));
- try{
- jobCopy.writeXml(out);
- }finally{
- out.close();
- }
- //
- //Now,actuallysubmitthejob(usingthesubmitname)
- //
- printTokens(jobId,jobCopy.getCredentials());
- //所有信息配置完毕,作业的初始化工作完成,最后将通过RPC方式正式提交作业
- status=jobSubmitClient.submitJob(
- jobId,submitJobDir.toString(),jobCopy.getCredentials());
- JobProfileprof=jobSubmitClient.getJobProfile(jobId);
至此我们知道,我们作业已经从本地提交出去了,后面的事情就是JobTracker的事情了,这个时候我们直接会触发的是JobTacker的addJob()方法。
- privatesynchronizedJobStatusaddJob(JobIDjobId,JobInProgressjob)
- throwsIOException{
- totalSubmissions++;
- synchronized(jobs){
- synchronized(taskScheduler){
- jobs.put(job.getProfile().getJobID(),job);
- //观察者模式,会触发每个监听器的方法
- for(JobInProgressListenerlistener:jobInProgressListeners){
- listener.jobAdded(job);
- }
- }
- }
- myInstrumentation.submitJob(job.getJobConf(),jobId);
- job.getQueueMetrics().submitJob(job.getJobConf(),jobId);
- LOG.info("Job"+jobId+"addedsuccessfullyforuser'"
- +job.getJobConf().getUser()+"'toqueue'"
- +job.getJobConf().getQueueName()+"'");
- AuditLogger.logSuccess(job.getUser(),
- Operation.SUBMIT_JOB.name(),jobId.toString());
- returnjob.getStatus();
- }
- /**
- *StarttheJobTrackerprocess.Thisisusedonlyfordebugging.Asarule,
- *JobTrackershouldberunaspartoftheDFSNamenodeprocess.
- *JobTracker也是一个后台进程,伴随NameNode进程启动进行,main方法是他的执行入口地址
- */
- publicstaticvoidmain(Stringargv[]
- )throwsIOException,InterruptedException
- publicstaticvoidmain(Stringargv[]
- )throwsIOException,InterruptedException{
- StringUtils.startupShutdownMessage(JobTracker.class,argv,LOG);
- try{
- if(argv.length==0){
- //调用startTracker方法开始启动JobTracker
- JobTrackertracker=startTracker(newJobConf());
- //JobTracker初始化完毕,开启里面的各项线程服务
- tracker.offerService();
- }
- else{
- if("-dumpConfiguration".equals(argv[0])&&argv.length==1){
- dumpConfiguration(newPrintWriter(System.out));
- }
- else{
- System.out.println("usage:JobTracker[-dumpConfiguration]");
- System.exit(-1);
- }
- }
- }catch(Throwablee){
- LOG.fatal(StringUtils.stringifyException(e));
- System.exit(-1);
- }
- }
- JobTracker(finalJobConfconf,Stringidentifier,Clockclock,QueueManagerqm)
- throwsIOException,InterruptedException{
- .....
- //初始化安全相关操作
- secretManager=
- newDelegationTokenSecretManager(secretKeyInterval,
- tokenMaxLifetime,
- tokenRenewInterval,
- DELEGATION_TOKEN_GC_INTERVAL);
- secretManager.startThreads();
- ......
- //Readthehosts/excludefilestorestrictaccesstothejobtracker.
- this.hostsReader=newHostsFileReader(conf.get("mapred.hosts",""),
- conf.get("mapred.hosts.exclude",""));
- //初始化ACL访问控制列表
- aclsManager=newACLsManager(conf,newJobACLsManager(conf),queueManager);
- LOG.info("Startingjobtrackerwithowneras"+
- getMROwner().getShortUserName());
- //Createthescheduler
- Class<?extendsTaskScheduler>schedulerClass
- =conf.getClass("mapred.jobtracker.taskScheduler",
- JobQueueTaskScheduler.class,TaskScheduler.class);
- //初始化Task任务调度器
- taskScheduler=(TaskScheduler)ReflectionUtils.newInstance(schedulerClass,conf);
- //Setservice-levelauthorizationsecuritypolicy
- if(conf.getBoolean(
- ServiceAuthorizationManager.SERVICE_AUTHORIZATION_CONFIG,false)){
- ServiceAuthorizationManager.refresh(conf,newMapReducePolicyProvider());
- }
- inthandlerCount=conf.getInt("mapred.job.tracker.handler.count",10);
- this.interTrackerServer=
- RPC.getServer(this,addr.getHostName(),addr.getPort(),handlerCount,
- false,conf,secretManager);
- if(LOG.isDebugEnabled()){
- Propertiesp=System.getProperties();
- for(Iteratorit=p.keySet().iterator();it.hasNext();){
- Stringkey=(String)it.next();
- Stringval=p.getProperty(key);
- LOG.debug("Property'"+key+"'is"+val);
- }
- }
1.初始化ACL访问控制列表数据
2.创建TaskSchedule任务调度器
3.得到DPC Server。
4.还有其他一些零零碎碎的操作....
然后第2个方法offService(),主要开启了各项服务;
- publicvoidofferService()throwsInterruptedException,IOException{
- //Prepareforrecovery.Thisisdoneirrespectiveofthestatusofrestart
- //flag.
- while(true){
- try{
- recoveryManager.updateRestartCount();
- break;
- }catch(IOExceptionioe){
- LOG.warn("Failedtoinitializerecoverymanager.",ioe);
- //waitforsometime
- Thread.sleep(FS_ACCESS_RETRY_PERIOD);
- LOG.warn("Retrying...");
- }
- }
- taskScheduler.start();
- .....
- this.expireTrackersThread=newThread(this.expireTrackers,
- "expireTrackers");
- //启动该线程的主要作用是发现和清理死掉的任务
- this.expireTrackersThread.start();
- this.retireJobsThread=newThread(this.retireJobs,"retireJobs");
- //启动该线程的作用是清理长时间驻留在内存中且已经执行完的任务
- this.retireJobsThread.start();
- expireLaunchingTaskThread.start();
- if(completedJobStatusStore.isActive()){
- completedJobsStoreThread=newThread(completedJobStatusStore,
- "completedjobsStore-housekeeper");
- //该线程的作用是把已经运行完成的任务的信息保存到HDFS中,以便后续的查询
- completedJobsStoreThread.start();
- }
- //starttheinter-trackerserveroncethejtisready
- this.interTrackerServer.start();
- synchronized(this){
- state=State.RUNNING;
- }
- LOG.info("StartingRUNNING");
- this.interTrackerServer.join();
- LOG.info("StoppedinterTrackerServer");
- }
- voidclose()throwsIOException{
- //服务停止
- if(this.infoServer!=null){
- LOG.info("StoppinginfoServer");
- try{
- this.infoServer.stop();
- }catch(Exceptionex){
- LOG.warn("ExceptionshuttingdownJobTracker",ex);
- }
- }
- if(this.interTrackerServer!=null){
- LOG.info("StoppinginterTrackerServer");
- this.interTrackerServer.stop();
- }
- if(this.expireTrackersThread!=null&&this.expireTrackersThread.isAlive()){
- LOG.info("StoppingexpireTrackers");
- //执行线程中断操作
- this.expireTrackersThread.interrupt();
- try{
- //等待线程执行完毕再执行后面的操作
- this.expireTrackersThread.join();
- }catch(InterruptedExceptionex){
- ex.printStackTrace();
- }
- }
- if(this.retireJobsThread!=null&&this.retireJobsThread.isAlive()){
- LOG.info("Stoppingretirer");
- this.retireJobsThread.interrupt();
- try{
- this.retireJobsThread.join();
- }catch(InterruptedExceptionex){
- ex.printStackTrace();
- }
- }
- if(taskScheduler!=null){
- //调度器的方法终止
- taskScheduler.terminate();
- }
- if(this.expireLaunchingTaskThread!=null&&this.expireLaunchingTaskThread.isAlive()){
- LOG.info("StoppingexpireLaunchingTasks");
- this.expireLaunchingTaskThread.interrupt();
- try{
- this.expireLaunchingTaskThread.join();
- }catch(InterruptedExceptionex){
- ex.printStackTrace();
- }
- }
- if(this.completedJobsStoreThread!=null&&
- this.completedJobsStoreThread.isAlive()){
- LOG.info("StoppingcompletedJobsStorethread");
- this.completedJobsStoreThread.interrupt();
- try{
- this.completedJobsStoreThread.join();
- }catch(InterruptedExceptionex){
- ex.printStackTrace();
- }
- }
- if(jobHistoryServer!=null){
- LOG.info("Stoppingjobhistoryserver");
- try{
- jobHistoryServer.shutdown();
- }catch(Exceptionex){
- LOG.warn("ExceptionshuttingdownJobHistoryserver",ex);
- }
- }
- DelegationTokenRenewal.close();
- LOG.info("stoppedalljobtrackerservices");
- return;
- }
至此,JobTracker的执行过程总算有了一个了解了吧,不算太难。后面的过程分析。JobTracker是如何把任务进行分解和分配的,从宏观上去理解Hadoop的工作原理。下面是以上过程的一个时序图
-
顶
在Hadoop中,启动作业运行的方式有很多,可以用命令行格式把打包好后的作业提交还可以,用Hadoop的插件进行应用开发,在这么多的方式中,都会必经过一个流程,作业会以JobInProgress的形式提交到JobTracker中。什么叫JobTracker呢,也许有些人了解Hadoop只知道他的MapReduce计算模型,那个过程只是其中的Task执行的一个具体过程,比较微观上的流程,而JobTrack是一个比较宏观上的东西。涉及到作业的提交的过程。Hadoop遵循的是Master/Slave的架构,也就是主从关系,对应的就是JobTracker/TaskTracker,前者负责资源管理和作业调度,后者主要负责执行由前者分配过来的作业。这样说的话,简单明了。JobTracker里面的执行的过程很多,那就得从开头开始分析,也就是作业最最开始的提交流程开始。后面的分析我会结合MapReduce的代码穿插式的分析,便于大家理解。
其实在作业的提交状态之前,还不会到达JobTacker阶段的,首先是到了MapReduce中一个叫JobClient的类中。也就是说,比如用户通过bin/hadoop jar xxx.jar把打包的jar包上传到系统中时,首先会触发的就是JobClient.。
- publicRunningJobsubmitJob(StringjobFile)throwsFileNotFoundException,
- InvalidJobConfException,
- IOException{
- //Loadinthesubmittedjobdetails
- JobConfjob=newJobConf(jobFile);
- returnsubmitJob(job);
- }
- publicRunningJobsubmitJob(JobConfjob)throwsFileNotFoundException,
- IOException{
- try{
- //又继续调用的是submitJobInternal方法
- returnsubmitJobInternal(job);
- }catch(InterruptedExceptionie){
- thrownewIOException("interrupted",ie);
- }catch(ClassNotFoundExceptioncnfe){
- thrownewIOException("classnotfound",cnfe);
- }
- }
- ...
- jobCopy=(JobConf)context.getConfiguration();
- //Createthesplitsforthejob为作业创建输入信息
- FileSystemfs=submitJobDir.getFileSystem(jobCopy);
- LOG.debug("Creatingsplitsat"+fs.makeQualified(submitJobDir));
- intmaps=writeSplits(context,submitJobDir);
- jobCopy.setNumMapTasks(maps);
- //write"queueadminsofthequeuetowhichjobisbeingsubmitted"
- //tojobfile.
- Stringqueue=jobCopy.getQueueName();
- AccessControlListacl=jobSubmitClient.getQueueAdmins(queue);
- jobCopy.set(QueueManager.toFullPropertyName(queue,
- QueueACL.ADMINISTER_JOBS.getAclName()),acl.getACLString());
- //WritejobfiletoJobTracker'sfs
- FSDataOutputStreamout=
- FileSystem.create(fs,submitJobFile,
- newFsPermission(JobSubmissionFiles.JOB_FILE_PERMISSION));
- try{
- jobCopy.writeXml(out);
- }finally{
- out.close();
- }
- //
- //Now,actuallysubmitthejob(usingthesubmitname)
- //
- printTokens(jobId,jobCopy.getCredentials());
- //所有信息配置完毕,作业的初始化工作完成,最后将通过RPC方式正式提交作业
- status=jobSubmitClient.submitJob(
- jobId,submitJobDir.toString(),jobCopy.getCredentials());
- JobProfileprof=jobSubmitClient.getJobProfile(jobId);
至此我们知道,我们作业已经从本地提交出去了,后面的事情就是JobTracker的事情了,这个时候我们直接会触发的是JobTacker的addJob()方法。
- privatesynchronizedJobStatusaddJob(JobIDjobId,JobInProgressjob)
- throwsIOException{
- totalSubmissions++;
- synchronized(jobs){
- synchronized(taskScheduler){
- jobs.put(job.getProfile().getJobID(),job);
- //观察者模式,会触发每个监听器的方法
- for(JobInProgressListenerlistener:jobInProgressListeners){
- listener.jobAdded(job);
- }
- }
- }
- myInstrumentation.submitJob(job.getJobConf(),jobId);
- job.getQueueMetrics().submitJob(job.getJobConf(),jobId);
- LOG.info("Job"+jobId+"addedsuccessfullyforuser'"
- +job.getJobConf().getUser()+"'toqueue'"
- +job.getJobConf().getQueueName()+"'");
- AuditLogger.logSuccess(job.getUser(),
- Operation.SUBMIT_JOB.name(),jobId.toString());
- returnjob.getStatus();
- }
- /**
- *StarttheJobTrackerprocess.Thisisusedonlyfordebugging.Asarule,
- *JobTrackershouldberunaspartoftheDFSNamenodeprocess.
- *JobTracker也是一个后台进程,伴随NameNode进程启动进行,main方法是他的执行入口地址
- */
- publicstaticvoidmain(Stringargv[]
- )throwsIOException,InterruptedException
- publicstaticvoidmain(Stringargv[]
- )throwsIOException,InterruptedException{
- StringUtils.startupShutdownMessage(JobTracker.class,argv,LOG);
- try{
- if(argv.length==0){
- //调用startTracker方法开始启动JobTracker
- JobTrackertracker=startTracker(newJobConf());
- //JobTracker初始化完毕,开启里面的各项线程服务
- tracker.offerService();
- }
- else{
- if("-dumpConfiguration".equals(argv[0])&&argv.length==1){
- dumpConfiguration(newPrintWriter(System.out));
- }
- else{
- System.out.println("usage:JobTracker[-dumpConfiguration]");
- System.exit(-1);
- }
- }
- }catch(Throwablee){
- LOG.fatal(StringUtils.stringifyException(e));
- System.exit(-1);
- }
- }
- JobTracker(finalJobConfconf,Stringidentifier,Clockclock,QueueManagerqm)
- throwsIOException,InterruptedException{
- .....
- //初始化安全相关操作
- secretManager=
- newDelegationTokenSecretManager(secretKeyInterval,
- tokenMaxLifetime,
- tokenRenewInterval,
- DELEGATION_TOKEN_GC_INTERVAL);
- secretManager.startThreads();
- ......
- //Readthehosts/excludefilestorestrictaccesstothejobtracker.
- this.hostsReader=newHostsFileReader(conf.get("mapred.hosts",""),
- conf.get("mapred.hosts.exclude",""));
- //初始化ACL访问控制列表
- aclsManager=newACLsManager(conf,newJobACLsManager(conf),queueManager);
- LOG.info("Startingjobtrackerwithowneras"+
- getMROwner().getShortUserName());
- //Createthescheduler
- Class<?extendsTaskScheduler>schedulerClass
- =conf.getClass("mapred.jobtracker.taskScheduler",
- JobQueueTaskScheduler.class,TaskScheduler.class);
- //初始化Task任务调度器
- taskScheduler=(TaskScheduler)ReflectionUtils.newInstance(schedulerClass,conf);
- //Setservice-levelauthorizationsecuritypolicy
- if(conf.getBoolean(
- ServiceAuthorizationManager.SERVICE_AUTHORIZATION_CONFIG,false)){
- ServiceAuthorizationManager.refresh(conf,newMapReducePolicyProvider());
- }
- inthandlerCount=conf.getInt("mapred.job.tracker.handler.count",10);
- this.interTrackerServer=
- RPC.getServer(this,addr.getHostName(),addr.getPort(),handlerCount,
- false,conf,secretManager);
- if(LOG.isDebugEnabled()){
- Propertiesp=System.getProperties();
- for(Iteratorit=p.keySet().iterator();it.hasNext();){
- Stringkey=(String)it.next();
- Stringval=p.getProperty(key);
- LOG.debug("Property'"+key+"'is"+val);
- }
- }
1.初始化ACL访问控制列表数据
2.创建TaskSchedule任务调度器
3.得到DPC Server。
4.还有其他一些零零碎碎的操作....
然后第2个方法offService(),主要开启了各项服务;
- publicvoidofferService()throwsInterruptedException,IOException{
- //Prepareforrecovery.Thisisdoneirrespectiveofthestatusofrestart
- //flag.
- while(true){
- try{
- recoveryManager.updateRestartCount();
- break;
- }catch(IOExceptionioe){
- LOG.warn("Failedtoinitializerecoverymanager.",ioe);
- //waitforsometime
- Thread.sleep(FS_ACCESS_RETRY_PERIOD);
- LOG.warn("Retrying...");
- }
- }
- taskScheduler.start();
- .....
- this.expireTrackersThread=newThread(this.expireTrackers,
- "expireTrackers");
- //启动该线程的主要作用是发现和清理死掉的任务
- this.expireTrackersThread.start();
- this.retireJobsThread=newThread(this.retireJobs,"retireJobs");
- //启动该线程的作用是清理长时间驻留在内存中且已经执行完的任务
- this.retireJobsThread.start();
- expireLaunchingTaskThread.start();
- if(completedJobStatusStore.isActive()){
- completedJobsStoreThread=newThread(completedJobStatusStore,
- "completedjobsStore-housekeeper");
- //该线程的作用是把已经运行完成的任务的信息保存到HDFS中,以便后续的查询
- completedJobsStoreThread.start();
- }
- //starttheinter-trackerserveroncethejtisready
- this.interTrackerServer.start();
- synchronized(this){
- state=State.RUNNING;
- }
- LOG.info("StartingRUNNING");
- this.interTrackerServer.join();
- LOG.info("StoppedinterTrackerServer");
- }
- voidclose()throwsIOException{
- //服务停止
- if(this.infoServer!=null){
- LOG.info("StoppinginfoServer");
- try{
- this.infoServer.stop();
- }catch(Exceptionex){
- LOG.warn("ExceptionshuttingdownJobTracker",ex);
- }
- }
- if(this.interTrackerServer!=null){
- LOG.info("StoppinginterTrackerServer");
- this.interTrackerServer.stop();
- }
- if(this.expireTrackersThread!=null&&this.expireTrackersThread.isAlive()){
- LOG.info("StoppingexpireTrackers");
- //执行线程中断操作
- this.expireTrackersThread.interrupt();
- try{
- //等待线程执行完毕再执行后面的操作
- this.expireTrackersThread.join();
- }catch(InterruptedExceptionex){
- ex.printStackTrace();
- }
- }
- if(this.retireJobsThread!=null&&this.retireJobsThread.isAlive()){
- LOG.info("Stoppingretirer");
- this.retireJobsThread.interrupt();
- try{
- this.retireJobsThread.join();
- }catch(InterruptedExceptionex){
- ex.printStackTrace();
- }
- }
- if(taskScheduler!=null){
- //调度器的方法终止
- taskScheduler.terminate();
- }
- if(this.expireLaunchingTaskThread!=null&&this.expireLaunchingTaskThread.isAlive()){
- LOG.info("StoppingexpireLaunchingTasks");
- this.expireLaunchingTaskThread.interrupt();
- try{
- this.expireLaunchingTaskThread.join();
- }catch(InterruptedExceptionex){
- ex.printStackTrace();
- }
- }
- if(this.completedJobsStoreThread!=null&&
- this.completedJobsStoreThread.isAlive()){
- LOG.info("StoppingcompletedJobsStorethread");
- this.completedJobsStoreThread.interrupt();
- try{
- this.completedJobsStoreThread.join();
- }catch(InterruptedExceptionex){
- ex.printStackTrace();
- }
- }
- if(jobHistoryServer!=null){
- LOG.info("Stoppingjobhistoryserver");
- try{
- jobHistoryServer.shutdown();
- }catch(Exceptionex){
- LOG.warn("ExceptionshuttingdownJobHistoryserver",ex);
- }
- }
- DelegationTokenRenewal.close();
- LOG.info("stoppedalljobtrackerservices");
- return;
- }
至此,JobTracker的执行过程总算有了一个了解了吧,不算太难。后面的过程分析。JobTracker是如何把任务进行分解和分配的,从宏观上去理解Hadoop的工作原理。下面是以上过程的一个时序图