The command is:
hadoop jar /opt/hadoop-1.0.0/hadoop-examples-1.0.0.jar wordcount /user/admin/in/yellow.txt /user/admin/out/555
First, org.apache.hadoop.util.RunJar.main is invoked:
public static void main(String[] args) throws Throwable {
    // register a shutdown hook that deletes the unpack directory
    // tmp/hadoop-admin/hadoop-unjar4705742737164408087 when the JVM exits
    Runtime.getRuntime().addShutdownHook(new Thread() {
        public void run() { FileUtil.fullyDelete(workDir); }
    });
    // build the classpath of the URLClassLoader from the unpacked jar
    classPath.add(new File(workDir + "/").toURL());
    classPath.add(file.toURL());
    classPath.add(new File(workDir, "classes/").toURL());
    for (File lib : libs) classPath.add(lib.toURL());
}
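Once the class loader is set up, RunJar looks up the driver's main(String[]) method and invokes it reflectively. A minimal sketch of that reflective dispatch (the Echo class below is a stand-in for the examples driver, not part of Hadoop):

```java
import java.lang.reflect.Method;

public class RunJarSketch {
    // Stand-in for the user's main class (e.g. the WordCount driver).
    public static class Echo {
        public static String last;
        public static void main(String[] args) {
            last = String.join(" ", args);
        }
    }

    // RunJar-style dispatch: resolve "main(String[])" on the named class
    // and invoke it reflectively, as RunJar does after building the loader.
    public static void invokeMain(Class<?> mainClass, String[] args) throws Exception {
        Method main = mainClass.getMethod("main", String[].class);
        main.invoke(null, new Object[] { args });
    }

    public static void main(String[] args) throws Exception {
        invokeMain(Echo.class, new String[] { "wordcount", "/user/admin/in/yellow.txt" });
        System.out.println(Echo.last);
    }
}
```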
Set the job properties:
job.setJarByClass(WordCount.class);
job.setMapperClass(WordCountMap.class);
job.setReducerClass(WordCountReduce.class);
job.setCombinerClass(WordCountReduce.class);//mapreduce.combine.class
job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(IntWritable.class);//mapred.mapoutput.value.class
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);
job.setJobName("WordCount");
FileInputFormat.addInputPath(job,input);
FileOutputFormat.setOutputPath(job,output);
job.submit();
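The WordCountMap and WordCountReduce classes configured above are not shown in this trace. The logic they implement (the mapper tokenizes and emits (word, 1); the reducer, also used as the combiner, sums per key) can be sketched in plain Java, without the Hadoop types:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class WordCountLogic {
    // Map and reduce collapsed into one pass: the mapper would emit
    // (word, 1) pairs and the reducer/combiner would sum them per key.
    public static Map<String, Integer> wordCount(String text) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String token : text.split("\\s+")) {
            if (token.isEmpty()) continue;
            counts.merge(token, 1, Integer::sum);  // reduce step: sum per key
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(wordCount("yellow river yellow"));
    }
}
```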
Connect to the JobTracker:
private void connect() throws IOException, InterruptedException {
within which:
public void init(JobConf conf) throws IOException {
At this point the client obtains an RPC proxy implementing JobSubmissionProtocol, i.e. a proxy for the JobTracker.
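Hadoop RPC builds this proxy with java.lang.reflect.Proxy: each interface method call is intercepted and turned into a network request. The mechanism, with a hypothetical in-memory "transport" in place of the real socket call, looks roughly like:

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;

public class RpcProxySketch {
    // Hypothetical protocol interface standing in for JobSubmissionProtocol.
    public interface Protocol {
        String getNewJobId();
    }

    // The InvocationHandler stands in for Hadoop RPC's Invoker: a real one
    // serializes the method name and arguments and sends them to the server.
    public static Protocol getProxy() {
        InvocationHandler invoker = (proxy, method, args) ->
            "reply-to-" + method.getName();  // pretend the server answered
        return (Protocol) Proxy.newProxyInstance(
                Protocol.class.getClassLoader(),
                new Class<?>[] { Protocol.class },
                invoker);
    }

    public static void main(String[] args) {
        System.out.println(getProxy().getNewJobId());
    }
}
```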
Get the job staging area:
Path jobStagingArea = JobSubmissionFiles.getStagingDir(JobClient.this,
RPC request: JobSubmissionProtocol.getStagingAreaDir()
Returns: hdfs://server1:9000/tmp/hadoop-admin/mapred/staging/Admin/.staging
RPC request: ClientProtocol.getFileInfo(/tmp/hadoop-admin/mapred/staging/Admin/.staging)
Returns: org.apache.hadoop.hdfs.protocol.HdfsFileStatus@5521691b, i.e. the directory exists
RPC request: ClientProtocol.getFileInfo(/tmp/hadoop-admin/mapred/staging/Admin/.staging)
Returns: org.apache.hadoop.hdfs.protocol.HdfsFileStatus@726c554, used to check permissions
Get a new JobID:
JobID jobId = jobSubmitClient.getNewJobId();
RPC request: JobSubmissionProtocol.getNewJobId()
Returns: job_201404010621_0004
Create the submitJobDir:
Path submitJobDir = new Path(jobStagingArea, jobId.toString());
hdfs://server1:9000/tmp/hadoop-admin/mapred/staging/Admin/.staging/job_201404010621_0004
Copy the jar to HDFS:
copyAndConfigureFiles(jobCopy, submitJobDir);
RPC request: ClientProtocol.getFileInfo(/tmp/hadoop-admin/mapred/staging/Admin/.staging/job_201404010621_0004)
Returns: null
RPC request: ClientProtocol.mkdirs(/tmp/hadoop-admin/mapred/staging/Admin/.staging/job_201404010621_0004, rwxr-xr-x)
Returns: true
RPC request: ClientProtocol.setPermission(/tmp/hadoop-admin/mapred/staging/Admin/.staging/job_201404010621_0004, rwx------)
Returns: null
RPC request: ClientProtocol.getFileInfo(/tmp/hadoop-admin/mapred/staging/admin/.staging/job_201404010621_0004/job.jar)
Returns: null, i.e. it does not exist
RPC request: ClientProtocol.create(/tmp/hadoop-admin/mapred/staging/admin/.staging/job_201404010621_0004/job.jar, rwxr-xr-x, DFSClient_-1317833261, true, true, 3, 67108864)
Returns: an output stream
RPC request: ClientProtocol.addBlock(/tmp/hadoop-admin/mapred/staging/admin/.staging/job_201404010621_0004/job.jar, DFSClient_-1317833261, null)
Returns: org.apache.hadoop.hdfs.protocol.LocatedBlock@1a9b701
Block: blk_6689254996395759186_2720
BlockToken: Ident: , Pass: , Kind: , Service:
DataNode: [10.1.1.103:50010, 10.1.1.102:50010]
RPC request: ClientProtocol.complete(/tmp/hadoop-admin/mapred/staging/admin/.staging/job_201404010621_0004/job.jar, DFSClient_-1317833261)
Returns: true
RPC request: ClientProtocol.setReplication(/tmp/hadoop-admin/mapred/staging/admin/.staging/job_201404010621_0004/job.jar, 10)
Returns: true
RPC request: ClientProtocol.setPermission(/tmp/hadoop-admin/mapred/staging/admin/.staging/job_201404010621_0004/job.jar, rw-r--r--)
Returns: null
RPC request: ClientProtocol.renewLease(DFSClient_-1317833261)
Returns: null
After this, a daemon thread keeps sending renewLease requests periodically.
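That renewal is done by a daemon thread inside the DFS client that wakes up on a fixed interval and re-issues renewLease. A generic sketch of the pattern (the interval, the bounded loop, and all names here are illustrative, not Hadoop's):

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class LeaseRenewerSketch {
    // Start a daemon thread that invokes `renew` every `intervalMillis`,
    // counting down `done` after each renewal. The real renewer loops
    // until the client closes; here the loop is bounded for demonstration.
    public static Thread start(Runnable renew, long intervalMillis,
                               int times, CountDownLatch done) {
        Thread t = new Thread(() -> {
            for (int i = 0; i < times; i++) {
                renew.run();       // stands in for ClientProtocol.renewLease(...)
                done.countDown();
                try {
                    Thread.sleep(intervalMillis);
                } catch (InterruptedException e) {
                    return;        // stop when the client shuts down
                }
            }
        });
        t.setDaemon(true);         // so it never blocks JVM exit
        t.start();
        return t;
    }

    public static void main(String[] args) throws InterruptedException {
        CountDownLatch done = new CountDownLatch(3);
        start(() -> System.out.println("renewLease(DFSClient_-1317833261)"), 10, 3, done);
        done.await(5, TimeUnit.SECONDS);
    }
}
```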
At this point the local file /opt/hadoop-1.0.0/hadoop-examples-1.0.0.jar has been copied into HDFS as /tmp/hadoop-admin/mapred/staging/admin/.staging/job_201404010621_0004/job.jar.
Number of reduces:
int reduces = jobCopy.getNumReduceTasks();
The reduce count is 2.
Check the output directory:
RPC request: ClientProtocol.getFileInfo(/user/admin/out/555)
Returns: null, i.e. it does not exist
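This getFileInfo call is the client's output check (FileOutputFormat.checkOutputSpecs): if the output directory already exists, the job is rejected before anything is submitted. The rule reduces to the following sketch (helper name and exception type are illustrative, not Hadoop's):

```java
import java.util.Collections;
import java.util.Set;

public class OutputSpecSketch {
    // Mirrors the spirit of FileOutputFormat.checkOutputSpecs: submitting
    // over an existing output directory is an immediate error.
    public static void checkOutputDir(Set<String> existingPaths, String outDir) {
        if (existingPaths.contains(outDir)) {
            throw new IllegalStateException("Output directory " + outDir + " already exists");
        }
    }

    public static void main(String[] args) {
        // getFileInfo returned null above, so the path is absent and the check passes
        checkOutputDir(Collections.emptySet(), "/user/admin/out/555");
        System.out.println("output dir ok");
    }
}
```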
Get the input split information:
int maps = writeSplits(context, submitJobDir);
within which:
RPC request: ClientProtocol.getFileInfo(/user/admin/in/yellow.txt)
Returns: path="hdfs://server1:9000/user/admin/in/yellow.txt", length=201000000, isdir=false, block_replication=3, blocksize=67108864, permission=rw-r--r--, owner=Admin, group=supergroup
RPC request: ClientProtocol.getBlockLocations(/user/admin/in/yellow.txt, 0, 201000000)
Returns: 3 BlockLocations
offset={0}, length={67108864}, hosts={server3, server2}, …
offset={67108864}, length={67108864}, hosts={server3, server2}, …
offset={134217728}, length={66782272}, hosts={server3, server2}, names={[10.1.1.102:50010, 10.1.1.103:50010]}, topologyPaths={[/default-rack/10.1.1.103:50010, /default-rack/10.1.1.102:50010]}
The final split information is 3 FileSplits:
FileSplit: file={hdfs://server1:9000/user/admin/in/yellow.txt}, hosts={[server3, server2]}, length={67108864}, start={0}
FileSplit: file={hdfs://server1:9000/user/admin/in/yellow.txt}, hosts={[server3, server2]}, length={67108864}, start={67108864}
FileSplit: file={hdfs://server1:9000/user/admin/in/yellow.txt}, hosts={[server3, server2]}, length={66782272}, start={134217728}
The map count is 3:
jobCopy.setNumMapTasks(maps);
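The split arithmetic can be checked directly: with a 201,000,000-byte file and a 67,108,864-byte (64 MB) block size, the default FileInputFormat logic carves off one block-sized split while more than SPLIT_SLOP (1.1) blocks remain, then emits the remainder. A simplified single-file version:

```java
import java.util.ArrayList;
import java.util.List;

public class SplitSketch {
    private static final double SPLIT_SLOP = 1.1;  // same constant as FileInputFormat

    // Simplified FileInputFormat.getSplits for one file: block-sized splits
    // while more than SPLIT_SLOP blocks remain, then the tail as a final split.
    public static List<long[]> getSplits(long fileLength, long splitSize) {
        List<long[]> splits = new ArrayList<>();  // each entry: {start, length}
        long remaining = fileLength;
        while (((double) remaining) / splitSize > SPLIT_SLOP) {
            splits.add(new long[] { fileLength - remaining, splitSize });
            remaining -= splitSize;
        }
        if (remaining != 0) {
            splits.add(new long[] { fileLength - remaining, remaining });
        }
        return splits;
    }

    public static void main(String[] args) {
        // yellow.txt: 201,000,000 bytes with 64 MB blocks -> the 3 splits above
        for (long[] s : getSplits(201000000L, 67108864L)) {
            System.out.println("start=" + s[0] + " length=" + s[1]);
        }
    }
}
```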
Create the split files:
JobSplitWriter.createSplitFiles(jobSubmitDir, conf,
RPC request: ClientProtocol.create(/tmp/hadoop-admin/mapred/staging/Admin/.staging/job_201404010621_0004/job.split, rwxr-xr-x, DFSClient_-1317833261, true, true, 3, 67108864)
Returns: an output stream
RPC request: ClientProtocol.setPermission(/tmp/hadoop-admin/mapred/staging/Admin/.staging/job_201404010621_0004/job.split, rw-r--r--)
Returns: null
RPC request: ClientProtocol.setReplication(/tmp/hadoop-admin/mapred/staging/Admin/.staging/job_201404010621_0004/job.split, 10)
Returns: true
RPC request: ClientProtocol.addBlock(/tmp/hadoop-admin/mapred/staging/Admin/.staging/job_201404010621_0004/job.split, DFSClient_-1317833261, null)
Returns: a LocatedBlock:
Block: blockid=-921399365952861077, generationStamp=2714, numBytes=0
BlockTokenIdentifier: Ident: , Pass: , Kind: , Service:
DatanodeInfo[]: [10.1.1.103:50010, 10.1.1.102:50010]
offset: 0
RPC request: ClientProtocol.complete(/tmp/hadoop-admin/mapred/staging/Admin/.staging/job_201404010621_0004/job.split, DFSClient_-1317833261)
Returns: true
The SplitMetaInfo records written are:
[data-size: 67108864, start-offset: 7, locations: [server3, server2]]
[data-size: 67108864, start-offset: 116, locations: [server3, server2]]
[data-size: 66782272, start-offset: 225, locations: [server3, server2]]
RPC request: ClientProtocol.create(/tmp/hadoop-admin/mapred/staging/Admin/.staging/job_201404010621_0004/job.splitmetainfo, rwxr-xr-x, DFSClient_-1317833261, true, true, 3, 67108864)
Returns: an output stream
RPC request: ClientProtocol.setPermission(/tmp/hadoop-admin/mapred/staging/Admin/.staging/job_201404010621_0004/job.splitmetainfo, rw-r--r--)
Returns: null
RPC request: ClientProtocol.addBlock(/tmp/hadoop-admin/mapred/staging/Admin/.staging/job_201404010621_0004/job.splitmetainfo, DFSClient_-1317833261, null)
Returns: a LocatedBlock:
Block: blockid=789965327875207186, generationStamp=2715, numBytes=0
BlockTokenIdentifier: Ident: , Pass: , Kind: , Service:
DatanodeInfo[]: [10.1.1.103:50010, 10.1.1.102:50010]
offset: 0
RPC request: ClientProtocol.complete(/tmp/hadoop-admin/mapred/staging/Admin/.staging/job_201404010621_0004/job.splitmetainfo, DFSClient_-1317833261)
Returns: true
Set the access control:
RPC request: JobSubmissionProtocol.getQueueAdmins(default)
Returns: All users are allowed
Write the job file to the JobTracker's fs:
RPC request: ClientProtocol.create(/tmp/hadoop-admin/mapred/staging/Admin/.staging/job_201404010621_0004/job.xml, rwxr-xr-x, DFSClient_-1317833261, true, true, 3, 67108864)
Returns: an output stream
RPC request: ClientProtocol.setPermission(/tmp/hadoop-admin/mapred/staging/Admin/.staging/job_201404010621_0004/job.xml, rw-r--r--)
Returns: null
RPC request: ClientProtocol.addBlock(/tmp/hadoop-admin/mapred/staging/Admin/.staging/job_201404010621_0004/job.xml, DFSClient_-1317833261, null)
Returns: a LocatedBlock:
Block: blockid=-7725157033540829125, generationStamp=2716, numBytes=0
BlockTokenIdentifier: Ident: , Pass: , Kind: , Service:
DatanodeInfo[]: [10.1.1.103:50010, 10.1.1.102:50010]
offset: 0
RPC request: ClientProtocol.complete(/tmp/hadoop-admin/mapred/staging/Admin/.staging/job_201404010621_0004/job.xml, DFSClient_-1317833261)
Returns: true
At this point job.xml, containing all of the job configuration, has been written under "/tmp/hadoop-admin/mapred/staging/Admin/.staging/job_201404010621_0004/".
The files now in the HDFS directory "/tmp/hadoop-admin/mapred/staging/Admin/.staging/job_201404010621_0004/" are:
-rw-r--r--  job.jar
-rw-r--r--  job.split
-rw-r--r--  job.splitmetainfo
-rw-r--r--  job.xml
job.jar is the jar to run, job.split and job.splitmetainfo describe the input splits, and job.xml holds the job configuration.
Submit the job:
status = jobSubmitClient.submitJob(
RPC request: JobSubmissionProtocol.submitJob(job_201404010621_0004, hdfs://server1:9000/tmp/hadoop-admin/mapred/staging/Admin/.staging/job_201404010621_0004, org.apache.hadoop.security.Credentials@70677770)
Returns: JobStatus: setupProgress=0, mapProgress=0, reduceProgress=0, cleanupProgress=0, runState=4 (PREP), priority=NORMAL, ...
RPC request: JobSubmissionProtocol.getJobProfile(job_201404010621_0004)
Returns: JobProfile: jobFile=hdfs://server1:9000/tmp/hadoop-admin/mapred/staging/Admin/.staging/job_201404010621_0004/job.xml, jobID=job_201404010621_0004, name=WordCount, queue=default, url=http://server1:50030/jobdetails.jsp?jobid=job_201404010621_0004, user=Admin
Combining the JobStatus and JobProfile:
Job: job_201404010621_0004
file: hdfs://server1:9000/tmp/hadoop-admin/mapred/staging/admin/.staging/job_201404010621_0004/job.xml
tracking URL: http://server1:50030/jobdetails.jsp?jobid=job_201404010621_0004
map() completion: 0.0
reduce() completion: 0.0
Monitor the job status:
jobClient.monitorAndPrintJob(conf, info);
RPC request: JobSubmissionProtocol.getJobStatus(job_201404010621_0004)
Returns:
RPC request: JobSubmissionProtocol.getJobStatus(job_201404010621_0004)
Returns:
i.e. map 100% reduce 100%
Further JobSubmissionProtocol.getJobStatus(job_201404010621_0004) requests are sent repeatedly after this.
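monitorAndPrintJob is essentially a polling loop: each iteration is one getJobStatus RPC, progress is printed, and the loop exits once the job is complete. Its shape, against a hypothetical status supplier standing in for the real RPC proxy:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class MonitorSketch {
    // Minimal stand-in for JobStatus: just the two progress fields printed above.
    public static class Status {
        public final double map, reduce;
        public Status(double map, double reduce) { this.map = map; this.reduce = reduce; }
        public boolean isComplete() { return map >= 1.0 && reduce >= 1.0; }
    }

    // Polling loop in the shape of JobClient.monitorAndPrintJob: each
    // iteration stands for one getJobStatus RPC; returns the poll count.
    public static int monitor(Iterator<Status> statusRpc, List<String> log) {
        int polls = 0;
        while (true) {
            Status s = statusRpc.next();  // stands in for getJobStatus(jobId)
            polls++;
            log.add("map " + (int) (s.map * 100) + "% reduce " + (int) (s.reduce * 100) + "%");
            if (s.isComplete()) return polls;
        }
    }

    public static void main(String[] args) {
        List<Status> updates = new ArrayList<>();
        updates.add(new Status(0.0, 0.0));
        updates.add(new Status(1.0, 0.0));
        updates.add(new Status(1.0, 1.0));
        List<String> log = new ArrayList<>();
        System.out.println("polls=" + monitor(updates.iterator(), log));
    }
}
```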
RPC request: JobSubmissionProtocol.getTaskCompletionEvents(job_201404010621_0004, 0, 10)
Returns: [Task Id: attempt_201404010621_0004_m_000004_0, Status: SUCCEEDED, Task Id: attempt_201404010621_0004_m_000002_0, Status: SUCCEEDED, Task Id: attempt_201404010621_0004_m_000000_0, Status: SUCCEEDED, Task Id: attempt_201404010621_0004_m_000001_0, Status: SUCCEEDED, Task Id: attempt_201404010621_0004_m_000000_1, Status: KILLED, Task Id: attempt_201404010621_0004_r_000000_0, Status: SUCCEEDED, Task Id: attempt_201404010621_0004_r_000001_0, Status: SUCCEEDED, Task Id: attempt_201404010621_0004_m_000003_0, Status: SUCCEEDED]
RPC request: JobSubmissionProtocol.getJobCounters(job_201404010621_0004)
Returns: OW[class=class org.apache.hadoop.mapred.Counters, value=Counters: 29