Collecting Hadoop Job Execution Status Information

A recent project required collecting execution status information for Hadoop jobs. Here are the strategies I came up with:

1. Pull the needed information from the jobtracker.jsp page that Hadoop provides. The problem here is that the page reads the JobTracker out of an application-scoped attribute:

JobTracker tracker = (JobTracker) application.getAttribute("job.tracker");

and the Jetty server that serves this page is embedded inside Hadoop itself. In org.apache.hadoop.mapred.JobTracker:

InetSocketAddress infoSocAddr = NetUtils.createSocketAddr(
    conf.get(JT_HTTP_ADDRESS, "0.0.0.0:50030"));
infoServer = new HttpServer("job", infoBindAddress, tmpInfoPort,
    tmpInfoPort == 0, conf);
infoServer.setAttribute("job.tracker", this);

So to get at these statistics through the JSP page, you would either have to bypass the embedded Jetty server, or modify JobTracker to hand out a reference to infoServer and work from that in your own code. Either way this means patching Hadoop's core code, which is not very flexible.
2. Parse the JSP page with a script. Fetching it with wget http://localhost:50030/jobtracker.jsp shows output like:

-----------------------------------------------------------------------------------------------
<b>State:</b> RUNNING<br>
<b>Started:</b> Tue Dec 28 09:43:40 CST 2010<br>
<b>Version:</b> 0.21.0, 985326<br>
<b>Compiled:</b> Tue Aug 17 01:02:28 EDT 2010 by tomwhite from branches/branch-0.21<br>
<b>Identifier:</b> 201012280943<br>
-----------------------------------------------------------------------------------------------
All of these fields can easily be parsed out with Python's Beautiful Soup (http://www.crummy.com/software/BeautifulSoup/); a rough Java equivalent of the same idea is sketched below.
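The post's suggestion is Beautiful Soup; to stay in one language with the rest of the code here, this is a minimal Java sketch of the same scraping idea. The URL comes from the wget example above, while the class name and the regular expression are illustrative assumptions of mine, not anything Hadoop provides:

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch: fetch jobtracker.jsp and print every "<b>Label:</b> value" pair.
// JobTrackerScraper and the regex are illustrative, not part of Hadoop.
public class JobTrackerScraper {
    public static void main(String[] args) throws Exception {
        URL url = new URL("http://localhost:50030/jobtracker.jsp");
        StringBuilder html = new StringBuilder();
        BufferedReader in = new BufferedReader(
                new InputStreamReader(url.openStream(), "UTF-8"));
        for (String line; (line = in.readLine()) != null; ) {
            html.append(line).append('\n');
        }
        in.close();
        // Matches lines such as "<b>State:</b> RUNNING<br>"
        Matcher m = Pattern.compile("<b>([^<:]+):</b>\\s*([^<]+)").matcher(html);
        while (m.find()) {
            System.out.println(m.group(1).trim() + " = " + m.group(2).trim());
        }
    }
}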

3. Upgrade your Hadoop to 0.21.0. The Cluster class provides a rich set of APIs:
void cancelDelegationToken(org.apache.hadoop.security.token.Token<org.apache.hadoop.mapreduce.security.token.delegation.DelegationTokenIdentifier> token)
    Cancel a delegation token from the JobTracker.
void close()
    Close the Cluster.
TaskTrackerInfo[] getActiveTaskTrackers()
    Get all active trackers in the cluster.
Job[] getAllJobs()
    Get all the jobs in the cluster.
TaskTrackerInfo[] getBlackListedTaskTrackers()
    Get blacklisted trackers.
QueueInfo[] getChildQueues(String queueName)
    Returns immediate children of queueName.
ClusterMetrics getClusterStatus()
    Get current cluster status.
org.apache.hadoop.security.token.Token<org.apache.hadoop.mapreduce.security.token.delegation.DelegationTokenIdentifier> getDelegationToken(org.apache.hadoop.io.Text renewer)
    Get a delegation token for the user from the JobTracker.
org.apache.hadoop.fs.FileSystem getFileSystem()
    Get the file system where job-specific files are stored.
Job getJob(JobID jobId)
    Get the job corresponding to the jobId.
String getJobHistoryUrl(JobID jobId)
    Get the job history file path for a given job id.
State getJobTrackerState()
    Get the JobTracker's state.
QueueInfo getQueue(String name)
    Get queue information for the specified name.
QueueAclsInfo[] getQueueAclsForCurrentUser()
    Get the queue ACLs for the current user.
QueueInfo[] getQueues()
    Get all the queues in the cluster.
QueueInfo[] getRootQueues()
    Get the root-level queues.
org.apache.hadoop.fs.Path getStagingAreaDir()
    Grab the JobTracker's view of the staging directory path where job-specific files will be placed.
org.apache.hadoop.fs.Path getSystemDir()
    Grab the JobTracker system directory path where job-specific files will be placed.

For example, when we want to print out job information, all it takes is:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Cluster;
import org.apache.hadoop.mapreduce.Job;

Configuration conf = new Configuration();
Cluster cluster = new Cluster(conf);
Job[] jobs = cluster.getAllJobs(); // every job known to the JobTracker
if (jobs != null) {
    for (Job job : jobs) {
        System.out.println(job.getJobID());
        System.out.println(job.getJobName());
        System.out.println(job.getStartTime());
        System.out.println(job.getFinishTime());
    }
}
:-)...
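The same Cluster handle also gives cluster-wide counters through the getClusterStatus() entry in the table above. A minimal sketch, assuming the 0.21 ClusterMetrics getters getRunningMaps(), getRunningReduces(), and getTaskTrackerCount(); the class name is mine:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Cluster;
import org.apache.hadoop.mapreduce.ClusterMetrics;

public class ClusterStatusPrinter {
    public static void main(String[] args) throws Exception {
        Cluster cluster = new Cluster(new Configuration());
        // ClusterMetrics is the return type of getClusterStatus() above;
        // the three getters below are assumed from the 0.21 javadoc
        ClusterMetrics metrics = cluster.getClusterStatus();
        System.out.println("Running maps:    " + metrics.getRunningMaps());
        System.out.println("Running reduces: " + metrics.getRunningReduces());
        System.out.println("Task trackers:   " + metrics.getTaskTrackerCount());
        cluster.close();
    }
}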

Klose: we should keep moving forward together with Hadoop.