There are generally two ways to view a Task's logs in a cluster:
1. Through the WebConsole that Hadoop provides, tracking the task directly in the browser;
2. On the cluster node that actually ran the task, reading the log files directly. Each task child process spawned by a tasktracker produces three log files: syslog (the task's log4j output), stdout, and stderr. These files are stored under the userlogs subdirectory of %HADOOP_LOG_DIR%. The drawback of this method is that you first have to track down which node ran the task.
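For orientation, on a tasktracker node these files typically sit in a per-attempt directory like the sketch below. The exact nesting varies slightly across Hadoop 1.x versions (newer ones insert a job_<id> directory level), and the attempt id shown here is just an illustration:

%HADOOP_LOG_DIR%/userlogs/
    attempt_201304151829_6316_m_000000_0/
        syslog    (the task's log4j output)
        stdout    (the task's standard output)
        stderr    (the task's standard error)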
Below, we use JobClient together with several of its private methods (displayTaskLogs(), getTaskLogs(), getTaskLogURL(); parameters omitted here, see the code) to fetch the log information programmatically. The code is as follows:
package myTest;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.*;

import java.io.*;
import java.net.URL;
import java.net.URLConnection;

public class test {

    // Builds the tasktracker's tasklog servlet URL for one task attempt.
    static String getTaskLogURL(TaskAttemptID taskId, String baseUrl) {
        return (baseUrl + "/tasklog?plaintext=true&attemptid=" + taskId);
    }
    // The version of this method in JobClient has no Writer parameter;
    // it is added here so the caller can capture the log output.
    private static void displayTaskLogs(TaskAttemptID taskId, String baseUrl, Writer sw)
            throws IOException {
        // The tasktracker for a 'failed/killed' job might not be around...
        if (baseUrl != null) {
            // Construct the url for the tasklogs
            String taskLogUrl = getTaskLogURL(taskId, baseUrl);
            // Copy the task's stderr into the Writer
            // (use filter=stdout or filter=syslog to fetch the other log files)
            getTaskLogs(taskId, new URL(taskLogUrl + "&filter=stderr"), sw);
        }
    }
    // In JobClient this method takes an OutputStream rather than a Writer
    // and prints straight to the console.
    private static void getTaskLogs(TaskAttemptID taskId, URL taskLogUrl,
                                    Writer out) {
        try {
            URLConnection connection = taskLogUrl.openConnection();
            connection.setReadTimeout(1000000);
            connection.setConnectTimeout(1000000);
            BufferedReader input =
                new BufferedReader(new InputStreamReader(connection.getInputStream()));
            BufferedWriter output = new BufferedWriter(out);
            try {
                String logData = null;
                while ((logData = input.readLine()) != null) {
                    if (logData.length() > 0) {
                        output.write(taskId + ": " + logData + "\n");
                        output.flush();
                    }
                }
            } finally {
                input.close();
            }
        } catch (IOException ioe) {
            System.out.println("Error reading task output: " + ioe.getMessage());
        }
    }
    public static void main(String[] args) throws IOException, InterruptedException {
        Configuration conf = new Configuration();
        conf.addResource(new Path("conf/mapred-site.xml"));
        conf.addResource(new Path("conf/core-site.xml"));
        conf.addResource(new Path("conf/hdfs-site.xml"));

        // Print all properties loaded from the configuration files
        // for (Map.Entry<String, String> entry : conf) {
        //     System.out.println(entry.getKey() + "\t=\t" + entry.getValue());
        // }

        JobConf job = new JobConf(conf);
        // The constructor already calls init(job), so no separate init() is needed.
        JobClient jc = new JobClient(job);

        // job_201304151829_6316: jobtracker start-time identifier plus job number
        JobID jobIdNew = new JobID("201304151829", 6316);
        RunningJob runJob = jc.getJob(jobIdNew);
        if (runJob == null) {
            System.out.println("Job not found; it may already have been retired to history: " + jobIdNew);
            return;
        }

        StringWriter sw = new StringWriter();
        // Note: this returns only the first batch of completion events;
        // see the paging sketch after the listing for fetching all of them.
        TaskCompletionEvent[] events = runJob.getTaskCompletionEvents(0);
        for (TaskCompletionEvent event : events) {
            displayTaskLogs(event.getTaskAttemptId(), event.getTaskTrackerHttp(), sw);
        }
        System.out.println(sw.toString());
        // /**
        //  * mapProgress()/reduceProgress()
        //  * result: 1.0
        //  */
        // System.out.println(runJob.mapProgress());
        // System.out.println(runJob.reduceProgress());

        // /** getTrackingURL()
        //  * result:
        //  * http://baby6:35030/jobdetails.jsp?jobid=job_201304151829_5768
        //  */
        // System.out.println(runJob.getTrackingURL());

        // /** displayTasks()
        //  * result:
        //  * attempt_201304151829_5768_m_000000_0
        //  */
        // jc.displayTasks(jobIdNew, "map", "completed");

        // /**
        //  * Number of tasktrackers in the cluster
        //  */
        // System.out.println(jc.getClusterStatus().getTaskTrackers());

        // /**
        //  * Names of the live tracker nodes in the cluster
        //  */
        // Collection<String> c = jc.getClusterStatus(true).getActiveTrackerNames();
        // Iterator<String> it = c.iterator();
        // while (it.hasNext()) {
        //     System.out.println(it.next());
        // }

        // JobStatus[] jobs = jc.getAllJobs();
        // System.out.println(jobs.length);
    }
}
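One caveat with the loop above: in the old mapred API, RunningJob.getTaskCompletionEvents(startFrom) returns events in batches (the 1.x client fetches roughly ten at a time), so a single call only sees the first batch of a large job. A minimal paging sketch, reusing runJob, sw, and displayTaskLogs() from the listing; the termination condition (an empty array once events are exhausted) is my reading of the 1.x behavior:

        int eventCounter = 0;
        while (true) {
            // Each call fetches the next batch; an empty array means no more events.
            TaskCompletionEvent[] batch = runJob.getTaskCompletionEvents(eventCounter);
            if (batch.length == 0) {
                break;
            }
            eventCounter += batch.length;
            for (TaskCompletionEvent event : batch) {
                displayTaskLogs(event.getTaskAttemptId(), event.getTaskTrackerHttp(), sw);
            }
        }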
Note: this approach can only fetch log information for non-history jobs; once the job has become a history job, the result comes back empty.
By default a job is retired to history about 24 hours after completion; this retention window is configurable on the cluster (in Hadoop 1.x, via mapred.jobtracker.retirejob.interval).