MapReduce (MR) Counters


Let's start with an excerpt from the log of a MapReduce job and use it to study MR counters (the comments in the log explain the counters involved).

16/03/22 14:25:30 INFO mapreduce.Job: Counters: 49 // this job reported 49 counters
   File System Counters // file system counters
      FILE: Number of bytes read=235
      FILE: Number of bytes written=230421
      FILE: Number of read operations=0
      FILE: Number of large read operations=0
      FILE: Number of write operations=0
      HDFS: Number of bytes read=189
      HDFS: Number of bytes written=86
      HDFS: Number of read operations=6
      HDFS: Number of large read operations=0
      HDFS: Number of write operations=2
   Job Counters // job counters
      Launched map tasks=1 // 1 map task was launched
      Launched reduce tasks=1 // 1 reduce task was launched
      Data-local map tasks=1
      Total time spent by all maps in occupied slots (ms)=12118
      Total time spent by all reduces in occupied slots (ms)=11691
      Total time spent by all map tasks (ms)=12118
      Total time spent by all reduce tasks (ms)=11691
      Total vcore-seconds taken by all map tasks=12118
      Total vcore-seconds taken by all reduce tasks=11691
      Total megabyte-seconds taken by all map tasks=12408832
      Total megabyte-seconds taken by all reduce tasks=11971584
   Map-Reduce Framework // MapReduce framework counters
      Map input records=3
      Map output records=14
      Map output bytes=201
      Map output materialized bytes=235
      Input split bytes=100
      Combine input records=0
      Combine output records=0
      Reduce input groups=10
      Reduce shuffle bytes=235
      Reduce input records=14
      Reduce output records=10
      Spilled Records=28
      Shuffled Maps =1
      Failed Shuffles=0
      Merged Map outputs=1
      GC time elapsed (ms)=331
      CPU time spent (ms)=2820
      Physical memory (bytes) snapshot=306024448
      Virtual memory (bytes) snapshot=1690583040
      Total committed heap usage (bytes)=136122368
   Shuffle Errors // shuffle error counters
      BAD_ID=0
      CONNECTION=0
      IO_ERROR=0
      WRONG_LENGTH=0
      WRONG_MAP=0
      WRONG_REDUCE=0
   File Input Format Counters // file input format counters
      Bytes Read=89 // bytes the map task read from HDFS: 89 bytes in total
   File Output Format Counters // file output format counters
      Bytes Written=86 // bytes the reduce task wrote to HDFS: 86 bytes in total
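Several of the values in this log are related arithmetically, and checking those relations is a quick way to see what each counter measures. The sketch below uses only numbers copied from the log. The 1024 MB container size is an inference from the ratio between the time and memory counters, not something the log states, and the class and method names are purely illustrative.

```java
public class CounterArithmetic {
    // Values copied from the job log above.
    static final long MAP_TASK_MS = 12118L;
    static final long REDUCE_TASK_MS = 11691L;
    static final long MAP_MEMORY_TIME = 12408832L;     // "megabyte-seconds", map
    static final long REDUCE_MEMORY_TIME = 11971584L;  // "megabyte-seconds", reduce
    static final long MAP_OUTPUT_RECORDS = 14L;
    static final long REDUCE_INPUT_RECORDS = 14L;
    static final long SPILLED_RECORDS = 28L;
    static final long HDFS_BYTES_READ = 189L;
    static final long FILE_INPUT_BYTES_READ = 89L;
    static final long INPUT_SPLIT_BYTES = 100L;

    // Assumption: each container was allocated 1024 MB (inferred, not logged).
    static final long ASSUMED_CONTAINER_MB = 1024L;

    /** Memory-time product: task run time multiplied by allocated memory. */
    static long memoryTime(long taskMillis, long containerMb) {
        return taskMillis * containerMb;
    }

    public static void main(String[] args) {
        // Each line prints whether the relation holds for the logged values.
        System.out.println(memoryTime(MAP_TASK_MS, ASSUMED_CONTAINER_MB) == MAP_MEMORY_TIME);
        System.out.println(memoryTime(REDUCE_TASK_MS, ASSUMED_CONTAINER_MB) == REDUCE_MEMORY_TIME);
        System.out.println(MAP_OUTPUT_RECORDS + REDUCE_INPUT_RECORDS == SPILLED_RECORDS);
        System.out.println(FILE_INPUT_BYTES_READ + INPUT_SPLIT_BYTES == HDFS_BYTES_READ);
    }
}
```

All four relations hold for this log. Note that the memory figure equals the millisecond run time multiplied by 1024, which suggests that, despite the "megabyte-seconds" label, the value in this log is accumulated in megabyte-milliseconds.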

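Counters: a counter records the progress and status of a job, and its role is comparable to that of a log. We typically insert counters at chosen points in a program to track how the data or the progress changes, and the resulting values are more convenient to analyze than log output. The figure originally shown here was taken from Hadoop: The Definitive Guide (3rd edition), which describes counters in considerable detail.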


From the log above we can read off several of MR's built-in counters, which report a range of metrics. They include:

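  • Task counters (TaskCounter): collect information about tasks. A TaskCounter is maintained by the task it belongs to and is periodically sent to the tasktracker, which forwards it to the jobtracker, so the counters can be aggregated globally.
  • File system counters (File System Counters): bytes and operations read and written, reported per file system (FILE and HDFS in the log above).
  • Job counters (Job Counters): maintained by the jobtracker (or, on YARN, by the application master), so unlike all other counters, user-defined ones included, they do not need to be sent across the network. They are job-level statistics rather than values that change while a task runs; the number of map tasks launched is one example.
  • MapReduce framework counters (Map-Reduce Framework)
  • Shuffle error counters (Shuffle Errors)
  • File input format counters (File Input Format Counters)
  • File output format counters (File Output Format Counters)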
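The global aggregation described for task counters can be pictured as a per-name sum over the counter snapshots that each task reports. The following is a plain-Java simulation, not Hadoop code: the class CounterAggregation, the method aggregate, and the split of records across two map tasks are all made up for illustration (the job in the log ran a single map task).

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class CounterAggregation {

    /** Sums counter values that share a name across all task snapshots. */
    static Map<String, Long> aggregate(List<Map<String, Long>> taskSnapshots) {
        Map<String, Long> totals = new TreeMap<>();
        for (Map<String, Long> snapshot : taskSnapshots) {
            for (Map.Entry<String, Long> entry : snapshot.entrySet()) {
                // Start from the reported value, or add it to the running total.
                totals.merge(entry.getKey(), entry.getValue(), Long::sum);
            }
        }
        return totals;
    }

    public static void main(String[] args) {
        // Hypothetical snapshot reported by a first map task.
        Map<String, Long> mapTask0 = new HashMap<>();
        mapTask0.put("MAP_INPUT_RECORDS", 2L);
        mapTask0.put("MAP_OUTPUT_RECORDS", 9L);

        // Hypothetical snapshot reported by a second map task.
        Map<String, Long> mapTask1 = new HashMap<>();
        mapTask1.put("MAP_INPUT_RECORDS", 1L);
        mapTask1.put("MAP_OUTPUT_RECORDS", 5L);

        // The job-level view is the sum over both tasks.
        System.out.println(aggregate(Arrays.asList(mapTask0, mapTask1)));
    }
}
```

In this toy run the job-level totals come to 3 input records and 14 output records, whichever task produced them.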

Custom counters

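MR lets user code define counters whose values can be incremented in the map or reduce function. Counters are global: the framework aggregates them across all map and reduce tasks and produces a final total when the job finishes. The following program, which computes the maximum temperature, shows counters in use.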

package com.wbkit.cobub.mapreduce;

import java.io.IOException;

import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

// NcdcRecordParser is the record parser from the example code of
// "Hadoop: The Definitive Guide"; it is assumed to be available in this package.
public class MaxTemperatureWithCounters extends Configured implements Tool {

  // Enum-based counters: the enum's class name becomes the counter group.
  enum Temperature {
    MISSING,
    MALFORMED
  }

  @Override
  public int run(String[] args) throws Exception {
    // Assumption: args[0] is the input path and args[1] is the output path.
    if (args.length != 2) {
      System.err.printf("Usage: %s <input> <output>%n", getClass().getSimpleName());
      return -1;
    }

    // Use the configuration that ToolRunner populated from the command line.
    Job job = Job.getInstance(getConf(), "Max temperature with counters");
    job.setJarByClass(getClass());

    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));

    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);

    job.setMapperClass(MaxTemperatureMapperWithCounters.class);
    job.setCombinerClass(MaxTemperatureReducer.class);
    job.setReducerClass(MaxTemperatureReducer.class);

    return job.waitForCompletion(true) ? 0 : 1;
  }

  public static void main(String[] args) throws Exception {
    int exitCode = ToolRunner.run(new MaxTemperatureWithCounters(), args);
    System.exit(exitCode);
  }

  static class MaxTemperatureMapperWithCounters
          extends Mapper<LongWritable, Text, Text, IntWritable> {

    private final NcdcRecordParser parser = new NcdcRecordParser();

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {

      parser.parse(value);
      if (parser.isValidTemperature()) {
        int airTemperature = parser.getAirTemperature();
        context.write(new Text(parser.getYear()),
                new IntWritable(airTemperature));
      } else if (parser.isMalformedTemperature()) {
        // Count the bad record instead of failing the task.
        System.err.println("Ignoring possibly corrupt input: " + value);
        context.getCounter(Temperature.MALFORMED).increment(1);
      } else if (parser.isMissingTemperature()) {
        context.getCounter(Temperature.MISSING).increment(1);
      }

      // Dynamic counter: one counter per quality code seen in the input.
      context.getCounter("TemperatureQuality", parser.getQuality()).increment(1);
    }
  }

  static class MaxTemperatureReducer
          extends Reducer<Text, IntWritable, Text, IntWritable> {

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
      int maxValue = Integer.MIN_VALUE;
      for (IntWritable value : values) {
        maxValue = Math.max(maxValue, value.get());
      }
      // Emit the maximum for this key; without this the job writes no output.
      context.write(key, new IntWritable(maxValue));

      // Dynamic counter per key: counts reduce() calls for that key
      // (combiner calls included, since this class is also the combiner).
      context.getCounter("r", "MaxTemperatureReducer." + key.toString()).increment(1L);
    }
  }
}


Dynamic counters

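The fields of a Java enum must be fixed at compile time, so an enum cannot be used to create counters on the fly. For that case, getCounter also accepts a group name and a counter name as plain strings, as in the mapper: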

context.getCounter("TemperatureQuality", parser.getQuality()).increment(1);
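To make the contrast concrete, here is a small stand-alone sketch in plain Java (it does not use the Hadoop API). Enum counters live in a map whose keys are fixed at compile time, while string-named counters are created the first time a name is seen. The class DynamicCounterSketch, its methods, and the quality codes "1" and "9" are assumptions made for illustration.

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.TreeMap;

public class DynamicCounterSketch {

    /** Counter names known at compile time. */
    public enum Temperature { MISSING, MALFORMED }

    // Keys are limited to the enum constants declared above.
    private final Map<Temperature, Long> enumCounters = new EnumMap<>(Temperature.class);

    // Keys are built at run time from a group name and a counter name.
    private final Map<String, Long> dynamicCounters = new TreeMap<>();

    public void increment(Temperature counter, long amount) {
        enumCounters.merge(counter, amount, Long::sum);
    }

    public void increment(String group, String name, long amount) {
        // The counter comes into existence on its first increment.
        dynamicCounters.merge(group + "." + name, amount, Long::sum);
    }

    public long get(Temperature counter) {
        return enumCounters.getOrDefault(counter, 0L);
    }

    public long get(String group, String name) {
        return dynamicCounters.getOrDefault(group + "." + name, 0L);
    }

    public static void main(String[] args) {
        DynamicCounterSketch counters = new DynamicCounterSketch();

        // Enum counter: the name must already exist in the source code.
        counters.increment(Temperature.MISSING, 1);

        // Dynamic counters: names come from the data being processed.
        for (String quality : new String[] {"1", "1", "9"}) {
            counters.increment("TemperatureQuality", quality, 1);
        }

        System.out.println(counters.get(Temperature.MISSING));
        System.out.println(counters.get("TemperatureQuality", "1"));
        System.out.println(counters.get("TemperatureQuality", "9"));
    }
}
```

The trade-off is the usual one: enum names are checked by the compiler, whereas string names are flexible but a typo silently creates a new counter.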






