Counters are a mechanism the Hadoop framework provides for collecting statistics about a job, e.g. for quality control or application-level diagnostics. They let developers gather counts of a particular class of events, and for most events and components inside the Hadoop framework, reading a counter is far easier than digging through log files. The MapReduce framework ships with a set of built-in counters, and user-defined counters are also supported.
(1) Built-in counter groups

Group | Class
---|---
MapReduce task counters | org.apache.hadoop.mapreduce.TaskCounter
File system counters | org.apache.hadoop.mapreduce.FileSystemCounter
File input format counters | org.apache.hadoop.mapreduce.lib.input.FileInputFormatCounter
File output format counters | org.apache.hadoop.mapreduce.lib.output.FileOutputFormatCounter
Job counters | org.apache.hadoop.mapreduce.JobCounter
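After a job finishes, these groups can be walked programmatically. A minimal sketch, assuming the new `org.apache.hadoop.mapreduce` API and a completed `Job` object named `job` (it relies on `Counters` being iterable over `CounterGroup`, and each group over its `Counter`s):

```java
// Print every counter group and every counter of a finished job.
// `job` is assumed to be a completed org.apache.hadoop.mapreduce.Job.
Counters counters = job.getCounters();
for (CounterGroup group : counters) {
    System.out.println("Group: " + group.getDisplayName());
    for (Counter counter : group) {
        System.out.println("  " + counter.getDisplayName() + " = " + counter.getValue());
    }
}
```

This produces essentially the same listing that the job client prints at the end of a run (see the sample output at the bottom of this section).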
(2) MapReduce task counters
Task counters gather information about tasks as they run; their values are aggregated for the whole job.

Counter | Description
---|---
MAP_INPUT_RECORDS | Number of input records consumed by the map tasks
MAP_SKIPPED_RECORDS | Number of input records skipped by the map tasks
MAP_INPUT_BYTES | Number of bytes of uncompressed input consumed by the map tasks
SPLIT_RAW_BYTES | Number of bytes of input-split metadata read by the map tasks
MAP_OUTPUT_RECORDS | Number of records emitted by the map tasks
MAP_OUTPUT_BYTES | Number of bytes of uncompressed output produced by the map tasks
MAP_OUTPUT_MATERIALIZED_BYTES | Number of bytes of map output actually written to disk
COMBINE_INPUT_RECORDS | Number of records consumed by the combiner
COMBINE_OUTPUT_RECORDS | Number of records emitted by the combiner
REDUCE_INPUT_GROUPS | Number of distinct key groups consumed by the reduce tasks
REDUCE_INPUT_RECORDS | Number of records consumed by the reduce tasks
REDUCE_OUTPUT_RECORDS | Number of records emitted by the reduce tasks
REDUCE_SKIPPED_RECORDS | Number of input records skipped by the reduce tasks
REDUCE_SKIPPED_GROUPS | Number of key groups skipped by the reduce tasks
CPU_MILLISECONDS | Cumulative CPU time, in milliseconds
GC_TIME_MILLIS | Time spent in garbage collection, in milliseconds
SHUFFLED_MAPS | Number of map outputs transferred to reducers by the shuffle
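A single task counter can be read by its enum constant. A hedged sketch, again assuming a finished `Job` named `job`:

```java
// Look up one built-in task counter by enum.
// `job` is assumed to be a completed org.apache.hadoop.mapreduce.Job.
long inputRecords = job.getCounters()
        .findCounter(TaskCounter.MAP_INPUT_RECORDS)
        .getValue();
System.out.println("Map input records: " + inputRecords);
```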
(3) File system counters
File system counters are tracked separately for each file-system scheme the job touches (HDFS, the local FILE system, and so on).

Counter | Description
---|---
BYTES_READ | Total number of bytes read from the file system
BYTES_WRITTEN | Total number of bytes written to the file system
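Because these counters are kept per scheme, looking one up takes both the scheme name and the enum constant. As far as I can tell from the Hadoop 2.x API, `Counters` inherits a `findCounter(String scheme, FileSystemCounter key)` overload for this; a sketch under that assumption:

```java
// Read per-scheme file system counters of a finished job.
// `job` is assumed to be a completed org.apache.hadoop.mapreduce.Job.
Counters counters = job.getCounters();
long hdfsRead = counters.findCounter("HDFS", FileSystemCounter.BYTES_READ).getValue();
long localWritten = counters.findCounter("FILE", FileSystemCounter.BYTES_WRITTEN).getValue();
System.out.println("HDFS bytes read: " + hdfsRead);
System.out.println("Local bytes written: " + localWritten);
```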
(4) Job counters
Job counters are maintained by the application master; they measure job-level statistics rather than values produced while a task runs.

Counter | Description
---|---
TOTAL_LAUNCHED_MAPS | Number of map tasks launched (including speculative ones)
TOTAL_LAUNCHED_REDUCES | Number of reduce tasks launched (including speculative ones)
TOTAL_LAUNCHED_UBERTASKS | Number of uber tasks launched (small jobs run entirely inside the application master's JVM)
NUM_UBER_SUBREDUCES | Number of reduce tasks run inside uber tasks
NUM_FAILED_REDUCES | Number of failed reduce tasks
NUM_FAILED_MAPS | Number of failed map tasks
DATA_LOCAL_MAPS | Number of map tasks run on the same node as their input data
OTHER_LOCAL_MAPS | Number of map tasks with no locality to their input data (neither node-local nor rack-local)
SLOTS_MILLIS_REDUCES | Total time spent by reduce tasks in occupied slots, in milliseconds
SLOTS_MILLIS_MAPS | Total time spent by map tasks in occupied slots, in milliseconds
RACK_LOCAL_MAPS | Number of map tasks run on the same rack as (but not the same node as) their input data
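Counters of a finished job can also be queried from the command line with the `mapred job -counter` subcommand (the job id below is a placeholder):

```shell
# Query a single counter of a finished job.
# Arguments: <job-id> <group-name> <counter-name>
mapred job -counter job_1496000000000_0001 \
    org.apache.hadoop.mapreduce.JobCounter TOTAL_LAUNCHED_MAPS
```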
(5) Custom counters
The example below defines a custom counter that tallies how many records fall into each of three value ranges. It also demonstrates a dynamic counter, whose group and name are supplied as strings at runtime instead of being predeclared in an enum.
```java
/**
 * Created by 鸣宇淳 on 2017/5/23.
 */
public class MyCounter {
    public static enum PvSoltEnum {
        Solt_0_to_1000,
        Solt_1000_to_10000,
        Solt_more_10000
    }
}

// Excerpt from the driver class SortWCMapReduce:
static final String NUMGROUP = "NumGroup";
static final String STARTBYONE = "BigPV";

// Mapper class
public static class SortWCMapper extends
        Mapper<LongWritable, Text, MyDataTypeWritable, IntWritable> {
    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String lineValue = value.toString();
        String[] strs = lineValue.split(",");
        if (2 != strs.length) {
            return;
        }
        Integer keyInt = Integer.valueOf(strs[0]);
        Integer valInt = Integer.valueOf(strs[1]);
        // Custom (enum) counters: one slot per value range
        if (valInt.compareTo(1000) < 0) {
            context.getCounter(MyCounter.PvSoltEnum.Solt_0_to_1000).increment(1);
        } else if (valInt.compareTo(10000) < 0) {
            context.getCounter(MyCounter.PvSoltEnum.Solt_1000_to_10000).increment(1);
        } else {
            context.getCounter(MyCounter.PvSoltEnum.Solt_more_10000).increment(1);
        }
        // Dynamic counter: identified by a group name and a counter name
        if (valInt > 20000) {
            context.getCounter(NUMGROUP, STARTBYONE).increment(1);
        }
        MyDataTypeWritable mapOutputKey = new MyDataTypeWritable(keyInt, valInt);
        context.write(mapOutputKey, new IntWritable(mapOutputKey.getSecond()));
    }
}

public int run(String[] args) throws Exception {
    // Get the configuration
    Configuration configuration = this.getConf();
    // Create the job
    Job job = Job.getInstance(configuration, SortWCMapReduce.class.getSimpleName());
    // Set the jar via the driver class
    job.setJarByClass(SortWCMapReduce.class);
    // Input path
    Path inpath = new Path(args[0]);
    FileInputFormat.addInputPath(job, inpath);
    // Output path
    Path outpath = new Path(args[1]);
    FileOutputFormat.setOutputPath(job, outpath);
    job.setInputFormatClass(TextInputFormat.class);
    job.setMapperClass(SortWCMapper.class);
    job.setMapOutputKeyClass(MyDataTypeWritable.class);
    job.setMapOutputValueClass(IntWritable.class);
    job.setReducerClass(SortWCReducer.class);
    boolean isSuccess = job.waitForCompletion(true);
    Counters counters = job.getCounters();
    // Read the custom (enum) counter
    Counter myc = counters.findCounter(MyCounter.PvSoltEnum.Solt_0_to_1000);
    // Read the dynamic counter
    Counter c = counters.findCounter(NUMGROUP, STARTBYONE);
    System.out.println("Custom counter Solt_0_to_1000: " + myc.getValue());
    System.out.println("Dynamic counter NumGroup/BigPV: " + c.getValue());
    return isSuccess ? 0 : 1;
}
```
After the job completes, all counters are printed to the console:
```
17/05/24 06:36:28 INFO mapreduce.Job: Counters: 53
File System Counters
FILE: Number of bytes read=199133652
FILE: Number of bytes written=399356575
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=130421676
HDFS: Number of bytes written=213744
HDFS: Number of read operations=33
HDFS: Number of large read operations=0
HDFS: Number of write operations=20
Job Counters
Launched map tasks=1
Launched reduce tasks=10
Data-local map tasks=1
Total time spent by all maps in occupied slots (ms)=66692
Total time spent by all reduces in occupied slots (ms)=112790
Total time spent by all map tasks (ms)=66692
Total time spent by all reduce tasks (ms)=56395
Total vcore-seconds taken by all map tasks=66692
Total vcore-seconds taken by all reduce tasks=56395
Total megabyte-seconds taken by all map tasks=68292608
Total megabyte-seconds taken by all reduce tasks=115496960
Map-Reduce Framework
Map input records=14223828
Map output records=14223828
Map output bytes=170685936
Map output materialized bytes=199133652
Input split bytes=95
Combine input records=0
Combine output records=0
Reduce input groups=19289
Reduce shuffle bytes=199133652
Reduce input records=14223828
Reduce output records=19289
Spilled Records=28447656
Shuffled Maps =10
Failed Shuffles=0
Merged Map outputs=10
GC time elapsed (ms)=1117
CPU time spent (ms)=107380
Physical memory (bytes) snapshot=5327536128
Virtual memory (bytes) snapshot=31331090432
Total committed heap usage (bytes)=5032050688
NumGroup
BigPV=144
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
mapreduce.counter.MyCounter$PvSoltEnum
Solt_0_to_1000=14218978
Solt_1000_to_10000=4532
Solt_more_10000=317
File Input Format Counters
Bytes Read=130421581
File Output Format Counters
Bytes Written=213744
Custom counter Solt_0_to_1000: 14218978
Dynamic counter NumGroup/BigPV: 144
```
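The enum-style and dynamic counters used above differ only in how a counter is identified: by a compile-time enum constant, or by a group/name string pair chosen at runtime. Their aggregation semantics can be sketched in plain Java with no Hadoop dependency (every name here is illustrative, not part of any Hadoop API):

```java
import java.util.EnumMap;
import java.util.HashMap;
import java.util.Map;

public class CounterSketch {
    // Enum-style counters: the set of names is fixed at compile time.
    enum PvSlot { SLOT_0_TO_1000, SLOT_1000_TO_10000, SLOT_MORE_10000 }

    static final EnumMap<PvSlot, Long> ENUM_COUNTERS = new EnumMap<>(PvSlot.class);
    // Dynamic counters: group and name are plain strings picked at runtime.
    static final Map<String, Map<String, Long>> DYNAMIC_COUNTERS = new HashMap<>();

    // Mimics what the mapper in the example does per record.
    static void record(int v) {
        PvSlot slot = v < 1000 ? PvSlot.SLOT_0_TO_1000
                    : v < 10000 ? PvSlot.SLOT_1000_TO_10000
                    : PvSlot.SLOT_MORE_10000;
        ENUM_COUNTERS.merge(slot, 1L, Long::sum);
        if (v > 20000) {
            DYNAMIC_COUNTERS.computeIfAbsent("NumGroup", g -> new HashMap<>())
                            .merge("BigPV", 1L, Long::sum);
        }
    }

    public static void main(String[] args) {
        for (int v : new int[]{500, 1500, 25000, 30000}) {
            record(v);
        }
        System.out.println(ENUM_COUNTERS);
        System.out.println(DYNAMIC_COUNTERS);
    }
}
```

In a real job the framework merges each task's local counts into the job-wide totals; this sketch only shows the per-record increment logic.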