Counters in Hadoop are somewhat like logs: they report statistics about a job while it runs. In the WordCount example we ran earlier, the console output included the following (you can re-run the WordCount example to see it):
Counters: 38
File System Counters  # 10 counters
FILE: Number of bytes read=462
FILE: Number of bytes written=541399
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=38
HDFS: Number of bytes written=19
HDFS: Number of read operations=15
HDFS: Number of large read operations=0
HDFS: Number of write operations=6
Map-Reduce Framework  # 20 counters
Map input records=2
Map output records=4
Map output bytes=35
Map output materialized bytes=49
Input split bytes=109
Combine input records=0
Combine output records=0
Reduce input groups=3
Reduce shuffle bytes=49
Reduce input records=4
Reduce output records=3
Spilled Records=8
Shuffled Maps =1
Failed Shuffles=0
Merged Map outputs=1
GC time elapsed (ms)=59
CPU time spent (ms)=0
Physical memory (bytes) snapshot=0
Virtual memory (bytes) snapshot=0
Total committed heap usage (bytes)=242360320
Shuffle Errors  # 6 counters
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters  # 1 counter
Bytes Read=19
File Output Format Counters  # 1 counter
Bytes Written=19
As the log shows, there are 38 counters in total, divided into 5 groups; names such as File System Counters are group names, and the groups contain 10, 20, 6, 1, and 1 counters respectively.
We do not care about every one of these 38 counters. Below we focus on the ones that matter most.
I. A Closer Look at the Counters
1. File Input Format Counters
File Input Format Counters  # 1 counter
Bytes Read=19
This counter shows that a total of 19 bytes were read from the file in HDFS.
Recall the content of word.txt from earlier:
hello you
hello me
The letters add up to 5+3+5+2=15 bytes; adding the 2 spaces and the 2 newline characters (one ending each line) gives exactly 19 bytes.
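The arithmetic above can be verified with a few lines of plain Java. The exact file content, including the trailing newline, is an assumption inferred from the counter value:

```java
import java.nio.charset.StandardCharsets;

public class ByteCount {
    public static void main(String[] args) {
        // Assumed exact content of word.txt: two lines, each ending with '\n'
        String content = "hello you\nhello me\n";
        int bytes = content.getBytes(StandardCharsets.UTF_8).length;
        System.out.println(bytes); // prints 19, matching Bytes Read=19
    }
}
```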
2. Map-Reduce Framework
Map-Reduce Framework  # 20 counters
Map input records=2
Map output records=4
Map output bytes=35
Map output materialized bytes=49
Input split bytes=109
Combine input records=0
Combine output records=0
Reduce input groups=3
Reduce shuffle bytes=49
Reduce input records=4
Reduce output records=3
Spilled Records=8
Shuffled Maps =1
Failed Shuffles=0
Merged Map outputs=1
GC time elapsed (ms)=59
CPU time spent (ms)=0
Physical memory (bytes) snapshot=0
Virtual memory (bytes) snapshot=0
Total committed heap usage (bytes)=242360320
Map input records=2
The content of the input file is:
hello you
hello me
which is exactly 2 lines, i.e. 2 input records.
Map output records=4
In our mapper, every time a word is read we emit one key-value pair, so the output of the map task is:
<hello,1>
<you,1>
<hello,1>
<me,1>
exactly four records.
Reduce input records=4
The records output by map are exactly the records input to reduce, so this is also 4.
Reduce input groups=3
We will explain the concept of grouping (group) in detail later. Essentially, the mapper's output records are grouped by key, so records with the same key go into one group. After grouping we get:
<hello,{1,1}>
<you,{1}>
<me,{1}>
exactly 3 groups.
Reduce output records=3
The output of the WordCount example is:
hello 2
you 1
me 1
exactly 3 lines.
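The relationship between these counters can be illustrated with a small plain-Java sketch. This is only a simulation of the group-then-reduce steps, not Hadoop code:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ShuffleSketch {
    public static void main(String[] args) {
        // Map output: one (word, 1) pair per token -> Map output records=4
        List<Map.Entry<String, Integer>> mapOutput = List.of(
                Map.entry("hello", 1), Map.entry("you", 1),
                Map.entry("hello", 1), Map.entry("me", 1));

        // Grouping: values with the same key are collected into one group
        Map<String, List<Integer>> groups = new TreeMap<>();
        for (Map.Entry<String, Integer> e : mapOutput) {
            groups.computeIfAbsent(e.getKey(), k -> new ArrayList<>()).add(e.getValue());
        }
        System.out.println(groups.size()); // prints 3 -> Reduce input groups=3

        // Reduce: one output record per group -> Reduce output records=3
        for (Map.Entry<String, List<Integer>> g : groups.entrySet()) {
            int sum = 0;
            for (int v : g.getValue()) sum += v;
            System.out.println(g.getKey() + "\t" + sum);
        }
    }
}
```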
Combine input records=0, Combine output records=0
These belong to the combiner (local aggregation); we will explain the combiner in detail later.
II. Custom Counters
A counter is represented by a Counter object. Every counter belongs to a group: as long as the group name (groupName) is the same, the counters automatically belong to the same group. Each counter also has its own name (counterName) to distinguish it from the other counters in the same group.
A counter instance is obtained as follows:
Counter counter = context.getCounter(groupName, counterName);
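To make the two-level groupName/counterName idea concrete, here is a minimal plain-Java sketch of a counter registry. This is only an illustration of the naming scheme, not Hadoop's actual implementation; the class name CounterRegistry is made up:

```java
import java.util.HashMap;
import java.util.Map;

public class CounterRegistry {
    // groupName -> (counterName -> value), mirroring the two-level naming
    private final Map<String, Map<String, Long>> groups = new HashMap<>();

    // Analogous to context.getCounter(groupName, counterName).increment(amount)
    public void increment(String groupName, String counterName, long amount) {
        groups.computeIfAbsent(groupName, g -> new HashMap<>())
              .merge(counterName, amount, Long::sum);
    }

    public long get(String groupName, String counterName) {
        return groups.getOrDefault(groupName, Map.of()).getOrDefault(counterName, 0L);
    }

    public static void main(String[] args) {
        CounterRegistry registry = new CounterRegistry();
        registry.increment("Custom Group", "Sensitive words", 1);
        registry.increment("Custom Group", "Sensitive words", 1);
        System.out.println(registry.get("Custom Group", "Sensitive words")); // prints 2
    }
}
```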
For example, suppose we want to count sensitive words, i.e. how many times sensitive words appear in a piece of text, and we treat "hello" as a sensitive word. Building on the WordCount example, we can modify TokenizerMapper as follows:

public static class TokenizerMapper extends
        Mapper<LongWritable, Text, Text, IntWritable> {

    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // StringTokenizer is a java.util class that splits a string on whitespace
        StringTokenizer itr = new StringTokenizer(value.toString());
        // Custom counter
        String groupName = "Custom Group";
        String counterName = "Sensitive words";
        Counter counter = context.getCounter(groupName, counterName);
        // For every word encountered, emit (word, 1)
        while (itr.hasMoreTokens()) {
            String nextToken = itr.nextToken();
            // "hello" is our sensitive word: increment the counter on each occurrence
            if (nextToken.equals("hello")) {
                counter.increment(1);
            }
            word.set(nextToken);
            context.write(word, one);
        }
    }
}
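As a sanity check of the mapper logic above, the following standalone snippet (plain Java, no Hadoop required; the class name is made up) tokenizes the same two input lines the way the mapper does and counts the occurrences of "hello":

```java
import java.util.StringTokenizer;

public class SensitiveWordCheck {
    public static void main(String[] args) {
        // Same input lines as word.txt; the mapper sees one line per map() call
        String[] lines = {"hello you", "hello me"};
        long counter = 0;
        for (String line : lines) {
            StringTokenizer itr = new StringTokenizer(line);
            while (itr.hasMoreTokens()) {
                if (itr.nextToken().equals("hello")) {
                    counter++; // mirrors counter.increment(1) in the mapper
                }
            }
        }
        System.out.println(counter); // prints 2, matching Sensitive words=2
    }
}
```

After a real job finishes, the same value can also be read in the driver via job.getCounters().findCounter(groupName, counterName).getValue().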
Run the WordCount example again, and the console output now includes our custom counter:
Counters: 39
File System Counters
FILE: Number of bytes read=462
FILE: Number of bytes written=541399
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=38
HDFS: Number of bytes written=19
HDFS: Number of read operations=15
HDFS: Number of large read operations=0
HDFS: Number of write operations=6
Map-Reduce Framework
Map input records=2
Map output records=4
Map output bytes=35
Map output materialized bytes=49
Input split bytes=109
Combine input records=0
Combine output records=0
Reduce input groups=3
Reduce shuffle bytes=49
Reduce input records=4
Reduce output records=3
Spilled Records=8
Shuffled Maps =1
Failed Shuffles=0
Merged Map outputs=1
GC time elapsed (ms)=38
CPU time spent (ms)=0
Physical memory (bytes) snapshot=0
Virtual memory (bytes) snapshot=0
Total committed heap usage (bytes)=242360320
Custom Group  # our custom group name
Sensitive words=2  # the value of our custom counter is 2
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=19
File Output Format Counters
Bytes Written=19