MapReduce Counters

JOEL-T99

Article link: https://blog.csdn.net/weixin_47243236/article/details/121599405

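1. MapReduce Counters

1.1 What Is a Counter

Counters are generally used to record a job's execution progress and status information.

If you implemented the counting by hand, you would have to merge the partial results produced by many parallel threads or tasks yourself, which is tedious to code, so MapReduce's counters are normally used instead.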
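To make the bookkeeping concrete, here is a minimal plain-Java sketch of merging partial counts by hand. It uses no Hadoop classes; the class name `ManualCountMerge`, the counter names, and the sample values are all made up for illustration.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java model only: no Hadoop classes are used here.
class ManualCountMerge {

    // Sums the partial counts reported by each worker into one total per counter name.
    static Map<String, Long> merge(List<Map<String, Long>> partialCounts) {
        Map<String, Long> totals = new HashMap<>();
        for (Map<String, Long> partial : partialCounts) {
            for (Map.Entry<String, Long> entry : partial.entrySet()) {
                totals.merge(entry.getKey(), entry.getValue(), Long::sum);
            }
        }
        return totals;
    }

    public static void main(String[] args) {
        // Hypothetical partial results from two workers.
        Map<String, Long> workerA = Map.of("records", 600L, "badRecords", 2L);
        Map<String, Long> workerB = Map.of("records", 537L);
        System.out.println(merge(List.of(workerA, workerB)));
    }
}
```

With counters, this collection and summation is handled by the framework, and the task code only has to call increment.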

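In practice, counters can record how many times a particular stage of a job runs or how much data it handles, and the values serve as a basis for comparison before and after an optimization.

1.2 Types of Counters

🌟 Built-in counters:
Built-in counters describe the various metrics of a job.

🌟 Custom counters:

Users can define their own counters to meet specific needs, either by declaring them in an enum or by passing a group name and a counter name to context.getCounter.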

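2. MapReduce Built-in Counters

Built-in counters are divided into several groups, and each group contains several counters.

Group	Class	Description
MapReduce task counters	`mapreduce.TaskCounter`	Statistics about individual tasks
File system counters	`mapreduce.FileSystemCounter`	Bytes read and written by tasks
FileInputFormat counters	`mapreduce.lib.input.FileInputFormatCounter`	Bytes read
FileOutputFormat counters	`mapreduce.lib.output.FileOutputFormatCounter`	Bytes written
Job counters	`mapreduce.JobCounter`	Statistics about the job as a whole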

2.1 MapReduce Task Counters

Name	Description
map input records	Records read by map
map input bytes	Bytes read by map
map skipped records	Records skipped by map
map output records	Records emitted by map
map output bytes	Bytes emitted by map
split raw bytes	Raw bytes of the input splits
map output materialized bytes	Map output bytes actually materialized
combine input records	Records read by the combiner
combine output records	Records emitted by the combiner
reduce input groups	Key groups read by reduce
reduce input records	Records read by reduce
reduce output records	Records emitted by reduce
reduce skipped groups	Key groups skipped by reduce
reduce skipped records	Records skipped by reduce
reduce shuffle bytes	Bytes of map output shuffled to reduce
spilled records	Records spilled to disk
cpu milliseconds	CPU time in milliseconds
physical memory bytes	Physical memory in bytes
virtual memory bytes	Virtual memory in bytes
committed heap bytes	Committed heap memory in bytes
gc time millis	Time spent in garbage collection (ms)
shuffled maps	Map outputs transferred by the shuffle
failed shuffle	Failed shuffles
merged map outputs	Map outputs that were merged
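As an illustration of how these values can back a before/after comparison, the following plain-Java sketch derives two simple ratios from task counter values. The class and method names are hypothetical, the numbers in `main` are made up, and nothing here reads counters from a real job.

```java
// Plain-Java arithmetic on counter values; the numbers in main are made up for illustration.
class TaskCounterRatios {

    // Fraction of records the combiner removed: 1 - (combine output records / combine input records).
    static double combinerReduction(long combineInputRecords, long combineOutputRecords) {
        if (combineInputRecords == 0) {
            return 0.0; // no combiner input, so nothing was removed
        }
        return 1.0 - (double) combineOutputRecords / combineInputRecords;
    }

    // Average size of one map output record: map output bytes / map output records.
    static double avgMapOutputRecordBytes(long mapOutputBytes, long mapOutputRecords) {
        if (mapOutputRecords == 0) {
            return 0.0;
        }
        return (double) mapOutputBytes / mapOutputRecords;
    }

    public static void main(String[] args) {
        System.out.println("combiner reduction: " + combinerReduction(1000, 250));
        System.out.println("avg record bytes:   " + avgMapOutputRecordBytes(2048, 512));
    }
}
```

Comparing such ratios across two runs of the same job is one way to judge whether a change, for example adding a combiner, had the intended effect.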

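2.2 File System Counters

Name	Description
bytes read	Bytes read from the file system
bytes written	Bytes written to the file system

2.3 FileInputFormat Counters

Name	Description
bytes read	Bytes read

2.4 FileOutputFormat Counters

Name	Description
bytes written	Bytes written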

2.5 Job Counters

Name	Description
total launched maps	Map tasks launched
total launched reduces	Reduce tasks launched
total launched ubertasks	Uber tasks launched
num uber submaps	Map tasks run inside uber tasks
num uber subreduces	Reduce tasks run inside uber tasks
num failed maps	Failed map tasks
num failed reduces	Failed reduce tasks
num failed ubertasks	Failed uber tasks
data local maps	Data-local map tasks
rack local maps	Rack-local map tasks
other local maps	Map tasks with any other locality
slots millis maps	Total run time of map tasks (ms)
slots millis reduces	Total run time of reduce tasks (ms)
fallow slots millis maps	Total time map tasks waited after a slot was reserved (ms)
fallow slots millis reduces	Total time reduce tasks waited after a slot was reserved (ms)
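The locality counters are usually read together. As a small sketch, assuming the three locality values have already been obtained from a finished job, the share of data-local map tasks can be computed like this (the class name and sample numbers are hypothetical):

```java
// Plain-Java arithmetic on job counter values; the sample numbers in main are made up.
class MapLocalityShare {

    // Share of data-local map tasks among all map tasks with a recorded locality.
    static double dataLocalShare(long dataLocalMaps, long rackLocalMaps, long otherLocalMaps) {
        long total = dataLocalMaps + rackLocalMaps + otherLocalMaps;
        if (total == 0) {
            return 0.0; // no locality information recorded
        }
        return (double) dataLocalMaps / total;
    }

    public static void main(String[] args) {
        System.out.printf("data-local share: %.2f%n", dataLocalShare(45, 4, 1));
    }
}
```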

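3. Custom Counters

MapReduce offers two ways to create job-wide counters directly; in both cases the returned counter is accumulated by calling Counter.increment(long).

3.1 A Look at the Source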

  /**
   * Get the counter for the given counterName.
   * @param counterName the counter name
   * @return the counter for the given counterName
   */
  public Counter getCounter(Enum<?> counterName);

  /**
   * Get the counter for the given groupName and counterName.
   */
  public Counter getCounter(String groupName, String counterName);
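To show how the two lookups relate, here is a plain-Java model of a counter registry. It is not Hadoop's implementation: the class `CounterRegistryModel`, the enum `Quality`, and the rule that the enum form resolves to the enum's class name as the group and the constant's name as the counter are assumptions made for this sketch.

```java
import java.util.Map;
import java.util.TreeMap;

// Plain-Java model of the two lookups; this is not Hadoop's implementation.
class CounterRegistryModel {

    // Hypothetical enum used only for this sketch.
    enum Quality { MALFORMED }

    // group name -> (counter name -> value)
    private final Map<String, Map<String, Long>> groups = new TreeMap<>();

    // String form: the group and the counter are addressed by name and created on first use.
    void increment(String groupName, String counterName, long amount) {
        groups.computeIfAbsent(groupName, g -> new TreeMap<>())
              .merge(counterName, amount, Long::sum);
    }

    // Enum form: assumed here to map to (enum class name, constant name).
    void increment(Enum<?> counter, long amount) {
        increment(counter.getDeclaringClass().getName(), counter.name(), amount);
    }

    // Returns the current value, or 0 if the counter was never incremented.
    long value(String groupName, String counterName) {
        return groups.getOrDefault(groupName, Map.of()).getOrDefault(counterName, 0L);
    }

    public static void main(String[] args) {
        CounterRegistryModel registry = new CounterRegistryModel();
        registry.increment(Quality.MALFORMED, 1);
        registry.increment("stats", "visits", 5);
        System.out.println(registry.value(Quality.class.getName(), "MALFORMED"));
        System.out.println(registry.value("stats", "visits"));
    }
}
```

In this model the enum form is only a typed shortcut for the string form, which is why both styles can report into the same kind of group listing.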

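3.2 Declaring Counters with an Enum

Passing an enum constant to getCounter gives you a counter.

Goal: count the total number of IPs and the number of IPs that start with 192.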

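// Every record counts toward the total number of IPs.
Counter counter1 = context.getCounter(IpCounterEnum.IP_Quantity_Statistics);
counter1.increment(1);
// Only keys that start with "192" count toward the second counter.
Counter counter2 = context.getCounter(IpCounterEnum.IP_Start_With_192);
if (key.toString().startsWith("192")){
    counter2.increment(1);
}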

📤 Output:

CustomCounter.IpCounterEnum
		IP_Quantity_Statistics=1137
		IP_Start_With_192=32
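The snippet above uses `IpCounterEnum` without showing its declaration. The following plain-Java stand-in includes an assumed declaration of that enum and replays the same counting logic over a few made-up keys; an `EnumMap` plays the role of the job's counters, and no Hadoop classes are involved.

```java
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

// Plain-Java stand-in for the counting logic; an EnumMap plays the role of the job's counters.
class IpCountSketch {

    // Assumed declaration: the source uses these constants but does not show the enum itself.
    enum IpCounterEnum { IP_Quantity_Statistics, IP_Start_With_192 }

    static Map<IpCounterEnum, Long> count(List<String> ips) {
        Map<IpCounterEnum, Long> counters = new EnumMap<>(IpCounterEnum.class);
        for (IpCounterEnum c : IpCounterEnum.values()) {
            counters.put(c, 0L); // start every counter at zero
        }
        for (String ip : ips) {
            // Every key counts toward the total.
            counters.merge(IpCounterEnum.IP_Quantity_Statistics, 1L, Long::sum);
            // Only keys starting with "192" count toward the second counter.
            if (ip.startsWith("192")) {
                counters.merge(IpCounterEnum.IP_Start_With_192, 1L, Long::sum);
            }
        }
        return counters;
    }

    public static void main(String[] args) {
        // Made-up sample keys.
        System.out.println(count(List.of("192.168.1.10", "10.0.0.7", "192.0.2.44")));
    }
}
```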

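3.3 Declaring Counters with Custom Names

Passing a custom group name and counter name to getCounter also gives you a counter.

Goal: count the total number of IPs and the number of IPs that start with 192.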

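// Group "数量统计" (count statistics), counter "访问量" (visits): incremented for every record.
Counter counter1 = context.getCounter("数量统计", "访问量");
counter1.increment(1);
// Counter "以192开头的IP" (IPs starting with 192): incremented only for matching keys.
Counter counter2 = context.getCounter("数量统计", "以192开头的IP");
if (key.toString().startsWith("192")) {
    counter2.increment(1);
}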

📤 Output:

数量统计
		以192开头的IP=32
		访问量=1137
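One caveat of string names is that nothing checks their spelling: a mistyped name simply becomes another counter. The plain-Java sketch below models that behaviour with a map that creates entries on first use; the class name and the counter names `visits` and `vists` are made up, and no Hadoop classes are used.

```java
import java.util.Map;
import java.util.TreeMap;

// Plain-Java model of a counter group whose counters are created on first use.
class StringNameTypoSketch {

    // Increments one counter per given name and returns the resulting group.
    static Map<String, Long> run(String... counterNames) {
        Map<String, Long> group = new TreeMap<>();
        for (String name : counterNames) {
            group.merge(name, 1L, Long::sum); // an unknown name creates a new counter
        }
        return group;
    }

    public static void main(String[] args) {
        // The third name is a deliberate typo and ends up as its own counter.
        System.out.println(run("visits", "visits", "vists"));
    }
}
```

With an enum, the same typo would be rejected by the compiler.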

4. Final Notes

The enum-based approach is the recommended way to collect these statistics!

❤️ END ❤️
