Hadoop: Understanding NLineInputFormat

Author: 逸辰杳

Category: Hadoop. Tags: hadoop, FileInputFormat, FileInputSplit

Article link: https://blog.csdn.net/yhyr_ycy/article/details/52022531

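In Hadoop, the base class for file-based input is FileInputFormat<K,V>. Its subclasses include the following five: CombineFileInputFormat<K,V>, TextInputFormat, KeyValueTextInputFormat, NLineInputFormat, and SequenceFileInputFormat<K,V>. Of these, TextInputFormat is the default: if a job does not set an input format, TextInputFormat is used. With TextInputFormat, every line of the input text becomes one record, with the byte offset of the line within the file as the key and the content of the line as the value. This article covers another subclass of FileInputFormat: NLineInputFormat.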
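To make the key/value layout concrete, here is a minimal standalone sketch of how lines map to (byte offset, line content) pairs. It does not use the Hadoop API; the class and method names (`LineRecordSketch`, `toRecords`) are hypothetical, and it assumes UTF-8 text with `\n` line endings.

```java
import java.nio.charset.StandardCharsets;
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Illustration only: mimics the (offset, line) records described above.
// This is not Hadoop code; names are made up for the example.
final class LineRecordSketch {

    // Turns a block of text into (byte offset of line start, line content) pairs.
    static List<Map.Entry<Long, String>> toRecords(String text) {
        List<Map.Entry<Long, String>> records = new ArrayList<>();
        long offset = 0L; // byte offset of the current line within the text
        int start = 0;    // char index where the current line starts
        while (start < text.length()) {
            int newline = text.indexOf('\n', start);
            int end = (newline < 0) ? text.length() : newline;
            String line = text.substring(start, end);
            records.add(new SimpleImmutableEntry<>(offset, line));
            // Advance by the line's byte length, plus one byte for '\n' if present.
            offset += line.getBytes(StandardCharsets.UTF_8).length + (newline < 0 ? 0 : 1);
            start = (newline < 0) ? text.length() : newline + 1;
        }
        return records;
    }

    public static void main(String[] args) {
        for (Map.Entry<Long, String> record : toRecords("1.5\n2.25\n3.0\n")) {
            System.out.println(record.getKey() + "\t" + record.getValue());
        }
    }
}
```

For the three-line input in `main`, the keys are 0, 4, and 9: each key is the number of bytes that precede the line, newline characters included.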

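NLineInputFormat is an input format that Hadoop does not use by default; it has to be set explicitly. Unlike the default TextInputFormat, it builds each InputSplit from a specified number of lines, so in certain situations it can effectively improve how efficiently a job runs. For example: the data file holds one floating-point number per line, 100 lines in total; the number of Reducers is set to 5 and the number of lines per split to 20. Every 20 lines of the text then form one split, and each split is fed to its own Mapper. In Hadoop the number of Mappers depends on the number of splits, and for NLineInputFormat it is calculated as: Mappers = Splits = file line count / lines per split, rounded up when the division is not exact. The example above therefore yields 100 / 20 = 5 Mappers. The source code that implements this example is given below.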
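The split arithmetic can be checked with a small standalone sketch. This is not the author's job code and it does not use the Hadoop API; the class and method names (`NLineSplitSketch`, `countSplits`, `groupLines`) are made up for illustration, and the code only mimics how lines would be grouped N at a time.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only: models "N lines per split" with plain Java collections.
final class NLineSplitSketch {

    // Number of splits (and hence Mappers): ceil(totalLines / linesPerSplit).
    static int countSplits(int totalLines, int linesPerSplit) {
        if (totalLines < 0 || linesPerSplit <= 0) {
            throw new IllegalArgumentException("totalLines must be >= 0 and linesPerSplit > 0");
        }
        return (totalLines + linesPerSplit - 1) / linesPerSplit;
    }

    // Groups the lines into consecutive chunks of at most linesPerSplit lines.
    static <T> List<List<T>> groupLines(List<T> lines, int linesPerSplit) {
        if (linesPerSplit <= 0) {
            throw new IllegalArgumentException("linesPerSplit must be > 0");
        }
        List<List<T>> splits = new ArrayList<>();
        for (int start = 0; start < lines.size(); start += linesPerSplit) {
            int end = Math.min(start + linesPerSplit, lines.size());
            splits.add(new ArrayList<>(lines.subList(start, end)));
        }
        return splits;
    }

    public static void main(String[] args) {
        // 100 lines, one floating-point number per line, as in the example.
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < 100; i++) {
            lines.add(String.valueOf(i * 0.5));
        }
        List<List<String>> splits = groupLines(lines, 20);
        System.out.println("splits (mappers): " + splits.size());
        System.out.println("lines in first split: " + splits.get(0).size());
    }
}
```

In a real job, the corresponding settings would normally be made in the driver with `job.setInputFormatClass(NLineInputFormat.class)` and `NLineInputFormat.setNumLinesPerSplit(job, 20)`, both from `org.apache.hadoop.mapreduce.lib.input`. The number of Reducers is set separately and does not affect how many Mappers run.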
