Big Data Technology: Hadoop MapReduce (3): CombineTextInputFormat

Latest recommended article published on 2022-11-30 10:25:47

张反水

Article link: https://blog.csdn.net/zy13765287861/article/details/104685595

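This article walks through a hands-on example of using Hadoop's CombineTextInputFormat. It uses a word-count job to show the Mapper, Reducer, and Driver classes in full, and explains how CombineTextInputFormat improves efficiency when the input consists of many small files.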

3.1.5 CombineTextInputFormat Hands-On Example

Example: counting words

Preparation
Create an input folder in the HDFS root directory, then place four small files in it as input data, with sizes of 1.5M, 35M, 5.5M, and 6.5M.
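If no suitable test files are at hand, the input data for the preparation step could be generated locally first. The following is a minimal sketch, not part of the original tutorial: the class name SmallFileGenerator, the file names a.txt to d.txt, the local directory name, and the word list are all hypothetical, and the sizes are taken as listed above and interpreted as megabytes.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Random;

// Hypothetical helper (not from the original article): generates text files
// of roughly the requested sizes to serve as word-count input.
class SmallFileGenerator {

    // Assumed vocabulary; any whitespace-separated words work for a word count.
    private static final String[] WORDS =
            {"hadoop", "mapreduce", "hdfs", "yarn", "spark", "hive"};

    /**
     * Writes lines of 10 space-separated words until the file holds
     * at least targetBytes bytes, then returns the file's path.
     */
    static Path writeFile(Path dir, String name, long targetBytes) throws IOException {
        Files.createDirectories(dir);
        Path file = dir.resolve(name);
        Random random = new Random(42); // fixed seed so reruns produce identical files
        long written = 0;
        try (BufferedWriter out = Files.newBufferedWriter(file, StandardCharsets.UTF_8)) {
            while (written < targetBytes) {
                StringBuilder line = new StringBuilder();
                for (int i = 0; i < 10; i++) {
                    if (i > 0) {
                        line.append(' ');
                    }
                    line.append(WORDS[random.nextInt(WORDS.length)]);
                }
                line.append('\n');
                out.write(line.toString());
                written += line.length(); // ASCII only, so one char is one byte
            }
        }
        return file;
    }

    public static void main(String[] args) throws IOException {
        // Local output directory; defaults to ./input (an assumption)
        Path dir = Paths.get(args.length > 0 ? args[0] : "input");
        String[] names = {"a.txt", "b.txt", "c.txt", "d.txt"}; // hypothetical names
        double[] sizesMb = {1.5, 35, 5.5, 6.5};                // sizes as listed in the text
        for (int i = 0; i < names.length; i++) {
            long bytes = (long) (sizesMb[i] * 1024 * 1024);
            Path file = writeFile(dir, names[i], bytes);
            System.out.println(file + ": " + Files.size(file) + " bytes");
        }
    }
}
```

After running it, the files can be uploaded with the standard HDFS shell, for example `hdfs dfs -mkdir /input` followed by `hdfs dfs -put input/* /input`.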
Code

Mapper class

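import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

/**
 * @Author zhangyong
 * @Date 2020/3/4 16:35
 * @Version 1.0
 */
public class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

    // Reused output objects, so no new Writable is allocated per record
    private Text mapOutputKey = new Text();
    private IntWritable mapOutputValue = new IntWritable();

    @Override
    protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {

        // 1. Input arrives as (byte offset, line content) pairs; value holds one line of text
        String linevalue = value.toString();
        // Split the line on whitespace (StringTokenizer's default delimiters)
        StringTokenizer st = new StringTokenizer(linevalue);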
        while (st.hasMoreTokens()