A Super-Detailed MapReduce WordCount: Counting the Users with the Most Weibo Comments

Parse each line of JSON with fastjson

List<Map<String,Object>> parses = (List<Map<String,Object>>) JSON.parse(value.toString());
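
For reference, each input line is expected to be a JSON array of comment objects carrying a userId field. A hypothetical line (every field except userId is made up for illustration) might look like:

[{"userId":"1001","content":"nice post"},{"userId":"1002","content":"+1"},{"userId":"1001","content":"again"}]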

Extract the userId

for (Map<String, Object> pars : parses) {
    String new_value = (String) pars.get("userId");
    context.write(new IntWritable(1), new Text(new_value));
}

Complete Mapper code

package anu.mapereduce;

import com.alibaba.fastjson.JSON;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

import java.io.IOException;
import java.util.List;
import java.util.Map;

/**
 *  yucheng_gu
 */
public class MainMapper extends Mapper<LongWritable, Text, IntWritable, Text> {
    @Override
    protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        // Each input line is a JSON array; parse it into a list of comment objects.
        List<Map<String, Object>> parses = (List<Map<String, Object>>) JSON.parse(value.toString());
        for (Map<String, Object> pars : parses) {
            // Emit a constant key (1) with the userId, so every userId
            // arrives at the same reduce call for global counting.
            String new_value = (String) pars.get("userId");
            context.write(new IntWritable(1), new Text(new_value));
        }
    }
}
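
Note that this Mapper writes the constant key 1 for every record, so all userIds are funneled into a single reduce call, where both the counting and the sorting happen. For comparison, here is a sketch of the more conventional WordCount layout, which emits the userId itself as the key and lets the shuffle do the grouping (an alternative, not the code used in this post; imports as in the Mapper above):

// Alternative sketch: emit (userId, 1) so the framework groups by user.
public class UserIdMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        List<Map<String, Object>> parses =
                (List<Map<String, Object>>) JSON.parse(value.toString());
        for (Map<String, Object> pars : parses) {
            // One (userId, 1) pair per comment; the reducer only needs to sum.
            context.write(new Text((String) pars.get("userId")), ONE);
        }
    }
}

With that layout the reducer simply sums its values, and the driver would set setMapOutputKeyClass(Text.class) and setMapOutputValueClass(IntWritable.class) instead.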

In the reduce stage, count how many times each user appears

Map<String, Integer> navs = new HashMap<>();
for (Text value : values) {
    Integer integer = navs.get(value.toString());
    if (integer == null) {
        navs.put(value.toString(), 1);
    } else {
        navs.put(value.toString(), integer + 1);
    }
}
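
On Java 8+, the same get-or-initialize logic can be written in one call with Map.merge; a minimal equivalent of the loop above:

for (Text value : values) {
    navs.merge(value.toString(), 1, Integer::sum);
}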

Sort all users by their comment counts

List<String> llas = new ArrayList<>();
for (String keys_l : navs.keySet()) {
    Integer is_v = 0;
    String nname = "null";
    // Copy the users that have not been picked yet.
    Map<String, Integer> new_navs = new HashMap<>();
    for (String keyaa : navs.keySet()) {
        if (!llas.contains(keyaa)) {
            new_navs.put(keyaa, navs.get(keyaa));
        }
    }
    // Find the remaining user with the highest count.
    for (String keys : new_navs.keySet()) {
        if (new_navs.get(keys) > is_v) {
            is_v = new_navs.get(keys);
            nname = keys;
        }
    }
    llas.add(nname);
}
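
This block is effectively a selection sort: each pass copies the not-yet-picked users into new_navs, scans them for the maximum, and appends that user to llas, which costs O(n²) overall. A shorter alternative sketch that sorts the entries once by count in descending order (Java 8+, also needs import java.util.stream.Collectors):

List<String> llas = navs.entrySet().stream()
        .sorted(Map.Entry.<String, Integer>comparingByValue().reversed())
        .map(Map.Entry::getKey)
        .collect(Collectors.toList());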

Emit the results

for (String lla : llas) {
    context.write(new Text(lla), new IntWritable(navs.get(lla)));
}

Complete Reducer code

package anu.mapereduce;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

import java.io.IOException;
import java.util.*;

public class MainReduce extends Reducer<IntWritable, Text, Text, IntWritable> {

    @Override
    protected void reduce(IntWritable key, Iterable<Text> values, Context context) throws IOException, InterruptedException {
        // Count how many times each userId appears.
        Map<String, Integer> navs = new HashMap<>();
        for (Text value : values) {
            Integer integer = navs.get(value.toString());
            if (integer == null) {
                navs.put(value.toString(), 1);
            } else {
                navs.put(value.toString(), integer + 1);
            }
        }
        // Selection sort: repeatedly pick the not-yet-chosen user
        // with the highest count and append it to llas.
        List<String> llas = new ArrayList<>();
        for (String keys_l : navs.keySet()) {
            Integer is_v = 0;
            String nname = "null";
            Map<String, Integer> new_navs = new HashMap<>();
            for (String keyaa : navs.keySet()) {
                if (!llas.contains(keyaa)) {
                    new_navs.put(keyaa, navs.get(keyaa));
                }
            }
            for (String keys : new_navs.keySet()) {
                if (new_navs.get(keys) > is_v) {
                    is_v = new_navs.get(keys);
                    nname = keys;
                }
            }
            llas.add(nname);
        }
        // Emit users in descending order of comment count.
        for (String lla : llas) {
            context.write(new Text(lla), new IntWritable(navs.get(lla)));
        }
    }
}

For easier debugging, the job runs in local mode rather than on a cluster; on Windows this typically means pointing hadoop.home.dir at a local Hadoop directory that contains winutils.exe. You can look up the setup details on Baidu or elsewhere online.

Complete code of the WordCountRunner driver class

package anu.mapereduce;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

import java.io.IOException;

/**
 *  yucheng_gu
 */
public class WordCountRunner {
    public static void main(String[] args) throws InterruptedException, IOException, ClassNotFoundException {
        // Point hadoop.home.dir at the local Hadoop installation (needed for local mode on Windows).
        System.setProperty("hadoop.home.dir", "D:\\LocalServer\\hadoop-2.9.2");
        Configuration configuration = new Configuration();
        // Create a Job object; "MyWordCount" is the job name.
        Job myWordCount = Job.getInstance(configuration, "MyWordCount");
        // Configure the eight steps of the job.
        // Step 1: specify the input format and the source path.
        myWordCount.setInputFormatClass(TextInputFormat.class);
        //TextInputFormat.addInputPath(myWordCount, new Path(args[0]));
        // Step 2: specify the map class.
        myWordCount.setMapperClass(MainMapper.class);
        // Key type (k2) emitted by the map stage.
        myWordCount.setMapOutputKeyClass(IntWritable.class);
        // Value type (v2) emitted by the map stage.
        myWordCount.setMapOutputValueClass(Text.class);
        // Steps 3-6 use the defaults and need no configuration for now.
        // Step 7: specify the reduce class.
        myWordCount.setReducerClass(MainReduce.class);
        // Key type (k3) emitted by the reduce stage.
        myWordCount.setOutputKeyClass(Text.class);
        // Value type (v3) emitted by the reduce stage.
        myWordCount.setOutputValueClass(IntWritable.class);
        // Step 8: specify the output format.
        myWordCount.setOutputFormatClass(TextOutputFormat.class);
        // Input file and output directory (the output directory must not exist yet).
        FileInputFormat.setInputPaths(myWordCount,
                new Path("D:\\javaproject\\20210722_GOUP_11_GYC\\MapperReuceDemo01\\src\\main\\resources\\datas.json"));
        FileOutputFormat.setOutputPath(myWordCount,
                new Path("D:\\javaproject\\20210722_GOUP_11_GYC\\MapperReuceDemo01\\src\\main\\resources\\input"));
        // Wait for the job to finish and exit with its status.
        boolean b = myWordCount.waitForCompletion(true);
        System.exit(b ? 0 : 1);
    }
}
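
The commented-out addInputPath line hints at a more portable setup: pass the paths as program arguments instead of hardcoding them (the argument order here is an assumption):

// Hypothetical: read the input file and output directory from the command line.
FileInputFormat.setInputPaths(myWordCount, new Path(args[0]));
FileOutputFormat.setOutputPath(myWordCount, new Path(args[1]));

Also note that the output directory must not exist before the job runs, or Hadoop will refuse to start it.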

Output:
(screenshot of the job output, not reproduced here)
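
With the hypothetical input line shown earlier, the result file (part-r-00000) would contain tab-separated userId/count pairs in descending order of count, e.g.:

1001	2
1002	1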

This is my first blog post as a beginner, so please go easy on me!
