Hadoop MapReduce Application Example 1 (Computing an Average)

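1. Dataset and Program Requirements

  The dataset ramen-ratings.txt describes 2,580 instant-noodle products from around the world, with fields such as brand, country, packaging type, and rating. The task is to use MapReduce to count the ratings for each country and compute their average, producing one <country, average rating> pair per country.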

2. Writing the Source Code

2.1 Open IntelliJ IDEA and create a Maven project
2.2 The pom.xml file is as follows:
    <dependencies>
        <!-- https://mvnrepository.com/artifact/junit/junit -->
        <dependency>
            <groupId>junit</groupId>
            <artifactId>junit</artifactId>
            <version>4.12</version>
            <scope>test</scope>
        </dependency>
        <!-- https://mvnrepository.com/artifact/org.apache.mrunit/mrunit -->
        <dependency>
            <groupId>org.apache.mrunit</groupId>
            <artifactId>mrunit</artifactId>
            <version>1.1.0</version>
            <scope>test</scope>
        </dependency>
        <dependency>
            <groupId>org.apache.logging.log4j</groupId>
            <artifactId>log4j</artifactId>
            <version>2.5</version>
            <type>pom</type>
        </dependency>
        <!-- https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-hdfs -->
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-hdfs</artifactId>
            <version>3.0.0</version>
        </dependency>
        <!-- https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-common -->
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-common</artifactId>
            <version>3.0.0</version>
        </dependency>
        <!-- https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-mapreduce-client-core -->
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-mapreduce-client-core</artifactId>
            <version>3.0.0</version>
        </dependency>
        <!-- https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-mapreduce-client-common -->
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-mapreduce-client-common</artifactId>
            <version>3.0.0</version>
        </dependency>

    </dependencies>
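2.4 Writing the Map implementation
  1. Logic:
    Input: one line of data
    Processing: split the line on tab characters into an array, extract the country and the rating, and emit them as the key and the value respectively. Lines whose rating is "Unrated" are skipped.
    Output: <country, rating>
  2. Code: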
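import java.io.IOException;
import java.util.logging.Logger; // assumed: the original does not show which Logger it imports

import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class NoodlesMapper extends Mapper<LongWritable, Text, Text, DoubleWritable> {

    private static final Logger logger = Logger.getLogger(NoodlesMapper.class.getName());

    /**
     * @param key     input key (byte offset of the start of the line); its type must match generic type 1
     * @param value   input value (the text of the line); its type must match generic type 2
     * @param context job context used to emit the output pairs
     * @throws IOException
     * @throws InterruptedException
     */
    @Override
    protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {

        // Split the line into tab-separated fields
        String[] strs = value.toString().split("\t");
        // Skip malformed lines that have no rating column
        if (strs.length < 6) {
            return;
        }
        // Country
        String nation = strs[4];
        logger.info("nation:" + nation);
        // Drop invalid data: products without a rating
        String stars = strs[5].trim();
        if (stars.equals("Unrated")) {
            return;
        }
        // Parse as double; going through Float.parseFloat would add float rounding error
        double v = Double.parseDouble(stars);
        logger.info("stars ============>>> " + v);
        context.write(new Text(nation), new DoubleWritable(v));
    }
}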
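The parsing step can be tried out without a Hadoop cluster. The following is a minimal sketch in plain Java, assuming the same column positions as the mapper (index 4 for the country, index 5 for the rating). The class `RatingLineParser`, its `parse` method, and the sample lines are hypothetical and are not taken from ramen-ratings.txt.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Map;
import java.util.Optional;

// Hypothetical helper: mirrors the mapper's parsing logic without any Hadoop types.
final class RatingLineParser {
    // Column positions as used by the mapper: index 4 = country, index 5 = rating.
    private static final int COUNTRY_INDEX = 4;
    private static final int STARS_INDEX = 5;

    /** Returns <country, rating>, or empty when the line is too short, "Unrated", or non-numeric. */
    static Optional<Map.Entry<String, Double>> parse(String line) {
        String[] fields = line.split("\t");
        if (fields.length <= STARS_INDEX) {
            return Optional.empty();
        }
        String stars = fields[STARS_INDEX].trim();
        if (stars.equals("Unrated")) {
            return Optional.empty();
        }
        try {
            Double rating = Double.valueOf(stars);
            Map.Entry<String, Double> entry = new SimpleEntry<>(fields[COUNTRY_INDEX].trim(), rating);
            return Optional.of(entry);
        } catch (NumberFormatException e) {
            // Non-numeric rating text: treat the line as invalid.
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        // Made-up sample lines, used only to exercise the parser.
        System.out.println(parse("1\tBrandA\tSpicy Beef\tCup\tJapan\t3.75"));
        System.out.println(parse("2\tBrandB\tChicken\tPack\tIndia\tUnrated"));
    }
}
```

Returning an empty `Optional` for bad lines plays the same role as the early `return` in the mapper: the record simply produces no output pair.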
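2.5 Writing the Reduce implementation
  1. Logic:
    Input: <country, [all ratings for that country]>
    Processing: compute the average rating
    Output: <country, average rating>
  2. Code: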
import java.io.IOException;
import java.util.logging.Logger; // assumed: the original does not show which Logger it imports

import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class NoodlesReducer extends Reducer<Text, DoubleWritable, Text, DoubleWritable> {

    private static final Logger logger = Logger.getLogger(NoodlesReducer.class.getName());

    @Override
    protected void reduce(Text key, Iterable<DoubleWritable> values, Context context) throws IOException, InterruptedException {
        double sum = 0;
        int count = 0;
        for (DoubleWritable value : values) {
            sum += value.get(); // running total of the ratings
            count++;            // number of ratings for this country
        }
        double avg = sum / count;
        logger.info("avg = " + avg);
        context.write(key, new DoubleWritable(avg));
    }
}
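To see what the shuffle and reduce steps compute together, the following plain-Java sketch groups <country, rating> pairs by key and averages each group in memory. It only illustrates the arithmetic and is not how Hadoop executes the job; `AverageByKeySketch`, `pair`, and `averageByKey` are hypothetical names, and the sample values are made up.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory illustration of "group by country, then average".
final class AverageByKeySketch {

    /** Convenience factory for a <country, rating> pair. */
    static Map.Entry<String, Double> pair(String country, double rating) {
        return new SimpleEntry<>(country, rating);
    }

    /** Groups values by key (the shuffle step), then averages each group (the reduce step). */
    static Map<String, Double> averageByKey(List<Map.Entry<String, Double>> pairs) {
        Map<String, List<Double>> grouped = new TreeMap<>();
        for (Map.Entry<String, Double> p : pairs) {
            grouped.computeIfAbsent(p.getKey(), k -> new ArrayList<>()).add(p.getValue());
        }
        Map<String, Double> averages = new TreeMap<>();
        for (Map.Entry<String, List<Double>> group : grouped.entrySet()) {
            double sum = 0;
            for (double rating : group.getValue()) {
                sum += rating;
            }
            // Every group holds at least one rating, so the division is safe.
            averages.put(group.getKey(), sum / group.getValue().size());
        }
        return averages;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Double>> pairs = new ArrayList<>();
        pairs.add(pair("Japan", 4.0));
        pairs.add(pair("Japan", 3.0));
        pairs.add(pair("India", 5.0));
        System.out.println(averageByKey(pairs));
    }
}
```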
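2.6 Writing the Run implementation
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class NoodlesAVGRun extends Configured implements Tool {
    public static void main(String[] args) throws Exception {
        int res = ToolRunner.run(new NoodlesAVGRun(), args);
        System.exit(res);
    }

    @Override
    public int run(String[] args) throws Exception {
        //System.setProperty("HADOOP_USER_NAME","root"); // user name inside the virtual machine
        // Reuse the configuration prepared by ToolRunner so generic options (-D, -conf) take effect
        Configuration conf = getConf();
        conf.set("fs.defaultFS", "hdfs://192.168.137.150:9000");
        // Create the job
        Job job = Job.getInstance(conf, "NoodlesAVGRun");

        // Set the job's main class
        job.setJarByClass(NoodlesAVGRun.class);

        // Set the Map and Reduce classes
        job.setMapperClass(NoodlesMapper.class);
        job.setReducerClass(NoodlesReducer.class);

        // Input format: plain text file
        job.setInputFormatClass(TextInputFormat.class);
        //TextInputFormat.addInputPath(job, new Path(args[0]));
        TextInputFormat.addInputPath(job, new Path("/ramen-ratings.txt"));

        // Output format: plain text file, with a text key and a double value
        job.setOutputFormatClass(TextOutputFormat.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(DoubleWritable.class);
        //TextOutputFormat.setOutputPath(job, new Path(args[1]));
        TextOutputFormat.setOutputPath(job, new Path("/test/output"));

        // Run the MapReduce job and wait for it to finish
        boolean res = job.waitForCompletion(true);
        if (res) {
            System.out.println("Job succeeded");
            return 0;
        } else {
            return -1;
        }
    }
}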
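The driver hard-codes its input and output paths and keeps the `args[0]` / `args[1]` variants commented out. One possible way to support both is to read the paths from the command line and fall back to the hard-coded values when they are missing. The sketch below shows only that selection logic in plain Java; `JobPaths` and `fromArgs` are hypothetical names, and the defaults are the paths used in the driver.

```java
// Hypothetical helper: chooses the job's input and output paths from the command line,
// falling back to the paths hard-coded in the driver.
final class JobPaths {
    static final String DEFAULT_INPUT = "/ramen-ratings.txt";
    static final String DEFAULT_OUTPUT = "/test/output";

    final String input;
    final String output;

    private JobPaths(String input, String output) {
        this.input = input;
        this.output = output;
    }

    /** args[0] is the input path and args[1] the output path; both are optional. */
    static JobPaths fromArgs(String[] args) {
        String in = args.length > 0 ? args[0] : DEFAULT_INPUT;
        String out = args.length > 1 ? args[1] : DEFAULT_OUTPUT;
        return new JobPaths(in, out);
    }

    public static void main(String[] args) {
        JobPaths paths = fromArgs(args);
        System.out.println("input=" + paths.input + ", output=" + paths.output);
    }
}
```

Inside `run`, the resulting strings could then be wrapped in `new Path(...)` in place of the literals.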

Execution result: [screenshot of the job output]
