MapReduce代码编写--求性别人数、求总分、关联、map端的过滤、combiner预聚合

最新推荐文章于 2023-04-06 20:10:52 发布

赤兔胭脂小吕布

最新推荐文章于 2023-04-06 20:10:52 发布

阅读量653

点赞数

文章标签： mapreduce hadoop 大数据 java hdfs

本文链接：https://blog.csdn.net/m0_52602967/article/details/126144395

版权

这篇博客探讨了如何使用MapReduce进行大数据处理，包括从学生数据中统计性别数量、计算总分、实现数据关联、在Map阶段进行过滤操作以及利用Combiner进行预聚合，特别是强调了Combiner在Max、Min、Sum场景下的应用。

摘要由CSDN通过智能技术生成

主要基于数据

#students.txt

/*
1500100001,施笑槐,22,女,文科六班
1500100002,吕金鹏,24,男,文科六班
1500100003,单乐蕊,22,女,理科六班
1500100004,葛德曜,24,男,理科三班
1500100005,宣谷芹,22,女,理科五班
1500100006,边昂雄,21,男,理科二班
1500100007,尚孤风,23,女,文科六班
1500100008,符半双,22,女,理科六班
1500100009,沈德昌,21,男,理科一班
1500100010,羿彦昌,23,男,理科六班
1500100011,宰运华,21,男,理科三班
1500100012,梁易槐,21,女,理科一班
1500100013,逯君昊,24,男,文科二班
1500100014,羿旭炎,23,男,理科五班
1500100015,宦怀绿,21,女,理科一班
1500100016,潘访烟,23,女,文科一班
1500100017,高芷天,21,女,理科五班
1500100018,骆怜雪,21,女,文科六班
1500100019,娄曦之,24,男,理科三班
1500100020,杭振凯,23,男,理科四班
1500100021,连鸿晖,22,男,理科六班
1500100022,薄运珧,23,男,文科四班
1500100023,东鸿畴,23,男,理科二班
1500100024,湛慕卉,22,女,文科二班
1500100025,翁飞昂,22,男,文科四班
……
*/

#score.txt

/*
1500100001,1000001,98
1500100001,1000002,5
1500100001,1000003,137
1500100001,1000004,29
1500100001,1000005,85
1500100001,1000006,52
1500100002,1000001,139
1500100002,1000002,102
1500100002,1000003,44
1500100002,1000004,18
1500100002,1000005,46
1500100002,1000006,91
1500100003,1000001,48
1500100003,1000002,132
1500100003,1000003,41
1500100003,1000007,32
1500100003,1000008,7
1500100003,1000009,99
1500100004,1000001,147
1500100004,1000002,69
1500100004,1000003,37
1500100004,1000007,87
1500100004,1000008,21
1500100004,1000009,60
1500100005,1000001,105
……
*/

求性别人数

package com.shujia.MapReduce;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

import java.io.IOException;

public class Demo2GenderCnt {
    // Map端
    public static class MyMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        @Override
        protected void map(LongWritable key, Text value, Mapper<LongWritable, Text, Text, IntWritable>.Context context) throws IOException, InterruptedException {
            String[] splits = value.toString().split(",");
            String gender = splits[3];
            // 以性别作为key 1作为value
            context.write(new Text(gender), new IntWritable(1));
        }
    }

    // Reduce端
    public static class MyReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Reducer<Text, IntWritable, Text, IntWritable>.Context context) throws IOException, InterruptedException {
            int cnt = 0;
            // 统计性别人数
            for (IntWritable value : values) {
                cnt += value.get();
            }
            context.write(key, new IntWritable(cnt));
        }
    }

    // Driver端
    public static void main(String[] args) throws IOException, InterruptedException, ClassNotFoundException {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf);
        //这个名字在yarn的web界面上可以看到
        job.setJobName("Demo2GenderCnt");
        job.setJarByClass(Demo2GenderCnt.class);

        job.setMapperClass(MyMapper.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);

        job.setReducerClass(MyReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        // 配置输入输出路径
        FileInputFormat.addInputPath(job, new Path("/student/input"));
        // 输出路径不需要提前创建，如果该目录已存在则会报错
        // 通过HDFS的JavaAPI判断输出路径是否存在
        Path outPath

最低0.47元/天解锁文章

赤兔胭脂小吕布

关注

0
点赞
踩
8

收藏

觉得还不错? 一键收藏
1
评论
MapReduce代码编写--求性别人数、求总分、关联、map端的过滤、combiner预聚合

主要基于数据#students.txt/*1500100001,施笑槐,22,女,文科六班1500100002,吕金鹏,24,男,文科六班1500100003,单乐蕊,22,女,理科六班1500100004,葛德曜,24,男,理科三班1500100005,宣谷芹,22,女,理科五班1500100006,边昂雄,21,男,理科二班1500100007,尚孤风,23,女,文科六班...
复制链接

扫一扫