需求一:统计求和
统计每个手机号的上行数据包总和、下行数据包总和、上行总流量之和、下行总流量之和。分析:以手机号码作为 key 值,以上行数据包数、下行数据包数、上行总流量、下行总流量四个字段作为 value 值,然后以这个 key 和 value 作为 map 阶段的输出、reduce 阶段的输入。
1.1 自定义SumBean
/**
 * Writable value bean carrying the four flow counters for one phone number:
 * up/down packet counts and up/down total traffic.
 *
 * Serialization order in {@link #write} and {@link #readFields} must stay
 * identical (upFlow, downFlow, upCountFlow, downCountFlow) or records will
 * be corrupted on the wire.
 */
public class SumBean implements Writable {

    private Integer upFlow;        // upstream packet count
    private Integer downFlow;      // downstream packet count
    private Integer upCountFlow;   // upstream total traffic
    private Integer downCountFlow; // downstream total traffic

    public Integer getUpFlow() {
        return upFlow;
    }

    public Integer getDownFlow() {
        return downFlow;
    }

    public Integer getUpCountFlow() {
        return upCountFlow;
    }

    public Integer getDownCountFlow() {
        return downCountFlow;
    }

    public void setUpFlow(Integer upFlow) {
        this.upFlow = upFlow;
    }

    public void setDownFlow(Integer downFlow) {
        this.downFlow = downFlow;
    }

    public void setUpCountFlow(Integer upCountFlow) {
        this.upCountFlow = upCountFlow;
    }

    public void setDownCountFlow(Integer downCountFlow) {
        this.downCountFlow = downCountFlow;
    }

    /** Tab-separated counters — this is the literal text written by TextOutputFormat. */
    @Override
    public String toString() {
        return String.join("\t",
                String.valueOf(upFlow),
                String.valueOf(downFlow),
                String.valueOf(upCountFlow),
                String.valueOf(downCountFlow));
    }

    @Override
    public void write(DataOutput dataOutput) throws IOException {
        dataOutput.writeInt(upFlow);
        dataOutput.writeInt(downFlow);
        dataOutput.writeInt(upCountFlow);
        dataOutput.writeInt(downCountFlow);
    }

    @Override
    public void readFields(DataInput dataInput) throws IOException {
        this.upFlow = dataInput.readInt();
        this.downFlow = dataInput.readInt();
        this.upCountFlow = dataInput.readInt();
        this.downCountFlow = dataInput.readInt();
    }
}
1.2 Mapper
/**
 * Map phase: parses one tab-separated input line and emits
 * (phone number, SumBean) where SumBean carries the four flow counters.
 *
 * Field layout assumed from the original code: split[1] = phone number,
 * split[6..9] = upFlow, downFlow, upCountFlow, downCountFlow.
 */
public class SumMapper extends Mapper<LongWritable,Text,Text,SumBean> {
    @Override
    protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        String[] split = value.toString().split("\t");
        // BUG FIX: the original wrapped this body in `for (String s : split)` without
        // ever using `s`, emitting the SAME record once per field and inflating every
        // reduce-side sum by split.length. Each input line must be emitted exactly once.
        Text phone = new Text(split[1]);
        SumBean sumBean = new SumBean();
        sumBean.setUpFlow(Integer.parseInt(split[6]));
        sumBean.setDownFlow(Integer.parseInt(split[7]));
        sumBean.setUpCountFlow(Integer.parseInt(split[8]));
        sumBean.setDownCountFlow(Integer.parseInt(split[9]));
        context.write(phone, sumBean);
    }
}
1.3 Reducer
/**
 * Reduce phase: sums the four flow counters across all records for one
 * phone number and emits a single (phone, totals) pair.
 */
public class SumReducer extends Reducer<Text,SumBean,Text,SumBean> {
    @Override
    protected void reduce(Text key, Iterable<SumBean> values, Context context) throws IOException, InterruptedException {
        // Accumulate with int primitives: the original used boxed Integer, which
        // re-boxes on every `+=` for no benefit.
        int upFlow = 0;
        int downFlow = 0;
        int upCountFlow = 0;
        int downCountFlow = 0;
        for (SumBean value : values) {
            upFlow += value.getUpFlow();
            downFlow += value.getDownFlow();
            upCountFlow += value.getUpCountFlow();
            downCountFlow += value.getDownCountFlow();
        }
        SumBean sumBean = new SumBean();
        sumBean.setUpFlow(upFlow);
        sumBean.setDownFlow(downFlow);
        sumBean.setUpCountFlow(upCountFlow);
        sumBean.setDownCountFlow(downCountFlow);
        context.write(key, sumBean);
    }
}
1.4 JobMain
/**
 * Driver for the flow-sum job. Input/output paths may be passed as the first
 * two command-line arguments; when omitted, the original hard-coded local
 * paths are used, so existing invocations keep working.
 */
public class JobMain extends Configured implements Tool {

    private static final String DEFAULT_INPUT = "d:\\mapreduce\\demo_sum_in";
    private static final String DEFAULT_OUTPUT = "d:\\mapreduce\\demo_sum_out";

    @Override
    public int run(String[] strings) throws Exception {
        // Generalization: honor ToolRunner-provided args instead of ignoring them.
        String inputPath = strings.length > 0 ? strings[0] : DEFAULT_INPUT;
        String outputPath = strings.length > 1 ? strings[1] : DEFAULT_OUTPUT;

        Job job = Job.getInstance(super.getConf(), "mapreduce_sum");
        // Required for the job jar to be located when submitted to a cluster.
        job.setJarByClass(JobMain.class);

        job.setInputFormatClass(TextInputFormat.class);
        TextInputFormat.setInputPaths(job, new Path(inputPath));

        job.setMapperClass(SumMapper.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(SumBean.class);

        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(SumBean.class);

        job.setOutputFormatClass(TextOutputFormat.class);
        TextOutputFormat.setOutputPath(job, new Path(outputPath));

        boolean bl = job.waitForCompletion(true);
        return bl ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        Configuration configuration = new Configuration();
        int run = ToolRunner.run(configuration, new JobMain(), args);
        System.exit(run);
    }
}
需求二:上行流量倒序排序(递减排序)
分析:以需求一的输出数据作为排序的输入数据,自定义FlowBean,以FlowBean为map输出的key,以手机号作为Map输出的value,因为MapReduce程序会对Map阶段输出的key进行排序
定义FlowBean实现WritableComparable实现比较排序
Java 的 compareTo 方法说明:
- compareTo 方法用于将当前对象与方法的参数进行比较。
- 如果当前对象与参数相等,返回 0。
- 如果当前对象小于参数,返回负数。
- 如果当前对象大于参数,返回正数。
例如:o1.compareTo(o2);
返回正数的话,当前对象(调用 compareTo 方法的对象 o1)要排在比较对象(compareTo 传参对象 o2)后面;返回负数的话,排在前面。
2.1 FlowBean
/**
 * WritableComparable key bean for the sort job. Carries the four flow
 * counters; ordering is by upFlow DESCENDING so the shuffle sort yields
 * highest upstream packet count first.
 *
 * Serialization order in {@link #write} / {@link #readFields} must not change.
 */
public class FlowBean implements WritableComparable<FlowBean> {
    private Integer upFlow;        // upstream packet count — the sort key
    private Integer downFlow;      // downstream packet count
    private Integer upCountFlow;   // upstream total traffic
    private Integer downCountFlow; // downstream total traffic

    public Integer getUpFlow() {
        return upFlow;
    }
    public void setUpFlow(Integer upFlow) {
        this.upFlow = upFlow;
    }
    public Integer getDownFlow() {
        return downFlow;
    }
    public void setDownFlow(Integer downFlow) {
        this.downFlow = downFlow;
    }
    public Integer getUpCountFlow() {
        return upCountFlow;
    }
    public void setUpCountFlow(Integer upCountFlow) {
        this.upCountFlow = upCountFlow;
    }
    public Integer getDownCountFlow() {
        return downCountFlow;
    }
    public void setDownCountFlow(Integer downCountFlow) {
        this.downCountFlow = downCountFlow;
    }

    /** Tab-separated counters — the literal text written by TextOutputFormat. */
    @Override
    public String toString() {
        return upFlow + "\t" + downFlow + "\t" +upCountFlow + "\t" + downCountFlow;
    }

    @Override
    public void write(DataOutput dataOutput) throws IOException {
        dataOutput.writeInt(upFlow);
        dataOutput.writeInt(downFlow);
        dataOutput.writeInt(upCountFlow);
        dataOutput.writeInt(downCountFlow);
    }

    @Override
    public void readFields(DataInput dataInput) throws IOException {
        this.upFlow = dataInput.readInt();
        this.downFlow = dataInput.readInt();
        this.upCountFlow = dataInput.readInt();
        this.downCountFlow = dataInput.readInt();
    }

    /**
     * Descending order on upFlow. BUG FIX: the original used subtraction
     * (other - this), which overflows for operands far apart in value and
     * silently produces the wrong sign; Integer.compare is overflow-safe.
     */
    @Override
    public int compareTo(FlowBean flowBean) {
        return Integer.compare(flowBean.getUpFlow(), this.upFlow);
    }
}
2.2 Mapper
/**
 * Map phase of the sort job: re-parses the sum job's output
 * (phone \t upFlow \t downFlow \t upCountFlow \t downCountFlow) and emits
 * (FlowBean, phone) so the framework sorts records by FlowBean.compareTo.
 */
public class SortMapper extends Mapper<LongWritable,Text,FlowBean,Text> {
    @Override
    protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        String[] fields = value.toString().split("\t");
        FlowBean bean = new FlowBean();
        bean.setUpFlow(Integer.parseInt(fields[1]));
        bean.setDownFlow(Integer.parseInt(fields[2]));
        bean.setUpCountFlow(Integer.parseInt(fields[3]));
        bean.setDownCountFlow(Integer.parseInt(fields[4]));
        Text phone = new Text(fields[0]);
        context.write(bean, phone);
    }
}
2.3 Reducer
/**
 * Reduce phase of the sort job: swaps key and value back, writing
 * (phone, FlowBean) in the sorted order produced by the shuffle.
 */
public class SortReducer extends Reducer<FlowBean,Text,Text,FlowBean> {
    @Override
    protected void reduce(FlowBean key, Iterable<Text> values, Context context) throws IOException, InterruptedException {
        // Beans comparing equal share one reduce call; emit every phone in the group.
        for (Text phone : values) {
            context.write(phone, key);
        }
    }
}
2.4 JobMain
/**
 * Driver for the flow-sort job. Input/output paths may be passed as the first
 * two command-line arguments; when omitted, the original hard-coded local
 * paths are used, so existing invocations keep working.
 */
public class JobMain extends Configured implements Tool {

    private static final String DEFAULT_INPUT = "d:\\mapreduce\\demo_sort_in";
    private static final String DEFAULT_OUTPUT = "d:\\mapreduce\\demo_sort_out";

    @Override
    public int run(String[] strings) throws Exception {
        // Generalization: honor ToolRunner-provided args instead of ignoring them.
        String inputPath = strings.length > 0 ? strings[0] : DEFAULT_INPUT;
        String outputPath = strings.length > 1 ? strings[1] : DEFAULT_OUTPUT;

        Job job = Job.getInstance(super.getConf(), "mapreduce_sort");
        // Required for the job jar to be located when submitted to a cluster.
        job.setJarByClass(JobMain.class);

        job.setInputFormatClass(TextInputFormat.class);
        TextInputFormat.setInputPaths(job, new Path(inputPath));

        job.setMapperClass(SortMapper.class);
        job.setMapOutputKeyClass(FlowBean.class);
        job.setMapOutputValueClass(Text.class);

        job.setReducerClass(SortReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(FlowBean.class);

        job.setOutputFormatClass(TextOutputFormat.class);
        TextOutputFormat.setOutputPath(job, new Path(outputPath));

        boolean bl = job.waitForCompletion(true);
        return bl ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        Configuration configuration = new Configuration();
        int run = ToolRunner.run(configuration, new JobMain(), args);
        System.exit(run);
    }
}
需求三:手机号码分区
在需求一的基础上,继续完善,将不同的手机号分到不同的数据文件的当中去,需要自定义分区来实现,这里我们自定义来模拟分区,将以下数字开头的手机号进行分开
- 135 开头数据到一个分区文件
- 136 开头数据到一个分区文件
- 137 开头数据到一个分区文件
- 其他分区
这个分区的代码以前说过,这里不再详细讲,有需要可以访问 Hadoop_day05_MapReduce 的 Shuffle 详解(分区、排序、规约、分组)
自定义分区(partitioner):
/**
 * Routes records by phone-number prefix: 135 -> partition 0, 136 -> 1,
 * 137 -> 2, everything else (including numbers shorter than 3 digits) -> 3.
 * Requires the job to run with 4 reduce tasks.
 */
public class MyPartitioner extends Partitioner<Text,FlowBean> {
    @Override
    public int getPartition(Text text, FlowBean flowBean, int i) {
        String phone = text.toString();
        // Strings shorter than 3 chars can't match any prefix — catch-all partition.
        if (phone.length() < 3) {
            return 3;
        }
        switch (phone.substring(0, 3)) {
            case "135":
                return 0;
            case "136":
                return 1;
            case "137":
                return 2;
            default:
                return 3;
        }
    }
}
JobMain
// Register the custom partitioner, and set the reducer count to match the
// number of partitions it returns (0-3) so each partition gets its own output file.
job.setPartitionerClass(MyPartitioner.class);
job.setNumReduceTasks(4);