Learn Hadoop in Eight Days (3): Traffic Sorting and Statistics in Practice

Traffic statistics in practice

 

First, a quick review of the Hadoop shuffle process.

1. During a map task, output records are written into an in-memory buffer. Before each spill to disk, the data is sorted twice: first by the partition each record belongs to, then, within each partition, by key.

2. Next comes the combine step (if a combiner is configured). A combiner is essentially a reducer that runs on the map side; it processes the buffered data before it is written out, aiming to shrink the amount of data spilled to disk.

3. When the buffer reaches its threshold, a spill writes a sorted file to disk; the multiple spill files are then merge-sorted into one. That concludes the map-side shuffle.

map task:  partition -> sort -> combine -> spill (to disk) -> merge sort

 

A reduce task fetches its own partition from every map task, merge-sorts those inputs together, runs the reduce function, and writes the final result to HDFS.

A note on partitioning:

If the number of reduce tasks is 1, then any key hash modulo 1 is 0, so there is exactly one partition, numbered 0. If the number of reduce tasks is 2, the hash modulo 2 can only yield 0 or 1, so there are at most two partitions, numbered 0 and 1.
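The arithmetic behind that note is Hadoop's default HashPartitioner, which computes `(key.hashCode() & Integer.MAX_VALUE) % numReduceTasks`. Here is a minimal, Hadoop-free sketch of that calculation (the class and method names are made up for illustration):

```java
public class HashPartitionDemo {
    // Mirrors the arithmetic of Hadoop's default HashPartitioner:
    // mask off the sign bit so the hash is non-negative, then take
    // the remainder modulo the number of reduce tasks.
    public static int partitionFor(Object key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        // With one reduce task, every key maps to partition 0.
        System.out.println(partitionFor("13726230503", 1)); // prints 0
        // With two reduce tasks, only partitions 0 and 1 are possible.
        System.out.println(partitionFor("13726230503", 2));
        System.out.println(partitionFor("13560439658", 2));
    }
}
```

This is why `job.setNumReduceTasks(1)` below funnels every key into a single output file.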

 

 

A custom Hadoop bean that is both serializable and comparable:

public class FlowBean implements WritableComparable<FlowBean> {


    public FlowBean() {
    }

    public FlowBean(long downFlow, long upFlow, String phoneNumber) {
        this.downFlow = downFlow;
        this.upFlow = upFlow;
        this.phoneNumber = phoneNumber;
        this.sumFlow=upFlow+downFlow;
    }

    /**
     * Larger total flow sorts first: returning a negative value puts
     * this bean before o in the sorted output.
     * @param o the bean to compare against
     * @return negative, zero, or positive as o's total flow is smaller
     *         than, equal to, or larger than this bean's
     */
    @Override
    public int compareTo(FlowBean o) {
        // Long.compare with swapped operands gives a descending order and,
        // unlike ">" alone, returns 0 for equal totals as the Comparable
        // contract requires.
        return Long.compare(o.sumFlow, this.sumFlow);
    }

    /**
     * Serialization: memory to disk. The field order here must match
     * readFields() exactly.
     * @param dataOutput the output sink
     * @throws IOException on write failure
     */
    @Override
    public void write(DataOutput dataOutput) throws IOException {
        dataOutput.writeUTF(phoneNumber);
        dataOutput.writeLong(upFlow);
        dataOutput.writeLong(downFlow);
        dataOutput.writeLong(sumFlow);
    }

    /**
     * Deserialization: disk to memory, reading the fields back in the
     * same order they were written.
     * @param dataInput the input source
     * @throws IOException on read failure
     */
    @Override
    public void readFields(DataInput dataInput) throws IOException {
        phoneNumber=dataInput.readUTF();
        upFlow=dataInput.readLong();
        downFlow=dataInput.readLong();
        sumFlow=dataInput.readLong();

    }





    private long upFlow;
    private long downFlow;
    private long sumFlow;
    private String phoneNumber;

    public long getUpFlow() {
        return upFlow;
    }

    public void setUpFlow(long upFlow) {
        this.upFlow = upFlow;
    }

    public long getDownFlow() {
        return downFlow;
    }

    public void setDownFlow(long downFlow) {
        this.downFlow = downFlow;
    }

    public long getSumFlow() {
        return sumFlow;
    }

    public void setSumFlow(long sumFlow) {
        this.sumFlow = sumFlow;
    }

    public String getPhoneNumber() {
        return phoneNumber;
    }

    public void setPhoneNumber(String phoneNumber) {
        this.phoneNumber = phoneNumber;
    }

    public void set(String phoneNumber, long upFlow, long downFlow) {
        this.phoneNumber=phoneNumber;
        this.upFlow=upFlow;
        this.downFlow=downFlow;
        this.sumFlow=upFlow+downFlow;
    }

    @Override
    public String toString() {
       return "\t"+upFlow+"\t"+downFlow+"\t"+sumFlow;
    }
}
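The bean's two roles, serialization and ordering, can be exercised outside a cluster with plain java.io streams, since Writable's write/readFields use the same DataOutput/DataInput primitives. The sketch below is Hadoop-free; the `Flow` class is a made-up stand-in for FlowBean:

```java
import java.io.*;
import java.util.*;

public class FlowBeanDemo {
    // Hadoop-free stand-in for FlowBean (illustrative only): write() and
    // read() use the same field order, so a round trip reproduces the bean.
    static class Flow implements Comparable<Flow> {
        String phone; long up, down, sum;
        Flow(String phone, long up, long down) {
            this.phone = phone; this.up = up; this.down = down; this.sum = up + down;
        }
        void write(DataOutput out) throws IOException {
            out.writeUTF(phone); out.writeLong(up);
            out.writeLong(down); out.writeLong(sum);
        }
        static Flow read(DataInput in) throws IOException {
            Flow f = new Flow(in.readUTF(), 0, 0);
            f.up = in.readLong(); f.down = in.readLong(); f.sum = in.readLong();
            return f;
        }
        // Descending by total flow, like FlowBean.compareTo.
        public int compareTo(Flow o) { return Long.compare(o.sum, this.sum); }
    }

    public static void main(String[] args) throws IOException {
        // Serialization round trip through an in-memory byte stream.
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        new Flow("13726230503", 2481, 24681).write(new DataOutputStream(buf));
        Flow back = Flow.read(
                new DataInputStream(new ByteArrayInputStream(buf.toByteArray())));
        System.out.println(back.phone + " sum=" + back.sum); // 13726230503 sum=27162

        // Sorting puts the largest total flow first, as the shuffle would.
        List<Flow> flows = new ArrayList<>(Arrays.asList(
                new Flow("13560439658", 100, 200),
                new Flow("13926435656", 5000, 9000)));
        Collections.sort(flows);
        System.out.println(flows.get(0).phone); // 13926435656
    }
}
```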

The runner code is as follows:

public class FlowSort {

    public static class FlowSortMapper extends Mapper<Object, Text, FlowBean, NullWritable> {


        private FlowBean outKey = new FlowBean();

        @Override
        protected void map(Object key, Text value, Context context) throws IOException, InterruptedException {
            String line = value.toString();
            String regex = "\\s+";
            String[] split = line.split(regex);
            String phoneNumber = "";
            long upFlow=0;
            long downFlow=0;
            try {
                phoneNumber = split[1];
                upFlow= Long.parseLong(split[8]);
                downFlow = Long.parseLong(split[9]);
            } catch (Exception e) {
                context.getCounter("FlowSort","splitException").increment(1);
                return;
            }
            outKey.set(phoneNumber, upFlow, downFlow);
            context.write(outKey,NullWritable.get());
        }
    }


    public static class FlowSortReducer extends Reducer<FlowBean, NullWritable, Text, FlowBean> {
        private Text outKey = new Text();
        // One Reducer instance handles one reduce task (one partition),
        // but a partition can contain many keys with different hash codes.
        @Override
        protected void reduce(FlowBean key, Iterable<NullWritable> values, Context context) throws IOException, InterruptedException {
            // Each reduce() call sees a single key; the values here are
            // just NullWritable placeholders.
            String phoneNumber = key.getPhoneNumber();
            outKey.set(phoneNumber);
            context.write(outKey,key);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration configuration = new Configuration();
        FileUtils.deleteDirectory(new File("E:\\IdeaProjects\\hadoopstudy\\data\\flowdata\\results"));
        Job job = Job.getInstance(configuration, "flowSort");
        job.setJarByClass(FlowSort.class);
        job.setReducerClass(FlowSort.FlowSortReducer.class);
        job.setMapOutputKeyClass(FlowBean.class);
        job.setMapOutputValueClass(NullWritable.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(FlowBean.class);

        job.setNumReduceTasks(1);
        MultipleInputs.addInputPath(job,new Path(args[0]), TextInputFormat.class,FlowSort.FlowSortMapper.class);
        FileOutputFormat.setOutputPath(job,new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);

    }


}
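The mapper's parsing step can also be exercised without Hadoop. The sample line below is made up, but shaped like the phone-flow log this series uses: after splitting on whitespace, field 1 is the phone number and fields 8 and 9 are the upstream and downstream flow:

```java
public class FlowLineParseDemo {
    public static void main(String[] args) {
        // Made-up record: 11 whitespace-separated fields, matching the
        // indices the mapper reads (1 = phone, 8 = upFlow, 9 = downFlow).
        String line = "1363157985066\t13726230503\t00-FD-07-A4-72-B8:CMCC\t"
                + "120.196.100.82\ti02.c.aliimg.com\tsite\t24\t27\t2481\t24681\t200";
        String[] split = line.split("\\s+");
        String phone = split[1];
        long up = Long.parseLong(split[8]);
        long down = Long.parseLong(split[9]);
        // prints: 13726230503 up=2481 down=24681 sum=27162
        System.out.println(phone + " up=" + up + " down=" + down
                + " sum=" + (up + down));
    }
}
```

On real data, a malformed line would throw inside the mapper's try block, bump the `splitException` counter, and be skipped rather than failing the job.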

 

 

 

Remember: the shuffle between map and reduce sorts records by key.

All right, see you next time.

 
