Mobile Traffic Statistics Project: Implementation (Part 2)
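Implementation approach:
- Group the records by phone number, then add up the upstream and downstream traffic for each phone number
- Mapper: split each record into phone number, upstream traffic, and downstream traffic, then emit the phone number as the key and an Access object as the value (the value class is named FlowBean in the code below)
- Reducer: receives input of the form ("phone number", <Access, Access>)
- Custom partitioner class (it must extend the Partitioner abstract class) that overrides the getPartition() method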
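The group-then-sum idea above can be tried out without a cluster. The following is only a minimal plain-Java sketch of that data flow, not the Hadoop implementation: the class FlowSumSketch, the nested Flow class, and the {phone, upstream, downstream} record layout are all hypothetical names and assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class FlowSumSketch {
    /** Hypothetical stand-in for the value object (Access / FlowBean in this tutorial). */
    static final class Flow {
        final long up;
        final long down;
        Flow(long up, long down) { this.up = up; this.down = down; }
    }

    /** "Map + shuffle": key each record by phone number and collect its values. */
    static Map<String, List<Flow>> groupByPhone(String[][] records) {
        Map<String, List<Flow>> grouped = new TreeMap<>();
        for (String[] r : records) {
            // Assumed layout: r = {phone, upstream, downstream}
            grouped.computeIfAbsent(r[0], k -> new ArrayList<>())
                   .add(new Flow(Long.parseLong(r[1]), Long.parseLong(r[2])));
        }
        return grouped;
    }

    /** "Reduce": add up all values that share one key. */
    static Flow reduce(List<Flow> values) {
        long up = 0;
        long down = 0;
        for (Flow f : values) {
            up += f.up;
            down += f.down;
        }
        return new Flow(up, down);
    }

    public static void main(String[] args) {
        // Made-up sample records, two of them for the same phone number
        String[][] records = {
            {"13700000001", "100", "200"},
            {"15800000002", "10", "20"},
            {"13700000001", "5", "7"},
        };
        groupByPhone(records).forEach((phone, flows) -> {
            Flow total = reduce(flows);
            System.out.println(phone + "\t" + total.up + "\t" + total.down);
        });
    }
}
```

In the real job, Hadoop performs the grouping step itself during the shuffle; only the per-record split and the per-key summation have to be written by hand.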
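Steps:
(3) Write the Reduce task class (Reduce Task)
Sum the traffic for each phone number. The Reduce output for each phone number has the form:
phone==>Access(phone number, total upstream traffic, total downstream traffic)
It can also be optimized to:
phone==>Access(NullWritable object, total upstream traffic, total downstream traffic)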
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

import java.io.IOException;

// Sums the upstream and downstream traffic of all records that share one phone number.
public class FlowReducer extends Reducer<Text, FlowBean, Text, FlowBean> {
    @Override
    protected void reduce(Text key, Iterable<FlowBean> values, Context context)
            throws IOException, InterruptedException {
        long sumUpFlow = 0;
        long sumDownFlow = 0;
        // values holds every FlowBean the Mappers emitted for this phone number
        for (FlowBean flowBean : values) {
            sumUpFlow += flowBean.getUpflow();
            sumDownFlow += flowBean.getDownflow();
        }
        // Emit one record per phone number: (phone, totals)
        FlowBean v = new FlowBean(sumUpFlow, sumDownFlow);
        context.write(key, v);
    }
}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

import java.io.IOException;

// Driver: configures the job and submits it to the cluster.
public class FlowDriver {
    public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
        if (args.length < 2) {
            System.err.println("Usage: FlowDriver <inputPath> <outputPath>");
            System.exit(1);
        }
        Configuration configuration = new Configuration();
        Job job = Job.getInstance(configuration, "Flow Calculation");
        job.setJarByClass(FlowDriver.class);

        job.setMapperClass(FlowMapper.class);
        job.setReducerClass(FlowReducer.class);

        // Register the custom partitioner from step (4). It returns partitions 0 to 2,
        // so the job needs three ReduceTasks; with the default of one it is not used.
        job.setPartitionerClass(PhonePartitioner.class);
        job.setNumReduceTasks(3);

        // Key/value types emitted by the Mapper
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(FlowBean.class);
        // Key/value types emitted by the Reducer
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(FlowBean.class);

        FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        // Block until the job finishes and report success or failure through the exit code
        boolean result = job.waitForCompletion(true);
        System.exit(result ? 0 : 1);
    }
}
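(4) Write the partitioner class
Extend the org.apache.hadoop.mapreduce.Partitioner class. Phone numbers starting with "13" go to the first ReduceTask and are written to partition 0; phone numbers starting with "15" go to the second ReduceTask and are written to partition 1; all other phone numbers go to the third ReduceTask and are written to partition 2.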
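import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

// Routes each record to a ReduceTask based on the phone number prefix.
public class PhonePartitioner extends Partitioner<Text, FlowBean> {
    @Override
    public int getPartition(Text key, FlowBean value, int numPartitions) {
        String phone = key.toString();
        // startsWith() is used instead of substring(0, 2), which throws
        // StringIndexOutOfBoundsException for keys shorter than two characters
        if (phone.startsWith("13")) {
            return 0;
        }
        if (phone.startsWith("15")) {
            return 1;
        }
        return 2;
    }
}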
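The routing rule described above can be checked locally before submitting a job. This is a rough sketch on plain strings rather than Hadoop types: PartitionRuleSketch, partitionFor, and outputFileFor are hypothetical names, and the sample phone numbers are made up. The file name pattern is Hadoop's default part-r-NNNNN naming for reducer output, and the mapping assumes the job runs with three ReduceTasks (for example via job.setNumReduceTasks(3)).

```java
public class PartitionRuleSketch {
    /** The prefix rule from step (4), written against a plain String. */
    static int partitionFor(String phone) {
        if (phone.startsWith("13")) {
            return 0;
        }
        if (phone.startsWith("15")) {
            return 1;
        }
        return 2;
    }

    /** Default reducer output file name for a given partition number. */
    static String outputFileFor(int partition) {
        return String.format("part-r-%05d", partition);
    }

    public static void main(String[] args) {
        // Made-up sample numbers covering all three branches
        String[] phones = {"13700000001", "15800000002", "18900000003"};
        for (String phone : phones) {
            int p = partitionFor(phone);
            System.out.println(phone + " -> partition " + p + " -> " + outputFileFor(p));
        }
    }
}
```

Because every return value must be smaller than the number of ReduceTasks, a job configured with fewer than three would fail on records routed to a missing partition, while one configured with more would simply produce extra empty output files.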
That completes the code for this part.