We now have a directory `data` that holds some log files. Each record looks like this (tab-separated):

7	13560436666	120.196.100.99	1116	954	200

Judging from this sample, the second field is the phone number, the two fields before the final status code (200) are the upstream traffic (1116) and downstream traffic (954). The requirement: for every phone number, compute its total upstream traffic, total downstream traffic, and overall total traffic.
- Encapsulate a FlowBean class that stores the upstream and downstream traffic (the phone number serves as the key) and implements Hadoop serialization.
- In the map method, split each line to extract the phone number, upstream traffic, and downstream traffic, and instantiate a FlowBean object.
- In the reduce method, aggregate the traffic per phone number.
(3) Writing the Bean object for traffic statistics

From the analysis above, the basic steps are:
- Write a Bean object that holds the traffic information.
- Implement the Writable interface.
```java
package com.root.mapreduce.writable;

import org.apache.hadoop.io.Writable;
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

// 1. Implement the Writable interface
public class FlowBean implements Writable {

    private long upFlow;   // upstream traffic
    private long downFlow; // downstream traffic

    // 2. Provide a no-arg constructor (required by Hadoop for deserialization)
    public FlowBean() {
    }

    // Convenience constructor used by the Mapper
    public FlowBean(long upFlow, long downFlow) {
        this.upFlow = upFlow;
        this.downFlow = downFlow;
    }

    // 3. Provide getters and setters for the fields
    public long getUpFlow() {
        return upFlow;
    }

    public void setUpFlow(long upFlow) {
        this.upFlow = upFlow;
    }

    public long getDownFlow() {
        return downFlow;
    }

    public void setDownFlow(long downFlow) {
        this.downFlow = downFlow;
    }

    // 4. Implement serialization and deserialization;
    //    the field order must be identical in both methods
    @Override
    public void write(DataOutput dataOutput) throws IOException {
        dataOutput.writeLong(upFlow);
        dataOutput.writeLong(downFlow);
    }

    @Override
    public void readFields(DataInput dataInput) throws IOException {
        this.upFlow = dataInput.readLong();
        this.downFlow = dataInput.readLong();
    }
}
```
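To see why `write` and `readFields` must use the same field order, here is a minimal sketch (not part of the job; the class name FlowBeanRoundTrip is made up for illustration) that round-trips a FlowBean through plain Java streams:

```java
package com.root.mapreduce.writable;

import java.io.*;

// Local check only: serialize a FlowBean, read it back, and compare fields.
public class FlowBeanRoundTrip {
    public static void main(String[] args) throws IOException {
        FlowBean in = new FlowBean(1116, 954);

        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        in.write(new DataOutputStream(buffer));  // writes upFlow, then downFlow

        FlowBean out = new FlowBean();
        out.readFields(new DataInputStream(
                new ByteArrayInputStream(buffer.toByteArray())));  // must read in the same order

        System.out.println(out.getUpFlow() == in.getUpFlow()
                && out.getDownFlow() == in.getDownFlow());  // prints true
    }
}
```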
(4) Writing the Mapper class
From the analysis above, the map method splits each line of input and wraps the traffic values in a FlowBean. Pay particular attention to the generic types here: the input key/value pair is <LongWritable, Text> (the byte offset of the line and the line itself), and the output pair is <Text, FlowBean> (the phone number and its traffic bean).
```java
package com.root.mapreduce.writable;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import java.io.IOException;

public class FlowMapper extends Mapper<LongWritable, Text, Text, FlowBean> {

    private final Text outK = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // 1. Read one line and convert it to a String
        String line = value.toString();

        // 2. Split the line on tabs
        String[] split = line.split("\t");

        // 3. Extract the fields we need: phone number, upstream and downstream traffic
        //    (the last field is the status code, so the traffic fields are counted
        //    from the end of the record)
        String phone = split[1];
        long up = Long.parseLong(split[split.length - 3]);
        long down = Long.parseLong(split[split.length - 2]);

        // 4. Wrap the traffic values in a FlowBean
        FlowBean flowBean = new FlowBean(up, down);

        // 5. Write out the key (phone number) and value (FlowBean)
        outK.set(phone);
        context.write(outK, flowBean);
    }
}
```
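As a quick sanity check of the split indices against the sample record, a throwaway sketch (the class name SplitCheck is hypothetical) might look like:

```java
public class SplitCheck {
    public static void main(String[] args) {
        // The sample record from the data directory, tab-separated
        String line = "7\t13560436666\t120.196.100.99\t1116\t954\t200";
        String[] split = line.split("\t");

        System.out.println(split[1]);                 // 13560436666  (phone)
        System.out.println(split[split.length - 3]);  // 1116         (upstream)
        System.out.println(split[split.length - 2]);  // 954          (downstream)
    }
}
```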
(5) Writing the Reducer class
The Reducer class does the aggregation and prepares the output: it sums the upstream and downstream traffic for each phone number and formats the result.
```java
package com.root.mapreduce.writable;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
import java.io.IOException;

public class FlowReducer extends Reducer<Text, FlowBean, Text, Text> {

    private final Text outV = new Text();

    @Override
    protected void reduce(Text key, Iterable<FlowBean> values, Context context)
            throws IOException, InterruptedException {
        long totalUp = 0;
        long totalDown = 0;

        // 1. Iterate over the values and accumulate the upstream and downstream traffic
        for (FlowBean flowBean : values) {
            totalUp += flowBean.getUpFlow();
            totalDown += flowBean.getDownFlow();
        }

        // 2. Compute the total traffic and format the result
        //    (no trailing newline; TextOutputFormat appends one per record)
        String flowDesc = String.format("total up: %d, total down: %d, total: %d",
                totalUp, totalDown, totalUp + totalDown);

        // 3. Write out the key (phone number) and the formatted value
        outV.set(flowDesc);
        context.write(key, outV);
    }
}
```
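To make the aggregation concrete, here is a hand trace of one reduce call with hypothetical values (the second record is invented for illustration); it applies exactly the summation loop from FlowReducer:

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical trace: all FlowBean values that the shuffle grouped
// under the key "13560436666" are summed as in FlowReducer.
public class ReduceTrace {
    public static void main(String[] args) {
        List<long[]> values = Arrays.asList(
                new long[]{1116, 954},  // from the sample record
                new long[]{200, 100});  // invented second record, same phone

        long totalUp = 0, totalDown = 0;
        for (long[] v : values) {
            totalUp += v[0];
            totalDown += v[1];
        }
        // Prints: total up: 1316, total down: 1054, total: 2370
        System.out.println(String.format("total up: %d, total down: %d, total: %d",
                totalUp, totalDown, totalUp + totalDown));
    }
}
```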
(6) Writing the Driver class
The Driver class is boilerplate: it consists of the usual seven steps, the same as in the examples covered earlier.
```java
package com.root.mapreduce.writable;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import java.io.IOException;

public class FlowDriver {
    public static void main(String[] args)
            throws IOException, ClassNotFoundException, InterruptedException {
        // 1. Get the Job object
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf);

        // 2. Associate this Driver class with the job
        job.setJarByClass(FlowDriver.class);

        // 3. Associate the Mapper and Reducer
        job.setMapperClass(FlowMapper.class);
        job.setReducerClass(FlowReducer.class);

        // 4. Set the map-side output key/value types
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(FlowBean.class);

        // 5. Set the final output key/value types
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);

        // 6. Set the input and output paths
        FileInputFormat.setInputPaths(job, new Path("D:\\inputflow"));
        FileOutputFormat.setOutputPath(job, new Path("D:\\flowoutput"));

        // 7. Submit the job and exit with its status
        boolean b = job.waitForCompletion(true);
        System.exit(b ? 0 : 1);
    }
}
```
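The input and output paths above are hardcoded for a local run. For a cluster submission, a common variant (a sketch; the jar name flow.jar and the HDFS paths are assumptions) reads the paths from the command line instead:

```java
// 6 (variant). Take the input/output paths from the command-line arguments
FileInputFormat.setInputPaths(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));
```

The job could then be submitted with something like:

```
hadoop jar flow.jar com.root.mapreduce.writable.FlowDriver /inputflow /flowoutput
```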
- Testing

Run the program and check the result in the output directory (D:\flowoutput; the results land in files such as part-r-00000).
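Assuming the input directory contains only the single sample record shown at the beginning, the output file would hold one line (1116 + 954 = 2070):

```
13560436666	total up: 1116, total down: 954, total: 2070
```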