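1. Requirements analysis
Summation: for each phone number, compute four totals: the sum of upstream flow (upFlow), the sum of downstream flow (downFlow), the sum of total upstream flow (upCountFlow), and the sum of total downstream flow (downCountFlow).
Analysis: use the phone number as the key and the four fields (upstream flow, downstream flow, total upstream flow, total downstream flow) as the value. This key/value pair is the output of the map phase and the input of the reduce phase, which adds up the four fields for each phone number.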
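The key/value design described above can be sketched in plain Java, without any Hadoop dependency. The class name `FlowSumSketch`, the method `sumByPhone`, and the sample records are hypothetical and exist only to illustrate grouping by phone number and summing the four fields; the real job does this with a Hadoop Mapper and Reducer.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class FlowSumSketch {
    // Each record is {phone, upFlow, downFlow, upCountFlow, downCountFlow}.
    // Returns, per phone number, the four sums in the same field order.
    public static Map<String, long[]> sumByPhone(String[][] records) {
        Map<String, long[]> totals = new LinkedHashMap<>();
        for (String[] r : records) {
            // "Group by key": one accumulator array per phone number.
            long[] sums = totals.computeIfAbsent(r[0], k -> new long[4]);
            // "Reduce": add each of the four flow fields to its running total.
            for (int i = 0; i < 4; i++) {
                sums[i] += Long.parseLong(r[i + 1]);
            }
        }
        return totals;
    }

    public static void main(String[] args) {
        // Made-up records, not taken from the real input file.
        String[][] records = {
            {"13800000001", "1", "2", "100", "200"},
            {"13800000002", "3", "4", "300", "400"},
            {"13800000001", "5", "6", "500", "600"},
        };
        sumByPhone(records).forEach((phone, s) ->
            System.out.println(phone + "\t" + s[0] + "\t" + s[1] + "\t" + s[2] + "\t" + s[3]));
    }
}
```

In the MapReduce version, the grouping step is done by the framework's shuffle, so the reducer only has to implement the inner summation.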
2. Implementation
2.1 Sample data
2.2 Approach
2.3 Code structure
2.3.1 FlowBean
package ucas.mapreduce_flowcount;

import org.apache.hadoop.io.Writable;

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

// Value type carrying the four flow fields between the map and reduce phases.
public class FlowBean implements Writable {
    private Integer upFlow;
    private Integer downFlow;
    private Integer upCountFlow;
    private Integer downCountFlow;

    public Integer getUpFlow() {
        return upFlow;
    }

    public void setUpFlow(Integer upFlow) {
        this.upFlow = upFlow;
    }

    public Integer getDownFlow() {
        return downFlow;
    }

    public void setDownFlow(Integer downFlow) {
        this.downFlow = downFlow;
    }

    public Integer getUpCountFlow() {
        return upCountFlow;
    }

    public void setUpCountFlow(Integer upCountFlow) {
        this.upCountFlow = upCountFlow;
    }

    public Integer getDownCountFlow() {
        return downCountFlow;
    }

    public void setDownCountFlow(Integer downCountFlow) {
        this.downCountFlow = downCountFlow;
    }

    // Tab-separated form; this is what TextOutputFormat writes for the value.
    @Override
    public String toString() {
        return upFlow +
                "\t" + downFlow +
                "\t" + upCountFlow +
                "\t" + downCountFlow;
    }

    // Serialization: all four fields must be set before this is called.
    @Override
    public void write(DataOutput dataOutput) throws IOException {
        dataOutput.writeInt(upFlow);
        dataOutput.writeInt(downFlow);
        dataOutput.writeInt(upCountFlow);
        dataOutput.writeInt(downCountFlow);
    }

    // Deserialization: fields are read in the same order they were written.
    @Override
    public void readFields(DataInput dataInput) throws IOException {
        this.upFlow = dataInput.readInt();
        this.downFlow = dataInput.readInt();
        this.upCountFlow = dataInput.readInt();
        this.downCountFlow = dataInput.readInt();
    }
}
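The write/readFields pair in FlowBean only works if both methods handle the fields in the same order. A minimal sketch of that round trip, using only java.io so it runs without Hadoop on the classpath: `FlowRoundTrip` is a hypothetical stand-in that mirrors FlowBean's two methods rather than implementing Hadoop's `Writable`, and the sample values are made up.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class FlowRoundTrip {
    public int upFlow;
    public int downFlow;
    public int upCountFlow;
    public int downCountFlow;

    // Same field order as FlowBean.write.
    public void write(DataOutput out) throws IOException {
        out.writeInt(upFlow);
        out.writeInt(downFlow);
        out.writeInt(upCountFlow);
        out.writeInt(downCountFlow);
    }

    // Same field order as FlowBean.readFields.
    public void readFields(DataInput in) throws IOException {
        upFlow = in.readInt();
        downFlow = in.readInt();
        upCountFlow = in.readInt();
        downCountFlow = in.readInt();
    }

    // Serializes src to an in-memory buffer and reads it back into a new object.
    public static FlowRoundTrip roundTrip(FlowRoundTrip src) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            src.write(new DataOutputStream(buf));
            FlowRoundTrip copy = new FlowRoundTrip();
            copy.readFields(new DataInputStream(new ByteArrayInputStream(buf.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        FlowRoundTrip src = new FlowRoundTrip();
        src.upFlow = 24;
        src.downFlow = 27;
        src.upCountFlow = 2481;
        src.downCountFlow = 24681;
        FlowRoundTrip copy = roundTrip(src);
        System.out.println(copy.upFlow + "\t" + copy.downFlow + "\t"
                + copy.upCountFlow + "\t" + copy.downCountFlow);
    }
}
```

If the read order differed from the write order, the copy would still deserialize without error but with the values in the wrong fields, which is why the two methods should be kept side by side and changed together.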
2.3.2 FlowCountMapper
package ucas.mapreduce_flowcount;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

import java.io.IOException;

public class FlowCountMapper extends Mapper<LongWritable, Text, Text, FlowBean> {
    @Override
    protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        // 1: Split the tab-separated line and take the phone number (field index 1)
        String[] split = value.toString().split("\t");
        String phoneNum = split[1];

        // 2: Read the four flow fields (field indexes 6 to 9)
        FlowBean flowBean = new FlowBean();
        flowBean.setUpFlow(Integer.parseInt(split[6]));
        flowBean.setDownFlow(Integer.parseInt(split[7]));
        flowBean.setUpCountFlow(Integer.parseInt(split[8]));
        flowBean.setDownCountFlow(Integer.parseInt(split[9]));

        // 3: Write K2 (phone number) and V2 (FlowBean) to the context
        context.write(new Text(phoneNum), flowBean);
    }
}
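The mapper above assumes that every input line has at least ten tab-separated fields and that fields 6 to 9 are integers; a malformed line would throw ArrayIndexOutOfBoundsException or NumberFormatException and fail the task. A minimal sketch of a more defensive parse, in plain Java: the class `FlowLineParser`, its two methods, and the sample line are hypothetical and not part of the original job.

```java
public class FlowLineParser {
    private static final int MIN_FIELDS = 10;

    // Returns the phone number (field index 1), or null if the line is too short.
    public static String parsePhone(String line) {
        String[] f = line.split("\t");
        return f.length >= MIN_FIELDS ? f[1] : null;
    }

    // Returns {upFlow, downFlow, upCountFlow, downCountFlow} (field indexes 6 to 9),
    // or null if the line is too short or a field is not an integer.
    public static int[] parseFlows(String line) {
        String[] f = line.split("\t");
        if (f.length < MIN_FIELDS) {
            return null;
        }
        try {
            return new int[] {
                Integer.parseInt(f[6]),
                Integer.parseInt(f[7]),
                Integer.parseInt(f[8]),
                Integer.parseInt(f[9])
            };
        } catch (NumberFormatException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        // Made-up line with ten fields; only indexes 1 and 6 to 9 matter here.
        String line = String.join("\t",
                "id", "13800000001", "c2", "c3", "c4", "c5", "24", "27", "2481", "24681");
        int[] flows = parseFlows(line);
        System.out.println(parsePhone(line) + "\t" + flows[0] + "\t" + flows[1]
                + "\t" + flows[2] + "\t" + flows[3]);
    }
}
```

Inside a real mapper, a null result could simply be skipped with an early return, so that one bad record does not abort the whole job; whether skipping is acceptable depends on the data.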
2.3.3 FlowCountReducer
package ucas.mapreduce_flowcount;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

import java.io.IOException;

public class FlowCountReducer extends Reducer<Text, FlowBean, Text, FlowBean> {
    @Override
    protected void reduce(Text key, Iterable<FlowBean> values, Context context) throws IOException, InterruptedException {
        // Sum the four fields over all values of this phone number
        int upFlow = 0;
        int downFlow = 0;
        int upCountFlow = 0;
        int downCountFlow = 0;
        for (FlowBean value : values) {
            upFlow += value.getUpFlow();
            downFlow += value.getDownFlow();
            upCountFlow += value.getUpCountFlow();
            downCountFlow += value.getDownCountFlow();
        }

        // Wrap the totals in a new FlowBean
        FlowBean flowBean = new FlowBean();
        flowBean.setUpFlow(upFlow);
        flowBean.setDownFlow(downFlow);
        flowBean.setUpCountFlow(upCountFlow);
        flowBean.setDownCountFlow(downCountFlow);

        // Write K3 (phone number) and V3 (totals) to the context
        context.write(key, flowBean);
    }
}
2.3.4 JobMain
package ucas.mapreduce_flowcount;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class JobMain extends Configured implements Tool {
    @Override
    public int run(String[] strings) throws Exception {
        // Create the job object
        Job job = Job.getInstance(super.getConf(), "mapreduce_flowcount");
        // Required when the job is packaged as a jar and run on the cluster
        job.setJarByClass(JobMain.class);

        // Step 1: set the input format class and input path (produces K1 and V1)
        job.setInputFormatClass(TextInputFormat.class);
        TextInputFormat.addInputPath(job, new Path("hdfs://192.168.0.101:8020/input/flowcount"));

        // Step 2: set the Mapper class
        job.setMapperClass(FlowCountMapper.class);
        // Set the map output types: K2 and V2
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(FlowBean.class);

        // Steps 3 to 6 (partition, sort, combine, group) use the defaults

        // Step 7: set the Reducer class
        job.setReducerClass(FlowCountReducer.class);
        // Set the reduce output types: K3 and V3
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(FlowBean.class);
        // The number of reduce tasks is not set here, so the default is used

        // Step 8: set the output format class
        job.setOutputFormatClass(TextOutputFormat.class);
        // Set the output path
        TextOutputFormat.setOutputPath(job, new Path("hdfs://192.168.0.101:8020/out/flowcount_out"));

        // Submit the job and wait for it to finish
        boolean b = job.waitForCompletion(true);
        return b ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        Configuration configuration = new Configuration();
        // Launch the job
        int run = ToolRunner.run(configuration, new JobMain(), args);
        System.exit(run);
    }
}
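3. Running the job and analyzing the results
3.1 Preparation
On node01, create the input directory and upload the data file. Build the jar in IDEA and upload it to /export/software.
3.2 Running the code and results
Run command: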
hadoop jar day04_mapreduce_combiner-1.0-SNAPSHOT.jar ucas.mapreduce_flowcount.JobMain
Job counters from the run:
2020-10-11 00:00:04,735 INFO mapreduce.Job: map 0% reduce 0%
2020-10-11 00:00:11,866 INFO mapreduce.Job: map 100% reduce 0%
2020-10-11 00:00:18,936 INFO mapreduce.Job: map 100% reduce 100%
2020-10-11 00:00:24,066 INFO mapreduce.Job: Job job_1602327055253_0004 completed successfully
2020-10-11 00:00:24,238 INFO mapreduce.Job: Counters: 53
File System Counters
FILE: Number of bytes read=663
FILE: Number of bytes written=432667
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=2588
HDFS: Number of bytes written=556
HDFS: Number of read operations=8
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
Job Counters
Launched map tasks=1
Launched reduce tasks=1
Data-local map tasks=1
Total time spent by all maps in occupied slots (ms)=5093
Total time spent by all reduces in occupied slots (ms)=4175
Total time spent by all map tasks (ms)=5093
Total time spent by all reduce tasks (ms)=4175
Total vcore-milliseconds taken by all map tasks=5093
Total vcore-milliseconds taken by all reduce tasks=4175
Total megabyte-milliseconds taken by all map tasks=5215232
Total megabyte-milliseconds taken by all reduce tasks=4275200
Map-Reduce Framework
Map input records=22
Map output records=22
Map output bytes=613
Map output materialized bytes=663
Input split bytes=120
Combine input records=0
Combine output records=0
Reduce input groups=21
Reduce shuffle bytes=663
Reduce input records=22
Reduce output records=21
Spilled Records=44
Shuffled Maps =1
Failed Shuffles=0
Merged Map outputs=1
GC time elapsed (ms)=170
CPU time spent (ms)=2360
Physical memory (bytes) snapshot=478408704
Virtual memory (bytes) snapshot=4846075904
Total committed heap usage (bytes)=303030272
Peak Map Physical memory (bytes)=371359744
Peak Map Virtual memory (bytes)=2409140224
Peak Reduce Physical memory (bytes)=107048960
Peak Reduce Virtual memory (bytes)=2436935680
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=2468
File Output Format Counters
Bytes Written=556
Run results: