1. What is it?
In MapReduce, serialization converts a structured object into a byte stream, and deserialization converts a byte stream back into a structured object.
2. Why?
Data sitting in one server's memory cannot be sent directly to another server, so it must first be converted into a byte stream before it can be transmitted.
3. How?
Java's built-in serialization is a heavyweight framework: it attaches a lot of extra information to each object (checksums, class inheritance metadata, and so on), which makes the byte stream bloated and inefficient to transmit. Hadoop therefore ships its own lightweight serialization mechanism: Writable.
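The overhead is easy to see without Hadoop at all. Below is a minimal standalone sketch (plain JDK only; the class name and exact byte counts are illustrative, not from the source) comparing the bytes Java serialization produces for a single `long` against a raw fixed-width write, which is essentially what a Writable's `write(DataOutput)` does:

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;

public class SerializationOverhead {
    // Size of a Long serialized with Java's ObjectOutputStream
    // (stream header, class descriptor, etc. are all included)
    static int javaSerializedSize(long v) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(buf)) {
            oos.writeObject(Long.valueOf(v));
        }
        return buf.size();
    }

    // Size of a raw fixed-width write, as Writable does for a long field
    static int rawSize(long v) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream dos = new DataOutputStream(buf)) {
            dos.writeLong(v);
        }
        return buf.size(); // always 8 bytes
    }

    public static void main(String[] args) throws IOException {
        System.out.println("Java serialization: " + javaSerializedSize(1116L) + " bytes");
        System.out.println("Raw writeLong:      " + rawSize(1116L) + " bytes");
    }
}
```

The raw write is exactly 8 bytes, while `ObjectOutputStream` emits many times that for the same value, which is the bloat the Writable mechanism avoids.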
Common Java types and their corresponding Hadoop serialization types:

| Java type | Hadoop Writable type |
| --- | --- |
| int | IntWritable |
| String | Text |
| boolean | BooleanWritable |
| byte | ByteWritable |
| float | FloatWritable |
| long | LongWritable |
| double | DoubleWritable |
| map | MapWritable |
| array | ArrayWritable |
If these built-in types do not meet our needs, we can define a custom bean that implements the Writable interface.
Let's walk through a simple traffic-statistics example:
1) Input data
1 13736230513 192.196.100.1 www.atguigu.com 2481 24681 200
2 13846544121 192.196.100.2 www.hao123.com 264 0 200
3 13956435636 192.196.100.3 www.hao123.com 132 1512 200
4 13966251146 192.168.100.1 www.hao123.com 240 0 404
5 18271575951 192.168.100.2 www.atguigu.com 1527 2106 200
6 84188413 192.168.100.3 www.atguigu.com 4116 1432 200
7 13590439668 192.168.100.4 www.hao123.com 1116 954 200
8 15910133277 192.168.100.5 www.hao123.com 3156 2936 200
9 13729199489 192.168.100.6 www.hao123.com 240 0 200
10 13630577991 192.168.100.7 www.shouhu.com 6960 690 200
11 15043685818 192.168.100.8 www.baidu.com 3659 3538 200
12 15959002129 192.168.100.9 www.atguigu.com 1938 180 500
13 13560439638 192.168.100.10 www.hao123.com 918 4938 200
14 13470253144 192.168.100.11 www.hao123.com 180 180 200
15 13682846555 192.168.100.12 www.qq.com 1938 2910 200
16 13992314666 192.168.100.13 www.gaga.com 3008 3720 200
17 13509468723 192.168.100.14 www.qinghua.com 7335 110349 404
18 18390173782 192.168.100.15 www.sogou.com 9531 2412 200
19 13975057813 192.168.100.16 www.baidu.com 11058 48243 200
20 13768778790 192.168.100.17 www.hao123.com 120 120 200
21 13568436656 192.168.100.18 www.alibaba.com 2481 24681 200
22 13568436656 192.168.100.19 1116 954 200
2) Data format
7 13560436666 120.196.100.99 www.hao123.com 1116 954 200
id  phone number  network IP  domain  upstream traffic  downstream traffic  HTTP status code
3) Writing the MapReduce program
(1) The FlowBean object for traffic statistics
import org.apache.hadoop.io.Writable;
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
//1. Implement the Writable interface
public class FlowBean implements Writable {
//2. Define the fields we need
private long upFlow;//upstream traffic
private long downFlow;//downstream traffic
private long sumFlow;//total traffic
//3. No-arg constructor (Hadoop creates the bean via reflection during deserialization)
public FlowBean() {
}
//4. Getters and setters for the three fields
public long getUpFlow() {
return upFlow;
}
public void setUpFlow(long upFlow) {
this.upFlow = upFlow;
}
public long getDownFlow() {
return downFlow;
}
public void setDownFlow(long downFlow) {
this.downFlow = downFlow;
}
public long getSumFlow() {
return sumFlow;
}
public void setSumFlow(long sumFlow) {
this.sumFlow = sumFlow;
}
//5. Serialization method
@Override
public void write(DataOutput dataOutput) throws IOException {
dataOutput.writeLong(upFlow);
dataOutput.writeLong(downFlow);
dataOutput.writeLong(sumFlow);
}
//6. Deserialization method; the read order must match the write order exactly
@Override
public void readFields(DataInput dataInput) throws IOException {
this.upFlow= dataInput.readLong();
this.downFlow= dataInput.readLong();
this.sumFlow= dataInput.readLong();
}
//7. Override toString so the output is tab-separated
@Override
public String toString() {
return upFlow +"\t"+downFlow +"\t"+ sumFlow;
}
}
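A quick way to convince yourself that `write` and `readFields` pair up correctly is to round-trip the bean through in-memory streams. Here is a standalone sketch (the bean is re-declared locally, without the Writable interface, so it compiles with the plain JDK; in a real project you would round-trip the FlowBean above the same way):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

public class RoundTripDemo {
    // Simplified stand-in for FlowBean, without the Hadoop dependency
    static class Bean {
        long upFlow, downFlow, sumFlow;
        void write(DataOutput out) throws IOException {
            out.writeLong(upFlow);
            out.writeLong(downFlow);
            out.writeLong(sumFlow);
        }
        void readFields(DataInput in) throws IOException {
            // Must read in exactly the order written
            upFlow = in.readLong();
            downFlow = in.readLong();
            sumFlow = in.readLong();
        }
    }

    // Serialize to bytes, then deserialize into a fresh instance
    static Bean roundTrip(Bean b) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        b.write(new DataOutputStream(buf));
        Bean copy = new Bean();
        copy.readFields(new DataInputStream(new ByteArrayInputStream(buf.toByteArray())));
        return copy;
    }

    public static void main(String[] args) throws IOException {
        Bean b = new Bean();
        b.upFlow = 1116; b.downFlow = 954; b.sumFlow = 2070;
        Bean copy = roundTrip(b);
        System.out.println(copy.upFlow + "\t" + copy.downFlow + "\t" + copy.sumFlow);
    }
}
```

If the read order in `readFields` were swapped, the fields would come back scrambled, which is why the comment on step 6 above matters.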
(2) The Mapper class
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import java.io.IOException;
public class map extends Mapper<LongWritable, Text,Text,FlowBean>{
private Text outK=new Text();
private FlowBean outV=new FlowBean();
@Override
protected void map(LongWritable key, Text value,Context context) throws IOException, InterruptedException {
//Read one line
String v= value.toString();
//Split on tabs
String[] str=v.split("\t");
//Extract the phone number
String phone=str[1];
//Extract the upstream and downstream traffic
if (str.length == 7 && !str[4].isEmpty() && !str[5].isEmpty()) {//skip malformed records (e.g. a missing domain field) and empty fields
if(str[4].matches("^[0-9]+$") && str[5].matches("^[0-9]+$")){//skip non-numeric fields
long up=Long.parseLong(str[4].trim());
long down=Long.parseLong(str[5].trim());
long sum=up+down;
//Populate the output key and value
outK.set(phone);
outV.setUpFlow(up);
outV.setDownFlow(down);
outV.setSumFlow(sum);
//Emit the record
context.write(outK,outV);
}
}
}
}
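The parsing and filtering inside map() can be exercised on a single record without running a job. A small standalone sketch (plain JDK; `parse` is a hypothetical helper mirroring the logic above, with tab-separated input as `split("\t")` assumes):

```java
public class ParseDemo {
    // Returns {up, down, sum} for a well-formed record, or null if the record is filtered out
    static long[] parse(String line) {
        String[] str = line.split("\t");
        if (str.length != 7) return null;  // e.g. record 22, which is missing the domain field
        if (!str[4].matches("^[0-9]+$") || !str[5].matches("^[0-9]+$")) return null;
        long up = Long.parseLong(str[4]);
        long down = Long.parseLong(str[5]);
        return new long[]{up, down, up + down};
    }

    public static void main(String[] args) {
        long[] r = parse("7\t13560436666\t120.196.100.99\twww.hao123.com\t1116\t954\t200");
        System.out.println(r[0] + "\t" + r[1] + "\t" + r[2]);
    }
}
```

A record with only six fields (like line 22 of the input, which has no domain) is rejected by the length check rather than being parsed with shifted columns.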
(3) The Reducer class
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
import java.io.IOException;
public class reducer extends Reducer<Text,FlowBean,Text,FlowBean> {
private FlowBean outV=new FlowBean();
@Override
protected void reduce(Text key, Iterable<FlowBean> values, Reducer<Text, FlowBean, Text, FlowBean>.Context context) throws IOException, InterruptedException {
long sum_upFlow=0;
long sum_downFlow=0;
long sum_sumFlow=0;
//Iterate and accumulate the upstream, downstream, and total traffic
for(FlowBean v:values){
sum_upFlow+= v.getUpFlow();
sum_downFlow+= v.getDownFlow();
sum_sumFlow+=v.getSumFlow();
}
//Populate the output value
outV.setUpFlow(sum_upFlow);
outV.setDownFlow(sum_downFlow);
outV.setSumFlow(sum_sumFlow);
//Emit the result
context.write(key,outV);
}
}
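Conceptually, the shuffle groups values by key before reduce() ever sees them; the summation itself can be sketched in plain Java, with a map standing in for the shuffle (keys and values below are illustrative, taken from the sample data):

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ReduceDemo {
    // Sums {up, down} pairs per phone number, as reduce() does for each key's value list
    static Map<String, long[]> aggregate(List<Map.Entry<String, long[]>> records) {
        Map<String, long[]> totals = new TreeMap<>();
        for (Map.Entry<String, long[]> e : records) {
            long[] t = totals.computeIfAbsent(e.getKey(), k -> new long[3]);
            t[0] += e.getValue()[0];                    // upstream
            t[1] += e.getValue()[1];                    // downstream
            t[2] += e.getValue()[0] + e.getValue()[1];  // total
        }
        return totals;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, long[]>> recs = List.of(
                Map.entry("13568436656", new long[]{2481, 24681}),
                Map.entry("13568436656", new long[]{1116, 954}),
                Map.entry("13736230513", new long[]{2481, 24681}));
        for (Map.Entry<String, long[]> e : aggregate(recs).entrySet()) {
            System.out.println(e.getKey() + "\t" + Arrays.toString(e.getValue()));
        }
    }
}
```

In the real job, Hadoop performs this grouping across the cluster and hands each phone number's FlowBean values to one reduce() call.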
(4) The Driver class
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import java.io.IOException;
public class main {
public static void main(String[] args) throws IOException, InterruptedException, ClassNotFoundException {
Job job=Job.getInstance(new Configuration());
job.setJarByClass(main.class);
//Set the job's Mapper class and its output key/value types
job.setMapperClass(map.class);
job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(FlowBean.class);
//Set the job's Reducer class and the final output key/value types
job.setReducerClass(reducer.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(FlowBean.class);
//Set the job's input and output paths
FileInputFormat.setInputPaths(job,new Path(args[0]));
FileOutputFormat.setOutputPath(job,new Path(args[1]));
//Submit the job and exit with its status
boolean success = job.waitForCompletion(true);
System.exit(success ? 0 : 1);
}
}
Output
13470253144 180 180 360
13509468723 7335 110349 117684
13560439638 918 4938 5856
13568436656 2481 24681 27162
13590439668 1116 954 2070
13630577991 6960 690 7650
13682846555 1938 2910 4848
13729199489 240 0 240
13736230513 2481 24681 27162
13768778790 120 120 240
13846544121 264 0 264
13956435636 132 1512 1644
13966251146 240 0 240
13975057813 11058 48243 59301
13992314666 3008 3720 6728
15043685818 3659 3538 7197
15910133277 3156 2936 6092
15959002129 1938 180 2118
18271575951 1527 2106 3633
18390173782 9531 2412 11943
84188413 4116 1432 5548
If you have any questions, feel free to leave a comment below.