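1. What is serialization?
- Serialization converts an in-memory object into a sequence of bytes (or another data-transfer format) so that it can be stored on disk (persistence) or sent over the network.
- Deserialization is the reverse: it converts a received byte sequence (or other data-transfer format), or data persisted on disk, back into an in-memory object.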
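The two definitions above can be illustrated with plain `java.io` streams. This is a minimal sketch, not Hadoop code: the `Point` class and the `serialize`/`deserialize` helpers are hypothetical names chosen for illustration. Serialization writes each field to a byte array. Deserialization reads the fields back in the same order to rebuild an equivalent object.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical example: round-trip an object through a byte sequence.
class SerializationRoundTrip {

    // Hypothetical in-memory object with two fields.
    static final class Point {
        final long x;
        final long y;

        Point(long x, long y) {
            this.x = x;
            this.y = y;
        }
    }

    // Serialization: object -> byte sequence (8 bytes per long).
    static byte[] serialize(Point p) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeLong(p.x);
            out.writeLong(p.y);
        } catch (IOException e) {
            // In-memory streams should not fail; surface anything unexpected.
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Deserialization: byte sequence -> object, reading fields in the order they were written.
    static Point deserialize(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            long x = in.readLong();
            long y = in.readLong();
            return new Point(x, y);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = serialize(new Point(3, 4));
        Point copy = deserialize(data);
        System.out.println(data.length + " bytes -> Point(" + copy.x + ", " + copy.y + ")");
    }
}
```

The byte array could equally be written to a file (persistence) or a socket (network transfer). The receiver only needs to know the field order.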
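2. Why serialize?
- In general, a "live" object exists only in memory: it disappears when the machine shuts down or loses power, and only the local process can use it. It cannot be sent to another computer on the network. Serialization makes it possible to store "live" objects and to send them to remote computers.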
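3. Why not use Java serialization?
- Java serialization (Serializable) is a heavyweight framework. A serialized object carries a lot of extra information (assorted verification data, headers, the class inheritance hierarchy, etc.), so it cannot be transmitted efficiently over the network. For this reason, Hadoop developed its own serialization mechanism, Writable.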
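The following sketch makes this overhead concrete by comparing two ways of serializing the same three `long` fields. The first is Java's `ObjectOutputStream`. The second writes only the raw field values to a `DataOutputStream`, in the same style as Hadoop's `Writable.write(DataOutput)`. The `FlowRecord` class and its values are hypothetical. The exact Java-serialization byte count depends on the class name and field names, so only the relative sizes matter here.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Hypothetical comparison of serialized sizes for the same data.
class SerializedSizeComparison {

    // Hypothetical record with three long fields.
    static final class FlowRecord implements Serializable {
        private static final long serialVersionUID = 1L;
        long upFlow = 1024;
        long downFlow = 2048;
        long sumFlow = 3072;
    }

    // Java serialization: a stream header plus a class descriptor (class name,
    // serialVersionUID, field names and types) are written before the field values.
    static int javaSerializedSize(FlowRecord r) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(r);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.size();
    }

    // Writable style: only the field values, 8 bytes per long.
    static int rawFieldsSize(FlowRecord r) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeLong(r.upFlow);
            out.writeLong(r.downFlow);
            out.writeLong(r.sumFlow);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.size();
    }

    public static void main(String[] args) {
        FlowRecord r = new FlowRecord();
        System.out.println("Java serialization: " + javaSerializedSize(r) + " bytes");
        System.out.println("Raw fields:         " + rawFieldsSize(r) + " bytes");
    }
}
```

The raw encoding is always 24 bytes. The Java-serialized form is several times larger because of the metadata it carries.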
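4. Characteristics of Hadoop serialization:
- Compact: uses storage space efficiently
- Fast: little extra overhead when reading and writing data
- Extensible: can be upgraded as the communication protocol evolves
- Interoperable: supports interaction across multiple languages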
5. Hands-on serialization code demo:
```java
package com.czxy.hadoop.mapreduce.demo04;

import org.apache.hadoop.io.WritableComparable;

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

/**
 * Custom Hadoop type holding upstream, downstream and total traffic.
 * WritableComparable combines Writable and Comparable, which also lets
 * the type be used as a MapReduce key.
 */
public class MyHadoopWritable implements WritableComparable<MyHadoopWritable> {
    private long upFlow;
    private long downFlow;
    private long sumFlow;

    // Hadoop instantiates Writables by reflection, so a no-arg constructor is required.
    public MyHadoopWritable() {
        super();
    }

    // Serialization: write each field to the output stream.
    @Override
    public void write(DataOutput out) throws IOException {
        out.writeLong(upFlow);
        out.writeLong(downFlow);
        out.writeLong(sumFlow);
    }

    // Deserialization: read the fields back in exactly the order write() wrote them.
    @Override
    public void readFields(DataInput in) throws IOException {
        upFlow = in.readLong();
        downFlow = in.readLong();
        sumFlow = in.readLong();
    }

    // Sort by total traffic, descending. Returning 0 for equal totals keeps
    // the Comparable contract (the original never returned 0).
    @Override
    public int compareTo(MyHadoopWritable other) {
        return Long.compare(other.getSumFlow(), this.sumFlow);
    }

    public long getUpFlow() {
        return upFlow;
    }

    public void setUpFlow(long upFlow) {
        this.upFlow = upFlow;
    }

    public long getDownFlow() {
        return downFlow;
    }

    public void setDownFlow(long downFlow) {
        this.downFlow = downFlow;
    }

    public long getSumFlow() {
        return sumFlow;
    }

    public void setSumFlow(long sumFlow) {
        this.sumFlow = sumFlow;
    }

    @Override
    public String toString() {
        return "MyHadoopWritable{" +
                "upFlow=" + upFlow +
                ", downFlow=" + downFlow +
                ", sumFlow=" + sumFlow +
                '}';
    }
}
```
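To see how `MyHadoopWritable`'s `write`, `readFields` and `compareTo` behave without starting a MapReduce job, the sketch below uses a stand-in class with the same method shapes. `FlowBean` and `FlowBeanDemo` are hypothetical names. `FlowBean` deliberately does not implement `org.apache.hadoop.io.Writable`, so the example runs without `hadoop-common` on the classpath. Its `compareTo` uses `Long.compare` so that equal totals compare as 0. The demo serializes an object, rebuilds it into an empty instance created with the no-arg constructor, and then sorts several beans by total traffic in descending order.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in mirroring MyHadoopWritable's write/readFields/compareTo.
class FlowBean implements Comparable<FlowBean> {
    long upFlow;
    long downFlow;
    long sumFlow;

    FlowBean() { } // empty instance, later filled in by readFields()

    FlowBean(long up, long down) {
        this.upFlow = up;
        this.downFlow = down;
        this.sumFlow = up + down;
    }

    void write(DataOutput out) throws IOException {
        out.writeLong(upFlow);
        out.writeLong(downFlow);
        out.writeLong(sumFlow);
    }

    void readFields(DataInput in) throws IOException {
        upFlow = in.readLong(); // same order as write()
        downFlow = in.readLong();
        sumFlow = in.readLong();
    }

    @Override
    public int compareTo(FlowBean other) {
        return Long.compare(other.sumFlow, this.sumFlow); // descending by sumFlow
    }
}

class FlowBeanDemo {
    // Serialize with write(), then deserialize into a fresh instance with readFields().
    static FlowBean roundTrip(FlowBean original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            original.write(out);
            out.flush();
            FlowBean copy = new FlowBean();
            copy.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Sort a copy of the list using compareTo() and return the totals in sorted order.
    static List<Long> sortedSums(List<FlowBean> beans) {
        List<FlowBean> copy = new ArrayList<>(beans);
        Collections.sort(copy);
        List<Long> sums = new ArrayList<>();
        for (FlowBean b : copy) {
            sums.add(b.sumFlow);
        }
        return sums;
    }

    public static void main(String[] args) {
        FlowBean copy = roundTrip(new FlowBean(100, 200));
        System.out.println(copy.upFlow + " " + copy.downFlow + " " + copy.sumFlow); // 100 200 300
        List<FlowBean> beans = Arrays.asList(new FlowBean(1, 1), new FlowBean(5, 5), new FlowBean(2, 2));
        System.out.println(sortedSums(beans)); // [10, 4, 2]
    }
}
```

If `readFields` read the fields in a different order from `write`, the round trip would silently swap values. Keeping the two methods in the same order is the main rule when writing a custom Writable.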