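Serialization
Serialization is the process of turning an object into a byte stream; deserialization is the reverse process of turning a byte stream back into an object.
In distributed data processing, serialization serves two main purposes: interprocess communication and persistent storage.
Interprocess communication between nodes uses remote procedure calls (RPC). An RPC serialization format should be compact, fast, extensible, and interoperable.
Why the four properties matter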
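Property | Interprocess communication | Persistent storage |
Compact | Makes the best use of network bandwidth | Makes efficient use of storage space |
Fast | Keeps serialization and deserialization overhead low | Keeps read/write overhead low |
Extensible | Lets the protocol evolve as requirements change | Reads data written in older formats transparently |
Interoperable | Clients and servers written in different languages can interact | Data can be read and written from different languages |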
The Writable interface
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

public interface Writable {
  /**
   * Serialize the fields of this object to <code>out</code>,
   * that is, write the data to a binary stream.
   *
   * @param out <code>DataOutput</code> to serialize this object into.
   * @throws IOException
   */
  void write(DataOutput out) throws IOException;

  /**
   * Deserialize the fields of this object from <code>in</code>,
   * that is, read the data from a binary stream.
   *
   * <p>For efficiency, implementations should attempt to re-use storage in the
   * existing object where possible.</p>
   *
   * @param in <code>DataInput</code> to deserialize this object from.
   * @throws IOException
   */
  void readFields(DataInput in) throws IOException;
}
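import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.Writable;

public class MyWritable implements Writable {
  // Some data
  private int counter;
  private long timestamp;

  // Write the fields in a fixed order.
  public void write(DataOutput out) throws IOException {
    out.writeInt(counter);
    out.writeLong(timestamp);
  }

  // Read the fields back in the same order they were written.
  public void readFields(DataInput in) throws IOException {
    counter = in.readInt();
    timestamp = in.readLong();
  }

  // Convenience factory: create an instance and populate it from the stream.
  public static MyWritable read(DataInput in) throws IOException {
    MyWritable w = new MyWritable();
    w.readFields(in);
    return w;
  }
}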
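To see write() and readFields() working together, the following is a minimal round-trip sketch using only java.io streams. The nested Writable interface is a local stand-in with the same two methods as Hadoop's interface, declared so the example runs without Hadoop on the classpath; Event, serialize() and deserialize() are hypothetical names introduced for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

final class WritableRoundTrip {

    // Stand-in with the same two methods as org.apache.hadoop.io.Writable.
    interface Writable {
        void write(DataOutput out) throws IOException;
        void readFields(DataInput in) throws IOException;
    }

    // Hypothetical record type with the same fields as MyWritable.
    static final class Event implements Writable {
        int counter;
        long timestamp;

        Event() {}

        Event(int counter, long timestamp) {
            this.counter = counter;
            this.timestamp = timestamp;
        }

        @Override
        public void write(DataOutput out) throws IOException {
            out.writeInt(counter);
            out.writeLong(timestamp);
        }

        @Override
        public void readFields(DataInput in) throws IOException {
            counter = in.readInt();
            timestamp = in.readLong();
        }
    }

    // Serialize a Writable into an in-memory byte array.
    static byte[] serialize(Writable w) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            w.write(out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Populate an existing Writable from a byte array (the object is reused).
    static void deserialize(Writable w, byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            w.readFields(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = serialize(new Event(7, 1234567890123L));
        Event copy = new Event();
        deserialize(copy, data);
        System.out.println(data.length + " bytes, counter=" + copy.counter
                + ", timestamp=" + copy.timestamp);
    }
}
```

The serialized form is 12 bytes here: 4 for the int and 8 for the long, with no field names or type tags. That is what makes the format compact, and also why readFields() has to read the fields in exactly the order write() wrote them.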
Writable and comparators
The WritableComparable interface
public interface WritableComparable<T> extends Writable, Comparable<T> {
}
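public class MyWritableComparable implements WritableComparable<MyWritableComparable> {
  // Some data
  private int counter;
  private long timestamp;
  private int value; // sort key used by compareTo()

  @Override
  public void write(DataOutput out) throws IOException {
    out.writeInt(counter);
    out.writeLong(timestamp);
    out.writeInt(value); // the sort key must be serialized too
  }

  @Override
  public void readFields(DataInput in) throws IOException {
    counter = in.readInt();
    timestamp = in.readLong();
    value = in.readInt(); // same order as in write()
  }

  @Override
  public int compareTo(MyWritableComparable o) {
    return Integer.compare(this.value, o.value);
  }

  // equals() and hashCode() use the same field as compareTo(), so keys that
  // compare as equal also get the same hash code (the default HashPartitioner
  // partitions by hashCode()).
  @Override
  public boolean equals(Object obj) {
    return obj instanceof MyWritableComparable
        && ((MyWritableComparable) obj).value == value;
  }

  @Override
  public int hashCode() {
    return Integer.hashCode(value);
  }
}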
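MapReduce sorts keys during the shuffle, which is why key types implement WritableComparable rather than plain Writable. The sketch below illustrates the idea without Hadoop: a hypothetical IntKey is written to a byte stream, read back, and the deserialized copies are sorted through compareTo(). IntKey and roundTripAndSort() are names made up for this example; with Hadoop on the classpath the key would implement WritableComparable<IntKey> instead of Comparable<IntKey>.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class SortableKeySketch {

    // Hypothetical key type: one int field that is both serialized and compared.
    static final class IntKey implements Comparable<IntKey> {
        int value;

        void write(DataOutput out) throws IOException {
            out.writeInt(value);
        }

        void readFields(DataInput in) throws IOException {
            value = in.readInt();
        }

        @Override
        public int compareTo(IntKey other) {
            return Integer.compare(value, other.value);
        }
    }

    // Serialize the values as keys, read them back, and sort the copies.
    static List<Integer> roundTripAndSort(int... values) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            for (int v : values) {
                IntKey key = new IntKey();
                key.value = v;
                key.write(out);
            }
            out.flush();

            DataInputStream in =
                    new DataInputStream(new ByteArrayInputStream(bytes.toByteArray()));
            List<IntKey> keys = new ArrayList<>();
            for (int i = 0; i < values.length; i++) {
                IntKey key = new IntKey();
                key.readFields(in);
                keys.add(key);
            }

            Collections.sort(keys); // uses IntKey.compareTo()
            List<Integer> sorted = new ArrayList<>();
            for (IntKey key : keys) {
                sorted.add(key.value);
            }
            return sorted;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTripAndSort(42, -7, 13)); // prints [-7, 13, 42]
    }
}
```

The ordering survives the round trip only because the field used by compareTo() is part of the serialized form; a sort key that write() omits would come back as its default value on the receiving side.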
The RawComparator interface
public interface RawComparator<T> extends Comparator<T> {
/**
* Compare two objects in binary.
* b1[s1:l1] is the first object, and b2[s2:l2] is the second object.
*
* @param b1 The first byte array.
* @param s1 The position index in b1. The object under comparison's starting index.
* @param l1 The length of the object in b1.
* @param b2 The second byte array.
* @param s2 The position index in b2. The object under comparison's starting index.
* @param l2 The length of the object under comparison in b2.
* @return An integer result of the comparison.
*/
public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2);
}
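compare() compares two records directly in their serialized form in the byte stream, without deserializing them into objects first, which avoids the cost of object creation.
WritableComparator is a general-purpose implementation of RawComparator for WritableComparable classes. It provides a default raw compare() that deserializes the two objects and calls their compareTo(), static helpers for decoding primitives from byte arrays, and a factory method, WritableComparator.get(), for obtaining a comparator for a given class.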
public class WritableComparator implements RawComparator, Configurable{}
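As an illustration of comparing records in binary, here is a minimal sketch of a raw comparator for 4-byte big-endian integers, the layout that DataOutput.writeInt() produces. It is not Hadoop's own comparator: the nested RawComparator interface is a local stand-in with the same method signature, and IntRawComparator, readInt() and encode() are hypothetical helpers written for this example.

```java
import java.nio.ByteBuffer;
import java.util.Comparator;

final class RawIntComparatorSketch {

    // Stand-in with the same raw compare() signature as
    // org.apache.hadoop.io.RawComparator.
    interface RawComparator<T> extends Comparator<T> {
        int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2);
    }

    static final class IntRawComparator implements RawComparator<Integer> {
        // Compare two serialized ints in place; no objects are created.
        @Override
        public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
            return Integer.compare(readInt(b1, s1), readInt(b2, s2));
        }

        // Object-based comparison required by java.util.Comparator.
        @Override
        public int compare(Integer a, Integer b) {
            return Integer.compare(a, b);
        }
    }

    // Decode a 4-byte big-endian int starting at the given offset.
    static int readInt(byte[] bytes, int start) {
        return ((bytes[start] & 0xff) << 24)
                | ((bytes[start + 1] & 0xff) << 16)
                | ((bytes[start + 2] & 0xff) << 8)
                | (bytes[start + 3] & 0xff);
    }

    // Pack ints back to back, big-endian (ByteBuffer's default byte order).
    static byte[] encode(int... values) {
        ByteBuffer buffer = ByteBuffer.allocate(4 * values.length);
        for (int v : values) {
            buffer.putInt(v);
        }
        return buffer.array();
    }

    public static void main(String[] args) {
        byte[] records = encode(5, -3, 5);
        IntRawComparator cmp = new IntRawComparator();
        // Record 0 (5) against record 1 (-3): positive result.
        System.out.println(cmp.compare(records, 0, 4, records, 4, 4) > 0);
        // Record 0 (5) against record 2 (5): zero.
        System.out.println(cmp.compare(records, 0, 4, records, 8, 4) == 0);
    }
}
```

The comparator only needs the start offsets because the record length is fixed at 4 bytes; for variable-length records the l1 and l2 arguments tell it where each record ends.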
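Writable classes
Hadoop ships a large set of Writable classes in the org.apache.hadoop.io package.
1) Writable wrappers for the Java primitive types. Each wrapper has get() and set() methods for reading and storing the wrapped value.
Integers come in fixed-length (IntWritable, LongWritable) and variable-length (VIntWritable, VLongWritable) encodings. A variable-length encoding uses fewer bytes for small values, so it saves space when most values are small.
Prefer the fixed-length encoding when values are spread fairly uniformly over the whole value range (for example, hash values), and the variable-length encoding when the distribution is skewed toward small values.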
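2) Text: a Writable for UTF-8 sequences, the Writable equivalent of java.lang.String.
Differences between Text and String:
indexing (Text is indexed by byte position in the UTF-8 encoding, String by char code unit), iteration over Unicode characters, mutability (a Text instance can be reused via set(), while a String is immutable), and the need to fall back to String via toString() because Text has a smaller API.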
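3) BytesWritable: a wrapper for an array of binary data.
4) NullWritable:
(1) It has a zero-length serialization: no bytes are written to or read from the stream. It serves as a placeholder.
(2) It is a singleton; obtain the instance with NullWritable.get().
(3) In MapReduce, declare a key or value as NullWritable when that position is not needed.
(4) In a SequenceFile, it can be used as the key when the file should store a list of values rather than key-value pairs.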
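5) ObjectWritable and GenericWritable
ObjectWritable is a general-purpose wrapper for Java primitives, String, enum, Writable, null, and arrays of these types. Hadoop RPC uses it to marshal and unmarshal method arguments and return values.
GenericWritable is meant for a field that can hold one of several types, for example when the values in a SequenceFile have multiple types. ObjectWritable writes the class name of the wrapped type every time it is serialized, which wastes space. When the set of types is small and known in advance, GenericWritable instead keeps a static array of types and serializes the index into that array as the type reference, which improves performance.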
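6) Writable collection classes (six in total)
ArrayWritable, TwoDArrayWritable: arrays and two-dimensional arrays of Writable instances.
ArrayPrimitiveWritable: a wrapper for arrays of Java primitives.
MapWritable, SortedMapWritable: implementations of java.util.Map<Writable, Writable> and java.util.SortedMap<WritableComparable, Writable>, respectively.
EnumSetWritable: a wrapper for a set of enum values (java.util.EnumSet).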
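The space trade-off between fixed-length and variable-length integers from item 1) can be made concrete with a small sketch. The writeVLong() method below is modeled on the zero-compressed layout used by Hadoop's WritableUtils.writeVLong (values from -112 to 127 take a single byte; larger magnitudes take a length byte followed by the significant bytes). Treat it as an illustrative reimplementation written from memory of that layout, not as the library source; VarLengthSketch and encodedSize() are hypothetical names.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

final class VarLengthSketch {

    // Variable-length encoding: small values use one byte, larger values use
    // one length byte plus only as many bytes as the magnitude needs.
    static void writeVLong(DataOutput out, long value) throws IOException {
        if (value >= -112 && value <= 127) {
            out.writeByte((byte) value); // the value itself fits in one byte
            return;
        }
        int len = -112;
        if (value < 0) {
            value ^= -1L; // one's complement, so only the magnitude is stored
            len = -120;
        }
        for (long tmp = value; tmp != 0; tmp >>= 8) {
            len--; // one step per significant byte
        }
        out.writeByte((byte) len); // first byte encodes sign and byte count
        int dataBytes = (len < -120) ? -(len + 120) : -(len + 112);
        for (int idx = dataBytes; idx != 0; idx--) {
            int shift = (idx - 1) * 8;
            out.writeByte((byte) (value >> shift)); // most significant byte first
        }
    }

    // Number of bytes the variable-length encoding uses for a value.
    static int encodedSize(long value) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try {
            writeVLong(new DataOutputStream(bytes), value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.size();
    }

    public static void main(String[] args) {
        long[] samples = {1L, 127L, 128L, Integer.MAX_VALUE, Long.MAX_VALUE};
        for (long v : samples) {
            System.out.println(v + " -> " + encodedSize(v) + " byte(s)");
        }
    }
}
```

Under this layout a small value such as 1 costs one byte instead of the four that writeInt() always uses, while Integer.MAX_VALUE costs five. That is the reason the variable-length form pays off only when the distribution is skewed toward small values.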