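Hadoop does not use Java's built-in serialization; it introduces its own system instead.
Hadoop's serialization relies on two interfaces: Writable (from Hadoop) and Comparable (from Java). The two are combined into a single interface, WritableComparable.
Writable: any class that implements Writable can be serialized and deserialized.
Comparable: defines the ordering between objects. Hadoop pairs it with comparators (WritableComparator subclasses, such as the one in DoubleWritable) that compare serialized objects directly on their byte streams, which makes comparison more efficient because nothing has to be deserialized first.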
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

public interface Writable {
    void write(DataOutput out) throws IOException;     // serialize
    void readFields(DataInput in) throws IOException;  // deserialize
}

// Comparator fragment from DoubleWritable (nested inside the DoubleWritable class)
public static class Comparator extends WritableComparator {
    public Comparator() {
        super(DoubleWritable.class);
    }

    // s1 and s2 are the start offsets within each byte array;
    // l1 and l2 are the lengths of the data that follows each offset
    @Override
    public int compare(byte[] b1, int s1, int l1,
                       byte[] b2, int s2, int l2) {
        double thisValue = readDouble(b1, s1);
        double thatValue = readDouble(b2, s2);
        return (thisValue < thatValue ? -1 : (thisValue == thatValue ? 0 : 1));
    }
}

static { // register the Comparator used for this class
    WritableComparator.define(DoubleWritable.class, new Comparator());
}
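
To see what the byte-level comparison buys, here is a minimal sketch that runs without Hadoop on the classpath. The class RawDoubleCompare and its helpers are made up for illustration; readDouble here is a hand-written stand-in for the helper that WritableComparator provides, assuming the 8-byte big-endian layout produced by DataOutput.writeDouble.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

class RawDoubleCompare {
    // Serialize a double as DataOutput.writeDouble does: 8 bytes, big-endian.
    static byte[] serialize(double value) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buffer);
        out.writeDouble(value);
        return buffer.toByteArray();
    }

    // Stand-in for the readDouble helper: decode 8 big-endian bytes at 'start'.
    static double readDouble(byte[] bytes, int start) {
        long bits = 0;
        for (int i = 0; i < 8; i++) {
            bits = (bits << 8) | (bytes[start + i] & 0xFF);
        }
        return Double.longBitsToDouble(bits);
    }

    // Same shape as the compare method in the fragment: offsets s1/s2, lengths l1/l2.
    // The values are read straight from the buffers; no object is created.
    static int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
        double thisValue = readDouble(b1, s1);
        double thatValue = readDouble(b2, s2);
        return thisValue < thatValue ? -1 : (thisValue == thatValue ? 0 : 1);
    }

    public static void main(String[] args) throws IOException {
        byte[] a = serialize(1.5);
        byte[] b = serialize(2.25);
        System.out.println(compare(a, 0, a.length, b, 0, b.length));
        System.out.println(compare(b, 0, b.length, a, 0, a.length));
        System.out.println(compare(a, 0, a.length, a, 0, a.length));
    }
}
```

The point of the design is that sorting serialized records only needs the offsets into the buffers, so the framework can order keys without building a DoubleWritable for each one.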
Serialization types in Hadoop:
Classes that implement WritableComparable:
Basic: BooleanWritable | ByteWritable
Numeric: IntWritable | VIntWritable | FloatWritable | LongWritable | VLongWritable | DoubleWritable
Advanced: NullWritable | Text | BytesWritable | MD5Hash
Classes that implement only Writable:
Generic wrappers: ObjectWritable | GenericWritable
Arrays: ArrayWritable | TwoDArrayWritable
Maps: AbstractMapWritable | MapWritable | SortedMapWritable
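
The built-in types above all follow the same contract, and a user-defined type can join them by implementing the same two methods plus compareTo. The sketch below is only an illustration: PointWritable is a hypothetical type, and the nested Writable interface is a stand-in with the same two methods as Hadoop's, so the example runs without Hadoop on the classpath.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

class PointWritableDemo {
    // Stand-in mirroring the two methods of Hadoop's Writable interface.
    interface Writable {
        void write(DataOutput out) throws IOException;
        void readFields(DataInput in) throws IOException;
    }

    // Hypothetical user-defined type: a point with two int coordinates.
    static class PointWritable implements Writable, Comparable<PointWritable> {
        int x;
        int y;

        PointWritable() { }  // empty instance, to be filled by readFields

        PointWritable(int x, int y) {
            this.x = x;
            this.y = y;
        }

        @Override
        public void write(DataOutput out) throws IOException {
            out.writeInt(x);
            out.writeInt(y);
        }

        @Override
        public void readFields(DataInput in) throws IOException {
            x = in.readInt();  // fields are read in the order they were written
            y = in.readInt();
        }

        @Override
        public int compareTo(PointWritable other) {
            int byX = Integer.compare(x, other.x);
            return byX != 0 ? byX : Integer.compare(y, other.y);
        }
    }

    static byte[] toBytes(Writable w) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        w.write(new DataOutputStream(buffer));
        return buffer.toByteArray();
    }

    static PointWritable fromBytes(byte[] bytes) throws IOException {
        PointWritable p = new PointWritable();
        p.readFields(new DataInputStream(new ByteArrayInputStream(bytes)));
        return p;
    }

    public static void main(String[] args) throws IOException {
        PointWritable original = new PointWritable(3, -7);
        byte[] bytes = toBytes(original);
        PointWritable copy = fromBytes(bytes);
        System.out.println(bytes.length + " bytes, x=" + copy.x + ", y=" + copy.y);
    }
}
```

Note that only the field values are written: no class name or field metadata goes into the stream, which is one reason this format is more compact than Java's built-in serialization.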
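Note: VIntWritable and VLongWritable share the same encoding: the number is converted to a variable-length byte stream, and the smaller the number's magnitude, the shorter the byte stream.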
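
A rough sketch of such a variable-length encoding follows. It is modeled on the scheme I understand Hadoop's WritableUtils.writeVLong to use (one byte for values in [-112, 127], otherwise a prefix byte carrying sign and byte count, then the significant bytes); treat it as an illustration under that assumption, not as the library source. VarLongSketch, encode and decode are hypothetical names.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

class VarLongSketch {
    static byte[] encode(long value) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buffer);
        if (value >= -112 && value <= 127) {
            out.writeByte((int) value);      // small values take a single byte
            return buffer.toByteArray();
        }
        int len = -112;                      // prefix range for positive values
        if (value < 0) {
            value ^= -1L;                    // one's complement, now non-negative
            len = -120;                      // prefix range for negative values
        }
        for (long tmp = value; tmp != 0; tmp >>= 8) {
            len--;                           // one step per significant byte
        }
        out.writeByte(len);                  // prefix: encodes sign and byte count
        int count = (len < -120) ? -(len + 120) : -(len + 112);
        for (int idx = count; idx != 0; idx--) {
            int shift = (idx - 1) * 8;
            out.writeByte((int) ((value >> shift) & 0xFF));  // most significant first
        }
        return buffer.toByteArray();
    }

    static long decode(byte[] bytes) {
        byte first = bytes[0];
        if (first >= -112) {
            return first;                    // single-byte value
        }
        boolean negative = first < -120;
        int count = negative ? -(first + 120) : -(first + 112);
        long value = 0;
        for (int i = 1; i <= count; i++) {
            value = (value << 8) | (bytes[i] & 0xFF);
        }
        return negative ? ~value : value;    // undo the one's complement
    }

    public static void main(String[] args) throws IOException {
        long[] samples = {1L, 127L, 128L, 1_000_000L, -113L, Long.MAX_VALUE};
        for (long sample : samples) {
            byte[] bytes = encode(sample);
            System.out.println(sample + " -> " + bytes.length
                    + " byte(s), decoded " + decode(bytes));
        }
    }
}
```

Under this scheme 1 and 127 each take one byte, 128 takes two, and Long.MAX_VALUE takes nine, so the encoding pays off when most values are small and costs one extra byte for the largest ones.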
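Text is used frequently. It is serialized as the length of the byte stream followed by the UTF-8 encoding of the string, with a maximum size of 2 GB.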
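
The length-plus-UTF-8 layout can be sketched as follows. One deliberate simplification: as far as I know, Hadoop's Text writes the length prefix as a variable-length int, while this sketch uses a plain 4-byte int to stay short. TextLayoutSketch and its methods are hypothetical names, not Hadoop APIs.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

class TextLayoutSketch {
    // Layout: [length of the UTF-8 bytes][the UTF-8 bytes].
    static byte[] serialize(String s) throws IOException {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buffer);
        out.writeInt(utf8.length);  // simplification: fixed 4-byte prefix, not a VInt
        out.write(utf8);
        return buffer.toByteArray();
    }

    static String deserialize(byte[] bytes) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes));
        int length = in.readInt();  // number of bytes, not number of characters
        byte[] utf8 = new byte[length];
        in.readFully(utf8);
        return new String(utf8, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        String ascii = "hello";
        String chinese = "\u4f60\u597d";  // two characters, three UTF-8 bytes each
        System.out.println(serialize(ascii).length);
        System.out.println(serialize(chinese).length);
        System.out.println(deserialize(serialize(chinese)).equals(chinese));
    }
}
```

The prefix counts bytes rather than characters, so a two-character Chinese string carries a length of 6. A length that has to fit in an int is also consistent with the 2 GB ceiling.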