Writable
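Writable is the interface for serializable objects that implement a simple, efficient serialization protocol based on DataInput and DataOutput.
Any key or value type in the Hadoop Map-Reduce framework implements this interface.
Implementations typically also provide a static read(DataInput) method, which constructs a new instance, calls readFields(DataInput) on it, and returns the instance.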
The Writable interface is defined as follows:
public interface Writable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}
The interface has two methods: write serializes the object's fields to out (the write operation), and readFields deserializes the object's fields from in (the read operation).
Here is an example implementation of the interface:
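import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.Writable;

public class MyWritable implements Writable {
    // Some data
    private int counter;
    private long timestamp;

    // Serialize the fields to out in a fixed order
    public void write(DataOutput out) throws IOException {
        out.writeInt(counter);
        out.writeLong(timestamp);
    }

    // Deserialize the fields from in, in the same order they were written
    public void readFields(DataInput in) throws IOException {
        counter = in.readInt();
        timestamp = in.readLong();
    }

    // Static factory: create a new instance and populate it from in
    public static MyWritable read(DataInput in) throws IOException {
        MyWritable w = new MyWritable();
        w.readFields(in);
        return w;
    }
}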
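To see the protocol end to end, here is a minimal round-trip sketch that uses only java.io: it serializes a MyWritable into a byte array and rebuilds it through the static read(DataInput) factory. So that it compiles without Hadoop on the classpath, it declares a local Writable interface with the same two methods as org.apache.hadoop.io.Writable; the RoundTrip helper and its method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Local stand-in for org.apache.hadoop.io.Writable, so the sketch compiles without Hadoop.
interface Writable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

class MyWritable implements Writable {
    int counter;
    long timestamp;

    public void write(DataOutput out) throws IOException {
        out.writeInt(counter);    // 4 bytes, big-endian
        out.writeLong(timestamp); // 8 bytes, big-endian
    }

    public void readFields(DataInput in) throws IOException {
        counter = in.readInt();
        timestamp = in.readLong();
    }

    public static MyWritable read(DataInput in) throws IOException {
        MyWritable w = new MyWritable();
        w.readFields(in);
        return w;
    }
}

// Hypothetical helper for the example.
final class RoundTrip {
    // Serializes any Writable into a byte array.
    static byte[] toBytes(Writable w) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            w.write(out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Rebuilds a MyWritable from bytes via the static read(DataInput) factory.
    static MyWritable fromBytes(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            return MyWritable.read(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Because writeInt emits 4 bytes and writeLong 8 bytes, a MyWritable always serializes to 12 bytes with no field names or type tags. This is why readFields must read the fields in exactly the order write wrote them.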
WritableComparable
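WritableComparable objects can be compared to each other, typically via Comparators. Any type used as a key in the Hadoop Map-Reduce framework should implement this interface.
Note that Hadoop frequently uses hashCode() to partition keys, so it is important that your hashCode() implementation returns the same result across different JVM instances. The default hashCode() implementation in Object does not satisfy this property.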
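A minimal sketch of why this matters: Hadoop's default HashPartitioner chooses a reducer with (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks, so every map task, each running in its own JVM, must compute the same hash for equal keys. The PairKey class and the Partitioning helper below are hypothetical; PairKey's hash depends only on its field values, unlike Object.hashCode(), which is typically derived from object identity.

```java
// Hypothetical key type: hashCode() depends only on field values, so it is stable across JVMs.
final class PairKey {
    final int counter;
    final long timestamp;

    PairKey(int counter, long timestamp) {
        this.counter = counter;
        this.timestamp = timestamp;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof PairKey)) {
            return false;
        }
        PairKey k = (PairKey) o;
        return counter == k.counter && timestamp == k.timestamp;
    }

    @Override
    public int hashCode() {
        int result = 31 + counter;                       // 31 * 1 + counter
        result = 31 * result + Long.hashCode(timestamp); // same as (int) (t ^ (t >>> 32))
        return result;
    }
}

// Hypothetical helper using the same formula as Hadoop's HashPartitioner.
final class Partitioning {
    // Clearing the sign bit keeps the result non-negative even when hashCode() is negative.
    static int partitionFor(Object key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }
}
```

For example, new PairKey(3, 100L) hashes to 31 * 34 + 100 = 1154 in every JVM, so with 4 reduce tasks it always lands in partition 1154 % 4 = 2.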
WritableComparable extends both Writable and java.lang.Comparable: it is a Writable and also a Comparable, so it can be serialized as well as compared. Its definition is:
public interface WritableComparable<T> extends Writable, Comparable<T> {
}
WritableComparable has many implementing classes, including:
BooleanWritable, BytesWritable, ByteWritable, DoubleWritable, FloatWritable, ID, IntWritable, JobID, LongWritable, MD5Hash, NullWritable, Record, RecordTypeInfo, ShortWritable, TaskAttemptID, TaskID, Text, VIntWritable, VLongWritable (ID, JobID, TaskID and TaskAttemptID each exist in both the org.apache.hadoop.mapred and org.apache.hadoop.mapreduce packages)
Here is an example implementation:
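import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.WritableComparable;

public class MyWritableComparable implements WritableComparable<MyWritableComparable> {
    // Some data
    private int counter;
    private long timestamp;

    public void write(DataOutput out) throws IOException {
        out.writeInt(counter);
        out.writeLong(timestamp);
    }

    public void readFields(DataInput in) throws IOException {
        counter = in.readInt();
        timestamp = in.readLong();
    }

    // Natural ordering: by counter first, then by timestamp
    public int compareTo(MyWritableComparable o) {
        int cmp = Integer.compare(counter, o.counter);
        return cmp != 0 ? cmp : Long.compare(timestamp, o.timestamp);
    }

    // Consistent with hashCode() and compareTo()
    @Override
    public boolean equals(Object obj) {
        if (!(obj instanceof MyWritableComparable)) {
            return false;
        }
        MyWritableComparable other = (MyWritableComparable) obj;
        return counter == other.counter && timestamp == other.timestamp;
    }

    // Depends only on field values, so it returns the same result in every JVM
    @Override
    public int hashCode() {
        final int prime = 31;
        int result = 1;
        result = prime * result + counter;
        result = prime * result + (int) (timestamp ^ (timestamp >>> 32));
        return result;
    }
}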
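The ordering defined by compareTo is what key sorting relies on. The sketch below sorts a few keys with Collections.sort; to stay self-contained it declares local Writable and WritableComparable interfaces with the same shape as Hadoop's, and exposes the fields for the test. The SortDemo helper is hypothetical.

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Local stand-ins for the Hadoop interfaces, so the sketch compiles without Hadoop.
interface Writable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

interface WritableComparable<T> extends Writable, Comparable<T> {
}

class MyWritableComparable implements WritableComparable<MyWritableComparable> {
    int counter;
    long timestamp;

    MyWritableComparable() {
        // No-arg constructor, as Hadoop creates Writable instances reflectively.
    }

    MyWritableComparable(int counter, long timestamp) {
        this.counter = counter;
        this.timestamp = timestamp;
    }

    public void write(DataOutput out) throws IOException {
        out.writeInt(counter);
        out.writeLong(timestamp);
    }

    public void readFields(DataInput in) throws IOException {
        counter = in.readInt();
        timestamp = in.readLong();
    }

    // Natural ordering: by counter, then by timestamp.
    public int compareTo(MyWritableComparable o) {
        int cmp = Integer.compare(counter, o.counter);
        return cmp != 0 ? cmp : Long.compare(timestamp, o.timestamp);
    }
}

// Hypothetical helper: returns a copy of the keys in their natural order.
final class SortDemo {
    static List<MyWritableComparable> sortedCopy(List<MyWritableComparable> keys) {
        List<MyWritableComparable> copy = new ArrayList<>(keys);
        Collections.sort(copy); // uses compareTo
        return copy;
    }
}
```

With keys (2, 5), (1, 9) and (2, 1), the sorted order is (1, 9), (2, 1), (2, 5): counter decides first, and timestamp breaks the tie.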
RawComparator
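Comparing types is critical for MapReduce, because a key-based sort phase sits between map and reduce. Hadoop provides a raw comparison interface, RawComparator, which extends Java's java.util.Comparator interface. RawComparator lets implementations compare records directly in their serialized byte form, without first deserializing them into objects, which avoids the overhead of creating new objects.
The RawComparator interface is defined as follows: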
public interface RawComparator<T> extends Comparator<T> {
    public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2);
}
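RawComparator declares only one method of its own, compare(), which compares two objects in their binary form: b1 and b2 are the byte arrays, s1 and s2 are the start offsets of the objects within them, and l1 and l2 are their lengths.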
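A minimal sketch of a raw comparison, assuming keys are plain ints written with DataOutput.writeInt (4 bytes, big-endian): the comparator decodes each int straight from the buffer at the given offset instead of building objects. The RawComparator interface here is a local stand-in with the same shape as Hadoop's, and RawIntComparator is hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Comparator;

// Local stand-in mirroring org.apache.hadoop.io.RawComparator.
interface RawComparator<T> extends Comparator<T> {
    int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2);
}

// Hypothetical comparator for ints serialized with DataOutput.writeInt.
class RawIntComparator implements RawComparator<Integer> {
    // Decodes a 4-byte big-endian int starting at offset s.
    static int readInt(byte[] b, int s) {
        return ((b[s] & 0xff) << 24) | ((b[s + 1] & 0xff) << 16)
                | ((b[s + 2] & 0xff) << 8) | (b[s + 3] & 0xff);
    }

    // Raw path: reads 4 bytes at each offset; no Integer objects are created.
    @Override
    public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
        return Integer.compare(readInt(b1, s1), readInt(b2, s2));
    }

    // Object path inherited from java.util.Comparator.
    @Override
    public int compare(Integer a, Integer b) {
        return Integer.compare(a, b);
    }

    // Helper for the example: serializes an int the way a Writable's write() would.
    static byte[] serialize(int value) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }
}
```

The offsets matter in practice: during the sort, serialized keys usually sit side by side in a shared buffer, and s1/s2 locate each one.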
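Relatively few classes implement this interface directly; its implementing classes include:
org.apache.hadoop.io.serializer.DeserializerComparator, JavaSerializationComparator, KeyFieldBasedComparator (in both org.apache.hadoop.mapred.lib and org.apache.hadoop.mapreduce.lib.partition), RecordComparator, WritableComparator
One of them, WritableComparator, is the base class for many other comparators.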
WritableComparator
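WritableComparator is a Comparator for WritableComparables. This base implementation uses the natural ordering; to define an alternative ordering, override compare(WritableComparable, WritableComparable).
By default, the raw compare(byte[],int,int,byte[],int,int) deserializes both byte ranges into key objects and then calls compare(WritableComparable, WritableComparable), so compare-intensive operations can be optimized by overriding the raw method. Static utility methods are provided to help implement this method efficiently.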
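One pitfall these helpers address: Hadoop's WritableComparator.compareBytes compares bytes lexicographically as unsigned values, which matches numeric order for non-negative big-endian ints but not for negative ones. That is why IntWritable's Comparator decodes the values with readInt before comparing them. The RawHelpers class below re-implements both ideas locally for illustration; it is a hypothetical sketch, not Hadoop's code.

```java
// Hypothetical local versions of two kinds of static helpers WritableComparator provides.
final class RawHelpers {
    // Lexicographic comparison of unsigned bytes; a shorter common prefix sorts first.
    static int compareBytes(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
        int n = Math.min(l1, l2);
        for (int i = 0; i < n; i++) {
            int a = b1[s1 + i] & 0xff;
            int b = b2[s2 + i] & 0xff;
            if (a != b) {
                return a - b;
            }
        }
        return l1 - l2;
    }

    // Decodes a 4-byte big-endian int starting at offset s.
    static int readInt(byte[] b, int s) {
        return ((b[s] & 0xff) << 24) | ((b[s + 1] & 0xff) << 16)
                | ((b[s + 2] & 0xff) << 8) | (b[s + 3] & 0xff);
    }

    // Numeric comparison of two serialized ints, in the style of IntWritable.Comparator.
    static int compareAsInts(byte[] b1, int s1, byte[] b2, int s2) {
        return Integer.compare(readInt(b1, s1), readInt(b2, s2));
    }

    // Big-endian encoding, the same layout DataOutput.writeInt produces.
    static byte[] intBytes(int v) {
        return new byte[] {(byte) (v >>> 24), (byte) (v >>> 16), (byte) (v >>> 8), (byte) v};
    }
}
```

-1 serializes as FF FF FF FF and 1 as 00 00 00 01, so a byte-wise comparison puts -1 after 1, while decoding with readInt gives the correct numeric order.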
As mentioned above, WritableComparator is the comparator for WritableComparable objects. How does this work in practice? The source code of the WritableComparable implementations shows that most of them contain a nested class named Comparator, and each such Comparator extends WritableComparator. In other words, WritableComparator is the common base implementation behind BooleanWritable.Comparator, BytesWritable.Comparator, ByteWritable.Comparator, DoubleWritable.Comparator, FloatWritable.Comparator, IntWritable.Comparator, LongWritable.Comparator, MD5Hash.Comparator, NullWritable.Comparator, Text.Comparator and UTF8.Comparator, as well as the standalone RecordComparator class.
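The nested-Comparator pattern can be sketched as follows, modeled on BytesWritable, whose serialized form is a 4-byte length followed by the payload and whose Comparator skips that prefix before comparing the remaining bytes. MiniWritableComparator and BytesKey are hypothetical stand-ins so that the example runs without Hadoop.

```java
import java.nio.ByteBuffer;

// Hypothetical stand-in for WritableComparator: only the pieces this sketch needs.
abstract class MiniWritableComparator {
    public abstract int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2);

    // Lexicographic comparison of unsigned bytes.
    protected static int compareBytes(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
        int n = Math.min(l1, l2);
        for (int i = 0; i < n; i++) {
            int a = b1[s1 + i] & 0xff;
            int b = b2[s2 + i] & 0xff;
            if (a != b) {
                return a - b;
            }
        }
        return l1 - l2;
    }
}

// Hypothetical key type following the BytesWritable layout.
final class BytesKey {
    static final int LENGTH_BYTES = 4;
    private final byte[] data;

    BytesKey(byte[] data) {
        this.data = data.clone();
    }

    // Serialized form: 4-byte big-endian length, then the payload.
    byte[] serialize() {
        return ByteBuffer.allocate(LENGTH_BYTES + data.length)
                .putInt(data.length)
                .put(data)
                .array();
    }

    // Nested raw comparator, in the style of BytesWritable.Comparator.
    static final class Comparator extends MiniWritableComparator {
        @Override
        public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
            // Skip each length prefix and compare only the payload bytes.
            return compareBytes(b1, s1 + LENGTH_BYTES, l1 - LENGTH_BYTES,
                                b2, s2 + LENGTH_BYTES, l2 - LENGTH_BYTES);
        }
    }
}
```

Skipping the prefix matters: comparing whole buffers would put "b" (length 1) before "aa" (length 2), whereas comparing payloads puts "aa" first.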
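WritableComparator also acts as a registry of these comparators. Its static comparators field is a hash table whose keys are Class objects and whose values are the registered WritableComparator instances; key classes add entries through the static define(Class, WritableComparator) method. This is what allows WritableComparator to serve as a factory for RawComparator instances: given a class, get(Class) looks it up in the ConcurrentHashMap<Class, WritableComparator> and returns the corresponding WritableComparator, falling back to a generic WritableComparator when none has been registered.
The hash map is declared in WritableComparator as follows: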
private static final ConcurrentHashMap<Class, WritableComparator> comparators
= new ConcurrentHashMap<Class, WritableComparator>(); // registry
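A minimal sketch of the registry idea, using simplified hypothetical types (KeyComparator, ComparatorRegistry, IntKey) rather than Hadoop's own: a key class registers its comparator from a static initializer, as IntWritable does with WritableComparator.define(IntWritable.class, new Comparator()), and lookups go through a ConcurrentHashMap keyed by Class. Because a class literal such as IntKey.class does not run the static initializer, the lookup forces class initialization before giving up, as Hadoop's get() does. Unlike Hadoop, which then creates a generic WritableComparator, this sketch returns a caller-supplied fallback.

```java
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical raw comparator type (single abstract method, so lambdas can implement it).
interface KeyComparator {
    int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2);
}

// Hypothetical registry modeled on the static comparators map in WritableComparator.
final class ComparatorRegistry {
    private static final ConcurrentHashMap<Class<?>, KeyComparator> comparators =
            new ConcurrentHashMap<>();

    // Plays the role of WritableComparator.define(Class, WritableComparator).
    static void define(Class<?> c, KeyComparator comparator) {
        comparators.put(c, comparator);
    }

    // Returns the registered comparator, forcing class initialization first if needed.
    static KeyComparator get(Class<?> c, KeyComparator fallback) {
        KeyComparator comparator = comparators.get(c);
        if (comparator == null) {
            forceInit(c); // runs c's static initializer, which may call define()
            comparator = comparators.get(c);
        }
        return comparator != null ? comparator : fallback;
    }

    private static void forceInit(Class<?> c) {
        try {
            Class.forName(c.getName(), true, c.getClassLoader());
        } catch (ClassNotFoundException e) {
            throw new IllegalArgumentException("Can't initialize class " + c, e);
        }
    }
}

// Hypothetical key type that registers its raw comparator when the class is initialized.
final class IntKey {
    static {
        ComparatorRegistry.define(IntKey.class,
                (b1, s1, l1, b2, s2, l2) -> Integer.compare(readInt(b1, s1), readInt(b2, s2)));
    }

    // Decodes a 4-byte big-endian int starting at offset s.
    static int readInt(byte[] b, int s) {
        return ((b[s] & 0xff) << 24) | ((b[s + 1] & 0xff) << 16)
                | ((b[s + 2] & 0xff) << 8) | (b[s + 3] & 0xff);
    }
}
```

Keying the map by Class and using a ConcurrentHashMap lets many threads look up comparators without external locking, while each key class stays responsible for registering its own comparator.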
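Summary
The sections above briefly covered how the Writable, WritableComparable, RawComparator and WritableComparator interfaces and classes are implemented and what each one does. The class diagram below shows how they relate to one another.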
References:
http://hadoop.apache.org/docs/r2.9.1/api/org/apache/hadoop/io/Writable.html
http://hadoop.apache.org/docs/r2.9.1/api/org/apache/hadoop/io/WritableComparable.html
http://hadoop.apache.org/docs/r2.9.1/api/org/apache/hadoop/io/RawComparator.html
http://hadoop.apache.org/docs/r2.9.1/api/org/apache/hadoop/io/WritableComparator.html