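Introduction to custom keys
In Hadoop, a custom key is built from Writable types. If you use plain Java data types, they still have to be converted to Writable types in the end.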
A custom key must implement the WritableComparable interface. For the reason, see the article
"Hadoop's Writable Serialization Interface"
Custom key example
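import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.io.WritableComparator;
import org.apache.hadoop.io.WritableUtils;

public class MyKeyWritable implements WritableComparable<MyKeyWritable> {
    // Byte-level comparators provided by Text and IntWritable.
    private static final Text.Comparator TEXT_COMPARATOR = new Text.Comparator();
    private static final IntWritable.Comparator INT_COMPARATOR = new IntWritable.Comparator();

    private Text value;
    private IntWritable flag;

    // Hadoop needs a no-argument constructor to create the key before calling readFields().
    public MyKeyWritable() {
        this.set(new Text(), new IntWritable());
    }

    public MyKeyWritable(Text value, IntWritable flag) {
        this.set(value, flag);
    }

    public void set(Text value, IntWritable flag) {
        this.value = value;
        this.flag = flag;
    }

    public Text getValue() {
        return value;
    }

    public IntWritable getFlag() {
        return flag;
    }

    // Serialize the fields in a fixed order: value first, then flag.
    @Override
    public void write(DataOutput out) throws IOException {
        this.value.write(out);
        this.flag.write(out);
    }

    // Deserialize the fields in the same order used by write().
    @Override
    public void readFields(DataInput in) throws IOException {
        this.value.readFields(in);
        this.flag.readFields(in);
    }

    // Must be based on the field contents so that it is consistent with equals();
    // the default partitioner relies on hashCode() to pick a reducer.
    @Override
    public int hashCode() {
        return this.value.hashCode() * 163 + this.flag.hashCode();
    }

    @Override
    public boolean equals(Object obj) {
        if (!(obj instanceof MyKeyWritable))
            return false;
        MyKeyWritable sw = (MyKeyWritable) obj;
        return this.value.equals(sw.value) && this.flag.equals(sw.flag);
    }

    @Override
    public String toString() {
        return this.value.toString() + "|" + this.flag.get();
    }

    public static class Comparator extends WritableComparator {
        public Comparator() {
            super(MyKeyWritable.class);
        }

        @Override
        public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
            try {
                // A serialized Text is a VInt length prefix followed by the UTF-8 bytes,
                // so the field's total size is (size of the VInt) + (value of the VInt).
                // The prefix must be read with readVInt, not readInt.
                int thisValueLen = WritableUtils.decodeVIntSize(b1[s1]) + readVInt(b1, s1);
                int thatValueLen = WritableUtils.decodeVIntSize(b2[s2]) + readVInt(b2, s2);
                // The result is negative, zero, or positive when the first argument
                // is less than, equal to, or greater than the second.
                int res1 = TEXT_COMPARATOR.compare(b1, s1, thisValueLen, b2, s2, thatValueLen);
                if (res1 != 0)
                    return res1;
                // The IntWritable bytes start right after the Text bytes.
                return INT_COMPARATOR.compare(b1, s1 + thisValueLen, l1 - thisValueLen,
                        b2, s2 + thatValueLen, l2 - thatValueLen);
            } catch (IOException e) {
                throw new IllegalArgumentException(e);
            }
        }
    }

    // Object-level comparison: order by value first, then by flag.
    @Override
    public int compareTo(MyKeyWritable o) {
        int res = this.value.compareTo(o.value);
        if (res != 0)
            return res;
        return this.flag.compareTo(o.flag);
    }

    // Register the raw comparator for this key type.
    static {
        WritableComparator.define(MyKeyWritable.class, new Comparator());
    }
}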
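One point worth checking in any key class like this is hashCode(). Hadoop's default HashPartitioner picks the reducer with (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks, so two keys that are equal by content need equal hash codes, or they may be sent to different reducers. The sketch below reproduces that formula with the JDK only. PartitionSketch, partition and keyHash are hypothetical names, and plain String and int stand in for Text and IntWritable.

```java
// JDK-only illustration; class and method names are hypothetical, not Hadoop API.
class PartitionSketch {
    // Same arithmetic as Hadoop's default HashPartitioner:
    // clear the sign bit, then take the remainder by the number of reducers.
    static int partition(int hashCode, int numReduceTasks) {
        return (hashCode & Integer.MAX_VALUE) % numReduceTasks;
    }

    // A content-based hash for a (value, flag) pair. The multiplier 163 is an
    // arbitrary choice for this sketch.
    static int keyHash(String value, int flag) {
        return value.hashCode() * 163 + flag;
    }

    public static void main(String[] args) {
        // Two keys with the same content, as if created in two different map tasks.
        int first = partition(keyHash("hadoop", 1), 4);
        int second = partition(keyHash(new String("hadoop"), 1), 4);
        System.out.println(first == second);
    }
}
```

With a hash derived from the field contents, both keys land in the same partition. An identity-based hash such as Object.hashCode() gives no such guarantee.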
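Analysis
The custom key implements the WritableComparable interface: it implements the write(DataOutput out) and readFields(DataInput in) methods of Writable and the compareTo(T o) method of Comparable, and it overrides Object's equals(Object obj). hashCode() should be overridden consistently with equals, because the default partitioner uses it to choose a reducer. With that, the custom key is complete.
Why implement WritableComparator in a nested class?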
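The write/readFields contract can be sketched with the JDK alone. PairKey below is a hypothetical stand-in for the key, not Hadoop API: it uses a plain String and int, and writeUTF in place of Text's own encoding. The point it illustrates is that readFields must read the fields in the same order that write emitted them.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical stand-in for a Writable key; uses only java.io, not Hadoop.
class PairKey {
    String value = "";
    int flag;

    // Serialize the fields in a fixed order: value first, then flag.
    void write(DataOutput out) throws IOException {
        out.writeUTF(value);
        out.writeInt(flag);
    }

    // Deserialize in exactly the order used by write().
    void readFields(DataInput in) throws IOException {
        value = in.readUTF();
        flag = in.readInt();
    }

    // Demo helper: write a key to a byte array, then read it into a fresh instance.
    static PairKey roundTrip(PairKey key) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            key.write(new DataOutputStream(buffer));
            PairKey copy = new PairKey();
            copy.readFields(new DataInputStream(new ByteArrayInputStream(buffer.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        PairKey key = new PairKey();
        key.value = "hadoop";
        key.flag = 7;
        PairKey copy = roundTrip(key);
        System.out.println(copy.value + "|" + copy.flag);
    }
}
```

If readFields read the int before the string, the round trip would misinterpret the bytes, which is why both methods list the fields in the same order.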
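Although compareTo(MyKeyWritable o) is implemented, it can only compare objects. While data is being transferred, the keys have already been serialized into byte streams, so comparing them with compareTo would mean deserializing each byte stream back into an object first, and deserialization costs resources and performance. To make comparison more efficient, extend the WritableComparator class or implement the RawComparator interface and implement its compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) method. The keys are then compared directly as bytes, with no deserialization, which is faster.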
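The idea of comparing serialized bytes directly can be shown with a single int field and the JDK only. RawIntCompareSketch and its methods are hypothetical names for this illustration. The compare method has the same parameter shape as the raw compare method discussed above, and it decodes the 4 big-endian bytes written by DataOutput.writeInt in place, without creating any key object.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// JDK-only illustration of raw comparison; names are hypothetical, not Hadoop API.
class RawIntCompareSketch {
    // Serialize an int as 4 big-endian bytes, the layout DataOutput.writeInt produces.
    static byte[] serialize(int v) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            new DataOutputStream(buffer).writeInt(v);
            return buffer.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Decode 4 big-endian bytes starting at offset s, without allocating an object.
    static int readInt(byte[] b, int s) {
        return ((b[s] & 0xff) << 24) | ((b[s + 1] & 0xff) << 16)
                | ((b[s + 2] & 0xff) << 8) | (b[s + 3] & 0xff);
    }

    // Compare two serialized ints given their buffers, start offsets and lengths.
    static int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
        return Integer.compare(readInt(b1, s1), readInt(b2, s2));
    }

    public static void main(String[] args) {
        byte[] a = serialize(-5);
        byte[] b = serialize(3);
        // The raw result has the same sign as comparing the original ints.
        System.out.println(compare(a, 0, a.length, b, 0, b.length) < 0);
    }
}
```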
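TEXT_COMPARATOR and INT_COMPARATOR are the WritableComparator implementations nested inside Text and IntWritable. They can be used directly; the custom comparator simply combines them for its own purposes. (Browsing the source code is a good way to understand them.)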
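How a composite comparator hands each field to its own comparison step can be sketched without Hadoop. This is a simplified layout, not Hadoop's: the string is written with writeUTF, which uses a fixed 2-byte length prefix, whereas Text uses a variable-length integer, and the sample strings are ASCII. CompositeRawCompareSketch and its methods are hypothetical names.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// JDK-only illustration of a two-field raw comparator; not Hadoop API.
class CompositeRawCompareSketch {
    // Layout: 2-byte length prefix, string bytes, then a 4-byte big-endian int.
    static byte[] serialize(String value, int flag) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buffer);
            out.writeUTF(value);
            out.writeInt(flag);
            return buffer.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Total size of the string field: the 2-byte prefix plus the payload it announces.
    static int stringFieldLength(byte[] b, int s) {
        return 2 + (((b[s] & 0xff) << 8) | (b[s + 1] & 0xff));
    }

    // Lexicographic comparison of two byte ranges, treating bytes as unsigned.
    static int compareBytes(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
        int n = Math.min(l1, l2);
        for (int i = 0; i < n; i++) {
            int x = b1[s1 + i] & 0xff;
            int y = b2[s2 + i] & 0xff;
            if (x != y) return x - y;
        }
        return l1 - l2;
    }

    static int readInt(byte[] b, int s) {
        return ((b[s] & 0xff) << 24) | ((b[s + 1] & 0xff) << 16)
                | ((b[s + 2] & 0xff) << 8) | (b[s + 3] & 0xff);
    }

    static int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
        int len1 = stringFieldLength(b1, s1);
        int len2 = stringFieldLength(b2, s2);
        // First field: compare the string payloads, skipping the 2-byte prefixes.
        int res = compareBytes(b1, s1 + 2, len1 - 2, b2, s2 + 2, len2 - 2);
        if (res != 0) return res;
        // Second field: the int starts right after the string field.
        return Integer.compare(readInt(b1, s1 + len1), readInt(b2, s2 + len2));
    }

    public static void main(String[] args) {
        byte[] a = serialize("apple", 2);
        byte[] b = serialize("apple", 10);
        // Equal strings, so the int field decides the order.
        System.out.println(compare(a, 0, a.length, b, 0, b.length) < 0);
    }
}
```

The key step is computing the first field's length from its prefix, since that is what locates the second field inside the same buffer.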
The following code registers this comparator:
static {
    WritableComparator.define(MyKeyWritable.class, new Comparator());
}