RawComparator is used to compare Writable objects.
For example, these Job methods each take one:
Job.setSortComparatorClass(Class<? extends RawComparator> cls);
Job.setGroupingComparatorClass(Class<? extends RawComparator> cls);
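A Writable that can serve as a key has the following characteristics:
it must implement the WritableComparable interface;
it usually ships with a comparator class that extends WritableComparator.
The WritableComparator class, in turn, implements the RawComparator interface.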
public interface WritableComparable<T> extends Writable, Comparable<T>;
public interface RawComparator<T> extends Comparator<T>;
public class WritableComparator implements RawComparator;
One of its methods deserves a closer look:
public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2);
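This method compares two Writable objects in their serialized form, byte by byte. b1 and b2 are the buffers holding the objects, s1 and s2 are the offsets where each object starts, and l1 and l2 are their serialized lengths.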
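To make the (buffer, start, length) convention concrete, here is a minimal sketch of a lexicographic comparison over two byte ranges. ByteRangeCompare is a hypothetical helper written in plain Java for illustration. It is not Hadoop code, although WritableComparator.compareBytes takes the same argument order.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical helper (not part of Hadoop): compares two byte ranges lexicographically. */
final class ByteRangeCompare {

    /**
     * Compares b1[s1 .. s1+l1) with b2[s2 .. s2+l2), treating every byte as an
     * unsigned value in 0..255. If one range is a prefix of the other, the
     * shorter range sorts first.
     */
    static int compareBytes(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
        int end1 = s1 + l1;
        int end2 = s2 + l2;
        for (int i = s1, j = s2; i < end1 && j < end2; i++, j++) {
            int a = b1[i] & 0xff;
            int b = b2[j] & 0xff;
            if (a != b) {
                return a - b;
            }
        }
        return l1 - l2;
    }

    public static void main(String[] args) {
        byte[] x = "apple".getBytes(StandardCharsets.UTF_8);
        byte[] y = "apricot".getBytes(StandardCharsets.UTF_8);
        // The third byte differs: 'p' < 'r', so the result is negative.
        System.out.println(compareBytes(x, 0, x.length, y, 0, y.length) < 0); // true
        // Restricting both ranges to the first two bytes compares "ap" with "ap".
        System.out.println(compareBytes(x, 0, 2, y, 0, 2)); // 0
    }
}
```

No object is deserialized here, which is the whole point of comparing raw bytes during the sort.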
Let's run an experiment:
import java.util.Arrays;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.WritableUtils;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
...
private static final Logger log = LoggerFactory.getLogger(...class);
public static void main (String[] args) {
Text text = new Text(
"01234567890123456789012345678901234567890123456789"
+ "01234567890123456789012345678901234567890123456789"
+ "01234567890123456789012345678901234567890123456789"
+ "01234567890123456789012345678901234567890123456789"
+ "01234567890123456789012345678901234567890123456789"
+ "01234567890123456789012345678901234567890123456789");
/*
CharsetEncoder encoder = Charset.forName("UTF-8").newEncoder()
.onMalformedInput(CodingErrorAction.REPORT)
.onUnmappableCharacter(CodingErrorAction.REPORT);
CharBuffer charBuffer = CharBuffer.wrap(text.toString().toCharArray());
ByteBuffer byteBuffer = encoder.encode(charBuffer);
int l1 = byteBuffer.limit();
byte[] byteArray = byteBuffer.array();
DataOutputBuffer out = new DataOutputBuffer();
WritableUtils.writeVInt(out, l1);
out.write(byteArray, 0, l1);
out.close();
byte[] b1 = out.getData();
*/
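byte[] b1 = WritableUtils.toByteArray(text);
// l1 is the serialized length (VInt prefix + payload), as RawComparator.compare receives it
int l1 = b1.length;
int s1 = 0;
// n1 is the number of bytes occupied by the VInt length prefix
int n1 = WritableUtils.decodeVIntSize(b1[s1]);
log.info("[{}, {}]", l1, n1);
// Skip the prefix to recover the payload bytes
byte[] b2 = Arrays.copyOfRange(b1, s1 + n1, s1 + l1);
log.info(new String(b2));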
}
Output:
[303, 3] 012345678901234567890123456789012345678901...
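When a Text is serialized, it writes the actual byte length of the string at the very start of the byte array. The commented-out part of the example above does the same thing by hand, mirroring the method below.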
class Text:
public void write(DataOutput out) throws IOException {
WritableUtils.writeVInt(out, length);
out.write(bytes, 0, length);
}
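To see why a 300-byte string serializes to 303 bytes with a 3-byte prefix, here is a rough plain-Java sketch of a variable-length length prefix. VIntPrefixSketch and its methods are hypothetical names. The byte layout is modelled on my understanding of what WritableUtils.writeVInt produces for non-negative values, so treat it as an assumption and check it against your Hadoop version. Real code should call WritableUtils directly.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

/** Hypothetical stand-in for the length prefix; real code should use WritableUtils. */
final class VIntPrefixSketch {

    /** Writes a non-negative length: one byte if it fits in 0..127, otherwise a marker byte plus big-endian bytes. */
    static void writeLength(ByteArrayOutputStream out, int value) {
        if (value < 0) {
            throw new IllegalArgumentException("length must be non-negative");
        }
        if (value <= 127) {
            out.write(value);
            return;
        }
        int byteCount = 0;
        for (int tmp = value; tmp != 0; tmp >>>= 8) {
            byteCount++;
        }
        out.write(-112 - byteCount); // marker: -113 means 1 byte follows, -114 means 2, ...
        for (int i = byteCount - 1; i >= 0; i--) {
            out.write((value >>> (8 * i)) & 0xff);
        }
    }

    /** Total size of the prefix (marker included), derived from its first byte. */
    static int prefixSize(byte first) {
        if (first >= -112) {
            return 1;
        }
        if (first < -120) {
            return -119 - first; // markers for negative values; never produced by writeLength
        }
        return -111 - first;
    }

    /** Length prefix followed by the UTF-8 payload, the same shape as Text.write above. */
    static byte[] serialize(String s) {
        byte[] payload = s.getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeLength(out, payload.length);
        out.write(payload, 0, payload.length);
        return out.toByteArray();
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 30; i++) {
            sb.append("0123456789"); // 300 ASCII characters in total
        }
        byte[] b = serialize(sb.toString());
        System.out.println(b.length + ", " + prefixSize(b[0])); // 303, 3
    }
}
```

A length of 300 does not fit in a single byte, so the prefix is one marker byte plus two length bytes, which accounts for the 3 in the experiment's output.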
RawComparator<Text> comparator = new RawComparator<Text>() {
    @Override
    public int compare(Text t1, Text t2) {
        return t1.toString().compareTo(t2.toString());
    }
    @Override
    public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
        // Size of the VInt length prefix at the start of each serialized Text
        int n1 = WritableUtils.decodeVIntSize(b1[s1]);
        int n2 = WritableUtils.decodeVIntSize(b2[s2]);
        // This is how Text implements the comparison:
        // WritableComparator.compareBytes(b1, s1 + n1, l1 - n1, b2, s2 + n2, l2 - n2);
        // It could just as well be done like this, at the cost of copying and decoding:
        byte[] _b1 = Arrays.copyOfRange(b1, s1 + n1, s1 + l1);
        byte[] _b2 = Arrays.copyOfRange(b2, s2 + n2, s2 + l2);
        String t1 = new String(_b1);
        String t2 = new String(_b2);
        return compare(new Text(t1), new Text(t2));
    }
};
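The two strategies in the comparator above can be tried side by side without Hadoop. The sketch below uses a deliberately simplified, hypothetical record format (a single length byte followed by a UTF-8 payload of at most 127 bytes) and places two records back to back in one buffer, so the offsets s1 and s2 actually matter. PrefixSkippingCompare and its methods are made-up names for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical demo: compare serialized records in place versus by copying and decoding. */
final class PrefixSkippingCompare {

    /** Simplified assumption: every record starts with exactly one length byte. */
    static final int PREFIX = 1;

    /** Builds a record: one length byte, then the UTF-8 payload (at most 127 bytes). */
    static byte[] record(String s) {
        byte[] payload = s.getBytes(StandardCharsets.UTF_8);
        if (payload.length > 127) {
            throw new IllegalArgumentException("payload too long for a 1-byte prefix");
        }
        byte[] out = new byte[PREFIX + payload.length];
        out[0] = (byte) payload.length;
        System.arraycopy(payload, 0, out, PREFIX, payload.length);
        return out;
    }

    /** Skips the prefix and compares the payload bytes directly inside the buffers. */
    static int compareInPlace(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
        int n = Math.min(l1, l2);
        for (int i = PREFIX; i < n; i++) {
            int a = b1[s1 + i] & 0xff;
            int b = b2[s2 + i] & 0xff;
            if (a != b) {
                return a - b;
            }
        }
        return l1 - l2;
    }

    /** Copies each payload out, decodes it to a String, and compares the Strings. */
    static int compareByCopy(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
        String t1 = new String(Arrays.copyOfRange(b1, s1 + PREFIX, s1 + l1), StandardCharsets.UTF_8);
        String t2 = new String(Arrays.copyOfRange(b2, s2 + PREFIX, s2 + l2), StandardCharsets.UTF_8);
        return t1.compareTo(t2);
    }

    public static void main(String[] args) {
        byte[] r1 = record("banana");
        byte[] r2 = record("band");
        // Two records back to back in a single buffer.
        byte[] buf = new byte[r1.length + r2.length];
        System.arraycopy(r1, 0, buf, 0, r1.length);
        System.arraycopy(r2, 0, buf, r1.length, r2.length);

        int inPlace = compareInPlace(buf, 0, r1.length, buf, r1.length, r2.length);
        int byCopy = compareByCopy(buf, 0, r1.length, buf, r1.length, r2.length);
        // Both orderings agree: "banana" sorts before "band".
        System.out.println((inPlace < 0) + " " + (byCopy < 0)); // true true
    }
}
```

For ASCII payloads like these the two approaches give the same ordering. The in-place version avoids allocating two arrays and two Strings per comparison, which adds up when a sort calls the comparator many times.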