RawComparator

RawComparator is used to compare Writable objects,

for example:

Job.setSortComparatorClass(Class<? extends RawComparator> cls);
Job.setGroupingComparatorClass(Class<? extends RawComparator> cls);

 

 

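A Writable that can be used as a key has the following characteristics:

 It must implement the WritableComparable interface;

 It usually contains a comparator class that extends WritableComparator.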

 

The WritableComparator class, in turn, implements the RawComparator interface.

 

public interface WritableComparable<T> extends Writable, Comparable<T>;

public interface RawComparator<T> extends Comparator<T>;

public class WritableComparator implements RawComparator;

 

 

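One of its methods deserves a closer look:

public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2);

This method compares two Writable objects in their serialized byte form, without deserializing them: b1 and b2 are the buffers, s1 and s2 the start offsets, and l1 and l2 the lengths in bytes.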
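To make the offset/length parameters concrete, here is a minimal plain-Java sketch of a range-based, unsigned lexicographic byte comparison. It is written in the spirit of WritableComparator.compareBytes but is not Hadoop's code; the class name RawCompareSketch and the sample buffer are made up for illustration.

```java
import java.nio.charset.StandardCharsets;

final class RawCompareSketch {

    /** Compares b1[s1, s1+l1) with b2[s2, s2+l2) byte by byte, reading bytes as unsigned. */
    static int compareBytes(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
        int end1 = s1 + l1;
        int end2 = s2 + l2;
        for (int i = s1, j = s2; i < end1 && j < end2; i++, j++) {
            int a = b1[i] & 0xff; // mask so that 0x80..0xff sort after 0x00..0x7f
            int b = b2[j] & 0xff;
            if (a != b) {
                return a - b;
            }
        }
        // All shared bytes are equal: the shorter range sorts first.
        return l1 - l2;
    }

    public static void main(String[] args) {
        // Two records packed into one buffer, as they would be in a sort buffer.
        byte[] buf = "xxapplexxapricot".getBytes(StandardCharsets.US_ASCII);
        // "apple" starts at offset 2 (length 5), "apricot" at offset 9 (length 7).
        int r = compareBytes(buf, 2, 5, buf, 9, 7);
        System.out.println(r < 0 ? "apple sorts first" : "apricot sorts first");
        // prints "apple sorts first"
    }
}
```

Both records live in the same array here, which is why the method takes offsets and lengths instead of two standalone arrays.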

 

A quick experiment:

import java.util.Arrays;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.WritableUtils;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

...
private static final Logger log = LoggerFactory.getLogger(...class);

public static void main (String[] args) {
	Text text = new Text(
		"01234567890123456789012345678901234567890123456789"
		+ "01234567890123456789012345678901234567890123456789"
		+ "01234567890123456789012345678901234567890123456789"
		+ "01234567890123456789012345678901234567890123456789"
		+ "01234567890123456789012345678901234567890123456789"
		+ "01234567890123456789012345678901234567890123456789");

	/*
	CharsetEncoder encoder = Charset.forName("UTF-8").newEncoder()
				.onMalformedInput(CodingErrorAction.REPORT)
				.onUnmappableCharacter(CodingErrorAction.REPORT);
	CharBuffer charBuffer = CharBuffer.wrap(text.toString().toCharArray());
	ByteBuffer byteBuffer = encoder.encode(charBuffer);
	int l1 = byteBuffer.limit();

	byte[] byteArray = byteBuffer.array();
	DataOutputBuffer out = new DataOutputBuffer();
	WritableUtils.writeVInt(out, l1);
	out.write(byteArray, 0, l1);
	out.close();
	byte[] b1 = out.getData();
    */
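	// l1 is the full serialized length: the VInt length prefix plus the UTF-8 payload
	byte[] b1 = WritableUtils.toByteArray(text);
	int l1 = b1.length;

	int s1 = 0;
	// n1 is the number of bytes occupied by the VInt prefix
	int n1 = WritableUtils.decodeVIntSize(b1[s1]);

	log.info("[{}, {}]", l1, n1);

	// skip the prefix; the remaining bytes are the string itself
	byte[] b2 = Arrays.copyOfRange(b1, s1 + n1, s1 + l1);
	log.info(new String(b2));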
}

 

Output:

[303, 3]
012345678901234567890123456789012345678901...

 

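When Text is serialized, it writes the byte length of the string, as a variable-length integer (VInt), at the very start of the byte array. The commented-out part of the example above does the same thing by hand; Text itself implements it like this: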

class Text:
public void write(DataOutput out) throws IOException {
	WritableUtils.writeVInt(out, length);
	out.write(bytes, 0, length);
}
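The 3-byte prefix seen in the experiment follows from the VInt layout. The sketch below re-implements that layout in plain Java for non-negative values only, as an illustration of what WritableUtils.writeVInt and WritableUtils.decodeVIntSize do for a length field. The class and method names (VIntSketch, encodeNonNegative, decodeSize) are hypothetical and not part of Hadoop.

```java
import java.io.ByteArrayOutputStream;

final class VIntSketch {

    /**
     * Encodes a non-negative int: values up to 127 take a single byte;
     * larger values take a marker byte followed by the value's bytes, most significant first.
     */
    static byte[] encodeNonNegative(int value) {
        if (value < 0) {
            throw new IllegalArgumentException("this sketch handles non-negative values only");
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        if (value <= 127) {
            out.write(value);
            return out.toByteArray();
        }
        int n = 0; // number of bytes needed for the value itself
        for (int tmp = value; tmp != 0; tmp >>>= 8) {
            n++;
        }
        out.write(-112 - n); // marker byte: -113 means 1 value byte, -114 means 2, ...
        for (int i = n - 1; i >= 0; i--) {
            out.write((value >>> (8 * i)) & 0xff);
        }
        return out.toByteArray();
    }

    /** Returns the total encoded size (marker plus value bytes), given only the first byte. */
    static int decodeSize(byte first) {
        if (first >= -112) {
            return 1;
        } else if (first < -120) {
            return -119 - first; // marker range used for negative values
        }
        return -111 - first;
    }

    public static void main(String[] args) {
        byte[] prefix = encodeNonNegative(300);
        System.out.println(decodeSize(prefix[0])); // prints 3
        System.out.println(prefix.length + 300);   // prints 303
    }
}
```

A 300-byte string therefore serializes to 303 bytes: one marker byte, two bytes for the value 300, and the payload.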
 

 

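// needs java.util.Arrays, java.nio.charset.StandardCharsets,
// and org.apache.hadoop.io.{RawComparator, Text, WritableUtils}
RawComparator<Text> comparator = new RawComparator<Text>() {

	@Override
	public int compare(Text t1, Text t2) {
		return t1.toString().compareTo(t2.toString());
	}

	@Override
	public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
		// size of the VInt length prefix in front of each serialized Text
		int n1 = WritableUtils.decodeVIntSize(b1[s1]);
		int n2 = WritableUtils.decodeVIntSize(b2[s2]);

		// This is how Text implements the comparison:
		// WritableComparator.compareBytes(b1, s1 + n1, l1 - n1, b2, s2 + n2, l2 - n2);

		// Decoding the strings and comparing them also works,
		// at the cost of extra allocation on every call
		byte[] _b1 = Arrays.copyOfRange(b1, s1 + n1, s1 + l1);
		byte[] _b2 = Arrays.copyOfRange(b2, s2 + n2, s2 + l2);
		String t1 = new String(_b1, StandardCharsets.UTF_8); // Text is always UTF-8
		String t2 = new String(_b2, StandardCharsets.UTF_8);
		return compare(new Text(t1), new Text(t2));
	}

};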
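One caveat about the decode-and-compare variant: String.compareTo orders by UTF-16 code units, while a raw comparison orders by UTF-8 bytes. For ASCII data the two agree, but they can disagree once characters outside the Basic Multilingual Plane are involved. The plain-Java sketch below (the class name OrderingSketch and the sample characters are chosen purely for illustration) shows one such pair.

```java
import java.nio.charset.StandardCharsets;

final class OrderingSketch {

    /** Unsigned lexicographic comparison of the UTF-8 encodings of two strings. */
    static int compareUtf8Bytes(String a, String b) {
        byte[] x = a.getBytes(StandardCharsets.UTF_8);
        byte[] y = b.getBytes(StandardCharsets.UTF_8);
        int n = Math.min(x.length, y.length);
        for (int i = 0; i < n; i++) {
            int d = (x[i] & 0xff) - (y[i] & 0xff);
            if (d != 0) {
                return d;
            }
        }
        return x.length - y.length;
    }

    public static void main(String[] args) {
        String bmp = "\uFF61";          // U+FF61, a single UTF-16 code unit
        String astral = "\uD83D\uDE00"; // U+1F600, stored as a surrogate pair

        // UTF-8 bytes: EF BD A1 versus F0 9F 98 80, so bmp sorts first.
        System.out.println(Integer.signum(compareUtf8Bytes(bmp, astral))); // prints -1
        // UTF-16 code units: 0xFF61 versus 0xD83D, so bmp sorts last.
        System.out.println(Integer.signum(bmp.compareTo(astral)));        // prints 1
    }
}
```

If the sort order has to match what Text itself produces, the byte-wise comparison is the safer choice.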