HBase Filters: How the Comparators Work, with a Look at the Source


禅克


Article link: https://blog.csdn.net/weixin_42047967/article/details/105757146


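This article takes a close look at the comparators used by HBase filters: BinaryComparator, BinaryPrefixComparator, BitComparator, LongComparator, NullComparator, RegexStringComparator, and SubstringComparator. It explains how each one works and how it is implemented in the source, with examples that demonstrate the comparison rules and show how the comparators are applied in HBase filtering.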


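Preface: the previous article, "HBase Filter Overview", briefly introduced how HBase filters are put together and how their class hierarchy looks. This article is a supplement covering the comparators those filters rely on, a basic skill worth having before going further with HBase Filter. The source code quoted here comes from HBase 1.1.2.2.6.5.0-292 (the HDP distribution).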

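Every HBase comparator implementation extends the parent class ByteArrayComparable, which in turn implements the Comparable interface (as Comparable<byte[]>). The comparators differ only in how they implement the parent's compareTo() method.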
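To make the hierarchy concrete, here is a minimal sketch in plain Java with no HBase dependency. The class names SketchComparable and SketchContainsComparable are hypothetical stand-ins, not HBase classes; they only mirror the pattern described above, in which the parent stores the reference value and each subclass supplies its own compareTo() logic.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical, simplified stand-in for the role ByteArrayComparable plays.
abstract class SketchComparable implements Comparable<byte[]> {
    protected final byte[] value; // the value the comparator was constructed with

    SketchComparable(byte[] value) {
        this.value = value;
    }

    // Each subclass decides how a slice of the candidate bytes is compared.
    abstract int compareTo(byte[] other, int offset, int length);

    // Comparable<byte[]> entry point: compare against the whole array.
    @Override
    public int compareTo(byte[] other) {
        return compareTo(other, 0, other.length);
    }
}

// Hypothetical subclass: returns 0 when the candidate contains `value` as text, 1 otherwise.
class SketchContainsComparable extends SketchComparable {
    SketchContainsComparable(byte[] value) {
        super(value);
    }

    @Override
    int compareTo(byte[] other, int offset, int length) {
        String candidate = new String(other, offset, length, StandardCharsets.UTF_8);
        String needle = new String(value, StandardCharsets.UTF_8);
        return candidate.contains(needle) ? 0 : 1;
    }
}
```

A filter built on this pattern would typically treat 0 as "matches" and anything else as "does not match", which is why a comparator of this kind only needs two return values.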

The sections below go through the seven comparators that HBase Filter provides by default, one at a time.

1. BinaryComparator

Description: a binary comparator that compares the given byte array in lexicographic order.

Start with a small example:

import org.apache.hadoop.hbase.filter.BinaryComparator;
import org.apache.hadoop.hbase.util.Bytes;

public class BinaryComparatorDemo {

    public static void main(String[] args) {
        // The comparator holds "bbb"; each call compares it with the given slice (array, offset, length).
        BinaryComparator bc = new BinaryComparator(Bytes.toBytes("bbb"));

        int code1 = bc.compareTo(Bytes.toBytes("bbb"), 0, 3);
        System.out.println(code1); // 0
        int code2 = bc.compareTo(Bytes.toBytes("aaa"), 0, 3);
        System.out.println(code2); // 1
        int code3 = bc.compareTo(Bytes.toBytes("ccc"), 0, 3);
        System.out.println(code3); // -1
        int code4 = bc.compareTo(Bytes.toBytes("bbf"), 0, 3);
        System.out.println(code4); // -4
        int code5 = bc.compareTo(Bytes.toBytes("bbbedf"), 0, 6);
        System.out.println(code5); // -3
    }
}

The output suggests the following comparison rules:

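If the two strings differ in their first byte, that byte decides the sign of the result: positive when the comparator's own byte is larger, negative when it is smaller (code2 and code3).
If the leading bytes are equal, the comparison moves on byte by byte until the first difference, which decides the result in the same way; code4 is the byte difference 'b' - 'f' = -4. On the word-at-a-time paths of the implementation the result is only -1 or 1, so rely on the sign rather than the magnitude.
If the strings have different lengths and every byte in the shared prefix is equal, the result is the difference of the two lengths (code5: 3 - 6 = -3).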
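The three rules can be sketched as a plain byte-by-byte loop. This is an illustration under stated assumptions, not the HBase implementation: the class LexCompareSketch and its methods are hypothetical names, and the loop always returns the byte difference, whereas an optimized implementation may return only -1 or 1 for some inputs.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper illustrating lexicographic comparison of byte slices.
class LexCompareSketch {

    static int compare(byte[] left, int leftOff, int leftLen,
                       byte[] right, int rightOff, int rightLen) {
        int min = Math.min(leftLen, rightLen);
        for (int i = 0; i < min; i++) {
            int a = left[leftOff + i] & 0xff;   // mask so bytes compare as unsigned values
            int b = right[rightOff + i] & 0xff;
            if (a != b) {
                return a - b;                   // rules 1 and 2: first differing byte decides
            }
        }
        return leftLen - rightLen;              // rule 3: shared prefix equal, lengths decide
    }

    // Convenience overload for whole strings.
    static int compare(String left, String right) {
        byte[] l = left.getBytes(StandardCharsets.UTF_8);
        byte[] r = right.getBytes(StandardCharsets.UTF_8);
        return compare(l, 0, l.length, r, 0, r.length);
    }

    public static void main(String[] args) {
        System.out.println(compare("bbb", "bbb"));    // 0
        System.out.println(compare("bbb", "aaa"));    // 1
        System.out.println(compare("bbb", "ccc"));    // -1
        System.out.println(compare("bbb", "bbf"));    // -4
        System.out.println(compare("bbb", "bbbedf")); // -3
    }
}
```

Masking with 0xff matters because Java bytes are signed: without it, a byte such as 0x80 would sort before 0x7f.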

Here is how the source of its compareTo() method implements these rules. Implementation one:

static enum UnsafeComparer implements Bytes.Comparer<byte[]> {
INSTANCE;
....
public int compareTo(byte[] buffer1, int offset1, int length1, byte[] buffer2, int offset2, int length2) {
    if (buffer1 == buffer2 && offset1 == offset2 && length1 == length2) {
        return 0;
    } else {
        int minLength = Math.min(length1, length2);
        int minWords = minLength / 8;
        long offset1Adj = (long)(offset1 + BYTE_ARRAY_BASE_OFFSET);
        long offset2Adj = (long)(offset2 + BYTE_ARRAY_BASE_OFFSET);
        int j = minWords << 3;

        int offset;
        for(offset = 0; offset < j; offset += 8) {
            long lw = theUnsafe.getLong(buffer1, offset1Adj + (long)offset);
            long rw = theUnsafe.getLong(buffer2, offset2Adj + (long)offset);
            long diff = lw ^ rw;
            if (diff != 0L) {
                return lessThanUnsignedLong(lw, rw) ? -1 : 1;
            }
        }

        offset = j;
        int b;
        int a;
        if (minLength - j >= 4) {
            a = theUnsafe.getInt(buffer1, offset1Adj + (long)j);
            b = theUnsafe.getInt(buffer2, offset2Adj + (long)j);
            if (a != b) {
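The listing breaks off at this point. The word-at-a-time idea in its loop can still be illustrated with standard-library calls only. The sketch below is an assumption-laden stand-in, not the HBase code: WordCompareSketch and compareFirstWord are hypothetical names, ByteBuffer replaces the Unsafe reads, and the long is read in big-endian order so that an unsigned comparison of the two words matches lexicographic byte order.

```java
import java.nio.ByteBuffer;

// Hypothetical illustration of comparing 8 bytes at a time as one unsigned 64-bit word.
class WordCompareSketch {

    // Compares the first 8 bytes of each array; both arrays must hold at least 8 bytes.
    static int compareFirstWord(byte[] left, byte[] right) {
        long lw = ByteBuffer.wrap(left, 0, 8).getLong();  // ByteBuffer is big-endian by default
        long rw = ByteBuffer.wrap(right, 0, 8).getLong();
        if ((lw ^ rw) == 0L) {
            return 0;                                     // all 8 bytes identical
        }
        // Unsigned comparison: a signed one would misorder words whose top bit is set.
        return Long.compareUnsigned(lw, rw) < 0 ? -1 : 1;
    }
}
```

Note that this path reports only -1, 0, or 1: once eight bytes are folded into a single word, the difference of the individual bytes is no longer available without extra work.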
