HBase Advanced Operations: Filters

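What can filters do?

  • HBase provides a set of filters for selecting data. With them you can filter data along several dimensions (row, column, version, and so on).
  • In practice, filtering by row key and by column covers most use cases.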

Categories of HBase filters

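1. Filters based on row, column, and cell value
1.1----- Row-based filters
  • PrefixFilter: matches a row key prefix
  • PageFilter: row-based pagination
1.2----- Column-based filters
  • ColumnPrefixFilter: matches a column name prefix
  • FirstKeyOnlyFilter: returns only the first column of each row
1.3----- Cell-value-based filters
  • KeyOnlyFilter: the returned data excludes cell values and contains only the row key and column
  • TimestampsFilter: filters by the timestamp (version) of the data
1.4----- Filters based on column and cell value
  • SingleColumnValueFilter: compares the cell value of the given column and keeps or drops the row accordingly
  • SingleColumnValueExcludeFilter: the same comparison, but the tested column is left out of the returned row
2. Comparison filters
2.1----- A comparison filter usually needs a comparison operator and a comparator to do its work
  • RowFilter
  • FamilyFilter
  • QualifierFilter
  • ValueFilter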
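To make the "operator plus comparator" idea concrete, here is a minimal sketch in plain Java. It is not HBase code: CompareFilterSketch, its Op enum, and the helper methods are hypothetical names made up for illustration. The enum only mirrors the operator names HBase uses, and compareBytes assumes the byte-by-byte ordering that a binary comparator applies.

```java
import java.nio.charset.StandardCharsets;

/** Simplified model of a comparison filter (operator + comparator). Not HBase code. */
final class CompareFilterSketch {

    /** Mirrors the operator names used by HBase comparison filters. */
    enum Op { LESS, LESS_OR_EQUAL, EQUAL, NOT_EQUAL, GREATER_OR_EQUAL, GREATER }

    static byte[] bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    /** Lexicographic comparison of two byte arrays, treating each byte as unsigned. */
    static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }

    /** True when "actual OP expected" holds, e.g. actual GREATER expected. */
    static boolean matches(Op op, byte[] expected, byte[] actual) {
        int c = compareBytes(actual, expected);
        switch (op) {
            case LESS: return c < 0;
            case LESS_OR_EQUAL: return c <= 0;
            case EQUAL: return c == 0;
            case NOT_EQUAL: return c != 0;
            case GREATER_OR_EQUAL: return c >= 0;
            case GREATER: return c > 0;
            default: throw new IllegalArgumentException("unknown operator: " + op);
        }
    }

    public static void main(String[] args) {
        System.out.println(matches(Op.EQUAL, bytes("rowkey1"), bytes("rowkey1")));   // true
        System.out.println(matches(Op.GREATER, bytes("rowkey1"), bytes("rowkey2"))); // true
        // Bytes, not numbers: the string "10000" sorts before "2000".
        System.out.println(matches(Op.LESS, bytes("2000"), bytes("10000")));         // true
    }
}
```

The last line is the part worth remembering: values stored as strings are compared as bytes, so a salary of "10000" counts as less than "2000" unless the values are encoded in a way that preserves numeric order.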

Most commonly used filters

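Filter                Function
RowFilter             Selects the rows whose row key satisfies the comparison
PrefixFilter          Selects the rows whose row key starts with a given prefix
KeyOnlyFilter         Returns only the key of each cell (row key and column); all values are empty
ColumnPrefixFilter    Selects cells by column name prefix
ValueFilter           Selects cells by their value
TimestampsFilter      Selects cells by timestamp (version)
FilterList            Combines several filters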
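Two of these filters work on individual cells rather than on whole rows, which is easy to miss. The sketch below models that in plain Java, assuming a toy Cell class; CellFilterSketch and everything in it are hypothetical names for illustration, not HBase types. The sample data is rowkey1 from the 'man' table used in this post.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of two cell-level filters. Not HBase code. */
final class CellFilterSketch {

    /** Minimal stand-in for a cell: row key, column family, qualifier, value. */
    static final class Cell {
        final String row;
        final String family;
        final String qualifier;
        final String value;

        Cell(String row, String family, String qualifier, String value) {
            this.row = row;
            this.family = family;
            this.qualifier = qualifier;
            this.value = value;
        }

        @Override
        public String toString() {
            return row + "/" + family + ":" + qualifier + "=" + value;
        }
    }

    /** Models ColumnPrefixFilter: keep cells whose qualifier starts with the prefix. */
    static List<Cell> columnPrefix(List<Cell> cells, String prefix) {
        List<Cell> kept = new ArrayList<>();
        for (Cell cell : cells) {
            if (cell.qualifier.startsWith(prefix)) {
                kept.add(cell);
            }
        }
        return kept;
    }

    /** Models KeyOnlyFilter: keep every cell but replace its value with an empty one. */
    static List<Cell> keyOnly(List<Cell> cells) {
        List<Cell> stripped = new ArrayList<>();
        for (Cell cell : cells) {
            stripped.add(new Cell(cell.row, cell.family, cell.qualifier, ""));
        }
        return stripped;
    }

    /** The cells of rowkey1 from the sample table. */
    static List<Cell> sampleRow() {
        List<Cell> cells = new ArrayList<>();
        cells.add(new Cell("rowkey1", "basic", "age", "20"));
        cells.add(new Cell("rowkey1", "basic", "name", "zs"));
        cells.add(new Cell("rowkey1", "basic", "sex", "male"));
        cells.add(new Cell("rowkey1", "extend", "job", "student"));
        cells.add(new Cell("rowkey1", "extend", "salary", "0"));
        return cells;
    }

    public static void main(String[] args) {
        System.out.println(columnPrefix(sampleRow(), "nam")); // [rowkey1/basic:name=zs]
        System.out.println(keyOnly(sampleRow()).get(0));      // rowkey1/basic:age=
    }
}
```

This is why a scan with ColumnPrefixFilter("nam") still returns every row but shows null for age, sex, job, and salary: those cells were filtered out, the rows were not.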

Let's test each of these common filters in turn.
First, here is the table we will use:

hbase(main):003:0> scan 'man'
ROW                                    COLUMN+CELL                                                                                                   
 rowkey1                               column=basic:age, timestamp=1541251830545, value=20                                                           
 rowkey1                               column=basic:name, timestamp=1541251830506, value=zs                                                          
 rowkey1                               column=basic:sex, timestamp=1541251830540, value=male                                                         
 rowkey1                               column=extend:job, timestamp=1541251830548, value=student                                                     
 rowkey1                               column=extend:salary, timestamp=1541251830553, value=0                                                        
 rowkey2                               column=basic:age, timestamp=1541251830565, value=24                                                           
 rowkey2                               column=basic:name, timestamp=1541251830557, value=jack                                                        
 rowkey2                               column=basic:sex, timestamp=1541251830561, value=male                                                         
 rowkey2                               column=extend:job, timestamp=1541251830569, value=IT                                                          
 rowkey2                               column=extend:salary, timestamp=1541251830572, value=10000                                                    
 rowkey3                               column=basic:age, timestamp=1541251830585, value=19                                                           
 rowkey3                               column=basic:name, timestamp=1541251830577, value=rose                                                        
 rowkey3                               column=basic:sex, timestamp=1541251830580, value=female                                                       
 rowkey3                               column=extend:job, timestamp=1541251830588, value=teacher                                                     
 rowkey3                               column=extend:salary, timestamp=1541251830592, value=2000                                                     
3 row(s) in 0.2140 seconds

The code below builds on my earlier post on basic HBase Java API operations (https://blog.csdn.net/zhangshk_/article/details/83690790). Here is the one method from that post that we need:

    /**
     * Opens a scanner over the given row key range with the given filters applied.
     *
     * @param tableName  name of the table to scan
     * @param startKey   first row key of the range (inclusive)
     * @param stopKey    row key at which the scan stops (exclusive)
     * @param filterList filters to apply to the scan
     * @return a scanner for the caller to iterate over and close, or null if the scan could not be opened
     */
    public static ResultScanner getScanner(String tableName, String startKey, String stopKey, FilterList filterList) {
        try (Table table = HBaseConn.getTable(tableName)) {
            Scan scan = new Scan();
            scan.setFilter(filterList);
            scan.setStartRow(Bytes.toBytes(startKey));
            scan.setStopRow(Bytes.toBytes(stopKey));
            scan.setCaching(1000); // number of rows fetched per round trip to the server
            // Do not iterate over the scanner here: a ResultScanner can only be read once,
            // so printing the rows in this method would leave nothing for the caller.
            return table.getScanner(scan);
        } catch (Exception e) {
            e.printStackTrace();
        }
        return null;
    }
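One detail of this method matters for the tests that follow: the start row of a Scan is inclusive and the stop row is exclusive. The sketch below models that half-open range with a plain TreeMap. It is not HBase code, ScanRangeSketch and its methods are hypothetical names, and it assumes ASCII row keys so that String ordering agrees with byte ordering.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/** Models the [startKey, stopKey) range of a scan with a sorted map. Not HBase code. */
final class ScanRangeSketch {

    /** Returns the row keys in [startKey, stopKey): start inclusive, stop exclusive. */
    static List<String> scan(TreeMap<String, String> table, String startKey, String stopKey) {
        return new ArrayList<>(table.subMap(startKey, true, stopKey, false).keySet());
    }

    /** Row keys and names taken from the 'man' table used in this post. */
    static TreeMap<String, String> sampleTable() {
        TreeMap<String, String> table = new TreeMap<>();
        table.put("rowkey1", "zs");
        table.put("rowkey2", "jack");
        table.put("rowkey3", "rose");
        return table;
    }

    public static void main(String[] args) {
        System.out.println(scan(sampleTable(), "rowkey1", "rowkey3")); // [rowkey1, rowkey2]
        System.out.println(scan(sampleTable(), "rowkey1", "rowkey4")); // [rowkey1, rowkey2, rowkey3]
    }
}
```

So a call such as getScanner("man", "rowkey1", "rowkey3", filterList) never sees rowkey3, whatever the filters are. To include it, pass a stop key that sorts after it.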

Below is a series of test methods for the filters:

package com.zsk.hbase.api;

import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.filter.*;
import org.apache.hadoop.hbase.util.Bytes;
import org.junit.Test;

import java.util.Arrays;

public class HBaseFilterTest {

    @Test
    public void testRowFilter() {
        // Keep only the row whose row key equals "rowkey1".
        Filter filter = new RowFilter(CompareFilter.CompareOp.EQUAL, new BinaryComparator(Bytes.toBytes("rowkey1")));
        scanAndPrint(filter);
    }

    @Test
    public void testPrefixFilter() {
        // Keep rows whose row key starts with "row".
        Filter filter = new PrefixFilter(Bytes.toBytes("row"));
        scanAndPrint(filter);
    }

    @Test
    public void testKeyOnlyFilter() {
        // Return keys only; every value comes back empty.
        Filter filter = new KeyOnlyFilter();
        scanAndPrint(filter);
    }

    @Test
    public void testColumnPrefixFilter() {
        // Keep cells whose column name starts with "nam".
        Filter filter = new ColumnPrefixFilter(Bytes.toBytes("nam"));
        scanAndPrint(filter);
    }

    @Test
    public void testValueFilter() {
        // Keep cells whose value equals "zs".
        Filter filter = new ValueFilter(CompareFilter.CompareOp.EQUAL, new BinaryComparator(Bytes.toBytes("zs")));
        scanAndPrint(filter);
    }

    @Test
    public void testTimestampsFilter() {
        // Keep cells whose timestamp is one of the listed values.
        Filter filter = new TimestampsFilter(Arrays.asList(1541251830545L, 1541251830565L));
        scanAndPrint(filter);
    }

    /** Wraps the filter in a FilterList, scans the 'man' table, and prints the rows. */
    private static void scanAndPrint(Filter filter) {
        FilterList filterList = new FilterList(FilterList.Operator.MUST_PASS_ALL, Arrays.asList(filter));
        // The stop key is exclusive, so rowkey3 itself is not scanned.
        ResultScanner results = HBaseUtil.getScanner("man", "rowkey1", "rowkey3", filterList);
        if (results == null) {
            return; // getScanner returns null when the scan could not be opened
        }
        try {
            for (Result result : results) {
                System.out.println("rowkey == " + Bytes.toString(result.getRow()));
                printColumn(result, "basic", "name");
                printColumn(result, "basic", "age");
                printColumn(result, "basic", "sex");
                printColumn(result, "extend", "salary");
                printColumn(result, "extend", "job");
            }
        } finally {
            results.close();
        }
    }

    /** Prints one column as "family:qualifier == value"; a filtered-out cell prints null. */
    private static void printColumn(Result result, String family, String qualifier) {
        byte[] value = result.getValue(Bytes.toBytes(family), Bytes.toBytes(qualifier));
        System.out.println(family + ":" + qualifier + " == " + Bytes.toString(value));
    }
}

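Summary of the pattern:
First create a Filter, then add it to a FilterList. When building the FilterList you can specify its Operator: either every filter must pass (Operator.MUST_PASS_ALL) or passing any one of them is enough (Operator.MUST_PASS_ONE).
Then hand the FilterList to the getScanner method.
It really is quite simple.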
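The difference between the two operators is just AND versus OR. Here is a minimal sketch of that in plain Java, using Predicate as a stand-in for a filter. FilterListSketch, combine, and the two example conditions are hypothetical names for illustration; the real FilterList works on cells and has more rules than this row-level model.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

/** Models MUST_PASS_ALL / MUST_PASS_ONE as AND / OR over predicates. Not HBase code. */
final class FilterListSketch {

    /** Mirrors the two operator names of FilterList. */
    enum Operator { MUST_PASS_ALL, MUST_PASS_ONE }

    // Two example row key conditions used in the demo below.
    static final Predicate<String> HAS_PREFIX = key -> key.startsWith("rowkey");
    static final Predicate<String> IS_ROWKEY2 = key -> key.equals("rowkey2");

    /** Combines the filters: all must accept the item, or at least one must. */
    static <T> Predicate<T> combine(Operator operator, List<Predicate<T>> filters) {
        return item -> operator == Operator.MUST_PASS_ALL
                ? filters.stream().allMatch(f -> f.test(item))
                : filters.stream().anyMatch(f -> f.test(item));
    }

    public static void main(String[] args) {
        List<Predicate<String>> filters = Arrays.asList(HAS_PREFIX, IS_ROWKEY2);
        Predicate<String> all = combine(Operator.MUST_PASS_ALL, filters);
        Predicate<String> one = combine(Operator.MUST_PASS_ONE, filters);

        System.out.println(all.test("rowkey1")); // false: has the prefix but is not rowkey2
        System.out.println(all.test("rowkey2")); // true: passes both
        System.out.println(one.test("rowkey1")); // true: the prefix alone is enough
        System.out.println(one.test("other"));   // false: passes neither
    }
}
```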

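All filters are applied on the server side. If you write a custom Filter, you have to package the finished Filter as a jar and deploy it to the server side (the RegionServers). In production, custom filters are generally avoided.
In most cases a well-designed rowkey is enough to serve the queries that different scenarios require, so there is no need for a custom Filter.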
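To show what "a well-designed rowkey" can buy you, here is a minimal sketch in plain Java. It assumes a hypothetical composite row key of the form "<userId>_<yyyyMMdd>" and models the table as a TreeMap. RowKeyDesignSketch, the key layout, and the sample data are all made up for illustration, and the prefix is assumed to be non-empty ASCII. Because rows are stored sorted by key, "all rows of user u1" becomes a plain range scan with no filter at all.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/** Models a prefix query as a range scan over sorted row keys. Not HBase code. */
final class RowKeyDesignSketch {

    /** Hypothetical composite row key: user id, underscore, day. */
    static String rowKey(String userId, String day) {
        return userId + "_" + day;
    }

    /** Smallest key greater than every key with this prefix: the last character plus one. */
    static String stopKeyForPrefix(String prefix) {
        char last = prefix.charAt(prefix.length() - 1);
        return prefix.substring(0, prefix.length() - 1) + (char) (last + 1);
    }

    /** Returns the row keys starting with the prefix, using only a [start, stop) range. */
    static List<String> scanPrefix(TreeMap<String, String> table, String prefix) {
        return new ArrayList<>(
                table.subMap(prefix, true, stopKeyForPrefix(prefix), false).keySet());
    }

    /** Made-up sample rows for three users. */
    static TreeMap<String, String> sampleTable() {
        TreeMap<String, String> table = new TreeMap<>();
        table.put(rowKey("u1", "20181101"), "login");
        table.put(rowKey("u1", "20181102"), "logout");
        table.put(rowKey("u10", "20181101"), "login");
        table.put(rowKey("u2", "20181101"), "login");
        return table;
    }

    public static void main(String[] args) {
        // The trailing underscore keeps user u10 out of the result for user u1.
        System.out.println(scanPrefix(sampleTable(), "u1_")); // [u1_20181101, u1_20181102]
    }
}
```

The separator is part of the design: scanning with the prefix "u1" instead of "u1_" would also pick up user u10.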
