Big Data Development Tools: Hadoop (9) Advanced HBase, Lecture 2: HBase Filters

Latest recommended article published on 2021-12-29 12:08:05

知庸vv

Category: Big Data Learning. Tags: big data, hadoop, hbase

Article link: https://blog.csdn.net/Dr_Lecter/article/details/53054551

This section introduces several HBase filters: RowFilter (row-key filter), QualifierFilter (column-qualifier filter), and FilterList.

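1. Preparation

1.1 Creating the table

① Table structure

As before, we use a student score table as the example. The table is named studentScore, the row key is name, and the column family is score. The column qualifiers are English, Math, and Computer.
The logical view of the table is as follows: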

name (row key)	score:English	score:Math	score:Computer
Zhangsan	80	85	95
Lisi	65	74	88
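To make the logical view concrete, here is a minimal plain-Java sketch that models the table as a sorted map from row key to columns. It uses no HBase classes; StudentScoreModel and its methods are hypothetical names, and a TreeMap only approximates HBase's byte-ordered row keys (the two orders agree for ASCII keys like these).

```java
import java.util.Map;
import java.util.TreeMap;

// Plain-Java model of the logical view above; not HBase code.
class StudentScoreModel {
    // row key -> ("family:qualifier" -> value); TreeMap keeps row keys sorted.
    static TreeMap<String, Map<String, String>> build() {
        TreeMap<String, Map<String, String>> table = new TreeMap<>();
        put(table, "Zhangsan", "score:English", "80");
        put(table, "Zhangsan", "score:Math", "85");
        put(table, "Zhangsan", "score:Computer", "95");
        put(table, "Lisi", "score:English", "65");
        put(table, "Lisi", "score:Math", "74");
        put(table, "Lisi", "score:Computer", "88");
        return table;
    }

    // Adds one cell, creating the row on first use.
    static void put(TreeMap<String, Map<String, String>> table,
                    String row, String column, String value) {
        table.computeIfAbsent(row, k -> new TreeMap<>()).put(column, value);
    }

    public static void main(String[] args) {
        TreeMap<String, Map<String, String>> table = build();
        System.out.println(table.firstKey());                       // Lisi sorts before Zhangsan
        System.out.println(table.get("Zhangsan").get("score:Math")); // 85
    }
}
```

Because rows are kept in key order, a scan returns Lisi before Zhangsan regardless of the order in which they were inserted.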

② Create the table with the HBase shell

create 'studentScore', 'score'

1.2 Starting the services

Start Hadoop, ZooKeeper, and HBase.

start-dfs.sh
start-yarn.sh
zkServer.sh start
start-hbase.sh

1.3 Creating the project

Create a project named hbase_study2 with the package name edu.hbase.study2. Also copy log4j.properties from hbase/conf into the project's src/ directory.

2. Code walkthrough

2.1 Opening and closing the connection

① Open the connection

Configuration conf = HBaseConfiguration.create();
conf.set("hbase.zookeeper.quorum", "localhost");
conf.set("hbase.zookeeper.property.clientPort", "2181");
Connection conn = ConnectionFactory.createConnection(conf);
Table table = conn.getTable(TableName.valueOf("studentScore"));

② Close the connection

if (table != null)
    table.close();
if (conn != null)
    conn.close();
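Since Connection and Table both implement Closeable, the null checks above can be replaced with try-with-resources. The sketch below shows only the closing order, using a hypothetical FakeResource stand-in instead of real HBase objects so that it runs without a cluster.

```java
import java.util.ArrayList;
import java.util.List;

// Demonstrates try-with-resources closing order with stand-in resources.
class CloseOrderSketch {
    static final List<String> CLOSED = new ArrayList<>();

    // Hypothetical stand-in for a closeable resource such as a connection or table.
    static class FakeResource implements AutoCloseable {
        private final String name;
        FakeResource(String name) { this.name = name; }
        @Override public void close() { CLOSED.add(name); }
    }

    // Opens "conn" then "table" and returns the order in which they were closed.
    static List<String> run() {
        CLOSED.clear();
        try (FakeResource conn = new FakeResource("conn");
             FakeResource table = new FakeResource("table")) {
            // work with the table here
        }
        return new ArrayList<>(CLOSED);
    }

    public static void main(String[] args) {
        System.out.println(run()); // [table, conn]
    }
}
```

Resources are closed in the reverse of the order they were declared, so the table is closed before the connection, which matches the manual order above.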

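2.2 RowFilter (row-key filter)

2.2.1 RowFilter constructor parameters

① The RowFilter constructor takes two parameters (the same holds for the other comparison filters, such as QualifierFilter).

The first parameter is the comparison operator. The available values are: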

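Operator	Description
LESS	Matches values less than the given value
LESS_OR_EQUAL	Matches values less than or equal to the given value
EQUAL	Matches values equal to the given value
NOT_EQUAL	Matches values not equal to the given value
GREATER_OR_EQUAL	Matches values greater than or equal to the given value
GREATER	Matches values greater than the given value
NO_OP	Excludes every value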
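The operators in the table can be sketched in plain Java as predicates over a byte-wise comparison. This is a sketch of the semantics described in the table, not HBase's source; CompareOpSketch, Op, and matches are hypothetical names. It also shows a consequence of storing scores as strings: the comparison is lexicographic, so "9" counts as greater than "85".

```java
import java.nio.charset.StandardCharsets;

// Plain-Java sketch of the comparison operators; not HBase code.
class CompareOpSketch {
    enum Op { LESS, LESS_OR_EQUAL, EQUAL, NOT_EQUAL, GREATER_OR_EQUAL, GREATER, NO_OP }

    // Lexicographic comparison of two byte arrays, treating bytes as unsigned.
    static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xff;
            int y = b[i] & 0xff;
            if (x != y) {
                return x - y;
            }
        }
        return a.length - b.length;
    }

    // True when the stored value satisfies "value <op> threshold".
    static boolean matches(Op op, String value, String threshold) {
        int c = compareBytes(value.getBytes(StandardCharsets.UTF_8),
                             threshold.getBytes(StandardCharsets.UTF_8));
        switch (op) {
            case LESS: return c < 0;
            case LESS_OR_EQUAL: return c <= 0;
            case EQUAL: return c == 0;
            case NOT_EQUAL: return c != 0;
            case GREATER_OR_EQUAL: return c >= 0;
            case GREATER: return c > 0;
            default: return false; // NO_OP excludes everything
        }
    }

    public static void main(String[] args) {
        System.out.println(matches(Op.GREATER, "9", "85")); // true: '9' sorts after '8'
        System.out.println(matches(Op.LESS, "100", "85"));  // true: '1' sorts before '8'
    }
}
```

If range comparisons on scores matter, one option is to store values zero-padded to a fixed width (for example 085 and 100) so that byte order and numeric order agree.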

The second parameter is the comparator. The available types are:

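Comparator	Description
BinaryComparator	Compares the current value with the threshold using Bytes.compareTo()
BinaryPrefixComparator	Matches on the prefix only
NullComparator	Checks whether the current value is null
BitComparator	Performs a bit-level comparison using bitwise AND, OR, or XOR
RegexStringComparator	Matches against a regular expression
SubstringComparator	Treats the threshold and the table data as String instances and matches them with contains()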
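To see how the three comparators used later in this section differ, here is a plain-Java sketch that applies them with EQUAL to four row keys. It follows the table's descriptions literally and does not model details of the real comparators such as case handling; ComparatorSketch and its method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java sketch of exact, prefix, and substring matching; not HBase code.
class ComparatorSketch {
    // Row keys in sorted order, as a scan would return them.
    static final List<String> ROW_KEYS =
            Arrays.asList("Lisi", "Lisiguang", "Zhangsan", "Zhangzhao");

    // BinaryComparator with EQUAL: the whole key must equal the threshold.
    static List<String> exact(String threshold) {
        List<String> out = new ArrayList<>();
        for (String key : ROW_KEYS) {
            if (key.equals(threshold)) out.add(key);
        }
        return out;
    }

    // BinaryPrefixComparator with EQUAL: the key must start with the threshold.
    static List<String> prefix(String threshold) {
        List<String> out = new ArrayList<>();
        for (String key : ROW_KEYS) {
            if (key.startsWith(threshold)) out.add(key);
        }
        return out;
    }

    // SubstringComparator with EQUAL: the key must contain the threshold.
    static List<String> substring(String threshold) {
        List<String> out = new ArrayList<>();
        for (String key : ROW_KEYS) {
            if (key.contains(threshold)) out.add(key);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(exact("Zhangsan")); // [Zhangsan]
        System.out.println(prefix("Zhang"));   // [Zhangsan, Zhangzhao]
        System.out.println(substring("si"));   // [Lisi, Lisiguang]
    }
}
```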

② Both the Get and Scan classes support filters; this section uses Scan for the examples.

2.2.2 Filter code

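Scan scan = new Scan();
// Do not keep the blocks read by this scan in the block cache, so a one-off scan does not evict frequently used blocks
scan.setCacheBlocks(false);
// Keep exactly one of the three statements below active and comment out the other two
// Exact match on the row key (statement 1)
Filter filter = new RowFilter(CompareFilter.CompareOp.EQUAL, new BinaryComparator(Bytes.toBytes("Zhangsan")));
// Prefix match on the row key (statement 2)
Filter filter = new RowFilter(CompareFilter.CompareOp.EQUAL, new BinaryPrefixComparator(Bytes.toBytes("Zhang")));
// Substring match on the row key (statement 3)
Filter filter = new RowFilter(CompareFilter.CompareOp.EQUAL, new SubstringComparator("si"));
scan.setFilter(filter);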

2.2.3 Formatted output code

The results also need to be printed in a readable format. The code is as follows:

// Formatted output
ResultScanner scanner = table.getScanner(scan);
for (Result result : scanner) {
    List<Cell> cells = result.listCells();
    for (Cell cell : cells) {
        String rowkey = new String(CellUtil.cloneRow(cell));
        String colFamily = new String(CellUtil.cloneFamily(cell));
        String column = new String(CellUtil.cloneQualifier(cell));
        String value = new String(CellUtil.cloneValue(cell));
        System.out.println("rowkey: " + rowkey + "  " + "colFamily: " + colFamily + "  " + "column: "+ column + "  " + "value: " + value);
    }
}
scanner.close();
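One detail in the output code above: new String(byte[]) decodes with the JVM's default charset, while Bytes.toBytes(String) always encodes as UTF-8. The two agree for ASCII data like this table's, but can differ for other text on some platforms. A minimal sketch of decoding with an explicit charset follows; DecodeSketch and its methods are hypothetical names. In HBase client code, Bytes.toString(byte[]) performs the same UTF-8 decoding.

```java
import java.nio.charset.StandardCharsets;

// Decodes stored bytes with an explicit charset instead of the platform default.
class DecodeSketch {
    // Decodes a cell's raw bytes as UTF-8.
    static String decode(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_8);
    }

    // Encodes as UTF-8, decodes again, and reports whether the text survived.
    static boolean roundTrip(String text) {
        byte[] stored = text.getBytes(StandardCharsets.UTF_8);
        return decode(stored).equals(text);
    }

    public static void main(String[] args) {
        System.out.println(decode("Zhangsan".getBytes(StandardCharsets.UTF_8)));
        // A non-ASCII row key, written with Unicode escapes
        System.out.println(roundTrip("\u5f20\u4e09"));
    }
}
```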

2.2.4 Test data and output

2.2.4.1 Running statement 1

// Exact match
Filter filter = new RowFilter(CompareFilter.CompareOp.EQUAL, new BinaryComparator(Bytes.toBytes("Zhangsan")));

① First, create two rows of test data

put 'studentScore', 'Zhangsan', 'score:English', '80'
put 'studentScore', 'Lisi', 'score:English', '65'

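② Run statement 1 with statements 2 and 3 commented out
Run the project in Eclipse.
The result is shown in the figure below:

[Figure: rowFilter 1]

As you can see, only the Zhangsan row is displayed.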

2.2.4.2 Running statement 2

// Prefix match
Filter filter = new RowFilter(CompareFilter.CompareOp.EQUAL, new BinaryPrefixComparator(Bytes.toBytes("Zhang")));

① Create test data

put 'studentScore', 'Zhangzhao', 'score:English', '90'

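The table now contains three rows: Zhangsan, Zhangzhao, and Lisi.
② Run statement 2 with statements 1 and 3 commented out

The result is shown in the figure below:

[Figure: rowFilter 2]

As you can see, two rows are displayed.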

2.2.4.3 Running statement 3

// Substring match
Filter filter = new RowFilter(CompareFilter.CompareOp.EQUAL, new SubstringComparator("si"));

① Create test data

put 'studentScore', 'Lisiguang', 'score:English', '70'

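There are now two rows whose row key contains the string si (Lisi and Lisiguang).
② Run statement 3 with statements 1 and 2 commented out
The result is shown in the figure below:

[Figure: rowFilter 3]

As you can see, two rows are displayed, as expected.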

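2.3 QualifierFilter (column-qualifier filter)

2.3.1 Code walkthrough

It is used the same way as the row filter; only the class name differs.

// Exact match on the column qualifier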
Scan scan = new Scan();
scan.setCacheBlocks(false);
Filter filter = new QualifierFilter(CompareFilter.CompareOp.EQUAL, new BinaryComparator(Bytes.toBytes("English")));
scan.setFilter(filter);

2.3.2 Running the code

① Create test data

put 'studentScore', 'Zhangsan', 'score:Math', '85'

② Run the code
The result is shown in the figure below:

[Figure: qualifierFilter]
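Unlike RowFilter, QualifierFilter works on individual cells: a row keeps only the cells whose qualifier matches, and a row with no matching cell does not appear at all. The plain-Java sketch below illustrates this on the data built up so far, assuming all earlier put commands succeeded; QualifierFilterSketch and its methods are hypothetical names, and no HBase classes are used.

```java
import java.util.Map;
import java.util.TreeMap;

// Plain-Java sketch of cell-level filtering by qualifier; not HBase code.
class QualifierFilterSketch {
    // row key -> (qualifier -> value) for the score column family.
    static Map<String, Map<String, String>> sample() {
        Map<String, Map<String, String>> table = new TreeMap<>();
        put(table, "Zhangsan", "English", "80");
        put(table, "Zhangsan", "Math", "85");
        put(table, "Lisi", "English", "65");
        put(table, "Zhangzhao", "English", "90");
        put(table, "Lisiguang", "English", "70");
        return table;
    }

    static void put(Map<String, Map<String, String>> table,
                    String row, String qualifier, String value) {
        table.computeIfAbsent(row, k -> new TreeMap<>()).put(qualifier, value);
    }

    // Keeps only cells whose qualifier equals `wanted`; rows left empty are dropped.
    static Map<String, Map<String, String>> keepQualifier(
            Map<String, Map<String, String>> table, String wanted) {
        Map<String, Map<String, String>> out = new TreeMap<>();
        for (Map.Entry<String, Map<String, String>> row : table.entrySet()) {
            Map<String, String> kept = new TreeMap<>();
            for (Map.Entry<String, String> cell : row.getValue().entrySet()) {
                if (cell.getKey().equals(wanted)) {
                    kept.put(cell.getKey(), cell.getValue());
                }
            }
            if (!kept.isEmpty()) {
                out.put(row.getKey(), kept);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(keepQualifier(sample(), "English"));
        System.out.println(keepQualifier(sample(), "Math")); // {Zhangsan={Math=85}}
    }
}
```

With "English" as the qualifier, Zhangsan still appears, but its Math cell is no longer part of the result.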

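2.4 FilterList (combining filters)

2.4.1 Code walkthrough

Here we use SingleColumnValueFilter, which uses the value of a single column to decide whether a whole row is filtered out.
It takes four parameters (from left to right):

Column family name
Column qualifier
Comparison operator
Comparator holding the value to compare against
The first two parameters identify the cell; the last two determine how the comparison is made and which value it is made against.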
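The decision a single-column value filter makes for each row, including the case where the column is missing, can be sketched in plain Java as follows. FilterIfMissingSketch and includeRow are hypothetical names; the flag mirrors the setFilterIfMissing option used in the code of this section.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Predicate;

// Plain-Java sketch of a per-row decision based on one column; not HBase code.
class FilterIfMissingSketch {
    // Returns true when the row should be included in the result.
    static boolean includeRow(Map<String, String> row, String column,
                              Predicate<String> condition, boolean filterIfMissing) {
        String value = row.get(column);
        if (value == null) {
            // Column absent: drop the row only when filterIfMissing is true.
            return !filterIfMissing;
        }
        return condition.test(value);
    }

    public static void main(String[] args) {
        Map<String, String> onlyMath = new HashMap<>();
        onlyMath.put("Math", "90");
        Predicate<String> startsWith8 = v -> v.startsWith("8");

        System.out.println(includeRow(onlyMath, "English", startsWith8, true));  // false
        System.out.println(includeRow(onlyMath, "English", startsWith8, false)); // true
    }
}
```

Leaving the flag at false means rows that lack the column pass the filter, which is rarely what a condition like "English starts with 8" intends.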

Scan scan = new Scan();
scan.setCacheBlocks(false);

// English starts with 8
SingleColumnValueFilter filter1 = new SingleColumnValueFilter(
        Bytes.toBytes("score"),
        Bytes.toBytes("English"),
        CompareOp.EQUAL,
        new BinaryPrefixComparator(Bytes.toBytes("8")));
// If the column used as the condition is missing from a row: true filters that row out; false (the default) keeps it in the result set
filter1.setFilterIfMissing(true);
// Math matches the regular expression 80|90
SingleColumnValueFilter filter2 = new SingleColumnValueFilter(
        Bytes.toBytes("score"),
        Bytes.toBytes("Math"),
        CompareOp.EQUAL,
        new RegexStringComparator("80|90"));
filter2.setFilterIfMissing(true);
List<Filter> filters = new ArrayList<Filter>();
filters.add(filter1);
filters.add(filter2);

// FilterList accepts one of two operators: MUST_PASS_ALL or MUST_PASS_ONE
// MUST_PASS_ALL behaves like logical AND (&&); it is the default
// MUST_PASS_ONE behaves like logical OR (||)

// Statement 1
FilterList filterlist = new FilterList(FilterList.Operator.MUST_PASS_ONE, filters);
// Statement 2
FilterList filterlist = new FilterList(filters);

scan.setFilter(filterlist);
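A note on the regular expression 80|90 used for the Math column. Java offers two styles of matching: find() succeeds if the pattern occurs anywhere in the input, while matches() requires the whole input to match. As far as I recall, RegexStringComparator uses find-style matching; treat that as an assumption and check it against your HBase version. The plain-Java sketch below shows the difference and how anchoring the pattern makes both styles agree; RegexAnchorSketch is a hypothetical name.

```java
import java.util.regex.Pattern;

// Compares find-style and whole-input regex matching; plain Java, not HBase code.
class RegexAnchorSketch {
    // True when the pattern occurs anywhere in the value.
    static boolean findAnywhere(String regex, String value) {
        return Pattern.compile(regex).matcher(value).find();
    }

    // True only when the whole value matches the pattern.
    static boolean wholeMatch(String regex, String value) {
        return Pattern.compile(regex).matcher(value).matches();
    }

    public static void main(String[] args) {
        System.out.println(findAnywhere("80|90", "180"));     // true: "80" occurs inside "180"
        System.out.println(wholeMatch("80|90", "180"));       // false
        System.out.println(findAnywhere("^(80|90)$", "180")); // false: anchors force a whole match
        System.out.println(findAnywhere("^(80|90)$", "90"));  // true
    }
}
```

For two-digit scores the unanchored pattern behaves as intended; writing ^(80|90)$ simply removes the dependence on which matching style is in use.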

2.4.2 Test data

To make the results easier to read, we delete the earlier data and build a fresh data set.

disable 'studentScore'
drop 'studentScore'
create 'studentScore', 'score'
# Satisfies both conditions
put 'studentScore', 'Zhangsan', 'score:English', '85'
put 'studentScore', 'Zhangsan', 'score:Math', '90'
# Satisfies only one condition
put 'studentScore', 'Lisi', 'score:English', '90'
put 'studentScore', 'Lisi', 'score:Math', '80'
# Satisfies neither condition
put 'studentScore', 'Wangwu', 'score:English', '90'
put 'studentScore', 'Wangwu', 'score:Math', '85'

2.4.3 Running the code

① Run statement 1 with statement 2 commented out

FilterList filterlist = new FilterList(FilterList.Operator.MUST_PASS_ONE, filters);

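This behaves like logical OR (||), so the expected result contains two rows, Zhangsan and Lisi.
The result is shown in the figure below:

[Figure: custom filter 1]

② Run statement 2 with statement 1 commented out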

FilterList filterlist = new FilterList(filters);

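This behaves like logical AND (&&), so the expected result contains only the Zhangsan row.
The result is shown in the figure below:

[Figure: custom filter 2]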
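The expected results for both operators can be checked without a cluster. The plain-Java sketch below applies the two conditions (English starts with 8, Math matches 80|90) to the three test rows and combines them the way MUST_PASS_ALL and MUST_PASS_ONE are described above. FilterListSketch and its methods are hypothetical names; the regex is applied find-style, which is an assumption about the real comparator and makes no difference for these two-digit values.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Pattern;

// Plain-Java sketch of combining two row conditions with AND / OR; not HBase code.
class FilterListSketch {
    // row key -> (qualifier -> value); TreeMap mimics sorted scan order.
    static Map<String, Map<String, String>> sample() {
        Map<String, Map<String, String>> table = new TreeMap<>();
        table.put("Zhangsan", row("85", "90"));
        table.put("Lisi", row("90", "80"));
        table.put("Wangwu", row("90", "85"));
        return table;
    }

    static Map<String, String> row(String english, String math) {
        Map<String, String> cells = new TreeMap<>();
        cells.put("English", english);
        cells.put("Math", math);
        return cells;
    }

    // Condition 1: English starts with "8" (a missing column fails the condition).
    static boolean englishStartsWith8(Map<String, String> cells) {
        String v = cells.get("English");
        return v != null && v.startsWith("8");
    }

    // Condition 2: Math matches the regex 80|90 (a missing column fails the condition).
    static boolean mathMatches(Map<String, String> cells) {
        String v = cells.get("Math");
        return v != null && Pattern.compile("80|90").matcher(v).find();
    }

    // mustPassAll = true models MUST_PASS_ALL; false models MUST_PASS_ONE.
    static List<String> scan(boolean mustPassAll) {
        List<String> rows = new ArrayList<>();
        for (Map.Entry<String, Map<String, String>> e : sample().entrySet()) {
            boolean a = englishStartsWith8(e.getValue());
            boolean b = mathMatches(e.getValue());
            if (mustPassAll ? (a && b) : (a || b)) {
                rows.add(e.getKey());
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        System.out.println(scan(false)); // [Lisi, Zhangsan]
        System.out.println(scan(true));  // [Zhangsan]
    }
}
```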

2.5 Complete code

package edu.hbase.study2;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellUtil;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.filter.BinaryComparator;
import org.apache.hadoop.hbase.filter.BinaryPrefixComparator;
import org.apache.hadoop.hbase.filter.CompareFilter;
import org.apache.hadoop.hbase.filter.Filter;
import org.apache.hadoop.hbase.filter.FilterList;
import org.apache.hadoop.hbase.filter.QualifierFilter;
import org.apache.hadoop.hbase.filter.RegexStringComparator;
import org.apache.hadoop.hbase.filter.RowFilter;
import org.apache.hadoop.hbase.filter.SingleColumnValueFilter;
import org.apache.hadoop.hbase.filter.SubstringComparator;
import org.apache.hadoop.hbase.filter.CompareFilter.CompareOp;
import org.apache.hadoop.hbase.util.Bytes;

public class EduFilter {
    public static Configuration conf;
    public static Connection conn;
    public static Table table;

    public static void init() throws IOException {
        conf = HBaseConfiguration.create();
        conf.set("hbase.zookeeper.quorum", "localhost");
        conf.set("hbase.zookeeper.property.clientPort", "2181");
        conn = ConnectionFactory.createConnection(conf);
        table = conn.getTable(TableName.valueOf("studentScore"));
    }

    public static void close() throws IOException {
        if (table != null) {
            table.close();
        }

        if (conn != null) {
            conn.close();
        }

    }

    public static void rowFilter() throws IOException {
        init();
        Scan scan = new Scan();
        // Do not keep the blocks read by this scan in the block cache, so a one-off scan does not evict frequently used blocks
        scan.setCacheBlocks(false);
        // Exact match on the row key
//      Filter filter = new RowFilter(CompareFilter.CompareOp.EQUAL, new BinaryComparator(Bytes.toBytes("Zhangsan")));
        // Prefix match on the row key
//      Filter filter = new RowFilter(CompareFilter.CompareOp.EQUAL, new BinaryPrefixComparator(Bytes.toBytes("Zhang")));
        // Substring match on the row key
        Filter filter = new RowFilter(CompareOp.EQUAL, new SubstringComparator("si"));

        scan.setFilter(filter);
        resultOut(scan);
        close();
    }
    public static void columnFilter() throws IOException {
        init();
        Scan scan = new Scan();
        scan.setCacheBlocks(false);
        Filter filter = new QualifierFilter(CompareOp.EQUAL, new BinaryComparator(Bytes.toBytes("English")));
        scan.setFilter(filter);
        resultOut(scan);
        close();
    }
    public static void resultOut(Scan scan) throws IOException {
        ResultScanner scanner = table.getScanner(scan);
        for (Result result : scanner) {
            List<Cell> cells = result.listCells();
            for (Cell cell : cells) {
                String rowkey = new String(CellUtil.cloneRow(cell));
                String colFamily = new String(CellUtil.cloneFamily(cell));
                String column = new String(CellUtil.cloneQualifier(cell));
                String value = new String(CellUtil.cloneValue(cell));
                System.out.println("rowkey: " + rowkey + "  " + "colFamily: "
                        + colFamily + "  " + "column: "+ column + "  " + "value: " + value);
            }
        }
        scanner.close();
    }
    public static void filterList() throws IOException {
        init();
        Scan scan = new Scan();
        scan.setCacheBlocks(false);

        SingleColumnValueFilter filter1 = new SingleColumnValueFilter(
                Bytes.toBytes("score"),
                Bytes.toBytes("English"),
                CompareOp.EQUAL,
                new BinaryPrefixComparator(Bytes.toBytes("8")));
        filter1.setFilterIfMissing(true);

        SingleColumnValueFilter filter2 = new SingleColumnValueFilter(
                Bytes.toBytes("score"),
                Bytes.toBytes("Math"),
                CompareOp.EQUAL,
                new RegexStringComparator("80|90"));
        filter2.setFilterIfMissing(true);

        List<Filter> filters = new ArrayList<Filter>();
        filters.add(filter1);
        filters.add(filter2);

        // MUST_PASS_ALL behaves like && (the default)
        // MUST_PASS_ONE behaves like ||

        FilterList filterlist = new FilterList(FilterList.Operator.MUST_PASS_ONE, filters);

//      FilterList filterlist = new FilterList(filters);
        scan.setFilter(filterlist);
        resultOut(scan);

        close();
    }

    public static void main(String[] args) throws IOException {
//      rowFilter();
//      columnFilter();
        filterList();
    }
}

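3. Summary

This section covered three commonly used kinds of filter: the row filter, the column-qualifier filter, and a custom combined filter.
For the combined filter, I set up two filters and used FilterList to combine them with AND (&&) and OR (||).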

