Big Data from Beginner to Practice - HBase Advanced Features: Filters (Part 2)

发芽ing的小啊呜

Last modified 2023-11-29 10:13:51


Category: Big Data & Cloud Computing Basics. Tags: filters, hbase, big data, hadoop

First published 2021-01-13 23:07:31

Article link: https://blog.csdn.net/qq_43543789/article/details/112596780


I. About This Exercise
- 1. Overview
- 2. All Tasks
II. Walkthrough

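Ding-dong! These are 小啊呜's organized course notes. The palest ink beats the best memory, and today is another day of working to improve. Let's keep leveling up together!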

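I. About This Exercise

1. Overview

The second category of filters that HBase provides extends FilterBase directly and targets more specific use cases. Some of these filters can only screen out whole rows, so they are suited to scan operations only. For the Get() method they are too restrictive: the result either contains the entire row or nothing at all.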

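In this exercise we look at these dedicated filters:

Single column value filter (SingleColumnValueFilter)
Single column value exclude filter (SingleColumnValueExcludeFilter)
Prefix filter (PrefixFilter)
Key-only filter (KeyOnlyFilter)
First-key-only filter (FirstKeyOnlyFilter)
Inclusive stop filter (InclusiveStopFilter)
Timestamps filter (TimestampsFilter)
Column count filter (ColumnCountGetFilter)
Column pagination filter (ColumnPaginationFilter) …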
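
To make the "whole row or nothing" behaviour concrete, here is a minimal plain-JDK model of the decision a single column value filter makes for each row. It is not HBase code: the class `RowLevelFilterSketch`, the method `keepRow`, and the sample column names are hypothetical and exist only for illustration. The `filterIfMissing` flag is meant to mirror the option of the same name on HBase's SingleColumnValueFilter, which by default lets rows without the reference column pass.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

// Plain-JDK model of a row-level filter; all names are illustrative, not HBase API.
final class RowLevelFilterSketch {

    // Decide for one row: return it with every cell, or drop it entirely.
    static Optional<Map<String, String>> keepRow(Map<String, String> row,
                                                 String column,
                                                 String expected,
                                                 boolean filterIfMissing) {
        String actual = row.get(column);
        if (actual == null) {
            // Reference column absent: the row passes unless filterIfMissing is set.
            return filterIfMissing ? Optional.empty() : Optional.of(row);
        }
        return actual.equals(expected) ? Optional.of(row) : Optional.empty();
    }

    public static void main(String[] args) {
        Map<String, String> row = new LinkedHashMap<>();
        row.put("data:1", "value1");
        row.put("data:2", "value2");

        // Column matches: the whole row comes back, not just the matching cell.
        System.out.println(keepRow(row, "data:1", "value1", false));
        // Column does not match: nothing comes back.
        System.out.println(keepRow(row, "data:1", "other", false));
        // Column missing and filterIfMissing set: nothing comes back.
        System.out.println(keepRow(row, "data:9", "value1", true));
    }
}
```

Since a Get() targets a single row, a filter of this kind leaves it with only two possible outcomes, which is why these filters are mostly paired with scans.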

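2. All Tasks

The exercise has three levels: common dedicated filters, using multiple filters together, and a filter summary.

II. Walkthrough

1. Level 1: Common Dedicated Filters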

package step1;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellUtil;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.filter.Filter;
import org.apache.hadoop.hbase.filter.PageFilter;
import org.apache.hadoop.hbase.filter.PrefixFilter;
import org.apache.hadoop.hbase.util.Bytes;

public class Task {

    public void query(String tName) throws Exception {
        /********* Begin *********/
        Configuration config = HBaseConfiguration.create();
        Connection conn = ConnectionFactory.createConnection(config);
        // The table name is hard-coded here; the tName parameter is not used.
        Table table = conn.getTable(TableName.valueOf("test_tb1"));

        // Query 1: rows whose row key starts with "row5".
        Filter filter = new PrefixFilter(Bytes.toBytes("row5"));
        Scan scan = new Scan();
        scan.setFilter(filter);
        ResultScanner scanner = table.getScanner(scan);
        for (Result result : scanner) {
            System.out.println(Bytes.toString(result.getRow()));
            for (Cell cell : result.listCells()) {
                String family = Bytes.toString(CellUtil.cloneFamily(cell));
                String qualifier = Bytes.toString(CellUtil.cloneQualifier(cell));
                String value = Bytes.toString(CellUtil.cloneValue(cell));
                System.out.println("\t" + family + ":" + qualifier + " " + value);
            }
        }
        scanner.close();

        // Query 2: paged scan.
        byte[] POSTFIX = new byte[] {0};
        Filter filter1 = new PageFilter(10); // build the filter with a page size of 10 rows
        byte[] lastRow = null;
        int i = 4; // fetch at most 4 pages
        while (i > 0) {
            Scan scan1 = new Scan();
            // Attach the page filter.
            scan1.setFilter(filter1);
            // Start just after the last row of the previous page.
            if (lastRow != null) {
                byte[] startRow = Bytes.add(lastRow, POSTFIX);
                // Program output kept verbatim; it means "start paged query".
                System.out.println("开始分页查询");
                scan1.withStartRow(startRow);
            }
            ResultScanner scanner1 = table.getScanner(scan1);
            int localRows = 0;
            Result result;
            while ((result = scanner1.next()) != null) {
                System.out.println(Bytes.toString(result.getRow()));
                for (Cell cell : result.listCells()) {
                    String family = Bytes.toString(CellUtil.cloneFamily(cell));
                    String qualifier = Bytes.toString(CellUtil.cloneQualifier(cell));
                    String value = Bytes.toString(CellUtil.cloneValue(cell));
                    System.out.println("\t" + family + ":" + qualifier + " " + value);
                }
                localRows++;
                lastRow = result.getRow();
            }
            scanner1.close();
            if (localRows == 0) break; // an empty page means the table is exhausted
            i--;
        }
        table.close();
        conn.close();
        /********* End *********/
    }
}
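
The paging loop above depends on one trick: appending a zero byte to the last row key of a page yields the smallest key that sorts strictly after it, so the next scan resumes without repeating a row. The following is a minimal plain-JDK model of that idea, not HBase code. The class `PagingSketch`, its methods, and the sample row keys are hypothetical, and a `TreeSet` ordered by unsigned byte comparison stands in for the sorted table (this assumes Java 9 or later for `Arrays.compareUnsigned`).

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

// Plain-JDK model of paging over sorted row keys; all names are illustrative.
final class PagingSketch {

    // Smallest byte string that sorts strictly after lastRow: lastRow + 0x00.
    static byte[] nextStartRow(byte[] lastRow) {
        return Arrays.copyOf(lastRow, lastRow.length + 1); // copyOf pads with a zero byte
    }

    // Up to pageSize keys, starting at startRow (inclusive), or at the first key if null.
    static List<byte[]> page(TreeSet<byte[]> table, byte[] startRow, int pageSize) {
        NavigableSet<byte[]> view = (startRow == null) ? table : table.tailSet(startRow, true);
        List<byte[]> out = new ArrayList<>();
        for (byte[] key : view) {
            if (out.size() == pageSize) break;
            out.add(key);
        }
        return out;
    }

    // Build a "table" whose keys are ordered byte by byte, as row keys are.
    static TreeSet<byte[]> tableOf(String... keys) {
        Comparator<byte[]> byteOrder = Arrays::compareUnsigned;
        TreeSet<byte[]> table = new TreeSet<>(byteOrder);
        for (String k : keys) table.add(k.getBytes(StandardCharsets.UTF_8));
        return table;
    }

    public static void main(String[] args) {
        TreeSet<byte[]> table = tableOf("row1", "row10", "row2", "row3", "row4");
        byte[] lastRow = null;
        while (true) {
            byte[] start = (lastRow == null) ? null : nextStartRow(lastRow);
            List<byte[]> rows = page(table, start, 2);
            if (rows.isEmpty()) break; // an empty page ends the loop
            for (byte[] r : rows) System.out.println(new String(r, StandardCharsets.UTF_8));
            System.out.println("-- end of page --");
            lastRow = rows.get(rows.size() - 1);
        }
    }
}
```

In a real cluster PageFilter is evaluated separately on each region server, so a scan that spans several regions can return more rows than the page size. The model above ignores that and treats the table as a single sorted set.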

2. Level 2: Using Multiple Filters Together

package step2;

import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellUtil;
import org.apache.hadoop.hbase.CompareOperator;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.filter.BinaryComparator;
import org.apache.hadoop.hbase.filter.Filter;
import org.apache.hadoop.hbase.filter.FilterList;
import org.apache.hadoop.hbase.filter.RegexStringComparator;
import org.apache.hadoop.hbase.filter.RowFilter;
import org.apache.hadoop.hbase.filter.SubstringComparator;
import org.apache.hadoop.hbase.filter.ValueFilter;
import org.apache.hadoop.hbase.util.Bytes;

public class Task {

    public void query(String tName) throws Exception {
        /********* Begin *********/
        Configuration config = HBaseConfiguration.create();
        Connection conn = ConnectionFactory.createConnection(config);
        // The table name is hard-coded here; the tName parameter is not used.
        Table table = conn.getTable(TableName.valueOf("test_tb2"));

        // Query 1: row key ends in "9" AND sorts after "row50".
        Filter regFilter = new RowFilter(CompareOperator.EQUAL, new RegexStringComparator(".*9$"));
        Filter moreThanFilter = new RowFilter(CompareOperator.GREATER, new BinaryComparator(Bytes.toBytes("row50")));
        List<Filter> list = new ArrayList<>();
        list.add(regFilter);
        list.add(moreThanFilter);
        // Without an explicit operator, FilterList defaults to MUST_PASS_ALL.
        FilterList filterList1 = new FilterList(list);
        Scan scan1 = new Scan();
        scan1.setFilter(filterList1);
        ResultScanner scanner1 = table.getScanner(scan1);
        for (Result result : scanner1) {
            System.out.println(Bytes.toString(result.getRow()));
            for (Cell cell : result.listCells()) {
                String family = Bytes.toString(CellUtil.cloneFamily(cell));
                String qualifier = Bytes.toString(CellUtil.cloneQualifier(cell));
                String value = Bytes.toString(CellUtil.cloneValue(cell));
                System.out.println("\t" + family + ":" + qualifier + " " + value);
            }
        }
        scanner1.close();

        // Query 2: row key contains "93" OR cell value equals "value10".
        Filter subFilter = new RowFilter(CompareOperator.EQUAL, new SubstringComparator("93"));
        Filter valueFilter = new ValueFilter(CompareOperator.EQUAL, new BinaryComparator(Bytes.toBytes("value10")));
        List<Filter> list2 = new ArrayList<>();
        list2.add(subFilter);
        list2.add(valueFilter);
        FilterList filterList2 = new FilterList(FilterList.Operator.MUST_PASS_ONE, list2);
        Scan scan2 = new Scan();
        scan2.setFilter(filterList2);
        ResultScanner scanner2 = table.getScanner(scan2);
        for (Result result : scanner2) {
            System.out.println(Bytes.toString(result.getRow()));
            for (Cell cell : result.listCells()) {
                String family = Bytes.toString(CellUtil.cloneFamily(cell));
                String qualifier = Bytes.toString(CellUtil.cloneQualifier(cell));
                String value = Bytes.toString(CellUtil.cloneValue(cell));
                System.out.println("\t" + family + ":" + qualifier + " " + value);
            }
        }
        scanner2.close();
        table.close();
        conn.close();
        /********* End *********/
    }
}
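
The two FilterList operators used above differ only in how they combine their members: MUST_PASS_ALL behaves like a logical AND, MUST_PASS_ONE like a logical OR. Below is a minimal plain-JDK sketch of that difference applied to row keys alone. It is not HBase code: `FilterListSketch`, its methods, and the sample keys are hypothetical, and `String.compareTo` stands in for the byte-wise comparison of BinaryComparator (equivalent here only because the keys are plain ASCII).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;
import java.util.regex.Pattern;

// Plain-JDK model of combining row-key filters; all names are illustrative.
final class FilterListSketch {

    // MUST_PASS_ALL: a key survives only if every filter accepts it.
    static Predicate<String> mustPassAll(List<Predicate<String>> filters) {
        return key -> filters.stream().allMatch(f -> f.test(key));
    }

    // MUST_PASS_ONE: a key survives if at least one filter accepts it.
    static Predicate<String> mustPassOne(List<Predicate<String>> filters) {
        return key -> filters.stream().anyMatch(f -> f.test(key));
    }

    // Walk the keys in order and keep those the combined filter accepts.
    static List<String> scan(List<String> keys, Predicate<String> filter) {
        List<String> out = new ArrayList<>();
        for (String k : keys) {
            if (filter.test(k)) out.add(k);
        }
        return out;
    }

    public static void main(String[] args) {
        Pattern endsIn9 = Pattern.compile(".*9$");
        Predicate<String> regFilter = key -> endsIn9.matcher(key).find();
        Predicate<String> moreThanFilter = key -> key.compareTo("row50") > 0;
        List<Predicate<String>> filters = Arrays.asList(regFilter, moreThanFilter);

        List<String> keys = Arrays.asList("row19", "row49", "row50", "row59", "row9", "row93");
        System.out.println(scan(keys, mustPassAll(filters))); // [row59, row9]
        System.out.println(scan(keys, mustPassOne(filters))); // [row19, row49, row59, row9, row93]
    }
}
```

Note that "row9" passes the "greater than row50" test: row keys are compared character by character, not numerically, so '9' beats '5' at the fourth position.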

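3. Level 3: Filter Summary

(Figure: a filter's path from the client, over the network, to the server.)
The figure above shows how a filter is configured on the client, how it is transmitted over the network, and how it is executed on the server.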

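There are three steps:

The client creates a Scan that carries the filter;
The client sends the serialized Scan, including the filter data;
The RegionServer deserializes the Scan with its filter and uses it with its internal scanners.

In one sentence:

Filters are created on the client, sent to the server in an RPC request, and applied on the server.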
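
The three steps can be walked through with a toy example. This is a plain-JDK sketch, not HBase code: `FilterLifecycleSketch`, `PrefixRule`, and `serverScan` are hypothetical names. The method names `toByteArray` and `parseFrom` echo the ones found on HBase filter classes, but the wire format here is a bare copy of the prefix bytes, an assumption made to keep the sketch short.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-JDK model of a filter's life cycle; all names are illustrative.
final class FilterLifecycleSketch {

    // A toy prefix filter that can be turned into bytes and rebuilt from them.
    static final class PrefixRule {
        private final byte[] prefix;

        PrefixRule(byte[] prefix) {
            this.prefix = prefix.clone();
        }

        // Serialize: in this toy format the payload is just the prefix bytes.
        byte[] toByteArray() {
            return prefix.clone();
        }

        // Deserialize: rebuild an equivalent filter from the payload.
        static PrefixRule parseFrom(byte[] wire) {
            return new PrefixRule(wire);
        }

        boolean accepts(byte[] rowKey) {
            return rowKey.length >= prefix.length
                && Arrays.equals(Arrays.copyOf(rowKey, prefix.length), prefix);
        }
    }

    // "Server side": rebuild the filter from the request bytes, then apply it while scanning.
    static List<String> serverScan(byte[] wire, List<String> storedKeys) {
        PrefixRule rule = PrefixRule.parseFrom(wire);
        List<String> out = new ArrayList<>();
        for (String key : storedKeys) {
            if (rule.accepts(key.getBytes(StandardCharsets.UTF_8))) out.add(key);
        }
        return out;
    }

    public static void main(String[] args) {
        // Step 1: the client creates the filter.
        PrefixRule rule = new PrefixRule("row5".getBytes(StandardCharsets.UTF_8));
        // Step 2: the client serializes it into the request.
        byte[] wire = rule.toByteArray();
        // Step 3: the server deserializes it and filters during the scan.
        System.out.println(serverScan(wire, Arrays.asList("row4", "row5", "row50", "row6"))); // [row5, row50]
    }
}
```

Because the filtering happens in `serverScan`, only the matching keys would travel back to the client, which is the practical benefit of evaluating filters on the server.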


Ending!
More course notes will follow later!

That's all for now, bye!


Note:
Life rewards diligence; without seeking, what can one gain?
