1.3 Dedicated Filters
----
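The second kind of filter that HBase provides is based directly on FilterBase and implements more specific use cases. Many of these filters are in fact only applicable to scan operations, because they filter out entire rows.
For get() calls these filters are too restrictive: they either include the whole row or nothing at all.
■ PrefixFilter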
-------------------------------------------------------------------------------------------------------------------------------------
When instantiating the filter you supply a row prefix, and every row whose row key starts with that prefix is returned to the client:
PrefixFilter(final byte[] prefix)
Example: Using the prefix based filter
Filter filter = new PrefixFilter(Bytes.toBytes("row-1"));
Scan scan = new Scan();
scan.setFilter(filter);
ResultScanner scanner = table.getScanner(scan);
for (Result result : scanner) {
for (Cell cell : result.rawCells()) {
System.out.println("Cell: " + cell + ", Value: " +
Bytes.toString(cell.getValueArray(), cell.getValueOffset(),
cell.getValueLength()));
}
}
scanner.close();
Get get = new Get(Bytes.toBytes("row-5"));
get.setFilter(filter);
Result result = table.get(get);
for (Cell cell : result.rawCells()) {
System.out.println("Cell: " + cell + ", Value: " +
Bytes.toString(cell.getValueArray(), cell.getValueOffset(),
cell.getValueLength()));
}
Output:
Results of scan:
Cell: row-1/colfam1:col-1/1427280142327/Put/vlen=7/seqid=0, Value: val-1.1
Cell: row-1/colfam1:col-10/1427280142379/Put/vlen=8/seqid=0, Value: val-1.10
...
Cell: row-1/colfam2:col-8/1427280142375/Put/vlen=7/seqid=0, Value: val-1.8
Cell: row-1/colfam2:col-9/1427280142377/Put/vlen=7/seqid=0, Value: val-1.9
Cell: row-10/colfam1:col-1/1427280142530/Put/vlen=8/seqid=0, Value: val-10.1
Cell: row-10/colfam1:col-10/1427280142546/Put/vlen=9/seqid=0, Value: val-10.10
...
Cell: row-10/colfam2:col-8/1427280142542/Put/vlen=8/seqid=0, Value: val-10.8
Cell: row-10/colfam2:col-9/1427280142544/Put/vlen=8/seqid=0, Value: val-10.9
Result of get:
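Note that the get() call returned nothing, because the row it asked for does not match the filter's prefix. This filter is of little use with get(), but very useful in scans.
The scan ends as soon as the scanner encounters a row key that is larger than the prefix. Combined with a start row, the filter therefore improves the overall performance of the scan, because it knows when it can skip all remaining rows.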
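The early termination described above can be illustrated without a cluster. The following is a minimal sketch in plain Java, not HBase code: the class PrefixScanSketch and the method scanWithPrefix are hypothetical names, and row keys are modelled as Strings in a sorted TreeSet. It seeks to the prefix as the start row and stops at the first key that no longer starts with the prefix.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

public class PrefixScanSketch {
    // Hypothetical helper: returns all keys that start with the prefix, in sorted order.
    public static List<String> scanWithPrefix(TreeSet<String> sortedRowKeys, String prefix) {
        List<String> matches = new ArrayList<>();
        // Using the prefix as an inclusive start row skips every key that sorts before it.
        for (String rowKey : sortedRowKeys.tailSet(prefix, true)) {
            if (!rowKey.startsWith(prefix)) {
                break; // first key past the prefix range: no later key can match
            }
            matches.add(rowKey);
        }
        return matches;
    }

    public static void main(String[] args) {
        TreeSet<String> rows = new TreeSet<>(
            List.of("row-0", "row-1", "row-10", "row-19", "row-2", "row-5"));
        System.out.println(scanWithPrefix(rows, "row-1"));
    }
}
```

Because the keys are sorted, all keys sharing a prefix form one contiguous range, which is why a single break is enough to end the loop.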
■ PageFilter
-------------------------------------------------------------------------------------------------------------------------------------
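Use this filter to paginate the results by row. When creating the filter instance you specify the pageSize parameter, which controls how many rows are returned per page.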
PageFilter(final long pageSize)
NOTE:
---------------------------------------------------------------------------------------------------------------------------------
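Filtering on physically separate servers has a fundamental problem. Filters run concurrently on different region servers and cannot share their current state across those boundaries. As a result, each filter instance has to scan at least pageSize rows before it ends the scan, so the client may receive more rows than the page size overall.
The client code must remember the last row that was returned and, when the next iteration is about to start, set the start row of the scan accordingly while keeping the same filter properties.
Example: Using a filter to paginate through rows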
private static final byte[] POSTFIX = new byte[] { 0x00 };
Filter filter = new PageFilter(15);
int totalRows = 0;
byte[] lastRow = null;
while (true) {
Scan scan = new Scan();
scan.setFilter(filter);
if (lastRow != null) {
byte[] startRow = Bytes.add(lastRow, POSTFIX);
System.out.println("start row: " + Bytes.toStringBinary(startRow));
scan.setStartRow(startRow);
}
ResultScanner scanner = table.getScanner(scan);
int localRows = 0;
Result result;
while ((result = scanner.next()) != null) {
System.out.println(localRows++ + ": " + result);
totalRows++;
lastRow = result.getRow();
}
scanner.close();
if (localRows == 0) break;
}
System.out.println("total rows: " + totalRows);
Output:
Adding rows to table...
0: keyvalues={row-1/colfam1:col-1/1427280402935/Put/vlen=7/seqid=0, ...}
1: keyvalues={row-10/colfam1:col-1/1427280403125/Put/vlen=8/seqid=0, ...}
...
14: keyvalues={row-110/colfam1:col-1/1427280404601/Put/vlen=9/seqid=0, ...}
start row: row-110\x00
0: keyvalues={row-111/colfam1:col-1/1427280404615/Put/vlen=9/seqid=0, ...}
1: keyvalues={row-112/colfam1:col-1/1427280404628/Put/vlen=9/seqid=0, ...}
...
14: keyvalues={row-124/colfam1:col-1/1427280404786/Put/vlen=9/seqid=0, ...}
start row: row-124\x00
0: keyvalues={row-125/colfam1:col-1/1427280404799/Put/vlen=9/seqid=0, ...}
...
start row: row-999\x00
total rows: 1000
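Row keys in HBase are sorted lexicographically, so the results come back in that order, and the start row is inclusive. The code therefore appends a zero byte to the previous row key.
This ensures that the last row seen is skipped and that the next one, in sort order, is found. A zero byte is the smallest possible increment, so it is safe to use for resetting the scan boundary.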
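The paging loop and the zero-byte trick can be sketched in plain Java. This is an illustration under stated assumptions, not HBase code: PagingSketch, fetchPage, and fetchAllPages are hypothetical names, row keys are Strings in a TreeSet, and appending the character '\u0000' stands in for appending the byte 0x00.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

public class PagingSketch {
    // Hypothetical stand-in for one paged scan: at most pageSize keys >= startRow.
    public static List<String> fetchPage(TreeSet<String> sortedRowKeys, String startRow, int pageSize) {
        List<String> page = new ArrayList<>();
        for (String rowKey : sortedRowKeys.tailSet(startRow, true)) { // start row is inclusive
            if (page.size() == pageSize) {
                break;
            }
            page.add(rowKey);
        }
        return page;
    }

    // Fetches page after page until an empty page signals the end of the table.
    public static List<List<String>> fetchAllPages(TreeSet<String> sortedRowKeys, int pageSize) {
        List<List<String>> pages = new ArrayList<>();
        String startRow = ""; // the empty string sorts before every key
        while (true) {
            List<String> page = fetchPage(sortedRowKeys, startRow, pageSize);
            if (page.isEmpty()) {
                break;
            }
            pages.add(page);
            String lastRow = page.get(page.size() - 1);
            startRow = lastRow + '\u0000'; // smallest key strictly greater than lastRow
        }
        return pages;
    }

    public static void main(String[] args) {
        TreeSet<String> rows = new TreeSet<>(List.of("row-1", "row-10", "row-2", "row-3", "row-4"));
        System.out.println(fetchAllPages(rows, 2));
    }
}
```

Without the appended zero, the inclusive start row would return the last row of the previous page a second time.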
■ KeyOnlyFilter
-------------------------------------------------------------------------------------------------------------------------------------
Some applications only need to access the key of each cell and can ignore the actual value. KeyOnlyFilter provides this functionality. Its constructors are:
KeyOnlyFilter()
KeyOnlyFilter(boolean lenAsVal)
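The optional lenAsVal parameter is handed as-is to the internal conversion call and controls what happens to the value part of each Cell instance. The default of false simply sets the value to zero length, while true sets the value to a representation of the original value's length.
Example: Returns only the key of each cell, with the value dropped or replaced by its length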
private static Table table = null; // shared by scan() and main()

// Scans the whole table with the given filter and prints every cell.
private static void scan(Filter filter) throws IOException {
Scan scan = new Scan();
scan.setFilter(filter);
ResultScanner scanner = table.getScanner(scan);
System.out.println("Results of scan:");
int rowCount = 0;
for (Result result : scanner) {
for (Cell cell : result.rawCells()) {
// An empty value prints as n/a; otherwise the value is read as an int (the original length).
System.out.println("Cell: " + cell + ", Value: " + (
cell.getValueLength() > 0 ?
Bytes.toInt(cell.getValueArray(), cell.getValueOffset(),
cell.getValueLength()) : "n/a" ));
}
rowCount++;
}
System.out.println("Total num of rows: " + rowCount);
scanner.close();
}
public static void main(String[] args) throws IOException {
Configuration conf = HBaseConfiguration.create();
HBaseHelper helper = HBaseHelper.getHelper(conf);
helper.dropTable("testtable");
helper.createTable("testtable", "colfam1");
System.out.println("Adding rows to table...");
helper.fillTableRandom("testtable", /* row */ 1, 5, 0,
/* col */ 1, 30, 0, /* val */ 0, 10000, 0, true, "colfam1");
Connection connection = ConnectionFactory.createConnection(conf);
table = connection.getTable(TableName.valueOf("testtable"));
System.out.println("Scan #1");
Filter filter1 = new KeyOnlyFilter();
scan(filter1);
System.out.println("Scan #2");
Filter filter2 = new KeyOnlyFilter(true);
scan(filter2);
}
Output:
Results of scan:
Cell: row-0/colfam1:col-17/6/Put/vlen=0/seqid=0, Value: n/a
Cell: row-0/colfam1:col-27/3/Put/vlen=0/seqid=0, Value: n/a
...
Cell: row-4/colfam1:col-3/2/Put/vlen=0/seqid=0, Value: n/a
Cell: row-4/colfam1:col-5/16/Put/vlen=0/seqid=0, Value: n/a
Total num of rows: 5
Scan #2
Results of scan:
Cell: row-0/colfam1:col-17/6/Put/vlen=4/seqid=0, Value: 8
Cell: row-0/colfam1:col-27/3/Put/vlen=4/seqid=0, Value: 6
...
Cell: row-4/colfam1:col-3/2/Put/vlen=4/seqid=0, Value: 7
Cell: row-4/colfam1:col-5/16/Put/vlen=4/seqid=0, Value: 8
Total num of rows: 5
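The two modes of lenAsVal can be mimicked in a few lines of plain Java. This is only a sketch, not the filter's implementation: KeyOnlySketch and keyOnlyValue are hypothetical names, and the 4-byte big-endian encoding of the length is an assumption inferred from the vlen=4 in the output above and from the listing reading the value with Bytes.toInt().

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class KeyOnlySketch {
    // Hypothetical helper: what the value part of a cell looks like after the filter ran.
    public static byte[] keyOnlyValue(byte[] originalValue, boolean lenAsVal) {
        if (!lenAsVal) {
            return new byte[0]; // default: the value is dropped entirely
        }
        // Assumed encoding: the original length as a 4-byte big-endian int.
        return ByteBuffer.allocate(4).putInt(originalValue.length).array();
    }

    public static void main(String[] args) {
        byte[] value = "val-1.10".getBytes(StandardCharsets.UTF_8);
        System.out.println(Arrays.toString(keyOnlyValue(value, false)));
        System.out.println(Arrays.toString(keyOnlyValue(value, true)));
    }
}
```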
■ FirstKeyOnlyFilter
-------------------------------------------------------------------------------------------------------------------------------------
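Although the name suggests KeyValue, or key only, both are misnomers. The filter returns the first cell it finds in a row, with all of its details, including the value. Perhaps it should have been named FirstCellFilter or something similar.
If you need to access the first column of each row (first as implicitly sorted by HBase), this filter provides that. It is typically used by row counter type applications, which only need to check whether a row exists. In a column-oriented database a row really is composed of columns, and if there are none, the row does not exist.
Another possible use case relies on the lexicographic sorting of columns and sets the column qualifier to an epoch value. The column with the oldest timestamp name then sorts first and is retrieved first. Combined with this filter, a single scan can retrieve the oldest column from every row. More interestingly, if you reverse the timestamp used as the column qualifier, a single scan retrieves the newest entry of every row.
The class makes use of another optimization offered by the filter framework: after checking the first column it tells the region server to end processing of the current row and move on to the next one, which performs better than a full table scan.
Example: Only returns the first found cell from each row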
Filter filter = new FirstKeyOnlyFilter();
Scan scan = new Scan();
scan.setFilter(filter);
ResultScanner scanner = table.getScanner(scan);
int rowCount = 0;
for (Result result : scanner) {
for (Cell cell : result.rawCells()) {
System.out.println("Cell: " + cell + ", Value: " +
Bytes.toString(cell.getValueArray(), cell.getValueOffset(), cell.getValueLength()));
}
rowCount++;
}
System.out.println("Total num of rows: " + rowCount);
scanner.close();
Output:
Adding rows to table...
Results of scan:
Cell: row-0/colfam1:col-10/19/Put/vlen=6/seqid=0, Value: val-76
Cell: row-1/colfam1:col-0/0/Put/vlen=6/seqid=0, Value: val-19
...
Cell: row-8/colfam1:col-10/4/Put/vlen=6/seqid=0, Value: val-35
Cell: row-9/colfam1:col-1/5/Put/vlen=5/seqid=0, Value: val-0
Total num of rows: 30
The output shows that only one cell is returned per row.
■ FirstKeyValueMatchingQualifiersFilter
-------------------------------------------------------------------------------------------------------------------------------------
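This filter is an extension of FirstKeyOnlyFilter, but instead of returning the first cell found, it returns all columns of a row up to a given column qualifier.
If the row contains none of the given qualifiers, all of its columns are returned. The filter is mainly used by the rowcounter shell command.
Constructor: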
FirstKeyValueMatchingQualifiersFilter(Set<byte[]> qualifiers)
Example: Returns all columns, or up to the first found reference qualifier, for each row
Set<byte[]> quals = new HashSet<byte[]>();
quals.add(Bytes.toBytes("col-2"));
quals.add(Bytes.toBytes("col-4"));
quals.add(Bytes.toBytes("col-6"));
quals.add(Bytes.toBytes("col-8"));
Filter filter = new FirstKeyValueMatchingQualifiersFilter(quals);
Scan scan = new Scan();
scan.setFilter(filter);
ResultScanner scanner = table.getScanner(scan);
int rowCount = 0;
for (Result result : scanner) {
for (Cell cell : result.rawCells()) {
System.out.println("Cell: " + cell + ", Value: " +
Bytes.toString(cell.getValueArray(), cell.getValueOffset(),
cell.getValueLength()));
}
rowCount++;
}
System.out.println("Total num of rows: " + rowCount);
scanner.close();
Output:
Adding rows to table...
Results of scan:
Cell: row-0/colfam1:col-0/1/Put/vlen=6/seqid=0, Value: val-48
Cell: row-0/colfam1:col-1/4/Put/vlen=6/seqid=0, Value: val-78
Cell: row-0/colfam1:col-5/1/Put/vlen=6/seqid=0, Value: val-62
Cell: row-0/colfam1:col-6/6/Put/vlen=5/seqid=0, Value: val-6
Cell: row-10/colfam1:col-1/3/Put/vlen=6/seqid=0, Value: val-73
Cell: row-10/colfam1:col-6/5/Put/vlen=6/seqid=0, Value: val-11
...
Cell: row-6/colfam1:col-1/0/Put/vlen=6/seqid=0, Value: val-39
Cell: row-7/colfam1:col-9/6/Put/vlen=6/seqid=0, Value: val-57
Cell: row-8/colfam1:col-0/2/Put/vlen=6/seqid=0, Value: val-90
Cell: row-8/colfam1:col-1/4/Put/vlen=6/seqid=0, Value: val-92
Cell: row-8/colfam1:col-6/4/Put/vlen=6/seqid=0, Value: val-12
Cell: row-9/colfam1:col-1/5/Put/vlen=6/seqid=0, Value: val-35
Cell: row-9/colfam1:col-2/2/Put/vlen=6/seqid=0, Value: val-22
Total num of rows: 47
■ InclusiveStopFilter
-------------------------------------------------------------------------------------------------------------------------------------
The row boundaries of a scan are inclusive for the start row and exclusive for the stop row. This filter changes the stop row semantics so that the given stop row is included as well.
Example: Using a filter to include a stop row
Filter filter = new InclusiveStopFilter(Bytes.toBytes("row-5"));
Scan scan = new Scan();
scan.setStartRow(Bytes.toBytes("row-3"));
scan.setFilter(filter);
ResultScanner scanner = table.getScanner(scan);
for (Result result : scanner) {
System.out.println(result);
}
scanner.close();
Output:
Adding rows to table...
Results of scan:
keyvalues={row-3/colfam1:col-1/1427282689001/Put/vlen=7/seqid=0}
keyvalues={row-30/colfam1:col-1/1427282689069/Put/vlen=8/seqid=0}
...
keyvalues={row-48/colfam1:col-1/1427282689100/Put/vlen=8/seqid=0}
keyvalues={row-49/colfam1:col-1/1427282689102/Put/vlen=8/seqid=0}
keyvalues={row-5/colfam1:col-1/1427282689004/Put/vlen=7/seqid=0}
■ FuzzyRowFilter
-------------------------------------------------------------------------------------------------------------------------------------
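This filter acts on row keys, but in a fuzzy manner. It needs a list of row keys that should be returned, plus an accompanying byte[] array that states the significance of each byte in the row key. The constructor is as follows: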
FuzzyRowFilter(List<Pair<byte[], byte[]>> fuzzyKeysData)
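fuzzyKeysData states the significance of each row key byte using one of two values:
0 : the byte at the same position in the row key must match as-is
1 : the corresponding row key byte does not matter and is always accepted
Example: Filtering by fuzzy row keys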
List<Pair<byte[], byte[]>> keys = new ArrayList<Pair<byte[], byte[]>>();
keys.add(new Pair<byte[], byte[]>(Bytes.toBytes("row-?5"), new byte[] { 0, 0, 0, 0, 1, 0 }));
Filter filter = new FuzzyRowFilter(keys);
Scan scan = new Scan()
.addColumn(Bytes.toBytes("colfam1"), Bytes.toBytes("col-5"))
.setFilter(filter);
ResultScanner scanner = table.getScanner(scan);
for (Result result : scanner) {
System.out.println(result);
}
scanner.close();
Output:
Adding rows to table...
Results of scan:
keyvalues={row-05/colfam1:col-01/1/Put/vlen=9/seqid=0,
row-05/colfam1:col-02/2/Put/vlen=9/seqid=0,
...
row-05/colfam1:col-09/9/Put/vlen=9/seqid=0,
row-05/colfam1:col-10/10/Put/vlen=9/seqid=0}
keyvalues={row-15/colfam1:col-01/1/Put/vlen=9/seqid=0,
row-15/colfam1:col-02/2/Put/vlen=9/seqid=0,
...
row-15/colfam1:col-09/9/Put/vlen=9/seqid=0,
row-15/colfam1:col-10/10/Put/vlen=9/seqid=0}
The test code writes 20 rows to the table, named row-01 through row-20. We want to retrieve all rows matching the pattern row-?5.
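The mask semantics (0 means the byte must match, 1 means any byte is accepted) can be shown with a small plain-Java matcher. This is a sketch, not the filter's implementation: FuzzyMatchSketch and matches are hypothetical names, and the handling of row keys that are shorter or longer than the pattern is an assumption of the sketch that may differ from the real filter.

```java
import java.nio.charset.StandardCharsets;

public class FuzzyMatchSketch {
    // Hypothetical matcher: compares only the positions whose mask byte is 0.
    public static boolean matches(byte[] rowKey, byte[] fuzzyKey, byte[] mask) {
        if (fuzzyKey.length != mask.length) {
            throw new IllegalArgumentException("pattern and mask must have the same length");
        }
        if (rowKey.length < fuzzyKey.length) {
            return false; // assumption: keys shorter than the pattern never match
        }
        for (int i = 0; i < fuzzyKey.length; i++) {
            if (mask[i] == 0 && rowKey[i] != fuzzyKey[i]) {
                return false; // a fixed position differs
            }
            // mask[i] == 1: this position is a wildcard and always accepted
        }
        return true; // assumption: bytes beyond the pattern are ignored
    }

    public static void main(String[] args) {
        byte[] pattern = "row-?5".getBytes(StandardCharsets.US_ASCII);
        byte[] mask = new byte[] { 0, 0, 0, 0, 1, 0 };
        for (String key : new String[] { "row-05", "row-06", "row-15", "row-20" }) {
            boolean hit = matches(key.getBytes(StandardCharsets.US_ASCII), pattern, mask);
            System.out.println(key + " -> " + hit);
        }
    }
}
```

The character used at a wildcard position of the pattern (here ?) is irrelevant, because positions with mask value 1 are never compared.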
■ ColumnCountGetFilter
-------------------------------------------------------------------------------------------------------------------------------------
Use this filter to retrieve at most a given number of columns per row.
ColumnCountGetFilter(final int n)
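The filter stops the entire scan as soon as one row has reached the maximum number of columns, so it is of little use for scan operations; it was written for testing filters in get() calls.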
■ ColumnPrefixFilter
-------------------------------------------------------------------------------------------------------------------------------------
Similar to PrefixFilter, this filter works by prefix matching on the column name. You specify a prefix when creating the filter:
ColumnPrefixFilter(final byte[] prefix)
All columns that match the given prefix are included in the result.
Example: Filtering by column prefix, selects all columns starting with col-1.
Filter filter = new ColumnPrefixFilter(Bytes.toBytes("col-1"));
Scan scan = new Scan();
scan.setFilter(filter);
ResultScanner scanner = table.getScanner(scan);
for (Result result : scanner) {
System.out.println(result);
}
scanner.close();
Output:
Adding rows to table...
Results of scan:
keyvalues={row-1/colfam1:col-1/1/Put/vlen=7/seqid=0,
row-1/colfam1:col-10/10/Put/vlen=8/seqid=0,
...
row-1/colfam1:col-19/19/Put/vlen=8/seqid=0}
...
■ MultipleColumnPrefixFilter
-------------------------------------------------------------------------------------------------------------------------------------
This filter is a straight extension of ColumnPrefixFilter that allows the application to ask for a list of column qualifier prefixes rather than a single one.
MultipleColumnPrefixFilter(final byte[][] prefixes)
Example: Filtering by column prefix, adds two column prefixes, and also a row prefix to limit the output.
Filter filter = new MultipleColumnPrefixFilter(new byte[][] {
Bytes.toBytes("col-1"), Bytes.toBytes("col-2")
});
Scan scan = new Scan()
//Limit to rows starting with a specific prefix
.setRowPrefixFilter(Bytes.toBytes("row-1"))
.setFilter(filter);
ResultScanner scanner = table.getScanner(scan);
for (Result result : scanner) {
System.out.print(Bytes.toString(result.getRow()) + ": ");
for (Cell cell : result.rawCells()) {
System.out.print(Bytes.toString(cell.getQualifierArray(),
cell.getQualifierOffset(), cell.getQualifierLength()) + ", ");
}
System.out.println();
}
scanner.close();
Output:
Adding rows to table...
Results of scan:
row-1: col-1, col-10, col-11, col-12, col-13, col-14, col-15, col-16,
col-17, col-18, col-19, col-2, col-20, col-21, col-22, col-23, col-24,
col-25, col-26, col-27, col-28, col-29,
row-10: col-1, col-10, col-11, col-12, col-13, col-14, col-15, col-16,
col-17, col-18, col-19, col-2, col-20, col-21, col-22, col-23, col-24,
col-25, col-26, col-27, col-28, col-29,
row-18: col-1, col-10, col-11, col-12, col-13, col-14, col-15, col-16,
col-17, col-18, col-19, col-2, col-20, col-21, col-22, col-23, col-24,
col-25, col-26, col-27, col-28, col-29,
row-19: col-1, col-10, col-11, col-12, col-13, col-14, col-15, col-16,
col-17, col-18, col-19, col-2, col-20, col-21, col-22, col-23, col-24,
col-25, col-26, col-27, col-28, col-29,
■ ColumnRangeFilter
-------------------------------------------------------------------------------------------------------------------------------------
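This filter acts like two QualifierFilter instances working together, one checking the lower boundary and the other the upper boundary. Both would use the provided BinaryPrefixComparator, with the comparison operators LESS_OR_EQUAL and GREATER_OR_EQUAL respectively: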
ColumnRangeFilter(final byte[] minColumn, boolean minColumnInclusive,
final byte[] maxColumn, boolean maxColumnInclusive)
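The minColumn and maxColumn arguments are optional, and the boolean flags minColumnInclusive and maxColumnInclusive state whether each bound is inclusive or exclusive. If minColumn is not specified, the range starts at the beginning of the table; if maxColumn is not specified, it runs to the end of the table.
Example: Filtering by columns within a given range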
Filter filter = new ColumnRangeFilter(Bytes.toBytes("col-05"), true, Bytes.toBytes("col-11"), false);
Scan scan = new Scan()
.setStartRow(Bytes.toBytes("row-03"))
.setStopRow(Bytes.toBytes("row-05"))
.setFilter(filter);
ResultScanner scanner = table.getScanner(scan);
for (Result result : scanner) {
System.out.println(result);
}
scanner.close();
Output:
Adding rows to table...
Results of scan:
keyvalues={row-03/colfam1:col-05/5/Put/vlen=9/seqid=0,
row-03/colfam1:col-06/6/Put/vlen=9/seqid=0,
row-03/colfam1:col-07/7/Put/vlen=9/seqid=0,
row-03/colfam1:col-08/8/Put/vlen=9/seqid=0,
row-03/colfam1:col-09/9/Put/vlen=9/seqid=0,
row-03/colfam1:col-10/10/Put/vlen=9/seqid=0}
keyvalues={row-04/colfam1:col-05/5/Put/vlen=9/seqid=0,
row-04/colfam1:col-06/6/Put/vlen=9/seqid=0,
row-04/colfam1:col-07/7/Put/vlen=9/seqid=0,
row-04/colfam1:col-08/8/Put/vlen=9/seqid=0,
row-04/colfam1:col-09/9/Put/vlen=9/seqid=0,
row-04/colfam1:col-10/10/Put/vlen=9/seqid=0}
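The boundary check with its two inclusive flags can be written as a small plain-Java predicate. This is a sketch rather than the filter's code: ColumnRangeSketch and inRange are hypothetical names, qualifiers are modelled as Strings, and treating a null bound as "not specified" is an assumption of the sketch.

```java
public class ColumnRangeSketch {
    // Hypothetical check: is the qualifier inside the range given by min and max?
    public static boolean inRange(String qualifier, String min, boolean minInclusive,
                                  String max, boolean maxInclusive) {
        if (min != null) { // assumption: null means no lower bound
            int cmp = qualifier.compareTo(min);
            if (cmp < 0 || (cmp == 0 && !minInclusive)) {
                return false;
            }
        }
        if (max != null) { // assumption: null means no upper bound
            int cmp = qualifier.compareTo(max);
            if (cmp > 0 || (cmp == 0 && !maxInclusive)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Same bounds as the example above: col-05 inclusive to col-11 exclusive.
        for (int i = 1; i <= 12; i++) {
            String qualifier = String.format("col-%02d", i);
            if (inRange(qualifier, "col-05", true, "col-11", false)) {
                System.out.println(qualifier);
            }
        }
    }
}
```

With these bounds the sketch selects col-05 through col-10, the same six columns per row that the scan output lists.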
■ SingleColumnValueFilter
-------------------------------------------------------------------------------------------------------------------------------------
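Use this filter when the value of a single column decides whether an entire row is returned. You first specify the column to track, and then the value to check against.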
SingleColumnValueFilter(final byte[] family, final byte[] qualifier, final CompareOp compareOp, final byte[] value)
SingleColumnValueFilter(final byte[] family, final byte[] qualifier, final CompareOp compareOp, final ByteArrayComparable comparator)
protected SingleColumnValueFilter(final byte[] family, final byte[] qualifier, final CompareOp compareOp,
ByteArrayComparable comparator, final boolean filterIfMissing, final boolean latestVersionOnly)
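The first constructor is a convenience one that creates a BinaryComparator instance internally. The second takes the same parameters used throughout for the CompareFilter-based classes. Although SingleColumnValueFilter does not inherit from CompareFilter directly, it still uses the same constructor parameter types. The third constructor adds two boolean flags, which can also be set after the filter has been constructed, through getter and setter methods: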
boolean getFilterIfMissing()
void setFilterIfMissing(boolean filterIfMissing)
boolean getLatestVersionOnly()
void setLatestVersionOnly(boolean latestVersionOnly)
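The first pair of methods controls what happens to rows that do not contain the specified column at all. By default such rows are included in the result, but setFilterIfMissing(true) reverses that behavior, so that all rows without the reference column are dropped from the result.
With setLatestVersionOnly(false) (the default is true) you can change the filter's default behavior of checking only the most recent version of the reference column; after the change, older versions are included in the check as well.
Example: Using a filter to return only rows with a given value in a given column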
SingleColumnValueFilter filter = new SingleColumnValueFilter(
Bytes.toBytes("colfam1"),
Bytes.toBytes("col-5"),
CompareFilter.CompareOp.NOT_EQUAL,
new SubstringComparator("val-5"));
filter.setFilterIfMissing(true);
Scan scan = new Scan();
scan.setFilter(filter);
ResultScanner scanner = table.getScanner(scan);
for (Result result : scanner) {
for (Cell cell : result.rawCells()) {
System.out.println("Cell: " + cell + ", Value: " +
Bytes.toString(cell.getValueArray(), cell.getValueOffset(),
cell.getValueLength()));
}
}
scanner.close();
Get get = new Get(Bytes.toBytes("row-6"));
get.setFilter(filter);
Result result = table.get(get);
System.out.println("Result of get: ");
for (Cell cell : result.rawCells()) {
System.out.println("Cell: " + cell + ", Value: " +
Bytes.toString(cell.getValueArray(), cell.getValueOffset(),
cell.getValueLength()));
}
Output:
Adding rows to table...
Results of scan:
Cell: row-1/colfam1:col-1/1427279447557/Put/vlen=7/seqid=0, Value: val-1.1
Cell: row-1/colfam1:col-10/1427279447613/Put/vlen=8/seqid=0, Value: val-1.10
...
Cell: row-4/colfam2:col-8/1427279447667/Put/vlen=7/seqid=0, Value: val-4.8
Cell: row-4/colfam2:col-9/1427279447669/Put/vlen=7/seqid=0, Value: val-4.9
Cell: row-6/colfam1:col-1/1427279447692/Put/vlen=7/seqid=0, Value: val-6.1
Cell: row-6/colfam1:col-10/1427279447709/Put/vlen=8/seqid=0, Value: val-6.10
...
Cell: row-9/colfam2:col-8/1427279447759/Put/vlen=7/seqid=0, Value: val-9.8
Cell: row-9/colfam2:col-9/1427279447761/Put/vlen=7/seqid=0, Value: val-9.9
Result of get:
Cell: row-6/colfam1:col-1/1427279447692/Put/vlen=7/seqid=0, Value: val-6.1
Cell: row-6/colfam1:col-10/1427279447709/Put/vlen=8/seqid=0, Value: val-6.10
...
Cell: row-6/colfam2:col-8/1427279447705/Put/vlen=7/seqid=0, Value: val-6.8
Cell: row-6/colfam2:col-9/1427279447707/Put/vlen=7/seqid=0, Value: val-6.9
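The row-level decision, including the filterIfMissing flag, can be summarized in a few lines of plain Java. This is a sketch under stated assumptions, not the filter's code: SingleColumnValueSketch and includeRow are hypothetical names, a row is modelled as a Map from column name to its latest value, and the comparison is passed in as a Predicate.

```java
import java.util.Map;
import java.util.function.Predicate;

public class SingleColumnValueSketch {
    // Hypothetical decision: should the whole row be part of the result?
    public static boolean includeRow(Map<String, String> row, String referenceColumn,
                                     Predicate<String> valueMatches, boolean filterIfMissing) {
        String value = row.get(referenceColumn);
        if (value == null) {
            // The row has no reference column: included by default, dropped if filterIfMissing is set.
            return !filterIfMissing;
        }
        return valueMatches.test(value);
    }

    public static void main(String[] args) {
        // Mirrors the example above: NOT_EQUAL combined with a substring comparator for val-5.
        Predicate<String> notContainingVal5 = v -> !v.contains("val-5");
        System.out.println(includeRow(Map.of("col-5", "val-5.5"), "col-5", notContainingVal5, true));
        System.out.println(includeRow(Map.of("col-5", "val-6.5"), "col-5", notContainingVal5, true));
        System.out.println(includeRow(Map.of("col-1", "val-7.1"), "col-5", notContainingVal5, true));
    }
}
```

This matches the output above, where row-5 is absent from the scan results while row-6 is returned in full.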
■ SingleColumnValueExcludeFilter
-------------------------------------------------------------------------------------------------------------------------------------
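SingleColumnValueExcludeFilter extends SingleColumnValueFilter and provides slightly different semantics: the reference column is omitted from the result. You get the same constructors, methods, and features as with SingleColumnValueFilter to control how the filter works. The only difference is that the Result instances on the client side never contain the reference column being checked, because it is excluded from the result.
■ TimestampsFilter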
-------------------------------------------------------------------------------------------------------------------------------------
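When you need fine-grained control over which versions are included in the scan result, this filter provides it. You hand in a List instance containing timestamps: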
TimestampsFilter(List<Long> timestamps)
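As described earlier, a version is the value of a column at a specific time, and is therefore identified by a timestamp. When the filter is given a list of timestamps, it finds the column versions whose timestamps match them exactly.
Example: Filtering data by timestamps, sets up a filter with three timestamps and adds a time range to the second scan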
List<Long> ts = new ArrayList<Long>();
//Add timestamps to the list.
ts.add(5L);
ts.add(10L);
ts.add(15L);
Filter filter = new TimestampsFilter(ts);
Scan scan1 = new Scan();
//Add the filter to an otherwise default Scan instance.
scan1.setFilter(filter);
ResultScanner scanner1 = table.getScanner(scan1);
for (Result result : scanner1) {
System.out.println(result);
}
scanner1.close();
Scan scan2 = new Scan();
scan2.setFilter(filter);
//Also add a time range to verify how it affects the filter
scan2.setTimeRange(8, 12);
ResultScanner scanner2 = table.getScanner(scan2);
for (Result result : scanner2) {
System.out.println(result);
}
scanner2.close();
Output:
Adding rows to table...
Results of scan #1:
keyvalues={row-1/colfam1:col-10/10/Put/vlen=8/seqid=0,
row-1/colfam1:col-15/15/Put/vlen=8/seqid=0,
row-1/colfam1:col-5/5/Put/vlen=7/seqid=0}
keyvalues={row-100/colfam1:col-10/10/Put/vlen=10/seqid=0,
row-100/colfam1:col-15/15/Put/vlen=10/seqid=0,
row-100/colfam1:col-5/5/Put/vlen=9/seqid=0}
...
keyvalues={row-99/colfam1:col-10/10/Put/vlen=9/seqid=0,
row-99/colfam1:col-15/15/Put/vlen=9/seqid=0,
row-99/colfam1:col-5/5/Put/vlen=8/seqid=0}
Results of scan #2:
keyvalues={row-1/colfam1:col-10/10/Put/vlen=8/seqid=0}
keyvalues={row-10/colfam1:col-10/10/Put/vlen=9/seqid=0}
...
keyvalues={row-98/colfam1:col-10/10/Put/vlen=9/seqid=0}
keyvalues={row-99/colfam1:col-10/10/Put/vlen=9/seqid=0}
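The second scan returns only timestamp 10 because the time range and the timestamp list both have to accept a version. A minimal plain-Java sketch of that intersection follows; TimestampSelectSketch and select are hypothetical names, and the range is treated as half-open, [minStamp, maxStamp), which is how Scan.setTimeRange() is documented.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class TimestampSelectSketch {
    // Hypothetical selection: keep versions that are in the wanted set AND inside the time range.
    public static List<Long> select(List<Long> versions, Set<Long> wanted,
                                    long minStamp, long maxStamp) {
        List<Long> kept = new ArrayList<>();
        for (long ts : versions) {
            boolean inRange = ts >= minStamp && ts < maxStamp; // [minStamp, maxStamp)
            if (inRange && wanted.contains(ts)) {
                kept.add(ts);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Long> versions = new ArrayList<>();
        for (long ts = 1; ts <= 20; ts++) {
            versions.add(ts);
        }
        Set<Long> wanted = Set.of(5L, 10L, 15L);
        System.out.println(select(versions, wanted, 0L, Long.MAX_VALUE)); // no effective range
        System.out.println(select(versions, wanted, 8L, 12L));            // range of the second scan
    }
}
```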
■ RandomRowFilter
-------------------------------------------------------------------------------------------------------------------------------------
Finally, there is a filter that includes random rows in the result. The constructor takes a chance parameter with a value between 0.0 and 1.0.
RandomRowFilter(float chance)
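Internally the filter calls Java's Random.nextFloat() to decide whether a row is filtered, comparing the result with the given chance. A negative chance causes all rows to be filtered out, while a chance greater than 1.0 includes all rows in the result.
Example: Filtering rows randomly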
Filter filter = new RandomRowFilter(0.5f);
for (int loop = 1; loop <= 3; loop++) {
Scan scan = new Scan();
scan.setFilter(filter);
ResultScanner scanner = table.getScanner(scan);
for (Result result : scanner) {
System.out.println(Bytes.toString(result.getRow()));
}
scanner.close();
}
Output:
Adding rows to table...
Results of scan for loop: 1
row-1
row-10
row-3
row-9
Results of scan for loop: 2
row-10
row-2
row-3
row-5
row-6
row-8
Results of scan for loop: 3
row-1
row-3
row-4
row-8
row-9
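The comparison against chance, and the two edge cases mentioned above, can be reproduced with java.util.Random alone. This is a sketch, not the filter's code: RandomRowSketch, includeRow, and countIncluded are hypothetical names, and the fixed seed exists only to make the sketch repeatable.

```java
import java.util.Random;

public class RandomRowSketch {
    // Hypothetical decision: nextFloat() lies in [0.0, 1.0), so a negative chance
    // never includes a row and a chance above 1.0 always does.
    public static boolean includeRow(Random random, float chance) {
        return random.nextFloat() < chance;
    }

    // Counts how many of the given number of rows would be included.
    public static int countIncluded(long seed, float chance, int rows) {
        Random random = new Random(seed);
        int included = 0;
        for (int i = 0; i < rows; i++) {
            if (includeRow(random, chance)) {
                included++;
            }
        }
        return included;
    }

    public static void main(String[] args) {
        System.out.println("chance -0.1: " + countIncluded(42L, -0.1f, 1000));
        System.out.println("chance  0.5: " + countIncluded(42L, 0.5f, 1000));
        System.out.println("chance  1.5: " + countIncluded(42L, 1.5f, 1000));
    }
}
```

Each scan draws fresh random numbers, which is why the three loops in the output above return different sets of rows.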
1.4 Decorating Filters
-----------------------------------------------------------------------------------------------------------------------------------------
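The filters HBase provides are already powerful, but it is sometimes useful to modify or extend their behavior to gain more control over what they return. Some of this additional control does not depend on the filter itself and can be applied to any of them. This is what the group of decorating filter classes is for.
Decorating filters implement the Filter interface just like any other single-purpose filter. They can therefore be used as a drop-in replacement for those filters, combined with the behavior of the filter instance they wrap.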
■ SkipFilter
-------------------------------------------------------------------------------------------------------------------------------------
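This filter wraps a given filter and extends it to exclude an entire row when the wrapped filter hints that a Cell should be skipped. In other words, as soon as the wrapped filter indicates that one column of a row is to be omitted, the whole row is omitted.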
NOTE:
---------------------------------------------------------------------------------------------------------------------------------
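The wrapped filter must implement the filterKeyValue() method, or SkipFilter will not work as expected. This is because SkipFilter only checks the result of that method to decide how to handle the current row.
The following example combines SkipFilter with a ValueFilter, first to select all columns whose value is not val-0, and then to drop every row that contains a column without a matching value.
Example: Using a filter to skip entire rows based on another filter's results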
Filter filter1 = new ValueFilter(CompareFilter.CompareOp.NOT_EQUAL, new BinaryComparator(Bytes.toBytes("val-0")));
Scan scan = new Scan();
//Only add the ValueFilter to the first scan.
scan.setFilter(filter1);
ResultScanner scanner1 = table.getScanner(scan);
for (Result result : scanner1) {
for (Cell cell : result.rawCells()) {
System.out.println("Cell: " + cell + ", Value: " +
Bytes.toString(cell.getValueArray(), cell.getValueOffset(),
cell.getValueLength()));
}
}
scanner1.close();
//Add the decorating skip filter for the second scan.
Filter filter2 = new SkipFilter(filter1);
scan.setFilter(filter2);
ResultScanner scanner2 = table.getScanner(scan);
for (Result result : scanner2) {
for (Cell cell : result.rawCells()) {
System.out.println("Cell: " + cell + ", Value: " +
Bytes.toString(cell.getValueArray(), cell.getValueOffset(),
cell.getValueLength()));
}
}
scanner2.close();
Output:
Adding rows to table...
Results of scan #1:
Cell: row-01/colfam1:col-01/1/Put/vlen=5/seqid=0, Value: val-4
Cell: row-01/colfam1:col-02/2/Put/vlen=5/seqid=0, Value: val-4
Cell: row-01/colfam1:col-03/3/Put/vlen=5/seqid=0, Value: val-1
Cell: row-01/colfam1:col-04/4/Put/vlen=5/seqid=0, Value: val-3
Cell: row-01/colfam1:col-05/5/Put/vlen=5/seqid=0, Value: val-1
Cell: row-02/colfam1:col-01/1/Put/vlen=5/seqid=0, Value: val-1
Cell: row-02/colfam1:col-03/3/Put/vlen=5/seqid=0, Value: val-2
Cell: row-02/colfam1:col-04/4/Put/vlen=5/seqid=0, Value: val-4
Cell: row-02/colfam1:col-05/5/Put/vlen=5/seqid=0, Value: val-2
...
Cell: row-30/colfam1:col-01/1/Put/vlen=5/seqid=0, Value: val-2
Cell: row-30/colfam1:col-02/2/Put/vlen=5/seqid=0, Value: val-4
Cell: row-30/colfam1:col-03/3/Put/vlen=5/seqid=0, Value: val-4
Cell: row-30/colfam1:col-05/5/Put/vlen=5/seqid=0, Value: val-4
Total cell count for scan #1: 124
Results of scan #2:
Cell: row-01/colfam1:col-01/1/Put/vlen=5/seqid=0, Value: val-4
Cell: row-01/colfam1:col-02/2/Put/vlen=5/seqid=0, Value: val-4
Cell: row-01/colfam1:col-03/3/Put/vlen=5/seqid=0, Value: val-1
Cell: row-01/colfam1:col-04/4/Put/vlen=5/seqid=0, Value: val-3
Cell: row-01/colfam1:col-05/5/Put/vlen=5/seqid=0, Value: val-1
Cell: row-06/colfam1:col-01/1/Put/vlen=5/seqid=0, Value: val-4
Cell: row-06/colfam1:col-02/2/Put/vlen=5/seqid=0, Value: val-4
Cell: row-06/colfam1:col-03/3/Put/vlen=5/seqid=0, Value: val-4
Cell: row-06/colfam1:col-04/4/Put/vlen=5/seqid=0, Value: val-3
Cell: row-06/colfam1:col-05/5/Put/vlen=5/seqid=0, Value: val-2
...
Cell: row-28/colfam1:col-01/1/Put/vlen=5/seqid=0, Value: val-2
Cell: row-28/colfam1:col-02/2/Put/vlen=5/seqid=0, Value: val-1
Cell: row-28/colfam1:col-03/3/Put/vlen=5/seqid=0, Value: val-2
Cell: row-28/colfam1:col-04/4/Put/vlen=5/seqid=0, Value: val-4
Cell: row-28/colfam1:col-05/5/Put/vlen=5/seqid=0, Value: val-2
Total cell count for scan #2: 55
■ WhileMatchFilter
-------------------------------------------------------------------------------------------------------------------------------------
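This filter is similar to the previous one, but it aborts the entire scan as soon as a piece of data is filtered out. It uses the wrapped filter to check whether a row is skipped because of its row key, or whether a column of a row is skipped because of a check on a Cell.
Example: Using a filter to stop the scan as soon as another filter rejects a row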
Filter filter1 = new RowFilter(CompareFilter.CompareOp.NOT_EQUAL,
new BinaryComparator(Bytes.toBytes("row-05")));
Scan scan = new Scan();
scan.setFilter(filter1);
ResultScanner scanner1 = table.getScanner(scan);
for (Result result : scanner1) {
for (Cell cell : result.rawCells()) {
System.out.println("Cell: " + cell + ", Value: " +
Bytes.toString(cell.getValueArray(), cell.getValueOffset(),
cell.getValueLength()));
}
}
scanner1.close();
Filter filter2 = new WhileMatchFilter(filter1);
scan.setFilter(filter2);
ResultScanner scanner2 = table.getScanner(scan);
for (Result result : scanner2) {
for (Cell cell : result.rawCells()) {
System.out.println("Cell: " + cell + ", Value: " +
Bytes.toString(cell.getValueArray(), cell.getValueOffset(),
cell.getValueLength()));
}
}
scanner2.close();
Output:
Adding rows to table...
Results of scan #1:
Cell: row-01/colfam1:col-01/1/Put/vlen=9/seqid=0, Value: val-01.01
Cell: row-02/colfam1:col-01/1/Put/vlen=9/seqid=0, Value: val-02.01
Cell: row-03/colfam1:col-01/1/Put/vlen=9/seqid=0, Value: val-03.01
Cell: row-04/colfam1:col-01/1/Put/vlen=9/seqid=0, Value: val-04.01
Cell: row-06/colfam1:col-01/1/Put/vlen=9/seqid=0, Value: val-06.01
Cell: row-07/colfam1:col-01/1/Put/vlen=9/seqid=0, Value: val-07.01
Cell: row-08/colfam1:col-01/1/Put/vlen=9/seqid=0, Value: val-08.01
Cell: row-09/colfam1:col-01/1/Put/vlen=9/seqid=0, Value: val-09.01
Cell: row-10/colfam1:col-01/1/Put/vlen=9/seqid=0, Value: val-10.01
Total cell count for scan #1: 9
Results of scan #2:
Cell: row-01/colfam1:col-01/1/Put/vlen=9/seqid=0, Value: val-01.01
Cell: row-02/colfam1:col-01/1/Put/vlen=9/seqid=0, Value: val-02.01
Cell: row-03/colfam1:col-01/1/Put/vlen=9/seqid=0, Value: val-03.01
Cell: row-04/colfam1:col-01/1/Put/vlen=9/seqid=0, Value: val-04.01
Total cell count for scan #2: 4
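The difference between the two decorators can be condensed into a plain-Java sketch. It is an illustration only, not how the HBase classes are implemented: DecoratorSketch, skipRows, and whileMatch are hypothetical names, the wrapped filter is modelled as a Predicate, and rows are simple in-memory collections.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

public class DecoratorSketch {
    // SkipFilter idea: a row survives only if the wrapped check accepts every one of its cells.
    public static List<String> skipRows(Map<String, List<String>> rows, Predicate<String> cellPasses) {
        List<String> kept = new ArrayList<>();
        for (Map.Entry<String, List<String>> row : rows.entrySet()) {
            if (row.getValue().stream().allMatch(cellPasses)) {
                kept.add(row.getKey());
            }
        }
        return kept;
    }

    // WhileMatchFilter idea: the first rejected row ends the whole scan.
    public static List<String> whileMatch(List<String> sortedRowKeys, Predicate<String> rowPasses) {
        List<String> kept = new ArrayList<>();
        for (String rowKey : sortedRowKeys) {
            if (!rowPasses.test(rowKey)) {
                break; // abort instead of merely skipping this row
            }
            kept.add(rowKey);
        }
        return kept;
    }

    public static void main(String[] args) {
        Map<String, List<String>> rows = new LinkedHashMap<>();
        rows.put("row-01", List.of("val-4", "val-1"));
        rows.put("row-02", List.of("val-1", "val-0"));
        System.out.println(skipRows(rows, value -> !value.equals("val-0")));

        List<String> keys = List.of("row-01", "row-02", "row-03", "row-04", "row-05", "row-06");
        System.out.println(whileMatch(keys, key -> !key.equals("row-05")));
    }
}
```

As in the WhileMatchFilter output above, row-06 and later rows are never reached even though the wrapped check would accept them.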
1.5 FilterList
-----------------------------------------------------------------------------------------------------------------------------------------
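So far you have seen how filters work on the various dimensions of a table, from rows, to columns, to the versions of the values within a column. In practice you may need more than one filter to restrict what is returned to the client, and FilterList provides this. Constructors: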
FilterList(final List<Filter> rowFilters)
FilterList(final Filter... rowFilters)
FilterList(final Operator operator)
FilterList(final Operator operator, final List<Filter> rowFilters)
FilterList(final Operator operator, final Filter... rowFilters)
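The rowFilters parameter specifies the list of filters that are assessed together, using operator to combine their results. The possible values are listed in the following table: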
Possible values for the FilterList.Operator enumeration
+---------------+--------------------------------------------------------------------------------------------------
| Operator | Description
+---------------+--------------------------------------------------------------------------------------------------
| MUST_PASS_ALL | A value is only included in the result when all filters agree to do so
+---------------+--------------------------------------------------------------------------------------------------
| MUST_PASS_ONE | As soon as a value was allowed to pass one of the filters, it is included in the overall result.
+---------------+--------------------------------------------------------------------------------------------------
The default is MUST_PASS_ALL.
To add filters after the FilterList instance has been created, use the following method:
void addFilter(Filter filter)
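You can further control the order in which the contained filters are executed by choosing an appropriate List implementation; an ArrayList, for example, guarantees that the filters are applied in the order in which they were added to the list.
Example: Using a filter list to combine single purpose filters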
List<Filter> filters = new ArrayList<Filter>();
Filter filter1 = new RowFilter(CompareFilter.CompareOp.GREATER_OR_EQUAL, new BinaryComparator(Bytes.toBytes("row-03")));
filters.add(filter1);
Filter filter2 = new RowFilter(CompareFilter.CompareOp.LESS_OR_EQUAL, new BinaryComparator(Bytes.toBytes("row-06")));
filters.add(filter2);
Filter filter3 = new QualifierFilter(CompareFilter.CompareOp.EQUAL, new RegexStringComparator("col-0[03]"));
filters.add(filter3);
FilterList filterList1 = new FilterList(filters);
Scan scan = new Scan();
scan.setFilter(filterList1);
ResultScanner scanner1 = table.getScanner(scan);
for (Result result : scanner1) {
for (Cell cell : result.rawCells()) {
System.out.println("Cell: " + cell + ", Value: " +
Bytes.toString(cell.getValueArray(), cell.getValueOffset(),
cell.getValueLength()));
}
}
scanner1.close();
FilterList filterList2 = new FilterList(FilterList.Operator.MUST_PASS_ONE, filters);
scan.setFilter(filterList2);
ResultScanner scanner2 = table.getScanner(scan);
for (Result result : scanner2) {
for (Cell cell : result.rawCells()) {
System.out.println("Cell: " + cell + ", Value: " +
Bytes.toString(cell.getValueArray(), cell.getValueOffset(),
cell.getValueLength()));
}
}
scanner2.close();
Output:
Adding rows to table...
Results of scan #1 - MUST_PASS_ALL:
Cell: row-03/colfam1:col-03/3/Put/vlen=9/seqid=0, Value: val-03.03
Cell: row-04/colfam1:col-03/3/Put/vlen=9/seqid=0, Value: val-04.03
Cell: row-05/colfam1:col-03/3/Put/vlen=9/seqid=0, Value: val-05.03
Cell: row-06/colfam1:col-03/3/Put/vlen=9/seqid=0, Value: val-06.03
Total cell count for scan #1: 4
Results of scan #2 - MUST_PASS_ONE:
Cell: row-01/colfam1:col-01/1/Put/vlen=9/seqid=0, Value: val-01.01
Cell: row-01/colfam1:col-02/2/Put/vlen=9/seqid=0, Value: val-01.02
...
Cell: row-10/colfam1:col-04/4/Put/vlen=9/seqid=0, Value: val-10.04
Cell: row-10/colfam1:col-05/5/Put/vlen=9/seqid=0, Value: val-10.05
Total cell count for scan #2: 50
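The two cell counts can be reproduced with a plain-Java model of the operators. This is a sketch under stated assumptions, not FilterList itself: FilterListSketch and its methods are hypothetical names, each filter is modelled as a BiPredicate over row key and qualifier, and the table layout (row-01 to row-10 with col-01 to col-05) is inferred from the output above.

```java
import java.util.List;
import java.util.function.BiPredicate;
import java.util.regex.Pattern;

public class FilterListSketch {
    public enum Operator { MUST_PASS_ALL, MUST_PASS_ONE }

    // Combines the individual filter decisions for one cell.
    public static boolean passes(Operator operator, List<BiPredicate<String, String>> filters,
                                 String row, String qualifier) {
        if (operator == Operator.MUST_PASS_ALL) {
            return filters.stream().allMatch(f -> f.test(row, qualifier));
        }
        return filters.stream().anyMatch(f -> f.test(row, qualifier));
    }

    // The three filters of the example above, modelled as predicates.
    public static List<BiPredicate<String, String>> exampleFilters() {
        Pattern qualifierPattern = Pattern.compile("col-0[03]");
        return List.of(
            (row, qualifier) -> row.compareTo("row-03") >= 0,
            (row, qualifier) -> row.compareTo("row-06") <= 0,
            (row, qualifier) -> qualifierPattern.matcher(qualifier).find());
    }

    // Counts the passing cells of the assumed 10 x 5 test table.
    public static int countCells(Operator operator, List<BiPredicate<String, String>> filters) {
        int count = 0;
        for (int r = 1; r <= 10; r++) {
            for (int c = 1; c <= 5; c++) {
                String row = String.format("row-%02d", r);
                String qualifier = String.format("col-%02d", c);
                if (passes(operator, filters, row, qualifier)) {
                    count++;
                }
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countCells(Operator.MUST_PASS_ALL, exampleFilters()));
        System.out.println(countCells(Operator.MUST_PASS_ONE, exampleFilters()));
    }
}
```

MUST_PASS_ONE lets every cell through here because each row key satisfies at least one of the two row conditions, which explains the count of 50 in the second scan.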
1.6 Custom Filters
-----------------------------------------------------------------------------------------------------------------------------------------
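You may need to implement a custom filter for your own requirements. You can either extend the abstract Filter class directly or extend FilterBase, which already provides default implementations for all of its methods.
The Filter class has the following structure: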
public abstract class Filter {
public enum ReturnCode {
INCLUDE, INCLUDE_AND_NEXT_COL, SKIP, NEXT_COL, NEXT_ROW,
SEEK_NEXT_USING_HINT
}
public void reset() throws IOException
public boolean filterRowKey(byte[] buffer, int offset, int length) throws IOException
public boolean filterAllRemaining() throws IOException
public ReturnCode filterKeyValue(final Cell v) throws IOException
public Cell transformCell(final Cell v) throws IOException
public void filterRowCells(List<Cell> kvs) throws IOException
public boolean hasFilterRow()
public boolean filterRow() throws IOException
public Cell getNextCellHint(final Cell currentKV) throws IOException
public boolean isFamilyEssential(byte[] name) throws IOException
public void setReversed(boolean reversed)
public boolean isReversed()
public byte[] toByteArray() throws IOException
public static Filter parseFrom(final byte[] pbBytes)
throws DeserializationException
}
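The class exposes a public enumeration, ReturnCode, which filterKeyValue() returns to tell the execution framework what to do next. Instead of iterating over all the data, a filter can choose to skip a single value, the remainder of a column, or the remainder of a row, which can make data retrieval considerably more efficient.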
Possible values for the Filter.ReturnCode enumeration
+-----------------------+--------------------------------------------------------------------------------------------------------------
| Return code | Description
+-----------------------+--------------------------------------------------------------------------------------------------------------
| INCLUDE | Include the given Cell instance in the result
+-----------------------+--------------------------------------------------------------------------------------------------------------
| INCLUDE_AND_NEXT_COL | Include current cell and move to next column, i.e. skip all further versions of the current.
+-----------------------+--------------------------------------------------------------------------------------------------------------
| SKIP | Skip the current cell and proceed to the next.
+-----------------------+--------------------------------------------------------------------------------------------------------------
| NEXT_COL | Skip the remainder of the current column, proceeding to the next. This is used by the TimestampsFilter.
+-----------------------+--------------------------------------------------------------------------------------------------------------
| NEXT_ROW | Similar to the previous, but skips the remainder of the current row, moving to the next.
| | The RowFilter makes use of this return code, for example.
+-----------------------+--------------------------------------------------------------------------------------------------------------
| SEEK_NEXT_USING_HINT | Some filters want to skip a variable number of cells and use this return code to indicate that the framework
| | should use the getNextCellHint() method to determine where to skip to. The ColumnPrefixFilter, for example,
| | uses this feature.
+-----------------------+--------------------------------------------------------------------------------------------------------------
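To make the return codes in the table more concrete, here is a minimal, HBase-free sketch of how a scan loop might react to them. It is an illustration only: the ReturnCodeModel class, its Cell type, and the scan() helper are hypothetical, the cells are assumed to be sorted by row, then column, with the newest version first, and SEEK_NEXT_USING_HINT is left out to keep the model short.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

// Hypothetical model of a scan loop; not the HBase implementation.
final class ReturnCodeModel {
    enum ReturnCode { INCLUDE, INCLUDE_AND_NEXT_COL, SKIP, NEXT_COL, NEXT_ROW }

    static final class Cell {
        final String row;
        final String column;
        final int version;

        Cell(String row, String column, int version) {
            this.row = row;
            this.column = column;
            this.version = version;
        }
    }

    // Two rows; column c1 of row r1 has two versions.
    static List<Cell> sample() {
        return Arrays.asList(
            new Cell("r1", "c1", 2), new Cell("r1", "c1", 1), new Cell("r1", "c2", 1),
            new Cell("r2", "c1", 1), new Cell("r2", "c2", 1));
    }

    // Asks the filter about each cell and then skips ahead as far as the code allows.
    static List<Cell> scan(List<Cell> cells, Function<Cell, ReturnCode> filter) {
        List<Cell> result = new ArrayList<>();
        int i = 0;
        while (i < cells.size()) {
            Cell current = cells.get(i);
            ReturnCode code = filter.apply(current);
            if (code == ReturnCode.INCLUDE || code == ReturnCode.INCLUDE_AND_NEXT_COL) {
                result.add(current);
            }
            i++;
            if (code == ReturnCode.INCLUDE_AND_NEXT_COL || code == ReturnCode.NEXT_COL) {
                // Skip the remaining versions of the same column.
                while (i < cells.size() && sameColumn(cells.get(i), current)) {
                    i++;
                }
            } else if (code == ReturnCode.NEXT_ROW) {
                // Skip everything that is left of the current row.
                while (i < cells.size() && cells.get(i).row.equals(current.row)) {
                    i++;
                }
            }
        }
        return result;
    }

    private static boolean sameColumn(Cell a, Cell b) {
        return a.row.equals(b.row) && a.column.equals(b.column);
    }

    public static void main(String[] args) {
        int newestOnly = scan(sample(), cell -> ReturnCode.INCLUDE_AND_NEXT_COL).size();
        int withoutR1 = scan(sample(),
            cell -> cell.row.equals("r1") ? ReturnCode.NEXT_ROW : ReturnCode.INCLUDE).size();
        System.out.println(newestOnly + " " + withoutR1);  // 4 2
    }
}
```

In the second call the filter is consulted only once for row r1, which is the efficiency gain the return codes are meant to provide.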
Example: Implements a filter that lets certain rows pass
public class CustomFilter extends FilterBase {
private byte[] value = null;
private boolean filterRow = true;
public CustomFilter() {
super();
}
public CustomFilter(byte[] value) {
this.value = value;
}
@Override
public void reset() {
this.filterRow = true;
}
@Override
public ReturnCode filterKeyValue(Cell cell) {
if (CellUtil.matchingValue(cell, value)) {
filterRow = false;
}
return ReturnCode.INCLUDE;
}
@Override
public boolean filterRow() {
return filterRow;
}
@Override
public byte [] toByteArray() {
FilterProtos.CustomFilter.Builder builder =
FilterProtos.CustomFilter.newBuilder();
if (value != null) builder.setValue(ByteStringer.wrap(value));
return builder.build().toByteArray();
}
// No @Override here: parseFrom() is static and cannot override anything.
public static Filter parseFrom(final byte[] pbBytes) throws DeserializationException {
FilterProtos.CustomFilter proto;
try {
proto = FilterProtos.CustomFilter.parseFrom(pbBytes);
} catch (InvalidProtocolBufferException e) {
throw new DeserializationException(e);
}
return new CustomFilter(proto.getValue().toByteArray());
}
}
//Example using a custom filter
List<Filter> filters = new ArrayList<Filter>();
Filter filter1 = new CustomFilter(Bytes.toBytes("val-05.05"));
filters.add(filter1);
Filter filter2 = new CustomFilter(Bytes.toBytes("val-02.07"));
filters.add(filter2);
Filter filter3 = new CustomFilter(Bytes.toBytes("val-09.01"));
filters.add(filter3);
FilterList filterList = new FilterList(
FilterList.Operator.MUST_PASS_ONE, filters);
Scan scan = new Scan();
scan.setFilter(filterList);
ResultScanner scanner = table.getScanner(scan);
for (Result result : scanner) {
for (Cell cell : result.rawCells()) {
System.out.println("Cell: " + cell + ", Value: " +
Bytes.toString(cell.getValueArray(), cell.getValueOffset(),
cell.getValueLength()));
}
}
scanner.close();
Output:
Adding rows to table...
Results of scan:
Cell: row-02/colfam1:col-01/1/Put/vlen=9/seqid=0, Value: val-02.01
Cell: row-02/colfam1:col-02/2/Put/vlen=9/seqid=0, Value: val-02.02
...
Cell: row-02/colfam1:col-06/6/Put/vlen=9/seqid=0, Value: val-02.06
Cell: row-02/colfam1:col-07/7/Put/vlen=9/seqid=0, Value: val-02.07
Cell: row-02/colfam1:col-08/8/Put/vlen=9/seqid=0, Value: val-02.08
...
Cell: row-05/colfam1:col-04/4/Put/vlen=9/seqid=0, Value: val-05.04
Cell: row-05/colfam1:col-05/5/Put/vlen=9/seqid=0, Value: val-05.05
Cell: row-05/colfam1:col-06/6/Put/vlen=9/seqid=0, Value: val-05.06
...
Cell: row-05/colfam1:col-10/10/Put/vlen=9/seqid=0, Value: val-05.10
Cell: row-09/colfam1:col-01/1/Put/vlen=9/seqid=0, Value: val-09.01
Cell: row-09/colfam1:col-02/2/Put/vlen=9/seqid=0, Value: val-09.02
...
Cell: row-09/colfam1:col-09/9/Put/vlen=9/seqid=0, Value: val-09.09
Cell: row-09/colfam1:col-10/10/Put/vlen=9/seqid=0, Value: val-09.10
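The output contains complete rows because CustomFilter includes every cell and only decides in filterRow() whether the row as a whole is dropped. The following HBase-free sketch replays that per-row cycle (reset, one call per cell, final row decision) for three filters combined with OR. It assumes a table of rows row-01 to row-10 with columns col-01 to col-10 and values of the form val-RR.CC; the CustomFilterModel and RowState names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the CustomFilter row lifecycle; not HBase code.
final class CustomFilterModel {
    // Mirrors the state kept by CustomFilter for one wanted value.
    static final class RowState {
        private final String wanted;
        private boolean filterRow = true;

        RowState(String wanted) {
            this.wanted = wanted;
        }

        void reset() {
            filterRow = true;  // assume the row is dropped until a match is seen
        }

        void filterKeyValue(String value) {
            if (wanted.equals(value)) {
                filterRow = false;  // one matching value keeps the whole row
            }
        }

        boolean filterRow() {
            return filterRow;
        }
    }

    static String pad(int n) {
        return (n < 10 ? "0" : "") + n;
    }

    // Returns the keys of the rows that at least one filter lets pass.
    static List<String> scan(String... wantedValues) {
        List<RowState> filters = new ArrayList<>();
        for (String wanted : wantedValues) {
            filters.add(new RowState(wanted));
        }
        List<String> keptRows = new ArrayList<>();
        for (int r = 1; r <= 10; r++) {
            for (RowState filter : filters) {
                filter.reset();
            }
            for (int c = 1; c <= 10; c++) {
                String value = "val-" + pad(r) + "." + pad(c);
                for (RowState filter : filters) {
                    filter.filterKeyValue(value);
                }
            }
            boolean keep = false;
            for (RowState filter : filters) {
                if (!filter.filterRow()) {
                    keep = true;  // OR semantics, as with MUST_PASS_ONE
                }
            }
            if (keep) {
                keptRows.add("row-" + pad(r));
            }
        }
        return keptRows;
    }

    public static void main(String[] args) {
        // prints [row-02, row-05, row-09]
        System.out.println(scan("val-05.05", "val-02.07", "val-09.01"));
    }
}
```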
Custom Filter Loading
-------------------------------------------------------------------------------------------------------------------------------------
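Once the filter is written, it has to be deployed to HBase. First compile the filter class, then package it into a JAR and make sure the region servers can load it.
You can use your build system to prepare the JAR file and a configuration management system to distribute it to every region server. Once the file is distributed, there are two ways to load the JAR:
Static Configuration:
In this case, add the JAR file to the hbase-env.sh file, for example: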
# Extra Java CLASSPATH elements. Optional.
# export HBASE_CLASSPATH=
export HBASE_CLASSPATH="/hbase-book/ch04/target/hbase-bookch04-2.0.jar"
Dynamic Loading:
This approach loads JAR files from a cluster-wide JAR directory shared in HDFS. The hbase-default.xml file contains the following property:
<property>
<name>hbase.dynamic.jars.dir</name>
<value>${hbase.rootdir}/lib</value>
</property>
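The default points to ${hbase.rootdir}/lib, which usually resolves to /hbase/lib/ in HDFS. The full path looks similar to hdfs://master.foobar.com:9000/hbase/lib. If this directory exists and contains files ending in .jar, the servers load those files and make the classes they contain available.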
1.7 Filter Parser Utility
-----------------------------------------------------------------------------------------------------------------------------------------
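The client-side filter package contains one more helper class, named ParseFilter. It is used wherever a filter has to be described as text and then converted into a Java class. This typically happens on gateway servers such as REST or Thrift. The HBase Shell also uses this class so that shell users can specify a filter on the command line, which is then executed as part of the subsequent scan or get operation. For example: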
hbase(main):001:0> scan 'testtable', { FILTER => "PrefixFilter('row-2') AND QualifierFilter(<=,'binary:col-2')" }
Output:
ROW COLUMN+CELL
row-20 column=colfam1:col-0, timestamp=7, value=val-46
row-21 column=colfam1:col-0, timestamp=7, value=val-87
row-21 column=colfam1:col-2, timestamp=5, value=val-26
...
row-28 column=colfam1:col-2, timestamp=3, value=val-74
row-29 column=colfam1:col-1, timestamp=0, value=val-86
row-29 column=colfam1:col-2, timestamp=3, value=val-21
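10 row(s) in 0.0170 seconds
Note the "binary:col-2" parameter. The part after the colon is the value the filter works with. The part before the colon selects the comparator, which the filter parser lets you specify for filters based on CompareFilter. The supported comparator prefixes are listed in the following table: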
String representation of Comparator types
+---------------+----------------------------
| String | Type
+---------------+----------------------------
| binary | BinaryComparator
+---------------+----------------------------
| binaryprefix | BinaryPrefixComparator
+---------------+----------------------------
| regexstring | RegexStringComparator
+---------------+----------------------------
| substring | SubstringComparator
+---------------+----------------------------
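Because a comparison filter also needs a comparison operation, the operators have a string representation as well. In the example above, "<=" means less than or equal. The following table lists the available string representations of the operators: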
String representation of compare operation
+---------------+----------------------------
| String | Type
+---------------+----------------------------
| < | CompareOp.LESS
+---------------+----------------------------
| <= | CompareOp.LESS_OR_EQUAL
+---------------+----------------------------
| > | CompareOp.GREATER
+---------------+----------------------------
| >= | CompareOp.GREATER_OR_EQUAL
+---------------+----------------------------
| = | CompareOp.EQUAL
+---------------+----------------------------
| != | CompareOp.NOT_EQUAL
+---------------+----------------------------
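As a rough illustration of the two tables, the sketch below splits an argument such as "binary:col-2" into its comparator prefix and value, maps the prefix to the class name from the first table, and evaluates an operator string from the second table. It is not how ParseFilter is implemented. The CompareArgumentModel class and its methods are hypothetical, only the binary comparator is evaluated, and String.compareTo() is used in place of a byte-wise comparison, which is assumed to be equivalent only for ASCII values.

```java
// Hypothetical model of the string forms in the two tables; not HBase code.
final class CompareArgumentModel {
    // Maps a comparator prefix to the class name listed in the table.
    static String comparatorType(String prefix) {
        switch (prefix) {
            case "binary": return "BinaryComparator";
            case "binaryprefix": return "BinaryPrefixComparator";
            case "regexstring": return "RegexStringComparator";
            case "substring": return "SubstringComparator";
            default: throw new IllegalArgumentException("Unknown comparator: " + prefix);
        }
    }

    // Splits "binary:col-2" at the first colon into {prefix, value}.
    static String[] splitArgument(String argument) {
        int colon = argument.indexOf(':');
        if (colon < 0) {
            throw new IllegalArgumentException("Missing comparator prefix: " + argument);
        }
        return new String[] { argument.substring(0, colon), argument.substring(colon + 1) };
    }

    // Evaluates "cellValue <op> value" for the binary comparator only.
    static boolean matches(String op, String cellValue, String argument) {
        String[] parts = splitArgument(argument);
        if (!"binary".equals(parts[0])) {
            throw new UnsupportedOperationException("Only 'binary' is modelled here");
        }
        int cmp = cellValue.compareTo(parts[1]);
        switch (op) {
            case "<": return cmp < 0;
            case "<=": return cmp <= 0;
            case ">": return cmp > 0;
            case ">=": return cmp >= 0;
            case "=": return cmp == 0;
            case "!=": return cmp != 0;
            default: throw new IllegalArgumentException("Unknown operator: " + op);
        }
    }

    public static void main(String[] args) {
        System.out.println(comparatorType(splitArgument("binary:col-2")[0]));  // BinaryComparator
        System.out.println(matches("<=", "col-0", "binary:col-2"));            // true
        System.out.println(matches("<=", "col-3", "binary:col-2"));            // false
    }
}
```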
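The filter parser also supports a few text tokens that are translated into filter classes. You can combine filters with the AND and OR keywords, which are subsequently translated into FilterList instances set to either MUST_PASS_ALL or MUST_PASS_ONE.
Example:
hbase(main):001:0> scan 'testtable',{ FILTER => "(PrefixFilter('row-2') AND (QualifierFilter(>=, 'binary:col-2'))) AND (TimestampsFilter(1, 5))" }
Output: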
ROW COLUMN+CELL
row-2 column=colfam1:col-9, timestamp=5, value=val-31
row-21 column=colfam1:col-2, timestamp=5, value=val-26
row-23 column=colfam1:col-5, timestamp=5, value=val-55
row-28 column=colfam1:col-5, timestamp=1, value=val-54
4 row(s) in 0.3190 seconds
Finally, you can use the SKIP and WHILE keywords to represent the SkipFilter and the WhileMatchFilter.
Example:
hbase(main):001:0> scan 'testtable',{ FILTER => "SKIP ValueFilter(>=, 'binary:val-5') " }
Output:
ROW COLUMN+CELL
row-11 column=colfam1:col-0, timestamp=8, value=val-82
row-48 column=colfam1:col-3, timestamp=6, value=val-55
row-48 column=colfam1:col-7, timestamp=3, value=val-80
row-48 column=colfam1:col-8, timestamp=2, value=val-65
row-7 column=colfam1:col-9, timestamp=6, value=val-57
3 row(s) in 0.0150 seconds
Precedence of string keywords
+---------------+-------------------------------------------------------------------------------------------
| Keyword | Description
+---------------+-------------------------------------------------------------------------------------------
| SKIP/WHILE | Wrap filter into SkipFilter, or WhileMatchFilter instance.
+---------------+-------------------------------------------------------------------------------------------
| AND | Add both filters left and right of keyword to FilterList instance using MUST_PASS_ALL.
+---------------+-------------------------------------------------------------------------------------------
| OR | Add both filters left and right of keyword to FilterList instance using MUST_PASS_ONE.
+---------------+-------------------------------------------------------------------------------------------
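The following sketch shows how such keyword precedence could group an expression, assuming the table lists the keywords from highest to lowest precedence. It is a toy recursive-descent parser, not the ParseFilter implementation: filter names are single whitespace-separated tokens, parentheses are not supported, and the FilterPrecedenceModel class with its group() method is hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical toy parser illustrating SKIP/WHILE > AND > OR; not HBase code.
final class FilterPrecedenceModel {
    private final List<String> tokens;
    private int pos = 0;

    private FilterPrecedenceModel(String expression) {
        this.tokens = Arrays.asList(expression.trim().split("\\s+"));
    }

    // Returns a textual description of how the expression is grouped.
    static String group(String expression) {
        FilterPrecedenceModel parser = new FilterPrecedenceModel(expression);
        String result = parser.parseOr();
        if (parser.pos != parser.tokens.size()) {
            throw new IllegalArgumentException("Unexpected token: " + parser.tokens.get(parser.pos));
        }
        return result;
    }

    // OR binds weakest: its operands are complete AND groups.
    private String parseOr() {
        List<String> operands = new ArrayList<>();
        operands.add(parseAnd());
        while (peekIs("OR")) {
            pos++;
            operands.add(parseAnd());
        }
        return operands.size() == 1
            ? operands.get(0)
            : "MUST_PASS_ONE(" + String.join(", ", operands) + ")";
    }

    // AND binds tighter than OR: its operands are (possibly wrapped) filters.
    private String parseAnd() {
        List<String> operands = new ArrayList<>();
        operands.add(parseUnary());
        while (peekIs("AND")) {
            pos++;
            operands.add(parseUnary());
        }
        return operands.size() == 1
            ? operands.get(0)
            : "MUST_PASS_ALL(" + String.join(", ", operands) + ")";
    }

    // SKIP and WHILE bind tightest: they wrap the filter that follows.
    private String parseUnary() {
        if (peekIs("SKIP")) {
            pos++;
            return "SkipFilter(" + parseUnary() + ")";
        }
        if (peekIs("WHILE")) {
            pos++;
            return "WhileMatchFilter(" + parseUnary() + ")";
        }
        if (pos >= tokens.size() || tokens.get(pos).isEmpty()
                || tokens.get(pos).equals("AND") || tokens.get(pos).equals("OR")) {
            throw new IllegalArgumentException("Filter name expected at token " + pos);
        }
        return tokens.get(pos++);
    }

    private boolean peekIs(String keyword) {
        return pos < tokens.size() && tokens.get(pos).equals(keyword);
    }

    public static void main(String[] args) {
        // prints MUST_PASS_ONE(A, MUST_PASS_ALL(SkipFilter(B), C))
        System.out.println(group("A OR SKIP B AND C"));
    }
}
```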
In code, you can call the following methods to parse a string into a filter instance:
Filter parseFilterString(String filterString) throws CharacterCodingException
Filter parseFilterString(byte[] filterStringAsByteArray) throws CharacterCodingException
Filter parseSimpleFilterExpression(byte[] filterStringAsByteArray) throws CharacterCodingException
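By default, ParseFilter can only parse the filters that ship with HBase, and among those it does not support FirstKeyValueMatchingQualifiersFilter, FuzzyRowFilter, and RandomRowFilter.
In your own code, you can register custom filters and retrieve the list of supported filters with the following methods: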
static Map<String, String> getAllFilters()
Set<String> getSupportedFilters()
static void registerFilter(String name, String filterClass)
End of this part.
References: