Parameter basics
Two parameter classes show up in almost every filter, so they are introduced together here:
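(1) Comparison operators
CompareFilter.CompareOp
A comparison operator defines the relation to test. The available values are:
- EQUAL: equal to
- GREATER: greater than
- GREATER_OR_EQUAL: greater than or equal to
- LESS: less than
- LESS_OR_EQUAL: less than or equal to
- NOT_EQUAL: not equal to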
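(2) Comparators
ByteArrayComparable
Comparators make different kinds of matching possible. The following subclasses are available:
- BinaryComparator: compares against the complete byte array
- BinaryPrefixComparator: compares against a prefix of the byte array
- BitComparator: bitwise comparison
- NullComparator: checks whether the value is null
- RegexStringComparator: regular-expression match
- SubstringComparator: substring match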
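To make the interplay between an operator and a comparator concrete, here is a minimal plain-Java model. It does not use the HBase classes: the names CompareSketch, compareBinary, comparePrefix and matches are made up for illustration, and the only assumption carried over from HBase is that byte arrays are ordered lexicographically as unsigned bytes.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Plain-Java model of "operator + comparator"; not the HBase implementation.
class CompareSketch {
    enum Op { EQUAL, NOT_EQUAL, GREATER, GREATER_OR_EQUAL, LESS, LESS_OR_EQUAL }

    static byte[] bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Whole-array comparison, byte by byte, treating bytes as unsigned.
    static int compareBinary(byte[] stored, byte[] expected) {
        return Arrays.compareUnsigned(stored, expected);
    }

    // Prefix comparison: only the leading expected.length bytes of the stored value take part.
    static int comparePrefix(byte[] stored, byte[] expected) {
        int n = Math.min(stored.length, expected.length);
        return Arrays.compareUnsigned(Arrays.copyOf(stored, n), expected);
    }

    // Turns a comparison result (stored vs. expected) into a match decision.
    static boolean matches(Op op, int cmp) {
        switch (op) {
            case EQUAL: return cmp == 0;
            case NOT_EQUAL: return cmp != 0;
            case GREATER: return cmp > 0;
            case GREATER_OR_EQUAL: return cmp >= 0;
            case LESS: return cmp < 0;
            default: return cmp <= 0; // LESS_OR_EQUAL
        }
    }

    public static void main(String[] args) {
        // "name_cn" starts with "name", so the prefix comparison reports equality.
        System.out.println(matches(Op.EQUAL, comparePrefix(bytes("name_cn"), bytes("name"))));
        // Byte-wise ordering is not numeric ordering: "100" sorts before "21".
        System.out.println(matches(Op.LESS, compareBinary(bytes("100"), bytes("21"))));
    }
}
```

Both lines print true. The second one is worth keeping in mind when range operators such as GREATER are applied to numbers stored as strings.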
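1. FilterList
A FilterList represents a chain of filters. It holds a set of filters to apply to the target data set, and the filters are combined with either an "AND" relation (FilterList.Operator.MUST_PASS_ALL) or an "OR" relation (FilterList.Operator.MUST_PASS_ONE).
Example code from the official site, showing two filters combined with "OR":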
- FilterList list = new FilterList(FilterList.Operator.MUST_PASS_ONE); // data passes if it satisfies at least one filter in the list
- SingleColumnValueFilter filter1 = new SingleColumnValueFilter(cf, column, CompareOp.EQUAL, Bytes.toBytes("my value"));
- list.addFilter(filter1); // FilterList exposes addFilter(), not add()
- SingleColumnValueFilter filter2 = new SingleColumnValueFilter(cf, column, CompareOp.EQUAL, Bytes.toBytes("my other value"));
- list.addFilter(filter2);
- Scan scan = new Scan();
- scan.setFilter(list);
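The difference between the two operators can be sketched without HBase by treating each filter as a simple predicate. This is a simplified model (real filters are evaluated cell by cell on the server); FilterListSketch, passes and exampleFilters are hypothetical names used only for this illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Simplified model: a filter is a predicate, and the operator decides how predicates combine.
class FilterListSketch {
    enum Operator { MUST_PASS_ALL, MUST_PASS_ONE }

    static <T> boolean passes(Operator op, List<Predicate<T>> filters, T value) {
        return op == Operator.MUST_PASS_ALL
                ? filters.stream().allMatch(f -> f.test(value))   // "AND"
                : filters.stream().anyMatch(f -> f.test(value));  // "OR"
    }

    // The two equality checks from the example above, expressed as predicates.
    static List<Predicate<String>> exampleFilters() {
        return Arrays.asList(
                v -> v.equals("my value"),
                v -> v.equals("my other value"));
    }

    public static void main(String[] args) {
        // Accepted under "OR": the first predicate matches.
        System.out.println(passes(Operator.MUST_PASS_ONE, exampleFilters(), "my value"));
        // Rejected under "AND": the second predicate does not match.
        System.out.println(passes(Operator.MUST_PASS_ALL, exampleFilters(), "my value"));
    }
}
```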
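2. Column value filter: SingleColumnValueFilter
SingleColumnValueFilter tests a column value for equality (CompareOp.EQUAL), inequality (CompareOp.NOT_EQUAL), or a one-sided range (e.g., CompareOp.GREATER).
Constructors:
(1) The value to compare against is a byte array
SingleColumnValueFilter(byte[] family, byte[] qualifier, CompareFilter.CompareOp compareOp, byte[] value)
(2) The value to compare against is wrapped in a comparator (comparators are listed in the parameter basics at the top)
SingleColumnValueFilter(byte[] family, byte[] qualifier, CompareFilter.CompareOp compareOp, ByteArrayComparable comparator)
The test runs against the table user.
Java test code: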
- Table table = connection.getTable(TableName.valueOf("user"));
- SingleColumnValueFilter scvf = new SingleColumnValueFilter(Bytes.toBytes("account"), Bytes.toBytes("name"),
- CompareOp.EQUAL, Bytes.toBytes("zhangsan"));
- scvf.setFilterIfMissing(true); // default is false, which also returns rows that lack this column; true returns only rows where name=zhangsan
- Scan scan = new Scan();
- scan.setFilter(scvf);
- ResultScanner resultScanner = table.getScanner(scan);
- for (Result result : resultScanner) {
- List<Cell> cells= result.listCells();
- for (Cell cell : cells) {
- String row = Bytes.toString(result.getRow());
- String family1 = Bytes.toString(CellUtil.cloneFamily(cell));
- String qualifier = Bytes.toString(CellUtil.cloneQualifier(cell));
- String value = Bytes.toString(CellUtil.cloneValue(cell));
- System.out.println("[row:"+row+"],[family:"+family1+"],[qualifier:"+qualifier+"]"
- + ",[value:"+value+"],[time:"+cell.getTimestamp()+"]");
- }
- }
- [row:zhangsan_1495527850824],[family:account],[qualifier:country],[value:china],[time:1495636452285]
- [row:zhangsan_1495527850824],[family:account],[qualifier:name],[value:zhangsan],[time:1495556648729]
The two lines above are the result with setFilterIfMissing(true). With the default setFilterIfMissing(false), the result is as follows: the matching row zhangsan_1495527850824 (account:name = zhangsan) is returned, rows that have no account:name column are returned as well, and the row with name=lisi is dropped because it does not match.
- [row:lisi_1495527849910],[family:account],[qualifier:idcard],[value:42963319861234561230],[time:1495556647872]
- [row:lisi_1495527850111],[family:account],[qualifier:password],[value:123451231236],[time:1495556648013]
- [row:lisi_1495527850114],[family:address],[qualifier:city],[value:黄埔],[time:1495556648017]
- [row:lisi_1495527850136],[family:address],[qualifier:province],[value:shanghai],[time:1495556648041]
- [row:lisi_1495527850144],[family:info],[qualifier:age],[value:21],[time:1495556648045]
- [row:lisi_1495527850154],[family:info],[qualifier:sex],[value:女],[time:1495556648056]
- [row:lisi_1495527850159],[family:userid],[qualifier:id],[value:002],[time:1495556648060]
- [row:wangwu_1495595824517],[family:userid],[qualifier:id],[value:009],[time:1495624624131]
- [row:zhangsan_1495527850759],[family:account],[qualifier:idcard],[value:9897645464646],[time:1495556648664]
- [row:zhangsan_1495527850759],[family:account],[qualifier:passport],[value:5689879898],[time:1495636370056]
- [row:zhangsan_1495527850824],[family:account],[qualifier:country],[value:china],[time:1495636452285]
- [row:zhangsan_1495527850824],[family:account],[qualifier:name],[value:zhangsan],[time:1495556648729]
- [row:zhangsan_1495527850951],[family:address],[qualifier:province],[value:guangdong],[time:1495556648855]
- [row:zhangsan_1495527850975],[family:info],[qualifier:age],[value:100],[time:1495556648878]
- [row:zhangsan_1495527851080],[family:info],[qualifier:sex],[value:男],[time:1495556648983]
- [row:zhangsan_1495527851095],[family:userid],[qualifier:id],[value:001],[time:1495556648996]
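The effect of setFilterIfMissing can be modelled in a few lines of plain Java. This is only a sketch of the row-level decision, not HBase code; FilterIfMissingSketch, scan and sampleRows are hypothetical names, and the row keys are shortened stand-ins for the ones in the test table.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Models the row-level decision of a single-column value filter with EQUAL.
class FilterIfMissingSketch {
    // rows: row key -> (column -> value). Returns the row keys that would be emitted.
    static List<String> scan(Map<String, Map<String, String>> rows,
                             String column, String expected, boolean filterIfMissing) {
        List<String> kept = new ArrayList<>();
        for (Map.Entry<String, Map<String, String>> row : rows.entrySet()) {
            String value = row.getValue().get(column);
            // Column absent: keep the row unless filterIfMissing is set.
            // Column present: keep the row only when the value matches.
            boolean keep = (value == null) ? !filterIfMissing : value.equals(expected);
            if (keep) {
                kept.add(row.getKey());
            }
        }
        return kept;
    }

    static Map<String, Map<String, String>> sampleRows() {
        Map<String, Map<String, String>> rows = new LinkedHashMap<>();
        rows.put("lisi_1", Map.of("account:name", "lisi"));
        rows.put("lisi_2", Map.of("info:age", "21"));
        rows.put("zhangsan_1", Map.of("account:name", "zhangsan", "account:country", "china"));
        return rows;
    }

    public static void main(String[] args) {
        System.out.println(scan(sampleRows(), "account:name", "zhangsan", true));  // [zhangsan_1]
        System.out.println(scan(sampleRows(), "account:name", "zhangsan", false)); // [lisi_2, zhangsan_1]
    }
}
```

In both cases the row whose name is lisi is dropped; the flag only changes what happens to rows that do not have the column at all.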
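3. Key-value metadata
Because HBase stores its data internally as key-value pairs, key-value metadata filters evaluate whether the keys of a row (ColumnFamily:Qualifier) exist, rather than looking at the values.
3.1. FamilyFilter: filtering by column family
Constructor:
FamilyFilter(CompareFilter.CompareOp familyCompareOp, ByteArrayComparable familyComparator)
Code: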
- public static ResultScanner getDataFamilyFilter(String tableName, String family) throws IOException {
- Table table = connection.getTable(TableName.valueOf(tableName)); // "user" in this test
- FamilyFilter ff = new FamilyFilter(CompareOp.EQUAL,
- new BinaryComparator(Bytes.toBytes(family))); // "account" in this test; if the table has no such column family, the result is empty
- // new BinaryPrefixComparator(value) // matches a byte-array prefix
- // new RegexStringComparator(expr) // regular-expression match
- // new SubstringComparator(substr) // substring match
- Scan scan = new Scan();
- // scan.addFamily(Bytes.toBytes(family)) achieves the same thing
- scan.setFilter(ff);
- ResultScanner resultScanner = table.getScanner(scan);
- return resultScanner;
- }
Result for the column family account:
- [row:lisi_1495527849910],[family:account],[qualifier:idcard],[value:42963319861234561230],[time:1495556647872]
- [row:lisi_1495527850081],[family:account],[qualifier:name],[value:lisi],[time:1495556647984]
- [row:lisi_1495527850111],[family:account],[qualifier:password],[value:123451231236],[time:1495556648013]
- [row:zhangsan_1495527850759],[family:account],[qualifier:idcard],[value:9897645464646],[time:1495556648664]
- [row:zhangsan_1495527850759],[family:account],[qualifier:passport],[value:5689879898],[time:1495636370056]
- [row:zhangsan_1495527850824],[family:account],[qualifier:country],[value:china],[time:1495636452285]
- [row:zhangsan_1495527850824],[family:account],[qualifier:name],[value:zhangsan],[time:1495556648729]
3.2. QualifierFilter: filtering by qualifier (column)
Constructor:
QualifierFilter(CompareFilter.CompareOp op, ByteArrayComparable qualifierComparator)
- Table table = connection.getTable(TableName.valueOf("user"));
- QualifierFilter ff = new QualifierFilter(
- CompareOp.EQUAL, new BinaryComparator(Bytes.toBytes("name")));
- // new BinaryPrefixComparator(value) // matches a byte-array prefix
- // new RegexStringComparator(expr) // regular-expression match
- // new SubstringComparator(substr) // substring match
- Scan scan = new Scan();
- // for a single known column, scan.addColumn(family, qualifier) achieves a similar effect
- scan.setFilter(ff);
- ResultScanner resultScanner = table.getScanner(scan);
Result:
- [row:lisi_1495527850081],[family:account],[qualifier:name],[value:lisi],[time:1495556647984]
- [row:zhangsan_1495527850824],[family:account],[qualifier:name],[value:zhangsan],[time:1495556648729]
3.3. ColumnPrefixFilter: filtering by column-name (qualifier) prefix (QualifierFilter can do the same)
Constructor:
ColumnPrefixFilter(byte[] prefix)
- Table table = connection.getTable(TableName.valueOf("user"));
- ColumnPrefixFilter ff = new ColumnPrefixFilter(Bytes.toBytes("name"));
- Scan scan = new Scan();
- // QualifierFilter with new BinaryPrefixComparator(...) achieves the same thing
- scan.setFilter(ff);
- ResultScanner resultScanner = table.getScanner(scan);
Result:
- [row:lisi_1495527850081],[family:account],[qualifier:name],[value:lisi],[time:1495556647984]
- [row:zhangsan_1495527850824],[family:account],[qualifier:name],[value:zhangsan],[time:1495556648729]
3.4. MultipleColumnPrefixFilter: filtering by several column-name (qualifier) prefixes
MultipleColumnPrefixFilter behaves much like ColumnPrefixFilter but accepts several prefixes.
- byte[][] prefixes = new byte[][] {Bytes.toBytes("name"), Bytes.toBytes("age")};
- // returns, for every row, the cells whose column name starts with name or age
- MultipleColumnPrefixFilter ff = new MultipleColumnPrefixFilter(prefixes);
- Scan scan = new Scan();
- scan.setFilter(ff);
- ResultScanner rs = table.getScanner(scan);
Result:
- [row:lisi_1495527850081],[family:account],[qualifier:name],[value:lisi],[time:1495556647984]
- [row:lisi_1495527850144],[family:info],[qualifier:age],[value:21],[time:1495556648045]
- [row:zhangsan_1495527850824],[family:account],[qualifier:name],[value:zhangsan],[time:1495556648729]
- [row:zhangsan_1495527850975],[family:info],[qualifier:age],[value:100],[time:1495556648878]
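3.5. ColumnRangeFilter: filtering by column range
Constructor:
ColumnRangeFilter(byte[] minColumn, boolean minColumnInclusive, byte[] maxColumn, boolean maxColumnInclusive)
Parameters:
- minColumn: lower bound of the column range; if null, there is no lower bound;
- minColumnInclusive: whether the range includes minColumn;
- maxColumn: upper bound of the column range; if null, there is no upper bound;
- maxColumnInclusive: whether the range includes maxColumn.
Code: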
- Table table = connection.getTable(TableName.valueOf("user"));
- byte[] startColumn = Bytes.toBytes("a");
- byte[] endColumn = Bytes.toBytes("d");
- // returns the cells whose column name sorts between "a" and "d", both bounds inclusive
- ColumnRangeFilter ff = new ColumnRangeFilter(startColumn, true, endColumn, true);
- Scan scan = new Scan();
- scan.setFilter(ff);
- ResultScanner rs = table.getScanner(scan);
Result:
- [row:lisi_1495527850114],[family:address],[qualifier:city],[value:黄埔],[time:1495556648017]
- [row:lisi_1495527850144],[family:info],[qualifier:age],[value:21],[time:1495556648045]
- [row:zhangsan_1495527850824],[family:account],[qualifier:country],[value:china],[time:1495636452285]
- [row:zhangsan_1495527850975],[family:info],[qualifier:age],[value:100],[time:1495556648878]
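The range check behind these four parameters can be sketched in plain Java. The model below is an illustration, not the HBase implementation: ColumnRangeSketch and inRange are hypothetical names, and column names are assumed to be compared lexicographically as unsigned bytes.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Models the inclusive/exclusive range test on a column name.
class ColumnRangeSketch {
    static boolean inRange(String qualifier, String min, boolean minInclusive,
                           String max, boolean maxInclusive) {
        byte[] q = qualifier.getBytes(StandardCharsets.UTF_8);
        if (min != null) { // null means "no lower bound"
            int c = Arrays.compareUnsigned(q, min.getBytes(StandardCharsets.UTF_8));
            if (c < 0 || (c == 0 && !minInclusive)) {
                return false;
            }
        }
        if (max != null) { // null means "no upper bound"
            int c = Arrays.compareUnsigned(q, max.getBytes(StandardCharsets.UTF_8));
            if (c > 0 || (c == 0 && !maxInclusive)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        for (String q : new String[] {"age", "city", "country", "idcard", "name"}) {
            System.out.println(q + " -> " + inRange(q, "a", true, "d", true));
        }
    }
}
```

With the bounds "a" and "d", the columns age, city and country fall inside the range while idcard and name do not, which agrees with the result listed above. Note that an inclusive upper bound of "d" admits only the column named exactly d; a name such as da already sorts after it.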
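4. RowKey
When you need to fetch a range of rows based on the row key, using the Scan startRow and stopRow is more efficient. However, startRow and stopRow can only match the leading characters of the row key, not characters in the middle of it:
byte[] startRow = Bytes.toBytes("azha");
byte[] stopRow = Bytes.toBytes("dddf"); // the stop row is exclusive
Scan scan = new Scan(startRow, stopRow);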
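A common use of startRow and stopRow is a prefix scan: start at the prefix and stop at the first key that no longer begins with it. The plain-Java sketch below shows one way such a stop row can be derived. PrefixScanSketch and stopRowForPrefix are hypothetical names, and the sketch assumes that an empty stop row means "no upper bound", which is how HBase treats it.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Derives an exclusive stop row for "all keys that start with this prefix".
class PrefixScanSketch {
    static byte[] stopRowForPrefix(byte[] prefix) {
        // Trailing 0xFF bytes cannot be incremented, so drop them first.
        int end = prefix.length;
        while (end > 0 && prefix[end - 1] == (byte) 0xFF) {
            end--;
        }
        if (end == 0) {
            return new byte[0]; // empty or all-0xFF prefix: no upper bound
        }
        byte[] stop = Arrays.copyOf(prefix, end);
        stop[end - 1]++; // smallest key that sorts after every key with the prefix
        return stop;
    }

    public static void main(String[] args) {
        byte[] stop = stopRowForPrefix("zhangsan_".getBytes(StandardCharsets.UTF_8));
        // '_' is 0x5F, so the last byte becomes 0x60, which is '`'.
        System.out.println(new String(stop, StandardCharsets.UTF_8));
    }
}
```

The example prints zhangsan` (with a trailing backtick). Using zhangsan_ as the start row and this value as the stop row covers exactly the keys beginning with zhangsan_.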
When the row key needs more complex filtering, use RowFilter:
Constructor:
RowFilter(CompareFilter.CompareOp rowCompareOp, ByteArrayComparable rowComparator)
Code:
- Table table = connection.getTable(TableName.valueOf("user"));
- RowFilter rf = new RowFilter(CompareOp.EQUAL ,
- new SubstringComparator("zhangsan"));
- // new BinaryPrefixComparator(value) // matches a byte-array prefix
- // new RegexStringComparator(expr) // regular-expression match
- // new SubstringComparator(substr) // substring match
- Scan scan = new Scan();
- scan.setFilter(rf);
- ResultScanner rs = table.getScanner(scan);
Result:
- [row:zhangsan_1495527850759],[family:account],[qualifier:idcard],[value:9897645464646],[time:1495556648664]
- [row:zhangsan_1495527850759],[family:account],[qualifier:passport],[value:5689879898],[time:1495636370056]
- [row:zhangsan_1495527850824],[family:account],[qualifier:country],[value:china],[time:1495636452285]
- [row:zhangsan_1495527850824],[family:account],[qualifier:name],[value:zhangsan],[time:1495556648729]
- [row:zhangsan_1495527850951],[family:address],[qualifier:province],[value:guangdong],[time:1495556648855]
- [row:zhangsan_1495527850975],[family:info],[qualifier:age],[value:100],[time:1495556648878]
- [row:zhangsan_1495527851080],[family:info],[qualifier:sex],[value:男],[time:1495556648983]
- [row:zhangsan_1495527851095],[family:userid],[qualifier:id],[value:001],[time:1495556648996]
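5. PageFilter
Sets a page size in rows and returns a result set of that many rows.
Note that this filter cannot guarantee that the number of rows returned is less than or equal to the page size, because the filter is applied separately on each region server. It only guarantees that the current region returns no more rows than the page size.
Constructor:
PageFilter(long pageSize)
Code: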
- Table table = connection.getTable(TableName.valueOf("user"));
- PageFilter pf = new PageFilter(2L);
- Scan scan = new Scan();
- scan.setFilter(pf);
- scan.setStartRow(Bytes.toBytes("zhangsan_"));
- ResultScanner rs = table.getScanner(scan);
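Result: four cells are printed, but they belong to only two rows (zhangsan_1495527850759 and zhangsan_1495527850824, two cells each), so the page size of 2 is respected. PageFilter counts rows, not cells.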
- [row:zhangsan_1495527850759],[family:account],[qualifier:idcard],[value:9897645464646],[time:1495556648664]
- [row:zhangsan_1495527850759],[family:account],[qualifier:passport],[value:5689879898],[time:1495636370056]
- [row:zhangsan_1495527850824],[family:account],[qualifier:country],[value:china],[time:1495636452285]
- [row:zhangsan_1495527850824],[family:account],[qualifier:name],[value:zhangsan],[time:1495556648729]
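The per-region caveat described for PageFilter, and the usual remedy of counting rows again on the client, can be modelled in plain Java. This is a simulation under simple assumptions (each region is a list of row keys, and each applies the limit independently); PageFilterSketch, scanAllRegions, firstPage and sampleRegions are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Simulates a page limit that every region enforces on its own.
class PageFilterSketch {
    // Each region returns at most pageSize rows, so the merged result may exceed pageSize.
    static List<String> scanAllRegions(List<List<String>> regions, int pageSize) {
        List<String> merged = new ArrayList<>();
        for (List<String> region : regions) {
            merged.addAll(region.subList(0, Math.min(pageSize, region.size())));
        }
        return merged;
    }

    // Client-side guard: keep only the first pageSize rows of the merged result.
    static List<String> firstPage(List<List<String>> regions, int pageSize) {
        List<String> rows = scanAllRegions(regions, pageSize);
        return new ArrayList<>(rows.subList(0, Math.min(pageSize, rows.size())));
    }

    static List<List<String>> sampleRegions() {
        return Arrays.asList(
                Arrays.asList("r1", "r2", "r3"),
                Arrays.asList("r4", "r5", "r6"));
    }

    public static void main(String[] args) {
        System.out.println(scanAllRegions(sampleRegions(), 2)); // [r1, r2, r4, r5]
        System.out.println(firstPage(sampleRegions(), 2));      // [r1, r2]
    }
}
```

In a real client the same guard amounts to stopping the iteration over the ResultScanner once pageSize rows have been read.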
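6. SkipFilter
SkipFilter wraps another filter and evaluates every column of a row; if any single column fails the wrapped filter, the whole row is filtered out.
For example, if every column in a row holds the weight of a different item, all of these values should be greater than zero in a real scenario, and we want to drop every row that contains a column whose value is 0.
In this case, ValueFilter and SkipFilter are combined to achieve that:
scan.setFilter(new SkipFilter(new ValueFilter(CompareOp.NOT_EQUAL,
new BinaryComparator(Bytes.toBytes(0)))));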
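The row-level behaviour of this combination can be sketched in plain Java using the weights example. This is a model, not HBase code: SkipFilterSketch, keepRows and sampleWeights are hypothetical names, and the per-cell filter is represented by an IntPredicate.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.IntPredicate;

// Models "drop the whole row as soon as one cell fails the wrapped filter".
class SkipFilterSketch {
    static List<String> keepRows(Map<String, int[]> rows, IntPredicate cellFilter) {
        List<String> kept = new ArrayList<>();
        for (Map.Entry<String, int[]> row : rows.entrySet()) {
            // A row survives only if every one of its cells passes the wrapped filter.
            if (Arrays.stream(row.getValue()).allMatch(cellFilter)) {
                kept.add(row.getKey());
            }
        }
        return kept;
    }

    static Map<String, int[]> sampleWeights() {
        Map<String, int[]> rows = new LinkedHashMap<>();
        rows.put("box1", new int[] {3, 5});
        rows.put("box2", new int[] {4, 0}); // contains a zero weight
        rows.put("box3", new int[] {7});
        return rows;
    }

    public static void main(String[] args) {
        // The wrapped filter is "value != 0", mirroring ValueFilter with NOT_EQUAL.
        System.out.println(keepRows(sampleWeights(), w -> w != 0)); // [box1, box3]
    }
}
```

Without the skip wrapper, a plain value filter would only remove the zero cell and still return the other cell of box2.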
Constructor:
SkipFilter(Filter filter)
Code:
- Table table = connection.getTable(TableName.valueOf("user"));
- SkipFilter sf = new SkipFilter(new ValueFilter(CompareOp.NOT_EQUAL,
- new BinaryComparator(Bytes.toBytes("zhangsan"))));
- Scan scan = new Scan();
- scan.setFilter(sf);
- ResultScanner rs = table.getScanner(scan);
Result:
- [row:lisi_1495527849910],[family:account],[qualifier:idcard],[value:42963319861234561230],[time:1495556647872]
- [row:lisi_1495527850081],[family:account],[qualifier:name],[value:lisi],[time:1495556647984]
- [row:lisi_1495527850111],[family:account],[qualifier:password],[value:123451231236],[time:1495556648013]
- [row:lisi_1495527850114],[family:address],[qualifier:city],[value:黄埔],[time:1495556648017]
- [row:lisi_1495527850136],[family:address],[qualifier:province],[value:shanghai],[time:1495556648041]
- [row:lisi_1495527850144],[family:info],[qualifier:age],[value:21],[time:1495556648045]
- [row:lisi_1495527850154],[family:info],[qualifier:sex],[value:女],[time:1495556648056]
- [row:lisi_1495527850159],[family:userid],[qualifier:id],[value:002],[time:1495556648060]
- [row:wangwu_1495595824517],[family:userid],[qualifier:id],[value:009],[time:1495624624131]
- [row:zhangsan_1495527850759],[family:account],[qualifier:idcard],[value:9897645464646],[time:1495556648664]
- [row:zhangsan_1495527850759],[family:account],[qualifier:passport],[value:5689879898],[time:1495636370056]
- [row:zhangsan_1495527850951],[family:address],[qualifier:province],[value:guangdong],[time:1495556648855]
- [row:zhangsan_1495527850975],[family:info],[qualifier:age],[value:100],[time:1495556648878]
- [row:zhangsan_1495527851080],[family:info],[qualifier:sex],[value:男],[time:1495556648983]
- [row:zhangsan_1495527851095],[family:userid],[qualifier:id],[value:001],[time:1495556648996]
For comparison, the unfiltered table contents (timestamps omitted). Row zhangsan_1495527850824 contains a cell whose value is zhangsan, so both of its cells are absent from the result above:
- [row:lisi_1495527849910],[family:account],[qualifier:idcard],[value:42963319861234561230]
- [row:lisi_1495527850081],[family:account],[qualifier:name],[value:lisi]
- [row:lisi_1495527850111],[family:account],[qualifier:password],[value:123451231236]
- [row:lisi_1495527850114],[family:address],[qualifier:city],[value:黄埔]
- [row:lisi_1495527850136],[family:address],[qualifier:province],[value:shanghai]
- [row:lisi_1495527850144],[family:info],[qualifier:age],[value:21]
- [row:lisi_1495527850154],[family:info],[qualifier:sex],[value:女]
- [row:lisi_1495527850159],[family:userid],[qualifier:id],[value:002]
- [row:wangwu_1495595824517],[family:userid],[qualifier:id],[value:009]
- [row:zhangsan_1495527850759],[family:account],[qualifier:idcard],[value:9897645464646]
- [row:zhangsan_1495527850759],[family:account],[qualifier:passport],[value:5689879898]
- [row:zhangsan_1495527850824],[family:account],[qualifier:country],[value:china]
- [row:zhangsan_1495527850824],[family:account],[qualifier:name],[value:zhangsan]
- [row:zhangsan_1495527850951],[family:address],[qualifier:province],[value:guangdong]
- [row:zhangsan_1495527850975],[family:info],[qualifier:age],[value:100]
- [row:zhangsan_1495527851080],[family:info],[qualifier:sex],[value:男]
- [row:zhangsan_1495527851095],[family:userid],[qualifier:id],[value:001]
7. FirstKeyOnlyFilter
This filter returns only the first cell of each row, which makes it useful for counting rows efficiently.
Constructor:
public FirstKeyOnlyFilter()
Code:
- Table table = connection.getTable(TableName.valueOf("user"));
- FirstKeyOnlyFilter fkof = new FirstKeyOnlyFilter();
- Scan scan = new Scan();
- scan.setFilter(fkof);
- ResultScanner rs = table.getScanner(scan);
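Result: only the first cell of each row is returned. Rows zhangsan_1495527850759 and zhangsan_1495527850824 each hold two cells, and only the first one (idcard and country, respectively) appears: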
- [row:lisi_1495527849910],[family:account],[qualifier:idcard],[value:42963319861234561230],[time:1495556647872]
- [row:lisi_1495527850081],[family:account],[qualifier:name],[value:lisi],[time:1495556647984]
- [row:lisi_1495527850111],[family:account],[qualifier:password],[value:123451231236],[time:1495556648013]
- [row:lisi_1495527850114],[family:address],[qualifier:city],[value:黄埔],[time:1495556648017]
- [row:lisi_1495527850136],[family:address],[qualifier:province],[value:shanghai],[time:1495556648041]
- [row:lisi_1495527850144],[family:info],[qualifier:age],[value:21],[time:1495556648045]
- [row:lisi_1495527850154],[family:info],[qualifier:sex],[value:女],[time:1495556648056]
- [row:lisi_1495527850159],[family:userid],[qualifier:id],[value:002],[time:1495556648060]
- [row:wangwu_1495595824517],[family:userid],[qualifier:id],[value:009],[time:1495624624131]
- [row:zhangsan_1495527850759],[family:account],[qualifier:idcard],[value:9897645464646],[time:1495556648664]
- [row:zhangsan_1495527850824],[family:account],[qualifier:country],[value:china],[time:1495636452285]
- [row:zhangsan_1495527850951],[family:address],[qualifier:province],[value:guangdong],[time:1495556648855]
- [row:zhangsan_1495527850975],[family:info],[qualifier:age],[value:100],[time:1495556648878]
- [row:zhangsan_1495527851080],[family:info],[qualifier:sex],[value:男],[time:1495556648983]
- [row:zhangsan_1495527851095],[family:userid],[qualifier:id],[value:001],[time:1495556648996]
For comparison, the unfiltered table contents (timestamps omitted). The cells zhangsan_1495527850759/passport and zhangsan_1495527850824/name are the ones left out of the result above:
- [row:lisi_1495527849910],[family:account],[qualifier:idcard],[value:42963319861234561230]
- [row:lisi_1495527850081],[family:account],[qualifier:name],[value:lisi]
- [row:lisi_1495527850111],[family:account],[qualifier:password],[value:123451231236]
- [row:lisi_1495527850114],[family:address],[qualifier:city],[value:黄埔]
- [row:lisi_1495527850136],[family:address],[qualifier:province],[value:shanghai]
- [row:lisi_1495527850144],[family:info],[qualifier:age],[value:21]
- [row:lisi_1495527850154],[family:info],[qualifier:sex],[value:女]
- [row:lisi_1495527850159],[family:userid],[qualifier:id],[value:002]
- [row:wangwu_1495595824517],[family:userid],[qualifier:id],[value:009]
- [row:zhangsan_1495527850759],[family:account],[qualifier:idcard],[value:9897645464646]
- [row:zhangsan_1495527850759],[family:account],[qualifier:passport],[value:5689879898]
- [row:zhangsan_1495527850824],[family:account],[qualifier:country],[value:china]
- [row:zhangsan_1495527850824],[family:account],[qualifier:name],[value:zhangsan]
- [row:zhangsan_1495527850951],[family:address],[qualifier:province],[value:guangdong]
- [row:zhangsan_1495527850975],[family:info],[qualifier:age],[value:100]
- [row:zhangsan_1495527851080],[family:info],[qualifier:sex],[value:男]
- [row:zhangsan_1495527851095],[family:userid],[qualifier:id],[value:001]
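Why the first cell per row is enough for counting can be sketched in plain Java. The model below is an illustration only: FirstKeyOnlySketch, firstCellPerRow and sampleCells are hypothetical names, and cells are reduced to a row key plus a qualifier taken from the table above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Models "keep only the first cell of each row" and uses it to count rows.
class FirstKeyOnlySketch {
    // cells: {rowKey, qualifier} pairs in scan order. Returns row key -> first qualifier seen.
    static Map<String, String> firstCellPerRow(String[][] cells) {
        Map<String, String> first = new LinkedHashMap<>();
        for (String[] cell : cells) {
            first.putIfAbsent(cell[0], cell[1]); // later cells of the same row are ignored
        }
        return first;
    }

    static String[][] sampleCells() {
        return new String[][] {
            {"zhangsan_1495527850759", "idcard"},
            {"zhangsan_1495527850759", "passport"},
            {"zhangsan_1495527850824", "country"},
            {"zhangsan_1495527850824", "name"},
        };
    }

    public static void main(String[] args) {
        Map<String, String> first = firstCellPerRow(sampleCells());
        System.out.println(first.size()); // number of rows: 2
        System.out.println(first);
    }
}
```

Four cells collapse to two entries, one per row, so counting the results of such a scan gives the row count while far less data is transferred than with a full scan.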