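Most of this material comes from "HBase: The Definitive Guide"; all example code is written in Scala.
Introduction:
HBase filters control which data is returned and are mainly used with Get and Scan. A filter can restrict the result to particular column families, columns, timestamps, and versions. All filters take effect on the server side, which is known as predicate pushdown, so data that is filtered out is never sent to the client.
At the bottom of the filter hierarchy are Filter (an interface in early HBase releases, an abstract class in later ones) and the FilterBase abstract class; users can implement their own filters by extending Filter or FilterBase. Data is fetched with getScanner or get, which return a ResultScanner or a Result. A Result can be seen as one row; each Result wraps KeyValue objects, and each KeyValue can be seen as the value stored at one row/column coordinate.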
CompareFilter
CompareFilter is a frequently used family of filters. Its subclasses filter by row, column family, column, and so on. Instantiating one of them requires two arguments: a CompareOp and a ByteArrayComparable.
CompareOp
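CompareFilter.CompareOp.LESS | Matches values less than the given value |
CompareFilter.CompareOp.LESS_OR_EQUAL | Matches values less than or equal to the given value |
CompareFilter.CompareOp.EQUAL | Matches values equal to the given value |
CompareFilter.CompareOp.NOT_EQUAL | Matches values not equal to the given value |
CompareFilter.CompareOp.GREATER_OR_EQUAL | Matches values greater than or equal to the given value |
CompareFilter.CompareOp.GREATER | Matches values greater than the given value |
CompareFilter.CompareOp.NO_OP | Excludes every value |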
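To make the table concrete, here is a minimal sketch in plain Scala (no HBase dependency) of how each operator decides whether a stored value is kept. The names `CompareOpSketch`, `compareBytes`, and `keep` are hypothetical, and the byte comparison only imitates the unsigned lexicographic ordering of Bytes.compareTo(); it is a simplified model, not the HBase implementation.

```scala
object CompareOpSketch {
  // Unsigned, lexicographic comparison of two byte arrays.
  // Assumption: this imitates the ordering used by Bytes.compareTo().
  def compareBytes(a: Array[Byte], b: Array[Byte]): Int = {
    val n = math.min(a.length, b.length)
    var i = 0
    while (i < n) {
      val diff = (a(i) & 0xff) - (b(i) & 0xff)
      if (diff != 0) return diff
      i += 1
    }
    a.length - b.length
  }

  // Returns true when `stored` passes the operator against `threshold`.
  def keep(op: String, stored: String, threshold: String): Boolean = {
    val c = compareBytes(stored.getBytes("UTF-8"), threshold.getBytes("UTF-8"))
    op match {
      case "LESS"             => c < 0
      case "LESS_OR_EQUAL"    => c <= 0
      case "EQUAL"            => c == 0
      case "NOT_EQUAL"        => c != 0
      case "GREATER_OR_EQUAL" => c >= 0
      case "GREATER"          => c > 0
      case "NO_OP"            => false // excludes everything
      case other              => throw new IllegalArgumentException(s"unknown operator: $other")
    }
  }
}
```

Because the comparison is byte-wise rather than numeric, "row-3" sorts after "row-22": `CompareOpSketch.keep("LESS_OR_EQUAL", "row-3", "row-22")` is false, while `CompareOpSketch.keep("LESS_OR_EQUAL", "row-100", "row-22")` is true. This is worth keeping in mind when choosing row keys for range-style filters.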
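ByteArrayComparable implementations
BinaryComparator | Compares the current value with the threshold using Bytes.compareTo() |
BinaryPrefixComparator | Prefix match |
NullComparator | Does no matching; only checks whether the current value is null |
BitComparator | Performs a bit-level comparison using the AND/OR/XOR operations provided by BitwiseOp |
RegexStringComparator | Matches the table data against a regular expression supplied when the comparator is instantiated |
SubstringComparator | Treats the threshold and the table data as String instances and matches with a contains() check |
LongComparator | Interprets the value as a long and compares it numerically with the threshold |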
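The difference between the string-oriented comparators is easiest to see on a handful of row keys. The following plain-Scala sketch imitates what BinaryPrefixComparator, SubstringComparator, and RegexStringComparator accept when combined with EQUAL. The object and method names are hypothetical, the regular expression is assumed to match anywhere in the value unless anchored, and edge cases such as character sets and case sensitivity are not modelled.

```scala
object ComparatorSketch {
  // Sample row keys, in the byte-wise order HBase would store them.
  val rowKeys = Seq("row-1", "row-13", "row-2", "row-22", "row-3", "row-31")

  // BinaryPrefixComparator-like: the value starts with the prefix.
  def prefixMatch(prefix: String)(value: String): Boolean =
    value.startsWith(prefix)

  // SubstringComparator-like: the value contains the substring.
  def substringMatch(sub: String)(value: String): Boolean =
    value.contains(sub)

  // RegexStringComparator-like: the pattern is found somewhere in the value.
  def regexMatch(regex: String)(value: String): Boolean =
    regex.r.findFirstIn(value).isDefined
}
```

With these definitions, `ComparatorSketch.rowKeys.filter(ComparatorSketch.prefixMatch("row-2"))` keeps row-2 and row-22, `substringMatch("-3")` keeps row-3 and row-31, and `regexMatch("^row-\\d$")` keeps only the single-digit keys row-1, row-2, and row-3.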
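CompareFilter subclasses
RowFilter | Filters by row key, keeping only the rows that satisfy the comparison |
FamilyFilter | Filters by column family, keeping only the matching column families and the rows that contain them |
QualifierFilter | Filters by column name (qualifier) |
ValueFilter | Filters by cell value |
DependentColumnFilter | Picks a reference column and uses its timestamps as the filter condition: every column of a row is compared with the timestamp of the reference column, so the filter returns the columns that were modified together with the reference column. It has three constructors: |
DependentColumnFilter(final byte [] family, final byte [] qualifier)
DependentColumnFilter(final byte [] family, final byte [] qualifier, final boolean dropDependentColumn)
DependentColumnFilter(final byte [] family, final byte[] qualifier, final boolean dropDependentColumn, final CompareOp valueCompareOp, final ByteArrayComparable valueComparator)
dropDependentColumn controls whether the reference column itself is returned or dropped from the result; valueCompareOp and valueComparator make the filter check the value of the reference column as well.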
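The five-argument constructor is not used in the examples of this post, so here is a minimal sketch of it. It assumes the HBase 1.x client library is on the classpath (the same API generation as the rest of the code here); the object name `DependentColumnExample` and the value prefix "val" are made up for illustration.

```scala
import org.apache.hadoop.hbase.client.Scan
import org.apache.hadoop.hbase.filter.{BinaryPrefixComparator, CompareFilter, DependentColumnFilter}
import org.apache.hadoop.hbase.util.Bytes

object DependentColumnExample {
  // Reference column: cf0:qual1.
  // dropDependentColumn = true: the reference column itself is not returned.
  // The value check only accepts reference cells whose value starts with "val".
  def buildFilter(): DependentColumnFilter =
    new DependentColumnFilter(
      Bytes.toBytes("cf0"),
      Bytes.toBytes("qual1"),
      true,
      CompareFilter.CompareOp.EQUAL,
      new BinaryPrefixComparator(Bytes.toBytes("val")))

  // Attach the filter to a scan; running it requires a table and a cluster.
  def buildScan(): Scan = {
    val scan = new Scan()
    scan.setFilter(buildFilter())
    scan
  }
}
```

The resulting Scan can be passed to table.getScanner(scan) in the same way as the other scans in this post.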
Scala code example:
import org.apache.hadoop.hbase.client.{HTable, Scan}
import org.apache.hadoop.hbase.filter._
import org.apache.hadoop.hbase.util.Bytes
import scala.collection.JavaConverters._

// HandleHbase.conf and tableName are defined elsewhere (not shown here)
val table = new HTable(HandleHbase.conf, tableName)
var scan = new Scan()
scan.addColumn(Bytes.toBytes("cf0"), Bytes.toBytes("qual6"))

println("--------------------row filter BinaryComparator -------------------------")
// Keep rows whose row key sorts at or before "row-22"
val rowFilter1 = new RowFilter(CompareFilter.CompareOp.LESS_OR_EQUAL,
  new BinaryComparator(Bytes.toBytes("row-22")))
scan.setFilter(rowFilter1)
val rowScanner1 = table.getScanner(scan)
for (res <- rowScanner1.iterator().asScala) {
  println(res)
}
rowScanner1.close()

println("--------------------row filter SubstringComparator -------------------------")
// Keep rows whose row key contains "-3"
val rowFilter2 = new RowFilter(CompareFilter.CompareOp.EQUAL,
  new SubstringComparator("-3"))
scan.setFilter(rowFilter2)
val rowScanner2 = table.getScanner(scan)
for (res <- rowScanner2.iterator().asScala) {
  println(res)
}
rowScanner2.close()

// The remaining examples scan the range [row-2, row-3)
scan = new Scan()
scan.setStartRow(Bytes.toBytes("row-2"))
scan.setStopRow(Bytes.toBytes("row-3"))

println("--------------------family filter BinaryComparator -------------------------")
// Keep only the column family cf1
val familyFilter1 = new FamilyFilter(CompareFilter.CompareOp.EQUAL,
  new BinaryComparator(Bytes.toBytes("cf1")))
scan.setFilter(familyFilter1)
val familyScanner1 = table.getScanner(scan)
for (res <- familyScanner1.iterator().asScala) {
  println(res)
}
familyScanner1.close()

println("--------------------qualifier filter BinaryComparator -------------------------")
// Regardless of column family, output every column named qual1
val qualifierFilter1 = new QualifierFilter(CompareFilter.CompareOp.EQUAL,
  new BinaryComparator(Bytes.toBytes("qual1")))
scan.setFilter(qualifierFilter1)
val qualifierScanner1 = table.getScanner(scan)
for (res <- qualifierScanner1.iterator().asScala) {
  println(res)
}
qualifierScanner1.close()

println("--------------------value filter BinaryComparator -------------------------")
// Keep only cells whose value is exactly "val2"
val valueFilter1 = new ValueFilter(CompareFilter.CompareOp.EQUAL,
  new BinaryComparator(Bytes.toBytes("val2")))
scan.setFilter(valueFilter1)
val valueScanner1 = table.getScanner(scan)
for (res <- valueScanner1.iterator().asScala) {
  println(res)
}
valueScanner1.close()

println("--------------------dependent column filter -------------------------")
// Use cf0:qual1 as the reference column and output the columns modified together with it
val dependentFilter1 = new DependentColumnFilter(Bytes.toBytes("cf0"), Bytes.toBytes("qual1"))
scan.setFilter(dependentFilter1)
val dependentScanner1 = table.getScanner(scan)
for (res <- dependentScanner1.iterator().asScala) {
  println(res)
}
dependentScanner1.close()
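Dedicated filters
Dedicated filters extend the abstract class FilterBase directly. Being dedicated, they are less extensible, but they are convenient for handling certain specific cases.
SingleColumnValueFilter | Uses a column family, column, and value as the match condition: only rows whose given column matches the value are kept, and the rest are dropped (setFilterIfMissing controls whether rows that lack the column are kept) |
SingleColumnValueExcludeFilter | Works like SingleColumnValueFilter, except that the column used for the check is not kept in the result |
PrefixFilter | Matches on a row key prefix and keeps the rows that match; RowFilter can do the same, but this filter is more convenient |
PageFilter | Paginates the result by row; each call returns a fixed number of matching rows |
KeyOnlyFilter | Returns only the key of each KeyValue, not the value |
FirstKeyOnlyFilter | Returns the first column of each row |
InclusiveStopFilter | In a scan, the range set with setStartRow and setStopRow is closed at the start and open at the end; this filter makes the stop row inclusive |
TimestampsFilter | Takes several timestamps (versions) and returns the values whose version matches one of them |
ColumnCountGetFilter | Limits how many columns are returned per row |
ColumnPaginationFilter | Similar to PageFilter, but paginates over the columns of a row |
ColumnPrefixFilter | Filters by column name prefix |
RandomRowFilter | Keeps rows at random, each with a given probability |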
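PageFilter only limits a single scan, so paging through a table needs a small loop on the client. The sketch below shows one possible way to do it, assuming the HBase 1.x client API (`Table`, `Scan.setStartRow`); the object name `PageScanSketch` and its methods are hypothetical. Each new page starts just after the last row of the previous one, which is obtained by appending a zero byte to that row key.

```scala
import org.apache.hadoop.hbase.client.{Result, Scan, Table}
import org.apache.hadoop.hbase.filter.PageFilter
import scala.collection.JavaConverters._

object PageScanSketch {
  // Smallest row key that sorts strictly after `lastRow`: append a zero byte.
  def nextStartRow(lastRow: Array[Byte]): Array[Byte] = lastRow :+ 0.toByte

  // Walks the whole table page by page, passing every row to `handle`,
  // and returns the total number of rows seen.
  def scanInPages(table: Table, pageSize: Long)(handle: Result => Unit): Long = {
    var lastRow: Option[Array[Byte]] = None
    var total = 0L
    var more = true
    while (more) {
      val scan = new Scan()
      scan.setFilter(new PageFilter(pageSize))
      // Resume just after the last row of the previous page, if there was one.
      lastRow.foreach(row => scan.setStartRow(nextStartRow(row)))
      val scanner = table.getScanner(scan)
      var rowsInPage = 0L
      try {
        for (res <- scanner.iterator().asScala) {
          handle(res)
          lastRow = Some(res.getRow)
          rowsInPage += 1
        }
      } finally scanner.close()
      total += rowsInPage
      more = rowsInPage > 0 // an empty page means the table is exhausted
    }
    total
  }
}
```

Since filters run separately on each region server, a single scan that crosses regions may return more rows than the page size; the loop above does not rely on an exact count per page and simply stops at the first empty page.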
Partial example code:
import org.apache.hadoop.hbase.client.{HTable, Scan}
import org.apache.hadoop.hbase.filter._
import org.apache.hadoop.hbase.util.Bytes
import scala.collection.JavaConverters._

// hbaseHandle.conf and tableName are defined elsewhere (not shown here)
val table = new HTable(hbaseHandle.conf, tableName)
val scan = new Scan()
scan.setStartRow(Bytes.toBytes("row-1"))
scan.setStopRow(Bytes.toBytes("row-2"))

println("--------------------1.single column value filter -------------------------")
// Match on column family, column, and value; only matching rows remain
val singleColumnValueFilter = new SingleColumnValueFilter(Bytes.toBytes("cf0"),
  Bytes.toBytes("qual3"), CompareFilter.CompareOp.EQUAL, new BinaryComparator(Bytes.toBytes("val2")))
// Also drop rows that do not contain cf0:qual3 at all
singleColumnValueFilter.setFilterIfMissing(true)
scan.setFilter(singleColumnValueFilter)
val singleColumnValueScanner = table.getScanner(scan)
for (res <- singleColumnValueScanner.iterator().asScala) {
  println(res)
}
singleColumnValueScanner.close()

println("--------------------2.single column value exclude filter -------------------------")
// Same match condition, but the column used for the check is not returned
val singleColumnValueExcludeFilter = new SingleColumnValueExcludeFilter(Bytes.toBytes("cf0"),
  Bytes.toBytes("qual3"), CompareFilter.CompareOp.EQUAL, new BinaryComparator(Bytes.toBytes("val2")))
singleColumnValueExcludeFilter.setFilterIfMissing(true)
scan.setFilter(singleColumnValueExcludeFilter)
val singleColumnValueExcludeScanner = table.getScanner(scan)
for (res <- singleColumnValueExcludeScanner.iterator().asScala) {
  println(res)
}
singleColumnValueExcludeScanner.close()

println("--------------------3.prefix filter -------------------------")
// Match on the row key prefix
val prefixFilter = new PrefixFilter(Bytes.toBytes("row-11"))
scan.setFilter(prefixFilter)
val prefixScanner = table.getScanner(scan)
for (res <- prefixScanner.iterator().asScala) {
  println(res)
}
prefixScanner.close()
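Decorating filters (decorator classes that wrap a Filter and give it extra behavior)
SkipFilter | Many filters drop individual KeyValues but still keep the rest of the row; when the wrapped filter rejects any KeyValue, SkipFilter drops the entire row |
WhileMatchFilter | Similar to SkipFilter, but it ends the whole scan as soon as the wrapped filter filters out its first piece of data |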
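Both decorators are built by passing the filter to be wrapped to the constructor. The following is a minimal sketch assuming the HBase 1.x client library; the object name `DecoratorExample`, the value "val2", and the row key "row-22" are chosen only for illustration.

```scala
import org.apache.hadoop.hbase.filter.{BinaryComparator, CompareFilter, RowFilter, SkipFilter, ValueFilter, WhileMatchFilter}
import org.apache.hadoop.hbase.util.Bytes

object DecoratorExample {
  // Wrapped filter: rejects every cell whose value equals "val2".
  val noVal2 = new ValueFilter(CompareFilter.CompareOp.NOT_EQUAL,
    new BinaryComparator(Bytes.toBytes("val2")))
  // Decorated: a row that contains any "val2" cell is dropped entirely.
  val skipFilter = new SkipFilter(noVal2)

  // Wrapped filter: rejects the row with key "row-22".
  val notRow22 = new RowFilter(CompareFilter.CompareOp.NOT_EQUAL,
    new BinaryComparator(Bytes.toBytes("row-22")))
  // Decorated: the scan ends as soon as "row-22" is reached.
  val whileMatchFilter = new WhileMatchFilter(notRow22)
}
```

Either decorator is then handed to scan.setFilter(...) like any other filter.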
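FilterList class
This class wraps a List<Filter> and is itself a subclass of Filter. A FilterList built from a List<Filter> can be passed to a scan in order to apply several filters at once.
Constructors: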
FilterList(final List<Filter> rowFilters)
FilterList(final Filter... rowFilters)
FilterList(final Operator operator)
FilterList(final Operator operator, final List<Filter> rowFilters)
FilterList(final Operator operator, final Filter... rowFilters)
FilterList.Operator
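MUST_PASS_ALL | A value is included in the result only if every filter accepts it; equivalent to AND |
MUST_PASS_ONE | A value is included in the result as soon as one filter accepts it; equivalent to OR |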
Test code:
import org.apache.hadoop.hbase.client.{HTable, Scan}
import org.apache.hadoop.hbase.filter._
import org.apache.hadoop.hbase.util.Bytes
import scala.collection.JavaConverters._

// hbaseHandle.conf and tableName are defined elsewhere (not shown here)
val table = new HTable(hbaseHandle.conf, tableName)
val scan = new Scan()
scan.setStartRow(Bytes.toBytes("row-2"))
scan.setStopRow(Bytes.toBytes("row-3"))
val familyFilter1 = new FamilyFilter(CompareFilter.CompareOp.EQUAL,
  new BinaryComparator(Bytes.toBytes("cf1")))
val qualifierFilter1 = new QualifierFilter(CompareFilter.CompareOp.EQUAL,
  new BinaryComparator(Bytes.toBytes("qual1")))

println("------------------test MUST_PASS_ALL---------------------")
// family = cf1 AND qualifier = qual1
val filterList1 = new FilterList(FilterList.Operator.MUST_PASS_ALL, familyFilter1, qualifierFilter1)
scan.setFilter(filterList1)
val filterScanner1 = table.getScanner(scan)
for (res <- filterScanner1.iterator().asScala) {
  println(res)
}
filterScanner1.close()

println("------------------test MUST_PASS_ONE---------------------")
// family = cf1 OR qualifier = qual1
val filterList2 = new FilterList(FilterList.Operator.MUST_PASS_ONE, familyFilter1, qualifierFilter1)
scan.setFilter(filterList2)
val filterScanner2 = table.getScanner(scan)
for (res <- filterScanner2.iterator().asScala) {
  println(res)
}
filterScanner2.close()
Note:
Because a FilterList is itself a Filter, several FilterLists can be nested to express queries with multiple ANDs and ORs, much like in SQL.
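As an illustration of such nesting, the following sketch builds the condition family = cf1 AND (qualifier = qual1 OR qualifier = qual2). It assumes the HBase 1.x client library; the object name `NestedFilterListExample`, the helper `qualifierEquals`, and the column name qual2 are hypothetical.

```scala
import org.apache.hadoop.hbase.filter.{BinaryComparator, CompareFilter, FamilyFilter, FilterList, QualifierFilter}
import org.apache.hadoop.hbase.util.Bytes

object NestedFilterListExample {
  // Helper: an exact match on the column name.
  private def qualifierEquals(name: String): QualifierFilter =
    new QualifierFilter(CompareFilter.CompareOp.EQUAL,
      new BinaryComparator(Bytes.toBytes(name)))

  // qualifier = qual1 OR qualifier = qual2
  val eitherQualifier = new FilterList(FilterList.Operator.MUST_PASS_ONE,
    qualifierEquals("qual1"), qualifierEquals("qual2"))

  // family = cf1 AND (qualifier = qual1 OR qualifier = qual2)
  val combined = new FilterList(FilterList.Operator.MUST_PASS_ALL,
    new FamilyFilter(CompareFilter.CompareOp.EQUAL,
      new BinaryComparator(Bytes.toBytes("cf1"))),
    eitherQualifier)
}
```

The outer list is used like any single filter, for example scan.setFilter(NestedFilterListExample.combined).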
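Custom filters
A custom filter must extend the Filter class. Because filters run on the server side, a custom filter also has to be deployed to the servers: add the path of its jar to HBASE_CLASSPATH in hbase-env.sh.
Cell, which is used below, is the interface that KeyValue implements.
Filter defines several methods, which are executed in the following order:
public boolean filterRowKey(byte[] data, int offset, int length) Decides whether the row key should be filtered: returning true filters the row out, returning false keeps it. |
public ReturnCode filterKeyValue(final Cell v) Once the previous method has decided to keep the row, the KeyValues (Cells) of the row are scanned one by one. The method returns a value of the ReturnCode enum, which tells the scanner how to treat the cell (for example INCLUDE or SKIP). |
public void filterRowCells(List<Cell> kvs) Called once the row key and the columns of the row have passed the two checks above. It gives access to the KeyValue instances selected by the previous two methods. DependentColumnFilter uses this method to drop the data that does not match the reference column. |
public boolean filterRow() Executed after all of the methods above. PageFilter uses it to check whether the number of rows returned in one paging iteration has reached the page size, and returns true once it has. The default return value is false, in which case the current row is included in the result. |
public void reset() Resets the filter for every new row of the iteration. |
public boolean filterAllRemaining() When this returns true, the whole scan is ended. It can be used to cut a scan short and so optimize the query. |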
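The call order described above can be illustrated with a small simulation in plain Scala. Everything in it is hypothetical: `Cell` and `SimpleFilter` are simplified stand-ins for the HBase types (filterKeyValue returns a Boolean instead of a ReturnCode), and `scan` only mimics how the callbacks are invoked for each row; it is not how the region server is actually implemented.

```scala
object FilterLifecycleSketch {
  // Simplified stand-in for an HBase cell.
  final case class Cell(row: String, column: String, value: String)

  // Simplified stand-in for Filter; the defaults keep everything.
  trait SimpleFilter {
    def filterRowKey(row: String): Boolean = false      // true = drop the row
    def filterKeyValue(cell: Cell): Boolean = false     // true = drop the cell
    def filterRowCells(cells: List[Cell]): List[Cell] = cells
    def filterRow(): Boolean = false                    // true = drop the row
    def reset(): Unit = ()
    def filterAllRemaining(): Boolean = false           // true = end the scan
  }

  // Applies the callbacks, in the order listed above, to rows grouped by key.
  def scan(rows: List[(String, List[Cell])], f: SimpleFilter): List[(String, List[Cell])] = {
    val out = List.newBuilder[(String, List[Cell])]
    val it = rows.iterator
    while (it.hasNext && !f.filterAllRemaining()) {
      val (rowKey, cells) = it.next()
      if (!f.filterRowKey(rowKey)) {
        val kept = f.filterRowCells(cells.filterNot(f.filterKeyValue))
        if (!f.filterRow() && kept.nonEmpty) out += (rowKey -> kept)
      }
      f.reset() // prepare the filter for the next row
    }
    out.result()
  }

  // Example: a page-style filter that ends the scan after `limit` rows.
  final class RowLimit(limit: Int) extends SimpleFilter {
    private var seen = 0
    override def filterRow(): Boolean = { seen += 1; false }
    override def filterAllRemaining(): Boolean = seen >= limit
  }
}
```

Running `scan` over three rows with `new RowLimit(2)` returns only the first two, because filterAllRemaining() reports true before the third row is read. A real custom filter would extend FilterBase instead, and would also have to be packaged and deployed to the region servers as described at the start of this section.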