Notes on Using HBase 1.1.3

Latest recommended article published on 2020-05-26 22:53:57

leeking888

Category: java

Article link: https://blog.csdn.net/leeking888/article/details/51831964

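When working with HBase, the issues that matter most are filtering, sorting, and pagination. Below are some commonly used methods for these. My pagination approach is not ideal, so it is only covered briefly.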

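1 Related components

1.1 Phoenix

A SQL-like tool on top of HBase. It requires adding a few jars, but in my setup it could not access the tables that already existed in HBase; tables had to be created through sqlline.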

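2 HBase API advanced features: dedicated filters

The dedicated filters provided by HBase extend FilterBase directly. Some of them can only filter whole rows and are therefore suited only to scans. For get() these filters are stricter: the result contains either the entire row or nothing.

2.1 Single column value filter (SingleColumnValueFilter)

Uses the value of a single column to decide whether a row is filtered out.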

public void singleColumnValueFilter() throws IOException {
    // Keep rows whose info:name value is <= "ljj" in byte-wise order.
    SingleColumnValueFilter filter = new SingleColumnValueFilter(Bytes.toBytes("info"),
            Bytes.toBytes("name"), CompareFilter.CompareOp.LESS_OR_EQUAL,
            new BinaryComparator(Bytes.toBytes("ljj")));
    // Drop rows that lack the reference column; by default such rows are included.
    filter.setFilterIfMissing(true);

    Scan scan = new Scan();
    scan.setFilter(filter);
    ResultScanner scanner = table.getScanner(scan);
    for (Result res : scanner) {
        for (Cell cell : res.rawCells()) {
            System.out.println("KV: " + cell + ", value: " + Bytes.toString(CellUtil.cloneValue(cell)));
        }
    }
    scanner.close();

    Get get = new Get(Bytes.toBytes("3103"));
    get.setFilter(filter);
    Result result = table.get(get);
    System.out.println("Result of get(): " + result);
    // The filter may exclude the row, which leaves an empty Result.
    if (!result.isEmpty()) {
        for (Cell cell : result.rawCells()) {
            System.out.println("KV: " + cell + ", value: " + Bytes.toString(CellUtil.cloneValue(cell)));
        }
    }
}
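
To make the rule applied by the filter above concrete, here is a minimal plain-Java sketch with no HBase dependency. It models a row as a map from column name to value and applies the same decision: compare the reference column with LESS_OR_EQUAL, and choose what happens when the column is missing. The class `SingleColumnValueSketch`, the `keep` method, and the sample rows are hypothetical names for illustration only. The model compares Java strings with `compareTo`, which agrees with byte-wise order only for plain ASCII values.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical plain-Java model of the filter's decision rule; this is not HBase code.
final class SingleColumnValueSketch {

    // Returns true if the row is kept: the reference column must be <= threshold.
    // A row without the column is dropped when filterIfMissing is true and
    // kept otherwise (mirroring the default described above).
    static boolean keep(Map<String, String> row, String column,
                        String threshold, boolean filterIfMissing) {
        String value = row.get(column);
        if (value == null) {
            return !filterIfMissing;
        }
        return value.compareTo(threshold) <= 0;
    }

    public static void main(String[] args) {
        Map<String, String> small = new HashMap<>();
        small.put("info:name", "abc");
        Map<String, String> large = new HashMap<>();
        large.put("info:name", "zzz");
        Map<String, String> missing = new HashMap<>();
        missing.put("info:age", "30"); // no info:name column

        System.out.println(keep(small, "info:name", "ljj", true));    // true
        System.out.println(keep(large, "info:name", "ljj", true));    // false
        System.out.println(keep(missing, "info:name", "ljj", true));  // false
        System.out.println(keep(missing, "info:name", "ljj", false)); // true
    }
}
```

The last two calls show why setFilterIfMissing(true) matters: without it, rows that simply lack the column would pass through unfiltered.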

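2.2 Single column value exclude filter (SingleColumnValueExcludeFilter)

This filter extends SingleColumnValueFilter; the reference column itself is not included in the result.

2.3 Prefix filter (PrefixFilter)

Returns every row whose key matches the given prefix. A scan proceeds in lexicographic order and ends once it reaches a row key greater than the prefix. This filter is of little use with get().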

public void prefixFilter() throws IOException {
    // Match every row whose key starts with "31".
    Filter filter = new PrefixFilter(Bytes.toBytes("31"));

    Scan scan = new Scan();
    scan.setFilter(filter);
    ResultScanner scanner = table.getScanner(scan);
    for (Result res : scanner) {
        for (Cell cell : res.rawCells()) {
            System.out.println("KV: " + cell + ", value: " + Bytes.toString(CellUtil.cloneValue(cell)));
        }
    }
    scanner.close();
    // This filter is of little use with get().
}
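
The early-termination behaviour of a prefix scan can be sketched in plain Java, without HBase, by treating a sorted set of strings as the table's row keys. The class `PrefixScanSketch` and the method `scanPrefix` are hypothetical; the sketch only illustrates why a sorted key space lets the scan stop at the first key that no longer carries the prefix.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

// Hypothetical model of a prefix scan over lexicographically sorted row keys.
final class PrefixScanSketch {

    static List<String> scanPrefix(TreeSet<String> sortedRowKeys, String prefix) {
        List<String> hits = new ArrayList<>();
        // Start at the first key >= prefix; all matching keys are contiguous from there.
        for (String rowKey : sortedRowKeys.tailSet(prefix, true)) {
            if (!rowKey.startsWith(prefix)) {
                break; // first key past the prefix range, nothing later can match
            }
            hits.add(rowKey);
        }
        return hits;
    }

    public static void main(String[] args) {
        TreeSet<String> rows = new TreeSet<>(
                Arrays.asList("2999", "3101", "3102", "3103", "32", "4001"));
        System.out.println(scanPrefix(rows, "31")); // [3101, 3102, 3103]
    }
}
```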

2.4 Page filter (PageFilter)

Purpose: paginates the results by row.

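public void pageFilter() throws IOException {
    Filter filter = new PageFilter(4); // at most 4 rows per page
    int totalRows = 0;
    byte[] lastRow = null;
    // Appending a zero byte to the last row key yields the smallest key after it,
    // so the next page starts just past the previous one. With an empty postfix
    // the last row is read again on every page and the loop never sees an empty page.
    byte[] POSTFIX = new byte[] { 0x00 };
    while (true) {
        Scan scan = new Scan();
        scan.setFilter(filter);
        if (lastRow != null) {
            byte[] startRow = Bytes.add(lastRow, POSTFIX);
            System.out.println("start row: " + Bytes.toStringBinary(startRow));
            scan.setStartRow(startRow);
        }
        ResultScanner scanner = table.getScanner(scan);
        int localRows = 0;
        Result result;
        while ((result = scanner.next()) != null) {
            System.out.println(localRows++ + ": " + result);
            totalRows++;
            lastRow = result.getRow();
        }
        scanner.close();
        if (localRows == 0) {
            break; // an empty page means the table is exhausted
        }
    }
    System.out.println("total rows: " + totalRows);
}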
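
The paging loop above depends on one detail: the next page must start strictly after the last row already seen. The following plain-Java sketch, which does not use HBase, models the table as a sorted set of string keys and appends a zero character to the last key to obtain the next start key. The names `PagingSketch`, `fetchPage`, and `allPages` are hypothetical, and `fetchPage` is only a stand-in for a single scan limited by a page size.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

// Hypothetical model of row paging over sorted keys; this is not HBase code.
final class PagingSketch {

    // Stand-in for one scan with a page limit: up to pageSize keys >= startRow.
    static List<String> fetchPage(TreeSet<String> rows, String startRow, int pageSize) {
        Iterable<String> source = (startRow == null) ? rows : rows.tailSet(startRow, true);
        List<String> page = new ArrayList<>();
        for (String key : source) {
            if (page.size() == pageSize) {
                break;
            }
            page.add(key);
        }
        return page;
    }

    static List<List<String>> allPages(TreeSet<String> rows, int pageSize) {
        List<List<String>> pages = new ArrayList<>();
        String lastRow = null;
        while (true) {
            // lastRow + "\0" is the smallest key that sorts after lastRow.
            String startRow = (lastRow == null) ? null : lastRow + "\0";
            List<String> page = fetchPage(rows, startRow, pageSize);
            if (page.isEmpty()) {
                break; // no more rows
            }
            pages.add(page);
            lastRow = page.get(page.size() - 1);
        }
        return pages;
    }

    public static void main(String[] args) {
        TreeSet<String> rows = new TreeSet<>(
                Arrays.asList("r1", "r2", "r3", "r4", "r5", "r6", "r7", "r8", "r9"));
        System.out.println(allPages(rows, 4)); // [[r1, r2, r3, r4], [r5, r6, r7, r8], [r9]]
    }
}
```

If the start key were simply the last row itself, every page would begin with a row that was already returned, and the final page would never be empty.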

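2.5 Key-only filter (KeyOnlyFilter)

Returns only the key of each KeyValue in the result, without the actual value.

2.6 First-key-only filter (FirstKeyOnlyFilter)

Returns only the first column of each row. This filter is commonly used for row counting.

2.7 Inclusive stop filter (InclusiveStopFilter)

In a normal scan the start row is included in the result but the stop row is excluded. With this filter the stop row is included as well.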

public void inclusiveStopFilter() throws IOException {
    // Scan from row 3101 up to and including row 3104.
    Filter filter = new InclusiveStopFilter(Bytes.toBytes("3104"));
    Scan scan = new Scan();
    scan.setStartRow(Bytes.toBytes("3101"));
    scan.setFilter(filter);
    ResultScanner scanner = table.getScanner(scan);
    for (Result res : scanner) {
        System.out.println(res);
    }
    scanner.close();
}
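
The difference between an exclusive and an inclusive stop row can be shown with a sorted set in plain Java. This is only a model of the range semantics, not HBase code; `StopRowSketch` and `scan` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

// Hypothetical model of scan range semantics over sorted row keys.
final class StopRowSketch {

    // The start row is always included; the stop row only when inclusiveStop is true.
    static List<String> scan(TreeSet<String> rows, String startRow,
                             String stopRow, boolean inclusiveStop) {
        return new ArrayList<>(rows.subSet(startRow, true, stopRow, inclusiveStop));
    }

    public static void main(String[] args) {
        TreeSet<String> rows = new TreeSet<>(
                Arrays.asList("3101", "3102", "3103", "3104", "3105"));
        System.out.println(scan(rows, "3101", "3104", false)); // [3101, 3102, 3103]
        System.out.println(scan(rows, "3101", "3104", true));  // [3101, 3102, 3103, 3104]
    }
}
```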

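2.8 Timestamps filter (TimestampsFilter)

Use this filter when you need fine-grained control over versions in the scan results.

A version is the value of a column at a specific timestamp.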

public void timestampsFilter() throws IOException {
    // Only return cell versions whose timestamp is exactly 5, 10 or 15.
    List<Long> ts = new ArrayList<>();
    ts.add(5L);
    ts.add(10L);
    ts.add(15L);
    Filter filter = new TimestampsFilter(ts);

    Scan scan = new Scan();
    scan.setFilter(filter);
    ResultScanner scanner = table.getScanner(scan);
    for (Result res : scanner) {
        System.out.println(res);
    }
    scanner.close();

    // Same filter combined with the time range [8, 12): only timestamp 10 can match.
    Scan scan2 = new Scan();
    scan2.setFilter(filter);
    scan2.setTimeRange(8, 12);
    ResultScanner scanner2 = table.getScanner(scan2);
    for (Result res : scanner2) {
        System.out.println(res);
    }
    scanner2.close();
}
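
How the timestamp list and the time range combine can be worked through in plain Java. The sketch below is a simplified model in which each cell version is represented only by its timestamp; `TimestampSelectionSketch` and `select` are hypothetical names. It keeps a version only if its timestamp is in the wanted set and also falls inside the half-open range from minStamp (inclusive) to maxStamp (exclusive).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of combining an exact-timestamp list with a time range.
final class TimestampSelectionSketch {

    static List<Long> select(List<Long> versions, Set<Long> wanted,
                             long minStamp, long maxStamp) {
        List<Long> kept = new ArrayList<>();
        for (long ts : versions) {
            // Both conditions must hold: listed timestamp AND inside [minStamp, maxStamp).
            if (wanted.contains(ts) && ts >= minStamp && ts < maxStamp) {
                kept.add(ts);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Long> versions = Arrays.asList(1L, 5L, 8L, 10L, 12L, 15L);
        Set<Long> wanted = new HashSet<>(Arrays.asList(5L, 10L, 15L));
        System.out.println(select(versions, wanted, 0L, Long.MAX_VALUE)); // [5, 10, 15]
        System.out.println(select(versions, wanted, 8L, 12L));            // [10]
    }
}
```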

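2.9 Column count filter (ColumnCountGetFilter)

Limits the maximum number of columns returned per row, set through ColumnCountGetFilter(int n). It is not well suited to scans and is better used with get().

2.10 Column pagination filter (ColumnPaginationFilter)

Paginates over all the columns in a row.

ColumnPaginationFilter(int limit, int offset) skips the first offset columns of each row and then returns at most limit columns.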

public void columnPaginationFilter() throws IOException {
    // For each row, skip the first 3 columns and return at most 2.
    Filter filter = new ColumnPaginationFilter(2, 3);

    Scan scan = new Scan();
    scan.setFilter(filter);
    ResultScanner scanner = table.getScanner(scan);
    for (Result res : scanner) {
        System.out.println(res);
    }
    scanner.close();
}
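
The limit and offset arithmetic is easy to get backwards, so here is a minimal plain-Java sketch of it. It treats the columns of one row as an already sorted list; `ColumnPageSketch` and `page` are hypothetical names and the code does not touch HBase.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model of paging over the sorted columns of a single row.
final class ColumnPageSketch {

    // Skips the first 'offset' columns, then returns at most 'limit' columns.
    static List<String> page(List<String> sortedColumns, int limit, int offset) {
        int from = Math.min(offset, sortedColumns.size());
        int to = Math.min(from + limit, sortedColumns.size());
        return new ArrayList<>(sortedColumns.subList(from, to));
    }

    public static void main(String[] args) {
        List<String> columns = Arrays.asList(
                "col-0", "col-1", "col-2", "col-3", "col-4", "col-5");
        System.out.println(page(columns, 2, 3)); // [col-3, col-4]
        System.out.println(page(columns, 2, 5)); // [col-5]
    }
}
```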

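2.11 Column prefix filter (ColumnPrefixFilter)

Matches on the prefix of the column name.

2.12 Random row filter (RandomRowFilter)

Includes random rows in the result: RandomRowFilter(float chance).

chance is a value between 0 and 1.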
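
The effect of the chance parameter can be illustrated with a plain-Java sketch that keeps each row independently with the given probability. This is an assumption-level model of random row sampling, not the filter's actual source; `RandomRowSketch` and `sample` are hypothetical names, and a seeded `java.util.Random` is passed in so runs are repeatable.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Random;

// Hypothetical model of per-row random sampling with a fixed probability.
final class RandomRowSketch {

    static List<String> sample(List<String> rows, float chance, Random random) {
        List<String> kept = new ArrayList<>();
        for (String row : rows) {
            // nextFloat() is in [0, 1), so chance 0 keeps nothing and chance 1 keeps everything.
            if (random.nextFloat() < chance) {
                kept.add(row);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList("r1", "r2", "r3", "r4", "r5", "r6");
        System.out.println(sample(rows, 0.0f, new Random(42))); // []
        System.out.println(sample(rows, 1.0f, new Random(42))); // [r1, r2, r3, r4, r5, r6]
        System.out.println(sample(rows, 0.5f, new Random(42))); // roughly half, depends on the seed
    }
}
```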

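3 Filters, part 2

HBase provides a set of filters for selecting data. They can filter along several dimensions of the data (row, column, version), so a filter can narrow the result down to a single cell, identified by row key, column name, and timestamp. In practice, filtering by row key or by value is the most common case.

1. RowFilter: selects all matching rows. Its use is straightforward: with a BinaryComparator you can select the row that has a specific row key, or change the comparison operator (CompareFilter.CompareOp.EQUAL in the example below) to select multiple rows that satisfy a condition. The following selects the single row whose row key is row1: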