I. Creating the table
1. Connect to the running HBase instance from the command line with:
hbase shell
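2. Before using filters, create a table named student with two column families, stuinfo and grades.
3. The commands are as follows.
Create the table:
create 'student','stuinfo','grades'
Insert the data for the first logical row:
put 'student', '001', 'stuinfo:name','alice'
put 'student', '001', 'stuinfo:age','18'
put 'student', '001', 'stuinfo:sex','female'
put 'student', '001', 'grades:english','80'
put 'student', '001', 'grades:math','90'
Insert the other two rows in the same way.
Result: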
hbase(main):028:0> scan 'student'
ROW COLUMN+CELL
001 column=grades:english, timestamp=1586248371684, value=80
001 column=grades:math, timestamp=1586248399580, value=90
001 column=stuinfo:age, timestamp=1586248291518, value=18
001 column=stuinfo:name, timestamp=1586248245854, value=alice
001 column=stuinfo:sex, timestamp=1586248315971, value=female
002 column=grades:bigdata, timestamp=1586248539502, value=88
002 column=grades:english, timestamp=1586248508242, value=85
002 column=grades:math, timestamp=1586248524101, value=78
002 column=stuinfo:class, timestamp=1586248476375, value=1802
002 column=stuinfo:name, timestamp=1586248440973, value=nancy
002 column=stuinfo:sex, timestamp=1586248462931, value=male
003 column=grades:english, timestamp=1586248639980, value=90
003 column=grades:math, timestamp=1586248651102, value=80
003 column=stuinfo:age, timestamp=1586248586426, value=19
003 column=stuinfo:class, timestamp=1586248611878, value=1803
003 column=stuinfo:name, timestamp=1586248574358, value=harry
003 column=stuinfo:sex, timestamp=1586248601271, value=male
3 row(s) in 0.1320 seconds
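II. Filter operations
1. Row key filters
These include RowFilter, PrefixFilter, KeyOnlyFilter, FirstKeyOnlyFilter, and others.
Format: scan 'table name', {FILTER => "FilterName(comparison operator, 'comparator')"}
(1) RowFilter: filters on the row key
Example 1: show the key-value pairs whose row key contains the substring 001:
scan 'student',{FILTER=>"RowFilter(=,'substring:001')"}
Result: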
hbase(main):031:0> scan 'student',{FILTER=>"RowFilter(=,'substring:001')"}
ROW COLUMN+CELL
001 column=grades:english, timestamp=1586248371684, value=80
001 column=grades:math, timestamp=1586248399580, value=90
001 column=stuinfo:age, timestamp=1586248291518, value=18
001 column=stuinfo:name, timestamp=1586248245854, value=alice
001 column=stuinfo:sex, timestamp=1586248315971, value=female
1 row(s) in 0.2700 seconds
Example 2: show the key-value pairs whose row key is greater than 001 in byte order:
scan 'student',{FILTER=>"RowFilter(>,'binary:001')"}
Result:
hbase(main):032:0> scan 'student',{FILTER=>"RowFilter(>,'binary:001')"}
ROW COLUMN+CELL
002 column=grades:bigdata, timestamp=1586248539502, value=88
002 column=grades:english, timestamp=1586248508242, value=85
002 column=grades:math, timestamp=1586248524101, value=78
002 column=stuinfo:class, timestamp=1586248476375, value=1802
002 column=stuinfo:name, timestamp=1586248440973, value=nancy
002 column=stuinfo:sex, timestamp=1586248462931, value=male
003 column=grades:english, timestamp=1586248639980, value=90
003 column=grades:math, timestamp=1586248651102, value=80
003 column=stuinfo:age, timestamp=1586248586426, value=19
003 column=stuinfo:class, timestamp=1586248611878, value=1803
003 column=stuinfo:name, timestamp=1586248574358, value=harry
003 column=stuinfo:sex, timestamp=1586248601271, value=male
2 row(s) in 0.2130 seconds
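The binary comparator orders row keys byte by byte rather than numerically. The following Python sketch only models that ordering; it is not HBase code, and the helper name row_filter_binary and the sample keys are assumptions made for illustration.

```python
import operator

# Map the shell's comparison operators to Python functions.
OPS = {
    "<": operator.lt, "<=": operator.le, "=": operator.eq,
    "!=": operator.ne, ">=": operator.ge, ">": operator.gt,
}

def row_filter_binary(row_keys, op, value):
    """Keep the row keys that satisfy `key <op> value` under byte-wise comparison."""
    target = value.encode("utf-8")
    return [k for k in row_keys if OPS[op](k.encode("utf-8"), target)]

keys = ["001", "002", "003"]
print(row_filter_binary(keys, ">", "001"))        # ['002', '003']

# Byte order is not numeric order: "10" sorts before "9".
print(row_filter_binary(["9", "10"], ">", "9"))   # []
```

This is one reason the row keys in this table are zero-padded (001, 002, 003): with equal-length keys, byte order and numeric order agree.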
(2) PrefixFilter: row key prefix filter
Example 3: scan the rows whose row key starts with 001:
scan 'student',FILTER=>"PrefixFilter('001')"
Result:
hbase(main):033:0> scan 'student',FILTER=>"PrefixFilter('001')"
ROW COLUMN+CELL
001 column=grades:english, timestamp=1586248371684, value=80
001 column=grades:math, timestamp=1586248399580, value=90
001 column=stuinfo:age, timestamp=1586248291518, value=18
001 column=stuinfo:name, timestamp=1586248245854, value=alice
001 column=stuinfo:sex, timestamp=1586248315971, value=female
1 row(s) in 0.1300 seconds
(3) FirstKeyOnlyFilter: scans the whole table and shows only the first key-value pair of each logical row
Example 4:
scan 'student',FILTER=>"FirstKeyOnlyFilter()"
Result:
hbase(main):034:0> scan 'student',FILTER=>"FirstKeyOnlyFilter()"
ROW COLUMN+CELL
001 column=grades:english, timestamp=1586248371684, value=80
002 column=grades:bigdata, timestamp=1586248539502, value=88
003 column=grades:english, timestamp=1586248639980, value=90
3 row(s) in 0.0780 seconds
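(4) InclusiveStopFilter: used in place of ENDROW so that the stop row itself is returned
Example 5: scan the key-value pairs with row keys from 001 to 002, inclusive:
scan 'student', {STARTROW =>'001',FILTER =>"InclusiveStopFilter('002')"}
Result: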
hbase(main):037:0> scan 'student',{STARTROW=>'001',FILTER=>"InclusiveStopFilter('002')"}
ROW COLUMN+CELL
001 column=grades:english, timestamp=1586248371684, value=80
001 column=grades:math, timestamp=1586248399580, value=90
001 column=stuinfo:age, timestamp=1586248291518, value=18
001 column=stuinfo:name, timestamp=1586248245854, value=alice
001 column=stuinfo:sex, timestamp=1586248315971, value=female
002 column=grades:bigdata, timestamp=1586248539502, value=88
002 column=grades:english, timestamp=1586248508242, value=85
002 column=grades:math, timestamp=1586248524101, value=78
002 column=stuinfo:class, timestamp=1586248476375, value=1802
002 column=stuinfo:name, timestamp=1586248440973, value=nancy
002 column=stuinfo:sex, timestamp=1586248462931, value=male
2 row(s) in 0.0500 seconds
Because ENDROW excludes the stop row, this command returns the same rows as:
scan 'student', {STARTROW =>'001',ENDROW => '003'}
Result:
hbase(main):038:0> scan 'student',{STARTROW=>'001',ENDROW=>'003'}
ROW COLUMN+CELL
001 column=grades:english, timestamp=1586248371684, value=80
001 column=grades:math, timestamp=1586248399580, value=90
001 column=stuinfo:age, timestamp=1586248291518, value=18
001 column=stuinfo:name, timestamp=1586248245854, value=alice
001 column=stuinfo:sex, timestamp=1586248315971, value=female
002 column=grades:bigdata, timestamp=1586248539502, value=88
002 column=grades:english, timestamp=1586248508242, value=85
002 column=grades:math, timestamp=1586248524101, value=78
002 column=stuinfo:class, timestamp=1586248476375, value=1802
002 column=stuinfo:name, timestamp=1586248440973, value=nancy
002 column=stuinfo:sex, timestamp=1586248462931, value=male
2 row(s) in 0.0540 seconds
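The difference between the two stop conditions can be modelled in a few lines of Python. This is a sketch of the semantics only, assuming ASCII row keys so that string order matches byte order; scan_range and its parameters are hypothetical names, not part of HBase.

```python
def scan_range(row_keys, start, stop, inclusive_stop=False):
    """Return the keys from `start` up to `stop`, in sorted order.

    inclusive_stop=False models ENDROW (stop row excluded);
    inclusive_stop=True models InclusiveStopFilter (stop row included).
    """
    result = []
    for key in sorted(row_keys):
        if key < start:
            continue
        if key > stop or (key == stop and not inclusive_stop):
            break
        result.append(key)
    return result

keys = ["001", "002", "003"]
print(scan_range(keys, "001", "003"))                       # ['001', '002']
print(scan_range(keys, "001", "002", inclusive_stop=True))  # ['001', '002']
print(scan_range(keys, "001", "002"))                       # ['001']
```

The last call shows what ENDROW => '002' would give: only row 001, which is why the equivalent ENDROW form has to name the next row key.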
(5) KeyOnlyFilter: returns only the key of each cell and leaves out the value
scan 'student',FILTER=>"KeyOnlyFilter()"
Result:
hbase(main):035:0> scan 'student',FILTER=>"KeyOnlyFilter()"
ROW COLUMN+CELL
001 column=grades:english, timestamp=1586248371684, value=
001 column=grades:math, timestamp=1586248399580, value=
001 column=stuinfo:age, timestamp=1586248291518, value=
001 column=stuinfo:name, timestamp=1586248245854, value=
001 column=stuinfo:sex, timestamp=1586248315971, value=
002 column=grades:bigdata, timestamp=1586248539502, value=
002 column=grades:english, timestamp=1586248508242, value=
002 column=grades:math, timestamp=1586248524101, value=
002 column=stuinfo:class, timestamp=1586248476375, value=
002 column=stuinfo:name, timestamp=1586248440973, value=
002 column=stuinfo:sex, timestamp=1586248462931, value=
003 column=grades:english, timestamp=1586248639980, value=
003 column=grades:math, timestamp=1586248651102, value=
003 column=stuinfo:age, timestamp=1586248586426, value=
003 column=stuinfo:class, timestamp=1586248611878, value=
003 column=stuinfo:name, timestamp=1586248574358, value=
003 column=stuinfo:sex, timestamp=1586248601271, value=
3 row(s) in 0.1940 seconds
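2. Column family and column filters
(1) FamilyFilter: compares and filters on the column family.
Example 1: show the key-value pairs whose column family name contains stu:
scan 'student',FILTER=>"FamilyFilter(=,'substring:stu')"
The binary comparator requires an exact match, so the following form matches no column family in this table and returns no rows:
scan 'student',FILTER=>"FamilyFilter(=,'binary:stu')"
Result: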
hbase(main):042:0* scan 'student',{FILTER=>"FamilyFilter(=,'binary:stu')"}
ROW COLUMN+CELL
0 row(s) in 0.0580 seconds
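The empty result follows from how the comparator types treat the value. The Python sketch below models the = operator for three comparator types as I understand them (binary: exact match, binaryprefix: leading bytes match, substring: case-insensitive containment). The function matches is a hypothetical helper for illustration, not an HBase API.

```python
def matches(comparator, value):
    """Model '=' against a 'type:value' comparator string from the shell."""
    kind, _, expected = comparator.partition(":")
    if kind == "binary":          # exact, byte-for-byte equality
        return value == expected
    if kind == "binaryprefix":    # value must start with the expected bytes
        return value.startswith(expected)
    if kind == "substring":       # case-insensitive containment
        return expected.lower() in value.lower()
    raise ValueError(f"unsupported comparator type: {kind}")

families = ["grades", "stuinfo"]
for comp in ("binary:stu", "binaryprefix:stu", "substring:stu", "binary:stuinfo"):
    print(comp, [f for f in families if matches(comp, f)])
# binary:stu []
# binaryprefix:stu ['stuinfo']
# substring:stu ['stuinfo']
# binary:stuinfo ['stuinfo']
```

Under this model, a true prefix match on the family name would use binaryprefix:stu, and an exact match would use binary:stuinfo.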
(2) QualifierFilter: column qualifier filter.
Example 2: show the records whose column name contains name:
scan 'student',FILTER=>"QualifierFilter(=,'substring:name')"
Result:
hbase(main):001:0> scan 'student',{FILTER=>"QualifierFilter(=,'substring:name')"}
ROW COLUMN+CELL
001 column=stuinfo:name, timestamp=1586248245854, value=alice
002 column=stuinfo:name, timestamp=1586248440973, value=nancy
003 column=stuinfo:name, timestamp=1586248574358, value=harry
3 row(s) in 0.7890 seconds
(3) ColumnPrefixFilter: filters on the column name prefix.
Example 2: show the records whose column name starts with name:
scan 'student',FILTER=>"ColumnPrefixFilter('name')"
Result:
hbase(main):002:0> scan 'student',FILTER=>"ColumnPrefixFilter('name')"
ROW COLUMN+CELL
001 column=stuinfo:name, timestamp=1586248245854, value=alice
002 column=stuinfo:name, timestamp=1586248440973, value=nancy
003 column=stuinfo:name, timestamp=1586248574358, value=harry
3 row(s) in 0.2490 seconds
On this table it returns the same cells as:
scan 'student',FILTER=>"QualifierFilter(=,'substring:name')"
Result:
hbase(main):004:0> scan 'student',FILTER=>"QualifierFilter(=,'substring:name')"
ROW COLUMN+CELL
001 column=stuinfo:name, timestamp=1586248245854, value=alice
002 column=stuinfo:name, timestamp=1586248440973, value=nancy
003 column=stuinfo:name, timestamp=1586248574358, value=harry
3 row(s) in 0.1020 seconds
(4) MultipleColumnPrefixFilter: accepts several prefixes
Example 3: show the records whose column name starts with name or age:
scan 'student',FILTER=>"MultipleColumnPrefixFilter('name','age')"
Result:
hbase(main):005:0> scan 'student',FILTER=>"MultipleColumnPrefixFilter('name','age')"
ROW COLUMN+CELL
001 column=stuinfo:age, timestamp=1586248291518, value=18
001 column=stuinfo:name, timestamp=1586248245854, value=alice
002 column=stuinfo:name, timestamp=1586248440973, value=nancy
003 column=stuinfo:age, timestamp=1586248586426, value=19
003 column=stuinfo:name, timestamp=1586248574358, value=harry
3 row(s) in 0.0870 seconds
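(5) ColumnRangeFilter: filters column names by a lexicographic range; each bound is followed by a flag that says whether the bound is inclusive
scan 'student',FILTER=>"ColumnRangeFilter('bi',true,'na',true)"
Result: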
hbase(main):001:0> scan 'student',FILTER=>"ColumnRangeFilter('bi',true,'na',true)"
ROW COLUMN+CELL
001 column=grades:english, timestamp=1586248371684, value=80
001 column=grades:math, timestamp=1586248399580, value=90
002 column=grades:bigdata, timestamp=1586248539502, value=88
002 column=grades:english, timestamp=1586248508242, value=85
002 column=grades:math, timestamp=1586248524101, value=78
002 column=stuinfo:class, timestamp=1586248476375, value=1802
003 column=grades:english, timestamp=1586248639980, value=90
003 column=grades:math, timestamp=1586248651102, value=80
003 column=stuinfo:class, timestamp=1586248611878, value=1803
3 row(s) in 0.8940 seconds
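To see why name, age and sex are missing from this result, the range check can be modelled in Python. This is a sketch of the comparison only, assuming ASCII qualifiers; column_range is a hypothetical helper whose parameters mirror the four filter arguments.

```python
def column_range(qualifiers, min_col, min_inclusive, max_col, max_inclusive):
    """Keep the qualifiers that fall lexicographically between the two bounds."""
    kept = []
    for q in qualifiers:
        above_min = q > min_col or (min_inclusive and q == min_col)
        below_max = q < max_col or (max_inclusive and q == max_col)
        if above_min and below_max:
            kept.append(q)
    return kept

qualifiers = ["age", "bigdata", "class", "english", "math", "name", "sex"]
print(column_range(qualifiers, "bi", True, "na", True))
# ['bigdata', 'class', 'english', 'math']
```

The qualifier name is excluded because it sorts after the upper bound na: the two strings share the prefix na, and the longer one is the greater.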
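3. Value filters
(1) ValueFilter: filters on the cell value.
Example 1: find all key-value pairs whose value equals 19 (binary) or contains 19 (substring)
scan 'student',FILTER=>"ValueFilter(=,'binary:19')"
scan 'student',FILTER=>"ValueFilter(=,'substring:19')"
Result: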
hbase(main):002:0> scan 'student',FILTER=>"ValueFilter(=,'binary:19')"
ROW COLUMN+CELL
003 column=stuinfo:age, timestamp=1586248586426, value=19
1 row(s) in 0.1480 seconds
hbase(main):003:0> scan 'student',FILTER=>"ValueFilter(=,'substring:19')"
ROW COLUMN+CELL
003 column=stuinfo:age, timestamp=1586248586426, value=19
1 row(s) in 0.0470 seconds
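(2) SingleColumnValueFilter: filters rows by the value in a specified column family and column.
Example 2: find the key-value pairs in the age column of the stuinfo column family whose value equals 19
scan 'student',{COLUMN=>'stuinfo:age',FILTER=>"SingleColumnValueFilter('stuinfo','age',=,'binary:19')"}
Result: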
hbase(main):007:0> scan 'student',{COLUMN=>'stuinfo:age',FILTER=>"SingleColumnValueFilter('stuinfo','age',=,'binary:19')"}
ROW COLUMN+CELL
003 column=stuinfo:age, timestamp=1586248586426, value=19
1 row(s) in 0.1500 seconds
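Without the COLUMN restriction, the filter returns every cell of each matching row. For example, matching name equal to alice:
scan 'student',FILTER=>"SingleColumnValueFilter('stuinfo','name',=,'binary:alice')"
Result: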
hbase(main):005:0> scan 'student',FILTER=>"SingleColumnValueFilter('stuinfo','name',=,'binary:alice')"
ROW COLUMN+CELL
001 column=grades:english, timestamp=1586248371684, value=80
001 column=grades:math, timestamp=1586248399580, value=90
001 column=stuinfo:age, timestamp=1586248291518, value=18
001 column=stuinfo:name, timestamp=1586248245854, value=alice
001 column=stuinfo:sex, timestamp=1586248315971, value=female
1 row(s) in 0.0490 seconds
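SingleColumnValueFilter decides row by row: it tests one column and then keeps or drops the whole row. The Python sketch below models that behaviour, including my understanding that rows lacking the tested column pass the filter unless the filter-if-missing option is turned on. The function name, the filter_if_missing parameter, and the reduced data set are assumptions for illustration.

```python
def single_column_value_filter(rows, column, expected, filter_if_missing=False):
    """Return whole rows whose `column` equals `expected`.

    rows maps row key -> {column name: value}.
    Rows without `column` are kept unless filter_if_missing is True.
    """
    kept = {}
    for key, cells in rows.items():
        if column not in cells:
            if not filter_if_missing:
                kept[key] = cells
            continue
        if cells[column] == expected:
            kept[key] = cells
    return kept

# A reduced copy of the student table; row 002 has no age column.
rows = {
    "001": {"stuinfo:age": "18", "stuinfo:name": "alice"},
    "002": {"stuinfo:name": "nancy"},
    "003": {"stuinfo:age": "19", "stuinfo:name": "harry"},
}
print(sorted(single_column_value_filter(rows, "stuinfo:age", "19")))        # ['002', '003']
print(sorted(single_column_value_filter(rows, "stuinfo:age", "19", True)))  # ['003']
print(sorted(single_column_value_filter(rows, "stuinfo:name", "alice")))    # ['001']
```

In this model, an age filter without a COLUMN restriction would also let row 002 through, because that row has no age cell to test.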
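4. Other filters
(1) ColumnCountGetFilter: limits the number of key-value pairs returned for each logical row
Example 1: return the first 2 key-value pairs of row 002
get 'student','002',FILTER=>"ColumnCountGetFilter(2)"
Result: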
hbase(main):001:0> get 'student','002',FILTER=>"ColumnCountGetFilter(2)"
COLUMN CELL
grades:bigdata timestamp=1586248539502, value=88
grades:english timestamp=1586248508242, value=85
1 row(s) in 0.7400 seconds
(2) PageFilter: row-based pagination filter that sets the number of rows to return.
Example 2: show one row
scan 'student',FILTER=>"PageFilter(1)"
Result:
hbase(main):002:0> scan 'student',FILTER=>"PageFilter(1)"
ROW COLUMN+CELL
001 column=grades:english, timestamp=1586248371684, value=80
001 column=grades:math, timestamp=1586248399580, value=90
001 column=stuinfo:age, timestamp=1586248291518, value=18
001 column=stuinfo:name, timestamp=1586248245854, value=alice
001 column=stuinfo:sex, timestamp=1586248315971, value=female
1 row(s) in 0.2050 seconds
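(3) ColumnPaginationFilter: column-based pagination filter; it takes the number of columns to return (limit) followed by the offset.
Example 3: for each row, skip the first column and show the next 2 key-value pairs
scan 'student',FILTER=>"ColumnPaginationFilter(2,1)"
Result: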
hbase(main):004:0> scan 'student',FILTER=>"ColumnPaginationFilter(2,1)"
ROW COLUMN+CELL
001 column=grades:math, timestamp=1586248399580, value=90
001 column=stuinfo:age, timestamp=1586248291518, value=18
002 column=grades:english, timestamp=1586248508242, value=85
002 column=grades:math, timestamp=1586248524101, value=78
003 column=grades:math, timestamp=1586248651102, value=80
003 column=stuinfo:age, timestamp=1586248586426, value=19
3 row(s) in 0.1980 seconds
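The argument order (limit first, then offset) is easy to mix up, so here is a small Python model of the per-row slicing. It is a sketch under the assumption that columns within a row are visited in sorted order; column_pagination is a hypothetical helper, not an HBase call.

```python
def column_pagination(rows, limit, offset):
    """For each row, skip `offset` columns and keep the next `limit` columns."""
    return {key: sorted(cols)[offset:offset + limit] for key, cols in rows.items()}

rows = {
    "001": ["grades:english", "grades:math",
            "stuinfo:age", "stuinfo:name", "stuinfo:sex"],
    "002": ["grades:bigdata", "grades:english", "grades:math",
            "stuinfo:class", "stuinfo:name", "stuinfo:sex"],
}
print(column_pagination(rows, limit=2, offset=1))
# {'001': ['grades:math', 'stuinfo:age'], '002': ['grades:english', 'grades:math']}
```

The model reproduces the columns shown for rows 001 and 002 in the scan output above.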
Thank you all for your support! I will keep working at it.