1. Table structure in HBase
./hbase shell
In the shell I created a table named 'test10' with a single column family, 'colfam1',
then used the describe command to inspect the table I created.
In the output:
the column family is named colfam1;
BLOOMFILTER is ROW, the Bloom filter type;
VERSIONS is 1, the maximum number of cell versions kept per column;
IN_MEMORY, whether the family is pinned in the block cache, is false.
...The remaining attributes in this description will be covered later when discussing table design.
hbase(main):007:0> create 'test10','colfam1'
0 row(s) in 0.7950 seconds
hbase(main):008:0> describe 'test10'
Table test10 is ENABLED
test10
COLUMN FAMILIES DESCRIPTION
{ NAME => 'colfam1',
BLOOMFILTER => 'ROW',
VERSIONS => '1',
IN_MEMORY => 'false',
KEEP_DELETED_CELLS => 'FALSE',
DATA_BLOCK_ENCODING => 'NONE',
TTL => 'FOREVER',
COMPRESSION => 'NONE',
MIN_VERSIONS => '0',
BLOCKCACHE => 'true',
BLOCKSIZE => '65536',
REPLICATION_SCOPE => '0'
}
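VERSIONS => '1' means each cell keeps at most one version: a put with a newer timestamp shadows the old value, and the older version is dropped. The retention rule can be sketched in Python; the Cell class and its methods are illustrative only, not a real HBase API.

```python
# Illustrative model of HBase's per-cell VERSIONS retention: each cell
# stores (timestamp, value) pairs, only the newest `max_versions`
# survive, and a read returns the newest one.

class Cell:
    def __init__(self, max_versions=1):
        self.max_versions = max_versions
        self.versions = []  # list of (timestamp, value), newest first

    def put(self, timestamp, value):
        self.versions.append((timestamp, value))
        # keep only the newest max_versions entries (what a compaction does)
        self.versions.sort(key=lambda tv: tv[0], reverse=True)
        del self.versions[self.max_versions:]

    def get(self):
        # reads always see the newest surviving version
        return self.versions[0][1] if self.versions else None

cell = Cell(max_versions=1)
cell.put(100, "value-1")
cell.put(200, "value-2")
print(cell.get())          # value-2
print(len(cell.versions))  # 1: the older version was discarded
```

With VERSIONS set higher at table creation, older timestamps would remain readable up to that limit.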
Basic table operations (CRUD)
1. The list command
# list all current tables
hbase(main):035:0> list
TABLE
test10
test11
2 row(s) in 0.0110 seconds
=> ["test10", "test11"]
hbase(main):036:0>
2. put: insert data into table 'test10'
'test10' is the table name.
'myrow-3' is the row key; it must be supplied.
'colfam1:name': colfam1 is the column family, name is the column qualifier within that family; HBase is a column-oriented store.
'fandong' is the value written to the name column.
hbase(main):036:0> put 'test10','myrow-3','colfam1:name','fandong'
0 row(s) in 0.0100 seconds
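Conceptually, put writes into a sorted map of maps: table → row key → "family:qualifier" → value. The following Python sketch models that logical layout; the `tables` dict and `put` helper are hypothetical, not the real client API.

```python
# Rough model of HBase's logical layout: each table maps a row key to
# a dict of "family:qualifier" -> value, ordered by row key.

tables = {}

def put(table, rowkey, column, value):
    # qualifiers are created on write; no schema change is needed
    tables.setdefault(table, {}).setdefault(rowkey, {})[column] = value

put("test10", "myrow-3", "colfam1:name", "fandong")
put("test10", "myrow-1", "colfam1:q1", "value-1")

# rows come back in lexicographic row-key order, as in a scan
for row in sorted(tables["test10"]):
    print(row, tables["test10"][row])
```

Note that only the row key is mandatory structure: any "family:qualifier" can appear in any row, which is why adding a column later is just another put.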
3. scan: the table scan command
scan <tableName> scans the entire table. Use this with caution: if the table is large, a full scan will drag down server performance.
hbase(main):037:0> scan 'test10'
ROW COLUMN+CELL
myrow-1 column=colfam1:q1, timestamp=1574693256009, value=value-1
myrow-2 column=colfam1:q1, timestamp=1574693267180, value=value-3
myrow-3 column=colfam1:name, timestamp=1574693404696, value=fandong
3 row(s) in 0.0170 seconds
Other scan usages: adding filter conditions.
If you are unsure how to use it, run help 'scan' to see its usage.
hbase(main):038:0> help 'scan'
Some examples:
hbase> scan 'hbase:meta'
hbase> scan 'hbase:meta', {COLUMNS => 'info:regioninfo'}
hbase> scan 'ns1:t1', {COLUMNS => ['c1', 'c2'], LIMIT => 10, STARTROW => 'xyz'}
hbase> scan 't1', {COLUMNS => ['c1', 'c2'], LIMIT => 10, STARTROW => 'xyz'}
hbase> scan 't1', {COLUMNS => 'c1', TIMERANGE => [1303668804, 1303668904]}
hbase> scan 't1', {REVERSED => true}
hbase> scan 't1', {FILTER => "(PrefixFilter ('row2') AND
(QualifierFilter (>=, 'binary:xyz'))) AND (TimestampsFilter ( 123, 456))"}
hbase> scan 't1', {FILTER =>
org.apache.hadoop.hbase.filter.ColumnPaginationFilter.new(1, 0)}
For setting the Operation Attributes
hbase> scan 't1', { COLUMNS => ['c1', 'c2'], ATTRIBUTES => {'mykey' => 'myvalue'}}
hbase> scan 't1', { COLUMNS => ['c1', 'c2'], AUTHORIZATIONS => ['PRIVATE','SECRET']}
For experts, there is an additional option -- CACHE_BLOCKS -- which
switches block caching for the scanner on (true) or off (false). By
default it is enabled. Examples:
hbase> scan 't1', {COLUMNS => ['c1', 'c2'], CACHE_BLOCKS => false}
Also for experts, there is an advanced option -- RAW -- which instructs the
scanner to return all cells (including delete markers and uncollected deleted
cells). This option cannot be combined with requesting specific COLUMNS.
Disabled by default. Example:
hbase> scan 't1', {RAW => true, VERSIONS => 10}
Besides the default 'toStringBinary' format, 'scan' supports custom formatting
by column. A user can define a FORMATTER by adding it to the column name in
the scan specification. The FORMATTER can be stipulated:
1. either as a org.apache.hadoop.hbase.util.Bytes method name (e.g, toInt, toString)
2. or as a custom class followed by method name: e.g. 'c(MyFormatterClass).format'.
Example formatting cf:qualifier1 and cf:qualifier2 both as Integers:
hbase> scan 't1', {COLUMNS => ['cf:qualifier1:toInt',
'cf:qualifier2:c(org.apache.hadoop.hbase.util.Bytes).toInt'] }
Note that you can specify a FORMATTER by column only (cf:qualifer). You cannot
specify a FORMATTER for all columns of a column family.
Based on the help examples, let's take a look at what hbase:meta contains:
hbase(main):040:0* scan 'hbase:meta'
ROW COLUMN+CELL
hbase:namespace,,1574685680023.d48536c776d34af327fe column=info:regioninfo, timestamp=1574685680472, value={ENCODED => d48536c776d34af327fe166ea632e369, NAME => 'hbase:namespace,,1574685680023.d48536c776
166ea632e369. d34af327fe166ea632e369.', STARTKEY => '', ENDKEY => ''}
hbase:namespace,,1574685680023.d48536c776d34af327fe column=info:seqnumDuringOpen, timestamp=1574686394947, value=\x00\x00\x00\x00\x00\x00\x00\x05
166ea632e369.
hbase:namespace,,1574685680023.d48536c776d34af327fe column=info:server, timestamp=1574686394947, value=hadoop01.fandong.com:46404
166ea632e369.
hbase:namespace,,1574685680023.d48536c776d34af327fe column=info:serverstartcode, timestamp=1574686394947, value=1574686385822
166ea632e369.
test10,,1574692694769.755a290796b773e6c059fcc23c4ce column=info:regioninfo, timestamp=1574692695121, value={ENCODED => 755a290796b773e6c059fcc23c4ce6c5, NAME => 'test10,,1574692694769.755a290796b773e6c05
6c5. 9fcc23c4ce6c5.', STARTKEY => '', ENDKEY => ''}
test10,,1574692694769.755a290796b773e6c059fcc23c4ce column=info:seqnumDuringOpen, timestamp=1574692695306, value=\x00\x00\x00\x00\x00\x00\x00\x01
6c5.
test10,,1574692694769.755a290796b773e6c059fcc23c4ce column=info:server, timestamp=1574692695306, value=hadoop01.fandong.com:46404
6c5.
test10,,1574692694769.755a290796b773e6c059fcc23c4ce column=info:serverstartcode, timestamp=1574692695306, value=1574686385822
6c5.
test11,,1574693322360.f12d1560d158107bf477a6f175e3e column=info:regioninfo, timestamp=1574693322981, value={ENCODED => f12d1560d158107bf477a6f175e3e312, NAME => 'test11,,1574693322360.f12d1560d158107bf47
312. 7a6f175e3e312.', STARTKEY => '', ENDKEY => ''}
test11,,1574693322360.f12d1560d158107bf477a6f175e3e column=info:seqnumDuringOpen, timestamp=1574693323058, value=\x00\x00\x00\x00\x00\x00\x00\x01
312.
test11,,1574693322360.f12d1560d158107bf477a6f175e3e column=info:server, timestamp=1574693323058, value=hadoop01.fandong.com:46404
312.
test11,,1574693322360.f12d1560d158107bf477a6f175e3e column=info:serverstartcode, timestamp=1574693323058, value=1574686385822
312.
3 row(s) in 0.0410 seconds
scan with a row range: fetch the data for myrow-1 and myrow-2
1. specify the columns to scan
2. set the start row of the scan (inclusive)
3. set the stop row of the scan (exclusive)
The first example filters on a single column; the second shows how to filter on multiple columns.
hbase(main):044:0> scan 'test10',{COLUMNS=>'colfam1:q1',STARTROW =>'myrow-1', STOPROW =>'myrow-3'}
ROW COLUMN+CELL
myrow-1 column=colfam1:q1, timestamp=1574693256009, value=value-1
myrow-2 column=colfam1:q1, timestamp=1574693267180, value=value-3
2 row(s) in 0.0270 seconds
hbase(main):003:0> scan 'test10',{COLUMNS=> ['colfam1:q1','colfam1:name'],STARTROW =>'myrow-1', STOPROW =>'myrow-5'}
ROW COLUMN+CELL
myrow-1 column=colfam1:q1, timestamp=1574693256009, value=value-1
myrow-2 column=colfam1:q1, timestamp=1574693267180, value=value-3
myrow-3 column=colfam1:name, timestamp=1574693404696, value=fandong
myrow-4 column=colfam1:name, timestamp=1574694474272, value=fandong
myrow-4 column=colfam1:q1, timestamp=1574694503077, value=value-4
4 row(s) in 0.0330 seconds
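STARTROW is inclusive, STOPROW is exclusive, and row keys compare lexicographically, which is why myrow-3 is cut off in the first scan above. This Python sketch mimics that range logic over an in-memory table (illustrative only, not the HBase API):

```python
# Illustrative STARTROW/STOPROW semantics: start row included, stop row
# excluded, rows ordered lexicographically, with an optional column list.

table = {
    "myrow-1": {"colfam1:q1": "value-1"},
    "myrow-2": {"colfam1:q1": "value-3"},
    "myrow-3": {"colfam1:name": "fandong"},
    "myrow-4": {"colfam1:name": "fandong", "colfam1:q1": "value-4"},
}

def scan(table, startrow=None, stoprow=None, columns=None):
    result = {}
    for row in sorted(table):
        if startrow is not None and row < startrow:
            continue
        if stoprow is not None and row >= stoprow:  # stop row is excluded
            break
        cells = {c: v for c, v in table[row].items()
                 if columns is None or c in columns}
        if cells:
            result[row] = cells
    return result

# matches the first shell example: myrow-3 is excluded by STOPROW
print(scan(table, startrow="myrow-1", stoprow="myrow-3",
           columns=["colfam1:q1"]))
```

To include myrow-3 in the result, the stop row must sort strictly after it, e.g. 'myrow-4' or 'myrow-5' as in the second shell example.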
4. Modify table 'test10': add a column colfam1:name
Just execute another put; columns don't need to be declared in advance.
hbase(main):046:0> put 'test10','myrow-4','colfam1:name','fandong'
0 row(s) in 0.0150 seconds
hbase(main):047:0> scan 'test10'
ROW COLUMN+CELL
myrow-1 column=colfam1:q1, timestamp=1574693256009, value=value-1
myrow-2 column=colfam1:q1, timestamp=1574693267180, value=value-3
myrow-3 column=colfam1:name, timestamp=1574693404696, value=fandong
myrow-4 column=colfam1:name, timestamp=1574694474272, value=fandong
4 row(s) in 0.0190 seconds
5. get <tableName>, <rowkey>: fetch a single row by its row key
hbase(main):004:0> get 'test10','myrow-1'
COLUMN CELL
colfam1:q1 timestamp=1574693256009, value=value-1
1 row(s) in 0.0160 seconds
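get is effectively a scan restricted to one row key: it returns every cell stored under that row. Continuing the toy in-memory model (the `get` helper is illustrative, not the client API):

```python
# get: return all cells of one row, or an empty dict if the row is absent.

table = {
    "myrow-1": {"colfam1:q1": "value-1"},
    "myrow-4": {"colfam1:name": "fandong", "colfam1:q1": "value-4"},
}

def get(table, rowkey):
    return table.get(rowkey, {})

print(get(table, "myrow-1"))  # {'colfam1:q1': 'value-1'}
print(get(table, "myrow-9"))  # {} -> the shell would print 0 row(s)
```

Because the row key is the primary index, get is the cheapest read HBase offers; anything keyed on other criteria needs a scan plus filters.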
...to be continued