Preface
In the previous chapter, we installed HBase locally. In this chapter, we look at the basic usage of the HBase Shell.
Basics
What is HBase
Before using HBase, let's go over a few basic facts about it.
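- Unlike MySQL, HBase is a column-oriented database (more precisely, column-family-oriented): groups of columns are often stored in separate files. For example, a record `<id, name, age, sex>` may be split into `<id, name>` and `<id, age, sex>`, which live in different files.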
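Why store data by column?
In my view, there are several reasons:
- When a file grows too large, it is easy to split.
- New columns are easy to add. Data formats can vary widely, and a traditional RDBMS cannot accommodate that.
- Column projection is cheap: a query reads only the column files it needs, which speeds up loading.
- See also: "Columnar storage explained in a few diagrams" (repost)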
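The splitting and projection ideas above can be sketched in a few lines of Python. This is only a toy model, not how HBase lays out its files: the sample values and the group names `names` and `details` are made up for illustration, and only the field names come from the `<id, name, age, sex>` example.

```python
# Toy model of column-grouped storage.
# Field names follow the <id, name, age, sex> example; values are invented.
rows = [
    {"id": 1, "name": "zhangsan", "age": 23, "sex": "M"},
    {"id": 2, "name": "lisi", "age": 31, "sex": "F"},
]

# Each group of columns is stored in its own "file" (here: its own list).
column_groups = {
    "names": ("id", "name"),
    "details": ("id", "age", "sex"),
}
column_store = {
    group: [{col: row[col] for col in cols} for row in rows]
    for group, cols in column_groups.items()
}

def project(store, group):
    """Read one column group without touching the others."""
    return store[group]

names_only = project(column_store, "names")
print(names_only)
```

A query that needs only `id` and `name` reads the `names` group and never loads `age` or `sex`, which is the projection benefit mentioned in the list.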
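Basic data model in HBase
In HBase, data is stored in the form `<row key> <column family 1: column 1-1, column 1-2> <column family 2: column 2-1, column 2-2>`. Two points matter here. First, row keys are sorted in lexicographic order, which is very important for lookups. Second, the value of a column within a given row and column family can change over time, and its versions are ordered by timestamp. (Some old versions are removed when files are compacted.)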
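Lexicographic ordering is easy to get wrong with numeric-looking keys, so here is a small Python sketch of it. The keys are sample values; the zero-padding step is a common workaround that I am assuming here, not something HBase does for you.

```python
# Row keys are compared byte by byte, not numerically.
keys = ["9", "1", "12", "6", "10", "7"]

as_bytes = sorted(k.encode("utf-8") for k in keys)
lexicographic = [k.decode("utf-8") for k in as_bytes]
print(lexicographic)

# Assumed workaround: left-pad numeric keys to a fixed width
# so that byte order matches numeric order.
padded = sorted(k.zfill(4) for k in keys)
print(padded)
```

With unpadded keys, `10` and `12` land between `1` and `6`. With fixed-width keys, a range scan over consecutive numbers touches adjacent rows.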
The individual concepts are described below:
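- Row Key
  The row key is the primary key used to retrieve records. There are three main ways to access rows in an HBase table: by a single row key, by matching row keys against a regular expression, or by a full table scan. A row key can be any string (maximum length 64 KB; in practice it is usually 10 to 100 bytes). Row keys are sorted in lexicographic order, which is very important for lookups. (Note on lexicographic order: the keys `1 10 12 6 7 9` are already in sorted order, so `10` and `12` both come before `9`.)
- Column Family
  Every column in HBase belongs to a column family. Column families are part of the schema (the table design), while columns are not: a column can be added dynamically when data is inserted. A column family must be defined before it is used. Column names are prefixed with their column family, for example `course:name` and `course:age`.
- Cell
  A cell is the unit of data, uniquely identified by `<row key, column family, column, version>`. Data in a cell has no type; it is stored as raw bytes.
- Time Stamp
  Each cell can hold multiple versions of a value. Versions are indexed by timestamp, with millisecond precision. The timestamp is a 64-bit integer, and versions are sorted by timestamp in descending order, so the newest comes first.
  Old versions are reclaimed in one of two ways: keep only the last n versions, or keep only the versions from a recent period (for example, the last seven days).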
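The version rules above can be modelled with a small Python class. This is a sketch under my own assumptions: the class name `VersionedCell`, its parameters, and the third value `zhangsan2` with its timestamp are all hypothetical. It applies whichever retention rules are configured and is not HBase's actual compaction logic.

```python
import time

class VersionedCell:
    """Toy model of one cell that keeps several timestamped versions."""

    def __init__(self, max_versions=3, ttl_ms=None):
        self.max_versions = max_versions  # keep at most this many versions
        self.ttl_ms = ttl_ms              # optionally drop versions older than this
        self.versions = []                # (timestamp_ms, value), newest first

    def put(self, value, timestamp_ms=None):
        if timestamp_ms is None:
            timestamp_ms = int(time.time() * 1000)
        self.versions.append((timestamp_ms, value))
        # Newest first, matching the descending timestamp order.
        self.versions.sort(key=lambda v: v[0], reverse=True)

    def compact(self, now_ms):
        """Apply the configured retention rules (hypothetical helper)."""
        kept = self.versions[: self.max_versions]
        if self.ttl_ms is not None:
            kept = [(ts, v) for ts, v in kept if now_ms - ts <= self.ttl_ms]
        self.versions = kept

    def get(self):
        """Return the newest value, or None if the cell is empty."""
        return self.versions[0][1] if self.versions else None

seven_days_ms = 7 * 24 * 3600 * 1000
cell = VersionedCell(max_versions=2, ttl_ms=seven_days_ms)
cell.put("zhangsan", timestamp_ms=1554808804837)
cell.put("zhangsan1", timestamp_ms=1554808822676)
cell.put("zhangsan2", timestamp_ms=1554808830000)  # invented third write
cell.compact(now_ms=1554808900000)
print(cell.get(), cell.versions)
```

After `compact`, only the two newest versions remain, and a read returns the newest one.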
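HBase Shell commands
Command | Description | Usage |
---|---|---|
create | Create a table | `create 'table', 'family1', 'family2', 'familyN'` |
list | List all tables | `list` |
describe | Show table details | `describe 'table'` |
exists | Check whether a table exists | `exists 'table'` |
enable | Enable a table | `enable 'table'` |
disable | Disable a table | `disable 'table'` |
is_enabled | Check whether a table is enabled | `is_enabled 'table'` |
is_disabled | Check whether a table is disabled | `is_disabled 'table'` |
count | Count the rows in a table | `count 'table'` |
put | Add a record | `put 'table', 'row key', 'family:column', 'value'` |
get | Get a record (everything under the row key) | `get 'table', 'row key'` |
get | Get a record (one column family) | `get 'table', 'row key', 'family'` |
get | Get a record (one column) | `get 'table', 'row key', 'family:column'` |
delete | Delete a cell | `delete 'table', 'row key', 'family:column'` |
deleteall | Delete an entire row | `deleteall 'table', 'row key'` |
drop | Drop a table | `disable 'table'`, then `drop 'table'` |
alter | Modify a column family | |
incr | Increment the value at the given table, row, and column | |
truncate | Empty a table | `truncate 'table'` (drops the table, then re-creates it) |
scan | Scan a table and return the matching values | `scan 'table'` |
tools | List the tools HBase supports | |
status | Return the status of the HBase cluster | |
version | Return the HBase version | |
exit | Exit the HBase shell | |
shutdown | Shut down the HBase cluster (not the same as exit) | |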
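Every command in the table follows the same shape: a command name followed by comma-separated, quoted arguments. As a rough illustration, the Python helper below builds such lines. The function names `shell_quote` and `shell_command` are hypothetical, and the escaping assumes the shell treats single-quoted strings the way Ruby does.

```python
def shell_quote(value):
    """Wrap a value in single quotes, escaping backslashes and single quotes."""
    escaped = str(value).replace("\\", "\\\\").replace("'", "\\'")
    return "'" + escaped + "'"

def shell_command(name, *args):
    """Build one shell line, e.g. put 'user', '1234', 'info1:name', 'zhangsan'."""
    if not args:
        return name
    return name + " " + ", ".join(shell_quote(a) for a in args)

script = "\n".join([
    shell_command("create", "user", "info1", "info2"),
    shell_command("put", "user", "1234", "info1:name", "zhangsan"),
    shell_command("get", "user", "1234", "info1:name"),
    shell_command("list"),
])
print(script)
```

The printed lines can be pasted into an interactive `hbase shell` session. Note that the quotes must be plain ASCII quotes; typographic quotes copied from a web page will not parse.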
Detailed operations
- Log in: `hbase shell`
localhost:current Sean$ hbase shell
Picked up JAVA_TOOL_OPTIONS: -Dfile.encoding=UTF-8
2019-04-09 19:12:43,867 WARN [main] util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/Users/Sean/Software/HBase/hbase-1.2.11/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/Users/Sean/Software/hadoop/hadoop-2.7.5/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
HBase Shell; enter 'help<RETURN>' for list of supported commands.
Type "exit<RETURN>" to leave the HBase Shell
Version 1.2.11, rca53d58f5b7abde0c189c9f78baf4246bddffac3, Fri Feb 15 18:12:16 CST 2019
- Help: `help`
hbase(main):001:0> help
HBase Shell, version 1.2.11, rca53d58f5b7abde0c189c9f78baf4246bddffac3, Fri Feb 15 18:12:16 CST 2019
Type 'help "COMMAND"', (e.g. 'help "get"' -- the quotes are necessary) for help on a specific command.
Commands are grouped. Type 'help "COMMAND_GROUP"', (e.g. 'help "general"') for help on a command group.
COMMAND GROUPS:
Group name: general
Commands: status, table_help, version, whoami
Group name: ddl
Commands: alter, alter_async, alter_status, create, describe, disable, disable_all, drop, drop_all, enable, enable_all, exists, get_table, is_disabled, is_enabled, list, locate_region, show_filters
Group name: namespace
Commands: alter_namespace, create_namespace, describe_namespace, drop_namespace, list_namespace, list_namespace_tables
Group name: dml
Commands: append, count, delete, deleteall, get, get_counter, get_splits, incr, put, scan, truncate, truncate_preserve
Group name: tools
Commands: assign, balance_switch, balancer, balancer_enabled, catalogjanitor_enabled, catalogjanitor_run, catalogjanitor_switch, close_region, compact, compact_rs, flush, major_compact, merge_region, move, normalize, normalizer_enabled, normalizer_switch, split, trace, unassign, wal_roll, zk_dump
Group name: replication
Commands: add_peer, append_peer_tableCFs, disable_peer, disable_table_replication, enable_peer, enable_table_replication, list_peers, list_replicated_tables, remove_peer, remove_peer_tableCFs, set_peer_tableCFs, show_peer_tableCFs
Group name: snapshots
Commands: clone_snapshot, delete_all_snapshot, delete_snapshot, list_snapshots, restore_snapshot, snapshot
Group name: configuration
Commands: update_all_config, update_config
Group name: quotas
Commands: list_quotas, set_quota
Group name: security
Commands: grant, list_security_capabilities, revoke, user_permission
Group name: procedures
Commands: abort_procedure, list_procedures
Group name: visibility labels
Commands: add_labels, clear_auths, get_auths, list_labels, set_auths, set_visibility
SHELL USAGE:
Quote all names in HBase Shell such as table and column names. Commas delimit
command parameters. Type <RETURN> after entering a command to run it.
Dictionaries of configuration used in the creation and alteration of tables are
Ruby Hashes. They look like this:
{'key1' => 'value1', 'key2' => 'value2', ...}
and are opened and closed with curley-braces. Key/values are delimited by the
'=>' character combination. Usually keys are predefined constants such as
NAME, VERSIONS, COMPRESSION, etc. Constants do not need to be quoted. Type
'Object.constants' to see a (messy) list of all constants in the environment.
If you are using binary keys or values and need to enter them in the shell, use
double-quote'd hexadecimal representation. For example:
hbase> get 't1', "key\x03\x3f\xcd"
hbase> get 't1', "key\003\023\011"
hbase> put 't1', "test\xef\xff", 'f1:', "\x01\x33\x40"
The HBase shell is the (J)Ruby IRB with the above HBase-specific commands added.
For more on the HBase Shell, see http://hbase.apache.org/book.html
- List all tables: `list`
hbase(main):002:0> list
TABLE
0 row(s) in 0.2370 seconds
=> []
- Create a table: `create`
hbase(main):003:0> create 'user', 'info1','info2'
0 row(s) in 1.6250 seconds
=> Hbase::Table - user
hbase(main):004:0> list
TABLE
user
1 row(s) in 0.0190 seconds
=> ["user"]
- Describe a table: `describe`
hbase(main):005:0> describe 'user'
Table user is ENABLED
user
COLUMN FAMILIES DESCRIPTION
{NAME => 'info1', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', COMPRESSION => 'NONE',
MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
{NAME => 'info2', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', COMPRESSION => 'NONE',
MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
2 row(s) in 0.0390 seconds
- Check whether a table exists: `exists`
hbase(main):006:0> exists 'user'
Table user does exist
0 row(s) in 0.0130 seconds
- Drop a table: `drop` (fails while the table is still enabled)
hbase(main):007:0> drop 'user'
ERROR: Table user is enabled. Disable it first.
Here is some help for this command:
Drop the named table. Table must first be disabled:
hbase> drop 't1'
hbase> drop 'ns1:t1'
- Disable the table so that it can be dropped: `disable`, `is_enabled`, `is_disabled`
hbase(main):008:0> disable 'user'
0 row(s) in 2.2900 seconds
hbase(main):009:0> is_enabled 'user'
false
0 row(s) in 0.0070 seconds
hbase(main):010:0> is_disabled 'user'
true
0 row(s) in 0.0240 seconds
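- Insert and read data: `put` / `get`
  The table must be enabled again before these commands will run; that step is not shown in the transcript. Writing `info1:name` a second time creates a newer version, and `scan` and `get` return only the newest version by default.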
hbase(main):015:0> put 'user','1234','info1:name','zhangsan'
0 row(s) in 0.0620 seconds
hbase(main):016:0> scan 'user'
ROW COLUMN+CELL
1234 column=info1:name, timestamp=1554808804837, value=zhangsan
1 row(s) in 0.0260 seconds
hbase(main):017:0> put 'user','1234','info1:name','zhangsan1'
0 row(s) in 0.0100 seconds
hbase(main):018:0> scan 'user'
ROW COLUMN+CELL
1234 column=info1:name, timestamp=1554808822676, value=zhangsan1
1 row(s) in 0.0080 seconds
hbase(main):019:0> put 'user','1234','info2:name','zhangsan1'
0 row(s) in 0.0090 seconds
hbase(main):020:0> put 'user','1234','info1:age','23'
0 row(s) in 0.1280 seconds
hbase(main):023:0> get 'user','1234'
COLUMN CELL
info1:age timestamp=1554808862052, value=23
info1:name timestamp=1554808822676, value=zhangsan1
info2:name timestamp=1554808839655, value=zhangsan1
3 row(s) in 0.0280 seconds
hbase(main):025:0> get 'user','1234','info1'
COLUMN CELL
info1:age timestamp=1554808862052, value=23
info1:name timestamp=1554808822676, value=zhangsan1
2 row(s) in 0.0060 seconds
hbase(main):026:0> get 'user','1234','info1:name'
COLUMN CELL
info1:name timestamp=1554808822676, value=zhangsan1
1 row(s) in 0.0050 seconds
- Scan a table: `scan`
hbase(main):021:0> scan 'user'
ROW COLUMN+CELL
1234 column=info1:age, timestamp=1554808862052, value=23
1234 column=info1:name, timestamp=1554808822676, value=zhangsan1
1234 column=info2:name, timestamp=1554808839655, value=zhangsan1
1 row(s) in 0.0300 seconds
- Count rows: `count`
hbase(main):024:0> count 'user'
1 row(s) in 0.0210 seconds
=> 1
- Delete a column: `delete`
hbase(main):027:0> delete 'user','1234','info2:name'
0 row(s) in 0.0320 seconds
hbase(main):028:0> scan 'user'
ROW COLUMN+CELL
1234 column=info1:age, timestamp=1554808862052, value=23
1234 column=info1:name, timestamp=1554808822676, value=zhangsan1
1 row(s) in 0.0140 seconds
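To tie the `put`, `get`, and `delete` steps together, here is a minimal in-memory sketch in Python that replays the same operations. `ToyTable` is a hypothetical stand-in, not an HBase client: it keeps only the newest value per column and ignores timestamps, regions, and persistence.

```python
class ToyTable:
    """In-memory stand-in for one table; keeps only the newest value per column."""

    def __init__(self, *families):
        self.families = set(families)
        self.rows = {}  # row key -> {"family:qualifier": value}

    def put(self, row_key, column, value):
        family = column.split(":", 1)[0]
        if family not in self.families:
            # Column families must exist up front; qualifiers are created on the fly.
            raise KeyError("unknown column family: " + family)
        self.rows.setdefault(row_key, {})[column] = value

    def get(self, row_key, column=None):
        row = self.rows.get(row_key, {})
        if column is None:
            return dict(row)  # whole row
        if ":" in column:
            return {k: v for k, v in row.items() if k == column}  # one column
        return {k: v for k, v in row.items() if k.startswith(column + ":")}  # one family

    def delete(self, row_key, column):
        self.rows.get(row_key, {}).pop(column, None)

user = ToyTable("info1", "info2")
user.put("1234", "info1:name", "zhangsan")
user.put("1234", "info1:name", "zhangsan1")  # newer value replaces the visible one
user.put("1234", "info2:name", "zhangsan1")
user.put("1234", "info1:age", "23")
print(user.get("1234", "info1"))
user.delete("1234", "info2:name")
print(sorted(user.get("1234")))
```

The model also shows why `put` can introduce `info1:age` without any schema change, while writing to an undeclared family is rejected.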