Installing HBase
Downloading HBase
HBase download page: Apache Downloads
Configuring HBase
ZBMAC-2f32839f6:server yangyanping$ tar zxvf hbase-2.3.5-bin.tar.gz
ZBMAC-2f32839f6:server yangyanping$ cd hbase-2.3.5
ZBMAC-2f32839f6:hbase-2.3.5 yangyanping$ ls
CHANGES.md LICENSE.txt README.txt bin docs lib
LEGAL NOTICE.txt RELEASENOTES.md conf hbase-webapps
ZBMAC-2f32839f6:hbase-2.3.5 yangyanping$ bin/start-hbase.sh
+======================================================================+
| Error: JAVA_HOME is not set |
+----------------------------------------------------------------------+
| Please download the latest Sun JDK from the Sun Java web site |
| > http://www.oracle.com/technetwork/java/javase/downloads |
| |
| HBase requires Java 1.8 or later. |
+======================================================================+
ZBMAC-2f32839f6:hbase-2.3.5 yangyanping$ /usr/libexec/java_home -V
Matching Java Virtual Machines (1):
1.8.0_181, x86_64: "Java SE 8" /Library/Java/JavaVirtualMachines/jdk1.8.0_181.jdk/Contents/Home
/Library/Java/JavaVirtualMachines/jdk1.8.0_181.jdk/Contents/Home
Edit bin/hbase-config.sh and export JAVA_HOME there (conf/hbase-env.sh is the more usual place for this setting):
export JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.8.0_181.jdk/Contents/Home
Starting HBase
ZBMAC-2f32839f6:hbase-2.3.5 yangyanping$ bin/start-hbase.sh
running master, logging to /Users/yangyanping/Downloads/server/hbase-2.3.5/bin/../logs/hbase-yangyanping-master-ZBMAC-2f32839f6.out
yangyanping@ZBMac-WP2HJYDWY ~ % jps
20816 HMaster
3200 Launcher
2596
24652 Jps
HBase directory structure
yangyanping@ZBMac-WP2HJYDWY hbase % tree
.
├── CHANGES.md
├── LEGAL
├── LICENSE.txt
├── NOTICE.txt
├── README.txt
├── RELEASENOTES.md
├── bin
│ ├── chaos-daemon.sh
│ ├── considerAsDead.sh
│ ├── draining_servers.rb
│ ├── get-active-master.rb
│ ├── graceful_stop.sh
│ ├── hbase
│ ├── hbase-cleanup.sh
│ ├── hbase-common.sh
│ ├── hbase-config.cmd
│ ├── hbase-config.sh
│ ├── hbase-daemon.sh
│ ├── hbase-daemons.sh
│ ├── hbase-jruby
│ ├── hbase.cmd
│ ├── hirb.rb
│ ├── local-master-backup.sh
│ ├── local-regionservers.sh
│ ├── master-backup.sh
│ ├── region_mover.rb
│ ├── region_status.rb
│ ├── regionservers.sh
│ ├── replication
│ │ └── copy_tables_desc.rb
│ ├── rolling-restart.sh
│ ├── shutdown_regionserver.rb
│ ├── start-hbase.cmd
│ ├── start-hbase.sh
│ ├── stop-hbase.cmd
│ ├── stop-hbase.sh
│ ├── test
│ │ └── process_based_cluster.sh
│ └── zookeepers.sh
├── conf
│ ├── hadoop-metrics2-hbase.properties
│ ├── hbase-env.cmd
│ ├── hbase-env.sh
│ ├── hbase-policy.xml
│ ├── hbase-site.xml
│ ├── log4j-hbtop.properties
│ ├── log4j.properties
│ └── regionservers
├── docs
├── lib
├── logs
│ ├── SecurityAuth.audit
│ ├── hbase-yangyanping-master-ZBMac-WP2HJYDWY.log
│ └── hbase-yangyanping-master-ZBMac-WP2HJYDWY.out
└── tmp
├── hbase
│ ├── MasterData
│ │ ├── WALs
│ │ │ └── 192.168.1.102,16000,1654267930582
│ │ │ └── 192.168.1.102%2C16000%2C1654267930582.1654267933426
│ │ ├── archive
│ │ ├── data
│ │ │ └── master
│ │ │ └── store
│ │ │ └── 1595e783b53d99cd5eef43b6debb2682
│ │ │ ├── proc
│ │ │ └── recovered.edits
│ │ │ └── 1.seqid
│ │ └── oldWALs
│ ├── WALs
│ │ └── 192.168.1.102,16020,1654267932601
│ │ ├── 192.168.1.102%2C16020%2C1654267932601.1654267937735
│ │ └── 192.168.1.102%2C16020%2C1654267932601.meta.1654267934970.meta
│ ├── archive
│ ├── corrupt
│ ├── data
│ │ ├── default
│ │ └── hbase
│ │ ├── meta
│ │ │ └── 1588230740
│ │ │ ├── info
│ │ │ ├── recovered.edits
│ │ │ │ └── 1.seqid
│ │ │ ├── rep_barrier
│ │ │ └── table
│ │ └── namespace
│ │ └── 356d99eede0feaa0e99e593028f609a0
│ │ ├── info
│ │ └── recovered.edits
│ │ └── 1.seqid
│ ├── hbase.id
│ ├── hbase.version
│ ├── mobdir
│ ├── oldWALs
│ └── staging
└── zookeeper
└── zookeeper_0
└── version-2
├── log.1
└── snapshot.0
332 directories, 2394 files
hbase shell
ZBMAC-2f32839f6:bin yangyanping$ ./hbase shell
2021-06-07 18:09:54,843 WARN [main] util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
HBase Shell
Use "help" to get list of supported commands.
Use "exit" to quit this interactive shell.
For Reference, please visit: http://hbase.apache.org/2.0/book.html#shell
Version 2.3.5, rfd3fdc08d1cd43eb3432a1a70d31c3aece6ecabe, Thu Mar 25 20:50:15 UTC 2021
Took 0.0010 seconds
hbase(main):001:0> status
1 active master, 0 backup masters, 1 servers, 0 dead, 2.0000 average load
Took 0.5337 seconds
WebUI
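URL: http://localhost:16010/master-status
Shell commands
Creating a table
Syntax: create <table>, {NAME => <family>, VERSIONS => <VERSIONS>}
HBase creates tables with the create command, which takes the table name and one or more column family names. Take the student information table below as an example; the command that creates it follows the table.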
Row key | info:name | info:gender | info:class | course:chines | course:english | course:math | Timestamp |
---|---|---|---|---|---|---|---|
0001 | Tom | man | | 80 | 90 | 85 | T2 |
0002 | Amy | | 01 | 95 | | 89 | T1 |
0003 | Allen | man | 02 | 90 | | 88 | T1 |
hbase:001:0> create 'student','info','course'
Created table student
Took 1.6881 seconds
=> Hbase::Table - student
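This command creates a table named student with two column families, info and course. In the HBase shell, string arguments must be quoted (single quotes are the usual choice) and names are case-sensitive, so student and Student refer to two different tables.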
exists
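Once the table has been created, use the exists command to check whether it exists, or the list command to list every table in the database, as shown below.
Command format: exists '<table>'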
hbase:002:0> exists 'student'
Table student does exist
Took 0.2100 seconds
=> true
Command format: list
hbase:005:0> list
TABLE
student
1 row(s)
Took 0.0112 seconds
=> ["student"]
Viewing basic table information
Use the describe command to view the column family settings of a table, as shown below.
Command format: describe '<table>'
hbase:004:0> describe 'student'
Table student is ENABLED
student
COLUMN FAMILIES DESCRIPTION
{NAME => 'course', BLOOMFILTER => 'ROW', IN_MEMORY => 'false', VERSIONS => '1', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', COMPRESSION => 'NONE', TTL => 'FOREVER', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
{NAME => 'info', BLOOMFILTER => 'ROW', IN_MEMORY => 'false', VERSIONS => '1', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', COMPRESSION => 'NONE', TTL => 'FOREVER', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
2 row(s)
Quota is disabled
Took 0.1486 seconds
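Modifying a column family
Column family parameters, such as the number of versions, can be changed after the table is created. In the student table above, the info column family has VERSIONS set to 1; to keep the three most recent versions instead, run:
Command format: alter '<table>', {NAME => <family>, VERSIONS => <VERSIONS>}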
hbase:006:0> alter 'student',{NAME =>'info',VERSIONS => 3}
Updating all regions with the new schema...
1/1 regions updated.
Done.
Took 2.0905 seconds
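Parameters of several column families can be changed in one command, in a form similar to create.
Note that when you change the properties of a column family that already holds data, HBase has to apply the change to all of the data in that family, so the operation can take a long time on large tables.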
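To make the effect of VERSIONS more concrete, here is a toy Python model. It is not HBase code: the VersionedCell class and its methods are hypothetical names invented for this sketch, and it assumes max_versions is at least 1. It keeps only the newest versions of a single cell, which is roughly what a column family with VERSIONS => 3 retains.

```python
class VersionedCell:
    """Toy model of one HBase cell that retains a bounded number of versions."""

    def __init__(self, max_versions):
        self.max_versions = max_versions  # assumed to be >= 1
        self.versions = {}  # timestamp -> value

    def put(self, timestamp, value):
        # A put with an existing timestamp overwrites that version.
        self.versions[timestamp] = value
        # Drop the oldest versions beyond the configured limit.
        for ts in sorted(self.versions)[:-self.max_versions]:
            del self.versions[ts]

    def get(self, versions=1):
        # Newest versions first.
        newest = sorted(self.versions.items(), reverse=True)
        return newest[:versions]


cell = VersionedCell(max_versions=3)
for ts, value in [(1, "01"), (2, "02"), (3, "03"), (4, "04")]:
    cell.put(ts, value)

latest = cell.get()             # only the newest version
history = cell.get(versions=3)  # up to three retained versions
print(latest, history)
```

After the fourth put, the version written at timestamp 1 is no longer available, while the three newer ones are.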
Adding a column family
To add a new column family named location to the student table, run:
hbase:007:0> alter 'student','location'
Updating all regions with the new schema...
1/1 regions updated.
Done.
Took 1.8666 seconds
Deleting a column family
To remove an existing column family, either of the following two commands works:
alter 'student','delete' => 'location'
alter 'student', { NAME => 'location', METHOD => 'delete' }
hbase:008:0> alter 'student','delete' => 'location'
Updating all regions with the new schema...
1/1 regions updated.
Done.
Took 1.8798 seconds
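An HBase table must contain at least one column family, so the last remaining column family of a table cannot be deleted.
Dropping a table
HBase drops tables with the drop command, but the table must first be disabled with disable. For example, the complete procedure for dropping the student table is: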
hbase:009:0> disable 'student'
Took 0.7296 seconds
hbase:010:0> drop 'student'
Took 0.4195 seconds
hbase:011:0> exists 'student'
Table student does not exist
Took 0.0137 seconds
=> false
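After disabling a table, you can check that it is disabled with is_disabled. If you only want to remove all data from a table, use truncate instead, which is equivalent to disabling the table, dropping it, and recreating it with the same schema:
truncate '<table>'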
hbase:013:0> truncate 'student'
Truncating 'student' table (it may take a while):
Disabling table...
Truncating table...
Took 1.6392 seconds
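The rules described above (disable before drop, and truncate as disable, drop, and recreate) can be sketched as a small Python model. This is only an illustration of the lifecycle, not the HBase API; ToyCatalog and all of its methods are hypothetical names.

```python
class ToyCatalog:
    """Toy stand-in for the table lifecycle; not the HBase API."""

    def __init__(self):
        # name -> {"families": [...], "enabled": bool, "rows": {...}}
        self.tables = {}

    def create(self, name, *families):
        if not families:
            raise ValueError("a table needs at least one column family")
        self.tables[name] = {"families": list(families), "enabled": True, "rows": {}}

    def disable(self, name):
        self.tables[name]["enabled"] = False

    def drop(self, name):
        if self.tables[name]["enabled"]:
            raise RuntimeError(f"table {name} is enabled; disable it first")
        del self.tables[name]

    def truncate(self, name):
        # Remember the schema, then disable, drop, and recreate the table.
        families = self.tables[name]["families"]
        self.disable(name)
        self.drop(name)
        self.create(name, *families)


catalog = ToyCatalog()
catalog.create("student", "info", "course")
catalog.tables["student"]["rows"]["0001"] = {"info:name": "Tom"}

try:
    catalog.drop("student")  # rejected: the table is still enabled
    drop_rejected = False
except RuntimeError:
    drop_rejected = True

catalog.truncate("student")  # data gone, schema kept
print(drop_rejected, catalog.tables["student"])
```

In this model the truncated table comes back enabled and empty, with the same two column families.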
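Inserting data
HBase inserts data with the put command. A put adds a new row to the table or overwrites data in an existing row. For the table structure above, a single value is inserted like this: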
hbase:014:0> put 'student','0001','info:name','Tom',1
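In this command:
- The first argument, student, is the table name.
- The second argument, 0001, is the row key, given as a string.
- The third argument, info:name, is the column family and column name separated by a colon. The column family must already exist, otherwise HBase reports an error; the column name is defined on the fly, so the columns within a family can be extended freely.
- The fourth argument, Tom, is the cell value. HBase stores every value as a byte array, and the shell writes the value's string form.
- The last argument, 1, is the timestamp. If it is omitted, the system uses the current time as the timestamp.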
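The student table has two column families, info and course. info contains three columns (name, gender, and class) and course contains three columns (chines, english, and math). Note that the shell's put command writes one column at a time; it cannot write several columns in a single call.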
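The argument rules for put can be illustrated with a toy Python function. It is a sketch of the semantics only, not the HBase client: toy_put, the nested dict layout, and the info:nickname and grades:math columns are all hypothetical.

```python
def toy_put(table, families, row, column, value, timestamp):
    """Store one cell in a nested dict; a toy model, not the HBase client."""
    family, _, qualifier = column.partition(":")
    # The column family must already exist; the qualifier may be anything.
    if family not in families:
        raise KeyError(f"unknown column family: {family}")
    cells = table.setdefault(row, {})
    # Values end up as bytes, whatever type was passed in.
    cells[(family, qualifier, timestamp)] = str(value).encode("utf-8")


families = {"info", "course"}
table = {}
toy_put(table, families, "0001", "info:name", "Tom", 1)
toy_put(table, families, "0001", "info:nickname", "Tommy", 1)  # hypothetical new column

try:
    toy_put(table, families, "0001", "grades:math", 85, 1)  # hypothetical family
    rejected = False
except KeyError:
    rejected = True

print(table["0001"], rejected)
```

The new column under info is accepted without any schema change, while the write to an undeclared family is rejected.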
hbase:002:0> put 'student','0001','info:name','Tom',1
Took 0.0814 seconds
hbase:003:0> put 'student','0001','info:gender','man',1
Took 0.0308 seconds
hbase:004:0> put 'student','0001','info:class','01'
Took 0.0262 seconds
hbase:006:0> put 'student','0001','course:chines',80
Took 0.0151 seconds
hbase:007:0> put 'student','0002','info:name','杨艳平'
Took 0.0195 seconds
hbase:008:0> put 'student','0002','info:gender','man'
Took 0.0660 seconds
hbase:009:0> put 'student','0002','info:class',02
Took 0.0085 seconds
hbase:010:0> put 'student','0002','course:math',100
Took 0.0149 seconds
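That completes the table contents. The layout is flexible: new columns can be added to a column family at any time. If a column family has no named columns, the colon after the family name is optional. Note that the unquoted 02 in the put above is parsed as a number, so the stored value is 2 rather than 02; quote values to preserve leading zeros.
To set the timestamp manually when adding data, append it as the last argument of put. The timestamp is a long, so it is not quoted: put 'user','001','info:name','tom',1478053832459
Viewing all data in a table
The HBase scan command reads the whole table; in its simplest form it takes only the table name.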
hbase:012:0> scan 'student'
ROW COLUMN+CELL
0001 column=course:chines, timestamp=2022-06-07T00:10:46.843, value=80
0001 column=info:class, timestamp=2022-06-07T00:09:54.431, value=01
0001 column=info:gender, timestamp=1970-01-01T08:00:00.001, value=man
0001 column=info:name, timestamp=1970-01-01T08:00:00.001, value=Tom
0002 column=course:math, timestamp=2022-06-07T00:12:49.227, value=100
0002 column=info:class, timestamp=2022-06-07T00:12:34.077, value=2
0002 column=info:gender, timestamp=2022-06-07T00:12:17.515, value=man
0002 column=info:name, timestamp=2022-06-07T00:11:14.077, value=\xE6\x9D\xA8\xE8\x89\xB3\xE5\xB9\xB3
2 row(s)
Took 0.0683 seconds
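The scan output above shows timestamp=1970-01-01T08:00:00.001 for the two cells that were written with the explicit timestamp 1. HBase timestamps are milliseconds since the Unix epoch, and the shell prints them in local time. The Python sketch below reproduces that rendering; the function name is made up for this example, and the UTC+8 offset is an assumption inferred from the 08:00:00 in the output.

```python
from datetime import datetime, timedelta, timezone


def format_hbase_timestamp(millis, utc_offset_hours=8):
    """Render epoch milliseconds the way the scan output above displays them."""
    local_zone = timezone(timedelta(hours=utc_offset_hours))
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    local = (epoch + timedelta(milliseconds=millis)).astimezone(local_zone)
    # Drop the offset suffix and keep millisecond precision.
    return local.replace(tzinfo=None).isoformat(timespec="milliseconds")


explicit = format_hbase_timestamp(1)
print(explicit)  # 1970-01-01T08:00:00.001
```

This is why a small hand-picked timestamp such as 1 shows up as a date in 1970, while cells written without a timestamp show the time of the put.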
Getting data from a table
- Get all data for a single row key
hbase:014:0> get 'student','0002'
COLUMN CELL
course:math timestamp=2022-06-07T00:12:49.227, value=100
info:class timestamp=2022-06-07T00:12:34.077, value=2
info:gender timestamp=2022-06-07T00:12:17.515, value=man
info:name timestamp=2022-06-07T00:11:14.077, value=\xE6\x9D\xA8\xE8\x89\xB3\xE5\xB9\xB3
1 row(s)
Took 0.0316 seconds
- Get all columns of a given column family in a row
hbase:015:0> get 'student','0002','info'
COLUMN CELL
info:class timestamp=2022-06-07T00:12:34.077, value=2
info:gender timestamp=2022-06-07T00:12:17.515, value=man
info:name timestamp=2022-06-07T00:11:14.077, value=\xE6\x9D\xA8\xE8\x89\xB3\xE5\xB9\xB3
1 row(s)
Took 0.0638 seconds
- Get a single column of a row
hbase:016:0> get 'student','0002','info:name'
COLUMN CELL
info:name timestamp=2022-06-07T00:11:14.077, value=\xE6\x9D\xA8\xE8\x89\xB3\xE5\xB9\xB3
1 row(s)
Took 0.0933 seconds
- Count the rows in a table
hbase:018:0> count 'student'
2 row(s)
Took 0.0623 seconds
=> 2
- Displaying Chinese text stored in HBase
hbase:021:0> get 'student','0002',{FORMATTER => 'toString'}
COLUMN CELL
course:math timestamp=2022-06-07T00:12:49.227, value=100
info:class timestamp=2022-06-07T00:12:34.077, value=2
info:gender timestamp=2022-06-07T00:12:17.515, value=man
info:name timestamp=2022-06-07T00:11:14.077, value=杨艳平
1 row(s)
Took 0.1044 seconds
hbase:022:0> get 'student','0002','info:name:toString'
COLUMN CELL
info:name timestamp=2022-06-07T00:11:14.077, value=杨艳平
1 row(s)
Took 0.0333 seconds
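The \xE6\x9D\xA8... sequences printed without a formatter are simply the UTF-8 bytes of the stored name, shown in escaped form. The following Python sketch decodes such output by hand; decode_shell_value is a hypothetical helper, and it assumes the value was written as UTF-8 and contains no literal backslashes of its own.

```python
import re


def decode_shell_value(escaped):
    """Turn \\xNN-escaped shell output back into text, assuming UTF-8."""
    raw = re.sub(
        rb"\\x([0-9A-Fa-f]{2})",
        lambda m: bytes([int(m.group(1), 16)]),  # one escape -> one byte
        escaped.encode("ascii"),
    )
    return raw.decode("utf-8")


name = decode_shell_value(r"\xE6\x9D\xA8\xE8\x89\xB3\xE5\xB9\xB3")
print(name)  # 杨艳平
```

The toString formatter in the shell performs the equivalent conversion for you, so a helper like this is only needed when post-processing raw output.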
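Deleting data
The HBase delete command removes a cell or a set of cells from a table. Its syntax resembles put: the table name, row key, and column family must be given, while the column name and timestamp are optional.
- For example, the following command deletes the data in the info column family of row 0001 in the student table: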
hbase:023:0> delete 'student','0001','info'
Took 0.0741 seconds
Updating data
Syntax: put '<table>', '<row key>', '<family:column>', '<new value>'
hbase:024:0> put 'student','0002','info:class','03'
Took 0.0185 seconds
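Filters
In HBase, both get and scan accept filters that restrict the output, much like a WHERE clause in SQL.
Use the show_filters command to list the filter types that the current HBase supports, as shown below.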
hbase:002:0> show_filters
DependentColumnFilter
KeyOnlyFilter
ColumnCountGetFilter
SingleColumnValueFilter
PrefixFilter
SingleColumnValueExcludeFilter
FirstKeyOnlyFilter
ColumnRangeFilter
ColumnValueFilter
TimestampsFilter
FamilyFilter
QualifierFilter
ColumnPrefixFilter
RowFilter
MultipleColumnPrefixFilter
InclusiveStopFilter
PageFilter
ValueFilter
ColumnPaginationFilter
Took 0.0706 seconds
=> #<Java::JavaUtil::HashMap::KeySet:0x66d3b881>
LIMIT
hbase:003:0> scan 'student',{LIMIT => 2,FORMATTER => 'toString'}
ROW COLUMN+CELL
0001 column=course:chines, timestamp=2022-06-07T00:10:46.843, value=80
0001 column=info:class, timestamp=2022-06-07T00:09:54.431, value=01
0001 column=info:gender, timestamp=1970-01-01T08:00:00.001, value=man
0001 column=info:name, timestamp=1970-01-01T08:00:00.001, value=Tom
0002 column=course:math, timestamp=2022-06-07T00:12:49.227, value=100
0002 column=info:class, timestamp=2022-06-07T00:28:43.673, value=03
0002 column=info:gender, timestamp=2022-06-07T00:12:17.515, value=man
0002 column=info:name, timestamp=2022-06-07T00:11:14.077, value=杨艳平
2 row(s)
Took 0.0919 seconds
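Row key filter
RowFilter combines a comparator with a comparison operator to compare and filter row keys. For example, to match row keys greater than 0001, use the binary comparator; to match row keys that contain 0001, use the substring comparator (binaryprefix matches keys that start with a value). Note that substring supports only the = and != operators, not greater-than or less-than.
A filter command of this kind and its output are shown below.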
hbase:001:0> scan 'student',FILTER =>"RowFilter(=,'substring:0002')"
ROW COLUMN+CELL
0002 column=course:math, timestamp=2022-06-07T00:12:49.227, value=100
0002 column=info:class, timestamp=2022-06-07T00:28:43.673, value=03
0002 column=info:gender, timestamp=2022-06-07T00:12:17.515, value=man
0002 column=info:name, timestamp=2022-06-07T00:11:14.077, value=\xE6\x9D\xA8\xE8\x89\xB3\xE5\xB9\xB3
1 row(s)
Took 0.6226 seconds
Fixing garbled Chinese output
hbase:002:0> scan 'student',{FILTER =>"RowFilter(=,'substring:0002')",FORMATTER => 'toString'}
ROW COLUMN+CELL
0002 column=course:math, timestamp=2022-06-07T00:12:49.227, value=100
0002 column=info:class, timestamp=2022-06-07T00:28:43.673, value=03
0002 column=info:gender, timestamp=2022-06-07T00:12:17.515, value=man
0002 column=info:name, timestamp=2022-06-07T00:11:14.077, value=杨艳平
1 row(s)
Took 0.0838 seconds
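To show how the comparators differ, here is a toy Python model of RowFilter applied to a list of row keys. It is a sketch of the matching rules only, not HBase code; toy_row_filter is a hypothetical name, and the substring comparator is modeled as case-insensitive, which is my understanding of how SubstringComparator behaves.

```python
import operator

OPERATORS = {
    "<": operator.lt, "<=": operator.le, "=": operator.eq,
    "!=": operator.ne, ">=": operator.ge, ">": operator.gt,
}


def toy_row_filter(row_keys, op, comparator):
    """Return the row keys that RowFilter(op, comparator) would keep (toy model)."""
    kind, _, operand = comparator.partition(":")
    target = operand.encode("utf-8")
    kept = []
    for key in row_keys:
        raw = key.encode("utf-8")
        if kind == "binary":
            # Lexicographic comparison of the whole key.
            match = OPERATORS[op](raw, target)
        elif kind == "binaryprefix":
            # Compare only the leading len(target) bytes of the key.
            match = OPERATORS[op](raw[:len(target)], target)
        elif kind == "substring":
            if op not in ("=", "!="):
                raise ValueError("substring supports only = and !=")
            contains = operand.lower() in key.lower()
            match = contains if op == "=" else not contains
        else:
            raise ValueError(f"unsupported comparator: {kind}")
        if match:
            kept.append(key)
    return kept


rows = ["0001", "0002", "0003"]
greater = toy_row_filter(rows, ">", "binary:0001")
containing = toy_row_filter(rows, "=", "substring:0002")
prefixed = toy_row_filter(rows, "=", "binaryprefix:000")
print(greater, containing, prefixed)
```

The second call mirrors the RowFilter(=,'substring:0002') scan above and keeps only row 0002.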
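Value filters
HBase also has filters that operate on cell values, known as value filters, listed in the table below.
Value filter | Description | Example |
---|---|---|
ValueFilter | Value filter; returns the key-value pairs whose value satisfies the condition | scan 'student', FILTER => "ValueFilter(=,'substring:curry')"; likewise get 'student', '0001', FILTER => "ValueFilter(=,'substring:curry')" |
SingleColumnValueFilter | Value filter that compares against a specified column family and column | scan 'student', FILTER => "SingleColumnValueFilter('info', 'gender', =,'binary:man')" |
SingleColumnValueExcludeFilter | Same comparison as SingleColumnValueFilter, but the tested column is excluded from the returned rows | scan 'student', FILTER => "SingleColumnValueExcludeFilter('info', 'gender', =, 'binary:man')" |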
hbase:003:0> scan 'student',{FILTER => "SingleColumnValueFilter('info','gender',=,'binary:man')"}
ROW COLUMN+CELL
0001 column=course:chines, timestamp=2022-06-07T00:10:46.843, value=80
0001 column=info:class, timestamp=2022-06-07T00:09:54.431, value=01
0001 column=info:gender, timestamp=1970-01-01T08:00:00.001, value=man
0001 column=info:name, timestamp=1970-01-01T08:00:00.001, value=Tom
0002 column=course:math, timestamp=2022-06-07T00:12:49.227, value=100
0002 column=info:class, timestamp=2022-06-07T00:28:43.673, value=03
0002 column=info:gender, timestamp=2022-06-07T00:12:17.515, value=man
0002 column=info:name, timestamp=2022-06-07T00:11:14.077, value=\xE6\x9D\xA8\xE8\x89\xB3\xE5\xB9\xB3
2 row(s)
Took 0.0785 seconds
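Because both rows in the table have gender set to man, the scan above cannot show what the filter rejects. The toy Python model below fills that gap. It is a sketch, not HBase code: the function name is made up, rows 0003 and 0004 are hypothetical additions, and it assumes the filter's default setting, under which rows that lack the tested column are still returned.

```python
def toy_single_column_value_filter(rows, column, expected, exclude_tested=False):
    """Toy model of SingleColumnValueFilter and its Exclude variant."""
    result = {}
    for row_key, cells in rows.items():
        # A row is dropped only if it has the column and the value differs.
        # Rows without the column pass (assumed default behaviour).
        if column in cells and cells[column] != expected:
            continue
        kept = dict(cells)
        if exclude_tested:
            # The Exclude variant omits the tested column from the output.
            kept.pop(column, None)
        result[row_key] = kept
    return result


rows = {
    "0001": {"info:name": "Tom", "info:gender": "man"},
    "0002": {"info:name": "杨艳平", "info:gender": "man"},
    "0003": {"info:name": "Amy", "info:gender": "woman"},  # hypothetical row
    "0004": {"info:name": "Allen"},                        # hypothetical row, no gender
}

matched = toy_single_column_value_filter(rows, "info:gender", "man")
excluded = toy_single_column_value_filter(rows, "info:gender", "man", exclude_tested=True)
print(sorted(matched), excluded["0001"])
```

Whole rows are returned, not just the matching cell, which is why the scan above lists the course and info columns of every matching row.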