Installing and Using HBase
Overview
HBase (Hadoop Database) is a highly reliable, high-performance, scalable distributed storage system designed after Google's Bigtable paper.
CAP
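The CAP principle, also known as the CAP theorem, concerns three properties of a distributed system: Consistency, Availability, and Partition tolerance. It states that at most two of these three properties can be guaranteed at the same time; all three cannot be satisfied together.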
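Column-Oriented Storage
The concept is easiest to explain starting from an RDBMS, where the smallest unit of operation is a row.
Even when a query or update touches a single field of a table, an RDBMS loads the entire row to perform the change.
As a result, an RDBMS is not very efficient when working with a single value, because it performs unnecessary I/O.
A column-oriented database supports sparse storage: columns that have no value take up no space.
Fields with similar I/O characteristics are grouped together into one column family.
Under the hood, HBase stores and indexes each column family independently.
HBase sort order: rowkey > column family > column qualifier > timestamp (row keys, families, and qualifiers ascending; timestamps newest first)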
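The sort order above can be sketched in plain Java. This is only an illustrative model, not HBase code: `CellOrderSketch` and `CellKey` are hypothetical names, and the sketch compares Java strings where HBase compares raw bytes (the two agree for the ASCII keys used here).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Illustrative model of how HBase orders cells; not the real KeyValue class. */
class CellOrderSketch {

    /** Hypothetical stand-in for the coordinates of one cell. */
    static final class CellKey {
        final String row;
        final String family;
        final String qualifier;
        final long timestamp;

        CellKey(String row, String family, String qualifier, long timestamp) {
            this.row = row;
            this.family = family;
            this.qualifier = qualifier;
            this.timestamp = timestamp;
        }

        @Override
        public String toString() {
            return row + "/" + family + ":" + qualifier + "@" + timestamp;
        }
    }

    // Row key, family, and qualifier ascending; timestamp descending (newest first).
    static final Comparator<CellKey> ORDER = (a, b) -> {
        int c = a.row.compareTo(b.row);
        if (c != 0) return c;
        c = a.family.compareTo(b.family);
        if (c != 0) return c;
        c = a.qualifier.compareTo(b.qualifier);
        if (c != 0) return c;
        return Long.compare(b.timestamp, a.timestamp);
    };

    /** Sorts a small unsorted sample and returns the cells as printable strings. */
    static List<String> sortedDemo() {
        List<CellKey> cells = new ArrayList<CellKey>(Arrays.asList(
                new CellKey("2", "cf1", "name", 100L),
                new CellKey("1", "cf2", "age", 100L),
                new CellKey("1", "cf1", "name", 100L),
                new CellKey("1", "cf1", "name", 200L),
                new CellKey("1", "cf1", "age", 100L)));
        cells.sort(ORDER);
        List<String> out = new ArrayList<String>();
        for (CellKey k : cells) {
            out.add(k.toString());
        }
        return out;
    }

    public static void main(String[] args) {
        for (String line : sortedDemo()) {
            System.out.println(line);
        }
    }
}
```

Because the newest timestamp sorts first within a column, a read that wants only the latest version can stop at the first matching cell.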
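Official site: http://hbase.apache.org/
HBase is a distributed, column-oriented storage system built on top of HDFS. Use HBase when you need real-time reads and writes with random access to very large datasets.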
HBase Environment Setup: Single Node
Prerequisites
- Hadoop
- ZooKeeper
Installation and Configuration
[root@HadoopNode00 ~]# tar -zxvf hbase-1.2.4-bin.tar.gz -C /home/hbase/ # extract into the target directory
[root@HadoopNode00 ~]# vi .bashrc # configure the HBase environment variables
export HBASE_HOME=/home/hbase/hbase-1.2.4
export HBASE_MANAGES_ZK=false # use an external ZooKeeper rather than the one HBase manages
export PATH=$PATH:$HBASE_HOME/bin
[root@HadoopNode00 ~]# source .bashrc # apply the environment variables
[root@HadoopNode00 ~]# vi /home/hbase/hbase-1.2.4/conf/hbase-site.xml
<configuration>
<property>
<name>hbase.rootdir</name>
<value>hdfs://HadoopNode00:9000/hbase</value>
</property>
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
</property>
<property>
<name>hbase.zookeeper.quorum</name>
<value>HadoopNode00</value>
</property>
<property>
<name>hbase.zookeeper.property.clientPort</name>
<value>2181</value>
</property>
</configuration>
[root@HadoopNode00 ~]# vi /home/hbase/hbase-1.2.4/conf/regionservers
HadoopNode00
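As a quick sanity check of the hbase-site.xml settings above, the property list can be read with the JDK's built-in XML parser. This is only a sketch: `SiteXmlSketch` and `parse` are hypothetical names, and HBase itself loads this file through Hadoop's configuration classes rather than through code like this.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical helper that reads <property><name/><value/></property> entries. */
class SiteXmlSketch {

    /** Returns the name/value pairs in document order. */
    static Map<String, String> parse(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Map<String, String> props = new LinkedHashMap<String, String>();
            NodeList nodes = doc.getElementsByTagName("property");
            for (int i = 0; i < nodes.getLength(); i++) {
                Element p = (Element) nodes.item(i);
                String name = p.getElementsByTagName("name").item(0).getTextContent().trim();
                String value = p.getElementsByTagName("value").item(0).getTextContent().trim();
                props.put(name, value);
            }
            return props;
        } catch (Exception e) {
            throw new IllegalArgumentException("not a valid site XML document", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<configuration>"
                + "<property><name>hbase.rootdir</name>"
                + "<value>hdfs://HadoopNode00:9000/hbase</value></property>"
                + "<property><name>hbase.zookeeper.quorum</name>"
                + "<value>HadoopNode00</value></property>"
                + "</configuration>";
        // Print each property so typos in names or values are easy to spot.
        for (Map.Entry<String, String> e : parse(xml).entrySet()) {
            System.out.println(e.getKey() + " = " + e.getValue());
        }
    }
}
```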
Startup
[root@HadoopNode00 ~]# start-dfs.sh # make sure HDFS is running first
[root@HadoopNode00 ~]# /home/zk/zookeeper-3.4.6/bin/zkServer.sh start /home/zk/zookeeper-3.4.6/conf/zk.cfg # make sure ZooKeeper is running first
[root@HadoopNode00 ~]# start-hbase.sh # then start HBase with this command
[root@HadoopNode00 ~]# jps
1699 NameNode
2052 SecondaryNameNode
40660 QuorumPeerMain
42020 Jps
1851 DataNode
41708 HRegionServer # up and healthy
18476 NodeManager
41548 HMaster # up and healthy
18239 ResourceManager
Connecting
[root@HadoopNode00 ~]# hbase shell
Web UI
http://hostname:16010
Shell Operations
Common commands
status, table_help, version, whoami
Namespace operations
alter_namespace, create_namespace, describe_namespace, drop_namespace, list_namespace, list_namespace_tables
Create a namespace
hbase(main):005:0> create_namespace 'zpark',{'user'=>'zhangsan'}
0 row(s) in 0.0560 seconds
# Note: a property value is assigned with '=>', not with an equals sign
hbase(main):006:0> list_namespace
NAMESPACE
default
hbase
zpark
3 row(s) in 0.0470 seconds
Describe a namespace
hbase(main):008:0> describe_namespace 'zpark'
DESCRIPTION
{NAME => 'zpark', user => 'zhangsan'}
1 row(s) in 0.0450 seconds
Modify a namespace
hbase(main):009:0> alter_namespace 'zpark',{METHOD => 'set' ,'user'=>'lqq'}
0 row(s) in 0.0460 seconds
hbase(main):010:0> describe_namespace 'zpark'
DESCRIPTION
{NAME => 'zpark', user => 'lqq'}
1 row(s) in 0.0020 seconds
Remove a namespace property
hbase(main):012:0> alter_namespace 'zpark',{METHOD => 'unset' ,NAME=>'user'}
0 row(s) in 0.0450 seconds
hbase(main):013:0> describe_namespace 'zpark'
DESCRIPTION
{NAME => 'zpark'}
1 row(s) in 0.0090 seconds
List all namespaces
hbase(main):015:0> list_namespace
NAMESPACE
default
hbase
zpark
3 row(s) in 0.0270 seconds
Drop a namespace
hbase(main):016:0> drop_namespace
ERROR: wrong number of arguments (0 for 1)
Here is some help for this command:
Drop the named namespace. The namespace must be empty. # note: it must not contain any tables
hbase(main):017:0> drop_namespace 'zpark'
0 row(s) in 0.0480 seconds
List the tables in a namespace
hbase(main):003:0> list_namespace_tables 'lqq'
TABLE
t_user
1 row(s) in 0.0330 seconds
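DDL (Data Definition Language)
These commands operate on tables inside a namespace (the HBase counterpart of a database).
Create a table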
hbase(main):004:0> create 'lqq:t_user','cf1','cf2'
hbase(main):003:0> list_namespace_tables 'lqq'
TABLE
t_user
1 row(s) in 0.0330 seconds
Describe a table
hbase(main):004:0> describe 'lqq:t_user'
Table lqq:t_user is ENABLED
lqq:t_user
COLUMN FAMILIES DESCRIPTION
{NAME => 'cf1', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
{NAME => 'cf2', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
2 row(s) in 0.1400 seconds
Drop a table
- Note: a table must be disabled before it can be dropped.
hbase(main):001:0> drop 'lqq:t_user'
ERROR: Table lqq:t_user is enabled. Disable it first.
Here is some help for this command:
Drop the named table. Table must first be disabled:
hbase> drop 't1'
hbase> drop 'ns1:t1'
hbase(main):002:0> disable 'lqq:t_user'
0 row(s) in 2.5940 seconds
hbase(main):003:0> drop 'lqq:t_user'
0 row(s) in 1.5920 seconds
List all tables
hbase(main):004:0> list
TABLE
0 row(s) in 0.0190 seconds
Data CRUD with DML (Data Manipulation Language)
Insert (put)
# Insert one cell into lqq:t_user: row key 1, column family cf1, qualifier name, value zs
hbase(main):012:0> put 'lqq:t_user',1,'cf1:name','zs'
0 row(s) in 0.1420 seconds
hbase(main):001:0> t = get_table 'lqq:t_user' # keep a reference to the table
0 row(s) in 0.0330 seconds
=> Hbase::Table - lqq:t_user
# Insert one cell into lqq:t_user: row key 1, column family cf1, qualifier sex, value true
hbase(main):002:0> t.put 1,'cf1:sex','true'
0 row(s) in 0.3510 seconds
# Insert one cell into lqq:t_user: row key 1, column family cf1, qualifier age, value 18
hbase(main):003:0> t.put 1,'cf1:age',18
0 row(s) in 0.0210 seconds
hbase(main):004:0> t.get 1
COLUMN CELL
cf1:age timestamp=1572888118460, value=18
cf1:name timestamp=1572887983788, value=zs
cf1:sex timestamp=1572888095574, value=true
3 row(s) in 0.0500 seconds
Update
# Create a table that keeps multiple versions per cell
hbase(main):005:0> create 'lqq:t_user',{NAME=>'cf1',VERSIONS=>3},{NAME=>'cf2',VERSIONS=>3}
0 row(s) in 1.2410 seconds
=> Hbase::Table - lqq:t_user
# Insert the initial data
hbase(main):012:0> t.put 1,'cf1:name','zs'
0 row(s) in 0.0870 seconds
hbase(main):013:0> t.put 1,'cf1:age',18
0 row(s) in 0.0220 seconds
hbase(main):014:0> t.put 1,'cf1:sex',true
0 row(s) in 0.0270 seconds
# Update a value by putting to the same cell again
hbase(main):015:0> t.put 1,'cf1:name','zhangsan'
0 row(s) in 0.0080 seconds
# The latest version of cf1:name now holds the new value
hbase(main):016:0> t.get 1
COLUMN CELL
cf1:age timestamp=1572888599018, value=18
cf1:name timestamp=1572888630027, value=zhangsan
cf1:sex timestamp=1572888605427, value=true
3 row(s) in 0.0350 seconds
Read (get)
# Get row key 1, column cf1:name, returning at most three versions
hbase(main):017:0> t.get 1 ,{COLUMNS=>'cf1:name',VERSIONS=>3}
COLUMN CELL
cf1:name timestamp=1572888630027, value=zhangsan
cf1:name timestamp=1572888590148, value=zs
2 row(s) in 0.0300 seconds
# Get the version written at a specific timestamp
hbase(main):020:0> t.get 1 ,{COLUMNS=>'cf1:name',TIMESTAMP => 1572888590148}
COLUMN CELL
cf1:name timestamp=1572888590148, value=zs
1 row(s) in 0.0170 seconds
# Get the versions within a timestamp range (lower bound inclusive, upper bound exclusive)
hbase(main):024:0> t.get 1 ,{COLUMNS=>'cf1:name',TIMERANGE => [1572888590147,1572888630030],VERSIONS => 4}
COLUMN CELL
cf1:name timestamp=1572888630027, value=zhangsan
cf1:name timestamp=1572888590148, value=zs
2 row(s) in 0.0160 seconds
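The version behavior seen in these reads can be modeled with a small sorted map. This is a simplified sketch under stated assumptions, not HBase's implementation: `VersionedColumnSketch` is a hypothetical class, and it discards surplus versions immediately on write, whereas a real table only hides them on read and removes them physically later.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical model of one column that keeps a bounded number of versions. */
class VersionedColumnSketch {
    private final int maxVersions;
    // Keys are timestamps, ordered newest first, like the shell output.
    private final TreeMap<Long, String> versions =
            new TreeMap<Long, String>(Collections.<Long>reverseOrder());

    VersionedColumnSketch(int maxVersions) {
        this.maxVersions = maxVersions;
    }

    /** Stores a value under a timestamp and drops the oldest surplus versions. */
    void put(long timestamp, String value) {
        versions.put(timestamp, value);
        while (versions.size() > maxVersions) {
            versions.pollLastEntry(); // last entry in reverse order = oldest timestamp
        }
    }

    /** Returns up to 'limit' values, newest first (like VERSIONS => limit). */
    List<String> get(int limit) {
        return getRange(Long.MIN_VALUE, Long.MAX_VALUE, limit);
    }

    /** Like TIMERANGE => [min, max]: min is inclusive, max is exclusive. */
    List<String> getRange(long minInclusive, long maxExclusive, int limit) {
        List<String> out = new ArrayList<String>();
        for (Map.Entry<Long, String> e : versions.entrySet()) {
            long ts = e.getKey();
            if (ts >= minInclusive && ts < maxExclusive && out.size() < limit) {
                out.add(e.getValue());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        VersionedColumnSketch name = new VersionedColumnSketch(3);
        name.put(1572888590148L, "zs");
        name.put(1572888630027L, "zhangsan");
        System.out.println(name.get(3));
        System.out.println(name.getRange(1572888590147L, 1572888630030L, 4));
    }
}
```

With the two timestamps from the transcript, both calls in `main` return the newer value first, matching the order of the shell output.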
Delete (delete/deleteall)
# Delete directly, naming the table
hbase(main):025:0> delete 'lqq:t_user',1,'cf1:name'
0 row(s) in 0.0560 seconds
# Delete through the table reference
hbase(main):027:0> t.delete 1,'cf1:sex'
0 row(s) in 0.0290 seconds
# Delete all data stored under a row key
hbase(main):029:0> t.deleteall 1
0 row(s) in 0.0150 seconds
# Delete all versions of one column (row key, column family, qualifier)
hbase(main):043:0> t.deleteall 1 ,'cf1:name'
0 row(s) in 0.0150 seconds
Full table scan
hbase(main):049:0> t.scan
ROW COLUMN+CELL
1 column=cf1:age, timestamp=1572889604496, value=18
1 column=cf1:name, timestamp=1572889601629, value=zs
1 column=cf1:sex, timestamp=1572889608032, value=true
1 row(s) in 0.0240 seconds
Count (count)
hbase(main):050:0> t.count
1 row(s) in 0.0590 seconds
=> 1
Append (append)
hbase(main):051:0> t.append 1,'cf1:name','123'
0 row(s) in 0.0220 seconds
hbase(main):052:0> t.scan
ROW COLUMN+CELL
1 column=cf1:age, timestamp=1572889604496, value=18
1 column=cf1:name, timestamp=1572889732899, value=zs123
1 column=cf1:sex, timestamp=1572889608032, value=true
1 row(s) in 0.0240 seconds
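The difference between put and append in the output above (zs became zs123 instead of being replaced) can be modeled in a few lines. This is only a sketch of the semantics: `AppendSketch` is a hypothetical class that ignores versions and timestamps.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical single-row model contrasting put (overwrite) with append (concatenate). */
class AppendSketch {
    private final Map<String, byte[]> cells = new HashMap<String, byte[]>();

    /** put replaces whatever value the column currently shows. */
    void put(String column, String value) {
        cells.put(column, value.getBytes(StandardCharsets.UTF_8));
    }

    /** append adds bytes to the end of the current value, or creates the cell if absent. */
    void append(String column, String suffix) {
        byte[] old = cells.containsKey(column) ? cells.get(column) : new byte[0];
        byte[] extra = suffix.getBytes(StandardCharsets.UTF_8);
        byte[] merged = Arrays.copyOf(old, old.length + extra.length);
        System.arraycopy(extra, 0, merged, old.length, extra.length);
        cells.put(column, merged);
    }

    /** Returns the current value as text, or null when the column has no value. */
    String get(String column) {
        byte[] value = cells.get(column);
        return value == null ? null : new String(value, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        AppendSketch row = new AppendSketch();
        row.put("cf1:name", "zs");
        row.append("cf1:name", "123");
        System.out.println(row.get("cf1:name"));
    }
}
```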
Truncate a table
hbase(main):053:0> truncate 'lqq:t_user'
Truncating 'lqq:t_user' table (it may take a while):
- Disabling table...
- Truncating table...
0 row(s) in 3.9640 seconds
hbase(main):054:0> t.scan
ROW COLUMN+CELL
0 row(s) in 0.1440 seconds
Java API
Dependencies
Obtaining a client
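import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.junit.Before;

private Connection connection;
private Admin admin;

@Before
public void getAdmin() throws Exception {
    // HBaseConfiguration.create() also loads the HBase defaults, unlike a bare Configuration
    Configuration conf = HBaseConfiguration.create();
    // The client locates the cluster through ZooKeeper
    conf.set("hbase.zookeeper.quorum", "HadoopNode00");
    conf.set("hbase.zookeeper.property.clientPort", "2181");
    connection = ConnectionFactory.createConnection(conf);
    admin = connection.getAdmin();
}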
Releasing resources
@After
public void close() throws Exception {
admin.close();
connection.close();
}
Namespace operations
- Create a namespace
@Test
public void createNameSpace() throws Exception {
NamespaceDescriptor namespaceDescriptor = NamespaceDescriptor.create("hadoop1").addConfiguration("lqq", "123").build();
admin.createNamespace(namespaceDescriptor);
}
- Modify a namespace
@Test
public void changeNameSpace() throws Exception{
NamespaceDescriptor namespaceDescriptor = NamespaceDescriptor.create("hadoop").removeConfiguration("lqq").build();
admin.modifyNamespace(namespaceDescriptor);
}
- Delete a namespace
@Test
public void deleteNameSpace() throws Exception{
admin.deleteNamespace("hadoop");
}
- List namespaces
@Test
public void listNameSpace() throws Exception {
NamespaceDescriptor[] namespaceDescriptors = admin.listNamespaceDescriptors();
for (NamespaceDescriptor namespaceDescriptor : namespaceDescriptors) {
System.out.println(namespaceDescriptor.getName());
}
}
Table operations
- Create a table
@Test
public void createTable() throws Exception {
// Wrap the table name (namespace:table) in a TableName
TableName tableName = TableName.valueOf("lqq:t_java");
// Create the table descriptor for that name
HTableDescriptor tableDescriptor = new HTableDescriptor(tableName);
// Describe column family cf1
HColumnDescriptor cf1 = new HColumnDescriptor("cf1");
// Keep at most 3 versions per cell
cf1.setMaxVersions(3);
// Describe column family cf2
HColumnDescriptor cf2 = new HColumnDescriptor("cf2");
// Keep at most 3 versions per cell
cf2.setMaxVersions(3);
// A table needs its column families; add both to the descriptor
tableDescriptor.addFamily(cf1);
tableDescriptor.addFamily(cf2);
// Create the table through the Admin object
admin.createTable(tableDescriptor);
}
- Delete a table
@Test
public void deleteTable() throws Exception {
TableName tableName = TableName.valueOf("lqq:t_java");
if (admin.tableExists(tableName)) {
admin.disableTable(tableName);
admin.deleteTable(tableName);
}
}
CRUD
- put
Insert or update a single record
@Test
public void putData() throws Exception {
TableName tableName = TableName.valueOf("lqq:t_user");
Table table = connection.getTable(tableName);
// Build the Put for one row (row key "1")
Put put = new Put("1".getBytes());
// addColumn arguments: column family, qualifier, value
put.addColumn("cf1".getBytes(), "name".getBytes(), "zhangsan".getBytes());
put.addColumn("cf1".getBytes(), "pwd".getBytes(), "123".getBytes());
put.addColumn("cf2".getBytes(), "age".getBytes(), "18".getBytes());
put.addColumn("cf2".getBytes(), "salary".getBytes(), "1000".getBytes());
table.put(put);
table.close();
}
Batch insert
@Test
public void putManyData() throws Exception {
TableName tableName = TableName.valueOf("lqq:t_user");
Table table = connection.getTable(tableName);
// Build the Put for one row (row key "2")
Put put = new Put("2".getBytes());
// addColumn arguments: column family, qualifier, value
put.addColumn("cf1".getBytes(), "name".getBytes(), "lisi".getBytes());
put.addColumn("cf1".getBytes(), "pwd".getBytes(), "123".getBytes());
put.addColumn("cf2".getBytes(), "age".getBytes(), "20".getBytes());
put.addColumn("cf2".getBytes(), "salary".getBytes(), "20000".getBytes());
ArrayList<Put> puts = new ArrayList<Put>();
puts.add(put);
table.put(puts);
table.close();
}
@Test
public void putManyDataBuffered() throws Exception {
    TableName tableName = TableName.valueOf("lqq:t_user");
    // A BufferedMutator buffers mutations on the client and sends them in batches
    BufferedMutator bufferedMutator = connection.getBufferedMutator(tableName);
    // Build the Put for one row (row key "2")
    Put put = new Put("2".getBytes());
    // addColumn arguments: column family, qualifier, value
    put.addColumn("cf1".getBytes(), "name".getBytes(), "ls".getBytes());
    put.addColumn("cf1".getBytes(), "pwd".getBytes(), "123".getBytes());
    put.addColumn("cf2".getBytes(), "age".getBytes(), "20".getBytes());
    put.addColumn("cf2".getBytes(), "salary".getBytes(), "20000".getBytes());
    ArrayList<Put> puts = new ArrayList<Put>();
    puts.add(put);
    bufferedMutator.mutate(puts);
    // close() flushes anything still held in the buffer
    bufferedMutator.close();
}
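The reason a BufferedMutator helps with bulk writes is that mutations collect on the client and go out in batches rather than one call per Put. The idea can be sketched without HBase as follows. `WriteBufferSketch`, its threshold parameter, and its counters are hypothetical names for illustration only; the real client measures buffered size and flushes according to its own configuration.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of a client-side write buffer that flushes in batches. */
class WriteBufferSketch {
    private final long flushThresholdBytes;
    private final List<byte[]> pending = new ArrayList<byte[]>();
    private long pendingBytes = 0;
    private int flushCount = 0; // number of simulated batch calls
    private int sentCount = 0;  // number of mutations handed to those calls

    WriteBufferSketch(long flushThresholdBytes) {
        this.flushThresholdBytes = flushThresholdBytes;
    }

    /** Buffers one mutation and flushes once the buffered size exceeds the threshold. */
    void mutate(byte[] mutation) {
        pending.add(mutation);
        pendingBytes += mutation.length;
        if (pendingBytes > flushThresholdBytes) {
            flush();
        }
    }

    /** Sends everything buffered so far as one batch (simulated by counters). */
    void flush() {
        if (pending.isEmpty()) {
            return;
        }
        sentCount += pending.size();
        flushCount++;
        pending.clear();
        pendingBytes = 0;
    }

    /** Closing must flush, otherwise buffered mutations would be lost. */
    void close() {
        flush();
    }

    int getFlushCount() {
        return flushCount;
    }

    int getSentCount() {
        return sentCount;
    }

    public static void main(String[] args) {
        WriteBufferSketch buffer = new WriteBufferSketch(10);
        for (int i = 0; i < 4; i++) {
            buffer.mutate(new byte[4]);
        }
        buffer.close();
        System.out.println(buffer.getSentCount() + " mutations in "
                + buffer.getFlushCount() + " batches");
    }
}
```

The practical consequence for the test method above is that forgetting close() (or an explicit flush) can leave data unsent.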
delete
@Test
public void deleteData() throws Exception {
TableName tableName = TableName.valueOf("lqq:t_user");
Table table = connection.getTable(tableName);
Delete delete = new Delete("2".getBytes());
table.delete(delete);
table.close();
}
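- Batch delete
@Test
public void deleteManyData() throws Exception {
    TableName tableName = TableName.valueOf("lqq:t_user");
    Table table = connection.getTable(tableName);
    // Collect one Delete per row key and send them in a single call
    List<Delete> deletes = new ArrayList<Delete>();
    deletes.add(new Delete("1".getBytes()));
    deletes.add(new Delete("2".getBytes()));
    table.delete(deletes);
    table.close();
}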
get
@Test
public void getData() throws Exception{
Table table = connection.getTable(TableName.valueOf("lqq:t_user"));
Get get = new Get("2".getBytes());
Result result = table.get(get);
// getValue arguments: column family, qualifier
byte[] name = result.getValue("cf1".getBytes(), "name".getBytes());
byte[] pwd = result.getValue("cf1".getBytes(), "pwd".getBytes());
byte[] age = result.getValue("cf2".getBytes(), "age".getBytes());
byte[] salary = result.getValue("cf2".getBytes(), "salary".getBytes());
System.out.println("name: " + Bytes.toString(name) + ", password: " + Bytes.toString(pwd) + ", age: " + Bytes.toString(age) + ", salary: " + Bytes.toString(salary));
table.close();
}
- Read multiple versions
@Test
public void getManyData() throws Exception {
Table table = connection.getTable(TableName.valueOf("lqq:t_user"));
Get get = new Get("2".getBytes());
get.setMaxVersions(3);
get.addColumn("cf1".getBytes(), "name".getBytes());
Result result = table.get(get);
List<Cell> columnCells = result.getColumnCells("cf1".getBytes(), "name".getBytes());
for (Cell columnCell : columnCells) {
byte[] rowData = CellUtil.cloneRow(columnCell);
byte[] cfData = CellUtil.cloneFamily(columnCell);
byte[] qualifierData = CellUtil.cloneQualifier(columnCell);
byte[] data = CellUtil.cloneValue(columnCell);
System.out.println("row key: " + Bytes.toString(rowData) + ", column family: " + Bytes.toString(cfData) + ", qualifier: " + Bytes.toString(qualifierData) + ", value: " + Bytes.toString(data));
}
table.close();
}
scan
@Test
public void scanData() throws Exception {
Table table = connection.getTable(TableName.valueOf("lqq:t_user"));
Scan scan = new Scan();
// scan.addFamily("cf1".getBytes());
//scan.addColumn("cf1".getBytes(),"name".getBytes());
PrefixFilter prefixFilter1 = new PrefixFilter("1".getBytes());
PrefixFilter prefixFilter2 = new PrefixFilter("2".getBytes());
FilterList filterList = new FilterList(FilterList.Operator.MUST_PASS_ONE, prefixFilter1, prefixFilter2);
scan.setFilter(filterList);
ResultScanner scanner = table.getScanner(scan);
for (Result result : scanner) {
System.out.println("------------------");
byte[] name = result.getValue("cf1".getBytes(), "name".getBytes());
byte[] pwd = result.getValue("cf1".getBytes(), "pwd".getBytes());
byte[] age = result.getValue("cf2".getBytes(), "age".getBytes());
byte[] salary = result.getValue("cf2".getBytes(), "salary".getBytes());
System.out.println("name: " + Bytes.toString(name) + ", password: " + Bytes.toString(pwd) + ", age: " + Bytes.toString(age) + ", salary: " + Bytes.toString(salary));
}
scanner.close();
table.close();
}
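What the scan above selects can be checked with a plain-Java model: two prefix filters combined with MUST_PASS_ONE act as a logical OR, and rows come back in lexicographic row key order, so "10" sorts before "2". This is only a sketch. `PrefixScanSketch` is a hypothetical class, and it sorts Java strings where HBase compares raw bytes (equivalent for the ASCII keys used here).

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

/** Hypothetical model of a scan with several prefix filters combined by OR. */
class PrefixScanSketch {

    /** True when 'row' begins with every byte of 'prefix'. */
    static boolean hasPrefix(byte[] row, byte[] prefix) {
        if (row.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (row[i] != prefix[i]) {
                return false;
            }
        }
        return true;
    }

    /** Returns the row keys, in sorted order, that match at least one prefix. */
    static List<String> scan(List<String> rowKeys, String... prefixes) {
        List<String> hits = new ArrayList<String>();
        for (String row : new TreeSet<String>(rowKeys)) {
            byte[] rowBytes = row.getBytes(StandardCharsets.UTF_8);
            for (String prefix : prefixes) {
                if (hasPrefix(rowBytes, prefix.getBytes(StandardCharsets.UTF_8))) {
                    hits.add(row);
                    break; // one matching filter is enough (MUST_PASS_ONE)
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList("3", "25", "10", "2", "1");
        System.out.println(scan(rows, "1", "2"));
    }
}
```

One design consequence: numeric row keys stored as text do not sort numerically, which is why fixed-width or zero-padded keys are a common choice when range scans matter.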