HBase Introduction, Environment Setup, and HBase Shell Commands

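Why use HBase?

HBase, often described as the Hadoop database, takes its design from Google's Bigtable paper (Bigtable is a NoSQL database built on GFS). HDFS can store massive data sets, but it does not support record-level modification and it does not support random access to that data. If you need random reads and writes over data in HDFS and latency is not a concern, you can pair HDFS with MapReduce and run ETL over the data, which is slow. HBase is a NoSQL database built on top of HDFS that provides random reads and writes over the data stored there.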

 

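How are HBase and HDFS related?

Introduction to HBase

HBase is a distributed, scalable, column-oriented open-source database. The technology comes from the Google paper "Bigtable: A Distributed Storage System for Structured Data" by Fay Chang et al. Just as Bigtable builds on the distributed data storage provided by the Google File System, HBase provides Bigtable-like capabilities on top of Hadoop. HBase originated as a subproject of Apache Hadoop. Unlike a typical relational database, it is well suited to storing unstructured data, and its storage model is column-oriented rather than row-oriented. In the project's own words: "This project's goal is the hosting of very large tables -- billions of rows X millions of columns -- atop clusters of commodity hardware."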

 

When should you use HBase?

First, make sure you have enough data. If you have hundreds of millions or billions of rows, then HBase is a good candidate.

Second, make sure you can live without all the extra features that an RDBMS provides (e.g., typed columns, secondary indexes, transactions, advanced query languages, etc.).

Third, make sure you have enough hardware. Even HDFS doesn't do well with anything less than 5 DataNodes (due to things such as HDFS block replication, which has a default of 3), plus a NameNode.

 

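What is column-oriented storage?

1.    Problems with row-oriented storage

RDBMS

1. Sparse storage is not supported, so disk utilization is low.
2. Disk I/O is used inefficiently (a scan always loads entire rows).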

test:t_user

id   name   pwd   sex    info
1    zs     ***   TRUE   -
2    ls     ***   -      -
3    ww     ***   -      XXX

test:t_user_base

id   name   pwd
1    zs     ***
2    ls     ***
3    ww     ***

test:t_user_info

id   sex    info
1    TRUE   -
3    -      XXX

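Splitting the table vertically improves disk and I/O utilization, but it increases the number of table joins.

On the unsplit table, a query such as select id,name from t_user where id = 1 still loads the whole row, so I/O is used inefficiently.

2.    Column-oriented storage

HBase

1. A column family places all columns with similar I/O characteristics in the same physical file.
2. All data stored in HBase is kept sorted by RowKey, ColumnFamily, and timestamp, which helps queries.
3. HBase tables can be very sparse. When designing a table you only declare the column families; you do not need to declare the columns inside them.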

test:t_user

rowkey   column family:column   value   timestamp
1        cf1:name               zs      1
1        cf1:name               张三    2
1        cf1:pwd                ***     1
1        cf2:sex                TRUE    1
2        cf1:name               ls      1
2        cf1:pwd                ***     1
3        cf1:name               ww      1
3        cf1:pwd                ***     1
3        cf2:info               XXX     1
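
The layout described above can be pictured with a toy model. The Python sketch below is only a mental model, not HBase's storage code: it keeps the sample cells as (rowkey, column, timestamp) -> value entries, sorts them by row key, then column, then newest timestamp first, and groups them by column family. The names cells, sorted_cells, latest, and by_family are made up for illustration.

```python
# Toy model of HBase's logical layout: every cell is addressed by
# (rowkey, "family:qualifier", timestamp). Columns that are absent
# for a row simply have no entry, so sparse rows cost nothing.
cells = {
    ("1", "cf1:name", 1): "zs",
    ("1", "cf1:name", 2): "张三",
    ("1", "cf1:pwd", 1): "***",
    ("1", "cf2:sex", 1): "TRUE",
    ("2", "cf1:name", 1): "ls",
    ("2", "cf1:pwd", 1): "***",
    ("3", "cf1:name", 1): "ww",
    ("3", "cf1:pwd", 1): "***",
    ("3", "cf2:info", 1): "XXX",
}


def sorted_cells(store):
    """Order cells by rowkey, then column, then newest timestamp first."""
    return sorted(store.items(), key=lambda kv: (kv[0][0], kv[0][1], -kv[0][2]))


def latest(store, rowkey, column):
    """Return the newest value of one column in one row, or None if absent."""
    for (row, col, _ts), value in sorted_cells(store):
        if row == rowkey and col == column:
            return value
    return None


def by_family(store, family):
    """Return only the cells of one column family (kept together on disk)."""
    prefix = family + ":"
    return [kv for kv in sorted_cells(store) if kv[0][1].startswith(prefix)]


if __name__ == "__main__":
    print(latest(cells, "1", "cf1:name"))   # newest version of row 1's name
    print(latest(cells, "2", "cf2:sex"))    # row 2 never stored cf2:sex
    print(len(by_family(cells, "cf2")))     # cells held by family cf2
```

Reading only cf2 touches two cells here, while the row-oriented table would have had to load every column of every row.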

Terminology



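HBase environment setup

1.    Make sure Hadoop HDFS is running normally (steps omitted)

2.    Start ZooKeeper (steps omitted)

3.    Upload the HBase package hbase-1.2.4-bin.tar.gz, extract it under /usr, and set HBASE_HOME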

[root@CentOS ~]# tar -zxf hbase-1.2.4-bin.tar.gz -C /usr/
[root@CentOS ~]# vim .bashrc
HBASE_HOME=/usr/hbase-1.2.4
HADOOP_HOME=/usr/hadoop-2.6.0
JAVA_HOME=/usr/java/latest
CLASSPATH=.
PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$HBASE_HOME/bin
export JAVA_HOME
export CLASSPATH
export PATH
export HADOOP_HOME
export HBASE_HOME
[root@CentOS ~]# source .bashrc

4.   Edit the HBase configuration file hbase-site.xml

[root@CentOS ~]# vim /usr/hbase-1.2.4/conf/hbase-site.xml
<configuration>
	<property>	
		<name>hbase.rootdir</name>
		<!-- must match the HDFS address configured for Hadoop (fs.defaultFS) -->
		<value>hdfs://CentOS:9000/hbase</value>
	</property>
	<property>
		<name>hbase.cluster.distributed</name>
		<value>true</value>
	</property>
	<property>
		<name>hbase.zookeeper.quorum</name>
		<value>CentOS</value>
	</property>
	<property>
		<name>hbase.zookeeper.property.clientPort</name>
		<value>2181</value>
	</property>
</configuration>
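
A quick way to catch typos in a file like this is to parse it and check the values that must agree with the rest of the cluster. The Python sketch below is a hedged example, not an official HBase tool: the helper names load_props and check are hypothetical, the XML is a copy of the configuration shown above, and the expected NameNode address hdfs://CentOS:9000 is an assumption taken from the hbase.rootdir value in this post.

```python
import xml.etree.ElementTree as ET

# Copy of the hbase-site.xml content from this post.
HBASE_SITE = """<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://CentOS:9000/hbase</value>
  </property>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>CentOS</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.clientPort</name>
    <value>2181</value>
  </property>
</configuration>"""


def load_props(xml_text):
    """Parse Hadoop-style configuration XML into a {name: value} dict."""
    root = ET.fromstring(xml_text)
    return {p.findtext("name"): p.findtext("value") for p in root.findall("property")}


def check(props, fs_default):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    rootdir = props.get("hbase.rootdir") or ""
    if not rootdir.startswith(fs_default.rstrip("/") + "/"):
        problems.append("hbase.rootdir does not live under " + fs_default)
    if props.get("hbase.cluster.distributed") != "true":
        problems.append("hbase.cluster.distributed should be true")
    if not props.get("hbase.zookeeper.quorum"):
        problems.append("hbase.zookeeper.quorum is empty")
    return problems


if __name__ == "__main__":
    # Assumed NameNode address; adjust to your own fs.defaultFS.
    print(check(load_props(HBASE_SITE), "hdfs://CentOS:9000"))
```

To check a real installation, read /usr/hbase-1.2.4/conf/hbase-site.xml into a string and pass it to load_props instead of the embedded copy.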

5.   Edit the regionservers text file (keep it consistent with Hadoop's slaves file)

[root@CentOS ~]# vim /usr/hbase-1.2.4/conf/regionservers
CentOS

6.   Edit hbase-env.sh

Use an external ZooKeeper to manage cluster metadata

[root@CentOS ~]# vim /usr/hbase-1.2.4/conf/hbase-env.sh
# Tell HBase whether it should manage it's own instance of Zookeeper or not.
export HBASE_MANAGES_ZK=false

7.   Start or stop HBase (start-hbase.sh to start, stop-hbase.sh to stop)

[root@CentOS ~]# start-hbase.sh
starting master, logging to /usr/hbase-1.2.4/logs/hbase-root-master-CentOS.out
CentOS: starting regionserver, logging to /usr/hbase-1.2.4/logs/hbase-root-regionserver-CentOS.out
[root@CentOS ~]# jps
11971 NameNode
12750 Jps
1425 QuorumPeerMain
12536 HMaster
12659 HRegionServer
12054 DataNode
12255 SecondaryNameNode

The master web UI is then available at: http://centos:16010/master-status#userTables

 

 

Basic HBase shell commands

1) Namespace operations (the equivalent of database operations)

2) Table operations

3) Data operations: create, retrieve, update, delete (CRUD), i.e. DML

 -- Enter the HBase shell

[root@CentOS ~]# hbase shell
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/hbase-1.2.4/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hadoop-2.6.0/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
HBase Shell; enter 'help<RETURN>' for list of supported commands.
Type "exit<RETURN>" to leave the HBase Shell
Version 1.2.4, rUnknown, Wed Feb 15 18:58:00 CST 2017

hbase(main):001:0>

 

Note: to delete characters in the HBase shell, use Ctrl+Backspace.

 

 

  • namespace (the equivalent of database operations):

1) list_namespace (list all namespaces)

hbase(main):003:0> list_namespace
NAMESPACE                                                                                                                      
default                                                                                                                        
hbase                                                                                                                          
2 row(s) in 1.1400 seconds

2) create_namespace (create a namespace)

create_namespace 'test',{'key'=>'value'}

Example:

hbase(main):007:0> create_namespace 'test',{'creator'=>'jeffery'}
0 row(s) in 0.2650 seconds

3) list_namespace_tables (list the tables in a namespace)

hbase(main):004:0> list_namespace_tables 'test'
TABLE                                                                                                                 
t_people                                                                                                              
t_user                                                                                                                
2 row(s) in 1.7320 seconds

4) describe_namespace (show a namespace's properties)

hbase(main):008:0> describe_namespace 'test'
DESCRIPTION                                                                                                                    
{NAME => 'test', creator => 'jeffery'}                                                                                         
1 row(s) in 0.1760 seconds

5) alter_namespace (modify a namespace's properties). To add a property:

hbase> alter_namespace 'namespace', {METHOD => 'set', 'PROPERTY_NAME' => 'PROPERTY_VALUE'}

To remove a property:

hbase> alter_namespace 'namespace', {METHOD => 'unset', NAME=>'PROPERTY_NAME'}

Example:

Change a property

hbase(main):009:0> alter_namespace 'test',{METHOD=>'set','creator'=>'tom'}
0 row(s) in 2.0060 seconds
hbase(main):010:0> describe_namespace 'test'
DESCRIPTION                                                                                                                    
{NAME => 'test', creator => 'tom'}                                                                                             
1 row(s) in 0.0420 seconds

Add a property

hbase(main):012:0> alter_namespace 'test',{METHOD=>'set','time'=>'2018-07-09'}
0 row(s) in 0.4710 seconds
hbase(main):013:0> describe_namespace 'test'
DESCRIPTION                                                                                                                    
{NAME => 'test', creator => 'tom', time => '2018-07-09'}                                                                       
1 row(s) in 0.0230 seconds

Remove a property

hbase(main):015:0> alter_namespace 'test',{METHOD=>'unset',NAME=>'time'}
0 row(s) in 0.2440 seconds
hbase(main):016:0> describe_namespace 'test'
DESCRIPTION                                                                                                                    
{NAME => 'test', creator => 'tom'}                                                                                             
1 row(s) in 0.0340 seconds

6) drop_namespace (drop a namespace; only an empty namespace can be dropped)

hbase(main):017:0> drop_namespace 'test'
0 row(s) in 0.7570 seconds

 

  • table (table operations):

1) create (create a table)

Specifying VERSIONS (the maximum number of versions to keep)

hbase(main):024:0> create 'test:t_user',{NAME=>'cf1',VERSIONS=>3},{NAME=>'cf2',VERSIONS=>3}
0 row(s) in 6.6200 seconds

=> Hbase::Table - test:t_user

Without VERSIONS (by default only 1 version is kept)

hbase(main):025:0> create 'test:t_user2','cf1','cf2'
0 row(s) in 5.3320 seconds

=> Hbase::Table - test:t_user2

2) list (list all user tables)

hbase(main):026:0> list
TABLE                                                                                                                          
test:t_user                                                                                                                    
test:t_user2                                                                                                                   
2 row(s) in 0.5430 seconds

=> ["test:t_user", "test:t_user2"]

3) describe (show a table's definition)

hbase(main):027:0> describe 'test:t_user'
Table test:t_user is ENABLED                                                                                                   
test:t_user                                                                                                                    
COLUMN FAMILIES DESCRIPTION                                                                                                    
{NAME => 'cf1', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW', REPLICATION_SCOPE => '0', VERSIONS => '3', COMPRESSION => 
'NONE', MIN_VERSIONS => '0', TTL => 'FOREVER', KEEP_DELETED_CELLS => 'FALSE', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCK
CACHE => 'true'}                                                                                                               
{NAME => 'cf2', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW', REPLICATION_SCOPE => '0', VERSIONS => '3', COMPRESSION => 
'NONE', MIN_VERSIONS => '0', TTL => 'FOREVER', KEEP_DELETED_CELLS => 'FALSE', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCK
CACHE => 'true'}                                                                                                               
2 row(s) in 1.8370 seconds
hbase(main):028:0> describe 'test:t_user2'
Table test:t_user2 is ENABLED                                                                                                   
test:t_user2                                                                                                                    
COLUMN FAMILIES DESCRIPTION                                                                                                     
{NAME => 'cf1', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW', REPLICATION_SCOPE => '0', VERSIONS => '1', COMPRESSION => '
NONE', MIN_VERSIONS => '0', TTL => 'FOREVER', KEEP_DELETED_CELLS => 'FALSE', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCA
CHE => 'true'}                                                                                                                  
{NAME => 'cf2', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW', REPLICATION_SCOPE => '0', VERSIONS => '1', COMPRESSION => '
NONE', MIN_VERSIONS => '0', TTL => 'FOREVER', KEEP_DELETED_CELLS => 'FALSE', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCA
CHE => 'true'}                                                                                                                  
2 row(s) in 0.0940 seconds

4) drop (drop a table; HBase cannot drop a table directly, it must be disabled first)

hbase(main):030:0> disable 'test:t_user2'
0 row(s) in 14.1030 seconds
hbase(main):031:0> drop 'test:t_user2'
0 row(s) in 2.8550 seconds

5) enable (enable a disabled table)

hbase(main):032:0> enable 'test:t_user'
0 row(s) in 0.0470 seconds
hbase(main):033:0> is_enabled 'test:t_user'
true                                                                                                                            
0 row(s) in 0.0400 seconds
hbase(main):034:0> is_disabled 'test:t_user'
false                                                                                                                           
0 row(s) in 0.0520 seconds

6) exists (check whether a table exists)

hbase(main):035:0> exists 'test:t_user'
Table test:t_user does exist                                                                                                    
0 row(s) in 0.0660 seconds
hbase(main):036:0> exists 'test:t_user2'
Table test:t_user2 does not exist                                                                                               
0 row(s) in 0.0520 seconds
  • Data management

1) put (insert or update)

Syntax: put 'ns:table','rowkey','cf:column','value'[,ts]

If the cell does not exist yet, it is inserted

hbase(main):043:0> put 'test:t_user','user:001','cf1:name','zhangsan'
0 row(s) in 0.0610 seconds
hbase(main):042:0> get 'test:t_user','user:001'
COLUMN                    CELL                                                                    
 cf1:name                 timestamp=1527399295854, value=zhangsan                                 
1 row(s) in 0.4010 seconds

If the cell already exists, put writes a new version, and get returns the newest version by default

hbase(main):043:0> put 'test:t_user','user:001','cf1:name','lisi'
0 row(s) in 0.0610 seconds
hbase(main):044:0> get 'test:t_user','user:001'
COLUMN                    CELL                                                                    
 cf1:name                 timestamp=1527399460905, value=lisi                                     
1 row(s) in 0.0850 seconds
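
A second put to the same cell adds a new version rather than erasing the old value on the spot, and the column family's VERSIONS setting bounds how many versions are retained. The Python sketch below is a simplified model of that behavior, not HBase client code; the class name VersionedCell and the small integer timestamps are made up for illustration.

```python
class VersionedCell:
    """Toy model of one HBase cell that keeps at most max_versions values."""

    def __init__(self, max_versions=1):
        self.max_versions = max_versions
        self.versions = []  # list of (timestamp, value), newest first

    def put(self, value, ts):
        # A put with an existing timestamp replaces that version;
        # otherwise it adds a new one.
        self.versions = [(t, v) for t, v in self.versions if t != ts]
        self.versions.append((ts, value))
        self.versions.sort(key=lambda tv: tv[0], reverse=True)
        # Drop everything beyond the retention limit (oldest versions).
        del self.versions[self.max_versions:]

    def get(self, versions=1):
        """Return up to `versions` entries, newest first."""
        return self.versions[:versions]


if __name__ == "__main__":
    cell = VersionedCell(max_versions=3)  # like VERSIONS => 3 on cf1
    for ts, name in enumerate(["zhangsan", "lisi", "wangwu", "zhaoliu"], start=1):
        cell.put(name, ts)
    print(cell.get())              # newest version only
    print(cell.get(versions=10))   # capped by max_versions
```

With max_versions=1, the model behaves like a plain overwrite, which matches a table created without VERSIONS.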

2) get (read column data from one row)

Syntax: get 'ns:table','rowkey' ...

① Get the latest version of a column

hbase(main):045:0> get 'test:t_user','user:001'
COLUMN                    CELL                                                                    
 cf1:name                 timestamp=1527399460905, value=lisi                                     
1 row(s) in 0.9440 seconds

② TIMERANGE: get the versions whose timestamps fall within an interval (half-open: the smaller start timestamp is inclusive, the larger end timestamp is exclusive)

hbase(main):092:0> get 'test:t_user','user:001',{COLUMN=>'cf1',TIMERANGE=>[1527401460239,1527401478060],VERSIONS=>10}
COLUMN                    CELL                                                                    
 cf1:name                 timestamp=1527401471106, value=wangwu                                   
 cf1:name                 timestamp=1527401466085, value=lisi                                     
 cf1:name                 timestamp=1527401460239, value=zhangsan                                 
3 row(s) in 0.0630 seconds

Note: use TIMERANGE together with VERSIONS; otherwise only the newest version inside the interval is returned
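
The half-open interval and its interaction with VERSIONS can be modeled in a few lines. The Python sketch below reuses the timestamps from the transcript above; the function name get_timerange is hypothetical and the code only imitates the filtering rule, it does not talk to HBase.

```python
def get_timerange(versions, start, end, max_versions=1):
    """Return versions with start <= timestamp < end, newest first.

    versions is a list of (timestamp, value) pairs; max_versions plays
    the role of the VERSIONS option and defaults to 1.
    """
    hits = sorted((tv for tv in versions if start <= tv[0] < end), reverse=True)
    return hits[:max_versions]


# The four versions of cf1:name for row user:001 used in this post.
name_versions = [
    (1527401460239, "zhangsan"),
    (1527401466085, "lisi"),
    (1527401471106, "wangwu"),
    (1527401478060, "zhaoliu"),
]

if __name__ == "__main__":
    # End timestamp is exclusive, so zhaoliu is not returned.
    for ts, value in get_timerange(name_versions, 1527401460239, 1527401478060, 10):
        print(ts, value)
    # Without a VERSIONS-like limit only the newest match comes back.
    print(get_timerange(name_versions, 1527401460239, 1527401478060))
```

To include the version at 1527401478060, the end of the range has to be at least one millisecond larger.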

③ Get the two latest versions of a column

hbase(main):095:0> get 'test:t_user','user:001',{COLUMN=>'cf1',VERSIONS=>2}
COLUMN                    CELL                                                                    
 cf1:name                 timestamp=1527401478060, value=zhaoliu                                  
 cf1:name                 timestamp=1527401471106, value=wangwu                                   
2 row(s) in 0.2140 seconds

④ Get the column data at a specific timestamp

hbase(main):096:0> get 'test:t_user','user:001',{COLUMN=>'cf1',TIMESTAMP=>1527401471106}
COLUMN                    CELL                                                                    
 cf1:name                 timestamp=1527401471106, value=wangwu                                   
1 row(s) in 0.8810 seconds

⑤ Get several columns

Prepare the data
hbase(main):099:0> put 'test:t_user','user:001','cf1:age','10'
0 row(s) in 0.4190 seconds
hbase(main):102:0> get 'test:t_user','user:001',{COLUMN => ['cf1:age','cf1:name'],VERSIONS=>1}
COLUMN                    CELL                                                                    
 cf1:age                  timestamp=1527403480511, value=10                                       
 cf1:name                 timestamp=1527401478060, value=zhaoliu                                  
2 row(s) in 0.1850 seconds

3) scan (read a batch of rows)

① Scan the latest version of every column

hbase(main):105:0> scan 'test:t_user'
ROW                       COLUMN+CELL                                                             
 user:001                 column=cf1:age, timestamp=1527403480511, value=10                       
 user:001                 column=cf1:name, timestamp=1527401478060, value=zhaoliu                 
1 row(s) in 0.6840 seconds

② Paged scan (LIMIT is the number of row keys to return; STARTROW is the row key to start from)

Prepare the data
hbase(main):106:0> put 'test:t_user','user:002','cf1:name','jeffery'
0 row(s) in 0.3810 seconds
hbase(main):107:0> scan 'test:t_user', {COLUMNS => ['cf1'], LIMIT =>1, STARTROW => 'user:002'}
ROW                       COLUMN+CELL                                                             
 user:002                 column=cf1:name, timestamp=1527404037898, value=jeffery                 
1 row(s) in 0.3700 seconds

③ Paged scan in reverse order

hbase(main):111:0> scan 'test:t_user', {COLUMNS => ['cf1'], LIMIT =>11, STARTROW => 'user:002',REVERSED => true}
ROW                            COLUMN+CELL                                                                            
 user:002                      column=cf1:name, timestamp=1527404037898, value=jeffery                                
 user:001                      column=cf1:age, timestamp=1527403480511, value=10                                      
 user:001                      column=cf1:name, timestamp=1527401478060, value=zhaoliu                                
2 row(s) in 0.5790 seconds
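
Because rows are kept sorted by row key, STARTROW, LIMIT, and REVERSED amount to slicing a sorted list. The Python sketch below imitates that with bisect; it is a toy model under the assumption of ASCII row keys (HBase compares raw bytes), and the names scan and next_page_start are made up for illustration.

```python
from bisect import bisect_left, bisect_right


def scan(rows, start_row=None, limit=None, reverse=False):
    """Return [(rowkey, data)] in scan order from a {rowkey: data} dict."""
    keys = sorted(rows)
    if reverse:
        # A reverse scan starts at start_row (inclusive) and walks
        # toward smaller row keys.
        end = len(keys) if start_row is None else bisect_right(keys, start_row)
        ordered = keys[:end][::-1]
    else:
        begin = 0 if start_row is None else bisect_left(keys, start_row)
        ordered = keys[begin:]
    if limit is not None:
        ordered = ordered[:limit]
    return [(k, rows[k]) for k in ordered]


def next_page_start(last_key):
    """Smallest key strictly greater than last_key, for forward paging."""
    return last_key + "\x00"


rows = {
    "user:001": {"cf1:age": "10", "cf1:name": "zhaoliu"},
    "user:002": {"cf1:name": "jeffery"},
}

if __name__ == "__main__":
    page1 = scan(rows, limit=1)
    print(page1)
    # Continue after the last row key seen on the previous page.
    print(scan(rows, start_row=next_page_start(page1[-1][0]), limit=1))
    print(scan(rows, start_row="user:002", limit=11, reverse=True))
```

Appending a zero byte to the last row key seen is one common way to resume a forward scan without returning that row twice.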

4) delete

Data before the test
hbase(main):113:0> get 'test:t_user','user:001',{COLUMN=>'cf1',VERSIONS=>10}
COLUMN                         CELL                                                                                   
 cf1:age                       timestamp=1527403480511, value=10                                                      
 cf1:name                      timestamp=1527401478060, value=zhaoliu                                                 
 cf1:name                      timestamp=1527401471106, value=wangwu                                                  
 cf1:name                      timestamp=1527401466085, value=lisi                                                    
 cf1:name                      timestamp=1527401460239, value=zhangsan                                                
5 row(s) in 0.1440 seconds

① Delete with a timestamp (removes one version of the column)

hbase(main):114:0> delete 'test:t_user','user:001','cf1:name',1527401460239
0 row(s) in 0.5630 seconds
hbase(main):115:0> get 'test:t_user','user:001',{COLUMN=>'cf1',VERSIONS=>10}
COLUMN                         CELL                                                                                   
 cf1:age                       timestamp=1527403480511, value=10                                                      
 cf1:name                      timestamp=1527401478060, value=zhaoliu                                                 
 cf1:name                      timestamp=1527401471106, value=wangwu                                                  
 cf1:name                      timestamp=1527401466085, value=lisi                                                    
4 row(s) in 4.9890 seconds

② Delete without a timestamp (removes all versions of the column)

hbase(main):116:0> delete 'test:t_user','user:001','cf1:name'
0 row(s) in 0.4970 seconds
hbase(main):117:0> get 'test:t_user','user:001',{COLUMN=>'cf1',VERSIONS=>10}
COLUMN                         CELL                                                                                   
 cf1:age                       timestamp=1527403480511, value=10                                                      
1 row(s) in 1.2470 seconds

5) truncate (similar to the TRUNCATE statement in an RDBMS: it quickly removes all data but keeps the table structure)

hbase(main):118:0> truncate
truncate            truncate_preserve
hbase(main):118:0> truncate 'test:t_user'
Truncating 'test:t_user' table (it may take a while):
 - Disabling table...
 - Truncating table...
0 row(s) in 16.7690 seconds

 

 
