Download HBase: https://mirrors.tuna.tsinghua.edu.cn/apache/hbase/stable/hbase-1.2.6-bin.tar.gz
Extract it to /home/hadoop/hbase-1.2.6.
HBase ships with a built-in ZooKeeper to support HA; when configuring you can use the bundled ZooKeeper or an independently installed external one. For storage, HBase can use HDFS as its underlying distributed file system, so the setup also depends on whether HDFS itself is configured for HA: the HBase configuration files differ between non-HA and HA HDFS. In either case, ZooKeeper must be started before HBase.
1: Set environment variables and the temp directory (.bash_profile)
export PATH
export HIVE_HOME=/home/hadoop/apache-hive-2.3.3-bin
export HIVE_CONF_DIR=${HIVE_HOME}/conf
export CLASSPATH=$CLASSPATH:${HIVE_HOME}/lib
export PGHOME=/usr/local/pgsql
export PGDATA=/data/pgdata
export PATH=$PGHOME/bin:$PATH
export MANPATH=$PGHOME/share/man:$MANPATH
export LANG=en_US.utf8
export DATE=`date +"%Y-%m-%d %H:%M:%S"`
export LD_LIBRARY_PATH=$PGHOME/lib:$LD_LIBRARY_PATH
export HBASE_HOME=/home/hadoop/hbase-1.2.6
alias rm='rm -i'
alias ll='ls -lh'
export PATH=.:${HIVE_HOME}/bin:$HBASE_HOME/bin:$PATH
Create the temp directory: mkdir -p /home/hadoop/hbase-1.2.6/tmp
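Before pointing hbase.tmp.dir at the directory just created, a small guard can confirm it is usable. A sketch; the helper name `ensure_tmpdir` is mine, not part of HBase:

```shell
# ensure_tmpdir: create the directory if needed and confirm it is
# writable, so hbase.tmp.dir does not silently point at a bad path.
ensure_tmpdir() {
  mkdir -p "$1" && [ -w "$1" ]
}

# On the real node you would run:
# ensure_tmpdir /home/hadoop/hbase-1.2.6/tmp || echo "fix permissions first"
```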
Cluster plan, with 4 Linux machines:
hadoop-m 192.168.31.50 namenode datanode zookeeper hive HBase
hadoop-sm 192.168.31.51 namenode datanode zookeeper HBase
slave1 192.168.31.52 namenode datanode zookeeper HBase
slave2 192.168.31.53 datanode HBase
2: Edit the configuration files
- hbase-env.sh
HBase ships with its own ZooKeeper. To use an external ZooKeeper instead of the bundled one, set HBASE_MANAGES_ZK below to false; this walkthrough uses an independently installed ZooKeeper.
export JAVA_HOME=/usr/lib/jvm/jdk1.8.0_161
export HADOOP_HOME=/usr/local/hadoop-2.7.5
export HBASE_HOME=/home/hadoop/hbase-1.2.6
export HBASE_CLASSPATH=/usr/local/hadoop-2.7.5/etc/hadoop
export HBASE_PID_DIR=/home/hadoop/hbase-1.2.6/pids
export HBASE_MANAGES_ZK=false
- hbase-site.xml
<property>
  <name>hbase.rootdir</name>
  <value>hdfs://mycluster/hbase</value>
  <!-- must match fs.defaultFS in core-site.xml -->
  <description>The directory shared by region servers.</description>
</property>
<property>
  <name>hbase.zookeeper.property.clientPort</name>
  <value>2181</value>
  <description>Property from ZooKeeper's config zoo.cfg. The port at which the clients will connect.</description>
</property>
<property>
  <name>hbase.master</name>
  <value>60000</value>
  <!-- in HBase HA mode, only the port is needed -->
</property>
<property>
  <name>zookeeper.session.timeout</name>
  <value>120000</value>
</property>
<property>
  <name>hbase.zookeeper.quorum</name>
  <value>hadoop-m,hadoop-sm,slave1</value>
</property>
<property>
  <name>hbase.zookeeper.property.dataDir</name>
  <value>/usr/local/zookeeper-3.4.10/data/zkData</value>
</property>
<property>
  <name>hbase.tmp.dir</name>
  <value>/home/hadoop/hbase-1.2.6/tmp</value>
</property>
<property>
  <name>hbase.cluster.distributed</name>
  <value>true</value>
</property>
- regionservers (note: regionservers should normally run on the same servers as the datanodes, so that storage stays local and performance improves)
hadoop-m
hadoop-sm
slave1
slave2
Save the configuration, then sync the install directory to all 4 machines, using scp:
scp -r hbase-1.2.6 hadoop@slave2:/home/hadoop
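The manual scp can also be looped over the whole node list from the plan above. A dry-run sketch; the function name and the echo are mine, so pipe the output to sh, or drop the echo, to actually copy:

```shell
# Print the scp commands that would sync the HBase install
# directory to every other node in the cluster.
sync_hbase() {
  dir=$1; shift
  for node in "$@"; do
    echo "scp -r $dir hadoop@$node:/home/hadoop"
  done
}

sync_hbase /home/hadoop/hbase-1.2.6 hadoop-sm slave1 slave2
```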
hbase-site.xml parameter notes
- hbase.rootdir
This is the directory shared by the RegionServers, where HBase persists its data. Note in particular that the HDFS address in hbase.rootdir must match fs.defaultFS in Hadoop's core-site.xml exactly (IP or hostname, and port). (In an HA environment the address uses the dfs.nameservices logical name, and which NameNode is active is resolved via ZooKeeper.)
- hbase.cluster.distributed
The HBase run mode: false means standalone mode, true means distributed mode. If false, HBase and ZooKeeper run in the same JVM.
- hbase.master
With a single HMaster, the hbase.master property must be set to master:60000 (hostname:60000).
With multiple HMasters, only the port 60000 needs to be given, because electing the actual active master is handled by ZooKeeper.
- hbase.tmp.dir
A temporary directory on the local file system. Change it to a more persistent location, since /tmp is cleared on reboot.
- hbase.zookeeper.quorum
ZooKeeper configuration: hbase.zookeeper.quorum must list all of the ZooKeeper hosts, comma-separated. The default value is localhost, which obviously will not work for a distributed deployment.
- hbase.zookeeper.property.dataDir
This parameter sets where ZooKeeper stores its snapshots; the default is /tmp, which is cleared on reboot. Since ZooKeeper here is installed independently, this path points to the location set by dataDir in $ZOOKEEPER_HOME/conf/zoo.cfg.
- hbase.zookeeper.property.clientPort
The port clients use to connect to ZooKeeper.
- zookeeper.session.timeout
The ZooKeeper session timeout. HBase passes this value to the ZooKeeper ensemble, suggesting it as the maximum session timeout.
- hbase.regionserver.restart.on.zk.expire
When a regionserver hits a ZooKeeper session expiry, it will restart rather than abort.
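Note that this last parameter does not appear in the hbase-site.xml shown earlier; if you want the restart-instead-of-abort behavior, the entry would look like the sketch below (by default the regionserver aborts on session expiry):

```
<property>
  <name>hbase.regionserver.restart.on.zk.expire</name>
  <value>true</value>
</property>
```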
3 Startup and testing
3.1 Startup
HBase is built on Hadoop's distributed file system, so before starting HBase make sure Hadoop is running normally. HBase also depends on ZooKeeper; we could use the ZooKeeper bundled with HBase, but the configuration above enables our own ZooKeeper cluster, so make sure that ZooKeeper is running as well before starting HBase.
HBase can be installed on just one Hadoop namenode or on all of the Hadoop nodes; either way it only needs to be started from one node. In this example HBase is installed on all 4 nodes, and starting it from hadoop-m is enough.
On hadoop-m, run the following command from HBase's bin directory:
start-hbase.sh
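Since a failed start here usually means HDFS or ZooKeeper was not actually up, a small pre-flight check can help. A sketch that scans `jps`-style output for the standard daemon class names (the function name `check_prereqs` is mine; adjust the daemon list to each node's role):

```shell
# check_prereqs: succeed only if every required daemon appears
# in the given jps output; report each missing one.
check_prereqs() {
  jps_out=$1
  rc=0
  for daemon in NameNode DataNode QuorumPeerMain; do
    if ! printf '%s\n' "$jps_out" | grep -q "$daemon"; then
      echo "missing: $daemon"
      rc=1
    fi
  done
  return $rc
}

# On a real node: check_prereqs "$(jps)" && start-hbase.sh
check_prereqs "1234 NameNode
2345 DataNode
3456 QuorumPeerMain" && echo "prerequisites OK"
```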
3.2 Testing
View the HBase status page in a browser.
Open: http://192.168.31.50:16010/ (the HMaster web UI; each regionserver serves its own UI on port 16030)
3.2.2 Starting the HBase shell
Change into HBase's bin directory:
cd /home/hadoop/hbase-1.2.6/bin
Then launch the HBase shell:
./hbase shell
The full output is:
[hadoop@hadoop-m ~]$ hbase shell
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/hadoop/hbase-1.2.6/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/hadoop-2.7.5/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
HBase Shell; enter 'help<RETURN>' for list of supported commands.
Type "exit<RETURN>" to leave the HBase Shell
Version 1.2.6, rUnknown, Mon May 29 02:25:32 CDT 2017
hbase(main):001:0> status
1 active master, 1 backup masters, 4 servers, 0 dead, 0.5000 average load
hbase(main):002:0>
In HBase shell mode you can run a series of HBase commands to test the cluster.
Wait ~~ what about the promised high availability?
The backup masters simply have not been started yet.
On hadoop-sm, slave1 and slave2, run:
hbase-daemon.sh start master
Then run status again to check the state.
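The per-node step above can be driven from hadoop-m in one loop. A dry-run sketch (it assumes passwordless ssh and hbase-daemon.sh on each node's PATH; remove the echo to actually execute):

```shell
# Print the ssh commands that would start a backup HMaster on
# each of the remaining nodes from the cluster plan.
start_backup_masters() {
  for node in "$@"; do
    echo "ssh $node hbase-daemon.sh start master"
  done
}

start_backup_masters hadoop-sm slave1 slave2
```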
- Check the processes with jps:
On the active master and the backup master nodes you can see the HMaster and HRegionServer processes; since all 4 machines are configured as masters here, all 4 run HMaster.
On the remaining nodes you can see HRegionServer.
- Check in a browser: hadoop-m:16010, slave1:16010
You should see one Active Master and 3 Backup Masters.
To leave the HBase shell, type: exit
Shell operations:
1. Commonly used HBase commands
--enter the hbase shell
[grid@gc ~]$ hbase-0.90.5/bin/hbase shell
HBase Shell; enter 'help<RETURN>' for list of supported commands.
Type "exit<RETURN>" to leave the HBase Shell
Version 0.90.5, r1212209, Fri Dec 9 05:40:36 UTC 2011
hbase(main):001:0>
--check the database status
hbase(main):002:0> status
2 servers, 0 dead, 1.0000 average load
--check the database version
hbase(main):004:0> version
0.90.5, r1212209, Fri Dec 9 05:40:36 UTC 2011
--help command
hbase(main):003:0> help
HBase Shell, version 0.90.5, r1212209, Fri Dec 9 05:40:36 UTC 2011
Type 'help "COMMAND"', (e.g. 'help "get"' -- the quotes are necessary) for help on a specific command.
Commands are grouped. Type 'help "COMMAND_GROUP"', (e.g. 'help "general"') for help on a command group.
COMMAND GROUPS:
Group name: general
Commands: status, version
Group name: ddl
Commands: alter, create, describe, disable, drop, enable, exists, is_disabled, is_enabled, list
Group name: dml
Commands: count, delete, deleteall, get, get_counter, incr, put, scan, truncate
Group name: tools
Commands: assign, balance_switch, balancer, close_region, compact, flush, major_compact, move, split, unassign, zk_dump
Group name: replication
Commands: add_peer, disable_peer, enable_peer, remove_peer, start_replication, stop_replication
SHELL USAGE:
Quote all names in HBase Shell such as table and column names. Commas delimit
command parameters. Type <RETURN> after entering a command to run it.
Dictionaries of configuration used in the creation and alteration of tables are
Ruby Hashes. They look like this:
{'key1' => 'value1', 'key2' => 'value2', ...}
and are opened and closed with curley-braces. Key/values are delimited by the
'=>' character combination. Usually keys are predefined constants such as
NAME, VERSIONS, COMPRESSION, etc. Constants do not need to be quoted. Type
'Object.constants' to see a (messy) list of all constants in the environment.
If you are using binary keys or values and need to enter them in the shell, use
double-quote'd hexadecimal representation. For example:
hbase> get 't1', "key\x03\x3f\xcd"
hbase> get 't1', "key\003\023\011"
hbase> put 't1', "test\xef\xff", 'f1:', "\x01\x33\x40"
The HBase shell is the (J)Ruby IRB with the above HBase-specific commands added.
For more on the HBase Shell, see http://hbase.apache.org/docs/current/book.html
2. HBase table operation commands
--create a table
Logical model of the resume table:

Row key    | Timestamp | Family binfo           | Family edu              | Family work
lichangzai | T2        | binfo:age='1980-1-1'   |                         |
           | T3        | binfo:sex='man'        |                         |
           | T5        |                        | edu:mschool='rq no.1'   |
           | T6        |                        | edu:university='qhddx'  |
           | T7        |                        |                         | work:company1='12580'
changfei   | T10       | binfo:age='1986-2-1'   |                         |
           | T11       |                        | edu:university='bjdx'   |
           | T12       |                        |                         | work:company1='LG'
......     | Tn        |                        |                         |
--create the table
hbase(main):005:0> create 'resume','binfo','edu','work'
0 row(s) in 16.5710 seconds
--list tables
hbase(main):006:0> list
TABLE
resume
1 row(s) in 1.6080 seconds
--describe the table structure
hbase(main):007:0> describe 'resume'
DESCRIPTION ENABLED
{NAME => 'resume', FAMILIES => [{NAME => 'binfo', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', C true
OMPRESSION => 'NONE', VERSIONS => '3', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'fals
e', BLOCKCACHE => 'true'}, {NAME => 'edu', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', COMPRESS
ION => 'NONE', VERSIONS => '3', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLO
CKCACHE => 'true'}, {NAME => 'work', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', COMPRESSION =>
'NONE', VERSIONS => '3', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACH
E => 'true'}]}
1 row(s) in 1.8590 seconds
--add a column family
hbase(main):014:0> disable 'resume'
0 row(s) in 4.2630 seconds
hbase(main):015:0> alter 'resume', NAME => 'f1'
0 row(s) in 4.6990 seconds
--delete a column family
hbase(main):017:0> alter 'resume',{NAME=>'f1',METHOD=>'delete'}
0 row(s) in 1.1390 seconds
--or
hbase(main):021:0> alter 'resume','delete' => 'f1'
0 row(s) in 1.9310 seconds
hbase(main):022:0> enable 'resume'
0 row(s) in 5.9060 seconds
Note:
(1) ddl commands are case-sensitive: commands such as alter, create, drop, enable must be lowercase, while attribute names inside {} must be uppercase.
(2) A table must be disabled before alter or drop and enabled again afterwards, otherwise the command fails with an error.
--check the disabled status
hbase(main):024:0> is_disabled 'resume'
false
0 row(s) in 0.4930 seconds
hbase(main):021:0> is_enabled 'resume'
true
0 row(s) in 0.2450 seconds
--drop a table
hbase(main):015:0> create 't1','f1'
0 row(s) in 15.3730 seconds
hbase(main):016:0> disable 't1'
0 row(s) in 6.4840 seconds
hbase(main):017:0> drop 't1'
0 row(s) in 7.3730 seconds
--check whether a table exists
hbase(main):018:0> exists 'resume'
Table resume does exist
0 row(s) in 2.3900 seconds
hbase(main):019:0> exists 't1'
Table t1 does not exist
0 row(s) in 1.3270 seconds
--insert data
put 'resume','lichangzai','binfo:age','1980-1-1'
put 'resume','lichangzai','binfo:sex','man'
put 'resume','lichangzai','edu:mschool','rq no.1'
put 'resume','lichangzai','edu:university','qhddx'
put 'resume','lichangzai','work:company1','12580'
put 'resume','lichangzai','work:company2','china mobile'
put 'resume','lichangzai','binfo:site','blog.csdn.net/lichangzai'
put 'resume','lichangzai','binfo:mobile','13712345678'
put 'resume','changfei','binfo:age','1986-2-1'
put 'resume','changfei','edu:university','bjdx'
put 'resume','changfei','work:company1','LG'
put 'resume','changfei','binfo:mobile','13598765401'
put 'resume','changfei','binfo:site','hi.baidu/lichangzai'
--get all data for a row key
hbase(main):014:0> get 'resume','lichangzai'
COLUMN CELL
binfo:age timestamp=1356485720612, value=1980-1-1
binfo:mobile timestamp=1356485865523, value=13712345678
binfo:sex timestamp=1356485733603, value=man
binfo:site timestamp=1356485859806, value=blog.csdn.net/lichangzai
edu:mschool timestamp=1356485750361, value=rq no.1
edu:university timestamp=1356485764211, value=qhddx
work:company1 timestamp=1356485837743, value=12580
work:company2 timestamp=1356485849365, value=china mobile
8 row(s) in 2.1090 seconds
Note: data must be looked up by Row Key
--get all data for one row key and one column family
hbase(main):015:0> get 'resume','lichangzai','binfo'
COLUMN CELL
binfo:age timestamp=1356485720612, value=1980-1-1
binfo:mobile timestamp=1356485865523, value=13712345678
binfo:sex timestamp=1356485733603, value=man
binfo:site timestamp=1356485859806, value=blog.csdn.net/lichangzai
4 row(s) in 1.6010 seconds
--get the data for one row key, one column of one column family
hbase(main):017:0> get 'resume','lichangzai','binfo:sex'
COLUMN CELL
binfo:sex timestamp=1356485733603, value=man
1 row(s) in 0.8980 seconds
--update a record
hbase(main):018:0> put 'resume','lichangzai','binfo:mobile','13899999999'
0 row(s) in 1.7640 seconds
hbase(main):019:0> get 'resume','lichangzai','binfo:mobile'
COLUMN CELL
binfo:mobile timestamp=1356486691591, value=13899999999
1 row(s) in 1.5710 seconds
Note: an update is really just inserting a new record with a newer timestamp; get shows only the record with the latest timestamp
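Because the column families here keep VERSIONS => '3' (see the describe output above), the overwritten cell is still stored; a sketch of asking the shell for several versions at once (the output depends on your data, so it is not shown):

```
get 'resume','lichangzai',{COLUMN=>'binfo:mobile',VERSIONS=>3}
```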
--fetch data by timestamp
------query the data with the latest timestamp
hbase(main):020:0> get 'resume','lichangzai',{COLUMN=>'binfo:mobile',TIMESTAMP=>1356486691591}
COLUMN CELL
binfo:mobile timestamp=1356486691591, value=13899999999
1 row(s) in 0.4060 seconds
------query the data with an earlier timestamp (i.e. the superseded version)
hbase(main):021:0> get 'resume','lichangzai',{COLUMN=>'binfo:mobile',TIMESTAMP=>1356485865523}
COLUMN CELL
binfo:mobile timestamp=1356485865523, value=13712345678
1 row(s) in 0.7780 seconds
--full table scan
hbase(main):022:0> scan 'resume'
ROW COLUMN+CELL
changfei column=binfo:age, timestamp=1356485874056, value=1986-2-1
changfei column=binfo:mobile, timestamp=1356485897477, value=13598765401
changfei column=binfo:site, timestamp=1356485906106, value=hi.baidu/lichangzai
changfei column=edu:university, timestamp=1356485880977, value=bjdx
changfei column=work:company1, timestamp=1356485888939, value=LG
lichangzai column=binfo:age, timestamp=1356485720612, value=1980-1-1
lichangzai column=binfo:mobile, timestamp=1356486691591, value=13899999999
lichangzai column=binfo:sex, timestamp=1356485733603, value=man
lichangzai column=binfo:site, timestamp=1356485859806, value=blog.csdn.net/lichangzai
lichangzai column=edu:mschool, timestamp=1356485750361, value=rq no.1
lichangzai column=edu:university, timestamp=1356485764211, value=qhddx
lichangzai column=work:company1, timestamp=1356485837743, value=12580
lichangzai column=work:company2, timestamp=1356485849365, value=china mobile
2 row(s) in 3.6300 seconds
--delete a column-family column for a given row key
hbase(main):023:0> put 'resume','changfei','binfo:sex','man'
0 row(s) in 1.2630 seconds
hbase(main):024:0> delete 'resume','changfei','binfo:sex'
0 row(s) in 0.5890 seconds
hbase(main):026:0> get 'resume','changfei','binfo:sex'
COLUMN CELL
0 row(s) in 0.5560 seconds
--delete an entire row
hbase(main):028:0> create 't1','f1','f2'
0 row(s) in 8.3950 seconds
hbase(main):029:0> put 't1','a','f1:col1','xxxxx'
0 row(s) in 2.6790 seconds
hbase(main):030:0> put 't1','a','f1:col2','xyxyx'
0 row(s) in 0.5130 seconds
hbase(main):031:0> put 't1','b','f2:cl1','ppppp'
0 row(s) in 1.2620 seconds
hbase(main):032:0> deleteall 't1','a'
0 row(s) in 1.2030 seconds
hbase(main):033:0> get 't1','a'
COLUMN CELL
0 row(s) in 0.8980 seconds
--count the rows in a table
hbase(main):035:0> count 'resume'
2 row(s) in 2.8150 seconds
hbase(main):036:0> count 't1'
1 row(s) in 0.9500 seconds
--truncate a table
hbase(main):034:0> truncate 't1'
Truncating 't1' table (it may take a while):
- Disabling table...
- Truncating table...
0 row(s) in 4.1070 seconds
Note on how truncate works: since the HDFS file system does not allow files to be modified in place, the only way to empty a table is to drop it and recreate it.
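In other words, truncate 't1' is roughly shorthand for the following manual sequence (family names taken from the create further above):

```
disable 't1'
drop 't1'
create 't1','f1','f2'
```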