http://space.itpub.net/?uid-26686207-action-viewspace-itemid-747896
1. Installation Environment
Physical laptop: i5 2.27GHz (4 CPUs), 4GB RAM, 320GB disk, 32-bit Windows 7
Virtual machine: VMware® Workstation Version 7.0.0 build-203739
VM setup guide (for anyone unsure how to configure it, covering VMware Tools and Linux/Windows shared folders): http://ideapad.it168.com/thread-2088751-1-1.html
Linux VMs: master (h1), slave1 (h2), slave2 (h4), each configured with:
CPU: 1 socket, 2 cores
RAM: 512MB
Disk: 10GB
Linux ISO: CentOS-6.0-i386-bin-DVD.iso (32-bit)
JDK version: "1.6.0_25-ea"
Hadoop software version: hadoop-0.20.205.0.tar.gz
Eclipse version: eclipse-SDK-4.2-linux-gtk.tar.gz and eclipse-SDK-4.2.1-linux-gtk.tar.gz
2. HBase Installation Modes
Standalone mode
Pseudo-distributed mode: a single host simulating a distributed cluster
Fully distributed mode: used in this tutorial to deploy the HBase column-oriented database
3. Pre-deployment Preparation
(1) JDK installed, version 1.6 or later; for installation see step 4 of this post:
http://f.dataguru.cn/forum.php?mod=viewthread&tid=18315&fromuid=303
(2) Hadoop cluster installed; for installation steps see
http://f.dataguru.cn/forum.php?mod=viewthread&tid=18315&fromuid=303
(4) Choosing an HBase version
HBase and Hadoop versions must match: each HBase release pairs with specific Hadoop releases, and a mismatched pair fails to install. To find the compatibility matrix quickly, either search the web or check the official documentation.
Hadoop version used in this install: hadoop-0.20.205.0.tar.gz
HBase version used in this install: hbase-0.90.5
(5) Friendly tip: a healthy hbase-0.90.5.tar.gz is 31662866 bytes.
Why mention this? Network glitches sometimes truncate the download without any obvious sign. Installing from an incomplete package then fails, and the failure is hard to trace back to the download. A small but hard-won lesson.
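Since a truncated download is the failure mode to watch for, it is worth checking the byte count before installing. A minimal sketch (the check_size helper and the /tmp demo file are mine, not part of the tutorial):

```shell
# check_size FILE EXPECTED_BYTES - warn when a download is incomplete.
check_size() {
  local file="$1" expected="$2" actual
  actual=$(wc -c < "$file" | tr -d ' ')
  if [ "$actual" -eq "$expected" ]; then
    echo "OK: $file is $actual bytes"
  else
    echo "WARNING: $file is $actual bytes, expected $expected - re-download it"
  fi
}

# Real use: check_size hbase-0.90.5.tar.gz 31662866
# Self-contained demo on a throwaway 3-byte file:
printf 'abc' > /tmp/size-demo.bin
check_size /tmp/size-demo.bin 3
check_size /tmp/size-demo.bin 31662866
```

When the Apache mirror publishes an md5 or sha1 checksum, comparing against that is an even stronger check than the size alone.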
(6) Verify that the Hadoop cluster started correctly
There are generally three ways to verify; see
http://f.dataguru.cn/forum.php?mod=viewthread&tid=23054&fromuid=303
1. bin/hadoop dfsadmin -report (shown below)
2. jps on the master, for example:
21465 Jps
15378 SecondaryNameNode
15248 NameNode
15443 JobTracker
3.
We use the first method, which is the most convenient:
[grid@h1 hadoop-0.20.2]$ bin/hadoop dfsadmin -report
Configured Capacity: 19865944064 (18.5 GB)
Present Capacity: 8932794368 (8.32 GB)
DFS Remaining: 8932655104 (8.32 GB)
DFS Used: 139264 (136 KB)
DFS Used%: 0%
Under replicated blocks: 4
Blocks with corrupt replicas: 0
Missing blocks: 0
-------------------------------------------------
Datanodes available: 2 (2 total, 0 dead)
Name: 192.168.2.103:50010
Decommission Status : Normal
Configured Capacity: 9932972032 (9.25 GB)
DFS Used: 69632 (68 KB)
Non DFS Used: 5351727104 (4.98 GB)
DFS Remaining: 4581175296(4.27 GB)
DFS Used%: 0%
DFS Remaining%: 46.12%
Last contact: Sun Oct 28 18:18:08 CST 2012
Name: 192.168.2.105:50010
Decommission Status : Normal
Configured Capacity: 9932972032 (9.25 GB)
DFS Used: 69632 (68 KB)
Non DFS Used: 5581422592 (5.2 GB)
DFS Remaining: 4351479808(4.05 GB)
DFS Used%: 0%
DFS Remaining%: 43.81%
Last contact: Sun Oct 28 18:18:09 CST 2012
(7) Check the contents of /etc/hosts
192.168.2.102
192.168.2.103
192.168.2.105
Why: Hadoop generally addresses cluster nodes by hostname, so the IP-to-hostname mapping of every node must be written into /etc/hosts or communication breaks. If you used raw IPs in the configuration files and things misbehave, switch them to the corresponding hostnames.
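A complete /etc/hosts pairs each IP with its hostname. The assignment below is an assumption based on the h1/h2/h4 naming in section 1 and the datanode addresses in the dfsadmin report; verify it against your own cluster:

```
192.168.2.102   h1
192.168.2.103   h2
192.168.2.105   h4
```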
4. Fully Distributed HBase Installation and Configuration
(1) Upload hbase-0.90.5.tar.gz to h1:/home/grid/
[grid@h1 grid]$ pwd
/home/grid
[grid@h1 grid]$ ll
total 30996
(listing truncated; it should show the hadoop-0.20.2 directory and the uploaded hbase-0.90.5.tar.gz)
Mind the size: 31662866 bytes. A damaged download will not install successfully :)
(2) Unpack the archive
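The unpack command itself is not shown in the original; on h1 it would typically be `tar -zxvf hbase-0.90.5.tar.gz` run in /home/grid. Below is a self-contained sketch of the same tar invocation against a throwaway archive (all /tmp paths are illustrative):

```shell
# Real run on h1:
#   cd /home/grid && tar -zxvf hbase-0.90.5.tar.gz   # creates ./hbase-0.90.5/

# Demo: build a tiny gzipped tarball, then extract it the same way.
mkdir -p /tmp/unpack-demo/hbase-0.90.5/conf
echo 'export HBASE_MANAGES_ZK=true' > /tmp/unpack-demo/hbase-0.90.5/conf/hbase-env.sh
tar -zcf /tmp/unpack-demo/demo.tar.gz -C /tmp/unpack-demo hbase-0.90.5

mkdir -p /tmp/unpack-demo/out
tar -zxf /tmp/unpack-demo/demo.tar.gz -C /tmp/unpack-demo/out
ls /tmp/unpack-demo/out/hbase-0.90.5/conf
```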
(3) Replace the Hadoop core jar
Note: different versions require replacing different jars.
The Hadoop installed here is 0.20.2, so
/home/grid/hadoop-0.20.2/hadoop-0.20.2-core.jar
must replace
/home/grid/hbase-0.90.5/lib/hadoop-core-0.20-append-r1056497.jar
This resolves the Hadoop/HBase kernel incompatibility: the Hadoop jar shipped in HBase's lib directory is newer than the installed 0.20.2 release, so it has to be swapped for the 0.20.2 jar — the official documentation says as much.
Also copy over: cp ~/hadoop-0.20.2/lib/commons-configuration-1.6.jar ~/hbase-0.90.5/lib/
(If you use Cloudera's packaged Hadoop and HBase, you can skip the jar swap entirely: Cloudera has already resolved the compatibility issues.)
First rename hadoop-core-0.20-append-r1056497.jar as a backup, then copy the new jar in:
mv hadoop-core-0.20-append-r1056497.jar hadoop-core-0.20-append-r1056497.jar.bak
cp /home/grid/hadoop-0.20.2/hadoop-0.20.2-core.jar /home/grid/hbase-0.90.5/lib/
Give it execute permission:
chmod 755 hadoop-0.20.2-core.jar
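The commands above can be wrapped in one small script with existence checks, so a mistyped path fails loudly instead of silently doing nothing. The swap_jar helper and the /tmp demo paths are my own; the real arguments are shown in the comment:

```shell
# swap_jar HADOOP_HOME HBASE_LIB - back up HBase's bundled hadoop jar and
# copy in the installed Hadoop's core jar plus commons-configuration.
swap_jar() {
  local hh="$1" hl="$2"
  local old="$hl/hadoop-core-0.20-append-r1056497.jar"
  local new="$hh/hadoop-0.20.2-core.jar"
  [ -f "$new" ] || { echo "missing $new"; return 1; }
  [ -f "$old" ] && mv "$old" "$old.bak"      # keep the original for rollback
  cp "$new" "$hl/"
  chmod 755 "$hl/hadoop-0.20.2-core.jar"
  cp "$hh/lib/commons-configuration-1.6.jar" "$hl/"
}

# Real run: swap_jar /home/grid/hadoop-0.20.2 /home/grid/hbase-0.90.5/lib
# Demo on throwaway dirs so the sketch can be executed anywhere:
mkdir -p /tmp/swap-demo/hadoop/lib /tmp/swap-demo/hbase-lib
touch /tmp/swap-demo/hadoop/hadoop-0.20.2-core.jar \
      /tmp/swap-demo/hadoop/lib/commons-configuration-1.6.jar \
      /tmp/swap-demo/hbase-lib/hadoop-core-0.20-append-r1056497.jar
swap_jar /tmp/swap-demo/hadoop /tmp/swap-demo/hbase-lib
```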
(4) Edit /home/grid/hbase-0.90.5/conf/hbase-env.sh
# The java implementation to use.
export JAVA_HOME=/usr/java/jdk1.6.0_25
# Point at the JDK install directory so HBase can find the JDK.
# Extra Java CLASSPATH elements.
export HBASE_CLASSPATH=/home/grid/hadoop-0.20.2/conf
# Point at the Hadoop configuration directory; HBase is a database built on top of Hadoop and needs to find it.
# Tell HBase whether it should manage it's own instance of Zookeeper or not.
export HBASE_MANAGES_ZK=true
# Let HBase start its bundled ZooKeeper.
# Where log files are stored.
# export HBASE_LOG_DIR=${HBASE_HOME}/logs
# HBase logs default to logs/ under $HBASE_HOME; uncomment and edit this to relocate them.
(5) Edit /home/grid/hbase-0.90.5/conf/hbase-site.xml
Note: do set the data directory here. By default the data lands under /tmp, which Linux clears on reboot — at which point your data is gone for good. Don't be the guinea pig!
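The configuration listing itself did not survive in this copy of the post. For reference, a typical hbase-site.xml for this three-node layout might look like the sketch below — the hdfs://h1:9000 root (taken from a common fs.default.name setting), the ZooKeeper quorum, and the tmp directory are assumptions to adapt to your own Hadoop configuration:

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Illustrative values - match hbase.rootdir to your fs.default.name -->
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://h1:9000/hbase</value>
  </property>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>h1,h2,h4</value>
  </property>
  <property>
    <!-- Keep the data directory off /tmp, as the note above warns -->
    <name>hbase.tmp.dir</name>
    <value>/home/grid/hbase-0.90.5/tmp</value>
  </property>
</configuration>
```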
(6) Edit /home/grid/hbase-0.90.5/conf/regionservers
Replace the default localhost entry with:
h2
h4
[grid@h1 conf]$ cat regionservers
h2
h4
(7) Sync the configured hbase-0.90.5 directory to every node; in my cluster, from h1 to h2 and h4:
scp -r /home/grid/hbase-0.90.5 h2:/home/grid/
scp -r /home/grid/hbase-0.90.5 h4:/home/grid/
The -r flag copies the directory recursively.
[grid@h2 ~]$ ll
drwxr-xr-x. ... hbase-0.90.5
[grid@h4 ~]$ ll
drwxr-xr-x. ... hbase-0.90.5
5. Starting and Stopping the HBase Cluster
With the configuration above done, we can start the HBase cluster. Before starting, check whether the Hadoop cluster is already up: Hadoop must be started before HBase. The reason should be obvious — Hadoop is HBase's host.
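That start order can be guarded with a small script: scan the jps output for the Hadoop daemons before invoking start-hbase.sh. The require_procs helper below is my own sketch; it reads a process listing from stdin so it can be tried without a live cluster:

```shell
# require_procs - fail unless every expected Hadoop master daemon
# appears in the process listing supplied on stdin.
require_procs() {
  local listing missing=0 p
  listing=$(cat)
  for p in NameNode SecondaryNameNode JobTracker; do
    echo "$listing" | grep -q "$p" || { echo "missing: $p"; missing=1; }
  done
  return $missing
}

# Real use on h1: jps | require_procs && bin/start-hbase.sh
# Demo with a canned listing:
printf '4515 NameNode\n4650 SecondaryNameNode\n4709 JobTracker\n' \
  | require_procs && echo 'Hadoop is up - safe to start HBase'
```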
[grid@h1 hbase-0.90.5]$ bin/start-hbase.sh
h2:starting zookeeper, logging to
/home/grid/hbase-0.90.5/bin/../logs/hbase-grid-zookeeper-h2.out
h4:starting zookeeper, logging to
/home/grid/hbase-0.90.5/bin/../logs/hbase-grid-zookeeper-h4.out
h1:starting zookeeper, logging to
/home/grid/hbase-0.90.5/bin/../logs/hbase-grid-zookeeper-h1.out
starting master, logging to
/home/grid/hbase-0.90.5/bin/../logs/hbase-grid-master-h1.out
h4:starting regionserver, logging to
/home/grid/hbase-0.90.5/bin/../logs/hbase-grid-regionserver-h4.out
h2:starting regionserver, logging to
/home/grid/hbase-0.90.5/bin/../logs/hbase-grid-regionserver-h2.out
On h1:
[grid@h1 hbase-0.90.5]$ jps
8817 HMaster
9149 Jps
4709 JobTracker
4515 NameNode
4650 SecondaryNameNode
8781 HQuorumPeer
On h2:
[grid@h2 ~]$ jps
17188 TaskTracker
31445 HRegionServer
31355 HQuorumPeer
17077 DataNode
On h4:
[grid@h4 ~]$ jps
27829 TaskTracker
17119 DataNode
29134 HQuorumPeer
29208 HRegionServer
Browser verification
When browsing the web UI, it warns: "You are currently running the HMaster without HDFS append support enabled. This may result in data loss. Please see the HBase wiki for details."
Checking hdfs-default.xml confirms the cause: Hadoop 0.20.2 has a bug and cannot support HDFS append, so there is nothing to be done about it here. With other Hadoop versions you may not see this warning.
Stopping the HBase cluster
[grid@h1 bin]$ ./stop-hbase.sh
stopping hbase
h2: stopping zookeeper..
h4: stopping zookeeper...
h1: stopping zookeeper..
Summary: the HBase installation is now complete. During the procedure, keep in mind that different versions require different files to be replaced, and respect the version-compatibility requirements. If you installed inside a VMware VM, then after rebooting the machines you may find the HMaster, HQuorumPeer, and HRegionServer processes failing to start to varying degrees. In that case run bin/stop-hbase.sh to stop all cluster processes, then bin/start-hbase.sh to start them again. Only operate on the database once every process has started normally; otherwise operations are rejected with errors.
6. Working in the HBase Shell
(1) Enter the shell
[grid@h1 hbase-0.90.5]$ bin/hbase shell
HBase Shell; enter 'help' for list of supported commands. (help lists the available commands)
Type "exit" to leave the HBase Shell (exit quits the shell)
Version 0.90.5, r1212209, Fri Dec
hbase(main):001:0> exit
(2) Check database status
hbase(main):001:0> status
2 servers, 0 dead, 1.0000 average load
Two region servers alive, none dead (i.e. unreachable); the average load figure grows as cluster load grows.
(3) Shell help
hbase(main):001:0> help
HBase Shell, version 0.90.5, r1212209, Fri Dec
Type 'help "COMMAND"', (e.g. 'help "get"' -- the quotes are necessary) for help on a specific command.
Commands are grouped. Type 'help "COMMAND_GROUP"', (e.g. 'help "general"') for help on a command group.
COMMAND GROUPS:
SHELL USAGE:
Quote all names in HBase Shell such as table and column names. Commas delimit
command parameters.
Dictionaries of configuration used in the creation and alteration of tables are
Ruby Hashes. They look like this:
  {'key1' => 'value1', 'key2' => 'value2', ...}
and are opened and closed with curley-braces. Key/values are delimited by the
'=>' character combination. Usually keys are predefined constants such as
NAME, VERSIONS, COMPRESSION, etc. Constants do not need to be quoted. Type
'Object.constants' to see a (messy) list of all constants in the environment.
If you are using binary keys or values and need to enter them in the shell, use
double-quote'd hexadecimal representation. For example:
  hbase> get 't1', "key\x03\x3f\xcd"
The HBase shell is the (J)Ruby IRB with the above HBase-specific commands added.
For more on the HBase Shell, see http://hbase.apache.org/docs/current/book.html
hbase(main):002:0> hbase(main):001:0> help
SyntaxError: (hbase):2: syntax error, unexpected ':'
(4) Check the database version
hbase(main):002:0> version
0.90.5, r1212209, Fri Dec
(5) Create tables
Note: apart from tables, HBase has no other database objects, so the create command is all you need.
For my packet-network signaling monitoring system, a hot-sites table:
Table name: heat_sites
Column family 1: msisdn
Column family 2: user
Column family 3: sites
Syntax: create 'heat_sites','msisdn','user','sites'
hbase(main):001:0> create 'heat_sites','msisdn','user','sites'
0 row(s) in 24.3540 seconds
And a user business-preference analysis table:
Table name: user_business
Column family 1: msisdn
Column family 2: user
Column family 3: business
Syntax: create 'user_business','msisdn','user','business'
hbase(main):002:0> create 'user_business','msisdn','user','business'
0 row(s) in 1.7390 seconds
(6) List all tables
hbase(main):003:0> list
TABLE
heat_sites
user_business
2 row(s) in 0.3040 seconds
(7) Describe a table's structure
hbase(main):004:0> describe 'heat_sites'
DESCRIPTION
{NAME => 'heat_sites', FAMILIES => [{NAME => 'msisdn', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', COM true
PRESSION => 'NONE', VERSIONS => '3', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCK
CACHE => 'true'}, {NAME => 'sites', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', COMPRESSION => 'NONE',
NAME => 'user', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', COMPRESSION => 'NONE', VERSIONS => '3', TT
L => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}
1 row(s) in 0.2780 seconds
hbase(main):005:0> describe 'user_business'
DESCRIPTION
{NAME => 'user_business', FAMILIES => [{NAME => 'business', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0' true
, COMPRESSION => 'NONE', VERSIONS => '3', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false',
BLOCKCACHE => 'true'}, {NAME => 'msisdn', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', COMPRESSION => '
NONE', VERSIONS => '3', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'tru
e'}, {NAME => 'user', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', COMPRESSION => 'NONE', VERSIONS => '
3', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}
1 row(s) in 0.0370 seconds
(8) Delete a column family from a table
HBase requires that any change to a table's structure follow: disable the table -> alter it -> enable it.
hbase(main):007:0> alter 'user_business',{NAME=>'user',METHOD=>'delete'}
ERROR: Table user_business is enabled. Disable it first before altering.
The error says the table is enabled and must be disabled before altering. Note that keywords in the syntax are case-sensitive.
hbase(main):008:0> disable 'user_business'
0 row(s) in 2.1850 seconds
hbase(main):009:0> alter 'user_business',{NAME=>'user',METHOD=>'delete'}
0 row(s) in 0.0950 seconds
hbase(main):010:0> enable 'user_business'
(9) Drop a table
As with any change to a table, you must disable it first; dropping is no exception.
hbase(main):011:0> disable 'user_business'
0 row(s) in 2.6690 seconds
hbase(main):013:0> is_disabled 'user_business'
true
0 row(s) in 0.0150 seconds
hbase(main):014:0> drop 'user_business'
0 row(s) in 1.9930 seconds
(10) Check whether a table exists
hbase(main):015:0> exists 'heat_sites'
Table heat_sites does exist
0 row(s) in 0.1540 seconds
hbase(main):016:0> exists 'user_business'
Table user_business does not exist
0 row(s) in 0.0650 seconds
(11) Check whether a table is enabled or disabled
hbase(main):005:0* is_enabled 'heat_sites'
true
0 row(s) in 0.0550 seconds
hbase(main):006:0> is_disabled 'user_business'
false
0 row(s) in 0.9820 seconds
hbase(main):007:0> is_enabled 'user_business'
true
0 row(s) in 0.0150 seconds
(12) Insert records
A. For HBase there is no real difference between insert and update: both work as puts.
B. HBase has no notion of data types; everything is stored as "character" data, and the meaning is applied by the application.
C. Every inserted record automatically receives a system-generated timestamp. You can also force an explicit timestamp — for example, when inserting into n tables and requiring identical timestamps on all the records.
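Point C in practice: put accepts an optional timestamp as its last argument. This is only a sketch of the syntax (it needs a running cluster, and the timestamp value here is illustrative):

```
hbase(main):030:0> put 'heat_sites','leonarding','msisdn:id','13672122125',1351563680951
```

Each cell keeps up to VERSIONS timestamped values (3 by default, as the describe output above shows), and get can select among them by timestamp.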
hbase(main):015:0* put 'heat_sites','leonarding','msisdn:id','13672122125'
0 row(s) in 0.6950 seconds
hbase(main):016:0> put 'heat_sites','leonarding','msisdn:*#06#','100'
0 row(s) in 0.0920 seconds
hbase(main):017:0> put 'heat_sites','leonarding','user:name','liusheng'
0 row(s) in 0.1620 seconds
hbase(main):018:0> put 'heat_sites','leonarding','user:age','28'
0 row(s) in 0.0410 seconds
hbase(main):019:0> put 'heat_sites','leonarding','sites:http','www.dataguru.cn'
0 row(s) in 0.2090 seconds
hbase(main):020:0> put 'heat_sites','leonarding','sites:name','lianshuchengjin'
0 row(s) in 0.0460 seconds
hbase(main):021:0> put 'heat_sites','sunev_yu','msisdn:id','18866662222'
0 row(s) in 0.0570 seconds
hbase(main):022:0> put 'heat_sites','sunev_yu','msisdn:*#06#','101'
0 row(s) in 0.0200 seconds
hbase(main):023:0> put 'heat_sites','sunev_yu','user:name','yushuanghai'
0 row(s) in 0.0110 seconds
hbase(main):024:0> put 'heat_sites','sunev_yu','user:age','26'
0 row(s) in 0.3310 seconds
hbase(main):025:0> put 'heat_sites','sunev_yu','sites:http','www.dataguru.cn'
0 row(s) in 0.0530 seconds
hbase(main):026:0>
hbase(main):027:0* put 'heat_sites','sunev_yu','sites:name','lianshuchengjin'
0 row(s) in 0.0790 seconds
hbase(main):028:0> put 'heat_sites','tigerfish','msisdn:id','15911112222'
0 row(s) in 0.0350 seconds
hbase(main):029:0> put 'heat_sites','tigerfish','msisdn:*#06#','102'
0 row(s) in 0.0360 seconds
hbase(main):001:0> put 'heat_sites','tigerfish','user:name','huangzhihong'
0 row(s) in 0.5160 seconds
hbase(main):002:0> put 'heat_sites','tigerfish','user:age','100'
0 row(s) in 0.0430 seconds
hbase(main):003:0> put 'heat_sites','tigerfish','sites:http','www.itpub.net'
0 row(s) in 0.0150 seconds
hbase(main):004:0> put 'heat_sites','tigerfish','sites:name','itpub'
0 row(s) in 0.0460 seconds
(13) Get all data for one row key
Note: data must be read via its row key.
hbase(main):001:0> get 'heat_sites','leonarding'
COLUMN
msisdn:*#06#
msisdn:id
sites:http
sites:name
user:age
user:name
hbase(main):006:0* get 'heat_sites','sunev_yu'
COLUMN
msisdn:*#06#
msisdn:id
sites:http
sites:name
user:age
user:name
hbase(main):005:0> get 'heat_sites','tigerfish'
COLUMN
msisdn:*#06#
msisdn:id
sites:http
sites:name
user:age
user:name
(14) Get all data for one row key within one column family (i.e. specify the column family, also called the column key)
hbase(main):006:0> get 'heat_sites','leonarding','sites'
COLUMN
sites:http
sites:name
2 row(s) in 0.0760 seconds
hbase(main):009:0> get 'heat_sites','sunev_yu','msisdn'
COLUMN
msisdn:*#06#
msisdn:id
2 row(s) in 0.0370 seconds
hbase(main):010:0> get 'heat_sites','tigerfish','user'
COLUMN
user:age
user:name
2 row(s) in 0.0320 seconds
(15) Get the data for one row key, one column of one column family
hbase(main):011:0> get 'heat_sites','leonarding','user:name'
COLUMN
user:name
1 row(s) in 0.0360 seconds
hbase(main):012:0> get 'heat_sites','sunev_yu','msisdn:id'
COLUMN
msisdn:id
1 row(s) in 0.0850 seconds
hbase(main):013:0> get 'heat_sites','tigerfish','sites:http'
COLUMN
sites:http
1 row(s) in 0.0110 seconds
(16) Update a record
In essence the same as inserting one:
hbase(main):014:0> put 'heat_sites','leonarding','msisdn:id','18977777777'
0 row(s) in 0.0220 seconds
hbase(main):003:0* get 'heat_sites','leonarding','msisdn:id'
COLUMN
msisdn:id
1 row(s) in 1.2500 seconds
(17) Get data by timestamp
hbase(main):004:0> get 'heat_sites','leonarding',{COLUMN=>'msisdn:id',TIMESTAMP=>1351563680951}
COLUMN
msisdn:id
1 row(s) in 0.0230 seconds
(18) Full table scan
hbase(main):005:0> scan 'heat_sites'
ROW
leonarding
leonarding
leonarding
leonarding
leonarding
leonarding
sunev_yu
sunev_yu
sunev_yu
sunev_yu
sunev_yu
sunev_yu
tigerfish
tigerfish
tigerfish
tigerfish
tigerfish
tigerfish
3 row(s) in 0.4740 seconds
Note: 18 lines are shown above, but remember that HBase storage is organized by row key, which is why the count reads 3 row(s): one row key is one actual row.
(19) Delete one column family:column from a given row key
hbase(main):006:0> create 'user_business','msisdn','user','business'
0 row(s) in 2.8970 seconds
hbase(main):007:0> put 'user_business','leonarding','business:type','E-mail'
0 row(s) in 0.1930 seconds
hbase(main):012:0> put 'user_business','leonarding','user:name','liusheng'
0 row(s) in 0.1360 seconds
hbase(main):008:0> get 'user_business','leonarding'
COLUMN
business:type
user:name
row(s) in 0.1330 seconds
hbase(main):009:0>delete 'user_business','leonarding','business:type'
0 row(s) in 0.0970 seconds
hbase(main):015:0> get 'user_business','leonarding'
COLUMN
user:name
1 row(s) in 0.1420 seconds
(20) Delete an entire row
hbase(main):020:0> deleteall 'user_business','leonarding'
0 row(s) in 0.0460 seconds
hbase(main):021:0> get 'user_business','leonarding'
COLUMN
0 row(s) in 0.0150 seconds
(21) Count the rows in a table
hbase(main):022:0> count 'heat_sites'
3 row(s) in 0.1310 seconds
(22) Truncate a table
hbase(main):023:0> put 'user_business','leonarding','user:name','liusheng'
0 row(s) in 0.0960 seconds
hbase(main):024:0> get 'user_business','leonarding'
COLUMN
user:name
1 row(s) in 0.0150 seconds
hbase(main):025:0> truncate 'user_business'
Truncating 'user_business' table (it may take a while):
- Disabling table...
- Dropping table...
- Creating table...
0 row(s) in 8.2200 seconds
hbase(main):026:0> get 'user_business','leonarding'
COLUMN
0 row(s) in 1.2120 seconds
How truncate works: HDFS files cannot be modified in place, so the only way to empty a table is to drop it and recreate it.
Summary: this chapter walked through managing and using the HBase cluster, covering the common operations in detail: creating, altering, dropping, and truncating tables, plus inserting, updating, and deleting records. The key point to remember: you cannot alter a table's attributes directly; you must disable -> alter -> enable. This is an important HBase characteristic.