HBase is the Hadoop database, providing random, real-time read/write access to your big data. It is an open-source implementation of Google's Bigtable; see the paper "Bigtable: A Distributed Storage System for Structured Data". HBase's storage model can be summarized in three words: distributed, versioned, column-oriented. HBase is not limited to HDFS; you can also deploy an HBase instance on your local filesystem to store data.
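"Versioned", for example, means that each cell can hold multiple timestamped values; in the HBase shell you can request several versions of a single cell (a sketch, with illustrative table and column names):
- hbase> get 't1', 'row1', {COLUMN => 'cf:q', VERSIONS => 3}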
Preparation
- hbase-0.90.4.tar.gz [http://labs.renren.com/apache-mirror//hbase/stable/hbase-0.90.4.tar.gz]
- zookeeper-3.3.4.tar.gz
The Standalone and Distributed installation procedures are described below.
Standalone mode
In this mode, you install and configure a single HBase instance on your local filesystem; the setup is straightforward.
First, make sure you can ssh to your local machine without a password. Configure it as follows:
- ssh-keygen -t dsa
- cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
- chmod 755 ~/.ssh
- chmod 644 ~/.ssh/authorized_keys
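To verify the setup, ssh to localhost; it should log you in without prompting for a password:
- ssh localhost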
Next, unpack HBase and set JAVA_HOME:
- cd /home/shirdrn/hadoop
- tar -xvzf hbase-0.90.4.tar.gz
- cd hbase-0.90.4
- export JAVA_HOME=/usr/java/jdk1.6.0_16
Modify the configuration in conf/hbase-site.xml, for example:
- <?xml version="1.0"?>
- <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
- <configuration>
- <property>
- <name>hbase.rootdir</name>
- <value>file:///home/shirdrn/hadoop/hbase-0.90.4/data</value>
- </property>
- </configuration>
Now you can start the HBase instance, which serves data from local storage:
- bin/start-hbase.sh
- tail -500f logs/hbase-shirdrn-master-localhost.log
- ps -ef | grep HMaster
The log shows that this HBase instance starts all of the HBase and ZooKeeper daemons, and that they all run in the same JVM. Next, start the HBase shell to try out the basic data-manipulation commands:
- cd bin
- hbase shell
- hbase(main):001:0> help
- hbase(main):002:0> status
- hbase(main):003:0> version
- // Create table 'pagedb' with the column families metadata, text, and status
- hbase(main):004:0> create 'pagedb', 'metadata', 'text', 'status'
- // Insert data
- hbase(main):005:0> put 'pagedb', 'http://www.mafengwo.cn/i/764197.html', 'metadata:site', 'www.mafengwo.cn'
- hbase(main):006:0> put 'pagedb', 'http://www.mafengwo.cn/i/764197.html', 'metadata:pubdate', '2011-12-20 22:09'
- hbase(main):007:0> put 'pagedb', 'http://www.mafengwo.cn/i/764197.html', 'text:title', '南国之境'
- hbase(main):008:0> put 'pagedb', 'http://www.mafengwo.cn/i/764197.html', 'text:content', '如果海會說话, 如果風愛上砂 我會聆聽浪花,...'
- hbase(main):009:0> put 'pagedb', 'http://www.mafengwo.cn/i/764197.html', 'status:extracted', '0'
- hbase(main):010:0> put 'pagedb', 'http://www.mafengwo.cn/i/764197.html', 'status:httpcode', '200'
- hbase(main):011:0> put 'pagedb', 'http://www.mafengwo.cn/i/764197.html', 'status:indexed', '1'
- // Scan table 'pagedb'
- hbase(main):012:0> scan 'pagedb'
- // Get all columns of row 'http://www.mafengwo.cn/i/764197.html'
- hbase(main):013:0> get 'pagedb', 'http://www.mafengwo.cn/i/764197.html'
- // Get the metadata column family of row 'http://www.mafengwo.cn/i/764197.html'
- hbase(main):014:0> get 'pagedb', 'http://www.mafengwo.cn/i/764197.html', 'metadata'
- // Get the metadata:site column of row 'http://www.mafengwo.cn/i/764197.html'
- hbase(main):015:0> get 'pagedb', 'http://www.mafengwo.cn/i/764197.html', 'metadata:site'
- // Increment the counter column status:state by 4 (it is created if it does not exist)
- hbase(main):016:0> incr 'pagedb', 'http://www.mafengwo.cn/i/764197.html', 'status:state', 4
- // Change the value of status:httpcode to 500
- hbase(main):017:0> put 'pagedb', 'http://www.mafengwo.cn/i/764197.html', 'status:httpcode', '500'
- // Count the rows in table 'pagedb'
- hbase(main):018:0> count 'pagedb'
- // Disable table 'pagedb'
- hbase(main):019:0> disable 'pagedb'
- // Enable table 'pagedb'
- hbase(main):020:0> enable 'pagedb'
- // Truncate table 'pagedb'
- hbase(main):021:0> truncate 'pagedb'
- // List all tables
- hbase(main):022:0> list
- // Delete the row 'http://www.mafengwo.cn/i/764197.html'
- hbase(main):023:0> deleteall 'pagedb','http://www.mafengwo.cn/i/764197.html'
- // Drop table 'pagedb' (the table must be disabled first)
- hbase(main):024:0> drop 'pagedb'
To practice with more commands, use help to see everything else that is available.
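For example, quoting the command name prints detailed help for a single command:
- hbase(main):025:0> help "scan"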
Distributed mode
A distributed-mode HBase installation sits on top of an HDFS cluster, so the first thing to do is to have a correctly configured distributed HDFS cluster, with the Namenode and Datanode processes all running properly. HBase is a distributed NoSQL database built on HDFS, and a clustered HBase deployment needs to coordinate state across its nodes, so HBase uses ZooKeeper directly as its distributed coordination service. For an introduction to ZooKeeper, see the official documentation: http://zookeeper.apache.org.
HBase uses a master/slave architecture: the cluster has one HBase Master server, whose role resembles that of the Namenode in HDFS, while the Region Servers, as slave nodes, resemble HDFS Datanodes.
Depending on whether ZooKeeper is managed by HBase itself, a distributed installation comes in two flavors (the switch between them is shown just after this list):
- HBase-managed ZooKeeper cluster: starting and stopping the HBase cluster also starts and stops the ZooKeeper cluster
- External ZooKeeper cluster: a ZooKeeper cluster fully independent of HBase, whose startup and shutdown are not controlled by HBase
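The switch between the two modes is the HBASE_MANAGES_ZK variable in conf/hbase-env.sh. A sketch of the two settings (only one applies at a time):
- # HBase starts and stops ZooKeeper together with the cluster:
- export HBASE_MANAGES_ZK=true
- # HBase expects an external, independently managed ZooKeeper cluster:
- export HBASE_MANAGES_ZK=false
The installation below uses an external ZooKeeper cluster, so HBASE_MANAGES_ZK is set to false.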
1、Create the HBase root directory in HDFS
- # Create the directory hdfs://master:9000/hbase
- hadoop fs -mkdir /hbase
- # Verify that /hbase was created
- hadoop fs -lsr /
2、Set up environment variables
Append the following to ~/.bashrc, then source it:
- export JAVA_HOME=/home/hadoop/installation/jdk1.6.0_30
- export HADOOP_HOME=/home/hadoop/installation/hadoop-0.22.0
- export HBASE_HEAPSIZE=128
- export HBASE_MANAGES_ZK=false
- . ~/.bashrc
Rename HBase's web application directory (presumably a workaround needed after the Hadoop 0.22 jar swap in step 4 below; without it the web UIs may fail to start):
- hadoop@master:~/installation/hbase-0.90.4$ mv hbase-webapps/ webapps
Set the same variables in conf/hbase-env.sh so the HBase scripts pick them up; HBASE_MANAGES_ZK=false tells HBase to use the external ZooKeeper cluster:
- export JAVA_HOME=/home/hadoop/installation/jdk1.6.0_30
- export HADOOP_HOME=/home/hadoop/installation/hadoop-0.22.0
- export HBASE_HEAPSIZE=128
- export HBASE_MANAGES_ZK=false
- export HBASE_CLASSPATH=$HBASE_HOME/
3、Configure hbase-site.xml
Edit conf/hbase-site.xml as follows:
- <?xml version="1.0"?>
- <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
- <configuration>
- <property>
- <name>hbase.rootdir</name>
- <value>hdfs://master:9000/hbase</value>
- <description>The directory shared by RegionServers.</description>
- </property>
- <property>
- <name>hbase.cluster.distributed</name>
- <value>true</value>
- <description>The mode the cluster will be in. Possible values are false: standalone and pseudo-distributed setups with managed Zookeeper true: fully-distributed with unmanaged Zookeeper Quorum (see hbase-env.sh)</description>
- </property>
- <property>
- <name>hbase.zookeeper.property.dataDir</name>
- <value>/home/hadoop/storage/zookeeper</value>
- <description>Property from ZooKeeper's config zoo.cfg. The directory where the snapshot is stored.</description>
- </property>
- <property>
- <name>hbase.zookeeper.quorum</name>
- <value>slave-01,slave-02,slave-03</value>
- <description>Comma separated list of servers in the ZooKeeper Quorum.</description>
- </property>
- </configuration>
- hbase.rootdir specifies that HBase's root directory is hdfs://master:9000/hbase in HDFS; this directory is shared by the Region Servers in the cluster. Don't forget to create the /hbase directory in HDFS before starting the HBase cluster, by running hadoop fs -mkdir /hbase on master (step 1 above).
- hbase.cluster.distributed selects the fully distributed installation mode
- hbase.zookeeper.property.dataDir specifies the data directory used by the ZooKeeper cluster serving HBase
- hbase.zookeeper.quorum lists the ZooKeeper nodes that coordinate the HBase cluster; configure an odd number of nodes so that a majority quorum can always be formed
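Because HBASE_MANAGES_ZK=false, the external ZooKeeper ensemble on slave-01, slave-02, and slave-03 must be configured and running before HBase starts. A minimal conf/zoo.cfg sketch for the zookeeper-3.3.4 installation listed in the preparation section (the ports are ZooKeeper defaults; dataDir matches hbase.zookeeper.property.dataDir above):
- tickTime=2000
- initLimit=10
- syncLimit=5
- dataDir=/home/hadoop/storage/zookeeper
- clientPort=2181
- server.1=slave-01:2888:3888
- server.2=slave-02:2888:3888
- server.3=slave-03:2888:3888
Each quorum node also needs a file named myid under dataDir containing its own server number (1, 2, or 3).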
4、Sync the Hadoop jars and distribute HBase to the slave nodes
HBase 0.90.4 bundles a Hadoop 0.20 jar, which does not match the Hadoop 0.22.0 cluster used here; replace it with the jars of the Hadoop version actually running (see note 1 in the summary below):
- rm ~/installation/hbase-0.90.4/lib/hadoop-core-0.20-append-r1056497.jar
- cp ~/installation/hadoop-0.22.0/*.jar ~/installation/hbase-0.90.4/lib/
Edit conf/regionservers to list the Region Server nodes:
- slave-01
- slave-02
- slave-03
Then copy the entire HBase installation to every slave:
- scp -r ~/installation/hbase-0.90.4 hadoop@slave-01:/home/hadoop/installation
- scp -r ~/installation/hbase-0.90.4 hadoop@slave-02:/home/hadoop/installation
- scp -r ~/installation/hbase-0.90.4 hadoop@slave-03:/home/hadoop/installation
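Before starting HBase, start the ZooKeeper daemon on each quorum node (assuming zookeeper-3.3.4 is unpacked under ~/installation on the slaves):
- hadoop@slave-01:~/installation/zookeeper-3.3.4$ bin/zkServer.sh start
This is what produces the QuorumPeerMain process in the jps output below.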
5、Start the HBase cluster
On master, run:
- start-hbase.sh
Verify the daemons with jps: on master you should see HMaster, and on each slave HRegionServer plus the ZooKeeper QuorumPeerMain:
- hadoop@master:~/installation/hbase-0.90.4$ jps
- 15899 SecondaryNameNode
- 15553 NameNode
- 21677 Jps
- 21537 HMaster
- hadoop@slave-03:~/installation/hbase-0.90.4$ jps
- 6919 HRegionServer
- 4212 QuorumPeerMain
- 7053 Jps
- 3483 DataNode
You can follow the logs on each node:
- on master  : tail -500f $HBASE_HOME/logs/hbase-hadoop-master-master.log
- on slave-01: tail -500f $HBASE_HOME/logs/hbase-hadoop-zookeeper-slave-01.log
- on slave-02: tail -500f $HBASE_HOME/logs/hbase-hadoop-zookeeper-slave-02.log
- on slave-03: tail -500f $HBASE_HOME/logs/hbase-hadoop-zookeeper-slave-03.log
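You can also check the cluster status in the master's web UI (served on port 60010 by default in this HBase version):
- # in a browser: http://master:60010/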
Start the HBase shell; if it prints output like the following, the HBase cluster has started successfully:
- hadoop@master:~/installation/hbase-0.90.4$ hbase shell
- 12/01/09 01:14:09 WARN conf.Configuration: hadoop.native.lib is deprecated. Instead, use io.native.lib.available
- 12/01/09 01:14:09 WARN conf.Configuration: hadoop.native.lib is deprecated. Instead, use io.native.lib.available
- 12/01/09 01:14:09 WARN conf.Configuration: hadoop.native.lib is deprecated. Instead, use io.native.lib.available
- HBase Shell; enter 'help<RETURN>' for list of supported commands.
- Type "exit<RETURN>" to leave the HBase Shell
- Version 0.90.4, r1150278, Sun Jul 24 15:53:29 PDT 2011
- hbase(main):001:0> help
- HBase Shell, version 0.90.4, r1150278, Sun Jul 24 15:53:29 PDT 2011
- Type 'help "COMMAND"', (e.g. 'help "get"' -- the quotes are necessary) for help on a specific command.
- Commands are grouped. Type 'help "COMMAND_GROUP"', (e.g. 'help "general"') for help on a command group.
- COMMAND GROUPS:
- Group name: general
- Commands: status, version
- Group name: ddl
- Commands: alter, create, describe, disable, drop, enable, exists, is_disabled, is_enabled, list
- Group name: dml
- Commands: count, delete, deleteall, get, get_counter, incr, put, scan, truncate
- Group name: tools
- Commands: assign, balance_switch, balancer, close_region, compact, flush, major_compact, move, split, unassign, zk_dump
- Group name: replication
- Commands: add_peer, disable_peer, enable_peer, remove_peer, start_replication, stop_replication
- SHELL USAGE:
- Quote all names in HBase Shell such as table and column names. Commas delimit
- command parameters. Type <RETURN> after entering a command to run it.
- Dictionaries of configuration used in the creation and alteration of tables are
- Ruby Hashes. They look like this:
- {'key1' => 'value1', 'key2' => 'value2', ...}
- and are opened and closed with curley-braces. Key/values are delimited by the
- '=>' character combination. Usually keys are predefined constants such as
- NAME, VERSIONS, COMPRESSION, etc. Constants do not need to be quoted. Type
- 'Object.constants' to see a (messy) list of all constants in the environment.
- If you are using binary keys or values and need to enter them in the shell, use
- double-quote'd hexadecimal representation. For example:
- hbase> get 't1', "key\x03\x3f\xcd"
- hbase> get 't1', "key\003\023\011"
- hbase> put 't1', "test\xef\xff", 'f1:', "\x01\x33\x40"
- The HBase shell is the (J)Ruby IRB with the above HBase-specific commands added.
- For more on the HBase Shell, see http://hbase.apache.org/docs/current/book.html
- hbase(main):002:0> status
- 3 servers, 0 dead, 0.0000 average load
- hbase(main):003:0> version
- 0.90.4, r1150278, Sun Jul 24 15:53:29 PDT 2011
- hbase(main):004:0>
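To shut the cluster down later, stop HBase first and then the external ZooKeeper daemons:
- stop-hbase.sh
- # then on each of slave-01/02/03:
- bin/zkServer.sh stop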
Summary
1、Version mismatch error
If startup fails with a version mismatch error like the following:
- 2012-01-06 21:27:18,384 FATAL org.apache.hadoop.hbase.master.HMaster: Unhandled exception. Starting shutdown.
- org.apache.hadoop.ipc.RemoteException: Server IPC version 5 cannot communicate with client version 3
- at org.apache.hadoop.ipc.Client.call(Client.java:740)
- at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
- at $Proxy5.getProtocolVersion(Unknown Source)
- at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:359)
- at org.apache.hadoop.hdfs.DFSClient.createRPCNamenode(DFSClient.java:113)
- at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:215)
- at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:177)
- at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:82)
- at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1378)
- at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:66)
- at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1390)
- at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:196)
- at org.apache.hadoop.fs.Path.getFileSystem(Path.java:175)
- at org.apache.hadoop.hbase.util.FSUtils.getRootDir(FSUtils.java:364)
- at org.apache.hadoop.hbase.master.MasterFileSystem.<init>(MasterFileSystem.java:81)
- at org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:346)
- at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:282)
- 2012-01-02 21:27:18,384 INFO org.apache.hadoop.hbase.master.HMaster: Aborting
The HBase documentation explains why:
- Because HBase depends on Hadoop, it bundles an instance of the Hadoop jar under its lib directory. The bundled jar is ONLY for use in standalone mode. In distributed mode, it is critical that the version of Hadoop that is out on your cluster match what is under HBase. Replace the hadoop jar found in the HBase lib directory with the hadoop jar you are running on your cluster to avoid version mismatch issues. Make sure you replace the jar in HBase everywhere on your cluster. Hadoop version mismatch issues have various manifestations but often all looks like its hung up.
The fix is to replace the Hadoop core jar under HBase's lib directory with the one from the Hadoop version you are actually running, as done in step 4 above.
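A quick way to double-check which Hadoop jars HBase will load (using the installation paths from above):
- ls ~/installation/hbase-0.90.4/lib/ | grep hadoop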
2、Exceptions when performing operations after the cluster has started
If the HBase cluster starts normally but an exception like the following appears when you try to create a table:
- ERROR: org.apache.hadoop.hbase.NotAllMetaRegionsOnlineException: org.apache.hadoop.hbase.NotAllMetaRegionsOnlineException: Timed out (10000ms)
- at org.apache.hadoop.hbase.catalog.CatalogTracker.waitForMeta(CatalogTracker.java:334)
- at org.apache.hadoop.hbase.master.HMaster.createTable(HMaster.java:769)
- at org.apache.hadoop.hbase.master.HMaster.createTable(HMaster.java:743)
- at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
- at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
- at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
- at java.lang.reflect.Method.invoke(Method.java:597)
- at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:570)
- at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1039)
The fix is to edit /etc/hosts so that each hostname resolves to its real LAN address instead of 127.0.0.1. Using master as an example:
- #127.0.0.1 localhost
- 192.168.0.180 master
- 192.168.0.191 slave-01
- 192.168.0.190 slave-02
- 192.168.0.189 slave-03
- # The following lines are desirable for IPv6 capable hosts
- #::1 ip6-localhost ip6-loopback
- #fe00::0 ip6-localnet
- #ff00::0 ip6-mcastprefix
- #ff02::1 ip6-allnodes
- #ff02::2 ip6-allrouters
After this change, the operations above work correctly.
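A quick sanity check that the names now resolve to LAN addresses rather than 127.0.0.1:
- ping -c 1 master
- ping -c 1 slave-01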
Reference: http://comments.gmane.org/gmane.comp.java.hadoop.hbase.user/18868