【HBase】01-HBase安装

HBase依赖JDK,安装HBase前确定主机是否已经安装了该版本对应的JDK版本。

HBase VersionJDK 7JDK 8JDK 9JDK 10

2.0

Not Supported

yes

Not Supported

Not Supported

1.3

yes

yes

Not Supported

Not Supported

1.2

yes

yes

Not Supported

Not Supported

并且在集群中的每一台 主机都需要设置JAVA_HOME,hbase-env.sh文件可以设置。

HBase的安装分为本地模式和分布式模式,分布式模式分为伪分布式和完全分布式安装。

分布式需要安装Hadoop系统。

 HBase-1.2.xHBase-1.3.xHBase-1.5.xHBase-2.0.xHBase-2.1.x

Hadoop-2.4.x

S

S

X

X

X

Hadoop-2.5.x

S

S

X

X

X

Hadoop-2.6.0

X

X

X

X

X

Hadoop-2.6.1+

S

S

X

S

X

Hadoop-2.7.0

X

X

X

X

X

Hadoop-2.7.1+

S

S

S

S

S

Hadoop-2.8.[0-1]

X

X

X

X

X

Hadoop-2.8.2

NT

NT

NT

NT

NT

Hadoop-2.8.3+

NT

NT

NT

S

S

Hadoop-2.9.0

X

X

X

X

X

Hadoop-2.9.1+

NT

NT

NT

NT

NT

Hadoop-3.0.x

X

X

X

X

X

Hadoop-3.1.0

X

X

X

X

X

1、独立模式

下载Hbase安装包然后解压,进入安装目录。

$ tar xzvf hbase-2.1.1-bin.tar.gz
$ cd hbase-2.1.1/

 启动HBase前需要设置系统环境变量JAVA_HOME,也可以编辑conf/hbase-env.sh文件,去掉JAVA_HOME前的注释,替换成自己的JDK安装目录,如:

JAVA_HOME=/usr

 编辑conf/hbase-site.xml文件,它是HBase最主要的配置文件。您需要在本地文件系统上指定HBase和ZooKeeper写入数据的目录,并确认一些风险。默认情况下,在/ttmp下创建一个新目录。许多服务器被配置为在重新启动时删除/tmp的内容,因此您应该将数据存储在其他地方。以下配置将把HBase的数据存储在hbase目录中,在用户testuser的主目录中。复制下方的配置粘贴到新安装的HBase的conf/hbase-site.xml中去。

<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>file:///home/testuser/hbase</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.dataDir</name>
    <value>/home/testuser/zookeeper</value>
  </property>
  <property>
    <name>hbase.unsafe.stream.capability.enforce</name>
    <value>false</value>
    <description>
      Controls whether HBase will check for stream capabilities (hflush/hsync).

      Disable this if you intend to run on LocalFileSystem, denoted by a rootdir
      with the 'file://' scheme, but be mindful of the NOTE below.

      WARNING: Setting this to false blinds you to potential data loss and
      inconsistent system state in the event of process and/or node failures. If
      HBase is complaining of an inability to use hsync or hflush it's most
      likely not a false positive.
    </description>
  </property>
</configuration>

您不需要创建HBASE数据目录。HBase会为你做这件事。如果创建目录,HBASE将尝试进行迁移,而这不是您想要的。

bin/start-hbase.sh脚本作为启动HBase的便捷方式提供。执行命令,如果一切顺利,则将记录一条消息到标准输出,显示HBase已成功启动。您可以使用jps命令来验证您有一个正在运行的进程名为HMSTER。在独立模式下,HBase运行这个JVM中的所有守护进程,即HMaster、单个HRegionServer和ZooKeeper守护进程。转到http://localhost:16010查看HbaseWeb UI。

下面的操作来演示HBase的基本命令。

  1. Connect to HBase.

    Connect to your running instance of HBase using the hbase shell command, located in the bin/ directory of your HBase install. In this example, some usage and version information that is printed when you start HBase Shell has been omitted. The HBase Shell prompt ends with a > character.

    $ ./bin/hbase shell
    hbase(main):001:0>
  2. Display HBase Shell Help Text.

    Type help and press Enter, to display some basic usage information for HBase Shell, as well as several example commands. Notice that table names, rows, columns all must be enclosed in quote characters.

  3. Create a table.

    Use the create command to create a new table. You must specify the table name and the ColumnFamily name.

    hbase(main):001:0> create 'test', 'cf'
    0 row(s) in 0.4170 seconds
    
    => Hbase::Table - test
  4. List Information About your Table

    Use the list command to confirm your table exists

    hbase(main):002:0> list 'test'
    TABLE
    test
    1 row(s) in 0.0180 seconds
    
    => ["test"]

    Now use the describe command to see details, including configuration defaults

    hbase(main):003:0> describe 'test'
    Table test is ENABLED
    test
    COLUMN FAMILIES DESCRIPTION
    {NAME => 'cf', VERSIONS => '1', EVICT_BLOCKS_ON_CLOSE => 'false', NEW_VERSION_BEHAVIOR => 'false', KEEP_DELETED_CELLS => 'FALSE', CACHE_DATA_ON_WRITE =>
    'false', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', MIN_VERSIONS => '0', REPLICATION_SCOPE => '0', BLOOMFILTER => 'ROW', CACHE_INDEX_ON_WRITE => 'f
    alse', IN_MEMORY => 'false', CACHE_BLOOMS_ON_WRITE => 'false', PREFETCH_BLOCKS_ON_OPEN => 'false', COMPRESSION => 'NONE', BLOCKCACHE => 'true', BLOCKSIZE
     => '65536'}
    1 row(s)
    Took 0.9998 seconds
  5. Put data into your table.

    To put data into your table, use the put command.

    hbase(main):003:0> put 'test', 'row1', 'cf:a', 'value1'
    0 row(s) in 0.0850 seconds
    
    hbase(main):004:0> put 'test', 'row2', 'cf:b', 'value2'
    0 row(s) in 0.0110 seconds
    
    hbase(main):005:0> put 'test', 'row3', 'cf:c', 'value3'
    0 row(s) in 0.0100 seconds

    Here, we insert three values, one at a time. The first insert is at row1, column cf:a, with a value of value1. Columns in HBase are comprised of a column family prefix, cf in this example, followed by a colon and then a column qualifier suffix, a in this case.

  6. Scan the table for all data at once.

    One of the ways to get data from HBase is to scan. Use the scan command to scan the table for data. You can limit your scan, but for now, all data is fetched.

    hbase(main):006:0> scan 'test'
    ROW                                      COLUMN+CELL
     row1                                    column=cf:a, timestamp=1421762485768, value=value1
     row2                                    column=cf:b, timestamp=1421762491785, value=value2
     row3                                    column=cf:c, timestamp=1421762496210, value=value3
    3 row(s) in 0.0230 seconds
  7. Get a single row of data.

    To get a single row of data at a time, use the get command.

    hbase(main):007:0> get 'test', 'row1'
    COLUMN                                   CELL
     cf:a                                    timestamp=1421762485768, value=value1
    1 row(s) in 0.0350 seconds
  8. Disable a table.

    If you want to delete a table or change its settings, as well as in some other situations, you need to disable the table first, using the disable command. You can re-enable it using the enable command.

    hbase(main):008:0> disable 'test'
    0 row(s) in 1.1820 seconds
    
    hbase(main):009:0> enable 'test'
    0 row(s) in 0.1770 seconds

    Disable the table again if you tested the enable command above:

    hbase(main):010:0> disable 'test'
    0 row(s) in 1.1820 seconds
  9. Drop the table.

    To drop (delete) a table, use the drop command.

    hbase(main):011:0> drop 'test'
    0 row(s) in 0.1370 seconds
  10. Exit the HBase Shell.

    To exit the HBase Shell and disconnect from your cluster, use the quit command. HBase is still running in the background.

Procedure: Stop HBase

  1. In the same way that the bin/start-hbase.sh script is provided to conveniently start all HBase daemons, the bin/stop-hbase.sh script stops them.

    $ ./bin/stop-hbase.sh
    stopping hbase....................
    $
  2. After issuing the command, it can take several minutes for the processes to shut down. Use the jps to be sure that the HMaster and HRegionServer processes are shut down.

2、伪分布式

伪分布式模式意味着HBase仍然完全在单个主机上运行,但是每个HBase守护进程(HMaster、HRegionServer和ZooKeeper)作为一个单独的进程运行:在独立模式下,所有守护进程在一个jvm进程/实例中运行。默认情况下,除非您配置了上面描述的hbase.rootdir属性,否则您的数据仍然存储在/tmp/。在这个过程中,我们将数据存储在HDFS中,假设您有可用的HDFS。您可以跳过HDFS配置来继续将数据存储在本地文件系统中。

此过程假定您在本地系统和/或远程系统上配置了Hadoop和HDFS,并且它们正在运行并且可用。它还假设你使用Hadoop 2。

  1. Stop HBase if it is running.

    If you have just finished quickstart and HBase is still running, stop it. This procedure will create a totally new directory where HBase will store its data, so any databases you created before will be lost.

  2. Configure HBase.

    Edit the hbase-site.xml configuration. First, add the following property which directs HBase to run in distributed mode, with one JVM instance per daemon.

    <property>
      <name>hbase.cluster.distributed</name>
      <value>true</value>
    </property>

    Next, change the hbase.rootdir from the local filesystem to the address of your HDFS instance, using the hdfs: URI syntax. In this example, HDFS is running on the localhost at port 8020. Be sure to either remove the entry for hbase.unsafe.stream.capability.enforce or set it to true.

    <property>
      <name>hbase.rootdir</name>
      <value>hdfs://localhost:8020/hbase</value>
    </property>

    You do not need to create the directory in HDFS. HBase will do this for you. If you create the directory, HBase will attempt to do a migration, which is not what you want.

  3. Start HBase.

    Use the bin/start-hbase.sh command to start HBase. If your system is configured correctly, the jps command should show the HMaster and HRegionServer processes running.

  4. Check the HBase directory in HDFS.

    If everything worked correctly, HBase created its directory in HDFS. In the configuration above, it is stored in /hbase/ on HDFS. You can use the hadoop fs command in Hadoop’s bin/ directory to list this directory.

    $ ./bin/hadoop fs -ls /hbase
    Found 7 items
    drwxr-xr-x   - hbase users          0 2014-06-25 18:58 /hbase/.tmp
    drwxr-xr-x   - hbase users          0 2014-06-25 21:49 /hbase/WALs
    drwxr-xr-x   - hbase users          0 2014-06-25 18:48 /hbase/corrupt
    drwxr-xr-x   - hbase users          0 2014-06-25 18:58 /hbase/data
    -rw-r--r--   3 hbase users         42 2014-06-25 18:41 /hbase/hbase.id
    -rw-r--r--   3 hbase users          7 2014-06-25 18:41 /hbase/hbase.version
    drwxr-xr-x   - hbase users          0 2014-06-25 21:49 /hbase/oldWALs
  5. Create a table and populate it with data.

    You can use the HBase Shell to create a table, populate it with data, scan and get values from it, using the same procedure as in shell exercises.

  6. Start and stop a backup HBase Master (HMaster) server.

     Running multiple HMaster instances on the same hardware does not make sense in a production environment, in the same way that running a pseudo-distributed cluster does not make sense for production. This step is offered for testing and learning purposes only.

    The HMaster server controls the HBase cluster. You can start up to 9 backup HMaster servers, which makes 10 total HMasters, counting the primary. To start a backup HMaster, use the local-master-backup.sh. For each backup master you want to start, add a parameter representing the port offset for that master. Each HMaster uses two ports (16000 and 16010 by default). The port offset is added to these ports, so using an offset of 2, the backup HMaster would use ports 16002 and 16012. The following command starts 3 backup servers using ports 16002/16012, 16003/16013, and 16005/16015.

    $ ./bin/local-master-backup.sh start 2 3 5

    To kill a backup master without killing the entire cluster, you need to find its process ID (PID). The PID is stored in a file with a name like /tmp/hbase-USER-X-master.pid. The only contents of the file is the PID. You can use the kill -9 command to kill that PID. The following command will kill the master with port offset 1, but leave the cluster running:

    $ cat /tmp/hbase-testuser-1-master.pid |xargs kill -9
  7. Start and stop additional RegionServers

    The HRegionServer manages the data in its StoreFiles as directed by the HMaster. Generally, one HRegionServer runs per node in the cluster. Running multiple HRegionServers on the same system can be useful for testing in pseudo-distributed mode. The local-regionservers.sh command allows you to run multiple RegionServers. It works in a similar way to the local-master-backup.sh command, in that each parameter you provide represents the port offset for an instance. Each RegionServer requires two ports, and the default ports are 16020 and 16030. Since HBase version 1.1.0, HMaster doesn’t use region server ports, this leaves 10 ports (16020 to 16029 and 16030 to 16039) to be used for RegionServers. For supporting additional RegionServers, set environment variables HBASE_RS_BASE_PORT and HBASE_RS_INFO_BASE_PORT to appropriate values before running script local-regionservers.sh. e.g. With values 16200 and 16300 for base ports, 99 additional RegionServers can be supported, on a server. The following command starts four additional RegionServers, running on sequential ports starting at 16022/16032 (base ports 16020/16030 plus 2).

    $ .bin/local-regionservers.sh start 2 3 4 5

    To stop a RegionServer manually, use the local-regionservers.sh command with the stop parameter and the offset of the server to stop.

    $ .bin/local-regionservers.sh stop 3
  8. Stop HBase.

    You can stop HBase the same way as in the quickstart procedure, using the bin/stop-hbase.sh command.

3、完全分布式

In reality, you need a fully-distributed configuration to fully test HBase and to use it in real-world scenarios. In a distributed configuration, the cluster contains multiple nodes, each of which runs one or more HBase daemon. These include primary and backup Master instances, multiple ZooKeeper nodes, and multiple RegionServer nodes.

This advanced quickstart adds two more nodes to your cluster. The architecture will be as follows:

Table 1. Distributed Cluster Demo Architecture
Node NameMasterZooKeeperRegionServer

node-a.example.com

yes

yes

no

node-b.example.com

backup

yes

yes

node-c.example.com

no

yes

yes

This quickstart assumes that each node is a virtual machine and that they are all on the same network. It builds upon the previous quickstart, Pseudo-Distributed Local Install, assuming that the system you configured in that procedure is now node-a. Stop HBase on node-a before continuing.

 Be sure that all the nodes have full access to communicate, and that no firewall rules are in place which could prevent them from talking to each other. If you see any errors like no route to host, check your firewall.

Procedure: Configure Passwordless SSH Access

node-a needs to be able to log into node-b and node-c (and to itself) in order to start the daemons. The easiest way to accomplish this is to use the same username on all hosts, and configure password-less SSH login from node-ato each of the others.

  1. On node-a, generate a key pair.

    While logged in as the user who will run HBase, generate a SSH key pair, using the following command:

    $ ssh-keygen -t rsa

    If the command succeeds, the location of the key pair is printed to standard output. The default name of the public key is id_rsa.pub.

  2. Create the directory that will hold the shared keys on the other nodes.

    On node-b and node-c, log in as the HBase user and create a .ssh/ directory in the user’s home directory, if it does not already exist. If it already exists, be aware that it may already contain other keys.

  3. Copy the public key to the other nodes.

    Securely copy the public key from node-a to each of the nodes, by using the scp or some other secure means. On each of the other nodes, create a new file called .ssh/authorized_keys if it does not already exist, and append the contents of the id_rsa.pub file to the end of it. Note that you also need to do this for node-a itself.

    $ cat id_rsa.pub >> ~/.ssh/authorized_keys
  4. Test password-less login.

    If you performed the procedure correctly, you should not be prompted for a password when you SSH from node-a to either of the other nodes using the same username.

  5. Since node-b will run a backup Master, repeat the procedure above, substituting node-b everywhere you see node-a. Be sure not to overwrite your existing .ssh/authorized_keys files, but concatenate the new key onto the existing file using the >> operator rather than the > operator.

Procedure: Prepare node-a

node-a will run your primary master and ZooKeeper processes, but no RegionServers. Stop the RegionServer from starting on node-a.

  1. Edit conf/regionservers and remove the line which contains localhost. Add lines with the hostnames or IP addresses for node-b and node-c.

    Even if you did want to run a RegionServer on node-a, you should refer to it by the hostname the other servers would use to communicate with it. In this case, that would be node-a.example.com. This enables you to distribute the configuration to each node of your cluster any hostname conflicts. Save the file.

  2. Configure HBase to use node-b as a backup master.

    Create a new file in conf/ called backup-masters, and add a new line to it with the hostname for node-b. In this demonstration, the hostname is node-b.example.com.

  3. Configure ZooKeeper

    In reality, you should carefully consider your ZooKeeper configuration. You can find out more about configuring ZooKeeper in zookeeper section. This configuration will direct HBase to start and manage a ZooKeeper instance on each node of the cluster.

    On node-a, edit conf/hbase-site.xml and add the following properties.

    <property>
      <name>hbase.zookeeper.quorum</name>
      <value>node-a.example.com,node-b.example.com,node-c.example.com</value>
    </property>
    <property>
      <name>hbase.zookeeper.property.dataDir</name>
      <value>/usr/local/zookeeper</value>
    </property>
  4. Everywhere in your configuration that you have referred to node-a as localhost, change the reference to point to the hostname that the other nodes will use to refer to node-a. In these examples, the hostname is node-a.example.com.

Procedure: Prepare node-b and node-c

node-b will run a backup master server and a ZooKeeper instance.

  1. Download and unpack HBase.

    Download and unpack HBase to node-b, just as you did for the standalone and pseudo-distributed quickstarts.

  2. Copy the configuration files from node-a to node-b.and node-c.

    Each node of your cluster needs to have the same configuration information. Copy the contents of the conf/directory to the conf/ directory on node-b and node-c.

Procedure: Start and Test Your Cluster

  1. Be sure HBase is not running on any node.

    If you forgot to stop HBase from previous testing, you will have errors. Check to see whether HBase is running on any of your nodes by using the jps command. Look for the processes HMasterHRegionServer, and HQuorumPeer. If they exist, kill them.

  2. Start the cluster.

    On node-a, issue the start-hbase.sh command. Your output will be similar to that below.

    $ bin/start-hbase.sh
    node-c.example.com: starting zookeeper, logging to /home/hbuser/hbase-0.98.3-hadoop2/bin/../logs/hbase-hbuser-zookeeper-node-c.example.com.out
    node-a.example.com: starting zookeeper, logging to /home/hbuser/hbase-0.98.3-hadoop2/bin/../logs/hbase-hbuser-zookeeper-node-a.example.com.out
    node-b.example.com: starting zookeeper, logging to /home/hbuser/hbase-0.98.3-hadoop2/bin/../logs/hbase-hbuser-zookeeper-node-b.example.com.out
    starting master, logging to /home/hbuser/hbase-0.98.3-hadoop2/bin/../logs/hbase-hbuser-master-node-a.example.com.out
    node-c.example.com: starting regionserver, logging to /home/hbuser/hbase-0.98.3-hadoop2/bin/../logs/hbase-hbuser-regionserver-node-c.example.com.out
    node-b.example.com: starting regionserver, logging to /home/hbuser/hbase-0.98.3-hadoop2/bin/../logs/hbase-hbuser-regionserver-node-b.example.com.out
    node-b.example.com: starting master, logging to /home/hbuser/hbase-0.98.3-hadoop2/bin/../logs/hbase-hbuser-master-nodeb.example.com.out

    ZooKeeper starts first, followed by the master, then the RegionServers, and finally the backup masters.

  3. Verify that the processes are running.

    On each node of the cluster, run the jps command and verify that the correct processes are running on each server. You may see additional Java processes running on your servers as well, if they are used for other purposes.

    node-a jps Output

    $ jps
    20355 Jps
    20071 HQuorumPeer
    20137 HMaster

    node-b jps Output

    $ jps
    15930 HRegionServer
    16194 Jps
    15838 HQuorumPeer
    16010 HMaster

    node-c jps Output

    $ jps
    13901 Jps
    13639 HQuorumPeer
    13737 HRegionServer
     

    ZooKeeper Process Name

    The HQuorumPeer process is a ZooKeeper instance which is controlled and started by HBase. If you use ZooKeeper this way, it is limited to one instance per cluster node and is appropriate for testing only. If ZooKeeper is run outside of HBase, the process is called QuorumPeer. For more about ZooKeeper configuration, including using an external ZooKeeper instance with HBase, see zookeeper section.

  4. Browse to the Web UI.

     

    Web UI Port Changes

    Web UI Port Changes

    In HBase newer than 0.98.x, the HTTP ports used by the HBase Web UI changed from 60010 for the Master and 60030 for each RegionServer to 16010 for the Master and 16030 for the RegionServer.

    If everything is set up correctly, you should be able to connect to the UI for the Master http://node-a.example.com:16010/ or the secondary master at http://node-b.example.com:16010/ using a web browser. If you can connect via localhost but not from another host, check your firewall rules. You can see the web UI for each of the RegionServers at port 16030 of their IP addresses, or by clicking their links in the web UI for the Master.

  5. Test what happens when nodes or services disappear.

    With a three-node cluster you have configured, things will not be very resilient. You can still test the behavior of the primary Master or a RegionServer by killing the associated processes and watching the logs.

  • 0
    点赞
  • 1
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值