HBase - Overview
Limitations of Hadoop
- Hadoop can perform only batch processing, and data is accessed only sequentially. Even the simplest job has to scan the entire dataset.
- Processing a huge dataset produces another huge dataset, which must also be processed sequentially. A new solution is needed that can access any point of the data in a single unit of time (random access).
Existing solutions include HBase, Cassandra, CouchDB, Dynamo, and MongoDB.
Data can be stored in HDFS either directly or through HBase. Data consumers read and access the data in HDFS randomly through HBase. HBase sits on top of the Hadoop File System and provides read and write access.
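The difference between the two access patterns can be put in numbers. The Java sketch below uses 100,000 made-up row keys and counts how many records are examined to find the last one, first with a sequential scan and then with a binary search over the sorted keys. The binary search is only an analogy for indexed lookup; it is not HBase's actual lookup path.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Illustration only: how many records must be examined to find one key
// with a sequential scan versus a lookup over keys kept in sorted order.
public class AccessPatterns {

    // Builds n zero-padded row keys ("row00000", "row00001", ...), already sorted.
    static List<String> makeKeys(int n) {
        List<String> keys = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            keys.add(String.format(Locale.ROOT, "row%05d", i));
        }
        return keys;
    }

    // Sequential access: examine records one by one, as a batch job over a flat file would.
    static int sequentialProbes(List<String> keys, String target) {
        int probes = 0;
        for (String key : keys) {
            probes++;
            if (key.equals(target)) {
                break;
            }
        }
        return probes;
    }

    // Random access (analogy): binary search over sorted keys; each probe halves the range.
    static int indexedProbes(List<String> sortedKeys, String target) {
        int lo = 0;
        int hi = sortedKeys.size() - 1;
        int probes = 0;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            probes++;
            int cmp = sortedKeys.get(mid).compareTo(target);
            if (cmp == 0) {
                break;
            } else if (cmp < 0) {
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        return probes;
    }

    public static void main(String[] args) {
        List<String> keys = makeKeys(100_000);
        String target = "row99999"; // last key: worst case for the sequential scan
        System.out.println("sequential probes: " + sequentialProbes(keys, target));
        System.out.println("indexed probes:    " + indexedProbes(keys, target));
    }
}
```

For the last key, the sequential scan examines all 100,000 records, while the sorted lookup needs at most 17 probes because each probe halves the remaining range.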
HBase and HDFS
HDFS | HBase |
---|---|
HDFS is a distributed file system suitable for storing large files. | HBase is a database built on top of HDFS. |
HDFS does not support fast individual record lookups. | HBase provides fast lookups for large tables. |
It provides high-latency batch processing. | It provides low-latency access to single rows from billions of records (random access). |
It provides only sequential access to data. | HBase keeps data sorted by row key in indexed files on HDFS (HFiles), which gives random access and fast lookups. |
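The random-access column of this table follows from HBase's data model, which is often described as a sorted map: row key, then column (family:qualifier), then timestamp, then value. The Java sketch below models only that idea with nested TreeMaps. The class and method names are hypothetical and this is not the HBase client API. It shows a single-cell lookup returning the newest version and a range scan returning row keys in sorted order.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Conceptual model of an HBase table, NOT HBase client code:
// row key -> "family:qualifier" -> timestamp -> value.
public class SortedMapModel {
    private final NavigableMap<String, NavigableMap<String, NavigableMap<Long, String>>> rows =
            new TreeMap<>();

    // Stores one version of a cell; versions of a cell are ordered newest first.
    void put(String row, String column, long timestamp, String value) {
        rows.computeIfAbsent(row, r -> new TreeMap<>())
            .computeIfAbsent(column, c -> new TreeMap<Long, String>(Comparator.<Long>reverseOrder()))
            .put(timestamp, value);
    }

    // Random access: the newest version of one cell, or null if it does not exist.
    String getLatest(String row, String column) {
        NavigableMap<String, NavigableMap<Long, String>> columns = rows.get(row);
        if (columns == null) {
            return null;
        }
        NavigableMap<Long, String> versions = columns.get(column);
        return (versions == null || versions.isEmpty()) ? null : versions.firstEntry().getValue();
    }

    // Range scan: row keys in [startRow, stopRow), returned in sorted order.
    List<String> scanRowKeys(String startRow, String stopRow) {
        return new ArrayList<>(rows.subMap(startRow, true, stopRow, false).keySet());
    }

    public static void main(String[] args) {
        SortedMapModel table = new SortedMapModel();
        table.put("row2", "personal:name", 1L, "bob");
        table.put("row1", "personal:name", 1L, "alice");
        table.put("row1", "personal:name", 2L, "alicia"); // newer version of the same cell
        table.put("row3", "personal:city", 1L, "paris");
        System.out.println(table.getLatest("row1", "personal:name")); // alicia
        System.out.println(table.scanRowKeys("row1", "row3"));        // [row1, row2]
    }
}
```

Because rows stay sorted by key, both a point lookup and a contiguous range scan avoid reading the whole table.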
Installing HBase (single-machine installation)
Download the release:
http://www.interior-dsgn.com/apache/hbase/stable/hbase-0.98.24-hadoop2-bin.tar.gz
tar -zxvf hbase-0.98.24-hadoop2-bin.tar.gz
Move the extracted directory to your installation directory.
Configuring HBase Pseudo-Distributed Mode
hbase-env.sh
Set JAVA_HOME, for example:
export JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.8.0_151.jdk/Contents/Home
hbase-site.xml
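Create a directory data/zk in the HBase home directory for the built-in ZooKeeper's files; hbase.zookeeper.property.dataDir below must point to the directory you create.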
<configuration>
  <!-- Path where HBase stores its files. It should use the same NameNode address as fs.defaultFS in Hadoop's core-site.xml. -->
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://localhost:9000/hbase</value>
  </property>
  <!-- Path where HBase's built-in ZooKeeper stores its files. -->
  <property>
    <name>hbase.zookeeper.property.dataDir</name>
    <value>/home/hadoop/zookeeper/data/zk</value>
  </property>
  <!-- Run the HBase daemons as separate processes (pseudo-distributed mode). -->
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
</configuration>
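Before starting the daemons, it can help to confirm that hbase-site.xml parses and contains the three settings above. The Java sketch below uses only the JDK's DOM parser. SiteXmlCheck and its methods are hypothetical helpers, and the default path conf/hbase-site.xml assumes you run it from the HBase home directory.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical sanity check for a pseudo-distributed hbase-site.xml (not part of HBase).
public class SiteXmlCheck {

    // Parses <property><name>..</name><value>..</value></property> entries into an ordered map.
    static Map<String, String> readProperties(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Map<String, String> props = new LinkedHashMap<>();
            NodeList properties = doc.getElementsByTagName("property");
            for (int i = 0; i < properties.getLength(); i++) {
                Element property = (Element) properties.item(i);
                NodeList names = property.getElementsByTagName("name");
                NodeList values = property.getElementsByTagName("value");
                if (names.getLength() == 0 || values.getLength() == 0) {
                    continue; // skip incomplete entries
                }
                props.put(names.item(0).getTextContent().trim(), values.item(0).getTextContent().trim());
            }
            return props;
        } catch (Exception e) {
            throw new IllegalArgumentException("cannot parse hbase-site.xml", e);
        }
    }

    // Checks the settings the pseudo-distributed setup in this guide relies on.
    static boolean looksPseudoDistributed(Map<String, String> props) {
        String rootDir = props.getOrDefault("hbase.rootdir", "");
        return rootDir.startsWith("hdfs://")
                && "true".equals(props.get("hbase.cluster.distributed"))
                && props.containsKey("hbase.zookeeper.property.dataDir");
    }

    public static void main(String[] args) throws IOException {
        // Assumed default path; pass your own hbase-site.xml as the first argument.
        String path = args.length > 0 ? args[0] : "conf/hbase-site.xml";
        String xml = new String(Files.readAllBytes(Paths.get(path)), StandardCharsets.UTF_8);
        Map<String, String> props = readProperties(xml);
        props.forEach((name, value) -> System.out.println(name + " = " + value));
        System.out.println("pseudo-distributed settings present: " + looksPseudoDistributed(props));
    }
}
```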
Start HBase
Start Hadoop first, then start HBase:
bin/hbase-daemon.sh start zookeeper
bin/hbase-daemon.sh start master
bin/hbase-daemon.sh start regionserver
Check the running processes:
$ jps
77456 SecondaryNameNode
77363 DataNode
77651 NodeManager
77290 NameNode
77573 ResourceManager
78039 HRegionServer
77891 HQuorumPeer
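Comparing jps output with the daemons you expect is easy to script. In the Java sketch below, JpsCheck and its expected-process list are assumptions for this pseudo-distributed setup (HDFS, YARN, HBase-managed ZooKeeper, one master, one region server). Run against the listing above, it reports HMaster as missing; that listing also lacks the Jps entry that jps prints for itself, so it may simply have been trimmed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper: reports which expected daemons are absent from `jps` output.
public class JpsCheck {

    // Assumed daemon set for this single-machine Hadoop + HBase setup.
    static final List<String> EXPECTED = Arrays.asList(
            "NameNode", "DataNode", "SecondaryNameNode",
            "ResourceManager", "NodeManager",
            "HQuorumPeer", "HMaster", "HRegionServer");

    // Each jps line looks like "<pid> <MainClassName>"; collect the class names.
    static Set<String> runningNames(String jpsOutput) {
        Set<String> names = new HashSet<>();
        for (String line : jpsOutput.split("\\R")) {
            String[] parts = line.trim().split("\\s+");
            if (parts.length == 2) {
                names.add(parts[1]);
            }
        }
        return names;
    }

    // Expected names that do not appear in the output, in the order given.
    static List<String> missing(String jpsOutput, List<String> expected) {
        Set<String> running = runningNames(jpsOutput);
        List<String> result = new ArrayList<>();
        for (String name : expected) {
            if (!running.contains(name)) {
                result.add(name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String sample = String.join("\n",
                "77456 SecondaryNameNode", "77363 DataNode", "77651 NodeManager",
                "77290 NameNode", "77573 ResourceManager", "78039 HRegionServer",
                "77891 HQuorumPeer");
        System.out.println("missing: " + missing(sample, EXPECTED)); // missing: [HMaster]
    }
}
```

If a daemon really is absent, its log under the HBase logs directory is the first place to look.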
Web UI
http://localhost:60010/
HBase now creates its default files in HDFS, which can be browsed at:
http://localhost:50070/explorer.html#/hbase
$ hdfs dfs -ls /hbase
Found 7 items
drwxr-xr-x - didi supergroup 0 2018-03-28 20:06 /hbase/.tmp
drwxr-xr-x - didi supergroup 0 2018-03-28 20:07 /hbase/WALs
drwxr-xr-x - didi supergroup 0 2018-03-28 20:07 /hbase/corrupt
drwxr-xr-x - didi supergroup 0 2018-03-19 23:50 /hbase/data
-rw-r--r-- 1 didi supergroup 42 2018-03-19 23:50 /hbase/hbase.id
-rw-r--r-- 1 didi supergroup 7 2018-03-19 23:50 /hbase/hbase.version
drwxr-xr-x - didi supergroup 0 2018-03-28 20:17 /hbase/oldWALs
Setting Java Environment
To communicate with HBase through the Java libraries without getting a "class not found" exception, add HBase's lib directory to the CLASSPATH by appending this line to ~/.bashrc:
export CLASSPATH="$CLASSPATH:/home/hadoop/hbase/lib/*"
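To confirm that the CLASSPATH change took effect, a small Java check can try to load an HBase class by name, which is exactly the step that fails with a "class not found" error. The classes checked in the sketch below (HBaseAdmin, HTable) are the client classes this document names in its API notes; ClasspathCheck itself is a hypothetical helper.

```java
// Hypothetical helper: checks whether HBase client classes are visible on the classpath.
public class ClasspathCheck {

    // Returns true if the named class can be loaded (without running its static initializers).
    static boolean onClasspath(String className) {
        try {
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String[] classes = {
            "org.apache.hadoop.hbase.client.HBaseAdmin",
            "org.apache.hadoop.hbase.client.HTable"
        };
        for (String name : classes) {
            System.out.println(name + " -> " + (onClasspath(name) ? "found" : "NOT found (check CLASSPATH)"));
        }
    }
}
```

Open a new terminal (or run source ~/.bashrc) before running it, so the exported CLASSPATH is picked up.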
HBase - Shell
General Commands
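- status - Provides the status of HBase, for example, the number of servers.
- version - Provides the version of HBase being used.
- table_help - Provides help for table-related commands.
- whoami - Provides information about the current user.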
Data Definition Language
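Commands that operate on tables:
- create - Creates a table.
- list - Lists all the tables.
- disable - Disables a table.
- is_disabled - Verifies whether a table is disabled.
- enable - Enables a table.
- is_enabled - Verifies whether a table is enabled.
- describe - Provides the description of a table.
- alter - Alters a table.
- exists - Verifies whether a table exists.
- drop - Drops a table.
- drop_all - Drops the tables whose names match the given regex.
- Java Admin API - Java also provides an Admin API for performing DDL operations from code. Its two main classes are HBaseAdmin (package org.apache.hadoop.hbase.client) and HTableDescriptor (package org.apache.hadoop.hbase).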
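The disable-before-drop rule is the part of the table lifecycle that most often trips people up. The Java sketch below is an in-memory toy (ToyAdmin is hypothetical, not HBaseAdmin) that models create, disable, enable, is_enabled, exists, list, and drop, including the HBase rule that an enabled table cannot be dropped. The table name emp is only an example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// In-memory toy (hypothetical, NOT the HBase Admin API) modeling the table lifecycle.
public class ToyAdmin {
    // table name -> true if enabled, false if disabled
    private final Map<String, Boolean> tables = new TreeMap<>();

    void create(String table) {
        if (tables.containsKey(table)) {
            throw new IllegalStateException("table already exists: " + table);
        }
        tables.put(table, true); // a newly created table is enabled
    }

    void disable(String table) { requireExists(table); tables.put(table, false); }

    void enable(String table) { requireExists(table); tables.put(table, true); }

    boolean isEnabled(String table) { requireExists(table); return tables.get(table); }

    boolean exists(String table) { return tables.containsKey(table); }

    List<String> list() { return new ArrayList<>(tables.keySet()); }

    // As in HBase, a table must be disabled before it can be dropped.
    void drop(String table) {
        requireExists(table);
        if (tables.get(table)) {
            throw new IllegalStateException("table is enabled, disable it first: " + table);
        }
        tables.remove(table);
    }

    private void requireExists(String table) {
        if (!tables.containsKey(table)) {
            throw new IllegalArgumentException("no such table: " + table);
        }
    }

    public static void main(String[] args) {
        ToyAdmin admin = new ToyAdmin();
        admin.create("emp");
        try {
            admin.drop("emp"); // rejected: the table is still enabled
        } catch (IllegalStateException e) {
            System.out.println("drop refused: " + e.getMessage());
        }
        admin.disable("emp");
        System.out.println("is_enabled: " + admin.isEnabled("emp"));
        admin.drop("emp");
        System.out.println("exists after drop: " + admin.exists("emp"));
    }
}
```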
Data Manipulation Language
- put - Puts a cell value at a specified column in a specified row of a table.
- get - Fetches the contents of a row or a cell.
- delete - Deletes a cell value in a table.
- deleteall - Deletes all the cells in a given row.
- scan - Scans and returns the table data.
- count - Counts and returns the number of rows in a table.
- truncate - Disables, drops, and recreates a specified table.
- Java client API - The classes HTable, Put, and Get in package org.apache.hadoop.hbase.client provide the DML CRUD (Create, Retrieve, Update, Delete) operations.
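The DML commands above can be mimicked with a nested sorted map to show what each one removes or returns. ToyTable in the sketch below is hypothetical and keeps only the latest value per cell (real HBase cells are versioned); it is not the HTable API. Row keys and column names are made-up examples.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// In-memory toy (hypothetical, NOT the HBase client API) mirroring the shell's
// put / get / delete / deleteall / scan / count / truncate. Versions are ignored.
public class ToyTable {
    // row key -> ("family:qualifier" -> value), both levels sorted
    private final NavigableMap<String, NavigableMap<String, String>> rows = new TreeMap<>();

    // Like put 'table', 'row', 'column', 'value': writes (or overwrites) one cell.
    void put(String row, String column, String value) {
        rows.computeIfAbsent(row, r -> new TreeMap<>()).put(column, value);
    }

    // Like get 'table', 'row': a copy of the row's cells (empty if the row does not exist).
    Map<String, String> get(String row) {
        NavigableMap<String, String> cells = rows.get(row);
        if (cells == null) {
            return new TreeMap<String, String>();
        }
        return new TreeMap<String, String>(cells);
    }

    // Like delete 'table', 'row', 'column': removes one cell.
    void delete(String row, String column) {
        NavigableMap<String, String> cells = rows.get(row);
        if (cells != null) {
            cells.remove(column);
            if (cells.isEmpty()) {
                rows.remove(row); // a row with no cells no longer exists
            }
        }
    }

    // Like deleteall 'table', 'row': removes every cell in the row.
    void deleteAll(String row) { rows.remove(row); }

    // Like scan 'table': row keys in sorted order.
    List<String> scan() { return new ArrayList<>(rows.keySet()); }

    // Like count 'table': number of rows.
    int count() { return rows.size(); }

    // The real truncate disables, drops, and recreates the table; the toy just clears it.
    void truncate() { rows.clear(); }

    public static void main(String[] args) {
        ToyTable emp = new ToyTable();
        emp.put("1", "personal:name", "alice");
        emp.put("1", "personal:city", "paris");
        emp.put("2", "personal:name", "bob");
        emp.delete("1", "personal:city"); // row 1 keeps only its name cell
        emp.deleteAll("2");               // row 2 disappears entirely
        System.out.println(emp.scan() + " count=" + emp.count()); // [1] count=1
        emp.truncate();
        System.out.println("after truncate: count=" + emp.count()); // after truncate: count=0
    }
}
```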
Start the shell:
./bin/hbase shell
/hbase-0.98.24-hadoop2$ bin/hbase shell
2018-03-28 20:48:09,311 INFO [main] Configuration.deprecation: hadoop.native.lib is deprecated. Instead, use io.native.lib.available
HBase Shell; enter 'help<RETURN>' for list of supported commands.
Type "exit<RETURN>" to leave the HBase Shell
Version 0.98.24-hadoop2, r9c13a1c3d8cf999014f30104d1aa9d79e74ca3d6, Thu Dec 22 02:36:05 UTC 2016
hbase(main):001:0>
hbase(main):002:0> list
TABLE
hbase(main):003:0> status
1 active master, 0 backup masters, 1 servers, 1 dead, 2.0000 average load
hbase(main):004:0> version
0.98.24-hadoop2, r9c13a1c3d8cf999014f30104d1aa9d79e74ca3d6, Thu Dec 22 02:36:05 UTC 2016
To leave the shell, type exit or press Ctrl+C.