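[Key points] Configuration
1 Principles: HBase data storage architecture and storage flow (how data is read and written)
2 DML (three ways to retrieve data), DDL
Source code: Java
Operations-heavy
Why HBase
1 Massive storage
2 Massive query volume
Use cases:
Telecom
E-commerce
Gaming
Data ingestion:
Flume
Kettle
JDBC, Java API
2 HBase design concepts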
https://hbase.apache.org/
Apache HBase™ is the Hadoop database, a distributed, scalable, big data store.
Use Apache HBase™ when you need random, realtime read/write access to your Big Data. This project's goal is the hosting of very large tables -- billions of rows X millions of columns -- atop clusters of commodity hardware. Apache HBase is an open-source, distributed, versioned, non-relational database modeled after Google's Bigtable: A Distributed Storage System for Structured Data by Chang et al. Just as Bigtable leverages the distributed data storage provided by the Google File System, Apache HBase provides Bigtable-like capabilities on top of Hadoop and HDFS.
table
column family
column01, column02
rowkey: the unique identifier of each row (primary key)
rowkey + column family + column01 + timestamp (acts as a version number) : value => cell
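The cell coordinates above can be sketched as nested sorted maps. This is only a toy in-memory model written against the JDK, not the HBase client API; the class name CellModelSketch and its methods are hypothetical and exist purely for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Toy model of: rowkey + column family + column + timestamp -> value.
// Hypothetical illustration only; it does not use or imitate HBase classes.
class CellModelSketch {
    // rowkey -> "family:column" -> timestamp -> value bytes
    private final NavigableMap<String, NavigableMap<String, NavigableMap<Long, byte[]>>> rows =
            new TreeMap<>();

    // Stores one version of a cell; an existing timestamp is overwritten.
    void put(String rowkey, String family, String column, long timestamp, String value) {
        rows.computeIfAbsent(rowkey, k -> new TreeMap<>())
            .computeIfAbsent(family + ":" + column, k -> new TreeMap<>())
            .put(timestamp, value.getBytes(StandardCharsets.UTF_8));
    }

    // Returns the newest version of the cell, or null if the cell does not exist.
    String getLatest(String rowkey, String family, String column) {
        return getAsOf(rowkey, family, column, Long.MAX_VALUE);
    }

    // Returns the newest version whose timestamp is <= the given timestamp, or null.
    String getAsOf(String rowkey, String family, String column, long timestamp) {
        NavigableMap<String, NavigableMap<Long, byte[]>> row = rows.get(rowkey);
        if (row == null) return null;
        NavigableMap<Long, byte[]> versions = row.get(family + ":" + column);
        if (versions == null) return null;
        Map.Entry<Long, byte[]> e = versions.floorEntry(timestamp);
        return e == null ? null : new String(e.getValue(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        CellModelSketch table = new CellModelSketch();
        table.put("row1", "info", "name", 1L, "alice");
        table.put("row1", "info", "name", 2L, "alice-v2");
        System.out.println(table.getLatest("row1", "info", "name"));   // alice-v2
        System.out.println(table.getAsOf("row1", "info", "name", 1L)); // alice
    }
}
```

A column that was never written simply has no entry in the map, which is the same intuition behind empty fields taking no space.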
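0.98.0 is a very important release (2014-2-16)
Source code: https://github.com/apache/hbase/releases
As with similar projects, you either compile it yourself or use the prebuilt CDH release directly (hbase-0.98.6-cdh5.3.6)
Position in the ecosystem
HBase data can be queried quickly; the key lies in how the table's rowkey is designed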
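Why rowkey design matters can be sketched with a sorted map: rows are kept in rowkey order, so a query that shares a rowkey prefix reads one contiguous range instead of the whole table. The sketch below uses only the JDK; the class RowkeyScanSketch and the "userId_date" rowkey layout are hypothetical choices made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Toy stand-in for a table whose rows are sorted by rowkey.
// Hypothetical illustration only; not the HBase scan API.
class RowkeyScanSketch {
    private final NavigableMap<String, String> table = new TreeMap<>();

    void put(String rowkey, String value) {
        table.put(rowkey, value);
    }

    // Returns all rowkeys that start with the prefix by reading one contiguous range.
    // The stop key (prefix + '\uffff') is a simple upper bound that works for ASCII keys.
    List<String> scanPrefix(String prefix) {
        String stop = prefix + Character.MAX_VALUE;
        return new ArrayList<>(table.subMap(prefix, true, stop, false).keySet());
    }

    public static void main(String[] args) {
        RowkeyScanSketch t = new RowkeyScanSketch();
        t.put("u002_20140216", "login");
        t.put("u001_20140217", "purchase");
        t.put("u001_20140216", "login");
        // Rows for user u001 sit next to each other regardless of insertion order.
        System.out.println(t.scanPrefix("u001_")); // [u001_20140216, u001_20140217]
    }
}
```

Putting the most frequently filtered field first in the rowkey keeps the rows a query needs adjacent, which is the design concern the note above points at.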
schema: table name & column family names
value is stored as byte[] (values carry no data type, so Java code needs the byte-conversion utility Bytes)
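Because every value is just a byte[], the application has to encode and decode types itself. In real HBase code this is done with the Bytes utility class (org.apache.hadoop.hbase.util.Bytes); the sketch below uses only the JDK to show the idea, and the class BytesSketch with its method names is hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// JDK-only sketch of typed value <-> byte[] conversion.
// Hypothetical helper names; this is not the HBase Bytes class.
class BytesSketch {
    // Encodes a string as UTF-8 bytes.
    static byte[] encodeString(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Decodes UTF-8 bytes back into a string.
    static String decodeString(byte[] b) {
        return new String(b, StandardCharsets.UTF_8);
    }

    // Encodes a long as 8 bytes, most significant byte first (ByteBuffer's default order).
    static byte[] encodeLong(long v) {
        return ByteBuffer.allocate(Long.BYTES).putLong(v).array();
    }

    // Decodes 8 big-endian bytes back into a long.
    static long decodeLong(byte[] b) {
        return ByteBuffer.wrap(b).getLong();
    }

    public static void main(String[] args) {
        byte[] name = encodeString("alice");
        byte[] count = encodeLong(42L);
        System.out.println(decodeString(name)); // alice
        System.out.println(decodeLong(count));  // 42
        System.out.println(count.length);       // 8
    }
}
```

The reader of a cell must know which type the writer used, since nothing in the stored bytes records it.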
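Column-oriented storage, so a field that has no value takes up no space
3 Configuration and installation
How it works: the client connects to ZooKeeper to find the RegionServer it needs, then connects to that RegionServer directly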
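The "find the RegionServer" step can be sketched as a lookup in a sorted map of region start keys. In the real system the client learns through ZooKeeper where the region location metadata lives and then reads it; this JDK-only sketch models just the final lookup. The class RegionLookupSketch, the split point "m", and the server names are all hypothetical.

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Toy model of locating the region server that holds a rowkey.
// Hypothetical illustration only; not the HBase client's lookup code.
class RegionLookupSketch {
    // region start key -> server hosting that region
    private final NavigableMap<String, String> regionsByStartKey = new TreeMap<>();

    void addRegion(String startKey, String server) {
        regionsByStartKey.put(startKey, server);
    }

    // A rowkey belongs to the region with the greatest start key <= rowkey.
    String locate(String rowkey) {
        Map.Entry<String, String> region = regionsByStartKey.floorEntry(rowkey);
        if (region == null) {
            throw new IllegalStateException("no region covers rowkey " + rowkey);
        }
        return region.getValue();
    }

    public static void main(String[] args) {
        RegionLookupSketch meta = new RegionLookupSketch();
        meta.addRegion("", "regionserver-a");  // first region starts at the empty key
        meta.addRegion("m", "regionserver-b"); // second region starts at "m"
        System.out.println(meta.locate("apple")); // regionserver-a
        System.out.println(meta.locate("zebra")); // regionserver-b
    }
}
```

Once the client knows the server for a rowkey it talks to that server directly, which is why reads and writes do not pass through the master.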
Documentation: https://hbase.apache.org/book.html
http://archive.cloudera.com/cdh5/cdh/5/hbase-0.98.6-cdh5.3.6/
1) Start Hadoop
sbin/hadoop-daemon.sh start namenode
sbin/hadoop-daemon.sh start datanode
2) Configure HBase
http://archive.cloudera.com/cdh5/cdh/5/hbase-0.98.6-cdh5.3.6/book/quickstart.html#d807e245
hbase-site.xml
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://centos01:9000/hbase</value>
  </property>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>centos01</value>
  </property>
</configuration>
Configure the JDK
hbase-env.sh
export JAVA_HOME=/usr/java/jdk1.7.0_25
Whether to use the ZooKeeper bundled with HBase (true = HBase starts and stops its own ZooKeeper)
hbase-env.sh
export HBASE_MANAGES_ZK=true
3) Start HBase
bin/start-hbase.sh
or
bin/hbase-daemon.sh start master
bin/hbase-daemon.sh start regionserver
Stop
bin/stop-hbase.sh
Note: if you compiled HBase yourself, remember to replace the Hadoop jars
Processes running after a normal start:
NameNode
DataNode
HMaster
HQuorumPeer
HRegionServer
TO-DO
??? Meaning of each HBase directory stored in HDFS