Environment:
[hadoop@big-master2 ~]$ cat /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
## bigdata cluster ##
192.168.41.20 big-master1 #bigdata1 namenode1,zookeeper,resourcemanager
192.168.41.21 big-master2 #bigdata2 namenode2,zookeeper,slave,resourcemanager
192.168.41.22 big-slave01 #bigdata3 datanode1,zookeeper,slave
192.168.41.25 big-slave02 #bigdata4 datanode2,zookeeper,slave
192.168.41.27 big-slave03 #bigdata5 datanode3,zookeeper,slave
- HMaster is the implementation of the Master Server. It monitors the RegionServer instances in the cluster and is the interface for all metadata changes; in a cluster it typically runs on the NameNode.
- Interfaces exposed via HMasterInterface: Table (createTable, modifyTable, removeTable, enable, disable), ColumnFamily (addColumn, modifyColumn, removeColumn), Region (move, assign, unassign).
- Background threads run by the Master: the LoadBalancer thread, which moves regions to balance cluster load, and the CatalogJanitor thread, which periodically checks the hbase:meta table.
- HRegionServer is the implementation of the RegionServer; it serves and manages regions. In a cluster, RegionServers run on the DataNodes.
- Interfaces exposed via HRegionInterface: Data (get, put, delete, next, etc.), Region (splitRegion, compactRegion, etc.).
- RegionServer background threads: CompactSplitThread, MajorCompactionChecker, MemStoreFlusher, LogRoller.
- Regions represent slices of a table. A Region holds one Store per column family; each Store has one MemStore and multiple StoreFiles (HFiles), and StoreFiles are in turn made up of Blocks.
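To make this layout concrete: once the cluster built later in this post is running, the region/store hierarchy is visible directly in HDFS. A minimal sketch (the User table comes from the shell examples below; the region directory name is an illustrative placeholder):
## one directory per region under the table, one per column family (Store) under the region
hdfs dfs -ls /hbase/data/default/User
## the column-family directory holds the StoreFiles (HFiles)
hdfs dfs -ls /hbase/data/default/User/<region-encoded-name>/info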
Storage design
In HBase, a table is split into smaller chunks that are stored spread across multiple servers. These chunks are called Regions, and the servers that host them are called RegionServers. The Master process handles distributing regions among the RegionServers. In the HBase implementation, the HRegionServer and HRegion classes represent a RegionServer and a Region. Besides hosting a set of HRegions, an HRegionServer manages two kinds of files for data storage:
- HLog, the write-ahead log file, also called the WAL (write-ahead log)
- HFile, the actual data storage file
HLog
- MasterProcWAL: the HMaster records administrative operations, such as resolving conflicting servers, table creation, and other DDL, in its own WAL files, stored under the MasterProcWALs directory. Like RegionServer WALs, the HMaster's WAL supports failover: if the active Master dies, the Master that takes over continues operating on these files.
- The WAL records all changes to HBase data. If a RegionServer crashes before its MemStore is flushed, the WAL guarantees the changes can be replayed. If the write to the WAL fails, the whole operation that modified the data fails.
- Normally, each RegionServer has a single WAL instance. Before 2.0, the WAL implementation was called HLog.
- WALs are located under the /hbase/WALs/ directory.
- MultiWAL: with a single WAL per RegionServer, all edits must be written to that WAL serially (HDFS appends to a file sequentially), which can become a performance bottleneck. MultiWAL lets a RegionServer write several WALs in parallel through multiple pipelines in the underlying HDFS, raising aggregate throughput, though it does not raise the throughput of a single region.
- WAL configuration:
<!-- enable multiwal -->
<property>
  <name>hbase.wal.provider</name>
  <value>multiwal</value>
</property>
###############################################
----------- Start ---------
###############################################
## Custom HBase distributed deployment layout ##
big-master1 hmaster1
big-master2 hmaster2
big-slave01 HRegionServer1 ## data
big-slave02 HRegionServer2 ## data
big-slave03 HRegionServer3 ## data
###########################################
Download address (official): https://mirror.bit.edu.cn/apache/hbase/
#################################################
As an IT worker, don't we all have a touch of OCD, such as always wanting to run the newest version of every piece of software~
---- Being lazy, I didn't want to upload and lay out some of the screenshots separately, so please bear with the formatting.
HBase and JDK compatibility
HBase Version | JDK 7 | JDK 8 | JDK 9 | JDK 10 |
2.0 | | yes | | |
1.3 | yes | yes | | |
1.2 | yes | yes | | |
As the table shows, JDK 7 or JDK 8 is recommended, but note that HBase 2.0 does not support JDK 7. That is fine in practice, since most enterprise production environments still run the 1.x line.
HBase and Hadoop compatibility
Hadoop version support matrix
- "S" = supported
- "X" = not supported
- "NT" = Not tested
Hadoop Version | HBase-1.2.x | HBase-1.3.x | HBase-1.5.x | HBase-2.0.x | HBase-2.1.x |
Hadoop-2.4.x | S | S | X | X | X |
Hadoop-2.5.x | S | S | X | X | X |
Hadoop-2.6.0 | X | X | X | X | X |
Hadoop-2.6.1+ | S | S | X | S | X |
Hadoop-2.7.0 | X | X | X | X | X |
Hadoop-2.7.1+ | S | S | S | S | S |
Hadoop-2.8.[0-1] | X | X | X | X | X |
Hadoop-2.8.2 | NT | NT | NT | NT | NT |
Hadoop-2.8.3+ | NT | NT | NT | S | S |
Hadoop-2.9.0 | X | X | X | X | X |
Hadoop-2.9.1+ | NT | NT | NT | NT | NT |
Hadoop-3.0.x | X | X | X | X | X |
Hadoop-3.1.0 | X | X | X | X | X |
As the matrix shows, the Hadoop line compatible across all these HBase versions is 2.7.1+, so for learning HBase, 2.8.x, 2.9.x, and 3.x are not the best choices.
Hadoop and JDK compatibility
Version 2.7 and later of Apache Hadoop requires Java 7. It is built and tested on both OpenJDK and Oracle (HotSpot)'s JDK/JRE.
Earlier versions (2.6 and earlier) support Java 6.
Here are the known JDKs in use or which have been tested:
Version | Status | Reported By |
oracle 1.7.0_15 | Good | Cloudera |
oracle 1.7.0_21 | Good (4) | Hortonworks |
oracle 1.7.0_45 | Good | Pivotal |
openjdk 1.7.0_09-icedtea | Good (5) | Hortonworks |
oracle 1.6.0_16 | Avoid (1) | Cloudera |
oracle 1.6.0_18 | Avoid | Many |
oracle 1.6.0_19 | Avoid | Many |
oracle 1.6.0_20 | Good (2) | LinkedIn, Cloudera |
oracle 1.6.0_21 | Good (2) | Yahoo!, Cloudera |
oracle 1.6.0_24 | Good | Cloudera |
oracle 1.6.0_26 | Good (2) | Hortonworks, Cloudera |
oracle 1.6.0_28 | Good | |
oracle 1.6.0_31 | Good (3, 4) | Cloudera, Hortonworks |
As this table shows, of the JDKs that Hadoop depends on, version 7 is well tested, while JDK 8 does not yet appear on this official list. So JDK 7 remains the safer pick, and specifically a mid-series JDK 7 release rather than the very latest one.
Summary
Putting the above together, the suggested installs are:
JDK: Java SE Runtime Environment 7u45 (other 7-series releases are also worth trying and should be fine; download: http://www.oracle.com/technetwork/java/javase/downloads/java-archive-downloads-javase7-521261.html)
Hadoop: 2.7.1+ (download: https://archive.apache.org/dist/hadoop/common/)
HBase: the 1.x line (download: http://archive.apache.org/dist/hbase/)
#####################
Deployment begins: -- This assumes that HDFS, ZooKeeper, time synchronization, SSH equivalence (passwordless login), and the rest of the cluster plumbing are already in place. (True to the "use the latest" quip above, the walkthrough below actually deploys HBase 2.2.5 on JDK 8 rather than the conservative picks from the summary; the current official matrix supports that pairing.)
(1)
Download and unpack:
tar -zxvf hbase-2.2.5-bin.tar.gz -C /usr/local/
cd /usr/local/
mv hbase-2.2.5 hbase
[root@big-master1 ~]# cd /usr/local/hbase/
[root@big-master1 hbase]# ls
bin CHANGES.md conf hbase-webapps LEGAL lib LICENSE.txt logs NOTICE.txt README.txt RELEASENOTES.md
(2)
Configure parameters:
1. Configure the HBase-related global variables in /etc/profile.
### JDK ###
JAVA_HOME=/usr/local/jdk1.8.0_251
CLASSPATH=$JAVA_HOME/lib/tools.jar:$JAVA_HOME/lib/dt.jar
PATH=$JAVA_HOME/bin:$PATH
export JAVA_HOME CLASSPATH PATH
### zookeeper ##
export ZK_HOME=/usr/local/zookeeper
export PATH=$ZK_HOME/bin:$PATH
### hadoop ##
export HADOOP_HOME=/usr/local/hadoop
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
## tools ##
export PATH=/home/hadoop/tools:$PATH
## sqoop ##
export SQOOP_HOME=/usr/local/sqoop
export PATH=$SQOOP_HOME/bin:$PATH
## flume ##
export FLUME_HOME=/usr/local/flume
export PATH=$FLUME_HOME/bin:$PATH
## hbase ##
export HBASE_HOME=/usr/local/hbase
export PATH=$HBASE_HOME/bin:$PATH
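After editing /etc/profile, reload it so the new variables take effect in the current shell, and spot-check one of them:
source /etc/profile
echo $HBASE_HOME   ## should print /usr/local/hbase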
2. Edit the file $HBASE_HOME/conf/hbase-env.sh, adding or changing the following:
## JDK path
export JAVA_HOME=/usr/local/jdk1.8.0_251
## disable the ZooKeeper instance bundled with HBase (the external ensemble is used instead)
export HBASE_MANAGES_ZK=false
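Since HBASE_MANAGES_ZK=false hands ZooKeeper over to the external ensemble, it is worth confirming that the ensemble is healthy before starting HBase. A sketch using the four-letter-word check (assumes ruok is enabled/whitelisted on your ZooKeeper version):
for h in big-master1 big-master2 big-slave01 big-slave02 big-slave03; do
  echo -n "$h: "; echo ruok | nc $h 2181; echo ""
done   ## each node should answer imok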
3. Edit the file $HBASE_HOME/conf/regionservers, setting its contents as follows:
[hadoop@big-master1 conf]$ pwd
/usr/local/hbase/conf
[hadoop@big-master1 conf]$ cat regionservers
big-slave01
big-slave02
big-slave03
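Optionally, HBase also reads $HBASE_HOME/conf/backup-masters: listing big-master2 there lets start-hbase.sh bring up the standby HMaster automatically, instead of starting it by hand as done in part (3) below:
echo big-master2 > /usr/local/hbase/conf/backup-masters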
4. Edit the file $HBASE_HOME/conf/hbase-site.xml, adding the following:
<configuration>
<!-- ZooKeeper quorum addresses that HBase connects to -->
<property>
<name>hbase.zookeeper.quorum</name>
<value>big-master1:2181,big-master2:2181,big-slave01:2181,big-slave02:2181,big-slave03:2181</value>
<description>
Comma separated list of servers in the ZooKeeper quorum.
</description>
</property>
<!-- client port for HBase's ZooKeeper access -->
<property>
<name>hbase.zookeeper.property.clientPort</name>
<value>2181</value>
</property>
<!-- local directory for ZooKeeper data (zoo.cfg dataDir) -->
<property>
<name>hbase.zookeeper.property.dataDir</name>
<value>/data/hbase/zk</value>
<description>
Property from ZooKeeper config zoo.cfg. The directory where the snapshot is stored.
</description>
</property>
<!-- root directory in HDFS where HBase stores its data -->
<property>
<name>hbase.rootdir</name>
<value>hdfs://cluster1/hbase</value>
<description>
The directory shared by RegionServers.
</description>
</property>
<!-- enable distributed mode; false means standalone -->
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
<description>
Possible values are false: standalone and pseudo-distributed setups with managed
ZooKeeper; true: fully-distributed with an unmanaged ZooKeeper quorum (see hbase-env.sh).
</description>
</property>
</configuration>
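One caveat worth flagging for this HA setup: hbase.rootdir points at the HDFS nameservice cluster1 rather than a host:port, so HBase must be able to resolve that nameservice. A common approach (an assumption here, adjust to your layout) is to link Hadoop's client configs into HBase's conf directory:
ln -s /usr/local/hadoop/etc/hadoop/core-site.xml /usr/local/hbase/conf/core-site.xml
ln -s /usr/local/hadoop/etc/hadoop/hdfs-site.xml /usr/local/hbase/conf/hdfs-site.xml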
5. Copy the /usr/local/hbase directory and the /etc/profile file to the other nodes, big-master2 and big-slave01 - big-slave03, and set ownership (the profile copy and ownership steps are sketched right after the rsync commands below):
[root@big-master1 local]# rsync -avzP /usr/local/hbase big-master2:/usr/local/
[root@big-master1 local]# rsync -avzP /usr/local/hbase big-slave01:/usr/local/
[root@big-master1 local]# rsync -avzP /usr/local/hbase big-slave02:/usr/local/
[root@big-master1 local]# rsync -avzP /usr/local/hbase big-slave03:/usr/local/
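The /etc/profile copy and the ownership step from the description above, as a sketch (the hadoop user and group match the accounts used throughout this post):
for h in big-master2 big-slave01 big-slave02 big-slave03; do
  rsync -avzP /etc/profile ${h}:/etc/profile
  ssh ${h} "chown -R hadoop:hadoop /usr/local/hbase"
done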
6. Create the corresponding directory:
mkdir -pv /data/hbase/zk
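This directory must exist on every node, not just big-master1. A sketch to create it across the cluster (assumes root SSH access to the other nodes):
for h in big-master2 big-slave01 big-slave02 big-slave03; do
  ssh ${h} "mkdir -pv /data/hbase/zk && chown -R hadoop:hadoop /data/hbase"
done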
(3)
Start HBase and verify:
Start the HBase cluster services on big-master1.
[hadoop@big-master1 ~]$ start-hbase.sh
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/hbase/lib/client-facing-thirdparty/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
running master, logging to /usr/local/hbase/logs/hbase-hadoop-master-big-master1.out
big-slave02: running regionserver, logging to /usr/local/hbase/bin/../logs/hbase-hadoop-regionserver-big-slave02.out
big-slave03: running regionserver, logging to /usr/local/hbase/bin/../logs/hbase-hadoop-regionserver-big-slave03.out
big-slave01: running regionserver, logging to /usr/local/hbase/bin/../logs/hbase-hadoop-regionserver-big-slave01.out
[hadoop@big-master1 ~]$ jps
30037 JournalNode
10181 HMaster
10438 Jps
4023 ResourceManager
29642 DFSZKFailoverController
29804 NameNode
28141 QuorumPeerMain
Start another HMaster process on the big-master2 node to form a high-availability pair.
[hadoop@big-master2 ~]$ hbase-daemon.sh start master
running master, logging to /usr/local/hbase/logs/hbase-hadoop-master-big-master2.out
[hadoop@big-master2 ~]$ jps
20032 NameNode
20116 JournalNode
20324 DFSZKFailoverController
31540 HMaster
31704 Jps
18830 QuorumPeerMain
2462 ResourceManager
On the other slave nodes, simply verify with jps:
[hadoop@big-slave01 ~]$ jps
10161 NodeManager
28513 Jps
28338 HRegionServer
7702 QuorumPeerMain
8583 DataNode
8686 JournalNode
[hadoop@big-slave02 ~]$ jps
26097 Jps
5187 DataNode
6697 NodeManager
4362 QuorumPeerMain
5290 JournalNode
25869 HRegionServer
[hadoop@big-slave03 ~]$ jps
26193 Jps
4562 QuorumPeerMain
5442 DataNode
26004 HRegionServer
6903 NodeManager
5545 JournalNode
HBase's web UI listens on port 16010 by default; open it directly:
http://192.168.41.20:16010/master-status
http://192.168.41.21:16010/master-status
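If you prefer the command line to the web UI, the shell's status command reports the active and backup masters as well; a quick sketch (the -n non-interactive flag is available in the HBase 2.x shell):
echo "status 'simple'" | hbase shell -n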
(4)
Basic operations:
[root@big-master1 ~]# su - hadoop
Last login: Thu Jun 4 23:34:55 CST 2020 on pts/0
[hadoop@big-master1 ~]$ hbase shell
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/hbase/lib/client-facing-thirdparty/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
HBase Shell
Use "help" to get list of supported commands.
Use "exit" to quit this interactive shell.
For Reference, please visit: http://hbase.apache.org/2.0/book.html#shell
Version 2.2.5, rf76a601273e834267b55c0cda12474590283fd4c, Thu May 21 18:34:40 CST 2020
Took 0.0082 seconds
hbase(main):001:0> whoami
hadoop (auth:SIMPLE)
groups: hadoop
Took 0.0332 seconds
hbase(main):002:0> create 'User','info'
Created table User
Took 10.1702 seconds
=> Hbase::Table - User
hbase(main):003:0> list
TABLE
User
1 row(s)
Took 0.0573 seconds
=> ["User"]
--- The content below is excerpted and adapted ------
- Table schema
1. Create a table
Syntax: create <table>, {NAME => <family>, VERSIONS => <VERSIONS>}
Create a User table with one column family, info:
hbase(main):002:0> create 'User','info'
0 row(s) in 1.5890 seconds
=> Hbase::Table - User
2. List all tables
hbase(main):003:0> list
TABLE
SYSTEM.CATALOG
SYSTEM.FUNCTION
SYSTEM.SEQUENCE
SYSTEM.STATS
TEST.USER
User
6 row(s) in 0.0340 seconds
=> ["SYSTEM.CATALOG", "SYSTEM.FUNCTION", "SYSTEM.SEQUENCE", "SYSTEM.STATS", "TEST.USER", "User"]
3. Describe a table
hbase(main):004:0> describe 'User'
Table User is ENABLED
User
COLUMN FAMILIES DESCRIPTION
{NAME => 'info', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
1 row(s) in 0.1410 seconds
hbase(main):025:0> desc 'User'
Table User is ENABLED
User
COLUMN FAMILIES DESCRIPTION
{NAME => 'info', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
1 row(s) in 0.0380 seconds
4. Alter a table
Delete a given column family:
hbase(main):002:0> alter 'User', 'delete' => 'info'
Updating all regions with the new schema...
1/1 regions updated.
Done.
0 row(s) in 2.5340 seconds
- Table data
1. Insert data
Syntax: put <table>, <rowkey>, <family:column>, <value>
hbase(main):005:0> put 'User', 'row1', 'info:name', 'xiaoming'
0 row(s) in 0.1200 seconds
hbase(main):006:0> put 'User', 'row2', 'info:age', '18'
0 row(s) in 0.0170 seconds
hbase(main):007:0> put 'User', 'row3', 'info:sex', 'man'
0 row(s) in 0.0030 seconds
2. Get a record by rowkey
Syntax: get <table>, <rowkey>, [<family:column>, ...]
hbase(main):008:0> get 'User', 'row2'
COLUMN                CELL
 info:age             timestamp=1502368069926, value=18
1 row(s) in 0.0280 seconds
hbase(main):028:0> get 'User', 'row3', 'info:sex'
COLUMN                CELL
 info:sex             timestamp=1502368093636, value=man
hbase(main):036:0> get 'User', 'row1', {COLUMN => 'info:name'}
COLUMN                CELL
 info:name            timestamp=1502368030841, value=xiaoming
1 row(s) in 0.0120 seconds
3. Scan records
Syntax: scan <table>, {COLUMNS => [<family:column>, ...], LIMIT => num}
Scan all records:
hbase(main):009:0> scan 'User'
ROW                   COLUMN+CELL
 row1                 column=info:name, timestamp=1502368030841, value=xiaoming
 row2                 column=info:age, timestamp=1502368069926, value=18
 row3                 column=info:sex, timestamp=1502368093636, value=man
3 row(s) in 0.0380 seconds
Scan the first 2 rows:
hbase(main):037:0> scan 'User', {LIMIT => 2}
ROW                   COLUMN+CELL
 row1                 column=info:name, timestamp=1502368030841, value=xiaoming
 row2                 column=info:age, timestamp=1502368069926, value=18
2 row(s) in 0.0170 seconds
Range scan:
hbase(main):011:0> scan 'User', {STARTROW => 'row2'}
ROW                   COLUMN+CELL
 row2                 column=info:age, timestamp=1502368069926, value=18
 row3                 column=info:sex, timestamp=1502368093636, value=man
2 row(s) in 0.0170 seconds
hbase(main):012:0> scan 'User', {STARTROW => 'row2', ENDROW => 'row2'}
ROW                   COLUMN+CELL
 row2                 column=info:age, timestamp=1502368069926, value=18
1 row(s) in 0.0110 seconds
hbase(main):013:0> scan 'User', {STARTROW => 'row2', ENDROW => 'row3'}
ROW                   COLUMN+CELL
 row2                 column=info:age, timestamp=1502368069926, value=18
1 row(s) in 0.0120 seconds
Additionally, advanced options such as TIMERANGE and FILTER can be added.
STARTROW and ENDROW must be uppercase, or an error is raised; the result set does not include the row equal to ENDROW.
4. Count the rows in a table
Syntax: count <table>, {INTERVAL => intervalNum, CACHE => cacheNum}
INTERVAL sets how often the running count and its rowkey are printed, default 1000; CACHE is the scanner cache size per fetch, default 10; raising it can speed up the count.
hbase(main):020:0> count 'User'
3 row(s) in 0.0360 seconds
=> 3
5. Delete
Delete a column:
hbase(main):008:0> delete 'User', 'row1', 'info:age'
0 row(s) in 0.0290 seconds
Delete an entire row:
hbase(main):014:0> deleteall 'User', 'row2'
0 row(s) in 0.0090 seconds
Delete all data in a table:
hbase(main):016:0> truncate 'User'
Truncating 'User' table (it may take a while):
 - Disabling table...
 - Truncating table...
0 row(s) in 3.6610 seconds
- Table management
1. Disable a table
hbase(main):014:0> disable 'User'
0 row(s) in 2.2660 seconds
hbase(main):015:0> describe 'User'
Table User is DISABLED
User
COLUMN FAMILIES DESCRIPTION
{NAME => 'info', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
1 row(s) in 0.0340 seconds
hbase(main):016:0> scan 'User', {STARTROW => 'row2', ENDROW => 'row3'}
ROW                   COLUMN+CELL
ERROR: User is disabled.
2. Enable a table
hbase(main):017:0> enable 'User'
0 row(s) in 1.3470 seconds
hbase(main):018:0> describe 'User'
Table User is ENABLED
User
COLUMN FAMILIES DESCRIPTION
{NAME => 'info', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
1 row(s) in 0.0310 seconds
hbase(main):019:0> scan 'User', {STARTROW => 'row2', ENDROW => 'row3'}
ROW                   COLUMN+CELL
 row2                 column=info:age, timestamp=1502368069926, value=18
1 row(s) in 0.0280 seconds
3. Check whether a table exists
hbase(main):022:0> exists 'User'
Table User does exist
0 row(s) in 0.0150 seconds
hbase(main):023:0> exists 'user'
Table user does not exist
0 row(s) in 0.0110 seconds
hbase(main):024:0> exists user
NameError: undefined local variable or method `user' for #<Object:0x412ebe64>
4. Drop a table
A table must be disabled before it can be dropped:
hbase(main):030:0> drop 'TEST.USER'
ERROR: Table TEST.USER is enabled. Disable it first.
Here is some help for this command:
Drop the named table. Table must first be disabled:
  hbase> drop 't1'
  hbase> drop 'ns1:t1'
hbase(main):031:0> disable 'TEST.USER'
0 row(s) in 2.2640 seconds
hbase(main):033:0> drop 'TEST.USER'
0 row(s) in 1.2490 seconds
hbase(main):034:0> list
TABLE
SYSTEM.CATALOG
SYSTEM.FUNCTION
SYSTEM.SEQUENCE
SYSTEM.STATS
User
5 row(s) in 0.0080 seconds
=> ["SYSTEM.CATALOG", "SYSTEM.FUNCTION", "SYSTEM.SEQUENCE", "SYSTEM.STATS", "User"]