1. Preparation
- A working Hadoop environment, installed in advance
- A working ZooKeeper environment, installed in advance
- The hbase-2.4.9-bin.tar.gz installation package
2. Extract, move, and rename
sudo tar -zxvf ~/hbase-2.4.9-bin.tar.gz -C /usr/local/
# extract
cd /usr/local
sudo mv hbase-2.4.9 hbase
# rename
3. Fix user and group ownership
sudo chown -R hadoop /usr/local/hbase
sudo chgrp -R hadoop /usr/local/hbase
4. Configuration changes
- System environment variables: /etc/profile
- hbase-env.sh
- hbase-site.xml
- regionservers (the cluster node list)
4.1 System environment variables: /etc/profile
sudo vi /etc/profile
# add the following
#####
export HBASE_HOME=/usr/local/hbase
# append the following to the PATH variable
#####
:$HBASE_HOME/bin
Reload the environment variables:
source /etc/profile
Make the same environment-variable changes on every other node in the cluster.
4.2 hbase-env.sh
sudo vi /usr/local/hbase/conf/hbase-env.sh
# uncomment the export JAVA_HOME line and point it at your JDK
###
export JAVA_HOME=/usr/lib/jdk/jdk1.8.0_321/
###
# uncomment export HBASE_MANAGES_ZK=true and change it to false (we run our own ZooKeeper)
export HBASE_MANAGES_ZK=false
4.3 hbase-site.xml
sudo vi /usr/local/hbase/conf/hbase-site.xml
<property>
<name>hbase.tmp.dir</name>
<value>/usr/local/hbase/tmp</value>
</property>
<property>
<name>hbase.rootdir</name>
<value>hdfs://master-msi:9000/hbase</value>
</property>
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
</property>
<property>
<name>hbase.unsafe.stream.capability.enforce</name>
<value>false</value>
</property>
<property>
<name>hbase.zookeeper.quorum</name>
<value>master-msi,slave1-msi,slave2-msi</value>
</property>
<property>
<name>hbase.zookeeper.property.dataDir</name>
<value>/usr/local/hbase/tmp/zk/data</value>
</property>
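All of the properties above must sit inside the single <configuration> root element of hbase-site.xml. Before starting the cluster it can be worth sanity-checking the file; the sketch below (an illustrative helper, not part of HBase; the path assumes the install location used above) parses the file with Python's standard library and lists each property:

```python
# Sanity-check a Hadoop-style XML config: parse it and list name/value pairs.
import os
import xml.etree.ElementTree as ET

def list_properties(path):
    """Return {name: value} for every <property> in a Hadoop-style XML file."""
    root = ET.parse(path).getroot()  # raises ParseError on malformed XML
    props = {}
    for prop in root.iter("property"):
        props[prop.findtext("name")] = prop.findtext("value")
    return props

path = "/usr/local/hbase/conf/hbase-site.xml"
if os.path.exists(path):
    for name, value in list_properties(path).items():
        print(name, "=", value)
```

If the XML is malformed (for example an unclosed property tag), the parse raises an error immediately instead of HBase failing at startup with a less obvious message.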
4.4 regionservers
sudo vi /usr/local/hbase/conf/regionservers
##############################
192.168.1.210
192.168.1.211
192.168.1.212
##############################
4.5 Distribute the files to the cluster
scp -r /usr/local/hbase nodename1:~
# then, in a shell on nodename1:
sudo mv ~/hbase /usr/local
scp -r /usr/local/hbase nodename2:~
# then, in a shell on nodename2:
sudo mv ~/hbase /usr/local
Note: check whether the hbase directory on each node is owned by root. If it is, change the ownership with chown and chgrp (as in step 3); otherwise the cluster will fail at first startup because HBase cannot create its ./logs directory.
5. Start the cluster
5.1 Prerequisites:
In a fully distributed setup, start the Hadoop and ZooKeeper clusters first (required when HBASE_MANAGES_ZK=false).
Check the cluster state with jps (this can be wrapped in a shell script to check every node at once):
###################master######################
1536 NameNode
1701 DataNode
9190 Jps
2855 RunJar
2136 ResourceManager
1947 SecondaryNameNode
2315 NodeManager
9133 QuorumPeerMain
###################slave1#####################
1393 DataNode
6922 Jps
6860 QuorumPeerMain
1582 NodeManager
###################slave2#####################
1394 DataNode
6377 Jps
6316 QuorumPeerMain
1583 NodeManager
################### OK, hadoop + zookeeper started successfully #################
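The per-node jps checks above can be scripted. As a sketch (illustrative only; the role names and the expected-daemon sets are assumptions based on this particular cluster), the following compares a node's jps output against the daemons it should be running:

```python
# Check that a node's `jps` output contains every expected daemon.
# (Illustrative; in practice, collect the text via `ssh <node> jps`.)

EXPECTED = {
    "master": {"NameNode", "DataNode", "SecondaryNameNode",
               "ResourceManager", "NodeManager", "QuorumPeerMain"},
    "slave":  {"DataNode", "NodeManager", "QuorumPeerMain"},
}

def missing_daemons(jps_output, role):
    """Return the set of expected daemons absent from this jps output."""
    running = {parts[1] for line in jps_output.splitlines()
               if len(parts := line.split()) == 2}
    return EXPECTED[role] - running

sample = """\
1536 NameNode
1701 DataNode
2136 ResourceManager
1947 SecondaryNameNode
2315 NodeManager
9133 QuorumPeerMain
9190 Jps
"""
print(missing_daemons(sample, "master"))  # prints set() when all daemons are up
```

After start-hbase.sh, HMaster and HRegionServer can be added to the expected sets in the same way.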
5.2 Start the HBase cluster
As with Hadoop, run ./bin/start-hbase.sh from the $HBASE_HOME directory.
Troubleshooting startup errors:
/usr/local/hadoop/libexec/hadoop-functions.sh: line 2366: HADOOP_ORG.APACHE.HADOOP.HBASE.UTIL.GETJAVAPROPERTY_USER: invalid variable name
/usr/local/hadoop/libexec/hadoop-functions.sh: line 2461: HADOOP_ORG.APACHE.HADOOP.HBASE.UTIL.GETJAVAPROPERTY_OPTS: invalid variable name
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/hbase/lib/client-facing-thirdparty/slf4j-log4j12-1.7.30.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
running master, logging to /usr/local/hbase/logs/hbase-hadoop-master-master-msi.out
/usr/local/hadoop/libexec/hadoop-functions.sh: line 2366: HADOOP_ORG.APACHE.HADOOP.HBASE.UTIL.GETJAVAPROPERTY_USER: invalid variable name
/usr/local/hadoop/libexec/hadoop-functions.sh: line 2461: HADOOP_ORG.APACHE.HADOOP.HBASE.UTIL.GETJAVAPROPERTY_OPTS: invalid variable name
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/hbase/lib/client-facing-thirdparty/slf4j-log4j12-1.7.30.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
192.168.1.212: running regionserver, logging to /usr/local/hbase/bin/../logs/hbase-hadoop-regionserver-slave2-msi.out
192.168.1.211: running regionserver, logging to /usr/local/hbase/bin/../logs/hbase-hadoop-regionserver-slave1-msi.out
192.168.1.210: running regionserver, logging to /usr/local/hbase/bin/../logs/hbase-hadoop-regionserver-master-msi.out
Cause:
The "invalid variable name" and "could not find or load main class" errors are most likely caused by a classpath conflict between the Hadoop and HBase jar paths.
sudo vi /usr/local/hbase/conf/hbase-env.sh
# uncomment the last configuration item in the file
# (in HBase 2.4.x this is: export HBASE_DISABLE_HADOOP_CLASSPATH_LOOKUP="true")
Re-run start-hbase.sh and most of these error messages will be gone.
5.3 Verify:
5.3.1 Check the cluster state with jps:
###################master######################
1536 NameNode
1701 DataNode
11287 HRegionServer
2855 RunJar
2136 ResourceManager
1947 SecondaryNameNode
2315 NodeManager
9133 QuorumPeerMain
11038 HMaster
12303 Jps
###################slave1#####################
7872 HRegionServer
8544 Jps
1393 DataNode
6860 QuorumPeerMain
1582 NodeManager
###################slave2#####################
1394 DataNode
8043 Jps
6316 QuorumPeerMain
7340 HRegionServer
1583 NodeManager
################### OK, hadoop + zookeeper + hbase started successfully #################
5.3.2 Enter the shell and run basic CRUD operations
./bin/hbase shell
Create a table (with default table properties; HBase keeps old cell values as historical versions, and the number of versions retained is configurable per column family).
(HBase organizes columns into column families; the statement below creates a table with six of them.)
create 'test_hbase','id','name','age','phone','email','home_address'
5.3.2.1 Insert: put
put writes one cell per command; the cell value automatically carries a timestamp, and the row key must be supplied. (A full column name is 'family:qualifier'; with the qualifier omitted, as below, the value is stored under the family's empty qualifier.)
put 'test_hbase','0001','id','1'
put 'test_hbase','0001','name','mahuateng'
put 'test_hbase','0001','age','18'
put 'test_hbase','0001','phone','13888888888'
put 'test_hbase','0001','email','123456@qq.com'
put 'test_hbase','0001','home_address','shenzhen'
5.3.2.2 Read: get, scan
get takes the table name and row key; columns in the output are sorted alphabetically by default.
get 'test_hbase','0001'
hbase:027:0> get 'test_hbase','0001'
COLUMN CELL
age: timestamp=2022-03-09T21:18:56.460, value=18
email: timestamp=2022-03-09T21:20:15.704, value=123456@qq.com
home_address: timestamp=2022-03-09T21:21:41.139, value=shenzhen
id: timestamp=2022-03-09T21:18:40.881, value=1
name: timestamp=2022-03-09T21:12:12.619, value=mahuateng
phone: timestamp=2022-03-09T21:19:55.153, value=13888888888
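When get output like the above needs to be consumed by a script, it can be parsed line by line. A minimal sketch (an illustrative helper, not an HBase API):

```python
# Parse the text output of the HBase shell `get` command into a
# {column: (timestamp, value)} dict.
import re

def parse_get_output(text):
    """Match lines of the form 'col: timestamp=..., value=...'."""
    rows = {}
    for line in text.splitlines():
        m = re.match(r"\s*(\S+):\s+timestamp=(\S+),\s+value=(.*)", line)
        if m:
            col, ts, value = m.groups()
            rows[col] = (ts, value.strip())
    return rows

sample = """\
COLUMN   CELL
 age:    timestamp=2022-03-09T21:18:56.460, value=18
 name:   timestamp=2022-03-09T21:12:12.619, value=mahuateng
"""
print(parse_get_output(sample))
```

For anything beyond quick scripting, the Thrift-based clients in section 6.2 are the better interface.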
Scan the whole table:
scan 'test_hbase'
5.3.2.3 Delete: delete, deleteall, truncate
delete removes a single column from the specified row.
hbase:003:0> delete 'test_hbase','0001','id'
deleteall removes an entire row.
hbase:005:0> deleteall 'test_hbase','0001'
truncate empties the table, deleting all rows; it implicitly disables the table first.
hbase:007:0> truncate 'test_hbase'
5.3.2.4 Disable, enable, and drop a table
disable disables a table.
hbase:007:0> disable 'test_hbase'
enable re-enables a table.
hbase:006:0> enable 'test_hbase'
drop deletes a table (the table must be disabled first); to reuse the name, recreate the table.
hbase:004:0> drop 'test_hbase'
5.3.2.5 Update: modifying table data
Updates use the same put command as inserts;
the old value is retained as a historical version (up to the column family's VERSIONS limit).
6. HBase API access
6.1 HBase web UI
The HBase Master status page listens on port 16010 by default, on the master host.
Each RegionServer's info page listens on port 16030 by default, on that regionserver's host.
Both port numbers are program defaults.
Difference: the 16010 page is more detailed and includes table properties.
6.2 Connecting to HBase from Python
Python clients connect through the Thrift server, which must be started on the server side first.
Default port: 9090
# method 1: run as a daemon, logging to a file
./bin/hbase-daemon.sh start thrift
# method 2: run in the foreground for debugging, logging to the console
./bin/hbase thrift start
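Before pointing a client at the Thrift server, it helps to confirm the port actually accepts connections. A minimal sketch using only the standard library (illustrative; the host and port are the ones assumed throughout this guide):

```python
# Check whether a TCP port is reachable (e.g. the HBase Thrift server on 9090).
import socket

def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

print(port_is_open("127.0.0.1", 9090))
```

If this prints False on the Thrift server's own host, check that the thrift daemon is running (jps shows a ThriftServer process) before debugging the client side.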
1. Connecting with the happybase wrapper
Install happybase and its dependencies, then open a connection:
# -*- coding: utf-8 -*-
import happybase

con = happybase.Connection(host="192.168.1.210", port=9090)
# full parameter list, for reference:
# con = happybase.Connection(host="192.168.1.210", port=9090,
#     timeout=None, autoconnect=True, table_prefix=None,
#     table_prefix_separator=b'_', compat='0.98',
#     transport='buffered', protocol='binary')
con.open()
print(con.tables())

# WARNING: this drops every existing table (table names come back as bytes)
for name in con.tables():
    con.delete_table(name.decode(), disable=True)
print(con.tables())

# recreate the table with six column families, keeping 5 versions per cell
table_hbase_dic = {'id': dict(max_versions=5, block_cache_enabled=False),
                   'name': dict(max_versions=5, block_cache_enabled=False),
                   'age': dict(max_versions=5, block_cache_enabled=False),
                   'phone': dict(max_versions=5, block_cache_enabled=False),
                   'email': dict(max_versions=5, block_cache_enabled=False),
                   'home_address': dict(max_versions=5, block_cache_enabled=False)
                   }
con.create_table('test_hbase', table_hbase_dic)
print(con.tables())
con.close()
2. Connecting with thrift and the generated hbase module
# -*- coding: utf-8 -*-
from thrift.transport import TSocket, TTransport
from thrift.protocol import TBinaryProtocol
from hbase import Hbase  # module generated from the HBase Thrift IDL

# server address and port: the host running the Thrift server (here the
# HMaster node); the Thrift server's default port is 9090
socket = TSocket.TSocket(host='192.168.1.210', port=9090)
# an optional socket timeout, in milliseconds
socket.setTimeout(5000)
# transport layer: TBufferedTransport or TFramedTransport (must match the server)
transport = TTransport.TBufferedTransport(socket)
# protocol: the default is the simple binary serialization protocol
protocol = TBinaryProtocol.TBinaryProtocol(transport)
client = Hbase.Client(protocol)

transport.open()
print(client.getTableNames())
transport.close()