Introduction to HBase

Author: wealon

Categories: hadoop, hbase

Article link: https://blog.csdn.net/wealon/article/details/41924497


★ HBase basics

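Every table in HBase is a BigTable-style table: a sparse, distributed, sorted map.

RowKeys and column keys are binary arrays (byte[]).

A timestamp is a 64-bit integer (a Java long). By default it is the time, in milliseconds, at which the RegionServer wrote the cell.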

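What can serve as a RowKey?

Strings, integers, binary strings, and even serialized structures can all be used as row keys, once converted to byte[]. Rows are stored sorted by row key in lexicographic byte order.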
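Because rows are ordered by their raw bytes, the encoding chosen for a row key decides how rows sort. The sketch below uses only the JDK: it compares keys byte by byte as unsigned values (the ordering HBase applies to row keys) and, as an assumption for illustration, encodes integers as 4 big-endian bytes, the same layout as HBase's Bytes.toBytes(int). The class and method names are made up for this example.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

final class RowKeyOrder {
    // Compare two keys byte by byte, treating each byte as unsigned (0..255).
    static int compare(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xff;
            int y = b[i] & 0xff;
            if (x != y) {
                return x - y;
            }
        }
        return a.length - b.length; // a key that is a prefix of another sorts first
    }

    // String keys: their UTF-8 bytes.
    static byte[] fromString(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Integer keys: 4 bytes, big-endian (ByteBuffer's default byte order).
    static byte[] fromInt(int v) {
        return ByteBuffer.allocate(4).putInt(v).array();
    }

    public static void main(String[] args) {
        // As text, "10" sorts before "9" because '1' < '9'.
        System.out.println(compare(fromString("10"), fromString("9")) < 0); // true
        // As big-endian ints, 9 sorts before 10 for non-negative values.
        System.out.println(compare(fromInt(9), fromInt(10)) < 0);           // true
        // A negative int has its top bit set, so it sorts after positive ones.
        System.out.println(compare(fromInt(-1), fromInt(1)) > 0);           // true
    }
}
```

So if numeric order matters, string keys need zero-padding, and binary int keys need extra care for negative values.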

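Every column in HBase must belong to a column family (ColumnFamily).

A column is written as family:qualifier (the column family, then a label), for example info:age.

Physically, data from the same column family is stored together.

Values in a column family are stored as raw bytes; users must convert types themselves.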
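Since a cell holds only bytes, the caller must encode a value on write and decode it with the same scheme on read. A minimal JDK-only sketch, assuming a toy row stored as a map from "family:qualifier" to byte[] (not how HBase lays data out on disk), with ByteBuffer and UTF-8 standing in for HBase's org.apache.hadoop.hbase.util.Bytes helpers:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;

final class CellCodec {
    static byte[] encodeInt(int v) {
        return ByteBuffer.allocate(Integer.BYTES).putInt(v).array();
    }

    static int decodeInt(byte[] b) {
        return ByteBuffer.wrap(b).getInt();
    }

    static byte[] encodeString(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    static String decodeString(byte[] b) {
        return new String(b, StandardCharsets.UTF_8);
    }

    // Split "info:age" into {"info", "age"}; the first colon separates family and qualifier.
    static String[] splitColumn(String column) {
        int colon = column.indexOf(':');
        if (colon <= 0) {
            throw new IllegalArgumentException("expected family:qualifier, got " + column);
        }
        return new String[] {column.substring(0, colon), column.substring(colon + 1)};
    }

    public static void main(String[] args) {
        Map<String, byte[]> row = new TreeMap<>();   // toy row: column -> raw bytes
        row.put("info:age", encodeInt(24));
        row.put("info:company", encodeString("easynet"));
        System.out.println(decodeInt(row.get("info:age")));         // 24
        System.out.println(decodeString(row.get("info:company")));  // easynet
        System.out.println(splitColumn("info:age")[0]);             // info
    }
}
```

Note that the shell command put 'users','xiaotou','info:age','24' stores the two bytes of the string "24", so a Java client reading that cell has to decode it as a string, not as a 4-byte int.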

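When a table grows beyond the configured size (hbase.hregion.max.filesize), it is split into multiple regions. A region is identified by its start RowKey and end RowKey, and the regions (HRegions) are spread across different RegionServers. The figure below shows HBase's physical storage: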
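Finding the region that holds a row means finding the region with the greatest start key that is not larger than the row key, because each region covers [start key, end key). A stdlib sketch with a TreeMap, assuming ASCII string keys (so String.compareTo agrees with HBase's byte order) and hypothetical server names rs1 to rs3:

```java
import java.util.Map;
import java.util.TreeMap;

final class RegionMap {
    // start key -> server hosting the region that begins at that key
    private final TreeMap<String, String> startKeyToServer = new TreeMap<>();

    void addRegion(String startKey, String server) {
        startKeyToServer.put(startKey, server);
    }

    // The region holding a row is the one with the greatest start key <= row.
    String locate(String row) {
        Map.Entry<String, String> e = startKeyToServer.floorEntry(row);
        if (e == null) {
            throw new IllegalStateException("no region covers " + row);
        }
        return e.getValue();
    }

    public static void main(String[] args) {
        RegionMap users = new RegionMap();
        users.addRegion("", "rs1");   // ["", "g")  the first region starts at the empty key
        users.addRegion("g", "rs2");  // ["g", "p")
        users.addRegion("p", "rs3");  // ["p", end of table)
        System.out.println(users.locate("datou"));   // rs1
        System.out.println(users.locate("lili"));    // rs2
        System.out.println(users.locate("xiaotou")); // rs3
    }
}
```

A start key is inclusive, so the row "g" itself belongs to the region hosted on rs2.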

★ HBase architecture

The architecture consists of the Client, ZooKeeper, the Master, and the RegionServers, as shown in the figure below.

▲ Client

The client contains the interfaces for accessing HBase and maintains caches, such as region location information, to speed up access to HBase.

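▲ The role of ZooKeeper

1: Guarantees that only one Master is running in the cluster at any time.

2: Stores the addressing entry point for all regions.

3: Monitors RegionServer status in real time and notifies the HMaster when RegionServers come online or go offline.

4: Stores the HBase schema, including which tables exist and which column families each table has.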

▲ Master

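Several HMaster processes can be started; ZooKeeper's master election ensures that exactly one Master is active at any time.

Hence the client does not need the Master to access data in HBase: addressing goes through ZooKeeper and the RegionServers, and data reads and writes go to the RegionServers.

HRegionServer is mainly responsible for serving user I/O requests and reading and writing data in HDFS.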

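★ The role of the RegionServer

It maintains the regions the Master assigns to it and handles I/O requests for those regions.

It splits regions that grow too large while running.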
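The split step can be sketched as follows: when a region gets too big, it is divided at a middle row key into two daughter regions, [start, mid) and [mid, end). This is a simplification under stated assumptions: the limit here is a row count, whereas HBase decides by store file size (bounded by hbase.hregion.max.filesize) and chooses the split point itself; ToyRegion is an illustrative name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Toy region covering [startKey, endKey); "" as endKey means "to the end of the table".
final class ToyRegion {
    final String startKey;
    final String endKey;
    final TreeSet<String> rows = new TreeSet<>();

    ToyRegion(String startKey, String endKey) {
        this.startKey = startKey;
        this.endKey = endKey;
    }

    // Returns this region unchanged, or two daughters split at the middle row key.
    List<ToyRegion> splitIfTooLarge(int maxRows) {
        List<ToyRegion> result = new ArrayList<>();
        if (rows.size() <= maxRows) {
            result.add(this);
            return result;
        }
        String mid = new ArrayList<>(rows).get(rows.size() / 2);
        ToyRegion left = new ToyRegion(startKey, mid);   // [startKey, mid)
        ToyRegion right = new ToyRegion(mid, endKey);    // [mid, endKey)
        left.rows.addAll(rows.headSet(mid, false));
        right.rows.addAll(rows.tailSet(mid, true));
        result.add(left);
        result.add(right);
        return result;
    }

    public static void main(String[] args) {
        ToyRegion region = new ToyRegion("", "");
        for (String row : new String[] {"a", "b", "c", "d", "e"}) {
            region.rows.add(row);
        }
        for (ToyRegion r : region.splitIfTooLarge(4)) {
            System.out.println("[" + r.startKey + ", " + r.endKey + ") " + r.rows);
        }
        // prints: [, c) [a, b]
        //         [c, ) [c, d, e]
    }
}
```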

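★ Special HBase tables

HBase has two special tables: -ROOT- and .META.

.META. records the region information of the user tables. .META. can itself have multiple regions, i.e. it can be stored across several regions.

-ROOT- records the region information of the .META. table. -ROOT- always has exactly one region, and ZooKeeper records the location of the -ROOT- table. (This applies to HBase 0.94, the version used in this post; starting with 0.96, -ROOT- was removed, .META. was renamed hbase:meta, and ZooKeeper stores the location of hbase:meta directly.)

When a client accesses data, HBase goes through these steps:

First the client contacts ZooKeeper, then the -ROOT- table, then the .META. table; only then does it find the location of the user data and access it.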
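The lookup chain, together with the client-side cache mentioned in the Client section, can be modelled in memory. This is a toy under explicit assumptions: server names and the ZooKeeper key "root-location" are hypothetical, .META. rows are keyed "table,startKey" (HBase also appends a region id, omitted here), and -ROOT- is keyed the same way to keep the sketch short. The real client caches whole region ranges; caching single rows here is a simplification.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

final class RegionLocator {
    final Map<String, String> zookeeper = new HashMap<>();                    // holds the -ROOT- location
    final Map<String, TreeMap<String, String>> rootRegions = new HashMap<>(); // server -> -ROOT- rows
    final Map<String, TreeMap<String, String>> metaRegions = new HashMap<>(); // server -> .META. rows
    final Map<String, String> cache = new HashMap<>();                        // client cache: "table,row" -> server
    int remoteCalls = 0;

    String locate(String table, String row) {
        String key = table + "," + row;
        String cached = cache.get(key);
        if (cached != null) {
            return cached;                        // cache hit: no remote calls at all
        }
        String rootServer = zookeeper.get("root-location");
        remoteCalls++;                            // 1. ask ZooKeeper where -ROOT- is
        String metaServer = rootRegions.get(rootServer).floorEntry(key).getValue();
        remoteCalls++;                            // 2. ask -ROOT- which server has the .META. region
        String userServer = metaRegions.get(metaServer).floorEntry(key).getValue();
        remoteCalls++;                            // 3. ask .META. which server has the user region
        cache.put(key, userServer);
        return userServer;
    }

    public static void main(String[] args) {
        RegionLocator client = new RegionLocator();
        client.zookeeper.put("root-location", "rs1");
        client.rootRegions.put("rs1", new TreeMap<>(Map.of("", "rs2")));  // one .META. region, on rs2
        client.metaRegions.put("rs2", new TreeMap<>(Map.of("users,", "rs3", "users,g", "rs4")));
        System.out.println(client.locate("users", "lili") + " after " + client.remoteCalls + " calls");
        System.out.println(client.locate("users", "lili") + " after " + client.remoteCalls + " calls");
        // both lines print "rs4 after 3 calls": the second lookup is served from the cache
    }
}
```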

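★ Pseudo-distributed HBase installation

Prerequisite: Hadoop is already installed (pseudo-distributed or cluster mode both work).

▲ Unpack the HBase tarball

```shell
tar -zxvf hbase-0.94.2-security.tar.gz
mv hbase-0.94.2-security hbase
```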

▲ Edit /etc/profile

Set the HBase environment variables, then reload the profile:

source /etc/profile

▲ Edit conf/hbase-env.sh

Set JAVA_HOME:

export JAVA_HOME=/usr/local/jdk1.6

Let HBase manage its own bundled ZooKeeper:

export HBASE_MANAGES_ZK=true

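▲ Edit conf/hbase-site.xml

Add the following to the configuration file.

Note: the host name and port in hbase.rootdir must match those of fs.default.name in the Hadoop configuration.

The configuration is:

```xml
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://hadoop5:9000/hbase</value>
  </property>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>hadoop5</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
```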

Supplement: the corresponding settings in Hadoop's conf/core-site.xml are:

```xml
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://hadoop5:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/usr/local/hadoop1/tmp</value>
  </property>
</configuration>
```
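The rule that hbase.rootdir must share the host and port of fs.default.name can be checked mechanically. A small JDK-only sketch, assuming the two values have already been read from hbase-site.xml and core-site.xml (parsing the XML files is left out); RootDirCheck is an illustrative name:

```java
import java.net.URI;

// Returns true when hbase.rootdir points at the same filesystem (scheme, host, port)
// as Hadoop's fs.default.name. Only the strings are compared; nothing is contacted.
final class RootDirCheck {
    static boolean matches(String hbaseRootDir, String fsDefaultName) {
        URI root = URI.create(hbaseRootDir);
        URI fs = URI.create(fsDefaultName);
        return root.getScheme() != null
                && root.getScheme().equalsIgnoreCase(fs.getScheme())
                && root.getHost() != null
                && root.getHost().equalsIgnoreCase(fs.getHost())
                && root.getPort() == fs.getPort();
    }

    public static void main(String[] args) {
        System.out.println(matches("hdfs://hadoop5:9000/hbase", "hdfs://hadoop5:9000")); // true
        System.out.println(matches("hdfs://hadoop5:8020/hbase", "hdfs://hadoop5:9000")); // false
    }
}
```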

▲ Edit conf/regionservers

Add the following line to this file; it sets the RegionServer host name:

localhost

▲ Start/stop HBase

Start HBase:

[root@hadoop5 conf]# start-hbase.sh

After startup, HDFS contains a new /hbase directory; this is the directory specified by hbase.rootdir, as shown below:

Stop HBase:

[root@hadoop5 conf]# stop-hbase.sh

▲ HBase web console

Address and port:

http://hadoop5:60010/master-status

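★ HBase shell operations

These cover creating, updating, deleting, and querying tables.

▲ Summary of common HBase shell commands

| Operation | Command |
| --- | --- |
| Create a table | create 'table_name', 'family1', 'family2', ... (only the table name and column families are given at creation; columns are added dynamically when writing) |
| Add a record | put 'table_name', 'row_key', 'family:qualifier', 'value' |
| View a record | get 'table_name', 'row_key' |
| Count the rows in a table | count 'table_name' |
| Delete a cell | delete 'table_name', 'row_key', 'family:qualifier' |
| Drop a table | disable the table first, then drop it: disable 'table_name', then drop 'table_name' |
| View all records | scan 'table_name' |
| View all data in one column | scan 'table_name', {COLUMNS => 'family:qualifier'} |
| Update a record | put the same cell again; the new value becomes the latest version |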

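▲ Common HBase commands

● Enter the shell client

Command: hbase shell

● List existing tables

Command: list

● Create a table

Command: create 'users','user_id','address','info'

● Check whether a table exists

Command: exists 'users'

● Check whether a table is enabled or disabled

Commands: is_enabled 'users'

is_disabled 'users'

To disable the table (a disabled table rejects reads and writes until it is re-enabled with enable 'users'):

Command: disable 'users'

● Describe the table structure

Command: describe 'users'

● Scan all data in the table

Command: scan 'users'

Note: the table has no records yet.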

● Add data

Add the following data:

```
put 'users','xiaotou','info:age','24'
put 'users','xiaotou','info:birthday','1993-06-17'
put 'users','xiaotou','info:company','easynet'
put 'users','xiaotou','address:contry','china'
put 'users','xiaotou','address:province','shanxi'
put 'users','xiaotou','address:city','lvliang'
put 'users','datou','info:birthday','1987-4-17'
put 'users','datou','info:favorite','movie'
put 'users','datou','info:company','baidu'
put 'users','datou','address:contry','china'
put 'users','datou','address:province','beijing'
put 'users','datou','address:city','bj'
put 'users','datou','address:town','huoying'
```

After adding the data, display all records:

Command: scan 'users'

● Get all cells of a row by row key

Command: get 'users','datou'

● Get a row's cells in one column family

Command: get 'users','datou','address'

● Get one column of a row

Command: get 'users','datou','address:contry'

● Update a record (put the same cell again with a new value)

Command: put 'users','datou','address:contry','usa'

● Get data by timestamp

Command: get 'users','datou',{COLUMN=>'address:province',TIMESTAMP=>1406057342573}
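Updating a cell does not overwrite it in place: each put adds a new version under its timestamp, a plain get returns the newest one, and get with TIMESTAMP selects the version written at exactly that time, as long as it has not been dropped under the family's VERSIONS limit. A toy JDK sketch of one cell, with the limit as a constructor argument; HBase trims extra versions lazily during compaction (reads already respect the limit), while this sketch trims eagerly, and the timestamps below are made up:

```java
import java.util.Collections;
import java.util.TreeMap;

final class VersionedCell {
    // timestamp -> value, newest first
    private final TreeMap<Long, String> versions = new TreeMap<>(Collections.reverseOrder());
    private final int maxVersions;

    VersionedCell(int maxVersions) {
        this.maxVersions = maxVersions;
    }

    void put(long timestamp, String value) {
        versions.put(timestamp, value);        // same timestamp: replaces that one version
        while (versions.size() > maxVersions) {
            versions.pollLastEntry();          // newest-first order, so the last entry is the oldest
        }
    }

    // Like a plain get: the newest version, or null if the cell is empty.
    String getLatest() {
        return versions.isEmpty() ? null : versions.firstEntry().getValue();
    }

    // Like get with TIMESTAMP => t: only the version written at exactly t.
    String getAt(long timestamp) {
        return versions.get(timestamp);
    }

    public static void main(String[] args) {
        VersionedCell country = new VersionedCell(3);
        country.put(1000L, "china");
        country.put(2000L, "usa");
        System.out.println(country.getLatest());   // usa
        System.out.println(country.getAt(1000L));  // china
    }
}
```

With a limit of 1, the same two puts leave only "usa", and getAt(1000L) returns null.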

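● Add a column family

Step 1: disable 'users'

Step 2: alter 'users',{NAME=>'other',VERSIONS=>9}

Step 3: describe the table and confirm that the new column family was added.

Step 4: enable the table: enable 'users'

Step 5: add a record that uses the new column family:

put 'users','lili','other:sex','woman'

Step 6: check that the record was added, i.e. that a row with RowKey lili now exists.

Note: putting a record into a column family the table does not have (for example aaa) fails with the following error:

NoSuchColumnFamilyException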

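● Delete a column family (disable/alter/enable)

Step 1: query the data before the deletion.

Step 2: disable the table: disable 'users'

Step 3: delete the column family:

Command: alter 'users',{NAME=>'address',METHOD=>'delete'}

Step 4: enable the table: enable 'users'

Step 5: view the table data after the deletion. Deleting the column family does remove all of its columns and their data from the table.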
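The add and delete steps can be modelled with a table whose schema is a set of column families: a put to an undeclared family fails (HBase reports NoSuchColumnFamilyException; this sketch throws IllegalArgumentException instead), and deleting a family removes every column under it, so rows whose only data was in that family disappear. In-memory illustration only; FamilyTable is a made-up name.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.TreeMap;

// Toy table: schema = set of families; data = row -> family -> qualifier -> value.
final class FamilyTable {
    private final Set<String> families = new HashSet<>();
    private final TreeMap<String, TreeMap<String, TreeMap<String, String>>> rows = new TreeMap<>();

    // Like alter 'users',{NAME=>'other'}
    void addFamily(String family) {
        families.add(family);
    }

    // Like alter 'users',{NAME=>'address',METHOD=>'delete'}: drops the family and all its columns.
    void deleteFamily(String family) {
        families.remove(family);
        for (TreeMap<String, TreeMap<String, String>> row : rows.values()) {
            row.remove(family);
        }
        rows.values().removeIf(TreeMap::isEmpty);   // a row with no cells left no longer exists
    }

    // Like put: an undeclared family is rejected.
    void put(String row, String family, String qualifier, String value) {
        if (!families.contains(family)) {
            throw new IllegalArgumentException("no such column family: " + family);
        }
        rows.computeIfAbsent(row, r -> new TreeMap<>())
            .computeIfAbsent(family, f -> new TreeMap<>())
            .put(qualifier, value);
    }

    boolean hasFamilyData(String family) {
        return rows.values().stream().anyMatch(r -> r.containsKey(family));
    }

    int rowCount() {
        return rows.size();
    }

    public static void main(String[] args) {
        FamilyTable users = new FamilyTable();
        users.addFamily("info");
        users.addFamily("address");
        users.put("datou", "info", "company", "baidu");
        users.put("datou", "address", "city", "bj");
        users.put("xiaotou", "address", "city", "lvliang");
        try {
            users.put("lili", "aaa", "x", "y");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());     // no such column family: aaa
        }
        users.deleteFamily("address");
        System.out.println(users.rowCount());       // 1: only datou still has cells
    }
}
```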

● Delete one column of a given row

Command: delete 'users','datou','info:birthday'

● Delete an entire row

Command: deleteall 'users','lili'
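The difference between delete and deleteall can be shown on a toy row map: delete removes one family:qualifier column of a row, while deleteall removes the whole row. In HBase both actually write delete markers (tombstones) and the data is purged at major compaction; this JDK sketch removes entries immediately, and RowDeletes is an illustrative name.

```java
import java.util.TreeMap;

// Toy data: row -> ("family:qualifier" -> value).
final class RowDeletes {
    final TreeMap<String, TreeMap<String, String>> rows = new TreeMap<>();

    void put(String row, String column, String value) {
        rows.computeIfAbsent(row, r -> new TreeMap<>()).put(column, value);
    }

    // Like: delete 'users','datou','info:birthday'
    void delete(String row, String column) {
        TreeMap<String, String> cols = rows.get(row);
        if (cols == null) {
            return;
        }
        cols.remove(column);
        if (cols.isEmpty()) {
            rows.remove(row);   // a row with no cells no longer exists
        }
    }

    // Like: deleteall 'users','lili'
    void deleteAll(String row) {
        rows.remove(row);
    }

    public static void main(String[] args) {
        RowDeletes users = new RowDeletes();
        users.put("datou", "info:birthday", "1987-4-17");
        users.put("datou", "info:company", "baidu");
        users.put("lili", "other:sex", "woman");
        users.delete("datou", "info:birthday");
        users.deleteAll("lili");
        System.out.println(users.rows);   // {datou={info:company=baidu}}
    }
}
```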

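● Count the rows in a table

Command: count 'users'

● Truncate a table

Command: truncate 'users'. Truncating disables, drops, and recreates the table, as the shell output shows:

```
Truncating 'users' table (it may take awhile):
 - Disabling table...
 - Dropping table...
 - Creating table...
```

● Drop a table

Disable the table first, then drop it: disable 'users', then drop 'users'.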

★ HBase Java API

1: Add the HBase jars to the project's libraries.

2: Start HBase.

3: Add junit.jar for the unit tests.

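▲ Class setup

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.junit.Test;

public class CreateTable {

    // Table name
    private static final String TABLE_NAME = "users";

    // Column families
    private static final String FAMILY_NAME1 = "user_id";
    private static final String FAMILY_NAME2 = "address";
    private static final String FAMILY_NAME3 = "info";

    // Shared client objects
    static Configuration conf = null;
    static HBaseAdmin admin = null;

    static {
        try {
            // Initialize the client configuration (HBase 0.94 client API)
            conf = HBaseConfiguration.create();
            conf.set("hbase.rootdir", "hdfs://hadoop5:9000/hbase");
            conf.set("hbase.zookeeper.quorum", "hadoop5");
            admin = new HBaseAdmin(conf);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    // The @Test methods in the following sections go here.
}
```

The static initializer runs once, when JUnit loads the class, so the Configuration and HBaseAdmin objects exist before any test method runs.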

▲ Create a table

Tables in the database before creating the table:

Code:

```java
/**
 * Create the table.
 * @throws IOException
 */
@Test
public void createTable() throws IOException {
    if (admin.tableExists(TABLE_NAME)) {
        System.out.println("table exist...");
    } else {
        HTableDescriptor desc = new HTableDescriptor(TABLE_NAME);
        desc.addFamily(new HColumnDescriptor(FAMILY_NAME1));
        desc.addFamily(new HColumnDescriptor(FAMILY_NAME2));
        desc.addFamily(new HColumnDescriptor(FAMILY_NAME3));
        admin.createTable(desc);
        System.out.println("create success!!");
    }
}
```

Result:

▲ Drop a table

Tables in the database before running:

Code:

```java
/**
 * Drop the table (it must be disabled first).
 * @throws IOException
 */
@Test
public void deleteTable() throws IOException {
    admin.disableTable(TABLE_NAME);
    admin.deleteTable(TABLE_NAME);
    System.out.println("delete table success....");
}
```

Result:

▲ List existing tables and their column families

Tables in the database before running:

Code:

```java
/**
 * List existing tables and their column families.
 * @throws IOException
 */
@Test
public void getAllTables() throws IOException {
    HTableDescriptor[] allTables = admin.listTables();
    for (HTableDescriptor table : allTables) {
        System.out.println(table.getNameAsString());
        HColumnDescriptor[] columnFamilies = table.getColumnFamilies();
        for (HColumnDescriptor cf : columnFamilies) {
            System.out.println("  " + cf.getNameAsString());
        }
    }
}
```

Result:

▲ Check whether a table exists or is disabled

Tables in the database before running:

Code:

```java
/**
 * Check whether the table exists or is disabled.
 * @throws IOException
 */
@Test
public void isTableDisable() throws IOException {
    boolean tableDisabled = admin.isTableDisabled(TABLE_NAME);
    boolean tableExists = admin.tableExists(TABLE_NAME);
    System.out.println("tableDisabled=" + tableDisabled + "\ntableExists=" + tableExists);
}
```

Result:

Output in the shell client:

▲ Scan all data in a table

Tables in the database before running:

Code:

```java
/**
 * Scan all data in the table.
 * @throws IOException
 */
@Test
public void getAllResult() throws IOException {
    HTable table = new HTable(conf, TABLE_NAME);
    ResultScanner scanner = table.getScanner(new Scan());
    try {
        for (Result r : scanner) {
            System.out.println("Got rowkey: " + new String(r.getRow()));
            for (KeyValue keyValue : r.raw()) {
                System.out.println("  " + new String(keyValue.getFamily())
                        + ":" + new String(keyValue.getQualifier())
                        + "=" + new String(keyValue.getValue()));
            }
        }
    } finally {
        // Release the scanner and the table connection
        scanner.close();
        table.close();
    }
}
```

Result:

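▲ Insert data

Code:

```java
/**
 * Insert one cell.
 * @throws IOException
 */
@Test
public void putRecord() throws IOException {
    HTable table = new HTable(conf, TABLE_NAME);
    try {
        Put put = new Put("lili".getBytes());
        put.add("info".getBytes(), "age".getBytes(), "24".getBytes());
        table.put(put);
        System.out.println("save success.");
    } finally {
        table.close();
    }
}
```

Result: the saved record is shown below.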

▲ Get all cells of a row by RowID

Tables in the database before running:

Code:

```java
/**
 * Get all cells of a row by RowID.
 * @throws IOException
 */
@Test
public void getRecordByRowID() throws IOException {
    HTable table = new HTable(conf, TABLE_NAME);
    try {
        // Set the row key
        Get get = new Get("xiaotou".getBytes());
        Result r = table.get(get);
        System.out.println("Got rowkey: " + new String(r.getRow()));
        for (KeyValue keyValue : r.raw()) {
            System.out.println(new String(keyValue.getFamily()) + ":"
                    + new String(keyValue.getQualifier()) + "===="
                    + new String(keyValue.getValue()));
        }
    } finally {
        table.close();
    }
}
```

Result:

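▲ Get a row's cells by RowID and column family

Tables in the database before running:

Code:

```java
/**
 * Get all cells of a row in one column family.
 * @throws IOException
 */
@Test
public void getRecordByRowIDAndCF() throws IOException {
    HTable table = new HTable(conf, TABLE_NAME);
    try {
        // Set the row key and restrict the read to the "info" family
        Get get = new Get("xiaotou".getBytes());
        get.addFamily("info".getBytes());
        Result r = table.get(get);
        System.out.println("Got rowkey: " + new String(r.getRow()));
        for (KeyValue keyValue : r.raw()) {
            System.out.println(new String(keyValue.getFamily()) + ":"
                    + new String(keyValue.getQualifier()) + "===="
                    + new String(keyValue.getValue()));
        }
    } finally {
        table.close();
    }
}
```

Result: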

▲ Get a cell by RowID, column family, and column

Tables in the database before running:

Code:

```java
/**
 * Get one column of a row by RowID, column family, and qualifier.
 * @throws IOException
 */
@Test
public void getRecordByRowIDAndCFAndClumn() throws IOException {
    HTable table = new HTable(conf, TABLE_NAME);
    try {
        // Set the row key and restrict the read to info:company
        Get get = new Get("xiaotou".getBytes());
        get.addColumn("info".getBytes(), "company".getBytes());
        Result r = table.get(get);
        System.out.println("Got rowkey: " + new String(r.getRow()));
        for (KeyValue keyValue : r.raw()) {
            System.out.println(new String(keyValue.getFamily()) + ":"
                    + new String(keyValue.getQualifier()) + "===="
                    + new String(keyValue.getValue()));
        }
    } finally {
        table.close();
    }
}
```

Result:

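▲ Update a record

Tables in the database before running:

Code:

```java
/**
 * Update one cell: putting the same cell again writes a newer version.
 * @throws IOException
 */
@Test
public void updateRecord() throws IOException {
    HTable table = new HTable(conf, TABLE_NAME);
    try {
        Put put = new Put("lili".getBytes());
        put.add("info".getBytes(), "age".getBytes(), "99".getBytes());
        table.put(put);
        System.out.println("update success.");
    } finally {
        table.close();
    }
}
```

Result: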


=======================

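▲ TODO: more Java operations to be added later

★ Example: combining HBase with MapReduce

Requirement: bulk-import data stored in HDFS into HBase.

Implement this with a MapReduce job.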
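The post stops before the job's code. As a hedged starting point only, the sketch below shows the per-line parsing a mapper in such an import job might do, assuming a hypothetical input format of one cell per line: rowKey, TAB, family:qualifier, TAB, value. It is plain JDK code; in a real job each parsed cell would become a Put written through HBase's MapReduce support (for example TableMapReduceUtil in org.apache.hadoop.hbase.mapreduce), which is not shown here.

```java
// Parses one line of the hypothetical format "rowKey\tfamily:qualifier\tvalue".
final class ImportLineParser {
    static final class Cell {
        final String row;
        final String family;
        final String qualifier;
        final String value;

        Cell(String row, String family, String qualifier, String value) {
            this.row = row;
            this.family = family;
            this.qualifier = qualifier;
            this.value = value;
        }
    }

    // Returns null for malformed lines so a mapper could count and skip them.
    static Cell parse(String line) {
        String[] parts = line.split("\t", 3);
        if (parts.length != 3 || parts[0].isEmpty()) {
            return null;
        }
        int colon = parts[1].indexOf(':');
        if (colon <= 0) {
            return null;
        }
        return new Cell(parts[0], parts[1].substring(0, colon), parts[1].substring(colon + 1), parts[2]);
    }

    public static void main(String[] args) {
        Cell c = parse("datou\tinfo:company\tbaidu");
        System.out.println(c.row + " " + c.family + ":" + c.qualifier + " = " + c.value);
        System.out.println(parse("malformed line") == null);   // true
    }
}
```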

★ Fully distributed HBase installation
