澄绪猿

Last modified: 2024-03-26 13:54:51

Category: HBase. Tags: hbase

First published: 2024-03-26 13:46:52

Article link: https://blog.csdn.net/python8181/article/details/137042474

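This article covers basic HBase commands, including full-table scans, row limits, and row, column, and rowkey filtering, as well as integration with Hive. It shows how to use column qualifiers, counters, and Hive external tables for data operations and statistics.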

Getting Started with HBase (Must-Read, Must-Know)

Basic HBase Commands: Practice Edition

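scan 'table' scans the entire table.
To scan the entire table but return only specific columns:

scan 'table', {COLUMNS => ['column_family:qualifier', 'column_family:qualifier']}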

Limit the number of rows returned

scan 'table', {LIMIT => 3}

Fetch a single row

get 'table', 'rowkey'

Filter by rowkey prefix: rows whose rowkey starts with 39dd

scan 'table', {COLUMNS => ['column_family:qualifier'], ROWPREFIXFILTER => '39dd'}

Compare against the rowkey with a filter: select the row whose rowkey is fdsa3

scan 'table', {FILTER => "RowFilter(=, 'binary:fdsa3')"}

Query a specific order: return the order status (status) and payment method (payway) for order number "02602f66-adc7-40d4-8485-76b5632b5b53"

scan 'table', {FILTER => "RowFilter(=, 'binary:02602f66-adc7-40d4-8485-76b5632b5b53')",
COLUMNS => ['cf:status', 'cf:payway']}

Query orders whose status (status) is 已付款 (paid)

scan 'table', {FILTER => "SingleColumnValueFilter('cf','status',=,'binary:已付款')"}

Query orders whose payment method is 1 and whose amount is greater than 3000

scan 'table', {FILTER => "SingleColumnValueFilter('cf','payway',=,'binary:1') AND SingleColumnValueFilter('cf','money',>,'binary:3000')"}
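
One thing worth checking with a filter like this: the binary comparator compares raw bytes lexicographically, not numbers. The Python sketch below only mimics that byte-wise comparison; it does not talk to HBase. It assumes the money column holds decimal strings, which the original does not state, and the helper names binary_greater and zero_pad are made up for illustration.

```python
def binary_greater(stored: bytes, threshold: bytes) -> bool:
    """Mimic a byte-wise lexicographic '>' comparison on raw cell bytes."""
    return stored > threshold


def zero_pad(amount: str, width: int = 10) -> bytes:
    """Hypothetical storage convention: left-pad amounts with zeros to a fixed width."""
    return amount.zfill(width).encode("ascii")


threshold = "3000"
for amount in ["3500", "500", "12000"]:
    plain = binary_greater(amount.encode("ascii"), threshold.encode("ascii"))
    padded = binary_greater(zero_pad(amount), zero_pad(threshold))
    # 'plain' can disagree with numeric order; 'padded' follows it
    print(amount, "plain:", plain, "padded:", padded, "numeric:", int(amount) > int(threshold))
```

If the amounts are written as fixed-width strings (or as numeric bytes with a matching comparator), the byte order and the numeric order agree; with variable-length strings, 500 matches and 12000 does not.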

Column-qualifier filter: ColumnPrefixFilter('pay') restricts results to columns whose qualifier starts with pay

Show the rows whose datetime qualifier falls between 2020-04-25 12:09:16 and 2020-04-29 12:09:16

scan 'table', {FILTER => "SingleColumnValueFilter('cf','datetime',>=,'binary:2020-04-25 12:09:16') AND SingleColumnValueFilter('cf','datetime',<=,'binary:2020-04-29 12:09:16')"}
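
The range filter works on strings because timestamps in the fixed-width yyyy-MM-dd HH:mm:ss layout sort byte-wise in the same order as chronologically. A small Python sketch can check that claim locally. It assumes every stored value uses exactly this layout, and the function names are illustrative only.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"
LOW = "2020-04-25 12:09:16"
HIGH = "2020-04-29 12:09:16"


def in_range_bytes(value: str) -> bool:
    """Byte-wise range check, similar in spirit to the two binary comparisons."""
    raw = value.encode("ascii")
    return LOW.encode("ascii") <= raw <= HIGH.encode("ascii")


def in_range_parsed(value: str) -> bool:
    """Reference check using real datetime values."""
    t = datetime.strptime(value, FMT)
    return datetime.strptime(LOW, FMT) <= t <= datetime.strptime(HIGH, FMT)


samples = [
    "2020-04-25 12:09:15",
    "2020-04-25 12:09:16",
    "2020-04-27 00:00:00",
    "2020-04-29 12:09:17",
    "2020-05-01 08:00:00",
]
for s in samples:
    # Both checks agree for every sample in this fixed-width format
    print(s, in_range_bytes(s), in_range_parsed(s))
```

Values written in another layout, such as 2020-4-25 without zero padding, would break this agreement.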

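Using counters. Note: get_counter returns the counter's value.

incr 'table', 'rowkey', 'c1:CNT'    # increments the counter by 1 by default
>>> COUNTER VALUE = 12
incr 'table', 'rowkey', 'c1:CNT'    # each call adds 1 to the counter
>>> COUNTER VALUE = 13

# Read the counter value with get_counter. The counter is stored as an 8-byte long, so scan and get show raw bytes instead of the number
get_counter 'table', 'rowkey', 'c1:CNT'
>>> COUNTER VALUE = 13

# Pass an explicit increment of 10 to add 10 in this call
incr 'table', 'rowkey', 'c1:CNT', 10
>>> COUNTER VALUE = 23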
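
To see why scan and get do not print the counter as a readable number, it helps to look at the bytes. HBase counters are kept as 8-byte big-endian signed longs. The Python sketch below reproduces only that encoding with the standard struct module; it is not an HBase client, and encode_counter and decode_counter are illustrative names.

```python
import struct


def encode_counter(value: int) -> bytes:
    """Pack an integer as an 8-byte big-endian signed long."""
    return struct.pack(">q", value)


def decode_counter(raw: bytes) -> int:
    """Unpack an 8-byte big-endian signed long back into an integer."""
    return struct.unpack(">q", raw)[0]


raw = encode_counter(13)
print(len(raw))             # 8 bytes, regardless of the value
print(raw.hex())            # hex form of the bytes a raw read would expose
print(decode_counter(raw))  # 13, the number get_counter reports
```

get_counter does this decoding for you, which is why it is the convenient way to read a counter in the shell.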

View the table schema

describe 'table'

Check whether a table exists

exists 'table'

Adding a new column family changes the table's schema, so use alter

alter 'table', 'test'    # add a new column family named test
# Delete a column family: two equivalent forms
alter 'table', 'delete' => 'test'
alter 'table', {'delete' => 'test'}

An HBase namespace is similar to a database in MySQL: it keeps tables separated from one another, providing both isolation and reuse across tables.

create_namespace 'first'

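To place a table in this namespace, prefix the table name with the namespace when creating it. Two column families are specified here:

create 'first:tableTest', {NAME => 'cf'}, {NAME => 'cf1'}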

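HBase-Hive Integration

When the table does not yet exist in HBase

Two tables are needed, and both are created from the Hive client:
1. A Hive external table
2. A Hive-to-HBase mapping table, which creates the HBase table and syncs (backs up) the data into it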

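-- Hive external table
-- Note: if you package the program as a jar and run it that way, you must specify the real data location when creating the Hive external table, and it seems it can only be written under the warehouse directory; there are many details to watch
-- When you run the DDL directly in the Hive client, LOCATION does not need to be specified
create external table if not exists hiveT(id int, name string)
row format delimited
fields terminated by ','
lines terminated by '\n'
stored as textfile;

load data inpath '/user/werahouse/ods.db/foo.csv' into table hiveT;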

Mapping table

create table HbaseT(id int, name string)
stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
with serdeproperties('hbase.columns.mapping'=':key,cf:name')
tblproperties('hbase.table.name'='first:HbaseT');

insert overwrite table HbaseT select * from hiveT;

That's it: the Hive data is now backed up in HBase, and the intermediate external table can be dropped.
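
The hbase.columns.mapping string is positional: its comma-separated entries line up with the Hive columns in declaration order, and the :key entry stands for the HBase rowkey. The Python sketch below is a simplified illustration of that pairing, not Hive's actual parser. It assumes the mapping string ':key,cf:name', and pair_columns is a made-up helper.

```python
def pair_columns(hive_columns, mapping):
    """Pair each Hive column with its mapping entry by position."""
    entries = [entry.strip() for entry in mapping.split(",")]
    if len(entries) != len(hive_columns):
        raise ValueError("mapping entries and Hive columns must match one to one")
    if entries.count(":key") != 1:
        raise ValueError("exactly one entry must be :key (the rowkey)")
    return dict(zip(hive_columns, entries))


pairs = pair_columns(["id", "name"], ":key,cf:name")
for column, target in pairs.items():
    # id is paired with the rowkey, name with column family cf, qualifier name
    print(column, "->", target)
```

Read this way, the Hive column id becomes the rowkey of the HBase table, and name is stored in column family cf under the qualifier name.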

Syncing and backing up JSON data
First table: the Hive external table

add jar /opt/hive/lib/hive-hcatalog-core.jar;
create external table test(id int, name string)
row format serde 'org.apache.hive.hcatalog.data.JsonSerDe';

load data inpath '/user/xx/json' into table test;

Second table: the mapping table

create table Hbasetest(id int, name string)
stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
with serdeproperties('hbase.columns.mapping'=':key,cf:name')
tblproperties("hbase.table.name"="first:Hbasetest");

insert overwrite table Hbasetest select * from test;

When the table already exists in HBase

In this case only one mapping table is needed. For ordinary tabular data:

create external table if not exists teest(id int, name string)
row format serde 'org.apache.hadoop.hive.hbase.HBaseSerDe'
stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
with serdeproperties('hbase.columns.mapping'=':key,cf:name')
tblproperties('hbase.table.name'='first:teest');
insert ......

For JSON-formatted data

create external table if not exists teest(id int, name string)
row format serde 'org.apache.hive.hbase.JsonSerDe'
stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
with serdeproperties('hbase.columns.mapping'=':key,cf:name')
tblproperties('hbase.table.name'='first:teest');
insert ......

Use HBase's MapReduce row-counting class to count rows

hbase org.apache.hadoop.hbase.mapreduce.RowCounter 'student:stu'

Practice and type these out often, and they become easy to remember.
