HIVE in Practice (2): hive, hbase, and phoenix hands-on

1. Preparing the hive data

Requirement: the data has already landed from hive into hbase, but hbase cannot serve multi-condition queries, so we switch to phoenix and integrate hive with phoenix.

A huge pitfall!!!
The hive source data on the warehouse side has many columns, but the earlier plan was to load it into hbase exactly as it came from hive. As a result, every record ended up as a single column under a single column family: the data was never split into separate columns when the hbase table was created.

2. Mapping hbase data into phoenix

Create the hbase-to-phoenix mapping table:

create table "hbase_table_name" (ROW varchar primary key, "info"."record" varchar);
-- Substitute the real HBASE table name; keep it in double quotes and match the
-- HBASE spelling exactly (if it is lowercase in hbase, it stays lowercase here).
-- "info" is the column family and "record" the corresponding column name.
-- The HBASE ROW key serves as the phoenix primary key, which keeps rows unique.
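Once the mapping exists, a quick sanity check from the phoenix client confirms that rows come through ("hbase_table_name" is the placeholder from the DDL above):

-- the mapped table should return the hbase rows directly
select ROW, "info"."record" from "hbase_table_name" limit 10;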


If the phoenix client times out while the mapping statement runs, raise the client scanner timeout in hbase-site.xml:

<property>
  <name>hbase.client.scanner.timeout.period</name>
  <value>60000</value>
</property>

The error reported here does not affect the phoenix mapping: the statement itself succeeded, and only the phoenix client connection timed out.
Note that the API layer still has to split the single-column data itself: parse the record into a Java object, then assemble the composite key according to the rowkey generation rules and use it as the query condition against the HBase data, as sketched below.
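A minimal sketch of such a rowkey-based lookup through phoenix, assuming the rowkey concatenates the key fields with a '|' delimiter (the delimiter, field order, and sample values are assumptions, not from the original):

-- rebuild the composite rowkey prefix from the query conditions;
-- a leading-anchored LIKE on the primary key becomes an efficient range scan
select ROW, "info"."record"
from "hbase_table_name"
where ROW like '20240101|13800000000%';

The "record" value that comes back is still one concatenated string, so the caller splits it into fields before filling the Java object.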

3. Integrating hive with phoenix

1. Prepare the hive-to-phoenix mapping table

drop table if exists adm_qltsys.adm_2ndline_user_visit_h;
create table adm_qltsys.adm_2ndline_user_visit_h (
statis_day                       string,
search_time                      string,
serv_number                      string,
prov_id                          string,
region_id                        string,
node_id                          string,
sup_node_id                      string,
url_detail                       string,
sup_url_detail                   string,
client_id                        string,
chn_id                           string,
chn_id_source                    string,
cp_id                            string,
cp_name                          string,
node_type                        string,
net_type                         string,
term_type                        string,
gate_ip                          string,
session_id                       string,
page_id                          string,
term_prod_id                     string,
business_id                      string,
sub_busi_id                      string,
virt_busi_id                     string,
client_code                      string
)
partitioned by (
statis string,
statis_hour string
)
STORED BY 'org.apache.phoenix.hive.PhoenixStorageHandler'
TBLPROPERTIES (
"phoenix.table.name" = "qltsys.adm_2ndline_user_visit_h",
"phoenix.zookeeper.quorum" = "10.194.128.72,10.194.128.73,10.194.128.74",
"phoenix.zookeeper.znode.parent" = "/hbase-unsecure",
"phoenix.zookeeper.client.port" = "2181",
"phoenix.rowkeys" = "statis_day,search_time,serv_number,prov_id,region_id,node_id,sup_node_id,url_detail,sup_url_detail,client_id,chn_id,chn_id_source,cp_id,cp_name,node_type,net_type,term_type,gate_ip,session_id,page_id,term_prod_id,business_id,sub_busi_id,virt_busi_id,client_code",
"phoenix.column.mapping" = "statis:statis,statis_hour:statis_hour,statis_day:STATIS_DAY,search_time:SEARCH_TIME,serv_number:SERV_NUMBER,prov_id:PROV_ID,region_id:REGION_ID,node_id:NODE_ID,sup_node_id:SUP_NODE_ID,url_detail:URL_DETAIL,sup_url_detail:SUP_URL_DETAIL,client_id:CLIENT_ID,chn_id:CHN_ID,chn_id_source:CHN_ID_SOURCE,cp_id:CP_ID,cp_name:CP_NAME,node_type:NODE_TYPE,net_type:NET_TYPE,term_type:TERM_TYPE,gate_ip:GATE_IP,session_id:SESSION_ID,page_id:PAGE_ID,term_prod_id:TERM_PROD_ID,business_id:BUSINESS_ID,sub_busi_id:SUB_BUSI_ID,virt_busi_id:VIRT_BUSI_ID,client_code:CLIENT_CODE",
"phoenix.table.options" = "SALT_BUCKETS=10, DATA_BLOCK_ENCODING='PREFIX_TREE', COMPRESSION ='SNAPPY'",
"phoenix.upsert.batch.size" ="1000");

2. Load from hive into phoenix

-- HQL for the load job; it relies on the special UDF getuuid
-- phoenix storage handler and its dependencies:
add jar hdfs://ns1/user/hadoop/hive_udf/phoenix-core-4.7.1-HBase-1.1-SNAPSHOT.jar;
add jar hdfs://ns1/user/hadoop/hive_udf/phoenix-hive-4.7.1-HBase-1.1-SNAPSHOT.jar;
add jar hdfs://ns1/user/hadoop/hive_udf/tephra-api-0.7.0.jar;
add jar hdfs://ns1/user/hadoop/hive_udf/tephra-core-0.7.0.jar;
add jar hdfs://ns1/user/hadoop/hive_udf/twill-zookeeper-0.6.0-incubating.jar;
add jar hdfs://ns1/user/hadoop/hive_udf/twill-discovery-core-0.6.0-incubating.jar;
add jar hdfs://ns1/user/hadoop/hive_udf/twill-discovery-api-0.6.0-incubating.jar;
-- add the JAR that carries the UDF
add jar hdfs://ns1/user/hadoop/hive_udf/original-mg-hive-udf-1.0.jar;
-- disable automatic map-join conversion and force the MR engine
set hive.auto.convert.join=false;
set hive.execution.engine=mr;
insert overwrite table adm_qltsys.adm_2ndline_user_visit_h partition (statis='${hiveconf:STATIS_DAY}',statis_hour='${hiveconf:STATIS_HOUR}')
select
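-- normalize null/empty fields to 'unknown': every column below feeds the phoenix
-- rowkey, and rowkey components must not end up empty (phoenix treats '' as null)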
dt,
hour,
case when nvl(statis_day, '') = '' then 'unknown' else statis_day end as statis_day,
case when nvl(search_time, '') = '' then 'unknown' else search_time end as search_time,
case when nvl(serv_number, '') = '' then 'unknown' else serv_number end as serv_number,
case when nvl(prov_id, '') = '' then 'unknown' else prov_id end as prov_id,
case when nvl(region_id, '') = '' then 'unknown' else region_id end as region_id,
case when nvl(node_id, '') = '' then 'unknown' else node_id end as node_id,
case when nvl(sup_node_id, '') = '' then 'unknown' else sup_node_id end as sup_node_id,
case when nvl(url_detail, '') = '' then 'unknown' else url_detail end as url_detail,
case when nvl(sup_url_detail, '') = '' then 'unknown' else sup_url_detail end as sup_url_detail,
case when nvl(client_id, '') = '' then 'unknown' else client_id end as client_id,
case when nvl(chn_id, '') = '' then 'unknown' else chn_id end as chn_id,
case when nvl(chn_id_source, '') = '' then 'unknown' else chn_id_source end as chn_id_source,
case when nvl(cp_id, '') = '' then 'unknown' else cp_id end as cp_id,
case when nvl(cp_name, '') = '' then 'unknown' else cp_name end as cp_name,
case when nvl(node_type, '') = '' then 'unknown' else node_type end as node_type,
case when nvl(net_type, '') = '' then 'unknown' else net_type end as net_type,
case when nvl(term_type, '') = '' then 'unknown' else term_type end as term_type,
case when nvl(gate_ip, '') = '' then 'unknown' else gate_ip end as gate_ip,
case when nvl(session_id, '') = '' then 'unknown' else session_id end as session_id,
case when nvl(page_id, '') = '' then 'unknown' else page_id end as page_id,
case when nvl(term_prod_id, '') = '' then 'unknown' else term_prod_id end as term_prod_id,
case when nvl(business_id, '') = '' then 'unknown' else business_id end as business_id,
case when nvl(sub_busi_id, '') = '' then 'unknown' else sub_busi_id end as sub_busi_id,
case when nvl(virt_busi_id, '') = '' then 'unknown' else virt_busi_id end as virt_busi_id,
case when nvl(client_code, '') = '' then 'unknown' else client_code end as client_code,
yanfa.mg_getuuid() as rank
from adm.adm_2ndline_user_visit_1h_delta_hourly
where dt='${hiveconf:STATIS_DAY}' and hour='${hiveconf:STATIS_HOUR}' ;
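After the job finishes, a count on the phoenix side is a cheap way to confirm that the hour actually landed (the partition value below is a placeholder):

-- statis_day leads the rowkey, so this filter only scans that day's range
select count(*)
from qltsys.adm_2ndline_user_visit_h
where statis_day = '20240101';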

3. Schedule the HQL above and pass in the parameters
Connect to hive through beeline and run the HQL to pump the data into phoenix; a sketch of the invocation follows the link below.
HIVE in Practice (3): the difference between beeline and hive
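A minimal invocation sketch (the JDBC URL, host, and script filename are placeholders, not from the original):

# pass the partition values as hiveconf variables; the script reads them via ${hiveconf:...}
beeline -u "jdbc:hive2://hiveserver2-host:10000/default" \
  --hiveconf STATIS_DAY=20240101 \
  --hiveconf STATIS_HOUR=08 \
  -f load_adm_2ndline_user_visit_h.hql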

When you run the hql, manually add the JARs the job needs where necessary; alternatively, the beeline route will by default look for JARs under its designated directory and load them into the environment.
