Common Hive Statements

kehan_c

Article link: https://blog.csdn.net/kehan_c/article/details/93196042

Table structure

Create a table:

create external table `etl_fb_unmatched_history`(
`device_id_md5` string,
`device_type` string,
`platform` string,
`package_name` string)
row format delimited fields terminated by '\t'
location 's3://mob-emr-test/dataplatform/DataWareHouse/data/dwh/etl_fb_unmatched_history';

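Modify table partitions:
-- yr, mt and dt below are placeholders; use the table's own partition columns and values
alter table etl_fb_org_daily drop if exists partition (yr='yyyy',mt='mm',dt='dd');
alter table etl_fb_org_daily add partition (dt='20190620') location 's3://etl_fb_org_daily/2019/06/20';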
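
When partitions are added every day, the add-partition statement is usually generated from a date rather than typed by hand. Below is a minimal Python sketch, assuming the s3://etl_fb_org_daily/yyyy/mm/dd layout used above. The helper name add_partition_sql is hypothetical, and "if not exists" is added so that a rerun does not fail.

```python
from datetime import date

def add_partition_sql(table: str, base: str, day: date) -> str:
    """Build an ALTER TABLE ... ADD PARTITION statement for one day."""
    dt = day.strftime("%Y%m%d")      # partition value, e.g. 20190620
    path = day.strftime("%Y/%m/%d")  # directory layout, e.g. 2019/06/20
    return (
        f"alter table {table} add if not exists partition (dt='{dt}') "
        f"location '{base}/{path}';"
    )

print(add_partition_sql("etl_fb_org_daily", "s3://etl_fb_org_daily", date(2019, 6, 20)))
```

The printed statement can then be passed to the Hive client of your choice.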

Load data:
load data local inpath '/home/hadoop/spark/employees' overwrite into table employees;

insert overwrite table md5_match partition (dt='20190623')
select id,md5(id) as id_md5 from db.ods_info where dt='20190623' group by id;
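
To sanity-check what the insert overwrite query produces, the same logic can be sketched in Python: one row per distinct id, paired with the hex MD5 digest of that id. This assumes ids are strings hashed as UTF-8 bytes. The function name md5_match is borrowed from the target table for illustration only, and Hive does not guarantee any particular row order.

```python
import hashlib

def md5_match(ids):
    """Return one (id, md5 hex digest) row per distinct id, like GROUP BY id."""
    rows = []
    for i in dict.fromkeys(ids):  # drops duplicates, keeps first-seen order
        rows.append((i, hashlib.md5(i.encode("utf-8")).hexdigest()))
    return rows

for row in md5_match(["abc", "abc", "xyz"]):
    print(row)
```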

Rename a table:
alter table table_name rename to new_table_name;

Add columns:
alter table tablename add columns(column1 string comment 'xxxx',column2 bigint comment 'yyyy') cascade;

Rename a column:
alter table table_name change column column_name column_newName int comment 'column_name';

Convert between managed (internal) and external tables:
alter table table_name set TBLPROPERTIES('EXTERNAL'='TRUE'); -- managed table to external table
alter table table_name set TBLPROPERTIES('EXTERNAL'='FALSE'); -- external table to managed table

CASE WHEN statements:
First form (searched CASE):
CASE WHEN sex = '1' THEN 'male'
WHEN sex = '2' THEN 'female'
ELSE 'other' END
Second form (simple CASE):
CASE sex
WHEN '1' THEN 'male'
WHEN '2' THEN 'female'
ELSE 'other' END
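
Both forms behave like a lookup with a default. A rough Python equivalent, using English labels and a hypothetical helper name sex_label, shows that any unmatched value, including NULL (None here), falls through to the ELSE branch:

```python
def sex_label(sex):
    """Mimic CASE sex WHEN '1' ... WHEN '2' ... ELSE ... END."""
    labels = {"1": "male", "2": "female"}
    return labels.get(sex, "other")  # unmatched values and None hit the default

print(sex_label("1"), sex_label("2"), sex_label(None))
```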

Parse JSON data in Hive with the JSON SerDe:
create external table etl_fb_org_daily(
timestamp_date string,
os string,
log_time bigint)
partitioned by (
dt string)
row format serde 'org.apache.hive.hcatalog.data.JsonSerDe'
location 's3://etl_fb_org_daily';

The Hive JSON SerDe jar can be downloaded from:
https://repository.cloudera.com/content/repositories/releases/org/apache/hive/hcatalog/hive-hcatalog-core/
A newer release such as hive-hcatalog-core-2.3.3.jar can be used directly.
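
The JSON SerDe expects one JSON object per line and maps its keys onto the table columns by name. The Python sketch below mimics that mapping for the etl_fb_org_daily columns. The sample record is made up for illustration, and the handling of extra or missing keys is an approximation of the SerDe's behaviour, not a statement about its implementation.

```python
import json

COLUMNS = ["timestamp_date", "os", "log_time"]  # columns of etl_fb_org_daily

def parse_line(line):
    """Map one JSON log line onto the table columns; missing keys become None."""
    record = json.loads(line)
    return tuple(record.get(col) for col in COLUMNS)

# Hypothetical log line; the "extra" key has no matching column and is dropped.
sample = '{"timestamp_date": "2019-06-20", "os": "android", "log_time": 1561000000, "extra": 1}'
print(parse_line(sample))  # ('2019-06-20', 'android', 1561000000)
```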

Using UNION ALL:
select t3.col from(
select a as col from t1
UNION ALL
select b as col from t2
) as t3;
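
UNION ALL concatenates the two result sets and keeps duplicates, whereas plain UNION removes them. A small Python sketch with made-up values for t1.a and t2.b illustrates the difference:

```python
t1_a = ["x", "y"]  # hypothetical values of column a in t1
t2_b = ["y", "z"]  # hypothetical values of column b in t2

union_all = t1_a + t2_b                          # keeps the duplicate "y"
union_distinct = list(dict.fromkeys(union_all))  # what UNION (distinct) would keep

print(union_all)
print(union_distinct)
```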

Write a _SUCCESS marker from the shell:
hadoop fs -touchz s3://output_path/_SUCCESS

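Disable the _SUCCESS marker in Spark:
import org.apache.spark.sql.SaveMode

// df is an existing DataFrame and output is the target path
df.write.mode(SaveMode.Overwrite)
.option("orc.compress", "zlib")
.option("mapreduce.fileoutputcommitter.marksuccessfuljobs", false)
.orc(output)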

Hive complex data types:
array
map<string,string>
struct<name: string, score: int>

Common Hive functions

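coalesce:
coalesce('string1','string2','string3')
Returns the first argument, in order, that is not NULL. If every argument is NULL, it returns NULL. An empty string is not NULL, so it is returned like any other value.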
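
The behaviour is easy to mimic in Python, with None standing in for NULL. This is only a sketch of the semantics, not Hive's implementation:

```python
def coalesce(*values):
    """Return the first value that is not None, or None if all are None."""
    for v in values:
        if v is not None:
            return v
    return None

print(coalesce(None, "string2", "string3"))  # string2
```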

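concat_ws:
concat_ws(',','string1','string2','string3')
Joins strings with the given separator. It can join different columns, or, combined with group by, join the values of a single column within each group.

concat_ws(',',collect_set(application_input_dir))
Combined with collect_set(), it also removes duplicate values.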
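
The group by plus concat_ws(',', collect_set(...)) pattern can be sketched in Python as follows. The grouping key and the sample paths are hypothetical. Note that Hive does not guarantee the order of elements returned by collect_set, while this sketch keeps first-seen order.

```python
from collections import defaultdict

def concat_ws_collect_set(rows, sep=","):
    """Group (key, value) rows by key and join each group's distinct values."""
    groups = defaultdict(dict)
    for key, value in rows:
        if value is not None:          # NULL values are skipped
            groups[key][value] = None  # dict keys act as an ordered set
    return {key: sep.join(vals) for key, vals in groups.items()}

rows = [("job1", "/in/a"), ("job1", "/in/b"), ("job1", "/in/a"), ("job2", "/in/c")]
print(concat_ws_collect_set(rows))
```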

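regexp_replace:
regexp_replace(device_id,'-','')
regexp_replace('00000000-0206-c316-222a-12920206c316', '-', '')
Replaces every substring that matches a pattern. The second argument is a regular expression, so characters with a special meaning must be escaped when they should match literally. Escaping a non-alphabetic character that has no special meaning is harmless.
Example:
regexp_replace('foolish', 'oo|is', '') returns flh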
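
Both examples can be checked quickly with Python's re.sub, which behaves the same way for simple patterns like these. Hive uses Java regular expressions, so more advanced patterns and group references in the replacement string may differ between the two.

```python
import re

def regexp_replace(s, pattern, replacement):
    """Replace every match of pattern in s, like Hive's regexp_replace."""
    return re.sub(pattern, replacement, s)

print(regexp_replace("foolish", "oo|is", ""))  # flh
print(regexp_replace("00000000-0206-c316-222a-12920206c316", "-", ""))
```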
