1. Working with external tables
1. Convert an external table to a managed (internal) table
alter table xxx.xxx set tblproperties ('EXTERNAL' = 'false');
2. Drop Hive table partitions
alter table xxx.xxx drop if exists partition (dp='bkactive',dt<'xxx');
3. Convert a managed table to an external table
alter table xxx.xxx set tblproperties ('EXTERNAL' = 'true');
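To confirm which mode a table ended up in, describe formatted reports the table type; the table name below is the same placeholder as above:
describe formatted xxx.xxx;
-- the "Table Type:" line shows MANAGED_TABLE or EXTERNAL_TABLE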
2. Set memory size
set spark.executor.memory=10g;
3. Enable non-strict mode
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
set hive.support.quoted.identifiers=None;
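The first two settings enable dynamic-partition inserts, where the partition value comes from the data rather than a literal; the third allows regex column expressions in the select list. A minimal sketch of a dynamic-partition insert, assuming hypothetical tables tmp.t_src and tmp.t_part (partitioned by dt):
-- hypothetical tables; the partition column must come last in the select list
insert overwrite table tmp.t_part partition (dt)
select uid, ctime, dt
from tmp.t_src;
Without nonstrict mode, Hive requires at least one static partition value in the partition clause.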
4. Set the execution engine
set hive.execution.engine=mr;
set hive.execution.engine=spark;
set hive.execution.engine=tez;
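Running set with only the property name prints its current value, a quick way to check which engine is active:
set hive.execution.engine;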
5. union all error
The column KEY._col1:0._col0 is not in the vectorization context column map
Add this parameter:
set hive.vectorized.execution.enabled=false;
6. Convert a yyyy-MM-dd HH:mm:ss timestamp to yyyyMMdd or yyyyMMddHH
FROM_UNIXTIME(unix_timestamp(ctime),'yyyyMMdd') dt
FROM_UNIXTIME(unix_timestamp(ctime),'yyyyMMddHH') dh
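A self-contained check with a literal timestamp (the value is only for illustration):
select FROM_UNIXTIME(unix_timestamp('2021-03-01 12:34:56'),'yyyyMMdd');    -- 20210301
select FROM_UNIXTIME(unix_timestamp('2021-03-01 12:34:56'),'yyyyMMddHH');  -- 2021030112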
7. Convert a Unix timestamp to yyyyMMdd / yyyy-MM-dd
FROM_UNIXTIME(cast(ctime as int),'yyyyMMdd') dt
FROM_UNIXTIME(cast(ctime as int),'yyyy-MM-dd') dt
8. Refresh table metadata in Impala
invalidate metadata ads.ads_cps_yg_referral_push_report
9. CREATE TABLE statement for a CSV file
-- create the table in the tmp database; the table name can be anything, i.e. tmp.<table_name>
CREATE EXTERNAL TABLE `tmp.tmp_xhd_csv`(
-- columns from the CSV
`uid` string COMMENT '',
`days_acl` string COMMENT '',
`days_arrange` string COMMENT '',
`days_ac` string COMMENT '',
`days_nonac` string COMMENT '',
`dt_max` string COMMENT '',
`days_acavg` string COMMENT '',
`avtice` string COMMENT '')
COMMENT '& incremental'
-- CSV fields are separated by ',' by default
row format delimited
fields terminated by ','
-- point LOCATION at a directory, e.g. /user/xiehuadong/questionnaires
LOCATION
'hdfs://nameservice1/user/hive/warehouse/tmp.db/zhd_csv'
-- the first CSV line is usually a header; this property skips it
TBLPROPERTIES('skip.header.line.count'='1')
;
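After the table is created, the CSV file still has to land in the LOCATION directory. One way, assuming a hypothetical local file path, is LOAD DATA:
load data local inpath '/path/to/local_file.csv' into table tmp.tmp_xhd_csv;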
10. Compute the difference in minutes
Take the absolute value of the difference between two timestamps, in minutes.
select abs((unix_timestamp('2021-03-01 00:00:00','yyyy-MM-dd HH:mm:ss') - unix_timestamp('2021-03-02 00:01:00','yyyy-MM-dd HH:mm:ss'))/60);
11. Explode a column into multiple rows
explode works on array and map types.
select app_id,app_name
from table_a
lateral view explode(app_names) table_view as app_name;
Split on '|':
select app_id,app_name
from table_a
lateral view explode(split(app_names,'\\|')) table_view as app_name;
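Note that split takes a regular expression, which is why the '|' must be escaped. A self-contained check with a literal value:
select app_name
from (select 'a|b|c' as app_names) t
lateral view explode(split(app_names,'\\|')) t_view as app_name;
-- returns three rows: a, b, c (an unescaped '|' would split between every character)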
12. Pitfalls of coalesce / nvl
(nvl behaves like coalesce: it only skips NULL values, not empty strings.)
Example 1:
select imei,oaid
from (
select case when length(imei) > 0 then imei end imei,
       case when (length(trim(imei)) = 0 or imei is null
                  or imei = 'NULL' or imei = 'null')
                 and length(oaid) > 0
            then oaid end oaid
from dwd.dwd_tf_tf_h5_dsp_log
where dt='20210413'
) s
group by imei,oaid;
Example 2:
select imei from (
select coalesce(imei,oaid) imei
from dwd.dwd_tf_tf_h5_dsp_log
where dt='20210413'
) s
group by imei;
Example 2 returns fewer rows than Example 1. To see why, suppose the source data looks like this:
imei oaid
'' 1
NULL 2
3 4
'' 5
6 7
The coalesce in Example 2 only skips NULL, not empty strings, so coalesce(imei,oaid) yields:
imei
''
2
3
''
6
After deduplicating (group by imei), only the following remain:
imei
''
2
3
6
4 rows in total.
The CASE WHEN version avoids this because it checks length() > 0. Before deduplication the result is:
imei  oaid
NULL  1
NULL  2
3     NULL
NULL  5
6     NULL
After deduplication:
imei  oaid
NULL  1
NULL  2
3     NULL
NULL  5
6     NULL
5 rows in total, one more than the 4 rows coalesce produced.
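A quick literal test that reproduces the difference (the fallback value is arbitrary):
select coalesce('', 'fallback');                              -- '' : an empty string is not NULL, so it wins
select nvl('', 'fallback');                                   -- '' : same behavior
select case when length('') > 0 then '' else 'fallback' end;  -- 'fallback'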
13. Add a column
alter table table_name add columns(new_column string comment 'newly added column');
14. Add a partition
alter table al.dwd_rec_clt_user_read_chpts add partition (dt='20210701');
15. Redirect beeline output to a file
beeline --silent=true \
-u jdbc:hive2://192.168.1.124:10000/default \
-n uname -p 123456 \
--hiveconf mapreduce.job.queuename=root.yarn.hdfs \
--hiveconf spark.app.name="clt_abtest" \
-e "${sql}" >> data/report_2021_location.csv
16. Set the queue name
set mapreduce.job.queuename=root.yarn.rectask;