The project needs to analyze data in the following format:
607756662#5,12;10985547374#5;147508043#5;11123876158#5,12;
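To make the intended parsing concrete, here is a minimal Python sketch (not Hive) of what the query below is meant to compute for one `recall_set` value. Split the string on `;`, which plays the role of `explode(split(recall_set, ';'))`, then keep the part before `#`, which plays the role of `split(gdmodel, '#')[0]`. The function name `extract_gdids` is hypothetical. Dropping the empty piece after the trailing `;` is a choice made in this sketch only. It does not describe how Hive itself treats that piece.

```python
from typing import List


def extract_gdids(recall_set: str) -> List[str]:
    """Hypothetical helper mirroring the Hive query's parsing of one row."""
    gdids = []
    # Like explode(split(recall_set, ';')): one "gdmodel" per ';'-separated piece.
    for gdmodel in recall_set.split(";"):
        if not gdmodel:
            # Sketch-only choice: skip the empty piece left by the trailing ';'.
            continue
        # Like split(gdmodel, '#')[0]: keep the id before the '#'.
        gdids.append(gdmodel.split("#")[0])
    return gdids


sample = "607756662#5,12;10985547374#5;147508043#5;11123876158#5,12;"
print(extract_gdids(sample))
# ['607756662', '10985547374', '147508043', '11123876158']
```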
The Hive SQL I first wrote was:
select nvl(split(gdmodel,'#')[0],0) gdid
from
(select recall_set
from aps.dpa_gds_recall_set t
where statis_date=20191124
) t lateral view explode(split(recall_set,';')) c as gdmodel;
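This fails with an error: the semicolon inside the quotes is treated as the end of the SQL statement.
Changing the SQL to the following fixes it: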
select nvl(split(gdmodel,'#')[0],0) as gdid
from
(select recall_set
from aps.dpa_gds_recall_set t
where statis_date=20191124
) t lateral view explode(split(recall_set,'\073')) c as gdmodel;
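Reason:
The semicolon is the SQL statement terminator. The Hive command-line client splits its input into separate statements at every semicolon, including one inside a quoted string, so the original query is cut off in the middle of the literal ';'. Writing the semicolon as \073 avoids this: \073 is the octal (not binary) escape for ASCII code 59, which is ';'. Hive turns the escape back into a semicolon when it parses the string literal, so split still receives ';' as its separator.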
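A small Python illustration of both points, under stated assumptions. First, `\073` really is octal for `;`. Second, a splitter that cuts at every `;` without looking at quotes breaks the first query's pattern but not the escaped one. `naive_split_statements` is a hypothetical stand-in for that behavior. It is not Hive's actual CLI code.

```python
# '\073' is an octal escape: 0o73 == 7*8 + 3 == 59, the ASCII code of ';'.
print(0o73, repr(chr(0o73)))  # 59 ';'


def naive_split_statements(script: str) -> list:
    """Hypothetical: cut a script at every ';', ignoring quotes entirely."""
    return [s.strip() for s in script.split(";") if s.strip()]


# The literal ';' inside the quotes cuts the statement in two.
bad = "select split(recall_set,';') from t;"
# The backslash is doubled so Python keeps the four characters \073 as-is.
good = "select split(recall_set,'\\073') from t;"

print(len(naive_split_statements(bad)))   # 2: the statement is cut inside the quotes
print(len(naive_split_statements(good)))  # 1: the escaped pattern survives intact
```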
Reference: https://blog.csdn.net/dj_2009291007/article/details/78667695