问题场景描述:遇到会有需要反复使用的结果,比如总数,一直join就感觉不够优雅,就想着能不能把结果输出到变量中,然后利用变量就好了。
然后百度搜了半天的结果无一例外是像下面这样,先设定了变量,然后用变量去做判断。这个结果和我要的其实算是相反的?
set cur_time = '2012-01-01'
select a
from table_name
where date_time = ${hiveconf:cur_time}
无奈之下换个语言搜索,stack overflow大法好,我竟然在差评回答中找到了我要的答案
正解:
像写其他代码初始化变量一样输出赋值,${hiveconf:xxx}
这样子检查结果(或者${hivevar:xxx}
如果你设置了用户变量的话
set num = select count(1) from table_name where visitTime = current_date;
${hiveconf:num};//usage
也可以存储多个结果字段
set total = select (case when user_type = '1' then 'Apple'
else 'Orange' end) as fruit,count(1) as cnt
from store_list
group by fruit;
${hiveconf:total};
但有个问题是,这样子赋的值,无论是单个字段还是多个,他们都是以类似dataframe的形式存储在变量中的,要取出来的话不知道该怎么弄,感觉只能改成用python写?(在线蹲个大佬解答
补充:hiveconf vs. hivevar vs. system
(贴一下链接上的解释
Most of the answers here have suggested to either use hiveconf or hivevar namespace to store the variable. And all those answers are right. However, there is one more namespace.
There are total three namespaces available for holding variables.
- hiveconf - hive started with this, all the hive configuration is stored as part of this conf. Initially, variable substitution was not part of hive and when it got introduced, all the user-defined variables were stored as part of this as well. Which is definitely not a good idea. So two more namespaces were created.
- hivevar: To store user variables
- system: To store system variables.
And so if you are storing a variable as part of a query (i.e. date or product_number) you should use hivevar namespace and not hiveconf namespace.
And this is how it works.
hiveconf is still the default namespace, so if you don’t provide any namespace it will store your variable in hiveconf namespace.
However, when it comes to referring a variable, it’s not true. By default it refers to hivevar namespace.
If you do not provide namespace as mentioned below, variable var will be stored in hiveconf namespace.