HQL的规范

最新推荐文章于 2023-04-10 23:28:31 发布

嘉平11

最新推荐文章于 2023-04-10 23:28:31 发布

阅读量684

点赞数

分类专栏： Hive

本文链接：https://blog.csdn.net/zgm12/article/details/104150570

版权

Hive 专栏收录该内容

11 篇文章 0 订阅

订阅专栏

一、Group by

1.1 select 后面的字段必须在groupby后面出现，除了一些聚合函数

select
uname,
floor(age/10) flag
from program
group by uname;

会报错：


FAILED: SemanticException [Error 10025]: Line 2:6 Expression not in GROUP BY key '10'

原因：group by 后并没有age

解决：

select
uname,
floor(age/10) flag
from program
group by uname,age;

聚合函数则不用必须出现在groupby之后，如：

select
uname,
sum(age)
from program
group by uname;

1.2开窗和group by 的字段

如果有group，那么row_number中的order by项必须是group内的字段，
sum（）over（partition by store）只有partition 的话，好像求得sum是每个店铺的销售总额，窗口范围并不会到当前行截止，min函数也是这样的，其他的应该也是。

1.3groupby、orderby 和别名

先groupBy，后select，后orderby ，orderby 的字段名，还得是select 中的字段的名字。

select 
name ,substr(dt,0,4),max(times) 
from access
group by name,substr(dt,0,4) ;
可以

select 
name ,substr(dt,0,4) as y,max(times) 
from access
group by name,substr(dt,0,4) ;
可以

select 
name ,substr(dt,0,4) as y,max(times) 
from access
group by name,y ;
不可以
FAILED: SemanticException [Error 10004]:
 Line 4:14 Invalid table alias or column reference 'y':
(possible column names are: name, dt, times)

select 
name ,substr(dt,0,4) as y,max(times) 
from access
group by name,substr(dt,0,4) 
order by substr(dt,0,4) desc;
不可以
FAILED: SemanticException [Error 10004]: Line 5:16 Invalid 
table alias or column reference 'dt': (possible column names
 are: name, y, _c2)
 
 select 
name ,substr(dt,0,4) as y,max(times) 
from access
group by name,substr(dt,0,4) 
order by y desc;
可以

ps:mysql是不一样的，select 后面的别名可以出现在groupby后面

sparksql也是可以的；

二、一些常用的功能、函数

2.1 between范围

int型包头包尾，

对于parquet格式、以及时间好像有特殊情况，以后再研究

2.2 coalesce(x,y,z.....)

可以返回第一个不为null的值。，好像可以是不同类型的。

2.3 nvl(x,y)

NVL函数是一个空值转换函数NVL(表达式1,表达式2)如果表达式1为空值,NVL返回值为表达式2的值,否则返回表达式1的值。

三、开窗函数里使用sum函数

select
uname,programid,
row_number() over(partition by uname order by sum(playtime) desc) as rn
from program
 group by uname,programid;

//这样报错
//在orderby后面加sum函数报错，因为select后面没有加上sum(playtime)
FAILED: SemanticException Failed to breakup Windowing invocations into Groups. At least 1 group must only depend on input columns. Also check for circular dependencies.
Underlying error: org.apache.hadoop.hive.ql.parse.SemanticException: Line 4:0 Expression not in GROUP BY key 'playtime'

四、having

select a.sid,a.sname,avg(b.score) as `平均成绩`
from student2 a
join sc2 b on a.sid=b.sid 
group by a.sid,a.sname
having `平均成绩`>=60;

select 后面的别名，可以放在having后面

select a.sid,a.sname,round(avg(b.score),2) as `平均成绩`
from student2 a
join sc2 b on a.sid=b.sid 
group by a.sid,a.sname
having `平均成绩`>=60;
//平均成绩如果不加飘号，则编译不通过


select a.sid,a.sname,avg(b.score) as av
from student2 a
join sc2 b on a.sid=b.sid 
group by a.sid,a.sname
having av>=60;
//编译通过

select a.sid,a.sname,round(avg(b.score),2) as av
from student2 a
join sc2 b on a.sid=b.sid
group by a.sid,a.sname
having av>=60;
//编译不通过，为什么？？？

ps:执行顺序

（8）SELECT（9）DISTINCT <select_list>   
（1）FROM <left_table>
（3）<join_type> JOIN <right_table>
（2）ON <join_condition>
（4）WHERE <where_condition>
（5）GROUP BY <grout_by_list>
（6）WITH {CUTE|ROLLUP}
（7）HAVING <having_condition>
（10）ORDER BY <order_by_list>
（11）LIMIT <limit_number>

五、with as

with q1 as (select * from src where key= '5'),

q2 as (select * from src s2 where key = '4')

select * from q1 union all select * from q2;

with as 可以把数据缓存在内存中，避免多次计算重复的数据，with as就类似于一个视图或临时表，可以用来存储一部分的sql语句作为别名，不同的是with as 属于一次性的，而且必须要和其他sql一起使用才可以

六、一些时间函数

hive (default)> select to_date("2018-01-02 09:23:23");
OK
_c0
2018-01-02

select date_format("2018-01-02 09:23:23","yyyy-MM-dd");
OK
_c0
2018-01-02

hive (myhive2)> select months_between('2019-3-2','2019-2-1');
OK
_c0
1.03225806
Time taken: 0.228 seconds, Fetched: 1 row(s)

七、建表语句

1、load

LOAD DATA [ LOCAL ] INPATH '/path/to/local/files'

[ OVERWRITE ] INTO TABLE test

PARTITION (country='CHINA')

有LOCAL表示从本地文件系统加载（文件会被拷贝到HDFS中）

无LOCAL表示从HDFS中加载数据（注意：文件直接被移动！！！而不是拷贝！！！并且。。文件名都不带改的。。）

OVERWRITE 表示是否覆盖表中数据（或指定分区的数据）（没有OVERWRITE 会直接APPEND，而不会滤重!）

嘉平11

关注

0
点赞
踩
3

收藏

觉得还不错? 一键收藏
0
评论
HQL的规范

一、Group by1.1selectuname,floor(age/10) flagfrom programgroup by uname;会报错：FAILED: SemanticException [Error 10025]: Line 2:6 Expression not in GROUP BY key '10'原因：group by 后并没有age解决...
复制链接

扫一扫