HiveQL常用查询语句where、group by、having、join子句记录

由于hivesql中查询语句时,常用查询条件比较多,所以专门写一篇博文对HiveQL的查询语句进行总结,理清联系和区别。

目录

1. where子句(不能跟别名)

(1)比较运算符

(2)like和rlike(正则匹配)

(3)逻辑运算符and or not

2. group by子句(每...)

3. having子句(只用于group by之后)

4. join子句(A join B on A.key = B.key where X.key is null)

(1)A join B on A.字段 = B.字段,连接两张表

(2)内连接(求交集)

(3)左外连接(求左表独有)

(4)右外连接(求右表独有)

(5)全外连接(求并集)

(6)去重全外连接(求并集并且去除交集部分)

(7)多表连接

(8)笛卡尔积


1. where子句

where子句使用一般紧接from tablename,并且where子句中不能使用字段别名

select ~ from tablename where语句

(1)比较运算符

< > 等价于= !(需要注意的是,<>会排除掉null值,如果想保留null,则用where a.id<>10 or a is null)
is null 
is not null;

A (not) between B and C:A介于[B,C]之间;

A in (B ,C):使用in运算列表中的值,即查询A=B、A=C的情况;

(2)like和rlike(正则匹配)

%代表任意个字符,通配符;_代表一个字符

A (not) like 'x%':A必须以x开头

A (not) like '%x':A必须以x结尾

A (not) like '%x%':A必须包含x

A (not) like ‘_5%’:A第2位数是5

A rlike '[Java正则表达式]'

ps:必须加单引号

(3)逻辑运算符and or not

同时查询两个字段,逻辑并、逻辑或、逻辑否

2. group by子句(每...)

group by子句通常和聚合函数一起使用,后接分组字段,分组后对每个组进行聚合操作

在不使用聚合函数的时候,group by 子句中必须包含所有的列,即select A,B,C from table group by A,B,C;否则会报错,但这样group by就只有去重的意义了;所以一般group by都需要跟聚合函数配合使用;

一般用于需求中要求每个XXX,便对该XXX字段进行group by分组。

此外,group by后和where后都不能接字段别名。

计算emp表中每个部门每个岗位的最高薪水

select deptno,job,max(sal) max_sal from emp group by deptno,job;

+---------+------------+----------+--+
| deptno  |    job     | max_sal  |
+---------+------------+----------+--+
| 10      | CLERK      | 1300.0   |
| 10      | MANAGER    | 2450.0   |
| 10      | PRESIDENT  | 5000.0   |
| 20      | ANALYST    | 3000.0   |
| 20      | CLERK      | 1100.0   |
| 20      | MANAGER    | 2975.0   |
| 30      | CLERK      | 950.0    |
| 30      | MANAGER    | 2850.0   |
| 30      | SALESMAN   | 1600.0   |
+---------+------------+----------+--+

ps:groupby,对多行聚合后输出一行的结果;若没有max(col)、collect_set(col)、concat_ws('~',col)等行转列,即多行变一行的sql条件,是没法聚合的会报错。

3. having子句(只用于group by之后)

having只用于group by子句后,给分组字段添加条件

一般是跟聚合函数,例如select count(*) from table group by a having count(*)>10这种,having可以接别名

求每个部门的平均薪水大于2000的部门

select deptno,avg(sal) avg_sal from emp group by deptno having avg_sal>2000;


+---------+---------------------+--+
| deptno  |       avg_sal       |
+---------+---------------------+--+
| 10      | 2916.6666666666665  |
| 20      | 2175.0              |
+---------+---------------------+--+

4. join子句(A join B on A.key = B.key where X.key is null)

(1)A join B on A.字段 = B.字段,连接两张表

例如两张hive表,员工表emp和部门表dept,通过deptno来串联

要求:根据员工表和部门表中的部门编号,查询员工编号、员工名称和部门名称

0: jdbc:hive2://hadoop100:10000> select e.empno,e.ename,d.dname from emp e join dept d on e.deptno = d.deptno;

+----------+----------+-------------+--+
| e.empno  | e.ename  |   d.dname   |
+----------+----------+-------------+--+
| 7369     | SMITH    | RESEARCH    |
| 7499     | ALLEN    | SALES       |
| 7521     | WARD     | SALES       |
| 7566     | JONES    | RESEARCH    |
| 7654     | MARTIN   | SALES       |
| 7698     | BLAKE    | SALES       |
| 7782     | CLARK    | ACCOUNTING  |
| 7788     | SCOTT    | RESEARCH    |
| 7839     | KING     | ACCOUNTING  |
| 7844     | TURNER   | SALES       |
| 7876     | ADAMS    | RESEARCH    |
| 7900     | JAMES    | SALES       |
| 7902     | FORD     | RESEARCH    |
| 7934     | MILLER   | ACCOUNTING  |
+----------+----------+-------------+--+

(2)内连接(求交集)

select * from A a
join B b
on a.key = b.key;

(3)左外连接(求左表独有)

select * from A a
left join B b
on a.key = b.key
where b.key is null;

(4)右外连接(求右表独有)

select * from A a
left join B b
on a.key = b.key
where a.key is null;

(5)全外连接(求并集)

select * from A a
full join B b
on a.key = b.key;

(6)去重全外连接(求并集并且去除交集部分)

select * from A a
full join B b
on a.key = b.key
where a.key is null or b.key is null;

(7)多表连接

select a.name, b.name, c.name
from A a
join B b
on a.key1 = b.key1
join C c
on a.key2 = c.key2;

ps:hive从左到右顺序执行,多表连接join顺序也是按照查询字段从上到下。

(8)笛卡尔积

0: jdbc:hive2://hadoop100:10000> select empno,dname from emp, dept;

  • 3
    点赞
  • 16
    收藏
    觉得还不错? 一键收藏
  • 1
    评论
--高级查询在数据库中用得是最频繁的,也是应用最广泛的。 Ø 基本常用查询 --select select * from student; --all 查询所有 select all sex from student; --distinct 过滤重复 select distinct sex from student; --count 统计 select count(*) from student; select count(sex) from student; select count(distinct sex) from student; --top 取前N条记录 select top 3 * from student; --alias column name 列重命名 select id as 编号, name '名称', sex 性别 from student; --alias table name 表重命名 select id, name, s.id, s.name from student s; --column 列运算 select (age + id) col from student; select s.name + '-' + c.name from classes c, student s where s.cid = c.id; --where 条件 select * from student where id = 2; select * from student where id > 7; select * from student where id < 3; select * from student where id <> 3; select * from student where id >= 3; select * from student where id <= 5; select * from student where id !> 3; select * from student where id !< 5; --and 并且 select * from student where id > 2 and sex = 1; --or 或者 select * from student where id = 2 or sex = 1; --between ... and ... 相当于并且 select * from student where id between 2 and 5; select * from student where id not between 2 and 5; --like 模糊查询 select * from student where name like '%a%'; select * from student where name like '%[a][o]%'; select * from student where name not like '%a%'; select * from student where name like 'ja%'; select * from student where name not like '%[j,n]%'; select * from student where name like '%[j,n,a]%'; select * from student where name like '%[^ja,as,on]%'; select * from student where name like '%[ja_on]%'; --in 子查询 select * from student where id in (1, 2); --not in 不在其中 select * from student where id not in (1, 2); --is null 是空 select * from student where age is null; --is not null 不为空 select * from student where age is not null; --order by 排序 select * from student order by name; select * from student order by name desc; select * from student order by name asc; --group by 分组 按照年龄进行分组统计 select count(age), age from student group by age; 按照性别进行分组统计 select count(*), sex from student group by sex; 按照年龄和性别组合分组统计,并排序 select count(*), sex from student group by sex, age order by age; 按照性别分组,并且是id大于2的记录最后按照性别排序 select count(*), sex from student where id > 2 group by sex order by sex; 查询id大于2的数据,并完成运算后的结果进行分组和排序 select count(*), (sex * id) new from student where id > 2 group by sex * id order by sex * id; --group by all 所有分组 按照年龄分组,是所有的年龄 select count(*), age from student group by all age; --having 分组过滤条件 按照年龄分组,过滤年龄为空的数据,并且统计分组的条数和现实年龄信息 select count(*), age from student group by age having age is not null; 按照年龄和cid组合分组,过滤条件是cid大于1的记录 select count(*), cid, sex from student group by cid, sex having cid > 1; 按照年龄分组,过滤条件是分组后的记录条数大于等于2 select count(*), age from student group by age having count(age) >= 2; 按照cid和性别组合分组,过滤条件是cid大于1,cid的最大值大于2 select count(*), cid, sex from student group by cid, sex having cid > 1 and max(cid) > 2; Ø 嵌套子查询查询是一个嵌套在select、insert、update或delete语句或其他子查询中的查询。任何允许使用表达式的地方都可以使用子查询。子查询也称为内部查询或内部选择,而包含子查询语句也成为外部查询或外部选择。 # from (select … table)示例 将一个table的查询结果当做一个新表进行查询 select * from ( select id, name from student where sex = 1 ) t where t.id > 2; 上面括号中的语句,就是子查询语句(内部查询)。在外面的是外部查询,其中外部查询可以包含以下语句: 1、 包含常规选择列表组件的常规select查询 2、 包含一个或多个表或视图名称的常规from语句 3、 可选的where子句 4、 可选的group by子句 5、 可选的having子句 # 示例 查询班级信息,统计班级学生人生 select *, (select count(*) from student where cid = classes.id) as num from classes order by num; # in, not in子句查询示例 查询班级id大于小于的这些班级的学生信息 select * from student where cid in ( select id from classes where id > 2 and id < 4 ); 查询不是班的学生信息 select * from student where cid not in ( select id from classes where name = '2班' ) in、not in 后面的子句返回的结果必须是一列,这一列的结果将会作为查询条件对应前面的条件。如cid对应子句的id; # exists和not exists子句查询示例 查询存在班级id为的学生信息 select * from student where exists ( select * from classes where id = student.cid and id = 3 ); 查询没有分配班级的学生信息 select * from student where not exists ( select * from classes where id = student.cid ); exists和not exists查询需要内部查询和外部查询进行一个关联的条件,如果没有这个条件将是查询到的所有信息。如:id等于student.id; # some、any、all子句查询示例 查询班级的学生年龄大于班级的学生的年龄的信息 select * from student where cid = 5 and age > all ( select age from student where cid = 3 ); select * from student where cid = 5 and age > any ( select age from student where cid = 3 ); select * from student where cid = 5 and age > some ( select age from student where cid = 3 ); Ø 聚合查询 1、 distinct去掉重复数据 select distinct sex from student; select count(sex), count(distinct sex) from student; 2、 compute和compute by汇总查询 对年龄大于的进行汇总 select age from student where age > 20 order by age compute sum(age) by age; 对年龄大于的按照性别进行分组汇总年龄信息 select id, sex, age from student where age > 20 order by sex, age compute sum(age) by sex; 按照年龄分组汇总 select age from student where age > 20 order by age, id compute sum(age); 按照年龄分组,年龄汇总,id找最大值 select id, age from student where age > 20 order by age compute sum(age), max(id); compute进行汇总前面是查询的结果,后面一条结果集就是汇总的信息。compute子句中可以添加多个汇总表达式,可以添加的信息如下: a、 可选by关键字。它是每一列计算指定的行聚合 b、 行聚合函数名称。包括sum、avg、min、max、count等 c、 要对其执行聚合函数的列 compute by适合做先分组后汇总的业务。compute by后面的列一定要是order by中出现的列。 3、 cube汇总 cube汇总和compute效果类似,但语法较简洁,而且返回的是一个结果集。 select count(*), sex from student group by sex with cube; select count(*), age, sum(age) from student where age is not null group by age with cube; cube要结合group by语句完成分组汇总 Ø 排序函数 排序在很多地方需要用到,需要对查询结果进行排序并且给出序号。比如: 1、 对某张表进行排序,序号需要递增不重复的 2、 对学生的成绩进行排序,得出名次,名次可以并列,但名次的序号是连续递增的 3、 在某些排序的情况下,需要跳空序号,虽然是并列 基本语法 排序函数 over([分组语句] 排序子句[desc][asc]) 排序子句 order by 列名, 列名 分组子句 partition by 分组列, 分组列 # row_number函数 根据排序子句给出递增连续序号 按照名称排序的顺序递增 select s.id, s.name, cid, c.name, row_number() over(order by c.name) as number from student s, classes c where cid = c.id; # rank函数函数 根据排序子句给出递增的序号,但是存在并列并且跳空 顺序递增 select id, name, rank() over(order by cid) as rank from student; 跳过相同递增 select s.id, s.name, cid, c.name, rank() over(order by c.name) as rank from student s, classes c where cid = c.id; # dense_rank函数 根据排序子句给出递增的序号,但是存在并列不跳空 不跳过,直接递增 select s.id, s.name, cid, c.name, dense_rank() over(order by c.name) as dense from student s, classes c where cid = c.id; # partition by分组子句 可以完成对分组的数据进行增加排序,partition by可以与以上三个函数联合使用。 利用partition by按照班级名称分组,学生id排序 select s.id, s.name, cid, c.name, row_number() over(partition by c.name order by s.id) as rank from student s, classes c where cid = c.id; select s.id, s.name, cid, c.name, rank() over(partition by c.name order by s.id) as rank from student s, classes c where cid = c.id; select s.id, s.name, cid, c.name, dense_rank() over(partition by c.name order by s.id) as rank from student s, classes c where cid = c.id; # ntile平均排序函数 将要排序的数据进行平分,然后按照等分排序。ntile中的参数代表分成多少等分。 select s.id, s.name, cid, c.name, ntile(5) over(order by c.name) as ntile from student s, classes c where cid = c.id; Ø 集合运算 操作两组查询结果,进行交集、并集、减集运算 1、 union和union all进行并集运算 --union 并集、不重复 select id, name from student where name like 'ja%' union select id, name from student where id = 4; --并集、重复 select * from student where name like 'ja%' union all select * from student; 2、 intersect进行交集运算 --交集(相同部分) select * from student where name like 'ja%' intersect select * from student; 3、 except进行减集运算 --减集(除相同部分) select * from student where name like 'ja%' except select * from student where name like 'jas%'; Ø 公式表表达式 查询表的时候,有时候中间表需要重复使用,这些子查询被重复查询调用,不但效率低,而且可读性低,不利于理解。那么公式表表达式可以解决这个问题。 我们可以将公式表表达式(CET)视为临时结果集,在select、insert、update、delete或是create view语句的执行范围内进行定义。 --表达式 with statNum(id, num) as ( select cid, count(*) from student where id > 0 group by cid ) select id, num from statNum order by id; with statNum(id, num) as ( select cid, count(*) from student where id > 0 group by cid ) select max(id), avg(num) from statNum; Ø 连接查询 1、 简化连接查询 --简化联接查询 select s.id, s.name, c.id, c.name from student s, classes c where s.cid = c.id; 2、 left join左连接 --左连接 select s.id, s.name, c.id, c.name from student s left join classes c on s.cid = c.id; 3、 right join右连接 --右连接 select s.id, s.name, c.id, c.name from student s right join classes c on s.cid = c.id; 4、 inner join内连接 --内连接 select s.id, s.name, c.id, c.name from student s inner join classes c on s.cid = c.id; --inner可以省略 select s.id, s.name, c.id, c.name from student s join classes c on s.cid = c.id; 5、 cross join交叉连接 --交叉联接查询,结果是一个笛卡儿乘积 select s.id, s.name, c.id, c.name from student s cross join classes c --where s.cid = c.id; 6、 自连接(同一张表进行连接查询) --自连接 select distinct s.* from student s, student s1 where s.id <> s1.id and s.sex = s1.sex; Ø 函数 1、 聚合函数 max最大值、min最小值、count统计、avg平均值、sum求和、var求方差 select max(age) max_age, min(age) min_age, count(age) count_age, avg(age) avg_age, sum(age) sum_age, var(age) var_age from student; 2、 日期时间函数 select dateAdd(day, 3, getDate());--加天 select dateAdd(year, 3, getDate());--加年 select dateAdd(hour, 3, getDate());--加小时 --返回跨两个指定日期的日期边界数和时间边界数 select dateDiff(day, '2011-06-20', getDate()); --相差秒数 select dateDiff(second, '2011-06-22 11:00:00', getDate()); --相差小时数 select dateDiff(hour, '2011-06-22 10:00:00', getDate()); select dateName(month, getDate());--当前月份 select dateName(minute, getDate());--当前分钟 select dateName(weekday, getDate());--当前星期 select datePart(month, getDate());--当前月份 select datePart(weekday, getDate());--当前星期 select datePart(second, getDate());--当前秒数 select day(getDate());--返回当前日期天数 select day('2011-06-30');--返回当前日期天数 select month(getDate());--返回当前日期月份 select month('2011-11-10'); select year(getDate());--返回当前日期年份 select year('2010-11-10'); select getDate();--当前系统日期 select getUTCDate();--utc日期 3、 数学函数 select pi();--PI函数 select rand(100), rand(50), rand(), rand();--随机数 select round(rand(), 3), round(rand(100), 5);--精确小数位 --精确位数,负数表示小数点前 select round(123.456, 2), round(254.124, -2); select round(123.4567, 1, 2);

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论 1
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值