hive中查询语句执行的顺序
select * 4
from tb_name 1
where 2
grotp by 3
having 5
order by 6
limit 7
hive语句的冷门知识
join on 连接两个表时,on后面的判断条件不仅可以写相等,还可以写不等,或者大于或小于
show functions 可以查看所有的hive函数列表
例如:concat(name,"-","hello")
可以做字符串的拼接
如果有不会用的函数,可以输入desc function concat;
查看该函数的使用说明
在hive中,数字和字符串类型的数字做运算时,会自动进行类型转换
select "12"+13
也可以使用**cast()**进行手动转换
select cast ("12" as int) ;
hive的基础查询语句
– 求MF的平均年龄
select
gender,
avg(age) as avg_age
from
tb_user
group by gender ;
+---------+----------+
| gender | avg_age |
+---------+----------+
| F | 32.0 |
| M | 27.0 |
+---------+----------+
–按性别分区,然后求出平均工资大于2000的
select
gender,
avg(sal) as avg_sal
from
tb_user
group by gender
having avg_sal > 2000 ;
+---------+---------------------+
| gender | avg_sal |
+---------+---------------------+
| F | 3516.6666666666665 |
+---------+---------------------+
–按性别分区,然后求出平均工资并排序
select
gender,
avg(sal) as avg_sal
from
tb_user
group by gender
order by avg_sal asc
;
+---------+---------------------+
| gender | avg_sal |
+---------+---------------------+
| M | 1966.6666666666667 |
| F | 3516.6666666666665 |
+---------+---------------------+
设置测试环境下运行hive语句(本地模式运行,只在当前会话生效)
set mapreduce.framework.name=local;
永久生效需要配hive-site.xml的配置参数
– 查询用户信息 按照工资降序[排列
select
uid ,
name ,
sal ,
gender
from
tb_user
order by sal DESC
;
– 工资你最高的那个人的名字
select
tb_user.name ,
t.sal
from
tb_user
join
(select
sal
from
tb_user
order by sal desc limit 1
)t
on tb_user.sal = t.sal ;
yangmi 5000.0
yangxiao 5000.0
select
*
from
tb_user
where
sal =
(select
sal
from
tb_user
order by sal desc limit 1
) ;
条件判断语句case when then 的使用
用户工资组成表
gz.txt
uid jb jj tc deptno
1,2000,3000,1500,1
2,5000,500,1000,2
3,1500,1000,3000,2
4,3000,6000,8000,3
5,1500,2000,1800,1
6,2500,1000,1900,1
bm.txt 部门表
bid,bname
1,销售
2,技术
3,行政
员工信息表
yg.txt
uid name gender age
1,zs,M,28
2,ww,F,36
3,zl,F,48
4,pp,M,44
5,wb,M,32
6,TQ,F,32
1.求出公司中每个员工的姓名 和 三类收入中最高的那种收入的类型
- greatest(v1,v2…) 函数,求出括号内字段的最大值
select
y.name,
case
when g.tc==greatest(g.jj,g.jb,g.tc) then "tc"
when g.jj==greatest(g.jj,g.jb,g.tc) then "jj"
else "jb"
end as gz
from
tb_yg12 y
join
tb_gz12 g
on y.uid = g.uid;
2. 求出公司中每个岗位的薪资总和
select
b.bname,
sum(g.jb+g.tc+g.jj)
from tb_bm12 b
join tb_gz12 g
on b.bid=g.deptno
group by b.bid,b.bname
;
3.求出公司中每个岗位不同性别员工薪资总和
select
b.bname,y.gender,sum(g.jb+g.tc+g.jj)
from tb_bm12 b
join tb_gz12 g
join tb_yg12 y
on g.uid = y.uid and b.bid=g.deptno
group by b.bid,b.bname,y.gender
;
4.求出公司中不同性别、不同年龄阶段(20-30,31-40,41-50)的员工薪资总和
答案一:
select
y.gender,sum(g.jj+g.tc+g.jb),
case
when y.age>=20 and y.age<= 30 then "20-30"
when y.age>30 and y.age<= 40 then "30-40"
when y.age>40 and y.age<= 50 then "40-50"
else "qt"
end as ages
from tb_yg12 y
join tb_gz12 g on y.uid = g.uid
group by y.gender,
case
when y.age>=20 and y.age<= 30 then "20-30"
when y.age>30 and y.age<= 40 then "30-40"
when y.age>40 and y.age<= 50 then "40-50"
else "qt" end;
答案二:
with a as --将子查询命名为表a,然后和后面的表进行jion
(select
y.gender,y.age,y.uid,
case
when y.age>=20 and y.age<= 30 then "20-30"
when y.age>30 and y.age<= 40 then "30-40"
when y.age>40 and y.age<= 50 then "40-50"
else "qt" end as ages
from tb_yg12 y)
select
a.gender,a.ages,sum(g.jj+g.tc+g.jb) sum_sal
from tb_gz12 g
join a
on a.uid= g.uid
group by a.gender,a.ages;
CASE WHEN (switch case if else)
- 数据准备
悟空 A 男
娜娜 A 男
宋宋 B 男
凤姐 A 女
热巴 B 女
慧慧 B 女
2.需求
求出不同部门男女各多少人。结果如下:
A 2 1
B 1 2
3.创建本地emp_sex.txt,导入数据
[doit@hadoop102 datas]$ vi emp_sex.txt
4.创建hive表并导入数据
create table emp_sex(
name string,
dept_id string,
sex string)
row format delimited fields terminated by "\t";
load data local inpath '/opt/module/datas/emp_sex.txt' into table emp_sex;
5.按需求查询数据
select
dept_id,
sum(case sex when '男' then 1 else 0 end) male_count,
sum(case sex when '女' then 1 else 0 end) female_count
from
emp_sex
group by
dept_id;