员工信息表emp:
字段:员工id,员工名字,工作岗位,部门经理,受雇日期,薪水,奖金,部门编号
英文名:EMPNO,ENAME,JOB,MGR,HIREDATE,SAL,BONUS,DEPTNO
7369,SMITH,CLERK,7902,1980-12-17,800,null,20
部门信息表dept:
字段:部门编号,部门名称,部门地点
英文名:DEPTNO,DEPTNAME,DEPTADDR
10,ACCOUNTING,NEW YORK
1. 列出至少有一个员工的所有部门。
select distinct t1.deptno ,t2.deptname
from(select deptno ,count(*) as num from emp group by deptno having num>=1 ) as t1
join dept as t2
on t1.deptno = t2.deptno;
-
列出薪金比“SMITH”多的所有员工。
之前的博客里面有详细解释
hive笔记 —— 解决where子句不能使用子查询的前提下,比较子查询结果(等于,大于,平均值) -
列出所有员工的姓名及其直接上级的姓名。
select e1.ename,e2.ename from emp as e1 left join emp as e2 on e1.MGR = e2.empno;
- 列出受雇日期早于其直接上级的所有员工
select e1.ename,e1.hiredate,e2.ename,e2.hiredate
from emp as e1 left join emp as e2 on e1.MGR = e2.empno
where e1.hiredate < e2.hiredate;
- 列出部门名称和这些部门的员工信息,同时列出那些没有员工的部门。
select * from dept left join emp on dept.deptno=emp.deptno;
- 列出所有“CLERK”(办事员)的姓名及其部门名称。
select ename,deptname from emp
left join dept on emp.deptno=dept.deptno
where emp.job='CLERK'
group by ename,deptname;
7. 列出最低薪金大于1500的各种工作。
select job from emp group by job having min(sal)>1500;
9. 列出薪金高于公司平均薪金的所有员工。
with t1 as (select ename,sal,1 as eid1 from emp),
t2 as (select avg(sal) as avg_sal , 1 as eid2 from emp)
select ename from t1 left join t2
on t1.eid1 = t2.eid2
where t1.sal > t2.avg_sal;
10.列出与“SCOTT”从事相同工作的所有员工。
select * from emp
where job in (select job from emp as e where e.ename = 'SCOTT')
and ename != 'SCOTT';
11.列出薪金等于部门30中员工的薪金的所有员工的姓名和薪金。
select ename,sal from emp
where sal in (select sal from emp as e where e.deptno=30)
and deptno != 30;
结果为空
12.列出薪金高于在部门30工作的所有员工的薪金的员工姓名和薪金。
with t1 as (select ename,sal,1 as eid1 from emp where deptno != 30),
t2 as (select max(sal) as max_sal,1 as eid2 from emp where deptno=30 group by deptno)
select t1.ename,t1.sal from t1 left join t2 on t1.eid1=t2.eid2 where t1.sal>t2.max_sal;
13.列出在每个部门工作的员工数量、平均工资和平均服务期限。
select count(*) , round(avg(sal),1) , round(avg(datediff(current_date,hiredate)),1)
from emp group by deptno;
指定精度取整函数: round
语法: round(double a, int d)
返回值: DOUBLE
说明: 返回指定精度 d 的 double 类型
14.列出所有员工的姓名、部门名称和工资。
select distinct ename,deptname,sal from emp left join dept on emp.deptno=dept.deptno;
但题目这里,只是部门表里面有重复的部门名称,所以可以将distinct移到dept表这边
with t1 as (select ename,sal,deptno from emp),
t2 as (select distinct deptno,deptname from dept)
select ename,deptname,sal from t1 join t2 on t1.deptno=t2.deptno;
15.列出所有部门的详细信息和部门人数。
select t1.deptno ,t1.deptname ,t1.deptaddr ,t2.num
from dept as t1
left join(select deptno ,count(empno) as num from emp group by deptno) as t2
on t1.deptno = t2.deptno;
16.列出各种工作的最低工资。
select job,min(sal) from emp group by job;
17.列出各个部门的MANAGER(经理)的最低薪金。
select deptno,min(sal) as min_sal from emp where job='MANAGER' group by deptno;
18.列出所有员工的年工资,按年薪从低到高排序。
select ename, (sal*12 + if(bonus is null,0,bonus)) as money from emp;
- 列出每个部门薪水前两名最高的人员名称以及薪水。
select ename,sal
from (select ename,sal,row_number() over (partition by deptno order by sal desc) as num from emp) as t
where t.num <=2;
- 列出每个员工从受雇开始到2018-12-12 为止共受雇了多少天。
select ename,datediff('2018-12-12',hiredate) as d from emp;
统计用户连续交易的总额、连续登陆天数、连续登陆开始和结束时间、间隔天数
create table days (id int,daystr string,amount decimal(10,2));
insert into days values
(1, "2019-02-08", 6214.23),
(1, "2019-02-08", 6247.32),
(1, "2019-02-09", 85.63),
(1, "2019-02-09", 967.36),
(1, "2019-02-10", 85.69),
(1, "2019-02-12", 769.85),
(1, "2019-02-13", 943.86),
(1, "2019-02-14", 538.42),
(1, "2019-02-15", 369.76),
(1, "2019-02-16", 369.76),
(1, "2019-02-18", 795.15),
(1, "2019-02-19", 715.65),
(1, "2019-02-21", 537.71),
(2, "2019-02-08", 6214.23),
(2, "2019-02-08", 6247.32),
(2, "2019-02-09", 85.63),
(2, "2019-02-09", 967.36),
(2, "2019-02-10", 85.69),
(2, "2019-02-12", 769.85),
(2, "2019-02-13", 943.86),
(2, "2019-02-14", 943.18),
(2, "2019-02-15", 369.76),
(2, "2019-02-18", 795.15),
(2, "2019-02-19", 715.65),
(2, "2019-02-21", 537.71),
(3, "2019-02-08", 6214.23),
(3, "2019-02-08", 6247.32),
(3, "2019-02-09", 85.63),
(3, "2019-02-09", 967.36),
(3, "2019-02-10", 85.69),
(3, "2019-02-12", 769.85),
(3, "2019-02-13", 943.86),
(3, "2019-02-14", 276.81),
(3, "2019-02-15", 369.76),
(3, "2019-02-16", 369.76),
(3, "2019-02-18", 795.15),
(3, "2019-02-19", 715.65),
(3, "2019-02-21", 537.71);
建表和插入数据完毕
要想求出连续登录天数,连续登陆开始和结束时间
首先要对表格进行去重,因为一个用户可能在同一天多次登录网站,现在只需要知道用户在哪几天登录了网站,而不需要知道一天登录了几次网站。另外在合并去重的时候,记得要把一天内多次交易的金额也合并成一天内的交易金额
select id,daystr,sum(amount) as money
from days
group by id,daystr;
在这之后,就把上面的去重语句放进 with as 里面
with t1 as (select id,daystr,sum(amount) as money from days group by id,daystr)
然后还要做第二件事,用 row_number() 给每一行数据打上一个 行号,然后用日期减去行号,得出的结果起一个别名叫 day,如果连续多行数据的 day 值相同就说明是这几行是连续登录。
因为如果是连续登录,那么日期是递增的,每次增加一天,然后行号也是递增的,每次加一。因此用连续加一的天数减去连续加一的行号,结果自然都是被一样的。如果不一样就说明中间有几天没有登录,导致增加的天数超过一。
with t1 as (select id,daystr,sum(amount) as money from days group by id,daystr)
select id,daystr,money,date_sub(daystr,row_number() over (partition by id)) as day from t1;
从图上可以看出来,前三条记录是连续登录,因此最后一列 day 的值都是 2019-02-07
然后到了第四条记录,由于2019-02-11这一天没有登录,所以日期直接加二到了2019-02-12,减去行号4,结果为2019-02-08
同样的,将上面这张表的sql语句也放入 with as 中
with t1 as (select id,daystr,sum(amount) as money from days group by id,daystr),
t2 as (select id,daystr,money,date_sub(daystr,row_number() over (partition by id)) as day from t1)
前期表格的处理完成
计算连续登录天数,按照 day 值进行group by分组聚合即可,交易金额同理。按照分组的结果求daystr 的min值和max值,即可求出连续登陆开始和结束时间
with t1 as (select id,daystr,sum(amount) as money from days group by id,daystr),
t2 as (select id,daystr,money,date_sub(daystr,row_number() over (partition by id)) as day from t1)
select id,
count(1) as count,
min(daystr) as startday,
max(daystr) as endday,
sum(money) as sum,
datediff(day , lag(day,1,day) over (partition by id order by day)) as intervalday
from t2
group by id,day;
上面sql语句,select部分的值从上到下依次是,用户id,连续登录天数,连续登录的开始日期,连续登录的结束日期,连续登录的交易金额总和,连续登录日期之间的间隔天数
其中,求取连续登录日期之间的间隔天数时,用到了 lag
LAG(col,n,DEFAULT) 用于统计窗口内往上第n行值
第一个参数为列名,第二个参数为往上第n行(可选,默认为1),第三个参数为默认值(当往上第n行为NULL时候,取默认值,如不指定,则为NULL)
因为已经按照 id,day 进行group by 分组,所以每一行的day值都是一组连续登录天数的day值。所以两行的day值相减,即可得出连续登录日期之间的间隔天数