假设有表goods(日期,产品id,产品当日收入,产品当日成本),日期和产品id是组合主键,有若干条数据,日期范围2016年1月1日至今,且一定每个产品,每天都有数据
写出SQL实现如下要求:
数据文件:goods.txt数据:
2018-03-01,a,3000,2500
2018-03-01,b,4000,3200
2018-03-01,c,3200,2400
2018-03-01,d,3000,2500
2018-03-02,a,3000,2500
2018-03-02,b,1500,800
2018-03-02,c,2600,1800
2018-03-02,d,2400,1000
2018-03-03,a,3100,2400
2018-03-03,b,2500,2100
2018-03-03,c,4000,1200
2018-03-03,d,2500,1900
2018-03-04,a,2800,2400
2018-03-04,b,3200,2700
2018-03-04,c,2900,2200
2018-03-04,d,2700,2500
2018-03-05,a,2700,1000
2018-03-05,b,1800,200
2018-03-05,c,5600,2200
2018-03-05,d,1200,1000
2018-03-06,a,2900,2500
2018-03-06,b,4500,2500
2018-03-06,c,6700,2300
2018-03-06,d,7500,5000
2018-04-01,a,3000,2500
2018-04-01,b,4000,3200
2018-04-01,c,3200,2400
2018-04-01,d,3000,2500
2018-04-02,a,3000,2500
2018-04-02,b,1500,800
2018-04-02,c,4600,1800
2018-04-02,d,2400,1000
2018-04-03,a,6100,2400
2018-04-03,b,4500,2100
2018-04-03,c,6000,1200
2018-04-03,d,3500,1900
2018-04-04,a,2800,2400
2018-04-04,b,3200,2700
2018-04-04,c,2900,2200
2018-04-04,d,2700,2500
2018-04-05,a,4700,1000
2018-04-05,b,3800,200
2018-04-05,c,5600,2200
2018-04-05,d,5200,1000
2018-04-06,a,2900,2500
2018-04-06,b,4500,2500
2018-04-06,c,6700,2300
2018-04-06,d,7500,5000
要求:输出2018年3月中每一天与上一天相比,总成本的变化。
1)先求2018每一天的总成本。创建中间表方便使用。
create table goods_day as
select dt,sum(income) income,sum(cost) cost
from goods
where substring(dt,1,4)="2018"
group by dt;
表内容:
select * from goods_day limit 10;
+---------------+-------------------+-----------------+
| goods_day.dt | goods_day.income | goods_day.cost |
+---------------+-------------------+-----------------+
| 2018-03-01 | 13200 | 10600 |
| 2018-03-02 | 9500 | 6100 |
| 2018-03-03 | 12100 | 7600 |
| 2018-03-04 | 11600 | 9800 |
| 2018-03-05 | 11300 | 4400 |
| 2018-03-06 | 21600 | 12300 |
| 2018-04-01 | 13200 | 10600 |
| 2018-04-02 | 11500 | 6100 |
| 2018-04-03 | 20100 | 7600 |
| 2018-04-04 | 11600 | 9800 |
+---------------+-------------------+-----------------+
2)
方法一:开窗over函数中取当前[ current ]数据与前一条数据 [ following ] ,的第一个值(first_value)和最后一个值(last_value)得差。
select dt,cost,
cost - (LAST_VALUE(cost) OVER(ROWS BETWEEN CURRENT ROW AND 1 FOLLOWING)) as change
from goods_day
where substring(dt,6,2)="03"
order by dt;
方法二:使用lead函数获取前一行的值。
select dt,cost,
cost - lead(cost,1,cost)over() as change
from goods_day
substring(dt,6,2)="03"
order by dt;
结果如下:
+-------------+--------+---------+
| dt | cost | change |
+-------------+--------+---------+
| 2018-03-01 | 10600 | 0 |
| 2018-03-02 | 6100 | -4500 |
| 2018-03-03 | 7600 | 1500 |
| 2018-03-04 | 9800 | 2200 |
| 2018-03-05 | 4400 | -5400 |
| 2018-03-06 | 12300 | 7900 |
+-------------+--------+---------+
方法三:传统自连接查询。
select distinct
g1.dt,g1.cost,g1.cost-g2.cost change
from goods_day g1 left join goods_day g2
on cast(substring(g1.dt,-1) as int)-1=cast(substring(g2.dt,-1) as int)
where substring(g1.dt,1,7)="2018-03";
结果:
+-------------+----------+---------+
| g1.dt | g1.cost | change |
+-------------+----------+---------+
| 2018-03-01 | 10600 | NULL |
| 2018-03-02 | 6100 | -4500 |
| 2018-03-03 | 7600 | 1500 |
| 2018-03-04 | 9800 | 2200 |
| 2018-03-05 | 4400 | -5400 |
| 2018-03-06 | 12300 | 7900 |
+-------------+----------+---------+
案例二:
编写一个SQL实现查找所有至少连续三次出现的数字
1 1
2 1
3 1
4 2
5 1
6 2
7 2
8 3
9 3
10 3
11 3
12 4
准备操作:
create database if not exists hive_interview;
use hive_interview;
drop table if exists numbers;
create table numbers(id int, number int) row format delimited fields terminated by "\t";
load data local inpath "/home/hadoop/numbers.txt" into table numbers;
select * from numbers;
hql解决方案:
分析:连续3次,就是当前行的值与前两行的值相同。
hql如下:
select distinct(a.number) number
from
(select number,
(case when
number=(LAG(number,1,0) OVER(order by id)) //前一行
and number=(LAG(number,2,0) OVER(order by id)) //前第二行
then 1 else 0 //相同返回1,不同返回0
end
) as b
from numbers) a
where a.b=1;
结果:
+---------+
| number |
+---------+
| 1 |
| 3 |
+---------+
传统SQL解决方案--自连接:
select
distinct n1.number
from numbers n1
join numbers n2 on n1.id-1=n2.id and n2.number=n1.number
join numbers n3 on n1.id-2=n3.id and n3.number=n2.number;
结果:
+------------+
| n1.number |
+------------+
| 1 |
| 3 |
+------------+
这两个题提示我们SQL世界里永远不要忘记自连接!