leetcode数据库刷题记录

1.找出某特征连续出现n次的所有记录

写在前面:对于求连续记录的题,套路都是:求哪个变量连续就用这个变量-row_number()

1.1 例题1

题目要求:
在这里插入图片描述
id和people都是int类型,visit_date是日期类型,id是主键

  • 方法一:使用rank()
    思路分析:
    将people>=100的所有id进行排序(由于本例中id是主键,所以肯定不会有重复,因此直接使用rank就可以);然后用id值减去rank值,结果记为c,如果id是连续的,那么连续id对应c应该是相同的;c出现了几次,那么id就连续了c条记录。
with t1 as(
    select id,visit_date,people,
    id - rank() over(order by id) as c
    from stadium
    where people>=100
)
select id,visit_date,people from t1 where c in (
    select c
    from t1
    group by c
    having count(id)>=3
);
  • 方法二:不使用rank()
    如果mysql版本不支持窗口函数,则可使用该方法。
    思路分析:
    从所有people>=100的id(假设叫id_list)中,找出id-1、id+1以及自己本身都在id_list中的id(假设叫center_id);最后从最原始记录中找出center_id-1,center_id,center_id+1的记录;
with t1 as(
	select id from stadium where 
	id in (select id  from stadium where people>=100) and
	id-1 in (select id  from stadium where people>=100) and
	id+1 in (select id  from stadium where people>=100)
)
select distinct s.* from stadium s join t1
on s.id=t1.id-1 or s.id=t1.id or s.id=t1.id+1;

1.2 例题2

题目要求:
在这里插入图片描述

  • 方法一:使用rank()
with t1 as(select seat_id,seat_id-rank() over(order by seat_id asc) as gap 
			from cinema where free=1)
select seat_id from t1 where gap in 
(select gap from t1 group by gap having count(seat_id)>=2);
  • 方法二:不使用rank()
select distinct a.seat_id from cinema a 
join cinema b 
on abs(a.seat_id-b.seat_id) = 1 
where a.free = 1 and b.free = 1 order by a.seat_id; 

在本例中,在free为1的记录中,满足abs(a.seat_id-b.seat_id) = 1 条件的有

a.seat_idb.seat_id
34
43
45
54

所以上述a.seat_id去重后结果就为3、4、5.
方法二的思想很简单易懂,但是只适用于连续2次及以上,如果多次的话,代码修改起来就会很麻烦,而方法一只需要修改一个参数就行。

1.3 拓展题

在这里插入图片描述

  • step1:先把指定日期内的失败记录和成功记录连接起来:
select 0 as 'state',fail_date as date from failed where fail_date>='2019-01-01' and fail_date<='2019-12-31'
union all
select 1 as 'state',success_date as date from succeeded where success_date>='2019-01-01' and success_date<='2019-12-31'
order by date

在这里插入图片描述
设该表为t1.

  • step2:把连续的记录分割成组
    从下图可以看出,只要从每组中取出最小的日期和最大的日期,就能完成题目的要求。
    那么这些组该如何形成呢?
    state为0和1的记录可以通过group by state区分开,那同一个state里的记录又如何划分组呢?
    这个是不是就和上面那个电影座位问题一样?date就相当于seat_id,state就相当于free;只不过上例中只需要将free为1的记录分组,本例中,state为0和为1的记录都需要分组。
    方法一:state为1和为0的分别在内部再次分组,然后拼接起来:
    对于state为1的:
select state,date,subdate(date,row_number() over(order by date asc)) as gap
from t1
where state=1;
statedategap
12019-01-012018-12-31
12019-01-022018-12-31
12019-01-032018-12-31
12019-01-062019-01-02

可以看到,通过gap就可以把state都为1的记录分开;

对于state为0的也是相同的操作:

select state,date,subdate(date,row_number() over(order by date asc)) as gap
from t1
where state=2;
statedategap
02019-01-042019-01-03
02019-01-042019-01-03
  • step3:选出每组中的最小日期作为start_date,最大日期作为end_date.
    对于state为1的:
select state,min(date) as start_date,max(date) as end_date
from (
select state,date,subdate(date,row_number() over(order by date asc))
from t1
where state=1
) t2
group by gap
statestart_dateend_date
12019-01-012019-01-03
12019-01-062019-01-06

对于state为0的:

select state,min(date) as start_date,max(date) as end_date
from (
select state,date,subdate(date,row_number() over(order by date asc))
from t1
where state=0
) t3
group by gap
statestart_dateend_date
02019-01-042019-01-05

将上面两表通过union结合起来,然后按照start_date升序排列,就是正确答案了。

当然,我们也可以不用把state取值为0、1的情况分开,我们使用partition by 和 group by 也可以完成step2、3

select state,date,subdate(date,row_number() over(partition by state order by date asc)) as gap from t1
statedategap
12019-01-012018-12-31
12019-01-022018-12-31
12019-01-032018-12-31
02019-01-042019-01-03
02019-01-042019-01-03
12019-01-062019-01-02

ps:此时的row_number 为:

statedaterow_number
12019-01-011
12019-01-022
12019-01-033
02019-01-041
02019-01-042
12019-01-064

最后通过state和gap进行分组,选出每组最小日期和最大日期即可。
完整代码:

select state as period_state,min(date) as start_date,max(date) as end_date
from
(
select state,date,subdate(date,row_number() over(partition by state order by date asc)) as gap from
	(
	select 'failed' as 'state',fail_date as date from failed 
	union all
	select 'succeeded' as 'state',success_date as date from succeeded 
	order by date
	) t1
where date>='2019-01-01' and date<='2019-12-31'
) t2
group by state,gap
order by start_date;

2.交换座位

题目要求:
在这里插入图片描述

2.1 交换id

2.1.1 使用case

-解题思路:当id为奇数且不是最后一个时,id变为id+1;当id为奇数,但是是最后一个时,id不变;当id为偶数时,id变为id-1.

with n as(select max(id) as max_id from seat)
select 
(case 
when id%2=1 and id!=n.max_id then id+1
when id%2=1 and id=n.max_id then id
else id-1
end) as id,student
from seat,n
order by id asc;

另一种写法:

select 
    if(id%2=0,
        id-1,
        if(id=(select count(distinct id) from seat),
            id,
            id+1)) 
    as id,student 
from seat 
order by id;

2.1.2 使用位异或和coalesce

不了解位异或的,请戳sql位运算符
coalesce(exp1,exp2,exp3,…)返回第一个不为null的值,如果全为null就返回null。

  1. 首先使用(id+1)^1-1来达到交换id的目的:
SELECT id, (id+1)^1-1, student FROM seat;

在这里插入图片描述
2. 连接原始表和更新id后的表,连接条件是ori.id=update.id
但是更新后的表中,没有id为5的记录,所以直接连接的话,结果是null,因此使用coalesce来解决

select id,coalesce(t1.student,s.student) as student
from seat s
left join (select (id+1)^1-1 as update_id,student from seat) t1
on s.id = t1.update_id;

2.2 上移记录、下移记录(使用开窗函数)

select id, if(t.id%2=1,t.back,t.ahead) as 'student' from
	(select id,student,
			LAG(student, 1,student) over() ahead, 
			LEAD(student, 1,student) over() back
	from seat) t

如果不能使用开窗函数,就连接两张表(一张是上移之后的表,一张是下移在之后的表):

with t1 as(select id-1 as sub_id,student from seat),
     t2 as(select id+1 as add_id,student from seat)
select id,ifnull(if(id%2=1,t1.student,t2.student),s.student) as student from seat s 
left join t1 on s.id=t1.sub_id
left join t2 on s.id=t2.add_id;

3.avg与is not null连用

如果某个变量(设为var)的值只分为null和非null(设为a),那么可以用avg(var is not null)求a的个数占总体个数的比例

3.1 例题1

在这里插入图片描述

  • step1:先找出每个人登录日期
select player_id,min(event_date) as login
from activity
group by player_id;

在这里插入图片描述

  • step2:将上表与原表连接起来
select a.player_id,a.event_date,t1.login from
(select player_id,min(event_date) as login
from activity
group by player_id) t1
left join activity a
on a.player_id=t1.player_id and date_sub(a.event_date,interval 1 day)=login;

在这里插入图片描述
因为t1作为主表,所以连接查询一共会返回三行,每行代表一个玩家,若前面两个为null,说明该玩家在login的一天之后并没有再次登录

  • step3:使用avg算fraction
select round(avg(a.player_id is not null),2) as fraction 
from (select player_id,min(event_date) as login
		from activity
		group by player_id) t1
left join activity a
on a.player_id=t1.player_id and date_sub(a.event_date,interval 1 day)=login;

可以看到,这段代码和上一段基本一致,只是最后的查询结果变了。
a.player_id is not null:如果是null 就返回0,如果不是null就返回1,所以从step2的结果可以看出,每行返回的值分别是1、0、0;那么再avg一下,结果就是0.33
在这里插入图片描述

3.2 例题2

在这里插入图片描述

select question_id as survey_log
from survey_log
group by question_id
order by avg(answer_id is not null) desc
limit 1;

4.求中位数

4.1 例题1

在这里插入图片描述首先说一下,我感觉它这里的中位数不太合适,按理来说如果总数是偶数个的话,中位数应该是中间两个数取平均,但是这里还是先按照它的方法来。
思路一:先把所有数从小到大排列起来并标号从1到n (因此这里不能使用rank(),而应该使用 row_number()),中位数与(n/2)的差在[0,1]之间,以此来选择出中位数。

  • step1:通过窗口函数求出每个人在该公司的排名(工资升序排列)以及该公司的总人数。
SELECT id, company, salary,
ROW_NUMBER() OVER (PARTITION BY company 
ORDER BY Salary ASC, id ASC) AS row_num,
COUNT(Id) OVER (PARTITION BY company) AS count_id
FROM Employee;
  • step2:根据思路中的那个条件选出中位数
select Id,Company,Salary from(
    SELECT id, company, salary,
    ROW_NUMBER() OVER (PARTITION BY company 
    ORDER BY Salary ASC, id ASC) AS row_num,
    COUNT(Id) OVER (PARTITION BY company) AS count_id
    FROM Employee) t1
    where row_num-count_id/2 between 0 and 1;

思路二:
在这里插入图片描述
也就是说,只有中位数才能保证 升序排名>=(总数/2) and 降序排名>=(总数/2)

select id,company,salary
from(
select id,company,salary,
row_number() over(partition by company order by salary asc,id asc) as asc_rank,
row_number() over(partition by company order by salary desc,id asc) as desc_rank,
count(id) over(partition by company) as total
from employee) t1
where asc_rank>=total/2 and desc_rank>=total/2;

4.2 例题2

在这里插入图片描述思路一:就和上例的第二个思路差不多

select avg(number) as median from(
select number,
        sum(frequency) over(order by number) asc_accumu,
        sum(frequency) over(order by number desc) desc_accumu,
        sum(frequency) over() total
        from numbers) t1
where asc_accumu>=total/2 and desc_accumu>=total/2;

思路二:
在这里插入图片描述

  • 设总的个数是n。
  • 若n为奇数,则n/2必落在某个区间内部(即不会在边界)假设是[f1,f1+f2],那么该区间对应的数字,即b,就是中位数。
  • 若n为偶数,则n/2可能落在某个区间内部,假设是[0,f1],那么该区间对应的数字a,就是中位数;
  • -n/2也可能落在某两个区间的边界处,比如n/2=f1+f2+f3,那么[f1+f2,fi+f2+f3]和[f1+f2+f3,f1+f2+f3+f4]对应的数:c、d,这两数的平均数就是中位数。
select avg(Number) as median from (
select *,sum(Frequency) over(order by Number rows between 
unbounded preceding and unbounded following) as total,
sum(Frequency) over(order by Number rows between 
unbounded preceding and current row) as accumulation 
from Numbers) t1
where total/2 between accumulation -Frequency and accumulation;

(还是第一种思路简单…)

5. n preceding

在这里插入图片描述

select Id,Month,Salary from(
	select Id,Month,
	sum(salary) over(partition by id 
	order by month asc rows 2 PRECEDING) as Salary,
	rank() over(partition by id 
	order by month desc) as month_rank
	from employee) t1
where month_rank>1
order by 1,2 desc;
  • month_rank:把月份从大到小排序,然后取出排名大于1的记录就可以将最后的一个月去除;
  • rows 2 preceding:其实写全应该是 rows between 2 preceding and current row ;
  • order by 1,2 :1代表id,2代表month

6.其实是一个简单的问题,我自己把它做复杂了…

在这里插入图片描述正常方法:

select sum(tiv_2016) as TIV_2016 from insurance 
where tiv_2015 in 
(select tiv_2015 from insurance 
group by tiv_2015 having count(1) > 1)
and (LAT,LON) in 
(select LAT,LON from insurance 
group by LAT,LON having count(1) = 1);

我的方法:

  • step1:求出经纬度不完全相同的PID
select pid from insurance
group by concat(lat,lon)
having count(pid)=1;
  • step2:求出有保额相同的pid
select pid from insurance where tiv_2015 in (
select tiv_2015
from insurance
group by tiv_2015
having count(pid)>1)
  • step3:求两者交集
select t1.pid from
(select pid from insurance
group by concat(lat,lon)
having count(pid)=1) t1 join
(select pid from insurance where tiv_2015 in (
(select tiv_2015
from insurance
group by tiv_2015
having count(pid)>1)))t2
on t1.pid = t2.pid;
  • step4:得到最后答案
select sum(tiv_2016) as tiv_2016 from insurance where pid in(
select t1.pid from
(select pid from insurance
group by concat(lat,lon)
having count(pid)=1) t1 join
(select pid from insurance where tiv_2015 in (
(select tiv_2015
from insurance
group by tiv_2015
having count(pid)>1)))t2
on t1.pid = t2.pid);

其实思路是一样的,就是我把它写复杂了,,,嗐

7.巧用union创造行

7.1 例题1

在这里插入图片描述
如果没有记录为0的那一行,那么这道题就并不是很难:

SELECT spend_date,platform,SUM(amount) AS total_amount,COUNT(user_id) AS total_users FROM(
SELECT spend_date,user_id, IF(COUNT(DISTINCT platform) = 2, 'both', platform) 'platform',
SUM(amount) 'amount' FROM spending GROUP BY spend_date,user_id) t1
GROUP BY spend_date,platform;

在这里插入图片描述
因此,本题的关键就是怎么搞出记录为0的那一行。
解题步骤:

  • step1:找出每天、每个用户的支付平台类型:
SELECT spend_date,user_id, IF(COUNT(DISTINCT platform) = 2, 'both', platform) 'platform',
SUM(amount) 'amount' FROM spending
GROUP BY spend_date,user_id;

在这里插入图片描述

  • step2:创建一个表,该表只有一列(‘platform’)、三行(‘mobile’、‘desktop’、‘both’)
    这一步是本题关键点所在,也就是如何凭空给没有数据的记录(即total_amount、total_users)搞出一行来
SELECT 'desktop' AS 'platform' UNION
SELECT 'mobile' AS 'platform' UNION
SELECT 'both' AS 'platform';

在这里插入图片描述

  • step3:把上述两表连接起来(笛卡尔乘积)
SELECT * FROM
(SELECT spend_date,user_id, IF(COUNT(DISTINCT platform) = 2, 'both', platform) 'platform',
SUM(amount) 'amount' FROM spending
GROUP BY spend_date,user_id) a,
(SELECT 'desktop' AS 'platform' UNION
SELECT 'mobile' AS 'platform' UNION
SELECT 'both' AS 'platform') b;

在这里插入图片描述原本a表有5条记录,b表有3条记录,笛卡尔乘积之后,变成15条记录

  • step4:
SELECT spend_date,b.platform,
SUM(IF(a.platform = b.platform,amount,0)) AS total_amount,
COUNT(IF(a.platform = b.platform,user_id,NULL)) AS total_users
FROM

(SELECT spend_date,user_id, IF(COUNT(DISTINCT platform) = 2, 'both', platform) 'platform',
SUM(amount) 'amount' FROM spending
GROUP BY spend_date,user_id) a,
(SELECT 'desktop' AS 'platform' UNION
SELECT 'mobile' AS 'platform' UNION
SELECT 'both' AS 'platform') b

GROUP BY spend_date,b.platform;


ps:
1 b.platform才能保证每一天都有‘both’、‘desktop’、‘mobile’(其实这也是这道题的难点所在)
2 count if 那儿,不满足if条件的,一定要写成null,因为null不会被count计算。

7.2 例题2

matches表数据如下:
在这里插入图片描述players表数据如下:
在这里插入图片描述
要求:查询每组中总score最高的player_id,若成绩相同,则取id较小的那个。查询结果应如下图所示:
在这里插入图片描述

  • 思路:先用union all把每个player作为first_player和second_player的得分拼接起来,之后就可以计算每个player的总得分;然后按照总得分(降序)和player_id(升序)排列,最后按照group_id分组。(这里用到了group by的一个小技巧:group by之后如果没有使用聚合函数,那么返回的就是该组中的第一条记录!!!)
  • step1:union all
SELECT first_player AS player_id,first_score AS score FROM matches
UNION ALL
SELECT second_player AS player_id,second_score AS score FROM matches
player_idscore
153
301
302
405
351
450
252
150
202
501
  • step2:计算每个player的总得分,并按照总得分和player_id排好序
SELECT t1.player_id,SUM(score) AS score,group_id FROM(
SELECT first_player AS player_id,first_score AS score FROM matches
UNION ALL
SELECT second_player AS player_id,second_score AS score FROM matches) t1
LEFT JOIN players p ON t1.player_id = p.`player_id`
GROUP BY t1.player_id
ORDER BY score DESC,player_id ASC
player_idscoregroup_id
4053
1531
3031
2023
2521
3512
5012
4501
  • step3: 使用group by选出每组中的第一条记录
SELECT group_id,player_id FROM(
SELECT t1.player_id,SUM(score) AS score,group_id FROM(
SELECT first_player AS player_id,first_score AS score FROM matches
UNION ALL
SELECT second_player AS player_id,second_score AS score FROM matches) t1
LEFT JOIN players p ON t1.player_id = p.`player_id`
GROUP BY t1.player_id
ORDER BY score DESC,player_id ASC) t2
GROUP BY group_id;
group_idplayer_id
115
235
340

8.with recursive

8.1 创造连续数列

在这里插入图片描述

  • step1:使用with recursive创建一个从1到max(customer_id)的数列
with recursive full_id(n) as (
    select 1 as id
    union all
    select n+1 from full_id where n<(select max(customer_id) from customers)
)
select * from full_id

在这里插入图片描述

  • step2:找出遗失的ID
with recursive full_id(n) as (
    select 1 as id
    union all
    select n+1 from full_id where n<(select max(customer_id) from customers)
)
select n as ids from full_id where n not in (select customer_id from customers)

8.2

在这里插入图片描述

在这里插入图片描述
要求:查询2020年每月截止到月底时已有的司机人数、每月被接受的订单量

# 创建一个月份表 month_table
with recursive month_table(m) as(
    select 1 as m
    union all
    select m+1 from month_table where m<=11
)
select month,active_drivers,ifnull(accepted_rides,0) as accepted_rides from
# t1计算每个月的已有注册司机人数
(select mt.m as month ,count(driver_id) active_drivers
from month_table mt left join drivers on 202000+mt.m>=date_format(join_date,'%Y%m')
group by mt.m) t1
left join
# t2计算每月被接受的订单量
(select month(requested_at) as order_month,count(a.ride_id) as accepted_rides
from rides r join acceptedrides a on r.ride_id=a.ride_id
where substr(requested_at,1,4)='2020' group by order_month
) t2 
on t1.month=t2.order_month

8.3

原始数据就是8.2的数据,查询2020年每3个月的平均ride_distance和平均ride_duration
在这里插入图片描述

with recursive month_table(m) as(
    select 1 as m
    union all
    select m+1 from month_table where m<=11
)
select month,
round(avg(month_distance) over(rows between current row and 2 following),2) as average_ride_distance,
round(avg(month_duration) over(rows between current row and 2 following),2) as average_ride_duration
from (
    select mt.m as month
    ,ifnull(sum(ride_distance),0) as month_distance,ifnull(sum(ride_duration),0) as month_duration 
    from month_table as mt left join rides r on mt.m = month(requested_at) and substr(requested_at,1,4)='2020'
    # 注意,这里2020年这个限制条件不能放到where里面,因为where的执行是在全连接之后、on之前
    # 如果使用where,就会首先在全连接的结果中把所有不是2020的全部剔除,只剩下年份是2020的记录
    # 然后on才作用于这些剩下的结果,因此就不能保证1~12月份都有记录
    left join acceptedrides a on r.ride_id=a.ride_id
    group by mt.m
) t1
limit 10  #只显示前10条记录

9. greatest、least选择多列中的最大值、最小值

9.1

在这里插入图片描述

select least(from_id,to_id) as person1,greatest(from_id,to_id) as person2,count(*) as call_count,sum(duration) as total_duration
from calls
group by person1,person2

10.窗口函数和group by连用

窗口函数作用的范围是group by之后的结果。什么叫group by之后的结果?就比如本例中:group by machine_id,process_id。那么group by之后的结果中,每一条记录就代表一组不同的(machine_id,process_id) (当然后面还可以有其他聚合字段)。那么窗口函数就是在这基础上进行的操作。

在这里插入图片描述

# 方法一:常规方法
select machine_id,round(avg(processing_time),3) as  processing_time from(
select machine_id,process_id,sum(if(activity_type='start',-1,1)*timestamp) as processing_time
from activity group by machine_id,process_id) t1 group by machine_id

# 方法二:使用窗口函数(窗口函数作用的范围是group by之后的结果)
select * from(
select machine_id
,round(avg(sum(if(activity_type='start',-1,1)*timestamp)) over(partition by machine_id),3) as processing_time
from activity group by machine_id,process_id) t1 group by machine_id
  • 1
    点赞
  • 3
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值