leetcode数据库刷题记录

最新推荐文章于 2023-07-29 19:59:20 发布

缦旋律

最新推荐文章于 2023-07-29 19:59:20 发布

阅读量286

点赞数 1

分类专栏： MySQL

小陈一行一行地敲出来的啦~

本文链接：https://blog.csdn.net/weixin_41391619/article/details/111062662

版权

MySQL 专栏收录该内容

26 篇文章 1 订阅

订阅专栏

1.找出某特征连续出现n次的所有记录

写在前面：对于求连续记录的题，套路都是：求哪个变量连续就用这个变量-row_number()

1.1 例题1

题目要求：
在这里插入图片描述
id和people都是int类型，visit_date是日期类型，id是主键

方法一：使用rank()
思路分析：
将people>=100的所有id进行排序（由于本例中id是主键，所以肯定不会有重复，因此直接使用rank就可以）；然后用id值减去rank值，结果记为c，如果id是连续的，那么连续id对应c应该是相同的；c出现了几次，那么id就连续了c条记录。

with t1 as(
    select id,visit_date,people,
    id - rank() over(order by id) as c
    from stadium
    where people>=100
)
select id,visit_date,people from t1 where c in (
    select c
    from t1
    group by c
    having count(id)>=3
);

方法二：不使用rank()
如果mysql版本不支持窗口函数，则可使用该方法。
思路分析：
从所有people>=100的id（假设叫id_list）中，找出id-1、id+1以及自己本身都在id_list中的id（假设叫center_id）；最后从最原始记录中找出center_id-1,center_id,center_id+1的记录；

with t1 as(
	select id from stadium where 
	id in (select id  from stadium where people>=100) and
	id-1 in (select id  from stadium where people>=100) and
	id+1 in (select id  from stadium where people>=100)
)
select distinct s.* from stadium s join t1
on s.id=t1.id-1 or s.id=t1.id or s.id=t1.id+1;

1.2 例题2

题目要求：
在这里插入图片描述

方法一：使用rank()

with t1 as(select seat_id,seat_id-rank() over(order by seat_id asc) as gap 
			from cinema where free=1)
select seat_id from t1 where gap in 
(select gap from t1 group by gap having count(seat_id)>=2);

方法二：不使用rank()

select distinct a.seat_id from cinema a 
join cinema b 
on abs(a.seat_id-b.seat_id) = 1 
where a.free = 1 and b.free = 1 order by a.seat_id;

在本例中，在free为1的记录中，满足abs(a.seat_id-b.seat_id) = 1 条件的有

a.seat_id	b.seat_id
3	4
4	3
4	5
5	4

所以上述a.seat_id去重后结果就为3、4、5.
方法二的思想很简单易懂，但是只适用于连续2次及以上，如果多次的话，代码修改起来就会很麻烦，而方法一只需要修改一个参数就行。

1.3 拓展题

在这里插入图片描述

step1:先把指定日期内的失败记录和成功记录连接起来：

select 0 as 'state',fail_date as date from failed where fail_date>='2019-01-01' and fail_date<='2019-12-31'
union all
select 1 as 'state',success_date as date from succeeded where success_date>='2019-01-01' and success_date<='2019-12-31'
order by date

在这里插入图片描述
设该表为t1.

step2:把连续的记录分割成组
从下图可以看出，只要从每组中取出最小的日期和最大的日期，就能完成题目的要求。
那么这些组该如何形成呢？
state为0和1的记录可以通过group by state区分开，那同一个state里的记录又如何划分组呢？
这个是不是就和上面那个电影座位问题一样？date就相当于seat_id,state就相当于free；只不过上例中只需要将free为1的记录分组，本例中，state为0和为1的记录都需要分组。
方法一：state为1和为0的分别在内部再次分组，然后拼接起来：
对于state为1的：

select state,date,subdate(date,row_number() over(order by date asc)) as gap
from t1
where state=1;

state	date	gap
1	2019-01-01	2018-12-31
1	2019-01-02	2018-12-31
1	2019-01-03	2018-12-31
1	2019-01-06	2019-01-02

可以看到，通过gap就可以把state都为1的记录分开；

对于state为0的也是相同的操作：

select state,date,subdate(date,row_number() over(order by date asc)) as gap
from t1
where state=2;

state	date	gap
0	2019-01-04	2019-01-03
0	2019-01-04	2019-01-03

step3：选出每组中的最小日期作为start_date,最大日期作为end_date.
对于state为1的：

select state,min(date) as start_date,max(date) as end_date
from (
select state,date,subdate(date,row_number() over(order by date asc))
from t1
where state=1
) t2
group by gap

state	start_date	end_date
1	2019-01-01	2019-01-03
1	2019-01-06	2019-01-06

对于state为0的：

select state,min(date) as start_date,max(date) as end_date
from (
select state,date,subdate(date,row_number() over(order by date asc))
from t1
where state=0
) t3
group by gap

state	start_date	end_date
0	2019-01-04	2019-01-05

将上面两表通过union结合起来，然后按照start_date升序排列，就是正确答案了。

当然，我们也可以不用把state取值为0、1的情况分开，我们使用partition by 和 group by 也可以完成step2、3

select state,date,subdate(date,row_number() over(partition by state order by date asc)) as gap from t1

state	date	gap
1	2019-01-01	2018-12-31
1	2019-01-02	2018-12-31
1	2019-01-03	2018-12-31
0	2019-01-04	2019-01-03
0	2019-01-04	2019-01-03
1	2019-01-06	2019-01-02

ps:此时的row_number 为：

state	date	row_number
1	2019-01-01	1
1	2019-01-02	2
1	2019-01-03	3
0	2019-01-04	1
0	2019-01-042
1	2019-01-06	4

最后通过state和gap进行分组，选出每组最小日期和最大日期即可。
完整代码：

select state as period_state,min(date) as start_date,max(date) as end_date
from
(
select state,date,subdate(date,row_number() over(partition by state order by date asc)) as gap from
	(
	select 'failed' as 'state',fail_date as date from failed 
	union all
	select 'succeeded' as 'state',success_date as date from succeeded 
	order by date
	) t1
where date>='2019-01-01' and date<='2019-12-31'
) t2
group by state,gap
order by start_date;

2.交换座位

题目要求：
在这里插入图片描述

2.1 交换id

2.1.1 使用case

-解题思路：当id为奇数且不是最后一个时，id变为id+1;当id为奇数，但是是最后一个时，id不变；当id为偶数时，id变为id-1.

with n as(select max(id) as max_id from seat)
select 
(case 
when id%2=1 and id!=n.max_id then id+1
when id%2=1 and id=n.max_id then id
else id-1
end) as id,student
from seat,n
order by id asc;

另一种写法：

select 
    if(id%2=0,
        id-1,
        if(id=(select count(distinct id) from seat),
            id,
            id+1)) 
    as id,student 
from seat 
order by id;

2.1.2 使用位异或和coalesce

不了解位异或的，请戳sql位运算符
coalesce(exp1,exp2,exp3,…)返回第一个不为null的值，如果全为null就返回null。

首先使用(id+1)^1-1来达到交换id的目的：

SELECT id, (id+1)^1-1, student FROM seat;

在这里插入图片描述
2. 连接原始表和更新id后的表，连接条件是ori.id=update.id
但是更新后的表中，没有id为5的记录，所以直接连接的话，结果是null，因此使用coalesce来解决

select id,coalesce(t1.student,s.student) as student
from seat s
left join (select (id+1)^1-1 as update_id,student from seat) t1
on s.id = t1.update_id;

2.2 上移记录、下移记录（使用开窗函数）

select id, if(t.id%2=1,t.back,t.ahead) as 'student' from
	(select id,student,
			LAG(student, 1,student) over() ahead, 
			LEAD(student, 1,student) over() back
	from seat) t

如果不能使用开窗函数，就连接两张表（一张是上移之后的表，一张是下移在之后的表）：

with t1 as(select id-1 as sub_id,student from seat),
     t2 as(select id+1 as add_id,student from seat)
select id,ifnull(if(id%2=1,t1.student,t2.student),s.student) as student from seat s 
left join t1 on s.id=t1.sub_id
left join t2 on s.id=t2.add_id;

3.avg与is not null连用

如果某个变量(设为var)的值只分为null和非null（设为a）,那么可以用avg(var is not null)求a的个数占总体个数的比例

3.1 例题1

在这里插入图片描述

step1：先找出每个人登录日期

select player_id,min(event_date) as login
from activity
group by player_id;

在这里插入图片描述

step2：将上表与原表连接起来

select a.player_id,a.event_date,t1.login from
(select player_id,min(event_date) as login
from activity
group by player_id) t1
left join activity a
on a.player_id=t1.player_id and date_sub(a.event_date,interval 1 day)=login;

在这里插入图片描述
因为t1作为主表，所以连接查询一共会返回三行，每行代表一个玩家，若前面两个为null，说明该玩家在login的一天之后并没有再次登录

step3：使用avg算fraction

select round(avg(a.player_id is not null),2) as fraction 
from (select player_id,min(event_date) as login
		from activity
		group by player_id) t1
left join activity a
on a.player_id=t1.player_id and date_sub(a.event_date,interval 1 day)=login;

可以看到，这段代码和上一段基本一致，只是最后的查询结果变了。
a.player_id is not null:如果是null 就返回0，如果不是null就返回1，所以从step2的结果可以看出，每行返回的值分别是1、0、0；那么再avg一下，结果就是0.33
在这里插入图片描述

3.2 例题2

在这里插入图片描述

select question_id as survey_log
from survey_log
group by question_id
order by avg(answer_id is not null) desc
limit 1;

4.求中位数

4.1 例题1

在这里插入图片描述首先说一下，我感觉它这里的中位数不太合适，按理来说如果总数是偶数个的话，中位数应该是中间两个数取平均，但是这里还是先按照它的方法来。
思路一：先把所有数从小到大排列起来并标号从1到n (因此这里不能使用rank(),而应该使用 row_number()),中位数与（n/2）的差在[0,1]之间，以此来选择出中位数。

step1：通过窗口函数求出每个人在该公司的排名（工资升序排列）以及该公司的总人数。

SELECT id, company, salary,
ROW_NUMBER() OVER (PARTITION BY company 
ORDER BY Salary ASC, id ASC) AS row_num,
COUNT(Id) OVER (PARTITION BY company) AS count_id
FROM Employee;

step2：根据思路中的那个条件选出中位数

select Id,Company,Salary from(
    SELECT id, company, salary,
    ROW_NUMBER() OVER (PARTITION BY company 
    ORDER BY Salary ASC, id ASC) AS row_num,
    COUNT(Id) OVER (PARTITION BY company) AS count_id
    FROM Employee) t1
    where row_num-count_id/2 between 0 and 1;

思路二：
在这里插入图片描述
也就是说，只有中位数才能保证升序排名>=(总数/2) and 降序排名>=(总数/2)

select id,company,salary
from(
select id,company,salary,
row_number() over(partition by company order by salary asc,id asc) as asc_rank,
row_number() over(partition by company order by salary desc,id asc) as desc_rank,
count(id) over(partition by company) as total
from employee) t1
where asc_rank>=total/2 and desc_rank>=total/2;

4.2 例题2

在这里插入图片描述 思路一：就和上例的第二个思路差不多

select avg(number) as median from(
select number,
        sum(frequency) over(order by number) asc_accumu,
        sum(frequency) over(order by number desc) desc_accumu,
        sum(frequency) over() total
        from numbers) t1
where asc_accumu>=total/2 and desc_accumu>=total/2;

思路二：
在这里插入图片描述

设总的个数是n。
若n为奇数，则n/2必落在某个区间内部（即不会在边界）假设是[f1，f1+f2]，那么该区间对应的数字，即b，就是中位数。
若n为偶数，则n/2可能落在某个区间内部,假设是[0，f1],那么该区间对应的数字a,就是中位数；
-n/2也可能落在某两个区间的边界处，比如n/2=f1+f2+f3,那么[f1+f2，fi+f2+f3]和[f1+f2+f3，f1+f2+f3+f4]对应的数：c、d,这两数的平均数就是中位数。

select avg(Number) as median from (
select *,sum(Frequency) over(order by Number rows between 
unbounded preceding and unbounded following) as total,
sum(Frequency) over(order by Number rows between 
unbounded preceding and current row) as accumulation 
from Numbers) t1
where total/2 between accumulation -Frequency and accumulation;

（还是第一种思路简单…）

5. n preceding

在这里插入图片描述

select Id,Month,Salary from(
	select Id,Month,
	sum(salary) over(partition by id 
	order by month asc rows 2 PRECEDING) as Salary,
	rank() over(partition by id 
	order by month desc) as month_rank
	from employee) t1
where month_rank>1
order by 1,2 desc;

month_rank:把月份从大到小排序，然后取出排名大于1的记录就可以将最后的一个月去除；
rows 2 preceding：其实写全应该是 rows between 2 preceding and current row ；
order by 1,2 ：1代表id，2代表month

6.其实是一个简单的问题，我自己把它做复杂了…

在这里插入图片描述正常方法：

select sum(tiv_2016) as TIV_2016 from insurance 
where tiv_2015 in 
(select tiv_2015 from insurance 
group by tiv_2015 having count(1) > 1)
and (LAT,LON) in 
(select LAT,LON from insurance 
group by LAT,LON having count(1) = 1);

我的方法：

step1:求出经纬度不完全相同的PID

select pid from insurance
group by concat(lat,lon)
having count(pid)=1;

step2:求出有保额相同的pid

select pid from insurance where tiv_2015 in (
select tiv_2015
from insurance
group by tiv_2015
having count(pid)>1)

step3:求两者交集

select t1.pid from
(select pid from insurance
group by concat(lat,lon)
having count(pid)=1) t1 join
(select pid from insurance where tiv_2015 in (
(select tiv_2015
from insurance
group by tiv_2015
having count(pid)>1)))t2
on t1.pid = t2.pid;

step4:得到最后答案

select sum(tiv_2016) as tiv_2016 from insurance where pid in(
select t1.pid from
(select pid from insurance
group by concat(lat,lon)
having count(pid)=1) t1 join
(select pid from insurance where tiv_2015 in (
(select tiv_2015
from insurance
group by tiv_2015
having count(pid)>1)))t2
on t1.pid = t2.pid);

其实思路是一样的，就是我把它写复杂了，，，嗐

7.巧用union创造行

7.1 例题1

在这里插入图片描述
如果没有记录为0的那一行，那么这道题就并不是很难：

SELECT spend_date,platform,SUM(amount) AS total_amount,COUNT(user_id) AS total_users FROM(
SELECT spend_date,user_id, IF(COUNT(DISTINCT platform) = 2, 'both', platform) 'platform',
SUM(amount) 'amount' FROM spending GROUP BY spend_date,user_id) t1
GROUP BY spend_date,platform;

在这里插入图片描述
因此，本题的关键就是怎么搞出记录为0的那一行。
解题步骤：

step1:找出每天、每个用户的支付平台类型：

SELECT spend_date,user_id, IF(COUNT(DISTINCT platform) = 2, 'both', platform) 'platform',
SUM(amount) 'amount' FROM spending
GROUP BY spend_date,user_id;

在这里插入图片描述

step2：创建一个表，该表只有一列（‘platform’）、三行（‘mobile’、‘desktop’、‘both’）
这一步是本题关键点所在，也就是如何凭空给没有数据的记录（即total_amount、total_users）搞出一行来

SELECT 'desktop' AS 'platform' UNION
SELECT 'mobile' AS 'platform' UNION
SELECT 'both' AS 'platform';

在这里插入图片描述

step3：把上述两表连接起来（笛卡尔乘积）

SELECT * FROM
(SELECT spend_date,user_id, IF(COUNT(DISTINCT platform) = 2, 'both', platform) 'platform',
SUM(amount) 'amount' FROM spending
GROUP BY spend_date,user_id) a,
(SELECT 'desktop' AS 'platform' UNION
SELECT 'mobile' AS 'platform' UNION
SELECT 'both' AS 'platform') b;

在这里插入图片描述原本a表有5条记录，b表有3条记录，笛卡尔乘积之后，变成15条记录

step4:

SELECT spend_date,b.platform,
SUM(IF(a.platform = b.platform,amount,0)) AS total_amount,
COUNT(IF(a.platform = b.platform,user_id,NULL)) AS total_users
FROM

(SELECT spend_date,user_id, IF(COUNT(DISTINCT platform) = 2, 'both', platform) 'platform',
SUM(amount) 'amount' FROM spending
GROUP BY spend_date,user_id) a,
(SELECT 'desktop' AS 'platform' UNION
SELECT 'mobile' AS 'platform' UNION
SELECT 'both' AS 'platform') b

GROUP BY spend_date,b.platform;

ps:
1 b.platform才能保证每一天都有‘both’、‘desktop’、‘mobile’（其实这也是这道题的难点所在）
2 count if 那儿，不满足if条件的，一定要写成null，因为null不会被count计算。

7.2 例题2

matches表数据如下：
在这里插入图片描述 players表数据如下：

要求：查询每组中总score最高的player_id,若成绩相同，则取id较小的那个。查询结果应如下图所示：

思路：先用union all把每个player作为first_player和second_player的得分拼接起来，之后就可以计算每个player的总得分；然后按照总得分（降序）和player_id（升序）排列，最后按照group_id分组。（这里用到了group by的一个小技巧：group by之后如果没有使用聚合函数，那么返回的就是该组中的第一条记录！！！）
step1：union all

SELECT first_player AS player_id,first_score AS score FROM matches
UNION ALL
SELECT second_player AS player_id,second_score AS score FROM matches

player_id	score
15	3
30	1
30	2
40	5
35	1
45	0
25	2
15	0
20	2
50	1

step2：计算每个player的总得分，并按照总得分和player_id排好序

SELECT t1.player_id,SUM(score) AS score,group_id FROM(
SELECT first_player AS player_id,first_score AS score FROM matches
UNION ALL
SELECT second_player AS player_id,second_score AS score FROM matches) t1
LEFT JOIN players p ON t1.player_id = p.`player_id`
GROUP BY t1.player_id
ORDER BY score DESC,player_id ASC

player_id	score	group_id
40	5	3
15	3	1
30	3	1
20	2	3
25	2	1
35	1	2
50	1	2
45	0	1

step3: 使用group by选出每组中的第一条记录

SELECT group_id,player_id FROM(
SELECT t1.player_id,SUM(score) AS score,group_id FROM(
SELECT first_player AS player_id,first_score AS score FROM matches
UNION ALL
SELECT second_player AS player_id,second_score AS score FROM matches) t1
LEFT JOIN players p ON t1.player_id = p.`player_id`
GROUP BY t1.player_id
ORDER BY score DESC,player_id ASC) t2
GROUP BY group_id;

group_id	player_id
1	15
2	35
3	40

8.with recursive

8.1 创造连续数列

在这里插入图片描述

step1:使用with recursive创建一个从1到max(customer_id)的数列

with recursive full_id(n) as (
    select 1 as id
    union all
    select n+1 from full_id where n<(select max(customer_id) from customers)
)
select * from full_id

在这里插入图片描述

step2：找出遗失的ID

with recursive full_id(n) as (
    select 1 as id
    union all
    select n+1 from full_id where n<(select max(customer_id) from customers)
)
select n as ids from full_id where n not in (select customer_id from customers)

8.2

在这里插入图片描述

在这里插入图片描述
要求：查询2020年每月截止到月底时已有的司机人数、每月被接受的订单量

# 创建一个月份表 month_table
with recursive month_table(m) as(
    select 1 as m
    union all
    select m+1 from month_table where m<=11
)
select month,active_drivers,ifnull(accepted_rides,0) as accepted_rides from
# t1计算每个月的已有注册司机人数
(select mt.m as month ,count(driver_id) active_drivers
from month_table mt left join drivers on 202000+mt.m>=date_format(join_date,'%Y%m')
group by mt.m) t1
left join
# t2计算每月被接受的订单量
(select month(requested_at) as order_month,count(a.ride_id) as accepted_rides
from rides r join acceptedrides a on r.ride_id=a.ride_id
where substr(requested_at,1,4)='2020' group by order_month
) t2 
on t1.month=t2.order_month

8.3

原始数据就是8.2的数据，查询2020年每3个月的平均ride_distance和平均ride_duration
在这里插入图片描述

with recursive month_table(m) as(
    select 1 as m
    union all
    select m+1 from month_table where m<=11
)
select month,
round(avg(month_distance) over(rows between current row and 2 following),2) as average_ride_distance,
round(avg(month_duration) over(rows between current row and 2 following),2) as average_ride_duration
from (
    select mt.m as month
    ,ifnull(sum(ride_distance),0) as month_distance,ifnull(sum(ride_duration),0) as month_duration 
    from month_table as mt left join rides r on mt.m = month(requested_at) and substr(requested_at,1,4)='2020'
    # 注意，这里2020年这个限制条件不能放到where里面，因为where的执行是在全连接之后、on之前
    # 如果使用where，就会首先在全连接的结果中把所有不是2020的全部剔除，只剩下年份是2020的记录
    # 然后on才作用于这些剩下的结果，因此就不能保证1~12月份都有记录
    left join acceptedrides a on r.ride_id=a.ride_id
    group by mt.m
) t1
limit 10  #只显示前10条记录

9. greatest、least选择多列中的最大值、最小值

9.1

在这里插入图片描述

select least(from_id,to_id) as person1,greatest(from_id,to_id) as person2,count(*) as call_count,sum(duration) as total_duration
from calls
group by person1,person2

10.窗口函数和group by连用

窗口函数作用的范围是group by之后的结果。什么叫group by之后的结果？就比如本例中:group by machine_id,process_id。那么group by之后的结果中，每一条记录就代表一组不同的（machine_id,process_id) （当然后面还可以有其他聚合字段）。那么窗口函数就是在这基础上进行的操作。

在这里插入图片描述

# 方法一：常规方法
select machine_id,round(avg(processing_time),3) as  processing_time from(
select machine_id,process_id,sum(if(activity_type='start',-1,1)*timestamp) as processing_time
from activity group by machine_id,process_id) t1 group by machine_id

# 方法二：使用窗口函数（窗口函数作用的范围是group by之后的结果）
select * from(
select machine_id
,round(avg(sum(if(activity_type='start',-1,1)*timestamp)) over(partition by machine_id),3) as processing_time
from activity group by machine_id,process_id) t1 group by machine_id

缦旋律

关注

1
点赞
踩
3

收藏

觉得还不错? 一键收藏
0
评论
leetcode数据库刷题记录

1.找出某特征连续出现n次的所有记录题目要求：id和people都是int类型，visit_date是日期类型，id是主键方法一：使用rank()思路分析：将people>=100的所有id进行排序（由于本例中id是主键，所以肯定不会有重复，因此直接使用rank就可以）；然后用id值减去rank值，结果记为c，如果id是连续的，那么连续id对应c应该是相同的；c出现了几次，那么id就连续了c条记录。with t1 as( select id,visit_date,people,
复制链接

扫一扫