【力扣刷题总结】数据库题目按知识点分类总结

Sprite.Nym

已于 2022-09-05 12:25:02 修改

阅读量1.9k

点赞数 14

分类专栏：第三阶段数据库笔记文章标签： leetcode 数据库

于 2022-06-01 16:39:44 首次发布

本文链接：https://blog.csdn.net/SpriteNym/article/details/125086132

版权

观前提示：

这个图先自行背诵至滚瓜烂熟。

文档中论述题目解题思路时没有特殊情况都按照上面的程序执行顺序为准，个别题也会以其他更自然的顺序讲解。
本文的知识点整理仅起提示作用，一些不常用功能、参数可能会遗漏，如需了解详细用法请自行百度。
有的题涉及多个知识点，则取按重要性排名的前2-3个知识点归入相应分类（同一道题可能在本文出现多次）。
本文持续更新，更新顺序基本是由易到难。
有些明显有上下级关系的分类，则只取高级分类归入。如差集必然涉及子查询，则只归入差集。自联结是普通联结中的特殊情况则不再归入普通联结。
很多复杂的题写子查询套太多层，就统一使用临时表了。

文章目录

0. 常用聚合函数
1. distinct去重
- - - 第176题
    - 第177题
    - 第597题
    - 第1045题
    - 第1113题
    - 第1141题
    - 第1149题
    - 第1919题
    - 第2082题
    - 第2205题
    - 第2356题
2. where 筛选
3. 联结
- 3.1 笛卡尔积
- 3.2 内外联结
- - 内联结
  - - 第1073题
    - 第1251题
    - 第1571题
    - 第1587题
    - 第1623题
    - 第1731题
    - 第1919题
    - 第2329题
    - 第2339题
  - 外联结
  - - 第175题
    - 第262题
    - 第577题
    - 第580题
    - 第618题
    - 第1068题
    - 第1098题
    - 第1132题
    - 第1158题
    - 第1241题
    - 第1364题(四表联查)
    - 第1378题
    - 第1407题
    - 第1421题
    - 第1581题
    - 第1677题
    - 第1715题
    - 第1809题
    - 第1988题
    - 第2142题
- 3.3 自连接
- - - 第180题
    - 第181题
    - 第196题
    - 第197题
    - 第570题
    - 第603题
    - 第612题
    - 第613题
    - 第626题
    - 第1097题
    - 第1270题
    - 第1285题（连续区间）
    - 第1321题
    - 第1459题
    - 第1747题
    - 第1843题
    - 第1875题
    - 第1951题
    - 第1939题
    - 第1949题
    - 第1951题
    - 第2084题
    - 第2228题
4. union 求并集（常用于列转行/宽转长）
5. 子查询/嵌套查询
- - - 第176题
    - 第177题
    - 第597题
    - 第1076题
    - 第1082题
    - 第1107题
    - 第1112题
    - 第1126题
    - 第1174题
    - 第1241题
    - 第1303题
    - 第1468题
    - 第1821题
    - 第1831题
    - 第1867题
    - 第2112题
    - 第2142题
6. 字符串相关函数
7. 时间相关函数
8. group by 分组
- - 单字段分组
  - - 第511题
    - 第586题
    - 第619题
    - 第1069题
    - 第1075题
    - 第1194题
    - 第1211题
    - 第1322题
    - 第1327题
    - 第1484题
    - 第1571题
    - 第1633题
    - 第1729题
    - 第1821题
    - 第1890题
  - 多字段联合分组
9. having 筛选
- - - 第182题
    - 第586题
    - 第596题
    - 第1050题
    - 第1076题
    - 第1501题
    - 第1511题
    - 第1587题
    - 第1867题
    - 第2041题
10. if/case 语句
11. 窗口函数
- - - 第176题
    - 第177题
    - 第178题
    - 第185题
    - 第534题
    - 第550题
    - 第569题
    - 第571题
    - 第579题
    - 第613题
    - 第618题
    - 第1076题
    - 第1077题
    - 第1112题
    - 第1204题
    - 第1225题
    - 第1285题
    - 第1308题
    - 第1369题
    - 第1412题
    - 第1454题（连续区间）
    - 第1532题
    - 第1549题
    - 第1596题
    - 第1709题
    - 第1811题
    - 第1831题
    - 第1875题
    - 第1951题
    - 第2066题
    - 第2112题
    - 第2159题
    - 第2175题
    - 第2228题
    - 第2292题
    - 第2308题
    - 第2314题
    - 第2324题
    - 第2346题
12. 限定谓词
13. like模糊匹配和regexp 正则表达式
- - - 第1517题
    - 第1527题
14. 临时表
15. recursive递归

0. 常用聚合函数

注意：由于基本每道题都在用聚合函数，就不把题总结进来了。

功能	函数
求和	sum([distinct]列名)
计数(可去重)	count([distinct] 列名)
求平均值	avg([distinct]列名)
求最大值	max(列名)
求最小值	min(列名)
字段拼接(可去重、指定顺序、指定分隔符)	group_concat([distinct] 列名1 [order by 列名2 asc/desc] [separator 分隔符])

1. distinct去重

功能	用法
单字段去重	distinct 列名
多字段去重	distinct 列名1, 列名2
使用聚合函数时去重	聚合函数(distinct 列名)

第176题

方法1：子查询中distinct 去重、order by 排序、limit + offset 输出第二位、主查询再查一次把空值变成null。

SELECT
    (SELECT DISTINCT
            Salary
        FROM
            Employee
        ORDER BY Salary DESC
        LIMIT 1 OFFSET 1) AS SecondHighestSalary
;

方法2：子查询中dense_rank() 排序、主查询中where 筛选、sum()或者max()或者min()聚合函数把空值变成null。

select  max(salary) as SecondHighestSalary
  from  (select  salary,
                 dense_rank() over(order by salary desc) as rk 
           from  employee) t
 where  rk = 2;

第177题

类似第176题。用方法1要先set n = n - 1。

CREATE FUNCTION getNthHighestSalary(N INT) RETURNS INT
BEGIN
  set n = n - 1;
  RETURN (
      # Write your MySQL query statement below.
      select  salary
        from  (  select  distinct salary 
                   from  employee
               order by  salary desc) t
       limit  1 offset n
  );
END

第597题

方法一：先distinct + 多字段得到去重后的临时表，再count()对临时表中的数据计数，无法直接count(distinct + 多字段)。

方法二：为了解决无法直接count(distinct + 多字段)的问题可以把多字段用concat()拼成单字段。

select  round(coalesce((t1.acc/t2.req), 0), 2) as accept_rate
  from  (select  count(distinct concat(requester_id, accepter_id)) as acc,
                 'yyy' as tag
           from  RequestAccepted) t1 inner join
        (select  count(distinct concat(sender_id, send_to_id)) as req,
                 'yyy' as tag
           from  FriendRequest) t2 
    on  t1.tag = t2.tag;

第1045题

group by 按用户id分组、子查询中得到所有产品的数量、having + count(distinct) 去重计数筛选出数量=所有产品数量的用户id。

  select  customer_id
    from  customer 
group by  customer_id
  having  count(distinct product_key) = (select  count(*)
                                           from  product);

第1113题

where 筛选，group by 分组，count(distinct) 统计。

  select  extra as report_reason,
          count(distinct post_id) as report_count
    from  actions
   where  action = 'report'
     and  action_date = '2019-07-04'
group by  extra;

第1141题

where + datediff 筛选、group by 分组、count(distinct) 计数。

  select  activity_date as day,
          count(distinct user_id) as active_users
    from  activity
   where  datediff('2019-07-27', activity_date) < 30 and datediff('2019-07-27', activity_date) >= 0
group by  activity_date;

第1149题

子查询中先将表去重一次，排除看同一本书的情况。主查询中按viewer_id和view_date 联合分组、having+ count() 筛选出看了2篇及以上文章的人、最后distinct去重是排除同一个人多天读多本书的情况~~（太学霸了）~~。

  select  distinct viewer_id as id  
    from  (select  distinct article_id,
                   viewer_id,
                   view_date
             from  views) t
group by  viewer_id, view_date
  having  count(*) >= 2
order by  id;

第1919题

先给每个user1找朋友（user2）：listens表和friendship表内联结，此时id大的人被去掉（因为没有必要同时出现2配5和5配2这样相同的组合）、再内联结listens表看每个user2当天听的和user1相同的歌、group by 按照user1_id、user2_id和day三字段联合分组、having + count(distinct)统计不同的歌曲数，大于3即可。最后select的时候也要distinct，因为有的user1和user2可能有很多天都听了同样的多首歌，会被筛出多次。

  select  distinct t1.user_id as user1_id,
          t3.user_id as user2_id
    from  listens t1 inner join friendship t2
      on  t1.user_id = t2.user1_id
          inner join listens t3
      on  t2.user2_id = t3.user_id 
     and  t1.day = t3.day
     and  t1.song_id = t3.song_id
group by  t1.user_id, t3.user_id, t1.day
  having  count(distinct t1.song_id) >= 3;

第2082题

where 筛选、count(distinct)计数。

  select  count(distinct customer_id) as rich_count
    from  store
   where  amount > 500;

第2205题

where 筛选、count(distinct)计数。

CREATE FUNCTION getUserIDs(startDate DATE, endDate DATE, minAmount INT) RETURNS INT
BEGIN
  RETURN (
      # Write your MySQL query statement below.
      select  count(distinct user_id) as user_cnt
        from  purchases
       where  amount >= minAmount
         and  time_stamp between startDate and endDate
  );
END

第2356题

count + distinct不重复计数。

  select  teacher_id,
          count(distinct subject_id) as cnt 
    from  teacher
group by  teacher_id;

2. where 筛选

功能	用法
单条件筛选	where 条件语句
多条件筛选(与)	where 条件1 and 条件2
多条件筛选(或)	where 条件1 or 条件2
目标字段在/不在子查询结果中	where 字段 in / not in (子查询)
目标元组在/不在子查询结果中	where (字段1, 字段2) in / not in (子查询)

普通用法

第584题

coalesce() 处理null值、where 筛选。

select  name
  from  customer
 where  coalesce(referee_id, 0) <> 2;

第1132题

actions表 left join removals表 on post_id相同、where留下extra = 'spam’的数据（这里必须用where筛选，不能写在on后面）、round() + 100 * count(distinct t2.post_id) / count(distinct t1.post_id)得到每天的百分比（这里去重是因为有可能同一个垃圾广告被举报多次，就会有多条记录，但是实际上这只是1条广告，也只被删了1次）、再把前面的查询作为子查询，查出每天删除比率的平均值。

select  round(avg(d_ratio), 2) as average_daily_percent
  from  (  select  100 * count(distinct t2.post_id) / count(distinct t1.post_id) as d_ratio
             from  actions t1 left join removals t2
               on  t1.post_id = t2.post_id
            where  t1.extra = 'spam'
         group by  t1.action_date) t;

第1148题

where 筛选、distinct 去重、order by 排序。

  select  distinct author_id as id
    from  views
   where  author_id = viewer_id
order by  author_id asc;

第1495题

inner join 两表联结、where + 筛选条件。

select  distinct t2.title as title
  from  (select  content_id
           from  tvprogram
          where  date_format(program_date, "%Y-%m") = 
                 '2020-06') t1 inner join 
        (select  title,
                 content_id
           from  content
          where  kids_content = 'Y'
            and  content_type = 'Movies') t2
    on  t1.content_id = t2.content_id;

第2026题

where + 筛选条件。

  select  problem_id
    from  problems
   where  likes / (likes + dislikes) < 0.6
order by  problem_id;

(列1, 列2, …) in 语句

第184题

先查出各部门最高工资表，再筛选出(部门，工资) in 各部门最高工资表中的员工。

  select  t2.name as department,
          t1.name as employee,
          t1.salary
    from  employee t1 inner join department t2 on t1.departmentId = t2.id
   where  (t1.departmentId, salary)
      in  (  select  departmentId,
                     max(salary)
               from  employee
           group by  departmentId);

第512题

先查出各player最早登录日期表，再筛选出(玩家id, 登陆日期) in 最早登录日期表中的玩家。

select  player_id,
        device_id
  from  activity t1
 where  (player_id, t1.event_date) in (  select  player_id,
                                                 min(event_date)
                                           from  activity
                                       group by  player_id)

第585题

思路是“2015年和别人相同”、“坐标唯一”两个条件分开看，再用where in 取交集。

临时temp表中两个insurance表自连接，将2015年投资和别人一样的数据筛选出来、注意这样一个人可能联上了多条数据，所以最后要distinct去重。子查询中group by按经纬度多字段分组、having + count() 筛选出唯一坐标的pid（注意一般用了分组之后是要聚合的，但因为这次pid是唯一的，就不聚合了）。主查询从temp表里筛选出pid在子查询中的数据（即坐标唯一的数据），并sum()求和。

with temp as (
    select  distinct t1.PID,
            t1.TIV_2016
      from  insurance t1 inner join insurance t2
        on  t1.PID <> t2.PID
       and  t1.TIV_2015 = t2.TIV_2015
)

select  round(sum(TIV_2016), 2) as TIV_2016
  from  temp
 where  PID in (  select  PID
                    from  insurance
                group by  LAT, LON
                  having  count(*) = 1);

第614题

利用where in (子查询) 筛选出followee列中在follower列中的人、分组计数、排序（题目让按字典顺序排，其实就是按followee字母顺序排）

  select  followee as follower,
          count(*) as num
    from  follow
   where  followee in (select follower from follow)
group by  followee
order by  followee;

第1070题

子查询中group by按product_id分组得到每个product_id和他对应的min(year)、主查询中用where (product_id, year) in 子查询筛选出需要的数据。

select  product_id,
        year as first_year,
        quantity,
        price
  from  sales
 where  (product_id, year) in (  select  product_id,
                                         min(year)
                                   from  sales
                               group by  product_id);

第1164题

方法1：子查询中分组聚合查出日期在8月16号前的max(日期)，即16号前的最近改动日期（这里没法连着new_price一起查哈！分组后的数据是不会自动一一对应的）、再套一层查询用where (id, 日期) in 子查询，查出id和new_price、将之前的查询结果外联结到一个distinct id的子查询表中、这样联上了的就按联上的价格取，没联上的就是10元。

方法2：既然原表没有初始10元这个数据，我们就自己加上去。创建临时表：原表union all 查出每个id原价10元的子查询（包含distinct id、10、初始日期’0000-00-00’），之后涉及from的全部用这个临时表、where筛选出16号之前的数据、同方法1先查出id和max(日期)、再套一层查询得到id和价格。

with temp as (
    select  product_id,
            new_price,
            change_date
      from  products
 union all 
    select  distinct product_id,
            10,
            '0000-00-00'
      from  products
)

select  product_id,
        new_price as price 
  from  temp
 where  (product_id, change_date) in (  select  product_id,
                                                max(change_date) as change_date
                                          from  temp
                                         where  change_date <= '2019-08-16'
                                      group by  product_id);

第1174题

子查询查出用户首次购买数据、主查询统计order_date = pref_date的数据。这里注意一个小技巧：按条件计数，是即时订单+1，计划订单+0，可以用sum(case when)，也可以直接写sum(条件)，此时条件满足时自动把布尔值转换成1计算，不满足转换成0计算。

select  round(100 * sum(order_date = customer_pref_delivery_date) / count(*), 2) as immediate_percentage
  from  delivery 
 where  (customer_id, order_date) in (  select  customer_id,
                                                min(order_date) as first 
                                          from  delivery
                                      group by  customer_id);

not in 语句（找差集）

第183题

子查询出订了东西的用户、where not in 查出不订的用户。

select  Name as Customers
  from  Customers t1
 where  t1.Id not in (select CustomerId from Orders);

第607题

子查询出向’red’销售过的人、再用where not in找差集。

select  name
  from  salesperson
 where  sales_id not in (select  sales_id
                           from  orders t1 natural join company t2
                          where  t2.name = 'RED');

第1083题

方法一：分别where筛选出买S8和iphone的人，再not in 找差集。

select  distinct buyer_id
  from  sales t1 left join product t2
    on  t1.product_id = t2.product_id
 where  buyer_id not in (select  buyer_id
                           from  sales t1 left join product t2
                             on  t1.product_id = t2.product_id
                          where  product_name = 'iPhone')
   and  product_name = 'S8';

方法二：用sum(case when)的方法绕开子查询达到找差集目的，具体做法是求和时遇到“S8”就+1，遇到“iphone”就-1，只买“S8”的人的求和结果就是1，都买了的人就是0。

  select  buyer_id
    from  sales t1 left join product t2
      on  t1.product_id = t2.product_id
   where  product_name in ('S8', 'iPhone')
group by  buyer_id
  having  sum(distinct case product_name when 'S8' then 1 else -1 end) = 1;

第1084题

子查询出在目标时间范围以外卖过的产品、where not in找差集。

select  product_id,
        product_name
  from  product
 where  product_id not in (select  distinct product_id
                             from  sales
                            where  sale_date > '2019-03-31'
                               or  sale_date < '2019-01-01')

第1264题

子查询中将friendship原表和交换左右顺序的原表union all取并集、套一层查询从第二列找到user_id=1的朋友、主查询中查distinct page_id，因为有的人喜欢同一个页面、where进行筛选，条件是user_id在上面子查询的朋友中且page_id不等于1号已经喜欢的page_id（要再用个子查询得到1号喜欢的page_id）。

select  distinct page_id as recommended_page
  from  likes
 where  user_id in (select  user2_id
                      from  (select  user1_id,
                                     user2_id
                               from  friendship
                             union all 
                             select  user2_id,
                                     user1_id
                               from  friendship) t1
                     where  user1_id = 1)
   and  page_id not in (select  page_id
                          from  likes
                         where  user_id = 1);

第1350题

not in 找差集。

select  id,
        name
  from  students
 where  department_id not in (select id from departments);

第1412题

临时temp表中用min()、max()窗口函数找到每次考试最高最低分，再顺手求个当前学生分数和最高/最低分的分差、套一层查询筛选出两个分差乘积为0的id，这一步也就是在找考了最高分或最低分的id。

主查询中where筛选出在exam表（即参加了考试），但不在temp表（即没考到最高或最低分）中的人。

with temp as (
    select  distinct student_id
      from  (select  student_id,
                     score - min(score) over(partition by exam_id 
                                             order by score asc) as d1,
                     score - max(score) over(partition by exam_id 
                                             order by score desc) as d2
               from  exam) t 
     where  d1 * d2 = 0    
)

select  student_id,
        student_name
  from  student
 where  student_id in (select  student_id 
                         from  exam)
   and  student_id not in (select  student_id
                             from  temp);

第1607题

not in 找差集。

  select  seller_name
    from  seller t1
   where  t1.seller_id not in (select  seller_id 
                                 from  orders
                                where  year(sale_date) = '2020')
order by  seller_name;

第1978题

not in 找差集。

  select  employee_id
    from  employees
   where  salary < 30000
     and  manager_id not in (select  employee_id 
                               from  employees)
order by  employee_id;

and 同时满足多个条件

第196题

inner join自连接、delete删除。

delete t1 from Person t1 inner join Person t2 on t1.email = t2.email and t1.id > t2.id;

第620题

where A and B。

  select  id,
          movie,
          description,
          rating
    from  cinema
   where  description <> 'boring'
     and  id % 2 = 1
order by  rating desc;

第1757题

where A and B。

select  product_id
  from  products
 where  low_fats = 'Y' 
   and  recyclable = 'Y';

第2020题

natural join 联结两表、where 筛选。

select  count(*) as accounts_count
  from  subscriptions t1 natural join streams t2
 where  year(end_date) = '2021'
   and  year(stream_date) != '2021';

第2230题

where A and B。

CREATE PROCEDURE getUserIDs(startDate DATE, endDate DATE, minAmount INT)
BEGIN
	# Write your MySQL query statement below.
	  select  distinct user_id 
        from  purchases
       where  amount >= minAmount
         and  time_stamp between startDate and endDate
    order by  user_id;
END

or 满足多个条件中的一个条件即可

第595题

where 条件1 or 条件2。

select  name,
        population,
        area
  from  world
 where  area >= 3000000 or population >= 25000000;

3. 联结

preview

3.1 笛卡尔积

笛卡尔积是理解所有联结的基础。笛卡尔积就是不管表1和表2有没有关系，将他们的数据强行组合在一起。例如表1有2排，表2有3排，笛卡尔积则有2×3=6排。

所有联结的执行方式都是先产生两表的笛卡尔积，再在此基础上进行筛选出需要的数据。

笛卡尔积用于得到两表行与行之间的全部排列组合情况。

功能	用法
产生两个表的笛卡尔积(不能跟筛选条件)	from 表1, 表2
产生两个表的笛卡尔积(能跟筛选条件)	from 表1 cross join 表2

注意：

（1）inner join / join 不加筛选条件也得到笛卡尔积，但不建议这样写，表述不清。left join / right join 不能不跟筛选条件。

（2）之所以使用笛卡尔积就是为了得到排列组合的所有情况，否则还不如使用inner join，所以原则上来说笛卡尔积后面是不跟筛选条件的。

（3）由于本人水平原因，很多题使用了先笛卡尔积再筛选的写法，类似先while True再break，其实都可以优化成inner join，大家不要盲目照搬。

第1280题

from 表1 cross join 表2得到笛卡尔积，再left join表3、group by分组、count()计数。

这里注意2点：① 因为表1和表2无关所以才适合用cross join，否则一般不用；② 一般不能两个join连着写最后再写on条件，但这次因为前面是cross join本来就不需要跟筛选条件所以没关系。

  select  t1.student_id,
          t1.student_name,
          t2.subject_name,
          count(t3.subject_name) as attended_exams
    from  students t1 cross join subjects t2 left join examinations t3
      on  t1.student_id = t3.student_id
     and  t2.subject_name = t3.subject_name
group by  t1.student_id, t2.subject_name
order by  t1.student_id, t2.subject_name;

第1783题（列转行/宽转长）

方法1：用多个union all 将championships表的数据汇总到一列并将列名改成player_id、natural join players 表、group by 分组、count() 统计。

  select  player_id,
          player_name,
          count(*) as grand_slams_count
    from  (   select  wimbledon as player_id
                from  championships
           union all
              select  fr_open as player_id
                from  championships t1
           union all
              select  US_open as player_id
                from  championships
           union all
              select  Au_open as player_id
                from  championships) t1 natural join players t2
group by  player_id;

方法2：两张表生成笛卡尔积，此时每个球员都对应一份championships表的全部数据、group by 分组、sum(case when) 筛选统计。

SELECT player_id,player_name,      SUM(IF(player_id=Wimbledon,1,0)+IF(player_id=Fr_open,1,0)+IF(player_id=US_open,1,0)+IF(player_id=Au_open,1,0)) AS grand_slams_count
FROM Championships 
JOIN Players 
GROUP BY player_id
HAVING grand_slams_count>0

第1990题（查固定字段）

union all枚举创建platform表和experiment_name表、cross join得到这两个表笛卡尔积、left join experiments表+group by+count(*) 计数。

  select  t1.platform,
          t2.experiment_name,
          count(t3.experiment_name) as num_experiments
    from  (select  'Android' as platform union all
           select  'IOS' as platform union all
           select  'Web' as platform) t1 cross join
          (select  'Reading' as experiment_name union all
           select  'Sports' as experiment_name union all
           select  'Programming' as experiment_name) t2 left join
          experiments t3
      on  t1.platform = t3.platform
     and  t2.experiment_name = t3.experiment_name
group by  platform, experiment_name;

3.2 内外联结

功能	用法
内联结	表1 inner join 表2 on 条件
自然联结	表1 natural join 表2
左外联结	表1 left join 表2 on 条件
右外联结	表1 right join 表2 on 条件

注意：

（1）natural join是一种特殊的inner join，区别是：

① natural join会选择外键进行连接，没有外键时自动选择两表相同字段进行联结，而无需设置on条件。

② natural join后两表相同的字段即被合并，而普通的inner join不会自动合并。

例如：表1有id、性别2列，表2有id、姓名2列。

inner join后的表有4列：表1id、表1性别、表2id、表2姓名。

natural join后的表只有3列：id、表1性别、表2姓名。

（2）通过1的例子可以发现，inner join（外联结也一样）后经常会出现重复的列名，为了避免提示ambiguous错误，建议任何联结时都给表标t1、t2别名，且使用任何列名时都以t1.列名、t2.列名的形式书写。

（3）外联结一般用于保留联结时联不上的地方产生的null值。即使用外联结以外的联结方式并跟上筛选条件时，未联上/选上的行会被自动排除，但使用外联结时则会保留并显示null。

（4）如果涉及多个表联结，表1 join 表2后马上跟on条件，不要都join完了最后再来慢慢写筛选条件。

（5）两表内联结时如果要联结的列名相同，则可以用on 表1.列名 = 表2.列名，还可以用using(列名)。区别是第一种用法不会合并用于联结的同名列，而using()会合并，效果类似natural join，使用using()时前面不要写on。

（6）内联结时把筛选条件写在on里和联完后的where里对查询结果是没有影响的，但外联结就有影响。原因是：在外联结中，筛选条件写在on里，没联上的数据也会保留null值，而筛选条件写在where里，没选上的数据会直接消失。例如：

表1

id	姓名	出生日期
1	张三	1997-8-22
2	李四	null
3	王五	1997-8-20

表2

id	性别
1	男
2	女

表1 left join 表2 on 表1.id = 表2.id and t1.出生日期 is not null

t1.id	t1.姓名	t2.id	t2.性别
1	张三	1	男
2	李四	null	null
3	王五	null	null

注意观察出生日期是null的李四还在表里，只是没跟她的性别联上而已。如果后续我们进行一个count()操作，会发现数出来有3人。

表1 left join 表2 on 表1.id = 表2.id where t1.出生日期 is not null

t1.id	t1.姓名	t2.id	t2.性别
1	张三	1	男
3	王五	null	null

出生日期是null的李四直接消失。如果后续我们进行一个count()操作，会发现数出来不是上一个情况的3人，而是2人。

内联结

第1073题

inner join 两表联结、group by 分组、avg()算平均数。

第1251题

inner join 内联结、group by分组、round()和sum()等函数计算数据。

  select  t1.product_id,
          round(sum(t2.units * t1.price) / sum(t2.units), 2) as average_price
    from  Prices t1 inner join UnitsSold t2
      on  t2.purchase_date between t1.start_date and t1.end_date
     and  t1.product_id = t2.product_id
group by  product_id;

第1571题

inner join 两表联结、group by 分组、sum()求和。

  select  name as WAREHOUSE_NAME,
          sum(units * v_pro) as VOLUME
    from  warehouse t1 inner join (select  product_id,
                                           width * length * height as v_pro
                                     from  products) t2
      on  t1.product_id = t2.product_id
group by  t1.name;

第1587题

inner join 两表联结、group by 分组、sum()求和、having 筛选。

第1623题

一边inner join一边筛选。

select  t1.student_name as member_A,
        t2.student_name as member_B,
        t3.student_name as member_C
  from  schoola t1 inner join schoolb t2
    on  t1.student_name <> t2.student_name 
   and  t1.student_id <> t2.student_id
        inner join schoolc t3
    on  t2.student_name <> t3.student_name
   and  t1.student_name <> t3.student_name
   and  t1.student_id <> t3.student_id
   and  t3.student_id <> t2.student_id;

第1731题

inner join 把经理名字联到reports_to的id后面、group by 按经理id分组、count() 计数、avg() 求平均值、round() 四舍五入。

  select  t1.reports_to as employee_id,
          t2.name,
          count(t1.reports_to) as reports_count,
          round(avg(t1.age), 0) as average_age
    from  employees t1 inner join employees t2
      on  t1.reports_to = t2.employee_id
group by  t1.reports_to
order by  t2.employee_id;

第1919题

  select  distinct t1.user_id as user1_id,
          t3.user_id as user2_id
    from  listens t1 inner join friendship t2
      on  t1.user_id = t2.user1_id
          inner join listens t3
      on  t2.user2_id = t3.user_id 
     and  t1.day = t3.day
     and  t1.song_id = t3.song_id
group by  t1.user_id, t3.user_id, t1.day
  having  count(distinct t1.song_id) >= 3;

第2329题

inner join两表联结、group by按用户id分组、sum(销量*单价)得到每个人总开销、按要求排序。

  select  user_id,
          sum(quantity * price) as spending
    from  Sales t1 inner join Product t2
          using(product_id) 
group by  user_id
order by  spending desc, user_id asc;

第2339题

inner join两表联结、on筛选出不同的配对。

select  t1.team_name as home_team,
        t2.team_name as away_team 
  from  teams t1 inner join teams t2
    on  t1.team_name <> t2.team_name;

外联结

注意：为了方便起见，本文档所有能用左/右外联结的地方全部用左外联结记录，实际做题时各人视情况而定。

第175题

left join 左外联结。

select  firstName,
        lastName,
        city,
        state
  from  Person t1 left join Address t2
    on  t1.personId = t2.personId;

第262题

子查询中先查出被禁的用户和司机，减少数据量、trips表 left join 子查询，保留连不上的地方以达到求差集目的（这种做法比where not in 找差集效率更高）、where继续筛选日期、group by 按日期分组、聚合函数统计、别名的时候记得打反引号，因为不打反引号时不允许出现空格。

  select  request_at as `Day`,
          round(1 - sum(status='completed') /
          count(*), 2) as `Cancellation Rate`
    from  trips t1 left join (select  users_id
                                from  users
                               where  banned = 'Yes') t2
      on  t1.client_id = t2.users_id
      or  t1.driver_id = t2.users_id
   where  t2.users_id is null
     and  request_at between '2013-10-01' and '2013-10-03'
group by  request_at;

第577题

left join 左外联结、coalesce() 处理null。

select  t1.name,
        t2.bonus
  from  employee t1 left join bonus t2 
    on  t1.empid = t2.empid
 where  coalesce(t2.bonus, 0) < 1000;

第580题

部门表left join 学生表、group by 分组、count() 计数、coalesce() 处理null、order by排序。

  select  dept_name,
          coalesce(count(t2.dept_id), 0) as student_number
    from  department t1 left join student t2
      on  t1.dept_id = t2.dept_id
group by  t1.dept_id
order by  student_number desc, dept_name asc;

第618题

第一个子查询查美洲学生，后面的学生都没美洲学生多，所以可以用left join联上去。给每个子查询都用row_number()窗口函数编号，方便后续按照编号联结。

select  America,
        Asia,
        Europe
  from  (select  name as America,
                 row_number() over(order by name) as rk
           from  student
          where  continent = 'America') t1 left join
        (select  name as Asia,
                 row_number() over(order by name) as rk
           from  student
          where  continent = 'Asia') t2
    on  t1.rk = t2.rk
        left join
        (select  name as Europe,
                 row_number() over(order by name) as rk
           from  student
          where  continent = 'Europe') t3
    on  t1.rk = t3.rk;

第1068题

left join 左外联结。

select  `product_name`,
        `year`,
        `price`
  from  sales t1 left join product t2 
    on  t1.product_id = t2.product_id;

第1098题

这道题筛选条件比较多，如果先外联结再一口气筛选，可能会出现不知道把筛选条件写在on后面还是where后面或者一不小心写错的情况。所以思路是先通过子查询筛选一遍，再对筛选后的表外联结，这样条理比较清晰。

books表先用where筛掉日期不合规的数据、orders表也用where筛掉日期不合规的数据、两个查询结果left join、group by 分组、having + sum() 统计和筛选。

  select  t1.book_id,
          t1.name
    from  (select  book_id,
                   name
             from  books
            where  available_from <= '2019-05-23') t1 left join 
          (select  book_id,
                   quantity
             from  orders
            where  dispatch_date >= '2018-06-23') t2
      on  t1.book_id = t2.book_id
group by  t1.book_id
  having  coalesce(sum(t2.quantity), 0) < 10;

第1132题

select  round(avg(d_ratio), 2) as average_daily_percent
  from  (  select  100 * count(distinct t2.post_id) / count(distinct t1.post_id) as d_ratio
             from  actions t1 left join removals t2
               on  t1.post_id = t2.post_id
            where  t1.extra = 'spam'
         group by  t1.action_date) t;

第1158题

left join 左外联结、group by 分组、where筛选日期、year()截年份、count()计数。

  select  user_id as buyer_id,
          join_date,
          count(buyer_id) as orders_in_2019
    from  users t1 left join (select  buyer_id
                                from  orders
                               where  year(order_date) = '2019') t2 
      on  t1.user_id = t2.buyer_id
group by  user_id;

第1241题

子查询+where筛选查出两个表（帖子表、评论表）、帖子表left join评论表、count计数。

  select  t1.sub_id as post_id,
          count(t2.sub_id) as number_of_comments
    from  (select  distinct sub_id, 
                   parent_id 
             from  submissions
            where  parent_id is null) t1 left join
          (select  distinct sub_id, 
                   parent_id 
             from  submissions
            where  parent_id is not null) t2
      on  t1.sub_id = t2.parent_id  
group by  t1.sub_id
order by  post_id;

第1364题(四表联查)

invoices表内联结customers表、left join contacts表以统计联系人数量、再left join customers表（这里是用邮箱一一对应联结的）以统计可信联系人数量、group by分组、count()统计。

  select  t1.invoice_id,
          t2.customer_name,
          t1.price,
          count(t3.contact_name) as contacts_cnt,
          count(t4.customer_name) as trusted_contacts_cnt
    from  invoices t1 inner join customers t2
      on  t1.user_id = t2.customer_id
          left join contacts t3
      on