SQLZOO刷题笔记



注意

笔记中的默认 SQL Engine 为 MySQL,若换成其他的,会另外说明。

Self Join

在这里插入图片描述在这里插入图片描述在这里插入图片描述

10. 公交车的转车站点

Find the routes involving two buses that can go from Craiglockhart to Lochend.
Show the bus no. and company for the first bus, the name of the stop for the transfer,
and the bus no. and company for the second bus.
Hint: Self-join twice to find buses that visit Craiglockhart and Lochend, then join those on matching stops.

思路:根据题目的提示,找到第一辆公交车(经过 Craiglockhart 的公交车)对应的线路、公司、所经过的站点名称,将这三列信息选出来组合成一个表格,并将表格重命名为表格 bus1 ;同理,找到第二辆公交车(经过 Lochend 的公交车)的三列信息,并组合成表格 bus2 。此时,可以将 bus1 的 “name” 这列看成一个集合,将 bus2 的 “name” 这列看成另一个集合,然后,取这两个集合的交集,就可以得到 “中转站”(即:bus1bus2 共同经过的站点)。
  之所以这样做,是因为用下列代码测试完发现,并没有一辆公交车是直接经过 Craiglockhart 和 Lochend 这两个地点的,所以需要转车。(下列代码运行后到不到公交车)

select distinct r1.num,r1.company
 from route r1 join route r2 on r1.company=r2.company and r1.num=r2.num
 join stops s1 on s1.id=r1.stop join stops s2 on s2.id=r2.stop
where s1.name='Craiglockhart' and s2.name='Lochend'

找不到公交车
思路分析图

以下代码可以得到正确的结果:

select bus1.num ,bus1.company,bus1.name 中转站,bus2.num,bus2.company from
  (select distinct r11.num,r11.company, s12.name 
   from route r11 join route r12 on r11.company=r12.company and r11.num=r12.num
      join stops s11 on s11.id=r11.stop join stops s12 on s12.id=r12.stop
  where s11.name='Craiglockhart')bus1
join 
  (select distinct r21.num,r21.company,s22.name
   from route r21 join route r22 on r21.company=r22.company and r21.num=r22.num
      join stops s21 on s21.id=r21.stop join stops s22 on s22.id=r22.stop
  where s21.name='Lochend')bus2
on
  bus1.name=bus2.name
order by bus1.num,中转站,bus2.num

正确结果

Window functions

在这里插入图片描述

0. 排序

(1)rank():1,2,2,4,5,6,6,6,9,……比如,高考排名;
(2)dense_rank():1,1,2,3,4,4,4,5,……;
(3)row_rank():1,2,3,4,5,6,7,……主要用于显示行数。

1. warming up

 Show the lastName, party and votes for the constituency ‘S14000024’ in 2017.

SELECT lastName, party, votes
  FROM ge
 WHERE constituency = 'S14000024' AND yr = 2017
ORDER BY votes DESC

在这里插入图片描述

2. Who won?

 You can use the RANK function to see the order of the candidates. If you RANK using (ORDER BY votes DESC) then the candidate with the most votes has rank 1.
 Show the party and RANK for constituency S14000024 in 2017. List the output by party

SELECT party, votes,
       RANK() OVER (ORDER BY votes DESC) as posn
  FROM ge
 WHERE constituency = 'S14000024 ' AND yr = 2017
order by party

在这里插入图片描述

3. PARTITION BY

 The 2015 election is a different PARTITION to the 2017 election. We only care about the order of votes for each year.
 Use PARTITION to show the ranking of each party in S14000021 in each year. Include yr, party, votes and ranking (the party with the most votes is 1).

SELECT yr,party, votes,
      RANK() OVER (PARTITION BY yr ORDER BY votes DESC) as posn
  FROM ge
 WHERE constituency = 'S14000021'
ORDER BY party,yr

在这里插入图片描述

4. Edinburgh Constituency

 Edinburgh constituencies are numbered S14000021 to S14000026.
 Use PARTITION BY constituency to show the ranking of each party in Edinburgh in 2017. Order your results so the winners are shown first, then ordered by constituency.

SELECT constituency,party, votes,
  rank() over(partition by constituency order by votes desc) posn
  FROM ge
 WHERE constituency BETWEEN 'S14000021' AND 'S14000026'
   AND yr  = 2017
ORDER BY posn,constituency

在这里插入图片描述

5. Winners Only

 You can use SELECT within SELECT to pick out only the winners in Edinburgh.
 Show the parties that won for each Edinburgh constituency in 2017.

select constituency,party from
( SELECT constituency,party, votes,
   rank() over(partition by constituency order by votes desc) as posn
   FROM ge
  WHERE constituency BETWEEN 'S14000021' AND 'S14000026'
    AND yr  = 2017
 ORDER BY posn,constituency )SortedTable /*Every derived table must have its own alias:每个派生表都必须有自己的别名。这里的“SortedTable”是自己命名的*/
where posn=1

SortedTable 如下图所示:
在这里插入图片描述

正确结果如下图所示:
在这里插入图片描述

6. Scottish seats

知识背景:爱丁堡(英文、苏格兰文:Edinburgh;苏格兰盖尔文:Dùn Èideann)是英国苏格兰首府,位于苏格兰中部低地的福斯湾的南岸。

 You can use COUNT and GROUP BY to see how each party did in Scotland. Scottish constituencies start with ‘S’
 Show how many seats for each party in Scotland in 2017.

思路:在一个选区(constituency)的不同政党(party)中,得票数最高的政党将获得一个席位(seat)。所以,要先根据选区来划分(PARTITION BY),选出各个选区中得票数最高的政党,记为"选区内的排名=1,即:前面题目中的“posn=1”。最后,再将这些政党通过GROUP BY来分类统计。

select party as 政党,count(constituency) as 获得的席位数
from(
 select party,constituency,
    rank() over(partition by constituency order by votes desc)as 选区内的排名
  from ge
  where constituency like 'S%' and yr=2017
  order by 选区内的排名 asc)排序后的表格 
where 排序后的表格.选区内的排名=1
group by party

《排序后的表格》如下图所示:
在这里插入图片描述
得到的正确结果如下图:
在这里插入图片描述

MySQL 的 DATE_FORMAT() 函数

DATE_FORMAT()

Window LAG

注:本小节的第 2、3、6题要点击右上角的齿轮,将 SQL Engine 从 MySQL 改为 Microsoft SQL,不然会报错。

在这里插入图片描述

2. Introducing the LAG function

 The LAG function is used to show data from the preceding row or the table. When lining up rows the data is partitioned by country name and ordered by the data whn. That means that only data from Italy is considered.
 Modify the query to show confirmed for the day before.

SELECT name, DAY(whn), confirmed,
   LAG(confirmed,1) OVER (PARTITION BY name ORDER BY whn) 昨天确诊人数
 FROM covid
WHERE name = 'Italy'AND MONTH(whn) = 3
ORDER BY whn

在这里插入图片描述

3. Number of new cases

 The number of confirmed case is cumulative - but we can use LAG to recover the number of new cases reported for each day.
 Show the number of new cases for each day, for Italy, for March.

SELECT name 国家, DAY(whn) as '2020年3月__日',
 confirmed - LAG(confirmed, 1) OVER (PARTITION BY name ORDER BY whn) as 今日新冠肺炎确诊人数
 FROM covid
WHERE name = 'Italy'
AND MONTH(whn) = 3
ORDER BY whn

在这里插入图片描述

4. Weekly changes

 The data gathered are necessarily estimates and are inaccurate. However by taking a longer time span we can mitigate some of the effects.
 You can filter the data to view only Monday’s figures WHERE WEEKDAY(whn) = 0.
 Show the number of new cases in Italy for each week - show Monday only.

SELECT name 国家, DATE_FORMAT(whn,'%Y-%m-%d') 日期,confirmed 截至今日确诊病例, 
 confirmed - lag(confirmed,1) over(partition by name order by whn) 今日新增确诊病例
 FROM covid
WHERE name = 'Italy'
AND WEEKDAY(whn) = 0
group by name, confirmed, whn /* 这句话一定要写,不然会报错 */
ORDER BY whn

  以上代码如果不加group by name, confirmed, whn这句,则会报错:Mixing of GROUP columns (MIN(),MAX(),COUNT(),...) with no GROUP columns is illegal if there is no GROUP BY clause.
在这里插入图片描述
  上图来源于:这个地方。很遗憾,sqlzoo 的 window_lag 换了个地址,bug 又出现了。不用管它,知道就好。
  值得注意的是,本小节的第2、3题,如果用 MySQL 这个引擎的话,也是加上group by name, confirmed, whn这句话就能得到正确的结果。也就是说,用 MySQL 这个引擎比用 Microsoft SQL 这个引擎多了group by name, confirmed, whn这句话。

正确运行结果:
在这里插入图片描述

5. LAG using a JOIN

 You can JOIN a table using DATE arithmetic. This will give different results if data is missing.
 Show the number of new cases in Italy for each week - show Monday only.
 In the sample query we JOIN this week tw with last week lw using the DATE_ADD function.

select tw.name 国家, date_format(tw.whn,'%Y-%m-%d') 日期,tw.confirmed - lw.confirmed 比上周一增加的确诊病例数
 from covid as lw right join covid as tw 
 on lw.name=tw.name and date_add(lw.whn,interval 7 day)=tw.whn
where tw.name='Italy' and weekday(tw.whn)=0
order by tw.whn

在这里插入图片描述

6. RANK()

 The query shown shows the number of confirmed cases together with the world ranking for cases.
 United States has the highest number, Spain is number 2…
 Notice that while Spain has the second highest confirmed cases, Italy has the second highest number of deaths due to the virus.
 Include the ranking for the number of deaths in the table.

SELECT 
   name,
   confirmed,
   RANK() OVER (ORDER BY confirmed DESC) rc,
   deaths,
   rank() over(order by deaths desc) rd
  FROM covid
WHERE whn = '2020-04-20'
ORDER BY confirmed DESC

在这里插入图片描述

7. Infection rate

 The query shown includes a JOIN t the world table so we can access the total population of each country and calculate infection rates (in cases per 100,000).
 Show the infect rate ranking for each country. Only include countries with a population of at least 10 million.

SELECT  world.name 国家,population 总人口数,confirmed 感染的人数,
   ROUND(confirmed/(population/100000),0) 每十万人中感染的人数,
   rank() over(order by 每十万人中感染的人数 desc) 感染率排名
  FROM covid JOIN world ON covid.name=world.name
WHERE whn = '2020-04-20' AND population > 10000000
group by 国家,confirmed,population
ORDER BY 感染率排名 asc

在这里插入图片描述

8. Turning the corner

 For each country that has had at last 1000 new cases in a single day, show the date of the peak number of new cases.

select 国家, max(新增确诊病例) 单日新增最多病例数,
  rank() over(order by 单日新增最多病例数 desc) 排名
from( select name 国家,
   confirmed - lag(confirmed,1) over(partition by name order by whn) 新增确诊病例
 from covid
 group by 国家,confirmed
)SortedTable
group by 国家
having 单日新增最多病例数>=1000
order by 排名 asc

  上述代码中之所以采用 select 嵌套,是因为 lag 无法与 max 嵌套使用,所以我们选择先构造一个表格 SortedTable,然后再从该表格中去取列的 max 值。
在这里插入图片描述

主键

  数据库自动按主键值的顺序显示表中的记录。如果没有定义主键,则按输入记录的顺序显示表中的记录。

复合主键

  就是用两个或两个以上的字段(即:表中的列的名称)做为主键。有时一个字段难以标识唯一,就采取复合主键方式来标识唯一。

外键

  如果公共关键字在一个关系中是主关键字,那么这个公共关键字被称为另一个关系的外键。由此可见,外键表示了两个关系之间的相关联系。以另一个关系的外键作主关键字的表被称为主表,具有此外键的表被称为主表的从表。外键又称作外关键字。
(详见:外键——百度百科
在这里插入图片描述

  • 3
    点赞
  • 2
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值