ch06 决胜秋招综合练习

原题目链接所需建表代码

Section A

练习一: 各部门工资最高的员工(难度:中等)

创建Employee 表,包含所有员工信息,每个员工有其对应的 Id, salary 和 department Id。

+----+-------+--------+--------------+
| Id | Name  | Salary | DepartmentId |
+----+-------+--------+--------------+
| 1  | Joe   | 70000  | 1            |
| 2  | Henry | 80000  | 2            |
| 3  | Sam   | 60000  | 2            |
| 4  | Max   | 90000  | 1            |
+----+-------+--------+--------------+

创建Department 表,包含公司所有部门的信息。

+----+----------+
| Id | Name     |
+----+----------+
| 1  | IT       |
| 2  | Sales    |
+----+----------+

编写一个 SQL 查询,找出每个部门工资最高的员工。例如,根据上述给定的表格,Max 在 IT 部门有最高工资,Henry 在 Sales 部门有最高工资。

+------------+----------+--------+
| Department | Employee | Salary |
+------------+----------+--------+
| IT         | Max      | 90000  |
| Sales      | Henry    | 80000  |
+------------+----------+--------+

解答:

select d.name as department,e.name as employee,e.salary as salary
from department d
join employee e
on d.id=e.departmentid
where (e.departmentid, e.salary) in
(select departmentid, MAX(salary)
 from employee
 group by departmentid)

运行结果:

 


练习二: 换座位(难度:中等)

小美是一所中学的信息科技老师,她有一张 seat 座位表,平时用来储存学生名字和与他们相对应的座位 id。其中纵列的id是连续递增的。小美想改变相邻俩学生的座位。你能不能帮她写一个 SQL query 来输出小美想要的结果呢?

请创建如下所示seat表:

示例:

+---------+---------+
|    id   | student |
+---------+---------+
|    1    | Abbot   |
|    2    | Doris   |
|    3    | Emerson |
|    4    | Green   |
|    5    | Jeames  |
+---------+---------+

假如数据输入的是上表,则输出结果如下:

+---------+---------+
|    id   | student |
+---------+---------+
|    1    | Doris   |
|    2    | Abbot   |
|    3    | Green   |
|    4    | Emerson |
|    5    | Jeames  |
+---------+---------+

注意: 如果学生人数是奇数,则不需要改变最后一个同学的座位。

解答:

select
    (case mod(id,2)
         when 0 then id - 1
		 when (1 and id != (select max(id) from seat)) then id + 1
	 else id end) as id,student
from seat
order by id

运行结果:


练习三: 分数排名(难度:中等)

假设在某次期末考试中,二年级四个班的平均成绩分别是 93、93、93、91,请问可以实现几种排名结果?分别使用了什么函数?排序结果是怎样的?(只考虑降序)

+-------+-----------+
| class | score_avg |
+-------+-----------+
|    1  |       93  |
|    2  |       93  |
|    3  |       93  |
|    4  |       91  |
+-------+-----------+

解答:

select class,score_avg,
 rank () over (order by score_avg) as ranking,
 dense_rank () over (order by score_avg) as dense_ranking,
 row_number () over (order by score_avg) as row_num
from score

运行结果:

 


练习四:连续出现的数字(难度:中等)

编写一个 SQL 查询,查找所有至少连续出现三次的数字。

+----+-----+
| Id | Num |
+----+-----+
| 1  |  1  |
| 2  |  1  |
| 3  |  1  |
| 4  |  2  |
| 5  |  1  |
| 6  |  2  |
| 7  |  2  |
+----+-----+

例如,给定上面的 Logs 表, 1 是唯一连续出现至少三次的数字。

+-----------------+
| ConsecutiveNums |
+-----------------+
| 1               |
+-----------------+

解答:

select distinct l1.Num AS ConsecutiveNums
from logs as l1,
     logs as l2,
	 logs as l3 
where l1.id = l2.id - 1
and l2.id = l3.id - 1
and l1.Num = l2.Num
and l2.Num = l3.Num

运行结果:

 


练习五:树节点 (难度:中等)

对于tree表,id是树节点的标识,p_id是其父节点的id

+----+------+
| id | p_id |
+----+------+
| 1  | null |
| 2  | 1    |
| 3  | 1    |
| 4  | 2    |
| 5  | 2    |
+----+------+

每个节点都是以下三种类型中的一种:

  • Root: 如果节点是根节点。
  • Leaf: 如果节点是叶子节点。
  • Inner: 如果节点既不是根节点也不是叶子节点。

写一条查询语句打印节点id及对应的节点类型。按照节点id排序。上面例子的对应结果为:

+----+------+
| id | Type |
+----+------+
| 1  | Root |
| 2  | Inner|
| 3  | Leaf |
| 4  | Leaf |
| 5  | Leaf |
+----+------+

说明:

  • 节点’1’是根节点,因为它的父节点为NULL,有’2’和’3’两个子节点。
  • 节点’2’是内部节点,因为它的父节点是’1’,有子节点’4’和’5’。
  • 节点’3’,‘4’,'5’是叶子节点,因为它们有父节点但没有子节点。

下面是树的图形:

    1         
  /   \ 
 2    3    
/ \
4  5

注意:如果一个树只有一个节点,只需要输出根节点属性。

解答:

select id,
	case when p_id is null then 'Root' 
		 when id not in (select p_id from tree as p1 where p_id is not null) then 'Leaf'	
		 else 'Inner'		
    end as Type 
from tree

运行结果:

 


练习六:至少有五名直接下属的经理 (难度:中等)

Employee表包含所有员工及其上级的信息。每位员工都有一个Id,并且还有一个对应主管的Id(ManagerId)。

+------+----------+-----------+----------+
|Id    |Name 	  |Department |ManagerId |
+------+----------+-----------+----------+
|101   |John 	  |A 	      |null      |
|102   |Dan 	  |A 	      |101       |
|103   |James 	  |A 	      |101       |
|104   |Amy 	  |A 	      |101       |
|105   |Anne 	  |A 	      |101       |
|106   |Ron 	  |B 	      |101       |
+------+----------+-----------+----------+

针对Employee表,写一条SQL语句找出有5个下属的主管。对于上面的表,结果应输出:

+-------+
| Name  |
+-------+
| John  |
+-------+

注意:

没有人向自己汇报。

解答:

select `Name`
from employee2 as e1
join
(select managerid, COUNT(id)
from employee2
group by managerid
having COUNT(id) >= 5) as e2
on e1.id = e2.managerid

运行结果:

 


练习七:查询回答率最高的问题 (难度:中等)

求出survey_log表中回答率最高的问题,表格的字段有:uid, action, question_id, answer_id, q_num, timestamp。

uid是用户id;action的值为:“show”, “answer”, “skip”;当action是"answer"时,answer_id不为空,相反,当action是"show"和"skip"时为空(null);q_num是问题的数字序号。

写一条sql语句找出回答率最高的 question_id

举例:

输入

 uid   action  question_id   answer_id   q_numtimestamp  
5 show      285 null       1123
5 answer    2851241241124
5 show      369 null       2125
5 skip      369 null       2126

输出

question_id
285

说明

问题285的回答率为1/1,然而问题369的回答率是0/1,所以输出是285。

注意:

最高回答率的意思是:同一个问题出现的次数中回答的比例。

解答:

select question_id 
from 
(select question_id,
	    SUM(case action when 'answer' then 1 else 0 end) / SUM(case action when 'show' then 1 else 0 end) as answer_rate
from survey_log
group by question_id) as a
order by answer_rate desc
limit 1

运行结果:

 


练习八:各部门前3高工资的员工(难度:中等)

将练习一中的 employee 表清空,重新插入以下数据(也可以复制练习一中的 employee 表,再插入第5、第6行数据):

+----+-------+--------+--------------+
| Id | Name  | Salary | DepartmentId |
+----+-------+--------+--------------+
| 1  | Joe   | 70000  | 1            |
| 2  | Henry | 80000  | 2            |
| 3  | Sam   | 60000  | 2            |
| 4  | Max   | 90000  | 1            |
| 5  | Janet | 69000  | 1            |
| 6  | Randy | 85000  | 1            |
+----+-------+--------+--------------+

编写一个 SQL 查询,找出每个部门工资前三高的员工。例如,根据上述给定的表格,查询结果应返回:

+------------+----------+--------+
| Department | Employee | Salary |
+------------+----------+--------+
| IT         | Max      | 90000  |
| IT         | Randy    | 85000  |
| IT         | Joe      | 70000  |
| Sales      | Henry    | 80000  |
| Sales      | Sam      | 60000  |
+------------+----------+--------+

此外,请考虑实现各部门前N高工资的员工功能。

解答:

select d.name AS 'Department', e1.`Name` as 'Employee', e1.Salary as 'Salary'
from department as d
join
(select name, departmentid, salary,
row_number () over (partition by departmentid order by salary desc) as `rank`
from employee9
order by departmentid) as e1
on d.id = e1.departmentid
where `rank` <= 3

运行结果:

 


练习九:平面上最近距离 (难度: 困难)

point_2d表包含一个平面内一些点(超过两个)的坐标值(x,y)。

写一条查询语句求出这些点中的最短距离并保留2位小数。

|x   | y  |
|----|----|
| -1 | -1 |
|  0 |  0 |
| -1 | -2 |

最短距离是1,从点(-1,-1)到点(-1,-2)。所以输出结果为:

+--------+
|shortest|
+--------+
|1.00    |
+--------+

注意:所有点的最大距离小于10000。

解答:

select min(round(pow(pow(abs(p1.x-p2.x),2)+pow(abs(p1.y-p2.y),2),0.5),2)) as shortest
from point_2d p1
join point_2d p2
on p1.x!=p2.x or p1.y!=p2.y

执行结果:

 


练习十:行程和用户(难度:困难)

Trips 表中存所有出租车的行程信息。每段行程有唯一键 Id,Client_Id 和 Driver_Id 是 Users 表中 Users_Id 的外键。Status 是枚举类型,枚举成员为 (‘completed’, ‘cancelled_by_driver’, ‘cancelled_by_client’)。

 Id  Client_Id  Driver_Id  City_Id         Status      Request_at
11101     completed      2013-10-1
22111 cancelled_by_driver2013-10-1
33126     completed      2013-10-1
44136 cancelled_by_client2013-10-1
51101     completed      2013-10-2
62116     completed      2013-10-2
73126     completed      2013-10-2
821212     completed      2013-10-3
931012     completed      2013-10-3
1041312 cancelled_by_driver2013-10-3

Users 表存所有用户。每个用户有唯一键 Users_Id。Banned 表示这个用户是否被禁止,Role 则是一个表示(‘client’, ‘driver’, ‘partner’)的枚举类型。

+----------+--------+--------+
| Users_Id | Banned |  Role  |
+----------+--------+--------+
|    1     |   No   | client |
|    2     |   Yes  | client |
|    3     |   No   | client |
|    4     |   No   | client |
|    10    |   No   | driver |
|    11    |   No   | driver |
|    12    |   No   | driver |
|    13    |   No   | driver |
+----------+--------+--------+

写一段 SQL 语句查出2013年10月1日至2013年10月3日期间非禁止用户的取消率。基于上表,你的 SQL 语句应返回如下结果,取消率(Cancellation Rate)保留两位小数。

+------------+-------------------+
|     Day    | Cancellation Rate |
+------------+-------------------+
| 2013-10-01 |       0.33        |
| 2013-10-02 |       0.00        |
| 2013-10-03 |       0.50        |
+------------+-------------------+

解答:

select t.Request_at 'DAY',
	round(sum( case when t.status like 'cancelled%' then 1 else 0 end )/ count(*), 2 ) as 'Cancellation Rate' 
from trips t
	inner join users u on u.Users_Id = t.Client_Id 
	and u.Banned = 'No' 
where t.Request_at between '2013-10-01' and '2013-10-03' 
group by t.Request_at

运行结果:


Section B

练习一:行转列

假设 A B C 三位小朋友期末考试成绩如下所示:

+-----+-----------+------|
| name|   subject |score |
+-----+-----------+------|
|  A  |  chinese  |  99  |
|  A  |  math     |  98  |
|  A  |  english  |  97  |
|  B  |  chinese  |  92  |
|  B  |  math     |  91  |
|  B  |  english  |  90  |
|  C  |  chinese  |  88  |
|  C  |  math     |  87  |
|  C  |  english  |  86  |
+-----+-----------+------|

请使用 SQL 代码将以上成绩转换为如下格式:

+-----+-----------+------|---------|
| name|   chinese | math | english |
+-----+-----------+------|---------|
|  A  |     99    |  98  |    97   |
|  B  |     92    |  91  |    90   |
|  C  |     88    |  87  |    86   |
+-----+-----------+------|---------|

解答:

select name,
       sum(case when subject = 'chinese' then score else null end) as chinese,
       sum(case when subject = 'math' then score else null end) as math,
       sum(case when subject = 'english' then score else null end) as english
from score2
group by name

运行结果:

 


练习二:列转行

假设 A B C 三位小朋友期末考试成绩如下所示:

+-----+-----------+------|---------|
| name|   chinese | math | english |
+-----+-----------+------|---------|
|  A  |     99    |  98  |    97   |
|  B  |     92    |  91  |    90   |
|  C  |     88    |  87  |    86   |
+-----+-----------+------|---------|

请使用 SQL 代码将以上成绩转换为如下格式:

+-----+-----------+------|
| name|   subject |score |
+-----+-----------+------|
|  A  |  chinese  |  99  |
|  A  |  math     |  98  |
|  A  |  english  |  97  |
|  B  |  chinese  |  92  |
|  B  |  math     |  91  |
|  B  |  english  |  90  |
|  C  |  chinese  |  88  |
|  C  |  math     |  87  |
|  C  |  english  |  86  |
+-----+-----------+------|

解答:

select name, 'chinese' as subject, chinese as score
from score22
union all
select name, 'math' as subject, math as score
from score22
union all
select name, 'english' as subject, english as score
from score22
order by name

运行结果:

 


练习三:谁是明星带货主播?

假设,某平台2021年主播带货销售额日统计数据如下:

表名 anchor_sales

+-------------+------------+---------|
| anchor_name |     date   |  sales  | 
+-------------+------------+---------|
|      A      |  20210101  |  40000  |
|      B      |  20210101  |  80000  |
|      A      |  20210102  |  10000  |
|      C      |  20210102  |  90000  |
|      A      |  20210103  |   7500  |
|      C      |  20210103  |  80000  |
+-------------+------------+---------|

定义:如果某主播的某日销售额占比达到该平台当日销售总额的 90% 及以上,则称该主播为明星主播,当天也称为明星主播日。

请使用 SQL 完成如下计算:

a. 2021年有多少个明星主播日?

b. 2021年有多少个明星主播?

解答:

a.

select sum(case when a.percent>=0.9 then 1 else null end) as '明星主播日数量'
from (select date,
      anchor_name,sales,
      sum(sales) over(partition by date order by date) total_sales,
      sales/sum(sales) over(partition by date order by date) percent
      from anchor_sales) a

运行结果:

 b.

select count(distinct anchor_name) '明星主播人数'
from (select date,
             anchor_name,
             sales,
             sum(sales) over(partition by date order by date) total_sales,
             sales/sum(sales) over(partition by date order by date) percent
      from anchor_sales) a
where a.percent>=0.9

运行结果:


 练习四:MySQL 中如何查看sql语句的执行计划?可以看到哪些信息?

解答:

执行计划就是sql的执行查询的顺序,可以在语句之前加explain来查看。

可以看id(表示查询中执行select子句或操作表的顺序)、select_type(表示多个select字句的类型)、type(表示MySQL在表中找到所需行的方式,又称“访问类型”)、key(显示MySQL在查询中实际使用的索引)、key_len(表示索引中使用的字节数)、ref(表示上述表的连接匹配条件,即哪些列或常量被用于查找索引列上的值)、rows(表示MySQL根据表统计信息及索引选用情况,估算找到所需的记录所需要读取(扫描)的行数)、filtered(表示通过查询条件获取的最终记录行数占通过type字段指明的搜索方式搜索出来的记录行数的百分比)、extra(包含不适合在其他列中显示但十分重要的额外信息)。

查看练习三中b题的结果如下:

 


练习五:解释一下 SQL 数据库中 ACID 是指什么

解答:

ACID指在数据库管理系统中事务所具有的四个特性:

原子性(Atomicity):整个事务中的所有操作,要么全部完成,要么全部不完成,不可能停滞在中间某个环节。事务在执行过程中发生错误,会被回滚到事务开始前的状态。

一致性(Consistency):在事务开始之前和事务结束以后,数据库的完整性约束没有被破坏。

隔离性(Isolation,又称独立性):直到事务结束时,其他事务(或者会话)对此事务所操作的数据都不可见。

持久性(Durability):在事务完成以后,该事务所对数据库所作的更改便持久的保存在数据库之中,并不会被回滚。


Section C

练习一:行转列

假设有如下比赛结果:

+--------------+-----------+
|    cdate     |   result  |
+--------------+-----------+
|  2021-01-01  |     胜    |
|  2021-01-01  |     胜    |
|  2021-01-01  |     负    |
|  2021-01-03  |     胜    |
|  2021-01-03  |     负    |
|  2021-01-03  |     负    |
+------------+-------------+

请使用 SQL 将比赛结果转换为如下形式:

+--------------+-----+-----|
|  比赛日期     | 胜  | 负  |
+--------------+-----------+
|  2021-01-01  |  2  |  1  |
|  2021-01-03  |  1  |  2  |
+------------+-----------+

解答:

select cdate as '比赛日期',
	   count(case when result='胜' then true else null end) as '胜',
	   count(case when result='负' then true else null end) as '负'
from row_col
group by cdate

运行结果:

 


练习二:列转行

假设有如下比赛结果:

+--------------+-----+-----|
|  比赛日期     | 胜  | 负  |
+--------------+-----------+
|  2021-01-01  |  2  |  1  |
|  2021-01-03  |  1  |  2  |
+------------+-----------+

请使用 SQL 将比赛结果转换为如下形式:

+--------------+-----------+
|    cdate     |   result  |
+--------------+-----------+
|  2021-01-01  |     胜    |
|  2021-01-01  |     胜    |
|  2021-01-01  |     负    |
|  2021-01-03  |     胜    |
|  2021-01-03  |     负    |
|  2021-01-03  |     负    |
+------------+-------------+

解答:

select cdate,result
from (select *
      from (select 比赛日期 as cdate, '胜' as result, 胜 as times
            from col_row
            union
			select 比赛日期 as cdate, '负' as result, 负 as times
            from col_row)c1
 
      union
	  select *
      from (select 比赛日期 as cdate, '胜' as result, 胜-1 as times
            from col_row
            union
            select 比赛日期 as cdate, '胜' as result, 负-1 as times
            from col_row)c2 )c3
where times>0
order by cdate

运行结果:


 

练习三:连续登录

有用户表行为记录表t_act_records表,包含两个字段:uid(用户ID),imp_date(日期)

  1. 计算2021年每个月,每个用户连续登录的最多天数
  2. 计算2021年每个月,连续2天都有登录的用户名单
  3. 计算2021年每个月,连续5天都有登录的用户数

构造表mysql如下:

DROP TABLE if EXISTS t_act_records;
CREATE TABLE t_act_records
(uid  VARCHAR(20),
imp_date DATE);

INSERT INTO t_act_records VALUES('u1001', 20210101);
INSERT INTO t_act_records VALUES('u1002', 20210101);
INSERT INTO t_act_records VALUES('u1003', 20210101);
INSERT INTO t_act_records VALUES('u1003', 20210102);
INSERT INTO t_act_records VALUES('u1004', 20210101);
INSERT INTO t_act_records VALUES('u1004', 20210102);
INSERT INTO t_act_records VALUES('u1004', 20210103);
INSERT INTO t_act_records VALUES('u1004', 20210104);
INSERT INTO t_act_records VALUES('u1004', 20210105);

解答:

1.

select distinct uid,max(maxday) over(partition by uid) as '连续登陆天数'
from(select uid,count(*) as maxday
     from (select *,date-cum as result
           from (select *,row_number() over(partition by uid order by date) as cum 
                 from (select distinct imp_date as date,uid 
					   from t_act_records )t1 )t2 )t3 
group by uid,result)t4

运行结果:

 2.

select * 
from (select distinct uid,max(maxday) over(partition by uid) as consecutive_maxday
      from (select uid,count(*) as maxday
            from (select *,date-cum as result 
                  from (select *,row_number() over(partition by uid order by date) as cum 
                        from (select distinct imp_date as date,uid 
					          from t_act_records )t1 )t2 )t3
group by uid,result)t4 )t5
where consecutive_maxday >=2

运行结果:

 3.

select count(*) as user_num
from (select distinct uid,max(maxday) over(partition by uid) as consecutive_maxday
      from (select uid,count(*) as maxday
            from (select *,date-cum as result
				  from (select *,row_number() over(partition by uid order by date) as cum 
                        from (select distinct imp_date as date,uid 
                              from t_act_records)t1 )t2 )t3
group by uid,result)t4 )t5
where consecutive_maxday >= 5

运行结果:


练习四:用户购买商品推荐

假设现在需要根据算法给每个 user_id 推荐购买商品,推荐算法比较简单,推荐和他相似的用户购买过的 product 即可,说明如下:

  • 排除用户自己购买过的商品
  • 相似用户定义:曾经购买过 2 种或 2 种以上的相同的商品

输入表:orders

+---------+------------+
| user_id | product_id |
+---------+------------+
|     123 |          1 |
|     123 |          2 |
|     123 |          3 |
|     456 |          1 |
|     456 |          2 |
|     456 |          4 |
+---------+------------+

输出表:

+---------+------------+
| user_id | product_id |
+---------+------------+
|     123 |          4 |
|     456 |          3 |
+---------+------------+

 解答:


练习五:hive 数据倾斜的产生原因及优化策略?

解答:

​​​​​​参考链接


练习六:LEFT JOIN 是否可能会出现多出的行?为什么?

假设 t1 表有6行(关联列 name 有2行为空),t2 表有6行(关联列 name 有3行为空),

那么 SELECT * FROM t1 LEFT JOIN t2 on t1.name = t2.name 会返回多少行结果?

可以参考下图

t1表

 t2表

解答:

select * 
from t1 
left join t2 
on t1.name = t2.name

 会可能出现多的行数,左表关联列为NULL的行会与右表关联列为NULL的行去关联,会产生两表NULL行数相乘的NULL行。


学习心得:

本次组队学习,学到了sql相关的很多知识,不过在应用方面依旧有点不熟练,有很多方法实现同一目标时,想不到最佳解决的代码写法;在运用一些语句时记不清写法,需要翻看以前的笔记等等……但是总的来说还是收获很多的!datawhale的任务进度安排对我来说挺合理的,学习体验很好,datawhale提供了教学内容,而且组队学习的模式督促我坚持下来一个学习周期,掌握了新知识,比我自己一个人盲目的学好很多!每天进步一点点!

  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值