task6 决胜秋招

task6 决胜秋招

Section A

练习一: 各部门工资最高的员工(难度:中等)

创建Employee 表,包含所有员工信息,每个员工有其对应的 Id, salary 和 department Id。

+----+-------+--------+--------------+
| Id | Name  | Salary | DepartmentId |
+----+-------+--------+--------------+
| 1  | Joe   | 70000  | 1            |
| 2  | Henry | 80000  | 2            |
| 3  | Sam   | 60000  | 2            |
| 4  | Max   | 90000  | 1            |
+----+-------+--------+--------------+
解答:
CREATE TABLE employee
(
id INT,
NAME VARCHAR(20),
salary INT,
departmentid INT
);

INSERT INTO employee 
VALUES
(1,"Joe",70000,1),
(2,"Henry",80000,2),
(3,"Sam",60000,2),
(4,"Max",90000,1);

SELECT * FROM employee;

在这里插入图片描述

创建Department 表,包含公司所有部门的信息。

+----+----------+
| Id | Name     |
+----+----------+
| 1  | IT       |
| 2  | Sales    |
+----+----------+
CREATE TABLE department
(
id INT,
NAME VARCHAR(10)
);

INSERT INTO department
VALUES
(1,"IT"),
(2,"Sales");

SELECT * FROM department;

在这里插入图片描述

编写一个 SQL 查询,找出每个部门工资最高的员工。例如,根据上述给定的表格,Max 在 IT 部门有最高工资,Henry 在 Sales 部门有最高工资。

+------------+----------+--------+
| Department | Employee | Salary |
+------------+----------+--------+
| IT         | Max      | 90000  |
| Sales      | Henry    | 80000  |
+------------+----------+--------+
SELECT d.name AS department,e.name AS employee,e.salary AS salary
FROM employee e
INNER JOIN department d
ON e.`departmentid`=d.`id`
WHERE salary IN (SELECT MAX(salary)
FROM employee e GROUP BY departmentid)
ORDER BY salary DESC;

在这里插入图片描述

练习二: 换座位(难度:中等)

小美是一所中学的信息科技老师,她有一张 seat 座位表,平时用来储存学生名字和与他们相对应的座位 id。

其中纵列的id是连续递增的

小美想改变相邻俩学生的座位。

你能不能帮她写一个 SQL query 来输出小美想要的结果呢?

请创建如下所示seat表:

示例:

+---------+---------+
|    id   | student |
+---------+---------+
|    1    | Abbot   |
|    2    | Doris   |
|    3    | Emerson |
|    4    | Green   |
|    5    | Jeames  |
+---------+---------+

假如数据输入的是上表,则输出结果如下:

+---------+---------+
|    id   | student |
+---------+---------+
|    1    | Doris   |
|    2    | Abbot   |
|    3    | Green   |
|    4    | Emerson |
|    5    | Jeames  |
+---------+---------+

**注意:**如果学生人数是奇数,则不需要改变最后一个同学的座位。

解答:

1.建表

CREATE TABLE seat
(id INT,
student VARCHAR(10)
);

INSERT INTO seat
VALUES
(1,"Abbot"),
(2,"Doris"),
(3,"Emerson"),
(4,"Green"),
(5,"Jeames");

SELECT * FROM seat;

在这里插入图片描述

2.换座位

SELECT(CASE 
       WHEN MOD(id,2)=1 AND id<(SELECT COUNT(id) FROM seat) THEN id+1
       WHEN MOD(id,2)=0 THEN id-1
       ELSE id
       END) AS id,student
FROM seat
ORDER BY id;

在这里插入图片描述

练习三: 分数排名(难度:中等)

假设在某次期末考试中,二年级四个班的平均成绩分别是 93、93、93、91,请问可以实现几种排名结果?分别使用了什么函数?排序结果是怎样的?(只考虑降序)

+-------+-----------+
| class | score_avg |
+-------+-----------+
|    1  |       93  |
|    2  |       93  |
|    3  |       93  |
|    4  |       91  |
+-------+-----------+
解答:

三种,分别使用RANK函数、DENSE_RANK函数和ROW_NUMBER函数。

CREATE TABLE rk
(class INT,
score_avg INT
);

INSERT INTO rk
VALUES(1,93),
(2,93),
(3,93),
(4,91);

SELECT *,RANK() OVER (ORDER BY score_avg DESC) AS ranking
       ,DENSE_RANK() OVER (ORDER BY score_avg DESC) AS dense_ranking
       ,ROW_NUMBER() OVER (ORDER BY score_avg DESC) AS row_num
FROM rk;

在这里插入图片描述

练习四:连续出现的数字(难度:中等)

编写一个 SQL 查询,查找所有至少连续出现三次的数字。

+----+-----+
| Id | Num |
+----+-----+
| 1  |  1  |
| 2  |  1  |
| 3  |  1  |
| 4  |  2  |
| 5  |  1  |
| 6  |  2  |
| 7  |  2  |
+----+-----+

例如,给定上面的 Logs 表, 1 是唯一连续出现至少三次的数字。

+-----------------+
| ConsecutiveNums |
+-----------------+
| 1               |
+-----------------+
解答:

1.建表

CREATE TABLE LOGS
(id INT,
num INT);

INSERT INTO LOGS
VALUES(1,1),
(2,1),
(3,1),
(4,2),
(5,1),
(6,2),
(7,2);

SELECT * FROM LOGS;

在这里插入图片描述

查询

SELECT  l1.Num AS ConsecutiveNums
FROM LOGS AS l1,LOGS AS l2,LOGS AS l3
WHERE l1.Id = l2.Id-1
  AND l2.Id = l3.Id-1
  AND l1.Num = l2.Num
  AND l2.Num = l3.Num;

在这里插入图片描述

练习五:树节点 (难度:中等)

对于tree表,id是树节点的标识,p_id是其父节点的id

+----+------+
| id | p_id |
+----+------+
| 1  | null |
| 2  | 1    |
| 3  | 1    |
| 4  | 2    |
| 5  | 2    |
+----+------+

每个节点都是以下三种类型中的一种:

  • Root: 如果节点是根节点。
  • Leaf: 如果节点是叶子节点。
  • Inner: 如果节点既不是根节点也不是叶子节点。

写一条查询语句打印节点id及对应的节点类型。按照节点id排序。上面例子的对应结果为:

+----+------+
| id | Type |
+----+------+
| 1  | Root |
| 2  | Inner|
| 3  | Leaf |
| 4  | Leaf |
| 5  | Leaf |
+----+------+

说明

  • 节点’1’是根节点,因为它的父节点为NULL,有’2’和’3’两个子节点。
  • 节点’2’是内部节点,因为它的父节点是’1’,有子节点’4’和’5’。
  • 节点’3’,‘4’,'5’是叶子节点,因为它们有父节点但没有子节点。

下面是树的图形:

    1         
  /   \ 
 2    3    
/ \
4  5

注意

如果一个树只有一个节点,只需要输出根节点属性。

解答:
CREATE TABLE tree
(id INT,
p_id INT);

INSERT INTO tree
VALUES(1,NULL),
(2,1),
(3,1),
(4,2),
(5,2);

SELECT * FROM tree;

在这里插入图片描述

查询

SELECT id,CASE 
WHEN p_id IS NULL THEN "Root"       
WHEN id IN (SELECT p_id FROM tree) THEN "Inner" 
ELSE "Leaf" 
END AS "Type"
FROM tree;

在这里插入图片描述

练习六:至少有五名直接下属的经理 (难度:中等)

Employee表包含所有员工及其上级的信息。每位员工都有一个Id,并且还有一个对应主管的Id(ManagerId)。

+------+----------+-----------+----------+
|Id    |Name 	  |Department |ManagerId |
+------+----------+-----------+----------+
|101   |John 	  |A 	      |null      |
|102   |Dan 	  |A 	      |101       |
|103   |James 	  |A 	      |101       |
|104   |Amy 	  |A 	      |101       |
|105   |Anne 	  |A 	      |101       |
|106   |Ron 	  |B 	      |101       |
+------+----------+-----------+----------+

针对Employee表,写一条SQL语句找出有5个下属的主管。对于上面的表,结果应输出:

+-------+
| Name  |
+-------+
| John  |
+-------+

注意:

没有人向自己汇报。

解答:

建表

DROP TABLE IF EXISTS employee ;
CREATE TABLE employee
(id INT,
NAME VARCHAR(10),
department VARCHAR(1),
managerid INT);

INSERT INTO employee
VALUES(101,"John","A",NULL),
(102,"Dan","A",101),
(103,"James","A",101),
(104,"Amy","A",101),
(105,"Anne","A",101),
(106,"Ron","B",101);

SELECT * FROM employee;

在这里插入图片描述

查询

SELECT 
  NAME 
FROM
  employee 
WHERE id = 
  (SELECT 
    managerid 
  FROM
    employee 
  GROUP BY managerid 
  HAVING COUNT(*) = 5) ;

在这里插入图片描述

练习七:查询回答率最高的问题 (难度:中等)

求出survey_log表中回答率最高的问题,表格的字段有:uid, action, question_id, answer_id, q_num, timestamp

uid是用户id;action的值为:“show”, “answer”, “skip”;当action是"answer"时,answer_id不为空,相反,当action是"show"和"skip"时为空(null);q_num是问题的数字序号。

写一条sql语句找出回答率(show 出现次数 / answer 出现次数)最高的 question_id

举例:

输入

uidactionquestion_idanswer_idq_numtimestamp
5show285null1123
5answer2851241241124
5show369null2125
5skip369null2126

输出

question_id
285

说明

问题285的回答率为1/1,然而问题369的回答率是0/1,所以输出是285。

注意:

最高回答率的意思是:同一个问题出现的次数中回答的比例。

解答:

建表

CREATE TABLE survey_log
(uid INT
,ACTION VARCHAR(10)
,question_id INT
,answer_id INT
,q_num INT
,TIMESTAMP INT);

INSERT INTO survey_log VALUES(5,'show',285,NULL,1,123),
(5,'answer',285,124124,1,124),
(5,'show',369,NULL,2,125),
(5,'skip',369,NULL,2,126);

SELECT * FROM survey_log;

在这里插入图片描述

查询

SELECT 
  a.question_id AS question_id
FROM
  (SELECT 
    question_id,
    SUM(
      CASE
        WHEN ACTION = 'answer' 
        THEN 1 
        ELSE 0 
      END
    ) /    SUM(
      CASE
        WHEN ACTION = 'show' 
        THEN 1 
        ELSE 0 
      END
    ) AS rate 
  FROM
    survey_log 
  GROUP BY question_id 
  ORDER BY rate DESC 
  LIMIT 1) AS a ;

在这里插入图片描述

练习八:各部门前3高工资的员工(难度:中等)

将练习一中的 employee 表清空,重新插入以下数据(也可以复制练习一中的 employee 表,再插入第5、第6行数据):

+----+-------+--------+--------------+
| Id | Name  | Salary | DepartmentId |
+----+-------+--------+--------------+
| 1  | Joe   | 70000  | 1            |
| 2  | Henry | 80000  | 2            |
| 3  | Sam   | 60000  | 2            |
| 4  | Max   | 90000  | 1            |
| 5  | Janet | 69000  | 1            |
| 6  | Randy | 85000  | 1            |
+----+-------+--------+--------------+

编写一个 SQL 查询,找出每个部门工资前三高的员工。例如,根据上述给定的表格,查询结果应返回:

+------------+----------+--------+
| Department | Employee | Salary |
+------------+----------+--------+
| IT         | Max      | 90000  |
| IT         | Randy    | 85000  |
| IT         | Joe      | 70000  |
| Sales      | Henry    | 80000  |
| Sales      | Sam      | 60000  |
+------------+----------+--------+

此外,请考虑实现各部门前N高工资的员工功能。

解答:

建表

DROP TABLE IF EXISTS employee;

CREATE TABLE employee
(
id INT,
NAME VARCHAR(20),
salary INT,
departmentid INT
);

INSERT INTO employee 
VALUES
(1,"Joe",70000,1),
(2,"Henry",80000,2),
(3,"Sam",60000,2),
(4,"Max",90000,1),
(5,"Janet",69000,1),
(6,"Randy",85000,1);

SELECT * FROM employee;

在这里插入图片描述

查询

(SELECT d.name AS NAME,e.name AS employee,e.salary AS salary
FROM employee e
INNER JOIN department d ON e.`departmentid`=d.`id`
WHERE e.departmentid=1
ORDER BY e.salary DESC
LIMIT 3)
UNION
(SELECT d.name AS NAME,e.name AS employee,e.salary AS salary
FROM employee e
INNER JOIN department d ON e.`departmentid`=d.`id`
WHERE e.departmentid=2
ORDER BY e.salary DESC
LIMIT 3);

在这里插入图片描述

练习九:平面上最近距离 (难度: 困难)

point_2d表包含一个平面内一些点(超过两个)的坐标值(x,y)。

写一条查询语句求出这些点中的最短距离并保留2位小数。

|x   | y  |
|----|----|
| -1 | -1 |
|  0 |  0 |
| -1 | -2 |

最短距离是1,从点(-1,-1)到点(-1,-2)。所以输出结果为:

| shortest |

1.00

+--------+
|shortest|
+--------+
|1.00    |
+--------+

**注意:**所有点的最大距离小于10000。

解答:

建表

CREATE TABLE point_2d
(X INT
,Y INT);

INSERT INTO point_2d VALUES(-1,-1),(0,0),(-1,-2);

SELECT * FROM point_2d;

在这里插入图片描述

查询

SELECT POW(d.dist,0.5) AS distance FROM
(SELECT POW(p1.x-p2.x,2)+POW(p1.y-p2.y,2) AS dist
FROM point_2d p1
INNER JOIN point_2d AS p2 ON (p1.x <> p2.x OR p1.y <> p2.y)
) AS d 
ORDER BY distance ASC
LIMIT 1;

在这里插入图片描述

练习十:行程和用户(难度:困难)

Trips 表中存所有出租车的行程信息。每段行程有唯一键 Id,Client_Id 和 Driver_Id 是 Users 表中 Users_Id 的外键。Status 是枚举类型,枚举成员为 (‘completed’, ‘cancelled_by_driver’, ‘cancelled_by_client’)。

IdClient_IdDriver_IdCity_IdStatusRequest_at
11101completed2013-10-1
22111cancelled_by_driver2013-10-1
33126completed2013-10-1
44136cancelled_by_client2013-10-1
51101completed2013-10-2
62116completed2013-10-2
73126completed2013-10-2
821212completed2013-10-3
931012completed2013-10-3
1041312cancelled_by_driver2013-10-3

Users 表存所有用户。每个用户有唯一键 Users_Id。Banned 表示这个用户是否被禁止,Role 则是一个表示(‘client’, ‘driver’, ‘partner’)的枚举类型。

+----------+--------+--------+
| Users_Id | Banned |  Role  |
+----------+--------+--------+
|    1     |   No   | client |
|    2     |   Yes  | client |
|    3     |   No   | client |
|    4     |   No   | client |
|    10    |   No   | driver |
|    11    |   No   | driver |
|    12    |   No   | driver |
|    13    |   No   | driver |
+----------+--------+--------+

写一段 SQL 语句查出2013年10月1日2013年10月3日期间非禁止用户的取消率。基于上表,你的 SQL 语句应返回如下结果,取消率(Cancellation Rate)保留两位小数。

+------------+-------------------+
|     Day    | Cancellation Rate |
+------------+-------------------+
| 2013-10-01 |       0.33        |
| 2013-10-02 |       0.00        |
| 2013-10-03 |       0.50        |
+------------+-------------------+
解答:

建trips表

CREATE TABLE trips
(id INT
,clien_Id INT
,driver_Id INT
,city_Id INT
,STATUS ENUM("completed","cancelled_by_driver","cancelled_by_client")
,request_at DATE 
,PRIMARY KEY(id));

INSERT INTO trips 
VALUES(1,1,10,1,1,'2013-10-1'),
(2,2,11,1,2,'2013-10-1'),
(3,3,12,6,1,'2013-10-1'),
(4,4,13,6,3,'2013-10-1'),
(5,1,10,1,1,'2013-10-2'),
(6,2,11,6,1,'2013-10-2'),
(7,3,12,6,1,'2013-10-2'),
(8,2,12,12,1,'2013-10-3'),
(9,3,10,12,1,'2013-10-3'),
(10,4,13,12,2,'2013-10-3');

SELECT * FROM trips;

在这里插入图片描述

建users表

CREATE TABLE users
(
users_id INT,
banned ENUM("Yes","No"),
role ENUM("client","driver")

);

INSERT INTO users VALUES(1,2,1),
(2,1,1),
(3,2,1),
(4,2,1),
(10,2,2),
(11,2,2),
(12,2,2),
(13,2,2);

SELECT * FROM users;

在这里插入图片描述

查询

SELECT tb.day AS DAY,ROUND(SUM(CASE tb.status WHEN "completed" THEN 0 ELSE 1 END)/COUNT(tb.status),2) AS "Cancellation Rate"
FROM (SELECT STATUS,request_at AS DAY FROM trips AS t
INNER JOIN users AS u ON t.clien_id=u.users_id
WHERE u.`banned`="No" )AS tb
 GROUP BY tb.day; 

在这里插入图片描述

Section B

练习一:行转列

假设 A B C 三位小朋友期末考试成绩如下所示:

+-----+-----------+------|
| name|   subject |score |
+-----+-----------+------|
|  A  |  chinese  |  99  |
|  A  |  math     |  98  |
|  A  |  english  |  97  |
|  B  |  chinese  |  92  |
|  B  |  math     |  91  |
|  B  |  english  |  90  |
|  C  |  chinese  |  88  |
|  C  |  math     |  87  |
|  C  |  english  |  86  |
+-----+-----------+------|

请使用 SQL 代码将以上成绩转换为如下格式:

+-----+-----------+------|---------|
| name|   chinese | math | english |
+-----+-----------+------|---------|
|  A  |     99    |  98  |    97   |
|  B  |     92    |  91  |    90   |
|  C  |     88    |  87  |    86   |
+-----+-----------+------|---------|
解答:

建表

 CREATE TABLE grade0
 (NAME ENUM("A","B","C"),
SUBJECT ENUM("chinese","math","english"),
score INT 
 );
 
 INSERT INTO grade0
 VALUES(1,1,99),
 (1,2,98),
 (1,3,97),
 (2,1,92),
 (2,2,91),
 (2,3,90),
 (3,1,88),
 (3,2,87),
 (3,3,86);
 
 SELECT * FROM grade0;

在这里插入图片描述

查询

SELECT NAME,
SUM(CASE SUBJECT WHEN  'chinese' THEN score ELSE 0 END) AS chinese,
SUM(CASE SUBJECT WHEN  'math' THEN score ELSE 0 END) AS math,
SUM(CASE SUBJECT WHEN  'english' THEN score ELSE 0 END) AS english
FROM grade0 GROUP BY NAME;

在这里插入图片描述

练习二:列转行

假设 A B C 三位小朋友期末考试成绩如下所示:

+-----+-----------+------|---------|
| name|   chinese | math | english |
+-----+-----------+------|---------|
|  A  |     99    |  98  |    97   |
|  B  |     92    |  91  |    90   |
|  C  |     88    |  87  |    86   |
+-----+-----------+------|---------|

请使用 SQL 代码将以上成绩转换为如下格式:

+-----+-----------+------|
| name|   subject |score |
+-----+-----------+------|
|  A  |  chinese  |  99  |
|  A  |  math     |  98  |
|  A  |  english  |  97  |
|  B  |  chinese  |  92  |
|  B  |  math     |  91  |
|  B  |  english  |  90  |
|  C  |  chinese  |  88  |
|  C  |  math     |  87  |
|  C  |  english  |  86  |
+-----+-----------+------|
解答:
 CREATE TABLE grade
 (NAME VARCHAR(1),
 chinese INT,
 math INT,
 english INT
 
 );
 
 INSERT INTO grade
 VALUES("A",99,98,97),
 ("B",92,91,90),
 ("C",88,87,86);
 
 SELECT * FROM grade;

在这里插入图片描述

查询

 SELECT NAME,
 'chinese' AS SUBJECT,
chinese AS score
 FROM grade 
 UNION ALL
  SELECT NAME,
 'math' AS SUBJECT,
 math AS score
 FROM grade
 UNION ALL
  SELECT NAME,
 'english' AS SUBJECT,
 english AS score
 FROM grade 
 ORDER BY NAME ASC;

在这里插入图片描述

练习三:谁是明星带货主播?

假设,某平台2021年主播带货销售额日统计数据如下:

表名 anchor_sales

+-------------+------------+---------|
| anchor_name |     date   |  sales  | 
+-------------+------------+---------|
|      A      |  20210101  |  40000  |
|      B      |  20210101  |  80000  |
|      A      |  20210102  |  10000  |
|      C      |  20210102  |  90000  |
|      A      |  20210103  |   7500  |
|      C      |  20210103  |  80000  |
+-------------+------------+---------|

定义:如果某主播的某日销售额占比达到该平台当日销售总额的 90% 及以上,则称该主播为明星主播,当天也称为明星主播日。

请使用 SQL 完成如下计算:

a. 2021年有多少个明星主播日?

b. 2021年有多少个明星主播?

解答:

建表

CREATE TABLE anchor_sales
(anchor_name ENUM("A","B","C"),
DATE ENUM("20210101","20210102","20210103"),
sales INT
);

INSERT INTO anchor_sales
VALUES(1,1,40000),
(2,1,80000),
(1,2,10000),
(3,2,90000),
(1,3,7500),
(3,3,80000);

SELECT * FROM anchor_sales;

在这里插入图片描述

a. 2021年有多少个明星主播日?

SELECT 
  COUNT(*) 
FROM
  (SELECT 
    DATE 
  FROM
    anchor_sales 
  GROUP BY DATE 
  HAVING MAX(sales) / SUM(sales) >= 0.9) AS tb ;

在这里插入图片描述

b. 2021年有多少个明星主播?

SELECT COUNT(*) FROM (
SELECT DISTINCT anchor_name FROM anchor_sales WHERE sales IN (  
SELECT 
    MAX(sales)
  FROM
    anchor_sales 
  GROUP BY DATE 
  HAVING MAX(sales) / SUM(sales) >= 0.9)) AS tb;  

在这里插入图片描述

练习四:MySQL 中如何查看sql语句的执行计划?可以看到哪些信息?

解答:
 EXPLAIN SELECT * FROM anchor_sales;

在这里插入图片描述

id,select_type,表名,行数等等。

练习五:解释一下 SQL 数据库中 ACID 是指什么

解答:
  1. 原子性(atomicity)
  2. 一致性(consistency)
  3. 隔离性(isolation)
  4. 持久性(durability)

Section C

练习一:行转列

假设有如下比赛结果:

+--------------+-----------+
|    cdate     |   result  |
+--------------+-----------+
|  2021-01-01  |     胜    |
|  2021-01-01  |     胜    |
|  2021-01-01  |     负    |
|  2021-01-03  |     胜    |
|  2021-01-03  |     负    |
|  2021-01-03  |     负    |
+------------+-------------+

请使用 SQL 将比赛结果转换为如下形式:

+--------------+-----+-----|
|  比赛日期     | 胜  | 负  |
+--------------+-----------+
|  2021-01-01  |  2  |  1  |
|  2021-01-03  |  1  |  2  |
+------------+-----------+
解答:

建表

 CREATE TABLE race
 (
 cdate ENUM("2021-01-01","2021-01-03"),
 result ENUM("胜","负")
 );

INSERT INTO race
VALUES(1,"胜"),
(1,"胜"),
(1,"负"),
(2,"胜"),
(2,"负"),
(2,"负");

SELECT * FROM race;

在这里插入图片描述

查询

SELECT cdate AS "比赛日期",
SUM(CASE result WHEN "胜" THEN 1 ELSE 0 END) AS "胜",
SUM(CASE result WHEN "负" THEN 1 ELSE 0 END) AS "负"
FROM race
GROUP BY cdate;

在这里插入图片描述

练习二:列转行

假设有如下比赛结果:

+--------------+-----+-----|
|  比赛日期     | 胜  | 负  |
+--------------+-----------+
|  2021-01-01  |  2  |  1  |
|  2021-01-03  |  1  |  2  |
+------------+-----------+

请使用 SQL 将比赛结果转换为如下形式:

+--------------+-----------+
|    cdate     |   result  |
+--------------+-----------+
|  2021-01-01  |     胜    |
|  2021-01-01  |     胜    |
|  2021-01-01  |     负    |
|  2021-01-03  |     胜    |
|  2021-01-03  |     负    |
|  2021-01-03  |     负    |
+------------+-------------+
解答:

建表

CREATE TABLE race1
(
比赛日期 DATE,
胜 INT,
负 INT
);

INSERT INTO race1
VALUES("2021-01-01",2,1),
("2021-01-03",1,2);

SELECT * FROM race1;

在这里插入图片描述

查询

没有最终做出来,没有思路,不知道怎么把result给展开。

SELECT 比赛日期 AS cdate,胜 AS result FROM race1
UNION ALL
SELECT 比赛日期 AS cdate,负 AS result FROM race1
ORDER BY cdate;

在这里插入图片描述

练习三:连续登录

有用户表行为记录表t_act_records表,包含两个字段:uid(用户ID),imp_date(日期)

  1. 计算2021年每个月,每个用户连续登录的最多天数
  2. 计算2021年每个月,连续2天都有登录的用户名单
  3. 计算2021年每个月,连续5天都有登录的用户数

构造表mysql如下:

DROP TABLE if EXISTS t_act_records;
CREATE TABLE t_act_records
(uid  VARCHAR(20),
imp_date DATE);

INSERT INTO t_act_records VALUES('u1001', 20210101);
INSERT INTO t_act_records VALUES('u1002', 20210101);
INSERT INTO t_act_records VALUES('u1003', 20210101);
INSERT INTO t_act_records VALUES('u1003', 20210102);
INSERT INTO t_act_records VALUES('u1004', 20210101);
INSERT INTO t_act_records VALUES('u1004', 20210102);
INSERT INTO t_act_records VALUES('u1004', 20210103);
INSERT INTO t_act_records VALUES('u1004', 20210104);
INSERT INTO t_act_records VALUES('u1004', 20210105);
解答:

建表如下:

在这里插入图片描述

1.计算2021年每个月,每个用户连续登录的最多天数

SELECT 
COUNT(*) AS '连续天数'
FROM (SELECT uid,imp_date,
DATEDIFF(imp_date,'2020-01-01')-rank()over(PARTITION BY uid ORDER BY imp_date) AS ranking
FROM t_act_records) AS r
GROUP BY uid,MONTH(imp_date),r.ranking
ORDER BY 连续天数 DESC
LIMIT 1;

在这里插入图片描述

2.计算2021年每个月,连续2天都有登录的用户名单

SELECT 
uid
FROM (SELECT uid,imp_date,
DATEDIFF(imp_date,'2020-01-01')-rank()over(PARTITION BY uid ORDER BY imp_date) AS ranking
FROM t_act_records) AS r
GROUP BY uid,MONTH(imp_date),r.ranking
HAVING COUNT(*)=2;

在这里插入图片描述

3.计算2021年每个月,连续5天都有登录的用户数

SELECT COUNT(*) FROM
(SELECT 
uid
FROM (SELECT uid,imp_date,
DATEDIFF(imp_date,'2020-01-01')-rank()over(PARTITION BY uid ORDER BY imp_date) AS ranking
FROM t_act_records) AS r
GROUP BY uid,MONTH(imp_date),r.ranking
HAVING COUNT(*)=5) AS tb;

在这里插入图片描述

练习四:用户购买商品推荐

假设现在需要根据算法给每个 user_id 推荐购买商品,推荐算法比较简单,推荐和他相似的用户购买过的 product 即可,说明如下:

  • 排除用户自己购买过的商品
  • 相似用户定义:曾经购买过 2 种或 2 种以上的相同的商品

输入表:orders

+---------+------------+
| user_id | product_id |
+---------+------------+
|     123 |          1 |
|     123 |          2 |
|     123 |          3 |
|     456 |          1 |
|     456 |          2 |
|     456 |          4 |
+---------+------------+

输出表:

+---------+------------+
| user_id | product_id |
+---------+------------+
|     123 |          4 |
|     456 |          3 |
+---------+------------+
解答:

建表

CREATE TABLE orders
(
user_id INT,
product_id INT
);

INSERT INTO orders
VALUES(123,1),
(123,2),
(123,3),
(456,1),
(456,2),
(456,4);

SELECT * FROM orders;

在这里插入图片描述

查询

SELECT DISTINCT 
  o1.user_id,
  (SELECT 
    product_id 
  FROM
    orders 
  WHERE product_id IN 
    (SELECT 
      product_id 
    FROM
      orders 
    WHERE user_id <> o1.user_id) 
    AND product_id NOT IN 
    (SELECT 
      product_id 
    FROM
      orders 
    WHERE user_id = o1.user_id)) AS product_id 
FROM
  orders o1 
WHERE user_id IN 
  (SELECT 
    user_id 
  FROM
    orders 
  GROUP BY user_id 
  HAVING COUNT(DISTINCT product_id) >= 2) ;

在这里插入图片描述

练习五:hive 数据倾斜的产生原因及优化策略?

解答:

产生原因:

1)、key分布不均匀

2)、业务数据本身的特性

3)、建表时考虑不周

4)、某些SQL语句本身就有数据倾斜

优化策略:

参数调节,sql语句调节。

练习六:LEFT JOIN 是否可能会出现多出的行?为什么?

假设 t1 表有6行(关联列 name 有2行为空),t2 表有6行(关联列 name 有3行为空),

那么 SELECT * FROM t1 LEFT JOIN t2 on t1.name = t2.name 会返回多少行结果?

可以参考下图

t1 表:在这里插入图片描述
t2 表:在这里插入图片描述

解答:

不会,只有6行,因为left join主表为from后面的表,即t1,而t1为6行。

写一写本次学习的收获。

感谢本课程有帮助的话,可以帮忙 star ~~

这次课程从8.15到8.30,强度有点大,我也从不知道mysql和sql的关系,到比较熟练一些基本的建表,插入数据,查询表等操作。我最想说的是感谢,感谢datawhale开源组织,感谢sql航海开辟者和领航员,也感谢共同学习的伙伴。这种开源精神在当今世界是可贵的,我也想后续继续学习datawhale的其他课程。

  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值