十个精妙绝伦的SQL语句,说尽SQL精华

本文精选了十个典型的SQL问题及解决方案,涵盖了统计分析、数据清洗、连续行为分析等多个方面,旨在帮助读者掌握SQL处理复杂场景的能力。

引子

哪些是程序员的通用能力?算法、正则表达式和SQL。这三样,是程序员的基本功,就跟数学公式一样,它不涉及智商,但关乎你的学习态度,牵扯到程序员的面子。面试官考这些时,如果连这个都不会,就会遭到鄙视。有次有个应聘者质疑我,说术业有专攻,你不该考这些问题,说我这是在问茴字的四种写法。

十大SQL

1. 统计班级总分前十名

表结构stu_score:(student_id, course_id, score)

select student_id, sum(score) as s from stu_score group by student_id order by s desc limit 10

这道题比较基本,考察聚合函数用法。下面两个进阶一点:
要求输出课程号和选修人数,查询结果按人数降序排序,若人数相同,按课程号升序排序

select 课程号, count(学号) as 选修人数
from score
group by 课程号
having count(学号)>=2
order by 选修人数 desc, 课程号 asc;

查询没有学全所有课的学生的学号、姓名

select 学号,姓名
from student
where 学号 in
(select 学号
from score
group by 学号
having count(课程号) < (select count(课程号) from course));

2. 删除重复记录, 且保留一条

表结构: (book_id, book_name)
从书籍列表里,删除书名重复的记录,保留book_id最小的记录:

delete from ebook
where book_name in (select book_name from ebook group by book_name having count(*) > 1)
  and book_id not in (select min(book_id) from ebook group by book_name having count(*)>1);

这个考察了子查询和min()函数以及having子句的使用。很多数据库都支持这种子查询。注意,上述SQL在MySQL中执行会报错:

[HY000][1093] You can't specify target table 'ebook' for update in FROM clause

这是因为MySQL不允许你在做子查询时去修改表。trick的办法是创建临时表:

delete from ebook
where book_name in (select t1.book_name from (select book_name 
         from ebook group by book_name having count(*) > 1) as t1)
     and book_id not in (select t2.id from (select min(book_id) as id 
         from ebook group by book_name having count(*)>1) as t2);

上面创建了两张临时表t1和t2。这样MySQL就可以执行了。

3. 最大连续登陆天数的问题

题目: 找出连续7天登陆,连续30天登陆的用户。
考察点:窗口函数

select *
fromselect user_id ,count(1) as num
  from
     (select user_id,date_sub(log_in_date, rank) dts
          from  (select user_id,log_in_date,	
                  row_number() over(partitioned by user_id order by log_in_date ) as rank
		  from user_log
	          )t
      )a
  group by dts
)b
where num = 7  

4. 计算除去部门最高工资,和最低工资的平均工资

emp 表:(id 员工 id ,deptno 部门编号,salary 工资)

核心是使用窗口函数降序和升序分别排一遍就取出了最高和最低。

select a.deptno,avg(a.salary)
from  
 (
 select *, rank() over( partition by deptno order by salary ) as rank_1
 , rank() over( partition by deptno order by salary desc) as rank_2 
 from emp
 )  a 
group by a.deptno
where a.rank_1 >1 and a.rank_2 >1 

5. 计算占比和同比增长

t_user记录了用户注册时间和平台,统计2018年1月份
每天各平台(“ios”,“android”,“h5”)注册用户总量占所有平台总用户的比例,以及各平台注册用户按周同比增长(与一周前相比)的比例

建表语句

create table t_user
(
uid BIGINT COMMENT "用户id"
, reg_time STRING COMMENT "注册时间,如2018-07-01 08:11:39"
, platform STRING COMMENT "注册平台,包括app ios h5"
);

解答:
知识点:窗口函数。
注意:如果存在某天的缺失数据,偏移函数会有错误

SELECT a.reg_date
,a.platform
,ROUND(a.reg_num/sum(a.reg_num)over(PARTITION BY a.reg_date),4) as rate
,ROUND((a.reg_num-a.reg_num_7)/a.reg_num_7,4) as rate_week
FROM(
SELECT 
DATE(reg_time) as reg_date
,platform
,COUNT(uid) as reg_num
,lag(COUNT(uid),7)over(PARTITION BY platform ORDER BY DATE(reg_time)) as reg_num_7
FROM t_user
WHERE SUBSTR(reg_time,1,7)='2018-01'
GROUP BY DATE(reg_time),platform
) a ;

6. 算成绩

表名:subject_scores
输入
Name subject score
王建国 数学 95
王建国 语文 89
李雪琴 数学 100
李雪琴 语文 100
李雪琴 英语 100

输出
Name math chinese English
王建国 95 89 0
李雪琴 100 100 100

解答:
所涉知识点:GROUP BY 和 CASE WHEN 实现行变列
注意:(1)空的数据这里判断为0;(2)CASE WHEN 前要使用聚合函数,不然报错)

SELECT 
name,
MAX(CASE subject WHEN '数学' THEN score ELSE 0 END) as math,
MAX(CASE subject WHEN '语文' THEN score ELSE 0 END) as chinese,
MAX(CASE subject WHEN '英语' THEN score ELSE 0 END) as English
FROM subject_scores 
GROUP BY name;

7.算昨天每个城市top 10消费金额的用户,输出city_id,city_name,uid, 消费总金额

表名:orders
每次消费记录一条
city_id,city_name,uid,order_id,amount,pay_order_time, pay_date
解答:(窗口函数)

SELECT a.city_id,
a.city_name,
a.uid,
a.pay_amount as '消费总金额'
FROM 
(
SELECT city_id,city_name,uid,SUM(amount) as pay_amount,RANK()over(PARTITION BY city_id ORDER BY SUM(amount) DESC) as rank_no
FROM orders
WHERE pay_date='2020-01-01'
GROUP BY city_id,city_name,uid 
) a 
WHERE a.rank_no<=10;

8. 求连续点击三次的用户数,中间不能有别人的点击 ,最大连续天数的变形问题

总结:相邻问题的本质就是基于研究对象(比如用户、会员、员工等),利用窗口函数对时间字段进行有差别的排序,然后基于研究对象和新增的{排序差值列},进行分组计数的求连续点击、签到、复购等业务问题的计算;

'''
a表记录了点击的流水信息,包括用户id ,和点击时间
usr_id a a b a a a a
click_time t1 t2 t3 t4 t5 t6 t7

'''
--  方式一:      
use demo;       
WITH t1 AS (
		SELECT MemberID AS user_id, STime AS click_time
		FROM OrderList
		WHERE MemberID IS NOT NULL
		/*选取demo.OrderList 作为底表测试数据*/
			AND DATE_FORMAT(STime, '%Y-%m') = '2017-02'
	), 
	t2 AS (
		SELECT  *
				, row_number() OVER (ORDER BY click_time) AS rank1
				, row_number() OVER (PARTITION BY user_id ORDER BY click_time) AS rank2
		FROM t1
	), 
	t3 AS (
		SELECT *, rank1 - rank2 AS diff
		FROM t2
	), 
	t4 AS (
		SELECT DISTINCT user_id
		FROM t3
		GROUP BY user_id, diff
		HAVING COUNT(1) > 3
	)
-- SELECT  * from t4 ;
SELECT *
FROM t3
WHERE user_id IN (
	SELECT user_id
	FROM t4
)
ORDER BY user_id, diff, click_time;       
        
--  方式二:        
        
SELECT DISTINCT user_id
FROM (
	SELECT *, rank_1 - rank_2 AS diff
	FROM (
		SELECT  *
		       ,row_number() OVER (ORDER BY click_time) AS rank_1
		       ,row_number() OVER (PARTITION BY user_id ORDER BY click_time) AS rank_2
		FROM (
			SELECT MemberID AS user_id, STime AS click_time
			FROM OrderList
			WHERE MemberID IS NOT NULL 
				  /*选取demo.OrderList 作为底表测试数据*/
				  AND DATE_FORMAT(STime, '%Y-%m') = '2017-02'
		) a
	) b
) c
GROUP BY diff, user_id
HAVING COUNT(1) > 3; 

9. AB球队得分流水表,得到连续三次得分的队员名字 和每次赶超对手的球员名字

表结构:

create table bktab (
   team  string comment '球队名称',
   number int comment '球员号码',
   score_time string comment '得分时间',
   score int comment '得分分数',
   name string comment '球员姓名')
comment 'AB球队得分流水表'
row format delimited fields terminated by '\t'
lines terminated by '\n' stored as orc;

分析思路:
1.按score_time 对全局排序
2.获取当前行 A队累计得分 B队累计得分
3.获取 当前 A队累计得分 与 B队累计得分的差值
4.当前行差值 与上一行差值,发生符合变化时,表示 分数发生了反超



-- 查询sql
select
    team
    ,number
    ,score_time
    ,score
    ,name
    ,ateam_score
    ,bteam_score
from (

    select
        team
        ,number
        ,score_time
        ,score
        ,name
        ,ateam_score
        ,bteam_score
        ,diff_score
        ,lag(diff_score) over (order by score_time asc) as pre_diff_score
        ,case when diff_score > 0 and lag(diff_score) over (order by score_time asc) < 0 then 1
              when diff_score < 0 and lag(diff_score) over (order by score_time asc) > 0 then 1
              when diff_score is not null and lag(diff_score) over (order by score_time asc) is null then 1
              else 0
         end as if_surpass

    from (
        select
        team
        ,number
        ,score_time
        ,score
        ,name
        ,sum(if(team = 'A',score,0)) over (order by score_time asc) as ateam_score
        ,sum(if(team = 'B',score,0)) over (order by score_time asc) as bteam_score
        ,sum(if(team = 'A',score,0)) over (order by score_time asc) - sum(if(team = 'B',score,0)) over (order by score_time asc) as diff_score
        from bktab
    ) t1
) t2
where if_surpass = 1
;

10. 举例说明内连接、外连接、左连接、右连接的区别

下图展示了 LEFT JOIN、RIGHT JOIN、INNER JOIN、OUTER JOIN 相关的 7 种用法:
在这里插入图片描述
查询所有课程成绩小于60分学生的学号、姓名

SELECT A.学号,B.姓名 FROM score A LEFT JOIN student B ON A.学号 = B.学号 GROUP BY A.学号 HAVING MAX(成绩) < 60;
SELECT customer.last_name, city.name
FROM customer
INNER JOIN city
  ON customer.id = city.customer_id;
SELECT c.last_name AS lname, t.name AS city
FROM customer AS c
INNER JOIN city AS t
  ON c.id = t.customer_id;
SELECT last_name FROM customer
INTERSECT
SELECT last_name FROM employee;

You can join tables using JOIN, including INNER JOIN, LEFT JOIN, RIGHT JOIN, FULL JOIN, and CROSS JOIN (please see the courses listed at the end of this article for more information). In this example, we want to join data from the tables customer and city. INNER JOIN needs to come after FROM and the name of the first table, customer. After INNER JOIN, place the name of the second table, city. The records with data from both tables are matched by ON with the condition to join. The records in the table city are matched to the records from the table customer if they have the same value in the column id in the table customer and in the column customer_id in the table city.

SQL语法图解

在这里插入图片描述

在这里插入图片描述

其它拾遗

lastIndexOf

Mysql没有直接提供这么一个函数,可以这么实现:

假定t_article有字段url,形如https://zhuanlan.zhihu.com/p/662034389,后面是文章的ID,要获得ID

SELECT substr(url, LENGTH(url) - LOCATE('/', REVERSE(url)) + 2) from t_article;

增补表的字段

我有两张数据库表,一张是股票信息,另一张是用户的自选股信息,自选股信息里的stock_id和股票信息里的id对应,现在我在自选股里加了个字段,其值对应股票信息里的code字段,怎么用一条sql语句更新这个字段?

UPDATE user_stocks us
JOIN stock_tech si ON us.code = si.code
SET us.market_code = si.market_code where us.market_code is null;

窗口函数

窗口函数的基本语法如下:

<窗口函数> over (partition by <用于分组的列名>
                order by <用于排序的列名>)

窗口函数包括:
1) 专用窗口函数,如rank, dense_rank, row_number等专用窗口函数
2) 聚合函数,如sum. avg, count, max, min等

因为窗口函数是对where或者group by子句处理后的结果进行操作,所以窗口函数原则上只能写在select子句中。

参考

  • https://zhuanlan.zhihu.com/p/92654574
评论
成就一亿技术人!
拼手气红包6.0元
还能输入1000个字符
 
红包 添加红包
表情包 插入表情
 条评论被折叠 查看
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包

打赏作者

北极象

如果觉得对您有帮助,鼓励一下

¥1 ¥2 ¥4 ¥6 ¥10 ¥20
扫码支付:¥1
获取中
扫码支付

您的余额不足,请更换扫码支付或充值

打赏作者

实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值