SQL复习

最新推荐文章于 2024-04-23 21:17:10 发布

sueong

最新推荐文章于 2024-04-23 21:17:10 发布

阅读量377

点赞数

分类专栏：开发笔记文章标签： sql 数据库 mysql

本文链接：https://blog.csdn.net/sueong/article/details/124525145

版权

开发笔记专栏收录该内容

24 篇文章 1 订阅

订阅专栏

like

字符匹配 1、_ ：匹配一个字符例如：查找姓张且名字为三个字的学生信息 select * from student where name like ‘张__’

2、%：匹配0个或多个字符串例如：查找姓张的学生的信息 select * from student where name like '张%

3、[]匹配框中的任意一个字符例如查找姓李、张、宋的学生信息 select * from student where name like ‘[张李宋]%’

4、不匹配[]中的字符例如找出除姓张、唐之外的学生信息 select * from student where name like ‘[^张唐]%’

5、查询姓名中第二个字为大或小的学生信息 seletc * from student where name like ‘_[小大]%’

count avg round

要求得到『男性用户有多少人』以及『他们的平均gpa是多少』

浮点数的平均值可能小数点位数很多，按照示例保存一位小数，用round函数

select
  count(gender) as male_num,
  round(avg(gpa), 1) as avg_gpa
from user_profile where gender="male";

group by avg count

每个学校每种性别的用户数、30天内平均活跃天数和平均发帖数量

限定条件：无；
每个学校每种性别：按学校和性别分组：group by gender, university
用户数：count(device_id)
30天内平均活跃天数：avg(active_days_within_30)
平均发帖数量：avg(question_cnt)

select 
    gender,university,
    count(device_id) as user_num,
    avg(active_days_within_30) as avg_active_days,
    avg(question_cnt) as avg_question_cnt
    from user_profile
group by gender,university

avg group by having

取出平均发贴数低于5的学校或平均回帖数小于20的学校

限定条件：平均发贴数低于5或平均回帖数小于20的学校，avg(question_cnt)<5 or avg(answer_cnt)<20，聚合函数结果作为筛选条件时，不能用where，而是用having语法，配合重命名即可；

按学校输出：需要对每个学校统计其平均发贴数和平均回帖数，因此group by university

select
    university,
    avg(question_cnt) as avg_question_cnt,
    avg(answer_cnt) as avg_answer_cnt
from user_profile
group by university
having avg_question_cnt<5 or avg_answer_cnt<20

order by group by avg

不同大学的用户平均发帖情况，并按照平均发帖情况进行升序排列

问题分解：
限定条件：无；
不同大学：按学校分组group by university
平均发帖数：avg(question_cnt)
升序排序：order by avg_question_cnt

select university, avg(question_cnt) as avg_question_cnt
from user_profile
group by university
order by avg_question_cnt

inner join 升序 order by

所有来自浙江大学的用户题目回答明细情况

问题分解：
限定条件：来自浙江大学的用户，学校信息在用户画像表，答题情况在用户练习明细表，因此需要通过device_id关联两个表的数据；方法1：join两个表，用inner join，条件是on up.device_id=qpd.device_id and up.university=‘浙江大学’ 方法2：先从画像表找到浙江大学的所有学生id列表where university=‘浙江大学’，再去练习明细表筛选出id在这个列表的记录，用where in

select qpd.device_id, qpd.question_id, qpd.result
from question_practice_detail as qpd
inner join user_profile as up
on qpd.device_id = up.device_id
and up.university ='浙江大学'
order by question_id

group by/ count /distinct

题意明确：
每个学校的用户平均答题数量

问题分解：
限定条件：无；
每个学校：按学校分组，group by university
平均答题数量：在每个学校的分组内，用总答题数量除以总人数即可得到平均答题数量count(question_id) / count(distinct device_id)。
表连接：学校和答题信息在不同的表，需要做连接

select university,
    count(question_id) / count(distinct qpd.device_id) as avg_answer_cnt
from question_practice_detail as qpd
inner join user_profile as up
on qpd.device_id=up.device_id
group by university

left join/group by/count/distinct

计算每个学校用户不同难度下的用户平均答题题目数

问题分解：
限定条件：无；
每个学校：按学校分组group by university
不同难度：按难度分组group by difficult_level
平均答题数：总答题数除以总人数count(qpd.question_id) / count(distinct qpd.device_id)
来自上面信息三个表，需要联表，up与qpd用device_id连接，qd与qpd用question_id连接。

select 
    university,
    difficult_level,
    round(count(qpd.question_id) / count(distinct qpd.device_id), 4) as avg_answer_cnt
from question_practice_detail as qpd

left join user_profile as up
on up.device_id=qpd.device_id

left join question_detail as qd
on qd.question_id=qpd.question_id

group by university, difficult_level

join on xx=xx and …/group by/ count

select 
    university,
    difficult_level,
    count(qpd.question_id) / count(distinct qpd.device_id) as avg_answer_cnt
from question_practice_detail as qpd

inner join user_profile as up
on up.device_id=qpd.device_id and up.university="山东大学"

inner join question_detail as qd
on qd.question_id=qpd.question_id

group by difficult_level

union all 不去重

select 
    device_id, gender, age, gpa
from user_profile
where university='山东大学'

union all

select 
    device_id, gender, age, gpa
from user_profile
where gender='male'

在这里插入图片描述

count(*) /if/case when then end

select if(age > =25,'25岁及以上','25岁以下') as age_cut, count(*) AS number
from user_profile
group by age_cut

```sql
select 
    (case
        when age>=25 then '25岁及以上'
        else '25岁以下' end) as age_cut, 
    count(*) as number
from user_profile
group by age_cut

select device_id,gender,
(case when age<20 then "20岁以下"
 when age between 20 and 24 then "20-24岁"
 when age>24 then "25岁及以上"
 else "其他"
end) as age_cut
from user_profile

case when

题意明确：
将用户划分为20岁以下，20-24岁，25岁及以上三个年龄段，分别查看不同年龄段用户的明细情况，年龄为空输出为其他

问题分解：
限定条件：无；
划分年龄段：数值条件判断，可以用多重if，不过更方便的是用case when [expr] then [result1]…else [default] end

select device_id,gender,
(case when age<20 then "20岁以下"
 when age between 20 and 24 then "20-24岁"
 when age>24 then "25岁及以上"
 else "其他"
end) as age_cut
from user_profile

day/month/year(date)

题意明确：
2021年8月每天用户练习题目的数量

问题分解：
限定条件：2021年8月，写法有很多种，比如用year/month函数的year(date)=2021 and month(date)=8，比如用date_format函数的date_format(date, “%Y-%m”)=“202108”
每天：按天分组group by date
题目数量：count(question_id)

select day(date) as day,count(question_id) as question_cnt
from question_practice_detail
where month(date)=8 and year(date) = 2021
group by date

题意明确：
用户在某天刷题后第二天再来刷题的平均概率

问题分解：
限定条件：第二天再来。
解法1：表里的数据可以看作是全部第一天来刷题了的，那么我们需要构造出第二天来了的字段，因此可以考虑用left join把第二天来了的拼起来，限定第二天来了的可以用date_add(date1, interval 1 day)=date2筛选，并用device_id限定是同一个用户。
解法2：用lead函数将同一用户连续两天的记录拼接起来。先按用户分组partition by device_id，再按日期升序排序order by date，再两两拼接（最后一个默认和null拼接），即lead(date) over (partition by device_id order by date)
平均概率：
解法1：可以count(date1)得到左表全部的date记录数作为分母，count(date2)得到右表关联上了的date记录数作为分子，相除即可得到平均概率
解法2：检查date2和date1的日期差是不是为1，是则为1（次日留存了），否则为0（次日未留存），取avg即可得平均概率。
附：lead用法，date_add用法，datediff用法，date函数
细节问题：
表头重命名：as
去重：需要按照devece_id,date去重，因为一个人一天可能来多次
子查询必须全部有重命名

partition by/order by /窗口函数/dateadd

https://zhuanlan.zhihu.com/p/92654574

<窗口函数> over (partition by <用于分组的列名>
order by <用于排序的列名>)

select count(date2) / count(date1) as avg_ret
from (
    select
        distinct qpd.device_id,
        qpd.date as date1,
        uniq_id_date.date as date2
    from question_practice_detail as qpd
    left join(
        select distinct device_id, date
        from question_practice_detail
    ) as uniq_id_date
    on qpd.device_id=uniq_id_date.device_id
        and date_add(qpd.date, interval 1 day)=uniq_id_date.date
) as id_last_next_date

substring_index/case when

select 
(case when profile like"%,male" then "male"
 when profile like"%,female" then "female"
 end ) as gender, count(device_id)
 from user_submit 
 group by gender

select 
substring_index(profile,',',-1) as gender,count(device_id)
 from user_submit 
 group by gender

replace/substring/trim/substring_index

select
-- 替换法 replace(string, '被替换部分','替换后的结果')
-- device_id, replace(blog_url,'http:/url/','') as user_name
 
-- 截取法 substr(string, start_point, length*可选参数*)
-- device_id, substr(blog_url,11,length(blog_url)-10) as user_nam
 
-- 删除法 trim('被删除字段' from 列名)
-- device_id, trim('http:/url/' from blog_url) as user_name
 
-- 字段切割法 substring_index(string, '切割标志', 位置数（负号：从后面开始）)
device_id, substring_index(blog_url,'/',-1) as user_name
 
from user_submit;

substring_index/count

select
    substring_index(substring_index(profile, ',', 3), ',', -1) as age,
    count(device_id) as number
from user_submit
group by age

rank() over (partition by university order by gpa)/窗口函数

问题分解：
限定条件：gpa最低，看似min(gpa)，但是要留意，是每个学校里的最低，不是全局最低。min(gpa)的时候对应同学的ID丢了，直接干是拿不到最低gpa对应的同学ID的；
每个学校最低：
第一种方式是用group by把学校分组，然后计算得到每个学校最低gpa，再去找这个学校里和这个gpa相等的同学ID。注意这样如果最低gpa对应多个同学，都会输出，题目没有明确此种情况，心理明白就行。
第二种方式是利用窗口函数，先按学校分组计算排序gpa，得到最低gpa的记录在用子查询语法拿到需要的列即可。此题中rou_number可以得到排序后的位序，取位序为1即可得到最小值（升序时）。

select device_id, university, gpa
from(
    select *,
    rank() over (partition by university order by gpa) as rn
    from user_profile
)as univ_min
where rn=1
order by university

sql执行语句

.关键字的顺序是不能颠倒的：

SELECT … FROM … WHERE … GROUP BY … HAVING … ORDER BY … SQL

2.SELECT语句的执行顺序（在MySQL和Oracle中，SELECT执行顺序基本相同）：

FROM > WHERE > GROUP BY > HAVING > SELECT的字段 > DISTINCT > ORDER BY > LIMIT

FROM子句组装数据（包括通过ON进行连接）；
WHERE子句进行条件筛选；
GROUP BY分组；
使用聚集函数进行计算；
HAVING筛选分组；
计算所有的表达式；
SELECT 的字段；
ORDER BY排序；
LIMIT筛选。

SELECT DISTINCT player_id, player_name, count(*) as num #顺序5
FROM player JOIN team ON player.team_id = team.team_id #顺序1
WHERE height > 1.80 #顺序2
GROUP BY player.team_id #顺序3
HAVING num > 2 #顺序4
ORDER BY num DESC #顺序6
LIMIT 2 #顺序7

sum/if/month(date)=xx

题意明确：
复旦大学的每个用户在8月份练习的总题目数和回答正确的题目数情况

问题分解：
限定条件：需要是复旦大学的（来自表user_profile.university），8月份练习情况（来自表question_practice_detail.date）
从date中取month：用month函数即可；
总题目：count(question_id)
正确的题目数：sum(if(qpd.result=‘right’, 1, 0))
按列聚合：需要输出每个用户的统计结果，因此加上group by up.device_id

细节问题：
8月份没有答题的用户输出形式：题目要求『对于在8月份没有练习过的用户，答题数结果返回0』因此明确使用left join即可，即输出up表中复旦大学的所有用户，如果8月没有练习记录，输出0就好了
老样子-表头：as语法重命名后两列就好
完整代码：

select up.device_id, university,
    count(question_id) as question_cnt,
    sum(if(qpd.result='right', 1, 0)) as right_question_cnt
from user_profile as up
 
left join question_practice_detail as qpd
  on qpd.device_id = up.device_id and month(qpd.date) = 8
 
where up.university = '复旦大学'
group by up.device_id

IF 表达式 IF( expr1 , expr2 , expr3 )
expr1 的值为 TRUE，则返回值为 expr2
expr1的值为FALSE，则返回值为 expr3

join/avg

浙江大学的用户在不同难度题目下答题的正确率情况，按照准确率升序输出

问题分解：
限定条件：浙江大学的用户；
不同难度：difficult_level（question_detail表中的列），需要分组统计，因此用到group by，

正确率：表面理解就是正确数÷总数，正确的是result=‘right’（question_practice_detail表），数目用函数count，总数是count(question_id)；
多张表联合查询：需要用到join，join有多种语法，因为条件限定需要是浙江大学的用户，所以需要是user_profile表的并且能统计出题目难度的记录，因此用user_profile表inner join另外两张表。

select difficult_level,
    avg(if(qpd.result='right', 1, 0)) as correct_rate
#    sum(if(qpd.result='right', 1, 0)) / count(qpd.question_id) as correct_rate
#    count(if(qpd.result='right', 1, null)) / count(qpd.question_id) as correct_rate
from user_profile as up
 
inner join question_practice_detail as qpd
    on up.device_id = qpd.device_id
 
inner join question_detail as qd
    on qd.question_id = qpd.question_id
 
where up.university = '浙江大学'
group by qd.difficult_level
order by correct_rate asc;

sueong

关注

0
点赞
踩
1

收藏

觉得还不错? 一键收藏
0
评论
SQL复习

like字符匹配 1、_ ：匹配一个字符例如：查找姓张且名字为三个字的学生信息 select * from student where name like ‘张__’2、%：匹配0个或多个字符串例如：查找姓张的学生的信息 select * from student where name like '张%3、[]匹配框中的任意一个字符例如查找姓李、张、宋的学生信息 select * from student where name like ‘[张李宋]%’4、不匹配[]中的字符例如找出除姓张、
复制链接

扫一扫