前言
提示:这里可以添加本文要记录的大概内容:
本文章记录了自己撰写牛客网中的SQL题目答案以及不会的地方,以供后续复习使用,太简单的就不写思考过程了,一些不会写的会用黄色标记。
并且不能借助datagrip
软件,纯手写。
提示:以下是本篇文章正文内容,下面案例可供参考
一、非技术快速入门篇
1.总结
本文的刷题地址为:牛客网MySQL地址-非技术快速入门篇;
2.SQL语句
- 很简单的,easy
-- SQL1 查询所有列
select * from user_profile;
-- SQL2 查询多列
select device_id, gender, age, university from user_profile;
-- SQL3 查询结果去重
select distinct(university) from user_profile;
-- SQL4 查询结果限制返回行数
select device_id from user_profile limit 0, 2;
-- SQL5 将查询后的列重新命名
select device_id as 'user_infos_example' from user_profile limit 0, 2;
-- SQL6 查找学校是北大的学生信息
select device_id,university from user_profile where university = '北京大学';
-- SQL7 查找年龄大于24岁的用户信息
select device_id, gender, age, university from user_profile where age > 24;
-- SQL8 查找某个年龄段的用户信息
select device_id, gender, age from user_profile where age >= 20 and age <= 23;
select device_id, gender, age from user_profile where age between 20 and 23;
-- SQL9 查找除复旦大学的用户信息
select device_id, gender, age, university from user_profile where university != '复旦大学';
-- SQL10 用where过滤空值练习
select device_id, gender, age, university from user_profile where age is not null;
-- SQL11 高级操作符练习(1)
select device_id, gender, age, university, gpa from user_profile
where gender = 'male' and gpa > 3.5;
-- SQL12 高级操作符练习(2)
select device_id, gender, age, university, gpa from user_profile
where university = '北京大学' or gpa > 3.7;
-- SQL13 Where in 和Not in
select device_id, gender, age, university, gpa from user_profile
where university in ('北京大学','复旦大学','山东大学');
-- SQL14 操作符混合运用
select device_id, gender, age, university, gpa from user_profile
where university = '山东大学' and gpa > 3.5 or (university = '复旦大学' and gpa > 3.8);
-- SQL15 查看学校名称中含北京的用户
select device_id, age, university from user_profile where university like '北京%';
-- SQL16 查找GPA最高值
select max(gpa) from user_profile where university = '复旦大学';
-- SQL17 计算男生人数以及平均GPA
select count(id) as 'male_num', avg(gpa) as 'avg_gpa' from user_profile where gender = 'male';
-- SQL18 分组计算练习题
select gender, university, count(id) as 'user_num', avg(active_days_within_30) as 'avg_active_day', avg(question_cnt) as 'avg_question_cnt'
from user_profile group by university, gender;
-- SQL19 分组过滤练习题
select university, avg(question_cnt) as 'avg_question_cnt', avg(answer_cnt) as 'avg_answer_cnt' from user_profile
group by university having avg_question_cnt < 5 or avg_answer_cnt < 20;
-- SQL20 分组排序练习题
select university, avg(question_cnt) as 'avg_question_cn' from user_profile
group by university order by avg_question_cn asc;
-- SQL21 浙江大学用户题目回答情况
select device_id, question_id, result from question_practice_detail
where device_id in (select device_id from user_profile where university = '浙江大学')
order by question_id asc;
-- SQL22 统计每个学校的答过题的用户的平均答题数
select university,count(*) / count(distinct user_profile.device_id) from user_profile
inner join question_practice_detail on user_profile.device_id = question_practice_detail.device_id
group by university;
-- SQL24 统计每个用户的平均刷题数
select university, difficult_level, round(count(b.question_id) / count(distinct b.device_id), 4) from user_profile as a
inner join question_practice_detail as b on a.device_id = b.device_id
inner join question_detail as c on b.question_id = c.question_id
where university = '山东大学' group by difficult_level
-- SQL27 查看不同年龄段的用户明细
select
device_id, gender,
case
when age is null then '其他'
when age >=25 then '25岁及以上'
when age < 20 then '20岁以下'
else '20-24岁' end as 'age_cut'
from user_profile;
-- SQL31 提取博客URL中的用户名
select
device_id,
substring_index(blog_url, '/', -1) as 'user_name'
from user_submit;
-- SQL32 截取出年龄
select
substring_index(substring_index(profile, ',',3), ',', -1) as 'age',
count(*) as 'number'
from user_submit
group by age;
-- SQL34 统计复旦用户8月练题情况
select
a.device_id,
university,
count(question_id) as 'question_cnt',
sum(
case
when result = 'right' then 1
else 0 end
) as 'right_question_cnt'
from user_profile as a
inner join question_practice_detail as b
on a.device_id = b.device_id and month(date) = 8
where university = '复旦大学' group by a.device_id;
-- SQL35 浙大不同难度题目的正确率
select
difficult_level,
round(
sum(
case
when b.result = 'right' then 1
else 0 end
) / count(b.question_id), 4
) as 'correct_rate'
from user_profile as a
inner join question_practice_detail as b on a.device_id = b.device_id
inner join question_detail as c on b.question_id = c.question_id
where a.university = '浙江大学' group by c.difficult_level order by correct_rate asc;
-- SQL36 查找后排序
select
device_id, age
from user_profile
order by age asc;
-- SQL37 查找后多列排序
select
device_id, gpa, age
from user_profile
order by gpa asc, age asc;
-- SQL38 查找后降序排列
select
device_id, gpa, age
from user_profile
order by gpa desc, age desc;
-- SQL39 21年8月份练题总数
select
count(distinct device_id) as 'did_cnt',
count(question_id) as 'question_cnt'
from question_practice_detail
where year(date) = 2021 and month(date) = 8;
3.我没写出来的
这部分的题目是我第一遍没有写出来的,就是重点题目:
-- SQL23 统计每个学校各难度的用户平均刷题数
select
university,
difficult_level,
round(
count(b.question_id) / count(distinct b.device_id), 4
) as 'avg_answer_cnt'
from user_profile as a
inner join question_practice_detail as b on a.device_id = b.device_id
inner join question_detail as c on b.question_id = c.question_id
group by a.university, difficult_level
- 首先需要去了解表结构的类型,涉及到三张表:
- user_profile:记录了用户设备id、用户学校
- question_practice_detail:记录了某个设备id、答题的题目号
- question_detail:记录了题目号,题目号的难度
- 问题是:计算不同学校、不同难度的用户平均答题量。
- 首先知道三张表联查之间的on语句关系;
- 不同学校的话,需要根据学校进行分组,不同难度,就根据难度进行分组;
- 平均值精度:保留4位小数round(x, 4)
-- SQL25 查找山东大学或者性别为男生的信息
select
device_id, gender, age, gpa
from user_profile
where university='山东大学'
union all
select
device_id, gender, age, gpa
from user_profile
where gender='male';
- 该题考查的内容是对
union
的使用:- Union:对两个结果集进行并集操作,不包括重复行,同时进行默认规则的排序;
- Union All:对两个结果集进行并集操作,包括重复行,不进行排序;
-- SQL26 计算25岁以上和以下的用户数量
select case
when age < 25 or age is null then '25岁以下'
when age >= 25 then '25岁及以上' end as 'age_cut', count(*) as 'number'
from user_profile group by age_cut;
CASE 测试表达式
WHEN 简单表达式1 THEN 结果表达式1
WHEN 简单表达式2 THEN 结果表达式2 …
WHEN 简单表达式n THEN 结果表达式n
[ ELSE 结果表达式n+1 ]
END
- 该题考察的是对
case
函数的使用,如上所示。
-- SQL28 计算用户8月每天的练题数量
select
day(date) as 'day', count(question_id)
from question_practice_detail
where year(date) = 2021 and month(date) = 8
group by day(date);
- 该题目考察的是关于日期函数的使用
-- SQL29 计算用户的平均次日留存率
select count(date2) / count(date1) as 'avg_ret'
from(
select
distinct a.device_id, a.date as 'date1',b.date as 'date2'
from question_practice_detail as a
left join (
select distinct device_id, date from question_practice_detail
) as b
on a.device_id = b.device_id and date_add(a.date, interval 1 day) = b.date
) as c;
- 很难
- 限定条件:第二天再来。
- 解法1:表里的数据可以看作是全部第一天来刷题了的,那么我们需要构造出第二天来了的字段,因此可以考虑用left join把第二天来了的拼起来,限定第二天来了的可以用date_add(date1, interval 1 day)=date2筛选,并用device_id限定是同一个用户。
- 解法2:用lead函数将同一用户连续两天的记录拼接起来。先按用户分组partition by device_id,再按日期升序排序order by date,再两两拼接(最后一个默认和null拼接),即lead(date) over (partition by device_id order by date)
- 平均概率:
- 解法1:可以count(date1)得到左表全部的date记录数作为分母,count(date2)得到右表关联上了的date记录数作为分子,相除即可得到平均概率
- 解法2:检查date2和date1的日期差是不是为1,是则为1(次日留存了),否则为0(次日未留存),取avg即可得平均概率。
DATE_ADD(date,INTERVAL expr unit)
DATE_SUB(date,INTERVAL expr unit)
date_add()
常常用户在mysql的sql中实现对日期类型的操作,比如增加或者减少,但是不改变原来的数据,只是对查询的数据做处理
- date:表示时间字段或者时间类型
- interval:是固定标志
- expr:表示数量,可以为正负,表示加减
- unit:表示日期类型 可以是day、hour、second
-- SQL30 统计每种性别的人数
select
substring_index(profile, ',', -1) as gender,
count(*) as 'number'
from user_submit group by gender;
重点是对字符串的处理:
- substring_index(str,delim,count):
例子:str=www.google.com
- substring_index(str,‘.’,1);结果是:www;
- substring_index(str,‘.’,2);结果是:www.google
也就是说,如果count是正数,那么就是从左往右数,第N个分隔符的左边的全部内容。相反,如果是负数,那么就是从右边开始数,第N个分隔符右边的所有内容,如:
- substring_index(str,‘.’,-2); 结果为:google.com
- concat是连接几个字符串
concat(‘wo’,‘lin’,‘xue’,‘bin’);结果就是wolinxuebin。
-- SQL33 找出每个学校GPA最低的同学
select
device_id, university, gpa
from (
select *,
row_number() over (partition by university order by gpa asc) as rn
from user_profile
) a where a.rn = 1;
本题目是窗口函数,进行排序,row_number() over(partition by 分组列 order by 排序列 desc)
:
总结
该部分为对非技术快速入门篇的总结,重点是对一些函数的使用:
- round(x,d)
- union
- union all
- case when (条件) then (结果) else (结果) end
- year(date)
- month(date)
- day(date)
- DATE_ADD(date,INTERVAL expr unit)
- substring_index(s,split,index)
- row_number() over(partition by A order by C)
- rank()
- dense_rank()