1.统计作答次数
有一个试卷作答记录表exam_record,请从中统计出总作答次数total_pv、试卷已完成作答数complete_pv、已完成的试卷数complete_exam_cnt。
示例数据 exam_record表(uid用户ID, exam_id试卷ID, start_time开始作答时间, submit_time交卷时间, score得分):
示例输出:
解释:表示截止当前,有11次试卷作答记录,已完成的作答次数为7次(中途退出的为未完成状态,其交卷时间和份数为NULL),已完成的试卷有9001和9002两份。
示例1
drop table if exists exam_record;
CREATE TABLE exam_record (
id int PRIMARY KEY AUTO_INCREMENT COMMENT '自增ID',
uid int NOT NULL COMMENT '用户ID',
exam_id int NOT NULL COMMENT '试卷ID',
start_time datetime NOT NULL COMMENT '开始时间',
submit_time datetime COMMENT '提交时间',
score tinyint COMMENT '得分'
)CHARACTER SET utf8 COLLATE utf8_general_ci;
INSERT INTO exam_record(uid,exam_id,start_time,submit_time,score) VALUES
(1001, 9001, '2020-01-02 09:01:01', '2020-01-02 09:21:01', 80),
(1001, 9001, '2021-05-02 10:01:01', '2021-05-02 10:30:01', 81),
(1001, 9001, '2021-06-02 19:01:01', '2021-06-02 19:31:01', 84),
(1001, 9002, '2021-09-05 19:01:01', '2021-09-05 19:40:01', 89),
(1001, 9001, '2021-09-02 12:01:01', null, null),
(1001, 9002, '2021-09-01 12:01:01', null, null),
(1002, 9002, '2021-02-02 19:01:01', '2021-02-02 19:30:01', 87),
(1002, 9001, '2021-05-05 18:01:01', '2021-05-05 18:59:02', 90),
(1003, 9001, '2021-02-06 12:01:01', null, null),
(1003, 9001, '2021-09-07 10:01:01', '2021-09-07 10:31:01', 89),
(1004, 9001, '2021-09-06 12:01:01', null, null);
输出
11|7|2
题解
思路:
1.总作答次数:count(exam_id) as total_pv;
2.试卷已完成作答数,count(A)会忽略A的值为null的行:
count(submit_time) as complete_pv;
3.已完成的试卷数,已完成时才计数用if判断,试卷可能被完成多次,
需要去重用distinct:
count(distinct if(submit_time is not null, exam_id, null)) as complete_exam_cnt
select
count(exam_id)as total_pv,
count(submit_time) as complete,
count(distinct if(submit_time is not null, exam_id,null)) as complete_exam_cnt
from exam_record record
2.得分不小于平均分的最低分
请从试卷作答记录表中找到SQL试卷得分不小于该类试卷平均得分的用户最低得分。
示例数据 exam_record表(uid用户ID, exam_id试卷ID, start_time开始作答时间, submit_time交卷时间, score得分)
examination_info表(exam_id试卷ID, tag试卷类别, difficulty试卷难度, duration考试时长, release_time发布时间)
示例输出数据:
解释:试卷9001和9002为SQL类别,作答这两份试卷的得分有[80,89,87,90],平均分为86.5,不小于平均分的最小分数为87
示例1
drop table if exists examination_info;
CREATE TABLE examination_info (
id int PRIMARY KEY AUTO_INCREMENT COMMENT '自增ID',
exam_id int UNIQUE NOT NULL COMMENT '试卷ID',
tag varchar(32) COMMENT '类别标签',
difficulty varchar(8) COMMENT '难度',
duration int NOT NULL COMMENT '时长',
release_time datetime COMMENT '发布时间'
)CHARACTER SET utf8 COLLATE utf8_general_ci;
drop table if exists exam_record;
CREATE TABLE exam_record (
id int PRIMARY KEY AUTO_INCREMENT COMMENT '自增ID',
uid int NOT NULL COMMENT '用户ID',
exam_id int NOT NULL COMMENT '试卷ID',
start_time datetime NOT NULL COMMENT '开始时间',
submit_time datetime COMMENT '提交时间',
score tinyint COMMENT '得分'
)CHARACTER SET utf8 COLLATE utf8_general_ci;
INSERT INTO examination_info(exam_id,tag,difficulty,duration,release_time) VALUES
(9001, 'SQL', 'hard', 60, '2020-01-01 10:00:00'),
(9002, 'SQL', 'easy', 60, '2020-02-01 10:00:00'),
(9003, '算法', 'medium', 80, '2020-08-02 10:00:00');
INSERT INTO exam_record(uid,exam_id,start_time,submit_time,score) VALUES
(1001, 9001, '2020-01-02 09:01:01', '2020-01-02 09:21:01', 80),
(1002, 9001, '2021-09-05 19:01:01', '2021-09-05 19:40:01', 89),
(1002, 9002, '2021-09-02 12:01:01', null, null),
(1002, 9003, '2021-09-01 12:01:01', null, null),
(1002, 9001, '2021-02-02 19:01:01', '2021-02-02 19:30:01', 87),
(1002, 9002, '2021-05-05 18:01:01', '2021-05-05 18:59:02', 90),
(1003, 9002, '2021-02-06 12:01:01', null, null),
(1003, 9003, '2021-09-07 10:01:01', '2021-09-07 10:31:01', 86),
(1004, 9003, '2021-09-06 12:01:01', null, null);
题解
思路
1.
要找类别为SQL的试卷平均得分:
得分信息在exam_record,试卷类别在表examination_info中,因此要将两个表以exam_id连接。
从连接后的表中找到类别为SQL的试卷的分数。知识点:select...from...where...
计算得分的平均值。
2.找到类别SQL的试卷得分大于平均得分的最小值:
得分信息在exam_record,试卷类别在表examination_info中,因此要将两个表以exam_id连接。
从连接后的表中找到类别为SQL的试卷且分数大于刚刚找到的平均分的分数。知识点:select...from...where...and...
从中选出最小值。
select min(record.score) as min_score_over_avg
from exam_record record
left join examination_info info
on record.exam_id=info.exam_id
where info.tag='SQL'and record.score>=
(
select avg(record.score)
from exam_record record
left join examination_info info
on record.exam_id=info.exam_id
where info.tag='SQL'
)
3.平均活跃天数和月活人数
用户在牛客试卷作答区作答记录存储在表exam_record中,内容如下:
exam_record表(uid用户ID, exam_id试卷ID, start_time开始作答时间, submit_time交卷时间, score得分)
请计算2021年每个月里试卷作答区用户平均月活跃天数avg_active_days和月度活跃人数mau,上面数据的示例输出如下:
解释:2021年7月有2人活跃,共活跃了3天(1001活跃1天,1002活跃2天),平均活跃天数1.5;2021年9月有4人活跃,共活跃了5天,平均活跃天数1.25,结果保留2位小数。
注:此处活跃指有交卷行为。
示例1
drop table if exists exam_record;
CREATE TABLE exam_record (
id int PRIMARY KEY AUTO_INCREMENT COMMENT '自增ID',
uid int NOT NULL COMMENT '用户ID',
exam_id int NOT NULL COMMENT '试卷ID',
start_time datetime NOT NULL COMMENT '开始时间',
submit_time datetime COMMENT '提交时间',
score tinyint COMMENT '得分'
)CHARACTER SET utf8 COLLATE utf8_general_ci;
INSERT INTO exam_record(uid,exam_id,start_time,submit_time,score) VALUES
(1001, 9001, '2021-07-02 09:01:01', '2021-07-02 09:21:01', 80),
(1002, 9001, '2021-09-05 19:01:01', '2021-09-05 19:40:01', 81),
(1002, 9002, '2021-09-02 12:01:01', null, null),
(1002, 9003, '2021-09-01 12:01:01', null, null),
(1002, 9001, '2021-07-02 19:01:01', '2021-07-02 19:30:01', 82),
(1002, 9002, '2021-07-05 18:01:01', '2021-07-05 18:59:02', 90),
(1003, 9002, '2021-07-06 12:01:01', null, null),
(1003, 9003, '2021-09-07 10:01:01', '2021-09-07 10:31:01', 86),
(1004, 9003, '2021-09-06 12:01:01', null, null),
(1002, 9003, '2021-09-01 12:01:01', '2021-09-01 12:31:01', 81),
(1005, 9001, '2021-09-01 12:01:01', '2021-09-01 12:31:01', 88),
(1006, 9002, '2021-09-02 12:11:01', '2021-09-02 12:31:01', 89),
(1007, 9002, '2020-09-02 12:11:01', '2020-09-02 12:31:01', 89);
题解:
思路:
1.根据提交时间进行月分组 date_format(submit_time, '%Y%m')
2.根据提交时间submit_time不为空筛选活跃的的人
3.利用COUNT(DISTINCT expr,[expr...])检索到的行中去重后非NULL值的数目.去重同一天产生的数据
4.保留两位小数 round(expr,2)
select
date_format(submit_time, '%Y%m') as month,
round((count(distinct uid,
date_format(submit_time,'%y%m%d'))) / count(distinct uid), 2) as avg_active_days,
count(distinct uid) as mau
from exam_record
where submit_time is not null
and year(submit_time) = 2021
group by date_format(submit_time, '%Y%m')
拓展
count( DISTINCT expr,[expr…])函数
COUNT( DISTINCT expr,[expr...])
DISTINCT: 去重
expr: 字段或表达式
[expr]: 0个或多个字段
count(expr) :返回select语句检索到的行中非NULL值的数目,
相当于select expr from 表名,得到n行记录,去掉字段expr值为null的行数,即为count(expr) 的值;
如果expr为*,即count(*),则select * from 表名结果中把每个字段都为null的行去掉,最后得到的行数即为count(*)的值
COUNT(DISTINCT expr,[expr...]):
返回select语句检索到的行中去重后非NULL值的数目
[expr...]:表示可以写多个字段,COUNT(DISTINCT *) 为无效写法
举例
COUNT(DISTINCT field1, field2, field3),
表示去掉field1, field2, field3的行,
field1, field2, field3组合的结果去重后,再统计个数
4.月总刷题数和日均刷题数
现有一张题目练习记录表practice_record,示例内容如下:
请从中统计出2021年每个月里用户的月总刷题数month_q_cnt 和日均刷题数avg_day_q_cnt(按月份升序排序)以及该年的总体情况,示例数据输出如下:
解释:2021年8月共有2次刷题记录,日均刷题数为2/31=0.065(保留3位小数);2021年9月共有3次刷题记录,日均刷题数为3/30=0.100;2021年共有5次刷题记录(年度汇总平均无实际意义,这里我们按照31天来算5/31=0.161)
示例1:
drop table if exists practice_record;
CREATE TABLE practice_record (
id int PRIMARY KEY AUTO_INCREMENT COMMENT '自增ID',
uid int NOT NULL COMMENT '用户ID',
question_id int NOT NULL COMMENT '题目ID',
submit_time datetime COMMENT '提交时间',
score tinyint COMMENT '得分'
)CHARACTER SET utf8 COLLATE utf8_general_ci;
INSERT INTO practice_record(uid,question_id,submit_time,score) VALUES
(1001, 8001, '2021-08-02 11:41:01', 60),
(1002, 8001, '2021-09-02 19:30:01', 50),
(1002, 8001, '2021-09-02 19:20:01', 70),
(1002, 8002, '2021-09-02 19:38:01', 70),
(1003, 8002, '2021-08-01 19:38:01', 80);
输出
202108|2|0.065
202109|3|0.100
2021汇总|5|0.161
题解
思路:
1.筛选2021年的刷题记录:where year(submit_time)='2021'
2.获取当月的天数:DAYOFMONTH(LAST_DAY(submit_time))
3.获取当月的年月:DATE_FORMAT(submit_time, "%Y%m") as y_m
4.统计每月的总刷题数和日均刷题数:先按年月分组:group by y_m
5.统计月总刷题数:count(1)
6.月均刷题数=月总刷题数÷该月天数:count(1) / max(days_of_month)
7.汇总情况:with rollup
方式一:
select ifnull(y_m, '2021汇总') as submit_month,
count(1) as month_q_cnt,
round(count(1) / max(days_of_month), 3) as avg_day_q_cnt
from (
select
DAYOFMONTH(LAST_DAY(submit_time)) as days_of_month,
DATE_FORMAT(submit_time, "%Y%m") as y_m
from practice_record
where year(submit_time)='2021'
) as t_month_stat
group by y_m
with rollup
方式二:
with tmp as (
select DATE_FORMAT(submit_time,'%Y%m')as submit_month,
count(question_id)as month_q_cnt
from practice_record
where year(submit_time)=2021
group by submit_month
order by submit_month
)
select submit_month,month_q_cnt,
round(month_q_cnt/
datediff(concat(submit_month,'01')+interval 1 month,
concat(submit_month,'01')),3)as avg_day_q_cnt
from tmp
union (
select '2021汇总' as submit_month,
sum(month_q_cnt)as month_q_cnt,
round(sum(month_q_cnt)/31,3) as avg_day_q_cnt
from tmp
);
方式三:
select * from
(
select
date_format(submit_time,'%Y%m') as submit_month,
count(1) as month_q_cnt,
round(count(1)/
max(day(last_day(date_format(submit_time,'%Y%m%d')))),3) as avg_day_q_cnt
from practice_record
where year(submit_time) = 2021
group by date_format(submit_time,'%Y%m')
union all
select
'2021汇总' as submit_month,
count(1) as month_q_cnt,
round(count(1)/31,3) as avg_day_q_cnt
from practice_record
where year(submit_time) = 2021
) t
order by submit_month
拓展
with()函数
mysql版本在8.0之前不能使用with的写法。
with t1 as (
select
emp.emp_level as type,
count(emp.id) as n
from
emp
group by
emp.emp_level
),
t2 as (
select
d.depot_level as type,
count(d.id) as n
from
depot as d
group by
d.depot_level
)
select
t1.n/t2.n
from
t1
left join
t2
on t1.type = t2.type
特别是从多张表中取数据时,而且每张表的数据量又很大时,使用with写法可以先筛选出来一张数据量较少的表,避免全表join。