牛客SQL刷题笔记

  • 共81道
  • AS 均可省略
  • 留意 * 的使用会节省很多时间
  • 看清楚要用几张表以及是否要用子查询
  • 比如求薪水差值或比大小或求多组,多次并列薪水表
  • 分解问题
  • count(1)和count(*)一样统计所有行数且不会忽略列值为NULL,count(列名)只包括列名那一列且会忽略列值为空的计数
  1. 最晚入职的当天未必就一个人,也许有多人,使用排序并限制得只能取得指定数量的结果
    要加上select否则会报错invalid use of…
SELECT * FROM employees
    WHERE hire_date = (SELECT max(hire_date) FROM employees);
  1. 首先需要加distinct去重,再用where查询出满足条件的所有员工。或者limit 2,1。此类问题,四种解法:1. distinct + limit 2. 来自表用开窗函数加个排序 3. 先确定序号,然后找对应信息 4. max不断嵌套
SELECT *
FROM employees
WHERE hire_date = (
    SELECT DISTINCT hire_date
    FROM employees
    ORDER BY hire_date DESC
    LIMIT 1 OFFSET 2);
  1. 分清楚谁应该是主表,以及加上日期的限制(INNER JOIN好像都可以)
SELECT s.*, d.dept_no
FROM salaries AS s
INNER JOIN dept_manager AS d
ON s.emp_no = d.emp_no
WHERE s.to_date = '9999-01-01' AND d.to_date = '9999-01-01';
SELECT e.last_name, e.first_name, d.dept_no
FROM employees AS e
INNER JOIN dept_emp AS d
ON e.emp_no = d.emp_no;
SELECT e.last_name, e.first_name, d.dept_no
FROM employees as e
LEFT JOIN dept_emp as d
ON e.emp_no = d.emp_no;
  1. COUNT()函数不可用于WHERE语句中,故使用HAVING语句来限定t>15的条件
SELECT emp_no, COUNT(emp_no) AS t 
FROM salaries 
GROUP BY emp_no 
HAVING t > 15;
  1. 大表一般用distinct效率不高,大数据量的时候都禁止用distinct,建议用group by解决重复问题
SELECT salary FROM salaries
WHERE to_date = '9999-01-01' 
GROUP BY salary
ORDER BY salary DESC;
  1. MySQL官方文档有说明,in关键字适合确定数量的情况,一般效率较低,不推荐使用
SELECT employees.emp_no FROM employees LEFT JOIN dept_manager
ON employees.emp_no = dept_manager.emp_no
WHERE dept_no IS NULL;
  1. <> 和 != 一样代表不等于
SELECT de.emp_no, dm.emp_no AS manager_no 
FROM dept_emp AS de INNER JOIN dept_manager AS dm
ON de.dept_no = dm.dept_no 
WHERE dm.to_date = '9999-01-01' AND de.to_date = '9999-01-01' AND de.emp_no <> dm.emp_no;

12. 此题常见漏洞:
(1). MAX(salary) 和 emp_no 不一定对应. emp_no 直接和 group by dept_no一起使用,拿到了最大 salary 但是存在 emp_no 取值其实与 salary 不匹配的问题;
(2). 先使用 group by 获得最高 salary,再去用最高 salary 匹配两表返回 dept_no, emp_no 信息,这存在A部门的最高薪水,等于B部门非最高薪水时,B部门的非最高薪水也会被显示出来。

SELECT t1.dept_no, t1.emp_no, t1.salary
FROM (SELECT d.emp_no, d.dept_no, s.salary 
      FROM dept_emp AS d 
      INNER JOIN salaries AS s 
      ON d.emp_no = s.emp_no 
      WHERE d.to_date = '9999-01-01' AND s.to_date = '9999-01-01') AS t1
INNER JOIN (SELECT d.dept_no, max(salary) AS salary 
      FROM dept_emp d 
      INNER JOIN salaries s 
      ON d.emp_no = s.emp_no 
      WHERE d.to_date = '9999-01-01' AND s.to_date = '9999-01-01' 
      GROUP BY d.dept_no) t2
ON t1.dept_no = t2.dept_no AND t1.salary = t2.salary
ORDER BY t1.dept_no ASC;
  1. 取奇数位运算方法 e.emp_no & 1 = 1
SELECT * FROM employees
WHERE emp_no % 2 = 1
AND last_name != 'Mary'
ORDER BY hire_date DESC;
  1. inner join 也可以
    一张完整表,一张部门和最大薪水表
SELECT t.title, avg(s.salary)
FROM titles AS t, salaries AS s  
WHERE t.emp_no = s.emp_no
GROUP BY t.title 
ORDER BY avg(s.salary) ASC;
  1. 这类问题一定要考虑多个人相同工资
    rank()是跳跃排序,结果是1133;dense_rank()是连续排序,结果是1122;row_number连续且唯一,结果是1234
    重命名不要和函数名一致
    不需要partition by
SELECT emp_no, salary
FROM salaries
WHERE salary = (SELECT salary 
                FROM salaries 
                GROUP BY salary 
                ORDER BY salary DESC 
                LIMIT 1, 1)
AND to_date = '9999-01-01';
  1. 这题还有一种炫技的解法,inner join salaries s2 on s1.salary <= s2.salary (100 98 97 关联后 100 的对应100 98 97,98 对应 98 97) ,然后 group by, having count() = 2
SELECT e.emp_no, s.salary, e.last_name, e.first_name
FROM employees AS e
INNER JOIN salaries AS s 
ON e.emp_no = s.emp_no 
AND s.to_date = '9999-01-01'
AND s.salary = (SELECT max(salary)
                FROM salaries
                WHERE salary < (SELECT max(salary) 
                                FROM salaries 
                                WHERE to_date='9999-01-01')
                AND to_date='9999-01-01');
  1. 两个 left join,观察问题和表之间的联系
SELECT e.last_name, e.first_name, d.dept_name
FROM employees AS e 
LEFT JOIN dept_emp AS de ON e.emp_no = de.emp_no
LEFT JOIN departments AS d ON de.dept_no = d.dept_no;

21. 看清问题,留意关键信息 hire_date, 两个 inner join

SELECT e.emp_no, (a.salary-b.salary) AS growth
FROM employees AS e
INNER JOIN salaries AS a
ON e.emp_no = a.emp_no AND a.to_date = '9999-01-01'
INNER JOIN salaries AS b
ON e.emp_no = b.emp_no AND b.from_date = e.hire_date
ORDER BY growth ASC;
  1. 有使用 join ,和 不使用 join 直接拼接两种方式,这里给出后者(要加上group by,否则count只有一个)
SELECT  dm.dept_no, dm.dept_name, count(*) AS sum
FROM departments AS dm, dept_emp AS de, salaries AS s
WHERE dm.dept_no = de.dept_no
AND de.emp_no = s.emp_no
GROUP BY dm.dept_no;
  1. 注意 distinct 的使用;
    同时学习 <= 这种排序的方法;
    另若不使用 group by 则会由于 count 的使用导致只有一个结果;
    还有最后两个排序。
    或使 dense_rank,注意 order by 之后加 desc
SELECT s1.emp_no, s1.salary, COUNT(DISTINCT s2.salary) AS t_rank
FROM salaries AS s1, salaries AS s2
WHERE s1.to_date = '9999-01-01'  AND s2.to_date = '9999-01-01' AND s1.salary <= s2.salary
GROUP BY s1.emp_no
ORDER BY s1.salary DESC, s1.emp_no ASC;

注意最后两个 order by 的顺序

select emp_no, salary, DENSE_RANK() over (order by salary desc) t_rank
from salaries order by t_rank asc,emp_no asc;
  1. 使用 != 会报错 ‘Subquery returns more than 1 row’,所以使用 not in
    差集(left join)然后找出为null的地方也可以
SELECT de.dept_no, e.emp_no, s.salary
FROM employees AS e 
INNER JOIN dept_emp AS de ON e.emp_no = de.emp_no
INNER JOIN salaries AS s ON e.emp_no = s.emp_no
WHERE de.to_date='9999-01-01' AND s.to_date ='9999-01-01'
AND e.emp_no NOT IN (SELECT emp_no FROM dept_manager WHERE to_date = '9999-01-01');
select dm.dept_no, e.emp_no, s.salary
from employees e join salaries s on e.emp_no = s.emp_no join dept_emp de on e.emp_no = de.emp_no join dept_manager dm on de.dept_no = dm.dept_no
where e.emp_no <> dm.emp_no
  1. 先判断一共要用几张表
SELECT sem.emp_no AS emp_no, sdm.emp_no AS manager_no, sem.salary AS emp_salary, sdm.salary AS manager_salary
FROM (SELECT s.salary, s.emp_no, de.dept_no FROM salaries s INNER JOIN dept_emp de
ON s.emp_no = de.emp_no AND s.to_date = '9999-01-01' ) AS sem, 
(SELECT s.salary, s.emp_no, dm.dept_no FROM salaries s INNER JOIN dept_manager dm
ON s.emp_no = dm.emp_no AND s.to_date = '9999-01-01' ) AS sdm
WHERE sem.dept_no = sdm.dept_no AND sem.salary > sdm.salary;

也可以拼三张表

SELECT de.emp_no, dm.emp_no manager_no, s.salary emp_salary , s1.salary manager_salary
FROM dept_emp de
LEFT JOIN dept_manager dm ON de.dept_no = dm.dept_no
LEFT JOIN salaries s ON de.emp_no = s.emp_no
LEFT JOIN salaries s1 ON dm.emp_no = s1.emp_no
WHERE s.salary > s1.salary

26. 两个 group by

SELECT de.dept_no, dp.dept_name, t.title, COUNT(t.title) AS count
FROM titles AS t
INNER JOIN dept_emp AS de ON t.emp_no = de.emp_no 
AND de.to_date = '9999-01-01' AND t.to_date = '9999-01-01'
INNER JOIN departments AS dp ON de.dept_no = dp.dept_no
GROUP BY de.dept_no, t.title
ORDER BY de.dept_no ASC;
  1. 不需要使用join,先找满足 >= 5 这个条件的。这题有一个陷阱,电影数目>=5 是这类电影的所有数目,并不是包含了robot的这类电影的数目
SELECT c.name, COUNT(fc.film_id) 
FROM (select category_id, COUNT(film_id) AS category_num 
      FROM film_category
      GROUP BY category_id
      HAVING count(film_id) >= 5) AS cc, film AS f, film_category AS fc, category AS c
WHERE  f.description LIKE '%robot%'
AND f.film_id = fc.film_id
AND c.category_id = fc.category_id
AND c.category_id = cc.category_id;
SELECT f.film_id, f.title
FROM film AS f
LEFT JOIN film_category AS fc ON f.film_id = fc.film_id
WHERE fc.category_id IS NULL;
  1. 子查询 / join
SELECT f.title, f.description
from film as f
where f.film_id in (select fc.film_id 
                    from film_category as fc 
                    where fc.category_id in (select c.category_id
                                             from category as c
                                             where c.name = "Action"));

  • 对一些函数的掌握
SELECT CONCAT(last_name," ",first_name) AS name FROM employees
CREATE TABLE IF NOT EXISTS actor (
    actor_id smallint(5) NOT NULL PRIMARY KEY COMMENT'主键id',
    first_name varchar(45) NOT NULL COMMENT'名字',
    last_name varchar(45) NOT NULL COMMENT '姓氏',
    last_update date NOT NULL COMMENT '日期'
);
INSERT INTO actor
VALUES (1, 'PENELOPE', 'GUINESS', '2006-02-15 12:34:33'),
       (2, 'NICK', 'WAHLBERG', '2006-02-15 12:34:33')
INSERT IGNORE INTO actor
VALUES ('3','ED','CHASE','2006-02-15 12:34:33');
  1. 注意分号,且不是插入值不用 values
CREATE TABLE IF NOT EXISTS actor_name (
first_name varchar(45) NOT NULL COMMENT "名字",
last_name varchar(45) NOT NULL COMMENT "姓氏");
INSERT INTO actor_name
SELECT first_name, last_name FROM actor;

37.

ALTER TABLE actor ADD UNIQUE uniq_idx_firstname (first_name);
ALTER TABLE actor ADD INDEX idx_lastname (last_name);
CREATE VIEW actor_name_view AS 
SELECT first_name AS first_name_v, last_name AS last_name_v
FROM actor;

39.

SELECT  *
FROM salaries
FORCE INDEX(idx_emp_no)
WHERE emp_no = 10005;

40.

ALTER TABLE actor ADD COLUMN create_date DATETIME NOT NULL DEFAULT '2020-10-01 00:00:00' AFTER last_update;

41.

CREATE TRIGGER audit_log
AFTER INSERT ON employees_test
FOR EACH ROW
BEGIN
    INSERT INTO audit VALUES(new.id,new.name);
END

42. MySQL中不允许在子查询的同时删除表数据(不能一边查一边把查的表删了)
而且后面必须要有 AS a,否则报错 SQL_ERROR_INFO: ‘Every derived table must have its own alias’

DELETE FROM titles_test
WHERE id NOT IN(
    SELECT * 
    FROM (SELECT MIN(id)
          FROM titles_test
          GROUP BY emp_no) AS a);
  1. 注意若干列 to_date = NULL 和 from_date = ‘2001-01-01’ 之间只能用逗号连接
UPDATE titles_test SET to_date = NULL, from_date = '2001-01-01'
WHERE to_date = '9999-01-01';
UPDATE titles_test
SET emp_no = REPLACE(emp_no, 10001, 10005)
WHERE id = 5;
ALTER TABLE titles_test
RENAME TO titles_2017;

46.

ALTER TABLE audit
ADD CONSTRAINT FOREIGN KEY (emp_no)
REFERENCES employees_test(id);
UPDATE salaries SET salary=salary*1.1
WHERE to_date='9999-01-01' AND salaries.emp_no IN (SELECT emp_no FROM emp_bonus)
SELECT CONCAT(last_name,"'",first_name) FROM employees;
SELECT LENGTH("10,A,B") - LENGTH(REPLACE("10,A,B",",","")) AS cnt;
SELECT first_name FROM employees ORDER BY RIGHT(first_name, 2);
  1. group_concat
SELECT dept_no, group_concat(emp_no) AS employees
FROM dept_emp
GROUP BY dept_no

54.

SELECT (SUM(salary) - MAX(salary) - MIN(salary)) / (COUNT(1)-2) avg_salary
FROM salaries WHERE to_date = '9999-01-01';
SELECT * FROM employees LIMIT 5,5
  1. WHERE employees.emp_no = dept_emp.emp_no 相当于拼接(dept_emp 在左),找出 dept_emp 中没有的 emp_no 。exists 会通过循环来查找结果,where 之后的判断相当于是循环的判断条件,注意一定是否定的形式,否则 exists 若能返回结果集则全部输出
SELECT *
FROM employees
WHERE NOT EXISTS (SELECT emp_no
                  FROM dept_emp
                  WHERE employees.emp_no = dept_emp.emp_no);

59.

SELECT e.emp_no, e.first_name, e.last_name, b.btype, s.salary, 
(CASE b.btype WHEN 1 THEN s.salary * 0.1
 WHEN 2 THEN s.salary * 0.2
 ELSE s.salary * 0.3 END) AS bonus
FROM employees AS e 
INNER JOIN emp_bonus AS b ON e.emp_no = b.emp_no
INNER JOIN salaries AS s ON e.emp_no = s.emp_no AND s.to_date = '9999-01-01'
SELECT emp_no, salary, SUM(salary) OVER (ORDER BY emp_no) AS running_total
FROM salaries WHERE to_date = '9999-01-01';
  1. 注意题目中要求 “输出时不需排序” ,所以多接了个 inner join 变回原来的顺序
SELECT e.first_name
FROM employees e 
INNER JOIN (SELECT first_name, ROW_NUMBER() OVER(ORDER BY first_name ASC) AS r_num
            FROM employees) AS t
ON e.first_name = t.first_name
WHERE t.r_num % 2 = 1;

字符串可以比大小,且注意题中是升序(用>=)最后不排序

SELECT e1.first_name FROM employees e1
WHERE (SELECT count(*) FROM employees e2 WHERE e1.first_name >= e2.first_name) % 2 = 1;
SELECT number FROM grade GROUP BY number HAVING COUNT(number)>=3;
SELECT id, number, dense_rank() OVER (ORDER BY number DESC) AS t_tank
FROM passing_number ORDER BY t ASC, id ASC;
SELECT p.id, p.name, t.content
FROM person AS p
LEFT JOIN task AS t ON t.person_id = p.id
ORDER BY p.id ASC

65.

SELECT date, round(sum(type = "no_completed") / count(*), 3) AS p
FROM email AS t1
INNER JOIN user AS t2 ON t1.send_id = t2.id
INNER JOIN user AS t3 ON t1.receive_id = t3.id
WHERE t2.is_blacklist = 0 AND t3.is_blacklist = 0
GROUP BY date
ORDER BY date;

  1. MAX(date) 而不是放在 group by 后面的 having
SELECT user_id, MAX(date) AS d
FROM login 
GROUP BY user_id
ORDER BY user_id;
  1. 单取 id 和 max(date) 是可以直接用的,但是要取多个就要用子查询(直接用会出现不匹配)
SELECT u.name AS u_n, c.name AS c_n, l.date
FROM login AS l
INNER JOIN user AS u ON l.user_id = u.id
INNER JOIN client AS c ON l.client_id = c.id
WHERE (l.user_id,l.date) IN (SELECT user_id, max(date) 
                             FROM login 
                             GROUP BY login.user_id)
ORDER BY u.name;
select u.name, c.name, t.d as date
from user as u, client as c, (select *, max(date) over(partition by user_id) as d from login) as t
where u.id = t.user_id
and c.id = t.client_id
and t.date = t.d
order by u.name;

68. 现在可以不用*1.0,(user_id, date)要加括号, DATE_ADD(min(date), INTERVAL 1 DAY)。也可以通过拼接,条件是user_id相等,以及时间=时间+1/-1

SELECT ROUND(COUNT(DISTINCT user_id) / (SELECT COUNT(DISTINCT user_id) FROM login), 3)
FROM login
WHERE (user_id, date)
IN (SELECT user_id, DATE_ADD(min(date), INTERVAL 1 DAY) FROM login GROUP BY user_id);

69. 还可以用 row_number() 找 rank = 1 来判断是否是新用户

SELECT date, SUM(CASE WHEN (user_id, date) IN (SELECT user_id, MIN(date)
                                               FROM login 
                                               GROUP BY user_id)
                 THEN 1 ELSE 0 END)
FROM login
GROUP BY date
ORDER BY date ASC;

70. 注意 INTERVAL -1 day, ifnull 和 group by 后接的是 user_id (注意留存用户的判断是两个条件)
此题也可使用 left join 做

SELECT date, IFNULL(ROUND(SUM(CASE WHEN (user_id, date) IN (SELECT user_id, MIN(date) FROM login GROUP BY user_id)
                      AND (user_id, date) IN (SELECT user_id, DATE_ADD(date, INTERVAL -1 day) FROM login)
                      THEN 1 ELSE 0 END) / SUM(CASE WHEN (user_id, date) IN (SELECT user_id, MIN(date) FROM login GROUP BY user_id)
                                              THEN 1 ELSE 0 END), 3), 0) AS p
FROM login
GROUP BY date
ORDER BY date ASC;

71. 开窗函数(不像普通聚合函数那样每组只返回一个值,开窗函数可以为每组返回多个值)
order by 可以拼着写,名称也可用重命名之后的。另:其中一个表用不到

SELECT u.name AS u_n, p.date, SUM(number) OVER (PARTITION BY p.user_id ORDER BY p.date) AS ps_num
FROM passing_number AS p
LEFT JOIN user AS u
ON p.user_id = u.id
ORDER BY p.date, u_n;

SELECT job, ROUND(avg(score), 3)
FROM grade
GROUP BY job
ORDER BY avg(score) DESC;
  1. 注意命名和判断是哪个表的数据
SELECT a.id, a.job, a.score
FROM grade AS a
LEFT JOIN (SELECT b.job, avg(b.score) AS avgscore
           FROM grade b
           GROUP BY b.job) AS c
ON a.job = c.job
WHERE a.score > c.avgscore
ORDER BY a.id;
  1. 命名不能与窗口函数名称冲突,且排序的窗口函数要给ORDER BY,根据题中出现三个的情况只能是 DENSE_RANK()
SELECT g.id, l.name, g.score
FROM (SELECT *, DENSE_RANK() OVER (PARTITION BY language_id ORDER BY score desc) AS rank_num FROM grade) AS g, language AS l
WHERE g.language_id = l.id AND g.rank_num <= 2
ORDER BY l.name ASC, g.score DESC, g.id ASC;
  1. floor(去尾整数化)也可以
SELECT a.job, ROUND(COUNT(a.id)/2) as start, ROUND((COUNT(a.id)+1)/2) as end
FROM grade AS a
GROUP BY a.job
ORDER BY a.job;
  1. 无论奇偶,【中位数的位置】与【(个数+1)/ 2 】之间的距离小于1(寻找统一的规则)
SELECT id, job, score, t_rank
FROM (SELECT *, 
     (ROW_NUMBER() OVER (PARTITION BY job ORDER BY score DESC)) AS t_rank,
     (COUNT(score) OVER (PARTITION BY job)) AS num
     FROM grade) AS t1
WHERE ABS(t1.t_rank-(t1.num+1)/2)<1
ORDER BY id;

对grade表添加组内排名列作为表1,各组中位数的表(也就是上题的查询)作为表2,
用相同组(job)关联表1表2,用where筛出表1中的组内排名等于表2中组内start位置或end位置。

select t1.id
        ,t1.job
        ,t1.score
        ,t1.s_rank
from 
(select id,job,score
        ,(row_number()over(partition by job order by score desc))as s_rank
        from grade)t1
join 
(select job
        ,case when count(score)%2=0 then ceiling(count(score)/2) else ceiling(count(score)/2)
        end as start1
        ,case when count(score)%2=0 then ceiling(count(score)/2+1) else ceiling(count(score)/2)
        end as end1
from grade
group by job) t2
on t1.job=t2.job 
where t1.s_rank=t2.start1 or t1.s_rank=t2.end1
order by t1.id;

对于条件多的问题,数字类型的判断(排序/个数统计的等)可以放到外部再判断,这样会简单些

SELECT * 
FROM order_info 
WHERE date > '2025-10-15' 
AND status = 'completed' 
AND product_name IN ('C++', 'Java', 'Python') 
ORDER BY id;
SELECT user_id
FROM order_info
WHERE date > '2025-10-15'
AND product_name in ('C++','Java','Python')
AND status = 'completed'
GROUP BY user_id
HAVING COUNT(user_id) > 1
ORDER BY user_id;
  1. 两个表也可以做,外表多加个寻找 >= 2 的 id 的子查询(注意若直接 group by 然后提取 id 每个 user_id 只会输出一个)
    两个表不能是相同的名字, group by 之后也不能再加查询
SELECT t1.id, t1.user_id, t1.product_name, t1.status, t1.client_id, t1.date
FROM (SELECT *, COUNT(id) OVER (PARTITION BY user_id) AS number
      FROM order_info
      WHERE DATEDIFF(date, "2025-10-15") > 0
      and status ="completed"
      and product_name in ("C++","Java","Python")) AS t1
WHERE t1.number > 1
ORDER BY t1.id;
  1. 直接聚合,或者用 row_number() 排序
SELECT user_id, MIN(date) AS first_buy_date, COUNT(user_id) AS cnt
FROM order_info
WHERE date > '2025-10-15'
AND product_name IN ('C++','Java','Python')
AND status = 'completed'
GROUP BY user_id
HAVING count(user_id) > 1
ORDER BY user_id
  1. 使用 lead 函数也可以 (总之都是实现为0还是取值,相当于下面的 IF(date_rk = 1, date, NULL))
    LEAD(col,n,DEFAULT) 用于统计窗口内往下第n行值,参数1为列名,参数2为往下第n行(可选,默认为1),参数3为默认值(当往下第n行为NULL时候,取默认值,如不指定,则为NULL)
SELECT user_id, MAX(IF(date_rk = 1, date, NULL)), MAX(IF(date_rk = 2, date, NULL)), MAX(cnt) 
FROM (SELECT user_id,
      date, 
      RANK() OVER (PARTITION BY user_id ORDER BY date) AS date_rk, 
      count(1) OVER (PARTITION BY user_id) AS cnt
      FROM order_info
      WHERE date>'2025-10-15' AND product_name in ('C++','Java','Python') AND status='completed') AS t1
WHERE cnt >= 2
GROUP BY user_id
ORDER BY user_id;
select
 a.user_id,
 min(a.date) as first_buy_date,
 a.next_date as second_buy_date,
 count(*) as cnt
from
    (select
     * ,
     lead(date,1,0) over(partition by user_id order by date) as next_date
    from order_info
    where date>='2025-10-16'
      and status='completed'
      and product_name in('C++','Java','Python')
    ) a
group by a.user_id having count(*)>=2
order by a.user_id ;

泛用的写法可在min/max内嵌套if/iif/case…when

select
 a.user_id,
 min(a.date) as first_buy_date,
 max(a.date) as second_buy_date,
 a.cnt
from
    (select
     user_id,
     date,
     row_number() over(partition by user_id order by date) as rank_no,
     count(*) over(partition by user_id) as cnt
    from order_info
    where date>='2025-10-16'
      and status='completed'
      and product_name in('C++','Java','Python')
    ) a
where a.rank_no<=2 and a.cnt>=2
group by a.user_id,a.cnt
order by a.user_id ;
  1. 这个系列的问题关键就在于把 >=2 这个条件和其它条件分开
SELECT t.id, t.is_group_buy, c.name
FROM (SELECT *, COUNT(*) OVER (PARTITION BY user_id) AS cnt
      FROM order_info
      WHERE date >'2025-10-15'
      AND product_name IN ('C++', 'Java', 'Python')
      AND status = 'completed') AS t
LEFT JOIN client AS c ON t.client_id = c.id
WHERE t.cnt >= 2
ORDER BY t.id;
  1. 要么多拼一列计数的然后筛选,要么把包含 >=2 这个条件的作为子查询结合不包含这个条件的子查询
SELECT IF(T1.is_group_buy = "No", T2.name, 'GroupBuy') AS source, COUNT(*) AS cnt
FROM (SELECT *, COUNT(*) OVER (PARTITION BY user_id) AS ucnt
      FROM order_info 
      WHERE date > '2025-10-15'
      AND product_name IN ('C++', 'Python', 'Java')
      AND status = 'completed') AS T1
LEFT JOIN client T2 ON T2.id = T1.client_id
WHERE ucnt > 1
GROUP BY source
ORDER BY source;
SELECT IFNULL(T2.name, 'GroupBuy') AS source, COUNT(*) AS cnt
FROM (SELECT *, COUNT(*) OVER (PARTITION BY user_id) AS ucnt
      FROM order_info 
      WHERE date > '2025-10-15'
      AND product_name IN ('C++', 'Python', 'Java')
      AND status = 'completed') AS T
LEFT JOIN client T2 ON T2.id = T.client_id
WHERE ucnt > 1
GROUP BY source
ORDER BY source;

SELECT job, sum(num) AS cnt
FROM resume_info
WHERE YEAR(date) = "2025"
GROUP BY job
ORDER BY cnt DESC;
  1. date_format(date,‘%Y-%m’) 也可以用
SELECT job, LEFT(date, 7) AS mon, SUM(num) AS cnt
FROM resume_info
WHERE YEAR(date) = 2025
GROUP BY job, mon
ORDER BY mon DESC, cnt DESC
  1. group by 后接两个条件,join 后是两个条件
SELECT s1.job, DATE_FORMAT(s1.date, '%Y-%m') AS first_year_mon, s1.cnt AS first_year_cnt, DATE_FORMAT(s2.date, '%Y-%m') AS second_year_mon, s2.cnt AS second_year_cnt
FROM (SELECT job, date, SUM(num) AS cnt
      FROM resume_info
      WHERE YEAR(date) = '2025'
      GROUP BY job, MONTH(date)) AS s1
JOIN (SELECT job, date, SUM(num) AS cnt
      FROM resume_info
      WHERE YEAR(date) = '2026'
      GROUP BY job, MONTH(date)) AS s2
ON s1.job = s2.job AND MONTH(s1.date) = MONTH(s2.date)
ORDER BY first_year_mon DESC, job DESC

select grade, sum(number) over (order by grade) as t_rank
from class_grade
order by grade;
  1. sum 是聚合函数所以要单独一个 select 才能输出多个
SELECT t.grade FROM (SELECT grade,
                    (SELECT SUM(number) FROM class_grade) AS total,
                    SUM(number) OVER (ORDER BY grade) AS a,
                    SUM(number) OVER (ORDER BY grade DESC) AS b
                    FROM class_grade) AS t
WHERE t.a >= t.total / 2 AND t.b >= t.total / 2
ORDER BY t.grade;

SELECT u.name, SUM(grade_num) AS grade_sum
FROM user AS u
JOIN grade_info AS g
ON u.id = g.user_id
GROUP BY u.id
ORDER BY grade_sum DESC
LIMIT 0, 1;
  1. 因为要多次使用同一个表,所以创建临时表
WITH tmp_table AS (
    SELECT t1.id, t1.name, SUM(t2.grade_num) AS grade_sum
    FROM user AS t1 
    JOIN grade_info AS t2 ON t1.id = t2.user_id
    GROUP BY t1.id)
SELECT id, name, grade_sum
FROM tmp_table
WHERE grade_sum = (SELECT MAX(grade_sum) FROM tmp_table)
ORDER BY id;
  1. case when 或者 if
select t1.id, t1.name, t0.grade as grade_sum
from (
     select user_id, grade, rank() over (order by grade desc) as t
     from(
         select user_id, sum(if(type='add',grade_num, -1*grade_num)) as grade
         from grade_info
         group by user_id
         ) a0
) t0
join user as t1
on t0.user_id = t1.id
where t = 1

补充:

  1. 查询连续登录7天用户(1)因为每天用户登录次数可能不止一次,所以需要先将用户每天的登录日期去重(2)再用row_number() over(partition by _ order by _)函数将用户id分组,按照登陆时间进行排序(3)计算登录日期减去第二步骤得到的结果值,用户连续登陆情况下,每次相减的结果都相同(4)按照id和日期分组并求和,筛选大于等于7的即为连续7天登陆的用户。
select id,count(*)
from (select *,date(日期)-cum as 结果 
   	 from (select *,row_number() over (PARTITION by id order by 日期) as cum
   	 		  from (select DISTINCT date(date) as 日期,id 
   	 		           from log_data) a) b) c GROUP BY id,结果 having count(*)>=7;
  1. 每日活跃用户及次日留存
-- 每天的活跃用户数
select login_date, count(distinct user_id) as 活跃用户数
from user
where login_date between"2021-01-01" and "2021-02-01"
group by login_date;

-- 次日留存率
select u1.login_date,  -- 日期
   count(distinct u2.user_id), -- 次日留存
   count(distinct u2.user_id) / count(distinct u1.user_id)   -- 次日留存率
from user u1
left join user u2 
on u1.user_id = u2.user_id
and u1.login_date between "2021-01-01" and "2021-02-01" 
and datediff(d, u1.login_date, u2.login_date) = 1
group by u1.login_date;
  1. 同时输出次日、三日、七日留存数和留存率
-- 只有一张登录表:
select u1.login_date,
   count (distinct case when by_days = 1 then u2.user_id else null end)) as 次日留存数
   count (distinct case when by_days = 1 then u2.user_id else null end)) / count(distinct u1.user_id) as 次日留存率
   count (distinct case when by_days = 3 then u2.user_id else null end)) as 三日留存数
   count (distinct case when by_days = 3 then u2.user_id else null end)) / count(distinct u1.user_id) as 三日留存率
   count (distinct case when by_days = 7 then u2.user_id else null end)) as 七日留存数
   count (distinct case when by_days = 7 then u2.user_id else null end)) / count(distinct u1.user_id) as 七日留存率
from
(
 select u1.user_id, u2.user_id, u1.login_date
   datediff(d, u1.login_date, u2.login_date) as by_days
 from user u1
 left join user u2
 on u1.user_id = u2.user_id
 and u1.login_date between "2021-01-01" and "2021-02-01" 
) as tmp
group by u1.login_date;
-- 有登录表(login_log)和 注册用户表(user_info)时把自连接换成两个表user_info left join login_log即可
  1. 真题
    在这里插入图片描述
-- 分组统计
select u.gender, u.age,
	count(distinct o.用户id) as 成交用户数,
	count(o.订单id) as 成交量,
	sum(o.订单金额) as 成交金额
from 订单表 o
inner join 用户表 u -- 用户数据可能有缺失,所以要innerjoin
on u.用户id = o.用户id
and o.时间 between "2019-01-01" and "2019-03-31"
group by u.gender, u.age;

-- 留存&次月留存
-- 错误回答:(错误原因:对每个用户购买的次月时间是不同的)
select count(distinct o1.用户id) ,
	count(distinct o2.用户id) / count(distinct o1.用户id) 
from 订单表 o1
left join 订单表 o2
on o1.时间 between "2019-01-01" and "2019-03-31"
and o2.时间 between "2019-04-01" and "2019-04-30";
-- 正确回答:
select count(distinct o1.userid) as Q1季度产生订单的用户
	count(distinct o2.userid) as 这些用户的次月留存率
from 订单表 o1
left join  订单表 o2
on o1.userid = o2.userid
and o1.时间 between "2019-01-01" and "2019-03-31"
where datediff(m, o1.时间, o2.时间) = 1;-- 时间间隔一个月

在这里插入图片描述

select movie, category
from moive lateral view explode(category_list) tt as category

在这里插入图片描述
在这里插入图片描述
7. https://blog.csdn.net/sinat_26811377/article/details/107188075

  • 2
    点赞
  • 2
    收藏
    觉得还不错? 一键收藏
  • 1
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论 1
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值