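8. Grouped Queries (★★★★★)
8.1 What Is a Grouped Query
In practice you often need to split the rows into groups first and then operate on each group. That calls for a grouped query, written with group by.
Syntax: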
select ...
from ...
group by ...
What is the total salary of each department?
What is the average salary for each job?
What is the highest salary for each job?
...
8.2 Keyword Order (★★★★★)
select ...
from ...
where ...
group by ...
having ...
order by ...
# Execution order:
1. from
2. where
3. group by
4. having
5. select
6. order by
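Read the rows from the table,
use the where condition to keep only the rows of interest,
group those rows,
optionally filter the groups further with having,
evaluate the select list to produce the output columns,
and finally sort the output.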
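To make the logical order concrete, the sketch below traces one query by hand in Python, one step per clause. The query itself (the job <> 'PRESIDENT' filter and the 2500 threshold) is an illustrative assumption rather than something taken from these notes, and a real database is free to execute the statement differently as long as the result is the same.

```python
from collections import defaultdict

# (ename, job, sal, deptno) rows copied from the emp table in this document.
EMP = [
    ("SMITH", "CLERK", 800, 20), ("ALLEN", "SALESMAN", 1600, 30),
    ("WARD", "SALESMAN", 1250, 30), ("JONES", "MANAGER", 2975, 20),
    ("MARTIN", "SALESMAN", 1250, 30), ("BLAKE", "MANAGER", 2850, 30),
    ("CLARK", "MANAGER", 2450, 10), ("SCOTT", "ANALYST", 3000, 20),
    ("KING", "PRESIDENT", 5000, 10), ("TURNER", "SALESMAN", 1500, 30),
    ("ADAMS", "CLERK", 1100, 20), ("JAMES", "CLERK", 950, 30),
    ("FORD", "ANALYST", 3000, 20), ("MILLER", "CLERK", 1300, 10),
]

# Hypothetical query being traced:
#   select deptno, max(sal) from emp
#   where job <> 'PRESIDENT'
#   group by deptno
#   having max(sal) > 2500
#   order by max(sal) desc

# 1. from: start with every row of the table.
rows = list(EMP)

# 2. where: filter individual rows, before any group exists.
rows = [r for r in rows if r[1] != "PRESIDENT"]

# 3. group by: collect the salaries of each department.
groups = defaultdict(list)
for ename, job, sal, deptno in rows:
    groups[deptno].append(sal)

# 4. having: filter whole groups, so aggregates are allowed here.
kept = {deptno: sals for deptno, sals in groups.items() if max(sals) > 2500}

# 5. select: produce one output row per remaining group.
selected = [(deptno, max(sals)) for deptno, sals in kept.items()]

# 6. order by: sort the output rows.
result = sorted(selected, key=lambda row: row[1], reverse=True)

print(result)  # [(20, 3000), (30, 2850)]
```

Department 10 drops out at step 4: once KING has been removed by where, its highest remaining salary is 2450.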
Why can't a group function be used directly in a where clause?
# Example 1:
mysql> select ename, sal from emp where sal > min(sal);
ERROR 1111 (HY000): Invalid use of group function
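Because a group function can only be evaluated after the rows have been grouped, and where runs before any grouping has taken place. That is why a group function cannot appear in a where clause.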
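A common way around this restriction is to compute the aggregate in a subquery, so that the outer where only compares against a plain value. The sketch below is an illustration under assumptions: it uses Python's built-in sqlite3 module so that it runs without a MySQL server, and SQLite rejects the original statement with its own error rather than MySQL's ERROR 1111.

```python
import sqlite3

# (ename, sal) pairs copied from the emp table in this document.
EMP = [
    ("SMITH", 800), ("ALLEN", 1600), ("WARD", 1250), ("JONES", 2975),
    ("MARTIN", 1250), ("BLAKE", 2850), ("CLARK", 2450), ("SCOTT", 3000),
    ("KING", 5000), ("TURNER", 1500), ("ADAMS", 1100), ("JAMES", 950),
    ("FORD", 3000), ("MILLER", 1300),
]

conn = sqlite3.connect(":memory:")  # in-memory SQLite stands in for MySQL
conn.execute("create table emp (ename text, sal integer)")
conn.executemany("insert into emp values (?, ?)", EMP)

# The statement from Example 1: an aggregate placed directly inside where.
try:
    conn.execute("select ename, sal from emp where sal > min(sal)")
    rejected = False
except sqlite3.Error:
    rejected = True

# Workaround: the subquery is a complete select of its own, so min(sal)
# is evaluated there and the outer where compares against its result.
above_min = conn.execute(
    "select ename, sal from emp "
    "where sal > (select min(sal) from emp) "
    "order by sal"
).fetchall()

print(rejected)        # whether SQLite refused the first statement
print(len(above_min))  # number of employees paid more than the minimum
```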
# Example 2:
mysql> select sum(sal) from emp;
+----------+
| sum(sal) |
+----------+
| 29025.00 |
+----------+
1 row in set (0.00 sec)
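There is no group by here, so why does sum() work? When a statement has no group by, the whole table is treated as a single group, and select runs after the grouping step, so the group already exists by the time sum() is evaluated.
Key rule:
In a select statement that has a group by clause, the select list may contain only the grouping columns and group functions. Nothing else is allowed.
8.3 Examples
1. Find the total salary for each job:
Approach: group by job, then sum the salaries.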
mysql> select job, sum(sal)
-> from emp
-> group by job;
+-----------+----------+
| job | sum(sal) |
+-----------+----------+
| ANALYST | 6000.00 |
| CLERK | 4150.00 |
| MANAGER | 8275.00 |
| PRESIDENT | 5000.00 |
| SALESMAN | 5600.00 |
+-----------+----------+
5 rows in set (0.00 sec)
Execution order:
- First read the rows from the emp table
- Then group them by the job column
- Finally compute sum(sal) for each group
The following statement fails:
mysql> select ename,job,sum(sal)
-> from emp
-> group by job;
ERROR 1055 (42000): Expression #1 of SELECT list is not in GROUP BY clause and contains nonaggregated column 'bjpowernod.emp.ENAME' which is not functionally dependent on columns in GROUP BY clause; this is incompatible with sql_mode=only_full_group_by
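Key rule (again):
ename is neither a grouping column nor a group function, so it cannot appear in the select list. In a select statement that has a group by clause, the select list may contain only the grouping columns and group functions.
2. Find the highest salary in each department
Approach: group by department number, then take the maximum of each group.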
mysql> select deptno, max(sal) from emp group by deptno;
+--------+----------+
| deptno | max(sal) |
+--------+----------+
| 10 | 5000.00 |
| 20 | 3000.00 |
| 30 | 2850.00 |
+--------+----------+
3 rows in set (0.00 sec)
3. Find the highest salary for each job within each department
mysql> select deptno, job, max(sal)
-> from emp
-> group by deptno, job;
+--------+-----------+----------+
| deptno | job | max(sal) |
+--------+-----------+----------+
| 10 | CLERK | 1300.00 |
| 10 | MANAGER | 2450.00 |
| 10 | PRESIDENT | 5000.00 |
| 20 | ANALYST | 3000.00 |
| 20 | CLERK | 1100.00 |
| 20 | MANAGER | 2975.00 |
| 30 | CLERK | 950.00 |
| 30 | MANAGER | 2850.00 |
| 30 | SALESMAN | 1600.00 |
+--------+-----------+----------+
9 rows in set (0.00 sec)
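Grouping by two columns means that each distinct (deptno, job) combination forms its own group. The sketch below reruns the statement against the same data, assuming Python's built-in sqlite3 module as a stand-in for MySQL; the order by is added only to make the row order deterministic.

```python
import sqlite3

# (ename, job, sal, deptno) rows copied from the emp table in this document.
EMP = [
    ("SMITH", "CLERK", 800, 20), ("ALLEN", "SALESMAN", 1600, 30),
    ("WARD", "SALESMAN", 1250, 30), ("JONES", "MANAGER", 2975, 20),
    ("MARTIN", "SALESMAN", 1250, 30), ("BLAKE", "MANAGER", 2850, 30),
    ("CLARK", "MANAGER", 2450, 10), ("SCOTT", "ANALYST", 3000, 20),
    ("KING", "PRESIDENT", 5000, 10), ("TURNER", "SALESMAN", 1500, 30),
    ("ADAMS", "CLERK", 1100, 20), ("JAMES", "CLERK", 950, 30),
    ("FORD", "ANALYST", 3000, 20), ("MILLER", "CLERK", 1300, 10),
]

conn = sqlite3.connect(":memory:")  # in-memory SQLite stands in for MySQL
conn.execute(
    "create table emp (ename text, job text, sal integer, deptno integer)"
)
conn.executemany("insert into emp values (?, ?, ?, ?)", EMP)

# One output row per distinct (deptno, job) pair.
max_by_dept_job = conn.execute(
    "select deptno, job, max(sal) from emp "
    "group by deptno, job "
    "order by deptno, job"
).fetchall()

for row in max_by_dept_job:
    print(row)
```

The 14 employees collapse into 9 groups because several of them share both a department and a job, for example the four SALESMAN rows in department 30.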
# Appendix: contents of the emp table used in the examples
mysql> select * from emp;
+-------+--------+-----------+------+------------+---------+---------+--------+
| EMPNO | ENAME | JOB | MGR | HIREDATE | SAL | COMM | DEPTNO |
+-------+--------+-----------+------+------------+---------+---------+--------+
| 7369 | SMITH | CLERK | 7902 | 1980-12-17 | 800.00 | NULL | 20 |
| 7499 | ALLEN | SALESMAN | 7698 | 1981-02-20 | 1600.00 | 300.00 | 30 |
| 7521 | WARD | SALESMAN | 7698 | 1981-02-22 | 1250.00 | 500.00 | 30 |
| 7566 | JONES | MANAGER | 7839 | 1981-04-02 | 2975.00 | NULL | 20 |
| 7654 | MARTIN | SALESMAN | 7698 | 1981-09-28 | 1250.00 | 1400.00 | 30 |
| 7698 | BLAKE | MANAGER | 7839 | 1981-05-01 | 2850.00 | NULL | 30 |
| 7782 | CLARK | MANAGER | 7839 | 1981-06-09 | 2450.00 | NULL | 10 |
| 7788 | SCOTT | ANALYST | 7566 | 1987-04-19 | 3000.00 | NULL | 20 |
| 7839 | KING | PRESIDENT | NULL | 1981-11-17 | 5000.00 | NULL | 10 |
| 7844 | TURNER | SALESMAN | 7698 | 1981-09-08 | 1500.00 | 0.00 | 30 |
| 7876 | ADAMS | CLERK | 7788 | 1987-05-23 | 1100.00 | NULL | 20 |
| 7900 | JAMES | CLERK | 7698 | 1981-12-03 | 950.00 | NULL | 30 |
| 7902 | FORD | ANALYST | 7566 | 1981-12-03 | 3000.00 | NULL | 20 |
| 7934 | MILLER | CLERK | 7782 | 1982-01-23 | 1300.00 | NULL | 10 |
+-------+--------+-----------+------+------------+---------+---------+--------+
14 rows in set (0.00 sec)
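8.4 Using having to Filter Groups After Grouping
having filters the groups produced by group by, so it is normally paired with group by. It is not a substitute for where.
Find the highest salary in each department, showing only the departments whose highest salary is greater than 3000:
- Step 1: find the highest salary in each department
- Step 2: keep only the departments whose highest salary is greater than 3000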
mysql> select deptno, max(sal) from emp group by deptno;
+--------+----------+
| deptno | max(sal) |
+--------+----------+
| 10 | 5000.00 |
| 20 | 3000.00 |
| 30 | 2850.00 |
+--------+----------+
3 rows in set (0.00 sec)
mysql> select deptno, max(sal)
-> from emp
-> group by deptno
-> having max(sal) > 3000;
+--------+----------+
| deptno | max(sal) |
+--------+----------+
| 10 | 5000.00 |
+--------+----------+
1 row in set (0.00 sec)
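Question: isn't the statement above somewhat inefficient?
==> It can be: every row is grouped first, and most of the groups are then discarded. Here the rows with a salary above 3000 can be picked out first and grouped afterwards. The result is the same because a department's highest salary exceeds 3000 exactly when at least one of its rows does: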
mysql> select deptno, max(sal)
-> from emp
-> where sal > 3000
-> group by deptno;
+--------+----------+
| deptno | max(sal) |
+--------+----------+
| 10 | 5000.00 |
+--------+----------+
1 row in set (0.00 sec)
Summary (optimization strategy):
Between where and having, prefer where. Fall back to having only when where cannot express the condition.
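The two statements above can be compared side by side. This is a minimal sketch assuming Python's built-in sqlite3 module in place of MySQL; it checks that both forms return the same rows and counts how many rows reach the grouping step in each case. Actual performance depends on the database and its optimizer, which this sketch does not measure.

```python
import sqlite3

# (ename, sal, deptno) rows copied from the emp table in this document.
EMP = [
    ("SMITH", 800, 20), ("ALLEN", 1600, 30), ("WARD", 1250, 30),
    ("JONES", 2975, 20), ("MARTIN", 1250, 30), ("BLAKE", 2850, 30),
    ("CLARK", 2450, 10), ("SCOTT", 3000, 20), ("KING", 5000, 10),
    ("TURNER", 1500, 30), ("ADAMS", 1100, 20), ("JAMES", 950, 30),
    ("FORD", 3000, 20), ("MILLER", 1300, 10),
]

conn = sqlite3.connect(":memory:")  # in-memory SQLite stands in for MySQL
conn.execute("create table emp (ename text, sal integer, deptno integer)")
conn.executemany("insert into emp values (?, ?, ?)", EMP)

# Filter after grouping: all rows are grouped, then groups are dropped.
with_having = conn.execute(
    "select deptno, max(sal) from emp "
    "group by deptno having max(sal) > 3000"
).fetchall()

# Filter before grouping: only the rows that pass where are grouped.
with_where = conn.execute(
    "select deptno, max(sal) from emp "
    "where sal > 3000 group by deptno"
).fetchall()

# Rows that logically reach the group by step in each form.
rows_grouped_having = conn.execute("select count(*) from emp").fetchone()[0]
rows_grouped_where = conn.execute(
    "select count(*) from emp where sal > 3000"
).fetchone()[0]

print(with_having, with_where)
print(rows_grouped_having, rows_grouped_where)
```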
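What can where not do?
Find the average salary of each department, showing only the departments whose average salary is above 2500:
# The following statement fails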
mysql> select deptno, avg(sal) from emp where avg(sal) > 2500 group by deptno;
ERROR 1111 (HY000): Invalid use of group function
Correct approach:
- Step 1: find the average salary of each department
- Step 2: keep only the departments whose average salary is above 2500
# Step 1
mysql> select deptno, avg(sal)
-> from emp
-> group by deptno;
+--------+-------------+
| deptno | avg(sal) |
+--------+-------------+
| 10 | 2916.666667 |
| 20 | 2175.000000 |
| 30 | 1566.666667 |
+--------+-------------+
3 rows in set (0.00 sec)
# Step 2
mysql> select deptno, avg(sal)
-> from emp
-> group by deptno
-> having avg(sal) > 2500;
+--------+-------------+
| deptno | avg(sal) |
+--------+-------------+
| 10 | 2916.666667 |
+--------+-------------+
1 row in set (0.01 sec)
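The same two steps can be reproduced outside MySQL. The sketch below assumes Python's built-in sqlite3 module as a stand-in; SQLite returns the averages as floating-point numbers, so they are rounded only for display.

```python
import sqlite3

# (ename, sal, deptno) rows copied from the emp table in this document.
EMP = [
    ("SMITH", 800, 20), ("ALLEN", 1600, 30), ("WARD", 1250, 30),
    ("JONES", 2975, 20), ("MARTIN", 1250, 30), ("BLAKE", 2850, 30),
    ("CLARK", 2450, 10), ("SCOTT", 3000, 20), ("KING", 5000, 10),
    ("TURNER", 1500, 30), ("ADAMS", 1100, 20), ("JAMES", 950, 30),
    ("FORD", 3000, 20), ("MILLER", 1300, 10),
]

conn = sqlite3.connect(":memory:")  # in-memory SQLite stands in for MySQL
conn.execute("create table emp (ename text, sal integer, deptno integer)")
conn.executemany("insert into emp values (?, ?, ?)", EMP)

# Step 1: average salary of every department.
avg_by_dept = conn.execute(
    "select deptno, avg(sal) from emp group by deptno order by deptno"
).fetchall()

# Step 2: the condition is on an aggregate, so it has to go in having.
above_2500 = conn.execute(
    "select deptno, avg(sal) from emp "
    "group by deptno having avg(sal) > 2500"
).fetchall()

for deptno, avg_sal in avg_by_dept:
    print(deptno, round(avg_sal, 2))
print([deptno for deptno, _ in above_2500])
```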
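8.5 distinct: Removing Duplicates from Query Results
The deduplication keyword is distinct. The data in the table is not modified; only the query result is deduplicated.
distinct may only appear before all of the columns.
# Example 1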
mysql> select distinct job from emp;
+-----------+
| job |
+-----------+
| CLERK |
| SALESMAN |
| MANAGER |
| ANALYST |
| PRESIDENT |
+-----------+
5 rows in set (0.00 sec)
# Example 2
# distinct may only appear before all of the columns
# Wrong
mysql> select ename, distinct job from emp;
ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'distinct job from emp' at line 1
# Correct: the two columns are deduplicated as a combination
mysql> select distinct job, deptno from emp;
+-----------+--------+
| job | deptno |
+-----------+--------+
| CLERK | 20 |
| SALESMAN | 30 |
| MANAGER | 20 |
| MANAGER | 30 |
| MANAGER | 10 |
| ANALYST | 20 |
| PRESIDENT | 10 |
| CLERK | 30 |
| CLERK | 10 |
+-----------+--------+
9 rows in set (0.00 sec)
# Example 3: count the number of distinct jobs
mysql> select count(distinct job) from emp;
+---------------------+
| count(distinct job) |
+---------------------+
| 5 |
+---------------------+
1 row in set (0.00 sec)
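The three uses of distinct can be checked together. This is a minimal sketch assuming Python's built-in sqlite3 module in place of MySQL; because the order of distinct rows is not guaranteed, it compares counts and sets rather than row order.

```python
import sqlite3

# (ename, job, deptno) rows copied from the emp table in this document.
EMP = [
    ("SMITH", "CLERK", 20), ("ALLEN", "SALESMAN", 30),
    ("WARD", "SALESMAN", 30), ("JONES", "MANAGER", 20),
    ("MARTIN", "SALESMAN", 30), ("BLAKE", "MANAGER", 30),
    ("CLARK", "MANAGER", 10), ("SCOTT", "ANALYST", 20),
    ("KING", "PRESIDENT", 10), ("TURNER", "SALESMAN", 30),
    ("ADAMS", "CLERK", 20), ("JAMES", "CLERK", 30),
    ("FORD", "ANALYST", 20), ("MILLER", "CLERK", 10),
]

conn = sqlite3.connect(":memory:")  # in-memory SQLite stands in for MySQL
conn.execute("create table emp (ename text, job text, deptno integer)")
conn.executemany("insert into emp values (?, ?, ?)", EMP)

# One column: each job appears once.
jobs = {row[0] for row in conn.execute("select distinct job from emp")}

# Two columns: a row is a duplicate only if both values match.
pairs = conn.execute("select distinct job, deptno from emp").fetchall()

# distinct inside a group function: count the different jobs.
job_count = conn.execute("select count(distinct job) from emp").fetchone()[0]

# The table itself is untouched by any of the queries above.
total_rows = conn.execute("select count(*) from emp").fetchone()[0]

print(sorted(jobs))
print(len(pairs), job_count, total_rows)
```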