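I. Data Grouping
Data grouping divides raw data into groups according to a criterion chosen to suit the statistical question at hand; the result is called grouped data. There are two grouping methods: single-value grouping and class-interval grouping. The main purpose of grouping is to observe how the data is distributed: once the data is grouped, counting how many values fall into each group (the frequency) yields a frequency distribution table.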
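II. Creating Groups
New clause: GROUP BY
- Usage: GROUP BY specifies the column(s) by which rows are grouped
Example:
For the emp table below, compute the average annual salary (sal × 12) of each department (deptno), together with the highest and lowest annual salary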
+-------+--------+-----------+------+------------+---------+---------+--------+
| EMPNO | ENAME | JOB | MGR | HIREDATE | SAL | COMM | DEPTNO |
+-------+--------+-----------+------+------------+---------+---------+--------+
| 7369 | SMITH | CLERK | 7902 | 1980-12-17 | 800.00 | NULL | 20 |
| 7499 | ALLEN | SALESMAN | 7698 | 1981-02-20 | 1600.00 | 300.00 | 30 |
| 7521 | WARD | SALESMAN | 7698 | 1981-02-22 | 1250.00 | 500.00 | 30 |
| 7566 | JONES | MANAGER | 7839 | 1981-04-02 | 2975.00 | NULL | 20 |
| 7654 | MARTIN | SALESMAN | 7698 | 1981-09-28 | 1250.00 | 1400.00 | 30 |
| 7698 | BLAKE | MANAGER | 7839 | 1981-05-01 | 2850.00 | NULL | 30 |
| 7782 | CLARK | MANAGER | 7839 | 1981-06-09 | 2450.00 | NULL | 10 |
| 7788 | SCOTT | ANALYST | 7566 | 1987-04-19 | 3000.00 | NULL | 20 |
| 7839 | KING | PRESIDENT | NULL | 1981-11-17 | 5000.00 | NULL | 10 |
| 7844 | TURNER | SALESMAN | 7698 | 1981-09-08 | 1500.00 | 0.00 | 30 |
| 7876 | ADAMS | CLERK | 7788 | 1987-05-23 | 1100.00 | NULL | 20 |
| 7900 | JAMES | CLERK | 7698 | 1981-12-03 | 950.00 | NULL | 30 |
| 7902 | FORD | ANALYST | 7566 | 1981-12-03 | 3000.00 | NULL | 20 |
| 7934 | MILLER | CLERK | 7782 | 1982-01-23 | 1300.00 | NULL | 10 |
+-------+--------+-----------+------+------------+---------+---------+--------+
SQL:
mysql> select deptno '部门' , avg(sal*12) '平均年薪',max(sal*12) '最高年薪' ,min(sal*12) '最低年薪' from emp group by deptno;
+--------+--------------+--------------+--------------+
| 部门 | 平均年薪 | 最高年薪 | 最低年薪 |
+--------+--------------+--------------+--------------+
| 20 | 26100.000000 | 36000.00 | 9600.00 |
| 30 | 18800.000000 | 34200.00 | 11400.00 |
| 10 | 35000.000000 | 60000.00 | 15600.00 |
+--------+--------------+--------------+--------------+
3 rows in set (0.00 sec)
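Here the aggregate functions avg, max, and min are no longer computed over the whole column (all 14 rows); instead, the rows are grouped by department and each function is computed once per group.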
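To reproduce the grouping examples, a scratch copy of the emp table can be built as sketched below. The rows are the ones listed in the table above; the column types are assumptions, since the original table definition is not shown. The final query counts the rows in each department, which is the kind of frequency distribution described in Section I.

```sql
-- Scratch copy of the emp table; column types are assumed, not taken from the source.
CREATE TABLE emp (
  empno    INT PRIMARY KEY,
  ename    VARCHAR(10),
  job      VARCHAR(9),
  mgr      INT,
  hiredate DATE,
  sal      DECIMAL(7,2),
  comm     DECIMAL(7,2),
  deptno   INT
);

-- The 14 rows shown in the table above.
INSERT INTO emp VALUES
  (7369, 'SMITH',  'CLERK',     7902, '1980-12-17',  800.00,    NULL, 20),
  (7499, 'ALLEN',  'SALESMAN',  7698, '1981-02-20', 1600.00,  300.00, 30),
  (7521, 'WARD',   'SALESMAN',  7698, '1981-02-22', 1250.00,  500.00, 30),
  (7566, 'JONES',  'MANAGER',   7839, '1981-04-02', 2975.00,    NULL, 20),
  (7654, 'MARTIN', 'SALESMAN',  7698, '1981-09-28', 1250.00, 1400.00, 30),
  (7698, 'BLAKE',  'MANAGER',   7839, '1981-05-01', 2850.00,    NULL, 30),
  (7782, 'CLARK',  'MANAGER',   7839, '1981-06-09', 2450.00,    NULL, 10),
  (7788, 'SCOTT',  'ANALYST',   7566, '1987-04-19', 3000.00,    NULL, 20),
  (7839, 'KING',   'PRESIDENT', NULL, '1981-11-17', 5000.00,    NULL, 10),
  (7844, 'TURNER', 'SALESMAN',  7698, '1981-09-08', 1500.00,    0.00, 30),
  (7876, 'ADAMS',  'CLERK',     7788, '1987-05-23', 1100.00,    NULL, 20),
  (7900, 'JAMES',  'CLERK',     7698, '1981-12-03',  950.00,    NULL, 30),
  (7902, 'FORD',   'ANALYST',   7566, '1981-12-03', 3000.00,    NULL, 20),
  (7934, 'MILLER', 'CLERK',     7782, '1982-01-23', 1300.00,    NULL, 10);

-- Frequency of rows per department: COUNT(*) is evaluated once per group.
-- Departments 10, 20, and 30 contain 3, 5, and 6 employees respectively.
SELECT deptno, COUNT(*) AS headcount
FROM emp
GROUP BY deptno
ORDER BY deptno;
```

The alias headcount is only illustrative. The ORDER BY is there because GROUP BY alone does not guarantee the order of the result rows.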
III. GROUP BY Usage Rules
- A GROUP BY clause can contain any number of columns, which makes it possible to nest groups and control the grouping at a finer granularity.
select deptno,job,avg(sal*12) '平均年薪',max(sal*12) '最高年薪' ,min(sal*12) '最低年薪' from emp group by deptno,job;
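- If groups are nested in the GROUP BY clause, the data is summarized at the last group specified: aggregates are computed once for each distinct combination of all the listed columns (here, each deptno/job pair), not separately for each column.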
select deptno,job,avg(sal*12) '平均年薪',max(sal*12) '最高年薪' ,min(sal*12) '最低年薪' from emp group by deptno,job;
+--------+-----------+--------------+--------------+--------------+
| deptno | job | 平均年薪 | 最高年薪 | 最低年薪 |
+--------+-----------+--------------+--------------+--------------+
| 20 | CLERK | 11400.000000 | 13200.00 | 9600.00 |
| 30 | SALESMAN | 16800.000000 | 19200.00 | 15000.00 |
| 20 | MANAGER | 35700.000000 | 35700.00 | 35700.00 |
| 30 | MANAGER | 34200.000000 | 34200.00 | 34200.00 |
| 10 | MANAGER | 29400.000000 | 29400.00 | 29400.00 |
| 20 | ANALYST | 36000.000000 | 36000.00 | 36000.00 |
| 10 | PRESIDENT | 60000.000000 | 60000.00 | 60000.00 |
| 30 | CLERK | 11400.000000 | 11400.00 | 11400.00 |
| 10 | CLERK | 15600.000000 | 15600.00 | 15600.00 |
+--------+-----------+--------------+--------------+--------------+
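- Every column listed in the GROUP BY clause must be a retrieved column or a valid expression (not an aggregate function). If the SELECT list uses an expression, the same expression must be given in the GROUP BY clause.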
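As a sketch of grouping by an expression, the query below assumes the emp table shown earlier and buckets employees into salary bands of width 1000, a simple form of the class-interval grouping from Section I. The band width and the aliases band and headcount are illustrative choices, not part of the original examples.

```sql
-- band k covers salaries from k*1000 up to (but not including) (k+1)*1000.
-- The expression in the SELECT list is repeated verbatim in GROUP BY.
-- Bands 0, 1, 2, 3, and 5 contain 2, 6, 3, 2, and 1 employees respectively.
SELECT FLOOR(sal / 1000) AS band, COUNT(*) AS headcount
FROM emp
GROUP BY FLOOR(sal / 1000)
ORDER BY band;
```

Band 4 does not appear in the result at all: GROUP BY only produces groups for values that actually occur in the data.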
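- Apart from aggregate functions, every column named in the SELECT list must also appear in the GROUP BY clause (in any order). MySQL enforces this when sql_mode includes only_full_group_by, as the error below shows.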
mysql> select job '工作',deptno '部門',avg(sal*12) '平均年薪',max(sal*12) '最高年薪' ,min(sal*12) '最低年薪' from emp group by job;
/** error */
ERROR 1055 (42000): Expression #2 of SELECT list is not in GROUP BY clause and contains nonaggregated column 'mysqltest.emp.DEPTNO' which is not functionally dependent on columns in GROUP BY clause; this is incompatible with sql_mode=only_full_group_by
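One way to get past ERROR 1055, assuming the emp table shown earlier, is to make every non-aggregated column in the SELECT list either a grouping column or the argument of an aggregate. The three variants below are sketches with illustrative aliases; which one fits depends on what the query is meant to report.

```sql
-- Option 1: group by every non-aggregated column in the SELECT list.
-- This returns one row per (job, deptno) pair instead of one row per job.
SELECT job, deptno, AVG(sal * 12) AS avg_annual
FROM emp
GROUP BY job, deptno;

-- Option 2: keep one row per job and aggregate deptno explicitly,
-- here by counting how many distinct departments employ that job.
SELECT job, COUNT(DISTINCT deptno) AS dept_count, AVG(sal * 12) AS avg_annual
FROM emp
GROUP BY job;

-- Option 3 (MySQL 5.7 and later): ANY_VALUE() returns an arbitrary deptno
-- from each group; use it only when the particular value does not matter.
SELECT job, ANY_VALUE(deptno) AS some_deptno, AVG(sal * 12) AS avg_annual
FROM emp
GROUP BY job;
```

Removing only_full_group_by from sql_mode would also silence the error, but the server would then pick the deptno value for each job arbitrarily, which is rarely what is intended.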
- If the grouping column contains NULL values, NULL is returned as a group; if several rows have NULL, they are all placed in the same group.
mysql> select comm , avg(sal*12) '平均年薪',max(sal*12) '最高年薪' ,min(sal*12) '最低年薪' from emp group by comm;
+---------+--------------+--------------+--------------+
| comm | 平均年薪 | 最高年薪 | 最低年薪 |
+---------+--------------+--------------+--------------+
| NULL | 28110.000000 | 60000.00 | 9600.00 |
| 300.00 | 19200.000000 | 19200.00 | 19200.00 |
| 500.00 | 15000.000000 | 15000.00 | 15000.00 |
| 1400.00 | 15000.000000 | 15000.00 | 15000.00 |
| 0.00 | 18000.000000 | 18000.00 | 18000.00 |
+---------+--------------+--------------+--------------+
5 rows in set (0.00 sec)
- The GROUP BY clause must come after the WHERE clause and before the ORDER BY clause.
Rows are selected first, then grouped, then sorted.
- The WITH ROLLUP modifier adds summary rows for each grouping level.
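A minimal sketch of WITH ROLLUP against the emp table shown earlier; the aliases headcount and total_sal are illustrative.

```sql
-- One row per department, plus one extra summary row whose deptno is NULL.
-- That summary row holds the grand total: 14 employees, total_sal 29025.00.
SELECT deptno, COUNT(*) AS headcount, SUM(sal) AS total_sal
FROM emp
GROUP BY deptno WITH ROLLUP;

-- With two grouping columns, ROLLUP adds a subtotal row for each deptno
-- (job is NULL) and a final grand-total row (deptno and job both NULL).
SELECT deptno, job, COUNT(*) AS headcount
FROM emp
GROUP BY deptno, job WITH ROLLUP;
```

Because the summary rows use NULL in the grouping columns, they can be confused with a genuine NULL group such as the comm example above; this is worth keeping in mind when the grouping column itself is nullable.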
IV. Filtering Groups (HAVING)
How to filter groups:
mysql> select job ,avg(sal) from emp group by job having avg(sal) > 1000;
+-----------+-------------+
| job | avg(sal) |
+-----------+-------------+
| CLERK | 1037.500000 |
| SALESMAN | 1400.000000 |
| MANAGER | 2758.333333 |
| ANALYST | 3000.000000 |
| PRESIDENT | 5000.000000 |
+-----------+-------------+
5 rows in set (0.00 sec)
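HAVING supports every operator that WHERE supports.
Difference between HAVING and WHERE
WHERE filters rows, and it does so before the data is grouped; HAVING filters groups, after grouping. Rows removed by WHERE are not part of any group.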
mysql> select job , sal from emp ;
+-----------+---------+
| job | sal |
+-----------+---------+
| CLERK | 800.00 |
| SALESMAN | 1600.00 |
| SALESMAN | 1250.00 |
| MANAGER | 2975.00 |
| SALESMAN | 1250.00 |
| MANAGER | 2850.00 |
| MANAGER | 2450.00 |
| ANALYST | 3000.00 |
| PRESIDENT | 5000.00 |
| SALESMAN | 1500.00 |
| CLERK | 1100.00 |
| CLERK | 950.00 |
| ANALYST | 3000.00 |
| CLERK | 1300.00 |
+-----------+---------+
14 rows in set (0.00 sec)
mysql> select job , sal from emp where sal >1200;
+-----------+---------+
| job | sal |
+-----------+---------+
| SALESMAN | 1600.00 |
| SALESMAN | 1250.00 |
| MANAGER | 2975.00 |
| SALESMAN | 1250.00 |
| MANAGER | 2850.00 |
| MANAGER | 2450.00 |
| ANALYST | 3000.00 |
| PRESIDENT | 5000.00 |
| SALESMAN | 1500.00 |
| ANALYST | 3000.00 |
| CLERK | 1300.00 |
+-----------+---------+
11 rows in set (0.00 sec)
mysql> select job , avg(sal) from emp where sal >1200 group by job;
+-----------+-------------+
| job | avg(sal) |
+-----------+-------------+
| SALESMAN | 1400.000000 |
| MANAGER | 2758.333333 |
| ANALYST | 3000.000000 |
| PRESIDENT | 5000.000000 |
| CLERK | 1300.000000 |
+-----------+-------------+
5 rows in set (0.00 sec)
mysql> select job , avg(sal) from emp where sal >1200 group by job having avg(sal)>1500;
+-----------+-------------+
| job | avg(sal) |
+-----------+-------------+
| MANAGER | 2758.333333 |
| ANALYST | 3000.000000 |
| PRESIDENT | 5000.000000 |
+-----------+-------------+
3 rows in set (0.00 sec)
V. Summary: SELECT Clause Order
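Keyword | SELECT | DISTINCT | FROM | WHERE | GROUP BY | HAVING | ORDER BY | LIMIT |
---|---|---|---|---|---|---|---|---|
Description | Columns or expressions to return | Remove duplicate rows | Table to retrieve from | Row-level filtering | Group specification | Group-level filtering | Output sort order | Number of rows to retrieve |
Required? | Yes | No | Only when selecting from a table | No | Only when aggregating by group | No | No | No |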
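The clause order in the table can be seen in a single statement. This is a sketch assuming the emp table shown earlier; it extends the last HAVING example from Section IV with ORDER BY and LIMIT, and the alias avg_sal and the limit of 2 are illustrative.

```sql
SELECT   job, AVG(sal) AS avg_sal   -- columns and expressions to return
FROM     emp                        -- table to retrieve from
WHERE    sal > 1200                 -- row filter, applied before grouping
GROUP BY job                        -- one group per job
HAVING   AVG(sal) > 1500            -- group filter, applied after grouping
ORDER BY avg_sal DESC               -- sort the surviving groups
LIMIT    2;                         -- keep at most two rows
-- Returns PRESIDENT (5000.000000) and ANALYST (3000.000000), in that order.
```

Writing the clauses in any other order, for example HAVING before GROUP BY or ORDER BY before GROUP BY, is a syntax error.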