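Grouping and Filtering Data
MySQL can group the retrieved rows, compute aggregates over each group, and filter the resulting groups. The SELECT clauses involved are GROUP BY and HAVING.
Source data: SELECT * FROM products;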
+---------+---------+----------------+------------+----------------------------------------------------------------+
| prod_id | vend_id | prod_name | prod_price | prod_desc |
+---------+---------+----------------+------------+----------------------------------------------------------------+
| ANV01 | 1001 | .5 ton anvil | 5.99 | .5 ton anvil, black, complete with handy hook |
| ANV02 | 1001 | 1 ton anvil | 9.99 | 1 ton anvil, black, complete with handy hook and carrying case |
| ANV03 | 1001 | 2 ton anvil | 14.99 | 2 ton anvil, black, complete with handy hook and carrying case |
| DTNTR | 1003 | Detonator | 13.00 | Detonator (plunger powered), fuses not included |
| FB | 1003 | Bird seed | 10.00 | Large bag (suitable for road runners) |
| FC | 1003 | Carrots | 2.50 | Carrots (rabbit hunting season only) |
| FU1 | 1002 | Fuses | 3.42 | 1 dozen, extra long |
| JP1000 | 1005 | JetPack 1000 | 35.00 | JetPack 1000, intended for single use |
| JP2000 | 1005 | JetPack 2000 | 55.00 | JetPack 2000, multi-use |
| OL1 | 1002 | Oil can | 8.99 | Oil can, red |
| SAFE | 1003 | Safe | 50.00 | Safe with combination lock |
| SLING | 1003 | Sling | 4.49 | Sling, one size fits all |
| TNT1 | 1003 | TNT (1 stick) | 2.50 | TNT, red, single stick |
| TNT2 | 1003 | TNT (5 sticks) | 10.00 | TNT, red, pack of 10 sticks |
+---------+---------+----------------+------------+----------------------------------------------------------------+
14 rows in set (0.00 sec)
Creating Groups
Count the number of products supplied by each vendor in the products table:
SELECT vend_id, COUNT(*) AS num_prods FROM products GROUP BY vend_id;
+---------+-----------+
| vend_id | num_prods |
+---------+-----------+
| 1001 | 3 |
| 1002 | 2 |
| 1003 | 7 |
| 1005 | 2 |
+---------+-----------+
4 rows in set (0.00 sec)
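The GROUP BY clause tells MySQL to split the rows into groups and then run the aggregate calculation once per group rather than once over the whole result set.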
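COUNT(*) is only one of the aggregates that can be computed per group. As a minimal sketch against the same products table (the column aliases are illustrative choices, not part of the original notes), several aggregates can be computed for each vendor in one pass:

```sql
-- One output row per vendor; every aggregate is evaluated over that vendor's rows only.
SELECT vend_id,
       COUNT(*)        AS num_prods,
       MIN(prod_price) AS min_price,
       MAX(prod_price) AS max_price,
       AVG(prod_price) AS avg_price
FROM products
GROUP BY vend_id;
```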
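Rules for GROUP BY:
- A GROUP BY clause can contain any number of columns. This allows groups to be nested, which gives finer control over how the data is grouped.
For example: SELECT vend_id, prod_price, COUNT(*) AS num_prods FROM products GROUP BY vend_id, prod_price;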
+---------+------------+-----------+
| vend_id | prod_price | num_prods |
+---------+------------+-----------+
| 1001 | 5.99 | 1 |
| 1001 | 9.99 | 1 |
| 1001 | 14.99 | 1 |
| 1002 | 3.42 | 1 |
| 1002 | 8.99 | 1 |
| 1003 | 2.50 | 2 |
| 1003 | 4.49 | 1 |
| 1003 | 10.00 | 2 |
| 1003 | 13.00 | 1 |
| 1003 | 50.00 | 1 |
| 1005 | 35.00 | 1 |
| 1005 | 55.00 | 1 |
+---------+------------+-----------+
12 rows in set (0.00 sec)
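- If groups are nested in the GROUP BY clause, the data is summarized at the last group specified. In other words, the aggregates are computed over the combination of all listed columns, not for each column separately.
- Every column listed in GROUP BY must be a retrieved column or a valid expression (but not an aggregate function). If SELECT uses an expression, specify the same expression in GROUP BY. Standard SQL does not allow an alias there; MySQL accepts one as an extension, but repeating the expression keeps the query portable.
- Apart from aggregate expressions, every column in the SELECT list must appear in the GROUP BY clause (MySQL enforces this when the ONLY_FULL_GROUP_BY SQL mode is enabled).
- If the grouping column contains NULL values, NULL is returned as a group. Multiple NULL values are placed in the same group.
- The GROUP BY clause must come after any WHERE clause and before any ORDER BY clause.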
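Two of the rules above can be illustrated with a short sketch on the products table. The 10-unit price bands, the `prod_price >= 5` threshold, and the aliases are assumptions made for the example only:

```sql
-- Grouping by an expression: bucket products into 10-unit price bands.
-- The expression is repeated in GROUP BY instead of using the alias,
-- which is the portable form.
SELECT FLOOR(prod_price / 10) * 10 AS price_band,
       COUNT(*) AS num_prods
FROM products
GROUP BY FLOOR(prod_price / 10) * 10;

-- Clause order: WHERE first, then GROUP BY, then ORDER BY.
SELECT vend_id, COUNT(*) AS num_prods
FROM products
WHERE prod_price >= 5
GROUP BY vend_id
ORDER BY num_prods DESC, vend_id;
```

Writing the clauses in any other order is a syntax error, so the second query also serves as a template for combining row filtering, grouping, and sorting.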
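Filtering Groups
MySQL uses the HAVING clause to filter groups. HAVING is very similar to WHERE, but WHERE filters rows (records) while HAVING filters groups: the data is grouped first and filtered afterwards. For example, to find every customer in the orders table who has placed at least two orders, we first group the rows by customer id, then keep only the groups with a row count of 2 or more, and return the corresponding customer ids. Filtering on groups requires HAVING, as follows: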
mysql> SELECT cust_id,COUNT(*) AS orders FROM orders GROUP BY cust_id HAVING COUNT(*) >= 2;
+---------+--------+
| cust_id | orders |
+---------+--------+
| 10001 | 2 |
+---------+--------+
1 row in set (0.01 sec)
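A HAVING condition can use any aggregate, not just COUNT(*). The following is a minimal sketch on the products table; the threshold of 10 and the alias `avg_price` are assumptions chosen for illustration:

```sql
-- Keep only the vendors whose average product price is at least 10.
-- The aggregate is evaluated once per vend_id group, after grouping.
SELECT vend_id, AVG(prod_price) AS avg_price
FROM products
GROUP BY vend_id
HAVING AVG(prod_price) >= 10;
```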
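HAVING and WHERE are similar, but they differ in important ways:
- WHERE filters before the data is grouped and applies to individual rows; HAVING filters after the data is grouped and applies to groups.
- Because WHERE runs before grouping and aggregation, it controls which rows enter a group or an aggregate, so it cannot contain aggregate functions. HAVING conditions usually do contain aggregate functions, because deciding which groups qualify typically requires a per-group calculation.
- Most WHERE conditions can be rewritten with HAVING, but not the reverse. Doing so is discouraged because it obscures the intent of the query.
- WHERE and HAVING can be used together: WHERE controls which rows take part in grouping, and HAVING determines which of the resulting groups qualify, as follows: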
mysql> SELECT vend_id,COUNT(*) AS num_prods FROM products WHERE prod_price >= 10 GROUP BY vend_id HAVING COUNT(*) >= 2;
+---------+-----------+
| vend_id | num_prods |
+---------+-----------+
| 1003 | 4 |
| 1005 | 2 |
+---------+-----------+
2 rows in set (0.00 sec)
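Rows that satisfy prod_price >= 10 are grouped first, and then only the groups with at least 2 rows are kept.
For comparison, here is the result without the WHERE filter: all rows are grouped, and then the groups with at least 2 rows are kept: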
mysql> SELECT vend_id,COUNT(*) AS num_prods FROM products GROUP BY vend_id HAVING COUNT(*) >= 2;
+---------+-----------+
| vend_id | num_prods |
+---------+-----------+
| 1001 | 3 |
| 1002 | 2 |
| 1003 | 7 |
| 1005 | 2 |
+---------+-----------+
4 rows in set (0.00 sec)
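The earlier point that a WHERE condition can often be written with HAVING, but not the other way round, can be sketched as follows. The choice of vendor 1003 as the excluded vendor is arbitrary and only for illustration:

```sql
-- Preferred: drop vendor 1003's rows before any grouping happens.
SELECT vend_id, COUNT(*) AS num_prods
FROM products
WHERE vend_id <> 1003
GROUP BY vend_id;

-- Also accepted: group every row first, then discard the 1003 group.
SELECT vend_id, COUNT(*) AS num_prods
FROM products
GROUP BY vend_id
HAVING vend_id <> 1003;

-- Not interchangeable in the other direction: an aggregate cannot appear in WHERE.
-- SELECT vend_id FROM products WHERE COUNT(*) >= 2 GROUP BY vend_id;  -- rejected with an error
```

Both runnable queries return the same groups, but the WHERE form states the intent (a row-level condition) more clearly and avoids grouping rows that are discarded anyway.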