一、汇总数据
我们经常需要汇总数据而不用把它们实际检索出来,为此sql提供了专门的函数。
聚集函数
为了方便数据汇总,sql给出了5个聚集函数,与前面的数据处理函数不同,sql的聚集函数得到了主要sql实现相当一值的支持。
函数 | 说明 |
---|---|
AVG() | 返回某列的平均值 |
COUNT() | 返回某列的行数 |
MAX() | 返回某列的最大值 |
MIN() | 返回某列的最小值 |
SUM() | 返回某列之和 |
在每个函数中可以使用DISTINCT
关键字,这样能起到去重的效果,例如对于1 3 4 4
四个数据,不使用DISTINCT
的AVG()函数会计算4个数的均值,得到4,而使用的话则只统计1 3 4
三个数,得到3。
值得一提的是这5个聚集函数经过特别的优化,具有相当的效率。
使用例子:
select avg(distinct score) as distinct_avg,
avg(score) as avg_score,
max(score) as max_score,
min(score) as min_score
from students;
输出为:
+--------------+-----------+-----------+-----------+
| distinct_avg | avg_score | max_score | min_score |
+--------------+-----------+-----------+-----------+
| 50.0000 | 51.1860 | 100 | 0 |
+--------------+-----------+-----------+-----------+
二、分组数据
1、使用group by创建分组
按分组统计是非常常见的需求,sql里使用group by字句实现数据的分组统计,理解它的最好办法是看一个例子:
select sex,avg(score)
from student group by sex;
输出为:
+--------+------------+
| sex | avg(score) |
+--------+------------+
| female | 52.2207 |
| male | 50.0605 |
+--------+------------+
如上所示,此命令分男女统计了均分。
group by的使用有如下几个要点:
- 可以包含任意数目的列,使得能对分组进行嵌套,进行更精细的控制。
- 如果嵌套了分组,数据将在最后规定的进行汇总。
- group by字句列出的列必须是检索列或者有效的表达式(不能是聚集函数),如果在select里使用了表达式,则group by字句也必须使用相同的表达式,不能使用别名。
- 大多数sql实现不允许group by列带有长度可变的数据类型
- 除聚集计算语句以外,select的每个列都必须在group by字句给出。
- NULL值也被作为一个单独的分组返回。
一条命令理解group by嵌套:
select age,sex,avg(score)
from students group by age,sex;
输出为:
+-----+--------+------------+
| age | sex | avg(score) |
+-----+--------+------------+
| 0 | female | 48.6000 |
| 0 | male | 31.0000 |
| 1 | female | 66.0000 |
| 1 | male | 61.0000 |
| 2 | female | 81.0000 |
| 2 | male | 55.5000 |
| 3 | female | 50.0000 |
| 3 | male | 17.5000 |
| 4 | female | 52.8000 |
| 4 | male | 76.5000 |
......
可以看到,group by字句依据后面所跟列的次序形成了一个类似树状图的关系:
2、使用having过滤分组
having句法与where完全相同,例:
select age,sex,score
from students
group by age,sex
having age<10;
将过滤出年龄小于10的分组:
+-----+--------+-------+
| age | sex | score |
+-----+--------+-------+
| 0 | female | 18 |
| 0 | male | 34 |
| 1 | female | 38 |
| 1 | male | 72 |
| 2 | female | 75 |
| 2 | male | 8 |
| 3 | female | 21 |
| 3 | male | 33 |
| 4 | female | 80 |
| 4 | male | 62 |
| 5 | female | 74 |
| 5 | male | 97 |
| 6 | female | 14 |
| 6 | male | 96 |
| 7 | female | 25 |
| 7 | male | 36 |
| 8 | female | 96 |
| 8 | male | 65 |
| 9 | female | 12 |
| 9 | male | 91 |
+-----+--------+-------+
这与以下命令等效:
select age,sex,score from students
where age<10
group by age,sex;
二者的区别在于,where作用于表中所有的行,而having仅在分出的组上进行过滤。
二者可以结合起来完成一些精细的操作:
select sex,age,count(*)
from students
where score<60
group by sex,age
having count(*)>=4;
输出为:
+--------+-----+----------+
| sex | age | count(*) |
+--------+-----+----------+
| female | 4 | 5 |
| female | 6 | 5 |
| female | 9 | 5 |
| female | 11 | 6 |
| female | 12 | 7 |
| female | 15 | 5 |
| female | 17 | 4 |
| female | 21 | 5 |
| female | 23 | 6 |
| female | 25 | 5 |
| female | 26 | 4 |
| female | 30 | 4 |
| female | 34 | 4 |
| female | 37 | 5 |
| female | 41 | 5 |
| female | 43 | 5 |
......
上面语句实现的功能为分性别统计不同年龄段的不及格人数,由此可以总结出where与having字句的适用场景:对group by列之外的列过滤需要使用where,整表过滤使用where,对基于分组的统计结果进行过滤则需使用having。
3、结合order by字句进行排序
若要进行排序,需要使用order by字句:
select sex,age,count(*)
from students
where score<60
group by sex,age
having count(*)>=4
order by age;
输出为:
+--------+-----+----------+
| sex | age | count(*) |
+--------+-----+----------+
| male | 2 | 4 |
| female | 4 | 5 |
| female | 6 | 5 |
| female | 9 | 5 |
| female | 11 | 6 |
| female | 12 | 7 |
| female | 15 | 5 |
| female | 17 | 4 |
| female | 21 | 5 |
| female | 23 | 6 |
......
三、select后跟的字句顺序
子句 | 说明 | 是否必须 |
---|---|---|
select | 要返回的列或表达式 | 是 |
from | 从中检索数据的表 | 仅在从表选择数据时使用 |
where | 行级过滤 | 否 |
group by | 分组说明 | 仅在按组计算聚集时使用 |
having | 组级过滤 | 否 |
order by | 输出排序顺序 | 否 |