具体业务场景:根据某些字段组合去重得到所有字段结果。
distinct:
distinct这个关键字来过滤掉多余的重复记录只保留一条,但往往只用它来返回不重复记录的条数,而不是用它来返回不重记录的所有值。其原因是 distinct只能返回它的目标字段,而无法返回其它字段。
$ select count(distinct name) from students;
根据两个字段组合去重
$ select distinct name,sex from students;
group by:
GROUP BY 语句根据一个或多个列对结果集进行分组。
在分组的列上我们可以使用 COUNT, SUM, AVG,等函数。
解决上述业务场景:
$ select * from table group by fieldA,fieldB,fieldc;
分组后的条件使用 HAVING 来限定,WHERE 是对原始数据进行条件限制。几个关键字的使用顺序为 where 、group by 、having、order by
举个栗子:
$ select *,count(*) as count from drug_pool group by drugLicense having count>1 order by id;
性能对比
我们来拿leetcode上的 连续出现的数字 来作为业务需求,进行对比。
example1:
SELECT DISTINCT
l.Num AS ConsecutiveNums
FROM
LOGS AS l
LEFT JOIN LOGS AS l2 ON l.id = l2.id - 1
LEFT JOIN LOGS AS l3 ON l.id = l3.id - 2
WHERE
l.Num = l2.Num
AND l.Num = l3.Num;
用时608ms
example2:
SELECT
l.Num AS ConsecutiveNums
FROM
LOGS AS l
LEFT JOIN LOGS AS l2 ON l.id = l2.id - 1
LEFT JOIN LOGS AS l3 ON l.id = l3.id - 2
WHERE
l.Num = l2.Num
AND l.Num = l3.Num
GROUP BY
l.Num;
用时1041ms
当然这也是有偏差的,但至少告诉我们group by 性能更高一点。