本文主要来源课堂笔记和PPT
为什么会有SQL优化?
因为SQL基于关系代数,可以进行等价变换
SQL的优化与所写的SQL语句有关,有时为了优化SQL,需要重写SQL。
在某查询中,条件A可以保留10%的结果,条件B可以保留80%的结果,出于减少查询的中间状态数的目的,此时我们称A条件是好条件。
假设有100条记录。
先A后B: 100→10→8
先B后A: 100→80→8
好条件要先做查询
避免在最高层使用distinct。
例如,在选择人名的时候,会出现同名的现象。
刘嘉故事会时间:
比如选出南京最近六个月买宝马的人,有买7辆不同颜色配衣服的刘嘉,也有买打折宝马T恤的刘嘉。
下面的错误示例都是说的 distinct
错误的SQL语句1:
select distinct c.custname
from customers c
join orders o
on o.custid = c.custid
join orderdetail od
on od.ordid = o.ordid
join articles a
on a.artid = od.artid
where c.city = ‘Nanjing'
and a.artname = ‘BMW'
and o.ordered >= somefunc /*函数,返回六个月前的具体日期*/
错误的SQL语句2:
select distinct c.custname
from customers c,
orders o,
orderdetail od,
articles a
where c.city = ‘Nanjing'
and c.custid = o.custid
and o.ordid = od.ordid
and od.artid = a.artid
and a.artname = ‘BMW'
and o.ordered >= somefunc
摆脱distinct的方法:
显然是需要引入一个新的量来描述不同的顾客,常用的还是custid
使用嵌套子查询的方法:
select c.custname
from customers c
where c.city = ‘Nanjing'
and exists (select null
from orders o,
orderdetail od,
articles a
where a.artname = ‘BMW'
and a.artid = od.artid
and od.ordid = o.ordid
and o.custid = c.custid
and o.ordered >= somefunc )
使用非关联子查询的方法:
select custname
from customers
where city = ‘Nanjing'
and custid in (select o.custid
from orders o,
orderdetail od,
articles a
where a.artname = ‘BMW'
and a.artid = od.artid
and od.ordid = o.ordid
and o.ordered >= somefunc)
如果使用嵌套子查询,那么就需要外层的select条件比较好
通过SQL改写来指引查询优化器工作
- 找到分辨率最强的条件
- 解决方案不止一种,查询和数据隐含的假设密切相关
- 预先考虑优化器的工作,以确定它能找到所需要的数据
大数据量查询
- 尽快剔除不需要的数据,即使用好条件
将子查询转换为JOIN
- 不包含聚合函数,不出现多种条件选择则不需要子查询
例:
Jobs(employee,title) Ranks(title,rank)Salary(rank,payment)
使用子查询的方法:
Select payment from salary where rank=
(select rank from ranks where title=
(select title from jobs where employee = ‘…’))
使用JOIN的方法:
Select payment from salary, ranks,jobs
Where salary.rank = ranks.rank
And ranks.title = jobs.title
And jobs.employee = ‘…’
查询不存在的内容
例:
是否存在某个等级当前没有分配职位
暴力方法:
Select salary.ranks from salary
Where rank NOT IN (select rank from ranks)
外连接:
Select salary.rank
From salary
LEFT OUTER JOIN ON(salary.rank = ranks.rank)
Where ranks.rank IS NULL
将聚合子查询转换为JOIN
例:
Orders(custid,ordered,totalitems)
需要显示每一个客户购物件数最多的日期
聚合子查询:
Select custid, ordered, totalitems
From orders o1
Where o1.ordered = (
select max(ordered)
from orders o2
where o1.custid = o2.custid )
JOIN查询:
Select o1.custid, o1.ordered, o1.totalitems
From orders o1
JOIN orders o2 on (o1.custid = o2.custid)
Group by o1.custid, o1.ordered, o1.totalitems
Having o1.ordered = max(o2.ordered)
非关联子查询变成内嵌视图
例:
在订单完成前有不同状态,记录在orderstatus(ordid,status,statusdate)中
需求是:列出所有尚未标记为完成状态的订单的下列字段:订单号,客户名,订单的最后状态,以及设置状态的时间
原始写法:
select c.custname, o.ordid, os.status, os.statusdate
from customers c,
orders o,
orderstatus os
where o.ordid = os.ordid
and not exists (select null
from orderstatus os2
where os2.status = 'COMPLETE'
and os2.ordid = o.ordid)
and os.statusdate = (select max(statusdate)
from orderstatus os3
where os3.ordid = o.ordid)
and o.custid = c.custid
修改后写法:
select c.custname, o.ordid, os.status, os.statusdate
from customers c,
orders o,
orderstatus os,
(select ordid, max(statusdate) laststatusdate
from orderstatus
group by ordid) x
where o.ordid = os.ordid
and os.statusdate = x.laststatusdate
and os.ordid = x.ordid
and os.status != 'COMPLETE'
and o.custid = c.custid