第三章 数据库的用户接口和SQL语言
3.1 数据库的用户接口
1.查询语言
- 形式化查询语言
- 表格化查询语言
- 图形化查询语言
- 受限制的自然语言查询
2.图形化工具GUI
3.APIs
4.类库
3.2 SQL语言概况
分类:
- 数据定义语言DDL:定义、删除、维护表的数据模式
- 查询语言QL:用于检索数据。e.g.:select及其子句group by、order by等
- 数据操纵语言DML:插入、删除、更新语句
- 数据控制语言DCL:控制用户对数据的访问权限
3.2.1 重要术语和概念
1.基表Base table:真正存在于磁盘上的一个关系,称为基表。
2.视图View:虚表。根据实际应用的需求,在基表上映射算出虚表,并不存储在数据库中。
3.数据类型:
4.NULL:保留字,空值
5.UNIQUE:保留字,说明某属性是否允许重复。
6.DEFAULT:保留字,指定某属性的缺省值。
7.PRIMARY KEY:
8.FOREIGN KEY:另一张表的主键
9.CHECK:保留字,定义完整性约束。
3.3 查询语句QL语句
3.3.1 查询
1.基础SQL查询
SELECT [DISTINCT] target-list
FROM relation-list
WHERE qualification
target-list:关系表中需要的属性
relation-list:关系表名
qualification:使用AND、OR、NOT等的布尔表达式
[DISTINCT]:可选。默认没有。有的话对查询结果筛选删除重复元素。
3.基于集合的查询UNION、INTERSECT
查找订过红船or绿船的水手id
SELECT S.sid
FROM Sailors S, Boats B, Reserves R
WHERE S.sid=R.sid AND R.bid=B.bid
AND (B.color=‘red’ OR B.color=‘green’)
SELECT S.sid
FROM Sailors S, Boats B, Reserves R
WHERE S.sid=R.sid AND R.bid=B.bid
AND B.color=‘red’
UNION //并
SELECT S.sid
FROM Sailors S, Boats B, Reserves R
WHERE S.sid=R.sid AND R.bid=B.bid
AND B.color=‘green’
查找订过红船and绿船的水手的id
//法1:自连接方法
SELECT S.sid
FROM Sailors S, Boats B1, Reserves R1,
Boats B2, Reserves R2
WHERE S.sid=R1.sid AND R1.bid=B1.bid
AND S.sid=R2.sid AND R2.bid=B2.bid
AND (B1.color=‘red’ AND
B2.color=‘green’)
//法2:集合交
SELECT S.sid
FROM Sailors S, Boats B, Reserves R
WHERE S.sid=R.sid AND R.bid=B.bid
AND B.color=‘red’
INTERSECT //交
SELECT S.sid
FROM Sailors S, Boats B, Reserves R
WHERE S.sid=R.sid AND R.bid=B.bid
AND B.color=‘green’
有些数据库系统不支持交
//法3:嵌套查询(最好理解)
SELECT S.sid
FROM Sailors S, Boats B, Reserves R
WHERE S.sid=R.sid AND B.bid=R.bid AND B.color='red'
AND S.sid IN (SELECT S2.sid
FROM Sailors S2, Boats B2, Reserves R2
WHERE S2.sid=R2.sid AND B2.bid=R2.bid AND B2.colpr='green')
【思考】查找订过红船和绿船的水手的name
//法1:自连接方法-直接换sid为sname
SELECT S.sname
FROM Sailors S, Boats B1, Reserves R1,
Boats B2, Reserves R2
WHERE S.sid=R1.sid AND R1.bid=B1.bid
AND S.sid=R2.sid AND R2.bid=B2.bid
AND (B1.color=‘red’ AND
B2.color=‘green’)
法2:集合交-不能直接换,因为可能有满足条件的重名水手,sname取交后只剩一个。采用嵌套查询
SELECT S.sname
FROM Sailor S
WHERE S.sid IN (SELECT S2.sid
FROM Sailors S2, Boats B, Reserves R
WHERE S2.sid=R.sid AND R.bid=B.bid
AND B.color=‘red’
INTERSECT //交
SELECT S2.sid
FROM Sailors S2, Boats B, Reserves R
WHERE S2.sid=R.sid AND R.bid=B.bid
AND B.color=‘green’)
//法3:嵌套查询(最好理解)-直接换sid为sname
SELECT S.sname
FROM Sailors S, Boats B, Reserves R
WHERE S.sid=R.sid AND B.bid=R.bid AND B.color='red'
AND S.sid IN (SELECT S2.sid
FROM Sailors S2, Boats B2, Reserves R2
WHERE S2.sid=R2.sid AND B2.bid=R2.bid AND B2.colpr='green')
4.嵌套查询
在where子句条件中可以嵌套一个查询。
嵌套查询:查找订过103号船的水手的name
该嵌套子查询只需做一次。
SELECT S.sname
FROM Sailors S
WHERE S.sid IN (SELECT R.sid
FROM Reserves R
WHERE R.bid=103)
关联嵌套查询:查找订过103号船的水手的name
S.sid与外层Sailors S产生关联。类似于二层循环。把外层(Sailors)每条元组的属性(S.sid)代入内层进行筛选,保留外层符合条件的元组。
SELECT S.sname
FROM Sailors S
WHERE EXISTS (SELECT *
FROM Reserves R
WHERE R.bid=103 AND S.sid=R.sid)
非关联嵌套中子查询只做一次,关联嵌套中子查询做多次(次数 = Sailors的元组个数)。显然关联查询效率更高。
【思考】查找订过103号船且只订过1次的水手的name:Reserve表中不存在两个满足 (R1.sid=R2.sid AND R1.bid=103 AND R2.bid=103) 且不满足R1.day=R2.day的元组
//感觉不对
SELECT S.sname
FROM Sailors S
WHERE NOT EXISTS (SELECT *
FROM Reserves R1, Reserves R2
WHERE S.sid=R1.sid AND R1.sid=R2.sid AND R1.bid=103 AND R2.bid=103 AND R1.day!=R2.day)
//非关联查询
SELECT S.sname
FROM Sailors S, Reserves R
WHERE S.sid=R.sid AND R.bid=103 //订过103号船的水手
AND S.sid NOT IN (SELECT S2.sid
FROM Reserves R2, Reserves R3
WHERE R2.sid=R3.sid AND R2.bid=103 AND R3.bid=103 AND R2.day!=R3.day) //订过多次103号船的水手
//关联查询??
SELECT S.sname
FROM Sailors S, Reserves R
WHERE S.sid=R.sid AND R.bid=103 //订过103号船的水手
AND NOT EXISTS (SELECT *
FROM Reserves R2
WHERE R2.sid=S.sid AND R2.bid=103 AND R2.day!=R.day)
查找只有一个人订过的船的id
//关联查询!
SELECT R1.bid
FROM Reserves R1
WHERE NOT IN (SELECT R2.bid
FROM Reserves R2
WHERE R1.sid!=R2.sid)
外层查询:遍历表R1中所有记录,内层查询:查找表R2中sid!=R1.sid的别的水手订的所有船的id的列表,如果R1.bid NOT IN 该列表,则R1.bid只有一个人订过
查找任意级别高于任意名为Horatio的水手
SELECT *
FROM Sailors S
WHERE S.rating > ANY (SELECT S2.rating
FROM Sailors S2
WHERE S2.sname='Horatio')
内层查询:查找表S2中名为Horatio的水手的等级列表。外层查询:S.rating大于该列表中任一rating,则找到符合要求的水手
5.除法
查找订过所有船的水手
//除法/集合差:EXCEPT
SELECT S.sname
FROM Sailors S
WHERE NOT EXISTS ((SELECT B.bid
FROM Boats B)
EXCEPT
(SELECT R.bid
FROM Reserves R
WHERE R.sid=S.sid))
外层循环:查找表S中每一水手
内层查询:查询1:查找表B中所有船;查询2:查找表R中每一水手订的全部船只。查询1结果-查询2结果:水手没订过的船的id。
//存在量词:EXISTS
//关联
SELECT S.sname
FROM Sailors S
WHERE NOT EXISTS (SELECT B.bid
FROM Boats B
WHERE NOT EXISTS (SELECT R.bid
FROM Reserves R
WHERE R.bid=B.bid AND R.sid=S.sid))
最内层查询:水手sid订了船bid。
中层查询:水手sid没订的船的列表。
外层查询:不存在没订的船 的水手sname的列表
3.3.2 聚集函数运算(汇总数据)
1.基础函数
- COUNT (*):计数表中元组个数
- COUNT ([DISTINCT] A):计数属性A有多少个值([DISTINCT]:不同的值)
- SUM ([DISTINCT] A):求和属性A的值([DISTINCT]:仅求和不同的值)
- AVG ([DISTINCT] A):求平均
- MAX (A):属性A的最大值
- MIN (A):属性A的最小值
查询最老的水手的名字
SELECT S.sname
FROM Sailors S
WHERE S.age=(SELECT MAX (S2.age)
FROM Sailors S2)
3.3.3 分组运算
- SELECT [DISTINCT] target-list
FROM relation-list
WHERE qualification
GROUP BY grouping-list
HAVING group-qualification - WHERE字句筛选合格后,按照GROUP BY子句重的属性进行分组。对每一个组,用SELECT子句进行运算。【注意】每一个组产生一个结果元组
- HAVING子句(与WHERE类似)对GROUP BY后的组再做筛选
- SELECT、HAVING子句中出现的属性必为GROUP BY子句属性集合的子集
查找年龄>18,有至少2个水手(不含<18水手)的rating,年龄最小的水手
SELECT S.rating, MIN (S.age) AS minage
FROM Sailors S
WHERE S.age>=18
GROUP BY S.rating
HAVING COUNT (*)>1 //至少有2个水手的组
查找每一红色船的预订数
SELECT B.bid, COUNT (*) AS count
FROM Boats B, Reserves R
WHERE R.bid=B.bid AND B.color='red'
GROUP BY B.bid
【思考】将B.color='red’从WHERE子句中移到GROUP BY子句
SELECT B.bid, COUNT (*) AS count
FROM Boats B, Reserves R
WHERE R.bid=B.bid
GROUP BY B.bid
HAVING B.color='red'
逻辑、概念上没有问题,但数据库会报错:数据库检查:HAVING子句中的属性不在GROUP BY子句中。
查找年龄>18,有至少2个水手(含<18水手)的rating,年龄最小的水手
//嵌套查询
SELECT S.sname
FROM Sailors S
WHERE S.age>18
GROUP BY S.rating
HAVING (SELECT COUNT (*)
FROM Sailors S2
WHERE S2.rating=S.rating)>1
查找平均年龄最小的rating
//首次!from子查询
SELECT Tempo.rating
FROM (SELECT S.rating, AVG (S.age) AS average
FROM Sailors S
GROUP BY S.rating) AS Tempo
WHERE Tempo.average=(SE LECT MIN(Tempo.average)
FROM Tempo)
FROM子句嵌套:生成rating-平均年龄的中间表用于做对照;
WHERE子句嵌套:查找中间表中平均年龄最小的一项
3.3.4 空值NULL
- 空值不是“0”,是“不知道”
- 空值的存在带来的问题
- 一些操作需要判断是否为孔
- 布尔表达式在计算空值时的影响
- 三值逻辑:真、假、不知道
- 构造子句时务必小心,e.g.:WHERE消除了计算结果 ≠ t r u e \neq true =true的行
3.3.5 扩展内容(按DB2定义)
-
CAST表达式:强制类型转换
- 符合函数语法
- 运算精度转换
- 给空值赋数据类型
将Studuents(name, school)和Soldiers(name, service)合并为1个视图
//外并操作,使用CAST给两表各补一列空值并*赋数据类型*。 //使得两表先满足并兼容条件 CREAT VIEW prospects (name, school, service) AS SELECT name, school, CAST(NULL AS Varchar(20)) FROM Students UNION SELECT name, CAST(NULL AS Varchar(20)), service FROM Soldiers
-
CASE表达式:
- 查询时将数据库中编码的属性值转换为容易理解的值
//**status结果转换为容易理解的值** //投影数据到name,status两列,status的数据转换为容易理解的值 SELECT name, CASE status WHEN 1 THEN 'Active Duty' WHEN 2 THEN 'Reserve' WHEN 3 THEN 'Spacial Assignment' ELSE 'Unkown' END AS status FROM Officers;
- 条件判断
//Machines (serialno, type, year, hours_used, accidents) //查询“chain saw”类型的机器故障的次数占所有故障的百分比 SELECT sum (CASE WHEN type='chain saw' THEN accidents ELSE 0e0 END) / sum(accidents)
//查询每种类型的机器故障率 //法1 SELECT type, CASE WHEN sum(hours_used>0) THEN sum(accidents)/sum(hours_used) ELSE NULL END AS accident_rate FROM Machines GROUP BY type //法2 SELECT type, sum(accidents)/sum(hours_used) FROM Machines GROUP BY type HAVING sum(hours_used)>0
法1保留了没有运行过的机器元组,且故障率为NULL;法2不包含没有运行过的机器元组
-
子查询
- 嵌套查询&关联嵌套查询都属于子查询
- 子查询可以在WHERE、SELECT、FROM子句中
- 子查询分为三类:标量子查询:查询结果是一个值;表表达式:子查询结果是一个表;公共表表达式:在复杂查询语句中可能不止出现一次,将其定义,一次计算多次使用,以提高效率。
- 标量子查询:凡是能出现一个值的子句,都能出现标量子查询
//简单! //查找平均奖金>平均工资的部门 SELECT d.departmentid, d.location FROM dept AS d WHERE (SELECT avg(bonus)) FROM emp WHERE deptno=d.deptno) > (SELECT avg(salary) FROM emp WHERE deptno=d.deptno)
//查找所有位于纽约的部门,列出部门id、名称、和全部门最高工资 //法1: //首次!select子句出现子查询 SELECT d.deptid, d.deptname, (SELECT MAX(salary) FROM emp WHERE deptid=d.deptid) AS maxsalary FROM dept AS d WHERE d.location='New York' //法2:GROUP BY(不知道对不对) SELECT e.deptid, e.deptname, MAX(e.salary) AS maxsalary FROM dept d, emp e WHERE d.location='New York' AND e.deptid=d.deptid GROUP BY e.deptid
- 表表达式:
//查询结果为一张临时表 //查询每年入职的员工的平均工资 SELECT startyear, avg(pay) FROM (SELECT name, salary+bonus AS pay, year(startdate) AS startyear FROM emp) AS emp2 GROUP BY startyear
FROM子句相当于建立了一张临时表://查找部门员工总收入超过20w的部门 SELECT deptno,totalpay FROM (SELECT deptno, sum(salary)+sum(bonus) AS totalpay FROM emp GROUP BY deptno) AS payroll WHERE totalpay>200000
depto totalpay … … - 公共表表达式:相当于一个临时视图
- 引入WITH子句,定义一个公共表表达式
//找总收入最高的部门 //分析:该查询需要用到上表2次:1:查找最高总收入;2:查找totalpay=最高总收入的部门 WITH payroll(deptno, totalpay) AS (SELECT deptno, SUM(salary)+SUM(bonus) AS totalpay FROM emp GROUP BY deptno) SELECT deptno, total(salary) FROM payroll WHERE totalpay=(SELECT MAX(totalpay))
//查找部门对:部门1的总收入大于部门2总收入的2倍 WITH deptavg(deptno, avgsal) AS (SELECT deptno, AVG(salary) AS avgsal FROM emp GROUP BY deptno) SELECT d1.deptno, d1.avgsal, d2.deptno, d2.avesal FROM dept d1, dept d2 WHERE d1.avgsal>2*d2.avgsal
-
外连接
- 【注意】EXCEPT会对消除两个表公共部分后的结果做去重操作。想不去重,用EXCEPT ALL
- UNION ALL作用类似
//外连接96年秋季学期的teacher-curse表 //Teacher ( name, rank ) //Course (subject, enrollment, quarter, teacher) WITH innerjoin(name, rank, subject, enrollment) AS (SELECT t.name, t.rank, c.subject, c.enrollment FROM Teacher t, Course c WHERE t.name=c.teacher AND c.quarter='Fall 96') teacher-only(name, rank) AS (SELECT name, rank FROM Teachers EXCEPT SELECT name, rank FROM innerjoin) course-only(subject, enrollment) AS (SELECT subject, enrollment FROM Courses EXCEPT SELECT subject, enrollment FROM innerjoin) SELECT name, rank, subject, enrollment FROM innerjoin UNION ALL SELECT name, rank, CAST (NULL AS Varchar(20)) AS subject,CAST(NULL AS Integer) AS enrollment FROM teacher-only UNION ALL SELECT CAST (NULL AS Varchar(20)) AS name,CAST(NULL AS Varchar(20)) AS rank, subject, enrollment FROM course-only
-
递归查询:在公共表表达式中使用了自己
//查找Hoover所有手下(包括手下的手下)中收入>10w的雇员
WITH agents(name, salary) AS
((SELECT name,salary //初始查询:Hoover的直接下属
FROM Fedemp
WHERE manager='Hoover')
UNION ALL
(SELECT f.name, f.salary //递归查询
FROM agents AS a,Fedemp AS f
WHERE f.manager=a.name))
SELECT name,salary
FROM agents
WHERE salary>100000
经典问题:零件搜索问题
//查找飞机各种零件需要的个数
WITH wingpart(subpart, qty) AS
((SELECT subpart, qty //初始查询:飞机的第一级零件
FROM components
WHERE part='wing')
UNION ALL
(SELECt c.subpart, w.qty*c.qty //w.qty:零件w.subpart的个数;c.qty:1个w.subpart需要的子零件个数
FROM wingpart w,components c
WHERE c.part=w.subpart))
SELECT subpart, SUM(qty) AS qty
FROM wingpart
GROUP BY subpart
经典问题:飞机航线搜索问题:
【分析】建中间表:
destination | route | nsegs | totalcost |
---|---|---|---|
从旧金山出发能到的目的地 | 路径 | 中转次数 | 机票总价 |
查询旧金山-纽约最低票价路径
WITH trips(destination, route, nsegs, totalcost) AS
((SELECT destination, CAST(destination AS varchar(20)), 1, cost // 初始查询:旧金山出发一次航班能到的地方
FROM flights
WHERE origin='SFO')
UNION ALL
(SELECT f.destination, CAST(t.route||','||f.destination AS varchar(20)), t.nsegs+1, t.totalcost+f.cost
FROM flights f,trips t
WHERE t.destination=f.origin
AND f.destination !='SFO' AND f.origin !='JFK' AND t.nsegs<=3)) //递归结束条件。因为是有向有环图。终点不能是旧金山,起点不能是纽约
SELECT route, totalcost
FROM trips
WHERE destination='JFK' AND totalcost=(SELECT MIN(totalcost)
FROM trips
WHERE destination='JFK');
查询旧金山-纽约最短中转航班
……
SELECT route, nsegs
FROM trips
WHERE destination='JFK' AND totalcost=(SELECT MIN(nsegs)
FROM trips
WHERE destination='JFK');