Query Processing 查询处理

本文介绍了查询处理过程中的关键步骤,包括词法语法分析、关系代数操作(如选择、投影和重命名)、查询树构建、语义分析、正则化以及优化策略,如RA重写规则、候选计划生成和成本基础优化。
摘要由CSDN通过智能技术生成

ee62d462fbcf42aeb173692e90e9c0c0.pngQuery Processing Overview • Objective: Provide correct answer to query (almost) as efficiently as possible 查询处理的目的是 高效准确查询42de1e964f7c485f81ae45f14a5d919a.pngQuery Processing Operations • Query processing involves several operations: — Lexical & syntactic analysis – transform SQL into an internal form — Normalisation – collecting AND and OR predicates — Semantic analysis – i.e., does the query make sense? — Simplification – e.g., remove common or redundant sub-expressions — Generating an execution plan – query optimisation — Executing the plan and returning results to the client • To describe most of these, we need to use Relational Algebra - 词法和语法分析 - 将 SQL 转换为内部形式 - 规范化 - 收集 AND 和 OR 谓词 - 语义分析  即查询是否有意义? - 简化--例如,删除常见或多余的子表达式 - 生成执行计划--查询优化 - 执行计划并将结果返回给客户端 - 要描述其中的大部分操作,我们需要使用关系代数

Sets and Cartesian Product • Set – a collection of objects characterized by some defining property — e.g., a column in a database table such as last names of all staff • Cartesian Product of sets (×) – one of the operations on sets — e.g., consider two sets in the staff table — set of all first names ◦ fName = {Mary, David} — set of all last names ◦ lName = Howe, Ford — their cartesian product ◦ fName × lName = (Mary,Howe), (Mary,Ford), (David, Howe), (David, Ford)集合 - 以某些定义属性为特征的对象集合 笛卡尔乘积e1369653a7da46c8b2c850ac503aa47e.png

Relation Relation – defined between two sets and is a subset of cartesian product between those two sets • e.g., FirstNameOf = (Mary, Howe), (David, Ford)关系--在两个集合之间定义,是这两个集合的卡特积子集de6b197467a741a38c4cb6018ce01baa.png

Relational Model • The name ‘relational model’ comes from this mathematical notion of relation — Where a relation is a set (collection) of tuples that have related objects such as first name and last name of the same person — e.g., (fName, lName) is a relation • We can have relations over any number of sets — e.g., (staffNo, fName, lName, position) • In general we can denote a relation as (A,B,C,D,... ,Z) where A, B, C and Z are all its attribute sets 关系是具有相关对象(如同一个人的名字和姓氏)的元组集合(集合)

Introducing Relational Algebra • What is relational algebra (RA) and why is it useful? — RA is a symbolic formal way of describing relational operations — RA says how, as well as what (order is important) — Can use re-write rules to simplify and optimise complex queries... • Maths example: — a + b · x + c · x 2 + d · x 3 ; 3 adds, 3 multiplies, 2 powers; — a + x · (b + x · (c + x · d)); 3 adds, 3 multiplies;关系代数是描述关系操作的一种符号化的正式方式 关系代数说明如何操作以及操作什么(顺序很重要) - 可以使用重写规则来简化和优化复杂的查询 简而言之 关系代数就是描述关系操作符号的含义

Basic Relational Algebra Operators • The basic RA operators are: — Selection σ (Sigma); Projection π (Pi); Rename ρ (Rho) • Examples基本关系代数运算符- 选择 σ (Sigma);投影 π (Pi);重命名 ρ (Rho)b93bf1fee3144e7a95bca34249e34ad4.png

66ee62e9e1c7492faf33d2e639327e2b.png关系代数符号

Query Processing Example • Example: find all managers who work at a London Branch:7288b2fba6a04dda8caeb1d1a6040f32.png

• There are at least 3 ways of writing this in RA notation:8426f166f4d14e4f81b2e118b471245b.pngOne of these will be the most efficient – but which??

Lexical & Syntactical Analysis & Query Trees • Lexical & syntactical analysis involves: — identifying keywords & literals — identifying table names & aliases — mapping aliases to table names — identifying column names — checking columns exist in tables • The output of this phase is a relational algebra tree (RAT)词法和句法分析与查询树 - 词法和句法分析包括: - 识别关键字和文字 - 识别表名和别名 - 将别名映射到表名 - 识别列名 - 检查列是否存在于表中 - 该阶段的输出是关系代数树 (RAT)3eb58d0df4fc4abcac3e6caa23161199.png

Semantic Analysis • Does the query make sense? — Is the query legal SQL? — Is the RAT connected? – if not, query is incomplete! • Can the query be simplified? – for example:查询的意义 是否为合法SQL RAT连接情况(没有连接查询不完整) 查询的简化4818cb6039fe43fba9ac0321753c8ef0.png

Normalisation & Normal Forms 正则化和正则表达式6361dba289464e039054a98cc7879afe.png

• Why is this useful? – sometimes a query might best be split into subqueries (remember set operations?): 将查询拆分成子查询• Disjunctions suggest union:分词代表并集5b4eb476c5054be8870cfa0437467fb7.png

• Conjunctions suggest intersection:连词代表交集

e6d514ca159e4b74a19c3b964b5eff9d.png

Some RA Equivalences Rules (Re-Write Rules) • There are many equivalence rules (see C&B pp.736–739). Here are a few:RA等价规则62f7cb867aa94789b974bf3da9fce46f.png

Generating Query Plans • Most RDBMSs generate candidate query plans by using RA re-write rules to generate alternate RATs and to move operations around each tree:生成查询计划--大多数 RDBMS 通过使用 RA 重写规则生成备用 RAT 并在每个树上移动操作,从而生成候选查询计划db5d2ff9f30f41688c5857203bed9bdd.png

• For complex queries, there may be a very large number of candidate plans...复杂查询可能有多种候选计划

Heuristic Query Optimisation Rules To avoid considering all possible plans, many DBMSs use heuristic rules: • keep together selections (σ) on the same table • perform selections as early as possible • re-write selection on a cartesian product as a join • perform “small joins” first • keep together projections (π) on the same relation • apply projections as early as possible • if duplicates are to be eliminated, use a sort algorithm启发式查询优化规则 为避免考虑所有可能的计划,许多 DBMS 使用启发式规则: - 将同一表上的选择 (σ) 保持在一起 - 尽早执行选择 - 将卡特积上的选择重写为连接 - 先执行 "小连接" - 将同一关系上的投影 (π) 保持在一起 - 尽早应用投影 - 如果要消除重复,则使用排序算法

Cost-Based Query Optimisation • Remember, accessing disc blocks is expensive! • Ideally, the query optimiser should take into account: — the size (cardinality) of each table — which tables have indexes — the type of each index – clustered, non-clustered — which predicates can be evaluated using an index — how much memory query will need – and for how long — whether the query can be split over multiple CPUs基于成本的查询优化  - 理想情况下,查询优化器应考虑以下因素 - 每个表的大小(cardinality) - 哪些表有索引 - 每个索引的类型 聚类、非聚类 - 哪些谓词可以使用索引进行评估 -查询需要多少内存,以及需要多长时间 - 查询是否可以在多个 CPU 上进行分割

 

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值