🚩2020/04/03
现在是5月3日,终于补到一个月前的课了哈哈哈哈哈
视频前1h57min都在讲期中考试的题目 先直接跳过 之后复习再看 先把课上完🐶
Indices on Multiple Keys
- Composite search keys are search keys containing more than one attribute
• E.g., (dept_name, salary) - Lexicographic(adj. 词典式的) ordering: (a1, a2) < (b1, b2) if either
• a1<b1,or
• a1=b1 and a2 < b2
Indices on Multiple Attributes
Covering indices
• Add extra attributes to index so (some) queries can avoid fetching the actual records (从index文件query而不是数据)
Creation of Indices
- example
create index takes_pk on takes(ID, course_ID, semester, section)
drop index takes_pk
- Indices can greatly speed up lookups, but impose cost on updates
Query Processing and Optimization
(Database Systems Concepts, V7, Ch. 15)
- Parsing and translation
• translate the query into its internal form. This is then translated into relational algebra.
• Parser(语法分析程序) checks syntax, verifies relations - Evaluation
• The query-execution engine takes a query-evaluation plan, executes that plan, and returns the answers to the query.
Query Optimization:
Amongst all equivalent evaluation plans choose
the one with lowest cost.
• Cost is estimated using statistical information from the database catalog. e.g… number of tuples in each relation, size of tuples, etc.
Measures of Query Cost
- Many factors contribute to time cost
• disk access, CPU, and network communication - Cost can be measured based on
• response time, i.e. total elapsed time for answering query, or
• total resource consumption
P.S. 类比你在机场值机派对,
resource consumption是 when u get to the head of the line. How long does it take the person to check u in?
response time = resource consumption + how many people are in line
-
We use total resource consumption as cost metric
• Response time harder to estimate, and minimizing resource consumption is a good idea in a shared database -
We ignore CPU costs for simplicity
• Real systems do take CPU cost into account
• Network costs must be considered for parallel systems -
We describe how estimate the cost of each operation
• We do not include cost to writing output to disk -
Required data may be buffer resident already, avoiding disk I/O
• But hard to take into account for cost estimation -
Several algorithms can reduce disk IO by using extra buffer space
• Amount of real memory available to buffer depends on other concurrent queries and OS processes, known only during execution -
Worst case estimates assume that no data is initially in buffer and only the minimum amount of memory needed for the operation is available
• But more optimistic estimates are used in practice
Selection Operation
Which index do u pick and is the index clustered or not?