200329 Introduction to Database Lecture 10

🚩2020/03/29

Column-oriented storage

Also known as columnar representation
Store each attribute of a relation separately

  • Benefits
    • Reduced IO(input/output) if only some attributes are accessed
    • Improved CPU cache performance
    • Improved compression
    • Vector processing on modern CPU architectures
  • Drawbacks
    • Cost of tuple reconstruction from columnar representation
    • Cost of tuple deletion and update
    • Cost of decompression
  • Columnar representation found to be more efficient for decision support than
    row-oriented representation
  • Traditional row-oriented representation preferable for transaction processing
  • Some databases support both representations
    • Called hybrid row/column stores

Storage Access

在这里插入图片描述
在这里插入图片描述

Buffer Manager

  • Programs call on the buffer manager when they need a block from disk.
    • If the block is already in the buffer, buffer manager returns the address of the block in main memory
    • If the block is not in the buffer, the buffer manager
    (1)Allocates space in the buffer for the block
    • Replacing (throwing out) some other block, if required, to make space for the new block.
    • Replaced block written back to disk only if it was modified since the most recent time that it was written to/fetched from the disk.
    (2) Reads the block from the disk to the buffer, and returns the address of the block in main memory to requester.

  • Pinned block: memory block that is not allowed to be written back to disk
    Pin done before reading/writing data from a block
    Unpin done when read /write is complete
    • Multiple concurrent pin/unpin operations possible
    Keep a pin count, buffer block can be evicted(驱逐) only if pin count = 0

  • Shared and exclusive locks on buffer
    • Needed to prevent concurrent operations from reading page contents as they are moved/reorganized, and to ensure only one move/reorganize at a time
    • Readers get shared lock, updates to a block require exclusive lock
    Locking rules:
    -Only one process can get exclusive lock at a time
    -Shared lock cannot be concurrently with exclusive lock
    -Multiple processes may be given shared lock concurrently

Buffer-Replacement Policies

The most efficient caching algorithm would be to always discard the information that will not be needed for the longest time in the future. This optimal result is referred to as Bélády’s optimal algorithm/simply optimal replacement policy or the clairvoyant algorithm.

Least Recently Used is based on the simplest assumption
– The information that will not be needed for the longest time.
– Is the information that has not been accessed for the longest time.

在这里插入图片描述
🚩🚩🚩🚩🚩🚩🚩🚩🚩🚩🚩🚩🚩🚩🚩🚩🚩🚩🚩重要🚩🚩🚩🚩🚩🚩🚩🚩🚩🚩🚩🚩🚩🚩🚩🚩
在这里插入图片描述

Replacement Algorithm

The algorithms are more sophisticated in the real world, e.g.
– “Scans” are common, e.g. go through a large query result in order
(will be more clear when discussing cursors).
• The engine knows the current position in the result set.
• Uses the sort order to determine which records will be accessed soon.
• Tags those blocks as not replaceable.
• (A form of clairvoyance).
– Not all users/applications are equally “important.”
• Classify users/applications into priority 1, 2 and 3.
• Sub-allocate the buffer pool into pools P1, P2 and P3.
• Apply LRU within pools and adjust pool sizes based on relative importance.
• This prevents
– A high access rate, low-priority application from taking up a lot of frames
– Result in low access, high priority applications not getting buffer hits.


Indexes

(Database Systems Concepts, V7, Ch. 14)

在这里插入图片描述
P.S. serach-key相当于图书的名字,pointer相当于告诉你图书在哪里

Ordered Indices

在这里插入图片描述

Dense Index Files

在这里插入图片描述
dense index的定义是index = distinct column record
两者都是dense index

👆一一对应是因为该列是primary key & candidate key
但是👇的不是candidate key

P.S. 超键、候选键、主键、外键

在这里插入图片描述

Sparse Index Files

在这里插入图片描述
如果我要找45565,先找index里面大于该值的76766然后往回一个32343start scan

在这里插入图片描述

Secondary Indices Example

在这里插入图片描述

Clustering vs Nonclustering Indices

在这里插入图片描述

Multilevel Index

  • If index does not fit in memory, access becomes expensive.
  • Solution: treat index kept on disk as a sequential file and construct a sparse index on it.
    • outer index – a sparse index of the basic index
    • inner index – the basic index file
  • If even outer index is too large to fit in main memory, yet another level of index can be created, and so on.
  • Indices at all levels must be updated on insertion or deletion from the file.
    在这里插入图片描述

B + B^+ B+ - Tree Index Files

在这里插入图片描述
在这里插入图片描述

The 1st thing about B + B^+ B+ tree is that it’s not a binary tree. Every node in the tree has multiple entries in it.
在这里插入图片描述

B + B^+ B+-Tree Node Structure

在这里插入图片描述

Leaf Nodes in B + B^+ B+-Trees

在这里插入图片描述
⚠️这部分prof跳过了啥查询,插入,删除(有时间自己再看吧)

other issues in indexing

  • Record relocation and secondary indices
    • If a record moves, all secondary indices that store record pointers have to be updated
    • Node splits in B + B^+ B+-tree file organizations become very expensive
    • Solution: use search key of B±tree file organization instead of record pointer in secondary index
    -Add record-id if B + B^+ B+-tree file organization search key is non-unique
    -Extra traversal of file organization to locate record
    • Highercostforqueries,butnodesplitsarecheap

❓index这么有用,为什么不对所有数据index?
因为index是一种文件,会占位置,第二当更新数据的时候,会增加复杂度

Indexing Strings

在这里插入图片描述
在这里插入图片描述

Hashing

non-ordered index

Static Hashing

  • A bucket is a unit of storage containing one or more entries (a bucket is typically a disk block).
    • we obtain the bucket of an entry from its search-key value using a hash function
  • Hash function h is a function from the set of all search-key values K to the set of all bucket addresses B.
  • Hash function is used to locate entries for access, insertion as well as deletion.
  • Entries with different search-key values may be mapped to the same bucket; thus entire bucket has to be searched sequentially to locate an entry.
  • In a hash index, buckets store entries with pointers to records
  • In a hash file-organization buckets store records

Handling of Bucket Overflows

  • Bucket overflow can occur because of
    • Insufficient buckets
    • Skew in distribution of records. This can occur due to two reasons:
    -multiple records have same search-key value
    -chosen hash function produces non-uniform distribution of key
    values
  • Although the probability of bucket overflow can be reduced, it cannot be eliminated; it is handled by using overflow buckets.

在这里插入图片描述

  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值