🚩2020/03/29
Column-oriented storage
Also known as columnar representation
Store each attribute of a relation separately
- Benefits
• Reduced I/O (input/output) if only some attributes are accessed
• Improved CPU cache performance
• Improved compression
• Vector processing on modern CPU architectures
- Drawbacks
• Cost of tuple reconstruction from columnar representation
• Cost of tuple deletion and update
• Cost of decompression
- Columnar representation found to be more efficient for decision support than row-oriented representation
- Traditional row-oriented representation preferable for transaction processing
- Some databases support both representations
• Called hybrid row/column stores
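The I/O benefit and the tuple-reconstruction drawback above can be sketched in a few lines of Python (the relation and its values are made up for illustration):

```python
# Sketch: the same relation stored row-oriented vs column-oriented,
# and why a single-attribute scan touches less data in the columnar layout.

# Relation: employees(id, name, salary)
rows = [
    (1, "Ana", 50000),
    (2, "Bo", 62000),
    (3, "Cy", 58000),
]

# Column-oriented: one array per attribute.
columns = {
    "id": [1, 2, 3],
    "name": ["Ana", "Bo", "Cy"],
    "salary": [50000, 62000, 58000],
}

# SELECT AVG(salary): the row store must read every full tuple,
# the column store reads only the salary array.
avg_row = sum(t[2] for t in rows) / len(rows)
avg_col = sum(columns["salary"]) / len(columns["salary"])
assert avg_row == avg_col

# Tuple reconstruction (a column-store drawback): reassemble tuple i
# by indexing into every column array.
def reconstruct(cols, i):
    return (cols["id"][i], cols["name"][i], cols["salary"][i])

assert reconstruct(columns, 1) == rows[1]
```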
Storage Access
Buffer Manager
- Programs call on the buffer manager when they need a block from disk.
• If the block is already in the buffer, the buffer manager returns the address of the block in main memory
• If the block is not in the buffer, the buffer manager
(1) Allocates space in the buffer for the block
• Replacing (throwing out) some other block, if required, to make space for the new block.
• The replaced block is written back to disk only if it was modified since the most recent time it was written to/fetched from disk.
(2) Reads the block from disk into the buffer, and returns the address of the block in main memory to the requester.
- Pinned block: memory block that is not allowed to be written back to disk
• Pin done before reading/writing data from a block
• Unpin done when read/write is complete
• Multiple concurrent pin/unpin operations possible
• Keep a pin count; a buffer block can be evicted only if pin count = 0
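The pin-count rule can be sketched with a hypothetical `Frame` class (not any particular engine's API):

```python
# Minimal sketch of a buffer frame with a pin count: a block may be
# evicted only while its pin count is 0, and a dirty block would be
# written back before eviction.

class Frame:
    def __init__(self, block_id, data):
        self.block_id = block_id
        self.data = data
        self.pin_count = 0
        self.dirty = False

    def pin(self):
        self.pin_count += 1        # pin before reading/writing

    def unpin(self):
        assert self.pin_count > 0
        self.pin_count -= 1        # unpin when the read/write completes

    def evictable(self):
        return self.pin_count == 0

frame = Frame(block_id=7, data=b"...")
frame.pin()
frame.pin()                        # concurrent pins accumulate
assert not frame.evictable()       # pinned: cannot be thrown out
frame.unpin()
frame.unpin()
assert frame.evictable()           # pin count back to 0: may be evicted
```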
- Shared and exclusive locks on buffer
• Needed to prevent concurrent operations from reading page contents as they are moved/reorganized, and to ensure only one move/reorganize at a time
• Readers get shared lock, updates to a block require exclusive lock
• Locking rules:
-Only one process can get exclusive lock at a time
-Shared lock cannot be held concurrently with exclusive lock
-Multiple processes may be given shared lock concurrently
Buffer-Replacement Policies
The most efficient caching algorithm would be to always discard the information that will not be needed for the longest time in the future. This optimum is referred to as Bélády's optimal algorithm, the optimal replacement policy, or the clairvoyant algorithm.
Least Recently Used (LRU) is based on the simplest assumption:
– The information that will not be needed for the longest time
– is the information that has not been accessed for the longest time.
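The LRU assumption can be sketched with Python's `OrderedDict`: a buffer hit moves the block to the most-recently-used end, and on a miss with a full pool the least-recently-used block is evicted.

```python
# Sketch of an LRU buffer pool (illustrative, not a real engine's API).

from collections import OrderedDict

class LRUPool:
    def __init__(self, capacity):
        self.capacity = capacity
        self.frames = OrderedDict()   # block_id -> data, oldest first

    def get(self, block_id, read_from_disk):
        if block_id in self.frames:            # buffer hit
            self.frames.move_to_end(block_id)
            return self.frames[block_id]
        if len(self.frames) >= self.capacity:  # miss: evict LRU block
            self.frames.popitem(last=False)
        data = read_from_disk(block_id)
        self.frames[block_id] = data
        return data

pool = LRUPool(capacity=2)
pool.get(1, lambda b: f"block{b}")
pool.get(2, lambda b: f"block{b}")
pool.get(1, lambda b: f"block{b}")   # touch 1: now 2 is least recent
pool.get(3, lambda b: f"block{b}")   # evicts block 2
assert list(pool.frames) == [1, 3]
```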
🚩🚩🚩 Important 🚩🚩🚩
Replacement Algorithm
The algorithms are more sophisticated in the real world, e.g.
– “Scans” are common, e.g. go through a large query result in order
(will be more clear when discussing cursors).
• The engine knows the current position in the result set.
• Uses the sort order to determine which records will be accessed soon.
• Tags those blocks as not replaceable.
• (A form of clairvoyance).
– Not all users/applications are equally “important.”
• Classify users/applications into priority 1, 2 and 3.
• Sub-allocate the buffer pool into pools P1, P2 and P3.
• Apply LRU within pools and adjust pool sizes based on relative importance.
• This prevents
– a high-access-rate, low-priority application from taking up a lot of frames,
– which would result in low-access-rate, high-priority applications not getting buffer hits.
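The sub-allocation idea above can be sketched by giving each priority class its own LRU pool (pool sizes and keys are made up for illustration), so a busy low-priority application cannot evict a high-priority application's blocks:

```python
# Sketch of a buffer pool partitioned by application priority,
# with LRU applied within each partition.

from collections import OrderedDict

class PartitionedBuffer:
    def __init__(self, sizes):
        # e.g. {1: 8, 2: 4, 3: 2} frames for priorities 1..3
        self.pools = {p: OrderedDict() for p in sizes}
        self.sizes = sizes

    def get(self, priority, block_id, read_from_disk):
        pool = self.pools[priority]
        if block_id in pool:                   # hit within own pool
            pool.move_to_end(block_id)
            return pool[block_id]
        if len(pool) >= self.sizes[priority]:  # evict only from own pool
            pool.popitem(last=False)
        pool[block_id] = read_from_disk(block_id)
        return pool[block_id]

buf = PartitionedBuffer({1: 2, 3: 2})
for b in range(100):                  # low-priority app hammers the buffer
    buf.get(3, b, lambda b: b)
buf.get(1, 42, lambda b: b)
assert 42 in buf.pools[1]             # priority-1 frames were untouched
```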
Indexes
(Database Systems Concepts, V7, Ch. 14)
P.S. The search key is like a book's title; the pointer tells you where the book is.
Ordered Indices
Dense Index Files
Definition of a dense index: there is an index entry for every search-key value in the file.
Both examples (in the slides) are dense indexes.
👆 The first is one-to-one because that column is a primary key & candidate key,
👇 but the column in the second is not a candidate key.
P.S. review: super key, candidate key, primary key, foreign key.
Sparse Index Files
To find 45565: first find 76766, the first index entry greater than it, then step back one entry to 32343 and start scanning from there.
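The sparse-index lookup just described can be sketched as follows; the index values follow the lecture's example, while the block contents are assumed for illustration:

```python
# Sketch: to find key 45565, locate the last index entry <= 45565
# (here 32343), then scan the data file sequentially from that block.

import bisect

sparse_index = [10101, 32343, 76766]          # one entry per block
blocks = {
    10101: [10101, 12121, 15151],             # assumed block contents
    32343: [32343, 33456, 45565],
    76766: [76766, 83821, 98345],
}

def lookup(key):
    # Last index entry <= key (bisect_right - 1).
    i = bisect.bisect_right(sparse_index, key) - 1
    start = sparse_index[i]
    for k in blocks[start]:                   # sequential scan in block
        if k == key:
            return (start, k)
    return None

assert lookup(45565) == (32343, 45565)        # entered via entry 32343
```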
Secondary Indices Example
Clustering vs Nonclustering Indices
Multilevel Index
- If index does not fit in memory, access becomes expensive.
- Solution: treat index kept on disk as a sequential file and construct a sparse index on it.
• outer index – a sparse index of the basic index
• inner index – the basic index file
- If even the outer index is too large to fit in main memory, yet another level of index can be created, and so on.
- Indices at all levels must be updated on insertion or deletion from the file.
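A two-level index can be sketched as a small sparse outer index over the blocks of the sorted inner index (all values here are made up for illustration):

```python
# Sketch: the outer (in-memory) index picks the inner-index block,
# then only that one block needs to be fetched and searched.

import bisect

# Inner index split into blocks of 3 entries; the outer index holds
# the first key of each inner block.
inner_blocks = [[10, 20, 30], [40, 50, 60], [70, 80, 90]]
outer_index = [b[0] for b in inner_blocks]    # [10, 40, 70]

def find(key):
    # Step 1: outer index picks the inner block.
    i = bisect.bisect_right(outer_index, key) - 1
    # Step 2: search the one inner block fetched from disk.
    return key in inner_blocks[i]

assert find(50)
assert not find(55)
```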
B⁺-Tree Index Files
The first thing to note about a B⁺-tree is that it is not a binary tree. Every node in the tree has multiple entries in it.
B⁺-Tree Node Structure
Leaf Nodes in B⁺-Trees
⚠️ The prof skipped this part: queries, insertion, and deletion (look at it yourself if there is time).
Other Issues in Indexing
- Record relocation and secondary indices
• If a record moves, all secondary indices that store record pointers have to be updated
• Node splits in B⁺-tree file organizations become very expensive
• Solution: use the search key of the B⁺-tree file organization instead of the record pointer in the secondary index
-Add a record-id if the B⁺-tree file organization's search key is non-unique
-Extra traversal of the file organization to locate the record
• Higher cost for queries, but node splits are cheap
❓ If indexes are so useful, why not index every column?
Because an index is itself a file and takes up space, and it adds overhead every time the data is updated.
Indexing Strings
Hashing
non-ordered index
Static Hashing
- A bucket is a unit of storage containing one or more entries (a bucket is typically a disk block).
• We obtain the bucket of an entry from its search-key value using a hash function.
- Hash function h is a function from the set of all search-key values K to the set of all bucket addresses B.
- Hash function is used to locate entries for access, insertion as well as deletion.
- Entries with different search-key values may be mapped to the same bucket; thus entire bucket has to be searched sequentially to locate an entry.
- In a hash index, buckets store entries with pointers to records
- In a hash file-organization buckets store records
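Static hashing can be sketched in a few lines; the modulo hash function and the keys are assumptions for illustration:

```python
# Sketch: h maps a search key to one of B bucket addresses; the whole
# bucket is scanned sequentially because different keys may hash to
# the same bucket.

B = 4                                       # fixed number of buckets
buckets = [[] for _ in range(B)]

def h(search_key):
    return hash(search_key) % B             # assumed hash function

def insert(search_key, record):
    buckets[h(search_key)].append((search_key, record))

def find(search_key):
    # Sequential search within the bucket (collisions possible).
    return [r for k, r in buckets[h(search_key)] if k == search_key]

insert("Ana", "record-1")
insert("Bo", "record-2")
insert("Ana", "record-3")                   # same key, same bucket
assert find("Ana") == ["record-1", "record-3"]
assert find("Cy") == []
```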
Handling of Bucket Overflows
- Bucket overflow can occur because of
• Insufficient buckets
• Skew in distribution of records. This can occur due to two reasons:
-multiple records have same search-key value
-chosen hash function produces non-uniform distribution of key values
- Although the probability of bucket overflow can be reduced, it cannot be eliminated; it is handled by using overflow buckets.
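Overflow handling via chaining can be sketched as below; the bucket capacity and the entries are made up for illustration:

```python
# Sketch of overflow-bucket chaining: when a primary bucket of fixed
# capacity fills up, entries go into a chain of overflow buckets linked
# from it, and a lookup must search the whole chain.

BUCKET_CAPACITY = 2

class Bucket:
    def __init__(self):
        self.entries = []
        self.overflow = None      # link to next overflow bucket

def insert(bucket, entry):
    # Walk the chain to the first bucket with free space.
    while len(bucket.entries) >= BUCKET_CAPACITY:
        if bucket.overflow is None:
            bucket.overflow = Bucket()
        bucket = bucket.overflow
    bucket.entries.append(entry)

def scan(bucket):
    # A lookup searches the primary bucket and its entire chain.
    while bucket is not None:
        yield from bucket.entries
        bucket = bucket.overflow

b = Bucket()
for e in ["a", "b", "c", "d", "e"]:  # skew: all hash to this bucket
    insert(b, e)
assert list(scan(b)) == ["a", "b", "c", "d", "e"]
assert b.overflow is not None        # chain created after overflow
```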