🚩2020/03/29
Column-oriented storage
Also known as columnar representation
Store each attribute of a relation separately
- Benefits
• Reduced I/O (input/output) if only some attributes are accessed
• Improved CPU cache performance
• Improved compression
• Vector processing on modern CPU architectures
- Drawbacks
• Cost of tuple reconstruction from columnar representation
• Cost of tuple deletion and update
• Cost of decompression
- Columnar representation found to be more efficient for decision support than row-oriented representation
- Traditional row-oriented representation preferable for transaction processing
- Some databases support both representations
• Called hybrid row/column stores
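The I/O benefit and the tuple-reconstruction drawback above can be sketched in a few lines of Python (the relation and its values are made up for illustration):

```python
# Sketch: the same relation stored row-oriented vs column-oriented,
# and why a single-attribute scan touches less data in the columnar layout.

# Relation: employees(id, name, salary)
rows = [
    (1, "Ana", 50000),
    (2, "Bo", 62000),
    (3, "Cy", 58000),
]

# Column-oriented: one array per attribute.
columns = {
    "id": [1, 2, 3],
    "name": ["Ana", "Bo", "Cy"],
    "salary": [50000, 62000, 58000],
}

# SELECT AVG(salary): the row store must read every full tuple,
# the column store reads only the salary array.
avg_row = sum(t[2] for t in rows) / len(rows)
avg_col = sum(columns["salary"]) / len(columns["salary"])
assert avg_row == avg_col

# Tuple reconstruction (a column-store drawback): reassemble tuple i
# by indexing into every column array.
def reconstruct(cols, i):
    return (cols["id"][i], cols["name"][i], cols["salary"][i])

assert reconstruct(columns, 1) == rows[1]
```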
Storage Access
Buffer Manager
- Programs call on the buffer manager when they need a block from disk.
• If the block is already in the buffer, the buffer manager returns the address of the block in main memory
• If the block is not in the buffer, the buffer manager
(1) Allocates space in the buffer for the block
• Replacing (throwing out) some other block, if required, to make space for the new block.
• The replaced block is written back to disk only if it was modified since the most recent time it was written to/fetched from disk.
(2) Reads the block from disk into the buffer, and returns the address of the block in main memory to the requester.
- Pinned block: memory block that is not allowed to be written back to disk
• Pin done before reading/writing data from a block
• Unpin done when read/write is complete
• Multiple concurrent pin/unpin operations possible
• Keep a pin count; a buffer block can be evicted only if pin count = 0
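The pin-count rule can be sketched with a hypothetical `Frame` class (not any particular engine's API):

```python
# Minimal sketch of a buffer frame with a pin count: a block may be
# evicted only while its pin count is 0, and a dirty block would be
# written back before eviction.

class Frame:
    def __init__(self, block_id, data):
        self.block_id = block_id
        self.data = data
        self.pin_count = 0
        self.dirty = False

    def pin(self):
        self.pin_count += 1        # pin before reading/writing

    def unpin(self):
        assert self.pin_count > 0
        self.pin_count -= 1        # unpin when the read/write completes

    def evictable(self):
        return self.pin_count == 0

frame = Frame(block_id=7, data=b"...")
frame.pin()
frame.pin()                        # concurrent pins accumulate
assert not frame.evictable()       # pinned: cannot be thrown out
frame.unpin()
frame.unpin()
assert frame.evictable()           # pin count back to 0: may be evicted
```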
- Shared and exclusive locks on buffer
• Needed to prevent concurrent operations from reading page contents as they are moved/reorganized, and to ensure only one move/reorganize at a time
• Readers get shared lock, updates to a block require exclusive lock
• Locking rules:
-Only one process can get exclusive lock at a time
-Shared lock cannot be held concurrently with exclusive lock
-Multiple processes may be given shared lock concurrently
Buffer-Replacement Policies
The most efficient caching algorithm would be to always discard the information that will not be needed for the longest time in the future. This optimum is referred to as Bélády's optimal algorithm, the optimal replacement policy, or the clairvoyant algorithm.
Least Recently Used (LRU) is based on the simplest assumption:
– The information that will not be needed for the longest time
– is the information that has not been accessed for the longest time.
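The LRU assumption can be sketched with Python's `OrderedDict`: a buffer hit moves the block to the most-recently-used end, and on a miss with a full pool the least-recently-used block is evicted.

```python
# Sketch of an LRU buffer pool (illustrative, not a real engine's API).

from collections import OrderedDict

class LRUPool:
    def __init__(self, capacity):
        self.capacity = capacity
        self.frames = OrderedDict()   # block_id -> data, oldest first

    def get(self, block_id, read_from_disk):
        if block_id in self.frames:            # buffer hit
            self.frames.move_to_end(block_id)
            return self.frames[block_id]
        if len(self.frames) >= self.capacity:  # miss: evict LRU block
            self.frames.popitem(last=False)
        data = read_from_disk(block_id)
        self.frames[block_id] = data
        return data

pool = LRUPool(capacity=2)
pool.get(1, lambda b: f"block{b}")
pool.get(2, lambda b: f"block{b}")
pool.get(1, lambda b: f"block{b}")   # touch 1: now 2 is least recent
pool.get(3, lambda b: f"block{b}")   # evicts block 2
assert list(pool.frames) == [1, 3]
```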
🚩🚩🚩 Important 🚩🚩🚩
Replacement Algorithm
The algorithms are more sophisticated in the real world, e.g.
– “Scans” are common, e.g. go through a large query result in order
(will be more clear when discussing cursors).
• The engine knows the current position in the result set.
• Uses the sort order to determine which records will be accessed soon.
• Tags those blocks as not replaceable.
• (A form of clairvoyance).
– Not all users/applications are equally “important.”
• Classify users/applications into priority 1, 2 and 3.
• Sub-allocate the buffer pool into pools P1, P2 and P3.
• Apply LRU within pools and adjust pool sizes based on relative importance.
• This prevents
– a high-access-rate, low-priority application from taking up a lot of frames,
– which would result in low-access-rate, high-priority applications not getting buffer hits.
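The sub-allocation idea above can be sketched by giving each priority class its own LRU pool (pool sizes and keys are made up for illustration), so a busy low-priority application cannot evict a high-priority application's blocks:

```python
# Sketch of a buffer pool partitioned by application priority,
# with LRU applied within each partition.

from collections import OrderedDict

class PartitionedBuffer:
    def __init__(self, sizes):
        # e.g. {1: 8, 2: 4, 3: 2} frames for priorities 1..3
        self.pools = {p: OrderedDict() for p in sizes}
        self.sizes = sizes

    def get(self, priority, block_id, read_from_disk):
        pool = self.pools[priority]
        if block_id in pool:                   # hit within own pool
            pool.move_to_end(block_id)
            return pool[block_id]
        if len(pool) >= self.sizes[priority]:  # evict only from own pool
            pool.popitem(last=False)
        pool[block_id] = read_from_disk(block_id)
        return pool[block_id]

buf = PartitionedBuffer({1: 2, 3: 2})
for b in range(100):                  # low-priority app hammers the buffer
    buf.get(3, b, lambda b: b)
buf.get(1, 42, lambda b: b)
assert 42 in buf.pools[1]             # priority-1 frames were untouched
```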
Indexes
(Database Systems Concepts, V7, Ch. 14)
P.S. The search key is like a book's title; the pointer tells you where the book is.
Ordered Indices
Dense Index Files
Definition of a dense index: there is an index entry for every search-key value in the file.
Both examples (in the slides) are dense indexes.
👆 The first is one-to-one because that column is a primary key & candidate key,
👇 but the column in the second is not a candidate key.
P.S. review: super key, candidate key, primary key, foreign key.
Sparse Index Files
To find 45565: first find 76766, the first index entry greater than it, then step back one entry to 32343 and start scanning from there.
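The sparse-index lookup just described can be sketched as follows; the index values follow the lecture's example, while the block contents are assumed for illustration:

```python
# Sketch: to find key 45565, locate the last index entry <= 45565
# (here 32343), then scan the data file sequentially from that block.

import bisect

sparse_index = [10101, 32343, 76766]          # one entry per block
blocks = {
    10101: [10101, 12121, 15151],             # assumed block contents
    32343: [32343, 33456, 45565],
    76766: [76766, 83821, 98345],
}

def lookup(key):
    # Last index entry <= key (bisect_right - 1).
    i = bisect.bisect_right(sparse_index, key) - 1
    start = sparse_index[i]
    for k in blocks[start]:                   # sequential scan in block
        if k == key:
            return (start, k)
    return None

assert lookup(45565) == (32343, 45565)        # entered via entry 32343
```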
Secondary Indices Example
Clustering vs Nonclustering Indices
Multilevel Index
- If index does not fit in memory, access becomes expensive.
- Solution: treat index kept on disk as a sequential file and construct a sparse index on it.
• outer index – a sparse index of the basic index
• inner index – the basic index file
- If even the outer index is too large to fit in main memory, yet another level of index can be created, and so on.
- Indices at all levels must be updated on insertion or deletion from the file.
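A two-level index can be sketched as a small sparse outer index over the blocks of the sorted inner index (all values here are made up for illustration):

```python
# Sketch: the outer (in-memory) index picks the inner-index block,
# then only that one block needs to be fetched and searched.

import bisect

# Inner index split into blocks of 3 entries; the outer index holds
# the first key of each inner block.
inner_blocks = [[10, 20, 30], [40, 50, 60], [70, 80, 90]]
outer_index = [b[0] for b in inner_blocks]    # [10, 40, 70]

def find(key):
    # Step 1: outer index picks the inner block.
    i = bisect.bisect_right(outer_index, key) - 1
    # Step 2: search the one inner block fetched from disk.
    return key in inner_blocks[i]

assert find(50)
assert not find(55)
```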
B⁺-Tree Index Files
The first thing to note about a B⁺-tree is that it is not a binary tree. Every node in the tree has multiple entries in it.
B⁺-Tree Node Structure
Leaf Nodes in B⁺-Trees
⚠️ The prof skipped this part: queries, insertion, and deletion (look at it yourself if there is time).
Other Issues in Indexing
- Record relocation and secondary indices
• If a record moves, all secondary indices that store record pointers have to be updated
• Node splits in B⁺-tree file organizations become very expensive
• Solution: use the search key of the B⁺-tree file organization instead of the record pointer in the secondary index
-Add a record-id if the B⁺-tree file organization's search key is non-unique
-Extra traversal of the file organization to locate the record
• Higher cost for queries, but node splits are cheap
❓ If indexes are so useful, why not index every column?
Because an index is itself a file and takes up space, and it adds overhead every time the data is updated.
Indexing Strings
Hashing
non-ordered index
Static Hashing
- A bucket is a unit of storage containing one or more entries (a bucket is typically a disk block).
• We obtain the bucket of an entry from its search-key value using a hash function.
- Hash function h is a function from the set of all search-key values K to the set of all bucket addresses B.
- Hash function is used to locate entries for access, insertion as well as deletion.
- Entries with different search-key values may be mapped to the same bucket; thus entire bucket has to be searched sequentially to locate an entry.
- In a hash index, buckets store entries with pointers to records
- In a hash file-organization buckets store records
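Static hashing can be sketched in a few lines; the modulo hash function and the keys are assumptions for illustration:

```python
# Sketch: h maps a search key to one of B bucket addresses; the whole
# bucket is scanned sequentially because different keys may hash to
# the same bucket.

B = 4                                       # fixed number of buckets
buckets = [[] for _ in range(B)]

def h(search_key):
    return hash(search_key) % B             # assumed hash function

def insert(search_key, record):
    buckets[h(search_key)].append((search_key, record))

def find(search_key):
    # Sequential search within the bucket (collisions possible).
    return [r for k, r in buckets[h(search_key)] if k == search_key]

insert("Ana", "record-1")
insert("Bo", "record-2")
insert("Ana", "record-3")                   # same key, same bucket
assert find("Ana") == ["record-1", "record-3"]
assert find("Cy") == []
```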
Handling of Bucket Overflows
- Bucket overflow can occur because of
• Insufficient buckets
• Skew in distribution of records. This can occur due to two reasons:
-multiple records have same search-key value
-chosen hash function produces non-uniform distribution of key values
- Although the probability of bucket overflow can be reduced, it cannot be eliminated; it is handled by using overflow buckets.
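Overflow handling via chaining can be sketched as below; the bucket capacity and the entries are made up for illustration:

```python
# Sketch of overflow-bucket chaining: when a primary bucket of fixed
# capacity fills up, entries go into a chain of overflow buckets linked
# from it, and a lookup must search the whole chain.

BUCKET_CAPACITY = 2

class Bucket:
    def __init__(self):
        self.entries = []
        self.overflow = None      # link to next overflow bucket

def insert(bucket, entry):
    # Walk the chain to the first bucket with free space.
    while len(bucket.entries) >= BUCKET_CAPACITY:
        if bucket.overflow is None:
            bucket.overflow = Bucket()
        bucket = bucket.overflow
    bucket.entries.append(entry)

def scan(bucket):
    # A lookup searches the primary bucket and its entire chain.
    while bucket is not None:
        yield from bucket.entries
        bucket = bucket.overflow

b = Bucket()
for e in ["a", "b", "c", "d", "e"]:  # skew: all hash to this bucket
    insert(b, e)
assert list(scan(b)) == ["a", "b", "c", "d", "e"]
assert b.overflow is not None        # chain created after overflow
```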