200316 Introduction to Databases LECTURE 9: Disks, I/O, Indexes 纯理论

最新推荐文章于 2024-01-29 23:26:22 发布

喝冰可乐吃辣火锅的追星胖胖要找工作上岸

最新推荐文章于 2024-01-29 23:26:22 发布

阅读量867

点赞数

分类专栏： # 数据库原理文章标签： mysql 数据库

本文链接：https://blog.csdn.net/LAJIANGJIADEXIANGYE/article/details/105885957

版权

数据库原理专栏收录该内容

5 篇文章 0 订阅

订阅专栏

🚩2020/03/16
我太喜欢这个老师了😂上课风趣👇（虽然很多梗我翻译成中文也听不太懂）
在这里插入图片描述

很抱歉没有认真听课，好多英文听不太懂，有些东西太偏cs理论一头雾水，准备花时间慢慢去琢磨，然后整理出笔记方便期末复习（这部分笔记是上课prof提到的点，skip部分课件）
【仅供学习分享，遵纪守法共同进步】
老师上课的课件书本Database System Concepts 7th Edition

what is database?

The term database refers to a collection of data that is managed by a DBMS(Database Management System数据库管理系统)
The DBMS is expected to

allow users to create new databases and specify their schemas (logical structure of the data), usung a specialized data-definition language(DDL)
give users the ability to query and modify the data (query language or data-manipulation langauge(DML))
support the storage if very large amounts of data over a long period of time, allowing efficient access to the data for queries and database modifications
enable durability, the recovery of the database in the face of failuers, errors of many kinds, or intentional misuse
control access to data from many users at once, without allowing unexpected interations among users (called isolation) and without actions on the data to be performed partially but not completely (called atomicity)

Purpose of Database Systems

In the early days, database applications were built directly on top of file systems, which leads to:

Data redundancy and inconsistency: data is stored in multiple file formats resulting induplication of information in different files
Difficulty in accessing data
• Need to write a new program to carry out each new task
Data isolation
• Multiple files and formats
Integrity problems
• Integrity constraints (e.g., account balance > 0) become “buried” in program code rather than being stated explicitly
• Hard to add new constraints or change existing ones
Atomicity of updates
• Failures may leave database in an inconsistent state with partial
updates carried out
• Example: Transfer of funds from one account to another should either complete or not happen at all
Concurrent access by multiple users
• Concurrent access needed for performance
• Uncontrolled concurrent accesses can lead to inconsistencies
Ex: Two people reading a balance (say 100) and updating it by withdrawing money (say 50 each) at the same time
Security problems
• Hard to provide user access to some, but not all, data
Database systems offer solutions to all the above problems

Lecture 9 DBMS Architecture and Implementation: Disks, I/O, Indexes

(Database Systems Concepts, V7, Ch. 12)

Classification of Physical Storage Media

Can differentiate storage into:

volatile（不稳定的） storage: loses contents when power is switched off
non-volatile storage:
• Contents persist even when power is switched off.
• Includes secondary and tertiary storage, as well as batter-backed up main-memory.

Factors affecting choice of storage media include
• Speed with which data can be accessed
• Cost per unit of data
• Reliability

在这里插入图片描述

P.S :
cache is basically on the same chip as the processor. Main memory is addressable. It’s on another chip. 除这两个其他的are external
magnetic tapes 磁带
optical disk 光盘 like physical DVD
magnetic disk 磁盘: the spinning disks, the hard disk drives, so if you’ve got a computer with a hard disk drive or you’ve got an external storage device, it’s got a magnetic hard disk.
flash memory: semiconductor memory. There’s no moving parts, but it’s not volatile.

primary storage: Fastest media but volatile (cache, main memory).
secondary storage: next level in hierarchy, non-volatile, moderately fast access time
• Also called on-line storage
• E.g., flash memory, magnetic disks
tertiary（第三的） storage: lowest level in hierarchy, non-volatile, slow access time
• also called off-line storage and used for archival storage
• e.g., magnetic tape, optical storage
• Magnetic tape

这里太理论了没听懂
在这里插入图片描述

Read-write head
Surface of platter divided into circular tracks
• Over 50K-100K tracks per platter on typical hard disks
Each track is divided into sectors
• A sector is the smallest unit of data that can be read or written
• Sector size typically 512 bytes(最小)
• Typical sectors per track: 500 to 1000 (on inner tracks) to 1000 to 2000 (on outer tracks)
To read/write a sector
• disk arm swings to position head on right track
• platter spins continually; data is read/written as sector passes under head
Head-disk assemblies
• multiple disk platters on a single spindle (1 to 5 usually)
• one head per platter, mounted on a common arm.
Cylinder i consists of $i^{th}$ track of all the platters
Disk controller – interfaces between the computer system and the disk
drive hardware.
• accepts high-level commands to read or write a sector
• initiates actions such as moving the disk arm to the right track and actually reading or writing the data
• Computes and attaches checksums to each sector to verify that data is read back correctly. (If data is corrupted, with very high probability stored checksum won’t match recomputed checksum)
• Ensures successful writing by reading back sector after writing it
• Performs remapping of bad sectors

Performance Measures of Disks

Access time – the time it takes from when a read or write request is
issued to when data transfer begins. Consists of:
• Seek time – time it takes to reposition the arm over the correct track.
• Rotational latency – time it takes for the sector to be accessed to appear under the head.
Data-transfer rate – the rate at which data can be retrieved from or stored to the disk.
Disk block is a logical unit for storage allocation and retrieval
Sequential access pattern
• Successive requests are for successive disk blocks
• Disk seek required only for first block
Random access pattern
• Successive requests are for blocks that can be anywhere on disk
• Each access requires a seek
• Transfer rates are low since a lot of time is wasted in seeks
I/O operations per second (IOPS)
Number of random block reads that a disk can support per second
Mean time to failure (MTTF) – the average time the disk is expected to
run continuously without any failure.

RAID

RAID (redundant array of independent disks) is a data storage virtualization technology that combines multiple physical disk drive components into a single logical unit for the purposes of data redundancy, performance improvement, or both.

Disk organization techniques that manage a large numbers of disks, providing a view of a single disk of
• high capacity and high speed by using multiple disks in parallel,
• high reliability by storing data redundantly, so that data can be recovered even if a disk fails
The chance that some disk out of a set of N disks will fail is much higher than the chance that a specific single disk will fail.

P.S. 橙色的是error correcting codes
One of these things(蓝色) fails, I can still recover the data
比较常见的是RAID-0/1/5/6

在这里插入图片描述

过时但是是基础👇
在这里插入图片描述

Data Storage Structures

(Database Systems Concepts, V7, Ch. 13)

File Organization

The database is stored as a collection of files. Each file is a sequence of records. A record is a sequence of fields.
P.S. We describe it abstractly: file is a table, records as a row, field as a column
One approach
• Assume record size is fixed
• Each file has records of one particular type only
• Different files are used for different relations
This case is easiest to implement; will consider variable length records later
We assume that records are smaller than a disk block

Terminology

• A tuple in a relation maps to a record. Records may be
– Fixed length
all the records take up the same number of bytes. so all the rows are the same size
– Variable length
within a table, the rows have got different sizes
– Variable format (which we will see in Neo4J, DynamoDB, etc).
• A block
– Is the unit of transfer between disks and memory (buffer pools).
– Contains multiple records, usually but not always from the same relation.
• The database address space contains
– All of the blocks and records that the database manages
– Including blocks/records containing data
– And blocks/records containing free space.

# Fixed length record
# Sample DDL
CREATE TABLE `products` (
`product_id` char(8) NOT NULL,
`product_name` varchar(16) NOT NULL,
`product_description` char(8) NOT NULL,
`product_brand` enum('IBM','HP','Acer','Lenovo','Some really long brand
name') NOT NULL, PRIMARY KEY (`product_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

enum 枚举用法

1. Fixed-length records

Deletion of record i: alternatives:
• moverecords i+1,…,n to i,…,n–1 -> Record 3 deleted
在这里插入图片描述

• move record n to i -> Record 3 deleted and replaced by record 11
在这里插入图片描述
• do not move records, but link all free records on a free list

2. Variable-length records

(1)一条记录中如何取出每一列？
在这里插入图片描述
（2）一个block中如何提取每一条记录？-> Slotted Page Structure

一个空的block 开始插入时第一个记录存在 block 的末尾，同时在Block Header 中的数组加一项指向这个记录。第二条记录接着往前存，以此类推。可用空间被夹在了中间部分.

删除一条记录直接回收它所暂用的空间，指针数组中把它标记为已删除（如可以把它的 size 设置为 -1）。它前面的记录依次往后移动，同时 header 中的指针数组也要相应的更新。因为一个 block 4~8k，这个移动并没有太大影响。

Slotted-page 要求所有的指针不能直接指向记录本身，只能指向header 中指针数组项（二级指针）。这样记录移动时，引用该记录的地方不需要改变。
作者：lesliefang 链接：https://www.jianshu.com/p/a79995718fc8 来源：简书

Organization of Records in Files

Heap
Sequential
In a multitable clustering file organization records of several different relations can be stored in the same file
• Motivation: store related records on the same block to minimize I/O
$B^+$ -tree file organization
• Ordered storage even with inserts/deletes
• More on this in Chapter 14
Hashing – a hash function computed on search key; the result specifies in
which block of the file the record should be placed
• More on this in Chapter 14

Partitioning

Table partitioning: Records in a relation can be partitioned into smaller
relations that are stored separately （Take the records from a single table and keep them into different files.）
E.g., transaction relation may be partitioned into transaction_2018, transaction_2019, etc.
Queries written on transaction must access records in all partitions
• Unless query has a selection such as year=2019, in which case only one partition in needed
Partitioning
• Reduces costs of some operations such as free space management
• Allows different partitions to be stored on different storage devices
E.g., transaction partition for current year on SSD, for older years on magnetic disk

Data Dictionary Storage

在这里插入图片描述

show table from information_schema;

喝冰可乐吃辣火锅的追星胖胖要找工作上岸

关注

0
点赞
踩
0

收藏

觉得还不错? 一键收藏
0
评论
200316 Introduction to Databases LECTURE 9: Disks, I/O, Indexes 纯理论

????2020/03/16我太喜欢这个老师了????上课风趣????（虽然很多梗我翻译成中文也听不太懂）很抱歉没有认真听课，好多英文听不太懂，有些东西太偏cs理论一头雾水，准备花时间慢慢去琢磨，然后整理出笔记方便期末复习（这部分笔记是上课prof提到的点，skip部分课件）【仅供学习分享，遵纪守法共同进步】老师上课的课件书本Database System Concepts 7th Editionw...
复制链接

扫一扫

专栏目录