Storage Systems

连理o

Column: Computer Architecture

Article link: https://blog.csdn.net/weixin_42437114/article/details/116274407

Reference: Computer Architecture (6th Edition)

Bus

I/O buses tap into the processor-memory bus via bus adaptors. The adaptors provide speed matching (buffering) and serve as the interface between the buses.

Main components of an Intel chipset (Pentium 4):

Northbridge (adaptor for high-speed devices): handles memory and graphics
Southbridge (adaptor for low-speed devices): I/O, PCI bus, disk controllers, USB controllers, audio, serial I/O, interrupt controller, timers

IMC（Integrated Memory Controller）

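As can be seen, CPUs are becoming ever more integrated. The memory controller has moved inside the CPU, and the northbridge has disappeared. The L1 and L2 caches are integrated into each core, and the L3 cache is shared by the four cores and also integrated into the CPU.
QPI (QuickPath Interconnect): supports multiple system-bus connections and replaces the front-side bus (FSB)

The next step is to integrate the memory into the CPU as well…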

The move from Parallel to Serial I/O

Parallel I/O (ISA bus, PCI, SCSI, IDE)
- Parallel bus clock rate limited by clock skew across long bus (~100MHz)
- High power to drive large number of loaded bus lines
- Central bus arbiter adds latency to each transaction, and sharing limits throughput
- Expensive parallel connectors and backplanes/cables (all devices pay costs)
Dedicated Point-to-point Serial Links (Ethernet, Infiniband, PCI Express, SATA, USB, Firewire)
- Point-to-point links run at multi-gigabit speed using advanced clock/signal encoding (requires lots of circuitry at each end)
- Lower power since only one well-behaved load
- Multiple simultaneous transfers
- Cheap cables and connectors (trade greater endpoint transistor cost for lower physical wiring cost), customize bandwidth per device using multiple links in parallel
Example: hard-disk interfaces, IDE (parallel) $\rightarrow$ SATA (serial)

Disk Storage

Storage emphasizes reliability and scalability as well as cost-performance
What is the “software king” that determines which hardware features are actually used?
- Compiler for processor
- Operating System for storage

Flash: The future of disks? (solid-state drives)

Flash drive advantages: lower power, much faster seek time, 100x more I/Os per second, greater reliability, and lower noise. All of these come from having no moving parts.
Flash disadvantages: cost (20-100x disk cost per GB), slow writes with current designs (only competitive with disks), and limited write endurance (a location that is written too many times wears out). Endurance is not an issue for most applications, because wear leveling spreads writes across the blocks on the chip (the problem is handled in software).
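
Wear leveling can be illustrated with a toy model. The sketch below uses a hypothetical `WearLeveler` class. On every rewrite, it remaps the logical page to the least-erased free physical block. Real flash translation layers are considerably more complex, so treat this only as an illustration of the idea, not as how any particular drive works.

```python
class WearLeveler:
    """Toy dynamic wear leveling (hypothetical model, not a real FTL)."""

    def __init__(self, num_blocks: int) -> None:
        self.erase_count = [0] * num_blocks
        self.free = set(range(num_blocks))
        self.mapping: dict[int, int] = {}  # logical page -> physical block

    def write(self, logical: int) -> int:
        # Flash cannot overwrite in place: the old block is erased
        # (its wear counter goes up) and returned to the free pool.
        old = self.mapping.get(logical)
        if old is not None:
            self.erase_count[old] += 1
            self.free.add(old)
        # Pick the least-worn free block; ties broken by block number.
        target = min(self.free, key=lambda b: (self.erase_count[b], b))
        self.free.remove(target)
        self.mapping[logical] = target
        return target


wl = WearLeveler(4)
for _ in range(100):
    wl.write(0)          # keep rewriting the same logical page
print(wl.erase_count)
```

In this example, logical page 0 is rewritten 100 times. The 99 resulting erasures are spread over all four blocks, and the erase counts differ by at most one. Without remapping, a single block would absorb every erasure.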

Disk Figure of Merit: Areal Density

Bits recorded along a track; Metric is Bits Per Inch (BPI)
Number of tracks per surface; Metric is Tracks Per Inch (TPI)
Bit density per unit area; Metric is Bits Per Square Inch: Areal Density = $\textrm{BPI} \times \textrm{TPI}$
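
A quick numeric check of the areal-density formula. The BPI and TPI values below are hypothetical and were picked only to make the arithmetic easy to follow.

```python
def areal_density(bpi: float, tpi: float) -> float:
    """Areal density in bits per square inch = BPI x TPI."""
    return bpi * tpi


# Hypothetical recording parameters: 2,000,000 BPI and 500,000 TPI
density = areal_density(2_000_000, 500_000)
print(f"{density / 1e12:.1f} Tbit/in^2")
```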

Disk Drive Performance

Disk Service Time: the time taken by a disk to complete an I/O request is the sum of
- Seek Time, Rotational Latency, and Data Transfer Time (request size divided by the data transfer rate in MB/s)
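
The sum above can be written as a small function. This sketch makes three assumptions:
- The average rotational latency is half a revolution.
- 1 MB = 1000 KB.
- Controller overhead and queuing delay are ignored.

The drive parameters in the example are hypothetical.

```python
def disk_service_time_ms(seek_ms: float, rpm: float,
                         request_kb: float, transfer_mb_per_s: float) -> float:
    """Seek time + average rotational latency + data transfer time, in ms."""
    # On average the requested sector is half a revolution away.
    rotational_latency_ms = 0.5 * 60_000 / rpm
    # KB divided by MB/s gives milliseconds when 1 MB = 1000 KB.
    transfer_ms = request_kb / transfer_mb_per_s
    return seek_ms + rotational_latency_ms + transfer_ms


# Hypothetical drive: 5 ms seek, 10,000 RPM, 100 KB request at 100 MB/s
print(disk_service_time_ms(5, 10_000, 100, 100))  # 5 + 3 + 1 = 9.0
```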

Utilization vs. Response time

The higher the utilization (the I/O request rate), the longer the response time.
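
To make the trend concrete, this sketch assumes an M/M/1 queue: Poisson arrivals, exponentially distributed service times, and a single server. Under that model, the mean response time is the service time divided by (1 - utilization). The 10 ms service time is a hypothetical value, and real disk workloads need not follow this model.

```python
def mm1_response_time(service_time_ms: float, utilization: float) -> float:
    """Mean response time of an M/M/1 queue: T_service / (1 - utilization)."""
    if not 0 <= utilization < 1:
        raise ValueError("utilization must be in [0, 1)")
    return service_time_ms / (1 - utilization)


for u in (0.1, 0.5, 0.8, 0.9):
    print(f"utilization {u:.0%}: {mm1_response_time(10, u):.1f} ms")
```

As utilization approaches 100%, the denominator approaches zero and the response time grows without bound. This is why heavily loaded disks feel disproportionately slow.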

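Parameters that characterize the dependability of storage devices

Reliability: the ability of a system to deliver service continuously, starting from its initial state
- Measured by MTTF (Mean Time to Failure)
Availability: the fraction of the interval between two consecutive failures during which the system is working properly
- Measured by $\frac{\textrm{MTTF}}{\textrm{MTTF} +\textrm{MTTR}}$ (MTTR: Mean Time To Repair; for storage, repair means data recovery)
- MTTF + MTTR = MTBF (Mean Time Between Failures)
Dependability: the degree to which the service can reasonably be regarded as reliable
- Dependability cannot be measured directly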
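
The two formulas above translate directly into code. The MTTF and MTTR values in the example are hypothetical.

```python
def mtbf(mttf_h: float, mttr_h: float) -> float:
    """MTBF = MTTF + MTTR."""
    return mttf_h + mttr_h


def availability(mttf_h: float, mttr_h: float) -> float:
    """Availability = MTTF / (MTTF + MTTR)."""
    return mttf_h / mtbf(mttf_h, mttr_h)


# Hypothetical device: MTTF of 999 hours, 1 hour to repair (recover the data)
print(mtbf(999, 1), availability(999, 1))  # 1000 0.999
```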

Use Arrays of Small Disks?

Replace Small Number of Large Disks with Large Number of Small Disks!

Disk arrays have the potential for large data and I/O rates, high MB per cubic foot, and high MB per kW, but what about reliability?

Array Reliability

Reliability of $N$ disks = Reliability of 1 disk $\div N$ (i.e., $\textrm{MTTF}_{\textrm{array}} = \textrm{MTTF}_{\textrm{disk}} / N$)
Arrays without redundancy are too unreliable to be useful!
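
This sketch shows why a non-redundant array is fragile. It assumes that disk failures are independent and that lifetimes are exponentially distributed, which is what makes $\textrm{MTTF}_{\textrm{disk}} / N$ the array MTTF. The 70-disk, 50,000-hour configuration is a hypothetical example.

```python
import math


def array_mttf(disk_mttf_h: float, n_disks: int) -> float:
    """MTTF of an N-disk array with no redundancy (any disk failure is fatal)."""
    return disk_mttf_h / n_disks


def prob_array_survives(disk_mttf_h: float, n_disks: int, hours: float) -> float:
    """Probability that no disk in the array fails within `hours`."""
    return math.exp(-hours / array_mttf(disk_mttf_h, n_disks))


# Hypothetical: 70 disks, each rated at 50,000 hours
print(round(array_mttf(50_000, 70)))             # about 714 hours
print(prob_array_survives(50_000, 70, 24 * 30))  # chance of surviving one month
```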

RAID

Redundant Arrays of (Inexpensive) Disks

Files are “striped” across multiple disks
Redundancy yields high data availability (disks will still fail)
- Availability: service is still provided to the user even if some components have failed
Contents are reconstructed from data stored redundantly in the array
- Capacity penalty to store redundant info
- Bandwidth penalty to update redundant info

RAID 0: Striping

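RAID 0: a non-redundant disk array that stores no redundant information.
Data is divided into stripes, which are interleaved across the disks one stripe at a time. The result behaves like a single larger disk whose members can work in parallel (in the figure, Stripe0, Stripe1, … are consecutive stripes; the size of a stripe is called the stripe width).

All disks can be read in parallel, so performance is high. However, there is no redundancy, so if any single disk fails the whole system stops working.
- Suitable for workloads that need high-bandwidth disk access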
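
Striping is essentially an address mapping. The sketch below shows one simple round-robin mapping. It assumes a hypothetical `stripe_blocks` parameter for the stripe width in blocks; real RAID implementations may lay data out differently.

```python
def raid0_locate(logical_block: int, n_disks: int,
                 stripe_blocks: int = 1) -> tuple[int, int]:
    """Map a logical block to (disk, block offset on that disk)."""
    stripe = logical_block // stripe_blocks   # which stripe the block is in
    disk = stripe % n_disks                   # stripes rotate over the disks
    row = stripe // n_disks                   # full rounds before this stripe
    offset = row * stripe_blocks + logical_block % stripe_blocks
    return disk, offset


# 4 disks, stripes 2 blocks wide: blocks 0-1 on disk 0, 2-3 on disk 1, ...
for b in range(10):
    print(b, raid0_locate(b, 4, 2))
```

Consecutive stripes land on different disks, so a large sequential request keeps all disks busy at once. The same mapping also means that losing any one disk leaves holes throughout the logical address space.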

RAID 1: Disk Mirroring/Shadowing

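Each disk is fully duplicated onto its “mirror”, so very high availability can be achieved

Bandwidth sacrifice on write: one logical write = two physical writes. The disk and its mirror are written in parallel and no parity has to be computed, so writes are faster than in the higher RAID levels.
Reads may be optimized: when reading from RAID 1, a disk and its mirror can work independently and simultaneously, and the data is supplied by whichever disk reads it first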
Most expensive solution: 100% capacity overhead
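
Mirroring can be sketched as a hypothetical `Raid1` class that stores integer "blocks" in two lists. As a simplification, it serves reads from the first healthy copy rather than from whichever disk answers first.

```python
class Raid1:
    """Toy two-way mirror (hypothetical class) holding integer blocks."""

    def __init__(self, n_blocks: int) -> None:
        self.disks = [[0] * n_blocks, [0] * n_blocks]
        self.failed = [False, False]

    def write(self, block: int, value: int) -> int:
        """Write to every healthy copy; returns the number of physical writes."""
        writes = 0
        for d in range(2):
            if not self.failed[d]:
                self.disks[d][block] = value
                writes += 1
        return writes

    def read(self, block: int) -> int:
        """Serve the read from the first healthy copy."""
        for d in range(2):
            if not self.failed[d]:
                return self.disks[d][block]
        raise IOError("both copies have failed")


mirror = Raid1(8)
print(mirror.write(3, 42))   # one logical write = 2 physical writes
mirror.failed[0] = True      # lose one disk
print(mirror.read(3))        # still 42, served by the mirror
```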

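RAID 2: Bit-interleaved Hamming Code Array

Each data disk stores one bit of every data word, interleaved bit by bit: Disk0 holds bit 0 of every data word, Disk1 holds bit 1, and so on. A Hamming code is computed over the corresponding bits of the data disks, and the code bits are stored in the corresponding positions on several check (ECC) disks.
When data is read from the data disks, the Hamming code is read as well and used to detect and correct errors (an extended Hamming code can correct single-bit errors and detect double-bit errors).

Several disks are needed to hold the Hamming check information. The number of redundant disks grows roughly with the logarithm of the number of data disks ($\log_2 m$, where $m$ is the number of data disks).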
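
The exact count of check disks comes from the Hamming condition $2^r \ge m + r + 1$, which is why it grows like $\log_2 m$. The sketch below finds the smallest such $r$ for single-error correction. Adding double-error detection (extended Hamming) needs one more check disk.

```python
def hamming_check_disks(m: int) -> int:
    """Smallest r with 2**r >= m + r + 1: check disks needed for a
    single-error-correcting Hamming code over m data disks."""
    r = 0
    while 2 ** r < m + r + 1:
        r += 1
    return r


for m in (4, 10, 32):
    print(m, "data disks ->", hamming_check_disks(m), "check disks")
```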

RAID 3: Bit-interleaved Parity Disk

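When a disk fails, the disk controller itself can tell which disk is faulty. A complex Hamming code is therefore unnecessary, and simple parity suffices.

Logically, a single high-capacity, high-transfer-rate disk: good for large transfers. RAID 3 provides single-disk fault tolerance with parallel transfer. It is a fine-grained array, i.e., the stripe width is small (1 byte or 1 bit). Almost every I/O request is therefore served by all disks in the array, which yields a very high data transfer rate.
$1 / N$ capacity cost for parity with $N$ data disks and $1$ parity disk
- Wider arrays reduce capacity costs, but decrease reliability/availability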
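
The sketch below shows why parity alone is enough once the failed disk is known. The parity block is the XOR of the data blocks, so XOR-ing the surviving blocks with the parity regenerates the missing one. The three data blocks are hypothetical contents.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three hypothetical data disks plus one parity disk
data = [b"\x01\x02\x03", b"\x10\x20\x30", b"\xff\x00\x0f"]
parity = xor_blocks(data)

# The controller reports that disk 1 failed: rebuild it from the survivors
rebuilt = xor_blocks([data[0], data[2], parity])
print(rebuilt == data[1])
```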

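RAID 3 read/write characteristics

Assume 4 data disks and 1 redundant disk:
- Reading data takes 5 disk reads in total (the 4 data disks and the redundant disk are read simultaneously)
- Writing data takes 3 disk reads and 2 disk writes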
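
One way to arrive at the 3 reads + 2 writes count is a "reconstruct-write" strategy:
1. Read the other three data disks.
2. Recompute the parity from scratch.
3. Write the new data and the new parity.

The sketch below assumes that strategy. It uses a hypothetical helper `reconstruct_write` that returns the operation counts alongside the new parity.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))


def reconstruct_write(data: list[bytes], index: int, new_block: bytes):
    """Replace data[index]; returns (updated data, new parity, (reads, writes))."""
    others = [blk for i, blk in enumerate(data) if i != index]
    reads = len(others)                      # read every other data disk
    new_parity = xor_blocks(others + [new_block])
    updated = list(data)
    updated[index] = new_block
    writes = 2                               # new data block + new parity block
    return updated, new_parity, (reads, writes)


data = [b"\x01", b"\x02", b"\x04", b"\x08"]  # 4 hypothetical data disks
updated, parity, counts = reconstruct_write(data, 2, b"\x40")
print(counts)                                # (3, 2)
```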

RAID 4: Block-interleaved Parity Disk

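Inspiration for RAID 4

In RAID 3, every disk access operates on all disks in the array. RAID 4 aims to involve fewer disks in each operation, so that the array can carry out several independent disk operations in parallel.

In RAID 4, data is stored block-interleaved across the disks, and the parity is kept on one dedicated disk (the parity disk). The redundancy cost is the same as in RAID 3. However, RAID 4 is a coarse-grained array: larger stripes (blocks) are the unit of interleaving and of parity computation. Data is therefore accessed differently from RAID 3.
- Small read: every block has an error detection field, so each disk performs reads on its own. This allows independent reads to different disks simultaneously (the parity disk is read only when a disk has failed and its data must be reconstructed)
  - To catch errors on read, rely on the error detection field rather than the parity disk
- Large write: because the parity must be recomputed, a write accesses almost all disks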

RAID 5: Block-interleaved Distributed Parity

Inspiration for RAID 5

Small writes (write to one disk): since P holds the old sum, compare the old data with the new data and add the difference to P

Small Write Algorithm

1 Logical Write = 2 Physical Reads + 2 Physical Writes
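
The small-write algorithm relies on the fact that XOR is its own inverse:
- New P = old P XOR old D XOR new D.
- Only the target data disk and the parity disk are touched: 2 reads, then 2 writes.

The sketch below checks the shortcut against a full recomputation. The disk contents are hypothetical.

```python
from functools import reduce


def full_parity(blocks: list[bytes]) -> bytes:
    """Parity recomputed from scratch (would read every data disk)."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))


def small_write_parity(old_data: bytes, new_data: bytes, old_parity: bytes) -> bytes:
    """Read-modify-write: new P = old P XOR old D XOR new D.
    Needs 2 physical reads (old D, old P) and 2 writes (new D, new P)."""
    return bytes(p ^ o ^ n for p, o, n in zip(old_parity, old_data, new_data))


disks = [b"\x01\x02", b"\x03\x04", b"\x05\x06", b"\x07\x08"]
parity = full_parity(disks)

new_block = b"\xaa\xbb"
parity = small_write_parity(disks[1], new_block, parity)
disks[1] = new_block
print(parity == full_parity(disks))
```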

Problems of Disk Arrays: Small Writes

Small writes are limited by the parity disk:
- Writes to $D_0$ and $D_5$ both also write to the P disk (so $D_0$ and $D_5$ still cannot be written at the same time)
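
The bottleneck can be reproduced with a dedicated-parity layout. This sketch makes two assumptions:
- There are 5 disks: 4 data disks plus parity on the last disk.
- Blocks are numbered row by row.

Under that layout, small writes to $D_0$ and $D_5$ land on different data disks but share the parity disk.

```python
def raid4_locate(block: int, n_disks: int) -> tuple[int, int, int]:
    """Return (stripe, data disk, parity disk) for a logical block.
    Parity always lives on the last disk (the dedicated parity disk)."""
    data_disks = n_disks - 1
    return block // data_disks, block % data_disks, n_disks - 1


def disks_touched_by_small_write(block: int, n_disks: int) -> set[int]:
    _, data_disk, parity_disk = raid4_locate(block, n_disks)
    return {data_disk, parity_disk}


d0 = disks_touched_by_small_write(0, 5)
d5 = disks_touched_by_small_write(5, 5)
print(d0, d5, "shared:", d0 & d5)
```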

RAID 5: High I/O Rate Interleaved Parity

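To solve the problem above, the parity is distributed over all disks in the array, and there is no dedicated redundant disk. The parity block of each row is shifted and rotated cyclically across the disks, so the parity is spread evenly over all disks.
- Independent writes possible because of interleaved parity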
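
Rotating the parity removes the shared hot spot. The sketch below assumes one possible rotation on 5 disks, in which the parity moves one disk to the left on each successive stripe. Other layouts (for example, the left/right symmetric variants used by some implementations) work the same way in principle. With this rotation, small writes to $D_0$ and $D_5$ touch disjoint disks and can proceed in parallel.

```python
def raid5_locate(block: int, n_disks: int) -> tuple[int, int, int]:
    """Return (stripe, data disk, parity disk) for a logical block.
    Parity moves one disk to the left on each successive stripe."""
    stripe = block // (n_disks - 1)
    parity_disk = (n_disks - 1 - stripe) % n_disks
    # Data blocks fill the remaining disks of the stripe in order.
    data_disks = [d for d in range(n_disks) if d != parity_disk]
    return stripe, data_disks[block % (n_disks - 1)], parity_disk


def disks_touched_by_small_write(block: int, n_disks: int) -> set[int]:
    _, data_disk, parity_disk = raid5_locate(block, n_disks)
    return {data_disk, parity_disk}


d0 = disks_touched_by_small_write(0, 5)
d5 = disks_touched_by_small_write(5, 5)
print(d0, d5, "shared:", d0 & d5)
```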

RAID 6: Independent-access Disk Array with Dual Parity

Inspiration:

Recovering from 2 failures

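RAID 6 characteristics

Independent-access disk array with dual parity: RAID 6 adds a second, independent piece of check information to RAID 5, stored on another check disk. A write accesses 1 data disk and 2 redundant disks, and the array can tolerate two failed disks.
Data is stored block-interleaved across the disks, and the error detection/correction information is distributed evenly over all disks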
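
The text does not specify what the second check code is. The sketch below swaps in the P+Q Reed-Solomon scheme, which is one common way to build RAID 6 and not necessarily the scheme in the reference:
- P is plain XOR parity.
- Q is a weighted XOR in which each data byte is multiplied by $g^i$ in GF(2^8).

The field polynomial 0x11D and generator $g = 2$ are assumed here as a commonly used choice. Because P and Q give two independent equations, any two lost data disks can be solved for. The data values are hypothetical.

```python
def gf_mul(a: int, b: int) -> int:
    """Multiply in GF(2^8) modulo x^8 + x^4 + x^3 + x^2 + 1 (0x11D)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11D
        b >>= 1
    return result


def gf_pow(a: int, n: int) -> int:
    result = 1
    for _ in range(n):
        result = gf_mul(result, a)
    return result


def gf_inv(a: int) -> int:
    # Nonzero elements form a group of order 255, so a^254 = a^-1.
    return gf_pow(a, 254)


def compute_pq(data: list[list[int]]) -> tuple[list[int], list[int]]:
    """P = XOR of the data bytes; Q = XOR of g^i * D_i with g = 2."""
    p = [0] * len(data[0])
    q = [0] * len(data[0])
    for i, block in enumerate(data):
        coef = gf_pow(2, i)
        for k, byte in enumerate(block):
            p[k] ^= byte
            q[k] ^= gf_mul(coef, byte)
    return p, q


def recover_two(data, x: int, y: int, p: list[int], q: list[int]):
    """Rebuild data disks x and y (their entries in `data` may be None)."""
    gx, gy = gf_pow(2, x), gf_pow(2, y)
    inv = gf_inv(gx ^ gy)
    dx, dy = [], []
    for k in range(len(p)):
        # Remove the surviving disks' contributions from P and Q.
        pxy, qxy = p[k], q[k]
        for i, block in enumerate(data):
            if i not in (x, y):
                pxy ^= block[k]                       # -> Dx ^ Dy
                qxy ^= gf_mul(gf_pow(2, i), block[k])  # -> gx*Dx ^ gy*Dy
        # Solve: Dx = (qxy ^ gy*pxy) / (gx ^ gy), then Dy = pxy ^ Dx
        bx = gf_mul(qxy ^ gf_mul(gy, pxy), inv)
        dx.append(bx)
        dy.append(pxy ^ bx)
    return dx, dy


data = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]
p, q = compute_pq(data)
damaged = [data[0], None, data[2], None]   # disks 1 and 3 fail together
print(recover_two(damaged, 1, 3, p, q))
```

If one of the lost disks is the P or Q disk itself, recovery is simpler: rebuild the missing data disk from whichever check value survives, then recompute the lost check value.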

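Implementing RAID

Software approach: the array-management software runs on the host
- Pros: low cost
- Cons: takes up too much host time, and bandwidth stays low
RAID card approach: the RAID management software is embedded in firmware on an I/O controller card, so it does not consume host time. This approach is typically used in workstations and PCs.
Subsystem approach: an open platform based on a general-purpose interface bus, which can be used with all kinds of host platforms and network systems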

Storage Environment

Direct Attached Storage (DAS)

Servers connect directly to the disk array, typically via a SCSI interface.

Network Attached Storage (NAS)

Network-attached storage: a file system on the network

Servers provide services, while a separate, dedicated system is responsible for storage
NAS devices access the disks in an array via direct connection or through external connectivity

Storage Area Network (SAN)

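Storage area network: disks on the network

Servers access the disk array through a dedicated network designated as the SAN, consisting of Fibre Channel switches. In other words, a network is built specifically for traffic between the storage media and the servers.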
