Computer Architecture: Introduction; Fundamentals of Quantitative Design and Analysis

Latest recommended article published on 2021-10-26 10:51:29

连理o


Category: Computer Architecture

Article link: https://blog.csdn.net/weixin_42437114/article/details/114288068


Reference: Computer Architecture: A Quantitative Approach (6th Edition)

Moore’s Law

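The number of circuits integrated on an integrated-circuit chip doubles roughly every 18 months
Microprocessor performance doubles every 18 months, while the price halves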
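
As a rough worked illustration (assuming an idealized doubling period of exactly 18 months; the symbols $N_0$ and $t$ are introduced here and do not come from the reference), the circuit count after $t$ months would be
$N(t)=N_0\times 2^{t/18}$
so over 10 years ( $t=120$ ) the growth factor would be about $2^{120/18}\approx 2^{6.67}\approx 100$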

What is Computer Architecture?

Classical Definition

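The conceptual structure and functional characteristics that an (assembly-language) programmer must understand in order to write programs that run correctly on the machine
- Data representation (the data types the hardware can directly recognize and process), addressing rules (including the minimum addressable unit, the addressing modes, and their encoding), register definitions (the definition, number, and usage of each kind of register), the instruction set (including the operation types and formats of machine instructions, and the sequencing and control mechanisms between instructions), the interrupt system (the interrupt types and the functions of the interrupt-response hardware), the definition and switching of machine operating states (e.g., supervisor mode and user mode), the storage system (main-memory capacity, the maximum storage capacity available to the programmer, etc.), information protection (including the protection schemes and the hardware support for them), and the I/O structure (including how I/O devices are connected, the manner and format of data transfer between the processor/memory and I/O devices, and the status of I/O operations)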

Broadest Definition

Designing the abstraction layers, using the available manufacturing technologies, so that applications run efficiently

Computer Architecture vs. Computer Organization vs. Computer Implementation

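Computer organization: the logical implementation of a computer architecture. It covers the organization of the data flow and control flow within the physical machine level, as well as the logic design
Computer implementation: the physical implementation of a computer organization. It focuses on device technology and micro-assembly (packaging) technology
One architecture can have multiple organizations, and one organization can have multiple physical implementations
- For example: does the instruction set include a multiply instruction? $\rightarrow$ computer architecture; is the multiply instruction implemented with a multiplier or with multi-step addition? $\rightarrow$ computer organization; how is the multiplier/adder physically realized (device selection, assembly method)? $\rightarrow$ computer implementation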
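
The multiply example can be mimicked in software as a loose analogy. The sketch below is only an illustration, not something from the reference: the function names are hypothetical, and both functions assume non-negative integer operands. The point is that one visible behavior (the "architecture") admits more than one internal realization (the "organization").

```python
def multiply_repeated_add(a: int, b: int) -> int:
    """Multiply by adding `a` to an accumulator `b` times (b must be >= 0)."""
    result = 0
    for _ in range(b):
        result += a
    return result


def multiply_shift_add(a: int, b: int) -> int:
    """Multiply by shift-and-add: one conditional add per bit of `b` (b must be >= 0)."""
    result = 0
    while b > 0:
        if b & 1:        # lowest remaining bit of the multiplier is set
            result += a  # add the multiplicand at its current shift
        a <<= 1          # next bit of b carries twice the weight
        b >>= 1
    return result


if __name__ == "__main__":
    # Same result from the caller's point of view, different internal steps.
    print(multiply_repeated_add(6, 7), multiply_shift_add(6, 7))
```

Both functions return the same product; they differ only in how many internal steps they take, which mirrors how two organizations of the same architecture differ in cost and speed but not in what the programmer sees.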

The End of the Uniprocessor Era

Some Walls

“Power wall”: Power expensive, Transistors free
“ILP wall”: law of diminishing returns on more HW for ILP
“Memory wall”: Memory slow, multiplies fast
- Power Wall + ILP Wall + Memory Wall = Brick Wall

Multiple “Cores”

More, simpler processors are more power efficient $\Rightarrow$ Sea change in chip design: multiple “cores” (2X processors per chip / ~ 2 years)

New Moore’s Law ?

Future computer hardware will no longer get faster, just “wider”
- TLP: 2+ cores / 2 years; DLP: 2x width / 4 years
- Performance Trends: Bandwidth Over Latency. DLP will account for more mainstream parallelism growth than TLP in next decade

Fundamentals of Computer Design

Focus on the Common Case

In making a design trade-off, favor the frequent case over the infrequent case
- E.g., Instruction fetch and decode unit used more frequently than multiplier, so optimize it 1st
- E.g., If database server has 50 disks / processor, storage dependability dominates system dependability, so optimize it 1st
Frequent case is often simpler and can be done faster than the infrequent case
- E.g., overflow is rare when adding 2 numbers, so improve performance by optimizing more common case of no overflow
- May slow down overflow, but overall performance improved by optimizing for the normal case

What the frequent case is, and how much performance improves when that case is made faster $\rightarrow$ Amdahl’s Law

Amdahl’s Law

$\boldsymbol{\textsf{Fraction}_{\textsf{enhanced}}}$ (enhanced fraction)

The fraction of the total time the computer spends on a task that can benefit from the enhancement
- E.g., if 20s of the execution time of a program (takes 60s in total) can use an enhancement, the fraction is 20/60.

$\boldsymbol{\textsf{Speedup}_{\textsf{enhanced}}}$ (speedup of the enhanced part)

The improvement gained by the enhanced execution mode, i.e., the time of the original mode divided by the time of the enhanced mode. ( $\textsf{Speedup}_{\textsf{enhanced}}$ is always greater than 1.)
- E.g., if the enhanced mode takes 2s for a portion of the program, while it is 5s in the original mode, the improvement is 5/2.

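[Figure: total task time before (left) and after (right) the enhancement]

The red part is the portion that cannot be optimized and the gray part is the portion that can. The left figure shows the total task time before the optimization; the right figure shows the total task time after it
$\textsf{Fraction}_{\textsf{enhanced}}$ is the gray part of the left figure divided by the total duration in the left figure; $\textsf{Speedup}_{\textsf{enhanced}}$ is the duration of the gray part in the left figure divided by the duration of the gray part in the right figure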

Amdahl’s Law

$\textsf{ExTime}_{\textsf{new}}=\textsf{ExTime}_{\textsf{old}}\times\bigg[(1-\textsf{Fraction}_{\textsf{enhanced}})+\frac{\textsf{Fraction}_{\textsf{enhanced}}}{\textsf{Speedup}_{\textsf{enhanced}}}\bigg]$
$\textsf{Speedup}_{\textsf{overall}}=\frac{\textsf{ExTime}_{\textsf{old}}}{\textsf{ExTime}_{\textsf{new}}}=\frac{1}{(1-\textsf{Fraction}_{\textsf{enhanced}})+\frac{\textsf{Fraction}_{\textsf{enhanced}}}{\textsf{Speedup}_{\textsf{enhanced}}}}$

ExTime (Execution Time)

Best you could ever hope to do:
$\textsf{Speedup}_{\textsf{maximum}}=\frac{1}{(1-\textsf{Fraction}_{\textsf{enhanced}})}$
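Amdahl’s Law tells us that when one component of a system adopts a faster execution mode, the resulting improvement in overall system performance depends on how often that mode is used, i.e., on the fraction of the total execution time it accounts for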

Amdahl’s Law example

CPU fraction is 40%, New CPU 10X faster
I/O bound server, so 60% time waiting for I/O

$\begin{aligned}\textsf{Speedup}_{\textsf{overall}}&=\frac{1}{(1-\textsf{Fraction}_{\textsf{enhanced}})+\frac{\textsf{Fraction}_{\textsf{enhanced}}}{\textsf{Speedup}_{\textsf{enhanced}}}} \\&=\frac{1}{(1-0.4)+\frac{0.4}{10}}=1.56\end{aligned}$

Apparently, it’s human nature to be attracted by “10X faster” rather than keeping in perspective that the system is just 1.6X faster
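
The formulas above can be sketched in a few lines of Python. This is a minimal illustration; the function names `amdahl_speedup` and `amdahl_limit` are made up here and do not come from the reference. It sweeps the CPU speedup for the 40% CPU / 60% I/O server to show how quickly the overall speedup approaches its ceiling.

```python
def amdahl_speedup(fraction_enhanced: float, speedup_enhanced: float) -> float:
    """Overall speedup predicted by Amdahl's Law."""
    if not 0.0 <= fraction_enhanced <= 1.0:
        raise ValueError("fraction_enhanced must lie in [0, 1]")
    if speedup_enhanced <= 0.0:
        raise ValueError("speedup_enhanced must be positive")
    unenhanced = 1.0 - fraction_enhanced
    return 1.0 / (unenhanced + fraction_enhanced / speedup_enhanced)


def amdahl_limit(fraction_enhanced: float) -> float:
    """Upper bound on the overall speedup as speedup_enhanced grows without bound."""
    if not 0.0 <= fraction_enhanced < 1.0:
        raise ValueError("fraction_enhanced must lie in [0, 1)")
    return 1.0 / (1.0 - fraction_enhanced)


if __name__ == "__main__":
    # The I/O-bound server from the example: 40% CPU time, 60% waiting for I/O.
    for cpu_speedup in (2, 10, 100, 1000):
        overall = amdahl_speedup(0.4, cpu_speedup)
        print(f"{cpu_speedup:>5}X CPU -> {overall:.2f}X overall")
    print(f"ceiling -> {amdahl_limit(0.4):.2f}X overall")
```

Even an arbitrarily fast CPU cannot push this server past $1/(1-0.4)\approx 1.67$, because the 60% of the time spent waiting for I/O is untouched.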

Processor performance equation

CPI - Cycles Per Instruction

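CPI: the average number of clock cycles needed to execute one instruction
$CPI=\frac{\sum_{i=1}^n(CPI_i\times I_i)}{I_N}$ , where $I_i$ is the number of times instructions of class $i$ are executed in the program; $CPI_i$ is the average number of cycles needed to execute one instruction of class $i$ ; $n$ is the number of instruction classes in the program; and $I_N$ is the total number of instructions executed by the program. It can also be rewritten as
$CPI=\sum_{i=1}^n(CPI_i\times \frac{I_i}{I_N})$ , where $\frac{I_i}{I_N}$ is the fraction of the program’s instructions that belong to class $i$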
Example: Branch Impact
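- Assume CPI = 1.0 ignoring branches, 30% of the instructions are branches, and the chosen solution stalls for 3 cycles on each branch (in a pipeline, because of control hazards, every branch instruction stalls the pipeline for three clock cycles)
- new CPI = $0.7\times 1 + 0.3\times(1+3) = 0.7 + 1.2 = 1.9$ , or almost 2 times slower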
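
The count-based CPI formula can be checked with a short sketch. The function name `average_cpi` and the choice of 100 instructions as the program size are assumptions made here for illustration only.

```python
def average_cpi(instruction_mix):
    """Average CPI from (cpi_i, count_i) pairs: sum(CPI_i * I_i) / I_N."""
    total_instructions = sum(count for _, count in instruction_mix)
    if total_instructions <= 0:
        raise ValueError("the mix must contain at least one instruction")
    total_cycles = sum(cpi * count for cpi, count in instruction_mix)
    return total_cycles / total_instructions


if __name__ == "__main__":
    # Branch example: per 100 instructions, 70 non-branches at CPI 1
    # and 30 branches at CPI 1 + 3 stall cycles = 4.
    branch_mix = [(1.0, 70), (4.0, 30)]
    print(average_cpi(branch_mix))
```

With this mix the function returns 1.9, matching the hand calculation: the 30 branches account for 120 of the 190 cycles.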

[Figure: processor performance equation]

$\frac{\textsf{Seconds}}{\textsf{Program}}\ \rightarrow$ Seconds per program;
$\textsf{inst}\ \rightarrow$ instruction; (the machine instructions obtained after the program is compiled)
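
For reference, the standard form of the processor performance equation is given below. It is stated here from general knowledge, on the assumption that it matches the image that belongs at this point.
$\textsf{CPU time}=\frac{\textsf{Seconds}}{\textsf{Program}}=\frac{\textsf{Instructions}}{\textsf{Program}}\times\frac{\textsf{Cycles}}{\textsf{Instruction}}\times\frac{\textsf{Seconds}}{\textsf{Cycle}}$
The three factors are the instruction count, the CPI, and the clock cycle time (the reciprocal of the clock rate).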

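[Figure: table showing which design aspects affect Inst Count, CPI, and Clock Rate]

The figure above has five parts; an $\times$ marks that the part is related to the corresponding factor among Inst Count, CPI, and Clock Rate
For example, for Inst. Set., using RISC yields more machine instructions, but each instruction takes less time to execute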

Performance example

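Suppose floating-point (FP) instructions make up 25% of all instructions, and floating-point square root (FPSQR) makes up 2% of all instructions. The CPI of FP operations is 4, the CPI of FPSQR operations is 20, and the average CPI of the other (non-floating-point) instructions is 1.33

(1) What is the average number of cycles per instruction?
$CPI_{\text{old}} = 4\times 0.25+1.33\times 0.75 = 1.9975 \approx 2$
(2) If the CPI of all FP operations is reduced to 2, what is the average CPI?
- Method 1: $CPI_{\text{new}}= 2\times 0.25+1.33\times 0.75 = 1.4975 \approx 1.5$
- Method 2: $CPI_{\text{new}}= CPI_{\text{old}} -0.25\times(4-2) = 2-0.5=1.5$
(3) If the CPI of FPSQR operations is reduced to 2, what is the average CPI?
$CPI_{\text{new}}= CPI_{\text{old}} -0.02\times(20-2) = 2-0.36=1.64$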
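
The three answers can be reproduced with the fraction-based CPI formula. This is a minimal sketch; the names `weighted_cpi` and `cpi_after_improvement` are hypothetical helpers introduced here. Unlike the hand calculation, the code does not round $CPI_{\text{old}}$ to 2 before subtracting, so the last digits differ slightly.

```python
def weighted_cpi(mix):
    """Average CPI from (fraction_of_instructions, cpi) pairs; fractions should sum to 1."""
    return sum(fraction * cpi for fraction, cpi in mix)


def cpi_after_improvement(cpi_old, fraction, cpi_before, cpi_after):
    """New average CPI when one instruction class drops from cpi_before to cpi_after."""
    return cpi_old - fraction * (cpi_before - cpi_after)


# (1) Original machine: 25% FP at CPI 4, 75% other instructions at CPI 1.33.
cpi_old = weighted_cpi([(0.25, 4.0), (0.75, 1.33)])

# (2) All FP operations improved from CPI 4 to CPI 2.
cpi_fp = cpi_after_improvement(cpi_old, 0.25, 4.0, 2.0)

# (3) Only FPSQR (2% of all instructions) improved from CPI 20 to CPI 2.
cpi_fpsqr = cpi_after_improvement(cpi_old, 0.02, 20.0, 2.0)

if __name__ == "__main__":
    print(f"old: {cpi_old:.4f}  all FP: {cpi_fp:.4f}  FPSQR only: {cpi_fpsqr:.4f}")
```

Without the intermediate rounding, the results are about 1.9975, 1.4975, and 1.6375. The conclusion is unchanged: improving all FP operations lowers the average CPI more than improving FPSQR alone.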
