Computer Organization Chapter 2 Notes: Computer Evolution and Performance

Hydrion-Qlz

Article link: https://blog.csdn.net/qq_46311811/article/details/122351091

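This article is compiled from the lecture slides of Professor Li Chen, School of Software, Xi'an Jiaotong University. It is for study purposes only; please do not repost.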

Index of the Computer Organization note series: Computer Organization notes and mind maps, with review advice (Qlz's blog, CSDN)

Contents
Chapter mind map
Brief History of Computer
Designing for Performance
- Microprocessor speed
Multicore, MICs and GPGPUs
Embedded System and the ARM
- Embedded System
- Acorn RISC Machine (ARM)
Performance Assessment
Vocabulary
Key points

Chapter mind map

[Figure: mind map of computer evolution and performance]

Brief History of Computer

1950 ~ 59 Vacuum tube
1960 ~ 68 Transistor
1969 ~ 77 Integrated Circuits
1978 ~ ? Large-scale integration (LSI) and Very-large-scale integration (VLSI)
2009~? Intelligence

Vacuum tube Computer

ENIAC

The first general-purpose computer

Supports conditional jumps and is programmable, which distinguishes it from earlier machines
Used for computing artillery firing tables
Started 1943 and Finished 1946

details

Decimal (not binary)
20 accumulators of 10 digits
Programmed manually by switches
18,000 vacuum tubes
30 tons
15,000 square feet
140 kW power consumption
5,000 additions per second

Turing machine

Von Neumann/Turing(IAS)

Stored program concept
Begun in 1946, but not completed until 1952
Main memory storing programs and data
ALU operating on binary data
Control unit interpreting instructions from memory and executing them
Input and output equipment operated by control unit
Princeton Institute for Advanced Studies – IAS

Structure of the IAS computer

IAS Memory Formats

The memory of the IAS consists of 1000 storage locations (called words) of 40 bits each
- Both data and instructions are stored there
- Numbers are represented in binary form and each instruction is a binary code

IAS Registers

Memory buffer register (MBR)
- Contains a word to be stored in memory or sent to the I/O unit
- Or is used to receive a word from memory or from the I/O unit
- Instructions and data fetched from memory, and data to be written to memory, all pass through the MBR
Memory address register (MAR)
- Specifies the address in memory of the word to be written from or read into the MBR
- That is, the address being operated on (instruction fetch, data read, or data write)
Instruction register (IR)
- Contains the 8-bit opcode of the instruction being executed
Instruction buffer register (IBR)
- Employed to temporarily hold the right-hand instruction from a word in memory
Program counter (PC)
- Contains the address of the next instruction pair to be fetched from memory
Accumulator (AC) and multiplier quotient (MQ)
- Employed to temporarily hold operands and results of ALU operations
- AC: temporarily stores the results of ALU operations

Expanded structure of IAS computer

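First, the address of the next instruction, held in the PC, is loaded into the MAR. The control unit uses the address in the MAR to fetch the instruction word from main memory into the MBR. The word in the MBR is split into its left and right instructions: the left instruction goes to the IR and the right instruction goes to the IBR. The control unit then decodes the instruction in the IR and generates a series of control signals. If the instruction operates on data, the address of the data is placed in the MAR, and the control unit uses that address to locate the data in main memory and load it into the MBR. When the left instruction is finished, the right instruction is moved from the IBR into the IR and handled in the same way. Any data exchanged with I/O must also pass through the MBR first.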
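The fetch sequence described above can be sketched as a handful of register transfers. This is only an illustrative sketch, not part of the original slides. It assumes the standard IAS word layout, which the notes do not spell out: each 40-bit word holds two 20-bit instructions, and each instruction is an 8-bit opcode followed by a 12-bit address. The function names `split_word`, `decode` and `fetch_pair`, and the sample opcode and address values, are hypothetical.

```python
# Minimal sketch of the IAS instruction-fetch register transfers.
# Assumed layout: 40-bit word = two 20-bit instructions,
# each instruction = 8-bit opcode + 12-bit address.

def split_word(word):
    """Split a 40-bit word into its (left, right) 20-bit instructions."""
    left = (word >> 20) & 0xFFFFF
    right = word & 0xFFFFF
    return left, right


def decode(instruction):
    """Split a 20-bit instruction into (opcode, address)."""
    opcode = (instruction >> 12) & 0xFF
    address = instruction & 0xFFF
    return opcode, address


def fetch_pair(memory, pc):
    """Fetch one instruction pair and return the register contents."""
    mar = pc                       # MAR <- PC
    mbr = memory[mar]              # MBR <- M[MAR]
    left, right = split_word(mbr)  # the word in the MBR is split in two
    ir, mar = decode(left)         # IR <- opcode, MAR <- operand address
    ibr = right                    # IBR holds the right instruction for later
    return {"IR": ir, "MAR": mar, "IBR": ibr}


# Hypothetical word: left = opcode 0x01 / address 0x0FA,
#                    right = opcode 0x05 / address 0x0FB.
word = (0x010FA << 20) | 0x050FB
registers = fetch_pair({0: word}, pc=0)
print(registers)
```

After the left instruction completes, the same `decode` step would be applied to the value held in `IBR`, which mirrors the "move the right instruction from the IBR into the IR" step in the text.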

IAS Instruction Set

21 Instructions
Data Transfer
Unconditional Branch
Conditional Branch
Arithmetic
Address Modify

UNIVAC

the Universal Automatic Computer

The first computer company: Electronic Control Corp

UNIVAC tasks involve scientific and commercial applications

Transistor Computer

More complex arithmetic and logic units and control units
The use of high-level programming languages
Provision of system software which provided the ability to:
- load programs
- move data to peripherals and libraries
- perform common computations

IC Computer

Integrated Circuits (IC)

Computers based on SSI and MSI form the third generation
- SSI: small-scale integration
- MSI: medium-scale integration

Family concept

Similar or identical instruction set
Similar or identical operating system
Increasing speed, increasing number of I/O ports, increasing memory size and increasing cost

Transistors, resistors and capacitors can all be made from semiconductor material, so an entire circuit can be placed on a single silicon wafer

Moore’s Law

1965, Gordon Moore, cofounder of Intel

The number of transistors on a chip will double every year

Since the 1970s development has slowed a little: the number of transistors doubles every 18 months

Cost of a chip has remained almost unchanged
Higher packing density means shorter electrical paths, giving higher performance
Smaller size gives increased flexibility
Reduced power and cooling requirements
Fewer interconnections increases reliability
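To get a feel for what the two doubling periods mean, the growth can be computed directly. This is a minimal sketch under the assumption of ideal exponential growth; the function name `projected_transistors` and the starting count of 1,000 are hypothetical and not taken from the slides.

```python
def projected_transistors(initial_count, years, doubling_period_years):
    """Transistor count after `years`, doubling every `doubling_period_years`."""
    return initial_count * 2 ** (years / doubling_period_years)


# Doubling every year: 10 years gives a factor of 2**10 = 1024.
print(projected_transistors(1_000, 10, 1.0))   # 1024000.0

# Doubling every 18 months: 3 years gives a factor of 2**2 = 4.
print(projected_transistors(1_000, 3, 1.5))    # 4000.0
```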

LSI & VLSI Computer

Semiconductor memories

The first relatively capacious semiconductor memory (1970, Fairchild)

Quantity and Unit in common use

Bit – b
Byte – B: 8 bits = $2^{3}$ bits
K: kilo (Hz, bytes): $10^{3}$ (decimal) or $1024 = 2^{10}$ (binary)
M: mega (Hz, bytes): $10^{6}$ or $1024^{2} = 2^{20}$
G: giga (Hz, bytes): $10^{9}$ or $1024^{3} = 2^{30}$
T: tera (Hz, bytes): $10^{12}$ or $1024^{4} = 2^{40}$
P: peta (Hz, bytes): $10^{15}$ or $1024^{5} = 2^{50}$
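The gap between the decimal and binary readings of each prefix grows with the prefix. The short sketch below computes both values; the names `PREFIX_RANK`, `decimal_value` and `binary_value` are hypothetical helpers introduced only for illustration.

```python
# Rank of each prefix: K = 1, M = 2, ... so that
# decimal value = 10**(3 * rank) and binary value = 2**(10 * rank).
PREFIX_RANK = {"K": 1, "M": 2, "G": 3, "T": 4, "P": 5}


def decimal_value(prefix):
    """Decimal meaning of the prefix, e.g. K -> 10**3."""
    return 10 ** (3 * PREFIX_RANK[prefix])


def binary_value(prefix):
    """Binary meaning of the prefix, e.g. K -> 2**10 = 1024."""
    return 2 ** (10 * PREFIX_RANK[prefix])


for prefix in PREFIX_RANK:
    ratio = binary_value(prefix) / decimal_value(prefix)
    print(prefix, decimal_value(prefix), binary_value(prefix), round(ratio, 4))
```

The decimal reading is the usual one for frequencies (Hz), while memory capacities are typically counted in powers of two.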

Designing for Performance

Microprocessor speed

Techniques for making full use of the CPU's speed

Branch prediction
Data flow analysis
Speculative execution

The speed of other critical components lags behind the speed of the CPU

The CPU has to wait
These components become a bottleneck
Overall system performance is reduced
This applies especially to main memory

Solutions

Optimize system structure, balancing the whole performance of CPU, memory and I/O

Improve the interface between CPU and memory
- The interface is the key path responsible for transferring instruction and data
- Increase number of bits retrieved at one time
  - Make DRAM “wider” rather than “deeper”
- Change DRAM interface
  - Cache
- Reduce frequency of memory access
  - More complex cache and cache on chip
- Increase interconnection bandwidth
  - High speed buses
  - Hierarchy of buses
Caching and buffering schemes
Higher-speed interconnection buses and more elaborate interconnection structures
Use of multiple-processor configurations can aid in satisfying I/O demands
Increase hardware speed of processor
- Fundamentally due to shrinking logic gate size
  - More gates, packed more tightly, increasing clock rate
  - Propagation time for signals reduced
Increase size and speed of caches
- Dedicating part of processor chip
  - Cache access times drop significantly
Change processor organization and architecture
- Increase effective speed of instruction execution
- Parallelism
Obstacles that become more significant as clock speed and logic density increase
- Power
  - Power density increases with density of logic and clock speed
  - Dissipating the resulting heat becomes difficult
- RC delay
  - The speed at which electrons flow between transistors is limited by the resistance and capacitance of the metal wires connecting them
  - Delay increases as the RC product increases
  - Wire interconnects become thinner, increasing resistance
  - Wires are closer together, increasing capacitance
- Memory latency
  - Memory speeds lag processor speeds

Multicore, MICs and GPGPUs

Multicore

The use of multiple processors on the same chip provides the potential to increase performance without increasing the clock rate
Strategy is to use two simpler processors on the chip rather than one more complex processor
With two processors, larger caches are justified
As caches became larger, it made performance sense to create two and then three levels of cache on a chip (multi-level cache)

Many Integrated Core (MIC)

Brings a leap in performance, as well as challenges in developing software that exploits such a large number of cores
MIC is a many-core architecture used for coprocessors
The multicore and MIC strategy involves a homogeneous collection of general-purpose processors on a single chip

Graphics Processing Unit (GPU)

Core designed to perform parallel operations on graphics data
Traditionally found on a plug-in graphics card, it is used to encode and render 2D and 3D graphics as well as process video
Used as vector processors for a variety of applications that require repetitive computations
A GPU used to support a broad range of general-purpose applications is called a GPGPU
- Deep learning

Embedded System and the ARM

Embedded System

Definition: A combination of computer hardware and software, and perhaps additional mechanical or other parts, designed to perform a dedicated function.
In many cases, embedded systems are part of a larger system or product, as in the case of an antilock braking system in a car.

Acorn RISC Machine (ARM)

Family of RISC-based microprocessors and microcontrollers
ARM designs microprocessor and multicore architectures and licenses them to manufacturers
ARM chips are high-speed processors that are known for their small die size and low power requirements
- Widely used in PDAs and other handheld devices
Most widely used processor architecture of any kind

Performance Assessment

Clock frequency

Operations performed by a processor are governed by a system clock
The speed of a processor is dictated by the pulse frequency produced by the system clock, measured in cycles per second (Hz); this is the clock frequency
- Clock frequency alone is an imprecise measure of performance

Processor time T

Time a processor needs to execute a given program
$T = \text{CPU clock cycles for a program} \times \tau = \frac{\text{CPU clock cycles for a program}}{f}$
$T = CPI \times I_C \times \tau = \frac{CPI \times I_C}{f}$
$\tau$: clock cycle time; $f = 1/\tau$: clock rate
CPI: average cycles per instruction
$I_C$ : instruction count
Influenced by the instruction set architecture, compiler technology, processor implementation and memory hierarchy
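The processor-time formula can be checked with a few lines of code. This is a minimal sketch; the function name `processor_time` and the sample values (CPI of 2, two million instructions, a 400 MHz clock) are made up for illustration and do not come from the slides.

```python
def processor_time(cpi, instruction_count, clock_rate_hz):
    """T = CPI * I_C / f, in seconds."""
    return cpi * instruction_count / clock_rate_hz


# Hypothetical program: 2 million instructions, average CPI of 2,
# running on a 400 MHz processor.
t = processor_time(cpi=2, instruction_count=2_000_000, clock_rate_hz=400_000_000)
print(t)  # 0.01 seconds
```

Halving the CPI or doubling the clock rate halves T, which is why the instruction set, the compiler and the processor implementation all affect the result.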

Processor Speed

Rate at which instructions are executed, expressed as millions of instructions per second (MIPS)
- referred to as the MIPS rate

$MIPS\, rate=\frac{I_C}{T\times 10^6}=\frac{f}{CPI\times 10^6}$

Millions of floating-point operations per second (MFLOPS)
- For scientific and game applications

$MFLOPS\, rate=\frac{Number\, of\, executed\, floating-point\, operations\, in\, a\, program}{Execution\, time \times 10^6}$
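Both rate formulas are simple enough to evaluate directly. The sketch below is illustrative only; the function names `mips_rate` and `mflops_rate` and all sample numbers are hypothetical.

```python
def mips_rate(clock_rate_hz, cpi):
    """MIPS rate = f / (CPI * 10**6)."""
    return clock_rate_hz / (cpi * 1e6)


def mflops_rate(floating_point_ops, execution_time_s):
    """MFLOPS rate = floating-point operations / (execution time * 10**6)."""
    return floating_point_ops / (execution_time_s * 1e6)


# Hypothetical 400 MHz processor with an average CPI of 2.
print(mips_rate(400_000_000, 2))      # 200.0

# Hypothetical program: 50 million floating-point operations in 0.5 s.
print(mflops_rate(50_000_000, 0.5))   # 100.0
```

The first result agrees with the other form of the MIPS formula: two million instructions executed in 0.01 s is also 200 MIPS.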

Benchmark suite

A collection of programs, defined in a high-level language
Attempts to provide a representative test of a computer in a particular application or system programming area

Desirable characteristics of a benchmark

Written in a high-level language, making it portable across different machines
Representative of a particular kind of programming style, such as systems programming, numerical programming, or commercial programming
Measured easily
Wide distribution

SPEC (Standard Performance Evaluation Corporation)

An industry consortium
Defines and maintains the best known collection of benchmark suites
Performance measurements are widely used for comparison and research purposes

Amdahl's Law

The performance gain from a faster execution mode is limited by the fraction of total execution time during which the faster mode can be used
In other words, the improvement is limited by how often the faster mode is used
Amdahl's Law defines the speed-up that can be gained by using a particular technology

$F_e=\frac{\text{computing time of the part that can be enhanced}}{\text{total computing time before enhancement}}\le 1$

$S_e=\frac{\text{computing time of the enhanceable part before enhancement}}{\text{computing time of the same part after enhancement}}\ge 1$

$T_0$: total task execution time before enhancement

Execution time of the total task after enhancement: $T_n=T_0(1-F_e)+T_0\times \frac{F_e}{S_e}$, where the first term is the part that is not enhanced and the second term is the enhanced part

System speed-up after enhancement (total time before divided by total time after): $S_n=\frac{T_0}{T_n}=\frac{1}{(1-F_e)+\frac{F_e}{S_e}}$

Relation between system speed-up and Fe

When $F_e=0$, $S_n=1$; this means no part can be enhanced
When $S_e\to\infty$, $S_n\to 1/(1-F_e)$
So, the system performance enhancement is strongly limited by $F_e$
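The speed-up formula and its two limiting cases can be checked numerically. This is a minimal sketch; the function name `amdahl_speedup` and the argument checks are illustrative choices, not part of the original material.

```python
import math


def amdahl_speedup(f_e, s_e):
    """S_n = 1 / ((1 - F_e) + F_e / S_e)."""
    if not 0 <= f_e <= 1:
        raise ValueError("F_e must lie between 0 and 1")
    if s_e < 1:
        raise ValueError("S_e must be at least 1")
    return 1 / ((1 - f_e) + f_e / s_e)


print(amdahl_speedup(0.0, 10))        # 1.0: nothing can be enhanced
print(amdahl_speedup(0.9, math.inf))  # about 10: bounded by 1 / (1 - F_e)
print(amdahl_speedup(0.9, 10))        # about 5.26
```

Even an infinitely fast enhancement of 90% of the work gives at most a tenfold speed-up, because the remaining 10% still runs at the original speed.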

example:

Suppose that a task makes extensive use of floating-point operations, with 40% of the time consumed by floating-point operations. With a new hardware design, the floating-point module is sped up by a factor of K. What is the overall speed-up gained by this enhancement?

Solutions:

$F_e=0.4,S_e=K$

$S_n=\frac{1}{(1-F_e)+\frac{F_e}{S_e}}=\frac{1}{0.6+\frac{0.4}{K}}$

As $K\to\infty$, $S_n\to\frac{1}{0.6}\approx 1.67$ (the limit of this optimization)
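To see how quickly the speed-up approaches this limit, it helps to plug a few concrete values of K into the same formula. The values K = 2, 4 and 10 below are chosen only for illustration:

$K=2:\ S_n=\frac{1}{0.6+\frac{0.4}{2}}=\frac{1}{0.8}=1.25$

$K=4:\ S_n=\frac{1}{0.6+\frac{0.4}{4}}=\frac{1}{0.7}\approx 1.43$

$K=10:\ S_n=\frac{1}{0.6+\frac{0.4}{10}}=\frac{1}{0.64}\approx 1.56$

Going from K = 4 to K = 10 adds little, since the 60% of the time that is not floating-point work is unaffected.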

Vocabulary

Pipelining and parallel execution: 流水与并行执行
Speculative execution: 推测执行
Cache: 快速缓存
Decimal: 十进制
Binary: 二进制
General purpose computer: 通用计算机
Von Neumann Machine: 冯-诺依曼计算机
Opcode=operation code: 操作码
Instruction cycle: 指令周期
Fetch cycle: 取（读）周期
Flowchart: 流程图
Conditional branch: 条件转移
Data transfer: 数据传送
Upward compatible: 向上兼容
Multiplexor: 复用器
Bus: 总线
Magnetic-core memory: 磁芯存储器
End user: 端用户
Speech recognition: 语音识别
Videoconferencing: 视频会议
Multimedia authoring: 多媒体编著
Workstation: 工作站
Client-server: 客户机-服务器
DRAM—dynamic random access memory: 动态随机存取存储器
Branch prediction: 转移预测
Throughput: 吞吐率
Trade-off : 折衷
Supercomputer: 超级计算机/巨型机
Parallelism: 并行性

Key points

What was the first computer in the world?
What are the features of a von Neumann machine? What is its structure?
What is Moore's law?
What are multicore, MICs and GPUs?
CPI, $I_C$ , T, MIPS, MFLOPS
Amdahl's Law
