Computer Organization and Design: The Hardware/Software Interface (English Edition, Original 5th Edition, ARM Edition)

Based on the ARMv8-A architecture, this book presents the fundamentals of current hardware technology, assembly language, computer arithmetic, pipelining, the memory hierarchy, and I/O. It pays particular attention to the changes of the post-PC era, using examples and exercises to examine the newly emerging fields of mobile computing and cloud computing; updated material covers tablet computers, cloud infrastructure, and the ARM (mobile computing) and x86 (cloud computing) architectures.

Preface xv

CHAPTERS

1 Computer Abstractions and Technology 2

1.1 Introduction 3

1.2 Eight Great Ideas in Computer Architecture 11

1.3 Below Your Program 13

1.4 Under the Covers 16

1.5 Technologies for Building Processors and Memory 24

1.6 Performance 28

1.7 The Power Wall 40

1.8 The Sea Change: The Switch from Uniprocessors to Multiprocessors 43

1.9 Real Stuff: Benchmarking the Intel Core i7 46

1.10 Fallacies and Pitfalls 49

1.11 Concluding Remarks 52

1.12 Historical Perspective and Further Reading 54

1.13 Exercises 54

2 Instructions: Language of the Computer 60

2.1 Introduction 62

2.2 Operations of the Computer Hardware 63

2.3 Operands of the Computer Hardware 67

2.4 Signed and Unsigned Numbers 75

2.5 Representing Instructions in the Computer 82

2.6 Logical Operations 90

2.7 Instructions for Making Decisions 93

2.8 Supporting Procedures in Computer Hardware 100

2.9 Communicating with People 110

2.10 LEGv8 Addressing for Wide Immediates and Addresses 115

2.11 Parallelism and Instructions: Synchronization 125

2.12 Translating and Starting a Program 128

2.13 A C Sort Example to Put it All Together 137

2.14 Arrays versus Pointers 146

2.15 Advanced Material: Compiling C and Interpreting Java 150

2.16 Real Stuff: MIPS Instructions 150

2.17 Real Stuff: ARMv7 (32-bit) Instructions 152

2.18 Real Stuff: x86 Instructions 154

2.19 Real Stuff: The Rest of the ARMv8 Instruction Set 163

2.20 Fallacies and Pitfalls 169

2.21 Concluding Remarks 171

2.22 Historical Perspective and Further Reading 173

2.23 Exercises 174

3 Arithmetic for Computers 186

3.1 Introduction 188

3.2 Addition and Subtraction 188

3.3 Multiplication 191

3.4 Division 197

3.5 Floating Point 205

3.6 Parallelism and Computer Arithmetic: Subword Parallelism 230

3.7 Real Stuff: Streaming SIMD Extensions and Advanced Vector Extensions in x86 232

3.8 Real Stuff: The Rest of the ARMv8 Arithmetic Instructions 234

3.9 Going Faster: Subword Parallelism and Matrix Multiply 238

3.10 Fallacies and Pitfalls 242

3.11 Concluding Remarks 245

3.12 Historical Perspective and Further Reading 248

3.13 Exercises 249

4 The Processor 254

4.1 Introduction 256

4.2 Logic Design Conventions 260

4.3 Building a Datapath 263

4.4 A Simple Implementation Scheme 271

4.5 An Overview of Pipelining 283

4.6 Pipelined Datapath and Control 297

4.7 Data Hazards: Forwarding versus Stalling 316

4.8 Control Hazards 328

4.9 Exceptions 336

4.10 Parallelism via Instructions 342

4.11 Real Stuff: The ARM Cortex-A53 and Intel Core i7 Pipelines 355

4.12 Going Faster: Instruction-Level Parallelism and Matrix Multiply 363

4.13 Advanced Topic: An Introduction to Digital Design Using a Hardware Design Language to Describe and Model a Pipeline and More Pipelining Illustrations 366

4.14 Fallacies and Pitfalls 366

4.15 Concluding Remarks 367

4.16 Historical Perspective and Further Reading 368

4.17 Exercises 368

5 Large and Fast: Exploiting Memory Hierarchy 386

5.1 Introduction 388

5.2 Memory Technologies 392

5.3 The Basics of Caches 397

5.4 Measuring and Improving Cache Performance 412

5.5 Dependable Memory Hierarchy 432

5.6 Virtual Machines 438

5.7 Virtual Memory 441

5.8 A Common Framework for Memory Hierarchy 465

5.9 Using a Finite-State Machine to Control a Simple Cache 472

5.10 Parallelism and Memory Hierarchy: Cache Coherence 477

5.11 Parallelism and Memory Hierarchy: Redundant Arrays of Inexpensive Disks 481

5.12 Advanced Material: Implementing Cache Controllers 482

5.13 Real Stuff: The ARM Cortex-A53 and Intel Core i7 Memory Hierarchies 482

5.14 Real Stuff: The Rest of the ARMv8 System and Special Instructions 487

5.15 Going Faster: Cache Blocking and Matrix Multiply 488

5.16 Fallacies and Pitfalls 491

5.17 Concluding Remarks 496

5.18 Historical Perspective and Further Reading 497

5.19 Exercises 497

6 Parallel Processors from Client to Cloud 514

6.1 Introduction 516

6.2 The Difficulty of Creating Parallel Processing Programs 518

6.3 SISD, MIMD, SIMD, SPMD, and Vector 523

6.4 Hardware Multithreading 530

6.5 Multicore and Other Shared Memory Multiprocessors 533

6.6 Introduction to Graphics Processing Units 538

6.7 Clusters, Warehouse Scale Computers, and Other Message-Passing Multiprocessors 545

6.8 Introduction to Multiprocessor Network Topologies 550

6.9 Communicating to the Outside World: Cluster Networking 553

6.10 Multiprocessor Benchmarks and Performance Models 554

6.11 Real Stuff: Benchmarking and Rooflines of the Intel Core i7 960 and the NVIDIA Tesla GPU 564

6.12 Going Faster: Multiple Processors and Matrix Multiply 569

6.13 Fallacies and Pitfalls 572

6.14 Concluding Remarks 574

6.15 Historical Perspective and Further Reading 577

6.16 Exercises 577

APPENDIX

A The Basics of Logic Design A-2

A.1 Introduction A-3

A.2 Gates, Truth Tables, and Logic Equations A-4

A.3 Combinational Logic A-9

A.4 Using a Hardware Description Language A-20

A.5 Constructing a Basic Arithmetic Logic Unit A-26

A.6 Faster Addition: Carry Lookahead A-37

A.7 Clocks A-47

A.8 Memory Elements: Flip-Flops, Latches, and Registers A-49

A.9 Memory Elements: SRAMs and DRAMs A-57

A.10 Finite-State Machines A-66

A.11 Timing Methodologies A-71

A.12 Field Programmable Devices A-77

A.13 Concluding Remarks A-78

A.14 Exercises A-79

Index I-1

ONLINE CONTENT

B Graphics and Computing GPUs B-2

B.1 Introduction B-3

B.2 GPU System Architectures B-7

B.3 Programming GPUs B-12

B.4 Multithreaded Multiprocessor Architecture B-25

B.5 Parallel Memory System B-36

B.6 Floating Point Arithmetic B-41

B.7 Real Stuff: The NVIDIA GeForce 8800 B-46

B.8 Real Stuff: Mapping Applications to GPUs B-55

B.9 Fallacies and Pitfalls B-72

B.10 Concluding Remarks B-76

B.11 Historical Perspective and Further Reading B-77

C Mapping Control to Hardware C-2

C.1 Introduction C-3

C.2 Implementing Combinational Control Units C-4

C.3 Implementing Finite-State Machine Control C-8

C.4 Implementing the Next-State Function with a Sequencer C-22

C.5 Translating a Microprogram to Hardware C-28

C.6 Concluding Remarks C-32

C.7 Exercises C-33

D A Survey of RISC Architectures for Desktop, Server, and Embedded Computers D-2

D.1 Introduction D-3

D.2 Addressing Modes and Instruction Formats D-5

D.3 Instructions: The MIPS Core Subset D-9

D.4 Instructions: Multimedia Extensions of the Desktop/Server RISCs D-16

D.5 Instructions: Digital Signal-Processing Extensions of the Embedded RISCs D-19

D.6 Instructions: Common Extensions to MIPS Core D-20

D.7 Instructions Unique to MIPS-64 D-25

D.8 Instructions Unique to Alpha D-27

D.9 Instructions Unique to SPARC v9 D-29

D.10 Instructions Unique to PowerPC D-32

D.11 Instructions Unique to PA-RISC 2.0 D-34

D.12 Instructions Unique to ARM D-36

D.13 Instructions Unique to Thumb D-38

D.14 Instructions Unique to SuperH D-39

D.15 Instructions Unique to M32R D-40

D.16 Instructions Unique to MIPS-16 D-40

D.17 Concluding Remarks D-43

Glossary G-1

Further Reading FR-1

This best-selling computer organization textbook has been thoroughly updated to address the revolutionary change now underway in computer architecture: the shift from uniprocessors to multicore microprocessors. This ARM edition was produced to highlight the importance of embedded systems to the computing industry across Asia, and it uses an ARM processor to discuss the instruction set and arithmetic of a real computer, since ARM is the most popular instruction set architecture for embedded devices, with roughly four billion embedded devices sold worldwide each year. As in previous editions, a MIPS processor is used to demonstrate the fundamentals of computer hardware technology, pipelining, the memory hierarchy, and I/O. The book also includes an introduction to the x86 architecture.

Key features:

· Uses ARMv6 (the ARM11 family) as the primary architecture for presenting the fundamentals of the instruction set and computer arithmetic.

· Covers the revolutionary shift from sequential to parallel computing, with a new chapter on parallelism and sections throughout every chapter that highlight parallel hardware and software topics.

· Adds a new appendix, written by NVIDIA's chief scientist and a director of architecture, on the emergence and importance of the modern GPU, providing the first in-depth description of this highly parallel, multithreaded, multicore processor optimized for visual computing.

· Describes a unique way of measuring multicore performance, the "Roofline model" (see the sketch after this list), and uses it with benchmarks to analyze the performance of the AMD Opteron X4, Intel Xeon 5000, Sun UltraSPARC T2, and IBM Cell.

· Includes new coverage of flash memory and virtual machines.

· Provides a wealth of thought-provoking exercises, more than 200 pages in all.

· Uses the AMD Opteron X4 and Intel Nehalem as running examples throughout the book.

· Updates all processor performance examples with SPEC CPU2006 benchmarks.
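The Roofline model mentioned above bounds a kernel's attainable floating-point throughput by whichever is lower: the machine's peak compute rate, or its peak memory bandwidth multiplied by the kernel's arithmetic intensity (floating-point operations per byte of DRAM traffic). Below is a minimal sketch of that bound in C; the peak numbers are illustrative placeholders, not measurements of any of the machines named above.

#include <stdio.h>

/* Roofline bound: attainable GFLOP/s is capped either by peak compute
 * throughput or by how fast memory can feed the processor, i.e.
 * min(peak_gflops, peak_bandwidth_GB_per_s * arithmetic_intensity).
 * Arithmetic intensity = FLOPs performed per byte of DRAM traffic.
 * The peak values used in main() are placeholders, not measured data. */
static double roofline(double peak_gflops, double peak_bw_gbs, double intensity)
{
    double memory_bound = peak_bw_gbs * intensity;
    return memory_bound < peak_gflops ? memory_bound : peak_gflops;
}

int main(void)
{
    const double peak_gflops = 64.0;  /* placeholder peak compute, GFLOP/s */
    const double peak_bw_gbs = 16.0;  /* placeholder DRAM bandwidth, GB/s  */

    /* Sweep arithmetic intensity and print the attainable performance. */
    const double intensities[] = { 0.25, 0.5, 1.0, 2.0, 4.0, 8.0 };
    for (int i = 0; i < 6; i++) {
        printf("AI = %4.2f FLOP/byte -> attainable %6.2f GFLOP/s\n",
               intensities[i],
               roofline(peak_gflops, peak_bw_gbs, intensities[i]));
    }
    return 0;
}

With these placeholder numbers the bound rises with arithmetic intensity until the ridge point at 64 / 16 = 4 FLOPs per byte, beyond which a kernel is compute-bound at 64 GFLOP/s. A plot of this bound against arithmetic intensity is the kind of roofline that Section 6.11 applies to the Intel Core i7 960 and the NVIDIA Tesla GPU.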