Computer Architecture: A Quantitative Approach, Chapter 4 (Part 1)

黎明沐白

Last modified: 2024-12-30 13:35:24


First published: 2024-12-28 20:44:02

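Copyright notice: This is an original article by the author, licensed under CC 4.0 BY-SA. When reposting, include a link to the original source and this notice.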

Article link: https://blog.csdn.net/qq_42047140/article/details/144794797


Chapter 4: Data-Level Parallelism in Vector, SIMD, and GPU Architectures

DLP Optimization Problem
- Objective: Speed up applications with explicit data-level parallelism
  - High throughput
  - Low cost
  - Low programming complexity
- Hardware Constraints
  - Vector Hardware (RV64V)
  - Multi-core Processor (x86)
  - GPU Hardware (NVIDIA)
- Architecture Solutions
  - Vector Architecture
  - SIMD Extensions
  - GPU Architecture

Data-Level Parallelism

SIMD can exploit significant data-level parallelism for:
- Matrix-oriented scientific computing
  - Machine Learning
- Media-oriented image and sound processing
SIMD is more energy-efficient than MIMD
- A single fetched instruction launches the same operation on many data elements
- Makes SIMD attractive for personal mobile devices
SIMD brings programming convenience
- Allows the programmer to keep thinking sequentially while still getting parallel speedup from parallel data operations
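The energy argument comes down to instruction-fetch counts. A minimal sketch, assuming 32 elements per vector register and ignoring loop-control instructions (both simplifications for illustration), counts how many instructions are fetched to apply one operation to n elements:

```python
import math

# assumption: 32 elements per vector register; loop overhead is ignored
MVL = 32

def scalar_fetches(n):
    """Instructions fetched to apply one operation to n elements with scalar code."""
    return n  # one scalar instruction per element

def vector_fetches(n, mvl=MVL):
    """Instructions fetched to apply the same operation with vector instructions."""
    return math.ceil(n / mvl)  # one vector instruction per group of up to mvl elements
```

For 1,000 elements this gives 1,000 scalar fetches versus 32 vector fetches; every fetch and decode that is avoided is energy saved.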

Vector Architecture

Basic Vector Architecture

Vector Architectures: Idea and Benefits

Basic idea:
- Read sets of data elements into “vector registers”
- Operate on those registers
- Disperse the results back into memory
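The read, operate, disperse pattern can be sketched in software with DAXPY (Y = a·X + Y). This is a minimal model, not hardware: a "vector register" is just a Python list slice, the maximum vector length `MVL = 32` is an assumption, and `daxpy_vector` is a hypothetical name. The loop processes at most `MVL` elements per pass, so lengths that are not a multiple of 32 are handled by a shorter final pass.

```python
MVL = 32  # assumed maximum vector length: elements per vector register

def daxpy_vector(a, x, y):
    """Compute a*X + Y one vector register at a time (a software model, not real hardware)."""
    if len(x) != len(y):
        raise ValueError("X and Y must have the same length")
    result = list(y)  # stands in for Y's memory; the caller's list is left untouched
    for start in range(0, len(x), MVL):
        vl = min(MVL, len(x) - start)          # elements in this pass (last pass may be short)
        vx = x[start:start + vl]               # "vector load": read a set of X elements
        vy = result[start:start + vl]          # "vector load": read the matching Y elements
        vr = [a * xi + yi for xi, yi in zip(vx, vy)]  # operate on the whole register
        result[start:start + vl] = vr          # "vector store": disperse results back to memory
    return result
```

For example, `daxpy_vector(2.0, [1.0, 2.0, 3.0], [10.0, 20.0, 30.0])` returns `[12.0, 24.0, 36.0]`.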
Benefits:
- Amortize memory latency
  - Vector loads and stores are deeply pipelined
    - The program pays the long memory latency only once per vector load or store, not once per element
- Deliver good performance with low energy and low design complexity
  - No need for the complex hardware of an out-of-order superscalar processor
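A back-of-the-envelope model shows why paying the latency once per vector matters. This sketch assumes (for illustration only) that scalar loads wait the full memory latency one after another, while a vector load pays the latency for its first element and then streams one element per cycle; `mvl=32` is also an assumption.

```python
import math

def scalar_load_cycles(n, latency):
    # assumption: each scalar load waits the full memory latency and loads do not overlap
    return n * latency

def vector_load_cycles(n, latency, mvl=32):
    # assumption: a vector load of vl elements takes latency + (vl - 1) cycles,
    # i.e. latency for the first element, then one more element per cycle
    passes = math.ceil(n / mvl)
    return passes * (latency - 1) + n  # summed over all passes
```

With n = 64 and a 100-cycle latency, the scalar model gives 6,400 cycles and the vector model 262: the latency is paid twice (once per 32-element vector) instead of 64 times.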

RV64V: Vector Extension for RISC-V

Basic structure of RV64V
- Vector Registers
  - 32 vector registers, each holding one vector (32 elements × 64 bits)
  - The vector register file must provide enough ports to feed all the vector functional units
- Vector Functional Units
  - Fully pipelined
  - A control unit detects structural hazards on the functional units and data hazards on register accesses
- Vector Load-store Unit
  - Fully pipelined
  - Delivers one word per clock cycle after an initial latency
- Scalar Registers
  - 31 general-purpose registers
  - 32 floating-point registers
  - Provide scalar operands to the vector functional units and compute addresses for the vector load-store unit

![[Pasted image 20241227133626.png|550]]
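Taking each vector register as 32 elements of 64 bits, the total capacity of the vector register file works out as follows (a worked check of the sizes listed above, not a figure from the book):

$$
32~\text{registers} \times 32~\frac{\text{elements}}{\text{register}} \times 64~\frac{\text{bits}}{\text{element}} = 65{,}536~\text{bits} = 8{,}192~\text{bytes} = 8~\text{KiB}
$$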

Dynamic Register Typing: A Distinctive RV64V Feature

Dynamic Register Typing
- Associate a data type and data size with each vector register
  - Regular instructions then do not need to encode the data type and size
- The program must configure the data types and widths of the vector registers before using them

![[Pasted image 20241227133807.png]]
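The idea can be modeled with a toy register file in which each register carries a type tag and a single generic `vadd` reads the element width from those tags. Everything here is hypothetical: the class, method names, and the restriction to signed integer types are illustration choices, not RV64V's actual encoding, and requiring all operands to share one type is a simplification of this sketch.

```python
class VectorRegFile:
    """Toy model of dynamically typed vector registers (hypothetical; not the real RV64V encoding)."""

    # assumed element types for this sketch: signed integers of the given bit width
    WIDTHS = {"i8": 8, "i16": 16, "i32": 32, "i64": 64}

    def __init__(self, num_regs=32):
        self.dtype = [None] * num_regs           # per-register element type, set by configure()
        self.data = [[] for _ in range(num_regs)]

    def configure(self, reg, dtype):
        # tag a register with an element type: the program's configuration step
        if dtype not in self.WIDTHS:
            raise ValueError(f"unsupported element type: {dtype}")
        self.dtype[reg] = dtype
        self.data[reg] = []

    def _wrap(self, value, dtype):
        # truncate to the register's element width with two's-complement wraparound
        bits = self.WIDTHS[dtype]
        value &= (1 << bits) - 1
        return value - (1 << bits) if value >= (1 << (bits - 1)) else value

    def vload(self, reg, values):
        if self.dtype[reg] is None:
            raise TypeError(f"v{reg} has no configured type")
        self.data[reg] = [self._wrap(v, self.dtype[reg]) for v in values]

    def vadd(self, vd, vs1, vs2):
        # one generic add opcode: the element width comes from the register tags
        t = self.dtype[vs1]
        if t is None or self.dtype[vs2] != t or self.dtype[vd] != t:
            raise TypeError("operand and destination types must be configured and equal")
        self.data[vd] = [self._wrap(a + b, t) for a, b in zip(self.data[vs1], self.data[vs2])]
```

With v1 to v3 configured as `i8`, loading `[100, 1]` and `[100, 2]` and calling `vadd(3, 1, 2)` yields `[-56, 3]` (8-bit overflow); reconfigured as `i32`, the identical `vadd` call yields `[200, 3]`.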

Advantages of dynamic register typing
- Concise instruction set
  - Otherwise the instruction set would grow very large to cover the diverse combinations of data types and sizes
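The size argument is simple multiplication. A sketch with hypothetical operation and type lists (chosen only for illustration, not RV64V's actual instruction list) compares how many opcodes each approach needs:

```python
from itertools import product

# hypothetical operation and element-type lists, chosen only for illustration
OPS = ["vadd", "vsub", "vmul", "vdiv", "vld", "vst"]
TYPES = ["i8", "i16", "i32", "i64", "fp16", "fp32", "fp64"]

def statically_typed_opcodes(ops, types):
    # every (operation, type) pair needs its own opcode, e.g. vadd.i8, vadd.fp64, ...
    return [f"{op}.{t}" for op, t in product(ops, types)]

def dynamically_typed_opcodes(ops):
    # the type lives in the register configuration, so each operation needs one opcode
    return list(ops)
```

Here static typing needs 6 × 7 = 42 opcodes versus 6 with dynamic typing, and the gap widens with every added type or operation.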
