Chapter 4: Data-Level Parallelism in Vector, SIMD, and GPU Architectures
- DLP Optimization Problem
- Objective: Speed up applications with explicit data-level parallelism
- High throughput
- Low cost
- Low programming complexity
- Hardware Constraints
- Vector Hardware (RV64V)
- Multi-core Processor (x86)
- GPU Hardware (NVIDIA)
- Architecture Solutions
- Vector Architecture
- SIMD Extensions
- GPU Architecture
Data-Level Parallelism
- SIMD can exploit significant data-level parallelism for:
- Matrix-oriented scientific computing
- Machine Learning
- Media-oriented image and sound processors
- SIMD is potentially more energy-efficient than MIMD
- A single instruction launches many data operations, whereas MIMD must fetch one instruction per data operation
- Makes SIMD attractive for personal mobile devices
- SIMD brings programming convenience
- Allows programmer to continue to think sequentially
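The instruction-fetch argument can be made concrete with a rough count. The sketch below is a simplified model, not a measurement: it counts only the arithmetic instructions, ignores loop overhead, and the function names and the lane count of 8 are illustrative assumptions.

```python
def scalar_fetches(n):
    """Arithmetic instructions fetched when each one handles a single element."""
    return n


def simd_fetches(n, lanes):
    """Arithmetic instructions fetched when each one handles `lanes` elements."""
    # Ceiling division: the last instruction may cover a partial group.
    return (n + lanes - 1) // lanes


n = 1000
print(scalar_fetches(n), simd_fetches(n, lanes=8))
```

With these assumed numbers the SIMD version fetches 125 arithmetic instructions instead of 1000, while the source loop the programmer writes stays the same sequential loop.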
Vector Architecture
Basic Vector Architecture
Vector Architectures: Idea and Benefits
- Basic idea:
- Read sets of data elements into “vector registers”
- Operate on those registers
- Disperse the results back into memory
- Benefits:
- Amortize memory latency
- Vector loads and stores are deeply pipelined
- The program pays the long memory latency only once per vector
- Deliver good performance with low energy and design complexity
- No need for complex out-of-order execution hardware
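The load, operate, store pattern from the basic idea can be sketched in software. This is a minimal illustration, assuming a vector register that holds up to 32 elements; the function name `daxpy_vectorized` and the `vlen` parameter are hypothetical, and Python lists stand in for vector registers.

```python
def daxpy_vectorized(a, x, y, vlen=32):
    """Compute a*x + y one vector-register-sized chunk at a time."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    result = [0.0] * len(x)
    for start in range(0, len(x), vlen):
        end = min(start + vlen, len(x))
        # 1. Read sets of data elements into "vector registers".
        vx = x[start:end]
        vy = y[start:end]
        # 2. Operate on those registers (multiply, then add).
        vt = [a * xi for xi in vx]
        vr = [ti + yi for ti, yi in zip(vt, vy)]
        # 3. Disperse the results back into memory.
        result[start:end] = vr
    return result


print(daxpy_vectorized(2.0, [1.0, 2.0, 3.0], [10.0, 20.0, 30.0]))
```

Each pass through the loop corresponds to one group of vector instructions, so memory is touched once per chunk rather than once per element.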
RV64V: Vector Extension for RISC-V
- Basic structure of RV64V
- Vector Registers
- 32 registers, each holding one vector (32 elements × 64 bits)
- Vector register file needs to provide enough ports to feed all the vector functional units
- Vector Functional Units
- Fully pipelined
- A control unit detects structural and data hazards
- Vector Load-store Unit
- Fully pipelined
- One word per clock cycle after initial latency
- Scalar Registers
- 31 general-purpose registers
- 32 floating-point registers
- Provide scalar operands to the vector functional units and addresses to the vector load-store unit
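The "one word per clock cycle after initial latency" behaviour of the load-store unit can be turned into a rough cycle model. The sketch below is an idealized comparison under stated assumptions: no caches, scalar loads that do not overlap, and a memory latency of 12 cycles chosen purely for illustration. Both function names are hypothetical.

```python
def scalar_load_cycles(n, mem_latency):
    """Cycles to load n words if every load pays the full latency (no overlap)."""
    return n * mem_latency


def vector_load_cycles(n, mem_latency, vlen=32):
    """Cycles to load n words with pipelined vector loads of up to vlen words."""
    num_vector_loads = (n + vlen - 1) // vlen  # ceiling division
    # Each vector load pays the initial latency once,
    # then delivers one word per clock cycle.
    return num_vector_loads * mem_latency + n


print(scalar_load_cycles(32, mem_latency=12))  # 32 * 12
print(vector_load_cycles(32, mem_latency=12))  # 12 + 32
```

Under these assumptions a 32-element vector load takes 44 cycles against 384 for 32 independent scalar loads, which is the sense in which the latency is paid once per vector.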
Dynamic Register Typing: A Distinctive RV64V Property
- Dynamic Register Typing
- Associate a data type and data size with each vector register
- No need to specify data type and size in regular instructions
- The program must configure the data type and width of each vector register before using it
- Advantages of dynamic register typing
- Concise instruction set
- Otherwise the instruction set would have to be very large to cover the diverse data types and sizes
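A toy software model may help show why dynamic register typing keeps the instruction set small. The class below is a hypothetical sketch, not the RV64V interface: `ToyVectorRegisterFile`, `configure`, `load`, and `vadd` are invented names, and Python's `array` typecodes (`"d"` for 64-bit floats, `"q"` for 64-bit integers) stand in for the configured register types.

```python
from array import array


class ToyVectorRegisterFile:
    """Toy model in which each vector register carries its own element type."""

    def __init__(self, num_regs=32):
        self.regs = [None] * num_regs  # None means "not configured"

    def configure(self, reg, typecode):
        # Associate a data type and size with the register before use.
        self.regs[reg] = array(typecode)

    def load(self, reg, values):
        self._check(reg)
        self.regs[reg] = array(self.regs[reg].typecode, values)

    def vadd(self, dst, src1, src2):
        # A single "add" operation: the element type comes from the
        # register configuration, not from the instruction itself.
        for reg in (dst, src1, src2):
            self._check(reg)
        sums = (a + b for a, b in zip(self.regs[src1], self.regs[src2]))
        self.regs[dst] = array(self.regs[dst].typecode, sums)

    def _check(self, reg):
        if self.regs[reg] is None:
            raise ValueError(f"vector register v{reg} has not been configured")


vrf = ToyVectorRegisterFile()
for reg in (0, 1, 2):
    vrf.configure(reg, "d")  # 64-bit floating point
for reg in (3, 4, 5):
    vrf.configure(reg, "q")  # 64-bit integer

vrf.load(0, [1.0, 2.0, 3.0])
vrf.load(1, [4.0, 5.0, 6.0])
vrf.vadd(2, 0, 1)            # floating-point add

vrf.load(3, [1, 2, 3])
vrf.load(4, [10, 20, 30])
vrf.vadd(5, 3, 4)            # same operation, integer add

print(list(vrf.regs[2]), list(vrf.regs[5]))
```

The same `vadd` serves both element types here. Without per-register typing, a design would need a separate add variant for each type and width combination.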