Computer Architecture: A Quantitative Approach, Chapter 4 (Part 1)

Chapter 4: Data-Level Parallelism in Vector, SIMD, and GPU Architectures

  • DLP Optimization Problem
    • Objective: Speed up applications with explicit data-level parallelism
      • High throughput
      • Low cost
      • Low programming complexity
    • Hardware Constraint
      • Vector Hardware (RV64V)
      • Multi-core Processor (x86)
      • GPU Hardware (NVIDIA)
    • Architecture Solutions
      • Vector Architecture
      • SIMD Extensions
      • GPU Architecture

Data-Level Parallelism

  • SIMD can exploit significant data-level parallelism for:
    • Matrix-oriented scientific computing
      • Machine Learning
    • Media-oriented image and sound processors
  • SIMD is more energy-efficient than MIMD
    • A single fetched instruction launches many data operations
    • Makes SIMD attractive for personal mobile devices
  • SIMD brings programming convenience
    • Allows programmer to continue to think sequentially

Vector Architecture

Basic Vector Architecture

Vector Architectures: Idea and Benefits
  • Basic idea:
    • Read sets of data elements into “vector registers”
    • Operate on those registers
    • Disperse the results back into memory
  • Benefits:
    • Amortize memory latency
      • Vector loads and stores are deeply pipelined
        • The program pays the long memory latency only once per vector
    • Deliver good performance with low energy and design complexity
      • No need for complex out-of-order execution hardware
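
The load / operate / store idea above can be sketched in software. The following is a minimal Python model, not real vector code: the function name `daxpy_vector` and the register length `VLEN = 32` are assumptions made for illustration. It computes `a*x + y` one vector-sized chunk at a time and counts how many vector loads are issued, which is the number of times the long memory latency would be paid.

```python
VLEN = 32  # assumed number of elements per vector register (illustrative)


def daxpy_vector(a, x, y, vlen=VLEN):
    """Compute a*x + y chunk by chunk, mimicking load / operate / store."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    result = [0.0] * len(x)
    vector_loads = 0
    for start in range(0, len(x), vlen):
        end = min(start + vlen, len(x))
        vx = x[start:end]  # vector load: one memory access covers the whole chunk
        vy = y[start:end]  # vector load
        vector_loads += 2
        vz = [a * xi + yi for xi, yi in zip(vx, vy)]  # operate on "registers"
        result[start:end] = vz  # vector store back to memory
    return result, vector_loads


x = [float(i) for i in range(64)]
y = [1.0] * 64
z, loads = daxpy_vector(2.0, x, y)
```

With 64 elements and a 32-element register, the model issues 4 vector loads, whereas an element-by-element loop would issue 128 scalar loads.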

RV64V: Vector Extension for RISC-V
  • Basic structure of RV64V
    • Vector Registers
      • 32 registers, each holding one vector (32 elements × 64 bits)
      • Vector register file needs to provide enough ports to feed all the vector functional units
    • Vector Functional Units
      • Fully pipelined
      • A control unit detects structural and data hazards
    • Vector Load-store Unit
      • Fully pipelined
      • One word per clock cycle after initial latency
    • Scalar Registers
      • 31 general-purpose registers
      • 32 floating-point registers
      • Provide scalar data as input to the vector functional units and addresses to the vector load-store unit
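
The "one word per clock cycle after initial latency" behavior of the load-store unit can be illustrated with a toy timing model. The function names and the 12-cycle startup latency below are assumptions for illustration, not figures from the source. The pipelined case pays the startup latency once; the unpipelined comparison pays it for every element.

```python
def vector_load_cycles(n_elements, startup_latency):
    """Pipelined load: pay the startup latency once, then one word per clock."""
    return startup_latency + n_elements


def scalar_load_cycles(n_elements, startup_latency):
    """Unpipelined comparison: every element pays the latency plus one transfer cycle."""
    return n_elements * (startup_latency + 1)


STARTUP = 12  # assumed startup latency in cycles (illustrative)
pipelined = vector_load_cycles(32, STARTUP)
unpipelined = scalar_load_cycles(32, STARTUP)
```

Under these assumptions, loading a 32-element vector takes 44 cycles when pipelined, compared with 416 cycles when each element pays the full latency.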

![[Pasted image 20241227133626.png|550]]

Dynamic Register Typing: A Typical RV64V Property
  • Dynamic Register Typing
    • Associate a data type and data size with each vector register
      • No need to specify data type and size in regular instructions
    • The program must configure the data type and width of each vector register before using it
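
A rough way to picture dynamic register typing is a register file in which each register remembers its own element type, so a single untyped `vadd` serves every type. The class and method names below (`VectorRegisterFile`, `set_type`, `load`, `vadd`) are hypothetical and are not RV64V mnemonics; this is a sketch of the concept only.

```python
import struct

# Hypothetical type table: type name -> element width in bits
WIDTH_BITS = {"i8": 8, "i16": 16, "i32": 32, "i64": 64, "f32": 32, "f64": 64}


def coerce(type_name, value):
    """Round or wrap a value to what the given element type can hold."""
    bits = WIDTH_BITS[type_name]
    if type_name.startswith("f"):
        fmt = "f" if bits == 32 else "d"
        return struct.unpack(fmt, struct.pack(fmt, float(value)))[0]
    wrapped = int(value) & ((1 << bits) - 1)  # keep the low `bits` bits
    if wrapped >= 1 << (bits - 1):            # reinterpret as two's complement
        wrapped -= 1 << bits
    return wrapped


class VectorRegisterFile:
    """Toy model: the type lives in the register, not in the instruction."""

    def __init__(self, num_regs=32):
        self.types = ["f64"] * num_regs
        self.values = [[] for _ in range(num_regs)]

    def set_type(self, reg, type_name):
        if type_name not in WIDTH_BITS:
            raise ValueError(f"unknown type {type_name}")
        self.types[reg] = type_name

    def load(self, reg, elements):
        self.values[reg] = [coerce(self.types[reg], e) for e in elements]

    def vadd(self, dst, src1, src2):
        # One "opcode" for all types: the result follows dst's configured type.
        sums = [a + b for a, b in zip(self.values[src1], self.values[src2])]
        self.values[dst] = [coerce(self.types[dst], s) for s in sums]


regs = VectorRegisterFile()
for r in (0, 1, 2):
    regs.set_type(r, "i8")
regs.load(0, [100, 1])
regs.load(1, [100, 2])
regs.vadd(2, 0, 1)   # 8-bit integer add, wraps on overflow
regs.load(3, [1.5])  # registers 3..5 keep the default f64 type
regs.load(4, [2.25])
regs.vadd(5, 3, 4)   # same vadd, now a 64-bit floating-point add
```

The same `vadd` call performs an 8-bit integer add in one case and a 64-bit floating-point add in the other, which is why the instruction itself needs no type or width field.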

![[Pasted image 20241227133807.png]]

  • Advantages of dynamic register typing
    • Concise instruction set
      • Otherwise would be very huge to cover diverse