GPU Core-Assisted Acceleration via Flexible Data Compression

This report introduces the Core-Assisted Bottleneck Acceleration (CABA) framework, which aims to exploit underutilized GPU resources to address the memory bandwidth bottleneck. CABA launches helper threads (assist warps) that perform hardware-assisted data compression to accelerate application execution, putting to use resources that would otherwise sit idle during compute, memory, and data-dependence stalls. Using Base-Delta-Immediate (BDI) compression, CABA improves system performance by 41.7% on average on a set of bandwidth-sensitive GPU applications.
A report on the paper "A Case for Core-Assisted Bottleneck Acceleration in GPUs: Enabling Flexible Data Compression with Assist Warps," which presents a comprehensive design and evaluation.
Off-chip memory bandwidth is one of the main bottlenecks of GPU execution: while warps wait on memory, the GPU's computational resources sit idle. The paper introduces the Core-Assisted Bottleneck Acceleration (CABA) framework, which puts unutilized on-chip resources to work on alleviating this idleness.


CABA employs on-chip hardware that is available but underutilized, and offers flexibility in the choice of (hardware-based) compression algorithm for different applications; if an application cannot benefit from compression, CABA can simply disable it. Unutilized compute resources arise from compute, memory, and data-dependence stalls. Unutilized on-chip memory is bounded by the available registers and shared memory, the hard limits on the number of threads and thread blocks per core, and the number of thread blocks allowed by occupancy. CABA's helper threads are low overhead and must be treated differently from regular threads. To keep overhead low, helper threads must be easy to manage (to enable, trigger, and kill) and flexible enough to adapt to the runtime behavior of the regular program while communicating with the original threads.


Assist warps execute code that compresses data to speed up application execution, and share the same context as the regular warps to simplify scheduling and data communication. An assist warp compresses a cache block before it is written to memory, and decompresses it before the block is placed into the cache.
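The two hook points above can be sketched as follows. This is a toy software model, not the paper's hardware design: `compress` and `decompress` stand in for whichever CABA compression routine is active, and `dram` is a plain dictionary playing the role of off-chip memory.

```python
# Toy model of where assist warps act on the memory path:
# compress on write-back to DRAM, decompress on cache fill.

def compress(line):
    # placeholder for a real algorithm such as BDI
    return ("compressed", list(line))

def decompress(blob):
    tag, line = blob
    assert tag == "compressed"
    return line

dram = {}  # stands in for off-chip memory

def write_back(addr, line):
    # the assist warp runs here, before the block leaves the core
    dram[addr] = compress(line)

def cache_fill(addr):
    # the assist warp runs here, before the block enters the cache
    return decompress(dram[addr])

write_back(0x40, [1, 2, 3, 4])
assert cache_fill(0x40) == [1, 2, 3, 4]
```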


The CABA framework is a hardware/software co-design: a pure-software approach would have high overhead, while a pure-hardware one would make register allocation and data communication more difficult. At the hardware level, sequences of instructions are dynamically inserted into the execution stream. The authors track and manage these instructions at the granularity of a warp, which they call an assist warp. An assist warp does not own a separate context; it shares both a context and a warp ID with a regular warp. Different actions require different numbers of registers in the helper thread, and these registers have short lifetimes. The subroutine of an assist warp can be written either with CUDA extensions using PTX instructions or by the microarchitecture in the internal GPU instruction format. There are three main hardware additions: the Assist Warp Store, the Assist Warp Controller, and the Assist Warp Buffer.
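To make the roles of the three structures concrete, here is a minimal software model: the Assist Warp Store holds assist-warp routines, the Assist Warp Controller maps trigger events to routines, and a list stands in for the Assist Warp Buffer that stages their instructions for issue. The event and instruction names are illustrative, not the paper's encodings.

```python
# Toy model of the three CABA hardware additions.

class AssistWarpStore:
    """Holds assist-warp routines, keyed by trigger event."""
    def __init__(self):
        self.routines = {}

    def register(self, event, instructions):
        self.routines[event] = instructions

class AssistWarpController:
    """On a trigger event, looks up the routine in the store and
    stages its instructions into the buffer for issue."""
    def __init__(self, store):
        self.store = store
        self.buffer = []  # stands in for the Assist Warp Buffer

    def trigger(self, event):
        self.buffer.extend(self.store.routines.get(event, []))

aws = AssistWarpStore()
aws.register("writeback", ["load_line", "bdi_compress", "store_line"])
awc = AssistWarpController(aws)
awc.trigger("writeback")
assert awc.buffer == ["load_line", "bdi_compress", "store_line"]
```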


To compress the data, Base-Delta-Immediate (BDI) compression is used. BDI represents a cache line with low dynamic range using a common base (or multiple bases) and an array of deltas. The authors view a cache line as a set of fixed-size values; decompression is then simply a masked vector addition of the deltas to the appropriate bases.
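A minimal sketch of the single-base case, assuming a cache line viewed as eight 4-byte values with 1-byte signed deltas; the exact value and delta sizes BDI supports are configurable in the real scheme, and this omits the immediate (second base) and the masking hardware.

```python
# Minimal single-base Base-Delta-Immediate (BDI) sketch.

def bdi_compress(values, delta_bytes=1):
    """Try to encode `values` as (base, deltas); return None if any
    delta does not fit in `delta_bytes` signed bytes."""
    base = values[0]
    limit = 1 << (8 * delta_bytes - 1)
    deltas = [v - base for v in values]
    if all(-limit <= d < limit for d in deltas):
        return base, deltas
    return None  # not compressible with this base/delta size

def bdi_decompress(base, deltas):
    """Decompression is a vector addition of deltas to the base."""
    return [base + d for d in deltas]

# A low-dynamic-range line compresses; its deltas all fit in one byte.
line = [0x1000, 0x1004, 0x1008, 0x1010, 0x1003, 0x1001, 0x100C, 0x1020]
compressed = bdi_compress(line)
assert compressed is not None
assert bdi_decompress(*compressed) == line
```

The space saving comes from storing one full-width base plus narrow deltas instead of eight full-width values; lines whose deltas overflow the chosen width are simply stored uncompressed.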


Using CABA for memory compression improves system performance by 41.7% on average on a set of bandwidth-sensitive GPU applications.