- Title: Roller: Fast and Efficient Tensor Compilation for Deep Learning
- Year: 2022
- Venue: OSDI
- Institution: Microsoft Research
The main motivation of this paper is that existing search-based DNN compilers spend a very long time exploring the schedule space, especially on hardware platforms other than NVIDIA GPUs, such as AMD GPUs and Graphcore IPUs. Roller therefore takes a different route: instead of searching, it constructs kernels directly. The basic concepts are introduced first (a small code sketch of the construction process follows the list):
- rTile: the most basic abstraction level. It is essentially a data tile, but its shape is aligned with the hardware's native compute and memory-access granularities.
- rProgram: an rTile-based program consisting of load, store, and compute steps, sized to saturate a single execution unit such as an SM on a GPU.
- kernel: the full kernel is constructed from the rProgram by replicating it across the parallel execution units.
- rTile is a new tile abstraction that encapsulates tensor shapes that align with the key features of the underlying accelerator, thus achieving efficient execution by limiting the shape choices.
- rProgram: adopts a recursive rTile-based construction algorithm to gradually increase the size of the rTile shape to construct an rProgram that saturates a single execution unit of the accelerator (e.g., an SM, a streaming multi-processor in an NVIDIA GPU).
- kernel: performs the scale-out process, which simply replicates the resulting rProgram to other parallel execution units.
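To make the construction idea concrete, below is a minimal, hypothetical Python sketch of the scale-up/scale-out process, not the paper's actual implementation. The device parameters (shared-memory size, transaction width, SM count) and the reuse heuristic are illustrative assumptions; Roller's real rTile model covers more memory levels and alignment constraints.

```python
# Hypothetical sketch of Roller's construction-based kernel generation (matmul case).
# All hardware numbers below are assumed for illustration, not taken from the paper.
from dataclasses import dataclass


@dataclass
class Accelerator:
    smem_bytes: int = 96 * 1024    # shared memory per SM (assumed)
    mem_transaction: int = 32      # elements per aligned memory transaction (assumed)
    num_sms: int = 80              # number of parallel execution units (assumed)


@dataclass
class RTile:
    m: int
    n: int
    k: int

    def footprint_bytes(self, dtype_bytes: int = 4) -> int:
        # Bytes of the A, B, and C sub-tiles this rTile touches.
        return dtype_bytes * (self.m * self.k + self.k * self.n + self.m * self.n)

    def is_aligned(self, dev: Accelerator) -> bool:
        # Aligned rTiles keep every dimension a multiple of the transaction size,
        # so loads and stores waste no memory bandwidth.
        return all(d % dev.mem_transaction == 0 for d in (self.m, self.n, self.k))


def construct_rprogram(dev: Accelerator, dtype_bytes: int = 4) -> RTile:
    """Scale-up: greedily grow an aligned rTile until it saturates the
    shared memory of a single SM; the result defines the rProgram."""
    t = RTile(dev.mem_transaction, dev.mem_transaction, dev.mem_transaction)
    while True:
        # Try enlarging each dimension by one transaction-sized step.
        candidates = [
            RTile(t.m + dev.mem_transaction, t.n, t.k),
            RTile(t.m, t.n + dev.mem_transaction, t.k),
            RTile(t.m, t.n, t.k + dev.mem_transaction),
        ]
        feasible = [c for c in candidates
                    if c.is_aligned(dev)
                    and c.footprint_bytes(dtype_bytes) <= dev.smem_bytes]
        if not feasible:
            return t  # shared memory is saturated
        # Reuse heuristic: FLOPs per byte moved, higher is better.
        t = max(feasible,
                key=lambda c: (2 * c.m * c.n * c.k) / c.footprint_bytes(dtype_bytes))


def scale_out(tile: RTile, M: int, N: int) -> int:
    """Scale-out: replicate the rProgram over all output tiles;
    each tile becomes one thread block mapped onto an SM."""
    return ((M + tile.m - 1) // tile.m) * ((N + tile.n - 1) // tile.n)


if __name__ == "__main__":
    dev = Accelerator()
    tile = construct_rprogram(dev)
    print("chosen rTile:", tile, "thread blocks:", scale_out(tile, 4096, 4096))
```

Because the candidate shapes are restricted to hardware-aligned sizes and the tile only ever grows, the construction finishes after a handful of steps rather than searching thousands of schedules, which is the source of Roller's compilation-time advantage.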