文献阅读（246）Glow

tiaozhanzhe1900

已于 2022-08-12 16:00:42 修改

阅读量633

点赞数

分类专栏：编译优化文章标签：机器学习深度学习人工智能

于 2020-01-12 10:11:32 首次发布

本文链接：https://blog.csdn.net/tiaozhanzhe1900/article/details/103105574

版权

编译优化专栏收录该内容

17 篇文章

订阅专栏

文章目录

1 缩写 & 引用
2 abstract & introduction
3 related work
3.1 compiler-related project
4 中间层表示
5 定点化

题目：Glow: Graph Lowering Compiler Techniques for Neural Networks
时间：2019
未发表：arXiv
研究机构：Facebook
GitHub：https://github.com/pytorch/glow

1 缩写 & 引用

Automatic differentiation in PyTorch 2017 NIPS-W
Caffe: Convolutional Architecture for Fast Feature Embedding 2014 In Proceedings of the 22nd ACM International Conference on Multimedia
NNVM Compiler: Open Compiler for AI Frameworks. http://tvmlang.org/2017/10/06/nnvm-compiler-announcement.html
Tensor Comprehensions: Framework-Agnostic High-Performance Machine Learning Abstractions. ArXiv

2 abstract & introduction

Glow是一个针对异构hardware的机器学习编译器，输入是神经网络dataflow grap，产生两层中间表示，一个是high-level，可以进行domain-specific优化；一个是lower-level，是基于指令和address的，可以进行和memory相关的优化，如指令调度，静态memory分配和copy elimination；在最后产生特定硬件的机器代码

Hennessy和Patterson提出的机器学习domain specific architecture的五点原则：

dedicated local memories
大量的算术单元
并行的简单形式
降低的bit-width
domain-specific编程模型

3 related work

3.1 compiler-related project

XLA: 把计算图结点转化成基本线性代数操作，对不同的硬件调用backend-specific库，然后用LLVM产生代码
Glow的思路跟这个类似，区别是对于常见的矩阵运算算子提供了代码生成的模板
TVM/NNVM：把node转化成low-level Halide-based IR，这过程中进行循环优化，Halide之后产生LLVM或者CUDA代码
DLVM：把DLVM IR转化成LLVM IR，LLVM就可以优化并产生代码
nGraph：把framework的计算图转化成IR，对不同的后端再用不同的库如cuDNN，MKL-DNN
Tensor Comprehension: JIT编译器，search for最高效的执行计划

4 中间层表示

在这里插入图片描述

4.1 motivation

编译器最开始的几个阶段是target不相关的，之后到instruction selection时就变成target-specific
作者认为high-level domain specific IR是有必要的，这样编译器就可以reason about并优化high-level constructs如张量和操作
在这里插入图片描述

4.2 High-level IR

High-level IR是一个基于数据流结点图的表示，由DNN网络模型输入，直接的把一个操作翻译成一个或多个结点，这过程中可以进行基本的转换，例如replacing all useds of some node with another node，或者modifying the content of input tensors known at compile time
storage结点分成constant结点和placeholder结点

这里可以进行target无关的优化