Halide Study Notes 04: Vectorize, parallelize, unroll and tile

young_s%

Categories: c++, Halide. Tags: image processing

Article link: https://blog.csdn.net/weixin_45331269/article/details/120081589

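Preface

Continuing from the previous note, this post works through Halide lesson_05_scheduling.

** Scheduling a Func in different ways **

Content

This section covers four scheduling concepts: vectorizing, parallelizing, unrolling and tiling. All of them are used to speed up per-pixel image computations.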

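row-major && column-major

// row-major (the default): y is the outer loop, x is the inner loop
Func gradient;
Var x, y;
gradient(x, y) = x + y;
// ...
// column-major: reorder so that y becomes the innermost loop
gradient.reorder(y, x);
// ...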

[Figures: row-major and column-major traversal order]
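
To make the two traversal orders concrete, here is a minimal plain C++ sketch (no Halide required) of the loop nests that the two schedules roughly correspond to. The names `Point`, `row_major_order` and `column_major_order` are made up for illustration; the functions only record the order in which (x, y) points are visited.

```cpp
#include <cassert>
#include <utility>
#include <vector>

using Point = std::pair<int, int>;  // (x, y)

// Default schedule: y is the outer loop, x is the inner loop (row-major).
std::vector<Point> row_major_order(int width, int height) {
    assert(width > 0 && height > 0);
    std::vector<Point> visited;
    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width; x++) {
            visited.push_back(Point(x, y));
        }
    }
    return visited;
}

// After reorder(y, x): x is the outer loop, y is the inner loop (column-major).
std::vector<Point> column_major_order(int width, int height) {
    assert(width > 0 && height > 0);
    std::vector<Point> visited;
    for (int x = 0; x < width; x++) {
        for (int y = 0; y < height; y++) {
            visited.push_back(Point(x, y));
        }
    }
    return visited;
}
```

Calling `row_major_order(3, 2)` walks across each row before moving down, while `column_major_order(3, 2)` walks down each column before moving right. Both visit the same set of points.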

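split && fuse
** When the split factor does not evenly divide the extent of the loop, split causes some points to be computed more than once. **

// split: breaks the loop over x into two nested loops
Func gradient;
Var x, y;
gradient(x, y) = x + y;
Var x_outer, x_inner;
gradient.split(x, x_outer, x_inner, 2);	// 2 is the split factor, i.e. the extent of the inner loop

// fuse: merges two variables into one; the opposite of split
// (shown on a separate Func, because x no longer exists as a loop after the split above)
Func gradient_fused;
gradient_fused(x, y) = x + y;
Var fused;
gradient_fused.fuse(x, y, fused);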
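
The recomputation caused by a non-dividing split factor can be sketched in plain C++. This is only an illustration of the behavior described in the Halide tutorial, where the last outer iteration is shifted back so that it stays inside the domain; the function names `split_visit_counts` and `fused_gradient` are hypothetical and not part of Halide.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// How many times each x in [0, extent) is evaluated when the loop over x is
// split by `factor`, assuming the tail is handled by shifting the last outer
// iteration back inside the domain.
std::vector<int> split_visit_counts(int extent, int factor) {
    assert(factor > 0 && extent >= factor);
    std::vector<int> counts(extent, 0);
    int outer_extent = (extent + factor - 1) / factor;  // round up
    for (int x_outer = 0; x_outer < outer_extent; x_outer++) {
        // Clamp the base so that x_base + x_inner never leaves the domain.
        int x_base = std::min(x_outer * factor, extent - factor);
        for (int x_inner = 0; x_inner < factor; x_inner++) {
            counts[x_base + x_inner]++;
        }
    }
    return counts;
}

// fuse(x, y, fused): one loop over width * height iterations from which
// x and y are recovered; x varies fastest.
std::vector<int> fused_gradient(int width, int height) {
    assert(width > 0 && height > 0);
    std::vector<int> out(width * height);
    for (int fused = 0; fused < width * height; fused++) {
        int y = fused / width;
        int x = fused % width;
        out[y * width + x] = x + y;
    }
    return out;
}
```

With an extent of 7 and a factor of 3, the third outer iteration starts at x = 4 instead of x = 6, so x = 4 and x = 5 are each evaluated twice. This is harmless for a pure function such as x + y, but it is wasted work.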

tiled traversal
** Tiling can be good for performance if neighboring pixels use overlapping input data, for example in a blur. **

// Evaluating in tiles: a combination of split and reorder
Func gradient;
Var x, y;
gradient(x, y) = x + y;
Var x_outer, x_inner, y_outer, y_inner;
gradient.tile(x, y, x_outer, y_outer, x_inner, y_inner, 4, 4); // 4 is the split factor in each dimension

[Figure: tiled traversal order]
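
As a rough picture of what the tiled schedule does, the following plain C++ sketch records the visiting order of a tiled loop nest: both x and y are split, and the outer loops are moved outside the inner ones. The function name `tiled_order` is made up for this note, and the sketch assumes that the tile size divides the width and height evenly.

```cpp
#include <cassert>
#include <utility>
#include <vector>

using Point = std::pair<int, int>;  // (x, y)

// Visit a width x height domain tile by tile; inside each tile the points
// are visited in row-major order.
std::vector<Point> tiled_order(int width, int height, int tile) {
    assert(tile > 0 && width % tile == 0 && height % tile == 0);
    std::vector<Point> visited;
    for (int y_outer = 0; y_outer < height / tile; y_outer++) {
        for (int x_outer = 0; x_outer < width / tile; x_outer++) {
            for (int y_inner = 0; y_inner < tile; y_inner++) {
                for (int x_inner = 0; x_inner < tile; x_inner++) {
                    int x = x_outer * tile + x_inner;
                    int y = y_outer * tile + y_inner;
                    visited.push_back(Point(x, y));
                }
            }
        }
    }
    return visited;
}
```

For an 8x8 domain with 4x4 tiles, the first 16 points all lie in the top-left tile, which is why neighboring pixels that share input data end up being processed close together in time.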

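vectorize

** Vectorization: vectorize(x, 4) splits x by a factor of 4 and turns the inner loop into a single 4-wide vector operation, which tends to be faster on hardware with SIMD instructions. **

// Evaluating in vectors
Func gradient;
Var x, y;
gradient(x, y) = x + y;
gradient.vectorize(x, 4); // split x by 4 and vectorize the inner loop
// A width of 4 is used because on x86 we can use SSE to compute in 4-wide vectors.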

[Figure: vectorized traversal order]
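
The effect of vectorizing x by 4 can be imitated in plain C++ by handling four neighboring x values as one unit. This is only a scalar emulation for illustration, not real SIMD code; the function name `gradient_vectorized` is hypothetical, and the sketch assumes the width is a multiple of 4.

```cpp
#include <array>
#include <cassert>
#include <vector>

// Emulates vectorize(x, 4): the loop over x is split by 4 and the inner
// loop is replaced by operations on a 4-lane "vector".
std::vector<int> gradient_vectorized(int width, int height) {
    assert(width > 0 && width % 4 == 0 && height > 0);
    std::vector<int> out(width * height);
    for (int y = 0; y < height; y++) {
        for (int x_outer = 0; x_outer < width / 4; x_outer++) {
            // One vector of four consecutive x coordinates.
            std::array<int, 4> x_vec = {{x_outer * 4 + 0, x_outer * 4 + 1,
                                         x_outer * 4 + 2, x_outer * 4 + 3}};
            // One vector add: every lane computes x + y.
            std::array<int, 4> val = {{x_vec[0] + y, x_vec[1] + y,
                                       x_vec[2] + y, x_vec[3] + y}};
            // One vector store of the four results.
            for (int lane = 0; lane < 4; lane++) {
                out[y * width + x_vec[lane]] = val[lane];
            }
        }
    }
    return out;
}
```

On real hardware the two array initializations would each become a single vector instruction, so the inner work is done once per four pixels instead of once per pixel.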

unroll
** If multiple pixels share overlapping data, it can make sense to unroll a computation so that shared values are only computed or loaded once. **

// unroll(x, 2) splits x by 2 and expands the inner loop into consecutive straight-line statements
Func gradient;
Var x, y;
gradient(x, y) = x + y;
gradient.unroll(x, 2);
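
A plain C++ sketch of what unrolling x by 2 amounts to: the inner loop of two iterations disappears and its body is written out twice. The function name `gradient_unrolled` is made up for illustration, and the sketch assumes the width is even.

```cpp
#include <cassert>
#include <vector>

// Emulates unroll(x, 2): the loop over x is split by 2 and the inner loop
// is replaced by two copies of its body.
std::vector<int> gradient_unrolled(int width, int height) {
    assert(width > 0 && width % 2 == 0 && height > 0);
    std::vector<int> out(width * height);
    for (int y = 0; y < height; y++) {
        for (int x_outer = 0; x_outer < width / 2; x_outer++) {
            // First copy of the body (x_inner = 0).
            {
                int x = x_outer * 2 + 0;
                out[y * width + x] = x + y;
            }
            // Second copy of the body (x_inner = 1).
            {
                int x = x_outer * 2 + 1;
                out[y * width + x] = x + y;
            }
        }
    }
    return out;
}
```

For x + y there is nothing to share between the two copies, but in a stencil such as a blur the two bodies would read overlapping inputs, and writing them side by side lets those loads be done once.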

tiles in parallel
** Combine fusing, tiling and parallelizing to express a useful pattern: processing tiles in parallel. **
** This is where fusing shines. **

// Fusing the two outer tile loops before parallelizing avoids nested parallelism, which is inefficient.
// The tiles may be processed in arbitrary order, but within each
// tile the pixels will be traversed in row-major order.
Func gradient;
Var x, y;
gradient(x, y) = x + y;
Var x_outer, y_outer, x_inner, y_inner, tile_index;
gradient
	.tile(x, y, x_outer, y_outer, x_inner, y_inner, 4, 4)
	.fuse(x_outer, y_outer, tile_index)
	.parallel(tile_index); // tile_index is the dimension obtained by fusing x_outer and y_outer

[Figure: tiles processed in parallel]
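
The reason a fused tile index is safe to parallelize is that each tile writes a disjoint block of pixels, so the order in which tiles run does not matter. The plain C++ sketch below illustrates this serially by visiting the tiles either forwards or backwards; it does not start any threads. The function name `gradient_by_tiles` and the `reverse_tiles` flag are hypothetical, and the tile size is assumed to divide the image size.

```cpp
#include <cassert>
#include <vector>

// Emulates tile + fuse: one loop over tile_index, from which x_outer and
// y_outer are recovered. The tiles can be visited in any order.
std::vector<int> gradient_by_tiles(int width, int height, int tile,
                                   bool reverse_tiles) {
    assert(tile > 0 && width % tile == 0 && height % tile == 0);
    std::vector<int> out(width * height, -1);
    int tiles_x = width / tile;
    int tiles_y = height / tile;
    int num_tiles = tiles_x * tiles_y;
    for (int i = 0; i < num_tiles; i++) {
        // In Halide this loop would be parallel; here we only change the order.
        int tile_index = reverse_tiles ? num_tiles - 1 - i : i;
        int y_outer = tile_index / tiles_x;
        int x_outer = tile_index % tiles_x;
        for (int y_inner = 0; y_inner < tile; y_inner++) {
            for (int x_inner = 0; x_inner < tile; x_inner++) {
                int x = x_outer * tile + x_inner;
                int y = y_outer * tile + y_inner;
                out[y * width + x] = x + y;
            }
        }
    }
    return out;
}
```

Running it with `reverse_tiles` set to either value gives the same image, which is the property a parallel loop over `tile_index` relies on.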

Finally

** Putting it all together, using all of the features above. **

        Func gradient_fast;
        Var x, y;
        gradient_fast(x, y) = x + y;

        // We'll process 64x64 tiles in parallel.
        Var x_outer, y_outer, x_inner, y_inner, tile_index;
        gradient_fast
            .tile(x, y, x_outer, y_outer, x_inner, y_inner, 64, 64)
            .fuse(x_outer, y_outer, tile_index)
            .parallel(tile_index);

        // We'll compute two scanlines at once while we walk across
        // each tile. We'll also vectorize in x. The easiest way to
        // express this is to recursively tile again within each tile
        // into 4x2 subtiles, then vectorize the subtiles across x and
        // unroll them across y:
        Var x_inner_outer, y_inner_outer, x_vectors, y_pairs;
        gradient_fast
            .tile(x_inner, y_inner, x_inner_outer, y_inner_outer, x_vectors, y_pairs, 4, 2)
            .vectorize(x_vectors)
            .unroll(y_pairs);

        // Note that we didn't do any explicit splitting or
        // reordering. Those are the most important primitive
        // operations, but mostly they are buried underneath tiling,
        // vectorizing, or unrolling calls.

        // Now let's evaluate this over a range which is not a
        // multiple of the tile size.

        // If you like you can turn on tracing, but it's going to
        // produce a lot of printfs. Instead we'll compute the answer
        // both in C and Halide and see if the answers match.
        Buffer<int> result = gradient_fast.realize({350, 250});

** Note that in the Halide version, the algorithm is specified once at the top, separately from the optimizations, and there aren’t that many lines of code total. **

[Figure: traversal order of the combined schedule]
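
The code comments above mention computing the answer in C as well and comparing. A hedged plain C++ sketch of such a reference loop nest is given below. It follows the schedule step by step (64x64 tiles over a fused index, then 4x2 subtiles, a 4-wide "vector" in x, and y unrolled by 2) and assumes that the last tile in each dimension is shifted inwards when the size is not a multiple of 64. The names `gradient_fast_reference` and `matches_gradient` are made up for this note, and the tile loop runs serially here.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

std::vector<int> gradient_fast_reference(int width, int height) {
    assert(width >= 64 && height >= 64);
    std::vector<int> out(width * height, -1);  // -1 marks "not written yet"
    int tiles_x = (width + 63) / 64;   // round up
    int tiles_y = (height + 63) / 64;
    // Parallel in Halide; serial in this sketch.
    for (int tile_index = 0; tile_index < tiles_x * tiles_y; tile_index++) {
        int y_outer = tile_index / tiles_x;
        int x_outer = tile_index % tiles_x;
        // Shift the last tile inwards so it stays inside the image.
        int x_tile = std::min(x_outer * 64, width - 64);
        int y_tile = std::min(y_outer * 64, height - 64);
        for (int y_inner_outer = 0; y_inner_outer < 64 / 2; y_inner_outer++) {
            for (int x_inner_outer = 0; x_inner_outer < 64 / 4; x_inner_outer++) {
                int x = x_tile + x_inner_outer * 4;
                int y_base = y_tile + y_inner_outer * 2;
                // Unrolled across y: y_pairs = 0.
                {
                    int y = y_base + 0;
                    // The lane loop stands in for one 4-wide vector operation.
                    for (int lane = 0; lane < 4; lane++) {
                        out[y * width + x + lane] = (x + lane) + y;
                    }
                }
                // Unrolled across y: y_pairs = 1.
                {
                    int y = y_base + 1;
                    for (int lane = 0; lane < 4; lane++) {
                        out[y * width + x + lane] = (x + lane) + y;
                    }
                }
            }
        }
    }
    return out;
}

// Checks every pixel against the algorithm itself: out(x, y) == x + y.
bool matches_gradient(const std::vector<int>& out, int width, int height) {
    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width; x++) {
            if (out[y * width + x] != x + y) return false;
        }
    }
    return true;
}
```

For the 350x250 case this gives 6x4 tiles, with the last column of tiles starting at x = 286 and the last row at y = 186. Comparing the length of this hand-written loop nest with the few scheduling calls in the Halide version makes the point of the note quoted above.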

End

It feels like only now am I getting to Halide's defining feature: the algorithm is specified once at the top, separately from the optimizations.
