Halide Study Notes -- 04 -- Vectorize, parallelize, unroll and tile


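Preface

Continuing from the previous post, this article works through Halide's lesson_05_scheduling.

**Scheduling a Func in different ways**

Content

This section covers several concepts used to speed up per-pixel image computation: vectorization, parallelization, unrolling, and tiling.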

row-major && column-major

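```cpp
// Inside main(), after #include "Halide.h" and using namespace Halide;
// Row-major (the default): x is the innermost loop.
Var x("x"), y("y");
Func gradient("gradient");
gradient(x, y) = x + y;
Buffer<int> output = gradient.realize({4, 4});

// Column-major: reorder() lists loop variables from innermost to outermost,
// so reorder(y, x) makes y the innermost loop.
Func gradient_col_major("gradient_col_major");
gradient_col_major(x, y) = x + y;
gradient_col_major.reorder(y, x);
Buffer<int> output_col_major = gradient_col_major.realize({4, 4});
```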

[Figure: row-major traversal order]
[Figure: column-major traversal order]
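To make the two traversal orders concrete, here is a hand-written plain C++ sketch (no Halide required) of the loop nests the two schedules correspond to. The helper names `row_major_order` and `column_major_order` are hypothetical; each one simply records the order in which the (x, y) points would be visited.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical helper: visit order of the default row-major schedule,
// where x is the innermost loop.
std::vector<std::pair<int, int>> row_major_order(int width, int height) {
    assert(width > 0 && height > 0);
    std::vector<std::pair<int, int>> order;
    for (int y = 0; y < height; y++) {      // outer loop
        for (int x = 0; x < width; x++) {   // inner loop
            order.push_back({x, y});        // gradient(x, y) = x + y would be computed here
        }
    }
    return order;
}

// Hypothetical helper: visit order after reorder(y, x), where y is innermost.
std::vector<std::pair<int, int>> column_major_order(int width, int height) {
    assert(width > 0 && height > 0);
    std::vector<std::pair<int, int>> order;
    for (int x = 0; x < width; x++) {       // outer loop
        for (int y = 0; y < height; y++) {  // inner loop
            order.push_back({x, y});
        }
    }
    return order;
}
```

For a 3x2 image, row-major visits (0, 0), (1, 0), (2, 0), (0, 1), ..., while column-major visits (0, 0), (0, 1), (1, 0), ....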

split && fuse
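**If the split factor does not evenly divide the extent, Halide shifts the last iteration inward, so some points are computed more than once.**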

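```cpp
// Split: break the loop over x into two nested loops.
Func gradient("gradient");
gradient(x, y) = x + y;
Var x_outer, x_inner;
// 2 is the split factor, i.e. the extent of the inner loop:
// x = x_outer * 2 + x_inner, with x_inner in [0, 2).
gradient.split(x, x_outer, x_inner, 2);

// Fuse: merge two variables into one; the opposite of split.
// A separate Func is used because, after the split above, x is no longer a loop variable of gradient.
Func gradient_fused("gradient_fused");
gradient_fused(x, y) = x + y;
Var fused;
gradient_fused.fuse(x, y, fused);
```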
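A plain C++ sketch of what these two schedules mean as loop nests. It assumes the shift-inward handling of a split factor that does not divide the extent, which is how the lesson describes this case. The function names `split_eval_counts` and `fused_gradient` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Loop nest for split(x, x_outer, x_inner, factor) over one row of the given
// width. When factor does not divide width, the last chunk is shifted inward
// so it stays inside [0, width); as a result some x are evaluated twice.
// Returns how many times each x was evaluated.
std::vector<int> split_eval_counts(int width, int factor) {
    assert(factor > 0 && width >= factor);
    std::vector<int> counts(width, 0);
    int num_outer = (width + factor - 1) / factor;  // round up
    for (int x_outer = 0; x_outer < num_outer; x_outer++) {
        // Clamp the start of the chunk (the shift-inward step).
        int x_base = std::min(x_outer * factor, width - factor);
        for (int x_inner = 0; x_inner < factor; x_inner++) {
            int x = x_base + x_inner;
            counts[x]++;  // gradient(x, y) = x + y would be computed here
        }
    }
    return counts;
}

// Loop nest for fuse(x, y, fused): a single loop of width * height iterations,
// with x and y recovered from the fused index (x varies fastest).
std::vector<int> fused_gradient(int width, int height) {
    assert(width > 0 && height > 0);
    std::vector<int> out(width * height);
    for (int fused = 0; fused < width * height; fused++) {
        int y = fused / width;
        int x = fused % width;
        out[y * width + x] = x + y;
    }
    return out;
}
```

With width 7 and factor 3, the inner chunks are [0, 3), [3, 6) and [4, 7), so x = 4 and x = 5 are each evaluated twice. This is the repeated computation noted at the top of this section.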

tiled traversal
**Tiling can be good for performance when neighboring pixels use overlapping input data, for example in a blur.**

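```cpp
// Evaluating in tiles: tile() splits x and y and then reorders the loops.
Func gradient("gradient");
gradient(x, y) = x + y;
Var x_outer, x_inner, y_outer, y_inner;
// 4x4 tiles (split factor 4 in both x and y). Equivalent to:
//   split(x, x_outer, x_inner, 4); split(y, y_outer, y_inner, 4);
//   reorder(x_inner, y_inner, x_outer, y_outer);
gradient.tile(x, y, x_outer, y_outer, x_inner, y_inner, 4, 4);
```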

[Figure: traversal order with 4x4 tiles]
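Here is the tiled schedule written out as an explicit loop nest, as a plain C++ sketch. `tiled_order` is a hypothetical helper that records the visit order. It assumes the tile size divides the image size, so there is no partial last tile to handle.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Loop nest for tile(x, y, x_outer, y_outer, x_inner, y_inner, tw, th):
// split x and y, then order the loops y_outer, x_outer, y_inner, x_inner
// (outermost to innermost).
std::vector<std::pair<int, int>> tiled_order(int width, int height, int tw, int th) {
    assert(tw > 0 && th > 0 && width % tw == 0 && height % th == 0);
    std::vector<std::pair<int, int>> order;
    for (int y_outer = 0; y_outer < height / th; y_outer++) {
        for (int x_outer = 0; x_outer < width / tw; x_outer++) {
            // Inside one tile, pixels are visited in row-major order.
            for (int y_inner = 0; y_inner < th; y_inner++) {
                for (int x_inner = 0; x_inner < tw; x_inner++) {
                    int x = x_outer * tw + x_inner;
                    int y = y_outer * th + y_inner;
                    order.push_back({x, y});
                }
            }
        }
    }
    return order;
}
```

For an 8x8 image with 4x4 tiles, the first 16 visits cover the top-left tile row by row, and the 17th visit is (4, 0), the first pixel of the next tile.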

vectorize

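**Vectorization: neighboring pixels are computed together with SIMD instructions, which is usually faster. A split factor can be passed to set the vector width.**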
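```cpp
// Evaluating in vectors.
Func gradient("gradient");
gradient(x, y) = x + y;
// Split x by a factor of 4 and vectorize the inner loop; shorthand for
// split(x, x, x_inner, 4) followed by vectorize(x_inner).
// A width of 4 suits x86, where SSE computes on 4-wide vectors of 32-bit values.
gradient.vectorize(x, 4);
```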

[Figure: vectorized traversal]
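The following plain C++ sketch shows the loop nest behind `vectorize(x, 4)`. Portable C++ has no built-in vector type, so the four lanes are modelled with `std::array` and small lane loops. This shows the structure only, not the SSE instructions Halide actually emits. `vectorized_gradient` is a hypothetical name, and the sketch assumes the width is a multiple of 4.

```cpp
#include <array>
#include <cassert>
#include <vector>

// Loop nest for vectorize(x, 4): x is split by 4, and each iteration of
// x_outer computes one 4-wide "vector" of results (lanes 0..3).
std::vector<int> vectorized_gradient(int width, int height) {
    assert(width > 0 && width % 4 == 0 && height > 0);
    std::vector<int> out(width * height);
    const std::array<int, 4> lane = {0, 1, 2, 3};  // the x_inner values
    for (int y = 0; y < height; y++) {
        for (int x_outer = 0; x_outer < width / 4; x_outer++) {
            // One vector of x coordinates: x_outer * 4 + {0, 1, 2, 3}.
            std::array<int, 4> xs, values;
            for (int i = 0; i < 4; i++) xs[i] = x_outer * 4 + lane[i];
            // One vector add: {x0, x1, x2, x3} + y.
            for (int i = 0; i < 4; i++) values[i] = xs[i] + y;
            // One vector store.
            for (int i = 0; i < 4; i++) out[y * width + xs[i]] = values[i];
        }
    }
    return out;
}
```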

unroll
**If multiple pixels share overlapping data, it can make sense to unroll a computation so that shared values are only computed or loaded once.**

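```cpp
// unroll(x, 2) splits x by a factor of 2 and unrolls the inner loop: its two
// iterations become two consecutive copies of the loop body.
// Shorthand for split(x, x_outer, x_inner, 2) followed by unroll(x_inner).
Func gradient("gradient");
gradient(x, y) = x + y;
gradient.unroll(x, 2);
```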
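Written out by hand, `unroll(x, 2)` replaces the inner loop of two iterations with two copies of its body. The plain C++ sketch below assumes an even width; `unrolled_gradient` is a hypothetical name.

```cpp
#include <cassert>
#include <vector>

// Loop nest for unroll(x, 2): x is split by 2, and the inner loop over
// x_inner in {0, 1} is replaced by two explicit copies of the body.
std::vector<int> unrolled_gradient(int width, int height) {
    assert(width > 0 && width % 2 == 0 && height > 0);
    std::vector<int> out(width * height);
    for (int y = 0; y < height; y++) {
        for (int x_outer = 0; x_outer < width / 2; x_outer++) {
            // x_inner = 0
            {
                int x = x_outer * 2 + 0;
                out[y * width + x] = x + y;
            }
            // x_inner = 1
            {
                int x = x_outer * 2 + 1;
                out[y * width + x] = x + y;
            }
        }
    }
    return out;
}
```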

tiles in parallel
**Combining tiling, fusing and parallel expresses a useful pattern: processing tiles in parallel.**
**This is where fusing shines.**

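```cpp
// Fusing x_outer and y_outer into one loop and parallelizing that loop avoids
// nested parallelism (a parallel loop inside a parallel loop), which Halide
// supports but which usually performs worse.
// The tiles may be processed in any order, but within each
// tile the pixels will be traversed in row-major order.
Func gradient("gradient");
gradient(x, y) = x + y;
Var x_outer, y_outer, x_inner, y_inner, tile_index;
gradient
    .tile(x, y, x_outer, y_outer, x_inner, y_inner, 4, 4)
    .fuse(x_outer, y_outer, tile_index)
    .parallel(tile_index);  // tile_index is the single loop produced by fusing x_outer and y_outer
```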

[Figure: tiles evaluated in parallel]
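The following plain C++ sketch shows the loop nest for the tile + fuse + parallel schedule with 4x4 tiles. The single fused `tile_index` loop is the one Halide would run in parallel on its own thread pool. The sketch runs it serially. That is valid because each tile writes a disjoint set of pixels, so the iterations are independent. `parallel_tiles_gradient` is a hypothetical name, and the image size is assumed to be a multiple of 4.

```cpp
#include <cassert>
#include <vector>

// Loop nest for tile(..., 4, 4).fuse(x_outer, y_outer, tile_index)
// .parallel(tile_index). Only the outermost (fused) loop is parallel in Halide;
// the iterations are independent, so running them serially gives the same result.
std::vector<int> parallel_tiles_gradient(int width, int height) {
    assert(width > 0 && height > 0 && width % 4 == 0 && height % 4 == 0);
    std::vector<int> out(width * height);
    const int tiles_x = width / 4;
    const int num_tiles = tiles_x * (height / 4);
    for (int tile_index = 0; tile_index < num_tiles; tile_index++) {  // parallel in Halide
        // Recover the two outer tile coordinates from the fused index.
        int y_outer = tile_index / tiles_x;
        int x_outer = tile_index % tiles_x;
        // Inside a tile, pixels are visited in row-major order.
        for (int y_inner = 0; y_inner < 4; y_inner++) {
            for (int x_inner = 0; x_inner < 4; x_inner++) {
                int x = x_outer * 4 + x_inner;
                int y = y_outer * 4 + y_inner;
                out[y * width + x] = x + y;
            }
        }
    }
    return out;
}
```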

Finally

**Putting it all together, using all of the features above.**

```cpp
Func gradient_fast;
gradient_fast(x, y) = x + y;

// We'll process 64x64 tiles in parallel.
Var x_outer, y_outer, x_inner, y_inner, tile_index;
gradient_fast
    .tile(x, y, x_outer, y_outer, x_inner, y_inner, 64, 64)
    .fuse(x_outer, y_outer, tile_index)
    .parallel(tile_index);

// We'll compute two scanlines at once while we walk across
// each tile. We'll also vectorize in x. The easiest way to
// express this is to recursively tile again within each tile
// into 4x2 subtiles, then vectorize the subtiles across x and
// unroll them across y:
Var x_inner_outer, y_inner_outer, x_vectors, y_pairs;
gradient_fast
    .tile(x_inner, y_inner, x_inner_outer, y_inner_outer, x_vectors, y_pairs, 4, 2)
    .vectorize(x_vectors)
    .unroll(y_pairs);

// Note that we didn't do any explicit splitting or
// reordering. Those are the most important primitive
// operations, but mostly they are buried underneath tiling,
// vectorizing, or unrolling calls.

// Now let's evaluate this over a range which is not a
// multiple of the tile size.

// If you like you can turn on tracing, but it's going to
// produce a lot of printfs. Instead we'll compute the answer
// both in C and Halide and see if the answers match.
Buffer<int> result = gradient_fast.realize({350, 250});
```

**Note that in the Halide version, the algorithm is specified once at the top, separately from the optimizations, and there aren't that many lines of code in total.**

[Figure: gradient_fast traversal]
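The code above says the answer should also be computed in C and compared. Below is a hand-written C++ version of the gradient_fast loop nest, together with a direct reference, so the two can be compared over the same 350x250 region. It assumes the last 64x64 tile in each dimension is shifted inward to stay inside the image, which is the default tail handling for a simple pure Func like this one. As a result, some pixels are computed twice. `gradient_fast_c` and `gradient_reference` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hand-written loop nest for the gradient_fast schedule: 64x64 tiles fused into
// one tile_index loop (parallel in Halide, serial here), each tile split into
// 4x2 subtiles that are vectorized across x and unrolled across y.
// Assumes width >= 64 and height >= 64.
std::vector<int> gradient_fast_c(int width, int height) {
    assert(width >= 64 && height >= 64);
    std::vector<int> out(width * height);
    const int tiles_x = (width + 63) / 64;
    const int tiles_y = (height + 63) / 64;
    for (int tile_index = 0; tile_index < tiles_x * tiles_y; tile_index++) {
        int y_outer = tile_index / tiles_x;
        int x_outer = tile_index % tiles_x;
        // Shift the last tile inward so it stays inside the image.
        int x_base = std::min(x_outer * 64, width - 64);
        int y_base = std::min(y_outer * 64, height - 64);
        for (int y_inner_outer = 0; y_inner_outer < 64 / 2; y_inner_outer++) {
            for (int x_inner_outer = 0; x_inner_outer < 64 / 4; x_inner_outer++) {
                int x0 = x_base + x_inner_outer * 4;
                int y0 = y_base + y_inner_outer * 2;
                // y_pairs unrolled: the two rows are written out explicitly.
                // x_vectors vectorized: each row is one 4-wide operation (lanes 0..3).
                for (int lane = 0; lane < 4; lane++)
                    out[y0 * width + x0 + lane] = (x0 + lane) + y0;
                for (int lane = 0; lane < 4; lane++)
                    out[(y0 + 1) * width + x0 + lane] = (x0 + lane) + (y0 + 1);
            }
        }
    }
    return out;
}

// Reference: the algorithm evaluated directly, in plain row-major order.
std::vector<int> gradient_reference(int width, int height) {
    std::vector<int> out(width * height);
    for (int y = 0; y < height; y++)
        for (int x = 0; x < width; x++)
            out[y * width + x] = x + y;
    return out;
}
```

For 350x250, the x tiles start at 0, 64, 128, 192, 256 and 286, and the y tiles start at 0, 64, 128 and 186. `gradient_fast_c(350, 250)` should therefore equal `gradient_reference(350, 250)` element for element.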

End

It feels like only now have I really touched on what makes Halide distinctive: the algorithm is specified once at the top, separately from the optimizations.