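This article is part of a serialized Halide programming guide and has also been published on our WeChat official account.
Chapter 10: Halide compilation (AOT compilation)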
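// Part 1: Halide compilation
// This part demonstrates how to use Halide as a more traditional ahead-of-time (AOT) compiler.
// The lesson is split across two files. The first builds a Halide pipeline and compiles it to a static library and header. The second uses that static library to actually run the pipeline. This means that compiling this code is a multi-step process.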
// On Linux, you can compile and run it like so:
// g++ lesson_10*generate.cpp -g -std=c++11 -I ../include -L ../bin -lHalide -lpthread -ldl -o lesson_10_generate
// LD_LIBRARY_PATH=../bin ./lesson_10_generate
// g++ lesson_10*run.cpp lesson_10_halide.a -std=c++11 -I ../include -lpthread -ldl -o lesson_10_run
// ./lesson_10_run
// On OS X:
// g++ lesson_10*generate.cpp -g -std=c++11 -I ../include -L ../bin -lHalide -o lesson_10_generate
// DYLD_LIBRARY_PATH=../bin ./lesson_10_generate
// g++ lesson_10*run.cpp lesson_10_halide.a -o lesson_10_run -I ../include
// ./lesson_10_run
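// The benefits of this approach are that the final program:
// - Doesn't do any JIT compilation at runtime, so it's fast.
// - Doesn't depend on libHalide at all, so it's a small, easy-to-deploy binary.
// If you have the entire Halide source tree, you can also build it by running
//     make tutorial_lesson_10_aot_compilation_run
// in a shell with the current directory at the top of the Halide source tree.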
#include "Halide.h"
#include <stdio.h>
using namespace Halide;
int main(int argc, char **argv) {
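    // Define a simple one-stage pipeline:
    Func brighter;
    Var x, y;
    // The pipeline will depend on one scalar parameter.
    Param<uint8_t> offset;
    // And take one grayscale 8-bit input buffer. The first constructor argument gives the type of a pixel, and the second specifies the number of dimensions (not the number of channels!). For a grayscale image this is two; for a color image it's three. Currently, four dimensions is the maximum for inputs and outputs.
    ImageParam input(type_of<uint8_t>(), 2);
    // If we were JIT-compiling, these would just be an int and a Buffer. But because we want to compile the pipeline only once and have it work for any value of the parameter, we need a Param object, which can be used like an Expr, and an ImageParam object, which can be used like a Buffer.
    // Define the Func.
    brighter(x, y) = input(x, y) + offset;
    // Schedule it.
    brighter.vectorize(x, 16).parallel(y);
    // This time, instead of calling brighter.realize(...), which would compile and run the pipeline immediately, we call a method that compiles the pipeline to a static library and header.
    // For AOT-compiled code, we need to explicitly declare the arguments to the routine. This routine takes two. Arguments are usually Params or ImageParams.
    brighter.compile_to_static_library("lesson_10_halide", {input, offset}, "brighter");
    printf("Halide pipeline compiled, but not yet run.\n");
    // To continue this lesson, look in the file lesson_10_aot_compilation_run.cpp
    return 0;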
}
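For intuition, the definition `brighter(x, y) = input(x, y) + offset` together with the schedule `vectorize(x, 16).parallel(y)` corresponds roughly to the loop nest below. This is a hand-written plain C++ sketch, not the code Halide generates: the name `brighter_reference`, the flat row-major `std::vector` storage, and the handling of a final partial vector (shifting it inward so it overlaps pixels already computed, similar in spirit to Halide's `TailStrategy::ShiftInwards`) are illustrative assumptions. The `y` loop is the one Halide would spread across threads; here it runs sequentially.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Plain C++ sketch of brighter(x, y) = input(x, y) + offset scheduled with
// vectorize(x, 16).parallel(y). Pixel (x, y) is stored at y * width + x.
std::vector<uint8_t> brighter_reference(const std::vector<uint8_t> &input,
                                        int width, int height, uint8_t offset) {
    assert(width > 0 && height > 0);
    assert(input.size() == static_cast<std::size_t>(width) * height);
    const int kVec = 16;  // vector width used in the schedule
    std::vector<uint8_t> output(input.size());

    // parallel(y): in Halide, iterations of this loop may run on different threads.
    for (int y = 0; y < height; y++) {
        // vectorize(x, 16): walk x in chunks of 16 lanes.
        for (int xo = 0; xo < width; xo += kVec) {
            // Shift the last chunk inward so it stays in bounds; it recomputes a
            // few pixels, which is harmless because the computation is pure.
            // Rows narrower than one vector just process the pixels that exist.
            int x_base = (width >= kVec) ? std::min(xo, width - kVec) : 0;
            int lanes = std::min(kVec, width);
            // Each iteration of this loop corresponds to one SIMD lane.
            for (int xi = 0; xi < lanes; xi++) {
                int x = x_base + xi;
                // The sum is truncated back to 8 bits, i.e. it wraps modulo 256.
                output[y * width + x] =
                    static_cast<uint8_t>(input[y * width + x] + offset);
            }
        }
    }
    return output;
}
```

Calling `brighter_reference(pixels, 640, 480, 5)` should give the same values the AOT-compiled `brighter` is expected to produce, which can serve as a cross-check.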
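// Part 2
// Before reading this part, see lesson_10_aot_compilation_generate.cpp
// This is the code that actually uses the Halide pipeline we compiled. It does not depend on libHalide, so we won't be including Halide.h. Instead, it depends on the header file that lesson_10_generate produced when we ran it:
#include "lesson_10_halide.h"
// We want to continue to use Halide::Buffer with the AOT-compiled code, so we explicitly include it. It's a header-only class and doesn't require libHalide.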
#include "HalideBuffer.h"
#include <stdio.h>
int main(int argc, char **argv) {
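    // Have a look in the header file above (it won't exist until you've run lesson_10_generate). At the bottom is the signature of the function we generated:
    // int brighter(halide_buffer_t *_input_buffer, uint8_t _offset, halide_buffer_t *_brighter_buffer);
    // The ImageParam input (and the output) have become pointers to "halide_buffer_t" structs. This is the struct Halide uses to represent arrays of data. Unless you're calling the Halide pipeline from pure C code, you don't want to use it directly. Halide::Runtime::Buffer is a simple wrapper around halide_buffer_t that implicitly converts to a halide_buffer_t *. We will pass Halide::Runtime::Buffer objects in those slots.
    // The Halide::Buffer class we have been using in JIT code is in fact just a shared pointer to the Halide::Runtime::Buffer class. They share the same API.
    // Finally, the return value of "brighter" is an error code; 0 means success.
    // Create buffers for the input and output.
    Halide::Runtime::Buffer<uint8_t> input(640, 480), output(640, 480);
    // Halide::Runtime::Buffer also has constructors that wrap existing data instead of allocating new memory. Use these if you have your own image type that you want to use.
    int offset = 5;
    int error = brighter(input, offset, output);
    if (error) {
        printf("Halide returned an error: %d\n", error);
        return -1;
    }
    // Now let's check that the filter performed as advertised. It was supposed to add the offset to every input pixel.
    for (int y = 0; y < 480; y++) {
        for (int x = 0; x < 640; x++) {
            uint8_t input_val = input(x, y);
            uint8_t output_val = output(x, y);
            uint8_t correct_val = input_val + offset;
            if (output_val != correct_val) {
                printf("output(%d, %d) was %d instead of %d\n",
                       x, y, output_val, correct_val);
                return -1;
            }
        }
    }
    // Everything worked!
    printf("Success!\n");
    return 0;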
}
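The run code passes `Halide::Runtime::Buffer` objects into slots typed `halide_buffer_t *`. As a rough mental model only, the sketch below uses a hypothetical, heavily simplified descriptor, `SimpleBuffer2D` (not the real `halide_buffer_t`, which carries more fields), loosely modeled on the per-dimension min, extent and stride that Halide's buffer descriptor records. Pixel (x, y) lives at `host + (x - min[0]) * stride[0] + (y - min[1]) * stride[1]`. Because the descriptor only stores a pointer, it can wrap memory owned by some other image type without copying, which is the idea behind the wrapping constructors mentioned above. `wrap_rows` and `at` are illustrative names.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, simplified 2-D buffer descriptor. It does not own memory;
// it only describes where each pixel lives.
struct SimpleBuffer2D {
    uint8_t *host;   // points at pixel (min[0], min[1])
    int min[2];      // coordinate of the first pixel in each dimension
    int extent[2];   // number of pixels in each dimension
    int stride[2];   // distance in elements between neighbours in each dimension

    uint8_t &at(int x, int y) const {
        assert(x >= min[0] && x < min[0] + extent[0]);
        assert(y >= min[1] && y < min[1] + extent[1]);
        return host[(x - min[0]) * stride[0] + (y - min[1]) * stride[1]];
    }
};

// Wrap an existing row-major image whose rows start row_pitch elements apart
// (row_pitch >= width, e.g. when rows are padded). No pixels are copied.
SimpleBuffer2D wrap_rows(uint8_t *data, int width, int height, int row_pitch) {
    assert(row_pitch >= width);
    return SimpleBuffer2D{data, {0, 0}, {width, height}, {1, row_pitch}};
}
```

Writing through the descriptor modifies the caller's own storage, so after `wrap_rows(storage.data(), 3, 3, 4).at(2, 1) = 42`, the value shows up at `storage[1 * 4 + 2]`.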
lesson_10_aot_compilation_generate.cpp
// Halide tutorial lesson 10: AOT compilation part 1
// This lesson demonstrates how to use Halide as a more traditional
// ahead-of-time (AOT) compiler.
// This lesson is split across two files. The first (this one), builds
// a Halide pipeline and compiles it to a static library and
// header. The second (lesson_10_aot_compilation_run.cpp), uses that
// static library to actually run the pipeline. This means that
// compiling this code is a multi-step process.
// On linux, you can compile and run it like so:
// g++ lesson_10*generate.cpp -g -std=c++11 -I ../include -L ../bin -lHalide -lpthread -ldl -o lesson_10_generate
// LD_LIBRARY_PATH=../bin ./lesson_10_generate
// g++ lesson_10*run.cpp lesson_10_halide.a -std=c++11 -I ../include -lpthread -ldl -o lesson_10_run
// ./lesson_10_run
// On OS X:
// g++ lesson_10*generate.cpp -g -std=c++11 -I ../include -L ../bin -lHalide -o lesson_10_generate
// DYLD_LIBRARY_PATH=../bin ./lesson_10_generate
// g++ lesson_10*run.cpp lesson_10_halide.a -o lesson_10_run -I ../include
// ./lesson_10_run
// The benefits of this approach are that the final program:
// - Doesn't do any jit compilation at runtime, so it's fast.
// - Doesn't depend on libHalide at all, so it's a small, easy-to-deploy binary.
// If you have the entire Halide source tree, you can also build it by
// running:
// make tutorial_lesson_10_aot_compilation_run
// in a shell with the current directory at the top of the halide
// source tree.
#include "Halide.h"
#include <stdio.h>
using namespace Halide;
int main(int argc, char **argv) {
    // We'll define a simple one-stage pipeline:
    Func brighter;
    Var x, y;
    // The pipeline will depend on one scalar parameter.
    Param<uint8_t> offset;
    // And take one grayscale 8-bit input buffer. The first
    // constructor argument gives the type of a pixel, and the second
    // specifies the number of dimensions (not the number of
    // channels!). For a grayscale image this is two; for a color
    // image it's three. Currently, four dimensions is the maximum for
    // inputs and outputs.
    ImageParam input(type_of<uint8_t>(), 2);
    // If we were jit-compiling, these would just be an int and a
    // Buffer, but because we want to compile the pipeline once and
    // have it work for any value of the parameter, we need to make a
    // Param object, which can be used like an Expr, and an ImageParam
    // object, which can be used like a Buffer.
    // Define the Func.
    brighter(x, y) = input(x, y) + offset;
    // Schedule it.
    brighter.vectorize(x, 16).parallel(y);
    // This time, instead of calling brighter.realize(...), which
    // would compile and run the pipeline immediately, we'll call a
    // method that compiles the pipeline to a static library and header.
    //
    // For AOT-compiled code, we need to explicitly declare the
    // arguments to the routine. This routine takes two. Arguments are
    // usually Params or ImageParams.
    brighter.compile_to_static_library("lesson_10_halide", {input, offset}, "brighter");
    printf("Halide pipeline compiled, but not yet run.\n");
    // To continue this lesson, look in the file lesson_10_aot_compilation_run.cpp
    return 0;
}
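To make the "dimensions, not channels" remark concrete: a 640x480 RGB image is a 3-D buffer indexed (x, y, c) whose third dimension has extent 3, while a grayscale image is 2-D. The sketch below computes dense strides with the first dimension innermost, which is the planar layout a `Halide::Runtime::Buffer<uint8_t>(640, 480, 3)` gets when allocated with those sizes. The helpers `dense_strides` and `flat_index` are hypothetical, written only for illustration, and assume every dimension starts at 0.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Dense strides with the first dimension innermost:
// stride[0] = 1 and stride[d] = stride[d-1] * extent[d-1].
std::vector<long> dense_strides(const std::vector<int> &extents) {
    std::vector<long> strides(extents.size());
    long s = 1;
    for (std::size_t d = 0; d < extents.size(); d++) {
        strides[d] = s;
        s *= extents[d];
    }
    return strides;
}

// Flat element index of a coordinate, assuming every dimension's min is 0.
long flat_index(const std::vector<int> &coord, const std::vector<long> &strides) {
    assert(coord.size() == strides.size());
    long idx = 0;
    for (std::size_t d = 0; d < coord.size(); d++) idx += coord[d] * strides[d];
    return idx;
}
```

With this layout, channel c of pixel (x, y) sits 640 * 480 elements away from channel c - 1, and the pipeline's input would be declared as `ImageParam(type_of<uint8_t>(), 3)`.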
lesson_10_aot_compilation_run.cpp
// Halide tutorial lesson 10: AOT compilation part 2
// Before reading this file, see lesson_10_aot_compilation_generate.cpp
// This is the code that actually uses the Halide pipeline we've
// compiled. It does not depend on libHalide, so we won't be including
// Halide.h.
//
// Instead, it depends on the header file that lesson_10_generate
// produced when we ran it:
#include "lesson_10_halide.h"
// We want to continue to use our Halide::Buffer with AOT-compiled
// code, so we explicitly include it. It's a header-only class, and
// doesn't require libHalide.
#include "HalideBuffer.h"
#include <stdio.h>
int main(int argc, char **argv) {
    // Have a look in the header file above (it won't exist until you've run
    // lesson_10_generate). At the bottom is the signature of the function we generated:
    // int brighter(halide_buffer_t *_input_buffer, uint8_t _offset, halide_buffer_t *_brighter_buffer);
    // The ImageParam inputs have become pointers to "halide_buffer_t"
    // structs. This is the struct that Halide uses to represent arrays of
    // data. Unless you're calling the Halide pipeline from pure C
    // code, you don't want to use it
    // directly. Halide::Runtime::Buffer is a simple wrapper around
    // halide_buffer_t that will implicitly convert to a
    // halide_buffer_t *. We will pass Halide::Runtime::Buffer objects
    // in those slots.
    // The Halide::Buffer class we have been using in JIT code is in
    // fact just a shared pointer to the simpler
    // Halide::Runtime::Buffer class. They share the same API.
    // Finally, the return value of "brighter" is an error code. It's
    // zero on success.
    // Let's make a buffer for our input and output.
    Halide::Runtime::Buffer<uint8_t> input(640, 480), output(640, 480);
    // Halide::Runtime::Buffer also has constructors that wrap
    // existing data instead of allocating new memory. Use these if
    // you have your own Image type that you want to use.
    int offset = 5;
    int error = brighter(input, offset, output);
    if (error) {
        printf("Halide returned an error: %d\n", error);
        return -1;
    }
    // Now let's check the filter performed as advertised. It was
    // supposed to add the offset to every input pixel.
    for (int y = 0; y < 480; y++) {
        for (int x = 0; x < 640; x++) {
            uint8_t input_val = input(x, y);
            uint8_t output_val = output(x, y);
            uint8_t correct_val = input_val + offset;
            if (output_val != correct_val) {
                printf("output(%d, %d) was %d instead of %d\n",
                       x, y, output_val, correct_val);
                return -1;
            }
        }
    }
    // Everything worked!
    printf("Success!\n");
    return 0;
}
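The run program does not fill `input` with known values before calling `brighter`, so the check runs on whatever data the buffer happens to hold. A more reproducible variant fills the input with a known pattern (including values near 255) and then counts mismatches. The sketch below assumes plain row-major `std::vector` storage instead of `Halide::Runtime::Buffer`, and `fill_pattern` and `count_mismatches` are hypothetical helpers. Converting `input_val + offset` back to `uint8_t`, as the tutorial's loop also does, reproduces the modulo-256 wraparound of 8-bit addition, so a pixel of 253 with an offset of 5 is expected to become 2.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: fill a row-major width x height image with a
// deterministic pattern that also exercises values near 255.
void fill_pattern(std::vector<uint8_t> &img, int width, int height) {
    assert(img.size() == static_cast<std::size_t>(width) * height);
    for (int y = 0; y < height; y++)
        for (int x = 0; x < width; x++)
            img[y * width + x] = static_cast<uint8_t>(x * 3 + y * 7);
}

// Count pixels where output != (uint8_t)(input + offset). Truncating the
// sum to uint8_t gives the same modulo-256 wraparound as 8-bit addition.
int count_mismatches(const std::vector<uint8_t> &input,
                     const std::vector<uint8_t> &output, int offset) {
    assert(input.size() == output.size());
    int bad = 0;
    for (std::size_t i = 0; i < input.size(); i++) {
        uint8_t correct_val = static_cast<uint8_t>(input[i] + offset);
        if (output[i] != correct_val) bad++;
    }
    return bad;
}
```

With real Halide buffers, the same pattern could be written with `input(x, y) = ...` before the call to `brighter(input, offset, output)`.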