LLVM学习入门(4):添加 JIT 和 Optimizer 支持

会做饭的好同学

已于 2022-10-10 18:10:40 修改

阅读量987

点赞数 1

分类专栏： LLVM 文章标签： LLVM

于 2022-10-08 13:00:31 首次发布

本文链接：https://blog.csdn.net/m0_43400575/article/details/127206715

版权

LLVM 专栏收录该内容

8 篇文章 15 订阅

订阅专栏

添加 JIT 和 Optimizer 支持

本文是使用 LLVM 开发新语言 Kaleidoscope 教程第四篇文章。前三篇介绍了一种简单语言的实现并增加了对生成 LLVM IR 的支持。本篇将介绍两种新的技术：为我们的语言添加优化器支持，以及添加 JIT compiler 支持。这些功能将会让 Kaleidoscope 更加高效。

JIT：Just-In-Time optimization.

4.1 Trivial Constant Folding 简单常量折叠

我们在第三篇中的演示中易于扩展，但是不会产生出色的代码。IRBuilder 在编译简单代码时确实为我们提供了明显的优化，例如：

ready> def test(x) 1+2+x;
Read function definition:
define double @test(double %x) {
entry:
    	// 直接用 3 加了
        %addtmp = fadd double 3.000000e+00, %x
        ret double %addtmp
}

此代码不是通过解析输入而构建的 AST 的形式。那将是：

ready> def test(x) 1+2+x;
Read function definition:
define double @test(double %x) {
entry:
        %addtmp = fadd double 2.000000e+00, 1.000000e+00
        %addtmp1 = fadd double %addtmp, %x
        ret double %addtmp1
}

如上所示，常量折叠是非常常见且非常重要的优化，许多语言 builder 在其 AST 表示都支持常量折叠。

使用 LLVM，我们在 AST 中不需要此支持。由于所有构建 LLVM IR 的调用都要经过 LLVM IR 构建器,因此构建器本身会检查调用时是否存在持续的折叠机会。如果是这样，它只执行常量折叠并返回常量，而不创建指令。

实际上这很容易，我们建议 IRBuilder 在生成此类代码时始终使用。它没有使用 “syntactic overhead” (我们不必在各处进行常量检查编译器)，并且在某些情况下（特别是对于具有宏预处理或宏的语言而言）可以明显的减少 LLVM IR 的数量，避免使用很多常量。（折叠起来自然减少很多常量）

另一方面，IRBuilder 它受一下实际的限制：它在生成代码时会与代码内联地进行所有分析。如果我们使用一个稍微复杂点的示例：

ready> def test(x) (1+2+x)*(x+(1+2));
ready> Read function definition:
define double @test(double %x) {
entry:
        %addtmp = fadd double 3.000000e+00, %x
        %addtmp1 = fadd double %x, 3.000000e+00
        %multmp = fmul double %addtmp, %addtmp1
        ret double %multmp
}

在这种情况下，乘法的 LHS 和 RHS 是相同的值。我们希望看到它生成 tmp = x + 3;result = tmp * tmp;,而不是计算两次 x + 3。

然而，没有任何本地分析方法能够检测和纠正上述问题。这种通常需要两种转换：表达式的重新关联(以使添加在词法上相同)和通用子表达式消除(CSE) （删除冗余添加指令）。幸运的是， LLVM 以 Passes 的形式提供了可以使用的多种优化。

4.2 LLVM Optimization Passes

Warning

Due to the transition to the new PassManager infrastructure this tutorial is based on llvm::legacy::FunctionPassManager which can be found in LegacyPassManager.h. For the purpose of the this tutorial the above should be used until the pass manager transition is complete.

LLVM 提供了许多优化 passer，这些 passes 可以完成许多不同的事情并且具有不同的权衡。与其他系统不同，LLVM 不会错误地认为一组优化适用于所有语言和所有情况。LLVM 允许 compiler implementor 对要使用的优化，以何种顺序以及在哪种情况做出完整的决策。

作为一个具体的示例，LLVM 支持两个“整个模块”遍历，它们遍历了尽可能多的代码体（通常是整个文件，但是如果在链接时运行，则这可能是整个程序的重要部分）。它还支持并包括 pre-function 遍历，这些遍历仅一次在一项功能上运行，则无需查看其他功能。对于更多关于 passes 的信息以及它们怎么运行，请参考 How to Write a Pass 和 List of LLVM Passes .

对于 Kaleidoscope，我们目前正在动态生成函数代码，每次用户输入一次生成一个函数代码。我们并没有在这种设置中寻求最终的优化经验，但是我们也想在其中找到简单快捷的功能。这样，我们将在用户键入函数时选择运行一些针对每个函数的优化。如果我们创建“静态 Kaleidoscope 编译器”，我们将完全使用现在的代码，只是推迟运行优化过程，直到解析了整个文件。

为了使每个功能都可以进行优化，我们需要设置 FunctionPassManager 来保存和组织要运行的 LLVM 优化。一旦有了这些，就可以添加一组优化来运行。对于每个要优化的模块，我们都需要一个新的 FunctionPassManager ，因此我们将编写一个函数来创建和初始化模块，以及为我们设置过程管理器：

void InitializeModuleAndPassManager(void) {
  // Open a new module.
  TheModule = std::make_unique<Module>("my cool jit", TheContext);

  // Create a new pass manager attached to it.
  TheFPM = std::make_unique<FunctionPassManager>(TheModule.get());

  // Do simple "peephole" optimizations and bit-twiddling optzns.
  TheFPM->add(createInstructionCombiningPass());
  // Reassociate expressions.
  TheFPM->add(createReassociatePass());
  // Eliminate Common SubExpressions.
  TheFPM->add(createGVNPass());
  // Simplify the control flow graph (deleting unreachable blocks, etc).
  TheFPM->add(createCFGSimplificationPass());

  TheFPM->doInitialization();
}

该代码初始化全局模块 TheModule 和 TheFPM 附加到函数传递器 TheModule 。设置通道管理器后，我们将使用一系列 “add” 调用来添加一堆 LLVM 通道。

在这种情况下，我们选择添加四个优化 Passes。我们在这里选择的 pass 是一组相当标准的 “cleanup” 优化，可用于各种代码。我们暂时不研究他们的工作原理。

设置 PassManger 后,我们需要使用它。我们通过在构造新创建的函数之后（在 FunctionAST::codegen() 中）但在将其返回给客户端之前运行它来执行此操作：

if (Value *RetVal = Body->codegen()) {
  // Finish off the function.
  // 创建 LLVM ret instruction
  Builder.CreateRet(RetVal);

  // Validate the generated code, checking for consistency.
  // 对生成的代码一致性进行检查，捕获很多 error
  verifyFunction(*TheFunction);

  // Optimize the function.
  // 新加的优化函数
  TheFPM->run(*TheFunction);

  return TheFunction;
}

如上述代码，通过 FunctionPassManger 优化和更新，以替代 LLVM Function* ，有可能提高了它的主体性能。有了这个，我们可以尝试上面的测试：

ready> def test(x) (1+2+x)*(x+(1+2));
ready> Read function definition:
define double @test(double %x) {
entry:
        %addtmp = fadd double %x, 3.000000e+00
        %multmp = fmul double %addtmp, %addtmp
        ret double %multmp
}

LLVM 提供了多种可在某些情况下使用的优化。有一些 Pass 的文档介绍。另一个好的想法来源可以查看 clang 开始运行的过程。opt 工具可让我们从命令行尝试使用 passes ，因此可以查看它们是否有任何作用。现在我们的前端已经有了合理的代码，接下来看我们如何执行它！

4.3 Adding a JIT Compiler 添加一个JIT编译器

LLVM IR 中可用的代码可以应用在多种工具上。例如，我们可以对其进行优化，例如4.2中的优化例子；可以将文本或者二进制转储；可以将代码编译为某个目标的汇编文件（.s）；也可以使用 JIT 编译它。LLVM IR 表示的一个好处就是在很多编译器的部分是通用的。

在这个部分我们将为解释器添加 JIT 编译器的支持。我们希望 Kaleidoscope 的基本思想是让用户像现在一样输入函数体，但是立即求值他们键入的顶级表达式。例如，如果他们输入 1 + 2; ,我们应该求值并打印出 3 。如果他们定义了一个函数，他们应该能够从命令行中调用它。

为此，我们首先准备环境来为当前本机目标创建代码，然后声明并初始化 JIT。这是通过调用一些 InitializeNativeTarget\* 函数并添加一个全局变量 TheJIT 并在 main 以下位置将其初始化来完成的：

static std::unique_ptr<KaleidoscopeJIT> TheJIT;
...
int main() {
  // 初始化工作
  InitializeNativeTarget();
  InitializeNativeTargetAsmPrinter();
  InitializeNativeTargetAsmParser();

  // Install standard binary operators.
  // 1 is lowest precedence.
  BinopPrecedence['<'] = 10;
  BinopPrecedence['+'] = 20;
  BinopPrecedence['-'] = 20;
  BinopPrecedence['*'] = 40; // highest.

  // Prime the first token.
  fprintf(stderr, "ready> ");
  getNextToken();

  TheJIT = std::make_unique<KaleidoscopeJIT>();

  // Run the main "interpreter loop" now.
  MainLoop();

  return 0;
}

我们还需要为 JIT 设置数据布局：

void InitializeModuleAndPassManager(void) {
  // Open a new module.
  TheModule = std::make_unique<Module>("my cool jit", TheContext);
  TheModule->setDataLayout(TheJIT->getTargetMachine().createDataLayout());

  // Create a new pass manager attached to it.
  TheFPM = std::make_unique<FunctionPassManager>(TheModule.get());
  ...

Kaleidoscope JIT类是专门为这些教程构建的简单 JIT，可在llvm-src/examples/Kaleidoscope/include/KaleidoscopeJIT.h 的 LLVM 源代码中找到。在后面的章节中，我们将研究它的工作原理并使用新功能对其进行扩展，但是现在，我们将按照给出的内容进行操作。它的 API 非常简单: addModule 在 JIT 中添加 LLVM IR 模块，使其功能可以执行；removeModule 卸下一个模块，释放与该模块中的代码关联的所有内存；findSymbol 允许我们查找指向已编译代码的指针。

我们可以这个简单的 API 并更改解析顶级表达式的代码，如下所示：

static void HandleTopLevelExpression() {
  // Evaluate a top-level expression into an anonymous function.
  // 解析
  if (auto FnAST = ParseTopLevelExpr()) {
    // 代码生成
    if (FnAST->codegen()) {

      // JIT the module containing the anonymous expression, keeping a handle so
      // we can free it later.
      auto H = TheJIT->addModule(std::move(TheModule));
      InitializeModuleAndPassManager();
	
      // 获取最终代码生成的指针
      // Search the JIT for the __anon_expr symbol.
      auto ExprSymbol = TheJIT->findSymbol("__anon_expr");
      assert(ExprSymbol && "Function not found");

      // Get the symbol's address and cast it to the right type (takes no
      // arguments, returns a double) so we can call it as a native function.
      double (*FP)() = (double (*)())(intptr_t)ExprSymbol.getAddress();
      fprintf(stderr, "Evaluated to %f\n", FP());

      // Delete the anonymous expression module from the JIT.
      // 删除模块
      TheJIT->removeModule(H);
    }

如果解析和代码生成成功，则下一步是将包含顶级表达式的模块添加到 JIT。我们通过调用 addModule 来做到这一点，后者会触发模块中所有功能的代码生成，并返回一个句柄，该句柄可用于稍后从 JIT 中删除该模块。将模块添加到 JIT 后，就无法再对其进行修改，因此我们还通过调用来打开一个新模块来保存后续代码 InitializeModuleAndPassManager() 。

将模块添加到 JIT 之后，我们需要获取指向最终生成的代码的指针。为此，我们调用 JIT 的 findSymbol 方法，并传递顶级表达式函数的名称：__anon_expr 。由于我们只是添加了此函数，因此我们assert findSymbol 返回了一个结果。

接下来，我们 __anon_expr 通过调用 getAddress() 符号获得函数的内存地址。回想一下，我们将顶级表达式编译成一个自包含的 LLVM 函数，该函数不带任何参数并返回计算出 double。因为 LLVM JIT 编译器与本机平台 ABI 相匹配，所以这意味着您可以将结果指针转换为该类型的函数指针并直接调用它。也就是说，JIT 编译代码和静态链接到您的应用程序的本机代码之间没有区别。

最后，由于我们不支持对顶级表达式进行重新求值，因此在完成释放关联内存的操作后，我们将从 JIT 中删除该模块。但是，我们之前（通过 InitializeModuleAndPassManager）创建的模块仍处于打开状态，正在等待添加新的代码。

仅通过这两个更改，来看一下 Kaleidoscope 现在是如何工作的：

ready> 4+5;
Read top-level expression:
define double @0() {
entry:
  ret double 9.000000e+00
}

Evaluated to 9.000000

以上输出看起来似乎基本正常。该函数的 dump 显示了 “no argument function that always returns double”，我们为键入的每个顶级表达式合成了该函数。这演示了非常基本的功能，我们还可以做更多的事情：

ready> def testfunc(x y) x + y*2;
Read function definition:
define double @testfunc(double %x, double %y) {
entry:
  %multmp = fmul double %y, 2.000000e+00
  %addtmp = fadd double %multmp, %x
  ret double %addtmp
}

ready> testfunc(4, 10);
Read top-level expression:
define double @1() {
entry:
  %calltmp = call double @testfunc(double 4.000000e+00, double 1.000000e+01)
  ret double %calltmp
}

Evaluated to 24.000000

ready> testfunc(5, 10);
ready> LLVM ERROR: Program used external function 'testfunc' which could not be resolved!

函数定义和调用也可以，但是最后一行出了一点问题。该调用看起来有效，那么发生了什么？你可能已经从 API 中猜到了，模块是 JIT 的分配单位，而 testfunc 是包含匿名表达式的同一模块的一部分。当我们从 JIT 中删除该模块以释放匿名表达式的内存时，我们删除了 testfunc 它的定义。然后，当我们尝试第二次调用 testfunc 时， JIT 无法找到它。

解决此问题的最简单方法是将匿名表达式与其余函数定义放在单独的模块中。只要被调用的每个函数都有一个原型，并且在调用之前将其添加到 JIT 中，JIT 就会愉快的解决跨模块边界的函数调用。通过将匿名表达式放在其他模块中，我们可以删除它而不会影响其余功能。

实际上，我们将更进一步，将每个函数放入其自己的模块中。这样做使我们能够利用 KaleidoscopeJIT 的有用属性，这将使我们的环境更像 REPL：函数可以多次添加到 JIT（与每个模块都必须具有唯一定义的模块不同）。在 KaleidoscopeJIT 中查找符号时，它将始终返回最新的定义：

ready> def foo(x) x + 1;
Read function definition:
define double @foo(double %x) {
entry:
  %addtmp = fadd double %x, 1.000000e+00
  ret double %addtmp
}

ready> foo(2);
Evaluated to 3.000000

ready> def foo(x) x + 2;
define double @foo(double %x) {
entry:
  %addtmp = fadd double %x, 2.000000e+00
  ret double %addtmp
}

ready> foo(2);
Evaluated to 4.000000

为了使每个函数都可以驻留在自己的模块中，我们需要一种方法来将先前的函数声明重新生成到我们打开的每个新模块中：

static std::unique_ptr<KaleidoscopeJIT> TheJIT;

...

Function *getFunction(std::string Name) {
  // First, see if the function has already been added to the current module.
  if (auto *F = TheModule->getFunction(Name))
    return F;

  // If not, check whether we can codegen the declaration from some existing
  // prototype.
  auto FI = FunctionProtos.find(Name);
  if (FI != FunctionProtos.end())
    return FI->second->codegen();

  // If no existing prototype exists, return null.
  return nullptr;
}

...

Value *CallExprAST::codegen() {
  // Look up the name in the global module table.
  Function *CalleeF = getFunction(Callee);

...

Function *FunctionAST::codegen() {
  // Transfer ownership of the prototype to the FunctionProtos map, but keep a
  // reference to it for use below.
  auto &P = *Proto;
  // 先更新 FunctionProtos映射
  FunctionProtos[Proto->getName()] = std::move(Proto);
  Function *TheFunction = getFunction(P.getName());
  if (!TheFunction)
    return nullptr;

为此，我们将首先添加一个新的全局 FunctionProtos ,其中包含每个函数的最新原型。我们还将添加一种便捷方法 getFunction() ,以替换调用 TheModule->getFunction() 。我们的便捷方法搜索 TheModule 现有的函数声明，如果找不到，则回退到从 FunctionProtos 生成新的声明。在这里，CallExprAST::codegen() 我们只需要替换调用即可 TheModule->getFunction()。在这种情况下，FunctionAST::codegen() 我们需要先更新 FunctionProtos映射 ，然后调用 getFunction()。完成此操作后，我们始终可以在当前模块中为任何先前声明的函数获取函数声明。

另外，我们还需要更新 HandleDefinition 和 HandleExtern :

static void HandleDefinition() {
  // 添加新定义的函数传输到 JIT，并打开一个新模块
  if (auto FnAST = ParseDefinition()) {
    if (auto *FnIR = FnAST->codegen()) {
      fprintf(stderr, "Read function definition:");
      FnIR->print(errs());
      fprintf(stderr, "\n");
      TheJIT->addModule(std::move(TheModule));
      InitializeModuleAndPassManager();
    }
  } else {
    // Skip token for error recovery.
     getNextToken();
  }
}

static void HandleExtern() {
  if (auto ProtoAST = ParseExtern()) {
    if (auto *FnIR = ProtoAST->codegen()) {
      fprintf(stderr, "Read extern: ");
      FnIR->print(errs());
      fprintf(stderr, "\n");
      FunctionProtos[ProtoAST->getName()] = std::move(ProtoAST);
    }
  } else {
    // Skip token for error recovery.
    getNextToken();
  }
}

在 HandleDefinition 中，我们添加了两行以将新定义的函数传输到 JIT 并打开一个新模块。在 HandleDefinition 中，我们只需要添加一行即可将原型添加到 FunctionProtos。

Warning

Duplication of symbols in separate modules is not allowed since LLVM-9. That means you can not redefine function in your Kaleidoscope as its shown below. Just skip this part.

The reason is that the newer OrcV2 JIT APIs are trying to stay very close to the static and dynamic linker rules, including rejecting duplicate symbols. Requiring symbol names to be unique allows us to support concurrent compilation for symbols using the (unique) symbol names as keys for tracking.

完成这些更改后，让我再次尝试 REPL（这次删除了匿名函数的 dump）：

ready> def foo(x) x + 1;
ready> foo(2);
Evaluated to 3.000000

ready> def foo(x) x + 2;
ready> foo(2);
Evaluated to 4.000000

成功！

即使使用此简单的代码，我们也获得了一些令人惊讶的强大功能，如以下内容：

ready> extern sin(x);
Read extern:
declare double @sin(double)

ready> extern cos(x);
Read extern:
declare double @cos(double)

ready> sin(1.0);
Read top-level expression:
define double @2() {
entry:
  ret double 0x3FEAED548F090CEE
}

Evaluated to 0.841471

ready> def foo(x) sin(x)*sin(x) + cos(x)*cos(x);
Read function definition:
define double @foo(double %x) {
entry:
  %calltmp = call double @sin(double %x)
  %multmp = fmul double %calltmp, %calltmp
  %calltmp2 = call double @cos(double %x)
  %multmp4 = fmul double %calltmp2, %calltmp2
  %addtmp = fadd double %multmp, %multmp4
  ret double %addtmp
}

ready> foo(4.0);
Read top-level expression:
define double @3() {
entry:
  %calltmp = call double @foo(double 4.000000e+00)
  ret double %calltmp
}

Evaluated to 1.000000

JIT 是如何知道 sin 和 cos 函数的？答案非常简单：KaleidoscopeJIT 具有简单的符号解析规则，可用于查找任何给定模块中不可用的符号：首先，它搜索已添加到 JIT 的所有模块，从最新版本到最旧的，以找到最新的定义。如果 JIT 中未找到定义，则它会回退到 dlsym("sin") 在 Kaleidoscope 进程本身上调用。由于 sin 是在 JIT 的地址空间中定义的，因此它仅修补模块中的调用以直接调用 libm 版本的 sin。但是在某些情况下，这甚至会更进一步：由于 sin 和 cos 是标准数学函数的名称，因此，使用 sin(1.0) 时，它会直接返回函数正确结果。

将来，我们将看到如何调整此符号解析规则来启用各种有用的功能，从安全性（限制可用于 JIT 代码的符号集）到基于符号名称的动态代码生成，以及甚至懒惰的编译。

符号解析规则的直接好处是我们现在可以通过编写任意 C++ 代码来实现操作来扩展语言。例如，如果我们添加：

#ifdef _WIN32
#define DLLEXPORT __declspec(dllexport)
#else
#define DLLEXPORT
#endif

/// putchard - putchar that takes a double and returns 0.
extern "C" DLLEXPORT double putchard(double X) {
  fputc((char)X, stderr);
  return 0;
}