LLVM Reading Notes

HizT_1999

Category: Essays; Tags: compilers

Article link: https://blog.csdn.net/HizT_1999/article/details/106952387

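This article describes the components of the LLVM compiler framework: an introduction to LLVM, the lexer, the parser and abstract syntax tree (AST), semantic analysis, the intermediate representation (IR), and the backend, including instruction selection, instruction scheduling, register allocation, and code emission. Thanks to its modular design, LLVM supports many programming languages and hardware platforms, which simplifies compiler development and optimization.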

1. Introduction to LLVM

Traditional compiler: frontend -> optimizer -> backend.
LLVM: multiple frontends -> one unified intermediate code -> multiple backends targeting different machines

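· All frontends and backends share one intermediate code, the LLVM Intermediate Representation (LLVM IR).
· To support a new programming language, only a new frontend has to be implemented.
· To support a new hardware device, only a new backend has to be implemented.
· The optimization stage is generic: it works on the common LLVM IR, so neither a new language nor a new hardware target requires any change to it.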
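The decoupling described in the list above can be sketched as a toy model in C++. This is only an illustration under stated assumptions, not LLVM's actual API: the IR, the two "languages", the two targets, and every name (Inst, Frontend, Backend, optimize, compile) are hypothetical. Each frontend lowers source text to one shared IR, a single optimizer runs on that IR, and each backend emits code from it, so adding a language or a target means adding exactly one class.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Toy stand-in for a shared IR (NOT LLVM IR): accumulator instructions.
struct Inst {
  enum Op { Load, Add } op;
  long value;
};
using IR = std::vector<Inst>;

// One frontend per source language: source text -> shared IR.
struct Frontend {
  virtual ~Frontend() = default;
  virtual IR lower(const std::string &src) const = 0;
};

// One backend per target machine: shared IR -> target code.
struct Backend {
  virtual ~Backend() = default;
  virtual std::string emit(const IR &ir) const = 0;
};

// Hypothetical language A: "1 2 3" means 1 + 2 + 3.
struct SpaceSumFrontend : Frontend {
  IR lower(const std::string &src) const override {
    std::istringstream in(src);
    IR ir;
    long v;
    while (in >> v)
      ir.push_back({ir.empty() ? Inst::Load : Inst::Add, v});
    return ir;
  }
};

// Hypothetical language B: the same sums written as "1+2+3".
struct PlusSumFrontend : Frontend {
  IR lower(const std::string &src) const override {
    std::istringstream in(src);
    IR ir;
    std::string part;
    while (std::getline(in, part, '+'))
      ir.push_back({ir.empty() ? Inst::Load : Inst::Add, std::stol(part)});
    return ir;
  }
};

// Shared optimizer: it sees only IR, so every frontend/backend pair reuses it.
// Constant folding: all operands here are constants, so the whole program
// folds into a single Load.
IR optimize(const IR &ir) {
  if (ir.empty())
    return ir;
  long acc = 0;
  for (const Inst &i : ir)
    acc = (i.op == Inst::Load) ? i.value : acc + i.value;
  return {{Inst::Load, acc}};
}

// Hypothetical target 1, x86-like syntax.
struct X86LikeBackend : Backend {
  std::string emit(const IR &ir) const override {
    std::string out;
    for (const Inst &i : ir)
      out += std::string(i.op == Inst::Load ? "mov eax, " : "add eax, ") +
             std::to_string(i.value) + "\n";
    return out;
  }
};

// Hypothetical target 2, ARM-like syntax.
struct ArmLikeBackend : Backend {
  std::string emit(const IR &ir) const override {
    std::string out;
    for (const Inst &i : ir)
      out += std::string(i.op == Inst::Load ? "mov r0, #" : "add r0, r0, #") +
             std::to_string(i.value) + "\n";
    return out;
  }
};

// Any frontend can be paired with any backend through the shared IR.
std::string compile(const Frontend &fe, const Backend &be,
                    const std::string &src) {
  return be.emit(optimize(fe.lower(src)));
}
```

For example, compile(PlusSumFrontend{}, ArmLikeBackend{}, "1+2+3") returns "mov r0, #6\n". With M frontends and N backends this design needs M + N components instead of M × N separate compilers, and optimize() is written once for all of them.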

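LLVM in the broad sense: the entire LLVM architecture;
LLVM in the narrow sense: the LLVM "backend", i.e. everything after the frontend (code optimization, target code generation, etc.)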
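In the LLVM code referenced in this report, Clang serves as the frontend of the LLVM framework, so the overall framework can be summarized as: Clang (frontend) -> LLVM IR -> LLVM optimizer -> LLVM backends.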

2. Lexer (Lexical Analyzer)

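Purpose: split the user's input string into tokens, which are then processed further.
LLVM (Clang) implementation: the first step of a compiler is lexical analysis. The lexer reads the stream of bytes that makes up the source program and groups them into a sequence of meaningful lexemes. For each lexeme it produces a token as output, and each identifier is looked up in an identifier table (Clang's IdentifierTable), where its spelling is stored only once. The Lex library consists of several closely related classes that handle lexing and the preprocessing of C source code.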
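To make the "bytes -> lexemes -> tokens" flow concrete, here is a minimal hand-written lexer for a toy C-like language. It is a sketch, not Clang's Lexer: the token kinds, the four-keyword list, and the names Tok, IdentTable and lex are hypothetical. As in Clang's Token class, each token records only a location and a length into the source buffer rather than a copy of its text, and identifier spellings are uniqued in a table so that repeated names share one entry.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <vector>

// Hypothetical token kinds for a toy C-like language (not Clang's tok:: kinds).
enum class TokKind { Identifier, Keyword, Number, Punct, Eof };

// A token stores where its lexeme is in the buffer, not a copy of the text.
struct Tok {
  TokKind kind;
  unsigned loc;     // byte offset of the lexeme in the source buffer
  unsigned length;  // number of bytes in the lexeme
  int identId;      // index into IdentTable for identifiers, otherwise -1
};

// Uniques identifier spellings: each distinct name is stored exactly once.
struct IdentTable {
  std::vector<std::string> names;
  std::unordered_map<std::string, int> index;

  int get(const std::string &name) {
    auto res = index.emplace(name, static_cast<int>(names.size()));
    if (res.second)  // first time this spelling is seen
      names.push_back(name);
    return res.first->second;
  }
};

// Groups the byte stream into lexemes and emits one token per lexeme.
std::vector<Tok> lex(const std::string &buf, IdentTable &idents) {
  static const std::unordered_set<std::string> keywords = {"if", "else", "int",
                                                           "return"};
  std::vector<Tok> toks;
  std::size_t i = 0;
  while (i < buf.size()) {
    unsigned char c = static_cast<unsigned char>(buf[i]);
    if (std::isspace(c)) {  // whitespace separates lexemes but yields no token
      ++i;
      continue;
    }
    std::size_t start = i;
    if (std::isalpha(c) || c == '_') {  // identifier or keyword
      while (i < buf.size() &&
             (std::isalnum(static_cast<unsigned char>(buf[i])) || buf[i] == '_'))
        ++i;
      std::string word = buf.substr(start, i - start);
      bool isKw = keywords.count(word) != 0;
      toks.push_back({isKw ? TokKind::Keyword : TokKind::Identifier,
                      static_cast<unsigned>(start),
                      static_cast<unsigned>(i - start),
                      isKw ? -1 : idents.get(word)});
    } else if (std::isdigit(c)) {  // decimal integer literal
      while (i < buf.size() && std::isdigit(static_cast<unsigned char>(buf[i])))
        ++i;
      toks.push_back({TokKind::Number, static_cast<unsigned>(start),
                      static_cast<unsigned>(i - start), -1});
    } else {  // any other byte is a one-character punctuator: ( ) ; = + ...
      ++i;
      toks.push_back({TokKind::Punct, static_cast<unsigned>(start), 1, -1});
    }
  }
  toks.push_back({TokKind::Eof, static_cast<unsigned>(buf.size()), 0, -1});
  return toks;
}
```

For the input "if (x) x = 42;" this yields nine tokens (if, (, x, ), x, =, 42, ; and an end-of-file token), and both occurrences of x point to the same identifier-table entry. A real C lexer also has to handle multi-character operators, comments, string literals and preprocessing, all of which this sketch omits.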
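1) Keyword definitions: these include familiar ones such as "if" and "include". The excerpt below lists preprocessor keywords, each declared with the PPKEYWORD macro in Clang's TokenKinds.def.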

PPKEYWORD(if)
PPKEYWORD(ifdef)
PPKEYWORD(ifndef)
PPKEYWORD(elif)
PPKEYWORD(else)
PPKEYWORD(endif)
PPKEYWORD(defined)
// C99 6.10.2 - Source File Inclusion.
PPKEYWORD(include)
PPKEYWORD(__include_macros)
// C99 6.10.3 - Macro Replacement.
PPKEYWORD(define)
PPKEYWORD(undef)
// C99 6.10.4 - Line Control.
PPKEYWORD(line)
// C99 6.10.5 - Error Directive.
PPKEYWORD(error)
// C99 6.10.6 - Pragma Directive.
PPKEYWORD(pragma)
// GNU Extensions.
PPKEYWORD(import)
PPKEYWORD(include_next)
PPKEYWORD(warning)
PPKEYWORD(ident)
PPKEYWORD(sccs)
PPKEYWORD(assert)
PPKEYWORD(unassert)
// Clang extensions
PPKEYWORD(__public_macro)
PPKEYWORD(__private_macro)
...
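The PPKEYWORD(...) lines above follow the X-macro pattern: the list is written once, and each place that includes it defines PPKEYWORD differently, so one list can generate several tables that always stay in sync. The sketch below shows the technique in self-contained form. Because a standalone example cannot include the real TokenKinds.def, it uses an inline list macro instead, and all names (MY_PP_KEYWORD_LIST, MAKE_ENUM, getPPKeywordKind, etc.) are illustrative. Clang's own headers expand the real list in a similar way, for instance into an enum of pp_-prefixed keyword kinds.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical inline list standing in for the PPKEYWORD(...) lines of
// TokenKinds.def (the real file would be #included instead).
#define MY_PP_KEYWORD_LIST(X) \
  X(if)                       \
  X(ifdef)                    \
  X(ifndef)                   \
  X(elif)                     \
  X(else)                     \
  X(endif)                    \
  X(include)                  \
  X(define)                   \
  X(pragma)

// Expansion 1: one enumerator per keyword (pp_if, pp_ifdef, ...).
enum PPKeywordKind {
  pp_not_keyword,
#define MAKE_ENUM(Name) pp_##Name,
  MY_PP_KEYWORD_LIST(MAKE_ENUM)
#undef MAKE_ENUM
  NUM_PP_KEYWORDS
};

// Expansion 2: the spellings, generated in exactly the same order.
static const char *const PPKeywordSpellings[] = {
    nullptr,  // pp_not_keyword has no spelling
#define MAKE_STRING(Name) #Name,
    MY_PP_KEYWORD_LIST(MAKE_STRING)
#undef MAKE_STRING
};

// Both tables come from one list, so their sizes must agree.
static_assert(sizeof(PPKeywordSpellings) / sizeof(PPKeywordSpellings[0]) ==
                  NUM_PP_KEYWORDS,
              "enum and spelling table must have the same length");

// Maps a directive name (the word after '#') to its keyword kind.
PPKeywordKind getPPKeywordKind(const char *Name) {
  for (int K = 1; K < NUM_PP_KEYWORDS; ++K)
    if (std::strcmp(PPKeywordSpellings[K], Name) == 0)
      return static_cast<PPKeywordKind>(K);
  return pp_not_keyword;
}
```

For example, getPPKeywordKind("pragma") returns pp_pragma. Adding a new directive only requires one more X(...) line; the enum and the spelling table both pick it up automatically.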

2) Part of the definition of the Token class (clang/Lex/Token.h)

class Token {
  /// The location of the token. This is actually a SourceLocation.
  unsigned Loc;

  // Conceptually these next two fields could be in a union.  However, this
  // causes gcc 4.2 to pessimize LexTokenInternal, a very performance critical
  // routine. Keeping as separate members with casts until a more beautiful fix
  // presents itself.

  /// UintData - This holds either the length of the token text, when
  /// a normal token, or the end of the SourceRange when an annotation
  /// token.
  unsigned UintData;

  /// PtrData - This is a union of four different pointer types, which depends
  /// on what type of token this is:
  ///  Identifiers, keywords, etc:
  ///    This is an IdentifierInfo*, which contains the uniqued identifier
  ///    spelling.
  ///  Literals:  isLiteral() returns true.
  ///    This is a pointer to the start of the token in a text buffer, which
  ///