Source paths
llvm\include\llvm\IR\Instruction.h
llvm\include\llvm\IR\Instruction.def
llvm\include\llvm\IR\Instructions.h
llvm\include\llvm\IR\InstrTypes.h
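Instruction and BasicBlock
Before analyzing the code, it helps to describe how LLVM IR is organized. In LLVM, a Module contains any number of Functions, a Function contains any number of BasicBlocks, and a BasicBlock is a single-entry, single-exit sequence of Instructions.
Take the following code as an example: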
// test.c
int foo(int a) {
  int b, c, d, e;
  if (a == 3) {
    b = 3;
    c = 2;
    d = 3;
  } else {
    /* d is left unassigned on this path, as in the IR */
    b = 2;
    c = 3;
  }
  e = b + c;
  return d;
}
Running clang -emit-llvm -S test.c generates the IR file test.ll:
; Function Attrs: noinline nounwind optnone uwtable
define dso_local i32 @foo(i32 %a) #0 {
entry:
  %a.addr = alloca i32, align 4
  %b = alloca i32, align 4
  %c = alloca i32, align 4
  %d = alloca i32, align 4
  %e = alloca i32, align 4
  store i32 %a, i32* %a.addr, align 4
  %0 = load i32, i32* %a.addr, align 4
  %cmp = icmp eq i32 %0, 3
  br i1 %cmp, label %if.then, label %if.else

if.then:                                          ; preds = %entry
  store i32 3, i32* %b, align 4
  store i32 2, i32* %c, align 4
  store i32 3, i32* %d, align 4
  br label %if.end

if.else:                                          ; preds = %entry
  store i32 2, i32* %b, align 4
  store i32 3, i32* %c, align 4
  br label %if.end

if.end:                                           ; preds = %if.else, %if.then
  %1 = load i32, i32* %b, align 4
  %2 = load i32, i32* %c, align 4
  %add = add nsw i32 %1, %2
  store i32 %add, i32* %e, align 4
  %3 = load i32, i32* %d, align 4
  ret i32 %3
}
As the output shows, function foo is split into four blocks (entry, if.then, if.else, if.end), and each of these blocks is a BasicBlock.
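The containment hierarchy described above (Module, Function, BasicBlock, Instruction) can be modelled in a few lines of standalone C++. This is only a sketch under stated assumptions: every type with the Toy prefix and the helpers makeBlock, makeFooModule and countInstructions are hypothetical and are not LLVM classes or APIs, and std::vector stands in for the intrusive lists that LLVM actually uses. Only the terminator of each block (plus the compare in entry) is recorded.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-ins for the LLVM classes; not the real API.
struct ToyInstruction {
  std::string Text;  // textual form, e.g. "ret i32 %3"
};

struct ToyBasicBlock {
  std::string Name;                   // label, e.g. "entry"
  std::vector<ToyInstruction> Insts;  // straight-line instruction sequence
};

struct ToyFunction {
  std::string Name;
  std::vector<ToyBasicBlock> Blocks;
};

struct ToyModule {
  std::vector<ToyFunction> Functions;
};

// Build one block from a label and the text of its instructions.
ToyBasicBlock makeBlock(const std::string &Name,
                        const std::vector<std::string> &Texts) {
  ToyBasicBlock BB;
  BB.Name = Name;
  for (const std::string &T : Texts)
    BB.Insts.push_back(ToyInstruction{T});
  return BB;
}

// Model @foo with the same four block labels as test.ll.
ToyModule makeFooModule() {
  ToyFunction Foo;
  Foo.Name = "foo";
  Foo.Blocks.push_back(makeBlock(
      "entry", {"%cmp = icmp eq i32 %0, 3",
                "br i1 %cmp, label %if.then, label %if.else"}));
  Foo.Blocks.push_back(makeBlock("if.then", {"br label %if.end"}));
  Foo.Blocks.push_back(makeBlock("if.else", {"br label %if.end"}));
  Foo.Blocks.push_back(makeBlock("if.end", {"ret i32 %3"}));

  ToyModule M;
  M.Functions.push_back(Foo);
  return M;
}

// Count every instruction by walking Module -> Function -> BasicBlock.
std::size_t countInstructions(const ToyModule &M) {
  std::size_t N = 0;
  for (const ToyFunction &F : M.Functions)
    for (const ToyBasicBlock &BB : F.Blocks)
      N += BB.Insts.size();
  return N;
}
```

Calling countInstructions(makeFooModule()) walks all three levels of nesting, which is the same shape of loop one would write against the real classes.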
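LLVM Instruction class
In LLVM, the Instruction class is the base class of all IR instructions. It is defined as follows:
class Instruction : public User,
                    public ilist_node_with_parent<Instruction, BasicBlock> {
  ...
};
That Instruction inherits from User is easy to understand: most instructions have operands (Operand), so an instruction is naturally a User.
The more interesting base is ilist_node_with_parent<Instruction, BasicBlock>. As the name suggests, it is a list node that knows its parent. It serves two main purposes: 1) from the current node (an Instruction), reach the parent (its BasicBlock); 2) from the current node (an Instruction), reach the other nodes (Instructions) on the same list. The code for both is shown here: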
template <typename NodeTy, typename ParentTy, class... Options>
class ilist_node_with_parent : public ilist_node<NodeTy, Options...> {
protected:
  ilist_node_with_parent() = default;

private:
  /// Forward to NodeTy::getParent().
  const ParentTy *getNodeParent() const {
    return static_cast<const NodeTy *>(this)->getParent();
  }

public:
  /// Get the previous node, or \c nullptr for the list head.
  NodeTy *getPrevNode() {
    // Should be separated to a reused function, but then we couldn't use auto
    // (and would need the type of the list).
    const auto &List =
        getNodeParent()->*(ParentTy::getSublistAccess((NodeTy *)nullptr));
    return List.getPrevNode(*static_cast<NodeTy *>(this));
  }

  /// Get the next node, or \c nullptr for the list tail.
  NodeTy *getNextNode() {
    // Should be separated to a reused function, but then we couldn't use auto
    // (and would need the type of the list).
    const auto &List =
        getNodeParent()->*(ParentTy::getSublistAccess((NodeTy *)nullptr));
    return List.getNextNode(*static_cast<NodeTy *>(this));
  }
};
As shown, this base class requires NodeTy (Instruction) to provide a getParent method:
inline BasicBlock* Instruction::getParent() { return Parent; }
It also requires ParentTy (BasicBlock) to provide a getSublistAccess method:
/// Returns a pointer to a member of the instruction list.
static InstListType BasicBlock::*getSublistAccess(Instruction*) {
  return &BasicBlock::InstList;
}
With these pieces in place, the Instructions in a BasicBlock can be traversed as follows:
BasicBlock& BB = ...
for (Instruction &I : BB)
  // The next statement works since operator<<(ostream&,...)
  // is overloaded for Instruction&
  errs() << I << "\n";
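The two requirements above (the node supplies getParent, the parent supplies getSublistAccess) can be tried out in isolation. The following is a minimal sketch, not LLVM code: ToyNodeWithParent, ToyInst, ToyBlock and walkIds are hypothetical names, and a std::vector of pointers with a linear search stands in for the intrusive doubly linked list, so neighbour lookup here is O(n) rather than constant time. What it does preserve is the CRTP downcast to NodeTy and the pointer-to-member returned by getSublistAccess.

```cpp
#include <cstddef>
#include <vector>

// Simplified stand-in for ilist_node_with_parent (hypothetical).
// NodeTy must provide getParent(); ParentTy must provide a static
// getSublistAccess(NodeTy *) returning a pointer-to-member.
template <typename NodeTy, typename ParentTy>
class ToyNodeWithParent {
  // CRTP: downcast to the derived node type and ask it for its parent.
  const ParentTy *getNodeParent() const {
    return static_cast<const NodeTy *>(this)->getParent();
  }

public:
  // Next node on the parent's list, or nullptr for the tail.
  NodeTy *getNextNode() {
    const auto &List =
        getNodeParent()->*(ParentTy::getSublistAccess((NodeTy *)nullptr));
    for (std::size_t I = 0; I + 1 < List.size(); ++I)
      if (List[I] == static_cast<NodeTy *>(this))
        return List[I + 1];
    return nullptr;
  }

  // Previous node on the parent's list, or nullptr for the head.
  NodeTy *getPrevNode() {
    const auto &List =
        getNodeParent()->*(ParentTy::getSublistAccess((NodeTy *)nullptr));
    for (std::size_t I = 1; I < List.size(); ++I)
      if (List[I] == static_cast<NodeTy *>(this))
        return List[I - 1];
    return nullptr;
  }
};

class ToyBlock;

// Plays the role of Instruction.
class ToyInst : public ToyNodeWithParent<ToyInst, ToyBlock> {
  ToyBlock *Parent;
  int Id;

public:
  ToyInst(ToyBlock *P, int N) : Parent(P), Id(N) {}
  const ToyBlock *getParent() const { return Parent; }
  int getId() const { return Id; }
};

// Plays the role of BasicBlock.
class ToyBlock {
public:
  using InstListType = std::vector<ToyInst *>;

  // Tells the node base class which member holds the ToyInst list.
  static InstListType ToyBlock::*getSublistAccess(ToyInst *) {
    return &ToyBlock::InstList;
  }

  void append(ToyInst *I) { InstList.push_back(I); }

private:
  InstListType InstList;
};

// Follow getNextNode() from First to the tail and collect the ids.
std::vector<int> walkIds(ToyInst *First) {
  std::vector<int> Ids;
  for (ToyInst *I = First; I != nullptr; I = I->getNextNode())
    Ids.push_back(I->getId());
  return Ids;
}
```

The unused ToyInst * parameter of getSublistAccess only selects an overload. That is what would let a parent holding several kinds of child lists expose one accessor per node type.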
Instruction subclasses
According to the definitions in llvm\include\llvm\IR\Instruction.def, the subclasses of Instruction fall into the following categories:
- Terminator Instructions
- Unary Operators
- Binary Operators
- Memory Operators
- Cast Operators
- Other Operators
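Instruction.def is written in the X-macro style: the file lists each opcode once through HANDLE_*_INST macros, and every client defines those macros to suit its needs before including the file. The sketch below imitates the technique in a single translation unit. It is an illustration under stated assumptions: TOY_INST_LIST, ToyCategory, ToyOpcode, toyOpcodeName and toyOpcodeCategory are hypothetical names, the opcode numbers are made up, and only one representative instruction per category is listed.

```cpp
#include <string>

// Hypothetical miniature opcode table: X(number, name, category).
// The real Instruction.def is a separate file that clients #include.
#define TOY_INST_LIST(X) \
  X(1, Ret, Terminator) \
  X(2, Br, Terminator) \
  X(3, FNeg, Unary) \
  X(4, Add, Binary) \
  X(5, Load, Memory) \
  X(6, Trunc, Cast) \
  X(7, ICmp, Other)

enum class ToyCategory {
  Terminator, Unary, Binary, Memory, Cast, Other, Unknown
};

// First expansion: generate one enumerator per table row.
enum ToyOpcode {
#define TOY_ENUM(Num, Name, Cat) Toy##Name = Num,
  TOY_INST_LIST(TOY_ENUM)
#undef TOY_ENUM
};

// Second expansion: map an opcode number to its printable name.
std::string toyOpcodeName(unsigned Opcode) {
  switch (Opcode) {
#define TOY_NAME(Num, Name, Cat) case Num: return #Name;
    TOY_INST_LIST(TOY_NAME)
#undef TOY_NAME
  default:
    return "<unknown>";
  }
}

// Third expansion: map an opcode number to its category.
ToyCategory toyOpcodeCategory(unsigned Opcode) {
  switch (Opcode) {
#define TOY_CAT(Num, Name, Cat) case Num: return ToyCategory::Cat;
    TOY_INST_LIST(TOY_CAT)
#undef TOY_CAT
  default:
    return ToyCategory::Unknown;
  }
}
```

Because the table is the single source of truth, adding a row updates the enum, the name lookup and the category lookup together.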
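At the implementation level, the inheritance tree of Instruction does not follow this classification strictly. It is organized into the categories below, and the actual subclasses in each category also differ somewhat from the grouping in Instruction.def.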