LLVM IR Study Notes (1): The GetElementPtr Instruction

许你一片梁城

Tags: llvm

Article link: https://blog.csdn.net/woiyyn/article/details/118670736

The GetElementPtr (GEP) instruction only computes an address: it does not read or write any memory. Its syntax is:

<result> = getelementptr <ty>, <ty>* <ptrval>{, [inrange] <ty> <idx>}*
<result> = getelementptr inbounds <ty>, <ty>* <ptrval>{, [inrange] <ty> <idx>}*
<result> = getelementptr <ty>, <ptr vector> <ptrval>, [inrange] <vector index type> <idx>

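The first argument is the type that the base pointer points to, which serves as the basis for the calculation. The second argument is the base pointer itself, usually a pointer to a struct or to the first element of an array. The third and any later arguments are indices. Each index acts as an offset applied starting from the base pointer, for example selecting which field of a struct or which element of an array. The following example shows how this works: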

struct RT {
  char A;
  int B[10][20];
  char C;
};
struct ST {
  int X;
  double Y;
  struct RT Z;
};

int *foo(struct ST *s) {
  return &s[1].Z.B[5][13];
}

The corresponding LLVM IR is:

%struct.RT = type { i8, [10 x [20 x i32]], i8 }
%struct.ST = type { i32, double, %struct.RT }

define i32* @foo(%struct.ST* %s) nounwind uwtable readnone optsize ssp {
entry:
  %arrayidx = getelementptr inbounds %struct.ST, %struct.ST* %s, i64 1, i32 2, i32 1, i64 5, i64 13
  ret i32* %arrayidx
}

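The first argument, %struct.ST, is the type the base pointer points to. The second argument, %s, is the base pointer. The third argument, i64 1, is the first index: it steps over one whole %struct.ST = { i32, double, %struct.RT }, which corresponds to s[1] in the source. The fourth argument, i32 2, selects field Z, which has index 2 in struct ST, so the current type becomes %struct.RT = { i8, [10 x [20 x i32]], i8 }. The remaining indices work the same way: i32 1 selects field B of struct RT, i64 5 selects row 5 of B, and i64 13 selects element 13 of that row, so the result has type i32*.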
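The index-by-index reading above can be checked in C. The following is a minimal sketch, not part of the original article: the helper names gep_offset_by_hand, gep_offset_by_compiler and check_offsets are made up for illustration. It adds up the byte contribution of each index and compares the sum with the address the compiler computes for the same expression.

```c
#include <assert.h>
#include <stddef.h>

struct RT {
  char A;
  int B[10][20];
  char C;
};
struct ST {
  int X;
  double Y;
  struct RT Z;
};

/* The original function from the article. */
int *foo(struct ST *s) {
  return &s[1].Z.B[5][13];
}

/* Hypothetical helper: sum the byte contribution of each GEP index. */
size_t gep_offset_by_hand(void) {
  size_t off = 0;
  off += 1 * sizeof(struct ST);   /* i64 1  : step to s[1]             */
  off += offsetof(struct ST, Z);  /* i32 2  : field Z of struct ST     */
  off += offsetof(struct RT, B);  /* i32 1  : field B of struct RT     */
  off += 5 * sizeof(int[20]);     /* i64 5  : row 5 of B               */
  off += 13 * sizeof(int);        /* i64 13 : element 13 of that row   */
  return off;
}

/* Hypothetical helper: byte distance from the base pointer to foo's result. */
size_t gep_offset_by_compiler(void) {
  static struct ST arr[2];        /* arr[1] must exist for s[1] to be valid */
  return (size_t)((char *)foo(arr) - (char *)arr);
}

/* Both ways of computing the offset should agree. */
void check_offsets(void) {
  assert(gep_offset_by_hand() == gep_offset_by_compiler());
}
```

Calling check_offsets() from a main function should pass silently. The exact byte value depends on the target's size and alignment rules, so the sketch compares the two results instead of hard-coding a number.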

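The inbounds keyword is not a runtime bounds check. It tells the optimizer that the base pointer and the computed address stay within the bounds of the allocated object (or one past its end); if they do not, the result is a poison value. The official documentation decomposes the example above into the following equivalent LLVM IR: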

define i32* @foo(%struct.ST* %s) {
  %t1 = getelementptr %struct.ST, %struct.ST* %s, i32 1                        ; yields %struct.ST*:%t1
  %t2 = getelementptr %struct.ST, %struct.ST* %t1, i32 0, i32 2                ; yields %struct.RT*:%t2
  %t3 = getelementptr %struct.RT, %struct.RT* %t2, i32 0, i32 1                ; yields [10 x [20 x i32]]*:%t3
  %t4 = getelementptr [10 x [20 x i32]], [10 x [20 x i32]]* %t3, i32 0, i32 5  ; yields [20 x i32]*:%t4
  %t5 = getelementptr [20 x i32], [20 x i32]* %t4, i32 0, i32 13               ; yields i32*:%t5
  ret i32* %t5
}

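The comment after each line gives the type of the value that getelementptr returns. Note that the first getelementptr has only one index, "i32 1", while all the others have two indices, the first of which is always "i32 0". The official documentation explains this under the question "Why is the extra 0 index required?"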

Consider the example from the official documentation:
%MyStruct = uninitialized global { float*, i32 }
...
%idx = getelementptr { float*, i32 }, { float*, i32 }* %MyStruct, i64 0, i32 1
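Look first at the definition of %MyStruct: uninitialized global defines an uninitialized global variable, and the name of a global denotes its address. So %MyStruct is not a struct but a pointer to a struct, as the second argument of the getelementptr instruction shows. The instruction computes the address of the second field of the struct, the i32, so the result type changes: it is no longer { float*, i32 }* but i32*. The first index, i64 0, is an offset from the base pointer. The official documentation puts it this way: "The first index, i64 0 is required to step over the global variable %MyStruct. Since the second argument to the GEP instruction must always be a value of pointer type, the first index steps through that pointer. A value of 0 means 0 elements offset from that pointer." In short, it is the offset from the base address, counted in whole elements, and it is mandatory.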
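The role of the leading zero index has a close C analogue: writing p->second is shorthand for p[0].second, where the [0] plays the part of i64 0. The sketch below is illustrative only; struct MyStruct, my_struct and the three functions are hypothetical names chosen to mirror the IR.

```c
#include <assert.h>
#include <stddef.h>

/* Models { float*, i32 }; field names are made up for illustration. */
struct MyStruct {
  float *first;
  int second;
};

/* Like the IR global: the name stands for the address of the storage. */
static struct MyStruct my_struct;

/* getelementptr ..., i64 0, i32 1 : stay at element 0, then select field 1. */
int *field_via_zero_index(struct MyStruct *p) {
  return &p[0].second;
}

/* The usual C spelling of the same address. */
int *field_via_arrow(struct MyStruct *p) {
  return &p->second;
}

/* Returns 1 when both spellings give the same address, at the field's offset. */
int zero_index_matches(void) {
  int *a = field_via_zero_index(&my_struct);
  int *b = field_via_arrow(&my_struct);
  size_t off = (size_t)((char *)a - (char *)&my_struct);
  return a == b && off == offsetof(struct MyStruct, second);
}
```

C hides the zero index behind the -> operator, whereas GEP requires it to be written out because the base operand is always a pointer.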

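Back to the decomposed IR: the first instruction returns %struct.ST*:%t1, the same type as the base pointer. It corresponds to &s[1] in the source, and its address is %s + sizeof(struct ST). Each later getelementptr returns a type different from the pointer type of its second argument: it stays at the base address (first index 0) and then uses the second index to select a field or an element.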
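The five-step decomposition maps almost line for line onto C pointer expressions. The following sketch (the function names foo_single, foo_steps and decomposition_matches are hypothetical) writes each %tN as a C variable whose type matches the type noted in the IR comments, and checks that the stepwise result equals the single expression.

```c
#include <assert.h>

struct RT {
  char A;
  int B[10][20];
  char C;
};
struct ST {
  int X;
  double Y;
  struct RT Z;
};

/* The single-expression form from the article. */
int *foo_single(struct ST *s) {
  return &s[1].Z.B[5][13];
}

/* One C statement per getelementptr in the decomposed IR. */
int *foo_steps(struct ST *s) {
  struct ST *t1 = s + 1;        /* %t1: i32 1         -> %struct.ST*         */
  struct RT *t2 = &t1->Z;       /* %t2: i32 0, i32 2  -> %struct.RT*         */
  int (*t3)[10][20] = &t2->B;   /* %t3: i32 0, i32 1  -> [10 x [20 x i32]]*  */
  int (*t4)[20] = &(*t3)[5];    /* %t4: i32 0, i32 5  -> [20 x i32]*         */
  int *t5 = &(*t4)[13];         /* %t5: i32 0, i32 13 -> i32*                */
  return t5;
}

/* Returns 1 when the stepwise and single-expression forms agree. */
int decomposition_matches(void) {
  static struct ST arr[2];      /* arr[1] must exist for s[1] to be valid */
  return foo_steps(arr) == foo_single(arr);
}
```

Only the first step moves by a whole struct ST; every later step dereferences with (*tN) or ->, which is the C counterpart of the leading i32 0.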

Here are more detailed examples:

%MyVar = global { [10 x i32] }
%idx1 = getelementptr { [10 x i32] }, { [10 x i32] }* %MyVar, i64 0, i32 0, i64 1
%idx2 = getelementptr { [10 x i32] }, { [10 x i32] }* %MyVar, i64 1

Abbreviated as GEP x,0,0,1 and GEP x,1, these two instructions compute different addresses.

idx1 evaluates to %MyVar+0+0+4 (the array elements are i32, which is four bytes), while idx2 evaluates to %MyVar+1*10*4.
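The two offsets can be reproduced in C. This is a sketch under the assumption that struct MyVar (a hypothetical name) models { [10 x i32] } with int32_t elements and no padding, which holds on common ABIs; the function names are also made up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Models { [10 x i32] }; the struct and field names are illustrative. */
struct MyVar {
  int32_t arr[10];
};

/* GEP x, 0, 0, 1 : element 1 of field 0 of the struct at x. */
size_t offset_idx1(struct MyVar *x) {
  return (size_t)((char *)&x[0].arr[1] - (char *)x);
}

/* GEP x, 1 : the next struct after x. */
size_t offset_idx2(struct MyVar *x) {
  return (size_t)((char *)&x[1] - (char *)x);
}

/* Returns 1 when the offsets are 4 and 40 bytes, as in the text. */
int offsets_match_text(void) {
  static struct MyVar v[2];
  return offset_idx1(v) == 4 && offset_idx2(v) == 40;
}
```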

A very similar example:

%MyVar = global { [10 x i32] }
%idx1 = getelementptr { [10 x i32] }, { [10 x i32] }* %MyVar, i64 1, i32 0, i64 0
%idx2 = getelementptr { [10 x i32] }, { [10 x i32] }* %MyVar, i64 1

These two instructions compute the same address, %MyVar+1*10*4, but the types differ: idx1 has type i32*, while idx2 has type { [10 x i32] }*.
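In C terms the same thing happens when a pointer to a struct and a pointer to its first array element hold the same address. The sketch below uses hypothetical names (struct MyVar, gep_1_0_0, gep_1, same_address): the two functions return different pointer types, so comparing them requires a cast to void *.

```c
#include <assert.h>
#include <stdint.h>

/* Models { [10 x i32] }; the struct and field names are illustrative. */
struct MyVar {
  int32_t arr[10];
};

/* GEP x, 1, 0, 0 : first array element of the next struct, type i32*. */
int32_t *gep_1_0_0(struct MyVar *x) {
  return &x[1].arr[0];
}

/* GEP x, 1 : the next struct itself, type { [10 x i32] }*. */
struct MyVar *gep_1(struct MyVar *x) {
  return &x[1];
}

/* Returns 1 when both pointers hold the same address despite their types. */
int same_address(void) {
  static struct MyVar v[2];     /* v[1] must exist for x[1].arr[0] to be valid */
  return (void *)gep_1_0_0(v) == (void *)gep_1(v);
}
```

The trailing zero indices add nothing to the address; they only change what the resulting pointer points to.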
