GCC Back End and Assembly Emission (12)

wuhui_gdnt


Original article: https://blog.csdn.net/wuhui_gdnt/article/details/6401254


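5.3. Generating code for define_split

For our example, the split half of the pattern is processed in the same way, by the gen_split function below.

Figure 33: genoutput, example of a define_insn_and_split pattern, split part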

static void
gen_split (rtx split, int lineno)               /* in genoutput.c */
{
  struct data *d = xmalloc (sizeof (struct data));
  int i;

  d->code_number = next_code_number;
  d->index_number = next_index_number;
  d->lineno = lineno;
  d->name = 0;

  /* Build up the list in the same order as the insns are seen
     in the machine description.  */
  d->next = 0;
  *idata_end = d;
  idata_end = &d->next;

  max_opno = -1;
  num_dups = 0;
  memset (d->operand, 0, sizeof (d->operand));

  /* Get the number of operands by scanning all the patterns of the
     split patterns.  But ignore all the rest of the information thus
     obtained.  */
  for (i = 0; i < XVECLEN (split, 0); i++)
    scan_operands (d, XVECEXP (split, 0, i), 0, 0);

  d->n_operands = max_opno + 1;
  d->n_dups = 0;
  d->n_alternatives = 0;
  d->template = 0;
  d->output_format = INSN_OUTPUT_FORMAT_NONE;

  place_operands (d);
}

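Although a define_split pattern may carry constraints, they are not checked here, as the GCC info manual notes. The operands are placed, unchecked, into the linked list managed by odata.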

main (continued)

  printf ("\n\n");
  output_operand_data ();
  output_insn_data ();
  output_get_insn_name ();

  fflush (stdout);
  return (ferror (stdout) != 0 || have_error
          ? FATAL_EXIT_CODE : SUCCESS_EXIT_CODE);
}

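5.4. Generating code for define_peephole

5.4.1. Overview: about define_peephole

During compilation, the combiner does not notice certain peephole optimizations when the data flow in the program does not suggest that it should try them. For example, sometimes two consecutive instructions related in purpose can be combined even though the second one does not appear to use a register computed by the first one. A machine-specific peephole optimizer can detect such opportunities.

Two forms of peephole definition may be used. The original define_peephole runs at assembly output time, matching instructions and substituting assembly text. Use of define_peephole is deprecated.

The newer define_peephole2 matches instructions and substitutes new instructions. The peephole2 pass runs after register allocation but before scheduling, which may result in much better code for targets that do instruction scheduling.

A definition looks like this: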

(define_peephole
  [INSN-PATTERN-1
   INSN-PATTERN-2
   ...]
  "CONDITION"
  "TEMPLATE"
  "OPTIONAL-INSN-ATTRIBUTES")

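The last string operand may be omitted if you are not using any machine-specific information in this machine description. If present, it must obey the same rules as in a define_insn.

In this skeleton, INSN-PATTERN-1 and so on are patterns that match consecutive instructions. The optimization applies to a sequence of instructions when INSN-PATTERN-1 matches the first one, INSN-PATTERN-2 matches the next, and so on.

Each instruction matched by a peephole must also match a define_insn. Peepholes are checked only at the last stage just before code generation, and only optionally. Therefore, any instruction that would match a peephole but no define_insn will cause a crash in code generation in an unoptimized compilation, or at various optimization stages.

The operands of the instructions are matched with match_operand, match_operator and match_dup, as usual. What is unusual is that the operand numbers apply to all the instruction patterns in the definition. So you can check for identical operands in two instructions by using match_operand in one and match_dup in the other.

The operand constraints used in match_operand patterns have no direct effect on the applicability of the peephole, but they are validated afterward, so make sure your constraints are general enough to apply whenever the peephole matches. If the peephole matches but the constraints are not satisfied, the compiler will crash.

It is safe to omit constraints in all the operands of the peephole; or you can write constraints that serve as a double-check on the criteria tested earlier.

Once a sequence of instructions matches the patterns, CONDITION is checked. This is a C expression that makes the final decision whether to perform the optimization (it is performed if the expression is nonzero). If CONDITION is omitted (in other words, the string is empty), the optimization is applied to every sequence of instructions that matches the patterns.

The defined peephole optimizations are applied after register allocation is complete. Therefore, the peephole definition can check which operands have ended up in which kinds of registers just by looking at the operands.

To refer to the operands in CONDITION, write operands[I] for operand number I (as matched by (match_operand I ...)). Use the variable insn to refer to the last of the instructions being matched; use prev_active_insn to find the preceding ones.

When optimizing computations with intermediate results, you can use CONDITION to match only when the intermediate results are not used elsewhere. Use the C expression dead_or_set_p (INSN, OP), where INSN is the instruction in which you expect the value to be used for the last time (obtained from the value of insn together with prev_nonnote_insn), and OP is the intermediate value (from operands[I]).

Applying the optimization means replacing the sequence of instructions with one new instruction. TEMPLATE controls the final assembler output for this combined instruction. It works exactly like the template of a define_insn. Operand numbers in this template are the same ones used in matching the original sequence of instructions.

The result of a defined peephole optimizer does not need to match any of the instruction patterns in the machine description; it does not even get an opportunity to match them. The peephole optimizer definition itself serves as the instruction pattern that controls how the instruction is output.

Defined peephole optimizers run as assembler code is being output, so the instructions they produce are never combined or rearranged in any way.

Here is an example, taken from the 68000 machine description.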

(define_peephole
  [(set (reg:SI 15) (plus:SI (reg:SI 15) (const_int 4)))
   (set (match_operand:DF 0 "register_operand" "=f")
        (match_operand:DF 1 "register_operand" "ad"))]
  "FP_REG_P (operands[0]) && ! FP_REG_P (operands[1])"
{
  rtx xoperands[2];
  xoperands[1] = gen_rtx (REG, SImode, REGNO (operands[1]) + 1);
#ifdef MOTOROLA
  output_asm_insn ("move.l %1,(sp)", xoperands);
  output_asm_insn ("move.l %1,-(sp)", operands);
  return "fmove.d (sp)+,%0";
#else
  output_asm_insn ("movel %1,sp@", xoperands);
  output_asm_insn ("movel %1,sp@-", operands);
  return "fmoved sp@+,%0";
#endif
})

The effect of this optimization is to change

    jbsr _foobar
    addql #4,sp
    movel d1,sp@-
    movel d0,sp@-
    fmoved sp@+,fp0

into

    jbsr _foobar
    movel d1,sp@
    movel d0,sp@-
    fmoved sp@+,fp0

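INSN-PATTERN-1 and so on look almost like the second operand of define_insn. There is one important difference: the second operand of define_insn consists of one or more RTX objects enclosed in square brackets. Usually there is only one, and then the same action can be written as a single element of a define_peephole. But when a define_insn has multiple actions, they are implicitly enclosed in a parallel. In the define_peephole you must then write the parallel explicitly, together with the square brackets inside it. Thus, if an instruction pattern looks like this,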

(define_insn "divmodsi4"
  [(set (match_operand:SI 0 "general_operand" "=d")
        (div:SI (match_operand:SI 1 "general_operand" "0")
                (match_operand:SI 2 "general_operand" "dmsK")))
   (set (match_operand:SI 3 "general_operand" "=d")
        (mod:SI (match_dup 1) (match_dup 2)))]
  "TARGET_68020"
  "divsl%.l %2,%3:%0")

then the instruction is mentioned in a peephole as follows:

(define_peephole
  [...
   (parallel
    [(set (match_operand:SI 0 "general_operand" "=d")
          (div:SI (match_operand:SI 1 "general_operand" "0")
                  (match_operand:SI 2 "general_operand" "dmsK")))
     (set (match_operand:SI 3 "general_operand" "=d")
          (mod:SI (match_dup 1) (match_dup 2)))])
   ...]
  ...)

The processing here is again very similar: the operands are extracted from the pattern, and a unique instance of each is stored in odata.

static void
gen_peephole (rtx peep, int lineno)             /* in genoutput.c */
{
  struct data *d = xmalloc (sizeof (struct data));
  int i;

  d->code_number = next_code_number;
  d->index_number = next_index_number;
  d->lineno = lineno;
  d->name = 0;

  /* Build up the list in the same order as the insns are seen
     in the machine description.  */
  d->next = 0;
  *idata_end = d;
  idata_end = &d->next;

  max_opno = -1;
  num_dups = 0;
  memset (d->operand, 0, sizeof (d->operand));

  /* Get the number of operands by scanning all the patterns of the
     peephole optimizer.  But ignore all the rest of the information
     thus obtained.  */
  for (i = 0; i < XVECLEN (peep, 0); i++)
    scan_operands (d, XVECEXP (peep, 0, i), 0, 0);

  d->n_operands = max_opno + 1;
  d->n_dups = 0;

  validate_insn_alternatives (d);
  place_operands (d);
  process_template (d, XTMPL (peep, 2));
}

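5.5. Generating code for define_peephole2

5.5.1. Overview of define_peephole2

A define_peephole2 definition tells the compiler how to substitute one sequence of instructions for another, what additional scratch registers may be needed, and what their lifetimes must be.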

(define_peephole2
  [INSN-PATTERN-1
   INSN-PATTERN-2
   ...]
  "CONDITION"
  [NEW-INSN-PATTERN-1
   NEW-INSN-PATTERN-2
   ...]
  "PREPARATION-STATEMENTS")

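The definition is almost identical to define_split, except that the pattern to match is not a single instruction but a sequence of instructions.

It is possible to request additional scratch registers for use in the output template. If appropriate registers are not free, the pattern simply does not match.

Scratch registers are requested with a match_scratch pattern at the top level of the input pattern. The allocated register is (initially) dead at the requested point within the original sequence. If the scratch register is used at more than one point, a match_dup pattern at the top level of the input pattern marks the last position in the input sequence at which the register must be available.

Here is an example from the IA-32 machine description: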

(define_peephole2
  [(match_scratch:SI 2 "r")
   (parallel [(set (match_operand:SI 0 "register_operand" "")
                   (match_operator:SI 3 "arith_or_logical_operator"
                     [(match_dup 0)
                      (match_operand:SI 1 "memory_operand" "")]))
              (clobber (reg:CC 17))])]
  "! optimize_size && ! TARGET_READ_MODIFY"
  [(set (match_dup 2) (match_dup 1))
   (parallel [(set (match_dup 0)
                   (match_op_dup 3 [(match_dup 0) (match_dup 2)]))
              (clobber (reg:CC 17))])]
  "")

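This pattern tries to split a load from its use, in the hope that instructions can then be scheduled around the memory load latency. It allocates a single SImode register of class GENERAL_REGS ("r") that needs to be live only at the point just before the arithmetic.

A real example that requires an extended scratch lifetime is hard to come by, so here is a silly, made-up one: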

(define_peephole2
  [(match_scratch:SI 4 "r")
   (set (match_operand:SI 0 "" "") (match_operand:SI 1 "" ""))
   (set (match_operand:SI 2 "" "") (match_dup 1))
   (match_dup 4)
   (set (match_operand:SI 3 "" "") (match_dup 1))]
  "/* determine 1 does not overlap 0 and 2 */"
  [(set (match_dup 4) (match_dup 1))
   (set (match_dup 0) (match_dup 4))
   (set (match_dup 2) (match_dup 4))
   (set (match_dup 3) (match_dup 4))]
  "")

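If we had not added the (match_dup 4) in the middle of the input sequence, the register chosen at the beginning of the sequence might have been killed by the first or second set.

In the genrecog tool (see the section on genrecog), define_peephole2 patterns are used to build the recognition tree. In other words, these patterns are recognized by the recog family of functions, whereas define_peephole patterns, as we saw in the section on genpeep, are recognized through peephole.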

In genoutput, a define_peephole2 pattern is handled by gen_split, the same function already listed in section 5.3 (lines 929 to 963 of genoutput.c), so the listing is not repeated here.

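Note that this function only records the operands, which are output later. The transformation function of a define_peephole2 is produced elsewhere, as described in the section "Generating code for define_peephole2".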

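5.6. Outputting the data

Back in main, the code is output next. Remember that odata holds the operands, while idata holds the data records built by the pattern handlers gen_insn, gen_split, gen_peephole and so on. The remaining functions are quite simple.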

static void
output_operand_data (void)                      /* in genoutput.c */
{
  struct operand_data *d;

  printf ("\nstatic const struct insn_operand_data operand_data[] = \n{\n");

  for (d = odata; d; d = d->next)
    {
      printf (" {\n");

      printf (" %s,\n",
              d->predicate && d->predicate[0] ? d->predicate : "0");

      printf (" \"%s\",\n", d->constraint ? d->constraint : "");

      printf (" %smode,\n", GET_MODE_NAME (d->mode));

      printf (" %d,\n", d->strict_low);

      printf (" %d\n", d->eliminable);

      printf(" },\n");
    }
  printf("};\n\n\n");
}

output_operand_data emits an array whose element type is insn_operand_data. That type is defined in recog.h as follows.

struct insn_operand_data                        /* in recog.h */
{
  const insn_operand_predicate_fn predicate;

  const char *const constraint;

  ENUM_BITFIELD(machine_mode) const mode : 16;

  const char strict_low;

  const char eliminable;
};

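Note that the important array operand_data, whose element type comes from recog.h, has now been output. It records every distinct operand in the machine description.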

272 static void
273 output_insn_data (void)                     /* in genoutput.c */
274 {
275   struct data *d;
276   int name_offset = 0;
277   int next_name_offset;
278   const char * last_name = 0;
279   const char * next_name = 0;
280   struct data *n;
281
282   for (n = idata, next_name_offset = 1; n; n = n->next, next_name_offset++)
283     if (n->name)
284       {
285         next_name = n->name;
286         break;
287       }
288
289   printf ("#if GCC_VERSION >= 2007\n__extension__\n#endif\n");
290   printf ("\nconst struct insn_data insn_data[] = \n{\n");
291
292   for (d = idata; d; d = d->next)
293     {
294       printf (" {\n");
295
296       if (d->name)
297         {
298           printf (" \"%s\",\n", d->name);
299           name_offset = 0;
300           last_name = d->name;
301           next_name = 0;
302           for (n = d->next, next_name_offset = 1; n;
303                n = n->next, next_name_offset++)
304             {
305               if (n->name)
306                 {
307                   next_name = n->name;
308                   break;
309                 }
310             }
311         }
312       else
313         {
314           name_offset++;
315           if (next_name && (last_name == 0
316                             || name_offset > next_name_offset / 2))
317             printf (" \"%s-%d\",\n", next_name,
318                     next_name_offset - name_offset);
319           else
320             printf (" \"%s+%d\",\n", last_name, name_offset);
321         }
322
323       switch (d->output_format)
324         {
325         case INSN_OUTPUT_FORMAT_NONE:
326           printf ("#if HAVE_DESIGNATED_INITIALIZERS\n");
327           printf (" { 0 },\n");
328           printf ("#else\n");
329           printf (" { 0, 0, 0 },\n");
330           printf ("#endif\n");
331           break;
332         case INSN_OUTPUT_FORMAT_SINGLE:
333           {
334             const char *p = d->template;
335             char prev = 0;
336
337             printf ("#if HAVE_DESIGNATED_INITIALIZERS\n");
338             printf (" { .single =\n");
339             printf ("#else\n");
340             printf (" {\n");
341             printf ("#endif\n");
342             printf (" \"");
343             while (*p)
344               {
345                 if (IS_VSPACE (*p) && prev != '\\')
346                   {
347                     /* Preserve two consecutive \n's or \r's, but treat \r\n
348                        as a single newline.  */
349                     if (*p == '\n' && prev != '\r')
350                       printf ("\\n\\\n");
351                   }
352                 else
353                   putchar (*p);
354                 prev = *p;
355                 ++p;
356               }
357             printf ("\",\n");
358             printf ("#if HAVE_DESIGNATED_INITIALIZERS\n");
359             printf (" },\n");
360             printf ("#else\n");
361             printf (" 0, 0 },\n");
362             printf ("#endif\n");
363           }
364           break;
365         case INSN_OUTPUT_FORMAT_MULTI:
366           printf ("#if HAVE_DESIGNATED_INITIALIZERS\n");
367           printf (" { .multi = output_%d },\n", d->code_number);
368           printf ("#else\n");
369           printf (" { 0, output_%d, 0 },\n", d->code_number);
370           printf ("#endif\n");
371           break;
372         case INSN_OUTPUT_FORMAT_FUNCTION:
373           printf ("#if HAVE_DESIGNATED_INITIALIZERS\n");
374           printf (" { .function = output_%d },\n", d->code_number);
375           printf ("#else\n");
376           printf (" { 0, 0, output_%d },\n", d->code_number);
377           printf ("#endif\n");
378           break;
379         default:
380           abort ();
381         }
382
383       if (d->name && d->name[0] != '*')
384         printf (" (insn_gen_fn) gen_%s,\n", d->name);
385       else
386         printf (" 0,\n");
387
388       printf (" &operand_data[%d],\n", d->operand_number);
389       printf (" %d,\n", d->n_operands);
390       printf (" %d,\n", d->n_dups);
391       printf (" %d,\n", d->n_alternatives);
392       printf (" %d\n", d->output_format);
393
394       printf(" },\n");
395     }
396   printf ("};\n\n\n");
397 }

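insn_data, another important array declared in recog.h, is emitted by output_insn_data. Note that operand_data is emitted in order, so at line 388, where the &operand_data[%d] entry is printed, operand_number can be used directly as an index. The element type of insn_data is defined as follows: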

238 struct insn_data
239 {
240   const char *const name;
241 #if HAVE_DESIGNATED_INITIALIZERS
242   union {
243     const char *single;
244     const char *const *multi;
245     insn_output_fn function;
246   } output;
247 #else
248   struct {
249     const char *single;
250     const char *const *multi;
251     insn_output_fn function;
252   } output;
253 #endif
254   const insn_gen_fn genfun;
255   const struct insn_operand_data *const operand;
256
257   const char n_operands;
258   const char n_dups;
259   const char n_alternatives;
260   const char output_format;
261 };

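Clearly, this array tells us which output code produces the assembly for a given pattern. The genfun field at line 254 actually points to the gen_* function produced by the genemit tool. Note that its type is: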

typedef rtx (*insn_gen_fn) (rtx, ...);

which can accommodate every gen_* function. The function below emits the function that returns an instruction's name.

static void
output_get_insn_name (void)                     /* in genoutput.c */
{
  printf ("const char *\n");
  printf ("get_insn_name (int code)\n");
  printf ("{\n");
  printf (" if (code == NOOP_MOVE_INSN_CODE)\n");
  printf (" return \"NOOP_MOVE\";\n");
  printf (" else\n");
  printf (" return insn_data[code].name;\n");
  printf ("}\n");
}
