[Compilation: Compilation Notes] GCC Optimization Levels Explained

Bruceoxl

This is an original article by the blogger and may not be reproduced without the blogger's permission.

Article link: https://blog.csdn.net/bruceoxl/article/details/121963682

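GCC provides a large number of optimization options. Different flags let you trade off three things against each other: compile time, object file size, and execution speed. The optimization levels differ slightly between GCC versions; this article uses GCC 7.5 as the example.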

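At the time of writing, the latest release is GCC 11.2.0. Versions from GCC 4.6.4 onward share essentially the same optimization levels; only the individual options enabled at each level differ slightly.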

1 Optimization levels

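GCC 4.6.4 and later provide the optimization levels -O0, -O1, -O2, -O3, -Os, -Ofast and -Og (-Og itself first appeared in GCC 4.8). Among -O1, -O2 and -O3, a larger number means more aggressive optimization, which in a sense comes at the cost of the program's debuggability.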

The full list of optimization options is as follows:

-faggressive-loop-optimizations -falign-functions[=n]
-falign-jumps[=n]
-falign-labels[=n] -falign-loops[=n]
-fassociative-math -fauto-profile -fauto-profile[=path]
-fauto-inc-dec -fbranch-probabilities
-fbranch-target-load-optimize -fbranch-target-load-optimize2
-fbtr-bb-exclusive -fcaller-saves
-fcombine-stack-adjustments -fconserve-stack
-fcompare-elim -fcprop-registers -fcrossjumping
-fcse-follow-jumps -fcse-skip-blocks -fcx-fortran-rules
-fcx-limited-range
-fdata-sections -fdce -fdelayed-branch
-fdelete-null-pointer-checks -fdevirtualize -fdevirtualize-speculatively
-fdevirtualize-at-ltrans -fdse
-fearly-inlining -fipa-sra -fexpensive-optimizations -ffat-lto-objects
-ffast-math -ffinite-math-only -ffloat-store -fexcess-precision=style
-fforward-propagate -ffp-contract=style -ffunction-sections
-fgcse -fgcse-after-reload -fgcse-las -fgcse-lm -fgraphite-identity
-fgcse-sm -fhoist-adjacent-loads -fif-conversion
-fif-conversion2 -findirect-inlining
-finline-functions -finline-functions-called-once -finline-limit=n
-finline-small-functions -fipa-cp -fipa-cp-clone
-fipa-bit-cp -fipa-vrp
-fipa-pta -fipa-profile -fipa-pure-const -fipa-reference -fipa-icf
-fira-algorithm=algorithm
-fira-region=region -fira-hoist-pressure
-fira-loop-pressure -fno-ira-share-save-slots
-fno-ira-share-spill-slots
-fisolate-erroneous-paths-dereference -fisolate-erroneous-paths-attribute
-fivopts -fkeep-inline-functions -fkeep-static-functions
-fkeep-static-consts -flimit-function-alignment -flive-range-shrinkage
-floop-block -floop-interchange -floop-strip-mine
-floop-unroll-and-jam -floop-nest-optimize
-floop-parallelize-all -flra-remat -flto -flto-compression-level
-flto-partition=alg -fmerge-all-constants
-fmerge-constants -fmodulo-sched -fmodulo-sched-allow-regmoves
-fmove-loop-invariants -fno-branch-count-reg
-fno-defer-pop -fno-fp-int-builtin-inexact -fno-function-cse
-fno-guess-branch-probability -fno-inline -fno-math-errno -fno-peephole
-fno-peephole2 -fno-printf-return-value -fno-sched-interblock
-fno-sched-spec -fno-signed-zeros
-fno-toplevel-reorder -fno-trapping-math -fno-zero-initialized-in-bss
-fomit-frame-pointer -foptimize-sibling-calls
-fpartial-inlining -fpeel-loops -fpredictive-commoning
-fprefetch-loop-arrays
-fprofile-correction
-fprofile-use -fprofile-use=path -fprofile-values
-fprofile-reorder-functions
-freciprocal-math -free -frename-registers -freorder-blocks
-freorder-blocks-algorithm=algorithm
-freorder-blocks-and-partition -freorder-functions
-frerun-cse-after-loop -freschedule-modulo-scheduled-loops
-frounding-math -fsched2-use-superblocks -fsched-pressure
-fsched-spec-load -fsched-spec-load-dangerous
-fsched-stalled-insns-dep[=n] -fsched-stalled-insns[=n]
-fsched-group-heuristic -fsched-critical-path-heuristic
-fsched-spec-insn-heuristic -fsched-rank-heuristic
-fsched-last-insn-heuristic -fsched-dep-count-heuristic
-fschedule-fusion
-fschedule-insns -fschedule-insns2 -fsection-anchors
-fselective-scheduling -fselective-scheduling2
-fsel-sched-pipelining -fsel-sched-pipelining-outer-loops
-fsemantic-interposition -fshrink-wrap -fshrink-wrap-separate
-fsignaling-nans
-fsingle-precision-constant -fsplit-ivs-in-unroller -fsplit-loops
-fsplit-paths
-fsplit-wide-types -fssa-backprop -fssa-phiopt
-fstdarg-opt -fstore-merging -fstrict-aliasing
-fstrict-overflow -fthread-jumps -ftracer -ftree-bit-ccp
-ftree-builtin-call-dce -ftree-ccp -ftree-ch
-ftree-coalesce-vars -ftree-copy-prop -ftree-dce -ftree-dominator-opts
-ftree-dse -ftree-forwprop -ftree-fre -fcode-hoisting
-ftree-loop-if-convert -ftree-loop-im
-ftree-phiprop -ftree-loop-distribution -ftree-loop-distribute-patterns
-ftree-loop-ivcanon -ftree-loop-linear -ftree-loop-optimize
-ftree-loop-vectorize
-ftree-parallelize-loops=n -ftree-pre -ftree-partial-pre -ftree-pta
-ftree-reassoc -ftree-sink -ftree-slsr -ftree-sra
-ftree-switch-conversion -ftree-tail-merge
-ftree-ter -ftree-vectorize -ftree-vrp -funconstrained-commons
-funit-at-a-time -funroll-all-loops -funroll-loops
-funsafe-math-optimizations -funswitch-loops
-fipa-ra -fvariable-expansion-in-unroller -fvect-cost-model -fvpt
-fweb -fwhole-program -fwpa -fuse-linker-plugin

1.1 -O0

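This is GCC's default optimization level. If none of the -O options above is given, the compiler uses -O0, which performs essentially no optimization.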

1.2 -O1

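This is the most basic optimization level. Its goal is to produce an executable quickly, so it avoids optimizations that cost a lot of compile time; it mainly optimizes branches, constants and expressions. This level turns on the following options: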

-fauto-inc-dec 
-fbranch-count-reg 
-fcombine-stack-adjustments 
-fcompare-elim 
-fcprop-registers 
-fdce 
-fdefer-pop 
-fdelayed-branch 
-fdse 
-fforward-propagate 
-fguess-branch-probability 
-fif-conversion2 
-fif-conversion 
-finline-functions-called-once 
-fipa-pure-const 
-fipa-profile 
-fipa-reference 
-fmerge-constants 
-fmove-loop-invariants 
-freorder-blocks 
-fshrink-wrap 
-fshrink-wrap-separate 
-fsplit-wide-types 
-fssa-backprop 
-fssa-phiopt 
-ftree-bit-ccp 
-ftree-ccp 
-ftree-ch 
-ftree-coalesce-vars 
-ftree-copy-prop 
-ftree-dce 
-ftree-dominator-opts 
-ftree-dse 
-ftree-forwprop 
-ftree-fre 
-ftree-phiprop 
-ftree-sink 
-ftree-slsr 
-ftree-sra 
-ftree-pta 
-ftree-ter 
-funit-at-a-time

1.3 -O2

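Compared with -O1, -O2 spends more compile time in order to improve the execution speed of the generated code. In addition to the -O1 options, it turns on the following: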

-fthread-jumps 
-falign-functions  -falign-jumps 
-falign-loops  -falign-labels 
-fcaller-saves 
-fcrossjumping 
-fcse-follow-jumps  -fcse-skip-blocks 
-fdelete-null-pointer-checks 
-fdevirtualize -fdevirtualize-speculatively 
-fexpensive-optimizations 
-fgcse  -fgcse-lm  
-fhoist-adjacent-loads 
-finline-small-functions 
-findirect-inlining 
-fipa-cp 
-fipa-bit-cp 
-fipa-vrp 
-fipa-sra 
-fipa-icf 
-fisolate-erroneous-paths-dereference 
-flra-remat 
-foptimize-sibling-calls 
-foptimize-strlen 
-fpartial-inlining 
-fpeephole2 
-freorder-blocks-algorithm=stc 
-freorder-blocks-and-partition -freorder-functions 
-frerun-cse-after-loop  
-fsched-interblock  -fsched-spec 
-fschedule-insns  -fschedule-insns2 
-fstore-merging 
-fstrict-aliasing -fstrict-overflow 
-ftree-builtin-call-dce 
-ftree-switch-conversion -ftree-tail-merge 
-fcode-hoisting 
-ftree-pre 
-ftree-vrp 
-fipa-ra

1.4 -Os

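-Os starts from -O2 and drops the optimizations that tend to increase the size of the final executable. Choose this option if you want a smaller executable.

Relative to -O2, it turns off the following options: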

-falign-functions 
-falign-jumps  
-falign-loops 
-falign-labels  
-fprefetch-loop-arrays

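1.5 -O3

-O3 performs more optimization on top of -O2, such as using webs of pseudo-registers, inlining ordinary functions, and additional loop optimizations. This level lengthens compile time. Building every package with -O3 produces larger binaries that use more memory and, in the author's view, noticeably raises the risk of build failures or unpredictable program behavior, so using it as a blanket default is not recommended.

This level includes all -O2 optimizations and additionally turns on the following options: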

-finline-functions
-funswitch-loops
-fpredictive-commoning
-fgcse-after-reload
-ftree-loop-vectorize
-ftree-loop-distribute-patterns
-fsplit-paths -ftree-slp-vectorize
-fvect-cost-model
-ftree-partial-pre
-fpeel-loops -fipa-cp-clone

1.6 -Ofast

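-Ofast adds a number of non-standard optimizations on top of -O3. They work by disregarding strict compliance with some international standards (for example, the standards governing the implementation of certain math functions), so this option is generally not recommended.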

1.7 -Og

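-Og starts from -O1 and drops the optimizations that interfere with debugging, so it is a good choice when the build is meant for debugging. The option alone is not enough, though: it only tells the compiler not to let the generated code get in the way of debugging, while the debug information itself still comes from the -g option.

To see which options a given optimization level enables in your version of GCC, use the gcc -Q --help=optimizers command:

$ gcc -Q --help=optimizers -O1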

2 Example

The following example illustrates the differences between optimization levels.

The complete code is as follows:

/**
  ******************************************************************************
  * @file                main.c
  * @author              BruceOu
  * @version             V1.0
  * @date                2021-12-06
  * @blog                https://blog.bruceou.cn/
  * @Official Accounts   嵌入式实验楼
  * @brief               
  ******************************************************************************
  */
/**Includes*********************************************************************/
#include <stdio.h>
#include <stdlib.h>

/**Typedef**********************************************************************/
typedef int data_t;

typedef struct _node_
{
	data_t data;
	struct _node_ *next;
} linknode_t, *linklist;

typedef struct
{
	linklist front, rear;
} linkqueue;

/**Function********************************************************************/
linkqueue *CreateEmptyLinkqueue();
int EmptyLinkqueue(linkqueue *lqueue);
int EnLinkqueue(linkqueue *lqueue, data_t x);
int DeLinkqueue(linkqueue *lqueue,data_t *x);
void Linkqueue_show(linkqueue *lqueue);
void ClearLinkqueue(linkqueue *lqueue);
void DestroyLinkqueue(linkqueue *lqueue);


/**
  * @brief     Main function
  * @param     argc, argv (unused)
  * @retval    0 on success
  */
int main(int argc,char **argv)
{
	int i,x;
	linkqueue *lqueue;

	lqueue = CreateEmptyLinkqueue();
	for (i=1; i<=6; i++)
	{
		EnLinkqueue(lqueue, i);
	}
	Linkqueue_show(lqueue);
	i = 3;
	while (i-- )
	{
		DeLinkqueue(lqueue,&x);
		printf("%d ",x);
	}
	printf("\n");
	Linkqueue_show(lqueue);
	
	ClearLinkqueue(lqueue);

	if(EmptyLinkqueue(lqueue))
	{
		printf("The lqueue is empty!\n"); 
	}
	
	DestroyLinkqueue(lqueue);
	printf("The lqueue is destroyed!\n"); 		
	return 0;
}


/**
  * @brief     Create an empty linked queue
  * @param     None
  * @retval    Pointer to the new queue on success, NULL on failure
  */
linkqueue *CreateEmptyLinkqueue()
{
	linkqueue *lqueue;

	lqueue = (linkqueue *)malloc(sizeof(linkqueue));
	if(lqueue == NULL)  
        return NULL;  

	lqueue->front = lqueue->rear = (linklist)malloc(sizeof(linknode_t));
	if(lqueue->front == NULL)  
        return NULL; 

	lqueue->front->next = NULL;

	return lqueue;
}

/**
  * @brief     Check whether the linked queue is empty
  * @param     lqueue
  * @retval    1 if empty, 0 if not empty, -1 on failure
  */
int EmptyLinkqueue(linkqueue *lqueue)
{
	if(lqueue == NULL)  
		return -1;  

	return ((lqueue->front == lqueue->rear)?1:0);
}

/**
  * @brief     Enqueue an element into the linked queue
  * @param     lqueue
               x
  * @retval    0 on success, -1 on failure
  */
int EnLinkqueue(linkqueue *lqueue, data_t x)
{
	linklist p;
	if(lqueue == NULL)  
        return -1;  

	p = (linklist)malloc(sizeof(linknode_t));
	if(p == NULL)
	{
		return -1;
	}
	p->data = x;
	p->next = NULL;

	if(lqueue->front->next == NULL)  
    {  
        lqueue->front->next = lqueue->rear = p;  
    }  
    else  
    {  
        lqueue->rear->next = p;  
        lqueue->rear = p;  
    }  
    return 0;     
}

/**
  * @brief     Dequeue an element from the linked queue
  * @param     lqueue
               x
  * @retval    0 on success, -1 on failure
  */
int DeLinkqueue(linkqueue *lqueue,data_t *x)
{
	linknode_t *node_remove;  
    if(lqueue == NULL || lqueue->front->next == NULL) 
	{
		return -1;  
	}
    node_remove = lqueue->front->next;  
    lqueue->front->next = node_remove->next;  
    
	if(x != NULL)  
    {
		*x = node_remove->data;  
	}
    free(node_remove);  
    return 0;  
}

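/**
  * @brief     Print the data stored in the linked queue
  * @param     lqueue
  * @retval    None
  * @note      p is read uninitialized if lqueue->front is NULL, so the
  *            caller must pass a queue that still has its head node
  */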
void Linkqueue_show(linkqueue *lqueue)
{
	linknode_t *p;
	
	if(lqueue->front) 
	{
		p = lqueue->front->next;
	}
	while(p)
	{
		printf("%d ",p->data);
		p = p->next;
	}
	printf("\n");
}

/**
  * @brief     Clear the linked queue (frees every node, including the head node)
  * @param     lqueue
  * @retval    None
  */
void ClearLinkqueue(linkqueue *lqueue)
{
	linknode_t *qnode;  
  
    while(lqueue->front)  
    {  
        qnode = lqueue->front;  
        lqueue->front= qnode->next;  
        free(qnode);  
    }  
	lqueue->rear = NULL;
}

/**
  * @brief     Destroy the linked queue
  * @param     lqueue
  * @retval    None
  */
void DestroyLinkqueue(linkqueue *lqueue)  
{  
    if(lqueue != NULL)  
    {  
        ClearLinkqueue(lqueue);  
        free(lqueue);  
    }  
}

The default way to compile it is:

$ gcc -O0 main.c -o main-O0

Next, choose higher optimization levels.

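The higher the optimization level, the longer compilation takes, but the resulting program generally runs more efficiently. Note that -Os optimizes for size as well as speed, so its output is smaller than that of the other levels. This makes it especially important in embedded development, where MCU resources are scarce.

File size alone makes for a poor comparison, so let us look at the assembly files instead.

To make GCC emit assembly, just add the -S option.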

$ gcc -O0 -S main.c -o main-O0.s
$ gcc -Os -S main.c -o main-Os.s

This produces assembly files for two different optimization levels, -O0 and -Os.
【main-O0.s】

	.file	"main.c"
	.text
	.section	.rodata
.LC0:
	.string	"%d "
.LC1:
	.string	"The lqueue is empty!"
.LC2:
	.string	"The lqueue is destroyed!"
	.text
	.globl	main
	.type	main, @function
main:
.LFB5:
	.cfi_startproc
	pushq	%rbp
	.cfi_def_cfa_offset 16
	.cfi_offset 6, -16
	movq	%rsp, %rbp
	.cfi_def_cfa_register 6
	subq	$48, %rsp
	movl	%edi, -36(%rbp)
	movq	%rsi, -48(%rbp)
	movq	%fs:40, %rax
	movq	%rax, -8(%rbp)
	xorl	%eax, %eax
	movl	$0, %eax
	call	CreateEmptyLinkqueue
	movq	%rax, -16(%rbp)
	movl	$1, -20(%rbp)
	jmp	.L2
.L3:
	movl	-20(%rbp), %edx
	movq	-16(%rbp), %rax
	movl	%edx, %esi
	movq	%rax, %rdi
	call	EnLinkqueue
	addl	$1, -20(%rbp)
.L2:
	cmpl	$6, -20(%rbp)
	jle	.L3
	movq	-16(%rbp), %rax
	movq	%rax, %rdi
	call	Linkqueue_show
	movl	$3, -20(%rbp)
	jmp	.L4
.L5:
	leaq	-24(%rbp), %rdx
	movq	-16(%rbp), %rax
	movq	%rdx, %rsi
	movq	%rax, %rdi
	call	DeLinkqueue
	movl	-24(%rbp), %eax
	movl	%eax, %esi
	leaq	.LC0(%rip), %rdi
	movl	$0, %eax
	call	printf@PLT
.L4:
	movl	-20(%rbp), %eax
	leal	-1(%rax), %edx
	movl	%edx, -20(%rbp)
	testl	%eax, %eax
	jne	.L5
	movl	$10, %edi
	call	putchar@PLT
	movq	-16(%rbp), %rax
	movq	%rax, %rdi
	call	Linkqueue_show
	movq	-16(%rbp), %rax
	movq	%rax, %rdi
	call	ClearLinkqueue
	movq	-16(%rbp), %rax
	movq	%rax, %rdi
	call	EmptyLinkqueue
	testl	%eax, %eax
	je	.L6
	leaq	.LC1(%rip), %rdi
	call	puts@PLT
.L6:
	movq	-16(%rbp), %rax
	movq	%rax, %rdi
	call	DestroyLinkqueue
	leaq	.LC2(%rip), %rdi
	call	puts@PLT
	movl	$0, %eax
	movq	-8(%rbp), %rcx
	xorq	%fs:40, %rcx
	je	.L8
	call	__stack_chk_fail@PLT
.L8:
	leave
	.cfi_def_cfa 7, 8
	ret
	.cfi_endproc
.LFE5:
	.size	main, .-main
	.globl	CreateEmptyLinkqueue
	.type	CreateEmptyLinkqueue, @function
CreateEmptyLinkqueue:
.LFB6:
	.cfi_startproc
	pushq	%rbp
	.cfi_def_cfa_offset 16
	.cfi_offset 6, -16
	movq	%rsp, %rbp
	.cfi_def_cfa_register 6
	subq	$16, %rsp
	movl	$16, %edi
	call	malloc@PLT
	movq	%rax, -8(%rbp)
	cmpq	$0, -8(%rbp)
	jne	.L10
	movl	$0, %eax
	jmp	.L11
.L10:
	movl	$16, %edi
	call	malloc@PLT
	movq	%rax, %rdx
	movq	-8(%rbp), %rax
	movq	%rdx, 8(%rax)
	movq	-8(%rbp), %rax
	movq	8(%rax), %rdx
	movq	-8(%rbp), %rax
	movq	%rdx, (%rax)
	movq	-8(%rbp), %rax
	movq	(%rax), %rax
	testq	%rax, %rax
	jne	.L12
	movl	$0, %eax
	jmp	.L11
.L12:
	movq	-8(%rbp), %rax
	movq	(%rax), %rax
	movq	$0, 8(%rax)
	movq	-8(%rbp), %rax
.L11:
	leave
	.cfi_def_cfa 7, 8
	ret
	.cfi_endproc
.LFE6:
	.size	CreateEmptyLinkqueue, .-CreateEmptyLinkqueue
	.globl	EmptyLinkqueue
	.type	EmptyLinkqueue, @function
EmptyLinkqueue:
.LFB7:
	.cfi_startproc
	pushq	%rbp
	.cfi_def_cfa_offset 16
	.cfi_offset 6, -16
	movq	%rsp, %rbp
	.cfi_def_cfa_register 6
	movq	%rdi, -8(%rbp)
	cmpq	$0, -8(%rbp)
	jne	.L14
	movl	$-1, %eax
	jmp	.L15
.L14:
	movq	-8(%rbp), %rax
	movq	(%rax), %rdx
	movq	-8(%rbp), %rax
	movq	8(%rax), %rax
	cmpq	%rax, %rdx
	sete	%al
	movzbl	%al, %eax
.L15:
	popq	%rbp
	.cfi_def_cfa 7, 8
	ret
	.cfi_endproc
.LFE7:
	.size	EmptyLinkqueue, .-EmptyLinkqueue
	.globl	EnLinkqueue
	.type	EnLinkqueue, @function
EnLinkqueue:
.LFB8:
	.cfi_startproc
	pushq	%rbp
	.cfi_def_cfa_offset 16
	.cfi_offset 6, -16
	movq	%rsp, %rbp
	.cfi_def_cfa_register 6
	subq	$32, %rsp
	movq	%rdi, -24(%rbp)
	movl	%esi, -28(%rbp)
	cmpq	$0, -24(%rbp)
	jne	.L17
	movl	$-1, %eax
	jmp	.L18
.L17:
	movl	$16, %edi
	call	malloc@PLT
	movq	%rax, -8(%rbp)
	cmpq	$0, -8(%rbp)
	jne	.L19
	movl	$-1, %eax
	jmp	.L18
.L19:
	movq	-8(%rbp), %rax
	movl	-28(%rbp), %edx
	movl	%edx, (%rax)
	movq	-8(%rbp), %rax
	movq	$0, 8(%rax)
	movq	-24(%rbp), %rax
	movq	(%rax), %rax
	movq	8(%rax), %rax
	testq	%rax, %rax
	jne	.L20
	movq	-24(%rbp), %rax
	movq	-8(%rbp), %rdx
	movq	%rdx, 8(%rax)
	movq	-24(%rbp), %rax
	movq	(%rax), %rax
	movq	-24(%rbp), %rdx
	movq	8(%rdx), %rdx
	movq	%rdx, 8(%rax)
	jmp	.L21
.L20:
	movq	-24(%rbp), %rax
	movq	8(%rax), %rax
	movq	-8(%rbp), %rdx
	movq	%rdx, 8(%rax)
	movq	-24(%rbp), %rax
	movq	-8(%rbp), %rdx
	movq	%rdx, 8(%rax)
.L21:
	movl	$0, %eax
.L18:
	leave
	.cfi_def_cfa 7, 8
	ret
	.cfi_endproc
.LFE8:
	.size	EnLinkqueue, .-EnLinkqueue
	.globl	DeLinkqueue
	.type	DeLinkqueue, @function
DeLinkqueue:
.LFB9:
	.cfi_startproc
	pushq	%rbp
	.cfi_def_cfa_offset 16
	.cfi_offset 6, -16
	movq	%rsp, %rbp
	.cfi_def_cfa_register 6
	subq	$32, %rsp
	movq	%rdi, -24(%rbp)
	movq	%rsi, -32(%rbp)
	cmpq	$0, -24(%rbp)
	je	.L23
	movq	-24(%rbp), %rax
	movq	(%rax), %rax
	movq	8(%rax), %rax
	testq	%rax, %rax
	jne	.L24
.L23:
	movl	$-1, %eax
	jmp	.L25
.L24:
	movq	-24(%rbp), %rax
	movq	(%rax), %rax
	movq	8(%rax), %rax
	movq	%rax, -8(%rbp)
	movq	-24(%rbp), %rax
	movq	(%rax), %rax
	movq	-8(%rbp), %rdx
	movq	8(%rdx), %rdx
	movq	%rdx, 8(%rax)
	cmpq	$0, -32(%rbp)
	je	.L26
	movq	-8(%rbp), %rax
	movl	(%rax), %edx
	movq	-32(%rbp), %rax
	movl	%edx, (%rax)
.L26:
	movq	-8(%rbp), %rax
	movq	%rax, %rdi
	call	free@PLT
	movl	$0, %eax
.L25:
	leave
	.cfi_def_cfa 7, 8
	ret
	.cfi_endproc
.LFE9:
	.size	DeLinkqueue, .-DeLinkqueue
	.globl	Linkqueue_show
	.type	Linkqueue_show, @function
Linkqueue_show:
.LFB10:
	.cfi_startproc
	pushq	%rbp
	.cfi_def_cfa_offset 16
	.cfi_offset 6, -16
	movq	%rsp, %rbp
	.cfi_def_cfa_register 6
	subq	$32, %rsp
	movq	%rdi, -24(%rbp)
	movq	-24(%rbp), %rax
	movq	(%rax), %rax
	testq	%rax, %rax
	je	.L29
	movq	-24(%rbp), %rax
	movq	(%rax), %rax
	movq	8(%rax), %rax
	movq	%rax, -8(%rbp)
	jmp	.L29
.L30:
	movq	-8(%rbp), %rax
	movl	(%rax), %eax
	movl	%eax, %esi
	leaq	.LC0(%rip), %rdi
	movl	$0, %eax
	call	printf@PLT
	movq	-8(%rbp), %rax
	movq	8(%rax), %rax
	movq	%rax, -8(%rbp)
.L29:
	cmpq	$0, -8(%rbp)
	jne	.L30
	movl	$10, %edi
	call	putchar@PLT
	nop
	leave
	.cfi_def_cfa 7, 8
	ret
	.cfi_endproc
.LFE10:
	.size	Linkqueue_show, .-Linkqueue_show
	.globl	ClearLinkqueue
	.type	ClearLinkqueue, @function
ClearLinkqueue:
.LFB11:
	.cfi_startproc
	pushq	%rbp
	.cfi_def_cfa_offset 16
	.cfi_offset 6, -16
	movq	%rsp, %rbp
	.cfi_def_cfa_register 6
	subq	$32, %rsp
	movq	%rdi, -24(%rbp)
	jmp	.L32
.L33:
	movq	-24(%rbp), %rax
	movq	(%rax), %rax
	movq	%rax, -8(%rbp)
	movq	-8(%rbp), %rax
	movq	8(%rax), %rdx
	movq	-24(%rbp), %rax
	movq	%rdx, (%rax)
	movq	-8(%rbp), %rax
	movq	%rax, %rdi
	call	free@PLT
.L32:
	movq	-24(%rbp), %rax
	movq	(%rax), %rax
	testq	%rax, %rax
	jne	.L33
	movq	-24(%rbp), %rax
	movq	$0, 8(%rax)
	nop
	leave
	.cfi_def_cfa 7, 8
	ret
	.cfi_endproc
.LFE11:
	.size	ClearLinkqueue, .-ClearLinkqueue
	.globl	DestroyLinkqueue
	.type	DestroyLinkqueue, @function
DestroyLinkqueue:
.LFB12:
	.cfi_startproc
	pushq	%rbp
	.cfi_def_cfa_offset 16
	.cfi_offset 6, -16
	movq	%rsp, %rbp
	.cfi_def_cfa_register 6
	subq	$16, %rsp
	movq	%rdi, -8(%rbp)
	cmpq	$0, -8(%rbp)
	je	.L36
	movq	-8(%rbp), %rax
	movq	%rax, %rdi
	call	ClearLinkqueue
	movq	-8(%rbp), %rax
	movq	%rax, %rdi
	call	free@PLT
.L36:
	nop
	leave
	.cfi_def_cfa 7, 8
	ret
	.cfi_endproc
.LFE12:
	.size	DestroyLinkqueue, .-DestroyLinkqueue
	.ident	"GCC: (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0"
	.section	.note.GNU-stack,"",@progbits

【main-Os.s】

	.file	"main.c"
	.text
	.globl	CreateEmptyLinkqueue
	.type	CreateEmptyLinkqueue, @function
CreateEmptyLinkqueue:
.LFB26:
	.cfi_startproc
	pushq	%rbx
	.cfi_def_cfa_offset 16
	.cfi_offset 3, -16
	movl	$16, %edi
	call	malloc@PLT
	testq	%rax, %rax
	jne	.L2
.L4:
	xorl	%ebx, %ebx
	jmp	.L1
.L2:
	movl	$16, %edi
	movq	%rax, %rbx
	call	malloc@PLT
	testq	%rax, %rax
	movq	%rax, 8(%rbx)
	movq	%rax, (%rbx)
	je	.L4
	movq	$0, 8(%rax)
.L1:
	movq	%rbx, %rax
	popq	%rbx
	.cfi_def_cfa_offset 8
	ret
	.cfi_endproc
.LFE26:
	.size	CreateEmptyLinkqueue, .-CreateEmptyLinkqueue
	.globl	EmptyLinkqueue
	.type	EmptyLinkqueue, @function
EmptyLinkqueue:
.LFB27:
	.cfi_startproc
	orl	$-1, %eax
	testq	%rdi, %rdi
	je	.L10
	movq	8(%rdi), %rax
	cmpq	%rax, (%rdi)
	sete	%al
	movzbl	%al, %eax
.L10:
	ret
	.cfi_endproc
.LFE27:
	.size	EmptyLinkqueue, .-EmptyLinkqueue
	.globl	EnLinkqueue
	.type	EnLinkqueue, @function
EnLinkqueue:
.LFB28:
	.cfi_startproc
	testq	%rdi, %rdi
	je	.L25
	pushq	%rbp
	.cfi_def_cfa_offset 16
	.cfi_offset 6, -16
	pushq	%rbx
	.cfi_def_cfa_offset 24
	.cfi_offset 3, -24
	movq	%rdi, %rbx
	movl	$16, %edi
	movl	%esi, %ebp
	subq	$8, %rsp
	.cfi_def_cfa_offset 32
	call	malloc@PLT
	testq	%rax, %rax
	jne	.L26
	orl	$-1, %eax
	jmp	.L13
.L26:
	movq	(%rbx), %rdx
	movq	$0, 8(%rax)
	movl	%ebp, (%rax)
	cmpq	$0, 8(%rdx)
	jne	.L17
	movq	%rax, 8(%rbx)
	movq	%rax, 8(%rdx)
	jmp	.L24
.L17:
	movq	8(%rbx), %rdx
	movq	%rax, 8(%rdx)
	movq	%rax, 8(%rbx)
.L24:
	xorl	%eax, %eax
.L13:
	popq	%rdx
	.cfi_def_cfa_offset 24
	popq	%rbx
	.cfi_def_cfa_offset 16
	popq	%rbp
	.cfi_def_cfa_offset 8
	ret
.L25:
	.cfi_restore 3
	.cfi_restore 6
	orl	$-1, %eax
	ret
	.cfi_endproc
.LFE28:
	.size	EnLinkqueue, .-EnLinkqueue
	.globl	DeLinkqueue
	.type	DeLinkqueue, @function
DeLinkqueue:
.LFB29:
	.cfi_startproc
	orl	$-1, %eax
	testq	%rdi, %rdi
	je	.L36
	movq	(%rdi), %rdx
	movq	8(%rdx), %rdi
	testq	%rdi, %rdi
	je	.L36
	subq	$8, %rsp
	.cfi_def_cfa_offset 16
	movq	8(%rdi), %rax
	testq	%rsi, %rsi
	movq	%rax, 8(%rdx)
	je	.L29
	movl	(%rdi), %eax
	movl	%eax, (%rsi)
.L29:
	call	free@PLT
	xorl	%eax, %eax
	popq	%rdx
	.cfi_def_cfa_offset 8
	ret
.L36:
	ret
	.cfi_endproc
.LFE29:
	.size	DeLinkqueue, .-DeLinkqueue
	.section	.rodata.str1.1,"aMS",@progbits,1
.LC0:
	.string	"%d "
	.text
	.globl	Linkqueue_show
	.type	Linkqueue_show, @function
Linkqueue_show:
.LFB30:
	.cfi_startproc
	pushq	%rbp
	.cfi_def_cfa_offset 16
	.cfi_offset 6, -16
	pushq	%rbx
	.cfi_def_cfa_offset 24
	.cfi_offset 3, -24
	subq	$8, %rsp
	.cfi_def_cfa_offset 32
	movq	(%rdi), %rax
	testq	%rax, %rax
	je	.L40
	movq	8(%rax), %rbx
.L40:
	leaq	.LC0(%rip), %rbp
.L41:
	testq	%rbx, %rbx
	je	.L47
	movl	(%rbx), %edx
	movq	%rbp, %rsi
	movl	$1, %edi
	xorl	%eax, %eax
	call	__printf_chk@PLT
	movq	8(%rbx), %rbx
	jmp	.L41
.L47:
	popq	%rax
	.cfi_def_cfa_offset 24
	popq	%rbx
	.cfi_def_cfa_offset 16
	popq	%rbp
	.cfi_def_cfa_offset 8
	movl	$10, %edi
	jmp	putchar@PLT
	.cfi_endproc
.LFE30:
	.size	Linkqueue_show, .-Linkqueue_show
	.globl	ClearLinkqueue
	.type	ClearLinkqueue, @function
ClearLinkqueue:
.LFB31:
	.cfi_startproc
	pushq	%rbx
	.cfi_def_cfa_offset 16
	.cfi_offset 3, -16
	movq	%rdi, %rbx
.L49:
	movq	(%rbx), %rdi
	testq	%rdi, %rdi
	je	.L52
	movq	8(%rdi), %rax
	movq	%rax, (%rbx)
	call	free@PLT
	jmp	.L49
.L52:
	movq	$0, 8(%rbx)
	popq	%rbx
	.cfi_def_cfa_offset 8
	ret
	.cfi_endproc
.LFE31:
	.size	ClearLinkqueue, .-ClearLinkqueue
	.globl	DestroyLinkqueue
	.type	DestroyLinkqueue, @function
DestroyLinkqueue:
.LFB32:
	.cfi_startproc
	testq	%rdi, %rdi
	je	.L53
	pushq	%rbx
	.cfi_def_cfa_offset 16
	.cfi_offset 3, -16
	movq	%rdi, %rbx
	call	ClearLinkqueue
	movq	%rbx, %rdi
	popq	%rbx
	.cfi_restore 3
	.cfi_def_cfa_offset 8
	jmp	free@PLT
.L53:
	ret
	.cfi_endproc
.LFE32:
	.size	DestroyLinkqueue, .-DestroyLinkqueue
	.section	.rodata.str1.1
.LC1:
	.string	"The lqueue is empty!"
.LC2:
	.string	"The lqueue is destroyed!"
	.section	.text.startup,"ax",@progbits
	.globl	main
	.type	main, @function
main:
.LFB25:
	.cfi_startproc
	pushq	%r12
	.cfi_def_cfa_offset 16
	.cfi_offset 12, -16
	pushq	%rbp
	.cfi_def_cfa_offset 24
	.cfi_offset 6, -24
	movl	$1, %ebp
	pushq	%rbx
	.cfi_def_cfa_offset 32
	.cfi_offset 3, -32
	subq	$16, %rsp
	.cfi_def_cfa_offset 48
	movq	%fs:40, %rax
	movq	%rax, 8(%rsp)
	xorl	%eax, %eax
	call	CreateEmptyLinkqueue
	movq	%rax, %rbx
.L59:
	movl	%ebp, %esi
	movq	%rbx, %rdi
	incl	%ebp
	call	EnLinkqueue
	cmpl	$7, %ebp
	jne	.L59
	leaq	4(%rsp), %r12
	movq	%rbx, %rdi
	movl	$4, %ebp
	call	Linkqueue_show
.L60:
	decl	%ebp
	je	.L69
	movq	%r12, %rsi
	movq	%rbx, %rdi
	call	DeLinkqueue
	movl	4(%rsp), %edx
	leaq	.LC0(%rip), %rsi
	movl	$1, %edi
	xorl	%eax, %eax
	call	__printf_chk@PLT
	jmp	.L60
.L69:
	movl	$10, %edi
	call	putchar@PLT
	movq	%rbx, %rdi
	call	Linkqueue_show
	movq	%rbx, %rdi
	call	ClearLinkqueue
	movq	%rbx, %rdi
	call	EmptyLinkqueue
	testl	%eax, %eax
	je	.L62
	leaq	.LC1(%rip), %rdi
	call	puts@PLT
.L62:
	movq	%rbx, %rdi
	call	DestroyLinkqueue
	leaq	.LC2(%rip), %rdi
	call	puts@PLT
	xorl	%eax, %eax
	movq	8(%rsp), %rcx
	xorq	%fs:40, %rcx
	je	.L63
	call	__stack_chk_fail@PLT
.L63:
	addq	$16, %rsp
	.cfi_def_cfa_offset 32
	popq	%rbx
	.cfi_def_cfa_offset 24
	popq	%rbp
	.cfi_def_cfa_offset 16
	popq	%r12
	.cfi_def_cfa_offset 8
	ret
	.cfi_endproc
.LFE25:
	.size	main, .-main
	.ident	"GCC: (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0"
	.section	.note.GNU-stack,"",@progbits

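The assembly shows that the change goes beyond size: many details have been optimized. For example, at -O0 every local variable and argument is stored to and reloaded from the stack frame (such as -8(%rbp) and -24(%rbp)), whereas at -Os values stay in registers such as %rbx and %rbp; movl $0, %eax becomes the shorter xorl %eax, %eax; and DestroyLinkqueue and Linkqueue_show end with a tail jump (jmp free@PLT, jmp putchar@PLT) instead of a call followed by ret.

For more on GCC's optimization options, see the official GCC manual.

Optimize-Options

This article analyzes gcc; arm-gcc behaves much the same, with only a few options differing between versions.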