Common g++ compile flags (continuously updated, with common build errors)

bostonAlen

Last modified 2024-01-23 14:50:41

Tags: linux centos c++

First published 2023-10-10 14:35:33

Article link: https://blog.csdn.net/BostonRayAlen/article/details/133746459

g++ -m64 -c -o *.o -g -Ofast -std=c++11 -mcx16 -m64 -maes -mfpmath=sse -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512vbmi -march=x86-64 -mcmodel=large -Wall -Wno-write-strings -fno-defer-pop -fsigned-char -pipe

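-m64: Generate code for a 64-bit environment (64-bit object files).
-c: Compile the source files to object files without linking. Commonly used to produce objects for a later link step.
-o *.o: Set the name of the output file. Here *.o is only a placeholder for the real object file name; -o takes one explicit name. If -o is omitted together with -c, GCC names the object after the source file (foo.cpp becomes foo.o).
-g: Emit debugging information into the object files so the program can be debugged.
-Ofast: Optimize aggressively for speed. Enables all -O3 optimizations plus some that are not standards-compliant, such as -ffast-math.
-std=c++11: Compile according to the C++11 standard.
-mcx16: Allow the compiler to use the CMPXCHG16B instruction (16-byte compare-and-exchange).
-maes: Enable the AES instruction set (AES-NI).
-mfpmath=sse: Use SSE instructions for scalar floating-point arithmetic. This is already the default on x86-64.
-mavx512f: Enable the AVX-512 Foundation instructions.
-mavx512dq: Enable the AVX-512 Doubleword and Quadword instructions.
-mavx512ifma: Enable the AVX-512 Integer Fused Multiply-Add instructions.
-mavx512cd: Enable the AVX-512 Conflict Detection instructions.
-mavx512bw: Enable the AVX-512 Byte and Word instructions.
-mavx512vl: Enable the AVX-512 Vector Length extensions, which let AVX-512 instructions operate on 128-bit and 256-bit vectors.
-mavx512vbmi: Enable the AVX-512 Vector Byte Manipulation Instructions.
-march=x86-64: Generate code for the generic x86-64 baseline architecture.
-mcmodel=large: Use the large code model, which makes no assumptions about the addresses and sizes of sections, for programs with very large code or data.
-Wall: Enable a broad set of commonly useful warnings. Despite the name it does not turn on every warning (-Wextra, for example, adds more).
-Wno-write-strings: Suppress the warning about converting a string literal to a non-const char*.
-fno-defer-pop: Pop the arguments of each function call as soon as the call returns instead of letting them accumulate on the stack. This turns off the -fdefer-pop optimization, so it is not a performance flag.
-fsigned-char: Treat plain char as a signed type.
-pipe: Use pipes rather than temporary files to pass data between the stages of compilation, which can speed up the build.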

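Each flag can be changed to suit your needs: for example, -m64 can become -m32, and -Ofast can become -O3. Only a small subset is listed here, but as I understand it these instruction-set flags are fairly commonly used.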
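To check which of these flags actually took effect in a given translation unit, one option is to look at the macros GCC predefines for them. The following is a minimal sketch; the function name build_flag_summary is hypothetical, while __x86_64__, __AES__, __AVX512F__ and __FAST_MATH__ are macros GCC defines when the corresponding option is active.

```cpp
#include <cassert>
#include <limits>
#include <string>

// Reports, for the current translation unit, which of the discussed options
// were active, based on macros that GCC predefines for them.
std::string build_flag_summary() {
    std::string out;
#ifdef __x86_64__            // defined when targeting x86-64 (e.g. -m64)
    out += "x86_64=1 ";
#else
    out += "x86_64=0 ";
#endif
#ifdef __AES__               // defined by -maes
    out += "aes=1 ";
#else
    out += "aes=0 ";
#endif
#ifdef __AVX512F__           // defined by -mavx512f
    out += "avx512f=1 ";
#else
    out += "avx512f=0 ";
#endif
#ifdef __FAST_MATH__         // defined by -ffast-math, which -Ofast enables
    out += "fast_math=1 ";
#else
    out += "fast_math=0 ";
#endif
    // -fsigned-char / -funsigned-char change the signedness of plain char.
    out += std::numeric_limits<char>::is_signed ? "signed_char=1"
                                                : "signed_char=0";
    assert(!out.empty());
    return out;
}
```

Printing the returned string from a test program built with and without a flag shows whether that flag reached the compiler.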

1. ***.o:(debug_info+0x1e6026): relocation truncated to fit: R_X86_64_32 against '.debug_loc'
Fix: add the -gno-variable-location-views flag.

2. error: invalid conversion from 'long int' to 'U8*' {aka 'unsigned char*'} [-fpermissive]
Fix: add -fpermissive, which downgrades this class of error to a warning.
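If the source can be changed, an explicit cast is an alternative to -fpermissive. This is a minimal sketch, assuming U8 is a typedef for unsigned char (as the error message suggests) and that the integer really holds an address; the helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

using U8 = unsigned char;  // assumed to match the U8 typedef from the error message

// Turns an integer holding an address into a U8* with an explicit cast,
// so the conversion compiles without -fpermissive.
inline U8* address_to_u8_ptr(std::uintptr_t address) {
    return reinterpret_cast<U8*>(address);
}

// Inverse conversion: pointer to integer, also explicit.
inline std::uintptr_t u8_ptr_to_address(const U8* ptr) {
    return reinterpret_cast<std::uintptr_t>(ptr);
}
```

std::uintptr_t is used instead of long because it is defined to be wide enough to hold a pointer value.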

3. AttributeError: module 'pip' has no attribute 'locations' when building libyang (#13)
https://github.com/YangCatalog/yang-validator-extractor/issues/13
Change the code to:
location = None
try:
    # older pip releases expose the module as pip.locations
    import pip.locations as locations
    location = locations.distutils_scheme('pyang')
except Exception:
    try:
        # newer pip releases moved it to pip._internal.locations
        import pip._internal.locations as locations
        location = locations.distutils_scheme('pyang')
    except Exception:
        pass
if location is not None:
    self.dirs.append(os.path.join(location['data'],
                                  'share', 'yang', 'modules'))

4. The Python 3 build did not produce libpython3.6m.so.1.0. Configure with --enable-shared so that the shared library is built:

../configure --enable-shared  --enable-profiling --enable-optimizations

5. docker pull fails: configure a proxy
[root@QCl2opt build]# docker pull hello-world
Using default tag: latest
Error response from daemon: Get "https://registry-1.docker.io/v2/": proxyconnect tcp: tls: first record does not look like a TLS handshake

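Fix: create a systemd drop-in file with the proxy settings, then reload systemd and restart Docker:
mkdir -p /etc/systemd/system/docker.service.d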
vi /etc/systemd/system/docker.service.d/proxy.conf
 
[Service]
 
Environment="HTTP_PROXY=http://10.71.132.38:80" 
Environment="NO_PROXY=localhost,127.0.0.0/8,10.0.0.0/8,192.168.0.0/16,172.16.0.0/12"
Environment="HTTPS_PROXY=http://10.71.132.38:80"
 
systemctl daemon-reload
systemctl restart docker

6. Building tcmalloc with bazel fails on Red Hat 8.
https://github.com/google/tcmalloc

[root@QCl2opt tcmalloc]# python
Python 3.6.6 (default, Nov 13 2023, 01:01:46)
[GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from importlib.resources import read_binary
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ModuleNotFoundError: No module named 'importlib.resources'
>>> import importlib.resources
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ModuleNotFoundError: No module named 'importlib.resources'

Yet pip reports the module as already installed:

[root@QCl2opt tcmalloc]# pip install importlib.resources
Requirement already satisfied: importlib.resources in /usr/local/lib/python3.6/site-packages (5.4.0)
Requirement already satisfied: zipp>=3.1.0 in /usr/local/lib/python3.6/site-packages (from importlib.resources) (3.6.0)

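The cause is that the Python version is too old. importlib.resources was added to the standard library in Python 3.7, so it does not exist in Python 3.6. What pip found above is the backport: pip normalizes the name importlib.resources to importlib-resources, and that package is imported as importlib_resources, not as importlib.resources.

To solve this, you can upgrade to Python 3.7 or later, or use importlib_resources in place of importlib.resources. importlib_resources is a Python package that can be used on Python 2.7, 3.4, 3.5 and 3.6 and provides the same API as importlib.resources.

Install importlib_resources and use it to read binary files:

pip install importlib_resources

Then replace the import with:

from importlib_resources import read_binary

After that, the next error appeared:

rules_python build error in bazel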

rules_python/python/pip_install/pip_repository.bzl", line 241, column 13, in _pip_repository_impl
                fail("rules_python failed: %s (%s)" % (result.stdout, result.stderr))
Error in fail: rules_python failed:  (Traceback (most recent call last):

pip_repository.bzl contains the following; line 241 is the fail() call:
    result = rctx.execute(
        args,
        # Manually construct the PYTHONPATH since we cannot use the toolchain here
        environment = _create_repository_execution_environment(rctx),
        timeout = rctx.attr.timeout,
        quiet = rctx.attr.quiet,
    )

    if result.return_code:
        fail("rules_python failed: %s (%s)" % (result.stdout, result.stderr))

Upgrading Python from 3.6 to 3.8 fixed it.

# remove 3.6 (caution: this deletes every path that whereis reports)
whereis python3 |xargs rm -frv
whereis pip
rm /usr/local/bin/pip
# build and install 3.8
wget https://www.python.org/ftp/python/3.8.1/Python-3.8.1.tgz
tar -xf Python-3.8.1.tgz
cd Python-3.8.1/
mkdir build
cd build/
# --enable-shared builds the shared library (libpython3.8.so)
../configure --enable-shared --enable-profiling --enable-optimizations
make -j8
make altinstall
ln -s /usr/local/bin/python3.8 /usr/bin/python3
# make altinstall installs pip as pip3.8, not pip3
ln -s /usr/local/bin/pip3.8 /usr/local/bin/pip
# if the Python .so cannot be found: it is installed under this path by default
export LD_LIBRARY_PATH=/usr/local/lib/:$LD_LIBRARY_PATH

Then the next error appeared:

[root@QCl2opt tcmalloc]# bazel test //tcmalloc/...
INFO: Analyzed 924 targets (0 packages loaded, 0 targets configured).
INFO: Found 125 targets and 799 test targets...
ERROR: /root/tcmalloc-4/tcmalloc-master/tcmalloc/internal/BUILD:552:11: Compiling tcmalloc/internal/percpu_rseq_asm.S failed: (Exit 1): gcc failed: error executing command (from target //tcmalloc/internal:percpu) /usr/bin/gcc -U_FORTIFY_SOURCE -fstack-protector -Wall -Wunused-but-set-parameter -Wno-free-nonheap-object -fno-omit-frame-pointer -MD -MF ... (remaining 28 arguments skipped)

Use --sandbox_debug to see verbose messages from the sandbox and retain the sandbox build root for debugging
tcmalloc/internal/percpu_rseq_asm.S: Assembler messages:
tcmalloc/internal/percpu_rseq_asm.S:33: Error: junk at end of line, first unrecognized character is `,'
INFO: Elapsed time: 2.094s, Critical Path: 1.46s

7. libssh 0.7.5: build error in ge25519.h
CMakeFiles/ssh_shared.dir/external/ge25519.c.o:/root/qc_int_tcmalloc_repo/5g_platform/build/staging/du/x86/libssh/libssh-0.7.5/include/libssh/ge25519.h:31: multiple definition of `ge25519_base'
CMakeFiles/ssh_shared.dir/external/ed25519.c.o:/root/qc_int_tcmalloc_repo/5g_platform/build/staging/du/x86/libssh/libssh-0.7.5/include/libssh/ge25519.h:31: first defined here
collect2: error: ld returned 1 exit status
make[2]: *** [src/CMakeFiles/ssh_shared.dir/build.make:898: src/libssh.so.4.4.2] Error 1
make[1]: *** [CMakeFiles/Makefile2:148: src/CMakeFiles/ssh_shared.dir/all] Error 2
make: *** [Makefile:156: all] Error 2

Fix: in ge25519.h, add extern to the declaration so that the header only declares the object:

extern const ge25519 ge25519_base;
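The idea behind the fix is the usual "declare in the header, define in exactly one source file" pattern. Below is a minimal sketch of that pattern in C++ (libssh itself is C); the type Point, the object kBasePoint and the file names in the comments are hypothetical and are shown in one block only for brevity.

```cpp
#include <cassert>

// --- would live in a header, e.g. a hypothetical point.h ---
struct Point {
    int x;
    int y;
};
// Declaration only: every file that includes the header sees the name,
// but no storage is allocated here.
extern const Point kBasePoint;

// --- would live in exactly one source file, e.g. a hypothetical point.cpp ---
// The single definition that the linker resolves all references to.
const Point kBasePoint = {3, 4};

// Any translation unit that includes the header can use the object.
int base_point_sum() {
    return kBasePoint.x + kBasePoint.y;
}
```

Without extern, each source file including the header can end up with its own definition of the object, which is what the "multiple definition" link error reports.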

8. TLS transition link error: add the -fPIC flag
/usr/bin/ld: ./obj/*.o: TLS transition from R_X86_64_GOTTPOFF to R_X86_64_TPOFF32 against `REGIONID' at 0xa9f4 in section `.text' failed
/usr/bin/ld: final link failed: Bad value
collect2: error: ld returned 1 exit status
make: *** [makefile:38: du_app] Error 1

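9. GCC optimization

-march=native enables instructions specific to the CPU of the build machine, and those instructions may not exist on other CPUs. If you run the program on a system with a different CPU, it may not work at all, or may be considerably slower (because it also enables -mtune=native), so keep this in mind if you decide to use it.
-Ofast, as you mentioned, enables some optimizations that are not standards-compliant, so it should also be used with caution.
Other GCC flags to try

Details of the individual flags can be found in the GCC manual.

-Ofast enables -ffast-math, which in turn enables -fno-math-errno, -funsafe-math-optimizations, -ffinite-math-only, -fno-rounding-math, -fno-signaling-nans and -fcx-limited-range. Flags such as -fno-signed-zeros and -fno-trapping-math are themselves implied by -funsafe-math-optimizations, so they are already active under -Ofast; they are mainly useful when added selectively at -O2 or -O3 to get part of the floating-point speedup without all of -ffast-math. Either way, you must check that they really benefit you and do not break any calculation.
GCC also has a large number of other optimization flags that are not enabled by any "-O" option. They are listed as "experimental options that may produce broken code", so they should be used with caution, and their effect checked by testing for correctness and by benchmarking. Nevertheless, I often use -frename-registers; this option has never produced bad results for me and tends to give a noticeable performance gain (that is, one that can be measured in a benchmark). It is the kind of flag that depends heavily on your processor. -funroll-loops sometimes gives good results too (and also implies -frename-registers), but it depends on your actual code.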
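As an illustration of why the -ffast-math family needs care, here is a minimal sketch of code whose behavior can change under -ffinite-math-only. The function name sum_ignoring_nan is hypothetical. With that flag GCC is allowed to assume NaN never occurs, so the std::isnan check may be optimized away; built without it, the function behaves as written.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Sums the values while skipping NaN entries.
// Under -Ofast / -ffast-math, -ffinite-math-only lets the compiler assume
// that NaN never occurs, so the std::isnan test below may be removed and
// a NaN in the input would then poison the total.
double sum_ignoring_nan(const std::vector<double>& values) {
    double total = 0.0;
    for (double v : values) {
        if (std::isnan(v)) {
            continue;  // skip invalid samples
        }
        total += v;
    }
    return total;
}
```

Comparing the result for an input that contains a NaN, once at -O2 and once at -Ofast, is a quick way to see whether code like this is affected.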
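PGO

GCC supports Profile-Guided Optimisations. Although GCC does not have much precise documentation on this feature, getting it to run is very simple.

First compile your program with -fprofile-generate.
Run the program (execution will be significantly slower, because the code also writes profiling information into .gcda files).
Recompile the program with -fprofile-use. If your application is multi-threaded, also add the -fprofile-correction flag.
PGO with GCC can give amazing results and really significantly boost performance (I saw a 15-20% speed increase on a project I recently worked on). Obviously, the catch is having data that is sufficiently representative of how your application executes, which is not always available or easy to obtain.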
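The three steps above can be sketched on a tiny example. The function count_above and the file and binary names in the comments are hypothetical; the flags are the ones named in the text. A data-dependent branch like the one below is the kind of code where a profile gives the compiler information it cannot derive statically.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Build steps, shown as comments (file and binary names are hypothetical):
//   g++ -O2 -fprofile-generate classify.cpp -o classify   // 1. instrumented build
//   ./classify                                            // 2. run on representative input, writes .gcda
//   g++ -O2 -fprofile-use classify.cpp -o classify        // 3. rebuild using the recorded profile

// Counts how many values exceed the threshold. How often the branch is
// taken depends entirely on the input data, which is what the profile records.
std::size_t count_above(const std::vector<int>& values, int threshold) {
    std::size_t count = 0;
    for (int v : values) {
        if (v > threshold) {
            ++count;
        }
    }
    return count;
}
```

The run in step 2 should use input that resembles production data, otherwise the recorded branch frequencies can mislead the optimizer.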

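GCC's parallel mode

GCC has a parallel mode, which was first released with the GCC 4.2 compiler.

Basically, it gives you parallel implementations of many of the C++ standard library algorithms. To enable them globally, you only need to add the -fopenmp and -D_GLIBCXX_PARALLEL flags to the compiler. You can also enable each algorithm selectively where needed, but that requires some minor code changes.

If you use these algorithms often on large data structures and have many hardware thread contexts available, the parallel implementations can give a large performance boost. So far I have only used the parallel implementation of sort, but to give a rough idea, I managed to cut the sort time from 14 seconds to 4 seconds; the test setup was a vector of 1 million objects with a custom comparator function, on 8 cores.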
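A minimal sketch of the global variant: the code is ordinary std::sort with a custom comparator, and only the build flags change. The Record type and the function names are hypothetical. Built with -fopenmp -D_GLIBCXX_PARALLEL, libstdc++ dispatches the std::sort call to its parallel implementation; built without them, the same code runs sequentially.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical record type standing in for the "objects" being sorted.
struct Record {
    int key;
    double payload;
};

// Custom comparator: orders records by key.
inline bool by_key(const Record& a, const Record& b) {
    return a.key < b.key;
}

// Plain std::sort call; no source change is needed for parallel mode.
// Compile with: g++ -O2 -fopenmp -D_GLIBCXX_PARALLEL ...
void sort_records(std::vector<Record>& records) {
    std::sort(records.begin(), records.end(), by_key);
}
```

For the selective variant, libstdc++ provides the parallel algorithms in the __gnu_parallel namespace (header <parallel/algorithm>), which is the "minor code change" the text refers to.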

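Extra tricks
Unlike the previous sections, this part requires some small changes to the code. The techniques are also GCC-specific (some of them work on Clang as well), so compile-time macros should be used to keep the code portable to other compilers. This section covers some more advanced techniques that you should not use without some assembly-level understanding of what is going on. Note also that processors and compilers are very smart nowadays, so it may be hard to get any noticeable benefit from the functions described here.
GCC builtin functions (documented in the GCC manual). Constructs like __builtin_expect can help the compiler optimize better by giving it branch prediction information. Other constructs, such as __builtin_prefetch, bring data into the cache before it is accessed to help reduce cache misses.
Function attributes (also documented in the GCC manual). In particular, look at the hot and cold attributes: the former tells the compiler that the function is a hot spot of the program, so it optimizes the function more aggressively and places it in a special subsection of the text section for better locality; the latter optimizes the function for size and places it in another special subsection of the text section.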
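The builtins and attributes above can be combined behind portability macros, as the text recommends. This is a minimal sketch: the macro names (LIKELY, UNLIKELY, PREFETCH, HOT_FN, COLD_FN) and both functions are hypothetical, and the prefetch distance of 8 elements is an arbitrary illustrative choice, not a tuned value.

```cpp
#include <cassert>
#include <cstddef>

// Portability guards: use the GCC/Clang extensions when available,
// otherwise fall back to plain code.
#if defined(__GNUC__) || defined(__clang__)
#define LIKELY(x)   __builtin_expect(!!(x), 1)
#define UNLIKELY(x) __builtin_expect(!!(x), 0)
#define PREFETCH(p) __builtin_prefetch(p)
#define HOT_FN      __attribute__((hot))
#define COLD_FN     __attribute__((cold))
#else
#define LIKELY(x)   (x)
#define UNLIKELY(x) (x)
#define PREFETCH(p) ((void)0)
#define HOT_FN
#define COLD_FN
#endif

// Rarely executed path: negative entries are treated as zero.
COLD_FN long handle_negative(int /*value*/) {
    return 0;
}

// Hot path: sums the array, treating negative entries as rare.
HOT_FN long sum_with_hints(const int* data, std::size_t n) {
    assert(data != nullptr || n == 0);
    long total = 0;
    for (std::size_t i = 0; i < n; ++i) {
        if (i + 8 < n) {
            PREFETCH(&data[i + 8]);  // ask for a later element ahead of its use
        }
        if (UNLIKELY(data[i] < 0)) {
            total += handle_negative(data[i]);
        } else {
            total += data[i];
        }
    }
    return total;
}
```

For a simple linear scan like this the hardware prefetcher usually does the job already, so the sketch only shows the mechanics; any real gain has to be confirmed with a benchmark.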