A Complete Guide to Rebuilding GPFS After a Kernel Upgrade on CentOS 7.9

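Background:

The original GPFS (General Parallel File System) deployment had run for four years with essentially no problems. A recent application environment upgrade moved the CentOS 7.9 kernel from 3.10.0-1160.el7.x86_64 to 3.10.0-1160.119.1.el7.x86_64. As a result the IB (InfiniBand) NIC driver also had to be upgraded, and GPFS failed to start. After the IB driver was upgraded, rebuilding the GPFS kernel module kept failing, so GPFS could not be brought back up.

OS version: CentOS 7.9

Kernel: 3.10.0-1160.119.1.el7.x86_64

GPFS: 5.0.5.4

Error output: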

mmbuildgpl: Building GPL (5.0.5.4) module begins at Tue May 13 13:15:50 CST 2025.
--------------------------------------------------------
/usr/lpp/mmfs/src/gpl-linux/cfiles_cust.o: warning: objtool: kxGanesha.cold()+0x0: frame pointer state mismatch
  CC [M]  /usr/lpp/mmfs/src/gpl-linux/ss_x86_64.o
  LD [M]  /usr/lpp/mmfs/src/gpl-linux/mmfslinux.o
  CC [M]  /usr/lpp/mmfs/src/gpl-linux/kdump-kern.o
  CC [M]  /usr/lpp/mmfs/src/gpl-linux/kdump-stub.o
  LD [M]  /usr/lpp/mmfs/src/gpl-linux/kdump-kern-dummy.o
  CC [M]  /usr/lpp/mmfs/src/gpl-linux/kdump-kern-dwarfs.o
  HOSTCC  /usr/lpp/mmfs/src/gpl-linux/lxtrace.o
  HOSTCC  /usr/lpp/mmfs/src/gpl-linux/lxtrace_rl.o
  HOSTCC  /usr/lpp/mmfs/src/gpl-linux/overwrite.o
  HOSTLD  /usr/lpp/mmfs/src/gpl-linux/lxtrace
  Building modules, stage 2.
  MODPOST 5 modules
  CC      /usr/lpp/mmfs/src/gpl-linux/kdump-kern-dummy.mod.o
  LD [M]  /usr/lpp/mmfs/src/gpl-linux/kdump-kern-dummy.ko
  CC      /usr/lpp/mmfs/src/gpl-linux/kdump-kern-dwarfs.mod.o
  LD [M]  /usr/lpp/mmfs/src/gpl-linux/kdump-kern-dwarfs.ko
  CC      /usr/lpp/mmfs/src/gpl-linux/mmfs26.mod.o
  LD [M]  /usr/lpp/mmfs/src/gpl-linux/mmfs26.ko
  CC      /usr/lpp/mmfs/src/gpl-linux/mmfslinux.mod.o
  LD [M]  /usr/lpp/mmfs/src/gpl-linux/mmfslinux.ko
  CC      /usr/lpp/mmfs/src/gpl-linux/tracedev.mod.o
  LD [M]  /usr/lpp/mmfs/src/gpl-linux/tracedev.ko
make[2]: Leaving directory `/usr/src/kernels/3.10.0-1160.119.1.el7.x86_64'
cc -g0 -O2 -fomit-frame-pointer -fno-strict-aliasing -fno-exceptions -fPIC -D_FORTIFY_SOURCE=2 -fstack-protector   -I.  -I/usr/lpp/mmfs/src/../include -I/usr/lpp/mmfs/src/include -I/usr/lpp/mmfs/src/include/cxi -I/usr/lpp/mmfs/src/include/gpl-linux -I/usr/include  -DGPFS_ARCH_X86_64 -D__64BIT__ -DGPFS_LITTLE_ENDIAN -DAPI_32BIT -DGPFS_IEEE754_FLOAT -D__USE_BSD -D_LARGEFILE64_SOURCE -DGPFS_LINUX -DYESSTR=__YESSTR -DNOSTR=__NOSTR -DSSEG_SWIZZLE_PTRS -DCTDB -DGANESHA -DFAST_BUFF_SCAN  -DP_NFS4    -DASYNC_PREFETCH -DRDMA_SUPPORT -DFAST_CONDVAR -DUSER_COUNTERS -DUC_RDMA_PERF  -DGNR_NO_SCSI -DENABLE_NVME_SUPPORT -DEXIST_NVME_H -DNSD_FAST_CONDVAR -DNSPD_FAST_CONDVAR -DDEBUG_MEM_LEAK  -DREDHAT_AS_LINUX  -DZLIB -DLWE_KAFKA -DLWE_QFI -DLARGE_MEM_POOL -DUSE_CLOCK_GETTIME -DLINUX_KERNEL_VERSION=31000999 -DLINUX_KERNEL_VERSION_VERBOSE=310001160119001  -DLIMIT_KSTACKS -DADMIN_PREOPEN_FILES -DKEY_PROTECT_ADMIN -DADMIN_SUEXEC_WRAPPER -DGPFS_CACHE_ASYNC -DSNAP_TYPE  -DPTH_DISABLE_CANCELLATION -DSANERGY -DUIDREMAP -DTRIGGERS -DFSCK_REPLICA_REPAIR -DCSTORE -DSQLITE_PRESENT -DGPFS_CRYPTO_KERNELPATH -DPARALLEL_LOGRECOVERY -DRGCM_BACKPORT -DRGCM -DMESTOR -DMESTOR_GEMS -DMESTOR_GEMS_SMP  -DNHIST -DOHIST -DMMPMON_EZSTATS  -DDISKMAN_PROXY -DDISKMAN_LMR  -DIEXPAND_V2 -DDYN_NSD_SERVER -DQOSIO -DQOSFSET -DVARIANT_SUBBLOCKS -DVARIANT_SUBBLOCKS_DEBUG -DGPFS_THIN_DISK -DGPFS_THIN_DISK_ADMIN -DGPFS_THIN_DISK_DEBUG -DGNR_TRIM_SUPPORT  -DRAPID_REPAIR_PLUS  -DCES -DADVLUM -DCAPACITY_PRICING -DTSCOMM_SECURITY_GSKIT -DDAEMON_NO_AUTH -DNO_AUTH_ADMIN -DNETWORK_DIAG  -DQUOTA_ONLINE_PERFILESET_CHANGE -DFILESET_COMPLIANCE_PLUS_SEMANTICS -DENC_KEY_PROTECT -DTM_DYN_MALLOC -DGEN_NODE_UID -DTRACK_USECOUNT -DUSE_UTF8SCAN_DIRSEARCH -DUSE_UTF8SCAN_FOLDNAME  -DENABLE_UTF8SCAN -DDISABLE_UTF8WIDE -DSMOOTH_BACKGROUND_SYNC -DNO_DEV_ADMIN -DNO_DEV_MMFS -DSYSLOG_SUPPORT_ADMIN -DSNAPSHOT_ILM  -DPCACHE_MONITORING  -DLROC  -DMUTEXHELD_CHECK -DPOLICY_PARTITIONS -DMAINT_MODE       -D"KBUILD_STR(s)=#s" 
-D"KBUILD_BASENAME=KBUILD_STR()" -fno-stack-protector -Wformat=0 -Wno-format-security -I/usr/lpp/mmfs/src/gpl-linux -c kdump.c
cc kdump.o kdump-kern.o kdump-kern-dwarfs.o -o kdump     -lpthread
/opt/rh/devtoolset-9/root/usr/libexec/gcc/x86_64-redhat-linux/9/ld: kdump-kern.o: in function `GetOffset':
kdump-kern.c:(.text+0x15): undefined reference to `__x86_return_thunk'
/opt/rh/devtoolset-9/root/usr/libexec/gcc/x86_64-redhat-linux/9/ld: kdump-kern.o: in function `KernInit':
kdump-kern.c:(.text+0x1a9): undefined reference to `__x86_return_thunk'
/opt/rh/devtoolset-9/root/usr/libexec/gcc/x86_64-redhat-linux/9/ld: kdump-kern.o: in function `GenericGet':
kdump-kern.c:(.text+0x34a): undefined reference to `__x86_return_thunk'
/opt/rh/devtoolset-9/root/usr/libexec/gcc/x86_64-redhat-linux/9/ld: kdump-kern.c:(.text+0x360): undefined reference to `__x86_return_thunk'
/opt/rh/devtoolset-9/root/usr/libexec/gcc/x86_64-redhat-linux/9/ld: kdump-kern.o: in function `tiInit':
kdump-kern.c:(.text+0x3bc): undefined reference to `__x86_return_thunk'
/opt/rh/devtoolset-9/root/usr/libexec/gcc/x86_64-redhat-linux/9/ld: kdump-kern.o:kdump-kern.c:(.text+0x445): more undefined references to `__x86_return_thunk' follow
collect2: error: ld returned 1 exit status
make[1]: *** [modules] Error 1
make[1]: Leaving directory `/usr/lpp/mmfs/src/gpl-linux'
make: *** [Modules] Error 1
--------------------------------------------------------
mmbuildgpl: Building GPL module failed at Tue May 13 13:16:05 CST 2025.
--------------------------------------------------------
mmbuildgpl: Command failed. Examine previous error messages to determine cause.
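
In a log this long it helps to reduce the linker messages to the list of symbols that could not be resolved. The following is a minimal sketch and not part of the original troubleshooting; the function name undefined_symbols and the log path /tmp/mmbuildgpl.log are hypothetical, and it assumes the mmbuildgpl output was saved to a file first.

```shell
#!/bin/bash
# Hypothetical helper: list each symbol named in an "undefined reference"
# linker message, with the number of messages that mention it.
undefined_symbols() {
    # $1: path to a saved build log
    grep -o "undefined reference to \`[^']*'" "$1" \
        | sed "s/.*\`\(.*\)'/\1/" \
        | sort | uniq -c | sort -rn
}

# Usage (assumes the build output was captured first, for example with
#   /usr/lpp/mmfs/bin/mmbuildgpl 2>&1 | tee /tmp/mmbuildgpl.log ):
# undefined_symbols /tmp/mmbuildgpl.log
```

For the log above, the only symbol such a filter would report is __x86_return_thunk, which narrows the problem to a single missing symbol at the final link of the kdump program.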

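Resolution process

Debugging approach

Building the GPFS kernel module (/usr/lpp/mmfs/bin/mmbuildgpl) failed, and the messages point to “__x86_return_thunk”. That makes it almost certain that the problem lies in the build environment, most likely a compatibility issue in the compiler toolchain. Once the compatibility issue is resolved, the build should go through.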
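
Before changing anything, it is useful to record the facts about the build environment that this kind of problem depends on. The sketch below is an illustration under stated assumptions, not a step from the original post: the function names kbuild_dir and report_build_env are hypothetical, and the directory layout is the default /lib/modules/`uname -r`/build mentioned in the GPFS build notes.

```shell
#!/bin/bash
# Hypothetical helper: print the default kernel build directory for a release.
# With no argument, the release of the running kernel is used.
kbuild_dir() {
    local release="${1:-$(uname -r)}"
    printf '/lib/modules/%s/build\n' "$release"
}

# Hypothetical helper: report the facts that matter for a module build
# on this node (running kernel, kernel headers, active compiler).
report_build_env() {
    local dir
    dir="$(kbuild_dir)"
    echo "running kernel : $(uname -r)"
    if [ -d "$dir" ]; then
        echo "kernel headers : $dir"
    else
        echo "kernel headers : missing ($dir)"
    fi
    if command -v gcc >/dev/null 2>&1; then
        echo "compiler       : $(gcc -dumpversion)"
    else
        echo "compiler       : gcc not found in PATH"
    fi
}

# Usage:
# report_build_env
```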

Resolution

I had an AI assistant analyze the error log. It reported that the GCC version was too old, so I set out to upgrade GCC.

[root@node9 ~]# gcc --version
gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-44)
Copyright (C) 2015 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

Upgrading GCC

yum update
yum -y install gcc   gcc-gfortran gcc-c++

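After the update, GCC was still gcc (GCC) 4.8.5. Even after uninstalling and reinstalling, the newest version available from the stock repositories was gcc (GCC) 4.8.5.

Tongyi and ChatGPT both said an upgrade was needed and suggested moving to GCC 8, installed manually, and also removing -fno-cf-protection from the Makefile.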

# Install devtoolset-8
yum install centos-release-scl
yum install devtoolset-8

# Enable the new toolchain
scl enable devtoolset-8 bash

I followed the advice and removed -fno-cf-protection from the makefile in /usr/lpp/mmfs/src, but the build still failed:

/bin/bash: line 4: d: command not found
make[1]: Entering directory `/usr/lpp/mmfs/src'
make[1]: *** No rule to make target `modules'.  Stop.
make[1]: Leaving directory `/usr/lpp/mmfs/src'
make: *** [Modules] Error 1

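Since the “__x86_return_thunk” error no longer appeared, I suspected that GCC 8 was still too old and moved up to 9, 10 and 11 in turn, with the same result each time. There was also a small detour: several GCC versions can be installed side by side, and although I had installed the newer ones, the build kept using the default 4.8.5. It turned out that the newer GCC versions are installed as separate Software Collections environments, and the active one is selected with scl enable devtoolset-<X> bash.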
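
Because the devtoolset compilers are only active inside the shell started by scl enable, a quick check of the compiler version before each build avoids compiling with 4.8.5 by accident. This is a minimal sketch assuming devtoolset-9 is installed; the function names gcc_major and require_gcc_major are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: major version from a `gcc -dumpversion` string
# ("4.8.5" -> 4, "9" -> 9).
gcc_major() {
    printf '%s\n' "${1%%.*}"
}

# Hypothetical helper: succeed only if the gcc found in PATH is at least
# the requested major version.
require_gcc_major() {
    local want="$1" have
    command -v gcc >/dev/null 2>&1 || { echo "gcc not found in PATH" >&2; return 1; }
    have="$(gcc_major "$(gcc -dumpversion)")"
    if [ "$have" -lt "$want" ]; then
        echo "gcc $have is active, expected >= $want; run: scl enable devtoolset-$want bash" >&2
        return 1
    fi
}

# Usage, assuming devtoolset-9 is installed:
#   scl enable devtoolset-9 bash      # start a shell with GCC 9 first in PATH
#   require_gcc_major 9 && /usr/lpp/mmfs/bin/mmbuildgpl
```

The check runs in the same shell as the build on purpose: a version test in one terminal says nothing about a build started from another terminal where scl enable was never run.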

After analyzing the makefiles, I found that the change I had made had no effect.

The source tree looks like this:

/usr/lpp/mmfs/src/
├── bin
├── config
│   └── debian
├── gpl-linux
├── ibm-kxi
├── ibm-linux
├── include
│   ├── cxi
│   └── gpl-linux
└── lib

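There is a makefile under src and another under gpl-linux.

Reading the makefile under gpl-linux showed that the file to change is /usr/lpp/mmfs/src/config/def.mk; the makefile parameters are all defined in that file.

After making the change and rebuilding, the same error came back.

Finally I read the /usr/lpp/mmfs/src/config/configure script from start to finish and discovered that def.mk is a generated file. Parameters changed there have no effect, because on every build the file is overwritten from /usr/lpp/mmfs/src/config/def.mk.proto.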
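
One way to confirm that an edit landed in the template rather than only in the generated file is to search both files for the option. The following is a sketch under assumptions: the function name files_with_flag is hypothetical, and the option searched for in the usage lines is only an example.

```shell
#!/bin/bash
# Hypothetical helper: print the names of the given files that mention a
# compiler flag. -F treats the flag as a fixed string, and "--" stops option
# parsing so a pattern starting with "-" is not read as a grep option.
files_with_flag() {
    local flag="$1"; shift
    grep -l -F -- "$flag" "$@" 2>/dev/null
}

# Usage on a GPFS node (paths from the post; def.mk.proto is the template):
#   cd /usr/lpp/mmfs/src/config
#   cp -p def.mk.proto def.mk.proto.orig      # keep a copy before editing
#   files_with_flag '-mfunction-return=keep' def.mk.proto def.mk
```

If only def.mk is listed, the change would be lost the next time the file is regenerated; if def.mk.proto is listed, the change survives.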

I then read through def.mk.proto, changed the relevant kernel compile parameters, and the build finally succeeded under GCC 9.

The build notes shipped with the GPFS source are pasted below for reference:


Please refer to the online GPFS FAQ at

  https://www.ibm.com/support/knowledgecenter/en/STXKQY/gpfsclustersfaq.html

  for the latest information about GPFS updates, supported kernel levels,
  and kernel patches before building the portability layer.

Linux kernel patches for GPFS can be obtained at

  http://sourceforge.net/tracker/?atid=719124&group_id=130828&func=browse

  which is the patches link for the project
  "General Parallel File System (GPFS)" on Source Forge; only Linux
  kernel patches are kept on the Source Forge site.

You can send license and source code inquiries to gpfs@us.ibm.com
or, in writing, to:

  GPFS Product Manager
  International Business Machines Company
  Dept. LNLA  Mail Station P963
  2455 South Road
  Poughkeepsie NY 12601-5400
  United States of America


To build the Linux portability interface for GPFS
-------------------------------------------------

  It is strongly suggested that you build the source at
  non root level.  You will need to change the owner
  and group permissions accordingly in /usr/lpp/mmfs/src.

  When you build as a non root user it is also important
  that you have read access to the linux kernel source files
  on your machine.  These files are normally found in
  /lib/modules/`uname -r`/build but may appear in a different
  place as enumerated below in step 2 section D.

  You can use the mmbuildgpl command to simplify the build process.
  To build the GPFS portability layer using mmbuildgpl, enter the
  following command:
       /usr/lpp/mmfs/bin/mmbuildgpl
  Each kernel module is specific to a Linux version and platform.
  If you have multiple nodes running exactly the same operating
  system level on the same platform, and only some of these nodes
  have a compiler available, you can build the kernel module on one
  node, then create an installable package that contains the binary
  module for ease of distribution.
  If you choose to generate an installable package for portability
  layer binaries, perform the following additional step:
       /usr/lpp/mmfs/bin/mmbuildgpl --build-package

  If you choose to generate an installable package for portability
  layer binaries for a specific RHEL kernel release, perform the following
  additional step instead:
       /usr/lpp/mmfs/bin/mmbuildgpl --build-package --kernel-release KernelRelease

  Or another way to build the portability layer is to use
  the "Autoconfig" process to create the env.mcr file for you.

  1) cd /usr/lpp/mmfs/src or
     for BGP IO nodes only, cd /bgsys/drivers/ppcfloor/linux/OS/usr/lpp/mmfs/src
  2) make Autoconfig
  3) make World
  4) make InstallImages

  Alternatively, if you have a platform that requires
  customization, automatic or manual creation of the env.mcr file
  will be necessary as follows.

  1) cd /usr/lpp/mmfs/src/config or
     (for BGP IO nodes only, cd /bgsys/drivers/ppcfloor/linux/OS/usr/lpp/mmfs/src)
     cp env.mcr.sample env.mcr

  2a) Manual creation of the env.mcr file.

     A) The default architecture is GPFS_ARCH_X86_64. Only modify the
        architecture choice if your platform is not Intel x86_64 bit based.

     B) Modify the Linux distribution choice according to the software
        distribution your machine runs.

     C) Modify the LINUX_KERNEL_VERSION according to the Linux kernel
        level you are using with your software distribution.

     D) Note also that the kernel header file search path,
        KERNEL_BUILD_DIR, is by default /lib/modules/`uname -r`/build

        Customers who have their kernel source in a different directory
        will need to modify KERNEL_BUILD_DIR.

        YOU MUST HAVE READ ACCESS TO THESE FILES!

  2b) Automatic creation of the env.mcr file.

     A) cd /usr/lpp/mmfs/src

     B) make LINUX_KERNEL_RELEASE=customized_kernel_release Autoconfig

        customized_kernel_release follows format similar as the output from "uname -r".
        For example:
        make LINUX_KERNEL_RELEASE=3.10.0-1127.10.1.el7.x86_64 Autoconfig

     The procedure is only supported for the RHEL release, and customized_kernel_release
     should lower than the running one (uname -r) from the host node.

  3) cd /usr/lpp/mmfs/src or
     for BGP IO nodes only, cd /bgsys/drivers/ppcfloor/linux/OS/usr/lpp/mmfs/src

  4) make World

  All of the libraries and binaries reside in the "bin" subdirectory after the
  build. Kernel modules however are in the "gpl-linux" subdirectory.


To install the Linux portability interface for GPFS
--------------------------------------------------------------------------

  This step installs the binaries for the portability interface.

  1) su

  2) make InstallImages

     Binaries and kernel modules will be generated after the build finishes
     for portability layer.
     Two  binaries lxtrace-`uname -r` kdump-`uname -r` generated during the
     build step will be installed in /usr/lpp/mmfs/bin.
     Three kernel modules tracedev.ko, mmfslinux.ko, and mmfs26 will be
     installed to /lib/modules/`uname -r`/extra.

     If this kernel configuration applies to other machines in the cluster,
     you may either generate the binary RPM package by "make rpm", and then
     have the package installed on those machines or iteratively perform the
     above compile and install steps on each machine.

  If you choose to generate a RPM package for portability layer binaries, the
  following additional step is needed:

  3) make rpm

     You may then copy the generated rpm packages to other machines for
     deployment. The generated RPM can ONLY be deployed to the machine with
     identical architecture, distribution level, Linux kernel version and GPFS
     version.

     Note that during the package generation, temporary files will be put to
     /tmp/rpm directory. Please make sure there is sufficient space available.
     By default, the generated RPM goes to /usr/src/packages/RPMS/<arch>
     for Suse Linux Enterprise Server and /usr/src/redhat/RPMS/<arch> for
     Redhat Enterprise Linux.

Additional Notes
----------------

  Most of the source code headers are in the "ibm-kxi" and "ibm-linux"
  directories, while the source code is in the "gpl-linux" directory.

  The "config" directory provides the configuration information for
  the build (env.mcr, def.mcr).


Troubleshooting build problems
------------------------------
Most commonly encountered build problems are caused by an incorrect setting
of the kernel source tree.  GPFS leverages Kbuild infrastructure in Linux
kernel in order to build; a subset of kernel tree which is sufficient for
Kbuild to run is enough to GPFS. For SuSE Enterprise Server and Redhat
Enterprise Linux, the default source tree that work is
/lib/modules/`uname -r`/build.

Some files in the full source tree, most notably include/linux/version.h and
include/linux/autoconf.h, are often configured dynamically by an external shell
script (e.g. /etc/init.d/running-kernel on SuSE Enterprise Server), which
copies a set of files into the source tree to make it match the currently
booted kernel (that way one can boot different kernels but have the same
source tree always configured to match it).

If the kernel source tree is updated but the shell reconfiguration script
hasn't run (it usually runs during the boot sequence), or the script didn't
work correctly, the kernel source tree may not be configured properly.
This may be due to the existence of a /usr/src/linux/.config file.  When
this file exists, running-kernel assumes the kernel has been customized and
does not perform synchronization of the header files-GPFS's makefiles detect
this configuration and decline to run.  To correct this, remove or rename the
.config file.

NOTE: On SuSE Enterprise Server 10 for PowerPC, /etc/init.d/running-kernel
that ships with the GA level of distribution (kernel-source-2.6.16.21-0.8)
contains a bug that results in the wrong set of files being copied to the
kernel source tree.  This bug will be fixed with SLES10 SP1.  If the official
fix is unavailable, the following change should also address the problem:

--- running-kernel.orig 2006-10-06 14:54:36.000000000 -0500
+++ /etc/init.d/running-kernel  2006-10-06 14:59:58.000000000 -0500
@@ -53,6 +53,7 @@
     arm*|sa110)        arch=arm ;;
     s390x)     arch=s390 ;;
     parisc64)  arch=parisc ;;
+    ppc64)     arch=powerpc ;;
     esac
     # FIXME: How to handle uml?

Adding the following two options to def.mk.proto resolved the build error:

-mindirect-branch=keep -mfunction-return=keep
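
A likely reading of the link error is that kdump-kern.o was compiled with the kernel's return-thunk flags and then linked into a user-space program, where __x86_return_thunk is not defined; the two keep options ask GCC to emit ordinary returns and indirect branches instead. That reading is an interpretation, not something stated in the original post. The sketch below, with the hypothetical function names accepts_keep_flags and count_thunk_refs, checks that the active compiler accepts the options and that a rebuilt object no longer refers to the thunk.

```shell
#!/bin/bash
# Hypothetical helper: succeed if the given compiler accepts both options.
# -fsyntax-only parses a trivial program from stdin without writing any output.
accepts_keep_flags() {
    local cc="${1:-gcc}"
    echo 'int main(void) { return 0; }' \
        | "$cc" -mindirect-branch=keep -mfunction-return=keep \
               -fsyntax-only -x c - 2>/dev/null
}

# Hypothetical helper: count lines mentioning __x86_return_thunk in
# `nm -u` output read from standard input (prints 0 when there are none).
count_thunk_refs() {
    grep -c '__x86_return_thunk' || true
}

# Usage, after rebuilding inside the devtoolset shell:
#   accepts_keep_flags gcc && echo "compiler accepts the keep options"
#   nm -u /usr/lpp/mmfs/src/gpl-linux/kdump-kern.o | count_thunk_refs
```

A count of 0 for kdump-kern.o would indicate that the object can be linked into the kdump program without the missing symbol.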

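Summary:

  1. A software stack sometimes needs work not because the software itself is faulty, but because the operating system underneath it was upgraded and the software has to be adapted to it. Never upgrade the kernel casually in production; always try the upgrade on a test environment first. (A Linux system can be cloned as a whole disk, just like Windows. I once worked on a project that cloned the entire system disk online just to test whether an R version upgrade would affect the installed R packages.)
  2. The GPFS code is written to build on many system environments. Much of it is modular, with variables and parameters kept in configuration files under /usr/lpp/mmfs/src/config/, and the parameter template files can be edited, for example def.mk.proto, the template for def.mk.
  3. Each OS release ships with a default GCC version. Other versions can be installed manually, but they only take effect after switching environments with scl enable devtoolset-<X> bash.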
