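Background:
A single machine is used to restore backups of several openGauss databases. After each restore, openGauss is started, the data is verified, and openGauss is then shut down. The goal is to verify that the backups are valid.
Problem description:
After restoring a database from an openGauss 2.0 backup, startup fails with the error could not create semaphores: No space left on device.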
... [BACKEND] FATAL: could not create semaphores: No space left on device
... [BACKEND] DETAIL: Failed system call was semget(15727078, 17, 03600).
... [BACKEND] HINT: This error does *not* mean that you have run out of disk space. It occurs when either the system limit for the maximum number of semaphore sets (SEMMNI), or the system wide maximum number of semaphores (SEMMNS), would be exceeded. You need to raise the respective kernel parameter. Alternatively, reduce PostgreSQL's consumption of semaphores by reducing its max_connections parameter.
The PostgreSQL documentation contains more information about configuring your system for PostgreSQL.
... [BACKEND] LOG: FiniNuma allocIndex: 0.
... [gs_ctl]: waitpid 58725 failed, exitstatus is 256, ret is 2
... [gs_ctl]: stopped waiting
... [gs_ctl]: could not start server Examine the log output.
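Cause analysis and resolution
The error means the system has run out of semaphores: a kernel limit has been hit.
1. First, check the system limits with ipcs -ls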
# ipcs -ls
------ Semaphore Limits --------
max number of arrays = 25600
max semaphores per array = 50100
max semaphores system wide = 128256000
max ops per semop call = 50100
semaphore max value = 32767
2. Then check how many are currently in use with ipcs -s
# ipcs -s
------ Semaphore Arrays --------
key semid owner perms nsems
...
0x00dc3319 48300035 omm 600 17
0x00dc331a 48332804 omm 600 17
0x00dc331b 48365573 omm 600 17
0x00dc331c 48398342 omm 600 17
0x00dc331d 48431111 omm 600 17
0x00dc331e 48463880 omm 600 17
0x00dc331f 48496649 omm 600 17
0x00dc3320 48529418 omm 600 17
0x00dc3321 48562187 omm 600 17
...
0x00dc332e 48988184 omm 600 17
...
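Because the ipcs -s listing can run to tens of thousands of rows, it is easier to count the rows than to read them. The following is a minimal sketch, assuming the Linux util-linux ipcs output format shown above (data rows start with a hex key) and that /proc/sys/kernel/sem holds the four values SEMMSL SEMMNS SEMOPM SEMMNI in that order. The function names are hypothetical helpers, not part of openGauss.

```shell
# Count semaphore arrays in `ipcs -s` output read from stdin.
# Data rows start with a hex key such as 0x00dc3319.
count_sem_arrays() {
    # grep -c exits non-zero when the count is 0; keep the exit status clean
    grep -c '^0x' || true
}

# Extract SEMMNI (max number of arrays) from a kernel.sem line on stdin.
# Field order is: SEMMSL SEMMNS SEMOPM SEMMNI
semmni_from_kernel_sem() {
    awk '{print $4}'
}

# Usage on a live system:
#   used=$(ipcs -s | count_sem_arrays)
#   limit=$(semmni_from_kernel_sem < /proc/sys/kernel/sem)
#   echo "semaphore arrays: $used / $limit"
```

If the first number is at or near the second, the next semget call that creates a new array will fail with the error shown earlier.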
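The number of semaphore arrays had indeed reached the limit of 25600 (max number of arrays, i.e. SEMMNI), so the problem was resolved by raising the semaphore limits.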
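The original note does not record which value was chosen. As a hedged sketch, the limit can be raised through the kernel.sem sysctl, whose four fields are SEMMSL SEMMNS SEMOPM SEMMNI. The helper name with_semmni and the value 51200 are assumptions for illustration only.

```shell
# Print a kernel.sem value with SEMMNI (the 4th field) replaced.
# $1 = current "SEMMSL SEMMNS SEMOPM SEMMNI", $2 = new SEMMNI
with_semmni() {
    echo "$1" | awk -v n="$2" '{print $1, $2, $3, n}'
}

# Usage (as root; 51200 is a hypothetical value):
#   new=$(with_semmni "$(cat /proc/sys/kernel/sem)" 51200)
#   sysctl -w kernel.sem="$new"                      # takes effect immediately
#   echo "kernel.sem = $new" >> /etc/sysctl.conf     # keeps it after a reboot
```

Only the fourth field is changed here because the failure was on the number of arrays. The other three limits in the ipcs -ls output above were far from exhausted.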
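However, as more and more databases were restored, started, and shut down, the problem came back. Every openGauss database that had been started was already stopped, so why were so many semaphores still left behind? Testing showed that killing the openGauss process with kill -9 leaves its semaphores behind, where they keep occupying resources. Stopping the database with gs_ctl stop does not have this problem.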
In addition, leftover semaphores can be cleaned up with ipcrm:
# ipcrm --help
Usage:
ipcrm [options]
ipcrm shm|msg|sem <id>...
Remove certain IPC resources.
Options:
-m, --shmem-id <id> remove shared memory segment by id
-M, --shmem-key <key> remove shared memory segment by key
-q, --queue-id <id> remove message queue by id
-Q, --queue-key <key> remove message queue by key
-s, --semaphore-id <id> remove semaphore by id
-S, --semaphore-key <key> remove semaphore by key
-a, --all[=shm|msg|sem] remove all (in the specified category)
-v, --verbose explain what is being done
-h, --help display this help
-V, --version display version
For more details see ipcrm(1).
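Removing thousands of leftover arrays one id at a time is tedious, so the cleanup can be scripted. The sketch below assumes the ipcs -s column layout shown earlier (key, semid, owner, perms, nsems) and that the leftovers belong to the omm user. The helper name sem_ids_owned_by is hypothetical. It should only be run when no openGauss instance owned by that user is still running, since it would also remove semaphores that are in use.

```shell
# Print the semid of every semaphore array owned by a given user,
# reading `ipcs -s` output from stdin.
# $1 = owner name as shown in the "owner" column (e.g. omm)
sem_ids_owned_by() {
    awk -v owner="$1" '/^0x/ && $3 == owner {print $2}'
}

# Usage, only when no openGauss instance of that user is running:
#   ipcs -s | sem_ids_owned_by omm | while read -r id; do
#       ipcrm -s "$id"
#   done
```

Filtering by owner rather than using ipcrm -a keeps semaphores that belong to other users and services on the machine untouched.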