Database Learning Daily Case 20231219: 19c Cluster ORA-27301 / ORA-27300 Analysis and Resolution

1 Problem Overview

During today's database check, the following errors were found in the alert log:

Errors in file /u01/app/oracle/diag/rdbms/orcldb/orcldb1/incident/incdir_4510778/orcldb1_m004_322434_i4510778.trc:
ORA-00313: open failed for members of log group 6 of thread 1
ORA-00312: online log 6 thread 1: '+REDO/orcldb/ONLINELOG/group_6.266.1070296359'
ORA-17503: ksfdopn:2 Failed to open file +REDO/orcldb/ONLINELOG/group_6.266.1070296359
ORA-15064: communication failure with ASM instance
ORA-00603: ORACLE server session terminated by fatal error
ORA-27504: IPC error creating OSD context
ORA-27300: OS system dependent operation:sendmsg failed with status: 105
ORA-27301: OS failure message: No buffer space available
ORA-27302: failure occurred at: sskgxpsnd2
2023-12-19T03:03:32.788217+08:00
Errors in file /u01/app/oracle/diag/rdbms/orcldb/orcldb1/incident/incdir_4510778/orcldb1_m004_322434_i4510778.trc:
ORA-00313: open failed for members of log group 7 of thread 1
ORA-00312: online log 7 thread 1: '+DGDATA4/orcldb/ONLINELOG/group_7.262.1066409025'
ORA-17503: ksfdopn:2 Failed to open file +DGDATA4/orcldb/ONLINELOG/group_7.262.1066409025
ORA-15064: communication failure with ASM instance
ORA-00603: ORACLE server session terminated by fatal error
ORA-27504: IPC error creating OSD context
ORA-27300: OS system dependent operation:sendmsg failed with status: 105
ORA-27301: OS failure message: No buffer space available
ORA-27302: failure occurred at: sskgxpsnd2
2023-12-19T03:03:42.795004+08:00
Errors in file /u01/app/oracle/diag/rdbms/orcldb/orcldb1/incident/incdir_4510778/orcldb1_m004_322434_i4510778.trc:
ORA-00313: open failed for members of log group 8 of thread 2
ORA-00312: online log 8 thread 2: '+REDO/orcldb/ONLINELOG/group_8.267.1070296539'
ORA-17503: ksfdopn:2 Failed to open file +REDO/orcldb/ONLINELOG/group_8.267.1070296539
ORA-15064: communication failure with ASM instance
ORA-00603: ORACLE server session terminated by fatal error
ORA-27504: IPC error creating OSD context
ORA-27300: OS system dependent operation:sendmsg failed with status: 105
ORA-27301: OS failure message: No buffer space available
ORA-27302: failure occurred at: sskgxpsnd2
2023-12-19T03:03:52.801743+08:00
 

2 Host NUMA and MTU Information

[root@orcl01 ~]# numactl --hardware
available: 4 nodes (0-3)
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99
node 0 size: 195001 MB
node 0 free: 1759 MB
node 1 cpus: 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119
node 1 size: 196608 MB
node 1 free: 629 MB
node 2 cpus: 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139
node 2 size: 131072 MB
node 2 free: 613 MB
node 3 cpus: 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159
node 3 size: 131072 MB
node 3 free: 465 MB
node distances:
node   0   1   2   3 
  0:  10  21  21  21 
  1:  21  10  21  21 
  2:  21  21  10  21 
  3:  21  21  21  10 
[root@orcl01 ~]# ip ad sh
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: eno4: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether c4:b8:b4:2e:b6:f0 brd ff:ff:ff:ff:ff:ff
3: ens3f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 34:0a:98:9c:ed:37 brd ff:ff:ff:ff:ff:ff
    inet 10.2.0.101/24 brd 10.2.0.255 scope global ens3f0
       valid_lft forever preferred_lft forever
    inet 169.254.22.17/19 brd 169.254.31.255 scope global ens3f0:1
       valid_lft forever preferred_lft forever
    inet6 fe80::360a:98ff:fe9c:ed37/64 scope link 
       valid_lft forever preferred_lft forever
4: ens3f1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 34:0a:98:9c:ed:38 brd ff:ff:ff:ff:ff:ff
5: eno3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether c4:b8:b4:2e:b6:f1 brd ff:ff:ff:ff:ff:ff
    inet 10.228.224.4/26 brd 10.228.224.63 scope global eno3
       valid_lft forever preferred_lft forever
    inet 10.228.224.10/26 brd 10.228.224.63 scope global secondary eno3:1
       valid_lft forever preferred_lft forever
    inet 10.228.224.12/26 brd 10.228.224.63 scope global secondary eno3:2
       valid_lft forever preferred_lft forever
    inet6 2409:8002:5a06:120:10:0:2:d00c/116 scope global deprecated 
       valid_lft forever preferred_lft 0sec
    inet6 2409:8002:5a06:120:10:0:2:d00a/116 scope global deprecated 
       valid_lft forever preferred_lft 0sec
    inet6 2409:8002:5a06:120:10:0:2:d004/116 scope global 
       valid_lft forever preferred_lft forever
    inet6 fe80::c6b8:b4ff:fe2e:b6f1/64 scope link 
       valid_lft forever preferred_lft forever
6: eno1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether c4:b8:b4:2e:b6:f2 brd ff:ff:ff:ff:ff:ff
7: eno2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether c4:b8:b4:2e:b6:f3 brd ff:ff:ff:ff:ff:ff
8: virbr0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default qlen 1000
    link/ether 52:54:00:76:94:f3 brd ff:ff:ff:ff:ff:ff
    inet 192.168.122.1/24 brd 192.168.122.255 scope global virbr0
       valid_lft forever preferred_lft forever
9: virbr0-nic: <BROADCAST,MULTICAST> mtu 1500 qdisc fq_codel master virbr0 state DOWN group default qlen 1000
    link/ether 52:54:00:76:94:f3 brd ff:ff:ff:ff:ff:ff
[root@orcl01 ~]# 

3 Problem Analysis and Resolution

I have run into IPC problems like this countless times before.

On today's hosts with large amounts of physical memory, the reserve of defragmented memory pages kept for network buffers (vm.min_free_kbytes) is too small, and the MTU of the loopback interface is also set incorrectly. The fix is to follow the guidance in the official MOS note, quoted below.
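Before applying the MOS guidance, the current values can be checked quickly on the node; this is just a sketch, and the actual numbers for this host are already visible in the numactl and ip output in section 2:

cat /proc/sys/vm/min_free_kbytes            # current defragmented-memory reserve, in kB
ip -o link show lo | awk '{print $5}'       # current loopback MTU (65536 on this host)
numactl --hardware | grep available         # number of NUMA nodes (4 on this host)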

On servers with High Physical Memory, the parameter vm.min_free_kbytes should be set in the order of 0.4% of total Physical Memory. This helps in keeping a larger range of defragmented memory pages available for network buffers, reducing the probability of low-buffer-space conditions.
*** For example, on a server which has 256GB RAM, the parameter vm.min_free_kbytes should be set to 1073742 ***

On NUMA Enabled Systems, the value of vm.min_free_kbytes should be multiplied by the number of NUMA nodes since the value is to be split across all the nodes.

On NUMA Enabled Systems, the value of vm.min_free_kbytes = n * 0.4% of total Physical Memory. Here ‘n’ is the number of NUMA nodes.
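As a rough worked example for the host above (4 NUMA nodes, roughly 640 GB of physical memory according to numactl), the formula comes out to around 10.7 million kB; the exact figure below is illustrative only and should be recomputed on the target host:

# 0.4% of MemTotal (reported in kB) multiplied by the 4 NUMA nodes on this host
awk '/MemTotal/ {print int($2 * 0.004 * 4)}' /proc/meminfo
# apply it online, then persist it in /etc/sysctl.conf so it survives a reboot
sysctl -w vm.min_free_kbytes=10711089
echo "vm.min_free_kbytes = 10711089" >> /etc/sysctl.conf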

Additionally, the MTU value should be modified as below
#ifconfig lo mtu 16436

To make the change persistent over reboot add the following line in the file /etc/sysconfig/network-scripts/ifcfg-lo :

MTU=16436
Save the file and restart the network service to load the changes

#service network restart

Note: While making the changes on CRS nodes, if the network is restarted while CRS is up, it can hang CRS, so the cluster services should be stopped prior to the network restart.
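Putting the note's steps together for one RAC node at a time, the order of operations would look roughly like this (a sketch; crsctl is run as root from the Grid Infrastructure home, and the MTU value follows the note above):

# stop the clusterware on this node first, as the note warns
crsctl stop crs
# change the loopback MTU online and make it persistent
ifconfig lo mtu 16436
echo "MTU=16436" >> /etc/sysconfig/network-scripts/ifcfg-lo
# reload the network configuration, then bring the clusterware back up
service network restart
crsctl start crs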

The symptoms documented in the MOS note match what the routine database check turned up:

  •  Database Alert log captures below:

mtype: 61 process 14152 failed because of a resource problem in the OS. The OS has most likely run out of buffers (rval: 4)
Errors in file /oracle/admin/diag/rdbms/<DB>/<SID>/trace/<SID>_ora_14152.trc  (incident=249601):
ORA-00603: ORACLE server session terminated by fatal error
ORA-27504: IPC error creating OSD context
ORA-27300: OS system dependent operation:sendmsg failed with status: 105
ORA-27301: OS failure message: No buffer space available
ORA-27302: failure occurred at: sskgxpsnd2
Incident details in: /oracle/admin/diag/rdbms/<DB>/<SID>/incident/incdir_249601/<SID>_ora_14152_i249601.trc

  • Network communication issue and High Memory consumption would be observed during this time.
  • Server is running with UEK3 kernel

This happens due to less space being available for network buffer reservation. Based on the Metalink (MOS) research, the host parameters need to be adjusted and tuned as described above.
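After the parameters are changed, a quick sanity check (a sketch; the alert log path is inferred from the trace file paths in section 1 and may differ) is to confirm the new values and watch for recurrence:

sysctl vm.min_free_kbytes                   # should show the newly computed value
ip link show lo | grep mtu                  # should show the adjusted loopback MTU
# confirm no new ORA-27301 entries appear in the alert log after the change
grep -c "ORA-27301" /u01/app/oracle/diag/rdbms/orcldb/orcldb1/trace/alert_orcldb1.log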
