oracle 12C rac数据库scm0进程bug处理（CPU占用率100%）

本文链接：https://blog.csdn.net/weixin_41607523/article/details/122121972

项目场景：

用户应用厂商发现数据库服务器CPU的top使用率在95%~100%，经过相关查询发现是由ora_scm0进程所占用

问题描述：

提示：如下代码所示，cpu占用率超出正常范围达到了100%;但是我在查询时候发现top和sar查询出来的结果不一样，有懂的小伙伴可以科普一下。

[root@rac2 ~]# top                                                                                                              
top - 20:22:01 up 667 days, 23:36,  1 user,  load average: 2.25, 2.39, 2.61                                                     
Tasks: 987 total,   2 running, 985 sleeping,   0 stopped,   0 zombie                                                            
%Cpu(s):  3.1 us,  0.7 sy,  0.0 ni, 94.0 id,  2.3 wa,  0.0 hi,  0.0 si,  0.0 st                                                 
KiB Mem : 13097886+total,  4762016 free, 83879424 used, 42337424 buff/cache                                                     
KiB Swap: 13421772+total, 13290905+free,  1308672 used. 41708892 avail Mem                                                      

   PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND                                                    
189061 oracle    20   0   40.4g  33260  26560 R  100.0  0.0 374940:59 ora_scm0_orcl12                                            
 58346 root      20   0  162812   2928   1540 R  15.0  0.0   0:00.06 top                                                        
217733 oracle    20   0   40.4g  41640  34344 S   5.0  0.0   8:07.40 oracle_217733_o                                            
224310 oracle    20   0   40.4g  34384  26744 S   5.0  0.0   0:59.73 oracle_224310_o                                            
     1 root      20   0  191548   3884   2308 S   0.0  0.0 576:22.21 systemd                                                    
     2 root      20   0       0      0      0 S   0.0  0.0   0:26.53 kthreadd                                                   
     4 root       0 -20       0      0      0 S   0.0  0.0   0:00.00 kworker/0:0H                                               
     6 root      20   0       0      0      0 S   0.0  0.0 140:58.68 ksoftirqd/0                                                
     7 root      rt   0       0      0      0 S   0.0  0.0   1:45.28 migration/0                                                
     8 root      20   0       0      0      0 S   0.0  0.0   0:00.00 rcu_bh                                                     
     9 root      20   0       0      0      0 S   0.0  0.0   3199:42 rcu_sched                                                  
    10 root       0 -20       0      0      0 S   0.0  0.0   0:00.00 lru-add-drain                                              
    11 root      rt   0       0      0      0 S   0.0  0.0   2:06.79 watchdog/0                                                 
    12 root      rt   0       0      0      0 S   0.0  0.0   2:14.56 watchdog/1                                                 
    13 root      rt   0       0      0      0 S   0.0  0.0   1:53.11 migration/1                                                
    14 root      20   0       0      0      0 S   0.0  0.0 109:25.44 ksoftirqd/1                                                
137989 root      20   0  899332  32748  22400 S   0.7  0.0   7618:28 orarootagent.bi

[root@rac2 ~]# sar 1 10
Linux 3.10.0-1062.el7.x86_64 (rac2)     2021骞?2?3妤  _x86_64_        (40 CPU)                                                

20娑22?7绉     CPU     %user     %nice   %system   %iowait    %steal     %idle                                                 
20娑22?8绉     all      3.71      0.00      0.75      2.43      0.00     93.11                                                 
20娑22?9绉     all      4.02      0.00      0.35      2.49      0.00     93.15                                                 
20娑22?0绉     all      3.26      0.00      0.28      2.48      0.00     93.98                                                 
20娑22?1绉     all      3.56      0.00      0.43      2.43      0.00     93.59                                                 
20娑22?2绉     all      3.26      0.00      0.25      2.58      0.00     93.91                                                 
20娑22?3绉     all      4.09      0.00      0.33      2.36      0.00     93.23                                                 
20娑22?4绉     all      3.86      0.00      0.48      2.58      0.00     93.08                                                 
20娑22?5绉     all      4.16      0.00      0.33      2.41      0.00     93.11                                                 
20娑22?6绉     all      3.73      0.00      0.38      2.15      0.00     93.74                                                 
20娑22?7绉     all      3.23      0.00      0.33      2.21      0.00     94.23                                                 
骞冲娑榇:     all      3.69      0.00      0.39      2.41      0.00     93.51

原因分析：

alert日志：

ORA-00603: ORACLE server session terminated by fatal error                                                              
ORA-27504: IPC error creating OSD context                                                                               
ORA-27300: OS system dependent operation:sendmsg failed with status: 105                                                
ORA-27301: OS failure message: No buffer space available                                                                
ORA-27302: failure occurred at: sskgxpsnd2

经过分析，该报错是 SCM0进程收集GES、GCS的相关的统计信息时间超长后抛出异常，根据MOS文档ID 2373451.1说明，SCM0收集的信息在12.2版本无价值，
因此本次报错可忽略或者禁用DLM收集统计信息参数。
在这里插入图片描述
DLM描述说明：
分布式锁管理器（distributed lock management DLM），简单说对于RAC环境，所有数据的修改，都需要事先以节点为单位，去DLM申请节点对块的修改权限，DLM对块的资源进行多节点修改进行协调。
SCM0进程即DLM数据的采集以及管理进程
SCM0简短描述
收集和管理与全局排队服务（GES）和全局缓存服务（GCS）相关的统计信息

解决方案：

在这里插入图片描述

此次方案定为：修改_dlm_stats_collect参数，进行单节点停机，恢复后再次停机第二个节点。

SQL> alter system set "_dlm_stats_collect"=0 scope=spfile sid='*';                                                                                                         

System altered.                                                                                                                                                    

SQL>                                                                                                                                                               
SQL>exit

[root@rac2 ~]$ ps -ef|grep scm
oracle 507879 1 0 May22 ? 00:02:28 ora_scm0_ANBOB1

[root@rac2 ~]# kill -9 507879
[root@rac2 ~]# su - oracle                                                                                                                                         
涓涓娆＄诲斤煎 12?23 20:26:13 CST 2021pts/0 涓                                                                                                                    
^[[?1;2c[oracle@rac2 ~]$                                                                                                                                           
[oracle@rac2 ~]$                                                                                                                                                   
[oracle@rac2 ~]$ sqlplus / as sysdba                                                                                                                               

SQL*Plus: Release 12.2.0.1.0 Production on Thu Dec 23 20:46:59 2021                                                                                                

Copyright (c) 1982, 2016, Oracle.  All rights reserved.                                                                                                            


Connected to:                                                                                                                                                      
Oracle Database 12c Enterprise Edition Release 12.2.0.1.0 - 64bit Production                                                                                       

                                                                                                                                                                                                        
SQL> shutdown immediate                                                                                                                                            
Database closed.                                                                                                                                                   
Database dismounted.                                                                                                                                               
ORACLE instance shut down.                                                                                                                                         
SQL> startup                                                                                                                                                       
ORACLE instance started.                                                                                                                                           
Total System Global Area 4.2950E+10 bytes                                                                                                                          
Fixed Size        29865680 bytes                                                                                                                                   
Variable Size   9261025584 bytes                                                                                                                                   
Database Buffers   3.3554E+10 bytes                                                                                                                                
Redo Buffers     104349696 bytes 
ORA-01105: mount is incompatible with mounts by other instances
ORA-01174: DB_FILES is 2000 buts needs to be 200 to be compatible
#到这里发现报错，db_files问题，遂判断节点一、二的db_files参数不一致。修改db_files参数需要两节点同时停机，才可以保证参数一致重启，因本次没有俩节点全部重启权限，遂将db_files参数修改一致后，再次重启
SQL> show parameter db_files                                                                                                                                       

NAME                TYPE     VALUE                                                                                                                                 
------------------------------------ ----------- ------------------------------                                                                                    
db_files             integer     2000
SQL> alter system set db_files=200 scope=spfile sid='*';                                                                                                         

System altered.                                                                                                                                                    

SQL>                                                                                                                                                               
SQL>                                                                                                                                                               
SQL> shutdown immediate                                                                                                                                            
Database closed.                                                                                                                                                   
Database dismounted.                                                                                                                                               
ORACLE instance shut down.                                                                                                                                         
SQL> startup                                                                                                                                                       
ORACLE instance started.                                                                                                                                           

Total System Global Area 4.2950E+10 bytes                                                                                                                          
Fixed Size        29865680 bytes                                                                                                                                   
Variable Size   9261025584 bytes                                                                                                                                   
Database Buffers   3.3554E+10 bytes                                                                                                                                
Redo Buffers     104349696 bytes                                                                                                                                   
Database mounted.                                                                                                                                                  
Database opened.                                                                                                                                                   
SQL>            
#节点一按照同样方法进行重启，先kill掉scm0进程，再进行shutdown immediate