存在很多oracle 会话,大量会话处于CSS initialization等待

接到一个问题,客户的某个系统突然CPU冲高,一个小时内CPU从5%冲到60%以上,在数据库中发现大量的会话在等待CSS initialization。

该系统是非RAC非ASM的系统,一般来说,不会出现CSS(cluster synchronization service)的等待,只有在访问voting disk或ocr的时候才会有这个等待,但是为什么在一个非RAC非ASM系统中出现了这个等待呢?

进一步检查会话的SQL,发现是一个监控程序发起的,里面有关于空间的检查,有查到了v$asm_diskgroup视图。这个倒是可以理解,查asm diskgroup的时候,出现了css的等待,但是这个等待一般不会持续很长时间,为什么会长时间的hang在这里?我们在测试机上执行select * from v$asm_diskgroup不到1秒就出结果了,从10046的trace看,虽然期间也有CSS initialization的等待,但是只是维持了很短一瞬间,而在客户的生产环境中,这个等待却持续很长时间,5分钟了还一直hang在那里不动。

查询metalink,发现有个Bug 10024824 – Database/session hang with ‘CSS initialization’ ,版本是10.2.0.5,而我们的数据库版本也正好是这个版本。

==============================================================================

== Bug 10024824 : DATABASE/SESSION HANGS WITH 'CSS INITIALIZATION' ==

PROBLEM:

--------

10.2.0.5 non-RAC ASM set up. After upgrading to 10.2.0.5 patchset, running

dbms_stats, hangs everything.

DIAGNOSTIC ANALYSIS:

--------------------

Happens only when ( and immediately ) when dbms_stats is run from sqlplus.

Does not seem to happen when run automatically during the job's window.

From systemstate dump ( uploaded ) we see process 42 is waiting on 'CSS

initialization'. There is nothing unusual in CSS logs

WORKAROUND:

-----------

RELATED BUGS:

-------------

9917797

==============================================================================

== Bug 10024824 - Database/session hang with 'CSS initialization' [ID 10024824.8] ==

Description

Database/session hang with 'CSS initialization' can occur

when the OH/log//client directory has the wrong permissions

in a RAC environment.

Workaround

Change the permission of OH/log//client directory to 771

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

18

19

20

21

22

23

24

25

26

27

28

29

30

31

32

33

34

35

36

==============================================================================

==Bug10024824:DATABASE/SESSIONHANGSWITH'CSS INITIALIZATION'==

PROBLEM:

--------

10.2.0.5non-RACASMsetup.Afterupgradingto10.2.0.5patchset,running

dbms_stats,hangseverything.

DIAGNOSTICANALYSIS:

--------------------

Happensonlywhen(andimmediately)whendbms_statsisrunfromsqlplus.

Doesnotseemtohappenwhenrunautomaticallyduringthejob's window.

From systemstate dump ( uploaded ) we see process 42 is waiting on 'CSS

initialization'. There is nothing unusual in CSS logs

WORKAROUND:

-----------

RELATED BUGS:

-------------

9917797

==============================================================================

== Bug 10024824 - Database/session hang with 'CSSinitialization' [ID 10024824.8]  ==

Description

Database/session hang with 'CSSinitialization'canoccur

whentheOH/log//clientdirectoryhasthewrongpermissions

inaRACenvironment.

Workaround

ChangethepermissionofOH/log//clientdirectoryto771

看上去像是和目录权限有关。我们去检查该目录的权限,发现权限是751:

mydb2013:ctdb > pwd

/orac/my/ctdb/data/app/oracle/product/10.2.0/log/mydb2013

mydb2013:ctdb > ls -l

total 1024

drwxr-xr-x 2 opctdb dba 516096 Jul 9 18:22 client

mydb2013:ctdb >

1

2

3

4

5

6

mydb2013:ctdb>pwd

/orac/my/ctdb/data/app/oracle/product/10.2.0/log/mydb2013

mydb2013:ctdb>ls-l

total1024

drwxr-xr-x2opctdbdba516096Jul918:22client

mydb2013:ctdb>

难道真的要把权限改成771才能解决这个问题?

在之前的测试机上的测试中,我们发现每查询一次v$asm_diskgroup都会在client目录下生成一个新的cssN.log,如果是按照bug来说,是权限的问题,那么client目录一定是不能被写入,所以才hang住。我们继续检查目录和目录中的文件。

检查时我们发现这个目录下面存在大量的文件,ls的结果返回很多很多,如果想看看是否有最近的文件,ls -lrt根本没法看,所以我们只能先ls -lrt到一个文件,再tail看最后的几行是不是最近的,以确定是不是一直在写:

mydb2013:ctdb > ls -lrt >111.txt

mydb2013:ctdb > cat 111.txt |wc -l

20189

mydb2013:ctdb > tail -20 111.txt

-rw-r--r-- 1 opctdb dba 182 Jul 9 16:59 css19996.log

-rw-r--r-- 1 opctdb dba 182 Jul 9 17:00 css19997.log

-rw-r--r-- 1 opctdb dba 182 Jul 9 17:00 css19998.log

-rw-r--r-- 1 opctdb dba 182 Jul 9 17:00 css19999.log

-rw-r--r-- 1 opctdb dba 182 Jul 9 17:00 css20000.log

-rw-r--r-- 1 opctdb dba 182 Jul 9 17:00 css20001.log

-rw-r--r-- 1 opctdb dba 182 Jul 9 17:00 css20002.log

-rw-r--r-- 1 opctdb dba 182 Jul 9 17:05 css20003.log

-rw-r--r-- 1 opctdb dba 182 Jul 9 17:05 css20004.log

-rw-r--r-- 1 opctdb dba 182 Jul 9 17:05 css20005.log

-rw-r--r-- 1 opctdb dba 182 Jul 9 17:10 css20006.log

-rw-r--r-- 1 opctdb dba 182 Jul 9 17:14 css20007.log

-rw-r--r-- 1 opctdb dba 182 Jul 9 17:18 css20008.log

-rw-r--r-- 1 opctdb dba 182 Jul 9 17:37 css20009.log

-rw-r--r-- 1 opctdb dba 182 Jul 9 17:56 css20010.log

-rw-r--r-- 1 opctdb dba 182 Jul 9 18:16 css20011.log

-rw-r--r-- 1 opctdb dba 182 Jul 9 18:16 css20012.log

-rw-r--r-- 1 opctdb dba 182 Jul 9 18:18 css20013.log

-rw-r--r-- 1 opctdb dba 182 Jul 9 18:22 css20014.log

-rw-r--r-- 1 opctdb dba 0 Jul 9 18:26 111.txt

mydb2013:ctdb >

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

18

19

20

21

22

23

24

25

26

mydb2013:ctdb>ls-lrt>111.txt

mydb2013:ctdb>cat111.txt|wc-l

20189

mydb2013:ctdb>tail-20111.txt

-rw-r--r--1opctdbdba182Jul916:59css19996.log

-rw-r--r--1opctdbdba182Jul917:00css19997.log

-rw-r--r--1opctdbdba182Jul917:00css19998.log

-rw-r--r--1opctdbdba182Jul917:00css19999.log

-rw-r--r--1opctdbdba182Jul917:00css20000.log

-rw-r--r--1opctdbdba182Jul917:00css20001.log

-rw-r--r--1opctdbdba182Jul917:00css20002.log

-rw-r--r--1opctdbdba182Jul917:05css20003.log

-rw-r--r--1opctdbdba182Jul917:05css20004.log

-rw-r--r--1opctdbdba182Jul917:05css20005.log

-rw-r--r--1opctdbdba182Jul917:10css20006.log

-rw-r--r--1opctdbdba182Jul917:14css20007.log

-rw-r--r--1opctdbdba182Jul917:18css20008.log

-rw-r--r--1opctdbdba182Jul917:37css20009.log

-rw-r--r--1opctdbdba182Jul917:56css20010.log

-rw-r--r--1opctdbdba182Jul918:16css20011.log

-rw-r--r--1opctdbdba182Jul918:16css20012.log

-rw-r--r--1opctdbdba182Jul918:18css20013.log

-rw-r--r--1opctdbdba182Jul918:22css20014.log

-rw-r--r--1opctdbdba0Jul918:26111.txt

mydb2013:ctdb>

我们发现cssN.log文件一直在产生,最近的一个是4分钟前产生,平均2,3分钟就产生一个,忙的时候每分钟产生5,6个。目前已经大约有2万多个了,从属主看,权限设置正常,看上去不像权限设置771之后就能解决的问题。

我们继续分析,我们再次测试运行一个查询,并且用truss追踪该进程,我们此时才发现了问题的根源:进程大部分的时间是花在遍历client下cssN.log文件:

truss -o /tmp/mylog.log -laefdD -p xxx

2896/1: 14.3703 0.0000 brk(0x10666A310) = 0

2896/1: 14.3706 0.0003 access("/var/tmp/.oracle", F_OK) = 0

2896/1: 14.3708 0.0002 chmod("/var/tmp/.oracle", 01777) Err#1 EPERM [ALL]

2896/1: 14.3709 0.0001 brk(0x10666A310) = 0

2896/1: 14.3710 0.0001 brk(0x10666E310) = 0

2896/1: 14.3711 0.0001 so_socket(PF_UNIX, SOCK_STREAM, 0, "", SOV_DEFAULT) = 14

2896/1: 14.3712 0.0001 access("/var/tmp/.oracle/sOCSSD_LL_mydb2013_", F_OK) Err#2 ENOENT

2896/1: 14.3713 0.0001 access("/var/tmp/o/sOCSSD_LL_mydb2013_", F_OK) Err#2 ENOENT

2896/1: 14.3714 0.0001 close(14) = 0

2896/1: 14.3716 0.0002 getcwd("/orac/my/ctdb/data/app/oracle/product/10.2.0/dbs", 1024) = 0

2896/1: 14.3718 0.0002 chdir("/orac/my/ctdb/data/app/oracle/product/10.2.0/log/mydb2013/client") = 0

2896/1: 14.3719 0.0001 getcwd("/orac/my/ctdb/data/app/oracle/product/10.2.0/log/mydb2013/client", 1024) = 0

2896/1: 14.3721 0.0002 chdir("/orac/my/ctdb/data/app/oracle/product/10.2.0/dbs") = 0

2896/1: 14.3722 0.0001 brk(0x10666E310) = 0

2896/1: 14.3723 0.0001 brk(0x106672310) = 0

2896/1: 14.3724 0.0001 brk(0x106672310) = 0

2896/1: 14.3725 0.0001 brk(0x106692310) = 0

2896/1: 14.3730 0.0005 access("/orac/my/ctdb/data/app/oracle/product/10.2.0/log/mydb2013/client/css.log", F_OK) = 0 //<==从css.log开始遍历起

2896/1: 14.4276 0.0546 getcwd("/orac/my/ctdb/data/app/oracle/product/10.2.0/dbs", 1024) = 0

2896/1: 14.4278 0.0002 chdir("/orac/my/ctdb/data/app/oracle/product/10.2.0/log/mydb2013/client") = 0

2896/1: 14.4279 0.0001 getcwd("/orac/my/ctdb/data/app/oracle/product/10.2.0/log/mydb2013/client", 1024) = 0

2896/1: 14.4282 0.0003 chdir("/orac/my/ctdb/data/app/oracle/product/10.2.0/dbs") = 0

2896/1: 14.4287 0.0005 access("/orac/my/ctdb/data/app/oracle/product/10.2.0/log/mydb2013/client/css1.log", F_OK) = 0 //<==读css1.log

2896/1: 14.4716 0.0429 getcwd("/orac/my/ctdb/data/app/oracle/product/10.2.0/dbs", 1024) = 0

2896/1: 14.4719 0.0003 chdir("/orac/my/ctdb/data/app/oracle/product/10.2.0/log/mydb2013/client") = 0

2896/1: 14.4720 0.0001 getcwd("/orac/my/ctdb/data/app/oracle/product/10.2.0/log/mydb2013/client", 1024) = 0

2896/1: 14.4722 0.0002 chdir("/orac/my/ctdb/data/app/oracle/product/10.2.0/dbs") = 0

2896/1: 14.4725 0.0003 access("/orac/my/ctdb/data/app/oracle/product/10.2.0/log/mydb2013/client/css2.log", F_OK) = 0 //<==读css2.log

2896/1: 14.4870 0.0145 getcwd("/orac/my/ctdb/data/app/oracle/product/10.2.0/dbs", 1024) = 0

2896/1: 14.4872 0.0002 chdir("/orac/my/ctdb/data/app/oracle/product/10.2.0/log/mydb2013/client") = 0

2896/1: 14.4873 0.0001 getcwd("/orac/my/ctdb/data/app/oracle/product/10.2.0/log/mydb2013/client", 1024) = 0

2896/1: 14.4874 0.0001 chdir("/orac/my/ctdb/data/app/oracle/product/10.2.0/dbs") = 0

2896/1: 14.4877 0.0003 access("/orac/my/ctdb/data/app/oracle/product/10.2.0/log/mydb2013/client/css3.log", F_OK) = 0

2896/1: 14.5022 0.0145 getcwd("/orac/my/ctdb/data/app/oracle/product/10.2.0/dbs", 1024) = 0

2896/1: 14.5025 0.0003 chdir("/orac/my/ctdb/data/app/oracle/product/10.2.0/log/mydb2013/client") = 0

2896/1: 14.5026 0.0001 getcwd("/orac/my/ctdb/data/app/oracle/product/10.2.0/log/mydb2013/client", 1024) = 0

2896/1: 14.5028 0.0002 chdir("/orac/my/ctdb/data/app/oracle/product/10.2.0/dbs") = 0

2896/1: 14.5031 0.0003 access("/orac/my/ctdb/data/app/oracle/product/10.2.0/log/mydb2013/client/css4.log", F_OK) = 0

2896/1: 14.5585 0.0554 getcwd("/orac/my/ctdb/data/app/oracle/product/10.2.0/dbs", 1024) = 0

2896/1: 14.5587 0.0002 chdir("/orac/my/ctdb/data/app/oracle/product/10.2.0/log/mydb2013/client") = 0

2896/1: 14.5589 0.0002 getcwd("/orac/my/ctdb/data/app/oracle/product/10.2.0/log/mydb2013/client", 1024) = 0

2896/1: 14.5590 0.0001 chdir("/orac/my/ctdb/data/app/oracle/product/10.2.0/dbs") = 0

2896/1: 14.5595 0.0005 access("/orac/my/ctdb/data/app/oracle/product/10.2.0/log/mydb2013/client/css5.log", F_OK) = 0

2896/1: 14.5738 0.0143 getcwd("/orac/my/ctdb/data/app/oracle/product/10.2.0/dbs", 1024) = 0

2896/1: 14.5739 0.0001 chdir("/orac/my/ctdb/data/app/oracle/product/10.2.0/log/mydb2013/client") = 0

2896/1: 14.5741 0.0002 getcwd("/orac/my/ctdb/data/app/oracle/product/10.2.0/log/mydb2013/client", 1024) = 0

2896/1: 14.5742 0.0001 chdir("/orac/my/ctdb/data/app/oracle/product/10.2.0/dbs") = 0

......

2896/1: 149.3333 0.0003 access("/orac/my/ctdb/data/app/oracle/product/10.2.0/log/mydb2013/client/css9069.log", F_OK) = 0

2896/1: 149.3470 0.0137 getcwd("/orac/my/ctdb/data/app/oracle/product/10.2.0/dbs", 1024) = 0

2896/1: 149.3471 0.0001 chdir("/orac/my/ctdb/data/app/oracle/product/10.2.0/log/mydb2013/client") = 0

2896/1: 149.3473 0.0002 getcwd("/orac/my/ctdb/data/app/oracle/product/10.2.0/log/mydb2013/client", 1024) = 0

2896/1: 149.3474 0.0001 chdir("/orac/my/ctdb/data/app/oracle/product/10.2.0/dbs") = 0

2896/1: 149.3478 0.0004 access("/orac/my/ctdb/data/app/oracle/product/10.2.0/log/mydb2013/client/css9070.log", F_OK) = 0

2896/1: 149.3616 0.0138 getcwd("/orac/my/ctdb/data/app/oracle/product/10.2.0/dbs", 1024) = 0

2896/1: 149.3618 0.0002 chdir("/orac/my/ctdb/data/app/oracle/product/10.2.0/log/mydb2013/client") = 0

2896/1: 149.3619 0.0001 getcwd("/orac/my/ctdb/data/app/oracle/product/10.2.0/log/mydb2013/client", 1024) = 0

2896/1: 149.3620 0.0001 chdir("/orac/my/ctdb/data/app/oracle/product/10.2.0/dbs") = 0

2896/1: 149.3623 0.0003 access("/orac/my/ctdb/data/app/oracle/product/10.2.0/log/mydb2013/client/css9071.log", F_OK) = 0

2896/1: 149.3763 0.0140 getcwd("/orac/my/ctdb/data/app/oracle/product/10.2.0/dbs", 1024) = 0

2896/1: 149.3765 0.0002 chdir("/orac/my/ctdb/data/app/oracle/product/10.2.0/log/mydb2013/client") = 0

2896/1: 149.3766 0.0001 getcwd("/orac/my/ctdb/data/app/oracle/product/10.2.0/log/mydb2013/client", 1024) = 0

2896/1: 149.3767 0.0001 chdir("/orac/my/ctdb/data/app/oracle/product/10.2.0/dbs") = 0

2896/1: 149.3770 0.0003 access("/orac/my/ctdb/data/app/oracle/product/10.2.0/log/mydb2013/client/css9072.log", F_OK) = 0 //<==读css9072.log

2896/1: 149.3907 0.0137 getcwd("/orac/my/ctdb/data/app/oracle/product/10.2.0/dbs", 1024) = 0

2896/1: 149.3909 0.0002 chdir("/orac/my/ctdb/data/app/oracle/product/10.2.0/log/mydb2013/client") = 0

2896/1: 149.3910 0.0001 getcwd("/orac/my/ctdb/data/app/oracle/product/10.2.0/log/mydb2013/client", 1024) = 0

2896/1: 149.3911 0.0001 chdir("/orac/my/ctdb/data/app/oracle/product/10.2.0/dbs") = 0

2896/1: 149.3914 0.0003 access("/orac/my/ctdb/data/app/oracle/product/10.2.0/log/mydb2013/client/css9073.log", F_OK) = 0 //<==读css9073.log

2896/1: 149.4051 0.0137 getcwd("/orac/my/ctdb/data/app/oracle/product/10.2.0/dbs", 1024) = 0

...... //<==后面还一直读cssN.log

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

18

19

20

21

22

23

24

25

26

27

28

29

30

31

32

33

34

35

36

37

38

39

40

41

42

43

44

45

46

47

48

49

50

51

52

53

54

55

56

57

58

59

60

61

62

63

64

65

66

67

68

69

70

71

72

truss-o/tmp/mylog.log-laefdD-pxxx

2896/1:14.37030.0000brk(0x10666A310)=0

2896/1:14.37060.0003access("/var/tmp/.oracle",F_OK)=0

2896/1:14.37080.0002chmod("/var/tmp/.oracle",01777)Err#1 EPERM [ALL]

2896/1:14.37090.0001brk(0x10666A310)=0

2896/1:14.37100.0001brk(0x10666E310)=0

2896/1:14.37110.0001so_socket(PF_UNIX,SOCK_STREAM,0,"",SOV_DEFAULT)=14

2896/1:14.37120.0001access("/var/tmp/.oracle/sOCSSD_LL_mydb2013_",F_OK)Err#2 ENOENT

2896/1:14.37130.0001access("/var/tmp/o/sOCSSD_LL_mydb2013_",F_OK)Err#2 ENOENT

2896/1:14.37140.0001close(14)=0

2896/1:14.37160.0002getcwd("/orac/my/ctdb/data/app/oracle/product/10.2.0/dbs",1024)=0

2896/1:14.37180.0002chdir("/orac/my/ctdb/data/app/oracle/product/10.2.0/log/mydb2013/client")=0

2896/1:14.37190.0001getcwd("/orac/my/ctdb/data/app/oracle/product/10.2.0/log/mydb2013/client",1024)=0

2896/1:14.37210.0002chdir("/orac/my/ctdb/data/app/oracle/product/10.2.0/dbs")=0

2896/1:14.37220.0001brk(0x10666E310)=0

2896/1:14.37230.0001brk(0x106672310)=0

2896/1:14.37240.0001brk(0x106672310)=0

2896/1:14.37250.0001brk(0x106692310)=0

2896/1:14.37300.0005access("/orac/my/ctdb/data/app/oracle/product/10.2.0/log/mydb2013/client/css.log",F_OK)=0//<==从css.log开始遍历起

2896/1:14.42760.0546getcwd("/orac/my/ctdb/data/app/oracle/product/10.2.0/dbs",1024)=0

2896/1:14.42780.0002chdir("/orac/my/ctdb/data/app/oracle/product/10.2.0/log/mydb2013/client")=0

2896/1:14.42790.0001getcwd("/orac/my/ctdb/data/app/oracle/product/10.2.0/log/mydb2013/client",1024)=0

2896/1:14.42820.0003chdir("/orac/my/ctdb/data/app/oracle/product/10.2.0/dbs")=0

2896/1:14.42870.0005access("/orac/my/ctdb/data/app/oracle/product/10.2.0/log/mydb2013/client/css1.log",F_OK)=0//<==读css1.log

2896/1:14.47160.0429getcwd("/orac/my/ctdb/data/app/oracle/product/10.2.0/dbs",1024)=0

2896/1:14.47190.0003chdir("/orac/my/ctdb/data/app/oracle/product/10.2.0/log/mydb2013/client")=0

2896/1:14.47200.0001getcwd("/orac/my/ctdb/data/app/oracle/product/10.2.0/log/mydb2013/client",1024)=0

2896/1:14.47220.0002chdir("/orac/my/ctdb/data/app/oracle/product/10.2.0/dbs")=0

2896/1:14.47250.0003access("/orac/my/ctdb/data/app/oracle/product/10.2.0/log/mydb2013/client/css2.log",F_OK)=0//<==读css2.log

2896/1:14.48700.0145getcwd("/orac/my/ctdb/data/app/oracle/product/10.2.0/dbs",1024)=0

2896/1:14.48720.0002chdir("/orac/my/ctdb/data/app/oracle/product/10.2.0/log/mydb2013/client")=0

2896/1:14.48730.0001getcwd("/orac/my/ctdb/data/app/oracle/product/10.2.0/log/mydb2013/client",1024)=0

2896/1:14.48740.0001chdir("/orac/my/ctdb/data/app/oracle/product/10.2.0/dbs")=0

2896/1:14.48770.0003access("/orac/my/ctdb/data/app/oracle/product/10.2.0/log/mydb2013/client/css3.log",F_OK)=0

2896/1:14.50220.0145getcwd("/orac/my/ctdb/data/app/oracle/product/10.2.0/dbs",1024)=0

2896/1:14.50250.0003chdir("/orac/my/ctdb/data/app/oracle/product/10.2.0/log/mydb2013/client")=0

2896/1:14.50260.0001getcwd("/orac/my/ctdb/data/app/oracle/product/10.2.0/log/mydb2013/client",1024)=0

2896/1:14.50280.0002chdir("/orac/my/ctdb/data/app/oracle/product/10.2.0/dbs")=0

2896/1:14.50310.0003access("/orac/my/ctdb/data/app/oracle/product/10.2.0/log/mydb2013/client/css4.log",F_OK)=0

2896/1:14.55850.0554getcwd("/orac/my/ctdb/data/app/oracle/product/10.2.0/dbs",1024)=0

2896/1:14.55870.0002chdir("/orac/my/ctdb/data/app/oracle/product/10.2.0/log/mydb2013/client")=0

2896/1:14.55890.0002getcwd("/orac/my/ctdb/data/app/oracle/product/10.2.0/log/mydb2013/client",1024)=0

2896/1:14.55900.0001chdir("/orac/my/ctdb/data/app/oracle/product/10.2.0/dbs")=0

2896/1:14.55950.0005access("/orac/my/ctdb/data/app/oracle/product/10.2.0/log/mydb2013/client/css5.log",F_OK)=0

2896/1:14.57380.0143getcwd("/orac/my/ctdb/data/app/oracle/product/10.2.0/dbs",1024)=0

2896/1:14.57390.0001chdir("/orac/my/ctdb/data/app/oracle/product/10.2.0/log/mydb2013/client")=0

2896/1:14.57410.0002getcwd("/orac/my/ctdb/data/app/oracle/product/10.2.0/log/mydb2013/client",1024)=0

2896/1:14.57420.0001chdir("/orac/my/ctdb/data/app/oracle/product/10.2.0/dbs")=0

......

2896/1:149.33330.0003access("/orac/my/ctdb/data/app/oracle/product/10.2.0/log/mydb2013/client/css9069.log",F_OK)=0

2896/1:149.34700.0137getcwd("/orac/my/ctdb/data/app/oracle/product/10.2.0/dbs",1024)=0

2896/1:149.34710.0001chdir("/orac/my/ctdb/data/app/oracle/product/10.2.0/log/mydb2013/client")=0

2896/1:149.34730.0002getcwd("/orac/my/ctdb/data/app/oracle/product/10.2.0/log/mydb2013/client",1024)=0

2896/1:149.34740.0001chdir("/orac/my/ctdb/data/app/oracle/product/10.2.0/dbs")=0

2896/1:149.34780.0004access("/orac/my/ctdb/data/app/oracle/product/10.2.0/log/mydb2013/client/css9070.log",F_OK)=0

2896/1:149.36160.0138getcwd("/orac/my/ctdb/data/app/oracle/product/10.2.0/dbs",1024)=0

2896/1:149.36180.0002chdir("/orac/my/ctdb/data/app/oracle/product/10.2.0/log/mydb2013/client")=0

2896/1:149.36190.0001getcwd("/orac/my/ctdb/data/app/oracle/product/10.2.0/log/mydb2013/client",1024)=0

2896/1:149.36200.0001chdir("/orac/my/ctdb/data/app/oracle/product/10.2.0/dbs")=0

2896/1:149.36230.0003access("/orac/my/ctdb/data/app/oracle/product/10.2.0/log/mydb2013/client/css9071.log",F_OK)=0

2896/1:149.37630.0140getcwd("/orac/my/ctdb/data/app/oracle/product/10.2.0/dbs",1024)=0

2896/1:149.37650.0002chdir("/orac/my/ctdb/data/app/oracle/product/10.2.0/log/mydb2013/client")=0

2896/1:149.37660.0001getcwd("/orac/my/ctdb/data/app/oracle/product/10.2.0/log/mydb2013/client",1024)=0

2896/1:149.37670.0001chdir("/orac/my/ctdb/data/app/oracle/product/10.2.0/dbs")=0

2896/1:149.37700.0003access("/orac/my/ctdb/data/app/oracle/product/10.2.0/log/mydb2013/client/css9072.log",F_OK)=0//<==读css9072.log

2896/1:149.39070.0137getcwd("/orac/my/ctdb/data/app/oracle/product/10.2.0/dbs",1024)=0

2896/1:149.39090.0002chdir("/orac/my/ctdb/data/app/oracle/product/10.2.0/log/mydb2013/client")=0

2896/1:149.39100.0001getcwd("/orac/my/ctdb/data/app/oracle/product/10.2.0/log/mydb2013/client",1024)=0

2896/1:149.39110.0001chdir("/orac/my/ctdb/data/app/oracle/product/10.2.0/dbs")=0

2896/1:149.39140.0003access("/orac/my/ctdb/data/app/oracle/product/10.2.0/log/mydb2013/client/css9073.log",F_OK)=0//<==读css9073.log

2896/1:149.40510.0137getcwd("/orac/my/ctdb/data/app/oracle/product/10.2.0/dbs",1024)=0

......//<==后面还一直读cssN.log

因此,我们判断,在每次查询v$asm_diskgroup的时候,都会在client下生成一个新的cssN.log文件(10.2.0.5才有,其他版本没发现),生成的命名规则是前一个数字加1。因此,生成新的cssN.log文件时,需要遍历整个client目录下的cssN.log文件,才能知道最大的数字是多少,才能生成第N+1的文件。而在client下不断生成大量文件,这个和oracle的一个unpublish bug 6004127 有关(ID 729349.1)。目前没有patch,文档上说解决的方法是用crontab定期清理client下的cssN.log

综上,所以监控软件查询v$asm_diskgroup时,总是会hang住(其实也不是完全hang住,等遍历完client下所有文件,就能跑完了,但是这会随着每次查询,每生成一个cssN.log,变得越来越慢)。当比较多的foglight进程查询v$asm_diskgroup就会堵住。

解决方案:定期清理$ORACLE_HOME/log/hostname/client下的文件,或者升级数据库到10.2.0.5以上,或者更改代码,不查询v$asm_diskgroup。

ps:当晚客户清理了目录,测试查询v$asm_diskgroup,不到1秒就出结果了。

  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值