转载 :一个Job运行失败导致数据库挂死

最新推荐文章于 2021-04-06 22:13:09 发布

weixin_33868027

最新推荐文章于 2021-04-06 22:13:09 发布

阅读量293

点赞数

文章标签：数据库 awk 操作系统

原文链接：http://blog.51cto.com/snowwolfsas/982711

版权

jobssteve的博客

今天上午10点多的时候，同事接到一个电话，某数据库任何连接都连不上数据库，登录主机后发现，该数据库已经挂死，sqlplus都无法登陆，在alertlog中发现大量的“PMON failed to acquire latch, see PMON dump”。无奈之下，杀掉了oracle的进程，重启了数据库。

事后，我们来看看究竟是什么原因，造成了这次数据库的挂死。

我们看alertlog的相关报错，我们发现“PMON failed to acquire latch, see PMON dump”5月4日的10：32，而这个报错发生在之前，还有一个报错，在5月2日的4：51还有一个“>>> WAITED TOO LONG FOR A ROW CACHE ENQUEUE LOCK! pid=11”

……
Sun May? 1 12:00:16 2011
Thread 1 advanced to log sequence 1940
? Current log# 1 seq# 1940 mem# 0: /u08/IGP2TCACAC0/IGP2TCAC/redo01.log
Mon May? 2 03:39:57 2011
Starting control autobackup
Control autobackup written to DISK device
?? ?handle '/u08/IGP2TCACAC0/flash_recovery_area/IGP2TCACAC0/autobackup/2011_05_02/o1_mf_s_750051597_6vvfrh25_.bkp'
Mon May? 2 04:51:03 2011
>>> WAITED TOO LONG FOR A ROW CACHE ENQUEUE LOCK! pid=11
System State dumped to trace file /u04/admin/IGP2TCACAC0/bdump/igp2tcacac0_mmon_811148.trc
Wed May? 4 10:32:32 2011
PMON failed to acquire latch, see PMON dump
Wed May? 4 10:33:32 2011
PMON failed to acquire latch, see PMON dump
Wed May? 4 10:34:33 2011
PMON failed to acquire latch, see PMON dump
Wed May? 4 10:35:34 2011
PMON failed to acquire latch, see PMON dump
Wed May? 4 10:36:35 2011
PMON failed to acquire latch, see PMON dump
Wed May? 4 10:37:35 2011
PMON failed to acquire latch, see PMON dump
Wed May? 4 10:38:36 2011
PMON failed to acquire latch, see PMON dump
Wed May? 4 10:39:37 2011
PMON failed to acquire latch, see PMON dump
Wed May? 4 10:40:37 2011
PMON failed to acquire latch, see PMON dump
Wed May? 4 10:43:06 2011
PMON failed to acquire latch, see PMON dump
Wed May? 4 10:44:07 2011
PMON failed to acquire latch, see PMON dump
Wed May? 4 10:45:08 2011
PMON failed to acquire latch, see PMON dump
Wed May? 4 10:46:09 2011
PMON failed to acquire latch, see PMON dump
Wed May? 4 10:47:10 2011
PMON failed to acquire latch, see PMON dump
Wed May? 4 10:48:10 2011
PMON failed to acquire latch, see PMON dump
Wed May? 4 10:49:11 2011
PMON failed to acquire latch, see PMON dump
Wed May? 4 10:50:12 2011
PMON failed to acquire latch, see PMON dump
Wed May? 4 10:51:13 2011
PMON failed to acquire latch, see PMON dump
Wed May? 4 10:53:42 2011
PMON failed to acquire latch, see PMON dump
Wed May? 4 10:54:42 2011
PMON failed to acquire latch, see PMON dump
Wed May? 4 10:55:43 2011
PMON failed to acquire latch, see PMON dump
Wed May? 4 10:56:44 2011
PMON failed to acquire latch, see PMON dump
Wed May? 4 10:57:45 2011
PMON failed to acquire latch, see PMON dump
Wed May? 4 10:58:45 2011
PMON failed to acquire latch, see PMON dump
Wed May? 4 10:59:46 2011
PMON failed to acquire latch, see PMON dump
Wed May? 4 11:00:47 2011
PMON failed to acquire latch, see PMON dump
Wed May? 4 11:01:47 2011
PMON failed to acquire latch, see PMON dump
Wed May? 4 11:04:16 2011
PMON failed to acquire latch, see PMON dump
Wed May? 4 11:05:17 2011
PMON failed to acquire latch, see PMON dump
Wed May? 4 11:06:18 2011
PMON failed to acquire latch, see PMON dump
Wed May? 4 11:07:19 2011
PMON failed to acquire latch, see PMON dump
Wed May? 4 11:08:19 2011
PMON failed to acquire latch, see PMON dump
Wed May? 4 11:09:20 2011
PMON failed to acquire latch, see PMON dump
Wed May? 4 11:10:21 2011
PMON failed to acquire latch, see PMON dump
Wed May? 4 11:11:21 2011
PMON failed to acquire latch, see PMON dump
Wed May? 4 11:12:22 2011
……

这是一个很重要的线索，因为这个报错，我们看出为什么当时同事在处理的时候，连sqlplus都登录不上了，因为sqlplus的登录需要row cache的latch，如果获取不到，就会长时间等待，出现挂死。

那么，为什么会出现WAITED TOO LONG FOR A ROW CACHE ENQUEUE LOCK，我们从alertlog中发现，当出现WAITED TOO LONG FOR A ROW CACHE ENQUEUE LOCK告警的时候，oracle还dump出来一个system state：/u04/admin/IGP2TCACAC0/bdump/igp2tcacac0_mmon_811148.trc（点击这里可以下载该trace file）

system state dump里面包含很多信息，包含每个进程的情况，但是这个文件往往比较大，对于初步分析比较困难了。幸好我们还有一个ass的工具可以帮助我们看system state dump（点击这里下载windows版的ass109.awk）。

我们通过ass先来看看当时的系统情况：

E:\budmp>awk -f ass109.awk igp2tcacac0_mmon_811148.trc
?
Starting Systemstate 1
................................
Ass.Awk Version 1.0.9 - Processing igp2tcacac0_mmon_811148.trc
?
System State 1
~~~~~~~~~~~~~~~~
1:
2:? waiting for 'pmon timer'? ? ? ? ? ? wait
3:? waiting for 'rdbms ipc message'? ? ?wait
4:? waiting for 'rdbms ipc message'? ? ?wait
5:? waiting for 'rdbms ipc message'? ? ?wait
6:? waiting for 'rdbms ipc message'? ? ?wait
7:? waiting for 'control file parallel write' wait
8:? waiting for 'latch: library cache' [Latch 7000000239479a0] wait
9:? waiting for 'rdbms ipc message'? ? ?wait
10: waiting for 'rdbms ipc message'? ? ?wait
11: last wait for 'ksdxexeotherwait'? ?[Rcache object=7000000259b04c0,]
12: waiting for 'rdbms ipc message'? ? ?wait
13:
14:
15: last wait for 'latch: library cache'[Latch 7000000239479a0]
16: last wait for 'latch: library cache'[Latch 7000000239479a0]
17:
18: last wait for 'latch: library cache'[Latch 7000000239479a0]
19: waiting for 'rdbms ipc message'? ? ?wait
20: waiting for 'rdbms ipc message'? ? ?wait
21: waiting for 'Streams AQ: qmn coordinator idle wait' wait
22: waiting for 'SQL*Net message from client' wait
23: waiting for 'SQL*Net message from client' wait
24: waiting for 'SQL*Net message from client' wait
25: last wait for 'latch: library cache'[Latch 7000000239479a0]
26: last wait for 'latch: library cache'[Latch 7000000239479a0]
27: waiting for 'Streams AQ: qmn slave idle wait' wait
28: last wait for 'latch: library cache'[Latch 7000000239479a0]
29: last wait for 'latch: library cache'[Latch 7000000239479a0]
30: last wait for 'latch: library cache'[Latch 7000000239479a0]
31: last wait for 'latch: library cache'[Latch 7000000239479a0]
32: last wait for 'latch: library cache'[Latch 7000000239479a0]
Blockers
~~~~~~~~
?
?? ? ? ?Above is a list of all the processes. If they are waiting for a resource
?? ? ? ?then it will be given in square brackets. Below is a summary of the
?? ? ? ?waited upon resources, together with the holder of that resource.
?? ? ? ?Notes:
?? ? ? ?~~~~~
?? ? ? ? o A process id of '???' implies that the holder was not found in the
?? ? ? ? ? systemstate.
?
?? ? ? ? ? ? ? ? ? ?Resource Holder State
?? ? ? Latch 7000000239479a0? ? ??? Blocker
Rcache object=7000000259b04c0,? ? 25: 25: is waiting for Latch 7000000239479a0
?
Object Names
~~~~~~~~~~~~
Latch 7000000239479a0? ?Child library cache
Rcache object=7000000259b04c0,
?
?
18724 Lines Processed.

从Resource Holder State部分，我们看到，有一个blocker，该blocker的latch是7000000239479a0。而且是这个latcher的holder，造成了其他的process，如process 15，16，25，26等等的last wait for ‘latch: library cache’。

我们从这个system state dump的原始文件中去看latch为7000000239479a0是什么，搜索7000000239479a0+holding：

PROCESS 17:
? ----------------------------------------
? SO: 700000025e5ef20, type: 2, owner: 0, flag: INIT/-/-/0x00
? (process) Oracle pid=17, calls cur/top: 70000002267efb0/700000025f9edc0, flag: (0) -
?? ? ? ? ? ?int error: 0, call error: 0, sess error: 0, txn error 0
? (post info) last post received: 0 0 0
?? ? ? ? ? ? ?last post received-location: No post
?? ? ? ? ? ? ?last process to post me: none
?? ? ? ? ? ? ?last post sent: 0 0 24
?? ? ? ? ? ? ?last post sent-location: ksasnd
?? ? ? ? ? ? ?last process posted by me: 700000025e59880 1 6
?? ?(latch info) wait_event=0 bits=20
?? ? ?holding? ? (efd=33) 7000000239479a0 Child library cache level=5 child#=3
?? ? ? ?Location from where latch is held: kglLockCursor:
?? ? ? ?Context saved from call: 2
?? ? ? ?state=busy, wlstate=free
?? ? ? ? ?waiters [orapid (seconds since: put on list, posted, alive check)]:
?? ? ? ? ? 15 (4311, 1304283076, 3)
?? ? ? ? ? 26 (4293, 1304283076, 3)
?? ? ? ? ? 16 (4263, 1304283076, 3)
?? ? ? ? ? 8 (4179, 1304283076, 3)
?? ? ? ? ? 18 (3153, 1304283076, 3)
?? ? ? ? ? 25 (3019, 1304283076, 3)
?? ? ? ? ? 28 (2811, 1304283076, 3)
?? ? ? ? ? 29 (1770, 1304283076, 3)
?? ? ? ? ? 30 (1770, 1304283076, 3)
?? ? ? ? ? 31 (390, 1304283076, 3)
?? ? ? ? ? 32 (390, 1304283076, 3)
?? ? ? ? ? waiter count=11
?? ?Process Group: DEFAULT, pseudo proc: 700000025ea17a8
?? ?O/S info: user: oracle1, term: UNKNOWN, ospid: 2592926
?? ?OSD pid info: Unix process pid: 2592926, p_w_picpath: oracle@sg2as059 (J000)

我们看到hold住latch 7000000239479a0是oracle@sg2as059 (J000)进程，也就是job的进程。也就说，由于这个j000进程的异常，hold住了7000000239479a0 的latch。

我们同时也看到job的cjq0进程有一个trace文件产生：

……
*** 2011-05-02 03:36:41.482
Waited for process J000 to initialize for 60 seconds
*** 2011-05-02 03:36:41.672
Dumping diagnostic information for J000:
OS pid = 2592926
loadavg : 0.64 1.00 1.22
swap info: free_mem = 83.24M rsv = 32.00M
?? ? ? ? ? alloc = 3498.95M avail = 8192.00M swap_free = 4693.05M
?? ? ? F S? ? ? UID? ? ?PID? ? PPID? ?C PRI NI ADDR? ? SZ? ? WCHAN? ? STIME? ? TTY? TIME CMD
? 240001 A? oracle1 2592926? ? ? ?1? ?0? 60 20 d90bf510 90668? ? ? ? ? 03:35:41? ? ? -? 0:00 ora_j000_IGP2TCACAC0
open: Permission denied
Warning: executed in non-root mode
procstack cannot verify that /unix matches the running kernel.
Kernel symbols might not be validated.
skgpgpstack: read() for cmd /bin/sh -c '/usr/bin/procstack 2592926 2>&1' timed out after 60 seconds

这里也可以看到，j000死掉之后，也无法释放掉，操作系统无法对其做procstack。

进一步的，我们看到，在类似的数据库重启前，都有一个cjq0的trace文件。我们初步判断，这个job在某些时候，会死掉且不能释放，而且，最后都是通过杀进程，重启database来解决的。

Errors in file /u04/admin/IGP2TCACAC0/bdump/igp2tcacac0_cjq0_864422.trc:
?
Wed Dec? 1 20:32:00 2010
Thread 1 advanced to log sequence 1508
? Current log# 1 seq# 1508 mem# 0: /u08/IGP2TCACAC0/IGP2TCAC/redo01.log
Thu Dec? 2 04:01:02 2010
Thread 1 advanced to log sequence 1509
? Current log# 2 seq# 1509 mem# 0: /u08/IGP2TCACAC0/IGP2TCAC/redo02.log
Thu Dec? 2 22:01:37 2010
Thread 1 advanced to log sequence 1510
? Current log# 3 seq# 1510 mem# 0: /u08/IGP2TCACAC0/IGP2TCAC/redo03.log
Fri Dec? 3 01:23:34 2010
Thread 1 advanced to log sequence 1511
? Current log# 1 seq# 1511 mem# 0: /u08/IGP2TCACAC0/IGP2TCAC/redo01.log
Fri Dec? 3 02:02:45 2010
Thread 1 advanced to log sequence 1512
? Current log# 2 seq# 1512 mem# 0: /u08/IGP2TCACAC0/IGP2TCAC/redo02.log
Fri Dec? 3 09:27:57 2010
Starting background process EMN0
EMN0 started with pid=22, OS id=1298498
Fri Dec? 3 09:27:57 2010
Shutting down instance: further logons disabled
Fri Dec? 3 09:27:58 2010
Stopping background process CJQ0
Fri Dec? 3 09:27:58 2010
Stopping background process QMNC
Fri Dec? 3 09:28:03 2010
Stopping background process MMNL
Fri Dec? 3 09:28:05 2010
Stopping background process MMON
Fri Dec? 3 09:28:08 2010
Shutting down instance (immediate)
License high water mark = 8
Fri Dec? 3 09:28:08 2010
Stopping Job queue slave processes
Fri Dec? 3 09:28:08 2010
Job queue slave processes stopped
Waiting for dispatcher 'D000' to shutdown
Waiting for shared server 'S000' to die
All dispatchers and shared servers shutdown
Fri Dec? 3 09:28:18 2010
Process OS id : 1556568 alive after kill
Errors in file
Fri Dec? 3 09:28:20 2010
ALTER DATABASE CLOSE NORMAL
Fri Dec? 3 09:28:21 2010
SMON: disabling tx recovery
SMON: disabling cache recovery
Fri Dec? 3 09:28:34 2010
Shutting down archive processes
Archiving is disabled
Fri Dec? 3 09:28:39 2010
ARCH shutting down
ARC1: Archival stopped
Fri Dec? 3 09:28:44 2010
ARCH shutting down
ARC0: Archival stopped
Fri Dec? 3 09:28:48 2010
Thread 1 closed at log sequence 1512
Successful close of redo thread 1
Fri Dec? 3 09:29:32 2010
Completed: ALTER DATABASE CLOSE NORMAL
Fri Dec? 3 09:29:32 2010
ALTER DATABASE DISMOUNT
Completed: ALTER DATABASE DISMOUNT
ARCH: Archival disabled due to shutdown: 1089
Shutting down archive processes
Archiving is disabled
Archive process shutdown avoided: 0 active
ARCH: Archival disabled due to shutdown: 1089
Shutting down archive processes
Archiving is disabled
Archive process shutdown avoided: 0 active
Fri Dec? 3 09:46:29 2010
Starting ORACLE instance (normal)
……
Errors in file /u04/admin/IGP2TCACAC0/bdump/igp2tcacac0_ora_1269980.trc:
ORA-00610: Internal error code
Thu Dec 30 19:21:31 2010
Process J001 died, see its trace file
Thu Dec 30 19:21:31 2010
kkjcre1p: unable to spawn jobq slave process
Thu Dec 30 19:21:31 2010
Errors in file /u04/admin/IGP2TCACAC0/bdump/igp2tcacac0_cjq0_778370.trc:
?
Thu Dec 30 20:02:17 2010
Thread 1 advanced to log sequence 1573
? Current log# 3 seq# 1573 mem# 0: /u08/IGP2TCACAC0/IGP2TCAC/redo03.log
Thu Dec 30 22:25:41 2010
Thread 1 advanced to log sequence 1574
? Current log# 1 seq# 1574 mem# 0: /u08/IGP2TCACAC0/IGP2TCAC/redo01.log
Fri Dec 31 18:42:09 2010
Starting background process EMN0
EMN0 started with pid=21, OS id=725114
Fri Dec 31 18:42:09 2010
Shutting down instance: further logons disabled
Fri Dec 31 18:42:10 2010
Stopping background process CJQ0
Fri Dec 31 18:42:10 2010
Stopping background process QMNC
Fri Dec 31 18:42:14 2010
Stopping background process MMNL
Fri Dec 31 18:42:16 2010
Stopping background process MMON
Fri Dec 31 18:42:21 2010
Shutting down instance (immediate)
License high water mark = 8
Fri Dec 31 18:42:21 2010
Stopping Job queue slave processes
Fri Dec 31 18:42:21 2010
Job queue slave processes stopped
Waiting for dispatcher 'D000' to shutdown
All dispatchers and shared servers shutdown
Fri Dec 31 18:42:29 2010
Process OS id : 1540222 alive after kill
Errors in file
Fri Dec 31 18:42:32 2010
ALTER DATABASE CLOSE NORMAL
Fri Dec 31 18:42:33 2010
SMON: disabling tx recovery
SMON: disabling cache recovery
Fri Dec 31 18:42:40 2010
Shutting down archive processes
Archiving is disabled
Fri Dec 31 18:42:45 2010
ARCH shutting down
ARC1: Archival stopped
Fri Dec 31 18:42:50 2010
ARCH shutting down
ARC0: Archival stopped
Fri Dec 31 18:42:54 2010
Thread 1 closed at log sequence 1574
Successful close of redo thread 1
Fri Dec 31 18:43:40 2010
Completed: ALTER DATABASE CLOSE NORMAL
Fri Dec 31 18:43:40 2010
ALTER DATABASE DISMOUNT
Completed: ALTER DATABASE DISMOUNT
ARCH: Archival disabled due to shutdown: 1089
Shutting down archive processes
Archiving is disabled
Archive process shutdown avoided: 0 active
ARCH: Archival disabled due to shutdown: 1089
Shutting down archive processes
Archiving is disabled
Archive process shutdown avoided: 0 active
Sun Jan? 2 05:15:39 2011
Starting ORACLE instance (normal)
……
kkjcre1p: unable to spawn jobq slave process
Tue Mar? 1 19:16:52 2011
Errors in file /u04/admin/IGP2TCACAC0/bdump/igp2tcacac0_cjq0_720992.trc:
?
Tue Mar? 1 22:03:30 2011
Thread 1 advanced to log sequence 1697
? Current log# 1 seq# 1697 mem# 0: /u08/IGP2TCACAC0/IGP2TCAC/redo01.log
Wed Mar? 2 02:42:10 2011
Starting ORACLE instance (normal)
LICENSE_MAX_SESSION = 0
LICENSE_SESSIONS_WARNING = 0
……

我们进一步来看看是什么job，很幸运，这个系统就只有一个job：

SQL>select*fromdba_jobs;
?
?? ? ?JOBLOG_USER? ? ? ? ? ? ? ? ? ? ? ?PRIV_USER? ? ? ? ? ? ? ? ? ? ?SCHEMA_USER? ? ? ? ? ? ? ? ? ?LAST_DATLAST_SEC? ? ? ? ? ? ? ? ? ? ? ? ?THIS_DATTHIS_SEC? ? ? ? ? ? ? ? ? ? ? ? ?NEXT_DATNEXT_SEC? ? ? ? ? ? ? ? ? ? ? ? ?TOTAL_TIMEBINTERVAL? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?FAILURES
---------- ------------------------------ ------------------------------ ------------------------------ -------- -------------------------------- -------- -------------------------------- -------- -------------------------------- ---------- - -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- ----------
WHAT
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
NLS_ENV
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
MISC_ENV? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?INSTANCE
---------------------------------------------------------------- ----------
?? ? ? ?1SYSMAN? ? ? ? ? ? ? ? ? ? ? ? ?SYSMAN? ? ? ? ? ? ? ? ? ? ? ? ?SYSMAN? ? ? ? ? ? ? ? ? ? ? ? ?05/05/1117:00:26? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?05/05/1117:01:26? ? ? ? ? ? ? ? ? ? ? ? ? ? ?374519Nsysdate+1/(24*60)? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?0
EMD_MAINTENANCE.EXECUTE_EM_DBMS_JOB_PROCS();
NLS_LANGUAGE='AMERICAN'NLS_TERRITORY='FRANCE'NLS_CURRENCY='?'NLS_ISO_CURRENCY='FRANCE'NLS_NUMERIC_CHARACTERS=','NLS_DATE_FORMAT='DD/MM/RR'NLS_DATE_LANGUAGE='AMERICAN'NLS_SORT='BINARY'
0102000000000000? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?0
?
?
SQL>

我们查到metalink中，这个job的用处，What is EMD_MAINTENANCE.EXECUTE_EM_DBMS_JOB_PROCS dbms_job and how to Remove / Re-create it [ID 444033.1]

Applies to:
Enterprise Manager Grid Control - Version: 10.1.0.2 to 11.1.0.7 - Release: 10.1 to 11.1
Information in this document applies to any platform.
Enterprise Manager Database Control - 10.1.0.2 onwards
Goal
?
This document describes the usage of the EMD_MAINTENANCE.EXECUTE_EM_DBMS_JOB_PROCS dbms_job in a 10g and above version database. This also lists the steps to be taken if this job needs to be removed or added back manually.
?
NOTE : The dbms job EMD_MAINTENANCE.EXECUTE_EM_DBMS_JOB_PROCS is present in the database running a DBConsole and the note does not applies to Enterprise Manager Grid Control.
?
Solution
?
If Database Control has been configured for a 10G / 11G version database, then logging in as SYSMAN user and running this query returns :
SQL> select what from dba_jobs;
?
WHAT
------------------------------------------------------
EMD_MAINTENANCE.EXECUTE_EM_DBMS_JOB_PROCS();
?
The EMD_MAINTENANCE.EXECUTE_EM_DBMS_JOB_PROCS job performs all the necessary maintenance tasks for the database control repository. These tasks include :
?
+ Agent Ping Verification (EM_PING.MARK_NODE_STATUS)
+ Job Purge (MGMT_JOB_ENGINE.APPLY_PURGE_POLICIES)
+ Metric Rollup (EMD_LOADER.ROLLUP)
+ Purge Policies (EM_PURGE.APPLY_PURGE_POLICIES)
+ Repository Metric Severity Calculation (EM_SEVERITY_REPOS.EXECUTE_REPOS_SEVERITY_EVAL)
+ Repository Side Collections (EMD_COLLECTION.RUN_COLLECTIONS)
+ Send Notifications
?
This job should be running every minute for performing all the above operations.
?
1. If Database Control is not required, then this job can be removed :
?
- Log in as sysman user on the database
- Note down the value of the job_queue_processes parameter in the database :
SQL> show parameter job_queue_processes
?
- Set the job_queue_processes to 0 and check that there are no dbms_jobs running :
SQL> alter system set job_queue_processes=0;
SQL> select * from dba_jobs_running;
?
Re-run the second sql query a few times, till the output of the query returns "no rows selected".
?
- Login to the database as the SYSMAN user and run the following SQL scripts to submit this job :
SQL> @<ORACLE_HOME>\sysman\admin\emdrep\sql\core\latest\admin\
admin_remove_dbms_jobs.sql;
?
?? OR
?
SQL> exec emd_maintenance.remove_em_dbms_jobs;
SQL> commit;
?
2. If the job has been accidentay removed, follow these steps to re-submit it :
?
- Login to repository as sysman user.
?
- Ensure that the job_queue_processes parameter has been set to a value > 1 :
SQL> show parameter job_queue_processes
?
- To submit the job :
SQL> @<ORACLE_HOME>\sysman\admin\emdrep\sql\core\latest\admin\
admin_submit_dbms_jobs.sql;
?
?? OR
?
SQL> exec emd_maintenance.submit_em_dbms_jobs;
?
- Recompile the invalid objects :
For 10g and 11.1.0.6 Database
SQL> exec emd_maintenance.recompile_invalid_objects;
For 11.1.0.7.0 and above databases:
SQL> exec emd_maint_util.recompile_invalid_objects;
?
- Check that the EMD_MAINTENANCE.EXECUTE_EM_DBMS_JOB_PROCS job has been created in the sysman schema.

在这个系统中，由于我们不用dbconsole，所以我们可以移除这个job，避免问题的再次发生。

转载于:https://blog.51cto.com/snowwolfsas/982711

weixin_33868027

关注

0
点赞
踩
0

收藏

觉得还不错? 一键收藏
0
评论
转载 :一个Job运行失败导致数据库挂死

jobssteve的博客今天上午10点多的时候，同事接到一个电话，某数据库任何连接都连不上数据库，登录主机后发现，该数据库已经挂死，sqlplus都无法登陆，在alertlog中发现大量的“PMON failed to acquire latch, see PMON dump”。无奈之下，杀掉了oracle的进程，重启了数据库。事后，我们来看看究竟是什么原因...
复制链接

扫一扫