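I got very little done this week; these days I am idle most of the time. I have not kept up with ORACLE or .NET either, and my grasp of both is still half-baked. A while back I was quite keen on learning ORACLE, but I lacked the drive to dig deep and the persistence to stick with it. Later I had to build something with WEB.NET, so I followed the MSDN examples, looked things up here and there, and finally managed to deliver. Even so, all I really learned was news publishing and file upload/download, plus a little about DATAGRID, which really is a nice control. This week I read some of 《行政》 (Administration) and ran a routine inspection of the hosts, which turned up a few problems, so here I will write down what I found and how I handled it.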
1. When I got to the database servers, I found that the load on both machines was somewhat high. Here is the top output:
System: jtcwdb1 Thu Jul 28 10:49:31 2005
Load averages: 0.51, 0.52, 0.53
266 processes: 233 sleeping, 31 running, 2 zombies
Cpu states:
CPU LOAD USER NICE SYS IDLE BLOCK SWAIT INTR SSYS
0 0.49 23.7% 0.0% 20.5% 55.8% 0.0% 0.0% 0.0% 0.0%
1 0.57 30.9% 0.0% 31.9% 37.3% 0.0% 0.0% 0.0% 0.0%
2 0.49 47.7% 0.0% 12.6% 39.7% 0.0% 0.0% 0.0% 0.0%
3 0.51 35.1% 0.0% 17.9% 47.0% 0.0% 0.0% 0.0% 0.0%
--- ---- ----- ----- ----- ----- ----- ----- ----- -----
avg 0.51 34.3% 0.0% 20.7% 45.0% 0.0% 0.0% 0.0% 0.0%
Memory: 3300248K (2132824K) real, 5175308K (3415928K) virtual, 222336K free Page# 1/30
CPU TTY PID USERNAME PRI NI SIZE RES STATE TIME %WCPU %CPU COMMAND
0 ? 1168 root 152 20 28736K 1860K run 31397:49 98.52 98.34 rpcd
0 ? 12762 oracle 154 20 3922M 10032K sleep 1:42 16.11 16.08 oraclepgjt1
0 ? 7732 oracle 154 20 3916M 5600K sleep 4:49 15.61 15.58 oraclepgjt1
3 ? 12088 oracle 154 20 3916M 5528K sleep 0:05 11.00 10.98 oraclepgjt1
2 ? 14367 oracle 154 20 3916M 5560K sleep 0:13 10.16 10.14 oraclepgjt1
1 ? 10366 oracle 154 20 3915M 4668K sleep 1:49 8.20 8.19 oraclepgjt1
2 ? 8285 oracle 154 20 3917M 6660K sleep 1:52 8.08 8.07 oraclepgjt1
1 ? 27322 oracle 154 20 3916M 5836K sleep 10:40 8.04 8.03 oraclepgjt1
0 ? 27343 oracle 154 20 3917M 6384K sleep 13:41 5.59 5.58 oraclepgjt1
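The load on jtcwdb2 was even higher, above 0.8; it had not been this high since we set up RAC. Looking at the processes, rpcd was using close to 100% of a CPU on both machines. These hosts are not allowed to go down, so I did not dare act rashly and went searching online first.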
rpcd: Remote Procedure Call daemon
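The check described above, spotting the one process that dominates a CPU in a top listing, can be sketched as a small filter. This is only an illustration: the function name hot_procs is hypothetical, and the field positions assume the 13-column HP-UX top layout shown in the listing above (PID in column 3, %CPU in column 12).

```shell
#!/bin/sh
# hot_procs THRESHOLD: read a saved top listing on stdin and print
# "PID USER %CPU COMMAND" for every process at or above THRESHOLD percent.
# Assumption: 13 columns per process line, as in the listing above.
hot_procs() {
    awk -v limit="$1" '
        # Keep only process lines: numeric PID in $3 and numeric %CPU in $12.
        NF >= 13 && $3 ~ /^[0-9]+$/ && $12 ~ /^[0-9.]+$/ && $12 + 0 >= limit + 0 {
            print $3, $4, $12, $13
        }'
}
```

Fed the jtcwdb1 listing above with a threshold of 50 (for example `hot_procs 50 < top.out`, where top.out is a hypothetical file holding the saved output), it prints only the rpcd line.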
Someone had asked the same question:
subject: rpcd daemon using 100% of 1 cpu .......why?
Steven Conkling
Jul 5, 2005 20:25:43 GMT
--------------------------------------------------------------------------------
I see this was discussed in other threads, but I never found a reason or answer if this is
right or wrong?? Should rpcd be using this much CPU? If not, what can be done to correct
it. On a 4 CPU system running HP-UX 11.i rpcd is using almost 100% of 1 cpu. This is the
same across at least 3 of my other HP-UX servers.
Here is the suggested fix:
Jul 6, 2005 05:05:02 GMT unassigned
--------------------------------------------------------------------------------
hi steve
try
Rpcd stop
Rpcd start
regards
Vinod K
Here is how the original poster solved it:
Jul 6, 2005 15:45:27 GMT N/A: Question Author
--------------------------------------------------------------------------------
Thanks for all of your suggestions. I ended up using "kill -9" on the parent process "rpcd"
which has a defunct child process. None of the stop and starts of Rpcd did anything to stop
the process. Although the problem is corrected for now, I still don't understand why it
spun out of control on several servers, while others are o.k. I guess I need to open a call
with HP to find out if this is a known problem and a patch needs to be applied. But, thanks
to all............
The usual way to stop the rpcd process is:
/sbin/init.d/rpcd stop
# ps -ef |grep rpcd
root 1189 1 0 Jan 16 ? 31494:25 /opt/dce/sbin/rpcd
root 17565 17535 0 12:20:31 pts/0 0:00 grep rpcd
# cd /opt/dce/sbin
# rpcd stop
# top
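The rpcd process was still there. With no other option, and after talking it over with colleagues, we figured that since it is only the remote procedure call daemon the impact should be small, so the only choice left was to KILL it.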
# kill -9 1189
I handled JTCWDB2 the same way, and the load dropped afterwards.
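The escalation used here (try a normal stop first, fall back to kill -9 only when the process refuses to go away) can be sketched as a generic helper. This is a minimal sketch, not what was actually run on these hosts: the function name stop_pid and the default wait time are assumptions, and whether rpcd reacts to SIGTERM at all is not something the log above confirms.

```shell
#!/bin/sh
# stop_pid PID [WAIT_SECONDS]: ask a process to exit with SIGTERM and
# send SIGKILL only if it is still alive once the wait has elapsed.
stop_pid() {
    pid=$1
    wait_secs=${2:-5}

    # kill -0 delivers no signal; it only checks that the PID exists.
    kill -0 "$pid" 2>/dev/null || return 0

    kill -TERM "$pid" 2>/dev/null
    i=0
    while [ "$i" -lt "$wait_secs" ]; do
        kill -0 "$pid" 2>/dev/null || return 0
        sleep 1
        i=$((i + 1))
    done

    # Last resort: SIGKILL cannot be caught, so the process gets no cleanup.
    kill -KILL "$pid" 2>/dev/null
}
```

A call such as `stop_pid 1189 10` would give the process ten seconds to exit before it is killed outright.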
2. Five minutes later I found that the load on jtcwdb2 was still abnormal. Here is the top output:
System: jtcwdb2 Thu Jul 28 14:46:26 2005
Load averages: 0.88, 0.91, 0.85
243 processes: 208 sleeping, 34 running, 1 zombie
Cpu states:
CPU LOAD USER NICE SYS IDLE BLOCK SWAIT INTR SSYS
0 0.85 42.5% 0.0% 31.8% 25.6% 0.0% 0.0% 0.0% 0.0%
1 0.96 57.9% 0.0% 22.5% 19.7% 0.0% 0.0% 0.0% 0.0%
2 0.86 36.2% 0.0% 30.0% 33.8% 0.0% 0.0% 0.0% 0.0%
3 0.85 33.3% 0.0% 42.4% 24.3% 0.0% 0.0% 0.0% 0.0%
--- ---- ----- ----- ----- ----- ----- ----- ----- -----
avg 0.88 42.5% 0.0% 31.6% 25.8% 0.0% 0.0% 0.0% 0.0%
Memory: 3627240K (2080452K) real, 5601848K (3607244K) virtual, 135464K free Page# 1/27
CPU TTY PID USERNAME PRI NI SIZE RES STATE TIME %WCPU %CPU COMMAND
0 ? 13700 oracle 241 20 3901M 11036K run 61:37 96.32 96.15 oraclepgjt2
0 ? 29327 oracle 158 20 1063M 427M sleep 4157:27 11.92 11.90 tnslsnr
1 ? 15777 oracle 154 20 3899M 8660K sleep 2:35 10.23 10.21 oraclepgjt2
1 ? 5244 root 152 20 455M 426M run 2652:17 8.56 8.54 caiUxOs
0 ? 19288 oracle 154 20 3892M 4688K sleep 3:17 6.74 6.73 oraclepgjt2
0 ? 27128 oracle 154 20 3895M 5264K sleep 0:02 6.48 6.46 oraclepgjt2
0 ? 26950 oracle 198 20 3896M 6772K run 0:08 3.99 3.98 oraclepgjt2
1 ? 18686 oracle 154 20 3898M 8236K sleep 19:33 3.72 3.71 oraclepgjt2
3 ? 18688 oracle 154 20 3894M 5336K sleep 16:48 3.63 3.62 oraclepgjt2
One oraclepgjt2 database connection was misbehaving:
select s.* from v$session s where s.paddr=(select p.addr from v$process p where p.spid='13700');
eygle's script for capturing the related SQL:
Here I used the following script of mine, getsqlbysid:
SELECT sql_text
  FROM v$sqltext a
 WHERE a.hash_value = (SELECT sql_hash_value
                         FROM v$session b
                        WHERE b.SID = '&sid')
 ORDER BY piece ASC;
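I used the statement below to trace the SQL of that connection. At first it did return a statement, but the display settings were too narrow to show all of it; after I adjusted them with set and ran it again, no statement came back. Ugh. Perhaps the SGA had already finished processing it while the resources it held were never released, so once again I could only KILL it.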
select a.sql_text from v$sqltext a where a.hash_value=(select s.sql_hash_value from v$session s where s.paddr=(select p.addr from v$process p where p.spid='13700')) order by a.piece;
kill -9 13700
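Going from an OS process ID in top to the SQL that session is running can be wrapped in a small helper that prints the query for a given PID. This is a sketch under assumptions: the function name sql_for_spid is hypothetical, and it assumes a dedicated server process, where v$process.addr matches v$session.paddr and v$process.spid holds the OS process ID.

```shell
#!/bin/sh
# sql_for_spid SPID: print a query that returns the SQL text currently
# associated with the Oracle session served by OS process SPID.
sql_for_spid() {
    # Accept digits only, so the value can be embedded in the query safely.
    case "$1" in
        ''|*[!0-9]*) echo "usage: sql_for_spid <os-pid>" >&2; return 1 ;;
    esac
    cat <<EOF
SELECT t.sql_text
  FROM v\$process p, v\$session s, v\$sqltext t
 WHERE p.spid = '$1'
   AND s.paddr = p.addr
   AND t.address = s.sql_address
   AND t.hash_value = s.sql_hash_value
 ORDER BY t.piece;
EOF
}
```

The output can be piped into SQL*Plus, for example `sql_for_spid 13700 | sqlplus -s "/ as sysdba"`, assuming OS authentication is set up for the oracle user. Capturing the result before issuing kill -9 keeps a record of what the session was doing.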
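Since I have not had the HP training yet, there are many problems I cannot handle. The training trip will probably happen in August. I would really love to go, and I hope I am picked.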