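I'm having some issues with a Java process and NRPE checks. We have a few processes that sometimes use 1000% CPU on a 32-core system. The system stays pretty responsive until you do a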
ps aux
or try to do anything in /proc/pid#, like:
[root@flume07.domain.com /proc/18679]# ls
hangs..
A snippet of the strace of ps aux:
stat("/etc/localtime",{st_mode=S_IFREG|0644,st_size=2819,...}) = 0
stat("/etc/localtime",...}) = 0
stat("/dev/pts1",0x7fffb8526f00) = -1 ENOENT (No such file or directory)
stat("/dev/pts",{st_mode=S_IFDIR|0755,st_size=0,...}) = 0
readlink("/proc/15693/fd/2","/dev/pts/1",127) = 10
stat("/dev/pts/1",{st_mode=S_IFCHR|0620,st_rdev=makedev(136,1),...}) = 0
write(1,"root 15693 15692 0 06:25 pt"...,55root 15693 15692 0 06:25 pts/1 00:00:00 ps -Af
) = 55
stat("/proc/18679",{st_mode=S_IFDIR|0555,...}) = 0
open("/proc/18679/stat",O_RDONLY) = 5
read(5,"18679 (java) S 1 18662 3738 3481"...,1023) = 264
close(5) = 0
open("/proc/18679/status",O_RDONLY) = 5
read(5,"Name:\tjava\nState:\tS (sleeping)\nT"...,1023) = 889
close(5) = 0
open("/proc/18679/cmdline",O_RDONLY) = 5
read(5,
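The strace output stops inside read() on /proc/18679/cmdline. The stat and status files for the same PID returned immediately. The minimal sketch below can confirm which /proc file blocks for a given PID. It assumes Python 3 is available on the box. The function name `read_with_timeout` and the 5-second timeout are illustrative choices and do not come from the original checks. Each read runs in a daemon thread, and the caller stops waiting once the timeout expires.

```python
import sys
import threading


def read_with_timeout(path, timeout):
    """Read a file in a daemon thread; return its bytes, or None if the
    read has not finished within `timeout` seconds."""
    result = {}

    def worker():
        try:
            with open(path, "rb") as f:
                result["data"] = f.read()
        except OSError as exc:
            result["error"] = exc

    t = threading.Thread(target=worker, daemon=True)
    t.start()
    t.join(timeout)
    if t.is_alive():
        # Still blocked (possibly inside the kernel); stop waiting for it.
        return None
    if "error" in result:
        raise result["error"]
    return result["data"]


if __name__ == "__main__":
    # Hypothetical usage: python3 readproc.py 18679
    pid = sys.argv[1] if len(sys.argv) > 1 else "self"
    for name in ("stat", "status", "cmdline"):
        path = "/proc/%s/%s" % (pid, name)
        data = read_with_timeout(path, 5.0)
        print(path, "TIMED OUT" if data is None else "%d bytes" % len(data))
```

This only reports which file blocks; it does not get around the blocking. If a thread is stuck in an uninterruptible kernel read, it cannot be cancelled. The script may therefore not fully exit until that read returns.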
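The Java process is running and completes just fine. The problem is that it makes our monitoring go crazy, because the checks time out waiting for ps aux to finish.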
I've tried things like
nice -19 ionice -c1 /usr/lib64/nagios/plugins/check_procs -w 1:1 -c 1:1 -a 'diamond' -u root -t 30
with no luck.
EDIT
System specs
> 32 core Intel(R) Xeon(R) CPU E5-2650 0 @ 2.00GHz
> 128 GB of RAM
> 12 x 4 TB 7200 RPM drives
> CentOS 6.5
> I'm not sure of the model, but the vendor is SuperMicro
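The load when this happens is around 90-160ish (1-minute load average).
The odd thing is that I can go into any other /proc/pid# and it works just fine. The system is responsive when I ssh in. For example, when we get alerted for high load, I can log in and check right away.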
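On Linux, the load average counts threads in uninterruptible sleep (state D) as well as runnable ones (state R). A load of 90-160 on a 32-core box therefore does not necessarily mean the CPUs are saturated. The minimal sketch below, which assumes Python 3, shows which of the two is driving the load. It tallies thread states by reading only /proc/<pid>/task/<tid>/stat. That is the same kind of stat file that returned immediately in the strace output, and the script never touches cmdline. The function names are illustrative.

```python
import os
from collections import Counter


def parse_state(stat_line):
    """Return the state letter from a /proc/.../stat line.

    The comm field is wrapped in parentheses and may itself contain spaces
    or ')', so split on the last ')' rather than on whitespace.
    """
    after_comm = stat_line.rpartition(")")[2]
    return after_comm.split()[0]


def task_states(proc="/proc"):
    """Count threads by state (R, S, D, ...) across all processes."""
    counts = Counter()
    for pid in os.listdir(proc):
        if not pid.isdigit():
            continue
        task_dir = os.path.join(proc, pid, "task")
        try:
            tids = os.listdir(task_dir)
        except OSError:
            continue  # process exited between the two listdir calls
        for tid in tids:
            try:
                with open(os.path.join(task_dir, tid, "stat"), "rb") as f:
                    line = f.read().decode("utf-8", "replace")
            except OSError:
                continue  # thread exited before we could read it
            counts[parse_state(line)] += 1
    return counts


if __name__ == "__main__":
    states = task_states()
    print("runnable (R):", states.get("R", 0))
    print("uninterruptible (D):", states.get("D", 0))
    print("all states:", dict(states))
```

If the D count tracks the load spikes, the threads are mostly waiting inside the kernel rather than competing for CPU.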
ANOTHER EDIT
I've been using the deadline I/O scheduler:
[root@dn07.domain.com ~]# for i in {a..m}; do cat /sys/block/sd${i}/queue/scheduler; done
noop anticipatory [deadline] cfq
noop anticipatory [deadline] cfq
noop anticipatory [deadline] cfq
noop anticipatory [deadline] cfq
noop anticipatory [deadline] cfq
noop anticipatory [deadline] cfq
noop anticipatory [deadline] cfq
noop anticipatory [deadline] cfq
noop anticipatory [deadline] cfq
noop anticipatory [deadline] cfq
noop anticipatory [deadline] cfq
noop anticipatory [deadline] cfq
noop anticipatory [deadline] cfq
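The loop above prints every available elevator for each disk, with the active one in brackets. The small sketch below reduces that output to one line per disk by extracting the bracketed entry. It assumes Python 3, and the helper names are illustrative.

```python
import glob
import os
import re


def active_scheduler(line):
    """Return the bracketed (active) scheduler from a sysfs scheduler line,
    or None if no entry is bracketed."""
    match = re.search(r"\[([^\]]+)\]", line)
    return match.group(1) if match else None


def schedulers(pattern="/sys/block/sd*/queue/scheduler"):
    """Map each block device name to its active I/O scheduler."""
    result = {}
    for path in sorted(glob.glob(pattern)):
        device = path.split(os.sep)[3]  # /sys/block/<device>/queue/scheduler
        with open(path) as f:
            result[device] = active_scheduler(f.read())
    return result


if __name__ == "__main__":
    for device, sched in schedulers().items():
        print(device, sched)
```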
Mounts look like:
[root@dn07.manage.com ~]# mount
/dev/sda3 on / type ext4 (rw,noatime,barrier=0)
proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw)
devpts on /dev/pts type devpts (rw,gid=5,mode=620)
tmpfs on /dev/shm type tmpfs (rw)
/dev/sda1 on /boot type ext2 (rw)
none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
/dev/sdb1 on /disk1 type xfs (rw,nobarrier)
/dev/sdc1 on /disk2 type xfs (rw,nobarrier)
/dev/sdd1 on /disk3 type xfs (rw,nobarrier)
/dev/sde1 on /disk4 type xfs (rw,nobarrier)
/dev/sdf1 on /disk5 type xfs (rw,nobarrier)
/dev/sdg1 on /disk6 type xfs (rw,nobarrier)
/dev/sdh1 on /disk7 type xfs (rw,nobarrier)
/dev/sdi1 on /disk8 type xfs (rw,nobarrier)
/dev/sdj1 on /disk9 type xfs (rw,nobarrier)
/dev/sdk1 on /disk10 type xfs (rw,nobarrier)
/dev/sdl1 on /disk11 type xfs (rw,nobarrier)
/dev/sdm1 on /disk12 type xfs (rw,nobarrier)
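In the mount output, every data disk is XFS mounted with nobarrier, and the root ext4 filesystem uses barrier=0. The minimal sketch below, which assumes Python 3, lists mounts that have write barriers disabled by reading /proc/mounts. /proc/mounts is the kernel's view, and its option spelling can differ from the `mount` output above, so the sketch matches both nobarrier and barrier=0.

```python
def barrier_disabled(options):
    """True if a comma-separated mount option string disables write barriers."""
    opts = options.split(",")
    return "nobarrier" in opts or "barrier=0" in opts


def mounts_without_barriers(lines):
    """Yield (device, mountpoint, fstype) for entries in /proc/mounts format
    ("device mountpoint fstype options dump pass") that disable barriers."""
    for line in lines:
        fields = line.split()
        if len(fields) >= 4 and barrier_disabled(fields[3]):
            yield fields[0], fields[1], fields[2]


if __name__ == "__main__":
    with open("/proc/mounts") as f:
        for device, mountpoint, fstype in mounts_without_barriers(f):
            print(device, mountpoint, fstype)
```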
OK, I tried installing tuned and setting it to throughput-performance.
[root@dn07.domain.com ~]# tuned-adm profile throughput-performance
Switching to profile 'throughput-performance'
Applying deadline elevator: sda sdb sdc sdd sde sdf sdg sdh[ OK ] sdk sdl sdm
Applying ktune sysctl settings:
/etc/ktune.d/tunedadm.conf: [ OK ]
Calling '/etc/ktune.d/tunedadm.sh start': [ OK ]
Applying sysctl settings from /etc/sysctl.d/99-chef-attributes.conf
Applying sysctl settings from /etc/sysctl.conf
Starting tuned: [ OK ]