linux 性能优化笔记。。。 (1 G内存服务器)

近期服务器性能极不稳定,打开首页速度时快时慢(目前只有一个用户,属于企业内部应用,重点是数据量非常大, 1K~ 100K 之间)

下面是从我的邮件摘录出一部分内容:

1. 访问速度不稳定。时快时慢。

[quote]下面是 今天的 (2012-05-25) 的日志: 可以看到, 前两个操作,耗费的时间非常久。。。

00:50:33 INFO: Completed 200 OK in 270450ms (Views: 1512.9ms | ActiveRecord: 13358.6ms)
01:42:15 INFO: Completed 200 OK in 246410ms (Views: 1429.3ms | ActiveRecord: 10685.1ms)
01:43:35 INFO: Completed 200 OK in 72ms (Views: 71.3ms | ActiveRecord: 0.0ms)
01:43:42 INFO: Completed 200 OK in 1852ms (Views: 1326.3ms | ActiveRecord: 165.6ms)
01:53:41 INFO: Completed 200 OK in 3749ms (Views: 1334.6ms | ActiveRecord: 2063.1ms)
01:54:23 INFO: Completed 200 OK in 13ms (Views: 7.4ms | ActiveRecord: 0.0ms)
01:54:25 INFO: Completed 200 OK in 284ms (Views: 43.3ms | ActiveRecord: 225.3ms)
01:54:31 INFO: Completed 200 OK in 246ms (Views: 42.4ms | ActiveRecord: 200.2ms)
01:54:31 INFO: Completed 200 OK in 477ms (Views: 313.1ms | ActiveRecord: 160.1ms)


另外从昨天的日志(05-24)进行一下分析,可以看到 ip : 172.21.100.1 打开首页,最慢的用了 9,704秒。(2个多小时,详细见附件: log_0524 ) (使用命令:cat log_from_configfun/manually_2012-05-24.log | grep -E 'Completed|GET|POST' > log_0524 )


867 Started GET "/babble_portal/" for 172.21.100.1 at Thu May 24 19:11:54 +0000 2012
868 19:12:13 INFO: Completed 200 OK in 39137ms (Views: 5839.8ms | ActiveRecord: 4412.1ms)
869 Started GET "/babble_portal/device_groups" for 172.21.100.1 at Thu May 24 19:12:25 +0000 2012
870 19:13:45 INFO: Completed 200 OK in 93696ms (Views: 38701.7ms | ActiveRecord: 40828.7ms)
871 Started GET "/babble_portal/device_groups" for 172.21.100.1 at Thu May 24 19:14:42 +0000 2012
872 21:57:41 INFO: Completed 200 OK in 9704339ms (Views: 1176.9ms | ActiveRecord: 76436.0ms)
873 Started GET "/babble_portal/" for 172.21.100.1 at Thu May 24 21:57:43 +0000 2012
874 21:58:20 INFO: Completed 200 OK in 9954427ms (Views: 174.3ms | ActiveRecord: 68178.3ms)
875 Started GET "/babble_portal/" for 172.21.100.1 at Thu May 24 21:58:21 +0000 2012
876 21:58:21 INFO: Completed 200 OK in 38122ms (Views: 1435.4ms | ActiveRecord: 35950.5ms)
877 21:58:45 INFO: Completed 200 OK in 23981ms (Views: 2407.9ms | ActiveRecord: 20904.1ms)


而对于 打开首页的速度,我也做了一个分析:( 见附件: landing_page_log, 使用的命令是: cat log_from_configfun/manually_2012-05-24.log | grep -E 'Completed|GET.*portal\/"' | grep GET -A 1 > landing_page_log ) 可以看出,绿色的部分表示程序运行正常,而红色的部分表示程序运行极慢。。。

49 Started GET "/babble_portal/" for 172.21.100.1 at Thu May 24 18:46:38 +0000 2012
50 18:46:39 INFO: Completed 200 OK in 975ms (Views: 677.1ms | ActiveRecord: 105.7ms)
51 Started GET "/babble_portal/" for 172.21.100.1 at Thu May 24 18:47:04 +0000 2012
52 18:47:05 INFO: Completed 200 OK in 981ms (Views: 685.0ms | ActiveRecord: 98.0ms)
53 --
54 Started GET "/babble_portal/" for 172.21.100.1 at Thu May 24 18:48:45 +0000 2012
55 18:48:47 INFO: Completed 200 OK in 1109ms (Views: 670.0ms | ActiveRecord: 89.7ms)
56 Started GET "/babble_portal/" for 172.21.100.1 at Thu May 24 18:48:51 +0000 2012
57 18:48:51 INFO: Completed 200 OK in 944ms (Views: 656.4ms | ActiveRecord: 99.7ms)
58 --
59 Started GET "/babble_portal/" for 172.21.100.1 at Thu May 24 18:54:40 +0000 2012
60 18:54:42 INFO: Completed 200 OK in 1847ms (Views: 1647.4ms | ActiveRecord: 116.9ms)
61 --
62 Started GET "/babble_portal/" for 172.21.100.1 at Thu May 24 18:55:10 +0000 2012
63 18:55:12 INFO: Completed 200 OK in 1330ms (Views: 1142.7ms | ActiveRecord: 102.4ms)
64 --
65 Started GET "/babble_portal/" for 172.21.100.1 at Thu May 24 18:56:34 +0000 2012
66 18:56:35 INFO: Completed 200 OK in 1095ms (Views: 779.4ms | ActiveRecord: 100.7ms)
67 --
68 Started GET "/babble_portal/" for 172.21.100.1 at Thu May 24 19:11:54 +0000 2012
69 19:12:13 INFO: Completed 200 OK in 39137ms (Views: 5839.8ms | ActiveRecord: 4412.1ms)
70 --
71 Started GET "/babble_portal/" for 172.21.100.1 at Thu May 24 21:57:43 +0000 2012
72 21:58:20 INFO: Completed 200 OK in 9954427ms (Views: 174.3ms | ActiveRecord: 68178.3ms)
73 Started GET "/babble_portal/" for 172.21.100.1 at Thu May 24 21:58:21 +0000 2012
74 21:58:21 INFO: Completed 200 OK in 38122ms (Views: 1435.4ms | ActiveRecord: 35950.5ms)


所以我现在考虑的方向是,是不是configfun 服务器当时 ( 21:58分, 19:12分)正在忙于运行其他程序?[/quote]

2. 仔细看了服务器的日志,没有发现任何出错的问题

[quote]
没有看到任何 FATAL, WARN 这样级别的日志。

看起来都很正常。

就是服务器的响应时间时快时慢。

我查询了其他日志(delayed_job ,rake )也没有看到相关的。

我准备看一下 /var/log里面的文件,看能不能找到线索[/quote]


3. 初步找到问题所在

[quote]
经过对/var/log 目录的排查,我发现了3个相关的程序:

1. tomcat 的crontab。 它每分钟都运行一次。

May 24 21:56:55 configfun crond[30679]: (tomcat) CMD (/var/blur/bin/move_tomcat_log.pl > /dev/null 2>>/dev/nul
May 24 21:56:55 configfun crond[30658]: (tomcat) error: Job execution of per-minute job scheduled for 19:18 de
May 24 21:56:55 configfun crond[30658]: CRON (tomcat) ERROR: cannot set security context
May 24 21:57:01 configfun crond[30724]: (tomcat) CMD (/var/blur/bin/move_tomcat_log.pl > /dev/null 2>>/dev/nul
May 24 21:58:01 configfun crond[30815]: (tomcat) CMD (/var/blur/bin/move_tomcat_log.pl > /dev/null 2>>/dev/nul
May 24 21:59:03 configfun crond[30875]: (tomcat) CMD (/var/blur/bin/move_tomcat_log.pl > /dev/null


具体命令:
(tomcat) CMD (/var/blur/bin/move_tomcat_log.pl > /dev/null 2>>/dev/null)

大概看了一下,不会PERL, 但是发现应该是用来备份TOMCAT的日志。。。
每分钟运行一次。。。
1. 不过系统中没有tomcat 或者jetty在运行啊 ( ps -ef | grep tomcat ,没结果)
2. 估计它对性能影响不大。

附件 ( move_tomcat_log.pl ) 就是这个脚本。 同时,相关的日志是: /var/log/cron (见附件:cron, 877k)


2. /usr/lib64/nagios/plugins/contrib/check_logfiles

通过查看 /var/log/secure 发现的,如下:

May 24 18:47:53 configfun sudo: nagios : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/usr/lib64/nagios/plugins/con
May 24 18:49:53 configfun sudo: nagios : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/usr/lib64/nagios/plugins/con
May 24 18:52:53 configfun sudo: nagios : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/usr/lib64/nagios/plugins/con
May 24 18:54:53 configfun sudo: nagios : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/usr/lib64/nagios/plugins/con
May 24 18:57:53 configfun sudo: nagios : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/usr/lib64/nagios/plugins/con
May 24 18:59:53 configfun sudo: nagios : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/usr/lib64/nagios/plugins/con
May 24 19:02:55 configfun sudo: nagios : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/usr/lib64/nagios/plugins/con
May 24 19:04:53 configfun sudo: nagios : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/usr/lib64/nagios/plugins/con
May 24 19:07:53 configfun sudo: nagios : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/usr/lib64/nagios/plugins/con
May 24 19:09:55 configfun sudo: nagios : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/usr/lib64/nagios/plugins/con
May 24 19:13:03 configfun sudo: nagios : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/usr/lib64/nagios/plugins/con
May 24 19:15:00 configfun sudo: nagios : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/usr/lib64/nagios/plugins/con
May 24 21:56:46 configfun sudo: nagios : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/usr/lib64/nagios/plugins/con
May 24 21:57:54 configfun sudo: nagios : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/usr/lib64/nagios/plugins/con
May 24 21:57:54 configfun sudo: nagios : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/usr/lib64/nagios/plugins/con


这个程序的运行时间不固定,但是也非常频繁。我对它不了解,它的官方站点可能是: http://labs.consol.de/lang/en/nagios/check_logfiles/
相关的附件: secure 692k

3. 由于 系统内存太少, apache 进程被关掉了。

这个我觉得可能是 9k 秒才打开首页的原因。见附件: messages 216k

May 24 21:57:47 configfun kernel: DMA: 15*4kB 11*8kB 3*16kB 2*32kB 2*64kB 1*128kB 0*256kB 1*512kB 1*1024kB 1*2048k
May 24 21:57:47 configfun kernel: DMA32: empty
May 24 21:57:47 configfun kernel: Normal: empty
May 24 21:57:47 configfun kernel: HighMem: empty
May 24 21:57:47 configfun kernel: 1340 pagecache pages
May 24 21:57:47 configfun kernel: Swap cache: add 18501044, delete 18500782, find 19025765/21574350, race 2+74
May 24 21:57:47 configfun kernel: Free swap = 0kB
May 24 21:57:47 configfun kernel: Total swap = 1052248kB
May 24 21:57:47 configfun kernel: Free swap: 0kB
May 24 21:57:47 configfun kernel: 264192 pages of RAM
May 24 21:57:47 configfun kernel: 8353 reserved pages
May 24 21:57:47 configfun kernel: 12936 pages shared
May 24 21:57:47 configfun kernel: 262 pages swap cached
May 24 21:57:47 configfun kernel: Out of memory: Killed process 4553, UID 48, (httpd).


可以看出,进程被删掉的时间,跟 passenger的日志很吻合( 21:57:xx)

另外,今天下午,我写了一个小脚本,每隔1分20秒,就抓取一次babble首页, 响应的速度很好,服务器端的日志都在1秒以内。

07:23:39 INFO: Completed 200 OK in 878ms (Views: 596.6ms | ActiveRecord: 107.6ms)
07:25:03 INFO: Completed 200 OK in 837ms (Views: 595.4ms | ActiveRecord: 90.2ms)
07:26:27 INFO: Completed 200 OK in 844ms (Views: 671.8ms | ActiveRecord: 105.1ms)
07:27:51 INFO: Completed 200 OK in 823ms (Views: 547.6ms | ActiveRecord: 104.4ms)
07:29:15 INFO: Completed 200 OK in 802ms (Views: 539.5ms | ActiveRecord: 94.2ms)
07:30:39 INFO: Completed 200 OK in 872ms (Views: 596.1ms | ActiveRecord: 101.2ms)
07:32:04 INFO: Completed 200 OK in 933ms (Views: 664.1ms | ActiveRecord: 99.6ms)
07:33:28 INFO: Completed 200 OK in 854ms (Views: 679.0ms | ActiveRecord: 103.8ms)
07:34:52 INFO: Completed 200 OK in 799ms (Views: 639.7ms | ActiveRecord: 89.4ms)
07:36:16 INFO: Completed 200 OK in 817ms (Views: 552.5ms | ActiveRecord: 88.8ms)
07:37:40 INFO: Completed 200 OK in 857ms (Views: 577.7ms | ActiveRecord: 101.2ms)
07:39:04 INFO: Completed 200 OK in 878ms (Views: 595.0ms | ActiveRecord: 100.6ms)
07:40:29 INFO: Completed 200 OK in 937ms (Views: 663.3ms | ActiveRecord: 97.4ms)
07:41:53 INFO: Completed 200 OK in 788ms (Views: 626.4ms | ActiveRecord: 93.4ms)
07:43:17 INFO: Completed 200 OK in 844ms (Views: 564.9ms | ActiveRecord: 94.4ms)
07:44:28 INFO: Completed 200 OK in 860ms (Views: 583.1ms | ActiveRecord: 109.1ms)
07:44:42 INFO: Completed 200 OK in 789ms (Views: 534.1ms | ActiveRecord: 93.6ms)
07:46:06 INFO: Completed 200 OK in 969ms (Views: 696.8ms | ActiveRecord: 95.7ms)
07:47:31 INFO: Completed 200 OK in 839ms (Views: 674.5ms | ActiveRecord: 95.3ms)
07:48:56 INFO: Completed 200 OK in 858ms (Views: 691.3ms | ActiveRecord: 98.6ms)
07:50:20 INFO: Completed 200 OK in 847ms (Views: 581.7ms | ActiveRecord: 94.6ms)
07:51:45 INFO: Completed 200 OK in 835ms (Views: 561.5ms | ActiveRecord: 96.5ms)
07:53:09 INFO: Completed 200 OK in 865ms (Views: 603.2ms | ActiveRecord: 105.3ms)
07:54:33 INFO: Completed 200 OK in 953ms (Views: 687.8ms | ActiveRecord: 101.7ms)
07:55:58 INFO: Completed 200 OK in 892ms (Views: 720.2ms | ActiveRecord: 103.2ms)
07:57:22 INFO: Completed 200 OK in 781ms (Views: 523.7ms | ActiveRecord: 87.9ms)
07:58:46 INFO: Completed 200 OK in 857ms (Views: 596.3ms | ActiveRecord: 102.2ms)
08:00:11 INFO: Completed 200 OK in 890ms (Views: 596.2ms | ActiveRecord: 113.1ms)
08:01:36 INFO: Completed 200 OK in 859ms (Views: 589.9ms | ActiveRecord: 98.9ms)


所以,我觉得是不是configfun的内存太小 ( 1 G 内存) ,运行的程序太多,导致的它性能不稳定?
能否申请给它增加 内存?
[/quote]


4. 不过,在增加内存(从硬件着手解决问题)之前,还是要分析一下,看看这1G的现有内存是如何被程序占用的。
根据 [url]http://stackoverflow.com/questions/131303/linux-how-to-measure-actual-memory-usage-of-an-application-or-process[/url] ,先使用 $ ps aux

重点查看 VSZ 和 RSS [quote]
RSS: Resident set size, the non-swapped physical memory that a task has used (in kiloBytes). 物理内存

VSZ: virtual memory usage of entire process. vm_lib + vm_exe + vm_data + vm_stack 使用的总内存吧?[/quote]

$ ps aux --sort -rss,-vsz (针对 RSS, VSZ 倒序排序, 参考: [url]http://www.devdaily.com/linux/unix-linux-process-memory-sort-ps-command-cpu[/url])

结果:

[quote]USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
kcv478 15152 0.0 9.3 183640 97556 ? S 01:23 0:01 Rails: /rails_apps/babble_portal/releases/manuall
kcv478 10880 0.0 6.1 148424 64632 ? S 00:01 0:00 delayed_job
kcv478 4607 0.0 6.0 148412 63196 ? S May10 7:03 delayed_job
kcv478 20865 0.0 6.0 148260 63168 ? S May07 7:00 delayed_job
kcv478 10025 0.0 6.0 148432 63160 ? S May09 7:10 delayed_job
kcv478 31266 0.0 6.0 148412 63160 ? S May11 6:59 delayed_job
kcv478 21174 0.0 6.0 148372 63124 ? S May04 7:10 delayed_job
kcv478 8578 0.0 6.0 148340 63080 ? S Apr28 7:43 delayed_job
kcv478 25447 0.0 5.9 147764 62648 ? S May07 7:05 delayed_job
kcv478 15089 0.0 5.9 141180 62220 ? S Apr27 7:46 delayed_job
root 10287 0.0 0.8 38064 8820 ? S May27 0:00 Passenger spawn server
ntp 1705 0.0 0.5 19620 5316 ? SLs Apr17 3:36 ntpd -u ntp:ntp -p /var/run/ntpd.pid -g
root 1924 0.0 0.4 256936 4708 ? SN Apr17 0:28 /usr/bin/python -tt /usr/sbin/yum-updatesd
root 14197 0.0 0.3 88596 3236 ? Ss 01:06 0:00 sshd: kcv478 [priv]
root 14862 0.0 0.3 88596 3236 ? Ss 01:18 0:00 sshd: kcv478 [priv]
root 16184 0.0 0.3 88596 3236 ? Ss 01:39 0:00 sshd: kcv478 [priv]
apache 10297 0.0 0.2 210820 2596 ? S May27 0:00 /usr/sbin/httpd
mysql 1768 0.0 0.2 306612 2140 ? Sl Apr17 12:16 /usr/sbin/mysqld --basedir=/ --datadir=/var/lib/m
apache 10301 0.0 0.2 210820 2128 ? S May27 0:00 /usr/sbin/httpd
apache 10299 0.0 0.1 210820 2092 ? S May27 0:00 /usr/sbin/httpd
apache 10300 0.0 0.1 210820 2092 ? S May27 0:00 /usr/sbin/httpd
apache 10302 0.0 0.1 210820 2084 ? S May27 0:00 /usr/sbin/httpd
apache 10304 0.0 0.1 210820 2072 ? S May27 0:00 /usr/sbin/httpd
apache 10298 0.0 0.1 210820 2068 ? S May27 0:00 /usr/sbin/httpd
apache 10303 0.0 0.1 210820 2068 ? S May27 0:00 /usr/sbin/httpd
kcv478 14260 0.0 0.1 88736 1712 ? S 01:07 0:00 sshd: kcv478@pts/0
kcv478 14864 0.0 0.1 88736 1688 ? S 01:18 0:00 sshd: kcv478@pts/1
kcv478 16195 0.0 0.1 88736 1684 ? S 01:39 0:00 sshd: kcv478@pts/2
root 30621 0.0 0.1 210688 1584 ? Ss May11 5:49 /usr/sbin/httpd
kcv478 14261 0.0 0.1 64316 1584 pts/0 Ss+ 01:07 0:00 -bash
kcv478 16197 0.0 0.1 64316 1584 pts/2 Ss 01:39 0:00 -bash
kcv478 14865 0.0 0.1 64316 1576 pts/1 Ss+ 01:18 0:00 -bash
root 10285 0.0 0.1 91856 1368 ? Sl May27 0:00 PassengerHelperAgent
nobody 10290 0.0 0.1 49844 1240 ? Sl May27 0:00 PassengerLoggingAgent
root 1679 0.0 0.1 158436 1104 ? Sl Apr17 6:02 /usr/sbin/snmpd -Lsd -Lf /dev/null -p /var/run/sn
kcv478 17354 0.0 0.0 63824 888 pts/2 R+ 01:57 0:00 ps aux --sort -rss,-vsz
root 1774 0.0 0.0 67864 868 ? Ss Apr17 6:06 sendmail: accepting connections
root 1926 0.0 0.0 11040 804 ? SN Apr17 0:14 /usr/libexec/gam_server
avahi 1893 0.0 0.0 22508 732 ? Ss Apr17 0:00 avahi-daemon: running [configfun.local]
nscd 1663 0.0 0.0 195100 724 ? Ssl Apr17 24:08 /usr/sbin/nscd
smmsp 1782 0.0 0.0 56032 632 ? Ss Apr17 5:40 sendmail: Queue runner@01:00:00 for /var/spool/cl
root 10282 0.0 0.0 15296 632 ? Ssl May27 0:00 PassengerWatchdog
dbus 1463 0.0 0.0 21360 620 ? Ss Apr17 0:03 dbus-daemon --system
root 1717 0.0 0.0 64060 544 ? S Apr17 0:00 /bin/sh /usr/bin/mysqld_safe --datadir=/var/lib/m
root 1862 0.0 0.0 72940 480 ? Ss Apr17 5:56 crond
root 1769 0.0 0.0 58900 476 ? S Apr17 0:00 logger -p daemon.err -t mysqld_safe -i -t mysqld
root 1915 0.0 0.0 3804 444 ? Ss+ Apr17 0:00 /sbin/agetty xvc0 9600 vt100-nav
root 1297 0.0 0.0 92864 440 ? S<sl Apr17 30:04 auditd
root 1646 0.0 0.0 121988 432 ? Ssl Apr17 12:02 automount
68 1511 0.0 0.0 30480 368 ? Ss Apr17 0:00 hald
nagios 1744 0.0 0.0 38304 344 ? Ss Apr17 0:03 nrpe -c /etc/nagios/nrpe.cfg -d
root 1877 0.0 0.0 17068 340 ? Ss Apr17 5:52 /usr/sbin/atd
root 1319 0.0 0.0 5912 316 ? Ss Apr17 6:10 syslogd -m 0
root 1691 0.0 0.0 61640 312 ? Ss Apr17 0:00 /usr/sbin/sshd
root 1299 0.0 0.0 81804 272 ? S<sl Apr17 36:40 /sbin/audispd
root 1377 0.0 0.0 10764 268 ? Ss Apr17 6:07 irqbalance
root 1912 0.0 0.0 18420 260 ? S Apr17 5:23 /usr/sbin/smartd -q never
root 1498 0.0 0.0 21044 196 ? Ssl Apr17 6:09 pcscd
root 1322 0.0 0.0 3808 196 ? Ss Apr17 5:55 klogd -x
avahi 1894 0.0 0.0 21484 120 ? Ss Apr17 0:00 avahi-daemon: chroot helper
root 1 0.0 0.0 10352 84 ? Ss Apr17 5:38 init [3]
root 1512 0.0 0.0 21700 8 ? S Apr17 0:00 hald-runner
root 1449 0.0 0.0 55184 4 ? Ss Apr17 0:00 rpc.idmapd
root 372 0.0 0.0 12608 4 ? S<s Apr17 0:00 /sbin/udevd -d
root 1427 0.0 0.0 10164 4 ? Ss Apr17 0:00 rpc.statd
root 1549 0.0 0.0 8520 4 ? Ss Apr17 0:00 /usr/bin/hidd --server
rpc 1387 0.0 0.0 8056 4 ? Ss Apr17 0:00 portmap
root 2 0.0 0.0 0 0 ? S< Apr17 0:04 [migration/0]
[/quote]


5. 开始优化

5.1 15152 这个PID 排在第一位,不知道它是干啥的。总之看起来跟我的RAILS启动方式无关(passenger + apache ), 路径确实是目前部署的应用的所在路径,所以直接把它停止。
$ kill -9 15152

之后, 人肉测试一下,没有影响到当前的应用。
top:
top - 02:24:07 up 40 days, 9:58, 3 users, load average: 0.07, 0.05, 0.01
Tasks: 121 total, 2 running, 119 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 1048576k total, 979032k used, 69544k free, 8008k buffers
Swap: 1052248k total, 363172k used, 689076k free, 134872k cached

不过。。。发现它又上来了。。。
ps aux --sort -rss,-vsz

[quote]
[kcv478@configfun ~]$ ps aux --sort -rss,-vsz
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
kcv478 19037 1.0 9.5 186196 100076 ? S 02:23 0:03 Rails: /rails_apps/babble_portal/releases/manuall
kcv478 19034 1.3 6.4 152392 67504 ? S 02:23 0:04 Passenger ApplicationSpawner: /rails_apps/babble_
kcv478 10880 0.0 6.1 148424 64632 ? S 00:01 0:00 delayed_job
kcv478 4607 0.0 6.0 148412 63196 ? S May10 7:03 delayed_job
kcv478 25447 0.0 6.0 148292 63172 ? S May07 7:06 delayed_job
kcv478 20865 0.0 6.0 148260 63168 ? S May07 7:00 delayed_job
kcv478 10025 0.0 6.0 148432 63160 ? S May09 7:10 delayed_job
kcv478 31266 0.0 6.0 148412 63160 ? S May11 6:59 delayed_job
kcv478 21174 0.0 6.0 148372 63124 ? S May04 7:10 delayed_job
kcv478 8578 0.0 6.0 148340 63080 ? S Apr28 7:43 delayed_job
kcv478 15089 0.0 5.9 141180 62220 ? S Apr27 7:46 delayed_job
root 10287 0.0 0.8 38072 8868 ? S May27 0:00 Passenger spawn server [/quote]

看来它是正常工作的进程。嘿嘿。。。

继续。。。

5.2 9 个delayed_job 进程
我每次部署的时候,都忘记停止当前应用的进程,而是直接把当前应用删掉,然后在开启一个新进程,会不会现在的问题跟它有关?

我先看看目前的DELAYED——JOB的正常的 PID:
(15152, 10880) 一共是两个进程,(其中一个15152 已经被我提前删掉了)
把它们都kill 之后,

[quote]top - 02:33:44 up 40 days, 10:08, 3 users, load average: 0.10, 0.06, 0.01
Tasks: 119 total, 1 running, 118 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.0%us, 0.1%sy, 0.0%ni, 99.7%id, 0.2%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 1048576k total, 859788k used, 188788k free, 9292k buffers
Swap: 1052248k total, 363164k used, 689084k free, 129688k cached
[/quote]
非常好,系统物理内存变成了: 859788k used, 188788k free,

再看看 ps aux ,不错,变成了8个delayed_job了。它们都是没用的。
[quote]
[kcv478@configfun ~]$ ps aux --sort -rss,-vsz
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
kcv478 19037 0.5 9.5 186196 100076 ? S 02:23 0:03 Rails: /rails_apps/babble_portal/releases/manuall
kcv478 4607 0.0 6.0 148412 63196 ? S May10 7:03 delayed_job
kcv478 25447 0.0 6.0 148292 63172 ? S May07 7:06 delayed_job
kcv478 20865 0.0 6.0 148260 63168 ? S May07 7:00 delayed_job
kcv478 10025 0.0 6.0 148432 63160 ? S May09 7:10 delayed_job
kcv478 21174 0.0 6.0 148372 63124 ? S May04 7:10 delayed_job
kcv478 31266 0.0 6.0 148288 63124 ? S May11 6:59 delayed_job
kcv478 8578 0.0 6.0 148340 63080 ? S Apr28 7:43 delayed_job
kcv478 15089 0.0 5.9 141180 62220 ? S Apr27 7:46 delayed_job
root 10287 0.0 0.8 38064 8900 ? S May27 0:00 Passenger spawn server [/quote]

5.3 把剩下8个没用的 delayed_job 进程停止掉。

现在 top 一下,帅呆了。。。释放了 680 MB内存。。。

top - 02:42:44 up 40 days, 10:17,  3 users,  load average: 0.00, 0.00, 0.00
Tasks: 111 total, 2 running, 109 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.0%us, 0.1%sy, 0.0%ni, 99.9%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 1048576k total, 366628k used, 681948k free, 12708k buffers
Swap: 1052248k total, 57936k used, 994312k free, 129896k cached


看看 ps 的结果:

[quote][kcv478@configfun ~]$ ps aux --sort -rss,-vsz
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
kcv478 19037 0.2 9.5 186196 100076 ? S 02:23 0:03 Rails: /rails_apps/babble_portal/releases/manuall
root 10287 0.0 0.8 38064 8900 ? S May27 0:00 Passenger spawn server
ntp 1705 0.0 0.5 19620 5316 ? SLs Apr17 3:36 ntpd -u ntp:ntp -p /var/run/ntpd.pid -g
root 1924 0.0 0.4 256936 4708 ? SN Apr17 0:28 /usr/bin/python -tt /usr/sbin/yum-updatesd
root 14197 0.0 0.3 88596 3236 ? Ss 01:06 0:00 sshd: kcv478 [priv]
root 14862 0.0 0.3 88596 3236 ? Ss 01:18 0:00 sshd: kcv478 [priv]
root 16184 0.0 0.3 88596 3236 ? Ss 01:39 0:00 sshd: kcv478 [priv]
apache 10297 0.0 0.2 210820 3100 ? S May27 0:00 /usr/sbin/httpd
apache 19039 0.0 0.2 210820 2684 ? S 02:23 0:00 /usr/sbin/httpd
apache 10301 0.0 0.2 210820 2620 ? S May27 0:00 /usr/sbin/httpd
apache 10302 0.0 0.2 210820 2600 ? S May27 0:00 /usr/sbin/httpd
apache 10304 0.0 0.2 210820 2576 ? S May27 0:00 /usr/sbin/httpd
apache 10298 0.0 0.2 210820 2568 ? S May27 0:00 /usr/sbin/httpd
mysql 1768 0.0 0.2 306612 2140 ? Sl Apr17 12:16 /usr/sbin/mysqld --basedir=/ --datadir=/var/lib/m
apache 10300 0.0 0.2 210820 2140 ? S May27 0:00 /usr/sbin/httpd
apache 10299 0.0 0.2 210820 2100 ? S May27 0:00 /usr/sbin/httpd
apache 10303 0.0 0.1 210820 2092 ? S May27 0:00 /usr/sbin/httpd
kcv478 14260 0.0 0.1 88736 1712 ? S 01:07 0:00 sshd: kcv478@pts/0
kcv478 16195 0.0 0.1 88736 1692 ? S 01:39 0:00 sshd: kcv478@pts/2
kcv478 14864 0.0 0.1 88736 1688 ? S 01:18 0:00 sshd: kcv478@pts/1
root 30621 0.0 0.1 210688 1624 ? Ss May11 5:49 /usr/sbin/httpd
root 10285 0.0 0.1 92196 1588 ? Sl May27 0:01 PassengerHelperAgent
kcv478 14261 0.0 0.1 64316 1588 pts/0 Ss+ 01:07 0:00 -bash
kcv478 16197 0.0 0.1 64316 1588 pts/2 Ss 01:39 0:00 -bash
kcv478 14865 0.0 0.1 64316 1584 pts/1 Ss+ 01:18 0:00 -bash
nobody 10290 0.0 0.1 49844 1240 ? Sl May27 0:00 PassengerLoggingAgent
[/quote]

很好,现在运行的进程, RSS基本都在 10MB以下。 先这样吧 :)

总结:
本以为会是一个很麻烦的过程(卸载不必要的程序啊,调试啊,等等)现在看起来,原因就在于没有及时的把原来跑着的delayed_job停止。 把它们删掉之后就问题解决了。。:-)。如此简单。
  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值