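Problem description
There is a process in our development environment that consumes a huge amount of resources. Can you help figure out why? It has shown up several times before, both on-premises and in the public cloud, and in the end the process fills up the machine's memory and brings the machine down. We just happened to catch it in the act.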
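Approach and method
When a process is maxing out the machine, the first thing to do is look at the process name. One look and I was stuck: I had never seen this one before.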
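Run top -H -p 48297 to see which thread inside the process is causing the trouble. It turns out there is only the one process, with no extra threads.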
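The thread count can also be read directly instead of eyeballed in top. This is a minimal sketch that assumes a Linux /proc filesystem. `thread_count` is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Print the number of threads of a process, taken from /proc/<pid>/status.
# Roughly equivalent to: ps -o nlwp= -p <pid>
thread_count() {
  local pid=$1
  # The "Threads:" line in /proc/<pid>/status holds the thread count.
  awk '/^Threads:/ { print $2 }' "/proc/${pid}/status"
}
```

If `thread_count 48297` prints 1, that matches the single-threaded result seen in top.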
Use ps to find out where this service's executable lives:
[root@yq01-kg-section1-bud3 libexec]# ps -ef | grep abrt-hook-ccpp
root 45733 11797 0 12:18 pts/8 00:00:00 grep --color=auto abrt-hook-ccpp
root 48297 2 99 Nov16 ? 15:42:50 /usr/libexec/abrt-hook-ccpp 11 0 8669 0 0 1605530067 e 8669 8669
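The numbers on the abrt-hook-ccpp command line are not random. The kernel fills them in from /proc/sys/kernel/core_pattern when a program dumps core. The sketch below decodes them. It assumes the CentOS 7 style pattern `|/usr/libexec/abrt-hook-ccpp %s %c %p %u %g %t e %P %I`, so check the pattern on your own machine because the order may differ. `decode_hook_args` is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Decode the arguments the kernel passed to abrt-hook-ccpp.
# ASSUMPTION: core_pattern is
#   |/usr/libexec/abrt-hook-ccpp %s %c %p %u %g %t e %P %I
# Verify with: cat /proc/sys/kernel/core_pattern
decode_hook_args() {
  if [ "$#" -lt 9 ]; then
    echo "expected 9 arguments, got $#" >&2
    return 1
  fi
  local sig=$1 core_limit=$2 pid=$3 uid=$4 gid=$5 ts=$6 exe=$7 gpid=$8 tid=$9
  # kill -l <number> prints the signal name without the SIG prefix.
  echo "signal=${sig} ($(kill -l "$sig"))"
  echo "core_limit=${core_limit}"              # %c: RLIMIT_CORE soft limit of the crashed process
  echo "pid=${pid} uid=${uid} gid=${gid}"      # %p %u %g
  echo "time=${ts} ($(date -u -d "@${ts}"))"   # %t: seconds since the epoch (GNU date syntax)
  echo "exe_field=${exe} global_pid=${gpid} tid=${tid}"
}
```

Try it with the arguments from the ps output: `decode_hook_args 11 0 8669 0 0 1605530067 e 8669 8669`. The first line reads `signal=11 (SEGV)`. Under the assumption above, the hook was therefore handling a segmentation fault in PID 8669. The timestamp 1605530067 is 2020-11-16 12:34:27 UTC, which lines up with the Nov16 start time. The PPID of 2 in the ps output also fits a hook spawned by the kernel (kthreadd) rather than by a service.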
No clue at all! So I turned to Baidu and found the following:
abrtd
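abrtd is a daemon that monitors application crashes. When a crash occurs, it collects the crash data (the core file, the application's command line, etc.) and takes action depending on the type of crash and the configuration in the abrt.conf config file. Plugins exist for various actions, for example reporting the crash to Bugzilla, or transferring the report via FTP or SCP. See the man page of the respective plugin.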
abrtd: the Automatic Bug Reporting Tool (ABRT) daemon, which reports bugs automatically.
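When debugging programs on Linux, the most painful situation is a program that crashes unexpectedly without leaving a core file, which makes the problem very hard to locate. With a core file, locating it becomes much easier.
Normally you enable core files by setting ulimit -c unlimited in the environment, but on-site engineers sometimes forget to run this command. What can you do then? You can get the same result by configuring the Linux abrt service.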
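Two settings decide what happens when a program crashes: the shell's `ulimit -c` and the kernel's core_pattern. As far as I know, abrt points core_pattern at a pipe (`|/usr/libexec/abrt-hook-ccpp ...`). The kernel then hands each crash to the hook, passing the crashed process's core limit as an argument, instead of writing the core file itself. Here is a sketch to inspect both settings. The function names are hypothetical:

```shell
#!/usr/bin/env bash
# Classify a kernel core_pattern string. A leading '|' means the kernel pipes
# each core dump to the named program instead of writing a file itself.
classify_core_pattern() {
  local pattern=$1
  case "$pattern" in
    '|'*abrt-hook-ccpp*) echo "piped to abrt-hook-ccpp" ;;
    '|'*)                echo "piped to another handler: ${pattern:1}" ;;
    *)                   echo "written to file: ${pattern}" ;;
  esac
}

# Print the two settings that decide how this shell's crashes are handled.
show_core_settings() {
  echo "ulimit -c: $(ulimit -c)"
  classify_core_pattern "$(cat /proc/sys/kernel/core_pattern)"
}
```

Running `show_core_settings` on an affected host should report `piped to abrt-hook-ccpp` if abrt has installed its hook.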
Edit the abrt-action-save-package-data.conf file so that it reads as follows:
vi /etc/abrt/abrt-action-save-package-data.conf
# With this option set to "yes",
# only crashes in signed packages will be analyzed.
# the list of public keys used to check the signature is
# in the file gpg_keys
#
OpenGPGCheck = no
# Blacklisted packages
#
BlackList = nspluginwrapper, valgrind, strace, mono-core
# Process crashes in executables which do not belong to any package?
#
ProcessUnpackaged = yes
# Blacklisted executable paths (shell patterns)
#
BlackListedPaths = /usr/share/doc/, /example*, /usr/bin/nspluginviewer, /usr/lib/xulrunner-*/plugin-container
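After editing, it is worth confirming what the file actually says. A commented-out line or unusual spacing can hide the real value. The sketch below reads the `Key = value` lines used above. `conf_get` is a hypothetical helper, not part of abrt:

```shell
#!/usr/bin/env bash
# Print the value(s) of KEY from a "Key = value" config file such as the
# abrt files shown above. Comment lines are skipped and spaces/tabs around
# the key and value are trimmed. Each uncommented occurrence prints one line.
conf_get() {
  local file=$1 key=$2
  awk -v k="$key" '
    /^[ \t]*#/ { next }                  # skip comment lines
    (p = index($0, "=")) > 0 {           # only lines that contain "="
      name = substr($0, 1, p - 1)
      val = substr($0, p + 1)
      gsub(/^[ \t]+|[ \t]+$/, "", name)
      gsub(/^[ \t]+|[ \t]+$/, "", val)
      if (name == k) print val
    }' "$file"
}
```

After the change, `conf_get /etc/abrt/abrt-action-save-package-data.conf ProcessUnpackaged` should print `yes`. More than one output line means the key is set twice.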
You can also adjust the maximum storage size for crash reports (core files) in abrt.conf:
[root@xx-host2 abrt]# cat abrt.conf
# Enable this if you want abrtd to auto-unpack crashdump tarballs which appear
# in this directory (for example, uploaded via ftp, scp etc).
# Note: you must ensure that whatever directory you specify here exists
# and is writable for abrtd. abrtd will not create it automatically.
#
#WatchCrashdumpArchiveDir = /var/spool/abrt-upload
# Max size for crash storage [MiB] or 0 for unlimited
#
MaxCrashReportsSize = 1000
# Specify where you want to store coredumps and all files which are needed for
# reporting. (default:/var/spool/abrt)
#
# Changing dump location could cause problems with SELinux. See man abrt_selinux(8).
#
#DumpLocation = /var/spool/abrt
# If you want to automatically clean the upload directory you have to tweak the
# selinux policy.
#
DeleteUploaded = no
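MaxCrashReportsSize caps how much space the dump location may take. The sketch below compares the current usage of the dump directory with that limit. It assumes the default DumpLocation `/var/spool/abrt` from the comments above. `check_crash_storage` is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Compare disk usage of the abrt dump directory with MaxCrashReportsSize.
# Defaults: /var/spool/abrt (the documented DumpLocation default) and 1000 MiB.
# A limit of 0 means "unlimited", per the comment in abrt.conf.
check_crash_storage() {
  local dir=${1:-/var/spool/abrt} limit_mib=${2:-1000}
  local used_mib
  # du -s: one total for the directory; -m: report in 1 MiB units
  used_mib=$(du -sm "$dir" | awk '{ print $1 }')
  if [ "$limit_mib" -eq 0 ]; then
    echo "used ${used_mib} MiB, limit: unlimited"
  elif [ "$used_mib" -gt "$limit_mib" ]; then
    echo "used ${used_mib} MiB, over limit of ${limit_mib} MiB"
  else
    echo "used ${used_mib} MiB, within limit of ${limit_mib} MiB"
  fi
}
```

Example: `check_crash_storage /var/spool/abrt 1000`, run as root.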
Restart the abrtd service: service abrtd restart
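Core files should also be deleted promptly once you are done with them. Run abrt-cli list to see the stored problem directories, then run abrt-cli rm [directory] to remove each one.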
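The cleanup can be scripted. The sketch below assumes that `abrt-cli list` prints a `Directory:` line for each problem. That is what I recall from ABRT 2.x, but check it against your version. The helper names are hypothetical:

```shell
#!/usr/bin/env bash
# Extract crash directories from `abrt-cli list` output read on stdin.
# ASSUMPTION: each problem in the listing has a line like
#   Directory:      /var/spool/abrt/ccpp-...
abrt_dirs() {
  awk '$1 == "Directory:" { print $2 }'
}

# Remove every listed problem with abrt-cli rm, one directory at a time.
remove_all_problems() {
  local dirs dir
  dirs=$(abrt-cli list | abrt_dirs)   # collect the list first, then delete
  for dir in $dirs; do
    abrt-cli rm "$dir"
  done
}
```

Running `abrt-cli list | abrt_dirs` on its own first shows exactly what `remove_all_problems` would delete.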
When a program crashes, abrt-hook-ccpp can use so much CPU and IO that the system gets maxed out, so I decided to simply disable it:
systemctl stop abrt-ccpp.service
systemctl disable abrt-ccpp.service
systemctl status abrt-ccpp.service
However, systemctl status abrt-ccpp.service showed that this service had never been running in the first place.
Back to Baidu:
Why does the /usr/libexec/abrt-hook-ccpp process keep growing?
It is caused by the ccpp file failing to be created.
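You can check whether hook processes are really piling up, and how long each has been running. The sketch below is built on procps-ng `ps`: `-C` selects by command name, and `etimes` is the elapsed time in seconds. The helper names are hypothetical:

```shell
#!/usr/bin/env bash
# List processes with the given command name: PID, elapsed seconds, CPU%, args.
# The trailing '=' on each column suppresses the ps header line.
# sort -k2,2nr orders by the elapsed-seconds column, oldest process first.
list_procs() {
  local name=$1
  ps -C "$name" -o pid=,etimes=,pcpu=,args= | sort -k2,2nr
}

# Number of running processes with the given command name.
count_procs() {
  list_procs "$1" | wc -l
}
```

If PID 48297 is the oldest hook process, `list_procs abrt-hook-ccpp` shows it at the top. Running `count_procs abrt-hook-ccpp` a few times shows whether the count is growing.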
The fix is to set the ProcessUnpackaged parameter in /etc/abrt/abrt-action-save-package-data.conf to yes and restart abrtd:
sed -i 's/ProcessUnpackaged = no/ProcessUnpackaged = yes/g' /etc/abrt/abrt-action-save-package-data.conf && service abrtd restart
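Note that this sed expression only matches the exact text `ProcessUnpackaged = no`. If the file says `ProcessUnpackaged=no`, or the line is missing, the command succeeds without changing anything. The sketch below is slightly more tolerant. `conf_set` is a hypothetical helper, and it assumes GNU sed. The key must not contain regex metacharacters, and the value must not contain `|` or `&`:

```shell
#!/usr/bin/env bash
# Set "Key = value" in a config file, tolerating any spacing around '='.
# Rewrites existing uncommented lines for the key; appends one if absent.
conf_set() {
  local file=$1 key=$2 value=$3
  if grep -Eq "^[[:space:]]*${key}[[:space:]]*=" "$file"; then
    # GNU sed: -i edits in place, -E enables extended regex
    sed -i -E "s|^[[:space:]]*${key}[[:space:]]*=.*|${key} = ${value}|" "$file"
  else
    printf '%s = %s\n' "$key" "$value" >> "$file"
  fi
}
```

Usage: `conf_set /etc/abrt/abrt-action-save-package-data.conf ProcessUnpackaged yes && service abrtd restart`.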
Still no luck after the change, so I checked the system log:
Nov 17 13:15:15 yq01-kg-section1-bud3 abrtd: Lock file '.lock' is locked by process 48297
Nov 17 13:15:15 yq01-kg-section1-bud3 abrtd: Lock file '.lock' is locked by process 48297
Nov 17 13:15:16 yq01-kg-section1-bud3 abrtd: Lock file '.lock' is locked by process 48297
Nov 17 13:15:16 yq01-kg-section1-bud3 abrtd: Lock file '.lock' is locked by process 48297
Nov 17 13:15:17 yq01-kg-section1-bud3 abrtd: Lock file '.lock' is locked by process 48297
Nov 17 13:15:17 yq01-kg-section1-bud3 systemd: abrtd.service stop-sigterm timed out. Killing.
Nov 17 13:15:17 yq01-kg-section1-bud3 systemd: abrtd.service: main process exited, code=killed, status=9/KILL
Nov 17 13:15:17 yq01-kg-section1-bud3 systemd: Unit abrtd.service entered failed state.
Nov 17 13:15:17 yq01-kg-section1-bud3 systemd: abrtd.service failed.
Nov 17 13:15:17 yq01-kg-section1-bud3 abrtd: Lock file '.lock' is locked by process 48297
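When the log is long, the lock holders can be pulled out mechanically. The sketch below reads log lines on stdin and prints each distinct PID reported as holding the lock. `lock_holders` is a hypothetical helper. The log source is an assumption: on CentOS 7, syslog usually lands in /var/log/messages, and `journalctl -u abrtd` also works on systemd hosts:

```shell
#!/usr/bin/env bash
# Print each distinct PID that abrtd reports as holding a lock file,
# reading log lines such as:
#   abrtd: Lock file '.lock' is locked by process 48297
lock_holders() {
  sed -n "s/.*Lock file '.lock' is locked by process \([0-9][0-9]*\).*/\1/p" | sort -un
}
```

For the log above, `grep abrtd /var/log/messages | lock_holders` should print 48297. `ps -o pid,etime,args -p 48297` then shows how long that process has been alive.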
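So the service had not actually restarted. The log shows the lock is being held by this process, and it is exactly the process that was eating all the resources.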
kill -9 48297
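kill -9 gives the process no chance to exit cleanly. The sketch below is gentler: it sends SIGTERM first and only falls back to SIGKILL after a timeout. `stop_pid` is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Stop a process: send SIGTERM, wait up to TIMEOUT seconds (default 5),
# then fall back to SIGKILL. Returns 0 once the PID no longer exists.
stop_pid() {
  local pid=$1 timeout=${2:-5} i
  kill -TERM "$pid" 2>/dev/null || return 0    # already gone
  for ((i = 0; i < timeout * 10; i++)); do
    kill -0 "$pid" 2>/dev/null || return 0     # exited after SIGTERM
    sleep 0.1
  done
  kill -KILL "$pid" 2>/dev/null
  for ((i = 0; i < 10; i++)); do
    kill -0 "$pid" 2>/dev/null || return 0     # exited after SIGKILL
    sleep 0.1
  done
  return 1   # still present, e.g. stuck in uninterruptible (D) sleep
}
```

Usage: `stop_pid 48297 5`. A process stuck in uninterruptible sleep survives even SIGKILL until its system call returns, and in that case the function returns 1.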
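Restart the service: service abrtd restart
Check the service status: systemctl status abrtd
Look at the processes in top again. Oh yeah!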