Server Process and Connection Count Problem

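I recently ran into a server with too many connections, and it gave me some trouble. This post covers how I dealt with the excessive process count. If anything here is wrong, or you know a better way, please don't hesitate to let me know!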
1. How to check
Linux:
a. The simplest count (the total includes the ps header line):
ps -ef |wc -l
b. Count by user, sorted in descending order:
ps -ef |awk '{print $1}' |sort |uniq -c |sort -rn

[root@amenity03 ~]<20210830 17:05:16># ps -ef |awk '{print $1}' |sort |uniq -c |sort -rn 
    198 root
      2 68
      1 UID
      1 rpc
      1 ntp
      1 dbus

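By changing which field awk prints (the $1 after awk…), you can see which users, or which kinds of processes, account for the largest share.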
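Here is a rough sketch of that idea. It groups by user and command name together, so it is easy to see which program dominates. It assumes procps ps, where a trailing "=" on a column name (user=, comm=) suppresses the header, so the UID line seen above is not counted. The helper name count_by_user_cmd is hypothetical.

```shell
# count_by_user_cmd (hypothetical helper): count "USER COMMAND" pairs read on stdin
# and list the largest groups first.
count_by_user_cmd() {
  awk '{ print $1, $2 }' | sort | uniq -c | sort -rn
}

# Usage (procps ps; the "=" after each column name suppresses the header line):
#   ps -eo user=,comm= | count_by_user_cmd | head
```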
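2. Approach:
1. On my server I saw just under 3000 processes. The per-user limit on this server is 3000, so the server was already seriously affected.
2. All of the processes belonged to the batch-processing user, and after grouping they all turned out to be sshd service processes. That account is shared by several parties, so the user name alone could not tell me where they came from. I had to find another way.
3. Source IP: the process count had grown recently, so I used last to see which IPs still had sessions open: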

[etlgs@etldispatch01kf ~]<20210830 16:56:47>$ last |grep still
etlds    pts/13       55.11.38.208     Tue Aug 31 17:13   still logged in   
etlgs    pts/9        99.15.197.27     Tue Aug 31 17:13   still logged in   
etlgs    pts/8        99.11.235.45     Tue Aug 31 17:11   still logged in   
etlgs    pts/2        99.15.197.222    Tue Aug 31 17:06   still logged in   
etlgs    pts/3        99.11.233.93     Tue Aug 31 17:04   still logged in   
etlds    pts/11       55.11.38.208     Tue Aug 31 17:03   still logged in   
etlgs    pts/12       55.11.39.216     Tue Aug 31 15:18   still logged in   
etlgs    pts/10       55.11.39.209     Tue Aug 31 11:07   still logged in   
qadmsom  pts/5        55.11.39.195     Tue Aug 31 10:59   still logged in   
etlgs    pts/7        55.11.38.137     Tue Aug 31 09:43   still logged in   
etlgs    pts/0        55.11.39.209     Tue Aug 31 09:07   still logged in   
qadmsom  pts/6        99.12.39.225     Mon Aug 30 09:11   still logged in 

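The output looks like the above. Unfortunately, the production clients all use container-platform addresses. This told me which IPs were connected, but not how many connections each IP held.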
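I did not use the following at the time, but it is one option: count established TCP connections to port 22 per peer address with ss from iproute2. This is a sketch under assumptions. It assumes sshd listens on port 22. It also assumes that with a single state filter, ss omits the State column, so the peer address is the 4th field. Check the header on your iproute2 version. Behind a container platform, the peer may still be a platform address rather than the real client. The helper name count_peers is hypothetical.

```shell
# count_peers (hypothetical helper): read `ss -tn state established` output on stdin,
# skip the header, take the peer column ($4), strip the trailing ":port",
# and count connections per peer address, largest first.
count_peers() {
  awk 'NR > 1 { peer = $4; sub(/:[0-9]+$/, "", peer); print peer }' |
    sort | uniq -c | sort -rn
}

# Usage (assumes sshd listens on port 22):
#   ss -tn state established '( sport = :22 )' | count_peers
```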
4. Higher privileges: with no other option, I asked the administrator for the audit log /var/log/secure.

[root@etldispatch01kf ~]<20210830 17:25:51># cat /var/log/secure |egrep 'Accepted password for|Received disconnect from' |awk -F']:' '{print $2}' |awk -F':' '{print $1}' |awk -F'port' '{print $1}' |sort |uniq -c |sort -rn 
   1648  Accepted password for etlds from 55.9.131.95 
   1439  Received disconnect from 55.9.131.95
   1352  Received disconnect from 55.9.10.202
   1321  Accepted password for etlds from 55.6.136.249 
   1269  Accepted password for etlds from 55.9.10.202 
   1235  Received disconnect from 55.6.136.249
   1224  Accepted password for etlds from 55.9.6.43 
   1204  Received disconnect from 55.9.6.43
    636  Accepted password for etlds from 55.9.131.26 
    580  Accepted password for etlgs from 55.9.10.202 
    520  Accepted password for etlds from 55.9.6.124 
    515  Received disconnect from 55.9.6.126
    513  Accepted password for etlgs from 55.9.131.95 
    494  Accepted password for etlds from 55.6.136.237 
    489  Received disconnect from 55.9.131.26
    487  Accepted password for etlgs from 55.9.6.43 
    478  Received disconnect from 55.9.6.124
    450  Accepted password for etlgs from 55.6.136.249 
    382  Received disconnect from 55.6.136.237

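Comparing the Accepted and Received disconnect counts for each IP makes it easy to see which IPs are holding sshd processes. An IP whose logins clearly outnumber its disconnects is leaving sessions open.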
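The same comparison can be done in one pass. This is a minimal sketch. It assumes the log uses the standard OpenSSH messages ("Accepted <method> for <user> from <ip> port ..." and "Received disconnect from <ip> ..."). The helper name open_sessions_by_ip is hypothetical. Not every closed session logs "Received disconnect", and the log may have been rotated. Treat the difference as a rough pointer to suspicious IPs, not an exact count of open sessions.

```shell
# open_sessions_by_ip (hypothetical helper): read an sshd auth log (such as
# /var/log/secure) on stdin and print "<accepted - disconnected> <ip>" per source IP,
# largest gap first. IPs that only appear in disconnect lines are not printed.
open_sessions_by_ip() {
  awk '
    # Return the word after "from", minus a trailing colon (older OpenSSH format).
    function src_ip(   i, ip) {
      for (i = 1; i < NF; i++)
        if ($i == "from") { ip = $(i + 1); sub(/:$/, "", ip); return ip }
      return ""
    }
    / sshd\[[0-9]+\]: Accepted [^ ]+ for / { opened[src_ip()]++ }
    / sshd\[[0-9]+\]: Received disconnect from / { closed[src_ip()]++ }
    END { for (ip in opened) printf "%d %s\n", opened[ip] - closed[ip], ip }
  ' | sort -rn
}

# Usage:
#   open_sessions_by_ip < /var/log/secure | head
```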

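5. Root cause: a newly launched application logged in to the server and did not release its connections after finishing its work. This drove the process count abnormally high.
Find the sshd process of your own session (the parent of your shell):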

[root@etldispatch01kf ~]<20210830 17:36:36># ps
   PID TTY          TIME CMD
 17509 pts/6    00:00:00 ps
 35645 pts/6    00:00:00 bash
[root@etldispatch01kf ~]<20210830 17:37:06># ps -ef |grep 35645
root      17591  35645  1 17:37 pts/6    00:00:00 ps -ef
root      17592  35645  0 17:37 pts/6    00:00:00 grep 35645
root      35645  35641  0 Aug30 pts/6    00:00:00 -bash
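You don't have to read the parent PID off by hand. The chain from your shell upward can be listed automatically, so every process in your own session can be excluded. This is a Linux-only sketch that reads the PPid field of /proc/<pid>/status. The helper name own_ancestors is hypothetical.

```shell
# own_ancestors [PID] (hypothetical helper, Linux only): print PID (default: the
# current shell, $$) and then each parent PID above it, stopping before PID 1.
# The parent is read from the "PPid:" line of /proc/<pid>/status.
own_ancestors() {
  local pid=${1:-$$}
  while [ -n "$pid" ] && [ "$pid" -gt 1 ]; do
    echo "$pid"
    pid=$(awk '/^PPid:/ { print $2 }' "/proc/$pid/status" 2>/dev/null)
  done
}

# Usage (bash): etlds sshd PIDs that are not part of the current session's chain.
#   ps -ef | grep '^etlds ' | grep 'sshd' | awk '{print $2}' | grep -vxF -f <(own_ancestors)
```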


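Exclude your own login session and any processes that must be kept, then kill the rest (watch out for programs that serve external applications). This is risky; use it with care.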

ps -ef |grep '^etlds ' |grep 'sshd' |awk '{print $2}' |grep -vx 35641 |xargs -I {} kill {}
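A somewhat safer variant, sketched under assumptions, is to select the candidates first and review them. It also kills only sessions older than a threshold. It assumes procps ps with the etimes column (elapsed seconds). The helper name select_stale_sshd and the 3600-second threshold are hypothetical, not values from this incident. kill sends SIGTERM by default.

```shell
# select_stale_sshd USER MIN_AGE_SECONDS EXCLUDE_PID (hypothetical helper)
# Reads lines shaped like `ps -eo pid=,user=,etimes=,args=` on stdin and prints the
# PIDs of USER's sshd processes that have been running at least MIN_AGE_SECONDS,
# skipping EXCLUDE_PID (for example your own session's sshd).
select_stale_sshd() {
  awk -v user="$1" -v min_age="$2" -v exclude="$3" '
    $2 == user && $4 ~ /^sshd/ && $3 >= min_age && $1 != exclude { print $1 }'
}

# Usage: review the list first, then send SIGTERM (kill's default) to it.
#   ps -eo pid=,user=,etimes=,args= | select_stale_sshd etlds 3600 35641
#   ps -eo pid=,user=,etimes=,args= | select_stale_sshd etlds 3600 35641 | xargs -r kill
```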

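Tip: the server has since been migrated to the ACS cloud platform, which cuts long-lived connections after 240 s. That also gives the server some built-in self-protection.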

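Summary:
1. Monitoring: monitoring on my own server was insufficient, so I only found out when the process count was about to blow up.
2. Application testing: the new application was not tested thoroughly and did not release its connections in time.
3. Before running kill, make sure the operation is really safe. The risk to running applications is very high.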
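For the monitoring point, a minimal check could compare a user's count against the per-user limit (3000 on this server) and warn before the limit is reached. This is only a sketch. The helper name nproc_alert and the 80% threshold are hypothetical. The Linux per-user process limit (ulimit -u, RLIMIT_NPROC) actually counts threads, which is why the usage line counts threads with ps -L.

```shell
# nproc_alert COUNT LIMIT [WARN_PCT] (hypothetical helper): print OK/WARN with the
# usage percentage (integer arithmetic) and return 1 once COUNT reaches WARN_PCT
# percent of LIMIT, so it can drive an alert from cron or a monitoring agent.
nproc_alert() {
  local count=$1 limit=$2 warn_pct=${3:-80}
  local pct=$(( count * 100 / limit ))
  if [ "$pct" -ge "$warn_pct" ]; then
    echo "WARN $count/$limit (${pct}%)"
    return 1
  fi
  echo "OK $count/$limit (${pct}%)"
}

# Usage: count etlds's threads (-L gives one line per thread) against the 3000 limit.
#   nproc_alert "$(ps -L -u etlds -o lwp= | wc -l)" 3000 80
```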
