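Problem
We have a production project built on the Spring Cloud stack, deployed as a cluster with Eureka as the service registry. Every so often, a service on one of the servers would stop for no apparent reason (it went offline), which left that service unavailable.
Cause
It was always the same server, but the affected service was random, so I tentatively ruled out a code problem. I then went through the services' runtime logs and found nothing suspicious, which was odd. Later it occurred to me that a difference in hardware configuration might be to blame: none of the other servers had ever had the problem, only this one. Memory was the first thing that came to mind, and comparing the server specs made me more suspicious: the other servers all have 32 GB of memory, while the problem server has 16 GB. But I had no evidence to prove it. Each time a service went offline, restarting it manually was enough, and the impact on production was small because the cluster is load-balanced.
Still, I wanted to understand the real cause, and since things have been quiet lately I decided to get to the bottom of it. If the service was stopping by itself, could the system have killed it? So I checked the system log file /var/log/messages, filtering by the time of the cancelled leases records in Eureka.
Those records pointed to shortly after 3 a.m. on August 24, and sure enough, the system log for that period showed something fishy: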
Aug 24 03:10:51 iZbp1fsfdsp82dtdusZ kernel: 129801 pages reserved
Aug 24 03:10:51 iZbp1fsfdsp82dtdusZ kernel: 0 pages hwpoisoned
Aug 24 03:10:51 iZbp1fsfdsp82dtdusZ kernel: [ pid ] uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name
Aug 24 03:10:51 iZbp1fsfdsp82dtdusZ kernel: [ 903] 0 903 28194 1492 262144 0 0 systemd-journal
Aug 24 03:10:51 iZbp1fsfdsp82dtdusZ kernel: [ 934] 0 934 26845 489 221184 0 -1000 systemd-udevd
Aug 24 03:10:51 iZbp1fsfdsp82dtdusZ kernel: [ 1119] 0 1119 37351 221 155648 0 -1000 auditd
Aug 24 03:10:51 iZbp1fsfdsp82dtdusZ kernel: [ 1123] 0 1123 11961 97 135168 0 0 sedispatch
Aug 24 03:10:51 iZbp1fsfdsp82dtdusZ kernel: [ 1354] 0 1354 48194 486 409600 0 0 sssd
Aug 24 03:10:51 iZbp1fsfdsp82dtdusZ kernel: [ 1356] 0 1356 95350 224 225280 0 0 rngd
Aug 24 03:10:51 iZbp1fsfdsp82dtdusZ kernel: [ 1378] 996 1378 2665 41 69632 0 0 lsmd
Aug 24 03:10:51 iZbp1fsfdsp82dtdusZ kernel: [ 1380] 81 1380 13813 221 151552 0 -900 dbus-daemon
Aug 24 03:10:51 iZbp1fsfdsp82dtdusZ kernel: [ 1385] 998 1385 435938 1279 344064 0 0 polkitd
Aug 24 03:10:51 iZbp1fsfdsp82dtdusZ kernel: [ 1390] 0 1390 2170 35 61440 0 0 mcelog
Aug 24 03:10:51 iZbp1fsfdsp82dtdusZ kernel: [ 1396] 0 1396 6516 214 90112 0 0 smartd
Aug 24 03:10:51 iZbp1fsfdsp82dtdusZ kernel: [ 1428] 991 1428 29857 136 147456 0 0 chronyd
Aug 24 03:10:51 iZbp1fsfdsp82dtdusZ kernel: [ 1486] 0 1486 49923 702 434176 0 0 sssd_be
Aug 24 03:10:51 iZbp1fsfdsp82dtdusZ kernel: [ 1517] 0 1517 53150 445 466944 0 0 sssd_nss
Aug 24 03:10:51 iZbp1fsfdsp82dtdusZ kernel: [ 1533] 0 1533 24046 365 208896 0 0 systemd-logind
Aug 24 03:10:51 iZbp1fsfdsp82dtdusZ kernel: [ 1600] 0 1600 97260 602 372736 0 0 NetworkManager
Aug 24 03:10:51 iZbp1fsfdsp82dtdusZ kernel: [ 1604] 0 1604 106071 4200 434176 0 0 tuned
Aug 24 03:10:51 iZbp1fsfdsp82dtdusZ kernel: [ 1658] 193 1658 28072 321 249856 0 0 systemd-resolve
Aug 24 03:10:51 iZbp1fsfdsp82dtdusZ kernel: [ 1783] 0 1783 86771 581 307200 0 0 rsyslogd
Aug 24 03:10:51 iZbp1fsfdsp82dtdusZ kernel: [ 1785] 0 1785 21511 235 200704 0 -1000 sshd
Aug 24 03:10:51 iZbp1fsfdsp82dtdusZ kernel: [ 1789] 0 1789 10655 53 106496 0 0 atd
Aug 24 03:10:51 iZbp1fsfdsp82dtdusZ kernel: [ 1791] 0 1791 9022 221 110592 0 0 crond
Aug 24 03:10:51 iZbp1fsfdsp82dtdusZ kernel: [ 1799] 0 1799 3275 28 65536 0 0 agetty
Aug 24 03:10:51 iZbp1fsfdsp82dtdusZ kernel: [ 1800] 0 1800 3917 34 69632 0 0 agetty
Aug 24 03:10:51 iZbp1fsfdsp82dtdusZ kernel: [ 1927] 0 1927 23278 328 217088 0 0 systemd
Aug 24 03:10:51 iZbp1fsfdsp82dtdusZ kernel: [ 1929] 0 1929 75465 657 319488 0 0 (sd-pam)
Aug 24 03:10:51 iZbp1fsfdsp82dtdusZ kernel: [ 3453] 0 3453 4729 34 61440 0 0 jsvc
Aug 24 03:10:51 iZbp1fsfdsp82dtdusZ kernel: [ 3454] 1000 3454 1932917 29990 831488 0 0 jsvc
Aug 24 03:10:51 iZbp1fsfdsp82dtdusZ kernel: [11347] 0 11347 2210447 40434 1118208 0 0 java
Aug 24 03:10:51 iZbp1fsfdsp82dtdusZ kernel: [11480] 0 11480 958234 136608 1564672 0 0 java
Aug 24 03:10:51 iZbp1fsfdsp82dtdusZ kernel: [17453] 0 17453 31267 1171 253952 0 0 nginx
Aug 24 03:10:51 iZbp1fsfdsp82dtdusZ kernel: [16325] 0 16325 967428 153355 1683456 0 0 java
Aug 24 03:10:51 iZbp1fsfdsp82dtdusZ kernel: [16551] 0 16551 966502 149882 1617920 0 0 java
Aug 24 03:10:51 iZbp1fsfdsp82dtdusZ kernel: [16675] 0 16675 971132 159509 1724416 0 0 java
Aug 24 03:10:51 iZbp1fsfdsp82dtdusZ kernel: [16772] 0 16772 965700 138480 1548288 0 0 java
Aug 24 03:10:51 iZbp1fsfdsp82dtdusZ kernel: [17113] 0 17113 964312 109380 1376256 0 0 java
Aug 24 03:10:51 iZbp1fsfdsp82dtdusZ kernel: [17216] 0 17216 978504 162168 1744896 0 0 java
Aug 24 03:10:51 iZbp1fsfdsp82dtdusZ kernel: [17668] 0 17668 969823 156884 1667072 0 0 java
Aug 24 03:10:51 iZbp1fsfdsp82dtdusZ kernel: [17971] 0 17971 964950 130197 1511424 0 0 java
Aug 24 03:10:51 iZbp1fsfdsp82dtdusZ kernel: [18176] 0 18176 964531 123918 1441792 0 0 java
Aug 24 03:10:51 iZbp1fsfdsp82dtdusZ kernel: [14021] 0 14021 973196 164087 1761280 0 0 java
Aug 24 03:10:51 iZbp1fsfdsp82dtdusZ kernel: [32715] 990 32715 38377 1298 286720 0 0 nginx
Aug 24 03:10:51 iZbp1fsfdsp82dtdusZ kernel: [32716] 990 32716 38377 1298 286720 0 0 nginx
Aug 24 03:10:51 iZbp1fsfdsp82dtdusZ kernel: [32717] 990 32717 38377 1323 286720 0 0 nginx
Aug 24 03:10:51 iZbp1fsfdsp82dtdusZ kernel: [32718] 990 32718 38377 1298 286720 0 0 nginx
Aug 24 03:10:51 iZbp1fsfdsp82dtdusZ kernel: [ 9475] 0 9475 948896 105412 1282048 0 0 java
Aug 24 03:10:51 iZbp1fsfdsp82dtdusZ kernel: [ 9775] 0 9775 972544 165703 1789952 0 0 java
Aug 24 03:10:51 iZbp1fsfdsp82dtdusZ kernel: [10230] 0 10230 970525 166468 1777664 0 0 java
Aug 24 03:10:51 iZbp1fsfdsp82dtdusZ kernel: [10323] 0 10323 972385 158764 1699840 0 0 java
Aug 24 03:10:51 iZbp1fsfdsp82dtdusZ kernel: [10536] 0 10536 960769 130340 1511424 0 0 java
Aug 24 03:10:51 iZbp1fsfdsp82dtdusZ kernel: [10719] 0 10719 969079 137027 1560576 0 0 java
Aug 24 03:10:51 iZbp1fsfdsp82dtdusZ kernel: [10838] 0 10838 969598 141541 1585152 0 0 java
Aug 24 03:10:51 iZbp1fsfdsp82dtdusZ kernel: [10947] 0 10947 973951 166800 1802240 0 0 java
Aug 24 03:10:51 iZbp1fsfdsp82dtdusZ kernel: [11041] 0 11041 975381 148499 1671168 0 0 java
Aug 24 03:10:51 iZbp1fsfdsp82dtdusZ kernel: [12543] 0 12543 966142 133945 1503232 0 0 java
Aug 24 03:10:51 iZbp1fsfdsp82dtdusZ kernel: [ 8697] 0 8697 973320 159517 1728512 0 0 java
Aug 24 03:10:51 iZbp1fsfdsp82dtdusZ kernel: [ 8830] 0 8830 975369 162600 1761280 0 0 java
Aug 24 03:10:51 iZbp1fsfdsp82dtdusZ kernel: [15113] 0 15113 8451 407 106496 0 0 AliYunDunUpdate
Aug 24 03:10:51 iZbp1fsfdsp82dtdusZ kernel: [15142] 0 15142 44150 4582 348160 0 0 AliYunDun
Aug 24 03:10:51 iZbp1fsfdsp82dtdusZ kernel: [11839] 0 11839 978032 160551 1794048 0 0 java
Aug 24 03:10:51 iZbp1fsfdsp82dtdusZ kernel: [27130] 0 27130 977958 133039 1634304 0 0 java
Aug 24 03:10:51 iZbp1fsfdsp82dtdusZ kernel: [ 990] 0 990 10473 194 61440 0 0 aliyun-service
Aug 24 03:10:51 iZbp1fsfdsp82dtdusZ kernel: [ 3673] 0 3673 1052924 154268 1798144 0 0 java
Aug 24 03:10:51 iZbp1fsfdsp82dtdusZ kernel: [29517] 0 29517 5684 69 86016 0 0 anacron
Aug 24 03:10:51 iZbp1fsfdsp82dtdusZ kernel: [29552] 0 29552 144280 18379 864256 0 0 dnf
Aug 24 03:10:51 iZbp1fsfdsp82dtdusZ kernel: Out of memory: Killed process 10947 (java) total-vm:3895804kB, anon-rss:667200kB, file-rss:0kB, shmem-rss:0kB
Aug 24 03:10:51 iZbp1fsfdsp82dtdusZ kernel: oom_reaper: reaped process 10947 (java), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
The most important line is this one:
Aug 24 03:10:51 iZbp1fsfdsp82dtdusZ kernel: Out of memory: Killed process 10947 (java) total-vm:3895804kB, anon-rss:667200kB, file-rss:0kB, shmem-rss:0kB
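Lines like this can also be picked out of the log programmatically. Below is a minimal Python sketch, assuming the kernel message has exactly the format seen in this log (the wording differs between kernel versions, so the pattern may need adjusting). The function names `parse_oom_kill` and `find_oom_kills` are made up for illustration.

```python
import re

# Pattern for the kill message as it appears in this log.
# Assumption: other kernel versions may word this line differently.
KILL_RE = re.compile(
    r"Out of memory: Killed process (?P<pid>\d+) \((?P<name>[^)]+)\) "
    r"total-vm:(?P<total_vm>\d+)kB, anon-rss:(?P<anon_rss>\d+)kB"
)


def parse_oom_kill(line):
    """Return the details of an OOM kill line, or None if it is not one."""
    m = KILL_RE.search(line)
    if m is None:
        return None
    return {
        "pid": int(m["pid"]),
        "name": m["name"],
        "total_vm_kb": int(m["total_vm"]),
        "anon_rss_kb": int(m["anon_rss"]),
    }


def find_oom_kills(path="/var/log/messages"):
    """Scan a syslog file and collect every OOM kill it records."""
    with open(path, errors="replace") as f:
        return [hit for hit in map(parse_oom_kill, f) if hit]
```

For the line above, `parse_oom_kill` returns pid 10947 (`java`) with an anon-rss of 667200 kB, roughly 650 MiB.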
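Solution
The initial suspicion was that insufficient memory was causing the problem, and the system log corroborated it. The next step is to upgrade the memory (pending approval and budget).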
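To put numbers on the memory suspicion, the process table that the kernel dumps just before the kill can be summed per process name. Here is a rough Python sketch, assuming the column layout from the log header (pid, uid, tgid, total_vm, rss, pgtables_bytes, swapents, oom_score_adj, name) and 4 KiB pages. The page size is an assumption, though it agrees with the kill line: 166800 pages of rss for pid 10947 correspond to anon-rss:667200kB. `rss_by_name` is a hypothetical helper name.

```python
import re
from collections import defaultdict

PAGE_KIB = 4  # assumption: 4 KiB pages, consistent with the kill line in this log

# One row of the kernel's task dump, e.g.
# "... kernel: [10947] 0 10947 973951 166800 1802240 0 0 java"
ROW_RE = re.compile(
    r"kernel: \[\s*(?P<pid>\d+)\]"   # [ pid ]
    r"\s+\d+\s+\d+\s+\d+"            # uid, tgid, total_vm
    r"\s+(?P<rss>\d+)"               # rss, in pages
    r"\s+\d+\s+\d+\s+-?\d+"          # pgtables_bytes, swapents, oom_score_adj
    r"\s+(?P<name>.+)$"              # process name
)


def rss_by_name(lines):
    """Sum resident memory (in KiB) per process name from task-dump lines."""
    totals = defaultdict(int)
    for line in lines:
        m = ROW_RE.search(line.rstrip())
        if m:
            totals[m["name"]] += int(m["rss"]) * PAGE_KIB
    return dict(totals)
```

Fed with the lines logged at 03:10:51, the `java` entry gives the total resident memory of all Java services at the moment of the kill, which can be compared directly with the 16 GB installed on this machine.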