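1. The problem
During an online event, a surge in concurrent users caused the service to start refusing requests.
2. Tracking down the cause
Approach: nginx is the first layer in front of our online services, so the first thing I checked was the nginx error log. Sure enough: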
2021/03/06 20:10:37 [crit] 26071#0: *345171208 open() "/usr/html/50x.html" failed (24: Too many open files), client: xxx.xxx.xxx.xxx, server: aa.xxx.com, request: "GET a.jpg HTTP/1.1", host: "aa.xxx.com"
2021/03/06 20:10:37 [crit] 26071#0: *345164565 open() "xxx/index.html" failed (24: Too many open files), client: xxx.xxx.xxx.xxx, server: aa.xxx.com, request: "GET xxx6475bcf9a35045ad9cda817739c21839.jpg HTTP/1.1", host: "aa.xxx.com"
2021/03/06 20:10:37 [crit] 26071#0: *345164565 open() "/usr/html/50x.html" failed (24: Too many open files), client: xxx.xxx.xxx.xxx, server: aa.xxx.com, request: "GET a.jpg HTTP/1.1", host: "aa.xxx.com"
2021/03/06 20:10:37 [crit] 26071#0: *345171006 open() "xxx/index.html" failed (24: Too many open files), client: xxx.xxx.xxx.xxx, server: aa.xxx.com, request: "GET xxx6475bcf9a35045ad9cda817739c21839.jpg HTTP/1.1", host: "aa.xxx.com"
2021/03/06 20:10:37 [crit] 26071#0: *345171006 open() "/usr/html/50x.html" failed (24: Too many open files), client: xxx.xxx.xxx.xxx, server: aa.xxx.com, request: "GET a.jpg HTTP/1.1", host: "aa.xxx.com"
2021/03/06 20:10:37 [crit] 26071#0: *345167963 open() "xxx/index.html" failed (24: Too many open files), client: xxx.xxx.xxx.xxx, server: aa.xxx.com, request: "GET xxx6475bcf9a35045ad9cda817739c21839.jpg HTTP/1.1", host: "aa.xxx.com"
2021/03/06 20:10:37 [crit] 26071#0: *345167963 open() "/usr/html/50x.html" failed (24: Too many open files), client: xxx.xxx.xxx.xxx, server: aa.xxx.com, request: "GET a.jpg HTTP/1.1", host: "aa.xxx.com"
2021/03/06 20:10:37 [crit] 26071#0: *345171146 open() "xxx/index.html" failed (24: Too many open files), client: xxx.xxx.xxx.xxx, server: aa.xxx.com, request: "GET xxx6475bcf9a35045ad9cda817739c21839.jpg HTTP/1.1", host: "aa.xxx.com"
2021/03/06 20:10:37 [crit] 26071#0: *345171146 open() "/usr/html/50x.html" failed (24: Too many open files), client: xxx.xxx.xxx.xxx, server: aa.xxx.com, request: "GET a.jpg HTTP/1.1", host: "aa.xxx.com"
2021/03/06 20:10:37 [crit] 26071#0: *345170759 open() "xxx/index.html" failed (24: Too many open files), client: xxx.xxx.xxx.xxx, server: aa.xxx.com, request: "GET xxx6475bcf9a35045ad9cda817739c21839.jpg HTTP/1.1", host: "aa.xxx.com"
2021/03/06 20:10:37 [crit] 26071#0: *345170759 open() "/usr/html/50x.html" failed (24: Too many open files), client: xxx.xxx.xxx.xxx, server: aa.xxx.com, request: "GET a.jpg HTTP/1.1", host: "aa.xxx.com"
2021/03/06 20:10:37 [crit] 26071#0: *345167620 open() "xxx/index.html" failed (24: Too many open files), client: xxx.xxx.xxx.xxx, server: aa.xxx.com, request: "GET xxx6475bcf9a35045ad9cda817739c21839.jpg HTTP/1.1", host: "aa.xxx.com"
2021/03/06 20:10:37 [crit] 26071#0: *345167620 open() "/usr/html/50x.html" failed (24: Too many open files), client: xxx.xxx.xxx.xxx, server: aa.xxx.com, request: "GET a.jpg HTTP/1.1", host: "aa.xxx.com"
2021/03/06 20:10:37 [crit] 26071#0: *345170650 open() "xxx/index.html" failed (24: Too many open files), client: 223.90.245.33, server: aa.xxx.com, request: "GET xxx6475bcf9a35045ad9cda817739c21839.jpg HTTP/1.1", host: "aa.xxx.com"
Too many open files! (error 24, EMFILE: the process has hit its limit on open file descriptors.)
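To see when the limit started being hit and how fast the errors piled up, one option is to count the errno 24 lines per minute. This is a minimal sketch that assumes the log line format shown above and a default log path of /var/log/nginx/error.log (adjust it to your own error_log setting); count_emfile_per_minute is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: count "Too many open files" (errno 24) errors per minute in an
# nginx error log. The default path below is an assumption.

count_emfile_per_minute() {
  local log="${1:-/var/log/nginx/error.log}"
  # Fields 1 and 2 are the date and time ("2021/03/06 20:10:37");
  # substr($2, 1, 5) keeps only "HH:MM" so lines are grouped by minute.
  grep -F '(24: Too many open files)' "$log" |
    awk '{ print $1, substr($2, 1, 5) }' |
    sort | uniq -c
}

# Usage: count_emfile_per_minute /var/log/nginx/error.log
```

Each output line is a count followed by the minute, so a sudden jump marks the moment the workers ran out of descriptors.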
Use ps to find the PID of the nginx worker process:
ps -axu | grep nginx
www 26071 0.2 0.2 51176 22784 ? S 2020 380:07 nginx: worker process
Then check its limits with cat /proc/26071/limits, where 26071 is the worker's PID. Pay attention to the Max open files line:
cat /proc/26071/limits
Limit                     Soft Limit           Hard Limit           Units
Max cpu time              unlimited            unlimited            seconds
Max file size             unlimited            unlimited            bytes
Max data size             unlimited            unlimited            bytes
Max stack size            8388608              unlimited            bytes
Max core file size        0                    unlimited            bytes
Max resident set          unlimited            unlimited            bytes
Max processes             62462                62462                processes
Max open files            1024                 4096                 files
Max locked memory         65536                65536                bytes
Max address space         unlimited            unlimited            bytes
Max file locks            unlimited            unlimited            locks
Max pending signals       62462                62462                signals
Max msgqueue size         819200               819200               bytes
Max nice priority         0                    0
Max realtime priority     0                    0
Max realtime timeout      unlimited            unlimited            us
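The soft limit for Max open files is 1024 (hard limit 4096), the usual default. Every client connection occupies a file descriptor, so a worker capped at 1024 runs out quickly when connections spike.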
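To run the same check for every worker at once and see how close each one is to its limit, a small shell sketch along these lines could help. It assumes Linux /proc and pgrep from procps; reading another user's /proc/<pid>/fd usually requires root. The function names are hypothetical helpers, not part of nginx.

```shell
#!/usr/bin/env bash
# Sketch: show the "Max open files" limits and the number of descriptors
# currently open for each nginx worker. Assumes Linux /proc and pgrep;
# reading another user's /proc/<pid>/fd usually requires root.
# The function names are hypothetical helpers, not part of nginx.

# Print "<soft> <hard>" from the "Max open files" line of /proc/<pid>/limits.
# In that line, fields 4 and 5 are the soft and hard limits.
max_open_files() {
  awk '/^Max open files/ { print $4, $5 }' "/proc/$1/limits"
}

# Every entry in /proc/<pid>/fd is one open file descriptor.
open_fd_count() {
  ls "/proc/$1/fd" | wc -l
}

# Print one line per worker: PID, soft limit, hard limit, descriptors in use.
report_nginx_workers() {
  local pid soft hard
  for pid in $(pgrep -f 'nginx: worker process'); do
    read -r soft hard <<< "$(max_open_files "$pid")"
    echo "pid=$pid soft=$soft hard=$hard open=$(open_fd_count "$pid")"
  done
}

# Usage (as root): report_nginx_workers
```

A worker whose open count sits right at its soft limit is the one producing the errno 24 errors.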
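Raising Max open files
There are two ways to raise the per-process limit on open files (Max open files):
1. Set worker_rlimit_nofile in the nginx configuration. This directive changes the open-file limit (RLIMIT_NOFILE) of the worker processes, so it can be raised without restarting the master process. Keep it at least as large as worker_connections, since every connection needs a file descriptor.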
worker_processes 2;
worker_rlimit_nofile 65535;

events {
    worker_connections 65535;
}
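Because a worker cannot hold more connections than its open-file limit allows, a quick sanity check is to compare worker_connections with worker_rlimit_nofile. The shell sketch below assumes each directive sits on its own line; conf_value and check_limits are hypothetical names, and the 512 fallback is nginx's documented default for worker_connections.

```shell
#!/usr/bin/env bash
# Sketch: warn if worker_connections is larger than worker_rlimit_nofile.
# Assumes one directive per line; conf_value and check_limits are
# hypothetical helpers, not nginx tools.

# Read nginx config text on stdin and print the value of directive $1
# (first match only, trailing ";" removed). Commented-out lines begin
# with "#", so their first field never equals the directive name.
conf_value() {
  awk -v d="$1" '$1 == d { v = $2; sub(/;$/, "", v); print v; exit }'
}

# $1: the full config text. Prints one OK or WARN line.
check_limits() {
  local conf="$1" rlimit conns
  rlimit=$(conf_value worker_rlimit_nofile <<< "$conf")
  conns=$(conf_value worker_connections <<< "$conf")
  conns=${conns:-512}   # nginx's default when worker_connections is not set
  if [ -z "$rlimit" ]; then
    echo "WARN: worker_rlimit_nofile not set; workers keep the limit inherited from the master"
  elif [ "$conns" -gt "$rlimit" ]; then
    echo "WARN: worker_connections ($conns) > worker_rlimit_nofile ($rlimit)"
  else
    echo "OK: worker_connections ($conns) <= worker_rlimit_nofile ($rlimit)"
  fi
}

# Usage on a server (nginx -T dumps the whole configuration):
#   check_limits "$(sudo nginx -T 2>/dev/null)"
```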
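2. Raise the limit with ulimit in the shell that starts nginx. ulimit only affects the current shell and the processes it launches afterwards, and raising the hard limit (the H in -SHn) requires root: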
ulimit -SHn 65535
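To see why the shell matters, the sketch below raises the soft limit inside a subshell and reads /proc/self/limits from child processes started before, inside, and after it. It only raises the soft limit up to the existing hard limit, which needs no root; show_child_nofile is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Sketch: ulimit changes the current shell's limit, and only processes started
# from that shell afterwards inherit it. show_child_nofile is a hypothetical helper.

# Print the soft "Max open files" limit of a newly started child process.
# /proc/self resolves to the awk process that opens the file.
show_child_nofile() {
  awk '/^Max open files/ { print $4 }' /proc/self/limits
}

echo "before:      $(show_child_nofile)"
(
  # Raising the soft limit up to the current hard limit needs no root;
  # going beyond the hard limit (as ulimit -SHn 65535 may) does.
  ulimit -Sn "$(ulimit -Hn)"
  echo "in subshell: $(show_child_nofile)"
)
echo "after:       $(show_child_nofile)"
```

The "before" and "after" values match, and only the child started inside the subshell sees the raised limit. In the same way, an nginx master that is already running keeps the limit it started with.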
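After adjusting Max open files, test the configuration and reload nginx. A reload starts new worker processes, which pick up worker_rlimit_nofile. A limit raised with ulimit takes effect only after nginx is fully stopped and started again from that shell.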
sudo nginx -t
nginx: the configuration file /etc/nginx/nginx.conf syntax is ok
nginx: configuration file /etc/nginx/nginx.conf test is successful
sudo nginx -s reload
Check the result. The reload replaced the worker processes, so look up the new worker PID first:
ps -axu | grep nginx
www 21345 0.2 0.2 51176 22784 ? S 2020 380:07 nginx: worker process
cat /proc/21345/limits
Limit                     Soft Limit           Hard Limit           Units
Max cpu time              unlimited            unlimited            seconds
Max file size             unlimited            unlimited            bytes
Max data size             unlimited            unlimited            bytes
Max stack size            8388608              unlimited            bytes
Max core file size        0                    unlimited            bytes
Max resident set          unlimited            unlimited            bytes
Max processes             62462                62462                processes
Max open files            65535                65535                files
Max locked memory         65536                65536                bytes
Max address space         unlimited            unlimited            bytes
Max file locks            unlimited            unlimited            locks
Max pending signals       62462                62462                signals
Max msgqueue size         819200               819200               bytes
Max nice priority         0                    0
Max realtime priority     0                    0
Max realtime timeout      unlimited            unlimited            us
Max open files is now 65535 for both the soft and hard limits.