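Project scenario & problem description:
While changing the nginx configuration in a recent project, I repeatedly found that a reload (nginx -s reload) did not take effect. Here are my findings, for what they are worth.
Root cause analysis:
The conclusion first: long-lived connections kept the reload from taking effect. Let's walk through the analysis step by step.
Step 1: check the nginx processes and connections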
[root@localhost conf.d]# ps -ef |grep nginx
root 75480 1 0 09:18 ? 00:00:00 nginx: master process ../../sbin/nginx
nginx 75482 75480 0 09:18 ? 00:00:35 nginx: worker process
root 109632 103442 0 14:30 pts/0 00:00:00 grep --color=auto nginx
[root@localhost conf.d]# netstat -anp|grep ESTABLISHED|grep 75482|wc -l
67
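The output above shows two nginx processes, one master (PID 75480) and one worker (PID 75482), and the worker holds 67 established connections (this count also includes any short-lived connections that were open at that moment).
Note: this nginx runs on a shared server, so for now I do not know what these long-lived connections are used for or what logic sits behind them.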
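One caveat with the pipeline above: `grep 75482` matches that number anywhere on the line, including inside a port or an address. A minimal sketch of a stricter count follows, assuming the usual Linux `netstat -anp` column layout (state in column 6, `PID/Program name` in column 7). The function name `count_established` is made up for this illustration.

```shell
# count_established PID
# Reads `netstat -anp` style output on stdin and prints how many
# ESTABLISHED TCP connections belong to exactly that PID.
# Assumption: column 6 is the state and column 7 starts with "PID/".
count_established() {
  awk -v pid="$1" '
    $6 == "ESTABLISHED" && $7 ~ ("^" pid "/") { n++ }
    END { print n + 0 }   # n + 0 prints 0 when nothing matched
  '
}
```

Usage would look like `netstat -anp 2>/dev/null | count_established 75482`.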
Step 2: reload, then check the nginx processes and connections again
[root@localhost conf.d]# ../../sbin/nginx -s reload
[root@localhost conf.d]# ps -ef |grep nginx
root 75480 1 0 09:18 ? 00:00:00 nginx: master process ../../sbin/nginx
nginx 75482 75480 0 09:18 ? 00:00:35 nginx: worker process is shutting down
nginx 109629 75480 5 14:30 ? 00:00:00 nginx: worker process
root 109632 103442 0 14:30 pts/0 00:00:00 grep --color=auto nginx
[root@localhost conf.d]# netstat -anp|grep ESTABLISHED|grep 75482|wc -l
25
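Now the old worker (PID 75482) is shutting down but still holds 25 long-lived connections, while one new worker (PID 109629) has started, as shown above. I tested the nginx configuration I had just modified at this point, and it had not taken effect.
About five minutes later I checked again: the processes and the long-lived connections were unchanged, as shown below. I tested the modified configuration once more, and it still had not taken effect.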
[root@localhost conf.d]# ps -ef |grep nginx
root 75480 1 0 09:18 ? 00:00:00 nginx: master process ../../sbin/nginx
nginx 75482 75480 0 09:18 ? 00:00:40 nginx: worker process is shutting down
nginx 109629 75480 0 14:30 ? 00:00:01 nginx: worker process
root 110300 103442 0 14:37 pts/0 00:00:00 grep --color=auto nginx
[root@localhost conf.d]# netstat -anp|grep ESTABLISHED|grep 75482|wc -l
25
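Spotting the old worker by eye works for a single process; the check can also be sketched as a small filter over `ps -ef` output. This is only an illustration: the function name `shutting_down_workers` is hypothetical, and it assumes the PID is the second column, as in the listings above.

```shell
# shutting_down_workers
# Reads `ps -ef` output on stdin and prints the PID of every nginx worker
# whose title is "worker process is shutting down".
# The [d] in the pattern keeps the awk process itself from matching when
# its own command line shows up in the ps listing.
shutting_down_workers() {
  awk '/nginx: worker process is shutting [d]own/ { print $2 }'
}
```

For example, `ps -ef | shutting_down_workers` would print 75482 for the listing above. If the same PID is still printed minutes after the reload, the old worker is most likely being held open by connections that never close.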
Step 3: check the nginx reload log
72384#0: signal process started
75480#0: signal 1 (SIGHUP) received from 72384, reconfiguring
75480#0: signal 1 (SIGHUP) received from 72384, reconfiguring
75480#0: reconfiguring
75480#0: reconfiguring
75480#0: using the "epoll" event method
75480#0: using the "epoll" event method
75480#0: start worker processes
75480#0: start worker processes
75480#0: start worker process 109629
75480#0: start worker process 109629
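The log shows that after the reload command was received, the master only started a new worker process; the old worker never exited. A worker that is shutting down keeps serving the connections it already holds with the configuration it was started with, so requests arriving over those long-lived connections still see the old configuration. To compare, I set up an nginx instance in my own virtual machine. Its reload log is shown below: in a normal reload, the old worker logs gracefully shutting down and exiting.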
16839#0: signal process started
10767#0: signal 1 (SIGHUP) received from 16839, reconfiguring
10767#0: signal 1 (SIGHUP) received from 16839, reconfiguring
10767#0: reconfiguring
10767#0: reconfiguring
10767#0: using the "epoll" event method
10767#0: using the "epoll" event method
10767#0: start worker processes
10767#0: start worker processes
10767#0: start worker process 16840
10767#0: start worker process 16840
15845#0: gracefully shutting down
15845#0: gracefully shutting down
15845#0: exiting
15845#0: exiting
15845#0: exit
10767#0: signal 17 (SIGCHLD) received from 15845
10767#0: signal 17 (SIGCHLD) received from 15845
10767#0: worker process 15845 exited with code 0
10767#0: worker process 15845 exited with code 0
10767#0: signal 29 (SIGIO) received
10767#0: signal 29 (SIGIO) received
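The difference between the two logs comes down to whether the master ever reports the old worker as exited. Here is a minimal sketch that pulls those PIDs out of an error log, assuming the message wording shown in the excerpt above. The function name `exited_workers` is made up for illustration.

```shell
# exited_workers
# Reads an nginx error log on stdin and prints the unique PIDs of workers
# the master reported as "worker process <pid> exited with code ...".
exited_workers() {
  sed -n 's/.*worker process \([0-9][0-9]*\) exited with code.*/\1/p' | sort -u
}
```

Run against the healthy log above it prints 15845; run against the log from the shared server it prints nothing, which matches the old worker 75482 still being alive.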
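Solution:
I restarted nginx instead of reloading it, and all of the modified configuration took effect. The commands are:
# Graceful shutdown: wait for in-flight requests to finish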
./nginx -s quit
# Fast shutdown: do not wait for requests to finish (use this if the graceful shutdown fails to stop nginx)
./nginx -s stop
# Start
./nginx
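The three commands can be combined into one hedged sketch: validate the configuration, try the graceful quit, fall back to the fast stop if the master is still alive, then start again. The function name `restart_nginx`, its arguments, and the 30-second default are assumptions made for this illustration. It also relies on the master removing its pid file on exit, so the pid-file path must match the `pid` setting of your build or configuration.

```shell
# restart_nginx NGINX_BIN PID_FILE [TIMEOUT_SECONDS]
# Hypothetical helper: quit gracefully, fall back to stop, then start.
restart_nginx() {
  nginx_bin=$1 pid_file=$2 timeout=${3:-30}

  # Refuse to restart if the new configuration does not parse.
  "$nginx_bin" -t || return 1

  # Graceful shutdown: workers finish in-flight requests first.
  "$nginx_bin" -s quit

  # Wait for the master to exit (assumed to remove its pid file).
  waited=0
  while [ -f "$pid_file" ] && [ "$waited" -lt "$timeout" ]; do
    sleep 1
    waited=$((waited + 1))
  done

  # Still running after the timeout: fast shutdown as the fallback.
  if [ -f "$pid_file" ]; then
    "$nginx_bin" -s stop
    sleep 1
  fi

  # Start nginx with the new configuration.
  "$nginx_bin"
}
```

A call might look like `restart_nginx ./nginx /path/to/nginx.pid 30`, where the pid-file path is a placeholder. Unlike a reload, this drops every open connection, so it suits a maintenance window better than peak traffic.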