Notes on using nginx

天下无敌笨笨熊

Last modified: 2023-09-26 12:14:31

Category: Microservices. Tags: nginx, operations

First published: 2023-08-10 12:16:38

Article link: https://blog.csdn.net/tlxamulet/article/details/132206454

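nginx basics

Four uses:
Forward proxy: lets intranet users reach the internet.
Reverse proxy: hides intranet servers from internet users; usually combined with load balancing.
Load balancing
Web server

Starting and stopping

Start nginx (with an explicit configuration file)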

sudo nginx -c /export/home/clouds/gateway/nginx.conf &

Stop nginx

Find the PID of the nginx master process:

ps -ef|grep nginx

Then stop nginx gracefully:

sudo kill -QUIT pid

Or simply use:

nginx -s quit

Both are graceful shutdowns.

There is also a fast shutdown:

nginx -s stop

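This is not recommended: it can leave the pid file in a bad state, which interferes with later starts and stops.

Reload the nginx configuration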

nginx -s reload

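View the effective nginx configuration

The
nginx -T
command tests the configuration files and prints their full contents; it does not actually start nginx.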

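Configuration file in detail

See here

server_name

Setting it to 0.0.0.0 is meant to allow access through every IP shown by ifconfig. Strictly speaking, server_name is compared with the Host header of the request, while the addresses nginx accepts connections on are decided by listen (for example listen 0.0.0.0:80). With a single server block the effect is the same, because that block is the default server and receives requests for any Host.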

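Variables

$http_XXX

A request header value; for example, $http_user_agent is the value of User-Agent.

$cookie_XXX

A cookie value; for example, $cookie_userid is the value of the userid field in the cookie.

$sent_http_XXX

A response header value; for example, $sent_http_content_type is the value of Content-Type.

See here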

map

It works a bit like a switch/case in code:

http {
    map $COOKIE_tenantid $group {
        ~^(999)$ gray;  # use the addresses configured in upstream gray
        default production; # by default, use the addresses of upstream production
    }

    upstream production {
        ...
    }

    upstream gray {
        ...
    }

    server {
        location / {
            proxy_pass https://$group;
        }
    }
}

That is, if the variable $COOKIE_tenantid is 999, then $group=gray; in every other case $group=production.

With map you can control which load-balancing configuration the server uses.
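To make the switch/case analogy concrete, here is a minimal Python sketch of how such a map could be evaluated. It is a simplified model, not nginx's implementation: RULES, DEFAULT and pick_group are names invented for this illustration, and only regex entries plus a default are modelled.

```python
import re

# Hypothetical rules mirroring the map block: (pattern, value) pairs plus a default.
RULES = [(re.compile(r"^(999)$"), "gray")]
DEFAULT = "production"


def pick_group(tenantid: str) -> str:
    """Return the upstream name for a tenantid cookie value."""
    for pattern, group in RULES:
        # "~" in a map entry means a case-sensitive regex match on the source value.
        if pattern.search(tenantid):
            return group
    return DEFAULT


if __name__ == "__main__":
    for value in ("999", "9990", ""):
        print(repr(value), "->", pick_group(value))
```

Only the exact value 999 selects gray; a missing cookie (empty string) or any other value falls through to the default.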

location configuration

location [ = | ~ | ~* | ^~ ] uri { ... }

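= means an exact match

~ means a case-sensitive regular expression match

~* means a case-insensitive regular expression match

^~ means a prefix match that, when it is the longest matching prefix, stops nginx from checking regular expressions

When you use none of the modifiers above and write only the uri, it is an ordinary prefix match.

/ matches every request: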

location / {         [ configuration ] } # /index.html ok
location /test {    [ configuration ] } # /test ok  # /test2 ok  # /test/ ok

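location matching strategy

location has four kinds of matching rules:

Exact match (=)
Preferential prefix match (^~)
Regular expression match (~ or ~*)
Ordinary prefix match

Matching rules:

An exact match (=) has the highest priority. Once it matches, nginx stops looking at other locations.
Otherwise nginx finds the longest matching prefix among all prefix locations (with or without ^~). If that longest prefix is marked ^~, it is used immediately and regular expressions are not checked. Prefix locations do not support regular expressions.
Otherwise regular expression locations (~ or ~*) are checked in the order they appear in the configuration file. The first one that matches is used, and nginx stops looking.
If no regular expression matches, the longest matching ordinary prefix found earlier is used.

It follows that the order in which exact and prefix locations are written does not matter, but the order of regular expression locations does.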
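The selection rules can be sketched in Python as follows. This is only a model of the documented algorithm under simplifying assumptions (no nested or named locations, no URI normalization); select_location and the LOCATIONS list are made up for the illustration.

```python
import re
from typing import List, Optional, Tuple

# Each location is (modifier, pattern); modifier is one of "=", "^~", "~", "~*", "".
Location = Tuple[str, str]

# Hypothetical configuration used for the examples.
LOCATIONS: List[Location] = [
    ("", "/"),
    ("", "/test"),
    ("^~", "/static/"),
    ("~*", r"\.(png|jpg)$"),
    ("=", "/exact"),
]


def select_location(uri: str, locations: List[Location]) -> Optional[Location]:
    """Pick the location that the matching rules would select for uri."""
    # 1. An exact match wins immediately.
    for loc in locations:
        if loc[0] == "=" and loc[1] == uri:
            return loc

    # 2. Remember the longest matching prefix ("^~" or plain).
    best_prefix: Optional[Location] = None
    for loc in locations:
        if loc[0] in ("^~", "") and uri.startswith(loc[1]):
            if best_prefix is None or len(loc[1]) > len(best_prefix[1]):
                best_prefix = loc

    # 3. If the longest prefix is marked "^~", the regex phase is skipped.
    if best_prefix is not None and best_prefix[0] == "^~":
        return best_prefix

    # 4. Regexes are tried in configuration order; the first match wins.
    for loc in locations:
        if loc[0] in ("~", "~*"):
            flags = re.IGNORECASE if loc[0] == "~*" else 0
            if re.search(loc[1], uri, flags):
                return loc

    # 5. Otherwise fall back to the longest plain prefix, if any.
    return best_prefix
```

With this list, /static/a.png is served by the ^~ /static/ location even though the image regex also matches, while /img/a.PNG goes to the regex location because its longest prefix (/) is an ordinary one.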

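Sample configuration (load balancing + reverse proxy)

# compress responses
gzip              on;
gzip_min_length   1000;
gzip_types        text/plain text/css application/x-javascript;

# the list of servers to balance across
# weight sets the weighting: the higher the weight, the more likely a server is chosen
# max_fails=3 fail_timeout=2s: if 3 attempts to a server fail within 2s, requests go to the other nodes and the server is marked as failed; nothing is sent to it for the next 2s. After 2s nginx tries it again, and if it still fails the cycle repeats
upstream hello{
    server xxxx:8080 weight=1 max_fails=3 fail_timeout=2s;
    server xxxx:8080 weight=1 max_fails=3 fail_timeout=2s;
}

server {
    # listen on port 80
    listen       80;
    server_name  localhost;
    # address for viewing nginx status
    location /nginxstatus{
         stub_status on;
         access_log off;  # "access_log on" would write to a file literally named "on"
         auth_basic "nginxstatus";
         auth_basic_user_file htpasswd;
    }
    # location / matches every request not handled by a more specific location; Tomcat pages end in .jsp, hence index.jsp
    location / {
        index index.jsp;
        proxy_pass   http://hello;    # proxy to the upstream; the name must match the upstream block

        # reverse proxy settings follow
        proxy_redirect             off;
        # the backend web server can read the real client IP from X-Forwarded-For
        proxy_set_header           Host $host;
        proxy_set_header           X-Real-IP $remote_addr;
        proxy_set_header           X-Forwarded-For $proxy_add_x_forwarded_for;
        client_max_body_size       10m; # maximum allowed size of the client request body
        client_body_buffer_size    128k; # buffer size for reading the client request body
        proxy_connect_timeout      300; # timeout for establishing a connection to the backend (usually cannot exceed 75s)
        proxy_send_timeout         300; # timeout between two successive writes when sending the request to the backend
        proxy_read_timeout         300; # timeout between two successive reads of the backend's response
        proxy_buffer_size          4k; # buffer for the first part of the backend response (the response headers)
        proxy_buffers              4 32k; # response buffers; suitable when pages average under 32k
        proxy_busy_buffers_size    64k; # buffers that may be busy sending to the client under load (proxy_buffers size * 2)
        proxy_temp_file_write_size 64k; # maximum amount written to a temporary file at one time when the response does not fit in the buffers
    }
}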

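sticky session

nginx's ip_hash directs requests from a given IP to the same backend, so a client behind that IP and a backend can keep a stable session.

Limitations of ip_hash:

nginx is not the front-most server. ip_hash requires nginx to be the front-most server; otherwise nginx does not see the real client IP and cannot hash on it. For example, if squid sits in front, nginx only sees the squid server's IP, and distributing traffic by that address will certainly go wrong.
There is further load balancing behind nginx. If something behind nginx splits the requests again by another method, a client's requests can no longer be pinned to the same session application server. So nginx's backends should point directly at the application servers, or at another squid that in turn points at the application servers. The best approach is to split traffic once with location: send the requests that need a session through ip_hash and let the rest go to other backends.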
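The first limitation is easy to see in a small Python sketch. nginx's ip_hash keys on the first three octets of an IPv4 address; the crc32 hash, the pick_backend function and the backend names below are stand-ins chosen for the illustration, not what nginx uses internally.

```python
import zlib
from typing import List

# Hypothetical backend names.
BACKENDS = ["app1:8080", "app2:8080", "app3:8080"]


def pick_backend(client_ip: str, backends: List[str]) -> str:
    """Choose a backend from the first three octets of an IPv4 address."""
    # Only the first three octets form the hash key, so a whole /24 network
    # lands on the same backend. crc32 is a stand-in for nginx's own hash.
    key = ".".join(client_ip.split(".")[:3])
    return backends[zlib.crc32(key.encode("ascii")) % len(backends)]


if __name__ == "__main__":
    # Direct clients from different networks can spread across the backends.
    for ip in ("10.0.0.5", "10.0.1.5", "172.16.3.9"):
        print(ip, "->", pick_backend(ip, BACKENDS))
    # Behind a front proxy every request carries the proxy's address,
    # so every client is sent to one and the same backend.
    print("via proxy ->", pick_backend("192.168.1.10", BACKENDS))
```

When every request arrives from the same proxy address, the key never changes and the "balancing" degenerates to a single backend.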

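nginx advanced

How nginx processes requests

nginx consists of a master process plus several worker processes.

The master mainly manages the workers: it receives signals from outside, sends signals to the workers, monitors their state, and starts a new worker automatically when one exits abnormally.
The master process is the interface between the process group and the user, and it also supervises the processes. It does not handle network events or do business work; by managing the workers it provides service restart, smooth upgrade, log file rotation and live configuration reload.
To control nginx we only need to send a signal to the master process with kill. For example, kill -HUP pid tells nginx to reload gracefully: the master re-reads the configuration, starts new workers and lets the old ones finish. This is the signal we normally use to restart nginx or reload its configuration, and because it is graceful the service is not interrupted.

Network events are handled in the worker processes. The workers are peers: they compete equally for client requests and are independent of each other. A request is handled by exactly one worker, and a worker cannot handle another worker's requests. The number of workers is configurable and is usually set to the number of CPU cores, for reasons tied to nginx's process model and event model.
Because the workers are equal, each has the same chance of handling a request. When we serve HTTP on port 80 and a connection arrives, any worker may handle it. How? Every worker is forked from the master: the master first creates the socket to listen on (listenfd) and then forks the workers.
The listenfd in every worker becomes readable when a new connection arrives (an ordinary fd is readable when data has arrived, a listenfd when a connection has arrived; see the select documentation). To make sure only one process handles the connection, every worker competes for accept_mutex before registering the listenfd read event (calling epoll_ctl). It uses try_lock rather than lock; try_lock is essentially one CAS instruction, so it is cheap. The worker that gets the mutex registers the listenfd read event and calls accept in the read handler. After accepting, the worker gets an ordinary fd (connfd), on which it reads the request, parses it, processes it, produces the data and returns it to the client, and finally closes the connection. That is one complete request.

Incidentally, this approach solves the "thundering herd" problem of multiple processes listening on the same fd (one incoming connection waking every process waiting on listenfd): with accept_mutex only one worker listens on listenfd at any moment, and workers that did not get the lock carry on with what they were doing. Note that accept_mutex has been off by default since nginx 1.11.3, so it has to be enabled explicitly to get this behavior.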

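Backend node health checks

Built-in mechanism

See here

nginx has two health-check modes: passive health checks and active health checks.

A passive health check looks like this: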

upstream WorkerNode{
    server node1:31000 max_fails=3 fail_timeout=2s;
    server node8:31000 max_fails=3 fail_timeout=2s;
    server node15:31000 max_fails=3 fail_timeout=2s;
    keepalive 64;
}

The parameters mean:

fail_timeout: Sets the time during which a number of failed attempts must happen for the server to be marked unavailable, and also the time for which the server is marked unavailable (default is 10 seconds).
max_fails: Sets the number of failed attempts that must occur during the fail_timeout period for the server to be marked unavailable (default is 1 attempt).
slow_start：A recently recovered server can be easily overwhelmed by connections, which may cause the server to be marked as unavailable again. Slow start allows an upstream server to gradually recover its weight from zero to its nominal value after it has been recovered or became available

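So the configuration above means: if 3 requests fail within 2s (failing to send the request and not receiving a response both count), the node is marked unavailable, and it stays unavailable for 2s. No requests are sent to it during those 2s.

If max_fails and fail_timeout are not set, the defaults quoted above apply (max_fails=1, fail_timeout=10s). To stop a node from ever being marked unavailable, set max_fails=0.

The slow_start parameter looks after nodes that have just recovered, giving them a ramp-up period so that they are not overwhelmed by a flood of connections. slow_start is supported only in the commercial version of NGINX.

The defining feature of passive health checks is that there is no dedicated health-check URL: nginx adjusts according to how real business requests fare, so the health state is not necessarily accurate.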
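The max_fails / fail_timeout bookkeeping can be pictured with a toy Python model. It follows the description above rather than nginx's source, and the PassiveCheck class and its method names are invented for this sketch.

```python
from typing import List


class PassiveCheck:
    """Toy model of max_fails / fail_timeout for a single upstream server."""

    def __init__(self, max_fails: int = 3, fail_timeout: float = 2.0) -> None:
        self.max_fails = max_fails
        self.fail_timeout = fail_timeout
        self.failures: List[float] = []  # timestamps of recent failed attempts
        self.down_until = 0.0            # the server is skipped before this time

    def available(self, now: float) -> bool:
        """Return True if requests may be sent to the server at time now."""
        return now >= self.down_until

    def record_failure(self, now: float) -> None:
        """Register a failed attempt that happened at time now."""
        # Keep only failures that fall inside the fail_timeout window.
        self.failures = [t for t in self.failures if now - t < self.fail_timeout]
        self.failures.append(now)
        # max_fails=0 disables the accounting, so the server is never marked down.
        if self.max_fails > 0 and len(self.failures) >= self.max_fails:
            self.down_until = now + self.fail_timeout
            self.failures.clear()

    def record_success(self) -> None:
        """A successful attempt resets the failure count."""
        self.failures.clear()
```

Three failures at t=0, 0.5 and 1.0 take the server out until t=3.0, after which it is tried again with real traffic; the same three failures spread over more than 2s never trip the threshold.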

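Active health checks exist only in the commercial version of NGINX. They need a health-check URL, which is probed periodically; the node is marked available or not according to the result. A sample configuration: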

location / {
    proxy_pass http://backend;
    health_check uri=/some/path interval=10 fails=3 passes=2;
}

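Incidentally, TCP and UDP load balancing is provided by the stream module, which is also available in open-source nginx when it is built with --with-stream; the commercial version adds features such as active health checks on top of it.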

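Problems with the built-in mechanism

Every fail_timeout the built-in mechanism makes the node available again, so if a request arrives just then it may be sent to the failed node and time out. One detail: when a request lands on a failed node, nginx does notice and forwards it to a healthy node, but how long it takes to notice depends on proxy_connect_timeout. The default is 60s, and our front-end services generally do not wait that long, so from the outside the front end simply times out and nginx's automatic removal seems to have no effect.

Either way, nginx may forward a request to an unhealthy node first and only then to a healthy one, which wastes forwarding attempts and lowers efficiency. When a backend application takes a long time to restart, this can even drag down the whole load balancer.

The root cause is that the built-in mechanism cannot judge node health accurately, so requests tend to hang and the service appears dead.

For other problems with the built-in mechanism, see this article

That article recommends using nginx_upstream_check_module to avoid the problems inherent in the built-in mechanism.

For reference, here is what proxy_connect_timeout and the related timeout parameters mean: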

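proxy_connect_timeout

Syntax: proxy_connect_timeout time
Default: 60s
Context: http, server, location
Description: sets the timeout for establishing a connection with the upstream server. Keep in mind that this timeout usually cannot exceed 75 seconds.
This is not the time spent waiting for the backend to return a page; that is set by proxy_read_timeout. If your upstream server is up but hanging (for example, it has too few threads to handle requests, so yours is queued for later), this directive does not help, because the connection to the upstream server has already been established.

proxy_read_timeout

Syntax: proxy_read_timeout time
Default: 60s
Context: http, server, location
Description: sets the timeout for reading a response from the proxied server. It determines how long nginx waits for the response to a request. It is not the time for the whole response but the time between two successive read operations.

proxy_send_timeout

Syntax: proxy_send_timeout time
Default: 60s
Context: http, server, location
Description: sets the timeout for transmitting a request to the upstream server. It applies not to the whole transmission but to the time between two successive write operations. If the upstream receives nothing within this time, nginx closes the connection.

Judging by these meanings, shortening proxy_connect_timeout looks harmless, and it at least handles network outages. A business request will most likely only ever hit proxy_read_timeout; when proxy_connect_timeout fires, it really means the server address cannot be reached, either because the connection limit was hit or because the network is down.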

After setting proxy_connect_timeout, the selection process is also visible in the nginx log:

...... "GET /healthcheck HTTP/1.1" 2.005 2.001, 0.004 node1:443, node2:443 - 504, 200
...... "GET /healthcheck HTTP/1.1" 4.008 2.001, 2.002, 0.004 node1:443, node3:443, node2:443 - 504, 504, 200 -

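The first entry shows nginx trying node1:443 first and getting a 504 (gateway timeout), then trying node2:443, which succeeds with 200.

The second entry shows all three machines being tried, which means that at that moment nginx considered all three healthy.

There are also entries with only one node, which means that at that moment nginx really had excluded the two failed nodes.

The log also shows how poor nginx's forwarding efficiency is during a failure: the front end sees noticeable lag or even timeouts.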
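To put a number on the wasted time, the comma-separated upstream fields of such a log entry can be paired up per attempt. The sketch below assumes the three lists have already been cut out of the line (the exact log_format is not shown in this post) and that every attempt has a numeric status and time; upstream_attempts and wasted_seconds are hypothetical helper names.

```python
from typing import List, Tuple


def _split(field: str) -> List[str]:
    """Split a comma-separated log field into trimmed parts."""
    return [part.strip() for part in field.split(",")]


def upstream_attempts(addrs: str, statuses: str, times: str) -> List[Tuple[str, int, float]]:
    """Pair up the upstream address, status and response-time lists."""
    return [
        (addr, int(status), float(seconds))
        for addr, status, seconds in zip(_split(addrs), _split(statuses), _split(times))
    ]


def wasted_seconds(attempts: List[Tuple[str, int, float]]) -> float:
    """Total time spent on attempts that did not end in a 200."""
    return sum(seconds for _, status, seconds in attempts if status != 200)


if __name__ == "__main__":
    # Values taken from the second log entry above.
    tries = upstream_attempts(
        "node1:443, node3:443, node2:443", "504, 504, 200", "2.001, 2.002, 0.004"
    )
    for addr, status, seconds in tries:
        print(addr, status, seconds)
    print("wasted:", round(wasted_seconds(tries), 3))
```

For the second entry, about 4 of the 4.008 seconds went to the two nodes that timed out.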

The nginx_upstream_check_module module

See here

It was developed by the Taobao team and requires patching the nginx source.

upstream server1
{
   server localhost:10002;
   keepalive 32;
   check interval=3000 rise=2 fall=3 timeout=1000 default_down=false type=http;
   check_http_send "HEAD /healthcheck HTTP/1.0\r\n\r\n";
   check_http_expect_alive http_2xx;
}

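The configuration above means: every node in the server1 upstream is checked every 3 seconds; a node is marked up after 2 successful checks and down after 3 failed checks; each check times out after 1 second; the check type is http.

keepalive means we use persistent connections to the backend, keeping at most 32 idle connections cached per worker.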
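The rise / fall logic amounts to two counters of consecutive probe results. Here is a minimal Python model of that idea; it is not taken from the module's source, and the ActiveCheck class and probe_result method are names made up for the illustration.

```python
class ActiveCheck:
    """Toy model of the rise / fall counters for one upstream server."""

    def __init__(self, rise: int = 2, fall: int = 3, default_down: bool = False) -> None:
        self.rise = rise
        self.fall = fall
        self.up = not default_down
        self.rise_count = 0  # consecutive successful probes
        self.fall_count = 0  # consecutive failed probes

    def probe_result(self, ok: bool) -> bool:
        """Feed one probe outcome and return whether the server is now up."""
        if ok:
            self.rise_count += 1
            self.fall_count = 0
            if not self.up and self.rise_count >= self.rise:
                self.up = True
        else:
            self.fall_count += 1
            self.rise_count = 0
            if self.up and self.fall_count >= self.fall:
                self.up = False
        return self.up


if __name__ == "__main__":
    check = ActiveCheck(rise=2, fall=3)
    for ok in (False, False, True, False, False, False, True, True):
        print("probe ok" if ok else "probe failed", "-> up" if check.probe_result(ok) else "-> down")
```

A single success in the middle of a run of failures resets the fall counter, so the node only goes down after three failures in a row, and it comes back only after two successes in a row. Unlike the passive mechanism, none of this depends on real user requests hitting a broken node.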

The check_http_send and check_http_expect_alive directives are described below:

Syntax: check_http_send http_packet
Default: "GET / HTTP/1.0\r\n\r\n"
Context: upstream

Syntax: check_http_expect_alive [ http_2xx | http_3xx | http_4xx | http_5xx ]
Default: http_2xx | http_3xx
Context: upstream

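Notes:

For check_http_send we recommend HEAD rather than GET: with no body returned it is more efficient.
type=tcp is the default mode. It is not very reliable and sometimes fails to detect node state; see here.

Configure the node status URL: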

location /status {
    check_status;
    access_log   off;
    allow 127.0.0.1;
    deny all;
}

Use the command:

curl 'https://localhost/status?format=json'

to view it. A normal response:

{"servers": {
  "total": 14,
  "generation": 2,
  "server": [
    {"index": 0, "upstream": "server1", "name": "127.0.0.1:10002", "status": "down", "rise": 0, "fall": 10972, "type": "tcp", "port": 0},
    {"index": 1, "upstream": "server1", "name": "127.0.0.1:10002", "status": "down", "rise": 0, "fall": 10972, "type": "tcp", "port": 0},
    ......
  ]
}}

If the status page reports:

http upstream check module can not find any check server, make sure you've added the check servers

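the module was not installed successfully, most likely because the patch was not applied; see the next section, "Compiling nginx_upstream_check_module".

Compiling nginx_upstream_check_module

1. Download the nginx source.

2. Download the nginx_upstream_check_module source from https://github.com/yaoweibin/nginx_upstream_check_module.

3. Apply the patch:

For nginx 1.21.6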

patch -p1 < contrib/nginx_upstream_check_module-master/check_1.20.1+.patch

For nginx 1.18.0

patch -p1 < contrib/nginx_upstream_check_module-master/check_1.16.1+.patch

Note: this step must not be skipped, otherwise at run time you get:

http upstream check module can not find any check server, make sure you've added the check servers

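The customized nginx shipped with our product skipped exactly this step, so nginx_upstream_check_module was installed but had no effect, and rebuilding it later took a lot of effort.

4. Run:

# nginx 1.21.6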
./configure --prefix=/data/apps/opt/nginx --conf-path=/data/apps/config/nginx/nginx.conf --user=easemob --group=easemob --pid-path=/data/apps/var/nginx/nginx.pid --error-log-path=/data/apps/log/nginx/error.log --http-log-path=/data/apps/log/nginx/access.log --sbin-path=/data/apps/opt/nginx/sbin/nginx --lock-path=/data/apps/var/nginx/nginx.lock --http-client-body-temp-path=/data/apps/var/nginx/client_temp --http-proxy-temp-path=/data/apps/var/nginx/proxy_temp --http-fastcgi-temp-path=/data/apps/var/nginx/fastcgi_temp --http-uwsgi-temp-path=/data/apps/var/nginx/uwsgi_temp --http-scgi-temp-path=/data/apps/var/nginx/scgi_temp --with-http_ssl_module --with-http_realip_module --with-http_addition_module --with-http_sub_module --with-http_dav_module --with-http_flv_module --with-http_mp4_module --with-http_gunzip_module --with-http_gzip_static_module --with-http_random_index_module --with-http_secure_link_module --with-http_stub_status_module --with-stream --with-stream_ssl_module --with-file-aio --without-mail_pop3_module --without-mail_imap_module --without-mail_smtp_module --with-ld-opt=-Wl,-rpath,/usr/local/lib/ --add-module=contrib/lua-nginx-module --add-module=contrib/nginx_upstream_check_module-master

# nginx 1.18.0
./configure --prefix=/data/apps/opt/nginx --conf-path=/data/apps/config/nginx/nginx.conf --user=easemob --group=easemob --pid-path=/data/apps/var/nginx/nginx.pid --error-log-path=/data/apps/log/nginx/error.log --http-log-path=/data/apps/log/nginx/access.log --sbin-path=/data/apps/opt/nginx/sbin/nginx --lock-path=/data/apps/var/nginx/nginx.lock --http-client-body-temp-path=/data/apps/var/nginx/client_temp --http-proxy-temp-path=/data/apps/var/nginx/proxy_temp --http-fastcgi-temp-path=/data/apps/var/nginx/fastcgi_temp --http-uwsgi-temp-path=/data/apps/var/nginx/uwsgi_temp --http-scgi-temp-path=/data/apps/var/nginx/scgi_temp --with-http_geoip_module --with-http_ssl_module --with-http_realip_module --with-http_addition_module --with-http_sub_module --with-http_dav_module --with-http_flv_module --with-http_mp4_module --with-http_gunzip_module --with-http_gzip_static_module --with-http_random_index_module --with-http_secure_link_module --with-http_stub_status_module --with-stream --with-stream_ssl_module --with-file-aio --without-mail_pop3_module --without-mail_imap_module --without-mail_smtp_module  --with-ld-opt=-Wl,-rpath,/usr/local/luajit/lib --add-module=contrib/lua-nginx-module --add-module=contrib/ngx_http_dyups_module-master --add-module=contrib/nginx_upstream_check_module-master --add-module=contrib/nginx-auth-ldap


# make fails with nginx 1.18.0
make

cd  objs
cp /data/apps/opt/nginx/sbin/nginx /data/apps/opt/nginx/sbin/nginx.bak

systemctl stop nginx

cp nginx /data/apps/opt/nginx/sbin

systemctl start nginx

Compile error with nginx 1.18.0:

contrib/ngx_http_dyups_module-master/ngx_http_dyups_module.c:576:34: error: variable ‘dmcf’ set but not used [-Werror=unused-but-set-variable]
     ngx_http_dyups_main_conf_t  *dmcf;
cc1: all warnings being treated as errors
make[1]: *** [objs/addon/ngx_http_dyups_module-master/ngx_http_dyups_module.o] Error 1

The fix is to remove -Werror from objs/Makefile and run make again.

Run-time error from nginx:

error while loading shared libraries: libluajit-5.1.so.2: cannot open shared object file: No such file or directory

LuaJIT-2.1 needs to be installed. Download the LuaJIT-2.1 package, unpack it and run:

make install PREFIX=/usr/local/luajit

After installation, /usr/local/luajit/lib/ contains libluajit-5.1.so.2 and the other shared objects.

Error when starting nginx:

module 'resty.core' not found:
	no field package.preload['resty.core']
	no file './resty/core.lua'
	no file '/usr/local/share/luajit-2.0.5/resty/core.lua'
	no file '/usr/local/share/lua/5.1/resty/core.lua'
	no file '/usr/local/share/lua/5.1/resty/core/init.lua'
	no file './resty/core.so'
	no file '/usr/local/lib/lua/5.1/resty/core.so'
	no file '/usr/local/lib/lua/5.1/loadall.so'
	no file './resty.so'
	no file '/usr/local/lib/lua/5.1/resty.so'
	no file '/usr/local/lib/lua/5.1/loadall.so') in /data/apps/config/nginx/nginx.conf:204

It seems the OpenResty packages have to be installed. That is not hard to understand: lua-nginx-module is itself a core component of OpenResty.

The fix is to configure in nginx.conf:

lua_load_resty_core off;
lua_package_path "/usr/local/lib/lua/?.lua;;";

You also have to install the lua-resty-core and lua-resty-lrucache packages:

make install

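After installation, the core scripts are under /usr/local/lib/lua.

Trying a build without Lua

Even if we do not use any Lua features, errors like no file './resty/core.so' are still reported. So let's simply try removing lua-nginx-module.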

./configure --prefix=/data/apps/opt/nginx --conf-path=/data/apps/config/nginx/nginx.conf --user=easemob --group=easemob --pid-path=/data/apps/var/nginx/nginx.pid --error-log-path=/data/apps/log/nginx/error.log --http-log-path=/data/apps/log/nginx/access.log --sbin-path=/data/apps/opt/nginx/sbin/nginx --lock-path=/data/apps/var/nginx/nginx.lock --http-client-body-temp-path=/data/apps/var/nginx/client_temp --http-proxy-temp-path=/data/apps/var/nginx/proxy_temp --http-fastcgi-temp-path=/data/apps/var/nginx/fastcgi_temp --http-uwsgi-temp-path=/data/apps/var/nginx/uwsgi_temp --http-scgi-temp-path=/data/apps/var/nginx/scgi_temp --with-http_geoip_module --with-http_ssl_module --with-http_realip_module --with-http_addition_module --with-http_sub_module --with-http_dav_module --with-http_flv_module --with-http_mp4_module --with-http_gunzip_module --with-http_gzip_static_module --with-http_random_index_module --with-http_secure_link_module --with-http_stub_status_module --with-stream --with-stream_ssl_module --with-file-aio --without-mail_pop3_module --without-mail_imap_module --without-mail_smtp_module --add-module=contrib/ngx_http_dyups_module-master --add-module=contrib/nginx_upstream_check_module-master --add-module=contrib/nginx-auth-ldap

Removed:

--with-ld-opt=-Wl,-rpath,/usr/local/luajit/lib --add-module=contrib/lua-nginx-module

Then delete the Lua-related directories:

rm -rf /usr/local/luajit
rm -rf /usr/local/lib/lua

In our tests this worked without any problem.

Registering nginx with systemctl

If systemctl start nginx reports:

Unit not found

first check the service file:

/usr/lib/systemd/system/nginx.service

If the file already exists with a [Unit] section, you only need to run:

systemctl daemon-reload

Otherwise add the following content:

[Unit]
Description=The nginx HTTP and reverse proxy server
After=network.target remote-fs.target nss-lookup.target

[Service]
Type=forking
PIDFile=/data/apps/var/nginx/nginx.pid
# Nginx will fail to start if /run/nginx.pid already exists but has the wrong
# SELinux context. This might happen when running `nginx -t` from the cmdline.
# https://bugzilla.redhat.com/show_bug.cgi?id=1268621
ExecStartPre=/usr/bin/rm -f /data/apps/var/nginx/nginx.pid
ExecStartPre=/data/apps/opt/nginx/sbin/nginx -t
ExecStart=/data/apps/opt/nginx/sbin/nginx
ExecStop=/data/apps/opt/nginx/sbin/nginx -s quit
ExecReload=/data/apps/opt/nginx/sbin/nginx -s reload
KillSignal=SIGQUIT
TimeoutStopSec=5
KillMode=process
PrivateTmp=true

[Install]
WantedBy=multi-user.target

Then run:

systemctl daemon-reload

Dynamic upstream configuration with dyups

dyups interface

 GET

/detail get all upstreams and their servers
/list get the list of upstreams
/upstream/name find the upstream by its name


POST

/upstream/name update one upstream
body commands;
body server ip:port;


DELETE

/upstream/name delete one upstream

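Our product presumably introduced dyups to support changing the nginx configuration dynamically through a REST interface.

Does dyups depend on Lua? Judging from the source it should not, unless you want to use dyups from Lua. In practice, with some versions dyups does not work even with Lua support added, so my understanding is that the problem is probably in dyups itself.

Let's try dyups 0.2.9 and 0.2.8: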

./configure --prefix=/data/apps/opt/nginx --conf-path=/data/apps/config/nginx/nginx.conf --user=easemob --group=easemob --pid-path=/data/apps/var/nginx/nginx.pid --error-log-path=/data/apps/log/nginx/error.log --http-log-path=/data/apps/log/nginx/access.log --sbin-path=/data/apps/opt/nginx/sbin/nginx --lock-path=/data/apps/var/nginx/nginx.lock --http-client-body-temp-path=/data/apps/var/nginx/client_temp --http-proxy-temp-path=/data/apps/var/nginx/proxy_temp --http-fastcgi-temp-path=/data/apps/var/nginx/fastcgi_temp --http-uwsgi-temp-path=/data/apps/var/nginx/uwsgi_temp --http-scgi-temp-path=/data/apps/var/nginx/scgi_temp --with-http_geoip_module --with-http_ssl_module --with-http_realip_module --with-http_addition_module --with-http_sub_module --with-http_dav_module --with-http_flv_module --with-http_mp4_module --with-http_gunzip_module --with-http_gzip_static_module --with-http_random_index_module --with-http_secure_link_module --with-http_stub_status_module --with-stream --with-stream_ssl_module --with-file-aio --without-mail_pop3_module --without-mail_imap_module --without-mail_smtp_module --add-module=contrib/ngx_http_dyups_module-0.2.8

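Neither even compiles.

Or is it because dyups conflicts with nginx_upstream_check_module, so that when the latter is active the former is not?

This article mentions exactly that.

dyups also has a drawback: upstream changes live only in memory and are not persisted to the configuration file.

Trying again: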

./configure --prefix=/data/apps/opt/nginx --conf-path=/data/apps/config/nginx/nginx.conf --user=easemob --group=easemob --pid-path=/data/apps/var/nginx/nginx.pid --error-log-path=/data/apps/log/nginx/error.log --http-log-path=/data/apps/log/nginx/access.log --sbin-path=/data/apps/opt/nginx/sbin/nginx --lock-path=/data/apps/var/nginx/nginx.lock --http-client-body-temp-path=/data/apps/var/nginx/client_temp --http-proxy-temp-path=/data/apps/var/nginx/proxy_temp --http-fastcgi-temp-path=/data/apps/var/nginx/fastcgi_temp --http-uwsgi-temp-path=/data/apps/var/nginx/uwsgi_temp --http-scgi-temp-path=/data/apps/var/nginx/scgi_temp --with-http_geoip_module --with-http_ssl_module --with-http_realip_module --with-http_addition_module --with-http_sub_module --with-http_dav_module --with-http_flv_module --with-http_mp4_module --with-http_gunzip_module --with-http_gzip_static_module --with-http_random_index_module --with-http_secure_link_module --with-http_stub_status_module --with-stream --with-stream_ssl_module --with-file-aio --without-mail_pop3_module --without-mail_imap_module --without-mail_smtp_module --add-module=contrib/ngx_http_dyups_module-master

The dyups REST interface still does not seem to work.

So the problem is most likely in the dyups module itself.

That leaves consul template as the only option.

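What happens to old connections at the moment nginx reloads

From the nginx documentation:

Old worker processes, receiving a command to shut down, stop accepting new connections and continue to service current requests until all such requests are serviced. After that, the old worker processes exit.

In other words, at reload the old nginx workers exit gracefully, making sure the current requests on old connections are completed.

Reportedly, though, if the client uses HTTP persistent connections (HTTP/1.1, HTTP keepalive; see here), requests can fail because of a TCP reset; see this article

My understanding is that an old nginx worker only makes sure the current request on an old connection completes, then exits and sends a FIN to the client. But a TCP connection is full duplex, so the FIN from nginx only closes the nginx->client direction; it does not mean the client->nginx direction is closed. The client can therefore still send another request req to nginx (this is HTTP keepalive at work: one connection can carry multiple HTTP requests), but by then the old worker has exited, so req can only get an RST, which shows up as a failed request.

If the client talks HTTP/1.0 to nginx instead, it opens a brand-new connection for each new request once the previous one completes. The new connection naturally reaches a new nginx worker, so requests do not hit a TCP reset.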

HTTP/1.1 usage, advantages and disadvantages

All modern web browsers including Google Chrome, Firefox, Internet Explorer (since 4.01), Opera (since 4.0)[16] and Safari use persistent connections.

By default, Internet Explorer versions 6 and 7 use two persistent connections while version 8 uses six.[17] Persistent connections time out after 60 seconds of inactivity which is changeable via the Windows Registry.[18]

In Firefox, the number of simultaneous connections can be customized (per-server, per-proxy, total). Persistent connections time out after 115 seconds (1.92 minutes) of inactivity which is changeable via the configuration.[19]

Advantages:

Reduced latency in subsequent requests (no handshaking and no slow start).
Reduced CPU usage and round-trips because of fewer new connections and TLS handshakes.
Enables HTTP pipelining of requests and responses.
Reduced network congestion (fewer TCP connections).
Errors can be reported without the penalty of closing the TCP connection.

Disadvantages:

If the client does not close the connection when all of the data it needs has been received, the resources needed to keep the connection open on the server will be unavailable for other clients. How much this affects the server's availability and how long the resources are unavailable depend on the server's architecture and configuration.
That is, the server has to hold the resources needed to keep persistent connections open.

Also a race condition can occur where the client sends a request to the server at the same time that the server closes the TCP connection.[14] A server should send a 408 Request Timeout status code to the client immediately before closing the connection. When a client receives the 408 status code, after having sent the request, it may open a new connection to the server and re-send the request.[15] Not all clients will re-send the request, and many that do will only do so if the request has an idempotent HTTP method.
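This describes the problem seen at nginx reload. In that situation the server should send a 408 status code to the client before closing the connection; a client that receives 408 after it has already sent its request should open a new connection and resend the request. But this client behavior is not guaranteed. Chrome, it seems, simply reports the 408 error to the user.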
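On the client side, one way to cope with this race is to retry once on a reset connection, but only for idempotent methods. The Python sketch below shows the idea in isolation: send_with_retry is a hypothetical helper, and the send callable stands for whatever actually performs the request on a fresh connection in your HTTP client.

```python
from typing import Callable

# Methods that are safe to resend because repeating them has the same effect.
IDEMPOTENT_METHODS = {"GET", "HEAD", "PUT", "DELETE", "OPTIONS"}


def send_with_retry(method: str, send: Callable[[], str], max_attempts: int = 2) -> str:
    """Call send(); on a reset connection, retry only idempotent methods."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    # Non-idempotent requests (for example POST) get exactly one attempt.
    attempts = max_attempts if method.upper() in IDEMPOTENT_METHODS else 1
    for attempt in range(attempts):
        try:
            return send()
        except ConnectionResetError:
            # The old worker closed the keep-alive connection under us.
            # The next call to send() is expected to open a new connection,
            # which will be accepted by one of the new workers.
            if attempt == attempts - 1:
                raise
    raise AssertionError("unreachable")
```

A GET that hits the reset is repeated transparently, while a POST surfaces the error to the caller, who has to decide whether resending is safe.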

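Enabling HTTP/1.1 for the nginx reverse proxy

http {
    # other configuration omitted
    upstream www {
        keepalive 50; # required; a value between 50 and 100 is suggested
        # server entries omitted
    }
    server {
        # other configuration omitted
        location / {
            proxy_pass http://www; # forward to the upstream above
            proxy_http_version 1.1; # required: use HTTP/1.1 towards the backend
            proxy_set_header Connection ""; # required: clear the Connection header so the backend connection stays open
        }
    }
}

See this article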