Incident: lebi was slow to access from India. The cause was 10.154.30.27 going down.
The configuration was as follows:
upstream newlebi {
    server 10.154.30.94:8080 weight=1 max_fails=3 fail_timeout=10s;
    server 10.154.30.27:8080 weight=1 max_fails=3 fail_timeout=10s;
}
Symptom: the access log shows requests timing out after 60 seconds on the way to the backend.
"10.58.114.68","[10/Mar/2017:14:39:30 +0800]","HTTP/1.1","-","POST","/bdp/action/tableDeploy/loadTableDeploy/ONLINE_APPLY","200","http://in.lebi.letv.cn/bdp/action/tableDeploy/loadTableDeploy/ONLINE_APPLY","62","181.679","10.121.152.89:8081","-","Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36","181.679"
"10.58.114.68","[10/Mar/2017:14:39:59 +0800]","HTTP/1.1","-","POST","/bdp/action/tableDeploy/loadTableDeploy/ONLINE_APPLY","200","http://in.lebi.letv.cn/bdp/action/tableDeploy/loadTableDeploy/ONLINE_APPLY","62","62.438","10.121.152.89:8081","-","Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36","62.438"
I checked the lebi configuration:
upstream newlebi {
    server 10.154.30.94:8080;
    server 10.154.30.27:8080;
}
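With this configuration, the logs show the failed server being taken out of rotation right away.
Conclusion: piling on parameters you have not fully understood backfires.
I tested with the test jssdk and confirmed that, once the extra parameters were removed, a request was no longer sent to both backend IPs in turn.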
A command for batch-updating the nginx configuration:
sed -i 's/weight=1 max_fails=3 fail_timeout=10s//g' urm.jamesding.top.conf # try it on a single file first
cd /etc/nginx/conf.d/
sed -i 's/weight=1 max_fails=3 fail_timeout=10s//g' *.conf
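The batch edit above rewrites every file in place with no way back. A slightly more careful variant, as a sketch only: the function name strip_upstream_params is made up for these notes, and the pattern also removes the space in front of weight=1, so the line ends in 8080; rather than 8080 ;.

```shell
#!/bin/sh
# strip_upstream_params DIR  (hypothetical helper, not an nginx tool)
# Removes the explicit "weight=1 max_fails=3 fail_timeout=10s" parameters from
# every *.conf file in DIR and keeps a .bak copy of each file it processes.
strip_upstream_params() {
    dir="$1"
    for f in "$dir"/*.conf; do
        # Skip the literal pattern when DIR contains no .conf files.
        [ -f "$f" ] || continue
        sed -i.bak 's/ weight=1 max_fails=3 fail_timeout=10s//g' "$f"
    done
}
```

Usage would be strip_upstream_params /etc/nginx/conf.d, followed by nginx -t to check the syntax and nginx -s reload only if that check passes.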
However, when I tested the 智标 logger, both configurations dropped the server I had shut down (server 242). So now I am thoroughly confused.
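In the example above, max_fails is 3: a server is considered unavailable once 3 attempts to reach it fail within the fail_timeout window, which is 10s here (fail_timeout is also how long the server then stays marked unavailable). The default for max_fails is 1 and the default for fail_timeout is 10s. What counts as a failed attempt is defined by proxy_next_upstream or fastcgi_next_upstream. In addition, proxy_connect_timeout and proxy_read_timeout control how long nginx waits for the upstream.
Our configuration, however, sets proxy_next_upstream to error timeout invalid_header http_500: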
location ~ ^/report {
    proxy_http_version 1.1;
    proxy_set_header Connection "";
    proxy_set_header Host $host;
    proxy_pass http://bdp-report-web;
    proxy_send_timeout 18000;
    proxy_read_timeout 18000;
    proxy_next_upstream error timeout invalid_header http_500;
    proxy_connect_timeout 60s;
}
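One possible reading of the 62-second and 181-second requests in the access log, which these notes do not confirm: with proxy_connect_timeout 60s, every attempt against a host that is completely down waits the full 60 seconds before nginx counts it as a failure, and with max_fails=3 it takes three such failures before the server is skipped. Under that assumption, a shorter connect timeout plus a cap on retries might look like the sketch below. The values 5s, 2 and 15s are illustrative guesses, not tested settings, and proxy_next_upstream_tries and proxy_next_upstream_timeout need nginx 1.7.5 or later.

```nginx
location ~ ^/report {
    proxy_http_version 1.1;
    proxy_set_header Connection "";
    proxy_set_header Host $host;
    proxy_pass http://bdp-report-web;

    # Assumed value: give up quickly when a backend host is unreachable,
    # instead of waiting 60s per connection attempt.
    proxy_connect_timeout 5s;

    # Kept from the original block.
    proxy_send_timeout 18000;
    proxy_read_timeout 18000;

    # Kept from the original block: the conditions that count as a failed
    # attempt and allow the request to be passed to the next server.
    proxy_next_upstream error timeout invalid_header http_500;

    # Assumed values: try at most 2 servers per request and stop passing
    # the request to another server after 15s in total.
    proxy_next_upstream_tries 2;
    proxy_next_upstream_timeout 15s;
}
```

The long read and send timeouts are left alone here, since only the connection phase is affected when a host is down.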