nginx (29): Analyzing the Errors Recorded in error.log

wzj_110

Last modified 2023-04-02 15:58:41

Category: nginx. Tags: nginx, error.log

First published 2022-04-28 23:44:47

Article link: https://blog.csdn.net/wzj_110/article/details/124391355

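I. Summary

When troubleshooting an nginx error, consider these together: the 'client-side error', the 'access.log' entry [usually a non-200 status], 'error.log', and the 'backend logs'.

① Keyword: prematurely

nginx error message: "upstream prematurely closed connection while reading response header from upstream"

Scenario 1: nginx reports the error when 'downloading' a large file of more than '1G'

access.log: status code '502'

Description: while nginx was reading the response 'header' from the backend, it found that the 'backend had closed' the connection on its own initiative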

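② Keyword: recv() failed

'error.log' message: "recv() failed (104: Connection reset by peer) while reading response header from upstream"

'access.log' shows: '502 Bad Gateway'

Cause: the 'upstream backend' has already dropped the connection, but nginx 'was not notified'. nginx keeps 'waiting to send or receive' data on that connection, the peer answers with a reset, and this 'error' is logged.

Typical pattern: the timeout of the 'callee [backend service]' is 'shorter' than the timeout of nginx [the caller], so the backend disconnects 'first' while nginx is still 'waiting'

'Common cause of 502': the backend service cannot process the request and the transaction is interrupted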
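
For idle keep-alive connections, one way to stop the backend from closing first is to have nginx retire them earlier than the backend does. The fragment below is a minimal sketch for the http{} context, not a definitive setup: the upstream name `backend_app`, the address 127.0.0.1:8080, and all values are assumptions, and it presumes the backend's own idle timeout is longer than 30s. `keepalive_timeout` inside an upstream block requires nginx 1.15.3 or later.

```nginx
upstream backend_app {                   # hypothetical upstream name
    server 127.0.0.1:8080;               # hypothetical backend address
    keepalive 32;                        # idle upstream connections cached per worker
    keepalive_timeout 30s;               # assumed to be shorter than the backend's idle timeout
}

server {
    listen 80;

    location / {
        proxy_pass http://backend_app;
        proxy_http_version 1.1;          # keep-alive to the upstream needs HTTP/1.1
        proxy_set_header Connection "";  # clear the Connection header
    }
}
```

If the backend terminates requests that are still running (for example a php-fpm execution limit), the idle timeout does not help; in that case the backend limit and `proxy_read_timeout` have to be aligned instead.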

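+++++++++++"Possible causes [not yet backed by theory]"+++++++++++

1) The server's 'concurrent connections' exceed its capacity, so the server 'drops some of those connections'

2) The client closed the browser while the server was 'still' sending data to it

3) The user 'pressed' stop in the browser

php-fpm was terminated 'because of a timeout or similar', so nginx 'never received' a valid response

+++++++++++++++"nginx 502 scenarios"+++++++++++++++

1. The backend server ['upstream'] is 'overloaded by too many requests' and 'fails' -->'node-level cause [sockets exhausted]'

2. The backend server is 'misconfigured' -->'service configuration cause'

3. 'Network' problems, such as DNS resolution issues, 'routing' issues, or a 'firewall' blocking the server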

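③ Keyword: readv() failed

error.log 'error' message: readv() failed (104: Connection reset by peer) while reading upstream ...

Tomcat error: Request header is 'too large'

Detail: when this error occurs, nginx 'access.log' records nothing, but 'error.log' and the 'backend service' logs do contain error entries

Note: this 'almost' never happens

+++++++++++++"Scenario"+++++++++++++

The client's 'request header' is too large. 'nginx has been opened up' to accept it, but the 'backend service' still uses its 'default of 4k', which is too small

+++++++++++++"Best practice"+++++++++++++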

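1) In 'nginx.conf', add 'buffer [to handle oversized request headers]' and 'timeout' parameters, and switch to 'keep-alive connections'

proxy_connect_timeout      120;    # connect timeout; per the nginx docs it usually cannot exceed 75 seconds

proxy_send_timeout         300;    # maximum gap between two successive writes to the upstream

proxy_read_timeout         300;    # maximum gap between two successive reads from the upstream

proxy_http_version 1.1;            # keep-alive to the upstream requires HTTP/1.1

proxy_set_header Connection "";    # clear the Connection header so the upstream connection can be reused

Note: the buffer parameters are 'not repeated here'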
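
The article does not show which buffer parameters it means. As a hedged illustration only, the standard nginx directives that govern how large a client request header nginx accepts are `client_header_buffer_size` and `large_client_header_buffers`; the sizes below are assumptions, not recommendations from the source.

```nginx
http {
    client_header_buffer_size   4k;      # assumed size for ordinary request headers
    large_client_header_buffers 4 32k;   # assumed: up to 4 buffers of 32k for oversized headers
}
```

Raising these only moves the limit on the nginx side. The backend must accept headers of the same size, otherwise the request passes nginx and is rejected behind it, which is the scenario described above.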

2) If the backend is 'Tomcat', add the 'maxHttpHeaderSize' parameter in 'server.xml'

See also: Request header is too large

See also: Tomcat: the correct fix for "Request header is too large" and "Request Entity Too Large"

④ (111: Connection refused) while connecting to upstream

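error.log records: "connect() failed (111: Connection refused) while connecting to upstream"

When nginx connects to the 'backend' and the 'upstream service is down [check the port]', the 'network is unreachable', or there is a 'firewall problem', nginx 'receives' this error

Note: 'connecting' here means the failure happened while nginx was 'trying' to establish the 'TCP connection' to the backend, so the connection was 'never established'

This error is associated with 502

It can also be caused by incorrectly set forwarded request headers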

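⑤ Connection interrupted while nginx is sending or receiving data

++++++++"Scenarios after the TCP connection succeeds, while nginx is reading or sending data"++++++++

1: "(111: Connection refused) while reading response header from upstream"

Explanation: after the 'TCP connection' succeeds, if the 'backend upstream goes down or becomes unreachable' while nginx is 'reading data from the backend', nginx 'receives' this error

2: "(111: Connection refused) while sending request to upstream"

Explanation: after nginx 'connects successfully' to the upstream, if the 'backend' upstream goes down or becomes unreachable while nginx is 'sending data', nginx receives this error

Key point: the 'status code' in access.log is '502'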

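⑥ timeout

Emphasis: with any 'timeout', always be clear about 'which timer' was exceeded

1) proxy_connect_timeout is too small

failed "(110: Connection timed out) while connecting to upstream"

Meaning: nginx 'timed out' while 'connecting' to the upstream behind it; check whether the backend is 'down'

Related directive: proxy_connect_timeout

Interpretation: defines the timeout for establishing a connection 'between nginx and the real server'; it usually 'cannot exceed' 75 seconds; 'default': 60s

Addition: 'access.log' usually shows '502' (or '504' when it is nginx's own proxy_connect_timeout that expires)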

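2) upstream timed out (110: Connection timed out) while reading response header from upstream

Characteristic: the 'nginx' side actively closes the 'connection'

Scenario: usually seen with 'long-lived connections'

Explanation: nginx timed out while 'reading' the 'response' from the upstream, i.e. 'the backend server responded too slowly'

Cause: usually nginx waited for proxy_read_timeout and the server still had not responded to the request nginx sent

See also: the official nginx documentation for proxy_read_timeout

Recommendation: set a 'generic' value 'globally', and override it 'locally' in specific 'location' blocks as needed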

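++++++++++++++++++"Optimization ideas"++++++++++++++++++

1) Try to optimize the 'backend server code' to shorten execution time

2) If the backend 'inherently' performs 'time-consuming' operations, the only option is to 'increase' the 'timeout' parameters on the nginx side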
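
The recommendation of a generic global value with local overrides can be sketched as follows. This is an illustration under stated assumptions: the backend address 127.0.0.1:8080, the `/report/` location, and every timeout value are hypothetical and should be replaced by figures measured on the real service.

```nginx
http {
    # generic defaults inherited by every proxied location
    proxy_connect_timeout 5s;
    proxy_send_timeout    60s;
    proxy_read_timeout    60s;

    server {
        listen 80;

        location / {
            proxy_pass http://127.0.0.1:8080;    # hypothetical backend
        }

        location /report/ {                      # hypothetical slow endpoint
            proxy_pass http://127.0.0.1:8080;
            proxy_read_timeout 300s;             # local override for long-running requests
        }
    }
}
```

Keeping the long timeout on one location limits how many worker connections a slow backend can hold open, which a large global value would not.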

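⑦ Summary of error.log entries behind status code 502

[1] 'Connection refused' --> nginx's connection to the backend was 'refused'

[2] 'Connection timed out' --> 'connection timeout'; check whether the 'network' is the cause

[3] 'Upstream SSL certificate verify error' --> "certificate verification failed"

   1. 'self signed certificate': the self-signed certificate is not trusted

   2.  the 'certificate' in use 'has expired'

[4] 'connect() failed (113: No route to host)' while connecting to upstream

   --> the 'backend service' host is unreachable, so nginx 'cannot find' a route to it

[5] 'SSL_do_handshake() * wrong version number'

   --> 'the TLS handshake reached a plain http endpoint; the protocols do not match'

[6]  upstream SSL certificate does not match "*" while SSL handshaking to upstream

   --> set proxy_ssl_server_name on

[7]  'ww.wzj.com could not be resolved(110: Operation timed out)'

   --> check the resolver parameter; 'DNS resolution problem'

[8]  'certificate signature failure' while SSL handshaking to upstream

[9]  proxy_pass should have been 'http' but was configured as 'https', while the backend actually has 'no' certificate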
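
Items [3], [6], and [8] all involve how nginx verifies the upstream certificate. The fragment below is a sketch of the relevant proxy_ssl_* directives, assuming a hypothetical https backend `backend.example.com` and a hypothetical CA bundle path; it is not taken from the source configuration.

```nginx
location / {
    proxy_pass https://backend.example.com;            # hypothetical https backend
    proxy_ssl_server_name on;                          # send SNI so the backend picks the right certificate
    proxy_ssl_name backend.example.com;                # name compared against the certificate
    proxy_ssl_verify on;                               # verification is off by default
    proxy_ssl_trusted_certificate /etc/nginx/ca.pem;   # hypothetical CA bundle used for verification
    proxy_ssl_verify_depth 2;                          # assumed chain depth
}
```

For items [5] and [9], where the backend does not speak TLS at all, none of these directives help; the scheme in `proxy_pass` has to be `http://`.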

⑧ Caused by signals

++++++++++"How signals sent to nginx show up in error.log"++++++++++

'reopening logs'	        the user sent kill -USR1

"gracefully shutting down" 	the user sent kill -WINCH

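II. Supplement

① The response header sent by the upstream is invalid

1) upstream sent 'invalid header' while reading response header from upstream

The 'response header' sent by the upstream is invalid

2) upstream sent 'no valid HTTP/1.0 header' while reading response header from upstream

The 'response header' sent by the upstream is invalid

++++++++++++"Scenarios"++++++++++++

Root cause 1: the upstream 'response header is too large' for nginx to handle

Root cause 2: the backend response header contains 'garbled characters' that nginx 'cannot' handle

See also: other references on "upstream sent no valid HTTP/1.0 header while reading response header from upstream"

upstream sent invalid chunked response while reading upstream

'invalid chunked' error: when the site is accessed 'through nginx', the 'browser' page shows 'ERR_EMPTY_RESPONSE'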

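② Client request body too large

(3) 'client intended to send too large body'

	client_max_body_size sets the maximum size of the client 'request body' that nginx accepts; the default is 1M. This error means the body sent by the client 'exceeded' the configured value

    Scenario: the client uploads a 'large file';

    access.log: status code '413'

See also: nginx 400, 413, and 414 errors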
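
A minimal sketch of raising the body limit only where uploads happen, so the rest of the site keeps the small default. The `/upload/` location, the 200m limit, and the backend address are assumptions made for illustration.

```nginx
http {
    client_max_body_size 1m;                    # the default, written out explicitly

    server {
        listen 80;

        location /upload/ {                     # hypothetical upload endpoint
            client_max_body_size 200m;          # assumed limit for large files
            proxy_pass http://127.0.0.1:8080;   # hypothetical backend
        }
    }
}
```

Setting `client_max_body_size 0;` disables the check entirely, which shifts the job of rejecting oversized uploads to the backend.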

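③ The upstream backend servers are down

1. no servers are inside upstream

Explanation: 'no server is configured' under the upstream

2. 'no live upstreams while connecting to upstream' -->'pay attention to the details'

Possible cause: the 'servers under the upstream are all down', the 'health checks failed', and all of them were 'taken out of rotation'; yet 'requests' still arrive at that moment

Note: access.log shows a '502' error

Addition: error.log also records the 'health check failure' entries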
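
The article does not say which health check mechanism it uses. In open source nginx the built-in mechanism is passive: a server that fails `max_fails` times within `fail_timeout` is considered unavailable for the next `fail_timeout`, and when every server is in that state nginx logs "no live upstreams". A sketch with hypothetical addresses and assumed thresholds:

```nginx
upstream backend_app {                                   # hypothetical upstream name
    server 10.0.0.11:8080 max_fails=3 fail_timeout=10s;  # hypothetical addresses, assumed thresholds
    server 10.0.0.12:8080 max_fails=3 fail_timeout=10s;
    server 10.0.0.13:8080 backup;                        # receives traffic only when the others are unavailable
}
```

The defaults are `max_fails=1` and `fail_timeout=10s`, so with a single flaky server one failed request is enough to mark it unavailable for ten seconds.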

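④ File download errors

Symptom: the client gets an error right after 'clicking download' and 'nothing' is downloaded

Note: access.log may show a '403' error

error.log: open() "/*/nginx/proxy_temp/2/26/0000000262" failed (13: Permission denied)

(13: Permission denied) while reading upstream

Cause: nginx runs as the wrong user, or the permissions on the proxy_temp directory are wrong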
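
On the configuration side, the two directives involved are `user` (which account the worker processes run as) and `proxy_temp_path` (where buffered upstream responses are written). The fragment below is a sketch only: the account name `nginx` and the path are assumptions, and the directory itself still has to be owned by, or writable for, that account at the filesystem level.

```nginx
user nginx;                                        # assumed worker account (main context)

http {
    proxy_temp_path /var/cache/nginx/proxy_temp;   # hypothetical path; must be writable by the worker account
}
```

A common way to end up with this error is starting nginx once as root, which creates the temp directories with root ownership, and later running the workers under an unprivileged user.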

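⑤ Local ports exhausted on the nginx proxy side

(99: Cannot assign requested address) while connecting to upstream: a 'short-lived connection' problem. Every new upstream connection consumes a local port, so a high rate of short connections can use up the available range; reusing upstream connections through keep-alive reduces the pressure.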

⑥ https handshake problems

++++++++++"Divider [SSL problems]"++++++++++

"SSL_do_handshake() failed" 	SSL 'handshake failed'

'SSL_write() failed (SSL:) while sending to client'

'ngx_slab_alloc() failed: no memory in SSL session shared cache'

    caused by, among other things, ssl_session_cache being too small

'could not add new SSL session to the session cache while SSL handshaking'
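
For the two session cache messages, the relevant setting is the size of the shared zone in `ssl_session_cache`. A minimal sketch, with the zone name `SSL` and both values chosen as assumptions; the nginx documentation gives roughly 4000 sessions per megabyte as a sizing guide.

```nginx
http {
    ssl_session_cache   shared:SSL:20m;   # assumed size; about 4000 sessions fit in 1m
    ssl_session_timeout 10m;              # assumed lifetime; shorter values free slots sooner
}
```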

⑦ Port conflict

'(98: Address already in use) while connecting to upstream'  

Note: "the port is already in use, so nginx did not start" -->"port conflict"

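⑧ 'no resolver defined' error

error.log: 2020/11/17 21:32:42 [error] 17590#0: *14531766 'no resolver defined to resolve'  www.wzj.com

Background: when a 'domain name' supplied through a variable is used as the 'reverse proxy address', nginx versions after 0.6.18 report an error like "no resolver defined to resolve", whereas 'writing the address' directly in proxy_pass does not.

Note: the reason is that versions after nginx 0.6.18 'introduced the resolver directive'. When a server address is 'built from variables', you 'must use the resolver directive' to specify the DNS server address

Scenario: the 'domain name is replaced by a variable' in proxy_pass, or the 'URL' or 'request parameters' in proxy_pass use variables

++++++++++"Solution"++++++++++

Add the line 'resolver 8.8.8.8;' to the http{} section of the nginx configuration file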
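
Put together, the failing pattern and its fix look roughly like this. The variable name `$backend_host` and the `valid=30s` parameter are assumptions added for illustration; the resolver address and the domain come from the text above.

```nginx
http {
    resolver 8.8.8.8 valid=30s;               # valid=30s is an assumption; it overrides the record TTL

    server {
        listen 80;

        location / {
            set $backend_host www.wzj.com;     # hypothetical variable name
            proxy_pass http://$backend_host;   # resolved at request time, so a resolver is required
        }
    }
}
```

Without a variable, nginx resolves the name in `proxy_pass` once when the configuration is loaded, using the system resolver, which is why that form never triggers this error.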

⑨ Other

nginx writev() failed (32: Broken pipe) while sending to client

send() failed (111: Connection refused) while resolving

104: Connection reset by peer

See also: a detailed explanation of common errors in the nginx error.log file

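III. Summary of errors when simulating requests with curl

++++++++++  No route to host  ++++++++++

[1] Domain name resolution failed

  1) The domain name 'cannot be resolved' --> the domain does not exist, or /etc/hosts or /etc/resolv.conf is misconfigured

  2) The local firewall does not 'allow DNS on port 53'

  3) How to verify: ping 'domain|ip', dig, nslookup

[2] The domain resolves, but the 'IP being accessed does not exist' or 'differs' from the actual IP

 Common case: the DNS record of the domain in proxy_pass has changed, but nginx has not been "reload|restart"ed

[3] Caused by a 'routing' problem

Root cause: an IP-layer symptom, the request never reached the node where the 'service' runs; check 'route' and capture packets with Wireshark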

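++++++++++  Connection Timeout  ++++++++++

1) Access to the port is blocked by a 'security group or firewall'

2) Client resources are 'exhausted'

3) iptables is set to 'drop' and discards the request packets

4) Connect, read, or write timeouts; the server 'responds too slowly'; TCP 'retransmissions'; and so on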

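++++++++++  Connection Refused  ++++++++++

1) Nothing is 'listen'ing on the port --> wrong listening port or the service is not started --> check with netstat on the server

2) Server-side ports are 'exhausted'

3) An iptables rule 'rejects' the connection --> it can be removed temporarily with ' iptables -F '

Root cause: 

  1) The 'route' from the client to the target address 'is fine'

  2) But 'no' process is listening on the target port, so the server 'refuses' the connection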

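+++++++++++  "Ways to test a port"  +++++++++++

nc, wget, curl

See also: Thoroughly understanding TCP connection timeout

See also: Connection Refused; locating intermittent 'Connection refused' problems

The two kinds of timeout errors when curl simulates user requests: connection and data transfer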

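① proxy_send_timeout time

Syntax: proxy_send_timeout time;

Default: 60s

Context: http, server, location

Description: this directive sets the timeout for 'sending a request to the upstream server'.

1) The timeout does 'not' apply to the whole transmission, only to the interval between 'two successive write operations';

2) If the upstream 'receives no' new data within this time, nginx closes the connection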

② Capturing HTTP requests and responses with tcpdump

Filter HTTP GET requests with tcpdump:

sudo tcpdump -s 0 -A 'tcp[((tcp[12:1] & 0xf0) >> 2):4] = 0x47455420'

Filter HTTP POST requests with tcpdump:

sudo tcpdump -s 0 -A 'tcp dst port 80 and (tcp[((tcp[12:1] & 0xf0) >> 2):4] = 0x504f5354)'

Filter HTTP request and response headers, together with request and response bodies, with tcpdump:

tcpdump -A -s 0 'tcp port 80 and (((ip[2:2] - ((ip[0]&0xf)<<2)) - ((tcp[12]&0xf0)>>2)) != 0)'
tcpdump -X -s 0 'tcp port 80 and (((ip[2:2] - ((ip[0]&0xf)<<2)) - ((tcp[12]&0xf0)>>2)) != 0)'

See also: getting the HTTP request URL with tcpdump