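I wrote a custom Prometheus vCenter exporter. The vCenter manages more than 1,300 hosts and virtual machines, and the exporter pings every one of them every 30 seconds. After it was deployed on Red Hat 7 in production, it began logging "too many open files" errors: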
Jun 28 17:53:53 sz180591 observer_vcenter_exporter: time="2018-06-28T17:53:53+08:00" level=error msg="ping ip: 10.60.1.54 error:pipe2: too many open files" source="ping.go:57"
Jun 28 17:53:53 sz180591 observer_vcenter_exporter: time="2018-06-28T17:53:53+08:00" level=error msg="ping ip: 10.60.0.187 error:fork/exec /usr/bin/ping: too many open files" source="ping.go:57"
Everything I found online recommended raising the "open files" maximum with "ulimit -n 2048".
However, "ulimit -a" showed that "open files" was already set to 65536:
sz180591:root@/root>ulimit -a
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 15210
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 65536
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) unlimited
cpu time (seconds, -t) unlimited
max user processes (-u) 65536
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
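Further searching on Google turned up reports that systemd is the culprit: a service started by systemd does not use the limits configured through ulimit, which only reflect the current shell session. The limit has to be defined in the systemd configuration or in the service's unit file, and the values a process is actually running with can be checked in /proc/<pid>/limits.
As it happens, this exporter is started by systemd. Checking the process's limits confirmed that "Max open files" was not the value reported by ulimit: the soft limit was 1024 and the hard limit 4096.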
sz180591:root@/var/log>cat /proc/5750/limits
Limit Soft Limit Hard Limit Units
Max cpu time unlimited unlimited seconds
Max file size unlimited unlimited bytes
Max data size unlimited unlimited bytes
Max stack size 8388608 unlimited bytes
Max core file size 0 unlimited bytes
Max resident set unlimited unlimited bytes
Max processes 15210 15210 processes
Max open files 1024 4096 files
Max locked memory 65536 65536 bytes
Max address space unlimited unlimited bytes
Max file locks unlimited unlimited locks
Max pending signals 15210 15210 signals
Max msgqueue size 819200 819200 bytes
Max nice priority 0 0
Max realtime priority 0 0
Max realtime timeout unlimited unlimited us
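The same check can be made from inside the process. The sketch below reads the process's own RLIMIT_NOFILE with syscall.Getrlimit from the Go standard library; openFileLimit is a hypothetical helper name and is not part of the exporter. Logging these two numbers at startup would show at once which limit the service is running under.

```go
package main

import (
	"fmt"
	"syscall"
)

// openFileLimit returns the soft and hard RLIMIT_NOFILE values of the
// current process, i.e. the "Max open files" row of /proc/<pid>/limits.
func openFileLimit() (soft, hard uint64, err error) {
	var lim syscall.Rlimit
	if err := syscall.Getrlimit(syscall.RLIMIT_NOFILE, &lim); err != nil {
		return 0, 0, err
	}
	return lim.Cur, lim.Max, nil
}

func main() {
	soft, hard, err := openFileLimit()
	if err != nil {
		fmt.Println("getrlimit failed:", err)
		return
	}
	fmt.Printf("Max open files: soft=%d hard=%d\n", soft, hard)
}
```

Depending on the Go version, the runtime may raise the soft limit to the hard limit at startup, so the soft value printed here can be higher than the one systemd configured.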
The fix is to edit the service definition and add "LimitNOFILE=15210" to the [Service] section:
[Unit]
Description=exporter vcenter service
After=network.target
[Service]
LimitNOFILE=15210
Type=simple
PIDFile=/var/run/observer_vcenter_exporter.pid
ExecReload=/bin/kill -HUP $MAINPID
ExecStart=/qhapp/monitor/exporter_vcenter/observer_vcenter_exporter --config-file=/qhapp/monitor/config/observer/vcenter.yml
SyslogIdentifier=observer_vcenter_exporter
Restart=always
[Install]
WantedBy=multi-user.target
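After reloading systemd (systemctl daemon-reload) and restarting the service, check the process's limits again. "Max open files" now shows the new value: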
sz180591:root@/var/log>cat /proc/15986/limits
Limit Soft Limit Hard Limit Units
Max cpu time unlimited unlimited seconds
Max file size unlimited unlimited bytes
Max data size unlimited unlimited bytes
Max stack size 8388608 unlimited bytes
Max core file size 0 unlimited bytes
Max resident set unlimited unlimited bytes
Max processes 15210 15210 processes
Max open files 15210 15210 files
Max locked memory 65536 65536 bytes
Max address space unlimited unlimited bytes
Max file locks unlimited unlimited locks
Max pending signals 15210 15210 signals
Max msgqueue size 819200 819200 bytes
Max nice priority 0 0
Max realtime priority 0 0
Max realtime timeout unlimited unlimited us
After a period of observation, the "too many open files" error has not recurred.