【Distributed Centreon troubleshooting】 Centreon does not draw graphs

Latest recommended article published on 2021-03-13 21:02:44

weixin_34247299

Tags: operations

Original link: http://blog.51cto.com/bluemood/988058

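I spent the whole day on this today. The graphs simply would not show up, damn it, but in the end I got it solved.

I dug through a lot of material. Normally graphing in Centreon is not something you need to worry about, but once the graphs stop appearing it is really hard to sort out. There is a troubleshooting procedure below that should solve most cases; what caused the problem for me was my rather unusual architecture.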

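【Symptom】

After the monitoring has been added in Centreon, perfdata is output normally, but the graphs show no data at all.

【Cause】

The cause was my architecture. One of the poller nodes in the distributed setup used to serve as the central server before that role was migrated elsewhere, but centstorage, centcore and mysql were all still running on that node. The problem hosts I had just added happened to be monitored by exactly this poller.

【Solution】

Stop centcore, centstorage, mysql and the related services on that poller node. In other words, only one machine in the distributed setup may run centcore and centstorage.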

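【Further notes】

The cause above is rather unusual. The regular way to troubleshoot (graph generation mainly depends on centstorage) is:

Above all, check centstorage.log for errors.

Also confirm that the monitored services are actually producing perfdata.

Then check the relevant centstorage configuration.

Finally, search the wiki or the forum for the relevant keywords.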

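Tips:

When a large number of servers is monitored, centstorage may appear to hang (?). You can set up a crontab job to restart it once an hour, using this script: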

[root@localhost ~]# cat /home/admin/restart_centstorage.sh
/etc/init.d/centstorage stop
sleep 8
killall -TERM centstorage
/etc/init.d/centstorage start
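
The hourly restart could be scheduled with a crontab entry along these lines. This is only a sketch: it assumes the entry goes into root's crontab (edited with `crontab -e`) and that the script lives at the path shown above. The choice of minute 0 is arbitrary.

```
# m h dom mon dow  command
0 * * * * /bin/sh /home/admin/restart_centstorage.sh
```

Calling the script through /bin/sh means it does not need the executable bit or a shebang line.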

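Here is a detailed troubleshooting walkthrough. I followed the English conversation step by step, only to find that the cause was the one described above. Shit.

This wiki page is still pretty good. It takes the form of one person asking questions while another helps to track down and solve the problem.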

http://en.doc.centreon.com/Troubleshooting:Graphs

Troubleshooting:Graphs

.. I know, currently this is only a log from an IRC session, but I (or any volunteer) may turn this into a good and structured HowTo :)

(14:22:09) grandmoun: How can i create graphs in centreon??
(14:28:27) nfilus: graphs are autogenerated, if the service you defined returns performance data
(14:29:01) zelia5: how do i generate perfdata ? ^^
(14:31:09) nfilus: # /usr/local/nagios/libexec/check_centreon_ping -H www.google.de
(14:31:15) nfilus: GPING OK - rtt min/avg/max/mdev = 23.269/23.269/23.269/0.000 ms|time=23.269ms;20;40;; ok=1
(14:31:25) nfilus: |time .... is the perfdata
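
As nfilus says, everything after the `|` in the plugin output is the perfdata. A minimal shell sketch of that split, using the sample output line from the session; `plugin_status` and `plugin_perfdata` are hypothetical helper names for illustration, not part of Centreon or Nagios:

```shell
#!/bin/sh
# Hypothetical helpers: split one line of plugin output at the first '|'.

# Text before the first '|' (the human-readable status).
plugin_status() {
    printf '%s\n' "${1%%|*}"
}

# Text after the first '|' (the perfdata); an empty line if there is none.
plugin_perfdata() {
    case "$1" in
        *'|'*) printf '%s\n' "${1#*|}" ;;
        *)     printf '\n' ;;
    esac
}

line='GPING OK - rtt min/avg/max/mdev = 23.269/23.269/23.269/0.000 ms|time=23.269ms;20;40;; ok=1'
plugin_status "$line"     # prints the text up to "ms"
plugin_perfdata "$line"   # prints: time=23.269ms;20;40;; ok=1
```

If `plugin_perfdata` prints an empty line for your own check, the plugin returns no performance data and Centreon has nothing to graph.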

Compare this with the performance data shown in the Service Details under Monitoring -> Services -> Details -> [your_service] (the screenshot from the original wiki page is not available here).

(09:43:30) nfilus: let's try to analyze step by step
(09:43:35) dharrison: ok cool
(09:44:25) nfilus: your service is running and the last check timestamp is quite recent in centreon?
(09:44:58) nfilus: look for "last check" at the main page in centreon or in monitoring
(09:45:58) dharrison: everything seems to be running ok
(09:46:40) nfilus: goto administratin -> options -> centstorage -> options
(09:47:38) nfilus: no empty fields?
(09:47:49) dharrison: nope
(09:48:06) nfilus: what storage type :)
(09:48:17) nfilus: rrd & mysql?
(09:48:24) dharrison: yup
(09:48:54) nfilus: check on filesystem if service-perfdata file exists and is 644 user:nagios group:nagios
(09:50:20) dharrison: looks like its 777 nagios & www-data
(09:50:43) nfilus: that's too much, but shouldn't be the problem
(09:50:53) nfilus: ok
(09:51:07) nfilus: goto centstorage -> manage in left menu
(09:51:35) nfilus: and choose the service you are interested in
(09:51:43) dharrison: theres nothing there
(09:51:48) dharrison: its empty
(09:52:35) nfilus: that's a symptom, lets look for the cause ...
(09:52:47) nfilus: na values - no graphs, sorry! :)
(09:52:59) dharrison: lol that would make sense  :-)
(09:53:29) nfilus: go to monitoring to your service details
(09:53:41) dharrison: any service?
(09:53:54) nfilus: the one you are interested in mostly
(09:54:31) dharrison: ok i have picked a host, and we will go for CPU Usage
(09:55:08) nfilus: ok
(09:55:31) nfilus: in status details: you have a status and performance data?
(09:56:16) dharrison: yes
(09:56:35) nfilus: please paste the perfdata here
(09:56:54) dharrison: '5 min avg Load'=1%;85;90;0;100
(09:57:35) nfilus: looks ok
(09:57:55) nfilus: so, perfdata is generated, but not processed
(09:58:56) nfilus: go to config -> command -> misc 
(09:59:31) nfilus: you should have sth like a process-service-perfdata command
(09:59:47) nfilus: (i think my definition is not standard)
(09:59:55) dharrison: yup i have that
(10:00:13) nfilus: open it and paste the command line
(10:01:13) dharrison: $USER1$/process-service-perfdata  "$LASTSERVICECHECK$" "$HOSTNAME$" "$SERVICEDESC$" "$LASTSERVICESTATE$" "$SERVICESTATE$" "$SERVICEPERFDATA$"
(10:02:04) nfilus: looks ok
(10:04:20) nfilus: config -> nagios -> nagios.cfg -> data
(10:05:10) dharrison: ok
(10:05:16) nfilus: perdata option is yes
(10:05:26) nfilus: service command is process-service-perfdata
(10:05:37) nfilus: service data file is /usr/local/nagios/var/service-perfdata
(10:06:13) nfilus: ok?
(10:06:16) dharrison: its /var/log/nagios3/service-perfdata
(10:06:28) dharrison: and perfdata option is yes
(10:07:22) nfilus: is this the same path as defined in administratin -> options -> centstorage -> options?
(10:08:14) dharrison: yes, just checked
(10:08:38) nfilus: so, this is the file you checked before for access, right?
(10:09:10) dharrison: yup
(10:09:36) dharrison: but its not the same file that $USER1$ points to. is that correct?
(10:10:24) nfilus: you mean  $USER1$/process-service-perfdata?
(10:11:08) dharrison: yup
(10:11:40) nfilus: no, this was the command that gets the perfdata from service checks and writes them into /var/log/nagios3/service-perfdata
(10:11:47) dharrison: oh ok
(10:12:30) nfilus: please do
(10:12:36) nfilus: tail -f /var/log/nagios3/service-perfdata
(10:13:04) nfilus: and watch for changes for 1-2 minutes
(10:13:27) dharrison: ok running now
(10:13:33) nfilus: is there any data comming in?
(10:13:37) dharrison: yes
(10:14:37) nfilus: ok, 
(10:14:38) nfilus:  ps ax | grep cent
(10:14:43) nfilus: centstorage is running?
(10:16:24) dharrison: seems to be
(10:17:07) nfilus: ok, do
(10:17:18) nfilus: tail -f /usr/local/centreon/log/centstorage.log
(10:17:28) nfilus: any errors or warnings?
(10:18:18) dharrison: no such log file
(10:20:11) nfilus: path centreon is in usr local, yes?
(10:20:59) dharrison: yes
(10:24:57) nfilus: grep LOG /usr/local/centreon/bin/centstorage
(10:25:07) nfilus: what's the log path?
(10:26:07) dharrison: "/usr/local/centreon/log/centstorage.log";
(10:26:46) nfilus: ls -lad  /usr/local/centreon/log
(10:26:57) nfilus: drwxrwxr-x 2 www-data nagios ?
(10:27:42) dharrison: yup   lol
(10:28:50) nfilus: that'S not normal, that no log file is there if centstorage is running!
(10:29:11) nfilus: is there a logAnalyser.log?
(10:29:21) dharrison: yes
(10:34:56) nfilus: can you restart centstorage
(10:35:05) dharrison: yeah 2secs
(10:36:15) dharrison: it did bring this up when i stopped it No lock file found in /var/run/centreon/centstorage.pid
(10:36:49) dharrison: ive stopped it but says its still running????
(10:37:04) dharrison: whats the process name for centstorage?
(10:37:40) nfilus: something like /usr/bin/perl -w /usr/local/centreon/bin/centstorage
(10:38:59) dharrison: hey hey  can't write /usr/local/centreon/log/centstorage.log: Permission denied
(10:39:17) dharrison: when i typed that command above
(10:40:23) nfilus: you are root?
(10:41:10) nfilus: there is no centstorage.log until now and  /usr/local/centreon/log is writeable, yes?
(10:41:30) dharrison: i have now ran that as sudo and came back ok
(10:42:55) dharrison: i ran  /usr/bin/perl -w /usr/local/centreon/bin/centstorage   as sudo which i should have done tbh. sorry
(10:43:03) dharrison: and there is now a centstorage.log
(10:43:39) nfilus: watch it for progress and errors
(10:43:41) nfilus: tail -f 
(10:44:16) dharrison: just two lines at the mo.
(10:44:26) dharrison: 1 stating that its starting
(10:44:32) dharrison: 2 with the PID Number
(10:44:44) nfilus: woow, that's progress :)
(10:44:52) dharrison: lol certainly is
(10:45:26) dharrison: nothing else is coming through
(10:46:13) nfilus: it should stay silent if no errors occur
(10:46:27) nfilus: like in my case:
(10:46:29) nfilus: 22/10/2009 10:47:01 - ERROR while updating /var/lib/centreon/status/186.rrd at 1256201216 -> 100 : illegal attempt to update using time 1256201216 when last update time is 1529719541 (minimum one second step)
(10:47:31) dharrison: lol
(10:47:42) dharrison: nope still silent......but no graphs still
(10:48:46) nfilus: wait 5 minutes and then go back to admin -> options -> centstorage -> manage
(10:48:58) nfilus: there should be some data now
(10:49:44) dharrison: ok currently still empty. but you reckon to wait a few more minutes?
(10:50:51) nfilus: yes, the perfdata needs to be filled in
(10:51:30) dharrison: ok
(10:56:10) nfilus: so, .... is there any data?
(10:56:32) dharrison: WHOA DUDE!
(10:56:33) nfilus: ... or any errors
(10:56:37) dharrison: data
(10:56:39) dharrison: lots
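
The three filesystem and process checks from the session above can be sketched as a small script. The default paths are the ones dharrison reported and will differ on other installations. The function names are hypothetical. Run it as the user centstorage runs as, because the write test always succeeds for root.

```shell
#!/bin/sh
# Sketch of the checks from the IRC session. Default paths are taken from
# the session; override them via the environment for other installations.
PERFDATA_FILE=${PERFDATA_FILE:-/var/log/nagios3/service-perfdata}
CENTSTORAGE_LOG_DIR=${CENTSTORAGE_LOG_DIR:-/usr/local/centreon/log}

# Succeeds if the perfdata file exists and is not empty.
perfdata_present() {
    [ -s "$1" ]
}

# Succeeds if the log directory exists and the current user may write to it.
log_dir_writable() {
    [ -d "$1" ] && [ -w "$1" ]
}

# Succeeds if a process whose command line mentions bin/centstorage is running.
# The [b] bracket keeps grep from matching its own command line.
centstorage_running() {
    ps ax | grep '[b]in/centstorage' >/dev/null
}

perfdata_present "$PERFDATA_FILE"       || echo "perfdata file missing or empty: $PERFDATA_FILE"
log_dir_writable "$CENTSTORAGE_LOG_DIR" || echo "log directory not writable: $CENTSTORAGE_LOG_DIR"
centstorage_running                     || echo "centstorage does not appear to be running"
```

The script stays silent when all three checks pass. In the session the decisive hint was the missing centstorage.log, which this sketch does not test for directly.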

centstorage.log errors

uninitialized value ...

Use of uninitialized value in multiplication (*) at /usr/local/centreon/bin/centstorage line 506

(14:54:33) nfilus: the problem is : $interval = getServiceCheckIntervalWithSVCid($index) * getIntervalLenght($con_oreon);
(14:55:27) nfilus: either the global interval (Configuration -> Nagios -> nagios.cfg -> Tuning : Timing Interval) 
           is not defined in config, or there is no check interval for some services
(14:56:56) iLLiZT: Hmm, there might not be a check interval defined for a couple of services, but shouldn't they use some kind of default then?
(14:58:40) nfilus: no
(14:58:59) iLLiZT: Ok, so I have to define the normal check interval and retry check interval for all services?
(14:59:32) nfilus: either for every service or in the used templates

timestamp error while updating - case A

31/1/2010 13:31:30 - ERROR while updating /var/lib/centreon/metrics/561.rrd at 1264941084 -> 31 : illegal attempt to update using time 1264941084 when last update time is 1264941084 (minimum one second step)

In this case, where all timestamps are the same (1264941084), the cause was the check_smart service and a very old smartctl, which produced malformed perfdata by repeating a metric twice (... temp=55234323 temp=34 ...). You can query MySQL to find out which service a metric id (for example 561) belongs to:

mysql> select host_name, service_description from metrics, index_data where index_id = id and metric_id = 561;

Afterwards execute the check_command for service_description on host_name on the command line, to see the unparsed performance data output.
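
Once you have the raw perfdata string, a repeated metric label such as the `temp` example above can be spotted mechanically. A minimal sketch, assuming labels contain no spaces (quoted labels like '5 min avg Load' are not handled); `duplicate_labels` is a hypothetical helper name:

```shell
#!/bin/sh
# Print every perfdata label that occurs more than once in the given string.
# Assumption: labels contain no spaces, so splitting on spaces is safe.
duplicate_labels() {
    printf '%s\n' "$1" | tr ' ' '\n' |
        awk -F '=' 'NF > 1 { if (++seen[$1] == 2) print $1 }'
}

duplicate_labels 'temp=55234323 temp=34'   # prints: temp
```

Each duplicated label is printed once, however often it repeats. No output means every label in the string is unique.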

timestamp error while updating - case B

31/1/2010 13:31:30 - ERROR while updating /var/lib/centreon/metrics/561.rrd at 1264941084 -> 31 : illegal attempt to update using time 1264941084 when last update time is 1564941084 (minimum one second step)

In this second case, the last timestamp in the error message is much greater than the first one: the RRD file records a last update far in the future (1564941084 falls in 2019, while the log entry is from 2010). Please check the system clock on your monitoring server. The system time may be jumping or being re-adjusted by NTP, /etc/adjtime or vmware-tools.
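
Cases A and B differ only in how the two timestamps in the error line compare. A minimal sketch that reads one such line and tells the two cases apart. `classify_rrd_error` is a hypothetical helper name, and the sketch assumes the message wording shown in the examples above:

```shell
#!/bin/sh
# Classify an "illegal attempt to update" line from centstorage.log.
# Prints "same" if both timestamps are equal (case A), "future" if the
# stored last update time is later than the new one (case B), and
# "unknown" for anything else.
classify_rrd_error() {
    printf '%s\n' "$1" | awk '
        match($0, /using time [0-9]+/) {
            t_new = substr($0, RSTART + 11, RLENGTH - 11)
        }
        match($0, /last update time is [0-9]+/) {
            t_last = substr($0, RSTART + 20, RLENGTH - 20)
        }
        END {
            if (t_new == "" || t_last == "") print "unknown"
            else if (t_last + 0 == t_new + 0) print "same"
            else if (t_last + 0 > t_new + 0)  print "future"
            else                              print "unknown"
        }'
}

classify_rrd_error '31/1/2010 13:31:30 - ERROR while updating /var/lib/centreon/metrics/561.rrd at 1264941084 -> 31 : illegal attempt to update using time 1264941084 when last update time is 1564941084 (minimum one second step)'
# prints: future
```

A result of "same" points to the plugin output (case A), "future" to the system clock (case B).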

Can't use string (...) as a HASH ref while "strict refs"

Can't use string ("HOSTSTATE::UP") as a HASH ref while "strict refs" in use at 419

This error is common among people who migrated from pnp4nagios, or who imported their old nagios commands into centreon and chose to overwrite the default values. For centstorage to work correctly it must process the performance data coming from the plugins, and it expects that data in a well-defined format. If the format deviates, centstorage can no longer parse the values. The format is determined by the command definition which nagios uses as Service Performance Data Processing Command in Configuration -> Nagios -> nagios.cfg -> Data (default: process-service-perfdata). Please check the parameters of this command as defined in Configuration -> Commands -> Miscellaneous -> "command-name", which should be:

$USER1$/process-service-perfdata  "$LASTSERVICECHECK$" "$HOSTNAME$" "$SERVICEDESC$" "$LASTSERVICESTATE$" "$SERVICESTATE$" "$SERVICEPERFDATA$"
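
To compare your own command definition against the line above, a small check like the following may help. It is only a sketch: `perfdata_command_ok` is a hypothetical helper, it ignores differences in spacing and in the script path, and it assumes the command line has no leading or trailing whitespace.

```shell
#!/bin/sh
# The argument list centstorage expects, copied from the definition above.
EXPECTED_ARGS='"$LASTSERVICECHECK$" "$HOSTNAME$" "$SERVICEDESC$" "$LASTSERVICESTATE$" "$SERVICESTATE$" "$SERVICEPERFDATA$"'

# Succeeds if the command line carries exactly the expected macros, in order.
perfdata_command_ok() {
    # Turn tabs into spaces and squeeze runs of spaces into one.
    normalized=$(printf '%s\n' "$1" | tr '\t' ' ' | tr -s ' ')
    # Drop the first word (the script path) and keep the arguments.
    args=${normalized#* }
    [ "$args" = "$EXPECTED_ARGS" ]
}

cmd='$USER1$/process-service-perfdata  "$LASTSERVICECHECK$" "$HOSTNAME$" "$SERVICEDESC$" "$LASTSERVICESTATE$" "$SERVICESTATE$" "$SERVICEPERFDATA$"'
if perfdata_command_ok "$cmd"; then
    echo "command line matches the expected format"
else
    echo "command line deviates from the expected format"
fi
```

Paste the command line from Configuration -> Commands -> Miscellaneous into `cmd`, keeping the single quotes so that the shell does not expand the `$` macros.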

Customize graphs

Q: Where and how do I configure Centreon to use the performance data to create a graph?

A: Centreon uses the data as soon as it has been parsed by centstorage and copied into the configured storage (RRD only, or RRD and DB). Go to Views -> Curves and define colors for your metrics (time, temperature, total, ...). In Administration -> Options -> CentStorage -> Manage you can hide performance metrics you do not need, so that they are not displayed on the graphs. For more control over the graph output, use the graph templates.
