A Brief Guide to Quickly Building a Simple Real-Time Monitoring System with Telegraf + InfluxDB + Grafana

Author: 老汉在此

Category: Project Experience Sharing. Tags: telegraf, influxdb, grafana, monitoring system

Article link: https://blog.csdn.net/laisinanvictor/article/details/80007356

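Monitoring has always been a broad topic: anything that can fail needs to be monitored, and full coverage is the ultimate goal. Before building a monitoring system, you need to weigh the actual state of your systems and how they are likely to evolve. In the author's view, a monitoring system consists of three main parts: data collection, data storage, and data presentation. After reading a large amount of related material, the author settled on the Telegraf + InfluxDB + Grafana stack. The rest of this article gives a brief introduction to it.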

Introduction to Telegraf + InfluxDB + Grafana

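Before describing the stack, a few words about InfluxData. InfluxData is a data company focused on capturing and analyzing data from IoT devices, and InfluxDB is its core product. InfluxDB is an open-source distributed time series, events, and metrics database written in Go with no external dependencies. It is designed to be distributed and to scale horizontally. More than 40,000 independent sites already use InfluxDB, including well-known companies such as IBM, Cisco, eBay, Nordstrom, and Mozilla. The time series platform diagram published by InfluxData gives an overview of what the platform does, as shown in Figure 1.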

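Figure 1: Components of the time series platform provided by InfluxData

The platform can serve the following needs:

Figure 2: What the time series platform serves

This setup uses the platform's data collection component (Telegraf) and data storage component (InfluxDB). For data presentation, InfluxData's Chronograf is a newly released product and is not yet as mature as the widely used Grafana, so it is not adopted for now; it remains a fallback option for the presentation layer. The sections below go through each part of the chosen stack: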

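(1) Telegraf: data collection

Telegraf is an agent written in Go that collects statistics from systems and services and writes them to InfluxDB. It has a small memory footprint, and its plugin system lets developers easily add support for other services. The plugins supported by the latest version of Telegraf include: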

Apache
DNS query time
Docker
HTTP Listener
MySQL
Network Response
Tomcat
Zookeeper
TCP Listener

At the time of writing, Telegraf does not support real-time monitoring of Oracle database statistics.

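(2) InfluxDB: data storage

InfluxDB is an open-source distributed time series, events, and metrics database. It is written in Go and has no external dependencies. It is designed to be distributed and to scale horizontally.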

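Figure 3: Ranking of time series databases

Figure 3 shows the time series database ranking published by DB-Engines in February 2016. InfluxDB ranks first among all time series databases; note that DB-Engines scores reflect popularity rather than benchmarked performance.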

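InfluxDB has three main characteristics:

1. Time series: you can use time-related functions (such as max, min, and sum)

2. Metrics: you can run computations over large volumes of data in real time

3. Events: it supports arbitrary event data

Features

Schemaless: a measurement can have any number of columns

Scalable, with built-in functions such as min, max, sum, count, mean, and median for convenient statistics

Native HTTP API: built-in HTTP support, so reads and writes go over HTTP

Powerful query language, similar to SQL

Ships with a stress-testing tool and other utilities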
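The native HTTP API and the SQL-like query language can be tried with nothing more than curl. The following is a minimal sketch, assuming an InfluxDB 1.x server on localhost:8086; the database name `demo`, the measurement `cpu_load`, and the helper names `influx_line`, `influx_write`, and `influx_query` are illustrative and do not come from the article.

```shell
#!/bin/sh
# Sketch: talk to the InfluxDB 1.x HTTP API with curl.
# Assumptions (not from the article): server at localhost:8086, a database
# called "demo", and the helper names defined below.

INFLUX_URL="${INFLUX_URL:-http://localhost:8086}"

# Build one line-protocol record: measurement,tag field=value
# The timestamp is omitted, so the server assigns the current time.
influx_line() {
    # $1 = measurement, $2 = tag (key=value), $3 = field name, $4 = numeric value
    printf '%s,%s %s=%s\n' "$1" "$2" "$3" "$4"
}

# POST a record to /write and print the HTTP status (204 means success).
influx_write() {
    # $1 = database, $2 = line-protocol record
    curl -s -o /dev/null -w '%{http_code}\n' \
        -XPOST "$INFLUX_URL/write?db=$1" --data-binary "$2"
}

# Run an InfluxQL statement through /query and print the JSON reply.
influx_query() {
    # $1 = database, $2 = InfluxQL statement
    curl -s -G "$INFLUX_URL/query" \
        --data-urlencode "db=$1" --data-urlencode "q=$2"
}

# Example calls (these need a running server, so they are left commented out):
#   influx_write demo "$(influx_line cpu_load host=web01 value 0.64)"
#   influx_query demo 'SELECT mean("value") FROM "cpu_load" WHERE time > now() - 1h'
```

Keeping the record builder separate from the network calls makes it easy to inspect the exact line-protocol text before anything is sent.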

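(3) Grafana: data presentation

Grafana is an open-source dashboard tool delivered as an HTML/JS web application. It can access InfluxDB without running into cross-origin restrictions; once a data source is configured, it can display the monitoring data.

Features:

1. Rich data source support, including InfluxDB, MySQL, Elasticsearch, PostgreSQL, and others

2. A rich API that is easy to call from automation scripts

3. Dashboard import and export: once a template is built, import it and adjust the parameters to get real-time monitoring

4. Support for complex alert rules and email alerts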

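Installing and configuring Telegraf + InfluxDB + Grafana

With the components introduced, we now move on to the hands-on part, which is the core of this article.

(1) Installing and configuring InfluxDB

As the data storage module, InfluxDB can be installed directly on the monitored host or deployed on a separate server. InfluxDB's cluster mode is currently a paid feature, so the author demonstrates only single-node mode. The target host runs Red Hat Enterprise Linux 6.4. At the time of writing, the latest InfluxDB version is v1.5.2.

Download options:

1. If the server cannot reach the internet, download the package from the InfluxData website at https://portal.influxdata.com/downloads and upload it to the server with an FTP tool

2. If the server has direct internet access, download it from a terminal with wget https://dl.influxdata.com/influxdb/releases/influxdb-1.5.2.x86_64.rpm

Install InfluxDB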

sudo yum localinstall influxdb-1.5.2.x86_64.rpm

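Start the InfluxDB service and enable it at boot (on hosts without systemd, such as RHEL 6, use chkconfig influxdb on instead of systemctl enable)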

service influxdb start

systemctl enable influxdb

service influxdb status
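Besides checking the service status, you can confirm that the HTTP endpoint is actually answering. The sketch below polls the `/ping` endpoint of InfluxDB 1.x, which replies with 204 when the server is up. The helper name `wait_for_influxdb` and its default URL, attempt count, and delay are assumptions made for illustration.

```shell
#!/bin/sh
# Sketch: wait until InfluxDB answers on /ping.
# wait_for_influxdb is a hypothetical helper; defaults below are assumptions.
wait_for_influxdb() {
    # $1 = base URL, $2 = number of attempts, $3 = delay between attempts (seconds)
    wfi_url="${1:-http://localhost:8086}"
    wfi_attempts="${2:-10}"
    wfi_delay="${3:-1}"
    wfi_i=0
    while [ "$wfi_i" -lt "$wfi_attempts" ]; do
        # -f makes curl return non-zero on HTTP errors; /ping returns 204 when healthy
        if curl -sf -o /dev/null --max-time 2 "$wfi_url/ping"; then
            return 0
        fi
        wfi_i=$((wfi_i + 1))
        sleep "$wfi_delay"
    done
    return 1
}

# Example (left commented out because it needs a running server):
#   wait_for_influxdb http://localhost:8086 10 1 && echo "InfluxDB is up"
```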

Create a database

1. Open the InfluxDB shell

influx

2. List users

> show users

3. Create a user

> create user "pcbank" with password 'pcbank'

4. List databases

> show databases

5. Create a database

> create database pcbankDB
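The same steps can be scripted so that they are repeatable on other hosts. This is a sketch that assumes the `influx` 1.x command-line client is on the PATH and uses its `-execute` option; the `GRANT` statement, the `DRY_RUN` switch, and the helper names are additions for illustration and are not part of the original walkthrough.

```shell
#!/bin/sh
# Sketch: create the user and database non-interactively.
# Assumptions: influx 1.x CLI on PATH; helper names and DRY_RUN are hypothetical.

# Print the statement when DRY_RUN=1, otherwise hand it to the influx CLI.
run_influx() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$1"
    else
        influx -execute "$1"
    fi
}

# Create a user, a database, and give the user full rights on that database.
influx_bootstrap() {
    # $1 = user, $2 = password, $3 = database
    run_influx "CREATE USER \"$1\" WITH PASSWORD '$2'" &&
    run_influx "CREATE DATABASE \"$3\"" &&
    run_influx "GRANT ALL ON \"$3\" TO \"$1\""
}

# Example with the names used in this article (commented out: needs a server):
#   influx_bootstrap pcbank pcbank pcbankDB
```

Setting `DRY_RUN=1` before calling `influx_bootstrap` prints the statements instead of running them, which is a cheap way to review them first.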

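Run InfluxDB in the foreground to see its startup and log output (influxd starts the server itself, so stop the service first)

influxd

View the InfluxDB configuration

influxd config

For other basic InfluxDB operations, see the links at the end of this article.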

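(2) Installing and configuring Telegraf

As the data collection module, Telegraf must be installed on the monitored host. The target host runs Red Hat Enterprise Linux 6.4. At the time of writing, the latest Telegraf version is v1.6.0.

Download options:

1. If the server cannot reach the internet, download the package from the InfluxData website at https://portal.influxdata.com/downloads and upload it to the server with an FTP tool

2. If the server has direct internet access, download it from a terminal with wget https://dl.influxdata.com/telegraf/releases/telegraf-1.6.0-1.x86_64.rpm

Install Telegraf

sudo yum localinstall telegraf-1.6.0-1.x86_64.rpm

Configure Telegraf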

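A convenient aspect of Telegraf is that it is configuration-driven: editing the .conf configuration file is all that is needed to have data written to the target data store in real time.

First, configure the output that Telegraf writes to. Here we write to the InfluxDB instance on the local machine.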
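The author's exact output settings are not reproduced in the text, so the following is only a sketch of what an `[[outputs.influxdb]]` section for a local InfluxDB could look like. It assumes the `pcbankDB` database and `pcbank` user created earlier and writes the fragment to a scratch file (the path is hypothetical) so that it can be reviewed before being merged into /etc/telegraf/telegraf.conf.

```shell
#!/bin/sh
# Sketch: write a minimal [[outputs.influxdb]] section to a scratch file.
# Assumptions: InfluxDB on 127.0.0.1:8086, database pcbankDB and user pcbank
# from the earlier steps; the scratch file path is hypothetical.
TELEGRAF_SKETCH="${TMPDIR:-/tmp}/telegraf-output-sketch.conf"

cat > "$TELEGRAF_SKETCH" <<'EOF'
[[outputs.influxdb]]
  ## InfluxDB HTTP endpoint(s) to write to
  urls = ["http://127.0.0.1:8086"]
  ## Target database
  database = "pcbankDB"
  ## Credentials (only needed when InfluxDB authentication is enabled)
  username = "pcbank"
  password = "pcbank"
  ## Write timeout
  timeout = "5s"
EOF

echo "wrote $TELEGRAF_SKETCH"
```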

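Next, configure some basic monitoring inputs. The inputs used for this article's dashboard are listed in the Grafana usage section below.

Start the Telegraf service and enable it at boot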

systemctl start telegraf.service

service telegraf status

systemctl enable telegraf.service

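(3) Installing and configuring Grafana

As the front-end display and alerting tool, Grafana can be installed on any host with network access to InfluxDB. At the time of writing, the latest Grafana version is 5.0.4.

Download options

1. Download from https://grafana.com/grafana/download

2. On Red Hat or CentOS hosts: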

wget https://s3-us-west-2.amazonaws.com/grafana-releases/release/grafana-5.0.4-1.x86_64.rpm

Install Grafana

sudo yum localinstall grafana-5.0.4-1.x86_64.rpm

Configure the Grafana alert mailbox

cd /etc/grafana/

vi grafana.ini

In vi, type /smtp to jump to the SMTP settings

#################################### SMTP / Emailing ##########################
[smtp]
enabled = true
host = smtp.163.com:25
user = 123456789@163.com
# If the password contains # or ; you have to wrap it with triple quotes. Ex """#password;"""
password = 987654321
;cert_file =
;key_file =
skip_verify = true
from_address = 123456789@163.com
from_name = Grafana Alert
# EHLO identity in SMTP dialog (defaults to instance_name)
;ehlo_identity = dashboard.example.com

[emails]
;welcome_email_on_sign_up = false

Start the service and enable it at boot

systemctl daemon-reload

systemctl start grafana-server

systemctl status grafana-server

systemctl enable grafana-server.service

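Using and configuring Grafana

Log in at http://localhost:3000. The default username is admin and the default password is admin.

Once logged in, first choose Data Sources and add the data source you need, as shown in the figure below.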
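The data source can also be registered through Grafana's HTTP API rather than the web UI, which is handy for automation. The sketch below assumes Grafana on localhost:3000 with the default admin/admin login and an InfluxDB at http://localhost:8086; the data source name `InfluxDB-pcbank` and the helper names are hypothetical, and the payload builder does no JSON escaping, so it only suits simple values.

```shell
#!/bin/sh
# Sketch: add an InfluxDB data source via POST /api/datasources.
# Assumptions: default Grafana URL and admin/admin login; helper names and the
# data source name are hypothetical. Values are not JSON-escaped.

# Build the JSON payload for an InfluxDB data source accessed through the Grafana server.
grafana_datasource_json() {
    # $1 = name, $2 = InfluxDB URL, $3 = database, $4 = user, $5 = password
    printf '{"name":"%s","type":"influxdb","access":"proxy","url":"%s","database":"%s","user":"%s","password":"%s"}\n' \
        "$1" "$2" "$3" "$4" "$5"
}

# Send the payload to Grafana using basic authentication.
grafana_add_datasource() {
    grafana_datasource_json "$@" |
        curl -s -u "${GRAFANA_AUTH:-admin:admin}" \
            -H 'Content-Type: application/json' \
            -X POST "${GRAFANA_URL:-http://localhost:3000}/api/datasources" \
            --data-binary @-
}

# Example (commented out: needs a running Grafana):
#   grafana_add_datasource InfluxDB-pcbank http://localhost:8086 pcbankDB pcbank pcbank
```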

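Next, go to the dashboard page and create a dashboard. You can browse https://grafana.com/dashboards for dashboards shared by other users and pick a suitable one to shorten the learning curve. The author chose https://grafana.com/dashboards/1443, which already covers most of the metrics a system needs to monitor. The corresponding Telegraf configuration is as follows: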

[[inputs.net]]
  ## By default, telegraf gathers stats from any up interface (excluding loopback)
  ## Setting interfaces will tell it to gather these explicit interfaces,
  ## regardless of status.
  ##
  interfaces = ["eth0", "eth1", "lo"]

[[inputs.cpu]]
  ## Whether to report per-cpu stats or not
  percpu = true
  ## Whether to report total system cpu stats or not
  totalcpu = true
  ## If true, collect raw CPU time metrics.
  collect_cpu_time = false

# Read metrics about disk usage by mount point
[[inputs.disk]]
  ## By default, telegraf gathers stats for all mountpoints.
  ## Setting mountpoints will restrict the stats to the specified mountpoints.
  # mount_points = ["/"]
  ## Ignore some mountpoints by filesystem type. For example (dev)tmpfs (usually
  ## present on /run, /var/run, /dev/shm or /dev).
  ignore_fs = ["tmpfs", "devtmpfs"]

# Read metrics about disk IO by device
[[inputs.diskio]]
  ## By default, telegraf will gather stats for all devices including
  ## disk partitions.
  ## Setting devices will restrict the stats to the specified devices.
  # devices = ["sda", "sdb"]
  ## Uncomment the following line if you need disk serial numbers.
  # skip_serial_number = false

# Get kernel statistics from /proc/stat
[[inputs.kernel]]
  # no configuration

# Read metrics about memory usage
[[inputs.mem]]
  # no configuration

# Get the number of processes and group them by status
[[inputs.processes]]
  # no configuration

# Read metrics about swap memory usage
[[inputs.swap]]

[[inputs.system]]

[[inputs.netstat]]
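Before opening the dashboard it helps to confirm that these inputs really produce data. The sketch below builds the kind of InfluxQL query a CPU panel would issue against the `cpu` measurement written by `[[inputs.cpu]]` (field `usage_idle`, tag `cpu` with the value `cpu-total`). The helper name `cpu_idle_query` and the window and interval values are illustrative assumptions.

```shell
#!/bin/sh
# Sketch: build an InfluxQL query for average idle CPU over a recent window.
# cpu_idle_query is a hypothetical helper; measurement, field and tag names
# follow what Telegraf's cpu input writes.
cpu_idle_query() {
    # $1 = look-back window (for example 5m), $2 = GROUP BY interval (for example 10s)
    printf "SELECT mean(\"usage_idle\") FROM \"cpu\" WHERE \"cpu\" = 'cpu-total' AND time > now() - %s GROUP BY time(%s)\n" "$1" "$2"
}

# Example checks (commented out: they need Telegraf and InfluxDB installed):
#   telegraf --config /etc/telegraf/telegraf.conf --input-filter cpu --test
#   influx -database pcbankDB -execute "$(cpu_idle_query 5m 10s)"
```

The first commented command asks Telegraf to gather once and print the metrics without writing them; the second runs the generated query against the database used in this article.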

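After importing this dashboard into Grafana and selecting the data source, the resulting monitoring dashboard looks like this:

To add an alert rule, edit any panel, as shown in the figure below:

Finally, when a threshold is exceeded, an alert email is sent, as shown in the figure below: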
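To make the threshold logic concrete: a typical Grafana alert condition reads like "WHEN avg() OF query(A, 5m, now) IS ABOVE 80". The sketch below imitates that evaluation for a handful of samples. It is a simplified illustration rather than Grafana's implementation, and the helper name `alert_state` and the sample values are made up.

```shell
#!/bin/sh
# Sketch: evaluate an "average is above threshold" rule over a list of samples.
# alert_state is a hypothetical helper, not part of Grafana.
alert_state() {
    # $1 = threshold, remaining arguments = numeric samples from the query window
    as_threshold="$1"
    shift
    if [ "$#" -eq 0 ]; then
        echo "no_data"
        return 0
    fi
    # Average the samples and compare the result with the threshold.
    printf '%s\n' "$@" | awk -v t="$as_threshold" '
        { sum += $1; n++ }
        END { if (sum / n > t) print "alerting"; else print "ok" }'
}

# Example: CPU usage samples of 70, 90 and 95 percent against a threshold of 80.
alert_state 80 70 90 95
```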

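Summary

The above is simply the approach the author arrived at based on the author's own understanding. Time was short, so shortcomings are inevitable; comments and suggestions are welcome, since we grow by exchanging ideas. The author also hopes that sharing the approach in this form helps others who need it.

Finally, here are the links to the articles the author consulted: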

OSChina (Open Source China community)

https://my.oschina.net/xxbAndy/blog?&search=telegraf

Enabling HTTPS in InfluxDB

https://docs.influxdata.com/influxdb/v1.4/administration/https_setup/#setup-https-with-a-self-signed-certificate

A URL monitoring approach based on Telegraf and InfluxDB

https://www.annhe.net/article-3605.html

Building a monitoring platform with Metrics + InfluxDB + Grafana

https://www.jianshu.com/p/fadcf4d92b0e

Metrics: a real-time monitoring library for the JVM

https://www.jianshu.com/p/e4f70ddbc287

Java Metrics measurement tool: Metrics Core (translation)

http://blog.csdn.net/scutshuxue/article/details/8351810

Metrics basics

http://blog.csdn.net/tracymkgld/article/details/51899721

Open source or commercial? A comparison of ten cloud operations monitoring tools

http://www.oschina.net/news/67525/monitoring-tools

A brief analysis of operations monitoring systems and Open-Falcon

http://blog.csdn.net/puma_dong/article/details/51895063

Netkiller Linux Monitoring notes

http://netkiller.github.io/monitoring/index.html

Xiaomi operations: an enterprise-grade Internet monitoring system in practice

https://www.jianshu.com/p/b2f77285266c

Monitoring server CPU, memory, and disk with Grafana + InfluxDB + Telegraf

https://www.jianshu.com/p/dfd329d30891

A next-generation monitoring platform integrating Telegraf, InfluxDB, and Grafana

http://blog.51cto.com/michaelkang/1759877

Building a performance monitoring platform with Grafana + InfluxDB + Telegraf

https://www.cnblogs.com/Scissors/p/5977670.html

Quick deployment of Telegraf & InfluxDB

https://www.cnblogs.com/deykenlee/p/7565647.html

Setting up Grafana to improve Zabbix graph display

http://blog.51cto.com/ixhao/1847284

[Development approach] How Dianping builds 24/7 service with the real-time monitoring system CAT

http://udn.yyuap.com/thread-24395-1-1.html

Collecting system performance data with Telegraf + InfluxDB and graphing it in Grafana

http://blog.csdn.net/qq942477618/article/details/59579511

IoT real-time data visualization: Grafana + InfluxDB + Telegraf + MQTT + Windows 10

https://segmentfault.com/a/1190000012514865

Research and practice of a Docker monitoring stack (TIG): Telegraf

https://www.jianshu.com/p/5e3ca9096caf

What to make of InfluxDB clustering no longer being open source?

https://www.zhihu.com/question/42150020

Building a simple local monitoring system on Windows (Telegraf + InfluxDB + Grafana)

https://www.cnblogs.com/liugh/p/6683488.html

Building a monitoring platform with Telegraf + InfluxDB + Grafana + Spring Boot

http://blog.csdn.net/soongp/article/details/66974529

Integrating Spring with InfluxDB (1): creating a database and writing data

http://blog.csdn.net/qq_35981283/article/details/75408859

StatsD Metric

https://www.jianshu.com/p/2b0aa5898dd7

Building a Node.js monitoring system with StatsD + Grafana + InfluxDB

https://www.v2ex.com/t/328124

Time series database technology: a first look at InfluxDB

http://hbasefly.com/2017/12/08/influxdb-1/

Xiaomi open-source project: Open-Falcon, an enterprise-grade Internet monitoring system

http://open-falcon.com/community/

Grafana, a powerful open-source monitoring tool

https://www.cnblogs.com/txwsqk/p/3974915.html

Introduction to installing and configuring Grafana

http://www.ywnds.com/?p=5903

InfluxDB tutorial series index

https://www.linuxdaxue.com/influxdb-study-series-manual.html

Among open-source monitoring systems, which is better: Zabbix or Nagios?

https://www.zhihu.com/question/19973178

Measure Anything, Measure Everything

https://codeascraft.com/2011/02/15/measure-anything-measure-everything/

Monitoring MySQL with Prometheus + Grafana in practice

http://www.ywnds.com/?p=9656

Designing a Java server-side monitoring platform

http://blog.csdn.net/rosanu_blog/article/details/50585162

Which open-source monitoring frameworks are currently popular

http://blog.csdn.net/moonpure/article/details/78294835
