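Telegraf + InfluxDB + machine learning: predicting node cpu, disk, diskio, net and system data
Forecasting a node's hardware metrics (CPU, disk, network, system) helps operations staff anticipate how a device will behave over the coming period. Telegraf acts as the hardware monitoring agent and writes real-time metrics to the time-series database InfluxDB. Machine learning algorithms then preprocess this data, train on it, and predict the device's future usage.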
Telegraf
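- Telegraf is part of the TICK Stack. It is a plugin-driven server agent for collecting and reporting metrics.
- Telegraf has integrations that collect a wide range of metrics, events and logs directly from the containers and systems it runs on. It can also pull metrics from third-party APIs, and even listen for metrics through StatsD and Kafka consumer services.
- It also has output plugins that send metrics to many other datastores, services and message queues, including InfluxDB, Graphite, OpenTSDB, Datadog, Librato, Kafka, MQTT, NSQ and more.
- Installation (requires the InfluxData yum repository to be configured)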
yum install telegraf
- Configuring Telegraf, reference:
https://blog.csdn.net/youngtong/article/details/84640382
vim /etc/telegraf/telegraf.conf
- Send the data to InfluxDB
[[outputs.influxdb]]
  ## InfluxDB address
  urls = ["http://172.16.109.105:8086"]
  ## InfluxDB database
  database = "telegraf"
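With this output configured, Telegraf sends each metric to InfluxDB's HTTP write endpoint in InfluxDB line protocol, which has the form measurement,tag_set field_set timestamp. The sketch below shows what one such record looks like. It uses a hypothetical helper, to_line_protocol, that handles float fields only and does only the basic escaping of commas, spaces and equals signs. It illustrates the format and is not a replacement for Telegraf's own serializer.

```python
def _escape_key(value: str) -> str:
    # Tag keys, tag values and field keys must backslash-escape commas, spaces and equals signs
    return value.replace(",", r"\,").replace(" ", r"\ ").replace("=", r"\=")


def to_line_protocol(measurement: str, tags: dict, fields: dict, timestamp_ns: int) -> str:
    """Build one InfluxDB line-protocol record (float fields only, for illustration)."""
    # Measurement names escape commas and spaces
    name = measurement.replace(",", r"\,").replace(" ", r"\ ")
    # InfluxDB recommends sending tags sorted by key
    tag_part = "".join(
        f",{_escape_key(k)}={_escape_key(str(v))}" for k, v in sorted(tags.items())
    )
    # Every field value is written as a float (no integer "i" suffix handling here)
    field_part = ",".join(f"{_escape_key(k)}={float(v)!r}" for k, v in fields.items())
    return f"{name}{tag_part} {field_part} {timestamp_ns}"
```

Here is an example call. `to_line_protocol("cpu", {"host": "node01", "cpu": "cpu-total"}, {"usage_idle": 95.42, "usage_user": 3.51}, 1612161120000000000)` returns `cpu,cpu=cpu-total,host=node01 usage_idle=95.42,usage_user=3.51 1612161120000000000`. That line has the same shape as the cpu rows queried from InfluxDB later on.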
InfluxDB
- Pull the InfluxDB image
docker pull influxdb:alpine
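- Start the InfluxDB service with docker-compose (this entry belongs under services:, and the docker1 network must be declared in the top-level networks: section with a subnet that contains 172.20.0.3)
influxdb:
  image: influxdb:alpine
  container_name: influxdb
  hostname: influx
  restart: always
  volumes:
    - /var/lib/influxdb:/var/lib/influxdb
  ports:
    - "8086:8086"
  networks:
    docker1:
      ipv4_address: 172.20.0.3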
- Data in InfluxDB
/ # influx
Connected to http://localhost:8086 version 1.8.3
InfluxDB shell version: 1.8.3
> use telegraf
Using database telegraf
> show measurements
name: measurements
name
----
cpu
disk
diskio
mem
net
system
> select * from cpu limit 10;
name: cpu
time cpu host usage_guest usage_guest_nice usage_idle usage_iowait usage_irq usage_nice usage_softirq usage_steal usage_system usage_user
---- --- ---- ----------- ---------------- ---------- ------------ --------- ---------- ------------- ----------- ------------ ----------
1612161120000000000 cpu-total node01 0 0 95.42315859399426 0.2184024140203704 0 0 0.04418183698259994 0 0.8088514080219019 3.5054057469811073
1612161120000000000 cpu0 node01 0 0 96.14510197258421 0.3243062520896004 0 0 0.07856904045469705 0 0.5834169174189224 2.8686058174523965
1612161120000000000 cpu1 node01 0 0 97.32847398689321 0.11368195800454726 0 0 0.058512772502340465 0 0.6168918015246849 1.8824394810752585
1612161120000000000 cpu10 node01 0 0 94.999916470372 0.3458126597504134 0 0 0.033411851183614776 0 0.6565428757580418 3.9643161429359015
1612161120000000000 cpu11 node01 0 0 96.77279184423811 0.13202974847497223 0 0 0.041781565973092516 0 0.7052728336258063 2.34812400768783
1612161120000000000 cpu12 node01 0 0 94.85397313372998 0.2740092227494494 0 0 0.04009891064626091 0 0.6917062086479787 4.1402125242264685
1612161120000000000 cpu13 node01 0 0 97.78832812317134 0.09194403116066806 0 0 0.03677761246426721 0 0.640264798809768 1.4426854343937716
1612161120000000000 cpu14 node01 0 0 95.56405800975753 0.33916995254962234 0 0 0.04678206242063758 0 0.6232039029606388 3.426786072311736
1612161120000000000 cpu15 node01 0 0 97.89929307953278 0.11029964737536962 0 0 0.04010896268195262 0 0.5999632334508984 1.350335076959058
1612161120000000000 cpu2 node01 0 0 93.49165789869241 0.3239975282662796 0 0 0.028391535982096684 0 1.2258463182858463 4.930106718773494
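The time column above is a Unix timestamp in nanoseconds. You can run the same query from Python with the influxdb package by calling InfluxDBClient.query(..., epoch='ns'). It returns the same nanosecond integers, and ResultSet.get_points() yields one dict per row. The sketch below is a hypothetical helper, cpu_series, that assumes rows shaped like those dicts. It keeps one CPU's rows, converts the timestamps to UTC datetimes and returns a time-sorted series for one field.

```python
from datetime import datetime, timezone


def cpu_series(rows, cpu="cpu-total", field="usage_idle"):
    """Return [(utc_datetime, value), ...] for one CPU, sorted by time.

    `rows` is assumed to be an iterable of dicts like those yielded by
    ResultSet.get_points() for a query run with epoch='ns'.
    """
    series = []
    for row in rows:
        if row.get("cpu") != cpu:
            continue
        # Integer division avoids float rounding on 19-digit nanosecond values
        seconds, nanos = divmod(int(row["time"]), 1_000_000_000)
        ts = datetime.fromtimestamp(seconds, tz=timezone.utc).replace(microsecond=nanos // 1000)
        series.append((ts, float(row[field])))
    series.sort(key=lambda item: item[0])
    return series
```

When applied to the first row shown above, the timestamp 1612161120000000000 becomes 2021-02-01 06:32:00 UTC.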
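Machine learning
- Data is read with the Python client package: pip install influxdb
- Preprocessing (normalization, time-series windowing, etc.) is done with numpy and pandas
- Regression algorithms from sklearn that can be used for prediction, reference: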
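The author's own preprocessing class, DataPreprocessTrain, is used in the training code but not shown. The sketch below shows what this step might look like. Its hypothetical helpers min-max normalize a 1-D series with numpy and then turn it into a supervised dataset. Each row of X holds time_dim consecutive values, and y holds the value that follows. It assumes a single metric and one-step-ahead targets.

```python
import numpy as np


def min_max_scale(series):
    """Scale a 1-D series to [0, 1]; also return (min, max) so predictions can be mapped back."""
    arr = np.asarray(series, dtype=float)
    lo, hi = arr.min(), arr.max()
    span = hi - lo if hi > lo else 1.0  # avoid division by zero on a constant series
    return (arr - lo) / span, (lo, hi)


def make_windows(series, time_dim=5):
    """Build sliding windows: X[i] = series[i:i+time_dim], y[i] = series[i+time_dim]."""
    arr = np.asarray(series, dtype=float)
    if len(arr) <= time_dim:
        raise ValueError("series must be longer than time_dim")
    X = np.stack([arr[i:i + time_dim] for i in range(len(arr) - time_dim)])
    y = arr[time_dim:]
    return X, y
```

Keeping (min, max) lets you map scaled predictions back to percentages with scaled * (hi - lo) + lo. In practice, compute the min and max on the training data only, so that information from the future does not leak into the model.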
https://www.jb51.net/article/164603.htm
from sklearn.neighbors import KNeighborsRegressor
from sklearn.linear_model import LinearRegression
from sklearn.linear_model import Ridge
from sklearn.linear_model import HuberRegressor
from sklearn.tree import DecisionTreeRegressor
from sklearn.svm import SVR
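- Example: training a decision tree regressor (DecisionTreeRegressor)
import joblib
from sklearn.tree import DecisionTreeRegressor

# Method of the author's training class; InfluxdbConnect, InfluxdbQueryRead,
# DataPreprocessTrain and the INFLUX_* constants are project helpers not shown here.
def train(self, query, fields=None, opt=None, device=None, time_dim=5, model_file=None, host_name=None):
    # Connect to InfluxDB and run the query to fetch the training data
    influx_client = InfluxdbConnect(host=INFLUX_HOST, username=INFLUX_USER, passwd=INFLUX_PASSWD,
                                    database=INFLUX_DB).connect()
    dataset = InfluxdbQueryRead(influx_client, query)
    # Preprocess the raw data into features X and targets y
    X, y = DataPreprocessTrain(dataset=dataset, opt=opt, device=device, time_dim=time_dim, fields=fields).preprocess()
    # Fit the regressor and save it to disk for later prediction
    model = DecisionTreeRegressor()
    model.fit(X, y)
    joblib.dump(model, model_file)
    return True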
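The training code above fits a one-step-ahead model. The article does not show its prediction code. One common way to forecast several steps ahead is recursive forecasting: predict the next value, append it to the input window, and repeat. That approach is an assumption here. The sketch below uses a hypothetical forecast function. It works with any fitted scikit-learn regressor, such as the model reloaded with joblib.load(model_file), or with any object that exposes predict(X) on a 2-D array.

```python
import numpy as np


def forecast(model, history, steps, time_dim=5):
    """Recursively forecast `steps` future values from the last `time_dim` observations.

    `model` is any fitted regressor with a scikit-learn style predict(X) method,
    trained on windows of length `time_dim`.
    """
    window = list(np.asarray(history, dtype=float)[-time_dim:])
    if len(window) < time_dim:
        raise ValueError("history must contain at least time_dim values")
    predictions = []
    for _ in range(steps):
        # predict() expects a 2-D array of shape (n_samples, n_features)
        next_value = float(model.predict(np.array([window]))[0])
        predictions.append(next_value)
        # Slide the window forward, feeding the prediction back in as input
        window = window[1:] + [next_value]
    return predictions
```

Each prediction becomes an input for the next one, so errors compound. Accuracy usually drops as the forecast horizon grows.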
- Predictions written back to InfluxDB (database precloud)
> use precloud
Using database precloud
> show measurements
name: measurements
name
----
cpu_predict
disk_predict
diskio_predict
mem_predict
system_predict
> select * from cpu_predict limit 10
name: cpu_predict
time device device_id host_id hostname usage_idle usage_system usage_user
---- ------ --------- ------- -------- ---------- ------------ ----------
1614070935000000000 cpu0 1 1 node01 99.74480435000164 0.0600460352937251 0.18847783300530452
1614070935000000000 cpu1 2 1 node01 99.00208594075929 0.1518564872757614 0.8110137672090112
1614070935000000000 cpu10 11 1 node01 99.82657417289221 0.04335645677694768 0.12840181430096106
1614070935000000000 cpu11 12 1 node01 98.97203123957014 0.16687804552433075 0.8227087644349507
1614070935000000000 cpu12 13 1 node01 99.85495406878847 0.04501425451392964 0.09669728747436661
1614070935000000000 cpu13 14 1 node01 98.98211186757445 0.1585235616072623 0.8259911894273124
1614070935000000000 cpu14 15 1 node01 99.85661409183365 0.04334922805028506 0.09503484611024043
1614070935000000000 cpu15 16 1 node01 99.02039317779774 0.1535329261373118 0.7876906645305565
1614070935000000000 cpu2 3 1 node01 99.2629648157412 0.28514257128564285 0.4418876104719022
1614070935000000000 cpu3 4 1 node01 98.5879760990754 0.39389792035250526 0.9780685649430853
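With the Python influxdb package, you can write predictions back with InfluxDBClient.write_points(), which accepts a list of point dicts. The sketch below uses a hypothetical helper, build_cpu_predict_points, to build such points for the cpu_predict measurement. select * does not show which columns are tags, so the split is an assumption based on the output above: device, device_id, host_id and hostname are tags, and the usage_* columns are fields. The device_id numbering (cpu0 -> 1, cpu10 -> 11) is also inferred from that output and assumes per-core device names of the form cpuN.

```python
def build_cpu_predict_points(predictions, timestamp_ns, hostname, host_id):
    """Turn {device: {field: value}} into point dicts for InfluxDBClient.write_points().

    Assumed tag/field split: device, device_id, host_id and hostname are tags,
    the predicted usage_* values are fields.
    """
    points = []
    for device, fields in sorted(predictions.items()):
        points.append({
            "measurement": "cpu_predict",
            "tags": {
                "device": device,
                # Hypothetical numbering inferred from the output above: cpuN -> N + 1
                "device_id": str(int(device[len("cpu"):]) + 1),
                "host_id": str(host_id),
                "hostname": hostname,
            },
            "time": timestamp_ns,
            "fields": {name: float(value) for name, value in fields.items()},
        })
    return points
```

You can then pass the list to client.write_points(points, database="precloud"). Integer timestamps are interpreted as nanoseconds, which is InfluxDB 1.x's default write precision.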