SLS Machine Learning Best Practices: Log Clustering + Anomaly Alerting

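0. Article series links
SLS Machine Learning Introduction (01): Time Series Statistical Modeling
SLS Machine Learning Introduction (02): Time Series Clustering Modeling
SLS Machine Learning Introduction (03): Time Series Anomaly Detection Modeling
SLS Machine Learning Introduction (04): Rule and Pattern Mining
SLS Machine Learning Introduction (05): Time Series Forecasting
Hundreds of Millions of Logs at a Glance: SLS Intelligent Clustering (LogReduce) Released
SLS Machine Learning Best Practices: Time Series Anomaly Detection and Alerting
SLS Machine Learning Best Practices: Time Series Forecasting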
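1. What hammers do we have in hand?
Our team has always focused on mining more value from logs. Building on real-time log query, SLS rounded out the following DevOps features this year: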

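Contextual query
Live Tail and intelligent clustering, to make problem investigation more efficient
A range of anomaly detection and forecasting functions for time series data, for smarter checks and predictions
Visualization of data analysis results
Powerful alert configuration and notification, with follow-up actions triggered by calling a webhook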
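Today we focus on how intelligent log clustering and anomaly alerting work together to detect anomalies and raise alerts more effectively.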

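2. Platform experiment
2.1 Experimental data
The data is a raw syslog dataset with the log clustering service enabled. The screenshot below shows its status: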
[Screenshot: status of the log clustering service]

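Adjusting the size of red box 1 in the screenshot below changes the result in red box 2, but it does not change any of the finest-grained patterns. In other words, sub-pattern results are stable and unique, and we can use a sub-pattern's signature to find the corresponding raw log entries.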
[Screenshot: log clustering view with red boxes 1 and 2]

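2.2 Generating time series data for a sub-pattern
Suppose we want to monitor this sub-pattern:

msg:vm-111932.tc su: pam_unix(*:session): session closed for user root
Its signature_id is __log_signature__: 1814836459146662485

Having retrieved the raw logs for this pattern, we can look at a histogram of their counts over time: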
[Figure: histogram of the matching logs over time]

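The figure above shows that logs of this pattern are unevenly distributed, and some time windows contain none at all. If we simply count logs per time window, the query and the resulting time series chart are as follows: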

__log_signature__: 1814836459146662485 |
select
  date_trunc('minute', __time__) as time,
  COUNT(*) as num
from log GROUP BY time order by time ASC limit 10000
[Figure: per-minute time series chart, with gaps]

In the chart above, the series is not continuous in time, so we need to fill in the missing points:

__log_signature__: 1814836459146662485 |
select
  time_series(time, '1m', '%Y-%m-%d %H:%i:%s', '0') as time,
  avg(num) as num
from (
  select
    __time__ - __time__ % 60 as time,
    COUNT(*) as num
  from log GROUP BY time order by time desc )
GROUP by time order by time ASC limit 10000
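
To make the gap-filling step concrete outside of SLS, here is a minimal Python sketch of the same idea: bucket timestamps by minute with `t - t % 60`, count per bucket, and emit 0 for every minute that has no logs. The function name `fill_minute_counts` and the sample timestamps are illustrative assumptions, not part of SLS. The sketch only mirrors what the query above appears to do with the `'0'` padding argument of `time_series`.

```python
from collections import Counter


def fill_minute_counts(timestamps, step=60):
    """Count events per `step`-second bucket and fill empty buckets with 0.

    `timestamps` are Unix times in seconds. Returns a list of
    (bucket_start, count) pairs covering every bucket from the first
    to the last observed one, in ascending order.
    """
    if not timestamps:
        return []
    # Same bucketing as `__time__ - __time__ % 60` in the query.
    counts = Counter(t - t % step for t in timestamps)
    first, last = min(counts), max(counts)
    # Walk every bucket in range; missing buckets get a count of 0.
    return [(b, counts.get(b, 0)) for b in range(first, last + step, step)]


# Hypothetical sample: two logs in minute 0, none in minute 1, one in minute 2.
sample = [0, 30, 125]
series = fill_minute_counts(sample)
```

Here `series` contains one entry per minute, including the empty middle minute, which is the property a downstream time series model needs.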

2.3 Anomaly detection on the time series
We use the time series anomaly detection function ts_predicate_arma:

__log_signature__: 1814836459146662485 |
select
  ts_predicate_arma(to_unixtime(time), num, 5, 1, 1, 1, 'avg')
from (
  select
    time_series(time, '1m', '%Y-%m-%d %H:%i:%s', '0') as time,
    avg(num) as num
  from (
    select
      __time__ - __time__ % 60 as time,
      COUNT(*) as num
    from log GROUP BY time order by time desc )
  GROUP by time order by time ASC ) limit 10000
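
To show the shape of the detector's output, the following Python sketch produces rows in the layout that the later queries in this article unpack: unixtime, src, pred, upper bound, lower bound, and an anomaly score. It is not ARMA and not the SLS implementation. It swaps in a much simpler technique, a trailing mean plus or minus `k` standard deviations, and the names `band_detect`, `window`, and `k` are assumptions made for illustration.

```python
from statistics import mean, pstdev


def band_detect(points, window=5, k=3.0):
    """Flag points outside mean +/- k*std of the previous `window` values.

    `points` is a list of (unixtime, value). Returns rows shaped like
    [unixtime, src, pred, upper, lower, prob], where prob is 1.0 for a
    flagged point and 0.0 otherwise. The first `window` points are not scored.
    """
    rows = []
    for i in range(window, len(points)):
        history = [v for _, v in points[i - window:i]]
        pred = mean(history)
        spread = k * pstdev(history)
        upper, lower = pred + spread, pred - spread
        t, src = points[i]
        prob = 1.0 if (src > upper or src < lower) else 0.0
        rows.append([t, src, pred, upper, lower, prob])
    return rows


# Hypothetical series: steady per-minute counts followed by a spike.
demo = [(60 * i, v) for i, v in enumerate([2, 2, 3, 2, 3, 2, 20])]
rows = band_detect(demo)
```

In this sample the steady point stays inside the band and the final spike falls above the upper bound, so only the last row is flagged.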

2.4 How to configure alerts
Unpack the result of the machine learning function:
__log_signature__: 1814836459146662485 |
select
  t1[1] as unixtime, t1[2] as src, t1[3] as pred, t1[4] as up, t1[5] as lower, t1[6] as prob
from (
  select
    ts_predicate_arma(to_unixtime(time), num, 5, 1, 1, 1, 'avg') as res
  from (
    select
      time_series(time, '1m', '%Y-%m-%d %H:%i:%s', '0') as time,
      avg(num) as num
    from (
      select
        __time__ - __time__ % 60 as time,
        COUNT(*) as num
      from log GROUP BY time order by time desc )
    GROUP by time order by time ASC )) , unnest(res) as t(t1)

Alert on the results from the last two minutes:
__log_signature__: 1814836459146662485 |
select
  unixtime, src, pred, up, lower, prob
from (
  select
    t1[1] as unixtime, t1[2] as src, t1[3] as pred, t1[4] as up, t1[5] as lower, t1[6] as prob
  from (
    select
      ts_predicate_arma(to_unixtime(time), num, 5, 1, 1, 1, 'avg') as res
    from (
      select
        time_series(time, '1m', '%Y-%m-%d %H:%i:%s', '0') as time,
        avg(num) as num
      from (
        select
          __time__ - __time__ % 60 as time, COUNT(*) as num
        from log GROUP BY time order by time desc )
      GROUP by time order by time ASC )) , unnest(res) as t(t1) )
where is_nan(src) = false order by unixtime desc limit 2
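
The final filter in the query can be read as "drop rows without an observed value, then keep the two newest". Below is a small Python sketch of that selection, assuming rows in the [unixtime, src, pred, up, lower, prob] layout used above. Rows with a NaN `src` have no observed value to compare against; that these are the forecast points appended by the function is an assumption, not something the article states. The helper name `last_observed` and the sample rows are hypothetical.

```python
import math


def last_observed(rows, n=2):
    """Return the `n` most recent rows whose observed value is not NaN.

    Each row is [unixtime, src, pred, up, lower, prob].
    """
    observed = [r for r in rows if not math.isnan(r[1])]
    # Newest first, like `order by unixtime desc`.
    observed.sort(key=lambda r: r[0], reverse=True)
    return observed[:n]


nan = float("nan")
# Hypothetical rows; the last one stands for a point with no observed value.
rows = [
    [0, 2.0, 2.1, 3.5, 0.7, 0.0],
    [60, 3.0, 2.2, 3.6, 0.8, 0.0],
    [120, 9.0, 2.4, 3.9, 0.9, 1.0],
    [180, nan, 2.5, 4.0, 1.0, 0.0],
]
recent = last_observed(rows)
```

Filtering before sorting matters here: without it, the newest row would be the one with no observed value, and the alert would have nothing to evaluate.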

Alert on rising points, with a fallback strategy:
__log_signature__: 1814836459146662485 |
select
  sum(prob) as sumProb, max(src) as srcMax, max(up) as upMax
from (
  select
    unixtime, src, pred, up, lower, prob
  from (
    select
      t1[1] as unixtime, t1[2] as src, t1[3] as pred, t1[4] as up, t1[5] as lower, t1[6] as prob
    from (
      select
        ts_predicate_arma(to_unixtime(time), num, 5, 1, 1, 1, 'avg') as res
      from (
        select
          time_series(time, '1m', '%Y-%m-%d %H:%i:%s', '0') as time, avg(num) as num
        from (
          select
            __time__ - __time__ % 60 as time, COUNT(*) as num
          from log GROUP BY time order by time desc )
        GROUP by time order by time ASC )) , unnest(res) as t(t1) )
  where is_nan(src) = false order by unixtime desc limit 2 )
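
The query reduces the last two points to three numbers: `sumProb`, `srcMax`, and `upMax`. The sketch below shows one way such a summary could drive an alert decision. The exact trigger condition used by the author is not given in the text, so the rule here is a guess for illustration only: fire when an anomaly was flagged and the observed value exceeded the upper bound (a rising point), or when the raw count exceeds a fixed ceiling regardless of the model (the fallback). The names `summarize`, `should_alert`, and `hard_limit`, and the value 100.0, are hypothetical.

```python
def summarize(rows):
    """Aggregate rows shaped [unixtime, src, pred, up, lower, prob]."""
    return {
        "sumProb": sum(r[5] for r in rows),
        "srcMax": max(r[1] for r in rows),
        "upMax": max(r[3] for r in rows),
    }


def should_alert(summary, hard_limit=100.0):
    """Hypothetical rule: rising anomaly, or raw count above a fixed ceiling."""
    rising = summary["sumProb"] > 0 and summary["srcMax"] > summary["upMax"]
    fallback = summary["srcMax"] > hard_limit
    return rising or fallback


# Hypothetical last two points: one flagged spike, one normal point.
recent = [
    [120, 20.0, 2.4, 3.9, 0.9, 1.0],
    [60, 2.0, 2.4, 3.9, 0.9, 0.0],
]
summary = summarize(recent)
fire = should_alert(summary)
```

Requiring `srcMax > upMax` restricts the alert to upward excursions, so a sudden drop in log volume would not fire it under this assumed rule.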

The alert settings are as follows:
[Screenshot: alert settings]

3. Advertisement time
3.1 Going further with logs
Demos of the various Log Service features: Log Service overview and assorted demos

For more advanced material on logs, see: Log Service learning path.

Reposted from: https://blog.51cto.com/14031893/2399644
