An ELK Log Search Implementation Case Study

Latest recommended article published on 2024-04-19 01:05:03

RabinRow


Tags: json, big data

Original link: http://www.cnblogs.com/selfdem/p/10543615.html


1. The requirements

One machine holds two different types of log data, both stored in one file per day. The directory layout looks like this:

    --/data/log
      --/access_log
        --190101.txt
        --190102.txt
        ...
      --/click_log
        --190101.txt
        --190102.txt
        ...

Each line of access_log is a JSON string that looks like this:

    {
        "url": "index",
        "server": {
            ...
        },
        "timestamp": 1551369609,
        "ip": "188.138.188.34",
        "session_id": "l9tooea23s6mjlid1q01svmc80",
        "method": "GET",
        "request_id": "a7ce6bc3e6c39e65eb5d11c979956acf"
    }

Each line of click_log is also a JSON string, and looks like this:

    {
        "click_id": "474b927e12731128579b8100f4f8918b",
        "click_obj": [
            ...
        ],
        "data": {
            ...
        }
    }

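The requirement is fairly simple: the logs just need to be searchable.

For access_log, the timestamp field must be parsed so that logs can be filtered by time;
For both access_log and click_log, the JSON must be parsed.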
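
To make "filter by time" concrete, here is a minimal Python sketch that is not part of the original setup. It decodes access_log lines and keeps those whose `timestamp` falls inside a time window. The helper name `filter_by_time`, the fixed UTC+8 offset, and the second sample line are assumptions made purely for illustration.

```python
import json
from datetime import datetime, timedelta, timezone

# Assumption: time windows are expressed in UTC+8.
CST = timezone(timedelta(hours=8))

def filter_by_time(lines, start, end):
    """Yield decoded records whose `timestamp` falls in [start, end)."""
    lo, hi = start.timestamp(), end.timestamp()
    for line in lines:
        record = json.loads(line)
        if lo <= record["timestamp"] < hi:
            yield record

lines = [
    # First line uses values from the sample above (elided fields left out).
    '{"url": "index", "timestamp": 1551369609, "method": "GET"}',
    # Second line is made up so that one record falls outside the window.
    '{"url": "detail", "timestamp": 1551456009, "method": "GET"}',
]
start = datetime(2019, 3, 1, tzinfo=CST)
end = datetime(2019, 3, 2, tzinfo=CST)
hits = list(filter_by_time(lines, start, end))
print([r["url"] for r in hits])
```

Scanning files like this does not scale, which is why the rest of the article indexes the data instead.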

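2. Research and implementation

A quick search on Baidu turned up ELK, and since its community is quite active, I decided to just go with it.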

2.1 Setting up the ES cluster

This part is fairly simple: just follow the official documentation. If you would rather not read English, search Baidu; plenty of people have already written up their setups. A few of the better references:

http://jiangew.me/es-deploy-proxy/
https://www.cnblogs.com/leeSmall/p/9189078.html
https://www.cnblogs.com/leeSmall/p/9220535.html

The cluster I built has 6 nodes, configured as follows:

    cluster.name: myname
    node.name: 192-168-0-106
    node.master: false
    node.data: false
    path.data: /data/es/data
    path.logs: /data/es/log
    network.host: 192.168.0.106
    http.port: 9200
    discovery.zen.ping.unicast.hosts: ["192.168.0.106", "192.168.0.107", "192.168.0.108", "192.168.0.109", "192.168.0.110", "192.168.0.111"]
    discovery.zen.minimum_master_nodes: 3
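
A note on `discovery.zen.minimum_master_nodes`: in this generation of Elasticsearch the value is normally set to a quorum of the master-eligible nodes, that is (N / 2) + 1 rounded down, to guard against split brain. The node shown above has `node.master: false`. Assuming the other five nodes are master-eligible (the original does not say), the value 3 matches that rule. A small sketch of the arithmetic, with a helper name of my own choosing:

```python
def minimum_master_nodes(master_eligible):
    """Quorum of master-eligible nodes: floor(n / 2) + 1."""
    if master_eligible < 1:
        raise ValueError("need at least one master-eligible node")
    return master_eligible // 2 + 1

# Assumption: 5 of the 6 nodes are master-eligible.
print(minimum_master_nodes(5))
# If all 6 were master-eligible, the quorum would be larger.
print(minimum_master_nodes(6))
```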

2.2 Setting up Kibana

I installed it directly on the 192.168.0.106 node: download the package, unpack it, and edit the configuration, which is as follows:

    server.host: "192.168.0.106"
    elasticsearch.hosts: ["http://192.168.0.106:9200"]

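2.3 Setting up Filebeat

People online say that Logstash is heavyweight and Filebeat is lightweight, and since my requirements are this simple, I went with Filebeat. Lightweight it may be, but the setup is still not easy for a newcomer. I ran into many problems along the way; I will skip the detours and mention only the two important ones.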

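2.3.1 How to parse JSON

See this page of the official documentation: https://www.elastic.co/guide/en/beats/filebeat/6.6/filebeat-input-log.html

It comes down to the two json.* lines below:

    filebeat.inputs:
    - type: log
      paths:
        - /data/log/access_log/*     # log path
      json.keys_under_root: true     # move the JSON keys to the event root (the log JSON is only one part of what Beats ships)
      json.add_error_key: true       # on a parse failure, surface the error in the event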
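
To illustrate what these two options change, here is a rough Python model of the resulting event. It is only an approximation for intuition, not Filebeat's actual code: the function name `build_event`, the single `source` field, and the exact shape of the `error` entry are assumptions.

```python
import json

def build_event(line, source, keys_under_root=True, add_error_key=True):
    """Rough model of how a decoded JSON line ends up in the shipped event."""
    event = {"source": source}  # stand-in for the metadata the shipper adds itself
    try:
        decoded = json.loads(line)
        if not isinstance(decoded, dict):
            raise ValueError("line is not a JSON object")
    except ValueError as exc:  # json.JSONDecodeError is a ValueError subclass
        if add_error_key:
            event["error"] = {"type": "json", "message": str(exc)}
        return event
    if keys_under_root:
        for key, value in decoded.items():
            event.setdefault(key, value)  # existing fields win on conflict
    else:
        event["json"] = decoded  # decoded keys stay nested under "json"
    return event

path = "/data/log/access_log/190101.txt"
ok = build_event('{"url": "index", "method": "GET"}', path)
nested = build_event('{"url": "index", "method": "GET"}', path, keys_under_root=False)
bad = build_event("not json", path)
print(ok)
print(nested)
print(bad)
```

With the keys at the root, fields such as `url` and `timestamp` can be queried directly rather than as `json.url` and `json.timestamp`.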

2.3.2 ES mapping settings and output settings

    setup.template.fields: "access_fields.yml"
    setup.template.pattern: "access*"
    setup.template.name: "access"
    output.elasticsearch:
      hosts: ["192.168.0.106:9200"]
      index: "access_%{[dissect.key4]}"
      pipeline: "log_pipeline"
    processors:
      - add_host_metadata: ~
      - add_cloud_metadata: ~
      - dissect:
          tokenizer: "%{key1}/%{key2}/%{key3}/%{key4}"
          field: "source"
          target_prefix: "dissect"

A. I did not use the default fields.yml. Instead I used a copy with an extra log_time field, defined as type date. Its value has to be parsed from the timestamp in the data, which I did with an ingest pipeline. In Kibana, open the Console under Dev Tools, paste in the following, and run it.

    PUT _ingest/pipeline/log_pipeline
    {
        "description": "log processor",
        "processors": [
            {
                "date": {
                    "field": "timestamp",
                    "target_field": "log_time",
                    "formats": [
                        "UNIX"
                    ],
                    "timezone": "Asia/Shanghai",
                    "on_failure": [
                        {
                            "set": {
                                "field": "date_processor_error",
                                "value": "{{ _ingest.on_failure_message }}"
                            }
                        }
                    ]
                }
            }
        ]
    }
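
For readers less familiar with ingest pipelines, the Python sketch below mirrors the intent of this pipeline: derive `log_time` from a UNIX-seconds `timestamp`, and on failure record the problem in `date_processor_error` instead of rejecting the document. It is an approximation only. The function name `apply_date_step`, the fixed UTC+8 offset standing in for Asia/Shanghai, and the error message format are assumptions, and the exact string Elasticsearch stores may differ.

```python
from datetime import datetime, timedelta, timezone

# Assumption: a fixed +08:00 offset stands in for the Asia/Shanghai zone.
SHANGHAI = timezone(timedelta(hours=8))

def apply_date_step(doc):
    """Set log_time from a UNIX-seconds timestamp; record an error instead of failing."""
    out = dict(doc)  # leave the caller's document untouched
    try:
        seconds = float(out["timestamp"])
        out["log_time"] = datetime.fromtimestamp(seconds, tz=SHANGHAI).isoformat()
    except (KeyError, TypeError, ValueError, OverflowError, OSError) as exc:
        out["date_processor_error"] = f"{type(exc).__name__}: {exc}"
    return out

good = apply_date_step({"url": "index", "timestamp": 1551369609})
broken = apply_date_step({"url": "index", "timestamp": "abc"})
print(good["log_time"])
print(broken["date_processor_error"])
```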

B. Each day's data needs its own index, and the date can only be taken from the file name, so I used the dissect processor.
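
The intended mapping from file path to daily index name can be sketched in Python as follows. This is not how the dissect processor works internally; it splits on the tokenizer's delimiters, and what ends up in `key4` depends on the actual `source` path. The sketch simply assumes the goal is the `access_` prefix plus the six-digit date from the file name, and `daily_index` is a hypothetical helper.

```python
from pathlib import PurePosixPath

def daily_index(source, prefix="access_"):
    """Build a per-day index name from the log file name, e.g. 190101.txt -> access_190101."""
    day = PurePosixPath(source).stem  # file name without its extension
    if not (len(day) == 6 and day.isdigit()):
        raise ValueError(f"unexpected file name in {source!r}")
    return prefix + day

print(daily_index("/data/log/access_log/190101.txt"))
```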

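2.3.3 Should one Filebeat instance ship to one kind of index, or to several?

I went with the latter. I am not sure whether it will cause problems down the line, but so far it has run without errors.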

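3. Summary

I have to say, the ES documentation is very thorough, and getting a demo running is quick. Using it well is another matter, and there is still a lot worth exploring. I will dig into it further when I find the time.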

Reposted from: https://www.cnblogs.com/selfdem/p/10543615.html
