Installing ELK
Elasticsearch download: https://www.elastic.co/downloads/elasticsearch
Logstash download: https://www.elastic.co/downloads/logstash
Kibana download: https://www.elastic.co/downloads/kibana
Installation guide (recommended: download the archive from the official site and extract it; a brew install is missing the x-pack plugin): https://www.cnblogs.com/liuxiaoming123/p/8081883.html
Elasticsearch basics
APIs
Java API (official): https://www.elastic.co/guide/en/elasticsearch/client/java-api/current/index.html
Maven dependency
<dependency>
    <groupId>org.elasticsearch.client</groupId>
    <artifactId>transport</artifactId>
    <version>6.6.0</version>
</dependency>
JEST (Java HTTP REST client): https://github.com/searchbox-io/Jest
Maven dependency
<dependency>
    <groupId>io.searchbox</groupId>
    <artifactId>jest</artifactId>
    <version>5.3.3</version>
</dependency>
PHP API (official): https://www.elastic.co/guide/en/elasticsearch/client/php-api/current/index.html
Composer dependency
{
    "require": {
        "elasticsearch/elasticsearch": "~6.0"
    }
}
Terminology
Index
An index is a collection of documents that share somewhat similar characteristics. For example, you might have one index for customer data, another for a product catalog, and another for other data. An index is identified by a name (which must be all lowercase), and that name is used to refer to the index when performing indexing, search, update, and delete operations against it.
Type
Within an index you can define one or more types. A type is a logical category or partition of an index, and its semantics are entirely up to you. Typically, a type is defined for documents that share a common set of fields. For example, suppose you run a blogging platform and store all of its data in a single index. You might define one type for user data, another for blog posts, and another for comments.
Document
A document is the basic unit of information that can be indexed. For example, one document might hold the data for a single customer, another for a single product, and another for a single order. Documents are expressed in JSON, a ubiquitous data-interchange format.
Basic usage
Indices
Creating an index
- Create an empty index
curl -XPUT 'localhost:9200/_index'
- Set the index mapping
curl -XPUT 'localhost:9200/_index/_mapping/_type?pretty' -H 'Content-Type: application/json' -d'
{
"properties": {
"field1": {
"type": "text"
},
"field2": {
"type": "text"
},
"field3": {
"type": "text"
},
"field4": {
"type": "long"
}
}
}
'
Deleting an index
- Delete one index
curl -XDELETE 'localhost:9200/_index'
- Delete several indices
curl -XDELETE 'localhost:9200/_index1,_index2' or curl -XDELETE 'localhost:9200/_index*'
Documents
Indexing a document (the trailing _id is optional; Elasticsearch auto-generates an ID if it is omitted)
curl -XPOST 'localhost:9200/_index/_type{/_id}' -H 'Content-Type: application/json' -d'
{
"field1": "XXXXXXXX",
"field2": "XXXXXXXX",
"field3": "XXXXXXXX",
"field4": "1529396883"
}
'
Updating a document
curl -XPUT 'localhost:9200/_index/_type/_id' -H 'Content-Type: application/json' -d'
{
"field1": "XXXXXXXX",
"field2": "XXXXXXXX",
"field3": "XXXXXXXX",
"field4": "1529396883"
}
'
Deleting a document
curl -XDELETE 'localhost:9200/_index/_type/_id'
Fetching a single document
- Fetch the full document
curl -XGET 'localhost:9200/_index/_type/_id'
Response
{
"_index" : "_index",
"_type" : "_type",
"_id" : "_id",
"_version" : 1,
"found" : true,
"_source" : {
"field1": "XXXXXXXX",
"field2": "XXXXXXXX",
"field3": "XXXXXXXX",
"field4": "1529396883"
}
}
- To return only specific fields from _source, request:
curl -XGET 'localhost:9200/_index/_type/_id?_source=field1,field2'
Response
{
"_index" : "website",
"_type" : "blog",
"_id" : "123",
"_version" : 1,
"found" : true,
"_source" : {
"field1": "My first blog entry" ,
"field2": "Just trying this out..."
}
}
- To return only the _source itself, without any metadata, request:
curl -XGET 'localhost:9200/_index/_type/_id/_source'
Response
{
"field1": "XXXXXXXX",
"field2": "XXXXXXXX",
"field3": "XXXXXXXX",
"field4": "1529396883"
}
Fetching multiple documents
curl -XPOST 'localhost:9200/_mget' -H 'Content-Type: application/json' -d'
{
"docs": [
{
"_index": "_index",
"_type": "_type",
"_id": "_id"
},
{
"_index": "_index",
"_type": "_type",
"_id": "_id"
}
]
}
'
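The _mget body can be assembled programmatically. A minimal Python sketch (the index/type/id values are placeholders; the snippet only builds the request body and does not contact Elasticsearch):

```python
import json

def build_mget_body(doc_refs):
    """Build the JSON body of an Elasticsearch _mget request.

    doc_refs: iterable of (index, type, id) tuples.
    """
    return json.dumps({
        "docs": [
            {"_index": index, "_type": doc_type, "_id": doc_id}
            for index, doc_type, doc_id in doc_refs
        ]
    })

body = build_mget_body([("website", "blog", "123"), ("website", "blog", "124")])
```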
Elasticsearch bulk operations
curl -XPOST 'localhost:9200/_bulk?pretty' -H 'Content-Type: application/json' -d'
{ "delete": { "_index": "website", "_type": "blog", "_id": "123" }}
{ "create": { "_index": "website", "_type": "blog", "_id": "123" }}
{ "title": "My first blog post" }
{ "index": { "_index": "website", "_type": "blog" }}
{ "title": "My second blog post" }
{ "update": { "_index": "website", "_type": "blog", "_id": "123", "_retry_on_conflict" : 3} }
{ "doc" : {"title" : "My updated blog post"} }
'
Response
{
"took": 4,
"errors": false,
"items": [
{ "create": {
"_index": "website",
"_type": "blog",
"_id": "123",
"_version": 3,
"status": 201
}}
]
}
A single bulk call can carry any number of create, index, update, and delete actions; the request body looks like:
{ action: { metadata }}\n
{ request body }\n
- action defines the operation to perform on the document: one of create, index, update, or delete (create and index both create documents, but create fails if the document already exists, while index succeeds and turns into an update)
- metadata specifies the _index, _type, and _id of the document being indexed, created, updated, or deleted
- request body is the document's _source itself, i.e. its fields and values; delete actions take no request body line
- Every line must end with a newline character (\n), including the last line; the newlines serve as markers that separate the entries (in a Postman request you don't type \n literally, just put each entry on its own line)
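The rules above can be sketched in a few lines of Python; this only assembles the newline-delimited body (it does not send it), using the example actions from earlier:

```python
import json

def build_bulk_body(actions):
    """Serialize bulk actions into the newline-delimited _bulk format.

    actions: list of (action_dict, source_dict_or_None) pairs;
    source is None for delete actions, which take no request body line.
    """
    lines = []
    for action, source in actions:
        lines.append(json.dumps(action))
        if source is not None:
            lines.append(json.dumps(source))
    # every line, including the last one, must end with \n
    return "\n".join(lines) + "\n"

body = build_bulk_body([
    ({"delete": {"_index": "website", "_type": "blog", "_id": "123"}}, None),
    ({"index": {"_index": "website", "_type": "blog"}}, {"title": "My second blog post"}),
])
```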
Each sub-request in a bulk call executes independently
The failure of one sub-request does not affect whether the others succeed. If any sub-request fails, the top-level error flag is set to true and the error details are reported under the corresponding item
Example response for a single failed sub-request:
{
"took": 3,
"errors": true,
"items": [
{ "create": {
"_index": "website",
"_type": "blog",
"_id": "123",
"status": 409,
"error": "DocumentAlreadyExistsException
[[website][4] [blog][123]:
document already exists]"
}}
]
}
Maximum amount of data per bulk request
bulk loads the data to be processed into memory, so the amount is limited. The optimal size is not a fixed number; it depends on your hardware, on document size and complexity, and on the indexing and search load.
A common recommendation is 1,000-5,000 documents per request, with a total size of 5-15 MB; by default a request cannot exceed 100 MB, a limit that can be changed in the ES config file (elasticsearch.yml under $ES_HOME/config).
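A client therefore has to chunk its documents before calling _bulk. A rough Python sketch of such a batching helper (the thresholds are illustrative defaults, not values prescribed by Elasticsearch):

```python
import json

def chunk_docs(docs, max_docs=1000, max_bytes=5 * 1024 * 1024):
    """Split documents into bulk-sized batches.

    Flushes a batch once it holds max_docs documents or adding the next
    document would push it past max_bytes of serialized JSON.
    """
    batch, batch_bytes = [], 0
    for doc in docs:
        size = len(json.dumps(doc).encode("utf-8"))
        if batch and (len(batch) >= max_docs or batch_bytes + size > max_bytes):
            yield batch
            batch, batch_bytes = [], 0
        batch.append(doc)
        batch_bytes += size
    if batch:  # flush the final, partially filled batch
        yield batch

batches = list(chunk_docs([{"n": i} for i in range(2500)], max_docs=1000))
```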
Request body search (search with conditions)
Basic query
curl -XGET 'localhost:9200/_index/_type/_search' -H 'Content-Type: application/json' -d'
{
"query":{
"bool": {
"must": { "match": { "field": "value" }},
"must_not": { "match": { "field": "value" }},
"should": { "match": { "field": "value" }},
"filter": { "range": { "field" : { "gt" : num }} }
}
},
"from":0,
"size":10,
"sort":{"field":{"order":"desc"}}
}
'
- must: the document must match the condition
- should: should holds one or more conditions; a document qualifies if it satisfies at least one of them
- must_not: the document must not match the condition
- filter: filter conditions
- from / size: pagination
- sort: sorting
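A query body with these clauses can be assembled with a small helper; a Python sketch (the field names and clause shapes are placeholders):

```python
def bool_query(must=None, must_not=None, should=None, filter_=None,
               from_=0, size=10, sort=None):
    """Assemble a bool-query request body; each clause argument is a
    list of Elasticsearch query clauses (dicts)."""
    bool_part = {}
    if must:
        bool_part["must"] = must
    if must_not:
        bool_part["must_not"] = must_not
    if should:
        bool_part["should"] = should
    if filter_:
        bool_part["filter"] = filter_
    body = {"query": {"bool": bool_part}, "from": from_, "size": size}
    if sort:
        body["sort"] = sort
    return body

q = bool_query(must=[{"match": {"field1": "value"}}],
               filter_=[{"range": {"field4": {"gt": 100}}}],
               sort={"field4": {"order": "desc"}})
```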
Response
See the value of hits->total
{
"took": 10,// request duration in ms
"timed_out": false,// whether the request timed out
"_shards": {// shard info
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 200,// total number of matching documents
"max_score": 14.509778,
"hits": [// the results, 10 by default
······
]
}
}
Distinct count (cardinality, like SQL DISTINCT)
curl -XGET 'localhost:9200/_index/_type/_search' -H 'Content-Type: application/json' -d'
{
"query": {
"bool": {
"must": [
{
"match": {
"posiName": "questionGender"
}
},
{
"match": {
"pageName": "questionDetail"
}
},
{
"match": {
"modleName": "questionAnswer"
}
}
]
}
},
"aggs": {
"distinct": {
"cardinality": {
"field": "modleId"
}
}
}
}
'
Response
See the value of aggregations->distinct->value
{
"took": 3,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 200,
"max_score": 14.509778,
"hits": [
······
]
},
"aggregations": {
"distinct": {
"value": 3// distinct count
}
}
}
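Note that the cardinality aggregation is approximate (it is based on the HyperLogLog++ algorithm), so the count can deviate slightly on very high-cardinality fields. For intuition, the exact client-side equivalent in Python over a sample:

```python
# Sample hits mirroring the modleId values from the example response
docs = [{"modleId": 2}, {"modleId": 1}, {"modleId": 2}, {"modleId": 0}]

# Exact client-side equivalent of the (approximate) cardinality aggregation:
# count the number of distinct modleId values
distinct = len({doc["modleId"] for doc in docs})
```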
Deduplicated result set (collapse)
curl -XGET 'localhost:9200/_index/_type/_search' -H 'Content-Type: application/json' -d'
{
"query": {
"bool": {
"must": [
{
"match": {
"posiName": "questionGender"
}
},
{
"match": {
"pageName": "questionDetail"
}
},
{
"match": {
"modleName": "questionAnswer"
}
}
]
}
},
"collapse":{
"field":"modleId"
}
}
'
Response
See the value of hits->hits
{
"took": 12,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 200,
"max_score": 14.509778,
"hits": [
{
"_index": "mxsp_events",
"_type": "events",
"_id": "aPVv6mQBkQR_Xrrgricj",
"_score": 14.509778,
"_source": {
"modleId": 2,
"posiName": "questionGender",
"pageName": "questionDetail",
"modleName": "questionAnswer",
"userId": 1540563,
"createdAt": 1532941929
},
"fields": {
"modleId": [
2
]
}
},
{
"_index": "mxsp_events",
"_type": "events",
"_id": "dgIP9GQBkQR_XrrgQF6S",
"_score": 14.509778,
"_source": {
"modleId": 1,
"posiName": "questionGender",
"pageName": "questionDetail",
"modleName": "questionAnswer",
"userId": 3,
"createdAt": 1533103385
},
"fields": {
"modleId": [
1
]
}
},
{
"_index": "mxsp_events",
"_type": "events",
"_id": "nMyw2WQBkQR_XrrgsDQ6",
"_score": 14.312874,
"_source": {
"modleId": "0",
"posiName": "questionGender",
"pageName": "questionDetail",
"modleName": "questionAnswer",
"userId": "19",
"createdAt": "1529396883"
},
"fields": {
"modleId": [
0
]
}
}
]
}
}
Group counts (terms aggregation, like SQL GROUP BY)
curl -XGET 'localhost:9200/_index/_type/_search' -H 'Content-Type: application/json' -d'
{
"query": {
"bool": {
"must": [
{
"match": {
"posiName": "questionGender"
}
},
{
"match": {
"pageName": "questionDetail"
}
},
{
"match": {
"modleName": "questionAnswer"
}
}
]
}
},
"aggs": {
"group_by": {
"terms": {
"field": "modleId"
}
}
}
}
'
Response
See the value of aggregations->group_by->buckets
{
"took": 7,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 200,
"max_score": 14.509778,
"hits": [
······
]
},
"aggregations": {
"group_by": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [// the groups
{
"key": 2,
"doc_count": 116
},
{
"key": 1,
"doc_count": 83
},
{
"key": 0,
"doc_count": 1
}
]
}
}
}
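The terms aggregation is essentially a server-side counted group-by. For intuition, the same bucketing computed client-side in Python (the sample distribution mirrors the example response):

```python
from collections import Counter

# Sample modleId values with the same distribution as the example response
values = [2] * 116 + [1] * 83 + [0]

# The terms aggregation counts documents per distinct field value and
# returns buckets sorted by doc_count, descending
buckets = [{"key": key, "doc_count": count}
           for key, count in Counter(values).most_common()]
```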
Logstash basics
Common Logstash commands
Start commands:
- bin/logstash -f logstash.conf
-f: specifies the Logstash configuration file; Logstash configures itself from that file
- bin/logstash -e 'input { stdin { } } output { stdout {} }' or bin/logstash -e ""
-e: takes a string to use as the configuration (with "" the default is stdin as input and stdout as output)
Config check command:
- bin/logstash -f logstash.conf -t
-t: checks whether the configuration file is valid
Logstash configuration file structure
A configuration file has three sections: input, filter (optional), and output
# log input
input {
}
# log filtering/matching/processing
filter {
}
# log output
output {
}
input plugins
List of input plugins:
https://www.elastic.co/guide/en/logstash/current/input-plugins.html
- file input
input{
file{
# path of the file(s) to read; wildcards are allowed, e.g. /var/log/nginx/*.log
path=>"/var/lib/mysql/slow.log"
# files to exclude (used together with e.g. path => "/var/log/*")
exclude=>"*.gz"
# read from the beginning of the file; end means start reading from the end
start_position=>"beginning"
}
}
- redis input
input{
redis{
# redis host
host=>"127.0.0.1"
# redis port
port=>6379
# redis database number, 0 by default
db=>0
# redis password, unused by default
password=>"XXX"
# connection timeout
timeout=>5
# operation type, required (one of list, channel, pattern_channel; list uses BLPOP, channel uses SUBSCRIBE, pattern_channel uses PSUBSCRIBE)
data_type=>"list"
# key to listen on, required
key=>"logstash-test-list"
# number of events returned per EVAL call, i.e. how many log entries one request returns
batch_count=>1
# number of threads to run
threads=>1
}
}
filter plugins
List of filter plugins:
https://www.elastic.co/guide/en/logstash/current/filter-plugins.html
- grok plugin
Regex-based matching
# SYNTAX is the type of pattern to match against, e.g. NUMBER or WORD; SEMANTIC is the field name the matched value is stored under
Basic syntax: %{SYNTAX:SEMANTIC}
# field_name is the field the matched value is stored under, followed by a regular expression, e.g. (?<queue_id>[0-9A-F]{10,11})
Custom syntax: (?<field_name>the pattern here)
# Example
filter {
grok {
match => {
"message" => "%{IP:client} %{WORD:method} %{URIPATHPARAM:request} %{NUMBER:bytes} %{NUMBER:duration}"
}
}
}
# Input
58.23.56.101 GET /index.html 15824 0.043
# Result
{
"@version" => "1",
"method" => "GET",
"message" => "58.23.56.101 GET /index.html 15824 0.043",
"duration" => "0.043",
"request" => "/index.html",
"client" => "58.23.56.101",
"bytes" => "15824",
"host" => "linchendeMac-mini.local",
"@timestamp" => 2019-03-06T06:24:21.333Z
}
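grok patterns ultimately expand to regular expressions with named captures ((?<name>...) in grok, (?P<name>...) in Python). As a rough illustration (not the actual grok implementation), a Python equivalent of the pattern above, with simplified stand-ins for the IP/WORD/URIPATHPARAM/NUMBER patterns:

```python
import re

# Simplified equivalents of %{IP:client} %{WORD:method}
# %{URIPATHPARAM:request} %{NUMBER:bytes} %{NUMBER:duration}
LOG = re.compile(
    r"(?P<client>\d+\.\d+\.\d+\.\d+) "
    r"(?P<method>\w+) "
    r"(?P<request>\S+) "
    r"(?P<bytes>\d+) "
    r"(?P<duration>[\d.]+)"
)

fields = LOG.match("55.3.244.1 GET /index.html 15824 0.043").groupdict()
```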
- dissect plugin
Parses data by delimiters; compared with grok it is faster and uses less CPU. It has limitations, though: it mainly suits logs where every line has a similar layout with simple, unambiguous delimiters. The dissect syntax is straightforward, consisting of a series of fields and the delimiters between them
Basic syntax: field names go inside %{}; the text between the %{} blocks is the delimiter
# Example
input{
stdin{}
}
filter{
dissect {
mapping => { "message" => "%{ip} [%{time} %{+time}] %{method} %{request} %{bytes} %{duration}" }
}
}
output{
stdout{}
}
# Input
55.3.244.1 [07/Sep/2017:17:24:53 +0800] GET /index.html 15824 0.043
# Result
{
"bytes" => "15824",
"time" => "07/Sep/2017:17:24:53 +0800",
"duration" => "0.043",
"@timestamp" => 2019-03-06T09:15:28.822Z,
"ip" => "55.3.244.1",
"message" => "55.3.244.1 [07/Sep/2017:17:24:53 +0800] GET /index.html 15824 0.043",
"@version" => "1",
"host" => "linchendeMac-mini.local",
"method" => "GET",
"request" => "/index.html"
}
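To illustrate why delimiter-based parsing is cheaper than regex matching, here is the same mapping expressed with plain string splits in Python (a simplified sketch, not the actual dissect implementation):

```python
def dissect(message):
    """Delimiter-based parse of the example line, mirroring the mapping
    %{ip} [%{time} %{+time}] %{method} %{request} %{bytes} %{duration}"""
    ip, rest = message.split(" [", 1)        # delimiter: " ["
    time, rest = rest.split("] ", 1)         # delimiter: "] "
    method, request, nbytes, duration = rest.split(" ")
    return {"ip": ip, "time": time, "method": method,
            "request": request, "bytes": nbytes, "duration": duration}

event = dissect("55.3.244.1 [07/Sep/2017:17:24:53 +0800] GET /index.html 15824 0.043")
```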
- date plugin
The date plugin sets @timestamp to the time parsed from the specified field; without it, @timestamp is the time the event was processed
# Example
filter {
grok {
match => {
"message" => "%{IP:client} \[%{HTTPDATE:time}\] %{WORD:method} %{URIPATHPARAM:request} %{NUMBER:bytes} %{NUMBER:duration}"
}
}
date{
match=>["time","dd/MMM/yyyy:HH:mm:ss Z"]
}
}
# Input
55.3.244.1 [07/Sep/2017:17:24:53 +0800] GET /index.html 15824 0.043
# Result
{
"bytes" => "15824",
"time" => "07/Sep/2017:17:24:53 +0800",
"client" => "55.3.244.1",
"request" => "/index.html",
"@version" => "1",
"duration" => "0.043",
"method" => "GET",
"host" => "linchendeMac-mini.local",
"message" => "55.3.244.1 [07/Sep/2017:17:24:53 +0800] GET /index.html 15824 0.043",
"@timestamp" => 2017-09-07T09:24:53.000Z
}
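The match pattern "dd/MMM/yyyy:HH:mm:ss Z" maps onto a standard strptime format; a Python sketch of the same parse and UTC normalization:

```python
from datetime import datetime, timezone

# Logstash's "dd/MMM/yyyy:HH:mm:ss Z" corresponds to this strptime format
ts = datetime.strptime("07/Sep/2017:17:24:53 +0800", "%d/%b/%Y:%H:%M:%S %z")

# Normalized to UTC, this is the value stored in @timestamp in the
# example above (2017-09-07T09:24:53Z)
utc = ts.astimezone(timezone.utc)
```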
- geoip plugin
Looks up geographic information for an IP address, such as latitude/longitude and city name, which is handy for geographic analysis
# Example
filter {
grok {
match => {
"message" => "%{IP:client} %{WORD:method} %{URIPATHPARAM:request} %{NUMBER:bytes} %{NUMBER:duration}"
}
}
geoip {
# path to the IP geolocation database file
database => "/usr/local/Cellar/logstash-6.6.0/config/GeoLite2-City.mmdb"
# name of the field holding the IP address
source => "client"
# restrict which fields are returned
# fields => ["country_name", "region_name", "city_name"]
}
}
# Input
58.23.56.101 GET /index.html 15824 0.043
# Result
{
"method" => "GET",
"bytes" => "15824",
"request" => "/index.html",
"duration" => "0.043",
"geoip" => {
"continent_code" => "AS",
"location" => {
"lat" => 24.4798,
"lon" => 118.0819
},
"region_name" => "Fujian",
"ip" => "58.23.56.101",
"city_name" => "Xiamen",
"latitude" => 24.4798,
"country_code3" => "CN",
"longitude" => 118.0819,
"region_code" => "FJ",
"timezone" => "Asia/Shanghai",
"country_name" => "China",
"country_code2" => "CN"
},
"host" => "linchendeMac-mini.local",
"@timestamp" => 2019-03-06T06:13:00.118Z,
"message" => "58.23.56.101 GET /index.html 15824 0.043",
"@version" => "1",
"client" => "58.23.56.101"
}
Example use cases
- Parsing Nginx access logs
filter {
grok {
match => { "message" => ["(?<RemoteIP>(\d*.\d*.\d*.\d*)) - %{DATA:[nginx][access][user_name]} \[%{HTTPDATE:[nginx][access][time]}\] \"%{WORD:[nginx][access][method]} %{DATA:[nginx][access][url]} HTTP/%{NUMBER:[nginx][access][http_version]}\" %{NUMBER:[nginx][access][response_code]} %{NUMBER:[nginx][access][body_sent][bytes]} \"%{DATA:[nginx][access][referrer]}\" \"%{DATA:[nginx][access][agent]}\""] }
}
}
output plugins
List of output plugins:
https://www.elastic.co/guide/en/logstash/current/output-plugins.html
- elasticsearch output
output{
elasticsearch{
# elasticsearch host:port
hosts=>["127.0.0.1:9200"]
# target index name; time-based variables are allowed
index=>"logstash-slow-%{+YYYY.MM.dd}"
# target type name, doc by default
document_type=>"log"
# elasticsearch credentials; omit both if security is not enabled
user=>"admin"
password=>"xxxxxx"
# path to the template file
template=>"/opt/logstash-conf/es-template.json"
# template name
template_name=>"logstash"
# whether to overwrite an existing index template of the same name with this one (default false)
template_overwrite=>false
}
}
- redis output
output{
redis{
# redis host and port; overrides the global port
host=>["127.0.0.1:6379"]
# global port, 6379 by default; ignored if host already specifies one
port=>6379
# redis database number, 0 by default
db=>0
# redis password, unused by default
password=>"xxx"
# operation type (list or channel; list uses RPUSH, channel uses PUBLISH)
data_type=>"list"
# key name
key=>"xxx"
# reconnect interval after a failure, 1s by default
reconnect_interval=>1
# connection timeout
timeout=>5
# batching (only for data_type=list)
# whether to batch (false, the default: one RPUSH stores one event; true: one RPUSH sends batch_events events, or whatever has accumulated after batch_timeout seconds, whichever comes first)
batch=>true
# maximum number of events per RPUSH when batching
batch_events=>50
# maximum number of seconds an RPUSH batch may accumulate
batch_timeout=>5
# congestion protection (only for data_type=list; keeps redis from running out of memory)
# how often to run the congestion check, in seconds (0 checks on every write)
congestion_interval=>1
# maximum number of items allowed in the list (0, the default, disables congestion checking; once congestion_threshold items accumulate, writes block until other consumers drain the list)
congestion_threshold=>0
}
}