Installing ELK
Elasticsearch download: https://www.elastic.co/downloads/elasticsearch
Logstash download: https://www.elastic.co/downloads/logstash
Kibana download: https://www.elastic.co/downloads/kibana
Installation guide (recommended: download the archive from the official site and extract it; a brew install is missing the x-pack plugin): https://www.cnblogs.com/liuxiaoming123/p/8081883.html
Elasticsearch basics
APIs
Java API (official): https://www.elastic.co/guide/en/elasticsearch/client/java-api/current/index.html
Maven dependency
<dependency>
    <groupId>org.elasticsearch.client</groupId>
    <artifactId>transport</artifactId>
    <version>6.6.0</version>
</dependency>
JEST (Java HTTP REST client): https://github.com/searchbox-io/Jest
Maven dependency
<dependency>
    <groupId>io.searchbox</groupId>
    <artifactId>jest</artifactId>
    <version>5.3.3</version>
</dependency>
PHP API (official): https://www.elastic.co/guide/en/elasticsearch/client/php-api/current/index.html
Composer dependency
{
    "require": {
        "elasticsearch/elasticsearch": "~6.0"
    }
}
Terminology
Index
An index is a collection of documents that share somewhat similar characteristics. For example, you might have one index for customer data, another for a product catalog, and another for other data. An index is identified by a name (which must be all lowercase), and that name is used to refer to the index when performing indexing, search, update, and delete operations against it.
Type
Within an index you can define one or more types. A type is a logical category or partition of an index, and its semantics are entirely up to you. Typically, a type is defined for documents that share a common set of fields. For example, suppose you run a blogging platform and store all of its data in a single index. You might define one type for user data, another for blog posts, and another for comments.
Document
A document is the basic unit of information that can be indexed. For example, one document might hold the data for a single customer, another for a single product, and another for a single order. Documents are expressed in JSON, a ubiquitous data-interchange format.
Basic usage
Indices
Creating an index
- Create an empty index
curl -XPUT 'localhost:9200/_index'
- Set the index mapping
curl -XPUT 'localhost:9200/_index/_mapping/_type?pretty' -H 'Content-Type: application/json' -d'
{
"properties": {
"field1": {
"type": "text"
},
"field2": {
"type": "text"
},
"field3": {
"type": "text"
},
"field4": {
"type": "long"
}
}
}
'
Deleting an index
- Delete one index
curl -XDELETE 'localhost:9200/_index'
- Delete several indices
curl -XDELETE 'localhost:9200/_index1,_index2' or curl -XDELETE 'localhost:9200/_index*'
Documents
Indexing a document (the trailing _id is optional; Elasticsearch auto-generates an ID if it is omitted)
curl -XPOST 'localhost:9200/_index/_type{/_id}' -H 'Content-Type: application/json' -d'
{
"field1": "XXXXXXXX",
"field2": "XXXXXXXX",
"field3": "XXXXXXXX",
"field4": "1529396883"
}
'
Updating a document
curl -XPUT 'localhost:9200/_index/_type/_id' -H 'Content-Type: application/json' -d'
{
"field1": "XXXXXXXX",
"field2": "XXXXXXXX",
"field3": "XXXXXXXX",
"field4": "1529396883"
}
'
Deleting a document
curl -XDELETE 'localhost:9200/_index/_type/_id'
Fetching a single document
- Fetch the full document
curl -XGET 'localhost:9200/_index/_type/_id'
Response
{
"_index" : "_index",
"_type" : "_type",
"_id" : "_id",
"_version" : 1,
"found" : true,
"_source" : {
"field1": "XXXXXXXX",
"field2": "XXXXXXXX",
"field3": "XXXXXXXX",
"field4": "1529396883"
}
}
- To return only specific fields from _source, request:
curl -XGET 'localhost:9200/_index/_type/_id?_source=field1,field2'
Response
{
"_index" : "website",
"_type" : "blog",
"_id" : "123",
"_version" : 1,
"found" : true,
"_source" : {
"field1": "My first blog entry" ,
"field2": "Just trying this out..."
}
}
- To return only the _source itself, without any metadata, request:
curl -XGET 'localhost:9200/_index/_type/_id/_source'
Response
{
"field1": "XXXXXXXX",
"field2": "XXXXXXXX",
"field3": "XXXXXXXX",
"field4": "1529396883"
}
Fetching multiple documents
curl -XPOST 'localhost:9200/_mget' -H 'Content-Type: application/json' -d'
{
"docs": [
{
"_index": "_index",
"_type": "_type",
"_id": "_id"
},
{
"_index": "_index",
"_type": "_type",
"_id": "_id"
}
]
}
'
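The _mget body can be assembled programmatically. A minimal Python sketch (the index/type/id values are placeholders; the snippet only builds the request body and does not contact Elasticsearch):

```python
import json

def build_mget_body(doc_refs):
    """Build the JSON body of an Elasticsearch _mget request.

    doc_refs: iterable of (index, type, id) tuples.
    """
    return json.dumps({
        "docs": [
            {"_index": index, "_type": doc_type, "_id": doc_id}
            for index, doc_type, doc_id in doc_refs
        ]
    })

body = build_mget_body([("website", "blog", "123"), ("website", "blog", "124")])
```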
Elasticsearch bulk operations
curl -XPOST 'localhost:9200/_bulk?pretty' -H 'Content-Type: application/json' -d'
{ "delete": { "_index": "website", "_type": "blog", "_id": "123" }}
{ "create": { "_index": "website", "_type": "blog", "_id": "123" }}
{ "title": "My first blog post" }
{ "index": { "_index": "website", "_type": "blog" }}
{ "title": "My second blog post" }
{ "update": { "_index": "website", "_type": "blog", "_id": "123", "_retry_on_conflict" : 3} }
{ "doc" : {"title" : "My updated blog post"} }
'
Response
{
"took": 4,
"errors": false,
"items": [
{ "create": {
"_index": "website",
"_type": "blog",
"_id": "123",
"_version": 3,
"status": 201
}}
]
}
A single bulk call can carry any number of create, index, update, and delete actions; the request body looks like:
{ action: { metadata }}\n
{ request body }\n
- action defines the operation to perform on the document: one of create, index, update, or delete (create and index both create documents, but create fails if the document already exists, while index succeeds and turns into an update)
- metadata specifies the _index, _type, and _id of the document being indexed, created, updated, or deleted
- request body is the document's _source itself, i.e. its fields and values; delete actions take no request body line
- Every line must end with a newline character (\n), including the last line; the newlines serve as markers that separate the entries (in a Postman request you don't type \n literally, just put each entry on its own line)
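The rules above can be sketched in a few lines of Python; this only assembles the newline-delimited body (it does not send it), using the example actions from earlier:

```python
import json

def build_bulk_body(actions):
    """Serialize bulk actions into the newline-delimited _bulk format.

    actions: list of (action_dict, source_dict_or_None) pairs;
    source is None for delete actions, which take no request body line.
    """
    lines = []
    for action, source in actions:
        lines.append(json.dumps(action))
        if source is not None:
            lines.append(json.dumps(source))
    # every line, including the last one, must end with \n
    return "\n".join(lines) + "\n"

body = build_bulk_body([
    ({"delete": {"_index": "website", "_type": "blog", "_id": "123"}}, None),
    ({"index": {"_index": "website", "_type": "blog"}}, {"title": "My second blog post"}),
])
```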
Each sub-request in a bulk call executes independently
The failure of one sub-request does not affect whether the others succeed. If any sub-request fails, the top-level error flag is set to true and the error details are reported under the corresponding item
Example response for a single failed sub-request:
{
"took": 3,
"errors": true,
"items": [
{ "create": {
"_index": "website",
"_type": "blog",
"_id": "123",
"status": 409,
"error": "DocumentAlreadyExistsException
[[website][4] [blog][123]:
document already exists]"
}}
]
}
Maximum amount of data per bulk request
bulk loads the data to be processed into memory, so the amount is limited. The optimal size is not a fixed number; it depends on your hardware, on document size and complexity, and on the indexing and search load.
A common recommendation is 1,000-5,000 documents per request, with a total size of 5-15 MB; by default a request cannot exceed 100 MB, a limit that can be changed in the ES config file (elasticsearch.yml under $ES_HOME/config).
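A client therefore has to chunk its documents before calling _bulk. A rough Python sketch of such a batching helper (the thresholds are illustrative defaults, not values prescribed by Elasticsearch):

```python
import json

def chunk_docs(docs, max_docs=1000, max_bytes=5 * 1024 * 1024):
    """Split documents into bulk-sized batches.

    Flushes a batch once it holds max_docs documents or adding the next
    document would push it past max_bytes of serialized JSON.
    """
    batch, batch_bytes = [], 0
    for doc in docs:
        size = len(json.dumps(doc).encode("utf-8"))
        if batch and (len(batch) >= max_docs or batch_bytes + size > max_bytes):
            yield batch
            batch, batch_bytes = [], 0
        batch.append(doc)
        batch_bytes += size
    if batch:  # flush the final, partially filled batch
        yield batch

batches = list(chunk_docs([{"n": i} for i in range(2500)], max_docs=1000))
```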
Request body search (search with conditions)
Basic query
curl -XGET 'localhost:9200/_index/_type/_search' -H 'Content-Type: application/json' -d'
{
"query":{
"bool": {
"must": { "match": { "field": "value" }},
"must_not": { "match": { "field": "value" }},
"should": { "match": { "field": "value" }},
"filter": { "range": { "field" : { "gt" : num }} }
}
},
"from":0,
"size":10,
"sort":{"field":{"order":"desc"}}
}
'
- must: the document must match the condition
- should: should holds one or more conditions; a document qualifies if it satisfies at least one of them
- must_not: the document must not match the condition
- filter: filter conditions
- from / size: pagination
- sort: sorting
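A query body with these clauses can be assembled with a small helper; a Python sketch (the field names and clause shapes are placeholders):

```python
def bool_query(must=None, must_not=None, should=None, filter_=None,
               from_=0, size=10, sort=None):
    """Assemble a bool-query request body; each clause argument is a
    list of Elasticsearch query clauses (dicts)."""
    bool_part = {}
    if must:
        bool_part["must"] = must
    if must_not:
        bool_part["must_not"] = must_not
    if should:
        bool_part["should"] = should
    if filter_:
        bool_part["filter"] = filter_
    body = {"query": {"bool": bool_part}, "from": from_, "size": size}
    if sort:
        body["sort"] = sort
    return body

q = bool_query(must=[{"match": {"field1": "value"}}],
               filter_=[{"range": {"field4": {"gt": 100}}}],
               sort={"field4": {"order": "desc"}})
```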
Response
See the value of hits->total
{
"took": 10,// request duration in ms
"timed_out": false,// whether the request timed out
"_shards": {// shard info
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 200,// total number of matching documents
"max_score": 14.509778,
"hits": [// the results, 10 by default
······
]
}
}
Distinct count (cardinality, like SQL DISTINCT)
curl -XGET 'localhost:9200/_index/_type/_search' -H 'Content-Type: application/json' -d'
{
"query": {
"bool": {
"must": [
{
"match": {
"posiName": "questionGender"
}
},
{
"match": {
"pageName": "questionDetail"
}
},
{
"match": {
"modleName": "questionAnswer"
}
}
]
}
},
"aggs": {
"distinct": {
"cardinality": {
"field": "modleId"
}
}
}
}
'
Response
See the value of aggregations->distinct->value
{
"took": 3,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 200,
"max_score": 14.509778,
"hits": [
······
]
},
"aggregations": {
"distinct": {
"value": 3// distinct count
}
}
}
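Note that the cardinality aggregation is approximate (it is based on the HyperLogLog++ algorithm), so the count can deviate slightly on very high-cardinality fields. For intuition, the exact client-side equivalent in Python over a sample:

```python
# Sample hits mirroring the modleId values from the example response
docs = [{"modleId": 2}, {"modleId": 1}, {"modleId": 2}, {"modleId": 0}]

# Exact client-side equivalent of the (approximate) cardinality aggregation:
# count the number of distinct modleId values
distinct = len({doc["modleId"] for doc in docs})
```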
Deduplicated result set (collapse)
curl -XGET 'localhost:9200/_index/_type/_search' -H 'Content-Type: application/json' -d'
{
"query": {
"bool": {
"must": [
{
"match": {
"posiName": "questionGender"
}
},
{
"match": {
"pageName": "questionDetail"
}
},
{
"match": {
"modleName": "questionAnswer"
}
}
]
}
},
"collapse":{
"field":"modleId"
}
}
'
Response
See the value of hits->hits
{
"took": 12,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 200,
"max_score": 14.509778,
"hits": [
{
"_index": "mxsp_events",
"_type": "events",
"_id": "aPVv6mQBkQR_Xrrgricj",
"_score": 14.509778,
"_source": {
"modleId": 2,
"posiName": "questionGender",
"pageName": "questionDetail",
"modleName": "questionAnswer",
"userId": 1540563,
"createdAt": 1532941929
},
"fields": {
"modleId": [
2
]
}
},
{
"_index": "mxsp_events",
"_type": "events",
"_id": "dgIP9GQBkQR_XrrgQF6S",
"_score": 14.509778,
"_source": {
"modleId": 1,
"posiName": "questionGender",
"pageName": "questionDetail",
"modleName": "questionAnswer",
"userId": 3,
"createdAt": 1533103385
},
"fields": {
"modleId": [
1
]
}
},
{
"_index": "mxsp_events",
"_type": "events",
"_id": "nMyw2WQBkQR_XrrgsDQ6",
"_score": 14.312874,
"_source": {
"modleId": "0",
"posiName": "questionGender",
"pageName": "questionDetail",
"modleName": "questionAnswer",
"userId": "19",
"createdAt": "1529396883"
},
"fields": {
"modleId": [
0
]
}
}
]
}
}
Group counts (terms aggregation, like SQL GROUP BY)
curl -XGET 'localhost:9200/_index/_type/_search' -H 'Content-Type: application/json' -d'
{
"query": {
"bool": {
"must": [
{
"match": {
"posiName": "questionGender"
}
},
{
"match": {
"pageName": "questionDetail"
}
},
{
"match": {
"modleName": "questionAnswer"
}
}
]
}
},
"aggs": {
"group_by": {
"terms": {
"field": "modleId"
}
}
}
}
'
Response
See the value of aggregations->group_by->buckets
{
"took": 7,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 200,
"max_score": 14.509778,
"hits": [
······
]
},
"aggregations": {
"group_by": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [// the groups
{
"key": 2,
"doc_count": 116
},
{
"key": 1,
"doc_count": 83
},
{
"key": 0,
"doc_count": 1
}
]
}
}
}
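The terms aggregation is essentially a server-side counted group-by. For intuition, the same bucketing computed client-side in Python (the sample distribution mirrors the example response):

```python
from collections import Counter

# Sample modleId values with the same distribution as the example response
values = [2] * 116 + [1] * 83 + [0]

# The terms aggregation counts documents per distinct field value and
# returns buckets sorted by doc_count, descending
buckets = [{"key": key, "doc_count": count}
           for key, count in Counter(values).most_common()]
```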
Logstash basics
Common Logstash commands
Start commands:
- bin/logstash -f logstash.conf
-f: specifies the Logstash configuration file; Logstash configures itself from that file
- bin/logstash -e 'input { stdin { } } output { stdout {} }' or bin/logstash -e ""
-e: takes a string to use as the configuration (with "" the default is stdin as input and stdout as output)
Config check command:
- bin/logstash -f logstash.conf -t
-t: checks whether the configuration file is valid
Logstash configuration file structure
A configuration file has three sections: input, filter (optional), and output
# log input
input {
}
# log filtering/matching/processing
filter {
}
# log output
output {
}
input plugins
List of input plugins:
https://www.elastic.co/guide/en/logstash/current/input-plugins.html
- file input
input{
file{
# path of the file(s) to read; wildcards are allowed, e.g. /var/log/nginx/*.log
path=>"/var/lib/mysql/slow.log"
# files to exclude (used together with e.g. path => "/var/log/*")
exclude=>"*.gz"
# read from the beginning of the file; end means start reading from the end
start_position=>"beginning"
}
}
- redis input
input{
redis{
# redis host
host=>"127.0.0.1"
# redis port
port=>6379
# redis database number, 0 by default
db=>0
# redis password, unused by default
password=>"XXX"
# connection timeout
timeout=>5
# operation type, required (one of list, channel, pattern_channel; list uses BLPOP, channel uses SUBSCRIBE, pattern_channel uses PSUBSCRIBE)
data_type=>"list"
# key to listen on, required
key=>"logstash-test-list"
# number of events returned per EVAL call, i.e. how many log entries one request returns
batch_count=>1
# number of threads to run
threads=>1
}
}
filter plugins
List of filter plugins:
https://www.elastic.co/guide/en/logstash/current/filter-plugins.html
- grok plugin
Regex-based matching
# SYNTAX is the type of pattern to match against, e.g. NUMBER or WORD; SEMANTIC is the field name the matched value is stored under
Basic syntax: %{SYNTAX:SEMANTIC}
# field_name is the field the matched value is stored under, followed by a regular expression, e.g. (?<queue_id>[0-9A-F]{10,11})
Custom syntax: (?<field_name>the pattern here)
# Example
filter {
grok {
match => {
"message" => "%{IP:client} %{WORD:method} %{URIPATHPARAM:request} %{NUMBER:bytes} %{NUMBER:duration}"
}
}
}
# Input
58.23.56.101 GET /index.html 15824 0.043
# Result
{
"@version" => "1",
"method" => "GET",
"message" => "58.23.56.101 GET /index.html 15824 0.043",
"duration" => "0.043",
"request" => "/index.html",
"client" => "58.23.56.101",
"bytes" => "15824",
"host" => "linchendeMac-mini.local",
"@timestamp" => 2019-03-06T06:24:21.333Z
}
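grok patterns ultimately expand to regular expressions with named captures ((?<name>...) in grok, (?P<name>...) in Python). As a rough illustration (not the actual grok implementation), a Python equivalent of the pattern above, with simplified stand-ins for the IP/WORD/URIPATHPARAM/NUMBER patterns:

```python
import re

# Simplified equivalents of %{IP:client} %{WORD:method}
# %{URIPATHPARAM:request} %{NUMBER:bytes} %{NUMBER:duration}
LOG = re.compile(
    r"(?P<client>\d+\.\d+\.\d+\.\d+) "
    r"(?P<method>\w+) "
    r"(?P<request>\S+) "
    r"(?P<bytes>\d+) "
    r"(?P<duration>[\d.]+)"
)

fields = LOG.match("55.3.244.1 GET /index.html 15824 0.043").groupdict()
```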
- dissect plugin
Parses data by delimiters; compared with grok it is faster and uses less CPU. It has limitations, though: it mainly suits logs where every line has a similar layout with simple, unambiguous delimiters. The dissect syntax is straightforward, consisting of a series of fields and the delimiters between them
Basic syntax: field names go inside %{}; the text between the %{} blocks is the delimiter
# Example
input{
stdin{}
}
filter{
dissect {
mapping => { "message" => "%{ip} [%{time} %{+time}] %{method} %{request} %{bytes} %{duration}" }
}
}
output{
stdout{}
}
# Input
55.3.244.1 [07/Sep/2017:17:24:53 +0800] GET /index.html 15824 0.043
# Result
{
"bytes" => "15824",
"time" => "07/Sep/2017:17:24:53 +0800",
"duration" => "0.043",
"@timestamp" => 2019-03-06T09:15:28.822Z,
"ip" => "55.3.244.1",
"message" => "55.3.244.1 [07/Sep/2017:17:24:53 +0800] GET /index.html 15824 0.043",
"@version" => "1",
"host" => "linchendeMac-mini.local",
"method" => "GET",
"request" => "/index.html"
}
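To illustrate why delimiter-based parsing is cheaper than regex matching, here is the same mapping expressed with plain string splits in Python (a simplified sketch, not the actual dissect implementation):

```python
def dissect(message):
    """Delimiter-based parse of the example line, mirroring the mapping
    %{ip} [%{time} %{+time}] %{method} %{request} %{bytes} %{duration}"""
    ip, rest = message.split(" [", 1)        # delimiter: " ["
    time, rest = rest.split("] ", 1)         # delimiter: "] "
    method, request, nbytes, duration = rest.split(" ")
    return {"ip": ip, "time": time, "method": method,
            "request": request, "bytes": nbytes, "duration": duration}

event = dissect("55.3.244.1 [07/Sep/2017:17:24:53 +0800] GET /index.html 15824 0.043")
```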
- date plugin
The date plugin sets @timestamp to the time parsed from the specified field; without it, @timestamp is the time the event was processed
# Example
filter {
grok {
match => {
"message" => "%{IP:client} \[%{HTTPDATE:time}\] %{WORD:method} %{URIPATHPARAM:request} %{NUMBER:bytes} %{NUMBER:duration}"
}
}
date{
match=>["time","dd/MMM/yyyy:HH:mm:ss Z"]
}
}
# Input
55.3.244.1 [07/Sep/2017:17:24:53 +0800] GET /index.html 15824 0.043
# Result
{
"bytes" => "15824",
"time" => "07/Sep/2017:17:24:53 +0800",
"client" => "55.3.244.1",
"request" => "/index.html",
"@version" => "1",
"duration" => "0.043",
"method" => "GET",
"host" => "linchendeMac-mini.local",
"message" => "55.3.244.1 [07/Sep/2017:17:24:53 +0800] GET /index.html 15824 0.043",
"@timestamp" => 2017-09-07T09:24:53.000Z
}
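The match pattern "dd/MMM/yyyy:HH:mm:ss Z" maps onto a standard strptime format; a Python sketch of the same parse and UTC normalization:

```python
from datetime import datetime, timezone

# Logstash's "dd/MMM/yyyy:HH:mm:ss Z" corresponds to this strptime format
ts = datetime.strptime("07/Sep/2017:17:24:53 +0800", "%d/%b/%Y:%H:%M:%S %z")

# Normalized to UTC, this is the value stored in @timestamp in the
# example above (2017-09-07T09:24:53Z)
utc = ts.astimezone(timezone.utc)
```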
- geoip plugin
Looks up geographic information for an IP address, such as latitude/longitude and city name, which is handy for geographic analysis
# Example
filter {
grok {
match => {
"message" => "%{IP:client} %{WORD:method} %{URIPATHPARAM:request} %{NUMBER:bytes} %{NUMBER:duration}"
}
}
geoip {
# path to the IP geolocation database file
database => "/usr/local/Cellar/logstash-6.6.0/config/GeoLite2-City.mmdb"
# name of the field holding the IP address
source => "client"
# restrict which fields are returned
# fields => ["country_name", "region_name", "city_name"]
}
}
# Input
58.23.56.101 GET /index.html 15824 0.043
# Result
{
"method" => "GET",
"bytes" => "15824",
"request" => "/index.html",
"duration" => "0.043",
"geoip" => {
"continent_code" => "AS",
"location" => {
"lat" => 24.4798,
"lon" => 118.0819
},
"region_name" => "Fujian",
"ip" => "58.23.56.101",
"city_name" => "Xiamen",
"latitude" => 24.4798,
"country_code3" => "CN",
"longitude" => 118.0819,
"region_code" => "FJ",
"timezone" => "Asia/Shanghai",
"country_name" => "China",
"country_code2" => "CN"
},
"host" => "linchendeMac-mini.local",
"@timestamp" => 2019-03-06T06:13:00.118Z,
"message" => "58.23.56.101 GET /index.html 15824 0.043",
"@version" => "1",
"client" => "58.23.56.101"
}
Example use cases
- Parsing Nginx access logs
filter {
grok {
match => { "message" => ["(?<RemoteIP>(\d*.\d*.\d*.\d*)) - %{DATA:[nginx][access][user_name]} \[%{HTTPDATE:[nginx][access][time]}\] \"%{WORD:[nginx][access][method]} %{DATA:[nginx][access][url]} HTTP/%{NUMBER:[nginx][access][http_version]}\" %{NUMBER:[nginx][access][response_code]} %{NUMBER:[nginx][access][body_sent][bytes]} \"%{DATA:[nginx][access][referrer]}\" \"%{DATA:[nginx][access][agent]}\""] }
}
}
output plugins
List of output plugins:
https://www.elastic.co/guide/en/logstash/current/output-plugins.html
- elasticsearch output
output{
elasticsearch{
# elasticsearch host:port
hosts=>["127.0.0.1:9200"]
# target index name; time-based variables are allowed
index=>"logstash-slow-%{+YYYY.MM.dd}"
# target type name, doc by default
document_type=>"log"
# elasticsearch credentials; omit both if security is not enabled
user=>"admin"
password=>"xxxxxx"
# path to the template file
template=>"/opt/logstash-conf/es-template.json"
# template name
template_name=>"logstash"
# whether to overwrite an existing index template of the same name with this one (default false)
template_overwrite=>false
}
}
- redis output
output{
redis{
# redis host and port; overrides the global port
host=>["127.0.0.1:6379"]
# global port, 6379 by default; ignored if host already specifies one
port=>6379
# redis database number, 0 by default
db=>0
# redis password, unused by default
password=>"xxx"
# operation type (list or channel; list uses RPUSH, channel uses PUBLISH)
data_type=>"list"
# key name
key=>"xxx"
# reconnect interval after a failure, 1s by default
reconnect_interval=>1
# connection timeout
timeout=>5
# batching (only for data_type=list)
# whether to batch (false, the default: one RPUSH stores one event; true: one RPUSH sends batch_events events, or whatever has accumulated after batch_timeout seconds, whichever comes first)
batch=>true
# maximum number of events per RPUSH when batching
batch_events=>50
# maximum number of seconds an RPUSH batch may accumulate
batch_timeout=>5
# congestion protection (only for data_type=list; keeps redis from running out of memory)
# how often to run the congestion check, in seconds (0 checks on every write)
congestion_interval=>1
# maximum number of items allowed in the list (0, the default, disables congestion checking; once congestion_threshold items accumulate, writes block until other consumers drain the list)
congestion_threshold=>0
}
}