ELK Basics: Getting Started

Notes

These are notes on ES 5.6 that I put together two years ago (2017).

I. Elasticsearch

1. Configuring the mapping (creating an index)

Text fields use the standard analyzer by default.
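You can check how an analyzer tokenizes a piece of text with the _analyze API before building the mapping; a quick example (the sample text is arbitrary):

GET /_analyze
{
  "analyzer": "standard",
  "text": "Hello World 2017"
}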


PUT /library
{
  "settings": {
    "index": {
      "number_of_shards": 3,            // number of primary shards
      "number_of_replicas": 1,          // number of replicas
      "max_result_window": "1000000",   // maximum from + size
      "analysis": {
        "analyzer": {
          "my_anaylzer2": {
            "type": "custom",
            "tokenizer": "standard",    // standard tokenizer
            "filter": [
              "lowercase",              // make matching case-insensitive
              "word_delimiter"          // split on word delimiters
            ],
            "char_filter": [
              "html_strip"              // strip HTML tags
            ]
          },
          "my_anaylzer": {
            "tokenizer": "my_ngram",    // split character by character
            "char_filter": [
              "html_strip"
            ],
            "filter": [
              "lowercase"
            ]
          }
        },
        "tokenizer": {
          "my_ngram": {
            "token_chars": [
              "letter",
              "digit",
              "punctuation"
            ],
            "min_gram": "1",
            "type": "nGram",
            "max_gram": "1"
          }
        }
      }
    }
  },
  "mappings": {
    "project": {
      "dynamic": "false",
      "properties": {
        "public_day": {
          "format": "yyyy.MM.dd||yyyy.MM||yyyy",
          "type": "date"
        },
        "public_num": {
          "analyzer": "my_anaylzer",
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword"
            }
          }
        },
        "priority": {
          "analyzer": "my_anaylzer",
          "type": "text"
        },
        "name": {
          "analyzer": "my_anaylzer2",
          "type": "text"
        },
        "navi_id": {
          "type": "integer"
        }
      }
    }
  }
}
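Once the index is created, its settings and mapping can be verified with:

GET /library/_settings
GET /library/_mapping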
2. Querying documents
1) Get a document by id
GET /index/type/1
2) Get multiple documents by id
GET /index/type/_mget
{
    "docs" : [
        {
            "_id" : "1"
        },
        {
            "_id" : "2"
        }
    ]
}
3) URI search
GET /index/type/_search?q=user:kimchy&sort=sort_field:desc&from=100&size=100&_source=user&timeout=10s

q: query string; sort: sort field; asc/desc: ascending/descending; from: starting offset (0-based);
size: number of hits to return; _source: fields to include in the response; timeout: request timeout

4) DSL queries (query)

① term: exact match (MySQL: =)

{"query": {"term": {"user": {"value": "张三"}}}} 	/* find users named 张三 */

② terms: exact match on multiple values (MySQL: in)

{"query": {"terms": {"user": ["张三", "李四"]}}} 	/* find users named 张三 or 李四 */

③ match_phrase: phrase match (MySQL: like)

{"query": {"match_phrase": {"user": "张"}}} 	/* find users whose name contains 张 */

④ range: range query (integer, long, date)

{"query": {"range": {"age": {"gte": 10, "lte": 100}}}}

⑤ exists: matches documents where the field has a value (an empty string "" counts as a value; missing/null does not)

{"query": {"exists": {"field": "age"}}}

⑥ prefix: match by prefix

{"query": {"prefix": {"user": "张"}}} 		/* find users whose name starts with 张 */

⑦ regexp: regular-expression match

{"query": {"regexp": {"user": "张.*"}}}		/* find users whose name starts with 张 */

⑧ post_filter: filtering applied after the query; no scores are computed, so it is fast

{"post_filter": {"term": {"user": "张三"}}}

The difference from filter is that post_filter has no effect on aggregation results.
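A quick illustration of the difference (the index and field names are placeholders): the terms aggregation below is computed over everything matched by query, while the returned hits are additionally narrowed by post_filter.

GET /index/type/_search
{
  "query": { "match_all": {} },
  "aggs": {
    "users": { "terms": { "field": "user.keyword" } }   // aggregation ignores post_filter
  },
  "post_filter": { "term": { "user": "张三" } }          // only the returned hits are filtered
}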
⑨ filter: filter query (inside a bool query)

{"query": {"bool": {"filter": [{"term": {"user": "张三"}}]}}}

⑩ bool: compound query combining multiple conditions

bool-must: MySQL and
bool-should: MySQL or
bool-must_not: MySQL not

Example: find users named kimchy whose age is not between 10 and 20

{
  "query": {
    "bool" : {
      "must" : {
        "term" : { "user" : "kimchy" }
      },
      "must_not" : {
        "range" : {
          "age" : { "gte" : 10, "lte" : 20 }
        }
      }
    }
  }
}
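should (the OR case) is not covered by the example above; a minimal sketch, with placeholder values:

{
  "query": {
    "bool": {
      "should": [
        { "term": { "user": "kimchy" } },
        { "term": { "user": "elastic" } }
      ],
      "minimum_should_match": 1
    }
  }
}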
3. Scroll (cursor)

Scroll returns all matching results in batches, beyond the from/size window.

POST /twitter/tweet/_search?scroll=1m
{
    "size": 100,
    "query": {
        "match" : {
            "title" : "elasticsearch"
        }
    }
}
POST  /_search/scroll 
{
    "scroll" : "1m", 
    "scroll_id" : "DXF1ZXJ5NjU1QQ==" 
}
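Each scroll response returns a fresh _scroll_id along with the next batch of hits; keep POSTing to /_search/scroll with the most recent _scroll_id until the hits array comes back empty. The scroll context is kept alive only for the window given by the scroll parameter (1m here).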

Clearing a scroll context:

DELETE /_search/scroll
{
    "scroll_id" : "DXF1ZXJ5NjU1QQ=="
}
DELETE /_search/scroll/_all   
4. Deleting documents

1) Delete by document id

DELETE /index/type/1

2) Delete by query

POST /index/type/_delete_by_query
{
  "query": { 
    "match": {
      "message": "some message"
    }
  }
}

3) Bulk delete by id

POST /index/type/_bulk
{ "delete" : {"_id" : "1" } }
{ "delete" : {"_id" : "2" } }
5. Updating documents

1) Replace a whole document by id (updates it if it exists, creates it otherwise)

PUT /index/type/1
{
    "counter" : 1,
    "tags" : ["red"]
}

2) Partially update fields by document id

POST /index/type/1/_update
{
    "doc" : {
        "name" : "new_name"
    }
}

3) Update by query

POST /index/type/_update_by_query
{
  "script": {
    "source": "ctx._source.likes++",
    "lang": "painless"
  },
  "query": {
    "term": {
      "user": "kimchy"
    }
  }
}

4) Bulk update by id

POST /index/type/_bulk
{ "update" : {"_id" : "1"} }
{ "doc" : {"field2" : "value2"} }
{ "update" : {"_id" : "2"} }
{ "doc" : {"field3" : "value3"} }
6. Bulk operations (create, delete, update)
POST _bulk
{ "create" : { "_index" : "test", "_type" : "type1", "_id" : "1" } }
{ "field1" : "value1" }
{ "delete" : { "_index" : "test", "_type" : "type1", "_id" : "2" } }
{ "update" : {"_id" : "1", "_type" : "type1", "_index" : "test"} }
{ "doc" : {"field2" : "value2"} }
7. Aggregations
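The fragments below are the aggs portion of a search request body; they are normally sent in a _search call with "size": 0 so that only aggregation results (and no hits) come back, roughly like this:

GET /index/type/_search
{
  "size": 0,          // return aggregation results only, no hits
  "aggs": { ... }     // one of the aggs bodies shown below
}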

1) Count of documents per station (site_id)

"aggs" : {
        "site_id" : {
            "terms" : { 
			"field" : "site_id.keyword",
			"size": 10,
			"order": {
		    "_term": "asc"
			}
		}
    }
 }

2) Count of collected records per day

"aggs": {
    "dataTime": {
      "date_histogram": {
        "field": "dataTime",
        "interval": "day",(year/hour/week/month/minute/10m...)
 		"format": "yyyy-MM-dd", 
        "min_doc_count": 1
        "order": {
          "_count": "asc"
        }
      }
    }
  }

3) For each station, the per-day count of records where PM10 >= 500

"aggs": {
    "site_id": {
      "terms": {
        "field": "site_id.keyword",
        "size": 1000,
        "min_doc_count": 1
      },
     "aggs": {
       "dataTime": {
         "date_histogram": {
           "field": "dataTime",
           "interval": "day",
           "format": "yyyy-MM-dd",
           "min_doc_count": 1
         },
         "aggs": {
            "PM10":{
             "range": {
               "field": "PM10_data.value",
               "ranges": [
                 {
                   "from": 500
                 }
               ]
             }
           }
         }
       }
     }
    }
  }  

4) For each station, the PM25 average, PM10 maximum, and TSP minimum

"aggs": {
    "site_id":{
      "terms": {
        "field": "site_id.keyword",
        "min_doc_count":1,
        "size": 10
      },
      "aggs": {
          "PM25": {
            "avg": {
              "field": "PM25_datavalue"
            }
          },
          "PM10": {
            "max": {
              "field": "PM10_datavalue"
            }
          },
          "TSP": {
            "min": {
              "field": "TSP_datavalue"
            }
          }
      }
    }
  }

5) Counts grouped by the first four characters of the classification number

"aggs" : {
      "classnum" : {
          "terms" : {
              "script" : {
                  "inline": "doc['classnum.keyword'].value.substring(0,4)",
                  "lang": "painless"
              }
          }
      }
    }
8. reindex (migrating data)

Copies documents from one index into another, optionally from a remote cluster. For a remote reindex, add a line like the following to elasticsearch.yml: reindex.remote.whitelist: otherhost:9200, another:9200, 127.0.10.*:9200

POST _reindex
{
  "source": {
    "remote": {
      "host": "http://otherhost:9200",   // remote host
      "username": "user",                // username
      "password": "pass",                // password
      "socket_timeout": "1m",            // socket read timeout
      "connect_timeout": "10s"           // connection timeout
    },
    "index": "source",                   // source index name
    "type": "tweet",                     // source type name
    "size": 100,                         // batch size per bulk request, default 1000
    "query": {                           // filter: migrate only the documents you want
      "match": {
        "test": "data"
      }
    }
  },
  "dest": {
    "index": "dest"                      // destination index name
  },
  "script": {                            // rename a field on the fly
    "source": "ctx._source.tag = ctx._source.remove(\"flag\")"
  }
}
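On a large index reindex can take a while; it can be started asynchronously and monitored through the task management API, roughly as follows (the body is the same as above):

POST _reindex?wait_for_completion=false
{ ... }

The progress of running reindex tasks can then be checked with:

GET _tasks?detailed=true&actions=*reindex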
9. ?refresh

Controls whether the index is refreshed (so the change becomes visible to search) right after write operations such as index, update, and delete.
1) ?refresh / ?refresh=true: refresh immediately, so the changed document shows up in search results right away
2) ?refresh=wait_for: wait for the next periodic refresh (index.refresh_interval, 1s by default) before returning
3) ?refresh=false: the default; do not force a refresh
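For example, to index a document and have it visible to search once the call returns (the index, type, and document content are placeholders):

PUT /index/type/1?refresh=wait_for
{
    "user" : "kimchy"
}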

10. Term Vectors

Returns term-level information and statistics for the fields of a particular document.

Step 1: enable term vectors in the mapping
"fullname": {
  "type": "text",
  "term_vector": "with_positions_offsets_payloads",
  "analyzer" : "fulltext_analyzer"
}
Step 2: request term vectors for a document
GET /twitter/tweet/1/_termvectors
{
    "term_statistics" : true,      // whether to return term statistics
    "field_statistics" : true,     // whether to return field statistics
    "positions": true,
    "offsets": true,
    "filter" : {                   // filter which terms are returned
      "max_num_terms" : 3,
      "min_term_freq" : 1,
      "min_doc_freq" : 1
    }
}

Sample response:

{
  "_index": "twitter",
  "_type": "tweet",
  "_id": "1",
  "_version": 2,
  "found": true,
  "took": 0,
  "term_vectors": {
    "fullname": {
      "field_statistics": {
        "sum_doc_freq": 5,//每个文档fullname分词后的个数(去重)之和
        "doc_count": 3,//当前索引含有fullname字段的文档个数
        "sum_ttf": 6//每个文档fullname分词后的个数(不去重)之和
      },
      "terms": {
        "doe": {
          "doc_freq": 2,//每个文档fullname分词后含有doe的个数(去重)之和
          "ttf": 2,//每个文档fullname分词后含有doe的个数(不去重)之和
          "term_freq": 1,//当前文档fullname分词后含有doe的个数(不去重)
          "tokens": [
            {
              "position": 1,
              "start_offset": 5,//偏移量(出现的开始位置)
              "end_offset": 8,
              "payload": "d29yZA=="
            }
          ],
          "score": 1.287682
        }
      }
    }
  }
}

II. Logstash

The Logstash processing pipeline: input -> decode -> filter -> encode -> output
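A minimal pipeline skeleton looks like this (the stdin/stdout plugins are only placeholders for illustration):

input {
  stdin { }                        # where events come from
}
filter {
  # transform / enrich events here
}
output {
  stdout { codec => rubydebug }    # where events go; rubydebug prints each event for debugging
}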

1. Syncing log files to ES

Logstash can tail log files and append any newly written content to ES automatically.
Useful links:
1) Ready-made grok patterns that can be used directly:
https://github.com/logstash-plugins/logstash-patterns-core/blob/master/patterns/grok-patterns
2) Handy for debugging, to check that a grok pattern is correct:
http://grokdebug.herokuapp.com/

File input / multiline codec options

path: file path(s) to watch; multiple paths are allowed
multiline: the multi-line event codec
charset: character encoding
max_bytes: maximum number of bytes per event
max_lines: maximum number of lines per event, default 500
pattern: the regular expression to match (required)
patterns_dir: directory of additional pattern definitions
negate: whether to negate the pattern match, default false
what: whether non-matching content is merged with the previous or the next event; "previous" or "next" (required)

Mutate filter

remove_field: fields to remove

Elasticsearch output

hosts: ES host(s)
document_type: type name
index: index name
user: username
password: password

Example: Logstash's own log files

The files live under /var/log/logstash/ and look like this:
[2017-11-07T08:48:28,322][INFO ][logstash.modules.scaffold] Initializing module {:module_name=>"fb_apache", :directory=>"/usr/share/logstash/modules/fb_apache/configuration"}
[2017-11-07T08:48:29,037][INFO ][logstash.outputs.elasticsearch] Elasticsearch pool URLs updated {:changes=>{:removed=>[], :added=>[http://elastic:xxxxxx@192.168.xxx.xxx:9205/]}}

Format: [date][log level][source component] message body

Field names to capture:       [datetime][level][component]content
Corresponding grok patterns:  [TIMESTAMP_ISO8601][LOGLEVEL][DATA]GREEDYDATA
The full pattern is shown below (SPACE matches whitespace):
[%{TIMESTAMP_ISO8601:datetime}%{SPACE}]%{SPACE}[%{LOGLEVEL:level}%{SPACE}]%{SPACE}[%{DATA:component}%{SPACE}] %{GREEDYDATA:content}

Config:
input {
  file {
    path => ["/var/log/logstash/*"]
    tags => ["testlog"]
    codec => multiline {
      pattern => "\[%{TIMESTAMP_ISO8601:datetime}%{SPACE}\]%{SPACE}\[%{LOGLEVEL:level}%{SPACE}\]%{SPACE}\[%{DATA:component}%{SPACE}\] %{GREEDYDATA:content}"
      negate => "true"
      what => "previous"
    }
  }
}
filter {
  if "testlog" in [tags] {
    mutate {
      remove_field => ["@version", "@timestamp", "host", "path", "message"]
    }
  }
}
output {
  if "testlog" in [tags] {
    elasticsearch {
      hosts => [ "127.0.0.1:9205" ]
      manage_template => false
      document_type => "project"
      index => "testlog"
      user => "elastic"
      password => "changeme"
    }
  }
}
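Note: the multiline codec only uses the pattern to decide how physical lines are merged into one event; it does not extract the named fields. To actually produce the datetime/level/component/content fields shown in the stored document below, a grok filter with the same pattern is typically added before the mutate that removes the message field. A minimal sketch, assuming the same pattern:

filter {
  if "testlog" in [tags] {
    grok {
      match => { "message" => "\[%{TIMESTAMP_ISO8601:datetime}%{SPACE}\]%{SPACE}\[%{LOGLEVEL:level}%{SPACE}\]%{SPACE}\[%{DATA:component}%{SPACE}\] %{GREEDYDATA:content}" }
    }
  }
}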
Document structure stored in ES:
{
  "hits": {
    "total": 2731,
    "max_score": null,
    "hits": [
      {
        "_index": "testlog",
        "_type": "project",
        "_id": "AV-zx-3xmKYPNFP-kf3c",
        "_score": null,
        "_source": {
          "datetime": "2017-11-13T13:09:04,819",
          "component": "org.apache.kafka.clients.consumer.internals.ConsumerCoordinator",
          "level": "WARN",
          "content": "Auto offset commit failed for group logstash: Commit cannot be completed since the group has already rebalanced and assigned the partitions to another member. This means that the time between subsequent calls to poll() was longer than the configured session.timeout.ms, which typically implies that the poll loop is spending too much time message processing. You can address this either by increasing the session timeout or by reducing the maximum size of batches returned in poll() with max.poll.records.",
          "tags": [
            "testlog"
          ]
        }
      }
    ]
  }
}


2. Syncing Kafka data to ES

The kafka input consumes JSON messages from a topic; the mutate filter flattens the nested fields before the events are indexed into ES.

input {
  kafka {
    bootstrap_servers => "127.0.0.1:9092"   # Kafka broker(s)
    topics => ["airdatainfo"]               # topic(s) to consume
    tags => ["batchdata2"]
    enable_auto_commit => "true"            # commit offsets automatically
    group_id => "logstash"                  # consumer group id
    auto_offset_reset => "earliest"         # start from the earliest offset when no committed offset exists
    codec => "json"                         # parse each message as JSON
  }
}
filter {
  if "batchdata2" in [tags] {
    mutate {
      # flatten nested fields into top-level fields
      rename => ["[data][AQI_data][grade]", "AQI_datagrade"]
      rename => ["[data][AQI_data][name]", "AQI_dataname"]
      rename => ["[data][AQI_data][value]", "AQI_datavalue"]
      rename => ["[dev_id]", "device_id"]
      remove_field => ["data"]
      remove_field => ["@version"]
      remove_field => ["@timestamp"]
    }
  }
}
output {
  if "batchdata2" in [tags] {
    elasticsearch {
      hosts => [ "127.0.0.1:9205" ]
      manage_template => false
      document_type => "project"
      document_id => "%{uuid}"              # use the message's uuid field as the document _id
      index => "air_logstash"
      user => "elastic"
      password => "changeme"
    }
  }
}
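With document_id => "%{uuid}" each event is indexed under the value of its uuid field, so re-consuming the same Kafka messages overwrites the existing documents instead of creating duplicates (this assumes every message carries a uuid field).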

