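Goal
Learn the ELK Stack. The end goals are:
- View application logs online
- Monitor the system's running status
- Send notifications (alerts) when something abnormal happens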
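The ELK Stack consists of
E: Elasticsearch, an open-source, distributed, RESTful search and analytics engine that can store, search, and analyze large volumes of data; it is typically used where complex queries are needed
L: Logstash
K: Kibana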
After unpacking and starting Elasticsearch, it is reachable at http://localhost:9200 but not at http://ip:9200. To fix this:
Edit the elasticsearch-x.x.x\config\elasticsearch.yml file
vi config/elasticsearch.yml
# set network.host: 0.0.0.0
ElasticSearch
Basic concepts
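- Near Realtime (NRT): search is near real-time rather than instantaneous; a newly indexed document normally becomes searchable after about one second
- Cluster: a collection of one or more Nodes
- Node: a single server in the cluster that stores data and takes part in the cluster's indexing and search
- Index: a collection of documents that have similar characteristics
- Type: deprecated as of Elasticsearch 6.0.0. Earlier versions allowed documents of several different Types in one Index; from 6.0.0 on, a newly created Index may contain only one Type
- Document: the basic unit of information that can be indexed, expressed in JSON. An Index can hold any number of documents. Note that although a document lives in exactly one Index, it must still be assigned a Type
- Shards & Replicas: an Index can hold more data than the hardware of a single Node is able to store or process
- Shards: Elasticsearch can split an Index's data into multiple pieces called shards. The number of shards can be set when the Index is created. Each shard is in itself a fully functional Index hosted on a Node in the Cluster
- Two main reasons for sharding:
horizontal scaling
higher parallelism and throughput
- Replicas: also called replica shards, a recommended and effective safeguard against a Node or shard going offline or failing. Elasticsearch keeps one or more copies of each shard. The number of primary shards and replica shards can be set when the Index is created; the number of replicas can be changed at any time, but the number of primary shards cannot be changed once the Index has been created
- Main reasons for replication:
high availability (note that a replica must not sit on the same Node as its primary)
work against the same Index can run on its replicas across several Nodes in parallel, which raises throughput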
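The reason the primary shard count is fixed can be sketched in a few lines. Elasticsearch picks a document's shard as a hash of its routing value (the document ID by default) modulo the number of primary shards. The sketch below assumes `zlib.crc32` as a stand-in hash (Elasticsearch uses its own hash function), and `route_to_shard` is a hypothetical helper name, not an Elasticsearch API.

```python
import zlib


def route_to_shard(routing_key: str, num_primary_shards: int) -> int:
    """Pick a primary shard as hash(routing_key) % num_primary_shards."""
    # crc32 is only a deterministic stand-in for Elasticsearch's real hash.
    return zlib.crc32(routing_key.encode("utf-8")) % num_primary_shards


doc_ids = [str(i) for i in range(1, 11)]
placement_5 = {d: route_to_shard(d, 5) for d in doc_ids}
placement_6 = {d: route_to_shard(d, 6) for d in doc_ids}

# Documents whose shard changes if the primary count went from 5 to 6.
moved = [d for d in doc_ids if placement_5[d] != placement_6[d]]

print(placement_5)
print(f"{len(moved)} of {len(doc_ids)} documents would map to a different shard")
```

Because the shard number depends on the primary count, changing that count would leave existing documents on shards where lookups no longer expect them. Replicas are plain copies, so their number can change freely.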
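Installing Elasticsearch
- Java 8 is the minimum JDK version; Oracle JDK version 1.8.0_131 is recommended
# Install on Linux
curl -L -O https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-6.7.1.tar.gz
tar -xvf elasticsearch-6.7.1.tar.gz
cd elasticsearch-6.7.1/bin
./elasticsearch # starts a cluster with a single node
# Install on macOS
brew install elasticsearch
# The Windows MSI installer configures the cluster and node names through a UI; on Linux they can be set on the command line
./elasticsearch -Ecluster.name=my_cluster_name -Enode.name=my_node_name
- On Windows, Elasticsearch can also be installed from the zip download; version 6.7 already bundles X-Pack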
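Exploring the Elasticsearch Cluster
Purpose
- Inspect the cluster, nodes, and indices: status, health, and metadata
- Manage the cluster, nodes, and indices: status, health, and metadata
- Perform CRUD operations
- Perform advanced operations such as paging, sorting, filtering, aggregations, and scripting
# Cluster health is GREEN, YELLOW, or RED
GET /_cat/health?v
# List the nodes
GET /_cat/nodes?v
On a single node, an index's status is yellow: an index is created with one replica by default, and with fewer than two nodes that replica cannot be allocated.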
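The three health colors follow a simple rule, sketched below as a local model rather than a call to a live cluster. The function `cluster_health` and the tuple layout are illustrative assumptions; real clusters report this through `GET /_cat/health`.

```python
def cluster_health(shards):
    """Classify health from a list of (is_primary, is_assigned) tuples.

    red:    at least one primary shard is unassigned
    yellow: all primaries are assigned, but some replica is not
    green:  every primary and replica is assigned
    """
    if any(is_primary and not assigned for is_primary, assigned in shards):
        return "red"
    if any(not assigned for _, assigned in shards):
        return "yellow"
    return "green"


# One primary plus one replica on a single node: the replica has nowhere to go.
single_node = [(True, True), (False, False)]
# The same index once a second node has joined.
two_nodes = [(True, True), (False, True)]

print(cluster_health(single_node))  # yellow
print(cluster_health(two_nodes))    # green
```

On a single-node development setup, setting the index's `number_of_replicas` to 0 removes the unassigned replica, so the index reports green.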
Index operations
<HTTP Verb> /<Index>/<Type>/<ID>
Batch operations
POST /customer/_doc/_bulk?pretty
{"index":{"_id":"1"}}
{"name": "John Doe" }
{"index":{"_id":"2"}}
{"name": "Jane Doe" }
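The bulk body is newline-delimited JSON: one action line, then one document line, with a newline after the last line as well. A minimal sketch of building such a body in Python follows; `build_bulk_body` is a hypothetical helper, and sending the body to `/customer/_doc/_bulk` is left out so that the example runs without a cluster.

```python
import json


def build_bulk_body(docs_by_id):
    """Return an NDJSON bulk body with one 'index' action per document."""
    lines = []
    for doc_id, doc in docs_by_id.items():
        lines.append(json.dumps({"index": {"_id": doc_id}}))  # action line
        lines.append(json.dumps(doc))                         # source line
    # The bulk API expects the body to end with a newline.
    return "\n".join(lines) + "\n"


body = build_bulk_body({
    "1": {"name": "John Doe"},
    "2": {"name": "Jane Doe"},
})
print(body, end="")
```

Straight ASCII quotes matter here: typographic quotes, which often creep in when copying from a web page, are not valid JSON and the request is rejected.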
Loading data
#sample
curl -H "Content-Type: application/json" -XPOST "localhost:9200/bank/_doc/_bulk?pretty&refresh" --data-binary "@accounts.json"
curl "localhost:9200/_cat/indices?v"
Querying data (Getting Started -> Exploring Your Data, Executing Searches)
GET /_cat/health?v
GET /_cat/nodes?v
GET /_cat/indices?v
#sample 1
GET /bank/_search?q=*&sort=account_number:asc&pretty
#sample 2
GET /bank/_search
{
"query": { "match_all": {} },
"sort": [
{ "account_number": "asc" }
]
}
# Return only the fields of interest
# from: offset of the first hit, default 0; size: number of hits to return, default 10
GET /bank/_search
{
"query": { "match_all": {} },
"_source": ["account_number", "balance"],
"from": 0,
"size": 1
}
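Paging with `from` and `size` comes down to a small piece of arithmetic, sketched here. The helper names `page_params` and `search_body` are assumptions for illustration; the resulting dict is what would be sent as the body of `GET /bank/_search`.

```python
def page_params(page, page_size=10):
    """Translate a 1-based page number into from/size search parameters."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be at least 1")
    return {"from": (page - 1) * page_size, "size": page_size}


def search_body(page, page_size=10):
    """Build a match_all search body that returns only two source fields."""
    body = {
        "query": {"match_all": {}},
        "_source": ["account_number", "balance"],
    }
    body.update(page_params(page, page_size))
    return body


print(page_params(1))       # {'from': 0, 'size': 10}
print(page_params(3, 20))   # {'from': 40, 'size': 20}
print(search_body(1, 1))
```

By default `from + size` may not exceed 10000 (the `index.max_result_window` setting), so this style of paging suits the first pages of a result rather than a full export.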
# Find documents whose field matches a value
GET /bank/_search
{
"query": { "match": { "account_number": 20 } }
}
#This example returns all accounts containing the term "mill" or "lane" in the address:
GET /bank/_search
{
"query": { "match": { "address": "mill lane" } }
}
#This example is a variant of match (match_phrase) that returns all accounts containing the phrase "mill lane" in the address:
GET /bank/_search
{
"query": { "match_phrase": { "address": "mill lane" } }
}
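The difference between `match` and `match_phrase` can be imitated locally. This is a deliberately simplified model, assuming lowercase whitespace tokenization instead of a real analyzer and ignoring scoring; the addresses are made-up sample values.

```python
def tokenize(text):
    """Very rough stand-in for an analyzer: lowercase, split on whitespace."""
    return text.lower().split()


def match_any(field_value, query):
    """Like match: true if the field contains at least one query term."""
    terms = set(tokenize(query))
    return any(token in terms for token in tokenize(field_value))


def match_phrase(field_value, query):
    """Like match_phrase: query terms must appear adjacent and in order."""
    tokens, phrase = tokenize(field_value), tokenize(query)
    n = len(phrase)
    return n > 0 and any(
        tokens[i:i + n] == phrase for i in range(len(tokens) - n + 1)
    )


addresses = ["990 Mill Road", "198 Mill Lane", "671 Bristol Street", "789 Madison Lane"]

print([a for a in addresses if match_any(a, "mill lane")])
# ['990 Mill Road', '198 Mill Lane', '789 Madison Lane']
print([a for a in addresses if match_phrase(a, "mill lane")])
# ['198 Mill Lane']
```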
#This example composes two match queries and returns all accounts containing "mill" and "lane" in the address:
GET /bank/_search
{
"query": {
"bool": {
"must": [
{ "match": { "address": "mill" } },
{ "match": { "address": "lane" } }
]
}
}
}
#This example composes two match queries and returns all accounts containing "mill" or "lane" in the address:
GET /bank/_search
{
"query": {
"bool": {
"should": [
{ "match": { "address": "mill" } },
{ "match": { "address": "lane" } }
]
}
}
}
#This example composes two match queries and returns all accounts that contain neither "mill" nor "lane" in the address:
GET /bank/_search
{
"query": {
"bool": {
"must_not": [
{ "match": { "address": "mill" } },
{ "match": { "address": "lane" } }
]
}
}
}
#This example returns all accounts of anybody who is 40 years old but doesn’t live in ID(aho):
GET /bank/_search
{
"query": {
"bool": {
"must": [
{ "match": { "age": "40" } }
],
"must_not": [
{ "match": { "state": "ID" } }
]
}
}
}
#This example uses a bool query to return all accounts with balances between 20000 and 30000, inclusive. In other words, we want to find accounts with a balance that is greater than or equal to 20000 and less than or equal to 30000.
GET /bank/_search
{
"query": {
"bool": {
"must": { "match_all": {} },
"filter": {
"range": {
"balance": {
"gte": 20000,
"lte": 30000
}
}
}
}
}
}
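How the `bool` clauses combine can be sketched as plain predicates over in-memory documents. This models only matching, not scoring; `bool_matches` and the three sample accounts are assumptions made for illustration, not data from accounts.json.

```python
def bool_matches(doc, must=(), should=(), must_not=(), filters=()):
    """Return True if doc satisfies a bool query built from predicates."""
    if not all(pred(doc) for pred in must):
        return False
    if not all(pred(doc) for pred in filters):
        return False
    if any(pred(doc) for pred in must_not):
        return False
    # With no must or filter clause, at least one should clause has to match.
    if should and not must and not filters:
        return any(pred(doc) for pred in should)
    return True


accounts = [
    {"account_number": 1, "age": 40, "state": "ID", "balance": 25000},
    {"account_number": 2, "age": 40, "state": "TX", "balance": 31000},
    {"account_number": 3, "age": 35, "state": "CA", "balance": 20000},
]


def age_is_40(doc):
    return doc["age"] == 40


def lives_in_idaho(doc):
    return doc["state"] == "ID"


def balance_20k_to_30k(doc):
    return 20000 <= doc["balance"] <= 30000  # gte / lte are inclusive


forty_not_idaho = [a["account_number"] for a in accounts
                   if bool_matches(a, must=[age_is_40], must_not=[lives_in_idaho])]
in_balance_range = [a["account_number"] for a in accounts
                    if bool_matches(a, filters=[balance_20k_to_30k])]

print(forty_not_idaho)    # [2]
print(in_balance_range)   # [1, 3]
```

In Elasticsearch itself, `filter` and `must_not` clauses decide only whether a document matches; they do not contribute to its score.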
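#DisjunctionMaxQuery (dis_max): a document's score is the score of its best-matching clause plus tie_breaker times the scores of its other matching clauses; boost multiplies the result
GET /_search
{
"query": {
"dis_max" : {
"tie_breaker" : 0.7,
"boost" : 1.2,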
"queries" : [
{
"term" : { "age" : 34 }
},
{
"term" : { "age" : 35 }
}
]
}
}
}
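The roles of `tie_breaker` and `boost` become clearer with the scoring rule written out. This is a sketch of the arithmetic only, under the assumption that per-clause scores are already known; `dis_max_score` is a hypothetical helper and the input scores are invented.

```python
def dis_max_score(clause_scores, tie_breaker=0.0, boost=1.0):
    """Combine per-clause scores the way dis_max does.

    clause_scores holds one entry per sub-query: a float if the document
    matched that clause, None if it did not.
    """
    matching = [s for s in clause_scores if s is not None]
    if not matching:
        return None  # the document does not match the dis_max query at all
    best = max(matching)
    others = sum(matching) - best
    return boost * (best + tie_breaker * others)


# A document matching both clauses, scored 1.0 and 0.5 by them.
both = dis_max_score([1.0, 0.5], tie_breaker=0.7, boost=1.2)
# A document matching only the first clause.
one = dis_max_score([1.0, None], tie_breaker=0.7, boost=1.2)

print(both)  # approximately 1.62 = 1.2 * (1.0 + 0.7 * 0.5)
print(one)   # 1.2
```

With `tie_breaker` at 0 only the best clause counts; at 1 the clause scores are simply summed. In the request above the two `term` clauses test the same `age` field for different values, so a document with a single age can match at most one of them and `tie_breaker` has no visible effect there.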
Query DSL reference
Bool Query reference
Aggregations reference
DisjunctionMaxQuery: combines the result sets of several queries
#TODO
Function Score Query
Boosting Query
…
Response fields
As for the response, we see the following parts:
took – time in milliseconds for Elasticsearch to execute the search
timed_out – tells us if the search timed out or not
_shards – tells us how many shards were searched, as well as a count of the successful/failed searched shards
hits – search results
hits.total – total number of documents matching our search criteria
hits.hits – actual array of search results (defaults to first 10 documents)
hits.sort - sort key for results (missing if sorting by score)
hits._score and max_score - ignore these fields for now
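Reading these fields from a parsed response might look like the sketch below. The `sample_response` dict is a hand-written illustration shaped like a 6.x search response, not real output, and `summarize` is a hypothetical helper. It also accepts the object form of `hits.total` that 7.x versions return.

```python
sample_response = {
    "took": 3,
    "timed_out": False,
    "_shards": {"total": 5, "successful": 5, "skipped": 0, "failed": 0},
    "hits": {
        "total": 2,
        "max_score": None,
        "hits": [
            {"_index": "bank", "_type": "_doc", "_id": "0", "_score": None,
             "_source": {"account_number": 0, "balance": 16623}, "sort": [0]},
            {"_index": "bank", "_type": "_doc", "_id": "1", "_score": None,
             "_source": {"account_number": 1, "balance": 39225}, "sort": [1]},
        ],
    },
}


def summarize(response):
    """Pull the commonly used fields out of a search response dict."""
    hits = response["hits"]
    total = hits["total"]
    if isinstance(total, dict):  # 7.x shape: {"value": ..., "relation": ...}
        total = total["value"]
    return {
        "took_ms": response["took"],
        "timed_out": response["timed_out"],
        "failed_shards": response["_shards"]["failed"],
        "total": total,
        "sources": [hit["_source"] for hit in hits["hits"]],
    }


summary = summarize(sample_response)
print(summary["total"], summary["took_ms"], summary["sources"][0])
```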