前言
收集大量的日志信息之后,把这些日志存放在哪里?才能对其日志内容进行搜素呢?MySQL?
如果MySQL里存储了1000W条这样的数据,每条记录的details字段有128个字。
用户想要查询details字段包含“ajax”这个关键词的记录。
MySQL执行
select * from logtable where details like "%ajax%";
每次执行这条SQL语句,都需要逐一查询logtable中每条记录,最头痛的是找到这条记录之后,每次还要对这条记录中details字段里的文本内容进行全文扫描。
判断这个当前记录中的details字段是的内容否包含 “ajax”?有可能会查询 10000w*128次.
如果用户想搜素 “ajax”拼错了拼成了“ajxa”,这个sql无法搜素到用户想要的信息。因为不支持尝试把用户输入的错别字"ajxa"拆分开使用‘a‘,‘j‘,‘x‘,'a' 去尽可能多的匹配我想要的信息。
所以想要支持搜素details字段的Text内容的情况下,把海量的日志信息存在MySQL中是不太合理的。
Elasticsearch简介
1.倒排索引
倒排索引是一种索引数据结构:从文本数据内容中提取出不重复的单词进行分词,每1个单词对应1个ID对单词进行区分,还对应1个该单词在那些文档中出现的列表 把这些信息组建成索引。
倒排索引还记录了该单词在文档中出现位置、频率(次数/TF)用于快速定位文档和对搜素结果进行排序。
(出现在文档1,<11位置>频率1次)
(1,<11>,1),(2,<7>,1),(3,<3,9>,2)
2.全文检索
全文检索:把用户输入的关键词也进行分词,利用倒排索引,快速锁定关键词出现在那些文档。
说白了就是根据value查询key(根据文档中内容关键字,找到该该关键字所在的文档的)而非根据key查询value。
Lucene是apache软件基金会4 jakarta项目组的一个java子项目,是一个开放源代码的全文检索引擎JAR包。帮助我我们实现了以上的需求。
lucene实现倒排索引之后,那么海量的数据如何分布式存储?如何高可用?集群节点之间如何管理?这是Elasticsearch实现的功能。
常说的ELK是Elasticsearch(全文搜素)+Logstash(内容收集)+Kibana(内容展示)三大开源框架首字母大写简称。
本文主要简单的介绍Elaticsearch,Elasticsearch是一个基于Lucene的分布式、高性能、可伸缩的搜素和分析系统,它提供了RESTful web API。
Elasticsearch安装
官网下载
ES的版本和Kibana的版本必须一致,官网下载比较慢,还好有好心人。
系统配置
vm.max_map_count = 655360
vim /etc/sysctl.conf
权限
chown -R elsearch:elsearch /data/elastic-search
安全
/etc/security/limits.conf
[root@zhanggen config]# java -version
openjdk version "1.8.0_102"
OpenJDK Runtime Environment (build 1.8.0_102-b14)
OpenJDK 64-Bit Server VM (build 25.102-b14, mixed mode)
[root@zhanggen config]# cat /etc/security/limits.conf
* soft nofile 65536
* hard nofile 131072
* soft nproc 4096
* hard nproc 4096
es配置文件
# ======================== Elasticsearch Configuration =========================#
# NOTE: Elasticsearch comes with reasonable defaults for most settings.
# Before you set out to tweak and tune the configuration,make sure you
# understand what are you trying to accomplish and the consequences.
#
# The primary way of configuring a node is via this file. This template lists
# the most important settings you may want to configure for a production cluster.
#
# Please consult the documentation for further information on configuration options:
# https://www.elastic.co/guide/en/elasticsearch/reference/index.html
#
# ---------------------------------- Cluster -----------------------------------
#
# Use a descriptive name for your cluster:
#
cluster.name: my-application
#
# ------------------------------------ Node ------------------------------------
#
# Use a descriptive name for the node:
#
node.name: node-1#
# Add custom attributes to the node:
#
#node.attr.rack: r1
#
# ----------------------------------- Paths ------------------------------------
#
# Path to directory where to store the data (separate multiple locations by comma):
#
path.data: /data/elastic-search/data/
#
# Path to log files:
#
path.logs: /data/elastic-search/log/
#
#
# ----------------------------------- Memory -----------------------------------
#
# Lock the memory on startup:
#
#bootstrap.memory_lock: false
#
# Make sure that the heap size is set to about half the memory available
# on the system and that the owner of the process is allowed to use this
# limit.
#
# Elasticsearch performs poorly when the system is swapping the memory.
#
# ---------------------------------- Network -----------------------------------
#
# Set the bind address to a specific IP (IPv4 or IPv6):
#
network.host:0.0.0.0#
# Set a custom port for HTTP:
#
http.port:9200#
# For more information,consult the network module documentation.
#
# --------------------------------- Discovery ----------------------------------
#
# Pass an initial list of hosts to perform discovery when this node is started:
# The default list of hosts is["127.0.0.1", "[::1]"]
#
#discovery.seed_hosts: ["host1", "host2"]
#
# Bootstrap the cluster using an initial set of master-eligible nodes:
#
cluster.initial_master_nodes: ["node-1"]
#
# For more information, consult the discovery and cluster formation module documentation.
#
# ---------------------------------- Gateway -----------------------------------
#
# Block initial recovery after a full cluster restart until N nodes are started:
#
#gateway.recover_after_nodes: 3
#
# For more information, consult the gateway module documentation.
#
# ---------------------------------- Various -----------------------------------
#
# Require explicit names when deleting indices:
#
#action.destructive_requires_name: true
/elasticsearch-7.3.2/config/elasticsearch.yml
启动
[elsearch@zhanggen /]$ ./elasticsearch-7.3.2/bin/elasticsearch
访问
Elasticsearch使用
关于Elasticsearch的使用都是基于RESTful风格的API进行的。
1.查看健康状态
http://192.168.56.135:9200/_cat/health?v
epoch timestamp cluster status node.total node.data shards pri relo init unassign pending_tasks max_task_wait_time active_shards_percent1590655194 08:39:54 my-application green
2.创建索引
{"acknowledged": true,"shards_acknowledged": true,"index": "web"}
3.删除索引
{"acknowledged": true}
4.插入数据
request body
{"name":"张根","age":22,"marrid":"false"}
response body
{"_index": "students","_type": "go","_id": "cuWEWnIBWnQK6MVivzvO","_version": 1,"result": "created","_shards": {"total": 2,"successful": 1,"failed": 0},"_seq_no": 0,"_primary_term": 1}
ps:也可以使用PUT方法,但是需要传入id
reqeust body
{"name":"李渊","age":1402,"marrid":"true"}
response body
{"_index": "students","_type": "go","_id": "2","_version": 1,"result": "created","_shards": {"total": 2,"successful": 1,"failed": 0},"_seq_no": 1,"_primary_term": 1}
5.查询all
[root@zhanggen zhanggen]# curl -XGET 'localhost:9200/students/go/_search?pretty' -H 'content-Type:application/json' -d '{"query": { "match_all":{}}}'
{"took" : 211,"timed_out" : false,"_shards": {"total" : 1,"successful" : 1,"skipped" : 0,"failed" : 0},"hits": {"total": {"value" : 6,"relation" : "eq"},"max_score" : 1.0,"hits": [
{"_index" : "students","_type" : "go","_id" : "cuWEWnIBWnQK6MVivzvO","_score" : 1.0,"_source": {"name" : "张根","age" : 22,"marrid" : "false"}
},
{"_index" : "students","_type" : "go","_id" : "c-WPWnIBWnQK6MViOzt1","_score" : 1.0,"_source": {"name" : "张百忍","age" : 3200,"marrid" : "true"}
},
{"_index" : "students","_type" : "go","_id" : "dOWPWnIBWnQK6MVimTsg","_score" : 1.0,"_source": {"name" : "李渊","age" : 1402,"marrid" : "true"}
},
{"_index" : "students","_type" : "go","_id" : "deWQWnIBWnQK6MViazuu","_score" : 1.0,"_source": {"name" : "姜尚","age" : 5903,"marrid" : "fale"}
},
{"_index" : "students","_type" : "go","_id" : "duWSWnIBWnQK6MViXDtD","_score" : 1.0,"_source": {"name" : "孛儿只斤.铁木真","age" : 814,"marrid" : "true"}
},
{"_index" : "students","_type" : "go","_id" : "2","_score" : 1.0,"_source": {"query": {"match": {"name" : "张根"}
}
}
}
]
}
}
返回
6.分页查询 (from, size)
from 偏移,默认为0,size 返回的结果数,默认为10
[root@zhanggen zhanggen]# curl -XGET 'localhost:9200/students/go/_search?pretty' -H 'content-Type:application/json' -d '{"query": { "match_all": {} },"from":1,"size":2}'
返回
{"took" : 1,"timed_out" : false,"_shards": {"total" : 1,"successful" : 1,"skipped" : 0,"failed" : 0},"hits": {"total": {"value" : 6,"relation" : "eq"},"max_score" : 1.0,"hits": [
{"_index" : "students","_type" : "go","_id" : "c-WPWnIBWnQK6MViOzt1","_score" : 1.0,"_source": {"name" : "张百忍","age" : 3200,"marrid" : "true"}
},
{"_index" : "students","_type" : "go","_id" : "dOWPWnIBWnQK6MVimTsg","_score" : 1.0,"_source": {"name" : "李渊","age" : 1402,"marrid" : "true"}
}
]
}
}
View Code
7.模糊查询字段中包含某些关键词
[root@zhanggen zhanggen]# curl -XGET 'localhost:9200/students/go/_search?pretty' -H 'content-Type:application/json' -d '{"query": {"term": {"name":"张"}}}'
返回
{"took" : 155,"timed_out" : false,"_shards": {"total" : 1,"successful" : 1,"skipped" : 0,"failed" : 0},"hits": {"total": {"value" : 2,"relation" : "eq"},"max_score" : 0.8161564,"hits": [
{"_index" : "students","_type" : "go","_id" : "cuWEWnIBWnQK6MVivzvO","_score" : 0.8161564,"_source": {"name" : "张根","age" : 22,"marrid" : "false"}
},
{"_index" : "students","_type" : "go","_id" : "c-WPWnIBWnQK6MViOzt1","_score" : 0.7083998,"_source": {"name" : "张百忍","age" : 3200,"marrid" : "true"}
}
]
}
}
View Code
8.range范围查找
范围查询接收以下参数:
gte:大于等于
gt:大于
lte:小于等于
lt:小于
boost:设置查询的推动值(boost),默认为1.0
curl -XGET 'localhost:9200/students/go/_search?pretty' -H 'content-Type:application/json' -d '{"query":{"range":{"age":{"gt":"18"}}}}'
{"took" : 11,"timed_out" : false,"_shards": {"total" : 1,"successful" : 1,"skipped" : 0,"failed" : 0},"hits": {"total": {"value" : 5,"relation" : "eq"},"max_score" : 1.0,"hits": [
{"_index" : "students","_type" : "go","_id" : "cuWEWnIBWnQK6MVivzvO","_score" : 1.0,"_source": {"name" : "张根","age" : 22,"marrid" : "false"}
},
{"_index" : "students","_type" : "go","_id" : "c-WPWnIBWnQK6MViOzt1","_score" : 1.0,"_source": {"name" : "张百忍","age" : 3200,"marrid" : "true"}
},
{"_index" : "students","_type" : "go","_id" : "dOWPWnIBWnQK6MVimTsg","_score" : 1.0,"_source": {"name" : "李渊","age" : 1402,"marrid" : "true"}
},
{"_index" : "students","_type" : "go","_id" : "deWQWnIBWnQK6MViazuu","_score" : 1.0,"_source": {"name" : "姜尚","age" : 5903,"marrid" : "fale"}
},
{"_index" : "students","_type" : "go","_id" : "duWSWnIBWnQK6MViXDtD","_score" : 1.0,"_source": {"name" : "孛儿只斤.铁木真","age" : 814,"marrid" : "true"}
}
]
}
}
返回
安装Kibana
kibana是针对elasticsearch操作及数据展示的工具,支持中文。安装时请确保kibama和Elasticsearch的版本一致。
配置文件
[root@zhanggen config]# cat ./kibana.yml|grep -Ev '^$|#'server.port:5601server.host:"0.0.0.0"elasticsearch.hosts: ["http://localhost:9200"]
elasticsearch.username:"kibana"elasticsearch.password:"xxxxxxx"i18n.locale:"zh-CN"[root@zhanggen config]#
启动
[root@zhanggen bin]# ./kibana --allow-root
kibana使用
管理-----》kibana索引模式-----》创建索引模式
ps:
Elasticsearch果然是全文检索, 666。果然对用户输入的搜素关键词,进行了分词。我输入了(访问日志为例)把包含“问”的文档也搜素出来了!
而不是仅仅搜素内容包含“访问日志为例”这个1个词的文档。
Go操作Elasticsearch
注意下载与你的ES相同版本的client,例如我们这里使用的ES是7.2.1的版本,那么我们下载的client也要与之对应为github.com/olivere/elastic/v7。
使用go.mod来管理依赖下载指定版本的第三库:
module go相关模块/elasticsearch
go 1.13
require github.com/olivere/elastic/v7 v7.0.4
代码
package main
import (
"context"
"fmt"
"github.com/olivere/elastic/v7"
)
// Elasticsearch demo
type Person struct {
Name string `json:"name"`
Age int `json:"age"`
Married bool `json:"married"`
}
func main() {
client, err := elastic.NewClient(elastic.SetURL("http://192.168.56.135:9200/"))
if err != nil {
// Handle error
panic(err)
}
fmt.Println("connect to es success")
p1 := Person{Name: "曹操", Age: 155, Married: true}
put1, err := client.Index().
Index("students").Type("go").
BodyJson(p1).
Do(context.Background())
if err != nil {
// Handle error
panic(err)
}
fmt.Printf("Indexed user %s to index %s, type %s\n", put1.Id, put1.Index, put1.Type)
}