If heaven doesn't give you what you want, it's not because you deserve something better; it's because you don't deserve it.
Preparation
Last time I covered how to install Elasticsearch; this time let's look at how to use it. There's no other preparation beyond the following:
Elasticsearch is up and running
Kibana is up and running
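Then open Kibana and go to Dev Tools. That way you can read the official documentation and practice queries right there at the same time.
Indices and documents
First we need to get two concepts straight: indices and documents. As a rough analogy, an index is like a database and a document is like one record in that database. That's good enough for now.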
Next, let's create an index called xiumu_user by running this command directly in Kibana:
PUT xiumu_user
ES responds with something like this:
{
"acknowledged" : true,
"shards_acknowledged" : true,
"index" : "xiumu_user"
}
The index has now been created, and a GET request shows its details:
GET xiumu_user
{
"xiumu_user" : {
"aliases" : { },
"mappings" : { },
"settings" : {
"index" : {
"creation_date" : "1604752426171",
"number_of_shards" : "1",
"number_of_replicas" : "1",
"uuid" : "wtR8dESSSaGygWfRItccAw",
"version" : {
"created" : "7090299"
},
"provided_name" : "xiumu_user"
}
}
}
}
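That's the gist of an index. Just as every record in a database has a unique identifier (the primary key), every document in ES has a document ID that uniquely identifies it. So inserting a document looks like this: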
PUT xiumu_user/_doc/1
{
"id": 1,
"username": "xiumu",
"age": 22,
"gender": "男",
"description": "一个学习Java的菜鸟"
}
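This request inserts a document into the xiumu_user index. The document's type is _doc and its ID is 1. Note that the ID is the 1 immediately after _doc in the URL, not the id field inside the braces. Everything inside the braces is the document itself, written as JSON.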
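One more point: the three parts of xiumu_user/_doc/1 are index/type/ID. Mapping types are deprecated as of Elasticsearch 7.0 (and removed in 8.0), with _doc kept as the fixed endpoint name, so we simply treat every type as _doc and only care about the index and the ID. ES returns: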
{
"_index" : "xiumu_user",
"_type" : "_doc",
"_id" : "1",
"_version" : 1,
"result" : "created",
"_shards" : {
"total" : 2,
"successful" : 1,
"failed" : 0
},
"_seq_no" : 0,
"_primary_term" : 1
}
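To make the difference between the URL ID and the body's id field concrete, here is a toy in-memory model in Python. `ToyIndex` and `put` are hypothetical names, not part of any Elasticsearch client. The sketch assumes only the behavior shown above (the document is keyed by the ID in the URL and `_version` starts at 1) plus the fact that indexing the same ID again replaces the whole document, increments `_version`, and reports `"updated"`.

```python
from __future__ import annotations

from typing import Any


class ToyIndex:
    """Hypothetical in-memory stand-in for one index; not an Elasticsearch API."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.docs: dict[str, dict[str, Any]] = {}  # _id -> _source
        self.versions: dict[str, int] = {}         # _id -> _version

    def put(self, doc_id: str, source: dict[str, Any]) -> dict[str, Any]:
        """Mimic PUT <index>/_doc/<id>: the key is the ID from the URL, never source["id"]."""
        existed = doc_id in self.docs
        self.docs[doc_id] = dict(source)  # the whole document is replaced, not merged
        self.versions[doc_id] = self.versions.get(doc_id, 0) + 1
        return {
            "_index": self.name,
            "_id": doc_id,
            "_version": self.versions[doc_id],
            "result": "updated" if existed else "created",
        }


users = ToyIndex("xiumu_user")
print(users.put("1", {"id": 1, "username": "xiumu", "age": 22}))
print(users.put("1", {"id": 1, "username": "xiumu", "age": 23}))
```

The first call reports `"created"` with version 1; the second call with the same URL ID reports `"updated"` with version 2.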
Now let's load a bit more data in preparation for some fancier queries. A bulk insert works like this:
PUT xiumu_user/_bulk
{"index":{"_index":"xiumu_user","_id":2}}
{"id": 2,"username": "亚索","age": 200,"gender": "男","description":"风一样的男人"}
{"index":{"_index":"xiumu_user","_id":3}}
{"id": 3,"username": "伊泽瑞尔","age": 100,"gender": "男","description": "有位移的ADC"}
{"index":{"_index":"xiumu_user","_id":4}}
{"id": 5,"username": "寒冰射手","age": 20,"gender": "女","description": "大招会拐弯"}
{"index":{"_index":"xiumu_user","_id":5}}
{"id": 6,"username": "九尾妖狐","age": 18,"gender": "女","description": "我们心有灵犀"}
{"index":{"_index":"xiumu_user","_id":6}}
{"id": 6,"username": "赵信","age": 28,"gender": "男","description": "一点寒芒先到,随后枪出如龙"}
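Note that each JSON object must stay on a single line, with no line breaks inside it: one line of action, one line of data, one line of action, one line of data, and so on. On success, ES returns JSON like the following, where the took field is the execution time in milliseconds.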
{
"took" : 25,
"errors" : false,
"items" : [
{
"index" : {
"_index" : "xiumu_user",
"_type" : "_doc",
"_id" : "2",
"_version" : 3,
"result" : "created",
"_shards" : {
"total" : 2,
"successful" : 1,
"failed" : 0
},
"_seq_no" : 12,
"_primary_term" : 6,
"status" : 201
}
},
{
"index" : {
"_index" : "xiumu_user",
"_type" : "_doc",
"_id" : "3",
"_version" : 3,
"result" : "created",
"_shards" : {
"total" : 2,
"successful" : 1,
"failed" : 0
},
"_seq_no" : 13,
"_primary_term" : 6,
"status" : 201
}
},
......
The full response is rather long, so I've elided the rest with an ellipsis.
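If you generate the bulk body in code rather than typing it into Kibana, the easiest way to satisfy the one-line rule is to serialize each object without indentation. A minimal Python sketch (the helper name `build_bulk_body` is hypothetical): it emits one action line and one source line per document and ends the body with a newline, which `_bulk` requires. When sending such a body over plain HTTP, the Content-Type should be `application/x-ndjson`.

```python
from __future__ import annotations

import json


def build_bulk_body(index: str, docs: dict[str, dict]) -> str:
    """Build an NDJSON _bulk body: one action line followed by one source line per document."""
    lines = []
    for doc_id, source in docs.items():
        # json.dumps without indent= always produces a single line, as _bulk requires
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
        lines.append(json.dumps(source, ensure_ascii=False))
    # The last line must also be terminated by a newline
    return "\n".join(lines) + "\n"


body = build_bulk_body("xiumu_user", {
    "2": {"id": 2, "username": "亚索", "age": 200, "gender": "男", "description": "风一样的男人"},
    "3": {"id": 3, "username": "伊泽瑞尔", "age": 100, "gender": "男", "description": "有位移的ADC"},
})
print(body, end="")
```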
Running queries
How do we get the data we just added back out? It's very simple:
GET xiumu_user/_search
{
"query": {
"match_all": {}
}
}
As the name suggests, this matches all documents. The response is:
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 6,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "xiumu_user",
"_type" : "_doc",
"_id" : "1",
"_score" : 1.0,
"_source" : {
"id" : 1,
"username" : "xiumu",
"age" : 22,
"gender" : "男",
"description" : "一个学习Java的菜鸟"
}
},
{
"_index" : "xiumu_user",
"_type" : "_doc",
"_id" : "2",
"_score" : 1.0,
"_source" : {
"id" : 2,
"username" : "亚索",
"age" : 200,
"gender" : "男",
"description" : "风一样的男人"
}
},
......
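Again, I've elided the rest of the response.
1. Querying documents by a field
There are several ways to do this, so let's try them one by one. Suppose we want to query by gender, that is, by the gender field.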
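Every Dev Tools line maps directly onto an HTTP request: the verb is the HTTP method and the path is appended to the node's address. The sketch below assumes a local node at `http://localhost:9200` with security disabled, and `console_to_request` is a hypothetical helper; it only builds a standard-library `urllib.request.Request` and leaves actually sending it commented out.

```python
from __future__ import annotations

import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: local node, security disabled


def console_to_request(method: str, path: str, body: dict | None = None) -> urllib.request.Request:
    """Turn a Dev Tools line such as 'GET xiumu_user/_search' (plus its JSON body) into a Request."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    headers = {"Content-Type": "application/json"} if data is not None else {}
    return urllib.request.Request(
        f"{ES_URL}/{path.lstrip('/')}", data=data, headers=headers, method=method
    )


req = console_to_request("GET", "xiumu_user/_search", {"query": {"match_all": {}}})
print(req.get_method(), req.full_url)

# Only with Elasticsearch actually running on ES_URL:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["hits"]["total"])
```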
(1) match query
GET xiumu_user/_search
{
"query": {
"match": {
"gender": "女"
}
}
}
(2) query_string query
GET xiumu_user/_search
{
"query": {
"query_string": {
"default_field": "gender",
"query": "女"
}
}
}
(3) term query
GET xiumu_user/_search
{
"query": {
"term": {
"gender": {
"value": "女"
}
}
}
}
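We expect to get back 九尾妖狐 and 寒冰射手, and that's exactly what happens: all three queries return the result below. (term works here because 女 is a single character, which is exactly how the value was indexed.)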
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 2,
"relation" : "eq"
},
"max_score" : 1.1631508,
"hits" : [
{
"_index" : "xiumu_user",
"_type" : "_doc",
"_id" : "4",
"_score" : 1.1631508,
"_source" : {
"id" : 5,
"username" : "寒冰射手",
"age" : 20,
"gender" : "女",
"description" : "大招会拐弯"
}
},
{
"_index" : "xiumu_user",
"_type" : "_doc",
"_id" : "5",
"_score" : 1.1631508,
"_source" : {
"id" : 6,
"username" : "九尾妖狐",
"age" : 18,
"gender" : "女",
"description" : "我们心有灵犀"
}
}
]
}
}
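2. Full-text ("fuzzy") search
Elasticsearch analyzes text, including Chinese, into tokens and then matches them against the keywords you search for. Let's try a full-text search on the description field. (This is ordinary full-text matching, not Elasticsearch's dedicated fuzzy query, which matches by edit distance.)
(1) match query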
GET xiumu_user/_search
{
"query": {
"match": {
"description": "有位移"
}
}
}
(2) query_string query
GET xiumu_user/_search
{
"query": {
"query_string": {
"default_field": "description",
"query": "有位移"
}
}
}
We search for the three characters 有位移. In theory this should return 伊泽瑞尔, and it does, but it also returns 九尾妖狐:
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 2,
"relation" : "eq"
},
"max_score" : 5.242115,
"hits" : [
{
"_index" : "xiumu_user",
"_type" : "_doc",
"_id" : "3",
"_score" : 5.242115,
"_source" : {
"id" : 3,
"username" : "伊泽瑞尔",
"age" : 100,
"gender" : "男",
"description" : "有位移的ADC"
}
},
{
"_index" : "xiumu_user",
"_type" : "_doc",
"_id" : "5",
"_score" : 1.2199264,
"_source" : {
"id" : 6,
"username" : "九尾妖狐",
"age" : 18,
"gender" : "女",
"description" : "我们心有灵犀"
}
}
]
}
}
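This touches on how Elasticsearch tokenizes text and scores matches. Documents are split into terms, and an inverted index records which documents contain each term. The search keywords are analyzed the same way and looked up in that index; the more query terms a document matches, the more relevant it is considered (Elasticsearch scores relevance with BM25, which also weighs how rare each term is and how long the field is). The inverted index is more involved than this quick summary, so it's worth reading up on. Here, all three characters matched 伊泽瑞尔's description, so its _score is over 5, while 九尾妖狐 only matches the single character 有, so its _score is just over 1.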
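The idea can be sketched in a few lines of Python. This is a simplified model, not Elasticsearch's implementation: `cjk_tokens` only approximates the standard analyzer for this data set (each CJK character is a token, ASCII words are lowercased), and `naive_match` is a hypothetical helper that scores a document by the number of distinct query tokens it contains instead of by BM25. The IDs are the `_id` values from the bulk request above.

```python
from __future__ import annotations

from collections import defaultdict


def cjk_tokens(text: str) -> list[str]:
    """Rough approximation of the standard analyzer for this data: each CJK character is one
    token, each run of ASCII letters/digits is one lowercased token, everything else is dropped."""
    tokens: list[str] = []
    word = ""
    for ch in text:
        if ch.isascii() and ch.isalnum():
            word += ch.lower()
            continue
        if word:
            tokens.append(word)
            word = ""
        if "\u4e00" <= ch <= "\u9fff":  # CJK Unified Ideographs
            tokens.append(ch)
    if word:
        tokens.append(word)
    return tokens


descriptions = {
    "1": "一个学习Java的菜鸟", "2": "风一样的男人", "3": "有位移的ADC",
    "4": "大招会拐弯", "5": "我们心有灵犀", "6": "一点寒芒先到,随后枪出如龙",
}

# Inverted index: term -> IDs of the documents whose description contains it
inverted: defaultdict[str, set[str]] = defaultdict(set)
for doc_id, text in descriptions.items():
    for tok in cjk_tokens(text):
        inverted[tok].add(doc_id)


def naive_match(query: str) -> list[tuple[str, int]]:
    """OR-match like a default match query, scored by distinct query tokens hit (not BM25)."""
    hits: defaultdict[str, int] = defaultdict(int)
    for tok in set(cjk_tokens(query)):
        for doc_id in inverted.get(tok, ()):
            hits[doc_id] += 1
    # highest score first, ties broken by _id
    return sorted(hits.items(), key=lambda kv: (-kv[1], kv[0]))


print(naive_match("有位移"))
```

`naive_match("有位移")` ranks document 3 (伊泽瑞尔, three tokens) above document 5 (九尾妖狐, one token), the same order Elasticsearch produced, even though the scores themselves are not comparable.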
(3) term query
This one behaves differently and deserves its own explanation:
GET xiumu_user/_search
{
"query": {
"term": {
"description": {
"value": "有位移"
}
}
}
}
It returns:
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 0,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
}
}
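Nothing at all. Surprising, isn't it? As I understand it, the reason goes back to the inverted index mentioned above. Let's see what the six characters 有位移的男人 are split into (with no analyzer specified, _analyze uses the standard analyzer):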
GET _analyze
{
"text": ["有位移的男人"]
}
The result:
{
"tokens" : [
{
"token" : "有",
"start_offset" : 0,
"end_offset" : 1,
"type" : "<IDEOGRAPHIC>",
"position" : 0
},
{
"token" : "位",
"start_offset" : 1,
"end_offset" : 2,
"type" : "<IDEOGRAPHIC>",
"position" : 1
},
{
"token" : "移",
"start_offset" : 2,
"end_offset" : 3,
"type" : "<IDEOGRAPHIC>",
"position" : 2
},
{
"token" : "的",
"start_offset" : 3,
"end_offset" : 4,
"type" : "<IDEOGRAPHIC>",
"position" : 3
},
{
"token" : "男",
"start_offset" : 4,
"end_offset" : 5,
"type" : "<IDEOGRAPHIC>",
"position" : 4
},
{
"token" : "人",
"start_offset" : 5,
"end_offset" : 6,
"type" : "<IDEOGRAPHIC>",
"position" : 5
}
]
}
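The text is split into individual characters. match and query_string analyze the keywords the same way, into single characters, and match those against the documents. term is different. It is often called an exact query, but what has to be exact is the keyword, not the whole field value: term does not analyze the keyword at all and looks it up as one single term in the inverted index. We searched for 有位移, but the inverted index only contains single characters, with no three-character term, so nothing is found. If we use term to search for a single character, such as 有, it does return results.
3. Multi-field queries
Sometimes we want one set of keywords to match several fields: a document should be returned if the keywords appear in username, and also if they appear in description.
(1) multi_match query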
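A small toy model in Python shows the difference between term and match. The tokenizer is only a rough approximation of the standard analyzer, and `term_query` / `match_query` are hypothetical helpers that look terms up in an inverted index built from the description values in the bulk request above.

```python
from __future__ import annotations

from collections import defaultdict


def cjk_tokens(text: str) -> list[str]:
    """Rough standard-analyzer stand-in: CJK characters are single tokens,
    ASCII letter/digit runs are one lowercased token, everything else is dropped."""
    tokens: list[str] = []
    word = ""
    for ch in text:
        if ch.isascii() and ch.isalnum():
            word += ch.lower()
            continue
        if word:
            tokens.append(word)
            word = ""
        if "\u4e00" <= ch <= "\u9fff":  # CJK Unified Ideographs
            tokens.append(ch)
    if word:
        tokens.append(word)
    return tokens


descriptions = {
    "1": "一个学习Java的菜鸟", "2": "风一样的男人", "3": "有位移的ADC",
    "4": "大招会拐弯", "5": "我们心有灵犀", "6": "一点寒芒先到,随后枪出如龙",
}
inverted: defaultdict[str, set[str]] = defaultdict(set)  # term -> document IDs
for doc_id, text in descriptions.items():
    for tok in cjk_tokens(text):
        inverted[tok].add(doc_id)


def term_query(value: str) -> set[str]:
    """term: the value is NOT analyzed; it is looked up as one whole term."""
    return set(inverted.get(value, set()))


def match_query(text: str) -> set[str]:
    """match (default OR operator): analyze the text, then union the postings of every token."""
    result: set[str] = set()
    for tok in cjk_tokens(text):
        result |= inverted.get(tok, set())
    return result


print(cjk_tokens("有位移的男人"))  # the six single-character tokens _analyze returned above
print(term_query("有位移"))        # set(): the index holds no three-character term
print(match_query("有位移"))       # documents 3 and 5
print(term_query("有"))            # documents 3 and 5, because 有 is an indexed term
```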
GET xiumu_user/_search
{
"query": {
"multi_match": {
"query": "伊泽 风 寒芒",
"fields": ["username","description"]
}
}
}
With this query, documents whose username or description contains any of the keywords should be returned.
{
"took" : 2,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 4,
"relation" : "eq"
},
"max_score" : 2.955389,
"hits" : [
{
"_index" : "xiumu_user",
"_type" : "_doc",
"_id" : "3",
"_score" : 2.955389,
"_source" : {
"id" : 3,
"username" : "伊泽瑞尔",
"age" : 100,
"gender" : "男",
"description" : "有位移的ADC"
}
},
{
"_index" : "xiumu_user",
"_type" : "_doc",
"_id" : "6",
"_score" : 2.1834397,
"_source" : {
"id" : 6,
"username" : "赵信",
"age" : 28,
"gender" : "男",
"description" : "一点寒芒先到,随后枪出如龙"
}
},
{
"_index" : "xiumu_user",
"_type" : "_doc",
"_id" : "2",
"_score" : 1.4900502,
"_source" : {
"id" : 2,
"username" : "亚索",
"age" : 200,
"gender" : "男",
"description" : "风一样的男人"
}
},
{
"_index" : "xiumu_user",
"_type" : "_doc",
"_id" : "4",
"_score" : 1.4776945,
"_source" : {
"id" : 5,
"username" : "寒冰射手",
"age" : 20,
"gender" : "女",
"description" : "大招会拐弯"
}
}
]
}
}
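We did get the results we wanted. Note that 寒冰射手 shows up too: its username contains the character 寒, one of the tokens of 寒芒.
Other queries
Elasticsearch offers many more query types. Take the bool query: its clauses must, must_not and should are fairly self-explanatory. Every must clause has to match, no must_not clause may match, and should clauses are optional, except that when a bool query has no must (or filter) clause, at least one should clause has to match. A little practice makes it clear: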
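By default multi_match uses the `best_fields` type, where a document's score comes from its best-matching field. A toy Python version follows; the tokenizer is a rough approximation of the standard analyzer, the helpers are hypothetical, and a count of matched tokens stands in for BM25. It returns the same four documents, though its scores are not Elasticsearch's.

```python
from __future__ import annotations


def cjk_tokens(text: str) -> list[str]:
    """Rough standard-analyzer stand-in: CJK characters are single tokens,
    ASCII letter/digit runs are one lowercased token, everything else is dropped."""
    tokens: list[str] = []
    word = ""
    for ch in text:
        if ch.isascii() and ch.isalnum():
            word += ch.lower()
            continue
        if word:
            tokens.append(word)
            word = ""
        if "\u4e00" <= ch <= "\u9fff":  # CJK Unified Ideographs
            tokens.append(ch)
    if word:
        tokens.append(word)
    return tokens


# username and description of each document, keyed by _id
users = {
    "1": {"username": "xiumu", "description": "一个学习Java的菜鸟"},
    "2": {"username": "亚索", "description": "风一样的男人"},
    "3": {"username": "伊泽瑞尔", "description": "有位移的ADC"},
    "4": {"username": "寒冰射手", "description": "大招会拐弯"},
    "5": {"username": "九尾妖狐", "description": "我们心有灵犀"},
    "6": {"username": "赵信", "description": "一点寒芒先到,随后枪出如龙"},
}


def field_score(query_tokens: set[str], text: str) -> int:
    """Toy relevance: number of distinct query tokens found in the field."""
    return len(query_tokens & set(cjk_tokens(text)))


def multi_match(query: str, fields: list[str]) -> dict[str, int]:
    """best_fields-style: a document's score is its best single-field score; 0 means no match."""
    q = set(cjk_tokens(query))
    scores = {doc_id: max(field_score(q, doc[f]) for f in fields) for doc_id, doc in users.items()}
    return {doc_id: s for doc_id, s in scores.items() if s > 0}


print(multi_match("伊泽 风 寒芒", ["username", "description"]))
```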
GET xiumu_user/_search
{
"query": {
"bool": {
"must": [
{
"match": {
"username": "寒冰"
}
}
]
}
}
}
GET xiumu_user/_search
{
"query": {
"bool": {
"must_not": [
{
"match": {
"username": "寒冰"
}
}
]
}
}
}
GET xiumu_user/_search
{
"query": {
"bool": {
"should": [
{
"match": {
"username": "寒冰"
}
},
{
"query_string": {
"default_field": "description",
"query": "寒芒"
}
}
]
}
}
}
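The matching rules of these three bool queries can be modeled as plain predicates. This Python sketch ignores scoring and only decides which documents match. `match` and `bool_query` are hypothetical helpers, the tokenizer is a rough approximation of the standard analyzer, and the query_string clause on description is treated as an OR match here.

```python
from __future__ import annotations

from typing import Callable, Sequence

Predicate = Callable[[dict], bool]


def cjk_tokens(text: str) -> list[str]:
    """Rough standard-analyzer stand-in: CJK characters are single tokens,
    ASCII letter/digit runs are one lowercased token, everything else is dropped."""
    tokens: list[str] = []
    word = ""
    for ch in text:
        if ch.isascii() and ch.isalnum():
            word += ch.lower()
            continue
        if word:
            tokens.append(word)
            word = ""
        if "\u4e00" <= ch <= "\u9fff":
            tokens.append(ch)
    if word:
        tokens.append(word)
    return tokens


def match(field: str, text: str) -> Predicate:
    """Match-style clause: true if the field shares at least one token with the analyzed text."""
    wanted = set(cjk_tokens(text))
    return lambda doc: bool(wanted & set(cjk_tokens(doc[field])))


def bool_query(docs: list[dict], must: Sequence[Predicate] = (),
               must_not: Sequence[Predicate] = (), should: Sequence[Predicate] = ()) -> list[str]:
    """Usernames of the documents matching a bool query (scores ignored)."""
    hits = []
    for doc in docs:
        if not all(p(doc) for p in must):  # every must clause has to match
            continue
        if any(p(doc) for p in must_not):  # any must_not match excludes the document
            continue
        # With no must clause, at least one should clause has to match;
        # otherwise should clauses would only affect the score.
        if should and not must and not any(p(doc) for p in should):
            continue
        hits.append(doc["username"])
    return hits


docs = [
    {"username": "xiumu", "description": "一个学习Java的菜鸟"},
    {"username": "亚索", "description": "风一样的男人"},
    {"username": "伊泽瑞尔", "description": "有位移的ADC"},
    {"username": "寒冰射手", "description": "大招会拐弯"},
    {"username": "九尾妖狐", "description": "我们心有灵犀"},
    {"username": "赵信", "description": "一点寒芒先到,随后枪出如龙"},
]
print(bool_query(docs, must=[match("username", "寒冰")]))
print(bool_query(docs, must_not=[match("username", "寒冰")]))
print(bool_query(docs, should=[match("username", "寒冰"), match("description", "寒芒")]))
```

The three calls return ['寒冰射手'], the other five users, and ['寒冰射手', '赵信'] respectively.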
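What if you want to query several indices at once? Just list them separated by commas. A multi-index query looks like this (by default every listed index must exist, otherwise ES returns an index_not_found_exception):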
GET xiumu_user,xiumu_user0,xiumu_user1/_search
{
"query": {
"match_all": {}
}
}
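There's also highlighting. When you search on Baidu, the keywords in the results are shown in red. Elasticsearch can wrap matched terms in tags (<em> and </em> by default; here we use <span>, which a front end can then style in red):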
GET xiumu_user/_search
{
"query": {
"multi_match": {
"query": "伊泽 风 寒芒",
"fields": ["username","description"]
}
},
"highlight": {
"pre_tags": ["<span>"],
"post_tags": ["</span>"],
"fields": {
"username": {},
"description": {}
}
}
}
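The core of what a highlighter does, wrapping each matched token of the original text in pre_tags and post_tags, can be illustrated with the start_offset and end_offset values that _analyze reports. This Python sketch is a rough illustration only, with hypothetical helpers and a simplified tokenizer; Elasticsearch's highlighters also split text into fragments and may group adjacent matches differently.

```python
from __future__ import annotations


def cjk_token_spans(text: str) -> list[tuple[str, int, int]]:
    """(token, start_offset, end_offset) like _analyze: CJK characters are single tokens,
    ASCII letter/digit runs are one lowercased token, everything else is dropped."""
    spans: list[tuple[str, int, int]] = []
    start = None
    for i, ch in enumerate(text):
        if ch.isascii() and ch.isalnum():
            if start is None:
                start = i
            continue
        if start is not None:
            spans.append((text[start:i].lower(), start, i))
            start = None
        if "\u4e00" <= ch <= "\u9fff":
            spans.append((ch, i, i + 1))
    if start is not None:
        spans.append((text[start:].lower(), start, len(text)))
    return spans


def highlight(text: str, query: str, pre: str = "<span>", post: str = "</span>") -> str:
    """Wrap every token of `text` that also appears in the analyzed query with pre/post tags."""
    wanted = {tok for tok, _, _ in cjk_token_spans(query)}
    out, pos = [], 0
    for tok, start, end in cjk_token_spans(text):
        if tok in wanted:
            out.append(text[pos:start])                  # untouched text before the match
            out.append(pre + text[start:end] + post)     # original casing is preserved
            pos = end
    out.append(text[pos:])
    return "".join(out)


print(highlight("伊泽瑞尔", "伊泽 风 寒芒"))
print(highlight("有位移的ADC", "adc"))
```

The two calls print `<span>伊</span><span>泽</span>瑞尔` and `有位移的<span>ADC</span>`.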
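Wrapping up
What I've covered here only scratches the surface. Elasticsearch has many more concepts and features, so keep reading the official documentation. In the next post, we'll look at CRUD operations on Elasticsearch with Spring Boot.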