Elasticsearch Advanced Document Operations

Latest recommended article published 2024-08-23 16:41:49

PPPsych


Category: Elasticsearch. Tags: elasticsearch, search engine, learning

Article link: https://blog.csdn.net/qq_39280718/article/details/127023341



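ES Conditional Queries

So far our document operations have only looked documents up by id or fetched all of them, which rarely meets the varied search needs of real applications. Querying by conditions is essential.

Query via URL parameters

The URL format for a conditional query is http://127.0.0.1:9200/index_name/_search?q=condition.

For example, to query all documents whose author is PPPsych:

URL: http://127.0.0.1:9200/golang/_search?q=author:PPPsych
Method: GET

Send the request and get the response: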

{
    "took": 16,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 3,
            "relation": "eq"
        },
        "max_score": 0.25613075,
        "hits": [
            {
                "_index": "golang",
                "_type": "_doc",
                "_id": "003",
                "_score": 0.25613075,
                "_source": {
                    "name": "Psych",
                    "url": "https://blog.csdn.net/qq_39280718?type=blog",
                    "author": "PPPsych"
                }
            },
            {
                "_index": "golang",
                "_type": "_doc",
                "_id": "001",
                "_score": 0.25613075,
                "_source": {
                    "name": "ROOT",
                    "url": "https://blog.csdn.net/qq_39280718?type=blog",
                    "author": "PPPsych"
                }
            },
            {
                "_index": "golang",
                "_type": "_doc",
                "_id": "002",
                "_score": 0.25613075,
                "_source": {
                    "name": "root",
                    "url": "https://blog.csdn.net/qq_39280718?type=blog",
                    "author": "PPPsych"
                }
            }
        ]
    }
}
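The URI search above can be reproduced from code. The following is a minimal Python sketch, assuming the local node at http://127.0.0.1:9200 and the golang index used in this article; search_url is a hypothetical helper, not part of any client library. urllib.parse.quote percent-encodes values as UTF-8, so a value such as 钟离 travels safely instead of being garbled.

```python
from urllib.parse import quote

# Assumed local node address, as used throughout this article.
BASE = "http://127.0.0.1:9200"


def search_url(index: str, field: str, value: str) -> str:
    """Build a URI search (?q=field:value); hypothetical helper.

    quote() percent-encodes the value as UTF-8, so characters such as
    Chinese are transmitted safely instead of being garbled.
    """
    return f"{BASE}/{quote(index)}/_search?q={quote(field)}:{quote(value)}"


print(search_url("golang", "author", "PPPsych"))
print(search_url("golang", "author", "钟离"))  # value is percent-encoded
```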

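Query via the request body

As the example above shows, putting conditions in the URL quickly becomes unwieldy, and non-ASCII characters such as Chinese in a URL are easily garbled, which makes the query fail. Passing the conditions in the request body avoids these problems, and Elasticsearch supports this for conditional queries.
The URL is then fixed: http://127.0.0.1:9200/golang/_search
The request body is structured as follows: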
{
    "query":{
        "match": {
            "author":"PPPsych"
        }
    }
}

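The outermost key is query; the matching clause is match, whose value is an object in which you fill in the field to match and its value.

Send the request and get the response: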

{
    "took": 2,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 3,
            "relation": "eq"
        },
        "max_score": 0.25613075,
        "hits": [
            {
                "_index": "golang",
                "_type": "_doc",
                "_id": "003",
                "_score": 0.25613075,
                "_source": {
                    "name": "Psych",
                    "url": "https://blog.csdn.net/qq_39280718?type=blog",
                    "author": "PPPsych"
                }
            },
            {
                "_index": "golang",
                "_type": "_doc",
                "_id": "001",
                "_score": 0.25613075,
                "_source": {
                    "name": "ROOT",
                    "url": "https://blog.csdn.net/qq_39280718?type=blog",
                    "author": "PPPsych"
                }
            },
            {
                "_index": "golang",
                "_type": "_doc",
                "_id": "002",
                "_score": 0.25613075,
                "_source": {
                    "name": "root",
                    "url": "https://blog.csdn.net/qq_39280718?type=blog",
                    "author": "PPPsych"
                }
            }
        ]
    }
}

The result is the same as that of the URL-parameter query.
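A request body can be sent from code just as easily. This minimal sketch, assuming the same local node, prepares the request with Python's standard urllib without sending it; build_search_request is a hypothetical helper. The body is serialized with ensure_ascii=False and encoded as UTF-8, and the Content-Type header tells Elasticsearch that the body is JSON.

```python
import json
import urllib.request

# Assumed local node address, as used throughout this article.
BASE = "http://127.0.0.1:9200"


def build_search_request(index: str, body: dict) -> urllib.request.Request:
    """Prepare (without sending) a _search request with a JSON body."""
    data = json.dumps(body, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        f"{BASE}/{index}/_search",
        data=data,
        headers={"Content-Type": "application/json"},
        method="GET",  # Elasticsearch also accepts POST for _search
    )


req = build_search_request("golang", {"query": {"match": {"author": "PPPsych"}}})
# With a node running, urllib.request.urlopen(req) would send the query
# and return a JSON response like the one shown above.
```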

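ES Paginated Queries

Pagination simply adds paging fields to the body of a conditional query.
Add two key fields to the body:

from: the offset of the first result to return, starting at 0.
size: the number of results per page.

For example, to page through the documents whose author is PPPsych one at a time and fetch the second page, the request body is: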

{
    "query":{
        "match": {
            "author":"PPPsych"
        }
    },
    "from":1,
    "size":1
}

The response:

{
    "took": 3,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 3,
            "relation": "eq"
        },
        "max_score": 0.25613075,
        "hits": [
            {
                "_index": "golang",
                "_type": "_doc",
                "_id": "001",
                "_score": 0.25613075,
                "_source": {
                    "name": "ROOT",
                    "url": "https://blog.csdn.net/qq_39280718?type=blog",
                    "author": "PPPsych"
                }
            }
        ]
    }
}

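Because from is 1 and size is 1, the first hit is skipped and only one matching document is returned. In general, the from value for a given page is (page number - 1) × page size.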
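The page arithmetic can be sketched as follows. page_body and apply_page are hypothetical helpers: the first builds the body from a 1-based page number, and the second only simulates locally what from/size do to the ordered hits, using the hit order 003, 001, 002 from the responses above. Keep in mind that, by default, from + size may not exceed 10,000 (the index.max_result_window setting), so very deep paging is usually done with search_after instead.

```python
def page_body(query: dict, page: int, size: int) -> dict:
    """Request body for a 1-based page number (hypothetical helper)."""
    if page < 1 or size < 1:
        raise ValueError("page and size must be positive")
    return {"query": query, "from": (page - 1) * size, "size": size}


def apply_page(hits: list, body: dict) -> list:
    """Local illustration of what from/size do to the ordered hit list."""
    start = body["from"]
    return hits[start:start + body["size"]]


body = page_body({"match": {"author": "PPPsych"}}, page=2, size=1)
print(body["from"], body["size"])               # 1 1, as in the example above
print(apply_page(["003", "001", "002"], body))  # ['001']
```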

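On top of this, we can also change what is returned.

When we only need some of the fields rather than all of them, we can control this with the _source field.

For example, to return only the name field of the documents whose author is PPPsych:

The request body: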

{
    "query":{
        "match": {
            "author":"PPPsych"
        }
    },
    "from":1,
    "size":1,
    "_source":"name"
}

In the response, each hit's _source then contains only the name field, which is exactly the field we wanted.
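To make the effect of _source concrete, here is a small local approximation in Python. filter_source is a hypothetical helper that handles flat documents only; like the _source option it accepts a single field name or a list, but the wildcard patterns that Elasticsearch also supports are not handled.

```python
def filter_source(source: dict, includes) -> dict:
    """Approximate _source filtering for flat documents (hypothetical helper).

    includes may be a single field name or a list of names.
    Nested objects and wildcard patterns are not handled here.
    """
    fields = [includes] if isinstance(includes, str) else list(includes)
    return {k: v for k, v in source.items() if k in fields}


doc = {"name": "ROOT", "url": "https://blog.csdn.net/qq_39280718?type=blog",
       "author": "PPPsych"}
print(filter_source(doc, "name"))              # {'name': 'ROOT'}
print(filter_source(doc, ["name", "author"]))  # {'name': 'ROOT', 'author': 'PPPsych'}
```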

ES Sorted Queries

To sort by fields of the documents, add a sort field to the query.

For example, to sort by id, the request body is:

{
    "query":{
        "match_all": {
        }
    },
    "sort":{
        "_id":{
            "order":"asc"
        }
    }
}

The response:

{
    "took": 22,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 4,
            "relation": "eq"
        },
        "max_score": null,
        "hits": [
            {
                "_index": "golang",
                "_type": "_doc",
                "_id": "001",
                "_score": null,
                "_source": {
                    "name": "ROOT",
                    "url": "https://blog.csdn.net/qq_39280718?type=blog",
                    "author": "PPPsych"
                },
                "sort": [
                    "001"
                ]
            },
            {
                "_index": "golang",
                "_type": "_doc",
                "_id": "002",
                "_score": null,
                "_source": {
                    "name": "root",
                    "url": "https://blog.csdn.net/qq_39280718?type=blog",
                    "author": "PPPsych"
                },
                "sort": [
                    "002"
                ]
            },
            {
                "_index": "golang",
                "_type": "_doc",
                "_id": "003",
                "_score": null,
                "_source": {
                    "name": "Psych",
                    "url": "https://blog.csdn.net/qq_39280718?type=blog",
                    "author": "PPPsych"
                },
                "sort": [
                    "003"
                ]
            },
            {
                "_index": "golang",
                "_type": "_doc",
                "_id": "005",
                "_score": null,
                "_source": {
                    "name": "Morax",
                    "url": "https://blog.csdn.net/qq_39280718?type=blog",
                    "author": "钟离"
                },
                "sort": [
                    "005"
                ]
            }
        ]
    }
}

Under each sort field you can specify the sort order, ascending or descending:

asc: ascending order
desc: descending order
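The sort body and the effect of asc/desc on string ids can be sketched like this. sort_body and local_sort are hypothetical helpers, and local_sort only mimics plain lexicographic ordering of string values, which is why zero-padded ids such as 001 and 005 come out in numeric order (unpadded ids would put 10 before 9). Also note that newer Elasticsearch releases discourage sorting on _id and may reject it by default; sorting on a dedicated keyword or numeric field is the safer choice.

```python
def sort_body(field: str, order: str = "asc") -> dict:
    """Build a match_all query sorted on one field (hypothetical helper)."""
    if order not in ("asc", "desc"):
        raise ValueError("order must be 'asc' or 'desc'")
    return {"query": {"match_all": {}}, "sort": {field: {"order": order}}}


def local_sort(ids: list, order: str) -> list:
    """Mimic ordering string ids: plain lexicographic comparison."""
    return sorted(ids, reverse=(order == "desc"))


print(sort_body("_id", "desc"))
print(local_sort(["003", "001", "005", "002"], "asc"))   # ['001', '002', '003', '005']
print(local_sort(["003", "001", "005", "002"], "desc"))  # ['005', '003', '002', '001']
```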

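ES Multi-Condition Queries

All conditions must match: must

For multiple conditions, the body no longer uses match directly under query. Instead it uses a bool query, and conditions that must all hold go into its must array, each as its own match clause.

For example, with this request body: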

{
    "query":{
        "bool": {
            "must":[
                {
                    "match":{
                        "author":"PPPsych"
                    }
                }
            ]
        }
    }
}

With a single clause this behaves the same as the single-condition query. Now add a second condition:

{
    "query":{
        "bool": {
            "must":[
                {
                    "match":{
                        "author":"PPPsych"
                    }
                },
                {
                    "match":{
                        "name":"Psych"
                    }
                }
            ]
        }
    }
}

The response:

{
    "took": 6,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 1,
            "relation": "eq"
        },
        "max_score": 0.9492779,
        "hits": [
            {
                "_index": "golang",
                "_type": "_doc",
                "_id": "003",
                "_score": 0.9492779,
                "_source": {
                    "name": "Psych",
                    "url": "https://blog.csdn.net/qq_39280718?type=blog",
                    "author": "PPPsych"
                }
            }
        ]
    }
}

The returned document satisfies both "author":"PPPsych" and "name":"Psych".
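To see why must behaves like AND, the following local simulation evaluates the two clauses over the four golang documents. It is only an approximation under stated assumptions: analyze is a rough stand-in for the standard analyzer (lowercase ASCII words, one token per Chinese character), the url field is omitted, and relevance scoring is ignored, so hits come back in insertion order rather than by score.

```python
import re

# Rough stand-in for the standard analyzer: lowercase, keep runs of ASCII
# letters/digits as tokens, and emit each CJK character as its own token.
TOKEN = re.compile(r"[a-z0-9]+|[\u4e00-\u9fff]")


def analyze(text: str) -> list:
    return TOKEN.findall(text.lower())


def match(doc: dict, field: str, query: str) -> bool:
    """match clause: true if any query token appears in the field (OR)."""
    return bool(set(analyze(query)) & set(analyze(doc.get(field, ""))))


DOCS = {
    "001": {"name": "ROOT", "author": "PPPsych"},
    "002": {"name": "root", "author": "PPPsych"},
    "003": {"name": "Psych", "author": "PPPsych"},
    "005": {"name": "Morax", "author": "钟离"},
}


def bool_must(clauses: list) -> list:
    """bool/must: keep documents that satisfy every clause (logical AND)."""
    return [i for i, d in DOCS.items() if all(match(d, f, q) for f, q in clauses)]


print(bool_must([("author", "PPPsych")]))                     # ['001', '002', '003']
print(bool_must([("author", "PPPsych"), ("name", "Psych")]))  # ['003']
```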

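Any condition may match: should

should works like OR in a SQL statement: when a bool query contains only should clauses, a document must match at least one of them.

For example, to find the documents whose author is PPPsych or 钟离, the request body is: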

{
    "query":{
        "bool": {
            "should":[
                {
                    "match":{
                        "author":"PPPsych"
                    }
                },
                {
                    "match":{
                        "author":"钟离"
                    }
                }
            ]
        }
    }
}

The response:

{
    "took": 192,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 4,
            "relation": "eq"
        },
        "max_score": 2.3842063,
        "hits": [
            {
                "_index": "golang",
                "_type": "_doc",
                "_id": "005",
                "_score": 2.3842063,
                "_source": {
                    "name": "Morax",
                    "url": "https://blog.csdn.net/qq_39280718?type=blog",
                    "author": "钟离"
                }
            },
            {
                "_index": "golang",
                "_type": "_doc",
                "_id": "003",
                "_score": 0.25613075,
                "_source": {
                    "name": "Psych",
                    "url": "https://blog.csdn.net/qq_39280718?type=blog",
                    "author": "PPPsych"
                }
            },
            {
                "_index": "golang",
                "_type": "_doc",
                "_id": "001",
                "_score": 0.25613075,
                "_source": {
                    "name": "ROOT",
                    "url": "https://blog.csdn.net/qq_39280718?type=blog",
                    "author": "PPPsych"
                }
            },
            {
                "_index": "golang",
                "_type": "_doc",
                "_id": "002",
                "_score": 0.25613075,
                "_source": {
                    "name": "root",
                    "url": "https://blog.csdn.net/qq_39280718?type=blog",
                    "author": "PPPsych"
                }
            }
        ]
    }
}
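The same kind of local simulation shows should as OR. As before, analyze and match are rough, hypothetical stand-ins for the standard analyzer and the match query. The minimum_should_match parameter name mirrors the real bool query option; with the value 1 used by default here, which is also Elasticsearch's default for a bool query containing only should clauses, a document needs just one matching clause.

```python
import re

# Rough stand-in for the standard analyzer: lowercase ASCII words, one token
# per CJK character.
TOKEN = re.compile(r"[a-z0-9]+|[\u4e00-\u9fff]")


def analyze(text: str) -> list:
    return TOKEN.findall(text.lower())


def match(doc: dict, field: str, query: str) -> bool:
    """match clause: true if any query token appears in the field."""
    return bool(set(analyze(query)) & set(analyze(doc.get(field, ""))))


# author values of the four golang documents shown above
DOCS = {
    "001": {"author": "PPPsych"},
    "002": {"author": "PPPsych"},
    "003": {"author": "PPPsych"},
    "005": {"author": "钟离"},
}


def bool_should(clauses: list, minimum_should_match: int = 1) -> list:
    """bool/should: keep documents matching at least minimum_should_match clauses."""
    return [
        doc_id
        for doc_id, doc in DOCS.items()
        if sum(match(doc, f, q) for f, q in clauses) >= minimum_should_match
    ]


print(bool_should([("author", "PPPsych"), ("author", "钟离")]))  # ['001', '002', '003', '005']
```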

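Range conditions: filter

You can also add a filter to a query to restrict the range of the results.

For example, to find products in a product index whose category is 床上用品 (bedding) or 3C数码 (consumer electronics) and whose price is above 500, the request body is: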

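{
    "query":{
        "bool": {
            "should":[
                {
                    "match":{
                        "category":"床上用品"
                    }
                },
                {
                    "match":{
                        "category":"3C数码"
                    }
                }
            ],
            "minimum_should_match":1,
            "filter":{
                "range":{
                    "price":{
                        "gt":500
                    }
                }
            }
        }
    }
}

The minimum_should_match setting matters here: once a bool query also contains a filter (or must) clause, its should clauses become optional by default (minimum_should_match defaults to 0), so without it every product priced above 500 would be returned whatever its category. Clauses under filter must match but do not contribute to the relevance score.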
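The effect of minimum_should_match in this query can be checked with a small simulation. The product data is made up for illustration, category is compared by exact equality (as if it were a keyword field), and scoring is ignored; in real Elasticsearch, optional should clauses still raise the score of matching products even when they no longer restrict the results.

```python
# Made-up sample products, for illustration only.
PRODUCTS = [
    {"id": 1, "category": "床上用品", "price": 699},
    {"id": 2, "category": "3C数码", "price": 1999},
    {"id": 3, "category": "3C数码", "price": 199},
    {"id": 4, "category": "护肤品", "price": 899},
]


def search(categories: list, min_price: float, minimum_should_match: int) -> list:
    """Evaluate bool{should: category matches, filter: price > min_price}."""
    hits = []
    for p in PRODUCTS:
        if not p["price"] > min_price:  # filter clause: must hold, no scoring
            continue
        # should clauses: count how many categories match (exact comparison)
        matched = sum(p["category"] == c for c in categories)
        if matched >= minimum_should_match:
            hits.append(p["id"])
    return hits


cats = ["床上用品", "3C数码"]
print(search(cats, 500, minimum_should_match=0))  # [1, 2, 4]: the default once filter is present
print(search(cats, 500, minimum_should_match=1))  # [1, 2]: the intended result
```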

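ES Full-Text Search

Full-text search: match

Earlier we queried the documents whose author is PPPsych or 钟离. What happens if we query author = PPPsych钟离? Will anything still come back?

In a traditional relational database, an equality query like this would return nothing.

The request body: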

{
    "query":{
        "match": {
            "author":"PPPsych钟离"
        }
    }
}

The response:

{
    "took": 2,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 4,
            "relation": "eq"
        },
        "max_score": 2.3842063,
        "hits": [
            {
                "_index": "golang",
                "_type": "_doc",
                "_id": "005",
                "_score": 2.3842063,
                "_source": {
                    "name": "Morax",
                    "url": "https://blog.csdn.net/qq_39280718?type=blog",
                    "author": "钟离"
                }
            },
            {
                "_index": "golang",
                "_type": "_doc",
                "_id": "003",
                "_score": 0.25613075,
                "_source": {
                    "name": "Psych",
                    "url": "https://blog.csdn.net/qq_39280718?type=blog",
                    "author": "PPPsych"
                }
            },
            {
                "_index": "golang",
                "_type": "_doc",
                "_id": "001",
                "_score": 0.25613075,
                "_source": {
                    "name": "ROOT",
                    "url": "https://blog.csdn.net/qq_39280718?type=blog",
                    "author": "PPPsych"
                }
            },
            {
                "_index": "golang",
                "_type": "_doc",
                "_id": "002",
                "_score": 0.25613075,
                "_source": {
                    "name": "root",
                    "url": "https://blog.csdn.net/qq_39280718?type=blog",
                    "author": "PPPsych"
                }
            }
        ]
    }
}

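Four results come back, covering both the PPPsych and the 钟离 documents. This shows that the query string is analyzed (tokenized) too: with the default standard analyzer, PPPsych钟离 becomes the tokens pppsych, 钟 and 离. Each token is then looked up in the inverted index, and by default any document matching at least one token is returned; the 钟离 document scores highest because it matches two tokens. This is what full-text search does.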
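The analyze-then-look-up process described above can be sketched with a tiny inverted index. This is an illustration under assumptions, not Elasticsearch's implementation: the regular expression only approximates the standard analyzer, and counting matched tokens is a crude proxy for relevance, whereas Elasticsearch actually scores with BM25. The count does, however, reflect why the 钟离 document ranks first.

```python
import re
from collections import defaultdict

# Rough stand-in for the standard analyzer: lowercase ASCII words, one token
# per CJK character.
TOKEN = re.compile(r"[a-z0-9]+|[\u4e00-\u9fff]")


def analyze(text: str) -> list:
    return TOKEN.findall(text.lower())


AUTHORS = {"001": "PPPsych", "002": "PPPsych", "003": "PPPsych", "005": "钟离"}

# Inverted index: token -> set of ids of documents containing it.
inverted = defaultdict(set)
for doc_id, author in AUTHORS.items():
    for token in analyze(author):
        inverted[token].add(doc_id)


def match_query(query: str) -> dict:
    """Analyze the query, then count matched tokens per document (OR semantics)."""
    counts = defaultdict(int)
    for token in analyze(query):
        for doc_id in inverted.get(token, ()):
            counts[doc_id] += 1
    return dict(counts)


print(analyze("PPPsych钟离"))      # ['pppsych', '钟', '离']
print(match_query("PPPsych钟离"))  # 005 matches 2 tokens, the others 1 each
```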

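Phrase matching: match_phrase

If we don't want documents that merely contain some of the query's tokens, but want the query to match as a whole phrase, we can use match_phrase instead of match. Change the request body to: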

{
    "query":{
        "match_phrase": {
            "author":"PPPsych钟离"
        }
    }
}

The response:

{
    "took": 75,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 0,
            "relation": "eq"
        },
        "max_score": null,
        "hits": []
    }
}

This time nothing is found: no document's author field contains the tokens of PPPsych钟离 as one contiguous phrase.

What if we change author to just 钟? Does this work like a LIKE fuzzy query in a relational database?

{
    "query":{
        "match_phrase": {
            "author":"钟"
        }
    }
}

The response:

{
    "took": 1,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 1,
            "relation": "eq"
        },
        "max_score": 1.1921031,
        "hits": [
            {
                "_index": "golang",
                "_type": "_doc",
                "_id": "005",
                "_score": 1.1921031,
                "_source": {
                    "name": "Morax",
                    "url": "https://blog.csdn.net/qq_39280718?type=blog",
                    "author": "钟离"
                }
            }
        ]
    }
}

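This time the matching document is returned. Note that match_phrase still analyzes the query string; what it adds is that all resulting tokens must appear in the field in the same order and next to each other. 钟 matches 钟离 because the standard analyzer indexes each Chinese character as its own token, so 钟 is a token of 钟离. The resemblance to LIKE '%钟%' is therefore only a coincidence of tokenization: match_phrase cannot match part of a token, so for example Psy would not match Psych.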
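Phrase matching can be sketched under the same kind of assumptions: the query is analyzed like the field, and a document matches only if the query's tokens appear consecutively and in order. The analyzer is again a rough approximation, and match_phrase here is a hypothetical local function, not a client API; Elasticsearch's slop parameter, which allows tokens to be further apart, is not modeled.

```python
import re

# Rough stand-in for the standard analyzer: lowercase ASCII words, one token
# per CJK character.
TOKEN = re.compile(r"[a-z0-9]+|[\u4e00-\u9fff]")


def analyze(text: str) -> list:
    return TOKEN.findall(text.lower())


def match_phrase(field_text: str, query: str) -> bool:
    """True if the query's tokens occur consecutively and in order in the field."""
    q, f = analyze(query), analyze(field_text)
    if not q:
        return False
    return any(f[i:i + len(q)] == q for i in range(len(f) - len(q) + 1))


print(match_phrase("钟离", "钟"))              # True: 钟 is a token of 钟离
print(match_phrase("钟离", "离钟"))            # False: order matters
print(match_phrase("PPPsych", "PPPsych钟离"))  # False: 钟 and 离 are missing
print(match_phrase("Psych", "Psy"))            # False, whereas LIKE '%Psy%' would match
```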

Highlighted queries: highlight

To highlight the matched keywords in the results, as Baidu or Google do, add a highlight clause to the query.

The request body:

{
    "query":{
        "match_phrase": {
            "author":"钟"
        }
    },
    "highlight": {
        "fields": {
            "author": {
                
            }
        }
    }
}

The response:

{
    "took": 27,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 1,
            "relation": "eq"
        },
        "max_score": 1.1921031,
        "hits": [
            {
                "_index": "golang",
                "_type": "_doc",
                "_id": "005",
                "_score": 1.1921031,
                "_source": {
                    "name": "Morax",
                    "url": "https://blog.csdn.net/qq_39280718?type=blog",
                    "author": "钟离"
                },
                "highlight": {
                    "author": [
                        "<em>钟</em>离"
                    ]
                }
            }
        ]
    }
}

The response now contains an extra highlight field, in which the matched keyword is wrapped in <em> tags.
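A rough local approximation of what the highlighter does is shown below: every token of the field that also occurs in the analyzed query is wrapped in tags. The highlight function is hypothetical and uses an approximate tokenizer; in Elasticsearch the tags can be changed with the pre_tags and post_tags options, which default to <em> and </em>.

```python
import re

# Rough stand-in for the standard analyzer: lowercase ASCII words, one token
# per CJK character.
TOKEN = re.compile(r"[a-z0-9]+|[\u4e00-\u9fff]")


def highlight(text: str, query: str, pre: str = "<em>", post: str = "</em>") -> str:
    """Wrap every token of text that also occurs in the analyzed query."""
    terms = set(TOKEN.findall(query.lower()))
    parts, last = [], 0
    # Offsets from the lowercased copy line up with the original text here
    # because lowercasing ASCII letters and CJK characters keeps lengths.
    for m in TOKEN.finditer(text.lower()):
        if m.group() in terms:
            parts.append(text[last:m.start()])
            parts.append(pre + text[m.start():m.end()] + post)
            last = m.end()
    parts.append(text[last:])
    return "".join(parts)


print(highlight("钟离", "钟"))                # <em>钟</em>离
print(highlight("钟离", "钟", "<b>", "</b>"))  # <b>钟</b>离
```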

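ES Aggregation Queries

Counting, grouping and other statistics over query results are done with aggregations, which are specified under a dedicated parameter named aggs.

Grouping: terms

Suppose the index shopping contains the following documents: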

{
    "took": 654,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 3,
            "relation": "eq"
        },
        "max_score": 1.0,
        "hits": [
            {
                "_index": "shopping",
                "_type": "_doc",
                "_id": "1",
                "_score": 1.0,
                "_source": {
                    "title": "sk2爽肤水",
                    "category": "护肤品",
                    "image": "https://www.tb.com",
                    "price": 899.0
                }
            },
            {
                "_index": "shopping",
                "_type": "_doc",
                "_id": "2",
                "_score": 1.0,
                "_source": {
                    "title": "眼霜",
                    "category": "化妆品",
                    "image": "https://www.tb.com",
                    "price": 249.0
                }
            },
            {
                "_index": "shopping",
                "_type": "_doc",
                "_id": "3",
                "_score": 1.0,
                "_source": {
                    "title": "拍立得",
                    "category": "数码产品",
                    "image": "https://www.tb.com",
                    "price": 299.0
                }
            }
        ]
    }
}

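To group by price, the request body is as follows (aggs holds the aggregations, and category_group is a name of your choosing for this aggregation):

{
    "aggs": {
        "category_group":{
            "terms":{
                "field":"price"
            }
        }
    }
}

Sending it as a GET request returns: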

{
    "took": 16,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 3,
            "relation": "eq"
        },
        "max_score": 1.0,
        "hits": [
            {
                "_index": "shopping",
                "_type": "_doc",
                "_id": "1",
                "_score": 1.0,
                "_source": {
                    "title": "sk2爽肤水",
                    "category": "护肤品",
                    "image": "https://www.tb.com",
                    "price": 899.0
                }
            },
            {
                "_index": "shopping",
                "_type": "_doc",
                "_id": "2",
                "_score": 1.0,
                "_source": {
                    "title": "眼霜",
                    "category": "化妆品",
                    "image": "https://www.tb.com",
                    "price": 249.0
                }
            },
            {
                "_index": "shopping",
                "_type": "_doc",
                "_id": "3",
                "_score": 1.0,
                "_source": {
                    "title": "拍立得",
                    "category": "数码产品",
                    "image": "https://www.tb.com",
                    "price": 299.0
                }
            }
        ]
    },
    "aggregations": {
        "category_group": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
                {
                    "key": 249.0,
                    "doc_count": 1
                },
                {
                    "key": 299.0,
                    "doc_count": 1
                },
                {
                    "key": 899.0,
                    "doc_count": 1
                }
            ]
        }
    }
}

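This produces one bucket per price value, each with its document count.

The response also still contains the original documents in hits. To leave them out, add a size parameter set to 0: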

{
    "aggs":{
        "category_group":{
            "terms":{
                "field":"price"
            }
        }
    },
    "size":0
}

The response:

{
    "took": 4,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 3,
            "relation": "eq"
        },
        "max_score": null,
        "hits": []
    },
    "aggregations": {
        "category_group": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
                {
                    "key": 249.0,
                    "doc_count": 1
                },
                {
                    "key": 299.0,
                    "doc_count": 1
                },
                {
                    "key": 899.0,
                    "doc_count": 1
                }
            ]
        }
    }
}
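The bucketing itself can be approximated in a few lines of Python. terms_agg is a hypothetical helper: it counts distinct values, orders buckets by doc_count descending with ties broken by key ascending (which matches the order in the response above), and keeps at most size buckets, mirroring the terms aggregation's default of 10 buckets.

```python
from collections import Counter

# price values of the three shopping documents listed above
prices = [899.0, 249.0, 299.0]


def terms_agg(values: list, size: int = 10) -> list:
    """Approximate a terms aggregation: one bucket per distinct value,
    ordered by doc_count descending, ties broken by key ascending."""
    counts = Counter(values)
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [{"key": k, "doc_count": c} for k, c in ordered[:size]]


print(terms_agg(prices))
```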

Average: avg

To get the average price, the request body is:

{
    "aggs":{
        "category_group":{
            "avg":{
                "field":"price"
            }
        }
    },
    "size":0
}

The response:

{
    "took": 2,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 3,
            "relation": "eq"
        },
        "max_score": null,
        "hits": []
    },
    "aggregations": {
        "category_group": {
            "value": 482.3333333333333
        }
    }
}
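As a quick check, the avg aggregation is the sum of the field values divided by the number of documents that have the field. The sketch below recomputes it in Python from the three prices listed earlier and also shows the request body as a Python dict.

```python
prices = [899.0, 249.0, 299.0]  # price values of the shopping documents above

# Request body for the avg aggregation, as a Python dict.
body = {"aggs": {"category_group": {"avg": {"field": "price"}}}, "size": 0}

avg = sum(prices) / len(prices)
print(avg)  # 482.3333333333333, matching "value" in the response above
```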

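ES Document Mappings

mapping: the field mapping.

In a relational database we have to create the table schema before inserting data, yet none of the document operations above created any structure first. ES indeed does not require anything like a table schema, but it does let you create one.
In ES this schema is called a mapping. Its main purpose is to define each field, including whether it is analyzed (tokenized) and whether it is indexed (searchable).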

Preparing test data

First create a new index named student.

Send a PUT request to http://127.0.0.1:9200/student to create it.

Creating the mapping

After creating the index, we define a mapping on it.

Creating a mapping also uses a PUT request, sent to http://127.0.0.1:9200/student/_mapping.

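In this mapping, name is defined as a text field and sex as a keyword field; the tests below rely on these two types.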
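As an illustration, a mapping of this kind can be written as a Python dict and prepared as a PUT request with the standard library. The name and sex definitions follow the types described above; the tel definition (a keyword field with "index": false, kept in _source but not searchable) is an assumption, since this article does not state how tel is mapped. put_json is a hypothetical helper that builds the request without sending it.

```python
import json
import urllib.request

BASE = "http://127.0.0.1:9200"  # assumed local node, as in the article

# name: text (analyzed, full-text searchable); sex: keyword (one exact term).
# ASSUMPTION: tel as a non-indexed keyword, for illustration only.
mapping = {
    "properties": {
        "name": {"type": "text", "index": True},
        "sex": {"type": "keyword", "index": True},
        "tel": {"type": "keyword", "index": False},
    }
}


def put_json(path: str, body: dict) -> urllib.request.Request:
    """Prepare (without sending) a PUT request with a JSON body."""
    return urllib.request.Request(
        f"{BASE}{path}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


req = put_json("/student/_mapping", mapping)
# With a node running, urllib.request.urlopen(req) would apply the mapping.
```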

Inserting test data

Insert three documents:

Document 1:

{
    "name": "可莉",
    "sex": "女",
    "tel": "14100000000"
}

Document 2:

{
    "name": "魈",
    "sex": "男",
    "tel": "16100000000"
}

Document 3:

{
    "name": "一斗",
    "sex": "男",
    "tel": "19100000000"
}

Query the existing data:

{
    "took": 554,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 3,
            "relation": "eq"
        },
        "max_score": 1.0,
        "hits": [
            {
                "_index": "student",
                "_type": "_doc",
                "_id": "1",
                "_score": 1.0,
                "_source": {
                    "name": "可莉",
                    "sex": "女",
                    "tel": "14100000000"
                }
            },
            {
                "_index": "student",
                "_type": "_doc",
                "_id": "2",
                "_score": 1.0,
                "_source": {
                    "name": "魈",
                    "sex": "男",
                    "tel": "16100000000"
                }
            },
            {
                "_index": "student",
                "_type": "_doc",
                "_id": "3",
                "_score": 1.0,
                "_source": {
                    "name": "一斗",
                    "sex": "男",
                    "tel": "19100000000"
                }
            }
        ]
    }
}

The result shows that the test data was inserted successfully.

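Tests

Querying by name

The results of querying by name show that the name field supports full-text search, confirming that a text field is analyzed and can be searched by its tokens.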

Searching on the `keyword` field `sex`

First, query sex for the value 男 (male):

{
    "query":{
        "match":{
            "sex":"男"
        }
    }
}

The result:

{
    "took": 1,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 2,
            "relation": "eq"
        },
        "max_score": 0.4700036,
        "hits": [
            {
                "_index": "student",
                "_type": "_doc",
                "_id": "2",
                "_score": 0.4700036,
                "_source": {
                    "name": "魈",
                    "sex": "男",
                    "tel": "16100000000"
                }
            },
            {
                "_index": "student",
                "_type": "_doc",
                "_id": "3",
                "_score": 0.4700036,
                "_source": {
                    "name": "一斗",
                    "sex": "男",
                    "tel": "19100000000"
                }
            }
        ]
    }
}
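Because sex is a keyword field, its value is indexed as one whole term instead of being split into tokens. The following local sketch contrasts the two field types under a rough approximation of the standard analyzer; index_terms and field_matches are hypothetical helpers, and the keyword case for name is hypothetical too, since name is mapped as text here.

```python
import re

# Rough stand-in for the standard analyzer: lowercase ASCII words, one token
# per CJK character.
TOKEN = re.compile(r"[a-z0-9]+|[\u4e00-\u9fff]")


def index_terms(value: str, field_type: str) -> list:
    """Terms stored in the inverted index for one field value."""
    if field_type == "keyword":
        return [value]                   # keyword: the whole value, unanalyzed
    return TOKEN.findall(value.lower())  # text: analyzed into tokens


def field_matches(value: str, field_type: str, query: str) -> bool:
    """match query: the query is analyzed the same way as the field."""
    return bool(set(index_terms(query, field_type)) & set(index_terms(value, field_type)))


print(field_matches("男", "keyword", "男"))    # True: exact value
print(field_matches("一斗", "text", "斗"))     # True: 斗 is one of the tokens
print(field_matches("一斗", "keyword", "斗"))  # False: only 一斗 as a whole matches
```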