Advanced Document Operations in Elasticsearch

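This article covers advanced query operations in Elasticsearch: conditional queries via URL and request body, pagination, sorting, multi-condition queries, full-text search, aggregations, and document mappings. It explains the use of match, match_phrase, should, filter, and highlight, and shows how to create and use a mapping.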


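ES Conditional Queries

The earlier document operations only fetched a document by its id or returned everything in an index. In practice that is rarely enough to cover varied retrieval needs, so searching by condition is essential.

Query via URL Parameters

The URL format for a conditional query is http://127.0.0.1:9200/<index>/_search?q=<condition>.

For example, to fetch all documents whose author is PPPsych:

  1. Use the URL http://127.0.0.1:9200/golang/_search?q=author:PPPsych.

  2. Set the request method to GET.

  3. Send the request and inspect the response: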

    {
        "took": 16,
        "timed_out": false,
        "_shards": {
            "total": 1,
            "successful": 1,
            "skipped": 0,
            "failed": 0
        },
        "hits": {
            "total": {
                "value": 3,
                "relation": "eq"
            },
            "max_score": 0.25613075,
            "hits": [
                {
                    "_index": "golang",
                    "_type": "_doc",
                    "_id": "003",
                    "_score": 0.25613075,
                    "_source": {
                        "name": "Psych",
                        "url": "https://blog.csdn.net/qq_39280718?type=blog",
                        "author": "PPPsych"
                    }
                },
                {
                    "_index": "golang",
                    "_type": "_doc",
                    "_id": "001",
                    "_score": 0.25613075,
                    "_source": {
                        "name": "ROOT",
                        "url": "https://blog.csdn.net/qq_39280718?type=blog",
                        "author": "PPPsych"
                    }
                },
                {
                    "_index": "golang",
                    "_type": "_doc",
                    "_id": "002",
                    "_score": 0.25613075,
                    "_source": {
                        "name": "root",
                        "url": "https://blog.csdn.net/qq_39280718?type=blog",
                        "author": "PPPsych"
                    }
                }
            ]
        }
    }
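
A URL of this form can also be assembled programmatically. The sketch below uses only Python's standard library; the helper name build_search_url is an assumption made for illustration. It percent-encodes the q value, which matters as soon as the condition contains non-ASCII text such as Chinese.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def build_search_url(base: str, index: str, field: str, value: str) -> str:
    """Return a URL of the form <base>/<index>/_search?q=<field>:<value>."""
    # urlencode percent-encodes the value as UTF-8, so the URL stays pure ASCII.
    query = urlencode({"q": f"{field}:{value}"})
    return f"{base}/{index}/_search?{query}"


url = build_search_url("http://127.0.0.1:9200", "golang", "author", "钟离")
print(url)
# Decoding the query string gives back the original condition.
print(parse_qs(urlsplit(url).query)["q"][0])
```

The encoded URL can be pasted into any HTTP client; the server decodes it back to the original condition.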
    

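Query via Request Body

As the example above shows, putting query conditions in the URL quickly becomes unwieldy, and non-ASCII text such as Chinese in a URL is prone to encoding problems that break the query. Passing the conditions in the request body avoids both issues, and ES conditional queries support it.
The URL is then fixed: http://127.0.0.1:9200/golang/_search
In the request body, the outermost key is query and the matching clause under it is match. The value of match is an object in which you fill in the field name and the text to match, here author and PPPsych.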
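
As a minimal sketch of that structure, the snippet below builds the request with Python's standard library without sending it. The helper name match_search_request is an assumption; the body shape (query, then match, then field and text) follows the description above.

```python
import json
from urllib.request import Request


def match_search_request(base: str, index: str, field: str, text: str) -> Request:
    """Build (but do not send) a GET _search request carrying a match query."""
    body = {"query": {"match": {field: text}}}
    return Request(
        url=f"{base}/{index}/_search",
        data=json.dumps(body, ensure_ascii=False).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="GET",
    )


req = match_search_request("http://127.0.0.1:9200", "golang", "author", "PPPsych")
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
# Against a running node, urllib.request.urlopen(req) would send it.
```

Because the condition travels in a UTF-8 JSON body, no URL encoding of the search text is needed.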

Send the request and inspect the response:

{
    "took": 2,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 3,
            "relation": "eq"
        },
        "max_score": 0.25613075,
        "hits": [
            {
                "_index": "golang",
                "_type": "_doc",
                "_id": "003",
                "_score": 0.25613075,
                "_source": {
                    "name": "Psych",
                    "url": "https://blog.csdn.net/qq_39280718?type=blog",
                    "author": "PPPsych"
                }
            },
            {
                "_index": "golang",
                "_type": "_doc",
                "_id": "001",
                "_score": 0.25613075,
                "_source": {
                    "name": "ROOT",
                    "url": "https://blog.csdn.net/qq_39280718?type=blog",
                    "author": "PPPsych"
                }
            },
            {
                "_index": "golang",
                "_type": "_doc",
                "_id": "002",
                "_score": 0.25613075,
                "_source": {
                    "name": "root",
                    "url": "https://blog.csdn.net/qq_39280718?type=blog",
                    "author": "PPPsych"
                }
            }
        ]
    }
}

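The result is identical to that of the URL-parameter query.

ES Paginated Queries

Pagination simply adds paging information to the body of a conditional query.
Add two key fields to the body:

  1. from: the offset of the first hit to return, starting at 0.
  2. size: the number of hits per page.

For example, to page through all documents whose author is PPPsych, use this request body: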

{
    "query":{
        "match": {
            "author":"PPPsych"
        }
    },
    "from":1,
    "size":1
}

The response is:

{
    "took": 3,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 3,
            "relation": "eq"
        },
        "max_score": 0.25613075,
        "hits": [
            {
                "_index": "golang",
                "_type": "_doc",
                "_id": "001",
                "_score": 0.25613075,
                "_source": {
                    "name": "ROOT",
                    "url": "https://blog.csdn.net/qq_39280718?type=blog",
                    "author": "PPPsych"
                }
            }
        ]
    }
}

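Because size is 1, only one matching document is returned, and because from is 1 it is the second of the three hits. In general, from for a given page is computed as (page number - 1) * page size.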
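
The page arithmetic can be sketched as follows; page_window is a hypothetical helper name, and page numbers are assumed to start at 1.

```python
def page_window(page: int, size: int) -> dict:
    """Translate a 1-based page number into the from/size fields of a search body."""
    if page < 1 or size < 1:
        raise ValueError("page and size must be positive")
    return {"from": (page - 1) * size, "size": size}


# Page 2 with one hit per page reproduces the from/size values used above.
body = {"query": {"match": {"author": "PPPsych"}}, **page_window(page=2, size=1)}
print(body)
```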

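Building on this, we can also control which fields are returned.

When we want only some of the fields rather than all of them, the _source field controls that.

For example, to return only the name field of the documents whose author is PPPsych:

The request body is: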

{
    "query":{
        "match": {
            "author":"PPPsych"
        }
    },
    "from":1,
    "size":1,
    "_source":"name"
}

The response now carries only the name field in the _source of each hit, which is exactly the field we asked for.
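
As a small sketch, the _source restriction can be attached to any search body. The helper name with_source_fields is an assumption made here; _source also accepts a list, which is convenient when more than one field is wanted.

```python
def with_source_fields(body: dict, fields: list) -> dict:
    """Return a copy of a search body that asks for only the given _source fields."""
    return {**body, "_source": list(fields)}


base_body = {"query": {"match": {"author": "PPPsych"}}, "from": 1, "size": 1}
print(with_source_fields(base_body, ["name"]))
```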

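ES Sorting Query Results

Results can be sorted by a document field by adding a sort field to the query.

For example, to sort by id, the request body is: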

{
    "query":{
        "match_all": {
        }
    },
    "sort":{
        "_id":{
            "order":"asc"
        }
    }
}

Result:

{
    "took": 22,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 4,
            "relation": "eq"
        },
        "max_score": null,
        "hits": [
            {
                "_index": "golang",
                "_type": "_doc",
                "_id": "001",
                "_score": null,
                "_source": {
                    "name": "ROOT",
                    "url": "https://blog.csdn.net/qq_39280718?type=blog",
                    "author": "PPPsych"
                },
                "sort": [
                    "001"
                ]
            },
            {
                "_index": "golang",
                "_type": "_doc",
                "_id": "002",
                "_score": null,
                "_source": {
                    "name": "root",
                    "url": "https://blog.csdn.net/qq_39280718?type=blog",
                    "author": "PPPsych"
                },
                "sort": [
                    "002"
                ]
            },
            {
                "_index": "golang",
                "_type": "_doc",
                "_id": "003",
                "_score": null,
                "_source": {
                    "name": "Psych",
                    "url": "https://blog.csdn.net/qq_39280718?type=blog",
                    "author": "PPPsych"
                },
                "sort": [
                    "003"
                ]
            },
            {
                "_index": "golang",
                "_type": "_doc",
                "_id": "005",
                "_score": null,
                "_source": {
                    "name": "Morax",
                    "url": "https://blog.csdn.net/qq_39280718?type=blog",
                    "author": "钟离"
                },
                "sort": [
                    "005"
                ]
            }
        ]
    }
}

Under the sort field, order sets the sort direction:

  1. asc: ascending
  2. desc: descending
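
A minimal sketch of building the sort section, with the direction validated up front; sort_clause is a hypothetical helper name.

```python
def sort_clause(field: str, order: str = "asc") -> dict:
    """Build the sort section of a search body; order must be asc or desc."""
    if order not in ("asc", "desc"):
        raise ValueError(f"unsupported order: {order!r}")
    return {field: {"order": order}}


# Same query as above, but sorted by id in descending order.
body = {"query": {"match_all": {}}, "sort": sort_clause("_id", "desc")}
print(body)
```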

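ES Multi-Condition Queries

All conditions must match: must

With multiple conditions, match can no longer sit directly under query. Wrap the conditions in bool instead, and list the ones that must all hold under must.

For example, the request body is: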

{
    "query":{
        "bool": {
            "must":[
                {
                    "match":{
                        "author":"PPPsych"
                    }
                }
            ]
        }
    }
}

So far this behaves exactly like the single-condition query. Now add a second condition:

{
    "query":{
        "bool": {
            "must":[
                {
                    "match":{
                        "author":"PPPsych"
                    }
                },
                {
                    "match":{
                        "name":"Psych"
                    }
                }
            ]
        }
    }
}

Result:

{
    "took": 6,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 1,
            "relation": "eq"
        },
        "max_score": 0.9492779,
        "hits": [
            {
                "_index": "golang",
                "_type": "_doc",
                "_id": "003",
                "_score": 0.9492779,
                "_source": {
                    "name": "Psych",
                    "url": "https://blog.csdn.net/qq_39280718?type=blog",
                    "author": "PPPsych"
                }
            }
        ]
    }
}

The returned document satisfies both "author":"PPPsych" and "name":"Psych".
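
The nesting of bool queries gets verbose quickly, so a small builder can help. The sketch below is one possible shape; the helper names bool_query and match are assumptions made for illustration.

```python
def match(field: str, text: str) -> dict:
    """A single match clause."""
    return {"match": {field: text}}


def bool_query(must=None, should=None, filters=None) -> dict:
    """Assemble a bool query, leaving out clause types that were not supplied."""
    clauses = {"must": must, "should": should, "filter": filters}
    return {"query": {"bool": {k: v for k, v in clauses.items() if v}}}


# Both conditions have to hold, as in the request body above.
body = bool_query(must=[match("author", "PPPsych"), match("name", "Psych")])
print(body)
```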

Any condition may match: should

should is similar to OR in SQL.

For example, to find documents whose author is PPPsych or 钟离.

The request body is:

{
    "query":{
        "bool": {
            "should":[
                {
                    "match":{
                        "author":"PPPsych"
                    }
                },
                {
                    "match":{
                        "author":"钟离"
                    }
                }
            ]
        }
    }
}

Result:

{
    "took": 192,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 4,
            "relation": "eq"
        },
        "max_score": 2.3842063,
        "hits": [
            {
                "_index": "golang",
                "_type": "_doc",
                "_id": "005",
                "_score": 2.3842063,
                "_source": {
                    "name": "Morax",
                    "url": "https://blog.csdn.net/qq_39280718?type=blog",
                    "author": "钟离"
                }
            },
            {
                "_index": "golang",
                "_type": "_doc",
                "_id": "003",
                "_score": 0.25613075,
                "_source": {
                    "name": "Psych",
                    "url": "https://blog.csdn.net/qq_39280718?type=blog",
                    "author": "PPPsych"
                }
            },
            {
                "_index": "golang",
                "_type": "_doc",
                "_id": "001",
                "_score": 0.25613075,
                "_source": {
                    "name": "ROOT",
                    "url": "https://blog.csdn.net/qq_39280718?type=blog",
                    "author": "PPPsych"
                }
            },
            {
                "_index": "golang",
                "_type": "_doc",
                "_id": "002",
                "_score": 0.25613075,
                "_source": {
                    "name": "root",
                    "url": "https://blog.csdn.net/qq_39280718?type=blog",
                    "author": "PPPsych"
                }
            }
        ]
    }
}

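Range conditions: filter

A filter clause can be added to the query to restrict the range of results.

For example, to search a product index for items whose category is 床上用品 (bedding) or 3C数码 (3C digital products) and whose price is above 500, use the request body below. Once a bool query contains a filter, its should clauses become optional by default, so minimum_should_match is set to 1 to keep the category condition mandatory:

{
    "query":{
        "bool": {
            "should":[
                {
                    "match":{
                        "category":"床上用品"
                    }
                },
                {
                    "match":{
                        "category":"3C数码"
                    }
                }
            ],
            "minimum_should_match":1,
            "filter":{
                "range":{
                    "price":{
                        "gt":500
                    }
                }
            }
        }
    }
}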
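
The intended logic is "(category A or category B) and price > 500". In a bool query that has a filter, should clauses only count as required when minimum_should_match is set; otherwise the filter alone decides which documents match. The sketch below models both readings in plain Python. The product list is hypothetical, invented purely for illustration, and category matching is simplified to plain equality.

```python
# Hypothetical sample products, invented for this illustration only.
products = [
    {"title": "duvet", "category": "床上用品", "price": 650},
    {"title": "pillow", "category": "床上用品", "price": 120},
    {"title": "phone", "category": "3C数码", "price": 3999},
    {"title": "sofa", "category": "家具", "price": 2400},
]


def category_and_price(doc: dict) -> bool:
    """(category is one of the two) AND price > 500."""
    in_category = doc["category"] in ("床上用品", "3C数码")
    return in_category and doc["price"] > 500


def price_only(doc: dict) -> bool:
    """What is left when the should clauses are optional: only the range filter."""
    return doc["price"] > 500


print([d["title"] for d in products if category_and_price(d)])
print([d["title"] for d in products if price_only(d)])
```

The second list additionally contains the sofa, which is exactly the kind of unwanted hit that appears when the category condition is not enforced.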

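ES Full-Text Search

Full-text search: match

Earlier we searched for documents whose author is PPPsych or 钟离. What happens if we search for author = PPPsych钟离 as a single string: will anything be found? In a traditional relational database, such a query would return nothing.

The request body is: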

{
    "query":{
        "match": {
            "author":"PPPsych钟离"
        }
    }
}

Result:

{
    "took": 2,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 4,
            "relation": "eq"
        },
        "max_score": 2.3842063,
        "hits": [
            {
                "_index": "golang",
                "_type": "_doc",
                "_id": "005",
                "_score": 2.3842063,
                "_source": {
                    "name": "Morax",
                    "url": "https://blog.csdn.net/qq_39280718?type=blog",
                    "author": "钟离"
                }
            },
            {
                "_index": "golang",
                "_type": "_doc",
                "_id": "003",
                "_score": 0.25613075,
                "_source": {
                    "name": "Psych",
                    "url": "https://blog.csdn.net/qq_39280718?type=blog",
                    "author": "PPPsych"
                }
            },
            {
                "_index": "golang",
                "_type": "_doc",
                "_id": "001",
                "_score": 0.25613075,
                "_source": {
                    "name": "ROOT",
                    "url": "https://blog.csdn.net/qq_39280718?type=blog",
                    "author": "PPPsych"
                }
            },
            {
                "_index": "golang",
                "_type": "_doc",
                "_id": "002",
                "_score": 0.25613075,
                "_source": {
                    "name": "root",
                    "url": "https://blog.csdn.net/qq_39280718?type=blog",
                    "author": "PPPsych"
                }
            }
        ]
    }
}

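As shown above, the query returned 4 hits, covering documents whose author is PPPsych as well as those whose author is 钟离. The query text itself is analyzed into terms, just like indexed text, and those terms are looked up in the inverted index; a document matches if it contains any of them. That is full-text search at work.

Phrase matching: match_phrase

If we do not want this any-term behavior and instead want the query text matched as a whole, use match_phrase in place of match. Change the request body to: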

{
    "query":{
        "match_phrase": {
            "author":"PPPsych钟离"
        }
    }
}

Result:

{
    "took": 75,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 0,
            "relation": "eq"
        },
        "max_score": null,
        "hits": []
    }
}

No document has an author containing the phrase PPPsych钟离, so nothing is returned.

What if we change author to 钟? Will it behave like a LIKE fuzzy query in a relational database?

{
    "query":{
        "match_phrase": {
            "author":"钟"
        }
    }
}

Result:

{
    "took": 1,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 1,
            "relation": "eq"
        },
        "max_score": 1.1921031,
        "hits": [
            {
                "_index": "golang",
                "_type": "_doc",
                "_id": "005",
                "_score": 1.1921031,
                "_source": {
                    "name": "Morax",
                    "url": "https://blog.csdn.net/qq_39280718?type=blog",
                    "author": "钟离"
                }
            }
        ]
    }
}

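The query does return the document. Note, however, that match_phrase still analyzes the query text; what changes is that all resulting terms must occur in the document in the same order and next to each other. 钟 matches here because the default standard analyzer indexes each Chinese character as its own term, so for this data the effect resembles like '%钟%' in a relational database. It is not a general substring search: a fragment of a longer term, such as PPP for PPPsych, would not match.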
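
The difference between match and match_phrase can be modeled in a few lines of Python. This is a deliberately simplified sketch, not how ES is implemented: the analyze function is a toy stand-in for the standard analyzer, assumed here to lowercase runs of Latin letters and digits and to emit one term per Chinese character.

```python
import re


def analyze(text: str) -> list:
    """Toy analyzer: lowercase Latin/digit runs, one term per CJK character."""
    return [t.lower() for t in re.findall(r"[A-Za-z0-9]+|[\u4e00-\u9fff]", text)]


def match(doc_text: str, query: str) -> bool:
    """match with the default OR behavior: any query term occurs in the document."""
    doc_terms = set(analyze(doc_text))
    return any(term in doc_terms for term in analyze(query))


def match_phrase(doc_text: str, query: str) -> bool:
    """match_phrase: all query terms occur consecutively and in the same order."""
    doc, q = analyze(doc_text), analyze(query)
    return bool(q) and any(doc[i:i + len(q)] == q for i in range(len(doc) - len(q) + 1))


authors = ["PPPsych", "PPPsych", "PPPsych", "钟离"]  # author values of the four documents
print(sum(match(a, "PPPsych钟离") for a in authors))         # 4
print(sum(match_phrase(a, "PPPsych钟离") for a in authors))  # 0
print(sum(match_phrase(a, "钟") for a in authors))           # 1
print(sum(match_phrase(a, "PPP") for a in authors))          # 0
```

The first three counts agree with the hit totals in the responses above; the last one shows that a fragment of a longer term is not found.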

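Highlighting: highlight

To highlight the matched keywords in the results, as search engines such as Baidu and Google do, add a highlight section to the query.

The request body is: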

{
    "query":{
        "match_phrase": {
            "author":"钟"
        }
    },
    "highlight": {
        "fields": {
            "author": {
                
            }
        }
    }
}

Result:

{
    "took": 27,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 1,
            "relation": "eq"
        },
        "max_score": 1.1921031,
        "hits": [
            {
                "_index": "golang",
                "_type": "_doc",
                "_id": "005",
                "_score": 1.1921031,
                "_source": {
                    "name": "Morax",
                    "url": "https://blog.csdn.net/qq_39280718?type=blog",
                    "author": "钟离"
                },
                "highlight": {
                    "author": [
                        "<em>钟</em>离"
                    ]
                }
            }
        ]
    }
}

As shown above, each hit in the response gains a highlight field, in which the matched keyword is wrapped in em tags.
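
If em tags are not what the front end needs, the highlight section also accepts pre_tags and post_tags. A minimal sketch of attaching such a section to a search body follows; with_highlight is a hypothetical helper name and the span markup is only an example.

```python
def with_highlight(body: dict, field: str, pre_tag: str = "<em>", post_tag: str = "</em>") -> dict:
    """Return a copy of a search body with a highlight section for one field."""
    highlight = {"fields": {field: {}}}
    # Only send custom tags when they differ from the default em markup.
    if (pre_tag, post_tag) != ("<em>", "</em>"):
        highlight["pre_tags"] = [pre_tag]
        highlight["post_tags"] = [post_tag]
    return {**body, "highlight": highlight}


query = {"query": {"match_phrase": {"author": "钟"}}}
print(with_highlight(query, "author"))
print(with_highlight(query, "author", '<span class="hit">', "</span>"))
```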

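ES Aggregation Queries

Statistics, grouping, and similar operations over query results use aggregations, which are specified under the aggs parameter.

Grouped counts: terms

Suppose an index named shopping contains the following documents: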

{
    "took": 654,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 3,
            "relation": "eq"
        },
        "max_score": 1.0,
        "hits": [
            {
                "_index": "shopping",
                "_type": "_doc",
                "_id": "1",
                "_score": 1.0,
                "_source": {
                    "title": "sk2爽肤水",
                    "category": "护肤品",
                    "image": "https://www.tb.com",
                    "price": 899.0
                }
            },
            {
                "_index": "shopping",
                "_type": "_doc",
                "_id": "2",
                "_score": 1.0,
                "_source": {
                    "title": "眼霜",
                    "category": "化妆品",
                    "image": "https://www.tb.com",
                    "price": 249.0
                }
            },
            {
                "_index": "shopping",
                "_type": "_doc",
                "_id": "3",
                "_score": 1.0,
                "_source": {
                    "title": "拍立得",
                    "category": "数码产品",
                    "image": "https://www.tb.com",
                    "price": 299.0
                }
            }
        ]
    }
}

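To group the documents by price and count each group, the request body is:

{
    "aggs": { // aggregation section
        "category_group":{ // aggregation name, chosen freely
            "terms":{
                "field":"price"
            }
        }
    }
}

Sending it as a GET request returns: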

{
    "took": 16,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 3,
            "relation": "eq"
        },
        "max_score": 1.0,
        "hits": [
            {
                "_index": "shopping",
                "_type": "_doc",
                "_id": "1",
                "_score": 1.0,
                "_source": {
                    "title": "sk2爽肤水",
                    "category": "护肤品",
                    "image": "https://www.tb.com",
                    "price": 899.0
                }
            },
            {
                "_index": "shopping",
                "_type": "_doc",
                "_id": "2",
                "_score": 1.0,
                "_source": {
                    "title": "眼霜",
                    "category": "化妆品",
                    "image": "https://www.tb.com",
                    "price": 249.0
                }
            },
            {
                "_index": "shopping",
                "_type": "_doc",
                "_id": "3",
                "_score": 1.0,
                "_source": {
                    "title": "拍立得",
                    "category": "数码产品",
                    "image": "https://www.tb.com",
                    "price": 299.0
                }
            }
        ]
    },
    "aggregations": {
        "category_group": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
                {
                    "key": 249.0,
                    "doc_count": 1
                },
                {
                    "key": 299.0,
                    "doc_count": 1
                },
                {
                    "key": 899.0,
                    "doc_count": 1
                }
            ]
        }
    }
}

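The response now includes the groups (buckets) by price.

It also still includes the original documents. To leave them out, add a size parameter set to 0 to the request body.

Result: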

{
    "took": 4,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 3,
            "relation": "eq"
        },
        "max_score": null,
        "hits": []
    },
    "aggregations": {
        "category_group": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
                {
                    "key": 249.0,
                    "doc_count": 1
                },
                {
                    "key": 299.0,
                    "doc_count": 1
                },
                {
                    "key": 899.0,
                    "doc_count": 1
                }
            ]
        }
    }
}
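
For comparison, the same grouping can be reproduced locally. The sketch below shows the request body with size set to 0 as a Python dict and recomputes the buckets from the three price values with collections.Counter; terms_buckets is a hypothetical helper, and its ordering (count descending, then key) is chosen to mirror the response above.

```python
from collections import Counter

# Request body for the grouped count, with the documents themselves suppressed.
terms_body = {"aggs": {"category_group": {"terms": {"field": "price"}}}, "size": 0}

prices = [899.0, 249.0, 299.0]  # price values of the three shopping documents


def terms_buckets(values) -> list:
    """Group equal values and count them, shaped like terms aggregation buckets."""
    counts = Counter(values)
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [{"key": key, "doc_count": count} for key, count in ordered]


print(terms_buckets(prices))
```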

Averages: avg

To get the average price, use an avg aggregation on the price field in place of terms, again with size set to 0.

The result is:

{
    "took": 2,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 3,
            "relation": "eq"
        },
        "max_score": null,
        "hits": []
    },
    "aggregations": {
        "category_group": {
            "value": 482.3333333333333
        }
    }
}
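
A sketch of what such a request body could look like, written as a Python dict, together with a local check of the value. The aggregation name category_group is taken from the response above; the rest of the body is reconstructed from the description and should be read as an assumption.

```python
from statistics import mean

# Assumed request body: avg over price, no documents returned.
avg_body = {"aggs": {"category_group": {"avg": {"field": "price"}}}, "size": 0}

prices = [899.0, 249.0, 299.0]  # price values of the three shopping documents

# (899 + 249 + 299) / 3, which agrees with the value in the response above.
local_avg = mean(prices)
print(local_avg)
```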

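ES Document Mappings

mapping: the mapping of an index.

In a relational database, a table structure has to be created before data can be inserted. None of the document operations above created any structure first; ES indeed does not require one, but it does allow something similar to be defined.
In ES this structure is called a mapping. Its main job is to define each field's type, and with it whether the field is analyzed into terms and whether it can be searched.

Preparing Test Data

First create a new index named student by sending a PUT request to http://127.0.0.1:9200/student.

Creating the Mapping

Next, define a mapping on the new index.

This also uses a PUT request, sent to http://127.0.0.1:9200/student/_mapping. In the mapping used here, name is a text field and sex is a keyword field.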
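
A mapping body for this index might look like the sketch below, written as a Python dict and printed as JSON. The types of name (text) and sex (keyword) follow the tests later in this article; the settings for tel, a keyword field with index set to false, are an assumption made purely to illustrate a non-searchable field.

```python
import json


def field(type_: str, index: bool = True) -> dict:
    """Definition of one mapped field: its type and whether it is searchable."""
    return {"type": type_, "index": index}


mapping_body = {
    "properties": {
        "name": field("text"),                 # analyzed, full-text searchable
        "sex": field("keyword"),               # not analyzed, exact value
        "tel": field("keyword", index=False),  # assumption: stored but not searchable
    }
}

print(json.dumps(mapping_body, indent=4))
```

The printed JSON is what would go in the body of the PUT request to the _mapping URL.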

Inserting Test Data

Insert three documents:

Document 1:

{
    "name": "可莉",
    "sex": "女",
    "tel": "14100000000"
}

Document 2:

{
    "name": "魈",
    "sex": "男",
    "tel": "16100000000"
}

Document 3:

{
    "name": "一斗",
    "sex": "男",
    "tel": "19100000000"
}

Query the existing data:

{
    "took": 554,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 3,
            "relation": "eq"
        },
        "max_score": 1.0,
        "hits": [
            {
                "_index": "student",
                "_type": "_doc",
                "_id": "1",
                "_score": 1.0,
                "_source": {
                    "name": "可莉",
                    "sex": "女",
                    "tel": "14100000000"
                }
            },
            {
                "_index": "student",
                "_type": "_doc",
                "_id": "2",
                "_score": 1.0,
                "_source": {
                    "name": "魈",
                    "sex": "男",
                    "tel": "16100000000"
                }
            },
            {
                "_index": "student",
                "_type": "_doc",
                "_id": "3",
                "_score": 1.0,
                "_source": {
                    "name": "一斗",
                    "sex": "男",
                    "tel": "19100000000"
                }
            }
        ]
    }
}

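The response confirms that the test data was inserted successfully.

Testing

Query by name

The response shows that the name field supports full-text search, which confirms that text fields are analyzed for full-text search.

Search on sex, a keyword field

First set the sex value to search for to 男: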

{
    "query":{
        "match":{
            "sex":"男"
        }
    }
}

Result:

{
    "took": 1,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 2,
            "relation": "eq"
        },
        "max_score": 0.4700036,
        "hits": [
            {
                "_index": "student",
                "_type": "_doc",
                "_id": "2",
                "_score": 0.4700036,
                "_source": {
                    "name": "魈",
                    "sex": "男",
                    "tel": "16100000000"
                }
            },
            {
                "_index": "student",
                "_type": "_doc",
                "_id": "3",
                "_score": 0.4700036,
                "_source": {
                    "name": "一斗",
                    "sex": "男",
                    "tel": "19100000000"
                }
            }
        ]
    }
}

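The query matched exactly the documents whose sex is 男 and none of the others. A keyword field is not analyzed: its whole value is stored as a single term.

Fields with index set to true are searchable

Conversely, a field whose index is set to false cannot be searched; that case is not tested here.

Summary

  1. text type

    • Analyzed into terms, which are then indexed.
    • Supports full-text (fuzzy) matching on those terms.
    • Does not support aggregations by default.
  2. keyword type

    • Not analyzed; the whole value is indexed as one term.
    • Supports exact-value matching; pattern queries such as wildcard operate on the whole value.
    • Supports aggregations.
  3. index

    • Controls whether the field can be searched.
    • false: the field cannot be searched.
    • true: the field can be searched.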