Elasticsearch (4): Complex Searches in Elasticsearch

cc-lady

Article link: https://blog.csdn.net/cc907566076/article/details/78553950


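Query-string search is handy for ad hoc searches from the command line, but it has its limitations (see Search Lite). Elasticsearch provides a rich, flexible query language called the query DSL, which lets us build more complex and robust queries.
The domain-specific language (DSL) is expressed as a JSON request body. We can rewrite the earlier search for all employees named Smith like this: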
GET /megacorp/employee/_search
{
    "query" : {
        "match" : {
            "last_name" : "Smith"
        }
    }
}
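The results are the same as for the previous query, but a few things have changed. For one, we no longer use the query-string parameter and send a request body instead. The request body is built with JSON and uses a match query (one of several query types, which are covered later).

More complex searches

Now let's try a more complex search. We still want employees whose last name is Smith, but this time only those older than 30. The query needs a small adjustment: it uses a filter, which executes a structured query efficiently.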

GET /megacorp/employee/_search
{
    "query" : {
        "bool": {
            "must": {
                "match" : {
                    "last_name" : "smith"  
                }
            },
            "filter": {
                "range" : {
                    "age" : { "gt" : 30 }  
                }
            }
        }
    }
}

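The must clause is the same match query we used before.
The filter clause is a range filter that finds all documents with an age greater than 30; gt stands for "greater than".
Don't worry too much about the syntax for now; it is covered in more detail later. The point is that we added a filter that performs a range search and reused the earlier match query. The result now contains a single employee, Jane Smith, aged 32.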

{
   ...
   "hits": {
      "total":      1,
      "max_score":  0.30685282,
      "hits": [
         {
            ...
            "_source": {
               "first_name":  "Jane",
               "last_name":   "Smith",
               "age":         32,
               "about":       "I like to collect rock albums",
               "interests": [ "music" ]
            }
         }
      ]
   }
}

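A brief introduction to bool

First, a brief introduction to bool, which is a compound query
(reference: https://www.elastic.co/guide/en/elasticsearch/reference/6.0/query-dsl-bool-query.html).
A bool query matches documents that match boolean combinations of other queries. It maps to Lucene's BooleanQuery and is built from one or more boolean clauses, each with a typed occurrence. The occurrence types are:

Occur: Description
must: The clause (query) must appear in matching documents and contributes to the score.
filter: The clause (query) must appear in matching documents. Unlike must, however, the score of the query is ignored. Filter clauses are executed in filter context, meaning that scoring is ignored and the clauses are considered for caching.
should: The clause (query) should appear in the matching document. If the bool query is in a query context and has a must or filter clause, a document matches the bool query even if none of the should queries match; in that case the should clauses only influence the score. If the bool query is in a filter context, or has neither must nor filter, at least one of the should queries must match a document for it to match the bool query. This behavior can be controlled explicitly with the minimum_should_match parameter.
must_not: The clause (query) must not appear in matching documents. Clauses are executed in filter context, meaning that scoring is ignored and the clauses are considered for caching. Because scoring is ignored, a score of 0 is returned for all documents.

In short: must clauses have to match; filter clauses have to match but do not affect the score; should means at least one clause has to match when there is no must or filter; must_not clauses must not match.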
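
The matching rules above can be sketched in plain Java, without any Elasticsearch dependency. This is only a toy model under stated assumptions: the class BoolSketch and its methods are hypothetical names made up for illustration, a document is just a field-to-value map, only the query-context behavior is modeled, and scoring, caching and minimum_should_match are left out entirely.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

/** Toy in-memory model of bool-query matching; this is not Elasticsearch code. */
final class BoolSketch {

    /** Decides whether a document matches, following the query-context rules described above. */
    static boolean matches(Map<String, Object> doc,
                           List<Predicate<Map<String, Object>>> must,
                           List<Predicate<Map<String, Object>>> filter,
                           List<Predicate<Map<String, Object>>> should,
                           List<Predicate<Map<String, Object>>> mustNot) {
        // must and filter: every clause has to match.
        boolean required = must.stream().allMatch(p -> p.test(doc))
                && filter.stream().allMatch(p -> p.test(doc));
        // must_not: no clause may match.
        boolean excluded = mustNot.stream().anyMatch(p -> p.test(doc));
        // should: only mandatory when there is no must or filter clause at all.
        boolean shouldOk = should.isEmpty()
                || !(must.isEmpty() && filter.isEmpty())
                || should.stream().anyMatch(p -> p.test(doc));
        return required && !excluded && shouldOk;
    }

    /** Hypothetical demo: must(last_name == "Smith") combined with filter(age > 30). */
    static boolean smithOlderThan30(String lastName, int age) {
        Map<String, Object> doc = Map.of("last_name", lastName, "age", age);
        Predicate<Map<String, Object>> isSmith = d -> "Smith".equals(d.get("last_name"));
        Predicate<Map<String, Object>> olderThan30 = d -> (Integer) d.get("age") > 30;
        return matches(doc, List.of(isSmith), List.of(olderThan30), List.of(), List.of());
    }
}
```

With this sketch, smithOlderThan30("Smith", 32) matches while smithOlderThan30("Smith", 25) does not, which mirrors the Jane Smith result of the earlier request.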

Demonstrating the bool query in the client program

term

Add a method:

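    // At the top of the file (Elasticsearch transport client on the classpath):
    import org.elasticsearch.action.search.SearchRequestBuilder;
    import org.elasticsearch.action.search.SearchType;
    import org.elasticsearch.client.Client;
    import org.elasticsearch.index.query.QueryBuilders;
    import static org.elasticsearch.index.query.QueryBuilders.*;

    /*
     * A simple bool query: find employees whose last name is Smith
     * and who are older than 30.
     * must:   last_name equals "Smith" (term query)
     * filter: age greater than 30 (range query)
     */
    private static void findEmployeeByAgeAndName(Client client) {
        SearchRequestBuilder request = client.prepareSearch("megacorp1")
                .setTypes("employee1")
                .setSearchType(SearchType.DFS_QUERY_THEN_FETCH)
                .setQuery(QueryBuilders.boolQuery()
                        .must(termQuery("last_name", "Smith"))
                        .filter(rangeQuery("age").gt(30)));
        // request.get() executes the search and returns a SearchResponse.
        printResponseHits(request.get());
    }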

A helper method to print the results:

    // Print every hit in the response.
    // Needs: import org.elasticsearch.action.search.SearchResponse;
    //        import org.elasticsearch.search.SearchHit;
    private static void printResponseHits(SearchResponse response) {
        // SearchHits is iterable, so a for-each loop is enough.
        for (SearchHit hit : response.getHits()) {
            String index = hit.getIndex();
            String type = hit.getType();
            String id = hit.getId();
            float score = hit.getScore();
            System.out.println("index=" + index + " type=" + type + " id=" + id
                    + " score=" + score + " source-->" + hit.getSourceAsString());
        }
    }

Add the call to the main method:

// 5. Find employees named Smith and filter by age: a bool query example
findEmployeeByAgeAndName(client);

Output:
index=megacorp1 type=employee1 id=2 score=1.2809339 source-->{"first_name":"Jane","last_name":"Smith","age":"32","about":"I like to collect rock albums","interests":["music"]}
If you are interested, you can step through request in a debugger to inspect the generated bool request.

Head plugin example

[Screenshot not preserved]

match

Now let's write the same query with match.
Change the previous example as follows:

        SearchRequestBuilder request = client.prepareSearch("megacorp1")
                .setTypes("employee1")
                .setSearchType(SearchType.DFS_QUERY_THEN_FETCH)
                .setQuery(QueryBuilders.boolQuery()
                        .must(matchQuery("last_name", "Smith"))
                        .filter(rangeQuery("age").gt(30)));
        // request.get() executes the search and returns a SearchResponse.
        printResponseHits(request.get());

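Calling the method again returns:
index=megacorp1 type=employee1 id=2 score=1.3862944 source-->{"first_name":"Jane","last_name":"Smith","age":"32","about":"I like to collect rock albums","interests":["music"]}

Head plugin example

[Screenshot not preserved]
The two queries return the same document, because the fields in this index are not analyzed (tokenized).
Let's repeat the experiment on megacorp, the earlier index whose fields are analyzed: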

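term: rock climbing

[Screenshot not preserved]

match: rock climbing

[Screenshot not preserved]
The match results are sorted by relevance score: documents that match the whole text come first, and documents that match only one of the words come after them.
By default Elasticsearch sorts results by relevance score, that is, by how well each document matches the query. The top-scoring result is obvious: John Smith's about field clearly says "rock climbing".
But why was Jane Smith returned as well? Because her about field mentions "rock". Since it contains only "rock" and not "climbing", her relevance score is lower than John's.
This is a good example of how Elasticsearch searches full-text fields and returns the most relevant results first. The concept of relevance is very important in Elasticsearch, and it sets it apart from traditional relational databases, where a record either matches or does not.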

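Why the earlier term query on last_name returned nothing in this index is still an open question for me.
I tried adding a lastname field without the underscore; term still found nothing.
I tried changing the value to "Smith Smith", with a space in the middle; term still found nothing.
Explanations are welcome.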
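
One likely explanation, assuming last_name in megacorp is an analyzed text field using the default standard analyzer (the mapping is not shown here, so this is an assumption): the analyzer lowercases tokens at index time, so the index holds "smith", while a term query compares its input "Smith" as-is and finds nothing. A match query analyzes its input first, so it still hits. Under that assumption, a term query for "smith", or a term query against a non-analyzed keyword sub-field if the mapping defines one, should find the documents. The plain-Java sketch below imitates this; the class TermVsMatchSketch and its methods are hypothetical, and analyze is only a rough stand-in for the standard analyzer on English text.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

/** Toy model of term vs. match on an analyzed field; this is not Elasticsearch code. */
final class TermVsMatchSketch {

    /** Rough stand-in for the standard analyzer: split on non-alphanumerics, then lowercase. */
    static List<String> analyze(String text) {
        return Arrays.stream(text.split("[^\\p{L}\\p{N}]+"))
                .filter(token -> !token.isEmpty())
                .map(token -> token.toLowerCase(Locale.ROOT))
                .collect(Collectors.toList());
    }

    /** term: the query string is compared with the indexed tokens exactly as given. */
    static boolean termMatches(String fieldValue, String query) {
        return analyze(fieldValue).contains(query);
    }

    /** match: the query string is analyzed too; sharing any one token is enough (OR semantics). */
    static boolean matchMatches(String fieldValue, String query) {
        List<String> indexed = analyze(fieldValue);
        return analyze(query).stream().anyMatch(indexed::contains);
    }
}
```

In this model termMatches("Smith", "Smith") is false while termMatches("Smith", "smith") and matchMatches("Smith", "Smith") are true. It also shows why a term query for "rock climbing" finds nothing on an analyzed about field: no single indexed token equals the whole string.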

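Phrase search

Finding individual words in a field is fine, but sometimes you want to match an exact sequence of words, that is, a phrase. For example, we may want a query that matches only employee records containing both "rock" and "climbing", with the two words next to each other as the phrase "rock climbing".
To do this, we adjust the match query slightly and use a query called match_phrase: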

GET /megacorp/employee/_search
{
    "query" : {
        "match_phrase" : {
            "about" : "rock climbing"
        }
    }
}

Unsurprisingly, the result contains only John Smith's document.

{
   ...
   "hits": {
      "total":      1,
      "max_score":  0.23013961,
      "hits": [
         {
            ...
            "_score":         0.23013961,
            "_source": {
               "first_name":  "John",
               "last_name":   "Smith",
               "age":         25,
               "about":       "I love to go rock climbing",
               "interests": [ "sports", "music" ]
            }
         }
      ]
   }
}
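
The difference between match and match_phrase can be sketched in plain Java as a check that the analyzed phrase tokens appear consecutively and in order in the analyzed field. This is a toy model only: PhraseSketch and its methods are hypothetical names, analyze is a rough stand-in for the standard analyzer on English text, and the real query works on token positions and also supports a slop parameter, which is not modeled here.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

/** Toy model of match_phrase with no slop; this is not Elasticsearch code. */
final class PhraseSketch {

    /** Rough stand-in for the standard analyzer: split on non-alphanumerics, then lowercase. */
    static List<String> analyze(String text) {
        return Arrays.stream(text.split("[^\\p{L}\\p{N}]+"))
                .filter(token -> !token.isEmpty())
                .map(token -> token.toLowerCase(Locale.ROOT))
                .collect(Collectors.toList());
    }

    /** True if the phrase tokens occur in the field next to each other and in the same order. */
    static boolean phraseMatches(String fieldValue, String phrase) {
        List<String> tokens = analyze(fieldValue);
        List<String> wanted = analyze(phrase);
        // indexOfSubList returns -1 when the wanted tokens are not a contiguous run.
        return !wanted.isEmpty() && Collections.indexOfSubList(tokens, wanted) >= 0;
    }
}
```

Here phraseMatches("I love to go rock climbing", "rock climbing") is true, while "I like to collect rock albums" fails because "rock" is not followed by "climbing".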

Demonstrating it in the client program

Add a method:

    /**
     * match_phrase query.
     * Matches only employee records that contain both "rock" and "climbing",
     * with the two words adjacent as the phrase "rock climbing".
     * @param client the client
     * @param field  the field to search
     * @param phrase the phrase to look for
     */
    private static void findEmployeesWithOneUniqueMatchPhrase(Client client, String field, String phrase) {
        SearchRequestBuilder request = client.prepareSearch("megacorp")
                .setTypes("employee")
                .setSearchType(SearchType.DFS_QUERY_THEN_FETCH)
                .setQuery(QueryBuilders.boolQuery().must(matchPhraseQuery(field, phrase)));
        printResponseHits(request.get());
    }

Call it from the main method:

// 6. match_phrase query: matches only employee records containing both "rock" and "climbing", adjacent as the phrase "rock climbing".
findEmployeesWithOneUniqueMatchPhrase(client, "about", "rock climbing");

I added some more data beforehand.
Output:
index=megacorp type=employee id=5 score=0.6449836 source-->{"first_name":"John","last_name":"Smith1","age":25,"about":"I love to go rock climbing","interests":["sports","music"]}
index=megacorp type=employee id=8 score=0.6449836 source-->{"first_name":"John","last_name":"蜂蜜柚子","age":25,"about":"I love to go rock climbing","interests":["sports","music"]}
index=megacorp type=employee id=9 score=0.6449836 source-->{"first_name":"John","last_name":"蜂蜜","age":25,"about":"I love to go rock climbing","interests":["sports","music"]}
index=megacorp type=employee id=10 score=0.6449836 source-->{"first_name":"John","last_name":"Smith Smith","age":25,"about":"I love to go rock climbing","interests":["sports","music"]}
index=megacorp type=employee id=6 score=0.6449836 source-->{"first_name":"John","last_name":"Smith 1","age":26,"about":"I love to go rock climbing","interests":["sports","art"]}
index=megacorp type=employee id=1 score=0.6449836 source-->{"first_name":"John","last_name":"Smith","age":25,"about":"I love to go rock climbing","interests":["sports","music"]}
index=megacorp type=employee id=7 score=0.6449836 source-->{"first_name":"John","last_name":"蜂蜜柚子蜂蜜","age":25,"about":"I love to go rock climbing","interests":["sports","music"]}

As you can see, every returned document contains the phrase "rock climbing", exactly as expected.