Elasticsearch (3): Lightweight Search in Elasticsearch

cc-lady

Article link: https://blog.csdn.net/cc907566076/article/details/78539530

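A GET is fairly simple: it retrieves exactly the document you ask for. Now let's try something a little more advanced, such as a simple search!

Searching all employees

Our first attempt is about the simplest search there is. We use the following request to search all employees: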

GET /megacorp/employee/_search

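As you can see, we still use the index megacorp and the type employee, but instead of specifying a document ID we now use the _search endpoint. The response contains all three documents in the hits array. By default, a search returns the first ten results.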

{
   "took":      6,
   "timed_out": false,
   "_shards": { ... },
   "hits": {
      "total":      3,
      "max_score":  1,
      "hits": [
         {
            "_index":         "megacorp",
            "_type":          "employee",
            "_id":            "3",
            "_score":         1,
            "_source": {
               "first_name":  "Douglas",
               "last_name":   "Fir",
               "age":         35,
               "about":       "I like to build cabinets",
               "interests": [ "forestry" ]
            }
         },
         {
            "_index":         "megacorp",
        ...

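Note: the response tells us not only which documents matched, it also contains the whole documents themselves: everything needed to display the search results to the end user.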
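The default of ten results can be changed with the from and size URL parameters of _search. The sketch below only builds the request URI and sends nothing; the node address localhost:9200 and the helper names matchAllUri and fromForPage are assumptions for illustration, not part of any Elasticsearch client.

```java
import java.net.URI;

// Hypothetical helper (not from the original post): builds the URI of a
// match-all search against one index/type with explicit paging parameters.
final class SearchUriDemo {

    // Assumed node address; adjust for your cluster
    static final String BASE = "http://localhost:9200";

    /** GET /{index}/{type}/_search?from=..&size=.. (Elasticsearch defaults: from=0, size=10). */
    static URI matchAllUri(String index, String type, int from, int size) {
        if (from < 0 || size < 0) {
            throw new IllegalArgumentException("from and size must be non-negative");
        }
        return URI.create(BASE + "/" + index + "/" + type + "/_search?from=" + from + "&size=" + size);
    }

    /** Pages start at 1: page n covers hits [(n-1)*size, n*size). */
    static int fromForPage(int page, int size) {
        if (page < 1) {
            throw new IllegalArgumentException("page starts at 1");
        }
        return (page - 1) * size;
    }

    public static void main(String[] args) {
        // Second page of 10 hits
        System.out.println(matchAllUri("megacorp", "employee", fromForPage(2, 10), 10));
        // http://localhost:9200/megacorp/employee/_search?from=10&size=10
    }
}
```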

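Client program demo

Add a method. The enclosing class needs these imports:

import java.util.Arrays;
import java.util.Iterator;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.action.search.SearchType;
import org.elasticsearch.client.Client;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.SearchHits;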

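    /*
     * GET /megacorp/employee/_search
     * The matching documents are returned in the hits[] array.
     *
     * Full form of a Java search call (every setting is optional):
     *   SearchResponse response5 = client.prepareSearch(index1, index2)
     *           .setTypes(type1, type2)
     *           // Just use this one. The Java API also defines QUERY_AND_FETCH and
     *           // DFS_QUERY_AND_FETCH, but those modes are internal optimizations and
     *           // should not be specified explicitly by users of the API.
     *           .setSearchType(SearchType.DFS_QUERY_THEN_FETCH)
     *           .setQuery(QueryBuilders.termQuery("brandNameNew", 2))               // query
     *           .setPostFilter(QueryBuilders.rangeQuery("useYears").from(2).to(5))  // filter
     *           .setFrom(0).setSize(60).setExplain(true)
     *           .get();
     *   // Since every parameter is optional, the simplest form queries the whole cluster:
     *   SearchResponse response6 = client.prepareSearch().get();
     *
     * Source: the Search API docs. The search API executes a search query and returns the
     * hits that match it. It can run across one or more indices and one or more types;
     * the query itself is supplied through the query Java API.
     *
     * Response fields:
     *   took      - time the query took, in milliseconds
     *   timed_out - whether the query timed out
     *   _shards   - shard info: how many shards were queried, succeeded and failed
     *   hits      - the results: total is the number of matching documents, hits holds
     *               the documents actually returned (10 by default)
     *   _score    - relevance score of each document, which drives the ranking
     *               (think of how web search engines order their results)
     *
     * Note: the body of a search request is built with SearchSourceBuilder.
     */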
    private static void getEmployeesByIndexAndType(Client client, String[] indices, String[] types) {
        System.out.println("Querying all documents with index " + Arrays.toString(indices)
                + " and type " + Arrays.toString(types) + " in the cluster, starting...");
        // Search without a query: matches all documents (only the first 10 are returned by default)
        SearchResponse response = client.prepareSearch(indices)
                .setTypes(types)
                .setSearchType(SearchType.DFS_QUERY_THEN_FETCH)
                .get();
        // Inspect the response. "took" is not printed because it changes on every run.
        // timed_out
        boolean isTimedOut = response.isTimedOut();
        System.out.println("timed_out:" + isTimedOut);
        // _shards
        int totalShards = response.getTotalShards();
        int successfulShards = response.getSuccessfulShards();
        int failedShards = response.getFailedShards();
        System.out.println("_shards:{ total=" + totalShards + " successful=" + successfulShards + " failed=" + failedShards + "}");
        // The documents are in the hits array; see SearchHits in the API docs for more methods
        SearchHits searchHits = response.getHits();
        Iterator<SearchHit> iterator = searchHits.iterator();
        while (iterator.hasNext()) {
            SearchHit hit = iterator.next();
            String index = hit.getIndex();
            String type = hit.getType();
            String id = hit.getId();
            float score = hit.getScore();
            System.out.println("index=" + index + " type=" + type + " id=" + id + " score=" + score + " source-->" + hit.getSourceAsString());
        }
        System.out.println("Query finished...");
    }

Add a call in main (see the previous article for the main method; all it really needs is a connected client):

// 3. Query all employee documents (_search)
getEmployeesByIndexAndType(client,new String[] {"megacorp"},new String[] {"employee"});

Output:
Querying all documents with index [megacorp] and type [employee] in the cluster, starting...
timed_out:false
_shards:{ total=5 successful=5 failed=0}
index=megacorp type=employee id=2 score=1.0 source-->{"first_name":"Jane","last_name":"Smith","age":"32","about":"I like to collect rock albums","interests":["music"]}
index=megacorp type=employee id=4 score=1.0 source-->{"first_name":"Douglas1","last_name":"Fir","age":35,"about":"I like to build cabinets","interests":["forestry"]}
index=megacorp type=employee id=1 score=1.0 source-->{"first_name":"John","last_name":"Smith","age":25,"about":"I love to go rock climbing","interests":["sports","music"]}
index=megacorp type=employee id=3 score=1.0 source-->{"first_name":"Douglas","last_name":"Fir","age":35,"about":"I like to build cabinets","interests":["forestry"]}
Query finished...

Head plugin example

[Screenshot: the same search in the Head plugin]

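Searching for employees with the last name Smith

Next, let's search for employees whose last name is Smith. For this we'll use a lightweight search, which is easy to run from the command line. This approach is often called a query-string search, because we pass the query to the search endpoint as a URL parameter:
GET /megacorp/employee/_search?q=last_name:Smith
We still use the _search endpoint in the request path, and pass the query itself as the q= parameter. The response returns all the Smiths: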

{
   ...
   "hits": {
      "total":      2,
      "max_score":  0.30685282,
      "hits": [
         {
            ...
            "_source": {
               "first_name":  "John",
               "last_name":   "Smith",
               "age":         25,
               "about":       "I love to go rock climbing",
               "interests": [ "sports", "music" ]
            }
         },
         {
            ...
            "_source": {
               "first_name":  "Jane",
               "last_name":   "Smith",
               "age":         32,
               "about":       "I like to collect rock albums",
               "interests": [ "music" ]
            }
         }
      ]
   }
}
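Because the query travels in the URL, characters such as the colon, spaces and quotes must be percent-encoded when a program builds this request. A minimal sketch, assuming a node at localhost:9200; liteSearchUri is a hypothetical helper, not part of any Elasticsearch client:

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: builds a lightweight (query-string) search URI.
final class LightweightSearchUri {

    // Assumed node address; adjust for your cluster
    static final String BASE = "http://localhost:9200";

    /** Builds GET /{index}/{type}/_search?q={field}:{value} with q percent-encoded. */
    static URI liteSearchUri(String index, String type, String field, String value) {
        // URLEncoder does form encoding (space -> '+'); a literal '+' is already %2B,
        // so replacing '+' afterwards only turns spaces into %20
        String q = URLEncoder.encode(field + ":" + value, StandardCharsets.UTF_8).replace("+", "%20");
        return URI.create(BASE + "/" + index + "/" + type + "/_search?q=" + q);
    }

    public static void main(String[] args) {
        System.out.println(liteSearchUri("megacorp", "employee", "last_name", "Smith"));
        // http://localhost:9200/megacorp/employee/_search?q=last_name%3ASmith
        System.out.println(liteSearchUri("megacorp", "employee", "last_name", "\"Smith 1\""));
        // http://localhost:9200/megacorp/employee/_search?q=last_name%3A%22Smith%201%22
    }
}
```

To match a whole value containing a space, such as Smith 1, the query-string syntax needs quotes (last_name:"Smith 1"); without them, the part after the space is parsed as a separate search term.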

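Client program demo

We add the static import
import static org.elasticsearch.index.query.QueryBuilders.*;
and use it just as in the first example (the method below calls QueryBuilders.termQuery explicitly, which also works with a plain import org.elasticsearch.index.query.QueryBuilders;).
Add a method: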

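    /*
     * Query by the value of one field
     * Intended counterpart of: GET /megacorp/employee/_search?q=last_name:Smith
     * (not an exact equivalent: q= analyzes the query text, a term query does not)
     *
     * QueryBuilders.termQuery: exact match, the query value is not analyzed
     */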
    private static void getEmployeesByFieldEqual(Client client, String field, String text) {
        SearchResponse response = client.prepareSearch("megacorp")
                .setTypes("employee")
                .setSearchType(SearchType.DFS_QUERY_THEN_FETCH)
                .setQuery(QueryBuilders.termQuery(field, text))
                .get();

        // Print the results
        SearchHits searchHits = response.getHits();
        Iterator<SearchHit> iterator = searchHits.iterator();
        while(iterator.hasNext()) {
            SearchHit hit = iterator.next();
            String index = hit.getIndex();
            String type = hit.getType();
            String id = hit.getId();
            float score = hit.getScore();
            System.out.println("index="+index+" type="+type+" id="+id+" score="+score+" source-->"+hit.getSourceAsString());
        }
    }

Add the call in main:

// 4. Query employees whose last name is Smith
getEmployeesByFieldEqual(client,"last_name","Smith");

Nothing was returned...
Let's first check whether the method works at all:
getEmployeesByFieldEqual(client, "about", "love");
Output:
index=megacorp type=employee id=5 score=0.7884338 source-->{"first_name":"John","last_name":"Smith1","age":25,"about":"I love to go rock climbing","interests":["sports","music"]}
index=megacorp type=employee id=1 score=0.7884338 source-->{"first_name":"John","last_name":"Smith","age":25,"about":"I love to go rock climbing","interests":["sports","music"]}

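So documents that merely contain the word are returned as well. The reason is that about is a text field, which is analyzed by default (analyzed is the default option: the string goes through an analyzer and is indexed as full text). If a document's about field is I love to go rock climbing, the standard analyzer splits it into words and lowercases them, giving [i, love, to, go, rock, climbing], as shown in the figure:
[Figure: analyzer output for the about field]
A term query for love therefore matches every document whose about field contains the token love, which explains this result. Reference: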
https://www.elastic.co/guide/en/elasticsearch/reference/6.0/query-dsl-term-query.html
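The matching described above can be pictured as a tiny inverted index: every analyzed term points to the documents that contain it, and a term query is a plain lookup of the unanalyzed value. This is a simplified model for illustration only; the class name and the split-and-lowercase rule are assumptions (Lucene's standard analyzer follows Unicode word-boundary rules), and the about values come from the output above.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Simplified model of an inverted index for the "about" field.
// Analysis is approximated as: split on non-alphanumeric characters, lowercase each token.
final class AboutInvertedIndex {

    // term -> ids of the documents containing it (sorted for stable output)
    private final Map<String, Set<String>> postings = new TreeMap<>();

    void add(String docId, String about) {
        for (String part : about.split("[^\\p{L}\\p{N}]+")) {
            if (!part.isEmpty()) {
                postings.computeIfAbsent(part.toLowerCase(Locale.ROOT), t -> new TreeSet<>()).add(docId);
            }
        }
    }

    // term query: look the value up as-is, without analyzing it
    Set<String> term(String value) {
        return postings.getOrDefault(value, Set.of());
    }

    Set<String> terms() {
        return postings.keySet();
    }

    public static void main(String[] args) {
        AboutInvertedIndex idx = new AboutInvertedIndex();
        idx.add("1", "I love to go rock climbing");
        idx.add("2", "I like to collect rock albums");
        idx.add("5", "I love to go rock climbing");
        System.out.println(idx.term("love")); // [1, 5]
        System.out.println(idx.term("rock")); // [1, 2, 5]
        System.out.println(idx.term("I"));    // [] because only the lowercased "i" was indexed
    }
}
```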

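If we don't want this behavior:

we can change the field to a type that is not analyzed
(not_analyzed: the value is not tokenized at index time and is kept as an exact value).
Let's look at the index mapping:
[Figure: mapping of the megacorp index]
All the fields turned out to be text, the default.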
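The mapping of a field in an existing index cannot be changed. (To make data searchable, Elasticsearch has to know each field's data type and how it is indexed. If you changed a field's data type from string to date, all the data already indexed in that field would become useless and you would have to rebuild the index. This rule is not specific to Elasticsearch; it applies to any database system that supports querying. Doing without indexes means trading speed for flexibility. Reference: http://blog.csdn.net/jingkyks/article/details/41513063)
For an existing index, Elasticsearch only handles the mapping automatically when a new field appears. If you really need to change a mapping, use reindex, i.e. import the data again.
(Reference: http://blog.csdn.net/u010994304/article/details/50454025)
(To actually perform the re-import, see:
http://blog.csdn.net/jingkyks/article/details/41513063
http://blog.csdn.net/u010994304/article/details/50454025
http://blog.csdn.net/lengfeng92/article/details/38230521
http://www.cnblogs.com/Creator/p/3722408.html)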
So either set the field as not analyzed when the index is created (delete the index and create it again),
or re-import the data.
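For the re-import route, Elasticsearch versions of this post's era (5.x/6.x) provide a _reindex API that copies documents from one index into another. The sketch below only builds the request body for POST /_reindex; the helper is hypothetical, and the index and type names follow this post. Note that _reindex does not copy mappings, so the destination index should be created first with the new mapping; otherwise it is created through dynamic mapping and the fields end up as text again.

```java
// Hypothetical helper: JSON body for POST /_reindex.
final class ReindexBody {

    /** Copy every document from sourceIndex into destIndex; destType may be null. */
    static String reindexBody(String sourceIndex, String destIndex, String destType) {
        // Index/type names are assumed to contain no quotes or backslashes, so no JSON escaping
        String dest = "\"index\":\"" + destIndex + "\"";
        if (destType != null) {
            // Optional on 5.x/6.x: rewrite the mapping type of the copied documents
            dest += ",\"type\":\"" + destType + "\"";
        }
        return "{\"source\":{\"index\":\"" + sourceIndex + "\"},\"dest\":{" + dest + "}}";
    }

    public static void main(String[] args) {
        // Send with: POST http://localhost:9200/_reindex  (Content-Type: application/json)
        System.out.println(reindexBody("megacorp", "megacorp1", "employee1"));
        // {"source":{"index":"megacorp"},"dest":{"index":"megacorp1","type":"employee1"}}
    }
}
```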

Example

Now for an example: create a new index whose fields are all not_analyzed
(you could also delete your existing index first; here I create a new one)
[Figure: creating the new index megacorp1 and its mapping]
(modeled on the original index)
(for the data types, see https://www.cnblogs.com/xing901022/p/5471419.html)
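The mapping itself is only in the screenshot. As a hedged sketch of what such a request body could look like on Elasticsearch 5.x/6.x, where the keyword type replaced the older "type": "string", "index": "not_analyzed", the helper below builds the body for PUT /megacorp1 with every listed field mapped as keyword. The field list and the helper name are assumptions; age is left to dynamic mapping.

```java
import java.util.List;
import java.util.StringJoiner;

// Hypothetical helper: create-index body with the given fields mapped as keyword (not analyzed).
final class KeywordMappingBody {

    static String createIndexBody(String type, List<String> keywordFields) {
        StringJoiner props = new StringJoiner(",", "{", "}");
        for (String field : keywordFields) {
            // keyword: the whole value is indexed as a single, unmodified term
            props.add("\"" + field + "\":{\"type\":\"keyword\"}");
        }
        return "{\"mappings\":{\"" + type + "\":{\"properties\":" + props + "}}}";
    }

    public static void main(String[] args) {
        // Send with: PUT http://localhost:9200/megacorp1  (Content-Type: application/json)
        System.out.println(createIndexBody("employee1",
                List.of("first_name", "last_name", "about", "interests")));
    }
}
```

Elasticsearch 7 removed mapping types, so there the employee1 level would be dropped. Also, if the original index was created through dynamic mapping on 5.x or later, its string fields most likely already carry a keyword sub-field (for example last_name.keyword) that a term query could target without a new index; check the mapping before relying on this.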
Then put in the same data as before:

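[Figure: indexing the same documents into the new index]
Then call the same method again (with its index and type pointed at megacorp1 and employee1, as the output below shows):
getEmployeesByFieldEqual(client, "about", "love");
Nothing is returned.
Call:
getEmployeesByFieldEqual(client, "about", "I love to go rock climbing");
This returns 2 documents: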
index=megacorp1 type=employee1 id=5 score=0.87546873 source-->{"first_name":"John","last_name":"Smith","age":25,"about":"I love to go rock climbing","interests":["sports","music"]}
index=megacorp1 type=employee1 id=1 score=0.87546873 source-->{"first_name":"John","last_name":"Smith1","age":25,"about":"I love to go rock climbing","interests":["sports","music"]}

Add the document shown in the figure:
[Figure: the new document with last_name "Smith 1"]
Call:
getEmployeesByFieldEqual(client, "last_name", "Smith 1");
Output:
index=megacorp1 type=employee1 id=6 score=1.5404451 source-->{"first_name":"John","last_name":"Smith 1","age":25,"about":"I love to go rock climbing","interests":["sports","music"]}

Call:
getEmployeesByFieldEqual(client, "last_name", "Smith");
Output:
index=megacorp1 type=employee1 id=5 score=1.0296195 source-->{"first_name":"John","last_name":"Smith","age":25,"about":"I love to go rock climbing","interests":["sports","music"]}
index=megacorp1 type=employee1 id=2 score=1.0296195 source-->{"first_name":"Jane","last_name":"Smith","age":"32","about":"I like to collect rock albums","interests":["music"]}
Of course, the query can also be written as a query_string query:

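import org.elasticsearch.action.search.SearchRequestBuilder;
import org.elasticsearch.index.query.QueryStringQueryBuilder;

// Same search, but the text is parsed with query-string syntax against the given field
SearchRequestBuilder request = client.prepareSearch("megacorp")
        .setTypes("employee")
        .setSearchType(SearchType.DFS_QUERY_THEN_FETCH)
        .setQuery(new QueryStringQueryBuilder(text).field(field));

SearchResponse response = request.get();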
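It can help to write out what these builders send as query DSL. The sketch below builds the JSON bodies by hand; the helper names are hypothetical, and the real Java builders also emit defaults such as boost. The key difference: term sends its value untouched, while query_string and match (the latter built in Java with QueryBuilders.matchQuery(field, text)) run the text through the field's analyzer first, so against an analyzed text field they would likely find Smith where term does not.

```java
// Hypothetical helpers: hand-built query DSL bodies for POST /{index}/_search.
final class QueryBodies {

    // Minimal JSON string quoting: escapes backslashes and double quotes only
    // (control characters are not handled; enough for this demo).
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // term: the value is compared with the indexed terms as-is
    static String termBody(String field, String value) {
        return "{\"query\":{\"term\":{" + quote(field) + ":" + quote(value) + "}}}";
    }

    // query_string: the text is parsed and analyzed before matching
    static String queryStringBody(String field, String text) {
        return "{\"query\":{\"query_string\":{\"query\":" + quote(text)
                + ",\"fields\":[" + quote(field) + "]}}}";
    }

    // match: the text is analyzed with the field's search analyzer before matching
    static String matchBody(String field, String text) {
        return "{\"query\":{\"match\":{" + quote(field) + ":" + quote(text) + "}}}";
    }

    public static void main(String[] args) {
        System.out.println(termBody("last_name", "Smith"));
        System.out.println(queryStringBody("last_name", "Smith"));
        System.out.println(matchBody("last_name", "Smith"));
    }
}
```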

Problem solved!

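Open question

Why the search succeeds once the field is no longer analyzed still puzzles me; I haven't found the reason. Their tokens are exactly the same:
[Figure: token comparison of the two indices]
The request strings are identical as well: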
{
  "query" : {
    "term" : {
      "last_name" : {
        "value" : "Smith",
        "boost" : 1.0
      }
    }
  }
}
The request runs without errors, but last_name matches nothing:
{"took":3,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":0,"max_score":null,"hits":[]}}
Any explanation is welcome...
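A likely explanation, assuming the original megacorp index used the default standard analyzer on last_name: at index time that analyzer lowercases Smith into the term smith, while a term query sends Smith unanalyzed, so the two never match (love worked earlier only because it was already lowercase). A keyword field stores Smith verbatim, so the same term query matches. The simplified model below illustrates this; it is not Lucene's code, and the split-and-lowercase rule only approximates the standard analyzer. A term query for lowercase smith against the original index would be a quick way to check the theory.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Simplified model: which terms a field value produces in the index, depending
// on the field type, and how an (unanalyzed) term query compares against them.
final class TextVsKeyword {

    // text field, standard analyzer (approximation): split on non-alphanumerics, lowercase
    static List<String> indexAsText(String value) {
        List<String> terms = new ArrayList<>();
        for (String part : value.split("[^\\p{L}\\p{N}]+")) {
            if (!part.isEmpty()) {
                terms.add(part.toLowerCase(Locale.ROOT));
            }
        }
        return terms;
    }

    // keyword field (or legacy not_analyzed string): the whole value is one term, unchanged
    static List<String> indexAsKeyword(String value) {
        return List.of(value);
    }

    // term query: the query value is used as-is, with no analysis
    static boolean termQuery(List<String> indexedTerms, String value) {
        return indexedTerms.contains(value);
    }

    public static void main(String[] args) {
        System.out.println(indexAsText("Smith"));                          // [smith]
        System.out.println(termQuery(indexAsText("Smith"), "Smith"));      // false
        System.out.println(termQuery(indexAsText("Smith"), "smith"));      // true
        System.out.println(termQuery(indexAsKeyword("Smith"), "Smith"));   // true
        System.out.println(termQuery(indexAsKeyword("Smith 1"), "Smith")); // false
    }
}
```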