ElasticSearch Query Study Notes, Chapter 3: scroll, delete-by-query, bool, boosting, filter, and highlight queries

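Contents of the ElasticSearch query notes

  The commonly used queries cover a lot of ground, so the notes are split into several chapters:

  1. ElasticSearch Query Study Notes, Chapter 1: term, terms, match, and id queries

   These queries work from precise conditions. They are fast and are the most commonly used kinds of query:

  • term query
  • terms query
  • match_all query
  • match query
  • boolean match query
  • multi_match query
  • query by document id (single id)
  • query by document ids (multiple ids)
  2. ElasticSearch Query Study Notes, Chapter 2: prefix, fuzzy, wildcard, range, and regexp queries

  These queries use fuzzier conditions and are relatively slow, so avoid them in real-time queries where possible. Each still does something the others cannot replace, so they remain necessary:

  • prefix query
  • fuzzy query
  • wildcard query
  • range query
  • regexp query
  3. ElasticSearch Query Study Notes, Chapter 3: scroll, delete-by-query, bool, boosting, filter, and highlight queries

  A collection of commonly used miscellaneous queries:

  • deep-pagination scroll query
  • delete-by-query
  • bool query
  • boosting query
  • filter query
  • highlight query
  4. ElasticSearch Query Study Notes, Chapter 4: cardinality, range, and extended_stats aggregations

  The ES aggregation queries (aggregations):

  • cardinality (distinct count) query
  • range (range statistics) query
  • extended_stats (statistical aggregation) query
  5. ElasticSearch Query Study Notes, Chapter 5: geo_distance, geo_bounding_box, and geo_polygon geo queries

  The ES map-search (geo) queries:

  • geo_distance query
  • geo_bounding_box query
  • geo_polygon query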

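Java test-case project for all chapters

  The Java code for all chapters is available as the CSDN resource "ElasticSearch常用查询的Java实现" (Java implementations of common ElasticSearch queries); feel free to download it.

  [Figure: directory layout of the Java project]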

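Deep-pagination scroll query

We covered from+size pagination earlier, so why is there also scroll+size for deep pagination? Let's compare the two first.
With from+size, ES fetches data in these steps:

  1. Analyze the user's keywords into terms;
  2. Look the terms up in the inverted index to get the ids of the matching documents;
  3. Fetch the data from each shard, which is relatively slow;
  4. Sort the data by score, which is also relatively slow;
  5. Use from and size to cut the requested slice out of the sorted results;
  6. Return the result;
    Pro: every request sees the latest data;
    Con: showing another page of the same query repeats all of the steps above;

With scroll+size, ES works like this:

  1. Analyze the user's keywords into terms;
  2. Look the terms up in the inverted index to get the ids of the matching documents;
  3. Store the document ids in an ES search context held in memory;
  4. Fetch size documents from the context; ids that have been returned are removed from the context;
  5. For the next page, read the following entries straight from the context;
  6. Repeat steps 4 and 5 until all data has been fetched;
    Pro: the result set is kept in the context, so showing another page of the same query only repeats steps 4 and 5, which is fast;
    Con: the context is a snapshot taken at the first request, so scroll is not suited to real-time use; documents changed afterwards are not reflected in it;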
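The trade-off between the two step lists can be sketched with a small in-memory model. This is not Elasticsearch code: `PagingSketch`, `fromSize`, and `ScrollContext` are hypothetical names, and a plain list of fee values stands in for the index. The sketch only assumes what the lists above say, namely that from+size re-sorts the live data on every request while scroll serves pages from a context frozen at the first request.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical in-memory model; not Elasticsearch code.
final class PagingSketch {

    // from+size: re-sort the live data on every request, then cut out one page.
    static List<Integer> fromSize(List<Integer> liveFees, int from, int size) {
        List<Integer> sorted = new ArrayList<>(liveFees);
        sorted.sort(Comparator.reverseOrder());
        int start = Math.min(from, sorted.size());
        int end = Math.min(from + size, sorted.size());
        return new ArrayList<>(sorted.subList(start, end));
    }

    // scroll: sort once, keep the result as a frozen context, hand out pages from it.
    static final class ScrollContext {
        private final List<Integer> snapshot;
        private int cursor = 0;

        ScrollContext(List<Integer> liveFees) {
            snapshot = new ArrayList<>(liveFees);
            snapshot.sort(Comparator.reverseOrder());
        }

        // Returns the next page; an empty list means the scroll is exhausted.
        List<Integer> nextPage(int size) {
            int end = Math.min(cursor + size, snapshot.size());
            List<Integer> page = new ArrayList<>(snapshot.subList(cursor, end));
            cursor = end;
            return page;
        }
    }

    public static void main(String[] args) {
        List<Integer> fees = new ArrayList<>(List.of(500, 6000, 2000, 4000));
        ScrollContext ctx = new ScrollContext(fees);
        System.out.println(ctx.nextPage(2));       // first scroll page
        fees.add(9000);                            // a new document arrives
        System.out.println(ctx.nextPage(2));       // scroll still serves the old snapshot
        System.out.println(fromSize(fees, 0, 2));  // from+size sees the new document
    }
}
```

The second scroll page never contains the value added after the context was created, which is the "not real-time" drawback; the from+size call pays for a full sort each time but reflects the current data.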

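  Task: scroll through the company records two per page, sorted in descending order by the fee and moblie fields (moblie is the field name as spelled in the index).

  The REST requests are as follows: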

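# Step 1: scroll query. Returns the first page and stores the search context in ES
# scroll=2m keeps the search context alive for 2 minutes. Without the scroll parameter no context is kept; once the context expires it is deleted automatically and steps 2 and 3 below fail
# size is set to 2
# A scroll can sort on fields; when the order does not matter, sorting by _doc is the most efficient choice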
POST /sms-logs-index/_search?scroll=2m
{
  "query": {
    "match_all": {}
  }
  , "size": 2
  , "sort": [
    {
      "fee": {
        "order": "desc"
      }
    },
    {
      "moblie": {
        "order": "desc"
      }
    }
  ]
}

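# Step 2: fetch the next page with the scroll id; run the same request again for each further page until the results run out or the context expires
# scroll_id is the _scroll_id returned by the previous request
# scroll must be given again to keep the context alive for another 2 minutes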

POST /_search/scroll
{
  "scroll_id":"FGluY2x1ZGVfY29udGV4dF91dWlkDnF1ZXJ5VGhlbkZldGNoBRQtNDEtREhVQnZKaFZKTkZ3Z3VyRgAAAAAABIWAFmJWa2hfQ2g3UlF1bjBoMEVvWkZnbHcULXd0LURIVUJlUTJ6NWVhOGdSU2UAAAAAAAiU7xY3Si1RRmU0NlRzQ19mdkFtb0pMLVVRFGJsMS1ESFVCb3RTY3RrNUdnREVQAAAAAAABAqAWdmh6NmMzeXVUa1NFbVFYMjQ0S3dGZxRaVjUtREhVQnVPVGdEcnZ1Z0xKQgAAAAAAE8ZFFjdGSWx5WkpGVDkyZXA5OEtIQnlqcFEUX0F0LURIVUJlUTJ6NWVhOGdSU2UAAAAAAAiU8BY3Si1RRmU0NlRzQ19mdkFtb0pMLVVR"
 ,"scroll":"2m"
}

# Step 3: delete the scroll context from ES
# If the first page already answers the question and the remaining pages are not needed, the context can be deleted early
DELETE /_search/scroll/FGluY2x1ZGVfY29udGV4dF91dWlkDnF1ZXJ5VGhlbkZldGNoBRQtNDEtREhVQnZKaFZKTkZ3Z3VyRgAAAAAABIWAFmJWa2hfQ2g3UlF1bjBoMEVvWkZnbHcULXd0LURIVUJlUTJ6NWVhOGdSU2UAAAAAAAiU7xY3Si1RRmU0NlRzQ19mdkFtb0pMLVVRFGJsMS1ESFVCb3RTY3RrNUdnREVQAAAAAAABAqAWdmh6NmMzeXVUa1NFbVFYMjQ0S3dGZxRaVjUtREhVQnVPVGdEcnZ1Z0xKQgAAAAAAE8ZFFjdGSWx5WkpGVDkyZXA5OEtIQnlqcFEUX0F0LURIVUJlUTJ6NWVhOGdSU2UAAAAAAAiU8BY3Si1RRmU0NlRzQ19mdkFtb0pMLVVR

  The REST requests return the following:

# Step 1: result of the scroll query
{
  "_scroll_id" : "FGluY2x1ZGVfY29udGV4dF91dWlkDnF1ZXJ5VGhlbkZldGNoBRQtNDEtREhVQnZKaFZKTkZ3Z3VyRgAAAAAABIWAFmJWa2hfQ2g3UlF1bjBoMEVvWkZnbHcULXd0LURIVUJlUTJ6NWVhOGdSU2UAAAAAAAiU7xY3Si1RRmU0NlRzQ19mdkFtb0pMLVVRFGJsMS1ESFVCb3RTY3RrNUdnREVQAAAAAAABAqAWdmh6NmMzeXVUa1NFbVFYMjQ0S3dGZxRaVjUtREhVQnVPVGdEcnZ1Z0xKQgAAAAAAE8ZFFjdGSWx5WkpGVDkyZXA5OEtIQnlqcFEUX0F0LURIVUJlUTJ6NWVhOGdSU2UAAAAAAAiU8BY3Si1RRmU0NlRzQ19mdkFtb0pMLVVR",
  "took" : 7,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 12,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [
      {
        "_index" : "sms-logs-index",
        "_type" : "_doc",
        "_id" : "5",
        "_score" : null,
        "_source" : {
          "createDate" : "2020-09-16",
          "senDate" : "2020-09-16",
          "longCode" : "24514635",
          "moblie" : 18545427895,
          "corpName" : "东东集团",
          "smsContent" : "数据驱动,AI推动,新零售模型让你的购买更心怡!",
          "state" : "1",
          "opratorId" : "1",
          "province" : "北京",
          "ipAddr" : "10.254.19.45",
          "replyTotal" : "1",
          "fee" : "6000"
        },
        "sort" : [
          6000.0
        ]
      },
      {
        "_index" : "sms-logs-index",
        "_type" : "_doc",
        "_id" : "10",
        "_score" : null,
        "_source" : {
          "createDate" : "2020-09-16",
          "senDate" : "2020-09-16",
          "longCode" : "54784641",
          "moblie" : 15625584654,
          "corpName" : "勾股科技有限公司",
          "smsContent" : "智能算法,智慧生活,勾股科技!",
          "state" : "1",
          "opratorId" : "2",
          "province" : "杭州",
          "ipAddr" : "10.215.19.45",
          "replyTotal" : "6",
          "fee" : "4000"
        },
        "sort" : [
          4000.0
        ]
      }
    ]
  }
}
# Step 2: result of fetching the next page with the scroll id

{
  "_scroll_id" : "FGluY2x1ZGVfY29udGV4dF91dWlkDnF1ZXJ5VGhlbkZldGNoBRQtNDEtREhVQnZKaFZKTkZ3Z3VyRgAAAAAABIWAFmJWa2hfQ2g3UlF1bjBoMEVvWkZnbHcULXd0LURIVUJlUTJ6NWVhOGdSU2UAAAAAAAiU7xY3Si1RRmU0NlRzQ19mdkFtb0pMLVVRFGJsMS1ESFVCb3RTY3RrNUdnREVQAAAAAAABAqAWdmh6NmMzeXVUa1NFbVFYMjQ0S3dGZxRaVjUtREhVQnVPVGdEcnZ1Z0xKQgAAAAAAE8ZFFjdGSWx5WkpGVDkyZXA5OEtIQnlqcFEUX0F0LURIVUJlUTJ6NWVhOGdSU2UAAAAAAAiU8BY3Si1RRmU0NlRzQ19mdkFtb0pMLVVR",
  "took" : 8,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 12,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [
      {
        "_index" : "sms-logs-index",
        "_type" : "_doc",
        "_id" : "7",
        "_score" : null,
        "_source" : {
          "createDate" : "2020-09-16",
          "senDate" : "2020-09-16",
          "longCode" : "33656412674",
          "moblie" : 18956451203,
          "corpName" : "华丽网集团",
          "smsContent" : "网络安全,华丽靠谱!",
          "state" : "1",
          "opratorId" : "3",
          "province" : "上海",
          "ipAddr" : "10.215.254.45",
          "replyTotal" : "1",
          "fee" : "2000"
        },
        "sort" : [
          2000.0
        ]
      },
      {
        "_index" : "sms-logs-index",
        "_type" : "_doc",
        "_id" : "11",
        "_score" : null,
        "_source" : {
          "createDate" : "2020-09-22",
          "senDate" : "2020-09-22",
          "longCode" : "458744536",
          "moblie" : 134625584654,
          "corpName" : "星雨文化传媒",
          "smsContent" : "魅力宣传,星雨传媒!",
          "state" : "1",
          "opratorId" : "3",
          "province" : "杭州",
          "ipAddr" : "10.289.19.45",
          "replyTotal" : "6",
          "fee" : "500"
        },
        "sort" : [
          500.0
        ]
      }
    ]
  }
}

# Step 3: result of deleting the scroll context
{
  "succeeded" : true,
  "num_freed" : 5
}

  The Java code is as follows:

    static RestHighLevelClient myClient= EsClient.getClient();  // client used to talk to ES
    String index="sms-logs-index";

    @Test
    public void scrollQuery() throws IOException
    {
        //1. Create the SearchRequest
        SearchRequest request=new SearchRequest(index);

        //2. Set the scroll keep-alive
        request.scroll(TimeValue.timeValueMinutes(2L));

        //3. Set the query conditions
        SearchSourceBuilder builder =new SearchSourceBuilder();
        builder.size(4);
        builder.sort("fee", SortOrder.DESC);
        builder.query(QueryBuilders.matchAllQuery());
        request.source(builder);

        //4. Get the scrollId and the first page of results
        SearchResponse response = myClient.search(request, RequestOptions.DEFAULT);
        String scrollId = response.getScrollId();
        System.out.println("-----------------------first page----------------------------");
        for (SearchHit hit : response.getHits().getHits()) {
            System.out.println(hit.getSourceAsMap());
        }

        while (true)
        {
            //5. In the loop: create a SearchScrollRequest
            SearchScrollRequest scrollRequest=new SearchScrollRequest(scrollId);

            //6. Set the keep-alive for the scroll context
            scrollRequest.scroll(TimeValue.timeValueMinutes(2L));

            //7. Execute the request; the scroll id can change between calls, so keep the latest one
            SearchResponse scrollResp=myClient.scroll(scrollRequest,RequestOptions.DEFAULT);
            scrollId = scrollResp.getScrollId();

            //8. If this page has data, print it; otherwise leave the loop
            SearchHit[] hits = scrollResp.getHits().getHits();
            if(hits != null && hits.length>0)
            {
                System.out.println("-----------------------next page----------------------------");
                for (SearchHit hit : hits) {
                    System.out.println(hit.getSourceAsMap());
                }
            }
            else
            {
                //9. No more data: exit the loop
                System.out.println("-----------------------end----------------------------");
                break;
            }

        }

        //10. Create a ClearScrollRequest
        ClearScrollRequest clearScrollRequest=new ClearScrollRequest();

        //11. Set the scroll id
        clearScrollRequest.addScrollId(scrollId);

        //12. Delete the scroll context
        ClearScrollResponse clearScrollResponse =myClient.clearScroll(clearScrollRequest,RequestOptions.DEFAULT);

        //13. Print the outcome
        System.out.println("scroll cleared: "+clearScrollResponse.isSucceeded());

    }

  The output of the Java code is shown in Figure 1.

Figure 1 Result of the scroll deep-pagination query in Java

delete-by-query

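Deletes a large number of documents selected by a query such as term or match.
Note: if the documents to delete make up most of the index, think in reverse: create a new index, copy the documents you want to keep into it, and then query the new index directly.

  Task: use a range query to find the companies whose fee is less than 0.2, then delete those documents.

  The REST requests are as follows:

# Step 1: range query for fee less than 0.2; the result shows 2 matching documents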
POST /sms-logs-index/_search
{
  "query": {
    "range": {
      "fee": {
        "lt": 0.2
      }
    }
  }
}

# Step 2: delete the matching documents with _delete_by_query
POST /sms-logs-index/_delete_by_query
{
    "query": 
    {
    "range": 
    {
      "fee": 
      {
        "lt": 0.2
      }
    }
  }
}

# Step 3: run the range query for fee less than 0.2 again; nothing is left
POST /sms-logs-index/_search
{
  "query": {
    "range": {
      "fee": {
        "lt": 0.2
      }
    }
  }
}





  The REST requests return the following:

# Step 1: response of the range query for fee less than 0.2, showing 2 documents
# POST /sms-logs-index/_search
{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "sms-logs-index",
        "_type" : "_doc",
        "_id" : "9",
        "_score" : 1.0,
        "_source" : {
          "createDate" : "2020-09-16",
          "senDate" : "2020-09-16",
          "longCode" : "5784320",
          "moblie" : 15236964578,
          "corpName" : "花花派",
          "smsContent" : "花开花落,魅力女性,买花选我!",
          "state" : "1",
          "opratorId" : "1",
          "province" : "上海",
          "ipAddr" : "10.265.19.45",
          "replyTotal" : "1",
          "fee" : "0.1"
        }
      },
      {
        "_index" : "sms-logs-index",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 1.0,
        "_source" : {
          "createDate" : "2020-09-16",
          "senDate" : "2020-09-16",
          "longCode" : "10201021",
          "moblie" : 13026254898,
          "corpName" : "上海智慧软件有限公司",
          "smsContent" : "连接你我,智慧软件,让生活更美好",
          "state" : "1",
          "opratorId" : "1",
          "province" : "上海",
          "ipAddr" : "10.215.19.45",
          "replyTotal" : "1",
          "fee" : "0.1"
        }
      }
    ]
  }
}

# Step 2: response of the _delete_by_query request
# POST /sms-logs-index/_delete_by_query
{
  "took" : 107,
  "timed_out" : false,
  "total" : 2,
  "deleted" : 2,
  "batches" : 1,
  "version_conflicts" : 0,
  "noops" : 0,
  "retries" : {
    "bulk" : 0,
    "search" : 0
  },
  "throttled_millis" : 0,
  "requests_per_second" : -1.0,
  "throttled_until_millis" : 0,
  "failures" : [ ]
}

# Step 3: response of the repeated range query; no documents remain
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 0,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  }
}

  So that the example can be repeated, first put the two deleted documents back with the following REST requests:


PUT /sms-logs-index/_doc/1
{

"createDate":"2020-09-16"
,"senDate":"2020-09-16"
,"longCode":"10201021"
,"moblie":13026254898
,"corpName":"上海智慧软件有限公司"
,"smsContent":"连接你我,智慧软件,让生活更美好"
,"state":"1"
,"opratorId":"1"
,"province":"上海"
,"ipAddr":"10.215.19.45"
,"replyTotal":"1"
,"fee":"0.1"
}


PUT /sms-logs-index/_doc/9
{

"createDate":"2020-09-16"
,"senDate":"2020-09-16"
,"longCode":"5784320"
,"moblie":15236964578
,"corpName":"花花派"
,"smsContent":"花开花落,魅力女性,买花选我!"
,"state":"1"
,"opratorId":"1"
,"province":"上海"
,"ipAddr":"10.265.19.45"
,"replyTotal":"1"
,"fee":"0.1"
}

  The Java code is as follows:

    static RestHighLevelClient myClient= EsClient.getClient();  // client used to talk to ES
    String index="sms-logs-index";

    @Test
    public void deleteByQuery() throws IOException {
        //1. Create the DeleteByQueryRequest
        DeleteByQueryRequest request=new DeleteByQueryRequest(index);

        //2. Set the query; unlike SearchRequest, it is set directly on the request with setQuery
        request.setQuery(QueryBuilders.rangeQuery("fee").lt(0.2));

        //3. Execute the delete
        BulkByScrollResponse resp = myClient.deleteByQuery(request, RequestOptions.DEFAULT);

        //4. Print the response
        System.out.println(resp.toString());

    }


  The output of the Java code is shown in Figure 2.

Figure 2 Response of delete-by-query in Java

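bool query

A compound query that combines several query conditions with boolean logic:

  • must: every condition must match, like a logical AND;
  • must_not: none of the conditions may match, like a logical NOT;
  • should: at least one of the conditions should match, like a logical OR;

  Task: find the SMS content of companies whose city is 北京 (Beijing) or 杭州 (Hangzhou), whose operator id is not 2, and whose smsContent contains 魅力 ("charm") or 推动 ("drive").

  Be careful with the REST request: a small slip makes the should clause ineffective, as in this incorrect example:

# province is 北京 or 杭州
# operator id is not 2
# smsContent contains 魅力 or 推动
# bool query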
POST /sms-logs-index/_search
{
  "query": 
  {
    "bool": 
    {
      "should": [
        {
          "terms": {
            "province": [
              "北京",
              "杭州"
            ]
          }
    
        }
      ]
      ,"must_not": [
        {
          "term": {
            "opratorId": {
              "value": "2"
            }
          }
        }
      ]
      ,"must": [
        {
          "match": {
            "smsContent": 
            {
              "query": "魅力 推动"
              , "operator": "or"
            }
          }
        }
      ]
    }
  }
}

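  The result also contains documents from 上海 (Shanghai). All other conditions are satisfied, but the should condition has no filtering effect. When a bool query contains a must or filter clause, its should clauses no longer mean "or": they become optional (minimum_should_match defaults to 0) and only raise the score of the documents that match them. To make the condition mandatory, wrap the should in its own bool query and place that inside must, or set minimum_should_match to 1.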

{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 5,
      "relation" : "eq"
    },
    "max_score" : 2.0892315,
    "hits" : [
      {
        "_index" : "sms-logs-index",
        "_type" : "_doc",
        "_id" : "5",
        "_score" : 2.0892315,
        "_source" : {
          "createDate" : "2020-09-16",
          "senDate" : "2020-09-16",
          "longCode" : "24514635",
          "moblie" : 18545427895,
          "corpName" : "东东集团",
          "smsContent" : "数据驱动,AI推动,新零售模型让你的购买更心怡!",
          "state" : "1",
          "opratorId" : "1",
          "province" : "北京",
          "ipAddr" : "10.254.19.45",
          "replyTotal" : "1",
          "fee" : "6000"
        }
      },
      {
        "_index" : "sms-logs-index",
        "_type" : "_doc",
        "_id" : "12",
        "_score" : 1.73617,
        "_source" : {
          "createDate" : "2020-09-22",
          "senDate" : "2020-09-22",
          "longCode" : "123546241",
          "moblie" : 156625584654,
          "corpName" : "哈雷天文用具公司",
          "smsContent" : "天文研究,放心推动,哈雷天文!",
          "state" : "1",
          "opratorId" : "3",
          "province" : "杭州",
          "ipAddr" : "10.289.19.45",
          "replyTotal" : "6",
          "fee" : "500"
        }
      },
      {
        "_index" : "sms-logs-index",
        "_type" : "_doc",
        "_id" : "11",
        "_score" : 1.6317747,
        "_source" : {
          "createDate" : "2020-09-22",
          "senDate" : "2020-09-22",
          "longCode" : "458744536",
          "moblie" : 134625584654,
          "corpName" : "星雨文化传媒",
          "smsContent" : "魅力宣传,星雨传媒!",
          "state" : "1",
          "opratorId" : "3",
          "province" : "杭州",
          "ipAddr" : "10.289.19.45",
          "replyTotal" : "6",
          "fee" : "500"
        }
      },
      {
        "_index" : "sms-logs-index",
        "_type" : "_doc",
        "_id" : "9",
        "_score" : 0.56260216,
        "_source" : {
          "createDate" : "2020-09-16",
          "senDate" : "2020-09-16",
          "longCode" : "5784320",
          "moblie" : 15236964578,
          "corpName" : "花花派",
          "smsContent" : "花开花落,魅力女性,买花选我!",
          "state" : "1",
          "opratorId" : "1",
          "province" : "上海",
          "ipAddr" : "10.265.19.45",
          "replyTotal" : "1",
          "fee" : "0.1"
        }
      },
      {
        "_index" : "sms-logs-index",
        "_type" : "_doc",
        "_id" : "4",
        "_score" : 0.2876821,
        "_source" : {
          "createDate" : "2020-09-16",
          "senDate" : "2020-09-16",
          "longCode" : "87454120",
          "moblie" : 13625789645,
          "corpName" : "爱美化妆品有限公司",
          "smsContent" : "魅力,势不可挡,爱美爱美",
          "state" : "1",
          "opratorId" : "1",
          "province" : "上海",
          "ipAddr" : "10.258.19.45",
          "replyTotal" : "1",
          "fee" : "200"
        }
      }
    ]
  }
}

  The correct REST request nests the should clause inside must:

  
# province is 北京 or 杭州
# operator id is not 2
# smsContent contains 魅力 or 推动
# bool query
POST /sms-logs-index/_search
{
  "query": 
  {
    "bool": 
    {
      "must_not": [
        {
          "term": {
            "opratorId": {
              "value": "2"
            }
          }
        }
      ]
      ,"must": 
      [
        {
          "match": 
          {
            "smsContent": 
            {
              "query": "魅力 推动"
              , "operator": "or"
            }
          }
        }
        ,
        {
          "bool": 
          {
            "should": [
              {
                "terms": {
                  "province": [
                    "北京",
                    "杭州"
                  ]
                }
              }
            ]
          }
        }
      ]
    }
  }
}

  The result is as follows:

{
  "took" : 11,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 3,
      "relation" : "eq"
    },
    "max_score" : 1.95882,
    "hits" : [
      {
        "_index" : "sms-logs-index",
        "_type" : "_doc",
        "_id" : "5",
        "_score" : 1.95882,
        "_source" : {
          "createDate" : "2020-09-16",
          "senDate" : "2020-09-16",
          "longCode" : "24514635",
          "moblie" : 18545427895,
          "corpName" : "东东集团",
          "smsContent" : "数据驱动,AI推动,新零售模型让你的购买更心怡!",
          "state" : "1",
          "opratorId" : "1",
          "province" : "北京",
          "ipAddr" : "10.254.19.45",
          "replyTotal" : "1",
          "fee" : "6000"
        }
      },
      {
        "_index" : "sms-logs-index",
        "_type" : "_doc",
        "_id" : "11",
        "_score" : 1.8187511,
        "_source" : {
          "createDate" : "2020-09-22",
          "senDate" : "2020-09-22",
          "longCode" : "458744536",
          "moblie" : 134625584654,
          "corpName" : "星雨文化传媒",
          "smsContent" : "魅力宣传,星雨传媒!",
          "state" : "1",
          "opratorId" : "3",
          "province" : "杭州",
          "ipAddr" : "10.289.19.45",
          "replyTotal" : "6",
          "fee" : "500"
        }
      },
      {
        "_index" : "sms-logs-index",
        "_type" : "_doc",
        "_id" : "12",
        "_score" : 1.73617,
        "_source" : {
          "createDate" : "2020-09-22",
          "senDate" : "2020-09-22",
          "longCode" : "123546241",
          "moblie" : 156625584654,
          "corpName" : "哈雷天文用具公司",
          "smsContent" : "天文研究,放心推动,哈雷天文!",
          "state" : "1",
          "opratorId" : "3",
          "province" : "杭州",
          "ipAddr" : "10.289.19.45",
          "replyTotal" : "6",
          "fee" : "500"
        }
      }
    ]
  }
}
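The difference between the two requests can be reproduced with a small in-memory model. This is a sketch, not Elasticsearch code: `BoolSketch`, `Doc`, `sampleDocs`, and `search` are hypothetical names, the documents are copied from the responses in this post, and a plain substring test stands in for the analyzed match query. The `minimumShouldMatch` argument plays the role of minimum_should_match: 0 models the first request, where should sits next to must and is optional, and 1 models a should that has to match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical in-memory model of the two bool requests; not Elasticsearch code.
final class BoolSketch {

    static final class Doc {
        final String id;
        final String province;
        final String operatorId;
        final String content;

        Doc(String id, String province, String operatorId, String content) {
            this.id = id;
            this.province = province;
            this.operatorId = operatorId;
            this.content = content;
        }
    }

    // A few documents taken from the responses shown in this post.
    static List<Doc> sampleDocs() {
        return List.of(
            new Doc("5", "北京", "1", "数据驱动,AI推动,新零售模型让你的购买更心怡!"),
            new Doc("12", "杭州", "3", "天文研究,放心推动,哈雷天文!"),
            new Doc("11", "杭州", "3", "魅力宣传,星雨传媒!"),
            new Doc("9", "上海", "1", "花开花落,魅力女性,买花选我!"),
            new Doc("4", "上海", "1", "魅力,势不可挡,爱美爱美"),
            new Doc("10", "杭州", "2", "智能算法,智慧生活,勾股科技!"));
    }

    // Returns the ids of the documents that pass must, must_not, and the should threshold.
    static List<String> search(List<Doc> docs, int minimumShouldMatch) {
        Set<String> provinces = Set.of("北京", "杭州");
        List<String> ids = new ArrayList<>();
        for (Doc d : docs) {
            // must: substring test as a crude stand-in for the analyzed match query
            boolean must = d.content.contains("魅力") || d.content.contains("推动");
            // must_not: operator id 2 is excluded
            boolean mustNot = d.operatorId.equals("2");
            // should: a single clause, so it contributes 0 or 1 matches
            int shouldMatches = provinces.contains(d.province) ? 1 : 0;
            if (must && !mustNot && shouldMatches >= minimumShouldMatch) {
                ids.add(d.id);
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        System.out.println(search(sampleDocs(), 0)); // should is optional: the 上海 hits stay in
        System.out.println(search(sampleDocs(), 1)); // should is required: only 北京 and 杭州 remain
    }
}
```

With a threshold of 0 the model returns the same five ids as the incorrect request, and with a threshold of 1 the same three ids as the corrected one.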

  The Java code is as follows:

    static RestHighLevelClient myClient= EsClient.getClient();  // client used to talk to ES
    String index="sms-logs-index";

    @Test
    public void boolQuery() throws IOException {
        //1. Create the SearchRequest
        SearchRequest request=new SearchRequest(index);

        //2. Set the query conditions
        SearchSourceBuilder builder=new SearchSourceBuilder();
        BoolQueryBuilder boolQuery=QueryBuilders.boolQuery();
        // province is 北京 or 杭州 (a terms query inside must already means "any of these values")
        boolQuery.must(QueryBuilders.termsQuery("province","北京","杭州"));

        // operator id is not 2
        boolQuery.mustNot(QueryBuilders.termQuery("opratorId",2));

        // smsContent contains 魅力 or 推动
        boolQuery.must(QueryBuilders.matchQuery("smsContent","魅力 推动").operator(Operator.OR));


        builder.query(boolQuery);
        request.source(builder);
        //3. Execute the query
        SearchResponse resp = myClient.search(request, RequestOptions.DEFAULT);

        //4. Print the results
        for (SearchHit hit : resp.getHits().getHits()) {
            System.out.println(hit.getSourceAsMap());
        }
    }

Figure 3 Result of the bool query in Java

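boosting query

The boosting query lets us influence the score of the documents a query returns.

  • positive: only documents that match the positive query are included in the result set;
  • negative: a document that matches the positive query and also matches the negative query has its score reduced;
  • negative_boost: the factor applied to the score of such documents, a value between 0 and 1.0;

How the score is computed, in outline:

  • The more often the search keyword occurs in a document, the higher the score;
  • The shorter the matching document content, the higher the score;
  • The search keywords are analyzed into terms as well; the more of those terms match the index, the higher the score.

  Task: find the documents whose smsContent contains 魅力, and lower the score of those whose smsContent also contains 传媒 ("media").

  First, the REST request for the normal scores, that is, a plain match on smsContent containing 魅力: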

# Request
POST /sms-logs-index/_search
{
  "query": {
    "match": {
      "smsContent": "魅力"
    }
  }
}


# Result
{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 3,
      "relation" : "eq"
    },
    "max_score" : 0.6317746,
    "hits" : [
      {
        "_index" : "sms-logs-index",
        "_type" : "_doc",
        "_id" : "11",
        "_score" : 0.6317746,
        "_source" : {
          "createDate" : "2020-09-22",
          "senDate" : "2020-09-22",
          "longCode" : "458744536",
          "moblie" : 134625584654,
          "corpName" : "星雨文化传媒",
          "smsContent" : "魅力宣传,星雨传媒!",
          "state" : "1",
          "opratorId" : "3",
          "province" : "杭州",
          "ipAddr" : "10.289.19.45",
          "replyTotal" : "6",
          "fee" : "500"
        }
      },
      {
        "_index" : "sms-logs-index",
        "_type" : "_doc",
        "_id" : "9",
        "_score" : 0.56260216,
        "_source" : {
          "createDate" : "2020-09-16",
          "senDate" : "2020-09-16",
          "longCode" : "5784320",
          "moblie" : 15236964578,
          "corpName" : "花花派",
          "smsContent" : "花开花落,魅力女性,买花选我!",
          "state" : "1",
          "opratorId" : "1",
          "province" : "上海",
          "ipAddr" : "10.265.19.45",
          "replyTotal" : "1",
          "fee" : "0.1"
        }
      },
      {
        "_index" : "sms-logs-index",
        "_type" : "_doc",
        "_id" : "4",
        "_score" : 0.2876821,
        "_source" : {
          "createDate" : "2020-09-16",
          "senDate" : "2020-09-16",
          "longCode" : "87454120",
          "moblie" : 13625789645,
          "corpName" : "爱美化妆品有限公司",
          "smsContent" : "魅力,势不可挡,爱美爱美",
          "state" : "1",
          "opratorId" : "1",
          "province" : "上海",
          "ipAddr" : "10.258.19.45",
          "replyTotal" : "1",
          "fee" : "200"
        }
      }
    ]
  }
}



  Document 11, whose smsContent contains both 魅力 and 传媒, currently has the highest score, 0.6317746, and ranks first. Next, the REST boosting request and its effect:

# boosting query
POST /sms-logs-index/_search
{
  "query": 
  {
    "boosting": {
      "positive": {
        "match": {
          "smsContent": "魅力"
        }
      }
      , "negative": {
        "match": {
          "smsContent": "传媒"
        }
      }
      , "negative_boost": 0.2
    }
    
  }
}


# Result
{
  "took" : 33,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 3,
      "relation" : "eq"
    },
    "max_score" : 0.73050237,
    "hits" : [
      {
        "_index" : "sms-logs-index",
        "_type" : "_doc",
        "_id" : "9",
        "_score" : 0.73050237,
        "_source" : {
          "createDate" : "2020-09-16",
          "senDate" : "2020-09-16",
          "longCode" : "5784320",
          "moblie" : 15236964578,
          "corpName" : "花花派",
          "smsContent" : "花开花落,魅力女性,买花选我!",
          "state" : "1",
          "opratorId" : "1",
          "province" : "上海",
          "ipAddr" : "10.265.19.45",
          "replyTotal" : "1",
          "fee" : "0.1"
        }
      },
      {
        "_index" : "sms-logs-index",
        "_type" : "_doc",
        "_id" : "4",
        "_score" : 0.2876821,
        "_source" : {
          "createDate" : "2020-09-16",
          "senDate" : "2020-09-16",
          "longCode" : "87454120",
          "moblie" : 13625789645,
          "corpName" : "爱美化妆品有限公司",
          "smsContent" : "魅力,势不可挡,爱美爱美",
          "state" : "1",
          "opratorId" : "1",
          "province" : "上海",
          "ipAddr" : "10.258.19.45",
          "replyTotal" : "1",
          "fee" : "200"
        }
      },
      {
        "_index" : "sms-logs-index",
        "_type" : "_doc",
        "_id" : "11",
        "_score" : 0.16375022,
        "_source" : {
          "createDate" : "2020-09-22",
          "senDate" : "2020-09-22",
          "longCode" : "458744536",
          "moblie" : 134625584654,
          "corpName" : "星雨文化传媒",
          "smsContent" : "魅力宣传,星雨传媒!",
          "state" : "1",
          "opratorId" : "3",
          "province" : "杭州",
          "ipAddr" : "10.289.19.45",
          "replyTotal" : "6",
          "fee" : "500"
        }
      }
    ]
  }
}


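  Document 11 now has a score of 0.16375022 and ranks last. Document 9's score also differs from the baseline above, which suggests the two responses were captured at different index states; 0.16375022 is 0.2 times 0.81875104, the score the same match query gives document 11 in the highlight example later in this post.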
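The score adjustment itself is simple arithmetic and can be sketched as follows. `BoostingSketch` and `boostedScore` are hypothetical names for illustration, not part of any Elasticsearch API; the sketch assumes only the documented behaviour that a document matching the negative query has its positive score multiplied by negative_boost. The input scores are taken from the responses in this post.

```java
// Hypothetical model of how a boosting query adjusts a score; not Elasticsearch code.
final class BoostingSketch {

    // Keeps the positive score as is, or multiplies it by negativeBoost
    // when the document also matches the negative query.
    static float boostedScore(float positiveScore, boolean matchesNegative, float negativeBoost) {
        if (negativeBoost < 0f || negativeBoost > 1f) {
            throw new IllegalArgumentException("negativeBoost must be between 0 and 1.0");
        }
        return matchesNegative ? positiveScore * negativeBoost : positiveScore;
    }

    public static void main(String[] args) {
        // Document 11 matches both 魅力 (positive) and 传媒 (negative): score is scaled down.
        System.out.println(boostedScore(0.81875104f, true, 0.2f));
        // Document 9 matches only 魅力: score is unchanged.
        System.out.println(boostedScore(0.73050237f, false, 0.2f));
    }
}
```

A negative_boost close to 1.0 demotes matching documents only slightly, while a value close to 0 pushes them to the bottom of the list without removing them from the result set.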

  The Java code is as follows:

    static RestHighLevelClient myClient= EsClient.getClient();  // client used to talk to ES
    String index="sms-logs-index";


    @Test
    public void boostingQuery() throws IOException {
        //1. Create the SearchRequest
        SearchRequest request=new SearchRequest(index);

        //2. Set the query conditions: positive query first, negative query second
        SearchSourceBuilder builder=new SearchSourceBuilder();
        BoostingQueryBuilder boostingQuery =QueryBuilders.boostingQuery(
                QueryBuilders.matchQuery("smsContent","魅力"),
                QueryBuilders.matchQuery("smsContent","传媒")
        ).negativeBoost(0.2f);

        builder.query(boostingQuery);
        request.source(builder);

        //3. Execute the query
        SearchResponse resp = myClient.search(request, RequestOptions.DEFAULT);

        //4. Print the results
        for (SearchHit hit : resp.getHits().getHits()) {
            System.out.println(hit.getSourceAsMap());
        }
    }

Figure 4 Result of the boosting query in Java

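filter query

query: computes a relevance score for each document against the conditions and sorts by that score; the results are not cached.
filter: only checks whether documents match, without computing a score, and ES caches frequently used filters so that later lookups are fast.
If the conditions are precise and you do not care about the score of the matches, use filter. If the conditions are loose and you rely on the score to order the results, use query.
When the score is not needed, filter performs better than query.

  Task: use filter to find the SMS content of companies whose smsContent contains 魅力 and whose fee is at most 400.

  The REST request is as follows: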

POST /sms-logs-index/_search
{
  "query": {
    "bool": {
      "filter": 
      [
        {
          "term": 
          {
            "smsContent": "魅力"
          }
          
        }
        , 
        {
          "range": 
          {
            "fee":
            {
              "lte": 400
            }
          }
        }
      ]
    }
  }
}


  The REST request returns the following. Note that every hit has a score of 0.0, which shows that no score was computed:

{
  "took" : 81,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 0.0,
    "hits" : [
      {
        "_index" : "sms-logs-index",
        "_type" : "_doc",
        "_id" : "9",
        "_score" : 0.0,
        "_source" : {
          "createDate" : "2020-09-16",
          "senDate" : "2020-09-16",
          "longCode" : "5784320",
          "moblie" : 15236964578,
          "corpName" : "花花派",
          "smsContent" : "花开花落,魅力女性,买花选我!",
          "state" : "1",
          "opratorId" : "1",
          "province" : "上海",
          "ipAddr" : "10.265.19.45",
          "replyTotal" : "1",
          "fee" : "0.1"
        }
      },
      {
        "_index" : "sms-logs-index",
        "_type" : "_doc",
        "_id" : "4",
        "_score" : 0.0,
        "_source" : {
          "createDate" : "2020-09-16",
          "senDate" : "2020-09-16",
          "longCode" : "87454120",
          "moblie" : 13625789645,
          "corpName" : "爱美化妆品有限公司",
          "smsContent" : "魅力,势不可挡,爱美爱美",
          "state" : "1",
          "opratorId" : "1",
          "province" : "上海",
          "ipAddr" : "10.258.19.45",
          "replyTotal" : "1",
          "fee" : "200"
        }
      }
    ]
  }
}

  The Java code is as follows:

    static RestHighLevelClient myClient= EsClient.getClient();  // client used to talk to ES
    String index="sms-logs-index";

    @Test
    public void filter() throws IOException {
        //1. Create the SearchRequest
        SearchRequest request=new SearchRequest(index);

        //2. Set the query conditions; both go into the filter clause of a bool query
        SearchSourceBuilder builder=new SearchSourceBuilder();
        BoolQueryBuilder boolQueryBuilder=QueryBuilders.boolQuery();
        boolQueryBuilder.filter(QueryBuilders.termQuery("smsContent","魅力"));
        boolQueryBuilder.filter(QueryBuilders.rangeQuery("fee").lte(400));
        builder.query(boolQueryBuilder);
        request.source(builder);

        //3. Execute the query
        SearchResponse resp = myClient.search(request, RequestOptions.DEFAULT);

        //4. Print the results
        for (SearchHit hit : resp.getHits().getHits()) {
            System.out.println(hit.getSourceAsMap());

        }
    }

  The output of the Java filter code is shown in Figure 5.

Figure 5 Result of the filter query in Java

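highlight query

A highlight query returns the keyword you searched for wrapped in special markup, so that the user can see why a result was retrieved; Figure 6 shows the effect.
The highlighted data is itself a field of the document; that field is returned to you separately in highlighted form.
ES provides a highlight property at the same level as query.

  • fragment_size: how many characters of the highlighted text to return;
  • pre_tags: the opening tag, for example <font color="red">
  • post_tags: the closing tag, for example </font>
  • fields: which fields to highlight

Figure 6 What a highlight query looks like

  Task: highlight the term 魅力 in the smsContent field.

  The REST request is as follows: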


POST /sms-logs-index/_search
{
  "query": {
    "match": {
      "smsContent": "魅力"
    }
  }
  , "highlight": 
  {
    "fields": {
      "smsContent": {}
    }
    , "pre_tags": "<font color='red'>"
    , "post_tags": "</font>"
    ,"fragment_size":10
  }
}


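  The REST request returns the following. Highlighting does not change the hits themselves; each hit gets an extra highlight element next to _source, holding the HTML markup used for highlighting. Copy the result into a .txt file, rename the file to .html, and open it in Chrome to see the effect in Figure 7: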

{
  "took" : 121,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 3,
      "relation" : "eq"
    },
    "max_score" : 0.81875104,
    "hits" : [
      {
        "_index" : "sms-logs-index",
        "_type" : "_doc",
        "_id" : "11",
        "_score" : 0.81875104,
        "_source" : {
          "createDate" : "2020-09-22",
          "senDate" : "2020-09-22",
          "longCode" : "458744536",
          "moblie" : 134625584654,
          "corpName" : "星雨文化传媒",
          "smsContent" : "魅力宣传,星雨传媒!",
          "state" : "1",
          "opratorId" : "3",
          "province" : "杭州",
          "ipAddr" : "10.289.19.45",
          "replyTotal" : "6",
          "fee" : "500"
        },
        "highlight" : {
          "smsContent" : [
            "<font color='red'>魅力</font>宣传,星雨传媒!"
          ]
        }
      },
      {
        "_index" : "sms-logs-index",
        "_type" : "_doc",
        "_id" : "9",
        "_score" : 0.73050237,
        "_source" : {
          "createDate" : "2020-09-16",
          "senDate" : "2020-09-16",
          "longCode" : "5784320",
          "moblie" : 15236964578,
          "corpName" : "花花派",
          "smsContent" : "花开花落,魅力女性,买花选我!",
          "state" : "1",
          "opratorId" : "1",
          "province" : "上海",
          "ipAddr" : "10.265.19.45",
          "replyTotal" : "1",
          "fee" : "0.1"
        },
        "highlight" : {
          "smsContent" : [
            "花开花落,<font color='red'>魅力</font>女性,买花选我"
          ]
        }
      },
      {
        "_index" : "sms-logs-index",
        "_type" : "_doc",
        "_id" : "4",
        "_score" : 0.2876821,
        "_source" : {
          "createDate" : "2020-09-16",
          "senDate" : "2020-09-16",
          "longCode" : "87454120",
          "moblie" : 13625789645,
          "corpName" : "爱美化妆品有限公司",
          "smsContent" : "魅力,势不可挡,爱美爱美",
          "state" : "1",
          "opratorId" : "1",
          "province" : "上海",
          "ipAddr" : "10.258.19.45",
          "replyTotal" : "1",
          "fee" : "200"
        },
        "highlight" : {
          "smsContent" : [
            "<font color='red'>魅力</font>,势不可挡,爱美爱美"
          ]
        }
      }
    ]
  }
}

Figure 7 Highlight result rendered in the browser
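What the highlight element contains, and how it ends up as a page a browser can render, can be sketched in plain Java. This is an illustration under stated assumptions, not how Elasticsearch implements highlighting: `HighlightSketch`, `highlight`, and `toHtmlPage` are hypothetical names, and a literal string replacement stands in for the analyzer-aware highlighter.

```java
import java.util.List;

// Hypothetical illustration of the highlight output; not Elasticsearch code.
final class HighlightSketch {

    // Wraps every literal occurrence of keyword in preTag/postTag.
    static String highlight(String text, String keyword, String preTag, String postTag) {
        return text.replace(keyword, preTag + keyword + postTag);
    }

    // Joins highlighted fragments into a minimal HTML page.
    static String toHtmlPage(List<String> fragments) {
        StringBuilder sb = new StringBuilder("<html><head><meta charset=\"utf-8\"></head><body>\n");
        for (String fragment : fragments) {
            sb.append("<p>").append(fragment).append("</p>\n");
        }
        return sb.append("</body></html>").toString();
    }

    public static void main(String[] args) {
        String fragment = highlight("魅力宣传,星雨传媒!", "魅力", "<font color='red'>", "</font>");
        System.out.println(fragment);
        System.out.println(toHtmlPage(List.of(fragment)));
    }
}
```

Saving the string returned by `toHtmlPage` to an .html file gives the same kind of page as the manual copy-and-rename step described above.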

  The Java code is as follows:

    static RestHighLevelClient myClient= EsClient.getClient();  // client used to talk to ES
    String index="sms-logs-index";

    @Test
    public void highlightQuery() throws IOException {
        //1. Create the SearchRequest
        SearchRequest request=new SearchRequest(index);

        //2. Set the query conditions
        SearchSourceBuilder builder =new SearchSourceBuilder();
        builder.query(QueryBuilders.matchQuery("smsContent","魅力"));

        //2.1 Add highlighting: field smsContent, fragment size 10, custom pre and post tags
        HighlightBuilder highlightBuilder =new HighlightBuilder();
        highlightBuilder.field("smsContent",10).preTags("<font color='red'>").postTags("</font>");

        builder.highlighter(highlightBuilder);
        request.source(builder);

        //3. Execute the query
        SearchResponse resp = myClient.search(request, RequestOptions.DEFAULT);

        //4. Print the highlighted field of each hit
        for (SearchHit hit : resp.getHits().getHits()) {
            System.out.println(hit.getHighlightFields().get("smsContent"));
        }
    }

  The output of the Java code is shown in Figure 8.

Figure 8 Highlight query in Java