ES: Query & Filtering and Multi-String Multi-Field Queries

longasyan

Article link: https://blog.csdn.net/qq_43045747/article/details/119052856

Query Context & Filter Context

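Advanced search typically accepts several text inputs and searches across multiple fields.

Search engines usually also let users filter by conditions such as price or time.

Elasticsearch evaluates search clauses in one of two contexts:

query: computes a relevance score for each matching document

filter: no scoring, only a yes/no match; results can be cached to improve performance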

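Bool Compound Query

must: the clause must match, and it contributes to the score

should: the clause is optional, and it contributes to the score when it matches

must_not: the clause must not match; it is not scored

filter: the clause must match, but it is not scored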
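
As a rough sketch of how the four clauses combine in one request, the example below uses a hypothetical products index. The index name and the fields name, description, status, and price are assumptions made for illustration and do not appear in the examples that follow; status is assumed to be a keyword field. Only must and should influence _score, while must_not and filter just exclude or include documents.

```
# Hypothetical index and fields, for illustration only
GET products/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "name": "wireless headphones" } }
      ],
      "should": [
        { "match": { "description": "noise cancelling" } }
      ],
      "must_not": [
        { "term": { "status": "discontinued" } }
      ],
      "filter": [
        { "range": { "price": { "lte": 200 } } }
      ]
    }
  }
}
```

Conditions such as price or time ranges usually belong in filter, since they are yes/no checks that do not need a relevance score.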

Single-String Multi-Field Queries

Example:

PUT blogs/_doc/1
{
  "title":"Quick brown rabbits",
  "content":"Brown rabbits are commonly seen."
}

PUT blogs/_doc/2
{
  "title":"Keeping pets healthy",
  "content":"My quick brown fox eats rabbits on a regular basis."
}

GET blogs/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "title": "brown fox"
          }
        },
        {
          "match": {
            "content": "brown fox"
          }
        }
      ]
    }
  }
}

Result:
{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 0.90425634,
    "hits" : [
      {
        "_index" : "blogs",// brown appears in both title and content, so both should clauses add to this document's score
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 0.90425634,
        "_source" : {
          "title" : "Quick brown rabbits",
          "content" : "Brown rabbits are commonly seen."
        }
      },
      {
        "_index" : "blogs",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 0.77041256,
        "_source" : {
          "title" : "Keeping pets healthy",
          "content" : "My quick brown fox eats rabbits on a regular basis."
        }
      }
    ]
  }
}

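So how do we get the document we actually want ranked first? Document 2 is the only one that contains both brown and fox in a single field. A dis_max query scores each document by its best-matching clause instead of adding up all clauses: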
GET blogs/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "dis_max": {
            "queries": [
              {
                "match": {
                  "title": "brown fox"
                }
              },
              {
                "match": {
                  "content": "brown fox"
                }
              }
            ]
          }
        }
      ]
    }
  }
}

Result:
{
  "took" : 10,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 0.77041256,
    "hits" : [
      {
        "_index" : "blogs",// dis_max scores each document by its best-matching clause and ranks by that score
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 0.77041256,
        "_source" : {
          "title" : "Keeping pets healthy",
          "content" : "My quick brown fox eats rabbits on a regular basis."
        }
      },
      {
        "_index" : "blogs",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 0.6931472,
        "_source" : {
          "title" : "Quick brown rabbits",
          "content" : "Brown rabbits are commonly seen."
        }
      }
    ]
  }
}

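When several clauses match, tie_breaker lets the clauses other than the best one contribute to the score as well, here with a weight of 0.7:
GET blogs/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "dis_max": {
            "queries": [
              {
                "match": {
                  "title": "quick pets"
                }
              },
              {
                "match": {
                  "content": "quick pets"
                }
              }
            ],
            "tie_breaker": 0.7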
          }
        }
      ]
    }
  }
}

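tie_breaker: a floating-point number between 0 and 1. With 0 (the default) only the best-matching clause counts; with 1 all matching clauses count equally. The final score is the score of the best clause plus tie_breaker times the sum of the scores of the other matching clauses.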
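
To see what tie_breaker does to the earlier brown fox results, the per-clause scores can be read off the outputs above, assuming the bool should score is the sum of its two clause scores. For document 1 the best clause scores 0.6931472 and the other clause 0.90425634 - 0.6931472 = 0.21110914; document 2 matches on content only, with 0.77041256. With tie_breaker 0.7, document 1 would score about 0.6931472 + 0.7 × 0.21110914 ≈ 0.8409 and move back ahead of document 2. With a smaller value such as 0.2 it would score about 0.7354, so document 2 stays first. These figures are estimates computed from the outputs above, not a captured response.

```
# tie_breaker 0.2: by the estimate above, document 2 should stay first
GET blogs/_search
{
  "query": {
    "dis_max": {
      "queries": [
        { "match": { "title": "brown fox" } },
        { "match": { "content": "brown fox" } }
      ],
      "tie_breaker": 0.2
    }
  }
}
```

A small tie_breaker keeps the best field dominant while still rewarding documents that match in more than one field.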

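Single-String Multi-Field Queries, Part 2: multi_match

Three multi_match types are covered here:

best_fields (the default): when the fields are related but compete with each other, the score comes from the best-matching field.

most_fields: a common technique for English text is to index the main field with the english analyzer, which stems words (and can add synonyms) so that more documents match, and to index the same text in a subfield with the standard analyzer for more precise matching. The extra fields act as signals that raise the relevance of matching documents: the more fields match, the better.

cross_fields: for entities such as person names, addresses, or book information, the information has to be found across several fields, each of which holds only part of the whole. The query tries to find as many of the query terms as possible across the listed fields.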
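
As a minimal sketch of best_fields, the dis_max query from the previous section can be written as a multi_match, which Elasticsearch executes as a dis_max over one match query per listed field. The request reuses the blogs index and its title and content fields from above.

```
# best_fields is the default type; it is spelled out here for clarity
GET blogs/_search
{
  "query": {
    "multi_match": {
      "query": "brown fox",
      "type": "best_fields",
      "fields": ["title", "content"]
    }
  }
}
```

The multi_match form is shorter when the same query string is sent to every field; the explicit dis_max form is still needed when each field gets a different query.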

Example:

PUT titles
{
  "mappings": {
    "properties": {
      "title":{
        "type": "text",
        "analyzer": "english"
      }
    }
  }
}

POST titles/_bulk
{"index":{"_id":1}}
{"title":"My dog barks"}
{"index":{"_id":2}}
{"title":"I see a lot of barking dogs on zhe road"}

# Intuitively, document 2 should rank first because it is the closer match
GET titles/_search
{
  "query": {
    "match": {
      "title": "barking dogs"
    }
  }
}

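Result: document 1 ranks first. The english analyzer stems barking dogs to bark and dog, so in the inverted index both documents match both terms. With the term matches equal, length normalization favors the shorter title.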
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 0.43598637,
    "hits" : [
      {
        "_index" : "titles",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 0.43598637,
        "_source" : {
          "title" : "My dog barks"
        }
      },
      {
        "_index" : "titles",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 0.3133652,
        "_source" : {
          "title" : "I see a lot of barking dogs on zhe road"
        }
      }
    ]
  }
}
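
The stemming can be checked directly with the _analyze API. As a small sketch, the first request below should return the tokens bark and dog, while the second, using the standard analyzer, keeps barking and dogs unstemmed.

```
# english analyzer: stems the input
GET _analyze
{
  "analyzer": "english",
  "text": "barking dogs"
}

# standard analyzer: lowercases and splits on word boundaries, no stemming
GET _analyze
{
  "analyzer": "standard",
  "text": "barking dogs"
}
```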

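How can this be handled? Add a subfield that indexes the same text with the standard analyzer. PUT fails if the index already exists, so delete the index, recreate it with the mapping below, and then re-run the _bulk request above.
DELETE titles

# title.std is a subfield analyzed with the standard analyzer
PUT titles
{
  "mappings": {
    "properties": {
      "title":{
        "type": "text",
        "analyzer": "english",
        "fields": {"std":{"type":"text","analyzer":"standard"}}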
      }
    }
  }
}

# Query both fields with multi_match, with the type set to most_fields
GET titles/_search
{
  "query": {
    "multi_match": {
      "query": "barking dogs",
      "fields": ["title","title.std"],
      "type": "most_fields"
    }
  }
}

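Result: most_fields adds up the scores of all matching fields. Both documents match on title, but only document 2 contains the exact tokens barking and dogs in title.std, which uses the standard analyzer and does no stemming. Its subfield score is added on top, so document 2 now ranks first.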
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 1.4494115,
    "hits" : [
      {
        "_index" : "titles",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 1.4494115,
        "_source" : {
          "title" : "I see a lot of barking dogs on zhe road"
        }
      },
      {
        "_index" : "titles",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 0.43598637,
        "_source" : {
          "title" : "My dog barks"
        }
      }
    ]
  }
}

A field's weight (its boost) can be changed with the caret syntax; title^10 multiplies the score from the title field by 10:
GET titles/_search
{
  "query": {
    "multi_match": {
      "query": "barking dogs",
      "fields": ["title^10","title.std"],
      "type": "most_fields"
    }
  }
}
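
The effect of this boost can be estimated from the earlier outputs, assuming most_fields sums the per-field scores and the boost simply multiplies a field's score. Document 1 matches on title only, so it would score about 10 × 0.43598637 ≈ 4.36. Document 2 scored 0.3133652 on title and therefore 1.4494115 - 0.3133652 = 1.1360463 on title.std, giving about 10 × 0.3133652 + 1.1360463 ≈ 4.27. By this estimate a boost of 10 on title would put document 1 back in first place, so the boost is probably better placed on title.std or kept small. These are derived estimates, not a captured response; adding explain to the request returns the per-field breakdown so they can be verified.

```
# "explain": true adds a scoring breakdown to every hit
GET titles/_search
{
  "explain": true,
  "query": {
    "multi_match": {
      "query": "barking dogs",
      "fields": ["title^10", "title.std"],
      "type": "most_fields"
    }
  }
}
```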

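cross_fields fits the case where the query terms are spread across several fields: with operator and, every term must appear in at least one of the listed fields, but the terms do not all have to appear in the same field.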
PUT address/_doc/1
{
  "street":"5 Poland Street",
  "city":"London",
  "country":"United Kingdom",
  "postcode":"W1V 3DG"
}

POST address/_search
{
  "query": {
    "multi_match": {
      "query": "Poland Street W1V",
      "fields": ["street","city","country","postcode"],
      "type": "most_fields",
      "operator": "and"
    }
  }
}

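Result: no hits. With most_fields, operator and is applied to each field separately, so a single field would have to contain all of Poland, Street, and W1V, and none does. Changing the type to cross_fields: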
POST address/_search
{
  "query": {
    "multi_match": {
      "query": "Poland Street W1V",
      "fields": ["street","city","country","postcode"],
      "type": "cross_fields",
      "operator": "and"
    }
  }
}
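This query returns the document: cross_fields treats the listed fields as one combined field, so Poland and Street match in street and W1V matches in postcode.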
