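ES: Query & Filtering and Multi-String Multi-Field Queries

Query Context & Filter Context

Advanced search: accepts several text inputs and searches across multiple fields.

Search engines usually also offer filtering on conditions such as price and time.

Elasticsearch evaluates a clause in one of two contexts:

query: computes a relevance score

filter: no scoring; results can be cached to improve performance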

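Bool compound query

must: the clause must match, and it contributes to the score

should: optional match; each matching clause contributes to the score

must_not: the clause must not match; it is not scored

filter: the clause must match, but it is not scored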
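
The four clause types can be combined in one request. The snippet below is a minimal sketch that only builds the request body as a Python dict, so it runs without a cluster. The `price` and `status` fields and the helper name `build_bool_query` are assumptions made for illustration; they do not appear in the examples of this post.

```python
import json


def build_bool_query(text, max_price, excluded_status):
    """Build a search body that mixes query context and filter context.

    The field names price and status are hypothetical.
    """
    return {
        "query": {
            "bool": {
                # Query context: these clauses are scored.
                "must": [{"match": {"title": text}}],
                "should": [{"match": {"content": text}}],
                # Filter context: these clauses only include or exclude documents.
                "must_not": [{"term": {"status": excluded_status}}],
                "filter": [{"range": {"price": {"lte": max_price}}}],
            }
        }
    }


body = build_bool_query("brown fox", 100, "deleted")
print(json.dumps(body, indent=2))
```

The printed JSON can be pasted after a `GET <index>/_search` line in the Kibana console.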

Single-string multi-field queries

Example:

PUT blogs/_doc/1
{
  "title":"Quick brown rabbits",
  "content":"Brown rabbits are commonly seen."
}

PUT blogs/_doc/2
{
  "title":"Keeping pets healthy",
  "content":"My quick brown fox eats rabbits on a regular basis."
}

GET blogs/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "title": "brown fox"
          }
        },
        {
          "match": {
            "content": "brown fox"
          }
        }
      ]
    }
  }
}

Result:
{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 0.90425634,
    "hits" : [
      {
        "_index" : "blogs", // brown appears in both title and content, so both should clauses add to the score and this document ranks higher
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 0.90425634,
        "_source" : {
          "title" : "Quick brown rabbits",
          "content" : "Brown rabbits are commonly seen."
        }
      },
      {
        "_index" : "blogs",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 0.77041256,
        "_source" : {
          "title" : "Keeping pets healthy",
          "content" : "My quick brown fox eats rabbits on a regular basis."
        }
      }
    ]
  }
}

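How can we get the document we actually want ranked first? Document 2 is the only one whose content contains both brown and fox, so wrap the two match clauses in dis_max, which keeps the best single-field score: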
GET blogs/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "dis_max": {
            "queries": [
              {
                "match": {
                  "title": "brown fox"
                }
              },
              {
                "match": {
                  "content": "brown fox"
                }
              }
            ]
          }
        }
      ]
    }
  }
}

Result:
{
  "took" : 10,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 0.77041256,
    "hits" : [
      {
        "_index" : "blogs", // dis_max scores each document by its best-matching clause instead of adding the clauses together
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 0.77041256,
        "_source" : {
          "title" : "Keeping pets healthy",
          "content" : "My quick brown fox eats rabbits on a regular basis."
        }
      },
      {
        "_index" : "blogs",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 0.6931472,
        "_source" : {
          "title" : "Quick brown rabbits",
          "content" : "Brown rabbits are commonly seen."
        }
      }
    ]
  }
}

GET blogs/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "dis_max": {
            "queries": [
              {
                "match": {
                  "title": "quick pets"
                }
              },
              {
                "match": {
                  "content": "quick pets"
                }
              }
            ],
            "tie_breaker": 0.7 // when the best-field scores tie, tie_breaker lets the other matching clauses contribute to the score
          }
        }
      ]
    }
  }
}

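tie_breaker: a float between 0 and 1 (default 0). The final score is the score of the best-matching clause plus tie_breaker times the scores of all other matching clauses.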
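
To make the difference between bool/should and dis_max concrete, here is a small sketch of how the per-clause scores are combined. It assumes the clause scores are already known; the numbers are made up for illustration and are not taken from the results above, and Lucene's real implementation differs in detail.

```python
def bool_should_score(clause_scores):
    """bool/should: add up the scores of every matching clause."""
    return sum(score for score in clause_scores if score > 0)


def dis_max_score(clause_scores, tie_breaker=0.0):
    """dis_max: best clause score plus tie_breaker times the other matching scores."""
    matching = [score for score in clause_scores if score > 0]
    if not matching:
        return 0.0
    best = max(matching)
    return best + tie_breaker * (sum(matching) - best)


# Hypothetical clause scores: [title score, content score]
doc_a = [0.5, 0.25]   # weak match in both fields
doc_b = [0.0, 0.625]  # strong match in a single field

print(bool_should_score(doc_a), bool_should_score(doc_b))
print(dis_max_score(doc_a), dis_max_score(doc_b))
print(dis_max_score(doc_a, tie_breaker=0.5))
```

With these numbers bool/should ranks doc_a first because it adds both clauses, while dis_max ranks doc_b first because it only looks at the best field. A tie_breaker above 0 moves dis_max back toward the bool/should behaviour.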

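Single-string multi-field queries, part 2: multi_match

Three of its types are covered here:

best_fields: for fields that are related yet compete with each other; the score comes from the best-matching field.

most_fields: a common technique for English text is to stem the main field (english analyzer) and add synonyms so that more documents match, then index the same text into a sub-field (standard analyzer) for more precise matching. The additional fields act as signals that raise the relevance of matching documents; the more fields match, the better.

cross_fields: for entities such as person names, addresses, or book information, the information has to be identified across several fields, and each field is only one part of the whole. The goal is to find as many of the query terms as possible in any of the listed fields.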
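
The three types share one request shape and differ only in the type value. The helper below is a sketch that builds such a body as a Python dict; the name `build_multi_match` is hypothetical, and the allowed set is deliberately limited to the three types discussed here (multi_match accepts other types as well).

```python
MULTI_MATCH_TYPES = {"best_fields", "most_fields", "cross_fields"}


def build_multi_match(query, fields, match_type="best_fields",
                      operator=None, tie_breaker=None):
    """Build a multi_match search body for one of the three types above."""
    if match_type not in MULTI_MATCH_TYPES:
        raise ValueError(f"unsupported type in this sketch: {match_type}")
    clause = {"query": query, "fields": list(fields), "type": match_type}
    # Optional parameters are only added when the caller sets them.
    if operator is not None:
        clause["operator"] = operator
    if tie_breaker is not None:
        clause["tie_breaker"] = tie_breaker
    return {"query": {"multi_match": clause}}


print(build_multi_match("barking dogs", ["title", "title.std"], "most_fields"))
```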

Example:

PUT titles
{
  "mappings": {
    "properties": {
      "title":{
        "type": "text",
        "analyzer": "english"
      }
    }
  }
}

POST titles/_bulk
{"index":{"_id":1}}
{"title":"My dog barks"}
{"index":{"_id":2}}
{"title":"I see a lot of barking dogs on zhe road"}

GET titles/_search
{
  "query": {
    "match": {
      "title": "barking dogs" // intuitively document 2 should rank first, because it is the closer match
    }
  }
}

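Result: document 1 ranks first. The english analyzer stems barking dogs to bark and dog, so in the inverted index both documents match both terms equally. The score is then decided by field length, and the shorter field wins.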
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 0.43598637,
    "hits" : [
      {
        "_index" : "titles",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 0.43598637,
        "_source" : {
          "title" : "My dog barks"
        }
      },
      {
        "_index" : "titles",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 0.3133652,
        "_source" : {
          "title" : "I see a lot of barking dogs on zhe road"
        }
      }
    ]
  }
}
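
The effect of field length can be checked with a small BM25 calculation. This is a sketch under stated assumptions: the default parameters k1 = 1.2 and b = 0.75, one occurrence of each stemmed term per document, and field lengths of 3 and 7 tokens (the english analyzer drops the stop words a, of and on from document 2). The function name is hypothetical and Lucene's real scorer works with single-precision floats, so the last digits can differ.

```python
import math


def bm25_term_score(tf, doc_len, avg_doc_len, n_docs, docs_with_term,
                    k1=1.2, b=0.75):
    """Score of one query term in one document, following the BM25 formula."""
    idf = math.log(1 + (n_docs - docs_with_term + 0.5) / (docs_with_term + 0.5))
    # Longer fields make the denominator larger, which lowers the score.
    tf_norm = tf / (tf + k1 * (1 - b + b * doc_len / avg_doc_len))
    return (k1 + 1) * idf * tf_norm


# Assumed token counts of the title field after english analysis.
doc_lengths = {"1": 3, "2": 7}
avg_len = sum(doc_lengths.values()) / len(doc_lengths)

# Both query terms (bark, dog) occur once in each of the 2 documents.
scores = {
    doc_id: 2 * bm25_term_score(1, length, avg_len, n_docs=2, docs_with_term=2)
    for doc_id, length in doc_lengths.items()
}
for doc_id, score in scores.items():
    print(doc_id, round(score, 4))
```

Under these assumptions the two values come out at roughly 0.436 and 0.313, in line with the _score values in the result above: the only difference between the two documents is the length term.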

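How can this be handled? Add a sub-field that uses the standard analyzer. The titles index already exists, so delete it first (DELETE titles), recreate it with the mapping below, and index the two documents again.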
PUT titles
{
  "mappings": {
    "properties": {
      "title":{
        "type": "text",
        "analyzer": "english",
        "fields": {"std":{"type":"text","analyzer":"standard"}} // add a sub-field
      }
    }
  }
}

// Use a multi_match query with type most_fields
GET titles/_search
{
  "query": {
    "multi_match": {
      "query": "barking dogs",
      "fields": ["title","title.std"],
      "type": "most_fields"
    }
  }
}

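Result: most_fields adds up the scores of every field that matches. The title.std sub-field uses the standard analyzer, which does not stem, so only document 2 contains barking and dogs there. Document 2 gets the extra score and ranks first, while the score of document 1 is unchanged.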
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 1.4494115,
    "hits" : [
      {
        "_index" : "titles",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 1.4494115,
        "_source" : {
          "title" : "I see a lot of barking dogs on zhe road"
        }
      },
      {
        "_index" : "titles",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 0.43598637,
        "_source" : {
          "title" : "My dog barks"
        }
      }
    ]
  }
}

A field's weight (boost) can be changed with the title^10 syntax:
GET titles/_search
{
  "query": {
    "multi_match": {
      "query": "barking dogs",
      "fields": ["title^10","title.std"],
      "type": "most_fields"
    }
  }
}

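cross_fields fits the case where the query terms are spread over several fields: with operator and, every term of the query string must appear in at least one of the fields, but not all in the same field.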
PUT address/_doc/1
{
  "street":"5 Poland Street",
  "city":"London",
  "country":"United Kingdom",
  "postcode":"W1V 3DG"
}

POST address/_search
{
  "query": {
    "multi_match": {
      "query": "Poland Street W1V",
      "fields": ["street","city","country","postcode"],
      "type": "most_fields",
      "operator": "and"
    }
  }
}

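Result: no hits. With most_fields, operator and is applied per field, and no single field contains all three terms. Changing the type to cross_fields: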
POST address/_search
{
  "query": {
    "multi_match": {
      "query": "Poland Street W1V",
      "fields": ["street","city","country","postcode"],
      "type": "cross_fields",
      "operator": "and"
    }
  }
}
This query returns the document: Poland and Street are found in street, and W1V is found in postcode.
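
The difference between the two queries comes down to where operator and is applied. The sketch below simulates only the matching logic, with no scoring; it uses a plain lowercase whitespace split as a stand-in for the standard analyzer, and the function names are hypothetical.

```python
def tokens(text):
    """Very rough stand-in for the standard analyzer: lowercase and split."""
    return text.lower().split()


def most_fields_and(doc, fields, query):
    """operator and per field: one single field must contain every term."""
    terms = tokens(query)
    return any(all(t in tokens(doc[f]) for t in terms) for f in fields)


def cross_fields_and(doc, fields, query):
    """operator and per term: every term must appear in at least one field."""
    terms = tokens(query)
    return all(any(t in tokens(doc[f]) for f in fields) for t in terms)


address = {
    "street": "5 Poland Street",
    "city": "London",
    "country": "United Kingdom",
    "postcode": "W1V 3DG",
}
fields = ["street", "city", "country", "postcode"]

print(most_fields_and(address, fields, "Poland Street W1V"))   # False
print(cross_fields_and(address, fields, "Poland Street W1V"))  # True
```

In a real cluster cross_fields also blends term statistics across the fields for scoring, which this sketch leaves out.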


 
