ElasticSearch Term-Based Queries


天才小熊猫12138584


Category: ElasticSearch · Tags: ElasticSearch, Term

Original link: https://blog.csdn.net/qq_40990836/article/details/96195494


Why Terms Matter

A term is the smallest unit that carries meaning. Both search and natural language processing with statistical language models work on terms.

Characteristics

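Term-level queries: Term Query / Range Query / Exists Query / Prefix Query / Wildcard Query
In ES, a term query does not analyze its input. It treats the input as a single unit, looks up exactly that term in the inverted index, and uses the relevance scoring formula to score every document that contains the term. For example, Apple Store is looked up as the single term "Apple Store".
With Constant Score, the query can be turned into a filter, which skips scoring and can use the cache to improve performance.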
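All of the term-level queries listed above compare their input against indexed terms as-is. The following is a toy Python model, not Elasticsearch code: a small term dictionary for the desc field (assuming the values were lowercased at index time, as the default analyzer does) with hypothetical `term_query`, `prefix_query`, and `wildcard_query` helpers that never analyze their input.

```python
import fnmatch

# Toy term dictionary for one field: indexed term -> ids of documents containing it.
# Assumption: terms were lowercased at index time (default analyzer behaviour).
TERMS = {
    "ipad": [2],
    "iphone": [1],
    "mbp": [3],
}

def term_query(value):
    # Exact lookup; the input is not lowercased or tokenized.
    return TERMS.get(value, [])

def prefix_query(prefix):
    # All documents holding a term that starts with the raw prefix.
    return sorted({d for t, ids in TERMS.items() if t.startswith(prefix) for d in ids})

def wildcard_query(pattern):
    # '*' and '?' behave like ES wildcards; fnmatchcase also accepts [...] sets,
    # which ES does not, so this sketch only uses '*' and '?'.
    return sorted({d for t, ids in TERMS.items() if fnmatch.fnmatchcase(t, pattern) for d in ids})

print(term_query("iphone"))    # [1]
print(term_query("iPhone"))    # []
print(prefix_query("ip"))      # [1, 2]
print(wildcard_query("i?a*"))  # [2]
```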

Term Query Examples

First, add some data:

// Create an index
PUT products
{
  "settings": {
    "number_of_shards": 1
  }
}
// Index the sample documents
POST products/_bulk
{"index":{"_id":1}}
{"productID" : "XHDK-A-1293-#fJ3","desc" : "iPhone"}
{"index":{"_id":2}}
{"productID" : "KDKE-B-9947-#kL5","desc" : "iPad"}
{"index":{"_id":3}}
{"productID" : "JODL-X-1937-#pV7","desc" : "MBP"}
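The `_bulk` body above is newline-delimited JSON: an action line followed by the document source on the next line, repeated for each document, with the whole body ending in a newline. A minimal Python sketch that builds the same body with the standard library (`bulk_body` is a hypothetical helper; sending the request to a cluster is left out):

```python
import json

docs = {
    1: {"productID": "XHDK-A-1293-#fJ3", "desc": "iPhone"},
    2: {"productID": "KDKE-B-9947-#kL5", "desc": "iPad"},
    3: {"productID": "JODL-X-1937-#pV7", "desc": "MBP"},
}

def bulk_body(docs):
    lines = []
    for doc_id, source in docs.items():
        # Action/metadata line, then the document source on its own line.
        lines.append(json.dumps({"index": {"_id": doc_id}}))
        lines.append(json.dumps(source))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"

body = bulk_body(docs)
print(body)
```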

First, search for iPhone:

POST products/_search
{
  "query": {
    "term": {
      "desc": {
        "value": "iPhone"
      }
    }
  }
}
// Response
{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 0,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  }
}

Then search for iphone:

POST products/_search
{
  "query": {
    "term": {
      "desc": {
        "value": "iphone"
      }
    }
  }
}
// Response
{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 0.9808292,
    "hits" : [
      {
        "_index" : "products",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 0.9808292,
        "_source" : {
          "productID" : "XHDK-A-1293-#fJ3",
          "desc" : "iPhone"
        }
      }
    ]
  }
}

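Searching for iPhone returns no hits; only searching for iphone returns the document.
As noted above, a term query does not process the search input at all, so whether you enter iPhone or iphone, ES looks it up exactly as typed. When a document is indexed, however, ES analyzes its text fields by default, for example splitting the text into tokens and lowercasing them (some analyzers also remove stop words).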

Let's look at how iPhone is tokenized by default:

GET products/_analyze
{
  "analyzer": "standard",
  "text": ["iPhone"]
}
// Response
{
  "tokens" : [
    {
      "token" : "iphone",
      "start_offset" : 0,
      "end_offset" : 6,
      "type" : "<ALPHANUM>",
      "position" : 0
    }
  ]
}

As you can see, iPhone was indexed as iphone.
Because a term query does not process the search term, a search for iPhone cannot find that document.
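The mismatch between index-time analysis and a verbatim term lookup can be modeled in a few lines of Python. This is a toy sketch, not Elasticsearch code: `standard_like` is a hypothetical, rough stand-in for the standard analyzer that only handles ASCII letters and digits.

```python
import re

def standard_like(text):
    # Rough stand-in for the standard analyzer on ASCII text:
    # split on non-alphanumerics, then lowercase each token.
    return [t.lower() for t in re.findall(r"[A-Za-z0-9]+", text)]

# Index time: every desc value is analyzed before it enters the inverted index.
index = {}
for doc_id, desc in {1: "iPhone", 2: "iPad", 3: "MBP"}.items():
    for token in standard_like(desc):
        index.setdefault(token, set()).add(doc_id)

def term_query(value):
    # Search time: no analysis, the value is looked up exactly as typed.
    return sorted(index.get(value, set()))

print(term_query("iPhone"))  # []
print(term_query("iphone"))  # [1]
```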

Now search on an ID-like field such as productID:

POST /products/_search
{
  "query": {
    "term": {
      "productID": {
        // Also tried "value": "XHDK-A-1293-#fJ3"
        "value": "xhdk-a-1293-#fj3"
      }
    }
  }
}
// Whichever form of the ID we search for, nothing is returned
{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 0,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  }
}
// Let's see what ES turned this value into

GET products/_analyze
{
  "analyzer": "standard",
  "text": ["XHDK-A-1293-#fJ3"]
}
// Tokens
{
  "tokens" : [
    {
      "token" : "xhdk",
      "start_offset" : 0,
      "end_offset" : 4,
      "type" : "<ALPHANUM>",
      "position" : 0
    },
    {
      "token" : "a",
      "start_offset" : 5,
      "end_offset" : 6,
      "type" : "<ALPHANUM>",
      "position" : 1
    },
    {
      "token" : "1293",
      "start_offset" : 7,
      "end_offset" : 11,
      "type" : "<NUM>",
      "position" : 2
    },
    {
      "token" : "fj3",
      "start_offset" : 13,
      "end_offset" : 16,
      "type" : "<ALPHANUM>",
      "position" : 3
    }
  ]
}
// Now search for xhdk
POST /products/_search
{
  "query": {
    "term": {
      "productID": {
        "value": "xhdk"
      }
    }
  }
}
// This time the document is found
{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 0.6931472,
    "hits" : [
      {
        "_index" : "products",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 0.6931472,
        "_source" : {
          "productID" : "XHDK-A-1293-#fJ3",
          "desc" : "iPhone"
        }
      }
    ]
  }
}
// For an exact match on the whole ID, search its keyword subfield instead
POST /products/_search
{
  "query": {
    "term": {
      "productID.keyword": {
        "value": "XHDK-A-1293-#fJ3"
      }
    }
  }
}
// Response
{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 0.6931472,
    "hits" : [
      {
        "_index" : "products",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 0.6931472,
        "_source" : {
          "productID" : "XHDK-A-1293-#fJ3",
          "desc" : "iPhone"
        }
      }
    ]
  }
}
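To see why `xhdk` matches productID while only the exact original value matches productID.keyword, here is a rough Python approximation of the standard analyzer. It handles ASCII letters and digits only (the real tokenizer follows Unicode word-boundary rules), but it reproduces the tokens, offsets, and types from the `_analyze` output above. The names `analyze`, `text_terms`, and `keyword_field_terms` are illustrative.

```python
import re

def analyze(text):
    # Approximation of the standard analyzer for ASCII input only.
    tokens = []
    for position, m in enumerate(re.finditer(r"[A-Za-z0-9]+", text)):
        token = m.group().lower()
        tokens.append({
            "token": token,
            "start_offset": m.start(),
            "end_offset": m.end(),
            "type": "<NUM>" if token.isdigit() else "<ALPHANUM>",
            "position": position,
        })
    return tokens

product_id = "XHDK-A-1293-#fJ3"
text_terms = {t["token"] for t in analyze(product_id)}  # terms in productID (text)
keyword_field_terms = {product_id}                       # term in productID.keyword

for query in ["XHDK-A-1293-#fJ3", "xhdk-a-1293-#fj3", "xhdk"]:
    # A term query matches only if the raw query string is an indexed term.
    print(query, "text:", query in text_terms, "keyword:", query in keyword_field_terms)
```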

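A term query does not analyze its input, so for an exact match on the whole value, query the `keyword` field.

When ES maps a string field dynamically, it creates a `text` field with a `keyword` subfield by default (here, productID.keyword).
A term query still computes a relevance score for every hit.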
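A minimal sketch of that default, assuming the usual dynamic mapping for JSON strings: a `text` field plus a `keyword` subfield with `ignore_above: 256` (you can check the real mapping with `GET products/_mapping`). The helper names are hypothetical; the functions only model which mapping is produced and which term the keyword subfield would hold.

```python
def dynamic_string_mapping():
    # Default dynamic mapping for a JSON string: text + "keyword" subfield.
    return {
        "type": "text",
        "fields": {"keyword": {"type": "keyword", "ignore_above": 256}},
    }

def keyword_subfield_terms(value, ignore_above=256):
    # A keyword field indexes the whole value unchanged (no lowercasing,
    # no splitting); values longer than ignore_above are not indexed.
    return {value} if len(value) <= ignore_above else set()

print(dynamic_string_mapping())
print(keyword_subfield_terms("XHDK-A-1293-#fJ3"))  # {'XHDK-A-1293-#fJ3'}
```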

Compound Query: Turning a Query into a Filter with Constant Score

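Converting the query into a filter skips the relevance calculation (TF-IDF, or BM25, the default since ES 5.0) and avoids the cost of scoring
Filter results can be cached, which makes repeated filters cheaper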
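The difference between scoring and filtering can be sketched with a toy Python model, not how Lucene works internally: a hypothetical exact-value index for productID.keyword, a filter whose match set is memoized with `functools.lru_cache` to stand in for filter caching, and a `constant_score` helper that gives every hit the same score (the `boost`, 1.0 by default). ES's real node query cache decides for itself which filters are worth caching, so treat the caching here as an illustration only.

```python
from functools import lru_cache

# Toy exact-value index for productID.keyword: value -> doc ids.
KEYWORD_INDEX = {
    "XHDK-A-1293-#fJ3": (1,),
    "KDKE-B-9947-#kL5": (2,),
    "JODL-X-1937-#pV7": (3,),
}

@lru_cache(maxsize=128)
def term_filter(value):
    # Filter context: only "does it match?" matters, so the result for a
    # given value can be memoized and reused by later searches.
    return frozenset(KEYWORD_INDEX.get(value, ()))

def constant_score(value, boost=1.0):
    # No relevance calculation: every matching document gets the same score.
    return {doc_id: boost for doc_id in sorted(term_filter(value))}

print(constant_score("XHDK-A-1293-#fJ3"))  # {1: 1.0}
```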

Example

// Skip scoring to save system resources
POST /products/_search
{
  "query": {
    "constant_score": {
      "filter": {
        "term": {
          "productID.keyword": "XHDK-A-1293-#fJ3"
        }
      }
    }
  }
}
// Response (every hit gets the constant score 1.0)
{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "products",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 1.0,
        "_source" : {
          "productID" : "XHDK-A-1293-#fJ3",
          "desc" : "iPhone"
        }
      }
    ]
  }
}

Full-Text Queries

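Full-text queries
1. Match Query / Match Phrase Query / Query String Query
Characteristics
1. Text is analyzed both at index time and at search time: the query string is first passed through a suitable analyzer, which produces the list of terms to search for.
2. At query time, the input is analyzed first, then a low-level query is run for each term, and finally the results are merged and a score is computed for each document. For example, a search for Matrix reloaded finds every document that contains Matrix or reloaded.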
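The analyze, look up each term, merge, and score steps can be sketched in Python. This is a toy model under stated assumptions: the analyzer is a rough ASCII approximation of the standard analyzer, the titles are hypothetical (the article does not show the movies data), and the score is simply the number of matched query terms rather than BM25. The `operator` argument mirrors the `and`/`or` option in the Match Query example.

```python
import re

def analyze(text):
    # Rough stand-in for the standard analyzer on ASCII text.
    return [t.lower() for t in re.findall(r"[A-Za-z0-9]+", text)]

# Hypothetical titles standing in for the movies index.
TITLES = {
    1: "The Matrix",
    2: "The Matrix Reloaded",
    3: "Reloaded",
    4: "Toy Story",
}

def match_query(query, operator="or"):
    terms = set(analyze(query))            # 1. analyze the query input
    hits = {}
    for doc_id, title in TITLES.items():
        matched = terms & set(analyze(title))  # 2. look up each term
        if operator == "and" and matched != terms:
            continue                       # "and": every term must be present
        if matched:
            # 3. merge and score; toy score = number of matched terms
            hits[doc_id] = len(matched)
    return hits

print(match_query("Matrix reloaded"))         # {1: 1, 2: 2, 3: 1}
print(match_query("Matrix reloaded", "and"))  # {2: 2}
```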

// Examples only; these queries were covered in an earlier article
// Match Query 
POST /movies/_search
{
  "query": {
    "match": {
      "title": {
        "query": "Matrix reloaded",
        "operator": "and"  // "and": both Matrix and reloaded must be present; with "or" (the default), either term is enough
      }
    }
  }
}
// Match Phrase Query
POST movies/_search
{
  "query": {
    "match_phrase": {
      "title": {
        "query": "Matrix reloaded",
        "slop": 5
      }
    }
  }
}
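`slop` sets how far the phrase terms may move from their exact positions and still match. Below is a simplified Python model of that rule, assuming the same rough ASCII analyzer as before and a phrase without repeated terms: shift each term's position by its offset in the phrase, then accept the match if the spread of the shifted positions is at most `slop`. Under this rule, two adjacent terms in reverse order need a slop of 2. It approximates Lucene's sloppy phrase matching; it is not its implementation, and `phrase_matches` is a hypothetical helper.

```python
import re
from itertools import product

def analyze(text):
    # Rough stand-in for the standard analyzer on ASCII text.
    return [t.lower() for t in re.findall(r"[A-Za-z0-9]+", text)]

def phrase_matches(text, phrase, slop=0):
    tokens = analyze(text)
    terms = analyze(phrase)
    # Every position at which each phrase term occurs in the text.
    positions = [[i for i, tok in enumerate(tokens) if tok == term] for term in terms]
    if not terms or not all(positions):
        return False
    for combo in product(*positions):
        if len(set(combo)) != len(combo):
            continue  # one token cannot stand in for two phrase terms
        # An exact phrase gives identical shifted positions; the spread of the
        # shifted positions is the slop this combination needs.
        shifted = [pos - offset for offset, pos in enumerate(combo)]
        if max(shifted) - min(shifted) <= slop:
            return True
    return False

print(phrase_matches("The Matrix Reloaded", "Matrix reloaded"))         # True
print(phrase_matches("Matrix one two reloaded", "Matrix reloaded", 1))  # False
print(phrase_matches("Matrix one two reloaded", "Matrix reloaded", 5))  # True
print(phrase_matches("Reloaded Matrix", "Matrix reloaded", 1))          # False
print(phrase_matches("Reloaded Matrix", "Matrix reloaded", 2))          # True
```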