Elasticsearch term queries for exact matching: no results, or inaccurate results

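This post looks at content indexed with the ik analyzer and how to pick a suitable query type (term, match or wildcard) for different query needs, so that queries stay efficient and results stay accurate.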


Target field

 "content": {
     "type": "text",
     "analyzer":"ik_max_word", # analyze the content with ik
     "fielddata": true # enabled for term-frequency statistics
 }

Target content

content:"那我估计他应该喜欢西红柿"

Query

A term query for "估计" returns the document

"term":{"content":"估计"}

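A term query for "估计他" returns nothing. The reason is that term does not analyze the query string; it looks the string up as-is among the tokens that were indexed for the content field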

"term":{"content":"估计他"}

The tokens currently indexed for content are:

{
    "tokens": [
        {
            "token": "那我",
            "start_offset": 0,
            "end_offset": 2,
            "type": "CN_WORD",
            "position": 0
        },
        {
            "token": "估计",
            "start_offset": 2,
            "end_offset": 4,
            "type": "CN_WORD",
            "position": 1
        },
        {
            "token": "他",
            "start_offset": 4,
            "end_offset": 5,
            "type": "CN_CHAR",
            "position": 2
        },
        {
            "token": "应该",
            "start_offset": 5,
            "end_offset": 7,
            "type": "CN_WORD",
            "position": 3
        },
        {
            "token": "喜欢",
            "start_offset": 7,
            "end_offset": 9,
            "type": "CN_WORD",
            "position": 4
        },
        {
            "token": "西红柿",
            "start_offset": 9,
            "end_offset": 12,
            "type": "CN_WORD",
            "position": 5
        }
    ]
}

The index contains no token "估计他", so nothing is found.
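To make the lookup behaviour concrete, here is a minimal sketch in plain Python. It does not call Elasticsearch; it only imitates the rule that a term query compares the unanalyzed query string against the indexed tokens. The names `INDEXED_TOKENS` and `term_matches` are made up for this illustration.

```python
# Tokens that ik_max_word produced for the sample sentence
# (copied from the analyzer output shown in this post).
INDEXED_TOKENS = ["那我", "估计", "他", "应该", "喜欢", "西红柿"]


def term_matches(indexed_tokens, query_term):
    """Imitate a term query: the query string is not analyzed,
    so it must be equal to one of the indexed tokens."""
    return query_term in set(indexed_tokens)


print(term_matches(INDEXED_TOKENS, "估计"))    # "估计" is an indexed token
print(term_matches(INDEXED_TOKENS, "估计他"))  # "估计他" is not an indexed token
```

Although "估计" and "他" are adjacent in the original sentence, they are stored as two separate tokens, so the concatenation "估计他" never equals any single token.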
Next I tried adding a keyword sub-field to content, hoping that would get me there:

 "content": {
     "type": "text",
     "analyzer":"ik_max_word",
     "fielddata": true,
     "fields": {
         "keyword": {
             "type": "keyword"
         }
     }
 },

Querying again

A term query for "估计他" on the sub-field still returns nothing

"term":{"content.keyword":"估计他"}

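It turned out that only a query containing the entire content returns a result, because a keyword field indexes the whole value as a single term. That does not meet my requirement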

"term":{"content.keyword":"那我估计他应该喜欢西红柿"}

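Trying match

A match query analyzes the query string first, so "估计他" becomes the tokens "估计" and "他", and with the default OR operator any document containing either token is a hit. The results are therefore not limited to content that actually contains "估计他", so this still does not meet the requirement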

"match":{"content":"估计他"}
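The over-matching can be imitated in a few lines of plain Python. This is only a sketch of the token-level logic, not a call to Elasticsearch: `match_hits` is a made-up helper, and `OTHER_DOC_TOKENS` is a hypothetical token list for some other document, not real analyzer output.

```python
def match_hits(doc_tokens, query_tokens, operator="or"):
    """Imitate a match query at token level: the query has already been
    analyzed into query_tokens, and the document hits if any of them
    (operator "or") or all of them (operator "and") are indexed."""
    indexed = set(doc_tokens)
    found = [token in indexed for token in query_tokens]
    return all(found) if operator == "and" else any(found)


# Tokens of the sample sentence, copied from the analyzer output in this post.
SAMPLE_DOC_TOKENS = ["那我", "估计", "他", "应该", "喜欢", "西红柿"]
# Hypothetical tokens of another document that does not contain "估计他".
OTHER_DOC_TOKENS = ["他", "喜欢", "西红柿"]
# "估计他" analyzed with ik_max_word, as shown later in this post.
QUERY_TOKENS = ["估计", "他"]

print(match_hits(SAMPLE_DOC_TOKENS, QUERY_TOKENS))        # hit, as wanted
print(match_hits(OTHER_DOC_TOKENS, QUERY_TOKENS))         # also a hit, not wanted
print(match_hits(OTHER_DOC_TOKENS, QUERY_TOKENS, "and"))  # no hit with "and"
```

Switching the operator to "and" removes hits like the second one, but a document holding both tokens far apart would still match, so it is not a substring test either.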

Trying a wildcard match against the whole sentence stored in the keyword sub-field

 "wildcard" : { "content.keyword" : "*估计他*" }

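The result of this query is correct.
However, if every keyword were queried this way, the efficiency of the inverted index would be lost,
so the approach I settled on is:
Analyze the keyword before querying. If the tokens include one that equals the whole keyword, use term; otherwise use wildcard.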
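The decision rule can be sketched as a small Python function that builds the query body as a dict. This is an illustration under assumptions: `choose_query` and `escape_wildcard` are made-up names, the tokens are passed in by the caller (for example taken from the analyzer output), and the field names default to the `content` mapping used in this post. The escaping step is my addition: wildcard patterns treat `*` and `?` as special, so they are backslash-escaped when they occur inside the keyword itself.

```python
def escape_wildcard(text):
    """Backslash-escape characters that are special in a wildcard pattern."""
    escaped = []
    for ch in text:
        if ch in "*?\\":
            escaped.append("\\")
        escaped.append(ch)
    return "".join(escaped)


def choose_query(keyword, tokens,
                 text_field="content", keyword_field="content.keyword"):
    """Return a term query if the analyzer produced a token equal to the
    whole keyword; otherwise fall back to a wildcard on the keyword sub-field."""
    if keyword in tokens:
        return {"term": {text_field: keyword}}
    pattern = "*" + escape_wildcard(keyword) + "*"
    return {"wildcard": {keyword_field: pattern}}


# Token lists copied from the analyzer outputs shown in this post.
print(choose_query("西便门", ["西便门", "便门"]))
print(choose_query("估计他", ["估计", "他"]))
```

With this split, keywords that are dictionary words go through the inverted index, and only the remaining ones pay the cost of a wildcard scan.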

Analyzing "西便门" gives the tokens below. One of them equals the whole keyword, so use a term query: term:"西便门"
{
    "tokens": [
        {
            "token": "西便门",
            "start_offset": 0,
            "end_offset": 3,
            "type": "CN_WORD",
            "position": 0
        },
        {
            "token": "便门",
            "start_offset": 1,
            "end_offset": 3,
            "type": "CN_WORD",
            "position": 1
        }
    ]
}
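Analyzing "估计他" gives the tokens listed at the end of this section. None of them equals the whole keyword "估计他",
so use a wildcard query against the keyword sub-field of content:
	"wildcard" : { "content.keyword" : "*估计他*" }
	A regexp query also works: "regexp" : { "content.keyword" : ".*估计他.*" }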
{
    "tokens": [
        {
            "token": "估计",
            "start_offset": 0,
            "end_offset": 2,
            "type": "CN_WORD",
            "position": 0
        },
        {
            "token": "他",
            "start_offset": 2,
            "end_offset": 3,
            "type": "CN_CHAR",
            "position": 1
        }
    ]
}
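The token lists in this post have the shape returned by the Elasticsearch `_analyze` API. Below is a minimal sketch of the check "does any token equal the whole keyword", working directly on such a response. It assumes the response has already been fetched and parsed; `analyze_request_body`, `tokens_from_response` and `has_whole_keyword` are made-up helper names, and no HTTP call is made here.

```python
import json

# The analyzer output for "估计他", copied from this post.
ANALYZE_RESPONSE = json.loads("""
{
    "tokens": [
        {"token": "估计", "start_offset": 0, "end_offset": 2,
         "type": "CN_WORD", "position": 0},
        {"token": "他", "start_offset": 2, "end_offset": 3,
         "type": "CN_CHAR", "position": 1}
    ]
}
""")


def analyze_request_body(text, analyzer="ik_max_word"):
    """Body to send to the _analyze endpoint for the given keyword."""
    return {"analyzer": analyzer, "text": text}


def tokens_from_response(response):
    """Extract the token strings from a parsed _analyze response."""
    return [entry["token"] for entry in response.get("tokens", [])]


def has_whole_keyword(keyword, response):
    """True if the analyzer emitted a token equal to the whole keyword."""
    return keyword in tokens_from_response(response)


print(analyze_request_body("估计他"))
print(tokens_from_response(ANALYZE_RESPONSE))
print(has_whole_keyword("估计他", ANALYZE_RESPONSE))
```

A `False` result here is the signal to fall back to the wildcard (or regexp) query on `content.keyword`.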