Yesterday, while testing the match query, I typed in a snippet of text:
twenty British women made history
It returned no results. Something was clearly wrong: this text was copied verbatim from a document already in the ES index, so why wasn't it matching?
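For reference, the failing query looked roughly like this (the index and field names here are illustrative, not from the actual setup):

GET /my_index/_search
{
  "query": {
    "match": {
      "content": "twenty British women made history"
    }
  }
}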
I then noticed the field's analyzer was ik_smart. Could it be that the Chinese analyzer handles English poorly? Sure enough, after switching to the standard analyzer the document was found.
Running the text through _analyze revealed the root cause:
GET /_analyze
{
"analyzer": "ik_smart",
"text": "In 1997, a group of twenty British women made history. Working "
}
The response:
{
"tokens": [
{
"token": "1997",
"start_offset": 3,
"end_offset": 7,
"type": "LETTER",
"position": 0
},
{
"token": "group",
"start_offset": 11,
"end_offset": 16,
"type": "ENGLISH",
"position": 1
},
{
"token": "twenty",
"start_offset": 20,
"end_offset": 26,
"type": "ENGLISH",
"position": 2
},
{
"token": "british",
"start_offset": 27,
"end_offset": 34,
"type": "ENGLISH",
"position": 3
},
{
"token": "women",
"start_offset": 35,
"end_offset": 40,
"type": "ENGLISH",
"position": 4
},
{
"token": "made",
"start_offset": 41,
"end_offset": 45,
"type": "ENGLISH",
"position": 5
},
{
"token": "history.",
"start_offset": 46,
"end_offset": 54,
"type": "LETTER",
"position": 6
},
{
"token": "working",
"start_offset": 55,
"end_offset": 62,
"type": "ENGLISH",
"position": 7
}
]
}
So that was it: ik_smart keeps the English period . attached to the preceding word, producing the single token history. instead of history, which is why the match query found nothing.
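As a sanity check, running the same text through the standard analyzer shows the difference: it strips the trailing period and emits history as a token on its own.

GET /_analyze
{
  "analyzer": "standard",
  "text": "In 1997, a group of twenty British women made history. Working "
}

If the index mostly holds Chinese text, one workaround (a sketch only; the index and field names are illustrative) is to keep ik_smart as the main analyzer and add a standard-analyzed sub-field for English matching:

PUT /my_index
{
  "mappings": {
    "properties": {
      "content": {
        "type": "text",
        "analyzer": "ik_smart",
        "fields": {
          "en": {
            "type": "text",
            "analyzer": "standard"
          }
        }
      }
    }
  }
}

A match query against content.en would then find history regardless of the trailing period.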