When designing the mapping for a large text field, add the term_vector parameter, like so:
"description": {
"similarity": "customize_bm25",
"type": "text",
"store": true,
"analyzer": "my_jieba_index_analyzer",
"search_analyzer": "my_jieba_search_analyzer",
"term_vector" : "with_positions_offsets"
}
With this parameter configured, highlighting is noticeably faster: the highlighter can reuse the stored term vectors instead of re-analyzing the field at query time.
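As a sketch of how to exercise this (the index name my_index and the query text are placeholders, not from the original post), a search that highlights the field might look like the following. With "term_vector": "with_positions_offsets" in the mapping, Elasticsearch selects the fast vector highlighter for the field automatically; "type": "fvh" just makes that explicit:

```json
POST /my_index/_search
{
  "query": {
    "match": { "description": "白色衬衫" }
  },
  "highlight": {
    "fields": {
      "description": { "type": "fvh" }
    }
  }
}
```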
However, certain query terms can still trigger an error. The error is caused by Lucene choking on whitespace tokens in the field.
Solution: filter out the whitespace terms with a token filter.
But when adding the whitespace filter, a problem surfaced: with the jieba tokenizer, even the following stop filter fails to remove the whitespace terms:
"my_stop_filter": {
"ignore_case": "true",
"type": "stop",
"stopwords": [
" ",
"的",
"得",
"地"
]
},
The ik tokenizer, on the other hand, works with this filter, so I switched to ik. Two analyzers are defined, as follows:
"my_ik_index_analyzer": {
"filter": [
"my_stop_filter"
],
"type": "custom",
"tokenizer": "ik_max_word"
},
"my_ik_search_analyzer": {
"filter": [
"my_stop_filter"
],
"type": "custom",
"tokenizer": "ik_smart"
}
The mapping for the large field is then defined as follows:
"description": {
"similarity": "customize_bm25",
"type": "text",
"store": true,
"analyzer": "my_ik_index_analyzer",
"search_analyzer": "my_ik_search_analyzer",
"term_vector" : "with_positions_offsets"
}
With this change, the error above disappears.
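Putting the pieces together, a full index-creation request might look like the sketch below. The index name my_index is a placeholder, and the customize_bm25 similarity is referenced but not defined in the original post, so a hypothetical BM25 tuning is shown under settings.index.similarity as an assumption; it must exist under that name or the mapping will be rejected:

```json
PUT /my_index
{
  "settings": {
    "index": {
      "similarity": {
        "customize_bm25": {
          "type": "BM25",
          "b": 0.75,
          "k1": 1.2
        }
      }
    },
    "analysis": {
      "filter": {
        "my_stop_filter": {
          "ignore_case": "true",
          "type": "stop",
          "stopwords": [" ", "的", "得", "地"]
        }
      },
      "analyzer": {
        "my_ik_index_analyzer": {
          "type": "custom",
          "tokenizer": "ik_max_word",
          "filter": ["my_stop_filter"]
        },
        "my_ik_search_analyzer": {
          "type": "custom",
          "tokenizer": "ik_smart",
          "filter": ["my_stop_filter"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "description": {
        "similarity": "customize_bm25",
        "type": "text",
        "store": true,
        "analyzer": "my_ik_index_analyzer",
        "search_analyzer": "my_ik_search_analyzer",
        "term_vector": "with_positions_offsets"
      }
    }
  }
}
```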
Done.