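Prerequisites
Search term: “申报” (to declare or file, as in filing a tax return)
Conclusions:
- Order of processing: synonyms first, then tokenization.
- The “,” style of remote synonyms is recommended. Where a special scenario calls for the “=>” style of remote synonyms, use it with great care.
- With “=>” style synonyms, avoid putting the same word on the left-hand side of several rules that carry different meanings, for example:
  申报 => 申报增值税,申报
  申报 => 申报纳税,申报
- The same applies to “,” style synonyms: avoid listing the same word in several groups where possible.
- Filters defined under analysis may be cached. If a change does not seem to take effect, try defining a new filter or restarting.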
1. Index
PUT search_demo
{
  "aliases": {},
  "mappings": {
    "properties": {
      "article_no": {
        "type": "text",
        "analyzer": "ik_max_custom",
        "search_analyzer": "ik_smart_synonym"
      }
    }
  },
  "settings": {
    "index": {
      "number_of_shards": "1",
      "max_result_window": "500000",
      "analysis": {
        "filter": {
          "limit_len": {
            "type": "length",
            "min": "1"
          },
          "synonym_filter": {
            "type": "dynamic_synonym",
            "synonyms_path": "http://localhost/keyword/synonym",
            "interval": "30"
          },
          "synonym_comma_filter": {
            "type": "dynamic_synonym",
            "synonyms_path": "http://localhost/keyword/synonymComma",
            "interval": "10"
          },
          "my_synonyms": {
            "type": "synonym",
            "synonyms": [
              "申报,申报增值税,申报纳税,报税"
            ]
          },
          "my_syno": {
            "type": "synonym",
            "synonyms": [
              "报税 => 申报,报税",
              "申报 => 申报增值税,申报纳税,申报"
            ]
          }
        },
        "analyzer": {
          "default_pattern_analyzer": {
            "type": "pattern"
          },
          "ik_smart_syno": {
            "filter": [
              "limit_len",
              "my_synonyms"
            ],
            "char_filter": [],
            "tokenizer": "ik_smart"
          },
          "ik_smart_my_syno": {
            "filter": [
              "limit_len",
              "my_syno"
            ],
            "char_filter": [],
            "tokenizer": "ik_smart"
          },
          "ik_smart_my_comma": {
            "filter": [
              "limit_len",
              "synonym_comma_filter"
            ],
            "char_filter": [],
            "tokenizer": "ik_smart"
          },
          "ik_smart_synonym": {
            "filter": [
              "limit_len",
              "synonym_filter"
            ],
            "char_filter": [],
            "tokenizer": "ik_smart"
          },
          "ik_max_custom": {
            "filter": [
              "limit_len",
              "synonym_filter"
            ],
            "char_filter": [
              "html_strip"
            ],
            "tokenizer": "ik_max_word"
          }
        }
      },
      "number_of_replicas": "1"
    }
  }
}
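To see how the mapping above behaves at query time, the following requests are a minimal sketch. They assume the IK and dynamic_synonym plugins are installed and that the synonyms_path URLs are reachable. The document id and the sample field value are illustrative assumptions, not part of the original setup.

```console
# Index one sample document (id and value are illustrative assumptions)
PUT search_demo/_doc/1?refresh=true
{
  "article_no": "申报增值税"
}

# Show how the match query is rewritten once the field's
# search_analyzer (ik_smart_synonym) has processed the search text
GET search_demo/_validate/query?explain=true
{
  "query": {
    "match": { "article_no": "申报" }
  }
}

# Run the actual search with the same query
GET search_demo/_search
{
  "query": {
    "match": { "article_no": "申报" }
  }
}
```

The explanation returned by the validate request is a convenient place to check which synonym terms ended up in the query before looking at the hits.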
2. IK tokenizer dictionary
3. “=>” style synonyms
申报 => 申报增值税,申报纳税,申报
Remote synonyms (dynamic_synonym)
Local synonyms
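The remote and local cases can be compared side by side with the Analyze API. This is a sketch that assumes the remote endpoint behind synonym_filter serves “=>” rules. The analyzer names come from the index definition in section 1. No expected output is shown because the tokens depend on the IK dictionary and on the remote synonym list.

```console
# Remote "=>" rules, loaded by dynamic_synonym from synonyms_path
GET search_demo/_analyze
{
  "analyzer": "ik_smart_synonym",
  "text": "申报"
}

# Local "=>" rules, defined inline in the my_syno filter
GET search_demo/_analyze
{
  "analyzer": "ik_smart_my_syno",
  "text": "申报"
}
```

Comparing the token, position, and type fields of the two responses shows whether the remote list produces the same expansion as the local rule.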
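Summary: the local synonym list is small and simple, whereas the remote synonym list contains many entries and is more complex, so unexpected results can occur. In addition, terms produced by “=>” style rules may be tokenized incorrectly, so use this style with caution. The “,” style is recommended. Whichever style is used, test it rigorously against the actual use case.
Note: filters defined under analysis may be cached. If a change does not seem to take effect, try defining a new filter or restarting.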
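One way to define a new filter on the existing index is sketched below. Analysis settings are static, so the index has to be closed before they can be changed. The names my_syno_v2 and ik_smart_my_syno_v2 are hypothetical and chosen only for illustration.

```console
# Analysis settings can only be updated while the index is closed
POST search_demo/_close

# Add a new filter and an analyzer that uses it (both names are hypothetical)
PUT search_demo/_settings
{
  "analysis": {
    "filter": {
      "my_syno_v2": {
        "type": "synonym",
        "synonyms": [
          "申报 => 申报增值税,申报纳税,申报"
        ]
      }
    },
    "analyzer": {
      "ik_smart_my_syno_v2": {
        "tokenizer": "ik_smart",
        "filter": [
          "limit_len",
          "my_syno_v2"
        ]
      }
    }
  }
}

# Reopen the index so the new analyzer becomes available
POST search_demo/_open

# Check the new analyzer
GET search_demo/_analyze
{
  "analyzer": "ik_smart_my_syno_v2",
  "text": "申报"
}
```

The index is unavailable for reads and writes while it is closed, so this is better suited to a test environment than to a live index.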
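4. “,” style synonyms
申报,申报增值税
Remote synonyms (dynamic_synonym)
The results fully match expectations, in sharp contrast to the “=>” style.
Note: filters defined under analysis may be cached. If a change does not seem to take effect, try defining a new filter or restarting.
Local dictionary
The results are consistent with the remote case.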
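The same check can be repeated for the “,” style. This sketch assumes the remote endpoint behind synonym_comma_filter serves “,” rules. The analyzer names are those from section 1. The third request relies on the fact that every term in a “,” group is treated as equivalent to the others, so a different member of the local group can be used as input.

```console
# Remote "," rules, loaded by dynamic_synonym from synonyms_path
GET search_demo/_analyze
{
  "analyzer": "ik_smart_my_comma",
  "text": "申报"
}

# Local "," rules, defined inline in the my_synonyms filter
GET search_demo/_analyze
{
  "analyzer": "ik_smart_syno",
  "text": "申报"
}

# Another member of the local group, to check that the expansion works in both directions
GET search_demo/_analyze
{
  "analyzer": "ik_smart_syno",
  "text": "报税"
}
```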