Official documentation:
Pattern Replace Char Filter | Elasticsearch Guide [6.4] | Elastic
Scenario:
The ES cluster has an existing index, my_test_v1.0, with a field testContent whose values mix Chinese, English, and punctuation, for example: cooperation.测试中国,<a,。
Approaches:
1. Phrase match query, QueryBuilders.matchPhraseQuery();
2. Wildcard (fuzzy) query;
3. A custom char filter.
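For comparison, the first two options correspond roughly to the request bodies sketched below. This is a minimal illustration, not taken from the original setup: the helper names match_phrase_body and wildcard_body are hypothetical, and the field name is the one from the scenario.

```python
import json

FIELD = "testContent"  # field name from the scenario above


def match_phrase_body(text):
    # Option 1: phrase match. The field's search analyzer tokenizes `text`,
    # and the resulting tokens must appear in the same order in the document.
    return {"query": {"match_phrase": {FIELD: text}}}


def wildcard_body(fragment):
    # Option 2: wildcard. This is a term-level query, so `fragment` is NOT
    # analyzed and is compared against the indexed tokens as-is.
    return {"query": {"wildcard": {FIELD: "*" + fragment + "*"}}}


print(json.dumps(match_phrase_body("测试中国"), ensure_ascii=False))
print(json.dumps(wildcard_body("测试"), ensure_ascii=False))
```

Neither option removes punctuation from the indexed text, which is the gap the third option addresses.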
This post covers the third option:
1. Define the custom char filter and analyzer
curl -X PUT "192.168.11.10:9201/my_test_v2.0?pretty" -H 'Content-Type: application/json' -d'
{
  "settings": {
    "analysis": {
      "analyzer": {
        "punctuation_analyzer": {
          "type": "custom",
          "tokenizer": "keyword",
          "char_filter": [
            "punctuation_filter"
          ]
        }
      },
      "char_filter": {
        "punctuation_filter": {
          "type": "pattern_replace",
          "pattern": "[\\p{Punct}\\pP]",
          "replacement": ""
        }
      }
    }
  }
}
'
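To see what this analyzer does to the sample value, here is a rough Python approximation. It is only a sketch: Elasticsearch evaluates the pattern with Java regular expressions, whereas this version assumes that string.punctuation stands in for \p{Punct} and that Unicode categories starting with "P" stand in for \pP. The function name strip_punctuation is hypothetical.

```python
import string
import unicodedata


def strip_punctuation(text):
    # Approximates pattern_replace with pattern [\\p{Punct}\\pP] and an
    # empty replacement: drop ASCII punctuation and Unicode punctuation.
    kept = []
    for ch in text:
        is_ascii_punct = ch in string.punctuation
        is_unicode_punct = unicodedata.category(ch).startswith("P")
        if not (is_ascii_punct or is_unicode_punct):
            kept.append(ch)
    return "".join(kept)


sample = "cooperation.测试中国,<a,。"
# The keyword tokenizer then emits the whole filtered string as one token.
print(strip_punctuation(sample))  # cooperation测试中国a
```

Against a live cluster, the same check can be made with the _analyze API by passing punctuation_analyzer as the analyzer and the sample as the text.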
2. Add the index mapping
curl -XPUT 'http://192.168.11.10:9201/my_test_v2.0/my_test/_mapping' -H 'Content-Type:application/json' -d '
{
  "my_test": {
    "properties": {
      "testContent": {
        "type": "text",
        "analyzer": "ik_max_word",
        "search_analyzer": "ik_max_word",
        "fields": {
          "raw": {
            "type": "text",
            "analyzer": "punctuation_analyzer",
            "search_analyzer": "punctuation_analyzer"
          }
        }
      }
    }
  }
}'
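The original does not show how the testContent.raw sub-field is queried. One plausible way, offered here as an assumption, is a wildcard query against it. Because wildcard queries are not analyzed, the sketch strips punctuation from the user input on the client side first; this also removes * and ? from the input, so no extra escaping is needed. The names strip_punctuation and raw_wildcard_body are hypothetical.

```python
import json
import string
import unicodedata


def strip_punctuation(text):
    # Client-side approximation of the punctuation_filter char filter.
    return "".join(
        ch for ch in text
        if ch not in string.punctuation
        and not unicodedata.category(ch).startswith("P")
    )


def raw_wildcard_body(user_input):
    # Assumed query shape: substring match on the punctuation-free sub-field.
    cleaned = strip_punctuation(user_input)
    return {"query": {"wildcard": {"testContent.raw": "*" + cleaned + "*"}}}


print(json.dumps(raw_wildcard_body("中国,<a"), ensure_ascii=False))
```

With the sample document, the indexed token is cooperation测试中国a, so the cleaned fragment 中国a matches as a substring. Leading wildcards can be slow on large indices, so this is worth measuring before relying on it.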
3. For dynamically created indices, apply the same settings through an index template:
curl -XPOST 'http://10.111.1.11:9200/_template/test_msg_template' -H 'Content-Type:application/json' -d'
{
  "index_patterns": [
    "test_msg_*"
  ],
  "settings": {
    "index": {
      "number_of_shards": 1,
      "number_of_replicas": 0,
      "max_result_window": "1000000"
    },
    "analysis": {
      "analyzer": {
        "punctuation_analyzer": {
          "type": "custom",
          "tokenizer": "keyword",
          "char_filter": [
            "punctuation_filter"
          ]
        }
      },
      "char_filter": {
        "punctuation_filter": {
          "type": "pattern_replace",
          "pattern": "[\\p{Punct}\\pP]",
          "replacement": ""
        }
      }
    }
  },
  "mappings": {
    "test_msg": {
      "properties": {
        "msg": {
          "type": "text",
          "analyzer": "ik_max_word",
          "search_analyzer": "ik_max_word",
          "fields": {
            "raw": {
              "type": "text",
              "analyzer": "punctuation_analyzer",
              "search_analyzer": "punctuation_analyzer"
            }
          }
        }
      }
    }
  }
}'
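The template only takes effect for indices created after it is registered and whose names match index_patterns. As a rough sketch of which names would qualify, the snippet below uses Python's fnmatch as a stand-in for Elasticsearch's own wildcard matching; the index names in the loop are hypothetical examples, and template_applies is an illustrative helper.

```python
from fnmatch import fnmatchcase

INDEX_PATTERNS = ["test_msg_*"]  # copied from the template above


def template_applies(index_name):
    # True when the name matches at least one pattern (case-sensitive).
    return any(fnmatchcase(index_name, p) for p in INDEX_PATTERNS)


# Hypothetical index names, for illustration only.
for name in ["test_msg_2024", "test_msg_", "my_test_v2.0"]:
    print(name, template_applies(name))
```

An index such as my_test_v2.0 does not match the pattern, which is why it needs the explicit settings and mapping from steps 1 and 2.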