Elasticsearch: Setting Up a Custom Filter to Strip Punctuation at Query Time

Latest recommended article published on 2024-04-29 22:58:18

zhangxl-jc

Tags: elasticsearch, big data, search engine

Article link: https://blog.csdn.net/zhangjiaxx/article/details/131460968

Official documentation:

Pattern Replace Char Filter | Elasticsearch Guide [6.4] | Elastic

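Scenario:

The ES cluster has an old index (table), my_test_v1.0, with a field testContent. The field holds mixed Chinese and English text together with punctuation, for example: cooperation.测试中国,<a，。 Queries against this field should ignore the punctuation.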

Options:

1. A phrase match query, QueryBuilders.matchPhraseQuery();

2. A wildcard (fuzzy) query;

3. A custom filter.

This post covers the third option:

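1. Define the custom filter (a pattern_replace char filter and an analyzer that uses it) when creating the new index my_test_v2.0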

curl -X PUT "192.168.11.10:9201/my_test_v2.0?pretty" -H 'Content-Type: application/json' -d'
{
  "settings": {
    "analysis": {
      "analyzer": {
        "punctuation_analyzer": {
          "type": "custom",
          "tokenizer": "keyword",
          "char_filter": [
            "punctuation_filter"
          ]
        }
      },
      "char_filter": {
        "punctuation_filter": {
          "type": "pattern_replace",
          "pattern": "[\\p{Punct}\\pP]",
          "replacement": ""
        }
      }
    }
  }
}
'
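The pattern_replace char filter takes a Java regular expression, so the effect of the pattern above can be checked locally before touching the cluster. The following is only a minimal sketch: the class and method names are made up for illustration, and it mimics the char filter with plain java.util.regex rather than calling Elasticsearch.

```java
import java.util.regex.Pattern;

// Hypothetical helper class, not part of Elasticsearch.
class PunctuationFilterSketch {
    // Same regex as the "pattern" value of punctuation_filter.
    // \p{Punct} covers ASCII punctuation such as . , <
    // \pP covers Unicode punctuation such as the full-width "，" and "。"
    private static final Pattern PUNCTUATION = Pattern.compile("[\\p{Punct}\\pP]");

    // Mimics the char filter: every match is replaced with the empty string.
    static String strip(String text) {
        return PUNCTUATION.matcher(text).replaceAll("");
    }

    public static void main(String[] args) {
        String sample = "cooperation.测试中国,<a，。";
        System.out.println(strip(sample)); // prints: cooperation测试中国a
    }
}
```

Because the analyzer uses the keyword tokenizer, the stripped text is then kept as one single token rather than being split into words.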

2. Add the index mapping

curl -XPUT 'http://192.168.11.10:9201/my_test_v2.0/my_test/_mapping' -H 'Content-Type:application/json' -d '
{
  "my_test": {
    "properties": {
      "testContent": {
        "type": "text",
        "analyzer": "ik_max_word",
        "search_analyzer": "ik_max_word",
        "fields": {
          "raw": {
            "type": "text",
            "analyzer": "punctuation_analyzer",
            "search_analyzer": "punctuation_analyzer"
          }
        }
      }
    }
  }
}'
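With this mapping, testContent.raw is analyzed by punctuation_analyzer at both index time and search time, so a match query on that sub-field should hit when the query text equals the stored value once punctuation is removed (for example, cooperation测试中国a against the sample value from the scenario). The comparison stays case-sensitive, since the analyzer has no lowercase filter. Below is a rough sketch of building such a request body; the class and method names are hypothetical, and sending the body to /my_test_v2.0/_search is left out.

```java
// Hypothetical helper class; it only builds the JSON body and sends nothing.
class RawFieldQuerySketch {
    // Minimal JSON string escaping for backslashes and double quotes.
    static String escapeJson(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    // Builds a match query against the "raw" sub-field defined in the mapping.
    static String buildMatchQuery(String userInput) {
        return "{\"query\":{\"match\":{\"testContent.raw\":\""
                + escapeJson(userInput) + "\"}}}";
    }

    public static void main(String[] args) {
        System.out.println(buildMatchQuery("cooperation测试中国a"));
    }
}
```

The user input does not need to be cleaned on the client side, because the search_analyzer applies the same punctuation stripping to the query text.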

3. For dynamically created indices, apply the same settings and mapping through an index template:

curl -XPOST 'http://10.111.1.11:9200/_template/test_msg_template' -H 'Content-Type:application/json' -d'
{
  "index_patterns": [
    "test_msg_*"
  ],
  "settings": {
    "index": {
      "number_of_shards": 1,
      "number_of_replicas": 0,
      "max_result_window": "1000000"
    },
    "analysis": {
      "analyzer": {
        "punctuation_analyzer": {
          "type": "custom",
          "tokenizer": "keyword",
          "char_filter": [
            "punctuation_filter"
          ]
        }
      },
      "char_filter": {
        "punctuation_filter": {
          "type": "pattern_replace",
          "pattern": "[\\p{Punct}\\pP]",
          "replacement": ""
        }
      }
    }
  },
  "mappings": {
    "test_msg": {
      "properties": {
        "msg": {
          "type": "text",
          "analyzer": "ik_max_word",
          "search_analyzer": "ik_max_word",
          "fields": {
            "raw": {
              "type": "text",
              "analyzer": "punctuation_analyzer",
              "search_analyzer": "punctuation_analyzer"
            }
          }
        }
      }
    }
  }
}'
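The template is applied to any newly created index whose name matches an entry in index_patterns. As a rough illustration of which names "test_msg_*" would cover, here is a small sketch of simple "*" wildcard matching. It is an approximation written for this post, not Elasticsearch's own matching code, and the index names in it are made up.

```java
import java.util.regex.Pattern;

// Hypothetical helper class approximating simple "*" wildcard matching.
class TemplatePatternSketch {
    // Turns each "*" into ".*" and quotes everything else literally.
    static boolean matches(String indexPattern, String indexName) {
        String[] parts = indexPattern.split("\\*", -1);
        StringBuilder regex = new StringBuilder();
        for (int i = 0; i < parts.length; i++) {
            if (i > 0) {
                regex.append(".*");
            }
            regex.append(Pattern.quote(parts[i]));
        }
        return Pattern.matches(regex.toString(), indexName);
    }

    public static void main(String[] args) {
        System.out.println(matches("test_msg_*", "test_msg_202404")); // prints: true
        System.out.println(matches("test_msg_*", "my_test_v2.0"));    // prints: false
    }
}
```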