[Elasticsearch] Using and Understanding char_filter in Elasticsearch
char_filter
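Elasticsearch provides three built-in char_filter types: html_strip, mapping, and pattern_replace. To use a custom char_filter, declare it under settings.analysis when you create the index and reference it from an analyzer. The basic usage of each type is shown below.
Within analysis, the char_filter section defines the character filter, and the analyzer section defines an analyzer that combines that filter with a tokenizer.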
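To make the order of operations concrete, here is a minimal Python sketch of how an analyzer applies its character filters to the raw text before the tokenizer sees it. The function names `analyze` and `keyword_tokenizer` are hypothetical stand-ins, not Elasticsearch APIs; the only behavior assumed is that the keyword tokenizer emits the whole input as a single token.

```python
def keyword_tokenizer(text):
    # Stand-in for the "keyword" tokenizer: the whole input becomes one token.
    return [text]


def analyze(text, char_filters, tokenizer):
    # Character filters run first, in order, on the raw string.
    for char_filter in char_filters:
        text = char_filter(text)
    # Only then is the filtered text handed to the tokenizer.
    return tokenizer(text)


# Hypothetical filter for illustration: rewrite "&" before tokenizing.
tokens = analyze("a&b", [lambda s: s.replace("&", " and ")], keyword_tokenizer)
print(tokens)
```

Because every example in this post uses the keyword tokenizer, the single token returned by _analyze is exactly the text produced by the character filter.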
- html_strip:
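In this example:
1. The analyzer uses the keyword tokenizer, and the character filter has "type": "html_strip", which strips HTML elements from the text and decodes HTML entities.
2. escaped_tags lists the tag names (without angle brackets) that should be kept instead of stripped.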
DELETE my_index
PUT my_index
{
"settings": {
"analysis": {
"char_filter": {
"test_char_filter": {
"type": "html_strip",
"escaped_tags":["a"]
}
},
"analyzer": {
"my_analyzer": {
"tokenizer": "keyword",
"char_filter": "test_char_filter"
}
}
}
}
}
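# Test data
GET my_index/_analyze
{
"analyzer": "my_analyzer",
"text": "<p>I&apos;m so <a>happy</a>!</p>"
}
Result: a single token, "\nI'm so <a>happy</a>!\n". The <p> tags are replaced by line breaks, &apos; is decoded to ', and the <a> tags listed in escaped_tags are kept.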
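The idea behind escaped_tags can be approximated in a few lines of Python. This is only a rough sketch, not the Lucene implementation: the function name `strip_html` is hypothetical, the regular expression handles only simple tags, and it does not reproduce details of the real filter such as replacing block-level tags like `<p>` with a line break. The input is a well-formed HTML snippet similar to the test text.

```python
import html
import re

# Matches simple opening and closing tags such as <p>, </p>, or <a href="...">.
TAG = re.compile(r"</?([A-Za-z][A-Za-z0-9]*)[^>]*>")


def strip_html(text, escaped_tags=()):
    keep = {name.lower() for name in escaped_tags}

    def replace_tag(match):
        # Tags named in escaped_tags are left exactly as written;
        # every other tag is dropped.
        return match.group(0) if match.group(1).lower() in keep else ""

    without_tags = TAG.sub(replace_tag, text)
    # Decode entities such as &apos; or &amp; after the tags are gone.
    return html.unescape(without_tags)


result = strip_html("<p>I&apos;m so <a>happy</a>!</p>", escaped_tags=["a"])
print(result)
```

With an empty escaped_tags list the `<a>` tags would be removed as well, which is the default behavior of html_strip.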
- mapping:
DELETE my_index
# 2. Mapping type: "type": "mapping"
# "mappings" lists the key => value replacements; typically used to mask sensitive words
PUT my_index
{
"settings": {
"analysis": {
"char_filter": {
"test_char_filter": {
"type":"mapping",
"mappings":[
"擦你嘛 => ***"
]
}
},
"analyzer": {
"my_analyzer": {
"tokenizer": "keyword",
"char_filter": "test_char_filter"
}
}
}
}
}
GET my_index/_analyze
{
"analyzer": "my_analyzer",
"text": "擦你嘛"
}
Result: the analyzer emits a single token, "***".
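The replacement logic of a mapping filter can be sketched in plain Python. The Elasticsearch documentation describes matching as greedy, with the longest key matching at a given position winning, and the sketch below follows that rule. The function name `apply_mappings` and the sample keys are hypothetical, chosen only for illustration, and keys are assumed to be non-empty.

```python
def apply_mappings(text, mappings):
    # Try longer keys first so the longest match at each position wins.
    keys = sorted(mappings, key=len, reverse=True)
    out = []
    i = 0
    while i < len(text):
        for key in keys:
            if text.startswith(key, i):
                out.append(mappings[key])
                i += len(key)
                break
        else:
            # No key matches here: copy the character through unchanged.
            out.append(text[i])
            i += 1
    return "".join(out)


masked = apply_mappings("a badword here", {"badword": "***"})
print(masked)
```

In a real index the same effect comes from adding more "key => value" entries to the mappings array; no application code is needed.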
- pattern_replace:
DELETE my_index
# 3. Regex replacement type: "type": "pattern_replace"; replaces text that matches the given regular expression
# "replacement" is the string that each match is replaced with
PUT my_index
{
"settings": {
"analysis": {
"char_filter": {
"test_char_filter": {
"type":"pattern_replace",
"pattern":"(\\d{3})\\d{4}(\\d{3})",
"replacement":"$1***$2"
}
},
"analyzer": {
"my_analyzer": {
"tokenizer": "keyword",
"char_filter": "test_char_filter"
}
}
}
}
}
GET my_index/_analyze
{
"analyzer": "my_analyzer",
"text": "我的电话是17603415057"
}
Result: the analyzer emits a single token, "我的电话是176***5057".
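To see what the regular expression does to the sample text, the same pattern can be tried in Python. This is a sketch for experimenting with the pattern, not what Elasticsearch runs internally: the filter uses Java regular expressions, where the replacement refers to groups as $1 and $2, while Python writes them as \g<1> and \g<2>. The function name `mask_digits` is hypothetical.

```python
import re

# Same pattern as in the request: keep three digits, skip four, keep three.
PATTERN = re.compile(r"(\d{3})\d{4}(\d{3})")


def mask_digits(text):
    # \g<1> and \g<2> play the role of $1 and $2 in the Java-style replacement.
    return PATTERN.sub(r"\g<1>***\g<2>", text)


masked = mask_digits("我的电话是17603415057")
print(masked)
```

The pattern consumes only the first ten digits of the 11-digit number, so the final digit lies outside the match and is carried over unchanged.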
These are only basic usage examples; see the official documentation for details:
https://www.elastic.co/guide/en/elasticsearch/reference/7.17/analysis.html