1. Introduction
A normalizer is similar to an analyzer, except that it only ever emits a single token. It therefore has no tokenizer and is built solely from char filters and token filters. Because a normalizer processes one token as a whole, only token filters that operate character by character can be used in it: the lowercase filter qualifies, for example, but stemming filters do not.
The filters currently usable in a normalizer are:
arabic_normalization,
asciifolding,
bengali_normalization,
cjk_width,
decimal_digit,
elision,
german_normalization,
hindi_normalization,
indic_normalization,
lowercase,
persian_normalization,
scandinavian_folding,
serbian_normalization,
sorani_normalization,
uppercase.
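To make the "character by character" restriction concrete, here is a rough Python sketch of what such a normalizer chain does. The function name and the asciifolding approximation are illustrative, not Elasticsearch internals: each stage rewrites characters in place and the value is never split into multiple tokens.

```python
import unicodedata

def normalize(value: str) -> str:
    # Mapping char filter: « and » become plain double quotes
    # (mirrors the "quote" char_filter in the example below).
    value = value.replace("«", '"').replace("»", '"')
    # lowercase token filter: a per-character transformation.
    value = value.lower()
    # Rough stand-in for asciifolding: decompose accented characters
    # and drop the combining marks (e.g. "é" -> "e").
    value = "".join(
        c for c in unicodedata.normalize("NFKD", value)
        if not unicodedata.combining(c)
    )
    # The whole input comes back as exactly one token.
    return value

print(normalize("«Café»"))  # '"cafe"'
```

A stemmer, by contrast, must look at a whole word to decide what suffix to strip, so it cannot be expressed as a per-character rewrite like the steps above.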
An example:
PUT index
{
"settings": {
"analysis": {
"char_filter": {
"quote": {
"type": "mapping",
"mappings": [
"« => \"",
"» => \""
]
}
},
"normalizer": {
"my_normalizer": {
"type": "custom",
"char_filter": ["quote"],
"filter": ["lowercase", "asciifolding"]
}
}
}
},
"mappings": {
"properties": {
"foo": {
"type": "keyword",
"normalizer": "my_normalizer"
}
}
}
}
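To check what the normalizer actually produces, the `_analyze` API accepts a `normalizer` parameter (the sample text here is illustrative):

GET index/_analyze
{
  "normalizer": "my_normalizer",
  "text": "«Híghly»"
}

This should return a single token in which the guillemets have been mapped to plain quotes by the char filter, the text lowercased, and the accent folded away by asciifolding.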