note
- The analyzer used at index time should be consistent with the one used at search time.
- An analyzer is made of: character filters, a tokenizer, and token filters.
- Character filters: transform or strip characters in the raw input (e.g. removing HTML markup).
- An analyzer has zero or more character filters, applied in order.
- Tokenizer: splits the input into terms (words) and records each term's position information.
- An analyzer must have exactly one tokenizer.
- Token filters: add, remove, or modify tokens. The lowercase token filter lowercases tokens; the stop token filter removes stop words; the synonym token filter adds synonyms. Token filters do not change the character offsets of tokens.
- An analyzer has zero or more token filters.
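The three stages above can be combined ad hoc in one _analyze call — a sketch using the built-in html_strip character filter:

# char filter strips the <b> tags, standard tokenizer splits the words,
# lowercase token filter normalizes case
POST _analyze
{
  "char_filter": [ "html_strip" ],
  "tokenizer": "standard",
  "filter": [ "lowercase" ],
  "text": "<b>The QUICK Fox</b>"
}
# → tokens: the, quick, fox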
- Analyzers demo: built-in, then custom
GET analyzer_index/_mapping
POST _analyze
{
  "analyzer": "whitespace",
  "text": "The quick brown fox."
}
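The response lists each token with its offsets and position — this is where the tokenizer's position info shows up. For the whitespace request above it looks roughly like:

# Response: the whitespace analyzer keeps case and punctuation,
# so "fox." is emitted with its trailing period.
{
  "tokens": [
    { "token": "The",   "start_offset": 0,  "end_offset": 3,  "type": "word", "position": 0 },
    { "token": "quick", "start_offset": 4,  "end_offset": 9,  "type": "word", "position": 1 },
    { "token": "brown", "start_offset": 10, "end_offset": 15, "type": "word", "position": 2 },
    { "token": "fox.",  "start_offset": 16, "end_offset": 20, "type": "word", "position": 3 }
  ]
}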
POST _analyze
{
  "tokenizer": "standard",
  "filter": [ "lowercase", "asciifolding" ],
  "text": "Is this déja vu?"
}
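The stop token filter from the notes can be tried the same way — a sketch using the built-in stop filter, whose default English stop-word list includes "is" and "this":

POST _analyze
{
  "tokenizer": "standard",
  "filter": [ "lowercase", "stop" ],
  "text": "Is this déja vu?"
}
# → tokens: déja, vu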
DELETE analyzer_index
#
# Custom (user-defined) analyzer
#
PUT analyzer_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "std_folded": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "asciifolding"
          ]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "my_text": {
        "type": "text",
        "analyzer": "std_folded"
      }
    }
  }
}
GET analyzer_index/_mapping
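With the index created, the custom analyzer can be exercised directly against the index — either by name, or via the field that is mapped to it:

GET analyzer_index/_analyze
{
  "analyzer": "std_folded",
  "text": "Is this déja vu?"
}

# The "my_text" field resolves to std_folded automatically:
GET analyzer_index/_analyze
{
  "field": "my_text",
  "text": "Is this déja vu?"
}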