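- An analyzer is a wrapper around three functions
  - Character filters
    The string is first passed through each character filter in turn. These tidy up the string before tokenization, for example by stripping HTML markup or converting & to and.
  - Tokenizer
    The tokenizer splits the string into individual terms.
  - Token filters
    The terms produced by the tokenizer are passed through each token filter in turn. A token filter may change terms (for example, lowercasing them), remove terms (for example, stopwords such as a, and, the), or add terms (for example, synonyms).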
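
The three stages can be sketched in plain Python. This is only an illustration of the order of operations, not how Elasticsearch implements it: the function names, the stopword set, and the `SYNONYMS` map are all hypothetical, and the regex tokenizer is a stand-in for a real one.

```python
import html
import re

STOPWORDS = {"a", "and", "the"}   # illustrative stopwords taken from the notes
SYNONYMS = {"quick": ["fast"]}    # hypothetical synonym map, for demonstration only

def strip_html(text):
    """Character filter: remove HTML tags, then decode entities such as &amp;."""
    return html.unescape(re.sub(r"<[^>]+>", "", text))

def ampersand_to_and(text):
    """Character filter: spell out '&' as the word 'and'."""
    return text.replace("&", " and ")

def tokenize(text):
    """Tokenizer: split the string into runs of word characters."""
    return re.findall(r"\w+", text)

def lowercase(tokens):
    """Token filter: change terms."""
    return [t.lower() for t in tokens]

def remove_stopwords(tokens):
    """Token filter: remove terms."""
    return [t for t in tokens if t not in STOPWORDS]

def add_synonyms(tokens):
    """Token filter: add terms right after the term they are synonyms of."""
    out = []
    for t in tokens:
        out.append(t)
        out.extend(SYNONYMS.get(t, []))
    return out

def analyze(text):
    # 1. character filters run on the raw string, in order
    for char_filter in (strip_html, ampersand_to_and):
        text = char_filter(text)
    # 2. exactly one tokenizer turns the string into terms
    tokens = tokenize(text)
    # 3. token filters run on the term list, in order
    for token_filter in (lowercase, remove_stopwords, add_synonyms):
        tokens = token_filter(tokens)
    return tokens

print(analyze("<p>The QUICK fox & the dog</p>"))
```

Order matters here: lowercasing runs before stopword removal so that "The" is matched against the lowercase stopword set.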
- Testing an analyzer
GET /_analyze?analyzer=standard
Text to analyze
Result
{
  "tokens": [
    {
      "token": "text",
      "start_offset": 0,
      "end_offset": 4,
      "type": "<ALPHANUM>",
      "position": 1
    },
    {
      "token": "to",
      "start_offset": 5,
      "end_offset": 7,
      "type": "<ALPHANUM>",
      "position": 2
    },
    {
      "token": "analyze",
      "start_offset": 8,
      "end_offset": 15,
      "type": "<ALPHANUM>",
      "position": 3
    }
  ]
}
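
To make the fields in this response concrete, here is a rough Python sketch that produces records of the same shape for simple input. It assumes a regex word splitter plus lowercasing as a crude approximation of the standard analyzer; `analyze_with_offsets` is a hypothetical helper, and positions start at 1 only to mirror the response shown above.

```python
import re

def analyze_with_offsets(text):
    """Return token records shaped like the _analyze response above."""
    tokens = []
    # enumerate from 1 so that "position" matches the sample response
    for position, match in enumerate(re.finditer(r"\w+", text), start=1):
        tokens.append({
            "token": match.group().lower(),   # the term after lowercasing
            "start_offset": match.start(),    # index of the first character in the original string
            "end_offset": match.end(),        # index just past the last character
            "type": "<ALPHANUM>",             # fixed label, copied from the sample response
            "position": position,             # ordinal of the term in the token stream
        })
    return {"tokens": tokens}

result = analyze_with_offsets("Text to analyze")
for record in result["tokens"]:
    print(record)
```

The offsets always refer to the original string, so they can be used to locate a term in the source text even though the term itself was lowercased.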