1) Environment Setup
Start Elasticsearch: https://blog.csdn.net/qq_36918149/article/details/104221934
Start Kibana: https://blog.csdn.net/qq_36918149/article/details/104224625
2) Character Filter
Demo 1:
# Strip HTML tags
POST _analyze
{
  "tokenizer": "keyword",
  "char_filter": ["html_strip"],
  "text": "<b>hello world</b>"
}
Result: the HTML tags have been removed.
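Because the char filter runs before the tokenizer, the stripped text can also be tokenized normally. A minimal sketch combining html_strip with the standard tokenizer (the sample text here is my own):

# Strip HTML tags, then split with the standard tokenizer
POST _analyze
{
  "tokenizer": "standard",
  "char_filter": ["html_strip"],
  "text": "<p>hello <b>world</b></p>"
}

This should yield the two tokens hello and world.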
Demo 2:
# Replace characters with a char filter
POST _analyze
{
  "tokenizer": "standard",
  "char_filter": [
    {
      "type": "mapping",
      "mappings": [ "- => _" ]
    }
  ],
  "text": "123-456, I-test! test-990 650-555-1234"
}
The result shows that each "-" has been replaced with "_" and the text has been tokenized.
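Besides mapping, Elasticsearch also provides a pattern_replace char filter that rewrites text with a regular expression. A minimal sketch, reusing the phone-number text above and a pattern in the style of the official docs:

# Replace characters via a regular expression
POST _analyze
{
  "tokenizer": "standard",
  "char_filter": [
    {
      "type": "pattern_replace",
      "pattern": "(\\d+)-(?=\\d)",
      "replacement": "$1_"
    }
  ],
  "text": "650-555-1234"
}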
3) Tokenizer
Demo:
# Tokenize by path hierarchy
POST _analyze
{
  "tokenizer": "path_hierarchy",
  "text": "/user/ymruan/a/b/c/d/e"
}
Result: the path is tokenized level by level, one token per hierarchy level.
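path_hierarchy also accepts options such as delimiter and reverse. A minimal sketch that defines the tokenizer inline and emits the hierarchy from the leaf side (the option choice is mine):

# path_hierarchy with an inline option
POST _analyze
{
  "tokenizer": {
    "type": "path_hierarchy",
    "reverse": true
  },
  "text": "/user/ymruan/a/b/c/d/e"
}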
4) Token Filter
Demo:
# whitespace with stop (and snowball)
GET _analyze
{
  "tokenizer": "whitespace",
  "filter": ["stop", "snowball"],
  "text": ["The rain in Spain falls mainly on the plain."]
}
Result: the tokens that remain after the stop words are removed (snowball also stems the remaining tokens).
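Note that filter order matters: with only whitespace and stop, the capitalized "The" is likely kept, because the default stop word list is lowercase. A minimal sketch that adds lowercase before stop:

# lowercase runs before stop, so "The" is removed as well
GET _analyze
{
  "tokenizer": "whitespace",
  "filter": ["lowercase", "stop", "snowball"],
  "text": ["The rain in Spain falls mainly on the plain."]
}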
5) Summary
- Processing order inside an analyzer: char filter -> tokenizer -> token filter (see the sketch after this list).
- Before applying this chapter in practice, read the relevant API documentation carefully.
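This order is exactly how a custom analyzer is assembled in index settings. A minimal sketch, assuming a hypothetical index my_index and analyzer my_analyzer that chain html_strip, the standard tokenizer, and the lowercase/stop filters:

# Define a custom analyzer (my_index / my_analyzer are hypothetical names)
PUT my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "type": "custom",
          "char_filter": ["html_strip"],
          "tokenizer": "standard",
          "filter": ["lowercase", "stop"]
        }
      }
    }
  }
}

# Test the custom analyzer
POST my_index/_analyze
{
  "analyzer": "my_analyzer",
  "text": "<b>The QUICK brown fox</b>"
}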