【Elasticsearch】Custom Analyzers in Elasticsearch

  • char_filter: a character filter in ES; it preprocesses the character stream before it is passed to the tokenizer. See the Character filters reference in the official documentation; some simple examples of mine: char_filter usage.
  • filter: a token filter receives the token stream from the tokenizer and can modify tokens (e.g. lowercase them), remove tokens (e.g. stopwords), or add tokens (e.g. synonyms). See the official documentation.
  • tokenizer: a tokenizer receives a character stream, breaks it into individual tokens (usually single words), and outputs a token stream. For Chinese text, ik_max_word is the everyday choice in domestic development. See the official documentation. All three components can be tried out standalone, as sketched right after this list.
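
Before wiring components into an index, each one can be tested ad hoc: the _analyze API accepts inline char_filter, tokenizer, and filter definitions, no index required. A minimal sketch using built-in components (the sample text is just an illustration):

# Ad-hoc analysis with inline components, no index required
GET _analyze
{
  "char_filter": [
    {
      "type": "mapping",
      "mappings": ["& => and"]
    }
  ],
  "tokenizer": "standard",
  "filter": ["lowercase"],
  "text": "Tom & Jerry"
}

This should come back as the tokens tom, and, jerry: the char filter rewrites "&" first, the tokenizer splits the result, and the token filter lowercases each token. With the building blocks clear, we can compose them into a full custom analyzer on an index:
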
# Custom analyzer: mapping char_filter + stop filter + pattern tokenizer
DELETE custom_analysis_index
PUT custom_analysis_index
{
  "settings": {
    "analysis": {
      "char_filter": {
        "my_char_filter": {
          "type": "mapping",
          "mappings": [
            "& => and",
            "| => or"
          ]
        }
      },
      "filter": {
        "my_stopword": {
          "type": "stop",
          "stopwords": [
            "who",
            "the",
            "are",
            "at"
          ]
        }
      },
      "tokenizer": {
        "my_tokenizer": {
          "type": "pattern",
          "pattern": "[ ,.!?]"
        }
      }, 
      "analyzer": {
        "my_analyzer":{
          "type":"custom",
          "char_filter":["my_char_filter"],
          "filter":["my_stopword"],
          "tokenizer":"my_tokenizer"
        }
      }
    }
  }
}

GET custom_analysis_index/_analyze
{
  "analyzer": "my_analyzer",
  "text": ["what is your name? I am kerry","who are you? i am green, i am a student at school"]
}
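
The texts above never contain "&" or "|", so the char_filter is not exercised; a quick hedged check of the mapping rules (illustrative text only):

# Verifying the mapping char_filter: "&" => "and", "|" => "or"
GET custom_analysis_index/_analyze
{
  "analyzer": "my_analyzer",
  "text": "Tom & Jerry | Spike"
}

This should produce the tokens Tom, and, Jerry, or, Spike, since the replacements happen before the pattern tokenizer splits on whitespace.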

By chaining these components together, we have built a custom analyzer.

Running the request returns the token stream with the stopwords (who, are, at) already filtered out. Note the jump in position between the two texts (6 for kerry, then 109 for you): when _analyze receives an array of texts, it separates them with the default position_increment_gap of 100, and tokens removed by the stop filter still consume positions.

{
  "tokens": [
    {
      "token": "what",
      "start_offset": 0,
      "end_offset": 4,
      "type": "word",
      "position": 0
    },
    {
      "token": "is",
      "start_offset": 5,
      "end_offset": 7,
      "type": "word",
      "position": 1
    },
    {
      "token": "your",
      "start_offset": 8,
      "end_offset": 12,
      "type": "word",
      "position": 2
    },
    {
      "token": "name",
      "start_offset": 13,
      "end_offset": 17,
      "type": "word",
      "position": 3
    },
    {
      "token": "I",
      "start_offset": 19,
      "end_offset": 20,
      "type": "word",
      "position": 4
    },
    {
      "token": "am",
      "start_offset": 21,
      "end_offset": 23,
      "type": "word",
      "position": 5
    },
    {
      "token": "kerry",
      "start_offset": 24,
      "end_offset": 29,
      "type": "word",
      "position": 6
    },
    {
      "token": "you",
      "start_offset": 38,
      "end_offset": 41,
      "type": "word",
      "position": 109
    },
    {
      "token": "i",
      "start_offset": 43,
      "end_offset": 44,
      "type": "word",
      "position": 110
    },
    {
      "token": "am",
      "start_offset": 45,
      "end_offset": 47,
      "type": "word",
      "position": 111
    },
    {
      "token": "green",
      "start_offset": 48,
      "end_offset": 53,
      "type": "word",
      "position": 112
    },
    {
      "token": "i",
      "start_offset": 55,
      "end_offset": 56,
      "type": "word",
      "position": 113
    },
    {
      "token": "am",
      "start_offset": 57,
      "end_offset": 59,
      "type": "word",
      "position": 114
    },
    {
      "token": "a",
      "start_offset": 60,
      "end_offset": 61,
      "type": "word",
      "position": 115
    },
    {
      "token": "student",
      "start_offset": 62,
      "end_offset": 69,
      "type": "word",
      "position": 116
    },
    {
      "token": "school",
      "start_offset": 73,
      "end_offset": 79,
      "type": "word",
      "position": 118
    }
  ]
}
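
Defining the analyzer under settings does not by itself apply it anywhere; a field mapping has to reference it. A minimal sketch, assuming a hypothetical content field:

# Hypothetical mapping that applies my_analyzer to a text field
PUT custom_analysis_index/_mapping
{
  "properties": {
    "content": {
      "type": "text",
      "analyzer": "my_analyzer"
    }
  }
}

From this point on, documents indexed into content are analyzed with my_analyzer, and so are match queries against the field (unless a search analyzer overrides it, as described next).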

Also note:
At search time, the analyzer to use is resolved by checking the following, in order:

  • the analyzer parameter specified on the search request itself
  • the search_analyzer property of the field, specified when the mapping was created
  • the analysis.analyzer.default_search setting, specified when the index was created
  • the analyzer property specified on the field when the index was created

A sketch of where each of these settings lives follows this list. For more detail, see this post:
https://blog.csdn.net/u011250186/article/details/125704364
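
A minimal sketch of that precedence; the index name search_analyzer_demo and the title field are made up for illustration:

# Hypothetical index: default_search in settings, analyzer and
# search_analyzer on the field
PUT search_analyzer_demo
{
  "settings": {
    "analysis": {
      "analyzer": {
        "default_search": { "type": "standard" }
      }
    }
  },
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "analyzer": "standard",
        "search_analyzer": "whitespace"
      }
    }
  }
}

# Highest priority: an analyzer passed directly in the query
# overrides search_analyzer, default_search, and the field's analyzer
GET search_analyzer_demo/_search
{
  "query": {
    "match": {
      "title": {
        "query": "hello world",
        "analyzer": "keyword"
      }
    }
  }
}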
