Making Elasticsearch split tokens on underscores, which it does not do by default

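Elasticsearch's default standard analyzer does not split text on underscores. Its tokenizer follows the Unicode Text Segmentation word-boundary rules, which treat an underscore between letters or digits as part of the word, so an underscore-joined identifier comes back as a single token: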

GET my_index/_analyze
{
    "analyzer": "standard", 
    "text": "yi_yuan_ordersvc_person_comp_inter_s1104_ISubmitProdCoSvc_prodDataSubmit"
}


#Analysis result:
{
  "tokens" : [
    {
      "token" : "yi_yuan_ordersvc_person_comp_inter_s1104_isubmitprodcosvc_proddatasubmit",
      "start_offset" : 0,
      "end_offset" : 72,
      "type" : "<ALPHANUM>",
      "position" : 0
    }
  ]
}

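A custom analyzer solves this. A mapping char_filter replaces every underscore with a character the standard tokenizer does split on (here "|") before the text is tokenized.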

#Custom analyzer
PUT /my_index2
{
  "settings": {
    "analysis": {
      "char_filter": {
        "XtoS": {
          "type": "mapping",
          "mappings": ["_=>|"]
        }
      },
      "analyzer": {
        "my_analyzer": {
          "type": "custom",
          "char_filter": ["XtoS"],
          "tokenizer": "standard",
          "filter": ["lowercase"]
        }
      }
    }
  }
}
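
The request above only defines the analyzer; a field uses it only if its mapping refers to it. A minimal sketch of that step follows, assuming a typeless mapping (Elasticsearch 7 or later). The field name service_name is hypothetical and is not part of the original setup.

#Apply my_analyzer to a text field (service_name is a hypothetical field name)
PUT /my_index2/_mapping
{
  "properties": {
    "service_name": {
      "type": "text",
      "analyzer": "my_analyzer"
    }
  }
}

Unless a separate search_analyzer is configured, the same analyzer is also applied to query text at search time, so a match query for a single segment such as "ordersvc" can find the document.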

#Test the analyzer (it is defined on my_index2, so the request must target that index)
GET my_index2/_analyze
{
    "analyzer": "my_analyzer", 
    "text": "yi_yuan_ordersvc_person_comp_inter_s1104_ISubmitProdCoSvc_prodDataSubmit"
}


#Tokenization result
{
  "tokens" : [
    {
      "token" : "yi",
      "start_offset" : 0,
      "end_offset" : 2,
      "type" : "<ALPHANUM>",
      "position" : 0
    },
    {
      "token" : "yuan",
      "start_offset" : 3,
      "end_offset" : 7,
      "type" : "<ALPHANUM>",
      "position" : 1
    },
    {
      "token" : "ordersvc",
      "start_offset" : 8,
      "end_offset" : 16,
      "type" : "<ALPHANUM>",
      "position" : 2
    },
    {
      "token" : "person",
      "start_offset" : 17,
      "end_offset" : 23,
      "type" : "<ALPHANUM>",
      "position" : 3
    },
    {
      "token" : "comp",
      "start_offset" : 24,
      "end_offset" : 28,
      "type" : "<ALPHANUM>",
      "position" : 4
    },
    {
      "token" : "inter",
      "start_offset" : 29,
      "end_offset" : 34,
      "type" : "<ALPHANUM>",
      "position" : 5
    },
    {
      "token" : "s1104",
      "start_offset" : 35,
      "end_offset" : 40,
      "type" : "<ALPHANUM>",
      "position" : 6
    },
    {
      "token" : "isubmitprodcosvc",
      "start_offset" : 41,
      "end_offset" : 57,
      "type" : "<ALPHANUM>",
      "position" : 7
    },
    {
      "token" : "proddatasubmit",
      "start_offset" : 58,
      "end_offset" : 72,
      "type" : "<ALPHANUM>",
      "position" : 8
    }
  ]
}
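
For a quick check that does not require creating an index, the same analysis chain can also be passed inline to the _analyze API. This is a sketch rather than part of the original setup. It assumes a version of Elasticsearch whose _analyze API accepts inline char_filter definitions, and it should yield the same tokens as my_analyzer.

#Ad hoc test with an inline char_filter, no index needed
GET _analyze
{
  "char_filter": [
    {
      "type": "mapping",
      "mappings": ["_=>|"]
    }
  ],
  "tokenizer": "standard",
  "filter": ["lowercase"],
  "text": "yi_yuan_ordersvc_person_comp_inter_s1104_ISubmitProdCoSvc_prodDataSubmit"
}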