Elasticsearch 5.5.1: Chinese and Pinyin Analysis, Tested and Working

Leon0204

Article link: https://blog.csdn.net/qq_28018283/article/details/80396937

"Any blog post that does not state its Elasticsearch version is just messing with you." (some programmer)

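The version is the one in the title. The ik_max_word tokenizer comes from the IK analysis plugin and the pinyin filter from the pinyin analysis plugin, so both must be installed in builds that match Elasticsearch 5.5.1. The complete test procedure for using pinyin and Chinese analysis together is as follows: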

Preparation: delete the index if it already exists

DELETE /index_name/
{
}

Create an index named index_name

PUT /index_name/
{
    "index": {
        "analysis": {
            "analyzer": {
                "ik_pinyin_analyzer": {
                    "type": "custom",
                    "tokenizer": "ik_max_word",
                    "filter": ["my_pinyin", "word_delimiter"]
                }
            },
            "filter": {
                "my_pinyin": {
                    "type": "pinyin",
                    "first_letter": "prefix",
                    "padding_char": " "
                }
            }
        }
    }
}
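
As a quick sanity check, the custom analyzer defined above can be run directly through the _analyze API once the index exists. This is only a minimal sketch: the sample text is an arbitrary choice, and the exact tokens depend on the IK dictionary and the pinyin plugin version, so no expected output is listed here.

```
POST /index_name/_analyze?pretty
{
  "analyzer": "ik_pinyin_analyzer",
  "text": "口红世家"
}
```

If this request is rejected because the analyzer, tokenizer, or filter is unknown, the most likely cause is that one of the two plugins is not installed on the node.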

Define the mapping for the app type

PUT /index_name/app/_mapping
{
    "app": {
        "properties": {
            "ProductCName": {
                "type": "keyword",
                "fields": {
                    "pinyin": {
                        "type": "text",
                        "store": false,
                        "term_vector": "with_positions_offsets",
                        "analyzer": "ik_pinyin_analyzer",
                        "boost": 10
                    }
                }
            },
            "ProductEName": {
                "type": "text",
                "analyzer": "ik_max_word"
            },
            "Description": {
                "type": "text",
                "analyzer": "ik_max_word"
            }
        }
    }
}
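
To confirm that the ProductCName.pinyin sub-field really picked up ik_pinyin_analyzer, the _analyze API can also resolve the analyzer from a field name instead of an analyzer name. A minimal sketch, assuming the mapping above was applied without errors (the sample text is arbitrary):

```
POST /index_name/_analyze?pretty
{
  "field": "ProductCName.pinyin",
  "text": "口红世家"
}
```

The tokens returned here should be the same as those produced by calling ik_pinyin_analyzer by name, since the field is mapped to that analyzer.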

Index a test document

PUT /index_name/app/1
{
  "ProductCName":"口红世家",
  "ProductEName":"Red History",
  "Description":"口红真是很棒的东西呢"
}
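
With a document indexed, a match query against the ProductCName.pinyin sub-field is one way to try the analyzer end to end. This is a sketch rather than part of the original test run: the query text is an assumption, and whether a pure pinyin input such as kouhong also matches depends on the pinyin plugin version and filter options, so no expected hits are shown.

```
GET /index_name/app/_search
{
  "query": {
    "match": {
      "ProductCName.pinyin": "口红"
    }
  }
}
```

The query string is analyzed with the same ik_pinyin_analyzer as the field, so Chinese input is converted to pinyin tokens before matching.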

Test the pinyin analysis

POST /index_name/_analyze?pretty
{
  "analyzer": "pinyin",
  "text":"王者荣耀"
}

The response contains the full pinyin of each character plus the first-letter abbreviation wzry:

{
  "tokens": [
    {
      "token": "wang",
      "start_offset": 0,
      "end_offset": 1,
      "type": "word",
      "position": 0
    },
    {
      "token": "wzry",
      "start_offset": 0,
      "end_offset": 4,
      "type": "word",
      "position": 0
    },
    {
      "token": "zhe",
      "start_offset": 1,
      "end_offset": 2,
      "type": "word",
      "position": 1
    },
    {
      "token": "rong",
      "start_offset": 2,
      "end_offset": 3,
      "type": "word",
      "position": 2
    },
    {
      "token": "yao",
      "start_offset": 3,
      "end_offset": 4,
      "type": "word",
      "position": 3
    }
  ]
}