Elasticsearch 5.4.3: configuring the ik and pinyin analyzers

  • Installing the ik analyzer

Download the ik analyzer from https://github.com/medcl/elasticsearch-analysis-ik (use the ik release whose version matches your Elasticsearch version).

Unzip elasticsearch-analysis-ik-5.4.3.zip and copy the extracted files into elasticsearch-5.4.3/plugins/:

mkdir /opt/ik
unzip elasticsearch-analysis-ik-5.4.3.zip -d /opt/ik
mv /opt/ik $ES_HOME/plugins

Restart Elasticsearch; the ik analyzer is now installed.
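
If you prefer to script the installation, the unzip-and-move steps above can be sketched in Python using only the standard library. This is a minimal sketch, not an official installer: install_plugin_zip is a hypothetical helper, and, like the shell commands, it extracts the zip contents directly into a new subdirectory of plugins (ik here).

```python
import zipfile
from pathlib import Path


def install_plugin_zip(zip_path, plugins_dir, plugin_dir_name):
    """Hypothetical helper: extract a plugin zip into <plugins_dir>/<plugin_dir_name>.

    Same effect as the unzip + mv commands: the zip contents end up directly
    under the new plugin directory. Elasticsearch must be restarted afterwards.
    """
    target = Path(plugins_dir) / plugin_dir_name
    if target.exists():
        # Refuse to overwrite an already installed plugin directory.
        raise FileExistsError(f"{target} already exists")
    with zipfile.ZipFile(zip_path) as zf:
        # extractall creates the target directory as needed.
        zf.extractall(target)
    return target
```

For example, install_plugin_zip("elasticsearch-analysis-ik-5.4.3.zip", "/path/to/elasticsearch-5.4.3/plugins", "ik") (the path is a placeholder), followed by a restart.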

  • Installing the pinyin analyzer

Installing the pinyin analyzer is more involved, because you have to compile and package it from source yourself.

Download and build the source:

git clone https://github.com/medcl/elasticsearch-analysis-pinyin.git
cd elasticsearch-analysis-pinyin
mvn clean install -Dmaven.test.skip

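Install the pinyin analyzer. The version in the zip file name must match your Elasticsearch version, because Elasticsearch will not load a plugin built for a different version. Here the build produced a 5.5.1 zip, which does not match 5.4.3, so check out the source that corresponds to your Elasticsearch version before running mvn: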

cd target/releases
unzip elasticsearch-analysis-pinyin-5.5.1.zip
mv elasticsearch elasticsearch-analysis-pinyin
mv elasticsearch-analysis-pinyin $ES_HOME/plugins
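
Before restarting, it is worth comparing the version in the zip name with the version the node reports (the version.number field returned by GET http://localhost:9200/). A minimal sketch, assuming release zips are named <name>-<version>.zip as in the commands above; both helper names are hypothetical.

```python
import re


def plugin_version_from_zip(zip_name):
    """Return the trailing x.y.z version from a name like '...-5.4.3.zip'."""
    match = re.search(r"-(\d+\.\d+\.\d+)\.zip$", zip_name)
    if match is None:
        raise ValueError(f"no version found in {zip_name!r}")
    return match.group(1)


def check_plugin_matches_es(zip_name, es_version):
    """Raise RuntimeError if the zip was built for another Elasticsearch version."""
    plugin_version = plugin_version_from_zip(zip_name)
    if plugin_version != es_version:
        raise RuntimeError(
            f"{zip_name} was built for Elasticsearch {plugin_version}, "
            f"but the node runs {es_version}"
        )
    return plugin_version
```

With the names from this post, check_plugin_matches_es("elasticsearch-analysis-pinyin-5.5.1.zip", "5.4.3") raises, because the build produced a 5.5.1 zip while the node runs 5.4.3.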

Restart Elasticsearch; the pinyin analyzer is now installed.
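
After the restart you can confirm that both plugins were loaded with the _cat/plugins API. The sketch assumes the plugin names are analysis-ik and analysis-pinyin and that _cat/plugins?format=json returns one row per node and plugin with the plugin name in the component column; verify both assumptions against your own output.

```python
import json
from urllib.request import urlopen

# Assumed plugin names; compare with the "component" column of your output.
REQUIRED_PLUGINS = {"analysis-ik", "analysis-pinyin"}


def fetch_cat_plugins(base_url="http://localhost:9200"):
    """GET /_cat/plugins as JSON (one row per node/plugin pair)."""
    with urlopen(f"{base_url}/_cat/plugins?format=json") as resp:
        return json.load(resp)


def missing_plugins(rows, required=REQUIRED_PLUGINS):
    """Return the required plugin names that no row reports.

    This is a single-node check: on a cluster, every node should list both.
    """
    installed = {row["component"] for row in rows}
    return set(required) - installed
```

On a correctly set up node, missing_plugins(fetch_cat_plugins()) should return an empty set.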

  • Creating the index

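Create the index and configure its analysis settings. The custom analyzer ik_pinyin_analyzer splits text into words with the ik_smart tokenizer, then passes each word through the my_pinyin filter (pinyin conversion) and the built-in word_delimiter filter. The first_letter and padding_char options come from older releases of the pinyin plugin; later releases document options such as keep_first_letter and keep_full_pinyin instead, so check the README of the version you actually build: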

curl -XPUT "http://localhost:9200/medcl/" -H 'Content-Type: application/json' -d'
{
    "index": {
        "analysis": {
            "analyzer": {
                "ik_pinyin_analyzer": {
                    "type": "custom",
                    "tokenizer": "ik_smart",
                    "filter": ["my_pinyin", "word_delimiter"]
                }
            },
            "filter": {
                "my_pinyin": {
                    "type": "pinyin",
                    "first_letter": "prefix",
                    "padding_char": " "
                }
            }
        }
    }
}'
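
The same index can be created from Python with only the standard library. In this sketch es_request is a hypothetical helper (not part of any client library); it sends the settings body from the curl command unchanged and sets Content-Type: application/json, which Elasticsearch 5.x accepts and 6.0 and later require.

```python
import json
from urllib.request import Request, urlopen


def ik_pinyin_settings():
    """Index settings identical to the curl body above."""
    return {
        "index": {
            "analysis": {
                "analyzer": {
                    "ik_pinyin_analyzer": {
                        "type": "custom",
                        "tokenizer": "ik_smart",
                        "filter": ["my_pinyin", "word_delimiter"],
                    }
                },
                "filter": {
                    "my_pinyin": {
                        "type": "pinyin",
                        "first_letter": "prefix",
                        "padding_char": " ",
                    }
                },
            }
        }
    }


def es_request(method, url, body=None):
    """Hypothetical helper: send JSON to Elasticsearch and decode the JSON reply."""
    data = None
    if body is not None:
        data = json.dumps(body, ensure_ascii=False).encode("utf-8")
    req = Request(url, data=data, method=method,
                  headers={"Content-Type": "application/json"})
    with urlopen(req) as resp:
        return json.load(resp)
```

Usage: es_request("PUT", "http://localhost:9200/medcl", ik_pinyin_settings()).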

  • Creating the type mapping

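Create a type (folks) and set its mapping. The name field is an un-analyzed keyword, and its sub-field name.pinyin is a text field analyzed with ik_pinyin_analyzer, so the searches below target name.pinyin. Note that index-time boosting is deprecated as of Elasticsearch 5.0; a boost in the query is the recommended alternative to "boost": 10 in the mapping: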

curl -XPOST http://localhost:9200/medcl/folks/_mapping -H 'Content-Type: application/json' -d'
{
    "folks": {
        "properties": {
            "name": {
                "type": "keyword",
                "fields": {
                    "pinyin": {
                        "type": "text",
                        "store": false,
                        "term_vector": "with_positions_offsets",
                        "analyzer": "ik_pinyin_analyzer",
                        "boost": 10
                    }
                }
            }
        }
    }
}'
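
Before indexing anything, it can help to look at the tokens ik_pinyin_analyzer actually produces. Elasticsearch 5.x exposes this through the index-level _analyze API (POST /medcl/_analyze with a JSON body holding analyzer and text). The sketch only builds that request and extracts the token strings from the reply; the helper names are hypothetical, and no token list is shown because it depends on the plugin versions and options you built.

```python
def analyze_request(text, index="medcl", analyzer="ik_pinyin_analyzer",
                    base_url="http://localhost:9200"):
    """Hypothetical helper: return (url, body) for the index-level _analyze API."""
    url = f"{base_url}/{index}/_analyze"
    body = {"analyzer": analyzer, "text": text}
    return url, body


def token_strings(analyze_response):
    """Extract the token strings from an _analyze reply, in order."""
    return [tok["token"] for tok in analyze_response["tokens"]]
```

POST the body to the URL with Content-Type: application/json (with curl or urllib.request, for example) and pass the decoded reply to token_strings.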

  • Creating documents

Index two documents:

curl -XPOST http://localhost:9200/medcl/folks/andy -H 'Content-Type: application/json' -d'{"name":"刘德华"}'
curl -XPOST http://localhost:9200/medcl/folks/tina -H 'Content-Type: application/json' -d'{"name":"中华人民共和国国歌"}'
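
With more than a handful of documents, the _bulk API saves one request per document. A minimal sketch that builds the newline-delimited request body for the two documents above; bulk_body is a hypothetical helper, and the resulting string would be POSTed to http://localhost:9200/_bulk.

```python
import json


def bulk_body(index, doc_type, docs):
    """Hypothetical helper: one action line plus one source line per document."""
    lines = []
    for doc_id, source in docs.items():
        action = {"index": {"_index": index, "_type": doc_type, "_id": doc_id}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(source, ensure_ascii=False))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


body = bulk_body("medcl", "folks", {
    "andy": {"name": "刘德华"},
    "tina": {"name": "中华人民共和国国歌"},
})
print(body, end="")
```

Encode the string as UTF-8 before sending it.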

  • Testing pinyin analysis

Each of the following four queries finds the document "刘德华" (Liu Dehua):

curl -XPOST "http://localhost:9200/medcl/folks/_search?q=name.pinyin:liu"
curl -XPOST "http://localhost:9200/medcl/folks/_search?q=name.pinyin:de"
curl -XPOST "http://localhost:9200/medcl/folks/_search?q=name.pinyin:hua"
curl -XPOST "http://localhost:9200/medcl/folks/_search?q=name.pinyin:ldh"

Sample result:

{
    "took": 2, 
    "timed_out": false, 
    "_shards": {
        "total": 5, 
        "successful": 5, 
        "failed": 0
    }, 
    "hits": {
        "total": 1, 
        "max_score": 0.85669875, 
        "hits": [
            {
                "_index": "medcl", 
                "_type": "folks", 
                "_id": "andy", 
                "_score": 0.85669875, 
                "_source": {
                    "name": "刘德华"
                }
            }
        ]
    }
}
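
When scripting these checks, the relevant parts of a search reply are the _id, _score and _source of each entry in hits.hits. A minimal sketch with two hypothetical helpers: search_url builds the URL-encoded form of the ?q=name.pinyin:liu style queries, and hit_summary flattens a reply. The sample dict is a trimmed copy of the result above.

```python
from urllib.parse import urlencode


def search_url(field, term, base_url="http://localhost:9200/medcl/folks/_search"):
    """URL-encoded form of queries such as ?q=name.pinyin:liu."""
    query = urlencode({"q": f"{field}:{term}"})
    return f"{base_url}?{query}"


def hit_summary(response):
    """Return (_id, _score, name) for each hit in a search response."""
    return [
        (hit["_id"], hit["_score"], hit["_source"].get("name"))
        for hit in response["hits"]["hits"]
    ]


# Trimmed copy of the sample result above.
sample = {
    "hits": {
        "total": 1,
        "max_score": 0.85669875,
        "hits": [
            {"_index": "medcl", "_type": "folks", "_id": "andy",
             "_score": 0.85669875, "_source": {"name": "刘德华"}},
        ],
    }
}

print(search_url("name.pinyin", "ldh"))
# http://localhost:9200/medcl/folks/_search?q=name.pinyin%3Aldh
print(hit_summary(sample))
# [('andy', 0.85669875, '刘德华')]
```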

  • Testing ik analysis

Send the request:

curl -XPOST "http://localhost:9200/medcl/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "query": {
    "match": {
      "name.pinyin": "国歌"
    }
  },
  "highlight": {
    "fields": {
      "name.pinyin": {}
    }
  }
}'
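
The request body above is a match query on name.pinyin plus a highlight section for the same field. A hypothetical helper that builds it, so the same request can be reused for the ik + pinyin test by changing only the query text:

```python
import json


def match_with_highlight(text, field="name.pinyin"):
    """Match query on `field` plus a highlight section for the same field."""
    return {
        "query": {"match": {field: text}},
        "highlight": {"fields": {field: {}}},
    }


print(json.dumps(match_with_highlight("国歌"), ensure_ascii=False))
# {"query": {"match": {"name.pinyin": "国歌"}}, "highlight": {"fields": {"name.pinyin": {}}}}
```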

Result:

{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 9.507006,
    "hits" : [
      {
        "_index" : "medcl",
        "_type" : "folks",
        "_id" : "tina",
        "_score" : 9.507006,
        "_source" : {
          "name" : "中华人民共和国国歌"
        },
        "highlight" : {
          "name.pinyin" : [
            "<em>中华人民共和国</em><em>国歌</em>"
          ]
        }
      }
    ]
  }
}
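
The highlight section is keyed by field name and holds a list of fragments for each hit. A hypothetical helper that maps each hit's _id to its fragments for name.pinyin, shown on a trimmed copy of the result above:

```python
def highlights(response, field="name.pinyin"):
    """Map each hit's _id to its highlight fragments for `field` ([] if none)."""
    return {
        hit["_id"]: hit.get("highlight", {}).get(field, [])
        for hit in response["hits"]["hits"]
    }


# Trimmed copy of the ik test result above.
sample = {
    "hits": {
        "hits": [
            {
                "_id": "tina",
                "_source": {"name": "中华人民共和国国歌"},
                "highlight": {
                    "name.pinyin": ["<em>中华人民共和国</em><em>国歌</em>"]
                },
            }
        ]
    }
}

print(highlights(sample))
# {'tina': ['<em>中华人民共和国</em><em>国歌</em>']}
```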

  • Testing ik + pinyin analysis

Send the request:

curl -XPOST "http://localhost:9200/medcl/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "query": {
    "match": {
      "name.pinyin": "zhonghua"
    }
  },
  "highlight": {
    "fields": {
      "name.pinyin": {}
    }
  }
}'

Result:

{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 3,
    "max_score" : 6.188843,
    "hits" : [
      {
        "_index" : "medcl",
        "_type" : "folks",
        "_id" : "tina",
        "_score" : 6.188843,
        "_source" : {
          "name" : "中华人民共和国国歌"
        },
        "highlight" : {
          "name.pinyin" : [
            "<em>中华人民共和国</em>国歌"
          ]
        }
      },
      {
        "_index" : "medcl",
        "_type" : "folks",
        "_id" : "3",
        "_score" : 3.0490103,
        "_source" : {
          "@timestamp" : "2017-07-13T06:42:00.203Z",
          "last_modify_time" : "2017-07-13T02:52:53.000Z",
          "name" : "可能猜到可以使用iterator来删除循环中的元素",
          "@version" : "1",
          "id" : 3,
          "type" : "jdbc"
        },
        "highlight" : {
          "name.pinyin" : [
            "可能猜到可以使用iterator来删除循<em>环中</em>的元素"
          ]
        }
      },
      {
        "_index" : "medcl",
        "_type" : "folks",
        "_id" : "andy",
        "_score" : 0.22534128,
        "_source" : {
          "name" : "刘德华"
        },
        "highlight" : {
          "name.pinyin" : [
            "<em>刘德华</em>"
          ]
        }
      }
    ]
  }
}

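P.S. My test index held a few extra documents, so ignore the second hit (_id 3) in this result; that document is not created anywhere in this post.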
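
The highlighted spans show which pieces of the original text matched the query. A small sketch that pulls them out of a fragment with a regular expression; emphasized is a hypothetical helper, and it assumes the default <em> tags with no nested markup.

```python
import re

# Non-greedy match so adjacent <em> spans are returned separately.
EM_SPAN = re.compile(r"<em>(.*?)</em>")


def emphasized(fragment):
    """Return the text inside each <em>...</em> span of a highlight fragment."""
    return EM_SPAN.findall(fragment)


print(emphasized("<em>中华人民共和国</em>国歌"))
# ['中华人民共和国']
```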

Reposted from: https://my.oschina.net/panswforlldx/blog/1493062