Using the IK Analyzer with Elasticsearch

SteveGao2013

Last modified: 2023-05-07 23:13:48

Tags: elasticsearch, big data, search engine

First published: 2023-04-19 14:56:41

Article link: https://blog.csdn.net/gyk163/article/details/130232242

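For full-text search over Chinese text, Elasticsearch's default analyzer can be used, but its results are poor: the standard analyzer splits Chinese text into single characters. The IK analyzer is a better choice.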

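The IK analyzer version must match the Elasticsearch version, so choose the IK release with the same version number as your Elasticsearch node.

This article uses elasticsearch-7.16.0, so the IK analyzer version is also 7.16.0.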
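
Because the IK release follows the Elasticsearch version number, the download URL can be derived from the version. The sketch below assumes that the URL pattern of the 7.16.0 download link used in this article also holds for the version you pass in, which may not be true for every release. `ik_release_url` is a hypothetical helper name, not part of Elasticsearch or IK.

```shell
# Build the IK release URL for a given Elasticsearch version.
# Assumption: the URL pattern matches the 7.16.0 link used in this article.
ik_release_url() {
    es_version="$1"
    printf 'https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v%s/elasticsearch-analysis-ik-%s.zip\n' \
        "$es_version" "$es_version"
}

# Print the URL for the version used in this article.
ik_release_url 7.16.0
```

The version of a running node can be read from the `version.number` field returned by a plain GET on the node's root URL, for example `curl http://127.0.0.1:9200`.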

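Common ES data types

A field's data type is set by its type property. The basic data types supported by Elasticsearch include:

String types: keyword and text (changed in 5.0; previously string).
Numeric types: byte (1 byte), short (2 bytes), integer (4 bytes), long (8 bytes), float, double.
Boolean type: boolean, with the value true or false.
Date/time type: date, which stores dates and times.
Binary type: binary.
IP address type: ip, which stores IPv4 and IPv6 addresses.
Special data type: token_count, which stores the number of tokens in a string.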
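
To make the list above concrete, here is a sketch of an index body that uses one field of each type. Every field name (`title`, `sku`, and so on) and the index name `types_demo` are invented for illustration; only the type names come from the list above. Note that a `token_count` field needs an `analyzer` setting.

```shell
# Hypothetical mapping with one field per type listed above.
# All field names are made up for this example.
mapping_body='{
  "mappings": {
    "properties": {
      "title":       { "type": "text" },
      "sku":         { "type": "keyword" },
      "stock":       { "type": "integer" },
      "price":       { "type": "double" },
      "on_sale":     { "type": "boolean" },
      "created_at":  { "type": "date" },
      "thumbnail":   { "type": "binary" },
      "client_ip":   { "type": "ip" },
      "title_words": { "type": "token_count", "analyzer": "standard" }
    }
  }
}'

# Print the body; to create the index on a node assumed to be at 127.0.0.1:9200:
# curl -X PUT 'http://127.0.0.1:9200/types_demo?pretty' \
#      -H 'Content-Type: application/json' -d "$mapping_body"
printf '%s\n' "$mapping_body"
```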

There are two ways to install the IK analyzer.

1. Download the matching plugin zip and unzip it into the plugins directory, then restart ES.

cd /data/es/elasticsearch-7.16.0-node-1/plugins/
mkdir ik
cd ik
wget https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v7.16.0/elasticsearch-analysis-ik-7.16.0.zip

unzip elasticsearch-analysis-ik-7.16.0.zip

2. Install with elasticsearch-plugin (supported from v5.5.1), then restart ES.

cd /data/es/elasticsearch-7.16.0-node-1/

./bin/elasticsearch-plugin install https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v7.16.0/elasticsearch-analysis-ik-7.16.0.zip
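
After the restart it is worth confirming that the plugin was actually loaded. This is a minimal sketch, assuming the node listens on 127.0.0.1:9200 without authentication and that the second function is run from the node's home directory. The function names are illustrative.

```shell
# Assumed node address; override ES_URL for your deployment.
ES_URL="${ES_URL:-http://127.0.0.1:9200}"

# List the plugins reported by each running node (uses the _cat/plugins API).
list_node_plugins() {
    curl -s "$ES_URL/_cat/plugins?v"
}

# List the plugins installed on disk for one node.
# Run this from the node's home directory, e.g. /data/es/elasticsearch-7.16.0-node-1/.
list_installed_plugins() {
    ./bin/elasticsearch-plugin list
}
```

Calling `list_node_plugins` should show an entry for the IK plugin (expected to appear as `analysis-ik`) on every node where it was installed.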

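Test the analyzer by sending the following body to the _analyze API (for example, POST http://127.0.0.1:9200/_analyze)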

{
    "analyzer" : "ik_max_word",
    "text": "河北省石家庄市高新区虚度大道"
}

Response:

{
    "tokens": [
        {
            "token": "河北省",
            "start_offset": 0,
            "end_offset": 3,
            "type": "CN_WORD",
            "position": 0
        },
        {
            "token": "河北",
            "start_offset": 0,
            "end_offset": 2,
            "type": "CN_WORD",
            "position": 1
        },
        {
            "token": "省",
            "start_offset": 2,
            "end_offset": 3,
            "type": "CN_CHAR",
            "position": 2
        },
        {
            "token": "石家庄市",
            "start_offset": 3,
            "end_offset": 7,
            "type": "CN_WORD",
            "position": 3
        },
        {
            "token": "石家庄",
            "start_offset": 3,
            "end_offset": 6,
            "type": "CN_WORD",
            "position": 4
        },
        {
            "token": "家庄",
            "start_offset": 4,
            "end_offset": 6,
            "type": "CN_WORD",
            "position": 5
        },
        {
            "token": "市",
            "start_offset": 6,
            "end_offset": 7,
            "type": "CN_CHAR",
            "position": 6
        },
        {
            "token": "高新区",
            "start_offset": 7,
            "end_offset": 10,
            "type": "CN_WORD",
            "position": 7
        },
        {
            "token": "高新",
            "start_offset": 7,
            "end_offset": 9,
            "type": "CN_WORD",
            "position": 8
        },
        {
            "token": "新区",
            "start_offset": 8,
            "end_offset": 10,
            "type": "CN_WORD",
            "position": 9
        },
        {
            "token": "虚度",
            "start_offset": 10,
            "end_offset": 12,
            "type": "CN_WORD",
            "position": 10
        },
        {
            "token": "大道",
            "start_offset": 12,
            "end_offset": 14,
            "type": "CN_WORD",
            "position": 11
        }
    ]
}
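
IK provides two analyzers: ik_max_word splits text at the finest granularity, as in the response above, while ik_smart splits at the coarsest granularity. The sketch below wraps the _analyze call so the two can be compared on the same text. It assumes a node at 127.0.0.1:9200 without authentication; `analyze_body` and `analyze` are hypothetical helper names.

```shell
# Assumed node address; override ES_URL for your deployment.
ES_URL="${ES_URL:-http://127.0.0.1:9200}"

# Build the _analyze request body for an analyzer ($1) and a text ($2).
# Note: this simple helper does not escape quotes or backslashes in the text.
analyze_body() {
    printf '{"analyzer": "%s", "text": "%s"}' "$1" "$2"
}

# Send the body to the _analyze API and print the tokens.
analyze() {
    curl -s -X POST "$ES_URL/_analyze?pretty" \
        -H 'Content-Type: application/json' \
        -d "$(analyze_body "$1" "$2")"
}

# Example calls (require a running node with the IK plugin installed):
# analyze ik_max_word "河北省石家庄市高新区虚度大道"
# analyze ik_smart    "河北省石家庄市高新区虚度大道"
```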

Working with Elasticsearch involves several HTTP request methods, such as GET, POST, PUT and DELETE, all called in RESTful style. Here is a brief introduction to each.

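GET request: retrieves an object from the server
- Equivalent to the SQL SELECT command
- GET /test_analysis/_search searches the documents in test_analysis, returning 10 hits by default
POST request: updates an object on the server
- Equivalent to the SQL UPDATE command
- POST /test_analysis/_update/1 updates the document with id 1 in test_analysis
- POST /test_analysis/_doc also creates a document, with an auto-generated id
PUT request: creates an object on the server
- Roughly equivalent to the SQL CREATE or INSERT command
- PUT /test_analysis/_doc/<id> creates a document with the given id (or replaces it if it already exists)
DELETE request: deletes an object on the server
- Equivalent to the SQL DELETE command
- DELETE /test_analysis/_doc/1 deletes the document with id 1
HEAD request: retrieves only basic information about an object, such as whether it exists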
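
The methods above map onto curl as sketched below for single documents, using the typeless `_doc` and `_update` endpoints of Elasticsearch 7. The sketch assumes a node at 127.0.0.1:9200 without authentication and the test_analysis index; all function names are hypothetical helpers.

```shell
# Assumed node address and index; adjust for your deployment.
ES_URL="${ES_URL:-http://127.0.0.1:9200}"
INDEX="test_analysis"

# URL of the document with id $1.
doc_url() {
    printf '%s/%s/_doc/%s' "$ES_URL" "$INDEX" "$1"
}

# PUT: create (or replace) document $1 with JSON body $2.
put_doc() {
    curl -s -X PUT "$(doc_url "$1")" -H 'Content-Type: application/json' -d "$2"
}

# GET: fetch document $1.
get_doc() {
    curl -s -X GET "$(doc_url "$1")"
}

# HEAD: check whether document $1 exists (prints response headers only).
head_doc() {
    curl -s -I "$(doc_url "$1")"
}

# POST: partially update document $1; $2 is a body such as {"doc": {...}}.
update_doc() {
    curl -s -X POST "$ES_URL/$INDEX/_update/$1" -H 'Content-Type: application/json' -d "$2"
}

# DELETE: remove document $1.
delete_doc() {
    curl -s -X DELETE "$(doc_url "$1")"
}
```

For example, `put_doc 1 '{"content": "我是中国人"}'` followed by `get_doc 1` should return the stored document, provided the node is reachable.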

Create an index and specify the analyzer

http://127.0.0.1:9200/test_analysis?pretty

PUT /test_analysis

{
  "mappings": {
    "properties": {
      "content": {
        "type": "text",
        "analyzer": "ik_max_word",
        "search_analyzer": "ik_smart"
      }
    }
  }
}

View the index mapping:

curl 'http://127.0.0.1:9200/test_analysis/_mapping?pretty'

Delete the index

curl -X DELETE 'http://127.0.0.1:9200/test_analysis?pretty'

Add a document

curl -X POST 'http://elastic:dsydnn@127.0.0.1:9200/test_analysis/_doc?pretty' \
-H 'Content-Type: application/json' \
-d '
{
"content" : "我是中国人"
}'

Query:

curl -X GET 'http://127.0.0.1:9200/test_analysis/_search?pretty' \
-H 'Content-Type: application/json' \
-d '{
    "query": {
        "bool": {
            "must": [
                {
                    "term": {
                        "content": "中"
                    }
                }
            ]
        }
    }
}'
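
A term query does not analyze the search text: it matches only if exactly that token was written to the index. Because content is indexed with ik_max_word, whether a single character such as 中 matches depends on the tokens IK produced for the document. A match query instead analyzes the search text with the field's search analyzer (ik_smart in the mapping above), which is usually what full-text search needs. The sketch below assumes a node at 127.0.0.1:9200 without authentication; `match_query_body` and `search_match` are hypothetical helper names.

```shell
# Assumed node address; override ES_URL for your deployment.
ES_URL="${ES_URL:-http://127.0.0.1:9200}"

# Build a match query body for a field ($1) and a search text ($2).
# Note: this simple helper does not escape quotes or backslashes in the text.
match_query_body() {
    printf '{"query": {"match": {"%s": "%s"}}}' "$1" "$2"
}

# Run a match query on the content field of test_analysis.
search_match() {
    curl -s -X GET "$ES_URL/test_analysis/_search?pretty" \
        -H 'Content-Type: application/json' \
        -d "$(match_query_body content "$1")"
}

# Example call (requires a running node and the test_analysis index):
# search_match "中国人"
```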