Elasticsearch Common Commands Summary (Part 3)
1. The Standard Analyzer
POST _analyze
{
"analyzer": "standard",
"text": "The quick brown fox."
}
# Output:
{
"tokens" : [
{
"token" : "上",
"start_offset" : 0,
"end_offset" : 1,
"type" : "<IDEOGRAPHIC>",
"position" : 0
},
{
"token" : "海",
"start_offset" : 1,
"end_offset" : 2,
"type" : "<IDEOGRAPHIC>",
"position" : 1
},
{
"token" : "大",
"start_offset" : 2,
"end_offset" : 3,
"type" : "<IDEOGRAPHIC>",
"position" : 2
}
]
}
The standard analyzer handles English well, but as the output above shows, it breaks Chinese text into single characters (type <IDEOGRAPHIC>), which is useless for word-level search. For Chinese we use the IK analyzer instead.
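For contrast, the same analyzer on English text tokenizes on word boundaries; the expected tokens are summarized in a comment rather than the full JSON response:
POST _analyze
{
"analyzer": "standard",
"text": "The quick brown fox."
}
# Expected tokens: "the", "quick", "brown", "fox"
# (lowercased, punctuation stripped, type <ALPHANUM>)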
2. Installing the IK Analyzer
2.1 Download the IK analyzer plugin
https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v7.4.2/elasticsearch-analysis-ik-7.4.2.zip
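The plugin version must match the Elasticsearch version (7.4.2 here). For example, downloading on the host with wget:
wget https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v7.4.2/elasticsearch-analysis-ik-7.4.2.zip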
2.2 Unzip the plugin into Elasticsearch's plugins/ik directory
Since we installed Elasticsearch with Docker earlier and mapped /data/elasticsearch/plugins into the container:
# Create the target directory
mkdir -p /data/elasticsearch/plugins/ik
# Unzip elasticsearch-analysis-ik-7.4.2.zip
# into /data/elasticsearch/plugins/ik
unzip elasticsearch-analysis-ik-7.4.2.zip -d /data/elasticsearch/plugins/ik/
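To confirm the archive unpacked into the right place (the expected contents are those of the 7.4.2 release zip):
ls /data/elasticsearch/plugins/ik/
# Should list the plugin jars, plugin-descriptor.properties,
# and the config directory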
2.3 Verify that IK installed successfully
# Enter the container
docker exec -it ebbb6ee33542 /bin/bash
# List the installed plugins
/usr/share/elasticsearch/bin/elasticsearch-plugin list
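If the installation worked, the plugin shows up under its registered name:
# Expected output:
analysis-ik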
2.4 Restart Elasticsearch
docker restart ebbb6ee33542
3. Using the IK Analyzer
3.1 The smart analyzer: ik_smart
ik_smart produces the coarsest segmentation: the text is split once into non-overlapping words.
POST _analyze
{
"analyzer": "ik_smart",
"text": "上海大"
}
# Output
{
"tokens" : [
{
"token" : "上海",
"start_offset" : 0,
"end_offset" : 2,
"type" : "CN_WORD",
"position" : 0
},
{
"token" : "大",
"start_offset" : 2,
"end_offset" : 3,
"type" : "CN_CHAR",
"position" : 1
}
]
}
3.2 The fine-grained analyzer: ik_max_word
ik_max_word produces the finest segmentation: it emits every dictionary word it can find, so tokens may overlap (note 中国人, 中国, and 国人 below).
POST _analyze
{
"analyzer": "ik_max_word",
"text": "我是中国人"
}
# Output
{
"tokens" : [
{
"token" : "我",
"start_offset" : 0,
"end_offset" : 1,
"type" : "CN_CHAR",
"position" : 0
},
{
"token" : "是",
"start_offset" : 1,
"end_offset" : 2,
"type" : "CN_CHAR",
"position" : 1
},
{
"token" : "中国人",
"start_offset" : 2,
"end_offset" : 5,
"type" : "CN_WORD",
"position" : 2
},
{
"token" : "中国",
"start_offset" : 2,
"end_offset" : 4,
"type" : "CN_WORD",
"position" : 3
},
{
"token" : "国人",
"start_offset" : 3,
"end_offset" : 5,
"type" : "CN_WORD",
"position" : 4
}
]
}
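In real indices the analyzer is set per field in the mapping; a common pattern (a sketch with assumed index and field names) is to index with ik_max_word and search with ik_smart:
PUT /my_index
{
"mappings": {
"properties": {
"content": {
"type": "text",
"analyzer": "ik_max_word",
"search_analyzer": "ik_smart"
}
}
}
}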
4. Extending IK with a Custom Dictionary
4.1 Install nginx
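The original section assumes nginx is already installed. A minimal sketch in the spirit of the Docker setup above (the image tag, container name, and port/volume mappings are assumptions, not from the original):
docker run -d --name nginx -p 80:80 \
-v /data/nginx/html:/usr/share/nginx/html \
nginx:1.10
nginx only serves the dictionary file over HTTP; any static file server would work.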
4.2 Configure nginx
Create the es directory:
mkdir -p /data/nginx/html/es
Create the custom dictionary file:
vi /data/nginx/html/es/fenci.txt
With the following content (one word per line):
小黄人
尚硅谷
Test that the file is reachable:
http://192.168.103.129/es/fenci.txt
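For example, from the shell:
curl http://192.168.103.129/es/fenci.txt
# Should print the two dictionary entries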
4.3 Configure IK
# Open the configuration file
vi /data/elasticsearch/plugins/ik/config/IKAnalyzer.cfg.xml
# Original configuration:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
<comment>IK Analyzer extension configuration</comment>
<!-- Users can configure their own extension dictionary here -->
<entry key="ext_dict"></entry>
<!-- Users can configure their own extension stopword dictionary here -->
<entry key="ext_stopwords"></entry>
<!-- Users can configure a remote extension dictionary here -->
<!-- <entry key="remote_ext_dict">words_location</entry> -->
<!-- Users can configure a remote extension stopword dictionary here -->
<!-- <entry key="remote_ext_stopwords">words_location</entry> -->
</properties>
Modified configuration:
# Uncomment the remote_ext_dict entry and point it at the dictionary file
<entry key="remote_ext_dict">http://192.168.103.129/es/fenci.txt</entry>
4.4 Restart Elasticsearch
docker restart ebbb6ee33542
The restart is needed because IKAnalyzer.cfg.xml changed. From then on IK polls the remote dictionary URL (using the HTTP Last-Modified and ETag response headers), so later edits to fenci.txt take effect without another restart.
4.5 Test
POST _analyze
{
"analyzer": "ik_max_word",
"text": "尚硅谷项目"
}
Output:
{
"tokens" : [
{
"token" : "尚硅谷",
"start_offset" : 0,
"end_offset" : 3,
"type" : "CN_WORD",
"position" : 0
},
{
"token" : "硅谷",
"start_offset" : 1,
"end_offset" : 3,
"type" : "CN_WORD",
"position" : 1
},
{
"token" : "项目",
"start_offset" : 3,
"end_offset" : 5,
"type" : "CN_WORD",
"position" : 2
}
]
}
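尚硅谷 now comes back as a single CN_WORD token, confirming the remote dictionary is loaded. The other custom entry should behave the same way; 小黄人 is presumably not in IK's built-in dictionary, but with the remote dictionary it should also be segmented as one token:
POST _analyze
{
"analyzer": "ik_max_word",
"text": "小黄人"
}
# 小黄人 should appear as a single CN_WORD token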