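I. The IK Analyzer
1. What is the IK analyzer?
Tokenization means splitting a piece of Chinese (or other) text into individual keywords. When we search, the query text is tokenized, the data in the database or index is tokenized, and the two sets of tokens are then matched against each other. The default analyzer treats every Chinese character as a separate word: for example, “我爱狂神” is split into “我”, “爱”, “狂”, “神”. This is clearly not what we want, so we install the Chinese analyzer IK to solve the problem.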
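2. Tokenization algorithms
IK provides two analyzers: ik_smart and ik_max_word.
ik_smart produces the coarsest segmentation (the fewest tokens), while ik_max_word produces the finest-grained segmentation.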
1. Coarsest segmentation: ik_smart
- Command
GET _analyze
{
  "analyzer": "ik_smart",
  "text": "我是社会主义接班人"
}
- Result
{
  "tokens" : [
    { "token" : "我", "start_offset" : 0, "end_offset" : 1, "type" : "CN_CHAR", "position" : 0 },
    { "token" : "是", "start_offset" : 1, "end_offset" : 2, "type" : "CN_CHAR", "position" : 1 },
    { "token" : "社会主义", "start_offset" : 2, "end_offset" : 6, "type" : "CN_WORD", "position" : 2 },
    { "token" : "接班人", "start_offset" : 6, "end_offset" : 9, "type" : "CN_WORD", "position" : 3 }
  ]
}
2. Finest-grained segmentation: ik_max_word
- Command
GET _analyze
{
  "analyzer": "ik_max_word",
  "text": "我是社会主义接班人"
}
- Result
{
  "tokens" : [
    { "token" : "我", "start_offset" : 0, "end_offset" : 1, "type" : "CN_CHAR", "position" : 0 },
    { "token" : "是", "start_offset" : 1, "end_offset" : 2, "type" : "CN_CHAR", "position" : 1 },
    { "token" : "社会主义", "start_offset" : 2, "end_offset" : 6, "type" : "CN_WORD", "position" : 2 },
    { "token" : "社会", "start_offset" : 2, "end_offset" : 4, "type" : "CN_WORD", "position" : 3 },
    { "token" : "主义", "start_offset" : 4, "end_offset" : 6, "type" : "CN_WORD", "position" : 4 },
    { "token" : "接班人", "start_offset" : 6, "end_offset" : 9, "type" : "CN_WORD", "position" : 5 },
    { "token" : "接班", "start_offset" : 6, "end_offset" : 8, "type" : "CN_WORD", "position" : 6 },
    { "token" : "人", "start_offset" : 8, "end_offset" : 9, "type" : "CN_CHAR", "position" : 7 }
  ]
}
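To compare the two analyzers from a script, the token strings can be pulled out of an `_analyze` response. The following is a minimal Python sketch and not part of the original notes: the helper names `analyze_body` and `extract_tokens` are hypothetical, and the sample response is the `ik_smart` result from these notes reduced to the two fields used here. Sending the request to a cluster is left out.

```python
import json


def analyze_body(analyzer, text):
    """Build the JSON body for a `GET _analyze` request."""
    return json.dumps({"analyzer": analyzer, "text": text}, ensure_ascii=False)


def extract_tokens(response):
    """Return the token strings of an _analyze response, in order."""
    return [entry["token"] for entry in response["tokens"]]


# ik_smart response from the notes, reduced to token and position.
smart_response = {
    "tokens": [
        {"token": "我", "position": 0},
        {"token": "是", "position": 1},
        {"token": "社会主义", "position": 2},
        {"token": "接班人", "position": 3},
    ]
}

print(analyze_body("ik_smart", "我是社会主义接班人"))
print(extract_tokens(smart_response))
```

Running the same extraction on an `ik_max_word` response gives a longer list, which makes the difference in granularity easy to see.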
II. Using REST Commands
1. REST style overview
Method | URL | Description |
---|---|---|
PUT | localhost:9200/{index}/{type}/{id} | Create a document (with the given document id) |
POST | localhost:9200/{index}/{type} | Create a document (with a randomly generated id) |
POST | localhost:9200/{index}/{type}/{id}/_update | Update a document |
DELETE | localhost:9200/{index}/{type}/{id} | Delete a document |
GET | localhost:9200/{index}/{type}/{id} | Get a document by its id |
POST | localhost:9200/{index}/{type}/_search | Query all data |
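The URL patterns in the table all follow the same shape, which can be sketched as a small helper. This is only an illustration: `BASE` and `doc_url` are hypothetical names, and the base address assumes a local node on port 9200 as in the table.

```python
BASE = "http://localhost:9200"  # assumption: local single-node setup


def doc_url(index, doc_type, doc_id=None, action=None):
    """Join index, type, optional id and optional action into a REST URL."""
    parts = [BASE, index, doc_type]
    if doc_id is not None:
        parts.append(str(doc_id))
    if action is not None:
        parts.append(action)
    return "/".join(parts)


# One (method, URL) pair per row of the table, for index "es" and type "test".
routes = {
    "create_with_id": ("PUT", doc_url("es", "test", 1)),
    "create_random_id": ("POST", doc_url("es", "test")),
    "update": ("POST", doc_url("es", "test", 1, "_update")),
    "delete": ("DELETE", doc_url("es", "test", 1)),
    "get": ("GET", doc_url("es", "test", 1)),
    "search": ("POST", doc_url("es", "test", action="_search")),
}

for name, (method, url) in routes.items():
    print(name, method, url)
```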
2. Basic tests
- Create an index
# Command
PUT es1
{
  "mappings": {
    "properties": {
      "name": { "type": "text" },
      "age": { "type": "integer" },
      "birthday": { "type": "date" }
    }
  }
}
# Result
{ "acknowledged" : true, "shards_acknowledged" : true, "index" : "es1" }
- Delete an index
# Command
DELETE es1
# Result
{ "acknowledged" : true }
- Create a document
localhost:9200/{index}/{type}/{id}
# Command
PUT /es/test/1
{
  "name": "张三",
  "age": 22
}
# Result
{
  "_index" : "es",
  "_type" : "test",
  "_id" : "1",
  "_version" : 1,
  "result" : "created", // the document was created
  "_shards" : { "total" : 2, "successful" : 1, "failed" : 0 },
  "_seq_no" : 0,
  "_primary_term" : 1
}
- View a document
# Commands
GET es
GET es/test/1
# Result of GET es/test/1
{
  "_index" : "es",
  "_type" : "test",
  "_id" : "1",
  "_version" : 1,
  "_seq_no" : 0,
  "_primary_term" : 1,
  "found" : true,
  "_source" : { "name" : "张三", "age" : 22 }
}
- Update a document
# Command 1: PUT. The body must also include the fields that are not being changed; otherwise they are lost
PUT es/test/1
{
  "name": "李四",
  "age": 17
}
GET es/test/1
# Result 1
{
  "_index" : "es",
  "_type" : "test",
  "_id" : "1",
  "_version" : 2, // the version number is incremented
  "_seq_no" : 1,
  "_primary_term" : 1,
  "found" : true,
  "_source" : { "name" : "李四", "age" : 17 }
}
# Command 2: POST. The body only needs to contain the fields being changed
POST es/test/1/_update
{
  "doc": { "name": "王五" }
}
GET es/test/1
# Result 2
{
  "_index" : "es",
  "_type" : "test",
  "_id" : "1",
  "_version" : 3,
  "_seq_no" : 2,
  "_primary_term" : 1,
  "found" : true,
  "_source" : { "name" : "王五", "age" : 17 }
}
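The difference between the two update styles can be modelled with plain dictionaries. This is a sketch of the semantics only, not of how Elasticsearch is implemented: `put_document` and `post_update` are hypothetical names, and the shallow merge models only flat documents like the one in these notes.

```python
def put_document(stored, body):
    """PUT: the request body replaces the stored document entirely."""
    return dict(body)


def post_update(stored, partial):
    """POST _update: fields in `partial` overwrite, all other fields are kept."""
    merged = dict(stored)
    merged.update(partial)
    return merged


doc = {"name": "李四", "age": 17}

# Sending only the changed field with PUT drops "age".
print(put_document(doc, {"name": "王五"}))

# Sending only the changed field with _update keeps "age".
print(post_update(doc, {"name": "王五"}))
```

This is why the PUT example repeats `age` in its body while the `_update` example does not.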
III. Queries
1. Simple queries
- Query a specific field
# Find documents whose name contains 四 or 五
GET /es/test/_search
{
  "query": {
    "match": { "name": "四 五" }
  }
}
- Return only the specified fields
# Method 1: return only the name field; _source lists the fields to include
GET /es/test/_search
{
  "query": {
    "match": { "name": "四 五" }
  },
  "_source": ["name"]
}
# Method 2: includes lists the fields to include
GET /es/test/_search
{
  "query": {
    "match": { "name": "四 五" }
  },
  "_source": {
    "includes": ["name"]
  }
}
- Exclude the specified fields
# Command: excludes lists the fields to leave out of the result
GET /es/test/_search
{
  "query": {
    "match": { "name": "四 五" }
  },
  "_source": {
    "excludes": ["age"]
  }
}
- Sorting
# Sort by age in descending order
GET /es/test/_search
{
  "sort": {
    "age": { "order": "desc" }
  }
}
- Pagination
# Starting at offset 2 (offsets are zero-based), return 3 documents
GET /es/test/_search
{
  "sort": {
    "age": { "order": "desc" }
  },
  "from": 2,
  "size": 3
}
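The relation between a page number and the `from`/`size` pair can be sketched as follows. The helper names `page_body` and `apply_window` and the list of ages are made up for illustration; `apply_window` only imitates on a local list what the offset and size select from the sorted hits.

```python
def page_body(page, size, sort_field="age", order="desc"):
    """Build a search body for a 1-based page number."""
    if page < 1 or size < 1:
        raise ValueError("page and size must be positive")
    return {
        "sort": {sort_field: {"order": order}},
        "from": (page - 1) * size,  # zero-based offset of the first hit
        "size": size,
    }


def apply_window(rows, body):
    """Imitate from/size on an already sorted local list."""
    start = body["from"]
    return rows[start:start + body["size"]]


ages_desc = [30, 28, 25, 23, 22, 17]  # hypothetical ages, already sorted desc

print(page_body(2, 3))
print(apply_window(ages_desc, {"from": 2, "size": 3}))
```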
2. Multi-condition queries
- must (and): all conditions must be satisfied
# Find documents whose name contains "王" and whose age = 23
GET /es/test/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "name": "王" } },
        { "match": { "age": "23" } }
      ]
    }
  }
}
- should (or): satisfying any one condition is enough
# Find documents whose name contains "王" or whose age = 25
GET /es/test/_search
{
  "query": {
    "bool": {
      "should": [
        { "match": { "name": "王" } },
        { "match": { "age": "25" } }
      ]
    }
  }
}
- must_not (not): exclude the matching documents
# must_not: exclude documents that match any of the listed conditions
GET /es/test/_search
{
  "query": {
    "bool": {
      "must_not": [
        { "match": { "name": "王" } },
        { "match": { "age": "25" } }
      ]
    }
  }
}
- Range conditions
gt: greater than; gte: greater than or equal to; lt: less than; lte: less than or equal to
# Range condition: 22 < age < 25
GET /es/test/_search
{
  "query": {
    "bool": {
      "filter": {
        "range": {
          "age": { "gt": 22, "lt": 25 }
        }
      }
    }
  }
}
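The four clause types combine in a single `bool` query, so request bodies like the ones in this section can be assembled from small pieces. A minimal sketch, assuming hypothetical helper names (`match`, `range_`, `bool_query`); it always emits clauses as lists, which is an equally valid form of the single-object `filter` used in the range example.

```python
def match(field, value):
    """A match clause on one field."""
    return {"match": {field: value}}


def range_(field, **bounds):
    """A range clause; bounds are gt, gte, lt, lte."""
    return {"range": {field: bounds}}


def bool_query(must=(), should=(), must_not=(), filter_=()):
    """Assemble a bool query, omitting clause types that are empty."""
    clauses = {
        "must": list(must),
        "should": list(should),
        "must_not": list(must_not),
        "filter": list(filter_),
    }
    return {"query": {"bool": {k: v for k, v in clauses.items() if v}}}


# name contains "王" AND 22 < age < 25
body = bool_query(must=[match("name", "王")],
                  filter_=[range_("age", gt=22, lt=25)])
print(body)
```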
- Matching multiple terms
# Separate the terms with spaces; a document matches if it satisfies any one of them
GET /es/test/_search
{
  "query": {
    "match": { "name": "张 王" }
  }
}
IV. Tokenization
1. Overview
- term: exact query; the query text is not analyzed
- match: the query text is parsed by an analyzer
2. Example
- Create an index
# text: the field is tokenized for search
# keyword: the field is not tokenized
PUT /t1
{
  "mappings": {
    "properties": {
      "name": { "type": "text" },
      "content": { "type": "keyword" }
    }
  }
}
- term: returns only exact matches
GET /t1/_doc/_search
{
  "query": {
    "bool": {
      "should": [
        { "term": { "name": "用户" } },
        { "term": { "content": "呼吸" } }
      ]
    }
  }
}
# Result
{
  "took" : 0,
  "timed_out" : false,
  "_shards" : { "total" : 1, "successful" : 1, "skipped" : 0, "failed" : 0 },
  "hits" : {
    "total" : { "value" : 1, "relation" : "eq" },
    "max_score" : 0.6931471,
    "hits" : [
      {
        "_index" : "t1",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 0.6931471,
        "_source" : { "name" : "用户1", "content" : "呼吸" }
      }
    ]
  }
}
- match: the query text is parsed by an analyzer
# Although this is a match query, content is a keyword field, so the query text is still not tokenized
GET /t1/_doc/_search
{
  "query": {
    "bool": {
      "should": [
        { "match": { "content": "呼吸" } }
      ]
    }
  }
}
# Result
{
  "took" : 0,
  "timed_out" : false,
  "_shards" : { "total" : 1, "successful" : 1, "skipped" : 0, "failed" : 0 },
  "hits" : {
    "total" : { "value" : 1, "relation" : "eq" },
    "max_score" : 0.6931471,
    "hits" : [
      {
        "_index" : "t1",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 0.6931471,
        "_source" : { "name" : "用户1", "content" : "呼吸" }
      }
    ]
  }
}
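Why `term` on the text field `name` finds nothing for "用户" while `term` on the keyword field `content` finds "呼吸" can be illustrated with a toy model. This is a deliberately simplified sketch, not Elasticsearch's implementation: all function names are hypothetical, the analyzer is approximated as one token per character (the default behaviour for Chinese described at the start of these notes), and lowercasing and scoring are ignored.

```python
def analyze_text(value):
    """Toy analyzer: one token per non-space character."""
    return [ch for ch in value if not ch.isspace()]


def index_terms(value, field_type):
    """text fields are tokenized; keyword fields are kept as one term."""
    if field_type == "text":
        return set(analyze_text(value))
    return {value}


def term_matches(query, value, field_type):
    """term: the query string is compared to the indexed terms as-is."""
    return query in index_terms(value, field_type)


def match_matches(query, value, field_type):
    """match: the query is analyzed like the field; any shared term matches."""
    if field_type == "text":
        query_terms = set(analyze_text(query))
    else:
        query_terms = {query}
    return bool(query_terms & index_terms(value, field_type))


print(term_matches("用户", "用户1", "text"))      # no single indexed term equals "用户"
print(term_matches("呼吸", "呼吸", "keyword"))    # the whole value is one term
print(match_matches("用户", "用户1", "text"))     # query tokens overlap field tokens
print(match_matches("呼", "呼吸", "keyword"))     # a partial value does not match
```

In this model, the single hit in the `term` example comes from the `content` clause alone.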
V. Highlighting
- Example
# pre_tags and post_tags define a custom highlight style
GET /t1/_search
{
  "query": {
    "bool": {
      "should": [
        { "match": { "name": "3" } }
      ]
    }
  },
  "highlight": {
    "pre_tags": "<strong style='color:red'>",
    "post_tags": "</strong>",
    "fields": {
      "name": {}
    }
  }
}
# Result
{
  "took" : 0,
  "timed_out" : false,
  "_shards" : { "total" : 1, "successful" : 1, "skipped" : 0, "failed" : 0 },
  "hits" : {
    "total" : { "value" : 1, "relation" : "eq" },
    "max_score" : 1.2039728,
    "hits" : [
      {
        "_index" : "t1",
        "_type" : "_doc",
        "_id" : "3",
        "_score" : 1.2039728,
        "_source" : { "name" : "用户3", "content" : "呼吸呼吸" },
        "highlight" : {
          "name" : [ "用户<strong style='color:red'>3</strong>" ]
        }
      }
    ]
  }
}
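The effect of `pre_tags` and `post_tags` can be reproduced locally to see what the highlighted fragment looks like. The sketch below uses hypothetical helper names; `highlight_body` builds a simplified request with a plain `match` query instead of the `bool`/`should` wrapper, and `wrap_matches` is a naive string replacement, not the highlighter Elasticsearch uses.

```python
def highlight_body(field, text, pre_tag, post_tag):
    """Build a search body that highlights matches in one field."""
    return {
        "query": {"match": {field: text}},
        "highlight": {
            "pre_tags": [pre_tag],
            "post_tags": [post_tag],
            "fields": {field: {}},
        },
    }


def wrap_matches(value, term, pre_tag, post_tag):
    """Naive local imitation: wrap every occurrence of `term` in the tags."""
    return value.replace(term, pre_tag + term + post_tag)


PRE = "<strong style='color:red'>"
POST = "</strong>"

print(highlight_body("name", "3", PRE, POST))
print(wrap_matches("用户3", "3", PRE, POST))
```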