目录
1、一个是通过使用 REST request URI 发送搜索参数(uri+检索参数)
2、Query DSL :另一个是通过使用 REST request body 来发送它们(uri+请求体)推荐
小提示
如果你重启了虚拟机或者服务器,那么可能会发现访问es服务器还有kibana都失败,这是因为重启虚拟机或者服务器,docker容器并不会自动开启。怎么证明呢?
通过docker ps命令查看,证明确实没有运行es容器
于是我们只需要通过docker start es容器名 即可开启容器
如何让服务器或虚拟机重启自动开启es容器?
docker update es容器名 --restart=always
一、ES 支持两种基本方式检索
1、一个是通过使用 REST request URI 发送搜索参数(uri+检索参数)
GET bank/_search?q=*&sort=account_number:asc
{
"took" : 11,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1000,
"relation" : "eq"
},
"max_score" : null,
"hits" : [
{
"_index" : "bank",
"_type" : "account",
"_id" : "0",
"_score" : null,
"_source" : {
"account_number" : 0,
"balance" : 16623,
"firstname" : "Bradshaw",
"lastname" : "Mckenzie",
"age" : 29,
"gender" : "F",
"address" : "244 Columbus Place",
"employer" : "Euron",
"email" : "bradshawmckenzie@euron.com",
"city" : "Hobucken",
"state" : "CO"
},
"sort" : [
0
]
},
.........................
]
}
}
2、Query DSL :另一个是通过使用 REST request body 来发送它们(uri+请求体)推荐
GET bank/_search
{
"query": {
"match_all": {}
},
"sort": [
{
"account_number": {"order": "desc"}
}
]
}
二、 详解:Qurey DSL
1、基本语法格式
Elasticsearch 提供了一个可以执行查询的 Json 风格的 DSL(domain-specific language 领域特定语言)。这个被称为 Query DSL。该查询语言非常全面,并且刚开始的时候感觉有点复杂,
真正学好它的方法是从一些基础的示例开始的。
一个查询语句的典型结构
{
QUERY_NAME: {
ARGUMENT: VALUE,
ARGUMENT: VALUE,...
}
}
GET bank/_search
{
"query": {
"match_all": {}
}
}
如果是针对某个字段,那么它的结构如下:
{
QUERY_NAME: {
FIELD_NAME: {
ARGUMENT: VALUE,
ARGUMENT: VALUE,...
}
}
}
GET bank/_search
{
"query": {
"match_all": {}
},
"sort": [
{
"balance": {
"order": "desc"
}
}
]
}
<!-- 简写方式 -->
GET bank/_search
{
"query": {
"match_all": {}
},
"sort": [
{
"balance":"desc"
}
]
}
<!-- from size相当于mysql中的limit x,x -->
GET bank/_search
{
"query": {
"match_all": {}
},
"sort": [
{
"balance": {
"order": "desc"
}
}
],
"from": 0,
"size": 5
}
GET bank/_search
{
"query": {
"match_all": {}
},
"sort": [
{
"balance": {
"order": "desc"
}
}
],
"from": 0,
"size": 5,
"_source": ["balance","firstname"]
}
小总结
- query 定义如何查询,
- match_all 查询类型【代表查询所有的所有】,es 中可以在 query 中组合非常多的查
- 询类型完成复杂查询
- 除了 query 参数之外,我们也可以 传递其它的参数以改变查询结果。如 sort,size
- from+size 限定,完成分页功能
- sort 排序,多字段排序,会在前序字段相等时后续字段内部排序,否则以前序为准
- _source查询出要显示的字段,如果有多个字段,用中括号[]接收
2、继续体会操作: match 【匹配查询】
2.1 基本类型(非字符串),精确匹配
GET bank/_search
{
"query": {
"match": {
"account_number": "20"
}
}
}match 返回 account_number=20 的
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "bank",
"_type" : "account",
"_id" : "20",
"_score" : 1.0,
"_source" : {
"account_number" : 20,
"balance" : 16418,
"firstname" : "Elinor",
"lastname" : "Ratliff",
"age" : 36,
"gender" : "M",
"address" : "282 Kings Place",
"employer" : "Scentric",
"email" : "elinorratliff@scentric.com",
"city" : "Ribera",
"state" : "WA"
}
}
]
}
}
2.2 字符串,全文检索
全文检索按照评分进行排序,会针对检索条件进行分词匹配
GET bank/_search
{
"query": {
"match": {
"address": "mill"
}
}
}最终查询出 address 中包含 mill 单词的所有记录
match 当搜索字符串类型的时候,会进行全文检索,并且每条记录有相关性得分。
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 4,
"relation" : "eq"
},
"max_score" : 5.4032025,
"hits" : [
{
"_index" : "bank",
"_type" : "account",
"_id" : "970",
"_score" : 5.4032025,
"_source" : {
"account_number" : 970,
"balance" : 19648,
"firstname" : "Forbes",
"lastname" : "Wallace",
"age" : 28,
"gender" : "M",
"address" : "990 Mill Road",
"employer" : "Pheast",
"email" : "forbeswallace@pheast.com",
"city" : "Lopezo",
"state" : "AK"
}
},
{
"_index" : "bank",
"_type" : "account",
"_id" : "136",
"_score" : 5.4032025,
"_source" : {
"account_number" : 136,
"balance" : 45801,
"firstname" : "Winnie",
"lastname" : "Holland",
"age" : 38,
"gender" : "M",
"address" : "198 Mill Lane",
"employer" : "Neteria",
"email" : "winnieholland@neteria.com",
"city" : "Urie",
"state" : "IL"
}
},
{
"_index" : "bank",
"_type" : "account",
"_id" : "345",
"_score" : 5.4032025,
"_source" : {
"account_number" : 345,
"balance" : 9812,
"firstname" : "Parker",
"lastname" : "Hines",
"age" : 38,
"gender" : "M",
"address" : "715 Mill Avenue",
"employer" : "Baluba",
"email" : "parkerhines@baluba.com",
"city" : "Blackgum",
"state" : "KY"
}
},
{
"_index" : "bank",
"_type" : "account",
"_id" : "472",
"_score" : 5.4032025,
"_source" : {
"account_number" : 472,
"balance" : 25571,
"firstname" : "Lee",
"lastname" : "Long",
"age" : 32,
"gender" : "F",
"address" : "288 Mill Street",
"employer" : "Comverges",
"email" : "leelong@comverges.com",
"city" : "Movico",
"state" : "MT"
}
}
]
}
}
2.3 字符串,多个单词(分词+全文检索)
GET bank/_search
{
"query": {
"match": {
"address": "mill road"
}
}
}
最终查询出 address 中包含 mill 或者 road 或者 mill road 的所有记录,并给出相关性得分
3、match_phrase【短语匹配】
将需要匹配的值当成一个整体单词(不分词)进行检索
GET bank/_search
{
"query": {
"match_phrase": {
"address": "mill road"
}
}
}
4、multi_match 【多字段匹配】
GET bank/_search
{
"query": {
"multi_match": {
"query": "mill",
"fields": ["state","address"]
}
}
}
state 或者 address 包含 mill
GET bank/_search
{
"query": {
"multi_match": {
"query": "mill movico",
"fields": ["city","address"]
}
}
}
city 或者 address 包含 mill 或 movico 或 mill movico
5、bool 【复合查询】
bool 用来做复合查询:
复合语句可以合并 任何 其它查询语句,包括复合语句,了解这一点是很重要的。这就意味
着,复合语句之间可以互相嵌套,可以表达非常复杂的逻辑。
5.1 must :必须达到 must 列举的所有条件
GET bank/_search
{
"query": {
"bool": {
"must": [
{ "match": {
"address": "mill"
}
},
{ "match": {
"gender": "M"
}
}
]
}
}
}
5.2 must_not 必须不是指定的情况
GET bank/_search
{
"query": {
"bool": {
"must": [
{ "match": {
"address": "mill"
}
},
{ "match": {
"gender": "M"
}
}
],
"must_not": [
{"match": {
"email": "baluba.com"
}
}
]
}
}
5.3 should :应该达到 should 列举的条件
如果达到会增加相关文档的评分,并不会改变查询的结果。
如果 query 中只有 should 且只有一种匹配规则,那么 should 的条件就会
被作为默认匹配条件而去改变查询结果。
GET bank/_search
{
"query": {
"bool": {
"must": [
{ "match": {
"address": "mill"
}
},
{ "match": {
"gender": "M"
}
}
],
"should": [
{"match": {
"address": "lane"
}
}
],
"must_not": [
{"match": {
"email": "baluba.com"
}
}
]
}
}
6、filter【 结果过滤 】
并不是所有的查询都需要产生分数,特别是那些仅用于 “filtering”(过滤)的文档。为了不
计算分数 Elasticsearch 会自动检查场景并且优化查询的执行。
GET bank/_search
{
"query": {
"bool": {
"must": [
{"match": {
"address": "mill"
}
}
],
"filter": {
"range": {
"balance": {
"gte": 10000,
"lte": 20000
}
}
}
}
}
}
7、term
和 match 一样。匹配某个属性的值。 全文检索字段用 match, 其他非 text 字段匹配用 term。
GET bank/_search
{
"query": {
"bool": {
"must": [
{"term": {
"age": {
"value": "28"
}
}
},
{"match": {
"address": "990 Mill Road"
}
}
]
}
}
}
三、映射Mapping
1、查看默认映射规则
当我们创建索引时,如果不指定属性的类型,就会走默认映射规则
GET /bank/_mapping
{
"bank" : {
"mappings" : {
"properties" : {
"account_number" : {
"type" : "long"
},
"address" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"age" : {
"type" : "long"
},
"balance" : {
"type" : "long"
},
"city" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"email" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"employer" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"firstname" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"gender" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"lastname" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"state" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
}
}
}
}
}
2、自己创建映射
PUT /my_index
{
"mappings": {
"properties": {
"age": { "type": "integer" },
"email": { "type": "keyword" },
"name": { "type": "text" }
}
}
}
类型有哪些,需要参考官方文档
3、添加新的字段映射
PUT /my_index/_mapping
{
"properties": {
"employee_id": { "type": "keyword", "index": false}
}
}
4、更新映射
对于已经存在的映射字段,我们不能更新。更新必须创建新的索引进行数据迁移
5、数据迁移
先创建出 new_bank 的正确映射。
然后使用如下方式进行数据迁移
POST _reindex [固定写法]
{
"source": { "index": "bank"},
"dest": { "index": "new_bank"}
}
如果是老版本,将旧索引的 type 下的数据进行迁移
POST _reindex
{
"source": {"index": "twitter", "type": "tweet"},
"dest": { "index": "tweets"}
}
四、分词(核心)
1、自带分词器
POST _analyze
{
"analyzer":"standard",
"text": "我是中国人,I am Chinese."
}
默认的标准分词器,它是将这些内容拆分成了一个一个的字符,显然是不符合实际要求的。
另外,es的这些分词器都是针对英文的,所以我们要想得到中文分词,还需要额外安装IK分词器
下载地址:
https://github.com/medcl/elasticsearch-analysis-ik/releases?after=v6.4.2
IK分词器是按照es的版本走的,所以我们需要下载对应的IK分词器版本
例如:
2、安装步骤
进入 es 容器内部 plugins 目录
wget https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v7.4.2/elasticsearch-anal ysis-ik-7.4.2.zip unzip
我们之前有挂载过,所以不用进入容器内部安装也行
1.进入本地挂载路径:
比如我的,cd /mydata/elasticsearch/,然后进入plugins目录中,将下载好的分词器压缩文件传过来
2.进入容器验证(可略)
可以进入容器内部检查看看,正常肯定是挂载的和容器内部的是一致的。
docker exec -it 容器 id /bin/bash
3.退出容器,在挂载路径新建一个文件夹ik
mkdir ik
4.给文件夹更改执行权限
chmod -R 777 ik/
5.进入ik文件夹进行解压
cd /mydata/elasticsearch/plugins/ik
unzip elasticsearch-analysis-ik-7.4.2.zip
6. 进入容器内部执行ik分词器的启动命令
docker exec -it 容器别名(或容器id) /bin/bash
cd /usr/share/elasticsearch/bin
elasticsearch-plugin list
当然你也无需进入容器内部
7. 退出容器,然后重启es容器
稍等,再刷新就可以了!
3、测试
POST _analyze
{
"analyzer":"ik_smart",
"text": "我是中国人,I am Chinese."
}
{
"tokens" : [
{
"token" : "我",
"start_offset" : 0,
"end_offset" : 1,
"type" : "CN_CHAR",
"position" : 0
},
{
"token" : "是",
"start_offset" : 1,
"end_offset" : 2,
"type" : "CN_CHAR",
"position" : 1
},
{
"token" : "中国人",
"start_offset" : 2,
"end_offset" : 5,
"type" : "CN_WORD",
"position" : 2
},
{
"token" : "i",
"start_offset" : 6,
"end_offset" : 7,
"type" : "ENGLISH",
"position" : 3
},
{
"token" : "am",
"start_offset" : 8,
"end_offset" : 10,
"type" : "ENGLISH",
"position" : 4
},
{
"token" : "chinese.",
"start_offset" : 11,
"end_offset" : 19,
"type" : "LETTER",
"position" : 5
}
]
}
而某些中文词语它没有识别到,就需要自定义扩展词库了。
4、自定义扩展词库
待完善,大致就是专门建立一个txt文档,然后将它放到一个可供访问的链接地址,最后将他的地址配置进es即可。