Elasticsearch Advanced Search (Part 2)

Latest recommended article published on 2023-08-22 16:00:55

何苏三月

Tags: elasticsearch, big data

Article link: https://blog.csdn.net/YuanFudao/article/details/128001643

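Tip

If you reboot the virtual machine or server, you may find that neither the ES server nor Kibana is reachable. The reason is that Docker containers are not started automatically after a reboot unless a restart policy has been set. How can you confirm this?

Run docker ps: the es container does not appear in the list of running containers.

To bring it back, simply run docker start <es container name>.

How do you make the es container start automatically when the server or VM reboots?

docker update --restart=always <es container name>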

I. ES supports two basic ways to search

1. Send the search parameters in the REST request URI (URI + search parameters)

GET bank/_search?q=*&sort=account_number:asc

{
  "took" : 11,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1000,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [
      {
        "_index" : "bank",
        "_type" : "account",
        "_id" : "0",
        "_score" : null,
        "_source" : {
          "account_number" : 0,
          "balance" : 16623,
          "firstname" : "Bradshaw",
          "lastname" : "Mckenzie",
          "age" : 29,
          "gender" : "F",
          "address" : "244 Columbus Place",
          "employer" : "Euron",
          "email" : "bradshawmckenzie@euron.com",
          "city" : "Hobucken",
          "state" : "CO"
        },
        "sort" : [
          0
        ]
      },
       .........................
    ]
  }
}

2. Query DSL: send the search parameters in the REST request body (URI + request body). This is the recommended way.

GET bank/_search
{
        "query": {
                "match_all": {}
        },
        "sort": [
                {
                        "account_number": {"order": "desc"}
                }
        ]
}
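
To make the difference between the two styles concrete, here is a minimal Python sketch that only builds the two requests shown above; it does not contact a cluster. The helper names uri_search and body_search are made up for this illustration and are not part of any Elasticsearch client.

```python
import json
from urllib.parse import urlencode


def uri_search(index, q, sort):
    """Style 1: the search parameters travel in the query string."""
    # safe="*:" keeps '*' and ':' readable instead of percent-encoding them.
    params = urlencode({"q": q, "sort": sort}, safe="*:")
    return f"/{index}/_search?{params}"


def body_search(index, query, sort):
    """Style 2: the URI only names the index; the search lives in a JSON body."""
    return f"/{index}/_search", json.dumps({"query": query, "sort": sort})


uri = uri_search("bank", "*", "account_number:asc")
path, body = body_search(
    "bank",
    {"match_all": {}},
    [{"account_number": {"order": "desc"}}],
)
print(uri)
print(path)
print(body)
```

The request body can hold nested structures such as the sort array, which is awkward to express in a query string; that is one reason the body style scales better to complex searches.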

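II. Query DSL in detail

1. Basic syntax

Elasticsearch provides a JSON-style DSL (domain-specific language) for executing queries, known as the Query DSL. The language is very comprehensive and can feel a little complicated at first; the best way to learn it is to start from a few basic examples.

Typical structure of a query clause: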

{
        QUERY_NAME: {
                ARGUMENT: VALUE,
                ARGUMENT: VALUE,...
        }
}

GET bank/_search
{
  "query": {
    "match_all": {}
  }
}

When the clause targets a specific field, the structure is:
{
        QUERY_NAME: {
                FIELD_NAME: {
                        ARGUMENT: VALUE,
                        ARGUMENT: VALUE,...
                }
        }
}

GET bank/_search
{
  "query": {
    "match_all": {}
  },
  "sort": [
    {
      "balance": {
        "order": "desc"
      }
    }
  ]
}

<!-- Shorthand form -->
GET bank/_search
{
  "query": {
    "match_all": {}
  },
  "sort": [
    {
      "balance":"desc"
    }
  ]
}



<!-- from and size work like LIMIT offset, count in MySQL -->
GET bank/_search
{
  "query": {
    "match_all": {}
  },
  "sort": [
    {
      "balance": {
        "order": "desc"
      }
    }
  ],
  "from": 0,
  "size": 5
}


GET bank/_search
{
  "query": {
    "match_all": {}
  },
  "sort": [
    {
      "balance": {
        "order": "desc"
      }
    }
  ],
  "from": 0,
  "size": 5,
  "_source": ["balance","firstname"]
}

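Summary

query defines how to search.
match_all is a query type that matches every document. ES lets you combine many query types inside query to build complex searches.
Besides query, other parameters can be passed to change the result, such as sort and size.
from + size limit the result window and together implement pagination.
sort orders the results. With several sort fields, a later field only orders documents whose earlier fields are equal; otherwise the earlier field decides.
_source selects which fields are returned. To return several fields, list them in an array ([]).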
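
The points in the summary can be tried out locally. The following Python sketch imitates from/size paging, multi-field sorting and _source filtering on an in-memory list; it is a toy model, not how Elasticsearch executes a search. The function names and the four sample documents are invented for the example.

```python
def page_params(page, page_size):
    """Turn a 1-based page number into from/size values."""
    return {"from": (page - 1) * page_size, "size": page_size}


def run_search(docs, sort, from_, size, source):
    """Toy stand-in for match_all combined with sort, from, size and _source."""
    ordered = list(docs)
    # Apply the sort keys from last to first. Python's sort is stable, so an
    # earlier key decides and a later key only orders the ties.
    for field, order in reversed(sort):
        ordered.sort(key=lambda d: d[field], reverse=(order == "desc"))
    window = ordered[from_:from_ + size]
    return [{name: d[name] for name in source} for d in window]


# Made-up sample documents, not rows from the bank index.
docs = [
    {"firstname": "A", "balance": 300, "age": 30},
    {"firstname": "B", "balance": 500, "age": 40},
    {"firstname": "C", "balance": 500, "age": 25},
    {"firstname": "D", "balance": 100, "age": 35},
]
sort = [("balance", "desc"), ("age", "asc")]

p1 = page_params(1, 2)
page1 = run_search(docs, sort, p1["from"], p1["size"], ["firstname", "balance"])
p2 = page_params(2, 2)
page2 = run_search(docs, sort, p2["from"], p2["size"], ["firstname", "balance"])
print(page1)
print(page2)
```

B and C share the same balance, so the second sort field (age, ascending) decides that C comes first; A and D are ordered by balance alone.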

2. More practice: match (match query)

2.1 Basic (non-string) types: exact match

GET bank/_search
{
  "query": {
    "match": {
      "account_number": "20"
    }
  }
}

match returns the document whose account_number is 20.

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "bank",
        "_type" : "account",
        "_id" : "20",
        "_score" : 1.0,
        "_source" : {
          "account_number" : 20,
          "balance" : 16418,
          "firstname" : "Elinor",
          "lastname" : "Ratliff",
          "age" : 36,
          "gender" : "M",
          "address" : "282 Kings Place",
          "employer" : "Scentric",
          "email" : "elinorratliff@scentric.com",
          "city" : "Ribera",
          "state" : "WA"
        }
      }
    ]
  }
}

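2.2 Strings: full-text search

Full-text search tokenizes the search text, matches the resulting terms, and sorts the hits by relevance score.

GET bank/_search
{
  "query": {
    "match": {
      "address": "mill"
    }
  }
}

This returns every record whose address contains the word mill.
When match searches a string (text) field, it performs a full-text search, and every hit carries a relevance score.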

{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 4,
      "relation" : "eq"
    },
    "max_score" : 5.4032025,
    "hits" : [
      {
        "_index" : "bank",
        "_type" : "account",
        "_id" : "970",
        "_score" : 5.4032025,
        "_source" : {
          "account_number" : 970,
          "balance" : 19648,
          "firstname" : "Forbes",
          "lastname" : "Wallace",
          "age" : 28,
          "gender" : "M",
          "address" : "990 Mill Road",
          "employer" : "Pheast",
          "email" : "forbeswallace@pheast.com",
          "city" : "Lopezo",
          "state" : "AK"
        }
      },
      {
        "_index" : "bank",
        "_type" : "account",
        "_id" : "136",
        "_score" : 5.4032025,
        "_source" : {
          "account_number" : 136,
          "balance" : 45801,
          "firstname" : "Winnie",
          "lastname" : "Holland",
          "age" : 38,
          "gender" : "M",
          "address" : "198 Mill Lane",
          "employer" : "Neteria",
          "email" : "winnieholland@neteria.com",
          "city" : "Urie",
          "state" : "IL"
        }
      },
      {
        "_index" : "bank",
        "_type" : "account",
        "_id" : "345",
        "_score" : 5.4032025,
        "_source" : {
          "account_number" : 345,
          "balance" : 9812,
          "firstname" : "Parker",
          "lastname" : "Hines",
          "age" : 38,
          "gender" : "M",
          "address" : "715 Mill Avenue",
          "employer" : "Baluba",
          "email" : "parkerhines@baluba.com",
          "city" : "Blackgum",
          "state" : "KY"
        }
      },
      {
        "_index" : "bank",
        "_type" : "account",
        "_id" : "472",
        "_score" : 5.4032025,
        "_source" : {
          "account_number" : 472,
          "balance" : 25571,
          "firstname" : "Lee",
          "lastname" : "Long",
          "age" : 32,
          "gender" : "F",
          "address" : "288 Mill Street",
          "employer" : "Comverges",
          "email" : "leelong@comverges.com",
          "city" : "Movico",
          "state" : "MT"
        }
      }
    ]
  }
}

2.3 Strings with several words (tokenization + full-text search)

GET bank/_search
{
        "query": {
                "match": {
                        "address": "mill road"
                }
        }
}
This returns every record whose address contains mill, or road, or both, each with a relevance score.

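3. match_phrase (phrase match)

The value to match is treated as one whole phrase. The query text is still analyzed, but its terms must all appear in the field, in the same order and next to each other; matching a single term is not enough.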

GET bank/_search
{
        "query": {
                "match_phrase": {
                        "address": "mill road"
                }
        }
}
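
The difference between match and match_phrase on "mill road" can be imitated with a few lines of Python. This is a simplified model under stated assumptions: analyze is a rough stand-in for an analyzer (lowercase, split on non-alphanumerics), scoring is ignored, and the function names are hypothetical. The addresses are taken from the responses shown earlier.

```python
import re


def analyze(text):
    """Rough stand-in for an analyzer: lowercase, split on non-alphanumerics."""
    return re.findall(r"[a-z0-9]+", text.lower())


def match(field_text, query):
    """match-like: any one query term found in the field is enough."""
    doc_terms = set(analyze(field_text))
    return any(term in doc_terms for term in analyze(query))


def match_phrase(field_text, query):
    """match_phrase-like: all query terms, adjacent and in the same order."""
    doc, q = analyze(field_text), analyze(query)
    return bool(q) and any(
        doc[i:i + len(q)] == q for i in range(len(doc) - len(q) + 1)
    )


addresses = ["990 Mill Road", "198 Mill Lane", "244 Columbus Place"]
match_hits = [a for a in addresses if match(a, "mill road")]
phrase_hits = [a for a in addresses if match_phrase(a, "mill road")]
print(match_hits)
print(phrase_hits)
```

"198 Mill Lane" is a match hit because it contains mill, but it is not a phrase hit because mill is not followed by road.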

4. multi_match (multi-field match)

GET bank/_search
{
        "query": {
                "multi_match": {
                        "query": "mill",
                        "fields": ["state","address"]
                }
        }
}
Matches documents whose state or address contains mill.

GET bank/_search
{
        "query": {
                "multi_match": {
                        "query": "mill movico",
                        "fields": ["city","address"]
                }
        }
}
Matches documents whose city or address contains mill, movico, or both.

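5. bool (compound query)

bool is used to build compound queries.
It is important to understand that a compound clause can combine any other query clauses, including other compound clauses. This means compound clauses can be nested inside one another to express very complex logic.

5.1 must: every condition listed under must has to be satisfied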

GET bank/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { 
            "address": "mill"
          } 
        },
        { "match": { 
            "gender": "M" 
          } 
        }
      ]
    }
  }
}

5.2 must_not: the document must not match the listed conditions

GET bank/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { 
            "address": "mill" 
          } 
        },
        { "match": { 
            "gender": "M" 
          } 
        }
      ],
      "must_not": [
        { "match": {
            "email": "baluba.com"
          }
        }
      ]
    }
  }
}

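5.3 should: conditions that ought to be satisfied

A document that satisfies a should condition gets a higher relevance score, but the set of returned documents does not change.

The exception: if the bool query contains only should clauses (no must or filter), at least one should condition has to match, so should then does decide which documents are returned.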

GET bank/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { 
            "address": "mill" 
          } 
        },
        { "match": { 
            "gender": "M" 
          } 
        }
      ],
      "should": [
        { "match": {
            "address": "lane"
          }
        }
      ],
      "must_not": [
        { "match": {
            "email": "baluba.com"
          }
        }
      ]
    }
  }
}

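6. filter (filtering results)

Not every query needs to produce a score, in particular clauses that are only used for filtering documents. Elasticsearch detects these cases automatically and optimizes query execution so that no score is computed for them. Clauses under filter narrow the result set without contributing to _score.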

GET bank/_search
{
    "query": {
        "bool": {
            "must": [
                {"match": { 
                    "address": "mill"
                    }
                }
            ],
            "filter": {
                "range": {
                    "balance": {
                        "gte": 10000,
                        "lte": 20000
                     }
                }
            }
        }
    }
}
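
How must, must_not, should and filter interact can be sketched as a small in-memory simulation. This is a toy model, not Elasticsearch's scoring: the score is simply one point per matching must or should clause, the tokenizer is a rough approximation, and bool_search and its parameters are names invented for this example. The four documents are the mill hits from the response shown earlier.

```python
import re


def tokens(value):
    """Rough tokenizer: lowercase words, keeping dots inside (e.g. baluba.com)."""
    return set(re.findall(r"[a-z0-9]+(?:\.[a-z0-9]+)*", str(value).lower()))


def matches(doc, field, query):
    """match-like test: the field shares at least one token with the query."""
    return bool(tokens(doc[field]) & tokens(query))


def bool_search(docs, must=(), must_not=(), should=(), filter_=()):
    hits = []
    for doc in docs:
        if not all(matches(doc, f, q) for f, q in must):
            continue  # every must clause is required
        if any(matches(doc, f, q) for f, q in must_not):
            continue  # any must_not match excludes the document
        if not all(pred(doc) for pred in filter_):
            continue  # filter excludes, but adds nothing to the score
        # Toy score: one point per matching must/should clause.
        score = len(must) + sum(matches(doc, f, q) for f, q in should)
        hits.append((score, doc["account_number"]))
    return sorted(hits, key=lambda h: -h[0])


docs = [
    {"account_number": 970, "balance": 19648, "gender": "M",
     "address": "990 Mill Road", "email": "forbeswallace@pheast.com"},
    {"account_number": 136, "balance": 45801, "gender": "M",
     "address": "198 Mill Lane", "email": "winnieholland@neteria.com"},
    {"account_number": 345, "balance": 9812, "gender": "M",
     "address": "715 Mill Avenue", "email": "parkerhines@baluba.com"},
    {"account_number": 472, "balance": 25571, "gender": "F",
     "address": "288 Mill Street", "email": "leelong@comverges.com"},
]
must = [("address", "mill"), ("gender", "M")]
must_not = [("email", "baluba.com")]
should = [("address", "lane")]

without_filter = bool_search(docs, must, must_not, should)
with_filter = bool_search(
    docs, must, must_not, should,
    filter_=[lambda d: 10000 <= d["balance"] <= 20000],
)
print(without_filter)
print(with_filter)
```

In this model the should clause lifts account 136 above 970 without removing 970, while the range filter removes 136 and leaves the score of 970 untouched.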

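7. term

Like match, term matches the value of a field. Use match for full-text (text) fields and term for non-text fields: term looks up the exact value and does not analyze the search text.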

GET bank/_search
{
    "query": {
        "bool": {
            "must": [
                {"term": {
                    "age": {
                        "value": "28"
                     }
                  }
                },
                {"match": {
                    "address": "990 Mill Road"
                     }
                }
            ]
        }
    }
}
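
Why term is a poor fit for text fields can be shown with a simplified model of the index. The sketch assumes that a text field is stored as lowercase tokens and a keyword field as the single original string; analyze, match_query and term_query are hypothetical helper names, not Elasticsearch APIs.

```python
import re


def analyze(text):
    """Rough stand-in for an analyzer: lowercase, split on non-alphanumerics."""
    return re.findall(r"[a-z0-9]+", text.lower())


def match_query(indexed_terms, query):
    """match-like: the query text is analyzed before the lookup."""
    return any(term in indexed_terms for term in analyze(query))


def term_query(indexed_terms, value):
    """term-like: the value is looked up exactly as given, without analysis."""
    return value in indexed_terms


text_field = analyze("990 Mill Road")   # stored as ['990', 'mill', 'road']
keyword_field = ["990 Mill Road"]       # stored as one unmodified value

print(term_query(text_field, "Mill"))               # capital M is not indexed
print(term_query(text_field, "mill"))
print(match_query(text_field, "990 Mill Road"))
print(term_query(keyword_field, "990 Mill Road"))
```

In this model the exact string only works with term against the keyword-style storage, which mirrors the address.keyword sub-field that appears in the default mapping.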

III. Mapping

1. Viewing the default mapping

If the field types are not specified when an index is created, the default mapping rules are applied.

GET /bank/_mapping

{
  "bank" : {
    "mappings" : {
      "properties" : {
        "account_number" : {
          "type" : "long"
        },
        "address" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "age" : {
          "type" : "long"
        },
        "balance" : {
          "type" : "long"
        },
        "city" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "email" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "employer" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "firstname" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "gender" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "lastname" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "state" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        }
      }
    }
  }
}
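
The pattern visible in this output (whole numbers become long, strings become text with a keyword sub-field) can be written down as a short Python sketch. It is only an approximation of the default rules for flat documents: it ignores date detection, nested objects, arrays and null values, and infer_field and infer_mapping are names made up for the illustration.

```python
def infer_field(value):
    """Guess the mapping a default rule set would give a single JSON value."""
    # bool must be checked before int: in Python, bool is a subclass of int.
    if isinstance(value, bool):
        return {"type": "boolean"}
    if isinstance(value, int):
        return {"type": "long"}
    if isinstance(value, float):
        return {"type": "float"}
    if isinstance(value, str):
        return {
            "type": "text",
            "fields": {"keyword": {"type": "keyword", "ignore_above": 256}},
        }
    raise TypeError(f"not covered by this sketch: {type(value).__name__}")


def infer_mapping(doc):
    """Build a properties block for a flat document."""
    return {"properties": {name: infer_field(v) for name, v in doc.items()}}


mapping = infer_mapping({"account_number": 20, "address": "282 Kings Place"})
print(mapping)
```

For the two sample fields the result has the same shape as the account_number and address entries in the mapping above.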

2. Creating a mapping yourself

PUT /my_index
{ 
    "mappings": { 
        "properties": {
            "age": { "type": "integer" }, 
            "email": { "type": "keyword" }, 
            "name": { "type": "text" }
        }
    }
}

For the available field types, refer to the official documentation.

3. Adding a new field mapping

PUT /my_index/_mapping
{ 
    "properties": { 
        "employee_id": { "type": "keyword", "index": false}
    }
}

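4. Updating a mapping

A field mapping that already exists cannot be updated. To change it, create a new index with the correct mapping and migrate the data into it.

5. Data migration

First create new_bank with the correct mapping.

Then migrate the data as follows (POST _reindex is a fixed form):

POST _reindex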
{ 
    "source": { "index": "bank"},
    "dest": { "index": "new_bank"}
}

On older versions, migrate the data under a type of the old index like this:


POST _reindex
{ 
    "source": {"index": "twitter", "type": "tweet"},
    "dest": { "index": "tweets"}
}

IV. Tokenization (core topic)

1. Built-in analyzers

POST _analyze
{ 
    "analyzer":"standard",
    "text": "我是中国人,I am Chinese."
}

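The default standard analyzer splits the Chinese part of this text into individual characters, which clearly does not meet practical needs.

The analyzers built into ES are mostly aimed at English, so to get proper Chinese word segmentation we need to install the IK analyzer as well.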
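
The behaviour described above can be imitated with a small Python approximation: each Chinese character becomes its own token, while runs of Latin letters or digits stay together as lowercase words. This is only a rough imitation written for this article (the function name standard_like is made up), not the real standard analyzer.

```python
import re


def standard_like(text):
    """Approximate the described behaviour: one token per Chinese character,
    whole lowercase words for Latin letters and digits."""
    return re.findall(r"[\u4e00-\u9fff]|[a-z0-9]+", text.lower())


result = standard_like("我是中国人,I am Chinese.")
print(result)
```

The word 中国人 ends up as three separate tokens here, which is exactly the problem a Chinese-aware analyzer such as IK is meant to solve.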

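Download:

https://github.com/medcl/elasticsearch-analysis-ik/releases?after=v6.4.2

IK releases follow the ES version numbers, so download the IK version that matches your ES version.

For example, the steps below use IK v7.4.2.

2. Installation steps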

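Enter the plugins directory inside the es container, then download and unzip the plugin:

wget https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v7.4.2/elasticsearch-analysis-ik-7.4.2.zip
unzip elasticsearch-analysis-ik-7.4.2.zip

Since we mounted the plugins directory earlier, the plugin can also be installed without entering the container.

1. Go to the local mount path:

In my case, cd /mydata/elasticsearch/, then enter the plugins directory and upload the downloaded analyzer archive there.

2. Verify inside the container (optional)

You can enter the container to check; the mounted directory and the directory inside the container should have the same contents.

docker exec -it <container id> /bin/bash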

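3. Exit the container and create a folder named ik under the mount path

mkdir ik

4. Change the permissions of the folder

chmod -R 777 ik/

5. Enter the ik folder and unzip the archive (the zip file has to be in this folder)

cd /mydata/elasticsearch/plugins/ik
unzip elasticsearch-analysis-ik-7.4.2.zip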

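6. Enter the container and list the installed plugins to confirm that ik is picked up

docker exec -it <container name or container id> /bin/bash

cd /usr/share/elasticsearch/bin

elasticsearch-plugin list

Of course, this can also be done without entering the container.

7. Exit the container, then restart the es container

Wait a moment, then refresh and it is ready.

3. Testing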

POST _analyze
{ 
    "analyzer":"ik_smart",
    "text": "我是中国人,I am Chinese."
}

{
  "tokens" : [
    {
      "token" : "我",
      "start_offset" : 0,
      "end_offset" : 1,
      "type" : "CN_CHAR",
      "position" : 0
    },
    {
      "token" : "是",
      "start_offset" : 1,
      "end_offset" : 2,
      "type" : "CN_CHAR",
      "position" : 1
    },
    {
      "token" : "中国人",
      "start_offset" : 2,
      "end_offset" : 5,
      "type" : "CN_WORD",
      "position" : 2
    },
    {
      "token" : "i",
      "start_offset" : 6,
      "end_offset" : 7,
      "type" : "ENGLISH",
      "position" : 3
    },
    {
      "token" : "am",
      "start_offset" : 8,
      "end_offset" : 10,
      "type" : "ENGLISH",
      "position" : 4
    },
    {
      "token" : "chinese.",
      "start_offset" : 11,
      "end_offset" : 19,
      "type" : "LETTER",
      "position" : 5
    }
  ]
}

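Some Chinese words are still not recognized, and that is where a custom extension dictionary comes in.

4. Custom extension dictionary

To be completed. Roughly: create a dedicated txt file, host it at an accessible URL, and then configure that URL in ES.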
