Elasticsearch Queries

_zshuo

Category: elasticsearch    Tags: elasticsearch

Article link: https://blog.csdn.net/zhengshuoa/article/details/87875128

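1 Introduction

Elasticsearch is a search server built on Lucene. It provides a distributed, multi-tenant full-text search engine behind a RESTful web interface. Elasticsearch is written in Java, was released as open source under the Apache license, and is a popular enterprise search engine. It is designed for cloud environments and offers near-real-time search that is stable, reliable, fast, and easy to install and use.

Kibana is an open-source analytics and visualization platform for Elasticsearch:
https://www.elastic.co/downloads/kibana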

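2 Basic Concepts

An Elasticsearch cluster contains multiple indices (Index). Each index can contain multiple types (Type), each type can store multiple documents (Document), and each document has multiple fields. An index is similar to a database in a traditional relational database: it is where related documents are stored. The plural of index is indices or indexes.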

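2.1 Node and Cluster

Elasticsearch is essentially a distributed database: multiple servers can work together, and each server can run multiple Elasticsearch instances.

A single Elasticsearch instance is called a node. A group of nodes forms a cluster.

A cluster has multiple nodes, one of which is the master node, chosen by election. The distinction between master and non-master nodes only matters inside the cluster. Elasticsearch is also described as decentralized: seen from outside there is no central node, because the cluster is logically a single unit and talking to any one node is equivalent to talking to the whole cluster.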

2.2 Index

Elasticsearch indexes every field and, after processing, writes the result to an inverted index. Lookups go directly against that index.

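For this reason, the top-level unit of data management in Elasticsearch is called an Index. It is a synonym for a single database. The name of every Index (that is, database) must be lowercase.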
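
To make the inverted index idea concrete, here is a minimal Python sketch. It is a toy model, not how Lucene stores data: it only lowercases and splits on whitespace, and the function name build_inverted_index is made up for illustration. The sample texts are the about values of the employee documents used later in this article.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercase term to the sorted list of document IDs containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


# Sample documents (ID -> text); a real index would run an analyzer here.
docs = {
    1: "I love to go rock climbing",
    2: "I like to collect rock albums",
    3: "I like to build cabinets",
}
inverted = build_inverted_index(docs)
print(inverted["rock"])      # [1, 2]
print(inverted["cabinets"])  # [3]
```

Looking up a term is then a single dictionary access instead of a scan over every document, which is why searches go against the inverted index.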

The following command lists all indices on the current node.

$ curl -X GET 'http://localhost:9200/_cat/indices?v'

2.3 Document

A single record in an Index is called a Document. Many Documents make up an Index.

A Document is represented in JSON. Here is an example.

{
  "user": "张三",
  "title": "工程师",
  "desc": "数据库管理"
}

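Documents in the same Index are not required to share the same structure (schema), but keeping it consistent is recommended because it improves search efficiency.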

2.4 Type

The following command lists the Types contained in each Index.

Adding the pretty parameter to any query string makes Elasticsearch pretty-print the JSON response so that it is easier to read.

$ curl 'localhost:9200/_mapping?pretty=true'

Elasticsearch 6.x allows only one Type per Index. 7.x deprecates Types in the APIs, and 8.x removes them entirely.

Relational database	Elasticsearch
Databases	Indices
Tables	Types
Rows	Documents
Columns	Fields

3 Indices

3.1 Create an index

$ curl -X PUT 'localhost:9200/weather'

The server returns a JSON object; the acknowledged field indicates that the operation succeeded.

{
  "acknowledged":true,
  "shards_acknowledged":true
}

3.2 Delete an index

curl -X DELETE 'localhost:9200/weather'

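3.3 Configure the analyzer

The analyzer is set when the index is created.
First install the analyzer plugin by running the command below from the Elasticsearch installation directory, then restart Elasticsearch so that it loads the newly installed plugin.
Choose the release that matches your Elasticsearch version: https://github.com/medcl/elasticsearch-analysis-ik/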

./bin/elasticsearch-plugin install https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v6.6.0/elasticsearch-analysis-ik-6.6.0.zip

analyzer is the analyzer applied to the field text at index time; search_analyzer is the analyzer applied to the search terms.

curl -X PUT 'localhost:9200/megacorp' -H 'Content-Type: application/json' -d '
{
  "mappings": {
    "employee": {
      "properties": {
        "user": {
          "type": "text",
          "analyzer":"ik_max_word",
          "search_analyzer":"ik_smart"
        },
        "title": {
          "type": "text",
          "analyzer":"ik_max_word",
          "search_analyzer":"ik_smart"
        }
      }
    }
  }
}'
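
When several fields share the same analyzer settings, the request body can be generated rather than typed by hand. The following is a minimal sketch, assuming the 6.x typed-mapping layout used in this article; build_text_mapping is a hypothetical helper, not part of any Elasticsearch client.

```python
import json


def build_text_mapping(doc_type, fields, analyzer="ik_max_word", search_analyzer="ik_smart"):
    """Return a 6.x-style create-index body that maps each field as analyzed text."""
    properties = {
        field: {
            "type": "text",
            "analyzer": analyzer,                # used when indexing field text
            "search_analyzer": search_analyzer,  # used on the search terms
        }
        for field in fields
    }
    return {"mappings": {doc_type: {"properties": properties}}}


body = build_text_mapping("employee", ["user", "title"])
print(json.dumps(body, indent=2))
```

The printed JSON has the same structure as the body of the curl request above and could be sent with any HTTP client.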

View the analyzed terms (GET)

curl -X GET 'localhost:9200/megacorp/employee/1/_termvectors?fields=about'

3 Basic data operations

3.1 Add data

PUT /megacorp/employee/1
{
    "first_name" : "John",
    "last_name" :  "Smith",
    "age" :        25,
    "about" :      "I love to go rock climbing",
    "interests": [ "sports", "music" ]
}

PUT /megacorp/employee/2
{
    "first_name" :  "Jane",
    "last_name" :   "Smith",
    "age" :         32,
    "about" :       "I like to collect rock albums",
    "interests":  [ "music" ]
}

PUT /megacorp/employee/3
{
    "first_name" :  "Douglas",
    "last_name" :   "Fir",
    "age" :         35,
    "about":        "I like to build cabinets",
    "interests":  [ "forestry" ]
}

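The 1 at the end of the URI is the ID of this document; it can also be a string. If you do not want to specify an ID yourself, you can omit it, but you must then use POST to add the document, and Elasticsearch will generate a random string as its ID.

To update the document, send the request to the same URI again with the modified JSON. The key point is that the ID must be specified; the new body then replaces the stored document.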
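
The PUT-versus-POST rule can be captured in a few lines. This is a minimal sketch; index_request is a hypothetical helper that only computes the HTTP method and path and sends nothing.

```python
def index_request(index, doc_type, doc_id=None):
    """Return the (HTTP method, path) pair for indexing one document.

    With an explicit ID the document is created or replaced via PUT;
    without one, POST lets Elasticsearch generate the ID.
    """
    base = f"/{index}/{doc_type}"
    if doc_id is None:
        return ("POST", base)
    return ("PUT", f"{base}/{doc_id}")


print(index_request("megacorp", "employee", 1))  # ('PUT', '/megacorp/employee/1')
print(index_request("megacorp", "employee"))     # ('POST', '/megacorp/employee')
```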

3.2 Retrieve data

Retrieve a specific document by ID:

GET /megacorp/employee/1

3.3 Simple search

GET /megacorp/employee/_search?q=last_name:Smith&size=20&from=0

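This searches all documents of the given Type in the given Index. By default only 10 results are returned per page; the size parameter changes that, and the from parameter sets the offset (0 by default). In the response, the took field is the time the operation took (in milliseconds), timed_out indicates whether it timed out, and hits contains the matching records.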
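
Since from is an offset rather than a page number, page-based paging needs a small conversion. The sketch below builds the search path for a 1-based page; search_path and its parameter names are assumptions for illustration.

```python
from urllib.parse import urlencode


def search_path(index, doc_type, query, page=1, per_page=10):
    """Build a URI-search path; `page` is 1-based and is translated into `from`."""
    if page < 1 or per_page < 1:
        raise ValueError("page and per_page must be positive")
    params = {"q": query, "size": per_page, "from": (page - 1) * per_page}
    return f"/{index}/{doc_type}/_search?{urlencode(params)}"


print(search_path("megacorp", "employee", "last_name:Smith", page=1, per_page=20))
# /megacorp/employee/_search?q=last_name%3ASmith&size=20&from=0
```

The colon is percent-encoded as %3A by urlencode, which is the URL-safe form of the same query string.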

3.4 Conditional search

GET /megacorp/employee/_search
{
    "query" : {
        "match" : {
            "last_name" : "Smith"
        }
    },
    "size": 20,
    "from": 0
}

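This query is equivalent to the previous example, except that the simple URL parameters have become a more complex JSON body. The extra complexity buys control: the query can be tuned in much finer detail.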

3.5 More complex search

Search by last_name, keeping only employees older than 30:

GET /megacorp/employee/_search
{
    "query" : {
        "bool": {
            "must": {
                "match" : {
                    "last_name" : "smith" 
                }
            },
            "filter": {
                "range" : {
                    "age" : { "gt" : 30 } 
                }
            }
        }
    }
}

This adds a range filter, where gt means greater than.
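
To see which of the three sample employees this query should return, its logic can be replayed locally. This is a rough simulation, not an Elasticsearch call: the match on last_name is approximated by a case-insensitive comparison, and the function name is hypothetical.

```python
employees = [
    {"first_name": "John", "last_name": "Smith", "age": 25},
    {"first_name": "Jane", "last_name": "Smith", "age": 32},
    {"first_name": "Douglas", "last_name": "Fir", "age": 35},
]


def search_by_name_and_min_age(docs, last_name, older_than):
    """Keep docs whose last_name matches (ignoring case) and whose age > older_than."""
    wanted = last_name.lower()
    return [
        doc for doc in docs
        if doc["last_name"].lower() == wanted and doc["age"] > older_than
    ]


hits = search_by_name_and_min_age(employees, "smith", 30)
print([doc["first_name"] for doc in hits])  # ['Jane']
```

John is excluded by the age filter and Douglas by the name match, so only Jane remains.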

3.6 Full-text search

GET /megacorp/employee/_search
{
    "query" : {
        "match" : {
            "about" : "rock climbing"
        }
    }
}

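This search returns documents whose about field contains rock or climbing; that is, the terms are combined with OR by default. To match the exact phrase instead, use the match_phrase query.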

3.7 Phrase search

GET /megacorp/employee/_search
{
    "query" : {
        "match_phrase" : {
            "about" : "rock climbing"
        }
    }
}
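
The difference between match and match_phrase can be illustrated with a rough local model. The sketch below assumes a trivial analyzer (lowercase, split on whitespace) and ignores scoring; the function names are made up for this example.

```python
def tokenize(text):
    """Very rough stand-in for an analyzer: lowercase and split on whitespace."""
    return text.lower().split()


def match_any(text, query):
    """match-like check: true if any query term occurs in the text."""
    terms = set(tokenize(text))
    return any(term in terms for term in tokenize(query))


def match_phrase(text, query):
    """match_phrase-like check: all query terms occur adjacently and in order."""
    tokens, phrase = tokenize(text), tokenize(query)
    return any(
        tokens[i:i + len(phrase)] == phrase
        for i in range(len(tokens) - len(phrase) + 1)
    )


about = {
    "John": "I love to go rock climbing",
    "Jane": "I like to collect rock albums",
    "Douglas": "I like to build cabinets",
}
print([n for n, t in about.items() if match_any(t, "rock climbing")])     # ['John', 'Jane']
print([n for n, t in about.items() if match_phrase(t, "rock climbing")])  # ['John']
```

Jane matches the OR-style query because her text contains rock, but only John's text contains the two words next to each other.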

3.8 Highlighted search

GET /megacorp/employee/_search
{
    "query" : {
        "match_phrase" : {
            "about" : "rock climbing"
        }
    },
    "highlight": {
        "fields" : {
            "about" : {}
        }
    }
}

The response now includes a highlight section; by default, matched terms are wrapped in <em></em> tags:

{
   ...
   "hits": {
      "total":      1,
      "max_score":  0.23013961,
      "hits": [
         {
            ...
            "_score":         0.23013961,
            "_source": {
               "first_name":  "John",
               "last_name":   "Smith",
               "age":         25,
               "about":       "I love to go rock climbing",
               "interests": [ "sports", "music" ]
            },
            "highlight": {
               "about": [
                  "I love to go <em>rock</em> <em>climbing</em>" 
               ]
            }
         }
      ]
   }
}

4 Search in depth

4.1 Finding exact values

The term query can handle numbers, Booleans, dates, and text.
Create and index some documents that represent products, each with a price field and a productID field.

curl -X POST "localhost:9200/my_store/products/_bulk" -H 'Content-Type: application/json' -d'
{ "index": { "_id": 1 }}
{ "price" : 10, "productID" : "XHDK-A-1293-#fJ3" }
{ "index": { "_id": 2 }}
{ "price" : 20, "productID" : "KDKE-B-9947-#kL5" }
{ "index": { "_id": 3 }}
{ "price" : 30, "productID" : "JODL-X-1937-#pV7" }
{ "index": { "_id": 4 }}
{ "price" : 30, "productID" : "QQPX-R-3956-#aD8" }
'

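4.1.1 term query on numbers

When looking for an exact value, we usually do not want the query to be scored; we only want documents included or excluded. So we wrap the term query in a constant_score query, which runs it in non-scoring mode and applies a uniform score of 1.
A query placed inside a filter clause is not scored and no relevance is computed, so every result gets the default score of 1.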

GET /my_store/products/_search
{
    "query" : {
        "constant_score" : { 
            "filter" : {
                "term" : { 
                    "price" : 20
                }
            }
        }
    }
}

4.1.2 term query on text

GET /my_store/products/_search
{
    "query" : {
        "constant_score" : {
            "filter" : {
                "term" : {
                    "productID" : "XHDK-A-1293-#fJ3"
                }
            }
        }
    }
}

No documents are found. Analyze the field to see why:

GET /my_store/_analyze
{
  "field": "productID",
  "text": "XHDK-A-1293-#fJ3"
}

{
  "tokens" : [ {
    "token" :        "xhdk",
    "start_offset" : 0,
    "end_offset" :   4,
    "type" :         "<ALPHANUM>",
    "position" :     1
  }, {
    "token" :        "a",
    "start_offset" : 5,
    "end_offset" :   6,
    "type" :         "<ALPHANUM>",
    "position" :     2
  }, {
    "token" :        "1293",
    "start_offset" : 7,
    "end_offset" :   11,
    "type" :         "<NUM>",
    "position" :     3
  }, {
    "token" :        "fj3",
    "start_offset" : 13,
    "end_offset" :   16,
    "type" :         "<ALPHANUM>",
    "position" :     4
  } ]
}

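Elasticsearch represents this UPC with 4 separate tokens instead of a single token.
All letters have been lowercased.
The hyphens and the hash sign (#) have been lost.

So when the term query looks for the exact value XHDK-A-1293-#fJ3, it finds no documents, because that value is not in the inverted index.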
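
The effect can be reproduced offline with a rough approximation of the analysis step. The regular expression below is only an assumption that happens to yield the same four tokens for this input; the real standard analyzer follows Unicode text segmentation rules and behaves differently on other inputs.

```python
import re


def rough_standard_analyze(text):
    """Approximate the tokens shown above: lowercase alphanumeric runs only."""
    return re.findall(r"[a-z0-9]+", text.lower())


product_id = "XHDK-A-1293-#fJ3"
tokens = rough_standard_analyze(product_id)
print(tokens)                # ['xhdk', 'a', '1293', 'fj3']
print(product_id in tokens)  # False: the exact value was never indexed as one term
```

A term query compares its input against the indexed terms without analyzing it, so the original string has nothing to match.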

Recreate the index with productID mapped as keyword so that the value is not split:

DELETE /my_store 

PUT /my_store 
{
  "mappings" : {
      "products" : {
          "properties" : {
              "productID" : {
                  "type" : "keyword"
              }
          }
      }
  }
}

Add the data again, and the same query now finds the document.

4.1.3 terms query for multiple exact values

GET /my_store/products/_search
{
    "query" : {
        "constant_score" : {
            "filter" : {
                "terms" : { 
                    "price" : [20, 30]
                }
            }
        }
    }
}

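4.2 Combining filters

4.2.1 Bool filter

This is a compound filter: it accepts multiple other filters as arguments and combines them into all kinds of Boolean (logical) combinations.

A bool filter has three parts: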

{
   "bool" : {
      "must" :     [],
      "should" :   [],
      "must_not" : []
   }
}

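must
All of these clauses must match; equivalent to AND.

must_not
None of these clauses may match; equivalent to NOT.

should
At least one of these clauses must match; equivalent to OR. This holds when the bool has no must or filter clause; otherwise should clauses are optional.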

GET /my_store/products/_search
{
   "query" : {
      "bool" : {
        "should" : [
           { "term" : {"price" : 20}}, 
           { "term" : {"productID" : "XHDK-A-1293-#fJ3"}} 
        ],
        "must_not" : {
           "term" : {"price" : 30} 
        }
     }
   }
}

4.2.2 Nested bool filters

GET /my_store/products/_search
{
   "query" : {
        "bool" : {
          "should" : [
            { "term" : {"productID" : "KDKE-B-9947-#kL5"}}, 
            { "bool" : { 
              "must" : [
                { "term" : {"productID" : "JODL-X-1937-#pV7"}}, 
                { "term" : {"price" : 30}} 
              ]
            }}
          ]
       }
   }
}
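
The two bool queries above can be checked against the four sample products with a small local evaluator. This is a sketch under stated assumptions: it supports only term and bool clauses, treats productID as a keyword (exact) field, ignores scoring, and the names matches and search are hypothetical.

```python
products = {
    1: {"price": 10, "productID": "XHDK-A-1293-#fJ3"},
    2: {"price": 20, "productID": "KDKE-B-9947-#kL5"},
    3: {"price": 30, "productID": "JODL-X-1937-#pV7"},
    4: {"price": 30, "productID": "QQPX-R-3956-#aD8"},
}


def as_list(clauses):
    """Accept either a single clause object or a list of clauses."""
    return clauses if isinstance(clauses, list) else [clauses]


def matches(doc, query):
    """Evaluate a query made only of term and bool clauses against one doc."""
    if "term" in query:
        (field, value), = query["term"].items()
        return doc.get(field) == value
    if "bool" in query:
        clauses = query["bool"]
        must = as_list(clauses.get("must", []))
        should = as_list(clauses.get("should", []))
        must_not = as_list(clauses.get("must_not", []))
        if not all(matches(doc, q) for q in must):
            return False
        if any(matches(doc, q) for q in must_not):
            return False
        # Without a must clause, at least one should clause has to match.
        if should and not must:
            return any(matches(doc, q) for q in should)
        return True
    raise ValueError(f"unsupported clause: {query}")


def search(query):
    """Return the IDs of all sample products that satisfy the query."""
    return [doc_id for doc_id, doc in products.items() if matches(doc, query)]


flat = {"bool": {
    "should": [
        {"term": {"price": 20}},
        {"term": {"productID": "XHDK-A-1293-#fJ3"}},
    ],
    "must_not": {"term": {"price": 30}},
}}
nested = {"bool": {
    "should": [
        {"term": {"productID": "KDKE-B-9947-#kL5"}},
        {"bool": {"must": [
            {"term": {"productID": "JODL-X-1937-#pV7"}},
            {"term": {"price": 30}},
        ]}},
    ],
}}
print(search(flat))    # [1, 2]
print(search(nested))  # [2, 3]
```

The first query reads as (price = 20 OR productID = XHDK-A-1293-#fJ3) AND NOT price = 30; the nested one reads as productID = KDKE-B-9947-#kL5 OR (productID = JODL-X-1937-#pV7 AND price = 30).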

4.3 Range queries

gt: > greater than
lt: < less than
gte: >= greater than or equal to
lte: <= less than or equal to

GET /my_store/products/_search
{
    "query" : {
        "constant_score" : {
            "filter" : {
                "range" : {
                    "price" : {
                        "gte" : 20,
                        "lt"  : 40
                    }
                }
            }
        }
    }
}
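
The four range keywords map directly onto comparison operators. The sketch below applies the same bounds to the sample prices locally; RANGE_OPS and in_range are illustrative names, not part of any Elasticsearch client.

```python
import operator

# Map each range keyword to the comparison it stands for.
RANGE_OPS = {
    "gt": operator.gt,
    "gte": operator.ge,
    "lt": operator.lt,
    "lte": operator.le,
}


def in_range(value, bounds):
    """Return True if value satisfies every bound, e.g. {"gte": 20, "lt": 40}."""
    return all(RANGE_OPS[name](value, limit) for name, limit in bounds.items())


prices = {1: 10, 2: 20, 3: 30, 4: 30}
hits = [doc_id for doc_id, price in prices.items() if in_range(price, {"gte": 20, "lt": 40})]
print(hits)  # [2, 3, 4]
```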

4.4 Handling null values

POST /my_index/posts/_bulk
{ "index": { "_id": "1"              }}
{ "tags" : ["search"]                }  
{ "index": { "_id": "2"              }}
{ "tags" : ["search", "open_source"] }  
{ "index": { "_id": "3"              }}
{ "other_field" : "some data"        }  
{ "index": { "_id": "4"              }}
{ "tags" : null                      }  
{ "index": { "_id": "5"              }}
{ "tags" : ["search", null]          }

GET /my_index/posts/_search
{
    "query" : {
        "constant_score" : {
            "filter" : {
                "exists" : { "field" : "tags" }
            }
        }
    }
}

GET /my_index/posts/_search
{
    "query": {
        "bool": {
            "must_not": {
                "exists": {
                    "field" : "tags"
                }
            }
        }
    }
}
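
The first query returns documents that have a value for tags, and the second returns those that do not. A field counts as existing when it holds at least one non-null value, which the following local sketch replays on the five sample posts; field_exists is a hypothetical helper name.

```python
posts = {
    "1": {"tags": ["search"]},
    "2": {"tags": ["search", "open_source"]},
    "3": {"other_field": "some data"},
    "4": {"tags": None},
    "5": {"tags": ["search", None]},
}


def field_exists(doc, field):
    """True if the field holds at least one non-null value."""
    value = doc.get(field)
    values = value if isinstance(value, list) else [value]
    return any(v is not None for v in values)


with_tags = [doc_id for doc_id, doc in posts.items() if field_exists(doc, "tags")]
without_tags = [doc_id for doc_id, doc in posts.items() if not field_exists(doc, "tags")]
print(with_tags)     # ['1', '2', '5']
print(without_tags)  # ['3', '4']
```

Document 5 still counts as having tags because one of its values is non-null, while documents 3 and 4 are treated the same: a missing field and an explicit null are indistinguishable here.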

5 Miscellaneous

5.1 Monitoring

Cluster health

GET _cluster/health

Monitoring individual nodes

GET _nodes/stats

Cluster statistics

GET _cluster/stats

Index statistics

GET my_index/_stats
GET my_index,another_index/_stats
GET _all/_stats

Pending tasks

GET _cluster/pending_tasks

5.2 cat API

https://www.elastic.co/guide/cn/elasticsearch/guide/current/_cat_api.html
Similar to the cat command on Linux. Note that these endpoints return a table rather than JSON.

To include column headers, add the ?v parameter.

GET /_cat
=^.^=
/_cat/allocation
/_cat/shards
/_cat/shards/{index}
/_cat/master
/_cat/nodes
/_cat/indices
/_cat/indices/{index}
/_cat/segments
/_cat/segments/{index}
/_cat/count
/_cat/count/{index}
/_cat/recovery
/_cat/recovery/{index}
/_cat/health
/_cat/pending_tasks
/_cat/aliases
/_cat/aliases/{alias}
/_cat/thread_pool
/_cat/plugins
/_cat/fielddata
/_cat/fielddata/{fields}
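
The endpoints in this list all follow the same pattern, so the URLs can be assembled mechanically. This is a minimal sketch; cat_url is a hypothetical helper, and the default host is the localhost:9200 address assumed throughout this article.

```python
def cat_url(endpoint, index=None, verbose=True, host="http://localhost:9200"):
    """Build a cat API URL, e.g. cat_url("indices") for /_cat/indices?v."""
    path = f"/_cat/{endpoint}"
    if index is not None:
        path += f"/{index}"
    # The v flag asks for column headers in the table output.
    return f"{host}{path}?v" if verbose else f"{host}{path}"


print(cat_url("indices"))                   # http://localhost:9200/_cat/indices?v
print(cat_url("shards", index="megacorp"))  # http://localhost:9200/_cat/shards/megacorp?v
print(cat_url("health", verbose=False))     # http://localhost:9200/_cat/health
```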
