Learning Elasticsearch for Big Data

CongBird

Article link: https://blog.csdn.net/CongBird/article/details/106351321

Retrieving large-scale data with Elasticsearch

1. Store the data in sorted order;
2. Keep the data separate from the index;
3. Compress the data.

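Key concepts of the ES data architecture (compared with the relational database MySQL)

(1) A database (DataBase) in a relational database corresponds to an index (Index) in ES.
(2) A database contains N tables (Table), which corresponds to one Index containing N types (Type). Note that mapping types were deprecated in Elasticsearch 7.x and removed in 8.0, so this analogy applies to older versions.
(3) The data in a database table (Table) consists of rows (ROW) and columns (column, attribute), which corresponds to a Type consisting of documents (Document) and fields (Field).
(4) In a relational database, the schema defines the tables, the fields of each table, and the relationships between tables and fields. Correspondingly, in ES the Mapping defines how the fields of a Type under an index are handled: how the index is built, the index type, whether the original JSON document is stored, whether the original JSON document is compressed, whether a field is analyzed (tokenized), and how it is analyzed.
(5) The database operations insert, delete, update, and search correspond in ES to PUT/POST (create), DELETE (delete), _update (update), and GET (search).

Getting started with search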

1. The simplest search uses match_all, which matches every document:

GET /bank/_search
{
  "query": { "match_all": {} }
}

2. Paginated search: from is the offset of the first hit (starting at 0) and size is the number of hits per page:

GET /bank/_search
{
  "query": { "match_all": {} },
  "from": 0,
  "size": 10
}
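
The relationship between a page number and from can be sketched in Python. This is a minimal sketch: the helper name page_body and the 1-based page convention are assumptions made for illustration, not part of Elasticsearch.

```python
import json


def page_body(page: int, size: int = 10) -> dict:
    """Build a match_all search body for a 1-based page number (hypothetical helper)."""
    if page < 1 or size < 1:
        raise ValueError("page and size must be >= 1")
    return {
        "query": {"match_all": {}},
        "from": (page - 1) * size,  # offset of the first hit on this page
        "size": size,               # number of hits per page
    }


if __name__ == "__main__":
    # Body for the third page of 10 hits, ready to send to GET /bank/_search.
    print(json.dumps(page_body(3, size=10), indent=2))
```

Keep in mind that from + size is limited by the index.max_result_window setting (10000 by default), so this style of paging is only suitable for relatively shallow pages.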

3. Sorted search with sort, for example sorting by the balance field in descending order:

GET /bank/_search
{
  "query": { "match_all": {} },
  "sort": { "balance": { "order": "desc" } }
}

4. Search that returns only selected fields, using _source, for example returning only the account_number and balance fields:

GET /bank/_search
{
  "query": { "match_all": {} },
  "_source": ["account_number", "balance"]
}
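
To make the effect of sort and _source concrete, here is a small in-memory Python model. It only imitates the observable result on a handful of hypothetical documents; the function name sort_and_project and the sample data are assumptions, and this is not how Elasticsearch works internally.

```python
from typing import Dict, List, Optional


def sort_and_project(docs: List[Dict], sort_field: str, order: str = "asc",
                     source: Optional[List[str]] = None) -> List[Dict]:
    """Sort documents by one field, then keep only the requested fields."""
    hits = sorted(docs, key=lambda d: d[sort_field], reverse=(order == "desc"))
    if source is not None:
        # Mimic "_source": [...] by dropping every field that was not requested.
        hits = [{name: d[name] for name in source if name in d} for d in hits]
    return hits


# Hypothetical sample documents, not taken from the real bank data set.
accounts = [
    {"account_number": 1, "balance": 3000, "state": "TX"},
    {"account_number": 2, "balance": 9000, "state": "ID"},
    {"account_number": 3, "balance": 5000, "state": "TX"},
]

if __name__ == "__main__":
    for hit in sort_and_project(accounts, "balance", "desc",
                                ["account_number", "balance"]):
        print(hit)
```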

5. Conditional search with match, for example finding the documents whose account_number is 20:

GET /bank/_search
{
  "query": {
    "match": {
      "account_number": 20
    }
  }
}

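6. Conditional search on a text field, for example finding documents whose address field contains mill. Comparing this with the previous search shows that match behaves as an exact match on numeric fields, while on text fields it performs full-text matching: the query string is analyzed into terms and a document matches if it contains any of them: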

GET /bank/_search
{
  "query": {
    "match": {
      "address": "mill"
    }
  },
  "_source": [
    "address",
    "account_number"
  ]
}
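
The difference between exact and full-text matching can be illustrated with a rough Python model. The analyze function below is a simplified stand-in for an analyzer (lowercase, split on non-alphanumeric characters), and the function names are assumptions made for this sketch; real analyzers are considerably more elaborate.

```python
import re
from typing import List


def analyze(text: str) -> List[str]:
    """Rough stand-in for an analyzer: lowercase, split on non-alphanumerics."""
    return [tok for tok in re.split(r"[^0-9a-z]+", text.lower()) if tok]


def match(field_value: str, query: str) -> bool:
    """True if any analyzed query term occurs among the field's terms.

    This mirrors the default OR behavior of a match query on a text field.
    """
    field_terms = set(analyze(field_value))
    return any(term in field_terms for term in analyze(query))


if __name__ == "__main__":
    print(match("990 Mill Road", "mill"))         # whole term matches, case-insensitively
    print(match("12 Miller Street", "mill"))      # "miller" is a different term
    print(match("880 Holmes Lane", "mill lane"))  # one of the two terms is enough
```

In this model a term has to match as a whole, so "mill" does not match "Miller", and a multi-word query matches as soon as one of its terms is present.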

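7. Phrase search with match_phrase, for example finding documents whose address field contains the phrase mill lane, that is, the terms mill and lane adjacent and in that order: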

GET /bank/_search
{
  "query": {
    "match_phrase": {
      "address": "mill lane"
    }
  }
}
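
The phrase requirement can be sketched in Python as a search for the query terms as a contiguous run inside the field's terms. As before, analyze is a simplified stand-in for an analyzer and the function names are assumptions for illustration; the sketch corresponds to a match_phrase query with the default slop of 0.

```python
import re
from typing import List


def analyze(text: str) -> List[str]:
    """Rough stand-in for an analyzer: lowercase, split on non-alphanumerics."""
    return [tok for tok in re.split(r"[^0-9a-z]+", text.lower()) if tok]


def match_phrase(field_value: str, phrase: str) -> bool:
    """True if the phrase terms appear consecutively and in order in the field."""
    field_terms = analyze(field_value)
    phrase_terms = analyze(phrase)
    n = len(phrase_terms)
    if n == 0:
        return False
    # Slide a window of length n over the field terms and compare.
    return any(field_terms[i:i + n] == phrase_terms
               for i in range(len(field_terms) - n + 1))


if __name__ == "__main__":
    print(match_phrase("198 Mill Lane", "mill lane"))    # adjacent, in order
    print(match_phrase("5 Lane Mill Road", "mill lane")) # wrong order
    print(match_phrase("7 Mill Hill Lane", "mill lane")) # not adjacent
```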

8. Compound search with bool. must means every clause has to match, for example finding documents whose address field contains both mill and lane:

GET /bank/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "address": "mill" } },
        { "match": { "address": "lane" } }
      ]
    }
  }
}

9. Compound search where should means at least one clause has to match, for example finding documents whose address field contains mill or lane:

GET /bank/_search
{
  "query": {
    "bool": {
      "should": [
        { "match": { "address": "mill" } },
        { "match": { "address": "lane" } }
      ]
    }
  }
}

10. Compound search where must_not means none of the clauses may match, for example finding documents whose address field contains neither mill nor lane:

GET /bank/_search
{
  "query": {
    "bool": {
      "must_not": [
        { "match": { "address": "mill" } },
        { "match": { "address": "lane" } }
      ]
    }
  }
}

11. Compound search combining must and must_not, for example finding documents whose age field equals 40 and whose state field does not contain ID:

GET /bank/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "age": "40" } }
      ],
      "must_not": [
        { "match": { "state": "ID" } }
      ]
    }
  }
}
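
The way must, should, and must_not combine can be modelled with plain Python predicates. This is a simplified sketch on hypothetical documents: the names term_in and bool_query are assumptions, scoring is ignored, and filter clauses are left out of the model.

```python
from typing import Callable, Dict, Sequence

Doc = Dict[str, object]
Clause = Callable[[Doc], bool]


def term_in(field: str, term: str) -> Clause:
    """Clause: the lowercased, whitespace-split field value contains the term."""
    return lambda doc: term.lower() in str(doc.get(field, "")).lower().split()


def bool_query(must: Sequence[Clause] = (), should: Sequence[Clause] = (),
               must_not: Sequence[Clause] = ()) -> Clause:
    """Combine clauses the way a bool query does (matching only, no scoring)."""
    def matches(doc: Doc) -> bool:
        if not all(clause(doc) for clause in must):
            return False
        if any(clause(doc) for clause in must_not):
            return False
        # Without any must clause, at least one should clause has to match.
        if should and not must:
            return any(clause(doc) for clause in should)
        return True
    return matches


# Hypothetical sample documents, not taken from the real bank data set.
accounts = [
    {"age": 40, "state": "ID", "address": "198 Mill Lane"},
    {"age": 40, "state": "TX", "address": "990 Mill Road"},
    {"age": 31, "state": "TX", "address": "880 Holmes Lane"},
    {"age": 25, "state": "CA", "address": "12 Oak Street"},
]

if __name__ == "__main__":
    query = bool_query(must=[term_in("age", "40")],
                       must_not=[term_in("state", "ID")])
    print([doc["address"] for doc in accounts if query(doc)])
```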

12. Filtered search with filter, for example keeping only the documents whose balance field is between 20000 and 30000 (both bounds inclusive):

GET /bank/_search
{
  "query": {
    "bool": {
      "must": { "match_all": {} },
      "filter": {
        "range": {
          "balance": {
            "gte": 20000,
            "lte": 30000
          }
        }
      }
    }
  }
}
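
A filter clause restricts which documents match without contributing to the relevance score. The Python sketch below builds the same request body for arbitrary bounds and spells out the inclusive meaning of gte and lte; the helper names range_filter_body and in_range are assumptions made for illustration.

```python
import json


def range_filter_body(field: str, gte: float, lte: float) -> dict:
    """Build a bool query that matches everything and filters on a closed range."""
    return {
        "query": {
            "bool": {
                "must": {"match_all": {}},
                "filter": {"range": {field: {"gte": gte, "lte": lte}}},
            }
        }
    }


def in_range(value: float, gte: float, lte: float) -> bool:
    """Local equivalent of the range check: both bounds are inclusive."""
    return gte <= value <= lte


if __name__ == "__main__":
    print(json.dumps(range_filter_body("balance", 20000, 30000), indent=2))
    print(in_range(30000, 20000, 30000))  # the upper bound itself is included
```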

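13. Aggregation with aggs, which groups the search results much like GROUP BY in MySQL. For example, aggregate on the state field to count the documents for each state. Setting size to 0 returns only the aggregation results, without the hits: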

GET /bank/_search
{
  "size": 0,
  "aggs": {
    "group_by_state": {
      "terms": {
        "field": "state.keyword"
      }
    }
  }
}

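14. Nested aggregation, for example aggregating on the state field to count the documents for each state and then computing the average balance within each state: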

GET /bank/_search
{
  "size": 0,
  "aggs": {
    "group_by_state": {
      "terms": {
        "field": "state.keyword"
      },
      "aggs": {
        "average_balance": {
          "avg": {
            "field": "balance"
          }
        }
      }
    }
  }
}
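
The result of the nested aggregation can be approximated in Python by grouping documents by state, counting them, and averaging balance per group. This is a minimal in-memory sketch on hypothetical data; the function name group_by_state and the tie-breaking order are assumptions, and a real terms aggregation returns only the top 10 buckets by default.

```python
from collections import defaultdict
from typing import Dict, List


def group_by_state(docs: List[Dict]) -> List[Dict]:
    """Group by state, then compute doc_count and the average balance per bucket."""
    balances = defaultdict(list)
    for doc in docs:
        balances[doc["state"]].append(doc["balance"])
    buckets = [
        {
            "key": state,
            "doc_count": len(values),
            "average_balance": {"value": sum(values) / len(values)},
        }
        for state, values in balances.items()
    ]
    # Largest buckets first; ties are ordered by key (an assumption of this sketch).
    buckets.sort(key=lambda b: (-b["doc_count"], b["key"]))
    return buckets


# Hypothetical sample documents, not taken from the real bank data set.
accounts = [
    {"state": "TX", "balance": 100},
    {"state": "ID", "balance": 50},
    {"state": "TX", "balance": 300},
]

if __name__ == "__main__":
    for bucket in group_by_state(accounts):
        print(bucket)
```

The equivalent SQL would be roughly SELECT state, COUNT(*), AVG(balance) FROM bank GROUP BY state.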