Learning Elasticsearch (2): Installation and First Steps

xiaotong_cloud

Reference documentation

1 Installation

1.1 Check the Java version

ES 6.0.1, the latest release at the time of writing, requires at least Java 8.
The manual recommends Oracle JDK version 1.8.0_131.

java -version
echo $JAVA_HOME

1.2 Installing on Linux

curl -L -O https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-6.0.1.tar.gz
tar -xvf elasticsearch-6.0.1.tar.gz
cd elasticsearch-6.0.1/bin
./elasticsearch

1.3 Installing on Windows

1.4 Starting the cluster

# Windows cmd
cd %PROGRAMFILES%\Elastic\Elasticsearch\bin
# PowerShell
cd $env:PROGRAMFILES\Elastic\Elasticsearch\bin
.\elasticsearch.exe
# Set the cluster name and node name at startup
./elasticsearch -Ecluster.name=my_cluster_name -Enode.name=my_node_name

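2 Exploring the cluster

2.1 REST API

With the REST API you can:

Check the health, status, and statistics of the cluster, nodes, and indices
Administer cluster, node, and index data and metadata
Perform CRUD and search operations against indices
Execute advanced search operations such as paging, sorting, filtering, scripting, and aggregations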

2.2 Installing Kibana

Reference documentation

Download and extract Kibana
Edit config/kibana.yml and set elasticsearch.url to point at your ES instance
Run Kibana

# linux
bin/kibana
# windows
bin\kibana.bat

Open http://localhost:5601 in a browser
Kibana User Guide

2.3 Cluster health

2.3.1 Checking cluster health

GET /_cat/health?v

The same query can be run from Postman.
To get the result as JSON:

GET 127.0.0.1:9200/_cat/health?format=json&pretty
# response
[
    {
        "epoch": "1541249930",
        "timestamp": "20:58:50",
        "cluster": "elasticsearch",
        "status": "green",
        "node.total": "1",
        "node.data": "1",
        "shards": "0",
        "pri": "0",
        "relo": "0",
        "init": "0",
        "unassign": "0",
        "pending_tasks": "0",
        "max_task_wait_time": "-",
        "active_shards_percent": "100.0%"
    }
]

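The three health states:

green: everything is working; the cluster is fully functional
yellow: all data is available, but some replicas are not yet allocated; the cluster is fully functional
red: some data is unavailable; the cluster is only partially functional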
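
The three states can also be checked from a script. The following is a minimal sketch, assuming the JSON shape returned by `_cat/health?format=json` as shown above; the function name `describe_health` and the message wording are illustrative and not part of Elasticsearch.

```python
import json

# Meaning of each status, following the descriptions above.
STATUS_MEANING = {
    "green": "all shards allocated; cluster fully functional",
    "yellow": "all data available, some replicas unallocated; cluster fully functional",
    "red": "some data unavailable; cluster partially functional",
}


def describe_health(cat_health_json: str) -> str:
    """Summarize the first row of a `_cat/health?format=json` response."""
    row = json.loads(cat_health_json)[0]
    status = row["status"]
    meaning = STATUS_MEANING.get(status, "unknown status")
    return f"{row['cluster']}: {status} ({meaning}), unassigned shards: {row['unassign']}"


# A shortened copy of the response shown above.
sample = '[{"cluster": "elasticsearch", "status": "green", "unassign": "0"}]'
print(describe_health(sample))
```

In a real script the JSON string would come from an HTTP GET against the cluster rather than from a literal.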

2.3.2 Listing the nodes in the cluster

GET /_cat/nodes?v

GET 127.0.0.1:9200/_cat/nodes?format=json&pretty
# response
[
    {
        "ip": "127.0.0.1",
        "heap.percent": "11",
        "ram.percent": "41",
        "cpu": "8",
        "load_1m": null,
        "load_5m": null,
        "load_15m": null,
        "node.role": "mdi",
        "master": "*",
        "name": "my_first_node"
    }
]
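
In the response above, `node.role` is a string of one-letter codes (m for master-eligible, d for data, i for ingest), and `*` in the `master` column marks the elected master. A small sketch of decoding the role string follows; the helper name `decode_roles` is hypothetical.

```python
# One-letter role codes used in the node.role column of _cat/nodes.
ROLE_LETTERS = {"m": "master-eligible", "d": "data", "i": "ingest"}


def decode_roles(node_role: str) -> list:
    """Expand a node.role value such as "mdi" into readable role names."""
    if node_role == "-":
        # A node with no roles only coordinates requests.
        return ["coordinating only"]
    return [ROLE_LETTERS.get(letter, f"unknown({letter})") for letter in node_role]


print(decode_roles("mdi"))
```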

2.4 Listing all indices

GET /_cat/indices?v

2.5 Creating an index

Create an index named customer, then list all indices

PUT /customer?pretty
GET /_cat/indices?v

PUT 127.0.0.1:9200/customer?pretty
# response
{
    "acknowledged": true,
    "shards_acknowledged": true,
    "index": "customer"
}
GET 127.0.0.1:9200/_cat/indices?format=json&pretty
# response
[
    {
        "health": "yellow",	# only one node, so the replica cannot be allocated; hence yellow
        "status": "open",
        "index": "customer",
        "uuid": "BpesQm0kRhWBauTfht4UZg",
        "pri": "5",					# 5 primary shards
        "rep": "1",					# 1 replica
        "docs.count": "0",	# 0 documents
        "docs.deleted": "0",
        "store.size": "1.1kb",
        "pri.store.size": "1.1kb"
    }
]

2.5 Indexing and querying a document

2.5.1 index

Index a customer document with ID 1 into the customer index

PUT /customer/_doc/1?pretty
{
  "name": "John Doe"
}
# response
{
    "_index": "customer",
    "_type": "_doc",
    "_id": "1",
    "_version": 1,
    "result": "created",
    "_shards": {
        "total": 2,
        "successful": 1,
        "failed": 0
    },
    "_seq_no": 0,
    "_primary_term": 1
}

2.5.2 query

GET /customer/_doc/1?pretty
# response
{
    "_index": "customer",
    "_type": "_doc",
    "_id": "1",
    "_version": 1,
    "found": true,
    "_source": {								# the full JSON document is returned
        "name": "John Doe"
    }
}

2.6 Deleting an index

DELETE /customer?pretty
# response
{
    "acknowledged": true
}
GET /_cat/indices?v
# response
health status index uuid pri rep docs.count docs.deleted store.size pri.store.size

2.7 Pattern for accessing data in ES

<HTTP Verb> /<Index>/<Type>/<ID>
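
As an illustration of this pattern, the sketch below assembles the path part of a request from its components. The helper name `doc_path` is hypothetical; percent-encoding each segment is a defensive choice of this sketch, not something the original text requires.

```python
from urllib.parse import quote


def doc_path(index: str, doc_type: str, doc_id=None) -> str:
    """Build /<Index>/<Type>[/<ID>] with each segment percent-encoded."""
    parts = [index, doc_type]
    if doc_id is not None:
        parts.append(str(doc_id))
    return "/" + "/".join(quote(part, safe="") for part in parts)


print("GET", doc_path("customer", "_doc", 1))
print("DELETE", doc_path("customer", "_doc", 2))
```

The HTTP verb is chosen separately: GET reads, PUT indexes or replaces, and DELETE removes the resource named by the path.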

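3 Modifying data

3.1 Modifying data: replacing a document

A PUT to the ID of an existing document makes ES replace (reindex) that document with the new body

# Create the index and a document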
PUT /customer
PUT /customer/_doc/1?pretty
{
  "name": "John Doe"
}
GET  /customer/_doc/1?pretty
# Replace the existing document
PUT /customer/_doc/1?pretty
{
	"name":"Tom Hu"
}
GET  /customer/_doc/1?pretty
# Create a new document
PUT /customer/_doc/2?pretty
{
	"name":"Yaping Leaf"
}

The ID is optional when creating a document. If none is given, ES generates a random ID; in that case the request uses POST instead of PUT.
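
The difference between indexing with and without an explicit ID can be sketched with the standard library alone. This is an illustrative sketch, assuming a single local node at 127.0.0.1:9200; `BASE_URL` and `index_request` are names invented here, and the function only builds the request without sending it.

```python
import json
from urllib.request import Request

BASE_URL = "http://127.0.0.1:9200"  # assumption: a local single-node cluster


def index_request(index: str, doc: dict, doc_id=None) -> Request:
    """Build (but do not send) a request that indexes `doc`.

    With an ID the request is PUT /<index>/_doc/<id>;
    without one it is POST /<index>/_doc and ES picks the ID.
    """
    path = f"/{index}/_doc" + (f"/{doc_id}" if doc_id is not None else "")
    method = "PUT" if doc_id is not None else "POST"
    return Request(
        BASE_URL + path,
        data=json.dumps(doc).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method=method,
    )


with_id = index_request("customer", {"name": "Tom Hu"}, doc_id=1)
without_id = index_request("customer", {"name": "Yaping Leaf"})
print(with_id.get_method(), with_id.full_url)
print(without_id.get_method(), without_id.full_url)
```

Passing either object to `urllib.request.urlopen` would send it to a running cluster.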

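3.2 Modifying data: updating a document

Besides indexing and replacing documents, you can also update them.
An update is not performed in place: ES deletes the old document and indexes a new one with the update applied.

# Change the name and add an age field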
POST /customer/_doc/1/_update?pretty
{
	"doc":{"name":"Xiaotong Who","age":20}
}

Updates can also be performed with simple scripts.
ctx._source refers to the source document that is being updated.

# Increase the age by 5
POST /customer/_doc/1/_update?pretty
{
	"script":"ctx._source.age += 5"
}
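
To make the effect of the two update styles concrete, here is a local simulation in plain Python. It only mimics the observable result (a partial `doc` is merged into the stored source, and the script adds 5 to `age`); it is not how ES implements updates, and `apply_partial_update` is a hypothetical helper.

```python
import copy


def apply_partial_update(source: dict, doc: dict) -> dict:
    """Return a new document with the fields of `doc` merged into `source`.

    Nested objects are merged recursively; other values are overwritten.
    The stored original is left untouched, mirroring the fact that an
    update produces a new document instead of modifying the old one.
    """
    updated = copy.deepcopy(source)
    for key, value in doc.items():
        if isinstance(value, dict) and isinstance(updated.get(key), dict):
            updated[key] = apply_partial_update(updated[key], value)
        else:
            updated[key] = value
    return updated


original = {"name": "Tom Hu"}
after_doc = apply_partial_update(original, {"name": "Xiaotong Who", "age": 20})
# Local equivalent of the script "ctx._source.age += 5".
after_script = dict(after_doc, age=after_doc["age"] + 5)
print(after_doc, after_script)
```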

3.3 Deleting documents

DELETE /customer/_doc/2?pretty

3.4 Batch processing

# Index two documents in a single bulk request
POST /customer/_doc/_bulk?pretty
{"index":{"_id":"1"}}
{"name": "Tom Hu" }
{"index":{"_id":"2"}}
{"name": "Yaping Leaf" }
# Update document 1 and delete document 2
POST /customer/_doc/_bulk?pretty
{"update":{"_id":"1"}}
{"doc": { "name": "Tom Hu becomes Xiaotong Who" } }
{"delete":{"_id":"2"}}
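
A bulk body is newline-delimited JSON: one action line, optionally followed by one payload line, with a final newline at the end. The sketch below builds such a body from Python values; the function name `bulk_body` and the tuple layout are choices made for this example.

```python
import json


def bulk_body(actions) -> str:
    """Serialize (action, metadata, payload) tuples into a bulk request body.

    `payload` is None for actions such as delete that have no second line.
    """
    lines = []
    for action, meta, payload in actions:
        lines.append(json.dumps({action: meta}))
        if payload is not None:
            lines.append(json.dumps(payload))
    # The bulk API expects the body to end with a newline.
    return "\n".join(lines) + "\n"


body = bulk_body([
    ("update", {"_id": "1"}, {"doc": {"name": "Tom Hu becomes Xiaotong Who"}}),
    ("delete", {"_id": "2"}, None),
])
print(body, end="")
```

When sending this body over HTTP, the Content-Type header should be application/x-ndjson.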

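4 Exploring the data

4.1 search API

4.1.1 Sending requests

Sending the search parameters in the request URI

GET /bank/_search?q=*&sort=account_number:asc&pretty

The request goes to the _search endpoint
The q=* parameter matches all documents in the index
The sort=account_number:asc parameter sorts the results by the account_number field in ascending order
The pretty parameter tells ES to pretty-print the JSON response so that it is easier to read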
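
When the URI form is generated from code, the parameters need to be URL-encoded. A minimal sketch using only the standard library follows; `uri_search_path` is a hypothetical helper, and keeping `*` and `:` unescaped is purely for readability.

```python
from urllib.parse import urlencode


def uri_search_path(index: str, **params) -> str:
    """Build /<index>/_search?<params> for a URI search."""
    # `*` and `:` are left unescaped so the result matches the hand-written form.
    query = urlencode(params, safe="*:")
    return f"/{index}/_search?{query}"


print(uri_search_path("bank", q="*", sort="account_number:asc", pretty="true"))
```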

# response (partial)
{
  "took" : 63,				# the search took 63 milliseconds
  "timed_out" : false,		# the search did not time out
  "_shards" : {				# 5 shards were searched: 5 successful, 0 skipped, 0 failed
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {				# search results
    "total" : 1000,			# total number of documents matching the search criteria
    "max_score" : null,
    "hits" : [ {			# the actual hits, by default the first 10 documents
      "_index" : "bank",
      "_type" : "_doc",
      "_id" : "0",
      "sort": [0],			# sort key of this hit
      "_score" : null,	
      "_source" : {"account_number":0,"balance":16623,"firstname":"Bradshaw","lastname":"Mckenzie","age":29,"gender":"F","address":"244 Columbus Place","employer":"Euron","email":"bradshawmckenzie@euron.com","city":"Hobucken","state":"CO"}
    }, {
      "_index" : "bank",
      "_type" : "_doc",
      "_id" : "1",
      "sort": [1],
      "_score" : null,
      "_source" : {"account_number":1,"balance":39225,"firstname":"Amber","lastname":"Duke","age":32,"gender":"M","address":"880 Holmes Lane","employer":"Pyrami","email":"amberduke@pyrami.com","city":"Brogan","state":"IL"}
    }, ...
    ]
  }
}

Sending the search parameters in the request body

GET /bank/_search
{
  "query": { "match_all": {} },
  "sort": [
    { "account_number": "asc" }
  ]
}

4.2 The query language

Example 1: paging through results

GET /bank/_search
{
  "query": { "match_all": {} },
  "from": 10,
  "size": 10
}

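query: defines the query
match_all: matches every document in the index
from: zero-based offset of the first hit to return, so 10 skips the first 10 hits (defaults to 0)
size: number of hits to return starting at from (defaults to 10)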
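
Because from is a zero-based offset, page N of a result set starts at (N - 1) * size. The sketch below turns a one-based page number into a request body; `page_body` is a hypothetical helper.

```python
def page_body(page: int, per_page: int = 10) -> dict:
    """Build a match_all search body for a one-based page number."""
    if page < 1 or per_page < 1:
        raise ValueError("page and per_page must be at least 1")
    return {
        "query": {"match_all": {}},
        "from": (page - 1) * per_page,  # number of hits to skip
        "size": per_page,               # number of hits to return
    }


print(page_body(2))
```

Page 2 with 10 hits per page produces the same body as Example 1.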

Example 2: sort by the balance field in descending order

GET /bank/_search
{
  "query": { "match_all": {} },
  "sort": { "balance": { "order": "desc" } }
}

4.3 Executing searches

4.3.1 Returning only some fields of the document

GET /bank/_search
{
  "query": { "match_all": {} },
  "_source": ["account_number", "balance"] 	# return only the account_number and balance fields
}

4.3.2 Conditional queries

Return the document whose account_number is 20

GET /bank/_search
{
  "query": { "match": { "account_number": 20 } }
}

Return the documents whose address field contains the term "mill" or the term "lane"

GET /bank/_search
{
  "query": { "match": { "address": "mill lane" } }
}

Return the documents whose address field contains the phrase "mill lane"

GET /bank/_search
{
  "query": { "match_phrase": { "address": "mill lane" } }
}
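
The difference between match and match_phrase can be illustrated locally. The sketch below is a rough approximation that assumes the field is analyzed by lowercasing and splitting on whitespace; real analyzers do more, and the function names are invented for this example. The addresses other than "880 Holmes Lane" are made up.

```python
def tokens(text: str) -> list:
    """Very rough stand-in for text analysis: lowercase, split on whitespace."""
    return text.lower().split()


def match_any(field_value: str, query: str) -> bool:
    """Approximates match: true if any query term occurs in the field."""
    field_terms = set(tokens(field_value))
    return any(term in field_terms for term in tokens(query))


def match_phrase(field_value: str, query: str) -> bool:
    """Approximates match_phrase: the terms must appear consecutively, in order."""
    field_terms, phrase = tokens(field_value), tokens(query)
    if not phrase:
        return False
    n = len(phrase)
    return any(field_terms[i:i + n] == phrase for i in range(len(field_terms) - n + 1))


for address in ["198 Mill Lane", "990 Mill Road", "880 Holmes Lane"]:
    print(address, match_any(address, "mill lane"), match_phrase(address, "mill lane"))
```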

4.3.3 Bool queries

4.3.3.1 must clause: every match query must be true for a document to match

Find the documents whose address field contains both mill and lane

GET /bank/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "address": "mill" } },
        { "match": { "address": "lane" } }
      ]
    }
  }
}

4.3.3.2 should clause: a document matches as soon as one of the match queries is true

Find the documents whose address field contains "mill" or "lane"

GET /bank/_search
{
  "query": {
    "bool": {
      "should": [
        { "match": { "address": "mill" } },
        { "match": { "address": "lane" } }
      ]
    }
  }
}

4.3.3.3 must_not clause: a document matches only if none of the match queries is true

Find the documents whose address field contains neither "mill" nor "lane"

GET /bank/_search
{
  "query": {
    "bool": {
      "must_not": [
        { "match": { "address": "mill" } },
        { "match": { "address": "lane" } }
      ]
    }
  }
}

4.3.3.4 Combining bool clauses

Find the accounts whose age is 40 and whose state is not ID:

GET /bank/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "age": "40" } }
      ],
      "must_not": [
        { "match": { "state": "ID" } }
      ]
    }
  }
}
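
When bool queries are assembled in code, a small builder keeps the nesting manageable. This is a sketch with hypothetical helper names (`match`, `bool_query`); it only produces the request body and omits clauses that were not supplied.

```python
def match(field: str, value) -> dict:
    """Build a single match clause."""
    return {"match": {field: value}}


def bool_query(must=None, should=None, must_not=None, filters=None) -> dict:
    """Build a search body with a bool query from lists of clauses."""
    clauses = {"must": must, "should": should, "must_not": must_not, "filter": filters}
    # Drop clause types that were not supplied.
    return {"query": {"bool": {name: value for name, value in clauses.items() if value}}}


body = bool_query(must=[match("age", "40")], must_not=[match("state", "ID")])
print(body)
```

The resulting body is the same as the combined query above.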

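4.4 Executing filters

4.4.1 document score (the _score field)

The score is a numeric value that measures how well a document matches the query we specified: the higher the score, the more relevant the document; the lower the score, the less relevant.
Queries do not always need to produce scores, in particular when they are only used to filter the document set; ES detects these cases and skips the score computation.
The bool query also supports filter clauses. A filter clause restricts the documents matched by the other clauses without changing how their scores are computed.

4.4.2 range query

A range query filters documents by a range of values, and is generally used for numeric or date fields.
Example: return the accounts with a balance between 20000 and 30000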

GET /bank/_search
{
  "query": {
    "bool": {
      "must": { "match_all": {} },	# match all documents
      "filter": {					# filter clause
        "range": {					# range query
          "balance": {				# the balance field
            "gte": 20000,			# greater than or equal to 20000
            "lte": 30000			# less than or equal to 30000
          }
        }
      }
    }
  }
}
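
The gte and lte bounds are both inclusive. The following local sketch applies the same condition to a few in-memory records to show which ones pass; `in_range` is a hypothetical helper, and the third account is made up for the example.

```python
def in_range(value, gte=None, lte=None) -> bool:
    """Inclusive range check mirroring the gte / lte bounds of a range query."""
    if gte is not None and value < gte:
        return False
    if lte is not None and value > lte:
        return False
    return True


accounts = [
    {"account_number": 0, "balance": 16623},
    {"account_number": 1, "balance": 39225},
    {"account_number": 2, "balance": 25000},  # hypothetical record
]
matching = [a["account_number"] for a in accounts if in_range(a["balance"], gte=20000, lte=30000)]
print(matching)
```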

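4.5 Executing aggregations

Aggregations group data and extract statistics from it, much like SQL GROUP BY and the SQL aggregate functions.
A single request can return both the search hits and the aggregation results.

4.5.1 Group By

Group the accounts by state, order the groups by count in descending order, and return the top 10 states (the default size and order of the terms aggregation).
The equivalent SQL would be: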

SELECT state, COUNT(*) FROM bank GROUP BY state ORDER BY COUNT(*) DESC LIMIT 10;

GET /bank/_search
{
  "size": 0,	# do not return the search hits; only the aggregation results are needed
  "aggs": {
    "group_by_state": {
      "terms": {
        "field": "state.keyword"
      }
    }
  }
}

# response
{
  "took": 29,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped" : 0,
    "failed": 0
  },
  "hits" : {
    "total" : 1000,
    "max_score" : 0.0,
    "hits" : [ ]
  },
  "aggregations" : {
    "group_by_state" : {
      "doc_count_error_upper_bound": 20,
      "sum_other_doc_count": 770,
      "buckets" : [ {
        "key" : "ID",
        "doc_count" : 27
      }, {
        "key" : "TX",
        "doc_count" : 27
      }, {
        "key" : "AL",
        "doc_count" : 25
      }, {
        "key" : "MD",
        "doc_count" : 25
      }, {
        "key" : "TN",
        "doc_count" : 23
      }, {
        "key" : "MA",
        "doc_count" : 21
      }, {
        "key" : "NC",
        "doc_count" : 21
      }, {
        "key" : "ND",
        "doc_count" : 21
      }, {
        "key" : "ME",
        "doc_count" : 20
      }, {
        "key" : "MO",
        "doc_count" : 20
      } ]
    }
  }
}
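
For readers more familiar with Python than with SQL, the same grouping can be sketched over an in-memory list. This is only a local analogy of what the terms aggregation computes; `top_terms` is a hypothetical helper, ties are broken by key to match the order seen in the response above, and the sample records are made up.

```python
from collections import Counter


def top_terms(docs, field: str, size: int = 10) -> list:
    """Count documents per value of `field` and return the `size` largest buckets."""
    counts = Counter(doc[field] for doc in docs)
    # Highest count first; ties ordered by key.
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [{"key": key, "doc_count": count} for key, count in ordered[:size]]


docs = [{"state": s} for s in ["ID", "TX", "ID", "AL", "TX", "ID"]]
print(top_terms(docs, "state"))
```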

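4.5.2 Nesting aggregations inside aggregations

Nested aggregations are typically used to compute further summaries over each bucket.
Compute the average balance for each of the top 10 states by account count: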

GET /bank/_search
{
  "size": 0,
  "aggs": {
    "group_by_state": {
      "terms": {
        "field": "state.keyword"
      },
      "aggs": {
        "average_balance": {
          "avg": {
            "field": "balance"
          }
        }
      }
    }
  }
}

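Sort the state buckets by their average balance in descending order, so that the 10 states with the highest average balance are returned: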

GET /bank/_search
{
  "size": 0,
  "aggs": {
    "group_by_state": {
      "terms": {
        "field": "state.keyword",
        "order": {
          "average_balance": "desc"
        }
      },
      "aggs": {
        "average_balance": {
          "avg": {
            "field": "balance"
          }
        }
      }
    }
  }
}
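
A local analogy of this nested aggregation: group by state, compute the average balance per group, then sort the groups by that average. The sketch uses a flattened bucket layout for brevity (ES nests the average under a `value` key); `average_by` is a hypothetical helper and the third record is made up.

```python
from collections import defaultdict


def average_by(docs, group_field: str, value_field: str, size: int = 10) -> list:
    """Group docs, average `value_field` per group, sort by the average descending."""
    groups = defaultdict(list)
    for doc in docs:
        groups[doc[group_field]].append(doc[value_field])
    buckets = [
        {"key": key, "doc_count": len(values), "average_balance": sum(values) / len(values)}
        for key, values in groups.items()
    ]
    buckets.sort(key=lambda bucket: bucket["average_balance"], reverse=True)
    return buckets[:size]


docs = [
    {"state": "CO", "balance": 16623},
    {"state": "IL", "balance": 39225},
    {"state": "CO", "balance": 20000},  # hypothetical record
]
print(average_by(docs, "state", "balance"))
```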

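Group the accounts by age bracket (20-29, 30-39, 40-49), then by gender within each bracket, and finally compute the average account balance for each gender in each age bracket: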

GET /bank/_search
{
  "size": 0,
  "aggs": {
    "group_by_age": {
      "range": {
        "field": "age",
        "ranges": [
          {
            "from": 20,
            "to": 30
          },
          {
            "from": 30,
            "to": 40
          },
          {
            "from": 40,
            "to": 50
          }
        ]
      },
      "aggs": {
        "group_by_gender": {
          "terms": {
            "field": "gender.keyword"
          },
          "aggs": {
            "average_balance": {
              "avg": {
                "field": "balance"
              }
            }
          }
        }
      }
    }
  }
}
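
The three levels of this request (range buckets, then terms buckets, then an average) can be mirrored locally as follows. This is an illustrative sketch: the bucket labels such as "20-30" are this example's own, the lower bound of each range is inclusive and the upper bound exclusive, and the last two records are made up.

```python
from collections import defaultdict

# Same brackets as the request above: from is inclusive, to is exclusive.
AGE_RANGES = [(20, 30), (30, 40), (40, 50)]


def average_balance_by_age_and_gender(docs) -> dict:
    """Map (age bracket, gender) to the average balance of the matching docs."""
    sums = defaultdict(lambda: [0, 0])  # (bracket, gender) -> [total, count]
    for doc in docs:
        for low, high in AGE_RANGES:
            if low <= doc["age"] < high:
                entry = sums[(f"{low}-{high}", doc["gender"])]
                entry[0] += doc["balance"]
                entry[1] += 1
    return {key: total / count for key, (total, count) in sums.items()}


docs = [
    {"age": 29, "gender": "F", "balance": 16623},
    {"age": 32, "gender": "M", "balance": 39225},
    {"age": 25, "gender": "F", "balance": 20000},  # hypothetical record
    {"age": 55, "gender": "M", "balance": 1000},   # hypothetical, outside all brackets
]
print(average_balance_by_age_and_gender(docs))
```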