ES Notes 1

Author: guagua070707

Last modified: 2022-02-14 08:24:14


Categories: java, elk. Tags: elasticsearch, big data

First published: 2022-02-10 10:46:21

Original link: https://blog.csdn.net/yilu_beiyu/article/details/122855986


### Only focus on usage
### Abstraction layers separate the representation

1. Elasticsearch
2. Based on version 7.10

Elasticsearch Guide [7.10] | Elastic

Overview + nodes + shards + indexing a single document + bulk indexing + cat health check

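Elasticsearch uses a data structure called an inverted index that supports very fast full-text searches. An inverted index lists every unique word that appears in any document and identifies all of the documents each word occurs in.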

An index can be thought of as an optimized collection of documents, and each document is a collection of fields, which are the key-value pairs that contain your data. By default, Elasticsearch indexes all data in every field, and each indexed field has a dedicated, optimized data structure.
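To make the inverted index idea concrete, here is a minimal Python sketch. It is not how Lucene actually stores data. It assumes a toy analyzer that lowercases text and splits it on non-alphanumeric characters, and it maps each term to the sorted IDs of the documents that contain it. The sample documents are made up. Looking up a term then becomes a dictionary access instead of a scan over every document.

```python
import re
from collections import defaultdict


def analyze(text):
    """Toy analyzer (an assumption, not Elasticsearch's): lowercase, split on non-alphanumerics."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


def build_inverted_index(docs):
    """Map each term to the sorted IDs of the documents that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in analyze(text):
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


# Made-up sample documents
docs = {
    1: "244 Columbus Place",
    2: "198 Mill Lane",
    3: "Mill Avenue",
}
index = build_inverted_index(docs)
print(index["mill"])   # [2, 3]
print(index["place"])  # [1]
```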

Defining your own mappings enables you to:

- Distinguish between full-text string fields and exact value string fields
- Perform language-specific text analysis
- Optimize fields for partial matching
- Use custom date formats
- Use data types such as geo_point and geo_shape that cannot be automatically detected

It's often useful to index the same field in different ways for different purposes. For example, you might want to index a string field as both a text field for full-text search and as a keyword field for sorting or aggregating your data.
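As a hedged sketch of the mapping ideas above, the following Python builds, but does not send, a create-index request body. The index and field names (address, state, opened, location) are hypothetical.

- The address field is mapped as text with a keyword sub-field named keyword. This is the same shape that dynamic mapping gives string fields in 7.x, which is why the later aggregation examples use state.keyword.
- The custom date format and the geo_point field show options that dynamic mapping would not choose on its own.

```python
import json


def build_create_index_body():
    """Create-index request body with explicit mappings (all field names are hypothetical)."""
    return {
        "mappings": {
            "properties": {
                # Full-text search on "address"; exact matching, sorting and aggregating on "address.keyword"
                "address": {
                    "type": "text",
                    "fields": {"keyword": {"type": "keyword", "ignore_above": 256}},
                },
                # Exact-value string field: stored as a single, unanalyzed term
                "state": {"type": "keyword"},
                # Custom date format
                "opened": {"type": "date", "format": "yyyy/MM/dd"},
                # geo_point is not detected automatically by dynamic mapping
                "location": {"type": "geo_point"},
            }
        }
    }


body = build_create_index_body()
print(json.dumps(body, indent=2))
```

The printed JSON is what you would pass as the -d body of a PUT request to the (hypothetical) index name.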

The Elasticsearch REST APIs support structured queries, full-text queries, and complex queries that combine the two.

Full-text queries find all documents that match the query string and return them sorted by relevance, that is, by how well they match your search terms.
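Elasticsearch scores relevance with BM25 by default. The sketch below is a simplified, single-field version of the classic BM25 formula in Python. It assumes:

- a lowercase-and-split-on-whitespace analyzer;
- the usual defaults k1 = 1.2 and b = 0.75;
- no field-length norm encoding, no boosts, and no per-shard statistics.

The numbers therefore will not match a real _score. The ranking idea is the same: documents matching more terms, rarer terms, or shorter fields score higher. The sample documents are made up.

```python
import math
from collections import Counter


def bm25_scores(docs, query, k1=1.2, b=0.75):
    """Score documents against a query with a simplified classic BM25; best match first."""
    tokenized = {doc_id: text.lower().split() for doc_id, text in docs.items()}
    n_docs = len(tokenized)
    avgdl = sum(len(toks) for toks in tokenized.values()) / n_docs  # average field length
    scores = {}
    for doc_id, toks in tokenized.items():
        tf = Counter(toks)
        score = 0.0
        for term in query.lower().split():
            if tf[term] == 0:
                continue
            df = sum(1 for t in tokenized.values() if term in t)  # docs containing the term
            idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        if score > 0:
            scores[doc_id] = score
    # Highest score first, like hits sorted by _score
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


docs = {1: "198 Mill Lane", 2: "Mill Avenue", 3: "Lane Street"}  # made-up data
for doc_id, score in bm25_scores(docs, "mill lane"):
    print(doc_id, round(score, 3))
```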

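3. Analyze

Aggregations let you build complex summaries of your data and gain insight into key metrics, patterns, and trends. Instead of just finding the proverbial needle in a haystack, aggregations let you answer questions like: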

- How many needles are in the haystack?
- What is the average length of the needles?
- What is the median length of the needles, broken down by manufacturer?
- How many needles were added to the haystack in each of the last six months?

You can also use aggregations to answer more subtle questions, such as:
- What are your most popular needle manufacturers?
- Are there any unusual or anomalous clumps of needles?
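As a hedged illustration, the Python below builds one search body that would answer the first four questions in a single request. It assumes a hypothetical needles index with these fields:

- length: a number
- manufacturer: a keyword
- added_at: a date

The body combines four pieces:

- track_total_hits, so hits.total.value is an exact count;
- an avg aggregation for the average length;
- a terms aggregation with a percentiles sub-aggregation, where the 50th percentile is the median (approximate in Elasticsearch);
- a date_histogram with a monthly calendar interval, inside a filter aggregation limited to the last six months.

```python
import json


def needle_stats_body():
    """Search body answering the 'needles' questions (index and field names are hypothetical)."""
    return {
        "size": 0,  # only aggregation results, no hits
        "track_total_hits": True,  # exact hits.total.value answers "how many needles?"
        "aggs": {
            "average_length": {"avg": {"field": "length"}},
            "by_manufacturer": {
                "terms": {"field": "manufacturer"},
                "aggs": {
                    # 50th percentile = median (approximate in Elasticsearch)
                    "median_length": {"percentiles": {"field": "length", "percents": [50]}}
                },
            },
            "last_six_months": {
                "filter": {"range": {"added_at": {"gte": "now-6M/M"}}},
                "aggs": {
                    "per_month": {
                        "date_histogram": {"field": "added_at", "calendar_interval": "month"}
                    }
                },
            },
        },
    }


print(json.dumps(needle_stats_body(), indent=2))
```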

4. Abstraction layers

```
shard
==============
node
=====================
physical machine (or virtual machine)
```

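An Elasticsearch index is really just a logical grouping of one or more physical shards, where each shard is actually a self-contained index.

By distributing the documents in an index across multiple shards, and distributing those shards across multiple nodes, Elasticsearch ensures redundancy. This protects against hardware failures and increases query capacity as nodes are added to a cluster.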

There are two types of shards: primaries and replicas. Each document in an index belongs to one primary shard, and a replica shard is a copy of a primary shard.

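The number of primary shards in an index is fixed at the time that an index is created. The number of replica shards can be changed at any time without interrupting indexing or query operations.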
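Why is the primary shard count fixed? By default, Elasticsearch picks a document's shard from its routing value, which is the document _id unless you supply custom routing. Roughly, shard = hash(_routing) % number_of_primary_shards. Changing the divisor would send existing IDs to different shards, and their documents could no longer be found.

The Python sketch below illustrates this under two simplifying assumptions:

- it uses zlib.crc32 as a stand-in for the Murmur3 hash Elasticsearch really uses;
- it ignores the extra routing-shard factor that supports index splitting.

Actual shard numbers on a real cluster will differ.

```python
import zlib


def pick_shard(routing, num_primary_shards):
    """Simplified routing: hash the routing value, then take it modulo the primary shard count.

    zlib.crc32 is only a stand-in for Elasticsearch's Murmur3 hash.
    """
    return zlib.crc32(routing.encode("utf-8")) % num_primary_shards


ids = [str(i) for i in range(1, 11)]
with_3 = {doc_id: pick_shard(doc_id, 3) for doc_id in ids}
with_5 = {doc_id: pick_shard(doc_id, 5) for doc_id in ids}

# IDs whose shard would change if the same index had 5 primaries instead of 3
moved = [doc_id for doc_id in ids if with_3[doc_id] != with_5[doc_id]]
print("shards with 3 primaries:", with_3)
print("IDs that would move with 5 primaries:", moved)
```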

Requests to the Elasticsearch REST APIs take this general form:

```
curl -X<VERB> '<PROTOCOL>://<HOST>:<PORT>/<PATH>?<QUERY_STRING>' -d '<BODY>'
```

Adding user authentication:
- If the Elasticsearch security features are enabled, you must also provide a valid user name (and password) that has authority to run the API. For example, use the -u or --user cURL command parameter.
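The same request shape can also be built from code. The following Python sketch uses only the standard library to construct, without sending, the PUT request from the next example. HTTP basic authentication stands in for curl -u. The helper name build_es_request is hypothetical, and the password "changeme" is a placeholder. To actually send the request, pass it to urllib.request.urlopen.

```python
import base64
import json
import urllib.request


def build_es_request(method, path, body=None, host="http://localhost:9200",
                     user=None, password=None):
    """Build (but do not send) <VERB> <PROTOCOL>://<HOST>:<PORT>/<PATH>?<QUERY_STRING> with a JSON body."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(host + path, data=data, method=method)
    if data is not None:
        req.add_header("Content-Type", "application/json")
    if user is not None:
        # Equivalent of curl -u user:password (HTTP basic authentication)
        token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
        req.add_header("Authorization", "Basic " + token)
    return req


# "changeme" is a placeholder password
req = build_es_request("PUT", "/customer/_doc/1?pretty", {"name": "John Doe"},
                       user="elastic", password="changeme")
print(req.get_method(), req.full_url)
print(req.header_items())
```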

There are a variety of ingest options for Elasticsearch, but in the end they all do the same thing: put JSON documents into an Elasticsearch index.

```
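# Index a single document
curl -X PUT "localhost:9200/customer/_doc/1?pretty" -H 'Content-Type: application/json' -d'
{
  "name": "John Doe"
}
'
# This automatically creates the customer index (if it does not exist yet),
# adds a document with ID 1, and stores and indexes the name field.

# Retrieve the document you just indexed
curl -X GET "localhost:9200/customer/_doc/1?pretty"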
```
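To illustrate what these two calls do, here is a toy in-memory stand-in, not a client library. ToyCluster is a hypothetical class that mimics the documented behavior:

- indexing into a missing index creates it;
- the first PUT for an ID returns result "created" with _version 1;
- another PUT to the same ID replaces the whole document, bumps _version and returns "updated";
- GET returns the stored _source.

```python
class ToyCluster:
    """Hypothetical in-memory imitation of PUT and GET on /<index>/_doc/<id>."""

    def __init__(self):
        self.indices = {}  # index name -> {doc id -> (version, source)}

    def put_doc(self, index, doc_id, source):
        docs = self.indices.setdefault(index, {})  # index is created automatically
        version, _ = docs.get(doc_id, (0, None))
        docs[doc_id] = (version + 1, dict(source))  # the whole document is replaced
        return {"_index": index, "_id": doc_id, "_version": version + 1,
                "result": "created" if version == 0 else "updated"}

    def get_doc(self, index, doc_id):
        if index not in self.indices:
            # A real cluster answers 404 with index_not_found_exception here
            raise KeyError(f"no such index: {index}")
        docs = self.indices[index]
        if doc_id not in docs:
            return {"_index": index, "_id": doc_id, "found": False}
        version, source = docs[doc_id]
        return {"_index": index, "_id": doc_id, "_version": version,
                "found": True, "_source": source}


cluster = ToyCluster()
print(cluster.put_doc("customer", "1", {"name": "John Doe"})["result"])  # created
print(cluster.put_doc("customer", "1", {"name": "Jane Doe"})["result"])  # updated
print(cluster.get_doc("customer", "1")["_source"])  # {'name': 'Jane Doe'}
```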

```
abstraction layer:

bulk-indexing documents
===========================
abstraction layer: bulk API
===========================
representation

```

```
# Sample account document (one of the records in accounts.json)
{
  "account_number": 0,
  "balance": 16623,
  "firstname": "Bradshaw",
  "lastname": "Mckenzie",
  "age": 29,
  "gender": "F",
  "address": "244 Columbus Place",
  "employer": "Euron",
  "email": "bradshawmckenzie@euron.com",
  "city": "Hobucken",
  "state": "CO"
}

# Bulk-index the accounts into the bank index
curl -H "Content-Type: application/json" -XPOST "localhost:9200/bank/_bulk?pretty&refresh" --data-binary "@accounts.json"

# Verify that the bank index was created and the documents were indexed
curl "localhost:9200/_cat/indices?v=true"

# Use the cat health API to verify that your three-node cluster is up and running
curl -X GET "localhost:9200/_cat/health?v=true&pretty"

abstraction layer:
health check
========
cat
```
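The _bulk endpoint expects newline-delimited JSON. Each document is preceded by an action line such as {"index":{"_id":"1"}}, and the whole body must end with a newline. That is why the example uses --data-binary rather than -d, which strips newlines from file contents.

A minimal Python sketch that produces such a body from a list of dicts is shown below. The helper name to_bulk_ndjson is hypothetical, the ID is assumed to come from account_number, and the one-record list is a shortened copy of the sample account above. It prints instead of writing a file, so an existing accounts.json is not overwritten.

```python
import json


def to_bulk_ndjson(docs, id_field="account_number"):
    """Serialize documents as a _bulk body: one action line, then one source line, per document."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_id": str(doc[id_field])}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"  # the body must end with a newline


# Shortened copy of the sample account above
accounts = [{"account_number": 0, "balance": 16623, "firstname": "Bradshaw",
             "lastname": "Mckenzie", "state": "CO"}]
body = to_bulk_ndjson(accounts)
print(body, end="")
```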

5. Search data

Use the _search endpoint to search data. To access the full suite of search capabilities, you use the Elasticsearch Query DSL to specify the search criteria in the request body. You specify the name of the index you want to search in the request URI. For example, the following request retrieves all documents in the bank index sorted by account number:

```
curl -X GET "localhost:9200/bank/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "query": { "match_all": {} },
  "sort": [
    { "account_number": "asc" }
  ]
}
'
```

By default, the hits section of the response includes the first 10 documents that match the search criteria. The response also provides the following information about the search request:

- took: how long it took Elasticsearch to run the query, in milliseconds
- timed_out: whether or not the search request timed out
- _shards: how many shards were searched and a breakdown of how many shards succeeded, failed, or were skipped
- max_score: the score of the most relevant document found
- hits.total.value: how many matching documents were found
- hits.sort: the document's sort position (when not sorting by relevance score)
- hits._score: the document's relevance score (not applicable when using match_all)
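A small, hedged helper for reading those fields out of a parsed _search response in Python is shown below. It assumes the 7.x response layout:

- hits.total is an object with value and relation;
- each hit sits in hits.hits with _id, _score and, when you sort by a field, sort.

The function name is hypothetical. To use it, parse the curl output with json.loads and pass the resulting dict.

```python
def summarize_search_response(resp):
    """Pull the commonly used fields out of a parsed _search response (7.x layout assumed)."""
    hits = resp["hits"]
    return {
        "took_ms": resp["took"],
        "timed_out": resp["timed_out"],
        "shards_failed": resp["_shards"]["failed"],
        "total": hits["total"]["value"],
        # relation "eq" means the total is exact; "gte" means it is a lower bound
        "total_is_exact": hits["total"]["relation"] == "eq",
        "max_score": hits.get("max_score"),
        # (document ID, relevance score or None when sorting by a field, sort values)
        "hits": [(h["_id"], h.get("_score"), h.get("sort")) for h in hits["hits"]],
    }
```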

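Paginated search: each search request is self-contained, and Elasticsearch does not keep any state between requests. To page through the search hits, specify the from and size parameters in the request. The following request skips the first 10 hits and returns the next 10: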

```
# from = number of hits to skip, size = maximum number of hits to return

curl -X GET "localhost:9200/bank/_search?pretty" -H 'Content-Type: application/json' -d'
{
"query": { "match_all": {} },
"sort": [
{ "account_number": "asc" }
],
"from": 10,
"size": 10
}
'

```
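The from and size values follow directly from a page number. Below is a Python sketch that assumes 1-based page numbers. The helper name page_body and the default sort on account_number are assumptions taken from the bank example. It also checks the index.max_result_window setting, which defaults to 10,000 and caps from + size. Deeper pagination needs search_after instead.

```python
DEFAULT_MAX_RESULT_WINDOW = 10_000  # default value of the index.max_result_window setting


def page_body(page, page_size, query=None, sort=None,
              max_result_window=DEFAULT_MAX_RESULT_WINDOW):
    """Build a search body for a 1-based page number using from/size."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive")
    start = (page - 1) * page_size  # number of hits to skip
    if start + page_size > max_result_window:
        raise ValueError("from + size exceeds max_result_window; use search_after")
    return {
        "query": query or {"match_all": {}},
        "sort": sort or [{"account_number": "asc"}],  # bank-example default (an assumption)
        "from": start,
        "size": page_size,
    }


print(page_body(2, 10))  # same from/size as the request above
```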

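To match all documents, use match_all.

To search for specific terms within a field, use a match query. The query text is analyzed into terms, and by default a document matches if the field contains any of them. For example, the following request searches the address field for customers whose addresses contain mill or lane: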
```
curl -X GET "localhost:9200/bank/_search?pretty" -H 'Content-Type: application/json' -d'
{
"query": { "match": { "address": "mill lane" } }
}
'

```
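With the default or operator, match finds documents containing any of the analyzed terms, so "mill lane" finds addresses containing mill, lane, or both. Setting operator to and requires all of them.

The Python below is a local imitation of that behavior. It assumes a lowercase-and-split-on-whitespace analyzer, which is close to, but not the same as, the standard analyzer. The addresses are made up.

```python
def analyze(text):
    """Stand-in analyzer: lowercase and split on whitespace."""
    return text.lower().split()


def match_query(docs, field, query, operator="or"):
    """IDs of docs whose analyzed field contains any (or, with 'and', all) of the query terms."""
    query_terms = set(analyze(query))
    result = []
    for doc_id, doc in docs.items():
        field_terms = set(analyze(doc[field]))
        if operator == "and":
            hit = query_terms <= field_terms
        else:
            hit = bool(query_terms & field_terms)
        if hit:
            result.append(doc_id)
    return result


docs = {  # made-up addresses
    1: {"address": "198 Mill Lane"},
    2: {"address": "990 Mill Road"},
    3: {"address": "244 Columbus Place"},
    4: {"address": "Lane Mill Court"},
}
print(match_query(docs, "address", "mill lane"))         # [1, 2, 4]
print(match_query(docs, "address", "mill lane", "and"))  # [1, 4]
```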

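To perform a phrase search rather than matching individual terms, use match_phrase instead of match. For example, the following request only matches addresses that contain the phrase mill lane: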
```
curl -X GET "localhost:9200/bank/_search?pretty" -H 'Content-Type: application/json' -d'
{
"query": { "match_phrase": { "address": "mill lane" } }
}
'

```
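match_phrase still analyzes the text, but it requires the terms to appear next to each other and in the same order. Elasticsearch checks this using the term positions stored in the index.

The simplified Python sketch below assumes slop 0 and a whitespace analyzer. It shows why "198 Mill Lane" matches the phrase "mill lane" while "Lane Mill Court" does not.

```python
def analyze(text):
    """Stand-in analyzer: lowercase and split on whitespace."""
    return text.lower().split()


def positions(tokens):
    """Term -> list of positions, like the positional data kept alongside an inverted index."""
    pos = {}
    for i, tok in enumerate(tokens):
        pos.setdefault(tok, []).append(i)
    return pos


def matches_phrase(text, phrase):
    """True if the phrase terms occur consecutively and in order (slop = 0)."""
    terms = analyze(phrase)
    pos = positions(analyze(text))
    if not terms or any(t not in pos for t in terms):
        return False
    # Try every position of the first term as a start and check the following positions
    return any(all(start + k in pos[t] for k, t in enumerate(terms))
               for start in pos[terms[0]])


for address in ["198 Mill Lane", "Lane Mill Court", "990 Mill Road"]:
    print(address, matches_phrase(address, "mill lane"))
```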

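bool query

To build more complex queries, use a bool query to combine multiple query criteria. You can designate criteria as required (must match), desirable (should match), or undesirable (must not match).

Each must, should, and must_not element in a Boolean query is referred to as a query clause. How well a document meets the criteria in each must or should clause contributes to the document's relevance score.

The must_not clause is treated as a filter: it affects whether or not a document is included in the results, but does not contribute to its score. You can also add an explicit filter clause to include or exclude documents based on structured data. For example, the following request uses a range filter to limit the results to accounts with a balance between 20,000 and 30,000 (inclusive):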

```
curl -X GET "localhost:9200/bank/_search?pretty" -H 'Content-Type: application/json' -d'
{
"query": {
"bool": {
"must": { "match_all": {} },
"filter": {
"range": {
"balance": {
"gte": 20000,
"lte": 30000
}
}
}
}
}
}
'

```
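The split between scoring clauses and filtering clauses can be imitated locally. The hedged Python sketch below models clauses as predicate functions over plain dicts and replaces BM25 with a toy score of one point per matching must or should clause. In this model:

- must and filter both have to match;
- must_not excludes documents;
- only must and should add to the score;
- with no must or filter clause, at least one should clause has to match.

The function names and sample accounts are hypothetical.

```python
def bool_query(docs, must=(), should=(), filters=(), must_not=()):
    """Imitate bool query semantics with predicate functions.

    Returns (doc_id, score) pairs, best first. Toy scoring: 1 point per matching must/should clause.
    """
    # With no must or filter clause, at least one should clause has to match
    min_should = 1 if should and not must and not filters else 0
    results = []
    for doc_id, doc in docs.items():
        if not all(p(doc) for p in must) or not all(p(doc) for p in filters):
            continue  # required clauses decide inclusion
        if any(p(doc) for p in must_not):
            continue  # must_not only excludes; it never adds to the score
        should_hits = sum(1 for p in should if p(doc))
        if should_hits < min_should:
            continue
        results.append((doc_id, len(must) + should_hits))  # filters add nothing to the score
    return sorted(results, key=lambda r: r[1], reverse=True)


def match_all(doc):
    return True


def balance_in_range(doc):
    # Same bounds as the range filter: gte 20000, lte 30000
    return 20000 <= doc["balance"] <= 30000


accounts = {  # made-up sample accounts
    1: {"balance": 25000, "state": "ID"},
    2: {"balance": 15000, "state": "ID"},
    3: {"balance": 29000, "state": "TX"},
}
print(bool_query(accounts, must=[match_all], filters=[balance_in_range]))  # [(1, 1), (3, 1)]
```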

6. Analyze results with aggregations

The following request uses a terms aggregation to group the accounts in the bank index by the state.keyword field. It returns the ten states with the most accounts; by default, buckets are sorted by document count in descending order.

```
curl -X GET "localhost:9200/bank/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "size": 0,
  "aggs": {
    "group_by_state": {
      "terms": {
        "field": "state.keyword"
      }
    }
  }
}
'
```

Because the request sets size=0, the response contains only the aggregation results.
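What the terms aggregation computes can be reproduced on a handful of in-memory documents. The Python sketch below follows the default size of 10 and orders buckets by document count, descending. Two things differ from a real cluster:

- ties are broken by key in this sketch, an assumption about presentation;
- on a multi-shard index, real counts can be approximate, which the sketch ignores.

The helper name and sample data are hypothetical.

```python
from collections import Counter


def terms_agg(docs, field, size=10):
    """Bucket docs by the exact value of `field`; largest doc_count first, top `size` buckets."""
    counts = Counter(doc[field] for doc in docs if field in doc)
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))  # count desc, then key
    return [{"key": key, "doc_count": n} for key, n in ordered[:size]]


accounts = [{"state": "TX"}, {"state": "MD"}, {"state": "TX"},
            {"state": "ID"}, {"state": "MD"}, {"state": "TX"}]  # made-up data
for bucket in terms_agg(accounts, "state"):
    print(bucket)
```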

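Aggregations can be nested. The following request nests an avg aggregation inside the group_by_state aggregation. The outer aggregation groups the accounts by state, and the inner one calculates the average balance within each state's group.

```
curl -X GET "localhost:9200/bank/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "size": 0,
  "aggs": {
    "group_by_state": {
      "terms": {
        "field": "state.keyword"
      },
      "aggs": {
        "average_balance": {
          "avg": {
            "field": "balance"
          }
        }
      }
    }
  }
}
'
```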
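A local Python imitation of this nested aggregation is shown below. The function name is hypothetical, and each bucket is assumed to carry its metric as {"value": ...} in the style of the response. The order_by parameter chooses between sorting buckets by doc_count (the default) and sorting by the nested average; both sort descending. The sample data is made up.

```python
from collections import defaultdict


def terms_with_avg(docs, group_field, metric_field, agg_name="average_balance",
                   size=10, order_by="_count"):
    """terms buckets on group_field, each with a nested avg of metric_field."""
    values = defaultdict(list)
    for doc in docs:
        values[doc[group_field]].append(doc[metric_field])
    buckets = [
        {"key": key, "doc_count": len(vals), agg_name: {"value": sum(vals) / len(vals)}}
        for key, vals in values.items()
    ]
    if order_by == "_count":
        buckets.sort(key=lambda bk: (-bk["doc_count"], bk["key"]))
    elif order_by == agg_name:
        buckets.sort(key=lambda bk: (-bk[agg_name]["value"], bk["key"]))
    else:
        raise ValueError(f"unsupported order_by: {order_by}")
    return buckets[:size]


accounts = [  # made-up data
    {"state": "TX", "balance": 30000},
    {"state": "TX", "balance": 10000},
    {"state": "MD", "balance": 40000},
]
print(terms_with_avg(accounts, "state", "balance"))
print(terms_with_avg(accounts, "state", "balance", order_by="average_balance"))
```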

Abstraction layers of nested aggregations:

```
aggregation level n
=======
...
========
aggregation level 3
==========
aggregation level 2
============
aggregation level 1
```

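To sort the buckets by the result of the nested aggregation instead of by document count, set order in the terms aggregation. The following request sorts the states by average_balance in descending order:

```
curl -X GET "localhost:9200/bank/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "size": 0,
  "aggs": {
    "group_by_state": {
      "terms": {
        "field": "state.keyword",
        "order": {
          "average_balance": "desc"
        }
      },
      "aggs": {
        "average_balance": {
          "avg": {
            "field": "balance"
          }
        }
      }
    }
  }
}
'
```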