Elasticsearch 7.9.0 Basic Usage

1咸菜

Article link: https://blog.csdn.net/ISxiancai/article/details/109848164

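Download and Deployment

Download Elasticsearch from the official GitHub repository.
There are many Elasticsearch installation tutorials, and they all cover broadly the same steps.
Note: Elasticsearch refuses to start as the root user. You also need to set the JVM heap size (-Xms/-Xmx in config/jvm.options), and to allow remote connections you must set the bind address (network.host in config/elasticsearch.yml).

Basic Usage

Concepts

Start with how Elasticsearch stores data. Elasticsearch is a document-oriented database: each record is a document, serialized as JSON. For example, a user record: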

{
    "name" :     "name",
    "sex" :      "sex",
    "age" :      21,
    "birthDate": "2020/11/11",
    "about" :    "xxx",
    "interests": [ "x1", "x2" ]
}

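In a relational database such as MySQL, you would naturally create a User table with matching columns. In Elasticsearch the same record is a document; in older versions the document belonged to a User type, and many types lived inside one index. A simple mapping between relational and Elasticsearch terms:

Relational DB  ⇒ Database  ⇒ Table  ⇒ Row       ⇒ Column

Elasticsearch  ⇒ Index     ⇒ Type   ⇒ Document  ⇒ Field
That is how older versions were usually described. From 7.x on, mapping types are deprecated and scheduled for removal, so the database-to-index analogy no longer holds.
As a transitional release, 7.x fixes the type name to _doc by default. The mapping becomes:
Relational DB  ⇒ Table  ⇒ Row       ⇒ Column

Elasticsearch  ⇒ Index  ⇒ Document  ⇒ Field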

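Field Types

String types

text:
The value is run through an analyzer (tokenized) and then indexed
Supports fuzzy and exact queries
Does not support aggregations by default (fielddata must be enabled first)
Used for full-text search

keyword:
Not analyzed; the whole value is indexed as a single term
Supports fuzzy and exact queries
Supports aggregations (in 7.9.0, keyword does not support the max, min, sum, and avg aggregations)
Used for exact keyword search
Note: some articles also mention a string type; it has been replaced by text and keyword.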

Integer types

Type	Value range
byte	-128 to 127
short	-32768 to 32767
integer	-2^31 to 2^31-1
long	-2^63 to 2^63-1
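
The ranges in the table follow from the bit width of each type (signed two's complement). A small Python sketch of that arithmetic; the helper name signed_range is chosen here purely for illustration:

```python
# Bit widths of the Elasticsearch integer types listed in the table.
BIT_WIDTHS = {"byte": 8, "short": 16, "integer": 32, "long": 64}

def signed_range(bits):
    """Return (min, max) of a signed two's-complement integer with `bits` bits."""
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1

for name, bits in BIT_WIDTHS.items():
    low, high = signed_range(bits)
    print(f"{name}: {low} to {high}")
```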

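Floating-point types

double, float, half_float, scaled_float

Date type

A date value can be given in any of the following forms:

1. A formatted date string, such as "2018-01-13" or "2018-01-13 12:10:30"
2. A long number of milliseconds since the epoch
    (the epoch is the Unix reference time, 1970-01-01 00:00:00 UTC)
3. An integer number of seconds since the epoch

Internally, Elasticsearch converts dates to UTC and stores them as a long representing milliseconds since the epoch.

After creating an index you can set its mapping. If the mapping does not specify a format for a date field, the default is "strict_date_optional_time||epoch_millis", that is, either of two formats:
1. strict_date_optional_time: the ISO 8601 form, for example 2018-08-31T14:56:18.000+08:00
2. epoch_millis: a millisecond timestamp, for example 1515150699465. A seconds timestamp such as 1515150699 is read as milliseconds under this format (which lands in January 1970), so second-resolution values need the epoch_second format.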

Example of setting the format:
    "properties":{
        "postdate":{
            "type":"date",
            "format":"yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis"
        }
    }
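
To make the epoch-based representations concrete, here is a minimal Python sketch that converts between milliseconds since the epoch and UTC datetimes. The function names are illustrative and not part of any Elasticsearch client; the sketch only mirrors the arithmetic, not Elasticsearch's own parser.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def from_epoch_millis(ms):
    """Interpret an integer as milliseconds since the epoch (UTC)."""
    return EPOCH + timedelta(milliseconds=ms)

def to_epoch_millis(dt):
    """Convert a timezone-aware datetime to whole milliseconds since the epoch."""
    return (dt - EPOCH) // timedelta(milliseconds=1)

# A millisecond timestamp resolves to a date in 2018.
print(from_epoch_millis(1515150699465).isoformat())  # 2018-01-05T11:11:39.465000+00:00

# The same instant in seconds, misread as milliseconds, lands in January 1970.
print(from_epoch_millis(1515150699).isoformat())

# An ISO 8601 time with a +08:00 offset is normalized to UTC before conversion.
local = datetime(2018, 8, 31, 14, 56, 18, tzinfo=timezone(timedelta(hours=8)))
print(to_epoch_millis(local))
```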

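boolean type

Accepts only true, false, "true", and "false"

binary type

A binary field holds binary data, such as an image, encoded as a Base64 string. By default the field is not searchable and is not stored separately from _source. The binary type accepts only the doc_values and store mapping parameters.

array type

Elasticsearch has no dedicated array data type. By default any field can hold
zero or more values, so every field is effectively an array,
as long as all values in the array have the same data type.
Arrays work out of the box with no configuration.
Mixed-type arrays such as [ 10, "some string" ] are not supported. Common array forms:

String array: [ "one", "two" ]
Integer array: productid: [ 1, 2 ]

Array of objects (documents):
	"user": [ { "name": "Mary", "age": 12 }, { "name": "John", "age": 10 } ]
Elasticsearch internally flattens an array of objects into
	{ "user.name": ["Mary", "John"], "user.age": [12, 10] }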
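
The flattening step can be sketched in a few lines of Python. This is only an illustration of the resulting shape, not Elasticsearch's actual implementation, and the function name is made up for this example. It also shows a consequence worth knowing: once flattened, the link between a given name and its age is lost.

```python
from collections import defaultdict

def flatten_object_array(field, objects):
    """Collapse a list of objects into one multi-valued list per dotted field name."""
    flat = defaultdict(list)
    for obj in objects:
        for key, value in obj.items():
            flat[f"{field}.{key}"].append(value)
    return dict(flat)

users = [{"name": "Mary", "age": 12}, {"name": "John", "age": 10}]
flat = flatten_object_array("user", users)
print(flat)
# {'user.name': ['Mary', 'John'], 'user.age': [12, 10]}

# The pairing between name and age is gone, so a check such as
# "name is Mary AND age is 10" succeeds although no single user has both.
print("Mary" in flat["user.name"] and 10 in flat["user.age"])  # True
```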

ip type

An ip field stores IPv4 or IPv6 addresses

CRUD Operations

All requests below are run in the Kibana Dev Tools console.

# Create an index
PUT nba
# Add a mapping to the index
# (play_year is mapped as integer so that the range queries and numeric aggregations below work)
PUT nba/_mapping
{
    "properties": {
        "name": {
            "type": "text"
        },
        "team_name": {
            "type": "text"
        },
        "position": {
            "type": "keyword"
        },
        "play_year": {
            "type": "integer"
        },
        "jerse_no": {
            "type": "keyword"
        }
    }
}
# View the mapping that was just set
GET nba/_mapping

# New fields can be appended later. The type of an existing field cannot be
# changed in place (for example keyword to integer); that requires recreating the index.
PUT nba/_mapping
{
    "properties": {
        "birthday": {
            "type": "date"
        }
    }
}
# Index a single document
PUT nba/_doc/1
{
    "name": "Harden",
    "team_name": "Rockets",
    "position": "shooting guard",
    "play_year": 10,
    "jerse_no": "13",
    "country": "AE"
}
# Index documents in bulk (each action line is followed by its document on one line)
POST nba/_bulk
{"index": {"_index": "nba", "_id": "1"}}
{"name": "ha deng", "team_name": "huojian", "position": "shooting guard", "play_year": 10, "jerse_no": "13", "birthday": "1999-01-04"}
{"index": {"_index": "nba", "_id": "2"}}
{"name": "ku li si", "team_name": "yongshi", "position": "point guard", "play_year": 10, "jerse_no": "10", "birthday": "1998-02-02"}
{"index": {"_index": "nba", "_id": "3"}}
{"name": "zhan mu si", "team_name": "huren", "position": "small forward", "play_year": 15, "jerse_no": "23", "birthday": "1999-02-03"}
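
When the bulk body is generated by a program rather than typed into Kibana, the thing to get right is the newline-delimited JSON layout: one action line, then one document line, with a newline at the very end. A minimal Python sketch using only the standard library; build_bulk_body is a hypothetical helper, not a client API:

```python
import json

def build_bulk_body(index, docs):
    """Build an NDJSON body for the _bulk API from {id: document} pairs."""
    lines = []
    for doc_id, doc in docs.items():
        # Action line: what to do and where.
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
        # Source line: the document itself, serialized on a single line.
        lines.append(json.dumps(doc, ensure_ascii=False))
    # The body must end with a newline.
    return "\n".join(lines) + "\n"

players = {
    "1": {"name": "ha deng", "play_year": 10},
    "2": {"name": "ku li si", "play_year": 10},
}
body = build_bulk_body("nba", players)
print(body)
```

The resulting string would be sent as the request body with the Content-Type header set to application/x-ndjson.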

# Term query: the search value is not analyzed; a document matches only if the field contains that exact term
GET nba/_search
{
    "query": {
        "term": {
            "jerse_no": "23"
        }
    }
}
# terms query: matches documents containing any of several exact terms
GET nba/_search
{
    "query": {
        "terms": {
            "jerse_no": [
                "23",
                "13"
            ]
        }
    }
}

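For a match query, Elasticsearch first analyzes the query string and splits it into terms. A document matches and is returned if the analyzed field contains any one (or all) of those terms; if it contains none of them, it does not match the query.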

# match_all
GET nba/_search
{
    "query": {
        "match_all": {}
    },
    "from": 0,
    "size": 10
}
# match (with the default analyzer, searching for just "ha" also matches)
GET nba/_search
{
    "query": {
        "match": {
            "name":"ha deng"  
        }
    },
    "from": 0,
    "size": 10
}
# multi_match
GET nba/_search
{
    "query": {
        "multi_match": {
            "query": "si",
            "fields": ["team_name", "name"]
        }
    }
}
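
The difference between term and match comes down to whether the search input is analyzed. The following Python sketch imitates that with a deliberately crude analyzer (lowercase, split on whitespace). The real standard analyzer does more, so treat this as an assumption-laden model rather than Elasticsearch's behavior in every case.

```python
def analyze(text):
    """Very rough stand-in for the standard analyzer: lowercase and split on whitespace."""
    return text.lower().split()

def term_matches(field_text, value):
    """term: the value is NOT analyzed and must equal one indexed term exactly."""
    return value in analyze(field_text)

def match_matches(field_text, query):
    """match: the query IS analyzed; by default one shared term is enough."""
    indexed = analyze(field_text)
    return any(token in indexed for token in analyze(query))

name = "ha deng"  # indexed as the terms "ha" and "deng"
print(term_matches(name, "ha"))        # True
print(term_matches(name, "ha deng"))   # False: no single term equals "ha deng"
print(match_matches(name, "ha deng"))  # True
print(match_matches(name, "Ha XYZ"))   # True: "ha" alone is enough
```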

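A match query is a full-text query: ES analyzes the query string and builds the query from the resulting terms. match_phrase also analyzes the query string and searches for those terms, but it keeps only documents that contain all of the terms, adjacent and in the same order.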

# match_phrase 
GET nba/_search
{
    "query": {
      "match_phrase": {
        "name": "li si"
      }
    }
}
#match_phrase_prefix
GET nba/_search
{
    "query": {
        "match_phrase_prefix": {
            "name": "h"
        }
    }
}
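
The extra constraint that match_phrase adds (all terms present, adjacent, in order) can be sketched in Python as follows. The sketch assumes the default slop of 0 and a whitespace tokenizer, both simplifications made for illustration.

```python
def phrase_matches(field_text, phrase):
    """True if the phrase's terms occur in the field consecutively and in order."""
    tokens = field_text.lower().split()
    wanted = phrase.lower().split()
    if not wanted:
        return False
    n = len(wanted)
    # Slide a window of len(wanted) over the field's tokens.
    return any(tokens[i:i + n] == wanted for i in range(len(tokens) - n + 1))

print(phrase_matches("ku li si", "li si"))  # True
print(phrase_matches("ku li si", "ku si"))  # False: terms are not adjacent
print(phrase_matches("ku li si", "si li"))  # False: wrong order
```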

Exists query: finds documents that have a non-null value in the given field

GET nba/_search
{
    "query": {
        "exists": {
          "field": "name"
        }
    }
}

Wildcard query: supports wildcard patterns, where * matches any sequence of characters (including none) and ? matches any single character

GET nba/_search
{
    "query": {
      "wildcard": {
        "name": {
          "value": "*h*"
        }
      }
    }
}

Regexp query: matches terms against a regular expression

GET nba/_search
{
    "query": {
      "regexp": {
        "team_name": "h.*"
      }
    }
}

Ids query (finds the documents with id 1 and 2)

GET nba/_search
{
    "query": {
      "ids": {
        "values": [1,2]
      }
    }
}

Range query: finds documents whose field value (date, number, or string) falls within the given range

GET nba/_search
{
  "query": {
    "range": {
      "play_year": {
        "gte": 0,
        "lte": 10
      }
    }
  },
  "from": 0,
  "size": 20
}
# Find players born between 1996 and 2000
GET nba/_search
{
  "query": {
    "range": {
      "birthday": {
        "gte": 1996,
        "lte": 2000,
        "format": "yyyy"
      }
    }
  },
  "from": 0,
  "size": 20
}

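Bool query

Type	Description
must	The clause must match; contributes to the score
filter	The clause must match, but does not contribute to the score
must_not	The clause must not match
should	The clause should match; documents that match it score higher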

GET nba/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "name": "zhan"
          }
        }
      ]
    }
  },
  "from": 0,
  "size": 20
}
# filter: matching documents are not scored (_score is 0)
GET nba/_search
{
  "query": {
    "bool": {
      "filter": [
        {
          "match": {
            "name": "zhan"
          }
        }
      ]
    }
  },
  "from": 0,
  "size": 20
}

GET nba/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "name": "i si"
          }
        }
      ],
      "must_not": [
        {
          "term": {
            "name": {
              "value": "zhan"
            }
          }
        }
      ]
    }
  },
  "from": 0,
  "size": 20
}
# should: documents outside the play_year range are still returned; only their score differs
GET nba/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "name": "i si"
          }
        }
      ],
      "must_not": [
        {
          "term": {
            "name": {
              "value": "zhn"
            }
          }
        }
      ],
      "should": [
        {
          "range": {
            "play_year": {
              "gte": 11,
              "lte": 20
            }
          }
        }
      ]
    }
  },
  "from": 0,
  "size": 20
}

Sorting

GET nba/_search
{
  "query": {
    "match": {
      "name": "si"
    }
  },
  "sort": [
    {
      "play_year": {
        "order": "desc"
      }
    }
  ], 
  "from": 0,
  "size": 20
}

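What Are ES Aggregation Queries?

Aggregation is an important database feature: it computes summary values over the result set of a query, such as the maximum, minimum, sum, or average of a field (or of a computed expression). As a search engine, ES also offers strong aggregation capabilities.
Aggregations that compute metrics such as max, min, sum, and avg over a data set are called metric aggregations in ES.
Relational databases also let you group query results with GROUP BY and then compute metrics per group. In ES this grouping is called a bucket aggregation.

avg is shown here; max, min, and sum are used the same way. None of them are supported on keyword fields.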

GET nba/_search
{
  "query": {
    "term": {
      "name": {
        "value": "si"
      }
    }
  },
  "aggs": {
    "avgAge": {
      "avg": {
        "field": "play_year"
      }
    }
  },
  "size": 0
}

value_count: counts the values of a field across the matching documents (documents without the field are not counted)

GET nba/_search
{
  "query": {
    "term": {
      "name": {
        "value": "si"
      }
    }
  },
  "aggs": {
    "countPlayerYear": {
      "value_count": {
        "field": "jerse_no"
      }
    }
  },
  "size": 0
}

Cardinality: counts distinct values (the result is approximate)

GET nba/_search
{
  "query": {
    "term": {
      "play_year": {
        "value": "10"
      }
    }
  },
  "aggs": {
    "countAget": {
      "cardinality": {
        "field": "play_year"
      }
    }
  },
  "size": 0
}

stats returns five values at once: count, max, min, avg, and sum

GET nba/_search
{
  "query": {
    "term": {
      "name": {
        "value": "si"
      }
    }
  },
  "aggs": {
    "statsAge": {
      "stats": {
        "field": "play_year"
      }
    }
  },
  "size": 0
}
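
As a cross-check of what the stats aggregation reports, the same five numbers can be computed by hand. The Python sketch below assumes the three sample players indexed earlier, of which two ("ku li si" and "zhan mu si") contain the term "si" and have play_year 10 and 15. The function name is illustrative.

```python
def stats(values):
    """Compute the five numbers returned by the stats aggregation."""
    count = len(values)
    total = sum(values)
    return {
        "count": count,
        "min": min(values) if values else None,
        "max": max(values) if values else None,
        "avg": total / count if count else None,
        "sum": total,
    }

# play_year of the two sample players whose name contains the term "si"
print(stats([10, 15]))
# {'count': 2, 'min': 10, 'max': 15, 'avg': 12.5, 'sum': 25}
```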

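Extended stats returns four more results than stats: sum of squares, variance, standard deviation, and the interval of the mean plus or minus two standard deviations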

GET nba/_search
{
  "query": {
    "term": {
      "name": {
        "value": "si"
      }
    }
  },
  "aggs": {
    "statsAge": {
      "extended_stats": {
        "field": "play_year"
      }
    }
  },
  "size": 0
}

Percentiles: returns the field values at given percentiles; by default the 1st, 5th, 25th, 50th, 75th, 95th, and 99th

GET nba/_search
{
  "query": {
    "term": {
      "name": {
        "value": "si"
      }
    }
  },
  "aggs": {
    "statsAge": {
      "percentiles": {
        "field": "play_year"
      }
    }
  },
  "size": 0
}

Custom percentiles

GET nba/_search
{
  "query": {
    "term": {
      "name": {
        "value": "si"
      }
    }
  },
  "aggs": {
    "statsAge": {
      "percentiles": {
        "field": "play_year",
        "percents": [
          20,
          50,
          75
        ]
      }
    }
  },
  "size": 0
}

Bucket Aggregations

As described above, a bucket aggregation groups the documents matched by a query, much like GROUP BY in a relational database; metric aggregations can then be computed within each bucket.

Terms aggregation: groups documents by the values of a field

GET nba/_search
{
  "query": {
    "term": {
      "name": {
        "value": "si"
      }
    }
  },
  "aggs": {
    "statsAge": {
      "terms": {
        "field": "play_year",
        "size": 10
      }
    }
  },
  "size": 0
}

Order: sorts the buckets by _key or _count

GET nba/_search
{
  "query": {
    "term": {
      "name": {
        "value": "si"
      }
    }
  },
  "aggs": {
    "statsAge": {
      "terms": {
        "field": "play_year",
        "size": 10,
        "order": {
          "_key": "desc"
        }
      }
    }
  },
  "size": 0
}

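Filtering buckets with include and exclude
Note again: aggregate on keyword fields where possible. Aggregating on a text field requires enabling fielddata, which can hurt performance.
include also accepts a regular expression string such as "Lakers|Ro.|Warriors."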

GET nba/_search
{
  "aggs": {
    "aggsTeamName": {
      "terms": {
        "field": "jerse_no",
        "include": [
          "13",
          "10",
          "23"
        ],
        "exclude": [
          "23"
        ],
        "size": 30,
        "order": {
          "avgAge": "desc"
        }
      },
      "aggs": {
        "avgAge": {
          "avg": {
            "field": "play_year"
          }
        }
      }
    }
  },
  "size": 0
}

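Range aggregation: groups documents into value ranges
keyword fields do not support the range aggregation
Each range can be given a custom name (alias) with the key parameter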

GET nba/_search
{
  "aggs": {
    "ageRange": {
      "range": {
        "field": "play_year",
        "ranges": [
          {
            "to": 10
          },
          {
            "from": 10,
            "to": 35
          },
          {
            "from": 35
          }
        ]
      }
    }
  },
  "size": 0
}
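
One detail of the range aggregation that is easy to miss is that each bucket includes its from value and excludes its to value, so a value of exactly 10 falls into the bucket starting at 10, not the one ending at 10. A Python sketch of that bucketing rule, using non-overlapping ranges and the sample play_year values (10, 10, 15) as assumptions:

```python
def range_buckets(values, ranges):
    """Count values per range; `from` is inclusive and `to` is exclusive."""
    buckets = []
    for r in ranges:
        low, high = r.get("from"), r.get("to")
        count = sum(
            1 for v in values
            if (low is None or v >= low) and (high is None or v < high)
        )
        buckets.append({"from": low, "to": high, "doc_count": count})
    return buckets

ranges = [{"to": 10}, {"from": 10, "to": 35}, {"from": 35}]
for bucket in range_buckets([10, 10, 15], ranges):
    print(bucket)
```

With these inputs all three values land in the middle bucket, because 10 is excluded from the first range and included in the second.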

Date range aggregation: groups documents into date ranges

GET nba/_search
{
  "aggs": {
    "birthDayRange": {
      "date_range": {
        "field": "birthday",
        "format": "MM-yyyy",
        "ranges": [
          {
            "to": "01-1989"
          },
          {
            "from": "01-1989",
            "to": "01-1999"
          },
          {
            "from": "01-1999",
            "to": "01-2009"
          },
          {
            "from": "01-2009"
          }
        ]
      }
    }
  },
  "size": 0
}

Date histogram aggregation
Buckets documents by day, month, year, and so on. Supported intervals include year (1y), quarter (1q), month (1M), week (1w), day (1d), hour (1h), minute (1m), and second (1s)
fixed_interval units: milliseconds (ms), seconds (s), minutes (m), hours (h), days (d)
calendar_interval values: minute (1m), hour (1h), day (1d), week (1w), month (1M), quarter (1q), year (1y)

GET nba/_search
{
  "aggs": {
    "birthday_aggs": {
      "date_histogram": {
        "field": "birthday",
        "format": "yyyy",
        "calendar_interval": "year"
      }
    }
  },
  "size": 0
}
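
Conceptually, a date histogram with a yearly calendar interval counts documents per calendar year. The Python sketch below reproduces those counts for the three sample birthdays; it is only an illustration, it ignores time zones, and it does not emit the empty buckets that Elasticsearch can return between years.

```python
from collections import Counter
from datetime import date

def yearly_histogram(birthdays):
    """Count ISO dates (yyyy-MM-dd) per calendar year, sorted by year."""
    counts = Counter(date.fromisoformat(d).year for d in birthdays)
    return sorted(counts.items())

print(yearly_histogram(["1999-01-04", "1998-02-02", "1999-02-03"]))
# [(1998, 1), (1999, 2)]
```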

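query_string query: if you know the Lucene query syntax, you can write a query string in that syntax directly. When ES receives the request, its query parser parses the string and builds the corresponding query.

# OR / AND
# Search a single field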
GET nba/_search
{
  "query": {
    "query_string": {
      "default_field": "name",
      "query": "si OR ha"
    }
  }, 
  "size": 100
}
# Search multiple fields
GET nba/_search
{
  "query": {
    "query_string": {
      "fields": ["name", "team_name"],
      "query": "si OR ha"
    }
  }, 
  "size": 100
}
