Basic Elasticsearch (ES) Concepts

青山流水在深谷

Article link: https://blog.csdn.net/dymkkj/article/details/81278490

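1. Index: comparable to a schema (database) in an RDBMS

2. Type: comparable to a table

3. Document: comparable to one row (one record) of a relation in an RDBMS

4. Field: comparable to a column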

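ES uses a master/slave architecture but is also decentralized: a client can talk to the cluster through any node, and the nodes are equivalent to one another (a P2P-style network). A single node can be both a master node and a data node.

Node roles: master node and data node.

Master node: responsible for changes to the cluster state: adding and removing nodes, and managing indices, mappings, replicas, and shard reallocation.
To guard against split brain, configure several master-eligible nodes. In production, give each master a dedicated server that stores no data, so that heavy load cannot leave the master unable to serve requests or even trigger a split brain.
Configuration items in the configuration file of such a machine: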
node.master: true
node.data: false
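
The split-brain protection described above rests on a majority rule. A minimal sketch follows, assuming the usual quorum formula (master-eligible nodes / 2 + 1) that Zen discovery applies through the discovery.zen.minimum_master_nodes setting; that setting and the function names below are not part of the original notes.

```python
def minimum_master_nodes(master_eligible_nodes: int) -> int:
    """Return the quorum: a strict majority of the master-eligible nodes."""
    if master_eligible_nodes < 1:
        raise ValueError("a cluster needs at least one master-eligible node")
    return master_eligible_nodes // 2 + 1


def may_elect_master(visible_nodes: int, master_eligible_nodes: int) -> bool:
    """A network partition may elect a master only if it can see a quorum."""
    return visible_nodes >= minimum_master_nodes(master_eligible_nodes)


if __name__ == "__main__":
    # Three master-eligible nodes split into partitions of 2 and 1.
    print(minimum_master_nodes(3))   # quorum size
    print(may_elect_master(2, 3))    # the larger side can still elect a master
    print(may_elect_master(1, 3))    # the isolated node cannot
```

With three master-eligible nodes the quorum is two, so at most one side of a network split can ever hold a master. This is why an odd number of dedicated masters is the common choice.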

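Data node:
Data nodes store the data, that is, the Lucene indices. In production, data nodes are also given dedicated machines, configured as follows.
Configuration items in the configuration file of such a machine:
node.master: false
node.data: true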

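How new nodes are discovered
Multicast: a node sends ping requests to the configured multicast group and port, and the other nodes respond. Once the master node is found, the new node joins its cluster. If no master is found via multicast, the nodes elect one among themselves (Zen discovery uses its own election procedure; it is sometimes loosely described as Paxos).
In production, multicast is not recommended, because unrelated nodes may end up joining the cluster:
discovery.zen.ping.multicast.enabled: false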

Unicast: ping requests are sent only to the hosts and ports listed in the yml configuration file.
discovery.zen.ping.unicast.hosts:
- 192.168.1.10:9300
- 192.168.1.11
- seeds.mydomain.com

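Note: there is no need to list every node of the cluster. A new node contacts the configured hosts and ports and joins the cluster as soon as it discovers one. For high availability, though, do not list too few nodes.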

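Shards:
ES can split an index into several pieces, called shards.
1. The number of shards can only be specified when the index is created.
2. Each shard is a fully functional, independent index that can be placed on any node.

An index can have several shards, and each shard is a Lucene index.
From the cluster's point of view, a shard is similar to a partition of a Kafka topic.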

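Why shards matter:
1. They let you split data horizontally and so scale your content volume.
2. They let operations run distributed and in parallel across shards (on a multi-node cluster), which raises throughput and performance.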
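
To make the horizontal split concrete, here is a small sketch of how a document can be routed to a shard. ES documents the rule as shard = hash(routing value) % number of primary shards, where the routing value defaults to the document id. The CRC32 hash below is only a stand-in chosen for illustration (an assumption), and the function name is hypothetical.

```python
import zlib


def route_to_shard(routing_value: str, number_of_primary_shards: int) -> int:
    """Pick the primary shard for a document: hash(routing) % number of primaries."""
    if number_of_primary_shards < 1:
        raise ValueError("an index needs at least one primary shard")
    # Assumption: CRC32 stands in for the hash function that ES uses internally.
    return zlib.crc32(routing_value.encode("utf-8")) % number_of_primary_shards


if __name__ == "__main__":
    doc_ids = [str(i) for i in range(1, 6)]
    for shards in (5, 6):
        placement = {doc_id: route_to_shard(doc_id, shards) for doc_id in doc_ids}
        print(shards, placement)
```

Because the shard number depends on the modulus, changing the number of primary shards would send existing documents to different shards. That is a plausible reading of why the shard count is fixed at index creation.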

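Replicas:

In a networked environment you have to design for failure: when a node or shard goes offline or disappears because of a fault, a failover mechanism is essential.
ES lets you create one or more copies of each shard; these copies are called replica shards, or replicas for short.

Why replication matters:
1. It provides high availability when a shard or node fails or goes offline. For this reason it is very important that a replica shard is never placed on the same node as its original (primary) shard.
2. Because searches can run on all replicas in parallel, replicas scale out your search volume and throughput.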

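Defaults (in the ES versions these notes cover): ES gives each index 5 primary shards and 1 replica, which means the cluster needs at least two nodes. The index then has 5 primary shards and 5 replica shards (1 complete copy), so there are two full copies of the data and 10 shards in the cluster.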
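
The arithmetic behind these numbers can be sketched as follows. The function name and the result keys are made up for illustration; only the two settings, number_of_shards and number_of_replicas, come from ES.

```python
def shard_counts(number_of_shards: int, number_of_replicas: int) -> dict:
    """Derive shard totals from the two index settings."""
    replica_shards = number_of_shards * number_of_replicas
    return {
        "primary_shards": number_of_shards,
        "replica_shards": replica_shards,
        "total_shards": number_of_shards + replica_shards,
        # One original plus one full copy per replica.
        "copies_of_data": 1 + number_of_replicas,
        # Copies of the same shard never share a node, so this many nodes
        # are needed before every shard can be assigned.
        "min_nodes_to_assign_all": 1 + number_of_replicas,
    }


if __name__ == "__main__":
    print(shard_counts(5, 1))
```

For the defaults (5 shards, 1 replica) this gives 10 shards, two copies of the data, and a minimum of two nodes, matching the figures above.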

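Replica: one complete copy of the whole data set, as opposed to the original (primary) copy.
Shard: each complete data set (replica or original) is split by some rule (range, hash value, etc.) into several shards (compare Kafka partitions); together these shards make up the whole data set.

Note: the number of shards can be specified when the index is created and cannot be changed afterwards; the number of replicas can be changed.

Maximum data per shard: about 2 billion documents (a Lucene index holds at most 2,147,483,519 documents).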

Setting shards and replicas

curl -XPUT http://hadoop:9200/myindex -d '{
"settings":{"number_of_shards":2,"number_of_replicas":1}
}'
{"acknowledged":true,"shards_acknowledged":true,"index":"myindex"}

Set the number of replicas (an index setting that can be changed after creation):
curl -XPUT http://spark/ -d '{
"number_of_replicas":0
}'

Forward index:

documentId -> document -> words

Inverted index: the same relation arranged the other way around:

word -> documentIds
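
A minimal sketch of turning a forward index into an inverted index. Splitting on whitespace and lowercasing is a simplification assumed here; a real analyzer does more. The sample documents are made up.

```python
from collections import defaultdict
from typing import Dict, List


def build_inverted_index(forward_index: Dict[int, str]) -> Dict[str, List[int]]:
    """Map each word to the sorted list of document ids that contain it."""
    postings = defaultdict(set)
    for doc_id, text in forward_index.items():
        # Simplified analysis: lowercase, then split on whitespace.
        for word in text.lower().split():
            postings[word].add(doc_id)
    return {word: sorted(doc_ids) for word, doc_ids in postings.items()}


if __name__ == "__main__":
    forward = {1: "zhou ring", 2: "ring", 3: "Jerry sings"}
    for word, doc_ids in sorted(build_inverted_index(forward).items()):
        print(word, doc_ids)
```

A search for a word then becomes a single dictionary lookup instead of a scan over every document.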

Simple ES operations:

Upsert (insert or update):

curl -XPUT http://hadoop:9200/myindex1/mytype1/1?pretty -d '{
"name":"Jerry",
"age":22,
"sex":"male",
"hobby":["Football","BasketBall","sing"]
}'

curl -XPUT http://hadoop:9200/myindex1/mytype1/2?pretty -d '{
"name":"jerry1",
"age":23,
"sex":"female",
"hobby":["Football","BasketBall","sing"]
}'

curl -XPUT http://hadoop:9200/myindex1/mytype1/3?pretty -d '{
"name":"Jerry2",
"age":24,
"sex":"male",
"hobby":["Football","BasketBall","sing"]
}'

curl -XPUT http://hadoop:9200/myindex1/mytype1/4?pretty -d '{
"name":"jerry3",
"age":26,
"sex":"male1",
"hobby":["Football","BasketBall","sing"]
}'

curl -XPUT http://hadoop:9200/myindex1/mytype1/5?pretty -d '{
"name":"Jerry4",
"age":26,
"sex":"female",
"hobby":["Football","BasketBall","sing"]
}'

Query: GET
curl -XGET http://hadoop:9200/myindex1/mytype1/5?pretty

Update: POST

curl -XPOST http://hadoop:9200/myindex1/mytype1/5?pretty -d '{
"name":"Jerry44",
"age":26,
"addr":"beijing",
"sex":"female",
"hobby":["Football","BasketBall","sing"]
}'
{
"_index" : "myindex1",
"_type" : "mytype1",
"_id" : "5",
"_version" : 2,
"result" : "updated",
"_shards" : {
"total" : 2,
"successful" : 1,
"failed" : 0
},
"created" : false
}

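An ES update does not really modify the data in place. It writes a new copy of the document and tracks versions, and reads return the latest version. This resembles HBase, where every cell carries a version.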
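
The idea can be sketched with a toy append-only store. This is an illustration under simplifying assumptions, not how Lucene segments are implemented: the class and its fields are hypothetical, and old copies are only flagged as no longer live.

```python
class AppendOnlyIndex:
    """Toy store: an update appends a new copy and retires the old one."""

    def __init__(self):
        # Each entry: {"_id", "_version", "_source", "live"}.
        self._entries = []

    def _live_entry(self, doc_id):
        for entry in reversed(self._entries):
            if entry["_id"] == doc_id and entry["live"]:
                return entry
        return None

    def index(self, doc_id, source):
        old = self._live_entry(doc_id)
        version = 1
        if old is not None:
            old["live"] = False          # the old copy is retired, not rewritten
            version = old["_version"] + 1
        self._entries.append(
            {"_id": doc_id, "_version": version, "_source": dict(source), "live": True}
        )
        return {
            "_id": doc_id,
            "_version": version,
            "result": "created" if old is None else "updated",
        }

    def get(self, doc_id):
        """Return the latest live copy, or None if the id is unknown."""
        return self._live_entry(doc_id)


if __name__ == "__main__":
    store = AppendOnlyIndex()
    print(store.index("5", {"name": "Jerry4"}))
    print(store.index("5", {"name": "Jerry44", "addr": "beijing"}))
    print(store.get("5")["_source"])
```

The second call reports version 2 with result "updated", in the same way as the response shown for the POST above, while the store itself now holds two entries for id 5.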

Delete:
curl -XDELETE http://hadoop:9200/myindex1/mytype1/5?pretty
{
"found" : true,
"_index" : "myindex1",
"_type" : "mytype1",
"_id" : "5",
"_version" : 3,
"result" : "deleted",
"_shards" : {
"total" : 2,
"successful" : 1,
"failed" : 0
}
}

Mapping (the index must already exist)
Used to view and specify the data types of fields; it shows the structure of an index.

curl -XGET http://hadoop:9200/myindex1/mytype1/_mapping?pretty
{
"myindex1" : {
"mappings" : {
"mytype1" : {
"properties" : {
"addr" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"age" : {
"type" : "long"
},
"hobby" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"name" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"sex" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
}
}
}
}
}
}

curl -XGET http://hadoop:9200/myindex/_mapping?pretty
{
"myindex" : {
"mappings" : { }
}
}

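Dynamic mapping
When the documents above were created with PUT, no field data types were specified explicitly; ES generated the data type of every field automatically.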

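Explicit mapping:
First, the data types that ES supports:
5.x: simple data types: text, keyword (not analyzed; matched as one whole value), date, integer, long, double, boolean, ip
Types that support the hierarchical nature of JSON: object, nested
Specialized data types: geo_point, geo_shape, completion

2.x also had the string type, which 5.x replaces with text (and keyword for strings that should not be analyzed).

ES can infer the data type automatically from a field's value.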
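
A rough sketch of this kind of inference, written to reproduce the dynamic mapping shown earlier for myindex1 (strings become text, whole numbers become long). The rules for float and boolean are assumptions about 5.x behaviour, date detection is left out, and the function names are hypothetical.

```python
def infer_field_type(value):
    """Guess a mapping type from a JSON value (simplified)."""
    if isinstance(value, bool):      # test bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "float"               # assumption about the 5.x default
    if isinstance(value, str):
        return "text"                # 5.x also adds a keyword sub-field
    if isinstance(value, list):
        # An array takes the type of its first element.
        return infer_field_type(value[0]) if value else None
    if isinstance(value, dict):
        return "object"
    return None


def infer_mapping(document):
    """Infer a type for every top-level field of a document."""
    return {field: infer_field_type(value) for field, value in document.items()}


if __name__ == "__main__":
    doc = {"name": "Jerry", "age": 22, "sex": "male",
           "hobby": ["Football", "BasketBall", "sing"]}
    print(infer_mapping(doc))
```

Checking bool before int matters in Python, because True and False would otherwise be classified as long.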

Note: a mapping can only be created once the index already exists.

Create the index
curl -XPUT http://hadoop:9200/mytest?pretty
{
"acknowledged" : true,
"shards_acknowledged" : true,
"index" : "mytest"
}

Explicit mapping (goodname uses the 5.x way of keeping a field from being split by the analyzer, type keyword; producer uses the 2.x way, "index":"not_analyzed"):
curl -XPUT http://hadoop:9200/mytest/orders/_mapping?pretty -d '{
"properties":{
"orderid":{"type":"text"},
"goodname":{"type":"keyword"},
"producer":{"type":"text","index":"not_analyzed"},
"goodnums":{"type":"integer"},
"price":{"type":"float"},
"orderdate":{"type":"date","format":"dd/MM/YYYY"}
}
}'
{
"acknowledged" : true
}

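Note: to keep a field from being split by the analyzer, so that it is stored in the index as one whole value, use the following settings in 2.x and 5.x respectively:
2.x: set "index":"not_analyzed"
5.x: set the field's data type to keyword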

View the mapping
curl -XGET http://hadoop:9200/mytest/_mapping?pretty

Result:
{
"mytest" : {
"mappings" : {
"orders" : {
"properties" : {
"goodname" : {
"type" : "keyword"
},
"goodnums" : {
"type" : "integer"
},
"orderdate" : {
"type" : "date",
"format" : "dd/MM/YYYY"
},
"orderid" : {
"type" : "text"
},
"price" : {
"type" : "float"
},
"producer" : {
"type" : "text"
}
}
}
}
}
}

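Index templates

In production, some indices share the same or similar properties, and you want them to have the same types and fields. This is what index templates are for.
For example, an e-commerce company produces order data every month. You can create one index per month, named myorders_{yyyymm}, where every monthly index has the same type and the same field data types; a template fits this case.
If the template's pattern is myorders_*, any newly created index whose name matches myorders_* automatically picks up the type mappings defined in the template.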
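
The matching step can be sketched like this. Python's fnmatchcase is used as a stand-in for ES's own wildcard matching (an assumption; fnmatch also treats ? and [ ] as special, which ES template patterns do not), and the helper name is hypothetical.

```python
from fnmatch import fnmatchcase

# Template registry keyed by template name; "order" decides precedence.
TEMPLATES = {
    "myoders_template": {"template": "myorders_*", "order": 0},
}


def matching_templates(index_name, templates):
    """Return the names of templates whose pattern matches, lowest order first."""
    hits = [
        (body.get("order", 0), name)
        for name, body in templates.items()
        if fnmatchcase(index_name, body["template"])
    ]
    return [name for _, name in sorted(hits)]


if __name__ == "__main__":
    print(matching_templates("myorders_201807", TEMPLATES))
    print(matching_templates("users", TEMPLATES))
```

A monthly index such as myorders_201807 picks up the template, while an unrelated index name matches nothing and is created with default settings.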

Create an index template

curl -XPUT http://hadoop:9200/_template/myoders_template -d '{
"template":"myorders_*",
"settings":{"number_of_shards":5,"number_of_replicas":2},
"mappings":{
"t_order":{
"_source":{
"enabled":false
},
"properties":{
"orderid":{"type":"text"},
"price":{"type":"float"},
"supplier":{"type":"keyword"},
"created_at":{"type":"date","format":"YYYY/MM/dd HH:mm:ss"}
}
}
}
}'

{"acknowledged":true}

List all templates:
curl -XGET http://spark1234:9200/_template?pretty
{
"myoders_template" : {
"order" : 0,
"template" : "myorders_*",
"settings" : {
"index" : {
"number_of_shards" : "5",
"number_of_replicas" : "2"
}
},
"mappings" : {
"t_order" : {
"_source" : {
"enabled" : false
},
"properties" : {
"orderid" : {
"type" : "text"
},
"price" : {
"type" : "float"
},
"supplier" : {
"type" : "keyword"
},
"created_at" : {
"type" : "date",
"format" : "YYYY/MM/dd HH:mm:ss"
}
}
}
},
"aliases" : { }
}
}

View a template by name:
curl -XGET http://spark1234:9200/_template/myoders_template?pretty

4. Queries

(1). URL query
Find the records whose name is xiao1:
select * from myindex.mytype where name='xiao1'

curl -XGET http://spark1234:9200/myindex/mytype/_search?q=name:xiao1
(2). match_all query
Use curl -XGET or curl -XPOST:
curl -XGET http://spark1234:9200/myindex/mytype/_search?pretty -d '{
"query":{
"match_all": {}
}
}'
(3). term query
Finds the documents whose terms in the inverted index exactly match the query value.
curl -XPOST http://spark1234:9200/myindex/mytype/_search?pretty -d '{
"query":{
"term": {
"name":{
"value":"ring"
}
}
},
"size": 10
}'
Result:
Every document whose name contains the term ring is returned:
{
"took" : 20,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 2,
"max_score" : 0.6931472,
"hits" : [
{
"_index" : "myindex",
"_type" : "mytype",
"_id" : "4",
"_score" : 0.6931472,
"_source" : {
"name" : "ring",
"age" : 26,
"sex" : "female",
"hobby" : [
"dance"
]
}
},
{
"_index" : "myindex",
"_type" : "mytype",
"_id" : "5",
"_score" : 0.25811607,
"_source" : {
"name" : "zhou ring",
"age" : 26,
"sex" : "female",
"hobby" : [
"dance"
]
}
}
]
}
}
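
Why both ring and zhou ring match: a text field is analyzed into separate terms at index time, while the value of a term query is looked up as-is. A small sketch, assuming a rough stand-in for the standard analyzer (lowercase, split on non-alphanumeric characters); the helper names are hypothetical.

```python
import re


def analyze(text):
    """Rough stand-in for the standard analyzer: lowercase and split into terms."""
    return [token for token in re.split(r"[^0-9a-z]+", text.lower()) if token]


def term_query(docs, field, value):
    """Return ids of documents whose analyzed field contains exactly this term.

    The query value itself is NOT analyzed, which is the point of a term query.
    """
    return [doc_id for doc_id, source in docs.items() if value in analyze(source[field])]


if __name__ == "__main__":
    docs = {"4": {"name": "ring"}, "5": {"name": "zhou ring"}}
    print(term_query(docs, "name", "ring"))
    print(term_query(docs, "name", "Ring"))
    print(term_query(docs, "name", "zhou ring"))
```

Under these assumptions the lowercase term ring finds both documents, whereas Ring or the whole phrase zhou ring finds none, because neither exists as a single term in the index.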

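(4). Boolean query
Comparable to the and / or operations of a relational database:
must: documents that satisfy the condition
must_not: documents that do not satisfy the condition
should: documents that satisfy at least one of the conditions
select * from myindex.mytype where age=26 and name in ("zhou", "ring");
curl -XPOST http://spark1234:9200/myindex/mytype/_search?pretty -d '{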
"query":{
"bool": {
"must":[
{
"term": {
"age": 26
}
}
],
"should":[
{
"terms": {
"name":["zhou", "ring"]
}
}
]
}

},
"size": 10
}'
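You can set minimum_should_match (spelled minimum_number_should_match in some older releases) to control the minimum number of should clauses that a document has to satisfy.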
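
The interplay of must, should and the minimum can be sketched in plain Python. This models the matching rules only (no scoring), under the assumption that should clauses are optional by default once a must clause is present. The helper names and the document with id 6 are hypothetical.

```python
def term(field, value):
    """Clause: the field equals the value exactly."""
    return lambda doc: doc.get(field) == value


def terms(field, values):
    """Clause: any lowercased, whitespace-separated token of the field is in values."""
    wanted = set(values)
    return lambda doc: bool(wanted & set(str(doc.get(field, "")).lower().split()))


def bool_match(doc, must=(), must_not=(), should=(), minimum_should_match=None):
    """Decide whether a document matches a bool query (matching only, no scoring)."""
    if minimum_should_match is None:
        # Assumed default: should is optional when a must clause is present.
        minimum_should_match = 0 if must else min(1, len(should))
    if not all(clause(doc) for clause in must):
        return False
    if any(clause(doc) for clause in must_not):
        return False
    matched_should = sum(1 for clause in should if clause(doc))
    return matched_should >= minimum_should_match


DOCS = {
    "4": {"name": "ring", "age": 26},
    "5": {"name": "zhou ring", "age": 26},
    "6": {"name": "xiao1", "age": 26},  # hypothetical document, not in the article
}


def search(**query):
    return [doc_id for doc_id, doc in DOCS.items() if bool_match(doc, **query)]


if __name__ == "__main__":
    query = {"must": [term("age", 26)], "should": [terms("name", ["zhou", "ring"])]}
    print(search(**query))                            # should only affects ranking here
    print(search(minimum_should_match=1, **query))    # should becomes a real filter
```

Without the minimum, every document with age 26 matches and the should clause only influences ranking. Setting the minimum to 1 is what makes the query behave like the SQL statement with and ... in (...).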