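Elasticsearch offers something similar to a relational-database join. It is implemented with the join data type: by maintaining a parent/child relationship, the two objects can be kept separate.
Parent and child documents are two independent documents.
Updating a parent document does not require reindexing its children. Adding, updating, or deleting a child document affects neither the parent nor the other children.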
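Parent/child relationship
Steps to define a parent/child relationship
Set the index mapping
Index the parent documents
Index the child documents
Query the documents as needed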
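The first three steps can be sketched as plain request bodies. This is a minimal Python sketch that only builds the method, path, and JSON body for each request and sends nothing; the index and field names come from the demo in this section, while the helper names (`mapping_body`, `parent_request`, `child_request`) are hypothetical.

```python
import json

INDEX = "my_blogs"                      # index name used in the demo
JOIN_FIELD = "blog_comments_relation"   # join field name used in the demo


def mapping_body():
    """Step 1: a join field that declares blog as the parent of comment."""
    return {
        "settings": {"number_of_shards": 2},
        "mappings": {
            "properties": {
                JOIN_FIELD: {"type": "join", "relations": {"blog": "comment"}},
                "content": {"type": "text"},
                "title": {"type": "keyword"},
            }
        },
    }


def parent_request(doc_id, title, content):
    """Step 2: a parent document only names its side of the relation."""
    body = {"title": title, "content": content, JOIN_FIELD: {"name": "blog"}}
    return ("PUT", f"{INDEX}/_doc/{doc_id}", body)


def child_request(doc_id, parent_id, comment, username):
    """Step 3: a child names its parent and is routed by the parent ID."""
    body = {
        "comment": comment,
        "username": username,
        JOIN_FIELD: {"name": "comment", "parent": parent_id},
    }
    return ("PUT", f"{INDEX}/_doc/{doc_id}?routing={parent_id}", body)


if __name__ == "__main__":
    requests_to_send = [
        ("PUT", INDEX, mapping_body()),
        parent_request("blog1", "Learning Elasticsearch", "learning ELK @ geektime"),
        child_request("comment1", "blog1", "I am learning ELK", "Jack"),
    ]
    for method, path, body in requests_to_send:
        print(method, path)
        print(json.dumps(body))
```

The child request carries `routing=<parent ID>` because a parent and its children have to live on the same shard.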
Setting the mapping
Indexing parent documents
Indexing child documents
Queries supported by Parent / Child
Query all documents
Parent Id query
Has Child query
Has Parent query
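The three relation-aware queries differ mainly in which side of the relation the inner condition applies to. The sketch below builds their request bodies in Python; the function names are hypothetical, and the default type names (`comment`, `blog`) are the ones used in this section's demo.

```python
def parent_id_query(parent_id, child_type="comment"):
    """Children of one specific parent, looked up by the parent's ID."""
    return {"query": {"parent_id": {"type": child_type, "id": parent_id}}}


def has_child_query(inner_query, child_type="comment"):
    """Parents that have at least one child matching inner_query."""
    return {"query": {"has_child": {"type": child_type, "query": inner_query}}}


def has_parent_query(inner_query, parent_type="blog"):
    """Children whose parent matches inner_query."""
    return {"query": {"has_parent": {"parent_type": parent_type, "query": inner_query}}}


if __name__ == "__main__":
    print(parent_id_query("blog2"))
    print(has_child_query({"match": {"username": "Jack"}}))
    print(has_parent_query({"match": {"title": "Learning Hadoop"}}))
```

Each body would be sent with `POST my_blogs/_search`. Note that `has_child` returns parents while `has_parent` returns children.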
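Using the has_child query: returns parent documents whose children match
Using the has_parent query: returns child documents whose parent matches
Using the parent_id query: returns the children of a given parent
Accessing a child document
Updating a child document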
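The demo requests in this section index and fetch child documents with `?routing=<parent ID>`. The toy model below suggests why: a document's shard is derived from its routing value, which defaults to the document's own ID. It is a simplified sketch. `zlib.crc32` is a stand-in for the hash Elasticsearch actually uses, and `ToyShardedIndex` is a hypothetical class, not a client API.

```python
import zlib

NUM_SHARDS = 2  # matches number_of_shards in the demo mapping


def shard_for(routing_value, num_shards=NUM_SHARDS):
    # Stand-in hash (assumption): only "hash the routing value, take it
    # modulo the shard count" reflects the real behavior.
    return zlib.crc32(routing_value.encode("utf-8")) % num_shards


class ToyShardedIndex:
    def __init__(self, num_shards=NUM_SHARDS):
        self.shards = [dict() for _ in range(num_shards)]

    def put(self, doc_id, body, routing=None):
        # Without an explicit routing value, the document ID is used.
        shard = shard_for(routing or doc_id, len(self.shards))
        self.shards[shard][doc_id] = body
        return shard

    def get(self, doc_id, routing=None):
        # A GET looks at exactly one shard, chosen the same way as on PUT.
        shard = shard_for(routing or doc_id, len(self.shards))
        return self.shards[shard].get(doc_id)


if __name__ == "__main__":
    index = ToyShardedIndex()
    parent_shard = index.put("blog2", {"title": "Learning Hadoop"})
    child_shard = index.put("comment3", {"comment": "Hello Hadoop"}, routing="blog2")
    print("same shard:", parent_shard == child_shard)
    print("with routing:", index.get("comment3", routing="blog2"))
    print("without routing:", index.get("comment3"))
```

Fetching with the routing value always hits the shard the child was written to. Fetching without it only works when the child's own ID happens to hash to the same shard.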
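Nested objects vs. parent/child documents
Nested Object
Pros: documents are stored together, so read performance is high
Cons: updating a nested sub-document requires updating the whole document
Use case: sub-documents are updated only occasionally and the workload is mostly queries
Parent / Child
Pros: parent and child documents can be updated independently
Cons: extra memory is needed to maintain the relationship, and read performance is comparatively worse
Use case: child documents are updated frequently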
Demo API
DELETE my_blogs
# Set up the parent/child mapping
PUT my_blogs
{"settings":{"number_of_shards":2},
"mappings":{"properties":{"blog_comments_relation":{"type":"join",
"relations":{"blog":"comment"}},
"content":{"type":"text"},
"title":{"type":"keyword"}}}}
# Index a parent document
PUT my_blogs/_doc/blog1
{"title":"Learning Elasticsearch",
"content":"learning ELK @ geektime",
"blog_comments_relation":{"name":"blog"}}
# Index a parent document
PUT my_blogs/_doc/blog2
{"title":"Learning Hadoop",
"content":"learning Hadoop",
"blog_comments_relation":{"name":"blog"}}
# Index a child document (routing is the parent ID, so parent and child share a shard)
PUT my_blogs/_doc/comment1?routing=blog1
{"comment":"I am learning ELK",
"username":"Jack",
"blog_comments_relation":{"name":"comment",
"parent":"blog1"}}
# Index a child document
PUT my_blogs/_doc/comment2?routing=blog2
{"comment":"I like Hadoop!!!!!",
"username":"Jack",
"blog_comments_relation":{"name":"comment",
"parent":"blog2"}}
# Index a child document
PUT my_blogs/_doc/comment3?routing=blog2
{"comment":"Hello Hadoop",
"username":"Bob",
"blog_comments_relation":{"name":"comment",
"parent":"blog2"}}
# Query all documents
POST my_blogs/_search
{}
# Fetch a parent document by its ID
GET my_blogs/_doc/blog2
# Parent Id query
POST my_blogs/_search
{"query":{"parent_id":{"type":"comment",
"id":"blog2"}}}
# Has Child query: returns parent documents
POST my_blogs/_search
{"query":{"has_child":{"type":"comment",
"query":{"match":{"username":"Jack"}}}}}
# Has Parent query: returns the related child documents
POST my_blogs/_search
{"query":{"has_parent":{"parent_type":"blog",
"query":{"match":{"title":"Learning Hadoop"}}}}}
# Access a child document by ID only (without routing, it may not be found)
GET my_blogs/_doc/comment3
# Access a child document by ID and routing
GET my_blogs/_doc/comment3?routing=blog2
# Update a child document
PUT my_blogs/_doc/comment3?routing=blog2
{"comment":"Hello Hadoop??",
"blog_comments_relation":{"name":"comment",
"parent":"blog2"}}
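Update By Query & Reindex API
Use cases
An index typically needs to be rebuilt in the following situations
The index mappings change: a field type changes, or an analyzer or its dictionary is updated
The index settings change: the number of primary shards changes
Data has to be migrated within a cluster or between clusters
APIs built into Elasticsearch
Update By Query: rebuilds in place on the existing index
Reindex: rebuilds into a different index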
Case 1: adding a sub-field to an index
Update By Query
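Case 1 rests on one behavior: a mapping change only affects documents indexed after the change, so older documents stay invisible to the new sub-field until they are indexed again. The toy inverted index below illustrates that with the two documents from the demo. `ToySubfieldIndex` and `toy_analyze` are hypothetical, and the lowercase-and-split analyzer is only a stand-in for the real `english` analyzer.

```python
def toy_analyze(text):
    # Stand-in analyzer (assumption): lowercase and split on whitespace.
    return text.lower().split()


class ToySubfieldIndex:
    def __init__(self):
        self.subfields = set()   # sub-fields of "content" present in the mapping
        self.source = {}         # doc_id -> original content, like _source
        self.postings = {}       # (field, term) -> set of doc IDs

    def _index(self, doc_id):
        # Terms are generated for every field known at indexing time.
        fields = ["content"] + [f"content.{name}" for name in sorted(self.subfields)]
        for field in fields:
            for term in toy_analyze(self.source[doc_id]):
                self.postings.setdefault((field, term), set()).add(doc_id)

    def put(self, doc_id, content):
        self.source[doc_id] = content
        self._index(doc_id)

    def add_subfield(self, name):
        # A mapping change alone does not touch documents already indexed.
        self.subfields.add(name)

    def update_by_query(self):
        # Re-index every document from its stored source.
        for doc_id in self.source:
            self._index(doc_id)

    def match(self, field, text):
        hits = set()
        for term in toy_analyze(text):
            hits |= self.postings.get((field, term), set())
        return sorted(hits)


if __name__ == "__main__":
    index = ToySubfieldIndex()
    index.put("1", "Hadoop is cool")
    index.add_subfield("english")
    index.put("2", "Elasticsearch rocks")
    print(index.match("content.english", "Hadoop"))   # document 1 predates the sub-field
    index.update_by_query()
    print(index.match("content.english", "Hadoop"))
```

After `update_by_query`, the older document becomes searchable through the sub-field, which is the effect the demo's `POST blogs/_update_by_query` is meant to show.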
Case 2: changing the mapping type of an existing field
Reindex API
Two caveats
OP Type
Cross-cluster reindex
Checking progress with the Task API
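A cross-cluster reindex uses the same `_reindex` body with a `remote` block added to `source`, and `op_type` goes into `dest`. The sketch below only builds that body. The function name is hypothetical, and the host `http://otherhost:9200` is a placeholder for a remote cluster that would also have to be allowed in `elasticsearch.yml`.

```python
def remote_reindex_body(remote_host, source_index, dest_index, op_type=None):
    """Build a _reindex body that pulls documents from another cluster."""
    body = {
        "source": {"remote": {"host": remote_host}, "index": source_index},
        "dest": {"index": dest_index},
    }
    if op_type is not None:
        # op_type "create" only creates documents missing from the destination.
        body["dest"]["op_type"] = op_type
    return body


if __name__ == "__main__":
    # Placeholder host (assumption); index names follow the demo.
    print(remote_reindex_body("http://otherhost:9200", "blogs", "blogs_fix", op_type="create"))
```

The body would be sent with `POST _reindex` on the destination cluster.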
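Section recap
Update By Query use cases: adding a sub-field to a field; switching a field's analyzer or updating the analyzer's dictionary
Reindex API use case: changing a field's type
The mapping must be set on the new index first; index settings and mappings are not copied
Use the Task API to check the status of a reindex
Remote reindex requires changing elasticsearch.yml (the remote host has to be listed under reindex.remote.whitelist) and restarting
Read and write through an index alias wherever possible, so that even a reindex can be carried out with zero downtime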
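The alias advice in the recap can be sketched as a single `_aliases` request that removes the alias from the old index and adds it to the new one. Both actions are applied atomically, so clients see no gap. The alias name `blogs_alias` and the function name are hypothetical; the index names follow the demo.

```python
def alias_swap_body(alias, old_index, new_index):
    """Body for POST _aliases that repoints an alias from old_index to new_index."""
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


if __name__ == "__main__":
    # After reindexing blogs into blogs_fix, repoint the alias clients use.
    print(alias_swap_body("blogs_alias", "blogs", "blogs_fix"))
```

Applications that only ever address `blogs_alias` would not need to change when the underlying index is rebuilt.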
Demo API
DELETE blogs/
# Index a document
PUT blogs/_doc/1
{"content":"Hadoop is cool",
"keyword":"hadoop"}
# View the mapping
GET blogs/_mapping
# Update the mapping: add a sub-field that uses the english analyzer
PUT blogs/_mapping
{"properties":{"content":{"type":"text",
"fields":{"english":{"type":"text",
"analyzer":"english"}}}}}
# Index a document
PUT blogs/_doc/2
{"content":"Elasticsearch rocks",
"keyword":"elasticsearch"}
# Search for the newly indexed document
POST blogs/_search
{"query":{"match":{"content.english":"Elasticsearch"}}}
# Search for the document indexed before the mapping change
POST blogs/_search
{"query":{"match":{"content.english":"Hadoop"}}}
# Update all documents
POST blogs/_update_by_query
{}
# Search again for the document indexed earlier
POST blogs/_search
{"query":{"match":{"content.english":"Hadoop"}}}
# View the mapping
GET blogs/_mapping
# Try to change the type of the existing "keyword" field in place (Elasticsearch rejects this)
PUT blogs/_mapping
{"properties":{"content":{"type":"text",
"fields":{"english":{"type":"text",
"analyzer":"english"}}},
"keyword":{"type":"keyword"}}}
DELETE blogs_fix
# Create a new index with the new mapping
PUT blogs_fix/
{"mappings":{"properties":{"content":{"type":"text",
"fields":{"english":{"type":"text",
"analyzer":"english"}}},
"keyword":{"type":"keyword"}}}}
# Reindex API
POST _reindex
{"source":{"index":"blogs"},
"dest":{"index":"blogs_fix"}}
GET blogs_fix/_doc/1
# Test a terms aggregation
POST blogs_fix/_search
{"size":0,
"aggs":{"blog_keyword":{"terms":{"field":"keyword",
"size":10}}}}
# Reindex API, version_type internal
POST _reindex
{"source":{"index":"blogs"},
"dest":{"index":"blogs_fix",
"version_type":"internal"}}
# The document version number increases
GET blogs_fix/_doc/1
# Reindex API, version_type external (version conflicts are expected here)
POST _reindex
{"source":{"index":"blogs"},
"dest":{"index":"blogs_fix",
"version_type":"external"}}
# Reindex API, version_type external, continuing past version conflicts
POST _reindex
{"source":{"index":"blogs"},
"dest":{"index":"blogs_fix",
"version_type":"external"},
"conflicts":"proceed"}
# Reindex API, op_type create: only documents missing from the destination are created
POST _reindex
{"source":{"index":"blogs"},
"dest":{"index":"blogs_fix",
"op_type":"create"}}
# Check running reindex tasks with the Task API
GET _tasks?detailed=true&actions=*reindex
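The reindex variants in the demo differ in how they treat documents that already exist in the destination. The toy model below is a rough sketch of those rules, not the real implementation: `toy_reindex` and `VersionConflict` are hypothetical names, and each index is modeled as a dict mapping document ID to `(version, body)`.

```python
class VersionConflict(Exception):
    pass


def toy_reindex(source, dest, version_type="internal", op_type="index", conflicts="abort"):
    """Copy source into dest (mutating dest) and return simple counters."""
    stats = {"created": 0, "updated": 0, "version_conflicts": 0}
    for doc_id, (src_version, body) in source.items():
        existing = dest.get(doc_id)
        if existing is None:
            # Missing documents are always created; external keeps the source version.
            new_version = src_version if version_type == "external" else 1
            dest[doc_id] = (new_version, body)
            stats["created"] += 1
            continue
        dest_version = existing[0]
        if op_type == "create":
            conflict = True                          # create never overwrites
        elif version_type == "external":
            conflict = src_version <= dest_version   # source must be strictly newer
        else:
            conflict = False                         # internal overwrites blindly
        if conflict:
            if conflicts != "proceed":
                raise VersionConflict(doc_id)        # default: abort on first conflict
            stats["version_conflicts"] += 1
            continue
        new_version = src_version if version_type == "external" else dest_version + 1
        dest[doc_id] = (new_version, body)
        stats["updated"] += 1
    return stats


if __name__ == "__main__":
    source = {"1": (1, {"content": "Hadoop is cool"}), "2": (1, {"content": "Elasticsearch rocks"})}
    dest = {}
    print(toy_reindex(source, dest))                     # first run creates both documents
    print(toy_reindex(source, dest))                     # second run overwrites and bumps versions
    print(toy_reindex(source, dest, version_type="external", conflicts="proceed"))
    print(toy_reindex(source, dest, op_type="create", conflicts="proceed"))
```

In this model, `conflicts="proceed"` turns a failure into a counter, which mirrors why the demo adds `"conflicts":"proceed"` after the external reindex reports conflicts.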