Elasticsearch | Parent-Child Relationships Between Documents


乌鲁木齐001号程序员


Article link: https://blog.csdn.net/weixin_33669968/article/details/106422331


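Object | Nested Object | Limitations

Each blog document embeds the author's information. Whenever that information changes, the entire blog document has to be updated and reindexed.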

Parent & Child

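Elasticsearch offers something similar to a join in a relational database through the join data type. By maintaining a parent/child relationship between documents, two kinds of objects can be kept separate.
Parent and child documents are independent documents.
Updating a parent does not require reindexing its children. Adding, updating, or deleting a child does not affect the parent or any of the other children.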
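The following minimal Python sketch makes the difference concrete. It counts how many indexed documents must be rewritten when one comment changes. It is a simplified model, not Elasticsearch's internals, and the store layout and function names are hypothetical. It relies only on one fact: nested objects are indexed as hidden documents next to their root document, and the whole group is reindexed whenever the root changes.

```python
# Simplified model (hypothetical names): count the indexed documents that are
# rewritten when a single comment changes. Not Elasticsearch's implementation.

def update_comment_nested(store, blog_id, index, text):
    """Nested model: comments live inside the blog document."""
    blog = store[blog_id]
    comments = [dict(c) for c in blog["comments"]]
    comments[index]["comment"] = text
    # The whole blog (root plus every nested comment) is reindexed.
    store[blog_id] = {**blog, "comments": comments}
    return 1 + len(comments)


def update_comment_parent_child(store, comment_id, text):
    """Parent/child model: each comment is its own document."""
    store[comment_id] = {**store[comment_id], "comment": text}
    # Only the changed child is reindexed; the parent is untouched.
    return 1


nested = {"blog2": {"title": "Learning Hadoop",
                    "comments": [{"comment": "I like Hadoop!!!!!"},
                                 {"comment": "Hello Hadoop"}]}}
joined = {"blog2": {"title": "Learning Hadoop"},
          "comment2": {"comment": "I like Hadoop!!!!!"},
          "comment3": {"comment": "Hello Hadoop"}}

print(update_comment_nested(nested, "blog2", 1, "Hello Hadoop??"))        # 3
print(update_comment_parent_child(joined, "comment3", "Hello Hadoop??"))  # 1
```

The gap grows with the number of comments per blog. This is why the comparison table at the end recommends parent/child when children change often.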

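Steps to define a parent-child relationship

Set up the index mapping.
Index the parent documents.
Index the child documents.
Query the documents as needed.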
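The four steps can be written down as a sequence of REST requests. The Python sketch below only builds `(method, path, body)` tuples and prints them. It never contacts a cluster, and the helper names are hypothetical. The index, field, and relation names come from the example that follows.

```python
import json

INDEX = "my_blogs"
JOIN_FIELD = "blog_comments_relation"


def create_index():
    # Step 1: a mapping with a join field; "blog" is the parent, "comment" the child.
    body = {"mappings": {"properties": {
        JOIN_FIELD: {"type": "join", "relations": {"blog": "comment"}}}}}
    return ("PUT", f"/{INDEX}", body)


def index_parent(doc_id, title):
    # Step 2: a parent only needs the relation name.
    return ("PUT", f"/{INDEX}/_doc/{doc_id}",
            {"title": title, JOIN_FIELD: {"name": "blog"}})


def index_child(doc_id, parent_id, comment):
    # Step 3: a child names its parent and is routed by the parent's Id
    # (assumes Ids need no URL escaping).
    return ("PUT", f"/{INDEX}/_doc/{doc_id}?routing={parent_id}",
            {"comment": comment,
             JOIN_FIELD: {"name": "comment", "parent": parent_id}})


def children_of(parent_id):
    # Step 4: one possible query, a parent_id query.
    return ("POST", f"/{INDEX}/_search",
            {"query": {"parent_id": {"type": "comment", "id": parent_id}}})


steps = [create_index(),
         index_parent("blog1", "Learning Elasticsearch"),
         index_child("comment1", "blog1", "I am learning ELK"),
         children_of("blog1")]
for method, path, body in steps:
    print(method, path)
    print(json.dumps(body, indent=2))
```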

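Parent-Child Documents | An Example

Create the index | Define the parent-child relationship

"type": "join" declares a join field, which is what defines the parent-child relationship.
In relations, "blog": "comment" declares blog as the parent and comment as the child.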

DELETE my_blogs

PUT my_blogs
{
  "settings": {
    "number_of_shards": 2
  },
  "mappings": {
    "properties": {
      "blog_comments_relation": {
        "type": "join",
        "relations": {
          "blog": "comment"
        }
      },
      "content": {
        "type": "text"
      },
      "title": {
        "type": "keyword"
      }
    }
  }
}
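The relations map also determines which values the join field accepts. The Python sketch below is a simplified, hypothetical validator, not Elasticsearch code. It mirrors two rules of the join type: a child value must name its parent, and a child must be indexed with routing. As in the mapping, a relation's value may be a single name or a list of names.

```python
RELATIONS = {"blog": "comment"}  # parent name -> child name(s), as in the mapping


def child_names(relations):
    names = set()
    for value in relations.values():
        names.update(value if isinstance(value, list) else [value])
    return names


def check_join_value(value, routing=None, relations=RELATIONS):
    """Return "parent" or "child", or raise ValueError (simplified rules)."""
    name = value.get("name")
    if name in child_names(relations):
        if "parent" not in value:
            raise ValueError(f"{name!r} is a child relation and needs a 'parent' Id")
        if routing is None:
            raise ValueError("child documents must be indexed with routing")
        return "child"
    if name in relations:
        return "parent"
    raise ValueError(f"unknown relation name: {name!r}")


print(check_join_value({"name": "blog"}))                                   # parent
print(check_join_value({"name": "comment", "parent": "blog1"}, "blog1"))  # child
```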

Index the parent documents

blog1 and blog2 are the document Ids.
"name": "blog" in the join field marks each one as a parent document.

PUT my_blogs/_doc/blog1
{
  "title":"Learning Elasticsearch",
  "content":"learning ELK @ geektime",
  "blog_comments_relation":{
    "name":"blog"
  }
}

PUT my_blogs/_doc/blog2
{
  "title":"Learning Hadoop",
  "content":"learning Hadoop",
  "blog_comments_relation":{
    "name":"blog"
  }
}

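Index the child documents

comment1 is the child document's Id.
routing=blog1 uses the parent's Id, blog1, as the routing value, so the child is stored on the same shard as its parent. Child documents must always be indexed with routing.
"name": "comment" marks this as a child document.
"parent": "blog1" gives the Id of its parent.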
PUT my_blogs/_doc/comment1?routing=blog1
{
  "comment":"I am learning ELK",
  "username":"Jack",
  "blog_comments_relation":{
    "name":"comment",
    "parent":"blog1"
  }
}

PUT my_blogs/_doc/comment2?routing=blog2
{
  "comment":"I like Hadoop!!!!!",
  "username":"Jack",
  "blog_comments_relation":{
    "name":"comment",
    "parent":"blog2"
  }
}

PUT my_blogs/_doc/comment3?routing=blog2
{
  "comment":"Hello Hadoop",
  "username":"Bob",
  "blog_comments_relation":{
    "name":"comment",
    "parent":"blog2"
  }
}
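Routing works because the target shard is computed from the routing value, and the document Id is used when no routing is given. Sending a child with routing set to its parent's Id therefore lands it on the parent's shard. The Python sketch below is a simplified version of that shard-selection formula. It uses `zlib.crc32` as a deterministic stand-in for the Murmur3 hash that Elasticsearch actually uses, so the concrete shard numbers it prints will not match a real cluster.

```python
import zlib

NUM_SHARDS = 2  # "number_of_shards": 2 in the mapping above


def shard_for(routing_value, num_shards=NUM_SHARDS):
    # Simplified: Elasticsearch hashes the routing value with Murmur3;
    # crc32 is only a deterministic stand-in to show the idea.
    return zlib.crc32(routing_value.encode("utf-8")) % num_shards


def route(doc_id, routing=None):
    # Without an explicit routing value, the document Id is hashed.
    return shard_for(routing if routing is not None else doc_id)


docs = [("blog1", None), ("blog2", None),
        ("comment1", "blog1"), ("comment2", "blog2"), ("comment3", "blog2")]
for doc_id, routing in docs:
    print(doc_id, "-> shard", route(doc_id, routing))
```

Whatever the hash, each comment always shares a shard with its blog. This is what lets the join queries below work within a single shard.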

Query all documents

Both parent and child documents are returned, since they are stored as separate documents.

POST my_blogs/_search
{

}

Get a parent document by Id

Only the parent document itself is returned. Its children are not included.

GET my_blogs/_doc/blog2

parent_id query | Find the children of a given parent

"type": "comment" is the child relation to return.
"id": "blog2" is the parent's Id.

POST my_blogs/_search
{
  "query": {
    "parent_id": {
      "type": "comment",
      "id": "blog2"
    }
  }
}
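In terms of the data, a parent_id query keeps the child documents of the given type whose join field points at the given parent Id. The Python sketch below reproduces that filter over an in-memory copy of the example's join values. It illustrates the semantics only and says nothing about how Elasticsearch executes the query. `DOCS` and `parent_id_query` are hypothetical names.

```python
# In-memory copy of the example's join values (other fields omitted).
DOCS = {
    "blog1": {"blog_comments_relation": {"name": "blog"}},
    "blog2": {"blog_comments_relation": {"name": "blog"}},
    "comment1": {"blog_comments_relation": {"name": "comment", "parent": "blog1"}},
    "comment2": {"blog_comments_relation": {"name": "comment", "parent": "blog2"}},
    "comment3": {"blog_comments_relation": {"name": "comment", "parent": "blog2"}},
}


def parent_id_query(docs, child_type, parent_id):
    """Ids of `child_type` documents whose parent is `parent_id`."""
    return sorted(
        doc_id for doc_id, doc in docs.items()
        if doc["blog_comments_relation"]["name"] == child_type
        and doc["blog_comments_relation"].get("parent") == parent_id
    )


print(parent_id_query(DOCS, "comment", "blog2"))  # ['comment2', 'comment3']
```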

has_child query | Find parents by querying their children

Returns the parents of child documents whose username field matches Jack.

POST my_blogs/_search
{
  "query": {
    "has_child": {
      "type": "comment",
      "query": {
        "match": {
          "username": "Jack"
        }
      }
    }
  }
}
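The has_child query runs the other way round: it finds the matching children, then returns each of their parents once. The Python sketch below models this over in-memory documents. Plain equality stands in for the full-text `match` query, which is an assumption that happens to hold for single-word values such as "Jack". All names are hypothetical.

```python
DOCS = {
    "blog1": {"title": "Learning Elasticsearch", "blog_comments_relation": {"name": "blog"}},
    "blog2": {"title": "Learning Hadoop", "blog_comments_relation": {"name": "blog"}},
    "comment1": {"username": "Jack", "blog_comments_relation": {"name": "comment", "parent": "blog1"}},
    "comment2": {"username": "Jack", "blog_comments_relation": {"name": "comment", "parent": "blog2"}},
    "comment3": {"username": "Bob", "blog_comments_relation": {"name": "comment", "parent": "blog2"}},
}


def has_child_query(docs, child_type, field, value):
    """Ids of parents with at least one matching child (equality stands in for match)."""
    # A set, so a parent with several matching children is returned only once.
    return sorted({
        doc["blog_comments_relation"]["parent"]
        for doc in docs.values()
        if doc["blog_comments_relation"]["name"] == child_type
        and doc.get(field) == value
    })


print(has_child_query(DOCS, "comment", "username", "Jack"))  # ['blog1', 'blog2']
print(has_child_query(DOCS, "comment", "username", "Bob"))   # ['blog2']
```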

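has_parent query | Find children by querying their parents

Returns the children of parent documents whose title is Learning Hadoop. Because title is mapped as a keyword field, the match is exact.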

POST my_blogs/_search
{
  "query": {
    "has_parent": {
      "parent_type": "blog",
      "query": {
        "match": {
          "title": "Learning Hadoop"
        }
      }
    }
  }
}
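has_parent works in two stages. It first selects the matching parents, then returns every child that points at one of them. In the Python sketch below, exact equality models the match on the keyword-mapped title, so a partial title such as "Learning" matches nothing. The documents and function name are hypothetical.

```python
DOCS = {
    "blog1": {"title": "Learning Elasticsearch", "blog_comments_relation": {"name": "blog"}},
    "blog2": {"title": "Learning Hadoop", "blog_comments_relation": {"name": "blog"}},
    "comment1": {"blog_comments_relation": {"name": "comment", "parent": "blog1"}},
    "comment2": {"blog_comments_relation": {"name": "comment", "parent": "blog2"}},
    "comment3": {"blog_comments_relation": {"name": "comment", "parent": "blog2"}},
}


def has_parent_query(docs, parent_type, field, value):
    """Ids of children whose parent matches (exact equality, like a keyword field)."""
    parents = {
        doc_id for doc_id, doc in docs.items()
        if doc["blog_comments_relation"]["name"] == parent_type
        and doc.get(field) == value
    }
    return sorted(
        doc_id for doc_id, doc in docs.items()
        if doc["blog_comments_relation"].get("parent") in parents
    )


print(has_parent_query(DOCS, "blog", "title", "Learning Hadoop"))  # ['comment2', 'comment3']
print(has_parent_query(DOCS, "blog", "title", "Learning"))         # []
```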

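Get a child document by Id alone

Not found in this example. Without routing, the request is routed by the Id comment3, which can point to a different shard from the one that holds the child.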

GET my_blogs/_doc/comment3

Get a child document by Id and routing

Found: routing=blog2 sends the request to the shard that holds the child.

GET my_blogs/_doc/comment3?routing=blog2
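The difference between the two GETs comes from routing. A GET goes to exactly one shard, chosen from the routing value, or from the Id when no routing is given. The Python sketch below models this with hypothetical helpers and uses `zlib.crc32` as a stand-in for Elasticsearch's Murmur3 hash. It shows that a lookup by Id alone finds the child only if the Id happens to hash to the same shard as its parent.

```python
import zlib

NUM_SHARDS = 2
shards = {n: {} for n in range(NUM_SHARDS)}  # shard number -> {doc Id: source}


def shard_for(routing_value):
    # crc32 is a deterministic stand-in for Elasticsearch's Murmur3 routing hash.
    return zlib.crc32(routing_value.encode("utf-8")) % NUM_SHARDS


def index_doc(doc_id, source, routing=None):
    shards[shard_for(routing or doc_id)][doc_id] = source


def get_doc(doc_id, routing=None):
    # Only the single shard picked by the routing value (or the Id) is consulted.
    return shards[shard_for(routing or doc_id)].get(doc_id)


index_doc("blog2", {"title": "Learning Hadoop"})
index_doc("comment3", {"comment": "Hello Hadoop"}, routing="blog2")

print(get_doc("comment3", routing="blog2"))  # {'comment': 'Hello Hadoop'}
print(get_doc("comment3"))  # None unless "comment3" hashes to blog2's shard
```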

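Update a child document

Routing is still required. PUT replaces the entire source, so every field you want to keep (including username) must be sent again.

PUT my_blogs/_doc/comment3?routing=blog2
{
    "comment": "Hello Hadoop??",
    "username": "Bob",
    "blog_comments_relation": {
      "name": "comment",
      "parent": "blog2"
    }
}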
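A PUT to `_doc/<id>` replaces the stored source as a whole, so any field left out of the body is dropped. The `_update` API merges the given fields into the existing source instead. For this child that would be `POST my_blogs/_update/comment3?routing=blog2` with a body of `{"doc": {"comment": "Hello Hadoop??"}}`. The Python sketch below models both behaviours on a dict. The helper names are hypothetical, and the merge is shallow for simplicity, whereas Elasticsearch merges objects recursively.

```python
def put_document(store, doc_id, source):
    # Like PUT my_blogs/_doc/<id>: the new source replaces the old one entirely.
    store[doc_id] = dict(source)


def merge_update(store, doc_id, partial):
    # Like the _update API with {"doc": ...}: the given fields are merged in.
    # Simplified to a shallow merge; Elasticsearch merges objects recursively.
    store[doc_id] = {**store[doc_id], **partial}


relation = {"name": "comment", "parent": "blog2"}
original = {"comment": "Hello Hadoop", "username": "Bob",
            "blog_comments_relation": relation}

replaced = {"comment3": dict(original)}
put_document(replaced, "comment3",
             {"comment": "Hello Hadoop??", "blog_comments_relation": relation})
print(replaced["comment3"].get("username"))  # None: username was dropped

merged = {"comment3": dict(original)}
merge_update(merged, "comment3", {"comment": "Hello Hadoop??"})
print(merged["comment3"]["username"])  # Bob
```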

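Nested Objects vs Parent/Child Documents

	Nested Object	Parent / Child
Pros	Documents are stored together, so reads are fast	Parent and child documents can be updated independently
Cons	Updating a nested child requires reindexing the whole document	Maintaining the relationship needs extra memory, and queries are relatively slower
Use case	Children are updated rarely and the workload is mostly queries	Children are updated frequently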
