Elasticsearch: Data Input and Output

Latest recommended article published 2022-11-08 18:48:41

weixin_34008933



Article tags: big data, json

Original link: https://segmentfault.com/a/1190000016637056


Documents

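In Elasticsearch, the term "document" has a specific meaning. It refers to the top-level, or root, object
that is serialized into JSON and stored in Elasticsearch under a unique ID. For example: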

{
    "name":         "John Smith",
    "age":          42,
    "confirmed":    true,
    "join_date":    "2014-06-01",
    "home": {
        "lat":      51.5,
        "lon":      0.1
    },
    "accounts": [
        {
            "type": "facebook",
            "id":   "johnsmith"
        },
        {
            "type": "twitter",
            "id":   "johnsmith"
        }
    ]
}

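Document metadata

The three core metadata elements:

_index: where the document is stored
_type: the class of object the document represents
_id: the unique identifier of the document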

_index

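An index should be a collection of documents that are grouped together because they share common characteristics.
For example, you might store all your products in a products index,
and all your sales transactions in a sales index.
Storing unrelated data in a single index is allowed, but it is generally considered an anti-pattern.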

Index name: it must be lowercase, must not begin with an underscore, and must not contain a comma.
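To make the three naming rules concrete, here is a minimal Python sketch. The helper name `is_valid_index_name` is hypothetical, and it checks only the rules stated in this article; actual Elasticsearch releases reject further characters and patterns that are not modelled here.

```python
def is_valid_index_name(name: str) -> bool:
    """Return True if `name` satisfies the three rules listed above.

    Only these three rules are checked; real Elasticsearch releases
    enforce further restrictions that this sketch does not model.
    """
    return (
        name != ""                    # an empty name is never usable
        and name == name.lower()      # rule 1: must be lowercase
        and not name.startswith("_")  # rule 2: must not start with "_"
        and "," not in name           # rule 3: must not contain a comma
    )


for candidate in ["website", "Website", "_website", "web,site"]:
    print(f"{candidate!r}: {is_valid_index_name(candidate)}")
```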

_type

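Data may be grouped only loosely within an index,
but it is often useful to define explicit subdivisions within it. For example, all your products may live in a single index, yet you sell many different product categories, such as "electronics", "kitchen" and "lawn-care"; each category can then be its own type. (Note: mapping types were deprecated in Elasticsearch 6.x and removed in later releases; the examples in this article use the older type-based API.)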

_id

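The ID is a string that, combined with _index and _type, uniquely identifies a document in Elasticsearch. When you create a new document, you can either supply your own _id or let Elasticsearch generate one for you.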

Indexing a document

For example, if our index is called website, our type is blog, and we choose 123 as the ID, the index request looks like this:

PUT /website/blog/123
{
  "title": "My first blog entry",
  "text":  "Just trying this out...",
  "date":  "2014/01/01"
}
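The same request can be built from code. The sketch below uses only Python's standard library to construct the PUT request without sending it; the base URL `http://localhost:9200` (the default HTTP port) and the helper `build_index_request` are assumptions for illustration, and the typed URL layout `/{index}/{type}/{id}` matches the older API used throughout this article.

```python
import json
import urllib.request

# Assumption: a node reachable at the default HTTP address below.
ES_URL = "http://localhost:9200"


def build_index_request(index, doc_type, doc_id, doc):
    """Build (but do not send) PUT /{index}/{type}/{id} with a JSON body."""
    url = f"{ES_URL}/{index}/{doc_type}/{doc_id}"
    body = json.dumps(doc).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


req = build_index_request("website", "blog", "123", {
    "title": "My first blog entry",
    "text": "Just trying this out...",
    "date": "2014/01/01",
})
print(req.get_method(), req.full_url)
# To actually send it to a running node:
#     with urllib.request.urlopen(req) as resp:
#         print(json.load(resp))
```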

Autogenerating IDs

Autogenerated IDs are 20-character, URL-safe, Base64-encoded GUID strings.
They are produced by a modified FlakeID scheme, which lets multiple nodes generate unique IDs in parallel with a near-zero chance of collision.

POST /website/blog/
{
  "title": "My second blog entry",
  "text":  "Still trying this out...",
  "date":  "2014/01/01"
}
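The fixed 20-character length and the URL safety both follow from encoding a 15-byte value with URL-safe Base64. The Python sketch below illustrates that idea by combining a timestamp, a sequence counter and a node identifier, the same kinds of ingredients a Flake-style scheme relies on; the byte layout, the random node ID and the name `generate_id` are simplifying assumptions, not Elasticsearch's actual implementation.

```python
import base64
import itertools
import os
import time

# Simplified, hypothetical byte layout; Elasticsearch's real generator
# orders and sources these bytes differently.
_NODE_ID = os.urandom(6)        # stand-in for a per-node identifier
_sequence = itertools.count()   # per-process counter


def generate_id() -> str:
    """Return a 20-character, URL-safe Base64 ID built from 15 bytes."""
    millis = int(time.time() * 1000) & 0xFFFF_FFFF_FFFF  # 6 bytes of time
    seq = next(_sequence) & 0xFF_FFFF                    # 3 bytes of counter
    raw = millis.to_bytes(6, "big") + seq.to_bytes(3, "big") + _NODE_ID
    # 15 bytes is a multiple of 3, so Base64 yields exactly 20 chars
    # and no "=" padding; the URL-safe alphabet uses "-" and "_".
    return base64.urlsafe_b64encode(raw).decode("ascii")


print([generate_id() for _ in range(3)])
```

Because the counter changes on every call within a process and the node ID differs between nodes, two IDs collide only if time, counter and node bytes all coincide.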

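Retrieving a document

The response body contains the metadata elements we are already familiar with, plus the _source field, which holds the original JSON document we sent to Elasticsearch when indexing it. (Adding ?pretty to the query string simply makes Elasticsearch pretty-print the JSON response.)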

GET /website/blog/123?pretty

{
  "_index" :   "website",
  "_type" :    "blog",
  "_id" :      "123",
  "_version" : 1,
  "found" :    true,
  "_source" :  {
      "title": "My first blog entry",
      "text":  "Just trying this out...",
      "date":  "2014/01/01"
  }
}
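Client code usually needs only the _source part of this response. Below is a minimal Python sketch that parses the response above verbatim; the helper `extract_source` is hypothetical. When the requested ID does not exist, Elasticsearch replies with `"found": false` (and HTTP 404) instead of a `_source` field, so the sketch checks that flag first.

```python
import json

# The GET response shown above.
RAW_RESPONSE = """
{
  "_index" :   "website",
  "_type" :    "blog",
  "_id" :      "123",
  "_version" : 1,
  "found" :    true,
  "_source" :  {
      "title": "My first blog entry",
      "text":  "Just trying this out...",
      "date":  "2014/01/01"
  }
}
"""


def extract_source(response_text):
    """Return the original document (_source), or None if it was not found."""
    response = json.loads(response_text)
    if not response.get("found", False):
        return None
    return response["_source"]


doc = extract_source(RAW_RESPONSE)
print(doc["title"])
```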

Retrieving part of a document

To fetch only some fields of _source, list them in the _source parameter:

GET /website/blog/123?_source=title,text

{
  "_index" :   "website",
  "_type" :    "blog",
  "_id" :      "123",
  "_version" : 1,
  "found" :   true,
  "_source" : {
      "title": "My first blog entry" ,
      "text":  "Just trying this out..."
  }
}
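The filtering can be modelled in a few lines of Python: one helper builds the request path with the `_source` parameter, and another reproduces its effect on the stored document. Both helper names are hypothetical, and the local filter handles only plain top-level field names; Elasticsearch's own source filtering also understands nested field paths, which this sketch ignores.

```python
from urllib.parse import urlencode


def source_filter_path(index, doc_type, doc_id, fields):
    """Build a GET path that asks only for `fields` of _source."""
    # safe="," keeps the field separator readable instead of "%2C".
    query = urlencode({"_source": ",".join(fields)}, safe=",")
    return f"/{index}/{doc_type}/{doc_id}?{query}"


def filter_source(source, fields):
    """Local model of the filtering: keep only the named top-level fields."""
    return {key: value for key, value in source.items() if key in fields}


source = {
    "title": "My first blog entry",
    "text": "Just trying this out...",
    "date": "2014/01/01",
}
print(source_filter_path("website", "blog", "123", ["title", "text"]))
print(filter_source(source, ["title", "text"]))
```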

To fetch only the _source itself, without any metadata:

GET /website/blog/123/_source

{
   "title": "My first blog entry",
   "text":  "Just trying this out...",
   "date":  "2014/01/01"
}

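Updating a whole document

Documents in Elasticsearch are immutable: they cannot be changed. Instead, to update an existing document we reindex it, replacing it with a new version: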

PUT /website/blog/123
{
  "title": "My first blog entry",
  "text":  "I am starting to get the hang of this...",
  "date":  "2014/01/02"
}

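In the response, we can see that Elasticsearch has incremented the _version field, and "created" is false because a document with this ID already existed: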

{
  "_index" :   "website",
  "_type" :    "blog",
  "_id" :      "123",
  "_version" : 2,
  "created":   false 
}

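Process:
In effect, updating a document always amounts to the following steps:

Retrieve the JSON from the old document
Change that JSON
Delete the old document
Index a new document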
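These four steps, together with the `_version` and `created` values in the response above, can be mimicked by a small in-memory model. The Python class below is purely illustrative (the name `VersionedStore` is hypothetical) and ignores how Elasticsearch actually stores old versions and cleans them up later.

```python
class VersionedStore:
    """Hypothetical in-memory model of whole-document replacement.

    A stored document is never edited in place: each put discards the
    old copy for the (_index, _type, _id) key and stores a new one with
    a higher _version.
    """

    def __init__(self):
        self._docs = {}  # (index, type, id) -> (version, source)

    def put(self, index, doc_type, doc_id, source):
        key = (index, doc_type, doc_id)
        old = self._docs.get(key)
        version = 1 if old is None else old[0] + 1
        # Replacing the dict entry plays the role of
        # "delete the old document, index a new one".
        self._docs[key] = (version, dict(source))
        return {
            "_index": index,
            "_type": doc_type,
            "_id": doc_id,
            "_version": version,
            "created": old is None,
        }


store = VersionedStore()
store.put("website", "blog", "123", {"title": "My first blog entry",
                                     "text": "Just trying this out...",
                                     "date": "2014/01/01"})
print(store.put("website", "blog", "123", {
    "title": "My first blog entry",
    "text": "I am starting to get the hang of this...",
    "date": "2014/01/02",
}))
```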

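Creating a new document

When we index a document, how can we be sure that we are creating an entirely new document rather than overwriting an existing one?

The combination of _index, _type and _id uniquely identifies a document, so we need to tell Elasticsearch to accept the request only if no document with that combination exists yet. If one already exists, the request is rejected with a 409 Conflict response.

Method 1 (the op_type query-string parameter):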

PUT /website/blog/123?op_type=create
{ ... }

Method 2 (append /_create to the end of the URL):

PUT /website/blog/123/_create
{ ... }
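Both forms express the same rule: index the document only if its (_index, _type, _id) key is still free. A hypothetical in-memory Python model of that rule follows; `CreateAwareStore` and `ConflictError` are illustrative names, with the exception standing in for the 409 Conflict response Elasticsearch returns.

```python
class ConflictError(Exception):
    """Stands in for the 409 Conflict that Elasticsearch returns."""


class CreateAwareStore:
    """Hypothetical in-memory model of op_type=index vs op_type=create."""

    def __init__(self):
        self._docs = {}  # (index, type, id) -> source

    def put(self, index, doc_type, doc_id, source, op_type="index"):
        key = (index, doc_type, doc_id)
        if op_type == "create" and key in self._docs:
            # "create" must never overwrite an existing document
            raise ConflictError(f"document {key} already exists")
        self._docs[key] = dict(source)  # plain "index" simply overwrites
        return key


store = CreateAwareStore()
store.put("website", "blog", "123", {"title": "first"}, op_type="create")
try:
    store.put("website", "blog", "123", {"title": "second"}, op_type="create")
except ConflictError as err:
    print("rejected:", err)
```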

Deleting a document

DELETE /website/blog/123

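Partially updating a document

The update API accepts a partial document under "doc", which is merged into the existing document: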

POST /website/blog/123/_update
{
   "doc" : {
      "tags" : [ "testing" ],
      "views": 0
   }
}

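Retrieving the document again shows the fields from "doc" merged into _source and _version incremented to 3: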
{
  "_index": "website",
  "_type": "blog",
  "_id": "123",
  "_version": 3,
  "found": true,
  "_source": {
    "title": "My first blog entry",
    "text": "I am starting to get the hang of this...",
    "date": "2014/01/02",
    "views": 0,
    "tags": [
      "testing"
    ]
  }
}
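The partial document ends up merged into the existing _source while `_version` goes from 2 to 3. The Python sketch below reproduces that outcome locally, following the rule described for Elasticsearch's update API that objects are merged together, existing scalar fields are overwritten and new fields are added; treating arrays as whole values to replace is an assumption of this sketch, and the helper names `merge_partial` and `apply_update` are hypothetical.

```python
import copy


def merge_partial(source, partial):
    """Merge an update's "doc" into an existing _source (sketch).

    Nested objects are merged key by key; any other value (string,
    number, list) in `partial` replaces the existing one, and new
    keys are added.
    """
    merged = copy.deepcopy(source)
    for key, value in partial.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_partial(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


def apply_update(stored, partial):
    """Return the new stored state: merged _source and _version + 1."""
    return {
        "_version": stored["_version"] + 1,
        "_source": merge_partial(stored["_source"], partial),
    }


before = {
    "_version": 2,
    "_source": {
        "title": "My first blog entry",
        "text": "I am starting to get the hang of this...",
        "date": "2014/01/02",
    },
}
after = apply_update(before, {"tags": ["testing"], "views": 0})
print(after)
```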

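Retrieving multiple documents

Requests within the same index (the _index and _type in the URL act as defaults, and individual entries in docs can override them):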

GET /website/blog/_mget
{
   "docs" : [
      { "_id" : 2 },
      { "_type" : "pageviews", "_id" :   1 }
   ]
}

If all the documents share the same _index and _type, you can pass a simple ids array instead of the full docs array:

GET /website/blog/_mget
{
   "ids" : [ "2", "1" ]
}
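How the URL defaults and per-document overrides combine can be written out as a small Python function. It is a hypothetical helper that only resolves which documents an _mget body targets; it does not contact Elasticsearch.

```python
def resolve_mget(default_index, default_type, body):
    """Expand an _mget body into explicit (index, type, id) triples.

    `default_index` and `default_type` come from the URL
    (/website/blog/_mget); a per-document _index or _type overrides
    them. An "ids" array is shorthand for docs that set only _id.
    """
    if "ids" in body:
        docs = [{"_id": doc_id} for doc_id in body["ids"]]
    else:
        docs = body["docs"]
    return [
        (
            doc.get("_index", default_index),
            doc.get("_type", default_type),
            str(doc["_id"]),  # IDs are strings; normalise numeric ones
        )
        for doc in docs
    ]


print(resolve_mget("website", "blog",
                   {"docs": [{"_id": 2}, {"_type": "pageviews", "_id": 1}]}))
print(resolve_mget("website", "blog", {"ids": ["2", "1"]}))
```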
