2. Getting Started with Elasticsearch (Data)

Article link: https://blog.csdn.net/zxh476771756/article/details/79021292

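Documents

Most entities or objects in a program can be serialized into a JSON object of key-value pairs. A key is the name of a field or property, and a value can be a string, a number, a boolean, another object, an array of values, or some other special type, such as a string representing a date or an object representing a geographic location.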

{
    "name": "John Smith",
    "age": 42,
    "confirmed": true,
    "join_date": "2014-06-01",
    "home": {
        "lat": 51.5,
        "lon": 0.1
    },
    "accounts": [
        {
            "type": "facebook",
            "id": "johnsmith"
        },
        {
            "type": "twitter",
            "id": "johnsmith"
        }
    ]
}
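
To make the serialization step concrete, here is a minimal Python sketch that turns an in-memory object into a JSON document like the one above. The User and Account classes are hypothetical names chosen for this illustration; only the field names and values come from the example.

```python
import json
from dataclasses import dataclass, asdict, field


@dataclass
class Account:
    type: str
    id: str


@dataclass
class User:
    name: str
    age: int
    confirmed: bool
    join_date: str   # dates travel as plain strings in JSON
    home: dict       # geographic location as {"lat": ..., "lon": ...}
    accounts: list = field(default_factory=list)


user = User(
    name="John Smith",
    age=42,
    confirmed=True,
    join_date="2014-06-01",
    home={"lat": 51.5, "lon": 0.1},
    accounts=[Account("facebook", "johnsmith"), Account("twitter", "johnsmith")],
)

# asdict() recursively converts nested dataclasses into plain dicts and lists,
# which json.dumps() can then serialize.
document = json.dumps(asdict(user), indent=4)
print(document)
```

The resulting string is what would be sent as the request body when the document is indexed.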

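Document metadata
A document consists of more than its data. It also carries metadata, that is, information about the document. The three required metadata elements are: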

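Element	Description
_index	Where the document is stored
_type	The class of object that the document represents
_id	The unique identifier of the document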
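
These three elements map directly onto the URL path used by the requests in the rest of this article. The sketch below assumes that /{_index}/{_type}/{_id} layout; document_path is a hypothetical helper written for this illustration, not part of any Elasticsearch client.

```python
from urllib.parse import quote


def document_path(index: str, doc_type: str, doc_id: str) -> str:
    """Build the URL path /{_index}/{_type}/{_id} for a single document."""
    # quote(..., safe="") percent-encodes characters such as "/" inside a part,
    # so an unusual ID cannot be mistaken for an extra path segment.
    parts = (index, doc_type, doc_id)
    return "/" + "/".join(quote(part, safe="") for part in parts)


print(document_path("website", "blog", "123"))  # /website/blog/123
```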

Indexing

Using your own ID

PUT /website/blog/123
{
"title": "My first blog entry",
"text": "Just trying this out...",
"date": "2014/01/01"
}

Auto-generated ID

POST /website/blog/
{
"title": "My second blog entry",
"text": "Still trying this out...",
"date": "2014/01/01"
}

Retrieving a document

To retrieve a document from Elasticsearch, we use the same _index, _type, and _id, but the HTTP method changes to GET:

GET /website/blog/123?pretty

Retrieving part of a document

GET /website/blog/123?_source=title,text
GET /website/blog/123/_source
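
As a rough mental model, the _source parameter behaves like picking a subset of keys out of the stored JSON. The sketch below imitates that locally in Python for top-level field names only; filter_source is a made-up helper, and it does not try to reproduce everything the real parameter supports.

```python
def filter_source(source: dict, fields: list) -> dict:
    """Return only the requested top-level fields that exist in the source."""
    return {name: source[name] for name in fields if name in source}


blog = {
    "title": "My first blog entry",
    "text": "Just trying this out...",
    "date": "2014/01/01",
}

# Mirrors ?_source=title,text: the date field is left out of the result.
print(filter_source(blog, ["title", "text"]))
```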

Updating

PUT /website/blog/123
{
"title": "My first blog entry",
"text": "I am starting to get the hang of this...",
"date": "2014/01/02"
}

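In the response, we can see that Elasticsearch has incremented _version. Documents in Elasticsearch are immutable, so this PUT replaces the whole document rather than modifying it in place.

Creating a new document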

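Remember that the combination of _index, _type, and _id uniquely identifies a document. So the simplest way to guarantee that a document is newly added is to use the POST method and let Elasticsearch generate a unique _id: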

POST /website/blog/
{ ... }

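If you want to use your own _id instead, you must tell Elasticsearch to accept the index request only when no document with the same _index, _type, and _id already exists. There are two ways to do this. The first uses the op_type query parameter: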

PUT /website/blog/123?op_type=create
{ ... }

The second appends the /_create endpoint to the URL:

PUT /website/blog/123/_create
{ ... }

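If the request succeeds in creating a new document, Elasticsearch returns the usual metadata with a 201 Created response status.
On the other hand, if a document with the same _index, _type, and _id already exists, Elasticsearch returns a 409 Conflict response status.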
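
The create-only behaviour can be pictured as a dictionary insert that refuses to overwrite. The following in-memory sketch only models the two status codes described above; it is not how Elasticsearch is implemented, and TinyStore is a hypothetical name.

```python
class TinyStore:
    """In-memory stand-in for an index: maps (index, type, id) to a document."""

    def __init__(self):
        self.docs = {}

    def create(self, index, doc_type, doc_id, body):
        """Store the document only if the key is unused; return an HTTP-like status."""
        key = (index, doc_type, doc_id)
        if key in self.docs:
            return 409  # Conflict: a document with this key already exists
        self.docs[key] = body
        return 201  # Created


store = TinyStore()
print(store.create("website", "blog", "123", {"title": "first"}))   # 201
print(store.create("website", "blog", "123", {"title": "second"}))  # 409
```

After the second call the stored document is still the first one, which is the point of create-only requests.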

Deleting a document

DELETE /website/blog/123

Version control

PUT /website/blog/1?version=1
{
"title": "My first blog entry",
"text": "Starting to get the hang of this..."
}
We want this update to take effect only if the document's current _version is 1.

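The first time, the request succeeds and _version is incremented to 2. However, if we rerun the same index request, still specifying version=1, Elasticsearch returns a 409 Conflict HTTP response with a body like this: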

{
"error" : "VersionConflictEngineException[[website][2] [blog][1]: version conflict, current [2], provided [1]]",
"status" : 409
}
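
This version check is a form of optimistic concurrency control: a write goes through only if the document has not changed since we last read it. The sketch below models it in memory, assuming versions start at 1 and increase by 1 on every write; VersionedStore and VersionConflict are hypothetical names, and the error text only imitates the message shown above.

```python
class VersionConflict(Exception):
    """Raised when the provided version does not match the stored one."""


class VersionedStore:
    def __init__(self):
        self._docs = {}  # doc_id -> (version, body)

    def index(self, doc_id, body, version=None):
        """Write a document; if version is given, it must match the current one."""
        current_version, _ = self._docs.get(doc_id, (0, None))
        if version is not None and version != current_version:
            raise VersionConflict(
                f"version conflict, current [{current_version}], provided [{version}]"
            )
        new_version = current_version + 1
        self._docs[doc_id] = (new_version, body)
        return new_version


store = VersionedStore()
store.index("1", {"title": "My first blog entry"})             # version becomes 1
store.index("1", {"text": "Starting to get the hang of this..."}, version=1)  # becomes 2
try:
    store.index("1", {"text": "Rerun of the same request"}, version=1)
except VersionConflict as exc:
    print(exc)  # version conflict, current [2], provided [1]
```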

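Partial updates

The update request accepts a partial document in the doc parameter, which is merged into the existing document:

POST /website/blog/1/_update
{
    "doc" : {
        "tags" : [ "testing" ],
        "views": 0
    }
}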
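
To picture what merging a partial document means, here is a small Python sketch. It assumes that nested objects are merged recursively while scalar values and arrays are simply overwritten; apply_partial_update is a hypothetical helper that imitates the effect locally and is not Elasticsearch code.

```python
import copy


def apply_partial_update(existing: dict, partial: dict) -> dict:
    """Return a new document with the partial document merged into the existing one."""
    merged = copy.deepcopy(existing)
    for key, value in partial.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are objects: merge them field by field.
            merged[key] = apply_partial_update(merged[key], value)
        else:
            # Scalars, arrays, and new fields replace whatever was there.
            merged[key] = copy.deepcopy(value)
    return merged


stored = {"title": "My first blog entry", "text": "Starting to get the hang of this..."}
updated = apply_partial_update(stored, {"tags": ["testing"], "views": 0})
print(updated)
```

The original fields survive and the two new fields are added, so the caller never has to resend the whole document.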

Retrieving multiple documents

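Elasticsearch is already fast, but retrieving multiple documents can be faster still. Combining multiple requests into one avoids the network overhead of handling each request separately. If you need to retrieve several documents from Elasticsearch, it is faster to fetch them all in a single request with the multi-get (mget) API than to fetch them one by one.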

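The mget API takes a docs array, each element of which specifies the _index, _type, and _id metadata of one document. If you only want to retrieve one or a few specific fields, you can also specify a _source parameter: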

GET /_mget
{
"docs" : [
    {
    "_index" : "website",
    "_type" : "blog",
    "_id" : 2
    },
    {
    "_index" : "website",
    "_type" : "pageviews",
    "_id" : 1,
    "_source": "views"
    }
 ]
}

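If the documents you want to retrieve are all in the same _index (or even the same _type), you can specify a default /_index or /_index/_type in the URL.
You can still override these values in the individual docs entries: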

GET /website/blog/_mget
{
"docs" :  [
        { "_id" : 2 },
        { "_type" : "pageviews", "_id" : 1 }
    ]
 }

In fact, if all the documents share the same _index and _type, you can pass a simple ids array instead of the full docs array:

GET /website/blog/_mget
{
    "ids" : [ "2", "1" ]
}
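
The choice between the docs form and the ids form can be made mechanically. The sketch below builds the request body for a list of (index, type, id) references, given the defaults already present in the URL; mget_body is a hypothetical helper written for this illustration and does not send anything over the network.

```python
import json


def mget_body(refs, default_index=None, default_type=None):
    """Build an mget body for (index, type, id) tuples, omitting URL defaults."""
    if all(i == default_index and t == default_type for i, t, _ in refs):
        # Every document matches the URL defaults: the short ids form is enough.
        return {"ids": [str(doc_id) for _, _, doc_id in refs]}
    docs = []
    for index, doc_type, doc_id in refs:
        entry = {}
        if index != default_index:
            entry["_index"] = index      # override the index from the URL
        if doc_type != default_type:
            entry["_type"] = doc_type    # override the type from the URL
        entry["_id"] = doc_id
        docs.append(entry)
    return {"docs": docs}


# Body for GET /website/blog/_mget with one entry overriding the type.
refs = [("website", "blog", 2), ("website", "pageviews", 1)]
print(json.dumps(mget_body(refs, "website", "blog")))
```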

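Time-saving bulk operations

Just as mget lets us retrieve multiple documents at once, the bulk API lets us perform multiple create, index, update, or delete operations in a single request. This is particularly useful for indexing a data stream such as log events, which can be queued up and indexed in order, in batches of hundreds or thousands of documents.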

POST /_bulk
{ "delete": { "_index": "website", "_type": "blog", "_id": "123" }} 
{ "create": { "_index": "website", "_type": "blog", "_id": "123" }}
{ "title": "My first blog post" }
{ "index": { "_index": "website", "_type": "blog" }}
{ "title": "My second blog post" }
{ "update": { "_index": "website", "_type": "blog", "_id": "123", "_retry_on_conflict" : 3} }
{ "doc" : {"title" : "My updated blog post"} }
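
Note that the bulk body is not one JSON object but a sequence of one-line JSON objects separated by newlines, and the last line must also end with a newline. The sketch below assembles such a body in Python; bulk_body and its (action, metadata, source) tuple format are assumptions made for this illustration.

```python
import json


def bulk_body(operations):
    """Build a newline-delimited bulk body from (action, metadata, source) tuples."""
    lines = []
    for action, metadata, source in operations:
        lines.append(json.dumps({action: metadata}))
        if source is not None:  # a delete action has no body line
            lines.append(json.dumps(source))
    # Every line, including the last one, is terminated by a newline.
    return "".join(line + "\n" for line in lines)


operations = [
    ("delete", {"_index": "website", "_type": "blog", "_id": "123"}, None),
    ("create", {"_index": "website", "_type": "blog", "_id": "123"},
     {"title": "My first blog post"}),
    ("index", {"_index": "website", "_type": "blog"},
     {"title": "My second blog post"}),
    ("update", {"_index": "website", "_type": "blog", "_id": "123"},
     {"doc": {"title": "My updated blog post"}}),
]
print(bulk_body(operations), end="")
```

Because json.dumps never emits raw newlines by default, each action and each document stays on a single line, which is what the format requires.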

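How big is too big?

Try bulk-indexing typical documents in batches of increasing size. When performance starts to drop off, your batch size is too big. A good starting point is 1,000 to 5,000 documents per batch, or fewer if your documents are very large.
It is often useful to keep an eye on the physical size of your bulk requests: a thousand 1 kB documents is very different from a thousand 1 MB documents. A good batch size to start with is around 5 to 15 MB.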
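
One way to respect both limits at once is to close a batch as soon as either the document count or the byte size would be exceeded. The sketch below is a minimal illustration of that idea; batches is a hypothetical helper, the default limits are just starting values taken from the ranges above, and the size estimate counts only the serialized document, not the action line that precedes it in a bulk request.

```python
import json


def batches(documents, max_docs=1000, max_bytes=5 * 1024 * 1024):
    """Yield lists of documents limited by count and by approximate byte size."""
    batch, batch_bytes = [], 0
    for doc in documents:
        # Serialized size in bytes, plus one for the trailing newline.
        size = len(json.dumps(doc).encode("utf-8")) + 1
        if batch and (len(batch) >= max_docs or batch_bytes + size > max_bytes):
            yield batch
            batch, batch_bytes = [], 0
        batch.append(doc)
        batch_bytes += size
    if batch:
        yield batch


docs = [{"n": i} for i in range(2500)]
print([len(b) for b in batches(docs, max_docs=1000)])  # [1000, 1000, 500]
```

A single document larger than max_bytes still gets a batch of its own, so nothing is silently dropped.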