Elasticsearch from Beginner to Expert: Writing Data to Elasticsearch

王stone


Article link: https://blog.csdn.net/wangguoqing_it/article/details/121576747


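I. Creating an Index

An index can be created with the following commands:

1. Create an index (when shards and replicas are not specified, the defaults are 1 primary shard and 1 replica)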

PUT blog

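2. Create an index with a specified number of shards and replicas (if blog already exists from step 1, delete it first with DELETE blog, otherwise the request fails)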

PUT blog
{
  "settings":{
    "index":{
      "number_of_shards":5,
      "number_of_replicas":1
    }
  }
}

Response:

{
  "acknowledged": true,
  "shards_acknowledged": true
}

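Elasticsearch uses shards to distribute data across the cluster. A shard is a container for data: documents are stored in shards, and shards are allocated to the nodes of the cluster.
When the cluster grows or shrinks, Elasticsearch automatically migrates shards between nodes so that the data stays evenly distributed.

A shard is either a primary shard or a replica shard. Every document in an index belongs to exactly one primary shard, so the number of primary shards determines the maximum amount of data the index can hold.

A replica shard is simply a copy of a primary shard. Replicas provide redundancy that protects against data loss on hardware failure, and they also serve read operations such as searching and retrieving documents.

In the example above, the index has 5 primary shards, each with 1 replica.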
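
To make the link between documents and primary shards concrete, here is a minimal Python sketch of hash-based routing. It assumes the simplified rule shard = hash(routing value) % number_of_shards with the document ID as the routing value. zlib.crc32 is only a stand-in for the hash Elasticsearch actually uses, so the shard numbers will not match a real cluster, and both function names are hypothetical.

```python
import zlib


def pick_shard(doc_id: str, number_of_shards: int) -> int:
    """Map a document ID to a primary shard number (illustrative hash only)."""
    # crc32 is a stand-in here; Elasticsearch uses a different hash function.
    return zlib.crc32(doc_id.encode("utf-8")) % number_of_shards


def total_shard_copies(number_of_shards: int, number_of_replicas: int) -> int:
    """Count every shard copy: each primary plus its replicas."""
    return number_of_shards * (1 + number_of_replicas)


if __name__ == "__main__":
    for doc_id in ["1", "2", "3"]:
        print(doc_id, "->", pick_shard(doc_id, 5))
    # 5 primaries, each with 1 replica, gives 10 shard copies in total.
    print(total_shard_copies(5, 1))
```

Because the target shard depends on the number of primary shards, that number is fixed when the index is created, while the number of replicas can be changed later.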

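4. View index information

GET blog

This returns the information for the blog index (the sample output below shows number_of_shards as 1, that is, an index created with the default settings):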

{
  "blog" : {
    "aliases" : { },
    "mappings" : { },
    "settings" : {
      "index" : {
        "creation_date" : "1637996424639",
        "number_of_shards" : "1",
        "number_of_replicas" : "1",
        "uuid" : "cW_CcSXwQZ6OB8QYz887-Q",
        "version" : {
          "created" : "7070199"
        },
        "provided_name" : "blog"
      }
    }
  }
}

5. View only one part of the index information

GET blog/_settings

This returns only the settings of the blog index:

{
  "blog" : {
    "settings" : {
      "index" : {
        "creation_date" : "1637996424639",
        "number_of_shards" : "1",
        "number_of_replicas" : "1",
        "uuid" : "cW_CcSXwQZ6OB8QYz887-Q",
        "version" : {
          "created" : "7070199"
        },
        "provided_name" : "blog"
      }
    }
  }
}

6. Modify index settings

For example, change the number of replicas to 5:

PUT blog/_settings
{
  "number_of_replicas":5
}

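7. Mappings

When creating an index, we can define a mapping up front that specifies each field and its data type, which helps Elasticsearch manage the data. Take an article store as an example: an article's keywords field should be kept as whole terms, whereas the body field must be tokenized by a Chinese analyzer.

Setting a mapping tells Elasticsearch the rules for these fields.

For more detailed documentation, see: https://www.elastic.co/guide/...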

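8. Data types

Elasticsearch supports the following types:

Strings: text, keyword (note: versions before 5.0 had a string type, which was replaced by text and keyword starting with 5.0)
Numbers: byte, short, integer, long, float, double
Boolean: boolean
Dates: date
Complex types: object, nested, and others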

9. View the mapping

Run

GET blog/_mapping

to see the mappings of all fields in the blog index.

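10. Default mapping

If you create an index without specifying any mapping and then store data in it, Elasticsearch automatically assigns field types based on the actual data.
For example, insert a document with the following request: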

POST blog/_doc/1
{
  "id": 1,
  "name": "什么样的人最容易犯罪？",
  "content":"脸上有马赛克的人",
  "type":"科普",
  "score": 96
}

Then view the mapping. The result is:

{
  "blog" : {
    "mappings" : {
      "properties" : {
        "content" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "id" : {
          "type" : "long"
        },
        "name" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "score" : {
          "type" : "long"
        },
        "type" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        }
      }
    }
  }
}

As you can see, Elasticsearch mapped each field automatically according to the type of its value.
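
The inference shown above can be approximated in a few lines of Python. This is a simplified sketch, not Elasticsearch's implementation: it covers only booleans, integers, floating-point numbers, strings, and nested objects, and it ignores date detection, arrays, and null values. The function names are hypothetical.

```python
from typing import Any, Dict


def infer_field_mapping(value: Any) -> Dict[str, Any]:
    """Guess a field mapping from a JSON value (simplified rules)."""
    # bool must be tested before int: in Python, bool is a subclass of int.
    if isinstance(value, bool):
        return {"type": "boolean"}
    if isinstance(value, int):
        return {"type": "long"}
    if isinstance(value, float):
        return {"type": "float"}
    if isinstance(value, str):
        # Strings become analyzed text plus an exact-match keyword sub-field.
        return {
            "type": "text",
            "fields": {"keyword": {"type": "keyword", "ignore_above": 256}},
        }
    if isinstance(value, dict):
        return {"properties": infer_mapping(value)}
    raise TypeError(f"value not handled by this sketch: {value!r}")


def infer_mapping(doc: Dict[str, Any]) -> Dict[str, Any]:
    """Build the 'properties' section for every field of a document."""
    return {field: infer_field_mapping(value) for field, value in doc.items()}


if __name__ == "__main__":
    doc = {"id": 1, "name": "example title", "type": "science", "score": 96}
    for field, mapping in infer_mapping(doc).items():
        print(field, mapping)
```

Note that score is inferred as long because the sample value 96 is an integer. If the field should hold decimals, it has to be mapped explicitly.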

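11. Set the mapping

The mapping rules can be set when the index is created (the index must not already exist). The format looks like the response returned when viewing the mapping above.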

PUT blog
{
  "settings": {
    "number_of_shards": 5,
    "number_of_replicas": 1
  },
  "mappings": {
    "properties": {
      "id": {
        "type": "long"
      },
      "name": {
        "type": "text"
      },
      "type": {
        "type": "keyword"
      },
      "score": {
        "type": "double"
      }
    }
  }
}

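Fields of type text are analyzed (tokenized) and then indexed, whereas keyword fields are not tokenized.

Automatic type conversion

Once the index and mapping exist, field values in inserted documents are automatically converted to the types given in the mapping. For example, inserting "123" into an integer field makes Elasticsearch try to convert the string to a number. If the conversion is not possible, an error is returned and the document is not inserted.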
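
The conversion behaviour can be illustrated with a small Python sketch. It is an assumption-laden approximation of coercion into a 32-bit integer field: numeric strings are parsed, floating-point numbers are truncated, and anything else is rejected. Edge cases in Elasticsearch itself may differ, and coerce_to_integer is a hypothetical name.

```python
INT_MIN, INT_MAX = -2**31, 2**31 - 1  # range of a 32-bit signed integer


def coerce_to_integer(value):
    """Convert a JSON value to an int, or raise ValueError if impossible."""
    if isinstance(value, bool):
        raise ValueError(f"cannot convert {value!r} to integer")
    if isinstance(value, int):
        number = value
    elif isinstance(value, float):
        number = int(value)  # drop the fractional part
    elif isinstance(value, str):
        try:
            number = int(value)
        except ValueError:
            raise ValueError(f"cannot convert {value!r} to integer") from None
    else:
        raise ValueError(f"cannot convert {value!r} to integer")
    if not INT_MIN <= number <= INT_MAX:
        raise ValueError(f"{number} is out of range for an integer field")
    return number


if __name__ == "__main__":
    print(coerce_to_integer("123"))
    try:
        coerce_to_integer("abc")
    except ValueError as exc:
        print("rejected:", exc)
```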

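II. Writing Documents

A "document" is what would otherwise be called a record. Documents can be created, updated, and deleted.

1. Insert a document (with a specified ID)

You can specify the document ID, that is, PUT index_name/_doc/id (in 7.x the type segment of the path is always _doc). POST to the same path works as well, as in the example below.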

POST blog/_doc/1
{
  "id": 1,
  "name": "什么样的人最容易犯罪？",
  "content":"脸上有马赛克的人",
  "type":"科普",
  "score": 96
}

Response:

{
  "_index" : "blog",
  "_type" : "_doc",
  "_id" : "1",
  "_version" : 1,
  "result" : "created",
  "_shards" : {
    "total" : 2,
    "successful" : 2,
    "failed" : 0
  },
  "_seq_no" : 4,
  "_primary_term" : 1
}

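2. Insert a document (without specifying an ID)

You can also omit the ID, in which case Elasticsearch generates one automatically. Note that this requires the POST method.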

POST blog/_doc
{
  "id": 1,
  "name": "什么样的人最容易犯罪？",
  "content":"脸上有马赛克的人",
  "type":"科普",
  "score": 96
}

Response:

{
  "_index" : "blog",
  "_type" : "_doc",
  "_id" : "qTU5YH0BmoTpfuqF6vI9",
  "_version" : 1,
  "result" : "created",
  "_shards" : {
    "total" : 2,
    "successful" : 2,
    "failed" : 0
  },
  "_seq_no" : 2,
  "_primary_term" : 1
}

3. View a document

Simply retrieve it with GET:

GET blog/_doc/1?pretty=true

This returns the document:

{
  "_index" : "blog",
  "_type" : "_doc",
  "_id" : "1",
  "_version" : 3,
  "_seq_no" : 3,
  "_primary_term" : 1,
  "found" : true,
  "_source" : {
    "id" : 1,
    "name" : "什么样的人最容易犯罪？",
    "content" : "脸上有马赛克的人",
    "type" : "科普",
    "score" : 96
  }
}

You can also retrieve only some of the fields in _source:

GET blog/_doc/1?_source=name,content

Response:

{
  "_index" : "blog",
  "_type" : "_doc",
  "_id" : "1",
  "_version" : 3,
  "_seq_no" : 3,
  "_primary_term" : 1,
  "found" : true,
  "_source" : {
    "name" : "什么样的人最容易犯罪？",
    "content" : "脸上有马赛克的人"
  }
}

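4. Update a document

One way is a full overwrite: index a new document to the same ID (with PUT or, as below, POST). The old document is discarded and replaced by the new one.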

POST blog/_doc/1
{
  "id": 1,
  "name": "什么样的人最容易犯罪？",
  "content":"脸上有马赛克的人，和不要脸的人",
  "type":"科普",
  "score": 96
}

Response:

{
  "_index" : "blog",
  "_type" : "_doc",
  "_id" : "1",
  "_version" : 2,
  "result" : "updated",
  "_shards" : {
    "total" : 2,
    "successful" : 2,
    "failed" : 0
  },
  "_seq_no" : 5,
  "_primary_term" : 1
}

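The other way is a partial update via POST to the _update endpoint, which modifies only the fields listed under doc. In 7.x the typeless path is blog/_update/1; the older blog/_doc/1/_update form is deprecated.

POST blog/_update/1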
{
  "doc": {
    "name": "什么样的人最容易犯罪？",
    "content": "脸上有马赛克的人和容易犯罪的人",
    "type": "科普",
    "score": 96
  }
}
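
The difference between the two update styles can be shown with plain dictionaries. The sketch below is illustrative only and does not talk to a cluster: full_replace discards the stored document, while partial_update merges the supplied fields into it, recursing into nested objects and replacing scalars and arrays. Both function names are hypothetical.

```python
import copy


def full_replace(stored: dict, new_doc: dict) -> dict:
    """Full overwrite: the stored document is discarded entirely."""
    return copy.deepcopy(new_doc)


def partial_update(stored: dict, partial: dict) -> dict:
    """Partial update: merge 'partial' into a copy of the stored document."""
    merged = copy.deepcopy(stored)
    for key, value in partial.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Nested objects are merged field by field.
            merged[key] = partial_update(merged[key], value)
        else:
            # Scalars and arrays replace the old value.
            merged[key] = copy.deepcopy(value)
    return merged


if __name__ == "__main__":
    stored = {"id": 1, "name": "title", "content": "old text", "score": 96}
    change = {"content": "new text"}
    print(full_replace(stored, change))    # only 'content' survives
    print(partial_update(stored, change))  # other fields are kept
```

With a full overwrite, any field missing from the new body is lost, so the request has to carry the complete document. A partial update only needs the fields that change.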

5. Delete a document

A document can be deleted with DELETE:

DELETE blog/_doc/2

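6. Retrieve multiple documents with mget

See: https://www.elastic.co/guide/...

Combining several lookups into one request reduces the number of round trips and improves efficiency.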

GET _mget
{
   "docs" : [
      {
         "_index" : "blog",
         "_id" :    1
      },
      {
         "_index" : "blog",
         "_id" :    2
      }
   ]
}

The response contains both documents:

{
  "docs" : [
    {
      "_index" : "blog",
      "_type" : "_doc",
      "_id" : "1",
      "_version" : 4,
      "_seq_no" : 9,
      "_primary_term" : 1,
      "found" : true,
      "_source" : {
        "id" : 1,
        "name" : "什么样的人最容易犯罪？",
        "content" : "脸上有马赛克的人",
        "type" : "科普",
        "score" : 96
      }
    },
    {
      "_index" : "blog",
      "_type" : "_doc",
      "_id" : "2",
      "_version" : 2,
      "_seq_no" : 10,
      "_primary_term" : 1,
      "found" : true,
      "_source" : {
        "id" : 2,
        "name" : "第一个发现牛奶能喝的人，究竟对牛做了什么？",
        "content" : "我也不知道～但是你这个思路如此优秀",
        "type" : "科普",
        "score" : 96
      }
    }
  ]
}

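A shorthand is also available: when all the documents are in the same index, put the index in the path and pass only the IDs (in 7.x the typeless path is blog/_mget):

GET blog/_mget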
{
  "ids":["1", "2"]
}
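
The two request forms are equivalent, which a short Python sketch can make explicit. The helper below (a hypothetical name, not part of any client library) expands the ids shorthand for one index into the full docs form used in the first mget example.

```python
import json
from typing import Dict, List


def expand_ids(index: str, ids: List[str]) -> Dict[str, list]:
    """Expand the 'ids' shorthand into the explicit 'docs' request body."""
    return {"docs": [{"_index": index, "_id": doc_id} for doc_id in ids]}


if __name__ == "__main__":
    body = expand_ids("blog", ["1", "2"])
    print(json.dumps(body, indent=2))
```

The explicit form is still needed when the documents live in different indices, since each entry then carries its own _index.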

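7. Write data in batches with bulk

The bulk API allows multiple create, index, update, or delete requests to be performed in a single step. Only index is covered here.

For details, see: https://www.elastic.co/guide/...

The request format for bulk operations is somewhat unusual:

{ action: { metadata }}\n
{ request body }\n
{ action: { metadata }}\n
{ request body }\n ...

Normally each operation takes two lines: the first states the action and its metadata, the second carries the data for that action. A delete operation, however, takes only one line. Every line, including the last one, must end with a newline character.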

POST _bulk
{"index":{"_index":"blog","_id":"3"}}
{"name":"生蚝熟了之后还是生蚝吗？","content":"还是生蚝","type":"科普","score":96}
{"index":{"_index":"blog","_id":"4"}}
{"name":"如果猪肾虚的话，吃猪腰子还补吗？","content":"会更虚吧","type":"科普","score":96}
{"index":{"_index":"blog","_id":"5"}}
{"name":"如果猪肾虚的话，吃猪腰子还补吗？","content":"第七次","type":"科普","score":96}
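
When the request is built in code rather than typed into the console, the two-lines-per-operation layout has to be produced by hand. The following minimal sketch uses only the standard json module. The function name and the sample documents are made up for illustration, and sending the body over HTTP is left out.

```python
import json
from typing import Dict


def build_bulk_index_body(index: str, docs: Dict[str, dict]) -> str:
    """Build a newline-delimited bulk body with one 'index' action per doc."""
    lines = []
    for doc_id, source in docs.items():
        # Line 1: the action and its metadata.
        action = {"index": {"_index": index, "_id": doc_id}}
        lines.append(json.dumps(action, ensure_ascii=False))
        # Line 2: the document itself, serialized on a single line.
        lines.append(json.dumps(source, ensure_ascii=False))
    # The body must end with a newline after the last line.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    docs = {
        "3": {"name": "first sample", "type": "science", "score": 96},
        "4": {"name": "second sample", "type": "science", "score": 96},
    }
    print(build_bulk_index_body("blog", docs), end="")
```

When such a body is sent over HTTP, the Content-Type header should be application/x-ndjson.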

The response to the bulk request lists the processing status of each operation:

{
  "took" : 19,
  "errors" : false,
  "items" : [
    {
      "index" : {
        "_index" : "blog",
        "_type" : "_doc",
        "_id" : "3",
        "_version" : 1,
        "result" : "created",
        "_shards" : {
          "total" : 2,
          "successful" : 2,
          "failed" : 0
        },
        "_seq_no" : 11,
        "_primary_term" : 1,
        "status" : 201
      }
    },
    {
      "index" : {
        "_index" : "blog",
        "_type" : "_doc",
        "_id" : "4",
        "_version" : 1,
        "result" : "created",
        "_shards" : {
          "total" : 2,
          "successful" : 2,
          "failed" : 0
        },
        "_seq_no" : 12,
        "_primary_term" : 1,
        "status" : 201
      }
    },
    {
      "index" : {
        "_index" : "blog",
        "_type" : "_doc",
        "_id" : "5",
        "_version" : 1,
        "result" : "created",
        "_shards" : {
          "total" : 2,
          "successful" : 2,
          "failed" : 0
        },
        "_seq_no" : 13,
        "_primary_term" : 1,
        "status" : 201
      }
    }
  ]
}
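
A bulk request can succeed as a whole while individual operations fail, so client code usually checks the top-level errors flag and then inspects each item. The sketch below works on a response that has already been parsed into a dictionary. failed_items is a hypothetical helper, and the failure entry in the demo is invented to show the shape of the data.

```python
from typing import Any, Dict, List


def failed_items(bulk_response: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return a summary of every operation that reported an error."""
    if not bulk_response.get("errors"):
        return []  # fast path: nothing failed
    failures = []
    for item in bulk_response.get("items", []):
        # Each item has a single key naming the action (index, create, ...).
        for action, result in item.items():
            if "error" in result:
                failures.append({
                    "action": action,
                    "_id": result.get("_id"),
                    "status": result.get("status"),
                    "error": result["error"],
                })
    return failures


if __name__ == "__main__":
    response = {
        "took": 19,
        "errors": True,
        "items": [
            {"index": {"_index": "blog", "_id": "3", "status": 201}},
            # Invented failure entry, for illustration only.
            {"index": {"_index": "blog", "_id": "4", "status": 400,
                       "error": {"type": "mapper_parsing_exception"}}},
        ],
    }
    for failure in failed_items(response):
        print(failure)
```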

With the operations above, data can be written to Elasticsearch in an organized way. The next article will cover searching and querying.
