Elasticsearch Study Notes (9): Core Concepts and Basic Operations

Latest recommended article published on 2021-05-28 10:11:08

dicklong91

Category: java    Tags: elasticsearch, es

Original link: https://blog.csdn.net/qq_23536449/article/details/91047479

Reposted from: https://blog.csdn.net/chengyuqiang/column/info/18392 (ES version 6.3.0)
Reposted from: https://blog.csdn.net/qq_23536449/article/details/91047732

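Core concepts
Index:
An index is where Elasticsearch stores its data: a collection of documents that share similar characteristics. The word has more than one meaning in Elasticsearch. As a noun, an index is roughly comparable to a database instance in a relational database. As a verb, "to index" means to store a document in an index so that it becomes searchable.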

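Type:
Before version 6.0, an index could contain multiple types; an index created in 6.0 or later can contain only one. A type is a logical category within an index, usually made up of documents that share the same set of fields. A type is roughly comparable to a table in a relational database.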

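Document:
A document is the basic logical unit that Elasticsearch can index, comparable to a row in a relational table. A document is expressed as JSON and consists of fields, which correspond to columns in a relational database.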

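Shard:
When the data volume is large, an index may need more storage than a single node's disk can hold, or a single node may be too slow to serve it. To address this, Elasticsearch splits the data of an index into multiple shards. Each shard stores part of the index's data, and the shards are distributed across nodes. For a query, Elasticsearch sends the request to every relevant shard and merges the results. This is transparent to the application, which is not aware of the shards. Once an index has been created, its number of primary shards is fixed and cannot be changed in place.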
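
Why the primary shard count is fixed becomes clearer from how a document is assigned to a shard. Elasticsearch documents the rule as shard = hash(_routing) % number_of_primary_shards, where _routing defaults to the document ID. The following is a minimal sketch of that rule, not the real implementation: pick_shard is a hypothetical helper, and zlib.crc32 stands in for the Murmur3 hash that Elasticsearch uses internally.

```python
import zlib


def pick_shard(routing: str, number_of_primary_shards: int) -> int:
    """Map a routing value to a primary shard number.

    zlib.crc32 is only a stand-in for the hash Elasticsearch really uses.
    """
    if number_of_primary_shards < 1:
        raise ValueError("an index needs at least one primary shard")
    return zlib.crc32(routing.encode("utf-8")) % number_of_primary_shards


# Place the same ten document IDs in an index with 3 and with 5 primaries.
doc_ids = [str(i) for i in range(1, 11)]
with_3 = {doc_id: pick_shard(doc_id, 3) for doc_id in doc_ids}
with_5 = {doc_id: pick_shard(doc_id, 5) for doc_id in doc_ids}
moved = [doc_id for doc_id in doc_ids if with_3[doc_id] != with_5[doc_id]]

print("placement with 3 primaries:", with_3)
print("IDs that would land on a different shard with 5 primaries:", moved)
```

Because the divisor is part of the formula, changing the number of primary shards would send lookups for existing documents to shards that do not hold them.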

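Replica:
Strictly speaking, "shard" is short for "primary shard". Primary shards are named in contrast to replicas: a replica is a copy of a primary shard, and each primary shard can have one or more of them. These copies are called replica shards, or simply replicas. When a primary shard is lost, the cluster can promote one of its replicas to be the new primary.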

Creating an index:
(1) Simple form

PUT test

{
  "acknowledged": true,
  "shards_acknowledged": true,
  "index": "test"
}

(2) Index names must not contain uppercase letters

PUT Test

{
  "error": {
    "root_cause": [
      {
        "type": "invalid_index_name_exception",
        "reason": "Invalid index name [Test], must be lowercase",
        "index_uuid": "_na_",
        "index": "Test"
      }
    ],
    "type": "invalid_index_name_exception",
    "reason": "Invalid index name [Test], must be lowercase",
    "index_uuid": "_na_",
    "index": "Test"
  },
  "status": 400
}
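
The lowercase requirement is one of several index naming rules. The sketch below checks a name against a subset of the rules listed in the Elasticsearch create-index documentation. It is a client-side illustration only: index_name_error is a hypothetical helper, the messages are my own wording (only "must be lowercase" echoes the response above), and the server remains the authority on what it accepts.

```python
# Characters that the create-index documentation lists as forbidden.
INVALID_CHARS = set('\\/*?"<>| ,#')


def index_name_error(name: str):
    """Return a reason string if `name` breaks a naming rule, else None."""
    if not name:
        return "must not be empty"
    if name in (".", ".."):
        return "must not be '.' or '..'"
    if name != name.lower():
        return "must be lowercase"
    if name.startswith(("-", "_", "+")):
        return "must not start with '-', '_' or '+'"
    bad = sorted(ch for ch in set(name) if ch in INVALID_CHARS)
    if bad:
        return "must not contain the characters " + repr(bad)
    if len(name.encode("utf-8")) > 255:
        return "must not be longer than 255 bytes"
    return None


for candidate in ["test", "Test", "_blog", "my index"]:
    print(candidate, "->", index_name_error(candidate))
```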

(3) Creating an index that already exists

PUT test

{
  "error": {
    "root_cause": [
      {
        "type": "resource_already_exists_exception",
        "reason": "index [test/WC6GvUh1RTm1lKWfSURzTA] already exists",
        "index_uuid": "WC6GvUh1RTm1lKWfSURzTA",
        "index": "test"
      }
    ],
    "type": "resource_already_exists_exception",
    "reason": "index [test/WC6GvUh1RTm1lKWfSURzTA] already exists",
    "index_uuid": "WC6GvUh1RTm1lKWfSURzTA",
    "index": "test"
  },
  "status": 400
}

(4) Specifying settings

PUT blog
{ 
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 1
  } 
}

{
  "acknowledged": true,
  "shards_acknowledged": true,
  "index": "blog"
}
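
The two settings combine as follows: every primary shard gets number_of_replicas copies. The sketch below works out the resulting counts for the blog index; shard_copies and its key names are hypothetical, chosen only for this illustration.

```python
def shard_copies(number_of_shards: int, number_of_replicas: int) -> dict:
    """Count the shard copies implied by the two index settings."""
    copies_per_shard = 1 + number_of_replicas  # the primary plus its replicas
    return {
        "primaries": number_of_shards,
        "replica_shards": number_of_shards * number_of_replicas,
        "total_shard_copies": number_of_shards * copies_per_shard,
        "copies_per_write": copies_per_shard,
    }


# blog: 3 primary shards, 1 replica each
print(shard_copies(3, 1))
```

A single-document write goes to one primary and its replicas, which is why the write responses in this article report "_shards": {"total": 2}. A search visits one copy of each of the 3 shards, so search responses report "_shards": {"total": 3}.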

Viewing an index:
(1) View the settings of one index

GET blog/_settings

{
  "blog": {
    "settings": {
      "index": {
        "creation_date": "1547090371599",
        "number_of_shards": "3",
        "number_of_replicas": "1",
        "uuid": "xwD2Y5k3TXOyNTJQ2lBitw",
        "version": {
          "created": "6030099"
        },
        "provided_name": "blog"
      }
    }
  }
}

(2) View the settings of several indices

GET blog,test/_settings

{
  "blog": {
    "settings": {
      "index": {
        "creation_date": "1547090371599",
        "number_of_shards": "3",
        "number_of_replicas": "1",
        "uuid": "xwD2Y5k3TXOyNTJQ2lBitw",
        "version": {
          "created": "6030099"
        },
        "provided_name": "blog"
      }
    }
  },
  "test": {
    "settings": {
      "index": {
        "creation_date": "1547090207687",
        "number_of_shards": "5",
        "number_of_replicas": "1",
        "uuid": "WC6GvUh1RTm1lKWfSURzTA",
        "version": {
          "created": "6030099"
        },
        "provided_name": "test"
      }
    }
  }
}

(3) Delete an index

DELETE test

{
  "acknowledged": true
}

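Opening and closing an index
A closed index rejects read and write requests until it is reopened with POST blog/_open, so reopen blog before running the document examples that follow.
(1) Close an index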

POST blog/_close

{
  "acknowledged": true
}

Creating a document: index/type/id
(1) General form

PUT blog/csdn/1
{
  "id":1,
  "title":"Elasticsearch简介",
  "author":"chengyuqiang",
  "content":"Elasticsearch是一个基于Lucene的搜索引擎"
}

{
  "_index": "blog",
  "_type": "csdn",
  "_id": "1",
  "_version": 1,
  "result": "created",
  "_shards": {
    "total": 2,
    "successful": 2,
    "failed": 0
  },
  "_seq_no": 0,
  "_primary_term": 3
}

Add a second document:

POST blog/csdn/2
{
  "id":2,
  "title":"Git简介",
  "author":"chengyuqiang",
  "content":"Git是一个版本控制软件"
}

{
  "_index": "blog",
  "_type": "csdn",
  "_id": "2",
  "_version": 1,
  "result": "created",
  "_shards": {
    "total": 2,
    "successful": 2,
    "failed": 0
  },
  "_seq_no": 0,
  "_primary_term": 3
}

(2) Without specifying a document ID (Elasticsearch generates one)

POST blog/csdn
{
  "id":3,
  "title":"Java编程",
  "author":"chengyuqiang",
  "content":"Java面向对象程序设计"
}

{
  "_index": "blog",
  "_type": "csdn",
  "_id": "i9DSO2gBBk_Lv-BZu9bh",
  "_version": 1,
  "result": "created",
  "_shards": {
    "total": 2,
    "successful": 2,
    "failed": 0
  },
  "_seq_no": 1,
  "_primary_term": 3
}

Getting a document:
(1) Get an existing document

GET blog/csdn/1

{
  "_index": "blog",
  "_type": "csdn",
  "_id": "1",
  "_version": 1,
  "found": true,
  "_source": {
    "id": 1,
    "title": "Elasticsearch简介",
    "author": "chengyuqiang",
    "content": "Elasticsearch是一个基于Lucene的搜索引擎"
  }
}

(2) Get a document that does not exist

GET blog/csdn/100

{
  "_index": "blog",
  "_type": "csdn",
  "_id": "100",
  "found": false
}

(3) Check whether a document exists with HEAD

HEAD blog/csdn/1

200 - OK

HEAD blog/csdn/100

404 - Not Found

(4) Get several documents at once

GET blog/csdn/_mget
{
  "ids":["1","2"]
}

Response:

{
  "docs": [
    {
      "_index": "blog",
      "_type": "csdn",
      "_id": "1",
      "_version": 1,
      "found": true,
      "_source": {
        "id": 1,
        "title": "Elasticsearch简介",
        "author": "chengyuqiang",
        "content": "Elasticsearch是一个基于Lucene的搜索引擎"
      }
    },
    {
      "_index": "blog",
      "_type": "csdn",
      "_id": "2",
      "_version": 1,
      "found": true,
      "_source": {
        "id": 2,
        "title": "Git简介",
        "author": "chengyuqiang",
        "content": "Git是一个版本控制软件"
      }
    }
  ]
}

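Searching across documents
This section introduces a few simple search operations; later chapters cover them in detail.
(1) Retrieve all documents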

GET blog/_search

{
  "took": 5,
  "timed_out": false,
  "_shards": {
    "total": 3,
    "successful": 3,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 3,
    "max_score": 1,
    "hits": [
      {
        "_index": "blog",
        "_type": "csdn",
        "_id": "2",
        "_score": 1,
        "_source": {
          "id": 2,
          "title": "Git简介",
          "author": "chengyuqiang",
          "content": "Git是一个版本控制软件"
        }
      },
      {
        "_index": "blog",
        "_type": "csdn",
        "_id": "1",
        "_score": 1,
        "_source": {
          "id": 1,
          "title": "Elasticsearch简介",
          "author": "chengyuqiang",
          "content": "Elasticsearch是一个基于Lucene的搜索引擎"
        }
      },
      {
        "_index": "blog",
        "_type": "csdn",
        "_id": "i9DSO2gBBk_Lv-BZu9bh",
        "_score": 1,
        "_source": {
          "id": 3,
          "title": "Java编程",
          "author": "chengyuqiang",
          "content": "Java面向对象程序设计"
        }
      }
    ]
  }
}

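(2) term query
A term query finds documents whose specified field contains the given term. A document is returned only when the query term exactly matches a term produced by analyzing that field.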

GET blog/_search
{
  "query": {
    "term": {
      "title": "程"
    }
  }
}

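Because the IK Chinese analyzer is not in use, the default standard analyzer treats every Chinese character as a separate term, so the single character 程 matches.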

{
  "took": 3,
  "timed_out": false,
  "_shards": {
    "total": 3,
    "successful": 3,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0.6931472,
    "hits": [
      {
        "_index": "blog",
        "_type": "csdn",
        "_id": "i9DSO2gBBk_Lv-BZu9bh",
        "_score": 0.6931472,
        "_source": {
          "id": 3,
          "title": "Java编程",
          "author": "chengyuqiang",
          "content": "Java面向对象程序设计"
        }
      }
    ]
  }
}

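A query for 程序 finds nothing: the title field contains no such term, because by default Chinese text is split into single-character terms and a term query does not analyze its input.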

GET blog/_search
{
  "query": {
    "term": {
      "title": "程序"
    }
  }
}

{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 3,
    "successful": 3,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 0,
    "max_score": null,
    "hits": []
  }
}

(3) terms query
Finds documents that contain any of several terms.

GET blog/_search
{
  "query": {
    "terms": {
      "title": ["java","git"]
    }
  }
}

Note that analysis lowercases English words; the term "Java", for example, is indexed as "java".

{
  "took": 4,
  "timed_out": false,
  "_shards": {
    "total": 3,
    "successful": 3,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 1,
    "hits": [
      {
        "_index": "blog",
        "_type": "csdn",
        "_id": "2",
        "_score": 1,
        "_source": {
          "id": 2,
          "title": "Git简介",
          "author": "chengyuqiang",
          "content": "Git是一个版本控制软件"
        }
      },
      {
        "_index": "blog",
        "_type": "csdn",
        "_id": "i9DSO2gBBk_Lv-BZu9bh",
        "_score": 1,
        "_source": {
          "id": 3,
          "title": "Java编程",
          "author": "chengyuqiang",
          "content": "Java面向对象程序设计"
        }
      }
    ]
  }
}
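
The term and terms results above follow from how the title field was analyzed at index time. The sketch below imitates that with a deliberately simplified analyzer: runs of Latin letters and digits become one lowercased token and every CJK character becomes its own token. This regular expression is an assumption that only approximates the standard analyzer, all function names are hypothetical, and the auto-generated ID of the third document is abbreviated to "3".

```python
import re

# Latin/digit runs form one token; each CJK character is a token of its own.
TOKEN_PATTERN = re.compile(r"[A-Za-z0-9]+|[\u4e00-\u9fff]")


def analyze(text: str) -> list:
    """Split text into lowercased tokens (rough stand-in for the standard analyzer)."""
    return [token.lower() for token in TOKEN_PATTERN.findall(text)]


def build_inverted_index(titles: dict) -> dict:
    """Map every token to the set of document IDs whose title contains it."""
    index = {}
    for doc_id, title in titles.items():
        for token in analyze(title):
            index.setdefault(token, set()).add(doc_id)
    return index


def term_query(index: dict, term: str) -> set:
    """Look the term up as-is: a term query does not analyze its input."""
    return set(index.get(term, set()))


def terms_query(index: dict, terms: list) -> set:
    """Return documents that contain at least one of the given terms."""
    hits = set()
    for term in terms:
        hits |= term_query(index, term)
    return hits


titles = {"1": "Elasticsearch简介", "2": "Git简介", "3": "Java编程"}
inverted = build_inverted_index(titles)

print(analyze("Java编程"))                      # ['java', '编', '程']
print(term_query(inverted, "程"))               # {'3'}
print(term_query(inverted, "程序"))             # set()
print(term_query(inverted, "Java"))             # set(), the indexed term is "java"
print(sorted(terms_query(inverted, ["java", "git"])))  # ['2', '3']
```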

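(4) match query
Unlike the exact term query, a match query analyzes the query text first, and a document is returned if the queried field matches any of the resulting terms. Here 程序 is split into 程 and 序, and 程 matches the title Java编程.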

GET blog/_search
{
  "query": {
    "match": {
      "title": {
        "query": "程序"
      }
    }
  }
}

{
  "took": 30,
  "timed_out": false,
  "_shards": {
    "total": 3,
    "successful": 3,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0.6931472,
    "hits": [
      {
        "_index": "blog",
        "_type": "csdn",
        "_id": "i9DSO2gBBk_Lv-BZu9bh",
        "_score": 0.6931472,
        "_source": {
          "id": 3,
          "title": "Java编程",
          "author": "chengyuqiang",
          "content": "Java面向对象程序设计"
        }
      }
    ]
  }
}
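
The difference from a term query is that a match query runs the query text through the analyzer before looking anything up. The sketch below illustrates this under the same simplifying assumption as before (a regular expression that approximates the standard analyzer). match_query is a hypothetical helper; its operator argument mirrors the real operator option of the match query, which defaults to "or".

```python
import re

TOKEN_PATTERN = re.compile(r"[A-Za-z0-9]+|[\u4e00-\u9fff]")


def analyze(text: str) -> set:
    """Rough stand-in for the standard analyzer, returning the set of tokens."""
    return {token.lower() for token in TOKEN_PATTERN.findall(text)}


def match_query(titles: dict, query_text: str, operator: str = "or") -> list:
    """Analyze the query text, then match documents on the resulting terms."""
    query_terms = analyze(query_text)
    hits = []
    for doc_id, title in titles.items():
        doc_terms = analyze(title)
        if operator == "and":
            matched = bool(query_terms) and query_terms <= doc_terms
        else:
            matched = bool(query_terms & doc_terms)
        if matched:
            hits.append(doc_id)
    return hits


titles = {"1": "Elasticsearch简介", "2": "Git简介", "3": "Java编程"}

print(match_query(titles, "程序"))                  # ['3'], because 程 matches
print(match_query(titles, "程序", operator="and"))  # [], because 序 is missing
print(match_query(titles, "简介"))                  # ['1', '2']
```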

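Updating a document
(1) Replace the whole document
Documents in Elasticsearch are immutable and cannot be modified in place. To change a document, Elasticsearch actually builds a new document that replaces the old one.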

POST blog/csdn/2
{
  "id":2,
  "title":"Git简介",
  "author":"hadron",
  "content":"Git是一个分布式版本控制软件"
}

{
  "_index": "blog",
  "_type": "csdn",
  "_id": "2",
  "_version": 2,
  "result": "updated",
  "_shards": {
    "total": 2,
    "successful": 2,
    "failed": 0
  },
  "_seq_no": 1,
  "_primary_term": 3
}

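Notes:

1. The version is incremented by 1.
2. result is "updated" rather than "created", because a document with the same ID already exists under the same index and type.
3. Internally, the document with _version 1 has been marked as deleted and a complete new document has been added. The old document does not disappear immediately, but it can no longer be accessed.

Query the document again: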

GET blog/csdn/2

{
  "_index": "blog",
  "_type": "csdn",
  "_id": "2",
  "_version": 2,
  "found": true,
  "_source": {
    "id": 2,
    "title": "Git简介",
    "author": "hadron",
    "content": "Git是一个分布式版本控制软件"
  }
}
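
The replace-on-write behaviour described in the notes can be sketched with a toy in-memory store. This is an illustration only: TinyIndex and its attributes are hypothetical names, and real Elasticsearch marks the old copy as deleted inside Lucene segments and purges it later when segments are merged.

```python
class TinyIndex:
    """In-memory stand-in for one index; not how Lucene stores documents."""

    def __init__(self):
        self._live = {}         # doc id -> (version, source)
        self._superseded = []   # old copies that are no longer reachable

    def index(self, doc_id: str, source: dict) -> dict:
        """Store a complete document, replacing any earlier copy."""
        previous = self._live.get(doc_id)
        if previous is None:
            version, result = 1, "created"
        else:
            # The old copy is set aside, never edited in place.
            self._superseded.append((doc_id,) + previous)
            version, result = previous[0] + 1, "updated"
        self._live[doc_id] = (version, dict(source))
        return {"_id": doc_id, "_version": version, "result": result}

    def get(self, doc_id: str) -> dict:
        """Return the live copy, or a not-found marker."""
        if doc_id not in self._live:
            return {"_id": doc_id, "found": False}
        version, source = self._live[doc_id]
        return {"_id": doc_id, "_version": version, "found": True, "_source": source}


blog = TinyIndex()
print(blog.index("2", {"title": "Git简介", "author": "chengyuqiang"}))
print(blog.index("2", {"title": "Git简介", "author": "hadron"}))
print(blog.get("2"))
```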

(2) Update a field

POST blog/csdn/2/_update
{
  "script": {
    "source": "ctx._source.content=\"Git是一个开源的分布式版本控制软件\"" 
  }
}

{
  "_index": "blog",
  "_type": "csdn",
  "_id": "2",
  "_version": 3,
  "result": "updated",
  "_shards": {
    "total": 2,
    "successful": 2,
    "failed": 0
  },
  "_seq_no": 2,
  "_primary_term": 4
}

View the updated document:

GET blog/csdn/2

{
  "_index": "blog",
  "_type": "csdn",
  "_id": "2",
  "_version": 3,
  "found": true,
  "_source": {
    "id": 2,
    "title": "Git简介",
    "author": "hadron",
    "content": "Git是一个开源的分布式版本控制软件"
  }
}

(3) Add a new field

POST blog/csdn/1/_update
{
  "script": "ctx._source.posttime=\"2018-01-09\""
}

{
  "_index": "blog",
  "_type": "csdn",
  "_id": "1",
  "_version": 2,
  "result": "updated",
  "_shards": {
    "total": 2,
    "successful": 2,
    "failed": 0
  },
  "_seq_no": 2,
  "_primary_term": 4
}

Query the updated document:

GET blog/csdn/1

{
  "_index": "blog",
  "_type": "csdn",
  "_id": "1",
  "_version": 2,
  "found": true,
  "_source": {
    "id": 1,
    "title": "Elasticsearch简介",
    "author": "chengyuqiang",
    "content": "Elasticsearch是一个基于Lucene的搜索引擎",
    "posttime": "2018-01-09"
  }
}

The _version value has been incremented by 1.

(4) Update by query

POST blog/_update_by_query
{
   "script": {
    "source": "ctx._source.category=params.category",
    "lang":"painless",
    "params":{"category":"git"}
  },
  "query":{
    "term": {"title":"git"}
  }
}

{
  "took": 470,
  "timed_out": false,
  "total": 1,
  "updated": 1,
  "deleted": 0,
  "batches": 1,
  "version_conflicts": 0,
  "noops": 0,
  "retries": {
    "bulk": 0,
    "search": 0
  },
  "throttled_millis": 0,
  "requests_per_second": -1,
  "throttled_until_millis": 0,
  "failures": []
}
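
Conceptually, an update by query finds the matching documents and runs the script against each one, passing values in through params. The sketch below imitates that loop in memory. Everything in it is hypothetical (the function names, the document list, and the substring test that stands in for the term query); it only shows how total and updated in the response relate to the matched documents.

```python
def update_by_query(docs: list, matches, script, params: dict) -> dict:
    """Run `script` against every document accepted by `matches`."""
    summary = {"total": 0, "updated": 0}
    for doc in docs:
        if not matches(doc):
            continue
        summary["total"] += 1
        script(doc["_source"], params)
        doc["_version"] += 1  # every rewritten document gets a new version
        summary["updated"] += 1
    return summary


def set_category(source: dict, params: dict) -> None:
    # Mirrors the Painless line: ctx._source.category = params.category
    source["category"] = params["category"]


docs = [
    {"_id": "1", "_version": 2, "_source": {"title": "Elasticsearch简介"}},
    {"_id": "2", "_version": 3, "_source": {"title": "Git简介"}},
]
summary = update_by_query(
    docs,
    matches=lambda doc: "git" in doc["_source"]["title"].lower(),
    script=set_category,
    params={"category": "git"},
)
print(summary)
print(docs[1])
```

Passing the value through params, as the request above does, keeps the script text constant across calls, which the Elasticsearch scripting documentation recommends over building the value into the script source.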

Deleting a document

DELETE blog/csdn/1

{
  "_index": "blog",
  "_type": "csdn",
  "_id": "1",
  "_version": 3,
  "result": "deleted",
  "_shards": {
    "total": 2,
    "successful": 2,
    "failed": 0
  },
  "_seq_no": 3,
  "_primary_term": 4
}

Document routing:
(1) Index with a routing value

PUT blog/csdn/3?routing=chengyuqiang
{
  "id":3,
  "title":"Java简介",
  "author":"chengyuqiang",
  "content":"Oracle Java"
}

{
  "_index": "blog",
  "_type": "csdn",
  "_id": "3",
  "_version": 1,
  "result": "created",
  "_shards": {
    "total": 2,
    "successful": 2,
    "failed": 0
  },
  "_seq_no": 0,
  "_primary_term": 4
}

(2) Search by routing value

GET blog/_search?routing=chengyuqiang

{
  "took": 36,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 1,
    "hits": [
      {
        "_index": "blog",
        "_type": "csdn",
        "_id": "3",
        "_score": 1,
        "_routing": "chengyuqiang",
        "_source": {
          "id": 3,
          "title": "Java简介",
          "author": "chengyuqiang",
          "content": "Oracle Java"
        }
      }
    ]
  }
}

(3) Delete (without the routing value, the document is not found)

DELETE blog/csdn/3

{
  "_index": "blog",
  "_type": "csdn",
  "_id": "3",
  "_version": 1,
  "result": "not_found",
  "_shards": {
    "total": 2,
    "successful": 2,
    "failed": 0
  },
  "_seq_no": 4,
  "_primary_term": 4
}

Delete again, this time with the wrong routing value:

DELETE blog/csdn/3?routing=hadron

{
  "_index": "blog",
  "_type": "csdn",
  "_id": "3",
  "_version": 2,
  "result": "not_found",
  "_shards": {
    "total": 2,
    "successful": 2,
    "failed": 0
  },
  "_seq_no": 5,
  "_primary_term": 4
}

Delete with the correct routing value:

DELETE blog/csdn/3?routing=chengyuqiang

{
  "_index": "blog",
  "_type": "csdn",
  "_id": "3",
  "_version": 2,
  "result": "deleted",
  "_shards": {
    "total": 2,
    "successful": 2,
    "failed": 0
  },
  "_seq_no": 1,
  "_primary_term": 4
}
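
The three delete attempts behave differently because the routing value, not the ID, decided which shard holds the document. The sketch below reproduces the sequence with a toy model. RoutedIndex is a hypothetical class, and toy_hash (the sum of the UTF-8 bytes) is a stand-in chosen so the result can be checked by hand; Elasticsearch itself uses a Murmur3 hash, so real shard numbers will differ.

```python
def toy_hash(value: str) -> int:
    """Toy stand-in for the real routing hash: the sum of the UTF-8 bytes."""
    return sum(value.encode("utf-8"))


class RoutedIndex:
    """A few dictionaries acting as shards, addressed by a routing value."""

    def __init__(self, number_of_shards: int):
        self.shards = [dict() for _ in range(number_of_shards)]

    def _shard_for(self, doc_id: str, routing=None) -> dict:
        # Without an explicit routing value, the document ID is used.
        key = routing if routing is not None else doc_id
        return self.shards[toy_hash(key) % len(self.shards)]

    def put(self, doc_id: str, source: dict, routing=None) -> None:
        self._shard_for(doc_id, routing)[doc_id] = source

    def delete(self, doc_id: str, routing=None) -> str:
        shard = self._shard_for(doc_id, routing)
        if doc_id in shard:
            del shard[doc_id]
            return "deleted"
        return "not_found"


blog = RoutedIndex(number_of_shards=3)
blog.put("3", {"title": "Java简介"}, routing="chengyuqiang")

print(blog.delete("3"))                          # not_found
print(blog.delete("3", routing="hadron"))        # not_found
print(blog.delete("3", routing="chengyuqiang"))  # deleted
```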

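Mapping overview
The core concepts of Elasticsearch were compared with a relational database above: an index corresponds to a database, a type to a table, and a mapping to the table's schema. A mapping defines a document: the fields it contains, their types, their analyzers, and other attributes.
Mappings are either dynamic or static.
(1) Dynamic mapping
In a relational database you must first create a database, then create a table in it, and only then insert rows. Elasticsearch does not require a mapping to be defined in advance: when a document is written, the type of each field is detected automatically. This mechanism is called dynamic mapping.
(2) Static mapping
A mapping can also be defined in advance, listing each field of the document and its type. This is called static mapping (the official documentation calls it explicit mapping).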

Dynamic mapping example
(1) Create an index

PUT book

{
  "acknowledged": true,
  "shards_acknowledged": true,
  "index": "book"
}

(2) View the empty mapping

GET book/_mapping

{
  "book": {
    "mappings": {}
  }
}

(3) Insert a document
The it type denotes IT books.

PUT book/it/1
{
  "bookId":1,
  "bookName":"Java程序设计",
  "publishDate":"2018-01-12"
}

{
  "_index": "book",
  "_type": "it",
  "_id": "1",
  "_version": 1,
  "result": "created",
  "_shards": {
    "total": 2,
    "successful": 2,
    "failed": 0
  },
  "_seq_no": 0,
  "_primary_term": 1
}

(4) View the mapping again

GET book/_mapping

{
  "book": {
    "mappings": {
      "it": {
        "properties": {
          "bookId": {
            "type": "long"
          },
          "bookName": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          },
          "publishDate": {
            "type": "date"
          }
        }
      }
    }
  }
}

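(5) Interpretation
bookId is inferred as long, bookName as text (with a keyword sub-field), and publishDate as date. All of these inferences are acceptable here, which shows how capable dynamic mapping is.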

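Dynamic mapping rules
Dynamic mapping lets us write documents to Elasticsearch right after creating an index, so search becomes available quickly. In a real project it is useful when the set of fields is not known before the data is imported, or when fixing the field types up front is inconvenient. When a new document contains a field that the index has not seen before, Elasticsearch uses dynamic mapping to infer that field's type.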
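
The default inference rules can be approximated in a few lines. The sketch below follows the rules in the Elasticsearch dynamic field mapping documentation as I understand them: null adds no field, booleans become boolean, integers become long, floating-point numbers become float, objects become nested properties, arrays take the type of their first non-null element, and strings become date if they pass date detection or otherwise text with a keyword sub-field. The function names are hypothetical, and the date check is simplified to the yyyy-MM-dd form, whereas real date detection accepts more formats.

```python
import re

# Simplified date detection: only the yyyy-MM-dd form is recognised here.
DATE_PATTERN = re.compile(r"^\d{4}-\d{2}-\d{2}$")


def infer_field_mapping(value):
    """Guess the mapping of one JSON value, roughly as dynamic mapping would."""
    if value is None:
        return None  # a null value adds no field
    if isinstance(value, bool):  # test bool before int: bool subclasses int
        return {"type": "boolean"}
    if isinstance(value, int):
        return {"type": "long"}
    if isinstance(value, float):
        return {"type": "float"}
    if isinstance(value, dict):
        return {"properties": infer_mapping(value)}
    if isinstance(value, list):
        first = next((item for item in value if item is not None), None)
        return infer_field_mapping(first)
    if isinstance(value, str):
        if DATE_PATTERN.match(value):
            return {"type": "date"}
        return {
            "type": "text",
            "fields": {"keyword": {"type": "keyword", "ignore_above": 256}},
        }
    raise TypeError("unsupported JSON value: %r" % (value,))


def infer_mapping(document: dict) -> dict:
    """Infer the properties section for a whole document."""
    properties = {}
    for field, value in document.items():
        field_mapping = infer_field_mapping(value)
        if field_mapping is not None:
            properties[field] = field_mapping
    return properties


book = {"bookId": 1, "bookName": "Java程序设计", "publishDate": "2018-01-12"}
print(infer_mapping(book))
```

For the book document this yields the same properties as the mapping returned by GET book/_mapping in the dynamic mapping example.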

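Static mapping
The automatic type inference of dynamic mapping is not always correct, which is why static mapping exists. Static mapping resembles a CREATE TABLE statement in a relational database: the field types are specified in advance. Compared with dynamic mapping, it allows more detailed field types and more precise configuration.
(1) Create the mapping
An index created in Elasticsearch 6.x may have only a single type. Any name can be used for that type, but there can be only one.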

PUT books
{
  "mappings":{
    "it": {
       "properties": {
          "bookId": {"type": "long"},
          "bookName": {"type": "text"},
          "publishDate": {"type": "date"}
       }
    }
  }
}

{
  "acknowledged": true,
  "shards_acknowledged": true,
  "index": "books"
}

(2) View the mapping

GET books/_mapping

{
  "books": {
    "mappings": {
      "it": {
        "properties": {
          "bookId": {
            "type": "long"
          },
          "bookName": {
            "type": "text"
          },
          "publishDate": {
            "type": "date"
          }
        }
      }
    }
  }
}

(3) Insert a document

PUT books/it/1
{
  "bookId":"1",
  "bookName":"Java",
  "publishDate":"2018-01-12"
}

{
  "_index": "books",
  "_type": "it",
  "_id": "1",
  "_version": 1,
  "result": "created",
  "_shards": {
    "total": 2,
    "successful": 1,
    "failed": 0
  },
  "_seq_no": 0,
  "_primary_term": 1
}

(4) Retrieve the document

GET books/it/1

{
  "_index": "books",
  "_type": "it",
  "_id": "1",
  "_version": 1,
  "found": true,
  "_source": {
    "bookId": "1",
    "bookName": "Java",
    "publishDate": "2018-01-12"
  }
}

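Combining static and dynamic mapping
The document below contains an author field that the static mapping does not define; dynamic mapping adds it.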

PUT books/it/2
{
  "bookId":"2",
  "bookName":"Hadoop",
  "author":"chengyuqiang",
  "publishDate":"2018-01-13"
}

{
  "_index": "books",
  "_type": "it",
  "_id": "2",
  "_version": 1,
  "result": "created",
  "_shards": {
    "total": 2,
    "successful": 2,
    "failed": 0
  },
  "_seq_no": 0,
  "_primary_term": 1
}

View the mapping:

GET books/_mapping

{
  "books": {
    "mappings": {
      "it": {
        "properties": {
          "author": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          },
          "bookId": {
            "type": "long"
          },
          "bookName": {
            "type": "text"
          },
          "publishDate": {
            "type": "date"
          }
        }
      }
    }
  }
}

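Types as a design mistake
Types turned out to be a design mistake in Elasticsearch. Indices created in 6.0 or later allow only a single type, and later major versions remove types altogether; see the official explanation of the removal of mapping types.
(1) Why were mapping types removed?
Initially, an Elasticsearch index was described as similar to a database in a relational database, and a mapping type as equivalent to a table.
This was a bad analogy that led to incorrect assumptions. In a relational database, tables are independent of each other: a column in one table has no bearing on a column with the same name in another table. This is not the case for fields in mapping types.
Within one Elasticsearch index, fields that have the same name in different mapping types are backed by the same Lucene field internally. For example, a user_name field in a user type and a user_name field in a tweet type are stored in exactly the same field, so both must have the same definition in both mapping types (for example, both text or both date).
This causes problems. For example, if you want deleted to be a date field in one mapping type and a boolean field in another mapping type of the same index, the mapping is rejected.
On top of that, storing entities that have few or no fields in common in the same index leads to sparse data and interferes with Lucene's ability to compress documents efficiently.
For these reasons, the Elasticsearch team decided to remove the concept of mapping types.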
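
The shared-field constraint can be illustrated with a small sketch. It merges the field definitions of several types into one index-wide table, the way same-named fields end up backed by a single Lucene field, and fails when two types disagree. merge_type_mappings is a hypothetical helper and the error text is my own, not the message Elasticsearch returns.

```python
def merge_type_mappings(type_mappings: dict) -> dict:
    """Merge per-type field definitions into one index-wide field table."""
    fields = {}
    for type_name, properties in type_mappings.items():
        for field, definition in properties.items():
            existing = fields.get(field)
            if existing is not None and existing != definition:
                raise ValueError(
                    "field [%s] in type [%s] conflicts: %s vs %s"
                    % (field, type_name, existing["type"], definition["type"])
                )
            fields[field] = definition
    return fields


# Same name, same definition in both types: accepted.
compatible = {
    "user": {"user_name": {"type": "text"}},
    "tweet": {"user_name": {"type": "text"}, "content": {"type": "text"}},
}
print(merge_type_mappings(compatible))

# Same name, different definitions: rejected.
conflicting = {
    "user": {"deleted": {"type": "date"}},
    "tweet": {"deleted": {"type": "boolean"}},
}
try:
    merge_type_mappings(conflicting)
except ValueError as error:
    print("rejected:", error)
```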

Next: Elasticsearch Study Notes (10): Overview of Field Types