Elasticsearch 5.X Indices

W不懂

Original article: https://blog.csdn.net/u013571608/article/details/78480221

Elasticsearch 5.X Index Guide

Contents

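This post covers only the basics and does not go into depth; it sticks to what is practical.
An index can be given aliases, which can also be defined when the index is created; see https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-create-index.html
1. Index data types
2. Basic index settings
3. Index example
4. Index operations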

Basic data types

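Only the commonly used types are covered here. 5.X improved several types compared with 1.X.
Official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-types.html
1. string: versions before 5.X have a string type. In 5.X it is replaced by two types, text and keyword.
  1.1. text: a full-text type for large bodies of text, such as the body of an article. Values are analyzed (split into tokens), so they can be searched by individual words.
  1.2. keyword: a type for key values such as names, phone numbers, and email addresses. Values are not analyzed; each value is indexed as a single term, which suits exact matching, sorting, and aggregations.
2. Numeric types: for integers and decimals, namely long, integer, short, byte, double, float, half_float, and scaled_float. These are familiar and differ mainly in value range and precision; scaled_float additionally requires a scaling_factor.
3. date: date and time values. Several input forms are accepted:
  3.1. Plain dates such as 2015-10-10. A value with a time, such as 2015-10-10 16:15:10, is only accepted if a matching custom format is declared (see 3.4).
  3.2. ISO 8601 date-times such as 2015-01-01T12:10:30Z.
  3.3. Timestamps in milliseconds since the epoch, such as 1420070400001.
  3.4. Custom formats declared in the mapping, e.g. "format": "yyyy-MM-dd HH:mm:ss". The format controls which input strings can be parsed, and several formats can be combined with ||. Internally the value is stored as milliseconds since the epoch (UTC).
4. boolean: true or false. Besides JSON true/false, 5.X also accepts strings and numbers: false, "false", "off", "no", "0", "" (empty string), 0, and 0.0 are treated as false, and any other value as true. Later versions accept only true/false and "true"/"false".
5. Range types: represent a range of values, covering numbers, dates, and IP addresses; see https://www.elastic.co/guide/en/elasticsearch/reference/current/range.html
6. Arrays: there is no dedicated array type. Any field can hold several values of the same type, e.g. [ "one", "two" ]. Nested arrays are flattened, so [ 1, [ 2, 3 ]] is indexed as [ 1, 2, 3 ].
7. object: a JSON object inside the document; its inner fields are indexed as flattened, dotted field names.
8. geo_point: a latitude/longitude pair identifying a location. Several input formats are accepted:
  8.1. Object: {"lat": 41.12, "lon": -71.34}
  8.2. String, in "lat,lon" order: "41.12,-71.34"
  8.3. Geohash: "drm3btev3e86"
  8.4. Array, in [lon, lat] order: [ -71.34, 41.12 ]
9. ip: IPv4 and IPv6 addresses.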
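The date and geo_point input rules above can be checked with a small standard-library Python sketch. The helper names `to_epoch_millis` and `geo_point_forms` are hypothetical, chosen for illustration, and the code only builds values without talking to Elasticsearch. It shows that 1420070400001 is 2015-01-01T00:00:00.001Z in epoch milliseconds, how a date field accepting several formats could be declared, and that the string form of a geo_point is "lat,lon" while the array form is [lon, lat].

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_millis(dt):
    """Convert a timezone-aware datetime to milliseconds since the Unix epoch."""
    # timedelta // timedelta yields an int, so no float rounding is involved
    return (dt - EPOCH) // timedelta(milliseconds=1)


# A date field accepting "yyyy-MM-dd HH:mm:ss", "yyyy-MM-dd" or epoch_millis;
# formats are separated by "||".
date_field = {
    "type": "date",
    "format": "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis",
}


def geo_point_forms(lat, lon):
    """Three equivalent geo_point inputs for one location (geohash omitted)."""
    return {
        "object": {"lat": lat, "lon": lon},
        "string": "%s,%s" % (lat, lon),  # string order is "lat,lon"
        "array": [lon, lat],             # array order is [lon, lat]
    }
```

Mixing up the two coordinate orders is an easy way to put a point in the wrong place; the object form avoids that ambiguity.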

Index properties

index settings

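Settings that control the index itself, such as how its data is split into shards and how many replicas are kept.
1. number_of_shards: the number of primary shards the index data is split into; the default in 5.X is 5, i.e. the data is stored in 5 shards. With more nodes in the cluster, more shards spread the data more evenly, but a single node should not hold too many shards; the author's rule of thumb is about 3 per node.
  Note: this value must be set when the index is created and cannot be changed afterwards. 5.X does provide the _shrink API, which creates a new index with fewer shards (the new count must be a factor of the original), but there is no way to increase the shard count.
2. number_of_replicas: the number of replica copies kept for each shard; the default is 1. If the data matters, keep at least one replica. Elasticsearch never places a replica on the same node as its primary, so with one replica (once all replicas are allocated) the index remains searchable and complete if any single node is lost; losing several nodes at once can still remove every copy of a shard. Each replica uses as much disk as the data it copies, so too many replicas waste disk space.
  Note: this value can be changed at any time through the update-settings API, so it matters less to get it right up front.
3. More settings: see https://www.elastic.co/guide/en/elasticsearch/reference/current/index-modules.html
  These include limits on how far a query can page through results, such as index.max_result_window (default 10000, which caps from + size).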
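The shard arithmetic above can be sketched in Python. `total_shard_copies` and `shrink_targets` are hypothetical helpers; the settings dict follows the preparation step described in the 5.X shrink documentation, with `shrink_node_name` as a placeholder node name. Before calling _shrink, the source index must block writes and a copy of every shard must be located on one node.

```python
def total_shard_copies(number_of_shards, number_of_replicas):
    """Primary shards plus every replica copy of them."""
    return number_of_shards * (1 + number_of_replicas)


def shrink_targets(number_of_shards):
    """Shard counts the _shrink API could reduce an index to.

    The target must be a factor of the source count. Results are listed
    from largest to smallest and exclude the current count.
    """
    return [n for n in range(number_of_shards - 1, 0, -1)
            if number_of_shards % n == 0]


# Settings to apply to the source index before shrinking it:
# pin all shards to one node (placeholder name) and block writes.
prepare_shrink_settings = {
    "index.routing.allocation.require._name": "shrink_node_name",
    "index.blocks.write": True,
}
```

With the 5.X defaults of 5 shards and 1 replica the cluster holds 10 shard copies, and an index created with 30 shards can only be shrunk to one of the counts returned by `shrink_targets(30)`.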

mappings

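Describes the fields of the index.
1. type: the mapping type. In 5.X an index can contain several types; type names and contents are user-defined, and field definitions live only inside a type. Later versions restrict an index to a single type and eventually remove types.
  For details see: http://www.bayescafe.com/database/elasticsearch-using-index-or-type.html
2. properties: the field definitions inside a type.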
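To make the mappings, type, properties nesting concrete, here is a hypothetical Python helper, `list_fields`, that walks a 5.X mappings dict and lists every field with its type. It is a sketch for reading mapping definitions, not part of any client library; object fields without an explicit type are reported as "object", and the `owner` field in the test is made up for illustration.

```python
def list_fields(mappings):
    """Yield (type_name, field_name, field_type) from a 5.X mappings dict.

    The structure is mappings -> <type name> -> properties -> <field>.
    Sub-fields of objects are reported with dotted names.
    """
    for type_name, type_body in mappings.items():
        for name, field_type in _walk(type_body.get("properties", {}), ""):
            yield type_name, name, field_type


def _walk(properties, prefix):
    for name, spec in properties.items():
        full_name = prefix + name
        # Object fields are often declared only with "properties", no "type"
        yield full_name, spec.get("type", "object")
        if "properties" in spec:
            yield from _walk(spec["properties"], full_name + ".")
```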

analyzer (tokenization) settings

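Defines how text is analyzed (split into tokens). The ik analyzers below come from the elasticsearch-analysis-ik plugin for Chinese text and are not built into Elasticsearch.
1. analyzer: ik_max_word
  ik_max_word: splits text at the finest granularity and exhausts the possible combinations; for example, "中华人民共和国国歌" (the national anthem of the People's Republic of China) becomes "中华人民共和国, 中华人民, 中华, 华人, 人民共和国, 人民, 人, 民, 共和国, 共和, 和, 国国, 国歌".
  ik_smart: splits text at the coarsest granularity; for example, "中华人民共和国国歌" becomes "中华人民共和国, 国歌".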
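The two ik analyzers are often combined: ik_max_word at index time for recall and ik_smart at search time for precision. The sketch below builds such a field definition and _analyze request bodies for comparing the two analyzers' tokens. It assumes the elasticsearch-analysis-ik plugin is installed; the helper names are hypothetical. With elasticsearch-py, an analyze body could be sent with `elkIndex.analyze(body=...)`.

```python
# Assumes the elasticsearch-analysis-ik plugin is installed on every node;
# ik_max_word and ik_smart are not built-in analyzers.

def ik_text_field(index_analyzer="ik_max_word", search_analyzer="ik_smart"):
    """A text field indexed with fine-grained tokens and searched with coarse ones."""
    return {
        "type": "text",
        "analyzer": index_analyzer,
        "search_analyzer": search_analyzer,
    }


def analyze_body(text, analyzer):
    """Request body for the _analyze API, used to inspect an analyzer's tokens."""
    return {"analyzer": analyzer, "text": text}


# One body per analyzer, to compare how each splits the same sentence
bodies = [analyze_body("中华人民共和国国歌", name) for name in ("ik_max_word", "ik_smart")]
```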

Index example

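Complete index example (the request body for creating an index; the // comments are explanations only, so remove them if your client requires strict JSON)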

{
    "settings" : {
        "index" : {
            //Settings: split the index into 30 shards and keep no replicas
            //Note: only some settings can be changed after the index is created
            "number_of_shards" : 30,
            "number_of_replicas" : 0
        }
    },
    "mappings": {
        //hhscan is the mapping type of the index
        "hhscan": {
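            //Field definitions of the type
            //Note: an existing field's definition cannot be changed after creation (new fields can still be added)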
            "properties": {
                //A date field of type date for date-time values; field names are user-defined
                "date": {
                    "type": "date"
                },
                //An IP field of type ip for IP addresses
                "redunip": {
                    "type": "ip"
                },
                "otherInfo": {
                    "analyzer": "ik_max_word",
                    "search_analyzer": "ik_max_word",
                    "type": "text"
                },
                "geoip": {
                    "type": "geo_point"
                },
                "target": {
                    "type": "keyword"
                },
                "portType": {
                    "type": "keyword"
                },
                "port": {
                    "type": "integer"
                }
            }
        }
    }
}
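Plain JSON has no comment syntax, so a definition written with // comments, like the one above, must be cleaned before a strict parser such as Python's json module accepts it. A minimal sketch, assuming comments only ever occupy whole lines; the function names are hypothetical.

```python
import json


def strip_line_comments(text):
    """Remove lines whose first non-blank characters are "//".

    Only whole-line comments are handled; a comment written after a value
    on the same line would still make json.loads fail.
    """
    kept = [line for line in text.splitlines()
            if not line.lstrip().startswith("//")]
    return "\n".join(kept)


def load_index_body(text):
    """Parse a commented index definition into a dict, e.g. for indices.create()."""
    return json.loads(strip_line_comments(text))
```

Lines such as `"url": "http://..."` are kept, because the check only looks at how a line starts.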

Index operations

Checking whether an index exists

The Python client (elasticsearch-py) is used here to check the index; the official reference is linked as well.
Official reference: http://www.elastic.co/guide/en/elasticsearch/reference/current/indices-exists.html

from elasticsearch import Elasticsearch
from elasticsearch.client import IndicesClient
# Connect to Elasticsearch
elkClient = Elasticsearch(["http://ip:9200"])
elkIndex = IndicesClient(elkClient)
# Check whether the index exists
# exists returns False if the index does not exist and True if it does
test = elkIndex.exists(index="test_test_test")
print(test)
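A common use of the existence check is "create the index only if it is missing". The sketch below is a hypothetical helper, not part of elasticsearch-py; it takes any object with `exists` and `create` methods, such as the `IndicesClient` above. An alternative offered by the client is to call `create(..., ignore=400)`, which suppresses the error returned when the index already exists.

```python
def ensure_index(indices, name, body):
    """Create index `name` from `body` unless it already exists.

    `indices` is anything with exists(index=...) and create(index=..., body=...),
    such as elasticsearch.client.IndicesClient. Returns True if the index was
    created here and False if it already existed. The check and the create are
    two separate requests, so another process can still create the index in
    between; create() then raises an error.
    """
    if indices.exists(index=name):
        return False
    indices.create(index=name, body=body)
    return True
```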

Creating an index

The Python client is used here to create the index; the official reference is linked as well.
Official reference: https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-create-index.html

import json
from elasticsearch import Elasticsearch
from elasticsearch.client import IndicesClient
# Connect to Elasticsearch
elkClient = Elasticsearch(["http://ip:9200"])
elkIndex = IndicesClient(elkClient)
# Index definition
mapp = {
    "settings" : {
        "index" : {
            "number_of_shards" : 5,
            "number_of_replicas" : 0
        }
    },
    "mappings": {
        "hhscan": {
            "properties": {
                "date": {
                    "type": "date"
                },
                "redunip": {
                    "type": "ip"
                }
            }
        }
    }
}
# Create the index; index is the index name
test = elkIndex.create(index="test_test_test", body=mapp)
print(json.dumps(test))
# Result on success: {"index": "test_test_test", "acknowledged": true, "shards_acknowledged": true}
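When several indices share the same layout, the definition can be generated instead of written by hand. `build_create_body` is a hypothetical helper (not part of elasticsearch-py) that assembles a 5.X body with one mapping type; with shards=5 and replicas=0 it reproduces the `mapp` dict above, and its defaults match the 5.X defaults of 5 shards and 1 replica.

```python
def build_create_body(type_name, properties, shards=5, replicas=1):
    """Assemble a 5.X create-index body: index settings plus one mapping type."""
    return {
        "settings": {
            "index": {
                "number_of_shards": shards,
                "number_of_replicas": replicas,
            }
        },
        "mappings": {type_name: {"properties": properties}},
    }
```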

Updating an index

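The Python client is used here to update the index; the official reference is linked as well.
Note: only some index settings can be changed, and field definitions in the mappings cannot be modified. 5.X provides the reindex API for rebuilding an index: it still works by creating a new index and copying the data into it, but the official API makes the rebuild easier.
Official reference: https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-update-settings.html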

from elasticsearch import Elasticsearch
from elasticsearch.client import IndicesClient
# Connect to Elasticsearch
elkClient = Elasticsearch(["http://ip:9200"])
elkIndex = IndicesClient(elkClient)
# Settings to change
attUpdate = {
    "index" : {
        "number_of_replicas" : 1
    }
}
# Update the index settings
elkIndex.put_settings(body=attUpdate, index="test_test_test")
# Result on success: {"acknowledged": true}
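Sending a setting that cannot be updated makes put_settings fail with an error from the cluster, and a small client-side check can catch the mistake earlier. This is a hypothetical sketch: the blocked set is deliberately partial (number_of_shards, which can never change after creation, and codec, a static setting that needs a closed index), not a complete list from the documentation.

```python
# Partial, illustrative list of settings that put_settings rejects on an open index.
NOT_UPDATABLE_ON_OPEN_INDEX = {"number_of_shards", "codec"}


def flatten(settings, prefix=""):
    """Turn {"index": {"number_of_replicas": 1}} into {"index.number_of_replicas": 1}."""
    flat = {}
    for key, value in settings.items():
        name = prefix + key
        if isinstance(value, dict):
            flat.update(flatten(value, name + "."))
        else:
            flat[name] = value
    return flat


def check_update(settings_body):
    """Return settings_body unchanged, or raise ValueError for a blocked setting."""
    for name in flatten(settings_body):
        short = name[len("index."):] if name.startswith("index.") else name
        if short in NOT_UPDATABLE_ON_OPEN_INDEX:
            raise ValueError("cannot update %s on an open index" % name)
    return settings_body
```

Used as `elkIndex.put_settings(body=check_update(attUpdate), index="test_test_test")`, the replica change above passes through unchanged.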

Deleting an index

The Python client is used here to delete the index; the official reference is linked as well.
Official reference: https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-delete-index.html

from elasticsearch import Elasticsearch
from elasticsearch.client import IndicesClient
# Connect to Elasticsearch
elkClient = Elasticsearch(["http://ip:9200"])
elkIndex = IndicesClient(elkClient)
# Delete the index
elkIndex.delete(index="test_test_test")
# Result on success: {"acknowledged": true}
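The delete-index API also accepts wildcards, comma-separated lists, and _all, so a typo can remove many indices at once. Below is a hypothetical client-side guard (not part of elasticsearch-py) that only lets a single explicit name through. On the server side, setting action.destructive_requires_name to true in the cluster settings makes Elasticsearch itself reject wildcard and _all deletes.

```python
def safe_index_name(name):
    """Return name unchanged if it names exactly one index, else raise ValueError."""
    if not name or name == "_all" or "*" in name or "," in name:
        raise ValueError("refusing to delete %r: use one explicit index name" % name)
    return name


# Usage with the client from the example above:
# elkIndex.delete(index=safe_index_name("test_test_test"))
```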

Copying an index

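The Python client provides a helper for copying an index, which is simpler than doing it by hand. Note that helpers.reindex runs on the client side: it reads the source index with a scroll and writes to the target with bulk requests. The server-side equivalent is the _reindex API linked below. If the target index does not exist it is created automatically (when automatic index creation is enabled) with dynamically inferred mappings, so create it first if you need specific mappings.
Official reference: https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-reindex.html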

import json
from elasticsearch import Elasticsearch
from elasticsearch import helpers
# Connect to Elasticsearch
elkClient = Elasticsearch(["http://ip:9200"])
# Copy one index into another; returns bulk statistics (documents copied, errors)
test = helpers.reindex(elkClient, source_index="test_test_test", target_index="test_test_test_aaa")
print(json.dumps(test))
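Because mappings cannot be changed in place, a common pattern is to point applications at an alias, build a new index with the corrected mappings, copy the data with the server-side _reindex API, and then move the alias in one atomic _aliases call. The sketch below only builds the two request bodies; with elasticsearch-py they would be passed to `elkClient.reindex(body=...)` and `elkIndex.update_aliases(body=...)`. The helper names and the alias `test_alias` are hypothetical.

```python
def reindex_body(source, dest):
    """Body for the server-side _reindex API (POST /_reindex)."""
    return {"source": {"index": source}, "dest": {"index": dest}}


def swap_alias_body(alias, old_index, new_index):
    """Body for POST /_aliases that moves `alias` from old_index to new_index.

    Both actions are applied atomically, so searches through the alias
    never find it pointing at no index.
    """
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


# Hypothetical setup: the alias "test_alias" currently points at test_test_test.
copy_body = reindex_body("test_test_test", "test_test_test_aaa")
swap_body = swap_alias_body("test_alias", "test_test_test", "test_test_test_aaa")
```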

Other index operations

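See the Indices section of the elasticsearch-py API documentation.
Documentation: http://elasticsearch-py.readthedocs.io/en/master/api.html#indices
Note: mapping options can do much more. For example, the fields option (multi-fields) lets one field be indexed in several ways; see https://www.elastic.co/guide/en/elasticsearch/reference/5.6/multi-fields.html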
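As an example of the fields option, this sketch defines a text field with a keyword sub-field, the same shape 5.X produces when it maps a string field dynamically (a sub-field named keyword with ignore_above 256). The helper name and the reuse of the `target` field name from the example index are illustrative only.

```python
def text_with_keyword(ignore_above=256):
    """A text field plus a keyword sub-field (multi-field).

    Full-text queries use the field itself; exact matches, sorting and
    aggregations use "<field>.keyword". Values longer than ignore_above
    characters are not indexed in the keyword sub-field.
    """
    return {
        "type": "text",
        "fields": {
            "keyword": {"type": "keyword", "ignore_above": ignore_above},
        },
    }


properties = {"target": text_with_keyword()}
```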
